Noise removal for PCA

From Powers Wiki
Revision as of 05:44, 13 October 2012


Prepare the data set

1. The data can be prepared as a txt file from the ACDLab 1D processor.

2. After the spectra are correctly "Autophased" and "Referenced" to TMSP, click the "Integration" icon in the toolbar.

3. Click "Series" in the menu and choose "Table of common integrals".

4. Export the table to the target folder.

5. Open the file in Office Excel. Delete the first row, so that the row of sample numbers becomes the first row, then insert a new row directly below it.

6. Fill the new row with the sample class names.
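
For readers who prefer to script the later steps instead of working in Excel, the table produced above could be loaded as in the minimal sketch below. The layout is an assumption, not something the export guarantees: tab-delimited text, sample numbers in the first row, class names in the second row, and one bin per following row with the bin label in the first column. The function name read_binned_table and the example values are hypothetical.

```python
import csv
import io


def read_binned_table(text, delimiter="\t"):
    """Parse a binned table (assumed layout, see the text above).

    Row 1: sample numbers, row 2: class names, remaining rows: one bin each,
    with the bin label in the first column.
    """
    rows = [r for r in csv.reader(io.StringIO(text), delimiter=delimiter) if r]
    samples = rows[0][1:]   # sample numbers (first cell is a corner label)
    classes = rows[1][1:]   # class name of each sample
    bins = [r[0] for r in rows[2:]]
    values = [[float(v) for v in r[1:]] for r in rows[2:]]
    return samples, classes, bins, values


# Hypothetical four-sample, two-bin table used only for illustration.
example = (
    "bin\t1\t2\t3\t4\n"
    "class\tcontrol\tcontrol\ttreated\ttreated\n"
    "9.98\t0.1\t0.2\t0.1\t0.3\n"
    "3.02\t5.0\t5.5\t9.0\t9.4\n"
)
samples, classes, bins, values = read_binned_table(example)
```

Here values[i][j] holds bin i of sample j, which matches the row and column orientation of the Excel sheet.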

Z score transformation

The Z score is used to normalize each individual spectrum. Scaling of the data set across all the spectra is performed in SIMCA-P+ (UV scaling is the default).
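
As a rough illustration of the per-spectrum normalization, the sketch below subtracts the mean of a spectrum from every bin and divides by the standard deviation of that spectrum. It assumes the sample standard deviation (n - 1 denominator), which the protocol does not specify, and the function names are hypothetical.

```python
from statistics import mean, stdev


def z_score_spectrum(intensities):
    """Return the z scores of one spectrum (a list of bin intensities)."""
    mu = mean(intensities)
    sigma = stdev(intensities)  # assumption: sample standard deviation (n - 1)
    return [(x - mu) / sigma for x in intensities]


def z_score_columns(values):
    """values[i][j] is bin i of spectrum j; normalize each spectrum (column)."""
    columns = list(zip(*values))
    z_columns = [z_score_spectrum(col) for col in columns]
    # Transpose back so that rows are bins again.
    return [list(row) for row in zip(*z_columns)]
```

Each spectrum is treated independently here; the scaling across spectra (UV scaling) is left to SIMCA-P+ as described above.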


Noise cutoff calculation

0. The calculation is based on the z-score data set.

1. For each class, calculate the standard deviation and the average of each row (bin).

2. Calculate the absolute relative standard deviation by dividing the standard deviation by the absolute value of the average.

3. Find the maximum of each row. If the maximum is smaller than 0, all the z score values in that row are smaller than 0.

4. Identify a region where no peak exists, and find the maximum of the relative standard deviation within that region for each class.

5. A bin can be considered noise only when all of its z scores are smaller than 0 AND its relative standard deviation is smaller than the maximum found in the noise region.
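
The steps above can be sketched for a single class as follows. This is an illustration under assumptions, not the original spreadsheet: z_rows holds one list of z scores per bin for the samples of that class, noise_region is a user-chosen list of bin indices known to contain no peak, and the sample standard deviation is used. The function names are hypothetical.

```python
from statistics import mean, stdev


def relative_sd(row):
    """Step 2: standard deviation divided by the absolute value of the average."""
    return stdev(row) / abs(mean(row))


def noise_bins(z_rows, noise_region):
    """Return one boolean per bin: True if the bin is considered noise.

    z_rows: z scores of one class, one list per bin (row).
    noise_region: indices of bins where no peak exists (step 4, user-chosen).
    """
    # Step 4: largest relative standard deviation inside the peak-free region.
    cutoff = max(relative_sd(z_rows[i]) for i in noise_region)
    flags = []
    for row in z_rows:
        # Step 3 and step 5: all z scores below 0 AND relative SD below cutoff.
        # The second test only runs for all-negative rows, so the mean is nonzero.
        flags.append(max(row) < 0 and relative_sd(row) < cutoff)
    return flags


# Hypothetical z scores for five bins and three samples of one class.
z_rows = [
    [-0.50, -0.52, -0.48],  # peak-free region
    [-0.40, -0.60, -0.50],  # peak-free region, sets the cutoff
    [-0.51, -0.49, -0.50],  # all negative, low spread
    [-0.10, -0.90, -0.50],  # all negative, high spread
    [2.00, 2.50, 1.80],     # peak
]
flags = noise_bins(z_rows, noise_region=[0, 1])
```

Because the comparison is strict ("smaller than"), the bin that defines the cutoff is not itself flagged; use <= instead if that bin should also count as noise.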


Noise cutoff application

If the data set is prepared for PCA, only the regions that are noise across the whole data set can be removed. For a data set prepared for OPLS-DA, the noise regions determined for each class can be removed separately.
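
One way to read the PCA rule is that a bin is removed only if it was flagged as noise in every class. The sketch below follows that interpretation, which is an assumption. It takes per-class boolean flags (one per bin) and drops the bins flagged in all classes; for OPLS-DA the per-class flags would be used directly instead. The function names and example flags are hypothetical.

```python
def bins_to_remove_pca(noise_by_class):
    """noise_by_class maps class name -> list of booleans, one per bin.

    A bin is removed only if every class flags it as noise (assumed reading
    of "noise region across the whole data set").
    """
    flags = list(noise_by_class.values())
    return [all(column) for column in zip(*flags)]


def drop_bins(rows, remove):
    """Keep only the rows (bins) whose removal flag is False."""
    return [row for row, flagged in zip(rows, remove) if not flagged]


# Hypothetical flags for three bins in two classes.
noise_by_class = {
    "control": [True, True, False],
    "treated": [True, False, False],
}
remove = bins_to_remove_pca(noise_by_class)
kept = drop_bins([[0.1, 0.2], [1.5, 0.3], [5.0, 9.0]], remove)
```

Only the first bin is dropped in this example, because the second bin is noise in one class but not in the other.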