Noise removal for PCA

From Powers Wiki
[[Category:Protocols]]
 
[[Category:Metabolomics]]
[[category:Data_Processing_and_Analysis]]


Latest revision as of 06:22, 20 January 2022


Prepare the data set

  1. The data can be prepared as a txt file exported from the ACDLabs 1D processor.
  2. After the spectra have been correctly "Autophased" and "Referenced" to TMSP, click the "Integration" icon in the toolbar.
  3. Click "Series" in the menu and choose "Table of common integrals".
  4. Export the table to the target folder.
  5. Open the file in Microsoft Excel. Delete the first row and insert a new row below the row of sample numbers.
  6. Fill the new row with the sample class names.

Z score transformation

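Z score is used for normalizing the individual spectrum:

<math>Z=\frac{x_i-\overline x}{\sigma}</math>

Here <math>x_i</math> is the intensity of bin <math>i</math>, <math>\overline x</math> is the average of the spectrum, and <math>\sigma</math> is its standard deviation.

Scaling of the data set across all variables is performed in SIMCA-P+ (UV scaling is the default).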

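To enter the formula in Excel:

  1. Start a formula and click the data cell in the first row and first column of data.
  2. Type a minus sign.
  3. Click the average value for the first column.
  4. Put parentheses around these first two terms of the equation.
  5. Type a division sign.
  6. Click the standard deviation value for the first column.
  7. Add a dollar sign after the column letter in the standard deviation and average references (e.g. C$480), so that the row stays fixed when the formula is copied.
  8. Press Enter, then click and drag to fill the formula across the remaining cells.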
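
The Excel steps above can also be sketched in a few lines of Python, which may help when checking the spreadsheet result. This is a minimal sketch, not part of the original protocol: the function name `zscore_spectrum` and the example intensities are hypothetical, and it assumes the sample standard deviation (as Excel's STDEV computes) taken over all bins of one spectrum.

```python
from statistics import mean, stdev

def zscore_spectrum(intensities):
    """Return the z-scores of one spectrum (one column of the bin table)."""
    avg = mean(intensities)
    sd = stdev(intensities)  # assumption: sample standard deviation
    return [(x - avg) / sd for x in intensities]

# Hypothetical bin intensities for a single spectrum
spectrum = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z = zscore_spectrum(spectrum)
```

After the transformation each spectrum has an average of 0 and a standard deviation of 1, which is a quick way to verify the dragged Excel formula.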

Noise cutoff calculation

This is based on the Excel template that is exported directly from ACDLabs. The calculation is based on the z-score data set.

  1. For each row, calculate the standard deviation and the average across all samples.
  2. Calculate the absolute value of the relative standard deviation by dividing the standard deviation by the absolute value of the average.
  3. Find the average and standard deviation for the pre-assigned noise region of bins for each sample (chemical shift < 0 ppm or > 10 ppm). Calculate the cutoff as the average plus 3 times the standard deviation.
  4. A bin is considered a noise bin only when its z score is smaller than 0 AND its relative standard deviation is smaller than the noise cutoff. All bins identified as noise should be set to 0 and removed from the analysis data sets.
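
The four steps above can be sketched in Python as follows. This is only one possible reading of the protocol, not the original Excel template. Two points are assumptions: the cutoff is built from the relative standard deviation values of the noise-region bins, and "z score smaller than 0" is taken to mean the row average. The function name `find_noise_bins` and the example values are hypothetical.

```python
import math
from statistics import mean, stdev

def find_noise_bins(ppm, z_rows, low=0.0, high=10.0, k=3.0):
    """Flag noise bins in a z-scored table: one row per bin, one column per sample."""
    # Steps 1-2: per-row average and relative standard deviation (RSD)
    row_avg = [mean(row) for row in z_rows]
    row_rsd = [stdev(row) / abs(avg) if avg != 0 else math.inf
               for row, avg in zip(z_rows, row_avg)]
    # Step 3 (assumption): the cutoff is built from the RSD values of the
    # bins in the noise region (chemical shift < 0 ppm or > 10 ppm)
    noise_rsd = [r for p, r in zip(ppm, row_rsd)
                 if (p < low or p > high) and math.isfinite(r)]
    cutoff = mean(noise_rsd) + k * stdev(noise_rsd)
    # Step 4 (assumption): "z score smaller than 0" is read as the row average
    flags = [avg < 0 and rsd < cutoff for avg, rsd in zip(row_avg, row_rsd)]
    return flags, cutoff

# Hypothetical z-score table: six bins, three samples
ppm = [-0.5, -0.2, 10.5, 3.0, 1.2, 4.0]
z_rows = [
    [-0.50, -0.52, -0.48],
    [-0.40, -0.44, -0.36],
    [-0.50, -0.53, -0.47],
    [2.00, 3.00, 1.00],
    [-0.45, -0.47, -0.43],
    [-0.10, -0.90, 0.40],
]
flags, cutoff = find_noise_bins(ppm, z_rows)
# Keep only the bins that were not flagged as noise
kept = [row for row, is_noise in zip(z_rows, flags) if not is_noise]
```

The sketch needs at least two bins in the noise region, since a standard deviation cannot be computed from a single value.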

Noise cutoff application

If the data set is prepared for PCA, only the noise region determined across the whole data set can be removed. If the data set is prepared for OPLS-DA, the noise region determined for each class can be removed separately.
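
Determining the noise region per class requires the table to be split by the class names entered during data preparation. A minimal sketch of that split is shown here, assuming one row per bin and one column per sample. The function name `split_by_class` and the example labels are hypothetical.

```python
def split_by_class(class_names, rows):
    """Group the sample columns of a bin table by class name."""
    columns = {}
    for col, name in enumerate(class_names):
        columns.setdefault(name, []).append(col)
    # Build one sub-table per class, keeping every bin (row)
    return {name: [[row[c] for c in cols] for row in rows]
            for name, cols in columns.items()}

# Hypothetical table: two bins, three samples in two classes
class_names = ["control", "control", "treated"]
rows = [[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0]]
by_class = split_by_class(class_names, rows)
```

For PCA the noise calculation would run once on the full table, while for OPLS-DA it would run on each sub-table in turn.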


Noise Removal Script

  idx = findnearest (x, a)
  [Xrm, abrm] = rmnoise (X, ab, idx) 

Note: See the MVAPACK manual for a detailed description of noise removal.