# Noise removal for PCA

From BioNMR

## Prepare the data set

- The data can be prepared as a txt file from the ACDLabs 1D processor.
- After the spectra have been correctly "Autophased" and "Referenced" to TMSP, click the "Integration" icon in the toolbar.
- Click "Series" in the menu and choose "Table of common integrals".
- Export the table to the target folder.
- Open the file in Microsoft Excel. Delete the first row and insert a new row below the row of sample numbers.
- Fill the new row with the sample class names.

## Z score transformation

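The Z score is used to normalize each individual spectrum: the average of the spectrum is subtracted from every data point, and the result is divided by the standard deviation of the spectrum.

Scaling of the data set across all variables is performed in SIMCA-P+ (UV, i.e. unit variance, scaling is the default).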
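
Written as a formula, and assuming the notation $x_{ij}$ for the integral of bin $i$ in spectrum $j$ (these symbols are not part of the original page), the Z score of each data point is:

$$
z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}
$$

Here $\bar{x}_j$ and $s_j$ are the average and standard deviation of all bins in spectrum $j$.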

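To enter the formula in Excel:

- In an empty cell, start a formula with an equals sign and click the first data point (first row, first column of data).
- Add a minus sign.
- Click the average for that column (the first average data point).
- Put parentheses around these first two terms.
- Add a division sign.
- Click the standard deviation for that column (the first standard deviation data point).
- Add a dollar sign after the column letter in both the average and the standard deviation references (Ex: C$480), so that the row stays fixed when the formula is copied.
- Press Enter, then click and drag the cell to fill the formula across all columns and rows of data.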
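
The same calculation can be sketched outside Excel. The following is a minimal Python illustration, not part of the original protocol: the function name `zscore_per_spectrum` and the table layout (rows are bins, columns are spectra) are assumptions, and the sample standard deviation is used on the assumption that the Excel sheet uses `STDEV`.

```python
from statistics import mean, stdev

def zscore_per_spectrum(table):
    """Z-score every column (spectrum) of a table given as a list of rows (bins).

    Hypothetical helper: mirrors the Excel formula (value - average) / SD,
    where the average and SD are taken down each column.
    """
    columns = list(zip(*table))             # one tuple per spectrum
    averages = [mean(col) for col in columns]
    sds = [stdev(col) for col in columns]   # sample SD (n - 1), an assumption
    return [
        [(value - avg) / sd for value, avg, sd in zip(row, averages, sds)]
        for row in table
    ]

# Illustrative data: three bins (rows) measured in two spectra (columns).
example = [
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 30.0],
]
z_example = zscore_per_spectrum(example)
```

After the transformation, each column has an average of 0 and a standard deviation of 1, which is what the dragged Excel formula produces.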

## Noise cutoff calculation

This step uses the Excel template that is exported directly from ACDLabs. The calculation is performed on the z-score data set.

- For each row (bin), calculate the average and the standard deviation across all samples.
- Calculate the relative standard deviation of each row by dividing the standard deviation by the absolute value of the average.
- Find the average and standard deviation for the pre-assigned noise region of bins for each sample (chemical shift < 0 ppm or > 10 ppm). The cutoff equals the average plus 3 times the standard deviation.
- A bin is considered a noise bin only when its z score is smaller than 0 AND its relative standard deviation is smaller than the noise cutoff. All bins identified as noise should be set to 0 and removed from the analysis data set.
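
The steps above can be sketched in Python. This is only an illustration under two loudly stated assumptions that the page does not spell out: the cutoff is computed from the relative standard deviations of the noise-region bins, and the "z score smaller than 0" test is applied to the row average. The function name `flag_noise_bins` and the example numbers are hypothetical.

```python
from statistics import mean, stdev

def flag_noise_bins(z_rows, shifts_ppm, low_ppm=0.0, high_ppm=10.0):
    """Return one flag per bin (row); True marks a bin treated as noise.

    z_rows     -- list of rows, one per bin, holding the z scores of all samples
    shifts_ppm -- chemical shift of each bin, in the same order as z_rows
    """
    averages = [mean(row) for row in z_rows]
    sds = [stdev(row) for row in z_rows]
    # Relative standard deviation: SD divided by the absolute average.
    # An average of exactly 0 gives an infinite RSD, so such a bin is never flagged.
    rsd = [sd / abs(avg) if avg != 0 else float("inf")
           for sd, avg in zip(sds, averages)]

    # ASSUMPTION: the cutoff is built from the RSD values of the noise-region bins.
    noise_rsd = [r for r, ppm in zip(rsd, shifts_ppm)
                 if ppm < low_ppm or ppm > high_ppm]
    cutoff = mean(noise_rsd) + 3 * stdev(noise_rsd)

    # ASSUMPTION: "z score smaller than 0" is tested on the row average.
    return [avg < 0 and r < cutoff for avg, r in zip(averages, rsd)]

# Illustrative z scores: six bins (rows) in three samples (columns).
shifts = [-0.5, -0.2, 10.5, 3.0, 7.0, 5.0]
z = [
    [-1.0, -1.2, -0.8],   # noise region
    [-1.1, -0.9, -1.0],   # noise region
    [-0.9, -1.2, -0.9],   # noise region
    [2.0, 4.0, 0.0],      # positive average
    [-1.0, -1.1, -0.9],   # negative average, small RSD
    [-3.0, 1.0, -1.0],    # negative average, large RSD
]
flags = flag_noise_bins(z, shifts)
kept = [row for row, is_noise in zip(z, flags) if not is_noise]
```

Under a different reading of the cutoff (for example, one cutoff per sample), only the two lines marked ASSUMPTION would need to change.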

## Noise cutoff application

If the data set is prepared for PCA, only the noise region determined across the whole data set can be removed. If the data set is prepared for OPLS-DA, the noise region determined for each class can be removed separately.