Impact of Noise on Principal Component Analysis of NMR Data

Principal component analysis (PCA) is routinely applied to NMR-based metabolomic data. PCA is primarily used to detect relative changes in metabolite concentrations and thereby identify trends or characteristics within the NMR data that permit discrimination between samples that differ in their source or treatment. The data are generally presented as a two- or three-dimensional plot (scores plot) in which the coordinate axes correspond to the principal components (the directions of the two or three largest variations in the data set). Effectively, each NMR spectrum is reduced to a single point in the PC coordinate system, where similar spectra cluster together and variations along any of the PC axes highlight experimental differences between the spectra. A common concern with PCA of NMR data is the potential overemphasis of small changes in high-concentration metabolites, which can overshadow significant and large changes in low-concentration components and lead to a skewed or irrelevant clustering of the NMR data. We have identified an additional concern: very small, random fluctuations within the noise of the NMR spectrum can also result in large and irrelevant variations in the PCA clustering. Our analysis of 'ideal' metabolomic data (NMR spectra of ATP, ATP+glucose and glucose) indicates that including the noise may result in significant and irrelevant spreading of the PCA scores clusters, which may inhibit proper interpretation of the data. The problem is alleviated by simply excluding the noise region from the PCA through a judicious choice of a threshold above the spectral noise.
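The noise-exclusion step described above can be illustrated with a minimal Python sketch. It is not the authors' procedure: the synthetic Lorentzian spectra, the peak positions, the noise level, the threshold factor `k`, the signal-free region used to estimate the noise, and the helper names `noise_mask` and `pca_scores` are all assumptions made for illustration, and the PCA here uses mean-centering only because the abstract does not state which scaling was applied.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points = 1000
x = np.arange(n_points)

def lorentzian(center, width=3.0):
    """Unit-height Lorentzian line centered at `center` (in bins)."""
    return width**2 / ((x - center) ** 2 + width**2)

# Hypothetical peak positions; these are NOT real ATP or glucose shifts.
atp = lorentzian(300) + lorentzian(400)
glucose = lorentzian(600) + lorentzian(700)
templates = [atp, atp + glucose, glucose]

# Three noisy replicates per sample type (assumed Gaussian noise, SD 0.01).
noise_sd = 0.01
X = np.array([t + rng.normal(0.0, noise_sd, n_points)
              for t in templates for _ in range(3)])

def noise_mask(X, noise_region, k=5.0):
    """Keep bins whose |intensity| exceeds k * noise SD in any spectrum.

    The noise SD is estimated from `noise_region`, a range of bins
    assumed to contain no signal.
    """
    sigma = X[:, noise_region].std()
    return (np.abs(X) > k * sigma).any(axis=0)

def pca_scores(X, n_components=2):
    """Scores of mean-centered spectra on the leading principal components."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

mask = noise_mask(X, slice(0, 100))   # bins 0-99 assumed signal-free
scores_all = pca_scores(X)            # PCA on every bin, noise included
scores_thr = pca_scores(X[:, mask])   # PCA on above-threshold bins only
print(f"kept {mask.sum()} of {n_points} bins")
```

Plotting `scores_all` and `scores_thr` side by side is one way to inspect how much of the within-group spread comes from the noise bins. The threshold factor and the choice of scaling both influence the outcome and would need to be tuned for real spectra.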