2D-PCA/OPLS

From BioNMR
Jump to: navigation, search

PCA and OPLS can be applied to both 1D and 2D datasets using our in house software, mvapack.

An example script to perform both PCA and OPLS is included below. This script, hsqc.m has been verified using Octave 3.6.0. The % lines are comments, that are included to break up discrete sections and explain what is happening at each step.


% source mvapack. addpath('/opt/mvapack'); % source the parameters. if (fexist('parms.m'))

 source('parms.m');

else

 error('parameter file not found');

end % define a small function to check if any region of interest contains a % maximum intensity that is greater than a specified threshold. function y = worthiness (X, ab, thresh)

 if (ismatrix(X))
   y = any(max(X, [], 2) > thresh);
 elseif (iscell(X))
   y = any(cellfun(@(x) max(vec(x)) > thresh, X));
 end

end % check if the processed data has been saved. if (fexist('hsqc.dat.gz'))

 % load the preprocessed data.
 load('hsqc.dat.gz');

else

 % load the dataset.
 [F.data, F.parms, F.t] = loadnmr(F.dirs, F.doswap);
 % apply a cosine window function along 1H and 13C.
 F.data = apodize(F.data, F.parms, F.apod.fn, F.apod.opts);
 % zero fill once in 1H and twice in 13C.
 F.data = zerofill(F.data, F.parms, F.nzf);
 % fourier transform the time domain data.
 [S.data, S.ppm, S.hz] = nmrft(F.data, F.parms);
 % manually phase-correct the spectral data.
 S.data = phase(S.data, F.parms, S.phc0, S.phc1);
 % and apply the referencing values to the chemical shift axes.
 S.ppm{1} = refadj(S.ppm{1}, S.ref.old(1), S.ref.new(1));
 S.ppm{2} = refadj(S.ppm{2}, S.ref.old(2), S.ref.new(2));
 % extract the real portion of the spectral data.
 X.ppm = S.ppm;
 X.data = realnmr(S.data, F.parms);
 % compute the noise floor of the subspectral data.
 [X.noise.mu, X.noise.sigma] = estnoise(X.data, F.parms);
 X.noise.th = median(X.noise.mu + X.noise.k .* X.noise.sigma);
 % cut out a subspectrum from the complete spectral data.
 [X.data, X.ppm] = subspect(X.data, X.ppm, F.parms, ...
   [X.roi(1), X.roi(3)], [X.roi(2), X.roi(4)]);
 % grid the subspectral axes.
 [X.grid.x, X.grid.y] = meshgrid(X.ppm{1}, X.ppm{2});
 % build the contour level vector.
 X.grid.lev = logspace(log10(X.grid.cmin), log10(X.grid.cmax), X.grid.cnum);
 % compute the gridded intensity matrix.
 X.grid.z = zeros(size(X.data{1}));
 for idx = 1 : length(X.data)
   X.grid.z += X.data{idx};
 end
 clear idx
 % build an roi matrix of the entire dataset.
 B.all = [min(X.ppm{1}), max(X.ppm{1}), min(X.ppm{2}), max(X.ppm{2})];
 % uniformly bin the subspectral data.
 [junk, B.un.ppm, B.un.widths] = ...
   binunif(X.data, X.ppm, F.parms, B.wbin);
 B.un.roi = bin2roi(B.un.ppm, B.un.widths);
 % adaptively bin the subspectral data.
 [junk, B.ai.ppm, B.ai.widths] = ...
   binadapt(X.data, X.ppm, F.parms, B.wbin, B.res);
 B.ai.roi = bin2roi(B.ai.ppm, B.ai.widths);
 clear junk
 % remove all regions of interest below the noise threshold.
 % ... from the uniformly binned data ...
 keepers = roifun(X.data, X.ppm, F.parms, B.un.roi, ...
                  @(x, ab) worthiness(x, ab, X.noise.th));
 B.un.roi = B.un.roi(find(keepers), :);
 % ... and from the adaptively binned data ...
 keepers = roifun(X.data, X.ppm, F.parms, B.ai.roi, ...
                  @(x, ab) worthiness(x, ab, X.noise.th));
 B.ai.roi = B.ai.roi(find(keepers), :);
 clear keepers
 % re-bin the subspectral data using the noise-removed regions of interest.
 [B.un.data, B.un.ppm, B.un.widths] = ...
   binmanual(X.data, X.ppm, F.parms, B.un.roi);
 [B.ai.data, B.ai.ppm, B.ai.widths] = ...
   binmanual(X.data, X.ppm, F.parms, B.ai.roi);
 % build a data matrix from the regions of interest.
 D.ai.data = roi2data(X.data, X.ppm, F.parms, B.ai.roi);
 D.un.data = roi2data(X.data, X.ppm, F.parms, B.un.roi);
 D.ev.data = roi2data(X.data, X.ppm, F.parms, B.all);
 % remove outlying observations.
 if (!isempty(cls.rm))
   B.ai.data = rmobs(B.ai.data, cls.rm);
   B.un.data = rmobs(B.un.data, cls.rm);
   D.ai.data = rmobs(D.ai.data, cls.rm);
   D.un.data = rmobs(D.un.data, cls.rm);
   D.ev.data = rmobs(D.ev.data, cls.rm);
   cls.Y = rmobs(cls.Y, cls.rm);
 end
 % normalize the data matrices.
 B.ai.data = B.nrm(B.ai.data);
 B.un.data = B.nrm(B.un.data);
 D.ai.data = D.nrm(D.ai.data);
 D.un.data = D.nrm(D.un.data);
 D.ev.data = D.nrm(D.ev.data);
 % compute a pca model from the adaptively binned data.
 mdl.pca.ai = pca(B.ai.data, mdl.pca.scal);
 mdl.pca.ai = addclasses(mdl.pca.ai, cls.Y);
 mdl.pca.ai = addlabels(mdl.pca.ai, cls.lbl);
 % compute a pca model from the uniformly binned data.
 mdl.pca.un = pca(B.un.data, mdl.pca.scal);
 mdl.pca.un = addclasses(mdl.pca.un, cls.Y);
 mdl.pca.un = addlabels(mdl.pca.un, cls.lbl);
 % compute an opls model from the adaptively binned data.
 mdl.opls.bins.ai = opls(B.ai.data, cls.Y, mdl.opls.scal, ...
   mdl.opls.cv.sm, [1, 1]);
 mdl.opls.bins.ai = addlabels(mdl.opls.bins.ai, cls.lbl);
 % cross-validate the adaptively binned opls model.
 mdl.opls.bins.ai.cv.anova = cvanova(mdl.opls.bins.ai);
 mdl.opls.bins.ai.cv.perm = permtest(mdl.opls.bins.ai, mdl.opls.cv.nperm);
 mdl.opls.bins.ai.CV = [];
 % compute an opls model from the uniformly binned data.
 mdl.opls.bins.un = opls(B.un.data, cls.Y, mdl.opls.scal, ...
   mdl.opls.cv.sm, [1, 1]);
 mdl.opls.bins.un = addlabels(mdl.opls.bins.un, cls.lbl);
 % cross-validate the uniformly binned opls model.
 mdl.opls.bins.un.cv.anova = cvanova(mdl.opls.bins.un);
 mdl.opls.bins.un.cv.perm = permtest(mdl.opls.bins.un, mdl.opls.cv.nperm);
 mdl.opls.bins.un.CV = [];
 % compute an opls model from the adaptively compressed full-res data.
 mdl.opls.full.ai = opls(D.ai.data, cls.Y, mdl.opls.scal, ...
   mdl.opls.cv.sm, [1, 1]);
 mdl.opls.full.ai = addlabels(mdl.opls.full.ai, cls.lbl);
 % cross-validate the adaptively compressed full-res opls model.
 mdl.opls.full.ai.cv.anova = cvanova(mdl.opls.full.ai);
 mdl.opls.full.ai.cv.perm = permtest(mdl.opls.full.ai, mdl.opls.cv.nperm);
 mdl.opls.full.ai.CV = [];
 % compute an opls model from the uniformly compressed full-res data.
 mdl.opls.full.un = opls(D.un.data, cls.Y, mdl.opls.scal, ...
   mdl.opls.cv.sm, [1, 1]);
 mdl.opls.full.un = addlabels(mdl.opls.full.un, cls.lbl);
 % cross-validate the uniformly compressed full-res opls model.
 mdl.opls.full.un.cv.anova = cvanova(mdl.opls.full.un);
 mdl.opls.full.un.cv.perm = permtest(mdl.opls.full.un, mdl.opls.cv.nperm);
 mdl.opls.full.un.CV = [];
 % compute an opls model from the uncompressed full-res data.
 mdl.opls.full.ev = opls(D.ev.data, cls.Y, mdl.opls.scal, ...
   mdl.opls.cv.lg, [1, 1]);
 mdl.opls.full.ev = addlabels(mdl.opls.full.ev, cls.lbl);
 % cross-validate the uncompressed full-res opls model.
 mdl.opls.full.ev.cv.anova = cvanova(mdl.opls.full.ev);
 mdl.opls.full.ev.cv.perm = permtest(mdl.opls.full.ev, mdl.opls.cv.nperm);
 mdl.opls.full.ev.CV = [];
 % build backscaled full-res spectral matrices.
 D.back.ai = data2roi(backscale(mdl.opls.full.ai), X.ppm, F.parms, B.ai.roi);
 D.back.un = data2roi(backscale(mdl.opls.full.un), X.ppm, F.parms, B.un.roi);
 D.back.ev = data2roi(backscale(mdl.opls.full.ev), X.ppm, F.parms, B.all);
 % save the processed data.
 save('-binary', '-z', 'hsqc.dat.gz');

end

---


Generating Backscaled Loadings Plots for 2D Data

It is possible to generate backscaled loadings plots for 2-D (or higher) datasets, though the process can be slightly different depending on the dataset being considered. Scripts for doing so, with usage notes are available in /home/eriekeberg/research/Scripts/backscaleplots or by request.

Personal tools
Namespaces
Variants
Actions
Navigation
Toolbox