PCA Utilities
The PCA Utilities package provides small software routines for plotting PCA/OPLS scores and building dendrograms based on those scores. This page outlines how to install and use the pca-utils software.
Obtaining pca-utils
You can obtain the source code to pca-utils by clicking here.
Installing pca-utils
The PCA utilities are a set of open-source command-line programs for UNIX/Linux. The software is highly portable: provided your distribution has glibc, it should compile without incident. Once you have the source code, run these commands to install it:
cd /path/to/source/tarball
tar xf pca-utils-YYYYMMDD.tar.gz
cd pca-utils-YYYYMMDD/
make
sudo make install
By default, the programs install to /usr/bin, but you can easily change this by modifying the Makefile if you need to.
Plotting scores with ellipses
For an input list file called list.txt, you can quickly generate a PostScript plot file (in this case called plot.ps) that shows your PCA scores with a 95% confidence ellipse around each group:
pca-ellipses -1 44.4 -2 22.2 -i list.txt -o plot.ps
In the above command, the optional arguments -1 and -2 set the contributions of PC1 and PC2 to 44.4% and 22.2%, respectively. You can then edit plot.ps to your liking. If you need a bit more control over your output, you can generate gnuplot-readable ellipses instead, like so:
pca-ellipses -i list.txt > ellipses.txt
awk -F '\t' '/^[0-9]/{print$3,$4}' list.txt > points.txt
gnuplot> plot 'points.txt' w p, 'ellipses.txt' w l
Of course, in the second case, you're free to style everything any way you like. Happy hacking!
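The awk filter in the gnuplot workflow above keeps only the lines that start with a digit and prints their third and fourth tab-separated fields. The sketch below runs that same filter on a made-up file so you can see what ends up in points.txt. The file name sample_list.txt, its header row, and the meaning assigned to each column are assumptions made for illustration only; they are not a description of the actual list file format.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
tmpdir=$(mktemp -d)
list="$tmpdir/sample_list.txt"

# Hypothetical tab-separated list file (the column layout is an assumption):
# a header row, then rows that begin with a number.
printf 'Obs\tGroup\tPC1\tPC2\n'      >  "$list"
printf '1\tcontrol\t-1.25\t0.40\n'   >> "$list"
printf '2\ttreated\t2.10\t-0.75\n'   >> "$list"

# Same filter as in the text: skip lines that do not start with a digit
# (here, the header) and print fields 3 and 4 separated by a space.
points=$(awk -F '\t' '/^[0-9]/{print $3, $4}' "$list")
echo "$points"
```

With this sample input, the header row is dropped and each remaining row is reduced to a pair of space-separated values, which is the two-column layout that gnuplot's plot command reads by default.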
Generating dendrograms
Two complementary methods exist for generating trees. The first uses Euclidean distances and bootstrapping statistics, while the second uses Mahalanobis distances and p-values. For datasets containing well-separated groups in scores space, the bootstrapping method will do fine. However, separation in highly overlapped data may be better quantified with p-values in many cases.
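To make the difference between the two distance measures concrete, here is a minimal awk sketch that computes both the Euclidean and the Mahalanobis distance from a group's centroid to a test point in a two-dimensional scores space. It only illustrates the textbook definitions and is not a description of how pca-utils computes its trees. The file group_a.txt, the distances helper, and the sample scores are all hypothetical.

```shell
# Hypothetical PC1/PC2 scores for one group, tab-separated.
# The group is spread widely along PC1 and narrowly along PC2.
tmpdir=$(mktemp -d)
printf '0\t0\n4\t0\n0\t1\n4\t1\n' > "$tmpdir/group_a.txt"

# distances PX PY: print the Euclidean and Mahalanobis distances from the
# group centroid to the point (PX, PY), using the group's sample covariance.
distances() {
  awk -F '\t' -v px="$1" -v py="$2" '
    { x[NR] = $1; y[NR] = $2; sx += $1; sy += $2 }
    END {
      n = NR; mx = sx / n; my = sy / n          # centroid
      for (i = 1; i <= n; i++) {
        a += (x[i] - mx) * (x[i] - mx)          # PC1 sum of squares
        c += (y[i] - my) * (y[i] - my)          # PC2 sum of squares
        b += (x[i] - mx) * (y[i] - my)          # cross products
      }
      a /= n - 1; b /= n - 1; c /= n - 1        # sample covariance entries
      dx = px - mx; dy = py - my
      euclid = sqrt(dx * dx + dy * dy)
      # Quadratic form with the inverse of the 2x2 covariance [[a, b], [b, c]].
      mahal = sqrt((c * dx * dx - 2 * b * dx * dy + a * dy * dy) / (a * c - b * b))
      printf "%.3f %.3f\n", euclid, mahal
    }' "$tmpdir/group_a.txt"
}

along_pc1=$(distances 6 0.5)   # point displaced along PC1
along_pc2=$(distances 2 4.5)   # point displaced along PC2
echo "$along_pc1"
echo "$along_pc2"
```

Both test points lie 4 units from the centroid in Euclidean terms, but the Mahalanobis distance is about 1.732 for the point displaced along the wide PC1 direction and about 6.928 for the point displaced along the narrow PC2 direction. Scaling by the group's own spread in this way is what lets the Mahalanobis distance tell apart groups that overlap in plain Euclidean terms.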
Using bootstrapping
FIXME
Using parameterization
FIXME
Calculating p-values
FIXME
Calculating basic statistics
FIXME
Generating random datasets
FIXME