1H NMR Analysis (SIMCA): Difference between revisions

From Powers Wiki
Revision as of 04:34, 3 October 2012


1H NMR Analysis (SIMCA)

Excel Processing

Secondary observation ID labels can be added by inserting a blank second row and filling in the desired labels. This does NOT affect the raw data. It merely makes “viewing” easier in the SIMCA output.

Note:

Use the standard method to remove instrument noise.

For PLS-DA analysis, the y-variable (e.g., the paralysis score for mouse urine metabolomics) values are placed in a row below the row containing the last NMR integral bucket value. When the spreadsheet is transposed in SIMCA, this last row becomes the last column.
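The effect of the transposition can be illustrated outside SIMCA. The Python fragment below is only a sketch, not SIMCA's own procedure; the table contents and the names (table, transpose, paralysis_score) are hypothetical and chosen for illustration.

```python
# Hypothetical integral table laid out as in the spreadsheet:
# one row per bucket, one column per spectrum, y-values in the last row.
table = [
    ["bucket_1", 0.12, 0.15, 0.11],
    ["bucket_2", 0.40, 0.38, 0.45],
    ["paralysis_score", 0, 2, 4],
]


def transpose(rows):
    """Swap rows and columns of a rectangular list of lists."""
    return [list(column) for column in zip(*rows)]


transposed = transpose(table)
header = transposed[0]         # variable names; the y-variable is now last
observations = transposed[1:]  # one row per spectrum (observation)

print(header)
for row in observations:
    print(row)
```

After transposing, each printed row is one spectrum, and the paralysis score sits in the last column of every row.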

After noise removal, the data should be autoscaled. By default, SIMCA-P applies unit variance ("UV") scaling to the imported data set.
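For readers who want to see what unit variance scaling does to a table, here is a minimal Python sketch. It is not SIMCA's implementation: the function name uv_scale and the example numbers are hypothetical, and the use of the sample standard deviation (n - 1) is an assumption.

```python
from statistics import mean, stdev


def uv_scale(observations):
    """Mean-center each column (bin) and divide by its standard deviation.

    Rows are observations (spectra); columns are variables (bins).
    """
    columns = list(zip(*observations))
    scaled_columns = []
    for column in columns:
        center = mean(column)
        spread = stdev(column)  # sample standard deviation (assumption)
        if spread == 0:
            # A constant bin carries no variance; return zeros for it.
            scaled_columns.append([0.0 for _ in column])
        else:
            scaled_columns.append([(v - center) / spread for v in column])
    # Turn the scaled columns back into rows.
    return [list(row) for row in zip(*scaled_columns)]


# Hypothetical data: three spectra, two bins on very different scales.
data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
scaled = uv_scale(data)
print(scaled)
```

Both bins end up on the same scale after this step, so a large peak no longer dominates a small one simply because of its magnitude.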

PCA Analysis

1. Start a “new” project by opening the desired integral table spreadsheet

2. Click on the new project icon

3. Select the desired Excel file

4. Click “Open”

5. Click on the comma option. The data formatting should now look correct

6. Click “OK”

7. Project type should be SIMCA-P Project

8. Click the “Next” button

9. Respond “No” to the question regarding the one row that is “empty or contains only text”. It is probably the row of labels that was inserted using Excel.

10. Use the “Commands” button (bottom left) to “transpose” the data set. The rows should now correspond to a given NMR spectrum. Each row is an “observation”. The integral values contained in each row are referred to as “variables”.

11. After transposing, make sure that the first column has been identified as the “primary observation IDs”. To do this, click on the button at the top of the column (note: it may already be labeled as primary) and then click on the “Observation IDs primary” button. The observation IDs should now be color coded with the observation ID primary color (dark yellow or mustard color). If secondary observation IDs were inserted using Excel, then click on the second column button and then click on “Observation IDs secondary”. Again, the column should be color coded to the correct color (light yellow).

12. Click on the button for the first row (note: it may already be labeled as primary) and then click on the “Variable IDs primary” button to color code the first row (green). Next, repeat for the second row containing the ppm ranges that define each bucket. These will be the “Variable IDs secondary” and are turquoise. Click the “Next” button in the lower-right.

13. Click the “Finish” button

14. Exclude the solvent region and “ends” of the spectra

15. Highlight the desired rows by dragging the cursor along the top set of buttons and then click the “exclude” button (along the left edge)

16. Repeat for each desired region

17. Click “Next” then click “Finish”

18. Using the menu bar, click on the “autofit” button. This should calculate the first and second principal components. Additional components can be calculated using the “Calculate next component” button.

19. To view the results, click on the “Create four overview plots” button. This produces the “Score Scatter” plot in the upper-left hand corner. The lower-left hand corner contains the “Loading Scatter” plot.

20. Expand the score scatter plot for better viewing.

21. Click on a data point

22. Right-click and choose “Properties”

23. Choose the color tab and choose coloring type by “identifiers”. The default then is to color by secondary observation IDs. This uses the labels inserted using Excel. If desired, change the default colors.

24. Click “Apply”.

25. Click “OK”.
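The exclusion in steps 14 to 16 amounts to dropping every bin whose ppm position falls inside an unwanted region. The Python sketch below illustrates that idea only. It is not what SIMCA does internally, and the bin centers, the excluded ranges and the function name keep_bin are hypothetical values picked for the example.

```python
def keep_bin(ppm, excluded_regions):
    """Return True if a bin center lies outside every excluded region."""
    return not any(low <= ppm <= high for low, high in excluded_regions)


# Hypothetical bin centers in ppm, one per variable.
bin_centers = [0.2, 1.3, 2.5, 4.8, 7.1, 9.8]

# Hypothetical regions to exclude; adjust these to the actual spectra.
excluded = [
    (4.6, 5.0),    # assumed residual solvent region
    (-1.0, 0.5),   # assumed empty region at one end of the spectrum
    (9.5, 12.0),   # assumed empty region at the other end
]

kept_indices = [i for i, ppm in enumerate(bin_centers) if keep_bin(ppm, excluded)]

# Apply the same selection to one hypothetical spectrum (one observation).
spectrum = [0.01, 0.40, 0.22, 3.10, 0.15, 0.02]
filtered = [spectrum[i] for i in kept_indices]

print(kept_indices)
print(filtered)
```

The same list of kept indices would be applied to every spectrum so that all observations retain identical variables.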

PLS Analysis

Note: 1. The data are prepared in an Excel file as described above.

2. The EXCEL file is opened in SIMCA as described above for PCA analysis.

3. The opened file should then be transposed. Again, the commands button found in the lower left provides access to this command.

4. The label information for observations and variables should be processed as described for PCA analysis above.

5. The difference between the PCA approach and the PLS approach occurs with labeling of the variables (i.e., the bucketed intensities and the discriminator values).

6. The region containing the bucketed intensities is highlighted. These values are labeled as “x-variables” by clicking on the VARIABLE button in the left hand frame and choosing x-variables.

7. Next, the discriminator values (e.g., the single column of paralysis scores in mouse urine metabolomics) are highlighted. These values are labeled as “y-variables” by clicking on the VARIABLES button and choosing “y-variables”. The column of y-variables will not have a variable ID number in Row 1 because that row was not created in the ACD processing. Incrementing the variable number and typing the value into the cell will save SIMCA from asking about that issue. Also, SIMCA does not like a mix of text and numerical values in the “y-variables” column. I discovered this when I used simply “0, 1, 2, 3, 4” instead of “EAE-0, EAE-1,…,EAE-4”. One response is to write in the text value using the missing value box in the left hand panel (it is below the “exclude data button”). This approach preserves the numerical content for PLS. Another answer is to use all text or all numerical values, but not a mix.

8. Selected data may still be excluded prior to the statistical analysis.

9. The dataset is then “fit”; additional components can be added or subtracted, and the results are visualized using the same commands as for the PCA analysis described above.
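The separation of x-variables from the y-variable, and the problem of mixed text and numerical y-values noted in step 7, can be sketched in Python as follows. This is an illustration under stated assumptions, not a description of how SIMCA handles the data: the function names split_xy and score_to_number and the example rows are hypothetical, and the label format “EAE-n” is taken from the example in step 7.

```python
def split_xy(observations):
    """Split each row into x-values (all but last) and a y-value (last)."""
    x = [row[:-1] for row in observations]
    y = [row[-1] for row in observations]
    return x, y


def score_to_number(label):
    """Convert a label such as 'EAE-3', or a plain number, to a float."""
    if isinstance(label, (int, float)):
        return float(label)
    # Take the text after the last hyphen, e.g. 'EAE-3' -> '3'.
    return float(label.rsplit("-", 1)[-1])


# Hypothetical transposed rows: two bins followed by a paralysis score,
# with a deliberate mix of text and numerical y-values.
rows = [
    [0.12, 0.40, "EAE-0"],
    [0.15, 0.38, 2],
    [0.11, 0.45, "EAE-4"],
]

x_block, y_raw = split_xy(rows)
y_numeric = [score_to_number(value) for value in y_raw]

print(x_block)
print(y_numeric)
```

Converting every y-value to one type before import avoids the mixed column described in step 7.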

OPLS-DA Analysis

1. This approach applies orthogonal signal correction prior to PLS analysis. I need to do more reading… But I believe that I have the basic command sequence needed to explore the approach using SIMCA-P+ Version 12.0.

2. Start from an Excel file that has NOT been noise corrected and NOT been autoscaled.

3. The data file should be first analyzed using either PCA or PLS as described above.

4. The analysis is reportedly hindered by extreme outliers found in PCA 2D scores plot (where did I read this…?...the Umetrics manual maybe…). So, any outliers should be removed prior to OPLS analysis.

5. Open the Excel file in SIMCA, go to the Dataset pull-down menu. Choose Spectral Filters. In the available column, scroll down and select OSC. Press the button labeled as => to move OSC into the selected column. Click OK.

6. The OSC panel should appear. Refer to your NMR spectral data to identify bins that contain strong peaks for metabolites (e.g., citrate peaks in mouse urine spectra). Highlight several of these bins (maybe 5-10). Click on Y to change the state of these to Y values. Click on the “Next >” button. There may be a message regarding exclusion of variables with no variance.

7. The result, both in terms of the scores plot and the plots showing the difference between PC score points, depends on which peaks are assigned as the Y values.

8. A new OSC panel will appear. There should be a table with columns labeled No, Angle in Degrees, Remaining SS in % and Eigenvalue. Click on the “next component” button. Generally, two components are recommended by Umetrics. Click on the Next button.

9. Check the destination folder and file name. Click on the “Finish” button.

10. Read and close the OSC message box.

11. The current model will probably say “PLS <unfitted>” in the Type column. Fit the data using the usual commands for autofit and plot visualization

12. Go to the Analysis pull-down menu and select Change Model Type. From the list, choose OPLS/O2PLS. Then go to the Analysis pull-down menu again and select Autofit (…or just use the Autofit button on the toolbar).