Generating Protein Structure Statistics Table
From Powers Wiki
Revision as of 05:12, 20 January 2022 by Mjeppesen
Definitions
- <SA> is the ensemble of the 30 "best" simulated annealing structures
- SA is the average structure
- (SA)r is the restrained minimized average structure
Experimental: Python table calculator
Run the following commands if you want to try the automagic structure statistics table calculator:
cd /path/to/my/structure/calculation/files
/home/bworley/research/struct-stats/pull
vi options.py
./Table.py
cat table.txt
Editing options.py is the most critical step. Every file the calculator needs must be in the current working directory and must be reachable under the filename given for it in options.py.
Use at your own risk. If it doesn't work, it doesn't work. If it does work, it will complete all tasks described below except the PROCHECK analysis.
Manual Method: Use a set of old scripts
All script files required by this method are stored in /home/PROGRAMS/XPLOR_FILES/analysis.
RMS Deviations from Experimental Distance Restraints
- The NOE.cor file must be split into separate files, one for each constraint class:
- interresidue sequential NOEs (|i-j|=1)
- use awk script: seq_sel.awk
awk -f seq_sel.awk NOE.cor > seq.cor
- the total number of constraints in the seq.cor file will be listed at the end
- interresidue short range (1 <|i-j| < 5)
- use awk script: short_sel.awk
awk -f short_sel.awk NOE.cor > short.cor
- the total number of constraints in the short.cor file will be listed at the end
- interresidue long range (|i-j| > 5)
- use awk script: long_sel.awk
awk -f long_sel.awk NOE.cor > long.cor
- the total number of constraints in the long.cor file will be listed at the end
- intraresidue
- use awk script: intra_sel.awk
awk -f intra_sel.awk NOE.cor > intra.cor
- the total number of constraints in the intra.cor file will be listed at the end
- H-bonds
- These constraints must be collated and counted by hand into the file h-bonds.cor
- They should all be located in one region (the end) of the original NOE.cor file
- Edit energy.inp and energy_ave.inp to include the filenames of your PSF file and of your structures: the 30 best structures in energy.inp and the restrained minimized average structure in energy_ave.inp.
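- Run XPLOR and parse the output file (each xplor command below runs in the background, so wait for it to finish before running stat.csh):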
- xplor < energy_ave.inp >& energy.out &
- csh stat.csh > stat_ave.out
- cp energy.out energy_ave.out
- xplor < energy.inp >& energy.out &
- csh stat.csh > stat.out
- The files stat.out and stat_ave.out contain the rms deviations from experimental distance restraints for the table.
- A typical stat.out file will look like:
RMS_NOE- Column 6 mean: 0.0410533 Std Dev 0.00483168 (N= 30 )
RMS_CDIH- Column 6 mean: 1.43315 Std Dev 0.794072 (N= 30 )
RMS_BONDS (1993)- Column 6 mean: 0.00519457 Std Dev 0.000252635 (N= 30 )
RMS_ANGLES (3655)- Column 6 mean: 0.621244 Std Dev 0.030363 (N= 30 )
RMS_IMPROPERS (972)- Column 6 mean: 0.518672 Std Dev 0.0634619 (N= 30 )
RMS_NOE_H-BONDS- Column 7 mean: 0.0958333 Std Dev 0.0133618 (N= 30 )
RMS_NOE_INTRA- Column 7 mean: 0.0075 Std Dev 0.00455887 (N= 30 )
RMS_NOE_LONG- Column 7 mean: 0.0451 Std Dev 0.00898647 (N= 30 )
RMS_NOE_SEQ- Column 7 mean: 0.0378333 Std Dev 0.00800451 (N= 30 )
RMS_NOE_SHORT- Column 7 mean: 0.0427 Std Dev 0.00536128 (N= 30 )
ENERGY_BOND- Column 7 mean: 53.9053 Std Dev 5.35852 (N= 60 )
ENERGY_ANGLE- Column 9 mean: 215.364 Std Dev 21.6272 (N= 60 )
ENERGY_IMPR- Column 5 mean: 40.4231 Std Dev 10.4815 (N= 30 )
ENERGY_REP- Column 8 mean: 107.973 Std Dev 23.6387 (N= 30 )
ENERGY_LJ- Column 8 mean: -374.082 Std Dev 21.1904 (N= 30 )
ENERGY_CDIH- Column 5 mean: 35.9804 Std Dev 45.4492 (N= 30 )
ENERGY_NOE- Column 8 mean: 234.949 Std Dev 57.3573 (N= 30 )
- RMS deviations from experimental distance restraints, from the stat.out file:
- all
RMS_NOE- Column 6 mean: 0.0410533 Std Dev 0.00483168 (N= 30 )
- interresidue sequential (|i-j| = 1)
RMS_NOE_SEQ- Column 7 mean: 0.0378333 Std Dev 0.00800451 (N= 30 )
- interresidue long-range (|i-j| > 5)
RMS_NOE_LONG- Column 7 mean: 0.0451 Std Dev 0.00898647 (N= 30 )
- intraresidue
RMS_NOE_INTRA- Column 7 mean: 0.0075 Std Dev 0.00455887 (N= 30 )
- H-bonds
RMS_NOE_H-BONDS- Column 7 mean: 0.0958333 Std Dev 0.0133618 (N= 30 )
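To make the class split concrete, here is a minimal Python sketch of the sequence-separation rule. It is not the contents of the *_sel.awk scripts: it assumes each restraint in NOE.cor is a one-line XPLOR assign statement whose first two selections contain "resid <number>", and the names classify and split_restraints are hypothetical. Note that |i-j| = 5 falls in neither the short-range nor the long-range class as defined above, so the sketch reports it separately instead of guessing.

```python
import re

# Hypothetical helper, not the *_sel.awk scripts themselves. Assumes each
# restraint is a one-line XPLOR "assign" statement and that the first two
# "resid <number>" selections name residues i and j.
RESID = re.compile(r"resid\s+(\d+)", re.IGNORECASE)


def classify(separation):
    """Map a sequence separation |i-j| to the restraint classes defined on this page."""
    if separation == 0:
        return "intra"
    if separation == 1:
        return "seq"
    if separation < 5:
        return "short"
    if separation > 5:
        return "long"
    # |i-j| = 5 is not covered by the class definitions; check the awk scripts.
    return "unassigned"


def split_restraints(lines):
    """Group assign lines by class; lines without two resid selections are skipped."""
    groups = {}
    for line in lines:
        if not line.lstrip().lower().startswith("assign"):
            continue
        resids = [int(r) for r in RESID.findall(line)]
        if len(resids) < 2:
            continue
        label = classify(abs(resids[0] - resids[1]))
        groups.setdefault(label, []).append(line)
    return groups


if __name__ == "__main__":
    # Made-up restraints for illustration.
    example = [
        "assign (resid 5 and name HA) (resid 6 and name HN) 2.5 0.7 0.8",
        "assign (resid 5 and name HA) (resid 5 and name HB1) 2.5 0.7 0.8",
        "assign (resid 3 and name HA) (resid 20 and name HN) 4.0 2.2 1.5",
    ]
    for label, group in split_restraints(example).items():
        print(label, len(group))
```

Ambiguous restraints that list more than two residues are simplified here to the first two selections; the H-bond restraints still have to be collated by hand as described above.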
RMS Deviation from Experimental Dihedral Restraints (deg)
- Include the appropriate dihedral constraint file (dihed.tbl) in the energy.inp file used above.
- The value is reported in the stat.out file:
RMS_CDIH- Column 6 mean: 1.43315 Std Dev 0.794072 (N= 30 )
- To count the number of dihedral restraints used to refine the structure, use the awk script cnt_ang.nawk
awk -f cnt_ang.nawk dihed.tbl
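If the counting script is not at hand, the count can be approximated with a few lines of Python. This is a hypothetical stand-in, not a copy of cnt_ang.nawk: it assumes every restraint begins with the XPLOR keyword assign at the start of a line and that "!" starts a comment. The same assumption would apply to carbon.tbl and coupling.tbl.

```python
import re

# Hypothetical re-implementation for illustration only; the cnt_*.nawk scripts
# may count differently. Assumes every restraint starts with the keyword
# "assign" at the beginning of a line and that "!" starts a comment.
ASSIGN = re.compile(r"^\s*assign\b", re.IGNORECASE)


def count_restraints(text):
    """Count uncommented assign statements in a restraint table such as dihed.tbl."""
    count = 0
    for line in text.splitlines():
        code = line.split("!", 1)[0]  # ignore anything after a comment marker
        if ASSIGN.match(code):
            count += 1
    return count


if __name__ == "__main__":
    table = """! made-up dihedral restraints for illustration
assign (resid 2 and name C) (resid 3 and name N)
       (resid 3 and name CA) (resid 3 and name C) 1.0 -60.0 30.0 2
ASSIGN (resid 3 and name C) (resid 4 and name N)
       (resid 4 and name CA) (resid 4 and name C) 1.0 -60.0 30.0 2
! assign (resid 9 and name C) (resid 10 and name N) is commented out
"""
    print(count_restraints(table))  # 2
```

Multi-line restraints are counted once because only the line carrying assign is matched.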
RMS Deviation from Experimental Cα Restraints (ppm)
- Use the CARB_AVE.nawk awk script to parse the best 30 simulated annealing structures
awk -f CARB_AVE.nawk *dg*.sam
- *dg*.sam is the wild-card representation of the list of 30 simulated annealing structures
- Adjust the wild card so that it matches exactly your 30 best structures
- Typical output of CARB_AVE.nawk
Average CA RMS: 0.885654 +/- 0.0299637 For 30 values.
Average CB RMS: 0.905546 +/- 0.0229673 For 30 values.
Average J RMS: 0.606855 +/- 0.04947 For 30 values.
- To count the number of Cα restraints used to refine the structure, use the awk script cnt_carbon.nawk
awk -f cnt_carbon.nawk carbon.tbl
- The same approach can be repeated to determine the RMS deviation from experimental Cβ restraints (ppm)
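The per-structure "CA RMS" that CARB_AVE.nawk averages is, in the usual definition of an RMS deviation from restraints, the root-mean-square difference between back-calculated and experimental shifts. The Python sketch below shows that definition together with the ensemble mean +/- standard deviation. It assumes you already have matched lists of values; the function names are hypothetical, and whether CARB_AVE.nawk uses the sample or the population standard deviation is not stated on this page, so the sample form used here is an assumption.

```python
import math
import statistics


def rms_deviation(calculated, observed):
    """RMS difference between back-calculated and experimental values for one structure."""
    if len(calculated) != len(observed) or not calculated:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(c - o) ** 2 for c, o in zip(calculated, observed)]
    return math.sqrt(sum(squared) / len(squared))


def ensemble_summary(per_structure):
    """Mean and sample standard deviation over the ensemble (e.g. the 30 best structures)."""
    return statistics.mean(per_structure), statistics.stdev(per_structure)


if __name__ == "__main__":
    # Made-up Cα shifts (ppm) for three residues of one structure.
    print(rms_deviation([58.0, 61.0, 55.0], [57.0, 62.0, 55.0]))
    # Made-up per-structure RMS values for a three-member ensemble.
    mean, sd = ensemble_summary([0.8, 0.9, 1.0])
    print(f"{mean:.3f} +/- {sd:.3f}")
```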
RMS Deviation from ³JNHα Restraints (Hz)
- Same approach as for RMS Deviation from Experimental Cα Restraints (ppm); the value is reported as "Average J RMS" in the CARB_AVE.nawk output
- To count the number of coupling restraints used to refine the structure, use the awk script cnt_coupling.nawk
awk -f cnt_coupling.nawk coupling.tbl
Structure Energies
- FNOE (kcal mol-1)
- The value is reported in the stat.out file (see above):
ENERGY_NOE- Column 8 mean: 234.949 Std Dev 57.3573 (N= 30 )
- Ftor (kcal mol-1)
- The value is reported in the stat.out file (see above):
ENERGY_CDIH- Column 5 mean: 35.9804 Std Dev 45.4492 (N= 30 )
- Frepel (kcal mol-1)
- The value is reported in the stat.out file (see above):
ENERGY_REP- Column 8 mean: 107.973 Std Dev 23.6387 (N= 30 )
- FL-J (kcal mol-1)
- The value is reported in the stat.out file (see above):
ENERGY_LJ- Column 8 mean: -374.082 Std Dev 21.1904 (N= 30 )
Deviations from Idealized Covalent Geometry
- bonds (Å)
- The value is reported in the stat.out file (see above):
RMS_BONDS (1993)- Column 6 mean: 0.00519457 Std Dev 0.000252635 (N= 30 )
- The number of bonds is the value in parentheses
- 1993 in the example above
- angles (deg)
- The value is reported in the stat.out file (see above):
RMS_ANGLES (3655)- Column 6 mean: 0.621244 Std Dev 0.030363 (N= 30 )
- The number of angles is the value in parentheses
- 3655 in the example above
- impropers (deg)
- The value is reported in the stat.out file (see above):
RMS_IMPROPERS (972)- Column 6 mean: 0.518672 Std Dev 0.0634619 (N= 30 )
- The number of improper dihedral angles is the value in parentheses
- 972 in the example
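Reading the stat.out values off by hand is error-prone, so here is a minimal Python sketch that parses lines in the stat.out format shown on this page. It also captures the optional parenthesized count on the RMS_BONDS, RMS_ANGLES and RMS_IMPROPERS lines and works the same way for the ENERGY_* lines used in the Structure Energies section. The regular expression is written against the example output above only; other stat.csh versions may format lines differently, and parse_stat and table_entry are hypothetical names. The "mean ± sd" layout and the number of decimals are presentation choices, not something stat.csh prescribes.

```python
import re

# Matches stat.out lines such as
#   RMS_BONDS (1993)- Column 6 mean: 0.00519457 Std Dev 0.000252635 (N= 30 )
#   ENERGY_NOE- Column 8 mean: 234.949 Std Dev 57.3573 (N= 30 )
# The optional "(1993)" is the number of bonds, angles or impropers.
STAT_LINE = re.compile(
    r"^(?P<label>\S+)\s*(?:\((?P<count>\d+)\))?-\s*Column\s+\d+\s+"
    r"mean:\s*(?P<mean>\S+)\s+Std Dev\s+(?P<sd>\S+)\s+\(N=\s*(?P<n>\d+)\s*\)"
)


def parse_stat(text):
    """Return {label: {"mean", "sd", "n", "count"}} for every recognised line."""
    results = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that are not statistics lines
        count = match.group("count")
        results[match.group("label")] = {
            "mean": float(match.group("mean")),
            "sd": float(match.group("sd")),
            "n": int(match.group("n")),
            "count": int(count) if count is not None else None,
        }
    return results


def table_entry(stat, digits=3):
    """Format one value as 'mean ± sd' for the statistics table."""
    return f"{stat['mean']:.{digits}f} ± {stat['sd']:.{digits}f}"


if __name__ == "__main__":
    sample = """RMS_BONDS (1993)- Column 6 mean: 0.00519457 Std Dev 0.000252635 (N= 30 )
ENERGY_NOE- Column 8 mean: 234.949 Std Dev 57.3573 (N= 30 )
ENERGY_LJ- Column 8 mean: -374.082 Std Dev 21.1904 (N= 30 )"""
    stats = parse_stat(sample)
    print(stats["RMS_BONDS"]["count"])       # 1993
    print(table_entry(stats["ENERGY_NOE"], 1))  # 234.9 ± 57.4
    print(table_entry(stats["ENERGY_LJ"], 1))   # -374.1 ± 21.2
```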
PROCHECK
- Run PROCHECK on each individual structure file from the list of 30 Best Structures
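procheck structure filename 1.0
- Replace structure filename with each structure file; 1.0 is the resolution argument passed to PROCHECK
- Calculate averages and standard deviations using awk script PRO_AVE.nawk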
awk -f PRO_AVE.nawk *dg*.sum
- *dg*.sum is the list of procheck output *.sum files for each of the 30 Best Structures
- Make sure the wild-card form only selects for the correct 30 structures
- Typical output:
Average rama RMS: 83.5167 +/- 2.5297 For 30 values.
Average G-factors RMS: -0.107333 +/- 0.0287441 For 30 values.
Average bad contacts RMS: 17.5 +/- 2.53969 For 30 values.
- Overall G-factor:
Average G-factors RMS: -0.107333 +/- 0.0287441 For 30 values.
- % residues in most favored regions of the Ramachandran plot
Average rama RMS: 83.5167 +/- 2.5297 For 30 values.
- Number of bad contacts/100 residues
Average bad contacts RMS: 17.5 +/- 2.53969 For 30 values.
- This number must be scaled to 100 residues: multiply by 100/(number of residues).
- Calculate averages and standard deviations using awk script PRO_AVE_2.nawk
awk -f PRO_AVE_2.nawk *dg*.out
- *dg*.out is the list of procheck output *.out files for each of the 30 Best Structures
- Make sure the wild-card form only selects for the correct 30 structures
- Typical output:
Average hbond RMS: 1.04333 +/- 0.0558768 For 30 values.
- The above result is the H-bond energy
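The bad-contact scaling is a single multiplication, sketched here in Python for clarity. Because it is a constant factor, the standard deviation scales by the same factor as the mean. The function name is hypothetical, and the 70-residue protein size in the example is made up; use the number of residues in your own structure.

```python
def per_100_residues(value, n_residues):
    """Scale a whole-structure count (e.g. PROCHECK bad contacts) to 100 residues."""
    if n_residues <= 0:
        raise ValueError("n_residues must be positive")
    return value * 100.0 / n_residues


if __name__ == "__main__":
    # 17.5 +/- 2.53969 bad contacts is the PRO_AVE.nawk output shown above;
    # the 70-residue protein size is a made-up example.
    mean = per_100_residues(17.5, 70)
    sd = per_100_residues(2.53969, 70)
    print(f"{mean:.1f} +/- {sd:.1f} bad contacts per 100 residues")  # 25.0 +/- 3.6
```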
Atomic RMS Differences
- Use the rmsdiff.inp and rmsdiff_sec.inp XPLOR scripts
- Run the calculations twice: once with the average structure SA as the reference structure and once with the restrained minimized average structure (SA)r as the reference structure
- Set the reference structure with evaluate ($1 = "reference structure name")
- Include the 30 best structures, the average structure and the restrained minimized average structure names in the for loop
- Exclude regions of poorly defined structure in the select statements for both the fit and rms calculations
- Only include regions of α-helix and/or β-sheets in the rmsdiff_sec.inp for both the fit and rms calculations
- Use the awk script average.nawk to calculate the average and standard deviation of the 30 Best Structures
awk -f average.nawk back_all.rms
- rmsd for backbone atoms
- back_sec.rms is the corresponding output file for rmsdiff_sec.inp
- Make sure only the 30 Best Structures are listed in the file
- Delete the results for the average and restrained minimized average structures from back_all.rms
- Make a backup first
- Record (SA)r vs SA rmsd values before deleting
- Typical output:
Average RMS: 0.408533 +/- 0.056732 For 30 values.
- Use average.nawk awk script
awk -f average.nawk all_all.rms
- rmsd for all atoms
- all_sec.rms is the corresponding output file for rmsdiff_sec.inp
- Make sure only the 30 Best Structures are listed in the file
- Delete the results for the average and restrained minimized average structures from all_all.rms
- Make a backup first
- Record (SA)r vs SA rmsd values before deleting
- Typical output:
Average RMS: 1.00183 +/- 0.0711051 For 30 values.
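As an alternative to deleting lines from back_all.rms or all_all.rms, the reference rows can simply be set aside while averaging. The Python sketch below does that. The file layout it expects, one "<structure name> <rmsd>" pair per line, is a hypothetical assumption (check what rmsdiff.inp actually writes), as are the structure names in the example. Like the other sketches it uses the sample standard deviation, which may differ from what average.nawk reports.

```python
import statistics


def ensemble_rmsd(lines, references):
    """Average ensemble RMSD values while setting the reference rows aside.

    Hypothetical file layout: one "<structure name> <rmsd>" pair per line.
    Returns (mean, sample SD, number of values, {reference name: rmsd}).
    """
    values = []
    reference_values = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, rmsd = fields[0], float(fields[1])
        if name in references:
            reference_values[name] = rmsd  # e.g. the (SA)r vs SA value
        else:
            values.append(rmsd)
    return statistics.mean(values), statistics.stdev(values), len(values), reference_values


if __name__ == "__main__":
    # Made-up rows; sa_ave and sa_min stand in for SA and (SA)r.
    rows = ["sa_01 0.41", "sa_02 0.35", "sa_03 0.47", "sa_ave 0.20", "sa_min 0.25"]
    mean, sd, n, refs = ensemble_rmsd(rows, {"sa_ave", "sa_min"})
    print(f"Average RMS: {mean:.3f} +/- {sd:.3f} For {n} values.")
    print(refs)
```

Because the file is only read, the original .rms file is never modified, and the reference rows are returned so the (SA)r vs SA values can still be recorded.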