Generating Protein Structure Statistics Table
From Powers Wiki
Definitions
- <SA> is an ensemble of the "best" 30 simulated annealing structures
- SA is the average structure (usually typeset with an overbar in published tables)
- (SA)r is the restrained minimized average structure
Location of XPLOR, awk and csh Files
/PROGRAMS/XPLOR_FILES/analysis
RMS Deviations from Experimental Distance Restraints (Å)
- Subdivide the NOE.cor file into separate files, one for each constraint class:
- interresidue sequential NOEs (|i-j| = 1): use the awk script seq_sel.awk
awk -f seq_sel.awk NOE.cor > seq.cor
- the total number of constraints in seq.cor is listed at the end of that file
- interresidue short range (1 < |i-j| < 5): use the awk script short_sel.awk
awk -f short_sel.awk NOE.cor > short.cor
- the total number of constraints in short.cor is listed at the end of that file
- interresidue long range (|i-j| > 5): use the awk script long_sel.awk
awk -f long_sel.awk NOE.cor > long.cor
- the total number of constraints in long.cor is listed at the end of that file
- intraresidue: use the awk script intra_sel.awk
awk -f intra_sel.awk NOE.cor > intra.cor
- the total number of constraints in intra.cor is listed at the end of that file
- H-bonds: these constraints must be collected by hand into the file h-bonds.cor and counted
- they should all be located in one block at the end of the original NOE.cor file
- Edit energy.inp and energy_ave.inp to include the file names of your PSF file and of the structures to analyze: the 30 best structures in energy.inp and the restrained minimized average structure in energy_ave.inp.
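- Run XPLOR and parse the output file with stat.csh. Wait for each XPLOR run to finish before running stat.csh (if you start XPLOR in the background with &, check that the job has completed first). The cp command keeps the output of the first run as energy_ave.out before the second run overwrites energy.out.
xplor < energy_ave.inp >& energy.out
csh stat.csh > stat_ave.out
cp energy.out energy_ave.out
xplor < energy.inp >& energy.out
csh stat.csh > stat.out
- The files stat.out and stat_ave.out contain the RMS deviations from experimental distance restraints for the table.
- A typical stat.out file looks like: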
RMS_NOE- Column 6 mean: 0.0410533 Std Dev 0.00483168 (N= 30 )
RMS_CDIH- Column 6 mean: 1.43315 Std Dev 0.794072 (N= 30 )
RMS_BONDS (1993)- Column 6 mean: 0.00519457 Std Dev 0.000252635 (N= 30 )
RMS_ANGLES (3655)- Column 6 mean: 0.621244 Std Dev 0.030363 (N= 30 )
RMS_IMPROPERS (972)- Column 6 mean: 0.518672 Std Dev 0.0634619 (N= 30 )
RMS_NOE_H-BONDS- Column 7 mean: 0.0958333 Std Dev 0.0133618 (N= 30 )
RMS_NOE_INTRA- Column 7 mean: 0.0075 Std Dev 0.00455887 (N= 30 )
RMS_NOE_LONG- Column 7 mean: 0.0451 Std Dev 0.00898647 (N= 30 )
RMS_NOE_SEQ- Column 7 mean: 0.0378333 Std Dev 0.00800451 (N= 30 )
RMS_NOE_SHORT- Column 7 mean: 0.0427 Std Dev 0.00536128 (N= 30 )
ENERGY_BOND- Column 7 mean: 53.9053 Std Dev 5.35852 (N= 60 )
ENERGY_ANGLE- Column 9 mean: 215.364 Std Dev 21.6272 (N= 60 )
ENERGY_IMPR- Column 5 mean: 40.4231 Std Dev 10.4815 (N= 30 )
ENERGY_REP- Column 8 mean: 107.973 Std Dev 23.6387 (N= 30 )
ENERGY_LJ- Column 8 mean: -374.082 Std Dev 21.1904 (N= 30 )
ENERGY_CDIH- Column 5 mean: 35.9804 Std Dev 45.4492 (N= 30 )
ENERGY_NOE- Column 8 mean: 234.949 Std Dev 57.3573 (N= 30 )
- RMS deviations from experimental distance restraints, taken from the stat.out file:
- all
RMS_NOE- Column 6 mean: 0.0410533 Std Dev 0.00483168 (N= 30 )
- interresidue sequential (|i-j| = 1)
RMS_NOE_SEQ- Column 7 mean: 0.0378333 Std Dev 0.00800451 (N= 30 )
- interresidue short range (1 < |i-j| < 5)
RMS_NOE_SHORT- Column 7 mean: 0.0427 Std Dev 0.00536128 (N= 30 )
- interresidue long range (|i-j| > 5)
RMS_NOE_LONG- Column 7 mean: 0.0451 Std Dev 0.00898647 (N= 30 )
- intraresidue
RMS_NOE_INTRA- Column 7 mean: 0.0075 Std Dev 0.00455887 (N= 30 )
- H-bonds
RMS_NOE_H-BONDS- Column 7 mean: 0.0958333 Std Dev 0.0133618 (N= 30 )
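The NOE.cor subdivision at the start of this section bins restraints by residue separation |i-j|. The awk scripts are not reproduced on this page, so the Python sketch below only illustrates that binning as a cross-check; it is not a replacement for seq_sel.awk and the others. It assumes, hypothetically, that each restraint is a single-line XPLOR assign statement with one "resid" per selection, so multi-line or "or" selections are not handled. Because the stated ranges leave |i-j| = 5 uncovered, the sketch counts those restraints separately so you can check which class your scripts put them in. Hand-collated H-bond restraints should be removed first, as described above.

```python
import re
from collections import Counter

# Hypothetical simplification: every restraint is a single line such as
#   assign (resid 12 and name HA) (resid 15 and name HN) 4.0 2.2 1.0
# Multi-line selections or "or" selections are not handled.
RESID = re.compile(r"resid\s+(\d+)", re.IGNORECASE)


def classify(i: int, j: int) -> str:
    """Bin one restraint by residue separation, using the ranges in the text."""
    d = abs(i - j)
    if d == 0:
        return "intra"
    if d == 1:
        return "seq"
    if d < 5:
        return "short"  # 1 < |i-j| < 5
    if d > 5:
        return "long"  # |i-j| > 5
    return "boundary"  # |i-j| == 5 falls in neither stated range


def count_classes(lines) -> Counter:
    """Count single-line assign statements per class."""
    counts = Counter()
    for line in lines:
        # assumed: XPLOR also accepts the four-letter abbreviation "assi"
        if not line.strip().lower().startswith("assi"):
            continue
        resids = [int(r) for r in RESID.findall(line)]
        if len(resids) < 2:
            continue  # selection continues on another line; skipped here
        counts[classify(resids[0], resids[1])] += 1
    return counts


example = [
    "assign (resid 10 and name HA) (resid 10 and name HN) 3.0 1.2 0.5",
    "assign (resid 10 and name HA) (resid 11 and name HN) 3.0 1.2 0.5",
    "assign (resid 10 and name HA) (resid 13 and name HN) 4.0 2.2 1.0",
    "assign (resid 10 and name HA) (resid 15 and name HN) 5.0 3.2 1.0",
    "assign (resid 2 and name HA) (resid 40 and name HN) 5.0 3.2 1.0",
]
print(dict(count_classes(example)))
```

On a real file the same function can be given an open file handle, e.g. count_classes(open("NOE.cor")), and the totals compared with the counts at the end of seq.cor, short.cor, long.cor and intra.cor.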
RMS Deviation from Experimental Dihedral Restraints (deg)
- Include the appropriate dihedral constraint file (dihed.tbl) in the energy.inp file used above.
- The value is reported in the stat.out file:
RMS_CDIH- Column 6 mean: 1.43315 Std Dev 0.794072 (N= 30 )
- To count the number of dihedral restraints used to refine the structure, use the awk script cnt_ang.nawk:
awk -f cnt_ang.nawk dihed.tbl
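The counting scripts (cnt_ang.nawk here, and cnt_carbon.nawk and cnt_coupling.nawk below) are not shown on this page. As a rough, hypothetical cross-check, the Python sketch below counts restraint entries in an XPLOR table by counting lines that begin an assign statement after removing comments. It assumes one assign per restraint, comments written with "!" or non-nested "{ ... }", and that "assi" is accepted as an abbreviation; it may not count exactly the way the awk scripts do.

```python
import re

# Hypothetical cross-check for restraint tables such as dihed.tbl.
# Assumes each restraint begins with "assign" (or the four-letter
# abbreviation "assi"), and that comments use "!" or non-nested "{ ... }".
ASSIGN = re.compile(r"^\s*assi", re.IGNORECASE)


def strip_comments(text: str) -> str:
    """Remove { ... } blocks and everything after '!' on each line."""
    text = re.sub(r"\{.*?\}", "", text, flags=re.DOTALL)
    return "\n".join(line.split("!", 1)[0] for line in text.splitlines())


def count_restraints(text: str) -> int:
    """Count lines that start an assign statement."""
    return sum(1 for line in strip_comments(text).splitlines() if ASSIGN.match(line))


sample = """{ phi restraints }
assign (resid 5 and name C) (resid 6 and name N)
       (resid 6 and name CA) (resid 6 and name C) 1.0 -60.0 30.0 2
! assign (resid 7 and name C) ...  commented out
ASSI (resid 8 and name C) (resid 9 and name N)
     (resid 9 and name CA) (resid 9 and name C) 1.0 -120.0 40.0 2
"""
print(count_restraints(sample))
```

For a real table, pass the file contents, e.g. count_restraints(open("dihed.tbl").read()); the same function should apply to carbon.tbl and coupling.tbl if they also use assign statements.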
RMS Deviation from Experimental Cα Restraints (ppm)
- Use the CARB_AVE.nawk awk script to parse the best 30 simulated annealing structures
awk -f CARB_AVE.nawk *dg*.sam
- *dg*.sam is a wildcard pattern matching the 30 simulated annealing structures
- Adjust the pattern so that it matches exactly your 30 best structures
- Typical output of CARB_AVE.nawk:
Average CA RMS: 0.885654 +/- 0.0299637 For 30 values.
Average CB RMS: 0.905546 +/- 0.0229673 For 30 values.
Average J RMS: 0.606855 +/- 0.04947 For 30 values.
- To count the number of Cα restraints used to refine the structure, use the awk script cnt_carbon.nawk:
awk -f cnt_carbon.nawk carbon.tbl
- The same approach gives the RMS deviation from experimental Cβ restraints (ppm), which CARB_AVE.nawk reports on the "Average CB RMS" line.
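The summaries printed by CARB_AVE.nawk, and by PRO_AVE.nawk, PRO_AVE_2.nawk and average.nawk further down, share one line format. The Python sketch below is a hedged helper written against the sample outputs on this page only: it reads such lines, checks that each average really covers 30 values (a quick test that the wildcard selected the right files), and prints mean ± SD entries for the table. The two-decimal rounding is an arbitrary choice.

```python
import re

# Summary-line format shown in the typical outputs on this page, e.g.
#   Average CA RMS: 0.885654 +/- 0.0299637 For 30 values.
#   Average RMS: 0.408533 +/- 0.056732 For 30 values.   (no label)
SUMMARY = re.compile(
    r"Average\s+(?:(?P<label>.+?)\s+)?RMS:\s+(?P<mean>\S+)\s+\+/-\s+"
    r"(?P<sd>\S+)\s+For\s+(?P<n>\d+)\s+values"
)


def parse_summary(text: str, expected_n: int = 30) -> dict:
    """Return {label: (mean, sd)}; raise if an average covers the wrong number of files."""
    results = {}
    for m in SUMMARY.finditer(text):
        label = m["label"] or "all"  # unlabeled lines come from average.nawk
        n = int(m["n"])
        if n != expected_n:
            raise ValueError(f"{label}: {n} values, expected {expected_n}; check the wildcard")
        results[label] = (float(m["mean"]), float(m["sd"]))
    return results


output = """Average CA RMS: 0.885654 +/- 0.0299637 For 30 values.
Average CB RMS: 0.905546 +/- 0.0229673 For 30 values.
Average J RMS: 0.606855 +/- 0.04947 For 30 values."""
for label, (mean, sd) in parse_summary(output).items():
    print(f"{label}: {mean:.2f} ± {sd:.2f}")
```

Raising an error instead of silently averaging makes a too-broad or too-narrow wildcard visible immediately.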
RMS Deviation from ³JNHα Coupling Constant Restraints (Hz)
- Same approach as for the RMS deviation from experimental Cα restraints: CARB_AVE.nawk reports the value on the "Average J RMS" line
- To count the number of coupling restraints used to refine the structure, use the awk script cnt_coupling.nawk
awk -f cnt_coupling.nawk coupling.tbl
Structure Energies
- F_NOE (kcal mol⁻¹)
- The value is reported in the stat.out file (see above):
ENERGY_NOE- Column 8 mean: 234.949 Std Dev 57.3573 (N= 30 )
- F_tor (kcal mol⁻¹)
- The value is reported in the stat.out file (see above):
ENERGY_CDIH- Column 5 mean: 35.9804 Std Dev 45.4492 (N= 30 )
- F_repel (kcal mol⁻¹)
- The value is reported in the stat.out file (see above):
ENERGY_REP- Column 8 mean: 107.973 Std Dev 23.6387 (N= 30 )
- F_L-J (kcal mol⁻¹)
- The value is reported in the stat.out file (see above):
ENERGY_LJ- Column 8 mean: -374.082 Std Dev 21.1904 (N= 30 )
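All of these energies, like the RMS values earlier and the covalent-geometry values below, come from stat.out lines with one layout. As a hedged illustration, the Python sketch below parses lines in exactly the format of the sample stat.out on this page, keeps the optional count in parentheses (used for bonds, angles and impropers), and formats a mean ± SD table entry. The regular expression is written against that sample only; a differently formatted stat.csh output would need a different pattern, and the number of decimals is up to you.

```python
import re
from typing import NamedTuple, Optional


class StatEntry(NamedTuple):
    mean: float
    sd: float
    n: int  # the N= value reported by stat.csh
    count: Optional[int]  # value in parentheses, e.g. 1993 bonds; None if absent


# Matches stat.out lines in the format shown on this page, e.g.
#   ENERGY_NOE- Column 8 mean: 234.949 Std Dev 57.3573 (N= 30 )
#   RMS_BONDS (1993)- Column 6 mean: 0.00519457 Std Dev 0.000252635 (N= 30 )
LINE = re.compile(
    r"(?P<label>[A-Z_-]+?)\s*(?:\((?P<count>\d+)\))?-\s*Column\s+\d+\s+"
    r"mean:\s*(?P<mean>\S+)\s+Std Dev\s+(?P<sd>\S+)\s+\(N=\s*(?P<n>\d+)\s*\)"
)


def parse_stat(text: str) -> dict:
    """Map each stat.out label (e.g. ENERGY_NOE) to a StatEntry."""
    entries = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            count = int(m["count"]) if m["count"] else None
            entries[m["label"]] = StatEntry(float(m["mean"]), float(m["sd"]), int(m["n"]), count)
    return entries


def table_cell(entry: StatEntry, digits: int = 2) -> str:
    """Format an entry as 'mean ± sd' for the statistics table."""
    return f"{entry.mean:.{digits}f} ± {entry.sd:.{digits}f}"


stat = """ENERGY_NOE- Column 8 mean: 234.949 Std Dev 57.3573 (N= 30 )
ENERGY_LJ- Column 8 mean: -374.082 Std Dev 21.1904 (N= 30 )
RMS_BONDS (1993)- Column 6 mean: 0.00519457 Std Dev 0.000252635 (N= 30 )"""
entries = parse_stat(stat)
print(table_cell(entries["ENERGY_NOE"], 1))  # 234.9 ± 57.4
print(entries["RMS_BONDS"].count)  # 1993
```

Passing the whole contents of stat.out (or stat_ave.out) to parse_stat gives every entry at once; lines that do not match the pattern are ignored.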
Deviations from Idealized Covalent Geometry
- bonds (Å)
- The value is reported in the stat.out file (see above):
RMS_BONDS (1993)- Column 6 mean: 0.00519457 Std Dev 0.000252635 (N= 30 )
- The number of bonds is the value in parentheses (1993 in the example above)
- angles (deg)
- The value is reported in the stat.out file (see above):
RMS_ANGLES (3655)- Column 6 mean: 0.621244 Std Dev 0.030363 (N= 30 )
- The number of angles is the value in parentheses (3655 in the example above)
- impropers (deg)
- The value is reported in the stat.out file (see above):
RMS_IMPROPERS (972)- Column 6 mean: 0.518672 Std Dev 0.0634619 (N= 30 )
- The number of improper dihedral angles is the value in parentheses (972 in the example above)
PROCHECK
- Run PROCHECK separately on each of the 30 best structures, replacing structure_filename with the name of each coordinate file (1.0 is the resolution argument):
procheck structure_filename 1.0
- Calculate averages and standard deviations using the awk script PRO_AVE.nawk:
awk -f PRO_AVE.nawk *dg*.sum
- *dg*.sum matches the PROCHECK output .sum files for each of the 30 best structures
- Make sure the wild-card form only selects for the correct 30 structures
- Typical output:
Average rama RMS: 83.5167 +/- 2.5297 For 30 values.
Average G-factors RMS: -0.107333 +/- 0.0287441 For 30 values.
Average bad contacts RMS: 17.5 +/- 2.53969 For 30 values.
- Overall G-factor:
Average G-factors RMS: -0.107333 +/- 0.0287441 For 30 values.
- % residues in the most favored region of the Ramachandran plot:
Average rama RMS: 83.5167 +/- 2.5297 For 30 values.
- Number of bad contacts per 100 residues:
Average bad contacts RMS: 17.5 +/- 2.53969 For 30 values.
- This number must be scaled to 100 residues: multiply it (and its standard deviation) by 100/(number of residues).
- Calculate averages and standard deviations using awk script PRO_AVE_2.nawk
awk -f PRO_AVE_2.nawk *dg*.out
- *dg*.out is the list of procheck output *.out files for each of the 30 Best Structures
- Make sure the wild-card form only selects for the correct 30 structures
- Typical output:
Average hbond RMS: 1.04333 +/- 0.0558768 For 30 values.
- The above result is the H-bond energy.
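The per-100-residue scaling of the bad-contact count from PRO_AVE.nawk is simple arithmetic; a minimal Python sketch follows. The residue count used here is a hypothetical value, so substitute the number of residues in your protein. Both the mean and the standard deviation are scaled, because multiplying every value by a positive constant multiplies the standard deviation by the same constant.

```python
def per_100_residues(mean: float, sd: float, n_residues: int) -> tuple:
    """Scale a per-structure count to a 100-residue basis.

    Mean and standard deviation are multiplied by the same factor,
    since SD(c * X) = c * SD(X) for a positive constant c.
    """
    if n_residues <= 0:
        raise ValueError("n_residues must be positive")
    factor = 100.0 / n_residues
    return mean * factor, sd * factor


# 17.5 +/- 2.53969 is the typical PRO_AVE.nawk output shown above;
# 120 residues is a hypothetical protein size chosen for illustration.
mean, sd = per_100_residues(17.5, 2.53969, 120)
print(f"{mean:.1f} ± {sd:.1f} bad contacts per 100 residues")  # 14.6 ± 2.1 bad contacts per 100 residues
```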
Atomic RMS Differences
- Use the rmsdiff.inp and rmsdiff_sec.inp XPLOR scripts
- Run the calculations twice: once with the average structure (SA) as the reference structure and once with the restrained minimized average structure (SA)r as the reference structure
- Set the reference structure in the script with: evaluate ($1 = "reference structure name")
- Include the 30 best structures, the average structure and the restrained minimized average structure names in the for loop
- Exclude poorly defined regions of the structure from the select statements for both the fit and the rms calculations
- Only include regions of α-helix and/or β-sheets in the rmsdiff_sec.inp for both the fit and rms calculations
- Use the awk script average.nawk to calculate the average and standard deviation of the 30 Best Structures
awk -f average.nawk back_all.rms
- rmsd for backbone atoms
- back_sec.rms is the corresponding output file from rmsdiff_sec.inp
- Make sure only the 30 best structures are listed in the file
- Delete the results for the average and restrained minimized average structures from back_all.rms
- Make a backup first
- Record the (SA)r vs SA rmsd value before deleting
- Typical output:
Average RMS: 0.408533 +/- 0.056732 For 30 values.
- Use the awk script average.nawk:
awk -f average.nawk all_all.rms
- rmsd for all atoms
- all_sec.rms is the corresponding output file from rmsdiff_sec.inp
- Make sure only the 30 best structures are listed in the file
- Delete the results for the average and restrained minimized average structures from all_all.rms
- Make a backup first
- Record the (SA)r vs SA rmsd value before deleting
- Typical output:
Average RMS: 1.00183 +/- 0.0711051 For 30 values.
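Instead of deleting the reference rows from back_all.rms or all_all.rms by hand, the filtering could be done while averaging. The Python sketch below assumes a hypothetical two-column layout (structure name, rms value) for the .rms file, since its real format is not shown here, and uses hypothetical file names for the reference structures. It keeps the reference rows separately (so the (SA)r vs SA value is recorded) and computes the mean and sample standard deviation of the remaining structures. average.nawk may divide by N rather than N-1, which would give a slightly smaller standard deviation.

```python
import statistics


def summarize_rms(lines, reference_names, expected_n=30):
    """Separate reference rows from ensemble rows and average the ensemble.

    Hypothetical input format: one "structure_name rms_value" pair per line.
    Returns (reference values, ensemble mean, ensemble sample SD).
    """
    references, ensemble = {}, []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, value = fields[0], float(fields[-1])
        if name in reference_names:
            references[name] = value  # e.g. the (SA)r vs SA value to record
        else:
            ensemble.append(value)
    if len(ensemble) != expected_n:
        raise ValueError(f"{len(ensemble)} ensemble values, expected {expected_n}")
    # statistics.stdev divides by N-1; average.nawk may divide by N instead
    return references, statistics.mean(ensemble), statistics.stdev(ensemble)


# Hypothetical file names and values, with a 3-structure "ensemble" for brevity
rows = [
    "ave.pdb 0.000",
    "ave_min.pdb 0.312",
    "dg_1.sa 0.40",
    "dg_2.sa 0.45",
    "dg_3.sa 0.35",
]
refs, mean, sd = summarize_rms(rows, {"ave.pdb", "ave_min.pdb"}, expected_n=3)
print(f"(SA)r vs SA: {refs['ave_min.pdb']:.3f}")
print(f"ensemble: {mean:.3f} ± {sd:.3f}")
```

Because the original .rms file is only read, never rewritten, this variant leaves the full results (including the reference rows) intact.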