Generating Protein Structure Statistics Table
From Powers Wiki
Revision as of 00:02, 13 March 2012
Definitions
- <SA> is the ensemble of the 30 "best" simulated annealing structures
- SA is the average structure
- (SA)r is the restrained minimized average structure
Location of XPLOR, awk and csh Files
/PROGRAMS/XPLOR_FILES/analysis
RMS Deviations from Experimental Distance Restraints
- The NOE.cor file needs to be subdivided into separate files, one for each class of constraint
- interresidue sequential NOEs (|i-j|=1)
- use awk script:  seq_sel.awk 
 awk -f seq_sel.awk NOE.cor > seq.cor
- the total number of constraints in the seq.cor file will be listed at the end
 
- interresidue short range (1 <|i-j| < 5)
- use awk script:  short_sel.awk 
 awk -f short_sel.awk NOE.cor > short.cor
- the total number of constraints in the short.cor file will be listed at the end
 
- interresidue long range (|i-j| > 5)
- use awk script:  long_sel.awk 
 awk -f long_sel.awk NOE.cor > long.cor
- the total number of constraints in the long.cor file will be listed at the end
 
- intraresidue
- use awk script:  intra_sel.awk 
 awk -f intra_sel.awk NOE.cor > intra.cor
- the total number of constraints in the intra.cor file will be listed at the end
 
- H-bonds
- These constraints need to be hand collated and counted into the file h-bonds.cor
- They should all be located in one region (end) of the original  NOE.cor  file
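The sequence-separation classes above can be illustrated with a short awk sketch. It is not the seq_sel.awk, short_sel.awk, long_sel.awk, or intra_sel.awk script itself; the function name classify_noe is hypothetical. It assumes each restraint is a single-line XPLOR assign statement with one resid per selection, and it counts |i-j| = 5 as short range, a case the ranges above leave open. H-bond restraints are not distinguished, so remove them from the input first.

```shell
# classify_noe FILE: tally single-line "assign" restraints by sequence separation.
# Hypothetical helper; only the first two "resid N" values on each line are used.
classify_noe() {
    awk '
    tolower($1) ~ /^assi/ {
        line = tolower($0)
        n = 0
        # collect the residue number that follows each "resid" keyword
        while (match(line, /resid +-?[0-9]+/)) {
            tok = substr(line, RSTART, RLENGTH)
            sub(/resid +/, "", tok)
            res[++n] = tok + 0
            line = substr(line, RSTART + RLENGTH)
        }
        if (n < 2) next
        d = res[1] - res[2]
        if (d < 0) d = -d
        if (d == 0)      cls = "intra"
        else if (d == 1) cls = "seq"
        else if (d <= 5) cls = "short"   # assumption: |i-j| = 5 is short range
        else             cls = "long"
        count[cls]++
    }
    END {
        printf "intra %d seq %d short %d long %d\n",
               count["intra"], count["seq"], count["short"], count["long"]
    }' "$1"
}
```

Calling `classify_noe NOE.cor` gives a quick cross-check against the totals listed at the end of each .cor file.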
 
 
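- Edit  energy.inp  and  energy_ave.inp  to include the filenames for your PSF file and for the 30 best structures (energy.inp) or the restrained minimized average structure (energy_ave.inp).
- Run XPLOR and parse the output file. Each xplor job is started in the background (trailing &), so wait for it to finish before running stat.csh: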
- xplor < energy_ave.inp >& energy.out &
- csh stat.csh > stat_ave.out
- cp energy.out energy_ave.out
- xplor < energy.inp >& energy.out &
- csh stat.csh > stat.out
 
- The files  stat.out  and  stat_ave.out  contain the rms deviations from experimental distance restraints  for the table.
- A typical  stat.out  file will look like:
 RMS_NOE- Column 6 mean: 0.0410533 Std Dev 0.00483168 (N= 30 )
 RMS_CDIH- Column 6 mean: 1.43315 Std Dev 0.794072 (N= 30 )
 RMS_BONDS (1993)- Column 6 mean: 0.00519457 Std Dev 0.000252635 (N= 30 )
 RMS_ANGLES (3655)- Column 6 mean: 0.621244 Std Dev 0.030363 (N= 30 )
 RMS_IMPROPERS (972)- Column 6 mean: 0.518672 Std Dev 0.0634619 (N= 30 )
 RMS_NOE_H-BONDS- Column 7 mean: 0.0958333 Std Dev 0.0133618 (N= 30 )
 RMS_NOE_INTRA- Column 7 mean: 0.0075 Std Dev 0.00455887 (N= 30 )
 RMS_NOE_LONG- Column 7 mean: 0.0451 Std Dev 0.00898647 (N= 30 )
 RMS_NOE_SEQ- Column 7 mean: 0.0378333 Std Dev 0.00800451 (N= 30 )
 RMS_NOE_SHORT- Column 7 mean: 0.0427 Std Dev 0.00536128 (N= 30 )
 ENERGY_BOND- Column 7 mean: 53.9053 Std Dev 5.35852 (N= 60 )
 ENERGY_ANGLE- Column 9 mean: 215.364 Std Dev 21.6272 (N= 60 )
 ENERGY_IMPR- Column 5 mean: 40.4231 Std Dev 10.4815 (N= 30 )
 ENERGY_REP- Column 8 mean: 107.973 Std Dev 23.6387 (N= 30 )
 ENERGY_LJ- Column 8 mean: -374.082 Std Dev 21.1904 (N= 30 )
 ENERGY_CDIH- Column 5 mean: 35.9804 Std Dev 45.4492 (N= 30 )
 ENERGY_NOE- Column 8 mean: 234.949 Std Dev 57.3573 (N= 30 )
- rms deviations from experimental distance restraints from the  stat.out  file
- all
 RMS_NOE- Column 6 mean: 0.0410533 Std Dev 0.00483168 (N= 30 )
- interresidue sequential (|i-j| = 1)
 RMS_NOE_SEQ- Column 7 mean: 0.0378333 Std Dev 0.00800451 (N= 30 )
- interresidue short-range (1 < |i-j| < 5)
 RMS_NOE_SHORT- Column 7 mean: 0.0427 Std Dev 0.00536128 (N= 30 )
- interresidue long-range (|i-j| > 5)
 RMS_NOE_LONG- Column 7 mean: 0.0451 Std Dev 0.00898647 (N= 30 )
- Intraresidue
 RMS_NOE_INTRA- Column 7 mean: 0.0075 Std Dev 0.00455887 (N= 30 )
- H-bonds
 RMS_NOE_H-BONDS- Column 7 mean: 0.0958333 Std Dev 0.0133618 (N= 30 )
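Copying values from stat.out into the table by hand is error-prone, so the lookup can be sketched in awk. This is a minimal sketch, assuming the line layout shown in the typical stat.out above (label, optional count in parentheses, then "- Column N mean: X Std Dev Y (N= Z )"). The function name stat_value is hypothetical and is not part of stat.csh.

```shell
# stat_value LABEL FILE: print "mean +/- sd (N=n)" for one label in a stat.out file.
# Hypothetical helper; assumes the stat.out line layout shown in this protocol.
stat_value() {
    awk -v label="$1" '
    {
        key = $0
        sub(/^[ \t]+/, "", key)      # drop leading indentation
        sub(/- Column.*/, "", key)   # keep only the label part, e.g. "RMS_BONDS (1993)"
        sub(/ \(.*$/, "", key)       # drop a trailing count such as " (1993)"
        if (key != label) next
        for (i = 1; i <= NF; i++)
            if ($i == "mean:") {
                # fields after "mean:" are: value, Std, Dev, sd, "(N=", n, ")"
                printf "%s +/- %s (N=%s)\n", $(i + 1), $(i + 4), $(i + 6)
                exit
            }
    }' "$2"
}
```

For example, `stat_value RMS_NOE_SEQ stat.out` prints the sequential NOE entry; matching on the whole label keeps RMS_NOE from also matching RMS_NOE_SEQ and the other per-class lines.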
 
RMS Deviation from Experimental Dihedral Restraints (deg)
- Include the appropriate dihedral constraint file (dihed.tbl) in the energy.inp file used above.
- The value is reported in the  stat.out  file:
 RMS_CDIH- Column 6 mean: 1.43315 Std Dev 0.794072 (N= 30 )
- To count the number of dihedral restraints used to refine the structure, use the awk script cnt_ang.nawk
 awk -f cnt_ang.nawk dihed.tbl
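Counting restraints amounts to counting assign statements in the restraint table. The following is a minimal sketch of that idea, not the cnt_ang.nawk script itself. The function name count_assign is hypothetical, and it assumes every restraint begins with assign (or an abbreviation starting with assi) as the first word on its own line.

```shell
# count_assign FILE: count restraint statements in an XPLOR restraint table.
# Hypothetical helper; assumes one "assign" keyword per restraint, first on its line.
count_assign() {
    awk 'tolower($1) ~ /^assi/ { n++ } END { print n + 0 }' "$1"
}
```

Lines that start with a comment character (for example "!assign ...") are not counted, because their first word does not begin with assi.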
RMS Deviation from Experimental Cα Restraints (ppm)
- Use the CARB_AVE.nawk awk script to parse the best 30 simulated annealing structures
 awk -f CARB_AVE.nawk *dg*.sam
 - *dg*.sam is the wild-card representation of the list of 30 simulated annealing structures
- Adjust the wild-card pattern so that it matches exactly your 30 best structures
 
- Typical output of CARB_AVE.nawk
 Average CA RMS: 0.885654 +/- 0.0299637 For 30 values.
 Average CB RMS: 0.905546 +/- 0.0229673 For 30 values.
 Average J RMS: 0.606855 +/- 0.04947 For 30 values.
- To count the number of Cα carbon restraints used to refine the structure, use the awk script cnt_carbon.nawk
 awk -f cnt_carbon.nawk carbon.tbl
- This approach can be repeated to determine rms deviation from experimental Cβ restraints (ppm)
RMS Deviation from 3JNHα Restraints (Hz)
- Same approach as RMS Deviation from Experimental Cα Restraints (ppm); the value is the "Average J RMS" line of the CARB_AVE.nawk output
- To count the number of coupling restraints used to refine the structure, use the awk script cnt_coupling.nawk
 awk -f cnt_coupling.nawk coupling.tbl
Structure Energies
- FNOE (kcal mol-1)
- The value is reported in the stat.out file (see above):
 ENERGY_NOE- Column 8 mean: 234.949 Std Dev 57.3573 (N= 30 )
 
- Ftor (kcal mol-1)
- The value is reported in the stat.out file (see above):
 ENERGY_CDIH- Column 5 mean: 35.9804 Std Dev 45.4492 (N= 30 )
 
- Frepel (kcal mol-1)
- The value is reported in the stat.out file (see above):
 ENERGY_REP- Column 8 mean: 107.973 Std Dev 23.6387 (N= 30 )
 
- FL-J (kcal mol-1)
- The value is reported in the stat.out file (see above):
 ENERGY_LJ- Column 8 mean: -374.082 Std Dev 21.1904 (N= 30 )
 
Deviations from Idealized Covalent Geometry
- bonds (Å)
- The value is reported in the stat.out (see above):
 RMS_BONDS (1993)- Column 6 mean: 0.00519457 Std Dev 0.000252635 (N= 30 )
- The number of bonds is the value in parentheses
- 1993 in the example above
 
- angles (deg)
- The value is reported in the stat.out file (see above):
 RMS_ANGLES (3655)- Column 6 mean: 0.621244 Std Dev 0.030363 (N= 30 )
- The number of angles is the value in parentheses
- 3655 in the example above
 
- impropers (deg)
- The value is reported in the stat.out file (see above):
 RMS_IMPROPERS (972)- Column 6 mean: 0.518672 Std Dev 0.0634619 (N= 30 )
- The number of improper dihedral angles is the value in parentheses
- 972 in the example
 
PROCHECK
- Run PROCHECK on each individual structure file from the list of 30 Best Structures
 procheck structure filename 1.0
- Calculate averages and standard deviations using awk script PRO_AVE.nawk
 awk -f PRO_AVE.nawk *dg*.sum
 - *dg*.sum is the list of procheck output *.sum files for each of the 30 Best Structures
- Make sure the wild-card form only selects for the correct 30 structures
- Typical output:
 Average rama RMS: 83.5167 +/- 2.5297 For 30 values.
 Average G-factors RMS: -0.107333 +/- 0.0287441 For 30 values.
 Average bad contacts RMS: 17.5 +/- 2.53969 For 30 values.
- Overall G-Factor:
 Average G-factors RMS: -0.107333 +/- 0.0287441 For 30 values.
- %Residues in most favorable region of Ramachandran plot
 Average rama RMS: 83.5167 +/- 2.5297 For 30 values.
- Number of bad contacts/100 residues
 Average bad contacts RMS: 17.5 +/- 2.53969 For 30 values.
 - Number needs to be scaled to 100 residues. Multiply by 100/number of residues.
 
 
- Calculate averages and standard deviations using awk script PRO_AVE_2.nawk
 awk -f PRO_AVE_2.nawk *dg*.out
 - *dg*.out is the list of procheck output *.out files for each of the 30 Best Structures
- Make sure the wild-card form only selects for the correct 30 structures
- Typical output:
 Average hbond RMS: 1.04333 +/- 0.0558768 For 30 values.
- The above result is the H-bond energy
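The per-100-residue scaling of the bad-contact count described above is a single multiplication, shown here as a small awk sketch. The function name scale_per_100 and the residue count used in the example are hypothetical; substitute the number of residues in your protein. The standard deviation is scaled by the same factor as the mean, since both are multiplied by a constant.

```shell
# scale_per_100 MEAN SD NRES: rescale a per-structure count to "per 100 residues".
# Hypothetical helper; NRES is the number of residues in the protein.
scale_per_100() {
    awk -v m="$1" -v s="$2" -v n="$3" 'BEGIN {
        if (n <= 0) { print "NRES must be positive"; exit 1 }
        f = 100 / n                       # scaling factor from the protocol
        printf "%.2f +/- %.2f\n", m * f, s * f
    }'
}
```

With the typical output above and an assumed 200-residue protein, `scale_per_100 17.5 2.53969 200` prints `8.75 +/- 1.27`.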
 
Atomic RMS Differences
- Use the rmsdiff.inp and rmsdiff_sec.inp XPLOR scripts
- Run the calculation twice, once with the average structure (SA) as the reference structure and once with the restrained minimized average structure (SA)r as the reference structure
- Evaluate ($1 = "reference structure name")
 
- Include the 30 best structures, the average structure and the restrained minimized average structure names in the for loop
- Exclude regions of poorly defined structure in the select statements for both the fit and rms calculations
- Only include regions of α-helix and/or β-sheets in the rmsdiff_sec.inp for both the fit and rms calculations
- Use the awk script average.nawk to calculate the average and standard deviation of the 30 Best Structures
 awk -f average.nawk back_all.rms
 - rmsd for backbone atoms
- back_sec.rms is the corresponding output file from rmsdiff_sec.inp
- Make sure only the 30 Best Structures are listed in the file
- Need to delete results for average and restrained minimized average structure from back_all.rms
- Make backup first
- Record (SA)r vs SA rmsd values before deleting
- Typical output:
 Average RMS: 0.408533 +/- 0.056732 For 30 values.
 
 
- Use average.nawk awk script 
 awk -f average.nawk all_all.rms
 - rmsd for all atoms
- all_sec.rms is the corresponding output file from rmsdiff_sec.inp
- Make sure only the 30 Best Structures are listed in file
- Need to delete results for average and restrained minimized average structure
- Make backup first
- Record (SA)r vs SA rmsd values before deleting
- Typical output
 Average RMS: 1.00183 +/- 0.0711051 For 30 values.
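The averaging step used for the .rms files can be sketched as follows. This is not the average.nawk script itself, and the function name average_rms is hypothetical. It assumes one structure per line with the rmsd as the last field, and it uses the sample standard deviation (n-1 denominator); the real script may use a different input layout or the population form.

```shell
# average_rms FILE: mean and sample standard deviation of the last field on each line.
# Hypothetical helper; lines whose last field is not a number are skipped.
average_rms() {
    awk '
    NF > 0 && $NF ~ /^-?[0-9.]+([eE][-+]?[0-9]+)?$/ {
        x = $NF + 0
        n++
        sum += x
        sumsq += x * x
    }
    END {
        if (n < 2) { print "need at least two values"; exit 1 }
        mean = sum / n
        var = (sumsq - n * mean * mean) / (n - 1)   # sample variance (assumption)
        if (var < 0) var = 0                        # guard against rounding error
        printf "Average RMS: %g +/- %g For %d values.\n", mean, sqrt(var), n
    }' "$1"
}
```

The reported count gives a quick check that only the 30 Best Structures remain in the file: anything other than "For 30 values." means the average or restrained minimized average entries have not been removed, or a structure is missing.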