Generating Protein Structure Statistics Table

From Powers Wiki

Definitions

  1. <SA> is an ensemble of the "best" 30 simulated annealing structures
  2. SA is the average structure
  3. (SA)r is the restrained minimized average structure

Experimental: Python table calculator

Run the following commands if you want to try the automagic structure statistics table calculator:

cd /path/to/my/structure/calculation/files
/home/bworley/research/struct-stats/pull
vi options.py
./Table.py
cat table.txt

The step in which options.py is edited is the most critical. All the files required by the calculator must be available in the current working directory and accessible by the filenames stored in the options file.

Use at your own risk. If it doesn't work, it doesn't work. If it does work, it will complete all tasks described below except the PROCHECK analysis.
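Because the calculator depends on every file named in options.py being present in the working directory, a quick existence check before running Table.py may save a failed run. The following is a minimal sketch, not part of the calculator: the function name missing_files and the example filenames are hypothetical, and since the variable names inside options.py are not documented here, the filenames are passed in explicitly.

```python
from pathlib import Path


def missing_files(filenames, directory="."):
    """Return the names in `filenames` that are not regular files in `directory`."""
    base = Path(directory)
    return [name for name in filenames if not (base / name).is_file()]


if __name__ == "__main__":
    # Hypothetical list: replace with the filenames you entered in options.py.
    required = ["NOE.cor", "dihed.tbl", "carbon.tbl", "coupling.tbl"]
    for name in missing_files(required):
        print("missing:", name)
```

An empty result means every listed file was found; anything printed should be copied into the directory or corrected in options.py first.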

Manual Method: Use a set of old scripts

All script files required by this method are stored in /home/PROGRAMS/XPLOR_FILES/analysis.

RMS Deviations from Experimental Distance Restraints

  1. The NOE.cor file needs to be subdivided into separate files, one for each constraint class
    1. interresidue sequential NOEs (|i-j|=1)
      1. use awk script: seq_sel.awk
        awk -f seq_sel.awk NOE.cor > seq.cor
      2. the total number of constraints in the seq.cor file will be listed at the end
    2. interresidue short range (1 < |i-j| < 5)
      1. use awk script: short_sel.awk
        awk -f short_sel.awk NOE.cor > short.cor
      2. the total number of constraints in the short.cor file will be listed at the end
    3. interresidue long range (|i-j| > 5)
      1. use awk script: long_sel.awk
        awk -f long_sel.awk NOE.cor > long.cor
      2. the total number of constraints in the long.cor file will be listed at the end
    4. intraresidue
      1. use awk script: intra_sel.awk
        awk -f intra_sel.awk NOE.cor > intra.cor
      2. the total number of constraints in the intra.cor file will be listed at the end
    5. H-bonds
      1. These constraints need to be hand collated and counted into the file h-bonds.cor
      2. They should all be located in one region (end) of the original NOE.cor file
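The classification by residue separation performed by the awk scripts above can be sketched in Python as a cross-check on the counts. This is an illustration only, not a port of the awk scripts. It assumes each restraint is a single XPLOR-style line such as "assign (resid 12 and name HN) (resid 13 and name HA) 2.5 0.7 0.5" and that the first two resid values identify the two residues. The definitions above leave |i-j| = 5 unassigned, so the sketch counts it as short range, which is an assumption. H-bond restraints cannot be recognized from residue separation and still need to be collated by hand.

```python
import re

# Matches "resid <number>" in an XPLOR-style atom selection (assumed format).
RESID = re.compile(r"resid\s+(-?\d+)", re.IGNORECASE)


def noe_class(line):
    """Classify one restraint line by |i-j|; return None if it has fewer than two resid fields."""
    resids = [int(r) for r in RESID.findall(line)]
    if len(resids) < 2:
        return None
    separation = abs(resids[0] - resids[1])
    if separation == 0:
        return "intra"
    if separation == 1:
        return "seq"
    if separation <= 5:  # assumption: |i-j| = 5 is counted as short range
        return "short"
    return "long"


def count_classes(lines):
    """Count restraints per class over an iterable of lines."""
    counts = {"intra": 0, "seq": 0, "short": 0, "long": 0}
    for line in lines:
        cls = noe_class(line)
        if cls is not None:
            counts[cls] += 1
    return counts
```

Calling count_classes(open("NOE.cor")) gives totals that can be compared with the counts listed at the end of seq.cor, short.cor, long.cor and intra.cor.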

  2. Edit energy.inp and energy_ave.inp to include the filenames for your PSF file, 30 best structures (energy) and restrained minimized structure (energy_ave).

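  3. Run XPLOR and parse the output file. Each xplor command runs in the background (&), so wait for it to finish before running stat.csh: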
    1. xplor < energy_ave.inp >& energy.out &
    2. csh stat.csh > stat_ave.out
    3. cp energy.out energy_ave.out
    4. xplor < energy.inp >& energy.out &
    5. csh stat.csh > stat.out
  4. The files stat.out and stat_ave.out contain the rms deviations from experimental distance restraints for the table.
    1. A typical stat.out file will look like:

      RMS_NOE- Column 6 mean: 0.0410533 Std Dev 0.00483168 (N= 30 )
      RMS_CDIH- Column 6 mean: 1.43315 Std Dev 0.794072 (N= 30 )
      RMS_BONDS (1993)- Column 6 mean: 0.00519457 Std Dev 0.000252635 (N= 30 )
      RMS_ANGLES (3655)- Column 6 mean: 0.621244 Std Dev 0.030363 (N= 30 )
      RMS_IMPROPERS (972)- Column 6 mean: 0.518672 Std Dev 0.0634619 (N= 30 )
      RMS_NOE_H-BONDS- Column 7 mean: 0.0958333 Std Dev 0.0133618 (N= 30 )
      RMS_NOE_INTRA- Column 7 mean: 0.0075 Std Dev 0.00455887 (N= 30 )
      RMS_NOE_LONG- Column 7 mean: 0.0451 Std Dev 0.00898647 (N= 30 )
      RMS_NOE_SEQ- Column 7 mean: 0.0378333 Std Dev 0.00800451 (N= 30 )
      RMS_NOE_SHORT- Column 7 mean: 0.0427 Std Dev 0.00536128 (N= 30 )
      ENERGY_BOND- Column 7 mean: 53.9053 Std Dev 5.35852 (N= 60 )
      ENERGY_ANGLE- Column 9 mean: 215.364 Std Dev 21.6272 (N= 60 )
      ENERGY_IMPR- Column 5 mean: 40.4231 Std Dev 10.4815 (N= 30 )
      ENERGY_REP- Column 8 mean: 107.973 Std Dev 23.6387 (N= 30 )
      ENERGY_LJ- Column 8 mean: -374.082 Std Dev 21.1904 (N= 30 )
      ENERGY_CDIH- Column 5 mean: 35.9804 Std Dev 45.4492 (N= 30 )
      ENERGY_NOE- Column 8 mean: 234.949 Std Dev 57.3573 (N= 30 )

    2. rms deviations from experimental distance restraints from the stat.out file
      1. all
        RMS_NOE- Column 6 mean: 0.0410533 Std Dev 0.00483168 (N= 30 )

      2. interresidue sequential (|i-j| = 1)
        RMS_NOE_SEQ- Column 7 mean: 0.0378333 Std Dev 0.00800451 (N= 30 )

      3. interresidue long-range (|i-j| > 5)
        RMS_NOE_LONG- Column 7 mean: 0.0451 Std Dev 0.00898647 (N= 30 )

      4. Intraresidue
        RMS_NOE_INTRA- Column 7 mean: 0.0075 Std Dev 0.00455887 (N= 30 )

      5. H-bonds
        RMS_NOE_H-BONDS- Column 7 mean: 0.0958333 Std Dev 0.0133618 (N= 30 )
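To avoid copying values out of stat.out by hand, the lines shown above can be parsed with a regular expression. The sketch below is based only on the sample output in this section; the function name parse_stat is hypothetical, and a stat.out produced by a different version of stat.csh may need a different pattern.

```python
import re

# Pattern inferred from the sample stat.out lines, for example:
#   RMS_BONDS (1993)- Column 6 mean: 0.00519457 Std Dev 0.000252635 (N= 30 )
STAT_LINE = re.compile(
    r"\s*(?P<label>[A-Za-z_-]+?)"        # label such as RMS_NOE or RMS_NOE_H-BONDS
    r"(?:\s*\((?P<count>\d+)\))?"        # optional term count in parentheses
    r"-\s+Column\s+\d+\s+mean:\s+(?P<mean>\S+)"
    r"\s+Std\s+Dev\s+(?P<std>\S+)"
    r"\s+\(N=\s*(?P<n>\d+)\s*\)"
)


def parse_stat(text):
    """Return {label: {"count", "mean", "std", "n"}} for every recognized line."""
    stats = {}
    for line in text.splitlines():
        m = STAT_LINE.match(line)
        if m is None:
            continue  # skip blank or unrecognized lines
        stats[m.group("label")] = {
            "count": int(m.group("count")) if m.group("count") else None,
            "mean": float(m.group("mean")),
            "std": float(m.group("std")),
            "n": int(m.group("n")),
        }
    return stats
```

For example, parse_stat(open("stat.out").read())["RMS_NOE_SEQ"] returns the mean, standard deviation and N for the sequential NOE class. Checking the "n" field of each entry against the expected number of structures is a simple way to catch a stat.out generated from an incomplete energy.out.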

RMS Deviation from Experimental Dihedral Restraints (deg)

  1. Include the appropriate dihedral constraint (dihed.tbl) file in the energy.inp file used above.
  2. The value is reported in the stat.out file:

    RMS_CDIH- Column 6 mean: 1.43315 Std Dev 0.794072 (N= 30 )

  3. To count the number of dihedral restraints used to refine the structure, use the awk script cnt_ang.nawk

    awk -f cnt_ang.nawk dihed.tbl

RMS Deviation from Experimental Cα Restraints (ppm)

  1. Use the CARB_AVE.nawk awk script to parse the best 30 simulated annealing structures

    awk -f CARB_AVE.nawk *dg*.sam

    1. *dg*.sam is the wild-card representation of the list of 30 simulated annealing structures
    2. Please use the proper representation to list your 30 best structures
  2. Typical output of CARB_AVE.nawk

    Average CA RMS: 0.885654 +/- 0.0299637 For 30 values.
    Average CB RMS: 0.905546 +/- 0.0229673 For 30 values.
    Average J RMS: 0.606855 +/- 0.04947 For 30 values.

  3. To count the number of Cα restraints used to refine the structure, use the awk script cnt_carbon.nawk

    awk -f cnt_carbon.nawk carbon.tbl

  4. This approach can be repeated to determine rms deviation from experimental Cβ restraints (ppm)

RMS Deviation from 3JNHα Restraints (Hz)

  1. Same approach as RMS Deviation from Experimental Cα Restraints (ppm)
  2. To count the number of coupling restraints used to refine the structure, use the awk script cnt_coupling.nawk

    awk -f cnt_coupling.nawk coupling.tbl

Structure Energies

  1. FNOE (kcal mol-1)
    1. The value is reported in the stat.out file (see above):

      ENERGY_NOE- Column 8 mean: 234.949 Std Dev 57.3573 (N= 30 )

  2. Ftor (kcal mol-1)
    1. The value is reported in the stat.out file (see above):

      ENERGY_CDIH- Column 5 mean: 35.9804 Std Dev 45.4492 (N= 30 )

  3. Frepel (kcal mol-1)
    1. The value is reported in the stat.out file (see above):

      ENERGY_REP- Column 8 mean: 107.973 Std Dev 23.6387 (N= 30 )

  4. FL-J (kcal mol-1)
    1. The value is reported in the stat.out file (see above):

      ENERGY_LJ- Column 8 mean: -374.082 Std Dev 21.1904 (N= 30 )

Deviations from Idealized Covalent Geometry

  1. bonds (Å)
    1. The value is reported in the stat.out file (see above):

      RMS_BONDS (1993)- Column 6 mean: 0.00519457 Std Dev 0.000252635 (N= 30 )

    2. The number of bonds is the value in parentheses
      1. 1993 in the example above

  2. angles (deg)
    1. The value is reported in the stat.out file (see above):

      RMS_ANGLES (3655)- Column 6 mean: 0.621244 Std Dev 0.030363 (N= 30 )

    2. The number of angles is the value in parentheses
      1. 3655 in the example above

  3. impropers (deg)
    1. The value is reported in the stat.out file (see above):

      RMS_IMPROPERS (972)- Column 6 mean: 0.518672 Std Dev 0.0634619 (N= 30 )

    2. The number of improper dihedral angles is the value in parentheses
      1. 972 in the example

PROCHECK

  1. Run PROCHECK on each individual structure file from the list of 30 Best Structures

    procheck structure filename 1.0

  2. Calculate averages and standard deviations using awk script PRO_AVE.nawk

    awk -f PRO_AVE.nawk *dg*.sum

    1. *dg*.sum is the list of procheck output *.sum files for each of the 30 Best Structures
    2. Make sure the wild-card form only selects for the correct 30 structures
    3. Typical output:

      Average rama RMS: 83.5167 +/- 2.5297 For 30 values.
      Average G-factors RMS: -0.107333 +/- 0.0287441 For 30 values.
      Average bad contacts RMS: 17.5 +/- 2.53969 For 30 values.

    4. Overall G-Factor:

      Average G-factors RMS: -0.107333 +/- 0.0287441 For 30 values.

    5. %Residues in most favorable region of Ramachandran plot

      Average rama RMS: 83.5167 +/- 2.5297 For 30 values.

    6. Number of bad contacts/100 residues

      Average bad contacts RMS: 17.5 +/- 2.53969 For 30 values.

      1. Number needs to be scaled to 100 residues. Multiply by 100/number of residues.

  3. Calculate averages and standard deviations using awk script PRO_AVE_2.nawk

    awk -f PRO_AVE_2.nawk *dg*.out

    1. *dg*.out is the list of procheck output *.out files for each of the 30 Best Structures
    2. Make sure the wild-card form only selects for the correct 30 structures
    3. Typical output:

      Average hbond RMS: 1.04333 +/- 0.0558768 For 30 values.

    4. The above result is the H-bond energy
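The scaling of bad contacts to 100 residues described above is a single multiplication, and it can be combined with parsing the "Average ... RMS" lines that PRO_AVE.nawk prints. The sketch below assumes the output format shown in this section. The function names and the residue count of 140 are hypothetical. Because scaling by a constant also scales the spread, the same factor is applied to the standard deviation.

```python
import re

# Pattern inferred from the sample output, for example:
#   Average bad contacts RMS: 17.5 +/- 2.53969 For 30 values.
AVERAGE_LINE = re.compile(
    r"Average\s+(?P<name>.+?)\s+RMS:\s+(?P<mean>\S+)\s+\+/-\s+(?P<std>\S+)"
    r"\s+For\s+(?P<n>\d+)\s+values"
)


def parse_averages(text):
    """Return {name: (mean, std, n)} for every 'Average <name> RMS' line."""
    results = {}
    for line in text.splitlines():
        m = AVERAGE_LINE.search(line)
        if m:
            results[m.group("name")] = (
                float(m.group("mean")),
                float(m.group("std")),
                int(m.group("n")),
            )
    return results


def per_100_residues(value, n_residues):
    """Scale a per-structure count to a count per 100 residues."""
    if n_residues <= 0:
        raise ValueError("n_residues must be positive")
    return value * 100.0 / n_residues


if __name__ == "__main__":
    text = "Average bad contacts RMS: 17.5 +/- 2.53969 For 30 values."
    mean, std, n = parse_averages(text)["bad contacts"]
    n_residues = 140  # hypothetical protein length; use your own
    print(per_100_residues(mean, n_residues), per_100_residues(std, n_residues))
```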

Atomic RMS Differences

  1. Use the rmsdiff.inp and rmsdiff_sec.inp XPLOR scripts
  2. Run the calculation twice, once with the average structure (SA) as the reference structure and once with the restrained minimized average structure (SA)r as the reference structure
    1. Evaluate ($1 = "reference structure name")
  3. Include the 30 best structures, the average structure and the restrained minimized average structure names in the for loop
  4. Exclude regions of poorly defined structure in the select statements for both the fit and rms calculations
  5. Only include regions of α-helix and/or β-sheets in the rmsdiff_sec.inp for both the fit and rms calculations
  6. Use the awk script average.nawk to calculate the average and standard deviation of the 30 Best Structures

    awk -f average.nawk back_all.rms

    1. back_all.rms contains the rmsd for backbone atoms
    2. back_sec.rms is the corresponding output file from rmsdiff_sec.inp
    3. Make sure only the 30 Best Structures are listed in the file
      1. Need to delete results for average and restrained minimized average structure from back_all.rms
      2. Make backup first
      3. Record (SA)r vs SA rmsd values before deleting
      4. Typical output:

        Average RMS: 0.408533 +/- 0.056732 For 30 values.

  7. Use the awk script average.nawk again for the all-atom rmsd

    awk -f average.nawk all_all.rms

    1. all_all.rms contains the rmsd for all atoms
    2. all_sec.rms is the corresponding output file from rmsdiff_sec.inp
    3. Make sure only the 30 Best Structures are listed in the file
      1. Need to delete results for average and restrained minimized average structure
      2. Make backup first
      3. Record (SA)r vs SA rmsd values before deleting
      4. Typical output

        Average RMS: 1.00183 +/- 0.0711051 For 30 values.
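The averaging step, together with the check that only 30 structures remain in the file, can be sketched as follows. This is not a port of average.nawk. It assumes that each line of the .rms file ends with the rmsd value as its last whitespace-separated field, and it uses the sample standard deviation (n - 1 in the denominator), which may differ from the convention average.nawk uses. The function name rms_summary is hypothetical.

```python
import statistics


def rms_summary(lines):
    """Return (mean, sample standard deviation, count) of the last numeric field on each line."""
    values = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        try:
            values.append(float(fields[-1]))
        except ValueError:
            continue  # skip headers or other non-numeric lines
    if not values:
        raise ValueError("no numeric values found")
    mean = statistics.mean(values)
    # A single value has no sample standard deviation; report 0.0 in that case.
    std = statistics.stdev(values) if len(values) > 1 else 0.0
    return mean, std, len(values)
```

If the count returned for back_all.rms or all_all.rms is not 30, the entries for the average and restrained minimized average structures have probably not been removed yet.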

Example Structural Statistics Table

Check it Out!