Bioscreen Bioinformatics Scripts

Bioscreen stores bioinformatics data from several sources to simplify access and to support computational identification of possible functional annotations. To load the results of a bioinformatics tool into Bioscreen, the results must be packaged as a bioinformatics tarball (a .tgz file). These tarballs are created by running the Perl scripts provided on the cluster.

Running the Perl scripts on workstations

The scripts depend on several Perl modules. If a script fails because a module is missing, run sudo cpan and, at the cpan prompt, type install ModuleName, substituting the name of the missing module. The following modules are known to be required:

  • Cwd
  • Net::HTTP
  • File::Path
  • XML::LibXML
  • Archive::Tar
  • File::Basename
  • LWP::UserAgent
  • HTTP::Request::Common
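
Before running the scripts, it can save time to check which of these modules are already installed. The following is a minimal sketch and not part of the Bioscreen scripts: the function name check_perl_modules and the PERL_BIN variable are made up for illustration. It relies on the standard perl -MModule -e 1 idiom, which exits with a non-zero status when the module cannot be loaded.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Bioscreen scripts): report which of the
# required Perl modules cannot be loaded by the interpreter named in PERL_BIN.
PERL_BIN="${PERL_BIN:-perl}"

check_perl_modules() {
    missing=""
    for module in Cwd Net::HTTP File::Path XML::LibXML Archive::Tar \
                  File::Basename LWP::UserAgent HTTP::Request::Common; do
        # "perl -MName -e 1" loads the module and runs an empty program;
        # it exits non-zero if the module is not installed.
        if ! "$PERL_BIN" "-M$module" -e 1 >/dev/null 2>&1; then
            missing="$missing $module"
        fi
    done
    if [ -n "$missing" ]; then
        echo "Missing modules:$missing"
        return 1
    fi
    echo "All required modules are installed."
}
```

Calling check_perl_modules prints either a confirmation or the names of the modules that still need to be installed through cpan.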

Dali

  1. In a terminal, type:
bioscreen-info-dali.pl $pdbID

Here, $pdbID is the Protein Data Bank (PDB) ID of the structure whose results you want to retrieve. The script creates a file called Dali.tgz containing the results of the Dali run. While it runs, it should print informational messages about the PDB files it downloads. If no download messages appear, inspect the contents of Dali.tgz to confirm that the script completed successfully.
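
One way to check the contents of the tarball is to list its entries with tar. The sketch below is an illustration only; check_tarball is a made-up name. Because this page does not say which files Dali.tgz should contain, the helper only confirms that the archive is readable and not empty.

```shell
#!/bin/sh
# Hypothetical helper: list the files in a tarball and fail if the archive
# is missing, unreadable, or empty.
check_tarball() {
    tarball="$1"
    if [ ! -f "$tarball" ]; then
        echo "$tarball: no such file" >&2
        return 1
    fi
    # -t lists entries, -z decompresses gzip, -f names the archive file.
    if ! listing=$(tar -tzf "$tarball" 2>/dev/null); then
        echo "$tarball: not a readable gzip-compressed tar archive" >&2
        return 1
    fi
    if [ -z "$listing" ]; then
        echo "$tarball: archive is empty" >&2
        return 1
    fi
    echo "$listing"
}
```

For example, check_tarball Dali.tgz prints one archive entry per line, or an error message if the file cannot be read.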

STRING

  1. In a terminal, type:
bioscreen-info-string.pl $uniprotID

Here, $uniprotID is the UniProtKB accession number for your protein. The script will ask you which protein to use; if only one protein appears in the list, just press Enter. The script produces a file called STRING.tgz containing the results.
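
A mistyped accession number is an easy mistake to make, so it may help to check the format before running the script. The sketch below is an illustration only; is_uniprot_accession is a made-up name, and the pattern is the accession format published in the UniProt documentation. Whether bioscreen-info-string.pl does its own validation is not stated on this page.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the argument looks like a UniProtKB
# accession number (6 or 10 characters, upper case).
is_uniprot_accession() {
    # -E: extended regex, -q: no output, -x: the whole line must match.
    # LC_ALL=C keeps the [A-Z] ranges limited to upper-case ASCII letters.
    printf '%s\n' "$1" | LC_ALL=C grep -Eqx \
        '[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}'
}

# Example use (commented out so that nothing runs when the file is sourced):
# if is_uniprot_accession "$uniprotID"; then
#     bioscreen-info-string.pl "$uniprotID"
# else
#     echo "$uniprotID does not look like a UniProtKB accession" >&2
# fi
```

The check covers the format only. It cannot tell whether the accession actually exists in UniProtKB.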

PFP

  1. In a terminal, type:
bioscreen-info-pfp.pl $sequence

Here, $sequence is the amino acid sequence of your protein. This script produces a tarball called PFP.tgz.
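
Sequences are often stored in FASTA files rather than typed by hand. The sketch below shows one way to turn such a file into a single string suitable for the command line. It assumes that the script expects the raw sequence as one argument, as the command above suggests, and that the file holds a single record. The function name fasta_to_sequence and the file name protein.fasta are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the sequence from a single-record FASTA file
# as one unbroken string.
fasta_to_sequence() {
    # Drop header lines (those starting with ">"), then delete spaces, tabs,
    # carriage returns and newlines so the residues form one string.
    grep -v '^>' "$1" | tr -d ' \t\r\n'
}

# Example use (commented out so that nothing runs when the file is sourced):
# bioscreen-info-pfp.pl "$(fasta_to_sequence protein.fasta)"
```

Quoting the command substitution keeps the sequence as a single argument.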