Our Beowulf Clusters have been designed for high-throughput
bioinformatic analyses, with an emphasis upon sequence similarity
searching (e.g. BLAST, HMMER) and phylogenetics (e.g. MrBayes, PAUP).
Whenever possible, custom software has been developed to utilize many
processors per analysis. For example, BLAST and PAUP can use every
processor on a Cluster to complete analyses quickly.
The Beowulf Clusters support a subset of the available
software and local databases.
We are happy to develop support for additional software upon request
(contact biocomp@lists.mbl.edu).
Even if we are unable to develop parallelization for the requested
software, the ability to batch submit many single-processor jobs to the
Clusters can be very productive for your research.
All software (e.g. PAUP, PUZZLE, BLAST) must be run through the
specialized cluster programs. For example, you cannot simply type paup
to use PAUP. Below is the current list of available programs. If you
attempt to run your own executable, it will not migrate to a free
processor and will instead interfere with the Master Node. Such jobs
will be terminated. Please seek assistance from biocomp@lists.mbl.edu
before running your own executables or scripts on the Beowulf Clusters.
Users with advanced parallel programming skills are welcome to develop
their own direct-to-cluster tools, in consultation with the JBPC
BioComp Staff.
Available Beowulf Clusters
The most up-to-date list of available computing facilities is maintained
as the login message on evol5.mbl.edu.
Queue Management and Tools
The clusters use Sun Grid Engine for job scheduling and execution. If a cluster is
in full use, your jobs will wait in a queue until processors become
available (some load balancing is applied to keep usage balanced between
users). All of the programs include an email notification feature,
so you can log in and start a job that will wait in the queue until
processors are available and then email you when it is done.
You can view the current cluster status at the command line with the qstat command. You can also kill one of your jobs with the qdel <jobId> command (the jobId is the leftmost column in the output of qstat). If you need help managing your queue, please contact biocomp@lists.mbl.edu.
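As a rough illustration of reading job IDs from the leftmost column of qstat output, here is a minimal Python sketch. The sample text, the user names, and the column order (job ID first, user fourth) are assumptions made for this example; the actual qstat output on the clusters may differ, so check it before relying on the column positions.

```python
from typing import List, Optional


def parse_job_ids(qstat_output: str, user: Optional[str] = None) -> List[str]:
    """Collect job IDs from the leftmost column of qstat-style text.

    Assumes each job row starts with a numeric job ID and that the
    user name is the fourth whitespace-separated field (hypothetical layout).
    """
    job_ids = []
    for line in qstat_output.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and the dashed separator row.
        if not fields or not fields[0].isdigit():
            continue
        # Optionally keep only rows that belong to the given user.
        if user is not None and (len(fields) < 4 or fields[3] != user):
            continue
        job_ids.append(fields[0])
    return job_ids


# Hypothetical sample, not captured from a real cluster.
SAMPLE = """\
job-ID  prior   name       user     state submit/start at     queue        slots
--------------------------------------------------------------------------------
   1201 0.55500 blastjob   alice    r     01/15/2008 09:12:44 all.q@node03     1
   1202 0.55500 paupboot   bob      qw    01/15/2008 09:20:10                  4
   1207 0.50500 mb3run     alice    qw    01/15/2008 10:01:02                  4
"""

for job_id in parse_job_ids(SAMPLE, user="alice"):
    # Each ID printed here is what you would pass to qdel to kill that job.
    print(job_id)
```

On a login node you would typically save the text printed by qstat to a string or file and pass it to parse_job_ids; an ID it returns is the value to give to qdel.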
Cluster Software
For help using these commands, type them in without arguments to see the help screen or email biocomp@lists.mbl.edu.
Note that programs with an interactive interface, such as the PHYLIP
programs or PUZZLE, are difficult to implement on the clusters. To
support them, we use command files that contain the keystrokes you would have typed interactively. Here is an example for distance tree construction and bootstrapping.
- clusterblast - a tool to BLAST many sequences that are contained in a multisequence fasta file using many processors and the NCBI blastall executable. Several BLAST databases are available. Contact biocomp@lists.mbl.edu to add your own custom BLAST database.
- clustercritica - high-throughput gene finding using CRITICA and many processors.
- clustermeme - high-throughput consensus motif finding using MEME and many processors.
- clusterhmmpfam - a tool to compare many protein sequences that are contained in a multisequence fasta file against a Pfam database of HMM models using many processors and the hmmpfam executable. The Pfam_fs and Pfam_ls databases are available. Contact biocomp@lists.mbl.edu to search with a subset of PFAM or with custom HMMs.
- clusterestwisepfam - a tool to compare many nucleotide (cDNA, ESTs) sequences that are contained in a multisequence fasta file against a Pfam database of HMM models using many processors and the estwisedb executable. The Pfam_fs and Pfam_ls databases are available. Contact biocomp@lists.mbl.edu to search with a subset of PFAM or with custom HMMs.
- clustergenewisepfam - a tool to compare many nucleotide (genomic) sequences that are contained in a multisequence fasta file against a Pfam database of HMM models using many processors and the genewisedb executable. The Pfam_fs and Pfam_ls databases are available. Contact biocomp@lists.mbl.edu to search with a subset of PFAM or with custom HMMs.
- clustermodeltest - a tool to produce the modeltest3 score file rapidly by using many processors and the paup executable. Compatible with version 3.6 of Modeltest.
- clusterpaup - a tool for heuristic-based searches for best trees or bootstrapping using many processors and the paup executable. Currently supports random taxa addition, but other taxa addition routines can be supported upon request.
- clusterpauprestart - a tool to restart a clusterpaup job that unexpectedly stopped or produced incomplete output.
- runpaup - a tool to process a single NEXUS file with the paup executable using a single processor. Use this program for any PAUP analysis types not supported by clusterpaup, or request that clusterpaup be expanded to support your analysis type.
- clusterphylip - a tool to run any PHYLIP program on the cluster using a single processor. Requires command files.
- clusterpuzzle - runs TREE-PUZZLE (formerly PUZZLE) on the Beowulf Clusters, using many processors, when feasible, via MPI. Requires command files.
- clusterpuzzleboot - a tool to run PUZZLEBOOT on the Beowulf Clusters by performing SEQBOOT generation of bootstrap data and TREE-PUZZLE generation of distance matrices (the latter uses multiple processors). clusterphylip can then be used to generate trees from the distance matrices. Requires command files.
- clusterfitch - a tool to use multiple processors when using FITCH to analyze multiple distance matrices. Requires command files.
- clustermb3 - a tool to process a single NEXUS file by MrBayes, using four processors and MPI. We are working on additional speed-ups for MrBayes.
- runmb3 - a tool to process a single NEXUS file by MrBayes, using one processor. Use this if the cluster appears network bound (i.e. slow MPI) or if four processors will not be available in the near future.
- runclustalw - a tool to run CLUSTALW using one processor.
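Several of the tools listed above take a multisequence fasta file and use many processors. How they divide the work internally is not documented here. Purely as an illustration of the general idea, and not as the actual implementation of clusterblast or any other listed tool, the following Python sketch reads FASTA records and deals them out round-robin into a chosen number of chunks. The function names and the sample sequences are hypothetical.

```python
from typing import Iterator, List, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(text: str) -> Iterator[Record]:
    """Yield (header, sequence) pairs from multisequence FASTA text."""
    header = None
    parts: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif header is not None:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def split_round_robin(records: List[Record], n_chunks: int) -> List[List[Record]]:
    """Deal records into n_chunks lists, one record at a time."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    chunks: List[List[Record]] = [[] for _ in range(n_chunks)]
    for i, record in enumerate(records):
        chunks[i % n_chunks].append(record)
    return chunks


# Hypothetical input: five short sequences.
SAMPLE_FASTA = """\
>seq1
ACGTACGT
ACGT
>seq2
GGGTTT
>seq3
TTTAAACCC
>seq4
CCGG
>seq5
ATAT
"""

records = list(read_fasta(SAMPLE_FASTA))
for n, chunk in enumerate(split_round_robin(records, 2)):
    # Each chunk could in principle be handed to a separate processor.
    print(n, [header for header, _ in chunk])
```

Round-robin assignment is only one possible choice; it keeps chunk sizes within one record of each other but ignores sequence length, so a real scheduler might balance by total residues instead.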