Our Beowulf Clusters have been designed for high-throughput
bioinformatic analyses, with an emphasis upon sequence similarity
searching (e.g. BLAST, HMMER) and phylogenetics (e.g. MrBayes, PAUP).
Whenever possible, custom software has been developed to utilize many
processors per analysis. For example, BLAST and PAUP can use every
processor on a Cluster to complete analyses quickly.
The Beowulf Clusters support a subset of the available software and local databases. We are happy to develop support for additional software upon request (contact biocomp@lists.mbl.edu).
Even if we are unable to develop parallelization for the requested
software, the ability to batch submit many single-processor jobs to the
Clusters can be very productive for your research.
All software (e.g. PAUP, PUZZLE, BLAST, etc.) must be run using
specialized programs. For example, you will be unable to simply type paup
to use PAUP. Below is the current list of available programs. If you
attempt to run your own executable, it will not migrate to a free
processor and will instead interfere with the Master Node. Such jobs
will be terminated. Please seek assistance from biocomp@lists.mbl.edu
before running your own executables or scripts on the Beowulf Clusters.
Users with advanced parallel programming skills are welcome to develop
their own direct-to-cluster tools, in consultation with the JBPC
BioComp Staff.
Available Beowulf Clusters
- CLUSTER1.MBL.EDU (Beowulf Cluster 1) has been retired! All
of its Dual Pentium III 1 GHz computers with 512 MB of RAM are being
reconfigured to augment Beowulf Cluster 3 in the near future.
- CLUSTER2.MBL.EDU (Beowulf Cluster 2) is a Beowulf Cluster running NPACI Rocks release 3.1.0, a version of Linux. CLUSTER2 consists of 37 Dual Athlon MP 1900+ servers with 512 MB of RAM each for a total of 74 available processors. We will be retiring this cluster soon as we are removing nodes, reconfiguring them, and folding them into Beowulf Cluster 3.
- CLUSTER3.MBL.EDU (Beowulf Cluster 3) is a Beowulf Cluster running an
in-house custom Linux implementation based on Red Hat Fedora Core
3, Sun Grid Engine 6 and MPICH2. Cluster 3 consists of 30 Dual Opteron
246 (2 GHz) servers with 2 GB of RAM each for a total of 60 processors.
Queue Management and Tools
The clusters run under a queue management tool. If a cluster is
in full use, your jobs will wait in a queue until processors become
available (some load balancing keeps usage even between
users). We added an email notice feature to all of the programs,
so you can log in and start a job that will wait in the queue until
processors are available and then email you when it is done. You
can view the current cluster status at the command line by using the qstat command. You can also kill one of your jobs at the command line by passing its jobid to the qdel command. If you need help managing your queue, please contact biocomp@lists.mbl.edu.
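As a rough illustration of working with qstat and qdel from a script, the sketch below lists job IDs and cancels one job. It is only a sketch under stated assumptions: the helper names (parse_job_ids, my_job_ids, kill_job) are hypothetical, the sample listing is made up, and the parser assumes a Sun Grid Engine style listing in which each job row begins with a numeric job ID. The -u option is also assumed to be available in the installed qstat. The output on a given cluster may differ.

```python
import subprocess

# Made-up sample of a Grid Engine style listing (an assumption, not real output).
SAMPLE_QSTAT = """\
job-ID  prior   name       user   state submit/start at     queue         slots
--------------------------------------------------------------------------------
    231 0.55500 blastjob   alice  r     07/11/2005 10:15:02 all.q@node01  4
    232 0.00000 paupjob    alice  qw    07/11/2005 10:20:45               1
"""


def parse_job_ids(qstat_output):
    """Return job IDs from qstat-style text.

    Assumes every job row starts with an integer job ID; header and
    separator lines are skipped because their first field is not numeric.
    """
    job_ids = []
    for line in qstat_output.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit():
            job_ids.append(int(fields[0]))
    return job_ids


def my_job_ids(user):
    """Run qstat for one user and return that user's job IDs."""
    result = subprocess.run(
        ["qstat", "-u", user],  # -u is assumed to be supported here
        capture_output=True, text=True, check=True,
    )
    return parse_job_ids(result.stdout)


def kill_job(job_id):
    """Cancel one job by passing its job ID to qdel."""
    subprocess.run(["qdel", str(job_id)], check=True)
```

Only parse_job_ids can be tried away from the cluster, for example on SAMPLE_QSTAT; my_job_ids and kill_job need the real qstat and qdel commands on the path.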
Cluster Software
For help using these commands, type them in without arguments to see the help screen or email biocomp@lists.mbl.edu.
Note that programs with an interactive interface, such as PHYLIP
programs or PUZZLE, are difficult to implement on the clusters. To do
so, we have opted to use command files that contain the keystrokes you would have used interactively. Here is an example for distance tree construction and bootstrapping.
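Independent of the example referenced above, the general command-file mechanism can be sketched in a few lines: the keystrokes are written one per line to a file, and the interactive program reads that file in place of the keyboard. This is a minimal sketch, not how the cluster tools are implemented. The function names are hypothetical, the keystrokes "D" and "Y" are placeholders rather than a real PHYLIP or PUZZLE menu sequence, and a small stand-in program is used instead of a real menu-driven executable.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def write_command_file(path, keystrokes):
    """Write one menu answer per line, in the order you would type them."""
    Path(path).write_text("".join(key + "\n" for key in keystrokes))


def run_with_command_file(argv, command_file):
    """Run a program with its standard input taken from command_file."""
    with open(command_file) as answers:
        result = subprocess.run(
            argv, stdin=answers, capture_output=True, text=True, check=True
        )
    return result.stdout


# Stand-in for a menu-driven program: it reports each answer it reads.
STAND_IN = [
    sys.executable,
    "-c",
    "import sys\nfor line in sys.stdin:\n    print('got', line.strip())",
]


def demo():
    """Write a two-keystroke command file and feed it to the stand-in."""
    with tempfile.TemporaryDirectory() as tmp:
        command_file = Path(tmp) / "commands.txt"
        write_command_file(command_file, ["D", "Y"])  # placeholder keystrokes
        return run_with_command_file(STAND_IN, command_file)
```

The same idea applies to a real interactive program: every prompt it would show must have a matching line in the command file, in the right order, or the run will stall or take the wrong menu path.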
- clusterblast - a tool to BLAST many sequences that are contained in a multisequence fasta file using many processors and the NCBI blastall executable. Several BLAST databases are available. Contact biocomp@lists.mbl.edu to add your own custom BLAST database.
- clustercritica - high-throughput gene finding using CRITICA and many processors.
- clustermeme - high-throughput consensus motif finding using MEME and many processors.
- clusterhmmpfam - a tool to compare many protein sequences that are contained in a multisequence fasta file against a Pfam database of HMM models using many processors and the hmmpfam executable. The Pfam_fs and Pfam_ls databases are available. Contact biocomp@lists.mbl.edu to search with a subset of PFAM or with custom HMMs.
- clusterestwisepfam - a tool to compare many nucleotide (cDNA, ESTs) sequences that are contained in a multisequence fasta file against a Pfam database of HMM models using many processors and the estwisedb executable. The Pfam_fs and Pfam_ls databases are available. Contact biocomp@lists.mbl.edu to search with a subset of PFAM or with custom HMMs.
- clustergenewisepfam - a tool to compare many nucleotide (genomic) sequences that are contained in a multisequence fasta file against a Pfam database of HMM models using many processors and the genewisedb executable. The Pfam_fs and Pfam_ls databases are available. Contact biocomp@lists.mbl.edu to search with a subset of PFAM or with custom HMMs.
- clustermodeltest - a tool to produce the modeltest3 score file rapidly by using many processors and the paup executable. Compatible with version 3.6 of Modeltest.
- clusterpaup - a tool for heuristic-based searches for best trees or bootstrapping using many processors and the paup executable. Currently supports random taxa addition, but other taxa addition routines can be supported upon request.
- clusterpauprestart - a tool to restart a clusterpaup job that unexpectedly stopped or produced incomplete output.
- runpaup - a tool to process a single NEXUS file with the paup executable using a single processor. Use this program for any PAUP analysis types not supported by clusterpaup, or request that clusterpaup be expanded to support your analysis type.
- clusterphylip - a tool to run any PHYLIP program on the cluster using a single processor. Requires command files.
- clusterpuzzle - runs TREE-PUZZLE (formerly PUZZLE) on the Beowulf Clusters, using many processors, when feasible, via MPI. Requires command files.
- clusterpuzzleboot - a tool to run PUZZLEBOOT on the Beowulf Clusters by performing SEQBOOT generation of bootstrap data and TREE-PUZZLE generation of distance matrices (the latter uses multiple processors). clusterphylip can then be used to generate trees from the distance matrices. Requires command files.
- clusterfitch - a tool to use multiple processors when using FITCH to analyze multiple distance matrices. Requires command files.
- clustermb3 - a tool to process a single NEXUS file by MrBayes, using four processors and MPI. We are working on additional speed-ups for MrBayes.
- runmb3 - a tool to process a single NEXUS file by MrBayes, using one processor. Use this if the cluster appears network bound (i.e. slow MPI) or if four processors will not be available in the near future.
- runclustalw - a tool to run CLUSTALW using one processor.
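
Several of the tools above take a multisequence fasta file and spread the work over many processors. The internals of those tools are not described here, so the following is only a sketch of one common approach, under the assumption that the input is split into roughly equal groups of sequences that can each be handled by a separate processor. The function names parse_fasta and split_into_chunks are hypothetical.

```python
def parse_fasta(text):
    """Parse multisequence FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, parts = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(parts)))
            header, parts = line[1:], []
        elif header is not None:
            parts.append(line)
    if header is not None:
        records.append((header, "".join(parts)))
    return records


def split_into_chunks(records, n_chunks):
    """Deal records round-robin into at most n_chunks non-empty groups."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    chunks = [[] for _ in range(n_chunks)]
    for index, record in enumerate(records):
        chunks[index % n_chunks].append(record)
    # Drop empty groups when there are fewer records than chunks.
    return [chunk for chunk in chunks if chunk]
```

Round-robin dealing keeps the groups within one sequence of each other in size. It does not balance by sequence length, which can matter when a few sequences are much longer than the rest.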