The Marine Biological Laboratory
Beowulf Clusters

Our Beowulf Clusters are designed for high-throughput bioinformatic analyses, with an emphasis on sequence similarity searching (e.g. BLAST, HMMER) and phylogenetics (e.g. MrBayes, PAUP). Wherever possible, we have developed custom software that spreads a single analysis across many processors. BLAST and PAUP, for example, can use every processor on a Cluster to finish analyses quickly.

The Beowulf Clusters support a subset of the available software and local databases. We are happy to add support for additional software upon request (contact biocomp@lists.mbl.edu). Even if we cannot parallelize a requested program, batch-submitting many single-processor jobs to the Clusters can still greatly speed up your research.
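One common way to spread a sequence search such as BLAST over many processors is to split the multi-sequence query file into chunks, one per processor, and search each chunk independently. The Python sketch below illustrates that idea under stated assumptions; it is not the code behind the cluster tools, and the names read_fasta, split_balanced and to_fasta are hypothetical. It balances chunks by total sequence length, assuming that search time grows roughly with query length.

```python
import heapq
from typing import List, Optional, Tuple

Record = Tuple[str, str]  # (header without the leading '>', sequence)


def read_fasta(text: str) -> List[Record]:
    """Parse multi-sequence FASTA text into (header, sequence) records."""
    records: List[Record] = []
    header: Optional[str] = None
    seq_parts: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq_parts)))
            header, seq_parts = line[1:], []
        else:
            seq_parts.append(line)
    if header is not None:
        records.append((header, "".join(seq_parts)))
    return records


def split_balanced(records: List[Record], n_chunks: int) -> List[List[Record]]:
    """Greedy split: longest sequences first, each to the currently lightest chunk."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    chunks: List[List[Record]] = [[] for _ in range(n_chunks)]
    heap = [(0, i) for i in range(n_chunks)]  # (residues assigned so far, chunk index)
    heapq.heapify(heap)
    for rec in sorted(records, key=lambda r: len(r[1]), reverse=True):
        load, idx = heapq.heappop(heap)
        chunks[idx].append(rec)
        heapq.heappush(heap, (load + len(rec[1]), idx))
    return chunks


def to_fasta(records: List[Record]) -> str:
    """Serialize a chunk back to FASTA text (e.g. one input file per processor)."""
    return "".join(f">{h}\n{s}\n" for h, s in records)
```

For example, splitting records of lengths 8, 3, 6 and 1 over two processors gives two chunks of 9 residues each. Simple round-robin assignment would put 14 residues on one processor and 4 on the other, leaving one processor busy long after the other has finished.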

All software (e.g. PAUP, PUZZLE, BLAST) must be run through specialized wrapper programs; for example, you cannot simply type paup to use PAUP. The currently available programs are listed below. If you try to run your own executable, it will not migrate to a free processor and will instead interfere with the Master Node. Such jobs will be terminated, so please consult biocomp@lists.mbl.edu before running your own executables or scripts on the Beowulf Clusters. Users with advanced parallel programming skills are welcome to develop their own direct-to-cluster tools in consultation with the JBPC BioComp Staff.

Available Beowulf Clusters

  • CLUSTER1.MBL.EDU (Beowulf Cluster 1) has been retired. Its dual Pentium III 1 GHz computers, each with 512 MB of RAM, will be reconfigured to augment Beowulf Cluster 3.

  • CLUSTER2.MBL.EDU (Beowulf Cluster 2) is a Beowulf Cluster running NPACI Rocks release 3.1.0, a Linux-based cluster distribution. CLUSTER2 consists of 37 dual Athlon MP 1900+ servers with 512 MB of RAM each, for a total of 74 processors. This cluster will be retired soon: we are removing its nodes, reconfiguring them, and folding them into Beowulf Cluster 3.

  • CLUSTER3.MBL.EDU (Beowulf Cluster 3) is a Beowulf Cluster running a custom in-house Linux distribution based on Red Hat Fedora Core 3, with Sun Grid Engine 6 and MPICH2. Cluster 3 consists of 30 dual Opteron 246 (2 GHz) servers with 2 GB of RAM each, for a total of 60 processors.

Queue Management and Tools

The clusters run under a queue management system. If a cluster is fully in use, your jobs will wait in a queue until processors become available (some load balancing keeps usage fair between users). We added an email notification feature to all of the programs, so you can log in, start a job that waits in the queue until processors are free, and receive an email when it finishes. To view the current cluster status, run the qstat command at the command line. To kill one of your jobs, run the qdel command with its job ID. If you need help managing your queue, please contact biocomp@lists.mbl.edu.
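The load-balancing policy is not described in detail here, and Sun Grid Engine has its own configurable scheduling policies. Purely as an illustration of the idea, the Python sketch below assumes single-processor jobs and a simple fair-share rule: each free processor goes to the earliest-submitted pending job whose owner currently has the fewest running jobs. The dispatch function and its inputs are hypothetical, not the actual scheduler.

```python
from collections import Counter
from typing import Dict, List, Tuple

Job = Tuple[int, str]  # (job id, user); pending jobs are listed in submission order


def dispatch(pending: List[Job], running: Dict[str, int], free_slots: int) -> List[int]:
    """Pick which pending jobs to start on the free processors.

    Assumes every job needs exactly one processor. Each slot goes to the job whose
    owner has the fewest running jobs; ties go to the earliest submission, so each
    user's own jobs still start in FIFO order.
    """
    load = Counter(running)  # running jobs per user; unknown users count as 0
    queue = list(pending)
    started: List[int] = []
    while queue and len(started) < free_slots:
        # min() returns the first index with the smallest load, i.e. the earliest job
        best = min(range(len(queue)), key=lambda i: load[queue[i][1]])
        job_id, user = queue.pop(best)
        load[user] += 1
        started.append(job_id)
    return started
```

With alice already running two jobs and three processors free, `dispatch([(1, "alice"), (2, "alice"), (3, "alice"), (4, "bob"), (5, "carol")], {"alice": 2}, 3)` starts bob's and carol's jobs before alice's next one, returning `[4, 5, 1]`.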


Cluster Software

For help with any of these commands, run it without arguments to see its help screen, or email biocomp@lists.mbl.edu.

Note that programs with an interactive interface, such as the PHYLIP programs or PUZZLE, are difficult to implement on the clusters. To run them, we use command files that contain the keystrokes you would otherwise type interactively. Here is an example for distance tree construction and bootstrapping.
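Command files work because an interactive program that reads its answers from standard input cannot tell whether they come from a keyboard or from a file. The Python sketch below shows the idea with a small stand-in program rather than a real PHYLIP or PUZZLE menu; the prompts, the keystrokes ("100", "Y") and the helper names are hypothetical. On the clusters themselves, use the provided wrapper programs rather than running executables directly.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import List


def write_command_file(path: Path, keystrokes: List[str]) -> None:
    """Write one answer per line, exactly as it would be typed at each prompt."""
    path.write_text("".join(k + "\n" for k in keystrokes))


def run_with_command_file(cmd: List[str], command_file: Path) -> str:
    """Run an interactive program non-interactively, feeding the command file to stdin."""
    with command_file.open() as fh:
        result = subprocess.run(cmd, stdin=fh, capture_output=True, text=True, check=True)
    return result.stdout


# Hypothetical stand-in for an interactive menu program: it asks two questions.
TOY_PROGRAM = (
    "n = input('Number of replicates? ')\n"
    "ok = input('Accept these settings (Y/N)? ')\n"
    "print('running ' + n + ' replicates' if ok == 'Y' else 'aborted')\n"
)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cmd_file = Path(tmp) / "toy.cmd"
        write_command_file(cmd_file, ["100", "Y"])
        print(run_with_command_file([sys.executable, "-c", TOY_PROGRAM], cmd_file))
```

This is equivalent to the shell redirection `program < commandfile`: the order of lines in the file must match the order in which the program asks its questions.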

  • clusterblast - a tool to BLAST many sequences contained in a multi-sequence FASTA file, using many processors and the NCBI blastall executable. Several BLAST databases are available. Contact biocomp@lists.mbl.edu to add your own custom BLAST database.

  • clustercritica - high-throughput gene finding using CRITICA and many processors.

  • clustermeme - high-throughput consensus motif finding using MEME and many processors.

  • clusterhmmpfam - a tool to compare many protein sequences contained in a multi-sequence FASTA file against a Pfam database of HMM models, using many processors and the hmmpfam executable. The Pfam_fs and Pfam_ls databases are available. Contact biocomp@lists.mbl.edu to search with a subset of Pfam or with custom HMMs.

  • clusterestwisepfam - a tool to compare many nucleotide (cDNA, EST) sequences contained in a multi-sequence FASTA file against a Pfam database of HMM models, using many processors and the estwisedb executable. The Pfam_fs and Pfam_ls databases are available. Contact biocomp@lists.mbl.edu to search with a subset of Pfam or with custom HMMs.

  • clustergenewisepfam - a tool to compare many nucleotide (genomic) sequences contained in a multi-sequence FASTA file against a Pfam database of HMM models, using many processors and the genewisedb executable. The Pfam_fs and Pfam_ls databases are available. Contact biocomp@lists.mbl.edu to search with a subset of Pfam or with custom HMMs.

  • clustermodeltest - a tool to produce the modeltest3 score file rapidly using many processors and the paup executable. Compatible with Modeltest version 3.6.

  • clusterpaup - a tool for heuristic searches for the best trees, or for bootstrapping, using many processors and the paup executable. It currently supports random taxon addition; other taxon-addition routines can be supported upon request.

  • clusterpauprestart - a tool to restart a clusterpaup job that unexpectedly stopped or produced incomplete output.

  • runpaup - a tool to process a single NEXUS file with the paup executable using a single processor. Use this program for any PAUP analysis type not supported by clusterpaup, or request that clusterpaup be expanded to support your analysis type.

  • clusterphylip - a tool to run any PHYLIP program on the cluster using a single processor. Requires command files.

  • clusterpuzzle - runs TREE-PUZZLE (formerly PUZZLE) on the Beowulf Clusters, using many processors, when feasible, via MPI. Requires command files.

  • clusterpuzzleboot - a tool to run PUZZLEBOOT on the Beowulf Clusters: SEQBOOT generates the bootstrap data and TREE-PUZZLE generates the distance matrices (the latter step uses multiple processors). clusterphylip can then be used to build trees from the distance matrices. Requires command files.

  • clusterfitch - a tool to use multiple processors when using FITCH to analyze multiple distance matrices. Requires command files.

  • clustermb3 - a tool to process a single NEXUS file with MrBayes, using four processors and MPI. We are working on additional speed-ups for MrBayes.

  • runmb3 - a tool to process a single NEXUS file with MrBayes, using one processor. Use this if the cluster appears network-bound (i.e. MPI is slow) or if four processors are unlikely to become available soon.

  • runclustalw - a tool to run CLUSTALW using one processor.
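Several of the tools above (clusterpaup bootstrapping, clusterpuzzleboot, clusterfitch) gain their speed by spreading independent replicates or distance matrices over processors. How the wrappers do this internally is not described here; the Python sketch below is a hypothetical illustration that splits N bootstrap replicates into near-equal blocks, one per processor, and gives each block its own random seed so that no two blocks resample the data identically. The function name and the seeding scheme are assumptions.

```python
from typing import List, Tuple

Block = Tuple[int, int, int]  # (first replicate index, number of replicates, random seed)


def partition_replicates(n_replicates: int, n_procs: int, base_seed: int) -> List[Block]:
    """Split replicates into at most n_procs contiguous blocks of near-equal size.

    Block sizes differ by at most one. Each block gets a distinct seed
    (base_seed + block number); a real tool might derive seeds differently.
    """
    if n_replicates < 0 or n_procs < 1:
        raise ValueError("need n_replicates >= 0 and n_procs >= 1")
    size, extra = divmod(n_replicates, n_procs)
    blocks: List[Block] = []
    start = 0
    for p in range(n_procs):
        count = size + (1 if p < extra else 0)
        if count == 0:  # fewer replicates than processors: leave the rest idle
            break
        blocks.append((start, count, base_seed + p))
        start += count
    return blocks
```

For example, 100 replicates on 6 processors yields four blocks of 17 and two of 16. The per-block outputs (trees or distance matrices) would then be concatenated in block order before a consensus tree is built.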

 

 
     
Supported by NIH, NSF, NASA, The Josephine Bay Paul and C. Michael Paul Foundation, W.M. Keck Foundation, G. Unger Vetlesen Foundation, and Ellison Medical Foundation.
Unless otherwise stated, all material © 2004 Bay Paul Center, MBL.