The Marine Biological Laboratory, Woods Hole: The Josephine Bay Paul Center for Comparative Molecular Biology and Evolution
JBPC Parallel Processing

Our Beowulf Clusters have been designed for high-throughput bioinformatic analyses, with an emphasis upon sequence similarity searching (e.g. BLAST, HMMER) and phylogenetics (e.g. MrBayes, PAUP). Whenever possible, custom software has been developed to utilize many processors per analysis. For example, BLAST and PAUP can use every processor on a Cluster to complete analyses quickly.

The Beowulf Clusters support a subset of the available software and local databases. We are happy to develop support for additional software upon request (contact biocomp@lists.mbl.edu). Even if we are unable to develop parallelization for the requested software, the ability to batch submit many single-processor jobs to the Clusters can be very productive for your research.

All software (e.g. PAUP, PUZZLE, BLAST) must be run using the specialized cluster programs listed under Cluster Software below. For example, you will be unable to simply type paup to use PAUP. If you attempt to run your own executable, it will not migrate to a free processor and will instead interfere with the Master Node. Such jobs will be terminated, so please seek assistance from biocomp@lists.mbl.edu before running your own executables or scripts on the Beowulf Clusters. Users with advanced parallel programming skills are welcome to develop their own direct-to-cluster tools, in consultation with the JBPC BioComp Staff.

Available Beowulf Clusters

The most up-to-date list of available computing facilities is maintained as the login message on evol5.mbl.edu.

Queue Management and Tools

The clusters use Sun Grid Engine for job scheduling and execution. If a cluster is in full use, your jobs will wait in a queue until processors become available; some load balancing is applied to keep usage even between users. All of the programs include an email notification feature, so you can log in, start a job that waits in the queue until processors are available, and receive an email when it is done. To view the current cluster status, use the qstat command at the command line. To kill one of your jobs, use the qdel <jobId> command (the jobId is the leftmost column in the output of qstat). If you need help managing your queue, please contact biocomp@lists.mbl.edu.
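As a rough illustration of the qstat and qdel workflow described above, the following shell sketch picks out the job IDs that belong to one user from qstat-style output. The helper name job_ids_for_user, the sample listing, and the user names are made up for illustration. The column layout is an assumption based on the default Sun Grid Engine listing, where the job ID is the first column and the user name is the fourth.

```shell
#!/bin/sh
# Hypothetical helper: print the job IDs (leftmost qstat column) owned by one user.
# It reads qstat-style text on standard input, so it can be tried without a cluster.
job_ids_for_user() {
    # Keep only rows whose first field is numeric (skips the header and rule lines)
    # and whose fourth field matches the requested user name.
    awk -v user="$1" '$1 ~ /^[0-9]+$/ && $4 == user { print $1 }'
}

# Illustrative sample only; this is NOT real output from the JBPC clusters.
sample_qstat_output() {
    cat <<'EOF'
job-ID  prior   name       user         state submit/start at     queue          slots ja-task-ID
--------------------------------------------------------------------------------------------------
    101 0.55500 blastjob   alice        r     04/21/2011 10:12:01 all.q@node01       1
    102 0.55500 paupjob    bob          qw    04/21/2011 10:15:42                    1
    103 0.50000 blastjob2  alice        qw    04/21/2011 10:20:10                    1
EOF
}

# List the sample jobs that belong to the (made-up) user "alice".
sample_qstat_output | job_ids_for_user alice
```

On a cluster you would pipe real output instead, for example qstat | job_ids_for_user "$USER", and then pass one of the printed IDs to qdel.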


Cluster Software

For help using these commands, type them in without arguments to see the help screen or email biocomp@lists.mbl.edu.

Note that programs with an interactive interface, such as the PHYLIP programs or PUZZLE, are difficult to implement on the clusters. To run them, we have opted to use command files that contain the keystrokes you would have used interactively. An example is available for distance tree construction and bootstrapping.
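The general idea behind a command file can be sketched in shell: the keystrokes are stored one per line and replayed on the program's standard input. This is only a generic illustration. The function fake_interactive_program is a made-up stand-in rather than a PHYLIP program, the file name and answers are invented, and the exact command file format expected by the cluster tools is not described here.

```shell
#!/bin/sh
# Hypothetical stand-in for an interactive, menu-driven program (not PHYLIP).
# It reads one answer per line from standard input, as a user would type them.
fake_interactive_program() {
    read -r infile
    read -r option
    read -r confirm
    echo "input file: $infile"
    echo "option toggled: $option"
    echo "confirmed: $confirm"
}

# A command file holds the same keystrokes, one per line.
cmdfile=$(mktemp)
printf '%s\n' 'alignment.phy' 'D' 'Y' > "$cmdfile"

# Redirecting the file to standard input replays the keystrokes without a terminal.
fake_interactive_program < "$cmdfile"
rm -f "$cmdfile"
```

Because the program never needs a terminal, the same run can wait in a queue and execute unattended.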

  • clusterblast - a tool to BLAST many sequences that are contained in a multisequence fasta file using many processors and the NCBI blastall executable. Several BLAST databases are available. Contact biocomp@lists.mbl.edu to add your own custom BLAST database.

  • clustercritica - high-throughput gene finding using CRITICA and many processors.

  • clustermeme - high-throughput consensus motif finding using MEME and many processors.

  • clusterhmmpfam - a tool to compare many protein sequences that are contained in a multisequence fasta file against a Pfam database of HMMs using many processors and the hmmpfam executable. The Pfam_fs and Pfam_ls databases are available. Contact biocomp@lists.mbl.edu to search with a subset of Pfam or with custom HMMs.

  • clusterestwisepfam - a tool to compare many nucleotide (cDNA, EST) sequences that are contained in a multisequence fasta file against a Pfam database of HMMs using many processors and the estwisedb executable. The Pfam_fs and Pfam_ls databases are available. Contact biocomp@lists.mbl.edu to search with a subset of Pfam or with custom HMMs.

  • clustergenewisepfam - a tool to compare many nucleotide (genomic) sequences that are contained in a multisequence fasta file against a Pfam database of HMMs using many processors and the genewisedb executable. The Pfam_fs and Pfam_ls databases are available. Contact biocomp@lists.mbl.edu to search with a subset of Pfam or with custom HMMs.

  • clustermodeltest - a tool to produce the modeltest3 score file rapidly by using many processors and the paup executable. Compatible with version 3.6 of Modeltest.

  • clusterpaup - a tool for heuristic-based searches for best trees or bootstrapping using many processors and the paup executable. Currently supports random taxa addition, but other taxa addition routines can be supported upon request.


  • clusterpauprestart - a tool to restart a clusterpaup job that unexpectedly stopped or produced incomplete output.

  • runpaup - a tool to process a single NEXUS file with the paup executable using a single processor. Use this program for any PAUP analysis types not supported by clusterpaup, or request that clusterpaup be expanded to support your analysis type.

  • clusterphylip - a tool to run any PHYLIP program on the cluster using a single processor. Requires command files.

  • clusterpuzzle - runs TREE-PUZZLE (formerly PUZZLE) on the Beowulf Clusters, using many processors, when feasible, via MPI. Requires command files.

  • clusterpuzzleboot - a tool to run PUZZLEBOOT on the Beowulf Clusters by performing SEQBOOT generation of bootstrap data and TREE-PUZZLE generation of distance matrices (the latter uses multiple processors). clusterphylip can then be used to generate trees from the distance matrices. Requires command files.

  • clusterfitch - a tool to use multiple processors when using FITCH to analyze multiple distance matrices. Requires command files.

  • clustermb3 - a tool to process a single NEXUS file by MrBayes, using four processors and MPI. We are working on additional speed-ups for MrBayes.

  • runmb3 - a tool to process a single NEXUS file by MrBayes, using one processor. Use this if the cluster appears network bound (i.e. slow MPI) or if four processors will not be available in the near future.

  • runclustalw - a tool to run CLUSTALW using one processor.
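Several of the tools above work on a multisequence fasta file with many processors. One common way to divide such a file is to deal its records out to a fixed number of chunk files, each of which could then be processed separately. The sketch below shows that idea only. It does not describe how clusterblast or the other tools are actually implemented, and the function name split_fasta, the file names, and the sequences are invented for illustration.

```shell
#!/bin/sh
# Hypothetical helper: split a multisequence fasta file into N chunk files,
# dealing records out round-robin.
# Usage: split_fasta INPUT N PREFIX   (writes PREFIX.0.fa ... PREFIX.(N-1).fa)
split_fasta() {
    awk -v n="$2" -v prefix="$3" '
        # A header line starts a new record; choose its chunk file.
        /^>/ { out = prefix "." (count % n) ".fa"; count++ }
        # Write header and sequence lines to the current chunk;
        # lines before the first header are ignored.
        out != "" { print > out }
    ' "$1"
}

# Small made-up input: three records, the second spanning two lines.
workdir=$(mktemp -d)
cat > "$workdir/seqs.fa" <<'EOF'
>seq1
ACGT
>seq2
GGCC
TTAA
>seq3
ATAT
EOF

# Split into two chunks and report how many records landed in each.
split_fasta "$workdir/seqs.fa" 2 "$workdir/chunk"
grep -c '^>' "$workdir/chunk.0.fa" "$workdir/chunk.1.fa"
```

Round-robin assignment keeps the number of records per chunk nearly equal, although it does not balance total sequence length.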

 
     
Supported by NIH, NSF, NASA, The Josephine Bay Paul and C. Michael Paul Foundation, W.M. Keck Foundation, G. Unger Vetlesen Foundation, and Ellison Medical Foundation.
Unless otherwise stated, all material © 2004 Bay Paul Center, MBL.