Databases maintained at the JBPC come in two forms. The first group
consists of databases produced by scientists at the JBPC, such as GiardiaDB.
Some of these are in the public domain while others consist of
unpublished data protected by the MBL firewall and/or password access.
The second group consists of locally maintained copies of external data,
such as GenBank non-redundant nucleotide and protein BLAST databases.
Databases Produced by JBPC Scientists
- GiardiaDB: The Giardia lamblia Genome and Gene Expression Database.
- AntonosporaDB: The Antonospora locustae Genome Project
- GenProtEC: E. coli Genome and Proteome Database.
- Spraguea lophii Genome Survey
- ICOMM: The International Census of Marine Microbes
- Micro*Scope: A database of microbial diversity
- GMOD.MBL.EDU: our cluster of Advanced Genome Browsers, used to build
online resources that combine and analyze genome, gene expression, and
annotation data for genome-scale sequence projects. It is the primary
bioinformatics engine for the third phase of the GiardiaDB project,
incorporating genome assembly, genome annotation, gene expression, and
high-throughput phylogenetic information into the Generic Model Organism
Database (GMOD) paradigm. The GMOD server can support any ongoing
genome-level analysis in the JBPC and is currently used for Antonospora,
Blochmannia, Trypanosoma, EST, rotifer, and many other projects. See the
GMOD Home Page for the complete list of projects and databases.
Local Copies of NCBI, Pfam, and Other External Databases
The JBPC automatically updates its local databases monthly, so
BLAST, HMMER, and other analyses can be reliably performed on a number
of our computers, including the Beowulf Clusters. The Server and Beowulf Cluster
details list which computers maintain copies of these databases.
Instructions are provided below for maintaining these databases on your
personal computer.
| Database | Local Path |
| --- | --- |
| MITOP (not updated every 30 days) | /blastdb/mitop |
| E. coli peptides (via NCBI) | /blastdb/ecoli.aa |
| GenBank non-redundant peptides* | /blastdb/nr (excluding environmental); /blastdb/nr_plus_env (including environmental); /blastdb/env_nr (environmental only) |
| GenBank RefSeq peptides | /blastdb/refseq_protein |
| GenBank non-redundant nucleotides* | /blastdb/nt (excluding environmental); /blastdb/nt_plus_env (including environmental); /blastdb/env_nt (environmental only) |
| PFAM HMM library, local alignment models | /blastdb/Pfam_fs |
| PFAM HMM library, global alignment models | /blastdb/Pfam_ls |
| SwissProt (via NCBI) | /blastdb/swissprot |
| GMOD Databases (BLAST databases associated with our various GMOD Projects) | Varies according to available GMOD data. View contents of /blastdb or contact gmod@lists.mbl.edu for details. |
| EukDB, the predicted proteins from 34 diverse eukaryotes for which genome projects have been performed | /blastdb/EukDB |
| ProkDB, the predicted proteins from 26 prokaryotes for which genome projects have been performed | /blastdb/ProkDB |
| SsuDB, a subset of the NCBI environmental and nt databases that was generated by using BLAST to pull out matches to 16S and 18S rRNA genes | /blastdb/SsuDB |
| RefEuks, a collection of peptides from reference genomes for high-throughput eukaryotic phylogeny investigations | /blastdb/RefEuks.fa |
| InterPro databases | (used by InterProScan) |
| PROSITE databases | /blastdb/prosite.dat |
| NCBI Taxonomy data | (used by SEALS) |
| SEALS GI data | (used by SEALS) |
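As an illustration of how one of the local paths above might be used, here is a minimal sketch of a protein search against a local database. It assumes the NCBI BLAST+ command-line tools (`blastp`) are installed; this page does not say which BLAST release the JBPC machines run, so adjust accordingly. The helper name `blast_local`, the `DRY_RUN` switch, the E-value cutoff, and the file names `query.fa` and `hits.tsv` are all hypothetical.

```shell
# Hypothetical helper: search a protein query against a local BLAST database.
# By default (DRY_RUN=1) it only prints the command so it can be reviewed first.
blast_local() {
    db_path="$1"   # e.g. /blastdb/swissprot, taken from the table above
    query="$2"     # FASTA file of protein sequences (assumed name)
    out="$3"       # output file for tabular hits (assumed name)
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "blastp -db ${db_path} -query ${query} -out ${out} -evalue 1e-5 -outfmt 6"
    else
        # BLAST+ options: -outfmt 6 writes tab-separated hits
        blastp -db "${db_path}" -query "${query}" -out "${out}" -evalue 1e-5 -outfmt 6
    fi
}

# Print the command that would be run against the local SwissProt copy.
blast_local /blastdb/swissprot query.fa hits.tsv
```

Setting `DRY_RUN=0` before the call would run the search instead of printing it.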
Maintaining Copies of these Databases on your Personal Computer
Any computer within the MBL firewall can obtain copies of these databases by running rsync against WINTER.MBL.EDU. To see the databases that are available on WINTER.MBL.EDU, enter the following at the command line:
rsync winter.mbl.edu::
The output should look similar to this:
yourprompt% rsync winter.mbl.edu::
mitop mitop fasta file and blast ready database (964 KB)
pfam Pfam_ls & Pfam_fs HMMS (compressed)(~200MB)
swissprot SwissProt blastdb (compressed)(~150kb)
ecoli NCBI E.coli proteins blastdb (compressed) (~1 MB)
nt NCBI non-redundant nucleotide (nt) blastdb (compressed)(~2.5 GB)
nr NCBI non-redundant protein (nr) blastdb (compressed) (~750 MB)
BioDB nr,nt,swissprot,Pfam,etc (~3.5GB compressed)
linuxDB linux formated ecoli,nr,nt,swissprot ready for blasting (~6.4GB)
seals necessary seals databases (gi,taxonomy,etc) (~120MB compressed)
interpro interpro database (~500MB compressed)
Each line lists a download module, its description, and its approximate
size. Some modules are compressed and will take up more disk space than
listed once uncompressed.
To download a module, rsync to WINTER.MBL.EDU as follows:
rsync -avuz winter.mbl.edu::[modulename] [destination on your machine]
For example, to download GenBank NR to your /blast directory:
rsync -avuz winter.mbl.edu::nr /blast
Information about rsync can be obtained at the rsync website or by using its man page.
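To keep several modules current on a personal machine, the rsync command shown above can be wrapped in a small script. The following is a minimal sketch, not the JBPC's own update procedure: the function name `sync_modules` and the `DRY_RUN` switch are hypothetical, and the module names come from the listing above.

```shell
#!/bin/sh
# Sketch: mirror several rsync modules from winter.mbl.edu into one directory.
# DRY_RUN=1 (the default here) prints the commands instead of running them.
RSYNC_HOST="winter.mbl.edu"

sync_modules() {
    dest="$1"      # local destination directory, e.g. /blast
    shift          # remaining arguments are module names
    for module in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "rsync -avuz ${RSYNC_HOST}::${module} ${dest}"
        else
            # Same flags as in the example above; report failures and continue.
            rsync -avuz "${RSYNC_HOST}::${module}" "${dest}" \
                || echo "warning: failed to sync ${module}" >&2
        fi
    done
}

# Preview the transfers for two of the smaller modules.
sync_modules /blast swissprot ecoli
```

Run with `DRY_RUN=0` from inside the MBL firewall to perform the transfers. Scheduling the script from cron, for example once a month, would approximate the monthly refresh described earlier.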