The Marine Biological Laboratory
Sequencing Informatics

Tutorials

 

Pipelines

JBPC genomics research routinely uses a series of bioinformatics programs to analyze and assemble genomic data.  The most common of these sequences of steps have been combined into "pipelines": scripts that automate a multi-step process so that it runs in one or a few steps.  Using these pipelines also makes it easier to include sequencing projects in the GMOD interface.  Each pipeline script is available to all users of the JBPC computing facility.

Please follow the pipeline links for detailed information on the use of these pipelines. 

  • straw:  takes the sequencing reads, trims vector and low-quality data, and assembles the reads into contigs.  Review the output files with consed, then use them directly in make_scaffold. [Programs included:  phred, phd2fasta, cross_match, phrap]

  • consed:  not a pipeline program, but an editor for sequence assemblies.  Use it for QA/QC of sequences and assemblies before running make_scaffold.

  • make_scaffold: combines the straw output files containing contig information and scaffolds them into supercontigs.  The output data from make_scaffold can be provided to the GMOD administrator for import into the GMOD and GBrowse system. [Programs included: stripx.pl, makemates.pl, goBambus, toArachne.pl]

  • arachne2gbrowse: the final pipeline that imports sequencing data into GMOD.  This script is used by the GMOD administrator, using the output files from make_scaffold provided by the project researcher.

  • assemble_cdna: an alternative initial assembly script, similar to straw, but optimized for cDNA / EST projects that include very large numbers of reads. [Programs included:  phred, phd2fasta, lucy, zapping.awk, cross_match, stripx.pl, tgicl]
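
Each pipeline above depends on several external programs, so it can help to confirm that they are on your PATH before starting a long run. The following is a minimal sketch and not part of the JBPC pipelines: check_tools is a hypothetical helper name, and only the program names passed to it come from the lists above.

```shell
# Hypothetical pre-flight helper (not a JBPC script): report which of the
# named programs cannot be found on the current PATH.
check_tools() {
    missing=0
    for tool in "$@"; do
        # command -v succeeds only if the shell can resolve the name
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    # Return 0 when every tool was found, 1 otherwise
    return "$missing"
}

# Example call, using the programs listed for straw:
#   check_tools phred phd2fasta cross_match phrap
```

The function only checks that each name resolves; it does not verify versions or that the programs run correctly.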

 

Vector Library

The Sogin Lab has been collecting vector and splice files that are commonly used in sequencing.  Anyone may use them to trim sequences for assembly and analysis.  The map images are listed below, but the fasta files are on the xraid, and you can copy them to your workspace.  For example, to list the available files so you can find the exact filename you want (here the pcr4 topo vector file), and then to copy it to your current directory (NB the final dot):
$ ls /xraid/bioware/linux/seqinfo/vectors
$ cp /xraid/bioware/linux/seqinfo/vectors/pcr4topo_vector.fa .
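
If you copy vector files often, the two commands above can be wrapped in a small function. This is only a sketch: copy_vector and the VECTOR_DIR variable are hypothetical names introduced here, and the default directory is the xraid path given above.

```shell
# Hypothetical convenience wrapper (not a JBPC script).
# VECTOR_DIR defaults to the xraid vector directory named in the text.
VECTOR_DIR=${VECTOR_DIR:-/xraid/bioware/linux/seqinfo/vectors}

copy_vector() {
    name=$1
    dest=${2:-.}    # destination defaults to the current directory
    if [ ! -f "$VECTOR_DIR/$name" ]; then
        # Unknown filename: show what is available, as "ls" would
        echo "no such vector file: $name; available files:" >&2
        ls "$VECTOR_DIR" >&2
        return 1
    fi
    cp "$VECTOR_DIR/$name" "$dest"
}

# Example call:
#   copy_vector pcr4topo_vector.fa
```

When the filename is wrong, the function lists the directory rather than failing silently, which mirrors the ls-then-cp steps described above.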

 

Useful Programs 

Several smaller programs are also useful for analyzing sequences.  The most commonly used are listed below:

  • SEALS -  a very useful set of utilities for sequencing and manipulating sequence files.  It is provided through NCBI; follow the Documentation link to see the list.

  • EMBOSS - a second set of applications, like SEALS, for data manipulation.  You will find a surprising number of useful tools.  See Overview for the list.

  • Seqinfo/bin - a directory of useful bioinformatics and sequence manipulation tools created by Sue Huse.  All of these scripts can be run from your home directory (e.g. $ countbp myseq.fa).

  • ren - renames a series of files in your current directory based on pattern matching.  Use * and ? to specify the old names, and #1, #2, etc. in the new name to refer to the text that each wildcard matched.

    A simple example, to rename .fa files to .fasta:
    $ ren "*.fa" "#1.fasta"
    Or, to change from fas.pep and fas.cds to pep.fas and cds.fas:
    $ ren "*.fas.*" "#1.#2.fas"

  • stripx.pl - takes an input fasta or contig file and changes all x or X's in the sequence to n or N.

  • defline_organism.pl - moves the Genus species name in a fasta definition line to the beginning of the text and encloses it in square brackets.

  • defline_jgi.pl - creates a full NCBI-style definition line for sequences downloaded from JGI.

  • measure_polyAT - returns a very approximate measure of the length of polyA and polyT tails in a fasta file.
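
To make the behavior of two of the tools above concrete, here is a rough sketch of equivalent operations in plain shell. These are stand-ins written for illustration, not the actual ren or stripx.pl code: the function names rename_ext and mask_x_to_n are hypothetical, and the choice to leave definition lines (those starting with ">") untouched is an assumption based on the description of stripx.pl as changing x's "in the sequence".

```shell
# Hypothetical stand-in for:  ren "*.OLD" "#1.NEW"
# Renames every file in the current directory ending in .OLD to end in .NEW.
rename_ext() {
    old_ext=$1
    new_ext=$2
    for f in *."$old_ext"; do
        [ -e "$f" ] || continue    # no matching files: skip the literal pattern
        # ${f%.ext} strips the old extension, playing the role of #1
        mv -- "$f" "${f%."$old_ext"}.$new_ext"
    done
}

# Hypothetical stand-in for stripx.pl: print the file with x -> n and X -> N
# on sequence lines only. Assumption: definition lines are left unchanged.
mask_x_to_n() {
    sed '/^>/!y/xX/nN/' "$1"
}

# Example calls:
#   rename_ext fa fasta
#   mask_x_to_n myseq.fa > myseq.masked.fa
```

Unlike ren, rename_ext handles only the single-wildcard extension case; the two-wildcard example (fas.pep to pep.fas) would need a second parameter expansion.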

Supported by NIH, NSF, NASA, The Josephine Bay Paul and C. Michael Paul Foundation, W.M. Keck Foundation, G. Unger Vetlesen Foundation, and Ellison Medical Foundation.
Unless otherwise stated, all material © 2004 Bay Paul Center, MBL.