Tutorials
- Using XWindows at your desktop: how to install and use X Window programs on both Macintosh and PC platforms.
- JBPC Sequencing Tutorial: walks the user through the sequencing informatics process, from downloading the sequence
files to creating an assembly using the JBPC pipeline straw.
- Submitting your data to GMOD: instructions on how to prepare your sequence assembly data to submit to the GMOD administrator, as well as the list of files and information necessary to create the GMOD dataset.
- Unix Tutorials: helpful links for learning Unix.
Software
- The JBPC Software page lists bioinformatic software available in the Center and has links to get you started.
- A guide to Using BLAST on the Center's bioware computers, with a brief introduction to how BLAST works.
Pipelines
JBPC genomics research routinely uses a series of bioinformatics
programs to analyze and assemble genomic data. The most common
of these series of steps have been combined into
"pipelines": programs that reduce a series of steps to one or a
few commands. Using these pipelines makes it easier to include sequencing projects in the GMOD interface. Each of these pipeline scripts is available to all users of the JBPC computing facility.
Please follow the pipeline links for detailed information on how to use each one.
- straw: takes the sequencing reads, trims vector and
low-quality data, and assembles the reads into contigs. The output
files should be reviewed with consed and then used directly in
make_scaffold. [Programs included: phred, phd2fasta, cross_match, phrap]
- consed:
not a pipeline program, but an editor for sequence
assemblies. Use it for QA/QC of sequences and assemblies before
running make_scaffold.
- make_scaffold: combines the straw output files containing
contig information and scaffolds them into supercontigs. The output
data from make_scaffold can be provided to the GMOD administrator for
import into the GMOD and GBrowse system. [Programs included: stripx.pl,
makemates.pl, goBambus, toArachne.pl]
- arachne2gbrowse: the final pipeline that imports sequencing data
into GMOD. This script is used by the GMOD administrator, using the
output files from make_scaffold provided by the project researcher.
- assemble_cdna: an alternative initial assembly script, similar to
straw, but optimized for cDNA / EST projects that include very large
numbers of reads. [Programs included: phred, phd2fasta, lucy,
zapping.awk, cross_match, stripx.pl, tgicl]
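Before starting a run, it can help to confirm that the programs a pipeline wraps are reachable from your shell. The sketch below assumes the component programs are found on your PATH under the names listed above, which this page does not state. check_tools is a hypothetical helper, not part of the JBPC pipelines.

```shell
# check_tools: hypothetical helper, not part of the JBPC pipelines.
# Prints each named program that is NOT found on PATH and returns
# non-zero if any of them are missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, check_tools phred phd2fasta cross_match phrap would report any of the programs listed for straw that your shell cannot find.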
Vector Library
The Sogin Lab has been collecting vector and splice
files that are commonly used in sequencing. These are available
for anyone to use in trimming their sequences for assembly and
analysis. The map images are listed below, but the fasta files
are on the xraid, and you can copy them to your workspace. For
example, to find the exact filename you want, such as the pcr4 topo
vector file, and then copy it to your current directory (note the
final dot):
$ ls /xraid/bioware/linux/seqinfo/vectors
$ cp /xraid/bioware/linux/seqinfo/vectors/pcr4topo_vector.fa .
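If the vector directory listing is long, filtering it by part of the name can be quicker than reading it in full. The following is a minimal sketch: find_vector is a hypothetical helper (not an installed command), and it assumes the vector directory path shown above.

```shell
# find_vector: hypothetical helper for narrowing the vector listing.
# $1 = text to look for in the filename (matched case-insensitively)
# $2 = directory to search; defaults to the vector directory named above
find_vector() {
  pattern=$1
  dir=${2:-/xraid/bioware/linux/seqinfo/vectors}
  ls "$dir" | grep -i -- "$pattern"
}
```

Calling find_vector pcr4 prints only the filenames containing "pcr4"; the chosen file can then be copied with cp as shown above.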
Useful Programs
Several smaller programs are useful for
analyzing sequences; some of them are listed
below:
- SEALS - a very useful set of utilities for sequencing and manipulating sequence files. It is provided through NCBI; follow the Documentation link to see the list.
- EMBOSS - a second
set of applications, like SEALS, for data manipulation. You will
find a surprising number of useful tools. See Overview for the list.
- Seqinfo/bin - a directory of useful bioinformatics and sequence manipulation tools created by Sue Huse. All of these scripts can be used from your home directory (e.g. $> countbp myseq.fa)
- ren - renames a series of files in your
current directory based on pattern-matching. Use * and ? in
specifying the old names, and #1, #2, etc. to refer to them in the new
name.
A simple example, to rename .fa files to .fasta:
> ren "*.fa" "#1.fasta"
Or, to change from fas.pep and fas.cds to pep.fas and cds.fas:
> ren "*.fas.*" "#1.#2.fas"
- stripx.pl - takes an input fasta or contig file and changes all x or X's in the sequence to n or N.
- defline_organism.pl - moves the Genus species in a fasta definition line to the beginning of the text and encloses it in [ ]s.
- defline_jgi.pl - creates a full NCBI-style definition line for sequences downloaded from JGI.
- measure_polyAT - returns a very approximate measure of the length of polyA and polyT tails in a fasta file.
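Where Seqinfo/bin is not available, two of the simpler tools above can be approximated in plain shell. This is a minimal sketch under stated assumptions, not the original scripts: mask_x_as_n and rename_suffix are hypothetical names, the x-to-n mapping is assumed to preserve case, and only fasta input (not contig files) is handled.

```shell
# mask_x_as_n: a rough stand-in for stripx.pl (not the script itself).
# Reads fasta from the named files or from stdin; leaves definition
# lines (those starting with ">") untouched and maps x->n and X->N
# on every other line. Assumption: case is preserved.
mask_x_as_n() {
  sed '/^>/!y/xX/nN/' "$@"
}

# rename_suffix OLD NEW: plain-shell equivalent of  ren "*.OLD" "#1.NEW"
# applied to the files in the current directory.
rename_suffix() {
  old=$1
  new=$2
  for f in *."$old"; do
    [ -e "$f" ] || continue   # no match: the glob stays literal, skip it
    mv -- "$f" "${f%."$old"}.$new"
  done
}
```

For example, mask_x_as_n myseq.fa > myseq.masked.fa writes a masked copy (both file names are placeholders), and rename_suffix fa fasta does the same job as the first ren example above.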