Welcome to the Nosema locustae Genome Project at the Marine Biological Laboratory


Data Use Policy: Read Before Proceeding


WARNING: Our goal is to make the genome sequence of Nosema locustae rapidly and broadly available. We will post assembly contig consensus sequences on our web site. We will continue to post our contigs-in-progress, but our sequence data on this web site or in future submissions to NCBI's unfinished genomes are NOT published. We ask only that users of our unpublished sequence data respect our intention to publish the complete, accurate, and annotated sequence of the Nosema locustae genome, along with our large-scale interpretation of that genome sequence, when it nears or reaches closure. Examples of large-scale interpretations include identification of regions of evolutionary conservation across the genome and/or individual chromosomes, multiple gene phylogenies, and identification of extensive sets of genomic features such as genes, gene families, biochemical and metabolic pathways, repeat structures, and G+C content. To avoid any possible misunderstanding, please email Dr. Mitchell L. Sogin (sogin@mbl.edu) or Dr. Hilary G. Morrison (morrison@mbl.edu).

With this single exception, the pre-publication data are available for all other scientific uses (e.g., array design, design of PCR amplifications). For example, the data may be used to "jump-start" biological experimentation, including PCR amplification and publication of confirmed gene sequences. Users of this information are encouraged to share their results with the Nosema locustae sequencing project. Except for the large-scale analyses described above, it is not our intent to restrict publication of a handful of genes extracted from this database.

1. Publications of results should refer to the specific version or date of the data release and should include the following citation: "Nosema locustae Genome Project, Marine Biological Laboratory at Woods Hole, funded by NSF award number 0135272".

2. Users are free to download the Nosema locustae genome sequences for their own use and that of others within their research environment. This data release policy must be displayed to all users of the downloaded data.

3. We explicitly request that users not serve our Nosema locustae genome sequence data to external users. For an exception to our request, you must receive explicit written permission from Dr. Mitchell L. Sogin.

The Project

The genome project and its companion course, Advances in Genomics and Bioinformatics, are funded by the National Science Foundation. The sequencing is carried out in the laboratory of Mitchell L. Sogin in the Josephine Bay Paul Center for Comparative Molecular Biology and Evolution at the Marine Biological Laboratory, Woods Hole, Massachusetts, U.S.A.

Additional support is provided by the G. Unger Vetlesen Foundation and the W.M. Keck Foundation.

Co-principal investigators are Dr. Charles R. Vossbrinck (Connecticut Agricultural Experiment Station, New Haven CT) and Dr. Hilary G. Morrison (MBL). Team members include Amy Crump, Jillian Ward, Steve Biller, Erica Lasek-Nesselquist, Ulandt Kim, and Bruce Luders. Dr. Andrew G. McArthur, Dr. Bertil Olsson, and Dr. Laura Shulman provide bioinformatics and computational support.


Members of the phylum Microsporidia are highly successful, obligate intracellular eukaryotic parasites with remarkably small genomes of 2.3-20 megabases (Mb). They infect nearly all the invertebrate phyla, most commonly arthropods, and all five classes of vertebrates. In agricultural settings, some microsporidial species, including Nosema locustae (Nolo Bait), serve as biological control agents of pests, while others are significant pathogens of beneficial insects. They are most prevalent in Lepidoptera, Diptera, and Coleoptera. Microsporidial species with predictable host ranges are attractive candidates for biological control. Knowing the molecular determinants of invasion, pathogenesis, and transmission will identify targets for the control of insect pests or for the prevention and treatment of microsporidial infections in beneficial insects.

The significance of these enigmatic protists for molecular evolution stems from initial phylogenetic analyses based upon comparisons of rRNA and elongation factor genes that place Microsporidia basal to most other eukaryotes. This placement is contradicted by analyses of other genes, which suggest a close affinity between Microsporidia and Fungi. Equally intriguing is the potential influence of intracellular lifestyles on genome evolution. Microorganisms are usually members of complex communities that require interactions between large numbers of genomes. In contrast, associations between a host and an intracellular parasite require genome communication between only two organisms. The cell interior is a specialized ecological niche that exerts selective pressure on the parasite genome. This may lead to rapid diversification and acute specialization. Potential exists for co-evolution by pathway complementation and gene transfer in eukaryote-eukaryote interactions, such as between intracellular protists and animal hosts.


Plasmid libraries containing inserts of 1-3 kbp were constructed in pUC18 and other vectors. Sequence data are obtained from each end of the insert using M13F and M13R primers. At present, we use an Applied Biosystems 3730XL capillary sequencer, which generates reads of over 850 high-quality bases (PHRED score >20). Sequences are assembled using the ARACHNE assembler, and ORFs are called on stable contigs over 3000 bp using Glimmer 2.0 and CRITICA.
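
The two numeric thresholds mentioned above (PHRED score >20 for a high-quality base, and contigs over 3000 bp for ORF calling) can be illustrated with a minimal Python sketch. This is not the project's actual pipeline: the function names and the in-memory data layout (a list of per-base quality scores, a dictionary of contig sequences) are assumptions made purely for illustration. The only outside fact used is the standard PHRED definition, Q = -10 log10(P), under which a score of 20 corresponds to an estimated base-call error probability of 1 in 100.

```python
from typing import Dict, List


def phred_to_error_probability(q: float) -> float:
    """Convert a PHRED quality score to its estimated base-call error probability."""
    # Standard PHRED definition: Q = -10 * log10(P), so P = 10 ** (-Q / 10).
    return 10 ** (-q / 10)


def count_high_quality_bases(phred_scores: List[int], threshold: int = 20) -> int:
    """Count bases whose PHRED score is strictly greater than the threshold."""
    # Hypothetical helper: the threshold of 20 mirrors the "PHRED score >20" in the text.
    return sum(1 for q in phred_scores if q > threshold)


def select_contigs_for_orf_calling(
    contigs: Dict[str, str], min_length: int = 3000
) -> Dict[str, str]:
    """Keep only contigs strictly longer than min_length base pairs."""
    # Hypothetical helper: mirrors "contigs over 3000 bp"; contig stability is not modeled.
    return {name: seq for name, seq in contigs.items() if len(seq) > min_length}


if __name__ == "__main__":
    # Toy data, not real project output.
    read_qualities = [10, 20, 21, 30, 40]
    print(count_high_quality_bases(read_qualities))

    toy_contigs = {"contig_a": "A" * 3000, "contig_b": "A" * 3001}
    print(sorted(select_contigs_for_orf_calling(toy_contigs)))
```

Both comparisons are strict, so a base with a score of exactly 20 or a contig of exactly 3000 bp is excluded, matching the wording "over" in the description above.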