Step 1: Create output comparing the following two genomes:
Step 2: Parse the output to find SNPs.
These genomes are very similar to each other and have relatively few nucleotide differences between them. Once you have generated the mummer output, examine it carefully and identify where single nucleotide differences occur between the genomes.
SPECIAL HINT: The mummer output DOES NOT come in numerical order. You will probably have to sort the data to get an answer.
One way to sort a Perl list numerically is:
@list = sort numerically (@list);

# Comparison routine: orders the values as numbers rather than as strings.
sub numerically {
    $a <=> $b;
}
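As a minimal sketch of sorting the matches and then scanning them for SNPs, the following assumes each match has already been parsed into a [reference start, query start, match length] triple. That three-value layout and the sample numbers are assumptions for illustration, not real mummer output. Two consecutive matches separated by exactly one base in both genomes are treated as a candidate SNP.

```perl
use strict;
use warnings;

# Hypothetical matches: [reference start, query start, match length].
# The values are made up for illustration and deliberately out of order.
my @matches = (
    [ 201, 201, 50 ],
    [ 1,   1,   99 ],
    [ 101, 101, 99 ],
);

# Sort by reference start position, numerically rather than as strings.
my @sorted = sort { $a->[0] <=> $b->[0] } @matches;

# Walk adjacent pairs; a one-base gap in both genomes suggests a SNP.
my @snps;
for my $i ( 1 .. $#sorted ) {
    my ( $prev, $next ) = @sorted[ $i - 1, $i ];
    my $ref_end   = $prev->[0] + $prev->[2];    # first reference base after the previous match
    my $query_end = $prev->[1] + $prev->[2];    # first query base after the previous match
    if ( $next->[0] - $ref_end == 1 && $next->[1] - $query_end == 1 ) {
        push @snps, $ref_end;                   # reference position of the unmatched base
    }
}

print "SNP at reference position $_\n" for @snps;
```

Larger gaps, or gaps of different sizes in the two genomes, would point to other kinds of polymorphism and would need separate handling.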
Step 3: "Publish" your results to a web page.
Report to the user the total number of SNPs found, and then create a
table that lists all of them. Alternatively, you don't have to limit
yourself to SNPs, which are _single_ nucleotide differences; you can
report all the polymorphisms to the user. You'll still need to know the
SNPs for the next part of the assignment, though.
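One possible shape for the report is sketched below. The record fields (position, base1, base2), the sample values, and the helper name snp_report are hypothetical, chosen only to show a total count followed by one table row per SNP.

```perl
use strict;
use warnings;

# Hypothetical SNP records: position plus the base seen in each genome.
my @snps = (
    { position => 100, base1 => 'A', base2 => 'G' },
    { position => 200, base1 => 'C', base2 => 'T' },
);

# Build the page as a string: a total count, then one table row per SNP.
sub snp_report {
    my @records = @_;
    my $html = "<html><body>\n";
    $html .= "<p>Total SNPs found: " . scalar(@records) . "</p>\n";
    $html .= "<table border=\"1\">\n";
    $html .= "<tr><th>Position</th><th>Genome 1</th><th>Genome 2</th></tr>\n";
    for my $snp (@records) {
        $html .= "<tr><td>$snp->{position}</td><td>$snp->{base1}</td><td>$snp->{base2}</td></tr>\n";
    }
    $html .= "</table>\n</body></html>\n";
    return $html;
}

print snp_report(@snps);
```

If the script runs as a CGI program rather than writing a static file, it would also need to print a "Content-type: text/html" header and a blank line before the page itself.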
The file containing the gene coordinates for Tuberculosis is:
TBCDC1551.ppt
Its format is the following:
Mycobacterium tuberculosis CDC1551, complete genome - 0..4403836
4187 proteins
Location Strand Length PID Gene Synonym Code COG Product
1..1524 + 508 13879042 MT0001 - - - chromosomal replication initiator protein DnaA
2052..3260 + 403 13879043 MT0002 - - - DNA polymerase III, beta subunit
3280..4437 + 386 13879044 MT0003 - - - recF protein
Most of this information you can ignore. The coordinates are the two
numbers at the start of each line of data, in the Location column,
written as start..end. For example, the first gene starts at
nucleotide 1 and ends at nucleotide 1524.
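A rough sketch of reading these coordinates and checking which gene, if any, contains a given position might look like the following. It uses the sample lines shown above in a here-document instead of opening the real file, and it assumes the fields are separated by whitespace. The helper name gene_at and the test positions are made up for illustration.

```perl
use strict;
use warnings;

# Sample lines copied from the format shown above, held in a here-document
# so the sketch runs without the real file.
my $sample = <<'END';
Mycobacterium tuberculosis CDC1551, complete genome - 0..4403836
4187 proteins
Location Strand Length PID Gene Synonym Code COG Product
1..1524 + 508 13879042 MT0001 - - - chromosomal replication initiator protein DnaA
2052..3260 + 403 13879043 MT0002 - - - DNA polymerase III, beta subunit
3280..4437 + 386 13879044 MT0003 - - - recF protein
END

# Keep only lines that begin with start..end; the header lines do not match.
my @genes;
for my $line ( split /\n/, $sample ) {
    next unless $line =~ /^(\d+)\.\.(\d+)\s+([+-])\s+\d+\s+\d+\s+(\S+)/;
    push @genes, { start => $1, end => $2, strand => $3, name => $4 };
}

# Return the name of the gene containing a position, or undef if none does.
sub gene_at {
    my ($position) = @_;
    for my $gene (@genes) {
        return $gene->{name}
            if $position >= $gene->{start} && $position <= $gene->{end};
    }
    return;
}

for my $position ( 1000, 1800, 2500 ) {
    my $name = gene_at($position);
    print "$position: ", ( defined $name ? $name : "not in a gene" ), "\n";
}
```

A linear scan like this is enough for a few thousand genes and a small number of SNPs; a sorted list with a binary search would be one alternative if it proved too slow.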