How to do assignment 1

Step 1: Create the data.

 % mummer TBCDC1551.1con TBH37rv.1con > TB.mum.out 

Step 2: Write a Perl parser to chew through TB.mum.out.

This would be much easier if the MUMmer output were sorted numerically. Because it's not, you have to go over the list of data twice: once to load and sort it, and once to scan it. Here's how.

First, we load the data into a list:

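use strict;
use warnings;

my $file = 'TB.mum.out';    # the output file from Step 1
my @list;
open(my $fh, '<', $file) or die "was not able to open $file: $!\n";
while (<$fh>) {
    next if />/;            # skip the '>' header lines
    s/^\s+//;               # strip leading whitespace
    my ($x1, $x2, $l) = split(/\s+/);
    next unless defined $l; # skip blank lines
    push(@list, "$x1 $x2 $l");
}
close($fh);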
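Gluing the three fields back into a string means they have to be split again later. As an alternative, here is a minimal sketch that keeps each match as an array reference holding the same three fields (`$x1`, `$x2`, `$l`). The sub name `parse_mums` is hypothetical, and the sketch assumes, like the loop above, that header lines contain `>` and data lines hold three whitespace-separated numbers.

```perl
use strict;
use warnings;

# Hypothetical helper: read MUMmer-style lines from a filehandle and
# return a list of array refs [x1, x2, len].
sub parse_mums {
    my ($fh) = @_;
    my @mums;
    while (my $line = <$fh>) {
        next if $line =~ />/;                    # skip sequence header lines
        my ($x1, $x2, $len) = split ' ', $line;  # ' ' also drops leading whitespace
        next unless defined $len;                # skip blank or short lines
        push @mums, [$x1, $x2, $len];
    }
    return @mums;
}

# Usage: pass the file name on the command line, e.g. TB.mum.out.
if (@ARGV) {
    open(my $fh, '<', $ARGV[0]) or die "Could not open $ARGV[0]: $!\n";
    my @mums = parse_mums($fh);
    close($fh);
    printf "%d matches read\n", scalar @mums;
}
```

With array references, sorting needs no string parsing inside the comparison: `sort { $a->[0] <=> $b->[0] } @mums`.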

The data is now in @list, but it still needs to be sorted.

Use this:

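@list = sort numerically @list;

# Compare rows by their first field only, so <=> sees a plain number.
sub numerically {
    my ($pos_a) = split(' ', $a);
    my ($pos_b) = split(' ', $b);
    return $pos_a <=> $pos_b;
}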
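The `numerically` sub matters because Perl's `sort` compares strings by default, so "100" would come before "20". This small, self-contained illustration (the values in `@positions` are made up) shows the difference:

```perl
use strict;
use warnings;

my @positions = (100, 20, 3);

# Without a comparison block, sort compares strings: "100" lt "20" lt "3".
my @as_strings = sort @positions;

# With <=>, sort compares numbers: 3 < 20 < 100.
my @as_numbers = sort { $a <=> $b } @positions;

print "string sort:  @as_strings\n";   # string sort:  100 20 3
print "numeric sort: @as_numbers\n";   # numeric sort: 3 20 100
```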

Now I'm going to go over that list again and find cases where one MUM ends exactly one nucleotide before the next MUM begins, i.e. the two matches are separated by a single mismatched base (a candidate SNP).

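my $insert_size = 1;    # a SNP leaves a one-base gap between MUMs
my $old_pos;            # first position after the previous MUM
my @list2;              # SNP positions
my $count_snps = 0;

foreach my $row (@list) {
    my ($x1, $x2, $l) = split(/\s+/, $row);

    # There is no previous MUM to compare against on the first row.
    if (defined $old_pos) {
        if (($x1 - $old_pos) == $insert_size) {
            $count_snps++;
            push(@list2, $old_pos);    # $old_pos is the mismatched base
        }
    }

    $old_pos = $x1 + $l;
}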
I'm running a counter ($count_snps) that tracks the total number of SNPs, which I'll report in the final output.
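To see why the comparison in the loop should be against 1: a MUM starting at position 1 with length 10 covers positions 1 to 10, so `$old_pos` becomes 1 + 10 = 11. If the next MUM starts at 12, then 12 - 11 = 1, meaning exactly one base (position 11) separates them, and that base is the one recorded. Below is a minimal sketch of the same scan as a reusable sub; the name `find_snps` and the array-ref input `[x1, x2, len]` are hypothetical, and the example coordinates are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: return positions of single-base gaps between
# consecutive MUMs. Each element of @$mums is [x1, x2, len]; like the
# loop above, only x1 and len are used.
sub find_snps {
    my ($mums) = @_;
    my @snps;
    my $old_pos;                       # first base after the previous MUM
    for my $mum (sort { $a->[0] <=> $b->[0] } @$mums) {
        my ($x1, undef, $len) = @$mum;
        push @snps, $old_pos if defined $old_pos && $x1 - $old_pos == 1;
        $old_pos = $x1 + $len;
    }
    return @snps;
}

# MUM 1..10, mismatch at 11, MUM 12..20, then a 5-base gap before 26..29.
my @snps = find_snps([[1, 1, 10], [12, 12, 9], [26, 26, 4]]);
printf "Total SNPs: %d\n", scalar @snps;   # Total SNPs: 1
print "SNPs at: @snps\n";                  # SNPs at: 11
```

Because one position is pushed every time the counter goes up, `scalar @list2` always equals `$count_snps`. A stricter check, not part of the approach above, would also require a one-base gap in the second sequence (using `$x2`); otherwise the gap may not be a simple substitution.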

Let's look at the whole program next.

