Two-Dimensional Arrays.

A two-dimensional array is a table, something like this:
 
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
0 A A A G T T G T A G T T T C T G T T
1 T T T T T A G T A G C A A T T G C T
2 G A A A A A G T A G C A G T T G C T
3 T T G C T G T A G C A G T A A C T
4 T T T T A T G T A T C A G C T G T T
5 C G A A C T G T T A C A T C G A T T
6 C G T T C T G T A A C A A A G A C T
7 A A T T T T G T A G C T A C T T A T
8 A A T T T A G T A T C A A A A A T A
9 A A A G C T G T A A C A A A A T C T
10 T C A T T T G T A C A G T C T G T T

The numbers on the edges are the indexes of the array.  (Just as Henry Ford allowed his Model T customers any color of car they wanted, as long as it was black, so Perl gives us a single choice for the start of an array: 0.)

We call out an item from the array by using two numbers in brackets: the row number, then the column number. If our array is called @sequences, then in the top row, $sequences[0][0] is "A", $sequences[0][1] is also "A", $sequence[0][6] is "G", and so forth.  Moving away from the top row, $sequences[8][12] is "A", and $sequences[9][6] if "G".

As with a one-dimensional array, one of the most straightforward things we can do is to print out the contents of an array. Here's a start:

for (my $row= 0; $row < @sequences; $row = $row+1) {
    print the contents of the row
}
Now we need to replace the comment in blue with Perl code. To print one item, we could use print $sequences[$row][$column]; so to print a row we would use something like
for (my $row= 0; $row < @sequences; $row = $row+1) {
    for (some number of $columns) {
        print $sequences[$row][$column];
    }
}
We need to vary the $column in the same way we varied $row. Perl doesn't make it as easy to get the number of columns in an array as the number of rows. In our example, since each row is a sequence, the number of columns is the length of the sequence. Let's assume that all sequences are the same length, and the length is stored in a variable $sequence_length.
for (my $row= 0; $row < @sequences; $row = $row+1) {
    for (my $column = 0; $column < $sequence_length; $column = $column+1) {
        print $sequences[$row][$column];
    }
}
We're missing one piece: we never go to a new line.  Somewhere in there we need a "\n", or our printing will run off the edge of the screen, or the paper.

If $sequence_length is not too big, we can add the "\n" at the end of each row:

for (my $row= 0; $row < @sequences; $row = $row+1) {
    for (my $column = 0; $column < $sequence_length; $column = $column+1) {
        print $sequences[$row][$column];
    }
   print "\n";
}
The consensus.pl program has just this piece of code. Consensus.pl calculates a consensus sequence by looking at each column of a table of sequences. If 80% of the bases are A, for instance, then the program will print a capital A in that column of the consensus sequence. If at least 50% of the bases in the column are A, then the program prints a lowercase a in that column. (Otherwise it prints a period.)  Download NtcA-binding.txt to try it out. (This is the data from page 5 of the microbiology notes for Scenario 1, with a few errors.)

SQ6. In consensus.pl, why does the calculate_consensus_sequence routine have $column in the outside for loop, and $row on the inside for loop, the opposite of the printing routine?

   for (my $column = 0; $column < $sequence_length; $column = $column+1) {
      for (my $row = 0; $row < @sequences; $row = $row+1) {
      ...
         }
      }
The BlastN.pl program (which you can test with the DG47.nt and M29081.nt data files) uses two dimensional arrays extensively, but its printing routines are rather complicated.  What it doesn't have is a routine to print out two-dimensional arrays like the scoring matrix.  If we had such a routine, we might be able to watch BlastN as it calculated the scores, and so get a better idea of how it works.

SQ7. Write a subroutine that will print out part of the scoring matrix. It should look something like this:

sub print_score {        
# prints out scoring matrix
# It's called by print_score(a,b,c,d)
# where a is assigned to $first_target
#       b is assigned to $last_target
#       c is assigned to $first_query
#       d is assigned to $last_query
# The table produced should look something like what's shown below
# (you don't have to print labels, just the numbers)

                #                 Query
                #        1st                     last
                # T  1st score score score  ...  score
                # a      score score score  ...  score
                # r      score score score  ...  score
                # g      score score score  ...  score
                # e       ...   ...   ...   ...   ...
                # t last score score score  ...  score

my ($first_target,$last_target,$first_query,$last_query) = @_;
   foreach $t (  CONTINUE THIS LINE
      foreach $q (  CONTINUE THIS LINE
         print  CONTINUE THIS LINE
      }
      print  CONTINUE THIS LINE
   }
}
A few complications make this subroutine a little harder than the printing routine we discussed above: These hints have been incorporated into the suggested format below:
sub print_score {        
my ($first_target,$last_target,$first_query,$last_query) = @_;
   foreach $t ( 
CONTINUE THIS LINE
      foreach $q (  CONTINUE THIS LINE
         if (defined( PUT SOMETHING HERE )) {
            printf "%6d",   CONTINUE THIS LINE
         }
      }
      print 
CONTINUE THIS LINE
   }
}