Biol 591 
Introduction to Bioinformatics
Scenarios
Fall 2002 
Comparison of genomes to look for genes responsible for pathogenesis

Scientific story (html)

In brief: You hit on the idea of understanding the basis for pathogenesis by the deadly E. coli O157:H7 by comparing its total complement of proteins with that of the nonpathogenic strain E. coli K12. Unfortunately, the comparison nets you a file bigger than anything you could go through in a year. How can you extract the useful information from the file and put it in a form a human could understand?
Bioinformatic tools
Blast
     Standard program to find similarities between sequences or sets of sequences.
Parsing program
     Scans output, looking for items of interest as you define them. Outputs them to a separate file.
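
As a rough illustration of what such a parsing step does, the sketch below reduces a set of Blast results to the query proteins that have no strong match in the other strain. It is written in Python for brevity (the course's own parser is in Perl) and assumes Blast was run with its standard 12-column tab-separated output, in which column 1 is the query ID and column 11 is the E-value. The function name, the cutoff of 1e-10, and the sample IDs are invented for illustration.

```python
def queries_without_hit(all_query_ids, tabular_lines, max_evalue=1e-10):
    """Return query IDs that have no hit with E-value <= max_evalue.

    max_evalue is a hypothetical cutoff chosen for this example.
    """
    matched = set()
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query_id = fields[0]        # column 1: query ID
        evalue = float(fields[10])  # column 11: E-value
        if evalue <= max_evalue:
            matched.add(query_id)
    # Queries with no hits at all never appear in tabular output,
    # so the full list of query IDs has to be supplied separately.
    return [q for q in all_query_ids if q not in matched]


# Made-up example: three O157:H7 proteins searched against K12.
sample_hits = [
    "protA\tk12_0001\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-110\t390",
    "protB\tk12_0042\t31.0\t80\t50\t2\t5\t84\t10\t89\t0.5\t30.1",
]
unmatched = queries_without_hit(["protA", "protB", "protC"], sample_hits)
print(unmatched)
```

Here protA has a strong K12 match and is dropped, while protB (weak match only) and protC (no match at all) remain as candidates worth a closer look.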
Notes - Molecular biology (PDF) (Questions)
Notes - Blast/Parsing program  (html) (Questions)

Programs

Blast (obtainable from NCBI site - see instructions on how to download and run the program)
Most people run this program over the web. The point of interest for now is learning how to download the program so that you can tailor it to your own purposes.

Protein databases (obtainable from TIGR-CMR site - see instructions)
Files containing all proteins deduced from completed DNA sequences of E. coli strains, used by Blast.

Parsing program: BlastParser.pl - slightly simplified version
                 BlastParser2.pl - full-strength version

Perl focus: Pattern matching and extraction of strings through regular expressions
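
A minimal sketch of that idea, shown in Python rather than Perl (the regular-expression syntax used here is the same in both). It assumes the traditional Blast text report, in which each query begins with a line starting `Query=` and each alignment reports `Expect = <number>`. The sample text and the function name are invented for illustration and are not taken from BlastParser.pl.

```python
import re

# Capture the first word after "Query=" at the start of a line.
QUERY_RE = re.compile(r"^Query=\s*(\S+)")
# Capture the number after "Expect =" anywhere in a line.
EXPECT_RE = re.compile(r"Expect\s*=\s*([0-9.eE+-]+)")


def extract_expects(report_text):
    """Map each query name to the list of Expect values reported under it."""
    results = {}
    current = None
    for line in report_text.splitlines():
        m = QUERY_RE.match(line)
        if m:
            current = m.group(1)
            results[current] = []
            continue
        m = EXPECT_RE.search(line)
        if m and current is not None:
            results[current].append(m.group(1))  # kept as text, not converted
    return results


# Made-up fragment in the style of a Blast text report.
sample_report = """\
Query= protA
 Score = 390 bits (1002), Expect = 1e-110
Query= protB
 Score = 30.1 bits (66), Expect = 0.50
 Score = 28.5 bits (62), Expect = 1.4
"""
print(extract_expects(sample_report))
```

The parentheses in each pattern mark the part of the match to extract, which is the same capture mechanism Perl exposes through `$1`.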

Problem Set - Molecular biology (PDF)
Problem Set - Programming (html)