Biol 591 
Introduction to Bioinformatics
Scenarios
Fall 2002 
Definition of set of coregulated genes

Scientific story (html)

In brief: You are looking for a gene critical in the regulation of cellular differentiation. One important clue is that this gene should be regulated by a protein whose DNA binding site has been partly characterized. How can you extract a more precise definition of the binding site from known sequences and use this information to scan the genome for candidate genes?
Bioinformatic tools
Position-specific scoring matrices (PSSM)
A PSSM collects information about known sequence features into a table, in a form that lets you score sequences for similarity on a continuous scale.
Notes - Position-specific scoring matrices (PDF)
            Perl concepts: Hashes, loops, sorting (html)
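
To make the idea of a scoring table concrete, here is a minimal sketch of one common way to build a PSSM: count each base at each aligned position, add a pseudocount, and convert the frequencies to log-odds scores against a background. It is written in Python rather than the course's Perl, and it is not taken from the course notes. The function names (build_pssm, score), the pseudocount of 1, the uniform background, and the training sequences are all assumptions chosen for illustration.

```python
import math

def build_pssm(aligned, alphabet="ACGT", pseudocount=1.0, background=None):
    """Return one dict per alignment column, mapping base -> log2-odds score.

    Assumes uppercase sequences of equal length containing only `alphabet`.
    """
    if background is None:
        # Assumption: every base is equally likely in the background.
        background = {b: 1.0 / len(alphabet) for b in alphabet}
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("training sequences must all have the same length")

    pssm = []
    for i in range(length):
        # Start every count at the pseudocount so unseen bases get a
        # finite (negative) score instead of log(0).
        counts = {b: pseudocount for b in alphabet}
        for seq in aligned:
            counts[seq[i]] += 1
        total = sum(counts.values())
        pssm.append({b: math.log2((counts[b] / total) / background[b])
                     for b in alphabet})
    return pssm

def score(pssm, site):
    """Sum the column scores for a site the same length as the PSSM."""
    return sum(col[base] for col, base in zip(pssm, site))

# Hypothetical training set of aligned binding sites (not real data).
training = ["TGACTA", "TGACTC", "TGAGTA", "TGACTA"]
pssm = build_pssm(training)

print(round(score(pssm, "TGACTA"), 2))  # resembles the training set
print(round(score(pssm, "CCTGAG"), 2))  # does not resemble it
```

The score is continuous: a site that matches the training set at most positions still scores well, which is what distinguishes a PSSM from an exact-match consensus search.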

Programs

FindMotif (homegrown version)
Takes a training set of aligned sequences, constructs a PSSM from it, and uses the PSSM to scan a database for the sequences most similar to those of the training set.
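
The scanning step described above can be sketched as a sliding window: score every window of the PSSM's width, then sort the windows by score. This is an illustration in Python, not FindMotif's actual code (which is written for the course in Perl). The function name scan, the top_n parameter, the three-column toy PSSM, and the database string are all hypothetical.

```python
def scan(pssm, sequence, top_n=3):
    """Score every window of len(pssm) in `sequence`; return the best hits.

    Each hit is a (score, start, window) tuple, highest score first.
    """
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        s = sum(col[base] for col, base in zip(pssm, window))
        hits.append((s, start, window))
    # Sort by score only; Python's sort is stable, so tied windows
    # stay in left-to-right order.
    hits.sort(key=lambda h: h[0], reverse=True)
    return hits[:top_n]

# Hypothetical PSSM favoring the motif "TGA" (scores are made up).
toy_pssm = [
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 1.3},
    {"A": -1.0, "C": -1.0, "G": 1.3, "T": -1.0},
    {"A": 1.3, "C": -1.0, "G": -1.0, "T": -1.0},
]

database = "CCTGACGTTGTAA"  # hypothetical sequence to search
for s, start, window in scan(toy_pssm, database):
    print(start, window, round(s, 2))
```

Returning the sorted list rather than only the single best window makes it easy to inspect near misses, which are often the interesting candidates when the binding site is only partly characterized.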
MEME (http://meme.sdsc.edu/meme/website/)
Gibbs Sampler (http://bayesweb.wadsworth.org/gibbs/gibbs.html)
Both MEME and the Gibbs Sampler find regions in an input training set that share common motifs and return a PSSM constructed from each motif.
Perl focus: Hashes, loops, sorting.

Problem Set - PS5P: Perl concepts (html)
                      P5B: Using PSSM's (html)