Scientific story (html)
In brief: You are looking for a gene critical in the regulation of cellular differentiation. One important clue is that this gene should be regulated by a protein whose DNA binding site has been partly characterized. How can you extract a more precise definition of the binding site from known sequences and use this information to scan the genome for candidate genes?
Bioinformatic tools
Position-specific scoring matrices (PSSM)
Notes - Position-specific scoring matrices (PDF)
Collects information about known sequence features into a table, in a form that lets you score any sequence for similarity to those features on a continuous scale rather than as a simple match or mismatch.
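A minimal sketch of how such a table can be built and used, written in Python rather than the course's Perl. It assumes a log-odds formulation (log2 of each base's observed frequency in a column over a uniform 0.25 background) with a pseudocount of 1; the course notes may define the matrix differently, and the names `build_pssm` and `score` are hypothetical.

```python
import math

ALPHABET = "ACGT"


def build_pssm(sites, pseudocount=1.0):
    """Build a log2-odds PSSM from equal-length, aligned DNA sites.

    Assumes a uniform background (0.25 per base) and sites made only of A/C/G/T.
    Returns one dict per motif column, mapping each base to its score.
    """
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all training sites must have the same length")
    background = 0.25  # assumed uniform base composition
    pssm = []
    for i in range(width):
        # Start every base at the pseudocount so bases never seen in the
        # training set get a finite negative score instead of log2(0).
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[i].upper()] += 1
        total = sum(counts.values())
        pssm.append({base: math.log2(counts[base] / total / background)
                     for base in ALPHABET})
    return pssm


def score(pssm, seq):
    """Score a sequence of the same width by summing its per-column scores."""
    if len(seq) != len(pssm):
        raise ValueError("sequence length must match the PSSM width")
    return sum(column[base.upper()] for column, base in zip(pssm, seq))
```

For example, with the hypothetical training sites TATAAT, TATGAT, TACAAT and TATAAT, `score(build_pssm(sites), "GCGCGC")` is -6.0 (every base is one never seen in its column, scoring log2(0.5) = -1 per column), while TATAAT scores well above zero. The pseudocount is the design choice that makes this continuous scale possible: without it, a single unseen base would send the score to minus infinity.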
FindMotif (homegrown version)
Perl focus: hashes, loops, sorting.
Takes a training set of aligned sequences, constructs a PSSM from it, and uses the PSSM to scan a database for the sequences most similar to those in the training set.
MEME (http://meme.sdsc.edu/meme/website/)
Gibbs Sampler (http://bayesweb.wadsworth.org/gibbs/gibbs.html)
Unlike FindMotif, both programs start from unaligned sequences: they find regions in an input training set that share a common motif and return a PSSM constructed from each motif found.
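Whichever tool produces the PSSM, the scanning step that FindMotif performs can be sketched in Python: slide the matrix along each database sequence, keep each sequence's best-scoring window, and sort the sequences by that score. The names `best_hit` and `rank_sequences`, the toy matrix `TOY_PSSM`, and the choice to scan both DNA strands are illustrative assumptions, not a description of FindMotif itself.

```python
# Hypothetical 3-column log-odds matrix that favors "TAG"; illustrative values only.
TOY_PSSM = [
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 1.5},
    {"A": 1.5, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.5, "T": -1.0},
]

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string (non-ACGT symbols are left as-is)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def best_hit(pssm, seq):
    """Return (score, start, strand) for the best window in seq, or None.

    start is always a 0-based offset on the forward strand.
    """
    width = len(pssm)
    seq = seq.upper()
    best = None
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for pos in range(len(s) - width + 1):
            window = s[pos:pos + width]
            if set(window) - set("ACGT"):
                continue  # skip windows containing N or other symbols
            window_score = sum(col[base] for col, base in zip(pssm, window))
            # Map a minus-strand offset back to forward-strand coordinates.
            start = pos if strand == "+" else len(s) - pos - width
            if best is None or window_score > best[0]:
                best = (window_score, start, strand)
    return best


def rank_sequences(pssm, sequences, top=10):
    """Rank named sequences by their best window score, highest first."""
    hits = []
    for name, seq in sequences.items():
        hit = best_hit(pssm, seq)
        if hit is not None:  # sequences shorter than the motif are skipped
            hits.append((name, *hit))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:top]
```

Keeping one best window per sequence turns the scan into a ranked list of candidates; a real genome scan would typically also apply a score threshold and might report every window above it instead.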
Problem Set - PS5P: Perl concepts (html)
P5B: Using PSSMs (html)