Biol 591 
Introduction to Bioinformatics
Scenarios
Fall 2003 
Nuanced search for motifs in a genome

Scientific story (html)

In brief: There must exist a gene in the cyanobacterium Anabaena that is regulated by nitrogen deprivation (through the DNA-binding protein NtcA) and whose product regulates differentiation (leading to N₂-fixing heterocysts). You have in hand a reasonable collection of sequences known to bind NtcA, but you're not sure exactly which features of those sequences the protein finds important. Your task is to extract as much information as possible from the known binding sequences and use it to scan the genome of Anabaena for candidate binding sites.
Bioinformatic tools
Position-specific scoring matrices (PSSMs)
Identify the positions in a sequence alignment that carry the most information, and use the nucleotide frequencies at those positions to characterize the aligned motifs
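The idea can be made concrete with a minimal sketch. It is written in Python rather than the course's Perl, with dictionaries standing in for Perl hashes. The four-sequence alignment is a made-up toy example, not the course data, and the names `column_frequencies` and `information_bits` are hypothetical helpers chosen for illustration. Information is measured here as 2 bits minus the Shannon entropy of each column, one common convention for DNA; the course notes may define it differently.

```python
import math
from collections import Counter

BASES = "ACGT"

def column_frequencies(seqs):
    """Return one {base: frequency} dictionary per alignment column."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must have equal length")
    table = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        table.append({b: counts[b] / len(seqs) for b in BASES})
    return table

def information_bits(freqs):
    """Information of one column: 2 bits minus its Shannon entropy."""
    entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
    return 2.0 - entropy

# Toy alignment (assumption for illustration only): every column is
# fully conserved except column 3, where each base appears once.
toy_alignment = ["GTAACA", "GTACCA", "GTAGCA", "GTATCA"]

pssm = column_frequencies(toy_alignment)
info = [information_bits(col) for col in pssm]

# Column indices sorted from most to least informative.
ranked = sorted(range(len(info)), key=lambda i: info[i], reverse=True)

for i in ranked:
    print(i, round(info[i], 2), pssm[i])
```

In this toy case the conserved columns each carry the full 2 bits and the variable column carries none, so it sorts last. With a small real alignment, raw frequencies of zero are common, which is why pseudocounts are often added before the table is used for scoring.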
Molecular biology concepts: Nothing new

Perl focus: Hashes; Sorting

Programs

FindMotif.pl - Constructs PSSM from aligned sequences, scans genome, produces list of most plausible motifs
     Data: Small set of aligned sequences (71NpNtSm.txt)
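The three steps named above (build a PSSM, scan, list the best candidates) can be sketched as follows. This is not FindMotif.pl itself: it is an illustrative Python sketch rather than Perl, and the toy sites, the toy "genome", the pseudocount of 1, the uniform background of 0.25, and the function names `log_odds_pssm` and `scan` are all assumptions made for the example.

```python
import math

BASES = "ACGT"

def log_odds_pssm(seqs, pseudocount=1.0, background=0.25):
    """Build one {base: log2(frequency / background)} table per column."""
    n = len(seqs)
    pssm = []
    for i in range(len(seqs[0])):
        col = {}
        for b in BASES:
            count = sum(1 for s in seqs if s[i] == b)
            # The pseudocount keeps unseen bases from scoring minus infinity.
            freq = (count + pseudocount) / (n + 4 * pseudocount)
            col[b] = math.log2(freq / background)
        pssm.append(col)
    return pssm

def scan(genome, pssm, top=3):
    """Score every window on the given strand; return the best `top` hits."""
    width = len(pssm)
    hits = []
    for start in range(len(genome) - width + 1):
        window = genome[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing ambiguous characters
        score = sum(pssm[i][b] for i, b in enumerate(window))
        hits.append((score, start, window))
    # Sort by score, highest first.
    hits.sort(key=lambda h: h[0], reverse=True)
    return hits[:top]

# Toy data (assumptions for illustration; not 71NpNtSm.txt or Anabaena).
toy_sites = ["GTAACA", "GTACCA", "GTAGCA", "GTATCA"]
toy_genome = "CCCGTAGCATTTTGTTACAGG"

best = scan(toy_genome, log_odds_pssm(toy_sites))
for score, start, window in best:
    print(f"{start:3d}  {window}  {score:6.2f}")
```

The sketch scores only the strand it is given. A scan of a real genome would normally also score the reverse complement, and the cutoff for "most plausible" is a judgment call that this example reduces to a fixed `top` count.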

MEME - Web-based program designed to find statistically overrepresented motifs in a collection of sequences.
      Click on MEME - Submission form to use program. Explore other links to learn more about the program.

Notes
Position-specific scoring matrices (PDF) (Questionnaire)
PSSM program (PDF) (Questionnaire)

Problem Set: Just one for this scenario (HTML)