Biol 591 
Introduction to Bioinformatics
Scenarios
Fall 2003 
Identifying DNA foreign to a genome

Scientific story (html)

In brief: Genes that provide bacteria with exotic abilities, such as pathogenesis, often arise by horizontal transfer from other organisms. You would like to identify all genes in the sequenced genome of a bacterium that have foreign origins. Current methods work well with large blocks of DNA (i.e., many tens of genes in length) but not so well with individual genes, because they do not extract enough information from the DNA of a single gene for the characteristics of foreign genes to rise reliably above random variation. You would like to adapt a technique that makes greater use of the information within genes and use it to identify foreign genes.
Bioinformatic tools
Markov models
Contrary to all those disclaimers from investment advisors, past performance CAN predict future behavior.
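The idea that past text predicts what comes next can be sketched in a few lines. The example below is a minimal illustration in Python, not the course's Perl code: the function names `build_model` and `generate`, the sample sentence, and the choice of a two-character context are all assumptions made for the sketch. A Python dictionary plays the role that a hash plays in Perl, mapping each context to counts of the characters seen after it.

```python
import random
from collections import defaultdict

def build_model(text, order=2):
    """Map each `order`-character context to counts of the character that follows it."""
    model = defaultdict(lambda: defaultdict(int))
    for i in range(len(text) - order):
        context = text[i:i + order]
        model[context][text[i + order]] += 1
    return model

def generate(model, seed, length, rng=random):
    """Extend `seed` one character at a time, sampling in proportion to observed counts.

    Assumes len(seed) equals the order used to build the model.
    """
    order = len(seed)
    out = seed
    while len(out) < length:
        followers = model.get(out[-order:])
        if not followers:
            break  # context never seen with a following character: stop early
        chars = list(followers)
        weights = [followers[c] for c in chars]
        out += rng.choices(chars, weights=weights)[0]
    return out

# Hypothetical training text, chosen only for illustration.
sample = "to be or not to be that is the question"
model = build_model(sample, order=2)
pseudotext = generate(model, seed="to", length=30, rng=random.Random(0))
print(pseudotext)
```

Every three-character window of the generated string occurs somewhere in the training text, which is why the output looks locally plausible even though it is globally nonsense.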
Molecular biology concepts: Compositional inhomogeneities in genomic sequences

Perl focus: Using hashes

Papers

Ute Hentschel and Jörg Hacker (2001). Pathogenicity islands: the tip of the iceberg [Review]. Microbes and Infection 3:545-548
A quick review of pathogenicity islands (referred to in Notes for Nov 24)
Samuel Karlin (2001). Review: Detecting anomalous gene clusters and pathogenicity islands in diverse bacterial genomes. Trends in Microbiology 9:335-343
Review of methods to detect pathogenicity islands (main focus of Notes for Nov 24)
Jan Mrázek, Devaki Bhaya, Arthur R. Grossman, and Samuel Karlin (2001). Highly expressed and alien genes of the Synechocystis genome. Nucleic Acids Research 29:1590-1601
An attempt to apply methods for detecting pathogenicity islands to the detection of individual foreign genes. Most of the article is concerned with highly expressed genes, however. (referred to in Notes for Nov 24)
Notes
Detection of anomalous regions of a genome (PDF)
Construction of programs to detect anomalous genes (PDF)
Programs
Hamlet.pl - Creates Markov model based on text in input file and uses it to create pseudotext
DATA: HamletSpeech.txt - Possible input for Hamlet.pl
DATA: Carols.txt - Possible input for Hamlet.pl
Display_hash.pl - Displays the contents of a hash in a logical format
MakeMarkov.pl - Creates Markov model based on a set of DNA sequences. You'll write this based on Hamlet.pl
DATA: 6803PHX.nt - Training set of DNA sequences from bona fide genes of Synechocystis PCC 6803
Run_orfs_through_model.pl - Assesses open reading frames using Markov model
DATA: 6803Orfs.nt - All protein-encoding genes from Synechocystis PCC 6803
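The two-step workflow listed above (build a model from a training set of genes, then assess each open reading frame against it) might look roughly like the following. This is a hedged Python sketch and not the course's MakeMarkov.pl or Run_orfs_through_model.pl: the function names `train` and `mean_log_prob`, the second-order context, the pseudocount of 1, the per-nucleotide average log probability as the score, and the toy sequences are all assumptions. The real data files are not read here because their format is not described on this page.

```python
import math
from collections import defaultdict

BASES = "ACGT"

def train(sequences, order=2):
    """Count, for each `order`-nucleotide context, how often each base follows it."""
    counts = defaultdict(lambda: dict.fromkeys(BASES, 0))
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - order):
            context, nxt = seq[i:i + order], seq[i + order]
            if nxt in BASES:  # skip ambiguous symbols such as N
                counts[context][nxt] += 1
    return counts

def mean_log_prob(seq, counts, order=2, pseudocount=1.0):
    """Average log probability per nucleotide of `seq` under the trained model.

    The pseudocount (an assumption of this sketch) keeps unseen transitions
    from producing log(0).
    """
    seq = seq.upper()
    total, n = 0.0, 0
    for i in range(len(seq) - order):
        context, nxt = seq[i:i + order], seq[i + order]
        if nxt not in BASES:
            continue
        row = counts.get(context, {})
        numerator = row.get(nxt, 0) + pseudocount
        denominator = sum(row.values()) + pseudocount * len(BASES)
        total += math.log(numerator / denominator)
        n += 1
    return total / n if n else float("nan")

# Toy sequences invented for illustration; they are not from 6803PHX.nt or 6803Orfs.nt.
training = ["ATGGCGGCGGCGGCGTAA", "ATGGCGGCCGCGGCGTGA"]
counts = train(training)
native_like = mean_log_prob("ATGGCGGCGGCGTAA", counts)
foreign_like = mean_log_prob("ATGAATTTTAAATATTAA", counts)
print(native_like, foreign_like)
```

Dividing by the number of scored positions makes scores comparable across genes of different lengths. A gene whose composition resembles the training set scores higher (closer to zero) than one with unfamiliar composition, which is the signal used to flag candidate foreign genes.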
Problem Set: Problem Set 8
Alternate results (used in PS8.1h): 6803orfs_codon_bias.xls
Data file (used in PS8.6): aa_info.txt