Biol 591 
Introduction to Bioinformatics
Scenarios
Fall 2003 
Distinguishing different classes of acute leukemia

Kellie Archer (Department of Biostatistics) E-mail: kjarcher@vcu.edu

Scientific story (html)

In brief: You have in hand RNA from bone marrow samples from two classes of patients: those with acute lymphoblastic leukemia and those with acute myeloid leukemia. Superficially, the two classes of leukemia are very similar, but effective treatment of them differs markedly. How can you use the RNA to identify genes that are expressed differentially between the two classes of leukemia? How can you use this knowledge to build a tool that assigns a patient to one class or the other, thereby pointing the way to effective treatment?
Bioinformatic tools
Statistical analysis of microarray data
How to find, in data that exhibit some degree of random fluctuation, genes whose expression levels can be used to distinguish between two classes of people.
Molecular biology concepts: Microarrays

Perl focus: Planning and writing a Perl program

Paper

Golub et al. (1999). Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science 286:531-537.

Data (Training set): Expression pattern of 6817 human genes in leukemia patients with identified class of leukemia (txt)

Samples 1-27 are from patients with acute lymphoblastic leukemia (ALL)
Samples 28-38 are from patients with acute myeloid leukemia (AML)
Data (Independent set): Expression pattern of 6817 human genes in leukemia patients whose class of leukemia is to be predicted (txt)
Samples 39-72 are from patients with acute leukemia of an unknown class
See also web site for paper: http://www-genome.wi.mit.edu/MPR
Presentations and notes
Microarray technology: Presentation (ppt)
Spreadsheet: Average difference (xls)
Distinguishing clinical subgroups using microarrays: Presentation (ppt) Notes (pdf)
Spreadsheet: ALL vs AML example (xls)
Spreadsheet: Correlation (xls)
A program to predict the ALL/AML class distinction: Notes (pdf)
Programs
Class_predictor.pl - Shell of a program to calculate and sort correlation values a la Golub et al. and to display the genes that best predict the ALL/AML class distinction.
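
The correlation measure in Golub et al. is a signal-to-noise score, P(g,c) = (mean1 - mean2) / (sd1 + sd2), computed for each gene from its expression values in the two classes. The following is a minimal sketch of that calculation and of the sorting step, not the contents of Class_predictor.pl. The gene names and expression values are made up, the sub names are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(sum);

# Mean of a list of numbers.
sub mean {
    my @values = @_;
    return sum(@values) / @values;
}

# Sample standard deviation (n - 1 denominator; an assumption of this sketch).
sub std_dev {
    my @values = @_;
    my $mean   = mean(@values);
    my $sum_sq = sum(map { ($_ - $mean) ** 2 } @values);
    return sqrt($sum_sq / (@values - 1));
}

# Signal-to-noise score: (mean1 - mean2) / (sd1 + sd2).
sub signal_to_noise {
    my ($class1, $class2) = @_;    # array refs of expression values
    my $noise = std_dev(@$class1) + std_dev(@$class2);
    return 0 if $noise == 0;       # avoid division by zero for flat genes
    return (mean(@$class1) - mean(@$class2)) / $noise;
}

# Made-up expression values for three hypothetical genes.
my %expression = (
    gene_A => { ALL => [10, 12, 11], AML => [20, 22, 21] },
    gene_B => { ALL => [15, 25, 20], AML => [18, 22, 20] },
    gene_C => { ALL => [30, 32, 31], AML => [11,  9, 10] },
);

my %score;
for my $gene (keys %expression) {
    $score{$gene} = signal_to_noise($expression{$gene}{ALL},
                                    $expression{$gene}{AML});
}

# Sort by absolute score so strong predictors of either class come first.
for my $gene (sort { abs($score{$b}) <=> abs($score{$a}) or $a cmp $b }
              keys %score) {
    printf "%s\t%.3f\n", $gene, $score{$gene};
}
```

A positive score marks a gene expressed more highly in ALL and a negative score one expressed more highly in AML, which is why the sort uses the absolute value.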

Permute_training_set.pl - Program to calculate predicted curves for randomized data (as seen in Fig. 2 of Golub et al.). Used in Problem Set 6.
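
The idea behind randomizing the data can be sketched as follows: shuffle the ALL/AML labels among the samples, recompute the score for every gene, and count how many genes still score above a threshold. Comparing those counts with the count for the true labels suggests how many high-scoring genes would be expected by chance. This is only an illustration under stated assumptions, not the contents of Permute_training_set.pl: the data, the threshold of 2, the number of permutations, and all sub names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(sum shuffle);

sub mean { my @v = @_; return sum(@v) / @v; }

# Sample standard deviation (n - 1 denominator; an assumption of this sketch).
sub std_dev {
    my @v = @_;
    my $m = mean(@v);
    return sqrt(sum(map { ($_ - $m) ** 2 } @v) / (@v - 1));
}

# Signal-to-noise score: (mean1 - mean2) / (sd1 + sd2).
sub signal_to_noise {
    my ($class1, $class2) = @_;
    my $noise = std_dev(@$class1) + std_dev(@$class2);
    return 0 if $noise == 0;
    return (mean(@$class1) - mean(@$class2)) / $noise;
}

# Count the genes whose score is at least $threshold under a given labeling.
# $data:   hash ref, gene name => array ref of values (one per sample)
# $labels: array ref of 'ALL' / 'AML', in the same sample order
sub count_high_scores {
    my ($data, $labels, $threshold) = @_;
    my $count = 0;
    for my $gene (keys %$data) {
        my (@all, @aml);
        for my $i (0 .. $#$labels) {
            if ($labels->[$i] eq 'ALL') { push @all, $data->{$gene}[$i] }
            else                        { push @aml, $data->{$gene}[$i] }
        }
        $count++ if signal_to_noise(\@all, \@aml) >= $threshold;
    }
    return $count;
}

# Made-up data: three hypothetical genes measured in six samples.
my %data = (
    gene_A => [10, 12, 11, 20, 22, 21],
    gene_B => [15, 25, 20, 18, 22, 20],
    gene_C => [30, 32, 31, 11,  9, 10],
);
my @labels    = (('ALL') x 3, ('AML') x 3);
my $threshold = 2;       # hypothetical cutoff
my $n_perm    = 1000;    # hypothetical number of permutations

my $observed = count_high_scores(\%data, \@labels, $threshold);

# Repeat the count with the class labels shuffled among the samples.
my @permuted_counts;
for (1 .. $n_perm) {
    my @shuffled = shuffle(@labels);
    push @permuted_counts, count_high_scores(\%data, \@shuffled, $threshold);
}

printf "Observed genes above threshold: %d\n", $observed;
printf "Mean over %d permutations: %.3f\n",
    $n_perm, sum(@permuted_counts) / @permuted_counts;
```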

Vote.pl - Shell of a program to predict ALL/AML status of patients in the independent set (see above). Used in Problem Set 6. Golub et al.'s own opinions on the matter can be found at the web site for the paper (see above).
     data_set_ALL_AML_best.txt: Predictor set used by Vote.pl
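
The weighted-voting scheme of Golub et al. can be sketched as follows. Each predictor gene casts a vote equal to its weight (its signal-to-noise score) times the distance of the patient's expression value from the midpoint of the two class means. Votes are summed per class, and the prediction strength is (V_win - V_lose) / (V_win + V_lose), with a call made only when the strength reaches a threshold (0.3 in the paper). This is a sketch, not the contents of Vote.pl: the predictor values, the sample, and the sub name are made up, and the in-memory hash layout is an assumption, not the format of data_set_ALL_AML_best.txt.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical predictor set. 'weight' is the gene's signal-to-noise score
# (positive favors ALL); 'boundary' is the midpoint of the two class means.
my %predictor = (
    gene_A => { weight => -5.0, boundary => 16.0 },
    gene_C => { weight => 10.5, boundary => 20.5 },
);

# Returns (call, prediction strength) for one sample.
# $sample: hash ref, gene name => expression value
sub predict_class {
    my ($sample, $predictor, $ps_threshold) = @_;
    my %votes = (ALL => 0, AML => 0);
    for my $gene (keys %$predictor) {
        next unless defined $sample->{$gene};    # skip genes not measured
        my $vote = $predictor->{$gene}{weight}
                 * ($sample->{$gene} - $predictor->{$gene}{boundary});
        # Positive votes count for ALL; negative votes count for AML.
        if ($vote > 0) { $votes{ALL} += $vote }
        else           { $votes{AML} -= $vote }
    }
    my ($winner, $loser) = $votes{ALL} >= $votes{AML}
                         ? ('ALL', 'AML') : ('AML', 'ALL');
    my $total    = $votes{ALL} + $votes{AML};
    my $strength = $total > 0
                 ? ($votes{$winner} - $votes{$loser}) / $total : 0;
    my $call     = $strength >= $ps_threshold ? $winner : 'uncertain';
    return ($call, $strength);
}

# Made-up expression values for one hypothetical patient.
my %sample = (gene_A => 12, gene_C => 29);
my ($call, $strength) = predict_class(\%sample, \%predictor, 0.3);
printf "%s (prediction strength %.2f)\n", $call, $strength;
```

Returning 'uncertain' rather than forcing a call mirrors the paper's practice of declining to classify samples with low prediction strength.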
Problem Set: (pdf) (uses programs with links above)