BNFO 301 
Introduction to Bioinformatics
Topic: Genome Sequences
Where they come from
Spring 2014 
  1. Our story (and commentary)
  2. Overview of genome sequencing
  3. Tools of genome sequencing

Our Story
Fragile X Syndrome is the most common inherited cause of mental retardation in humans. You are a molecular neurologist hoping to understand the nature of the disease in order to effect its ultimate cure. The protein responsible for Fragile X Syndrome, FMRP, has long been known; it affects the translation of hundreds of mRNAs. You'd like to know whether one mRNA, or perhaps a small subset of them, is responsible for the disease symptoms. 

One approach is to systematically disrupt the genes encoding the affected mRNAs. Such experiments with humans are currently frowned upon, so you have turned to mice. Unfortunately, FMRP-deficient mice turn out not to be the ideal system in which to study the basis of mental retardation. Mice are only subtly affected by the loss of the protein, and experiments are not easy to perform. Things are not going well.

Then one morning, you wake up and think: Flies. Yes… flies. Fruit flies. People have been using the fruit fly Drosophila as a model system for a hundred years. Genetic manipulations are easy, and a great deal is known about the behavior and development of Drosophila. True, it might be difficult to detect mental retardation in a fruit fly, but that question shows a certain want of feeling. You decide to go for it.

You obtain the FMRP sequence, use it to scan the Drosophila genome for a gene that encodes a similar protein, find it, clone it, mutate it, put the modified gene back into Drosophila, gain deep insight into the causes of mental retardation, and book a flight to Stockholm to pick up your Nobel Prize.

Commentary
The key point in this little tale is contained in this fragment: "… scan the Drosophila genome for a gene that encodes a similar protein…". Having in hand the genomes of Drosophila and hundreds of other organisms has made possible lines of inquiry that were unthinkable just a few years ago. Our goal today is to understand how genomic sequences are obtained and how they may be put to good use. First we'll search the Drosophila genome (as did the hero of our tale) for a gene encoding an FMRP-like protein. Then we'll examine how the Drosophila genome was sequenced. Understanding how the genome sequence was deduced may illuminate both the power and the limitations of the resource we have at our disposal.
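The "scan the genome for a similar protein" step rests on scoring how well two sequences align. Real searches use tools such as BLAST (for a protein query against genomic DNA, NCBI's tblastn, which translates the genome in all six reading frames), but the core idea can be sketched with the Smith-Waterman local alignment algorithm. This is a toy: it assumes a simple scoring scheme (+2 match, -1 mismatch, -2 per gap position) instead of a substitution matrix such as BLOSUM62, and the sequences and gene names in the usage example are made up, not real FMRP or Drosophila data.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b under a toy linear scoring scheme."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j]: best score of a local alignment ending at a[i-1] and b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # a[i-1] aligned to a gap in b
            left = H[i][j - 1] + gap    # b[j-1] aligned to a gap in a
            # Clamping at zero lets an alignment start anywhere (the "local" part)
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


def rank_hits(query: str, database: dict[str, str]) -> list[tuple[int, str]]:
    """Score the query against every entry and return (score, name) pairs, best first."""
    return sorted(((smith_waterman(query, seq), name) for name, seq in database.items()),
                  reverse=True)


if __name__ == "__main__":
    # Hypothetical toy protein fragments (one-letter codes), not real sequences
    query = "KGRGNW"
    database = {"toy_gene_1": "MSAKGRGSNWLE", "toy_gene_2": "MLLPPQEDT", "toy_gene_3": "KGAGNW"}
    for score, name in rank_hits(query, database):
        print(name, score)
```

The quadratic table is why real genome-scale searches rely on heuristics like BLAST's word seeding rather than running full Smith-Waterman against every position.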


Overview of genome sequencing
Note that the path leading to Stockholm described in this story relied on the existence of Drosophila genes and proteins in an accessible database. Before 2000, no database contained entries for more than a small fraction of genes and proteins from Drosophila. Before 1995, no database contained entries for more than a small fraction of genes from any organism. The fact that GenBank and other similar databases provide so rich a source of information results from the thousands of genome sequencing projects that have sprung up since 1995.

One can break up a genome project in many ways. Here's one:

  • Obtain the raw sequence of a genome
  • Identify genes within the genome
  • Deduce the functions of the proteins encoded by those genes
In this module, we focus on the first problem, getting the raw sequence and figuring out how much of it we really have. To do this, we'll consider as an example the elucidation of the Drosophila genome, as described in:
Myers EW et al (2000). A whole-genome assembly of Drosophila. Science 287:2196-2204.
You'll eventually want to digest much of this article. For now, I just want to make sure that you can obtain it.

If you need help getting the article, consult How to Find Articles. If you're having a problem getting this article, solve it! Now! You won't be able to get anywhere in this course if you can't find articles.
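How much of the genome "we really have" can be estimated, before any assembly, from the Lander-Waterman model: if N reads of length L land uniformly at random on a genome of size G, the coverage (redundancy) is c = NL/G, the expected fraction of bases never sequenced is about e^(-c), and the expected number of contigs is about N·e^(-c) when any overlap between reads can be detected. The sketch below assumes exactly this idealized model; repeats, cloning bias and unclonable regions make real genomes worse. The genome size and read length in the usage block are illustrative assumptions, not figures from the paper.

```python
import math


def lander_waterman(genome_size: int, read_length: int, n_reads: int) -> dict:
    """Idealized shotgun statistics, assuming reads are placed uniformly at random."""
    c = n_reads * read_length / genome_size          # coverage (redundancy)
    frac_uncovered = math.exp(-c)                    # P(a given base lies in no read)
    return {
        "coverage": c,
        "fraction_uncovered": frac_uncovered,
        "expected_uncovered_bases": genome_size * frac_uncovered,
        # Expected contigs, assuming any overlap between reads is detectable
        "expected_contigs": n_reads * math.exp(-c),
    }


def reads_for_coverage(genome_size: int, read_length: int, coverage: float) -> int:
    """Number of reads needed to reach a target coverage."""
    return math.ceil(coverage * genome_size / read_length)


if __name__ == "__main__":
    G = 120_000_000   # assumed genome size in bp (illustrative)
    L = 500           # assumed read length in bp (illustrative)
    for c in (1, 3, 5, 10):
        n = reads_for_coverage(G, L, c)
        stats = lander_waterman(G, L, n)
        print(f"{c:>2}x: {n:>9,} reads, {stats['fraction_uncovered']:.2e} of genome unsequenced, "
              f"~{stats['expected_contigs']:,.0f} contigs")
```

The exponential term is the point: each extra unit of coverage shrinks the unsequenced fraction by a factor of e, so gaps never vanish entirely by adding reads alone.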


Tools of genome sequencing
The main task for today is to understand some of the techniques used in the paper. I know you are capable of finding background on the web, but I've saved you some trouble by gathering together some useful links (I won't always be so helpful). Use them or anything else you like to get the basic idea.

What is shotgun sequencing?
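In whole-genome shotgun sequencing, many copies of the genome are broken at random into small pieces, the pieces are sequenced, and the genome is reconstructed computationally from overlaps between the reads. The sketch below illustrates that idea at toy scale: it samples error-free reads from random positions, then merges them with a simple greedy rule (repeatedly join the pair with the longest suffix-prefix overlap). This greedy assembler is a teaching stand-in, not the algorithm used by Myers et al.; it ignores sequencing errors, reverse-complement strands and mate-pair information, and it can mis-join reads that fall in repeats.

```python
import random


def shotgun_reads(genome: str, read_len: int, n_reads: int, seed: int = 0) -> list[str]:
    """Sample error-free reads of fixed length from uniformly random start positions."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(genome) - read_len + 1) for _ in range(n_reads)]
    return [genome[s:s + read_len] for s in starts]


def overlap(a: str, b: str, min_len: int) -> int:
    """Longest k >= min_len such that the last k bases of a equal the first k of b; else 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def drop_contained(seqs: list[str]) -> list[str]:
    """Remove duplicates and any sequence lying entirely inside another one."""
    unique = list(dict.fromkeys(seqs))
    return [s for s in unique if not any(s != t and s in t for t in unique)]


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair with the longest overlap; return the final contigs."""
    contigs = drop_contained(reads)
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # no overlaps left: remaining pieces are separate contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        rest = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs = drop_contained(rest + [merged])


if __name__ == "__main__":
    genome = "ATGGCGTGCAATGCCT"  # toy genome, not a real sequence
    reads = shotgun_reads(genome, read_len=7, n_reads=12, seed=1)
    print(greedy_assemble(reads))
```

With too few reads the result is several contigs separated by gaps; with short repeats (this toy genome contains "ATG" twice) a greedy join can be wrong, which is one reason real assemblers also use read-pair distances.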

What is dideoxy sequencing? What are BAC libraries? What are P1 inserts?
  • Monaco AP and Larin Z (1994). YACs, BACs, PACs, and MACs: Artificial chromosomes as research tools. Trends in Biotechnology 12:280-286.
     
  • Shizuya H et al (1992). Cloning and stable maintenance of 300-kilobase-pair fragments of human DNA in Escherichia coli using an F-factor-based vector. Proceedings of the National Academy of Sciences USA 89:8794-8797.
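Dideoxy (Sanger) sequencing copies a template with DNA polymerase in the presence of a small amount of chain-terminating dideoxynucleotides (ddNTPs). Each copy stops wherever a ddNTP happens to be incorporated, producing a nested set of fragments; sorting them by length and noting which ddNTP ends each one spells out the newly synthesized strand, the complement of the template. The sketch below is an idealized simulation, assuming exactly one fragment ends at every position, no primer, and a template written 3'-to-5' so the new strand reads left to right 5'-to-3'.

```python
import random

# Watson-Crick pairing: the ddNTP incorporated opposite each template base
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def chain_termination_fragments(template: str) -> list[tuple[int, str]]:
    """Idealized ladder: one fragment terminating at every position of the new strand.

    Each fragment is recorded as (length, terminating ddNTP base); the primer is ignored.
    """
    return [(i + 1, COMPLEMENT[base]) for i, base in enumerate(template)]


def read_ladder(fragments: list[tuple[int, str]]) -> str:
    """Sort fragments by length (as electrophoresis does) and read their terminal bases."""
    return "".join(base for _, base in sorted(fragments))


if __name__ == "__main__":
    template = "TACGGT"                          # toy template, written 3'->5'
    fragments = chain_termination_fragments(template)
    random.Random(0).shuffle(fragments)          # the reaction yields an unordered mixture
    print(read_ladder(fragments))                # complement of the template: ATGCCA
```

In the classic gel method the four ddNTPs are run in four separate reactions and lanes; modern capillary instruments label each ddNTP with a different fluorescent dye so one reaction suffices. Either way, the readout is the length-sorted ladder modeled here.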