BNFO301 – Introduction to Bioinformatics
Genome Sequences - Where they come from, How they're used

Where genome sequences come from

Note that the path leading to Stockholm described in the previous set of notes relied on the existence of Drosophila genes and proteins in an accessible database. Before 2000, no database contained entries for more than a small fraction of genes and proteins from Drosophila. Before 1995, no database contained entries for more than a small fraction of genes from any organism. GenBank and similar databases are so rich a source of information today because of the hundreds of genome sequencing projects that have sprung up since 1995.

One can break up a genome project in many ways. Here's one:

  • Obtaining the raw sequence of a genome
  • Identifying genes within the genome
  • Deducing the functions of the proteins encoded by the genes

In these notes we'll examine the first step, using as an example the elucidation of the Drosophila genome, as described in:

Myers EW et al. (2000). A whole-genome assembly of Drosophila. Science 287:2196-2204

You'll want to go to the link to that article and either follow along online or print out a copy. If you choose the latter route, click on the Full Text (PDF) link under Article View. As you read the article (I would suggest up to but not including the section on validation), write down questions, particularly on issues you need to understand in order to see how genome sequences are elucidated. I have tried to anticipate some questions and provide some ways for you to answer them.

What is shotgun sequencing?

What is dideoxy sequencing? What are BAC libraries? What are P1 inserts?