Welcome to BioBIKE's tutorial
What is a Gene?

Preliminaries: Genes and gene coordinates

We'll be considering the genes of one of the world's smallest free-living bacteria, the cyanobacterium Prochlorococcus marinus ss120. You can refer to it by its nickname, ss120.
  1. "Small" is a relative term. How much DNA does ss120 have? Let's see what it's got by asking for the sequence. Go to the palette at the top of the BioBIKE window and mouse over the Genome menu. Then locate and click on the DISPLAY-SEQUENCE-OF entry, as shown below:


    That will bring up the following box:


    Click on the word "entity" in the gray argument box and provide an entity, namely the organism ss120. There are a few ways to do this. Try going back to the palette: mouse over the Data button, then over the Prochlorococcus marinus ss120 entry, and click on ss120.


    Now that the command is complete, set it into action by clicking on the green action arrow and then on Execute.


    If all has gone well, you should see in the results window at the bottom of the screen the DNA sequence of a living organism, everything that organism knows about staying alive.
     
  2. If you scroll through the results window, you'll see that only several thousand nucleotides were given. Only several thousand? Well, yes. If you look at the top of the window, you'll see that BioBIKE protected you from having the full 1.75 million nucleotide sequence dropped on your screen. That's the nature of genomes. They're too big for a mere human to take in. Sometimes we'll have to look at small pieces of a genome, and sometimes we'll consider it abstractly as a whole.

     
  3. That's how big it is. How many genes does it have? Display all of ss120's genes by returning to the GENOME menu and clicking on GENES-OF. Then pull down ss120 from the DATA menu as before or (faster) just click on the gray entity box, type in ss120, and press Enter. Finally, click the Execute entry under the green action arrow. [Note: the genes in the list are given their full names, in the form organism.gene (BioBIKE happens to call "ss120" "pro1375"), but don't worry about that.]

  4. Again, too much to take in! Get a count of the genes in this list by going to the List-Tables menu, then the List analysis submenu, and clicking on COUNT-OF, as shown below:


    How to convince the COUNT-OF function to count the list of genes? There are several ways. Here's a useful trick. Go to the green action arrow of the GENES-OF box and select Cut:

    The box will disappear but remain on the BioBIKE clipboard. Now paste it into the gray argument box of COUNT-OF:

    Execute the function (be sure to choose the green action arrow of the outer function, COUNT-OF, otherwise you'll just execute GENES-OF again).

     
  5. That's a lot of genes for a small, unicellular organism. How many nucleotides are there for every gene in ss120?
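
If you'd like to check your arithmetic for step 5, here is a minimal Python sketch (Python is not part of BioBIKE; it is used here only as a calculator). It assumes the approximate figures quoted in this tutorial, about 1.75 million nucleotides and 1928 genes; the variable names are just illustrative.

```python
# Approximate figures quoted in this tutorial for ss120 (assumed, not exact).
genome_length_nt = 1_750_000   # "about 1.75 million nucleotides"
gene_count = 1928              # gene count quoted in the tutorial

# Average stretch of chromosome available per gene.
nt_per_gene = genome_length_nt / gene_count
print(f"about {nt_per_gene:.0f} nucleotides per gene")
```

The result comes out a little under 1000, which is why the tutorial later speaks of roughly 1000 nt per gene.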

     
  6. Let's go back to that list of genes in the results window and take a look at one of them more or less at random. Notice that most of them are called "PRO" followed by a four-digit number. I chose Pro0029. Go to the Genes-proteins menu and click on DESCRIPTION-OF. Then put pro0029 into the gray entity box. Shortcut: just click on the entity box, type pro0029, and press Enter. Then execute the function as usual.

     
  7. That result wasn't very informative. Evidently we don't know much about the function of Pro0029. But we do know a good deal about the gene. To learn more, go back to the DESCRIPTION-OF box and click the Options arrow. Then click on FULL, and execute the function again.


    BioBIKE is trying to be helpful, giving you all sorts of information. Some you already knew, like the name of the gene and its organism. Some you could figure out: Encodes-protein, for example, has the value T, for true, meaning the gene encodes a protein (isn't that what genes are SUPPOSED to do?).

    The main point of interest here is the pair of coordinates of the gene: From 29973 To 30119. How long is the gene? (Pause for quick math.) Much shorter than the ~1000 nt per gene you calculated before. This is a short gene.

     
  8. Or is it? Why do the math when the computer can? Leave the white display window where you can go back to it, mouse over the GENES-PROTEINS menu, then the Description-analysis submenu, and click on LENGTH-OF. Then type in pro0029, press Enter, and execute the function.

    Is that the same number as what you calculated? (Off by 1? Note that something that goes from coordinate 1 to coordinate 2 has a length of 2, not 1!)
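
The off-by-one point can be written out as a small Python sketch. It assumes, as the tutorial's example implies, that both the From and the To coordinate are counted as part of the gene; `inclusive_length` is a hypothetical helper name, not a BioBIKE function.

```python
def inclusive_length(start: int, end: int) -> int:
    """Length of a stretch whose start and end coordinates are both included."""
    return end - start + 1

# The tutorial's own example: coordinates 1 to 2 span two positions.
print(inclusive_length(1, 2))

# The coordinates reported for pro0029.
print(inclusive_length(29973, 30119))
```

Plain subtraction (To minus From) gives a number one smaller than this, which is the discrepancy the step asks you to look out for.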

     
  9. What IS the gene? A gene is a piece of DNA, so we should (in principle) be able to learn just about everything there is to know about it by looking at its sequence. To do this, grab DISPLAY-SEQUENCE-OF from the GENES-PROTEINS menu, fill in Pro0029, press enter, and execute it.

     
  10. What are we looking at? You can imagine that each of the 1928 genes in Prochlorococcus marinus ss120 is a piece of DNA floating around in the cell. Or you can imagine that there is one piece of DNA, the chromosome, where all the genes reside.

    Well, which is it? Earlier on, you saw a bit of the chromosome and learned it went on for about 1.75 million nucleotides. If the genes do reside on the chromosome, there's a big filing problem to worry about: how to find them?

    Go back to the window you opened with information about pro0029 (or if you lost it, just execute the DESCRIPTION-OF function again). It says there that the gene goes from coordinate 29973 to coordinate 30119. If pro0029 is on the chromosome, we should look at the chromosome's sequence between those coordinates.

    Get another copy of DISPLAY-SEQUENCE-OF, this time from the GENOME menu. Type ss120 into the gray entity box and, to get some context, set the FROM option to 29901 and the TO option to 30200. Remember to press Enter after entering each number. You should end up with:


     
  11. Is the sequence of pro0029 embedded in this region of the ss120 chromosome? The gene frame said the gene goes from coordinates 29973 to 30119. Compare the sequence in the pro0029 display window with the sequence in the ss120 display window. Can you find one within the other?
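
The comparison in steps 10 and 11 can be sketched in Python on a made-up sequence. This is only an illustration under stated assumptions: the toy chromosome and its coordinates are invented, `slice_by_coordinates` is a hypothetical helper (not a BioBIKE function), coordinates are taken to be 1-based and inclusive, and the gene is assumed to read in the same direction as the displayed chromosome sequence.

```python
def slice_by_coordinates(sequence: str, start: int, end: int, first_coord: int = 1) -> str:
    """Return the stretch from `start` to `end`, both included.

    `first_coord` is the chromosome coordinate of the first character of
    `sequence` (1 for a whole chromosome, larger for a displayed window).
    """
    return sequence[start - first_coord : end - first_coord + 1]

# A made-up 14-nucleotide "chromosome" with a "gene" at coordinates 5 to 10.
chromosome = "AAAACCCGGGTTTT"
gene = slice_by_coordinates(chromosome, 5, 10)

# A window with some context on either side, like FROM 29901 TO 30200.
window_start = 3
window = slice_by_coordinates(chromosome, window_start, 12)

# The gene should sit inside the window, offset by the window's start.
position_in_window = window.find(gene)            # 0-based position
print(gene, window, position_in_window + window_start)

# Same coordinates, applied to the window instead of the whole chromosome.
assert slice_by_coordinates(window, 5, 10, first_coord=window_start) == gene
```

By the same reasoning, in a window that starts at coordinate 29901, a gene starting at 29973 would begin at the 73rd character displayed (29973 - 29901 + 1).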


PROBLEM 1:

If you understand how coordinates work, then you should be able to go to Pro0001, get its coordinates, and find its sequence in the chromosome. Display the first 500 nucleotides of ss120. Then, using the coordinates you found for Pro0001, find the sequence of the beginning of the gene. To check whether you're right, display the sequence of the gene and compare the two sequences.


That's it for the preliminaries!