BNFO 301 
Introduction to Bioinformatics
Gene Annotation
Spring 2013 

Gene Annotation

  A. Why annotate genes?
  B. What genes to annotate?
  C. How to annotate genes? (technical matters)
      C.1. Registration
      C.2. Examining the existing evidence for assertions
      C.3. Accessing the annotation of a gene - Which field?
      C.4. Editing a field
  D. How to annotate genes? (scientific matters)
  E. When to annotate genes?

A. Why annotate genes?

As you already know, a major purpose of your research project (indeed, of the course) is to acquaint you with the notion that you can discover things with your own efforts, things never before seen by anyone else. I hope this realization does you, personally, some good, but it should also make the world a better place. Unfortunately, the world will be unaware of your discoveries unless you make the good news public.

The usual way to make the world aware of a discovery is through publication, but there's not enough time for that, so you'll resort to a quicker method for a smaller parcel of information: online annotation of genes.

B. What genes to annotate?

If yours is a protein-centered project, then it is pretty clear what genes you can annotate. You probably began with an article that reported results concerning specific proteins within the class of proteins you're interested in. If the genes encoding these proteins are in BioBIKE, then they are prime candidates for annotation.

Those with protein-centered projects are also likely to generate a list of proteins that are probably in the class of interest by reason of shared characteristics with proteins proven experimentally to be in that class. These proteins are also worthy of annotation.

Those with DNA-centered projects have a more complicated task, since PhAnToMe/BioBIKE organizes information around genes, not segments of DNA. However, in some cases you will be able to find a reasonable gene to latch on to. For example, if you discover regulatory sites of some sort upstream of certain genes, you can annotate those genes (as described below) with that information. There may be some DNA-centered projects that seem utterly devoid of anything to annotate. If you think your project falls into that category, see me.

C. How to annotate genes? (technical matters)

C.1. Registration
The BioBIKE instance in PhAnToMe (and only that instance) allows registered users to alter the database, adding to or correcting the annotations of genes. When you first logged into PhAnToMe/BioBIKE, you were given the opportunity to enter as either registered or unregistered. Right now, there is no easy way for you to alter that status on your own.

If you don't know whether you are registered, here's how you can tell:

  • In PhAnToMe/BioBIKE, mouse over the GENES/PROTEINS menu and select INFORMATION-ABOUT-GENE/S
  • In the gene-of-protein box of the function, type all4312 and press Enter
  • Execute the function. You should get the annotation window of the gene all4312.
  • Click any yellow box
  • If you are not registered, a popup window will appear, saying "You must be logged in through..."
  • If you are registered, the box will expand to an editing interface. Click CANCEL.
You need to be registered to annotate genes. If you are not registered, then contact me, and I can change your status.

C.2. Examining the existing evidence for assertions

  • Mousing over the Evidence icon next to a field will pop up the current evidence regarding the assertion
  • Clicking the E icon may bring you to the abstract of an article related to the evidence.
  • Mousing over the History icon will pop up the name of the last entity to modify the assertion
  • Clicking the H icon will display all editing activity regarding the field

C.3. Accessing the annotation of a gene - Which field?
When you have established what gene you want to annotate, put the name of that gene into the gene-of-protein box of INFORMATION-ABOUT-GENE/S and execute the function. The gene's annotation page should appear. Clicking on a yellow box associated with one of the fields will open that field for editing. But which field to choose?

  • Most of you will focus on the Main annotation, which is a summary of the supposed function of the protein encoded by the gene. If the gene you're annotating has a listed function you believe to be wrong (e.g. "hypothetical protein") or too general (e.g. "phage protein"), you can change it, as described below.
     
  • If you have some reason to believe that the start codon is in error, you can change either the From or the To field (depending on whether the gene is on the forward or backward strand).
     
  • You may have information concerning the Genetic Name of a gene. The genetic name is a short symbol, typically four letters in length, such as DnaA. Do not confuse the genetic name with the gene's description, which is one or more full words (e.g. beta-galactosidase).
     
  • If you happen to have information concerning the protein's membrane spans or signal sequence, you can modify these fields to contain the coordinates of the region. You might have gotten relevant information through the DOMAINS-OF function.
     
  • Those of you identifying regulatory sites can put what you've found in the Regulation field of the annotation. Clicking the plus-icon next to Regulation will cause a new section of the page to appear below. Clicking (add another) will open up a new editing interface.
     
  • You might conceivably have reason to edit the Physiological Role field, if you know from a laboratory experiment what effect the encoded protein has on the physiology of the phage or bacterium.
     
  • You might conceivably have reason to edit the Operon Structure field, if you know from a laboratory experiment, or suspect from your own or others' genome analysis, that the gene is transcribed together with other genes. In that case, you would list the extent of the operon, i.e. which genes are part of the same transcriptional unit.
     
  • You might conceivably have reason to edit the Mutants field, if you know of a laboratory experiment that describes the nature of a mutant of the gene.
     
  • It is not likely that you will have any reason to alter the Aliases or Subsystem Role fields. Please don't touch the latter without special direction.
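
The remark above about start codons and the From and To fields can be made concrete with a small sketch. It is only an illustration, not part of PhAnToMe/BioBIKE: the function names and the dictionary layout are hypothetical, and it assumes that From is always the smaller coordinate and To the larger, both measured along the forward strand, so that a gene on the backward strand begins at its To coordinate. Check that assumption against the annotation page of your own gene before relying on it.

```python
def start_codon_field(strand: str) -> str:
    """Return the name of the coordinate field that holds the start codon.

    Assumption (not confirmed by the handout): From < To, both measured
    along the forward strand, so a backward-strand gene starts at To.
    """
    if strand == "forward":
        return "From"
    if strand == "backward":
        return "To"
    raise ValueError(f"unknown strand: {strand!r}")


def move_start(gene: dict, new_start: int) -> dict:
    """Return a copy of a (hypothetical) gene record with its start moved."""
    updated = dict(gene)  # leave the original record untouched
    updated[start_codon_field(gene["strand"])] = new_start
    return updated


# Made-up coordinates, for illustration only
gene = {"strand": "backward", "From": 100, "To": 400}
print(move_start(gene, 430))
```

Under this assumption, correcting the start of a backward-strand gene changes only its To field, while the From field (the stop end of the gene) stays put.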

C.4. Editing a field
Clicking a yellow field should bring up the editing interface. The SAVE button will be green (available for use) only after justification for the annotation has been supplied. You can always get out of the interface without saving any change by clicking CANCEL. To make a change:

  • Modify the field (in the dark yellow box) as you see fit
     
  • Click one of the four radio buttons under Evidence inferred from. If evidence for your assertion comes from a laboratory experiment, click that button. If it comes from genome analysis from a computer program, click Computation. If it comes from the results of a computer program as interpreted by a human (possibly yourself) or from the human's own analysis, click Human/Computation.
     
  • In the (way too small) Evidence window, provide a brief justification for the change, as described in the next section.
     
  • If your justification comes from a published article, provide in the Evidence Link/PubMed ID subfield a link to the abstract of that article (so that it is accessible even by those in institutions without electronic subscriptions). Alternatively, provide the PubMed ID (e.g. 11266551), which is shown just below the abstract on a PubMed page. You can search for the appropriate PubMed abstract by clicking Go to PubMed.
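
If you have only the PubMed ID and want the link to the abstract, the two are easy to convert. The sketch below is my own illustration, not a BioBIKE function: the function name is hypothetical, and it assumes the current public PubMed address pattern, https://pubmed.ncbi.nlm.nih.gov/ followed by the ID.

```python
def pubmed_abstract_url(pubmed_id: str) -> str:
    """Build a link to the PubMed abstract for a given PubMed ID.

    Assumes the public address pattern https://pubmed.ncbi.nlm.nih.gov/<ID>/.
    """
    pmid = pubmed_id.strip()
    # A PubMed ID is a plain run of digits; reject anything else early
    if not (pmid.isascii() and pmid.isdigit()):
        raise ValueError(f"not a PubMed ID: {pubmed_id!r}")
    return f"https://pubmed.ncbi.nlm.nih.gov/{pmid}/"


# The example ID mentioned in the list above
print(pubmed_abstract_url("11266551"))
```

Either form is acceptable in the Evidence Link/PubMed ID subfield; the point is only that the reader can reach the abstract.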

D. How to annotate genes? (scientific matters)

Whatever field you modify, the justification for your assertion should be an observation resulting from a laboratory or computational experiment. That experiment should be briefly described in the Evidence window in just enough detail that the reader can get an idea of the kind of experiment that was conducted. Leave further details for the article appearing in the Evidence Link/PubMed ID subfield. It is not sufficient merely to copy an assertion from an article.

If you report evidence from an article, one important step is to determine whether the gene/protein used by the authors is the same as the gene you're annotating. That determination will generally come from a comparison of sequences. Do not rely on the coincidence of the function of the protein and the gene's annotation.
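
As a rough illustration of what "a comparison of sequences" means, the sketch below scores how alike two protein sequences are. It is a crude stand-in that uses Python's general-purpose difflib module, not a proper alignment program, and the function names and the two short sequences are made up for the example. For a real annotation you would use an alignment tool and look at the alignment itself.

```python
from difflib import SequenceMatcher


def normalize(seq: str) -> str:
    """Strip whitespace and a trailing stop symbol (*), and uppercase."""
    return "".join(seq.split()).upper().rstrip("*")


def similarity(seq_a: str, seq_b: str) -> float:
    """Return a 0-to-1 score of how alike two sequences are.

    Uses difflib's matching-blocks ratio, which is only a rough proxy
    for percent identity from a true sequence alignment.
    """
    a, b = normalize(seq_a), normalize(seq_b)
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


# Made-up sequences: the article's protein vs. the protein of your gene
from_article = "mkt ayi\nAKQR"
from_database = "MKTAYIAKQR*"
print(similarity(from_article, from_database))
```

A score of 1.0 means the two sequences are identical after normalization. Anything lower calls for a closer look before you attribute the article's results to your gene.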

You can see what I think are reasonable justifications by going to the annotation pages for all4312 and T4p014.

E. When to annotate genes?

Now may not be a bad time.