BNFO301 – Introduction to Bioinformatics
Introduction to Microarrays
Consider some of the most fundamental questions in biology:
  • How does one cell or tissue type differ from another?
      - What makes a cell tumorigenic?
      - How is smooth muscle different molecularly from skeletal muscle?
  • How does an organism respond to external events?
      - What are brain cells doing when they are creating a new memory?
      - How do liver cells change when challenged by a drug?
  • What is the basis for changes in an organism over time?
      - What is the process through which an embryo develops into a mature organism?
      - What marks the process of apoptosis (programmed cell death)?
  • What is the mechanism by which key regulatory proteins exert their effects?
      - How does the loss or mutation of FMR lead to the array of observable symptoms
        collected under the umbrella of Fragile X syndrome?
These questions can be addressed by comparing the state of the cell, tissue, or organism under two conditions: at different developmental states (e.g. normal vs tumorigenic), in different external environments (e.g. with or without drug), at different moments in time (e.g. throughout the time course of development), and in different genetic backgrounds (e.g. with or without a specific mutation). 

But what is “state”? Included in that term is the collection of all (small) molecules within a cell, tissue, or organism, studied through the tools of metabolomics. We’d also like to know the collection of all proteins, the molecules that most directly affect cell behavior, studied through the tools of proteomics. The set of proteins is determined in part by, and can be monitored indirectly through, the collection of messenger RNAs, studied through the tools of transcriptomics. The set of genes and noncoding DNA can be studied through the tools of genomics, but that approach cannot address the questions listed above, since almost all cells within an organism possess the same DNA.

At present, the best-developed tools are those of transcriptomics. It’s far easier to monitor RNA than protein, and the chief tool for doing so is the microarray. To see what a microarray is and how it can be used, look at any of the websites below, which offer animations and illustrations:

Animation of procedure to use microarrays (short animation)
     Malcolm Campbell, Davidson College

Animation of procedure to make and use microarrays (long animation)
     Genetic Science Learning Center, University of Utah

Anatomy of a Comparative Gene Expression Study (non-animation)
     Jeremy Buhler, Washington University in St. Louis

There are two general kinds of microarrays: those that give absolute measures of RNA abundance (most of them marketed by Affymetrix) and those that give relative measures, comparing two conditions. The two animations illustrate the latter, generally less expensive kind, termed spotted arrays, and we’ll focus on that kind from here on. The process of spotting is intrinsically variable: spots may contain different amounts of DNA, their shapes may vary, and the amount of DNA lying outside the spotted area may differ from one region of the microarray to another. So absolute levels of RNA measured by a spotted array are not as meaningful as you might hope. Useful analysis comes instead from comparing the binding of RNA isolated under one condition to the binding of RNA isolated under the other condition, both hybridized to the same microarray. The two RNA samples are distinguished by the color of their fluorescent tags, as illustrated in the animations.
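A minimal sketch of why comparing the two channels on the same spot helps, under a deliberately simplified model that is an assumption of this example: each channel's brightness is the amount of DNA printed in the spot times the abundance of the matching labeled RNA, with no background or dye effects. The numbers are invented. Whatever the amount of DNA in a spot, it multiplies both channels equally and cancels in the ratio. Taking log2 of the ratio is a common convention that makes a 2-fold increase (+1) and a 2-fold decrease (-1) symmetric.

```python
import math

def log2_ratio(red: float, green: float) -> float:
    """Log2 of the red/green intensity ratio for a single spot."""
    return math.log2(red / green)

def spot_log_ratios(dna_amounts, rna_red, rna_green):
    """Simplified model (assumption): intensity = DNA in spot x labeled RNA abundance.

    Returns log2(R/G) for spots printed with different amounts of DNA.
    """
    ratios = []
    for dna in dna_amounts:
        red = dna * rna_red      # brightness in the red channel
        green = dna * rna_green  # brightness in the green channel
        ratios.append(log2_ratio(red, green))
    return ratios

# Invented example: RNA 4x more abundant in the red-labeled sample,
# printed in three spots that received 0.5x, 1x, and 3x as much DNA.
print(spot_log_ratios([0.5, 1.0, 3.0], rna_red=4.0, rna_green=1.0))  # [2.0, 2.0, 2.0]
```

The raw red intensities differ 6-fold across the three spots, yet every spot reports the same log2 ratio of 2 (a 4-fold difference).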

Enough generalities. Let's look at a real problem addressed by real microarrays. 

To plants and photosynthetic bacteria, light is life. No light, no food. But too much light... death. That's because if more light is absorbed by photosynthetic pigments than can be harnessed or effectively dissipated, the excess energy can find its way to oxygen, creating reactive species (superoxide, peroxides, and free radicals) that can damage the cell. Under conditions of high light, photosynthetic cells have to radically retool to prevent this fate. 

But how? Which proteins are down-regulated in response to high light? How does the cell alter its metabolism to cope with the stress imposed by the new environment? What regulatory proteins sense the condition of high light and how do they alter gene expression? These are basic questions that microarrays can help answer.

To find some microarray experiments that address this problem:

  • Go to the Stanford MicroArray Database.
  • Click on Public Login (on the left).
  • Under Category, highlight stress. Click Display data.
  • Too many experiments! Go back to the menus, and under SubCategory, highlight light-treatment. Click Display data.
  • The list of experiments is certainly difficult to understand: the names were chosen by the experimenters for their own use, not for ours. The first four experiments (39587-39590) ask how gene expression is affected when the gene encoding the regulatory protein DspA is disrupted. Skip over those for now.
  • The next four experiments (39583-39586) examine the response of wild-type (wt) cells to high light (HL) intensity for periods of 0.5, 1.0, 3.0, and 6.0 hours. That sounds pertinent! Consider Experiment 39584, examining the effect of 1 hr of high light intensity. In this experiment the RNA level in cells exposed to 1 hr of high light intensity is compared to the RNA level in cells maintained continuously under low light intensity.
  • Look at the Options column on the row of that experiment. You'll see seven icons. The sixth icon is a multicolored box (representing the microarray image). Click on it (and be prepared to wait a while). You should see a microarray in front of you.
SQ1. How many spots are there on this microarray?

SQ2. How would you describe the range of characteristics of the spots?

SQ3. What is the significance of each kind of spot? For example, what does red mean? What does green mean? (Not sure? Stay tuned!)

Find a particularly green spot and click on it. After a while a new screen will appear, showing a blow-up of the spot plus information about the gene whose DNA is present in the spot and information about the fluorescence emanating from the spot. There's plenty -- too much -- information here. For now focus on just the following:
Biological Information (left column)
  • Sequence name (this will enable us to refer to the same gene in BioBIKE)
  • Subcategory (this tells you the general class to which the protein belongs)
  • Product (this tells you the specific function of the protein, if known)
Spot Information (right column)
  • Channel 1 and 2 intensities (mean)
  • Channel 1 and 2 backgrounds (median)
  • Channel 1 and 2 nets (mean)
  • Spot Flag
  • Number of spot pixels
  • G/R mean
  • R/G mean
Note the quantities above (copy them down someplace) and repeat the process on another spot, this time red. Do this with a few other spots of your choice.
SQ4. What is the significance of the Channel 1 and 2 intensities?
SQ5. What do you think is the significance of the backgrounds? (More on this in a moment.)
SQ6. Where does the net come from?
SQ7. How many pixels are there per spot? How does this number relate to the size of the white box at the top of the left column?
SQ8. Where do the G/R mean and R/G mean come from?
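If you want to test your answers to SQ6 and SQ8 against the numbers you copied down, a few lines of Python will do it. This is a sketch, not SMD's documented procedure: the dictionary keys are names invented here, the values are placeholders to be replaced with your own, and "net = mean intensity minus median background" is just one hypothesis to check. If a ratio you compute doesn't match the G/R or R/G mean shown on the site, one possible reason is that the site reports normalized ratios.

```python
import math

# Values copied down for ONE spot. These numbers are PLACEHOLDERS, not real
# SMD data: replace them with the values you recorded from the spot page.
recorded = {
    "ch1_intensity_mean": 1500.0,
    "ch1_background_median": 200.0,
    "ch1_net_mean": 1300.0,
    "ch2_intensity_mean": 600.0,
    "ch2_background_median": 180.0,
    "ch2_net_mean": 420.0,
}

def matches(candidate: float, value: float, rel_tol: float = 0.01) -> bool:
    """True if a candidate formula reproduces a recorded value within rel_tol (1% by default)."""
    return math.isclose(candidate, value, rel_tol=rel_tol)

# One hypothesis to test for SQ6: net = mean intensity - median background.
for ch in ("ch1", "ch2"):
    candidate = recorded[f"{ch}_intensity_mean"] - recorded[f"{ch}_background_median"]
    verdict = "match" if matches(candidate, recorded[f"{ch}_net_mean"]) else "no match"
    print(ch, "net hypothesis:", verdict)

# For SQ8: compute the ratio of nets both ways, then compare each with the
# G/R mean and R/G mean you recorded for this spot.
net1, net2 = recorded["ch1_net_mean"], recorded["ch2_net_mean"]
print(f"ch1/ch2 = {net1 / net2:.3f}   ch2/ch1 = {net2 / net1:.3f}")
```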
Now for some strangeness. Go back to the picture of the microarray and identify the 19th spot on the top row (it will be SLR0311).
SQ9. From your knowledge of how microarrays are made, how do you explain its peculiar form? Notice that many other spots have the same form.
Go to a different microarray, one for... never mind the experiment. Just look at it. Scroll down to the bottom of the array and look at the middle of the bottom row. There you'll see something awful. Click on a spot obscured by the noise. Surely some of these spots contain no useful information. But the computer can't see the image, just numbers.
SQ10. What in the Spot Information for this spot could alert you to the problem?
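Whatever fields you settle on for SQ10, the underlying point is that software can only screen spots with rules applied to numbers. The sketch below shows what such a rule-based filter might look like. The field names, the meaning given to the flag (0 = unflagged), and both thresholds are assumptions chosen for illustration, not SMD's actual criteria.

```python
from dataclasses import dataclass

@dataclass
class Spot:
    name: str
    flag: int              # assumption: 0 = not flagged, anything else = flagged
    n_pixels: int
    ch1_intensity: float   # mean
    ch1_background: float  # median
    ch2_intensity: float   # mean
    ch2_background: float  # median

# Hypothetical thresholds, for illustration only.
MIN_PIXELS = 20
MIN_SIGNAL_TO_BACKGROUND = 1.5

def problems(spot: Spot) -> list[str]:
    """Return the reasons (if any) to distrust this spot's numbers."""
    reasons = []
    if spot.flag != 0:
        reasons.append("flagged")
    if spot.n_pixels < MIN_PIXELS:
        reasons.append("too few pixels")
    for ch, fg, bg in (("ch1", spot.ch1_intensity, spot.ch1_background),
                       ("ch2", spot.ch2_intensity, spot.ch2_background)):
        if fg - bg <= 0:
            reasons.append(f"{ch} net <= 0")  # no signal above background
        elif bg > 0 and fg / bg < MIN_SIGNAL_TO_BACKGROUND:
            reasons.append(f"{ch} weak relative to background")
    return reasons

# Hypothetical spots with invented values.
good = Spot("spot_A", 0, 80, 1500.0, 200.0, 600.0, 180.0)
noisy = Spot("spot_B", 0, 80, 400.0, 420.0, 350.0, 300.0)
print(problems(good))   # []
print(problems(noisy))  # ['ch1 net <= 0', 'ch2 weak relative to background']
```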
There's a great deal more that you can do on the Stanford MicroArray Database site to analyze microarray data, but we will generally work on BioBIKE, which has its own capabilities. Just to connect what you've done on SMD with what's available on BioBIKE, try this:
  • Log onto BioBIKE. [NB. At the moment, what follows works on the Stanford BioBIKE site but not on the VCU site, but there are momentous changes in store for the VCU site!]
  • Enter the following:
        (RATIO-OF slr0612 FROM Hihara2001 COLUMN 2)
  • Repeat the same command but with gene names you found in your perusal of the microarray spots.

  • SQ11. (= SQ3 revisited) Putting together the results of your investigations, what do you think is the significance of red and green? Which color represents RNA from cells exposed to high light, and which represents RNA from cells maintained under low light?
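To see what a command like RATIO-OF is doing conceptually, here is a rough Python analogue. It assumes (this is not BioBIKE's documented behavior) that a dataset such as Hihara2001 is a table with one row per gene and one column per experiment, and that columns are counted from 1. The gene names and values are placeholders. The describe function deliberately takes the sample labels as arguments, since deciding which channel is which is exactly what SQ11 asks.

```python
# Hypothetical stand-in for a dataset such as Hihara2001:
# one row per gene, one column per experiment. All values are PLACEHOLDERS.
DATASET: dict[str, list[float]] = {
    "geneA": [1.2, 3.1, 2.4, 1.5],
    "geneB": [0.8, 0.3, 0.4, 0.7],
}

def ratio_of(gene: str, column: int, table: dict[str, list[float]] = DATASET) -> float:
    """Look up the ratio for `gene` in `column` (assumed here to count from 1)."""
    return table[gene][column - 1]

def describe(ratio: float, numerator: str, denominator: str) -> str:
    """Put a ratio into words; which sample is the numerator is up to you (see SQ11)."""
    if ratio > 1:
        return f"{ratio:.1f}-fold higher in {numerator}"
    if ratio < 1:
        return f"{1 / ratio:.1f}-fold higher in {denominator}"
    return "no change"

print(describe(ratio_of("geneA", 2), "sample 1", "sample 2"))  # 3.1-fold higher in sample 1
print(describe(ratio_of("geneB", 2), "sample 1", "sample 2"))  # 3.3-fold higher in sample 2
```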