There isn’t one ‘right’ design for a microarray experiment. You will choose different designs for the same scientific question depending on resources and long-range plans. Experienced researchers know that the art of experimental design is to balance scientific interest with experimental skill, reagent cost, and assay sensitivity, using available equipment. Some common sense principles apply to microarrays as to all scientific experiments; but too often these well-known principles are ignored in the enthusiasm for new technology.

It is hard to predict where an investigation will take you; if you are fortunate, what starts off as a one-off study grows into a whole programme of investigation. In most laboratory studies researchers take care to keep their protocols consistent from one run to the next; surprisingly, many don't apply the same scientific common sense to microarray studies. Gene expression measures are the outcomes of many delicately balanced processes, and many factors can tip the balance and distort the measures enough to make them non-comparable. Technicalities such as differences in RNA preservation method, the protocols and times of hybridisation, and protocols for generating cDNA can introduce systematic differences comparable in size to the biological differences you wish to detect.

Many microarray researchers have been surprised to find that arrays processed at different times or by different technicians often show pronounced batch differences. It is therefore important to process all the arrays under conditions as close to identical as possible or, if that is impossible, to randomise the assignment of samples to processing conditions. Many researchers near large cities find that they cannot use Cy5 dye in the summertime because ozone bleaches it, so it is worth planning hybridisations ahead or using other dyes (such as Alexa dyes) that are less labile.

If you plan to do a series of two-color hybridisations,
it is worth the hassle to prepare enough common reference cDNA in one batch to serve for all experiments.
Chip failures are common, and it is wise to prepare more labelled cDNA than you expect to use.
In the early days of microarray studies there was an emphasis on efficient two-color hybridization schemes, such as loop designs.
Data from such designs, analyzed correctly, will give the most precise measures possible, provided *all chips are good*.
Unfortunately these designs lose a lot of information if a single hybridisation fails.
In my opinion the extra efficiency of these designs isn't worth the risk
unless you can set aside samples to be re-hybridised quickly to chips from the same batch, which is usually not possible.

There is no one 'right' answer to this question. Most researchers don't want to do any more replicates than necessary, but it isn't always clear what 'necessary' really means. A useful way to think about sample size in a comparison of two sets of samples (e.g. treated vs. control) is to:

- estimate how many genes are likely to be changed to a degree that matters (e.g. by a factor of two-fold in the assay)
- specify what fraction of those you want to be able to identify
- estimate the sample-to-sample variability (noise) in your measures

Several practical points complicate these estimates:

- microarray assays usually underestimate true fold change, because a large part of their signal comes from cross-hybridisation;
- chips from different manufacturers have different characteristic variability, as do different labs using the same chip;
- most labs sometimes produce failed or outlier arrays (see Quality), and these will be dropped.
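A toy version of this bookkeeping makes the three steps concrete. The 400 truly changed genes and 50% detection power below are made-up illustrative inputs that you would replace with your own estimates:

```python
# Toy bookkeeping for the three sample-size steps above.
# All inputs are hypothetical placeholders for your own estimates.
n_genes = 40_000        # genes on the array
n_changed = 400         # step 1: genes you guess are changed by two-fold or more
power = 0.5             # steps 2-3: chance of detecting each changed gene,
                        #   given your noise level and significance threshold
alpha = 1 / n_genes     # per-gene p-value threshold (Bonferroni-style)

expected_hits = n_changed * power                  # expected true positives
expected_false = (n_genes - n_changed) * alpha     # expected false positives

print(f"expect ~{expected_hits:.0f} true and ~{expected_false:.1f} false positives")
```

Playing with the inputs shows how quickly the expected yield falls if the power is low or the changed fraction is small.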

To be definite let's consider trying to identify genes with a fold change of two in the assay
(which might correspond to a real fold change of four or more) using a microarray with 40,000 genes.
A two-fold change corresponds to a difference of 1.0 when the measures are transformed to a log2 scale.
A typical microarray measure from a typical lab has an error standard deviation of about 20%
(that corresponds to a standard deviation of 0.25 in log2 measures).
Thus we want to be able to detect an effect size of about 4 standard deviations.
Suppose we want to pick a threshold for significance that limits the expected number of false positives to about 1;
using the Bonferroni criterion (see Selecting Genes)
we would set the p-value criterion to be 1/40,000 = 0.000025.
Suppose we compare two samples of size 5:
If there is no true difference for a particular gene,
then 0.000025 is the probability that the *t*-statistic (the difference of the two sample means
divided by its estimated standard error) would exceed 8.63 in magnitude by chance.
Therefore *t* = 8.63 is the threshold at which we would declare a statistically significant difference.
If the true effect size is 4 standard deviations, the *t*-statistic follows a noncentral
*t*-distribution with noncentrality parameter 4 * sqrt( 5 * 5 / (5 + 5) ) = 6.32,
and the probability that it exceeds the threshold is only about 0.20.
Therefore we would expect to detect only about 20% of all genes whose fold change is two or greater,
using a design with five samples per group.
If we use six samples per group then the two-sided 0.000025 point is *t* = 7.33,
the noncentrality parameter rises to 6.93, and the power roughly doubles, to a little under 50%;
about eight samples per group are needed to bring the power up to roughly 90%.
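This calculation can be sketched with scipy's noncentral *t*-distribution. The array size, log2 noise SD, effect size, and group sizes are those used above; `power_two_sample` is a name of my own, and the sketch assumes the standard equal-variance two-sample *t*-test:

```python
# Sketch of the worked power calculation above: Bonferroni threshold on
# 40,000 genes, two-sample t-test, effect of 4 error SDs (two-fold change
# with a log2 SD of 0.25).
from scipy import stats

n_genes = 40_000
sd_log2 = 0.25          # per-sample error SD on the log2 scale
effect_log2 = 1.0       # two-fold change = 1.0 on the log2 scale
alpha = 1 / n_genes     # Bonferroni: ~1 expected false positive

def power_two_sample(n_per_group):
    """Threshold and power for a two-sided equal-variance two-sample t-test."""
    df = 2 * n_per_group - 2
    t_crit = stats.t.ppf(1 - alpha / 2, df)       # two-sided threshold
    # noncentrality: effect size (in SDs) times sqrt(n1*n2/(n1+n2))
    ncp = (effect_log2 / sd_log2) * (n_per_group / 2) ** 0.5
    # upper-tail probability of the noncentral t; the lower tail is negligible
    return t_crit, stats.nct.sf(t_crit, df, ncp)

for n in (5, 6, 8):
    t_crit, power = power_two_sample(n)
    print(f"n={n}: t threshold = {t_crit:.2f}, power = {power:.2f}")
```

Varying `n` in the loop is an easy way to see how quickly power responds to one or two extra replicates when the significance threshold is this stringent.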

A final important consideration is that almost all microarray data has correlated errors (see Selecting Genes). The only reliable ways to select significantly changed genes in correlated data require resampling or permutation methods. Permutation methods require at least 6 replicate samples in each condition.
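A minimal sketch of a permutation test for a single gene, assuming six replicates per condition; the data below are made up for illustration:

```python
# Permutation test for one gene: compare 6 treated vs 6 control log2
# measures by exhaustively relabelling the pooled samples.
# The measurements here are invented for illustration.
import itertools
import numpy as np

treated = np.array([2.1, 1.8, 2.4, 2.0, 1.9, 2.3])
control = np.array([1.0, 1.2, 0.8, 1.1, 0.9, 1.3])
observed = treated.mean() - control.mean()

pooled = np.concatenate([treated, control])
n = len(treated)
count = 0
total = 0
# enumerate all ways to relabel 6 of the 12 samples as "treated"
for idx in itertools.combinations(range(len(pooled)), n):
    mask = np.zeros(len(pooled), dtype=bool)
    mask[list(idx)] = True
    diff = pooled[mask].mean() - pooled[~mask].mean()
    if abs(diff) >= abs(observed):
        count += 1
    total += 1

p_value = count / total   # two-sided permutation p-value
print(f"{total} relabelings, p = {p_value:.4f}")
```

With six per group there are C(12,6) = 924 distinct relabelings, so the smallest attainable two-sided p-value for a single gene is 2/924, about 0.002; with five per group it would be 2/252, about 0.008. This granularity is one reason permutation methods need at least six replicate samples per condition.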

A more sophisticated approach to estimating sample size, based on detailed power calculations using variability observed in public microarray data, is implemented as a web tool at David Allison's Microarray Power Atlas site. As I understand the calculations on that site, they assume that errors are independent.