An Opinionated Guide to Microarray Data Analysis

Normalization of ChIP-chip Data

Most of the issues that arise in expression normalization are also issues for other array assays such as ChIP. However, ChIP arrays raise some normalization issues of their own, such as compensating for differences in enrichment and for other forms of technical variation in the IP and PCR steps. Although many popular ChIP assays use competitive two-color hybridization, the widely used loess normalization for two-color expression arrays distorts the signals. Loess fits the log-ratio against the average log-intensity of the two channels, and in a ChIP experiment that average is itself correlated with the signal. Furthermore, the distribution of log-ratios usually has two branches, and the probes in these branches seem to have different characteristics. Because ChIP and MeDIP data are so difficult to normalize, some major innovations in array normalization were tried first on ChIP arrays, and are still best known there.
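
The point about loess can be illustrated with a small simulation. This is only a sketch on made-up data: the function names, the 5% bound fraction, the enrichment of 3 log2 units, and the noise levels are all assumptions chosen for illustration, not values from any real experiment. It shows that when only the IP channel carries enrichment, the average log-intensity A moves with the true signal, so a curve fitted to the log-ratio M as a function of A would absorb part of that signal.

```python
import numpy as np

def ma_values(ip, inp):
    """Return log-ratio M and average log-intensity A for two channels."""
    log_ip = np.log2(ip)
    log_in = np.log2(inp)
    return log_ip - log_in, 0.5 * (log_ip + log_in)

def simulate(n_probes=5000, frac_bound=0.05, enrichment=3.0, seed=0):
    """Simulate a two-color ChIP array (all parameters are illustrative)."""
    rng = np.random.default_rng(seed)
    # Probe-specific baseline shared by both channels (log2 scale).
    base = rng.normal(8.0, 0.5, n_probes)
    # A small fraction of probes is truly enriched, in the IP channel only.
    bound = rng.random(n_probes) < frac_bound
    signal = np.where(bound, enrichment, 0.0)
    log_in = base + rng.normal(0.0, 0.2, n_probes)
    log_ip = base + signal + rng.normal(0.0, 0.2, n_probes)
    m, a = ma_values(2.0 ** log_ip, 2.0 ** log_in)
    return m, a, signal

m, a, signal = simulate()
# Correlation between the loess covariate A and the true enrichment.
r = np.corrcoef(a, signal)[0, 1]
print(f"correlation of A with true signal: {r:.2f}")
```

In an expression array most genes are unchanged between channels, so A is roughly independent of the log-ratio; here it is not, which is why a normalization conditioned on A tends to flatten real enrichment.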

The most successful normalizations for ChIP-chips use probe sequence characteristics to try to predict the signal intensities on each chip; the predicted part of the signal is considered to be technical artefact, since the probe sequences are unrelated to the biological process being measured. The MAT software package [1] uses this idea to normalize Affymetrix™ tiling arrays, but no generic package is available for most chip types. Furthermore, MAT builds its statistical model by assuming that only a few probe measures reflect real signal; this assumption holds for arrays assaying transcription factor binding sites, but not for arrays assaying chromatin modifications. I have received calls from labs trying to use MAT on arrays that assay chromatin modifications, and they are very confused about why they get such poor results.
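
The sequence-based idea can be sketched in a few lines. This is a deliberately simplified stand-in and not MAT's actual model: it regresses log-intensity on nucleotide counts only, and the function names and simulated probes are hypothetical. The part of the intensity predicted from sequence is treated as technical artefact and subtracted.

```python
import numpy as np

def sequence_features(probes):
    """Design matrix: intercept plus counts of A, C and G per probe.

    The T count is omitted because, for fixed-length probes, it is
    determined by the other three counts and the intercept.
    """
    rows = []
    for seq in probes:
        seq = seq.upper()
        rows.append([1.0, seq.count("A"), seq.count("C"), seq.count("G")])
    return np.asarray(rows, dtype=float)

def sequence_normalize(probes, log_intensity):
    """Subtract the sequence-predicted part of the log-intensity."""
    X = sequence_features(probes)
    y = np.asarray(log_intensity, dtype=float)
    # Ordinary least squares fit over all probes on the chip.
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef, coef

# Illustrative data: random 25-mers whose raw signal rises with GC content.
rng = np.random.default_rng(1)
probes = ["".join(rng.choice(list("ACGT"), size=25)) for _ in range(500)]
gc = np.array([p.count("G") + p.count("C") for p in probes])
raw = 6.0 + 0.1 * gc + rng.normal(0.0, 0.3, size=500)

adjusted, coef = sequence_normalize(probes, raw)
```

Fitting on every probe builds in the assumption discussed above, that most probes measure background. Where a large share of probes carries real signal, as with chromatin modifications, the fit would absorb some of that signal into the "technical" component.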