Dept. of Biostatistics (cross-appointed to the Virginia Institute for Psychiatric and Behavioral Genetics, the Massey Cancer Center, and the Center for the Study of Biological Complexity), at Virginia Commonwealth University
Today’s sophisticated biotechnologies and electronics enable researchers to gather data in quantities unimagined ten years ago. These data acquisition technologies are changing the nature of research in biology and are poised to revolutionize medical diagnosis and treatment. At the same time, the infrastructure of knowledge is changing: a great deal of relevant information is now stored in online databases, which can aid the interpretation of experimental and clinical data.
Biostatisticians must accept the challenge of analyzing and integrating these new data sets. The first challenge is to extract a clear signal from the technologies; many confounding factors, such as technical or physiological artifacts, distort the signals. Then we may test hypotheses about biological organization or mechanisms against the data. Usually we test hypotheses of a common form for many specific items, such as genes or brain regions; these may be simple hypotheses (e.g., which genes change in expression) or more complex ones (e.g., which measures are correlated). Finally, we must take advantage of previous efforts, usually in the form of databases, to constrain and aid our analysis.
We who analyze such data are like the prisoners in Plato’s Cave: with our measures we perceive only a shadow of the reality, and we must infer the reality from the data using our imagination and logic. In my opinion, the best analytic approaches combine statistical subtlety with knowledge of the processes under study.
I work on genomic data (gene expression, genotype, and epigenetic measures) and on neuroscience data, particularly from global methods such as fMRI.
Recently, new technologies such as fMRI, calcium imaging, and voltage-sensitive dyes have enabled the collection of broad swathes of neural activity over time. This is the domain of multivariate analysis, but only recently have a few statisticians begun to develop multivariate methods specific to such data. I am developing methods to identify artifacts in global imaging data such as fMRI, and methods for characterizing the factors underlying the measures obtained from unit data.
Besides working on the technical aspects of microarray and high-throughput sequencing methods, I have been developing methods to characterize biological processes by their genomic signatures. I have also been developing genetic methods to address the cumulative effects of many genes of small effect ('polygenes').
My graduate students are developing methods for RNA-Seq and improved normalization methods for microarrays. We are working on methods for characterizing microbial ecology in the human body and on understanding the developmental roots of liver cancer.
Recent advances in technology have enabled functional neuroscientists to obtain data at high resolution over large areas of the brain or over many cells. These highly multivariate data are our best hope of understanding the dynamic interactions among different brain regions, or among different cells within a region. I develop methods to identify the underlying patterns in these shifting activations.
We have recently completed a large study of gene expression in the developing human brain. I am currently involved in a large RNA-Seq study comparing gene expression in the cortex of psychiatric patients with that of normal individuals.