Issues that arise in high-throughput data analysis

Over the past ten years, genomic and neuroscience technologies have introduced statistical researchers to a host of new problems. I will consider problems of two broad classes. The first class comprises technical issues of estimation. Historically these have been regarded as 'solved' problems, but we have recently become aware that they are not solved at all when many more variables are measured than samples are observed, and that standard estimators can be dramatically improved by exploiting reasonable assumptions about the distributions of the data: recent work on high-dimensional systems has employed both regularization and empirical Bayes approaches. The second class comprises problems of inferring specific kinds of relationships among the many variables observed over small numbers of samples. Again, these have historically been treated as 'solved' problems, both in statistics and, more recently, in computer science via data-mining algorithms. I hope to raise sufficient doubts about these solutions in the 'large P, small N' setting to motivate new kinds of approaches to these complex multivariate inference problems, and to present some examples that take advantage of reasonable assumptions about the system under study.
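As a minimal sketch of the shrinkage idea alluded to above (not the speaker's actual method), the following toy simulation compares the naive estimator with a James–Stein / empirical Bayes estimator when many effects are measured with few observations each. The parameter values, the normal prior, and the positive-part James–Stein rule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: p "genes", each with a true effect drawn from N(0, tau^2);
# we observe one noisy measurement per gene with known noise sd sigma.
p, tau, sigma = 1000, 0.5, 1.0
theta = rng.normal(0.0, tau, size=p)          # true effects
z = theta + rng.normal(0.0, sigma, size=p)    # observed values

# Naive estimator: use each observation as-is.
mse_naive = np.mean((z - theta) ** 2)

# James-Stein shrinkage toward the grand mean, with the shrinkage
# factor estimated empirically from the data themselves.
zbar = z.mean()
shrink = 1.0 - (p - 3) * sigma**2 / np.sum((z - zbar) ** 2)
z_js = zbar + max(shrink, 0.0) * (z - zbar)   # positive-part rule
mse_js = np.mean((z_js - theta) ** 2)

print(mse_naive, mse_js)
```

With many variables and a single noisy observation per variable, the shrunken estimates have markedly lower mean squared error than the raw observations, which is the sense in which "standard estimators can be dramatically improved."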