BIOS 667: Advanced Data Analysis
Course Objectives
Biostatisticians increasingly need statistical methods for discovering patterns and relationships in large volumes of data. In this course, students will learn statistical methods used to uncover the underlying structure of large, complex datasets. Specific topics include bootstrap methods, discriminant analysis, k-nearest neighbors, classification and regression trees, and random forests. By the end of the course, students will be able to apply these methods to data using the R/S-Plus programming environment.
Data Mining and Pattern Recognition
This course focuses on methods tied to machine learning, i.e., methods that seek to discover structure from the evidence of the data alone. Most of these methods are computationally intensive, so the analyst must become proficient in an efficient statistical programming environment. This course therefore requires the use of the R/S-Plus programming environment.
Course Time: Monday/Wednesday 1:30PM–2:50PM
Room: Theater Row, Room 1015
Office hours: Monday/Wednesday 3:00PM–4:00PM
Office: Theater Row, 3022
Required Text
Trevor Hastie, Robert Tibshirani, Jerome Friedman (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition. Springer, New York.
Supplemental materials are posted on Blackboard.
Prerequisites: BIOS 513, 514, and 524
Grading: Grades will be based on homework assignments consisting of short-answer, statistical computing, and problem-solving exercises. A final project is also required. The final grade is weighted as follows:
- 90% Homework
- 10% Term Paper
Course Outline
I. Introduction to the R Programming Environment
II. Review of Linear and Logistic Regression
III. Penalized models
IV. Bayes' rule
V. Discriminant analysis
VI. Kernel density estimation
VII. k-nearest neighbors
VIII. Model assessment and cross-validation
IX. Classification and regression trees
X. Random forests
XI. Bootstrap methods