BIOS 667: Advanced Data Analysis

Course Objectives
Biostatisticians increasingly need statistical methods that uncover patterns and relationships in large volumes of data. In this course, students will learn statistical methods used to discover the underlying structure of large, complex datasets. Specific topics include bootstrap methods, discriminant analysis, k-nearest neighbors, classification and regression trees, and random forests. By the end of the course, students will be able to apply the methods presented to data using the R/S-Plus programming environment.

Data Mining and Pattern Recognition
This course focuses on methods tied to machine learning, i.e., methods that seek to discover structure from the evidence of the data alone. Most of the methods discussed are computationally intensive, so the analyst must become proficient in an efficient statistical programming environment. This course therefore requires the use of the R/S-Plus programming environment.

Course Time: Tuesdays/Thursdays 10:30AM–11:50AM

Room: Theater Row, Room 1015

Office hours: Mondays/Tuesdays 12:00PM–1:00PM

Office: Theater Row, 3022

Required Text
Trevor Hastie, Robert Tibshirani, and Jerome Friedman (2001). The Elements of Statistical Learning. Springer, New York.

Supplemental materials are posted on Blackboard.

Prerequisites: BIOS 513, 514, and 524

Grading: Grades will be based on homework assignments consisting of short-answer, statistical computing, and problem-solving exercises. In addition, a final project is required, with weighting for the final assigned grade as follows:

  • Students must use VCU's honor system when handing in any take-home work.

Course Outline

    I. Introduction to the R Programming Environment

    II. Bayes' rule

    III. Discriminant analysis

    IV. Kernel density estimation

    V. k-nearest neighbors

    VI. Classification and regression trees

    VII. Model assessment and cross-validation

    VIII. Random forests

    IX. L1-penalized models

    X. Bootstrap methods


