STA 2201: Methods of Applied Statistics II

This course provides an in-depth exploration of fundamental statistical techniques, focusing on both unsupervised and supervised methods. Key topics include clustering algorithms, dimensionality reduction techniques, and supervised classification models. Students will gain an understanding of the mathematics underlying these approaches, enhancing their theoretical knowledge and practical skills.

A significant component of the course emphasizes hands-on implementation, where students will apply these methods to analyze and interpret real-world data. By the end of the course, participants will be equipped to design, implement, and critically evaluate statistical models in diverse applications.

Course topics

  • Review: multivariate normal (Gaussian) distribution, matrix decompositions.

  • High-dimensional data and the curse of dimensionality.

  • Principal component analysis: three ways to interpret PCA.

  • More on PCA: functional, kernel and sparse PCA, PCA with missing values, stability of principal components.

  • Non-linear dimension reduction techniques: t-SNE and UMAP.

  • Clustering methods: k-means, Gaussian mixture models, spectral and hierarchical clustering.

  • Classification methods: logistic regression, k-nearest neighbors (KNN), linear and quadratic discriminant analysis.

We will use the R programming language for computations. R is available for free from CRAN. RStudio, also available for free, is a user-friendly environment for developing, running, and documenting R code. Installing R and RStudio on your own computer is highly recommended for optimal performance and flexibility. However, if you prefer, you can use the server version of RStudio.

Course content

Lecture notes   Practice
Lecture 1       Matrix decomposition practice
Lecture 2       Multivariate normal distribution practice
Lecture 3       Curse of dimensionality practice
Lecture 4       Principal component analysis practice
Lecture 5       Low-rank matrix approximation practice
Lecture 6       PCA variations practice
Lecture 7       K-means clustering practice
Lecture 8       Hierarchical clustering and Gaussian mixture models practice
Lecture 9       Spectral clustering practice
Lecture 10      Linear discriminant analysis practice
Lecture 11      Logistic regression and K-nearest neighbors practice