CS 584 - Big Data Analytics
Course Description
The course covers scalable machine learning and data mining algorithms for large and complex data. Topics include large-scale optimization techniques, hashing, recommendation systems, and tensor factorization. The course is structured as a seminar, with an emphasis on public datasets such as Kaggle competitions, MovieLens, and various healthcare datasets. Introductory lectures set the context and review relevant material.
Course prerequisites: Graduate Data Mining (CS 570) and familiarity with Python, MATLAB, or R.
- Instructor: Joyce Ho
- Office Hours: Tu/Th 1-4 pm @ MSC W414
Course Schedule
| Topic | Subtopic | Readings | Presenter | Slides |
|---|---|---|---|---|
| Introduction | | | Instructor | |
| Large-Scale Learning Techniques | Stochastic Gradient Descent | | Instructor | |
| | Alternating direction method of multipliers | | Instructor | |
| | Sampling | | Instructor | |
| Nearest Neighbor Search | KD-tree | | Instructor | |
| | Locality-sensitive hashing | | Instructor + students | |
| | Sketches | | Instructor + students | |
| Matrix Factorization | Distributed matrix factorization | | Students | |
| Tensor Factorization | Large scale tensor factorization | | Students | |
| Transfer Learning | Multitask learning | | Students | |