CS 584: Big Data Analytics
Course Description
The course covers scalable machine learning and data mining algorithms for large and complex data. Topics include large-scale optimization techniques, hashing, recommendation systems, and tensor factorization. The course is structured as a seminar, with an emphasis on public data sets such as Kaggle competitions, MovieLens, and various healthcare datasets. Introductory lectures set the context and review relevant material.
Course prerequisites: Graduate Data Mining (CS 570) and familiarity with Python, Matlab, or R.
 Instructor: Joyce Ho
 Office Hours: Tu/Th 1-4 pm @ MSC W414
Course Schedule
Topic                            Subtopic                                     Readings  Presenter              Slides
Introduction                                                                            Instructor
Large-Scale Learning Techniques  Stochastic Gradient Descent                            Instructor
Large-Scale Learning Techniques  Alternating direction method of multipliers            Instructor
Large-Scale Learning Techniques  Sampling                                               Instructor
Nearest Neighbor Search          KD-tree                                                Instructor
Nearest Neighbor Search          Locality-sensitive hashing                             Instructor + students
Nearest Neighbor Search          Sketches                                               Instructor + students
Matrix Factorization             Distributed matrix factorization                       Students
Tensor Factorization             Large-scale tensor factorization                       Students
Transfer Learning                Multi-task learning                                    Students