# CS 584 - Big Data Analytics

### Course Description

The course covers scalable machine learning and data mining algorithms for large/complex data. Topics include large-scale optimization techniques, hashing, recommendation systems, and tensor factorization. This will be structured as a seminar course with emphasis on public data sets such as Kaggle competitions, MovieLens, and various healthcare datasets. There will be introductory lectures that set the context and provide reviews of relevant material.

*Course prerequisites*: Graduate Data Mining (CS 570) and familiarity with Python, Matlab, or R.

- Instructor: Joyce Ho
- Office Hours: Tu/Th 1-4 pm @ MSC W414

### Course Schedule

Topic | Subtopic | Readings | Presenter | Slides |
---|---|---|---|---|

Introduction | Instructor | |||

Large-Scale Learning Techniques | Stochastic Gradient Descent | - Large Scale Online Learning [Bottou & LeCun, 2003]
- Large-Scale Machine Learning with Stochastic Gradient Descent [Bottou, 2010]
- Stochastic Gradient Descent Tricks [Bottou, 2012]
- Hogwild! [Niu et al., 2011]
| Instructor | |

Alternating direction method of multipliers | - Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers [Boyd et al., 2011]
- Augmented Lagrangian and Alternating Direction Methods for Convex Optimization: A Tutorial and Some Illustrative Computational Results [Eckstein, 2012]
- A distributed algorithm for fitting generalized additive models [Chu et al., 2013]
| Instructor | ||

Sampling | - Big Data Bootstrap [Kleiner et al., 2012]
- Bayesian Methods for Machine Learning [Neal, 20014]
- An Introduction to MCMC for Machine Learning [Andrieu et al., 2003]
- Scalable and Robust Bayesian Inference via the Median Posterior [Minsker et al., 2014]
| Instructor | ||

Nearest Neighbor Search | KD-tree | - An introductory tutorial on kd-trees [Moore, 1991]
- An Optimal Algorithm for Approximate Nearest Neighbor Searching in Fixed Dimensions [Arya et al., 1998]
| Instructor | |

Locality-sensitive hashing | - Chapter 3 of Mining of Massive Datasets [Leskovec et al, 2014]
- Locality-Sensitive Hashing for Finding Nearest Neighbors [Slaney & Casey, 2008]
- Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions [Andoni & Indyk, 2008]
- Hashing for Similarity Search: A Survey [Wang et al., 2014]
- Streaming similarity search over one billion tweets using parallel locality-sensitive hashing [Sunderam et al., 2013]
- Coding for Random Projections [Li et al., 2014]
| Instructor + students | ||

Sketches | - Chapter 4 of Mining of Massive Datasets [Leskovec et al, 2014]
- Sketch Techniques for Approximate Query Processing [Cormode, 2011]
- Conditional Random Sampling: A Sketch-based Sampling Technique for Sparse Data [Li et al., 2006]
- One Sketch For All: Theory and Application of Conditional Random Sampling [Li et al, 2008]
- FLAG: Fast Large-Scale Graph Construction for NLP [Goyal et al., 2012]
| Instructor + students | ||

Matrix Factorization | Distributed matrix factorization | - (Introduction) Matrix Factorization Techniques for Recommender Systems [Koren et al., 2009]
- Scalable Collaborative Filtering Approaches for Large Recommender Systems [Takacs et al., 2009]
- Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent [Gemulla et al., 2011]
- A Fast Parallel SGD for Matrix Factorization in Shared Memory Systems [Zhuang et al., 2013]
- Divide-and-Conquer Matrix Factorization [MacKey et al., 2011]
- Parallel Matrix Factorization for Recommender Systems [Yu et al., 2014]
| Students | |

Tensor Factorization | Large scale tensor factorization | - (Introduction) Tensor Decompositions and Applications [Kolda & Bader, 2009]
- PARCUBE: Sparse Parallelizable CANDECOMP-PARAFAC Tensor Decomposition [Papalexakis, 2015]
- FlexiFaCT: Scalable Flexible Factorization of Coupled Tensors on Hadoop [Beutel et al., 2014]
- Scalable Bayesian Non-Negative Tensor Factorization for Massive Count Data [Hu et al., 2015]
| Students | |

Transfer Learning | Multitask learning | - A Survey on Transfer Learning [Pan & Yang, 2010]
- Integrating Low-Rank and Group-Sparse Structures for Robust Multi-Task Learning [Chen et al., 2011]
- A Regularization Approach to Learning Task Relationships in Multitask Learning [Zhang & Yeung, 2014]
- Scalable Hierarchical Multitask Learning Algorithms for Conversion Optimization in Display Advertising [Ahmed et al., 2014]
| Students |