CS 584 - Big Data Analytics

Course Description

The course covers scalable machine learning and data mining algorithms for large and complex data. Topics include large-scale optimization techniques, hashing, recommendation systems, and tensor factorization. The course is structured as a seminar, with emphasis on public datasets such as those from Kaggle competitions, MovieLens, and various healthcare sources. Introductory lectures set the context and review relevant material.

Course prerequisites: Graduate Data Mining (CS 570) and familiarity with Python, MATLAB, or R.

  • Instructor: Joyce Ho
  • Office Hours: Tu/Th 1-4 pm @ MSC W414

Course Schedule

Topic: Introduction (Presenter: Instructor)

Topic: Large-Scale Learning Techniques
  Stochastic Gradient Descent (Presenter: Instructor)
  Alternating direction method of multipliers
    • Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers [Boyd et al., 2011]
    • Augmented Lagrangian and Alternating Direction Methods for Convex Optimization: A Tutorial and Some Illustrative Computational Results [Eckstein, 2012]
    • A distributed algorithm for fitting generalized additive models [Chu et al., 2013]
  Sampling (Presenter: Instructor)

Topic: Nearest Neighbor Search
  KD-tree
    • An introductory tutorial on kd-trees [Moore, 1991]
    • An Optimal Algorithm for Approximate Nearest Neighbor Searching in Fixed Dimensions [Arya et al., 1998]
  Locality-sensitive hashing (Presenter: Instructor + students)
  Sketches (Presenter: Instructor + students)

Topic: Matrix Factorization
  Distributed matrix factorization (Presenter: Students)

Topic: Tensor Factorization
  Large-scale tensor factorization
    • (Introduction) Tensor Decompositions and Applications [Kolda & Bader, 2009]
    • PARCUBE: Sparse Parallelizable CANDECOMP-PARAFAC Tensor Decomposition [Papalexakis, 2015]
    • FlexiFaCT: Scalable Flexible Factorization of Coupled Tensors on Hadoop [Beutel et al., 2014]
    • Scalable Bayesian Non-Negative Tensor Factorization for Massive Count Data [Hu et al., 2015]

Topic: Transfer Learning
  Multitask learning
    • A Survey on Transfer Learning [Pan & Yang, 2010]
    • Integrating Low-Rank and Group-Sparse Structures for Robust Multi-Task Learning [Chen et al., 2011]
    • A Regularization Approach to Learning Task Relationships in Multitask Learning [Zhang & Yeung, 2014]
    • Scalable Hierarchical Multitask Learning Algorithms for Conversion Optimization in Display Advertising [Ahmed et al., 2014]