CS 534 - Machine Learning

Course Overview

Machine learning impacts many applications including the sciences (e.g., predicting genome-protein interactions, detecting tumors, personalized medicine) and consumer products (e.g., Amazon’s Alexa, Microsoft Kinect, Neflix). In this course, students will learn the fundamental theory and algorithms of machine learning. Students will also obtain practical experience applying standard machine learning methods to solve a variety of problems.

Prerequisites:

  • Undergraduate-level linear algebra
  • Undergraduate-level probability
  • Undergraduate-level algorithms
  • Exposure to statistics
  • Programming ability in Python, Matlab, Julia, R, or C++

Course Logistics

  • Piazza: All annoucements, assignment clarifications, and slide corrections will be posted here. Make sure you check it on a regular basis.
  • Textbook
    • Required: The Elements of Statistical Learning: Data Mining, Inference, and Prediction), by Trevor Hastie, Robert Tibshirani & Jerome Friedman
    • Supplemental: Machine Learning: a Probabilistic Perspective, by Kevin Murphy
    • Supplemental: Pattern Recognition and Machine Learning, by Christopher Bishop
    • Supplemental: Understanding Machine Learning:From Theory to Algorithms, by Shai Shalev-Shwartz & Shai Ben-David
    • Supplemental: A Course in Machine Learning, by Hal Daumé III
  • Office Hours
    • Joyce Ho: Tu 2:30-3:30 PM; Wed 9:00-10:00 AM; By Appointment (MSC W414)
    • Huan He: Fri 9:00-11:00 AM (MSC E308)

(Tentative) Course Schedule

The reading material listed below is optional and the lecture plan may deviate over the course of the semester.

# Date Topic Reference (Chapter) Assignment
1 8/30 Intro + Course Logistics Ch. 1 (Hastie et al.)
Ch. 1 (Murphy)
Homework #0 (Due 9/2)
2 9/4 Intro to Optimization Convex optimization notes Part I and II from Stanford’s machine learning class
Rosenberg’s abridged notes
 
3 9/6 Intro to Statistics and Regression    
4 9/11 Regression Ch 3.1 - 3.4 (Hastie et al.)
Ch. 17.1 - 17.2 (Barber)
Prof. Carlos Carvalho’s MLR Slides
 
5 9/13 Regression + Naive Bayes   Homework #1 (Due 9/28)
6 9/18 Linear Classification    
7 9/20 Linear Classification + Bias-Variance Tradeoff    
8 9/25 Model Assessment    
9 9/27 Model Selection   Homework #2 (Due 10/12)
10 10/2 Model Selection + Bootstrap    
11 10/4 Trees + Additive Models    
12 10/11 Boosting   Homework #3 (Due 10/26)
13 10/16 Random Forest Ch. 15, 16 (Hastie et al.)
Breiman’s paper on random forests
 
14 10/18 Random Forest & Ensembles    
15 10/23 Support Vector Machines Ch. 12 (Hastie et al.)
Ch. 15 (Shalev-Shwartz & Ben-David)
SVM notes from Stanford’s ML class
SVM notes from NYU’s ML class
Ch. 11 (Daume)
 
16 10/25 Support Vector Machines   Homework #4 (Due 11/14)
17 10/30 Dimensionality Reduction Ch. 14 (Hastie et al.)
PCA notes from Stanford’s ML class
Ch. 23 (Shalev-Shwartz & Ben-David)
 
18 11/1 Dimensionality Reduction    
19 11/6 Midterm Exam    
20 11/8 Project Madness + Things to Know    
21 11/13 K-Nearest Neighbors Ch. 13 (Hastie et al.)
Ch. 3.2 - 3.3 (Daume)
Ch. 19.1 - 19.2 (Shalev-Shwartz & Ben-David)
Homework #5 (Due 11/28)
22 11/15 Neural Networks Ch. 11 (Hastie et al.)
Ch. 1-3 (Nielsen)
Ch. 20.1 - 20.3 (Shalev-Shwartz & Ben-David)
 
23 11/20 Neural Networks    
24 11/27 Crash Course in Deep Learning    
25 11/29 Applications (Recommendation Systems)    
26 12/4 Presentations    
27 12/6 Presentations    
28 12/11 Presentations    

Course Grading

Component Weight
Homeworks 35%
Midterm 15%
Project 40%
Participation 10%

Project

You are encouraged to work in groups of 2-3 for the final project. The goal is to either develop a novel algorithm (novelty bonus points will be given depending on the level of difficulty) or try various ML existing algorithms on the dataset. The project is a critical part of the course and a significant factor in determining your grade. Teams are required to hand in a project proposal, a final project report and prepare two presentations on their work.

By default, all team members will receive the same score for their project. If a team feels that this is unfair perhaps due to HIGHLY imbalanced contributions, then every team member needs to provide feedback on the contribution of each of the other team members via email before submission of the final report. After that I will have a meeting with the entire group to mediate.

More details on projects are posted on Piazza under the projects folder.

Component Due Date Weight
Proposal 10/25 15%
Madness 11/8 10%
Presentation 12/4-12/11 25%
Report 12/12 50%

Assignment and Exam Policy

  • Assignments
    • Assignments (homeworks 1-5, final projects) are due electronically on Canvas at 11:59 PM.
    • Each student receives 6 late days that can be used across the 5 homeworks throughout the semester. These late days extend the deadline for 24 hours.
    • A maximum of 3 late days can be used on a given homework.
    • Late days apply to the entire homework, so handing in one problem late counts as a late day towards the whole homework.
    • No credit will be given if you submit the homework late and have no remaining late days.
  • Exam
    • The midterm (open-book, open-notes, no electronic devices) must be taken at the required time.
    • Requests for rescheduling the midterm exam will only be considered if the request is made at least a week prior to the exam date.

Honor Code

All class work is governed by the College Honor Code and Departmental Policy. It is acceptable and encouraged to discuss homeworks with other students. However, this should be noted on your submitted homework and all code and writeup must be written by yourself. Any code and writeup that is found to be similar is grounds for an honor code investigation by the Director of Gradute Studies, Laney Graduate School, and the honor council. Additional extensions on assignments will be granted with appropriate documentation from the Office of Undergraduate Education (OUE)