CS 334 - Machine Learning

Jump to:

Course Overview

Machine learning impacts many applications including the sciences (e.g., predicting genome-protein interactions, detecting tumors, personalized medicine) and consumer products (e.g., Amazon’s Alexa, Microsoft Kinect, Netflix). In this course, students will cover the underpinnings, algorithms, and practices that enable a computer to learn. Emphasis will be on fundamental theory and algorithms in statistical machine learning, and approaches to applying machine learning in a variety of domains. Students will also obtain practical experience applying standard machine learning methods to solve a variety of problems.

Prerequisites

CS 170/171/224/253
MATH 221

Learning Outcomes

Upon successful completion of this course, students will be able to:

Understand the basic building blocks and general principles that underlie machine learning algorithms;
Be familiar with specific, widely used machine learning algorithms;
Learn methodology and tools to apply machine learning algorithms to real data and evaluate their effectiveness and performance.

Textbook

Required (R) : An Introduction to Statistical Machine Learning, by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani.
Supplemental (S1): A Course in Machine Learning, by Hal Daumé III
Supplemental (S2): A First Encounter with Machine Learning, by Max Welling
Supplemental (S3): The Elements of Statistical Learning: Data Mining, Inference, and Prediction, by Trevor Hastie, Robert Tibshirani & Jerome Friedman
Supplemental (S4): Understanding Machine Learning:From Theory to Algorithms, by Shai Shalev-Shwartz & Shai Ben-David

Course Schedule (Tentative)

The reading material listed below is optional and the lecture plan may deviate over the course of the semester.

#	Date	Topic	References
1	8/28	Introduction + Course Logistics	R: Ch. 1 S2: Ch. 1
2	9/4	Supervised Learning + K-Nearest Neighbors	R: Ch. 2.1-2.1.2, 2.1.4 S1: Ch. 3 S2: Ch. 5
H1	9/4	Homework 1 Out 9/4. Due 9/18.
3	9/9	Decision Tree	R: Ch. 8.1.2 S1: Ch. 1
4	9/11	Model Assessment	R: Ch. 5.1 S1: Ch. 5.5
5	9/16	Model Selection	R: Ch. 5.2 S1: Ch. 5.6
6	9/18	Model Selection + Linear Algebra Review	CS229 Linear Review Math4ML
H2	9/18	Homework 2 Out 9/18. Due 9/30.
7	9/23	Linear Regression	R: Ch. 3 Ch. 17.1 - 17.2 (Barber) UT’s MLR Slides
8	9/25	(Stochastic) Gradient Descent	S1: Ch. 7.4 S1: Ch. 14.2
9	9/30	Linear Regression (Part II)
H3	9/30	Homework 3 Out 9/30. Due 10/9.
10	10/2	Dimensionality Reduction	R: Ch. 6.1-6.3
11	10/7	Dimensionality Reduction (Part II)	R: Ch. 10.1-10.2 Stanford PCA notes
12	10/9	Naive Bayes	S2: Ch. 6
13	10/16	Logistic Regression + Catchup	R: Ch. 4.3
P1	10/16	Project Proposal Due
E1	10/21	In-Class Midterm
15	10/23	Perceptron	S1: Ch. 4 S2: Ch. 6
H4	10/23	Homework 4 Out 10/23. Due 11/4.
P2	10/27	Spotlight Slides Due
16	10/28	Project Madness
17	10/30	Practical ML Advice
18	11/4	Bagging + Random Forest	R: Ch. 8.2.1-8.2.2
H5	11/4	Homework 5 Out 11/4. Due 11/15.
19	11/6	Gradient Boosting	R: Ch. 8.2.3 S1: Ch. 13.2 Ruder’s blogpost
20	11/11	Ensembles	S1: Ch. 13.1
21	11/13	Bias + Variance Tradeoff	R: Ch. 2.2.2 S3: Ch. 9.2 S4: Ch.5 Fortmann-Roe’s notes
21	11/18	Support Vector Machine	R: Ch. 9 S1: Ch. 11.5 S2: Ch. 8-9 Stanford SVM notes NYU SVM notes
22	11/20	Neural Network	S1: Ch 10 S2: Ch. 11 Nielsen: Ch. 1-3 S4: Ch. 20.1 - 20.3
23	11/25	Neural Network II
24	11/27	Deep Learning
P3	12/2	Project Presentations
P3	12/4	Project Presentations
P3	12/9	Project Presentations
P4	12/13	Final Project Report Due
E2	12/17	Final Exam

Course Logistics

Contact and Communication

Piazza is the forum for the class. Make sure you check it on a regular basis.

All official announcements, assignment clarifications, slide corrections, and any other communications will happen here.
Any questions regarding the course should be posted on Piazza. You are strongly encouraged to answer other students’ questions when you know the answer. Do not ask or answer questions that are exactly a homework question.
For longer discussions or to get help in person, please attend office hours.
If there are private matters specific to you (e.g special accommodations, requesting alternative arrangements, etc.) or confidential matters you want to report, please email with subject heading starting with the term “CS334-F19”.

Office Hours

Joyce Ho: M 1:00-3:00PM; W 9:00AM-11:00AM @MSC W302M

Course Grading

The final grade will be determined by a weighted average of all the graded items. Final grades may be curved up so that the class mean falls at least in a B range. The class median, mean, and standard deviation will be announced for each assignment and exam so that you have an idea of where you stand. The participation element consists of Piazza surveys, bi-weekly Canvas quizzes, and in-class discussion. Active participation in the class (i.e., answering questions and constructive contributions on Piazza, attendence and involvement in lectures, and office hours) may earn you one higher letter grade (e.g B+ becomes A-) at the discretion of the instructor. The precise requirements is intentionally vaguely specified.

Component	Weight
Homeworks	35%
Midterm	15%
Project	25%
Final	20%
Participation	5%

Homework and Exam Policy

Homework
- Homeworks are due electronically on Canvas and Gradescope at 11:59 PM ET.
- Each student receives six 24-hour “late days” that can be used on any of the 5 homeworks throughout the semester. No more than 3 late days may be used on any single homework.
- Use your late days wisely, no more late days remaining means you will receive zero credit for any late homework.
- Additional extensions on homework will be granted with appropriate documentation from the Office of Undergraduate Education (OUE).
Exam
- The midterm (10 double-sided cheat sheets, no electronic devices) must be taken at the required time.
- Rescheduling the midterm is not permitted except in emergencies with appropriate documentation from OUE.
- The final will be on 12/17 3:00 P.M - 5:30 P.M and cannot be rescheduled under any circumstance.

Honor Code

All class work is governed by the College Honor Code and the Department Statement of Policy on Computer Assignments. While we encourage you to discuss assignments with other students (e.g., high-level discussion, getting hints or debugging help, talking about problem-solving strategies, and discussing ideas together), any code and writeups must be written by yourself. It is not okay to share code or write code collaboratively (this includes posting and/or sharing your code publicly). If you have collaborated with any other students, you must acknowledge the classmates in a comment on the top of your homework.

The internet is also a useful resource for learning. While it is okay to look at resources for the broad topic (e.g., how decision trees work), it is not okay to look for solutions to a specific homework problem (e.g., how to implement decision tree in Python). If you are unsure whether something is allowed, ask. You must cite all online sources used while working on homework and final projects. It is always your responsibility to learn if a source is allowed.

Apparent copies from any source, including your colleagues and internet sites, will be referred to the Emory Honor Council. Every homework submission must have a README file with the following comment:

/* THIS CODE IS MY OWN WORK, IT WAS WRITTEN WITHOUT CONSULTING CODE WRITTEN BY OTHER STUDENTS. Your_Name_Here */

Accommodations

The Department of Computer Science at Emory supports equal access for students with disabilities. Any students needing special accommodations due to a disability should contact the Office of Accessibility Services (OAS) and appropriate arrangements will be made.

If you are a student that is currently registered with OAS and have not received a copy of your accommodation notification letter within the first week of class, please notify OAS immediately. Students who have accommodations in place are encouraged to contact the instructor during the first week of the semester, to communicate your specific needs for the course as it relates to your approved accommodations. All discussions with OAS and faculty concerning the nature of your disability remain confidential.

Python

Programming questions in the homework will use Python 3.7. You are encouraged to use Anaconda to help manage your Python packages.

Running Python

There are three different ways to invoke / launch Python.

Interactive command-line Python. This launches an interactive session where you can type commands that execute interactively, similar to R or MATLAB. This is also a good way to determine which version of python is on your system.
```
  $ python
```
Execute a Python program from the command-line. For example, run all the contents in foo.py.
```
  $ python foo.py  # foo.py is a file containing a Python program
```
Jupyter notebook. This allows you to write and execute Python code interactively in your web browser. It allows you to mix Python code with rendered text and math for the purposes of demonstration or tracking your train of thought. You may find the Stanford CS231n iPython tutorial useful.
```
  $ jupyter notebook
```

Python Resources

Here are some resources and tutorials for picking up Python:

Project

You will work in groups of 2-3 for the final project. The goal of the project is to apply machine learning algorithms to real-world tasks or to prepare you for machine learning or AI research. The project is a critical part of the course and a significant factor in determining your grade (greater than each individual exam). Each team has four deliverables on Canvas: a proposal, a short “madness” presentation, a final presentation, and a final report.

Topic

The project topic is open-ended and can be an application project or an algorithmic project (developing a new learning algorithm or novel variant of existing one). Each team is free to choose any problem and dataset provided the number of samples times the features is at least on the order of millions. This means that the data should be large enough to find some interesting properties and insights from it. If you use an existing dataset, there is a strong preference for using a publicly avialably one. If you intend to collect the needed data yourself, keep in mind this is only one part of the expected project work and can often take considerable time.

Evaluation

The breakdown of the final project is tentatively - proposal (15%), spotlight (10%), presentation (25%), report (50%). These numbers may change slightly, but the majority of the grade will be from the presentation and the report. The team size will also be taken under consideration when evaluating the scope of the project in breadth and depth, meaning that a three-person team is expected to accomplish more than a two-person team would.

Projects will be evaluated based on:

Technical quality of the work: Are the pre-processing and choice of models appropriate and well-justified? Are the models properly evaluated?
Analysis: What are the insights obtained from the project? Are there detailed analysis of what went right and wrong (i.e., why certain methods are more promising than others)?
Significance: Did the team choose an interesting “real” problem or is it a relatively small toy problem?
Novelty: Is this project a well-studied problem? In order to highlight these components, it is important you present a solid discussion regarding the learnings from the development of your method, and summarizing how your work compares to existing approaches.

Project Proposal

For the proposal, you’ll pick a project idea to work on early and receive feedback from the instructor.

Format: Your proposal should be a PDF document that contains approximately 1-2 pages worth of material. It should have the following parts:

Title of the project.
Full name of all team members.
Description of the problem and the data. If you are using a Kaggle competition or dataset, it is not sufficient to provide just the link or to copy the text from the website.
Description of what you plan to do. For example, what are the machine learning methods you plan to apply or improve on. What are the experiments you plan to run and how do you plan to evaluate the algorithms?
Short list of references to showcase what has been done to date.

Grading: The project proposal is mainly intended to make sure you decide on a project topic, thought about the initial steps to take, and get feedback early. As long as your proposal follows the instructions above and the project seems reasonably well-thought out, you should do well on the proposal.

Spotlight

For the spotlight, each team will give a 1-slide, 90-second talk to convey an overview of their project. This gives each team a chance to experience talking to a large audience and get some presentation feedback while also getting an idea of what all the other teams are working on.

Format: 1 Powerpoint presentation slide (PPT) that will auto-advance to the next slide.

Grading: Each team will be scored by the other teams, a small panel of judges, and the instructor based on clarity, problem motivation, quality of the content, and overall presentation quality.

Presentation

For the presentation, each team will give an 8-10 minute talk (based on the size of the group).

Format: Slides with content that motivate the problem, the approach, and the experimental results. A PDF version of the slides will be submitted to Canvas.

Grading: Each team will be scored by the other teams and the instructor based on clarity, problem motivation, quality of the content, cohesiveness of the presentation, and overall presentation quality.

Report

Final project reports can be at most 5 pages long (including appendices and figures). We will allow for extra pages containing only references. If you did this work in collaboration with someone else, or if someone else (such as another professor) had advised you on this work, your write-up must fully acknowledge their contributions.

Format: A PDF document that is between 8-12 pages (single columns and no smaller than 11-point font) with additional pages for references. The typical ingredients of a project report are approximately:

Abstract (1-2 paragraphs)
Introduction: What is the problem being addressed? Why is it important? Overview and high-level rationale of your approach. Highlight the novelty, key contributions, or significance of your project.
Background: Past related work on the problem done by others.
Methods: Describe your learning algorithms or proposed algorithm(s). Introduce any relevant mathematical notation if needed. For each algorithm, give a short descrption of how it works. For algorithms covered in class, this can be 1-3 sentences to convey why it might be useful for this problem. If you are using a niche or cutting-edge algorithm, you may want to explain your algorithm in 1-2 paragraphs.
Experiments/Results
- Data description
- Exploratory data analysis, preprocessing, feature extraction, feature selection: Enough to ensure your experiments can be replicated by others
- Modeling choices: What models did you select? Why? How were parameters determined? Other design choices such as subsampling, oversampling, etc.
- Empirical results and comparisons (figures and tables)
Discussion: What are the key findings and lessons learned from the experiment?
Contributions: Please include a section that describes what each team member worked on and contributed to the project. We may reach out and factor in contributions and evaluations when assigning project grades.
Code: Please include a link to either a Github repository, Google drive folder, or Dropbox folder with the code for your final project.

Grading: The final report will be judged based off of the clarity of the report, the novelty of the problem, the technical quality, the analysis, and significance of the work. The major components include a clear motivation and description of the project goals, the chosen methodologies (preprocessing and machine learning models), the presentation and evaluation of the results, replicability of the results (is the description such that someone well-versed in machine learning could obtain similar results on the same dataset), insights from the project and analysis of the results, appropriate/relevant reference list, and the quality of the writing (grammar and style).

More details on projects are posted on Piazza under the projects folder.