Data Management and Data Systems

Introduction to the use, design, and implementation of database and data-intensive systems.

Course Overview


This course covers how to use databases in applications, the first principles of scaling to large data sets, and how to design good data systems.

A few key topics:

• Introduction to the relational data model, relational database engines, and SQL.

• How to scale systems for large data sets on servers and server clusters.

• How to design good schemas based on dependencies and normal forms, so that we can build and evolve good applications. This will also cover indexes, views, and transactions.

The class will culminate in a hands-on programming project in SQL and Python. This project is a key part of the course: you will query, visualize, and make predictions from terabytes of data on BigQuery, a popular cloud database that is part of Google Cloud Platform.

Key Dates


Lectures: Tues/Thurs
4:30 PM - 5:50 PM
NVIDIA Auditorium
Midterm: Thursday, October 31
4:30 PM - 5:50 PM
NVIDIA Auditorium (In Class Exam)
Final Exam: Monday, December 9
7:00 PM - 10:00 PM
CEMEX Auditorium

Schedule

Event Date Description Course Materials
Lecture 1 9/24 Tu Why Databases?
Concepts: Data models, DB systems overview
[Introduction: Why databases?]
[Project outline]
[Systems Primer]
[Getting Your Google Cloud Platform Credits]
Reading List:
[AWS: Data Lakes and Analytics]
[AWS: What is a Data Lake?]
Lecture 2 9/26 Th SQL I
Concepts: Schemas, Systems, Select-From-Where
[Example SQL]
[SQL - Part I]
Project 1 Release 9/30 Mo See Course Info for general submission information and the regrade policy. [project1_handout.pdf]
[project1_submission.py]
[Getting started with BigQuery]
Lecture 3 10/1 Tu SQL II
Concepts: Joins, Set operators, Subqueries
[SQL Deep Dive]
Homework 1 Released 10/1 Tu Reminder: Submit your solutions directly via Gradescope. [CS145_Fall_2019_Homework_1.pdf]
Lecture 4 10/3 Th SQL III, Advanced
Concepts: Grouping, Aggregations, Nested queries
[SQL Deep Dive (Same Slides as Previous Lecture)]
Section 1 10/4 Fr 9:30 AM - 10:20 AM
NVIDIA Auditorium
[Section #1 Slides]
Lecture 5 10/8 Tu Scale: Indexing and IO Model
[Scale Slides]
Lecture 6 10/10 Th DJ Patil (former US Chief Data Scientist)
Guest Lecture: Data Ethics and Open Datasets
Project 1 Due 10/11 Fr
Lecture 7 10/15 Tu Sorting, Building Indices Part 1
[Sorting, Building Indices Slides]
Homework 1 Due 10/15 Tu
Homework 2 Released 10/16 We [CS145_Fall_2019_Homework_2.pdf]
Lecture 8 10/17 Th Sorting, Building Indices Part 2
Query Optimization Part 1
[Sorting, Building Indices Slides]
[Query Optimization Slides]
Project 2 Release 10/17 Th [proj2_handout.pdf]
[Project 2 Colab Notebook]
Section 2 10/18 Fr 9:30 AM - 10:20 AM
NVIDIA Auditorium
[Section #2 Slides]
Lecture 9 10/22 Tu Query Optimization Part 2 [Query Optimization Slides]
Lecture 10 10/24 Th Systems Design: Putting it all together
[Systems Design Slides]
Lecture 11 10/29 Tu Exam Review
[Midterm Review Slides]
Homework 2 Due 10/29 Tu
Midterm 10/31 Th In class (4:30 - 5:50 PM)
Project 2 Due 11/4 Mo
Project 3 Release 11/4 Mo [project3_handout.pdf]
[high_level_rubric_project_3.pdf]
[Project 3 ML Warmup Colab Notebook]
[Project 3 Colab Template]
Lecture 12 11/5 Tu Transactions [Transactions Slides]
Homework 3 Released 11/5 Tu [CS145_Fall_2019_Homework_3.pdf]
Lecture 13 11/7 Th Transactions [Transactions Slides]
Project 3 Proposal Due 11/8 Fr
Section 3 11/8 Fr 9:30 AM - 10:20 AM
NVIDIA Auditorium
[Section #3 Slides]
Lecture 14 11/12 Tu Transactions [Transactions Slides]
Lecture 15 11/14 Th Zulfikar Ramzan (CTO of RSA)
Guest Lecture: Data Security
Lecture 16 11/19 Tu E/R Model and Design Theory
[E/R Model Slides] [Design Theory Slides]
Homework 3 Due 11/19 Tu
Homework 4 Released 11/20 We [CS145_Fall_2019_Homework_4.pdf]
Lecture 17 11/21 Th Design Theory Continued
[Design Theory Slides]
Section 4 11/22 Fr 9:30 AM - 10:20 AM
NVIDIA Auditorium
[Section #4 Slides]
Thanksgiving -- no lecture 11/26 Tu
Thanksgiving -- no lecture 11/28 Th
Project 3 Due 12/2 Mo
Homework 4 Due 12/3 Tu
Lecture 18 12/3 Tu John Doerr (Chairman of Kleiner Perkins)
Guest Lecture: Attacking Big Problems with Data
Lecture 19 12/5 Th Final Review
[Final Review Slides]
Final Exam 12/9 Mo 7:00 - 10:00 PM CEMEX Auditorium

Course Logistics and Policies


Prerequisites: CS 103 and CS 107 (or equivalent).

Grading: Homework 10%, Projects 35% (10 + 10 + 15), Midterm Exam 20%, Final Exam 35%.

We will offer extra credit for in-class participation and for high-quality answers to fellow students' questions on Piazza.

Piazza: Join our Piazza to receive important announcements and get answers to your questions.

Homework: There will be 4 biweekly homework assignments, together worth 10% of your final grade, that accompany the material taught in class. They will be graded on a completion basis: you will receive full credit as long as you submit the assignment on time and score above 70%. You will submit your homework through Gradescope. Late days cannot be used on homework.
The homework assignments reflect the exam material, so it is in your best interest to complete them thoroughly. Besides preparing you for the exams, they will assess and reinforce your understanding of the material.

Sections: There will be 4 optional discussion sections, one accompanying each homework assignment. We encourage you to attend in person because sections will not be recorded. The slides will be posted online.

Exam Dates: See Key Dates above.

Exam Notes Sheet: For the midterm, you may bring one two-sided 8.5x11" sheet of paper with typed or handwritten notes. For the final exam, you may bring two notes sheets.

Late Days: You have a total of two late days, shared across all project deadlines. You do not lose any credit when using a late day. If you have used all your late days and submit after the deadline, you will receive a 0. (Late days can only be applied to projects.)

Lectures: Lectures are held Tues/Thurs 4:30-5:50 PM in NVIDIA Auditorium. Note that while attendance is not mandatory, we will give extra credit to students with insightful in-class participation.

Lecture Videos: Lectures will be recorded and posted on Canvas.

Textbook: There is no required textbook, but for students who want additional resources, we recommend the following two:
  • Database Systems: The Complete Book, 2nd Edition, by Garcia-Molina, Ullman, and Widom
  • A First Course in Database Systems, 3rd Edition, by Ullman and Widom

Accommodations: Students who may need an academic accommodation based on the impact of a disability must initiate the request with the Office of Accessible Education (OAE) and notify us at least 7 days (one week) before the midterm and/or final exam.

Projects


Group Size: The first two projects must be completed individually. For the third project, you may work in teams of two.

Project Submissions: You will submit your projects via Gradescope. Sign up for Gradescope using your Stanford email address and student ID. The course code is 98Y6NR. Each assignment will include specific instructions on which files to submit.

Regrade Policy: If you think that we've made a grading mistake or that your submitted work should be regraded, submit a regrade request on Gradescope within one week of receiving your grade. Be sure to include a short, convincing argument on Gradescope explaining why you think your work was graded incorrectly. We reserve the right to ignore a regrade request that lacks a justification. If you submit a regrade request, we reserve the right to regrade your entire assignment, which means your overall score could go down.

Staff


Instructor

Shiva Shivakumar

Teaching Assistants

Albert Feng (head)
Andrew Sharp
Bryan Kim
Eric Matsumoto
Hang Jiang
Jennie Chen
Luyao Hou
Ning Niu
Nishant Rai
Qiwen Wang
Yulian Zhou

Office Hours


All office hours (OH) will be held in the Huang basement, in the large open collaboration area with desks and tables. The one exception is Prof. Shiva's OH, which will be held in Huang 050A.

To sign up for OH, use our queue: https://queuestatus.com/queues/518/