Global Curriculum

General Assembly's Data Science Immersive is made up of 4 units of 3 weeks each, outlined below.

Unit Breakdown

Unit	Title	Weeks	Topics	Flex Sessions
Unit 1	Data Science Foundations	Weeks 1-3	Programming Fundamentals, Pandas & EDA, Linear Regression	3 Flex Sessions
Unit 2	Supervised Learning Algorithms	Weeks 4-6	Logistic Regression, Classification, Databases, Ensemble Models	0 Flex Sessions
Unit 3	Advanced Modeling Techniques	Weeks 7-9	Unsupervised Learning, Bayes, Time Series	0 Flex Sessions
Unit 4	Data Science Careers	Weeks 10-12	Spark & Big Data, Portfolios, Presentations	9 Flex Sessions

Note: Flex sessions can be used for content review or additional topics that instructors may want to cover.

Unit 1: Data Science Foundations

In this unit, students will practice the basics of Python, Git, and the command line, and review the foundational statistical concepts used throughout the rest of the course.

Currently, student onboarding consists of five modules (Python, Statistics, Git, Command Line, and SQL) that pre-train and reinforce these foundations. Student readiness in these topics can be assessed using the required onboarding exercise.

Another goal of this unit is to get students comfortable with the data science workflow, emphasizing the use of Pandas and other tools to acquire, clean, and plot data. Students will learn about data visualization tools and techniques from Tableau to seaborn, and practice communicating their findings to different audiences.

Unit 2: Supervised Learning

Now that students have had practice in Python, Pandas, and statistical foundations, we'll prepare students to tackle supervised learning models, beginning with logistic regression. They'll use sklearn and pipelines to prepare data, while practicing regularization, tuning, and evaluation. Ultimately, students will learn to evaluate the tradeoffs of each model and communicate their recommendations through formal reports and informal blog posts.

Students will learn the basics of natural language processing, classification, and ensemble models. In addition, students will learn to acquire and parse data from different sources, including web scraping, remote databases, and APIs. Alongside review and refresher lessons and labs on SQL, students will practice working in local PostgreSQL databases.

Finally, students are introduced to the Capstone Project in Week 4, with their first deliverable due in Week 6.

Unit 3: Advanced Modeling

Now that students have had practice in acquiring, cleaning, and modeling data using SQL, pipelines, and sklearn, we'll move into more advanced topics, including unsupervised learning, Bayesian inference, and modeling time series data. Ultimately, students will learn to build their own local databases while running principal component analysis and ARMA/ARIMA models. They'll understand the difference between Bayesian and frequentist reasoning, and practice articulating these topics to stakeholder audiences.
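
As a rough illustration of the Bayesian versus frequentist distinction mentioned above, the following minimal sketch estimates a success probability both ways. The function names, the sample counts, and the uniform Beta(1, 1) prior are all assumptions chosen for illustration; they are not taken from the course materials.

```python
from fractions import Fraction


def frequentist_estimate(successes, trials):
    """Maximum-likelihood estimate: the observed success rate."""
    return Fraction(successes, trials)


def bayesian_posterior(successes, trials, prior_a=1, prior_b=1):
    """Update a Beta(prior_a, prior_b) prior with binomial data.

    Returns the parameters of the Beta posterior distribution.
    """
    return prior_a + successes, prior_b + (trials - successes)


def posterior_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return Fraction(a, a + b)


# Hypothetical data: 7 successes observed in 10 trials.
successes, trials = 7, 10
mle = frequentist_estimate(successes, trials)
post_a, post_b = bayesian_posterior(successes, trials)
bayes_mean = posterior_mean(post_a, post_b)

print("Frequentist estimate:", float(mle))
print("Posterior parameters:", (post_a, post_b))
print("Bayesian posterior mean:", float(bayes_mean))
```

The frequentist answer is a single point estimate from the data alone, while the Bayesian answer is a full distribution whose mean is pulled toward the prior. With more data the two estimates converge.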

Students will also practice real-world GitHub workflows as they apply their skills to a group Kaggle project. Supported by case studies and workshops, student work should start to take shape as they complete their second Capstone deliverable.

Unit 4: Data Science Careers

Now that students have had repeated practice with acquiring, cleaning, modeling, tuning, and presenting data, iterating through every step of the data science workflow, we'll get students to think more about potential industry applications of these concepts. They'll continue working on their third Capstone deliverable while they learn about additional data science topics, including MapReduce, Hive, and Spark. Ultimately, students will learn to articulate big data use cases while experimenting with Hadoop and navigating the AWS ecosystem.
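
To make the MapReduce idea above concrete, here is a minimal single-machine sketch of a word count in plain Python. It imitates the map, shuffle, and reduce phases only; it does not use Hadoop, MrJob, or Spark, and the function names and sample lines are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input, standing in for lines read from a large file.
sample_lines = ["big data is big", "data science is fun"]
counts = word_count(sample_lines)
print(counts)
```

In a real cluster the mappers and reducers run on different machines and the framework handles the shuffle; the sketch keeps everything in one process so the data flow is easy to follow.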

Students will also have time to practice common whiteboard problems and interview scenarios, while applying all of their newfound knowledge to Parts 4 and 5 of their Capstone project, fleshing out a professional portfolio and meeting with industry experts.

Curriculum Material Availability

Resources are linked to their location: clicking on a link will take you to the relevant README file.

Resource status is indicated using the following symbols:
" + " - Resource links with a + are suggested topics for that time block and do not have an existing baseline resource. We'd love for you to contribute a resource with a pull request.
" # " - Resource links with a # are time blocks dedicated for outcomes lessons. Coordinate with your local outcomes teams to fill these slots.
" @ " - Resource links with a @ are resources from other courses that need to be adapted for DSI or resources that only contain learning objectives. We'd love for you to contribute a resource with a pull request.
" * " - Resource links with a * are resources that are currently being worked on.

Weekly Topic Breakdown

Week 1: Programming Fundamentals

Session Time	Day 1	Day 2	Day 3	Day 4	Day 5
9-10	+Welcome to Data Science!	+Morning Exercise	#Outcomes	+Morning Exercise	Reflection
10-11:30	Command Line	Python Control Flow	Programming Fundamentals	Arrays & Functions	Plotting tools intro
11:30-1	Intro to Git	Lab: Python function practice	Notebooks & CSV Files	Lab: NumPy	Lab: Plotting
2-3:30	Python Data Types	Python Iteration	Intro to NumPy	Lab: Stats practice with Python	+Instructor choice
3:30-5	Python Collections	Lab: Python with GitHub	Lab: Datasets and NumPy	Dataviz Principles	Project 1: Workshop

Week 2: Exploratory Data Analysis & Pandas

Session Time	Day 1	Day 2	Day 3	Day 4	Day 5
9-10	(Project Review)	Morning Exercise	(Outcomes)	Morning Exercise	(Reflection)
10-11:30	Intro to Pandas 1	Intro to Pandas 2	Intro to Pandas 3	Stats Review & Intro to SciPy	Plotting With Pandas
11:30-1	Study Design & Pandas	Pandas Computation Lab	Pandas & Pivot Tables	SciPy & Stat Lab	Pandas, Plotting, & Project 2
2-3:30	Stats 101	Intro to Data Cleaning	Categorical & Dummy Variables	Joins & Pandas	+Instructor Choice
3:30-5	Pandas & NumPy	Data Cleaning Lab	Lambda Functions & Missing Data	Practicing Joins	Project 2: Workshop

Week 3: Linear Regression & Statsmodels

Session Time	Day 1	Day 2	Day 3	Day 4	Day 5
9-10	(Project Review)	Morning Exercise	(Outcomes)	Morning Exercise	(Reflection)
10-11:30	Intro to Modeling	Bias Variance Tradeoff	Regression Metrics & Loss Functions	Gradient Descent	Stakeholder Analysis
11:30-1	Data Plotting	Evaluating Model Fit	Train/Test Split	Feature Scaling	Presenting to Stakeholders
2-3:30	Intro to Statsmodels & Sklearn	Regularization & Overfitting	Data Workflow Lab 1: Cleaning	Study Design	+Instructor Choice
3:30-5	Linear Regression Lab	Regularization Lab	Data Workflow Lab 2: Optimizing	Case Study	Project 3: Presentations

Week 4: Intro to Logistic Regression

Session Time	Day 1	Day 2	Day 3	Day 4	Day 5
9-10	(Project Review)	Morning Exercise	(Outcomes)	Morning Exercise	(Reflection)
10-11:30	Intro to Classification	Intro to Logistic Regression	Visualizing Classification Models	Advanced Model Evaluation	Communicating Results
11:30-1	Web Scraping 101	Logistic Regression Lab	Plotting Classification Lab	Sklearn & Project 4	Prepare Visuals
2-3:30	Scraping Practice	Evaluating Model Fit	Project 4: Workshop	Regularization Lab	Project 4: Workshop
3:30-5	Classification Lab	Model Tuning Lab	Intro to Project Capstone, Pt 1	Project 4: Workshop	Project 4: Presentations

Week 5: Classification & Databases

Session Time	Day 1	Day 2	Day 3	Day 4	Day 5
9-10	(Project Review)	Morning Exercise	(Outcomes)	Morning Exercise	(Reflection)
10-11:30	Different Databases	Logistic Regression Case Study	More SQL	Feature Selection	SVM: Advanced Topic Lesson
11:30-1	Intro to SQL	Pipelines in Sklearn	SQL Lab	Feature Selection Lab	SVM: Advanced Topic Lab
2-3:30	Remote Database Lab 1	Project Pipeline Lab	Set Up Local PostgreSQL Server	Project 5: Workshop	Project 5: Workshop
3:30-5	Remote Database Lab 2	Logistic Regression Lab	+Flex: Workshop	Project 5: Workshop	Project 5: Presentations

Week 6: Trees & Ensemble Methods

Session Time	Day 1	Day 2	Day 3	Day 4	Day 5
9-10	(Project Review)	Morning Exercise	(Outcomes)	Morning Exercise	(Reflection)
10-11:30	Intro to CART	SQL Joins	Random Forests and Boosting	Intro to NLP	Capstone Pt 1: Presentations
11:30-1	CART Lab	Join API Data Lab	Practice Methods & Visualize Results	NLTK Lab	Communicating Models
2-3:30	Servers, JSON, & APIs	Decision Trees and Bagging	Model Evaluation & Feature Importance	Project 6: Workshop	Project 6: Workshop
3:30-5	APIs & Classification Tree Lab	Practice Methods With Sklearn	Model Comparison Lab	+Flex: Workshop	Project 6: Workshop

Week 7: Unsupervised Learning Methods

Session Time	Day 1	Day 2	Day 3	Day 4	Day 5
9-10	(Project Review)	Morning Exercise	(Outcomes)	Morning Exercise	(Reflection)
10-11:30	Intro to Clustering	Intro to Dimensionality Reduction	Linear Algebra Review	Instructor FLEX	PCA Case Study
11:30-1	Clustering Lab	Intro to PCA	K-Means & Hierarchical Clustering	Clustering & PCA	Project 7: Workshop
2-3:30	Tuning Clusters	PCA Lab 1	Classifier Clustering Lab	Clustering & PCA Lab	Project 7: Workshop
3:30-5	Advanced SQL & Database Practice	PCA Lab 2	Unsupervised Learning Trends	Project 7: Workshop	Project 7: Presentations

Week 8: Bayesian Inference

Session Time	Day 1	Day 2	Day 3	Day 4	Day 5
9-10	(Project Review)	Morning Exercise	(Outcomes)	Morning Exercise	(Reflection)
10-11:30	Intro to Bayes	Linear Regression + Bayes	Intro to Requests Library	Bayesian Stat Tests	Communicating Bayesian Results
11:30-1	Intro to Bayes Lab	Logistic Regression + Bayes	API Lab	Bayesian Stat Testing	Capstone Pt 2: Workshop
2-3:30	Bayes Deep Dive	Review Prior Models + Bayes	Intro to LDA	Naive Bayes Lesson	Capstone Pt 2: Workshop
3:30-5	Bayes Case Study 1	Bayes Case Study 2	LDA & API Data Lab	Naive Bayes Lab	Capstone Pt 2: Workshop

Week 9: Time Series Data

Session Time	Day 1	Day 2	Day 3	Day 4	Day 5
9-10	(Project Review)	Morning Exercise	(Outcomes)	Morning Exercise	(Reflection)
10-11:30	GitHub for Teams	Analyzing Time Series Data	Intro to ARIMA Model	Tuning ARIMA Models	Visualizing Time Series Data
11:30-1	GitHub for Teams Lab	Autocorrelation & Time Series Data	ARIMA Predictions Lab	ARIMA Tuning Lab	Visualizing Time Series Data Lab
2-3:30	Intro to Time Series Data	Autocorrelation & Time Series Data	Capstone Pt 3: Workshop	Kaggle: Workshop	Kaggle: Workshop
3:30-5	Kaggle: Workshop Setup	Kaggle: Workshop	Kaggle: Workshop	Kaggle: Workshop	Kaggle: Presentations

Week 10: Intro to Big Data

Session Time	Day 1	Day 2	Day 3	Day 4	Day 5
9-10	(Project Review)	Morning Exercise	(Outcomes)	Morning Exercise	(Reflection)
10-11:30	Intro to Big Data	AWS: EC2 & S3	Intro to Spark	Database Design: Case Study	Big Data Review: Case Study
11:30-1	Hadoop Intro Lab	AWS: HDFS, HUE, & EMR	Spark Lab 1	Group Project: Set Up Data	Group Project: Workshop
2-3:30	MrJob Wordcount Lab	AWS + Hive Lab	Spark Lab 2	Group Project: Workshop	Group Project: Workshop
3:30-5	Hive Wordcount Lab	Big Data: Case Study	Spark: Case Study	Group Project: Workshop	Group Project: Presentations

Week 11: Advanced Topics & Interview Tips

Session Time	Day 1	Day 2	Day 3	Day 4	Day 5
9-10	(Project Review)	Morning Exercise	(Outcomes)	Morning Exercise	(Reflection)
10-11:30	Intro to A/B Testing	+Advanced Topics: Flex	Interview Prep	Interview Prep	+Portfolio Prep
11:30-1	A/B Testing	+Advanced Topics: Flex	Interview Practice	Interview Practice	+Portfolio Lab
2-3:30	+Content Review: Flex	+Content Review: Flex	Capstone: Workshop	Capstone: Workshop	Capstone: Workshop
3:30-5	Capstone: Workshop	Capstone: Workshop	Capstone: Workshop	Capstone: Workshop	Capstone: Workshop

Week 12: Careers & Capstone

Session Time	Day 1	Day 2	Day 3	Day 4	Day 5
9-10	(Project Review)	Morning Exercise	(Outcomes)	Morning Exercise	(Reflection)
10-11:30	+Advanced Topics: Flex	+Content Review: Flex	+Interview Prep: Flex	Capstone: Workshop	Capstone Pt 5: Presentations
11:30-1	+Advanced Topics: Flex	+Content Review: Flex	+Interview Practice: Flex	Capstone Pt 5: Presentations	Capstone Pt 5: Presentations
2-3:30	Capstone: Workshop	Capstone: Workshop	Capstone: Workshop	Capstone Pt 5: Presentations	Capstone Pt 5: Presentations
3:30-5	Capstone: Workshop	Capstone: Workshop	Capstone: Workshop	Capstone Pt 5: Presentations	+Graduation!

i. Students