Global Curriculum

General Assembly's Data Science Immersive consists of four units of three weeks each:

Unit Breakdown

Unit | Title | Lessons | Topics | Flex Sessions
Unit 1 | Data Science Foundations | Weeks 1-3 | Programming Fundamentals, Pandas & EDA, Linear Regression | 3 Flex Sessions
Unit 2 | Supervised Learning Algorithms | Weeks 4-6 | Logistic Regression, Classification, Databases, Ensemble Models | 0 Flex Sessions
Unit 3 | Advanced Modeling Techniques | Weeks 7-9 | Unsupervised Learning, Bayes, Time Series | 0 Flex Sessions
Unit 4 | Data Science Careers | Weeks 10-12 | Spark & Big Data, Portfolios, Presentations | 9 Flex Sessions

Note: Flex sessions can be used for content review or additional topics that instructors may want to cover.


Unit 1: Data Science Fundamentals

In this unit, students practice the basics of Python, Git, and the command line, and review the foundational statistical concepts used throughout the rest of the course.

Currently, our student onboarding tasks consist of five modules (Python, Statistics, Git, Command Line, and SQL) that pre-train and reinforce these foundations. Student readiness in these topics can be assessed using the required onboarding exercise.

Another goal of this unit is to get students comfortable with the data science workflow, emphasizing the use of Pandas and other tools to acquire, clean, and plot data. Students will learn about data visualization tools and techniques from Tableau to seaborn, and practice communicating their findings to different audiences.

Unit 2: Supervised Learning

Now that students have had practice in Python, Pandas, and statistical foundations, we'll prepare students to tackle supervised learning models, beginning with logistic regression. They'll use sklearn and pipelines to prepare data, while practicing regularization, tuning, and evaluation. Ultimately, students will learn to evaluate the tradeoffs of each model and communicate their recommendations through formal reports and informal blog posts.

Students will learn the basics of natural language processing, classification, and ensemble models. In addition, students will learn to acquire and parse data from different sources, including web scraping, remote databases, and APIs. Alongside the review and refresher lessons and labs on SQL, students will practice working in local PostgreSQL databases.

Finally, students are introduced to the Capstone Project in Week 4, with their first deliverable due in Week 6.

Unit 3: Advanced Modeling

Now that students have had practice in acquiring, cleaning, and modeling data using SQL, pipelines, and sklearn, we'll move into more advanced topics, including unsupervised learning, Bayesian inference, and modeling time series data. Ultimately, students will learn to build their own local databases while running principal component analysis and ARMA/ARIMA models. They'll understand the difference between Bayesian and frequentist reasoning, and practice articulating these topics to stakeholder audiences.
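To make the Bayesian versus frequentist distinction mentioned above concrete, here is a minimal sketch that is not part of the course materials. It estimates a success probability from the same data in two ways. The function names, the uniform Beta(1, 1) prior, and the sample counts are all assumptions chosen for illustration.

```python
from fractions import Fraction


def frequentist_estimate(successes: int, trials: int) -> Fraction:
    """Maximum-likelihood estimate of a success probability: k / n."""
    return Fraction(successes, trials)


def bayesian_posterior_mean(successes: int, trials: int,
                            prior_a: int = 1, prior_b: int = 1) -> Fraction:
    """Posterior mean under a Beta(prior_a, prior_b) prior.

    With k successes in n trials the posterior is
    Beta(prior_a + k, prior_b + n - k), whose mean is
    (prior_a + k) / (prior_a + prior_b + n).
    """
    return Fraction(successes + prior_a, trials + prior_a + prior_b)


# Illustrative data (assumed): 7 successes in 10 trials.
k, n = 7, 10
print(frequentist_estimate(k, n))     # 7/10
print(bayesian_posterior_mean(k, n))  # 2/3
```

The frequentist estimate depends only on the observed data, while the Bayesian estimate is pulled toward the prior mean of 1/2. With more trials the two values converge.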

Students will also practice real-world GitHub workflows as they apply their skills to a group Kaggle project. Through case studies and workshops, student work should start to take shape as they complete their second Capstone deliverable.

Unit 4: Data Science Careers

Now that students have had repeated practice with acquiring, cleaning, modeling, tuning, and presenting data, iterating through every step of the data science workflow, we'll get students to think more about potential industry applications of these concepts. They'll continue working on their third Capstone deliverable while they learn about additional data science topics, including MapReduce, Hive, and Spark. Ultimately, students will learn to articulate big data use cases while experimenting with Hadoop and navigating the AWS ecosystem.
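As a rough illustration of the MapReduce idea named above, the sketch below runs a word count in plain Python. It does not use Hadoop, Hive, Spark, or MrJob. The function names and sample input are assumptions made for this example, and the single-process `shuffle` step only imitates the grouping that a real framework performs across machines.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every whitespace-separated word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data big ideas", "Data science"])
print(counts)  # {'big': 2, 'data': 2, 'ideas': 1, 'science': 1}
```

Because each call to `map_phase` sees only one line and each call to `reduce_phase` sees only one key, both phases could be spread across many workers, which is the property the big data tools in this unit rely on.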

Students will also have time to practice common whiteboard problems and interview scenarios while applying their new knowledge to Parts 4 and 5 of their Capstone project, fleshing out a professional portfolio, and meeting with industry experts.


Curriculum Material Availability

Resource status is indicated using the following symbols:

  • The resources are linked to their location: clicking on a link will take you to the relevant ReadMe file.
  • " + " - Resource links with a + are suggested topics for that time block and do not have an existing baseline resource. We'd love for you to contribute a resource with a pull request.
  • " # " - Resource links with a # are time blocks dedicated for outcomes lessons. Coordinate with your local outcomes teams to fill these slots.
  • " @ " - Resource links with a @ are resources from other courses that need to be adapted for DSI or resources that only contain learning objectives. We'd love for you to contribute a resource with a pull request.
  • " * " - Resource links with a * are resources that are currently being worked on.
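The symbol legend above can be read as a simple prefix lookup. The following is a minimal sketch, assuming each schedule entry carries at most one leading symbol and that an entry with no symbol is a linked, available resource. The helper name and the status strings are hypothetical and do not come from the curriculum repository.

```python
# Hypothetical mapping from a leading symbol to the status it signals.
STATUS_BY_SYMBOL = {
    "+": "suggested topic, no baseline resource yet",
    "#": "outcomes lesson",
    "@": "needs adaptation for DSI or has learning objectives only",
    "*": "in progress",
}


def resource_status(entry: str) -> str:
    """Return the status implied by the first character of a schedule entry."""
    entry = entry.strip()
    if not entry:
        return "empty"
    # Assumption: entries without a symbol are linked, available resources.
    return STATUS_BY_SYMBOL.get(entry[0], "resource available")


for cell in ["+Instructor choice", "#Outcomes", "Intro to Git"]:
    print(f"{cell}: {resource_status(cell)}")
```

A lookup like this could be used to list every "+" slot in the weekly tables that still needs a contributed resource.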

Weekly Topic Breakdown

Week 1: Programming Fundamentals

Session Time | Day 1 | Day 2 | Day 3 | Day 4 | Day 5
9 - 10 | +Welcome to Data Science! | +Morning Exercise | #Outcomes | +Morning Exercise | Reflection
10 - 11:30 | Command Line | Python Control Flow | Programming Fundamentals | Arrays & Functions | Plotting tools intro
11:30 - 1 | Intro to Git | Lab: Python function practice | Notebooks & CSV Files | Lab: NumPy | Lab: Plotting
2 - 3:30 | Python Data Types | Python Iteration | Intro to NumPy | Lab: Stats practice with Python | +Instructor choice
3:30 - 5 | Python Collections | Lab: Python with GitHub | Lab: Datasets and NumPy | Dataviz Principles | Project 1: Workshop

Week 2: Exploratory Data Analysis & Pandas

Session Time | Day 1 | Day 2 | Day 3 | Day 4 | Day 5
9-10 | (Project Review) | Morning Exercise | (Outcomes) | Morning Exercise | (Reflection)
10-11:30 | Intro to Pandas 1 | Intro to Pandas 2 | Intro to Pandas 3 | Stats Review & Intro to Scipy | Plotting With Pandas
11:30-1 | Study Design & Pandas | Pandas Computation Lab | Pandas & Pivot Tables | Scipy & Stat Lab | Pandas, Plotting, & Project 2
2-3:30 | Stats 101 | Intro to Data Cleaning | Categorical & Dummy Variables | Joins & Pandas | +Instructor Choice
3:30-5 | Pandas & Numpy | Data Cleaning Lab | Lambda Functions & Missing Data | Practicing Joins | Project 2: Workshop

Week 3: Linear Regression & Statsmodels

Session Time | Day 1 | Day 2 | Day 3 | Day 4 | Day 5
9-10 | (Project Review) | Morning Exercise | (Outcomes) | Morning Exercise | (Reflection)
10-11:30 | Intro to Modeling | Bias Variance Tradeoff | Regression Metrics & Loss Functions | Gradient Descent | Stakeholder Analysis
11:30-1 | Data Plotting | Evaluating Model Fit | Train/Test Split | Feature Scaling | Presenting to Stakeholders
2-3:30 | Intro to Stats Models & Sklearn | Regularization & Overfitting | Data Workflow Lab 1: Cleaning | Study Design | +Instructor Choice
3:30-5 | Linear Regression Lab | Regularization Lab | Data Workflow Lab 2: Optimizing | Case Study | Project 3: Presentations

Week 4: Intro to Logistic Regression

Session Time | Day 1 | Day 2 | Day 3 | Day 4 | Day 5
9-10 | (Project Review) | Morning Exercise | (Outcomes) | Morning Exercise | (Reflection)
10-11:30 | Intro to Classification | Intro to Logistic Regression | Visualizing Classification Models | Advanced Model Evaluation | Communicating Results
11:30-1 | Web Scraping 101 | Logistic Regression Lab | Plotting Classification Lab | Sklearn & Project 4 | Prepare Visuals
2-3:30 | Scraping Practice | Evaluating Model Fit | Project 4: Workshop | Regularization Lab | Project 4: Workshop
3:30-5 | Classification Lab | Model Tuning Lab | Intro to Project Capstone, Pt 1 | Project 4: Workshop | Project 4: Presentations

Week 5: Classification & Databases

Session Time | Day 1 | Day 2 | Day 3 | Day 4 | Day 5
9-10 | (Project Review) | Morning Exercise | (Outcomes) | Morning Exercise | (Reflection)
10-11:30 | Different Databases | Logistic Regression Case Study | More SQL | Feature Selection | SVM: Advanced Topic Lesson
11:30-1 | Intro to SQL | Pipelines in Sklearn | SQL Lab | Feature Selection Lab | SVM: Advanced Topic Lab
2-3:30 | Remote Database Lab 1 | Project Pipeline Lab | Setup Local Postgresql Server | Project 5: Workshop | Project 5: Workshop
3:30-5 | Remote Database Lab 2 | Logistic Regression Lab | +Flex: Workshop | Project 5: Workshop | Project 5: Presentations

Week 6: Trees & Ensemble Methods

Session Time | Day 1 | Day 2 | Day 3 | Day 4 | Day 5
9-10 | (Project Review) | Morning Exercise | (Outcomes) | Morning Exercise | (Reflection)
10-11:30 | Intro to CARTS | SQL Joins | Random Forests and Boosting | Intro to NLP | Capstone Pt 1: Presentations
11:30-1 | CARTS Lab | Join API Data Lab | Practice Methods & Visualize Results | NLTK Lab | Communicating Models
2-3:30 | Servers, JSON, & APIs | Decision Trees and Bagging | Model Evaluation & Feature Importance | Project 6: Workshop | Project 6: Workshop
3:30-5 | APIs & Classification Tree Lab | Practice Methods With Sklearn | Model Comparison Lab | +Flex: Workshop | Project 6: Workshop

Week 7: Unsupervised Learning Methods

Session Time | Day 1 | Day 2 | Day 3 | Day 4 | Day 5
9-10 | (Project Review) | Morning Exercise | (Outcomes) | Morning Exercise | (Reflection)
10-11:30 | Intro to Clustering | Intro to Dimensionality Reduction | Linear Algebra Review | Instructor FLEX | PCA Case Study
11:30-1 | Clustering Lab | Intro to PCA | K-Means & Hierarchical Clustering | Clustering & PCA | Project 7: Workshop
2-3:30 | Tuning Clusters | PCA Lab 1 | Classifier Clustering Lab | Clustering & PCA Lab | Project 7: Workshop
3:30-5 | Advanced SQL & Database Practice | PCA Lab 2 | Unsupervised Learning Trends | Project 7: Workshop | Project 7: Presentations

Week 8: Bayesian Inference

Session Time | Day 1 | Day 2 | Day 3 | Day 4 | Day 5
9-10 | (Project Review) | Morning Exercise | (Outcomes) | Morning Exercise | (Reflection)
10-11:30 | Intro to Bayes | Linear Regression + Bayes | Intro to Requests Library | Bayesian Stat Tests | Communicating Bayesian Results
11:30-1 | Intro to Bayes Lab | Logistic Regression + Bayes | API Lab | Bayesian Stat Testing | Capstone Pt 2: Workshop
2-3:30 | Bayes Deep Dive | Review Prior Models + Bayes | Intro to LDA | Naive Bayes Lesson | Capstone Pt 2: Workshop
3:30-5 | Bayes Case Study 1 | Bayes Case Study 2 | LDA & API Data Lab | Naive Bayes Lab | Capstone Pt 2: Workshop

Week 9: Time Series Data

Session Time | Day 1 | Day 2 | Day 3 | Day 4 | Day 5
9-10 | (Project Review) | Morning Exercise | (Outcomes) | Morning Exercise | (Reflection)
10-11:30 | Github for Teams | Analyzing Time Series Data | Intro to ARIMA Model | Tuning ARIMA Models | Visualizing Time Series Data
11:30-1 | Github for Teams Lab | Autocorrelation & Time Series Data | ARIMA Predictions Lab | ARIMA Tuning Lab | Visualizing Time Series Data Lab
2-3:30 | Intro to Time Series Data | Autocorrelation & Time Series Data | Capstone Pt 3: Workshop | Kaggle: Workshop | Kaggle: Workshop
3:30-5 | Kaggle: Workshop Setup | Kaggle: Workshop | Kaggle: Workshop | Kaggle: Workshop | Kaggle: Presentations

Week 10: Intro to Big Data

Session Time | Day 1 | Day 2 | Day 3 | Day 4 | Day 5
9-10 | (Project Review) | Morning Exercise | (Outcomes) | Morning Exercise | (Reflection)
10-11:30 | Intro to Big Data | AWS: EC2 & S3 | Intro to Spark | Database Design: Case Study | Big Data Review: Case Study
11:30-1 | Hadoop Intro Lab | AWS: HDFS, HUE, & EMR | Spark Lab 1 | Group Project: Setup Data | Group Project: Workshop
2-3:30 | MrJob Wordcount Lab | AWS + Hive Lab | Spark Lab 2 | Group Project: Workshop | Group Project: Workshop
3:30-5 | Hive Wordcount Lab | Big Data: Case Study | Spark: Case Study | Group Project: Workshop | Group Project: Presentations

Week 11: Advanced Topics & Interview Tips

Session Time | Day 1 | Day 2 | Day 3 | Day 4 | Day 5
9-10 | (Project Review) | Morning Exercise | (Outcomes) | Morning Exercise | (Reflection)
10-11:30 | Intro to A/B Testing | +Advanced Topics: Flex | Interview Prep | Interview Prep | +Portfolio Prep
11:30-1 | A/B Testing | +Advanced Topics: Flex | Interview Practice | Interview Practice | +Portfolio Lab
2-3:30 | +Content Review: Flex | +Content Review: Flex | Capstone: Workshop | Capstone: Workshop | Capstone: Workshop
3:30-5 | Capstone: Workshop | Capstone: Workshop | Capstone: Workshop | Capstone: Workshop | Capstone: Workshop

Week 12: Careers & Capstone

Session Time | Day 1 | Day 2 | Day 3 | Day 4 | Day 5
9-10 | (Project Review) | Morning Exercise | (Outcomes) | Morning Exercise | (Reflection)
10-11:30 | +Advanced Topics: Flex | +Content Review: Flex | +Interview Prep: Flex | Capstone: Workshop | Capstone Pt 5: Presentations
11:30-1 | +Advanced Topics: Flex | +Content Review: Flex | +Interview Practice: Flex | Capstone Pt 5: Presentations | Capstone Pt 5: Presentations
2-3:30 | Capstone: Workshop | Capstone: Workshop | Capstone: Workshop | Capstone Pt 5: Presentations | Capstone Pt 5: Presentations
3:30-5 | Capstone: Workshop | Capstone: Workshop | Capstone: Workshop | Capstone Pt 5: Presentations | +Graduation!
