General Assembly's Data Science Immersive is made up of four units of three weeks each:
| Unit | Title | Weeks | Topics | Flex Sessions |
|---|---|---|---|---|
| Unit 1 | Data Science Foundations | Weeks 1-3 | Programming Fundamentals, Pandas & EDA, Linear Regression | 3 |
| Unit 2 | Supervised Learning Algorithms | Weeks 4-6 | Logistic Regression, Classification, Databases, Ensemble Models | 0 |
| Unit 3 | Advanced Modeling Techniques | Weeks 7-9 | Unsupervised Learning, Bayes, Time Series | 0 |
| Unit 4 | Data Science Careers | Weeks 10-12 | Spark & Big Data, Portfolios, Presentations | 9 |
Note: Flex sessions can be used for content review or additional topics that instructors may want to cover.
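As a quick illustration of how the schedule in the unit table fits together, here is a minimal Python sketch that transcribes the table into a list and looks up the unit for a given week. The names `UNITS`, `unit_for_week`, and `total_flex` are hypothetical and exist only for this example; they are not part of the course materials.

```python
# Schedule transcribed from the unit table; the structure and names are illustrative.
UNITS = [
    {"unit": 1, "title": "Data Science Foundations", "weeks": range(1, 4), "flex_sessions": 3},
    {"unit": 2, "title": "Supervised Learning Algorithms", "weeks": range(4, 7), "flex_sessions": 0},
    {"unit": 3, "title": "Advanced Modeling Techniques", "weeks": range(7, 10), "flex_sessions": 0},
    {"unit": 4, "title": "Data Science Careers", "weeks": range(10, 13), "flex_sessions": 9},
]


def unit_for_week(week):
    """Return the unit entry that contains the given course week (1-12)."""
    for unit in UNITS:
        if week in unit["weeks"]:
            return unit
    raise ValueError(f"week must be between 1 and 12, got {week}")


# Total number of flex sessions across the whole course.
total_flex = sum(unit["flex_sessions"] for unit in UNITS)

print(unit_for_week(5)["title"])
print(total_flex)
```

A lookup like this can be handy when planning where review content fits, since the flex sessions are concentrated in Units 1 and 4.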
Unit 1: Data Science Fundamentals
In this unit, students will be practicing the basics of Python, Git, and the Command Line, as well as reviewing foundational statistical concepts that we'll use throughout the rest of the course.
Currently, our student onboarding tasks consist of five modules (Python, Statistics, Git, Command Line, and SQL) that pre-train and reinforce these foundations. Student readiness in these topics can be assessed using the required onboarding exercise.
Another goal of this unit is to get students comfortable with the data science workflow, emphasizing the use of Pandas and other tools to acquire, clean, and plot data. Students will learn about data visualization tools and techniques from Tableau to seaborn, and practice communicating their findings to different audiences.
- Week 1: Programming Fundamentals
- Week 2: Exploratory Data Analysis & Pandas
- Week 3: Linear Regression & StatsModels
Unit 2: Supervised Learning
Now that students have had practice in Python, Pandas, and statistical foundations, we'll prepare students to tackle supervised learning models, beginning with logistic regression. They'll use sklearn and pipelines to prepare data, while practicing regularization, tuning, and evaluation. Ultimately, students will learn to evaluate the tradeoffs of each model and communicate their recommendations through formal reports and informal blog posts.
Students will learn the basics of natural language processing, classification, and ensemble models. In addition, students will learn to acquire and parse data from different sources, including web scraping, remote databases, and APIs. Alongside the review and refresher lessons and labs on SQL, students will practice working in local PostgreSQL databases.
Finally, students are introduced to the Capstone Project in Week 4, with their first deliverable due in Week 6.
- Week 4: Intro to Logistic Regression
- Week 5: Classification & Databases
- Week 6: Trees & Ensemble Methods
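To make the Unit 2 workflow concrete, here is a minimal sketch of an sklearn pipeline that scales features, fits a regularized logistic regression, and tunes the regularization strength with cross-validation. The synthetic dataset, the step names `scale` and `clf`, and the grid of `C` values are assumptions chosen for illustration; they are not taken from the course materials.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption; real lessons would use their own datasets).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Preparation and model in one pipeline, so scaling is refit inside each CV fold.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),  # L2-regularized by default
])

# Tuning: smaller C means stronger regularization.
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

# Evaluation on data the search never saw.
best_c = search.best_params_["clf__C"]
test_accuracy = search.score(X_test, y_test)
print(f"best C: {best_c}, held-out accuracy: {test_accuracy:.3f}")
```

Keeping the scaler inside the pipeline avoids leaking information from the validation folds into the preprocessing step, which is one reason pipelines pair well with tuning.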
Unit 3: Advanced Modeling
Now that students have had practice in acquiring, cleaning, and modeling data using SQL, pipelines, and sklearn, we'll move into more advanced topics, including unsupervised learning, Bayesian inference, and modeling time series data. Ultimately, students will learn to build their own local databases while running principal component analysis and ARMA/ARIMA models. They'll understand the difference between Bayesian and frequentist reasoning, and practice articulating these topics to stakeholder audiences.
Students will also practice real-world GitHub workflows as they apply their skills to a group Kaggle project. Through case studies and workshops, student work should start to take shape as they complete their second Capstone deliverable.
Unit 4: Data Science Careers
Now that students have had repeated practice with acquiring, cleaning, modeling, tuning, and presenting data, iterating through every step of the data science workflow, we'll get students to think more about potential industry applications of these concepts. They'll continue working on their third Capstone deliverable while they learn about additional data science topics, including MapReduce, Hive, and Spark. Ultimately, students will learn to articulate big data use cases while experimenting with Hadoop and navigating the AWS ecosystem.
Students will also have time to practice common whiteboard problems and interview scenarios, while applying all of their newfound knowledge to parts 4 and 5 of their Capstone project, fleshing out a professional portfolio and meeting with industry experts.
Curriculum Material Availability
Resource status is indicated using the following symbols:
- No symbol - Resource links without a symbol point to the resource's location: clicking on a link will take you to the relevant ReadMe file.
- " + " - Resource links with a + are suggested topics for that time block and do not have an existing baseline resource. We'd love for you to contribute a resource with a pull request.
- " # " - Resource links with a # are time blocks dedicated to outcomes lessons. Coordinate with your local outcomes team to fill these slots.
- " @ " - Resource links with a @ are resources from other courses that need to be adapted for DSI or resources that only contain learning objectives. We'd love for you to contribute a resource with a pull request.
- " * " - Resource links with a * are resources that are currently being worked on.