Data Science: Weekly Projects

There are 7 Weekly Projects in our Data Science Immersive, each building on top of skills learned previously to scaffold students' learning over the entire course.

Each weekly project includes objectives, requirements, starter code, a rubric, and suggested resources, all of which tie into the overall competencies for each unit.

See the feedback guidelines to read more about how we provide feedback to students.

Project 1: SAT Scores + Summary Statistics

Provided with a dataset of SAT scores from across the United States, students will perform exploratory analysis for their client, the College Board, using NumPy, Matplotlib, and Tableau to apply basic summary statistics. Present your findings in a Jupyter notebook with an executive summary, visuals, and recommendations.

This project will familiarize students with performing exploratory data analysis, including plotting visuals and descriptive statistics.

Goal: Describe data, apply summary statistics, visualize data.
Detailed Spec File
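
As a rough illustration of the summary statistics this project asks for, the sketch below describes a small list of scores. The values are hypothetical, not taken from the project dataset, and the standard-library `statistics` module stands in for NumPy so the example runs without extra installs. The helper name `describe` is also just an assumption for illustration.

```python
import statistics

# Hypothetical state-level average math scores (made up for illustration).
math_scores = [510, 525, 498, 542, 530, 515]


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


summary = describe(math_scores)
for name, value in summary.items():
    print(f"{name}: {value}")
```

In a notebook, the same figures would typically sit next to a histogram or box plot so the numbers and the visual can be checked against each other.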

Project 2: Billboard Hits + Data Munging

Using a dirty dataset of Billboard hits, students will use Pandas to munge data, create a problem statement, and perform exploratory analysis for a local music publisher. Present your findings in a Jupyter notebook with an executive summary, visuals, and recommendations.

This project will familiarize students with the importance of data cleansing and munging in addition to data analysis.

Goal: Clean data, run statistical analysis, evaluate findings.
Detailed Spec File

Project 3: Liquor Sales + Linear Regression

Given access to state liquor sales data, students will choose between performing market research or conducting tax audits, using Pandas, Statsmodels, and scikit-learn to transform data, perform linear regression, and plot results. Present your findings in a Jupyter notebook with an executive summary, visuals, and recommendations.

This project will familiarize students with the role of audience analysis and model defense in real-world data science presentations.

Goal: Acquire & transform data, run linear regression, plot results.
Detailed Spec File
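
To make the regression step concrete, here is a minimal sketch of a one-variable ordinary least squares fit written in plain Python. It is not the project's method: students would normally use Statsmodels or scikit-learn, and the data points and the function names `fit_line` and `r_squared` are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data: bottles sold per order vs. sale amount in dollars.
bottles = [1, 2, 3, 4, 5]
sales = [12.0, 19.0, 31.0, 42.0, 48.0]

slope, intercept = fit_line(bottles, sales)
r2 = r_squared(bottles, sales, slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={r2:.3f}")
```

Reporting a fit statistic such as R^2 alongside the coefficients is one way to support the model defense this project emphasizes.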

Project 4: Web Scraping + Logistic Regression

Posing as a private contractor, students will scrape website data and use Pandas, Statsmodels, and NLTK to clean and analyze data, perform logistic regression, and evaluate correlation coefficients. Present your findings in a Jupyter notebook with an executive summary, visuals, and recommendations.

This project will familiarize students with acquiring and cleaning scraped data, tuning and evaluating a model, and presenting real-world recommendations.

Goal: Scrape data, perform logistic regression, correlate data, present insights.
Detailed Spec File

Project 5: Disaster Relief + Classification

As a researcher for a disaster response agency, students will pull remote data on Titanic survivors in order to build a local database, run a logistic regression classification model, and validate results from test subsets. Present your findings in a Jupyter notebook with an executive summary, visuals, and recommendations.

This project will familiarize students with acquiring remote data, building a local database, tuning and evaluating a model, and presenting real-world recommendations.

Goal: Acquire data, build pipeline, perform classification, validate results.
Detailed Spec File
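
The "local database" and "test subsets" steps can be sketched as follows. This is only an outline under stated assumptions: it uses an in-memory SQLite database from the standard library (the spec file may call for a different engine), the passenger rows are invented rather than real Titanic records, and the `train_test_split` helper here is a hypothetical stand-in for a library routine. The classification model itself is left out.

```python
import random
import sqlite3

# Invented passenger rows (id, age, fare, survived); not real Titanic data.
rows = [(i, 20 + i, 10.0 * i, i % 2) for i in range(1, 11)]

# Build a local database. ":memory:" keeps the sketch free of side effects.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE passengers ("
    "id INTEGER PRIMARY KEY, age REAL, fare REAL, survived INTEGER)"
)
conn.executemany("INSERT INTO passengers VALUES (?, ?, ?, ?)", rows)
conn.commit()

# Read the records back out, as a modeling step would.
records = conn.execute(
    "SELECT id, age, fare, survived FROM passengers ORDER BY id"
).fetchall()
conn.close()


def train_test_split(records, test_fraction=0.3, seed=42):
    """Shuffle with a fixed seed and hold out a fraction for validation."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


train, test = train_test_split(records)
print(len(train), "training rows,", len(test), "test rows")
```

Fixing the seed keeps the held-out subset the same between runs, which makes validation results comparable while a model is being tuned.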

Project 6: IMDB API + Random Forests

Acting as a member of the Netflix data science team, students will collect data from IMDB's API and use SQL to join it with additional scraped website data, in order to construct a random forest model that identifies ratings indicators and correlates these findings with viewer sentiment analysis. Present your findings in a Jupyter notebook with an executive summary, visuals, and recommendations.

This project will familiarize students with acquiring API data, joining multiple datasets in a local database, tuning and evaluating a model, and presenting real-world recommendations.

Goal: Acquire & join data, build and tune model, evaluate and present insights.
Detailed Spec File

Project 7: Airport Delays + Cluster Analysis

Working as an airport operations consultant, students will analyze plane delay data in US airports, creating a local PostgreSQL database, performing a principal component analysis, and writing up a detailed technical report describing their clustering methods and evaluation metrics. Present your findings in a Jupyter notebook with an executive summary, visuals, and recommendations.

This project will familiarize students with acquiring and cleaning complex data, tuning and evaluating an unsupervised learning model, and presenting real-world recommendations.

Goal: Build local database, build and tune model, evaluate and present insights.
Detailed Spec File
