Data Science: Capstone Project
The Capstone Project is divided into 5 deliverables, each building on skills learned previously to scaffold students' learning over the entire course.
The Capstone Project deliverables include objectives, requirements, rubrics, and suggested resources, all of which tie into the overall competencies for this course.
See the feedback guidelines to read more about how we provide feedback to students.
Capstone, Part 1: Capstone Topic + Dataset Validation
In Part 1, get started by choosing a topic and problem, describing your goals and criteria for success, naming your potential audience(s), and identifying 1-2 potential datasets. You will present this information in a slide deck and should be prepared to answer questions and defend your data selection(s). Presentations should take 3-5 minutes.
- Goal: Prepare a 3-5 minute lightning talk that covers your potential topic, audience, and dataset.
- Detailed Spec File
Capstone, Part 2: Problem Statement + EDA
For Part 2, provide an overview of your approach to solving your problem with the data you've chosen. Summarize your objectives, goals and success metrics, and any risks and assumptions. Outline your proposed methods and models, perform your initial exploratory data analysis (EDA), and summarize the process. Describe any data munging needed and create a local database for your dataset(s). Create a data dictionary as needed.
- Goal: Describe your proposed approach and summarize your initial EDA. Create a local database and data dictionary as needed.
- Detailed Spec File
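As a rough illustration of the "local database and data dictionary" step in Part 2, here is a minimal sketch using Python's built-in `sqlite3` module. The `sales` table, its columns, the sample rows, and the helper names `build_database` and `data_dictionary` are all hypothetical placeholders; your own dataset and schema will differ, and this course may expect a different database.

```python
import sqlite3

# Hypothetical sample rows standing in for a real capstone dataset.
SAMPLE_ROWS = [
    ("2023-01-01", "north", 120.5),
    ("2023-01-02", "south", 98.0),
    ("2023-01-03", "north", None),  # a missing value for EDA to surface
]


def build_database(path=":memory:"):
    """Create a local SQLite database and load the sample rows.

    Pass a file path such as "capstone.db" to persist the data on disk.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(sale_date TEXT, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", SAMPLE_ROWS)
    conn.commit()
    return conn


def data_dictionary(conn, table, descriptions):
    """Return one entry per column: name, declared type, null count, description.

    Table and column names are interpolated into SQL, so only pass
    names you control (never user input).
    """
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    columns = conn.execute(f"PRAGMA table_info({table})").fetchall()
    entries = []
    for _cid, name, col_type, *_rest in columns:
        null_count = conn.execute(
            f'SELECT COUNT(*) FROM "{table}" WHERE "{name}" IS NULL'
        ).fetchone()[0]
        entries.append(
            {
                "column": name,
                "type": col_type,
                "nulls": null_count,
                "description": descriptions.get(name, ""),
            }
        )
    return entries


if __name__ == "__main__":
    connection = build_database()
    notes = {
        "sale_date": "Date of the sale (ISO 8601 string)",
        "region": "Sales region label",
        "amount": "Sale amount; may be missing",
    }
    for entry in data_dictionary(connection, "sales", notes):
        print(entry)
```

Generating the data dictionary from the database itself keeps column names and types in sync with what is actually stored; the human-written descriptions are the part you maintain by hand.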
Capstone, Part 3: Progress Report + Preliminary Findings
In Part 3, you'll create a Progress Report of your work in order to get feedback along the way. Describe your approach, initial results, and any setbacks or lessons learned so far. Your report should include updated visual and statistical analysis of your data. You’ll also meet with your instructional team to get feedback on your results so far!
- Goal: Discuss progress and setbacks, include visual and statistical analysis, and review with your instructor.
- Detailed Spec File
Capstone, Part 4: Report Writeup + Technical Analysis
By now, you're ready to apply your modeling skills to make machine learning predictions. Your goal for Part 4 is to develop a technical document that can be shared among your peers.
Document your research with a summary, explaining your modeling approach as well as the strengths and weaknesses of any variables in the process. You should provide insight into your analysis, using best practices like cross-validation or applicable prediction metrics.
Use your model to display correlations, feature importance, and unexplained variance. Look at how your model performs compared to a dummy model, and articulate the benefit gained by using your specific model to solve this problem. Finally, build visualizations that explain outliers and the relationships between your predicted variable and the independent variables.
- Goal: Detailed IPython technical notebook with a summary of your statistical analysis, model, and evaluation metrics.
- Detailed Spec File
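The comparison against a dummy model described in Part 4 can be sketched as follows. This is a minimal, standard-library-only illustration under stated assumptions: the data is synthetic, the "model" is a one-feature least-squares line, and the dummy model always predicts the training mean. The helper names (`make_synthetic_data`, `fit_line`, `cross_validate`) are hypothetical. In a real notebook you would more likely use scikit-learn's `DummyRegressor` and `cross_val_score` for the same purpose.

```python
import random


def make_synthetic_data(n=50, seed=42):
    """Hypothetical data: y = 2x + 1 plus Gaussian noise."""
    rng = random.Random(seed)
    xs = [float(i) for i in range(n)]
    ys = [2 * x + 1 + rng.gauss(0, 1) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_validate(xs, ys, k=5, seed=0):
    """Return (mean model MSE, mean dummy MSE) across k shuffled folds."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    model_scores, dummy_scores = [], []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        x_train = [xs[i] for i in train]
        y_train = [ys[i] for i in train]
        y_test = [ys[i] for i in fold]
        # Fit on the training folds only, then score on the held-out fold.
        slope, intercept = fit_line(x_train, y_train)
        model_pred = [slope * xs[i] + intercept for i in fold]
        # Dummy model: always predict the mean of the training targets.
        dummy_pred = [sum(y_train) / len(y_train)] * len(fold)
        model_scores.append(mse(y_test, model_pred))
        dummy_scores.append(mse(y_test, dummy_pred))
    return sum(model_scores) / k, sum(dummy_scores) / k


if __name__ == "__main__":
    xs, ys = make_synthetic_data()
    model_mse, dummy_mse = cross_validate(xs, ys)
    print(f"model MSE: {model_mse:.3f}")
    print(f"dummy MSE: {dummy_mse:.3f}")
    # Roughly the share of variance the model explains beyond the dummy.
    print(f"improvement over dummy: {1 - model_mse / dummy_mse:.3f}")
```

Reporting the model's error next to the dummy's makes the "benefit gained" concrete: if the two numbers are close, the model is adding little beyond predicting the average.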
Capstone, Part 5: Presentation + Recommendations
Whether during an interview or as part of a job, you will frequently have to present your findings to business partners and other interested parties, many of whom won't know anything about data science! That's why for Part 5, you'll create a presentation of your previous findings with a non-technical audience in mind.
You should already have the analytical work complete, so now it's time to clean up and clarify your findings. Come up with a detailed slide deck or interactive demo that explains your data, visualizes your model, describes your approach, articulates strengths and weaknesses, and presents specific recommendations. Be prepared to explain and defend your model to an inquisitive audience!
- Goal: Detailed presentation deck that relates your data, model, and findings to a non-technical audience.
- Detailed Spec File