Week 2 | Lesson 1.2
After this lesson, you will be able to:
- Design an experiment
- Demonstrate good and bad examples of study design
- Explain the objectives for Project 2
Before this lesson, instructors will need to:
- Read in / Review any dataset(s) & starter/solution code
- Generate a brief slide deck
| 5 min | Introduction | Why care about experimental design? |
| 10 min | Demo / Guided Practice | Designing a good experiment |
| 10 min | Demo / Guided Practice | Picking the type of question to be answered |
| 10 min | Demo / Guided Practice | What a good question looks like |
| 10 min | Demo / Guided Practice | Reproducibility |
| 10 min | Demo / Guided Practice | Randomization |
| 10 min | Demo / Guided Practice | Data analysis steps |
| 15 min | Independent Practice | |
| 10 min | Conclusion | Project 2 |
Introduction: Why care about experimental design? (5 mins)
Why care about experimental design? A scientific article, "Genomic signatures to guide the use of chemotherapeutics", was published in Nature Medicine and caused quite a stir. Using genomics, chemotherapy could potentially be personalized for several kinds of cancer treatment. Nothing short of amazing!
But wait: upon further review, the study was found to have a flawed design and an incorrect statistical analysis. This not only led to a retraction of the paper, but also resulted in a lawsuit from people who were already participating in clinical trials.
Bad experimental design can lead to serious consequences.
Demo / Guided Practice: Designing a good experiment (10 mins)
A good experiment:
- is reproducible (the entire experiment can be duplicated by the same researcher or by someone else)
- measures variability
Demo / Guided Practice: Picking the type of question to be answered (10 mins)
The starting point for any experiment is to ask: what type of question are you trying to answer?
- What is the mean, median, and mode of this dataset?
- What day of the week is public transportation in DC used most heavily?
- Based on a small sample, can I infer something about a larger population?
- How do I predict what word a user may type next?
The four questions above are, respectively, descriptive, exploratory, inferential, and predictive. The type of question you want to answer will help guide your experimental design. For the most part, descriptive and exploratory questions are asked earlier in the experimental design flow and help to inform inferential and predictive questions.
Check: What type of question do you think Netflix is asking when it recommends a movie to you based on movies you've watched previously?
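As a minimal sketch of answering a descriptive question such as "What is the mean, median, and mode of this dataset?", the snippet below uses Python's standard `statistics` module. The `trips` list and the `describe` helper are hypothetical, made up purely for illustration; they are not part of the lesson's datasets.

```python
import statistics

# Hypothetical sample: number of transit trips taken by ten riders in a week.
trips = [2, 4, 4, 5, 7, 9, 10, 4, 6, 9]


def describe(values):
    """Return the mean, median, and mode(s) of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # multimode returns every most-common value, so ties are not hidden.
        "modes": statistics.multimode(values),
    }


summary = describe(trips)
print(summary)
```

A descriptive summary like this only characterizes the data in hand. Saying something about riders who were not sampled would be an inferential question and needs a different design.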
Demo / Guided Practice: What a good question looks like (10 mins)
Now that we've decided on the type of question we're going to ask, what would make it a good question? The goals of a high-quality, reproducible question are similar to the SMART Goals Framework:
- Specific: The dataset and key variables are clearly defined.
- Measurable: The type of analysis and major assumptions are articulated.
- Attainable: The question you are asking is feasible for your dataset and is not likely to be biased.
- Reproducible: Another person (or you in six months!) can read your statement and understand exactly how your analysis was performed.
- Time-bound: You clearly state the time period and population to which this analysis will pertain.
Demo / Guided Practice: Reproducibility (10 mins)
Reproducibility is the ability of an entire experiment or study to be duplicated, either by the same researcher or by someone else working independently. The term reproducible research refers to the idea that the ultimate product of research is the paper together with the full computational environment used to produce its results (the code, the data, and so on), so that others can reproduce the results and build new work on the research. A reproducible analysis should answer the following questions:
- What dataset did you use?
- What OS did you use?
- What software did you use? Versions?
- What code did you write?
- What programming language did you use?
- What packages/libraries did you use?
- What did you do, and why did you do it?
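Several of the questions above (OS, language version, package versions) can be answered automatically. The following is one possible sketch, using only the Python standard library. The `environment_report` helper and the package names passed to it are illustrative assumptions, not a prescribed part of this course.

```python
import json
import platform
from importlib import metadata


def environment_report(packages):
    """Collect OS, Python version, and installed versions of the named packages."""
    report = {
        "os": platform.platform(),
        "python_version": platform.python_version(),
        "packages": {},
    }
    for name in packages:
        try:
            report["packages"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the absence explicitly instead of failing.
            report["packages"][name] = "not installed"
    return report


# Hypothetical package list; replace with whatever your analysis imports.
report = environment_report(["numpy", "pandas"])
print(json.dumps(report, indent=2))
```

Saving this output next to your code and data gives a reader (or you in a few months) a record of the environment. It does not replace a written explanation of what you did and why.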
Check: What happens if an experiment isn't reproducible?
Demo / Guided Practice: Randomization (10 mins)
It is generally extremely difficult for experimenters to eliminate bias using only their expert judgment. For this reason, the use of randomization in experiments is common practice. In a randomized experimental design, data are assigned to an experimental group at random (by chance). Randomization is the most reliable method of creating homogeneous groups without involving any potential biases or judgments.
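Random assignment as described above can be sketched in a few lines. This is an illustrative example under stated assumptions, not a full trial-randomization procedure: the `randomize` function, the subject labels, the group names, and the seed value are all hypothetical. The seed is fixed so that the assignment itself can be reproduced.

```python
import random


def randomize(subjects, groups=("control", "treatment"), seed=42):
    """Shuffle subjects by chance, then deal them out evenly across groups."""
    rng = random.Random(seed)  # local generator, so the global state is untouched
    shuffled = list(subjects)  # copy, so the caller's list is not modified
    rng.shuffle(shuffled)
    assignment = {group: [] for group in groups}
    for index, subject in enumerate(shuffled):
        assignment[groups[index % len(groups)]].append(subject)
    return assignment


# Hypothetical subject labels.
subjects = [f"subject_{i}" for i in range(1, 9)]
assignment = randomize(subjects)
for group, members in assignment.items():
    print(group, members)
```

Because chance, not the experimenter, decides who lands in which group, the experimenter's judgment cannot systematically favor one group.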
Check: Why is randomization important?
Demo / Guided Practice: Data science workflow (10 mins)
We've discussed a few high-level considerations we need to take into account, but let's look at the data science workflow again.
Check: What do you think the most important step in this process is and why?
Independent Practice: (20 minutes)
Explain to a partner why the type of question, reproducibility, and randomization are important in study design.