Iterative Data Science
Few best practices for data science are as widely publicized and agreed upon as those for software development. This gap has led to a disconnect between the structured workflows of product teams and the ambiguity inherent in data science work. Because of this disconnect, product teams find it difficult to deliver solutions in a timely fashion.
The goal of this post is to outline some general best practices for data science, developed at Target, that will facilitate more effective product development.
This post will focus on developing the following patterns:
1) Discovery, Experiment, and Implementation Stories
2) Experiment Design
3) Measurement
Data science investigations are frequently lengthy and complex endeavors that can take months to complete, if they are completed at all. By contrast, software teams typically work in defined units of time called sprints, sometimes as short as a week or a few days. Working in sprints has enabled steady, measurable progress toward goals throughout Target. Stories, the units of work completed during sprints, allow work to be broken down into manageable chunks. It would be ideal, then, to organize data science investigations so that they can be described in terms of sprints and stories while not constraining the ability of data scientists to explore the problem and find the best solution.
We propose organizing data science investigations into three distinct story types, with each story time boxed for one sprint: discovery, experiment and implementation.
Discovery stories allow for time-boxed but open-ended exploration of different possible approaches to the problem.
Experiment stories consist of individual experiments to examine the specifics of the solution that will be implemented.
Implementation stories represent the work of taking a proposed solution and putting it into production.
Note: Prior to the discovery phase, teams should be able to define the problem and provide an action plan with realistic goals set in place.
The discovery phase represents the initial effort spent on finding a solution to a problem.
This phase could include general research around the topic of interest, or investigating the problem statistically and devising possible approaches. This is also an ideal time for less-structured activities such as exploratory data analysis. The intent of this phase is not to devise a complete solution but to determine feasible approaches that can be worked into a complete solution through experimentation.
The acceptance criteria for a discovery story should include:
An assessment of whether each approach investigated is feasible
A proof of concept for the best feasible approach investigated
Follow-up stories that either specify additional approaches to investigate or experiments to be performed
The majority of stories will be experiments. Experiments must be structured carefully to ensure that they can be performed as part of iterative development. Each experiment story should satisfy the following criteria:
They must answer a single yes/no question.
The hypothesis must be falsifiable, with clear criteria for a positive or negative result.
The results of the experiment must be actionable regardless of whether the hypothesis is true or false.
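To make these criteria concrete, here is a hypothetical sketch in Python. The threshold, accuracy numbers, and function names are illustrative assumptions, not part of any Target process: the point is that the question is yes/no, the success criterion is pre-registered, and both outcomes map to a next action.

```python
# Hypothetical experiment story: "Does the candidate model raise holdout
# accuracy by at least 2 points over the baseline?" The threshold is
# pre-registered before the experiment runs, making the hypothesis falsifiable.
THRESHOLD = 0.02  # illustrative value

def run_experiment(baseline_acc, candidate_acc, threshold=THRESHOLD):
    """Answer the single yes/no question and recommend the next story,
    so the result is actionable whether the hypothesis holds or not."""
    improved = (candidate_acc - baseline_acc) >= threshold
    next_story = "implementation story" if improved else "follow-up experiment"
    return improved, next_story

# Illustrative accuracies from a holdout evaluation:
answer, next_story = run_experiment(baseline_acc=0.81, candidate_acc=0.84)
```

Either outcome feeds directly into the backlog: a positive result spawns an implementation story, a negative one spawns a follow-up experiment.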
The results of each experiment should be well-documented and go through at least one peer review before the story can be moved to done.
Peer review ensures consistency of methodology across the team and improves communication of results. Peer review should be done by someone who is familiar with good experiment design, and the reviewer should verify that the conclusions make sense based on the evidence obtained during the experiment. If no single person is sufficiently familiar with experiment design, multiple peer reviewers should be used to aggregate the necessary knowledge.
Documentation of results helps the team keep track of whether a particular approach has been tried and why. Documentation should take cues from lab reports, including details such as which datasets were chosen and why, what changes were made to the data, conclusions from the experiment, and recommended next actions. Data privacy and integrity should always be kept in mind during experiments and documentation. Target emphasizes security, making sure all data is being used responsibly.
Ultimately, every experiment story should result in either a follow-up experiment or an implementation.
The work of creating or modifying production systems based on results from experiments is encapsulated in implementation stories.
Communication is a key component of the implementation phase since there is often an implicit handoff between the person who has been developing an approach through theory and experimentation and someone who will be integrating that approach with an existing system in a performant and well-designed way. Communication can easily be improved during this phase, through proper documentation of the approach to be implemented and through pairing.
It is necessary to have measurements in place to be able to determine whether an experiment has been successful, and whether a solution in production is helping you reach your goals. It is also necessary to choose the correct measurements for your problem domain; a system designed to flag logs that contain errors might require high sensitivity, while a system that filters applications might require a strong balance between precision and negative predictive value.
Examples of measurements:
Two-Class Classification Problems
Positive predictive value (ppv, or precision): What % of positive classifications correspond to positive labels?
Negative predictive value (npv): What % of negative classifications correspond to negative labels?
Sensitivity (recall): What % of positive labels are positively classified?
Specificity: What % of negative labels are negatively classified?
Overall accuracy: What % of labels are correctly classified?
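A minimal Python sketch of these two-class definitions, computed from the four confusion-matrix counts (the function name and example data are illustrative):

```python
def two_class_metrics(labels, preds):
    """Compute the metrics above from parallel lists of booleans,
    where True means positive. Assumes no count in a denominator is zero."""
    tp = sum(l and p for l, p in zip(labels, preds))          # true positives
    tn = sum(not l and not p for l, p in zip(labels, preds))  # true negatives
    fp = sum(not l and p for l, p in zip(labels, preds))      # false positives
    fn = sum(l and not p for l, p in zip(labels, preds))      # false negatives
    return {
        "ppv": tp / (tp + fp),          # precision
        "npv": tn / (tn + fn),
        "sensitivity": tp / (tp + fn),  # recall
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(labels),
    }

# Illustrative example: six labeled items, four classified correctly.
labels = [True, True, True, False, False, False]
preds = [True, True, False, True, False, False]
m = two_class_metrics(labels, preds)
```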
Forecasting (Regression) Problems
R^2: How well correlated is the forecast to the set of observations?
Root Mean Squared Error (RMSE): What is the typical error magnitude, weighting large errors more heavily?
RMSE/Mean Actual: What is RMSE, adjusted for scale?
Mean Absolute Error: What is the average error, regardless of sign?
Mean Absolute Error/Mean Actual: What is the average error, adjusted for scale?
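As a minimal Python sketch of these forecast-error definitions (the function name and example values are illustrative; R^2 here is the coefficient of determination, 1 minus residual over total sum of squares):

```python
import math

def forecast_metrics(actual, forecast):
    """Compute the forecast metrics above from parallel lists of numbers.
    Assumes a nonzero mean of the actuals for the scale-adjusted variants."""
    n = len(actual)
    errors = [f - a for a, f in zip(actual, forecast)]
    mean_actual = sum(actual) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)                       # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)      # total sum of squares
    return {
        "r2": 1 - ss_res / ss_tot,
        "rmse": rmse,
        "rmse_over_mean": rmse / mean_actual,                 # RMSE adjusted for scale
        "mae": mae,
        "mae_over_mean": mae / mean_actual,                   # MAE adjusted for scale
    }

# Illustrative example: four observations and their forecasts.
actual = [10, 20, 30, 40]
forecast = [12, 18, 33, 41]
fm = forecast_metrics(actual, forecast)
```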
As product teams continue to seek out data scientists, bridging the disconnect between the two workflows is vital. Adopting iterative data science combines experimental processes with agile practices. The value from this workflow is seen in the acceleration of data science experimentation and the overall faster times from discovery to implementation.
* This post and its content is the result of efforts by the Metrics Ingestion and Prediction System (MIPS) team at Target, with the intent of developing successful data science processes for product teams.