Anomaly Detection for Data Quality and Metric Shifts at Netflix
Recorded at DataEngConf SF17 in April 2017.

In the course of transforming, publishing, and visualizing data, there’s a risk of “bad data” creeping into your output at every turn, hurting data credibility and distracting teams from investigating real metric shifts. How does Netflix prevent bad data from causing bad decision-making? We use a variety of techniques to automate the basics, allowing us to focus our energy on the changes in data that indicate real problems with the Netflix product. Hear examples of 1) the checks we impose at multiple steps of the data pipeline to identify source data quality issues and business metric shifts, 2) techniques for anomaly detection on datasets with many high-cardinality dimensions, 3) how to set up these evaluations in an automated fashion, and 4) how we make it easy for humans to investigate issues.
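
The abstract does not spell out the detection methods used; as a minimal sketch of the kind of automated pipeline check it describes, the snippet below flags days whose metric value deviates sharply from the rest using a robust (median/MAD-based) score. The function names, window of daily counts, and the 3.5 threshold are illustrative assumptions, not Netflix's actual implementation.

```python
# Illustrative sketch only: a robust-threshold check on a daily metric, the kind of
# automated audit that might run after a pipeline step. Threshold and metric are
# assumptions for the example, not Netflix's actual values.
from statistics import median


def robust_z_scores(values, eps=1e-9):
    """Score each value by its deviation from the median, scaled by the median
    absolute deviation (MAD) -- less outlier-sensitive than mean/stddev."""
    med = median(values)
    mad = median(abs(v - med) for v in values) or eps
    return [(v - med) / (1.4826 * mad) for v in values]


def flag_metric_shift(daily_counts, threshold=3.5):
    """Return indices of days whose metric deviates sharply from the rest, so a
    human can investigate whether it is bad data or a real product change."""
    scores = robust_z_scores(daily_counts)
    return [i for i, s in enumerate(scores) if abs(s) > threshold]


if __name__ == "__main__":
    # Example: daily row counts for one table partition; the dip on day 5 is the
    # kind of shift such a check would surface for human investigation.
    counts = [1_020_000, 1_015_500, 1_031_200, 1_024_800, 1_019_300, 640_000, 1_022_700]
    print(flag_metric_shift(counts))  # -> [5]
```

A robust score is chosen here because a single bad day should not inflate the baseline it is compared against; in practice such checks would be parameterized per dataset and run automatically as part of the pipeline, per items 1 and 3 above.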