Causal Inference in Data Science: From Prediction to Causation
Predictive modeling has been highly successful at drawing inferences from data. Such models are used extensively, including in systems for recommending items, optimizing content, delivering ads, matching applicants to jobs, and identifying health risks. However, predictive models are not well equipped to answer questions about cause and effect, which underlie many practical decision-making scenarios. For example, if a recommendation system is changed or removed, what will be the effect on total customer activity? Which strategy leads to higher engagement with a product? How can we learn generalizable insights about users from biased data (e.g., data from opt-in users)? Through practical examples, I will show the value of counterfactual reasoning and causal inference in such scenarios by demonstrating that relying on correlation-based predictive modeling can be counterproductive. I will then present an overview of experimental and observational causal inference methods that can better inform data-driven decision-making and also lead to more robust and generalizable prediction models.