1. Introduction to Counterfactual Analysis

What is counterfactual analysis?

Counterfactual analysis is a set of techniques for trying to understand the impact of a treatment. It does so by comparing people who experienced that treatment to a hypothetical ‘counterfactual’ version of themselves who did not. Since in the real world, people either do or not experience a treatment, we cannot literally compare the same person under the scenario of experiencing versus not experiencing a treatment. Instead controls (who did not experience the treatment) who are very similar to the treated people in all important respects are used to stand in for the counterfactual. By ‘very similar in all important respects’ we mean that although they did not experience the treatment, they had a very similar propensity to or ‘similar chances of’ experiencing the treatment’ as a person who did experience the treatment. This propensity is estimated based on their levels of ‘confounding factors’ I.e., factors that impact both the likelihood of receiving the treatment and the outcome of interest. We can compare treated and control units on an outcome of interest (e.g., depression) and if those outcomes differ we can be more sure that it is due to the treatment, rather than any confounding factors. As such, counterfactual analysis can support causal inference in a way that models that don’t take into account confounding factors cannot. As we discuss in these tutorials, there are several methods by which propensities can be estimated and used to support causal inference in observational data.

As an example of where counterfactual analysis can be useful, we might ask whether social media has a negative impact on adolescents’ mental health. We might find an association between social media use and depression in adolescents. However, we can’t be sure yet that this association is not due to confounding. For example, maybe adolescents with poorer mental health tend to turn to social media more (and are also more likely to continue to have poorer mental health ,given the stability of mental health over time). Or maybe there are other confounding factors, like attention deficit hyperactivity disorder (ADHD) that might lead people to use more social media and separately increase their risk of depression. If the set of potential confounding factors in this association can be identified and measured they could be used in a counterfactual analysis to see if there is still an association between social media use and depression after accounting for confounding. People with similar propensities to use social media a lot but who differ in their actual use could be compared. If those who use social media more show higher depression levels, we can be more sure that this is actually due to the social media use, rather than other confounding factors. We will use the example of social media use and mental health as an example throughout these tutorials as social media use was identified by our young person advisory group as a factor that they thought it particularly important to explore in research on young people’s mental health.

What kind of research questions can be answered with counterfactual analysis?

Counterfactual analysis can be used to address research questions concerned with the causal impact of some candidate treatment or exposure on some outcome variable of interest. Usually this is in observational data where it is not possible to control whether people experience a treatment or not. It is particularly valuable when it is suspected that people do not randomly experience a treatment or not and this might also be related to the outcome of interest (e.g., we suspect that certain adolescents who use social media more might differ from those who use it less in ways that also increase their risk of depression. Counterfactual analysis needs three main things:

  1. A treatment variable (e.g., social media use)
  2. An outcome variable (e.g., depression)
  3. Confounders or ’matching variables (e.g., ADHD, prior depression, sex/gender, socioeconomic status, physical health etc.)

Other than the social media use example above, examples of research questions that could be asked using counterfactual analysis include:

  • What is the impact of reading for pleasure on mental health?
  • Does school exclusion lead to poorer labour market outcomes?
  • Do more positive relationships with teachers lead to fewer behavioural problems in adolescence?

In fact, as DigiCAT was originally developed for mental health researchers we asked young people from our lived experience advisory groups what mental health research questions they think would be most interesting to investigate with counterfactual analysis. Here are some examples of the questions they felt should be a priority for mental health researchers:

  • What is the impact of social media use on mental health?
  • What is the impact of poorer sleep on mental health?
  • What is the impact of poor peer relationships on mental health?
  • What is the impact of spending time in nature on mental health?
  • What is the impact of physical activity on mental health?

Why not just adjust for covariates in regression?

Perhaps the most widely used approach to attempting to deal with confounding is to adjust for potential confounding within a regression or ANOVA-type model. This approach should be described as ‘conditional adjustment’. However, counterfactual analysis provides a more principled method that places causal inference at the fore. Importantly, it includes steps to check and ensure good covariate balance has been achieved. This is not guaranteed and may fail in traditional covariate adjustment approaches, leading to biases when trying to understand the causal effect of a treatment without any warning signs for the user. Further, counterfactual analysis provides a way of more clearly defining the target population about which inferences are being made. It can also readily incorporate a very large number of covariates. None of this comes at the cost of Interpretability when using methods like propensity matching or weighting. In DigiCAT in particular, both methods use a linear regression for the outcome model so the interpretation of the treatment effect is no more complicated than in a traditional regression model.

What counterfactual analysis approaches are offered in this tool?

DigiCAT currently offers several broad approaches to counterfactual analysis: propensity score matching for binary (traditional propensity score analysis) and ordinal treatment variables (non-bipartite optimal matching) and propensity score weighting for binary variables (inverse propensity of treatment weighting; IPTW). Propensity score matching for binary treatments is based on matching treated and control units with similar propensity scores to achieve balance between treated and control groups. Propensity weighting for binary treatments also uses a propensity score but transforms it into a weight that is used in a weighted regression. The weighted regression up-weights some cases and down-weights others to achieve balance between the treated and control groups. Non-bipartite propensity scoring estimates a propensity score based on treating the treatment variable as ordinal and then matches cases that have similar propensity scores but different treatment levels. We also have several features that are planned for incorporation in the future.

We are continuing to develop DigiCAT to make more approaches available. These are listed under features in development. Feel free to get in touch uoe_digicat-group@uoe.onmicrosoft.com if you have suggestions for approaches you would like to see within the tool.