2. Using DigiCAT

Why develop DigiCAT?

Counterfactual analysis provides a set of tools that can help us understand what the active ingredients are in mental health. While the gold standard for doing so would be something like a randomised controlled trial (RCT), RCTs are not always possible and when they are, they are expensive and challenging to do well. Their highly controlled nature also means that they may lack ecological validity so that results from RCTs might not apply very well in the real world.

As such, there is a lot of value in being able to get an idea of what might impact mental health from observational (i.e., non-experimental) data. In fact, there is a wealth of pre-existing mental health data ready to be used in this way. Examples of such datasets can be found in this catalogue of mental health datasets: https://www.cataloguementalhealth.ac.uk/

However, we noticed that counterfactual analysis is not as widely used in mental health research as one might expect. There may be a number of reasons for this, but we think a major one is that counterfactual analysis is not very accessible to mental health researchers. It is not typically part of their training and it can be technically quite complex, which may limit researchers’ ability to use it. There is also a lack of accessible tools for counterfactual analysis. Most tools require coding skills, and while some web applications exist, they don’t do everything. In particular, their functionality tends to be narrowly focused on one type of analysis and they tend not to be able to deal with ‘real world’ features of data such as missingness, complex survey designs, or non-binary treatments. These are things we wanted to address with DigiCAT.

Another reason we made DigiCAT is that we noticed it took rather a long time to code counterfactual analyses using existing packages, and things got complicated when we had to consider issues like missingness, clustering, and large numbers of confounders. By pulling together all the code from common ‘journeys’ and creating an easy ‘point and click’ interface, we also aim to speed up the implementation of counterfactual analysis, even for expert users who are adept at the technique in existing software such as R.

Our approach

DigiCAT was built with the FAIR principles (https://www.go-fair.org/fair-principles/) in mind and is therefore built in the open source software R and Shiny. All code is available at the DigiCAT GitHub and you can also download the code for your specific analysis by clicking ‘Download R script’ at the end of the workflow. This is also useful if you would like to build on the analyses offered in DigiCAT: it provides a script that you can adapt, extend, and otherwise customise to your needs. DigiCAT development also took a user-centred approach and many features have been adapted based on the helpful input we received from our user engagements. It is also informed by lived experience expert input, and you can read more below about how you might draw on lived experience expertise in planning, interpreting and disseminating your analyses using the tool.

How to get DigiCAT

There are two ways to use DigiCAT. The first is to download it and use it on your local computer. This is recommended if you need to avoid uploading your data to a server or if you have a very large dataset or very computationally intensive analyses. The second way to use it is via the web application. The two versions are mostly very similar but the web application has a few more limits. These limits are to help ensure data security and to limit very computationally demanding analyses (that would take a long time to run). These types of analyses are better to do with the package version.

R Package

To use DigiCAT on your local computer you should have a recent version of R installed. Within R you can then enter the following lines to install and then run DigiCAT. Once it is installed you only need to run the third line to launch it each time.

install.packages("remotes")
remotes::install_github("uoe-digicat/DigiCAT")
DigiCAT::run_DigiCAT()

Web Version

The web version of DigiCAT is available at https://digicatapp.shinyapps.io/DigiCAT. For the moment we have disabled data upload until we move it over to a new server. However, you can use the example dataset pre-loaded with the tool to explore its features.

How to use DigiCAT

Both the web and package versions of DigiCAT guide you through a journey which begins with data upload (or loading of the example dataset), then proceeds through the selection of a counterfactual analysis approach, a balancing stage, and an outcome model stage. Each of these is explained in turn below and more details of the specific methods offered are provided in later sections.

Data upload

The downloadable version of DigiCAT allows users to upload data. This data should be in .csv format and include variable headers. All the data should be in ‘numeric’ form, meaning that only numbers can be used to represent variable values. This means that variable labels should be recoded as numbers, e.g., ‘yes = 1; no = 0’. Missing values should be coded as ‘NA’, which is the convention that R, the tool that powers DigiCAT, uses to represent missing values.
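The formatting requirements above can be met with a few lines of base R before upload. The following is a minimal sketch only: the variable names, the ‘-99’ missing-value code and the output file name are made up for illustration and are not part of DigiCAT.

```r
# Toy data with text labels and an ad hoc missing-value code (-99).
# All names and values here are hypothetical.
raw <- data.frame(
  id           = 1:5,
  social_media = c("yes", "no", "yes", NA, "no"),
  anxiety      = c(2.1, -99, 3.4, 1.8, 2.9),
  stringsAsFactors = FALSE
)

# Recode text labels as numbers: yes = 1, no = 0 (NA stays NA).
raw$social_media <- ifelse(raw$social_media == "yes", 1, 0)

# Replace the ad hoc missing-value code with NA.
raw$anxiety[raw$anxiety == -99] <- NA

# Confirm that every column is numeric before writing the file.
stopifnot(all(vapply(raw, is.numeric, logical(1))))

# write.csv() writes a header row and, by default, writes missing values as "NA".
out_file <- file.path(tempdir(), "digicat_upload.csv")
write.csv(raw, out_file, row.names = FALSE)
```

Reading the file back with read.csv(out_file) is a quick way to confirm that the headers, numeric coding and ‘NA’ values survived the round trip.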

Once a file has been selected for upload you are then able to flag any categorical variables. This is to ensure that they are properly treated as categorical. You also select your outcome variable (e.g., ‘anxiety’), your treatment variable (e.g., ‘social media use’) and your matching variables (e.g., ‘gender’, ‘socioeconomic status’, etc.). You can also select any additional covariates that you want to include in your outcome model but that you don’t want to use as matching variables. These might be selected if you want to estimate a treatment effect over and above the effects of other influences on mental health (e.g., the effect of social media use after adjusting for the effects of physical activity).
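To see why flagging categorical variables matters when everything is stored as numbers, the sketch below compares a model that treats a numerically coded nominal variable as a quantity with one that treats it as categorical. It uses simulated data and hypothetical variable names, and it illustrates the general principle in base R rather than DigiCAT's internal handling.

```r
set.seed(1)

# Hypothetical nominal variable coded 1-3 (e.g. three school types).
school <- rep(1:3, each = 20)

# Simulated outcome: category 2 has the highest mean, so the codes are not ordered.
anxiety <- c(2, 3, 2)[school] + rnorm(60, sd = 0.1)

# Treated as numeric: a single slope that assumes equal steps from 1 to 2 to 3.
fit_numeric <- lm(anxiety ~ school)

# Treated as categorical: one coefficient per non-reference category.
fit_factor <- lm(anxiety ~ factor(school))

length(coef(fit_numeric))  # 2 (intercept + slope)
length(coef(fit_factor))   # 3 (intercept + two category contrasts)

# The categorical model captures the group differences the numeric one misses.
c(numeric = summary(fit_numeric)$r.squared,
  factor  = summary(fit_factor)$r.squared)
```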

Finally, DigiCAT has been built to handle complex survey data, which is commonly seen in large-scale mental health studies. For datasets like this you can select a survey weight or non-response weight variable. If only a survey weight variable is available you can select ‘select if data includes survey weight’. If a non-response or attrition weight is available then you can additionally select ‘select if survey weights above compensate for non-response’. Complex survey datasets also commonly include clustering and stratification variables, which can also be selected at the data upload stage.

Once the various variables have been selected, only these will be carried forward for analysis. After uploading the data and selecting your variables you can then click ‘validate data’ and DigiCAT will perform some checks to make sure the data are all in the right format and that no issues with the analysis are anticipated.
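If you would like to catch formatting problems before uploading, checks of a similar spirit can be run in base R. The function below is a hypothetical helper written for this guide; it is not the validation that DigiCAT itself performs, and the checks it makes (variables present, numeric, treatment not constant) are assumptions about what is useful to verify.

```r
# Hypothetical pre-upload checks; NOT the checks DigiCAT itself runs.
check_upload <- function(data, outcome, treatment, matching) {
  problems <- character(0)
  vars <- c(outcome, treatment, matching)

  # Every selected variable should exist in the data.
  missing_vars <- setdiff(vars, names(data))
  if (length(missing_vars) > 0) {
    problems <- c(problems,
                  paste("Variable(s) not found:", paste(missing_vars, collapse = ", ")))
  }

  # Every selected variable should be numeric.
  present <- intersect(vars, names(data))
  non_numeric <- present[!vapply(data[present], is.numeric, logical(1))]
  if (length(non_numeric) > 0) {
    problems <- c(problems,
                  paste("Non-numeric variable(s):", paste(non_numeric, collapse = ", ")))
  }

  # The treatment needs at least two distinct observed values.
  if (treatment %in% present && length(unique(na.omit(data[[treatment]]))) < 2) {
    problems <- c(problems, "Treatment variable has fewer than two distinct values")
  }

  problems
}

toy <- data.frame(
  anxiety = c(2.1, NA, 3.4, 1.8),
  reading = c(1, 0, 1, 0),
  gender  = c("m", "f", "f", "m")
)
check_upload(toy, outcome = "anxiety", treatment = "reading", matching = "gender")
```

An empty result means none of these particular checks failed; here the text-coded ‘gender’ column is reported as non-numeric.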

Approach

After uploading the data and validating it you can then proceed to the ‘Approach’ page which is where you select your counterfactual analysis and missingness approaches. Note that on this page you will only be shown options that are applicable to the data you uploaded/selected. For example, if your treatment variable is binary you will only be shown counterfactual analysis options for binary treatments. Different options are shown for ordinal treatment variables. Similarly, you will only be shown non-response weighting as a missingness option if your variable selection included a non-response weight variable. Some approaches have more options than others. For example, for propensity score analysis for binary treatments it is relevant to choose a matching algorithm and matching ratio whereas for non-bipartite propensity matching (for ordinal treatments) these parameters are fixed (always optimal matching with a 1:1 ratio).
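To give a feel for what a matching algorithm and a matching ratio mean in propensity score analysis for a binary treatment, here is a minimal base R sketch using simulated data. It uses greedy 1:1 nearest-neighbour matching without replacement, chosen purely for illustration; it is not DigiCAT's implementation, and the variable names are hypothetical.

```r
set.seed(42)
n <- 200

# Simulated data: one confounder that makes treatment more likely.
confounder <- rnorm(n)
treatment  <- rbinom(n, 1, plogis(0.8 * confounder))
dat <- data.frame(confounder, treatment)

# Step 1: propensity model, i.e. the probability of treatment given the confounder.
ps_model <- glm(treatment ~ confounder, data = dat, family = binomial())
dat$ps <- fitted(ps_model)

# Step 2: greedy 1:1 nearest-neighbour matching without replacement.
treated  <- which(dat$treatment == 1)
controls <- which(dat$treatment == 0)
pairs <- data.frame(treated = integer(0), control = integer(0))
for (i in treated) {
  if (length(controls) == 0) break
  # Pick the unused control whose propensity score is closest.
  j <- controls[which.min(abs(dat$ps[controls] - dat$ps[i]))]
  pairs <- rbind(pairs, data.frame(treated = i, control = j))
  controls <- setdiff(controls, j)
}

nrow(pairs)  # number of matched pairs
```

A 1:2 ratio would instead pick two controls per treated unit, and an optimal algorithm would choose pairs to minimise the total distance across all pairs rather than one pair at a time.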

Balancing

After choosing a balancing approach and running the ‘balancing model’ some interim output will be shown that can be used to assess the quality of the balancing. Details of this output for specific methods are provided below.
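One widely used summary of balancing quality is the standardised mean difference (SMD) of each covariate between treatment groups, compared before and after balancing. The sketch below computes it in base R on simulated data, using inverse probability of treatment weights as the balancing method. The smd() helper is hypothetical and written for this example; the specific diagnostics DigiCAT reports may differ.

```r
set.seed(7)
n <- 2000
confounder <- rnorm(n)
treatment  <- rbinom(n, 1, plogis(confounder))

# Standardised mean difference, optionally weighted (hypothetical helper).
smd <- function(x, treat, w = rep(1, length(x))) {
  m1 <- weighted.mean(x[treat == 1], w[treat == 1])
  m0 <- weighted.mean(x[treat == 0], w[treat == 0])
  # Standardise by the pooled unweighted SD so both values share one scale.
  pooled_sd <- sqrt((var(x[treat == 1]) + var(x[treat == 0])) / 2)
  (m1 - m0) / pooled_sd
}

# Inverse probability of treatment weights from a logistic propensity model.
ps <- fitted(glm(treatment ~ confounder, family = binomial()))
w  <- ifelse(treatment == 1, 1 / ps, 1 / (1 - ps))

smd_before <- smd(confounder, treatment)
smd_after  <- smd(confounder, treatment, w)
c(before = smd_before, after = smd_after)
```

A common rule of thumb treats an absolute SMD below about 0.1 as acceptable balance, although this threshold is a convention rather than a strict test.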

Outcome

The outcome model stage uses the output from the balancing stage and fits a model to estimate the treatment effect. After selecting and running a model, the output tab provides the key information needed to see if the candidate treatment variable appears to have an impact on the outcome: an effect estimate, standard error, and p-value. For some methods marginal effects are also provided (see ‘marginal effects’ for more details). At this point you can also download the R script for your analysis.
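As a rough illustration of how a balancing stage feeds into an outcome model, the base R sketch below simulates data with a known treatment effect of 0.5, builds inverse probability of treatment weights, and fits a weighted regression. It is a simplified sketch under these assumptions, not the script DigiCAT generates, and all variable names are hypothetical.

```r
set.seed(123)
n <- 1000
confounder <- rnorm(n)
treatment  <- rbinom(n, 1, plogis(confounder))

# Simulated outcome with a true treatment effect of 0.5.
outcome <- 0.5 * treatment + 1.0 * confounder + rnorm(n)

# Balancing stage: inverse probability of treatment weights.
ps <- fitted(glm(treatment ~ confounder, family = binomial()))
w  <- ifelse(treatment == 1, 1 / ps, 1 / (1 - ps))

# Outcome stage: weighted regression of the outcome on treatment.
fit <- lm(outcome ~ treatment, weights = w)

# Effect estimate, standard error and p-value for the treatment term.
# Note: these model-based standard errors ignore that the weights were
# estimated; robust standard errors are usually preferred in practice.
result <- coef(summary(fit))["treatment", c("Estimate", "Std. Error", "Pr(>|t|)")]
result

# For comparison: the unadjusted estimate, which is inflated by confounding.
naive <- coef(lm(outcome ~ treatment))["treatment"]
naive
```

In this simulation the weighted estimate should land close to the true value of 0.5, while the unadjusted estimate is noticeably larger because the confounder raises both treatment uptake and the outcome.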

Example Dataset

DigiCAT includes an example synthetic dataset which is based on the example here: TODO LINK which examined the effects of reading engagement on mental health. This can be used to help get to grips with counterfactual analysis, for teaching, or just to explore the features of DigiCAT. It includes a set of pre-selected matching, treatment and outcome variables. The treatment variable is reading engagement at age 15 and is offered as both a binary (‘Reading_age15’) and an ordinal (‘ReadingO_age15’) variable to allow different types of analyses to be explored. The outcome variable (‘Anxiety_age17’) is a continuous measure of anxiety at age 17. The matching variables are a set of pre-treatment variables (measured at age 13 or before) that could confound the treatment-outcome association and should, therefore, be matched on. All categorical variables are pre-selected as such. The dataset also includes some missing data to allow exploration of the different missing data options. In the future we also plan to include datasets with complex survey features and other relevant features, so that all tool features can be explored with example data as they are added.

You can use the example dataset by selecting ‘Load example data’ in the Upload data page of DigiCAT. The ‘Data’ tab on the right-hand panel will then show you the data values, allowing you to inspect the dataset and get a feel for it. The data dictionary can be seen in Table 1 below. As per data upload requirements the file is in .csv format and codes missing values as ‘NA’.

Table 1:

Data Dictionary for DigiCAT example data

| Variable label | Description | Type | Values |
| --- | --- | --- | --- |
| Gender | Participant gender | Nominal | 1 = male, 2 = female* |
| Anxiety_age17 | Anxiety levels at age 17 as the average of a set of 10 anxiety items measured on a 4-point scale | Continuous | 1 = lowest to 4 = highest |
| Reading_age15 | A single item indicating whether a participant reads more or less than once a week at age 15 | Binary | 0 = no, 1 = yes |
| ReadingO_age15 | A single item indicating the extent to which participants read at age 15 | Ordinal | 1 = least to 4 = most |
| AnxietyDepression_age13 | A combined measure of pre-treatment anxiety and depression levels at age 13. The average of 20 items on a 4-point scale | Continuous | 1 = lowest to 4 = highest |
| Trust_age13 | A measure of pre-treatment generalised trust (a wellbeing measure). The average of 5 items on a 4-point scale | Continuous | 1 = lowest to 4 = highest |
| SelfControl_age13 | A measure of pre-treatment self-control. The average of 5 items on a 4-point scale | Continuous | 1 = lowest to 4 = highest |
| SubstanceUse1_age13 | | Binary | 0 = no, 1 = yes |
| SubstanceUse2_age13 | A measure of pre-treatment cannabis use (whether the participant uses cannabis or not) | Binary | 0 = no, 1 = yes |
| SubstanceUse3_age13 | A measure of pre-treatment alcohol use (whether the participant uses alcohol or not) | Binary | 0 = no, 1 = yes |
| SubstanceUse4_age13 | A measure of pre-treatment hard drug use (whether the participant uses hard drugs or not) | Binary | 0 = no, 1 = yes |
| TeacherBond_age13 | A measure of pre-treatment positive bond with teachers. The average of 4 items measured on a 4-point scale | Continuous | 1 = lowest to 4 = highest |
| ClassBond_age13 | A measure of pre-treatment positive bond with classmates. The average of 4 items measured on a 4-point scale | Continuous | 1 = lowest to 4 = highest |
| SchoolDifficulties_age13 | A measure of pre-treatment school difficulties. The average of 4 items measured on a 4-point scale | Continuous | 1 = lowest to 4 = highest |
| Achievements1_age13 | A measure of pre-treatment academic achievement in standardised tests of Science | Ordinal | 1 = lowest to 5 = highest |
| Achievements2_age13 | A measure of pre-treatment academic achievement in standardised tests of Maths | Ordinal | 1 = lowest to 5 = highest |
| Achievements3_age13 | A measure of pre-treatment academic achievement in standardised tests of English | Ordinal | 1 = lowest to 5 = highest |
| K5_nSBQ_REAGGR | A measure of pre-treatment (age 13) reactive aggression. The average of 10 items with a 4-point response scale | Continuous | 1 = lowest to 4 = highest |
| K5_nSBQ_PHYSAGGR | A measure of pre-treatment (age 13) physical aggression. The average of 10 items with a 4-point response scale | Continuous | 1 = lowest to 4 = highest |
| K5_nSBQ_PROAGGR | A measure of pre-treatment (age 13) proactive aggression. The average of 10 items with a 4-point response scale | Continuous | 1 = lowest to 4 = highest |
| K5_nSBQ_INDAGGR | A measure of pre-treatment (age 13) indirect aggression. The average of 10 items with a 4-point response scale | Continuous | 1 = lowest to 4 = highest |
| K5_nSBQ_PROSO | A measure of pre-treatment (age 13) prosociality. The average of 10 items with a 4-point response scale | Continuous | 1 = lowest to 4 = highest |
| K5_DevVar19 | A measure of pre-treatment (age 13) deviant behaviour. A variety score combining 5 items on a 5-point scale (0-4) | Continuous | 1 = lowest to 4 = highest |
| K5_involv | A measure of pre-treatment (age 13) parental involvement. The average of 10 items with a 4-point response scale | Continuous | 1 = lowest to 4 = highest |
| K5_ResilAdult | A measure of pre-treatment (age 13) social support from adults. The average of 10 items with a 4-point response scale | Continuous | 1 = lowest to 4 = highest |
| K5_ResilFriends | A measure of pre-treatment (age 13) social support from friends. The average of 10 items with a 4-point response scale | Continuous | 1 = lowest to 4 = highest |
| K5_BullVict4 | A measure of pre-treatment (age 13) bullying victimisation. The average of 10 items with a 5-point response scale | Continuous | 1 = lowest to 4 = highest |

*Note: this does not imply that we consider gender to be binary; no genders other than male or female happened to be reported in this sample.

Counterfactual Analyses informed by lived experience experts

Why should the research use Lived Experience expertise?

Drawing on lived experience (LE) alongside this tool can be a useful reminder of the subjective nature of mental health. It is also a direct way to strengthen your research. Increasingly, co-production and PPI (patient and public involvement) are required for funding. This should be taken seriously, rather than treated as an add-on or in a tokenistic way.

The inclusion of lived experience voices and feedback can also help with the practical application of the research: how does it relate to real experiences? This can help you to home in on practically relevant research questions and support the clinical validity of your interpretation of research findings. If applicable, it can also help with assessing how effective the findings are likely to be when applied in the real world.

The effects are not just one sided, as there can be positive impacts for lived experience collaborators. This can include empowerment, purpose, acquisition of new skills and knowledge, and having a role in changing and improving mental health research and interventions.

How should the research use Lived Experience expertise?

There is no single definition of lived experience, and so it can come in many forms. It might include those who have previously experienced, or continue to experience, mental health difficulties, but it can also include family members of those who have experienced them.

Those with lived experience may already be working in the academy, so it is also fruitful to engage with colleagues (across disciplines) who may bring in a unique lived experience perspective.

Lived Experience experts can help to create or refine research questions. This can include the identification of relevant factors to explore. For example, in exploring active ingredients of mental health, a young people’s advisory group was asked to rank candidate factors by importance, explain their reasons, and add other possible influences. This exercise demonstrated that ‘social media’ was a more important factor than initially expected by the researchers. Young people also highlighted the importance of ‘entertainment’ in general as beneficial for well-being. They compared reading books to watching movies and noted that: “Umm, the thing between movies and books is books are generally better for you because it exposes you to better vocabulary and staring at a screen for too long isn’t the best thing to do. So, it is just that entertaining thing that can really benefit you.” Answers like these may help researchers refine their focus on the important ingredients for young people’s mental health.

The inclusion of those with lived experience can help researchers to understand different perspectives and more fully understand key concepts. For example, further discussion in our advisory group emphasised that ‘social media’ is not homogeneous but takes multiple forms, and therefore can have different impacts. The presence of advertising, for example, and how media is circulated via sites and apps may have a significant negative influence on mental health. On the other hand, using it to keep in touch with friends, particularly when no other options are available due to distance or illness, may have a positive impact on mental health. As one young person pointed out, relying on social media to keep in touch with friends “during COVID helped keep everybody sane”. However, they added that since then “We have not recovered from that.” Referring to social media use post-COVID they added: “It has worsened. People meeting up has gone down a lot or people are less willing to meet up because they’re, like, we could just text.” They pointed out that not engaging in activities outside of social media may have a negative impact on young people’s wellbeing, further highlighting the complexities of ‘social media’ use and its impact. Similarly, when the potential impact of reading on young people’s mental health was discussed, one young person added nuance by highlighting that: “some books can be more influential than others to different people, so it doesn’t matter how much you read. It matters what you read.” These examples suggest that exploring use of social media or reading would have been misleading without understanding why, how and when the young people thought these were important.

Lived experience experts can also assist in the interpretation of data. As researchers (and individuals) we come to what we do with our own set of biases and expectations, so it is therefore a good idea to run our interpretations by third parties with lived experience in the area we are wishing to inform.

Such consultation can also assist in the presentation of data and the co-production of research findings. In addition to publishing in scientific journals, it is important to present the findings in ways that are easy to understand and use for those for whom they are most practically relevant; that is, service users and, more generally, those with lived experience. Such communications are best produced in collaboration with them.

Lived experience experts should also be consulted on the dissemination and use of data and findings. Collaboration with lived experience experts increases credibility and practical reliability and applications of research findings.

When they were asked whether it is important to engage youths with lived experience in research, our young experts told us that it is crucial to engage them in all stages of it. One young person stated: “Adults were kids once as well, but I think, feel that since social media is involved that it’s changed a lot. So, they still had the same problems, but now there are also different things and those things could be changed to make it better.” They suggested that this could be done via online advisory groups, dedicated apps that would gather young people’s input or surveys.

Counterfactual Analysis and FAIR principles

How can I make my research open?

Open research refers to the practice of making research methods and materials as open as possible. The aim of open research is to maximize the public benefit of research, based on the principle that the more widely information is shared, the more benefit it provides. While the application of open research practices can vary between research subjects, it is relevant to researchers in every discipline. Among the eight pillars of open science, as defined by the European Commission, are FAIR Data, Research Integrity and Citizen Science, which are described below, along with some guidance on how these foundations can be implemented to promote open research when using DigiCAT.

FAIR Data

In 2016, Wilkinson et al. published a set of guiding principles aimed at promoting the reuse of digital assets, such as research data and software: https://www.nature.com/articles/sdata201618. The four guiding FAIR principles (Findability, Accessibility, Interoperability and Reusability) describe how research outputs should be organized and shared to maximize their use.

  • Findable: Making research outputs discoverable is critical for their reuse. Findability can be promoted by providing metadata (information on research outputs, such as associated keywords).
  • Accessible: After the material has been found, other researchers need to know how they can access it. In some cases, data and software can be openly available, but authentication may also be required.
  • Interoperable: Reuse of materials is supported by maximizing the capacity of data and software to be integrated with other materials, with as little loss of information and as little effort as possible.
  • Reusable: Ensuring data and software are well described will enable their reuse. Publishing accompanying documentation and metadata, such as a README file, will better enable material to be extended by other researchers.

If you would like to know more about applying the FAIR principles to your study, an in-depth guide has been provided by GO FAIR.

Research Integrity

Research should comply with standards of honesty, respect, transparency, and accountability. Examples of open science practices that promote research integrity are:

Preregistration: We recommend registering your research plan before using DigiCAT to conduct your own counterfactual analysis. This involves outlining your hypotheses and analysis plan prior to data collection and analysis. Registering your research plan demonstrates that your work is credible and has lasting reproducibility. OSF (Open Science Framework) provides a free and open platform to register your research plan, which offers a variety of templates to accommodate your research question and materials.

Open materials: Sharing the materials used to conduct your research increases its transparency and reproducibility. An example of how materials can be made open when using DigiCAT is the publication of the DigiCAT code used to carry out your counterfactual analysis (from the “Get R Code” tool feature) on open platforms, such as GitHub.

Open access: We recommend you choose an open-access journal when publishing results obtained using DigiCAT. This ensures your work can be accessed by anyone, with no restrictions. Open access promotes the impact, reproducibility and accessibility of your work.

Citizen Science

Citizen science describes any type of public involvement in scientific research where researchers and communities work together to address knowledge gaps. There are many benefits to conducting citizen science, including the more meaningful application of research to real-world issues and increased public interest in and awareness of research. If you would like to know more about incorporating this in your own research, Wellcome have outlined a step-by-step guide to public engagement for researchers.