Assessing the Risk of Bias in non-randomised studies of interventions
So risk of bias is grippingly exciting and we’ve got the statistics to prove it. The paper describing the ROBINS-I (Risk Of Bias In Non-randomised Studies of Interventions) tool has been tweeted 363 times and cited 5 times (putting it in the top 5% of all research outputs scored by Altmetric). And on 17th October 2016 we at the MESS had a great presentation (slides below) from Julian Sterne, Julian Higgins and Barney Reeves, three of its key developers(1).
Why do we need a risk-of-bias tool for non-randomised studies? Surely, when we think about interventions, we should only consider getting our evidence from RCTs?
Well, it is true that RCTs provide us with the best platform for comparing interventions and allow us to remove confounding influences on the treatment effect, which is why they are considered the gold standard, or best evidence, for the effects of interventions(2). But some interventions cannot be tested with an RCT, and non-randomised studies are often the only studies available to provide evidence on long-term outcomes. They are the place where tweaks and small changes to interventions can be tested, and where the delivery of interventions in real-world settings, or to broad-spectrum populations, can also be tried out. Often they are the only place where harms and unintended negative effects of interventions can be found. So, despite RCTs being the ‘gold standard’, non-randomised studies are often the working mines where some real evidence “ore” lies.
The work on ROBINS-I was funded by the Cochrane Collaboration Methods Innovation Fund and the Medical Research Council. When starting their work the ROBINS-I team asked Cochrane systematic review teams whether they did, in fact, include non-randomised studies in their systematic reviews. An aside – Cochrane systematic reviews are typically reviews of interventions and usually include only RCTs, so asking them this was a bit like asking traffic police if they ever went above the speed limit, or vegetarians if they occasionally ate a bacon sandwich. They found, unsurprisingly, for all the reasons stated above, that Cochrane review teams did indeed include non-randomised studies. A quick look at Matt Page’s cross-sectional study of the current state of systematic reviews, as reported a few months ago in the MESS, also found that 9% of Cochrane reviews and 25% of systematic reviews published in 2014 contain non-randomised studies(3). So a means of clearly and consistently assessing the bias of non-randomised studies of interventions for use in systematic reviews was needed.
Why would we be excited by the ROBINS-I? Why not just use one of the existing tools?
Before the ROBINS-I tool was developed, and this paper was published, reviewers had access to a huge number (193) of other scales and checklists for assessing the ‘methodological quality’ of non-randomised studies, as documented by Deeks et al in an evaluation of non-randomised intervention studies(4). From these, six emerged as useful for systematic reviews(4). While these covered the core and important domains identified by Deeks et al (such as creation of the intervention group, comparability of groups at analysis, and blinding of participants and investigators), they also included aspects related to the reporting of the study and to generalisability (external validity), which are not related to how biased a treatment effect might be – and so are not that useful to the systematic reviewer. And in reality only two scales were actually used in systematic reviews with any regularity: the Newcastle-Ottawa Scale and the Downs and Black scale. With ROBINS-I the authors (many of whom were part of the Deeks et al research project) tell us we now have a comprehensive, domain-based tool to assess bias in non-randomised studies of interventions that focuses solely on domains related to bias, not on external validity or reporting. In addition the tool comes with detailed instructions, a manual, for how to complete it, which was often lacking from the pre-existing tools(1).
What makes this tool so different?
The main thing that sets ROBINS-I apart from the other tools is that it sets up the premise of an ‘imaginary’ RCT: the RCT that would replace the non-randomised study you are assessing. This RCT need not be ethical or feasible; for example, you could randomise people to receive care on an intensive care ward, or to live in certain parts of a city, e.g. those with more cycling infrastructure. In the context of the risk-of-bias tool this is entirely acceptable, whereas in real life it might not be possible to do this (which is probably why there are no RCTs). It is against this hypothetical ‘target’ RCT that you assess the risk of bias of your non-randomised study. So bias is defined as the difference between the non-randomised study you are assessing and the target RCT. The other main difference is that the tool asks you to set out what the potential confounders are, and which of those were measured in the study. This is particularly important because, while other tools might simply ask whether confounders have been measured and adjusted for, this tool asks you to think about which specifically have been measured and which were actually appropriate to adjust for; a study cannot get a low risk of bias for adjusting for inappropriate confounders in a ‘tickbox’-type exercise. It also asks you to consider whether there are any variables that should not have been adjusted for and might interfere with the analysis. Assessment of confounding is particularly important in non-randomised studies because we would expect people to be offered the intervention depending on their prognosis or prognostic variables; e.g. people who are more poorly might be less likely to get a certain intervention than those who are less poorly.
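The confounding check described above is, at heart, a comparison between the confounders you judge important a priori and those the study actually handled. A minimal sketch of that bookkeeping (not part of the official ROBINS-I tool; all variable names are hypothetical examples) might look like this:

```python
# Hedged sketch of the confounding assessment described above, NOT part
# of the official ROBINS-I tool. Compare the confounders you pre-specify
# as important against what the study measured and adjusted for.
# All variable names below are hypothetical examples.

prespecified = {"age", "sex", "baseline severity", "comorbidity"}

# Taken (hypothetically) from the study report:
measured_and_adjusted = {"age", "sex"}

# Variables adjusted for that arguably should not have been,
# e.g. a variable on the causal pathway from intervention to outcome:
questionable_adjustments = {"post-baseline adherence"}

# Pre-specified confounders the study left uncontrolled:
uncontrolled = prespecified - measured_and_adjusted

print(sorted(uncontrolled))              # ['baseline severity', 'comorbidity']
print(sorted(questionable_adjustments))  # ['post-baseline adherence']
```

The point of making the lists explicit is that the reviewer, not the study authors, decides which confounders matter, and an adjustment only counts if it was appropriate.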
What domains of bias are assessed and how are they operationalised?
The authors identified seven domains of bias. Before the intervention starts, or at the time of intervention: bias due to confounding, bias in selection of participants, and bias in classification of interventions. Post-intervention: bias due to deviations from intended interventions, bias due to missing data, bias in measurement of outcomes, and bias in selection of the reported result. Only the latter four domains, which occur post-intervention, are substantially similar to, or overlap with, those in the risk-of-bias assessment for RCTs, as discussed in a previous MESS(5). Each signalling question is answered with ‘Yes’, ‘Probably yes’, ‘No’, ‘Probably no’ or ‘No information’, and from these, guidance follows on whether the domain is at ‘Low risk’, ‘Moderate risk’, ‘Serious risk’ or ‘Critical risk’ of bias. A study with a numerical outcome judged to be at ‘Low risk’ of bias would be considered to be at a similar risk of bias to that in a ‘high quality’ RCT. The paper (open access) sets out clearly how to use ROBINS-I, with specific instructions for reviewers to follow that I won’t try to reproduce here(1). But it is worth saying that the risk-of-bias tool is applied per numerical outcome result, not per study, allowing a reviewer to make nuanced risk-of-bias judgements for studies that have both objective outcomes, which may be at low risk of bias, and subjective outcomes, which may be at higher risk of bias. In their presentation (slides available here) the authors provide illustrative examples for each of the domains, which is enormously helpful.
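To make the roll-up concrete: the tool's guidance is that the overall judgement for an outcome result should be at least as severe as the most severe domain-level judgement. A minimal sketch of that logic, assuming hypothetical domain ratings for one outcome result (this is an illustration, not the official scoring software):

```python
# Hedged sketch of how the seven domain-level ROBINS-I judgements roll up
# into an overall judgement for one numerical outcome result. Per the
# tool's guidance, the overall risk of bias is at least as severe as the
# most severe domain-level judgement. The example ratings are hypothetical.

SEVERITY = ["Low risk", "Moderate risk", "Serious risk", "Critical risk"]

DOMAINS = [
    "Bias due to confounding",
    "Bias in selection of participants",
    "Bias in classification of interventions",
    "Bias due to deviations from intended interventions",
    "Bias due to missing data",
    "Bias in measurement of outcomes",
    "Bias in selection of the reported result",
]

def overall_judgement(domain_ratings: dict) -> str:
    """Return the most severe rating across all seven domains."""
    missing = set(DOMAINS) - set(domain_ratings)
    if missing:
        raise ValueError(f"No judgement for: {sorted(missing)}")
    return max(domain_ratings.values(), key=SEVERITY.index)

# Hypothetical assessment of one outcome result:
ratings = {d: "Low risk" for d in DOMAINS}
ratings["Bias due to confounding"] = "Serious risk"
ratings["Bias due to missing data"] = "Moderate risk"
print(overall_judgement(ratings))  # -> Serious risk
```

Because the judgement is made per outcome result, a reviewer would build a separate `ratings` mapping for each numerical result in the study.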
How was the tool developed?
The tool was developed over three years by experts discussing and arguing the best way forward until a consensus was reached on which domains of bias to assess. To help reviewers judge bias for each domain, signalling questions were drafted, and the wording of each of these was discussed and refined over several rounds of piloting, face-to-face meetings and workshops. This paper is the culmination of a huge research collaboration by 35 authors from six countries (UK, USA, Canada, France, Denmark and Australia), as well as numerous reviewers acting as pilot testers(1). The use of the tool will mean that evidence from non-randomised studies can be assessed with more clarity and transparency, and a detailed description of bias can be provided.
The authors stated that they next plan to examine how well the tool works for specific study designs such as self-controlled designs, controlled before-and-after studies, interrupted time series, and studies based on regression discontinuity and instrumental variable analyses. They also plan to develop interactive software for using ROBINS-I and guidance for its use in specific healthcare areas, for example public health. The team are keen to generate a repository of research data for ROBINS-I from people who are using it for systematic reviews. In this way meta-research on ROBINS-I can be carried out in the future to improve it, and it could even facilitate the automated incorporation of risk-of-bias assessments alongside the original papers in databases and other repositories.
1. Sterne JA, Hernan MA, Reeves BC, Savovic J, Berkman ND, Viswanathan M, et al. ROBINS-I: a tool for assessing risk of bias in non-randomised studies of interventions. BMJ (Clinical research ed). 2016;355:i4919.
2. Howick J, Phillips B, Ball C, Sackett D, Badenoch D, Straus S, et al. Oxford Centre for Evidence-based Medicine – Levels of Evidence (March 2009). http://www.cebm.net/oxford-centre-evidence-based-medicine-levels-evidence-march-2009/. 2009.
3. Page MJ, Shamseer L, Altman DG, Tetzlaff J, Sampson M, Tricco AC, et al. Epidemiology and Reporting Characteristics of Systematic Reviews of Biomedical Research: A Cross-Sectional Study. PLoS Med. 2016;13(5):e1002028.
4. Deeks JJ, Dinnes J, D’Amico R, Sowden AJ, Sakarovitch C, Song F, et al. Evaluating non-randomised intervention studies. Health technology assessment (Winchester, England). 2003;7(27):iii-x, 1-173.
5. Savović J, Weeks L, Sterne JA, Turner L, Altman DG, Moher D, et al. Evaluation of the Cochrane Collaboration’s tool for assessing the risk of bias in randomized trials: focus groups, online survey, proposed recommendations and their implementation. Systematic Reviews. 2014;3(1):37.