Ahead of print
Using Grading of Recommendations Assessment, Development, and Evaluation (GRADE) to rate the certainty of evidence of study outcomes from systematic reviews: A quick tutorial
Shih-Chieh Shao¹, Liang-Tseng Kuo², Yen-Ta Huang³, Pei-Chun Lai⁴, Ching-Chi Chi⁵
1 Department of Pharmacy, Chang Gung Memorial Hospital, Keelung, Taiwan
2 School of Medicine, College of Medicine, Chang Gung University, Taoyuan; Division of Sports Medicine, Department of Orthopedic Surgery, Chang Gung Memorial Hospital, Chiayi, Taiwan
3 Department of Surgery, National Cheng Kung University Hospital, College of Medicine, National Cheng Kung University, Tainan, Taiwan
4 Education Center, National Cheng Kung University Hospital, College of Medicine, National Cheng Kung University, Tainan, Taiwan
5 School of Medicine, College of Medicine, Chang Gung University; Department of Dermatology, Chang Gung Memorial Hospital, Linkou, Taoyuan, Taiwan
Date of Submission: 24-Sep-2022
Date of Decision: 08-Dec-2022
Date of Acceptance: 11-Dec-2022
Date of Web Publication: 20-Jan-2023
The Grading of Recommendations Assessment, Development, and Evaluation (GRADE) framework offers a structured approach to assess the certainty of evidence (CoE) in systematic reviews (SRs). The CoE for each outcome falls into one of the four categories: very low, low, moderate, or high. The judgment of CoE is based on five downgrading factors (including the risk of bias, indirectness, inconsistency, imprecision, and publication bias) and three upgrading factors (including large effect size, dose-response relationship, and opposing plausible residual bias and confounding). To improve the transparency of SRs, authors should indicate how they grade the CoE for each outcome and provide a rationale for downgrading or upgrading the CoE.
Keywords: GRADE approach, meta-analysis, systematic review
How to cite this URL:
Shao SC, Kuo LT, Huang YT, Lai PC, Chi CC. Using Grading of Recommendations Assessment, Development, and Evaluation (GRADE) to rate the certainty of evidence of study outcomes from systematic reviews: A quick tutorial. Dermatol Sin [Epub ahead of print] [cited 2023 Feb 4]. Available from: https://www.dermsinica.org/preprintarticle.asp?id=368303
Introduction
The Grading of Recommendations Assessment, Development, and Evaluation (GRADE) framework offers a transparent system for rating not only the certainty of evidence (CoE) in systematic reviews (SRs) and guidelines but also the strength of recommendations in guidelines. Compared with earlier, simpler critical appraisal tools (e.g., the critical appraisal worksheets from the Oxford Centre for Evidence-Based Medicine or the Critical Appraisal Skills Programme checklists), the GRADE framework has three main distinguishing features: (1) it separates the CoE from the strength of recommendations; (2) the CoE is assessed for each outcome; and (3) observational studies can be upgraded if they meet certain criteria. The major advantage of GRADE is its structured approach to developing evidence summaries, which give clinicians, patients, and policymakers a comprehensive basis for formulating recommendations in clinical practice. More than 100 international organizations currently endorse the GRADE framework for judging the CoE in SRs and clinical guidelines.
Under the GRADE approach, the CoE from randomized controlled trials (RCTs) starts as high, and that from observational studies (e.g., cohort, case-control, before-after, and time series studies) starts as low, unless the Risk Of Bias In Non-randomized Studies of Interventions (ROBINS-I) tool is applied (see details in the next section). The CoE for each outcome falls into one of four categories (very low, low, moderate, or high) based on five downgrading factors (risk of bias [RoB], indirectness, inconsistency, imprecision, and publication bias) and three upgrading factors (large effect size, dose-response relationship, and opposing plausible residual bias and confounding) [Table 1]. The upgrading factors are usually applied to evidence from observational studies and nonrandomized experimental studies (e.g., quasi-randomized and non-randomized controlled trials), because these features can warrant more confidence in an effect estimate than the study design alone would suggest [Table 1]. Of note, the CoE should be upgraded only with caution if there are serious concerns about any of the five downgrading factors. Guideline developers, rather than SR authors, are primarily responsible for final decisions about the strength of recommendations (e.g., strong or weak), based on the CoE and the balance between the desirable and undesirable consequences of alternative management options.
Table 1: Factors determining the certainty of evidence of study outcomes from systematic reviews and meta-analysis
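The rating logic described above can be pictured as simple level arithmetic: a starting level set by study design, minus downgrades, plus upgrades, clamped to the four categories. The sketch below is a hypothetical bookkeeping aid, not an official GRADE tool; the names `Certainty`, `starting_certainty`, and `rate_certainty` are our own. It deliberately leaves the judgments themselves (how many levels each factor warrants, and whether upgrading is appropriate given concerns about the downgrading factors) to the review authors.

```python
from enum import IntEnum
from typing import Mapping, Optional


class Certainty(IntEnum):
    """The four GRADE certainty categories, ordered from lowest to highest."""
    VERY_LOW = 1
    LOW = 2
    MODERATE = 3
    HIGH = 4


def starting_certainty(randomized: bool, used_robins_i: bool = False) -> Certainty:
    """RCTs start as high; nonrandomized studies start as low unless ROBINS-I was used."""
    return Certainty.HIGH if randomized or used_robins_i else Certainty.LOW


def rate_certainty(
    start: Certainty,
    downgrades: Mapping[str, int],
    upgrades: Optional[Mapping[str, int]] = None,
) -> Certainty:
    """Subtract downgrade levels, add upgrade levels, and clamp to very low..high."""
    upgrades = upgrades or {}
    if any(v < 0 for v in list(downgrades.values()) + list(upgrades.values())):
        raise ValueError("levels must be non-negative")
    net = int(start) - sum(downgrades.values()) + sum(upgrades.values())
    # GRADE has no category below very low or above high.
    clamped = min(max(net, int(Certainty.VERY_LOW)), int(Certainty.HIGH))
    return Certainty(clamped)


# Judgments reported for the fish oil example in this tutorial.
fish_oil = rate_certainty(
    starting_certainty(randomized=True),
    downgrades={
        "risk of bias": 2,
        "indirectness": 0,
        "inconsistency": 1,
        "imprecision": 2,
        "publication bias": 0,
    },
)
print(fish_oil.name)  # VERY_LOW
```

Tallying the downgrades made for the fish oil example in the sections below (2 levels for RoB, 1 for inconsistency, and 2 for imprecision) subtracts 5 levels from high, so the rating bottoms out at very low.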
In this quick tutorial on the GRADE approach, we demonstrate how to assess the CoE using a published SR, “Effects of fish oil supplement on psoriasis: a meta-analysis of RCTs,” which reported, based on 3 RCTs involving 337 participants, that fish oil supplementation did not significantly reduce psoriasis severity.
Five Factors to Downgrade the Certainty of Evidence
Risk of bias (downgrade by 1 to 3 levels)
Review authors can use the Cochrane RoB tool (version 1 or 2) to evaluate the included RCTs, and the Newcastle-Ottawa Scale or the ROBINS-I tool to evaluate the included observational studies. If review authors use the ROBINS-I tool to assess the RoB in nonrandomized studies, however, the CoE for each outcome starts as high. In general, the CoE is not downgraded when the included studies are at low RoB according to the assessment tool, that is, when most or all major domains are judged acceptable and any limitations are not serious. When one or more serious biases substantially reduce confidence in a point estimate, review authors may downgrade the CoE by 1 or 2 levels to indicate that the evidence permits only limited inferences about the magnitude of a treatment effect. If review authors identify more than one very serious RoB in the included nonrandomized studies using ROBINS-I, the CoE for the outcome can be downgraded by 3 levels.
About 25% of the domains of the Cochrane RoB 1.0 tool were judged as unclear across the three RCTs, and reporting bias was present in the RCT by Søyland et al. (1993). We therefore downgraded the CoE by 2 levels because of very serious concerns about the quality of the included RCTs.
Indirectness (downgrade by 1 to 2 levels)
The CoE may be downgraded when there are substantial differences in population, intervention, or outcome between the review question and the studies included in the SR. In addition, if the evidence comes from indirect comparisons, review authors should downgrade the CoE for indirectness.
The review question was to determine the effects of fish oil supplementation on the severity of psoriasis. The patient/problem, intervention, comparison, and outcome (PICO) elements of the included RCTs were similar to those of the review question, and the study outcome, the Psoriasis Area and Severity Index (PASI) score, is the gold standard for assessing psoriasis severity. We therefore did not downgrade the CoE on account of indirectness.
Inconsistency (downgrade by 1 to 2 levels)
Review authors should evaluate the extent of statistical heterogeneity based on the similarity of point estimates, the extent of overlap of confidence intervals (CIs), and the statistical test for heterogeneity together with the I² value (<40%: low; 30%–60%: may be moderate; 50%–90%: may be substantial; 75%–100%: considerable). Even when the variability might be explained by chance (e.g., P > 0.1 in the test for heterogeneity and a low I² value), GRADE still recommends that review authors formally test whether a priori hypotheses explain the inconsistency between important subgroups.
The I² value for the effect of fish oil supplements on psoriasis severity across the 3 RCTs was 53%, and no a priori hypothesis was explored to explain the heterogeneity. We therefore downgraded the CoE by 1 level because of inconsistency among the included RCTs.
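To show where an I² value such as the 53% above comes from, the sketch below computes Cochran's Q and I² from study-level estimates and standard errors using fixed-effect inverse-variance weights (I² = (Q − df)/Q, truncated at 0), and then lists which of the overlapping interpretation bands quoted above a value falls into. The function names and example inputs are hypothetical; the per-trial data from the fish oil meta-analysis are not reproduced here.

```python
from typing import List, Sequence, Tuple


def cochran_q_and_i2(effects: Sequence[float], std_errors: Sequence[float]) -> Tuple[float, float]:
    """Cochran's Q and I^2 (%) using fixed-effect inverse-variance weights."""
    if len(effects) != len(std_errors) or len(effects) < 2:
        raise ValueError("need at least two studies, each with a standard error")
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I^2 = (Q - df) / Q, truncated at 0 when Q <= df.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2


# Interpretation bands quoted in the text (inclusive bounds; they overlap on purpose).
I2_BANDS = [
    (0.0, 40.0, "low"),
    (30.0, 60.0, "may be moderate"),
    (50.0, 90.0, "may be substantial"),
    (75.0, 100.0, "considerable"),
]


def interpret_i2(i2: float) -> List[str]:
    """Return every band label whose range contains the given I^2 value."""
    return [label for low, high, label in I2_BANDS if low <= i2 <= high]


# Hypothetical inputs: three studies with equal standard errors.
q, i2 = cochran_q_and_i2([0.0, 2.0, 4.0], [1.0, 1.0, 1.0])
print(q, i2)               # 8.0 75.0
print(interpret_i2(53.0))  # ['may be moderate', 'may be substantial']
```

Because the bands overlap, a value of 53% sits in both the "may be moderate" and "may be substantial" ranges, which is one reason I² is read alongside the point estimates, CI overlap, and the heterogeneity test rather than on its own.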
Imprecision (downgrade by 1 to 3 levels)
Review authors should use the thresholds (i.e., the minimal clinically important difference [MCID]) and the CIs of the absolute effect as the primary criteria for rating imprecision. For example, when the CI is not wide enough to cross the thresholds (i.e., one or both boundaries of the CI suggest inferences not appreciably different from the point estimate), review authors should consider rating down the CoE by 1 level for imprecision [Figure 1, Case A]; when the CI is wide and considerably crosses the thresholds (i.e., one or both boundaries of the CI suggest inferences appreciably different from the point estimate), they should consider rating down by 2 levels [Figure 1, Case B]; and when the CI is very wide and its two boundaries suggest very different inferences, they should consider rating down by 3 levels [Figure 1, Case C].
When the CI does not cross the thresholds but the relative effect is large (i.e., a relative risk reduction or increase of >30%), GRADE suggests that review authors evaluate whether the sample size meets the optimal information size (OIS). If the sample size of the meta-analysis is far below the OIS, GRADE recommends rating the CoE down by more than 1 level for imprecision. More specifically, for dichotomous outcomes, GRADE recommends rating the CoE down by 2 levels when the ratio of the upper to the lower CI boundary exceeds 2.5 for an odds ratio or 3 for a risk ratio. For continuous outcomes, review authors should consider rating the CoE down by 2 levels when the sample size is <30%–50% of the OIS (approximately 800 participants in total, or 400 per group). Moreover, when the baseline risk is very low, GRADE suggests being more restrained in rating the CoE down for imprecision.
Figure 1: Hypothetical scenario for the GRADE assessment of imprecision by the confidence interval approach. CI: Confidence interval, MCID: Minimal clinically important difference, GRADE: Grading of Recommendations Assessment, Development, and Evaluation.
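Two of the numerical rules of thumb for imprecision lend themselves to a quick check. The sketch below (hypothetical function names) tests the upper-to-lower CI ratio rule for dichotomous outcomes and computes a conventional two-group sample size that can serve as an OIS for a continuous outcome. The OIS inputs (a standardized mean difference of 0.2, a two-sided α of 0.05, and 80% power) are our assumption rather than values stated in the text; they give roughly the 400-per-group figure quoted above.

```python
import math
from statistics import NormalDist


def ci_ratio_suggests_two_levels(lower: float, upper: float, measure: str) -> bool:
    """Dichotomous rule of thumb from the text: upper/lower CI ratio > 2.5 (OR) or > 3 (RR)."""
    limits = {"OR": 2.5, "RR": 3.0}
    if measure not in limits:
        raise ValueError("measure must be 'OR' or 'RR'")
    if not 0.0 < lower < upper:
        raise ValueError("expected 0 < lower < upper on the ratio scale")
    return upper / lower > limits[measure]


def ois_per_group_continuous(smd: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Two-group sample size for a standardized mean difference (two-sided alpha)."""
    if smd <= 0:
        raise ValueError("smd must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / smd ** 2)


# Assumed inputs (not from the text): SMD 0.2, two-sided alpha 0.05, 80% power.
per_group = ois_per_group_continuous(0.2)
print(per_group, 2 * per_group)                        # 393 786
print(ci_ratio_suggests_two_levels(0.40, 1.50, "RR"))  # True (ratio 3.75)
```

In a real review, the OIS would be based on the effect size and variability that the review authors consider relevant for the outcome, not on these illustrative defaults.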
The meta-analysis of 3 RCTs involving 337 participants indicated that fish oil supplementation did not significantly reduce psoriasis severity, as assessed by the PASI score (mean difference: −0.28; 95% CI: −1.74 to 1.19). A 50% reduction in the PASI score has been found to correlate significantly with demonstrable improvements in patients' quality of life. We therefore defined a change (increase or decrease) of 1.15 in the PASI score as the MCID for the fish oil intervention, after considering the baseline PASI scores in the largest included trial (Kristensen et al. 2018). The point estimate (−0.28) indicated a treatment effect that is not clinically important (i.e., within the range of −1.15 to 1.15), but the lower (−1.74) and upper (1.19) CI boundaries suggested clinically important benefit and harm, respectively (i.e., beyond the MCID of 1.15 in either direction). We therefore downgraded the CoE by 2 levels for imprecision.
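The threshold check in the fish oil example can be made explicit. The sketch below uses a hypothetical helper, not part of any GRADE software, that lists which decision thresholds lie strictly inside a CI; for the PASI analysis, both the −1.15 and +1.15 thresholds are crossed, consistent with the 2-level downgrade above. Counting crossed thresholds is only an aid: how far the CI extends beyond them, as illustrated in Figure 1, still requires judgment.

```python
from typing import List, Sequence


def thresholds_crossed(ci_lower: float, ci_upper: float, thresholds: Sequence[float]) -> List[float]:
    """Return the decision thresholds lying strictly inside the confidence interval."""
    if ci_lower > ci_upper:
        raise ValueError("ci_lower must not exceed ci_upper")
    return [t for t in thresholds if ci_lower < t < ci_upper]


# Fish oil example: MD -0.28 (95% CI -1.74 to 1.19); MCID of 1.15 PASI points either way.
mcid = 1.15
crossed = thresholds_crossed(-1.74, 1.19, thresholds=[-mcid, mcid])
print(crossed)  # [-1.15, 1.15]
```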
Publication bias (downgrade by 1 level)
Even when individual studies are well designed and conducted, a meta-analysis may provide biased estimates if review authors fail to identify potentially relevant studies. In practice, studies with negative results or small sample sizes, particularly industry-funded studies, are prone to remaining unpublished or being published in obscure outlets, which leads to upwardly biased effect estimates. Many approaches are available to help detect publication bias, the most popular being the funnel plot. As a rule of thumb, tests for funnel plot asymmetry should be used only when a meta-analysis includes ≥10 studies, because with fewer studies the tests have low power. In addition, we suggest that review authors not use funnel plot tests to detect publication bias for continuous outcomes, because funnel plots may show asymmetry even in the absence of publication bias, making the results misleading. Unfortunately, it is very difficult to be sure that publication bias is absent and to define thresholds for rating the CoE down for its likely presence. GRADE therefore uses the terms “undetected” and “strongly suspected” for publication bias.
The evidence on fish oil supplementation and psoriasis severity comes from only 3 RCTs, too few for a test of funnel plot asymmetry, so no such test was performed. We therefore judged publication bias as undetected for this outcome and did not downgrade the CoE on account of publication bias.
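To illustrate the ≥10-study rule of thumb, the sketch below implements Egger's regression test, one commonly used test for funnel plot asymmetry; the text does not name a specific test, so this choice is ours. It regresses each study's standardized effect (estimate divided by its standard error) on its precision (1/SE) by ordinary least squares, and an intercept far from zero suggests small-study effects. The hypothetical `egger_test` helper refuses to run with fewer than 10 studies, so it would not apply to the 3 fish oil RCTs. Converting the t statistic into a P value requires a t distribution with k − 2 degrees of freedom (e.g., from a statistics library) and is not shown.

```python
import math
from typing import NamedTuple, Sequence


class EggerResult(NamedTuple):
    intercept: float     # asymmetry estimate; about 0 when the funnel is symmetric
    se_intercept: float  # OLS standard error of the intercept
    t_stat: float        # intercept / se_intercept (nan if the fit is exact)
    df: int              # k - 2 degrees of freedom for the t test


def egger_test(effects: Sequence[float], std_errors: Sequence[float], min_studies: int = 10) -> EggerResult:
    """Egger's regression of standardized effect on precision (hypothetical helper)."""
    k = len(effects)
    if k != len(std_errors):
        raise ValueError("effects and std_errors must have the same length")
    if k < min_studies:
        raise ValueError(f"{k} studies: asymmetry tests are underpowered below {min_studies}")
    x = [1.0 / se for se in std_errors]                 # precision
    y = [e / se for e, se in zip(effects, std_errors)]  # standardized effect
    x_bar, y_bar = sum(x) / k, sum(y) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must differ across studies")
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    df = k - 2
    residual_var = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y)) / df
    se_int = math.sqrt(residual_var * (1.0 / k + x_bar ** 2 / sxx))
    t_stat = intercept / se_int if se_int > 0 else float("nan")
    return EggerResult(intercept, se_int, t_stat, df)


# Hypothetical data: effects that grow with the standard error (a small-study effect).
ses = [0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55]
result = egger_test([0.1 + 2.0 * se for se in ses], ses)
print(round(result.intercept, 6))  # 2.0
```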
Three Factors to Upgrade the Certainty of Evidence
These factors are usually applied to meta-analyses of observational or nonrandomized studies.
Large effect size (upgrade by 1 to 2 levels)
Review authors can upgrade the CoE by 1 level when the pooled results of the meta-analysis show a large effect (e.g., risk ratio [RR] >2 or <0.5) in the absence of plausible confounders, and by 2 levels when they show a very large effect (e.g., RR >5 or <0.2) with no major threats to internal validity. The decision to upgrade the CoE for a large effect should consider not only the point estimate but also the precision around it. Review authors should rarely upgrade for a large effect if the CI overlaps substantially with effects smaller than the chosen MCID.
This meta-analysis only included RCTs, and we, therefore, did not upgrade the CoE by any level due to the large effect size.
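The example cut-offs for a large (RR >2 or <0.5) or very large (RR >5 or <0.2) effect can be applied mechanically, with the precision caveat approximated by also checking the CI boundary closest to no effect. The sketch below is a hypothetical helper: it returns the number of levels suggested by the point estimate and the number still supported by that CI boundary. Using the CI boundary in this way is our illustrative assumption, not a GRADE rule, and the code cannot assess whether plausible confounders or other threats to validity are absent, which remains a judgment for review authors.

```python
from typing import Tuple


def _fold(ratio: float) -> float:
    """Map protective ratios onto the harmful side, so RR 0.2 is treated like RR 5."""
    return ratio if ratio >= 1.0 else 1.0 / ratio


def _levels(ratio: float) -> int:
    folded = _fold(ratio)
    if folded > 5.0:
        return 2  # very large effect
    if folded > 2.0:
        return 1  # large effect
    return 0


def large_effect_candidate(rr: float, ci_lower: float, ci_upper: float) -> Tuple[int, int]:
    """(levels from the point estimate, levels from the CI bound nearest RR = 1)."""
    if not 0.0 < ci_lower <= rr <= ci_upper:
        raise ValueError("expected 0 < ci_lower <= rr <= ci_upper")
    if ci_lower <= 1.0 <= ci_upper:
        return _levels(rr), 0  # CI includes no effect, so the bound supports no upgrade
    nearest_to_null = ci_lower if rr > 1.0 else ci_upper
    return _levels(rr), _levels(nearest_to_null)


# Hypothetical pooled result: RR 0.15 (95% CI 0.08 to 0.30).
print(large_effect_candidate(0.15, 0.08, 0.30))  # (2, 1)
```

A result like this would suggest a possible upgrade, but the gap between 2 and 1 signals that the very large effect is not robust across the whole CI.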
Dose-response relationship
In observational studies, the presence of a dose-response relationship may increase confidence in the findings and thus the CoE.
This meta-analysis only included RCTs, and we, therefore, did not upgrade the CoE by any level on account of the dose-response relationship.
Opposing plausible residual bias and confounding
Rigorous observational studies measure confounding factors associated with the outcome of interest and conduct adjusted analyses that account for differences in these factors between the intervention and control groups. When all plausible residual confounders and biases would be expected to reduce the observed effect, the true effect is likely to be at least as large as the estimate, which can justify upgrading the CoE. For example, an SR addressed the effect of condom use on HIV infection among men who have sex with men. The pooled RR from a meta-analysis of 5 observational studies was 0.34 (95% CI: 0.21 to 0.54), favoring condom use over no condom use. Two of the included studies that compared the number of sex partners between condom users and nonusers found that condom users were more likely to have more sex partners. Because accounting for the number of sex partners would be expected to strengthen the effect estimate favoring condom use, the review authors could upgrade the CoE on account of opposing plausible confounding.
This meta-analysis included only RCTs, and we therefore did not upgrade the CoE on account of opposing plausible residual bias and confounding.
Reproducibility of Rating the Certainty of Evidence Using GRADE
Review authors should acknowledge that judgments of the CoE using GRADE involve some subjectivity and may therefore vary between assessors. However, the reproducibility of judgments increases substantially with full training, calibration exercises based on clear instructions, and assessment by at least two assessors. To ensure transparency, we also suggest explicitly presenting the considerations behind every judgment made when applying GRADE.
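When two assessors rate the same outcomes independently, their agreement can be summarized before they reconcile differences. The sketch below computes unweighted Cohen's kappa, a common agreement statistic; the text does not prescribe a particular statistic, so this choice, the function name, and the example ratings are our own hypothetical illustration. Because the GRADE categories are ordered, a weighted kappa, which gives partial credit for adjacent categories, may be more informative in practice.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Unweighted Cohen's kappa for two assessors rating the same outcomes."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("need two non-empty rating lists of equal length")
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    # Agreement expected by chance, from each assessor's marginal frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    if expected == 1.0:
        return float("nan")  # both assessors used one identical category: kappa undefined
    return (observed - expected) / (1.0 - expected)


# Hypothetical CoE ratings for six outcomes by two assessors.
assessor_1 = ["high", "moderate", "low", "low", "very low", "moderate"]
assessor_2 = ["high", "low", "low", "low", "very low", "moderate"]
print(round(cohens_kappa(assessor_1, assessor_2), 3))  # 0.769
```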
Conclusions
In summary, the CoE for each outcome included in an SR can be downgraded or upgraded after consideration of five and three factors, respectively. For details of GRADE assessments, we recommend that review authors regularly consult the Cochrane Handbook for Systematic Reviews of Interventions and the GRADE guidelines series published in the Journal of Clinical Epidemiology. To enhance the transparency of SRs, review authors should report how they assessed the CoE for each outcome and provide a rationale for any downgrading or upgrading.
Financial support and sponsorship
Nil.
Conflicts of interest
Prof. Ching-Chi Chi, the Editor-in-Chief of Dermatologica Sinica, had no role in the peer review of, or the decision to publish, this article. The other authors declared no conflicts of interest in writing this paper.
References
Guyatt G, Oxman AD, Akl EA, Kunz R, Vist G, Brozek J, et al. GRADE guidelines: 1. Introduction-GRADE evidence profiles and summary of findings tables. J Clin Epidemiol 2011;64:383-94.
Kuo LT, Shao SC, Chi CC. Ten essential steps for performing a systematic review: A quick tutorial. Dermatol Sinica 2022;40:204-6.
Goldet G, Howick J. Understanding GRADE: An introduction. J Evid Based Med 2013;6:50-4.
Zhang S, Wu QJ, Liu SX. A methodologic survey on use of the GRADE approach in evidence syntheses published in high-impact factor urology and nephrology journals. BMC Med Res Methodol 2022;22:220.
Atkins D, Best D, Briss PA, Eccles M, Falck-Ytter Y, Flottorp S, et al. Grading quality of evidence and strength of recommendations. BMJ 2004;328:1490.
Guyatt GH, Oxman AD, Sultan S, Glasziou P, Akl EA, Alonso-Coello P, et al. GRADE guidelines: 9. Rating up the quality of evidence. J Clin Epidemiol 2011;64:1311-6.
Guyatt GH, Oxman AD, Vist GE, Kunz R, Falck-Ytter Y, Alonso-Coello P, et al. GRADE: An emerging consensus on rating quality of evidence and strength of recommendations. BMJ 2008;336:924-6.
Yang SJ, Chi CC. Effects of fish oil supplement on psoriasis: A meta-analysis of randomized controlled trials. BMC Complement Altern Med 2019;19:354.
Higgins JP, Altman DG, Gøtzsche PC, Jüni P, Moher D, Oxman AD, et al. The Cochrane collaboration's tool for assessing risk of bias in randomised trials. BMJ 2011;343:d5928.
Sterne JA, Savović J, Page MJ, Elbers RG, Blencowe NS, Boutron I, et al. RoB 2: A revised tool for assessing risk of bias in randomised trials. BMJ 2019;366:l4898.
Sterne JA, Hernán MA, Reeves BC, Savović J, Berkman ND, Viswanathan M, et al. ROBINS-I: A tool for assessing risk of bias in non-randomised studies of interventions. BMJ 2016;355:i4919.
Schünemann HJ, Cuello C, Akl EA, Mustafa RA, Meerpohl JJ, Thayer K, et al. GRADE guidelines: 18. How ROBINS-I and other tools to assess risk of bias in nonrandomized studies should be used to rate the certainty of a body of evidence. J Clin Epidemiol 2019;111:105-14.
Guyatt GH, Oxman AD, Vist G, Kunz R, Brozek J, Alonso-Coello P, et al. GRADE guidelines: 4. Rating the quality of evidence-study limitations (risk of bias). J Clin Epidemiol 2011;64:407-15.
Søyland E, Funk J, Rajka G, Sandberg M, Thune P, Rustad L, et al. Effect of dietary supplementation with very-long-chain n-3 fatty acids in patients with psoriasis. N Engl J Med 1993;328:1812-6.
Mayser P, Mrowietz U, Arenberger P, Bartak P, Buchvald J, Christophers E, et al. Omega-3 fatty acid-based lipid infusion in patients with chronic plaque psoriasis: Results of a double-blind, randomized, placebo-controlled, multicenter trial. J Am Acad Dermatol 1998;38:539-47.
Kristensen S, Schmidt EB, Schlemmer A, Rasmussen C, Johansen MB, Christensen JH. Beneficial effect of n-3 polyunsaturated fatty acids on inflammation and analgesic use in psoriatic arthritis: A randomized, double blind, placebo-controlled trial. Scand J Rheumatol 2018;47:27-36.
Guyatt GH, Oxman AD, Kunz R, Woodcock J, Brozek J, Helfand M, et al. GRADE guidelines: 8. Rating the quality of evidence – Indirectness. J Clin Epidemiol 2011;64:1303-10.
Feldman SR, Krueger GG. Psoriasis assessment tools in clinical trials. Ann Rheum Dis 2005;64 Suppl 2:i65-8.
Guyatt GH, Oxman AD, Kunz R, Woodcock J, Brozek J, Helfand M, et al. GRADE guidelines: 7. Rating the quality of evidence – Inconsistency. J Clin Epidemiol 2011;64:1294-302.
Schünemann HJ, Neumann I, Hultcrantz M, Brignardello-Petersen R, Zeng L, Murad MH, et al. GRADE guidance 35: Update on rating imprecision for assessing contextualized certainty of evidence and making decisions. J Clin Epidemiol 2022;150:225-42.
Chularojanamontri L, Griffiths CE, Chalmers RJ. Responsiveness to change and interpretability of the simplified psoriasis index. J Invest Dermatol 2014;134:351-8.
Guyatt GH, Oxman AD, Montori V, Vist G, Kunz R, Brozek J, et al. GRADE guidelines: 5. Rating the quality of evidence – Publication bias. J Clin Epidemiol 2011;64:1277-82.
Doleman B, Freeman SC, Lund JN, Williams JP, Sutton AJ. Funnel plots may show asymmetry in the absence of publication bias with continuous outcomes dependent on baseline risk: Presentation of a new publication bias test. Res Synth Methods 2020;11:522-34.
Guyatt G, Akl EA, Oxman A, Wilson K, Puhan MA, Wilt T, et al. Synthesis, grading, and presentation of evidence in guidelines: Article 7 in Integrating and coordinating efforts in COPD guideline development. An official ATS/ERS workshop report. Proc Am Thorac Soc 2012;9:256-61.
Granholm A, Alhazzani W, Møller MH. Use of the GRADE approach in systematic reviews and guidelines. Br J Anaesth 2019;123:554-9.
Kumar A, Miladinovic B, Guyatt GH, Schünemann HJ, Djulbegovic B. GRADE guidelines system is reproducible when instructions are clearly operationalized even among the guidelines panel members with limited experience with GRADE. J Clin Epidemiol 2016;75:115-8.
Correspondence address: Department of Dermatology, Chang Gung Memorial Hospital, Linkou, No. 5, Fuxing St., Guishan District, Taoyuan
Source of Support: None, Conflict of Interest: None