Zurich Open Repository and Archive
University of Zurich
Main Library
Strickhofstrasse 39
CH-8057 Zurich
www.zora.uzh.ch

Year: 2020

The quality of evidence for medical interventions does not improve or worsen: a metaepidemiological study of Cochrane reviews

Howick, Jeremy; Koletsi, Despina; Pandis, Nikolaos; Fleming, Padhraig S; Loef, Martin; Walach, Harald; Schmidt, Stefan; Ioannidis, John P A

Abstract: OBJECTIVES The objective of the study was to determine the change in quality of evidence in updates of Cochrane reviews that were initially published between January 1, 2013 and June 30, 2014. We used the Grading of Recommendations, Assessment, Development and Evaluation (GRADE) system to document evidence quality. STUDY DESIGN AND SETTING We searched the Cochrane Database of Systematic Reviews on March 20, 2020 to identify which of the reviews from the initial (2013/14) sample had been updated. Using the same methods to determine the quality of evidence as in the previous analysis, we assessed the quality of evidence for the first-listed primary outcomes in the updated reviews. RESULTS Of the 608 reviews in the original sample, 154 had been updated, and 151 contained available data for both original and updated systematic reviews (24.8%). The updated reviews included: 15 (9.9%) with high-quality evidence, 56 (37.1%) with moderate-quality evidence, 47 (31.1%) with low-quality evidence, and 33 (21.9%) with very low-quality evidence. No change in the GRADE quality of evidence was found for most (103, 68.2%) of the updated reviews. The quality of evidence rating was downgraded in 28 reviews (58.3%) and upgraded in 20 (41.7%), although only six reviews were promoted to high quality. CONCLUSION Updated systematic reviews continued to suggest that only a minority of outcomes for health care interventions are supported by high-quality evidence. The quality of the evidence did not consistently improve or worsen in updated reviews.

DOI:

Posted at the Zurich Open Repository and Archive, University of Zurich
ZORA URL:
Article
Accepted Version

Originally published at:
Howick, Jeremy; Koletsi, Despina; Pandis, Nikolaos; Fleming, Padhraig S; Loef, Martin; Walach, Harald; Schmidt, Stefan; Ioannidis, John P A (2020). The quality of evidence for medical interventions does not improve or worsen: a metaepidemiological study of Cochrane reviews. Journal of Clinical Epidemiology, 126:154-159.
DOI:

The quality of evidence for medical interventions does not improve or worsen: A Meta-Epidemiological Study of Cochrane Reviews

Jeremy Howick, PhD 1, Despina Koletsi, DiplDS, Dr. med. dent 2*, Nikolaos Pandis 3, Padhraig S. Fleming, PhD 4, Martin Loef, PhD 5, Harald Walach, PhD 5,6, Stefan Schmidt, PhD 7, John P.A. Ioannidis, MD, DSc 8

1 Faculty of Philosophy, University of Oxford, Oxford OX2 6GG, United Kingdom
2 Clinic of Orthodontics and Pediatric Dentistry, Center of Dental Medicine, University of Zurich, Switzerland
*joint first author
3 Department of Orthodontics and Dentofacial Orthopedics, School of Dental Medicine, Medical Faculty, University of Bern, Bern, Switzerland
4 Institute of Dentistry, Queen Mary, University of London
5 CHS-Institute, Berlin, Germany
6 Poznan University of the Medical Sciences, Department of Pediatric Gastroenterology, Poznan, Poland
7 Department of Psychosomatic Medicine and Psychotherapy, Medical Center, University of Freiburg
8 Departments of Medicine, of Epidemiology and Population Health, of Biomedical Data Science, and of Statistics, and Meta-Research Innovation Center at Stanford (METRICS), Stanford University, CA, USA

Publication:
Howick J, Koletsi D, Pandis N, Fleming PS, Loef M, Walach H, Schmidt S, Ioannidis JPA. The quality of evidence for medical interventions does not improve or worsen: a Meta-Epidemiological Study of Cochrane Reviews. J Clin Epidemiol. 2020 Aug 10:S0895-4356(20)30777-0. doi: 10.1016/j.jclinepi.2020.08.005. Epub ahead of print.

Correspondence to: Jeremy Howick, Faculty of Philosophy, University of Oxford, Oxford OX2 6GG, 44 (0)7771925412, E-mail: [email protected]

Registration
Open Science Framework: Howick, J., Koletsi, D., Fleming, P., Schmidt, S., Loef, M., Walach, H., Ioannidis, J. (2020, March 30). Has the Quality of Evidence for Medical Interventions Improved? Protocol for a Meta-Epidemiological Study. Retrieved, (guarantor) JPAI conceived of the idea, JH wrote the first draft of the protocol, DK did the data extraction; JH, ML, PF, HW checked the extraction. DK and NP did the initial analysis. All authors interpreted the analyses, contributed to drafting the protocol and writing the manuscript.

Support
The writing of this protocol was not independently funded.

Declaration of interest
None of the authors have any conflicts of interest related to this paper.

Abstract

Background: A previous analysis of Cochrane Reviews published between January 1st, 2013 and June 30th, 2014 found that only 13.5% reported high quality evidence for the intervention according to the Grading of Recommendations, Assessment, Development and Evaluation (GRADE) system; 31.7% had low-level and 24% very low-level evidence. Many of these reviews have been updated, and it is unknown whether the updated reviews report a change in the quality of evidence.

Objectives: To determine the change in quality of evidence in updates of Cochrane reviews that were initially published between 1st January 2013 and 30th June 2014.

Methods: We searched the Cochrane Database of Systematic Reviews on March 20th, 2020 to identify which of the reviews from the initial (2013/14) sample had been updated. Using the same methods to determine the quality of evidence as in the previous analysis, we assessed the quality of evidence for the first listed primary outcomes in the updated reviews.

Results: Of the 608 reviews in the original sample, 154 had been updated, with 151 presenting available data for both original and updated SRs (24.8%). The updated reviews included: 15 (9.9%) with high quality evidence, 56 (37.1%) with moderate, 47 (31.1%) with low, and 33 (21.9%) with very low-quality evidence. No change in the GRADE quality of evidence was found for most (103, 68.2%) of the updated reviews. Of the 48 reviews with a change in GRADE rating, 28 (58.3%) were downgraded, mostly to low or very low. The quality of evidence rating improved in 20 (41.7%), although only 6 reviews were promoted to high quality.

Conclusions: Updated systematic reviews continued to suggest that only a minority of outcomes for healthcare interventions are supported by high-quality evidence. The quality of the evidence did not consistently improve or worsen in updated reviews.

Keywords: Systematic review; evidence; Quality score; Meta-analysis

What is new?

Key findings
- The quality of evidence (according to GRADE) supporting the main finding changes in about a quarter of updated reviews.
- Upgrading of quality of evidence (according to GRADE) for the main outcome is not more common than downgrading quality of evidence.

What this adds to what was known?
- Quality of evidence does not seem to improve overall with the addition of new evidence, at least within the timeframe assessed.

What is the implication and what should change now?
- Methods investigating when review updates are likely to change our confidence in the estimated outcome effect could inform decisions about whether to update reviews in order to save resources.
- The quality of evidence supporting most healthcare interventions remains low; higher quality evidence is required.

1. Introduction

1.1. Rationale

Several meta-epidemiological studies have attempted to determine the proportion of healthcare interventions that are evidence-based. A 2001 estimate found that about a quarter (26.7%) of healthcare interventions whose effectiveness was reported in 160 Cochrane Reviews were considered effective, based on the interpretation of the review authors.1 In 2007, Garrow claimed that 50% of healthcare treatments have good evidence to support them.2 In the same year, El Dib et al. (2007) found that just 44% of a random selection of Cochrane Reviews evaluating interventions suggested that they were likely to be beneficial.3 Since these studies were published, the Grading of Recommendations, Assessment, Development and Evaluation (GRADE) system has been introduced, offering a less subjective way of ranking the quality of evidence.4 An evaluation of all Cochrane Reviews published between January 1, 2013 and June 30, 2014 found that 13.5% of reviews had high quality of evidence for the first listed primary outcome according to GRADE.5 High quality evidence was more common in updated than in new reviews, and for pharmacologic than for other types of interventions. Even when any outcomes (including but not limited to the first listed primary outcome) were considered, only 116/608 (19.1%) of the reviews reported at least one outcome with high quality of evidence.

Most researchers agree that it is important to update systematic reviews so that they reflect current knowledge,6,7 to maximize patient benefits, and to avoid harm.8 However, updated reviews frequently reveal no change in conclusions when compared with the original. According to French et al., only about 9% of updated Cochrane Reviews in 2002 presented a change in conclusion relative to their precursors from 1998.9 However, the claim that the updates did not overturn results from the original review was based on whether review authors stated there was a change in the conclusion of the updated review.

There is currently no consensus on the timing that would appropriately guide a review update, and the Cochrane Collaboration’s policy is to update reviews when evidence accumulates, based on the availability of new data that would have a meaningful impact on the findings and on the importance of the review question.10 Previous reports have identified a median time required for an update of a systematic review of approximately 5.5 years.11 It was therefore considered appropriate to assess whether reviews conducted back in 2013-2014 (Fleming et al., 2016) have been updated by early 2020, and if so, whether there are changes in the quality of the evidence based on GRADE.5

1.2. Objectives

The primary objective was to determine whether updates from a previous sample of systematic reviews resulted in a different quality of evidence, as assessed by GRADE. The secondary objectives were to determine whether there is a difference in the change of quality of evidence across different interventions, outcomes, or Cochrane Review Groups.

2. Methods

2.1. Eligibility criteria

We included any Cochrane Review that was an update of a Cochrane Review published in the (01/01/2013 to 30/06/2014) parent sample of reviews which included a GRADE assessment.

2.2. Information sources

Cochrane Database of Systematic Reviews: Search strategy

We searched the Cochrane Database of Systematic Reviews to identify the reviews from the original sample which had updates. The most recent search was on March 20th, 2020.

2.4. Data sources and searches

One author (DK) retrieved the systematic reviews from the original (2013/14) sample and piloted the extraction form with one other author (JH). One author (DK) checked whether an update had been published and extracted data for the updated review. Other authors (JH, ML, PF, HW) were second extractors (all records were checked by two authors). All discrepancies were resolved by discussion.

2.5. Data items

Extracted information included: titles, corresponding author name and email, Cochrane Review Group, year of publication, country, study design, intervention (and intervention category), control and outcome. In relation to the GRADE Summary of Findings (SoF) tables, the following were recorded for the first listed outcome: category of intervention (surgical, pharmacologic, behavioural or medical treatments, and diet or exercise interventions); type of outcome (objective, such as mortality or outcomes assessed with an instrument or pre-specified measurable criteria; or subjective); and overall GRADE ranking with reasons for downgrade or upgrade. In brief, “behavioural” interventions pertained to psychological treatment, psychotherapy, cognitive training, and group therapy; “diet or exercise” interventions largely related to training exercise, physiotherapy, rehabilitation, and dietary modification; “medical treatments” covered electronic optical/hearing aids, appliance/device use for dental treatment, ultrasound or other radiography, and medical interventions not related to surgical or pharmacologic approaches. In cases where multiple Summary of Findings tables within the same review existed for the primary outcome, we considered only the one listed first. In cases where no high-quality evidence was recorded for the first listed primary outcome, we documented whether any other outcome was rated as high and, if so, whether this was a primary (but not first listed) one.

We reported whether the Cochrane review authors concluded that the experimental intervention should be used in clinical or public health practice or not. This information was obtained from the conclusions section in the review abstract and the body of the review (subsections “implications for practice” and/or “implications for research”), following the original strategy implemented in the parent study.5 Examples of positive interpretations were: “Buprenorphine should be supported as a medication to use,” and in the “Implications for research or practice” section: “There does not appear to be any need for further randomized control trials of the relative efficacy of methadone compared with buprenorphine.”
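The data items listed above can be pictured as one record per review. The following is only an illustrative sketch: the class name `ReviewRecord`, its field names, and the helper `change_direction` are hypothetical and are not taken from the authors' extraction form, and only a subset of the extracted items is shown.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Grade(IntEnum):
    """GRADE levels, ordered so that a larger value means higher quality."""
    VERY_LOW = 1
    LOW = 2
    MODERATE = 3
    HIGH = 4


@dataclass
class ReviewRecord:
    """Hypothetical extraction record for the first listed primary outcome."""
    title: str
    review_group: str
    intervention_category: str  # e.g. "pharmacologic", "surgical", "behavioural"
    outcome_type: str           # "objective" or "subjective"
    grade_original: Grade
    grade_updated: Optional[Grade] = None  # None when no update was found


def change_direction(record: ReviewRecord) -> str:
    """Classify the change in GRADE rating between original and update."""
    if record.grade_updated is None:
        return "not updated"
    if record.grade_updated > record.grade_original:
        return "upgrade"
    if record.grade_updated < record.grade_original:
        return "downgrade"
    return "no change"


# Made-up example values, for illustration only.
example = ReviewRecord(
    title="Hypothetical review",
    review_group="Hypothetical group",
    intervention_category="pharmacologic",
    outcome_type="objective",
    grade_original=Grade.HIGH,
    grade_updated=Grade.LOW,
)
print(change_direction(example))  # prints "downgrade"
```

Encoding the four GRADE levels as ordered integers makes the direction of change a simple comparison, which is the quantity the primary outcome depends on.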

2.6. Outcomes

The primary outcome was the change in quality of the evidence for the primary outcome in updated Cochrane Reviews compared with reviews published in an earlier (01/01/2013 to 30/06/2014) parent sample. The secondary outcomes were the proportion of reviews in the updated sample that have high, moderate, low, or very low-quality evidence. We also assessed the review authors’ interpretation of results (as reported in the review conclusions), for high quality evidence and reports of statistically significant results.

2.7. Data synthesis and analysis

Descriptive statistics on year of publication of the update, as well as the time interval between the publication in the parent sample and the update, were calculated. In addition, the frequency of each type of intervention and related outcome was calculated for the reviews that had been updated by the date of search. For reviews that were updated, a change in the rating of evidence, if present, and its direction (downgrade, upgrade) were recorded. Data accumulation for the review update was also recorded, based on the number of studies/participants included in the review’s first listed outcome.

We reported actual proportions (n/N) as well as percentages of reviews reporting high, moderate, low or very low-quality evidence in the new sample of reviews. The quality of evidence according to GRADE in the new subset of reviews with updates was tabulated against the respective versions in the parent sample in a matched 4 x 4 table. We then compared the difference in quality of evidence between the original and updated sample. We used the 2-sided exact signed-rank test to assess upgrades/downgrades between the original and updated reviews. We also performed a Stuart-Maxwell marginal homogeneity test. In addition, we performed assessments considering the presence of a high-quality rating for any main outcome rather than just the first listed primary outcome.

For outcomes reported in the Summary of Findings table to be at the extremes (very low or high) of evidence quality, we reported the distribution of statistically significant results (P < 0.05 or 95% confidence interval (CI) excluding the null), along with the reviewers’ interpretation of the value of the intervention in clinical practice.

All statistical analyses were conducted with STATA software 15.1 (Stata Corporation, College Station, TX, USA) and R Software version 3.6.1 (R Foundation for Statistical Computing, Vienna, Austria).

2.8. Protocol Amendments

In the protocol, we planned subgroup analyses by disease area, intervention type, and Cochrane Review Group. However, data for subgroups were deemed too sparse to allow for meaningful subgroup analyses.

3. Results

3.1. Search results

Of the 608 reviews in the original sample, 154 (25.3%) had been updated, and 151 of those presented information on GRADE quality of evidence for both initial and updated reviews, so were retained for further assessment (Figure 1). The median year of the update was 2017 (interquartile range 2, range: 2015 to 2020), with a median of 4 years (IQR 2, range: 2 to 7 years) after the original review was published. Among the updated reviews, the original version with which it was compared (published in 2013-2014) was already an update of a previous version for 69 (45.7%) reviews.

Most reviews in the present sample of Cochrane updates pertained to pharmacological interventions (n = 82; 54.4%), followed by behavioural (n = 24; 15.9%) and surgical (n = 23; 15.2%) interventions, the use of medical devices (n = 15; 9.9%), and diet- or exercise-related interventions (n = 7; 4.6%). In most of the reviews, the primary outcome considered was classified as objective (127/151; 84.1%).

3.2. Quality of evidence in the entire updated (2020) sample

Within the 151 updated reviews, 15 (9.9%) had high quality evidence supporting the first listed primary outcome, 56 (37.1%) moderate, 47 (31.1%) low, and 33 (21.9%) very low. Compared with the original sample, there was a reduction in the proportion of reviews with high quality. However, this reduction was not statistically significant (see below). GRADE ranking comparisons between the original and updated reviews are presented in Table 1, Table 2, and Figure 2.

Table 1. Summary of Review Quality from Updated and Original Samples

Year of review assessment   High N (%)   Moderate N (%)   Low N (%)    Very Low N (%)
2020                        15 (9.9)     56 (37.1)        47 (31.1)    33 (21.9)
2013/14                     82 (13.5)    187 (30.8)       193 (31.7)   146 (24)
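As a small worked check of Table 1, the percentages can be recomputed from the counts. This sketch only transcribes the counts reported in the table; the names `TABLE_1` and `percentages` are illustrative.

```python
# Counts transcribed from Table 1 (first listed primary outcome).
TABLE_1 = {
    "2020": {"high": 15, "moderate": 56, "low": 47, "very low": 33},
    "2013/14": {"high": 82, "moderate": 187, "low": 193, "very low": 146},
}


def percentages(counts):
    """Return each GRADE level as a percentage of the sample, to one decimal."""
    total = sum(counts.values())
    return {level: round(100 * n / total, 1) for level, n in counts.items()}


for sample, counts in TABLE_1.items():
    print(sample, sum(counts.values()), percentages(counts))
```

The totals come to 151 and 608 reviews respectively, matching the sample sizes given in the text, and the high-quality share is 9.9% in the updated sample against 13.5% in the original one.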

Table 2. Change in quality of evidence across 151 reviews with updates for primary outcomes (the numbers below the diagonal are those which were upgraded, while those above were downgraded). Rows: GRADE quality of evidence in the original sample (2013-2014). Columns: GRADE quality of evidence in the updated reviews (sample 2020).

Original \ Updated   High N (%)   Moderate N (%)   Low N (%)    Very Low N (%)   Total
High N (%)           9 (60.0)     4 (7.1)          7 (14.9)     0 (0.0)          20 (13.2)
Moderate N (%)       4 (26.7)     40 (71.4)        8 (17.0)     3 (9.1)          54 (35.8)
Low N (%)            2 (13.3)     8 (14.3)         30 (63.8)    6 (18.2)         47 (31.1)
Very Low N (%)       0 (0.0)      4 (7.2)          2 (4.3)      24 (72.7)        30 (19.9)
Total                15 (100.0)   56 (100.0)       47 (100.0)   33 (100.0)       151 (100.0)

3.3. Change in quality of evidence

3.3.1. Change in quality of evidence for primary outcome

Most (103/151, 68.2%) of the updated reviews reported no change in the GRADE quality of evidence compared with the initial sample (diagonal in Table 2). Of the reviews with unchanged grading, 9 (8.7%) reported high-quality evidence, 40 (38.8%) had moderate, 30 (29.2%) low, and 24 (23.3%) very low quality of evidence. In 63 of the 103 updated reviews without a changed GRADE rating (61.2%), there was no additional data included in the updates, whereas in 35 reviews more data had been added. In 5 reviews (4.9%) the update contained fewer primary studies than the original, but there was still no change in the GRADE rating. There was no statistical difference in the change in the quality of the evidence ratings (P = 0.30) between the original and updated reviews. The P-value for the marginal homogeneity test was 0.55.

A change in GRADE rating was reported in 48 of the 151 updated reviews. Twenty-eight of these (58.3%) were downgraded, mostly (24/28) to low or very low. Of first-listed primary outcomes initially recorded as having “high” quality evidence (n 15), 11 were downgraded to low (n = 7) or moderate (n = 4) quality of evidence. Twenty of the 48 reviews that had a changed GRADE rating involved an upgrade. Of those, 6 were upgraded to “high”.

Thirty of the 48 reviews (62.5%) that had a changed GRADE rating included additional data. Among these, 15 resulted in upgrades and 15 in downgrades. In 16 (33.3%), the changed GRADE rating was not based on new data: the update was based on the same included data. In two updated reviews (4.2%), changes were based on fewer data for the primary outcome of interest; both resulted in upgrades.

3.3.2. Change in quality of evidence for other outcomes (those that were not first listed or were non-primary)

Among the 151 updated reviews, 19 of those which did not present high quality of evidence for the first listed primary outcome had other (non-primary, or primary but not first listed) outcomes that were ranked as high-quality. Ten of these involved primary outcomes. The overall quality of the evidence in the updates for any outcome was high in 34 out of 151 updated reviews (22.5%). Again, we did not find a significant difference between the original and updated reviews for this comparison (P = 0.72). The P-value by the marginal homogeneity test was P

3.4. Review authors’ interpretations and statistical significance of results

Among extreme evidence quality ratings (very low and high), 8/33 (24.2%) of those with very low quality and 10/15 (66.7%) of those with high quality evidence had statistically significant results for at least one outcome in the updated sample. Across all 151 updated reviews, only 2 had high quality evidence, statistically significant results, and a favourable interpretation of the value of the intervention in clinical practice.

4. Discussion

4.1. Summary of findings

One-quarter of the reviews in our sample had been updated over the 6-7-year period. Of those, a third reported a change in GRADE ratings. There was no evidence of GRADE ratings being more likely to improve than worsen in these topics, with a weak trend towards worsening.

In keeping with a previous finding that 23% of Cochrane Reviews were out of date within two years,11 our study may also show that Cochrane Reviews are not updated very frequently.12 Specifically, we observed a median hiatus for publication of the updated review of 4 years among the reviews that were updated, and most reviews were not updated at all.
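The summary counts above (unchanged, upgraded, downgraded) can be re-derived from the cells of Table 2. The sketch below transcribes the cell counts as reported and then applies a two-sided exact sign test to upgrades versus downgrades. Note that the sign test is a deliberate simplification chosen so the example needs only the standard library: it is not the exact signed-rank test used in the paper, it ignores how many GRADE levels a rating moved, and its P-value should not be expected to reproduce the reported one exactly. The names `TABLE_2`, `tally_changes` and `exact_sign_test` are illustrative.

```python
from math import comb

# Cell counts transcribed from Table 2.
# Rows: GRADE in the original sample; columns: GRADE in the update.
# Order in both directions: high, moderate, low, very low.
TABLE_2 = [
    [9, 4, 7, 0],
    [4, 40, 8, 3],
    [2, 8, 30, 6],
    [0, 4, 2, 24],
]


def tally_changes(table):
    """Return (unchanged, upgraded, downgraded) review counts."""
    n = len(table)
    unchanged = sum(table[i][i] for i in range(n))
    # Above the diagonal the updated rating is lower than the original one.
    downgraded = sum(table[i][j] for i in range(n) for j in range(n) if j > i)
    upgraded = sum(table[i][j] for i in range(n) for j in range(n) if j < i)
    return unchanged, upgraded, downgraded


def exact_sign_test(upgraded, downgraded):
    """Two-sided exact sign test under H0: upgrades and downgrades equally likely."""
    n = upgraded + downgraded
    k = min(upgraded, downgraded)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


unchanged, upgraded, downgraded = tally_changes(TABLE_2)
print(unchanged, upgraded, downgraded)  # prints: 103 20 28
print(exact_sign_test(upgraded, downgraded))
```

Under these assumptions the imbalance of 28 downgrades against 20 upgrades is well within what chance alone would produce, which is consistent with the paper's conclusion that the ratings did not move in a consistent direction.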

In some cases, downgrading of evidence quality was related to the new Risk of Bias assessment forming the basis for the GRADE framework. Risk of Bias assessments have become stricter in the new Cochrane Handbook and might have led to automatic downgrading due to items that had not been rated before or were rated differently. This seems to be reflected in the fact that in approximately one-third of the reviews where the rating changed (16/48), there was no new data included in the review regarding the primary outcome of interest. Nevertheless, 81.3% (13/16) of the reviews with no new data reported worsening of evidence quality.

Another explanation for different GRADE ratings for updated reviews that had no new data is potential inconsistency in the way GRADE is applied. One study found variability in the way GRADE is applied, leading to different conclusions about strength of evidence.13 Another study found low agreement among systematic reviewers using the Cochrane Risk of Bias tool (which influences the GRADE rating).14 This may partially explain why two of the updated reviews whose evidence quality was upgraded were based on fewer studies than the original. The omitted studies also reduced imprecision or risk of bias.15,16

4.2. Limitations

The extent to which our findings are generalisable needs to be discussed. Our sample of reviews from 2013 and 2014 may not be representative of all medical evidence. It pertains to topics where either a new review was published at that time or it was deemed that an update was then indicated. Similarly, the reviews that were updated may not be representative of the original sample. Reviews which were not updated may have been less likely to require updating. If so, the proportion of changes in GRADE ratings we found may even have been exaggerated. If we also account for this selection process, the results suggest that improvements in the quality of evidence in different medical topics are even more uncommon. Finally, we had a relatively small number of updated reviews, thus we could not meaningfully explore whether improvements in the quality of evidence are more or less likely in specific fields. However, no consistent patterns were observed for the very few reviews (n = 6) where evidence was upgraded to high quality.

In addition, our conclusions assumed that GRADE is sensitive enough to detect changes in evidence quality; this may not be entirely the case. GRADE only has four categories, and were there additional categories, we may have detected a change in quality in a greater number of reviews. On top of that, GRADE assessments may suffer from inadequate interrater reliability, although there is evidence that training review authors in the use of GRADE and/or using duplicate assessments improves the evaluation of the quality of the evidence.17 On the other hand, a more sensitive evidence-rating tool could also be more likely to detect noise. More generally, our findings assumed that the GRADE ratings by the original review authors were reliable (and, more generally, that GRADE is reliable). To overcome this limitation, a re-grading of the original and updated reviews would have to be undertaken by blinded reviewers.

4.3. Conclusion

Updating Cochrane systematic reviews does not change the fact that only a minority of outcomes for healthcare interventions are supported by high quality evidence. Most reviews were not updated over the time period of our assessment and, in spite of additional data, the majority of updates did not result in a change in the quality of the evidence. To avoid research waste, it should be investigated whether it is possible to decide in advance whether updating a review will result in a change in results. Effects of medical interventions supported by high quality evidence, statistically significant results, and favourable interpretations of the evidence by review authors remain rare.

References

1. Ezzo J, Bausell B, Moerman DE, et al. Reviewing the reviews. How strong is the evidence? How clear are the conclusions? Int J Technol Assess Health Care 2001;17(4):457-66. [published Online First: 2002/01/05]
2. Garrow JS. What to do about CAM: How much of orthodox medicine is evidence based? BMJ 2007;335(7627):951. doi: 10.1136/bmj.39388.393970.1F [published Online First: 2007/11/10]
3. El Dib RP, Atallah AN, Andriolo RB. Mapping the Cochrane evidence for decision making in health care. J Eval Clin Pract 2007;13(4):689-92. doi: 10.1111/j.1365-2753.2007.00886.x [published Online First: 2007/08/09]
4. Balshem H, Helfand M, Schunemann HJ, et al. GRADE guidelines: 3. Rating the quality of evidence. J Clin Epidemiol 2011;64(4):401-6. doi: 10.1016/j.jclinepi.2010.07.015 [published Online First: 2011/01/07]
5. Fleming PS, Koletsi D, Ioannidis JP, et al. High quality of the evidence for medical and other health-related interventions was uncommon in Cochrane systematic reviews. J Clin Epidemiol 2016;78:34-42. doi: 10.1016/j.jclinepi.2016.03.012 [published Online First: 2016/04/02]
6. Chalmers I, Haynes B. Reporting, updating, and correcting systematic reviews of the effects of health care. BMJ 1994;309(6958):862-5. doi: 10.1136/bmj.309.6958.862 [published Online First: 1994/10/01]
7. Garritty C, Tsertsvadze A, Tricco AC, et al. Updating systematic reviews: an international survey. PLoS One 2010;5(4):e9914. doi: 10.1371/journal.pone.0009914 [published Online First: 2010/04/09]
8. Moher D, Tsertsvadze A. Systematic reviews: when is an update an update? Lancet 2006;367(9514):881-3. doi: 10.1016/S0140-6736(06)68358-X [published Online First: 2006/03/21]
9. French SD, McDonald S, McKenzie JE, et al. Investing in updating: how do conclusions change when Cochrane systematic reviews are updated? BMC Med Res Methodol 2005;5:33. doi: 10.1186/1471-2288-5-33 [published Online First: 2005/10/18]
10. Higgins JJ, Thomas JC, Chandler J, et al. Cochrane Handbook for Systematic Reviews of Interventions version 6.0. Version 6.0 ed. Chichester: The Cochrane Collaboration 2019.
11. Shojania KG, Sampson M, Ansari MT, et al. How quickly do systematic reviews go out of date? A survival analysis. Ann Intern Med 2007;147(4):224-33. doi: 10.7326/0003-4819-147-4-200708210-00179 [published Online First: 2007/07/20]
12. Higgins JJ, Green S. The Cochrane Handbook for Systematic Reviews of Interventions. Version 5.1.0 [updated March 2011] ed. Chichester: The Cochrane Collaboration 2011.
13. Berkman ND, Lohr KN, Morgan LC, et al. Interrater reliability of grading strength of evidence varies with the complexity of the evidence in systematic reviews. J Clin Epidemiol 2013;66(10):1105-17 e1. doi: 10.1016/j.jclinepi.2013.06.002 [published Online First: 2013/09/03]
14. Hartling L, Hamm MP, Milne A, et al. Testing the risk of bias tool showed low reliability between individual reviewers and across consensus assessments of reviewer pairs. J Clin Epidemiol 2013;66(9):973-81. doi: 10.1016/j.jclinepi.2012.07.005 [published Online First: 2012/09/18]
15. Boardman HM, Hartley L, Eisinga A, et al. Hormone therapy for preventing cardiovascular disease in post-menopausal women. Cochrane Database Syst Rev 2015(3):CD002229. doi: 10.1002/14651858.CD002229.pub4 [published Online First: 2015/03/11]
16. Hakoum MB, Kahale LA, Tsolakian IG, et al. Anticoagulation for the initial treatment of venous thromboembolism in people with cancer. Cochrane Database Syst Rev 2018;1:CD006649. doi: 10.1002/14651858.CD006649.pub7 [published Online First: 2018/01/25]

17. Mustafa RA, Santesso N, Brozek J, et al. The GRADE approach is reproducible in assessing the qu
