Journal of Clinical Epidemiology 65 (2012) 1245–1248
Estimating benefits and harms of screening across subgroups: the Canadian Task Force on Preventive Health Care integrates the GRADE approach and overcomes minor challenges
Kevin Pottie (a,*), Sarah Connor Gorber (b), Harminder Singh (c), Michel Joffres (d), Patrice Lindsay (e), Paula Brauer (f), Alejandra Jaramillo (b), Marcello Tonelli (g)
(a) Departments of Family Medicine and Epidemiology and Community Medicine, C.T. Lamont Primary Health Care Research Centre, University of Ottawa, Ottawa, Canada
(b) Public Health Agency of Canada, Ottawa, Canada
(c) Departments of Internal Medicine and Community Health Sciences, University of Manitoba, Winnipeg, Canada
(d) Faculty of Health Sciences, Simon Fraser University, Burnaby, Canada
(e) Canadian Stroke Network, and Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Canada
(f) Department of Family Relations and Applied Nutrition, University of Guelph, Guelph, Canada
(g) Department of Medicine, University of Alberta, Edmonton, Canada; and Department of Public Health Sciences, University of Alberta, Edmonton, Canada
Accepted 24 June 2012; Published online 18 September 2012
Abstract
Objective: This paper describes how the new Canadian Task Force on Preventive Health Care integrated the GRADE (Grading of Recommendations Assessment, Development and Evaluation) approach into its clinical preventive guideline development process.
Study Design: The GRADE approach focused the analytic framework and key questions on patient-important benefits and harms related to screening that incorporated detection, treatment, and follow-up. It also led to an explicit consideration of values and preferences and resource implications as part of the basis of the recommendations.
Results: There are challenges, however, in applying the GRADE approach to clinical prevention, because randomized controlled trials in this field must be very large and of long duration, given the rare occurrence of primary outcome events in asymptomatic individuals. We provide examples of how we met these challenges in developing clinical guidelines for screening for breast cancer, cervical cancer, diabetes, hypertension, and depression in primary care settings.
Conclusion: The focus on patient-important outcomes was helpful in estimating the effectiveness of screening approaches and in explicitly detailing the basis of our recommendations across subgroups.
© 2012 Elsevier Inc. All rights reserved.
Keywords: GRADE; Evidence-based clinical guidelines; Clinical prevention; Primary health care; Screening
Conflict of Interest/Financial Disclosure: The authors have no conflicts of interest to declare. Funding for the Canadian Task Force is provided by the Public Health Agency of Canada and the Canadian Institutes of Health Research.
* Corresponding author. Institute of Population Health, 1 Stewart Street, Ottawa, ON, Canada. Tel.: 613-562-5800 ext 2015; fax: 613-5625343. E-mail address: [email protected] (K. Pottie).
0895-4356/$ - see front matter © 2012 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.jclinepi.2012.06.018

1. Introduction
The Canadian Task Force on the Periodic Health Examination (referred to herein as "Task Force") was struck in 1976, and its initial report represented one of the first evidence-based clinical guideline initiatives. Early Task Force documents focused on clinical recommendations with
links to evidence developed through a multidisciplinary consensus approach [1]. In the late 1980s, the Task Force pioneered a grading system that laid out the basis of recommendations by rating both the quality of evidence and the strength of the recommendations [2]. This early grading system was later adopted by the U.S. Preventive Services Task Force and many other organizations [3]. In 2010, the newly revitalized Canadian Task Force on Preventive Health Care (herein referred to as "Canadian Task Force") decided to adopt the GRADE approach [4,5] for the appraisal of evidence and the characterization of the strength of each recommendation in our clinical preventive guidelines. The GRADE approach facilitates transparent development of rigorous guidelines and, unlike other commonly used systems, allows for the explicit consideration of several factors when rating the strength of recommendation (quality of evidence, balance of benefits and harms, values and preferences, and resource use). In this article, we provide examples of the key methodological issues that have arisen during this transition, with a special focus on estimating effectiveness of screening tests and subgroup analysis.

What is new?
Key points:
The Canadian Task Force on Preventive Health Care has integrated the GRADE approach into the clinical preventive guideline development process.
Using GRADE has focused the population, intervention, comparison, and outcome (PICO) question on screening (encompassing detection, treatment, and follow-up) and on patient-important outcomes, and has brought patients' values and preferences into the basis of each recommendation.
Estimating the efficacy and effectiveness of interventions for relevant subgroups has emerged as an important, although sometimes challenging, step.
2. Focusing the PICO question on patient-important outcomes
In applying the GRADE approach, we first needed to determine the patient-important outcomes, such as mortality and major clinical diseases, for each guideline (see Appendix for details on Canadian Task Force Topic Workgroups). We used a modified Delphi approach in which each workgroup member identified and rated outcomes (both benefits and harms) as critical, important, or not important, based on their perceptions of clinical relevance. The focus on patient-important outcomes, with less emphasis on intermediate or surrogate outcomes, shifted our population, intervention, comparison, and outcome (PICO) question and analytic framework toward the effectiveness of screening a population on hard outcomes, such as mortality. This encompasses detection, treatment, and follow-up and has resulted in less emphasis on the potential effectiveness of merely treating a population for a disease. For example, the breast cancer systematic review (see Fig. 1) focused on evidence for mammography screening linked to patient outcomes (breast cancer–related mortality) rather than only on the evidence for effectiveness of treatment for breast cancer. As another example, for cervical cancer, the review focused on the reduction in the incidence of cervical cancer (in particular advanced stages of the disease) and in mortality from cervical cancer, rather than on the detection rate of precancerous lesions.
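The outcome-rating step in this section can be illustrated with a minimal sketch. It assumes the 1 to 9 importance scale described in GRADE guidance (7 to 9 critical, 4 to 6 important, 1 to 3 not important) and assumes that members' ratings are summarized by their median. The article does not state which scale or aggregation rule the workgroups used, and the outcome names and scores below are hypothetical.

```python
from statistics import median


def classify_importance(score: float) -> str:
    """Map a 1-9 importance score to a category (assumed GRADE-style cut-offs)."""
    if not 1 <= score <= 9:
        raise ValueError("score must be between 1 and 9")
    if score >= 7:
        return "critical"
    if score >= 4:
        return "important"
    return "not important"


def rate_outcomes(ratings: dict[str, list[int]]) -> dict[str, str]:
    """Summarize each outcome by the median member rating (an assumption), then classify it."""
    return {
        outcome: classify_importance(median(scores))
        for outcome, scores in ratings.items()
    }


# Hypothetical ratings from five workgroup members (illustrative only, not Task Force data).
example_ratings = {
    "breast cancer mortality": [9, 9, 8, 9, 7],
    "false-positive result": [6, 5, 7, 4, 6],
    "detection rate of lesions": [3, 2, 4, 3, 1],
}

result = rate_outcomes(example_ratings)
for outcome, category in result.items():
    print(f"{outcome}: {category}")
```

In a real Delphi process the ratings would be revisited over several rounds; the sketch shows only how a single round might be summarized.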
3. Estimating the benefits and harms and rating the quality of evidence
Specifying PICO questions in terms of each screening intervention and patient-important outcome allowed us to systematically search the literature for the highest quality evidence to estimate the certainty of a net beneficial effect of each preventive service. There are unique challenges in applying the GRADE approach to screening and prevention, as adequately powered randomized controlled trials (RCTs) in this field must be very large and long term, given the relatively rare occurrence of primary outcome events in asymptomatic individuals. Few or no such studies exist for many possible screening and prevention topics, and available studies often examine only intermediate or surrogate outcomes. Depending on the research question, different study designs were eligible for inclusion, and this evidence was synthesized in Summary of Findings tables as recommended by GRADE [6]. For example, because multiple trials examined the benefit of breast cancer screening with mammography, we focused on RCTs for benefits, especially trials that captured the joint effects of screening and any subsequent intervention. In contrast, the review of screening for type 2 diabetes found no relevant RCTs, but the GRADE approach allowed us to focus on patient-important outcomes to assess screening approaches and differences for subgroups. The review of screening for hypertension identified very limited RCT data, so we sought evidence from controlled cohort studies as our next step. To assess the optimal target age and frequency for cervical cancer screening, we included case-control studies in the search, as we identified no RCTs that addressed these questions and the number of available cohort studies was also limited. To assess the effect of screening with Pap smears on cervical cancer outcomes, we also considered the reports from large ecological studies, which had consistent and large effects.
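To show how study design feeds into a quality rating, the sketch below encodes the general GRADE convention that a body of RCT evidence starts as high quality and observational evidence starts as low, and that the rating can then move down or up. The rating can move down for risk of bias, inconsistency, indirectness, imprecision, or publication bias, and up for a large effect, a dose-response gradient, or plausible confounding that would reduce the observed effect. Treating the rating as simple arithmetic is a deliberate simplification: GRADE ratings are structured judgments, and the function name and parameters here are hypothetical, not part of any Task Force tool.

```python
LEVELS = ["very low", "low", "moderate", "high"]


def grade_quality(design: str, rate_down: int = 0, rate_up: int = 0) -> str:
    """Return a simplified quality-of-evidence level for one outcome.

    design    -- "rct" or "observational" (starting point of the rating)
    rate_down -- total levels subtracted for serious limitations
    rate_up   -- total levels added (e.g., for a large, consistent effect)
    """
    if design not in ("rct", "observational"):
        raise ValueError("design must be 'rct' or 'observational'")
    if rate_down < 0 or rate_up < 0:
        raise ValueError("rate_down and rate_up must be non-negative")
    start = LEVELS.index("high") if design == "rct" else LEVELS.index("low")
    # Clamp so the result never leaves the four-level scale.
    index = max(0, min(len(LEVELS) - 1, start - rate_down + rate_up))
    return LEVELS[index]


# Illustrative calls (not ratings taken from any Task Force review).
print(grade_quality("rct", rate_down=1))
print(grade_quality("observational", rate_up=1))
```

Under these assumptions, observational evidence with a large and consistent effect could be rated up one level, which is one way to read the role that the large ecological studies played in the cervical cancer review.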
For each guideline, we used large observational studies to assess the potential harms of screening, given that these outcomes are often not captured in RCTs. For example, in the diabetes review we recognized that anxiety and depression were a concern when screening the general population, and for hypertension we acknowledged that economic costs (such as lost work or insurance benefits) were especially relevant, so we sought evidence to estimate the effect of screening on these important harms across low- and high-risk subgroups. Previous Task Force methods used a more indirect approach to determining effectiveness, seeking evidence across a series of key questions rather than focusing on one central PICO question. In contrast, the GRADE approach used by the revitalized Canadian Task Force shifts more focus onto the combined effects of screening and intervention, formulated as the PICO question. We have also used contextual questions (supported by less comprehensive literature reviews) to assess other relevant pieces of evidence. For example, in breast cancer screening, contextual evidence highlighted the impact of false-positive results on the likelihood of returning for future screening. In the depression screening review, contextual questions helped to estimate patient preferences regarding pharmacologic treatments.

Fig. 1. Breast cancer analytical framework [8]. MRI = magnetic resonance imaging; CBE = clinical breast exam; BSE = breast self exam.

3.1. Subgroup analysis
Screening entire populations is sometimes difficult, impractical, or ineffective, with the potential for harms to exceed benefit and to unduly consume resources. In contrast, targeting higher risk groups may enhance the benefit-to-harm ratio of screening. When possible, we have used RCTs and observational studies to help define populations where the benefits of screening are likely to be greater than the harms. For example, in the breast cancer review we were able to consider the 40–49- and 50–69-year age groups separately to estimate the benefits and harms of mammography screening in each [8]. When controlled trials or observational studies were not available for screening interventions, we sought additional evidence for subgroup populations using modeling studies. For example, in the diabetes review we used economic analyses to provide supplementary evidence to help determine whether high-risk populations may benefit from screening and subsequent treatment. Our approach to integrating, grading, and presenting modeling evidence is beyond the scope of this article but will be addressed in future Canadian Task Force methods articles.
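The reason targeting higher risk groups can improve the balance of benefits and harms can be shown with a short worked sketch. It assumes that the same relative risk applies to each subgroup, so the absolute risk reduction is the baseline risk multiplied by (1 − relative risk) and the number needed to screen is its reciprocal. All numbers below are hypothetical and chosen only for illustration; they are not estimates from any Task Force review, and the sketch covers only the benefit side of the balance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subgroup:
    name: str
    baseline_risk: float  # risk of the outcome without screening over follow-up
    relative_risk: float  # risk with screening divided by risk without screening


def absolute_risk_reduction(group: Subgroup) -> float:
    """ARR = baseline risk x (1 - relative risk)."""
    return group.baseline_risk * (1.0 - group.relative_risk)


def number_needed_to_screen(group: Subgroup) -> float:
    """NNS = 1 / ARR; undefined when screening shows no absolute benefit."""
    arr = absolute_risk_reduction(group)
    if arr <= 0:
        raise ValueError("no absolute benefit, so NNS is undefined")
    return 1.0 / arr


# Hypothetical subgroups sharing one assumed relative risk of 0.80.
subgroups = [
    Subgroup("lower-risk group", baseline_risk=0.002, relative_risk=0.80),
    Subgroup("higher-risk group", baseline_risk=0.010, relative_risk=0.80),
]

for group in subgroups:
    arr = absolute_risk_reduction(group)
    nns = number_needed_to_screen(group)
    print(f"{group.name}: ARR = {arr:.4f}, NNS = {nns:.0f}")
```

With these illustrative inputs, the higher-risk group needs about one fifth as many people screened to prevent one outcome event. If the harms per person screened are similar across groups, the balance therefore shifts toward benefit as baseline risk rises.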
4. Values, preferences, and resource implications
We used a modified Delphi approach among Task Force members to rate the importance of clinical outcomes, which helped us determine the relative importance (value) of both benefits and harms. We also conducted additional evidence reviews to determine potential patient screening preferences. For example, we searched for evidence on patient preferences concerning screening and treatments for depression. To estimate resource implications, we considered the costs and feasibility of relevant tests and treatments. Finally, we used evidence reviews to determine differences in baseline prevalence, disease burden, and access to interventions among subgroups, such as Indigenous and other ethnic groups.
The GRADE approach helped refine our methodology for estimating benefits and harms relevant to screening various population groups. It helped to develop searches seeking the highest quality evidence related to our central PICO questions. We have shifted other important questions to contextual questions to ensure that we have evidence to support other important aspects of the guidelines. One such contextual question is: "What is the cost-effectiveness of <intervention> for <disease/condition> in <population>?" This contextual question is used to address resource implications: the fewer the resources consumed, the more likely a strong recommendation in favor of that intervention. The key output from the GRADE process is a detailed and explicit decision table that presents the basis of our recommendations, including the balance of benefits and harms and burden, confidence in estimates of effect (quality of evidence), estimates of patient values and preferences, and resource use.
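The elements of the decision table listed above can be pictured as a simple record. The sketch below is only one possible representation: the class name, field names, and example text are hypothetical and do not reproduce the Task Force's actual template.

```python
from dataclasses import dataclass, fields


@dataclass
class DecisionTable:
    """Hypothetical record of the basis of one recommendation."""
    question: str
    balance_of_benefits_and_harms: str
    quality_of_evidence: str
    values_and_preferences: str
    resource_use: str

    def as_rows(self) -> list[tuple[str, str]]:
        """Return (label, judgment) pairs in declaration order for display."""
        return [
            (f.name.replace("_", " ").capitalize(), getattr(self, f.name))
            for f in fields(self)
        ]


# Placeholder content for illustration only.
table = DecisionTable(
    question="Screening with <intervention> for <disease/condition> in <population>",
    balance_of_benefits_and_harms="summary of estimated benefits, harms, and burden",
    quality_of_evidence="confidence in the estimates of effect",
    values_and_preferences="summary of evidence on patient values and preferences",
    resource_use="summary of costs and feasibility",
)

for label, judgment in table.as_rows():
    print(f"{label}: {judgment}")
```

Keeping each factor as a separate field mirrors the point of the decision table, which is to make every input to the recommendation explicit.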
5. Other considerations
The challenges of integrating GRADE have included a need to invest in start-up training of both the Canadian Task Force members and the evidence centers regarding the GRADE approach. Working with the GRADE group has facilitated this process. In addition, our guideline working groups have had to learn how to construct an effective PICO prevention question and set up GRADE evidence profiles. Our biggest challenge has been how to efficiently determine the best available quality evidence when faced with a variety of low-quality trials or the absence of clinical trials. We have also had to accept that confidence in the effect of preventive interventions is often lower than confidence in the effect of pure treatment interventions, because preventive interventions aimed at the general population usually involve a complex series of testing and treatment steps that often depend on organizational factors. Fundamentally, moderate- to high-quality evidence does not exist for many questions. Therefore, judgment using the best quality evidence available remains a critical aspect of the process. GRADE improves on past approaches by making each step more explicit and transparent. GRADE also offers an opportunity to explicitly identify these gaps and guide future research and development.
6. Conclusions
The GRADE approach has allowed the Canadian Task Force to estimate the effect of screening, or screening plus intervention, on patient-important outcomes and to explicitly document the basis of our recommendations across subgroups. Although labor intensive, this process has improved the precision of estimated benefits, harms, and burdens and allowed explicit integration of other important pieces of knowledge, such as values and preferences. Better methods are needed for efficiently seeking and selecting the best evidence when faced with lower quality evidence and modeling studies.
Appendix
Composition of the topic-specific workgroups of the Canadian Task Force on Preventive Health Care
Each guideline produced by the Canadian Task Force is led by a topic-specific workgroup (panel). This workgroup consists of two to five Task Force members (one of whom is selected as chair), a scientific research manager from the Public Health Agency of Canada, and members from the Evidence Review and Synthesis Centre, a university-based research center that is responsible for conducting all Canadian Task Force evidence reviews. It also includes members from partner organizations, if any such organizations are involved for the particular topic [9].
References
[1] Canadian Task Force on the Periodic Health Examination. The periodic health examination. Can Med Assoc J 1979;121:1193–254.
[2] The Canadian Task Force on Periodic Health Examination. The Canadian guide to clinical preventive health care. Ottawa, ON: Minister of Public Works and Government Services Canada; 1994.
[3] Woolf SH, Atkins D. The evolving role of prevention in health care: contributions of the U.S. preventive services task force. Am J Prev Med 2001;20(Suppl 3):13–20.
[4] Guyatt GH, Oxman AD, Vist GE, Kunz R, Falck-Ytter Y, Alonso-Coello P, et al. GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ 2008;336:1049–51.
[5] Schünemann H, Brozek J, Oxman A, editors. GRADE handbook for grading quality of evidence and strength of recommendation. Version 3.2. The GRADE Working Group; 2009. Available at: www.cc-ims.net/gradepro. Accessed September 30, 2011.
[6] Glasziou P, Higgins J, editors. Obtaining a consensus on the content and methods of a Summary of Findings table for Cochrane Reviews Report to the Cochrane Collaboration Steering Group. Ottawa, ON: Cochrane Collaboration Steering Group; 2005.
[7] Dans LF, Silvestre MA, Dans AL. Trade-off between benefit and harm is crucial in health screening recommendations. Part I: general principles. J Clin Epidemiol 2011;64:231–9. PubMed PMID: 21194890.
[8] The Canadian Task Force on Preventive Health Care. Recommendations on screening for breast cancer in average-risk women aged 40–74 years. Can Med Assoc J 2011;183:1991–2001.
[9] Canadian Task Force on Preventive Health Care. Procedure manual. Available at: http://canadiantaskforce.ca/methods-manual-2011.html. Accessed November 21, 2011.