GRADE Evidence to Decision (EtD) frameworks: a systematic and transparent approach to making well informed healthcare choices. 2: Clinical practice guidelines
BMJ 2016; 353 doi: https://doi.org/10.1136/bmj.i2089 (Published 30 June 2016) Cite this as: BMJ 2016;353:i2089
All rapid responses
Thank you Professor Treweek. It is encouraging to note that I am not too off base here. Looking back, I believe the reasons for my original comment regarding the plain language summary were the following:
The view that the point estimate is our best estimate of the population parameter is, I suspect, contentious (for example, see section 4.2.4 of Spanos A. Philosophy of Econometrics. 2007)
While the Cochrane Handbook for Systematic Reviews of Interventions (online version 5.1.0) does, I acknowledge, state that “The point estimate (0.75) is the best guess of the magnitude and direction….”, it also states that “If the confidence interval was wider still, and included the null value of a difference of 0%, we will not have excluded the possibility that the treatment has any effect whatsoever, and would need to be even more sceptical in our conclusion”.
Thank you once again for your thoughtful replies.
Competing interests: No competing interests
Thanks for responding, Prof Ansari: you are completely correct; our article is saying that, compared with warfarin, dabigatran will “probably” result in 8 fewer deaths per 1000 patients, but our findings are compatible with the possibility that there might be no (additional) risk reduction.
Competing interests: No competing interests
Thank you Professor Treweek for responding to my comment. Perhaps what you are saying is that our best estimate is that, compared with warfarin, dabigatran will “probably” result in 8 fewer deaths per 1000 patients; however, our findings are compatible with the possibility that there might be no (additional) risk reduction. Would this be a fair statement to make?
Competing interests: No competing interests
Many thanks to Prof Ansari for his interest in and comments on the GRADE Evidence to Decision framework articles.
Prof Ansari's point about the plain language summary is interesting; it is indeed hard to capture health information in words alone without running the risk of introducing ambiguity or misunderstanding. There are, I think, at least three ways in which the Evidence to Decision framework can help to reduce this risk.
Firstly, although the confidence interval given for the risk of dying having taken dabigatran rather than warfarin runs from 17 fewer to 1 more per 1000 patients, the 'true' value is not equally likely to be any value in that range: it is much more likely to be around the point estimate of 8 fewer deaths than at the extreme end of the confidence interval of 1 more death. The bulk of the confidence interval is in favour of dabigatran for this outcome and we need to note that.
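As a purely illustrative aside (not part of the framework, and assuming the risk difference estimator is approximately normal), one can fit a normal curve to the reported point estimate and 95% confidence interval and ask how much of its probability mass lies on the 'fewer deaths' side; with a point estimate of 8 fewer and an interval of 17 fewer to 1 more per 1000, that is roughly 96%:

    # Illustrative sketch only: assumes an approximately normal risk-difference estimator
    from scipy.stats import norm

    point = -8.0                        # point estimate: 8 fewer deaths per 1000 (negative = fewer)
    lower, upper = -17.0, 1.0           # reported 95% CI: 17 fewer to 1 more per 1000
    se = (upper - lower) / (2 * 1.96)   # implied standard error, about 4.6 per 1000
    p_reduction = norm.cdf((0.0 - point) / se)   # mass below zero, i.e. a reduction in deaths
    print(f"P(risk reduction) ~ {p_reduction:.2f}")   # about 0.96

This is only a rough way of seeing why the bulk of the interval favours dabigatran; it is not how GRADE assesses the certainty of the evidence.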
Secondly, the plain language text in the example is structured so that the wording reflects the size of the treatment effect, the confidence interval and the GRADE assessment of the uncertainty in the evidence. The word 'probably' is used carefully and as part of a consistent approach to producing a short, words-only summary of the evidence. How people interpret 'probably' will vary, but how we use it in the Evidence to Decision framework should not. Here the plain language text is intended to convey the message that while there is a chance that dabigatran increases the risk of dying, this is much less likely than the chance that it reduces that risk.
Finally, the Evidence to Decision framework presents information in several forms, some words-only, others numerical, with the option for a user to dig deeper if needed. While words or numbers alone may at times be ambiguous, we hope that by providing information in a number of formats, the chance of ambiguity is reduced and people can reach more informed conclusions.
Competing interests: I am a coauthor of the paper being discussed.
I very much enjoyed reading Paper 1 in this series of two articles. As I am in the process of reading this second paper, I have one minor comment to share:
In Figure 1, for the outcome of death, the 95% CI is 17 fewer to 1 more per 1000 patients. Because it is not possible for me to know which value within the 95% confidence interval denotes the population parameter, I am not sure I can conclude, as the plain language summary does, that “Dabigatran probably reduces the risk of dying” with moderate confidence. If I can, then maybe I can also conclude with equal certainty that dabigatran probably does not reduce the risk of dying. And if the language of the outcome-specific plain language summary has any (intentional or unintentional) role to play in going from evidence to decisions, then over-focusing on the point estimate could be potentially problematic.
Competing interests: No competing interests
Re: GRADE Evidence to Decision (EtD) frameworks: a systematic and transparent approach to making well informed healthcare choices. 2: Clinical practice guidelines
The claim that “a point estimate is our best guess of the magnitude and direction of an effect” needs to be qualified because it represents an observed value of a random variable (the estimator), which often takes an infinite number of possible values; hence the need to quantify, somehow, the uncertainty associated with the point estimate. This often motivates the use of confidence intervals (CIs), which aim to provide a way to calibrate this uncertainty using coverage probabilities. Unfortunately, the latter do not extend to the observed CI, despite numerous questionable attempts to assign probabilities to it. The reason is that, post-data, the observed CI either includes or excludes the true value of the effect, but we do not know which is the case. Similarly, the p-value can be very misleading as an indication of “the magnitude of the effect” because of its dependence on the sample size. It does, however, indicate the direction of the effect; see Mayo and Spanos (2006).
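To make the pre-data nature of coverage concrete, a short simulation (an illustration under an assumed one-sample normal model with known variance, not taken from the papers cited) shows that roughly 95% of intervals constructed over repeated samples cover the true value, whereas any single observed interval simply does or does not:

    # Illustration: coverage is a property of the interval-constructing procedure over repeated samples
    import numpy as np

    rng = np.random.default_rng(0)
    true_mu, sigma, n, reps = 0.0, 1.0, 50, 10_000
    half = 1.96 * sigma / np.sqrt(n)       # half-width of a known-sigma 95% CI
    covered = 0
    for _ in range(reps):
        xbar = rng.normal(true_mu, sigma, n).mean()
        covered += (xbar - half <= true_mu <= xbar + half)
    print(f"Coverage over {reps} samples: {covered / reps:.3f}")   # close to 0.95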
The best way to establish “the magnitude and direction of an effect” is to use the post-data severity evaluation in conjunction with Neyman-Pearson (N-P) testing. This evaluation establishes the discrepancy from the null value warranted by the particular data. It can be shown that the post-data severity associated with the point estimate has probability .5, but one would prefer to establish warranted discrepancies from the null with much higher probability, say .9 or even .95. Moreover, observed CIs include values with very low severity; see Spanos (2014).
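To illustrate the severity calculation (a simplified one-sample normal model with known variance and invented numbers, following the logic of Mayo and Spanos (2006), not a reconstruction of any analysis in the article), the severity of the claim mu > mu1, after rejecting H0: mu <= 0, falls from near 1 for small discrepancies to exactly 0.5 at the point estimate:

    # Illustrative post-data severity for claims mu > mu1 (one-sample normal, known sigma)
    import numpy as np
    from scipy.stats import norm

    n, sigma = 100, 1.0
    xbar = 0.2                     # hypothetical observed mean (point estimate of mu)
    se = sigma / np.sqrt(n)        # standard error = 0.1

    def severity(mu1):
        # SEV(mu > mu1) = probability of observing a result no larger than xbar if mu were mu1
        return norm.cdf((xbar - mu1) / se)

    for mu1 in (0.0, 0.1, 0.2, 0.3):
        print(f"SEV(mu > {mu1:.1f}) = {severity(mu1):.2f}")
    # 0.98, 0.84, 0.50 (at the point estimate), 0.16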
This confirms the comment quoted by Ansari: “If the confidence interval was wider still, and included the null value of a difference of 0%, we will not have excluded the possibility that the treatment has any effect whatsoever, and would need to be even more skeptical in our conclusion.” The post-data severity evaluation is used to establish, with high probability, the discrepancy from the null warranted by the particular data. It enables a practitioner to go beyond accept/reject decisions, p-values and observed CIs, in order to arrive at an evidential interpretation relating to “the magnitude and direction of an effect”. Such warranted discrepancies, established with high severity, can be used as reliable guides for evidence-based decision-making.
Mayo, D. G. and Spanos, A. (2006), "Severe testing as a basic concept in a Neyman-Pearson philosophy of induction," British Journal for the Philosophy of Science, 57: 323-57.
Spanos, A. (2014), "Recurring controversies about P values and confidence intervals revisited," Ecology, 95(3): 645-651.
Competing interests: No competing interests