Methods to calculate uncertainty in the estimated overall effect size from a random-effects meta-analysis
Abstract
Meta-analyses are an important tool within systematic reviews to estimate the overall effect size and its confidence interval for an outcome of interest. If heterogeneity between the results of the relevant studies is anticipated, then a random-effects model is often preferred for analysis. In this model, a prediction interval for the true effect in a new study also provides additional useful information. However, the DerSimonian and Laird method—frequently used as the default method for meta-analyses with random effects—has long been challenged due to its unfavorable statistical properties. Several alternative methods have been proposed that may have better statistical properties in specific scenarios. In this paper, we aim to provide a comprehensive overview of available methods for calculating point estimates, confidence intervals, and prediction intervals for the overall effect size under the random-effects model. We indicate whether some methods are preferable to others by considering the results of comparative simulation and real-life data studies.
1 INTRODUCTION
Systematic reviews and meta-analyses provide a method for collecting and synthesizing research and are often used to inform decision making. The number of these publications has increased substantially since the 1990s.1 Meta-analysis is a valuable technique to summarize study-specific results and often reduces bias and uncertainty from individual studies. Guidelines and Health Technology Assessment panels, as well as international organizations, including the World Health Organization,2 the European Medicines Agency,3 and governmental agencies worldwide, such as the Canadian Agency for Drugs and Technologies in Health,4 the Institute for Quality and Efficiency in Health Care (IQWiG),5 and the National Institute for Health and Clinical Excellence,6 recognize the need to ensure that the best available evidence informs clinical practice and health care decision making. This typically involves conducting a high-quality knowledge synthesis and meta-analysis. Quantitative results of meta-analyses of relevant studies, in the form of a point estimate and a confidence interval (CI) for the effect size parameter of interest, are invariably considered together with judgements about the quality of the evidence to produce recommendations for practice.7
Quantification of uncertainty in the estimated overall effect size is important in the process of drawing conclusions from a meta-analysis. This uncertainty should ideally account for between-study heterogeneity in the intervention effects across study settings and populations.8, 9 For this reason, the random-effects model is often employed, which includes a between-study variance parameter. The uncertainty of the estimate of the overall effect size can be described by the corresponding CI under the random-effects model, and its width depends on the magnitude of the between-study variance, the number of studies, the precision of the study-specific effect sizes, and the significance level.10
The estimation of the CI for the overall effect size is often conducted with the Wald-type (WT) method using a normal distribution, with variance equal to the inverse of the sum of the study weights, and the DerSimonian and Laird11 estimator for the between-study variance, and this has been used routinely in many meta-analyses.12 However, numerous shortcomings of this approach have been raised, including that the CI for the overall effect size generally does not retain its nominal coverage probability (ie, the proportion of times that the interval includes the true value), and hence, it underestimates the statistical error, producing overconfident results.12-18 This is mainly because the WT CI is based on a large-sample approximation (in terms of the number of studies), and the number of studies is usually small. Typically, the number of studies synthesized in a meta-analysis in medical research is less than 20, suggesting that any large-sample approximation is likely to be inaccurate.19-23 Several attempts to improve the standard WT CI approach have been suggested, each of which has different statistical properties.
Another important aim in decision making is the prediction of the true effect size in an individual (future) study and setting. Higgins et al24 suggested the use of prediction intervals under the random-effects model for this purpose. The use of prediction intervals has been promoted, and although they have not often been employed in practice, they provide useful additional information.25, 26
In this paper, we aim to provide a comprehensive overview of available methods in the methodological literature for calculating a CI for the overall effect size under the random-effects model and to indicate whether some methods are preferable to others by considering the results of comparative simulation and real-life data studies. We also examine potential issues surrounding the computation of prediction intervals under the random-effects model.
The article is structured as follows. In Section 2, we present the conventional meta-analytic models and set up our notation (section 2.1) and describe our review methodology (section 2.2). In Section 3, we describe the statistical methods found in our literature review. In section 3.1, we describe 15 identified methods to calculate a CI for the overall effect size. In section 3.2, we discuss the comparative performances of different methods for computing a CI for the overall effect size, as described in previous studies, and summarize recommendations made by their respective authors. In section 3.3, we discuss methods for computing prediction intervals. We conclude with a discussion of all intervals (confidence and prediction intervals) in Section 4.
2 METHODS
2.1 Meta-analysis models and notation
The conventional fixed- and random-effects models are the two main meta-analysis models to synthesize the study results.27 The random-effects model accounts for two sources of variation, quantified by the within-study variance (vi) and the between-study variance (τ2). When the between-study variance estimate is zero (τ2 = 0), the fitted random-effects model collapses to the fixed-effect model (also known as common-effect model),28, 29 and therefore, the random-effects model can be considered a generalization of the fixed-effect model (ie, the fixed-effect model is a special case of the random-effects model). Confidence intervals under the fixed-effect model can have poor properties even for low but nonzero heterogeneity.13, 30
Both the conventional fixed- and random-effects models require an estimated effect size yi (such as log-odds ratio) and an estimated (within-study) variance vi (var(yi) = vi) from every included study i, i = 1, …, k. The choice between the two models has been widely discussed in the literature31-33 and summarized in the Cochrane Handbook.9 In this paper, we focus on the random-effects meta-analysis model using inverse-variance weighting. Other techniques to combine study information to calculate the overall effect size are also available, such as weighting by sample size34, 35 or using confidence distributions.36, 37 Also, dichotomous outcomes do not require inverse-variance methods, as they can be modeled directly using one-step models (eg, generalized linear mixed models).38 Alternative methods accounting for heterogeneity by the use of a multiplicative parameter, where study weights are independent of observed heterogeneity, are also available.10, 39, 40 The description of these methods is beyond the scope of this review.
Under the random-effects model, the observed effect size in study i is modeled as

$y_i = \mu + \delta_i + \varepsilon_i, \quad \delta_i \sim N(0, \tau^2), \quad \varepsilon_i \sim N(0, v_i),$

where μ is the overall effect size, δi is the deviation of the true effect in study i from μ, and εi is the sampling error. Using inverse-variance weighting, the overall effect size and its variance are estimated as

$\hat{\mu} = \frac{\sum_{i=1}^{k} \hat{w}_i y_i}{\sum_{i=1}^{k} \hat{w}_i}, \quad \hat{w}_i = \frac{1}{v_i + \hat{\tau}^2}, \quad \widehat{\operatorname{Var}}(\hat{\mu}) = \frac{1}{\sum_{i=1}^{k} \hat{w}_i},$

where $\hat{\tau}^2$ is an estimate of the between-study variance. Setting $\hat{\tau}^2 = 0$ gives the fixed-effect estimate and its variance.
The uncertainty in an estimated effect size for a given study, in relation to its study-specific true effect size, is expressed via the within-study variance vi. The standard approach described above assumes that the estimated within-study variances vi are fixed and known although they have to be estimated from the data. This assumption is justifiable when each study size is sufficiently large. The vi estimation is sensitive not only to the study size but also to the data type and effect size used. For example, the vi estimator for continuous outcomes when using the standardized mean difference depends on the estimated effect size. Hence, although vi is assumed fixed and known, vi is in fact estimated with some uncertainty. Several authors point out that this assumption could affect the estimation of the overall effect size, its variance, and related inferences.13, 41-45 Therefore, calculating study weights through within-study variances and assuming that they are known constants may have less desirable properties. This issue has previously been discussed, and some attempts have been made to account for the uncertainty in the weights.46-49 Hence, among other factors, the performance of the CI methods depends on how well we estimate the study weights.
Similarly, the estimation of τ2 is performed with some uncertainty, and this uncertainty depends on the size and number of studies in the meta-analysis, as well as the magnitude of the between-study variability. Factors such as these have implications for the accuracy of the standard statistical methods described in this paper, which has motivated many of the attempts to improve upon them. There are also many methods to estimate τ2, any of which can be used in some of the CI methods described below, and we refer to a previous publication on this topic.50 Also, for a review of the simulation studies evaluating the comparative performance of τ2 estimators, we direct the reader elsewhere.51
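To make the quantities above concrete, the following minimal base-R sketch computes the DerSimonian and Laird estimate of τ2 and the random-effects estimate of μ. The data values are hypothetical and purely illustrative; later sketches in this review reuse these objects.

```r
# Hypothetical data: log-odds ratios (yi) with within-study variances (vi)
yi <- c(-0.40, -0.11, -0.63, -0.25, 0.03, -0.51)
vi <- c(0.04, 0.10, 0.07, 0.02, 0.12, 0.09)
k  <- length(yi)

# DerSimonian and Laird (DL) estimator of the between-study variance tau^2
w_f  <- 1 / vi                                   # fixed-effect weights
mu_f <- sum(w_f * yi) / sum(w_f)                 # fixed-effect estimate
Q    <- sum(w_f * (yi - mu_f)^2)                 # Cochran's Q statistic
tau2 <- max(0, (Q - (k - 1)) / (sum(w_f) - sum(w_f^2) / sum(w_f)))

# Random-effects weights, overall effect size, and its variance
w    <- 1 / (vi + tau2)
mu   <- sum(w * yi) / sum(w)
v_mu <- 1 / sum(w)
```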
2.2 Review methods
We searched PubMed from inception until April 29, 2016, to identify full-text research articles that describe or compare methods for calculating a CI for the overall effect size μ in simulations or in real data sets. We scanned the references of the selected articles for additional relevant articles, and we conducted general internet searches using the web search engine Google. We also used our networks of professional collaborations to identify potentially relevant articles. We included all studies that report the development or comparison of methods to calculate a CI for the overall effect size under the random-effects model. We also included studies reporting on prediction interval methods identified from our internet searches and networks of collaborations. We excluded commentaries, abstracts, and studies written in languages other than English, and studies relating to the hypothesis tests for the overall effect size. We restricted our investigation to CI methods developed under the random-effects meta-analysis model that assume the true study-specific effects are normally distributed, while we excluded CI methods developed for network meta-analysis, one-stage individual patient data meta-analysis, meta-analysis of diagnostic test accuracy studies, meta-analysis of multiple outcomes, and meta-regression analysis. One reviewer (AAV) summarized methods and studies' conclusions from each included article and recorded any conclusions from comparative articles (studies that compare at least two methods). The information extracted refers to the performance of the various methods and the judgements deduced about their relative advantages, and this information was checked by all coauthors. Disagreements were resolved by discussion. The PubMed search strategy is included in Appendix S1.
We describe known properties of the methods in terms of coverage probability and CI width in section 3.2. The closer the coverage probability is to the nominal level (usually 0.95), the better the CI is considered to be. A CI is exact when the actual coverage equals the nominal coverage. The coverage probability is closely related to the type I error of the hypothesis test on the overall effect size: Assuming the null hypothesis is true, one minus the type I error rate is the coverage probability. A further criterion for comparing methods is that methods that provide narrower CIs, whilst retaining the correct coverage probability, are preferable because they increase precision and hence are more informative. All statistics presented in this paper refer to two-tailed tests.
3 RESULTS
The database search returned 5628 matches in PubMed and 20 records identified through other sources and searching reference lists. In total, 69 publications met the eligibility criteria, which are listed in Appendix S2. We identified 15 methods to compute a CI for the overall effect size. The properties of those methods have been evaluated in 31 research papers, which together report 30 simulation studies and 32 real-life data evaluations of two or more methods (some papers report both). Below, we present the 15 identified approaches in seven broad categories, and in a separate section, we present the comparative results of the identified simulations and studies using real data sets (see Appendices S3-S4 for simulation scenarios and study characteristics and Appendix S5 for a summary of performance measures in simulation studies). In Table 1, we summarize the methods available in several software options.112
Software | License type | WTz (Method 1) | WTt (Method 2) | HKSJ^c (Method 4) | Modified HKSJ (Method 5) | PL (Method 6) | Skovgaard (Method 8) | HC^b (Method 9) | BT^d (Method 10) | Bootstrap (Methods 12, 13) | FP (Method 14) | Bayes (Method 15)
---|---|---|---|---|---|---|---|---|---|---|---|---
Comprehensive Meta-Analysis | Commercial | Yes | - | Yes | - | - | - | - | - | - | - | -
Excel—MetaEasy AddIn | Freeware | Yes | Yes | - | - | Yes | - | - | - | - | Yes | -
Excel—MetaXL AddIn | Freeware | Yes | - | - | - | - | - | - | - | - | - | -
HLM | Commercial | - | - | - | - | Yes | - | - | - | - | - | -
Meta-DiSc | Freeware | Yes | - | - | - | Yes | - | - | - | - | - | -
Metawin | Commercial | Yes | - | - | - | - | - | - | - | Yes | - | -
MIX | Commercial | Yes | - | - | - | - | - | - | - | - | - | -
MLwin | Freeware | Yes | - | - | Yes | - | - | - | Yes | - | Yes | -
Open Meta Analyst | Freeware | Yes | - | - | - | - | - | - | - | - | - | -
RevMan | Freeware | Yes | - | - | - | - | - | - | - | - | - | -
R | Freeware | Yes (MAd, meta, metafor, metagen, metaLik, metamisc, metaSEM, metatest, metaplus, mvmeta, mvtmeta, netmeta, rmeta) | Yes (metaplus) | Yes (MAd, meta, metafor, metamisc, metatest) | - | Yes (metaLik, metaplus) | Yes (metaLik, metatest) | Yes (metafor) | Yes (metaxa^e) | Yes (metaplus, boot) | Yes (metafor) | Yes (bayesmeta^f, blme, BRugs, gemtc, metamisc, R2WinBUGS, rjags)
SAS | Commercial | Yes (marandom.sas, PROCs GLM and MIXED) | Yes (PROCs GLM and MIXED) | - | - | Yes (marandom.sas, PROC NLP) | - | - | - | - | - | Yes (PROC MCMC, SASBUGS)
Stata | Commercial | Yes (metaan, metan, metareg, mvmeta, xtreg) | - | - | Yes (metareg) | Yes (gllamm, metaan) | - | - | - | Yes (bootstrap) | Yes (metaan) | -
SPSS | Commercial | Yes (meanes.sps, metaf.sps, metareg.sps) | - | - | - | - | - | - | - | - | - | -
BUGS, OpenBUGS, WinBUGS | Freeware | - | - | - | - | - | - | - | - | - | - | Yes
- Abbreviations: BT, Biggerstaff and Tweedie; FP, Follmann and Proschan; HC, Henmi and Copas; HKSJ, Hartung-Knapp/Sidik-Jonkman; PL, Profile likelihood; WTz, Wald-type with a normal distribution; WTt, Wald-type with a t distribution; WTqa, Quantile approximation; ZL, Zeng and Lin.
- ‘-’ denotes that the underlying method is not available in the specific software.
- a A resampling test is available in the R package metatest.52
- b Henmi and Copas30 provide an R code to implement the HC method.
- c IntHout et al16 provide an approach to easily convert WTz CIs to HKSJ CIs.
- d Biggerstaff and Tweedie53 provide a SAS code to implement the BT method.
- e This package uses the exact random-effects weights in the Biggerstaff and Tweedie approach.54
- f Bayesian approaches can be implemented using Markov chain Monte Carlo (MCMC) techniques in several software packages, such as OpenBUGS55 or WinBUGS,56 or without MCMC, as described by Turner et al57 and implemented in the R package bayesmeta.58
- Comprehensive Meta-Analysis59 www.meta-analysis.com/
- Excel using the MetaEasy AddIn60 https://www.jstatsoft.org/article/view/v030i07 or MetaXL AddIn http://www.epigear.com/
- HLM61 http://www.ssicentral.com/hlm/
- Meta-DiSc62 ftp://ftp.hrc.es/pub/programas/metadisc/
- Metawin63 http://www.metawinsoft.com/
- MIX64 www.mix-for-meta-analysis.info/
- MLwin65 http://www.bristol.ac.uk/cmm/software/mlwin/
- Open Meta Analyst66 http://www.cebm.brown.edu/openmeta/
- RevMan67 www.cochrane.org/
- R68 http://www.r-project.org/ Packages: bayesmeta,58 blme,69 boot,70, 71 BRugs,72 MAd,73 mada,74 meta,75 gemtc,76 metafor,77 metagen,78 metaLik,79, 80 metamisc,81 metaplus,82 metaSEM,83 metatest,52, 84 metaxa,54 mvmeta,85 mvtmeta,86 netmeta,87 R2WinBUGS,88 rjags,72 rmeta.89
- SAS90 http://www.sas.com/technologies/analytics/statistics/stat/ Macros: marandom.sas,91 PROC IML,92 PROC MIXED,93, 94 PROC GLIMMIX,95 SASBUGS,96 RASmacro,97 PROC MCMC.98
- Stata99 www.stata.com/ Routines: bootstrap,100 gllamm,101 metaan,102 metareg,103 metan,104 mvmeta,105 xtreg.106
- SPSS107 http://www.spss.co.in/ Macros: meanes.sps,108 metaf.sps,109 metareg.sps.110
- BUGS,111 OpenBUGS,55 WinBUGS56 www.mrc-bsu.cam.ac.uk/bugs/.
3.1 CI for the overall effect size
3.1.1 WT methods
- Wald-type normal distribution (WTz) CIs (method 1)

The WTz method assumes that the standardized overall effect size, $(\hat{\mu} - \mu) / \sqrt{\widehat{\operatorname{Var}}(\hat{\mu})}$, follows a standard normal distribution, giving the 100(1 − α)% CI

$\hat{\mu} \pm z_{1-\alpha/2} \sqrt{1 \Big/ \sum_{i=1}^{k} \hat{w}_i},$

where $z_{1-\alpha/2}$ is the $1 - \alpha/2$ quantile of the standard normal distribution. The DerSimonian and Laird11 estimator of τ2 is typically used in the weights, although any other estimator may be employed.
- Wald-type t-distribution (WTt) CIs (method 2)

The WTt method replaces the standard normal quantile with the corresponding quantile of the t distribution with k − 1 degrees of freedom, giving the wider CI $\hat{\mu} \pm t_{k-1, 1-\alpha/2} \sqrt{1 / \sum_{i=1}^{k} \hat{w}_i}$.
- Quantile approximation (WTqa) CIs (method 3)

Brockwell and Gordon122 proposed replacing the standard normal quantile with an empirically derived critical value bk that depends on the number of studies k, obtained by simulating the distribution of the standardized overall effect size, giving the CI $\hat{\mu} \pm b_k \sqrt{1 / \sum_{i=1}^{k} \hat{w}_i}$.
However, both the number of studies k and the magnitude of τ2 may impact on the performance of the WTqa method,122 and changes in the distribution of the within-study variances can importantly impact on bk.127 Although the WTqa approach has been criticized on the grounds that it is, at best, very difficult to obtain suitable critical values bk that apply to all meta-analyses,127 we include it in this paper for completeness. As a conservative approach, Jackson and Bowden127 suggested the use of the standard normal quantile instead, together with an assessment of the robustness of the findings via a sensitivity analysis using alternative quantiles. Brockwell and Gordon122 developed the WTqa method using the DerSimonian and Laird11 estimator of τ2, but WTqa could, in principle, be implemented with any alternative τ2 estimator. However, further development of the WTqa CI is not advised.127
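As a sketch of methods 1 and 2, continuing the base-R example from section 2.1 (WTqa is indicated only schematically, since the critical values bk must be obtained by simulation):

```r
alpha <- 0.05
se    <- sqrt(v_mu)

wtz <- mu + c(-1, 1) * qnorm(1 - alpha / 2) * se           # method 1: normal quantile
wtt <- mu + c(-1, 1) * qt(1 - alpha / 2, df = k - 1) * se  # method 2: t quantile, k - 1 df
# Method 3 (WTqa) replaces qnorm(1 - alpha / 2) with a simulated critical value b_k
```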
3.1.2 Hartung-Knapp/Sidik-Jonkman CIs (methods 4 and 5)

The HKSJ method (method 4) scales the variance of the overall effect size by a small-sample adjustment q and uses the t distribution with k − 1 degrees of freedom:

$q = \frac{1}{k-1} \sum_{i=1}^{k} \hat{w}_i (y_i - \hat{\mu})^2, \qquad \hat{\mu} \pm t_{k-1, 1-\alpha/2} \sqrt{\frac{q}{\sum_{i=1}^{k} \hat{w}_i}}.$

The adjustment q is the generalized heterogeneity statistic Qgen, computed with the random-effects weights $\hat{w}_i$, divided by its degrees of freedom k − 1.
When all variance components, including the between-study variances, are fixed and known, the expected value of Qgen is k − 1, which equals the degrees of freedom of the associated χ2 distribution.129-131 Hence, the small-sample adjustment q will tend to be close to 1. However, q may in fact turn out to be much smaller than 1, such as in cases where the effect sizes are very homogeneous or when the number and/or size of studies is small. This leads to a narrower CI than the WTt approach and can also lead to a narrower CI compared with the WTz method.132, 133 Although the t quantile is always larger than the z quantile associated with the WTz method, explaining in part why the HKSJ CI performs better than the WTz CI, in the case of $\sqrt{q}\, t_{k-1, 1-\alpha/2} < z_{1-\alpha/2}$, the HKSJ CI will be narrower than the WTz CI.

To avoid such overly narrow intervals, Knapp and Hartung125 suggested a modified HKSJ method (method 5) in which the adjustment is truncated at 1, that is, q* = max(1, q) is used in place of q, so that the resulting CI is never narrower than the WTt CI.

Sidik and Jonkman120 also considered an alternative estimator of the variance of the overall effect size,

$\widehat{\operatorname{Var}}_r(\hat{\mu}) = \frac{\sum_{i=1}^{k} \hat{w}_i^2 (y_i - \hat{\mu})^2}{\left(\sum_{i=1}^{k} \hat{w}_i\right)^2}.$

This is a robust estimator of $\operatorname{Var}(\hat{\mu})$, where the inverse of the study weights ($v_i + \tau^2$) is estimated through the squared sample residuals ($(y_i - \hat{\mu})^2$) from the data, rather than assuming $\operatorname{var}(y_i) = v_i + \tau^2$.
However, Sidik and Jonkman120 state that $\widehat{\operatorname{Var}}_r(\hat{\mu})$ is biased when k is small, and hence, they suggest a bias-corrected estimator of $\operatorname{Var}(\hat{\mu})$ (see Sidik and Jonkman120 for details). An alternative approach based on the expected information and on appropriately modified degrees of freedom of the t distribution was suggested by Kenward and Roger.136 These alternative expressions for $\operatorname{Var}(\hat{\mu})$ could also be used in WT CIs but have not been adopted in practice, so we do not explore their use further here.
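Continuing the base-R sketch, the HKSJ and modified HKSJ CIs follow directly from the quantities already computed; the truncation q* = max(1, q) is what guarantees the method-5 interval is never narrower than WTt:

```r
# HKSJ (method 4): scale the variance by the small-sample adjustment q
q       <- sum(w * (yi - mu)^2) / (k - 1)
se_hksj <- sqrt(q / sum(w))
hksj    <- mu + c(-1, 1) * qt(1 - alpha / 2, df = k - 1) * se_hksj

# Modified HKSJ (method 5): truncate q at 1
q_star <- max(1, q)
mhksj  <- mu + c(-1, 1) * qt(1 - alpha / 2, df = k - 1) * sqrt(q_star / sum(w))
```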
3.1.3 Likelihood-based methods
- Profile likelihood CIs (method 6)

The profile likelihood (PL) method constructs a CI by inverting the likelihood ratio statistic. Under the random-effects model, the log-likelihood is

$\ell(\mu, \tau^2) = -\frac{1}{2} \sum_{i=1}^{k} \left[ \ln\!\left(2\pi(v_i + \tau^2)\right) + \frac{(y_i - \mu)^2}{v_i + \tau^2} \right],$

and the profile log-likelihood for μ is obtained by maximizing $\ell(\mu, \tau^2)$ over τ2 at each fixed value of μ. The 100(1 − α)% PL CI consists of all values μ0 satisfying

$2\left[\ell(\hat{\mu}, \hat{\tau}^2) - \ell(\mu_0, \hat{\tau}^2(\mu_0))\right] \le \chi^2_{1, 1-\alpha},$

where $\hat{\mu}$ and $\hat{\tau}^2$ are the maximum likelihood estimates and $\hat{\tau}^2(\mu_0)$ maximizes the likelihood at μ = μ0. In contrast to the WT methods, the resulting interval need not be symmetric around $\hat{\mu}$, and the estimation of τ2 is taken into account directly.
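A minimal base-R sketch of the PL computation, profiling τ2 numerically and inverting the likelihood ratio statistic. The search bounds (τ2 at most 10, μ within ±5 standard errors) are pragmatic assumptions for this illustration and may need widening for other data:

```r
# Profile log-likelihood for mu (up to an additive constant)
prof_ll <- function(mu0) {
  nll <- function(t2) 0.5 * sum(log(vi + t2) + (yi - mu0)^2 / (vi + t2))
  -optimize(nll, interval = c(0, 10))$objective  # maximize over tau^2
}

opt    <- optimize(prof_ll, interval = mu + c(-5, 5) * sqrt(v_mu), maximum = TRUE)
mu_ml  <- opt$maximum                    # maximum likelihood estimate of mu
ll_max <- opt$objective
drop   <- qchisq(1 - alpha, df = 1) / 2  # allowed drop in the log-likelihood

# CI bounds: where the profile log-likelihood crosses ll_max - drop
lo <- uniroot(function(m) prof_ll(m) - (ll_max - drop),
              c(mu_ml - 5 * sqrt(v_mu), mu_ml))$root
hi <- uniroot(function(m) prof_ll(m) - (ll_max - drop),
              c(mu_ml, mu_ml + 5 * sqrt(v_mu)))$root
c(lo, hi)
```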
- Higher order likelihood inference methods (methods 7 and 8)
As Reid137 explains, the main asymptotic properties of likelihood-based inference include (a) consistent, asymptotically normal, and efficient maximum likelihood estimators; (b) an asymptotically normally distributed score statistic with mean zero; and (c) an asymptotic chi-squared distributed likelihood ratio statistic. For example, the PL CI in the previous section relies upon the third of these standard results. As Reid137 also explains, higher order asymptotic results for likelihood-based inference are also available. Some higher order likelihood inference methods have recently been applied to meta-analysis, which is a situation where they may be thought to be especially valuable. This is because the number of studies is often small, so that the commonly used “lower-order” asymptotic approximations to the likelihood function will be inadequate. Higher order likelihood–based methods therefore have the potential to produce more accurate results in meta-analysis, and several proposals for this have been made. We briefly summarize the methods here, but the details are technical, and so we refer the reader to the articles cited below for more information.
The Bartlett-type correction of the likelihood ratio statistic was first introduced by Bartlett (method 7).138 Noma17 explains how to apply this to random-effects meta-analysis and so use a higher-order approximation than the PL method above. Noma17 also explains how to use the score statistic to compute CIs and subsequently derives a higher order Bartlett-type adjustment to this score. Skovgaard139 proposed an alternative higher order approximation to the profile log-likelihood (method 8), and Guolo79 explains how to apply this to random-effects meta-analysis. For details on the method, we direct the reader elsewhere.79, 140 The higher order asymptotic methods have a higher degree of accuracy, but in some cases (eg, when the between-study variance is close to zero), they may produce numerically unstable maximum likelihood estimates due to the discontinuity of the statistic.79, 140-142 In such cases, a bias reduction approach is suggested.143 Hence, the Bartlett-type correction (method 7) and the Skovgaard statistic (method 8) are the two main proposals for higher order approximations when using methods based on the PL.79
3.1.4 Henmi and Copas CIs (method 9)

Henmi and Copas30 (HC) base inference on the fixed-effect point estimate, $\hat{\mu}_F = \sum_{i=1}^{k} (y_i / v_i) \big/ \sum_{i=1}^{k} (1 / v_i)$, which gives relatively less weight to small studies and is therefore less vulnerable to publication bias than the random-effects estimate. The variance of $\hat{\mu}_F$ is, however, evaluated under the random-effects model, and the CI is derived from an improved approximation to the distribution of the resulting standardized statistic; for details, see Henmi and Copas.30
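As noted in Table 1, the R package metafor implements the HC method; a minimal sketch, reusing the hypothetical data from section 2.1 (consult the package documentation for the full interface):

```r
library(metafor)

res <- rma(yi = yi, vi = vi, method = "DL")  # standard DL random-effects fit
hc(res)                                      # Henmi-Copas estimate and 95% CI
```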
3.1.5 Biggerstaff and Tweedie CIs (method 10)

Biggerstaff and Tweedie53 (BT) account for the uncertainty in the DerSimonian and Laird11 estimator of τ2. They approximate the distribution of the Q statistic, and hence of $\hat{\tau}^2_{DL}$, and use it to compute adjusted study weights $\tilde{w}_i = E\left[1 / (v_i + \hat{\tau}^2_{DL})\right]$ in place of $\hat{w}_i$. The overall effect size and its variance are then estimated using these adjusted weights, and a WT-type CI is constructed. A refinement using exact, rather than approximate, random-effects weights is also available.54
3.1.6 Resampling methods
- Zeng and Lin CIs (method 11)
Zeng and Lin (ZL)146 examine the distribution of the estimated overall effect size under the random-effects model and find that it is not asymptotically normally distributed for a finite number of studies k. This makes intuitive sense, because the textbook result that a linear combination of normal random variables is normally distributed requires that the coefficients in this linear combination are constants. When estimating the overall effect size however, these coefficients are proportional to the weights and so are functions of the estimated between-study variance. We require a large number of studies in order to estimate this variance accurately enough to treat the weights as fixed constants. Zeng and Lin146 therefore propose a resampling procedure that approximates the finite-sample distribution of the standardized overall effect size and uses quantiles of this resampling distribution, instead of standard normal quantiles, to construct the CI.
- Bootstrap CIs (methods 12 and 13)
Nonparametric bootstrapping is a way to approximate the sampling distribution of a statistic by resampling, from the sample itself, with replacement. Parametric bootstrapping instead samples from a fitted model. Both forms of bootstrapping can be used to make a variety of inferences but are most usually used to quantify the uncertainty in point estimates through the computation of standard errors and CIs. Briefly, bootstrap datasets are sampled (either nonparametrically, method 12,147, 148 or parametrically, method 13),149 from which the bootstrap statistics (the statistic of interest calculated using the bootstrap datasets) are calculated. Then, the empirical distribution of the bootstrap statistics is taken to approximate the distribution of the statistic of interest. Hence, measures of the uncertainty in the statistic, such as standard errors and CIs, can be calculated from this empirical distribution. In our context, this statistic is the estimated overall effect size.
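A minimal nonparametric bootstrap (method 12) in base R, resampling whole studies with replacement and re-estimating τ2 in every replicate; the percentile interval shown is one of several possible bootstrap CIs:

```r
set.seed(1)
B <- 5000
boot_mu <- replicate(B, {
  idx <- sample(k, replace = TRUE)  # resample studies with replacement
  yb <- yi[idx]; vb <- vi[idx]
  wf <- 1 / vb
  mf <- sum(wf * yb) / sum(wf)
  Qb <- sum(wf * (yb - mf)^2)
  t2 <- max(0, (Qb - (k - 1)) / (sum(wf) - sum(wf^2) / sum(wf)))
  wb <- 1 / (vb + t2)
  sum(wb * yb) / sum(wb)            # bootstrap overall effect size
})
quantile(boot_mu, c(alpha / 2, 1 - alpha / 2))  # percentile CI
```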
- Follmann and Proschan CIs (method 14)
Permutation tests have been suggested primarily to assess the true statistical significance of an observed finding under the null hypothesis of the absence of effect, especially in meta-analyses with a small number of studies.152 This method can be extended and used for calculating CIs for the effect size. These tests are especially appropriate when the included studies in a meta-analysis may not be considered randomly sampled from a larger population of studies. Confidence intervals can be constructed by inverting hypothesis tests, where parameter values that are not rejected by the hypothesis test lie within the corresponding CI.

Follmann and Proschan117 (FP) proposed a permutation test of H0 : μ = 0 that exploits the symmetry of the effect sizes under the null hypothesis: if there is no overall effect, each observed yi is as likely to have been observed as −yi. The test statistic (the standardized overall effect size) is recomputed for all 2k possible sign assignments ±yi, re-estimating τ2 each time, and H0 is rejected at level α if the observed statistic lies among the proportion α of most extreme values of the resulting permutation distribution.
The procedure described above can be extended so that we instead test the hypothesis H0 : μ = c, and invert this hypothesis test to give the bounds of CIs. For further details on the FP method, we refer the reader to Follmann and Proschan.117 This method has been suggested as an alternative approach to the WT and likelihood-based approaches that assume normality of the observed effects, but it can be computationally demanding. The discrete nature of the permutation distribution will ensure that the CI maintains the desired coverage probability, but in general, this coverage probability will be larger than the nominal level.
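A base-R sketch of the FP sign-flip test of H0 : μ = 0, enumerating all 2^k sign assignments (feasible for small k); inverting the test over a grid of values c, applied to the shifted effects yi − c, yields the CI:

```r
re_stat <- function(y) {  # standardized overall effect, re-estimating tau^2
  wf <- 1 / vi
  mf <- sum(wf * y) / sum(wf)
  Qp <- sum(wf * (y - mf)^2)
  t2 <- max(0, (Qp - (k - 1)) / (sum(wf) - sum(wf^2) / sum(wf)))
  wr <- 1 / (vi + t2)
  (sum(wr * y) / sum(wr)) * sqrt(sum(wr))
}

signs  <- as.matrix(expand.grid(rep(list(c(-1, 1)), k)))  # all 2^k sign patterns
t_perm <- apply(signs, 1, function(s) re_stat(s * yi))
p_val  <- mean(abs(t_perm) >= abs(re_stat(yi)))  # two-sided permutation P value
# For H0: mu = c, apply the same procedure to the shifted effects yi - c
```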
3.1.7 Bayesian credible intervals (method 15)
Bayesian credible intervals (CrIs) for the overall effect size can be obtained within a Bayesian framework using specialized software and Markov chain Monte Carlo (MCMC) techniques, such as WinBUGS153 or SAS PROC MCMC. Some advantages of the Bayesian approach include (1) incorporation of uncertainty in model parameters (μ, τ2); (2) derivation of CrIs from the posterior distribution; and (3) use of informative prior distributions on the model parameters. However, the use of informative priors for the effect size parameters has been discouraged by some researchers due to the potential to introduce bias.154 The use of vague priors allows the analysis to remain data driven. In contrast, the use of informative priors for the between-study variance has been suggested to increase confidence in the overall effect size, especially when few studies are included in a meta-analysis.20, 21 Informative priors for τ2 under several treatment comparison types and outcome settings are available for dichotomous data20 and for continuous data.21 Friede et al155 suggest that Bayesian CrIs perform well even in rare diseases with a small number of studies when an appropriate prior for τ2 is applied. In rare diseases and small populations, the use of half-normal priors, with expectation 0 and variance 0.25 or 1 for τ2, has been recommended when log-odds ratios are used to measure the effect size.155 Vague priors can also be applied for τ2, but caution is needed, as results are sensitive to the prior specification, especially when the number of studies is small.156 This is because the choice of prior may impact on the estimation of the between-study variance and consequently on the estimated overall effect size and the width of its CI. Other difficulties that have been associated with the derivation of Bayesian CrIs include the difficulty of determining whether convergence has been achieved, the need for a burn-in period when using MCMC, and the impact of Monte Carlo error. Alternative methods to implement a Bayesian meta-analysis are available by using numerical integration, importance sampling, and data augmentation, as described by Turner et al57 and Rhodes et al.157 For practical application, the R package bayesmeta is available.58
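A sketch using the bayesmeta package mentioned above, with a vague normal prior for μ and the half-normal prior (scale 0.5, ie, variance 0.25) for τ recommended for log-odds ratios in small meta-analyses;155 argument names follow the package interface as we understand it, and the prior for μ is an illustrative assumption:

```r
library(bayesmeta)

fit <- bayesmeta(y = yi, sigma = sqrt(vi),
                 mu.prior.mean = 0, mu.prior.sd = 4,                   # vague prior for mu
                 tau.prior = function(t) dhalfnormal(t, scale = 0.5))  # half-normal prior for tau
fit$summary[, "mu"]  # posterior summaries for mu, including the 95% CrI
```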
3.2 Comparative evaluation of the methods
The properties of the 15 CI approaches have been evaluated in 31 research papers, which together report 30 simulation studies and 32 real data evaluations of the methods (for simulation scenarios and study characteristics, see Appendices S3-S4). Published articles suggested that the different approaches can provide noticeably different or even conflicting results and that their performance can vary regarding coverage and CI width. Below, we discuss the comparative results as presented in the identified studies. However, it is hard to compare all 15 CI approaches simultaneously, as they have never all been compared under the same conditions and simulation scenarios. The presentation of results follows the same order as the presentation of the CIs in section 3.1.
3.2.1 WT methods (methods 1, 2, and 3)
The performance of the popular WTz method has been assessed in several studies, and it is poor when compared with other methods. Simulations suggest that the WTz performs worse in terms of coverage for small numbers of studies (k < 16) compared with the PL and the WTt methods, whereas for large k, all three methods perform well.158 The performance of the WTz method, though, depends not only on the number of studies but also on the τ2 estimator employed and its magnitude.45 The WTz coverage has been found to differ by up to 0.05 between different τ2 estimators, up to 0.30 between meta-analysis samples, and up to 0.20 across between-study variance values ranging from small to large τ2. Coverage has been found to be as low as 0.65 (at 0.95 nominal level) when I2 (defined as the percentage of the total variability in a set of effect sizes that is due to between-study variability beyond what is expected by within-study random error) is 90% and two or three studies are included in a meta-analysis, but it tends towards the nominal level as the number of studies increases.113
To increase coverage, the t distribution can be used, which produces wider CIs than those obtained by the standard normal distribution, especially when τ2 and k are small.15, 117 The coverage probability is therefore higher with the WTt approach, but it depends on the estimator and the magnitude of τ2, as well as on the number of studies.45 Simulations showed that the WTt CI is less affected by the number of studies than the WTz CI.52 Although WTt coverage may be more robust to changes in the τ2 magnitude compared with WTz when few studies are included in a meta-analysis, it has been found to differ by up to 0.05 depending on the τ2 estimator used and the number of studies.113 For large meta-analysis samples (eg, k ≥ 20), the coverage of the 95% WTt CI may be below the nominal level, but it becomes conservative (close to 1) when k is small.113, 122, 158
Alternatively, the WTqa method is easy to implement and produces intervals with better coverage in comparison with the WTz method.122 A simulation study45 showed that different estimators of the between-study variance may impact on coverage and that the WTqa method is associated with higher coverage than WTz and HKSJ CIs, but the HKSJ method produced values closer to the nominal level. The same study showed that the WTqa method has similar coverage to the WTt method. For small k, coverage of the WTt method is well above the nominal level and higher than that for the WTqa method, but as k increases, the differences in coverage are not so important.122 Simulations have also shown that WTqa outperforms WTz, PL, and ZL approaches but it is very conservative.146
3.2.2 HKSJ methods (method 4 and 5)
The HKSJ approach (method 4) is often preferred, as in the case of small k, it is conservative and on average produces wider CIs with more adequate type I error compared with the WTz method.16, 119, 141, 152, 159 The HKSJ method provides exact inference when all study sizes are equal and the random-effects model is true, resulting in better inference than WTz,160 but also provides more accurate inference in small meta-analyses with different study sizes.14, 15, 116 Several studies suggested that the HKSJ method has coverage close to the nominal level and that it is not influenced by the magnitude or estimator of τ2.16, 45, 113, 115, 119, 121, 125, 132, 135, 155 Nevertheless, Knapp and Hartung125 recommend using the PM161 estimator for the between-study variance along with the HKSJ method to obtain CIs for μ so as to get a cohesive approach based on Qgen.161, 162 Sanchez-Meca and Marin Martinez45 recommend using the HKSJ method, as it is additionally insensitive to the number of trials. Simulation studies suggest that HKSJ has good coverage when the effect measure is the log-odds ratio,15, 119 the standardized mean difference,121 the mean difference, and the risk difference.118 The coverage of the 95% HKSJ CI is generally better than the WTz and WTt coverages, but it is suboptimal in meta-analyses with binary outcomes and rare events, as shown in simulated meta-analyses where the odds-ratio was used as the measure of effect.113
A real-life data study of 920 Cochrane meta-analyses with k ≥ 3 showed that the WTz method yielded statistically significant results more often than the HKSJ method (45% vs 35% of meta-analyses).163 IntHout et al16 found similar results in their real-life data study with 434 Cochrane meta-analyses with dichotomous data (43% vs 34%) and 255 Cochrane meta-analyses with continuous data (51% vs 40%). It is recommended that caution is needed when fewer than five studies of unequal sizes are included in the meta-analysis.16 Wiksten et al132 in their empirical evaluation including 157 meta-analyses with dichotomous data and k ≥ 4 found that in the presence of heterogeneity (using the DerSimonian and Laird11 estimator [estimated τ2 > 0] or the Cochran Q statistic [P < 0.10]),11, 164 the P value for the overall effect size was typically greater when using the HKSJ than the WTz method. However, they comment that the HKSJ method is not always more conservative when the estimated τ2 is zero.
It has been shown that in the absence of heterogeneity, the coverage of HKSJ may be smaller than the WTz coverage, providing narrower CIs.15, 115, 118, 121, 132, 133, 165 This was more prevalent in cases with rare events.132 Jackson et al133 raise a variety of concerns about the use of the HKSJ method, including (1) the implications of the modification for any given meta-analysis are hard to predict, (2) HKSJ can result in shorter CIs for the overall effect size than the WTz method, and (3) the coverage of the HKSJ CI might be anticipated to be low when τ2 = 0. However, in simulation studies conducted by Röver et al,135 Viechtbauer et al,134 and Sanchez-Meca and Marin Martinez,45 HKSJ worked well even in the absence of heterogeneity. This is in line with the simulations by Gonnermann et al,166 but in the presence of only two studies for τ2 = 0, HKSJ is associated with very low power compared with WTz (15% vs 60%), which may be due to the wider CI, whereas for mild to moderate τ2, both methods have poor control of type I error. A simulation study compared HKSJ with the small sample modification suggested by Knapp and Hartung125 and indicated that the use of the modified HKSJ (method 5) is preferable when few studies of varying size and precision are available.135 Another simulation study suggested the use of the modified HKSJ approach instead of the common HKSJ and WTz approaches when dichotomous data are considered.167 However, for few studies (and particularly for k = 2) and as the between-study variance decreases, the modified HKSJ tends to be overconservative, and selection between the methods is a matter of power vs type I error.133-135
3.2.3 Likelihood-based methods (methods 6, 7, and 8)
The PL method is often preferred to the WTz method, as it is associated with a higher coverage closer to the nominal level, even when k is relatively small.122, 143 Jackson et al158 showed that the PL method performed well and better than the WTz and WTt methods in meta-analyses with few studies (k ≤ 8) with coverage close to the nominal level. However, coverage decreases as τ2 increases and/or k decreases.43 Simulations suggest that the PL CI is less affected by the number of studies in a meta-analysis compared with the WTz CI, but both WTz and PL have poor coverage control, as they yield values below the nominal level.52
Simulations found that the Bartlett-type correction CI (method 7) improves the coverage properties of the WTz, WTt, and PL methods, whose coverage deviates from the nominal level as τ2 increases and/or k decreases.17, 52 Although the Bartlett-type correction CI has satisfactory power compared with the WTz, WTt, and PL CIs52 and performs well when τ2 = 0,142 caution is needed for k ≤ 5, as it tends to be overconservative.17 The Skovgaard statistic CI (method 8) is associated with coverage closer to the nominal level compared with the WTz and PL CIs, which is particularly notable for small k.17, 79, 141 Both the Skovgaard statistic CI and the Bartlett-type correction CI perform very satisfactorily regarding coverage and yield similar results.141
3.2.4 HC and BT (methods 9 and 10)
Simulations showed that in the absence of publication bias and for k > 10, the HC method yields better coverage than WTz, HKSJ, PL, and BT methods, whereas for k < 10, the HKSJ and PL methods perform best.30 The same study showed that when publication bias is present and for k > 10, HC improved coverage compared with WTz, HKSJ, PL, and BT methods and showed less bias than the fixed-effect model. Also, the WTz and BT methods have comparable coverage probabilities with coverage below the nominal level,54, 122 but coverage is increased for the exact weights.54
3.2.5 Resampling methods (methods 11, 12, 13, and 14)
Zeng and Lin146 showed that the ZL CI outperforms both WTz and PL CIs for small k in terms of coverage. Another simulation study showed that the FP CI controls coverage better than WTz, WTt, and PL and is closely followed by the Bartlett-type correction CI, but the latter is slightly more powerful especially for small k.52 The same study showed that the FP CI and the Bartlett-type correction CI were less affected by the number of studies than WTz, PL, and WTt methods.
3.2.6 Bayesian CrIs (method 15)
Simulation studies showed that Bayesian methods produce intervals with coverage closer to the nominal level compared with the HKSJ, modified HKSJ, and PL CIs,155, 168 and these intervals tend to be narrower than the HKSJ CI even in situations with similar or larger coverage.155 However, the performance of the Bayesian CrIs may vary depending on the prior assigned to the between-study variance.156
3.3 Prediction intervals

Higgins et al24 proposed an approximate 100(1 − α)% prediction interval for the true effect in a new study:

$\hat{\mu} \pm t_{k-2, 1-\alpha/2} \sqrt{\hat{\tau}^2 + \widehat{\operatorname{Var}}(\hat{\mu})},$

where $t_{k-2, 1-\alpha/2}$ is the $1 - \alpha/2$ quantile of the t distribution with k − 2 degrees of freedom. The t distribution, rather than the normal distribution, is used to acknowledge the uncertainty in the estimate of τ2.
It is worth noting that the prediction interval does not inform the statistical significance of the overall effect size; it instead describes the region within which the true effects of new studies are expected to be found. A prediction interval can help in understanding the uncertainty about whether an intervention is expected to work and reflects the potential effect in future study participants.169 Prediction intervals are particularly helpful when excess between-study heterogeneity exists, and the combination of individual studies into an overall effect size would not be advisable. IntHout et al170 found that in more than 70% of the statistically significant meta-analyses in the Cochrane Database of Systematic Reviews with estimated heterogeneity (I2 > 0), the 95% prediction interval suggested that the effect size in a new study could be null or even in the opposite direction from the overall result in some patient populations. The prediction interval can also be used to calculate the probability that a new trial will have a negative result and to improve the calculations of the power of a new trial. Conclusions drawn from a prediction interval rely on the assumption that the true study effects are normally distributed. The prediction interval will be imprecise if the estimates of the overall effect size and τ2 are far from the true parameter values. Partlett and Riley171 assessed the performance of the ad hoc 95% prediction interval in a simulation study, and they concluded that the method is only accurate when heterogeneity is large (I2 > 30%) and the study sizes are similar. However, for small heterogeneity and different study sizes, the coverage of the prediction interval can be as low as 78%, depending on the between-study variance estimator.171 Lee and Thompson172 highlight the importance, when calculating a prediction interval, of allowing for potential skewness and heavy tails in the random-effects distribution. Prediction intervals can be computed in several software packages, such as R (using, for example, the metafor77 and meta75 packages) and Stata (using, for example, the metan104 and mvmeta105 commands).
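Continuing the base-R sketch, the prediction interval with the k − 2 degrees of freedom of Higgins et al24 is shown alongside the k − 1 variant proposed in Section 4 (note that k ≥ 3 is required for the k − 2 version):

```r
se_pred <- sqrt(tau2 + v_mu)  # predictive standard error: between-study plus estimation variance

pi_k2 <- mu + c(-1, 1) * qt(1 - alpha / 2, df = k - 2) * se_pred  # Higgins et al (k - 2 df)
pi_k1 <- mu + c(-1, 1) * qt(1 - alpha / 2, df = k - 1) * se_pred  # k - 1 df variant (Section 4)
```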
4 DISCUSSION
The estimation of the overall effect size is one of the primary aims in meta-analysis. Therefore, the computation of a CI/CrIs is crucial in order to interpret the uncertainty in the estimated overall effect size. Wald-type methods, and in particular the WTz CI using the DerSimonian and Laird11 estimator for the between-study variance, are commonly used and are the default option in several meta-analysis software (eg, RevMan).67 However, the accuracy of these standard CI methods is not optimal, as the coverage probability associated with these CIs can deviate considerably from the nominal coverage probability in small meta-analyses.12-18 This is not surprising as the WT methods rely upon large-sample approximations requiring many studies to be included in a meta-analysis. However, meta-analyses often include a small number of studies, and large-sample approximations can be inaccurate.19-23 Perhaps because of this property, several other CI methods have been proposed to improve the standard WT CIs, including likelihood-based and resampling methods and, more recently, higher order likelihood inference methods. In the present study, we provide a comprehensive review of the CIs for the overall effect size under the random-effects model.
Our review identified 15 methods for calculating a CI for the overall effect size, each of which has different statistical properties. The selection of a method for computing a CI should be based on its statistical performance according to the corresponding meta-analysis' characteristics, as well as on the method's computational and conceptual complexity. Usually, one of these comes at the price of another. For example, the likelihood-based methods are associated with coverage closer to the nominal level compared with the commonly used WTz method but are computationally more demanding than the WTz CI. Also, the use of some methods (eg, ZL) is limited in meta-analyses, due to complex calculations with standard software or their unavailability in statistical software. Simulations have assessed the performance of various methods and showed that it mostly depends on the magnitude of the between-study variance and number of studies in a meta-analysis. However, additional items should be considered when selecting a CI. These may include the type of outcome data and the study size.
The selection of the most preferable methods to calculate a CI for the overall effect size can be mostly based on coverage, as this measure was the only one consistently reported across the identified studies. The 15 methods identified in this review have never all been compared in one simulation study under the same conditions, and hence, making any clear recommendations about these methods would be difficult. Also, none of the methods had an optimal coverage across all settings. Therefore, we can only offer tentative recommendations based on the available evidence, but these depend on the study findings, their simulation scenarios, and the CIs examined. It would require an extensive simulation study to assess the performance of all of these methods, under the same, realistic settings. Future studies should evaluate the CIs for all relevant properties, including coverage, precision, complexity, and power of the corresponding tests. In addition, further research is necessary to make judgements on the performance of CIs. In particular, a comprehensive simulation study informed by real-life data included in a meta-analysis would help determine the factors that impact the performance of the CI methods. Factors to consider in this analysis may include number and size of studies, baseline risk variability, magnitude and estimator of the between-study variance, frequency of events in dichotomous outcome data, type of outcome data, choice of effect size, distribution of effect sizes, sensitivity to small-study effects or publication bias, and different meta-analytical approaches (eg, Mantel–Haenszel, Peto, or one-step methods).
To date, limited evidence exists to inform which method performs best, especially when studies are few in number (<5), and given that the Bayesian intervals have not been assessed extensively in comparative studies. Overall, studies suggest that the HKSJ method has one of the best performance profiles. It performs well even in meta-analyses with fewer than 10 studies28 and is robust to the use of different estimators for the between-study variance and to changes in the magnitude of the between-study variance.45, 113 However, it should be considered that HKSJ is not always conservative compared with a fixed-effect meta-analysis.132 If the estimated between-study variance is zero, the variance of the estimated overall effect size can be inaccurately small, and hence, the HKSJ CI will be too narrow.132 In such cases, it has been suggested to use the modified HKSJ to avoid inaccurately narrow CIs.125 Also, caution is needed in meta-analyses with rare events, where the HKSJ coverage has been found to be as low as 85%,113 and in meta-analyses with fewer than five studies.28 In the case of few studies, the modified HKSJ has been suggested,135 but in the case of k = 2, the modified HKSJ tends to be overly conservative.135, 166 The likelihood-based methods, and in particular the higher order methods (the Skovgaard statistic CI and the Bartlett-type correction CI), are also associated with good coverage properties.141 However, the higher order likelihood methods have never been compared directly with HKSJ, which would help make informed decisions on CI selection. Alternatively, Bayesian intervals may be considered preferable to frequentist intervals in situations where prior information is available and can be considered suitable for use in the meta-analysis. Bender et al28 recommend the use of the HKSJ method as a standard approach, but in cases where studies have considerably different precisions, the modified HKSJ should be preferred. The same authors also suggest that when reliable prior information on the between-study variance is available, Bayesian intervals with (weakly) informative prior distributions for the heterogeneity should be preferred.
The computation of prediction intervals in meta-analysis is also valuable, as they provide additional information beyond the overall effect size, and we believe that they should be used more frequently. We propose to use k − 1 degrees of freedom rather than k − 2 to calculate prediction intervals, so that the CIs using a t distribution (eg, the WTt and HKSJ CIs) and prediction intervals are identical when the estimated between-study variance is zero. Although some concerns have been raised about prediction intervals, including their actual coverage probability of the true effect in a new study and their sensitivity to distributional assumptions,171, 172 their advantages outweigh their disadvantages, as they are a simple and accessible way to interpret the implications of the between-study heterogeneity implied by the fitted model. A comprehensive simulation study assessing the different types of prediction intervals under a variety of meta-analytical scenarios and different between-study variance estimators would help critically examine the issues associated with the calculation and interpretation of prediction intervals.
In conclusion, there are multiple methods to compute a CI for the overall effect size, and none of the methods clearly performs best across all meta-analytical settings. We hope that bringing them all together in one place will facilitate investigators in forming their own judgements about the most appropriate method for their needs. Overall, based on the existing literature and consensus among the coauthors of this paper, we tentatively suggest the application of the HKSJ method as a standard approach, at least in a meta-analysis with five or more studies. We recommend conducting a sensitivity analysis using a variety of methods (with at least two to three methods) to assess the robustness of findings and conclusions, especially in a meta-analysis with fewer than 10 studies. It should be highlighted that these results refer to normally distributed true study-specific effects, and simulation studies are necessary to compare the performance of the 15 methods described in this review. For example, Kontopantelis and Reeves173 used various nonnormal distributions for the effect sizes and compared the WTz, WTqa, HKSJ, PL, BT, and FP methods. The authors showed that the simulation results were broadly consistent across different effect size distributions (normally distributed, skew-normal, and extreme nonnormal study effects), with PL providing the best coverage, albeit with wide CIs. Also, the FP method provided coverage close to the nominal level, regardless of the number of included studies, at the expense of very wide CIs. The HKSJ method had a consistent 94% coverage for nonnormal study effects and small heterogeneity, but with larger heterogeneity, the FP method performed better than the HKSJ CI. For a small number of studies (≤5), the WTz and PL methods performed best, with WTz outperforming PL only when the between-study variance was small. However, more simulation studies with nonnormal true study-specific effects are required to draw robust conclusions for the 15 CIs across different meta-analytical scenarios.
We also recommend the calculation of prediction intervals as a supplement to a CI to illustrate the degree of heterogeneity, particularly when large between-study heterogeneity is present. However, caution is needed for small between-study heterogeneity and unequal study sizes. In this case, it is advisable to prefer prediction intervals derived in a Bayesian framework using, for example, informative prior distributions.20, 21 Should any new methods become available, we recommend that these are compared with most, or ideally all, of the methods described in this review under the same circumstances, using both real-life data and simulation studies; we hope that the codes presented in Appendices S8A-C and S9 can help to make this possible. This will help obtain a clearer picture of the performance of these methods when they are compared with each other. In Appendices S6 to S10, we present a selection of the identified methods for computing a CI applied to four illustrative, and contrasting, real data examples.174
List of Abbreviations

- BT: Biggerstaff and Tweedie
- CI: confidence interval
- CrIs: credible intervals
- FP: Follmann and Proschan
- HC: Henmi and Copas
- HKSJ: Hartung-Knapp/Sidik-Jonkman
- IQWiG: Institute for Quality and Efficiency in Health Care
- PL: profile likelihood
- WT: Wald-type
- WTqa: quantile approximation
- WTt: Wald-type with a t distribution
- WTz: Wald-type with a normal distribution
- ZL: Zeng and Lin
AVAILABILITY OF DATA AND MATERIALS
All data are from a published study and are available in the supporting file.
DETAILS OF CONTRIBUTORS
Areti Angeliki Veroniki, Dan Jackson, Ralf Bender, Oliver Kuss, Dean Langan, Julian P.T. Higgins, Guido Knapp, and Georgia Salanti contributed to the conception and design of the study and helped to draft the manuscript. Areti Angeliki Veroniki searched PubMed, screened identified articles for eligibility, contacted original study authors when needed, and conducted the statistical analysis. All authors read and approved the final manuscript.
FUNDING INFORMATION
This work did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. Ralf Bender is employed by the Institute for Quality and Efficiency in Health Care, Cologne, Germany.
ACKNOWLEDGMENTS
We thank Dr. Jack Bowden and Dr. Wolfgang Viechtbauer for providing their feedback on a previous draft of the paper. We thank Annamaria Guolo and Michael Preuss for responding to our questions regarding their approaches. We also thank Shazia Siddiqui and Myanca Rodrigues for formatting the manuscript.
CONFLICT OF INTEREST
The authors declare that there is no conflict of interest.