Footprint of publication selection bias on meta-analyses in medicine, environmental sciences, psychology, and economics
František Bartoš and Maximilian Maier have contributed equally.
Abstract
Publication selection bias undermines the systematic accumulation of evidence. To assess the extent of this problem, we survey over 68,000 meta-analyses containing over 700,000 effect size estimates from medicine (67,386/597,699), environmental sciences (199/12,707), psychology (605/23,563), and economics (327/91,421). Our results indicate that meta-analyses in economics are the most severely contaminated by publication selection bias, closely followed by meta-analyses in environmental sciences and psychology, whereas meta-analyses in medicine are contaminated the least. After adjusting for publication selection bias, the median probability of the presence of an effect decreased from 99.9% to 29.7% in economics, from 98.9% to 55.7% in psychology, from 99.8% to 70.7% in environmental sciences, and from 38.0% to 29.7% in medicine. The median absolute effect sizes (in terms of standardized mean differences) decreased from d = 0.20 to d = 0.07 in economics, from d = 0.37 to d = 0.26 in psychology, from d = 0.62 to d = 0.43 in environmental sciences, and from d = 0.24 to d = 0.13 in medicine.
Highlights
What is already known
- Publication selection bias, where studies with significant or positive results are more likely to be reported and published, distorts the available scientific record.
What is new
- This study surveyed over 68,000 meta-analyses from medicine, environmental sciences, psychology, and economics to assess the extent of publication selection bias. In doing so, it underscores the importance of addressing publication bias in evidence synthesis.
- Results suggest that meta-analyses in economics are the most affected by publication selection bias, followed by environmental sciences and psychology, whereas meta-analyses in medicine appear to be the least affected. Nevertheless, notable biases are found across all four disciplines.
Potential impact for readers
- This study documents the potential extent of publication bias in different fields, which could help researchers and the public better understand the limitations of research and the potential biases of research synthesis.
1 INTRODUCTION
Publication selection biases (PSB) are defined as the selective reporting of results in ways that deviate from the objective, complete scientific record. PSB may entail the suppression of “negative” findings or the conversion of “negative” results into more “positive” ones (e.g., those with more favorable p-values and/or larger effect sizes) and might represent a problem in all scientific disciplines, for example, References 1-5. Studies examining the self-reported behavior of researchers indicate that 78% of researchers failed to report all dependent measures of a study6 (however, see Reference 7 for a response suggesting a lower proportion). Some studies also suggest that PSB might be modestly increasing in some areas, although the exact nature, prevalence, and impact of PSB are unknown and likely variable across scientific fields.8, 9
To gauge the extent of PSB, one would need access to the complete scientific record or a representative, wide-coverage sample of it. However, this is infeasible because much of the relevant data is not publicly recorded. Instead, the footprint of PSB is probed indirectly by re-analyzing meta-analyses in specific fields with different statistical techniques9-14 and focusing on patterns in the published results that would herald the presence of PSB. All available methods try to identify the footprint of PSB, and thus their results need to be interpreted with caution, since these patterns (e.g., correlations between effect sizes and standard errors) may sometimes be due to factors other than PSB (e.g., genuine heterogeneity across studies). However, when large numbers of meta-analyses show the same patterns, this constitutes a probable footprint of PSB, which can be used to estimate its relative magnitude across different fields.
Previous field-wide assessments of PSB suggested that the prevalence of over-reporting positive results and other possible symptoms of bias increased moving from the physical to the biological and the social sciences, and even suggested that problems might be worsening over time in the latter.9, 15-20 However, these estimates were based on proxy measures of PSB that have several limitations.
To our knowledge, no previous survey of the potential footprint of PSB has used state-of-the-art methods. Our proposed approach is more comprehensive than past surveys: employing different strategies to identify potential PSB, using new measures of PSB, and analyzing a much larger number of research studies covering the fields of medicine, environmental sciences, psychology, and economics.
2 METHODS
2.1 Data sets
We used five large data sets from medicine, environmental sciences, psychology, and economics. The data set from medicine comprises meta-analyses of continuous and dichotomous outcomes obtained from the Cochrane Database of Systematic Reviews published between 1997 and 2020. The data set from environmental sciences comprises the meta-analyses of mean differences, odds ratios, and correlation coefficients by Deressa et al.21 published between 2010 and 2020. The data sets from psychology comprise the meta-analyses of mean differences and correlation coefficients by Stanley et al.12 published between 2011 and 2016, combined with a random sample of meta-analyses published in psychological journals by Sladekova et al.22 published in 2008 and 2018. Finally, the data set from economics comprises the extended data set of meta-analyses of regression and correlation coefficients by Ioannidis et al.10 published between 1967 and 2021. Eighty-four meta-analyses were part of both the Ioannidis et al.10 and the Stanley et al.12 data sets. Since each of these meta-analyses could be classified in either field (psychology or economics), we did not remove them from either data set. From each data set, we only used meta-analyses with at least three estimates reported using standardized effect size metrics, such as log odds ratios, standardized mean differences, and (partial) correlation coefficients, that can be transformed to a common standardized mean difference effect size metric, Cohen's d.
2.1.1 Medicine
The data set from medicine comprises meta-analyses of continuous and dichotomous outcomes obtained from the Cochrane Database of Systematic Reviews (CDSR) published between 1997 and 2020. We identified systematic reviews in the CDSR through PubMed, limiting the period to Jan 2000–May 2020. For that, we used the NCBI's EUtils API with the following query: “Cochrane Database Syst Rev”[journal] AND (“2000/01/01” [PDAT]: “2020/05/31” [PDAT]). For each review, we downloaded the XML meta-analysis table file (rm5 format) associated with the review's latest version. We extracted the tables with continuous and dichotomous outcomes from these rm5 files with custom JavaScript and R programs (https://github.com/wmotte/cochrane2022).
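For readers who wish to reproduce this retrieval step, the sketch below runs the same query through the rentrez R wrapper around NCBI's EUtils; the original pipeline called the EUtils API directly via the custom programs linked above, so this is only an approximation of that step.

```r
# Query PubMed for Cochrane reviews via NCBI EUtils (rentrez wrapper)
library(rentrez)

query <- '"Cochrane Database Syst Rev"[journal] AND ("2000/01/01"[PDAT] : "2020/05/31"[PDAT])'
hits  <- entrez_search(db = "pubmed", term = query, use_history = TRUE)

hits$count  # number of matching records; IDs can then be paged via the stored web history
```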
We selected meta-analysis tables based on the highest aggregation reported in the CDSR. For each meta-analysis, we removed estimates based on one or fewer participants in the control or treatment group and used all meta-analyses with at least three effect size estimates.
2.1.2 Environmental sciences
The environmental sciences data set consists of meta-analyses of mean differences, correlation coefficients, and odds ratios published between 2010 and 2020. The literature search was performed in the Scopus database using the query: “TITLE-ABS-KEY (“meta analy*” OR “meta-analy*” OR “metaanaly*” OR “meta reg*” OR “meta-reg*” OR “metareg*”) AND SUBJAREA (envi)” on July 21, 2020. Detailed information about the sampling strategy and inclusion/exclusion criteria used can be found in Deressa et al.21
2.1.3 Psychology
The data set from psychology comprises the data set of meta-analyses of mean differences and correlation coefficients of Stanley et al.12 published between 2011 and 2016, combined with data from Sladekova et al.,22 a random sample of 433 meta-analyses from 90 articles published in 2008 and 2018. See Stanley et al.12 and Sladekova et al.22 for more details about the collected data sets. None of the meta-analyses by Sladekova et al.22 were published in Psychological Bulletin, precluding overlap with the Stanley et al.12 data set.
2.1.4 Economics
The data set from economics comprises the extended data set of meta-analyses of regression and correlation coefficients of Ioannidis et al.10 published between 1967 and 2021. The meta-analyses were identified using various search engines (e.g., Econlit and Scopus), publisher sites (e.g., Science Direct, Sage, and Wiley), webpages of researchers known to publish meta-analyses, and by searching all volumes of individual journals that are known to publish meta-analyses. We also emailed 109 research teams (associated with either sole-authored or co-authored meta-analyses) for data, with a 67% response rate. The search for data ended on May 30th, 2021.
We selected meta-analyses of standardized mean differences, (partial) correlation coefficients, and mean differences (if enough information was available to compute the standardized mean differences).
2.1.5 Effect size calculation
In cases where the data set did not already feature a standardized effect size (Cohen's d, a correlation coefficient, log(OR), or Fisher's z), we used the metafor R package23 to calculate the standardized effect sizes. For dichotomous outcomes with zero cell counts, we used the default empty cell correction, adding 1/2 to the empty cells. Finally, we converted all standardized effect sizes to Fisher's z using the formulas in Borenstein et al.24
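The sketch below illustrates this workflow on invented data, using metafor::escalc and the conversion formulas in Borenstein et al.24; the data frames and column values are hypothetical.

```r
library(metafor)

# Toy data standing in for two continuous and two dichotomous outcomes (invented values)
continuous  <- data.frame(m1i = c(5.1, 4.8), sd1i = c(1.2, 1.0), n1i = c(40, 55),
                          m2i = c(4.5, 4.7), sd2i = c(1.1, 1.3), n2i = c(42, 50))
dichotomous <- data.frame(ai = c(12, 0), n1i = c(50, 30),
                          ci = c(20, 4), n2i = c(48, 31))

# Continuous outcomes -> standardized mean differences
smd <- escalc(measure = "SMD", m1i = m1i, sd1i = sd1i, n1i = n1i,
              m2i = m2i, sd2i = sd2i, n2i = n2i, data = continuous)

# Dichotomous outcomes -> log odds ratios; escalc's default correction
# adds 1/2 to the cells of 2x2 tables that contain a zero
lor <- escalc(measure = "OR", ai = ai, n1i = n1i, ci = ci, n2i = n2i,
              data = dichotomous)

# Conversions to a common metric (Borenstein et al., 2009)
d_from_logOR <- function(logOR) logOR * sqrt(3) / pi  # log OR -> Cohen's d
r_from_d     <- function(d) d / sqrt(d^2 + 4)         # d -> r (equal group sizes)
z_from_r     <- function(r) atanh(r)                  # r -> Fisher's z

z_from_r(r_from_d(smd$yi))                 # SMDs on the Fisher's z scale
z_from_r(r_from_d(d_from_logOR(lor$yi)))   # log ORs on the Fisher's z scale
```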
2.2 Publication bias adjustment with Bayesian model-averaging
We used the PSB detection and correction technique RoBMA-PSMA.25, 26 RoBMA employs Bayesian model-averaging27, 28 and combines the best of two well-performing publication bias adjustment methods: selection models with six different weight functions that adjust for publication selection across a combination of statistical significance and direction of the effect,29 and PET-PEESE, which adjusts for the relationship between effect sizes and standard errors or squared standard errors.30 Bayesian model-averaging allows us to combine these publication bias adjustment methods based on their predictive adequacy, such that models that predict the data well have a larger impact on the inference. In that way, we can evaluate the evidence in favor of or against the hypothesis of PSB and its impact without committing to any single estimation or correction method.27
We used the default RoBMA parameterization, which was shown to achieve better performance in both simulation studies and real data examples than either publication bias adjustment method alone.26 It assigns equal prior model probabilities to models assuming the presence vs. absence of an effect, heterogeneity, and publication selection bias. RoBMA places a standard normal prior distribution on the effect size, an empirically informed inverse-gamma prior distribution on the heterogeneity,31 cumulative unit Dirichlet prior distributions on the publication probabilities, and Cauchy prior distributions on the PET and PEESE regression coefficients.
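For illustration, a minimal sketch of fitting the default RoBMA ensemble to a single meta-analysis with the RoBMA R package is shown below; the effect sizes are invented, and the argument names (d, se, conditional) reflect our reading of the package interface and should be checked against its documentation.

```r
library(RoBMA)

# Invented Cohen's d effect sizes and standard errors for one meta-analysis
d  <- c(0.25, 0.41, 0.10, 0.32, 0.05, 0.28)
se <- c(0.12, 0.18, 0.09, 0.15, 0.07, 0.14)

# Default ensemble (RoBMA-PSMA): models crossing presence/absence of the effect,
# heterogeneity, and publication selection (selection models + PET-PEESE)
fit <- RoBMA(d = d, se = se, seed = 1, parallel = TRUE)

summary(fit)                      # inclusion Bayes factors and model-averaged estimates
summary(fit, conditional = TRUE)  # estimates conditional on the effect being present
```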
InfoBox 1: Bayes factors
The Bayes factor is the key inference criterion for much of Bayesian statistics, for example, References 32, 33. It compares the relative predictive accuracy (i.e., the likelihood of the data) under competing hypotheses, and it can also be expressed as the ratio of posterior to prior model odds,

$$\text{BF}_{10} = \frac{p(\text{data} \mid \mathcal{H}_1)}{p(\text{data} \mid \mathcal{H}_0)} = \frac{p(\mathcal{H}_1 \mid \text{data})}{p(\mathcal{H}_0 \mid \text{data})} \bigg/ \frac{p(\mathcal{H}_1)}{p(\mathcal{H}_0)}. \tag{1}$$
Although the Bayes factor is a continuous measure of the strength of evidence, the following rules of thumb may aid interpretation: Bayes factors between 1 and 3 are commonly regarded as weak evidence, Bayes factors between 3 and 10 as moderate evidence, and Bayes factors larger than 10 as strong evidence for the alternative (i.e., the hypothesis at the top of Equation (1)). When the evidence for the null is considered, the Bayes factor is simply inverted: a Bayes factor between 1/3 and 1 is considered weak evidence, a Bayes factor between 1/10 and 1/3 moderate evidence, and a Bayes factor smaller than 1/10 strong evidence for the null (e.g., References 34, 35).
2.3 Measures
For each meta-analysis, we used RoBMA to calculate the PSB-adjusted posterior model-averaged effect size assuming the effect is present (i.e., without averaging over the point-null models, to reduce shrinkage toward zero), the PSB-adjusted posterior probability of the presence of the effect, and the posterior probability of the presence of PSB. To isolate the effect of the PSB adjustment, we also fit the Bayesian, PSB-unadjusted, model-averaged meta-analysis obtained by dropping the PSB adjustment, which yields the unadjusted posterior probability of the presence of the effect and the unadjusted model-averaged effect size estimate assuming it is present. Each meta-analysis is based on a set of estimates, each characterized by an effect size and its standard error.
2.3.1 Evidence for the effect
We used the change in the posterior probability of the effect and the (standardized) evidence inflation factor to quantify the effect of PSB on meta-analytic evidence.
The posterior probability of the effect is an intuitive way of quantifying the evidence in favor of the alternative hypothesis of the presence of an effect. Under the assumption of equal prior probabilities of the presence and the absence of the effect, posterior probabilities larger than 0.5 indicate that the data are more likely under the presence of the effect. Conversely, posterior probabilities lower than 0.5 indicate that the data are more likely under the absence of the effect. The ability to quantify evidence for both the null and the alternative is a key benefit of Bayesian methods over null hypothesis significance testing.36, 37
A corresponding way of quantifying the evidence of an effect is via Bayes factors (see InfoBox 1 for more detail). Bayes factors quantify the change from prior to posterior odds for the presence of the effect. The advantage of Bayes factors is that they are independent of the prior odds for the presence of the effect. In other words, Bayes factors isolate the evidence for the presence of the effect contained in the data. In our settings, the assumption of equal prior probabilities leads to an equivalence between Bayes factors and posterior odds.
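To make this equivalence concrete, the following snippet shows the one-to-one mapping between Bayes factors and posterior probabilities under equal prior odds (base R only):

```r
# Under equal prior model probabilities, the posterior odds equal the Bayes factor,
# so posterior probability and Bayes factor are one-to-one transformations
bf_to_post <- function(bf)   bf / (1 + bf)
post_to_bf <- function(post) post / (1 - post)

bf_to_post(3)      # BF = 3 -> posterior probability 0.75
post_to_bf(0.909)  # posterior probability ~0.909 -> BF ~ 10
```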
The change from the PSB-unadjusted to the PSB-adjusted posterior probability of the effect quantifies the amount of evidence introduced by PSB. The larger the impact of PSB, the larger the difference between the PSB-unadjusted and PSB-adjusted posterior probabilities of the effect. If there were no PSB, we would observe no change in the posterior probability of the effect after PSB adjustment.
The evidence inflation factor captures the increase in the Bayes factor in favor of the effect that is attributable to PSB, that is, the ratio of the PSB-unadjusted to the PSB-adjusted Bayes factor. An evidence inflation factor larger than one indicates inflated evidence in favor of the effect due to PSB.
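As a rough numerical illustration (with invented posterior probabilities), the evidence inflation factor can be computed from the unadjusted and adjusted results by converting the posterior probabilities to Bayes factors, again assuming equal prior odds:

```r
# Evidence inflation factor: ratio of the PSB-unadjusted to the PSB-adjusted
# Bayes factor in favor of the effect (illustrative numbers only)
post_to_bf <- function(post) post / (1 - post)

post_unadjusted <- 0.999
post_adjusted   <- 0.60

post_to_bf(post_unadjusted) / post_to_bf(post_adjusted)  # ~666-fold inflation
```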
2.3.2 Effect size estimates
We quantify the impact of PSB on effect sizes by comparing the unweighted mean effect size per meta-analysis to the PSB-adjusted effect size estimate assuming the presence of the effect (the conditional effect size estimate), rather than to estimates averaged across all models; absolute bias larger than zero indicates that PSB leads to inflated effect size estimates, and the overestimation factor expresses the same comparison as a ratio, quantifying the relative inflation of the effect size estimates. Excluding models assuming the absence of a mean effect mitigates the pooling toward zero in meta-analyses more consistent with the null hypothesis. Tables 5 and 6 in the Supplementary Materials use the PSB-adjusted effect sizes model-averaged across all models, including models assuming the absence of a mean effect. These estimates, which are model-averaged also over the null, indicate stronger absolute bias compared to the conditional estimates presented in the main manuscript (Table 2).

However, note that the overestimation factor can lead to non-sensible results, as a meta-analysis with a positive mean effect and a very small negative PSB-adjusted effect size estimate results in an extremely large negative overestimation factor.
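A small numerical illustration of both measures on invented numbers, using our reading of the definitions above (difference and ratio of the unadjusted mean effect size and the PSB-adjusted conditional estimate), including the pathological case just described:

```r
# Illustrative values only; see Section 2.3.2 for the definitions
unadjusted_mean <- 0.40   # unweighted mean effect size per meta-analysis (Cohen's d)
adjusted_est    <- 0.25   # PSB-adjusted conditional effect size estimate

unadjusted_mean - adjusted_est  # absolute bias: 0.15
unadjusted_mean / adjusted_est  # overestimation factor: 1.6

# The pathology noted above: a positive mean with a tiny negative adjusted
# estimate yields an extreme negative overestimation factor
0.40 / -0.005                   # -80
```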
2.3.3 Evidence for publication selection bias
The posterior probability of PSB is an intuitive way of quantifying the evidence in favor of PSB. Similarly to the posterior probability of the presence of the effect, under the assumption of equal prior probabilities of the presence and the absence of PSB, a posterior probability larger than 0.5 indicates that the data are more likely under the presence of PSB. Conversely, a posterior probability lower than 0.5 indicates that the data are more likely under the absence of PSB. As before, the Bayes factor for the presence of PSB1 quantifies the change from prior to posterior odds for the presence of PSB. A Bayes factor larger than one provides evidence in favor of the presence of PSB, and a Bayes factor lower than one provides evidence against it.
Relative publication probabilities quantify the relative probability of an estimate being published in a given p-value interval compared to estimates with statistically significant p-values. We use one-sided p-values, so that p-values larger than 0.5 correspond to estimates in the opposite direction. To facilitate the interpretation, we visualize a weight function that shows the change in relative publication probabilities across the range of p-values. We report these results only in the Supplementary Materials.

Effect size inflation in imprecise estimates quantifies the relationship between the effect sizes and their standard errors. To facilitate the interpretation of the funnel asymmetry test, we visualize the bias in effect sizes as a function of standard errors (incorporating the quadratic term from the RoBMA model). We report these results only in the Supplementary Materials.
3 RESULTS
3.1 Descriptives
Table 1 compares the characteristics of the meta-analyses from each field. Medical meta-analyses contain the smallest number of estimates per meta-analysis, followed by psychology and environmental sciences with five to six times the number of estimates compared to medicine. Finally, economics meta-analyses contain over 12 times the median number of estimates compared to medicine. Contrary to the naive expectation that more estimates may be conducted to establish smaller effects, economic and medical effect sizes are of approximately the same magnitude (measured as the mean effect size per meta-analysis). Effect sizes in psychology are roughly twice as large as those in economics, and effect sizes in the environmental sciences are approximately three times larger than those in economics. Differences in the number of estimates per meta-analysis are most closely reflected in the proportion of statistically significant random-effects estimates. Notably, random-effects estimates in economics, psychology, and environmental sciences are statistically significant approximately twice as often as in medicine. This disparity in the proportion of statistically significant meta-analyses persists when comparing meta-analyses with a matched number of estimates across the disciplines, although the difference in mean effects is somewhat smaller (see Table 1 in the Supplementary Materials).
Field | Meta-analyses | Estimates | Estimates/MA | Effect sizes | Prop. significant |
---|---|---|---|---|---|
Medicine | 67,386 | 597,699 | 5 (4, 10) | 0.24 (0.09, 0.47) | 0.39 |
Environmental | 199 | 12,707 | 26 (11, 59) | 0.62 (0.31, 0.95) | 0.85 |
Psychology | 605 | 23,563 | 18 (9, 40) | 0.37 (0.18, 0.61) | 0.78 |
Economics | 327 | 91,421 | 66 (30, 283) | 0.20 (0.09, 0.37) | 0.82 |
- Note: The number of estimates per meta-analysis (Estimates/MA) and the unweighted simple mean effect size of estimates within each meta-analysis (Effect sizes) are reported as medians with interquartile ranges (in parentheses). The proportion of statistically significant (Prop. significant) meta-analytic effect size estimates is based on a random-effects meta-analysis estimated via restricted maximum likelihood (removing one environmental sciences and 275 medical meta-analyses that did not converge).
We summarize results from all meta-analyses, apart from seven medical meta-analyses that did not converge. See Supplementary Materials for analyses showing that matching meta-analyses based on the number of primary estimates within each meta-analysis does not meaningfully affect the conclusions.
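For reference, the sketch below shows how the per-meta-analysis quantities summarized in Table 1 can be computed with the metafor R package; the data are invented and the object and column names are illustrative.

```r
library(metafor)

# One invented meta-analysis with effect sizes (yi) and sampling variances (vi);
# a full analysis would loop over all meta-analyses within a field
meta <- data.frame(yi = c(0.21, 0.35, 0.12, 0.40, 0.18),
                   vi = c(0.020, 0.035, 0.015, 0.050, 0.025))

k       <- nrow(meta)        # estimates per meta-analysis (Estimates/MA)
mean_es <- mean(meta$yi)     # unweighted simple mean effect size (Effect sizes)

# Random-effects model via restricted maximum likelihood (Prop. significant)
re_fit      <- rma(yi = yi, vi = vi, data = meta, method = "REML")
significant <- re_fit$pval < 0.05

c(k = k, mean_es = round(mean_es, 2), significant = significant)
```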
3.2 Evidence for the effect
Figure 1 shows medians, interquartile ranges, and distributions of the posterior probability of an effect before and after adjusting for PSB. These distributions reveal several patterns. First, meta-analyses in economics, psychology, and environmental sciences predominantly show evidence for an effect before adjusting for PSB (unadjusted), whereas meta-analyses in medicine often display evidence against an effect. This disparity between the fields remains even when comparing meta-analyses with equal numbers of effect size estimates (see Supplementary Materials). After correcting for PSB, the posterior probability of an effect drops much more in economics, psychology, and environmental sciences (medians drop from 0.999 to 0.297, from 0.989 to 0.557, and from 0.998 to 0.707, respectively) compared to medicine (from 0.380 to 0.297). The pattern is especially striking in economics, where the median posterior probability of an effect drops by more than seventy percentage points after PSB correction. Mean decreases in posterior probabilities show a similar pattern but with somewhat smaller reductions (Tables 9 and 10 in the Supplementary Materials). In all four disciplines, adjusting for PSB resulted in a substantial decrease in the strength of evidence for the effect: the proportion of meta-analyses with at least strong evidence for the presence of an effect decreased from 20.2% to 5.3% in medicine, from 72.4% to 30.7% in environmental sciences, from 59.8% to 27.3% in psychology, and from 72.8% to 19.6% in economics. A comparable decrease was also present when comparing the proportion of meta-analyses with at least moderate evidence for the presence of an effect (i.e., a Bayes factor larger than 3): from 28.9% to 12.3% in medicine, from 80.4% to 47.3% in environmental sciences, from 67.4% to 38.4% in psychology, and from 76.8% to 27.6% in economics.
FIGURE 1 Posterior probabilities of the presence of an effect before and after adjusting for publication selection bias in each field (medians, interquartile ranges, and full distributions).
Furthermore, we quantify the inflation of evidence in favor of an effect in meta-analyses via the evidence inflation factor, that is, the increase in the Bayes factor in favor of the effect due to PSB. We find that meta-analyses in economics inflate the evidence by a median factor of 11,369, whereas meta-analyses in environmental sciences and psychology inflate the evidence by considerably smaller median factors, and meta-analyses in medicine by the smallest median factor. These extreme differences between the fields are largely driven by the disparity in the typical number of estimates per meta-analysis across the disciplines (Table 1). After standardizing the evidence inflation factor (sEIF) by the number of estimates per meta-analysis, we find that per-estimate evidence inflation is largest in psychology, followed by environmental sciences, economics, and medicine. Again, the strong evidence inflation in economics (a median factor of 11,369) is largely due to having many more estimates per meta-analysis than psychology and medicine. However, even after adjusting for the different numbers of estimates, meta-analyses in medicine still show the least inflated evidence due to PSB.
3.3 Effect size estimates
Table 2 summarizes the effect of PSB on effect sizes in each field. The first column reveals that environmental sciences, on average, suffer from an absolute bias as much as two and a half times larger than that in medicine, economics, or psychology. The degree of absolute bias in environmental sciences is so large that it is comparable to the average unadjusted effect sizes in the other fields. Otherwise, medicine, psychology, and economics share a comparable degree of absolute bias. The median absolute bias in each field is lower than the mean bias due to the right-skewed distribution of absolute biases (see Table 4 in the Supplementary Materials).
Field | Absolute bias | Overestimation factor |
---|---|---|
Medicine | 0.13 [0.12, 0.13] | 1.62 [1.60, 1.64] |
Environmental | 0.33 [0.24, 0.41] | 1.78 [1.42, 2.13] |
Psychology | 0.13 [0.11, 0.14] | 1.39 [1.24, 1.55] |
Economics | 0.15 [0.13, 0.17] | 2.16 [1.69, 2.64] |
- Note: The results are based on the comparison of publication bias adjusted meta-analytic effect size estimates assuming the presence of the effect to the mean effect sizes per meta-analysis. The table displays means and 95% confidence intervals. (See Table 4 in the Supplementary Materials for medians and interquartile ranges.)
The second column of Table 2 displays the relative impact of PSB on meta-analytic estimates via the overestimation factor. On average, economics meta-analyses are, in relative terms, the most exaggerated by PSB, with effect sizes inflated by more than a factor of two; this corroborates a prior survey on power and bias.10 Effect sizes in environmental sciences and medical meta-analyses show smaller yet notable relative inflation. Finally, effect sizes in psychological meta-analyses are the least inflated, with the average effect size exaggerated by about 40%. In each field, the distribution of overestimation factors is also right-skewed; consequently, the per meta-analysis median overestimation factor is lower than the mean (see Table 4 in the Supplementary Materials). The median overestimation factor is relatively stable or decreasing with an increasing number of effect size estimates per meta-analysis, suggesting that the number of estimates per meta-analysis does not drive the relative size of PSB.
3.4 Evidence for publication selection bias
Figure 2 shows medians, interquartile ranges, and distributions of the posterior probability of PSB in each field. We find the most evidence for PSB in economics, where the typical evidence for the presence of publication bias is moderate. Meta-analyses in environmental sciences and psychology typically have only weak evidence in favor of PSB, even though the proportion of meta-analyses showing at least moderate evidence for PSB is still considerable (32.2% and 27.4%, respectively). Meta-analyses in medicine show the lowest proportion of at least moderate evidence in favor of PSB (12.9%). However, the proportion of meta-analyses consistent with the evidence of absence of PSB (i.e., at least moderate evidence against PSB) is also the lowest in medicine (2.6%), indicating that the majority of medical meta-analyses are not informative enough to provide compelling evidence for or against publication bias. The proportion of meta-analyses with at least moderate evidence against PSB is somewhat higher in psychology (7.1%), economics (12.2%), and environmental sciences (12.6%). Meta-analyses with a larger number of effect size estimates present slightly more evidence in favor of PSB; however, the overall disparity between the fields remains when comparing meta-analyses with a matched number of effect size estimates (Table 17 in the Supplementary Materials).
FIGURE 2 Posterior probabilities of the presence of publication selection bias in each field (medians, interquartile ranges, and full distributions).
4 CONCLUDING COMMENTS
We present a comprehensive assessment of publication selection bias and its effects on meta-analyses across medicine, environmental sciences, psychology, and economics. Novel methods and measures allowed us to quantify the evidence for the presence or absence of the mean effect and of publication selection bias, as well as the inflation of the evidence for the effect due to publication selection bias. Furthermore, we estimated the absolute bias and the overestimation factor of the average effect sizes of estimates included in meta-analyses.
Our analysis is based on all effect size estimates found in these meta-analyses, regardless of the type of outcome or how they were analyzed. One can classify outcomes into three categories. First, some outcomes may have been pre-specified as being of primary interest for showing a desirable effect (e.g., the effectiveness of a medication in reducing the risk of death). Second, other outcomes are not pre-specified but may still be used to demonstrate some preferred result; these allow larger analytical flexibility (e.g., using alternative measures of effectiveness) and are therefore potentially more affected by publication selection bias. Third, still other outcomes may have been collected and analyzed without any strong interest in showing a significant result, or even with some incentive to show non-significant results (e.g., outcomes on collected adverse events). Publication selection bias is expected mostly in the second category, while it may be weaker in the first category38 and may be entirely absent in the third category.
Furthermore, we assumed independence of the reported primary estimates within and between meta-analyses; that is, each reported estimate is regarded as providing the same amount of new information as every other reported estimate. However, estimates may be dependent between meta-analyses (e.g., a single estimate might be used across multiple meta-analyses) and within meta-analyses (e.g., multiple estimates obtained from a single study or primary data set). As our data do not allow us to tackle those dependencies directly, we discuss how each independence violation might affect the results. The between meta-analysis dependency of estimates is of lesser importance, as our inferences concern the population of meta-analyses; consequently, it would only affect descriptive summaries of the estimates themselves. The within meta-analysis dependency of estimates is more problematic and can lead to (1) overestimation of the strength of evidence, as the same primary data set is conditioned upon more than once, and (2) placing more weight on studies with multiple estimates. The first issue is partially mitigated via the standardized evidence inflation factor, which assesses the average evidence contribution of an estimate, that is, adjusting for the number of data sets conditioned upon. However, the absolute measures of evidence (i.e., the evidence for the presence of the effect before and after publication bias adjustment, or the evidence for publication bias) can be susceptible to overestimation, particularly in fields with relatively large within meta-analysis dependencies such as economics or environmental sciences (but see the Supplementary Materials for a comparison of meta-analyses with a matched number of effect size estimates). The second issue cannot be directly addressed; however, all presented measures are based on comparisons of two sets of models, both of which should be affected to a similar degree, thus hopefully canceling most of the bias generated by overweighting studies with multiple estimates. Overall, we cannot exclude that the observed between-field differences may at least partially result from systematic differences in how meta-analyses themselves are conducted.
The milder publication selection bias in medical meta-analyses corroborates previous findings and might have multiple concurrent explanations.9, 16, 19 First, as in other disciplines, a large share of those medical meta-analyses with seemingly strong evidence no longer had strong evidence after the PSB adjustment was made. However, a much lower proportion of medical meta-analyses showed strong evidence of an effect compared to the other disciplines. Therefore, the difference between medicine and the other disciplines might be explained by the higher proportion of meta-analyses in medicine that showed weak evidence for an effect already before adjusting for publication selection bias. Second, medical studies may measure phenomena that are simpler and more stable, using methods that are more solidly and universally codified, which reduces researchers' “degrees of freedom” in generating and publishing evidence.9, 16 Third, it is also possible that the milder publication selection bias seen in medical meta-analyses reflects a larger share of meta-analyses belonging to a category of outcomes with less pressure for publication selection bias. Finally, medical research makes wider use of research integrity practices, such as clinical trial registration, which might reduce the risk of publication selection bias.39 Perhaps medical research is, therefore, typically of a higher methodological quality and less subject to bias.9
In this paper, we documented the considerable impact of publication selection bias on meta-analyses in a variety of disciplines. Even though we can probe the footprint of these biases with the statistical techniques employed here, science ultimately needs to progress toward mitigating publication bias at the stage of conducting and publishing research. While the specific patterns of researchers' “degrees of freedom” and the causes of publication selection bias are likely to vary widely across fields, our results suggest that the social sciences might especially benefit from adopting practices to mitigate them, including preregistration, greater transparency, and registered reports.40-42
AUTHOR CONTRIBUTIONS
František Bartoš: Conceptualization; investigation; writing – original draft; writing – review and editing; visualization; validation; methodology; software; formal analysis; data curation. Maximilian Maier: Conceptualization; investigation; writing – original draft; methodology; validation; writing – review and editing; visualization; software; formal analysis; data curation. Eric-Jan Wagenmakers: Supervision; resources; methodology; writing – review and editing; writing – original draft; conceptualization; funding acquisition. Franziska Nippold: Investigation; writing – review and editing; resources. Hristos Doucouliagos: Investigation; writing – review and editing; resources. John Ioannidis: Investigation; writing – review and editing; resources. Willem M Otte: Resources; writing – review and editing; investigation. Martina Sladekova: Resources; writing – review and editing; investigation. Teshome K Deressa: Investigation; writing – review and editing; resources. Stephan B Bruns: Investigation; writing – review and editing; resources. Daniele Fanelli: Investigation; writing – review and editing; resources. T.D. Stanley: Supervision; resources; writing – review and editing; conceptualization; investigation; writing – original draft; methodology; project administration.
CONFLICT OF INTEREST STATEMENT
The authors declare no conflicts of interest.
ENDNOTE
Biographies
František Bartoš is a PhD candidate at the Department of Psychological Methods at the University of Amsterdam.
Maximilian Maier is a PhD candidate in Experimental Psychology at University College London.
Eric-Jan Wagenmakers is professor of Bayesian Methodology at the Department of Psychological Methods at the University of Amsterdam.
Franziska Nippold is a former research masters student of the Psychology department at the University of Amsterdam.
Hristos Doucouliagos is Emeritus Professor of Economics at Deakin University, Melbourne Australia.
John P. A. Ioannidis is Professor of Medicine, of Epidemiology and Population Health, and (by courtesy) of Biomedical Data Science, and of Statistics at Stanford University where he leads the Meta-Research Innovation Center at Stanford (METRICS).
Willem M. Otte is Associate Professor at Utrecht University and University Medical Center Utrecht, working on methods for clinical epidemiology.
Martina Sladekova is a Lecturer in Psychology and a PhD Candidate at the School of Psychology, University of Sussex.
Teshome K. Deressa is a PhD candidate at the Centre for Environmental Sciences at Hasselt University.
Stephan B. Bruns is Associate Professor of Environmental Economics at Hasselt University.
Daniele Fanelli is an Assistant Professor at the Doctoral Centre in the School of Social Sciences, Heriot-Watt University, and a Visiting Fellow at the Department of Methodology, London School of Economics and Political Science.
T. D. Stanley is Professor of Meta-Analysis, Emeritus, and Honorary Professor of Economics at Deakin University, Melbourne Australia. He is an elected Fellow of the Society of Research Synthesis Methods and Convener of the Meta-Analysis of Economics Research Network and was the Julia Mobley Professor of Economics at Hendrix College, Conway USA.
Open Research
DATA AVAILABILITY STATEMENT
The majority of the data is available from the referenced resources (with a portion of the raw data unavailable due to restrictions from the original owners). Anonymized summary estimates for reproducing the analyses are accessible at the OSF. See https://osf.io/bgfzp/ for data and analysis scripts.