What Have We (Not) Learnt from Millions of Scientific Papers with <i>P</i> Values?

John P A Ioannidis

doi:10.1080/00031305.2018.1447512

Abstract

Null hypothesis significance testing (NHST) with P values is the most widely (mis)used method of statistical inference. Empirical data suggest that across the biomedical literature (1990–2015), when abstracts use P values, 96% of them report P values of 0.05 or less. The same percentage (96%) applies to full-text articles. Among 100 articles in PubMed, 55 report P values, while only 4 present confidence intervals for all the reported effect sizes, none use Bayesian methods, and none use false-discovery rate methods. Over 25 years (1990–2015), use of P values in abstracts has doubled across all of PubMed and tripled for meta-analyses, while for some types of designs, such as randomized trials, the majority of abstracts report P values. There is major selective reporting of P values. Abstracts tend to highlight the most favorable P values, and inferences use even further spin to reach exaggerated, unreliable conclusions. The availability of large-scale data on P values from many papers has allowed the development and application of methods that try to detect and model selection biases, for example, p-hacking, that cause patterns of excess significance. Inferences need to be cautious, as they depend on the assumptions made by these models and can be affected by the presence of other biases (e.g., confounding in observational studies). While much of the unreliability of past and present research is driven by small, underpowered studies, NHST with P values may also be particularly problematic in the era of overpowered big data. NHST and P values are optimal only in a minority of current research. Using a more stringent threshold, as in the recently proposed shift from P
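
The abstract's point about overpowered big data can be illustrated with a small calculation. The sketch below is not taken from the article: it assumes a two-sample z-test with equal group sizes and unit variance, and the function names and the standardized effect size of 0.01 are hypothetical choices for illustration. It shows how a fixed, practically negligible effect moves from clearly non-significant to far below the 0.05 threshold as the sample size grows.

```python
import math


def two_sided_p_from_z(z: float) -> float:
    """Two-sided P value for a standard normal test statistic z."""
    # For Z ~ N(0, 1): P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


def two_sample_p(effect_size: float, n_per_group: int) -> float:
    """P value for a two-sample z-test (assumed: equal n, unit variance).

    effect_size is the observed standardized mean difference; the
    standard error of that difference is sqrt(2 / n_per_group).
    """
    z = effect_size * math.sqrt(n_per_group / 2.0)
    return two_sided_p_from_z(z)


if __name__ == "__main__":
    effect = 0.01  # hypothetical, practically negligible standardized effect
    for n in (100, 10_000, 1_000_000, 10_000_000):
        print(f"n per group = {n:>10,d}  P = {two_sample_p(effect, n):.3g}")
```

Under these assumptions the P value is driven by the sample size alone, since the effect is held constant, which is one reason effect sizes and interval estimates are informative alongside a P value in very large datasets.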
