Sequential Discovery, Thinking Versus Dredging, and Shrink or Sink

John P A Ioannidis

doi:10.1097/ede.0b013e31818207f6


Published 2008

Abstract


I am greatly honored by the very insightful comments of Senn,1 Willett,2 and Kraft3 on my commentary.4 Professor Senn1 nicely exemplifies that there is no need to invoke a sequential design to observe inflated effects. Any pilot (underpowered) study would do. Nevertheless, evidence in any field can be seen as sequential data accumulation. Meta-analysis, the total-evidence paradigm, is primarily a sequential (cumulative) enterprise5 and, for most topics, evidence is still in the pilot phase. As suggested by the Cochrane meta-analyses, even the best-conducted meta-analyses may have inflated effects, even after many trials are assembled. Moreover, I agree that additional problems beyond regression-to-the-mean are not necessary to see inflated effects. Yet, I believe I have mentioned enough empirical examples demonstrating inflationary practices4 (see also other writings6,7 and the Cochrane Methodology register8). We need more empirical evidence to measure the relative contribution of various reasons for inflated effects, but some healthy scepticism is warranted, as Senn suggests.
Professor Willett2 offers an extremely illuminating and dense statement that would solve the entire problem: “If the original question was reasonable, the results are of interest whether statistically significant or not.” However, the phrasing that follows, “many of us have published many ‘negative’ studies,” is alarming. Why not: “all of us have published primarily ‘negative’ studies, which is what one expects to get usually, even with careful thinking and meticulous design, while dredging the hell out of data”? We can debate how many “negative” studies are expected, but meanwhile 100% of prognostic studies in the International Journal of Cancer in 2005 reported statistically significant results.9 I report here an inflated estimate, the most spectacular percentage; the percentage across 343 journals (1575 articles) was 95.8%.
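Senn's point, that any underpowered study can show an inflated effect once attention is restricted to statistically significant results, can be illustrated with a small simulation. What follows is a minimal sketch and not part of the original commentary: the true effect (0.2), the standard error (0.2), the number of simulated studies, and the function name are all assumptions chosen only for illustration.

```python
import random
import statistics


def simulate_significant_estimates(true_effect, se, n_studies, seed=0):
    """Simulate study estimates and return (all estimates, significant ones).

    Each hypothetical study reports one estimate drawn from a normal
    distribution centred on the true effect with standard error `se`.
    """
    rng = random.Random(seed)
    estimates = [rng.gauss(true_effect, se) for _ in range(n_studies)]
    # Two-sided test at the 5% level: |estimate / se| > 1.96.
    significant = [e for e in estimates if abs(e / se) > 1.96]
    return estimates, significant


TRUE_EFFECT = 0.2  # assumed true effect, e.g. a log odds ratio
SE = 0.2           # assumed standard error of an underpowered study

all_est, sig_est = simulate_significant_estimates(TRUE_EFFECT, SE, 20000)
mean_all = statistics.fmean(all_est)
mean_sig = statistics.fmean(sig_est)
share_sig = len(sig_est) / len(all_est)

print(f"true effect:                   {TRUE_EFFECT:.3f}")
print(f"mean of all estimates:         {mean_all:.3f}")
print(f"mean of significant estimates: {mean_sig:.3f}")
print(f"share reaching significance:   {share_sig:.1%}")
```

Under these assumed values only a minority of the simulated studies reach significance, and their average estimate sits well above the assumed true effect, even though the average over all studies is unbiased. No sequential design or additional bias is needed to produce the inflation.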
I definitely don't define epidemiology as the collection of data that are sifted mindlessly. Epidemiology's strength is careful thinking and planning, both in exploration and replication. However, the average project and paper is not conceived, designed, conducted and reported by giants of epidemiologic thinking of Willett's calibre. I simply propose that one fully records the process (how this was done) so that everybody can see, admire, replicate, and possibly critique analyses that allow large vibration of effects. I agree that data go beyond simple 2 × 3 tables. This is only one more reason to present them as explicitly as possible, with full attention to any prior biological hypotheses, potential recognized sources of confounding, misclassification, complexity of temporal relationships, and recall and selection biases.10 Reporting 2 × 9 or even 2 × 9000 tables is trivial; computers can readily handle petabytes of information.11 Consortia can be instrumental in enhancing both standardization and transparency of information,12 and efforts such as the Pooling Project of Prospective Studies of Diet and Cancer should attract more followers.13
Introductory epidemiology needs some reappraisal. Specifically, “consistency with other biological information” sounds august, but we need more empirical evidence on its workings. Sometimes we think we know everything about biology, while we know little or nothing. Using “consistency with other biological information” subjectively as the post hoc guarantor of poor data dredging is problematic. The commentary3 by Kraft (with whom I fully agree) nicely exemplifies the dangers of trusting biological consistency by describing how much one particular field (genetic epidemiology) has changed through the advent of “agnostic” genome-wide association studies14 (of which I am a great enthusiast).
Several years ago, when I suggested that <10% to 30% of seemingly/partially replicated candidate genetic epidemiologic associations were true,6,15 many colleagues felt I was stubbornly unwilling to see the clear consistency of these candidate associations with other biological information. With the paradigm shift of the genome-wide association studies, the old ship was abandoned by its old-time enthusiasts to sink with all its cargo. (By the way, contrary to the deserters, I still believe that some candidate associations were true.) Now I hear talks celebrating that we have for the first time 200 (and rising) true associations, while 5 years ago the same lofty speakers were confident we already had 2000 true associations. What would happen if traditional epidemiology went through a similar paradigm shift in measurement capacity? Would many/most effects (including several “classics” of introductory epidemiology) shrink or sink? Interestingly, when we make more progress, apparently the associations that remain credible become fewer, claims for causality are toned down, and effects decrease. Kraft3 also highlights some of the dangers of stretching inflated effect sizes for predictive purposes. Certainly great caution is needed in the presentation and interpretation of this otherwise fascinating knowledge.16 One might argue that seasoned epidemiologists would be immune to tricks, e.g. the presentation of results as odds ratios of extreme centiles based on multiplicative models. However, empirical evidence shows that it is seasoned epidemiologists who use these tricks par excellence.17 Ordinary people and even most physicians don't even recognize what an odds ratio is.18 Educating thinking citizens may be useful, but is this feasible when even we scientists oversell our data?
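The remark about odds ratios of extreme centiles under multiplicative models can be made concrete with a little arithmetic. The sketch below is illustrative only and rests on stated assumptions that do not come from the commentary: a standard normal exposure, a log-linear (multiplicative) model with an assumed odds ratio of 1.2 per standard deviation, and a comparison of the mean exposure in the top 1% with the mean exposure in the bottom 1%. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def extreme_centile_odds_ratio(or_per_sd, tail_fraction):
    """Odds ratio comparing the top and bottom tails of a normal exposure.

    Simplification: each tail is represented by its mean exposure, and
    the log odds are assumed to be linear in the exposure.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    cutoff = std_normal.inv_cdf(1.0 - tail_fraction)
    # Mean of a standard normal variable above `cutoff`: pdf(cutoff) / tail mass.
    tail_mean = std_normal.pdf(cutoff) / tail_fraction
    # By symmetry the bottom tail has mean -tail_mean.
    contrast_in_sd = 2.0 * tail_mean
    return math.exp(contrast_in_sd * math.log(or_per_sd))


OR_PER_SD = 1.2       # assumed modest effect per standard deviation
TAIL_FRACTION = 0.01  # top 1% versus bottom 1%

or_top_vs_bottom = extreme_centile_odds_ratio(OR_PER_SD, TAIL_FRACTION)
print(f"odds ratio per SD:             {OR_PER_SD:.2f}")
print(f"odds ratio, top vs bottom 1%:  {or_top_vs_bottom:.2f}")
```

Under these assumptions the same underlying association looks far more impressive when expressed as a contrast of extreme centiles than when expressed per standard deviation, which is why the choice of presentation deserves scrutiny.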

