Abstract
Ware & Munafo 1 usefully overview the causes and consequences of, and possible solutions to, the problem of irreproducibility that haunts many currently published results. I am on the same wavelength with almost everything they say. I will discuss here one additional aspect that usually receives less attention in the current discussions about irreproducibility: the need to test with formal experimental studies the multiple solutions that are proposed and to identify measurable outcomes for the benefits but also the potential harms of each proposed intervention.
Scientific practices are probably not yet totally broken. Most scientists still mean well, and plain fraud has not overrun science 2. However, scientific practices are fragile and can easily break, as many interested stakeholders pull from all sides. Any effort to improve practices may inadvertently have collateral harms. It behooves all of us who think about how to improve scientific practices to also consider what might go wrong with each intervention. Ideas on how to fix research are just ideas. They may look good, but may not work well in real life. We need to anticipate the potential harms and be prepared to measure them. Moreover, unpredictable harms may also arise and we should be ready to detect them. As with testing new drugs, one wants to be prepared to capture information on known adverse events, but also have sentinel systems in place to capture unanticipated side effects. The more thinking goes into potential benefits and harms of interventions, the better.
For example, many suggested improvements are based on reporting checklists (ranging from the methods checklist proposed by Nature 3 to the checklists contemplated by the National Institutes of Health (NIH) 4 and the numerous reporting standards checklists for different study designs summarized by EQUATOR (Enhancing the QUAlity and Transparency Of health Research) 5).
One may measure improvement by assessing whether adoption of checklists improved the completeness of reporting of requested items. However, this is naive. Improvement on this scale will occur by default. Investigators will comply if funding or acceptance of their paper for publication is dependent upon 27 check-marks. One should also be able to measure potential collateral harms; for example, how many studies report false or inaccurate (and thus misleading) information in their effort to satisfy requirements.
Another very good idea is the review of full manuscripts without results before any data collection 6. There are several potential harms here; for example, this may lead to spurious compliance with the pre-specified format in reporting the study, even though the study had to deviate from the original plan for some (good?) reason. Many studies, especially with human subjects, have to deviate substantially from the original plan during their conduct. Moreover, there is no guarantee that a pre-approved manuscript can eliminate or even meaningfully reduce data dredging, given that many subtle but influential choices in the analysis plan may not be fully transparent in the pre-approved manuscript 7. Finally, a practice where papers are pre-approved based on having sufficiently high power may force investigators to choose outcomes that have little scientific value, but good power to be assessed, as opposed to outcomes that are scientifically and/or clinically important but only modestly powered 8.
A third example is pre-registration. While an excellent idea, much research is simply exploratory and thus cannot be meaningfully pre-registered. Forcing pre-registration for all research will simply force investigators either to abandon exploratory discovery research or to seemingly pre-register research that has already happened, making the literature even more misleading.
I list here only three examples of policies which I personally support fervently, and which I have even been among the first to propose. I am still biased to believe that they are worthy of consideration. However, the devil is in the detail. I would loathe seeing a ‘perfect’ scientific literature where everything is pre-registered, all checklist items are checked and papers are written by robotic automata before the research is conducted, but no real progress is made. We need to find ways to improve science without destroying it. We need to find ways to reward excellence: this includes high impact, high quality, reproducibility, sharing culture and eventually the ability to translate information to useful knowledge and applications, and perhaps more 9. We should not compromise for less.
Declaration of interests: None.
The Meta-Research Innovation Center at Stanford is supported by a grant from the Laura and John Arnold Foundation.