The War on Statistical Significance
The American Statistician vs. the New England Journal of Medicine


Al-Lamee, R., Thompson, D., Dehbi, H.-M., Sen, S., Tang, K., Davies, J., Keeble, T., et al. 2018. “Percutaneous Coronary Intervention in Stable Angina (ORBITA): A Double-Blind, Randomised Controlled Trial.” The Lancet 391: 31–40.

Arbuthnot, J. 1710. “II. An Argument for Divine Providence, Taken from the Constant Regularity Observed in the Births of Both Sexes.” Philosophical Transactions of the Royal Society of London 27: 186–90. Reprinted in Studies in the History of Statistics and Probability Volume II, eds. M. G. Kendall and R. L. Plackett, High Wycombe, UK: Griffin, 30–34.

ATLAS Collaboration. 2012a. “Combined Search for the Standard Model Higgs Boson in pp Collisions at √s = 7 TeV with the ATLAS Detector.” Physical Review D 86, 032003.

———. 2012b. “Observation of a New Particle in the Search for the Standard Model Higgs Boson with the ATLAS Detector at the LHC.” Physics Letters B 716: 1–29.

Baker, A. 2016. “Simplicity,” The Stanford Encyclopedia of Philosophy (Winter 2016, ed. E. N. Zalta).

Bayarri, M. J., Benjamin, D. J., Berger, J. O., and Sellke, T. M. 2016. “Rejection Odds and Rejection Ratios: A Proposal for Statistical Practice in Testing Hypotheses.” Journal of Mathematical Psychology 72: 90–103.

Bellhouse, D. R. 1993. “Invited Commentary: p Values, Hypothesis Tests, and Likelihood.” American Journal of Epidemiology 137: 497–99.

Benjamin, D. J., and Berger, J. O. 2019. “Three Recommendations for Improving the Use of p-Values.” American Statistician 73:sup1: 186–91.

Benjamin, D. J., Berger, J. O., Johannesson, M., Nosek, B. A., Wagenmakers, E.-J., Berk, R., Bollen, K. A., et al. 2017. “Redefine Statistical Significance.” Nature Human Behaviour 2: 6–10.

Benjamini, Y. 2020. “Selective Inference: The Silent Killer of Replicability.” Harvard Data Science Review 2(4).

Berger, J. O., and Berry, D. A. 1988. “Statistical Analysis and the Illusion of Objectivity.” American Scientist 76: 159–65.

Berger, J. O., and Sellke, T. 1987. “Testing a Point Null Hypothesis: The Irreconcilability of P Values and Evidence” (with discussion). Journal of the American Statistical Association 82: 112–39.

Berkson, J. 1938. “Some Difficulties of Interpretation Encountered in the Application of the Chi-Square Test.” Journal of the American Statistical Association 33: 526–36.

Bernoulli, D. “Recherches physiques et astronomiques, sur le problème proposé pour la seconde fois par l’Académie Royale des Sciences de Paris : quelle est la cause physique de l’inclinaison des plans des orbites des planètes par rapport au plan de l’Equateur de la révolution du soleil autour de […].” Paris : chez Bachelier, Libraire, 1808. ETH-Bibliothek Zürich, Rar 15947. Note: Todhunter (1865: 222–23) gives an English explanation of Bernoulli’s contribution.

BIPM (Bureau International des Poids et Mesures). 2006. “The International System of Units.”

Blume, J. D., Greevy, R. A., Welty, V. F., Smith, J. R., and Dupont, W. D. 2019. “An Introduction to Second-Generation p-Values.” American Statistician 73:sup1: 157–67.

Borenstein, M., Hedges, L. V., Higgins, J. P. T., and Rothstein, H. R. 2009. Introduction to Meta-Analysis. Chichester, UK: Wiley.

Brattain, M. 2007. “Race, Racism, and Antiracism: UNESCO and the Politics of Presenting Science to the Postwar Public.” American Historical Review 112: 1386–1413.

Camerer, C. F., Dreber, A., Holzmeister, F., Ho, T.-H., Huber, J., Johannesson, M., Kirchler, M., et al. 2018. “Evaluating the Replicability of Social Science Experiments in Nature and Science between 2010 and 2015.” Nature Human Behaviour 2: 637–44.

Campbell, H., and Gustafson, P. 2019. “The World of Research Has Gone Berserk: Modeling the Consequences of Requiring ‘Greater Statistical Stringency’ for Scientific Publication.” American Statistician 73:sup1: 358–73.

Casella, G., and Berger, R. L. 2002. Statistical Inference. 2nd ed. Delhi, India: Cengage Learning India.

Center for Open Science. 2021a. “Registered Reports: Peer-Review before Results Are Known to Align Scientific Values and Practices.” Accessed January 13, 2021.

———. 2021b. “The TOP Guidelines.” Accessed January 13, 2021.

Chatterjee, S., and Hadi, A. S. 2012. Regression Analysis by Example. Hoboken, NJ: Wiley.

“Checklist checked [online title: Checklists Work to Improve Science],” editorial. 2018. Nature 556: 273–74.

Clarivate Analytics. 2020a. Journal Citation Reports [online].

———. 2020b. [Journal Citation Reports infographic web page].

CMS Collaboration. 2012. “Observation of a New Boson at a Mass of 125 GeV with the CMS Experiment at the LHC.” Physics Letters B 716: 30–61.

Colquhoun, D. 2019. “The False Positive Risk: A Proposal Concerning What to Do About p-Values.” American Statistician 73:sup1: 192–201.

Cox, D. R. 2006. Principles of Statistical Inference. Cambridge, UK: Cambridge University Press.

———. 2020. “Statistical Significance.” Annual Review of Statistics and Its Application 7: 1–10.

Della Negra, M., Jenni, P., and Virdee, T. S. 2012. “Journey in the Search for the Higgs Boson: The ATLAS and CMS Experiments at the Large Hadron Collider.” Science 338: 1560–68.

Demidenko, E. 2016. “The p-Value You Can’t Buy.” American Statistician 70: 33–38.

Denworth, L. 2019. “A Significant Problem [online title: The Significant Problem of P Values].” Scientific American 321 (4, Oct): 63–67.

Dienes, Z. 2011. “Bayesian versus Orthodox Statistics: Which Side Are You On?” Perspectives on Psychological Science 6: 274–90.

Edgeworth, F. Y. 1885a. “Observations and Statistics: An Essay on the Theory of Errors of Observation and the First Principles of Statistics.” Transactions of the Cambridge Philosophical Society 14: 138–69. Abstracted in Proceedings of the Cambridge Philosophical Society 5: 310–12. Corrigendum in Proceedings of the Cambridge Philosophical Society 6: 101–2.

———. 1885b. “Methods of Statistics.” Journal of the Statistical Society of London, Jubilee Volume: 181–217.

———. 1885c. “On Methods of Ascertaining Variations in the Rate of Births, Deaths and Marriages.” Journal of the Royal Statistical Society 48: 628–49.

———. 1885d. “The Calculus of Probabilities Applied to Psychical Research.” Proceedings of the Society for Psychical Research 3: 190–99.

Efron, B., and Hastie, T. 2016. Computer Age Statistical Inference. New York: Cambridge University Press.

Efron, B., Hastie, T., Johnstone, I., and Tibshirani, R. 2004. “Least Angle Regression” (with discussion). Annals of Statistics 32: 407–99.

Fisher, R. A. 1925. Statistical Methods for Research Workers. Edinburgh: Oliver and Boyd. The 14th edition appears in Fisher (1991).

———. 1926. “The Arrangement of Field Experiments.” Journal of the Ministry of Agriculture 33: 503–13.

———. 1991. Statistical Methods, Experimental Design, and Scientific Inference. Edited by J. H. Bennett, Oxford: Oxford University Press.

Fleischmann, M., and Pons, S. 1989. “Electrochemically Induced Nuclear Fusion of Deuterium.” Journal of Electroanalytical Chemistry 261: 301–8. See also Errata in volume 263: 187–88.

Fraser, D. A. S., Reid, N., and Lin, W. 2018. “When Should Modes of Inference Disagree? Some Simple but Challenging Examples.” Annals of Applied Statistics 12: 750–70.

Fricker, R. D., Jr., Burke, K., Han, X., and Woodall, W. H. 2019. “Assessing the Statistical Analyses Used in Basic and Applied Social Psychology after Their p-Value Ban.” American Statistician 73:sup1: 374–84.

Gelman, A. 2015. “The Connection between Varying Treatment Effects and the Crisis of Unreplicable Research: A Bayesian Perspective.” Journal of Management 41: 632–43.

Gelman, A., and Carlin, J. 2014. “Beyond Power Calculations: Assessing Type S (Sign) and Type M (Magnitude) Errors.” Perspectives on Psychological Science 9: 641–51.

Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B. 2014. Bayesian Data Analysis. 3rd ed. Boca Raton, FL: CRC Press.

Gelman, A., and Hennig, C. 2017. “Beyond Subjective and Objective in Statistics.” Journal of the Royal Statistical Society Series A 180: 967–1033.

Gelman, A., and Tuerlinckx, F. 2000. “Type S Error Rates for Classical and Bayesian Single and Multiple Comparison Procedures.” Computational Statistics 15: 373–90.

Gosset, W. S. 1908. See Student (1908).

Hardwicke, T. E., Serghiou, S., Janiaud, P., Danchev, V., Crüwell, S., Goodman, S. N., and Ioannidis, J. P. A. 2020. “Calibrating the Scientific Ecosystem through Meta-Research.” Annual Review of Statistics and Its Application 7: 11–37.

Harrell, F. E., Jr. 2020. “p-Values and Type I Errors Are Not the Probabilities We Need” (blog entry). Accessed January 14, 2021.

Harrington, D., D’Agostino, R. B., Sr., Gatsonis, C., Hogan, J. W., Hunter, D. J., Normand, S. T., Drazen, J. M., and Hamel, M. B. 2019. “New Guidelines for Statistical Reporting in the Journal.” New England Journal of Medicine 381: 285–86.

Hedges, L. V., and Olkin, I. 1985. Statistical Methods for Meta-Analysis. San Diego, CA: Academic Press.

Held, L., and Ott, M. 2016. “How the Maximal Evidence of p-Values against Point Null Hypotheses Depends on Sample Size.” American Statistician 70: 335–41.

———. 2018. “On p-Values and Bayes Factors.” Annual Review of Statistics and Its Application 5: 393–419.

Hodges, J. L., Jr., and Lehmann, E. L. 1954. “Testing the Approximate Validity of Statistical Hypotheses.” Journal of the Royal Statistical Society Series B 16: 261–68.

Huizenga, J. R. 1993. Cold Fusion: The Scientific Fiasco of the Century. New York: Oxford University Press.

Ioannidis, J. P. A. 2005. “Why Most Published Research Findings Are False.” PLoS Medicine 2 (8): e124.

———. 2008. “Effect of Formal Statistical Significance on the Credibility of Observational Associations.” American Journal of Epidemiology 168: 374–83.

———. 2019a. “The Importance of Predefined Rules and Prespecified Statistical Analyses: Do Not Abandon Significance.” Journal of the American Medical Association 321 (21): 2067–68.

———. 2019b. “Retiring Significance: A Free Pass to Bias,” letter to the editor. Nature 567: 461.

Jager, L., and Leek, J. T. 2014. “An Estimate of the Science-Wise False Discovery Rate and Application to the Top Medical Literature” (with discussion). Biostatistics 15: 1–45.

Ji, X. 2017. “Dark Matter Remains Elusive.” Nature 542: 172.

Johnson, V. E. 2019. “Evidence from Marginally Significant t Statistics.” American Statistician 73:sup1: 129–34.

Kafadar, K. 2019a. “President’s Corner: Statistics and Unintended Consequences.” AmStat News, June: 3–4.

———. 2019b. “President’s Corner: The Year in Review … And More to Come.” AmStat News, December: 3–4.

Kass, R. E., and Raftery, A. E. 1995. “Bayes Factors.” Journal of the American Statistical Association 90: 773–95.

Kennedy-Shaffer, L. 2019. “Before p < 0.05 to Beyond p < 0.05: Using History to Contextualize p-Values and Significance Testing.” American Statistician 73:sup1: 82–90.

Konishi, S., and Kitagawa, G. 2008. Information Criteria and Statistical Modelling. New York: Springer.

LeCun, Y., Bengio, Y., and Hinton, G. 2015. “Deep Learning.” Nature 521: 436–44.

Lehmann, E. L., and Romano, J. P. 2005. Testing Statistical Hypotheses. 3rd ed. New York: Springer.

Louçã, F. 2009. “Emancipation through Interaction—How Eugenics and Statistics Converged and Diverged.” Journal of the History of Biology 42: 649–84.

Macnaughton, D. B. 2016. “Comment on ‘A Low-Uncertainty Measurement of the Boltzmann Constant’.” Metrologia 53: 108–15.

———. 2021. Computer programs for the present book.

Mason, S. F. 1962. A History of the Sciences. Rev. ed. New York: Collier Books.

Mayo, D. G. 2014. “On the Birnbaum Argument for the Strong Likelihood Principle” (with discussion). Statistical Science 29: 227–66.

———. 2018. Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars. Cambridge, UK: Cambridge University Press.

McGrayne, S. B. 2011. The Theory That Would Not Die: How Bayes’ Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant from Two Centuries of Controversy. New Haven, CT: Yale University Press.

McShane, B. B., Gal, D., Gelman, A., Robert, C., and Tackett, J. L. 2019. “Abandon Statistical Significance.” American Statistician 73:sup1: 235–45.

McShane, B. B., Tackett, J. L., Böckenholt, U., and Gelman, A. 2019. “Large-Scale Replication Projects in Contemporary Psychological Research.” American Statistician 73:sup1: 99–105.

Michelson, A. A., and Morley, E. W. 1887. “On the Relative Motion of the Earth and the Luminiferous Ether.” American Journal of Science (third series) 34: 333–45.

Mohr, P. J., Newell, D. B., and Taylor, B. N. 2016. “CODATA Recommended Values of the Fundamental Physical Constants: 2014.” Reviews of Modern Physics 88: 1–73.

Murphy, K. R., Myors, B., and Wolach, A. 2014. Statistical Power Analysis: A Simple and General Model for Traditional and Modern Hypothesis Tests. 4th ed. New York: Routledge.

NEJM (New England Journal of Medicine). 2021. “Statistical Reporting Guidelines” [online]. Accessed January 14, 2021.

Newton, I. 1687 (1999). The Principia: Mathematical Principles of Natural Philosophy. Translated by I. B. Cohen and A. Whitman, Berkeley, CA: University of California Press.

Neyman, J., and Pearson, E. S. 1933. “IX. On the Problem of the Most Efficient Tests of Statistical Hypotheses.” Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character 231: 289–337.

Nosek, B. A., Ebersole, C. R., DeHaven, A. C., and Mellor, D. T. 2018. “The Preregistration Revolution.” Proceedings of the National Academy of Sciences 115 (11): 2600–2606.

Palus, S. 2018. “Make Research Reproducible.” Scientific American 319 (4, October): 56–59.

Pearson, K. 1900. “On the Criterion That a Given System of Deviations from the Probable in the Case of a Correlated System of Variables Is Such That It Can Be Reasonably Supposed to Have Arisen from Random Sampling.” Philosophical Magazine Series 5, 50: 157–75.

———. 1905. National Life from the Standpoint of Science. 2nd ed. London: Adam and Charles Black.

Pfizer Inc. 2020. “Pfizer and BioNTech Conclude Phase 3 Study of COVID-19 Vaccine Candidate, Meeting All Primary Efficacy Endpoints” (press release). Accessed December 20, 2020.

Pike, H. 2019. “Statistical Significance Should Be Abandoned, Say Scientists.” BMJ 364: l1374.

Polack, F. P., Thomas, S. J., Kitchin, N., Absalon, J., Gurtman, A., Lockhart, S., Perez, J. L., et al. 2020. “Safety and Efficacy of the BNT162b2 mRNA Covid-19 Vaccine.” New England Journal of Medicine (online version). Accessed December 20, 2020.

Popper, K. R. 1980. The Logic of Scientific Discovery. London: Routledge.

———. 1989. Conjectures and Refutations: The Growth of Scientific Knowledge. London: Routledge.

———. 1992. Realism and the Aim of Science. London: Routledge.

Provine, W. B. 1986. “Geneticists and Race.” American Zoologist 26: 857–87.

Roche, J. J. 1998. The Mathematics of Measurement: A Threshold History. London: Athlone Press.

Rosenthal, R. 1979. “The ‘File Drawer Problem’ and Tolerance for Null Results.” Psychological Bulletin 86: 638–41.

Rosnow, R. L., and Rosenthal, R. 1989. “Statistical Procedures and the Justification of Knowledge in Psychological Science.” American Psychologist 44 (10): 1276–84.

Rouder, J. N. 2014. “Optional Stopping: No Problem for Bayesians.” Psychonomic Bulletin & Review 21: 301–8.

SAS Institute. 2021. “GLMSELECT Procedure > Details > Criteria Used in Model Selection Methods.” In the “Procedures” section of SAS/STAT User’s Guide. Accessed January 14, 2021.

Sellke, T., Bayarri, M. J., and Berger, J. O. 2001. “Calibration of p Values for Testing Precise Null Hypotheses.” American Statistician 55: 62–71.

Sen, A., and Srivastava, M. 1990. Regression Analysis: Theory, Methods, and Applications. New York: Springer.

“Significant Debate” [online title: “It’s Time to Talk About Ditching Statistical Significance”], editorial. 2019. Nature 567: 283.

Spertus, J. A., Jones, P. G., Maron, D. J., O’Brien, S. M., Reynolds, H. R., Rosenberg, Y., Stone, G. W., et al. 2020. “Health-Status Outcomes with Invasive or Conservative Care in Coronary Disease.” New England Journal of Medicine 382: 1408–19.

Spiegelhalter, D. J., Abrams, K. R., and Myles, J. P. 2004. Bayesian Approaches to Clinical Trials and Health-Care Evaluation. Chichester, UK: Wiley.

Stigler, S. M. 1978. “Francis Ysidro Edgeworth, Statistician.” Journal of the Royal Statistical Society Series A 141, Part 3: 287–322.

———. 1986. The History of Statistics: The Measurement of Uncertainty before 1900. Cambridge, MA: Belknap Press.

———. 1999. Statistics on the Table: The History of Statistical Concepts and Methods. Cambridge, MA: Harvard University Press.

———. 2016. The Seven Pillars of Statistical Wisdom. Cambridge, MA: Harvard University Press.

Student. 1908. “The Probable Error of a Mean.” Biometrika 6: 1–25.

Tibshirani, R. 1996. “Regression Shrinkage and Selection via the Lasso.” Journal of the Royal Statistical Society Series B 58: 267–88.

Todhunter, I. 1865. A History of the Mathematical Theory of Probability from the Time of Pascal to That of Laplace. New York: Hardgrass.

Trafimow, D. 2020. In “The NISS Statistics Debate,” an online debate held on October 15, 2020. The relevant comments begin at 8 minutes, 45 seconds.

UNESCO (United Nations Educational, Scientific and Cultural Organization). 1952. The Race Concept: Results of an Inquiry. Document code SS.53/II.9/A.

Wackerly, D. D., Mendenhall, W., III, and Scheaffer, R. L. 2008. Mathematical Statistics with Applications. Belmont, CA: Brooks/Cole.

Wagenmakers, E.-J. 2007. “A Practical Solution to the Pervasive Problems of p Values.” Psychonomic Bulletin & Review 14: 779–804.

Wald, A. 1950. Statistical Decision Functions. New York: Wiley.

Wasserstein, R. L., ed. 2016. “ASA Statement on Statistical Significance and p-Values.” American Statistician 70: 131–33.

———. 2018. “ASA Journals Policy regarding Data Sharing and Reproducibility,” blog post on January 17 to the “ASA Connect” online community of the American Statistical Association.

———. 2019. “Moving to a World Beyond p < 0.05.” Talk given at “Real World Significance Beyond p-Value” online seminar on May 21, 2019. The relevant comments begin at 31 minutes 5 seconds.

Wasserstein, R. L., Schirm, A. L., and Lazar, N. A. 2019. “Moving to a World Beyond ‘p < 0.05’,” editorial. American Statistician 73:sup1: 1–19.

Weiss, S. F. 2010. “After the Fall: Political Whitewashing, Professional Posturing, and Personal Refashioning in the Postwar Career of Otmar Freiherr von Verschuer.” Isis 101: 722–58.

Wellek, S. 2010. Testing Statistical Hypotheses of Equivalence and Noninferiority. 2nd ed. Boca Raton, FL: CRC Press.

Wetzels, R., Matzke, D., Lee, M. D., Rouder, J. N., Iverson, G. J., and Wagenmakers, E.-J. 2011. “Statistical Evidence in Experimental Psychology: An Empirical Comparison Using 855 t Tests.” Perspectives on Psychological Science 6: 291–98.

Wikipedia contributors. 2021. “Michelson-Morley experiment.” Accessed January 14, 2021.