Welcome to the website for the book
The War on Statistical Significance
The American Statistician vs.
the New England Journal of Medicine
by Donald B. Macnaughton
From the Preface
The “threshold p-value”—the arbiter of statistical significance—has been a widely used gateway to believability and acceptance for publication in scientific research since 1925. However, a growing number of statisticians and other researchers say we should “move beyond” these ideas, suggesting we should greatly reduce our emphasis on them in scientific research. These authors are waging a well-intentioned, polite, and vigorous intellectual war on the ideas of a threshold p-value and statistical significance. This is a “good” war, because it forces important issues into the open, where they can be best understood and assessed.
This book grew from a sense that the threshold-p-value gateway to publication of scientific research results is highly useful but is also widely misunderstood. The book presents, from first principles, a modern view of the role of the gateway, as used by some scientific journals. The ideas are explained in terms of the recent disagreement about them between the editorial in a Special Issue on Statistical Inference of the American Statistician and a subsequent editorial in the New England Journal of Medicine.
The ideas are developed with almost no reference to mathematics. (A computer can do all the standard math if the user properly understands the key ideas.) The explanations are reinforced with practical examples. The discussion shows how the concept of a threshold-p-value gateway helps researchers and journal editors maximize the overall scientific, social, and commercial benefit of scientific research. The gateway does this by optimally balancing the rates of costly “false-positive” and “false-negative” errors in a scientific journal.
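The trade-off between the two error types can be illustrated with a short calculation. The sketch below is not taken from the book. It assumes a simple two-sided z-test on a sample mean with a known standard deviation of 1, and the function name `error_rates`, the effect size of 0.5, and the sample size of 30 are all hypothetical choices made for illustration. It shows only that lowering the threshold p-value reduces the false-positive rate (when no relationship exists) while raising the false-negative rate (when a relationship of the assumed size does exist).

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def error_rates(threshold, effect_size, n):
    """Return (false_positive_rate, false_negative_rate) for a two-sided z-test.

    Illustrative assumptions (not from the book): the test statistic is a
    sample mean of n observations with known standard deviation 1, the null
    hypothesis says the true mean is 0, and the alternative says the true
    mean equals effect_size.
    """
    # Critical z-value: reject the null hypothesis when |z| exceeds this.
    z_crit = STD_NORMAL.inv_cdf(1 - threshold / 2)
    # Under the alternative, the z-statistic is centred on this shift.
    shift = effect_size * n ** 0.5
    # Power: probability of rejecting the null when the effect is real.
    power = (1 - STD_NORMAL.cdf(z_crit - shift)) + STD_NORMAL.cdf(-z_crit - shift)
    # When the null is true, the rejection probability equals the threshold.
    return threshold, 1 - power


# Hypothetical effect size and sample size, chosen only for illustration.
for threshold in (0.10, 0.05, 0.01, 0.005):
    fp, fn = error_rates(threshold, effect_size=0.5, n=30)
    print(f"threshold={threshold:<6} false-positive rate={fp:.3f} "
          f"false-negative rate={fn:.3f}")
```

These are rates conditional on whether the relationship exists. Choosing a best threshold for a journal would also depend on how often studied relationships are real and on the relative costs of the two error types, which this sketch does not model.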
The book also discusses the important related ideas of a relationship between variables, a scientific hypothesis test, and the “replication crisis” in some branches of scientific research.
The body of the book, which covers the key ideas, is roughly 30% of the text. The remainder consists of 23 appendices that expand the ideas in useful directions.
The material is aimed at scientific researchers, journal editors, science teachers, and science students in the biological, social, and physical sciences. It will also be of interest to statisticians, data scientists, philosophers of science, and lay readers seeking an integrated modern view of the high-level operation of the study of relationships between variables in scientific research.
Table of Contents
- Chapter 1: Introduction
- Chapter 2: A Practical Use of a Threshold p-Value in Scientific Publishing
- 2.1 The Study of Relationships between Variables
- 2.2 The Weight of Evidence for the Existence of a Relationship
- 2.3 The Research Hypothesis and the Null Hypothesis
- 2.4 Positive Results and Negative Results
- 2.5 How Do We Tell Whether the Weight of Evidence Is Enough?
- 2.6 False-Positive Errors
- 2.7 False-Negative Errors
- 2.8 The Threshold p-Value Balances the Two Error Types
- 2.9 Interpreting the p-Value and the Threshold
- 2.10 Why Must We Draw a Line?
- 2.11 The Optimal Threshold p-Value for a Journal
- 2.12 Why Not Set the Threshold p-Value on a Case-by-Case Basis?
- 2.13 Bending the Rules about p-Values
- 2.14 A Case When We Don’t Need a p-Value
- 2.15 Do p-Values Make Decisions?
- 2.16 Publication Implications of a Low-Enough p-Value
- 2.17 What About the “Lost” Effects?
- 2.18 Alternatives to the p-Value
- 2.19 Negative Results Failing to Replicate an Earlier Positive Result
- 2.20 The Form, Size, and Importance of a Detected Effect
- 2.21 The History of the p-Value
- 2.22 Other Parallel Views
- Chapter 3: The Disagreement between the NEJM and TAS
- 3.1 A View of the Disagreement
- 3.2 Should Journals Use Threshold Values?
- 3.3 Conclusions
- Appendix A: Contrasting the 2016 ASA Statement with the 2019 TAS Special Issue Editorial
- Appendix B: Notes for Beginners
- Appendix C: Details about Nine Measures of the Weight of Evidence
- C.1 A General Example
- C.2 p-Value
- C.3 t-Statistic
- C.4 Confidence Interval
- C.5 Likelihood Ratio
- C.6 Bayes Factor
- C.7 Posterior Probability That the Research Hypothesis Is True
- C.8 Second-Generation p-Value
- C.9 D-Value
- C.10 Information Criteria
- C.11 The Underlying Assumptions
- C.12 The Monotonic Relationships among the Nine Measures
- C.13 Generalization
- C.14 Comparing the Nine Measures
- Appendix D: The Distribution of p-Values
- Appendix E: Some Criticisms of the Ideas
- Appendix F: Unconvincing Arguments about the Preferred Measure
- Appendix G: Teaching p-Value Concepts to Beginners
- Appendix H: The Rate of Publication of False-Positive Results
- Appendix I: Examples of the Publication of Key Negative Results
- Appendix J: Do Research Studies Usually Study Relationships?
- Appendix K: The Objectivity–Subjectivity Dimension
- Appendix L: The Initiative to Preregister Empirical-Research Studies
- Appendix M: Campbell and Gustafson (2019) Implications
- Appendix N: Sharing Data, Programs, and Output
- Appendix O: The NEJM Approach to Threshold Values
- Appendix P: Effects of Abandoning the Threshold-Value Gateway
- Appendix Q: Comparing Hypothesis Testing with Popper’s Falsification
- Appendix R: The Jeffreys-Lindley Paradox
- Appendix S: Do Machine-Learning Systems Need Threshold Values?
- Appendix T: A Case When We Know the Exact Values of Parameters
- Appendix U: Should We Allow True Values of Parameters to Vary?
- Appendix V: Parameter Sign and Magnitude Errors
- Appendix W: Are the Ideas in This Book “Real”?
About the Author
Donald Macnaughton has been a statistical consultant for more than 40 years. He has managed the statistical aspects of research in the fields of experimental psychology, zoology, drug dependence, nursing, education, business, geography, physical education, and inmate rehabilitation, among others. His consulting work supports and informs his main interest, which is to read, understand, and write about the vital role of the field of statistics in scientific research.
The references in the book (with clickable links) are listed here.
The appendices of the book contain five graphs. The computer programs to generate the graphs are available in a ZIP file. These programs consist of source code in text files in the languages R and SAS. They will be useful to readers who wish to understand how the figures were generated. The programs are heavily annotated to assist understanding. Click here to download the ZIP file.
The book is available now in paperback from some booksellers (as listed below) and should be available from other major booksellers in early to mid-June 2021.
Title: The War on Statistical Significance: The American Statistician vs. the New England Journal of Medicine
Author: Donald B. Macnaughton
Page count: xvi + 240