Between two stools: preclinical research, reproducibility, and statistical design of experiments

Abstract

Translation of animal-based preclinical research is hampered by poor validity and reproducibility issues. Unfortunately, preclinical research has ‘fallen between the stools’ of competing study design traditions. Preclinical studies are often characterised by small sample sizes, large variability, and ‘problem’ data. Although Fisher-type designs with randomisation and blocking are appropriate and have been vigorously promoted, structured statistically-based designs are almost unknown. Traditional analysis methods are commonly misapplied, and basic terminology and principles of inference testing misinterpreted. Problems are compounded by the lack of adequate statistical training for researchers, and failure of statistical educators to account for the unique demands of preclinical research. The solution is a return to the basics: statistical education tailored to non-statistician investigators, with clear communication of statistical concepts, and curricula that address design and data issues specific to preclinical research. Statistics curricula should focus on statistics as process: data sampling and study design before analysis and inference. Properly-designed and analysed experiments are a matter of ethics as much as procedure. Shifting the focus of statistical education from rote hypothesis testing to sound methodology will reduce the numbers of animals wasted in noninformative experiments and increase overall scientific quality and value of published research.

Introduction

“…I think we’re falling between two stools at the moment.… I think we have to take a step backward and address the basics of our game.”

––Donal Lenihan 25 Nov 2020, RTÉ Rugby Podcast, on Ireland’s need to revise training strategy following a string of defeats to England.

Criticism of much animal-based preclinical research has centred on reproducibility issues and poor translation [1, 2]. Causes are systemic and multifactorial, and include poor model fidelity, clinical irrelevance of target biomarkers or molecular pathways, and between-lab disparities in models and procedures [3, 4]. Difficulties in verifying and replicating methodology [5] and methodological issues related to poor statistical design and analysis are also major contributors [6,7,8,9,10]. Translational failure has massive economic repercussions. Advances in the development of therapeutic agents or diagnostics are more than offset by multimillion-dollar losses in investment and, ultimately, unsustainable research and development costs [6, 11, 12]. There is also a significant ethical component to these failures. If questionable methodology produces biased or invalid results, evidence derived from animal-based research cannot be a reliable bridge to human clinical trials [13]. It is difficult to justify the continued use of millions of animals each year if the majority are wasted in non-informative experiments that fail to produce tangible benefit.

In this commentary, I suggest that preclinical research has ‘fallen between two stools’: it conforms to neither the clinical trial nor the agricultural research tradition or skill set, and shows little of the rigour of either. The solution is a return to the basics for statistical educators and consultants: statistical training explicitly tailored to non-statistician investigators, and coverage of statistical issues and topics relevant to preclinical research. In particular, I urge a change in focus from statistics as ‘just maths’ to statistics as process. I argue that reform of introductory statistics curricula along these lines could go far to reverse statistical pathologies common to much of the preclinical research literature.

Main text

Two stools of competing traditions

The clinical trial and agricultural/industrial research traditions show considerable divergence in focus and methodology. Clinical trials are performed when there is uncertainty regarding relative efficacy of a specific clinical intervention. They are constrained by the necessity to minimize subject risk of mortality and severe adverse events. In general, clinical trials tend to be relatively large and simple, with only two or a few comparator interventions randomly assigned to many subjects, ideally representative of the target population. Although clinical trials have a history going back several hundred years (e.g. [14]), the randomized controlled trial (RCT) as the gold standard was a relatively recent development, with the first modern RCT performed in 1946 [15, 16], and formalisation only in the late 1970s. Lagging implementation was due in part to resistance to the so-called “numerical approach” by supporters of the non-randomised “let's-try-it-and-see” attitude to clinical research problems [17, 18]. Meanwhile, methodology for observational studies was being developed in parallel. Cohort studies in particular have had a key role in epidemiological investigations of carcinogenic and environmental hazards when RCTs are not feasible [19]. Because factors are not randomly assigned to subjects, inferring causality requires stringent methodological safeguards for minimising confounding and bias [15, 20, 21].

In contrast, agricultural/industrial designs are characterised by small sample sizes and multiple factors studied simultaneously. In addition to randomisation, key design features include replication and blocking (‘local control’), coupled with formal statistically-structured arrangements of input variables, such as randomised complete block and factorial designs [22]. Agricultural designs were developed primarily by Sir Ronald Fisher in the first half of the twentieth century. These principles were subsequently extended to industrial experimentation by George Box and collaborators [23]. Industrial experiments are further distinguished by sequential implementation (data from a small or restricted group of runs in the original experiment can be used to inform the next experiment), with prompt feedback (immediacy), allowing iteration and relatively rapid convergence to target solutions [24]. For these applications, variable screening and model building are both of interest, and ‘design’ is essentially the imposition of a statistical model as a useful approximation to the response of interest [23, 25].
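To make the contrast concrete, the short Python sketch below lays out a hypothetical randomised complete block design crossed with a 2 × 2 factorial, in the Fisher tradition described above. It is an illustration only: the factor names, block labels, and group sizes are invented for the example and are not taken from any cited study.

```python
import itertools
import random

random.seed(1)  # fixed seed so the allocation sequence can be reproduced and audited

# Hypothetical 2 x 2 factorial: two input factors, each at two levels
factors = {
    "drug": ["vehicle", "compound_X"],
    "diet": ["standard", "high_fat"],
}
treatments = list(itertools.product(*factors.values()))  # 4 factor-level combinations

# Blocking ('local control'): each litter receives every treatment combination once
blocks = ["litter_1", "litter_2", "litter_3"]

design = []
for block in blocks:
    runs = treatments[:]
    random.shuffle(runs)  # randomise allocation of combinations within each block
    for i, (drug, diet) in enumerate(runs, start=1):
        design.append({"block": block, "animal": f"{block}_a{i}", "drug": drug, "diet": diet})

for row in design:
    print(row)
```

In this layout every litter contributes one animal to each of the four treatment combinations, so litter-to-litter variation is separated from the factor effects rather than inflating the error term, and both main effects and their interaction can be estimated from only twelve animals.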

Preclinical studies: between the stools

Animal-based research studies are unique for the explicit ethical obligation to minimise the numbers of animals used. Application of the Three Rs (Replacement, Reduction, Refinement) principles is based on the premise that maximum scientific value should be obtained with minimal harms [26]. However, over-emphasis on numbers reduction has contributed to underpowered experiments generating unreliable, and ultimately noninformative, results [27, 28].
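The cost of reflexively shrinking group sizes can be made concrete with a standard power calculation. The sketch below uses the statsmodels power routines (assumed to be available); the effect size and error rates are illustrative values, not figures drawn from the cited studies.

```python
# Illustrative two-group power calculation (values are examples, not from the article).
# Requires statsmodels: pip install statsmodels
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Animals per group needed to detect a 'large' standardised effect (d = 0.8)
# with a two-sided alpha of 0.05 and 80% power.
n_per_group = analysis.solve_power(effect_size=0.8, alpha=0.05, power=0.8)
print(f"Animals per group for 80% power: {n_per_group:.1f}")  # roughly 26

# Power actually achieved by a typical small study with 8 animals per group
achieved = analysis.solve_power(effect_size=0.8, alpha=0.05, nobs1=8)
print(f"Power with n = 8 per group: {achieved:.2f}")  # roughly 0.3
```

An experiment run at roughly 30% power is more likely to miss a real effect than to detect it, and any ‘significant’ result it does produce will tend to exaggerate the effect size, which is precisely why indiscriminate reduction of animal numbers can waste animals rather than save them.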

Small sample sizes, large variability, multi-group comparisons, and the exploratory nature of much preclinical research suggest that study designs should be more aligned with the agricultural/industrial tradition. Fisher-type designs (such as randomised complete blocks and factorials) are fit for purpose and have been vigorously promoted [12, 29,30,31,32,33], as have procedural methods for controlling variation without increasing sample size [34], and design features that increase validity [1, 35]. However, these methods seem to be virtually unknown in the preclinical literature [7, 8, 36,37,38]. Two-group comparisons more typical of clinical trials are common, although unsuited to assessing multiple factors with interactions. Informal examination of introductory textbooks and statistics course syllabi suggests that knowledge gaps are due in part to sparse formal training in experimental design, and neglect of analytical methods more suited to preclinical data. Compounding these problems is a lack of general statistical oversight. Unlike human-based studies [39], few animal research oversight committees in the USA have access to properly qualified biostatisticians, statistical analysis plans and study preregistration are not required, and protocol review criteria vary considerably between institutions [40].

Statistical pathologies in the preclinical literature

Bad statistical practices are deeply entrenched in the preclinical literature. Many of the major errors observed in the research literature involve statistical basics [41,42,43]. Statistics service courses tend to emphasise mathematical aspects of probability and null hypothesis significance testing at the expense of non-mathematical components of statistical process [44,45,46]. Consequently, it is now part of the belief system of many investigators that ‘statistical significance (P < 0.05)’ is the major criterion for assessing the biological importance of results, and that P-values are an intrinsic property of the biological event or group of animals being studied [47]. As a result, there is over-reliance on rote hypothesis testing and P-values to interpret results. Related pathologies include reporting of orphan, inexact P-values with no context, P-hacking, N-hacking, selective reporting, and spin [41, 48].
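The misconception that a P-value is a fixed property of the biology can be dispelled with a simple simulation: re-running an identical experiment, with the same true effect, yields P-values scattered across almost the whole unit interval. The sketch below (simulated data, illustrative parameter values) uses NumPy and SciPy, which are assumed to be available.

```python
# Simulation: the P-value is a property of the data and design, not of the biology.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_effect, sd, n = 1.0, 2.0, 8  # modest true effect, 8 animals per group (illustrative)

p_values = []
for _ in range(1000):  # 1000 independent replications of the identical experiment
    control = rng.normal(0.0, sd, n)
    treated = rng.normal(true_effect, sd, n)
    p_values.append(stats.ttest_ind(treated, control).pvalue)

p_values = np.array(p_values)
print(f"Median P: {np.median(p_values):.3f}")
print(f"Range of P: {p_values.min():.4f} to {p_values.max():.3f}")
print(f"Fraction of replications with P < 0.05: {(p_values < 0.05).mean():.2f}")
```

Even though the underlying effect never changes, only a minority of these replications cross the conventional 0.05 threshold, and repeating the run with a different seed shuffles which ones do; treating any single P-value as ‘the result’ of the biology is therefore unwarranted.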

A second problem area is poor understanding by investigators of basic statistical concepts and operational definitions. Statistical terms are frequently conflated with lay meanings, confused with other technical definitions, or ignored. Concepts that seem especially misunderstood include ‘study design’, ‘randomisation’, ‘cohort’, ‘unit of analysis’, and ‘replication’. To investigators, ‘study design’ refers primarily to descriptions of technical methodology and materials, e.g. [49]. To applied statisticians, ‘study design’ is the formal arrangement and structuring of independent or predictor variables hypothesized to affect the response or outcome of interest. A good study design maximizes the experimental signal by accounting for diverse sources of variability [31, 50, 51], and incorporates specific design features to ensure results are reliable and valid, such as correct specification of the unit of analysis, relevant outcome measures, inclusion and exclusion criteria, and bias minimization methods [8, 35, 52]. ‘Randomisation’ to statisticians is a formal probabilistic process that minimizes selection bias and the effects of latent confounders, and is the cornerstone of statistical inference. In contrast, randomisation in preclinical studies seems to be frequently misinterpreted in the lay sense of ‘unplanned’ or ‘haphazard’ [53], or is likely not performed at all [8, 38, 54, 55]. The common habit of referring to a group of animals subjected to a given treatment or intervention as a ‘cohort’ likely reflects non-random allocation of subjects to a defined intervention group, an invalid and confounded assignment strategy [56]. The term ‘cohort’ actually refers to groups of subjects in observational studies, where group membership is defined by some common characteristic [19]. It does not refer to experimental treatment groups with group allocation determined by randomisation. The meaning of ‘unit of analysis’ is virtually unknown, or confused with biological and observational units [56,57,58]. ‘Replication’ is frequently interpreted solely as duplication of the total sample size for ‘reproducibility’ [59], rather than as an independent repeat run of each combination of treatment factors [25].
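As a minimal illustration of randomisation as a formal probabilistic process, rather than haphazard selection, the following sketch pre-generates and documents an allocation sequence. The animal identifiers and group labels are invented for the example.

```python
import random

random.seed(2024)  # recording the seed makes the allocation auditable and reproducible

animals = [f"mouse_{i:02d}" for i in range(1, 17)]  # 16 hypothetical animals
groups = ["control", "treated"]

# Formal randomisation: every animal has the same probability of receiving each
# treatment, and the full allocation sequence is fixed before the experiment begins.
assignments = groups * (len(animals) // len(groups))
random.shuffle(assignments)
allocation = dict(zip(animals, assignments))

for animal, group in allocation.items():
    print(animal, "->", group)
```

Contrast this with the lay usage criticised above, such as assigning whichever animals are caught first to the treated ‘cohort’: that procedure is neither random nor auditable, and it confounds treatment with whatever traits make an animal easy to catch.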

A third area of concern is that the conventional statistical arsenal of t-tests, ANOVA, and χ² tests [60, 61] is unsuited for analysing ‘problem’ data typical of many preclinical studies. ‘Problem’ data include non-Gaussian, correlated (clustered, nested, or time-dependent), or non-linear data; data that are missing at random or due to dropout or attrition; data characterised by over-representation of true zeros; and high-dimensional data. A major deficiency that must be addressed is the focus of introductory courses on methods virtually unchanged since the 1950s, with little coverage of modern methods more appropriate for such data [8, 35, 44].
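One example of such a modern method is a linear mixed-effects model, which respects clustering (for example, cage or litter effects) instead of treating every animal as an independent observation. The sketch below simulates a small cage-clustered data set and fits a random-intercept model with statsmodels; the data, group sizes, and effect sizes are invented for illustration.

```python
# Sketch: mixed-effects model for clustered ('problem') data.
# Requires numpy, pandas, and statsmodels.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Simulate 8 cages of 4 mice; treatment is applied at the cage level
cage = np.repeat(np.arange(8), 4)
treatment = np.repeat([0, 1] * 4, 4)              # 4 control cages, 4 treated cages
cage_effect = rng.normal(0.0, 1.0, size=8)[cage]  # variation shared within a cage
outcome = 10.0 + 2.0 * treatment + cage_effect + rng.normal(0.0, 0.5, size=32)

df = pd.DataFrame({"outcome": outcome, "treatment": treatment, "cage": cage})

# Random intercept for cage: the cage, not the individual mouse, carries the
# independent information about the treatment effect.
model = smf.mixedlm("outcome ~ treatment", data=df, groups=df["cage"])
result = model.fit()
print(result.summary())
```

A naive t-test on all 32 mice would treat cage-mates as independent replicates (pseudo-replication) and overstate the evidence; the mixed model instead attributes the shared within-cage variation to the random intercept, so the standard error for the treatment effect reflects the true number of independent units.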

Finally, there is little attention paid to methods for identifying diverse sources of variation during experiment planning. Research papers rarely report auxiliary variables and conditions, such as animal signalment, environment, and procedures only indirectly related to the main experiments, e.g. [62]. Such variables can have unpredictable effects on animals and experimental results, producing uncontrolled variation that obscures true treatment effects. For example, systematic investigations of factors contributing to survival time in mouse models of amyotrophic lateral sclerosis suggested that claims for therapeutic efficacy were most likely due to the effects of uncontrolled variation rather than actual drug effects [12, 29, 33].

Outlook

Lack of knowledge on the part of investigators is related to training deficiencies on the part of statistics educators. The solution is a return to the basics: statistical education that meets the needs of non-statistician investigators, and curricula addressing design and data issues specific to preclinical research. This is hardly new: in 1954, John Tukey identified as essential that “statistical methods should be tailored to the real needs of the user” [63], and this has been repeated in the decades since [9, 44, 46, 64, 65]. Investigators still identify better training in statistics and statistical methods as a high priority [9, 64]. The June 2021 report by the Advisory Committee to the Director of the National Institutes of Health (NIH-ACD) made five major recommendations to improve rigor and reproducibility of animal-based research, among which was recognition of the need for “modern and innovative statistics curricula relevant to animal researchers” [9].

What do researchers need? The poor internal validity characterising much preclinical research [66] reflects poor understanding of the upstream basics of statistically-based study design and data sampling strategies. Unreliable downstream results cannot be rescued by fancy analyses after the fact, as Fisher himself warned [67]. Therefore, the concept that good statistical principles must be built in during planning and before data are collected must be introduced and reinforced. This can be accomplished, first, by more appropriate training of entry-level researchers, with courses and topic coverage more attuned to specific needs, and second, by removal of longstanding barriers (such as cost and academic credit) to early consultation with appropriately-trained statisticians. Early formal involvement of applied statisticians in the planning process must be encouraged and rewarded [9, 68].

Statistical educators and consultants must be re-educated to better address actual research needs. ‘Statistics’ is neither just maths nor an analytical frill tacked on to a study after data have been collected. Instead, statisticians must first structure instructional materials to reflect the basic tenets of statistical process: design before inference, and data quality before analysis [69]. Data curation skills are also part of good statistical practice [46], identified as such for nearly a century [70]. These practices are not strongly mathematical, and unfortunately statisticians tend to be uninterested in non-mathematical procedures [46, 71]. Second, service courses must shift away from pedagogical approaches common to applied maths or algebra, where uncritical analysis of a data set leads to a fixed ‘correct’ solution [46, 71, 72]. Procedural change could be accelerated by statisticians becoming more aware of best-practice expectations through evidence-based planning [73] and reporting [74] guidelines. These tools can direct early-stage study planning to ensure that procedures strengthening study validity can be incorporated [4, 35, 74, 75].

Properly designed and analysed experiments are an ethical issue [28, 66, 69]. Shifting the focus of statistical education from rote hypothesis testing to sound methodology should ultimately reduce the numbers of animals wasted in noninformative experiments and increase overall scientific quality and value of published research.

Availability of data and materials

Not applicable.

Abbreviations

3Rs: Replacement, Reduction, Refinement

NIH-ACD: Advisory Committee to the Director of the National Institutes of Health

RCT: Randomised controlled trial

References

  1. Bailoo JD, Reichlin TS, Würbel H. Refinement of experimental design and conduct in laboratory animal research. ILAR J. 2014;55(3):383–91.

  2. Lowenstein PR, Castro MG. Uncertainty in the translation of preclinical experiments to clinical trials. Why do most phase III clinical trials fail? Curr Gene Ther. 2009;9(5):368–74.

  3. McGonigle P, Ruggeri B. Animal models of human disease: challenges in enabling translation. Biochem Pharmacol. 2014;87:162–71.

  4. van der Worp HB, Sandercock PAG. Improving the process of translational research. BMJ. 2012;245: e7837.

  5. Errington TM, Denis A, Allison AB, Araiza R, Aza-Blanc P, Bower LR, Campos J, Chu H, Denson S, Dionham C, et al. Experiments from unfinished registered reports in the reproducibility project: cancer biology. Elife. 2021;10: e73430.

  6. Freedman LP, Cockburn IM, Simcoe TS. The economics of reproducibility in preclinical research. PLoS Biol. 2015;13: e1002165.

  7. Macleod MR. Why animal research needs to improve. Nature. 2011;477:511.

  8. Macleod MR, Lawson McLean A, Kyriakopoulou A, Serghiou S, de Wilde A, Sherratt N, Hirst T, Hemblade R, Bahor Z, Nunes-Fonseca C, et al. Risk of bias in reports of in vivo research: a focus for improvement. PLOS Biol. 2015;13(11): e1002301.

  9. Wold B, Tabak LA, Advisory Committee to the Director. ACD working group on enhancing rigor, transparency, and translatability in animal research. Washington, DC: Department of Health and Human Services; 2021.

  10. Van Calster B, Wynants L, Riley RD, van Smeden M, Collins GS. Methodology over metrics: current scientific standards are a disservice to patients and society. J Clin Epidemiol. 2021;138:219–26.

  11. Ledford H. 4 ways to fix the clinical trial. Nature. 2011;477:526–8.

  12. Perrin S. Make mouse studies work. Nature. 2014;507:423–5.

  13. Macleod M. Learning lessons from MVA85A, a failed booster vaccine for BCG. BMJ. 2018;360: k66.

  14. Collier R. Legumes, lemons and streptomycin: a short history of the clinical trial. CMAJ. 2009;180:23–4.

  15. Doll R. Sir Austin Bradford Hill and the progress of medical science. BMJ. 1992;305:1521–6.

  16. Hart PD. A change in scientific approach: from alternation to randomised allocation in clinical trials in the 1940s. BMJ. 1999;319:572–3.

  17. Peto R. Reflections on the design and analysis of clinical trials and meta-analyses in the 1970s and 1980s. J R Soc Med. 2019;112(2):78–80.

  18. Silverman WA. Personal reflections on lessons learned from randomized trials involving newborn infants from 1951 to 1967. Clin Trials. 2004;1:179–84.

  19. Breslow NE, Day NE. The role of cohort studies in cancer epidemiology. In: Breslow NE, Day NE, editors. Statistical methods in cancer research. Volume II—the design and analysis of cohort studies. Lyon: IARC Scientific Publications; 1987.

  20. Armitage P. Before and after Bradford Hill: some trends in medical statistics. J R Stat Soc A Stat Soc. 1995;158(1):143–53.

  21. Hill AB. The environment and disease: association or causation? Proc R Soc Med. 1965;58:295–300.

  22. Street DJ. Fisher’s contributions to agricultural statistics. Biometrics. 1990;46(4):937–45.

  23. Box GEP, Draper NR. Empirical model-building and response surfaces. New York: Wiley; 1987.

  24. Box GEP. Statistics as a catalyst to learning by scientific method part II—a discussion. J Qual Technol. 1999;31(1):16–29.

  25. Montgomery DC. Design and analysis of experiments. 8th ed. London: Wiley; 2013.

  26. Russell WMS, Burch RL. The principles of humane experimental technique. London: Methuen; 1959.

  27. Button KS, Ioannidis JPA, Mokrysz C, Nosek BA, Flint J, Robinson ESJ, Munafò MR. Power failure: why small sample size undermines the reliability of neuroscience. Nat Rev Neurosci. 2013;14:365–76.

  28. Parker RMA, Browne WJ. The place of experimental design and statistics in the 3Rs. ILAR J. 2014;55(3):477–85.

  29. Editorial. The ‘3Is’ of animal experimentation. Nat Genetics. 2012;44(6):611.

  30. Festing MFW. Randomized block experimental designs can increase the power and reproducibility of laboratory animal experiments. ILAR J. 2014;55:472–6.

  31. Festing MFW, Altman DG. Guidelines for the design and statistical analysis of experiments using laboratory animals. ILAR J. 2002;43(4):244–58.

  32. Karp NA, Fry D. What is the optimum design for my animal experiment? BMJ Open Sci. 2021;5: e100126.

  33. Scott S, Kranz JE, Cole J, Lincecum JM, Thompson K, Kelly N, Bostrom A, Theodoss J, Al-Nakhala BM, Viera FG, et al. Design, power, and interpretation of studies in the standard murine model of ALS. Amyotroph Lateral Scler. 2008;9:4–15.

  34. Lazic SE. Four simple ways to increase power without increasing the sample size. Lab Anim. 2018;52:621–9.

  35. Muhlhauser BS, Bloomfield FH, Gillman MW. Whole animal experiments should be more like human randomized controlled trials. PLoS Biol. 2013;11(2): e1001481.

  36. Errington TM, Denis A, Perfito N, Iorns E, Nosek BA. Challenges for assessing replicability in preclinical cancer biology. Elife. 2021;10: e67995.

  37. Macleod MR, Mohan S. Reproducibility and rigor in animal-based research. ILAR J. 2020;60:17–23.

  38. Kilkenny C, Parsons N, Kadyszewski E, Festing MF, Cuthill IC, Fry D, Hutton J, Altman DG. Survey of the quality of experimental design, statistical analysis and reporting of research using animals. PLoS ONE. 2009;4(11): e0007824.

  39. Gaur A, Merz-Nideroest B, Zobel A. Clinical trials, good clinical practice, regulations, and compliance. Regul Focus Quart. 2021;1(1):15–31.

  40. Silverman J, Macy J, Preisig P. The role of the IACUC in ensuring research reproducibility. Lab Anim (NY). 2017;46(4):129–35.

  41. Diong J, Butler AA, Gandevia SC, Héroux ME. Poor statistical reporting, inadequate data presentation and spin persist despite editorial advice. PLoS ONE. 2018;13(8): e0202121.

  42. Lang TA, Altman DG. Basic statistical reporting for articles published in clinical medical journals: the SAMPL guidelines. In: Smart P, Maisonneuve H, Polderman AKS, editors. Science editors’ handbook. Paris: European Association of Science Editors; 2013.

  43. Makin TR, Orban de Xivry J-J. Ten common statistical mistakes to watch out for when writing or reviewing a manuscript. Elife. 2019;8: e48175.

  44. Preece DA. The design and analysis of experiments: what has gone wrong? Util Mathematica. 1982;21:201–44.

  45. Preece DA. Illustrative examples: illustrative of what? J Roy Stat Soc Ser D. 1986;35(1):33–44.

  46. Preece DA. Good statistical practice. J Roy Stat Soc Ser D. 1987;36(4):397–408.

  47. Greenland S, Senn SJ, Rothman KJ, Carlin JB, Poole C, Goodman SN, Altman DG. Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. Eur J Epidemiol. 2016;31:337–50.

  48. Nuzzo R. Statistical errors. Nature. 2014;506:150–2.

  49. Marcus E. A STAR is born. Cell. 2016;166:1059–60.

  50. Altman DG. Practical statistics for medical research. London: Chapman & Hall; 1991.

  51. Karp NA. Reproducible preclinical research—is embracing variability the answer? PLoS Biol. 2018;16(3): e2005413.

  52. Kilkenny C, Browne WJ, Cuthill IC, Emerson M, Altman DG. Improving bioscience research reporting: the ARRIVE guidelines for reporting animal research. PLoS Biol. 2010;8(6): e1000412.

  53. Altman DG, Bland JM. Treatment allocation in controlled trials: why randomise? BMJ. 1999;318:1209.

  54. Hirst JA, Howick J, Aronson JK, Roberts N, Perera R, Koshiaris C, Heneghan C. The need for randomization in animal trials: an overview of systematic reviews. PLoS ONE. 2014;9: e98856.

  55. Reynolds PS, Garvan CW. Gap analysis of animal-based hemorrhage control research. “Houses of brick or mansions of straw?” Mil Med. 2020;185:85–95.

  56. Festing MFW. The “completely randomised” and the “randomised block” are the only experimental designs suitable for widespread use in pre-clinical research. Sci Rep. 2020;10:17577.

  57. Lazic SE, Clarke-Williams CJ, Munafò MR. What exactly is “N” in cell culture and animal experiments? PLoS Biol. 2018;16: e2005282.

  58. Parsons NR, Teare MD, Sitch AJ. Unit of analysis issues in laboratory-based research. eLife. 2018;7: e32486.

  59. Frommlet F, Heinze G. Experimental replications in animal trials. Lab Anim. 2021;55(1):65–75.

  60. Bolt T, Nomi JS, Bzdok D, Uddin L. Educating the future generation of researchers: A cross-disciplinary survey of trends in analysis methods. PLoS Biol. 2021;19(7): e3001313.

  61. Gosselin RD. Insufficient transparency of statistical reporting in preclinical research: a scoping review. Sci Rep. 2021;11:3335.

  62. Nevalainen T. Animal husbandry and experimental design. ILAR J. 2014;55(3):392–8.

  63. Tukey JW. Unsolved problems of experimental statistics. J Am Stat Assoc. 1954;49:706–31.

  64. Baker M. Is there a reproducibility crisis? Nature. 2016;533:452–4.

  65. Brown AW, Kaiser KA, Allison DB. Issues with data and analyses: errors, underlying themes, and potential solutions. Proc Natl Acad Sci. 2018;115(11):2563–70.

  66. Sena ES, Currie GL. How our approaches to assessing benefits and harms can be improved. Anim Welf. 2019;28:107–15.

  67. Fisher RA. Presidential address to the first Indian Statistical Congress. Sankhya. 1938;4:14–7.

  68. Sprent P. Some problems of statistical consultancy. J Roy Stat Soc Ser A. 1970;133(2):139–65.

  69. Altman DG. Statistics and ethics in medical research: misuse of statistics is unethical. BMJ. 1980;281:1182–4.

  70. Dunn HL. Application of statistical methods in physiology. Physiol Rev. 1929;9(2):275–398.

  71. Preece DA. Discussion on the papers on `statistics and mathematics’. J Roy Stat Soc Ser D. 1998;47(2):274.

  72. Preece DA. Biometry in the third world: science not ritual. Biometrics. 1984;40(2):519–23.

  73. Smith AJ, Clutton RE, Lilley E, Hansen KEA, Brattelid T. PREPARE: guidelines for planning animal research and testing. Lab Anim. 2017;52(2):135–41.

  74. Percie du Sert N, Hurst V, Ahluwalia A, Alam S, Avey MT, Baker M, Browne W, Clark A, Cuthill IC, Dirnagl U, et al. The ARRIVE guidelines 2.0: updated guidelines for reporting animal research. PLoS Biol. 2020;18(7): e3000410.

  75. Altman DG, Simera I. Using reporting guidelines effectively to ensure good reporting of health research. In: Moher D, Altman DG, Schulz KF, Simera I, Wager E, editors. Guidelines for reporting health research: a user’s manual, edn. Chichester: Wiley; 2014. p. 32–40.


Acknowledgements

Many thanks to Dr Tamara Hughes and three anonymous reviewers for useful suggestions that greatly improved the manuscript.

Funding

None to declare.

Author information

Contributions

PSR wrote the paper, designed, compiled, and composed the original draft, reviewed and revised the final draft. The author read and approved the final manuscript.

Corresponding author

Correspondence to Penny S. Reynolds.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

PSR was a member of the ARRIVE 2.0 International Working Group.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Reynolds, P.S. Between two stools: preclinical research, reproducibility, and statistical design of experiments. BMC Res Notes 15, 73 (2022). https://doi.org/10.1186/s13104-022-05965-w
