Reporting Hypothesis Tests

Reporting Hypothesis Tests in Academic Manuscripts


Introduction

Transparent, accurate reporting of hypothesis tests is essential because it bears directly on the credibility and reproducibility of science. Despite substantial methodological progress, unreliable practices – for example, selective reporting, poorly defined hypotheses, and incomplete or opaque reporting of key statistical information – continue to undermine the credibility of research (van den Akker et al., 2023; Kimmel et al., 2023). This article compiles best practices for reporting hypothesis tests, guided by recent methodological literature and by reforms designed to increase the rigor and interpretability of published results (Chambers & Tzavella, 2022; McShane et al., 2024).

The Need for Clarity in Hypotheses

  • A clearly articulated hypothesis is a pillar of any empirical research study. Recent meta-research suggests that poor or selectively reported hypotheses are substantial contributors to publication bias and questionable research practices (van den Akker et al., 2023).
  • Authors must state both the null hypothesis (H0) and the alternative hypothesis (H1), and clarify, for example, whether the test concerns group differences, relationships, or effects. Berner and Amrhein (2022) advocate greater reliance on estimation and less reliance on binary significance testing; clear hypotheses, together with a more descriptive analysis of effect sizes, are necessary to accomplish this.
  • It is equally important to clearly identify and summarize variables. As Johnson and Christensen (2024) indicate in an overview of research methods education, readers need to know what is being tested, how the variables are operationalized, and the distributional properties of each variable.
  • Authors should report continuous variables with mean and standard deviation if normally distributed, or median and interquartile range otherwise (Kline, 2023), and generally follow suggestions to go beyond p-values to provide richer descriptions of the data (McShane et al., 2024).
  • In hypothesis testing, the goal should be not just statistical significance but also clinical or practical importance (Martin & Martinez, 2023). Reporting the minimum clinically important difference (MCID) or equivalence margins, where relevant, helps readers interpret the real-world relevance of findings (Berner & Amrhein, 2022). For instance, in noninferiority or equivalence trials, a justified, pre-specified margin should be clearly stated (Chambers & Tzavella, 2022).
  • Authors should not only name the specific test used but also indicate whether they adopted the one- or two-tailed version, and justify that decision (Johnson & Christensen, 2024).
  • Authors should also report whether the test's assumptions (normality of the data, homogeneity of variance, and independence of observations) are met, or state that robust or nonparametric alternatives were used where assumptions were violated (Kline, 2023).
  • The integrity of exploratory modeling and other complex analyses depends especially on this kind of reporting: unstated choices and assumptions can substantially bias the final result.
  • Recent work questions the binary treatment of p-values (McShane et al., 2024). Authors should report exact p-values alongside confidence intervals (CIs) that convey the range of plausible values for parameter estimates (Berner & Amrhein, 2022), and should give greater weight to estimation; this is one part of a broader movement toward reform and transparency in statistical inference (e.g., Chambers & Tzavella, 2022).
  • Authors should declare the alpha level a priori (commonly 0.05) and focus interpretation on effect estimates, effect sizes, and precision. CIs should be reported, ideally at the 95% level, for each primary outcome measure.
  • Martin and Martinez (2023) note that effect sizes reported with CIs allow readers to assess practical significance. This is especially relevant as fields increasingly report effect sizes routinely, since differences in statistical significance are still sometimes confused with differences in practical value. A minimal reporting sketch, with hypothetical data, follows this list.
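
To make several of these points concrete, the following minimal Python sketch (using simulated data and SciPy/NumPy; the confidence_interval call assumes a recent SciPy, roughly 1.11 or later) shows one way to report descriptive statistics, assumption checks, an exact p-value, a 95% CI, and an effect size for a pre-specified, two-sided comparison of two groups. The data, group names, and numbers are hypothetical illustrations of the reporting elements, not a prescription for any particular analysis.

```python
# Illustrative sketch with simulated data: descriptives, assumption checks,
# exact p-value, 95% CI, and effect size for a pre-specified two-sided test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2024)
group_a = rng.normal(loc=52.0, scale=8.0, size=40)  # hypothetical treatment scores
group_b = rng.normal(loc=48.0, scale=8.0, size=40)  # hypothetical control scores

# Descriptives: mean (SD) if roughly normal, median (IQR) otherwise
for name, g in (("A", group_a), ("B", group_b)):
    print(f"Group {name}: M = {g.mean():.2f}, SD = {g.std(ddof=1):.2f}, "
          f"Mdn = {np.median(g):.2f}, IQR = {stats.iqr(g):.2f}")

# Assumption checks: normality (Shapiro-Wilk) and homogeneity of variance (Levene)
print(f"Shapiro-Wilk: p = {stats.shapiro(group_a).pvalue:.3f} (A), "
      f"{stats.shapiro(group_b).pvalue:.3f} (B)")
print(f"Levene: p = {stats.levene(group_a, group_b).pvalue:.3f}")

# Pre-specified two-sided independent-samples t-test at alpha = 0.05
res = stats.ttest_ind(group_a, group_b, alternative="two-sided")
ci = res.confidence_interval(confidence_level=0.95)  # CI for the mean difference

# Cohen's d using the pooled standard deviation
n_a, n_b = len(group_a), len(group_b)
pooled_sd = np.sqrt(((n_a - 1) * group_a.var(ddof=1) + (n_b - 1) * group_b.var(ddof=1))
                    / (n_a + n_b - 2))
d = (group_a.mean() - group_b.mean()) / pooled_sd

print(f"t({res.df:.0f}) = {res.statistic:.2f}, p = {res.pvalue:.3f}, "
      f"95% CI [{ci.low:.2f}, {ci.high:.2f}], d = {d:.2f}")
```
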
Exact P-Values and Avoiding Misleading Terms

  • Ambiguous labels such as “ns” (non-significant) or threshold statements like “p < 0.05” without an exact value encourage selective reporting and contaminate meta-analytic work (van den Akker et al., 2023).
  • Authors should always provide exact p-values so that others can independently assess the strength of the evidence; the conventional exception is “p < 0.001” for extremely small values. Overall, being precise and transparent about p-values is critical (McShane et al., 2024). A small formatting helper is sketched below.
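
One way to enforce this convention in practice is a small helper like the hypothetical Python function below, which reports exact p-values to three decimal places and falls back to “p < 0.001” only for very small values; the function name and formatting choices are illustrative, not a standard.

```python
# Hypothetical helper: report exact p-values, never bare "ns" or "p < 0.05".
def format_p(p: float) -> str:
    if p < 0.001:
        return "p < 0.001"   # conventional floor for extremely small values
    return f"p = {p:.3f}"    # otherwise report the exact value

print(format_p(0.0342))   # p = 0.034
print(format_p(0.00004))  # p < 0.001
```
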
Addressing Multiple Comparisons

  • When testing multiple hypotheses at the same time, the risk of a Type I error increases. Failing to adjust for multiple comparisons remains a common problem across disciplines (Kimmel et al., 2023).
  • Authors should be explicit about whether they adjusted for multiple tests and how, whether through Bonferroni correction, false discovery rate control, or another method (Johnson & Christensen, 2024); a brief sketch follows this list. Such transparency echoes reforms calling for stronger control of family-wise error rates, particularly in exploratory data analyses (Chambers & Tzavella, 2022).
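
As an illustration, the sketch below applies both a Bonferroni correction and Benjamini-Hochberg false discovery rate control to a set of made-up p-values using statsmodels; the numbers are hypothetical, and the choice of method should follow the pre-specified analysis plan rather than this example.

```python
# Illustrative sketch: adjusting hypothetical p-values for multiple comparisons.
from statsmodels.stats.multitest import multipletests

raw_p = [0.001, 0.012, 0.030, 0.045, 0.210]  # made-up per-hypothesis p-values

for method, label in (("bonferroni", "Bonferroni"), ("fdr_bh", "Benjamini-Hochberg FDR")):
    reject, p_adj, _, _ = multipletests(raw_p, alpha=0.05, method=method)
    print(f"{label}: adjusted p = {[round(p, 3) for p in p_adj]}, reject = {list(reject)}")
```
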
Disclosure of Statistical Software

  • Authors often forget to mention the statistical software they used in their analyses. Although it seems minor, reporting the software used improves reproducibility and transparency (Kline, 2023).
  • Different statistical software can yield different results from very similar algorithms. Authors should therefore specify the software and version, including any relevant packages or scripts, consistent with open science initiatives (Chambers & Tzavella, 2022); a minimal example is sketched below.
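
A minimal way to capture this information, assuming a Python-based analysis, is sketched below; the listed packages are only examples, and authors should report whatever software their analysis actually used.

```python
# Illustrative sketch: record the analysis environment for the methods section.
import platform
import numpy, scipy, statsmodels

print(f"Python {platform.python_version()}")
for pkg in (numpy, scipy, statsmodels):
    print(f"{pkg.__name__} {pkg.__version__}")
```
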
Addressing Selective Reporting and Exaggeration Bias

  • Selective reporting and “spin” are pervasive issues documented in psychology, ecology, and other fields (van den Akker et al., 2023; Kimmel et al., 2023).
  • Registered Reports, which entail peer review of research methods prior to beginning data collection, represent one of many promising solutions to minimize bias and support full reporting of results (Chambers & Tzavella, 2022).
  • According to McShane et al. (2024), journals must additionally promote standards for estimation and clear presentation to diminish “exaggeration bias.”
Conclusion

Transparent reporting of hypothesis testing is fundamental to scientific integrity. By explicitly stating hypotheses, appropriately describing variables, selecting appropriate tests, and reporting exact p-values, effect sizes, and confidence intervals, authors help establish a more trustworthy and reproducible evidence base. Promoting practices such as preregistration and Registered Reports further supports this rigor. As the scientific community moves beyond binary statistical significance, these principles will remain vital for producing credible and meaningful research.

Need an Experienced Researcher to Report Hypothesis Tests in Your PhD Thesis? PhD Assistance Research Lab provides a variety of specialized services to assist you with reporting hypothesis tests in your PhD research. Contact us today to enhance the transparency and accuracy of your methodology and statistical reporting for successful publication!

References

1. Berner, D., & Amrhein, V. (2022). Why and how we should join the shift from significance testing to estimation. Journal of Evolutionary Biology, 35(6), 777–787.

2. Chambers, C. D., & Tzavella, L. (2022). The past, present and future of Registered Reports. Nature Human Behaviour, 6(1), 29–42.

3. Johnson, R. B., & Christensen, L. B. (2024). Educational research: Quantitative, qualitative, and mixed approaches. Sage Publications.

4. Kimmel, K., Avolio, M. L., & Ferraro, P. J. (2023). Empirical evidence of widespread exaggeration bias and selective reporting in ecology. Nature Ecology & Evolution, 7(9), 1525–1536.

5. Kline, R. B. (2023). Principles and practice of structural equation modeling. Guilford Publications.

6. Martin, E. L., & Martinez, D. A. (2023). The effect size in scientific publication. Educación XX1, 26(1), 9–17.

7. McShane, B. B., Bradlow, E. T., Lynch Jr., J. G., & Meyer, R. J. (2024). “Statistical significance” and statistical reporting: moving beyond binary. Journal of Marketing, 88(3), 1–19.

8. van den Akker, O. R., van Assen, M. A., Enting, M., de Jonge, M., Ong, H. H., Rüffer, F., & Bakker, M. (2023). Selective hypothesis reporting in psychology: Comparing preregistrations and corresponding publications. Advances in Methods and Practices in Psychological Science, 6(3), 25152459231187988.