Are your paper’s statistics complete and error-free?
The answer for almost every academic physician is likely “close, but not close enough.” Even leading experts throughout medicine sometimes fall into statistical traps, especially those that are simple but non-intuitive. Below, we outline some of the most common errors that editors see in the statistical reporting of manuscript submissions, and how to avoid them.
What statistics should you include in your paper?
Often, to a researcher, the value of their results seems apparent from descriptive statistics alone, but this ignores the role of inferential statistics in assessing associations and in judging whether an observed difference could plausibly be due to chance. So, after you have collected your data (and reported the mean/SD or median/quartiles), the vital question is: what statistical tests will give meaningful inferences from my data?
The answer to this question varies depending on your dataset and research question, but an excellent resource is the SAMPL Guidelines from the EQUATOR Network. Often, researchers will report only a basic P value or CI, or use only a chi-squared test, t-test, or regression analysis, when a more advanced test is applicable or necessary to demonstrate their findings. The statistical experts at Superior Medical Editing can also help you determine which further testing would be appropriate for your research.
What are the most common errors in basic statistical reporting?
Even in the simplest aspects of statistical reporting and analysis, the medical literature is rife with errors. These include:
- Not reporting numerators and denominators for percentages. Numerators and denominators should be reported with every percentage; without them, the reader cannot tell whether “50%” means 1 of 2 patients or 500 of 1,000, and the percentage is close to meaningless. It seems simple, but this (along with failing to show the number of patients at each stage of research) is among the most common errors in statistical reporting.
- Misusing mean and standard deviation. Mean and standard deviation describe only a small subset of biological data well: data that are approximately normally distributed. If your data are markedly skewed or otherwise non-normal, the mean and standard deviation can be misleading, and you should generally report the median and interquartile range instead.
- Using regression without confirming linearity and testing “goodness of fit”. Regression analysis can be a powerful tool for demonstrating an association between variables, but it can also become a vehicle for confirmation bias. To avoid seeing relationships that are tempting but nonexistent, plot the residuals of your regression to check that a linear model is appropriate, and report your r-squared (or R-squared, for multiple linear regression) value to show how much of the variation in the outcome the model explains.
- Never P alone. This sounds funny, but many researchers have an unnecessarily reverent attitude towards any P < 0.05. Remember, confidence intervals are extremely important for showing the clinical implications of your results. A P value tells you only how likely a difference at least as large as the one observed would be if there were truly no difference; it says nothing about the size of that difference, so a result can be statistically significant and still negligible. Also, when you run multiple analyses, your significance threshold must be lowered (or, equivalently, your P values adjusted upward) to reflect the number of analyses!
- Relative vs. absolute risk/change. Reporting only relative risk is an extremely tempting option, since it often makes the difference between groups look larger than it is, especially when the underlying event is rare. To communicate the actual, meaningful difference between two or more groups of test subjects, the absolute risk (or change) is essential, so always report it, alongside any relative measure you choose to give.
- Abusing thresholds. Many a statistician has been driven to tears by the clumsy placement of arbitrary thresholds, such as splitting a continuous measurement into “high” and “low” groups. Doing so reduces the power of a study, and a cut point chosen after looking at the data violates the often necessary condition that thresholds be justified a priori.
- Statistical significance means clinical importance, right? The Holy Grail of medical statistics, a SIGNIFICANT result, has led many a researcher into asserting the grave importance of a result that has essentially no interpretive value. Statistics can be extremely helpful in demonstrating the meaning of your results, but stating that a result is statistically significant asserts only that it would be unlikely if there were truly no effect. The onus is still on you to explain why the result is clinically relevant and useful.
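To make a few of these points concrete, here is a minimal Python sketch of three of the items above: reporting a percentage with its numerator and denominator, comparing absolute and relative risk, and lowering the significance threshold for multiple analyses. The counts are invented purely for illustration, the function names are our own, and the Bonferroni correction shown is only one (conservative) way of adjusting for multiple comparisons, so treat this as a sketch rather than a recommended analysis.

```python
def format_percentage(numerator, denominator):
    """Report a percentage together with its numerator and denominator."""
    pct = 100 * numerator / denominator
    return f"{numerator}/{denominator} ({pct:.1f}%)"


def risk_summary(events_treated, n_treated, events_control, n_control):
    """Return the risk in each group plus absolute and relative measures."""
    risk_treated = events_treated / n_treated
    risk_control = events_control / n_control
    return {
        "risk_treated": risk_treated,
        "risk_control": risk_control,
        # Absolute risk reduction: difference in risk between the groups.
        "absolute_risk_reduction": risk_control - risk_treated,
        # Relative risk: ratio of the two risks.
        "relative_risk": risk_treated / risk_control,
    }


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Hypothetical trial (made-up numbers): 10 of 1000 treated patients and
# 20 of 1000 control patients experience the event.
summary = risk_summary(10, 1000, 20, 1000)

print("Treated:", format_percentage(10, 1000))
print("Control:", format_percentage(20, 1000))
print(f"Relative risk: {summary['relative_risk']:.2f}")
print(
    "Absolute risk reduction: "
    f"{100 * summary['absolute_risk_reduction']:.1f} percentage points"
)

# Five planned comparisons at an overall alpha of 0.05 (hypothetical).
print(f"Per-test threshold: {bonferroni_threshold(0.05, 5):.3f}")
```

In this made-up example the relative risk of 0.50 could be advertised as "halving the risk," yet the absolute difference is only one percentage point, which is why the two figures belong side by side.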
Tell me more!
This is just scratching the surface of the reporting and analytic errors that pervade medical statistics. For more information, the book How to Report Statistics in Medicine by Tom Lang* provides a wide range of guidelines, and will steer you around many hard-to-foresee statistical traps. And, as always, Superior Medical Editing and our team of experts are prepared to provide in-depth and focused guidance on your specific research!
*Tom Lang is a medical communications consultant whose presentation at the Council of Science Editors is the inspiration and source for this blog post. His website is www.tomlangcommunications.com.