A well-known food researcher has stepped down from his university teaching and research posts following the retraction of six of his papers, the
National Post reports. The JAMA medical journals retracted the papers published by Cornell University professor Brian Wansink after the university could not produce original data to verify the results of his research on consumer behavior. Reviews of Wansink's previous work alleged that he had cherry-picked data points in his research to make the findings more likely to be published. Those reviews resulted in seven other papers being retracted.
According to Cornell, Wansink's academic misconduct also included misreporting data, problematic statistical techniques, failure to appropriately document and preserve research results, and inappropriate authorship. Researchers are not the only ones who engage in such practices. This kind of deception can arise in any sector of society, from corporations and governments to journalism and education.
Internal auditors need to know the various inappropriate ways data can be collected and used. They should maintain a skeptical stance toward what they see in their audit work, including financial statements, management reporting of results, assessments of program effectiveness and efficiency, and compliance with standards. Here are some observations about the three issues most relevant to this story (misreporting data, weak statistical methods, and poor data quality and documentation), along with a few suggestions for fixing the problems.
Misreporting data. The practice of
"p-hacking," in which researchers slice and dice a data set until an impressive-looking pattern emerges, has become prevalent. Also common is publication bias, which is the tendency to favor publication of studies with positive results. The increased presence of the internet and social media has further accentuated the problem.
Misreporting data can take various forms, from tweaking variables to show a desired result to pretending that a finding proves an original hypothesis (in other words, presenting an answer to a question that was only asked after the fact). For example, in psychology research, a result usually is considered statistically significant when a calculation called a p-value is less than or equal to 0.05. But with enough data massaging, a p-value below 0.05 can emerge purely by chance, making a hypothesis seem valid when the result is actually random.
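The mechanics of p-hacking are easy to demonstrate. The short simulation below is purely illustrative (it is not based on any of the studies mentioned here): it tests 20 random, meaningless "predictors" against a random outcome. Because each individual test has roughly a 5% chance of a false positive, most simulated "studies" stumble onto at least one result with p < 0.05.

```python
import math
import random

def p_value(group_a, group_b):
    """Two-sided p-value from a simple two-sample z-test on proportions."""
    n_a, n_b = len(group_a), len(group_b)
    p_a, p_b = sum(group_a) / n_a, sum(group_b) / n_b
    pooled = (sum(group_a) + sum(group_b)) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (p_a - p_b) / se
    # Two-sided tail probability from the standard normal CDF.
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

random.seed(1)
trials = 1000
tests_per_study = 20
false_positive_studies = 0

for _ in range(trials):
    # The outcome is pure noise: no predictor has any real relationship to it.
    outcome = [random.randint(0, 1) for _ in range(100)]
    found_significant = False
    for _ in range(tests_per_study):
        predictor = [random.randint(0, 1) for _ in range(100)]
        a = [o for o, p in zip(outcome, predictor) if p == 1]
        b = [o for o, p in zip(outcome, predictor) if p == 0]
        if len(a) > 1 and len(b) > 1 and p_value(a, b) < 0.05:
            found_significant = True  # a spurious "discovery"
            break
    false_positive_studies += found_significant

print(f"Studies reporting a 'significant' finding: {false_positive_studies / trials:.0%}")
```

With 20 independent tests at the 0.05 threshold, the expected rate of at least one false positive is about 1 − 0.95²⁰, or roughly 64%, which is what the simulation typically shows.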
Sample sizes also matter in survey data analysis. They always should be reported — or at least made available — along with confidence levels and the methodologies applied to the data. Additionally, sample design and the avoidance of sample bias are important considerations in judging the validity of survey sample results.
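As a rough illustration of why reported sample sizes matter, the snippet below applies the standard textbook formula for the 95% margin of error on a survey proportion (not tied to any particular survey): the margin shrinks only with the square root of the sample size, so quadrupling the sample merely halves the error.

```python
import math

def margin_of_error(p, n, z=1.96):
    """95% margin of error for a sample proportion p with sample size n."""
    return z * math.sqrt(p * (1 - p) / n)

# Worst-case proportion p = 0.5 maximizes the margin of error.
# e.g., n=50 gives roughly ±14%, n=400 roughly ±5%, n=1600 roughly ±2.5%.
for n in (50, 400, 1600):
    print(f"n = {n:5d}: ±{margin_of_error(0.5, n):.1%}")
```

A reported "55% of respondents agree" means very little from a sample of 50, where the margin of error swamps the headline difference.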
Weak statistical methods. A related issue is the choice of statistics used to represent the findings, and the importance of having a baseline or benchmark for expected results. A basic but prime example of the former is the bell curve. Reading that "the average of a group's scores was five out of 10" does not necessarily mean most people scored a 5 (an "upright" bell curve); the actual distribution of scores may be quite different. For example, half the group may have scored zero or one out of 10 and the other half nine out of 10, which means an "inverted" bell best represents the result. On the latter point, understanding the difference between correlation and causation, and using a relevant baseline, are equally important.
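The bell-curve example can be made concrete with two hypothetical score sets. Both average exactly 5 out of 10, but a simple dispersion measure such as the standard deviation immediately reveals the "inverted bell":

```python
import statistics

# Two made-up score distributions with the same mean of 5 (out of 10):
clustered = [3, 4, 5, 5, 5, 5, 5, 5, 6, 7]  # most people near 5 ("upright" bell)
polarized = [1, 1, 1, 1, 1, 9, 9, 9, 9, 9]  # nobody near 5 ("inverted" bell)

for name, scores in [("clustered", clustered), ("polarized", polarized)]:
    print(f"{name}: mean = {statistics.mean(scores)}, "
          f"std dev = {statistics.stdev(scores):.2f}")
```

Reporting the mean alone hides the difference; reporting a spread measure, or the full distribution, exposes it.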
Here is a famous example: There is a strong positive correlation between the number of Nobel prizes the people of a country have earned and the quantity of chocolate eaten annually in that country. But this does not show that eating more chocolate will earn you a Nobel prize. Correlation does not imply causation. The countries that eat the most chocolate are the wealthier ones where chocolate is inexpensive and that tend to have more money to invest in education and research — resulting in more Nobel prizes.
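A small sketch with made-up data shows how a hidden confounder manufactures exactly this kind of correlation. Here, "chocolate" and "nobels" are each driven by a common "wealth" variable and have no direct link to each other, yet they correlate strongly:

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

random.seed(0)
wealth = [random.uniform(10, 80) for _ in range(50)]        # the confounder
chocolate = [0.10 * w + random.gauss(0, 1) for w in wealth]  # driven by wealth
nobels = [0.05 * w + random.gauss(0, 0.5) for w in wealth]   # also driven by wealth

# Strong correlation despite no causal link between the two variables.
print(f"chocolate vs. nobels: r = {pearson(chocolate, nobels):.2f}")
```

Controlling for the confounder (here, wealth) would make the apparent chocolate–Nobel relationship largely disappear, which is the test a causal claim has to pass.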
Poor data quality and documentation. In many instances, researchers do not do enough to appropriately identify and categorize the quality of data used. This is particularly true where data sets originate from disparate systems or sources, historical data is used, and data definitions have not been validated for comparability. A systematic measurement of data quality and a disclosure against a standard (even a scorecard green, red, yellow type) alongside the published results would help alleviate problems of misinterpretation. And as data increasingly is captured electronically, it should be retained, along with its documentation, coding, and methodological routines.
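A data-quality scorecard of the kind suggested above could be as simple as the sketch below. The completeness thresholds and the example records are arbitrary assumptions for illustration, not an established standard:

```python
# Hypothetical green/yellow/red scorecard based on data completeness.
# The cut-offs (98% and 90%) are illustrative assumptions only.
def quality_flag(complete_fraction):
    if complete_fraction >= 0.98:
        return "green"
    if complete_fraction >= 0.90:
        return "yellow"
    return "red"

records = [
    {"id": 1, "score": 7},
    {"id": 2, "score": None},  # missing value
    {"id": 3, "score": 5},
    {"id": 4, "score": 9},
]
complete = sum(1 for r in records if r["score"] is not None) / len(records)
print(f"completeness {complete:.0%} -> {quality_flag(complete)}")  # prints: completeness 75% -> red
```

Publishing such a flag alongside results, with the standard it was scored against, gives readers and auditors a quick signal of how much weight the underlying data can bear.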
Overall, pre-approval and pre-registration of research plans, including public registration, can help address all three of these problems. That is especially the case when researchers specify exactly what the hypothesis is and how they plan to test it. When these requirements are in place, there is less room for cherry-picking the most eye-catching results after the study is completed.
Wherever possible, more effort should be made to run larger studies or replications, which are less likely to produce the kind of spurious results that get published. Researchers should describe their methods in more detail and upload their materials and code to open databases, making it easier to review the basis of their work. Declaring the quality of the data used against a standard or benchmark also would help. And journal editors should collaborate to establish and enforce consistently high standards for accepting and publishing research results.