Major data analysis errors invalidate cancer microbiome findings

Gihawi, Abraham, Ge, Yuchen, Lu, Jennifer, Puiu, Daniela, Xu, Amanda, Cooper, Colin S., Brewer, Daniel S., Pertea, Mihaela and Salzberg, Steven L. (2023) Major data analysis errors invalidate cancer microbiome findings. mBio, 14 (5). ISSN 2150-7511

PDF (gihawi-et-al-2023-major-data-analysis-errors-invalidate-cancer-microbiome-findings) - Published Version
Available under License Creative Commons Attribution.



We re-analyzed the data from a recent large-scale study that reported strong correlations between DNA signatures of microbial organisms and 33 different cancer types and that created machine-learning predictors with near-perfect accuracy at distinguishing among cancers. We found at least two fundamental flaws in the reported data and in the methods: (i) errors in the genome database and the associated computational methods led to millions of false-positive findings of bacterial reads across all samples, largely because most of the sequences identified as bacteria were instead human; and (ii) errors in the transformation of the raw data created an artificial signature, even for microbes with no reads detected, tagging each tumor type with a distinct signal that the machine-learning programs then used to create an apparently accurate classifier. Each of these problems invalidates the results, leading to the conclusion that the microbiome-based classifiers for identifying cancer presented in the study are entirely wrong. These flaws have subsequently affected more than a dozen additional published studies that used the same data and whose results are likely invalid as well.
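The second flaw can be illustrated with a toy example. The sketch below is not the transformation used in the study; it assumes a hypothetical adjustment step that adds a label-dependent offset to every value, and all names and numbers (`transform`, `offsets`, `tumor_A`, `tumor_B`) are invented for illustration. It shows how a microbe with zero reads in every sample can still end up carrying a value that separates the tumor types.

```python
# Toy illustration only: hypothetical names and numbers, not the study's pipeline.

def transform(raw_counts, labels, offsets):
    """Apply a hypothetical label-dependent adjustment to each raw count."""
    return [count + offsets[label] for count, label in zip(raw_counts, labels)]

def nearest_mean_classifier(values, labels):
    """Fit a one-feature classifier that picks the label with the closest mean."""
    means = {}
    for label in set(labels):
        group = [v for v, l in zip(values, labels) if l == label]
        means[label] = sum(group) / len(group)
    return lambda v: min(means, key=lambda label: abs(v - means[label]))

def accuracy(values, labels):
    """Fraction of samples the fitted classifier labels correctly."""
    predict = nearest_mean_classifier(values, labels)
    hits = sum(predict(v) == l for v, l in zip(values, labels))
    return hits / len(labels)

# Eight samples, two tumor types, and a microbe with no reads in any sample.
labels = ["tumor_A"] * 4 + ["tumor_B"] * 4
raw_counts = [0] * 8

# Assumed offsets introduced by the adjustment step (made-up values).
offsets = {"tumor_A": -1.5, "tumor_B": 2.0}
adjusted = transform(raw_counts, labels, offsets)

raw_accuracy = accuracy(raw_counts, labels)      # no signal in the raw zeros
adjusted_accuracy = accuracy(adjusted, labels)   # signal created by the adjustment

print("raw:", raw_accuracy, "adjusted:", adjusted_accuracy)
```

On the raw zeros the classifier can do no better than assign every sample the same label, whereas on the adjusted values it separates the two groups completely, even though the underlying data contain no reads at all.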

Item Type: Article
Additional Information: Funding Information: S.L.S., Y.G., J.L., D.P., and M.P. acknowledge the support from the U.S. NIH under grants R01 HG006677 and R35-GM130151. A.G., C.S.C., and D.S.B. acknowledge the support from Prostate Cancer UK (MA-ETNA19-003), Big C Cancer Charity (ref 16-09R), The Bob Champion Cancer Trust, and Cancer Research UK.
Uncontrolled Keywords: bioinformatics, cancer, computational biology, microbiome, metagenomics, virology, microbiology, sdg 3 - good health and well-being
Faculty \ School: Faculty of Medicine and Health Sciences > Norwich Medical School
UEA Research Groups: Faculty of Medicine and Health Sciences > Research Groups > Cancer Studies
Faculty of Medicine and Health Sciences > Research Centres > Metabolic Health
Depositing User: LivePure Connector
Date Deposited: 03 Nov 2023 03:22
Last Modified: 14 Nov 2023 11:16
DOI: 10.1128/mbio.01607-23

