Major data analysis errors invalidate cancer microbiome findings

Gihawi, Abraham, Ge, Yuchen, Lu, Jennifer, Puiu, Daniela, Xu, Amanda, Cooper, Colin S., Brewer, Daniel S., Pertea, Mihaela and Salzberg, Steven L. (2023) Major data analysis errors invalidate cancer microbiome findings. mBio, 14 (5). ISSN 2150-7511

PDF (gihawi-et-al-2023-major-data-analysis-errors-invalidate-cancer-microbiome-findings) - Published Version
Available under License Creative Commons Attribution.

Abstract

We re-analyzed the data from a recent large-scale study that reported strong correlations between DNA signatures of microbial organisms and 33 different cancer types and that created machine-learning predictors with near-perfect accuracy at distinguishing among cancers. We found at least two fundamental flaws in the reported data and in the methods: (i) errors in the genome database and the associated computational methods led to millions of false-positive findings of bacterial reads across all samples, largely because most of the sequences identified as bacteria were instead human; and (ii) errors in the transformation of the raw data created an artificial signature, even for microbes with no reads detected, tagging each tumor type with a distinct signal that the machine-learning programs then used to create an apparently accurate classifier. Each of these problems invalidates the results, leading to the conclusion that the microbiome-based classifiers for identifying cancer presented in the study are entirely wrong. These flaws have subsequently affected more than a dozen additional published studies that used the same data and whose results are likely invalid as well.
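Point (ii) can be illustrated with a generic sketch. The abstract does not name the transformation, so this assumes a voom-style log2 counts-per-million (log-CPM) transform with a 0.5 pseudocount. The sample data, tumor labels, and threshold are hypothetical, and the code does not reproduce either study's pipeline. It shows how a microbe with zero reads in every sample can still receive values that differ by tumor type when library sizes differ between groups.

```python
import math

def log_cpm(count: int, lib_size: int, pseudocount: float = 0.5) -> float:
    """voom-style log2 counts per million (offsets of 0.5 and 1 are assumed here)."""
    return math.log2((count + pseudocount) / (lib_size + 1) * 1e6)

# Hypothetical data: one microbe with zero reads in every sample, while total
# read counts (library sizes) differ systematically between two made-up tumor types.
samples = [
    ("tumor_A", 0, 2_000),
    ("tumor_A", 0, 2_100),
    ("tumor_B", 0, 50_000),
    ("tumor_B", 0, 52_000),
]

# With a zero count, the transformed value depends only on library size,
# so an "absent" microbe still carries a per-sample number.
values = [(tumor, log_cpm(count, lib_size)) for tumor, count, lib_size in samples]

for tumor, value in values:
    print(f"{tumor}: {value:.2f}")

# A trivial threshold separates the groups using a feature with no reads at all.
threshold = 5.0
predicted = ["tumor_A" if value > threshold else "tumor_B" for _, value in values]
print(predicted == [tumor for tumor, _ in values])
```

With these made-up library sizes, the zero-count feature comes out at roughly 7.9 to 8.0 for tumor_A and roughly 3.3 for tumor_B, so the last line prints True. Any transformation that maps zero counts to sample-dependent values carries this kind of risk.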

Item Type: Article
Additional Information: Funding Information: S.L.S., Y.G., J.L., D.P., and M.P. acknowledge the support from the U.S. NIH under grants R01 HG006677 and R35-GM130151. A.G., C.S.C., and D.S.B. acknowledge the support from Prostate Cancer UK (MA-ETNA19-003), Big C Cancer Charity (ref 16-09R), The Bob Champion Cancer Trust, and Cancer Research UK.
Uncontrolled Keywords: bioinformatics, cancer, computational biology, microbiome, metagenomics, virology, microbiology, SDG 3 - Good Health and Well-being
Faculty \ School: Faculty of Medicine and Health Sciences > Norwich Medical School
UEA Research Groups: Faculty of Medicine and Health Sciences > Research Groups > Cancer Studies
Faculty of Medicine and Health Sciences > Research Centres > Metabolic Health
Depositing User: LivePure Connector
Date Deposited: 03 Nov 2023 03:22
Last Modified: 06 Feb 2025 11:41
URI: https://ueaeprints.uea.ac.uk/id/eprint/93547
DOI: 10.1128/mbio.01607-23
