Evaluating the performance of dropout imputation and clustering methods for single-cell RNA sequencing data

Xu, Junlin, Cui, Lingyu, Zhuang, Jujuan, Meng, Yajie, Bing, Pingping, He, Binsheng, Tian, Geng, Pui, Choi Kwok, Wu, Taoyang ORCID: https://orcid.org/0000-0002-2663-2001, Wang, Bing and Yang, Jialiang (2022) Evaluating the performance of dropout imputation and clustering methods for single-cell RNA sequencing data. Computers in Biology and Medicine, 146. ISSN 0010-4825

PDF (AcceptedManuscript) - Accepted Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.

Abstract

Recent advances in single-cell RNA sequencing (scRNA-seq) provide exciting opportunities for transcriptome analysis at single-cell resolution. Clustering individual cells is a key step in scRNA-seq analysis for revealing cell subtypes and inferring cell lineages. Although many dedicated algorithms have been proposed, achieving high clustering quality remains a computational challenge for scRNA-seq data, one that is exacerbated by inflated zero counts (dropouts) arising from technical noise. To address this challenge, we assess combinations of nine popular dropout imputation methods and eight clustering methods on a collection of 10 well-annotated scRNA-seq datasets with different sample sizes. Our results show that (i) imputation algorithms typically improve both the performance of clustering methods and the quality of data visualization with t-distributed stochastic neighbor embedding (t-SNE); and (ii) the performance of a particular combination of imputation and clustering methods varies with dataset size. For example, the combination of single-cell analysis via expression recovery (SAVER) and sparse subspace clustering (SSC) usually works well on smaller datasets, while the combination of adaptively-thresholded low-rank approximation (ALRA) and single-cell interpretation via multikernel learning (SIMLR) usually achieves the best performance on larger datasets.
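
This record lists the adjusted Rand index (ARI) among its keywords. ARI is a common score for comparing a predicted clustering against annotated cell types. The following is a minimal sketch of how such a score can be computed, assuming the standard pair-counting definition. The function name `adjusted_rand_index` and the toy labels are illustrative assumptions and are not taken from the paper, whose exact evaluation protocol this record does not describe.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same cells.

    Illustrative helper (not the authors' code): standard pair-counting ARI.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)

    # Contingency counts: number of cells for each (annotated, predicted) pair.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of cell pairs that are grouped together under each view.
    sum_cells = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())
    total_pairs = comb(n, 2)

    if total_pairs == 0:
        return 1.0  # fewer than two cells: the labelings trivially agree

    expected = sum_true * sum_pred / total_pairs  # agreement expected by chance
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        # Degenerate case: both labelings are one cluster, or all singletons.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Toy example: hypothetical annotations versus hypothetical cluster labels.
annotated = ["T", "T", "B", "NK"]
clustered = [0, 0, 1, 1]
print(round(adjusted_rand_index(annotated, clustered), 3))
```

A score of 1 means the clustering matches the annotation up to a renaming of clusters, and values near 0 indicate agreement no better than chance. In a comparison like the one described above, such a score would be computed once per imputation and clustering combination on each dataset.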

Item Type: Article
Additional Information: Funding Information: This work was supported by the Hunan Provincial Innovation Foundation for Postgraduate of China (Grant No. CX20200434) awarded to Junlin Xu, and the National Natural Science Foundation of China (Grant No. 62172004) to Bing Wang.
Uncontrolled Keywords: single-cell RNA sequencing, dropout imputation, cell clustering, t-SNE, adjusted Rand index, health informatics, computer science applications
Faculty \ School: Faculty of Science > School of Computing Sciences
UEA Research Groups: Faculty of Science > Research Groups > Computational Biology
Faculty of Science > Research Centres > Centre for Ecology, Evolution and Conservation
Faculty of Science > Research Groups > Data Science and AI
Depositing User: LivePure Connector
Date Deposited: 13 Jun 2022 14:30
Last Modified: 10 Dec 2024 01:39
URI: https://ueaeprints.uea.ac.uk/id/eprint/85570
DOI: 10.1016/j.compbiomed.2022.105697
