Clustering cells by gene expression is one of the major steps in single-cell RNA-sequencing (scRNA-seq) data analysis. A key challenge in cluster analysis is that the number of clusters is unknown, and no comprehensive solution to this problem exists yet. To improve the process of defining a meaningful cluster resolution, we compare the Bayesian Latent Dirichlet Allocation (LDA) method with its non-parametric counterpart, the Hierarchical Dirichlet Process (HDP), in the context of clustering scRNA-seq data. A potential main advantage of HDP is that it does not require the user to supply the number of clusters as an input parameter. While LDA has been used in single-cell data analysis, it has not been compared in detail with HDP. Here, we compare the cell clustering performance of LDA and HDP on four scRNA-seq datasets (immune cells, kidney, pancreas and decidua/placenta), with a specific focus on cluster numbers. Using both intrinsic (Davies-Bouldin index, DB-index) and extrinsic (adjusted Rand index, ARI) cluster quality measures, we show that the performance of LDA and HDP is dataset dependent. We describe a case where HDP produced a more appropriate clustering than the best performer from a series of LDA clusterings with different numbers of clusters. However, we also observed cases where the best-performing LDA cluster numbers appropriately captured the main biological features while HDP tended to inflate the number of clusters. Overall, our study highlights the importance of carefully assessing the number of clusters when analyzing scRNA-seq data.
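The kind of comparison described above, scoring a series of LDA clusterings with different numbers of clusters by an intrinsic and an extrinsic measure, can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline. It assumes scikit-learn's `LatentDirichletAllocation`, a synthetic Poisson count matrix in place of real scRNA-seq data, assignment of each cell to its dominant topic, and log-transformed counts as the feature space for the DB-index. The helper names `simulate_counts` and `score_lda_clusterings` are hypothetical. HDP is not covered here because scikit-learn does not provide it.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics import adjusted_rand_score, davies_bouldin_score


def simulate_counts(n_per_group=60, n_genes=50, n_groups=3, seed=0):
    """Toy cells-by-genes count matrix with known group labels (synthetic data)."""
    rng = np.random.default_rng(seed)
    block = n_genes // n_groups
    counts, labels = [], []
    for g in range(n_groups):
        rates = np.full(n_genes, 0.5)            # low background expression
        rates[g * block:(g + 1) * block] = 8.0   # hypothetical marker genes of group g
        counts.append(rng.poisson(rates, size=(n_per_group, n_genes)))
        labels += [g] * n_per_group
    return np.vstack(counts), np.array(labels)


def score_lda_clusterings(counts, true_labels, k_values, seed=0):
    """Fit LDA for each k, cluster cells by dominant topic, and score each clustering."""
    log_counts = np.log1p(counts)  # feature space for the DB-index (an assumption)
    results = {}
    for k in k_values:
        lda = LatentDirichletAllocation(n_components=k, random_state=seed)
        cell_topics = lda.fit_transform(counts)   # cells x topics weights
        clusters = cell_topics.argmax(axis=1)     # dominant topic = cluster label
        n_found = len(np.unique(clusters))
        # The DB-index is only defined when at least two clusters are present.
        db = davies_bouldin_score(log_counts, clusters) if n_found > 1 else float("nan")
        results[k] = {
            "n_clusters": n_found,
            "ari": adjusted_rand_score(true_labels, clusters),  # extrinsic: needs reference labels
            "db_index": db,                                     # intrinsic: lower is better
        }
    return results


if __name__ == "__main__":
    counts, labels = simulate_counts()
    for k, scores in score_lda_clusterings(counts, labels, [2, 3, 4, 5]).items():
        print(k, scores)
```

A higher ARI indicates closer agreement with the reference labels, while a lower DB-index indicates more compact, better separated clusters. The two measures need not favour the same number of clusters, which is one reason the choice of cluster number requires careful assessment.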
Dirichlet process mixture models for single-cell RNA-seq clustering
- Award Group:
- Funder(s): European Union's Horizon 2020 research and innovation programme
- Award Id(s): 675395
- Award Group:
- Funder(s): Juhani Ahon Laketieteen Tutkimussuunnitelma
Nigatu A. Adossa, Kalle T. Rytkönen, Laura L. Elo; Dirichlet process mixture models for single-cell RNA-seq clustering. Biol Open 2022; bio.059001. doi: https://doi.org/10.1242/bio.059001