
Computationally efficient multi-sample flow cytometry data analysis using Gaussian mixture models

  • Department of Hematology
  • Vrije Universiteit Amsterdam
  • Amsterdam UMC
  • Department of Mathematics
  • Amsterdam UMC

Research output: Contribution to journal, Article, Academic, peer-review


Abstract

Background
An important challenge in flow cytometry (FCM) data analysis is comparing corresponding cell populations across multiple FCM samples. One solution is to fit a single statistical mixture model to multiple samples simultaneously: such a multi-sample model can characterize a heterogeneous set of samples and facilitates direct comparison of cell populations across them. The multi-sample approach to statistical mixture modeling has been explored in a number of reports, mostly within a Bayesian framework. Although these approaches are effective, they are computationally demanding and therefore scale poorly, and scalability is essential in the multi-sample setting. This limits their utility in the analysis of large sets of large FCM samples.
Results
We show that basic Gaussian mixture models can be extended to large data sets consisting of multiple samples, using a computationally efficient implementation of the expectation-maximization algorithm. We show that the multi-sample Gaussian mixture model (MSGMM) is competitive with other models, in both rare cell detection and sample classification accuracy. This allows us to further explore the utility of MSGMMs in the analysis of heterogeneous sets of samples. We demonstrate how simple heuristics on MSGMM model output can directly reveal structural patterns in a collection of FCM samples.
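To make the idea of a multi-sample mixture model concrete, the following is a minimal sketch, not the authors' implementation. It assumes one common formulation in which the Gaussian components (means and covariances) are shared across all samples while each sample has its own vector of mixing weights; the abstract does not spell out the exact model. The function name `fit_msgmm` and all of its parameters are hypothetical.

```python
import numpy as np

def fit_msgmm(samples, n_components, n_iter=50, reg=1e-6, seed=0):
    """EM for a GMM with components shared across samples and one weight
    vector per sample (an assumed formulation, for illustration only)."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack(samples)                      # all events, shape (n, d)
    n, d = pooled.shape
    # Index of the sample that each pooled row came from
    sample_id = np.repeat(np.arange(len(samples)), [len(s) for s in samples])

    # Initialise: random events as means, pooled covariance, uniform weights
    means = pooled[rng.choice(n, n_components, replace=False)]
    base_cov = np.atleast_2d(np.cov(pooled.T)) + reg * np.eye(d)
    covs = np.tile(base_cov, (n_components, 1, 1))
    weights = np.full((len(samples), n_components), 1.0 / n_components)

    for _ in range(n_iter):
        # E-step: Gaussian log density of every event under every component
        log_p = np.empty((n, n_components))
        for k in range(n_components):
            chol = np.linalg.cholesky(covs[k])
            z = np.linalg.solve(chol, (pooled - means[k]).T)  # whitened residuals
            log_det = 2.0 * np.log(np.diag(chol)).sum()
            log_p[:, k] = -0.5 * ((z ** 2).sum(axis=0) + log_det
                                  + d * np.log(2.0 * np.pi))
        # Each event uses the mixing weights of its own sample
        log_r = log_p + np.log(weights[sample_id] + 1e-300)
        log_r -= log_r.max(axis=1, keepdims=True)    # numerical stability
        resp = np.exp(log_r)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: per-sample weights, shared means and covariances
        for j in range(len(samples)):
            weights[j] = resp[sample_id == j].mean(axis=0)
        nk = resp.sum(axis=0) + 1e-12
        means = (resp.T @ pooled) / nk[:, None]
        for k in range(n_components):
            diff = pooled - means[k]
            covs[k] = (resp[:, k, None] * diff).T @ diff / nk[k] + reg * np.eye(d)

    return weights, means, covs
```

Under this assumed formulation, row `j` of the returned `weights` array estimates the proportion of each shared component in sample `j`, so samples can be compared directly by their weight vectors. A call such as `fit_msgmm([sample_a, sample_b], n_components=5)` expects each sample as a NumPy array of shape `(events, markers)`.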
Conclusions
We recover the efficiency and utility of the basic MSGMM, which underlies more complex, non-parametric Bayesian hierarchical mixture models. The ability to fit GMMs to large sets of FCM samples provides opportunities for discovering associations between sample composition and sample meta-data such as treatment responses and clinical outcomes.
Original language: English
Article number: 262
Number of pages: 18
Journal: BMC Bioinformatics
Volume: 26
Issue number: 1
DOIs
Publication status: Published - 23 Oct 2025

Keywords

  • Gaussian mixture models
  • EM algorithm
  • Flow cytometry
  • Large-scale data
  • Clustering
  • Classification
