Introduction to the machine learning tools in the Cytobank platform

Background

Advances in cytometry technology now make it possible to measure many parameters simultaneously at the single-cell level. The resulting data are challenging to analyze because of their complexity, and they also complicate data management and collaboration. Machine learning tools for dimensionality reduction, clustering and automatic gating are well suited to handling such high-dimensional data.

The unsupervised nature of many of these algorithms reduces the bias that manual gating of known subpopulations can introduce (Keyes, T.J. et al. “A Cancer Biologist's Primer on Machine Learning Applications in High-Dimensional Cytometry”. Cytometry, 97:782-799) and enables researchers to identify unexpected phenotypes. Supervised learning, especially automatic gating, can reduce inter-operator variability in manual population identification and streamline the data analysis workflow, which helps improve reproducibility and reliability. An additional advantage is that machine learning algorithms require less hands-on time than manual gating to achieve an exhaustive analysis of a high-dimensional dataset.

Machine learning analysis algorithms

The Cytobank platform is a cloud-based analysis tool with integrated machine-learning-based analysis algorithms. Its clustering, dimensionality reduction, automatic gating and visualization tools leverage the scalable computing and collaborative power of the cloud so that large analyses can be completed quickly. The platform currently offers a dimensionality reduction suite of four algorithms (viSNE, opt-SNE, tSNE-CUDA and UMAP), two clustering algorithms (SPADE and FlowSOM), one semi-supervised algorithm for biomarker identification (CITRUS) and one supervised learning tool (Automatic gating). To set up any of these algorithms, click Advanced analyses in the blue navigation bar.

viSNE, tSNE-CUDA, UMAP and opt-SNE are dimensionality reduction (DR) tools that reduce high-dimensional data to two dimensions, enabling rapid exploratory analysis and visualization of complex results. See the article Introduction to the dimensionality reduction suite in the Cytobank platform for more information on DR analyses.
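
As a rough illustration of what reducing data to two dimensions means, the sketch below maps each cell's marker vector to a 2D point. It is a stand-in under stated assumptions: viSNE, opt-SNE, tSNE-CUDA and UMAP are nonlinear methods and are not reproduced here. Instead, the sketch uses principal component analysis (PCA), a simpler linear technique, computed by power iteration in plain Python. The function names and the toy data layout (one list of marker intensities per cell) are hypothetical.

```python
import math
import random


def center(cells):
    """Subtract each marker's mean so every column is centered at zero."""
    n, d = len(cells), len(cells[0])
    means = [sum(cell[j] for cell in cells) / n for j in range(d)]
    return [[cell[j] - means[j] for j in range(d)] for cell in cells]


def covariance(centered):
    """Sample covariance matrix (d x d) of centered data."""
    n, d = len(centered), len(centered[0])
    return [[sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def leading_eigenvector(matrix, iters=500, seed=0):
    """Power iteration: multiply by the matrix and renormalize until the
    vector aligns with the eigenvector of the largest eigenvalue."""
    rng = random.Random(seed)
    d = len(matrix)
    v = [rng.random() + 0.1 for _ in range(d)]
    start_norm = math.sqrt(sum(x * x for x in v))
    v = [x / start_norm for x in v]
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v gives the matching eigenvalue.
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d))
    return v, eigenvalue


def pca_2d(cells):
    """Project each cell onto the top two principal components.
    PCA is a linear stand-in; the nonlinear DR methods are not shown."""
    centered = center(cells)
    cov = covariance(centered)
    d = len(cov)
    pc1, lam1 = leading_eigenvector(cov)
    # Deflation: remove PC1's contribution so the next power iteration finds PC2.
    deflated = [[cov[i][j] - lam1 * pc1[i] * pc1[j] for j in range(d)] for i in range(d)]
    pc2, _ = leading_eigenvector(deflated, seed=1)
    return [(sum(row[j] * pc1[j] for j in range(d)),
             sum(row[j] * pc2[j] for j in range(d))) for row in centered]


# Toy data: 20 hypothetical cells x 3 markers; marker 0 carries most of the variance.
cells = [[float(i), 0.1 * ((i * 7) % 5), 0.05 * ((i * 3) % 4)] for i in range(20)]
coords = pca_2d(cells)
```

PCA keeps the directions of greatest overall variance. The t-SNE family and UMAP instead aim to preserve local neighborhoods between cells, which is why they tend to separate cell populations more clearly on a 2D map.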

FlowSOM is a clustering algorithm that uses Self-Organizing Maps (SOMs) to shorten time to analysis and improve clustering quality. It can reveal how every marker behaves across all cells and can detect subsets that might otherwise be missed. It therefore helps define clusters of cells that share the same phenotype, and it can be used to better determine which markers are expressed by the cells inside a gate. You can read more about FlowSOM in Introduction to FlowSOM in Cytobank.
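
To show how a Self-Organizing Map groups cells, here is a minimal SOM written from scratch. It is a hypothetical sketch, not Cytobank's FlowSOM implementation: the grid size, learning-rate schedule, neighborhood radius and function names are illustrative assumptions, and the metaclustering step that FlowSOM applies to the SOM nodes is omitted.

```python
import math
import random


def best_matching_unit(nodes, cell):
    """Index of the node whose weight vector is closest (Euclidean) to the cell."""
    return min(range(len(nodes)),
               key=lambda k: sum((w - c) ** 2 for w, c in zip(nodes[k], cell)))


def train_som(cells, grid_w=3, grid_h=3, epochs=20, lr0=0.5, radius0=1.5, seed=0):
    """Online SOM training on a grid_w x grid_h grid of nodes."""
    rng = random.Random(seed)
    n_nodes = grid_w * grid_h
    # Initialize node weights from randomly chosen cells.
    nodes = [list(rng.choice(cells)) for _ in range(n_nodes)]
    grid = [(k % grid_w, k // grid_w) for k in range(n_nodes)]
    total_steps = epochs * len(cells)
    step = 0
    for _ in range(epochs):
        order = list(range(len(cells)))
        rng.shuffle(order)
        for idx in order:
            cell = cells[idx]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                    # learning rate decays linearly
            radius = max(radius0 * (1.0 - frac), 0.5)  # neighborhood shrinks
            bx, by = grid[best_matching_unit(nodes, cell)]
            for k, (x, y) in enumerate(grid):
                # Gaussian neighborhood on the grid: nodes near the winner move more.
                influence = math.exp(-((x - bx) ** 2 + (y - by) ** 2) / (2 * radius ** 2))
                nodes[k] = [w + lr * influence * (c - w) for w, c in zip(nodes[k], cell)]
            step += 1
    return nodes


def assign_nodes(nodes, cells):
    """Label every cell with its best-matching SOM node (used as a cluster ID)."""
    return [best_matching_unit(nodes, cell) for cell in cells]


# Toy data: two well-separated hypothetical populations measured on 2 markers.
blob_a = [[0.01 * i, 0.02 * (i % 3)] for i in range(10)]
blob_b = [[10 + 0.01 * i, 10 - 0.02 * (i % 3)] for i in range(10)]
nodes = train_som(blob_a + blob_b)
labels = assign_nodes(nodes, blob_a + blob_b)
```

Because neighboring grid nodes are updated together, similar phenotypes end up on nearby nodes, which is what makes the SOM grid a useful map of the data.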

SPADE is another clustering algorithm for identifying groups of similar cells. It clusters phenotypically similar cells into a hierarchy that allows high-throughput, multidimensional analysis of heterogeneous samples. See Overview and introduction to SPADE for more information.
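
SPADE, as originally published, downsamples cells in a density-dependent way, clusters them agglomeratively, links the cluster centroids with a minimum spanning tree and then maps all cells back onto the tree. The sketch below covers only the middle two steps under simplifying assumptions: a naive centroid-linkage merge loop and Prim's algorithm, both in plain Python with hypothetical function names. It is not Cytobank's SPADE implementation.

```python
import math


def centroid(points):
    """Mean marker vector of a group of cells."""
    d = len(points[0])
    return [sum(p[j] for p in points) / len(points) for j in range(d)]


def agglomerate(cells, n_clusters):
    """Naive agglomerative clustering: repeatedly merge the two clusters
    whose centroids are closest until n_clusters remain."""
    clusters = [[cell] for cell in cells]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            ci = centroid(clusters[i])
            for j in range(i + 1, len(clusters)):
                dij = math.dist(ci, centroid(clusters[j]))
                if best is None or dij < best[0]:
                    best = (dij, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def minimum_spanning_tree(points):
    """Prim's algorithm; returns the tree as a list of (i, j) index pairs."""
    in_tree = {0}
    edges = []
    while len(in_tree) < len(points):
        best = None
        for i in in_tree:
            for j in range(len(points)):
                if j not in in_tree:
                    dij = math.dist(points[i], points[j])
                    if best is None or dij < best[0]:
                        best = (dij, i, j)
        _, i, j = best
        edges.append((i, j))
        in_tree.add(j)
    return edges


def spade_like_tree(cells, n_clusters):
    """Cluster the cells, then connect cluster centroids into a tree."""
    clusters = agglomerate(cells, n_clusters)
    centroids = [centroid(c) for c in clusters]
    return clusters, centroids, minimum_spanning_tree(centroids)


# Toy data: one hypothetical marker, three groups of two cells each.
cells = [[0.0], [0.2], [5.0], [5.2], [10.0], [10.2]]
clusters, centroids, edges = spade_like_tree(cells, n_clusters=3)
```

In this toy example the tree links the low group to the middle group and the middle group to the high group, never the two extremes directly, which is the kind of phenotypic progression a SPADE tree is meant to expose.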

CITRUS is a data analysis tool that combines two steps: clustering and statistical inference. It then builds models that identify differences among sample groups, either correlatively or predictively. See the article Overview of CITRUS to learn more.
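
The two-step idea (cluster, then test for group differences) can be sketched as follows. This is a simplified, hypothetical illustration: it assumes cells have already been assigned to clusters, summarizes each sample by the fraction of its cells in each cluster, and scores each cluster with Welch's t-statistic. The models CITRUS actually fits (for example, SAM for correlative analysis and PAM or LASSO for predictive analysis) are not reproduced here, and the function and group names are invented for the example.

```python
import math
from statistics import mean, variance


def cluster_abundances(labels, n_clusters):
    """Fraction of a sample's cells that fall into each cluster."""
    counts = [0] * n_clusters
    for label in labels:
        counts[label] += 1
    return [count / len(labels) for count in counts]


def welch_t(a, b):
    """Welch's t-statistic for a difference in means between two groups."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def score_clusters(group_a, group_b, n_clusters):
    """group_a / group_b: one list of per-cell cluster labels per sample.
    Returns one t-statistic per cluster (positive = more abundant in group_a)."""
    ab_a = [cluster_abundances(sample, n_clusters) for sample in group_a]
    ab_b = [cluster_abundances(sample, n_clusters) for sample in group_b]
    return [welch_t([s[k] for s in ab_a], [s[k] for s in ab_b])
            for k in range(n_clusters)]


# Hypothetical cluster labels for three samples per group (2 clusters).
responders = [[0, 0, 0, 1], [0, 0, 0, 0, 1], [0, 0, 1]]
non_responders = [[0, 1, 1, 1], [1, 1, 1, 1, 0], [0, 1, 1]]
stats = score_clusters(responders, non_responders, n_clusters=2)
```

Because every cluster is tested at once, a real analysis would also need to account for multiple comparisons.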

Automatic gating is a supervised learning tool with two actions: Train new model and Run inference using model. It trains a model using your manually gated data as training data and then applies the trained model to predict populations or subsets in new FCS files. You can learn more about Automatic gating in Introduction of the automatic gating algorithm.
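
The train-then-infer pattern can be sketched with a k-nearest-neighbor classifier. This is an assumption: the text above does not say which model Automatic gating uses, and the function names `train_new_model` and `run_inference` are hypothetical, chosen only to mirror the two actions. Each training cell is a list of marker intensities paired with the population label assigned during manual gating.

```python
import math
from collections import Counter


def train_new_model(cells, gate_labels):
    """k-NN 'training' simply stores the manually gated cells with their labels."""
    if len(cells) != len(gate_labels):
        raise ValueError("every training cell needs exactly one gate label")
    return list(zip(cells, gate_labels))


def run_inference(model, new_cells, k=3):
    """Predict a population for each new cell by majority vote of its k nearest
    training cells (Euclidean distance on marker intensities)."""
    predictions = []
    for cell in new_cells:
        neighbors = sorted(model, key=lambda example: math.dist(example[0], cell))[:k]
        votes = Counter(label for _, label in neighbors)
        predictions.append(votes.most_common(1)[0][0])
    return predictions


# Hypothetical training data: columns are two marker intensities (e.g. CD3, CD19).
training_cells = [[5.0, 0.2], [5.5, 0.1], [4.8, 0.3], [0.2, 5.1], [0.1, 4.9], [0.3, 5.4]]
gate_labels = ["T cells"] * 3 + ["B cells"] * 3
model = train_new_model(training_cells, gate_labels)
print(run_inference(model, [[5.1, 0.2], [0.2, 5.0]]))  # ['T cells', 'B cells']
```

Storing the labeled examples is the entire training step for k-NN, which keeps the sketch short; other classifiers would fit parameters at that point instead.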