Introduction to the machine learning tools in the Cytobank platform


Due to technological advances in the cytometry field, it is now possible to measure many parameters simultaneously at the single-cell level. The resulting data are challenging to analyze, both because of their complexity and because of the demands they place on data management and collaboration. Machine learning tools for dimensionality reduction and clustering are well suited to handling such high-dimensional data. 

The unsupervised nature of many of these algorithms reduces the bias that manual gating of known subpopulations can introduce (Keyes, T.J. et al. “A Cancer Biologist's Primer on Machine Learning Applications in High-Dimensional Cytometry”. Cytometry Part A, 97:782-799) and enables researchers to identify unexpected phenotypes. An additional advantage is that machine learning algorithms require far less hands-on time than manual gating to achieve an exhaustive analysis of high-dimensional datasets. 


Machine-learning analysis algorithms

The Cytobank platform is a cloud-based analysis tool with integrated machine learning algorithms. Its clustering, dimensionality reduction, and visualization tools use the scalable computing and collaborative power of the cloud so that large analyses can be completed quickly. The platform currently offers a dimensionality reduction suite with four algorithms (viSNE, opt-SNE, tSNE-CUDA, and UMAP), two clustering algorithms (SPADE and FlowSOM), and one supervised algorithm (CITRUS). To set up a machine learning analysis on the Cytobank platform, click Advanced analyses in the blue navigation bar.




viSNE, opt-SNE, tSNE-CUDA, and UMAP are dimensionality reduction (DR) tools that project high-dimensional data into two dimensions, enabling rapid exploratory analysis and visualization of complex results. For more information on DR analyses, see the article Introduction to the dimensionality reduction suite in the Cytobank platform. 
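To illustrate the basic idea of DR (every cell's many marker values are mapped to a single point on a 2D plot), the Python sketch below projects a small, made-up cells-by-markers matrix onto two coordinates. It deliberately swaps in principal component analysis (PCA), a linear method computed here by power iteration, because viSNE, opt-SNE, tSNE-CUDA, and UMAP are nonlinear and too involved for a short example. The helper name `project_2d` and the toy data are hypothetical and not part of the Cytobank platform.

```python
import math
import random


def project_2d(data, iters=200, seed=0):
    """Project rows of `data` (cells x markers) onto their top two principal components.

    Hypothetical helper: a linear PCA stand-in, NOT viSNE/opt-SNE/tSNE-CUDA/UMAP.
    """
    n, d = len(data), len(data[0])
    # Center every marker (column) on zero.
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    # Marker-by-marker sample covariance matrix.
    cov = [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]

    rng = random.Random(seed)
    components = []
    for _ in range(2):
        v = [rng.random() + 0.1 for _ in range(d)]
        for _ in range(iters):
            # Power iteration: repeatedly multiply by the covariance matrix.
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            # Deflation: remove directions already found so we converge to the next one.
            for c in components:
                dot = sum(wi * ci for wi, ci in zip(w, c))
                w = [wi - dot * ci for wi, ci in zip(w, c)]
            norm = math.sqrt(sum(wi * wi for wi in w))
            v = [wi / norm for wi in w]
        components.append(v)
    # Each cell's 2D coordinates are its projections onto the two components.
    return [[sum(x * c for x, c in zip(row, comp)) for comp in components] for row in centered]


# Toy data: 6 cells x 4 markers, forming two made-up populations.
cells = [
    [1.0, 0.9, 0.1, 0.2],
    [1.1, 1.0, 0.2, 0.1],
    [0.9, 1.1, 0.0, 0.2],
    [0.1, 0.2, 1.0, 0.9],
    [0.2, 0.1, 1.1, 1.0],
    [0.0, 0.2, 0.9, 1.1],
]
coords = project_2d(cells)
for xy in coords:
    print(f"{xy[0]:+.2f} {xy[1]:+.2f}")
```

In this toy case the first coordinate alone separates the two populations. The nonlinear algorithms in the DR suite are intended to separate subpopulations that a linear projection like this one can blur together.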

FlowSOM is a clustering algorithm that uses Self-Organizing Maps (SOMs) to shorten time to analysis and improve clustering quality. It can reveal how all markers behave on all cells and can detect subsets that might otherwise be missed. It therefore helps define clusters of cells with the same phenotype and can be used to better define which markers are expressed by the cells inside your gate. You can read more about FlowSOM in the article Introduction to FlowSOM in Cytobank. 
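The core of FlowSOM is a self-organizing map: a small grid of nodes whose weight vectors are pulled toward the cells they best match, with their grid neighbors pulled along more weakly. The pure-Python sketch below trains such a map and assigns each cell to its best-matching node. The grid size, learning-rate and neighborhood schedules, function names, and toy data are all illustrative assumptions, not Cytobank's settings. The published FlowSOM method also builds a minimum spanning tree over the nodes and groups them into metaclusters; this sketch omits those steps.

```python
import math
import random


def best_matching_unit(weights, cell):
    """Index of the node whose weight vector is closest (squared Euclidean) to `cell`."""
    return min(
        range(len(weights)),
        key=lambda k: sum((w - c) ** 2 for w, c in zip(weights[k], cell)),
    )


def train_som(data, grid_w=3, grid_h=3, epochs=30, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small rectangular self-organizing map (hypothetical helper)."""
    rng = random.Random(seed)
    nodes = [(x, y) for y in range(grid_h) for x in range(grid_w)]
    # Initialize every node's weight vector from a randomly chosen cell.
    weights = [list(rng.choice(data)) for _ in nodes]
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            frac = step / total
            lr = lr0 * (1 - frac)  # learning rate decays linearly toward 0
            sigma = max(sigma0 * (1 - frac), 0.3)  # neighborhood radius shrinks to a floor
            bx, by = nodes[best_matching_unit(weights, data[i])]
            for k, (x, y) in enumerate(nodes):
                # Gaussian neighborhood: nodes near the winner on the grid move more.
                h = math.exp(-((x - bx) ** 2 + (y - by) ** 2) / (2 * sigma ** 2))
                weights[k] = [w + lr * h * (c - w) for w, c in zip(weights[k], data[i])]
            step += 1
    return weights


# Toy data: 6 cells x 4 markers, two made-up populations.
cells = [
    [1.0, 0.9, 0.1, 0.2],
    [1.1, 1.0, 0.2, 0.1],
    [0.9, 1.1, 0.0, 0.2],
    [0.1, 0.2, 1.0, 0.9],
    [0.2, 0.1, 1.1, 1.0],
    [0.0, 0.2, 0.9, 1.1],
]
weights = train_som(cells)
labels = [best_matching_unit(weights, cell) for cell in cells]
print(labels)
```

Cells from the two populations end up on different nodes; on real data there are many more nodes than phenotypes, which is why FlowSOM merges nodes into metaclusters afterward.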

Another clustering algorithm that can be used to identify groups of similar cells is SPADE, which clusters phenotypically similar cells into a hierarchy that allows high-throughput, multidimensional analysis of heterogeneous samples. For more information, see the article Overview and introduction to SPADE. 
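As published, SPADE downsamples cells in a density-dependent manner, clusters them, and connects the resulting clusters with a minimum spanning tree. The Python sketch below illustrates only the tree-building step, under simplifying assumptions: cluster labels are already assigned, each cluster is summarized by its mean marker values, and Prim's algorithm with Euclidean distance links the clusters. The function names and data are hypothetical.

```python
import math


def cluster_centroids(cells, labels):
    """Mean marker vector of each cluster, ordered by cluster label."""
    groups = {}
    for cell, label in zip(cells, labels):
        groups.setdefault(label, []).append(cell)
    return [
        [sum(values) / len(members) for values in zip(*members)]
        for _, members in sorted(groups.items())
    ]


def minimum_spanning_tree(points):
    """Edges (i, j) of a Euclidean minimum spanning tree, built with Prim's algorithm."""
    in_tree = {0}
    edges = []
    while len(in_tree) < len(points):
        # Cheapest edge from a node already in the tree to a node outside it.
        i, j = min(
            ((a, b) for a in in_tree for b in range(len(points)) if b not in in_tree),
            key=lambda e: math.dist(points[e[0]], points[e[1]]),
        )
        edges.append((i, j))
        in_tree.add(j)
    return edges


# Toy data: 2 markers per cell; cluster labels are assumed to be precomputed.
cells = [[0.0, 0.0], [0.2, 0.0], [1.0, 0.1], [2.0, 0.0], [10.0, 0.0]]
labels = [0, 0, 1, 2, 3]
centroids = cluster_centroids(cells, labels)
tree = minimum_spanning_tree(centroids)
print(tree)  # [(0, 1), (1, 2), (2, 3)]
```

Because the tree links each cluster to its nearest phenotypic neighbors, gradual transitions between populations show up as paths through the tree.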

CITRUS is a data analysis tool that combines two steps: clustering and statistical inference. It then builds models that identify differences among sample groups in either a correlative or a predictive way. For more information, see the article Overview of CITRUS. 
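The two-step idea behind CITRUS (cluster first, then test cluster-level features for differences between sample groups) can be sketched in a few lines of Python. This is a simplified stand-in, not CITRUS itself: it assumes every cell already carries a cluster label from clustering the pooled samples, uses the fraction of each sample's cells in each cluster as the feature, and compares two hypothetical sample groups with a Welch t-statistic instead of the models CITRUS fits. All names and numbers are made up for illustration.

```python
import math
from statistics import mean, variance


def cluster_abundances(labels, n_clusters):
    """Fraction of one sample's cells assigned to each cluster."""
    counts = [0] * n_clusters
    for label in labels:
        counts[label] += 1
    return [c / len(labels) for c in counts]


def welch_t(a, b):
    """Welch's t-statistic for the difference in means of two groups."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


# Per-cell cluster labels for each sample: two clusters, two hypothetical groups.
group_a = [[0, 0, 0, 1], [0, 0, 1, 1, 0], [0, 0, 0, 0, 1]]
group_b = [[0, 1, 1, 1], [1, 1, 1, 0, 1], [0, 1, 1, 1, 1]]

n_clusters = 2
feats_a = [cluster_abundances(s, n_clusters) for s in group_a]
feats_b = [cluster_abundances(s, n_clusters) for s in group_b]
# One statistic per cluster: does its abundance differ between the groups?
t_stats = [
    welch_t([f[k] for f in feats_a], [f[k] for f in feats_b])
    for k in range(n_clusters)
]
for k, t in enumerate(t_stats):
    print(f"cluster {k}: t = {t:+.2f}")
```

Here cluster 0 is more abundant in group A and cluster 1 in group B. With real data, many clusters are tested at once, so a correction for multiple comparisons would be needed before treating any single difference as meaningful.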
