Introduction to the dimensionality reduction suite in the Cytobank platform

Background

What is the dimensionality reduction suite in the Cytobank platform?

The dimensionality reduction suite is a tool for exploratory data analysis and data visualization. It contains four dimensionality reduction algorithms: viSNE/t-SNE¹, tSNE-CUDA², UMAP³ and opt-SNE⁴. Each algorithm reduces high-dimensional data to two dimensions for rapid exploratory analysis and easy visualization.

How do you use the dimensionality reduction suite in your analysis workflow?

In cytometry data analysis, scientists usually run these dimensionality reduction algorithms after compensating, scaling and pre-gating the data. Following the dimensionality reduction analysis, scientists can create a dimensionality reduction map colored by channels or overlay dot plots in the Cytobank platform to visualize and interpret the results.
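
As a rough illustration of the preparation steps above, the following Python sketch scales and pre-gates a matrix of events before it would be handed to a dimensionality reduction algorithm. It is not how the Cytobank platform does it internally. The arcsinh transform, the cofactor of 5, the gate bounds, the synthetic data, and the function names arcsinh_scale and pre_gate are all assumptions chosen for illustration.

```python
import numpy as np


def arcsinh_scale(values, cofactor=5.0):
    """Apply an arcsinh transform (assumed scaling; the cofactor is illustrative)."""
    return np.arcsinh(np.asarray(values, dtype=float) / cofactor)


def pre_gate(events, channel_index, lower, upper):
    """Keep only events whose value in one channel lies within [lower, upper]."""
    column = events[:, channel_index]
    return events[(column >= lower) & (column <= upper)]


# Hypothetical, already-compensated events: rows are cells, columns are channels.
rng = np.random.default_rng(0)
raw = rng.gamma(shape=2.0, scale=50.0, size=(1000, 4))

scaled = arcsinh_scale(raw, cofactor=5.0)
# Hypothetical gate on the first channel, expressed on the scaled axis.
gated = pre_gate(scaled, channel_index=0, lower=2.0, upper=4.0)

# `gated` is the kind of matrix a dimensionality reduction algorithm would receive.
print(raw.shape, gated.shape)
```

Scaling comes before gating in this sketch so that the gate bounds are expressed on the same axis a scientist would see in a scaled plot.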

What additional features does the dimensionality reduction suite offer?

The dimensionality reduction suite also offers a tool for comparing the results generated by the four algorithms. To learn how to compare dimensionality reduction analysis results, refer to “Comparison of the dimensionality reduction results within the settings page”.


Algorithms implemented in the dimensionality reduction suite

viSNE/t-SNE (t-distributed stochastic neighbor embedding)

viSNE/t-SNE is a non-linear dimensionality reduction algorithm based on Stochastic Neighbor Embedding. It models the similarity between data points with probabilities rather than raw distances. viSNE in the Cytobank platform uses the Barnes-Hut implementation of the t-SNE algorithm¹.
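
To make the idea of probabilities rather than distances concrete, here is a minimal Python sketch that converts pairwise distances into SNE-style conditional probabilities. It is a simplification, not the Barnes-Hut implementation used by the platform: it assumes one fixed Gaussian bandwidth (sigma) for every point, whereas t-SNE tunes the bandwidth per point to match a target perplexity. The function name and the toy data are hypothetical.

```python
import numpy as np


def conditional_probabilities(points, sigma=1.0):
    """Turn pairwise distances into SNE-style conditional probabilities.

    Row i holds p(j | i): how likely point i is to pick point j as a
    neighbor under a Gaussian centered on point i.
    Assumption: a single fixed sigma instead of a per-point, perplexity-tuned one.
    """
    points = np.asarray(points, dtype=float)
    # Squared Euclidean distance between every pair of points.
    diff = points[:, None, :] - points[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Gaussian affinity; a point is never its own neighbor.
    affinity = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(affinity, 0.0)
    # Normalize each row so that it sums to 1.
    return affinity / affinity.sum(axis=1, keepdims=True)


# Three toy events: two close together, one far away.
events = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
P = conditional_probabilities(events)
print(P.round(3))
```

Nearby points receive almost all of the probability mass while distant points receive almost none, which is why the resulting map emphasizes local neighborhoods.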

To learn more about viSNE, see the publication from the Pe’er lab at Columbia University that originally demonstrated the use of the t-SNE algorithm on cytometry data: “viSNE enables visualization of high dimensional single-cell data and reveals phenotypic heterogeneity of leukemia”.

To run a viSNE analysis on a dataset, log in to Premium or Enterprise Cytobank, then use the Experiment Navigation bar to open the viSNE menu and create a new analysis.

opt-SNE (optimized t-SNE)

opt-SNE is a t-SNE-based algorithm that automatically optimizes the early exaggeration process and the learning rate for a t-SNE analysis run. A 2019 publication in Nature Communications (Belkina et al.⁴) shows that opt-SNE can produce better dimensionality reduction results than the original t-SNE algorithm when the settings of the original algorithm are not optimized.
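
As a small illustration of the learning rate part, the sketch below scales the learning rate with the size of the dataset: the number of events divided by the early exaggeration factor. This follows the heuristic described in the 2019 opt-SNE publication as we understand it. The default factor of 12 and the function name are assumptions, and the value the Cytobank platform actually uses may differ.

```python
def suggested_learning_rate(n_events, early_exaggeration=12.0):
    """Heuristic learning rate: number of events divided by the exaggeration factor.

    Assumption: early_exaggeration defaults to 12, a common t-SNE default.
    """
    if n_events <= 0 or early_exaggeration <= 0:
        raise ValueError("n_events and early_exaggeration must be positive")
    return n_events / early_exaggeration


# The suggested learning rate grows linearly with the number of events.
for n in (12_000, 120_000, 1_200_000):
    print(n, suggested_learning_rate(n))
```

A fixed learning rate that works for a few thousand events can be far too small for a dataset of a million events, which is one reason unoptimized t-SNE settings can give poor maps on large cytometry files.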

tSNE-CUDA (GPU accelerated t-distributed stochastic neighbor embedding)

tSNE-CUDA is a state-of-the-art implementation of the t-SNE algorithm. It uses the GPU to significantly reduce the computation time of t-SNE. A 2019 publication (Chan et al.²) shows that tSNE-CUDA significantly outperformed many current t-SNE implementations.

UMAP (Uniform Manifold Approximation and Projection for Dimension Reduction)

Like t-SNE, UMAP is a non-linear dimensionality reduction algorithm. Unlike t-SNE, it models the underlying structure of a dataset using a topological framework. UMAP in Cytobank uses the GPU implementation of UMAP available in the RAPIDS cuML library. The authors of this GPU-accelerated implementation report that it can be up to 100x faster than other UMAP implementations³.

REFERENCES:

1. van der Maaten L.J.P., Journal of Machine Learning Research, 2014
2. Chan D.M. et al., Journal of Parallel and Distributed Computing, 2019
3. Nolet C.J. et al., arXiv, 2020, arXiv:2008.00325, http://arxiv.org/abs/2008.00325
4. Belkina A. et al., Nature Communications, 2019