Clustering is a process in which events from raw data are analyzed across a number of parameters and then assigned to groups (i.e. "clusters"). A cluster is a bin that collects events that are similar to each other based on the overall properties of each event in the data. For example, clustering might take 100,000 events and assign them to 50 clusters. After clustering, each of those 100,000 events belongs to exactly one cluster and carries an identifier for that cluster. The identifiers are integers between 1 and 50. Consider this example SPADE tree with 50 nodes:
(a spade tree with 50 nodes)
A subset of this same clustered data, viewed in spreadsheet form, looks like this:
(the clustered data from the SPADE tree - a cluster_id column is added to the original data to indicate which cluster each event belongs to. This principle is the same regardless of which clustering algorithm is used)
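The spreadsheet view can be mimicked outside Cytobank with a small sketch. The code below is not SPADE. It assigns each event to the nearest of a few hard-coded cluster centers, purely to show how a cluster_id column ends up alongside the original channels. The channel names (CD3, CD19), the event values, and the centers are all invented for illustration.

```python
import math

# Hypothetical events: each row is one event measured on two marker channels.
events = [
    {"CD3": 5.1, "CD19": 0.2},
    {"CD3": 4.8, "CD19": 0.4},
    {"CD3": 0.3, "CD19": 4.9},
    {"CD3": 0.1, "CD19": 5.2},
    {"CD3": 0.2, "CD19": 0.3},
]

# Hypothetical cluster centers, keyed by integer cluster identifier.
centers = {
    1: {"CD3": 5.0, "CD19": 0.0},
    2: {"CD3": 0.0, "CD19": 5.0},
    3: {"CD3": 0.0, "CD19": 0.0},
}


def nearest_cluster(event, centers):
    """Return the identifier of the center closest to the event (Euclidean distance)."""
    def distance(center):
        return math.sqrt(sum((event[ch] - center[ch]) ** 2 for ch in center))
    return min(centers, key=lambda cid: distance(centers[cid]))


# Copy each event and append a cluster_id column; the original channels are untouched.
clustered = [dict(event, cluster_id=nearest_cluster(event, centers)) for event in events]

for row in clustered:
    print(row)
```

Whatever algorithm produces the assignment, the result has the same shape: the original columns plus one integer cluster column.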
In Cytobank, if FCS files are ever exported from a SPADE tree (or imported from a clustering done elsewhere), the resulting files will have a cluster channel that can be used for downstream analysis. When visualized in a plot (with scales set to linear), discrete patterns will emerge that represent the events with integer values in the cluster column. These are example plots that can be viewed for the SPADE tree above:
(cluster channel versus a marker) (cluster channel versus itself)
Drawing gates around clusters (including automatically)
A gate can be drawn around the events with a certain cluster value in order to isolate events for a particular cluster. This can be useful for downstream analysis (see section below). Drawing gates around many clusters can be challenging and/or tedious. Cytobank Support has a tool that can be enabled in the gating interface that will generate cluster gates for all clusters. Create a support ticket to request access to this functionality!
(gates applied to all clusters automatically - only showing subset of clusters)
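Conceptually, generating a gate for every cluster amounts to placing one narrow range around each integer value on the cluster channel. The sketch below illustrates only that idea. It is not the Cytobank tool, and the function names and the half-width of 0.5 are assumptions chosen for illustration.

```python
def make_cluster_gates(n_clusters, half_width=0.5):
    """Return {cluster_id: (low, high)}, one range per integer cluster value."""
    return {
        cid: (cid - half_width, cid + half_width)
        for cid in range(1, n_clusters + 1)
    }


def events_in_gate(cluster_values, gate):
    """Return the indices of events whose cluster value lies in [low, high)."""
    low, high = gate
    return [i for i, value in enumerate(cluster_values) if low <= value < high]


gates = make_cluster_gates(50)

# Hypothetical cluster-channel values for five events.
cluster_channel = [1.0, 17.0, 17.0, 50.0, 3.0]

print(gates[17])                                   # (16.5, 17.5)
print(events_in_gate(cluster_channel, gates[17]))  # [1, 2]
```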
Drawing gates around clusters is, in essence, cluster gating. A current limitation of this approach is that a single gate covering a collection of clusters can't readily be made. For example, it might be desirable to have a gate that represents clusters 1, 17, 19, 20, and 45. Currently there is no simple way to do that besides drawing a tricky polygon gate that encircles the desired clusters while excluding the others.
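If the clustered events are exported and analyzed outside Cytobank, selecting a collection of clusters reduces to a set-membership test on the cluster column. The following is a minimal sketch under that assumption. The column name cluster_id and the sample rows are invented for illustration.

```python
# Clusters that should be treated as one combined population.
wanted_clusters = {1, 17, 19, 20, 45}

# Hypothetical exported rows: an event index plus its cluster identifier.
rows = [
    {"event": 0, "cluster_id": 1},
    {"event": 1, "cluster_id": 2},
    {"event": 2, "cluster_id": 17},
    {"event": 3, "cluster_id": 45},
    {"event": 4, "cluster_id": 46},
]

# Keep only events whose cluster identifier is in the wanted set.
# int() guards against identifiers stored as floats such as 17.0.
selected = [row for row in rows if int(row["cluster_id"]) in wanted_clusters]

print([row["event"] for row in selected])  # [0, 2, 3]
```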
Applications of cluster gating
Visually compare clustering algorithm results
Use cluster gating in concert with colored overlay dot plots and viSNE to create a figure that shows the results of viSNE and a different clustering algorithm at the same time. This makes it possible to compare the results of any given clustering algorithm with those of viSNE:
(cluster results for a 10 node SPADE tree are colored on a viSNE plot. Both algorithms were run on the same data with the same channel selections. Areas where colors disagree show disagreements in the categorization tendency of either algorithm)
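The visual comparison can be complemented with a rough numeric summary. The sketch below groups events by cluster and reports each cluster's centroid and average distance from that centroid on the two viSNE axes. A large spread may suggest that a cluster is scattered across the map rather than sitting in one island. This is an illustrative assumption about how one might quantify the picture, not a Cytobank feature, and the coordinates are made up.

```python
import math
from collections import defaultdict

# Hypothetical events: (cluster_id, viSNE axis 1, viSNE axis 2).
events = [
    (1, 0.0, 0.0),
    (1, 2.0, 0.0),
    (2, 10.0, 10.0),
    (2, 10.0, 12.0),
    (2, 10.0, 14.0),
]


def summarize_by_cluster(events):
    """Return per-cluster centroid, mean distance to centroid, and event count."""
    groups = defaultdict(list)
    for cid, x, y in events:
        groups[cid].append((x, y))

    summary = {}
    for cid, points in groups.items():
        cx = sum(p[0] for p in points) / len(points)
        cy = sum(p[1] for p in points) / len(points)
        spread = sum(math.hypot(px - cx, py - cy) for px, py in points) / len(points)
        summary[cid] = {"centroid": (cx, cy), "spread": spread, "n": len(points)}
    return summary


summary = summarize_by_cluster(events)
for cid in sorted(summary):
    print(cid, summary[cid])
```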