Background
SPADE is an algorithm available in the Cytobank platform that takes multiparameter data, clusters it, and represents the clustered data as a two-dimensional minimum spanning tree of connected clusters. Running SPADE requires only a few configuration steps. This article gives an overview of these steps and of the concepts to consider when setting up the analysis. It covers the following sections:
- Select a Population
- Fold Change Configurations
- Select Clustering Channels
- Target Number of Nodes
- Downsampling Target
- Scale Settings and SPADE
- Compensation
- Include and Exclude Samples from a SPADE Run
- Adjust the Number of Events in the SPADE Analysis
Select a Population
Choose the population on which to run SPADE from the Population box on the SPADE setup page. This box lists the populations created in the Gating Editor of your experiment, and only one population can be chosen for a given SPADE run. If your experiment includes gating groups, the Select Population dialogue displays the populations nested hierarchically under their gating group, with each hierarchy preceded by the gating group name. Only the events within the selected population, across all samples belonging to that gating group, are used for the SPADE analysis.
The population to choose depends on the goal of the analysis. Researchers interested in all available phenotypes should run SPADE on a high-level population that has had basic cleanup gating, such as CD45+ viable singlets. Researchers probing a smaller compartment (e.g., T cells only) should instead choose a more downstream gated population, such as CD3+ cells.
(Select Population dialogue (left) showing the populations belonging to the gating groups named B cell and Pheno Basic. The Populations box (right) shows the gating group in brackets as a suffix of the population name.)
Fold Change Configurations
Fold change analysis between samples using SPADE is a powerful workflow. Read about it in a dedicated article: Fold Change with SPADE.
Select Clustering Channels
Which channels to choose for clustering depends on the goals of the analysis. In most cases, the goal of a SPADE analysis is to automatically group events of interest into phenotypic clusters for subsequent categorization and analysis. In that case, SPADE needs every channel that helps distinguish phenotypically distinct cell subsets. These are generally CD markers plus non-CD phenotyping markers such as HLA-DR, IgM, and CCR7.
Signaling markers or other functional or dynamic markers can also be chosen for clustering in SPADE. The same principles of running and analyzing apply, but the nuances of this approach and the resulting analysis strategy should be understood before clustering on these markers. Most of the time, signaling markers are not selected for clustering and are instead analyzed after the SPADE tree is created.
A common mistake in choosing channels for a SPADE analysis is including linearly scaled channels among an otherwise non-linearly scaled set of channels. Channels that are often linearly scaled include forward scatter in fluorescence cytometry and cell_length or event_length in mass cytometry (CyTOF). Other channels can be linear as well, so scaling in general should be checked before a SPADE analysis. To learn more about the effect of scale settings on SPADE results, see the Scale Settings and SPADE section of this article.
Another common mistake is choosing channels that do not contribute to an interpretable categorization of events but still affect the analysis results. A good example is the time channel: time does not help identify a cell population, so clustering on it is not useful and often produces nonsensical results. Time is, however, often useful for gating out regions of the data that were not acquired with high fidelity; this manual gating step should be done before the SPADE analysis (read more about time gates).
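As a rough illustration of the two channel checks described above, the sketch below filters a candidate channel list, dropping the time channel and any channel whose scale type is linear. The channel names, the `scale_types` mapping, and the `pick_clustering_channels` helper are hypothetical stand-ins for settings you would review in the experiment; this is not a Cytobank API, and channel selection itself happens in the SPADE setup page.

```python
def pick_clustering_channels(channels, scale_types):
    """Return channels to offer for clustering (hypothetical helper).

    channels: list of channel names in the experiment.
    scale_types: dict mapping channel name -> "linear", "log", or "arcsinh".
    """
    selected = []
    for name in channels:
        # Time does not describe a cell phenotype, so it is never clustered on.
        if name.lower() == "time":
            continue
        # Linearly scaled channels mixed with non-linear ones are a common mistake.
        if scale_types.get(name) == "linear":
            continue
        selected.append(name)
    return selected


channels = ["Time", "FSC-A", "CD3", "CD4", "CD8", "HLA-DR"]
scale_types = {"Time": "linear", "FSC-A": "linear", "CD3": "arcsinh",
               "CD4": "arcsinh", "CD8": "arcsinh", "HLA-DR": "arcsinh"}
print(pick_clustering_channels(channels, scale_types))
# ['CD3', 'CD4', 'CD8', 'HLA-DR']
```

In a real experiment, a linear channel could instead be rescaled rather than dropped; the sketch only encodes the rule of thumb from the text.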
Channels not appearing for selection in SPADE?
Note: the channels available for clustering in a SPADE analysis must be common to all files in the experiment, or to all files belonging to the selected gating group if the experiment includes more than one gating group. If files in the experiment or in the gating group have different sets of channels, only the channels common to all of them are available for selection in SPADE. Although channels unique to certain files cannot be used for clustering, their data are still available for statistics and visualization after the SPADE run is complete. Some panel/channel conflicts can be solved in the Cytobank platform. Read this overview on how to work with panels and channels for more details.
When the experiment includes FCS files with different channels, a warning message appears. See the Include and Exclude Samples from a SPADE Run section of this article to learn how gating groups or fold-change groups can be used to exclude samples from a SPADE run.
(Message warning about the presence of FCS files with different channels in the experiment.)
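The rule that only shared channels are offered for clustering amounts to a set intersection across files. The minimal sketch below assumes channels are matched by exact name, which is a simplification (Cytobank's panel handling may reconcile names differently); the file names and `common_channels` helper are hypothetical. With gating groups, `files` would contain only the files of the selected group.

```python
def common_channels(files):
    """Return channels present in every file, in the order of the first file.

    files: non-empty dict mapping file name -> list of channel names.
    """
    shared = set.intersection(*(set(chs) for chs in files.values()))
    first = next(iter(files.values()))
    return [ch for ch in first if ch in shared]


files = {
    "sample_A.fcs": ["CD3", "CD4", "CD8", "CD19"],
    "sample_B.fcs": ["CD3", "CD4", "CD8", "CD56"],
    "sample_C.fcs": ["CD3", "CD8", "CD4", "CD19"],
}
print(common_channels(files))  # ['CD3', 'CD4', 'CD8']
```

Here CD19 and CD56 are each missing from at least one file, so they could not be used for clustering, although their data would remain available for statistics after the run.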
Target Number of Nodes
The target number of nodes for a SPADE run determines how many clusters will be present in the results. Choosing this number is a Goldilocks problem. A lower target simplifies the tree but increases the chance that a rare or subtle population is merged into a cluster with a different, more abundant cell population, an effect referred to as underclustering. A higher target increases the chance of isolating rare or subtle populations, but also adds noise to the results, because homogeneous abundant populations are split across many clusters (overclustering).
Erring toward overclustering is usually preferable, because it can be mitigated during analysis by manually consolidating similar nodes within bubbles, whereas the consequences of underclustering can be hard or impossible to detect. Still, a middle road is needed: enough clusters to resolve rare or subtle populations without significantly overclustering the dataset. A useful target depends on prior knowledge of the dataset's complexity, the number of channels in the data files, the needs of the researcher, and empirical comparison of results from multiple runs on the same dataset.
(SPADE allows the user to set the target number of nodes or clusters to be created.)
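To illustrate why overclustering is recoverable, the sketch below groups nodes into bubbles and sums their event counts: an abundant population split over several nodes adds back up to one total, while a small, separate node stays visible. The node IDs, counts, bubble names, and the `consolidate_bubbles` helper are all hypothetical; in Cytobank, bubbles are drawn on the tree rather than defined in code.

```python
def consolidate_bubbles(node_counts, bubbles):
    """Sum event counts of SPADE nodes grouped into bubbles.

    node_counts: dict mapping node id -> number of events in that node.
    bubbles: dict mapping bubble name -> list of node ids assigned to it.
    """
    return {name: sum(node_counts[n] for n in nodes)
            for name, nodes in bubbles.items()}


# An abundant, homogeneous population split over nodes 1-3 (overclustering),
# plus a rare population that received its own node.
node_counts = {1: 4000, 2: 3800, 3: 4200, 4: 150}
bubbles = {"T cells": [1, 2, 3], "rare subset": [4]}
print(consolidate_bubbles(node_counts, bubbles))
# {'T cells': 12000, 'rare subset': 150}
```

The reverse is not possible: if the 150 rare events had been merged into node 1, no bubble could separate them again.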
Downsampling Target
Downsampling in the context of SPADE refers to density-dependent downsampling, which is applied to the input data before clustering. Density-dependent downsampling estimates the local density of events across the dataset and removes events from dense regions to even out the density. The overall structure and distribution of the data remain consistent, but highly abundant, redundant regions are thinned out. This redistribution allows rare cell types to form their own clusters instead of being outnumbered by abundant cell types.
Downsampling is done on a per-file basis according to a percentage or an absolute number of events set by the researcher. With a percentage target, each file is downsampled until the percentage of events remaining equals the target. For example, a file with 1000 events and a downsampling target of 10% would have 900 events removed, leaving 100 events. With an absolute target, each file is downsampled until that number of events remains. Note that the event counts in this paragraph refer to the events in each file after filtering by the gates of the chosen population, because gate filtering happens before downsampling.
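The sketch below shows one common formulation of density-dependent downsampling: count the events within a fixed radius of each event (its local density) and keep each event with probability min(1, target density / local density), so sparse events are always kept and dense regions are thinned. This is an illustration of the general idea, not Cytobank's implementation; `radius`, `target_density`, and both helper functions are assumptions, and the brute-force neighbor count is only suitable for tiny toy data.

```python
import math
import random


def local_density(events, radius):
    """For each event, count events within `radius` of it (itself included)."""
    return [sum(1 for other in events if math.dist(e, other) <= radius)
            for e in events]


def density_dependent_downsample(events, radius, target_density, seed=0):
    """Keep each event with probability min(1, target_density / local density)."""
    rng = random.Random(seed)
    densities = local_density(events, radius)
    kept = []
    for event, density in zip(events, densities):
        # Sparse events (density <= target) are always kept;
        # events in dense regions are thinned in proportion to their density.
        if rng.random() < min(1.0, target_density / density):
            kept.append(event)
    return kept


# Toy data: 200 identical events in one dense spot plus 5 isolated events.
dense = [(0.0, 0.0)] * 200
rare = [(100.0 + 10.0 * i, 0.0) for i in range(5)]
kept = density_dependent_downsample(dense + rare, radius=1.0, target_density=2)
print(len(kept), all(r in kept for r in rare))
```

All five isolated events survive, while only a small fraction of the dense spot does, which is the effect that lets rare cell types form their own clusters.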
(SPADE allows the user to set the downsampling based on a percent of total events in the input population or an absolute number of events.)
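The arithmetic of the two target types can be written as a small function that reproduces the 1000-event, 10% example above. The `events_after_downsampling` name is hypothetical, and the behavior for an absolute target larger than the file (the file is left unchanged) is an assumption, not documented Cytobank behavior.

```python
def events_after_downsampling(n_events, percent=None, absolute=None):
    """Events left in one file after downsampling (simplified sketch).

    n_events: events in the file after gate filtering for the chosen population.
    Exactly one of `percent` or `absolute` must be given.
    """
    if (percent is None) == (absolute is None):
        raise ValueError("give exactly one of percent or absolute")
    if percent is not None:
        return round(n_events * percent / 100)
    # Assumption: a file already at or below the absolute target is left as-is.
    return min(n_events, absolute)


print(events_after_downsampling(1000, percent=10))      # 100
print(events_after_downsampling(25000, absolute=5000))  # 5000
print(events_after_downsampling(1000, absolute=5000))   # 1000
```

Because the target applies per file, a percentage keeps each file's relative size, whereas an absolute target gives every sufficiently large file the same number of events.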
Scale Settings and SPADE
It is essential to set up the scales correctly before the run. The general rule is that SPADE sees what you see, based on the scale settings in the experiment. The exception is that SPADE ignores the scale min and max: if data appear piled against the edge of a plot, SPADE does not see them that way, but uses the values along their natural continuum, uninterrupted by the scale min and max. Apart from the scale type itself, the scale setting that affects SPADE is the scale argument (cofactor) of arcsinh scales. In general, if data are scaled appropriately for conventional analysis by manual gating, the scale settings are fine for SPADE analysis. Refer to the scaling support article and the blog post on how to scale cytometry data effectively for more details.
Note that log scales can cause SPADE to fail, because the logarithm of zero or of a negative number, both common values in cytometry data, is undefined. For log-like scaling that handles zero and negative values, use arcsinh scales. Learn more about scale settings.
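The difference between what a plot displays and what SPADE uses can be sketched as follows, assuming the common arcsinh definition asinh(x / cofactor). The cofactor of 5 and the display bounds are illustrative values only, and both helpers are hypothetical: values beyond the scale max pile up on the plot edge, but the transformed values themselves are not clipped.

```python
import math


def arcsinh_scale(value, cofactor):
    """Arcsinh transform, assuming the form asinh(value / cofactor)."""
    return math.asinh(value / cofactor)


def clip_to_display(value, scale_min, scale_max):
    """What a plot shows: values outside [min, max] pile up on the edge."""
    return max(scale_min, min(scale_max, value))


raw = [-50.0, 0.0, 20.0, 5000.0, 50000.0]
cofactor = 5.0                      # illustrative cofactor
display_min, display_max = -1.0, 8.0  # hypothetical scale min/max (transformed units)

transformed = [arcsinh_scale(x, cofactor) for x in raw]  # what SPADE uses
displayed = [clip_to_display(t, display_min, display_max) for t in transformed]
print([round(t, 2) for t in transformed])
print([round(d, 2) for d in displayed])
```

Changing the cofactor changes `transformed`, and therefore the SPADE input; changing the display bounds only changes `displayed`.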
(Click on Edit scales to change the scale settings prior to running a SPADE analysis.)
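The log-scale failure mode can be seen directly with the Python standard library: `math.log10` raises `ValueError` for zero and negative inputs, while an arcsinh transform is defined for every real value. The sample values and the cofactor of 5 are illustrative, and `safe_log10` is a hypothetical helper used only to record where the logarithm is undefined.

```python
import math

values = [-12.0, 0.0, 35.0]  # negative and zero values are common in cytometry data


def safe_log10(x):
    """Return log10(x), or None where the logarithm is undefined."""
    try:
        return math.log10(x)
    except ValueError:
        return None  # log10 of zero or a negative number raises ValueError


log_results = [safe_log10(v) for v in values]
asinh_results = [math.asinh(v / 5.0) for v in values]  # illustrative cofactor of 5
print(log_results)
print(asinh_results)
```

The arcsinh results stay finite and keep the sign of the input, which is why arcsinh is the recommended log-like scale.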
Compensation
As with any other analysis of fluorescence data, the appropriate compensation should be applied before running SPADE. The Cytobank platform uses the experiment-wide compensation to govern how compensation is applied to gates, illustrations, advanced analyses, and other tools, and a SPADE run uses the selected experiment-wide compensation. For files uploaded by DROP, or FCS files that have no internal compensation matrix, you can leave the default option (file-internal compensation) and no compensation will be applied. To use a different compensation, select it in the Compensation editor. See how the experiment-wide compensation works in the Cytobank platform for further information.
(Click on Edit to change the experiment-wide compensation that will be applied to your data before running a SPADE analysis.)
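For context, the sketch below shows the conventional spillover model used in flow cytometry in general, not a description of Cytobank internals: observed signals are the true signals multiplied by a spillover matrix, and compensation multiplies by its inverse. The 2x2 matrix values and signal values are illustrative, and the helpers are written out by hand to keep the example dependency-free.

```python
def invert_2x2(m):
    """Inverse of a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def row_times_matrix(row, m):
    """Multiply a row vector by a matrix (row-vector convention)."""
    return [sum(row[k] * m[k][j] for k in range(len(row)))
            for j in range(len(m[0]))]


# spillover[i][j]: fraction of dye i's signal detected in detector j (illustrative)
spillover = [[1.0, 0.15],
             [0.05, 1.0]]
true_signal = [1000.0, 200.0]

observed = row_times_matrix(true_signal, spillover)  # what the detectors record
compensated = row_times_matrix(observed, invert_2x2(spillover))
print(observed)
print(compensated)
```

Without compensation, the 150 units of spillover from the first dye would inflate the second channel and could distort clusters built on these channels.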
Include and Exclude Samples from a SPADE Run
By default, all sample files within an experiment are included in the SPADE run. If an experiment has gating groups enabled, only the files belonging to the gating group of the selected input population will be included in the SPADE run. There are three options to exclude samples from a SPADE run:
- Use the Selective Clone functionality to create different experiments that are based on the original experiment but with different sets of files.
- Create different gating groups and select the input population of the gating group with the files of interest.
- Unassign files in the Select fold-change group dialogue to exclude them from the analysis.
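The inclusion rules above can be summarized as a filter over the experiment's files. The data model (a mapping from file name to gating group), the file names, and the `files_for_spade` helper are hypothetical; `gating_group=None` stands in for an experiment without gating groups, where all files are included by default.

```python
def files_for_spade(files, gating_group=None, unassigned=()):
    """Select the files a SPADE run would use (hypothetical data model).

    files: dict mapping file name -> gating group name.
    gating_group: group of the selected input population, or None if the
        experiment has no gating groups.
    unassigned: files unassigned in the Select fold-change group dialogue.
    """
    excluded = set(unassigned)
    return [name for name, group in files.items()
            if (gating_group is None or group == gating_group)
            and name not in excluded]


files = {"d1_unstim.fcs": "Pheno Basic", "d1_stim.fcs": "Pheno Basic",
         "d2_unstim.fcs": "B cell", "d1_control.fcs": "Pheno Basic"}
print(files_for_spade(files, "Pheno Basic", unassigned=["d1_control.fcs"]))
# ['d1_unstim.fcs', 'd1_stim.fcs']
```

Selective Clone, the third option, works at a different level: it changes which files exist in the experiment in the first place.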
Adjust the Number of Events in the SPADE Analysis
Currently, neither the overall number of events in the analysis nor the number of events contributed per file can be adjusted. Consider using time gates to create populations with fewer events.
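A time gate keeps only events acquired within a chosen window, which reduces the number of events entering the analysis. The minimal sketch below represents events as dictionaries with a hypothetical `"Time"` key; in Cytobank, time gates are drawn in the Gating Editor rather than written as code.

```python
def time_gate(events, t_start, t_end):
    """Keep events whose acquisition time falls in [t_start, t_end)."""
    return [e for e in events if t_start <= e["Time"] < t_end]


events = [{"Time": t, "CD3": 1.0} for t in range(0, 1000, 10)]  # 100 toy events
subset = time_gate(events, 0, 250)
print(len(subset))  # 25
```

Since acquisition is roughly continuous, a window covering about a quarter of the run keeps about a quarter of the events, as in this evenly spaced toy data.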
For Research Use Only. Not for use in diagnostic procedures.