The Effect of FlowSOM Settings on Run Time
FlowSOM is a very fast clustering algorithm designed to analyze large cytometry datasets. It can run up to fifty times faster than SPADE, and its speed advantage grows as datasets get larger. A FlowSOM run has a number of configurable settings, and knowing how each one affects total run time is useful when you analyze a large dataset. This article provides run time data for a range of FlowSOM analyses performed in Cytobank and can serve as a reference for what to expect when you change the settings for an analysis.
The dataset used for these tests is a subset of the Bendall et al. (Science, 2011) mass cytometry dataset, which is publicly available on all Cytobank servers. Except for the setting being tested in each case, all settings were left at their default values.
FlowSOM Settings and Run Stability
The FlowSOM algorithm uses different amounts of computational resources depending on the settings of a run. Certain combinations of settings can increase the chance that a run fails because it exceeds the computing resources currently allocated to your Cytobank server (if you suspect this is happening, contact Cytobank support to request a compute upgrade). Resource use increases most substantially as the number of events, files, clusters, and channels increases, and pushing all of these values to their limits at the same time is likely to cause a failure. Of all the settings, the number of events has the largest impact on run time. The number of iterations has less effect on resource use but still has a large impact on run time.
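As a rough mental model (an assumption for illustration, not a description of Cytobank's implementation), the self-organizing map (SOM) training stage of FlowSOM compares each event with every node over every channel on each pass through the data, so its work grows with the product of event count, channel count, cluster (node) count, and iterations. The hypothetical helpers below express that estimate; they ignore file count, memory use, and the metaclustering step.

```python
# Back-of-envelope work estimate for online SOM training, the first stage of
# FlowSOM. Assumption: each event visit computes a distance to every node over
# every channel, so work scales with the product of the four settings.
# This is a rough illustration, not Cytobank's actual resource model.


def estimate_som_work(n_events: int, n_channels: int, n_nodes: int, n_iterations: int) -> int:
    """Approximate number of per-channel distance terms computed during SOM training."""
    return n_iterations * n_events * n_nodes * n_channels


def relative_cost(base: dict, **changes: int) -> float:
    """Ratio of estimated work after changing some settings, relative to a baseline."""
    new = {**base, **changes}
    return estimate_som_work(**new) / estimate_som_work(**base)


if __name__ == "__main__":
    # Hypothetical baseline: 1M events, 30 channels, 100 nodes, 10 iterations.
    baseline = {"n_events": 1_000_000, "n_channels": 30, "n_nodes": 100, "n_iterations": 10}
    print(relative_cost(baseline, n_events=2_000_000))             # 2.0
    print(relative_cost(baseline, n_channels=15))                  # 0.5
    print(relative_cost(baseline, n_nodes=400, n_iterations=20))   # 8.0
```

Real run times also include overhead (loading files, mapping events to nodes, metaclustering) that this estimate leaves out, so treat the ratios as an upper-level guide rather than a prediction.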
Number of Events - Effect on Run Time
Increasing the event count increases the run time:
(Increasing the number of events can significantly increase FlowSOM run time.)
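To see the event-count effect on your own machine, the following sketch times a minimal online SOM written in NumPy. It is loosely modeled on the SOM stage of FlowSOM (a square grid of nodes, Chebyshev distance between grid positions, a learning rate decaying linearly from 0.05 to 0.01, and a neighborhood radius shrinking to zero), but `train_som` and `time_training` are hypothetical helpers, the data are random, and the absolute timings say nothing about Cytobank's servers.

```python
import time

import numpy as np


def train_som(data: np.ndarray, n_nodes: int = 100, n_iterations: int = 10,
              alpha: tuple = (0.05, 0.01), seed: int = 0) -> np.ndarray:
    """Minimal online SOM on a square grid; returns the codebook (n_nodes x n_channels)."""
    rng = np.random.default_rng(seed)
    side = int(round(np.sqrt(n_nodes)))
    if side * side != n_nodes:
        raise ValueError("n_nodes must be a perfect square in this sketch")
    # Grid positions of the nodes and Chebyshev distances between them.
    grid = np.array([(i, j) for i in range(side) for j in range(side)])
    grid_dist = np.abs(grid[:, None, :] - grid[None, :, :]).max(axis=2)
    start_radius = np.quantile(grid_dist, 0.67)
    # Initialize node vectors from randomly chosen events.
    codes = data[rng.choice(len(data), n_nodes, replace=False)].astype(float)
    total_steps = n_iterations * len(data)
    step = 0
    for _ in range(n_iterations):                 # one iteration = one shuffled pass
        for x in data[rng.permutation(len(data))]:
            frac = step / total_steps
            lr = alpha[0] + (alpha[1] - alpha[0]) * frac   # linear decay
            radius = start_radius * (1.0 - frac)            # shrinks toward 0
            # Nearest node: cost grows with n_nodes * n_channels per event.
            bmu = np.argmin(((codes - x) ** 2).sum(axis=1))
            near = grid_dist[bmu] <= radius
            codes[near] += lr * (x - codes[near])
            step += 1
    return codes


def time_training(n_events: int, n_channels: int = 30, **som_kwargs) -> float:
    """Wall-clock seconds to train the sketch SOM on random data of the given size."""
    data = np.random.default_rng(1).normal(size=(n_events, n_channels))
    t0 = time.perf_counter()
    train_som(data, **som_kwargs)
    return time.perf_counter() - t0


if __name__ == "__main__":
    for n in (500, 1_000, 2_000):
        print(f"{n:>5} events: {time_training(n):.2f} s")
```

Because every pass visits every event once, the printed times should roughly double with each doubling of the event count. The pure-Python inner loop makes this sketch far slower than the compiled code used by the FlowSOM package, so only the trend is meaningful.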
Number of Channels - Effect on Run Time
Increasing the number of channels increases the run time:
(Increased number of channels results in increased run time - displayed stratified by event count)
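Channels enter the cost through the nearest-node search that an SOM performs for each event, both during training and when events are assigned to their final clusters. A vectorized sketch of that search (the function name `best_matching_units` is hypothetical) shows that the work is proportional to events × nodes × channels, so each added channel adds a fixed amount of work per event and node.

```python
import numpy as np


def best_matching_units(events: np.ndarray, codes: np.ndarray) -> np.ndarray:
    """Return the index of the nearest node (squared Euclidean distance) for each event.

    events: (n_events, n_channels); codes: (n_nodes, n_channels).
    The matrix product below does about n_events * n_nodes * n_channels
    multiply-adds, so the cost of this step grows linearly with channel count.
    """
    # ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2, avoiding a 3-D temporary array.
    d2 = (
        (events ** 2).sum(axis=1)[:, None]
        - 2.0 * events @ codes.T
        + (codes ** 2).sum(axis=1)[None, :]
    )
    return d2.argmin(axis=1)


if __name__ == "__main__":
    events = np.array([[0.0, 0.0], [5.0, 5.0], [0.9, 1.1]])
    codes = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
    print(best_matching_units(events, codes))  # [0 2 1]
```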
Number of Iterations - Effect on Run Time
Increasing the number of iterations increases the run time:
(Increased number of iterations results in increased run time - displayed stratified by event count)
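In an online SOM, one iteration is typically one pass over the events, so the number of single-event updates, and roughly the training time, is iterations × events. The hypothetical helper below lists the learning rate for every update, assuming a linear decay from 0.05 to 0.01 (assumed here to match the FlowSOM R package's default `alpha`); the length of the list makes that multiplication explicit.

```python
def learning_rate_schedule(n_events: int, n_iterations: int,
                           alpha_start: float = 0.05, alpha_end: float = 0.01) -> list:
    """Learning rate for each single-event update of an online SOM.

    One iteration is assumed to be one pass over all events, so the schedule
    has n_iterations * n_events entries: each entry is one unit of work.
    """
    total = n_iterations * n_events
    return [alpha_start + (alpha_end - alpha_start) * step / total
            for step in range(total)]


if __name__ == "__main__":
    for iterations in (5, 10, 20):
        print(iterations, len(learning_rate_schedule(1_000, iterations)))
    # 5 5000
    # 10 10000
    # 20 20000
```

Under this assumption, doubling the iterations doubles the number of updates. Measured run times may grow less than proportionally, because fixed costs such as loading files and metaclustering do not depend on the iteration count.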
Number of Clusters - Effect on Run Time
Increasing the number of clusters increases the run time:
(The test data has 1M events and 30 channels.)
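Assuming the cluster count corresponds to the number of SOM nodes laid out on a two-dimensional grid (as in FlowSOM, where 100 nodes would be a 10 × 10 grid), more clusters mean more nodes to search for every event, a cost that is linear in the cluster count. The grid's node-to-node distance matrix grows with the square of the cluster count, but even 400 clusters give only 160,000 entries, small next to 1M events × 30 channels. `grid_for_clusters` and `grid_distances` are hypothetical helpers for illustration.

```python
import math

import numpy as np


def grid_for_clusters(n_clusters: int) -> tuple:
    """Rows and columns of the most nearly square grid with exactly n_clusters nodes."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be positive")
    rows = math.isqrt(n_clusters)
    while n_clusters % rows:   # step down until rows divides n_clusters
        rows -= 1
    return rows, n_clusters // rows


def grid_distances(rows: int, cols: int) -> np.ndarray:
    """Chebyshev distance between every pair of grid nodes: an n x n matrix."""
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)])
    return np.abs(coords[:, None, :] - coords[None, :, :]).max(axis=2)


if __name__ == "__main__":
    for k in (25, 100, 400):
        r, c = grid_for_clusters(k)
        print(k, (r, c), grid_distances(r, c).shape)
    # 25 (5, 5) (25, 25)
    # 100 (10, 10) (100, 100)
    # 400 (20, 20) (400, 400)
```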