Performance and benchmark run times for Autogating training and inference runs – Cytobank

Background

The Cytobank automatic gating algorithm enables users to mimic the manual gating process using advanced deep learning models. Analyzing your own data with the automatic gating algorithm involves two steps: first, training a model, and second, performing inference with the trained model. This article provides run time data for model training and can serve as a reference for what to expect when you change the settings for an analysis.

There is no input size limit on the Autogating inference run.

Autogating training model max capacity and run time

Like the other machine learning algorithms the Cytobank platform offers, the automatic gating algorithm may fail to train a model if the input data is too large for the computing resources allocated to your Cytobank server. When training an automatic gating model, the number of channels used for gating, the number of events, and the number of populations all affect the run time. Reducing any of these factors can significantly lower the chance of failure during model training.

The results below come from our test on a dataset of 246 files with a total of 37 M events. We titrated the runs by the number of gating channels and the number of events to define three zones: the green zone (successful run), the yellow zone (warning), and the red zone (exceeds memory limits).

(Autogating model training limit based on the number of gating channels and number of events)

1) If the combination of the # of events and # of gating channels falls in the green zone, you should be able to train the model successfully.

2) If the combination of the # of events and # of gating channels falls in the yellow warning zone, a warning message will indicate that the run may fail due to memory limits. You may reduce the number of events and/or populations, or attempt the run anyway.

3) If the combination of the # of events and # of gating channels falls in the red zone, the algorithm will tell you the max # of events that you can run with the selected # of populations. You will need to reduce your selection as advised before you can launch the run.
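The three-zone logic above can be sketched in a few lines of Python. This is only an illustration, not Cytobank's implementation: the function name and the two thresholds (`warn_limit` and `max_limit`) are hypothetical, and the actual event limits for a given number of gating channels must be read from the chart above.

```python
def classify_training_run(n_events: int, warn_limit: int, max_limit: int) -> str:
    """Return 'green', 'yellow', or 'red' for a planned training run.

    warn_limit and max_limit are hypothetical event-count thresholds for the
    selected number of gating channels. They are NOT published Cytobank
    values; take the real limits from the zone chart.
    """
    if warn_limit > max_limit:
        raise ValueError("warn_limit must not exceed max_limit")
    if n_events <= warn_limit:
        return "green"   # run is expected to succeed
    if n_events <= max_limit:
        return "yellow"  # warning shown; the run may fail on memory
    return "red"         # run is blocked until the selection is reduced


# Placeholder thresholds, chosen only to demonstrate the three outcomes.
for events in (1_000_000, 6_000_000, 20_000_000):
    print(events, classify_training_run(events, warn_limit=5_000_000, max_limit=10_000_000))
```

Because the limits depend on the number of gating channels, a real check would look up a different pair of thresholds for each channel count.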

We also titrated the model training run time by the # of populations selected and the data size. In general, the more populations and the more events used for training, the longer model training takes. Overall, the model training algorithm is efficient and completes runs quickly.

(Autogating model training run times titrated by number of populations and number of events; data sizes of 200k and 1M events were used in the test)

(Autogating model training run times titrated by number of populations and number of events; data sizes of 4M, 7.5M and 10M events were used in the test)

Training magnification is another setting that affects the run time. Increasing the training magnification multiplies the run time by the same factor. For example, if you set the training magnification to 3 instead of the default of 1, the run will take 3 times longer.
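The scaling described above is simple arithmetic and can be sketched as follows. The function name and the 10-minute baseline are hypothetical, and the sketch assumes run time scales exactly linearly with training magnification, as the example in this article suggests.

```python
def scaled_runtime(base_minutes: float, training_magnification: int = 1) -> float:
    """Estimate run time, assuming it scales linearly with training magnification.

    base_minutes is the run time measured at the default magnification of 1.
    """
    if training_magnification < 1:
        raise ValueError("training_magnification must be at least 1")
    return base_minutes * training_magnification


# Hypothetical baseline: a run that takes 10 minutes at magnification 1.
print(scaled_runtime(10.0, 3))  # 30.0
```

To estimate a run at a higher magnification, use the run time from the charts for your number of populations and events as the baseline.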