Setting up a successful PeacoQC process involves the following steps:
Navigate to a newly uploaded experiment or an existing experiment in your Cytobank account and click DataQC on the blue navigation bar, then enter a name for this PeacoQC process.
Choose all the FCS files that will be used for PeacoQC processing by clicking Choose at the upper corner or by clicking inside the box region. Only files with a Time channel can be selected to run the PeacoQC process, and only files under 500MB in size can be selected. If one of your files exceeds 500MB, consider using a slice of time or pre-gating to a subset to reduce the file size. Files with more than 8 million events or fewer than 1500 events also cannot be run. Please refer to the Benchmark for PeacoQC process run capacity for more details.
Choose any fluorescent or metal channels to be used for PeacoQC processing by clicking Choose or by clicking anywhere inside the channel box. Note that the Time channel is selected automatically and cannot be deselected. At least two channels must be selected in order to run the PeacoQC process.
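The file selection limits above can be checked ahead of an upload with a small script. The following is a minimal sketch, not part of the Cytobank platform: the `FcsFileInfo` class and the function name are hypothetical, the file summary is assumed to have been gathered by your own tooling, and "500MB" is assumed to mean 500 × 1024 × 1024 bytes.

```python
from dataclasses import dataclass

# Limits taken from the text above; the byte interpretation of "500MB" is an assumption.
MAX_FILE_BYTES = 500 * 1024 * 1024
MIN_EVENTS = 1_500
MAX_EVENTS = 8_000_000


@dataclass
class FcsFileInfo:
    """Hypothetical summary of one FCS file (not a Cytobank API object)."""
    name: str
    size_bytes: int
    event_count: int
    channel_names: list


def peacoqc_eligibility_problems(info):
    """Return the reasons a file cannot be selected (empty list if eligible)."""
    problems = []
    if not any(ch.strip().lower() == "time" for ch in info.channel_names):
        problems.append("no Time channel")
    if info.size_bytes >= MAX_FILE_BYTES:
        problems.append("file is 500MB or larger")
    if info.event_count > MAX_EVENTS:
        problems.append("more than 8 million events")
    if info.event_count < MIN_EVENTS:
        problems.append("fewer than 1500 events")
    return problems


# Example: a small file without a Time channel fails two checks.
example = FcsFileInfo("sample.fcs", 20 * 1024 * 1024, 900, ["FSC-A", "SSC-A"])
print(peacoqc_eligibility_problems(example))
```

A file that fails only the size check is a candidate for time slicing or pre-gating, as described above.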
If you are working with flow cytometry data, you may choose to enable Remove margins. For mass cytometry data, there is no need to remove margins and the option will be disabled.
Events that fall below the minimum or above the maximum of the scale range are called 'margin events'. Remove margins removes events outside the scale range of the selected channels. There are two options: use the experiment scales set in the Cytobank platform, or use the file-internal scales. With experiment scales, margins are removed based on the minimum and maximum of the scales defined in the Cytobank platform for each selected channel, so the same scales are applied identically to every file. With file-internal scales, the scales are based on the Range ($PnR) values in each individual FCS file.
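As an illustration of the margin removal described above, the sketch below keeps only events whose values lie inside the scale range of every selected channel. It is a simplified model rather than the platform's implementation: the function name, the channel name `FL1-A`, and the example ranges are made up, and keeping events that sit exactly on a boundary is an assumption.

```python
def remove_margin_events(events, scale_ranges):
    """Keep events that lie inside [minimum, maximum] for every listed channel.

    events: list of dicts mapping channel name -> measured value
    scale_ranges: dict mapping channel name -> (minimum, maximum)
    """
    kept = []
    for event in events:
        inside = all(low <= event[channel] <= high
                     for channel, (low, high) in scale_ranges.items())
        if inside:
            kept.append(event)
    return kept


# Hypothetical values for a single channel.
file_events = [{"FL1-A": 500.0}, {"FL1-A": 262200.0}, {"FL1-A": -150.0}]

# Experiment scales: one range per channel, shared by every file.
experiment_scales = {"FL1-A": (-100.0, 262144.0)}

# File-internal scales: a range derived from this file's own $PnR value
# (the lower bound of 0 is an assumption made for this example).
file_internal_scales = {"FL1-A": (0.0, 262144.0)}

print(remove_margin_events(file_events, experiment_scales))
print(remove_margin_events(file_events, file_internal_scales))
```

With experiment scales the same `scale_ranges` dictionary would be reused for every file, while with file-internal scales a separate dictionary would be built for each file.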
Anomaly detection method
There are three options to filter and remove anomalous events: IT (isolation tree) position, MAD (median absolute deviation) distance, or both. Both is the recommended option, in which case the algorithm applies the IT and MAD methods together.
- IT (isolation tree)
The first filtering step uses an isolation tree (IT), a method based on the Isolation Forest anomaly detection algorithm, which isolates every point in the data and classifies a point as an outlier if it is easily separable.
- MAD (median absolute deviation)
Since the isolation tree (IT) sometimes fails to find subtle changes in the peak ranges, a second filtering step is applied. In this filtering step, the noise is first removed by applying a smoothing spline to all peak ranges individually. Then, a bin is filtered out when its smoothed value is farther than the default of six median absolute deviations (MADs) away from the median of the entire smoothed peak range for any of the peaks.
The number of MADs (median absolute deviations) allowed is evaluated after the isolation tree (IT) selection, to identify strong effects that are present only in individual channels and might be missed in the high-dimensional space. The default is 6. The lower the number of MADs allowed, the stricter the algorithm and the more cells are removed.
IT Limit parameter
The gain limit of the isolation tree indicates how strongly outlier data points influence the standard deviation evaluated when building the Isolation Forest tree. The default IT_limit is 0.6. Lowering this limit makes the algorithm stricter, so outliers are removed sooner.
The PeacoQC process works by splitting the events in each file into bins, which are sets of events that are contiguous along the Time channel. The PeacoQC process then decides for each bin whether or not to mark the whole bin as an anomaly and remove it. The process automatically selects an appropriate number of bins, and the maximum number of bins parameter sets the upper limit. If this parameter is lowered, larger bins (more events per bin) may be made; if it is raised, smaller bins may be made. The default value is 500 and it can range from 40 to 1,000,000. The default should work well for almost all use cases, so we do not recommend adjusting it.
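To make the relationship between the maximum number of bins and the bin size concrete, here is a simplified sketch. It does not reproduce how the PeacoQC process actually chooses the number of bins; it assumes the smallest equal bin size that keeps the bin count at or below the maximum, and the function name is hypothetical.

```python
import math


def split_into_time_bins(n_events, max_bins=500):
    """Return (start, end) index ranges of bins over time-ordered events.

    Simplification: uses the smallest equal bin size for which the number
    of bins does not exceed max_bins.
    """
    events_per_bin = max(1, math.ceil(n_events / max_bins))
    return [(start, min(start + events_per_bin, n_events))
            for start in range(0, n_events, events_per_bin)]


# 10,000 events: a lower maximum gives fewer, larger bins.
default_bins = split_into_time_bins(10_000)            # 500 bins of 20 events
fewer_bins = split_into_time_bins(10_000, max_bins=100)  # 100 bins of 100 events
print(len(default_bins), default_bins[0])
print(len(fewer_bins), fewer_bins[0])
```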
To avoid small regions being kept while the bins around them have been filtered out, any remaining region of N or fewer consecutive bins is also removed (where N is the consecutive bins parameter, which defaults to 5). Such regions sometimes occur when the algorithm is not stringent enough and a bin in a noisy region appears quite similar to the stable region in another part of the measurement. Increasing this parameter raises the length of the regions that are removed, making the algorithm stricter.
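The consecutive bins rule can be illustrated as follows. This is a sketch under assumptions rather than the actual implementation: the function name is hypothetical, each bin is represented by a boolean (True if it passed the earlier filters), and runs at the very start or end of the file are treated the same as runs in the middle.

```python
def remove_short_kept_regions(keep, consecutive_bins=5):
    """Drop every run of kept bins whose length is consecutive_bins or less.

    keep: list of booleans, True if the bin passed the earlier filters.
    Returns a new list; the input is not modified.
    """
    result = list(keep)
    index = 0
    total = len(keep)
    while index < total:
        if not keep[index]:
            index += 1
            continue
        # Find the end of this run of kept bins.
        end = index
        while end < total and keep[end]:
            end += 1
        if end - index <= consecutive_bins:
            for position in range(index, end):
                result[position] = False
        index = end
    return result


# 10 good bins, 3 removed, 3 good (an isolated island), 2 removed, 8 good.
keep = [True] * 10 + [False] * 3 + [True] * 3 + [False] * 2 + [True] * 8
cleaned = remove_short_kept_regions(keep)
print(sum(keep), sum(cleaned))  # 21 18
```

Raising `consecutive_bins` to 8 in this example would also remove the final region of 8 bins, which is the stricter behavior described above.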
You will need to inspect and set the scale settings for any channels that will be used for the PeacoQC process and, if appropriate, compensate or unmix the FCS files prior to running the PeacoQC process.
Data scaling: The PeacoQC process scales data according to the scale settings of the experiment. If you have chosen the file-internal scales as your remove margins option, the PeacoQC process will use the minimum and maximum values from the $PnR of each file, together with the scale argument set in the experiment, to transform the scale. You can refer to View and adjust scale type, maximum, minimum, and scale argument for a channel.
Compensation: The experiment-wide compensation will be used. You can refer to How the experiment-wide compensation works in the Cytobank platform.
Please refer to the original paper for more details.
Emmaneel A, Quintelier K, Sichien D, Rybakowska P, Marañón C, Alarcón-Riquelme ME, Van Isterdael G, Van Gassen S, Saeys Y. PeacoQC: Peak-based selection of high quality cytometry data. Cytometry A. 2022 Apr;101(4):325-338. doi: 10.1002/cyto.a.24501. Epub 2021 Oct 3. PMID: 34549881; PMCID: PMC9293479.