## Welcome to scPower



### A statistical framework for design and power analysis of multi-sample single cell transcriptomics experiments

The tool helps users set the experimental parameters for cell-type-specific inter-individual differential expression (DE) and expression quantitative trait locus (eQTL) analyses using single cell RNA-seq data.

Experimental designs are suggested so that the power of the experiment is maximized.

scPower offers optimization for two different experimental settings:

- **Detect cell types** (referred to as "cell type detection probability" in the figure above)
- **Detect DE/eQTL genes** (referred to as "Overall detection power")

#### Detect DE/eQTL genes

In this section you can find the parameter combination that maximizes the detection power for DE/eQTL genes. The **main plot** on the right side shows the **detection power** for different parameter combinations. You can choose two of the three cost-determining factors (sample size, cells per person, read depth) to display on the x- and y-axis. Because the budget is fixed, the third factor is determined by the other two and is displayed as the circle size.
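Because the budget is fixed, choosing two of the three factors determines the third. The sketch below illustrates this for the sample size, assuming a hypothetical cost model in which the budget pays for library preparation lanes (each holding a fixed number of cells) and sequencing flow cells (each yielding a fixed number of reads). All names, capacities and prices (`cells_per_lane`, `cost_per_lane`, `reads_per_flowcell`, `cost_per_flowcell`) are illustrative placeholders, not scPower's defaults or its actual cost function.

```python
import math


def total_cost(n_samples, cells_per_person, read_depth,
               cells_per_lane=20_000, cost_per_lane=5_000.0,
               reads_per_flowcell=4e9, cost_per_flowcell=15_000.0):
    """Cost of a design under a hypothetical lane + flow cell cost model.

    All capacities and prices are placeholders, not scPower defaults.
    """
    total_cells = n_samples * cells_per_person
    lanes = math.ceil(total_cells / cells_per_lane)
    flowcells = math.ceil(total_cells * read_depth / reads_per_flowcell)
    return lanes * cost_per_lane + flowcells * cost_per_flowcell


def max_sample_size(budget, cells_per_person, read_depth, **cost_kwargs):
    """Largest sample size whose cost fits the budget (0 if none does)."""
    n = 0
    # Cost never decreases as n grows, so stop at the first unaffordable n.
    while total_cost(n + 1, cells_per_person, read_depth, **cost_kwargs) <= budget:
        n += 1
    return n


print(max_sample_size(50_000, cells_per_person=5_000, read_depth=50_000))  # 16
```

With these placeholder values, a budget of 50,000 affords 16 individuals at 5,000 cells per individual and 50,000 reads per cell; the 17th individual would require an extra lane and an extra flow cell.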

Depending on the overall budget, not all parameter combinations are possible, and some spots in the grid stay white. An arrow labeled "selected study" points to the study with the highest detection power, and the two plots below show the power curves for this study. Clicking on the main plot moves the arrow to any other parameter combination.
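Conceptually, picking the "selected study" is a search over the grid that skips unaffordable combinations (the white spots) and keeps the one with the highest detection power. A minimal sketch, where `is_affordable` and `detection_power` are hypothetical callables standing in for scPower's budget and power calculations:

```python
from typing import Callable, Iterable, Optional, Tuple

Params = Tuple[float, ...]


def select_study(
    grid: Iterable[Params],
    is_affordable: Callable[..., bool],
    detection_power: Callable[..., float],
) -> Tuple[Optional[Params], Optional[float]]:
    """Return the affordable grid point with the highest detection power."""
    best_params: Optional[Params] = None
    best_power: Optional[float] = None
    for params in grid:
        if not is_affordable(*params):
            continue  # not feasible within the budget: stays white in the grid
        power = detection_power(*params)
        if best_power is None or power > best_power:
            best_params, best_power = params, power
    return best_params, best_power


# Toy usage: (sample size, cells per person) pairs with made-up budget and power functions.
grid = [(n, c) for n in (10, 20, 40) for c in (1_000, 2_000, 4_000)]
best = select_study(
    grid,
    is_affordable=lambda n, c: n * c <= 80_000,
    detection_power=lambda n, c: min(n, c / 100) / 40,
)
print(best)  # ((20, 2000), 0.5)
```

Ties are resolved in favor of the first grid point encountered; a real tool might prefer a different tie-break rule.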

The detection power is the product of the **expression probability** and the **DE/eQTL power**. The expression probability is the probability that the DE/eQTL genes are expressed, while the DE/eQTL power is the probability of detecting these genes as significant, given that they are expressed.

The **two lower plots** show the influence of the parameters on each of these probabilities. The left plot shows the influence of the x-axis parameter while the y-axis parameter is kept constant (at the value of the selected study). The right plot shows the same for the y-axis parameter.
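As a numerical illustration of this product, the sketch below multiplies each gene's expression probability by its DE/eQTL power and averages over the DE/eQTL genes. Treating the overall value as a per-gene average is our assumption for illustration, and the probabilities are made-up inputs, not scPower output.

```python
from typing import Sequence


def overall_detection_power(expr_probs: Sequence[float], de_powers: Sequence[float]) -> float:
    """Average over genes of P(expressed) * P(significant | expressed)."""
    if len(expr_probs) != len(de_powers) or not expr_probs:
        raise ValueError("need one expression probability and one power per gene")
    per_gene = [e * d for e, d in zip(expr_probs, de_powers)]
    return sum(per_gene) / len(per_gene)


# Two hypothetical DE genes: one always expressed, one expressed half of the time.
print(overall_detection_power([1.0, 0.5], [0.8, 0.4]))  # 0.5
```

A design that raises the DE/eQTL power but lowers the expression probability (for example, fewer reads per cell) can therefore end up with a lower overall detection power.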

The power analysis can be tailored to the user's experimental setup through many different parameters.

If some parameters are unknown, the user can fall back on the defaults we provide.

The parameters are divided into different categories:

**General parameters**

- **Multiple testing correction**: Both the p-value cutoff and the multiple testing strategy can be chosen. We recommend FWER adjustment for eQTL studies and FDR adjustment for DE studies.
- **Mapping and multiplet estimation**: The more cells are loaded on a lane, the more multiplets are produced. These need to be discarded before the analysis. Furthermore, since multiplets take up a higher fraction of the reads than singlets, higher multiplet rates also reduce the effective read depth per singlet.
- **Expression cutoffs**: A gene is defined as expressed if it has more than a certain number of UMI counts in more than a certain fraction of individuals (both thresholds can be set). This influences the expression probability.

**Special parameters**

The method of power calculation can be changed to speed up the calculation or to increase its accuracy (especially important for eQTL power calculation).
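To make the two recommended strategies concrete: FWER control is commonly done with the Bonferroni correction and FDR control with the Benjamini-Hochberg procedure. The sketch below applies both to a list of p-values. It illustrates the standard procedures only and does not reproduce how scPower incorporates the chosen correction into its power calculation.

```python
from typing import List, Sequence


def bonferroni_reject(pvalues: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """FWER control: reject p-values at or below alpha / m."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg_reject(pvalues: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """FDR control: reject the k smallest p-values, where k is the largest
    rank with p_(k) <= k / m * alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


pvals = [0.01, 0.02, 0.03, 0.5]
print(bonferroni_reject(pvals))          # [True, False, False, False]
print(benjamini_hochberg_reject(pvals))  # [True, True, True, False]
```

On the same p-values, FDR control rejects more hypotheses than FWER control. Because FDR control is less conservative, a design evaluated under FDR will typically show a higher power than the same design under FWER.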

#### Detect cell types

This section determines the power to detect a sufficient number of cells of a cell type of interest in each individual. This matters because a cell-type-specific DE or eQTL analysis is only possible if enough cells of this cell type are detected. The method calculates the minimal number of cells per individual needed to reach a given power threshold.

#### References

A detailed description of the complete model can be found in our publication: Schmid, K. T. et al. scPower accelerates and optimizes the design of multi-sample single cell transcriptomic studies. Nature Communications (2021).

All code, including an offline version of this website built with R Shiny, is available as an R package on GitHub.

With the R package, users can also fit and incorporate their own priors for expression probabilities and effect sizes. For runtime reasons, this is not possible on the web server.

The package contains a detailed introductory vignette explaining all steps necessary to include custom priors.


### Detection power depending on design parameters

Detection power depending on **cells per individual**, **read depth** and **sample size**. Display two of these three parameters on the x- and y-axis by selecting them under **'Parameter grid'**; the third one is displayed as **circle size**.

Click the **Calculate optimal study** button to update the plots with the current set of parameters. **Click** on a specific point in the plot to visualize its exact trace in the plots below.

### Influence of design parameters on individual power components

The overall detection power is the product of the **expression probability** (probability that the DE/eQTL genes are expressed) and the **DE power** (probability that the DE/eQTL genes are found significant). The plots below visualize how the design choices influence these power components.

The plots show the influence of the y-axis (left) and x-axis (right) parameter of the upper plot on the power of the selected study, while the other parameter is kept constant.

The dashed lines show the location of the **selected study**.

### Required cells per person to detect rare cell types with a certain power

The figure shows the required number of cells per individual (y-axis, log scale) to detect the minimal number of cells from a target cell type per individual (x-axis) with a certain probability. The power depends on the total number of individuals and the frequency of the target cell type. Note that the required number of cells per sample only counts correctly measured cells (no doublets etc.), so the number is a lower bound on the number of cells that need to be sequenced.
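The calculation described above can be sketched as follows, assuming that the number of target cells in an individual follows a binomial distribution with the cell type frequency as success probability, and that individuals are independent. The last step inflates the lower bound by a hypothetical multiplet fraction to estimate how many cells to sequence. The function names and the `multiplet_fraction` parameter are illustrative assumptions, not part of scPower's interface.

```python
import math


def prob_at_least(k: int, n_cells: int, freq: float) -> float:
    """P(X >= k) for X ~ Binomial(n_cells, freq), assuming 0 < freq < 1."""
    if k <= 0:
        return 1.0
    if n_cells < k:
        return 0.0
    log_f, log_q = math.log(freq), math.log1p(-freq)
    below = 0.0
    for i in range(k):
        # Binomial pmf evaluated in log space to avoid underflow for large n_cells.
        log_pmf = (math.lgamma(n_cells + 1) - math.lgamma(i + 1)
                   - math.lgamma(n_cells - i + 1) + i * log_f + (n_cells - i) * log_q)
        below += math.exp(log_pmf)
    return max(0.0, 1.0 - below)


def detection_prob(cells_per_individual: int, min_cells: int, freq: float,
                   n_individuals: int) -> float:
    """Probability that every individual yields at least `min_cells` target cells."""
    return prob_at_least(min_cells, cells_per_individual, freq) ** n_individuals


def min_cells_per_individual(min_cells: int, freq: float, n_individuals: int,
                             power: float = 0.95) -> int:
    """Smallest number of correctly measured cells per individual reaching `power`."""
    if not (0.0 < freq < 1.0 and 0.0 < power < 1.0):
        raise ValueError("freq and power must lie strictly between 0 and 1")

    def reaches(c: int) -> bool:
        return detection_prob(c, min_cells, freq, n_individuals) >= power

    hi = max(min_cells, 1)
    while not reaches(hi):
        hi *= 2  # detection probability grows with c, so doubling eventually succeeds
    lo = min_cells
    while lo < hi:  # binary search for the first c that reaches the target power
        mid = (lo + hi) // 2
        if reaches(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


def cells_to_sequence(usable_cells: int, multiplet_fraction: float) -> int:
    """Inflate a lower bound of usable cells by a hypothetical multiplet fraction."""
    if not 0.0 <= multiplet_fraction < 1.0:
        raise ValueError("multiplet_fraction must be in [0, 1)")
    return math.ceil(usable_cells / (1.0 - multiplet_fraction))


# Toy example: at least 10 cells of a type with 1% frequency in each of 50 individuals.
needed = min_cells_per_individual(min_cells=10, freq=0.01, n_individuals=50)
print(needed, cells_to_sequence(needed, multiplet_fraction=0.05))
```

Raising the number of individuals raises the requirement per individual, because every one of them has to reach the minimal cell count.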