If we set an FDR threshold of 0.1%, this approach identifies around 1300 highly
variable genes.
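The FDR threshold is applied to multiple-testing-corrected p-values from the variance model. The base-R sketch below shows only that step. It assumes a hypothetical per-gene table `var_table` that holds one raw p-value per gene for the test of a positive biological variance component. It applies a Benjamini-Hochberg correction with `p.adjust()` and keeps genes below 0.1% FDR. The data are simulated, so the gene count says nothing about the ~1300 genes reported above.

```r
# Hypothetical stand-in for a per-gene variance-modelling table: one raw p-value
# per gene for the test that its biological variance component exceeds zero.
set.seed(1)
n_genes <- 5000
var_table <- data.frame(
  gene    = paste0("gene", seq_len(n_genes)),
  p.value = runif(n_genes),                 # null genes: uniform p-values
  stringsAsFactors = FALSE
)
var_table$p.value[1:200] <- 1e-8            # 200 simulated "truly variable" genes

# Benjamini-Hochberg adjustment, then an FDR threshold of 0.1% (= 0.001)
var_table$FDR <- p.adjust(var_table$p.value, method = "BH")
hvg <- var_table$gene[var_table$FDR < 0.001]
length(hvg)
```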
The output of this variance modelling can be passed to `scran`'s `denoisePCA()`
function to compute "denoised" principal components for clustering and other
downstream analyses (details are not shown here; see the `simpleSingleCell`
workflow).
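`denoisePCA()` itself is not reproduced here. As a rough illustration of its underlying idea, a method could keep the leading principal components and discard trailing ones until the discarded variance roughly matches the total technical variance from the variance model. The base-R sketch below applies this to simulated data. It assumes a known technical variance of 1 per gene (`tech_var_per_gene`, a hypothetical stand-in for the variance model's technical component). It is a simplified stand-in, not `scran`'s implementation.

```r
# Simplified illustration of the idea behind denoisePCA() (NOT scran's code):
# drop trailing PCs while the variance they carry stays within the technical noise.
set.seed(42)
n_cells <- 300; n_genes <- 100
group  <- rep(c(0, 1), each = n_cells / 2)          # two hypothetical cell groups
signal <- outer(group, c(rep(3, 20), rep(0, n_genes - 20)))  # groups differ in 20 genes
expr   <- signal + matrix(rnorm(n_cells * n_genes), n_cells, n_genes)  # + unit noise

tech_var_per_gene <- rep(1, n_genes)   # assumed known from the variance model
total_tech <- sum(tech_var_per_gene)

pca    <- prcomp(expr, center = TRUE, scale. = FALSE)
pc_var <- pca$sdev^2                   # variance explained by each PC

# Smallest number of PCs k such that the variance in PCs k+1, k+2, ...
# is no larger than the total technical variance
n_keep <- NA
for (k in seq_along(pc_var)) {
  if (sum(pc_var[-seq_len(k)]) <= total_tech) {
    n_keep <- k
    break
  }
}
n_keep
```

With a single simulated group difference, this rule should retain only one or two components.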
#### High Dropout Genes
An alternative to finding HVGs is to identify genes with unexpectedly high numbers of zeros.
The frequency of zeros, known as the "dropout rate", is very closely related to expression level
in scRNASeq data. Zeros are the dominant feature of single-cell RNASeq data, typically accounting
for over half of the entries in the final expression matrix. These zeros predominantly result
from the failure of mRNAs to be reverse transcribed [(Andrews and Hemberg, 2016)](http://www.biorxiv.org/content/early/2017/05/25/065094). Reverse transcription
is an enzyme reaction and can thus be modelled using the Michaelis-Menten equation:

$$P_{dropout} = 1 - \frac{S}{K + S}$$

where $S$ is the mRNA concentration in the cell (estimated here by the gene's average expression)
and $K$ is the Michaelis-Menten constant.
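To make the dropout model concrete, the base-R sketch below computes each gene's dropout rate (fraction of zero counts) and mean expression from a count matrix. It then fits the Michaelis-Menten constant `K` of the curve `1 - S / (K + S)` by plain least squares with `optimize()`. This is not M3Drop's own fitting procedure. The count matrix is simulated from a Poisson distribution, whose zero fraction follows `exp(-S)` rather than the Michaelis-Menten curve, so the sketch only illustrates the mechanics.

```r
# Hypothetical genes x cells count matrix with expression spanning 4 orders of magnitude
set.seed(7)
n_genes <- 2000; n_cells <- 200
gene_means <- 10^runif(n_genes, -2, 2)          # mean counts from 0.01 to 100
counts <- matrix(rpois(n_genes * n_cells, rep(gene_means, times = n_cells)),
                 nrow = n_genes)                # column-major fill: row i has mean gene_means[i]

dropout_rate <- rowMeans(counts == 0)           # fraction of cells with a zero
S <- rowMeans(counts)                           # mean expression, a proxy for S

# Michaelis-Menten dropout curve: P(dropout) = 1 - S / (K + S)
mm_dropout <- function(S, K) 1 - S / (K + S)

# Least-squares fit of K, searched on a log10 scale between 1e-3 and 1e3
sse <- function(logK) sum((dropout_rate - mm_dropout(S, 10^logK))^2)
K_hat <- 10^optimize(sse, interval = c(-3, 3))$minimum
K_hat
```

Genes that sit well above the fitted curve, with more zeros than expected at their expression level, are the candidates this approach looks for.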
Plot the expression of the features for each of the other methods. Which appear
to be differentially expressed? How consistent are the different methods for
this dataset?
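```{r, fig.width = 7, fig.height = 10}
library(M3Drop)
# Heatmap of the genes in DANB_genes across all cells,
# with cells annotated by their cell-type labels
M3DropExpressionHeatmap(
    DANB_genes,
    expr_matrix,
    cell_labels = celltype_labs
)
```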
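One way to quantify how consistent the methods are is to compare the gene sets they select. The sketch below uses pairwise Jaccard indices (shared genes divided by all genes in either set). The three gene lists are hypothetical placeholders. In practice they would be replaced by the features chosen by each method above.

```r
# Hypothetical placeholder gene lists, one per feature-selection method
gene_sets <- list(
  HVG    = c("geneA", "geneB", "geneC", "geneD", "geneE"),
  M3Drop = c("geneC", "geneD", "geneE", "geneF"),
  DANB   = c("geneD", "geneE", "geneF", "geneG")
)

# Jaccard index: |A intersect B| / |A union B|, from 0 (disjoint) to 1 (identical)
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

idx <- seq_along(gene_sets)
overlap <- outer(idx, idx,
                 Vectorize(function(i, j) jaccard(gene_sets[[i]], gene_sets[[j]])))
dimnames(overlap) <- list(names(gene_sets), names(gene_sets))
round(overlap, 2)
```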