Commit 877ce5d6 authored by Ruqian Lyu's avatar Ruqian Lyu
Browse files

update readme and public/htmls

title: "BIOS_sctransform-normalisation"
author: "Ruqian Lyu"
date: "11/30/2020"
bibliography: ../bios.bib
output: html_document
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE,dpi=100)
## sctransform
Different from the `size_factor`-based normalisation method that we used before,
`sctransform` is based on probabilistic models for normalisation and variance
stablisation of UMI-count data from scRNAseq. Instead of using a constant factor
for normalising all genes for one cell, `sctransform` [@Hafemeister2019-zh]
models and scales each gene individually.
`sctransform` fits a generalised linear model with a negative binomial error model
for each gene with the sequencing depth (library size) as a covariate followed by
a regularisation procedure to control overfitting. The residuals from the
regularised regression model is then treated as normalised expression levels
with variation caused by sequencing depth removed.
sce.pbmc <- readRDS(file = "raw_data/sce_pbmc.rds")
We use the `log10_umi` as the latent variable that will be regressed out in the
normalised gene expression values.
```{r runsct,message=F}
colData(sce.pbmc)$log10_umi <- log(colData(sce.pbmc)$total,base=10)
pbmc.sctrans <- suppressWarnings(sctransform::vst(assay(sce.pbmc,"counts"),
cell_attr = colData(sce.pbmc),
latent_var = "log10_umi", verbosity = FALSE))
## Warning related issue:
## Add to assay field
`pbmc.sctrans` stores returned value from running variance stablisation using
sctransform. The normalised values are stored in the matrix `y`. We now add `y`
to the `sce` object in the assay field with asaay name "SCT". This is equivalant
to the `logcounts` assay.
## Less genes will be returned by sctransform which filtered out genes that are
## only detected in 5 or less cells.
sce.pbmc <- sce.pbmc[rownames(pbmc.sctrans$y),]
assay(sce.pbmc,"SCT") <- pbmc.sctrans$y
## Select highly variable genes
Feature selection after `sctransform` normalisation is straightforward. We can
just select the top genes that have a high residual variance which contribute
the most biological sources of variation.
head(round(pbmc.sctrans$gene_attr[order(-pbmc.sctrans$gene_attr$residual_variance), ], 2),
select 3,000 highly variable genes for downstream analysis
hvgs_3k <- rownames(round(pbmc.sctrans$gene_attr[order(-pbmc.sctrans$gene_attr$residual_variance), ], 2),
## runPCA with HVGs
Next, we runPCA with the selected number of highly variable genes.
```{r PCA}
sce.pbmc <- runPCA(sce.pbmc,exprs_values="SCT",ncomponents=10,
Generate TSNE plot using PCs
```{r runtsne}
sce.pbmc <- runTSNE(sce.pbmc, dimred="PCA")
sce.pbmc <- runUMAP(sce.pbmc, dimred="PCA")
## Clustering
The remaining steps are similar to those presented in the main workflow.
```{r }
g <- buildSNNGraph(sce.pbmc, k=35, use.dimred = 'PCA')
clust <- igraph::cluster_walktrap(g)$membership
colLabels(sce.pbmc) <- factor(clust)
## Plot Clusters
plotTSNE(sce.pbmc, colour_by="label")
plotUMAP(sce.pbmc, colour_by="label")
plotExpression(sce.pbmc, features=c("CD14", "CD68",
"MNDA", "FCGR3A"), x="label", colour_by="label",exprs_values = "SCT")
## More info
`sctransform` is also integrated and interfaced with `Seurat` package which you
can find more information here:
## References
## SessionInfo
This diff is collapsed.
This diff is collapsed.
Supports Markdown
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment