Commit 877ce5d6 authored by Ruqian Lyu's avatar Ruqian Lyu
Browse files

update readme and public/htmls

parents
---
title: "BIOS_sctransform-normalisation"
author: "Ruqian Lyu"
date: "11/30/2020"
bibliography: ../bios.bib
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE,dpi=100)
```
## sctransform
Different from the `size_factor`-based normalisation method that we used before,
`sctransform` is based on probabilistic models for normalisation and variance
stablisation of UMI-count data from scRNAseq. Instead of using a constant factor
for normalising all genes for one cell, `sctransform` [@Hafemeister2019-zh]
models and scales each gene individually.
`sctransform` fits a generalised linear model with a negative binomial error model
for each gene with the sequencing depth (library size) as a covariate followed by
a regularisation procedure to control overfitting. The residuals from the
regularised regression model is then treated as normalised expression levels
with variation caused by sequencing depth removed.
```{r}
suppressPackageStartupMessages({
library(sctransform)
library(ggplot2)
library(scater)
library(scran)
}
)
sce.pbmc <- readRDS(file = "raw_data/sce_pbmc.rds")
```
We use the `log10_umi` as the latent variable that will be regressed out in the
normalised gene expression values.
```{r runsct,message=F}
set.seed(10000)
colData(sce.pbmc)$log10_umi <- log(colData(sce.pbmc)$total,base=10)
pbmc.sctrans <- suppressWarnings(sctransform::vst(assay(sce.pbmc,"counts"),
cell_attr = colData(sce.pbmc),
latent_var = "log10_umi", verbosity = FALSE))
## Warning related issue: https://github.com/ChristophH/sctransform/issues/25
```
```{r}
#str(pbmc.sctrans)
```
## Add to assay field
`pbmc.sctrans` stores returned value from running variance stablisation using
sctransform. The normalised values are stored in the matrix `y`. We now add `y`
to the `sce` object in the assay field with asaay name "SCT". This is equivalant
to the `logcounts` assay.
```{r}
## Less genes will be returned by sctransform which filtered out genes that are
## only detected in 5 or less cells.
sce.pbmc <- sce.pbmc[rownames(pbmc.sctrans$y),]
assay(sce.pbmc,"SCT") <- pbmc.sctrans$y
```
## Select highly variable genes
Feature selection after `sctransform` normalisation is straightforward. We can
just select the top genes that have a high residual variance which contribute
the most biological sources of variation.
```{r}
head(round(pbmc.sctrans$gene_attr[order(-pbmc.sctrans$gene_attr$residual_variance), ], 2),
22)
```
select 3,000 highly variable genes for downstream analysis
```{r}
hvgs_3k <- rownames(round(pbmc.sctrans$gene_attr[order(-pbmc.sctrans$gene_attr$residual_variance), ], 2),
3000)
```
## runPCA with HVGs
Next, we runPCA with the selected number of highly variable genes.
```{r PCA}
set.seed(10000)
sce.pbmc <- runPCA(sce.pbmc,exprs_values="SCT",ncomponents=10,
subset_row=hvgs_3k)
```
Generate TSNE plot using PCs
```{r runtsne}
set.seed(10000)
sce.pbmc <- runTSNE(sce.pbmc, dimred="PCA")
sce.pbmc <- runUMAP(sce.pbmc, dimred="PCA")
```
## Clustering
The remaining steps are similar to those presented in the main workflow.
```{r }
g <- buildSNNGraph(sce.pbmc, k=35, use.dimred = 'PCA')
clust <- igraph::cluster_walktrap(g)$membership
colLabels(sce.pbmc) <- factor(clust)
```
## Plot Clusters
```{r}
plotTSNE(sce.pbmc, colour_by="label")
plotUMAP(sce.pbmc, colour_by="label")
```
```{r}
plotExpression(sce.pbmc, features=c("CD14", "CD68",
"MNDA", "FCGR3A"), x="label", colour_by="label",exprs_values = "SCT")
```
## More info
`sctransform` is also integrated and interfaced with `Seurat` package which you
can find more information here: https://satijalab.org/seurat/v3.2/sctransform_vignette.html
## References
## SessionInfo
```{r}
sessionInfo()
```
This diff is collapsed.
This diff is collapsed.
Supports Markdown
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment