Skip to content
Snippets Groups Projects
Commit 20ea08d2 authored by Jeffrey Pullin's avatar Jeffrey Pullin
Browse files

Actually delete the file...

parent ea4b53b9
No related branches found
No related tags found
No related merge requests found
Pipeline #6355 passed
---
title: "Process pbmc3k data"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
```{r libraries}
library(TENxPBMCData)
library(scater)
library(scran)
library(EnsDb.Hsapiens.v86)
```
```{r set-seed}
set.seed(2332021)
```
```{r load-data}
pbmc3k <- TENxPBMCData(dataset = "pbmc3k")
counts(pbmc3k) <- as(counts(pbmc3k), "dgCMatrix")
```
```{r gene-annotation}
rownames(pbmc3k) <- uniquifyFeatureNames(
rowData(pbmc3k)$ENSEMBL_ID,
rowData(pbmc3k)$Symbol
)
# FIXME: Is this warning message safe to ignore?
location <- mapIds(
EnsDb.Hsapiens.v86,
keys = rowData(pbmc3k)$ENSEMBL_ID,
column = "SEQNAME", keytype = "GENEID"
)
```
The dataset is already filtered, so we don't perform cell detection.
```{r quality-control}
stats <- perCellQCMetrics(
pbmc3k,
subsets = list(Mito = which(location == "MT"))
)
high_mito <- isOutlier(stats$subsets_Mito_percent, type = "higher")
pbmc3k <- pbmc3k[, !high_mito]
# Remove genes with 0 counts.
zero_genes <- which(rowSums(counts(pbmc3k)) == 0)
pbmc3k <- pbmc3k[-zero_genes, ]
```
```{r normalization}
# Despite the name this is quite slow...
clusters <- quickCluster(pbmc3k)
pbmc3k <- computeSumFactors(pbmc3k, cluster = clusters)
pbmc3k <- logNormCounts(pbmc3k)
```
```{r variance-modelling}
dec_pbmc3k <- modelGeneVarByPoisson(pbmc3k)
# Select the top 10000 genes.
top_pbmc3k <- getTopHVGs(dec_pbmc3k, n = 10000)
pbmc3k <- pbmc3k[top_pbmc3k, ]
```
```{r dimensionality-reduction}
# Running PCA is slow.
pbmc3k <- runPCA(pbmc3k)
pbmc3k <- runTSNE(pbmc3k, dimred = "PCA")
pbmc3k <- runUMAP(pbmc3k, dimred = "PCA")
```
```{r plot-tsne}
plotTSNE(pbmc3k)
```
```{r plot-umap}
plotUMAP(pbmc3k)
```
```{r clustering}
# 15 is hand chose to give a clustering which 'looks right' on the tSNE plot.
g <- buildSNNGraph(pbmc3k, k = 15, use.dimred = "PCA")
clust <- igraph::cluster_walktrap(g)$membership
colLabels(pbmc3k) <- factor(clust)
```
````{r plot-tsne-with-clustering}
plotTSNE(pbmc3k, colour_by = "label")
```
```{r plot-umap-with-clustering}
plotUMAP(pbmc3k, colour_by = "label")
```
In the simulations we only use a subset of cells to estimate the splatter parameters, so that the variance is not overestimated because of clustering.
Here we select cluster 9.
```{r define-subset}
subset_ind <- colLabels(pbmc3k) == 9
pbmc3k_subset <- pbmc3k[, subset_ind]
```
```{r save-subset}
saveRDS(pbmc3k_subset, here::here("data", "pbmc3k.rds"))
```
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment