Commit 95b8e073 authored by Davis McCarthy's avatar Davis McCarthy
Browse files

Updating DE, imputation and projection chapters

parent 52831f27
Pipeline #956 passed with stage
in 8 seconds
......@@ -4,7 +4,7 @@ output: html_document
---
```{r setup, echo=FALSE}
knitr::opts_chunk$set(out.width='90%', fig.align = 'center', eval=FALSE)
knitr::opts_chunk$set(out.width='90%', fig.align = 'center', eval=TRUE, message=FALSE, warning=FALSE)
knitr::opts_knit$set(root.dir = normalizePath(".."))
```
......@@ -13,7 +13,7 @@ knitr::opts_knit$set(root.dir = normalizePath(".."))
```{r, echo=TRUE, message=FALSE, warning=FALSE}
library(scRNA.seq.funcs)
library(edgeR)
library(monocle)
library(limma)
library(MAST)
library(ROCR)
set.seed(1)
......@@ -21,10 +21,13 @@ set.seed(1)
### Introduction
To test different single-cell differential expression methods we will be using the Blischak dataset from Chapters 7-17.
For this experiment bulk RNA-seq data for each cell-line was generated in addition to single-cell data. We will use the
differentially expressed genes identified using standard methods on the respective bulk data as the ground truth for evaluating the
accuracy of each single-cell method. To save time we have pre-computed these for you. You can run the commands below to load these data.
To test different single-cell differential expression methods we will be using
the Blischak dataset from Chapters 7-17. For this experiment bulk RNA-seq data
for each cell-line was generated in addition to single-cell data. We will use
the differentially expressed genes identified using standard methods on the
respective bulk data as the ground truth for evaluating the accuracy of each
single-cell method. To save time we have pre-computed these for you. You can run
the commands below to load these data.
```{r}
DE <- read.table("data/tung/TPs.txt")
......@@ -35,7 +38,8 @@ GroundTruth <- list(
)
```
This ground truth has been produce for the comparison of individual NA19101 to NA19239. Now load the respective single-cell data:
This ground truth has been produce for the comparison of individual NA19101 to
NA19239. Now load the respective single-cell data:
```{r}
molecules <- read.table("data/tung/molecules.txt", sep = "\t")
......@@ -46,17 +50,19 @@ group <- anno[keep,1]
batch <- anno[keep,4]
# remove genes that aren't expressed in at least 6 cells
gkeep <- rowSums(data > 0) > 5;
counts <- data[gkeep,]
counts_mat <- as.matrix(data[gkeep,])
# Library size normalization
lib_size = colSums(counts)
norm <- t(t(counts)/lib_size * median(lib_size))
lib_size <- colSums(counts_mat)
norm <- t(t(counts_mat)/lib_size * median(lib_size))
# Variant of CPM for datasets with library sizes of fewer than 1 mil molecules
```
Now we will compare various single-cell DE methods. We will focus on methods
that performed well in Soneson and Robinson's [2019; CITE] detailed comparison
of differential expression methods for single-cell data. Note that we will only
be running methods which are available as R-packages and run relatively quickly.
that performed well in Soneson and Robinson's [@Soneson2018-hy] detailed
comparison of differential expression methods for single-cell data. Note that we
will only be running methods which are available as R-packages and run
relatively quickly.
### Kolmogorov-Smirnov test
......@@ -103,10 +109,14 @@ length(sigDE)
sum(GroundTruth$DE %in% sigDE)
# Number of KS-DE genes that are true DE genes
sum(GroundTruth$notDE %in% sigDE)
# Number of KS-DE genes that are truly not-DE
# Number of KS-DE genes that are truly not-DE
```
As you can see many more of our ground truth negative genes were identified as DE by the KS-test (false positives) than ground truth positive genes (true positives), however this may be due to the larger number of notDE genes thus we typically normalize these counts as the True positive rate (TPR), TP/(TP + FN), and False positive rate (FPR), FP/(FP+TP).
As you can see many more of our ground truth negative genes were identified as
DE by the KS-test (false positives) than ground truth positive genes (true
positives), however this may be due to the larger number of notDE genes thus we
typically normalize these counts as the True positive rate (TPR), TP/(TP + FN),
and false discovery rate (FPR), FP/(FP+TP).
```{r}
tp <- sum(GroundTruth$DE %in% sigDE)
......@@ -114,12 +124,18 @@ fp <- sum(GroundTruth$notDE %in% sigDE)
tn <- sum(GroundTruth$notDE %in% names(pVals)[pVals >= 0.05])
fn <- sum(GroundTruth$DE %in% names(pVals)[pVals >= 0.05])
tpr <- tp/(tp + fn)
fpr <- fp/(fp + tn)
cat(c(tpr, fpr))
fdr <- fp/(fp + tp)
cat(c(tpr, fdr), "\n")
```
Now we can see the TPR is much higher than the FPR indicating the KS test is identifying DE genes.
So far we've only evaluated the performance at a single significance threshold. Often it is informative to vary the threshold and evaluate performance across a range of values. This is then plotted as a receiver-operating-characteristic curve (ROC) and a general accuracy statistic can be calculated as the area under this curve (AUC). We will use the ROCR package to facilitate this plotting.
Now we can see the TPR is OK, but the FDR is very high--- much higher than we
would normally deem acceptable.
So far we've only evaluated the performance at a single significance threshold.
Often it is informative to vary the threshold and evaluate performance across a
range of values. This is then plotted as a receiver-operating-characteristic
curve (ROC) and a general accuracy statistic can be calculated as the area under
this curve (AUC). We will use the ROCR package to facilitate this plotting.
```{r ks-roc-plot, fig.cap="ROC curve for KS-test."}
# Only consider genes for which we know the ground truth
......@@ -133,24 +149,33 @@ ROCR::plot(perf)
aucObj <- ROCR::performance(pred, "auc")
aucObj@y.values[[1]] # AUC
```
Finally to facilitate the comparisons of other DE methods let's put this code into a function so we don't need to repeat it:
Finally to facilitate the comparisons of other DE methods let's put this code
into a function so we don't need to repeat it:
```{r}
DE_Quality_AUC <- function(pVals) {
DE_Quality_AUC <- function(pVals, plot=TRUE) {
pVals <- pVals[names(pVals) %in% GroundTruth$DE |
names(pVals) %in% GroundTruth$notDE]
truth <- rep(1, times = length(pVals));
truth[names(pVals) %in% GroundTruth$DE] = 0;
pred <- ROCR::prediction(pVals, truth)
perf <- ROCR::performance(pred, "tpr", "fpr")
ROCR::plot(perf)
if (plot)
ROCR::plot(perf)
aucObj <- ROCR::performance(pred, "auc")
return(aucObj@y.values[[1]])
}
```
### Wilcox/Mann-Whitney-U Test
The Wilcox-rank-sum test is another non-parametric test, but tests specifically if values in one group are greater/less than the values in the other group. Thus it is often considered a test for difference in median expression between two groups; whereas the KS-test is sensitive to any change in distribution of expression values.
### Wilcoxon/Mann-Whitney-U Test
The Wilcoxon-rank-sum test is another non-parametric test, but tests
specifically if values in one group are greater/less than the values in the
other group. Thus it is often considered a test for difference in median
expression between two groups; whereas the KS-test is sensitive to any change in
distribution of expression values.
```{r wilcox-plot, fig.cap="ROC curve for Wilcox test.", message=FALSE, warning=FALSE}
pVals <- apply(
......@@ -173,17 +198,34 @@ We've already used edgeR for differential expression in Chapter
gene expression and uses a generalized linear model (GLM) framework, the enables
us to include other factors such as batch to the model.
```{r edger-plot, fig.cap="ROC curve for edgeR.", message=FALSE}
`edgeR`, like many DE methods work best with pre-filtering of very lowly
experessed genes. Here, we will keep genes with non-zero expression in at least
30 cells.
The first steps for this analysis then involve
```{r edger-plot, fig.cap="Biological coefficient of variation plot for edgeR.", message=FALSE}
keep_gene <- (rowSums(counts_mat > 0) > 29.5 & rowMeans(counts_mat) > 0.2)
table(keep_gene)
dge <- DGEList(
counts = counts,
norm.factors = rep(1, length(counts[1,])),
counts_mat = counts_mat[keep_gene,],
norm.factors = rep(1, length(counts_mat[1,])),
group = group
)
group_edgeR <- factor(group)
design <- model.matrix(~ group_edgeR)
dge <- estimateDisp(dge, design = design, trend.method = "none")
fit <- glmFit(dge, design)
res <- glmLRT(fit)
dge <- calcNormFactors(dge, method = "TMM")
summary(dge$samples$norm.factors)
dge <- estimateDisp(dge, design = design)
plotBCV(dge)
```
Next we fit a negative binomial quasi-likelihood model for differential
expression.
```{r edger-plot, fig.cap="ROC curve for edgeR.", message=FALSE}
fit <- glmQLFit(dge, design)
res <- glmQLFTest(fit)
pVals <- res$table[,4]
names(pVals) <- rownames(res$table)
......@@ -200,7 +242,7 @@ using a hurdle model to combine tests of discrete (0 vs not zero) and continuous
framework to enable complex models to be considered.
```{r MAST-plot, fig.cap="ROC curve for MAST.", message=FALSE}
log_counts <- log(counts + 1) / log(2)
log_counts <- log2(counts_mat + 1)
fData <- data.frame(names = rownames(log_counts))
rownames(fData) <- rownames(log_counts);
cData <- data.frame(cond = group)
......@@ -211,7 +253,7 @@ colData(obj)$cngeneson <- scale(colSums(assay(obj) > 0))
cond <- factor(colData(obj)$cond)
# Model expression as function of condition & number of detected genes
zlmCond <- zlm.SingleCellAssay(~ cond + cngeneson, obj)
zlmCond <- zlm(~ cond + cngeneson, obj)
summaryCond <- summary(zlmCond, doLRT = "condNA19101")
summaryDt <- summaryCond$datatable
......@@ -226,10 +268,78 @@ DE_Quality_AUC(pVals)
### limma
The `limma` package implements Empirical Bayes linear models for identifying
differentially expressed genes. Originally developed more microarray data, the
extended limma-voom method has proven to be a very fast and accurate DE method
for bulk RNA-seq data. With its speed and flexibility, `limma` also performed
well for DE analysis of scRNA-seq data in the benchmarking study mentioned
earlier.
Here we demonstrate how to use `limma` for DE analysis on these data.
```{r limma-de}
library(limma)
## we can use the same design matrix structure as for edgeR
design <- model.matrix(~ group_edgeR)
colnames(design)
## apply the same filtering of low-abundance genes as for edgeR
keep_gene <- (rowSums(counts_mat > 0) > 29.5 & rowMeans(counts_mat) > 0.2)
summary(keep_gene)
y <- counts_mat[keep_gene, ]
v <- voom(y, design)
fit <- lmFit(v, design)
res <- eBayes(fit)
topTable(res)
pVals <- res$p.value
names(pVals) <- rownames(y)
pVals <- p.adjust(pVals, method = "fdr")
DE_Quality_AUC(pVals)
```
__Q:__ How does `limma` compare on this dataset?
### Pseudobulk
In this approach, we sum counts across cells from the same condition (here we
will use batch within individual) to create "pseudobulk" samples that we will
analyse for DE the way we would analyse bulk RNA-seq data. Research continues on
benchmarking pseudobulk and single-cell DE approaches, but the suggestion from
certain (we think reliable) sources is that pseudobulk approaches are more
robust and accurate (and typically very fast) in a range of settings where we're
interested in differences between groups or conditions rather than individual
cells.
```{r pseudobulk}
summed <- scater::sumCountsAcrossCells(counts_mat, factor(batch))
head(summed)
y <- DGEList(summed)
y <- y[aveLogCPM(y) > 1,]
y <- calcNormFactors(y)
individ <- gsub("\\.r[123]", "", colnames(summed))
design <- model.matrix(~individ)
head(design)
y <- estimateDisp(y, design)
summary(y$trended.dispersion)
plotBCV(y)
fit <- glmQLFit(y)
res <- glmQLFTest(fit)
summary(decideTests(res))
pVals <- res$table[,4]
names(pVals) <- rownames(res$table)
pVals <- p.adjust(pVals, method = "fdr")
DE_Quality_AUC(pVals)
```
__Q:__ What do you make of these pseudobulk results? What are the pros and cons
of this approach?
### Reflection
* What is your overall assessment of the DE methods presented in this chapter?
* When might you choose to use one method in preference to others?
* What caveats or potential problems can you think of with conducting DE testing
using approaches like those presented here?
### sessionInfo()
......
......@@ -3,7 +3,7 @@ output: html_document
---
```{r setup, echo=FALSE}
knitr::opts_chunk$set(out.width='90%', fig.align = 'center', eval=FALSE)
knitr::opts_chunk$set(out.width='90%', fig.align = 'center', eval=TRUE, message=FALSE, warning=FALSE)
knitr::opts_knit$set(root.dir = normalizePath(".."))
```
......@@ -24,17 +24,21 @@ As discussed previously, one of the main challenges when analyzing scRNA-seq
data is the presence of zeros, or dropouts. The dropouts are assumed to have
arisen for three possible reasons:
* The gene was not expressed in the cell and hence there are no transcripts to sequence
* The gene was expressed, but for some reason the transcripts were lost somewhere prior to sequencing
* The gene was expressed and transcripts were captured and turned into cDNA, but the sequencing depth was not sufficient to produce any reads.
Thus, dropouts could be result of experimental shortcomings, and if this is
the case then we would like to provide computational corrections. One possible
solution is to impute the dropouts in the expression matrix. To be able to
impute gene expression values, one must have an underlying model. However,
since we do not know which dropout events are technical artefacts and which
correspond to the transcript being truly absent, imputation is a difficult
challenge and prone to creating [false-positive results in downstream analysis](https://f1000research.com/articles/7-1740/v2).
* The gene was not expressed in the cell and hence there are no transcripts to
sequence
* The gene was expressed, but for some reason the transcripts were lost
somewhere prior to sequencing
* The gene was expressed and transcripts were captured and turned into cDNA, but
the sequencing depth was not sufficient to produce any reads.
Thus, dropouts could be result of experimental shortcomings, and if this is the
case then we would like to provide computational corrections. One possible
solution is to impute the dropouts in the expression matrix. To be able to
impute gene expression values, one must have an underlying model. However, since
we do not know which dropout events are technical artefacts and which correspond
to the transcript being truly absent, imputation is a difficult challenge and
prone to creating [false-positive results in downstream
analysis](https://f1000research.com/articles/7-1740/v2).
There are many different imputation methods available we will consider three
fast, published methods:
......@@ -50,7 +54,9 @@ imputed all inflated zeros.
### scImpute
To test `scImpute`, we use the default parameters and we apply it to the Deng dataset that we have worked with before. scImpute takes a .csv or .txt file as an input:
To test `scImpute`, we use the default parameters and we apply it to the Deng
dataset that we have worked with before. scImpute takes a .csv or .txt file as
an input:
```{r message=FALSE, warning=FALSE, paged.print=FALSE}
deng <- readRDS("data/deng/deng-reads.rds")
......@@ -81,17 +87,19 @@ plotPCA(
)
```
Compare this result to the original data in Chapter \@ref(clust-methods). What are the most significant differences?
Compare this result to the original data in Chapter \@ref(clust-methods). What
are the most significant differences?
We can examine the expression of specific genes to directly see the effect of
imputation on the expression distribution.
```{r}
plotExpression(res, c("Sox2", "Eomes", "Zscan4d", "Fgf4"))
plotExpression(deng, c("Sox2", "Eomes", "Zscan4d", "Fgf4"))
```
To evaluate the impact of the imputation, we use `SC3` to cluster the imputed matrix
```{r message=FALSE, warning=FALSE, paged.print=FALSE}
res <- sc3_estimate_k(res)
metadata(res)$sc3$k_estimation
......@@ -107,11 +115,14 @@ __Exercise:__ Based on the PCA and the clustering results, do you think that imp
### DrImpute
We can do the same for DrImpute. DrImpute runs on a log-normalized expression matrix directly in R, we generate this matrix using scater, then run DrImpute. Unlike scImpute, DrImpute considers the consensus imputation across a range of ks using two differ correlation distances:
We can do the same for DrImpute. DrImpute runs on a log-normalized expression
matrix directly in R, we generate this matrix using scater, then run DrImpute.
Unlike scImpute, DrImpute considers the consensus imputation across a range of
ks using two differ correlation distances:
```{r}
deng <- normalize(deng)
res <- DrImpute(deng@assays[["logcounts"]], ks=8:12)
res <- DrImpute(logcounts(deng), ks=8:12)
colnames(res) <- colnames(deng)
rownames(res) <- rownames(deng)
res <- SingleCellExperiment(
......@@ -125,10 +136,13 @@ plotPCA(
)
plotExpression(res, c("Sox2", "Eomes", "Zscan4d", "Fgf4"))
```
__Exercise:__ Check the sc3 clustering of the DrImpute matrix, do you think that imputation using `DrImpute` is a good idea for the Deng dataset?
__Exercise:__ Check the sc3 clustering of the DrImpute matrix, do you think that
imputation using `DrImpute` is a good idea for the Deng dataset?
__Exercise:__ What is the difference between `scImpute` and `DrImpute` based on the PCA and clustering analysis? Which one do you think is best to use?
__Exercise:__ What is the difference between `scImpute` and `DrImpute` based on
the PCA and clustering analysis? Which one do you think is best to use?
### sessionInfo()
......
This diff is collapsed.
---
output: html_document
---
```{r setup, echo=FALSE}
knitr::opts_chunk$set(out.width='90%', fig.align = 'center', eval=FALSE)
knitr::opts_knit$set(root.dir = normalizePath(".."))
```
## Search scRNA-Seq data
```{r, echo=TRUE, message=FALSE, warning=FALSE}
library(scfind)
library(SingleCellExperiment)
library(plotly)
set.seed(1234567)
```
### About
`scfind` is a tool that allows one to search single cell RNA-Seq collections
(Atlas) using lists of genes, e.g. searching for cells and cell-types where a
specific set of genes are expressed. `scfind` is a [Github
package](https://github.com/hemberg-lab/scfind). Cloud implementation of
`scfind` with a large collection of datasets is available on our
[website](https://scfind.sanger.ac.uk/).
```{r, echo = FALSE, out.width = '80%', fig.cap="scfind can be used to search large collection of scRNA-seq data by a list of gene IDs."}
knitr::include_graphics("figures/scfind.png")
```
### Dataset
We will run `scfind` on the Tabula Muris 10X dataset. `scfind` also operates on
`SingleCellExperiment` class:
```{r, message=FALSE, warning=FALSE}
tm10x_heart <- readRDS("data/sce/Heart_10X.rds")
tm10x_heart
colData(tm10x_heart)
```
### Gene Index
Now we need to create a gene index using our dataset:
```{r}
heart_index <- buildCellTypeIndex(
tm10x_heart,
cell_type_column = "cell_type1"
)
```
`scfind` adopts a two-step compression strategy which allows efficient
compression of large cell-by-gene matrix and allows fast retrieval of data by
gene query. We estimated that one can achieve 2 orders of magnitude compression
with this method.
The input matrix for indexing is the raw count matrix of the
`SingleCellExperiment` class. By default the `cell_type1` column of the
`colData` slot of the `SingleCellExperiment` object is used to define cell
types, however it can also be defined manually using the `cell.type.label`
argument of the `buildCellTypeIndex`. For dataset with more than one tissue, you
can also merge all tissues together to create a super index using the function
`mergeDataset`.
The index can be saved in .rds format using `saveObject` function and loaded
using `loadObject` function for future use.
```{r}
tm10x_thymus <- readRDS("data/sce/Thymus_10X.rds")
thymus_index <- buildCellTypeIndex(
tm10x_thymus,
cell_type_column = "cell_type1"
)
## scfind_index <- mergeDataset(heart_index, thymus_index)
## scfind_index@datasets
## cellTypeNames(scfind_index)
## sample(scfindGenes(scfind_index),20)
```
To quickly and easily find the enriched cell type using an interactive Shiny
application use the following method:
```{r eval=FALSE, message=FALSE, warning=FALSE, include=FALSE}
scfindShiny(scfind_index)
```
### Marker genes
Now let's find the marker genes for Thymus T cell in the datasets
```{r, eval=FALSE}
# Showing the top 5 marker genes for each cell type and sort by F1 score.
t_cell_markers <- cellTypeMarkers(scfind_index, cell.types = "Thymus.T cell", top.k = 5, sort.field = "f1")
t_cell_markers
```
Next, you can evaluate the markers of Thymus T cell in Thymus stromal cell
```{r, eval=FALSE}
evaluateMarkers(
scfind_index,
gene.list = as.character(t_cell_markers$genes),
cell.types = "Thymus.stromal cell",
sort.field = "f1"
)
```
```{r, eval=FALSE}
# By default, the marker evaluation takes all cell types in the dataset as background cell type, but you can use the argument `background.cell.types` to fine tune the evaluation
background <- cellTypeNames(scfind_index, datasets = "Thymus")
background
evaluateMarkers(
scfind_index,
gene.list = as.character(t_cell_markers$genes),
cell.types = "Thymus.stromal cell",
sort.field = "f1",
background.cell.types = background
)
```
### Search cells by a gene list
`scfind` can instantly identify the cell type that best represents the genes of
interest from large single cell dataset. We will use the marker genes identified
in an original publication [Yanbin et al.
2015](https://www.nature.com/articles/cr201599). Cardiomyocyte-specific markers
used in immunostaining as shown in Figure 1.
```{r, eval=FALSE}
cardiomyocytes <- c("Mef2c", "Gata4", "Nkx2.5", "Myh6", "tnnt2", "tnni3", "CDH2", "Cx43", "GJA1")
result <- markerGenes(
scfind_index,
gene.list = cardiomyocytes
)
result
```
To allow search of enriched cell type from a long list of gene query, `scfind`
features a query optimization routine. First, the function `markerGenes` will
counter suggest subqueries that with the highest support in the dataset. The
TF-IDF score for each gene set allows user to identify the best subquery for
finding the most relevant cell type.
```{r, eval=FALSE}
best_subquery <- result[which.max(result$tfidf),] # get the best subquery by ranking TF-IDF score
best_subquery <- strsplit(as.character(best_subquery$Query), ",")[[1]] # obtain gene list
hyperQueryCellTypes(
scfind_index,
gene.list = best_subquery
)
```
`hyperQueryCellTypes` function returns a list of p-values corresponding to all
cell types in a given dataset. It also outputs a list of cells in which genes
from the given gene list are co-expressed.
__Exercise 1__
Find the marker genes of all cell types in the Heart dataset
```{r, echo=FALSE, eval=FALSE}
heart_cell_types <- cellTypeNames(scfind_index, "Heart")
for(cell_type_name in heart_cell_types)
{
result <- cellTypeMarkers(
scfind_index,
cell.types = cell_type_name,
sort.field = "f1",
background = heart_cell_types)
print(result)
}
```
```{r}
cardiac_contractility <- c("Ace2","Fkbp1b","Gh","Cacna1c","Cd59b","Ppp1r1a","Tnnt2","Nos1","Agtr1a","Camk2g","Grk2","Ins2","Dnah8","Igf1","Nos3","Nppa","Nppb","Il6","Myh6","Ren2","Tnni3","Apln","Kcnmb1","Pik3cg","Prkca","Aplnr","Slc8a1","Ace","Akt1","Edn1","Kcnmb2","Nos2","Tnf","Myh14","Adrb2","Agt","Adrb1","Atp2a2","Ryr2","Pln")
```
__Exercise 2__
Input the gene list relevant to "cardiac contractility" and find the best gene
set with the highest support. Identify the enriched cell type for this query.
```{r, echo=FALSE, eval=FALSE}
result <- markerGenes(scfind_index, cardiac_contractility)
best_sub_query <- as.character(result[which.max(result$tfidf), c("Query")])
best_sub_query <- strsplit(best_sub_query, ",")[[1]]
hyperQueryCellTypes(scfind_index, best_sub_query)
```
### In-silico gating
Using the `findCellTypes` function, you can perform in-silico gating to identify
cell type subsets as if the way cell sorting works. To do so, you can add
logical operators including "-" and "*" for "no" and "intermediate" expression,
respectively in front of the gene name. Here, we use operators to subset T cell
of the Thymus dataset into effector T regulatory cells and effector memory T
cell.
```{r, eval=FALSE}
effector_t_reg_cells <- c("*Ptprc", "-Il7r", "Ctla4", "-Il7r")
effector_memory_t_cells <- c("-Il2ra", "*Ptprc", "Il7r")
subset_treg <- findCellTypes(scfind_index, effector_t_reg_cells, "Thymus")
subset_tmem <- findCellTypes(scfind_index, effector_memory_t_cells, "Thymus")
subset_treg
subset_tmem
```
Let's use the TSNE plot information from the `SingleCellExperiment` of Thymus to
illustrate the gating result
```{r, eval=FALSE}
map <- data.frame(
tm10x_thymus@reducedDims[['TSNE']],
cell_type = as.character(colData(tm10x_thymus)$cell_type1),
stringsAsFactors = F
)
map <- subset(map, cell_type == "T cell")
plot_ly(map, x = ~X1 , y = ~X2, type="scatter")
map$cell_type[subset_treg$`Thymus.T cell`] <- "Effector T Regulatory Cell"
map$cell_type[subset_tmem$`Thymus.T cell`] <- "Effector Memory T Cell"
plot_ly(map, x = ~X1 , y = ~X2, type="scatter", color = ~cell_type)
```
### sessionInfo()
```{r}
sessionInfo()
```
Supports Markdown
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment