Commit e3053478 authored by Davis McCarthy

New build of website

parent cbb7c1d7
......@@ -261,7 +261,7 @@ _Comments_ Did you notice the ordering of clusters in the lineage prediced for `
After running slingshot, an interesting next step may be to find genes that change their expression over the course of development. We demonstrate one possible method for this type of analysis on the 100 most variable genes. We will regress each gene on the pseudotime variable we have generated, using a generalized additive model (GAM). This allows us to detect non-linear patterns in gene expression.
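A minimal sketch of the idea, using simulated data rather than `deng_SCE`. It substitutes a natural cubic regression spline (`splines::ns()` with 3 degrees of freedom, a choice made here) for the GAM and tests it against an intercept-only model with an F-test. `spline_trend_pvalue()` is a hypothetical helper, not part of the course code.

```r
library(splines)  # ships with base R; provides ns() natural cubic splines

# Hypothetical helper: p-value for a smooth trend of one gene along pseudotime.
spline_trend_pvalue <- function(expr, pseudotime, df = 3) {
  keep <- !is.na(pseudotime)         # cells outside a lineage may have NA pseudotime
  y <- expr[keep]
  ptime <- pseudotime[keep]
  full <- lm(y ~ ns(ptime, df = df)) # expression as a smooth function of pseudotime
  null <- lm(y ~ 1)                  # expression independent of pseudotime
  anova(null, full)[2, "Pr(>F)"]     # F-test comparing the two nested models
}

# Simulated data: one gene peaks mid-trajectory, the other is pure noise.
set.seed(1)
pseudotime <- runif(200, min = 0, max = 10)
peaked <- exp(-(pseudotime - 5)^2 / 4) + rnorm(200, sd = 0.1)
flat <- rnorm(200, sd = 0.1)

pvals <- c(peaked = spline_trend_pvalue(peaked, pseudotime),
           flat   = spline_trend_pvalue(flat, pseudotime))
padj <- p.adjust(pvals, method = "BH")  # correct for testing many genes
padj
```

With real data, `expr` would be one gene's log-expression across cells, and `pseudotime` would be the Slingshot pseudotime. The multiple-testing correction becomes important once all 100 genes are tested.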
```{r gam_tm_deg, message=FALSE}
library(gam)
t <- deng_SCE$slingPseudotime_1
......
......@@ -339,7 +339,7 @@ code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warni
<li class="chapter" data-level="7.4.6" data-path="normalization-confounders-and-batch-correction.html"><a href="normalization-confounders-and-batch-correction.html#sessioninfo-3"><i class="fa fa-check"></i><b>7.4.6</b> sessionInfo()</a></li>
</ul></li>
<li class="chapter" data-level="7.5" data-path="normalization-confounders-and-batch-correction.html"><a href="normalization-confounders-and-batch-correction.html#identifying-confounding-factors-reads"><i class="fa fa-check"></i><b>7.5</b> Identifying confounding factors (Reads)</a></li>
<li class="chapter" data-level="7.6" data-path="normalization-confounders-and-batch-correction.html"><a href="normalization-confounders-and-batch-correction.html#batch-effects"><i class="fa fa-check"></i><b>7.6</b> Batch effects</a><ul>
<li class="chapter" data-level="7.6.1" data-path="normalization-confounders-and-batch-correction.html"><a href="normalization-confounders-and-batch-correction.html#introduction-6"><i class="fa fa-check"></i><b>7.6.1</b> Introduction</a></li>
<li class="chapter" data-level="7.6.2" data-path="normalization-confounders-and-batch-correction.html"><a href="normalization-confounders-and-batch-correction.html#linear-models"><i class="fa fa-check"></i><b>7.6.2</b> Linear models</a></li>
<li class="chapter" data-level="7.6.3" data-path="normalization-confounders-and-batch-correction.html"><a href="normalization-confounders-and-batch-correction.html#sctransform-2"><i class="fa fa-check"></i><b>7.6.3</b> sctransform</a></li>
......@@ -347,7 +347,7 @@ code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warni
<li class="chapter" data-level="7.6.5" data-path="normalization-confounders-and-batch-correction.html"><a href="normalization-confounders-and-batch-correction.html#combat"><i class="fa fa-check"></i><b>7.6.5</b> Combat</a></li>
<li class="chapter" data-level="7.6.6" data-path="normalization-confounders-and-batch-correction.html"><a href="normalization-confounders-and-batch-correction.html#mnncorrect"><i class="fa fa-check"></i><b>7.6.6</b> mnnCorrect</a></li>
<li class="chapter" data-level="7.6.7" data-path="normalization-confounders-and-batch-correction.html"><a href="normalization-confounders-and-batch-correction.html#harmony"><i class="fa fa-check"></i><b>7.6.7</b> Harmony</a></li>
<li class="chapter" data-level="7.6.8" data-path="normalization-confounders-and-batch-correction.html"><a href="normalization-confounders-and-batch-correction.html#how-to-evaluate-and-compare-batch-correction"><i class="fa fa-check"></i><b>7.6.8</b> How to evaluate and compare batch correction</a></li>
<li class="chapter" data-level="7.6.9" data-path="normalization-confounders-and-batch-correction.html"><a href="normalization-confounders-and-batch-correction.html#big-exercise-2"><i class="fa fa-check"></i><b>7.6.9</b> Big Exercise</a></li>
<li class="chapter" data-level="7.6.10" data-path="normalization-confounders-and-batch-correction.html"><a href="normalization-confounders-and-batch-correction.html#sessioninfo-4"><i class="fa fa-check"></i><b>7.6.10</b> sessionInfo()</a></li>
</ul></li>
......@@ -370,14 +370,13 @@ code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warni
<li class="chapter" data-level="9.1.2" data-path="latent-spaces.html"><a href="latent-spaces.html#tsne-t-distributed-stochastic-neighbor-embedding"><i class="fa fa-check"></i><b>9.1.2</b> tSNE: t-Distributed Stochastic Neighbor Embedding</a></li>
<li class="chapter" data-level="9.1.3" data-path="latent-spaces.html"><a href="latent-spaces.html#manifold-methods"><i class="fa fa-check"></i><b>9.1.3</b> Manifold methods</a></li>
</ul></li>
<li class="chapter" data-level="9.2" data-path="latent-spaces.html"><a href="latent-spaces.html#matrix-factorization-and-factor-analysis"><i class="fa fa-check"></i><b>9.2</b> Matrix factorization and factor analysis</a><ul>
<li class="chapter" data-level="9.2.1" data-path="latent-spaces.html"><a href="latent-spaces.html#slalom-interpretable-latent-spaces"><i class="fa fa-check"></i><b>9.2.1</b> <span>Slalom</span>: Interpretable latent spaces</a></li>
</ul></li>
<li class="chapter" data-level="9.3" data-path="latent-spaces.html"><a href="latent-spaces.html#autoencoders"><i class="fa fa-check"></i><b>9.3</b> Autoencoders</a><ul>
<li class="chapter" data-level="9.3.1" data-path="latent-spaces.html"><a href="latent-spaces.html#background-and-some-notations"><i class="fa fa-check"></i><b>9.3.1</b> Background and some notations</a></li>
<li class="chapter" data-level="9.3.2" data-path="latent-spaces.html"><a href="latent-spaces.html#objective"><i class="fa fa-check"></i><b>9.3.2</b> Objective</a></li>
</ul></li>
</ul></li>
<li class="chapter" data-level="10" data-path="clustering-and-cell-annotation.html"><a href="clustering-and-cell-annotation.html"><i class="fa fa-check"></i><b>10</b> Clustering and cell annotation</a><ul>
<li class="chapter" data-level="10.1" data-path="clustering-and-cell-annotation.html"><a href="clustering-and-cell-annotation.html#clustering-methods"><i class="fa fa-check"></i><b>10.1</b> Clustering Methods</a><ul>
......@@ -399,15 +398,16 @@ code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warni
<li class="chapter" data-level="11.1" data-path="trajectory-inference.html"><a href="trajectory-inference.html#first-look-at-deng-data"><i class="fa fa-check"></i><b>11.1</b> First look at Deng data</a><ul>
<li class="chapter" data-level="11.1.1" data-path="trajectory-inference.html"><a href="trajectory-inference.html#tscan"><i class="fa fa-check"></i><b>11.1.1</b> TSCAN</a></li>
<li class="chapter" data-level="11.1.2" data-path="trajectory-inference.html"><a href="trajectory-inference.html#slingshot"><i class="fa fa-check"></i><b>11.1.2</b> Slingshot</a></li>
<li class="chapter" data-level="11.1.3" data-path="trajectory-inference.html"><a href="trajectory-inference.html#gam-general-additive-model-for-identifying-temporally-expressed-genes"><i class="fa fa-check"></i><b>11.1.3</b> GAM general additive model for identifying temporally expressed genes</a></li>
<li class="chapter" data-level="11.1.4" data-path="trajectory-inference.html"><a href="trajectory-inference.html#monocle"><i class="fa fa-check"></i><b>11.1.4</b> Monocle</a></li>
<li class="chapter" data-level="11.1.5" data-path="trajectory-inference.html"><a href="trajectory-inference.html#monocle-2"><i class="fa fa-check"></i><b>11.1.5</b> Monocle 2</a></li>
<li class="chapter" data-level="11.1.6" data-path="trajectory-inference.html"><a href="trajectory-inference.html#monocle-3"><i class="fa fa-check"></i><b>11.1.6</b> Monocle 3</a></li>
<li class="chapter" data-level="11.1.7" data-path="trajectory-inference.html"><a href="trajectory-inference.html#diffusion-maps"><i class="fa fa-check"></i><b>11.1.7</b> Diffusion maps</a></li>
<li class="chapter" data-level="11.1.8" data-path="trajectory-inference.html"><a href="trajectory-inference.html#other-methods"><i class="fa fa-check"></i><b>11.1.8</b> Other methods</a></li>
<li class="chapter" data-level="11.1.9" data-path="trajectory-inference.html"><a href="trajectory-inference.html#comparison-of-the-methods"><i class="fa fa-check"></i><b>11.1.9</b> Comparison of the methods</a></li>
<li class="chapter" data-level="11.1.10" data-path="trajectory-inference.html"><a href="trajectory-inference.html#expression-of-genes-through-time"><i class="fa fa-check"></i><b>11.1.10</b> Expression of genes through time</a></li>
<li class="chapter" data-level="11.1.11" data-path="trajectory-inference.html"><a href="trajectory-inference.html#dynverse"><i class="fa fa-check"></i><b>11.1.11</b> dynverse</a></li>
<li class="chapter" data-level="11.1.12" data-path="trajectory-inference.html"><a href="trajectory-inference.html#sessioninfo-7"><i class="fa fa-check"></i><b>11.1.12</b> sessionInfo()</a></li>
</ul></li>
</ul></li>
<li class="chapter" data-level="12" data-path="dechapter.html"><a href="dechapter.html"><i class="fa fa-check"></i><b>12</b> Differential Expression (DE) analysis</a><ul>
......@@ -547,20 +547,20 @@ the Salmon index that was used for the quantification).</p>
<p>Here we will show you how to create an <code>SCE</code> from a <code>MultiAssayExperiment</code>
object. For example, if you download <code>Shalek2013</code> dataset you will be able to
create an <code>SCE</code> using the following code:</p>
<div class="sourceCode" id="cb904"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb904-1" data-line-number="1"><span class="kw">library</span>(MultiAssayExperiment)</a>
<a class="sourceLine" id="cb904-2" data-line-number="2"><span class="kw">library</span>(SummarizedExperiment)</a>
<a class="sourceLine" id="cb904-3" data-line-number="3"><span class="kw">library</span>(scater)</a>
<a class="sourceLine" id="cb904-4" data-line-number="4">d &lt;-<span class="st"> </span><span class="kw">readRDS</span>(<span class="st">&quot;~/Desktop/GSE41265.rds&quot;</span>)</a>
<a class="sourceLine" id="cb904-5" data-line-number="5">cts &lt;-<span class="st"> </span><span class="kw">assays</span>(<span class="kw">experiments</span>(d)[[<span class="st">&quot;gene&quot;</span>]])[[<span class="st">&quot;count_lstpm&quot;</span>]]</a>
<a class="sourceLine" id="cb904-6" data-line-number="6">tpms &lt;-<span class="st"> </span><span class="kw">assays</span>(<span class="kw">experiments</span>(d)[[<span class="st">&quot;gene&quot;</span>]])[[<span class="st">&quot;TPM&quot;</span>]]</a>
<a class="sourceLine" id="cb904-7" data-line-number="7">phn &lt;-<span class="st"> </span><span class="kw">colData</span>(d)</a>
<a class="sourceLine" id="cb904-8" data-line-number="8">sce &lt;-<span class="st"> </span><span class="kw">SingleCellExperiment</span>(</a>
<a class="sourceLine" id="cb904-9" data-line-number="9"> <span class="dt">assays =</span> <span class="kw">list</span>(</a>
<a class="sourceLine" id="cb904-10" data-line-number="10"> <span class="dt">countData =</span> cts, </a>
<a class="sourceLine" id="cb904-11" data-line-number="11"> <span class="dt">tpmData =</span> tpms</a>
<a class="sourceLine" id="cb904-12" data-line-number="12"> ),</a>
<a class="sourceLine" id="cb904-13" data-line-number="13"> <span class="dt">colData =</span> phn</a>
<a class="sourceLine" id="cb904-14" data-line-number="14">)</a></code></pre></div>
<p>You can also see that several different QC metrics have already been
pre-calculated on the <a href="http://imlspenticton.uzh.ch:3838/conquer/">conquer</a>
website.</p>
......
......@@ -46,6 +46,8 @@ barcode_rank <- rank(-umi_per_barcode[,2])
plot(barcode_rank, umi_per_barcode[,2], xlim=c(1,8000))
```
<img src="cell-calling_files/figure-html/unnamed-chunk-4-1.png" width="90%" style="display: block; margin: auto;" />
Here we can see a roughly exponential curve of library sizes, so to make
things simpler let's log-transform them.
......@@ -55,6 +57,8 @@ log_lib_size <- log10(umi_per_barcode[,2])
plot(barcode_rank, log_lib_size, xlim=c(1,8000))
```
<img src="cell-calling_files/figure-html/unnamed-chunk-5-1.png" width="90%" style="display: block; margin: auto;" />
That's better: the "knee" in the distribution is much more pronounced. We
could estimate where the "knee" is manually, but it is much more reproducible to
identify this point algorithmically.
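One simple way to locate the knee is to find the steepest drop in log library size between consecutive ranks, ignoring the top-ranked barcodes. This is the idea behind the `rawdiff` step used here. The sketch below uses simulated UMI totals with arbitrary lognormal parameters. `find_knee()` and its `skip` argument are hypothetical.

```r
# Simulated UMI totals: 1,000 cell-containing barcodes and 5,000 empty droplets.
set.seed(42)
umi <- c(round(rlnorm(1000, meanlog = log(5000), sdlog = 0.3)),
         round(rlnorm(5000, meanlog = log(50), sdlog = 0.5)))

# Hypothetical helper: rank and UMI threshold at the steepest drop ("knee").
find_knee <- function(umi, skip = 100) {
  sorted <- sort(umi, decreasing = TRUE)
  log_size <- log10(sorted + 1)  # +1 guards against log10(0)
  steps <- diff(log_size)        # change between consecutive ranks (<= 0)
  steps[seq_len(skip)] <- NA     # ignore the noisy top-ranked barcodes
  knee <- which.min(steps)       # most negative step = steepest drop
  list(rank = knee, threshold = sorted[knee])
}

knee <- find_knee(umi)
knee$rank                   # barcodes ranked at or above the knee are called as cells
sum(umi >= knee$threshold)
```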
......@@ -71,7 +75,11 @@ inflection <- which(rawdiff == min(rawdiff[100:length(rawdiff)], na.rm=TRUE))
plot(barcode_rank, log_lib_size, xlim=c(1,8000))
abline(v=inflection, col="red", lwd=2)
```
<img src="cell-calling_files/figure-html/unnamed-chunk-6-1.png" width="90%" style="display: block; margin: auto;" />
```r
threshold <- 10^log_lib_size[inflection]
cells <- umi_per_barcode[umi_per_barcode[,2] > threshold,1]
......@@ -80,6 +88,10 @@ Recall <- sum(cells %in% truth[,1])/length(truth[,1])
c(TPR, Recall)
```
```
## [1] 1.0000000 0.7831707
```
### Mixture model
Another approach is to fit a mixture model and find where the higher and lower distributions intersect. However, the data may not fit the assumed distributions very well:
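Once a two-component normal mixture is fitted, the threshold is the point between the two means where the component densities are equal. This is a minimal sketch that assumes the components are well separated, so the densities cross between the means. `gaussian_crossing()` is a hypothetical helper. With the default equal weights, it compares unweighted densities, as `p1` and `p2` do in the chunk below. The optional `lambda` argument can take the mixing proportions that `normalmixEM()` also reports.

```r
# Hypothetical helper: where two (optionally weighted) normal densities cross.
# Assumes well-separated components, so the weighted densities change sign
# between the two means; otherwise uniroot() stops with an error.
gaussian_crossing <- function(mu, sigma, lambda = c(1, 1)) {
  lo <- which.min(mu)  # lower-count component (e.g. empty droplets)
  hi <- which.max(mu)  # higher-count component (e.g. real cells)
  density_gap <- function(x) {
    lambda[lo] * dnorm(x, mean = mu[lo], sd = sigma[lo]) -
      lambda[hi] * dnorm(x, mean = mu[hi], sd = sigma[hi])
  }
  uniroot(density_gap, lower = mu[lo], upper = mu[hi], tol = 1e-8)$root
}

# Equal spreads: the densities cross halfway between the means.
gaussian_crossing(mu = c(4, 2), sigma = c(0.5, 0.5))
# With a fitted mixtools model one could call, for example:
# 10^gaussian_crossing(mix$mu, mix$sigma)  # back from log10 to UMI counts
```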
......@@ -89,8 +101,32 @@ Another is to fix a mixture model and find where the higher and lower distributi
set.seed(-92497)
# mixture model
require("mixtools")
```
```
## Loading required package: mixtools
```
```
## mixtools package, version 1.1.0, Released 2017-03-10
## This package is based upon work supported by the National Science Foundation under Grant No. SES-0518772.
```
```r
mix <- normalmixEM(log_lib_size)
```
```
## number of iterations= 43
```
```r
plot(mix, which=2, xlab2="log(mol per cell)")
```
<img src="cell-calling_files/figure-html/unnamed-chunk-7-1.png" width="90%" style="display: block; margin: auto;" />
```r
p1 <- dnorm(log_lib_size, mean=mix$mu[1], sd=mix$sigma[1])
p2 <- dnorm(log_lib_size, mean=mix$mu[2], sd=mix$sigma[2])
if (mix$mu[1] < mix$mu[2]) {
......@@ -122,6 +158,8 @@ thresh = totals[round(0.01*n_cells)]/10
plot(totals, xlim=c(1,8000))
abline(h=thresh, col="red", lwd=2)
```
<img src="cell-calling_files/figure-html/unnamed-chunk-9-1.png" width="90%" style="display: block; margin: auto;" />
__Exercise__
Identify cells using this threshold and calculate the TPR and Recall.
......
......@@ -105,7 +105,7 @@ to scRNA-seq data by building a graph where each vertice represents a cell
and (weight of) the edge measures similarity between two cells.
In fact, graph-based clustering is the most popular clustering approach in
scRNA-seq data analysis, and has been reported to outperform other
clustering methods in many situations [@freytag2018comparison].
##### Why do we want to represent the data as a graph?\
......@@ -120,12 +120,12 @@ clustering methods in many situations (ref).
- __Step2__: Add weights, and obtain a shared nearest neighbour (__SNN__) graph
<center>![](figures/SNN.jpg){width=40%}</center>
There are two ways of adding weights: number and rank.\
- _number_: The number of shared nodes between $u$ and $v$, in this case, 3. \
- _rank_: A measurement of the closeness to their common nearest neighbours [@xu2015identification]. \
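The _number_ weighting can be sketched in a few lines of base R. This is a simplified illustration that rests on several assumptions made here. It uses Euclidean distances on a small matrix of cells. Each cell's neighbour set excludes the cell itself. Every pair of cells gets a weight, possibly zero. Real implementations may differ in these details. `knn_sets()` and `snn_number_weights()` are hypothetical helpers.

```r
# Hypothetical helper: the k nearest neighbours of each cell (rows of x).
knn_sets <- function(x, k) {
  d <- as.matrix(dist(x))  # Euclidean distances between cells
  lapply(seq_len(nrow(d)), function(i) {
    ord <- order(d[i, ])
    head(ord[ord != i], k)  # drop the cell itself, keep the k closest
  })
}

# Hypothetical helper: "number" weights, i.e. how many neighbours u and v share.
snn_number_weights <- function(nn) {
  n <- length(nn)
  w <- matrix(0, n, n)
  for (u in seq_len(n - 1)) {
    for (v in (u + 1):n) {
      w[u, v] <- w[v, u] <- length(intersect(nn[[u]], nn[[v]]))
    }
  }
  w
}

# Two well-separated groups of five cells in two dimensions.
square <- rbind(c(0, 0), c(0, 1), c(1, 0), c(1, 1), c(0.5, 0.5))
cells <- rbind(square, square + 100)
w <- snn_number_weights(knn_sets(cells, k = 3))
w[1:5, 6:10]  # no shared neighbours across the two groups
```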
<font color="#bf812d">
......@@ -142,28 +142,37 @@ $$ w(u, v) = K - s(u, v).$$
##### Quality function (Modularity)\
Modularity [@newman2004finding] is not the only quality function for graph-based clustering,
but it was one of the first attempts to embed in a compact form many questions, including
the definition of a quality function and of a null model.\
__The idea of modularity__: A random graph should not have a cluster structure. \
The higher the quality of a partition relative to a random graph, the "better" the partition is.\
Specifically, it is defined by:
the <font color="#bf812d"> quality </font> of a partition on the actual graph $-$ the quality of the same partition on a <font color="#bf812d"> random graph </font>
<font color="#bf812d"> quality </font>: Sum of the weights within clusters \
<font color="#bf812d"> random graph </font>: a copy of the original graph that keeps some of its properties but has no community structure. In the random graph used by modularity, each node has the same degree as in the original graph.
$$ Q \propto \sum_{i, j} A_{i, j} \delta(i, j) - \sum_{i, j} \dfrac{k_i k_j}{2m} \delta(i, j)$$
- $A_{i, j}$: weight between node $i$ and $j$;
- $\delta(i, j)$: indicator of whether $i$ and $j$ are in the same cluster;
- $k_i$: the degree of node $i$ (the sum of weights of all edges connected to $i$);
- $m$: the total weight of all edges in the graph.
__Higher modularity implies better partition__:
<center>![](figures/modularity.jpg){width=80%}</center>
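The definition above can be computed directly. This sketch uses the usual normalisation $Q = \frac{1}{2m}\sum_{i,j}\left[A_{i,j} - \frac{k_i k_j}{2m}\right]\delta(i,j)$, where $\frac{1}{2m}$ is the constant hidden by the $\propto$. `modularity_q()` is a hypothetical helper for small dense adjacency matrices. It does not reflect how igraph computes modularity internally.

```r
# Hypothetical helper: modularity of a partition of a weighted, undirected graph.
# A: symmetric adjacency (weight) matrix; membership: cluster label per node.
modularity_q <- function(A, membership) {
  k <- rowSums(A)                              # k_i: weighted degree
  two_m <- sum(A)                              # 2m: every edge is counted twice
  same <- outer(membership, membership, "==")  # delta(i, j)
  sum((A - outer(k, k) / two_m) * same) / two_m
}

# Two disjoint triangles (nodes 1-3 and 4-6).
A <- matrix(0, 6, 6)
for (e in list(c(1, 2), c(1, 3), c(2, 3), c(4, 5), c(4, 6), c(5, 6))) {
  A[e[1], e[2]] <- A[e[2], e[1]] <- 1
}
modularity_q(A, c(1, 1, 1, 2, 2, 2))  # one cluster per triangle
modularity_q(A, rep(1, 6))            # everything in a single cluster
```

Splitting along the triangles scores higher than lumping all nodes together, which matches the idea that a higher modularity indicates a better partition.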
__Limits of modularity__: [@good2010performance]\
1. Resolution limit. \
Short version: Modularity maximization forces small communities into larger ones. \
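Longer version: For two clusters $A$ and $B$ that are joined by at least one edge, if $k_A k_B < 2m$ (where $k_A$ and $k_B$ are the total degrees of the nodes in $A$ and $B$), then merging A and B into a single cluster increases modularity, even if A and B are distinct clusters.\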
......@@ -179,12 +188,13 @@ __Limits of modularity__: \
Modularity-based clustering methods implemented in single-cell analysis are mostly greedy algorithms
that are very fast, although they are not the most accurate approaches.
&nbsp; &nbsp; __Louvain__: [@blondel2008fast]
<center>![](figures/Louvain.jpg){width=80%}</center>
&nbsp; &nbsp; __Leiden__: [@traag2019louvain] \
An improved version of Louvain: a hybrid of a greedy algorithm and a sampling technique \
##### __Advantages__: \
-Fast \
......
This diff is collapsed.
......@@ -62,11 +62,11 @@ Perform Louvain clustering:
```r
cl <- igraph::cluster_louvain(deng15)$membership
colData(deng)$cl <- factor(cl)
mclust::adjustedRandIndex(colData(deng)$cell_type2, colData(deng)$cl)
```
```
## [1] 0.4197754
```
The clusters follow the main developmental stages of the original annotation, although the agreement with the fine-grained `cell_type2` labels is only moderate.
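The adjusted Rand index (ARI) compares two partitions by counting pairs of cells that are grouped together in both partitions. It then corrects this count for the agreement expected by chance. The ARI is 1 for identical partitions and has an expected value of 0 for random labelings. Below is a minimal base-R sketch of the standard Hubert and Arabie formula. `ari()` is a hypothetical helper. In the course code, `mclust::adjustedRandIndex()` does this job.

```r
# Hypothetical helper: adjusted Rand index between two labelings of the same cells.
ari <- function(x, y) {
  tab <- table(x, y)                       # contingency table of the two labelings
  pairs <- function(counts) sum(choose(counts, 2))
  index <- pairs(tab)                      # pairs of cells together in both
  a <- pairs(rowSums(tab))                 # pairs together in x
  b <- pairs(colSums(tab))                 # pairs together in y
  expected <- a * b / choose(sum(tab), 2)  # agreement expected by chance
  (index - expected) / ((a + b) / 2 - expected)
}

ari(c(1, 1, 2, 2), c("a", "a", "b", "b"))      # same partition, different labels
ari(c(1, 1, 1, 2, 2, 2), c(1, 1, 2, 2, 3, 3))  # partial agreement
```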
......@@ -74,18 +74,22 @@ However, it tend to merge small clusters into larger ones.
```r
table(deng$cell_type2, cl)
```
```
## cl
## 1 2 3
## 16cell 49 0 1
## 4cell 0 14 0
## 8cell 36 0 1
## early2cell 0 8 0
## earlyblast 0 0 43
## late2cell 0 10 0
## lateblast 0 0 30
## mid2cell 0 12 0
## midblast 0 0 60
## zy 0 4 0
```
......@@ -141,25 +145,55 @@ table(muraro$cell_type1, cl)
Let's run `SC3` clustering on the Deng data. An advantage of `SC3` is that it can directly ingest a `SingleCellExperiment` object.
Suppose we do not know the number of clusters _k_ (cell types). `SC3` can estimate it for us:
```r
deng <- sc3_estimate_k(deng)
```
Interestingly, the number of cell types predicted by `SC3` is smaller than in the original data annotation. However, if we merge the early, mid and late stages of the different cell types, we get exactly 6 cell types. We store the merged cell types in the `cell_type1` column of the `colData` slot:
```
## Estimating k...
```
```r
plotPCA(deng, colour_by = "cell_type1")
metadata(deng)$sc3$k_estimation
```
```
## [1] 6
```
Next we run `SC3` (we also ask it to calculate biological properties of the clusters):
```r
deng <- sc3(deng, ks = 10, biology = TRUE, n_cores = 1)
```
```
## Setting SC3 parameters...
```
```
## Calculating distances between the cells...
```
```
## Performing transformations and calculating eigenvectors...
```
```
## Performing k-means clustering...
```
```
## Calculating consensus matrix...
```
```
## Calculating biology...
```
The `SC3` result consists of several different outputs (see [@Kiselev2016-bq] and the [SC3 vignette](http://bioconductor.org/packages/release/bioc/vignettes/SC3/inst/doc/my-vignette.html) for more details). Here we show some of them:
Consensus matrix:
......@@ -174,30 +208,42 @@ Silhouette plot:
sc3_plot_silhouette(deng, k = 10)
```
<img src="clustering_files/figure-html/unnamed-chunk-7-1.png" width="672" style="display: block; margin: auto;" />
Heatmap of the expression matrix:
```r
sc3_plot_expression(deng, k = 10, show_pdata = "cell_type2")
```
<img src="clustering_files/figure-html/unnamed-chunk-8-1.png" width="672" style="display: block; margin: auto;" />
Identified marker genes:
```r
sc3_plot_markers(deng, k = 10, show_pdata = "cell_type2")
```
<img src="clustering_files/figure-html/unnamed-chunk-9-1.png" width="672" style="display: block; margin: auto;" />
PCA plot with highlighted `SC3` clusters:
```r
plotPCA(deng, colour_by = "sc3_10_clusters")
```
<img src="clustering_files/figure-html/unnamed-chunk-10-1.png" width="672" style="display: block; margin: auto;" />
Compare the results of `SC3` clustering with the original publication cell type labels:
```r
mclust::adjustedRandIndex(colData(deng)$cell_type2, colData(deng)$sc3_10_clusters)
```
```
## [1] 0.7796181
```
__Note__ `SC3` can also be run in an interactive `Shiny` session:
```r
......@@ -233,7 +279,9 @@ __Note__ Due to direct calculation of distances `SC3` becomes very slow when the
<center> ![](figures/SingleR_score.png){width=80%} </center>
- __Step4. Fine tuning__\
We could stop here and assign each cell the label with the highest score; in fact, if we set the argument `fine.tune = FALSE`, that is exactly what the package function `SingleR` does.
But there is one more question: what if the second-highest score is very close to the highest? Say, 1, 1, 1, 9.5, 10.
`SingleR` sets a threshold to define how close is "very close"; the default is 0.05.
Only the cells that fall into this category go back to Step2.
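The rule that decides which cells go back to Step2 can be sketched as follows. The sketch assumes `scores` holds one cell's scores (e.g. correlations) against each reference label, and it uses the 0.05 threshold mentioned above. `close_labels()` is a hypothetical helper that mirrors the description here. It is not `SingleR`'s internal code, and the scores are made up.

```r
# Hypothetical helper: reference labels whose score is within `thresh` of the best.
close_labels <- function(scores, thresh = 0.05) {
  names(scores)[scores >= max(scores) - thresh]
}

# Illustrative scores for a single cell (made-up values).
scores <- c(Tcell = 0.41, Bcell = 0.43, NK = 0.60, Mono = 0.62, DC = 0.64)
candidates <- close_labels(scores)
if (length(candidates) > 1) {
  # Several labels are "very close": this cell would go back to Step2,
  # considering only these candidate labels.
  message("fine-tune among: ", paste(candidates, collapse = ", "))
} else {
  message("assign: ", candidates)
}
```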
#### Example
......@@ -345,11 +393,11 @@ metadata(sceM)$scmap_cell_index$subclusters[1:5,1:5]
```
## D28.1_1 D28.1_13 D28.1_15 D28.1_17 D28.1_2
## [1,] 6 11 7 38 36
## [2,] 1 16 17 44 38
## [3,] 28 17 4 45 25
## [4,] 43 41 40 33 22
## [5,] 36 27 29 11 35
```
......@@ -418,4 +466,4 @@ plot(
Among the 2126 cells in the data, only 89 are annotated as different labels as the
<li class="chapter" data-level="11.1.1" data-path="trajectory-inference.html"><a href="trajectory-inference.html#tscan"><i class="fa fa-check"></i><b>11.1.1</b> TSCAN</a></li>
<li class="chapter" data-level="11.1.2" data-path="trajectory-inference.html"><a href="trajectory-inference.html#slingshot"><i class="fa fa-check"></i><b>11.1.2</b> Slingshot</a></li>
<li class="chapter" data-level="11.1.3" data-path="trajectory-inference.html"><a href="trajectory-inference.html#monocle"><i class="fa fa-check"></i><b>11.1.3</b> Monocle</a></li>
<li class="chapter" data-level="11.1.4" data-path="trajectory-inference.html"><a href="trajectory-inference.html#monocle-2"><i class="fa fa-check"></i><b>11.1.4</b> Monocle 2</a></li>
<li class="chapter" data-level="11.1.5" data-path="trajectory-inference.html"><a href="trajectory-inference.html#monocle-3"><i class="fa fa-check"></i><b>11.1.5</b> Monocle 3</a></li>
<li class="chapter" data-level="11.1.6" data-path="trajectory-inference.html"><a href="trajectory-inference.html#diffusion-maps"><i class="fa fa-check"></i><b>11.1.6</b> Diffusion maps</a></li>
<li class="chapter" data-level="11.1.7" data-path="trajectory-inference.html"><a href="trajectory-inference.html#other-methods"><i class="fa fa-check"></i><b>11.1.7</b> Other methods</a></li>
<li class="chapter" data-level="11.1.8" data-path="trajectory-inference.html"><a href="trajectory-inference.html#comparison-of-the-methods"><i class="fa fa-check"></i><b>11.1.8</b> Comparison of the methods</a></li>
<li class="chapter" data-level="11.1.9" data-path="trajectory-inference.html"><a href="trajectory-inference.html#expression-of-genes-through-time"><i class="fa fa-check"></i><b>11.1.9</b> Expression of genes through time</a></li>
<li class="chapter" data-level="11.1.10" data-path="trajectory-inference.html"><a href="trajectory-inference.html#dynverse"><i class="fa fa-check"></i><b>11.1.10</b> dynverse</a></li>
<li class="chapter" data-level="11.1.11" data-path="trajectory-inference.html"><a href="trajectory-inference.html#sessioninfo-7"><i class="fa fa-check"></i><b>11.1.11</b> sessionInfo()</a></li>
<li class="chapter" data-level="11.1.3" data-path="trajectory-inference.html"><a href="trajectory-inference.html#gam-general-additive-model-for-identifying-temporally-expressed-genes"><i class="fa fa-check"></i><b>11.1.3</b> GAM general additive model for identifying temporally expressed genes</a></li>
<li class="chapter" data-level="11.1.4" data-path="trajectory-inference.html"><a href="trajectory-inference.html#monocle"><i class="fa fa-check"></i><b>11.1.4</b> Monocle</a></li>
<li class="chapter" data-level="11.1.5" data-path="trajectory-inference.html"><a href="trajectory-inference.html#monocle-2"><i class="fa fa-check"></i><b>11.1.5</b> Monocle 2</a></li>
<li class="chapter" data-level="11.1.6" data-path="trajectory-inference.html"><a href="trajectory-inference.html#monocle-3"><i class="fa fa-check"></i><b>11.1.6</b> Monocle 3</a></li>
<li class="chapter" data-level="11.1.7" data-path="trajectory-inference.html"><a href="trajectory-inference.html#diffusion-maps"><i class="fa fa-check"></i><b>11.1.7</b> Diffusion maps</a></li>
<li class="chapter" data-level="11.1.8" data-path="trajectory-inference.html"><a href="trajectory-inference.html#other-methods"><i class="fa fa-check"></i><b>11.1.8</b> Other methods</a></li>
<li class="chapter" data-level="11.1.9" data-path="trajectory-inference.html"><a href="trajectory-inference.html#comparison-of-the-methods"><i class="fa fa-check"></i><b>11.1.9</b> Comparison of the methods</a></li>
<li class="chapter" data-level="11.1.10" data-path="trajectory-inference.html"><a href="trajectory-inference.html#expression-of-genes-through-time"><i class="fa fa-check"></i><b>11.1.10</b> Expression of genes through time</a></li>
<li class="chapter" data-level="11.1.11" data-path="trajectory-inference.html"><a href="trajectory-inference.html#dynverse"><i class="fa fa-check"></i><b>11.1.11</b> dynverse</a></li>
<li class="chapter" data-level="11.1.12" data-path="trajectory-inference.html"><a href="trajectory-inference.html#sessioninfo-7"><i class="fa fa-check"></i><b>11.1.12</b> sessionInfo()</a></li>
</ul></li>
</ul></li>
<li class="chapter" data-level="12" data-path="dechapter.html"><a href="dechapter.html"><i class="fa fa-check"></i><b>12</b> Differential Expression (DE) analysis</a><ul>
......@@ -534,22 +534,22 @@ the 15/03/16. We will use these copies for reproducibility purposes.</p>
</div>
<div id="pancreas" class="section level2">
<h2><span class="header-section-number">4.3</span> Pancreas</h2>
<p>We have included two human pancreas datasets: from Muraro et al (2016) and
Segerstolpe et al. (2016). Since the pancreas has been widely studied, these
<p>We have included two human pancreas datasets: from Muraro et al (2016) <span class="citation">(Muraro et al. <a href="#ref-Muraro2016-yk">2016</a>)</span> and
Segerstolpe et al. (2016) <span class="citation">(Segerstolpe et al. <a href="#ref-Segerstolpe2016-wc">2016</a>)</span>. Since the pancreas has been widely studied, these
datasets are well annotated.</p>
<div id="muraro" class="section level3">
<h3><span class="header-section-number">4.3.1</span> Muraro</h3>
<p>Single-cell CEL-seq data were generated using a customised automated platform
<p>Single-cell CEL-seq2 data were generated using a customised automated platform
that uses FACS, robotics, and the CEL-Seq2 protocol to obtain the transcriptomes
of thousands of single pancreatic cells from four deceased organ donors. Cell
surface markers can be used for sorting and enriching certain cell types.</p>
surface markers can be used for sorting and enriching certain cell types.<span class="citation">(Muraro et al. <a href="#ref-Muraro2016-yk">2016</a>)</span></p>
<p><a href="https://www.ncbi.nlm.nih.gov/pubmed/27693023">Muraro,M.J. et al. (2016) A Single-Cell Transcriptome Atlas of the Human Pancreas. Cell Syst, 3, 385–394.e3.</a></p>
</div>
<div id="segerstolpe" class="section level3">
<h3><span class="header-section-number">4.3.2</span> Segerstolpe</h3>
<p>Single-cell RNA-seq dataset of human pancreatic cells from patients with type 2
diabetes and healthy controls. Single cells were prepared using Smart-seq2
protocol and sequenced on an Illumina HiSeq 2000.</p>
protocol and sequenced on an Illumina HiSeq 2000.<span class="citation">(Segerstolpe et al. <a href="#ref-Segerstolpe2016-wc">2016</a>)</span></p>
<p><a href="https://www.ncbi.nlm.nih.gov/pubmed/27667667">Segerstolpe,Å. et al. (2016) Single-Cell Transcriptome Profiling of Human Pancreatic Islets in Health and Type 2 Diabetes. Cell Metab., 24, 593–607.</a></p>
</div>
</div>
......@@ -759,6 +759,15 @@ rather than loading the whole thing into RAM.</p>
<h2><span class="header-section-number">4.13</span> Advanced Exercise</h2>
<p>Write an R function/script which will fully automate this procedure for each data-type for any tissue.</p>
</div>
</div>