BioCellGenpublic / MIG_2019_scRNAseqworkshop
Commit 4b7ce491, authored Oct 01, 2019 by Puxue Qiao
slalom methodology
Parent: 2bf1d423
Pipeline #932 passed with stage in 5 seconds
Changes: 5
course_files/figures/FA.png
View replaced file @ 2bf1d423 (457 KB)
View file @ 4b7ce491 (299 KB)
course_files/figures/FA_matrix.png
0 → 100644
View file @
4b7ce491
100 KB
course_files/figures/slab_spike.png
0 → 100644
View file @
4b7ce491
14.8 KB
course_files/figures/slalom_anno.png
0 → 100644
View file @
4b7ce491
73.8 KB
course_files/latentspaces.Rmd
View file @
4b7ce491
...
...
@@ 3,7 +3,7 @@ output: html_document

```{r setup, echo=FALSE}
knitr::opts_chunk$set(fig.align = "center")
knitr::opts_chunk$set(fig.align = "center", eval = TRUE)
knitr::opts_knit$set(root.dir = normalizePath(".."))
```
...
...
@@ 433,22 +433,66 @@ ggplot(dt, aes(x=PHATE1, y=PHATE2, color=clust)) +
## Matrix factorization and factor analysis
Factor analysis is similar to PCA in that
both aim to obtain a new set of distinct summary variables,
which are fewer in number than the original variables.
The key concept of factor analysis is that the original, observed variables are
__The key concept of factor analysis__: The original, observed variables are
correlated because they are all associated with some unobservable variables,
called latent factors.
the __latent factors__.
It looks similar to PCA, but instead of dimensionality reduction, factor analysis
focuses on studying the latent factors.
The variance of a variable can be splitted into two parts: \
The variance of an observed variable can be split into two parts: \
 Common variance: the part of variance that is explained by latent factors; \
 Unique variance: the part that is specific to only one variable, usually considered as an error component or residual.
 Unique variance: the part that is specific to only one variable, usually considered as an error component or __residual__.
The __factor loadings__ or weights indicate how much each latent factor is affecting the observed features.
<center> ![](figures/FA.png){width=80%} </center>
<center> ![](figures/FA.png){width=60%} </center>
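As a minimal sketch of the common/unique variance split described above, the chunk below simulates data from two latent factors and fits a factor model with `factanal()` from base R. The sample size, the loading matrix `W` and the noise level are hypothetical values chosen only for illustration.

```{r}
set.seed(1)
n <- 500
# Two latent factors (unobserved in a real analysis)
f <- matrix(rnorm(n * 2), ncol = 2)
# Hypothetical loadings: variables 1-3 load on factor 1, variables 4-6 on factor 2
W <- rbind(c(0.9, 0), c(0.8, 0), c(0.7, 0),
           c(0, 0.9), c(0, 0.8), c(0, 0.7))
# Observed data = common part (f %*% t(W)) + variable-specific noise
y <- f %*% t(W) + matrix(rnorm(n * 6, sd = 0.5), ncol = 6)
colnames(y) <- paste0("v", 1:6)

fit <- factanal(y, factors = 2)
loadings_hat <- unclass(fit$loadings)     # estimated factor loadings
communality  <- rowSums(loadings_hat^2)   # variance share explained by the factors
round(cbind(loadings_hat, communality, uniqueness = fit$uniquenesses), 2)
```

`factanal()` works on the correlation scale, so the communality and uniqueness columns are shares of each variable's standardized variance.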
### [Slalom](https://bioconductor.org/packages/release/bioc/html/slalom.html): Interpretable latent spaces
Highlight of Slalom:
 It incorporates prior information to help the model estimation;
 It learns whatever is not provided by prior knowledge during model training;
 It enforces sparsity in the weight matrix.
#### Methodology
__Matrix expression of factor analysis:__
<center>![](figures/FA_matrix.png){width=80%} </center>
__How prior knowledge affects the model:__
<center>![](figures/slalom_anno.png) </center>
 $I_{g, k}$: (observed) Indicator of whether a gene $g$ is annotated to a given pathway or factor $k$;\
 $z_{g, k}$: (latent) Indicator of whether factor $k$ has a regulatory effect on gene $g$;\
 $w_{g, k}$: (estimated) weights.
__grey arrow__:
$$ P(I_{g, k}\vert z_{g, k}) = \begin{cases}
\text{Bernoulli}(p_1), \text{if } z_{g, k} = 1\\
\text{Bernoulli}(p_2), \text{if } z_{g, k} = 0\\
\end{cases}$$
__green arrow__:
$$ P(w_{g, k}\vert z_{g, k}) = \begin{cases}
N(w_{g, k} \vert 0, 1/\alpha), \text{ if } z_{g, k} = 1\\
\delta_0(w_{g, k}), \text{ if } z_{g, k} = 0\\
\end{cases}$$
<center>![](figures/slab_spike.png)</center>
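To make the two conditional distributions above concrete, here is a small simulation sketch of the generative process for a single factor $k$. It assumes the slab is a zero-mean normal with variance $1/\alpha$ and that $z_{g,k}$ is Bernoulli(1/2). The values of `p1`, `p2`, `alpha` and `n_genes` are hypothetical and are not slalom defaults.

```{r}
set.seed(42)
n_genes <- 1000
p1 <- 0.95    # assumed P(I = 1 | z = 1)
p2 <- 0.05    # assumed P(I = 1 | z = 0)
alpha <- 4    # assumed slab precision, so the slab variance is 1 / alpha

# Latent indicator: does the factor regulate gene g?
z <- rbinom(n_genes, size = 1, prob = 0.5)

# Grey arrow: the observed annotation I depends on z
I <- rbinom(n_genes, size = 1, prob = ifelse(z == 1, p1, p2))

# Green arrow: slab (normal draw) when z = 1, spike (exactly 0) when z = 0
w <- ifelse(z == 1, rnorm(n_genes, mean = 0, sd = sqrt(1 / alpha)), 0)

table(z = z, I = I)   # annotations mostly agree with z
tapply(w, z, var)     # zero variance in the spike group
```

The spike at zero is what enforces sparsity: genes with $z_{g,k} = 0$ get a weight of exactly zero rather than a small non-zero value.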
We only look at the part of the __likelihood__ that is relevant to this part of the model:
$\prod_{g} \prod_{k}P(I_{g, k}, w_{g, k}, z_{g, k})$, \
where $P(I_{g, k}, w_{g, k}, z_{g, k}) = P(I_{g, k}, w_{g, k} \vert z_{g, k})P(z_{g,k})
= P(I_{g, k} \vert z_{g, k})P(w_{g, k} \vert z_{g, k})P(z_{g,k})$.
Since nothing is known about $z_{g,k}$ a priori, it is assumed to follow a Bernoulli(1/2) distribution.
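As a worked illustration of how the annotation alone updates the belief about $z_{g,k}$, combine the Bernoulli(1/2) prior with the annotation model (the grey arrow) using Bayes' rule. This ignores the contribution of $w_{g,k}$ and the data, so it is only a sketch of one piece of the inference:

$$ P(z_{g,k} = 1 \vert I_{g,k} = 1) = \frac{p_1 \cdot \frac{1}{2}}{p_1 \cdot \frac{1}{2} + p_2 \cdot \frac{1}{2}} = \frac{p_1}{p_1 + p_2}, \qquad
P(z_{g,k} = 1 \vert I_{g,k} = 0) = \frac{1 - p_1}{(1 - p_1) + (1 - p_2)} $$

For example, with the hypothetical values $p_1 = 0.99$ and $p_2 = 0.001$, an annotated gene has $P(z_{g,k} = 1 \vert I_{g,k} = 1) = 0.99 / 0.991 \approx 0.999$, while an unannotated gene has $P(z_{g,k} = 1 \vert I_{g,k} = 0) = 0.01 / 1.009 \approx 0.0099$.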
#### Example
First, get a geneset in a `GeneSetCollection` object.
```{r}
gmtfile <- system.file("extdata", "reactome_subset.gmt", package = "slalom")
...
...