### slalom methodology

parent 2bf1d423
Pipeline #932 passed with stage in 5 seconds
@@ -3,7 +3,7 @@ output: html_document
 ---
 ```{r setup, echo=FALSE}
-knitr::opts_chunk$set(fig.align = "center")
+knitr::opts_chunk$set(fig.align = "center", eval = TRUE)
 knitr::opts_knit$set(root.dir = normalizePath(".."))
 ```
@@ -433,22 +433,66 @@ ggplot(dt, aes(x=PHATE1, y=PHATE2, color=clust)) +
 ## Matrix factorization and factor analysis
 Factor Analysis is similar to PCA in that they both aim to obtain a new set of distinct summary variables, which are fewer in number than the original number of variables.
-The key concept of factor analysis is that the original, observed variables are
+__The key concept of factor analysis__: The original, observed variables are
 correlated because they are all associated with some unobservable variables,
-called latent factors.
+the __latent factors__.
 It looks similar to PCA, but instead of dimensionality reduction, factor analysis focuses on studying the latent factors.
-The variance of a variable can be split into two parts: \
+The variance of an observed variable can be split into two parts: \
 - Common variance: the part of variance that is explained by latent factors; \
-- Unique variance: the part that is specific to only one variable, usually considered as an error component or residual.
+- Unique variance: the part that is specific to only one variable, usually considered as an error component or __residual__.
+The __factor loadings__ or weights indicate how much each latent factor is affecting the observed features.
-![](figures/FA.png){width=80%}
+![](figures/FA.png){width=60%}
 ### [Slalom](https://bioconductor.org/packages/release/bioc/html/slalom.html): Interpretable latent spaces
 Highlights of Slalom:
 - It incorporates prior information to help the model estimation;
 - It learns whatever is not provided by prior knowledge during model training;
 - It enforces sparsity in the weight matrix.
+#### Methodology
+__Matrix expression of factor analysis:__
+![](figures/FA_matrix.png){width=80%}
+__How prior knowledge affects the model:__
+![](figures/slalom_anno.png)
+- $I_{g, k}$: (observed) indicator of whether gene $g$ is annotated to a given pathway or factor $k$;\
+- $z_{g, k}$: (latent) indicator of whether factor $k$ has a regulatory effect on gene $g$;\
+- $w_{g, k}$: (estimated) weights.
+__grey arrow__:
+$$P(I_{g, k}\vert z_{g, k}) =
+\begin{cases}
+\text{Bernoulli}(p_1), & \text{if } z_{g, k} = 1\\
+\text{Bernoulli}(p_2), & \text{if } z_{g, k} = 0
+\end{cases}$$
+__green arrow__:
+$$P(w_{g, k}\vert z_{g, k}) =
+\begin{cases}
+N(w_{g, k} \vert 0, 1/\alpha), & \text{if } z_{g, k} = 1\\
+\delta_0(w_{g, k}), & \text{if } z_{g, k} = 0
+\end{cases}$$
+![](figures/slab_spike.png)
+We only look at the part of the __likelihood__ that is relevant to this part of the model: $\prod_{g} \prod_{k}P(I_{g, k}, w_{g, k}, z_{g, k})$, \
+where $P(I_{g, k}, w_{g, k}, z_{g, k}) = P(I_{g, k}, w_{g, k}\vert z_{g, k})P(z_{g,k}) = P(I_{g, k}\vert z_{g, k})P(w_{g, k}\vert z_{g, k})P(z_{g,k})$.
+Since we do not know anything about $z_{g,k}$ in advance, it is assumed to follow a Bernoulli(1/2) distribution.
+#### Example
 First, get a geneset in a GeneSetCollection object.
 ```{r}
 gmtfile <- system.file("extdata", "reactome_subset.gmt", package = "slalom")
... ...
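
As a worked illustration of the grey-arrow term, Bayes' rule can be applied to the annotation model together with the Bernoulli(1/2) prior on $z_{g,k}$. This is only a sketch: it considers the annotation $I_{g,k}$ alone and ignores the weights and the expression data, so it is not the full Slalom posterior, and it reads $\text{Bernoulli}(p)$ as $P(I_{g,k} = 1) = p$.

$$P(z_{g,k} = 1 \vert I_{g,k} = 1) = \frac{P(I_{g,k} = 1 \vert z_{g,k} = 1)\,P(z_{g,k} = 1)}{\sum_{z \in \{0, 1\}} P(I_{g,k} = 1 \vert z_{g,k} = z)\,P(z_{g,k} = z)} = \frac{p_1 \cdot \tfrac{1}{2}}{p_1 \cdot \tfrac{1}{2} + p_2 \cdot \tfrac{1}{2}} = \frac{p_1}{p_1 + p_2}$$

$$P(z_{g,k} = 1 \vert I_{g,k} = 0) = \frac{(1 - p_1) \cdot \tfrac{1}{2}}{(1 - p_1) \cdot \tfrac{1}{2} + (1 - p_2) \cdot \tfrac{1}{2}} = \frac{1 - p_1}{2 - p_1 - p_2}$$

With hypothetical values $p_1 = 0.99$ and $p_2 = 0.01$ (chosen here only for illustration, not taken from the package), an annotated gene has $P(z_{g,k} = 1 \vert I_{g,k} = 1) = 0.99$ and an unannotated gene has $P(z_{g,k} = 1 \vert I_{g,k} = 0) = 0.01$. Under these assumed values the annotation acts as a strong but not absolute prior on which genes a factor regulates.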