Skip to content
GitLab
Explore
Sign in
Primary navigation
Search or go to…
Project
M
MAGE_2020_Marker-Gene-Benchmarking
Manage
Activity
Members
Labels
Plan
Issues
Issue boards
Milestones
Wiki
Code
Merge requests
Repository
Branches
Commits
Tags
Repository graph
Compare revisions
Snippets
Build
Pipelines
Jobs
Pipeline schedules
Artifacts
Deploy
Releases
Package registry
Model registry
Operate
Environments
Terraform modules
Monitor
Incidents
Analyze
Value stream analytics
Contributor analytics
CI/CD analytics
Repository analytics
Model experiments
Help
Help
Support
GitLab documentation
Compare GitLab plans
Community forum
Contribute to GitLab
Provide feedback
Keyboard shortcuts
?
Snippets
Groups
Projects
Show more breadcrumbs
BioCellGen-public
MAGE_2020_Marker-Gene-Benchmarking
Commits
aeb023f8
Commit
aeb023f8
authored
4 years ago
by
Jeffrey Pullin
Browse files
Options
Downloads
Patches
Plain Diff
Delete old `simulation.Rmd`
parent
27e64466
No related branches found
No related tags found
No related merge requests found
Changes
1
Hide whitespace changes
Inline
Side-by-side
Showing
1 changed file
analysis/simulation.Rmd
+0
-153
0 additions, 153 deletions
analysis/simulation.Rmd
with
0 additions
and
153 deletions
analysis/simulation.Rmd
deleted
100644 → 0
+
0
−
153
View file @
27e64466
---
title: "Simulation"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
```{r libraries, message = FALSE, warning = FALSE}
library(splatter)
library(scran)
library(scater)
library(pheatmap)
```
## Aim
This document aims to outline the simulation framework used in this project
## Analysis
We will use {splatter} as the basic simulation framework for the project.
We will use `method = "groups"` as the basic setup for the simulations.
The relevant parameters for the groups method are:
* `nGroups`: number of groups (not set directly)
* `group.prob`: probability of cell being in each group
In {splatter} groups are created by simulating a random number of DE genes in each group. The relevant parameters for the DE method are:
* `de.prob`: probability that a gene is DE in any group (default: 0.1)
* `de.downProb`: probability that a DE gene is down-regulated (default: 0.5)
* `de.facLoc`: location (meanlog) of the DE factor log-normal distribution
```{r sim-splatter}
n_groups <- 5
# Can't set nCells directly but batchCells = nCells if 1 batch.
params <- newSplatParams(group.prob = rep(1/n_groups, n_groups),
de.facLoc = 2,
batchCells = 1000)
splatter_sim <- splatSimulate(params, method = "groups", verbose = FALSE)
splatter_sim
```
After we do the simulation we need to do several things to the simulated SingleCellExperiment object:
1. Extract the indices of the DE genes in each group. These are the genes with FacLoc not equal to 1.
```{r extract-de-genes}
# From code/simulation.R
# Should this just return the gene names?
extract_de_inds <- function(sce) {
stopifnot(is(sce, "SingleCellExperiment"))
n_groups <- length(unique(colData(sce)$Group))
col_names <- paste0("DEFacGroup", 1:n_groups)
data <- rowData(sce)[, col_names]
out <- lapply(data, function(x) which(x != 1))
names(out) <- paste0("group_", 1:n_groups)
out
}
de_inds <- extract_de_inds(splatter_sim)
str(de_inds)
```
Next, we need to process the object into the form that can be used by marker gene methods. Specifically we need to:
* Filter genes? (not at the moment)
* Normalize the data (using just log-counts for now)
* Make the colLabels the group id
NB: quickClusters no longer warns when nCells = 1000
```{r process-sim}
quick_clusters <- quickCluster(splatter_sim)
# Gives message:
# assuming UMI data when setting 'min.mean'
splatter_sim <- computeSumFactors(splatter_sim, clusters = quick_clusters)
splatter_sim <- logNormCounts(splatter_sim)
colLabels(splatter_sim) <- colData(splatter_sim)$Group
```
We perform EDA to check that the simulation produces reasonable data
NB: 1000 cells/10000 genes is computationally manageable
```{r eda}
dec_splatter_sim <- modelGeneVarByPoisson(splatter_sim)
splatter_sim <- denoisePCA(splatter_sim, technical = dec_splatter_sim)
plotPCA(splatter_sim, colour_by = "Group")
splatter_sim <- runTSNE(splatter_sim, dimred = "PCA")
plotTSNE(splatter_sim, colour_by = "Group")
```
PCA, tSNE show grouping but the tSNA appears weaker.
When de.FacLoc is increased to 2 then the clusters are very well seperated.
With the transformed data we can run the marker gene selection methods. For now we run scran only.
```{r run-scran}
scran_mgs <- findMarkers(splatter_sim, pval.type = "all")
```
With the output we can then calculate various summaries of quality of the calculated marker genes.
Initially we will focus on the markers for group 1 only.
One important question is how to choose the top marker genes from the {scran} output.
```{r mg-metrics}
scran_group_1_mgs <- scran_mgs[[1]]
# This selects the the top 6 genes in each pairwise comparison.
# scran_group_1_mgs <- scran_group_1_mgs[scran_group_1_mgs$Top <= 2, ]
# Just select the top 30
scran_group_1_mgs <- scran_group_1_mgs[1:30, ]
scran_group_1_nums <- readr::parse_number(rownames(scran_group_1_mgs))
length(scran_group_1_nums)
length(intersect(scran_group_1_nums, de_inds$group_1))
```
Really poor performance even with the the simplest possible simulation... The clusters are clear and number of cells is large so this is unexpected...
Even when the number of MGs selected is small many selected are not real marker genes
Let's try to understand why the performance is so bad...
```{r mg-eda}
logFCs <- getMarkerEffects(scran_group_1_mgs)
pheatmap(logFCs, breaks = seq(-5, 5, length.out=101))
```
Need to test the different scran options
`pval.type = "all"` gives much better performance in simple simulation
## Seurat
This diff is collapsed.
Click to expand it.
Preview
0%
Loading
Try again
or
attach a new file
.
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Save comment
Cancel
Please
register
or
sign in
to comment