Commit 05efe431 authored by Jeffrey Pullin: "Update notes"
This file contains information about the data used in the project.
To best assess the methods, we need data for which the cell types have known marker genes, preferably 10-30 per cell type. This suggests focusing on data for which the cell types are well understood and well characterized.
Some collections of data which may contain suitable datasets are:
* https://eyeintegration.nei.nih.gov/
* https://singlecell.broadinstitute.org/single_cell
* https://www.ebi.ac.uk/gxa/sc/home
* https://doi.org/10.1093/database/baaa073
Once we have the data we also need to find lists of the marker genes, which may be even more challenging: most papers don't seem to include the markers they use in the final publication.
### Bryan 2018
Supplementary Table 9 contains candidate marker gene (signature gene) lists for a selection of eye cell types. These were found in the following way:
"genes overexpressed in the bulk retina tissue relative to the synthetic body set, in the top 20% of expression in a particular retina cell type, and in the bottom 50% for the remaining retina cell type"
Most of the cell types have 5-10 marker genes. These are bulk data driven but may be useful.
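The quoted filter can be sketched in numpy. `candidate_markers` is a hypothetical name, the input is assumed to be a genes x cell-types matrix of mean expression, and the bulk-retina-vs-synthetic-body-set step is omitted:

```python
import numpy as np

def candidate_markers(expr, cell_types, target, top_q=0.80, bottom_q=0.50):
    """Sketch of the Bryan 2018 style filter: keep genes in the top 20% of
    expression for the target cell type and in the bottom 50% for every
    other cell type. `expr` is a genes x cell-types matrix of mean
    expression; returns the indices of candidate marker genes."""
    target_idx = cell_types.index(target)
    target_expr = expr[:, target_idx]
    keep = target_expr >= np.quantile(target_expr, top_q)
    for j, _ in enumerate(cell_types):
        if j == target_idx:
            continue
        other = expr[:, j]
        keep &= other <= np.quantile(other, bottom_q)
    return np.flatnonzero(keep)
```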
### Macosko 2015
Uses pre-identified markers shown in plots but doesn't list them anywhere!
This file contains information about the different methods we will benchmark.
### scanpy
The new scanpy default test is 't-test', not 't-test_overestim_var'.
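The difference between the two scanpy tests can be sketched in numpy/scipy. This assumes our reading of the implementation, namely that 't-test_overestim_var' divides the rest-group variance by the group size rather than the rest size:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group = rng.normal(1.0, 1.0, size=50)   # cells in the cluster of interest
rest = rng.normal(0.0, 1.0, size=500)   # all other cells

# Welch t statistic, as in the 't-test' method
t_welch = stats.ttest_ind(group, rest, equal_var=False).statistic

# 't-test_overestim_var' (assumed behaviour): the rest-group variance term
# is divided by the *group* size, inflating the standard error and making
# the test deliberately conservative.
se = np.sqrt(group.var(ddof=1) / group.size + rest.var(ddof=1) / group.size)
t_over = (group.mean() - rest.mean()) / se
```

By construction the inflated standard error gives |t_over| < |t_welch| whenever the rest group is larger, so the overestimated-variance version calls fewer genes significant.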
### Seurat
### scran
### Others
diffxpy?
Ntranos et al. (2018)
Come up with method categorisation
## Nomenclature
Marker genes are sometimes called:
Biomarkers - Seurat tutorials
Signature genes - Bryan 2018
Cell-type-specific vs disease markers
# Plots
Upset plot to measure method overlap
FDR vs TPR plots to measure performance
Overall plot at the end
Plots to show simulations
* tSNE
* Count matrices
Venn diagram plots
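For the FDR vs TPR plots, each method contributes one point per threshold. A hypothetical helper (`fdr_tpr` is our name) for computing the coordinates from a method's called markers and a known-marker list:

```python
def fdr_tpr(called, truth):
    """FDR and TPR for one method's marker calls against known markers.
    `called` and `truth` are collections of gene names."""
    called, truth = set(called), set(truth)
    tp = len(called & truth)
    # False discovery rate: fraction of calls that are not known markers.
    fdr = (len(called) - tp) / max(len(called), 1)
    # True positive rate: fraction of known markers that were recovered.
    tpr = tp / max(len(truth), 1)
    return fdr, tpr
```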
# Questions
* Does comparing the cluster of interest to every other cluster and then combining these comparisons lead to better performance than simply comparing the cluster of interest to all other cells?
* Which differential expression method is optimal in terms of speed, statistical properties, etc.?
* What impact does using classification methods such as logistic regression or ROC have compared to more traditional hypothesis testing methods such as the t-test?
* Does the 'hack' in the {scanpy} implementation cause problems in practice?
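The first question can be made concrete with a sketch. `one_vs_rest_p` and `pairwise_max_p` are hypothetical helper names, and taking the maximum of the pairwise p-values (a marker must beat every other cluster individually) is only one possible combination rule:

```python
import numpy as np
from scipy import stats

def one_vs_rest_p(expr, labels, cluster):
    """Welch t-test of the cluster against all remaining cells, per gene.
    `expr` is a cells x genes matrix, `labels` a cluster label per cell."""
    in_c = labels == cluster
    return stats.ttest_ind(expr[in_c], expr[~in_c],
                           equal_var=False, axis=0).pvalue

def pairwise_max_p(expr, labels, cluster):
    """Test the cluster against each other cluster separately, then combine
    by taking the worst (maximum) p-value per gene."""
    in_c = labels == cluster
    others = [g for g in np.unique(labels) if g != cluster]
    ps = [stats.ttest_ind(expr[in_c], expr[labels == g],
                          equal_var=False, axis=0).pvalue for g in others]
    return np.max(ps, axis=0)
```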
## Simulations
### Methods
Which simulation methods should we use? Options are:
* splatter
* muscat
* SPsimSeq
It would be good to show that the results are robust to the exact details (distributional assumptions) of the methods used.
Should validate with countsimQC/other plots
### Details
Consider other types of differential expression?
### Types
Null simulation (testing FDR control)
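A null simulation along these lines, assuming a simple two-group Gaussian setup with no true DE, checks that the p-value distribution is uniform, so nominal error rates hold:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Null simulation: two "clusters" drawn from the same distribution, so any
# gene called significant is a false positive.
n_genes = 2000
a = rng.normal(size=(60, n_genes))
b = rng.normal(size=(60, n_genes))
pvals = stats.ttest_ind(a, b, axis=0).pvalue

# Under the null, p-values are uniform: roughly 5% of genes fall below 0.05.
frac = (pvals < 0.05).mean()
```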
# Thinking
In {splatter}, when groups are simulated, what is the DE defined with respect to?
Recovery as a function of the number that are also DE
Recovery as a function of parameters of the DE
Recovery as a function of proportion