Commit 05efe431 authored by Jeffrey Pullin: "Update notes"
This file contains information about the data used in the project.
To best assess the methods, we need data for which the cell types have known marker genes, preferably 10-30 per cell type. This suggests focusing on data for which the cell types are well understood and well characterized.
Some collections of data which may contain suitable datasets are:
* https://eyeintegration.nei.nih.gov/
* https://singlecell.broadinstitute.org/single_cell
* https://www.ebi.ac.uk/gxa/sc/home
* https://doi.org/10.1093/database/baaa073
Once we have the data we also need to find lists of the marker genes, which may be even more challenging: most papers don't seem to include the markers they use in the final publication.
### Bryan 2018
Supplementary Table 9 contains candidate marker gene (signature gene) lists for a selection of eye cell types. These were found in the following way:
"genes overexpressed in the bulk retina tissue relative to the synthetic body set, in the top 20% of expression in a particular retina cell type, and in the bottom 50% for the remaining retina cell type"
Most of the cell types have 5-10 marker genes. These are bulk data driven but may be useful.
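The quoted filter can be sketched in numpy. `candidate_markers` is a hypothetical name, the input is assumed to be a genes x cell-types matrix of mean expression, and the bulk-retina-vs-synthetic-body-set step is omitted:

```python
import numpy as np

def candidate_markers(expr, cell_types, target, top_q=0.80, bottom_q=0.50):
    """Sketch of the Bryan 2018 style filter: keep genes in the top 20% of
    expression for the target cell type and in the bottom 50% for every
    other cell type. `expr` is a genes x cell-types matrix of mean
    expression; returns the indices of candidate marker genes."""
    target_idx = cell_types.index(target)
    target_expr = expr[:, target_idx]
    keep = target_expr >= np.quantile(target_expr, top_q)
    for j, _ in enumerate(cell_types):
        if j == target_idx:
            continue
        other = expr[:, j]
        keep &= other <= np.quantile(other, bottom_q)
    return np.flatnonzero(keep)
```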
### Macosko 2015
Uses pre-identified markers shown in plots but doesn't list them anywhere!
This file contains information about the different methods we will benchmark.
### scanpy
The new scanpy default test is 't-test', not 't-test_overestim_var'.
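The difference between the two scanpy tests can be sketched in numpy/scipy. This assumes our reading of the implementation, namely that 't-test_overestim_var' divides the rest-group variance by the group size rather than the rest size:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group = rng.normal(1.0, 1.0, size=50)   # cells in the cluster of interest
rest = rng.normal(0.0, 1.0, size=500)   # all other cells

# Welch t statistic, as in the 't-test' method
t_welch = stats.ttest_ind(group, rest, equal_var=False).statistic

# 't-test_overestim_var' (assumed behaviour): the rest-group variance term
# is divided by the *group* size, inflating the standard error and making
# the test deliberately conservative.
se = np.sqrt(group.var(ddof=1) / group.size + rest.var(ddof=1) / group.size)
t_over = (group.mean() - rest.mean()) / se
```

By construction the inflated standard error gives |t_over| < |t_welch| whenever the rest group is larger, so the overestimated-variance version calls fewer genes significant.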
### Seurat
### scran
### Others
diffxpy?
Ntranos et al. (2018)
Come up with method categorisation
## Nomenclature
Marker genes are sometimes called:
Biomarkers - Seurat tutorials
Signature genes - Bryan 2018
Cell-type-specific vs disease markers
# Plots
Upset plot to measure method overlap
FDR vs TPR plots to measure performance
Overall plot at the end
Plots to show simulations
* tSNE
* Count matrices
Venn diagram plots
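For the FDR vs TPR plots, each method contributes one point per threshold. A hypothetical helper (`fdr_tpr` is our name) for computing the coordinates from a method's called markers and a known-marker list:

```python
def fdr_tpr(called, truth):
    """FDR and TPR for one method's marker calls against known markers.
    `called` and `truth` are collections of gene names."""
    called, truth = set(called), set(truth)
    tp = len(called & truth)
    # False discovery rate: fraction of calls that are not known markers.
    fdr = (len(called) - tp) / max(len(called), 1)
    # True positive rate: fraction of known markers that were recovered.
    tpr = tp / max(len(truth), 1)
    return fdr, tpr
```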
# Questions
* Does comparing the cluster of interest to every other cluster and then combining these comparisons lead to better performance than simply comparing the cluster of interest to all other cells?
* Which differential expression method is optimal in terms of speed, statistical properties, etc.?
* What impact does using classification methods such as logistic regression or ROC have compared to more traditional hypothesis testing methods such as the t-test?
* Does the 'hack' in the {scanpy} implementation cause problems in practice?
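The first question can be made concrete with a sketch. `one_vs_rest_p` and `pairwise_max_p` are hypothetical helper names, and taking the maximum of the pairwise p-values (a marker must beat every other cluster individually) is only one possible combination rule:

```python
import numpy as np
from scipy import stats

def one_vs_rest_p(expr, labels, cluster):
    """Welch t-test of the cluster against all remaining cells, per gene.
    `expr` is a cells x genes matrix, `labels` a cluster label per cell."""
    in_c = labels == cluster
    return stats.ttest_ind(expr[in_c], expr[~in_c],
                           equal_var=False, axis=0).pvalue

def pairwise_max_p(expr, labels, cluster):
    """Test the cluster against each other cluster separately, then combine
    by taking the worst (maximum) p-value per gene."""
    in_c = labels == cluster
    others = [g for g in np.unique(labels) if g != cluster]
    ps = [stats.ttest_ind(expr[in_c], expr[labels == g],
                          equal_var=False, axis=0).pvalue for g in others]
    return np.max(ps, axis=0)
```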
## Simulations
### Methods
Which simulation methods should we use? Options are:
* splatter
* muscat
* SPsimSeq
It would be good to show that the results are robust to the exact details (distributional assumptions) of the methods used.
Should validate with countsimQC/other plots
### Details
Consider other types of differential expression?
### Types
Null simulation (testing FDR control)
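A null simulation along these lines, assuming a simple two-group Gaussian setup with no true DE, checks that the p-value distribution is uniform, so nominal error rates hold:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Null simulation: two "clusters" drawn from the same distribution, so any
# gene called significant is a false positive.
n_genes = 2000
a = rng.normal(size=(60, n_genes))
b = rng.normal(size=(60, n_genes))
pvals = stats.ttest_ind(a, b, axis=0).pvalue

# Under the null, p-values are uniform: roughly 5% of genes fall below 0.05.
frac = (pvals < 0.05).mean()
```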
# Thinking
In {splatter}, when groups are simulated, what is the DE defined with respect to?
Recovery as a function of the number that are also DE
Recovery as a function of parameters of the DE
Recovery as a function of proportion