From 08f121a38147f479af4eb93fc8357261a983fa74 Mon Sep 17 00:00:00 2001
From: rlyu <rlyu@svi.edu.au>
Date: Wed, 12 May 2021 13:36:30 +1000
Subject: [PATCH] update README

---
 README.md | 45 +++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 41 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index f4a40bf..3a56579 100644
--- a/README.md
+++ b/README.md
@@ -10,9 +10,39 @@ sequence for the list of SNP markers.
 
 ![sscocaller_fig](images/sscocaller_fig.png)
 
+## Hidden Markov Model configuration
+
+- Observations. The allele-specific counts at the informative SNP markers on
+each chromosome in each sperm cell.
+
+- States. Sperm cells have haploid genomes, so each SNP site $i$ has one of two
+possible hidden states (haplotypes), corresponding to a REF or ALT segment of
+the haploid genome: $s_i = 0$ corresponds to an ALT segment, while $s_i = 1$
+corresponds to a REF segment.
+
+- Emission probabilities. Two binomial distributions were used to model the
+emission probabilities at each SNP marker. With REF count $c_r$ and ALT count $c_a$ at site $i$,
+          $$ c = c_r + c_a ~,$$
+          $$c_a |_{s_i = 0} \sim Bin(c,\theta_{ALT} ) ~,$$
+          $$c_a |_{s_i = 1}\sim Bin(c,\theta_{REF} ) ~.$$
+
+- Transition probabilities. A distance-dependent transition probability [[1]](#1)
+was applied, corresponding to an average of `--cmPmb` cM (centiMorgans) per 1 Mb
+(1 million base pairs):
+ $$p_{ij} = 1-e^{-d_{ij} \times 0.5 \times 10^{-8}} ~,$$
+where $p_{ij}$ is the probability of switching to a different state
+between SNP $i$ and SNP $j$, and $d_{ij}$ denotes the physical distance in
+base pairs between SNP $i$ and SNP $j$.
+
+- Initial probabilities. The initial probabilities of the two hidden states
+were both set to 0.5, since the two states are equally likely a priori.
+
+
+        
 ## Inputs
 
-- Bam, sorted and index bam file which contains DNA reads of single sperm cells with CB tag, eg. from single-cell alignment pipeline (cellranger)
+- BAM, a sorted and indexed BAM file containing DNA reads of single sperm cells
+with the `CB` tag, e.g. from a single-cell preprocessing pipeline (cellranger)
 - VCF, variant call file that contains the list of informative SNPs
 - barcodeFile, the list of cell barcodes of the sperm cells
 
@@ -75,8 +105,6 @@ ls -lh $HOME/htslib/*.so
 
 Then, `sscocaller` can be installed using `nimble`
 
-
-
 `nimble install https://gitlab.svi.edu.au/biocellgen-public/sscocaller.git`
 
 The built binary in $HOME/.nimble/bin/sscocaller
@@ -115,4 +143,13 @@ sscocaller is available at "./src/sscocaller"
 ## Downstream analysis in R
 
 The output files from `sscocaller` can be directly parsed into R for construction of individual genetic maps using
-the R package `comapr` available from [TBD].
\ No newline at end of file
+the R package `comapr` available from [TBD].
+
+
+## References
+<a id="1">[1]</a> 
+Hinch, AG. et al. (2019).
+Factors influencing meiotic recombination revealed by
+whole-genome sequencing of single sperm.
+Science, 363(6433).
+
-- 
GitLab