diff --git a/README.md b/README.md
index f4a40bf387595d832ae47756f32585ee2acb4da5..3a56579aeff8f792ffa43f4d0662bf0f26151804 100644
--- a/README.md
+++ b/README.md
@@ -10,9 +10,39 @@ sequence for the list of SNP markers.
 
 
 
+## Hidden Markov Model configuration
+
+- Observations. The allele-specific counts across the informative SNP markers for
+each chromosome in each sperm cell.
+
+- States. Sperm cells have haploid genomes, so each SNP site $i$ has two possible
+hidden states (haplotypes), corresponding to a REF or an ALT segment of the
+haploid genome: $s_i = 0$ corresponds to an ALT segment, while $s_i = 1$
+corresponds to a REF segment.
+
+- Emission probabilities. Two binomial distributions were used to model the emission
+probabilities for sperm cells at each SNP marker. For each site $i$ with REF and ALT allele counts $c_r$ and $c_a$:
+    $$ c = c_r + c_a ~,$$
+    $$ c_a |_{s_i = 0} \sim Bin(c,\theta_{ALT}) ~,$$
+    $$ c_a |_{s_i = 1} \sim Bin(c,\theta_{REF}) ~.$$
+
+- Transition probabilities. A distance-dependent transition probability [[1]](#1)
+was applied, which corresponds to an average of `--cmPmb` cM (centiMorgan) per 1 Mb
+(1 million base pairs). Shown here with the rate set to 0.5 cM per Mb:
+    $$p_{ij} = 1-e^{-d_{ij} \times 0.5 \times 10^{-8}} ~,$$
+where $p_{ij}$ is the probability of transitioning to a different
+state at SNP $j$ from SNP $i$, and $d_{ij}$ denotes the physical distance in
+base pairs between SNP $i$ and SNP $j$.
+
+- Initial probabilities. The initial probabilities of the two hidden states
+were both set to 0.5, since the two states are equally likely.
+
+
+
 ## Inputs
 
-- Bam, sorted and index bam file which contains DNA reads of single sperm cells with CB tag, eg. from single-cell alignment pipeline (cellranger)
+- Bam, sorted and indexed BAM file which contains DNA reads of single sperm cells
+with `CB` tag, e.g. from a single-cell preprocessing pipeline (cellranger)
 - VCF, variant call file that contains the list of informative SNPs
 - barcodeFile, the list of cell barcodes of the sperm cells
 
@@ -75,8 +105,6 @@ ls -lh $HOME/htslib/*.so
 
 Then, `sscocaller` can be installed using `nimble`
 
-
-
 `nimble install https://gitlab.svi.edu.au/biocellgen-public/sscocaller.git`
 
 The built binary in $HOME/.nimble/bin/sscocaller
@@ -115,4 +143,13 @@ sscocaller is available at "./src/sscocaller"
 ## Downstream analysis in R
 
 The output files from `sscocaller` can be directly parsed into R for construction of individual genetic maps using
-the R package `comapr` available from [TBD].
\ No newline at end of file
+the R package `comapr` available from [TBD].
+
+
+## References
+<a id="1">[1]</a>
+Hinch, AG. (2019).
+Factors influencing meiotic recombination revealed by
+ whole-genome sequencing of single sperm.
+Science, 363(6433).
+
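
The transition and emission probabilities described in the Hidden Markov Model configuration can be checked numerically with a short Python sketch. This is an illustration of the formulas only, not code from `sscocaller`: the function names, the `cm_per_mb` parameter (standing in for `--cmPmb`), and the values chosen for $\theta_{ALT}$ and $\theta_{REF}$ are all assumptions made for this example.

```python
import math


def transition_prob(distance_bp, cm_per_mb=0.5):
    """Probability of switching hidden state between two SNPs distance_bp apart.

    cm_per_mb is a hypothetical stand-in for the --cmPmb option.
    x cM per Mb = x * 1e-2 expected crossovers per 1e6 bp = x * 1e-8 per bp.
    """
    rate_per_bp = cm_per_mb * 1e-8
    # 1 - exp(-d * rate); expm1 keeps precision when d * rate is tiny
    return -math.expm1(-distance_bp * rate_per_bp)


def emission_prob(alt_count, total_count, theta):
    """Binomial probability of alt_count ALT reads among total_count reads."""
    return (
        math.comb(total_count, alt_count)
        * theta ** alt_count
        * (1.0 - theta) ** (total_count - alt_count)
    )


# Hypothetical parameter values, chosen for illustration only
THETA_ALT = 0.9  # P(read supports ALT | state 0, ALT segment)
THETA_REF = 0.1  # P(read supports ALT | state 1, REF segment)

# Two SNPs 1 Mb apart at 0.5 cM per Mb
p_switch = transition_prob(1_000_000)

# A site covered by 3 reads, all supporting ALT
p_given_alt_state = emission_prob(3, 3, THETA_ALT)
p_given_ref_state = emission_prob(3, 3, THETA_REF)

print(p_switch, p_given_alt_state, p_given_ref_state)
```

With these assumed values, two SNPs 1 Mb apart have a switch probability of roughly 0.005, and three ALT reads out of three are far more probable under the ALT state than under the REF state.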