
sscocaller: Calling crossovers from single-sperm DNA sequencing reads
sscocaller
processes DNA reads from each single sperm in the aligned and
sorted BAM file for inferring the haplotypes of single sperm genomes that can
later be used for calling crossovers by identifying haplotype shifts (see comapr).
It takes the large bam file which contains all aligned DNA reads from sperm cells and
summarizes allele counts for the provided informative SNP markers. While counting
the alleles, the Viterbi algorithm is implemented for finding the haplotype
sequence for the list of SNP markers.
Hidden Markov Model configuration
Inputs
- Bam, sorted and index bam file which contains DNA reads of single sperm cells
with
CB
tag, eg. from single-cell preprocessing pipeline (cellranger) - VCF, variant call file that contains the list of informative SNPs
- barcodeFile, the list of cell barcodes of the sperm cells
Outputs
- *.mtx
- sparse matrix with columns corresponding to the list of sperm cell barcodes and rows corresponding to the list of SNP positions in VCF file
- {sample}_chr1_altCount.mtx, a sparse mtx with entries representing alternative allele counts
- {sample}_chr1_totalCount.mtx, a sparse mtx with entries representing total allele counts
- {sample}_chr1_vi.mtx, a sparse mtx with entries representing inferred viterbi state (haplotype state)
- {sample}_chr1_snpAnnot.txt, the SNP positions and allele
- {sample}_chr1_viSegInfo.txt, statistics of viterbi state segments in text file format. It contains consecutive viterbi states for each chromosome with statistics including, starting SNP position, ending SNP position, the number of SNPs supporting the segment, the log likelihood ratio of the viterbi segment and the inferred hidden state.
Usage
Usage:
sscocaller [options] <BAM> <VCF> <barcodeFile> <out_prefix>
Options:
-t --threads <threads> number of BAM decompression threads [default: 4]
--minMAPQ <mapq> Minimum MAPQ for read filtering [default: 20]
--baseq <baseq> base quality threshold for a base to be used for counting [default: 13]
--chrom <chrom> the selected chromsome (whole genome if not supplied,separate by comma if multiple chroms)
--minDP <minDP> the minimum DP for a SNP to be included in the output file [default: 1]
--maxDP <maxDP> the maximum DP for a SNP to be included in the output file [default: 5]
--maxTotalDP <maxTotalDP> the maximum DP across all barcodes for a SNP to be included in the output file [default: 25]
--minTotalDP <minTotalDP> the minimum DP across all barcodes for a SNP to be included in the output file [default: 10]
--chrName <chrName> the chr names with chr prefix or not, if not supplied then no prefix
--thetaREF <thetaREF> the theta for the binomial distribution conditioning on hidden state being REF [default: 0.1]
--thetaALT <thetaALT> the theta for the binomial distribution conditioning on hidden state being ALT [default: 0.9]
--cmPmb <cmPmb> the average centiMorgan distances per megabases default 0.1 cm per Mb [default 0.1]
-h --help show help
Examples
./sscocaller --threads 4 possorted_bam. SNPs.vcf.gz barcodeFile.tsv ./percell/ccsnp
Run for a Bam of a single cell (bulk sample)
For cases where the DNA reads do not have a cell barocode tag (CB) and all the DNA reads are from one cell (Bulk sample), sscocaller
can
still be applied:
1, Prepare a bulkBC.txt file in which only one dummy cell barcode "bulk" is listed. It means in the bulkBC.txt file, there is only one row and it has the text "bulk".
2, Apply sscocaller as above :
./sscocaller --threads 10 bulk.bam SNPs.vcf.gz bulkBC.txt ./output/bulksample
Setup/installation
Static builds
The static bianry can be simply downloaded which works for GNU/Linux type OS: ./src/sscocaller
The static build was generated by using docker image docker://svirlyu/sscocaller_nsb adapted from https://github.com/brentp/hts-nim/blob/master/Dockerfile
/usr/local/bin/nsb -s ./src/sscocaller.nim -n sscocaller.nimble -o /mnt/src -- --d:release --threads:on
A static binary build of sscocaller
is available as downloadable artifacts at gitlab repo: "https://gitlab.svi.edu.au/biocellgen-public/sscocaller" or via
the Releases
tab
Install from bioconda
sscocaller
is available as a conda package and can be
Using docker
sscocaller
is also available in docker image [svirlyu/sscocaller
] https://hub.docker.com/r/svirlyu/sscocaller
It can be executed by
docker run -it svirlyu/sscocaller
## execute sscocaller
/usr/bin/sscocaller -h
Install with nimble
sscocaller
uses hts-nim
(https://github.com/brentp/hts-nim) that requires the hts-lib
library. If you are building the sscocaller
from
source, you would need to install hts-lib
git clone --recursive https://github.com/samtools/htslib.git
cd htslib && git checkout 1.10 && autoheader && autoconf && ./configure --enable-libcurl
cd ..
make -j 4 -C htslib
export LD_LIBRARY_PATH=$HOME/htslib
ls -lh $HOME/htslib/*.so
Then, sscocaller
can be installed using nimble
nimble install https://gitlab.svi.edu.au/biocellgen-public/sscocaller.git
The built package is located at $HOME/.nimble/bin/sscocaller
Downstream analysis in R
The output files from sscocaller
can be directly parsed into R for construction of individual genetic maps using the R package comapr
available from github.com/ruqianl/comapr
A complete analysis workflow can be accessed here.
References
[1] Hinch, AG. (2019). Factors influencing meiotic recombination revealed by whole-genome sequencing of single sperm Science, 363(6433)