Skip to content
Snippets Groups Projects
Ruqian Lyu's avatar
Ruqian Lyu authored
23f45782
History

sscocaller: Calling crossovers from single-sperm DNA sequencing reads

sscocaller processes DNA reads from each single sperm in the aligned and sorted BAM file for inferring the haplotypes of single sperm genomes that can later be used for calling crossovers by identifying haplotype shifts (see comapr). It takes the large bam file which contains all aligned DNA reads from sperm cells and summarizes allele counts for the provided informative SNP markers. While counting the alleles, the Viterbi algorithm is implemented for finding the haplotype sequence for the list of SNP markers.

sscocaller_fig

Hidden Markov Model configuration

HMM_fig

Inputs

  • Bam, sorted and index bam file which contains DNA reads of single sperm cells with CB tag, eg. from single-cell preprocessing pipeline (cellranger)
  • VCF, variant call file that contains the list of informative SNPs
  • barcodeFile, the list of cell barcodes of the sperm cells

Outputs

  • *.mtx
    • sparse matrix with columns corresponding to the list of sperm cell barcodes and rows corresponding to the list of SNP positions in VCF file
    • {sample}_chr1_altCount.mtx, a sparse mtx with entries representing alternative allele counts
    • {sample}_chr1_totalCount.mtx, a sparse mtx with entries representing total allele counts
    • {sample}_chr1_vi.mtx, a sparse mtx with entries representing inferred viterbi state (haplotype state)
  • {sample}_chr1_snpAnnot.txt, the SNP positions and allele
  • {sample}_chr1_viSegInfo.txt, statistics of viterbi state segments in text file format. It contains consecutive viterbi states for each chromosome with statistics including, starting SNP position, ending SNP position, the number of SNPs supporting the segment, the log likelihood ratio of the viterbi segment and the inferred hidden state.

Usage


  Usage:
      sscocaller [options] <BAM> <VCF> <barcodeFile> <out_prefix>

  Options:
      -t --threads <threads> number of BAM decompression threads [default: 4]
      --minMAPQ <mapq> Minimum MAPQ for read filtering [default: 20]
      --baseq <baseq>  base quality threshold for a base to be used for counting [default: 13]
      --chrom <chrom> the selected chromsome (whole genome if not supplied,separate by comma if multiple chroms)
      --minDP <minDP> the minimum DP for a SNP to be included in the output file [default: 1]
      --maxDP <maxDP> the maximum DP for a SNP to be included in the output file [default: 5]
      --maxTotalDP <maxTotalDP> the maximum DP across all barcodes for a SNP to be included in the output file [default: 25]
      --minTotalDP <minTotalDP> the minimum DP across all barcodes for a SNP to be included in the output file [default: 10]
      --chrName <chrName> the chr names with chr prefix or not, if not supplied then no prefix
      --thetaREF <thetaREF> the theta for the binomial distribution conditioning on hidden state being REF [default: 0.1]
      --thetaALT <thetaALT> the theta for the binomial distribution conditioning on hidden state being ALT [default: 0.9]
      --cmPmb <cmPmb> the average centiMorgan distances per megabases default 0.1 cm per Mb [default 0.1]
      -h --help  show help

  Examples
      ./sscocaller --threads 4 possorted_bam. SNPs.vcf.gz barcodeFile.tsv ./percell/ccsnp

Run for a Bam of a single cell (bulk sample)

For cases where the DNA reads do not have a cell barocode tag (CB) and all the DNA reads are from one cell (Bulk sample), sscocaller can still be applied:

1, Prepare a bulkBC.txt file in which only one dummy cell barcode "bulk" is listed. It means in the bulkBC.txt file, there is only one row and it has the text "bulk".

2, Apply sscocaller as above :

    ./sscocaller --threads 10 bulk.bam SNPs.vcf.gz bulkBC.txt ./output/bulksample

Setup/installation

Static builds

The static bianry can be simply downloaded which works for GNU/Linux type OS: ./src/sscocaller

The static build was generated by using docker image docker://svirlyu/sscocaller_nsb adapted from https://github.com/brentp/hts-nim/blob/master/Dockerfile

/usr/local/bin/nsb -s ./src/sscocaller.nim -n sscocaller.nimble -o /mnt/src -- --d:release --threads:on

A static binary build of sscocaller is available as downloadable artifacts at gitlab repo: "https://gitlab.svi.edu.au/biocellgen-public/sscocaller" or via the Releases tab

Install from bioconda

sscocaller is available as a conda package and can be install with bioconda

Using docker

sscocaller is also available in docker image [svirlyu/sscocaller] https://hub.docker.com/r/svirlyu/sscocaller

It can be executed by

docker run -it svirlyu/sscocaller

## execute sscocaller
/usr/bin/sscocaller -h

Install with nimble

sscocaller uses hts-nim(https://github.com/brentp/hts-nim) that requires the hts-lib library. If you are building the sscocaller from source, you would need to install hts-lib

git clone --recursive https://github.com/samtools/htslib.git
cd htslib && git checkout 1.10 && autoheader && autoconf && ./configure --enable-libcurl

cd ..
make -j 4 -C htslib
export LD_LIBRARY_PATH=$HOME/htslib
ls -lh $HOME/htslib/*.so

Then, sscocaller can be installed using nimble

nimble install https://gitlab.svi.edu.au/biocellgen-public/sscocaller.git

The built package is located at $HOME/.nimble/bin/sscocaller

Downstream analysis in R

The output files from sscocaller can be directly parsed into R for construction of individual genetic maps using the R package comapr available from github.com/ruqianl/comapr

A complete analysis workflow can be accessed here.

References

[1] Hinch, AG. (2019). Factors influencing meiotic recombination revealed by whole-genome sequencing of single sperm Science, 363(6433)