Reproducible workflow for downloading and preprocessing single-sperm DNA sequencing data from Hinch et al. 2019

Step1 download files

submit-wgetSRAFastqdump.sh downloads the SRA files from GEO and dumps them into fastq files for each sperm sample.
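
As a rough illustration of the download-then-dump step, the Python sketch below builds the two commands for a single run. It is not the content of submit-wgetSRAFastqdump.sh: it assumes the SRA Toolkit is installed, uses `prefetch` in place of wget, and assumes paired-end data (`--split-files`). The function name `download_commands`, the output directory and the SRR ID in the usage note are hypothetical.

```python
from __future__ import annotations


def download_commands(srr_id: str, outdir: str = "fastq") -> list[list[str]]:
    """Build the commands that fetch one SRA run and dump it to FASTQ.

    Assumptions (not taken from the original script): SRA Toolkit's
    `prefetch` replaces wget, and the run is paired-end.
    """
    return [
        # Download the .sra archive for this run accession.
        ["prefetch", srr_id],
        # Convert the archive to gzipped FASTQ, one file per mate.
        ["fastq-dump", "--split-files", "--gzip", "--outdir", outdir, srr_id],
    ]
```

Each command can then be passed to `subprocess.run(cmd, check=True)`, looping over the SRR IDs of all sperm samples, for example `download_commands("SRR0000001")` for a placeholder ID.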

Step2 run alignment

run_alignment.snk is a Snakemake file containing the rules (steps) for preprocessing the fastq reads and mapping them to the mouse reference genome mm10.

Step3 subsample reads

sscocaller is designed to process DNA reads with CB (cell barcode) tags from all single sperm cells stored in one BAM file. To reduce the processing burden, the mapped reads for each sperm were de-duplicated and subsampled to a fraction of 0.5.
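
To make "subsampled to a fraction of 0.5" concrete, here is a minimal Python sketch of deterministic subsampling by read name. The tool actually used is defined in run_subsample.snk; this sketch only illustrates the idea (the same hashing approach that, for example, `samtools view -s` takes). Hashing the read name rather than drawing a random number per record means both mates of a pair share the same fate. The function names and the seed value are hypothetical.

```python
import hashlib


def keep_read(read_name: str, fraction: float = 0.5, seed: int = 42) -> bool:
    """Return True if this read name falls inside the kept fraction."""
    digest = hashlib.md5(f"{seed}:{read_name}".encode()).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    value = int.from_bytes(digest[:8], "big") / 2**64
    return value < fraction


def subsample(read_names, fraction=0.5, seed=42):
    """Keep roughly `fraction` of the reads, reproducibly for a given seed."""
    return [name for name in read_names if keep_read(name, fraction, seed)]
```

Because the decision depends only on the seed and the read name, rerunning the step with the same seed selects the same reads.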

In addition, before the reads from each sperm were merged, a CB (cell barcode) tag holding the sperm's SRR ID was appended to each DNA read using appendCB. Refer to the steps defined in run_subsample.snk.
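
What appending a CB tag amounts to can be sketched on a single SAM text record. This is not appendCB's implementation, which works on BAM files; the function `append_cb` and the SRR ID below are hypothetical and serve only to show the resulting tag format.

```python
def append_cb(sam_line: str, barcode: str) -> str:
    """Append a CB:Z:<barcode> optional field to one SAM alignment line."""
    line = sam_line.rstrip("\n")
    if line.startswith("@"):
        # Header lines carry no per-read tags, so leave them unchanged.
        return line
    # Optional fields follow the 11 mandatory columns, tab-separated.
    return f"{line}\tCB:Z:{barcode}"


# A made-up alignment record with the 11 mandatory SAM columns.
record = "\t".join(
    ["read1", "0", "chr1", "100", "60", "4M", "*", "0", "0", "ACGT", "FFFF"]
)
tagged = append_cb(record, "SRR0000001")
```

After tagging, reads from different sperm can be merged into one BAM file while each read still records which cell it came from.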