## Reproducible workflow for downloading and preprocessing single-sperm DNA sequencing data from Hinch et al. 2019

Ruqian Lyu committed
### Step1 download files

`submit-wgetSRAFastqdump.sh` downloads the SRA files from GEO and dumps them into FASTQ files for each sperm sample.
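
As a rough illustration of what such a download step involves, the sketch below builds the two commands for one sample as argument lists. It is not the content of `submit-wgetSRAFastqdump.sh`: the function name, the output directory, and the choice of `--split-files`/`--gzip` are assumptions, and the SRA URL has to be supplied by the caller.

```python
def build_download_commands(srr_id, sra_url, out_dir="fastq"):
    """Return the download and dump commands for one sperm sample.

    Hypothetical helper: the actual script may use different options.
    """
    sra_file = f"{out_dir}/{srr_id}.sra"
    # Fetch the .sra archive; -O sets the local file name.
    wget_cmd = ["wget", "-O", sra_file, sra_url]
    # Convert the archive to gzipped FASTQ, splitting paired reads into
    # separate files (assumed here, not stated in this README).
    dump_cmd = [
        "fastq-dump", "--split-files", "--gzip",
        "--outdir", out_dir, sra_file,
    ]
    return [wget_cmd, dump_cmd]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` once `wget` and `fastq-dump` are on the `PATH`.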


### Step2 run alignment

`run_alignment.snk` is a Snakemake file that contains the rules/steps for preprocessing the
FASTQ reads and mapping them to the mouse reference genome mm10.

### Step3 subsample reads

`sscocaller` is designed to process DNA reads with CB (cell barcode) tags from all single sperm cells stored in one BAM file. To reduce the processing burden, the
mapped reads for each sperm were de-duplicated and subsampled to a fraction of 0.5.
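
To make the two operations concrete, here is a toy sketch on in-memory read records. It is not what `run_subsample.snk` runs (the actual rules work on BAM files with dedicated tools); the record layout, function names, and the hashing scheme are assumptions chosen for illustration. Deciding from a hash of the read name means both mates of a pair are kept or dropped together, and the result is reproducible.

```python
import hashlib


def deduplicate(reads):
    """Keep the first read seen at each (chrom, pos, strand) position."""
    seen = set()
    kept = []
    for read in reads:
        key = (read["chrom"], read["pos"], read["strand"])
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


def keep_read(name, fraction=0.5, seed=0):
    """Decide from a hash of the read name whether the read is kept."""
    digest = hashlib.md5(f"{seed}:{name}".encode()).digest()
    # Read the first 8 bytes as an integer and compare it against the
    # requested fraction of the 64-bit range.
    return int.from_bytes(digest[:8], "big") < fraction * 2**64


def subsample(reads, fraction=0.5, seed=0):
    """Return the reads whose names hash into the kept fraction."""
    return [r for r in reads if keep_read(r["name"], fraction, seed)]
```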

In addition, before merging reads from each sperm, the CB (cell barcode, the SRR ID) tag was appended to each DNA read using [appendCB](https://github.com/ruqianl/appendCB).
Refer to steps defined in `run_subsample.snk`.
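
The tagging itself amounts to adding one optional field, `CB:Z:<SRR ID>`, to every alignment record. The sketch below shows this on SAM text lines; it is not the appendCB implementation (which operates on BAM files), and the function name is hypothetical.

```python
def append_cb_tag(sam_line, barcode):
    """Append a CB:Z:<barcode> optional field to one SAM alignment line."""
    line = sam_line.rstrip("\n")
    if line.startswith("@"):
        # Header lines carry no alignment and are passed through unchanged.
        return line
    fields = line.split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs 11 mandatory fields")
    # Drop any existing CB tag so the read carries exactly one barcode.
    optional = [f for f in fields[11:] if not f.startswith("CB:")]
    return "\t".join(fields[:11] + optional + [f"CB:Z:{barcode}"])
```

After this step every read from a given sperm carries that sperm's SRR ID, so the reads stay attributable once all cells are merged into one file.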

### Step4 merge single-sperm BAM files into one BAM

`samtools` was used to merge the CB-tagged reads from all single sperm into one BAM file. See
`submit-mergeBams.sh`.
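
A minimal sketch of how such a merge command could be assembled is given below. It is not the content of `submit-mergeBams.sh`; the function name, file names, and thread count are assumptions. It relies only on the documented `samtools merge` form, in which the output file comes first and the input BAMs follow.

```python
def build_merge_command(input_bams, output_bam, threads=4):
    """Build a `samtools merge` command for the per-sperm BAM files."""
    if not input_bams:
        raise ValueError("at least one input BAM is required")
    # Output file first, then the inputs; inputs are sorted by name so the
    # command is the same regardless of the order they were listed in.
    return ["samtools", "merge", "-@", str(threads), output_bam,
            *sorted(input_bams)]
```

The resulting list can be run with `subprocess.run(cmd, check=True)`; the input BAMs are expected to be sorted in the same order (for example by coordinate).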

### Step5 Finding informative SNP markers

The informative SNP markers are those SNPs which differ between the two mouse strains
that were used to generate the F1 hybrid mouse (CAST and BL6). The following steps were
applied, which largely follow what has been described in the original paper [@Hinch2019-dt].

The bulk sperm sample `SRR8454653` was used for calling de novo variants for this
mouse individual using GATK HaplotypeCaller. Only the heterozygous (HET) SNPs with `MQ>50` AND
`DP>10` AND `DP<80` were kept.
The SNPs were further filtered to keep only the positions which have been called
as homozygous alternative in `CAST_EiJ.mgp.v5.snps.dbSNP142.vcf.gz`, downloaded from the
dbSNP database from the Mouse Genomes Project.
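
The filtering logic above can be sketched on plain Python records as follows. This is an illustration under assumptions, not the code used in this workflow: the dictionary layout (`GT`, `MQ`, `DP`, `chrom`, `pos`) and the function names are hypothetical, and a real run would read these values from the VCF files with a VCF parser.

```python
def is_het(genotype):
    """True for a diploid genotype with two different called alleles."""
    alleles = genotype.replace("|", "/").split("/")
    return (len(alleles) == 2 and "." not in alleles
            and alleles[0] != alleles[1])


def passes_filters(snp):
    """Apply the thresholds stated above: HET, MQ>50, DP>10, DP<80."""
    return is_het(snp["GT"]) and snp["MQ"] > 50 and 10 < snp["DP"] < 80


def informative_snps(bulk_snps, cast_hom_alt_positions):
    """Keep filtered SNPs whose position is homozygous alternative in CAST.

    `cast_hom_alt_positions` is a set of (chrom, pos) tuples taken from
    the CAST_EiJ VCF (assumed to be prepared beforehand).
    """
    return [
        snp for snp in bulk_snps
        if passes_filters(snp)
        and (snp["chrom"], snp["pos"]) in cast_hom_alt_positions
    ]
```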

## Running sscocaller

`run_sscocaller.snk` defines the snakemake rule for running sscocaller for each chromosome.


## Downstream crossover analysis with comapr


https://biocellgen-public.svi.edu.au/hinch-single-sperm-DNA-seq-processing/public/Crossover-identification-with-sscocaller-and-comapr.html