Commit 20c8a1ba authored by Ruqian Lyu

update readme and subsample

parent 70177c4b
## Reproducible workflow for downloading and preprocessing single-sperm DNA sequencing data from Hinch et al. 2019
### Step1 download files
`submit-wgetSRAFastqdump.sh` downloads the SRA files from GEO and dumps them into FASTQ files for each sperm sample.
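
As a rough illustration of the dumping half of this step, the sketch below builds a `fastq-dump` call for one already-downloaded `.sra` file. It is not the content of `submit-wgetSRAFastqdump.sh`: the `sra/` and `fastq/` directory names, the helper name `fastq_dump_command`, and the choice of `--split-files` and `--gzip` are all assumptions made for the example.

```python
import shlex
from pathlib import Path


def fastq_dump_command(sra_file, fastq_dir="fastq"):
    """Build a fastq-dump call for one downloaded .sra file.

    Hypothetical helper: directory layout and flag choice are assumptions,
    not taken from submit-wgetSRAFastqdump.sh.
    """
    return [
        "fastq-dump",
        "--split-files",  # assumed: write each read of a spot to its own file
        "--gzip",         # assumed: compress the FASTQ output
        "--outdir", str(fastq_dir),
        str(sra_file),
    ]


cmd = fastq_dump_command(Path("sra") / "SRR8454693.sra")
# Print the command for inspection; pass `cmd` to subprocess.run to execute it.
print(shlex.join(cmd))
```

Building the command as a list keeps the call safe to hand to `subprocess.run` without shell quoting issues.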
### Step2 run alignment
`run_alignment.snk` is a Snakemake file that contains the rules (steps) for preprocessing the
FASTQ reads and mapping them to the mouse reference genome mm10.
### Step3 subsample reads
`sscocaller` is designed to process DNA reads with CB (cell barcode) tags from all single sperm cells stored in one BAM file. To reduce the processing burden, the
mapped reads for each sperm were de-duplicated and subsampled to a fraction of 0.5.
In addition, before merging the reads from each sperm, a CB (cell barcode) tag holding the SRR ID was appended to each DNA read using [appendCB](https://github.com/ruqianl/appendCB).
Refer to the steps defined in `run_subsample.snk`.
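
The per-sperm chain of files and commands described above can be sketched in plain Python as follows. This is only a summary of the order of operations in `run_subsample.snk`, not a replacement for it: the function name `subsample_plan` is hypothetical, and `appendCB` is assumed to be on `PATH` rather than at the absolute path used in the workflow.

```python
def subsample_plan(srr, outdir="output/alignment/", proportion="0.5",
                   appendcb="appendCB"):
    """Return the commands run for one sperm sample, in order.

    Hypothetical helper mirroring the rules in run_subsample.snk;
    `appendcb` is assumed to be resolvable on PATH.
    """
    base = f"{srr}.mkdup.sort"
    clean = f"{outdir}cleanBam/{base}.bam"
    dedup = f"{outdir}dedupBam/{base}.rmdup.bam"
    sub = f"{outdir}subBam/{base}.rmdup.sub.bam"
    cb = f"{outdir}cbBam/{base}.rmdup.sub.cb.bam"
    return [
        # 1. remove duplicate reads, then index
        ["samtools", "rmdup", "--output-fmt", "BAM", clean, dedup],
        ["samtools", "index", dedup],
        # 2. keep a fraction of the reads; for `-s`, the fractional part is
        #    the proportion kept and the integer part seeds the sampler
        ["samtools", "view", "-s", proportion, "-b", "-o", sub, dedup],
        ["samtools", "index", sub],
        # 3. tag every read with CB = SRR ID, then index
        [appendcb, "--generate", "--append", srr, sub, cb],
        ["samtools", "index", cb],
    ]


plan = subsample_plan("SRR8454693")
for step in plan:
    print(" ".join(step))
```

Each stage only appends a suffix to the file name (`.rmdup`, `.sub`, `.cb`), so the name of a BAM file records which steps it has been through.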
## subsample bams
import pandas as pd
## a sample meta file to align sample name and corresponding fastqs
sample_meta_file = "sampleNames_meta.txt"
## Run sscocaller on deduped and subsampled sperm
spermSRRfile = "singleSpermSRR.txt"
spermSRR_meta = pd.read_csv(spermSRRfile, delimiter='\t', header=0)
spermSRR = pd.Series(spermSRR_meta['sample_name'])
outdir = "output/alignment/"
rule all:
    input:
        expand([outdir+"subBam/{sperm_srr}.mkdup.sort.rmdup.sub.bam",
                outdir+"cbBam/{sperm_srr}.mkdup.sort.rmdup.sub.cb.bam.bai"],
               sperm_srr = spermSRR)
rule run_rmDups:
    input:
        bam=outdir+"cleanBam/{sperm_srr}.mkdup.sort.bam"
    output:
        dedupBam=outdir+"dedupBam/{sperm_srr}.mkdup.sort.rmdup.bam",
        dedupBai=outdir+"dedupBam/{sperm_srr}.mkdup.sort.rmdup.bam.bai"
    resources:
        cpus = 2,
        mem = 10240
    shell:
        """
        samtools rmdup --output-fmt BAM {input.bam} {output.dedupBam}
        samtools index -@ {resources.cpus} {output.dedupBam}
        """
rule subsample_bam:
    input:
        dedupBam=outdir+"dedupBam/{sperm_srr}.mkdup.sort.rmdup.bam"
    params:
        proportion="0.5"
    resources:
        cpus = 1,
        mem = 10240
    output:
        subBam=outdir+"subBam/{sperm_srr}.mkdup.sort.rmdup.sub.bam",
        subBai=outdir+"subBam/{sperm_srr}.mkdup.sort.rmdup.sub.bam.bai"
    shell:
        """
        samtools view -s {params.proportion} {input.dedupBam} -b -o {output.subBam}
        samtools index {output.subBam}
        """
rule appendCB:
    input:
        subBam=outdir+"subBam/{sperm_srr}.mkdup.sort.rmdup.sub.bam"
    output:
        cbBam=outdir+"cbBam/{sperm_srr}.mkdup.sort.rmdup.sub.cb.bam"
    resources:
        cpus = 1,
        mem = 3240
    shell:
        """
        /mnt/mcfiles/rlyu/Projects/github/appendCB/src/appendCB --generate --append {wildcards.sperm_srr} {input.subBam} {output.cbBam}
        """
rule sort_cbbam:
    input:
        cbBam=outdir+"cbBam/{sperm_srr}.mkdup.sort.rmdup.sub.cb.bam"
    output:
        cbBai=outdir+"cbBam/{sperm_srr}.mkdup.sort.rmdup.sub.cb.bam.bai"
    resources:
        cpus = 1,
        mem = 3240
    shell:
        """
        samtools index {input.cbBam}
        """
sample_name
SRR8454693
SRR8454727
SRR8454763
SRR8454801
SRR8454836
SRR8454655
SRR8454694
SRR8454728
SRR8454764
SRR8454802
SRR8454837
SRR8454656
SRR8454695
SRR8454729
SRR8454765
SRR8454803
SRR8454838
SRR8454657
SRR8454696
SRR8454730
SRR8454766
SRR8454804
SRR8454840
SRR8454658
SRR8454697
SRR8454731
SRR8454768
SRR8454805
SRR8454841
SRR8454659
SRR8454698
SRR8454732
SRR8454769
SRR8454806
SRR8454842
SRR8454660
SRR8454699
SRR8454733
SRR8454770
SRR8454807
SRR8454843
SRR8454661
SRR8454700
SRR8454734
SRR8454771
SRR8454808
SRR8454844
SRR8454662
SRR8454701
SRR8454735
SRR8454772
SRR8454809
SRR8454848
SRR8454663
SRR8454702
SRR8454736
SRR8454773
SRR8454810
SRR8454850
SRR8454664
SRR8454704
SRR8454737
SRR8454774
SRR8454811
SRR8454852
SRR8454665
SRR8454705
SRR8454738
SRR8454775
SRR8454812
SRR8454853
SRR8454666
SRR8454706
SRR8454739
SRR8454777
SRR8454813
SRR8454854
SRR8454667
SRR8454707
SRR8454740
SRR8454778
SRR8454814
SRR8454855
SRR8454668
SRR8454708
SRR8454741
SRR8454779
SRR8454815
SRR8454856
SRR8454669
SRR8454709
SRR8454742
SRR8454780
SRR8454816
SRR8454857
SRR8454670
SRR8454710
SRR8454743
SRR8454782
SRR8454818
SRR8454858
SRR8454671
SRR8454711
SRR8454744
SRR8454783
SRR8454819
SRR8454859
SRR8454672
SRR8454712
SRR8454745
SRR8454784
SRR8454820
SRR8454860
SRR8454673
SRR8454713
SRR8454747
SRR8454785
SRR8454821
SRR8454861
SRR8454674
SRR8454714
SRR8454748
SRR8454786
SRR8454822
SRR8454862
SRR8454675
SRR8454715
SRR8454749
SRR8454787
SRR8454823
SRR8454863
SRR8454677
SRR8454716
SRR8454750
SRR8454789
SRR8454824
SRR8454864
SRR8454678
SRR8454717
SRR8454751
SRR8454790
SRR8454825
SRR8454865
SRR8454679
SRR8454718
SRR8454753
SRR8454791
SRR8454827
SRR8454866
SRR8454681
SRR8454719
SRR8454754
SRR8454792
SRR8454828
SRR8454867
SRR8454683
SRR8454720
SRR8454755
SRR8454793
SRR8454829
SRR8454868
SRR8454684
SRR8454721
SRR8454756
SRR8454794
SRR8454830
SRR8454869
SRR8454685
SRR8454722
SRR8454758
SRR8454795
SRR8454831
SRR8454870
SRR8454689
SRR8454723
SRR8454759
SRR8454796
SRR8454832
SRR8454871
SRR8454690
SRR8454724
SRR8454760
SRR8454798
SRR8454833
SRR8454691
SRR8454725
SRR8454761
SRR8454799
SRR8454834
SRR8454692
SRR8454726
SRR8454762
SRR8454800
SRR8454835
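
The sample list above is what `rule all` expands over. As a minimal sketch of that expansion without Snakemake or pandas, the snippet below reads a one-column, tab-delimited file with a `sample_name` header and builds the two requested targets per sample. The function names `read_sample_names` and `rule_all_targets` are hypothetical, and a two-sample in-memory stand-in is used in place of `singleSpermSRR.txt`.

```python
import csv
import io


def read_sample_names(handle):
    """Read the `sample_name` column from a tab-delimited file handle."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row["sample_name"] for row in reader if row["sample_name"]]


def rule_all_targets(samples, outdir="output/alignment/"):
    """Build the subsampled BAM and CB-tagged BAM index paths per sample."""
    patterns = [
        outdir + "subBam/{sperm_srr}.mkdup.sort.rmdup.sub.bam",
        outdir + "cbBam/{sperm_srr}.mkdup.sort.rmdup.sub.cb.bam.bai",
    ]
    return [p.format(sperm_srr=s) for p in patterns for s in samples]


# In-memory stand-in for singleSpermSRR.txt (first two entries only).
example = io.StringIO("sample_name\nSRR8454693\nSRR8454727\n")
samples = read_sample_names(example)
targets = rule_all_targets(samples)
for target in targets:
    print(target)
```

To run it against the real file, replace the `io.StringIO` stand-in with `open("singleSpermSRR.txt", newline="")`.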