Commit abe49214 authored by Ruqian Lyu

update readme

parent ca2a27cd
Pipeline #6603 failed
## sscocaller: Calling crossovers from single-sperm DNA sequencing reads
`sscocaller` processes DNA reads from each single sperm in an aligned and sorted BAM file to detect crossovers by identifying haplotype shifts. It takes a large BAM file containing aligned DNA reads from a list of single sperm cells and summarizes allele counts for the provided informative SNP markers. A hidden Markov model (HMM) is applied to haplotype each sperm, and the Viterbi algorithm is run to derive the inferred haplotype sequence across the list of SNP markers.
`sscocaller` processes DNA reads from each single sperm in an aligned and
sorted BAM file to infer the haplotypes of single sperm genomes, which can
later be used for calling crossovers by identifying haplotype shifts (see [comapr](https://github.com/ruqianl/comapr)).
It takes a large BAM file containing all aligned DNA reads from the sperm cells and
summarizes allele counts for the provided informative SNP markers. While counting
the alleles, it applies the Viterbi algorithm to find the haplotype
sequence across the list of SNP markers.
![sscocaller_fig](images/sscocaller_fig.png)
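
The haplotyping step described above can be illustrated with a minimal two-state Viterbi sketch. It is not the tool's implementation. It assumes binomial emissions with the `thetaREF` and `thetaALT` defaults from the usage text, a uniform prior over the two states, and a simple linear mapping from physical distance and `cmPmb` to a switch probability (the README does not state the actual transition formula). The function names and the 0/1 state encoding are hypothetical.

```python
import math


def log_binom_pmf(k, n, p):
    """Log probability of observing k ALT reads out of n total reads."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))


def viterbi_haplotype(positions, alt_counts, total_counts,
                      theta_ref=0.1, theta_alt=0.9, cm_per_mb=0.1):
    """Return the most likely state path (0 = REF haplotype, 1 = ALT haplotype)."""
    thetas = (theta_ref, theta_alt)
    if not positions:
        return []
    # Uniform prior over the two hidden states (an assumption of this sketch).
    score = [math.log(0.5) + log_binom_pmf(alt_counts[0], total_counts[0], t)
             for t in thetas]
    back = []
    for i in range(1, len(positions)):
        dist_mb = (positions[i] - positions[i - 1]) / 1e6
        # Assumed transition model: genetic distance in Morgans, capped at 0.5.
        p_switch = min(0.5, max(1e-12, dist_mb * cm_per_mb / 100.0))
        log_stay, log_switch = math.log(1.0 - p_switch), math.log(p_switch)
        new_score, pointers = [], []
        for s in (0, 1):
            stay = score[s] + log_stay
            switch = score[1 - s] + log_switch
            best, prev = (stay, s) if stay >= switch else (switch, 1 - s)
            emit = log_binom_pmf(alt_counts[i], total_counts[i], thetas[s])
            new_score.append(best + emit)
            pointers.append(prev)
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

For example, ten SNPs spaced 1 Mb apart, where the first five carry only REF reads and the last five only ALT reads, yield a path with a single haplotype shift between the fifth and sixth SNP.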
## Inputs
- BAM, a sorted and indexed BAM file containing DNA reads from single sperm cells with the CB tag, e.g. from a single-cell alignment pipeline (cellranger)
- VCF, variant calling file that contains the list of informative SNPs provided
- VCF, variant call file that contains the list of informative SNPs
- barcodeFile, the list of cell barcodes of the sperm cells
@@ -19,31 +25,35 @@
* {sample}_chr1_totalCount.mtx, a sparse mtx whose entries represent total allele counts
* {sample}_chr1_vi.mtx, a sparse mtx whose entries represent the inferred Viterbi state (haplotype state)
* {sample}_chr1_snpAnnot.txt, the SNP positions and alleles
* {sample}_chr1_SegInfo.txt, statistics of Viterbi state segments in text file format. It lists runs of consecutive Viterbi states for each chromosome, with statistics including the starting SNP position, the ending SNP position, the number of SNPs supporting the segment, the log-likelihood ratio of the Viterbi segment, and the inferred hidden state.
* {sample}_chr1_viSegInfo.txt, statistics of Viterbi state segments in text file format. It lists runs of consecutive Viterbi states for each chromosome, with statistics including the starting SNP position, the ending SNP position, the number of SNPs supporting the segment, the log-likelihood ratio of the Viterbi segment, and the inferred hidden state.
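
To make the segment statistics concrete, here is a small sketch of how runs of consecutive Viterbi states could be collapsed into segments for one cell. It does not read the actual output files, whose exact column layout is not described here. The function names, the dictionary keys, and the state labels in the example are hypothetical, and the log-likelihood ratio is left out.

```python
from itertools import groupby


def summarize_segments(positions, states):
    """Collapse a per-SNP state sequence into runs of identical states."""
    if len(positions) != len(states):
        raise ValueError("positions and states must have the same length")
    segments = []
    index = 0
    for state, run in groupby(states):
        length = len(list(run))
        segments.append({
            "start": positions[index],             # first SNP position in the run
            "end": positions[index + length - 1],  # last SNP position in the run
            "n_snps": length,                      # SNPs supporting the segment
            "state": state,                        # inferred hidden state
        })
        index += length
    return segments


def count_haplotype_shifts(segments):
    """Each boundary between adjacent segments is one haplotype shift."""
    return max(0, len(segments) - 1)
```

With positions `[100, 200, 300, 400, 500]` and states `[1, 1, 2, 2, 2]`, this gives two segments (100 to 200 and 300 to 500) and one haplotype shift, which is a candidate crossover.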
## Usage
```
Usage:
sscocaller [options] <BAM> <VCF> <barcodeFile> <out_prefix>
Options:
-t --threads <threads> number of BAM decompression threads [default: 4]
-MQ --minMAPQ <mapq> Minimum MAPQ for read filtering [default: 20]
-BQ --baseq <baseq> base quality threshold for a base to be used for counting [default: 13]
-CHR --chrom <chrom> the selected chromsome (whole genome if not supplied,separate by comma if multiple chroms)
-minDP --minDP <minDP> the minimum DP for a SNP to be included in the output file [default: 1]
-maxDP --maxDP <maxDP> the maximum DP for a SNP to be included in the output file [default: 10]
-chrName --chrName <chrName> the chr names with chr prefix or not, if not supplied then no prefix
-thetaREF --thetaREF <thetaREF> the theta for the binomial distribution conditioning on hidden state being REF [default: 0.1]
-thetaALT --thetaALT <thetaALT> the theta for the binomial distribution conditioning on hidden state being ALT [default: 0.9]
-cmPmb --cmPmb <cmPmb> the average centiMorgan distances per megabases default 0.1 cm per Mb [default 0.1]
-h --help show help
Examples
./sscocaller --threads 10 AAAGTAGCACGTCTCT-1.raw.bam AAAGTAGCACGTCTCT-1.raw.bam.dp3.alt.vcf.gz barcodeFile.tsv ./percell/ccsnp-
Usage:
sscocaller [options] <BAM> <VCF> <barcodeFile> <out_prefix>
Options:
-t --threads <threads> number of BAM decompression threads [default: 4]
-MQ --minMAPQ <mapq> Minimum MAPQ for read filtering [default: 20]
-BQ --baseq <baseq> base quality threshold for a base to be used for counting [default: 13]
-CHR --chrom <chrom> the selected chromsome (whole genome if not supplied,separate by comma if multiple chroms)
-minDP --minDP <minDP> the minimum DP for a SNP to be included in the output file [default: 1]
-maxDP --maxDP <maxDP> the maximum DP for a SNP to be included in the output file [default: 5]
-maxTotalDP --maxTotalDP <maxTotalDP> the maximum DP across all barcodes for a SNP to be included in the output file [default: 25]
-minTotalDP --minTotalDP <minTotalDP> the minimum DP across all barcodes for a SNP to be included in the output file [default: 10]
-chrName --chrName <chrName> the chr names with chr prefix or not, if not supplied then no prefix
-thetaREF --thetaREF <thetaREF> the theta for the binomial distribution conditioning on hidden state being REF [default: 0.1]
-thetaALT --thetaALT <thetaALT> the theta for the binomial distribution conditioning on hidden state being ALT [default: 0.9]
-cmPmb --cmPmb <cmPmb> the average centiMorgan distances per megabases default 0.1 cm per Mb [default 0.1]
-h --help show help
Examples
./sscocaller --threads 10 AAAGTAGCACGTCTCT-1.raw.bam AAAGTAGCACGTCTCT-1.raw.bam.dp3.alt.vcf.gz barcodeFile.tsv ./percell/ccsnp-
```
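
When running `sscocaller` over many samples from a script, the command line can be assembled programmatically. The sketch below is one way to do that in Python, using only the positional arguments and the `--threads`, `--minMAPQ` and `--chrom` flags listed in the usage text above. The helper name `build_sscocaller_command` and its keyword arguments are hypothetical, and the file names are taken from the example in the usage text.

```python
import shlex


def build_sscocaller_command(bam, vcf, barcode_file, out_prefix,
                             executable="./sscocaller", threads=None,
                             min_mapq=None, chroms=None):
    """Build the argument list for one sscocaller run (does not execute it)."""
    cmd = [executable]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    if min_mapq is not None:
        cmd += ["--minMAPQ", str(min_mapq)]
    if chroms:
        # Multiple chromosomes are passed as one comma-separated value.
        cmd += ["--chrom", ",".join(chroms)]
    # Positional arguments follow the options: <BAM> <VCF> <barcodeFile> <out_prefix>
    cmd += [bam, vcf, barcode_file, out_prefix]
    return cmd


cmd = build_sscocaller_command(
    "AAAGTAGCACGTCTCT-1.raw.bam",
    "AAAGTAGCACGTCTCT-1.raw.bam.dp3.alt.vcf.gz",
    "barcodeFile.tsv",
    "./percell/ccsnp-",
    threads=10,
)
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to launch the run. Keeping the arguments as a list avoids shell quoting problems with unusual file names.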