diff --git a/README.md b/README.md
index 1d6422752ca95be7205da26cab787a1770aab4bd..f4a40bf387595d832ae47756f32585ee2acb4da5 100644
--- a/README.md
+++ b/README.md
@@ -1,13 +1,19 @@
 ## sscocaller: Calling crossovers from single-sperm DNA sequencing reads
-`sscocaller` processes DNA reads from each single sperm in the aligned and sorted BAM file for detecting crossovers by identifying haplotype shifts. It takes the large bam file which contains aligned DNA reads from a list of single sperm cells and summarizes allele counts for the provided informative SNP markers. A HMM model is applied for haplotyping each sperm and viterbi algorithm is run for deriving the inferred haplotype sequence against the list of SNP markers.
+`sscocaller` processes DNA reads from each single sperm in the aligned and
+sorted BAM file for inferring the haplotypes of single sperm genomes that can
+later be used for calling crossovers by identifying haplotype shifts (see [comapr](https://github.com/ruqianl/comapr)).
+It takes the large bam file which contains all aligned DNA reads from sperm cells and
+summarizes allele counts for the provided informative SNP markers. While counting
+the alleles, the Viterbi algorithm is implemented for finding the haplotype
+sequence for the list of SNP markers.
 ## Inputs
 - Bam, sorted and index bam file which contains DNA reads of single sperm cells with CB tag, eg. from single-cell alignment pipeline (cellranger)
-- VCF, variant calling file that contains the list of informative SNPs provided
+- VCF, variant call file that contains the list of informative SNPs
 - barcodeFile, the list of cell barcodes of the sperm cells
@@ -19,31 +25,35 @@
 * {sample}_chr1_totalCount.mtx, a sparse mtx with entries representing total allele counts
 * {sample}_chr1_vi.mtx, a sparse mtx with entries representing inferred viterbi state (haplotype state)
 * {sample}_chr1_snpAnnot.txt, the SNP positions and allele
-* {sample}_chr1_SegInfo.txt, statistics of viterbi state segments in text file format. It contains consecutive viterbi states for each chromosome with statistics including, starting SNP position, ending SNP position, the number of SNPs supporting the segment, the log likelihood ratio of the viterbi segment and the inferred hidden state.
+* {sample}_chr1_viSegInfo.txt, statistics of viterbi state segments in text file format. It contains consecutive viterbi states for each chromosome with statistics including, starting SNP position, ending SNP position, the number of SNPs supporting the segment, the log likelihood ratio of the viterbi segment and the inferred hidden state.
 ## Usage
 ```
-Usage:
-  sscocaller [options] <BAM> <VCF> <barcodeFile> <out_prefix>
-
-
-Options:
-  -t --threads <threads>  number of BAM decompression threads [default: 4]
-  -MQ --minMAPQ <mapq>  Minimum MAPQ for read filtering [default: 20]
-  -BQ --baseq <baseq>  base quality threshold for a base to be used for counting [default: 13]
-  -CHR --chrom <chrom>  the selected chromsome (whole genome if not supplied,separate by comma if multiple chroms)
-  -minDP --minDP <minDP>  the minimum DP for a SNP to be included in the output file [default: 1]
-  -maxDP --maxDP <maxDP>  the maximum DP for a SNP to be included in the output file [default: 10]
-  -chrName --chrName <chrName>  the chr names with chr prefix or not, if not supplied then no prefix
-  -thetaREF --thetaREF <thetaREF>  the theta for the binomial distribution conditioning on hidden state being REF [default: 0.1]
-  -thetaALT --thetaALT <thetaALT>  the theta for the binomial distribution conditioning on hidden state being ALT [default: 0.9]
-  -cmPmb --cmPmb <cmPmb>  the average centiMorgan distances per megabases default 0.1 cm per Mb [default 0.1]
-  -h --help  show help
-
-Examples
-    ./sscocaller --threads 10 AAAGTAGCACGTCTCT-1.raw.bam AAAGTAGCACGTCTCT-1.raw.bam.dp3.alt.vcf.gz barcodeFile.tsv ./percell/ccsnp-
+
+  Usage:
+      sscocaller [options] <BAM> <VCF> <barcodeFile> <out_prefix>
+
+  Options:
+    -t --threads <threads>  number of BAM decompression threads [default: 4]
+    -MQ --minMAPQ <mapq>  Minimum MAPQ for read filtering [default: 20]
+    -BQ --baseq <baseq>  base quality threshold for a base to be used for counting [default: 13]
+    -CHR --chrom <chrom>  the selected chromsome (whole genome if not supplied,separate by comma if multiple chroms)
+    -minDP --minDP <minDP>  the minimum DP for a SNP to be included in the output file [default: 1]
+    -maxDP --maxDP <maxDP>  the maximum DP for a SNP to be included in the output file [default: 5]
+    -maxTotalDP --maxTotalDP <maxTotalDP>  the maximum DP across all barcodes for a SNP to be included in the output file [default: 25]
+    -minTotalDP --minTotalDP <minTotalDP>  the minimum DP across all barcodes for a SNP to be included in the output file [default: 10]
+    -chrName --chrName <chrName>  the chr names with chr prefix or not, if not supplied then no prefix
+    -thetaREF --thetaREF <thetaREF>  the theta for the binomial distribution conditioning on hidden state being REF [default: 0.1]
+    -thetaALT --thetaALT <thetaALT>  the theta for the binomial distribution conditioning on hidden state being ALT [default: 0.9]
+    -cmPmb --cmPmb <cmPmb>  the average centiMorgan distances per megabases default 0.1 cm per Mb [default 0.1]
+    -h --help  show help
+
+  Examples
+      ./sscocaller --threads 10 AAAGTAGCACGTCTCT-1.raw.bam AAAGTAGCACGTCTCT-1.raw.bam.dp3.alt.vcf.gz barcodeFile.tsv ./percell/ccsnp-
+
+
 ```
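
Review note: the README describes the haplotyping step only in words, so a small illustration may help readers see how `thetaREF`, `thetaALT` and `cmPmb` could interact. The following is a minimal Python sketch of a two-state Viterbi pass over per-SNP allele counts for one cell. It is not taken from the sscocaller source. The function names, the uniform start probability, and the use of the Haldane map function to turn the distance between SNPs into a switch probability are all assumptions made for illustration. Only the binomial emission model and the default values 0.1, 0.9 and 0.1 cM per Mb come from the README text.

```python
import math


def log_binom_pmf(k, n, p):
    """Log binomial probability of k ALT reads out of n total reads."""
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_coeff + k * math.log(p) + (n - k) * math.log(1.0 - p)


def viterbi_haplotype(positions, alt_counts, total_counts,
                      theta_ref=0.1, theta_alt=0.9, cm_per_mb=0.1):
    """Return the most likely state path (0 = REF, 1 = ALT) for one cell.

    Hypothetical helper: the transition model below (Haldane map function,
    uniform start probability) is an assumption, not sscocaller's code.
    """
    thetas = (theta_ref, theta_alt)
    if not positions:
        return []
    score = [math.log(0.5) + log_binom_pmf(alt_counts[0], total_counts[0], t)
             for t in thetas]
    back = []
    for i in range(1, len(positions)):
        # Genetic distance in Morgans: bp -> Mb -> cM -> Morgans.
        morgans = (positions[i] - positions[i - 1]) * 1e-6 * cm_per_mb * 0.01
        # Haldane map function; the floor avoids log(0) for zero distance.
        switch = max(-0.5 * math.expm1(-2.0 * morgans), 1e-300)
        log_switch, log_stay = math.log(switch), math.log1p(-switch)
        new_score, pointers = [], []
        for s in (0, 1):
            stay = score[s] + log_stay
            move = score[1 - s] + log_switch
            best, prev = (stay, s) if stay >= move else (move, 1 - s)
            emission = log_binom_pmf(alt_counts[i], total_counts[i], thetas[s])
            new_score.append(best + emission)
            pointers.append(prev)
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def collapse_segments(positions, path):
    """Merge runs of equal states into (start, end, n_snps, state) tuples."""
    segments = []
    for pos, state in zip(positions, path):
        if segments and segments[-1][3] == state:
            start, _, n_snps, _ = segments[-1]
            segments[-1] = (start, pos, n_snps + 1, state)
        else:
            segments.append((pos, pos, 1, state))
    return segments
```

A usage example with made-up counts: calling `viterbi_haplotype([100_000, 200_000, 300_000, 1_300_000, 1_400_000, 1_500_000], [0, 0, 0, 2, 2, 2], [2, 2, 2, 2, 2, 2])` and passing the result to `collapse_segments` gives one REF segment followed by one ALT segment, which is the kind of haplotype shift the `viSegInfo.txt` description refers to. The log likelihood ratio column mentioned there is not computed in this sketch.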