Commit cf485f19 authored by Davis McCarthy

Updating snakemake workflow

parent 333a02ca
......@@ -42,17 +42,24 @@ We have two 1.5 hour sessions to work on single-cell methylation. Broadly, we wi
We will manage the data processing and analysis workflow using [Snakemake](http://snakemake.readthedocs.io/en/stable/). We will analyze our results in RStudio, using an [R Markdown](http://rmarkdown.rstudio.com) Notebook (see the `notebooks` folder in this repository for an example.)
## Data
## Data and references
The aim will be for you to analyze the data you generate during the course in Heidelberg.
However, in case that data is unavailable for any reason and to have an alternative dataset that is processed and ready for analysis, we also have access to a small dataset from Stephen Clark and colleagues at the Babraham Institute, Cambridge. This dataset consists of 15 cells from mouse embryos.
### Reference files
### Raw data
1. Raw `fastq` files are available at this [link](https://www.dropbox.com/sh/1wy3gw7fpil73dd/AADIOGvbsYNdt45KnaHahmqqa?dl=0) (6GB; password required, which will be shared on the course Slack channel). Only if you want to work from raw `fastq` files (substantial computation needed) and have a high-bandwidth connection, download the files at the link and save to `data/fastq`.
1. Raw `fastq` files for a "test" dataset (sampling 500,000 reads from each of
the above `fastq` files), smaller in size so a little more convenient, are
available at this [link](https://www.dropbox.com/sh/s0dmlgg0cmxak9y/AAAC4NK_Bz2rSN7kYJfJcloRa?dl=0)
(210MB; password required).
### Intermediate results files
1. Merged `Bismark` files are available at this [link](https://www.dropbox.com/sh/b3v55pdkkimo13s/AAA4gH-6uCxMqFSbFM72rwLna?dl=0) (76MB; password required). Download and copy these to `data/bismark/merged`.
1. Summarized, annotated methylation results that we will use for analysis are
available in the results folder of this repository (we will generate these
......
......@@ -15,22 +15,16 @@ import glob
import os
import re
TEST = False
if TEST:
SAMPLES_LONG = glob.glob('data/fastq/test/*.fastq.gz')
SAMPLES = [os.path.basename(w).replace('lane[123]+_', '') for w in SAMPLES_LONG]
SAMPLES = [w.replace('.fastq.gz', '') for w in SAMPLES]
SAMPLES_LONG = [os.path.basename(w).replace('.fastq.gz', '') for w in SAMPLES_LONG]
SAMPLES_MERGE = [w.replace('_R1', '').replace('_R2', '') for w in SAMPLES]
list(set(SAMPLES_MERGE))
else:
SAMPLES_LONG = glob.glob('data/fastq/*.fastq.gz')
SAMPLES = [w.replace('lane[123]+_', '') for w in SAMPLES_LONG]
SAMPLES = [w.replace('_[ATCG]+_.*.fastq.gz', '') for w in SAMPLES]
SAMPLES_LONG = [os.path.basename(w).replace('.fastq.gz', '') for w in SAMPLES_LONG]
SAMPLES_MERGE = [w.replace('_R1', '').replace('_R2', '') for w in SAMPLES]
SAMPLES_MERGE = list(set(SAMPLES_MERGE))
SAMPLES_LONG = glob.glob('data/fastq/*.fastq.gz')
SAMPLES_LONG = [re.sub('.*lane[123]+_', '', w) for w in SAMPLES_LONG]
SAMPLES = [re.sub('.*lane[123]+_', '', w) for w in SAMPLES_LONG]
SAMPLES = [re.sub('_[ATCG]+_.*_(R[12]).fastq.gz', '_\\1', w) for w in SAMPLES]
SAMPLES_LONG = [os.path.basename(w).replace('.fastq.gz', '') for w in SAMPLES_LONG]
SAMPLES_MERGE = [w.replace('_R1', '').replace('_R2', '') for w in SAMPLES]
SAMPLES_MERGE = list(set(SAMPLES_MERGE))
print(SAMPLES)
print(SAMPLES_LONG)
fastqc_html_reports = expand('reports/fastqc/{sample}_fastqc.html', sample = SAMPLES_LONG)
......@@ -44,7 +38,6 @@ rule all:
rule fastqc_reports:
input:
'data/fastq/{sample}.fastq.gz'
## if test = False, remove test/ from path above
output:
'reports/fastqc/{sample}_fastqc.html'
params:
......@@ -58,7 +51,6 @@ rule fastqc_reports:
rule trim_fastq:
input:
'data/fastq/{sample}.fastq.gz'
## if test = False, remove test/ from path above
output:
temp('{sample}_trimmed.fq.gz'),
temp('{sample}.fastq.gz_trimming_report.txt')
......
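
The sample-name hunk above switches from `str.replace` to `re.sub`. `str.replace` matches its first argument literally, so a pattern such as `'lane[123]+_'` is never found in a file name and the call changes nothing. The sketch below illustrates the difference using the same patterns as the diff. The file names are hypothetical, invented to fit those patterns, and the helper functions `literal_strip`, `sample_name`, and `merged_name` are illustrative names that do not appear in the Snakefile. The dots in `.fastq.gz` are also escaped here, which is slightly stricter than the pattern in the commit.

```python
import re

# Hypothetical file names, invented for illustration only; the real
# naming scheme of the course data may differ.
fastq_paths = [
    'data/fastq/lane1_E1_ACGTAC_L001_R1.fastq.gz',
    'data/fastq/lane1_E1_ACGTAC_L001_R2.fastq.gz',
    'data/fastq/lane2_E2_TTGCAA_L001_R1.fastq.gz',
    'data/fastq/lane2_E2_TTGCAA_L001_R2.fastq.gz',
]


def literal_strip(path):
    # str.replace looks for the literal text 'lane[123]+_', which never
    # occurs in a file name, so the path comes back unchanged.
    return path.replace('lane[123]+_', '')


def sample_name(path):
    # Drop everything up to and including the lane prefix.
    name = re.sub(r'.*lane[123]+_', '', path)
    # Drop the barcode and anything after it, keeping only the read tag.
    return re.sub(r'_[ATCG]+_.*_(R[12])\.fastq\.gz', r'_\1', name)


def merged_name(sample):
    # Collapse the two reads of a pair onto one sample name.
    return sample.replace('_R1', '').replace('_R2', '')


samples = [sample_name(p) for p in fastq_paths]
# sorted() makes the order deterministic; a bare set has no fixed order.
samples_merge = sorted(set(merged_name(s) for s in samples))

print(samples)
print(samples_merge)
```

With these inputs, `literal_strip` returns each path untouched, whereas `sample_name` yields one name per read file and `samples_merge` one name per cell.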