Commit c57ac083 authored by Christina Azodi

First init

parent b1f309cd
File mode changed from 100755 to 100644
FROM nfcore/base
LABEL authors="davismcc@gmail.com" \
maintainer="Davis McCarthy <davismcc@gmail.com>" \
description="Docker image containing all requirements for AAAA_2019_Project-Template"
description="Docker image containing all requirements for EUUI_2019_sceQTL-Workflow"
RUN apt-get update && \
apt-get -y upgrade && \
......
File mode changed from 100755 to 100644
# Project AAAA_2019_Project-Template
# Project EUUI_2019_sceQTL-Workflow
## Project overview
......@@ -26,13 +26,7 @@ c(sample(LETTERS, 1), sample(c("A","E","I","O","U"), 1), sample(LETTERS, 2))
### Update template for new project
To setup the new project, the project ID should be changed in the following files:
* `cluster.json` (a file that defines defaults for running Snakemake on a cluster)
* `Dockerfile` (a file that defines a Docker container with necessary software installed)
* `environment.yml` (a file that defines the conda environment to use)
* `analysis/_site.yml` (a YAML file with parameters for constructing the website with workflowr)
* `envs/myenvs.yml` (a conda environment definition file)
* `org/project_management.org` (an org-mode file for managing the project)
Specify the way to cite the project in the `CITATION` file. This will likely need updating over the course of the project.
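
The renaming step described above can be scripted. The following is a minimal sketch, not part of the template: the helper name `rename_project` is hypothetical, and it assumes the project IDs contain only letters, digits, `_` and `-`.

```shell
# Replace the template project ID with the new one in each listed file.
# Assumption: the IDs contain only letters, digits, '_' and '-',
# so they are safe to use unescaped inside a sed pattern.
rename_project() {
    old_id="$1"
    new_id="$2"
    shift 2
    for f in "$@"; do
        [ -f "$f" ] || { echo "skipping missing file: $f" >&2; continue; }
        # Write to a temporary file first; in-place flags differ between sed versions.
        sed "s/${old_id}/${new_id}/g" "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    done
}

# Example call (run from the project root):
# rename_project AAAA_2019_Project-Template EUUI_2019_sceQTL-Workflow \
#     cluster.json Dockerfile environment.yml analysis/_site.yml \
#     envs/myenvs.yml org/project_management.org
```

Files that do not exist are skipped with a warning, so the same call works on templates that lack some of the listed files.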
......@@ -80,6 +74,6 @@ Workflow management software makes a very big difference when trying to run comp
## Acknowledgements
This project is a [workflowr][] project. Making use of the workflowr package for reproducible analyses dictates certain structures for the project file.
This project is a [workflowr][] project. Making use of the workflowr package for reproducible analyses dictates certain structures for the project file.
[workflowr]: https://github.com/jdblischak/workflowr
File mode changed from 100755 to 100644
name: "AAAA_2019_Project-Template"
name: "EUUI_2019_sceQTL-Workflow"
output_dir: "../docs"
navbar:
title: "AAAA_2019_Project-Template"
title: "EUUI_2019_sceQTL-Workflow"
left:
- text: "Home"
href: index.html
......@@ -11,7 +11,7 @@ navbar:
href: license.html
right:
- icon: fa-github
href: https://github.com/davismcc/AAAA_2019_Project-Template
href: https://github.com/davismcc/EUUI_2019_sceQTL-Workflow
output:
workflowr::wflow_html:
toc: true
......
File mode changed from 100755 to 100644
---
title: "Human dermal fibroblast clonality project"
author: "Davis J. McCarthy"
title: "single-cell expression QTL Workflow project"
author: "Christina B Azodi"
site: workflowr::wflow_site
output:
workflowr::wflow_html:
......@@ -9,9 +9,7 @@ output:
## Project overview
This project investigates clonality in human dermal fibroblast cell populations
in 32 cell lines from distinct donors, using bulk whole-exome sequencing and
single-cell RNA-sequencing data.
This project develops a pipeline for eQTL analysis using single-cell sequencing data from human dermal fibroblast cell populations and human induced pluripotent stem cells at different timepoints during differentiation.
**Key findings:**
......
File mode changed from 100755 to 100644
......@@ -4,7 +4,7 @@
"memory" : "8000",
"n" : 1,
"queue" : "research",
"name" : "AAAA_2019_Project-Template.{rule}.{wildcards}",
"name" : "EUUI_2019_sceQTL-Workflow.{rule}.{wildcards}",
"output" : "logs/{rule}.out",
"error" : "logs/{rule}.err"
}
......
File mode changed from 100755 to 100644
#!/bin/bash
# FUNCTION: Preprocess single cell data
# USAGE: sh sc_preprocess.sh </PATH/TO/RAW/FASTQ/FILES> </PATH/TO/OUTPUT>
# EXAMPLE: sc_preprocess.sh /mnt/mcscratch/cazodi/Datasets/sc_fibro/
# /mnt/mcfiles/cazodi/Projects/EUUI_2019_sceQTL-Workflow/output/00_sc-preprocessing/
# This script accepts the directory where raw fastq files are located and performs the following:
# 1. Read quality control (FastQC)
# 2. Summary of read quality control (MultiQC)
# 3. Quality and adaptor trimming (Trim_Galore!)
# 4. Trimmed read quality control (FastQC)
# 5. Summary of trimmed read quality control (MultiQC)
WKDIR=/mnt/mcfiles/cazodi/Projects/EUUI_2019_sceQTL-Workflow/output/00_sc-preprocessing/
# Read input arguments
dir_in=$1
dir_out=$2
name=$(basename "$dir_in")
echo "$dir_in"
echo "$dir_out"
echo "$name"
# 1. Read quality control (FastQC)
## Fibro
#for i in ERR*fastq.gz; do fastqc -f fastq -out /mnt/mcfiles/cazodi/projects/01_sceQTL/00_ReadQC $i; done
## iPSC
# Trim_galore!
## Fibro
## iPSC
# FastQC on trimmed reads
## Fibro
## iPSC
# Salmon
## Fibro
## iPSC
# Scater
## Fibro
## iPSC
# Scran
## Fibro
## iPSC
\ No newline at end of file
# preprocess sc data
# FastQC on raw reads
## Fibro
for i in ERR*fastq.gz; do fastqc -f fastq -out /mnt/mcfiles/cazodi/projects/01_sceQTL/00_ReadQC $i; done
## iPSC
# Trim_galore!
## Fibro
## iPSC
# FastQC on trimmed reads
## Fibro
## iPSC
# Salmon
## Fibro
## iPSC
# Scater
## Fibro
## iPSC
# Scran
## Fibro
## iPSC
\ No newline at end of file
# Data
Save raw data files here.
Note: raw data files are saved in the scratch directory to save space:
/mnt/mcscratch/cazodi/Datasets/geno/
/mnt/mcscratch/cazodi/Datasets/sc_fib/
/mnt/mcscratch/cazodi/Datasets/sc_iPSC/
name: AAAA_2019_Project-Template
name: EUUI_2019_sceQTL-Workflow
channels:
- defaults
- bioconda
......
name: AAAA_2019_Project-Template
name: EUUI_2019_sceQTL-Workflow
channels:
- defaults
- bioconda
......
File mode changed from 100755 to 100644
# Methods for eQTL analysis from Single Cell RNA-Seq
## Data
### Install fast ENA bulk download software
#### 1. Download and install aspera:
```
wget https://download.asperasoft.com/download/sw/connect/3.9.7/ibm-aspera-connect-3.9.7.175481-linux-g2.12-64.tar.gz
tar zxvf ibm-aspera-connect-3.9.7.175481-linux-g2.12-64.tar.gz
./ibm-aspera-connect-3.9.7.175481-linux-g2.12-64.sh
```
#### 2. Get accessions to download and generate ascp run files
**Fibroblast data: https://www.ebi.ac.uk/ena/data/view/PRJEB28691
iPSC data: https://www.ebi.ac.uk/ena/data/view/PRJEB14362
genotype data: http://www.hipsci.org/lines/#/files (ENA: https://www.ebi.ac.uk/ena/data/view/PRJEB11750)**
**Fibroblast data (E-MTAB-7167, from the Cardelino paper (McCarthy et al.))**
1. Download text file from ENA
2. Convert to wget download file:
```awk 'FS="\t", OFS="\t" { gsub("ftp.sra.ebi.ac.uk", "ftp://ftp.sra.ebi.ac.uk"); print }' PRJEB28691.txt | cut -f5 | awk -F ";" 'OFS="\n" {print $1, $2}' | awk NF | awk 'NR > 1, OFS="\n" {print "/usr/bin/wget" " " $1}' > download.txt```
3. Run using tmux:
```
tmux new -s fibro # Running on mc01
bash download.txt
```
4. Check downloads complete:
```
python /mnt/mcfiles/cazodi/github/Utilities/check_download.py -run download.txt -col 1 -delim '/' -sep ' '
```
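
The conversion in step 2 can also be written as a single awk call, which makes each stage easier to follow. This is a sketch under assumptions: the function name `ena_to_wget` is hypothetical, the report is assumed to be tab-separated with one header row, and the FTP column is assumed to hold `;`-separated paths without a scheme.

```shell
# Turn the FTP column of an ENA file report (TSV on stdin) into wget commands.
# $1 = 1-based index of the column holding ';'-separated FTP paths
#      (5 for the fibroblast report in the steps above).
ena_to_wget() {
    col="$1"
    awk -F '\t' -v col="$col" '
        NR == 1 { next }                    # skip the header row
        {
            n = split($col, urls, ";")      # a run may list more than one file
            for (i = 1; i <= n; i++) {
                if (urls[i] == "") continue # ignore empty entries
                # add the scheme that the ENA report leaves out
                sub(/^ftp\.sra\.ebi\.ac\.uk/, "ftp://ftp.sra.ebi.ac.uk", urls[i])
                print "/usr/bin/wget " urls[i]
            }
        }'
}

# Example call:
# ena_to_wget 5 < PRJEB28691.txt > download.txt
```

Passing the column index as an argument lets the same function serve the genotype report, where the steps above read column 6 instead of 5.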
**Downloading genotype data**
1. Download text file from ENA: Study accession, Analysis accession, Submitted files (FTP), Submitter's analysis name, Sample accession.
2. Convert into bash run file:
```awk 'FS="\t", OFS="\t" { gsub("ftp.sra.ebi.ac.uk", "ftp://ftp.sra.ebi.ac.uk"); print }' PRJEB11750.txt | cut -f6 | awk -F ";" 'OFS="\n" {print $1, $2}' | awk NF | awk 'NR > 1, OFS="\n" {print "/usr/bin/wget" " " $1}' > download.txt```
3. Select only genotype files (i.e. .vcf)
grep 'genotypes.vcf.gz' download.txt > download_vcf.txt
**iPSC data**
1. Download text file from ENA
2. Convert to wget download file:
```awk 'FS="\t", OFS="\t" { gsub("ftp.sra.ebi.ac.uk", "ftp://ftp.sra.ebi.ac.uk"); print }' PRJEB14362.txt | cut -f5 | awk -F ";" 'OFS="\n" {print $1, $2}' | awk NF | awk 'NR > 1, OFS="\n" {print "/usr/bin/wget" " " $1}' > download.txt```
3. Run using tmux:
```
tmux new -s fibro_ipsc # Running on mc01
bash download.txt
```
4. Check downloads complete:
```
python /mnt/mcfiles/cazodi/github/Utilities/check_download.py -run download.txt -col 1 -delim '/' -sep ' '
```
*Trying with aspera... never got it to work... couldn't connect TCP for SSH:*
```
awk 'FS="\t", OFS="\t" { gsub("ftp.sra.ebi.ac.uk", "era-fasp@fasp.sra.ebi.ac.uk:"); print }' accessions.txt | cut -f3 | awk -F ";" 'OFS="\n" {print $1, $2}' | awk NF | awk 'NR > 1, OFS="\n" {print "$HOME/.aspera/connect/bin/ascp -QT -l 300m -P33001 -i $HOME/.aspera/connect/etc/asperaweb_id_dsa.openssh" " " $1 " ."}' > download.txt
```
## Data Pre-processing
### Read Quality Control using [FastQC](https://raw.githubusercontent.com/s-andrews/FastQC/master/INSTALL.txt)
```
conda install -c bioconda fastqc
for i in ERR*fastq.gz; do fastqc -f fastq -out /mnt/mcfiles/cazodi/projects/01_sceQTL/00_ReadQC $i; done
```
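
A slightly more defensive version of the loop above is sketched here. The function name `run_fastqc` and the `FASTQC` override variable are hypothetical additions; `-f` and `-o` are the short forms of FastQC's `--format` and `--outdir` options.

```shell
# Run FastQC on every file given, writing reports to one output directory.
# $1 = output directory (created if missing); remaining arguments = FASTQ files.
# FASTQC may be set to another executable, e.g. "echo" for a dry run.
run_fastqc() {
    outdir="$1"
    shift
    mkdir -p "$outdir"    # FastQC expects the output directory to exist
    for fq in "$@"; do
        # Quote the file name so paths containing spaces survive
        ${FASTQC:-fastqc} -f fastq -o "$outdir" "$fq"
    done
}

# Example call:
# run_fastqc /path/to/00_ReadQC ERR*fastq.gz
```

Setting `FASTQC=echo` prints the commands instead of running them, which is a cheap way to check the file list before starting a long job.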
Merge FastQC results:
```
chmod +x /mnt/mcfiles/cazodi/github/Utilities/MultiQC.sh
/mnt/mcfiles/cazodi/github/Utilities/MultiQC.sh
```
### Trim reads using [TrimGalore!](https://github.com/FelixKrueger/TrimGalore/blob/master/Docs/Trim_Galore_User_Guide.md)
*Install*
```
conda install -c bioconda cutadapt # install dependency
fastqc -v # check dependency
cutadapt --version # check dependency
curl -fsSL https://github.com/FelixKrueger/TrimGalore/archive/0.6.5.tar.gz -o trim_galore.tar.gz
tar xvzf trim_galore.tar.gz
~/TrimGalore-0.6.5/trim_galore
```
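
Once installed, trimming could be run over all read pairs roughly as follows. This is only a sketch: the function name `trim_pairs` and the `TRIM_GALORE` override are hypothetical, and the `_1.fastq.gz` / `_2.fastq.gz` mate naming is an assumption about the downloaded files. `--paired` and `-o` are Trim Galore's options for paired-end input and output directory.

```shell
# Trim each read pair found in a directory with Trim Galore.
# Assumption: mates are named <run>_1.fastq.gz and <run>_2.fastq.gz.
# TRIM_GALORE may point at the unpacked script or be set to "echo" for a dry run.
trim_pairs() {
    dir_in="$1"
    dir_out="$2"
    mkdir -p "$dir_out"
    for r1 in "$dir_in"/*_1.fastq.gz; do
        [ -e "$r1" ] || continue              # the glob matched nothing
        r2="${r1%_1.fastq.gz}_2.fastq.gz"     # derive the mate's file name
        [ -e "$r2" ] || { echo "no mate for $r1" >&2; continue; }
        ${TRIM_GALORE:-trim_galore} --paired -o "$dir_out" "$r1" "$r2"
    done
}

# Example call:
# trim_pairs /path/to/raw/fastq /path/to/trimmed
```

Runs without a second mate are reported and skipped rather than passed to Trim Galore in paired mode.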
#+TITLE: AAAA_2019_Project-Template
#+TITLE: EUUI_2019_sceQTL-Workflow
* TODO Organise roadmap for project [2019-2-2 Sat]
* TODO Organise roadmap for project [2019-11-26 Tues]
* Kanban
| Backlog | Waiting On | Planned | Doing | Done |
|-----------------+------------+---------+-------+------|
| [[Update template]] | | | | |
| [[Reproducibility]] | | | | |
| | | | | |
| | | | [[Download sc geno data]] | |
| | [[Preprocessing]] | | | |
| [[Barcode counts]] | | | | |
| [[Normalizing]] | | | | |
| [[Demultiplexing]] | | | | |
| | | | | |
* [#C] Update template <<Update template>> [0%]
......
File mode changed from 100755 to 100644