Ruqian Lyu authored 2bf1d423

MIG scRNA-seq Workshop 2019: About the course

Today it is possible to obtain genome-wide transcriptome data from single cells using high-throughput sequencing (scRNA-seq). The cellular resolution and genome-wide scope of scRNA-seq make it possible to address questions that are intractable with other methods, such as bulk RNA-seq or single-cell RT-qPCR. However, the scale and complexity of scRNA-seq datasets pose many challenges, and novel methods are often required to account for the particular characteristics of the data.

In this course we will discuss some of the questions that can be addressed using scRNA-seq as well as the available computational and statistical methods. We will cover key features of the technology platforms and fundamental principles of scRNA-seq data analysis that are transferable across technologies and analysis workflows. The number of computational tools is already vast and increasing rapidly, so we provide hands-on workflows using some of our favourite tools on carefully selected, biologically relevant example datasets.

Across two days, attendees can expect to gain an understanding of, and practical analysis experience with, approaches to quality control, data normalisation, visualisation, clustering, trajectory (pseudotime) inference, differential expression, batch correction and data integration.

Course outline:

  • Day 1:

    • Morning session 1: Workshop overview; introduction to scRNA-seq; pre-processing scRNA-seq data
    • Morning session 2: Quality control, visualisation and exploratory data analysis
    • Afternoon session 1: Normalisation, confounders and batch correction
    • Afternoon session 2: Latent spaces, clustering and cell annotation
  • Day 2:

    • Morning session 1: Trajectory inference
    • Morning session 2: Differential expression; data imputation
    • Afternoon session 1: Combining datasets and data integration
    • Afternoon session 2: Case studies

This course has been adapted from a course taught through the University of Cambridge Bioinformatics training unit, but the material is meant for anyone interested in learning about computational analysis of scRNA-seq data and is updated roughly twice per year.

The number of computational tools is increasing rapidly and we are doing our best to keep up to date with what is available. One of the main constraints for this course is that we would like to use tools that are implemented in R and that run reasonably fast. Moreover, we will also confess to being somewhat biased towards methods that have been developed either by us or by our friends and colleagues.

Web page

Access the formatted, navigable HTML version of the course here:

https://biocellgen-public.svi.edu.au/mig_2019_scrnaseq-workshop/public/index.html

Expression of interest

If you are interested in attending the workshop, please complete this expression of interest survey. Gathering expressions of interest allows us to tailor the material to the interests of the participants and to ensure that participants have the programming experience needed to benefit fully from the workshop.

Registration

Personalised registration links will be sent to those who complete the expression of interest form linked to above.

GitLab

The source code and materials for the course are available at the SVI Bioinformatics and Cellular Genomics Lab's GitLab:

https://gitlab.svi.edu.au/biocellgen-public/mig_2019_scrnaseq-workshop

Time and place

Wednesday 2 October and Thursday 3 October, 09:00-17:00

Room 256, Arts West Building, the University of Melbourne

For a map and directions, please see here: https://maps.unimelb.edu.au/parkville/building/148a

Schedule

  • Day 1:

    • 09:00-09:15 - Welcome: Introduction and workshop overview
    • 09:15-10:30 - Morning session 1: Introduction to scRNA-seq; pre-processing scRNA-seq data
    • 10:30-11:00 - Morning tea break
    • 11:00-12:30 - Morning session 2: Quality control, visualisation and exploratory data analysis
    • 12:30-13:30 - Lunch break
    • 13:30-15:00 - Afternoon session 1: Normalisation, confounders and batch correction
    • 15:00-15:30 - Afternoon tea break
    • 15:30-17:00 - Afternoon session 2: Latent spaces, clustering and cell annotation
  • Day 2:

    • 09:00-09:15 - Welcome back: recap and overview of Day 2
    • 09:15-10:30 - Morning session 1: Trajectory inference
    • 10:30-11:00 - Morning tea break
    • 11:00-12:30 - Morning session 2: Differential expression; data imputation
    • 12:30-13:30 - Lunch break
    • 13:30-15:00 - Afternoon session 1: Combining datasets and data integration
    • 15:00-15:30 - Afternoon tea break
    • 15:30-16:45 - Afternoon session 2: Case studies
    • 16:45-17:00 - Final remarks and workshop close

Video

These videos were recorded during the two-day course held in May 2019 in Cambridge, UK. The recorded version of the course differs slightly from the version in this document.

Day 1

Day 2

Docker image

The course can be reproduced without any package installation by running the course Docker image, which contains all the required packages.

Workshop Docker Repository on DockerHub

Run the image

Make sure Docker is installed on your system; if not, please follow these instructions. To run the course Docker image (use the latest version):

docker run -p 8888:8888 -e PASSWORD="jupyter" svibiocellgen/mig_2019_scrnaseq-workshop:v1.01

Then follow the instructions provided, e.g.:

To access the notebook, open this file in a browser:
    file:///home/jovyan/.local/share/jupyter/runtime/nbserver-6-open.html
Or copy and paste one of these URLs:
    http://(a9ee1aad5398 or 127.0.0.1):8888/?token=22debff49d9aae3c50e6b0b5241b47eefb1b8f883fcb7e6d

Open the URL in a web browser (we recommend Chrome) to start a Jupyter session.

Windows users

On Windows, the IP address of the container may differ from 127.0.0.1 (localhost), for example when Docker runs through Docker Toolbox. To find the IP address, run:

docker-machine ip default
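The URL Jupyter prints lists the container ID and 127.0.0.1 in parentheses, so it cannot be pasted into a browser as-is. The shell function below is a hypothetical helper (not part of the workshop materials) that replaces that parenthesised host list with a single host: 127.0.0.1 by default, or another address such as the one reported by docker-machine. It is a sketch that assumes the printed URL has the form shown above.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with the workshop image.
# Usage: jupyter_url PRINTED_URL [HOST]
# Rewrites "http://(<container-id> or 127.0.0.1):8888/..." into
# "http://HOST:8888/...", with HOST defaulting to 127.0.0.1.
jupyter_url() {
  printed_url="$1"
  host="${2:-127.0.0.1}"
  # With sed -E, \( and \) match literal parentheses and [^)]* matches the
  # "container-id or 127.0.0.1" list between them.
  printf '%s\n' "$printed_url" | sed -E "s#^http://\([^)]*\)#http://${host}#"
}

# Example with the URL printed above
jupyter_url 'http://(a9ee1aad5398 or 127.0.0.1):8888/?token=22debff49d9aae3c50e6b0b5241b47eefb1b8f883fcb7e6d'
# prints: http://127.0.0.1:8888/?token=22debff49d9aae3c50e6b0b5241b47eefb1b8f883fcb7e6d
```

On Windows, the second argument can carry the container address, e.g. `jupyter_url '<printed URL>' "$(docker-machine ip default)"`.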

Download data/other files

Download from AWS (within Docker)

In the Jupyter session, please click on New -> Terminal. In the new terminal window please run:

./poststart.sh

Manual download from AWS

If you want to download the data files from AWS outside the Docker image, you can still use the same poststart.sh script, but you will need to install the AWS CLI on your computer.

Alternatively, you can browse and download the files in your web browser by visiting this link.

NB: Only the core datasets (i.e. not Tabula Muris) are available from AWS storage.

Manual download from SVI

Recommended if you are using your own computer

For simplicity, we have also hosted the core datasets used in the course and a subset of the Tabula Muris data on the SVI website. There are two files to download, both "tarballs", i.e. compressed archives of multiple folders and files.

Core datasets

To download the core datasets, click this link (195 MB).

It is most convenient to download the tarball to the head directory of the course repository. We then unpack the tarball and move the data folder it contains to the head directory of the repository.

To do this at the command line:

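# Download the core-data tarball to the head directory of the course
wget https://www.svi.edu.au/MIG_2019_scRNAseq-workshop/mig-sc-workshop-2019-data.tar.gz
# Unpack into a temporary folder
mkdir workshop-data
tar -xvf mig-sc-workshop-2019-data.tar.gz --directory workshop-data
# Move the nested data folder to the head directory, then remove the temporary folder
mv workshop-data/mnt/mcfiles/Datasets/MIG_2019_scRNAseq-workshop/data ./
rm -r workshop-data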

[This requires a little bit of faff to get all of the directory paths correct and then tidy up afterwards.]
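With GNU tar or bsdtar (both support --strip-components), the temporary folder can be skipped by extracting only the wanted folder and dropping its leading path components. The function below is a hypothetical sketch, not part of the workshop materials. It assumes the archive members are stored with relative paths beginning mnt/mcfiles/Datasets/, as the mv commands in this section imply; if tar reports that the member is not found, fall back to the commands above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the workshop materials): extract one folder
# from a .tar.gz archive, dropping leading path components so that the folder
# lands directly in DEST.
# Usage: extract_subdir TARBALL MEMBER STRIP [DEST]
#   MEMBER  path of the folder inside the archive
#   STRIP   number of leading path components to drop
#   DEST    directory to extract into, created if missing (default: .)
extract_subdir() {
  tarball="$1"
  member="$2"
  strip="$3"
  dest="${4:-.}"
  mkdir -p "$dest" || return 1
  # -C makes tar extract into DEST (the archive path is still resolved from
  # the current directory); only MEMBER and the entries below it are
  # extracted, with STRIP leading path components removed.
  tar -xzf "$tarball" -C "$dest" --strip-components="$strip" "$member"
}

# Core datasets: drop mnt/mcfiles/Datasets/MIG_2019_scRNAseq-workshop/
# (4 components) so that data/ appears in the current directory:
#   extract_subdir mig-sc-workshop-2019-data.tar.gz \
#     mnt/mcfiles/Datasets/MIG_2019_scRNAseq-workshop/data 4
# Tabula Muris: drop mnt/mcfiles/Datasets/ (3 components) and extract into data/:
#   extract_subdir Tabula_Muris.tar.gz mnt/mcfiles/Datasets/Tabula_Muris 3 data
```

Extracting by member name keeps unrelated parts of the archive out of the workshop directory, so there is nothing to tidy up afterwards.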

Alternatively, if you are working on your laptop, unpack the tarball using the default method on your system (usually double-clicking the *.tar.gz file will do the trick), then find the data folder, which is nested under mnt/mcfiles/Datasets/MIG_2019_scRNAseq-workshop/ in the unpacked files, and drag and drop it into the workshop directory.

Tabula Muris

To download the Tabula Muris data, click this link (655 MB).

We then go through a similar process as described above to unpack the tarball.

wget https://www.svi.edu.au/MIG_2019_scRNAseq-workshop/Tabula_Muris.tar.gz
tar -xvf Tabula_Muris.tar.gz
# Ensure data/ exists so that Tabula_Muris is moved into it, not renamed to "data"
mkdir -p data
mv mnt/mcfiles/Datasets/Tabula_Muris data/
rm -r mnt

Desired results

The data folder then should contain both the core datasets and the Tabula Muris data, and have the following structure:

data
├── 10cells_barcodes.txt
├── 2000_reference.transcripts.fa
├── deng
│   └── deng-reads.rds
├── droplet_id_example_per_barcode.txt.gz
├── droplet_id_example_truth.gz
├── EXAMPLE.cram
├── pancreas
│   ├── muraro.rds
│   └── segerstolpe.rds
├── pbmc3k_filtered_gene_bc_matrices
│   └── hg19
│       ├── barcodes.tsv
│       ├── genes.tsv
│       └── matrix.mtx
├── sce
│   ├── Heart_10X.rds
│   └── Thymus_10X.rds
├── Tabula_Muris
│   ├── droplet
│   │   ├── droplet
│   │   ├── droplet_annotation.csv
│   │   └── droplet_metadata.csv
│   └── FACS_smartseq2
│       ├── FACS
│       ├── FACS_annotations.csv
│       └── FACS_metadata.csv
└── tung
    ├── annotation.txt
    ├── molecules.txt
    ├── reads.txt
    ├── TNs.txt
    └── TPs.txt

11 directories, 22 files

With the files in these locations, everything is set up to run the code as presented in the RMarkdown files in the workshop.
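Before opening the RMarkdown files, it can help to confirm that every path in the tree above is in place. The function below is a hypothetical helper, not part of the workshop materials; its path list is copied from the tree above, and it only checks that each path exists, not that the file contents are complete.

```shell
#!/bin/sh
# Hypothetical helper: report any expected workshop data path that is missing.
# Paths are relative to the data folder and are copied from the tree above.
WORKSHOP_DATA_PATHS='
10cells_barcodes.txt
2000_reference.transcripts.fa
deng/deng-reads.rds
droplet_id_example_per_barcode.txt.gz
droplet_id_example_truth.gz
EXAMPLE.cram
pancreas/muraro.rds
pancreas/segerstolpe.rds
pbmc3k_filtered_gene_bc_matrices/hg19/barcodes.tsv
pbmc3k_filtered_gene_bc_matrices/hg19/genes.tsv
pbmc3k_filtered_gene_bc_matrices/hg19/matrix.mtx
sce/Heart_10X.rds
sce/Thymus_10X.rds
Tabula_Muris/droplet/droplet
Tabula_Muris/droplet/droplet_annotation.csv
Tabula_Muris/droplet/droplet_metadata.csv
Tabula_Muris/FACS_smartseq2/FACS
Tabula_Muris/FACS_smartseq2/FACS_annotations.csv
Tabula_Muris/FACS_smartseq2/FACS_metadata.csv
tung/annotation.txt
tung/molecules.txt
tung/reads.txt
tung/TNs.txt
tung/TPs.txt
'

# Usage: check_workshop_data [DATA_DIR]   (default: ./data)
# Prints each missing path to stderr; returns 0 only if nothing is missing.
check_workshop_data() {
  data_dir="${1:-data}"
  n_missing=0
  # Unquoted expansion splits the list on whitespace; no path contains spaces.
  for p in $WORKSHOP_DATA_PATHS; do
    if [ ! -e "$data_dir/$p" ]; then
      echo "missing: $data_dir/$p" >&2
      n_missing=$((n_missing + 1))
    fi
  done
  [ "$n_missing" -eq 0 ]
}
```

For example, run `check_workshop_data data && echo "data folder looks complete"` from the head directory of the repository.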

Case study dataset

The dataset used in the case study is from O’Koren et al.; you can download all the relevant files via this link.

It includes the raw FASTQ files and the processed count matrix, and is 2.1 GB in size. If you would like to start from the count matrix, please follow the instructions in the RMarkdown file to download the processed count matrix from GEO.

RStudio

Now go back to the Jupyter browser tab and change the word tree in the URL to rstudio. RStudio Server will open with all of the course files, software and the data folder available.
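The same URL edit can be scripted. This is a minimal sketch, assuming the Jupyter URL contains /tree as described (for example http://127.0.0.1:8888/tree); rstudio_url is a hypothetical helper, not part of the workshop image.

```shell
#!/bin/sh
# Hypothetical helper: turn the Jupyter file-browser URL into the RStudio URL.
# Only a "/tree" segment followed by "/", "?", "#" or the end of the URL is
# replaced, so a path such as "/treehouse" is left alone.
# Usage: rstudio_url JUPYTER_URL
rstudio_url() {
  printf '%s\n' "$1" | sed -E 's,/tree([/?#]|$),/rstudio\1,'
}

rstudio_url 'http://127.0.0.1:8888/tree'
# prints: http://127.0.0.1:8888/rstudio
```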

Manual installation

If you are not using the course Docker image, then to run all of the code chunks in the course you need to clone or download the course GitLab repository and start an R session in its head folder. You will also need to install all required packages manually. This version of the course uses Bioconductor 3.9 packages.

The install.R file in the workshop repository provides the necessary commands for installing all of the required packages. You can run this script from the command line with Rscript install.R or copy-and-paste the commands into an R session and run them interactively.

Below is the R script itself, so you can see the required packages and copy-and-paste these commands into R to install the packages. This first approach figures out which of the required packages are not installed and attempts to install them:

## Commands to install the packages necessary for the course

if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")

BiocManager::install(version = "3.9") ## this workshop uses Bioc release version 3.9

## Check which packages need installation
required_pkgs <- c("batchelor", "beachmat", "biomaRt", "bookdown", "cluster",
                   "corrplot", "cowplot", "DelayedArray", "DESeq2", "destiny",
                   "devtools", "DrImpute", "DropletUtils", "edgeR",
                   "EnsDb.Hsapiens.v86", "EnsDb.Mmusculus.v79", "future",
                   "ggbeeswarm", "ggfortify", "ggthemes", "glmpca", "googleVis",
                   "graph", "gtools", "harmony", "igraph", "kBET", "KernSmooth",
                   "knitr", "leidenbase", "limma", "lle", "M3Drop", "MAST",
                   "Matrix", "matrixStats", "mclust", "mixtools", "monocle",
                   "monocle3", "MultiAssayExperiment", "mvoutlier", "ouija",
                   "pcaMethods", "phateR","pheatmap", "plotly", "Polychrome",
                   "pryr", "RBGL", "RColorBrewer", "Rhdf5lib", "rJava", "ROCR",
                   "RUVSeq", "SC3", "scater", "scds", "scfind", "scImpute",
                   "scmap", "scran", "scRNA.seq.funcs", "scRNAseq",
                   "sctransform", "Seurat",
                   "SingleCellExperiment", "SingleR", "slalom", "SLICER",
                   "slingshot", "SummarizedExperiment", "sva", "tidyverse",
                   "TSCAN", "umap", "xtable")

installed_pkgs <- installed.packages()
ip <- rownames(installed_pkgs)
pkgs_to_install <- required_pkgs[!(required_pkgs %in% ip)]
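pkgs_to_install
## Skip the call when nothing is missing: an empty vector would make
## BiocManager::install() check for updates to all installed packages instead
if (length(pkgs_to_install) > 0) BiocManager::install(pkgs_to_install)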

The commands below install all of the required packages (grouped by repository) and attempt to update any packages that are already installed. CRAN and Bioconductor packages need only the package name; GitHub packages also require the name of the repository owner.

## Install CRAN packages (BiocManager::install() can install CRAN packages too)
BiocManager::install(c("devtools", "pryr", "tidyverse", "Seurat", "rJava", "umap",
                   "bookdown", "cluster", "KernSmooth", "ROCR", "googleVis",
                   "ggbeeswarm", "SLICER", "ggfortify", "mclust", "DrImpute",
                   "phateR", "mixtools", "mvoutlier", "igraph", "sctransform",
                   "corrplot", "cowplot", "ggthemes", "knitr", "lle", "Matrix",
                   "matrixStats", "pheatmap", "Polychrome", "plotly", "RColorBrewer",
                   "future"))

## Install Bioconductor packages - the command is the same
BiocManager::install(c("graph", "RBGL", "gtools", "xtable", "pcaMethods",
                       "limma", "SingleCellExperiment", "Rhdf5lib", "scater",
                       "scran", "RUVSeq", "sva", "SC3", "TSCAN", "monocle", 
                       "destiny", "DESeq2", "edgeR", "MAST", "scmap", "biomaRt",
                       "MultiAssayExperiment", "SummarizedExperiment",
                       "DropletUtils", "beachmat", "batchelor", "M3Drop",
                       "scfind", "scRNAseq", "SingleR", "slalom", "DelayedArray", "scds",
                       "slingshot", "EnsDb.Mmusculus.v79", "EnsDb.Hsapiens.v86"))

## Install GitHub packages
devtools::install_github(c("hemberg-lab/scRNA.seq.funcs",
                           "immunogenomics/harmony",
                           "Vivianstats/scImpute", "theislab/kBET",
                           "kieranrcampbell/ouija", "willtownes/glmpca",
                           "cole-trapnell-lab/leidenbase", "cole-trapnell-lab/monocle3",
                           "ChristophH/sctransform"))

Alternatively, you can install only the packages listed in a chapter of interest.

Citation

This version of the workshop has been updated by Davis J. McCarthy, Ruqian Lyu and PuXue Qiao, based on the 2019-07-01 version of the course:

  • Ruqian Lyu, PuXue Qiao, Vladimir Kiselev, Tallulah Andrews, Jennifer Westoby, Maren Büttner, Jimmy Lee, Krzysztof Polanski, Sebastian Y. Müller, Elo Madissoon, Stephane Ballereau, Maria Do Nascimento Lopes Primo, Rocio Martinez Nunez, Martin Hemberg and Davis J. McCarthy, (2019), "Analysis of single cell RNA-seq data", https://scrnaseq-course.cog.sanger.ac.uk/website/index.html

License

All of the course material is licensed under GPL-3. Anyone is welcome to go through the material to learn about the analysis of scRNA-seq data. If you plan to use the material for your own teaching, we would appreciate it if you told us about it, in addition to providing a suitable citation.

Prerequisites

The course is intended for those who have basic familiarity with Unix and the R statistical language.

We will also assume that you are familiar with mapping and analysing bulk RNA-seq data as well as with the commonly available computational tools.

We recommend attending the Introduction to RNA-seq and ChIP-seq data analysis or the Analysis of high-throughput sequencing data with Bioconductor before attending this course.

Contact

If you have any comments, questions or suggestions about the material, please contact Davis McCarthy.