<!DOCTYPE html>
<!-- Generated by pkgdown: do not edit by hand --><html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Introduction to Splatter • Splatter</title>
<!-- jquery --><script src="https://code.jquery.com/jquery-3.1.0.min.js" integrity="sha384-nrOSfDHtoPMzJHjVTdCopGqIqeYETSXhZDFyniQ8ZHcVy08QesyHcnOUpMpqnmWq" crossorigin="anonymous"></script><!-- Bootstrap --><link href="https://maxcdn.bootstrapcdn.com/bootswatch/3.3.7/cosmo/bootstrap.min.css" rel="stylesheet" crossorigin="anonymous">
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/js/bootstrap.min.js" integrity="sha384-Tc5IQib027qvyjSMfHjOMaLkfuWVxZxUPnCJA7l2mCWNIpG9mGCD8wGNIcPD7Txa" crossorigin="anonymous"></script><!-- Font Awesome icons --><link href="https://maxcdn.bootstrapcdn.com/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" integrity="sha384-T8Gy5hrqNKT+hzMclPo118YTQO6cYprQmhrYwIiQ/3axmI1hQomh7Ud2hPOy8SP1" crossorigin="anonymous">
<!-- clipboard.js --><script src="https://cdnjs.cloudflare.com/ajax/libs/clipboard.js/1.7.1/clipboard.min.js" integrity="sha384-cV+rhyOuRHc9Ub/91rihWcGmMmCXDeksTtCihMupQHSsi8GIIRDG0ThDc3HGQFJ3" crossorigin="anonymous"></script><!-- sticky kit --><script src="https://cdnjs.cloudflare.com/ajax/libs/sticky-kit/1.1.3/sticky-kit.min.js" integrity="sha256-c4Rlo1ZozqTPE2RLuvbusY3+SU1pQaJC0TjuhygMipw=" crossorigin="anonymous"></script><!-- pkgdown --><link href="../pkgdown.css" rel="stylesheet">
<script src="../pkgdown.js"></script><link href="../extra.css" rel="stylesheet">
<meta property="og:title" content="Introduction to Splatter">
<meta property="og:description" content="">
<meta name="twitter:card" content="summary">
<!-- mathjax --><script src="https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script><!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/html5shiv/3.7.3/html5shiv.min.js"></script>
<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
<![endif]--><!-- Global site tag (gtag.js) - Google Analytics --><script async src="https://www.googletagmanager.com/gtag/js?id=UA-52309538-4"></script><script>
  window.dataLayer = window.dataLayer || [];
  function gtag(){dataLayer.push(arguments);}
  gtag('js', new Date());

  gtag('config', 'UA-52309538-4');
</script>
</head>
<body>
    <div class="container template-article">
      <header><div class="navbar navbar-default navbar-fixed-top" role="navigation">
  <div class="container">
    <div class="navbar-header">
      <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar">
        <span class="icon-bar"></span>
        <span class="icon-bar"></span>
        <span class="icon-bar"></span>
      </button>
      <span class="navbar-brand">
        <a class="navbar-link" href="../index.html">Splatter</a>
        <span class="label label-default" data-toggle="tooltip" data-placement="bottom" title="Released package">1.5.3</span>
      </span>
    </div>

    <div id="navbar" class="navbar-collapse collapse">
      <ul class="nav navbar-nav">
<li>
  <a href="../index.html">
    <span class="fa fa-home fa-lg"></span>
     
  </a>
</li>
<li>
  <a href="../articles/splatter.html">Get started</a>
</li>
<li>
  <a href="../reference/index.html">Reference</a>
</li>
<li>
  <a href="../news/index.html">Changelog</a>
</li>
      </ul>
<ul class="nav navbar-nav navbar-right">
<li>
  <a href="https://github.com/Oshlack/splatter">
    <span class="fa fa-github fa-lg"></span>
     
  </a>
</li>
      </ul>
</div>
<!--/.nav-collapse -->
  </div>
<!--/.container -->
</div>
<!--/.navbar -->

      
      </header><div class="row">
  <div class="col-md-9 contents">
    <div class="page-header toc-ignore">
      <h1>Introduction to Splatter</h1>
                        <h4 class="author">Luke Zappia</h4>
            
            <h4 class="date">2018-08-20</h4>
      
      <small class="dont-index">Source: <a href="https://github.com/Oshlack/splatter/blob/master/vignettes/splatter.Rmd"><code>vignettes/splatter.Rmd</code></a></small>
      <div class="hidden name"><code>splatter.Rmd</code></div>

    </div>
<div class="figure">
<img src="splatter-logo-small.png" alt="Splatter logo"><p class="caption">Splatter logo</p>
</div>
<p>Welcome to Splatter! Splatter is an R package for the simple simulation of single-cell RNA sequencing data. This vignette gives an overview and introduction to Splatter’s functionality.</p>
<div id="installation" class="section level1">
<h1 class="hasAnchor">
<a href="#installation" class="anchor"></a>Installation</h1>
<p>Splatter can be installed from Bioconductor:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode r"><code class="sourceCode r">
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("splatter")
</code></pre></div>
<p>To install the most recent development version from GitHub use:</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode r"><code class="sourceCode r">
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("Oshlack/splatter", build_vignettes=TRUE)
</code></pre></div>
</div>
<div id="quickstart" class="section level1">
<h1 class="hasAnchor">
<a href="#quickstart" class="anchor"></a>Quickstart</h1>
<p>Assuming you already have a matrix of count data similar to the data you wish to simulate, there are two simple steps to creating a simulated dataset with Splatter. Here is an example using the example dataset in the <code>scater</code> package:</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb3-1" data-line-number="1"><span class="co"># Load package</span></a>
<a class="sourceLine" id="cb3-2" data-line-number="2"><span class="kw">library</span>(splatter)</a></code></pre></div>
<pre><code>## Loading required package: SingleCellExperiment</code></pre>
<pre><code>## Warning: package 'SingleCellExperiment' was built under R version 3.5.1</code></pre>
<pre><code>## Loading required package: SummarizedExperiment</code></pre>
<pre><code>## Warning: package 'SummarizedExperiment' was built under R version 3.5.1</code></pre>
<pre><code>## Loading required package: GenomicRanges</code></pre>
<pre><code>## Warning: package 'GenomicRanges' was built under R version 3.5.1</code></pre>
<pre><code>## Loading required package: stats4</code></pre>
<pre><code>## Loading required package: BiocGenerics</code></pre>
<pre><code>## Loading required package: parallel</code></pre>
<pre><code>## 
## Attaching package: 'BiocGenerics'</code></pre>
<pre><code>## The following objects are masked from 'package:parallel':
## 
##     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
##     clusterExport, clusterMap, parApply, parCapply, parLapply,
##     parLapplyLB, parRapply, parSapply, parSapplyLB</code></pre>
<pre><code>## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs</code></pre>
<pre><code>## The following objects are masked from 'package:base':
## 
##     anyDuplicated, append, as.data.frame, basename, cbind,
##     colMeans, colnames, colSums, dirname, do.call, duplicated,
##     eval, evalq, Filter, Find, get, grep, grepl, intersect,
##     is.unsorted, lapply, lengths, Map, mapply, match, mget, order,
##     paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind,
##     Reduce, rowMeans, rownames, rowSums, sapply, setdiff, sort,
##     table, tapply, union, unique, unsplit, which, which.max,
##     which.min</code></pre>
<pre><code>## Loading required package: S4Vectors</code></pre>
<pre><code>## Warning: package 'S4Vectors' was built under R version 3.5.1</code></pre>
<pre><code>## 
## Attaching package: 'S4Vectors'</code></pre>
<pre><code>## The following object is masked from 'package:base':
## 
##     expand.grid</code></pre>
<pre><code>## Loading required package: IRanges</code></pre>
<pre><code>## Warning: package 'IRanges' was built under R version 3.5.1</code></pre>
<pre><code>## Loading required package: GenomeInfoDb</code></pre>
<pre><code>## Loading required package: Biobase</code></pre>
<pre><code>## Warning: package 'Biobase' was built under R version 3.5.1</code></pre>
<pre><code>## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.</code></pre>
<pre><code>## Loading required package: DelayedArray</code></pre>
<pre><code>## Warning: package 'DelayedArray' was built under R version 3.5.1</code></pre>
<pre><code>## Loading required package: matrixStats</code></pre>
<pre><code>## 
## Attaching package: 'matrixStats'</code></pre>
<pre><code>## The following objects are masked from 'package:Biobase':
## 
##     anyMissing, rowMedians</code></pre>
<pre><code>## Loading required package: BiocParallel</code></pre>
<pre><code>## Warning: package 'BiocParallel' was built under R version 3.5.1</code></pre>
<pre><code>## 
## Attaching package: 'DelayedArray'</code></pre>
<pre><code>## The following objects are masked from 'package:matrixStats':
## 
##     colMaxs, colMins, colRanges, rowMaxs, rowMins, rowRanges</code></pre>
<pre><code>## The following objects are masked from 'package:base':
## 
##     aperm, apply</code></pre>
<div class="sourceCode" id="cb37"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb37-1" data-line-number="1"><span class="co"># Load example data</span></a>
<a class="sourceLine" id="cb37-2" data-line-number="2"><span class="kw">library</span>(scater)</a></code></pre></div>
<pre><code>## Warning: package 'scater' was built under R version 3.5.1</code></pre>
<pre><code>## Loading required package: ggplot2</code></pre>
<pre><code>## 
## Attaching package: 'scater'</code></pre>
<pre><code>## The following object is masked from 'package:S4Vectors':
## 
##     rename</code></pre>
<pre><code>## The following object is masked from 'package:stats':
## 
##     filter</code></pre>
<div class="sourceCode" id="cb43"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb43-1" data-line-number="1"><span class="kw">data</span>(<span class="st">"sc_example_counts"</span>)</a>
<a class="sourceLine" id="cb43-2" data-line-number="2"><span class="co"># Estimate parameters from example data</span></a>
<a class="sourceLine" id="cb43-3" data-line-number="3">params &lt;-<span class="st"> </span><span class="kw"><a href="../reference/splatEstimate.html">splatEstimate</a></span>(sc_example_counts)</a>
<a class="sourceLine" id="cb43-4" data-line-number="4"><span class="co"># Simulate data using estimated parameters</span></a>
<a class="sourceLine" id="cb43-5" data-line-number="5">sim &lt;-<span class="st"> </span><span class="kw"><a href="../reference/splatSimulate.html">splatSimulate</a></span>(params)</a></code></pre></div>
<pre><code>## Getting parameters...</code></pre>
<pre><code>## Creating simulation object...</code></pre>
<pre><code>## Simulating library sizes...</code></pre>
<pre><code>## Simulating gene means...</code></pre>
<pre><code>## Simulating BCV...</code></pre>
<pre><code>## Simulating counts...</code></pre>
<pre><code>## Simulating dropout (if needed)...</code></pre>
<pre><code>## Done!</code></pre>
<p>These steps will be explained in detail in the following sections, but briefly: the first step takes a dataset and estimates simulation parameters from it, and the second step takes those parameters and simulates a new dataset.</p>
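<p>If no suitable count matrix is at hand, the second step can also be tried on its own with default parameters. The following is a minimal sketch, assuming the <code>splatter</code> package is installed; the small <code>nGenes</code> and <code>batchCells</code> values are arbitrary choices made only so that the example runs quickly.</p>

```r
library(splatter)

# Start from default parameters, shrunk so the example runs quickly.
# The number of cells is set through batchCells (cells per batch).
params <- newSplatParams(nGenes = 200, batchCells = 20)

# verbose = FALSE suppresses the progress messages printed during simulation
sim <- splatSimulate(params, verbose = FALSE)

# The result is a SingleCellExperiment; counts() returns the
# genes x cells matrix of simulated counts
dim(counts(sim))
```

<p>Because no real data is involved here, the simulated counts reflect only the package defaults rather than any particular experiment.</p>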
</div>
<div id="the-splat-simulation" class="section level1">
<h1 class="hasAnchor">
<a href="#the-splat-simulation" class="anchor"></a>The Splat simulation</h1>
<p>Before we look at how we estimate parameters, let’s first look at how Splatter simulates data and what those parameters are. We use the term ‘Splat’ to refer to Splatter’s own simulation and to differentiate it from the package itself. The core of the Splat model is a gamma-Poisson distribution used to generate a gene-by-cell matrix of counts. Mean expression levels for each gene are simulated from a <a href="https://en.wikipedia.org/wiki/Gamma_distribution">gamma distribution</a>, and the Biological Coefficient of Variation (BCV) is used to enforce a mean-variance trend before counts are simulated from a <a href="https://en.wikipedia.org/wiki/Poisson_distribution">Poisson distribution</a>. Splat also allows you to simulate expression outlier genes (genes with mean expression outside the gamma distribution) and dropout (random knock-out of counts based on mean expression). Each cell is given an expected library size (simulated from a log-normal distribution by default), which makes it easier to match a given dataset.</p>
<p>Splat can also simulate differential expression between groups of different types of cells, or differentiation paths between different cell types where expression changes in a continuous way. These are described further in the <a href="#simulating-counts">simulating counts</a> section.</p>
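<p>To make the gamma-Poisson idea concrete, here is a simplified sketch in base R. It is not Splatter’s implementation: outliers, batches, groups, the mean-variance trend and dropout are all left out, the variable names are hypothetical, and the numeric values are simply illustrative choices in the spirit of the parameters described in the next section.</p>

```r
# Simplified gamma-Poisson sketch in base R (illustrative only; this is
# not Splatter's actual implementation).
set.seed(1)

n_genes <- 1000  # hypothetical sizes chosen for the sketch
n_cells <- 50

# Gene means drawn from a gamma distribution (cf. mean.shape, mean.rate)
gene_means <- rgamma(n_genes, shape = 0.6, rate = 0.3)

# Expected library sizes drawn from a log-normal distribution
# (cf. lib.loc, lib.scale)
lib_sizes <- rlnorm(n_cells, meanlog = 11, sdlog = 0.2)

# Scale each cell so that its expected total matches its library size
gene_props <- gene_means / sum(gene_means)
cell_means <- outer(gene_props, lib_sizes)  # genes x cells

# Add variability around each mean with a second gamma draw, using one
# common BCV for every gene (assumption: no trend, no degrees of freedom)
bcv <- 0.1
noisy_means <- matrix(
    rgamma(n_genes * n_cells, shape = 1 / bcv^2, scale = cell_means * bcv^2),
    nrow = n_genes, ncol = n_cells
)

# Counts drawn from a Poisson distribution around the noisy means
counts <- matrix(
    rpois(n_genes * n_cells, lambda = noisy_means),
    nrow = n_genes, ncol = n_cells
)

dim(counts)
```

<p>The second gamma draw has mean equal to the cell mean and coefficient of variation equal to <code>bcv</code>, so a Poisson draw on top of it gives gamma-Poisson (negative binomial) counts.</p>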
<div id="parameters" class="section level2">
<h2 class="hasAnchor">
<a href="#parameters" class="anchor"></a>Parameters</h2>
<p>The parameters required for the Splat simulation are briefly described here:</p>
<ul>
<li>
<strong>Global parameters</strong>
<ul>
<li>
<code>nGenes</code> - The number of genes to simulate.</li>
<li>
<code>nCells</code> - The number of cells to simulate.</li>
<li>
<code>seed</code> - Seed to use for generating random numbers.</li>
</ul>
</li>
<li>
<strong>Batch parameters</strong>
<ul>
<li>
<code>nBatches</code> - The number of batches to simulate.</li>
<li>
<code>batchCells</code> - The number of cells in each batch.</li>
<li>
<code>batch.facLoc</code> - Location (meanlog) parameter for the batch effects factor log-normal distribution.</li>
<li>
<code>batch.facScale</code> - Scale (sdlog) parameter for the batch effects factor log-normal distribution.</li>
</ul>
</li>
<li>
<strong>Mean parameters</strong>
<ul>
<li>
<code>mean.shape</code> - Shape parameter for the mean gamma distribution.</li>
<li>
<code>mean.rate</code> - Rate parameter for the mean gamma distribution.</li>
</ul>
</li>
<li>
<strong>Library size parameters</strong>
<ul>
<li>
<code>lib.loc</code> - Location (meanlog) parameter for the library size log-normal distribution, or mean for the normal distribution.</li>
<li>
<code>lib.scale</code> - Scale (sdlog) parameter for the library size log-normal distribution, or sd for the normal distribution.</li>
<li>
<code>lib.norm</code> - Whether to use a normal distribution instead of the usual log-normal distribution.</li>
</ul>
</li>
<li>
<strong>Expression outlier parameters</strong>
<ul>
<li>
<code>out.prob</code> - Probability that a gene is an expression outlier.</li>
<li>
<code>out.facLoc</code> - Location (meanlog) parameter for the expression outlier factor log-normal distribution.</li>
<li>
<code>out.facScale</code> - Scale (sdlog) parameter for the expression outlier factor log-normal distribution.</li>
</ul>
</li>
<li>
<strong>Group parameters</strong>
<ul>
<li>
<code>nGroups</code> - The number of groups or paths to simulate.</li>
<li>
<code>group.prob</code> - The probabilities that cells come from particular groups.</li>
</ul>
</li>
<li>
<strong>Differential expression parameters</strong>
<ul>
<li>
<code>de.prob</code> - Probability that a gene is differentially expressed in each group or path.</li>
<li>
<code>de.loProb</code> - Probability that a differentially expressed gene is down-regulated.</li>
<li>
<code>de.facLoc</code> - Location (meanlog) parameter for the differential expression factor log-normal distribution.</li>
<li>
<code>de.facScale</code> - Scale (sdlog) parameter for the differential expression factor log-normal distribution.</li>
</ul>
</li>
<li>
<strong>Biological Coefficient of Variation parameters</strong>
<ul>
<li>
<code>bcv.common</code> - Underlying common dispersion across all genes.</li>
<li>
<code>bcv.df</code> - Degrees of Freedom for the BCV inverse chi-squared distribution.</li>
</ul>
</li>
<li>
<strong>Dropout parameters</strong>
<ul>
<li>
<code>dropout.type</code> - Type of dropout to simulate.</li>
<li>
<code>dropout.mid</code> - Midpoint parameter for the dropout logistic function.</li>
<li>
<code>dropout.shape</code> - Shape parameter for the dropout logistic function.</li>
</ul>
</li>
<li>
<strong>Differentiation path parameters</strong>
<ul>
<li>
<code>path.from</code> - Vector giving the originating point of each path.</li>
<li>
<code>path.length</code> - Vector giving the number of steps to simulate along each path.</li>
<li>
<code>path.skew</code> - Vector giving the skew of each path.</li>
<li>
<code>path.nonlinearProb</code> - Probability that a gene changes expression in a non-linear way along the differentiation path.</li>
<li>
<code>path.sigmaFac</code> - Sigma factor for non-linear gene paths.</li>
</ul>
</li>
</ul>
<p>While this may look like a lot of parameters, Splatter attempts to make it easy for the user, both by providing sensible defaults and by making it easy to estimate many of the parameters from real data. For more details on the parameters see <code><a href="../reference/SplatParams.html">?SplatParams</a></code>.</p>
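<p>As an illustration of how a midpoint and a shape parameter can define a dropout logistic function, here is a small base R sketch. It assumes that the dropout probability is a logistic function of the log mean expression; the helper <code>logistic</code>, the variable names and the example values are hypothetical and are not taken from the package.</p>

```r
# Logistic dropout sketch in base R (illustrative; assumes the dropout
# probability is a logistic function of log mean expression).
logistic <- function(x, x0, k) {
    1 / (1 + exp(-k * (x - x0)))
}

dropout_mid <- 2     # hypothetical midpoint (cf. dropout.mid)
dropout_shape <- -1  # negative shape: higher expression, less dropout

# Three example mean expression levels; the middle one sits at the midpoint
log_means <- log(c(0.5, exp(2), 100))
drop_prob <- logistic(log_means, x0 = dropout_mid, k = dropout_shape)
round(drop_prob, 3)

# Zero out counts with those probabilities (one Bernoulli draw per count)
set.seed(1)
counts <- c(1L, 7L, 95L)
keep <- rbinom(length(counts), size = 1, prob = 1 - drop_prob)
counts * keep
```

<p>At the midpoint the dropout probability is 0.5, and with a negative shape it falls as mean expression rises, so lowly expressed genes are the most likely to be zeroed.</p>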
</div>
</div>
<div id="the-splatparams-object" class="section level1">
<h1 class="hasAnchor">
<a href="#the-splatparams-object" class="anchor"></a>The <code>SplatParams</code> object</h1>
<p>All the parameters for the Splat simulation are stored in a <code>SplatParams</code> object. Let’s create a new one and see what it looks like.</p>
<div class="sourceCode" id="cb52"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb52-1" data-line-number="1">params &lt;-<span class="st"> </span><span class="kw"><a href="../reference/newParams.html">newSplatParams</a></span>()</a>
<a class="sourceLine" id="cb52-2" data-line-number="2">params</a></code></pre></div>
<pre><code>## A Params object of class SplatParams 
## Parameters can be (estimable) or [not estimable], 'Default' or  'NOT DEFAULT' 
## 
## Global: 
## (Genes)  (Cells)   [Seed] 
##   10000      100   802677 
## 
## 28 additional parameters 
## 
## Batches: 
##     [Batches]  [Batch Cells]     [Location]        [Scale] 
##             1            100            0.1            0.1 
## 
## Mean: 
##  (Rate)  (Shape) 
##     0.3      0.6 
## 
## Library size: 
## (Location)     (Scale)      (Norm) 
##         11         0.2       FALSE 
## 
## Exprs outliers: 
## (Probability)     (Location)        (Scale) 
##          0.05              4            0.5 
## 
## Groups: 
##      [Groups]  [Group Probs] 
##             1              1 
## 
## Diff expr: 
## [Probability]    [Down Prob]     [Location]        [Scale] 
##           0.1            0.5            0.1            0.4 
## 
## BCV: 
## (Common Disp)          (DoF) 
##           0.1             60 
## 
## Dropout: 
##     [Type]  (Midpoint)     (Shape) 
##       none           0          -1 
## 
## Paths: 
##         [From]        [Length]          [Skew]    [Non-linear] 
##              0             100             0.5             0.1 
## [Sigma Factor] 
##            0.8</code></pre>
<p>As well as telling us what type of object we have (“A <code>Params</code> object of class <code>SplatParams</code>”) and showing us the values of the parameters, this output gives us some extra information. We can see which parameters can be estimated by the <code>splatEstimate</code> function (those in parentheses), which can’t be estimated (those in brackets), and which have been changed from their default values (those in ALL CAPS).</p>
<div id="getting-and-setting" class="section level2">
<h2 class="hasAnchor">
<a href="#getting-and-setting" class="anchor"></a>Getting and setting</h2>
<p>If we want to look at a particular parameter, for example the number of genes to simulate, we can extract it using the <code>getParam</code> function:</p>
<div class="sourceCode" id="cb54"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb54-1" data-line-number="1"><span class="kw"><a href="../reference/getParam.html">getParam</a></span>(params, <span class="st">"nGenes"</span>)</a></code></pre></div>
<pre><code>## [1] 10000</code></pre>
<p>Alternatively, to give a parameter a new value we can use the <code>setParam</code> function:</p>
<div class="sourceCode" id="cb56"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb56-1" data-line-number="1">params &lt;-<span class="st"> </span><span class="kw"><a href="../reference/setParam.html">setParam</a></span>(params, <span class="st">"nGenes"</span>, <span class="dv">5000</span>)</a>
<a class="sourceLine" id="cb56-2" data-line-number="2"><span class="kw"><a href="../reference/getParam.html">getParam</a></span>(params, <span class="st">"nGenes"</span>)</a></code></pre></div>
<pre><code>## [1] 5000</code></pre>
<p>If we want to extract multiple parameters (as a list) or set multiple parameters we can use the <code>getParams</code> or <code>setParams</code> functions:</p>
<div class="sourceCode" id="cb58"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb58-1" data-line-number="1"><span class="co"># Set multiple parameters at once (using a list)</span></a>
<a class="sourceLine" id="cb58-2" data-line-number="2">params &lt;-<span class="st"> </span><span class="kw"><a href="../reference/setParams.html">setParams</a></span>(params, <span class="dt">update =</span> <span class="kw">list</span>(<span class="dt">nGenes =</span> <span class="dv">8000</span>, <span class="dt">mean.rate =</span> <span class="fl">0.5</span>))</a>
<a class="sourceLine" id="cb58-3" data-line-number="3"><span class="co"># Extract multiple parameters as a list</span></a>
<a class="sourceLine" id="cb58-4" data-line-number="4"><span class="kw"><a href="../reference/getParams.html">getParams</a></span>(params, <span class="kw">c</span>(<span class="st">"nGenes"</span>, <span class="st">"mean.rate"</span>, <span class="st">"mean.shape"</span>))</a></code></pre></div>
<pre><code>## $nGenes
## [1] 8000
## 
## $mean.rate
## [1] 0.5
## 
## $mean.shape
## [1] 0.6</code></pre>
<div class="sourceCode" id="cb60"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb60-1" data-line-number="1"><span class="co"># Set multiple parameters at once (using additional arguments)</span></a>
<a class="sourceLine" id="cb60-2" data-line-number="2">params &lt;-<span class="st"> </span><span class="kw"><a href="../reference/setParams.html">setParams</a></span>(params, <span class="dt">mean.shape =</span> <span class="fl">0.5</span>, <span class="dt">de.prob =</span> <span class="fl">0.2</span>)</a>
<a class="sourceLine" id="cb60-3" data-line-number="3">params</a></code></pre></div>
<pre><code>## A Params object of class SplatParams 
## Parameters can be (estimable) or [not estimable], 'Default' or  'NOT DEFAULT' 
## 
## Global: 
## (GENES)  (Cells)   [Seed] 
##    8000      100   802677 
## 
## 28 additional parameters 
## 
## Batches: 
##     [Batches]  [Batch Cells]     [Location]        [Scale] 
##             1            100            0.1            0.1 
## 
## Mean: 
##  (RATE)  (SHAPE) 
##     0.5      0.5 
## 
## Library size: 
## (Location)     (Scale)      (Norm) 
##         11         0.2       FALSE 
## 
## Exprs outliers: 
## (Probability)     (Location)        (Scale) 
##          0.05              4            0.5 
## 
## Groups: 
##      [Groups]  [Group Probs] 
##             1              1 
## 
## Diff expr: 
## [PROBABILITY]    [Down Prob]     [Location]        [Scale] 
##           0.2            0.5            0.1            0.4 
## 
## BCV: 
## (Common Disp)          (DoF) 
##           0.1             60 
## 
## Dropout: 
##     [Type]  (Midpoint)     (Shape) 
##       none           0          -1 
## 
## Paths: 
##         [From]        [Length]          [Skew]    [Non-linear] 
##              0             100             0.5             0.1 
## [Sigma Factor] 
##            0.8</code></pre>
<p>The parameters which have changed are now shown in ALL CAPS to indicate that they have been changed from the default.</p>
<p>We can also set parameters directly when we call <code>newSplatParams</code>:</p>
<div class="sourceCode" id="cb62"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb62-1" data-line-number="1">params &lt;-<span class="st"> </span><span class="kw"><a href="../reference/newParams.html">newSplatParams</a></span>(<span class="dt">lib.loc =</span> <span class="dv">12</span>, <span class="dt">lib.scale =</span> <span class="fl">0.6</span>)</a>
<a class="sourceLine" id="cb62-2" data-line-number="2"><span class="kw"><a href="../reference/getParams.html">getParams</a></span>(params, <span class="kw">c</span>(<span class="st">"lib.loc"</span>, <span class="st">"lib.scale"</span>))</a></code></pre></div>
<pre><code>## $lib.loc
## [1] 12
## 
## $lib.scale
## [1] 0.6</code></pre>
</div>
</div>
<div id="estimating-parameters" class="section level1">
<h1 class="hasAnchor">
<a href="#estimating-parameters" class="anchor"></a>Estimating parameters</h1>
<p>Splat allows you to estimate many of its parameters from a dataset containing counts using the <code>splatEstimate</code> function.</p>
<div class="sourceCode" id="cb64"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb64-1" data-line-number="1"><span class="co"># Check that sc_example_counts is an integer matrix</span></a>
<a class="sourceLine" id="cb64-2" data-line-number="2"><span class="kw">class</span>(sc_example_counts)</a></code></pre></div>
<pre><code>## [1] "matrix"</code></pre>
<div class="sourceCode" id="cb66"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb66-1" data-line-number="1"><span class="kw">typeof</span>(sc_example_counts)</a></code></pre></div>
<pre><code>## [1] "integer"</code></pre>
<div class="sourceCode" id="cb68"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb68-1" data-line-number="1"><span class="co"># Check the dimensions, each row is a gene, each column is a cell</span></a>
<a class="sourceLine" id="cb68-2" data-line-number="2"><span class="kw">dim</span>(sc_example_counts)</a></code></pre></div>
<pre><code>## [1] 2000   40</code></pre>
<div class="sourceCode" id="cb70"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb70-1" data-line-number="1"><span class="co"># Show the first few entries</span></a>
<a class="sourceLine" id="cb70-2" data-line-number="2">sc_example_counts[<span class="dv">1</span><span class="op">:</span><span class="dv">5</span>, <span class="dv">1</span><span class="op">:</span><span class="dv">5</span>]</a></code></pre></div>
<pre><code>##           Cell_001 Cell_002 Cell_003 Cell_004 Cell_005
## Gene_0001        0      123        2        0        0
## Gene_0002      575       65        3     1561     2311
## Gene_0003        0        0        0        0     1213
## Gene_0004        0        1        0        0        0
## Gene_0005        0        0       11        0        0</code></pre>
<div class="sourceCode" id="cb72"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb72-1" data-line-number="1">params &lt;-<span class="st"> </span><span class="kw"><a href="../reference/splatEstimate.html">splatEstimate</a></span>(sc_example_counts)</a></code></pre></div>
<p>Here we estimated parameters from a counts matrix but <code>splatEstimate</code> can also take a <code>SingleCellExperiment</code> object. The estimation process has the following steps:</p>
<ol style="list-style-type: decimal">
<li>Mean parameters are estimated by fitting a gamma distribution to the mean expression levels.</li>
<li>Library size parameters are estimated by fitting a log-normal distribution to the library sizes.</li>
<li>Expression outlier parameters are estimated by determining the number of outliers and fitting a log-normal distribution to their difference from the median.</li>
<li>BCV parameters are estimated using the <code>estimateDisp</code> function from the <code>edgeR</code> package.</li>
<li>Dropout parameters are estimated by checking if dropout is present and fitting a logistic function to the relationship between mean expression and proportion of zeros.</li>
</ol>
<p>For more details of the estimation procedures see <code><a href="../reference/splatEstimate.html">?splatEstimate</a></code>.</p>
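<p>To make two of these steps concrete, the following is a minimal base R sketch, not Splatter's internal code (see <code>?splatEstimate</code> for the actual procedure). It assumes a small made-up counts matrix (<code>toy_counts</code>, hypothetical) and uses simple sample and moment-based estimates in place of a proper distribution fit. The labels <code>mean.shape</code> and <code>mean.rate</code> are assumed to correspond to the Splat mean parameters.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Toy counts matrix (hypothetical data): 200 genes by 20 cells
set.seed(1)
toy_counts &lt;- matrix(rnbinom(200 * 20, mu = 50, size = 0.5), nrow = 200)

# Library size step (simplified): a log-normal is described by the mean and
# standard deviation of the log library sizes
lib_sizes &lt;- colSums(toy_counts)
lib_loc   &lt;- mean(log(lib_sizes))
lib_scale &lt;- sd(log(lib_sizes))

# Mean step (simplified): scale each cell to the median library size, then
# match a gamma distribution to the gene means by the method of moments
norm_counts &lt;- t(t(toy_counts) / lib_sizes * median(lib_sizes))
gene_means  &lt;- rowMeans(norm_counts)
gene_means  &lt;- gene_means[gene_means &gt; 0]
mean_shape  &lt;- mean(gene_means)^2 / var(gene_means)
mean_rate   &lt;- mean(gene_means) / var(gene_means)

c(lib.loc = lib_loc, lib.scale = lib_scale,
  mean.shape = mean_shape, mean.rate = mean_rate)</code></pre></div>
<p>Estimates obtained this way will generally differ from those returned by <code>splatEstimate</code>, which fits the distributions more carefully.</p>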
</div>
<div id="simulating-counts" class="section level1">
<h1 class="hasAnchor">
<a href="#simulating-counts" class="anchor"></a>Simulating counts</h1>
<p>Once we have a set of parameters we are happy with we can use <code>splatSimulate</code> to simulate counts. If we want to make small adjustments to the parameters we can provide them as additional arguments. Alternatively, if we don’t supply any parameters the defaults will be used:</p>
<div class="sourceCode" id="cb73"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb73-1" data-line-number="1">sim &lt;-<span class="st"> </span><span class="kw"><a href="../reference/splatSimulate.html">splatSimulate</a></span>(params, <span class="dt">nGenes =</span> <span class="dv">1000</span>)</a></code></pre></div>
<pre><code>## Getting parameters...</code></pre>
<pre><code>## Creating simulation object...</code></pre>
<pre><code>## Simulating library sizes...</code></pre>
<pre><code>## Simulating gene means...</code></pre>
<pre><code>## Simulating BCV...</code></pre>
<pre><code>## Simulating counts...</code></pre>
<pre><code>## Simulating dropout (if needed)...</code></pre>
<pre><code>## Done!</code></pre>
<div class="sourceCode" id="cb82"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb82-1" data-line-number="1">sim</a></code></pre></div>
<pre><code>## class: SingleCellExperiment 
## dim: 1000 40 
## metadata(1): Params
## assays(6): BatchCellMeans BaseCellMeans ... TrueCounts counts
## rownames(1000): Gene1 Gene2 ... Gene999 Gene1000
## rowData names(4): Gene BaseGeneMean OutlierFactor GeneMean
## colnames(40): Cell1 Cell2 ... Cell39 Cell40
## colData names(3): Cell Batch ExpLibSize
## reducedDimNames(0):
## spikeNames(0):</code></pre>
<p>Looking at the output of <code>splatSimulate</code> we can see that <code>sim</code> is a <code>SingleCellExperiment</code> object with 1000 features (genes) and 40 samples (cells). The main part of this object is a features by samples matrix containing the simulated counts (accessed using <code>counts</code>), although it can also hold other expression measures such as FPKM or TPM. Additionally, a <code>SingleCellExperiment</code> contains phenotype information about each cell (accessed using <code>colData</code>) and feature information about each gene (accessed using <code>rowData</code>). Splatter uses these slots, as well as <code>assays</code>, to store information about the intermediate values of the simulation.</p>
<div class="sourceCode" id="cb84"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb84-1" data-line-number="1"><span class="co"># Access the counts</span></a>
<a class="sourceLine" id="cb84-2" data-line-number="2"><span class="kw">counts</span>(sim)[<span class="dv">1</span><span class="op">:</span><span class="dv">5</span>, <span class="dv">1</span><span class="op">:</span><span class="dv">5</span>]</a></code></pre></div>
<pre><code>##       Cell1 Cell2 Cell3 Cell4 Cell5
## Gene1    12     0    84     2     0
## Gene2   166   344   871  3283  1194
## Gene3   196     0    49     0     0
## Gene4     0     0     2     0     0
## Gene5  1806     0     0     0     0</code></pre>
<div class="sourceCode" id="cb86"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb86-1" data-line-number="1"><span class="co"># Information about genes</span></a>
<a class="sourceLine" id="cb86-2" data-line-number="2"><span class="kw">head</span>(<span class="kw">rowData</span>(sim))</a></code></pre></div>
<pre><code>## DataFrame with 6 rows and 4 columns
##           Gene     BaseGeneMean OutlierFactor         GeneMean
##       &lt;factor&gt;        &lt;numeric&gt;     &lt;numeric&gt;        &lt;numeric&gt;
## Gene1    Gene1 32.3718444106857             1 32.3718444106857
## Gene2    Gene2 747.318971058425             1 747.318971058425
## Gene3    Gene3 326.109389737362             1 326.109389737362
## Gene4    Gene4 110.418481903485             1 110.418481903485
## Gene5    Gene5 31.0298971169305             1 31.0298971169305
## Gene6    Gene6 4.76290936260425             1 4.76290936260425</code></pre>
<div class="sourceCode" id="cb88"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb88-1" data-line-number="1"><span class="co"># Information about cells</span></a>
<a class="sourceLine" id="cb88-2" data-line-number="2"><span class="kw">head</span>(<span class="kw">colData</span>(sim))</a></code></pre></div>
<pre><code>## DataFrame with 6 rows and 3 columns
##           Cell       Batch       ExpLibSize
##       &lt;factor&gt; &lt;character&gt;        &lt;numeric&gt;
## Cell1    Cell1      Batch1 416832.625556834
## Cell2    Cell2      Batch1 485547.616871914
## Cell3    Cell3      Batch1 213084.745731367
## Cell4    Cell4      Batch1 298739.480020433
## Cell5    Cell5      Batch1 286377.519263817
## Cell6    Cell6      Batch1 335445.033244731</code></pre>
<div class="sourceCode" id="cb90"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb90-1" data-line-number="1"><span class="co"># Gene by cell matrices</span></a>
<a class="sourceLine" id="cb90-2" data-line-number="2"><span class="kw">names</span>(<span class="kw">assays</span>(sim))</a></code></pre></div>
<pre><code>## [1] "BatchCellMeans" "BaseCellMeans"  "BCV"            "CellMeans"     
## [5] "TrueCounts"     "counts"</code></pre>
<div class="sourceCode" id="cb92"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb92-1" data-line-number="1"><span class="co"># Example of cell means matrix</span></a>
<a class="sourceLine" id="cb92-2" data-line-number="2"><span class="kw">assays</span>(sim)<span class="op">$</span>CellMeans[<span class="dv">1</span><span class="op">:</span><span class="dv">5</span>, <span class="dv">1</span><span class="op">:</span><span class="dv">5</span>]</a></code></pre></div>
<pre><code>##              Cell1        Cell2        Cell3        Cell4        Cell5
## Gene1 1.353239e+01 5.985547e-02  81.70412873 4.165728e+00 1.960157e-03
## Gene2 1.636830e+02 3.550569e+02 885.79020522 3.334211e+03 1.179854e+03
## Gene3 2.131076e+02 3.205235e-04  52.24270359 3.044254e-05 5.925819e-06
## Gene4 1.708375e-10 1.486684e-12   1.08476099 2.597789e-08 1.729964e-03
## Gene5 1.822366e+03 6.290119e-01   0.05023998 7.738061e-03 1.338512e-05</code></pre>
<p>An additional (big) advantage of outputting a <code>SingleCellExperiment</code> is that we get immediate access to other analysis packages, such as the plotting functions in <code>scater</code>. For example we can make a PCA plot:</p>
<div class="sourceCode" id="cb94"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb94-1" data-line-number="1"><span class="co"># Use scater to calculate logcounts</span></a>
<a class="sourceLine" id="cb94-2" data-line-number="2">sim &lt;-<span class="st"> </span><span class="kw"><a href="http://www.rdocumentation.org/packages/scater/topics/normalize">normalise</a></span>(sim)</a></code></pre></div>
<pre><code>## Warning: 'normalise' is deprecated.
## Use 'normalize' instead.
## See help("Deprecated")</code></pre>
<pre><code>## Warning in .local(object, ...): using library sizes as size factors</code></pre>
<div class="sourceCode" id="cb97"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb97-1" data-line-number="1"><span class="co"># Plot PCA</span></a>
<a class="sourceLine" id="cb97-2" data-line-number="2"><span class="kw"><a href="http://www.rdocumentation.org/packages/scater/topics/plot_reddim">plotPCA</a></span>(sim)</a></code></pre></div>
<p><img src="splatter_files/figure-html/pca-1.png" width="576" style="display: block; margin: auto;"></p>
<p>(<strong>NOTE:</strong> Your values and plots may look different as the simulation is random and produces different results each time it is run.)</p>
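<p>If reproducible results are needed, one option is to fix the random seed. The sketch below assumes that the Splat parameters include a <code>seed</code> parameter that <code>splatSimulate</code> uses to initialise the random number generator (check <code>?SplatParams</code> for your installed version). Under that assumption, two runs with the same seed should produce identical counts.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(splatter)
library(SingleCellExperiment)

# Two simulations with the same (assumed) seed parameter
sim.a &lt;- splatSimulate(seed = 42, verbose = FALSE)
sim.b &lt;- splatSimulate(seed = 42, verbose = FALSE)

# Compare the simulated count matrices; this should return TRUE
identical(counts(sim.a), counts(sim.b))</code></pre></div>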
<p>For more details about the <code>SingleCellExperiment</code> object refer to the <a href="https://bioconductor.org/packages/devel/bioc/vignettes/SingleCellExperiment/inst/doc/intro.html">vignette</a>. For information about what you can do with <code>scater</code> refer to the <code>scater</code> documentation and <a href="https://bioconductor.org/packages/release/bioc/vignettes/scater/inst/doc/vignette.html">vignette</a>.</p>
<p>The <code>splatSimulate</code> function outputs the following additional information about the simulation:</p>
<ul>
<li>
<strong>Cell information (<code>colData</code>)</strong>
<ul>
<li>
<code>Cell</code> - Unique cell identifier.</li>
<li>
<code>Group</code> - The group or path the cell belongs to.</li>
<li>
<code>ExpLibSize</code> - The expected library size for that cell.</li>
<li>
<code>Step</code> (paths only) - How far along the path each cell is.</li>
</ul>
</li>
<li>
<strong>Gene information (<code>rowData</code>)</strong>
<ul>
<li>
<code>Gene</code> - Unique gene identifier.</li>
<li>
<code>BaseGeneMean</code> - The base expression level for that gene.</li>
<li>
<code>OutlierFactor</code> - Expression outlier factor for that gene (1 is not an outlier).</li>
<li>
<code>GeneMean</code> - Expression level after applying outlier factors.</li>
<li>
<code>DEFac[Group]</code> - The differential expression factor for each gene in a particular group (1 is not differentially expressed).</li>
<li>
<code>GeneMean[Group]</code> - Expression level of a gene in a particular group after applying differential expression factors.</li>
</ul>
</li>
<li>
<strong>Gene by cell information (<code>assays</code>)</strong>
<ul>
<li>
<code>BaseCellMeans</code> - The expression of genes in each cell adjusted for expected library size.</li>
<li>
<code>BCV</code> - The Biological Coefficient of Variation for each gene in each cell.</li>
<li>
<code>CellMeans</code> - The expression level of genes in each cell adjusted for BCV.</li>
<li>
<code>TrueCounts</code> - The simulated counts before dropout.</li>
<li>
<code>Dropout</code> - Logical matrix showing which counts have been dropped in which cells.</li>
</ul>
</li>
</ul>
<p>Values that have been added by Splatter are named using <code>UpperCamelCase</code> to separate them from the <code>underscore_naming</code> used by <code>scater</code> and other packages. For more information on the simulation see <code><a href="../reference/splatSimulate.html">?splatSimulate</a></code>.</p>
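<p>To illustrate how the gene by cell matrices relate to each other, here is a rough base R sketch of a gamma-Poisson model with logistic dropout. It is a simplification under stated assumptions, not Splatter's implementation: library sizes, outliers and batch effects are ignored, a single BCV value is used for all genes, and all variable names and parameter values are hypothetical.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">set.seed(1)
n_genes &lt;- 5
n_cells &lt;- 4
n &lt;- n_genes * n_cells

# Hypothetical inputs: gamma distributed gene means, one common BCV and
# the midpoint and shape of the logistic dropout function
gene_means    &lt;- rgamma(n_genes, shape = 0.6, rate = 0.3)
bcv           &lt;- 0.1
dropout_mid   &lt;- 0
dropout_shape &lt;- -1

# BaseCellMeans analogue: every cell starts from the gene means
base_cell_means &lt;- matrix(gene_means, nrow = n_genes, ncol = n_cells)

# CellMeans analogue: gamma draw with mean base_cell_means and
# coefficient of variation bcv
cell_means &lt;- matrix(rgamma(n, shape = 1 / bcv^2,
                            scale = base_cell_means * bcv^2),
                     nrow = n_genes)

# TrueCounts analogue: Poisson counts around the cell means
true_counts &lt;- matrix(rpois(n, lambda = cell_means), nrow = n_genes)

# Dropout analogue: the probability of a zero falls as the log mean rises
drop_prob &lt;- plogis(dropout_shape * (log(cell_means) - dropout_mid))
dropout   &lt;- matrix(rbinom(n, size = 1, prob = drop_prob) == 1,
                    nrow = n_genes)

# Observed counts: true counts with the dropped entries set to zero
observed_counts &lt;- true_counts
observed_counts[dropout] &lt;- 0L</code></pre></div>
<p>Comparing <code>true_counts</code> with <code>observed_counts</code> in this sketch mirrors the relationship between the <code>TrueCounts</code> and <code>counts</code> assays when dropout is simulated.</p>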
<div id="simulating-groups" class="section level2">
<h2 class="hasAnchor">
<a href="#simulating-groups" class="anchor"></a>Simulating groups</h2>
<p>So far we have only simulated a single population of cells but often we are interested in investigating a mixed population of cells and looking to see what cell types are present or what differences there are between them. Splatter is able to simulate these situations by changing the <code>method</code> argument. Here we are going to simulate two groups by specifying the <code>group.prob</code> parameter and setting the <code>method</code> parameter to <code>"groups"</code>:</p>
<p>(<strong>NOTE:</strong> We have also set the <code>verbose</code> argument to <code>FALSE</code> to stop Splatter printing progress messages.)</p>
<div class="sourceCode" id="cb98"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb98-1" data-line-number="1">sim.groups &lt;-<span class="st"> </span><span class="kw"><a href="../reference/splatSimulate.html">splatSimulate</a></span>(<span class="dt">group.prob =</span> <span class="kw">c</span>(<span class="fl">0.5</span>, <span class="fl">0.5</span>), <span class="dt">method =</span> <span class="st">"groups"</span>,</a>
<a class="sourceLine" id="cb98-2" data-line-number="2">                            <span class="dt">verbose =</span> <span class="ot">FALSE</span>)</a>
<a class="sourceLine" id="cb98-3" data-line-number="3">sim.groups &lt;-<span class="st"> </span><span class="kw"><a href="http://www.rdocumentation.org/packages/scater/topics/normalize">normalise</a></span>(sim.groups)</a></code></pre></div>
<pre><code>## Warning: 'normalise' is deprecated.
## Use 'normalize' instead.
## See help("Deprecated")</code></pre>
<pre><code>## Warning in .local(object, ...): using library sizes as size factors</code></pre>
<div class="sourceCode" id="cb101"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb101-1" data-line-number="1"><span class="kw"><a href="http://www.rdocumentation.org/packages/scater/topics/plot_reddim">plotPCA</a></span>(sim.groups, <span class="dt">colour_by =</span> <span class="st">"Group"</span>)</a></code></pre></div>
<p><img src="splatter_files/figure-html/groups-1.png" width="576" style="display: block; margin: auto;"></p>
<p>As we have set both the group probabilities to 0.5 we should get approximately equal numbers of cells in each group (around 50 in this case). If we wanted uneven groups we could set <code>group.prob</code> to any set of probabilities that sum to 1.</p>
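<p>As a quick illustration of uneven groups, the sketch below uses probabilities of 0.2 and 0.8 (values chosen arbitrarily for this example) and tabulates the resulting group sizes. Because cells are assigned to groups at random, the sizes will only be approximately in a 1:4 ratio.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(splatter)
library(SingleCellExperiment)

# Simulate two groups of unequal size (illustrative probabilities)
sim.uneven &lt;- splatSimulate(group.prob = c(0.2, 0.8), method = "groups",
                            verbose = FALSE)

# Count how many cells were assigned to each group
table(colData(sim.uneven)$Group)</code></pre></div>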
</div>
<div id="simulating-paths" class="section level2">
<h2 class="hasAnchor">
<a href="#simulating-paths" class="anchor"></a>Simulating paths</h2>
<p>The other situation that is often of interest is a differentiation process where one cell type is changing into another. Splatter approximates this process by simulating a series of steps between two groups and randomly assigning each cell to a step. We can create this kind of simulation using the <code>"paths"</code> method.</p>
<div class="sourceCode" id="cb102"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb102-1" data-line-number="1">sim.paths &lt;-<span class="st"> </span><span class="kw"><a href="../reference/splatSimulate.html">splatSimulate</a></span>(<span class="dt">method =</span> <span class="st">"paths"</span>, <span class="dt">verbose =</span> <span class="ot">FALSE</span>)</a>
<a class="sourceLine" id="cb102-2" data-line-number="2">sim.paths &lt;-<span class="st"> </span><span class="kw"><a href="http://www.rdocumentation.org/packages/scater/topics/normalize">normalise</a></span>(sim.paths)</a></code></pre></div>
<pre><code>## Warning: 'normalise' is deprecated.
## Use 'normalize' instead.
## See help("Deprecated")</code></pre>
<pre><code>## Warning in .local(object, ...): using library sizes as size factors</code></pre>
<div class="sourceCode" id="cb105"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb105-1" data-line-number="1"><span class="kw"><a href="http://www.rdocumentation.org/packages/scater/topics/plot_reddim">plotPCA</a></span>(sim.paths, <span class="dt">colour_by =</span> <span class="st">"Step"</span>)</a></code></pre></div>
<p><img src="splatter_files/figure-html/paths-1.png" width="576" style="display: block; margin: auto;"></p>
<p>Here the colours represent the “step” of each cell or how far along the differentiation path it is. We can see that the cells with dark colours are more similar to the originating cell type and the light coloured cells are closer to the final, differentiated, cell type. By setting additional parameters it is possible to simulate more complex processes (for example multiple mature cell types from a single progenitor).</p>
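<p>One way to sketch such a branching process is shown below. It assumes the <code>path.from</code> parameter (shown as <code>[From]</code> in the parameter listing) gives the path that each path starts from, with 0 meaning the origin. Under that assumption, path 1 is the progenitor and paths 2 and 3 both branch from its end point. The group probabilities are arbitrary illustrative values.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(splatter)
library(SingleCellExperiment)

# Three paths: path 1 starts at the origin, paths 2 and 3 start where
# path 1 ends, giving two mature cell types from one progenitor
sim.branch &lt;- splatSimulate(method = "paths",
                            group.prob = c(0.2, 0.4, 0.4),
                            path.from = c(0, 1, 1),
                            verbose = FALSE)

# Number of cells assigned to each path
table(colData(sim.branch)$Group)</code></pre></div>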
</div>
<div id="batch-effects" class="section level2">
<h2 class="hasAnchor">
<a href="#batch-effects" class="anchor"></a>Batch effects</h2>
<p>Another factor that is important in the analysis of any sequencing experiment is batch effects, technical variation that is common to a set of samples processed at the same time. We apply batch effects by telling Splatter how many cells are in each batch:</p>
<div class="sourceCode" id="cb106"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb106-1" data-line-number="1">sim.batches &lt;-<span class="st"> </span><span class="kw"><a href="../reference/splatSimulate.html">splatSimulate</a></span>(<span class="dt">batchCells =</span> <span class="kw">c</span>(<span class="dv">50</span>, <span class="dv">50</span>), <span class="dt">verbose =</span> <span class="ot">FALSE</span>)</a>
<a class="sourceLine" id="cb106-2" data-line-number="2">sim.batches &lt;-<span class="st"> </span><span class="kw"><a href="http://www.rdocumentation.org/packages/scater/topics/normalize">normalise</a></span>(sim.batches)</a></code></pre></div>
<pre><code>## Warning: 'normalise' is deprecated.
## Use 'normalize' instead.
## See help("Deprecated")</code></pre>
<pre><code>## Warning in .local(object, ...): using library sizes as size factors</code></pre>
<div class="sourceCode" id="cb109"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb109-1" data-line-number="1"><span class="kw"><a href="http://www.rdocumentation.org/packages/scater/topics/plot_reddim">plotPCA</a></span>(sim.batches, <span class="dt">colour_by =</span> <span class="st">"Batch"</span>)</a></code></pre></div>
<p><img src="splatter_files/figure-html/batches-1.png" width="576" style="display: block; margin: auto;"></p>
<p>This looks a lot like when we simulated groups and that is because the process is very similar. The difference is that batch effects are applied to all genes, not just those that are differentially expressed, and the effects are usually smaller. By combining groups and batches we can simulate both unwanted variation that we aren’t interested in (batch) and the wanted variation we are looking for (group):</p>
<div class="sourceCode" id="cb110"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb110-1" data-line-number="1">sim.groups &lt;-<span class="st"> </span><span class="kw"><a href="../reference/splatSimulate.html">splatSimulate</a></span>(<span class="dt">batchCells =</span> <span class="kw">c</span>(<span class="dv">50</span>, <span class="dv">50</span>), <span class="dt">group.prob =</span> <span class="kw">c</span>(<span class="fl">0.5</span>, <span class="fl">0.5</span>),</a>
<a class="sourceLine" id="cb110-2" data-line-number="2">                            <span class="dt">method =</span> <span class="st">"groups"</span>, <span class="dt">verbose =</span> <span class="ot">FALSE</span>)</a>
<a class="sourceLine" id="cb110-3" data-line-number="3">sim.groups &lt;-<span class="st"> </span><span class="kw"><a href="http://www.rdocumentation.org/packages/scater/topics/normalize">normalise</a></span>(sim.groups)</a></code></pre></div>
<pre><code>## Warning: 'normalise' is deprecated.
## Use 'normalize' instead.
## See help("Deprecated")</code></pre>
<pre><code>## Warning in .local(object, ...): using library sizes as size factors</code></pre>
<div class="sourceCode" id="cb113"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb113-1" data-line-number="1"><span class="kw"><a href="http://www.rdocumentation.org/packages/scater/topics/plot_reddim">plotPCA</a></span>(sim.groups, <span class="dt">shape_by =</span> <span class="st">"Batch"</span>, <span class="dt">colour_by =</span> <span class="st">"Group"</span>)</a></code></pre></div>
<p><img src="splatter_files/figure-html/batch-groups-1.png" width="576" style="display: block; margin: auto;"></p>
<p>Here we see that the effects of the group (first component) are stronger than the batch effects (second component) but by adjusting the parameters we could make the batch effects dominate.</p>
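<p>As a sketch of how the batch effects might be strengthened, the example below increases the batch factor location and scale. It assumes these are exposed as the <code>batch.facLoc</code> and <code>batch.facScale</code> parameters (see <code>?SplatParams</code>). The values used are illustrative only, and whether batch then dominates in a PCA also depends on the differential expression parameters.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(splatter)
library(SingleCellExperiment)

# Two batches and two groups, with larger (assumed) batch effect factors
sim.strong &lt;- splatSimulate(batchCells = c(50, 50),
                            group.prob = c(0.5, 0.5),
                            method = "groups",
                            batch.facLoc = 0.5,
                            batch.facScale = 0.5,
                            verbose = FALSE)

# Cross-tabulate batch and group membership
table(colData(sim.strong)$Batch, colData(sim.strong)$Group)</code></pre></div>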
</div>
<div id="convenience-functions" class="section level2">
<h2 class="hasAnchor">
<a href="#convenience-functions" class="anchor"></a>Convenience functions</h2>
<p>Each of the Splatter simulation methods has its own convenience function. To simulate a single population use <code><a href="../reference/splatSimulate.html">splatSimulateSingle()</a></code> (equivalent to <code><a href="../reference/splatSimulate.html">splatSimulate(method = "single")</a></code>), to simulate groups use <code><a href="../reference/splatSimulate.html">splatSimulateGroups()</a></code> (equivalent to <code><a href="../reference/splatSimulate.html">splatSimulate(method = "groups")</a></code>), or to simulate paths use <code><a href="../reference/splatSimulate.html">splatSimulatePaths()</a></code> (equivalent to <code><a href="../reference/splatSimulate.html">splatSimulate(method = "paths")</a></code>).</p>
</div>
</div>
<div id="other-simulations" class="section level1">
<h1 class="hasAnchor">
<a href="#other-simulations" class="anchor"></a>Other simulations</h1>
<p>As well as its own Splat simulation method the Splatter package contains implementations of other single-cell RNA-seq simulations that have been published or wrappers around simulations included in other packages. To see all the available simulations run the <code><a href="../reference/listSims.html">listSims()</a></code> function:</p>
<div class="sourceCode" id="cb114"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb114-1" data-line-number="1"><span class="kw"><a href="../reference/listSims.html">listSims</a></span>()</a></code></pre></div>
<pre><code>## Splatter currently contains 13 simulations 
## 
## Splat (splat) 
## DOI: 10.1186/s13059-017-1305-0    GitHub: Oshlack/splatter 
## The Splat simulation generates means from a gamma distribution, adjusts them for BCV and generates counts from a gamma-poisson. Dropout and batch effects can be optionally added. 
## 
## Splat Single (splatSingle) 
## DOI: 10.1186/s13059-017-1305-0    GitHub: Oshlack/splatter 
## The Splat simulation with a single population. 
## 
## Splat Groups (splatGroups) 
## DOI: 10.1186/s13059-017-1305-0    GitHub: Oshlack/splatter 
## The Splat simulation with multiple groups. Each group can have it's own differential expression probability and fold change distribution. 
## 
## Splat Paths (splatPaths) 
## DOI: 10.1186/s13059-017-1305-0    GitHub: Oshlack/splatter 
## The Splat simulation with differentiation paths. Each path can have it's own length, skew and probability. Genes can change in non-linear ways. 
## 
## Simple (simple) 
## DOI: 10.1186/s13059-017-1305-0    GitHub: Oshlack/splatter 
## A simple simulation with gamma means and negative binomial counts. 
## 
## Lun (lun) 
## DOI: 10.1186/s13059-016-0947-7    GitHub: MarioniLab/Deconvolution2016 
## Gamma distributed means and negative binomial counts. Cells are given a size factor and differential expression can be simulated with fixed fold changes. 
## 
## Lun 2 (lun2) 
## DOI: 10.1093/biostatistics/kxw055     GitHub: MarioniLab/PlateEffects2016 
## Negative binomial counts where the means and dispersions have been sampled from a real dataset. The core feature of the Lun 2 simulation is the addition of plate effects. Differential expression can be added between two groups of plates and optionally a zero-inflated negative-binomial can be used. 
## 
## scDD (scDD) 
## DOI: 10.1186/s13059-016-1077-y    GitHub: kdkorthauer/scDD 
## The scDD simulation samples a given dataset and can simulate differentially expressed and differentially distributed genes between two conditions. 
## 
## BASiCS (BASiCS) 
## DOI: 10.1371/journal.pcbi.1004333     GitHub: catavallejos/BASiCS 
## The BASiCS simulation is based on a bayesian model used to deconvolve biological and technical variation and includes spike-ins and batch effects. 
## 
## mfa (mfa) 
## DOI: 10.12688/wellcomeopenres.11087.1     GitHub: kieranrcampbell/mfa 
## The mfa simulation produces a bifurcating pseudotime trajectory. This can optionally include genes with transient changes in expression and added dropout. 
## 
## PhenoPath (pheno) 
## DOI: 10.1101/159913   GitHub: kieranrcampbell/phenopath 
## The PhenoPath simulation produces a pseudotime trajectory with different types of genes. 
## 
## ZINB-WaVE (zinb) 
## DOI: 10.1101/125112   GitHub: drisso/zinbwave 
## The ZINB-WaVE simulation simulates counts from a sophisticated zero-inflated negative-binomial distribution including cell and gene-level covariates. 
## 
## SparseDC (sparseDC) 
## DOI: 10.1093/nar/gkx1113      GitHub: cran/SparseDC 
## The SparseDC simulation simulates a set of clusters across two conditions, where some clusters may be present in only one condition.</code></pre>
<p>(or, more conveniently for the vignette, as a table)</p>
<div class="sourceCode" id="cb116"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb116-1" data-line-number="1">knitr<span class="op">::</span><span class="kw"><a href="http://www.rdocumentation.org/packages/knitr/topics/kable">kable</a></span>(<span class="kw"><a href="../reference/listSims.html">listSims</a></span>(<span class="dt">print =</span> <span class="ot">FALSE</span>))</a></code></pre></div>
<table class="table">
<thead><tr class="header">
<th align="left">Name</th>
<th align="left">Prefix</th>
<th align="left">DOI</th>
<th align="left">GitHub</th>
<th align="left">Description</th>
</tr></thead>
<tbody>
<tr class="odd">
<td align="left">Splat</td>
<td align="left">splat</td>
<td align="left">10.1186/s13059-017-1305-0</td>
<td align="left">Oshlack/splatter</td>
<td align="left">The Splat simulation generates means from a gamma distribution, adjusts them for BCV and generates counts from a gamma-poisson. Dropout and batch effects can be optionally added.</td>
</tr>
<tr class="even">
<td align="left">Splat Single</td>
<td align="left">splatSingle</td>
<td align="left">10.1186/s13059-017-1305-0</td>
<td align="left">Oshlack/splatter</td>
<td align="left">The Splat simulation with a single population.</td>
</tr>
<tr class="odd">
<td align="left">Splat Groups</td>
<td align="left">splatGroups</td>
<td align="left">10.1186/s13059-017-1305-0</td>
<td align="left">Oshlack/splatter</td>
<td align="left">The Splat simulation with multiple groups. Each group can have its own differential expression probability and fold change distribution.</td>
</tr>
<tr class="even">
<td align="left">Splat Paths</td>
<td align="left">splatPaths</td>
<td align="left">10.1186/s13059-017-1305-0</td>
<td align="left">Oshlack/splatter</td>
<td align="left">The Splat simulation with differentiation paths. Each path can have its own length, skew and probability. Genes can change in non-linear ways.</td>
</tr>
<tr class="odd">
<td align="left">Simple</td>
<td align="left">simple</td>
<td align="left">10.1186/s13059-017-1305-0</td>
<td align="left">Oshlack/splatter</td>
<td align="left">A simple simulation with gamma means and negative binomial counts.</td>
</tr>
<tr class="even">
<td align="left">Lun</td>
<td align="left">lun</td>
<td align="left">10.1186/s13059-016-0947-7</td>
<td align="left">MarioniLab/Deconvolution2016</td>
<td align="left">Gamma distributed means and negative binomial counts. Cells are given a size factor and differential expression can be simulated with fixed fold changes.</td>
</tr>
<tr class="odd">
<td align="left">Lun 2</td>
<td align="left">lun2</td>
<td align="left">10.1093/biostatistics/kxw055</td>
<td align="left">MarioniLab/PlateEffects2016</td>
<td align="left">Negative binomial counts where the means and dispersions have been sampled from a real dataset. The core feature of the Lun 2 simulation is the addition of plate effects. Differential expression can be added between two groups of plates and optionally a zero-inflated negative-binomial can be used.</td>
</tr>
<tr class="even">
<td align="left">scDD</td>
<td align="left">scDD</td>
<td align="left">10.1186/s13059-016-1077-y</td>
<td align="left">kdkorthauer/scDD</td>
<td align="left">The scDD simulation samples a given dataset and can simulate differentially expressed and differentially distributed genes between two conditions.</td>
</tr>
<tr class="odd">
<td align="left">BASiCS</td>
<td align="left">BASiCS</td>
<td align="left">10.1371/journal.pcbi.1004333</td>
<td align="left">catavallejos/BASiCS</td>
<td align="left">The BASiCS simulation is based on a Bayesian model used to deconvolve biological and technical variation and includes spike-ins and batch effects.</td>
</tr>
<tr class="even">
<td align="left">mfa</td>
<td align="left">mfa</td>
<td align="left">10.12688/wellcomeopenres.11087.1</td>
<td align="left">kieranrcampbell/mfa</td>
<td align="left">The mfa simulation produces a bifurcating pseudotime trajectory. This can optionally include genes with transient changes in expression and added dropout.</td>
</tr>
<tr class="odd">
<td align="left">PhenoPath</td>
<td align="left">pheno</td>
<td align="left">10.1101/159913</td>
<td align="left">kieranrcampbell/phenopath</td>
<td align="left">The PhenoPath simulation produces a pseudotime trajectory with different types of genes.</td>
</tr>
<tr class="even">
<td align="left">ZINB-WaVE</td>
<td align="left">zinb</td>
<td align="left">10.1101/125112</td>
<td align="left">drisso/zinbwave</td>
<td align="left">The ZINB-WaVE simulation simulates counts from a sophisticated zero-inflated negative-binomial distribution including cell and gene-level covariates.</td>
</tr>
<tr class="odd">
<td align="left">SparseDC</td>
<td align="left">sparseDC</td>
<td align="left">10.1093/nar/gkx1113</td>
<td align="left">cran/SparseDC</td>
<td align="left">The SparseDC simulation simulates a set of clusters across two conditions, where some clusters may be present in only one condition.</td>
</tr>
</tbody>
</table>
<p>Each simulation has its own prefix, which gives the names of the functions associated with that simulation. For example, the prefix for the simple simulation is <code>simple</code>, so it stores its parameters in a <code>SimpleParams</code> object that can be created using <code><a href="../reference/newParams.html">newSimpleParams()</a></code> or estimated from real data using <code><a href="../reference/simpleEstimate.html">simpleEstimate()</a></code>. To simulate data using that simulation you would use <code><a href="../reference/simpleSimulate.html">simpleSimulate()</a></code>. Each simulation returns a <code>SingleCellExperiment</code> object with intermediate values similar to those returned by <code><a href="../reference/splatSimulate.html">splatSimulate()</a></code>. For more detailed information on each simulation see the appropriate help page (e.g. <code><a href="../reference/simpleSimulate.html">?simpleSimulate</a></code> for information on how the simple simulation works or <code><a href="../reference/lun2Estimate.html">?lun2Estimate</a></code> for details of how the Lun 2 simulation estimates parameters) or refer to the appropriate paper or package.</p>
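<p>As a minimal sketch of this naming convention for the simple simulation (assuming the splatter package is installed; the <code>nGenes</code> and <code>nCells</code> values are arbitrary choices for illustration, not recommendations):</p>

```r
library(splatter)

# Create a SimpleParams object; 200 genes and 10 cells are arbitrary
# illustrative values, not package defaults
params <- newSimpleParams(nGenes = 200, nCells = 10)

# Simulate with the function named after the "simple" prefix
sim <- simpleSimulate(params, verbose = FALSE)

# The result is a SingleCellExperiment with one row per gene and one
# column per cell
dim(sim)
```

<p>The same pattern should carry over to the other prefixes in the table, although it is worth checking the relevant help page because each simulation has its own parameters.</p>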
</div>
<div id="other-expression-values" class="section level1">
<h1 class="hasAnchor">
<a href="#other-expression-values" class="anchor"></a>Other expression values</h1>
<p>Splatter is designed to simulate count data but some analysis methods expect other expression values, particularly length-normalised values such as TPM or FPKM. The <code>scater</code> package has functions for adding these values to a <code>SingleCellExperiment</code> object but they require a length for each gene. The <code>addGeneLengths</code> function can be used to simulate these lengths:</p>
<div class="sourceCode" id="cb117"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb117-1" data-line-number="1">sim &lt;-<span class="st"> </span><span class="kw"><a href="../reference/simpleSimulate.html">simpleSimulate</a></span>(<span class="dt">verbose =</span> <span class="ot">FALSE</span>)</a>
<a class="sourceLine" id="cb117-2" data-line-number="2">sim &lt;-<span class="st"> </span><span class="kw"><a href="../reference/addGeneLengths.html">addGeneLengths</a></span>(sim)</a>
<a class="sourceLine" id="cb117-3" data-line-number="3"><span class="kw">head</span>(<span class="kw">rowData</span>(sim))</a></code></pre></div>
<pre><code>## DataFrame with 6 rows and 3 columns
##           Gene          GeneMean    Length
##       &lt;factor&gt;         &lt;numeric&gt; &lt;numeric&gt;
## Gene1    Gene1  1.63195218364911      3183
## Gene2    Gene2  2.24908052110884      3558
## Gene3    Gene3 0.118794797239876      4102
## Gene4    Gene4 0.582831992501315       612
## Gene5    Gene5 0.493213594777193      1103
## Gene6    Gene6 0.881807964448208      1507</code></pre>
<p>We can then use <code>scater</code> to calculate TPM:</p>
<div class="sourceCode" id="cb119"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb119-1" data-line-number="1"><span class="kw">tpm</span>(sim) &lt;-<span class="st"> </span><span class="kw"><a href="http://www.rdocumentation.org/packages/scater/topics/calculateTPM">calculateTPM</a></span>(sim, <span class="kw">rowData</span>(sim)<span class="op">$</span>Length)</a>
<a class="sourceLine" id="cb119-2" data-line-number="2"><span class="kw">tpm</span>(sim)[<span class="dv">1</span><span class="op">:</span><span class="dv">5</span>, <span class="dv">1</span><span class="op">:</span><span class="dv">5</span>]</a></code></pre></div>
<pre><code>##           Cell1     Cell2    Cell3     Cell4     Cell5
## Gene1  99.11615 148.33343 145.2572  98.70588 147.04199
## Gene2   0.00000  88.46642 129.9476 132.45398  87.69621
## Gene3   0.00000   0.00000   0.0000   0.00000   0.00000
## Gene4   0.00000 514.31950   0.0000 256.68368 254.92083
## Gene5 143.01301   0.00000   0.0000   0.00000 141.44293</code></pre>
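<p>To see why a length is needed for each gene, here is a small base R sketch of the textbook TPM definition on a made-up count matrix. The counts and lengths are invented for illustration, and <code>scater</code>'s own implementation may differ in details such as how library sizes or size factors are handled:</p>

```r
# Toy data: 3 genes x 2 cells (values invented for illustration)
counts <- matrix(c(10, 0, 5,
                   20, 4, 0),
                 nrow = 3,
                 dimnames = list(paste0("Gene", 1:3), paste0("Cell", 1:2)))
gene_lengths <- c(1000, 2000, 500)  # one length (in bases) per gene

# Step 1: divide each gene's counts by its length in kilobases.
# The length vector is recycled down each column, so row i is
# divided by gene_lengths[i].
rate <- counts / (gene_lengths / 1000)

# Step 2: scale each cell so that its values sum to one million
tpm_manual <- t(t(rate) / colSums(rate)) * 1e6

tpm_manual
colSums(tpm_manual)
```

<p>Under this definition every cell's values sum to one million, so longer genes are down-weighted relative to their raw counts.</p>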
<p>The default method used by <code>addGeneLengths</code> to simulate lengths is to generate values from a log-normal distribution which are then rounded to give an integer length. The parameters for this distribution are based on human protein coding genes but can be adjusted if needed (for example for other species). Alternatively lengths can be sampled from a provided vector (see <code><a href="../reference/addGeneLengths.html">?addGeneLengths</a></code> for details and an example).</p>
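<p>The two approaches described above can be sketched in base R as follows. This is not the package's code: the log-normal parameters (<code>loc</code>, <code>scale</code>) and the <code>known_lengths</code> vector are assumptions chosen for illustration, so check <code>?addGeneLengths</code> for the actual argument names and defaults:</p>

```r
set.seed(1)
n_genes <- 1000

# Approach 1: generate lengths from a log-normal distribution and round
# them to integers. These parameter values are illustrative assumptions.
loc <- 7.9    # mean on the log scale
scale <- 0.7  # standard deviation on the log scale
generated <- round(rlnorm(n_genes, meanlog = loc, sdlog = scale))

# Approach 2: sample lengths (with replacement) from a provided vector.
# known_lengths is a hypothetical stand-in for real annotation lengths.
known_lengths <- c(612, 1103, 1507, 3183, 3558, 4102)
sampled <- sample(known_lengths, n_genes, replace = TRUE)

summary(generated)
table(sampled)
```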
</div>
<div id="comparing-simulations-and-real-data" class="section level1">
<h1 class="hasAnchor">
<a href="#comparing-simulations-and-real-data" class="anchor"></a>Comparing simulations and real data</h1>
<p>One thing you might like to do after simulating data is to compare it to a real dataset, or compare simulations with different parameters or models. Splatter provides a function <code>compareSCEs</code> that aims to make these comparisons easier. As the name suggests this function takes a list of <code>SingleCellExperiment</code> objects, combines the datasets and produces some plots comparing them. Let’s make two small simulations and see how they compare.</p>
<div class="sourceCode" id="cb121"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb121-1" data-line-number="1">sim1 &lt;-<span class="st"> </span><span class="kw"><a href="../reference/splatSimulate.html">splatSimulate</a></span>(<span class="dt">nGenes =</span> <span class="dv">1000</span>, <span class="dt">batchCells =</span> <span class="dv">20</span>, <span class="dt">verbose =</span> <span class="ot">FALSE</span>)</a>
<a class="sourceLine" id="cb121-2" data-line-number="2">sim2 &lt;-<span class="st"> </span><span class="kw"><a href="../reference/simpleSimulate.html">simpleSimulate</a></span>(<span class="dt">nGenes =</span> <span class="dv">1000</span>, <span class="dt">nCells =</span> <span class="dv">20</span>, <span class="dt">verbose =</span> <span class="ot">FALSE</span>)</a>
<a class="sourceLine" id="cb121-3" data-line-number="3">comparison &lt;-<span class="st"> </span><span class="kw"><a href="../reference/compareSCEs.html">compareSCEs</a></span>(<span class="kw">list</span>(<span class="dt">Splat =</span> sim1, <span class="dt">Simple =</span> sim2))</a>
<a class="sourceLine" id="cb121-4" data-line-number="4"></a>
<a class="sourceLine" id="cb121-5" data-line-number="5"><span class="kw">names</span>(comparison)</a></code></pre></div>
<pre><code>## [1] "FeatureData" "PhenoData"   "Plots"</code></pre>
<div class="sourceCode" id="cb123"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb123-1" data-line-number="1"><span class="kw">names</span>(comparison<span class="op">$</span>Plots)</a></code></pre></div>
<pre><code>## [1] "Means"        "Variances"    "MeanVar"      "LibrarySizes"
## [5] "ZerosGene"    "ZerosCell"    "MeanZeros"</code></pre>
<p>The returned list has three items. The first two are the combined datasets by gene (<code>FeatureData</code>) and by cell (<code>PhenoData</code>) and the third contains some comparison plots (produced using <code>ggplot2</code>), for example a plot of the distribution of means:</p>
<div class="sourceCode" id="cb125"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb125-1" data-line-number="1">comparison<span class="op">$</span>Plots<span class="op">$</span>Means</a></code></pre></div>
<p><img src="splatter_files/figure-html/comparison-means-1.png" width="576" style="display: block; margin: auto;"></p>
<p>These are only a few of the plots you might want to consider but it should be easy to make more using the returned data. For example, we could plot the number of expressed genes against the library size:</p>
<div class="sourceCode" id="cb126"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb126-1" data-line-number="1"><span class="kw">library</span>(<span class="st">"ggplot2"</span>)</a>
<a class="sourceLine" id="cb126-2" data-line-number="2"><span class="kw"><a href="http://www.rdocumentation.org/packages/ggplot2/topics/ggplot">ggplot</a></span>(comparison<span class="op">$</span>PhenoData,</a>
<a class="sourceLine" id="cb126-3" data-line-number="3">       <span class="kw"><a href="http://www.rdocumentation.org/packages/ggplot2/topics/aes">aes</a></span>(<span class="dt">x =</span> total_counts, <span class="dt">y =</span> total_features_by_counts, <span class="dt">colour =</span> Dataset)) <span class="op">+</span></a>
<a class="sourceLine" id="cb126-4" data-line-number="4"><span class="st">    </span><span class="kw"><a href="http://www.rdocumentation.org/packages/ggplot2/topics/geom_point">geom_point</a></span>()</a></code></pre></div>
<p><img src="splatter_files/figure-html/comparison-libsize-features-1.png" width="576" style="display: block; margin: auto;"></p>
<div id="comparing-differences" class="section level2">
<h2 class="hasAnchor">
<a href="#comparing-differences" class="anchor"></a>Comparing differences</h2>
<p>Sometimes instead of visually comparing datasets it may be more interesting to look at the differences between them. We can do this using the <code>diffSCEs</code> function. Similar to <code>compareSCEs</code> this function takes a list of <code>SingleCellExperiment</code> objects but now we also specify one to be a reference. A series of similar plots are returned but instead of showing the overall distributions they demonstrate differences from the reference.</p>
<div class="sourceCode" id="cb127"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb127-1" data-line-number="1">difference &lt;-<span class="st"> </span><span class="kw"><a href="../reference/diffSCEs.html">diffSCEs</a></span>(<span class="kw">list</span>(<span class="dt">Splat =</span> sim1, <span class="dt">Simple =</span> sim2), <span class="dt">ref =</span> <span class="st">"Simple"</span>)</a>
<a class="sourceLine" id="cb127-2" data-line-number="2">difference<span class="op">$</span>Plots<span class="op">$</span>Means</a></code></pre></div>
<p><img src="splatter_files/figure-html/difference-1.png" width="576" style="display: block; margin: auto;"></p>
<p>We also get a series of quantile-quantile (Q-Q) plots that can be used to compare distributions.</p>
<div class="sourceCode" id="cb128"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb128-1" data-line-number="1">difference<span class="op">$</span>QQPlots<span class="op">$</span>Means</a></code></pre></div>
<p><img src="splatter_files/figure-html/difference-qq-1.png" width="576" style="display: block; margin: auto;"></p>
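<p>For readers unfamiliar with Q-Q plots, the following base R sketch shows the general idea on two made-up vectors. It is not how <code>diffSCEs</code> builds its plots; the gamma-distributed inputs and the names <code>ref_values</code> and <code>other_values</code> are assumptions for illustration only:</p>

```r
set.seed(1)

# Two made-up sets of values standing in for, e.g., gene means from a
# reference dataset and from a second dataset
ref_values <- rgamma(500, shape = 2, rate = 1)
other_values <- rgamma(500, shape = 2, rate = 0.5)

# Compute matching quantiles of each distribution
probs <- seq(0.01, 0.99, by = 0.01)
q_ref <- quantile(ref_values, probs)
q_other <- quantile(other_values, probs)

# Plot one set of quantiles against the other; points on the identity
# line would indicate that the two distributions are similar
plot(q_ref, q_other,
     xlab = "Reference quantiles", ylab = "Other quantiles")
abline(0, 1, lty = 2)
```

<p>Systematic departures from the dashed identity line suggest that the two distributions differ in location, scale or shape.</p>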
</div>
<div id="making-panels" class="section level2">
<h2 class="hasAnchor">
<a href="#making-panels" class="anchor"></a>Making panels</h2>
<p>Each of these comparisons makes several plots which can be a lot to look at. To make this easier, or to produce figures for publications, you can make use of the functions <code>makeCompPanel</code>, <code>makeDiffPanel</code> and <code>makeOverallPanel</code>.</p>
<p>These functions combine the plots into a single panel using the <code>cowplot</code> package. The panels can be quite large and hard to view (for example in RStudio’s plot viewer) so it can be better to output the panels and view them separately. Luckily <code>cowplot</code> provides a convenient function for saving the images. Here are some suggested parameters for outputting each of the panels:</p>
<div class="sourceCode" id="cb129"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb129-1" data-line-number="1"><span class="co"># This code is just an example and is not run</span></a>
<a class="sourceLine" id="cb129-2" data-line-number="2">panel &lt;-<span class="st"> </span><span class="kw"><a href="../reference/makeCompPanel.html">makeCompPanel</a></span>(comparison)</a>
<a class="sourceLine" id="cb129-3" data-line-number="3">cowplot<span class="op">::</span><span class="kw"><a href="http://www.rdocumentation.org/packages/cowplot/topics/save_plot">save_plot</a></span>(<span class="st">"comp_panel.png"</span>, panel, <span class="dt">nrow =</span> <span class="dv">4</span>, <span class="dt">ncol =</span> <span class="dv">3</span>)</a>
<a class="sourceLine" id="cb129-4" data-line-number="4"></a>
<a class="sourceLine" id="cb129-5" data-line-number="5">panel &lt;-<span class="st"> </span><span class="kw"><a href="../reference/makeDiffPanel.html">makeDiffPanel</a></span>(difference)</a>
<a class="sourceLine" id="cb129-6" data-line-number="6">cowplot<span class="op">::</span><span class="kw"><a href="http://www.rdocumentation.org/packages/cowplot/topics/save_plot">save_plot</a></span>(<span class="st">"diff_panel.png"</span>, panel, <span class="dt">nrow =</span> <span class="dv">3</span>, <span class="dt">ncol =</span> <span class="dv">5</span>)</a>
<a class="sourceLine" id="cb129-7" data-line-number="7"></a>
<a class="sourceLine" id="cb129-8" data-line-number="8">panel &lt;-<span class="st"> </span><span class="kw"><a href="../reference/makeOverallPanel.html">makeOverallPanel</a></span>(comparison, difference)</a>
<a class="sourceLine" id="cb129-9" data-line-number="9">cowplot<span class="op">::</span><span class="kw"><a href="http://www.rdocumentation.org/packages/cowplot/topics/save_plot">save_plot</a></span>(<span class="st">"overall_panel.png"</span>, panel, <span class="dt">ncol =</span> <span class="dv">4</span>, <span class="dt">nrow =</span> <span class="dv">7</span>)</a></code></pre></div>
</div>
</div>
<div id="citing-splatter" class="section level1">
<h1 class="hasAnchor">
<a href="#citing-splatter" class="anchor"></a>Citing Splatter</h1>
<p>If you use Splatter in your work please cite our paper:</p>
<div class="sourceCode" id="cb130"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb130-1" data-line-number="1"><span class="kw">citation</span>(<span class="st">"splatter"</span>)</a></code></pre></div>
<pre><code>## 
##   Zappia L, Phipson B, Oshlack A. Splatter: Simulation of
##   single-cell RNA sequencing data. Genome Biology. 2017;
##   doi:10.1186/s13059-017-1305-0
## 
## A BibTeX entry for LaTeX users is
## 
##   @Article{,
##     author = {Luke Zappia and Belinda Phipson and Alicia Oshlack},
##     title = {Splatter: simulation of single-cell RNA sequencing data},
##     journal = {Genome Biology},
##     year = {2017},
##     url = {http://dx.doi.org/10.1186/s13059-017-1305-0},
##     doi = {10.1186/s13059-017-1305-0},
##   }</code></pre>
</div>
<div id="session-information" class="section level1 unnumbered">
<h1 class="hasAnchor">
<a href="#session-information" class="anchor"></a>Session information</h1>
<div class="sourceCode" id="cb132"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb132-1" data-line-number="1"><span class="kw">sessionInfo</span>()</a></code></pre></div>
<pre><code>## R version 3.5.0 (2018-04-23)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Sierra 10.12.6
## 
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] scater_1.9.14               ggplot2_3.0.0.9000         
##  [3] splatter_1.5.2              SingleCellExperiment_1.3.10
##  [5] SummarizedExperiment_1.11.6 DelayedArray_0.7.27        
##  [7] BiocParallel_1.15.8         matrixStats_0.54.0         
##  [9] Biobase_2.41.2              GenomicRanges_1.33.13      
## [11] GenomeInfoDb_1.17.1         IRanges_2.15.16            
## [13] S4Vectors_0.19.19           BiocGenerics_0.27.1        
## [15] BiocStyle_2.9.3            
## 
## loaded via a namespace (and not attached):
##  [1] viridis_0.5.1            edgeR_3.23.3            
##  [3] splines_3.5.0            viridisLite_0.3.0       
##  [5] DelayedMatrixStats_1.3.6 assertthat_0.2.0        
##  [7] highr_0.7                sp_1.3-1                
##  [9] vipor_0.4.5              GenomeInfoDbData_1.1.0  
## [11] yaml_2.2.0               pillar_1.3.0            
## [13] backports_1.1.2          lattice_0.20-35         
## [15] limma_3.37.3             glue_1.3.0              
## [17] digest_0.6.15            XVector_0.21.3          
## [19] checkmate_1.8.5          colorspace_1.3-2        
## [21] cowplot_0.9.3            htmltools_0.3.6         
## [23] Matrix_1.2-14            plyr_1.8.4              
## [25] pkgconfig_2.0.1          bookdown_0.7            
## [27] zlibbioc_1.27.0          purrr_0.2.5             
## [29] scales_1.0.0             HDF5Array_1.9.9         
## [31] tibble_1.4.2             withr_2.1.2             
## [33] lazyeval_0.2.1           survival_2.42-6         
## [35] magrittr_1.5             crayon_1.3.4            
## [37] memoise_1.1.0            evaluate_0.11           
## [39] fs_1.2.5                 MASS_7.3-50