Commit 28f0e5e8 authored by Lucy McNeill

vignettes working with devtools::check()

parent 539a96f5
@@ -13,8 +13,7 @@ Authors@R:
role = c("rev", "ctb"),
comment = c(ORCID = "0000-0003-0143-8293")))
Maintainer: Lucy McNeill <lucmcneill@gmail.com>
Description: Synapsis is a Bioconductor software package for automated (unbiased and reproducible) analysis of meiotic immunofluorescence datasets. The primary functions of the software can i) identify cells in meiotic prophase that are labelled by a synaptonemal complex axis or central element protein, ii) isolate individual synaptonemal complexes and measure their physical length, iii) quantify foci and co-localise them with synaptonemal complexes, iv) measure interference between synaptonemal complex-associated foci. The software has applications that extend to multiple species and to the analysis of other proteins that label meiotic prophase chromosomes. The software converts meiotic immunofluorescence images into R data frames that are compatible with machine learning methods.
Given a set of microscopy images of meiotic spread slides, synapsis crops images around individual single cells, counts colocalising foci on strands on a per cell basis, and measures the distance between foci on any given strand.
Description: Synapsis is a Bioconductor software package for automated (unbiased and reproducible) analysis of meiotic immunofluorescence datasets. The primary functions of the software can i) identify cells in meiotic prophase that are labelled by a synaptonemal complex axis or central element protein, ii) isolate individual synaptonemal complexes and measure their physical length, iii) quantify foci and co-localise them with synaptonemal complexes, iv) measure interference between synaptonemal complex-associated foci. The software has applications that extend to multiple species and to the analysis of other proteins that label meiotic prophase chromosomes. The software converts meiotic immunofluorescence images into R data frames that are compatible with machine learning methods. Given a set of microscopy images of meiotic spread slides, synapsis crops images around individual single cells, counts colocalising foci on strands on a per cell basis, and measures the distance between foci on any given strand.
biocViews:
Software,
Visualization
@@ -27,5 +26,9 @@ LazyData: true
RoxygenNote: 7.1.1
VignetteBuilder: knitr
Suggests:
testthat (>= 3.0.0)
knitr,
rmarkdown,
testthat (>= 3.0.0),
ggplot2,
tidyverse
Config/testthat/edition: 3
---
title: "Data preparation and running synapsis"
author: "Lucy McNeill"
date: "30/06/2021"
output: html_document
date: "`r format(Sys.time(), '%d %B, %Y')`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{setting_up_synapsis_v1}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
```{r setup}
library(EBImage)
```
# Preparing your data
In this notebook we will give a comprehensive guide for preparing image data so that it is compatible with synapsis.
@@ -50,7 +65,6 @@ Synapsis has particular purposes for families of stains/ antibodies (i.e. foci c
## Section A: splitting a folder full of .jpeg, .png or .tiff files using EBImage
The three colour .jpeg files have SYCP3 as red, MLH3 as green and DAPI as blue.
```{r}
library(EBImage)
path = paste0(system.file("extdata",package = "synapsis"))
file_3channel <- paste0(path,"/MLH3rabbit488_SYCP3mouse594_fancm_fvb_x_fancm_bl6_724++_slide01_006-RGB.jpeg")
image_3channel <- readImage(file_3channel)
@@ -71,7 +85,6 @@ And now we can save
```
To do this for a whole folder, where in this example all images end in "-RGB.jpeg", use this loop instead:
```{r, eval = FALSE}
library(EBImage)
# location of the jpeg/png/tiff files to convert into three channel jpeg/png/tiff
path = 'data-folder/from-sharepoint/OneDrive_1_01-06-2021'
file_list <- list.files(path)
......
# Using synapsis
# Setup in RStudio
### installing library from GitLab
To use synapsis, you will need the following packages:
- stats
- EBImage
- graphics
- utils
And to run this tutorial, you will need:
- tidyverse
- ggplot2
- knitr
- rmarkdown
```{r}
#if (!requireNamespace("BiocManager", quietly = TRUE))
#install.packages("BiocManager")
#BiocManager::install("EBImage")
# change to synapsis eventually
```
```{r, eval = FALSE}
devtools::install_git('https://gitlab.svi.edu.au/lmcneill/synapsis')
```
### loading synapsis
```{r}
library(synapsis)
```
### checking documentation
```{r}
??count_foci
```
## Data preparation
Please download test1.zip and test2.zip from the "coding" chat.
Double click to unzip. Drag the folders into the place you'll be working from, and make note of the path.
For example, if I want to put them in a folder I created called "test-data-all" in "imaging", in "svi", in the "Documents" folder, my path looks like:
```{r}
path = "~/Documents/svi/imaging/test-data-all"
```
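Before calling any synapsis function it can help to confirm that `path` points at an existing folder that actually contains image files. The sketch below uses base R only. `list_image_files` is a hypothetical helper written for this tutorial, not a synapsis function, and the demo folder and file names are made up so that the example is self-contained.

```r
# Hypothetical helper (not part of synapsis): list the image files in a folder.
list_image_files <- function(path, extensions = c("jpeg", "jpg", "png", "tif", "tiff")) {
  path <- path.expand(path)  # expand a leading "~" to the home directory
  if (!dir.exists(path)) {
    stop("Folder not found: ", path)
  }
  # Build a pattern such as "\\.(jpeg|jpg|png|tif|tiff)$"
  pattern <- paste0("\\.(", paste(extensions, collapse = "|"), ")$")
  list.files(path, pattern = pattern, ignore.case = TRUE)
}

# Usage with a throwaway folder (made-up file names)
demo_dir <- file.path(tempdir(), "synapsis-path-demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("cell01-RGB.jpeg", "notes.txt"))))
image_files <- list_image_files(demo_dir)
print(image_files)
```

With your own data, `list_image_files(path)` returning an empty vector is a hint that the path or the file extensions need another look.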
## might need these libraries?
```{r}
#library(knitr)
#library(rmarkdown)
#library(EBImage)
#library(ggplot2)
```
# Calling functions on data
## Cropping routine.
The max_cell_area and min_cell_area values are based on our mouse imaging data. For species with larger or smaller nuclei, or for teams using a different magnification, these values would need to change (with some fine-tuning, which is made easier by knitting this notebook).
There is an annotation setting, which we switch to "on". If you haven't cropped the data yet, make sure *```{r}* is at the top of the following chunk.
```{r, eval = FALSE}
auto_crop(path, annotation = "on", max_cell_area = 30000, min_cell_area = 7000)
```
Here we passed path plus several optional arguments. Only path is required, because auto_crop has built-in default values that are used whenever the user does not specify an argument.
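How default arguments behave can be seen with a small stand-in function. `crop_settings` below is hypothetical and its default values are invented for illustration; it is not the synapsis implementation, and the real defaults should be read from the package documentation.

```r
# Hypothetical stand-in, NOT synapsis code: the defaults here are made up.
crop_settings <- function(path, annotation = "off",
                          max_cell_area = 20000, min_cell_area = 7000) {
  # Return the values the function would work with
  list(path = path, annotation = annotation,
       max_cell_area = max_cell_area, min_cell_area = min_cell_area)
}

# Only the required argument: every optional argument takes its default
defaults_used <- crop_settings("my-folder")

# Override two arguments by name; min_cell_area keeps its default
overridden <- crop_settings("my-folder", annotation = "on", max_cell_area = 30000)

str(defaults_used)
str(overridden)

# formals() lists the declared defaults of any R function
print(formals(crop_settings)$max_cell_area)
```

Naming the optional arguments, as the call above does, keeps the call readable and means their order does not matter.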
If a crops folder with three channels per image was successfully generated, put *```{r, eval = FALSE}* at the top of the previous chunk. Now that cell candidates have been cropped, we don't need/want to wait around for that again!
## Getting pachytene
```{r}
SYCP3_stats <- get_pachytene(path,ecc_thresh = 0.8, area_thresh = 0.04)
```
SYCP3_stats is a data frame summarising some features of the cells classified as pachytene.
## Counting foci
```{r}
foci_counts <- count_foci(path,offset_factor = 3, brush_size = 3, brush_sigma = 3, annotate = "on")
```
Make sure every line prior to the above chunk is commented out, because we want to knit with annotate = "on" to check that the synapsis count is close to, or the same as, a manual count.
If it's identifying too many things as foci, try increasing some of the input parameters like offset_factor or brush_size.
If it's not identifying any foci but there are clearly foci there, try decreasing those parameters.
# Statistics
## some basic statistics
```{r}
### comparing groups
counts <- foci_counts$foci_count
hist(as.numeric(counts))
counts_mod <- foci_counts[as.numeric(foci_counts$foci_count) > 0,]
counts_mod <- counts_mod[as.numeric(counts_mod$foci_count) < 40,] # filter the already filtered rows so the > 0 condition is kept
#counts_mod <- counts_mod[as.numeric(counts_mod$percent_on) > 0.55,]
# counts_mod <- counts_mod[as.numeric(counts_mod$sd_foci) <20,]
counts <- counts_mod$foci_count
hist(as.numeric(counts))
counts_KO <- counts_mod[counts_mod$genotype == "Fancm-/-",]
counts_WT <- counts_mod[counts_mod$genotype == "Fancm+/+",]
count_KO <- counts_KO$foci_count
count_WT <- counts_WT$foci_count
mean(as.numeric(count_KO), na.rm= TRUE)
mean(as.numeric(count_WT), na.rm= TRUE)
sd(as.numeric(count_KO), na.rm= TRUE)
sd(as.numeric(count_WT), na.rm= TRUE)
c1 <- rgb(173,216,230,max = 255, alpha = 140, names = "lt.blue")
c4 <- rgb(255,200,50, max = 255, alpha = 120, names = "lt.orange")
A <- hist(as.numeric(count_WT),plot = FALSE)
B <- hist(as.numeric(count_KO), plot = FALSE )
plot(A,ylim = c(0,40), main = "Pachytene", col = c4, xlab = "foci count")
plot(B, col = c1, add = TRUE)
```
## comparison testing
### anova test
```{r}
## anova test
counts_mod$group <- factor(counts_mod$genotype, c("Fancm-/-", "Fancm+/+"))
outfit <- lm(foci_count ~ genotype, data=counts_mod)
outfit
#df.residual(outfit)
#sigma(outfit)
#model.matrix(outfit)
outfit0 <- lm(foci_count ~ 1, data=counts_mod)
anova(outfit0, outfit)
```
### t test
```{r}
t.test(as.numeric(count_KO),as.numeric(count_WT))
```
# Boxplot
```{r}
library(tidyverse)
counts_mod %>%
ggplot(aes(x=genotype, y=as.numeric(foci_count), fill=genotype)) +
geom_boxplot(width=0.5,lwd=1.5) +
geom_jitter(width=0.15)+
labs(subtitle="MLH3 foci counts")
```
The next function takes a long time, so the chunk is not evaluated (eval = FALSE).
```{r, eval = FALSE}
df_dist <- measure_distances(path, annotate = "off")
```
---
title: "Using synapsis"
author: "Lucy McNeill"
date: "`r format(Sys.time(), '%d %B, %Y')`"
output:
html_document:
number_sections: true
toc: yes
toc_float: yes
---
# Setup in RStudio
### installing library from GitLab
To use synapsis, you will need the following packages:
- stats
- EBImage
- graphics
- utils
And to run this tutorial, you will need:
- tidyverse
- ggplot2
- knitr
- rmarkdown
```{r, eval = FALSE}
devtools::install_git('https://gitlab.svi.edu.au/lmcneill/synapsis')
```
### loading synapsis
```{r}
library(synapsis)
library(EBImage)
```
### checking documentation
## Data preparation
First, let's set the path where our images are.
For example, if I want to put them in a folder I created called "test-data-one" in "imaging", in "svi", in the "Documents" folder, my path looks like:
```{r, eval = FALSE}
path = "~/Documents/svi/imaging/test-data-one"
```
Have three-channel jpegs? Add this step. For nd2 files, use the python script.
Synapsis relies on a folder containing two or three channel files for each image. The files for one image share the same filename except for the ending: a common string identifying channel 1 and channel 2 (and optionally channel 3), followed by the file extension.
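The naming convention can be checked with a few lines of base R before running anything expensive. This is a sketch under assumptions: the file names are shortened, made-up examples, and the channel endings "-DAPI", "-SYCP3" and "-MLH3" are taken from the example file names in this tutorial, so adjust the pattern to your own endings.

```r
# Made-up, shortened file names following "<stem>-<channel>.<extension>"
files <- c("slide01_006-DAPI.jpeg", "slide01_006-SYCP3.jpeg", "slide01_006-MLH3.jpeg",
           "slide01_007-SYCP3.jpeg", "slide01_007-MLH3.jpeg")

# Channel ending plus file extension at the end of the name
channel_pattern <- "-(DAPI|SYCP3|MLH3)\\.[A-Za-z0-9]+$"

# Stem: everything before the channel ending
stems <- sub(channel_pattern, "", files)
# Channel: the captured group from the same ending
channels <- sub(paste0(".*", channel_pattern), "\\1", files)

# Which channels are present for each image?
groups <- split(channels, stems)
print(groups)

# Images with fewer than two channels would not be usable
n_channels <- lengths(groups)
print(names(n_channels)[n_channels < 2])
```

Here the first image has three channel files and the second has two, so both satisfy the two-or-three-channel requirement described above.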
Converting 3 channels to one single image... (if you used the python script)
```{r, echo = FALSE}
#setwd(path)
#DAPI_c <- "MLH3rabbit488_SYCP3mouse594_fancm_fvb_x_fancm_bl6_724++_slide01_006-DAPI.jpeg"
#DAPI_c <- readImage(DAPI_c)
#SYCP3_c <- "MLH3rabbit488_SYCP3mouse594_fancm_fvb_x_fancm_bl6_724++_slide01_006-SYCP3.jpeg"
#SYCP3_c <- readImage(SYCP3_c)
#MLH3_c <- "MLH3rabbit488_SYCP3mouse594_fancm_fvb_x_fancm_bl6_724++_slide01_006-MLH3.jpeg"
#MLH3_c <- readImage(MLH3_c)
#all_c <- rgbImage(SYCP3_c,MLH3_c,DAPI_c)
#display(all_c)
#filename_3 <- "MLH3rabbit488_SYCP3mouse594_fancm_fvb_x_fancm_bl6_724++_slide01_006.jpeg"
#print(paste0(path,"/",filename_3))
#writeImage(all_c, paste0(path,"/",filename_3))
```
```{r, eval = FALSE}
path = "~/Documents/svi/imaging/test-data-one"
file <- "MLH3rabbit488_SYCP3mouse594_fancm_fvb_x_fancm_bl6_724++_slide01_006-RGB.jpeg"
image <- paste0(path,"/", file)
image_3channel <- readImage(image)
r = channel(image_3channel,"r")
g = channel(image_3channel,"g")
b = channel(image_3channel,"b")
```
Note that in this example, the axis is shown in red, foci in green, and DAPI in blue. If your three-channel images have these swapped, it's no problem, but we need to make sure the order is communicated to the functions.
For this example, the files are ready to go. When you want to use your own data, you can use a script like the one above to separate each image into two or three channels and save them (if they are jpeg, tif, png etc.), or, if you have nd2 files, you can use the python converter.
# Calling functions on data
## Cropping routine.
You can type
```{r, eval = FALSE}
??auto_crop_fast
```
There is an annotation setting, which we switch to "on". max_cell_area and min_cell_area have been calibrated to our data set, where the subjects are mouse cells and the magnification is kept constant. You could run it on the first five images by setting e.g. test_amount = 5, but for now we will just look at a single image.
```{r}
path = "~/Documents/svi/imaging/test-data-one"
auto_crop_fast(path, annotation = "on", max_cell_area = 30000, min_cell_area = 7000, test_amount = 1)
```
Here we passed path plus several optional arguments. Only path is required, because auto_crop_fast has built-in default values that are used whenever the user does not specify an argument.
A crops folder with three channels per "viable cell" should have been generated inside the folder where these images are kept i.e. in path.
## Getting pachytene
```{r}
SYCP3_stats <- get_pachytene(path,ecc_thresh = 0.8, area_thresh = 0.04, annotation = "on")
foci_counts <- count_foci(path,offset_factor = 3, brush_size = 3, brush_sigma = 3, annotation = "on",stage = "pachytene")
```
SYCP3_stats is a data frame summarising some features of the cells classified as pachytene.
## Distance between foci on SC
```{r}
df_dist <- measure_distances(path, annotation = "on")
```
## Bigger data set
Now we will run auto_crop_fast on the bigger data set (silently, i.e. without annotation) to do some significance testing.
```{r}
path = "~/Documents/svi/imaging/test-data-all"
start_time <- as.numeric(Sys.time()) * 1000 # milliseconds since the epoch; place at start
auto_crop_fast(path, annotation = "off", max_cell_area = 30000, min_cell_area = 7000,third_channel = "on")
end_time <- as.numeric(Sys.time()) * 1000
print(end_time - start_time) # elapsed time in milliseconds
```
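An alternative way to time a step is base R's `system.time()`, which wraps the expression and reports elapsed seconds. In this sketch `Sys.sleep(0.2)` is only a placeholder for the cropping call, so the number printed says nothing about how long synapsis takes.

```r
# Time an expression; Sys.sleep() stands in for the call you want to measure,
# e.g. auto_crop_fast(path, annotation = "off", ...)
timing <- system.time({
  Sys.sleep(0.2)
})

# "elapsed" is wall-clock time in seconds; convert to milliseconds
elapsed_ms <- unname(timing["elapsed"]) * 1000
print(elapsed_ms)
```

Because the expression sits inside the braces, there is no need to remember to place matching start and end lines around it.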
## Counting foci
Now let's count the foci for each genotype.
```{r}
SYCP3_stats <- get_pachytene(path,ecc_thresh = 0.8, area_thresh = 0.04, annotation = "off")
foci_counts <- count_foci(path,offset_factor = 3, brush_size = 3, brush_sigma = 3, annotation = "off",stage = "pachytene")
```
Make sure every line prior to the above chunk is commented out, because we want to knit with annotation = "on" to check that the synapsis count is close to, or the same as, a manual count.
If it's identifying too many things as foci, try increasing some of the input parameters like offset_factor or brush_size.
If it's not identifying any foci but there are clearly foci there, try decreasing those parameters.
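One way to organise this tuning is to write the candidate settings down as a grid and step through them one at a time. The sketch below only builds and prints the grid. The candidate values are arbitrary examples, and the `count_foci` call is left as a comment because it needs synapsis, a valid `path` and cropped images.

```r
# Candidate settings to try (example values only, not recommendations)
param_grid <- expand.grid(
  offset_factor = c(2, 3, 5),
  brush_size    = c(1, 3),
  brush_sigma   = 3
)

for (i in seq_len(nrow(param_grid))) {
  p <- param_grid[i, ]
  # With synapsis loaded and `path` set, each row could be passed on like this:
  # count_foci(path, offset_factor = p$offset_factor, brush_size = p$brush_size,
  #            brush_sigma = p$brush_sigma, annotation = "on", stage = "pachytene")
  message(sprintf("setting %d: offset_factor = %g, brush_size = %g, brush_sigma = %g",
                  i, p$offset_factor, p$brush_size, p$brush_sigma))
}
```

Comparing the annotated output of each setting against a manual count on a handful of cells is usually enough to pick a row before running the full data set.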
# Statistics
## some basic statistics
```{r}
### comparing groups
counts <- foci_counts$foci_count
counts_mod <- foci_counts[as.numeric(foci_counts$foci_count) > 0,]
counts_mod <- counts_mod[as.numeric(counts_mod$foci_count) < 40,] # filter the already filtered rows so the > 0 condition is kept
#counts_mod <- counts_mod[as.numeric(counts_mod$percent_on) > 0.55,]
# counts_mod <- counts_mod[as.numeric(counts_mod$sd_foci) <20,]
counts <- counts_mod$foci_count
counts_KO <- counts_mod[counts_mod$genotype == "Fancm-/-",]
counts_WT <- counts_mod[counts_mod$genotype == "Fancm+/+",]
count_KO <- counts_KO$foci_count
count_WT <- counts_WT$foci_count
mean(as.numeric(count_KO), na.rm= TRUE)
mean(as.numeric(count_WT), na.rm= TRUE)
sd(as.numeric(count_KO), na.rm= TRUE)
sd(as.numeric(count_WT), na.rm= TRUE)
c1 <- rgb(173,216,230,max = 255, alpha = 140, names = "lt.blue")
c4 <- rgb(255,180,50, max = 255, alpha = 120, names = "lt.orange")
A <- hist(as.numeric(count_WT),plot = FALSE)
B <- hist(as.numeric(count_KO), plot = FALSE )
plot(A,ylim = c(0,20), main = "Pachytene", col = c4, xlab = "foci count per cell")
plot(B, col = c1, add = TRUE)
```
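The filtering and per-genotype summaries above can also be written compactly. The following sketch runs on simulated counts so that it is self-contained: the numbers are arbitrary and carry no biological meaning, and storing `foci_count` as text is an assumption made to mirror the `as.numeric()` calls in the chunk above.

```r
set.seed(1)

# Simulated stand-in for foci_counts (arbitrary values, not real data)
toy_counts <- data.frame(
  genotype   = rep(c("Fancm-/-", "Fancm+/+"), each = 10),
  foci_count = as.character(c(rpois(10, 28), rpois(10, 24))),
  stringsAsFactors = FALSE
)

# Convert once, instead of wrapping every later use in as.numeric()
toy_counts$foci_count <- as.numeric(toy_counts$foci_count)

# Apply both conditions in a single step so neither is lost
keep <- toy_counts$foci_count > 0 & toy_counts$foci_count < 40
toy_mod <- toy_counts[keep, ]

# Mean, standard deviation and number of cells per genotype
summary_table <- aggregate(
  foci_count ~ genotype, data = toy_mod,
  FUN = function(x) c(mean = mean(x), sd = sd(x), n = length(x))
)
print(summary_table)
```

Combining the two conditions with `&` avoids the situation where a second filter is accidentally applied to the unfiltered data frame.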
## comparison testing
### anova test
```{r}
## anova test
counts_mod$group <- factor(counts_mod$genotype, c("Fancm-/-", "Fancm+/+"))
outfit <- lm(foci_count ~ genotype, data=counts_mod)
outfit
#df.residual(outfit)
#sigma(outfit)
#model.matrix(outfit)
outfit0 <- lm(foci_count ~ 1, data=counts_mod)
anova(outfit0, outfit)
```
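For two groups, comparing the intercept-only model with the genotype model is the same test as a two-sample t test that assumes equal variances. The sketch below shows this on simulated values (arbitrary numbers, not real counts). Note that `t.test()` defaults to the Welch test, which does not assume equal variances and so can give a slightly different p value.

```r
set.seed(42)

# Simulated two-group data (arbitrary values, for illustration only)
toy <- data.frame(
  genotype   = rep(c("Fancm-/-", "Fancm+/+"), each = 12),
  foci_count = c(rnorm(12, mean = 26, sd = 3), rnorm(12, mean = 24, sd = 3))
)

fit0 <- lm(foci_count ~ 1, data = toy)         # no genotype effect
fit1 <- lm(foci_count ~ genotype, data = toy)  # genotype effect

# p value of the F test comparing the two nested models
p_anova <- anova(fit0, fit1)[["Pr(>F)"]][2]

# Pooled-variance t test: same hypothesis, same p value
p_pooled <- t.test(foci_count ~ genotype, data = toy, var.equal = TRUE)$p.value

# Welch t test (the t.test() default): close, but not identical in general
p_welch <- t.test(foci_count ~ genotype, data = toy)$p.value

print(c(anova = p_anova, pooled_t = p_pooled, welch_t = p_welch))
```

This is why the anova p value and the t test p value reported in this tutorial need not match exactly: the t test chunk uses the Welch default.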
### t test
```{r}
t.test(as.numeric(count_KO),as.numeric(count_WT))
```
Now that we have a p value, we can paste it onto the histogram:
```{r, eval = FALSE}
c1 <- rgb(173,216,230,max = 255, alpha = 140, names = "lt.blue")
c4 <- rgb(255,180,50, max = 255, alpha = 120, names = "lt.orange")
A <- hist(as.numeric(count_WT),plot = FALSE)
B <- hist(as.numeric(count_KO), plot = FALSE )
plot(A,ylim = c(0,20), main = "Pachytene", col = c4, xlab = "foci count per cell")
text(x = 10, y = 15, label = "anova p value = 0.01*", col = "black", cex = 1)
plot(B, col = c1, add = TRUE)
```
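Typing the p value into the label by hand means it can go stale when the data change. A small helper can build the label from a computed value instead. `p_label` is a hypothetical function written for this tutorial, and the 0.05 threshold and the "*" / "(NS)" markers are assumptions that follow the labels used above.

```r
# Hypothetical helper: format a p value as a plot label.
p_label <- function(p, prefix = "p = ", alpha = 0.05) {
  flag <- if (p < alpha) " *" else " (NS)"
  paste0(prefix, format(signif(p, 2)), flag)
}

print(p_label(0.01))  # "p = 0.01 *"
print(p_label(0.9))   # "p = 0.9 (NS)"

# With a test result in hand, the label can be placed on an existing plot, e.g.:
# res <- t.test(as.numeric(count_KO), as.numeric(count_WT))
# text(x = 10, y = 15, label = p_label(res$p.value), col = "black", cex = 1)
```

Passing `prefix = "anova p value = "` reproduces the wording of the label in the chunk above.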
# Boxplot
```{r}
library(tidyverse)
counts_mod %>%
ggplot(aes(x=genotype, y=as.numeric(foci_count), fill=genotype)) +
geom_boxplot(width=0.5,lwd=1.5) +
geom_jitter(width=0.15)+
labs(subtitle="MLH3 foci counts")
```
We won't run this for the notebook yet:
```{r, eval = FALSE}
df_dist <- measure_distances(path, annotation = "off")
print(df_dist)
```
```{r, eval = FALSE}
pass_only <- df_dist[df_dist$pass_fail == "pass",]
distances <- pass_only$fractional_distance
distances_KO <- pass_only[pass_only$genotype == "Fancm-/-",]
distances_WT <- pass_only[pass_only$genotype == "Fancm+/+",]
distance_KO <- distances_KO$fractional_distance
distance_WT <- distances_WT$fractional_distance
mean(as.numeric(distance_KO), na.rm= TRUE)
mean(as.numeric(distance_WT), na.rm= TRUE)
sd(as.numeric(distance_KO), na.rm= TRUE)
sd(as.numeric(distance_WT), na.rm= TRUE)
c1 <- rgb(173,216,230,max = 255, alpha = 140, names = "lt.blue")
c4 <- rgb(255,180,50, max = 255, alpha = 120, names = "lt.orange")
A <- hist(as.numeric(distance_WT),plot = FALSE)
B <- hist(as.numeric(distance_KO), plot = FALSE )
plot(A,ylim = c(0,9),xlim = c(0,1), main = "Pachytene", col = c4, xlab = "foci distance as fraction of total length")
text(x = 0.2, y = 7, label = "anova p value 0.9 (NS)", col = "black", cex = 1)
plot(B, col = c1, add = TRUE)
```
```{r, eval = FALSE}
t.test(as.numeric(distance_KO),as.numeric(distance_WT))
```
```{r, eval = FALSE}
column <- as.data.frame(count_WT)
write.csv(column, "column4.csv",row.names = FALSE)
```
---
title: "Using synapsis"
author: "Lucy McNeill"
date: "`r format(Sys.time(), '%d %B, %Y')`"
output:
html_document:
number_sections: true
toc: yes
toc_float: yes
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{synapsis_tutorial_v1}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, echo = FALSE}
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
```{r setup}
library(synapsis)
library(knitr)
library(rmarkdown)
library(tidyverse)
library(ggplot2)
```
# Getting started
## installing library from gitlab
......
# Using synapsis
```{r}
packages = c("rmarkdown","BiocManager","imager")
package.check <- lapply(
packages,
FUN = function(x) {
if (!require(x, character.only = TRUE)) {
install.packages(x, dependencies = TRUE)
library(x, character.only = TRUE)
}
}
)
library(knitr)
library(rmarkdown)
#BiocManager::install("EBImage")
library(EBImage)
library(ggplot2)
## comment out if you have crops..
#path = "~/Documents/svi/imaging/data-folder/test-folder/all-WT"
#files <- list.files(path)
#setwd("~/Documents/synapsis/synapsis/R")
#source("auto_crop.R")
# call routine to crop images, and display their "cell number"
#crop(files, path, "regular")
#### make sure that these are output in a new folder.
### call the next functions on the new folder.
# call get_pachytene()
#path = "~/Documents/svi/imaging/data-folder/test-folder/all-WT/crops"
#source("~/Documents/synapsis/synapsis/R/get_pachytene.R")
#files <- list.files(path)
#get_pachytene(files,path)
### only keep BW filter
# call count_foci() on the pachytene files only.
#path = "~/Documents/svi/imaging/data-folder/test-folder/all-KO/crops/pachytene"
#source("~/Documents/synapsis/synapsis/R/count_foci.R")
#files <- list.files(path)
#foci_counts <- count_foci(files,path)
#column_2 <- as.data.frame(foci_counts)
#write.csv(column_2, "temp.csv",row.names = FALSE)
# Call for KO folder, call for WT folder.
# call same routine, but excluding those which are cropped badly.
### now that you have all the pachytene crop files, measure the distances.
path = "~/Documents/svi/imaging/data-folder/test-folder/all-WT/some/crops/pachytene"
source("~/Documents/synapsis/synapsis/R/measure_distances_2.R")
files <- list.files(path)
measure_distances_2(files,path)
```
# Using synapsis
```{r}
packages = c("rmarkdown","BiocManager","imager")
package.check <- lapply(
packages,
FUN = function(x) {
if (!require(x, character.only = TRUE)) {
install.packages(x, dependencies = TRUE)
library(x, character.only = TRUE)
}
}
)
library(knitr)
library(rmarkdown)
#BiocManager::install("EBImage")
library(EBImage)
library(ggplot2)
## comment out if you have crops..
path = "~/Documents/svi/imaging/test-data"
source("auto_crop_test.R")
auto_crop_test(path)
source("~/Documents/synapsis/synapsis/R/get_pachytene.R")
SYCP3_stats <- get_pachytene(path)
# take a look at pachytene. Remove any crops that don't cut it.
source("~/Documents/synapsis/synapsis/R/count_foci.R")
#foci_counts <- count_foci(path,offset_factor = 5, brush_size = 1)
## Settings that worked for Wayne's single image
#foci_counts <- count_foci(path,offset_factor = 9, brush_size = 1, brush_sigma = 3)
## settings that used to work for me (but overcount)
#foci_counts <- count_foci(path,offset_factor = 2, brush_size = 3, brush_sigma = 3)
foci_counts <- count_foci(path,offset_factor = 3, brush_size = 3, brush_sigma = 3, annotate = "on")
### comparing groups
counts <- foci_counts$foci_count
hist(as.numeric(counts))
counts_mod <- foci_counts[as.numeric(foci_counts$foci_count) > 0,]
counts_mod <- counts_mod[as.numeric(counts_mod$percent_on) > 0.55,]
counts <- counts_mod$foci_count
hist(as.numeric(counts))
counts_KO <- counts_mod[counts_mod$genotype == "Fancm-/-",]
counts_WT <- counts_mod[counts_mod$genotype == "Fancm+/+",]
count_KO <- counts_KO$foci_count
count_WT <- counts_WT$foci_count
mean(as.numeric(count_KO), na.rm= TRUE)
mean(as.numeric(count_WT), na.rm= TRUE)
sd(as.numeric(count_KO), na.rm= TRUE)
sd(as.numeric(count_WT), na.rm= TRUE)