Commit 8480c663 authored by Lucy McNeill's avatar Lucy McNeill
Browse files

add instructions for installing python/launching jupyter notebook for data preparation

parent d2351bce
No preview for this file type
# Code
# External scripts
Save command-line scripts and shared R code here.
This directory contains extra scripts that might be required for data preparation. At the moment, these are all python scripts.
## Getting started with python
Here is a short, straightforward way to install python to launch a jupyter notebook on a mac.
If anything is unclear or doesn't work, please let us know.
For terminal comments, just type everything *after* the $ sign exactly as it appears.
### Step 1: check whether you have python
First, open a terminal and type
```r
$ which python
```
if you get something like
```r
$ which python
/Users/lmcneill/opt/anaconda3/bin/python
```
then you already have python installed. Check for anaconda.
Otherwise, if you got a blank response e.g.:
```r
$ which python
$
```
then you'll need to install python.
### Step 2: installing python with anaconda3
We recommmend installing via anaconda, where the direct download .zip can be found [here](https://www.anaconda.com/products/individual#macos).
Once you have successfully gone through the installation steps in the graphical installer, *close the terminal, open a new one*, and then repeat step 1. You should get a path to where python is now i.e.
```r
$ which python
/Users/lmcneill/opt/anaconda3/bin/python
```
## `nd2converter.ipynb`
at the moment, anaconda installed most of the packages we will need for this. Except for nd2reader. So we return to the terminal and type:
```r
$ conda config --add channels conda-forge
```
After this, we can install packages from the command line. To install the python package `nd2reader`, type:
```r
$ conda install nd2reader
```
### Step 3: Launching jupyter notebook
Once you have installed `nd2converter.ipynb`, put it somewhere that you will remember. Then, in the terminal, navigate to the folder that the file is located. For example, if I put it in a folder called jupyter-notebooks in
```r
'/Users/lmcneill/Documents/svi/jupyter-notebooks'
```
then I would open a terminal and type
```r
$ cd Documents/svi/jupyter-notebooks
```
to check that the notebook is there, type
```r
$ ls
```
and it should return all the files you have in there, including our notebook
```r
$ ls
nd2converter.ipynb
```
OK, we are ready to launch jupyter notebook. Type:
```r
$ jupyter notebook
```
A window should open in your browser. Click on `nd2converter.ipynb`.
When you have changed the path to the images you want to analyse i.e. change
```r
parent_dir = '/Users/lmcneill/Documents/svi/imaging/data-folder/from-sharepoint/OneDrive_1_01-06-2021'
```
in the python script, click inside the cell and press Shift+Enter. The notebook should run for a few mins. You can look in parent_dir in Finder to watch the .nd2s get converted.
## Navigating the terminal
By the way, if you're new to using the mac terminal, here are some commands that will help you.
To list all files and directories in your current location, type
```r
$ ls
```
to go into a folder (or complete path) in your current directory (i.e. appears in ls)
```r
$ cd Documents
```
to go to the previous folder
```r
$ cd ..
```
to go home
```r
$ cd
```
to make a folder
```r
$ mkdir
```
to make a file
```r
$ touch test.md
```
to remove a file (a bit dangerous)
```r
$ rm test.md
```
to print working directory (helpful for getting the full path to use `parent_dir` in `nd2converter.ipynb`)
```r
$ pwd
```
#' auto_crop
#'
#' crop an image around each viable cell candidate.
#'
#' @import EBImage
#' @export
#' @param file_list The file list
#' @param img_path The path
#' @return cropped SC and foci channels around single cells, regardless of stage
auto_crop <- function(file_list, img_path, crop_method = "regular")
{
# input :
# output : a bunch of output jpegs? Or save them all?
#BiocManager::install("EBImage")
library(EBImage)
cell_count <- 0
image_count <-0
pair <- 0
## for each image that is *-dna.jpeg,
for (file in file_list){
setwd(img_path)
if(grepl("*dna.jpeg$", file)){
file_dna = file
image_count <- image_count +1
image <- readImage(file_dna)
img_orig <- channel(2*image, "grey")
pair <- 0
}
if(grepl("*foci.jpeg$", file)){
file_foci = file
image <- readImage(file_foci)
img_orig_foci <- channel(image, "gray")
# call functions: get
pair <- 1
}
if(pair ==1){
#### function: blur the image
## call it on img_orig, optional offset
blob_th <- get_blobs(img_orig)
blob_label = bwlabel(blob_th)
blob_label <- channel(blob_label, "gray")
candidate <- bwlabel(blob_label)
## function: remove things that aren't cells
retained <- keep_cells(candidate)
### crop over each cell
## function: crop around every single viable cell
### crop foci channel here
## Loop over all objects in this final "removed" image
x_final <- computeFeatures.shape(retained)
x_final <- data.frame(x_final)
OOI_final <- nrow(x_final)
counter_final <- 0
# looping through each object to crop
while(counter_final<OOI_final){
counter_final <- counter_final+1
cell_count <- cell_count +1
crop_single_object(retained,OOI_final,counter_final,img_orig,img_orig_foci,file_dna,cell_count)
print("cell count for above crop is")
print(cell_count)
}
}
pair <- 0
}
print("out of")
print(image_count)
print("we got")
print(cell_count)
print("viable cells")
}
### add all the functions used here
#' get_blobs
#'
#' Makes mask of all objects bright enough
#'
#' @import EBImage
#' @export
#' @param img_orig Original image
#' @return Mask with cell candidates
#################################### new function ####################################
get_blobs <- function(img_orig, crop_method = "regular"){
# input:
# output:
## subfunction: get signals and make BW mask
### default offset
thresh <- 10*img_orig
# subfunction: big blur to blobs
img_tmp_dna <- img_orig
img_tmp <- thresh
w = makeBrush(size = 51, shape = 'gaussian', sigma = 15)
img_flo = filter2(img_tmp, w)
## default amplification
bg <- mean(10*img_tmp)
offset = 0.2
if(crop_method == "regular"){
blob_th = 10*img_flo > bg + offset
}
if(crop_method == "watershed"){
blob_th = 10*img_flo > bg + offset
}
return(blob_th)
}
#################################### new function ####################################
#' keep_cells
#'
#' Deletes objects in mask which are too small, large, oblong i.e. unlikely to be a cell
#'
#' @import EBImage
#' @export
#' @param candidate Mask of individual cell candidates
#' @return Mask of cell candidates which meet size criteria
keep_cells <- function(candidate){
# input:
# output:
# delete everything that's too small
colorimg<- colorLabels(candidate, normalize = TRUE)
x <- computeFeatures.shape(candidate)
x <- data.frame(x)
OOI <- nrow(x)
counter <- 0
retained <- candidate
while(counter<OOI){
counter <- counter+1
pixel_area = x$s.area[counter]
semi_maj <- x$s.radius.max[counter]
semi_min <- x$s.radius.min[counter]
# if statement checking if it's the wrong area
if(pixel_area> 20000 | pixel_area < 5000){
retained <- as.numeric(retained)*rmObjects(candidate, counter, reenumerate = TRUE)
}
## if statement checking that it's not too long i.e. not at edge.
if(semi_maj/semi_min > 2 & is.na(semi_maj/semi_min)==FALSE){
retained <- as.numeric(retained)*rmObjects(candidate, counter, reenumerate = TRUE)
}
}
retained <- bwlabel(retained)
return(retained)
}
#################################### new function ####################################
#' crop_single_object
#'
#' Creates mask for every individual cell candidate in mask
#'
#' @import EBImage
#' @export
#' @param retained Mask of cell candidates which meet size criteria
#' @return Crops aroudn all candidates in both channels
#'
crop_single_object <- function(retained, OOI_final,counter_final,img_orig,img_orig_foci,file,cell_count){
tmp_img <- retained
## have a single object
### delete all other objects
counter_single <- 0
# looping over all other objects to crop
while(counter_single < OOI_final){
counter_single <- counter_single + 1
# iteratively remove all other objects
if(counter_single != counter_final){
tmp_img <- as.numeric(tmp_img)*rmObjects(bwlabel(retained), counter_single, reenumerate = TRUE)
}
display(tmp_img)
}
## function: remove noise
noise_gone <- bwlabel(tmp_img)*as.matrix(img_orig)
noise_gone_foci <- bwlabel(tmp_img)*as.matrix(img_orig_foci)
## first get the row and column list that has a one in it.
row_list <- c()
col_list <- c()
# I think this is quick enough for now.. takes less than 10s...
xx = data.frame(as.numeric(tmp_img))
xx <- data.frame(bwlabel(tmp_img))
xm2 = t(as.matrix(xx))
i <- 0
my_matrix <- xm2
### now loop over matrix
for(row in 1:nrow(my_matrix)) {
for(col in 1:ncol(my_matrix)) {
if(my_matrix[row, col]==1){
row_list[i] <- row
col_list[i] <- col
i <- i+1
}
}
}
if (length(col_list>1)){
cy <- mean(row_list)
cx <- mean(col_list)
x_maj <- max(col_list)
x_min <- min(col_list)
y_maj <- max(row_list)
y_min <- min(row_list)
## radius
## total number of pixels given by list length. Find radius of area.
## crops to a square for the moment. can change this.
max_r <- max(x_maj-cx,y_maj-cy)
crop_r <- floor(max_r)
# might want to do this as a matrix
top_left_x <- floor(cx-crop_r)
top_left_y <- floor(cy-crop_r)
bottom_left_x <- floor(cx-crop_r)
bottom_left_y <- floor(cy+crop_r)
bottom_right_x <- floor(cx+crop_r)
bottom_right_y <- floor(cy+crop_r)
top_right_x <- floor(cx-crop_r)
top_right_y <- floor(cy+crop_r)
## crop image
ix <- bottom_left_x:bottom_right_x
iy <- top_left_y:bottom_left_y
# determine the dimensions, 2 in this case
## cropping part
tryCatch({
new_img <- noise_gone[ix, iy]
## want all images to have the same mean to 0.1
orig_mean <- mean(new_img)
mean_factor <- 0.08/orig_mean
new_img <- new_img*mean_factor
filename_crop = paste0("./crops/", file,"-crop",cell_count,"-dna.jpeg")
writeImage(new_img, filename_crop)
new_img_foci <- noise_gone_foci[ix, iy]
filename_crop_foci = paste0("./crops/", file,"-crop",cell_count,"-foci.jpeg")
writeImage(new_img_foci, filename_crop_foci)
#### strand related stuff here
},
error = function(e) {
#what should be done in case of exception?
str(e) # #prints structure of exception
print("couldn't crop it")
}
)
}
}
#################################### new function ####################################
#' count_foci
#'
#' Count coincident foci in cropped SC and foci channel per cell
#'
#' @import EBImage
#' @export
#' @param file_list list of files
#' @return foci count per cell
count_foci <- function(file_list, img_path)
{
# input :
# output : a bunch of output jpegs? Or save them all?
#BiocManager::install("EBImage")
library(EBImage)
cell_count <- 0
image_count <-0
pair <- 0
foci_counts <- 0
## for each image that is *-dna.jpeg,
for (file in file_list){
setwd(img_path)
if(grepl("*dna.jpeg$", file)){
file_dna = file
image_count <- image_count +1
image <- readImage(file_dna)
img_orig <- channel(2*image, "grey")
pair <- 0
}
if(grepl("*foci.jpeg$", file)){
file_foci = file
image <- readImage(file_foci)
img_orig_foci <- channel(image, "gray")
# call functions: get
pair <- 1
}
if(pair ==1){
new_img<-img_orig
display(new_img)
#### now see which have the right amount of strands
disc = makeBrush(21, "disc")
disc = disc / sum(disc)
localBackground = filter2(new_img, disc)
offset = 0.2
thresh_crop = (new_img - localBackground > offset)
strands <- bwlabel(thresh_crop)
display(strands)
color_img_strands<- colorLabels(strands, normalize = TRUE)
num_strands <- computeFeatures.shape(strands)
num_strands <- data.frame(num_strands)
foci_mask_crop <- img_orig_foci
bg <- mean(img_orig_foci)
orig_mean <- mean(img_orig_foci)
mean_factor <- 0.01/orig_mean
img_orig_foci <- img_orig_foci*mean_factor
display(img_orig_foci)
#### normalise the foci image
if(bg < 1 && bg > 0){
offset = 2*bg
foci_th = foci_mask_crop > bg + offset
### smooth it
### maybe up the contrast first??
img_tmp_contrast = foci_mask_crop
print("cell counter is")
print(cell_count)
#display(foci_mask_crop)
w = makeBrush(size = 3, shape = 'gaussian', sigma = 3)
img_flo = filter2(img_tmp_contrast, w)
## only choose objects above bright pixel value
#foci_th = foci_mask_crop > 0.2
#foci_th = img_flo > 0.2
display(img_flo)
## smooth foci channel
foci_th = img_flo > bg + offset
#foci_th = img_flo > 0.05
display(foci_th)
foci_label = bwlabel(foci_th)
foci_label <- channel(foci_label, "grey")
display(colorLabels(strands))
num_strands <- computeFeatures.shape(strands)
num_strands <- data.frame(num_strands)
##### print properties of the images
### multiply strands by foci_label
display(rgbImage(strands,foci_label,0*foci_label))
coincident_foci <- bwlabel(foci_label*strands)
display(colorLabels(coincident_foci))
overlap_no = table(coincident_foci)
foci_per_cell <- length(overlap_no)
print(foci_per_cell)
#print(file)
image_mat <- as.matrix(foci_mask_crop)
image_mat <- image_mat[image_mat > 1e-06]
hist(image_mat)
mean_ratio <- median(image_mat)/mean(image_mat)
skew <- (median(image_mat)-mean(image_mat))/sd(image_mat)
### look at properties of the foci.
foci_candidates <- computeFeatures.shape(foci_label)
foci_candidates <- data.frame(foci_candidates)
foci_areas <- foci_candidates$s.area
if (sd(foci_areas)<20 && foci_per_cell >0){
foci_counts <- append(foci_counts,foci_per_cell)
}
}
###
}
}
hist(foci_counts, breaks =7 )
print(mean(foci_counts))
print(median(foci_counts))
print(sd(foci_counts))
return(foci_counts)
}
#' get_pachytene
#'
#' Identifies crops in pachytene
#'
#' @import EBImage
#' @param file_list The file list
#' @param img_path The path
#' @return Pairs of foci and SC channel crops for pachytene
get_pachytene <- function(file_list, img_path)
{
# input :
# output : a bunch of output jpegs? Or save them all?
#BiocManager::install("EBImage")
library(EBImage)
cell_count <- 0
image_count <-0
pair <- 0
pachytene_count <- 0
## for each image that is *-dna.jpeg,
for (file in file_list){
setwd(img_path)