Commit 9f46136b authored by Jeffrey Pullin

Add RankCorr directory

Initially I wanted to add this via git submodule, but the git repo contains pycache files which change every time RankCorr is run! I have therefore simply downloaded the repo, removed the offending files, and committed the rest. Another option would be to git submodule a fork.

Ultimately, it seems unlikely that RankCorr will be updated soon, so the download-and-commit strategy is probably sufficient.
parent e4bd7a9a
Pipeline #7419 passed with stage in 8 seconds
# RankCorr
A marker selection method for scRNA-seq data based on rank correlation. See the notebook `RankCorr-example.ipynb` for a full walkthrough of how to run the method; an outline is presented below.
The RankCorr method is contained in a highly modified version of the
[PicturedRocks](https://github.com/picturedrocks) data analysis package.
The modified version is included here. See
the PicturedRocks repository for further information and extra (new) tools!
```
from picturedRocks import Rocks
```
Required inputs for the `Rocks` class:
* `X`, an `np.ndarray` of gene counts. Each row should contain the genetic information from a cell; the columns of `X` correspond to the genes (note that this is the transpose of the layout used by some commonly used packages).
* `y`, a vector of cluster labels. These labels must be consecutive integers starting at 0.
```
data = Rocks(X, y)
lamb = 2.0 # the sparsity parameter
markers = data.CSrankMarkers(lamb=lamb)
```
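Cluster labels often arrive as strings or as non-consecutive integers. The following is a minimal sketch of one way to meet the label requirement above, assuming only NumPy: `np.unique` with `return_inverse=True` maps each distinct label to an integer code in `0..k-1`. The helper name `encode_labels` and the toy labels are illustrative and are not part of RankCorr or PicturedRocks.

```python
import numpy as np


def encode_labels(labels):
    """Hypothetical helper: map arbitrary labels to consecutive integers from 0.

    Returns (codes, categories), where categories[codes] recovers the input.
    """
    categories, codes = np.unique(np.asarray(labels), return_inverse=True)
    return codes, categories


# Toy labels (made up for illustration).
labels = ["Ery", "Mk", "Ery", "GMP", "Mk"]
y, categories = encode_labels(labels)
```

The returned `categories` array can be kept alongside `y` to translate the integer codes back to the original cluster names.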
%% Cell type:markdown id: tags:
07 May 2020
# An example: running RankCorr on Paul
%% Cell type:markdown id: tags:
Only needed when editing the packages; you don't need to run this cell.
%% Cell type:code id: tags:
``` python
%load_ext autoreload
%autoreload 2
```
%% Cell type:code id: tags:
``` python
import numpy as np
import pandas as pd
```
%% Cell type:markdown id: tags:
Also load scanpy for easy access to the Paul data set. Check out the scanpy repository at https://github.com/theislab/scanpy
%% Cell type:code id: tags:
``` python
import scanpy.api as sc
sc.settings.verbosity = 3 # verbosity: errors (0), warnings (1), info (2), hints (3)
sc.settings.set_figure_params(dpi=80, color_map='viridis') # low dpi (dots per inch) yields small inline figures
sc.logging.print_versions()
```
%% Output
/home/ahsvargo/miniconda3/envs/bmc/lib/python3.8/site-packages/anndata/_core/anndata.py:21: FutureWarning: pandas.core.index is deprecated and will be removed in a future version. The public classes are available in the top-level namespace.
from pandas.core.index import RangeIndex
/home/ahsvargo/miniconda3/envs/bmc/lib/python3.8/site-packages/scanpy/api/__init__.py:2: FutureWarning:
In a future version of Scanpy, `scanpy.api` will be removed.
Simply use `import scanpy as sc` and `import scanpy.external as sce` instead.
warnings.warn(
scanpy==1.4.6 anndata==0.7.1 umap==0.4.0 numpy==1.18.1 scipy==1.4.1 pandas==1.0.3 scikit-learn==0.22.1 statsmodels==0.11.0 python-igraph==0.8.0
%% Cell type:code id: tags:
``` python
import anndata
```
%% Cell type:markdown id: tags:
## Load the RankCorr methods
%% Cell type:markdown id: tags:
The RankCorr code is currently in a heavily modified version of the PicturedRocks package. See the PicturedRocks repo at https://github.com/umangv/picturedrocks for the original package.
The modified package is included in the code here; the import below needs to load this local version for the remainder of the code to run.
%% Cell type:code id: tags:
``` python
from picturedRocks import Rocks
```
%% Cell type:markdown id: tags:
Required inputs for the `Rocks` class:
* `X`, an `np.ndarray` of gene counts. Each row should contain the genetic information from a cell; the columns of `X` correspond to the genes (note that this is the transpose of the layout used by some commonly used packages).
* `y`, a vector of cluster labels. These labels must be consecutive integers starting at 0.
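
Before constructing a `Rocks` object it can help to check these two requirements explicitly. The following is a sketch only: `check_rocks_inputs` is a hypothetical helper written for this walkthrough, not a function provided by PicturedRocks, and the toy arrays are made up.

```python
import numpy as np


def check_rocks_inputs(X, y):
    """Hypothetical helper: validate the input layout described above."""
    X = np.asarray(X)
    y = np.asarray(y)
    if X.ndim != 2:
        raise ValueError("X must be 2-D: cells in rows, genes in columns")
    if y.ndim != 1 or y.shape[0] != X.shape[0]:
        raise ValueError(
            "y must hold one label per row (cell) of X; "
            "if X is genes x cells, pass X.T instead"
        )
    labels = np.unique(y)
    if not np.array_equal(labels, np.arange(labels.size)):
        raise ValueError("labels must be consecutive integers starting at 0")
    return X, y


# Toy data: 4 cells, 3 genes, 2 clusters.
X_toy = np.array([[5, 0, 1], [4, 1, 0], [0, 7, 2], [1, 6, 3]])
y_toy = np.array([0, 0, 1, 1])
X_ok, y_ok = check_rocks_inputs(X_toy, y_toy)
```

A shape mismatch between `X` and `y` is the usual symptom of a genes-by-cells matrix, which is why the error message suggests transposing.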
%% Cell type:markdown id: tags:
## Load the Paul dataset
This will automatically download the data set if this is your first time running it.
%% Cell type:code id: tags:
``` python
dataset = "paul15"
```
%% Cell type:code id: tags:
``` python
adata = sc.datasets.paul15()
```
%% Output
WARNING: In Scanpy 0.*, this returned logarithmized data. Now it returns non-logarithmized data.
... storing 'paul15_clusters' as categorical
Trying to set attribute `.uns` of view, copying.
%% Cell type:code id: tags:
``` python
adata
```
%% Output
AnnData object with n_obs × n_vars = 2730 × 3451
obs: 'paul15_clusters'
uns: 'iroot'
%% Cell type:markdown id: tags:
Create the required vector of cluster labels based on the strings provided in the AnnData object.
%% Cell type:code id: tags:
``` python
lookup = list(adata.obs['paul15_clusters'].cat.categories)
yVec = np.array([lookup.index( adata.obs['paul15_clusters'][i] ) for i in range(adata.obs['paul15_clusters'].shape[0]) ])
```
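%% Cell type:markdown id: tags:
The same label vector can be obtained without the Python-level loop. This is a minimal sketch, assuming the cluster column is a pandas categorical (as the `.cat.categories` call above implies): `Series.cat.codes` gives each entry's position in `cat.categories`, which is what `lookup.index(...)` computes. The toy series below stands in for `adata.obs['paul15_clusters']`; its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-in for adata.obs['paul15_clusters'], with an explicit category order.
clusters = pd.Series(
    pd.Categorical(
        ["2Ery", "10GMP", "1Ery", "2Ery"],
        categories=["1Ery", "2Ery", "10GMP"],
    )
)

lookup = list(clusters.cat.categories)

# Vectorised version: integer position of each entry within the categories.
y_vec = clusters.cat.codes.to_numpy().astype(int)

# Loop version, mirroring the cell above, for comparison.
y_loop = np.array([lookup.index(label) for label in clusters])
```

The codes are cast with `.astype(int)` because `cat.codes` uses a small integer dtype by default; whether `Rocks` is sensitive to the dtype has not been checked here.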
%% Cell type:markdown id: tags:
Here are the cluster names from the Paul data set. See Paul (2015).
%% Cell type:code id: tags:
``` python
lookup
```
%% Output
['1Ery',
'2Ery',
'3Ery',
'4Ery',
'5Ery',
'6Ery',
'7MEP',
'8Mk',
'9GMP',
'10GMP',
'11DC',
'12Baso',
'13Baso',
'14Mo',
'15Mo',
'16Neu',
'17Neu',
'18Eos',
'19Lymph']
%% Cell type:markdown id: tags:
Create the `Rocks` object as outlined above.
%% Cell type:code id: tags:
``` python
data = Rocks(adata.X, yVec)
# PicturedRocks provides normalization capabilities, though this shouldn't be used for marker selection.
#data.normalize(log=False, totalexpr=10000)
'''
# It is also possible to use the PicturedRocks for fold testing, to match the results from the manuscript.
# This will be discussed more in the future.
ft = FoldTester(data)
folds = np.load("paul15-scviFolds.npz")["folds"]
ft.folds = folds
ft.validatefolds()
ft.makerocks(verbose=0)
'''
```
%% Output
'\n# It is also possible to use the PicturedRocks for fold testing, to match the results from the manuscript. \n# This will be discussed more in the future.\nft = FoldTester(data)\nfolds = np.load("paul15-scviFolds.npz")["folds"]\nft.folds = folds\nft.validatefolds()\n\nft.makerocks(verbose=0)\n'
%% Cell type:markdown id: tags:
## Run RankCorr
%% Cell type:markdown id: tags:
The main RankCorr method is `CSrankMarkers`. In addition to the data provided by the `Rocks` object, it requires one parameter: