Single-cell-Data

KODAMA

Single-cell data

The data set from Tasic et al. encompasses 23,822 cells from adult mouse cortex, split by the authors into 133 clusters with strong hierarchical organisation. A standard preprocessing pipeline consisting of sequencing depth normalisation, feature selection, log-transformation, and reducing the dimensionality to 50 PCs was applied as described by Kobak & Berens in The art of using t-SNE for single-cell transcriptomics.

Tutorial

Download the data from here and unpack. Direct links: VISp, ALM. To get the information about cluster colors and labels (sample_heatmap_plot_data.csv), open the interactive data browser, go to "Sample Heatmaps", click "Build Plot!" and then "Download data as CSV".

ta=read.csv("tasic-sample_heatmap_plot_data.txt")
rownames(ta)=ta[,1]
VIS=read.csv("mouse_VISp_gene_expression_matrices_2018-06-14/mouse_VISp_2018-06-14_exon-matrix.csv")
ALM=read.csv("mouse_ALM_gene_expression_matrices_2018-06-14/mouse_ALM_2018-06-14_exon-matrix.csv")

The intron and exon data are merged and the zeros columns are removed.

data=t(cbind(ALM,VIS))
colnames(data)=as.character(data[1,])
data=data[-1,]
ii=intersect(rownames(data),rownames(ta))
data=data[ii,]
data=data[,colSums(data)!=0]
near.zero.counts=colMeans(data<32)

The data are normalized and the converted to log ratios.

temp=data
temp[temp<=32]=NA
temp=log2(temp)
m=colMeans(temp,na.rm = TRUE)
y=exp(-1.5*(m-6.56))+0.02
data=data[,which(near.zero.counts>y)]
su=rowSums(data)
data=((data/su)*10^6)*median(su)
data=log2(data+1)

The first 50 principal components are calculated.

pca=prcomp(data)$x[,1:50]

The KODAMA algorithm is then applied to the 50 PCA and t-SNE is used to visualize the KODAMA dissimilarity matrix.

kk=KODAMA.matrix(pca)
res_KODAMA_tSNE <- KODAMA.visualization(kk)
plot(res_KODAMA_tSNE,pch=21,bg=ta[,"cluster_color"],main="KODAMA", xlab= "First dimension", ylab = "Second dimension")

The KODAMA clustering are then visualized

KODAMA Visualization