A Very Detailed Guide to Heatmaps in R: The ComplexHeatmap Series, Part 1
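Heatmaps are among the most common plot types, and many R packages can draw them, pheatmap for example. ComplexHeatmap, however, is arguably the most full-featured of them, with a much broader feature set than comparable packages.
In day-to-day work I found that pheatmap was gradually falling short of my needs, which is what led me to study ComplexHeatmap.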
This series is long and will be published across several posts. You are welcome to follow my WeChat official account: 医学和生信笔记.
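This series is my set of study notes on the ComplexHeatmap package. Some passages have been adapted according to my own understanding, but the overall meaning of the original is unchanged. Where anything is unclear, the original takes precedence: https://jokergoo.github.io/ComplexHeatmap-reference/book/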
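Chapter 1 Introduction
Complex heatmaps can show associations within one data set or between different data sets, and can reveal underlying patterns. The ComplexHeatmap package provides flexible heatmap layouts and highly customizable annotation graphics.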
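1.1 Design philosophy
A complete heatmap consists of the heatmap body and the heatmap components. The heatmap body can be split by rows and by columns. The heatmap components are the row/column titles, dendrograms, row/column names, and row/column annotations.
A heatmap list is made up of several heatmap bodies and heatmap annotations. These are arranged in a consistent order so that they remain directly comparable with one another.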
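The ComplexHeatmap package is object-oriented. Its main classes are:
Heatmap class: a single heatmap, including the heatmap body, row/column names, titles, dendrograms, and row/column annotations.
HeatmapList class: a list of heatmap bodies and heatmap annotations.
HeatmapAnnotation class: defines a set of row/column annotations; these can serve as heatmap components or stand alone, independent of a heatmap.
There are also some other classes:
SingleAnnotation class: defines a single row/column annotation; it is contained in the HeatmapAnnotation class.
ColorMapping class: maps values to colors, both for the heatmap body and for the annotations.
AnnotationFunction class: creates user-defined annotations.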
ComplexHeatmap is built on the grid graphics system, so getting the most out of the package requires some familiarity with grid.
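As a rough illustration of the object-oriented design described above, the sketch below builds one object of each main class and inspects its class. The small matrix `m` and the annotation name `group` are invented for this example, and the code assumes ComplexHeatmap is installed.

```r
library(ComplexHeatmap)

# Toy data, invented for this illustration only
set.seed(1)
m <- matrix(rnorm(20), nrow = 4,
            dimnames = list(paste0("r", 1:4), paste0("c", 1:5)))

# A single heatmap is a Heatmap object
ht1 <- Heatmap(m, name = "a")
ht2 <- Heatmap(m, name = "b")

# A column annotation: one value per column of m
ha <- HeatmapAnnotation(group = rep(c("x", "y"), length.out = ncol(m)))

# Concatenating heatmaps with `+` yields a HeatmapList
ht_list <- ht1 + ht2

class(ht1)
class(ha)
class(ht_list)
```

Because each piece is an ordinary R object, it can be stored, inspected, and combined before anything is drawn.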
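1.2 A quick tour of the chapters
1. Introduction
The design philosophy of ComplexHeatmap and a brief overview.
2. A single heatmap
The components of a single heatmap.
3. Heatmap annotations
The concept of heatmap annotations, how to draw simple and complex annotations, and how the two differ.
4. A list of heatmaps
How to draw multiple heatmaps and annotations, and how they are arranged.
5. Legends
How to draw legends for the heatmap body and the annotations, and how to customize legends.
6. Heatmap decoration
How to add user-defined graphics.
7. OncoPrint (waterfall plot)
8. UpSet plot
9. Other high-level plots
10. Integration with other R packages
11. Interactive heatmaps
12. More examples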
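Chapter 2 A single heatmap
A single heatmap is the most common form of this visualization. Although the highlight of ComplexHeatmap is its ability to draw several heatmaps together, the single heatmap is the basic building block, so drawing it well matters too.
First, generate a random matrix: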
set.seed(123)
nr1 = 4; nr2 = 8; nr3 = 6; nr = nr1 + nr2 + nr3
nc1 = 6; nc2 = 8; nc3 = 10; nc = nc1 + nc2 + nc3
mat = cbind(rbind(matrix(rnorm(nr1*nc1, mean = 1, sd = 0.5), nr = nr1),
                  matrix(rnorm(nr2*nc1, mean = 0, sd = 0.5), nr = nr2),
                  matrix(rnorm(nr3*nc1, mean = 0, sd = 0.5), nr = nr3)),
            rbind(matrix(rnorm(nr1*nc2, mean = 0, sd = 0.5), nr = nr1),
                  matrix(rnorm(nr2*nc2, mean = 1, sd = 0.5), nr = nr2),
                  matrix(rnorm(nr3*nc2, mean = 0, sd = 0.5), nr = nr3)),
            rbind(matrix(rnorm(nr1*nc3, mean = 0.5, sd = 0.5), nr = nr1),
                  matrix(rnorm(nr2*nc3, mean = 0.5, sd = 0.5), nr = nr2),
                  matrix(rnorm(nr3*nc3, mean = 1, sd = 0.5), nr = nr3))
)
mat = mat[sample(nr, nr), sample(nc, nc)] # random shuffle rows and columns
rownames(mat) = paste0("row", seq_len(nr))
colnames(mat) = paste0("column", seq_len(nc))
dim(mat)
## [1] 18 24
Heatmap() is the basic function for drawing a heatmap. With default settings it draws the heatmap body, row names, column names, dendrograms, and a legend; annotations are drawn when they are supplied. The default color scheme is blue-white-red. Note the capital H: heatmap() with a lowercase h is the base R function, which uses a different, yellow-to-red default palette.
library(ComplexHeatmap)
## Loading required package: grid
## ========================================
## ComplexHeatmap version 2.8.0
## Bioconductor page: http://bioconductor.org/packages/ComplexHeatmap/
## Github page: https://github.com/jokergoo/ComplexHeatmap
## Documentation: http://jokergoo.github.io/ComplexHeatmap-reference
##
## If you use it in published research, please cite:
## Gu, Z. Complex heatmaps reveal patterns and correlations in multidimensional
## genomic data. Bioinformatics 2016.
##
## The new InteractiveComplexHeatmap package can directly export static
## complex heatmaps into an interactive Shiny app with zero effort. Have a try!
##
## This message can be suppressed by:
## suppressPackageStartupMessages(library(ComplexHeatmap))
## ========================================
Heatmap(mat)
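2.1 Colors
In a heatmap, color is the primary representation of the data matrix. In most cases a heatmap visualizes a matrix of continuous numeric values, and the user should then provide a color mapping function. A color mapping function accepts a numeric vector and returns the corresponding vector of colors. Users should always use circlize::colorRamp2() to generate the color mapping function passed to Heatmap(). The two arguments of colorRamp2() are a vector of break values and a vector of the corresponding colors. colorRamp2() interpolates colors linearly within each interval, in the LAB color space. Using colorRamp2() also helps produce a legend with sensible tick marks.
In the following example, values between -2 and 2 are linearly interpolated to obtain their colors; all values greater than 2 are mapped to red and all values less than -2 are mapped to green.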
library(circlize)
## ========================================
## circlize version 0.4.13
## CRAN page: https://cran.r-project.org/package=circlize
## Github page: https://github.com/jokergoo/circlize
## Documentation: https://jokergoo.github.io/circlize_book/book/
##
## If you use it in published research, please cite:
## Gu, Z. circlize implements and enhances circular visualization
## in R. Bioinformatics 2014.
##
## This message can be suppressed by:
## suppressPackageStartupMessages(library(circlize))
## ========================================
col_fun = colorRamp2(c(-2, 0, 2), c("green", "white", "red"))
col_fun(seq(-3, 3))
## [1] "#00FF00FF" "#00FF00FF" "#B1FF9AFF" "#FFFFFFFF" "#FF9E81FF" "#FF0000FF"
## [7] "#FF0000FF"
Heatmap(mat, name = "mat", col = col_fun)
Using colorRamp2() gives precise control over the range of the color mapping, so the mapping is not affected by extreme values.
mat2 = mat
mat2[1, 1] = 100000
Heatmap(mat2, name = "mat", col = col_fun,
column_title = "a matrix with outliers")
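In addition, colorRamp2() makes colors comparable across several heatmaps. As shown below, in all three heatmaps the same color always corresponds to the same value: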
p1 <- Heatmap(mat, name = "mat", col = col_fun, column_title = "mat")
p2 <- Heatmap(mat/4, name = "mat/4", col = col_fun, column_title = "mat/4")
p3 <- Heatmap(abs(mat), name = "abs(mat)", col = col_fun, column_title = "abs(mat)")
p1 + p2 + p3
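If the matrix is continuous, you can also simply provide a vector of colors, and the colors will be linearly interpolated. This approach is not robust to outliers, however, because the mapping always starts at the minimum value in the matrix and ends at the maximum.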
Heatmap(mat, name = "mat", col = rev(rainbow(10)), column_title = "set a color vector for continuous matrix")
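The sensitivity to outliers can be seen directly on the color mapping itself. The sketch below is not ComplexHeatmap's internal code; it assumes that a bare color vector behaves roughly like a colorRamp2() mapping whose breaks run evenly from the minimum to the maximum of the data, and compares that with user-chosen fixed breaks. The toy vector `x` is invented for the example.

```r
library(circlize)

x <- c(-1, -0.5, 0, 0.5, 1)   # typical values
x_out <- c(x, 100000)         # the same values plus one extreme outlier

cols <- c("blue", "white", "red")

# Breaks spread evenly over the observed range (assumed analogue of
# passing a bare color vector to Heatmap())
range_fun <- colorRamp2(seq(min(x_out), max(x_out), length.out = 3), cols)

# Fixed breaks chosen by the user, independent of the data range
fixed_fun <- colorRamp2(c(-1, 0, 1), cols)

range_fun(x)   # the typical values collapse to (nearly) one color
fixed_fun(x)   # the typical values still span blue to red
```

With the range-based breaks, a single outlier stretches the scale so far that the rest of the data becomes visually indistinguishable, which is the practical reason to prefer explicit breaks.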
NA values can be visualized as well; use na_col = "xxx" to set the color for NA:
mat_with_na = mat
na_index = sample(c(TRUE, FALSE), nrow(mat)*ncol(mat), replace = TRUE, prob = c(1, 9))
mat_with_na[na_index] = NA
Heatmap(mat_with_na, name = "mat with na", na_col = "black", column_title = "a matrix with na")
Changing the color space in which colorRamp2() performs its linear interpolation:
f1 = colorRamp2(seq(min(mat), max(mat), length = 3), c("blue", "#EEEEEE", "red"))
f2 = colorRamp2(seq(min(mat), max(mat), length = 3), c("blue", "#EEEEEE", "red"), space = "RGB")
p1 <- Heatmap(mat, name = "mat1", col = f1, column_title = "color space in LAB")
p2 <- Heatmap(mat, name = "mat2", col = f2, column_title = "color space in RGB")
p1 + p2
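To see the effect of the color space without drawing anything, the two mapping functions can be evaluated at a few values. This is a small sketch with made-up breaks (-2, 0, 2); only the `space` argument differs between the two functions.

```r
library(circlize)

breaks <- c(-2, 0, 2)
cols <- c("blue", "#EEEEEE", "red")

f_lab <- colorRamp2(breaks, cols)                  # default color space is LAB
f_rgb <- colorRamp2(breaks, cols, space = "RGB")   # interpolate in RGB instead

# Colors halfway between two breaks depend on the color space
f_lab(c(-1, 1))
f_rgb(c(-1, 1))
```

The colors at the break values are defined by the user in both cases; the choice of space only changes how the in-between values are blended.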
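The style of the heatmap border is controlled by the border_gp argument, and the style of each cell is controlled by the rect_gp argument; both take a gpar() object.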
Heatmap(mat, name = "mat1", border_gp = gpar(lty = 2, col = "red"), column_title = "set heatmap border")
Heatmap(mat, name = "mat2", column_title = "set cell border", rect_gp = gpar(col = "white", lty = 1, lwd = 2))
If type = "none" is set inside rect_gp, nothing is drawn in the heatmap body. The body can then be customized through cell_fun and layer_fun, which are covered later.
Heatmap(mat, name = "lalala", rect_gp = gpar(type="none"), column_title = "no heatmap body")
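As a brief preview of what an empty heatmap body makes possible, here is a minimal sketch in which cell_fun draws a circle in every cell instead of the default rectangle. It is modeled on the usage shown in the ComplexHeatmap reference, where cell_fun receives j, i, x, y, width, height, and fill; the matrix `small_mat` and the circle radius are arbitrary choices for this example.

```r
library(ComplexHeatmap)   # attaches grid as well
library(circlize)

# Toy data, invented for this illustration only
set.seed(1)
small_mat <- matrix(rnorm(36), nrow = 6)
col_fun <- colorRamp2(c(-2, 0, 2), c("green", "white", "red"))

ht <- Heatmap(small_mat, name = "mat", col = col_fun,
    rect_gp = gpar(type = "none"),   # suppress the default cell rectangles
    cell_fun = function(j, i, x, y, width, height, fill) {
        # draw a circle in each cell, filled with the mapped color
        grid.circle(x = x, y = y,
            r = min(unit.c(width, height)) * 0.4,
            gp = gpar(fill = fill, col = NA))
    })
draw(ht)
```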
2.2 Row and column titles
Add a row title and a column title:
Heatmap(mat, name = "color", column_title = "i am column title", row_title = "i am row title")
Change the position of the titles:
Heatmap(mat, name = "color", column_title = "i am column title", row_title = "i am row title", column_title_side = "bottom", row_title_side = "right")
Rotate the row/column titles:
Heatmap(mat, name = "color", column_title = "i am title", column_title_rot = 90, row_title = "i am row title", row_title_rot = 0)
Change the style of the row/column titles:
Heatmap(mat, name = "color", column_title = "i am column title", row_title = "i am row title",
column_title_gp = gpar(fontsize = 20, fontface = "bold"),
row_title_gp = gpar(col = "steelblue", fontsize = 16, fill = "red", border = "green")
)
Use a mathematical expression as the title:
Heatmap(mat, name = "mat",
column_title = expression(hat(beta) == (X^t * X)^{-1} * X^t * y))
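2.3 Clustering
Clustering can be customized in many ways.
Turn clustering off:
p1 <- Heatmap(mat)
p2 <- Heatmap(mat, cluster_rows = FALSE, cluster_columns = FALSE)
p1 + p2
Cluster but hide a dendrogram (here the column dendrogram is hidden while the row dendrogram is kept):
Heatmap(mat, show_row_dend = TRUE, show_column_dend = FALSE)
Change the position of the dendrograms:
Heatmap(mat, row_dend_side = "right", column_dend_side = "bottom")
Change the height and width of the dendrograms: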
Heatmap(mat, row_dend_width = unit(4, "cm"), column_dend_height = unit(3, "cm"))
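2.3.1 Distance methods
The distance can be given as a predefined method name, either one accepted by dist() (such as "euclidean") or one of "pearson", "spearman", and "kendall", or as a user-defined function that computes the distance.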
p1 <- Heatmap(mat, name = "mat1", clustering_distance_rows = "pearson",
column_title = "pre-defined distance method(1-pearson)")
p2 <- Heatmap(mat, name = "mat2", clustering_distance_rows = function(m) dist(m),
column_title = "a function that calculates distance matrix")
p1 + p2
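To make the "1 - pearson" label in the example above concrete, the following sketch writes a correlation-based distance by hand in base R and clusters with it. The function name `pearson_dist` and the toy matrix `m` are invented for illustration; this is one plausible way to express such a distance, not ComplexHeatmap's own implementation.

```r
# Toy data, invented for this illustration only
set.seed(1)
m <- matrix(rnorm(40), nrow = 8, dimnames = list(paste0("row", 1:8), NULL))

# Correlation-based distance between rows: 1 - Pearson correlation.
# cor() works on columns, so the matrix is transposed first.
pearson_dist <- function(m) as.dist(1 - cor(t(m), method = "pearson"))

d <- pearson_dist(m)
hc <- hclust(d)
hc$labels[hc$order]   # row order implied by this distance

# The same function could be passed to Heatmap(), for example:
# Heatmap(m, clustering_distance_rows = pearson_dist)
```

Any function that takes the matrix and returns a dist object can be used in the same way.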
2.3.2 Clustering methods
The methods provided by the hclust() function are supported:
Heatmap(mat, name = "mat", clustering_method_rows = "single")
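The method names are those of hclust() ("ward.D", "ward.D2", "single", "complete", "average", "mcquitty", "median", "centroid"). The sketch below shows two ways to request single linkage: by name, or by clustering beforehand and passing the hclust object through cluster_rows. The toy matrix `m` is invented for the example. The two heatmaps share the same tree, although the leaf order may differ because, as described in section 2.3.4, dendrogram reordering is turned off when a clustering object is supplied.

```r
library(ComplexHeatmap)

# Toy data, invented for this illustration only
set.seed(1)
m <- matrix(rnorm(60), nrow = 10, dimnames = list(paste0("row", 1:10), NULL))

# Option 1: let Heatmap() cluster the rows with single linkage
ht1 <- Heatmap(m, name = "m1", clustering_method_rows = "single")

# Option 2: cluster beforehand and hand over the hclust object
hc <- hclust(dist(m), method = "single")
ht2 <- Heatmap(m, name = "m2", cluster_rows = hc)

draw(ht1)
draw(ht2)
```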
2.3.3 Customizing dendrogram colors
The dendextend package can be used to customize the colors of a dendrogram, as follows:
library(dendextend)
##
## ---------------------
## Welcome to dendextend version 1.15.1
## Type citation('dendextend') for how to cite the package.
##
## Type browseVignettes(package = 'dendextend') for the package vignette.
## The github page is: https://github.com/talgalili/dendextend/
##
## Suggestions and bug-reports can be submitted at: https://github.com/talgalili/dendextend/issues
## Or contact: <tal.galili@gmail.com>
##
## To suppress this message use: suppressPackageStartupMessages(library(dendextend))
## ---------------------
##
## Attaching package: 'dendextend'
## The following object is masked from 'package:stats':
##
## cutree
row_dend = as.dendrogram(hclust(dist(mat)))
row_dend = color_branches(row_dend, k = 2) # `color_branches()` returns a dendrogram object
Heatmap(mat, name = "mat", cluster_rows = row_dend)
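The row_dend_gp and column_dend_gp arguments control the dendrogram style globally. When set, they override the graphic settings stored in the dendrogram object itself (row_dend in this example):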
Heatmap(mat, name = "mat", cluster_rows = row_dend, row_dend_gp = gpar(col = "red"))
From version 2.5.6 onward, different shapes can be drawn on the dendrogram nodes by setting a suitable nodePar attribute:
row_dend = dendrapply(row_dend, function(d) {
    attr(d, "nodePar") = list(cex = 0.8, pch = sample(20, 1), col = rand_color(1))
    return(d)
})
Heatmap(mat, name = "mat", cluster_rows = row_dend, row_dend_width = unit(2, "cm"))
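2.3.4 Reordering dendrograms
In Heatmap(), dendrograms are reordered so that rows/columns with larger differences are placed further apart (see the documentation of reorder.dendrogram()). The difference (also called the weight) is measured by the row means for a row dendrogram and by the column means for a column dendrogram. When set to logical values, row_dend_reorder and column_dend_reorder control whether reordering is applied. When set to numeric vectors, they also supply the weights for the reordering (passed to the wts argument of reorder.dendrogram()). Reordering can be turned off with row_dend_reorder = FALSE.
By default, the dendrogram is reordered if cluster_rows/cluster_columns is set to a logical value or a clustering function. If cluster_rows/cluster_columns is set to a clustering object, reordering is turned off.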
m2 = matrix(1:100, nr = 10, byrow = TRUE)
Heatmap(m2, name = "mat1", row_dend_reorder = FALSE, column_title = "no reordering")
Heatmap(m2, name = "mat2", row_dend_reorder = TRUE, column_title = "apply reordering")
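What reordering does can be tried out in base R, independently of any plotting. The sketch below applies reorder() to a row dendrogram of the same matrix, using the row means as weights and mean as the aggregation function. The exact weights and aggregation that ComplexHeatmap uses internally are an assumption here; the point is only that reordering flips branches without changing the tree.

```r
m2 <- matrix(1:100, nrow = 10, byrow = TRUE)
rownames(m2) <- paste0("r", 1:10)

dend <- as.dendrogram(hclust(dist(m2)))

# Reorder branches using the row means as weights. The tree topology is
# unchanged; only the order of the branches at each node may flip.
dend_re <- reorder(dend, wts = rowMeans(m2), agglo.FUN = mean)

labels(dend)      # leaf order before reordering
labels(dend_re)   # leaf order after reordering
```

Both dendrograms contain the same leaves and the same merges, so either one is a valid display of the clustering; reordering only picks one of the many equivalent layouts.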
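There are many other ways to reorder a dendrogram, for example with the dendsort package. All of these methods return a reordered dendrogram object, so we can first generate the reordered row/column dendrogram objects and then pass them to the cluster_rows and cluster_columns arguments.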
Heatmap(mat, name = "mat", column_title = "default reordering")
library(dendsort)
row_dend = dendsort(hclust(dist(mat)))
col_dend = dendsort(hclust(dist(t(mat))))
Heatmap(mat, name = "mat", cluster_rows = row_dend, cluster_columns = col_dend,
column_title = "reorder by dendsort")
2.4 Changing the row/column order
Clustering changes the order of rows and columns. The order can also be set manually with row_order and column_order:
Heatmap(mat, name = "mat", row_order = order(as.numeric(gsub("row", "", rownames(mat)))),
column_order = order(as.numeric(gsub("column", "", colnames(mat)))),
column_title = "reorder matrix")
Heatmap(mat, name = "mat", row_order = sort(rownames(mat)),
column_order = sort(colnames(mat)),
column_title = "reorder matrix by row/column names")
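The two calls above produce different orders, because one sorts by the number in each name and the other sorts the names alphabetically. A small base R sketch, using made-up row names in the same "rowN" style, shows the difference:

```r
# Shuffled row names in the same style as the example matrix (toy data)
set.seed(1)
rn <- sample(paste0("row", 1:12))

# Numeric order: strip the prefix and order by the number
num_ord <- order(as.numeric(gsub("row", "", rn)))
rn[num_ord]   # row1, row2, ..., row12

# Alphabetical order: "row10" sorts before "row2"
sort(rn)
```

If the numeric sequence is what matters, the first form is the one to use.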
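2.5 Ordering with the seriation package
The seriation package is dedicated to ordering problems (see http://nicolas.kruchten.com/content/2018/02/seriation/). Some example uses follow. The BEA_TSP method requires a non-negative matrix, which is why max(mat) - mat is used in the first example.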
library(seriation)
o = seriate(max(mat) - mat, method = "BEA_TSP")
Heatmap(max(mat) - mat, name = "mat",
row_order = get_order(o, 1), column_order = get_order(o, 2),
column_title = "seriation by BEA_TSP method")
o1 = seriate(dist(mat), method = "TSP")
o2 = seriate(dist(t(mat)), method = "TSP")
Heatmap(mat, name = "mat", row_order = get_order(o1), column_order = get_order(o2),
column_title = "seriation from the distance matrix")
o1 = seriate(dist(mat), method = "GW")
## Registered S3 method overwritten by 'gclus':
## method from
## reorder.hclust seriation
o2 = seriate(dist(t(mat)), method = "GW")
Heatmap(mat, name = "mat", cluster_rows = as.dendrogram(o1[[1]]),
cluster_columns = as.dendrogram(o2[[1]]))
That is all for Part 1 of this series. There is a lot more to cover, and further posts will follow.