Learn Heatmap Plotting in One Move: the R heatmap() Function

Mr.Fantasy,春卷 靠谱Bioplot 2022-06-07

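Preface

This article shows how to draw a heatmap with R's built-in heatmap() function. To repeat an old point: once you understand how R works, learning any new function or method follows the same pattern. Read the help page to learn the function's definition and usage, then find some test data and practice. The heatmap manual is attached at the end of the article.

Step 1 Open RStudio and set the working directory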

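### Setup
rm(list = ls()) # remove all objects from the workspace
getwd() # show the current working directory
# set a temporary working directory
setwd("/Users/xuefei/Documents/Heatmap1/")
dir() # list the files in the current working directory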

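setwd() only sets a temporary working directory. When RStudio is closed and reopened, the project reverts to its original working directory, so we recommend managing R files by creating a new project instead.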
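To see that setwd() only affects the running session, the small sketch below switches into a scratch directory and back again. The scratch directory comes from tempdir() and the file name demo.csv is made up; both are assumptions for illustration, since the path used in the article is specific to the author's machine.

```r
# Remember where the session currently points.
old_wd <- getwd()

# tempdir() is only a stand-in here for a real project folder.
scratch <- tempdir()
setwd(scratch)

# Files are now read and written relative to the scratch directory.
writeLines("a,b\n1,2", "demo.csv")   # hypothetical file name
demo_exists <- file.exists("demo.csv")

# Clean up and restore the original working directory.
invisible(file.remove("demo.csv"))
setwd(old_wd)
```

Saving and restoring the directory like this keeps a script from silently changing where later code reads and writes files.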

Step 2 Enter and check the test data

a=c(12,14,17,11,16)
b=c(4,20,15,11,9)
c=c(5,7,19,8,18)
d=c(15,13,11,17,16)
e=c(12,19,16,7,9)

A=cbind(a,b,c,d,e)
B=rbind(a,b,c,d,e)
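Before plotting, it can help to confirm what the two matrices look like. The sketch below repeats the same data and inspects it; the helper name is_transpose is introduced here for illustration only.

```r
a <- c(12, 14, 17, 11, 16)
b <- c(4, 20, 15, 11, 9)
c <- c(5, 7, 19, 8, 18)
d <- c(15, 13, 11, 17, 16)
e <- c(12, 19, 16, 7, 9)

A <- cbind(a, b, c, d, e)  # each vector becomes a column
B <- rbind(a, b, c, d, e)  # each vector becomes a row

dim(A)        # number of rows and columns
colnames(A)   # column names are taken from the vector names
rownames(B)   # row names are taken from the vector names

# B holds the same numbers as A with rows and columns swapped.
is_transpose <- all(t(A) == B)
```

Because heatmap() clusters rows and columns separately, passing A or B changes which vectors end up on which axis.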

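Step 3 Adapt the example code

Adapt the code from the manual's usage example as follows.

Code 1: supply the name of the matrix to plot and leave everything else unchanged (the axis labels and title are carried over from the manual's mtcars example)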

require(graphics); require(grDevices)
x  <- as.matrix(A) # change the matrix name here
rc <- rainbow(nrow(x), start = 0, end = .3)
cc <- rainbow(ncol(x), start = 0, end = .3)
hv <- heatmap(x, col = cm.colors(256), scale = "column",
              RowSideColors = rc, ColSideColors = cc, margins = c(5,10),
              xlab = "specification variables", ylab =  "Car Models",
              main = "heatmap(<Mtcars data>, ..., scale = \"column\")")
utils::str(hv) # the two re-ordering index vectors

Figure 1: the color gradient, with row and column dendrograms.

Code 2: remove the column dendrogram

## Code 2
## no column dendrogram (nor reordering) at all:
## setting Colv to the missing value NA suppresses it
heatmap(x, Colv = NA, col = cm.colors(256), scale = "column",
        RowSideColors = rc, margins = c(5,10),
        xlab = "specification variables", ylab =  "Car Models",
        main = "heatmap(<Mtcars data>, ..., scale = \"column\")")

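Figure 2: the column dendrogram is gone.

Code 3: practice with the example code from the manual

Because the examples produce many figures, you can open the link below to run them online and compare against the expected output (the resulting plots).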

https://www.rdocumentation.org/packages/stats/versions/3.5.1/topics/heatmap

# NOT RUN {
round(Ca <- cor(attitude), 2)
symnum(Ca) # simple graphic
heatmap(Ca,               symm = TRUE, margins = c(6,6)) # with reorder(): row dendrogram is reordered
heatmap(Ca, Rowv = FALSE, symm = TRUE, margins = c(6,6)) # _NO_ reorder(): row dendrogram keeps its order
## slightly artificial with color bar, without and with ordering:
cc <- rainbow(nrow(Ca))
heatmap(Ca, Rowv = FALSE, symm = TRUE, RowSideColors = cc, ColSideColors = cc,
    margins = c(6,6))
heatmap(Ca,        symm = TRUE, RowSideColors = cc, ColSideColors = cc,
    margins = c(6,6))

## For variable clustering, rather use distance based on cor():
symnum( cU <- cor(USJudgeRatings) )

hU <- heatmap(cU, Rowv = FALSE, symm = TRUE, col = topo.colors(16),
             distfun = function(c) as.dist(1 - c), keep.dendro = TRUE)
## The Correlation matrix with same reordering:
round(100 * cU[hU[[1]], hU[[2]]])
## The column dendrogram:
utils::str(hU$Colv)
# }

The heatmap() manual

Long passage ahead! ٩(๑>◡<๑)۶
Run ?heatmap to open the manual.

(1) Definition

A heat map is a false color image (basically image(t(x))) with a dendrogram added to the left side and to the top. Typically, reordering of the rows and columns according to some set of values (row or column means) within the restrictions imposed by the dendrogram is carried out.
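In other words, a heatmap is a false-color image (built on image(t(x))) with dendrograms on the left and on top. The rows and columns are usually reordered by some set of values (row or column means), as far as the dendrogram allows. See the figure below.

(2) Usage

(3) Understanding the arguments from the manual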

(1) x: numeric matrix of the values to be plotted.
The matrix to plot.
(2) Rowv: determines if and how the row dendrogram should be computed and reordered. Either a dendrogram or a vector of values used to reorder the row dendrogram or NA to suppress any row dendrogram (and reordering) or by default, NULL, see ‘Details’ below.
Controls whether and how the row dendrogram is computed and reordered. It accepts a dendrogram, a vector of values used for reordering, NA to suppress the row dendrogram (and reordering), or the default NULL.
In short, Rowv defines how the row dendrogram is built.
(3) Colv: determines if and how the column dendrogram should be reordered. Has the same options as the Rowv argument above and additionally when x is a square matrix, Colv = "Rowv" means that columns should be treated identically to the rows (and so if there is to be no row dendrogram there will not be a column one either).
Works like Rowv, but for the column dendrogram. In addition, when x is a square matrix, Colv = "Rowv" treats the columns exactly like the rows (so if there is no row dendrogram, there is no column dendrogram either).
(4) distfun: function used to compute the distance (dissimilarity) between both rows and columns. Defaults to dist.
Computes the dissimilarity among rows and among columns. The default is distfun = dist.
(5) hclustfun: function used to compute the hierarchical clustering when Rowv or Colv are not dendrograms. Defaults to hclust. Should take as argument a result of distfun and return an object to which as.dendrogram can be applied.
The hierarchical clustering function used when Rowv or Colv is not a dendrogram. The default is hclustfun = hclust. It must accept the result of distfun and return an object that as.dendrogram can handle.
(6) reorderfun: function(d, w) of dendrogram and weights for reordering the row and column dendrograms. The default uses reorder.dendrogram.
A function of a dendrogram d and weights w that reorders the row and column dendrograms. The default uses reorder.dendrogram.
(7) add.expr: expression that will be evaluated after the call to image. Can be used to add components to the plot.
An expression evaluated right after image is called. Use it to add components to the plot.
(8) symm: logical indicating if x should be treated symmetrically; can only be true when x is a square matrix.
Whether x is treated symmetrically. It can only be TRUE when x is a square matrix.
(9) revC: logical indicating if the column order should be reversed for plotting, such that e.g., for the symmetric case, the symmetry axis is as usual.
Whether to reverse the column order when plotting so that, for example, in the symmetric case the symmetry axis runs in the usual direction.
(10) scale: character indicating if the values should be centered and scaled in either the row direction or the column direction, or none. The default is "row" if symm false, and "none" otherwise.
Whether values are centered and scaled by row, by column, or not at all. The default is "row" when symm is FALSE and "none" otherwise.
(11) na.rm: logical indicating whether NA's should be removed.
Whether NA values should be removed.
(12) margins: numeric vector of length 2 containing the margins (see par(mar= *)) for column and row names, respectively.
A numeric vector of length 2 giving the margins for the column names and the row names, in that order (see par(mar = *)).
(13) ColSideColors: (optional) character vector of length ncol(x) containing the color names for a horizontal side bar that may be used to annotate the columns of x.
Optional. A character vector of length ncol(x) with the colors of a horizontal side bar that annotates the columns of x.
(14) RowSideColors: (optional) character vector of length nrow(x) containing the color names for a vertical side bar that may be used to annotate the rows of x.
Optional. A character vector of length nrow(x) with the colors of a vertical side bar that annotates the rows of x.
(15) cexRow, cexCol: positive numbers, used as cex.axis in for the row or column axis labeling. The defaults currently only use number of rows or columns, respectively.
Positive numbers used as cex.axis for the row or column axis labels. The current defaults depend only on the number of rows or columns, respectively.
(16) labRow, labCol: character vectors with row and column labels to use; these default to rownames(x) or colnames(x), respectively.
Character vectors of row and column labels. They default to rownames(x) and colnames(x).
(17) main, xlab, ylab: main, x- and y-axis titles; defaults to none.
The main title and the x- and y-axis titles. The default is none.
(18) keep.dendro: logical indicating if the dendrogram(s) should be kept as part of the result (when Rowv and/or Colv are not NA).
Whether the dendrogram(s) are kept in the returned result (when Rowv and/or Colv are not NA).
(19) verbose: logical indicating if information should be printed.
Whether progress information is printed.

(20) ...: additional arguments passed on to image, e.g., col specifying the colors.
Any other arguments are passed on to image, for example col to set the colors.
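As a rough illustration of how several of these arguments combine, the sketch below swaps in a correlation-based distance and average linkage. The helper names cor_dist and avg_clust, and the choice of distance and linkage, are assumptions made for this example and not the defaults. pdf(NULL) opens a null graphics device so the code also runs without a display; leave it out (together with dev.off()) to see the plot.

```r
x <- as.matrix(mtcars)

# Illustrative distance: 1 - correlation between rows (not the default dist).
cor_dist <- function(m) as.dist(1 - cor(t(m)))

# Illustrative clustering: average linkage instead of hclust's default method.
avg_clust <- function(d) hclust(d, method = "average")

pdf(NULL)  # null device; nothing is written to disk
hv <- heatmap(x,
              Colv = NA,              # no column dendrogram, no column reordering
              distfun = cor_dist,     # passed the row matrix because Colv = NA
              hclustfun = avg_clust,
              scale = "column",       # center and scale each column
              margins = c(5, 10),     # room for column and row names
              cexRow = 0.6,           # smaller row labels
              main = "mtcars, correlation distance")
invisible(dev.off())

hv$colInd  # columns keep their original order
hv$rowInd  # rows follow the custom dendrogram
```

Note that distfun must return something hclustfun accepts, which is why cor_dist wraps its result in as.dist().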

(4) Detailed notes

4.1 Details

(1): If either Rowv or Colv are dendrograms they are honored (and not reordered). Otherwise, dendrograms are computed as dd <- as.dendrogram(hclustfun(distfun(X))) where X is either x or t(x).
If Rowv or Colv is already a dendrogram, it is used as given (and not reordered). Otherwise the dendrogram is computed as dd <- as.dendrogram(hclustfun(distfun(X))), where X is x for rows and t(x) for columns.
(2): If either is a vector (of ‘weights’) then the appropriate dendrogram is reordered according to the supplied values subject to the constraints imposed by the dendrogram, by reorder(dd, Rowv), in the row case. If either is missing, as by default, then the ordering of the corresponding dendrogram is by the mean value of the rows/columns, i.e., in the case of rows, Rowv <- rowMeans(x, na.rm = na.rm). If either is NA, no reordering will be done for the corresponding side.
If either one is a vector of weights, the corresponding dendrogram is reordered by those values, within the constraints of the dendrogram, using reorder(dd, Rowv) in the row case. If either one is missing (the default), the dendrogram is ordered by the row or column means; for rows, Rowv <- rowMeans(x, na.rm = na.rm). If either one is NA, that side is not reordered.
(3): By default (scale = "row") the rows are scaled to have mean zero and standard deviation one. There is some empirical evidence from genomic plotting that this is useful.
By default (scale = "row") each row is scaled to mean zero and standard deviation one. Experience from genomic plotting suggests this is useful.
(4): The default colors are not pretty. Consider using enhancements such as the RColorBrewer package.
The default colors are not attractive. Consider an enhancement such as the RColorBrewer package.
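The four points above can be walked through by hand without drawing anything. The sketch below follows the formulas quoted in Details using the default dist and hclust. The variable names are made up for illustration, and the colorRampPalette() palette is a base-R stand-in chosen here so that no extra package is needed; it is not the RColorBrewer approach the manual mentions.

```r
x <- as.matrix(mtcars)

# (1) Dendrograms: as.dendrogram(hclustfun(distfun(X))), X = x for rows, t(x) for columns.
dd_row <- as.dendrogram(hclust(dist(x)))
dd_col <- as.dendrogram(hclust(dist(t(x))))

# (2) Default reordering: by row means and column means, within the tree's constraints.
dd_row <- reorder(dd_row, rowMeans(x))
dd_col <- reorder(dd_col, colMeans(x))
row_ind <- order.dendrogram(dd_row)  # permutation of row indices
col_ind <- order.dendrogram(dd_col)  # permutation of column indices

# (3) scale = "row": every row is centered to mean 0 and scaled to sd 1.
x_scaled <- t(scale(t(x)))
row_means <- rowMeans(x_scaled)
row_sds <- apply(x_scaled, 1, sd)

# (4) A custom palette built with base R (an illustrative alternative to the defaults).
pal <- colorRampPalette(c("navy", "white", "firebrick"))(256)
```

The vectors row_ind and col_ind should correspond to the rowInd and colInd components that heatmap(x) returns with its default settings, and pal can be passed to heatmap() through the col argument.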

4.2 Value

Invisibly, a list with components
The function invisibly returns a list with these components:
(1) rowInd: row index permutation vector as returned by order.dendrogram.
The row index permutation vector, as returned by order.dendrogram.
(2) colInd: column index permutation vector.
The column index permutation vector.
(3) Rowv: the row dendrogram; only if input Rowv was not NA and keep.dendro is true.
The row dendrogram; present only if the input Rowv was not NA and keep.dendro is TRUE.
(4) Colv: the column dendrogram; only if input Colv was not NA and keep.dendro is true.
The column dendrogram; present only if the input Colv was not NA and keep.dendro is TRUE.
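A short sketch of inspecting the returned list follows. It uses the mtcars data from the manual's example; the name x_ordered is introduced here for illustration. pdf(NULL) opens a null graphics device so the code runs without a display; leave it out to see the plot.

```r
x <- as.matrix(mtcars)

pdf(NULL)  # null device; nothing is written to disk
hv <- heatmap(x, scale = "column", keep.dendro = TRUE)
invisible(dev.off())

names(hv)      # the components of the returned list
class(hv$Rowv) # kept because keep.dendro = TRUE and Rowv was not NA

# Reorder the original matrix with the two permutation vectors.
x_ordered <- x[hv$rowInd, hv$colInd]
rownames(x_ordered)[1:3]
```

Because the result is returned invisibly, it has to be assigned to a variable (hv here) to be inspected.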

4.3 Note

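(1): Unless Rowv = NA (or Colv = NA), the original rows and columns are reordered in any case to match the dendrogram, e.g., the rows by order.dendrogram(Rowv) where Rowv is the (possibly reorder()ed) row dendrogram.
Unless Rowv = NA (or Colv = NA), the original rows and columns are always reordered to match the dendrogram. For example, the rows are ordered by order.dendrogram(Rowv), where Rowv is the row dendrogram (possibly already reordered by reorder()).
(2): heatmap() uses layout and draws the image in the lower right corner of a 2x2 layout. Consequentially, it can not be used in a multi column/row layout, i.e., when par(mfrow = *) or (mfcol = *) has been called.
heatmap() uses layout and draws the image in the lower right cell of a 2x2 layout. As a result, it cannot be used inside a multi-column or multi-row layout, that is, after par(mfrow = *) or par(mfcol = *) has been called.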

(5) Examples (very important)

Only the first half of the manual's examples is shown here; work through the rest by following the manual.

# NOT RUN {
require(graphics); require(grDevices)
x  <- as.matrix(mtcars)
rc <- rainbow(nrow(x), start = 0, end = .3)
cc <- rainbow(ncol(x), start = 0, end = .3)
hv <- heatmap(x, col = cm.colors(256), scale = "column",
              RowSideColors = rc, ColSideColors = cc, margins = c(5,10),
              xlab = "specification variables", ylab =  "Car Models",
              main = "heatmap(<Mtcars data>, ..., scale = \"column\")")
utils::str(hv) # the two re-ordering index vectors
## no column dendrogram (nor reordering) at all:
heatmap(x, Colv = NA, col = cm.colors(256), scale = "column",
        RowSideColors = rc, margins = c(5,10),
        xlab = "specification variables", ylab =  "Car Models",
        main = "heatmap(<Mtcars data>, ..., scale = \"column\")")
# }

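After changing "NOT RUN" to "RUN" and running the code, the figure below is generated automatically. The "# NOT RUN {" and "# }" lines are R comments, so the code between them also runs as-is when pasted into the console.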
