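Today I'll use a pitfall in ggplot2 to make a point: pitfalls are everywhere and almost impossible to guard against. Try the code below for yourself: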
> set.seed(123)
> require(ggplot2)
Loading required package: ggplot2
> rnorm(3)
[1] 0.8005543 1.1902066 -1.6895557
> set.seed(123)
> rnorm(3)
[1] -0.5604756 -0.2301775 1.5587083
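The only difference between the two set.seed/rnorm runs is that ggplot2 was loaded in the first one, and the results are different! The second result has to be the correct one, which means that loading ggplot2 ate your seed! Loading a package changes the state of your R session? That is definitely not a good idea. Let's see what happens when we load a different package, for example my clusterProfiler: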
> set.seed(123)
> require(clusterProfiler)
Loading required package: clusterProfiler
Loading required package: DOSE
DOSE v3.4.0 For help: https://guangchuangyu.github.io/DOSE
If you use DOSE in published research, please cite:
Guangchuang Yu, Li-Gen Wang, Guang-Rong Yan, Qing-Yu He. DOSE: an R/Bioconductor package for Disease Ontology Semantic and Enrichment analysis. Bioinformatics 2015, 31(4):608-609
clusterProfiler v3.6.0 For help: https://guangchuangyu.github.io/clusterProfiler
If you use clusterProfiler in published research, please cite:
Guangchuang Yu., Li-Gen Wang, Yanyan Han, Qing-Yu He. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS: A Journal of Integrative Biology. 2012, 16(5):284-287.
> rnorm(3)
[1] -0.5604756 -0.2301775 1.5587083
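Clearly it has no effect! This is how loading a package should behave. With a landmine like the ggplot2 one, if you step on it by accident you won't even know what killed you!
Let's look at the code: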
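The comparison above can also be automated. The following is a minimal sketch, not part of any package: `changes_rng_state` is a hypothetical helper that reports whether evaluating an expression altered the global `.Random.seed`. Note that it calls `set.seed(1)` itself so that `.Random.seed` is guaranteed to exist, which means it resets whatever seed you had set before.

```r
# Hypothetical helper (an assumption for illustration, not from any package):
# returns TRUE if evaluating `expr` changes the global RNG state.
changes_rng_state <- function(expr) {
  set.seed(1)  # make sure .Random.seed exists; this overwrites the caller's seed
  before <- get(".Random.seed", envir = globalenv())
  force(expr)  # evaluate the (lazily passed) expression
  after <- get(".Random.seed", envir = globalenv())
  !identical(before, after)
}

print(changes_rng_state(sqrt(2)))   # FALSE: no random numbers drawn
print(changes_rng_state(runif(1)))  # TRUE: one draw advances the RNG
```

In a fresh interactive session, something like `changes_rng_state(library(ggplot2))` could be used to check a package, keeping in mind that attach hooks only run the first time the package is attached. Independently of any check, calling `set.seed()` after all packages are loaded sidesteps the problem.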
https://github.com/tidyverse/ggplot2/blob/master/R/zzz.r
.onAttach <- function(...) {
  if (!interactive() || stats::runif(1) > 0.1) return()
  tips <- c(
    "Need help? Try the ggplot2 mailing list: http://groups.google.com/group/ggplot2.",
    "Find out what's changed in ggplot2 at http://github.com/tidyverse/ggplot2/releases.",
    "Use suppressPackageStartupMessages() to eliminate package startup messages.",
    "Stackoverflow is a great place to get help: http://stackoverflow.com/tags/ggplot2.",
    "Need help getting started? Try the cookbook for R: http://www.cookbook-r.com/Graphs/",
    "Want to understand how all the pieces fit together? Buy the ggplot2 book: http://ggplot2.org/book/"
  )
  tip <- sample(tips, 1)
  packageStartupMessage(paste(strwrap(tip), collapse = "\n"))
}
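When the package is attached, Hadley's code draws random numbers: stats::runif(1) decides whether a tip is shown at all, and sample() then picks which tip. In an interactive session the runif(1) call always runs, so the seed you set is consumed by the attach hook even when no tip is printed (in a non-interactive session the function returns before drawing anything). This pitfall was fixed on 2018-01-19 by Jim Hester, who wrapped the code in withr::with_preserve_seed. It had been sitting there for more than two years!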
.onAttach <- function(...) {
  withr::with_preserve_seed({
    if (!interactive() || stats::runif(1) > 0.1) return()
    tips <- c(
      "RStudio Community is a great place to get help: https://community.rstudio.com/c/tidyverse.",
      "Find out what's changed in ggplot2 at https://github.com/tidyverse/ggplot2/releases.",
      "Use suppressPackageStartupMessages() to eliminate package startup messages.",
      "Need help? Try Stackoverflow: https://stackoverflow.com/tags/ggplot2.",
      "Need help getting started? Try the cookbook for R: http://www.cookbook-r.com/Graphs/",
      "Want to understand how all the pieces fit together? See the R for Data Science book: http://r4ds.had.co.nz/"
    )
    tip <- sample(tips, 1)
    packageStartupMessage(paste(strwrap(tip), collapse = "\n"))
  })
}
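To make the idea behind a seed-preserving wrapper concrete, here is a rough base-R sketch. It is not withr's actual implementation; `preserve_seed` is a hypothetical name, and the sketch assumes that saving and restoring the global `.Random.seed` is all that is needed, which holds for R's default generators.

```r
# Hypothetical base-R sketch of a seed-preserving wrapper
# (an illustration of the idea, not the code used by withr).
preserve_seed <- function(expr) {
  had_seed <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
  if (had_seed) {
    old_seed <- get(".Random.seed", envir = globalenv())
  }
  on.exit({
    if (had_seed) {
      # Put the RNG state back exactly as it was before `expr` ran.
      assign(".Random.seed", old_seed, envir = globalenv())
    } else if (exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
      # No seed existed before, so remove the one `expr` created.
      rm(".Random.seed", envir = globalenv())
    }
  })
  force(expr)  # evaluate the expression; any RNG use is undone on exit
}

set.seed(123)
preserve_seed(sample(10, 1))  # draws a number, then the state is restored
x <- rnorm(3)

set.seed(123)
y <- rnorm(3)

print(identical(x, y))  # TRUE: the draw inside preserve_seed left no trace
```

The design choice is to restore state in `on.exit()`, so the seed is put back even if the wrapped expression throws an error or returns early.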
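My earlier posts on R's weird behavior were all about pitfalls in R itself, but what you actually face goes far beyond those. There is "a package for everything", so everyone works with R packages, and every package may hide pitfalls you would never think of! And fixing a bug is rarely as simple as it looks from the outside. If you don't believe that, look at Microsoft's 30-year-old bug (the account below follows the Chinese translation at: http://azaleasays.com/2017/01/22/30-year-old-bug-in-microsoft-excel/)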
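Before Excel was born, the spreadsheet world belonged to Lotus 1-2-3. Lotus 1-2-3 assumed that 1900 was a leap year, because that made computing and handling leap years quick and easy. To be compatible with the market leader, Excel adopted the same date format and reproduced this bug as well, so that users could read and write Lotus 1-2-3 files in Excel seamlessly. A few years later Excel had beaten Lotus 1-2-3, but Excel also had to stay compatible with files from its own older versions. If the bug were fixed:
All dates in existing Excel files would be off by one day. Correcting that data would cost time and money.
Formulas that use date functions might return wrong results.
Other software that is compatible with Excel dates would stop being compatible.
So we decided to let this bug live to a ripe old age.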
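The arithmetic behind this bug can be sketched in a few lines of R. Everything here is illustrative: `is_leap_simple` is an assumed model of the "every fourth year" shortcut described above, and `excel_serial_to_date` is a hypothetical helper that assumes Excel's 1900 date system, in which serial 1 is 1900-01-01 and serial 60 is the nonexistent 1900-02-29.

```r
# Gregorian rule: every 4th year, except centuries not divisible by 400.
is_leap_gregorian <- function(year) {
  (year %% 4 == 0 & year %% 100 != 0) | year %% 400 == 0
}

# Assumed model of the shortcut: every 4th year is a leap year.
is_leap_simple <- function(year) year %% 4 == 0

# Hypothetical helper: convert a 1900-system Excel serial number to a Date.
# Serials after the phantom day (60) are one larger than the true day count,
# so they need an extra -1 correction.
excel_serial_to_date <- function(serial) {
  stopifnot(all(serial >= 1), !any(serial == 60))  # serial 60 is not a real day
  correction <- ifelse(serial < 60, 0, -1)
  as.Date("1899-12-31") + serial + correction
}

print(is_leap_simple(1900))      # TRUE:  the shortcut treats 1900 as a leap year
print(is_leap_gregorian(1900))   # FALSE: 1900 is not a leap year
print(excel_serial_to_date(59))  # "1900-02-28"
print(excel_serial_to_date(61))  # "1900-03-01"
```

This is why conversions of modern Excel serials commonly use 1899-12-30 rather than 1899-12-31 as the origin: the extra day absorbs the phantom February 29.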