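Q&A | Some Tips on Legends in ggplot2 Plots
A participant from one of our training courses asked a question in the members' group about adding a legend in ggplot2, so today I'll help work through it.
Introducing the example
As usual, let's start from an example. Here I use data on the number of confirmed COVID-19 cases in each country as of March 22, 2020. First we read and tidy the data (download link: https://mdniceczx.oss-cn-beijing.aliyuncs.com/time_series_19-covid-Confirmed.csv):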
# hrbrthemes provides theme_ipsum(), which the plots below use
pacman::p_load(readxl, tidyverse, ggplot2, lubridate, scales, tidyr, purrr, ggrepel, hrbrthemes)
# Confirmed cases
# The most recent data are incomplete, so keep only dates before March 23
read_csv('time_series_19-covid-Confirmed.csv') %>%
  gather(5:ncol(.), key = "date", value = "confirmed") %>%
  set_names(c("prov", "country", "lat",
              "lon", "date", "confirmed")) %>%
  mutate(country = case_when(
    country == "Taiwan*" ~ "China",
    country == "US" ~ "United States",
    TRUE ~ country
  )) %>%
  group_by(country, date) %>%
  summarise(confirmed = sum(confirmed)) %>%
  ungroup() %>%
  distinct() %>%
  mutate(date = mdy(date),
         confirmed = if_else(is.na(confirmed), 0, confirmed)) %>%
  arrange(country, date) %>%
  dplyr::filter(date <= ymd("2020-03-22")) -> df
Here is what df looks like:
# install.packages('reactable')
reactable::reactable(df)
The total number of confirmed COVID-19 cases across all countries on March 22, 2020 is:
current_total <- subset(df, date == "2020-03-22") %>%
  pull(confirmed) %>%
  sum()
current_total
#> [1] 302728
Next, let's draw a line chart showing the growth trend in each country:
# Line chart of the outbreak's development in each country
enfont <- "IBMPlexMono-Medium"
df %>%
  ggplot(aes(x = date, y = confirmed, color = country),
         size = 1) +
  geom_point() +
  geom_line() +
  geom_label_repel(data = subset(df,
                                 date == ymd("2020-03-22")),
                   aes(label = paste0(country, ": ",
                                      confirmed)),
                   family = enfont) +
  theme_ipsum(base_family = enfont) +
  theme(legend.position = "none") +
  scale_x_date(breaks = date_breaks("10 days"),
               labels = date_format("%m-%d"),
               limits = ymd(c("2020-01-19", "2020-03-22"))) +
  scale_color_manual(values = rev(rep(RColorBrewer::brewer.pal(n = 9, name = "Paired"), 21))) +
  labs(title = paste0("Total Cases of COVID-19: ", current_total),
       subtitle = "TidyFriday Project | 2020-03-22",
       caption = "Data Source: Johns Hopkins University\n<https://github.com/CSSEGISandData/COVID-19>",
       x = "", y = "Cases")
Now for today's main topic. For convenience, I keep only the ten countries with the most confirmed cases as of March 22; the top_n function works for this:
# Find the 10 countries with the most confirmed cases on March 22
df %>%
  dplyr::filter(date == "2020-03-22") %>%
  top_n(10, confirmed) %>%
  arrange(-confirmed) %>%
  pull(country) -> country_list
# Keep only these countries' data and plot it
df %>%
  dplyr::filter(country %in% country_list) %>%
  ggplot(aes(x = date, y = confirmed, color = country),
         size = 1) +
  geom_point() +
  geom_line() +
  geom_label_repel(data = subset(df,
                                 date == ymd("2020-03-22") &
                                   country %in% country_list),
                   aes(label = paste0(country, ": ",
                                      confirmed)),
                   family = enfont) +
  theme_ipsum(base_family = enfont) +
  scale_x_date(breaks = date_breaks("10 days"),
               labels = date_format("%m-%d"),
               limits = ymd(c("2020-01-19", "2020-03-22"))) +
  scale_color_manual(values = RColorBrewer::brewer.pal(n = 10, name = "Paired")) +
  labs(title = paste0("Total Cases of COVID-19: ", current_total),
       subtitle = "TidyFriday Project | 2020-03-22",
       caption = "Data Source: Johns Hopkins University\n<https://github.com/CSSEGISandData/COVID-19>",
       x = "", y = "Cases")
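This legend is actually generated by three mappings. Because I put color = country inside ggplot(), the aesthetic is passed down to the three layers below it (points, lines, and labels), and the legend is the three layers' keys combined.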
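To make the inheritance concrete, here is a minimal sketch on made-up toy data (the data frame toy and the object name p_toy are assumptions for illustration, not part of the original analysis). It uses plain geom_label() rather than geom_label_repel(), and shows one way to keep a layer from contributing to the legend, namely show.legend = FALSE on that layer:

```r
library(ggplot2)

# Toy data, made up for illustration: two countries over three days
toy <- data.frame(
  date = rep(as.Date("2020-03-20") + 0:2, times = 2),
  country = rep(c("A", "B"), each = 3),
  confirmed = c(10, 20, 40, 5, 8, 13)
)

# color = country is set in ggplot(), so every layer below inherits it
p_toy <- ggplot(toy, aes(x = date, y = confirmed, color = country)) +
  geom_point() +
  geom_line() +
  # show.legend = FALSE leaves this layer's key out of the legend
  geom_label(aes(label = confirmed), show.legend = FALSE)
p_toy
```

With the label layer excluded, the legend keys should show only the point and line glyphs.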
Now let's see what happens if we map color to confirmed instead:
# Suppose we want the depth of the color to represent the number of confirmed cases
df %>%
  dplyr::filter(country %in% country_list) %>%
  ggplot(aes(x = date, y = confirmed, color = confirmed),
         size = 1) +
  geom_point() +
  geom_line() +
  geom_label_repel(data = subset(df,
                                 date == ymd("2020-03-22") &
                                   country %in% country_list),
                   aes(label = paste0(country, ": ",
                                      confirmed)),
                   family = enfont) +
  theme_ipsum(base_family = enfont) +
  scale_x_date(breaks = date_breaks("10 days"),
               labels = date_format("%m-%d"),
               limits = ymd(c("2020-01-19", "2020-03-22"))) +
  scale_color_gradientn(colors = RColorBrewer::brewer.pal(n = 9, name = "Reds")) +
  labs(title = paste0("Total Cases of COVID-19: ", current_total),
       subtitle = "TidyFriday Project | 2020-03-22",
       caption = "Data Source: Johns Hopkins University\n<https://github.com/CSSEGISandData/COVID-19>",
       x = "", y = "Cases")
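Notice that the plot now looks wrong. By default, ggplot2 groups the data by the interaction of all discrete variables in a layer. When color was mapped to country, that discrete variable gave one group per country. Now that color is mapped to the continuous variable confirmed, the line layer has no discrete variable left, so all observations fall into a single group and geom_line() joins every country into one line. We therefore need to set the grouping variable to country explicitly: group = country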
df %>%
  dplyr::filter(country %in% country_list) %>%
  ggplot(aes(x = date, y = confirmed,
             color = confirmed, group = country),
         size = 1) +
  geom_point() +
  geom_line() +
  geom_label_repel(data = subset(df,
                                 date == ymd("2020-03-22") &
                                   country %in% country_list),
                   aes(label = paste0(country, ": ",
                                      confirmed)),
                   family = enfont) +
  theme_ipsum(base_family = enfont) +
  scale_x_date(breaks = date_breaks("10 days"),
               labels = date_format("%m-%d"),
               limits = ymd(c("2020-01-19", "2020-03-22"))) +
  scale_color_gradientn(colors = RColorBrewer::brewer.pal(n = 9, name = "Reds")) +
  labs(title = paste0("Total Cases of COVID-19: ", current_total),
       subtitle = "TidyFriday Project | 2020-03-22",
       caption = "Data Source: Johns Hopkins University\n<https://github.com/CSSEGISandData/COVID-19>",
       x = "", y = "Cases") -> p
p
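The effect of group = country can be checked on a small scale. The sketch below uses made-up toy data (toy, p_one and p_two are illustrative names, not from the original) and inspects the group column that ggplot2 computes for the line layer via layer_data():

```r
library(ggplot2)

# Toy data, made up for illustration: two countries over three days
toy <- data.frame(
  date = rep(as.Date("2020-03-20") + 0:2, times = 2),
  country = rep(c("A", "B"), each = 3),
  confirmed = c(10, 20, 40, 5, 8, 13)
)

# Only continuous aesthetics: the line layer gets a single group
p_one <- ggplot(toy, aes(x = date, y = confirmed, color = confirmed)) +
  geom_line()

# Explicit grouping: one line per country
p_two <- ggplot(toy, aes(x = date, y = confirmed,
                         color = confirmed, group = country)) +
  geom_line()

# Number of distinct groups in each version of the line layer
n_groups_one <- length(unique(layer_data(p_one)$group))
n_groups_two <- length(unique(layer_data(p_two)$group))
c(without_group = n_groups_one, with_group = n_groups_two)
```

The first count should be one and the second two, which matches the tangled single line versus separate per-country lines seen in the plots.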
In the plot p, the legend is a continuous gradient bar (colorbar). With the following setting we can turn it into a bar with stepped, binned colors (colorsteps):
p +
  guides(color = guide_colorsteps())
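I find this version nicer, and it is the legend style I have been favoring lately. guide_colorsteps() offers many options for styling the legend, for example its height:
p +
  guides(color = guide_colorsteps(barheight = grid::unit(5, "cm")))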
Or, as another example, reverse the direction of the legend:
p +
  guides(color = guide_colorsteps(barheight = grid::unit(5, "cm"),
                                  reverse = TRUE))
For more settings, see the help page:
?guide_colorsteps()
And if you want to customize a colorbar instead, see the help page for guide_colorbar():
?guide_colorbar()
For example:
p +
  guides(color = guide_colorbar(barheight = grid::unit(5, "cm"), reverse = TRUE))
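The guides above control how the legend is drawn. A related option, which the original post does not use and which is offered here only as a sketch, is to bin the scale itself with scale_color_steps(), so that the colors of the plotted points and lines are stepped too. The toy data, the object name p_binned, and the two hex colors are assumptions chosen for illustration:

```r
library(ggplot2)

# Toy data, made up for illustration: two countries over three days
toy <- data.frame(
  date = rep(as.Date("2020-03-20") + 0:2, times = 2),
  country = rep(c("A", "B"), each = 3),
  confirmed = c(10, 20, 40, 5, 8, 13)
)

# A binned color scale: values in the same bin share one color,
# and the default guide for this scale is a stepped bar
p_binned <- ggplot(toy, aes(x = date, y = confirmed,
                            color = confirmed, group = country)) +
  geom_point() +
  geom_line() +
  scale_color_steps(low = "#FEE0D2", high = "#A50F15", n.breaks = 4)
p_binned

# Binning leaves fewer distinct colors than distinct values
n_colors <- length(unique(layer_data(p_binned, 1)$colour))
```

Whether to bin only the legend (with the guide) or the data colors as well (with the scale) depends on how much detail the gradient needs to carry.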
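The problem this reader ran into
Notice that every legend in the plots above was generated by specifying a mapping. Sometimes, though, we run into the following situation. Let's first construct a data frame:
df %>%
  dplyr::filter(country %in% country_list[1:2]) %>%
  spread(key = "country", value = "confirmed") -> df_wide
df_wide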
This is wide-format data, and we can also plot directly from df_wide:
ggplot(df_wide, aes(x = date)) +
  geom_line(aes(y = China), color = "#E31A1C") +
  geom_point(aes(y = China), color = "#E31A1C") +
  geom_line(aes(y = Italy), color = "#18BC9C") +
  geom_point(aes(y = Italy), color = "#18BC9C")
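Notice that there is no legend this time, because we drew the two series separately and set their colors as fixed values outside aes(), so no color mapping exists. One way to get a legend is to convert the wide data to long format (like the data used earlier), but you can also do this:
ggplot(df_wide, aes(x = date)) +
  geom_line(aes(y = China, color = "China")) +
  geom_point(aes(y = China, color = "China")) +
  geom_line(aes(y = Italy, color = "Italy")) +
  geom_point(aes(y = Italy, color = "Italy")) +
  scale_color_manual(values = c(
    "China" = "#E31A1C",
    "Italy" = "#18BC9C"
  ), name = "country")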
See, the legend is back!
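Why this works: a string placed inside aes() is treated as a data value, not as a color, so it creates a color mapping and hence a legend. scale_color_manual() then matches those strings by name to the actual colors. A minimal sketch on made-up wide data follows (toy_wide, p_wide and the labels "Series A" and "Series B" are hypothetical names for illustration):

```r
library(ggplot2)

# Toy wide data, made up for illustration: one column per series
toy_wide <- data.frame(
  date = as.Date("2020-03-20") + 0:2,
  A = c(10, 20, 40),
  B = c(5, 8, 13)
)

# The strings inside aes() are legend labels; the named vector in
# scale_color_manual() maps each label to its color
p_wide <- ggplot(toy_wide, aes(x = date)) +
  geom_line(aes(y = A, color = "Series A")) +
  geom_line(aes(y = B, color = "Series B")) +
  scale_color_manual(values = c(
    "Series A" = "#E31A1C",
    "Series B" = "#18BC9C"
  ), name = NULL)
p_wide

# Colors actually used by each layer after the scale is applied
color_a <- unique(layer_data(p_wide, 1)$colour)
color_b <- unique(layer_data(p_wide, 2)$colour)
```

Because the labels are free text, they need not match the column names, which is handy when the column names are not presentable.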
Or simply convert the wide data to long format:
df_wide %>%
  gather(China, Italy, key = "country", value = "confirmed") %>%
  ggplot(aes(x = date, y = confirmed, color = country)) +
  geom_point() +
  geom_line() +
  scale_color_manual(values = c(
    "China" = "#E31A1C",
    "Italy" = "#18BC9C"
  ))
This gives the same result as above. In practice, choose whichever method suits your needs.
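As a side note, the wide-to-long step can also be written with tidyr's pivot_longer(), the newer interface that tidyr recommends over gather(). The sketch below runs on made-up data (toy_wide and toy_long are illustrative names) rather than on df_wide:

```r
library(tidyr)

# Toy wide data, made up for illustration: one column per country
toy_wide <- data.frame(
  date = as.Date("2020-03-20") + 0:2,
  A = c(10, 20, 40),
  B = c(5, 8, 13)
)

# Stack the country columns into a country / confirmed pair of columns
toy_long <- pivot_longer(
  toy_wide,
  cols = c(A, B),
  names_to = "country",
  values_to = "confirmed"
)
toy_long
```

The result has one row per date and country, which is the shape that aes(color = country) expects.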