看代码学画图 - Nobel化学奖的获得者多半投胎在那些国家？

Original Y叔叔 YuLabSMU 2022-09-20

收录于合集 #画图另类秘笈 45个

这是Antoine（一个博士生）做的一个可视化，用了我的ggimage包（多次在本公众号介绍过），并且在twitter上AT了我，我看到小国旗都严重变形了，就给他来个PR，修复一下。讲真ggimage用图片来当画图元素，可以说图片是不会变形的，只要你用coord_equal或者coord_fixed，但在实际中，我们不会用这两个，并且图片长宽比会根据我们的需要而设定，这样子用于画图的图片等同于就被拉扯了，于是长宽比就不是原来的，而图片就变形了，如同上面的小国旗。所以我在ggimage里提供了一个参数，asp，顾名思义用来设定aspect ratio，于是你可以拯救变形的小图片。最后我还给了它另外一个参数，我们拭目以待，看他最后将给我们带来什么样的应用。

这里我给出他的代码的解读，帮助大家理解。

library(LaCroixColoR)
library(tidyverse)
library(ggthemes)
library(ggimage)

LaCroixColoR这个包是用来配色，library(ggthemes)包用来设主题，用了华尔街日报的主题theme_wsj，我的ggimage包用来画小国旗。tidyverse当然是用来处理数据。

数据

数据在 https://github.com/abichat/tidytuesday/.

nobel_winners <- read_csv("data/data_2019-05-14.csv",
                          col_types = "dccccdccDccccccDcc")

大概长这样子：

> head(nobel_winners)
# A tibble: 6 x 18
  prize_year category prize motivation prize_share laureate_id laureate_type
       <dbl> <chr>    <chr> <chr>      <chr>             <dbl> <chr>        
1       1901 Chemist~ The ~ "\"in rec~ 1/1                 160 Individual   
2       1901 Literat~ The ~ "\"in spe~ 1/1                 569 Individual   
3       1901 Medicine The ~ "\"for hi~ 1/1                 293 Individual   
4       1901 Peace    The ~ <NA>       1/2                 462 Individual   
5       1901 Peace    The ~ <NA>       1/2                 463 Individual   
6       1901 Physics  The ~ "\"in rec~ 1/1                   1 Individual   
# ... with 11 more variables: full_name <chr>, birth_date <date>,
#   birth_city <chr>, birth_country <chr>, gender <chr>,
#   organization_name <chr>, organization_city <chr>,
#   organization_country <chr>, death_date <date>, death_city <chr>,
#   death_country <chr>

画小国旗用的不是国家的全称，而是代号，所以这里生成一个对应的代号，用了tribble这个可以很方便地按照row来输入一个tibble，前面用~开头的是column name，想想以前要按照row来输入的话，还挺麻烦，因为data.frame是column based的。

df_countrycode <-
  tribble(~country, ~code, 
          "United States of America", "US",
          "Germany", "DE",
          "United Kingdom", "GB",
          "France", "FR",
          "Japan", "JP",
          "Netherlands", "NL",
          "Sweden", "SE",
          "Russia", "RU",
          "Canada", "CA",
          "Austria", "AT")

表格

countries <-
  nobel_winners %>%
  filter(category == "Chemistry") %>%
  count(birth_country, sort = TRUE) %>%
  head(n = 10) %>%
  pull(birth_country)

这里只看化学奖，于是过滤一下，按照出生的国家进行计数并排序，只拿人数最多的10个国家，用pull获取出生国家，而不管计数，出来结果如下。PS，pull相当于[[去获取某个column。

> countries
 [1] "United States of America" "Germany"                 
 [3] "United Kingdom"           "France"                  
 [5] "Japan"                    "Netherlands"             
 [7] "Sweden"                   "Austria"                 
 [9] "Canada"                   "Russia"

获奖者数目：

nobel_counts <-
  nobel_winners %>%
  filter(birth_country %in% countries, category == "Chemistry") %>%
  select(prize_year, birth_country) %>%
  arrange(prize_year) %>%
  group_by(birth_country) %>%
  mutate(n_prize = n(),
         first_prize = min(prize_year),
         last_prize = max(prize_year),
         cum = row_number()) %>%
  ungroup()

从出生国家中过滤出上面排名前10的国家，并且只选择化学奖，只保留获奖年份和出生国家这两个信息，按照获奖年份排序，按照出生国家分组，用mutate加一些相当的column，n_prize是每个国家获奖的次数，first_prize是第一次获奖的年份，last_prize是最后一次获奖的年份，cum是所有获奖次数的累积加和，也就是每一个分组之后排序的row number。

> nobel_counts

nobel_counts
# A tibble: 151 x 6
   prize_year birth_country            n_prize first_prize last_prize   cum
        <dbl> <fct>                      <int>       <dbl>      <dbl> <dbl>
 1       1901 Netherlands                    5        1901       2016     0
 2       1901 Netherlands                    5        1901       2016     1
 3       1903 Sweden                         5        1903       2015     0
 4       1903 Sweden                         5        1903       2015     1
 5       1906 France                        10        1906       2016     0
 6       1906 France                        10        1906       2016     1
 7       1912 France                        10        1906       2016     2
 8       1912 France                        10        1906       2016     3
 9       1913 France                        10        1906       2016     4
10       1914 United States of America      57        1914       2015     0
# ... with 141 more rows

对上面的数据再做处理：

nobel_counts <-
  nobel_counts %>%
  filter(cum == 1) %>%
  mutate(cum = 0) %>%
  bind_rows(nobel_counts) %>%
  arrange(prize_year, cum) %>%
  mutate(birth_country = fct_reorder(birth_country, n_prize, .desc = TRUE))

把cum==1，也就是首次获奖的记录，记为cum = 0，再用bind_rows添加到原来的数据集上，按照prize_year和cum排序，按照获奖次数设定出生国家的factor levels。

first_last_nobel <-
  nobel_counts %>%
  select(birth_country, n_prize, first_prize, last_prize) %>%
  unique() %>%
  left_join(df_countrycode, by = c("birth_country" = "country"))

再整理一个首次和最后下次获奖的记录，这只是为了画图额外增加的，在最后的点上画小国旗，而在第一个点画上点，而这个第一个点，其实不是第一个点，而是y=0，做为所以线条的起点，这也是前面加cum=0的记录的原因，让线条把0和首次获奖1之前不中断。

画图

ggplot(nobel_counts) +
  aes(x = prize_year, y = cum, group = birth_country) +
  geom_line(aes(color = birth_country)) +
  geom_point(data = first_last_nobel, y = 0,
             aes(x = first_prize, color = birth_country)) +
  geom_flag(data = first_last_nobel, size = 0.03, asp=2,
            aes(x = last_prize, y = n_prize, image = code)) +
  scale_color_manual(values = lacroix_palette("PeachPear", n = 10, type = "continuous")) +
  scale_y_continuous(limits = c(NA, 60)) +
  labs(title = "Number of chemistry Nobel prizes by birth country", color = NULL,
       caption = "Source: The Nobel Prize\n@_abichat for #TidyTuesday") +
  theme_wsj(color = "gray") +
  theme(legend.position = "bottom", 
        plot.caption = element_text(size = 12), 
        plot.title = element_text(size = 25),
        legend.text = element_text(family = "mono"))

ggsave("plots/plot_2019-05-14.png", width = 29, height = 21, units = "cm", dpi = "retina")

geom_line画每个国家累积获奖次数的线条，geom_point加0点，geom_flag最高点加小国旗。scale设置标尺，labs设标题，theme设主题，最后的图是这样子的：

轮到你了

你起码可以换一个奖项来画图吧？比如说医学奖拿来试试：

> select(nobel_winners, category) %>% unique
# A tibble: 6 x 1
  category  
  <chr>     
1 Chemistry 
2 Literature
3 Medicine  
4 Peace     
5 Physics   
6 Economics

往期精彩

高三女生醉酒后被强奸致死？检方回应

高三女生醉酒后被强奸致死？检方回应

那些内心强大的孩子，童年被允许做过这1件事

波罗的海，电缆断裂！

关晓彤突然官宣喜讯！粉丝欢呼：恭喜啊，终于等到这一天

看代码学画图 - Nobel化学奖的获得者多半投胎在那些国家？

数据

表格

画图

轮到你了

您可能也对以下帖子感兴趣

高三女生醉酒后被强奸致死？检方回应

高三女生醉酒后被强奸致死？检方回应

那些内心强大的孩子，童年被允许做过这1件事

波罗的海，电缆断裂！

关晓彤突然官宣喜讯！粉丝欢呼：恭喜啊，终于等到这一天

生成图片，分享到微信朋友圈

看代码学画图 - Nobel化学奖的获得者多半投胎在那些国家？

数据

表格

画图

轮到你了

您可能也对以下帖子感兴趣