dplyr中的across操作
💡专注R语言在🩺生物医学中的使用
dplyr
中的across
函数取代了之前的xx_if/xx_at/xx_all
,用法更加灵活,初学时觉得不如xx_if/xx_at/xx_all
简单易懂,用习惯后真是利器!
主要是介绍across
函数的用法,这是dplyr
1.0才出来的一个函数,大大简化了代码
可用于对多列做同一个操作。
一般用法
陷阱
across其他连用
和filter()连用
一般用法
library(dplyr, warn.conflicts = FALSE)
across()
有两个基本参数:
.cols:选择你想操作的列 .fn:你想进行的操作,可以使一个函数或者多个函数组成的列表
可以替代_if(),at_(),all_()
starwars %>%
summarise(across(where(is.character), n_distinct))
## # A tibble: 1 × 8
## name hair_color skin_color eye_color sex gender homeworld species
## <int> <int> <int> <int> <int> <int> <int> <int>
## 1 87 13 31 15 5 3 49 38
可以直接写列名:
starwars %>%
group_by(species) %>%
filter(n() > 1) %>%
summarise(across(c(sex, gender, homeworld), n_distinct))
## # A tibble: 9 × 4
## species sex gender homeworld
## <chr> <int> <int> <int>
## 1 Droid 1 2 3
## 2 Gungan 1 1 1
## 3 Human 2 2 16
## 4 Kaminoan 2 2 1
## 5 Mirialan 1 1 1
## 6 Twi'lek 2 2 1
## 7 Wookiee 1 1 1
## 8 Zabrak 1 1 2
## 9 <NA> 1 1 3
也可以和where
函数连用,省时省力:
starwars %>%
group_by(homeworld) %>%
filter(n() > 1) %>%
summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)))
## # A tibble: 10 × 4
## homeworld height mass birth_year
## <chr> <dbl> <dbl> <dbl>
## 1 Alderaan 176. 64 43
## 2 Corellia 175 78.5 25
## 3 Coruscant 174. 50 91
## 4 Kamino 208. 83.1 31.5
## 5 Kashyyyk 231 124 200
## 6 Mirial 168 53.1 49
## 7 Naboo 175. 64.2 55
## 8 Ryloth 179 55 48
## 9 Tatooine 170. 85.4 54.6
## 10 <NA> 139. 82 334.
如果没有缺失值,可以直接写mean,
library(tidyr)
starwars %>% drop_na() %>%
group_by(homeworld) %>%
filter(n() > 1) %>%
summarise(across(where(is.numeric), mean))
## # A tibble: 4 × 4
## homeworld height mass birth_year
## <chr> <dbl> <dbl> <dbl>
## 1 Corellia 175 78.5 25
## 2 Mirial 168 53.1 49
## 3 Naboo 177 62 60
## 4 Tatooine 181. 96 37.6
acorss
支持多个函数同时使用,只要放入列表中即可:
min_max <- list(
min = ~min(.x, na.rm = TRUE),
max = ~max(.x, na.rm = TRUE)
)
starwars %>% summarise(across(where(is.numeric), min_max))
## # A tibble: 1 × 6
## height_min height_max mass_min mass_max birth_year_min birth_year_max
## <int> <int> <dbl> <dbl> <dbl> <dbl>
## 1 66 264 15 1358 8 896
starwars %>% summarise(across(c(height, mass, birth_year), min_max))
## # A tibble: 1 × 6
## height_min height_max mass_min mass_max birth_year_min birth_year_max
## <int> <int> <dbl> <dbl> <dbl> <dbl>
## 1 66 264 15 1358 8 896
当然也是支持glue
的:
starwars %>% summarise(across(where(is.numeric), min_max, .names = "{.fn}.{.col}"))
## # A tibble: 1 × 6
## min.height max.height min.mass max.mass min.birth_year max.birth_year
## <int> <int> <dbl> <dbl> <dbl> <dbl>
## 1 66 264 15 1358 8 896
starwars %>% summarise(across(c(height, mass, birth_year), min_max, .names = "{.fn}.{.col}"))
## # A tibble: 1 × 6
## min.height max.height min.mass max.mass min.birth_year max.birth_year
## <int> <int> <dbl> <dbl> <dbl> <dbl>
## 1 66 264 15 1358 8 896
分开写也是可以的:
starwars %>% summarise(
across(c(height, mass, birth_year), ~min(.x, na.rm = TRUE), .names = "min_{.col}"),
across(c(height, mass, birth_year), ~max(.x, na.rm = TRUE), .names = "max_{.col}")
)
## # A tibble: 1 × 6
## min_height min_mass min_birth_year max_height max_mass max_birth_year
## <int> <dbl> <dbl> <int> <dbl> <dbl>
## 1 66 15 8 264 1358 896
这种情况不能使用where(is.numeric)
,因为第2个across
会使用新创建的列(“min_height”, “min_mass” and “min_birth_year”)。
可以放在tibble里解决:
starwars %>% summarise(
tibble(
across(where(is.numeric), ~min(.x, na.rm = TRUE), .names = "min_{.col}"),
across(where(is.numeric), ~max(.x, na.rm = TRUE), .names = "max_{.col}")
)
)
## # A tibble: 1 × 6
## min_height min_mass min_birth_year max_height max_mass max_birth_year
## <int> <dbl> <dbl> <int> <dbl> <dbl>
## 1 66 15 8 264 1358 896
陷阱
在使用where(is.numeric)
,要注意下面这种情况:
df <- data.frame(x = c(1, 2, 3), y = c(1, 4, 9))
df %>%
summarise(n = n(), across(where(is.numeric), sd))
## n x y
## 1 NA 1 4.041452
n这里是3,是一个常数,所以它的sd变成了NA,可以通过换一下顺序解决:
df %>% summarise(across(where(is.numeric), sd),
n = n()
)
## x y n
## 1 1 4.041452 3
或者通过下面两种方法解决:
df %>%
summarise(n = n(), across(where(is.numeric) & !n, sd))
## n x y
## 1 3 1 4.041452
df %>%
summarise(
tibble(n = n(), across(where(is.numeric), sd))
)
## n x y
## 1 3 1 4.041452
across其他连用
还可以和group_by()
/count()
/distinct()
连用。
和filter()连用
across()
不能直接和filter()
连用,和filter()
连用的是if_any()
和if_all()
。
if_any():任何一列满足条件即可 if_all():所有列都要满足条件
starwars %>%
filter(if_any(everything(), ~ !is.na(.x)))
## # A tibble: 87 × 14
## name height mass hair_color skin_color eye_color birth_year sex gender
## <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
## 1 Luke Sk… 172 77 blond fair blue 19 male mascu…
## 2 C-3PO 167 75 <NA> gold yellow 112 none mascu…
## 3 R2-D2 96 32 <NA> white, bl… red 33 none mascu…
## 4 Darth V… 202 136 none white yellow 41.9 male mascu…
## 5 Leia Or… 150 49 brown light brown 19 fema… femin…
## 6 Owen La… 178 120 brown, gr… light blue 52 male mascu…
## 7 Beru Wh… 165 75 brown light blue 47 fema… femin…
## 8 R5-D4 97 32 <NA> white, red red NA none mascu…
## 9 Biggs D… 183 84 black light brown 24 male mascu…
## 10 Obi-Wan… 182 77 auburn, w… fair blue-gray 57 male mascu…
## # … with 77 more rows, and 5 more variables: homeworld <chr>, species <chr>,
## # films <list>, vehicles <list>, starships <list>
starwars %>%
filter(if_all(everything(), ~ !is.na(.x)))
## # A tibble: 29 × 14
## name height mass hair_color skin_color eye_color birth_year sex gender
## <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
## 1 Luke Sk… 172 77 blond fair blue 19 male mascu…
## 2 Darth V… 202 136 none white yellow 41.9 male mascu…
## 3 Leia Or… 150 49 brown light brown 19 fema… femin…
## 4 Owen La… 178 120 brown, gr… light blue 52 male mascu…
## 5 Beru Wh… 165 75 brown light blue 47 fema… femin…
## 6 Biggs D… 183 84 black light brown 24 male mascu…
## 7 Obi-Wan… 182 77 auburn, w… fair blue-gray 57 male mascu…
## 8 Anakin … 188 84 blond fair blue 41.9 male mascu…
## 9 Chewbac… 228 112 brown unknown blue 200 male mascu…
## 10 Han Solo 180 80 brown fair brown 29 male mascu…
## # … with 19 more rows, and 5 more variables: homeworld <chr>, species <chr>,
## # films <list>, vehicles <list>, starships <list>
获取更多信息,欢迎加入🐧QQ交流群:613637742
“医学和生信笔记,专注R语言在临床医学中的使用、R语言数据分析和可视化。主要分享R语言做医学统计学、meta分析、网络药理学、临床预测模型、机器学习、生物信息学等。
往期回顾
使用patchwork进行拼图的一些细节
文献学习:机器学习帮助临床决策
ggplot2分面图形大改造
只会logistic和cox的决策曲线?来看看适用于一切模型的DCA!
R语言画连线排序图
生存资料ROC曲线的最佳截点和平滑曲线