Is do.call faster than Reduce?

JunJunLab, 老俊俊的生信笔记, 2022-08-15



1 Introduction

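When processing large datasets, we often use the *apply family of functions, multithreading, and similar techniques. The list of per-iteration results can then be combined into a single table with either Reduce or do.call.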

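While processing some data I found that do.call is, surprisingly, much faster than Reduce for this step: in the example below it takes about 19 s of elapsed time versus about 84 s, which adds up to a lot of saved time!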
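One likely reason for the gap is the number of rbind calls. Reduce folds the list pairwise, so it calls rbind once per additional element and re-copies the growing result each time, whereas do.call passes the whole list as the arguments of a single rbind call. The sketch below counts the calls on toy data; `pieces` and `counting_rbind` are hypothetical names introduced only for this illustration, and base data frames stand in for the data.table objects used in the post.

```r
# Hypothetical toy input: 100 one-row data frames, standing in for
# the per-iteration results returned by lapply()/future_lapply().
pieces <- lapply(1:100, function(i) data.frame(id = i, value = i * 10))

# Wrapper around rbind that counts how often it is invoked.
calls <- 0L
counting_rbind <- function(...) {
  calls <<- calls + 1L
  rbind(...)
}

# Reduce: rbind(rbind(rbind(p1, p2), p3), ...), one call per extra element.
calls <- 0L
by_reduce <- Reduce(counting_rbind, pieces)
reduce_calls <- calls

# do.call: a single call rbind(p1, p2, ..., p100).
calls <- 0L
by_docall <- do.call(counting_rbind, pieces)
docall_calls <- calls

print(c(Reduce = reduce_calls, do.call = docall_calls))
```

With 100 pieces, Reduce triggers 99 rbind calls and do.call triggers one, while both produce a table with the same 100 rows.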

2 Example

Combining the results with Reduce:

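library(future.apply)  # future_lapply()
library(data.table)    # data.table() and the [i, j, by] syntax
library(magrittr)      # %>%

# Raise the size limit for globals exported to future workers (about 5 GB)
options(future.globals.maxSize = 5000e6)

# center_df is the author's input table; it is not shown in the post
system.time({future_lapply(1:50000, function(x){
  tmp <- center_df[x,]
  # keep the row only if its [end5, end3] interval lies within [st_pos, sp_pos]
  if(tmp$end5 >= tmp$st_pos && tmp$end3 <= tmp$sp_pos){
    pos_new = c(tmp$end5:tmp$end3)  # every position from end5 to end3
    score = 1/tmp$len               # each position gets weight 1/len
    tmp_score <- data.table(type=tmp$type,rname=tmp$rname,pos_new,score)
    return(tmp_score)
  }else{
    NULL  # rows outside the interval contribute nothing
  }
}) %>% Reduce('rbind',.) %>%  # pairwise rbind over the list of results
    data.table() %>%
    .[,.(sum_density = sum(score)),by = .(type,rname,pos_new)] -> center_score_df})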

   user  system elapsed
  82.35    1.24   83.52

Combining the results with do.call:

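system.time({future_lapply(1:50000, function(x){
  tmp <- center_df[x,]
  # keep the row only if its [end5, end3] interval lies within [st_pos, sp_pos]
  if(tmp$end5 >= tmp$st_pos && tmp$end3 <= tmp$sp_pos){
    pos_new = c(tmp$end5:tmp$end3)  # every position from end5 to end3
    score = 1/tmp$len               # each position gets weight 1/len
    tmp_score <- data.table(type=tmp$type,rname=tmp$rname,pos_new,score)
    return(tmp_score)
  }else{
    NULL  # rows outside the interval contribute nothing
  }
}) %>% do.call('rbind',.) %>%  # one rbind call over the whole list
    data.table() %>%
    .[,.(sum_density = sum(score)),by = .(type,rname,pos_new)] -> center_score_df})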

   user  system elapsed
  18.53    0.34   18.89
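Since center_df is not included in the post, the two timings above cannot be rerun as written. The following is a minimal sketch of the same comparison on made-up data, assuming base data frames instead of data.table and no parallel backend; the `pieces` list and its column values are hypothetical. Absolute timings depend on the machine and on the size of the list, so no expected numbers are given.

```r
# Hypothetical synthetic input: 2000 small data frames of 5 rows each,
# loosely mimicking the per-row tables built in the example above.
pieces <- lapply(1:2000, function(i) {
  data.frame(rname = paste0("r", i), pos_new = i + 0:4, score = 1 / 5)
})

# Time the pairwise fold and the single call on the same list.
t_reduce <- system.time(res_reduce <- Reduce(rbind, pieces))
t_docall <- system.time(res_docall <- do.call(rbind, pieces))

# Both approaches should produce the same rows in the same order.
stopifnot(nrow(res_reduce) == nrow(res_docall))
stopifnot(all(res_reduce$pos_new == res_docall$pos_new))

print(t_reduce["elapsed"])
print(t_docall["elapsed"])
```

Increasing the number of pieces should widen the gap, because each Reduce step copies an ever larger intermediate table.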