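[Roundup] The Many Splits and Joins of R

Original 豆豆花花 生信星球 2022-06-25

Today is day 449 of 生信星球 keeping you company.

One sentence from an expert can cost a beginner half a year. I'm no expert, but I can save you that half year of detours~

As the song goes: if you don't know where to go next, why not stay here and learn some bioinformatics~

Here you'll find 豆豆 and 花花's learning journey, from beginner to advanced. We're on the bioinformatics road together!

Written by 花花 on 2019-09-16

I've been short on material for posts lately, so I'm planning a "Roundup" series on R. Send me a message through the account backend to tell me what you'd like rounded up.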

Many R functions are vectorized: they operate on a whole vector at once, so you don't need to write a loop.
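As a small illustration of vectorization, here is the same computation done with an explicit loop and with one vectorized call. The vector `words` is made up for this sketch and is not part of the original post.

```r
# Hypothetical sample data for illustration
words <- c("birch", "canoe", "planks")

# Loop version: count characters one element at a time
lens_loop <- integer(length(words))
for (i in seq_along(words)) {
  lens_loop[i] <- nchar(words[i])
}

# Vectorized version: nchar() handles the whole vector in one call
lens_vec <- nchar(words)

identical(lens_loop, lens_vec)
#> [1] TRUE
```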

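1. Strings

To split strings, str_split() is all you need. It returns a list by default; with the argument simplify = TRUE (abbreviated to simplify = T in the code here) it returns a character matrix.

Three functions deal with joining strings:

paste, stringr::str_c and paste0

They are close relatives. str_c comes from stringr, part of the tidyverse, and does the same job as paste, with two differences worth knowing: its default separator is the empty string (like paste0), and a missing value in the input makes the result NA. stringr is dedicated to string handling and all of its functions start with str_, which I really like.

paste0 is simpler still. It is derived from paste: paste separates with a space by default, paste0 with nothing. No need to memorize this, one try and you'll know. As I always say in class: try it yourself!

First, load the package and prepare the sample data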

if(!require(stringr))install.packages("stringr")
library(stringr)

# We use sentences, a long character vector that ships with stringr, as sample data; to keep things short we take only the first three elements
class(sentences)
#> [1] "character"
x <-head(sentences,3)
x
#> [1] "The birch canoe slid on the smooth planks." 
#> [2] "Glue the sheet to the dark blue background."
#> [3] "It's easy to tell the depth of a well."

(1) Splitting strings

str_split(x,pattern = " ")
#> [[1]]
#> [1] "The"     "birch"   "canoe"   "slid"    "on"      "the"     "smooth" 
#> [8] "planks."
#> 
#> [[2]]
#> [1] "Glue"        "the"         "sheet"       "to"          "the"        
#> [6] "dark"        "blue"        "background."
#> 
#> [[3]]
#> [1] "It's"  "easy"  "to"    "tell"  "the"   "depth" "of"    "a"     "well."
str_split(x,pattern = " ",simplify = T)
#>      [,1]   [,2]    [,3]    [,4]   [,5]  [,6]    [,7]     [,8]         
#> [1,] "The"  "birch" "canoe" "slid" "on"  "the"   "smooth" "planks."    
#> [2,] "Glue" "the"   "sheet" "to"   "the" "dark"  "blue"   "background."
#> [3,] "It's" "easy"  "to"    "tell" "the" "depth" "of"     "a"          
#>      [,9]   
#> [1,] ""     
#> [2,] ""     
#> [3,] "well."
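For comparison, a minimal base R sketch of the same split, using strsplit() rather than stringr (a swapped-in alternative, not what the post itself uses). It also pulls out the first word and the word count per sentence. The sentences are retyped here so the block runs on its own.

```r
# The three example sentences, retyped so this block is self-contained
x <- c("The birch canoe slid on the smooth planks.",
       "Glue the sheet to the dark blue background.",
       "It's easy to tell the depth of a well.")

# strsplit() returns a list with one character vector per input string
parts <- strsplit(x, " ", fixed = TRUE)

# First word of each sentence
first_words <- vapply(parts, function(p) p[1], character(1))
first_words
#> [1] "The"  "Glue" "It's"

# Number of words in each sentence
n_words <- lengths(parts)
n_words
#> [1] 8 8 9
```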

(2) Splitting a vector

This is really just a subsetting problem. See for yourself:

x[1]
#> [1] "The birch canoe slid on the smooth planks."
x[2:3]
#> [1] "Glue the sheet to the dark blue background."
#> [2] "It's easy to tell the depth of a well."
x[1:2]
#> [1] "The birch canoe slid on the smooth planks." 
#> [2] "Glue the sheet to the dark blue background."
x[3]
#> [1] "It's easy to tell the depth of a well."
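A few more subsetting idioms that can split a vector, sketched on a made-up vector `v` (not from the original post): negative indices drop elements, and split() cuts a vector into groups according to a grouping vector.

```r
# Hypothetical vector for illustration
v <- c("a", "b", "c", "d", "e", "f")

# A negative index drops elements instead of selecting them
v[-1]
#> [1] "b" "c" "d" "e" "f"

# split() divides a vector by a grouping vector of the same length;
# here the first three elements go to group 1, the rest to group 2
chunks <- split(v, rep(1:2, each = 3))
chunks[["1"]]
#> [1] "a" "b" "c"
chunks[["2"]]
#> [1] "d" "e" "f"
```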

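(3) Joining two vectors

The separator is set with the sep argument. It defaults to a space for paste and to an empty string for str_c and paste0, and you can set it to whatever you like.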

y = 1:3
y
#> [1] 1 2 3
paste(y,x)
#> [1] "1 The birch canoe slid on the smooth planks." 
#> [2] "2 Glue the sheet to the dark blue background."
#> [3] "3 It's easy to tell the depth of a well."
paste(y,x,sep="、")
#> [1] "1、The birch canoe slid on the smooth planks." 
#> [2] "2、Glue the sheet to the dark blue background."
#> [3] "3、It's easy to tell the depth of a well."
str_c(y,x)
#> [1] "1The birch canoe slid on the smooth planks." 
#> [2] "2Glue the sheet to the dark blue background."
#> [3] "3It's easy to tell the depth of a well."
str_c(y,x,sep = ",")
#> [1] "1,The birch canoe slid on the smooth planks." 
#> [2] "2,Glue the sheet to the dark blue background."
#> [3] "3,It's easy to tell the depth of a well."
# paste0 uses no separator; it is equivalent to paste(y,x,sep="")
paste0(y,x)
#> [1] "1The birch canoe slid on the smooth planks." 
#> [2] "2Glue the sheet to the dark blue background."
#> [3] "3It's easy to tell the depth of a well."
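One difference between paste and str_c that does not show up in the output above is how they treat missing values. A minimal sketch, assuming stringr is installed; the input strings are made up for illustration:

```r
library(stringr)

# paste() converts NA to the literal string "NA"
p <- paste("sample", NA, sep = "_")
p
#> [1] "sample_NA"

# str_c() propagates the missing value: the result is NA
s <- str_c("sample", NA, sep = "_")
s
#> [1] NA
```

If the paste-style behavior is wanted with stringr, str_replace_na() can turn NA into "NA" before joining.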

(4) Collapsing a character vector into one long string

That is, merging the three elements of one vector into a single string!

x
#> [1] "The birch canoe slid on the smooth planks." 
#> [2] "Glue the sheet to the dark blue background."
#> [3] "It's easy to tell the depth of a well."
str_c(x,collapse = "//")
#> [1] "The birch canoe slid on the smooth planks.//Glue the sheet to the dark blue background.//It's easy to tell the depth of a well."
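Base R's paste() accepts the same collapse argument, and sep and collapse can be combined: sep joins the vectors element by element first, then collapse joins the resulting elements. A short sketch on a made-up vector `letters3`:

```r
# Hypothetical short vector for illustration
letters3 <- c("a", "b", "c")

# collapse alone: merge all elements of one vector
paste(letters3, collapse = "//")
#> [1] "a//b//c"

# sep and collapse together: pair up element-wise, then merge
joined <- paste(1:3, letters3, sep = "-", collapse = "+")
joined
#> [1] "1-a+2-b+3-c"
```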

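(5) Concatenating two vectors end to end

This shouldn't really be a problem at all: just use c(). Many people simply don't know that! Note that c() coerces everything to one common type, so the integers in y become character strings here.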

c(y,x)
#> [1] "1"                                          
#> [2] "2"                                          
#> [3] "3"                                          
#> [4] "The birch canoe slid on the smooth planks." 
#> [5] "Glue the sheet to the dark blue background."
#> [6] "It's easy to tell the depth of a well."
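The quotes around "1", "2", "3" above come from type coercion: an atomic vector holds one type only. A minimal sketch (variable names are made up for illustration) showing the coercion, and a list as one way to keep the original types:

```r
nums <- 1:3
strs <- c("a", "b")

# c() on mixed atomic vectors coerces to the most general type
both <- c(nums, strs)
class(both)
#> [1] "character"

# A list keeps each piece with its own type
kept <- list(nums, strs)
class(kept[[1]])
#> [1] "integer"
```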

2. Data frames

Load the package and prepare the data

if(!require(tidyr))install.packages("tidyr")
library(tidyr)
df <- data.frame(hb=c("a,d","b,e","c,f"))
df
#>    hb
#> 1 a,d
#> 2 b,e
#> 3 c,f

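(1) Functions for splitting and merging columns

unite separate

First split column hb into two columns at the comma. (By default, separate splits at any run of non-alphanumeric characters, so the comma does not have to be given explicitly.)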

df2 <- separate(df,hb,into = c("h","b"))
df2
#>   h b
#> 1 a d
#> 2 b e
#> 3 c f

Then merge them back

df3 <- unite(df2,c("h","b"),col = hb,sep = ",")
df3
#>    hb
#> 1 a,d
#> 2 b,e
#> 3 c,f
identical(df,df3)
#> [1] FALSE
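# Reunited! The values match. identical() most likely reports FALSE because, in R before 4.0,
# data.frame() stored hb in df as a factor, while unite() returns a character column.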
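A hedged sketch of the same round trip with the separator spelled out and strings kept as character from the start. It assumes tidyr is installed and compares the column values directly instead of the whole data frames, since identical() on data frames also compares attributes.

```r
library(tidyr)

# Keep hb as character regardless of the R version's default
df <- data.frame(hb = c("a,d", "b,e", "c,f"), stringsAsFactors = FALSE)

# Split at the comma, stated explicitly this time
df2 <- separate(df, hb, into = c("h", "b"), sep = ",")

# Merge the two columns back into one
df3 <- unite(df2, col = "hb", h, b, sep = ",")

# Compare the column contents
identical(df$hb, df3$hb)
#> [1] TRUE
```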

Next come cbind, rbind and merge

Sample data

test1 <- data.frame(x = c('b','e','f'), 
                    z = c("A","B","C"),
                    stringsAsFactors = F)
test1
#>   x z
#> 1 b A
#> 2 e B
#> 3 f C
test2 <- data.frame(x = c('a','b','c','d','e','f'), 
                    y = c(1,2,3,4,5,6),
                    stringsAsFactors = F)
test2 
#>   x y
#> 1 a 1
#> 2 b 2
#> 3 c 3
#> 4 d 4
#> 5 e 5
#> 6 f 6

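Now combine them~

# cbind simply puts the columns side by side. The row counts should match; for data frames,
# the shorter one is recycled when the longer row count is an exact multiple (3 and 6 here),
# otherwise it is an error. rbind stacks rows and requires matching column names.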
cd <- cbind(test1,test2);cd
#>   x z x y
#> 1 b A a 1
#> 2 e B b 2
#> 3 f C c 3
#> 4 b A d 4
#> 5 e B e 5
#> 6 f C f 6
# merge on column x
merge(test1,test2,by="x")
#>   x z y
#> 1 b A 2
#> 2 e B 5
#> 3 f C 6
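By default merge keeps only the keys present in both data frames. A minimal sketch, rebuilding the same two data frames so it runs on its own, that shows the all = TRUE variant and what cbind does when the row counts are not multiples of each other. The 2-row and 3-row data frames are made up for illustration.

```r
test1 <- data.frame(x = c("b", "e", "f"),
                    z = c("A", "B", "C"),
                    stringsAsFactors = FALSE)
test2 <- data.frame(x = c("a", "b", "c", "d", "e", "f"),
                    y = c(1, 2, 3, 4, 5, 6),
                    stringsAsFactors = FALSE)

# Default: only rows whose x appears in both data frames
inner <- merge(test1, test2, by = "x")
nrow(inner)
#> [1] 3

# all = TRUE keeps every x; missing values are filled with NA
full <- merge(test1, test2, by = "x", all = TRUE)
full$z
#> [1] NA  "A" NA  NA  "B" "C"

# cbind on data frames with 2 and 3 rows cannot recycle and fails
res <- tryCatch(
  cbind(data.frame(a = 1:2), data.frame(b = 1:3)),
  error = function(e) "error"
)
res
#> [1] "error"
```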

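Want to split them apart again? A merge can't be undone, but a plain cbind or rbind can be reversed, and that is just data frame subsetting! (Here test1 was recycled to 6 rows, so its columns come back with the rows repeated.)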

cd[,1:ncol(test1)]
#>   x z
#> 1 b A
#> 2 e B
#> 3 f C
#> 4 b A
#> 5 e B
#> 6 f C
cd[,(ncol(test1)+1 ): ncol(cd)]
#>   x y
#> 1 a 1
#> 2 b 2
#> 3 c 3
#> 4 d 4
#> 5 e 5
#> 6 f 6
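The same idea works for rbind, this time subsetting by rows. A small sketch on two made-up data frames, `top` and `bottom`, which are not part of the original post:

```r
# Two hypothetical data frames with the same column names
top    <- data.frame(x = c("a", "b"), y = 1:2, stringsAsFactors = FALSE)
bottom <- data.frame(x = c("c", "d", "e"), y = 3:5, stringsAsFactors = FALSE)

# Stack them
stacked <- rbind(top, bottom)
nrow(stacked)
#> [1] 5

# Take them apart again using the known row counts
top_again    <- stacked[seq_len(nrow(top)), ]
bottom_again <- stacked[(nrow(top) + 1):nrow(stacked), ]

identical(top_again$x, top$x)
#> [1] TRUE
identical(bottom_again$y, bottom$y)
#> [1] TRUE
```

This only works because the row counts of the pieces are still known; once they are lost, the boundary cannot be recovered from the stacked data frame alone.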

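New to bioinformatics? We're honored to help you take the first step.

We are 生信星球, a bioinformatics knowledge platform that skips the jargon and keeps things easy to follow. Because the account was created in 2018, it has no comment feature. If you need help or have suggestions, leave a message through the account backend, contact us on WeChat, or email jieandze1314@gmail.com. Every message gets read~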