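[Roundup] The Breakups and Reunions of R: Splitting and Joining
One sentence from an expert can save a beginner half a year of running around. I'm no expert, but I can shorten the half year you would otherwise spend on detours~
As the song goes, if you don't know where to head next, why not stay here and learn some bioinformatics~
Here you'll find the learning journeys of Doudou and Huahua, from beginner to advanced. On the road of bioinformatics, we travel together!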
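Written by Huahua on 2019-09-16
I've been short of material for posts lately, so I'm planning a "Roundup" series on R. Feel free to message me in the backend and tell me what you'd like to see rounded up.
Many R functions are vectorized: they operate on a whole vector directly, so you don't need to write a loop!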
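To make "vectorized" concrete, here is a minimal sketch comparing one vectorized call with an explicit loop. The vector words and the names n_vec and n_loop are made up for illustration.

```r
# Hypothetical example vector (not from the original post)
words <- c("birch", "canoe", "planks")

# Vectorized: nchar() handles every element in one call
n_vec <- nchar(words)

# Loop version: same result, more code
n_loop <- integer(length(words))
for (i in seq_along(words)) {
  n_loop[i] <- nchar(words[i])
}

identical(n_vec, n_loop)
#> [1] TRUE
```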
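1. Strings
To split strings, str_split() is all you need: it returns a list by default, and with the argument simplify = TRUE it returns a character matrix instead.
Three functions deal with joining strings:
paste, stringr::str_c, and paste0
They all come from the same lineage! str_c comes from the stringr package in the tidyverse suite and does the same job as paste, except that its default separator is an empty string (like paste0) rather than a space. stringr is dedicated to string handling, and all of its function names start with str_, which I like a lot.
paste0 is even simpler. It also derives from paste; the only difference is that paste's default separator is a space, while paste0 has no separator. No need to memorize this, one try and you'll know. I have a motto I repeat in every class: just try it!
First, prepare the package and the example data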
if(!require(stringr))install.packages("stringr")
library(stringr)
# We use sentences, the long character vector built into stringr, as example data; to keep things simple, take only the first three elements
class(sentences)
#> [1] "character"
x <- head(sentences, 3)
x
#> [1] "The birch canoe slid on the smooth planks."
#> [2] "Glue the sheet to the dark blue background."
#> [3] "It's easy to tell the depth of a well."
(1) Splitting strings
str_split(x,pattern = " ")
#> [[1]]
#> [1] "The" "birch" "canoe" "slid" "on" "the" "smooth"
#> [8] "planks."
#>
#> [[2]]
#> [1] "Glue" "the" "sheet" "to" "the"
#> [6] "dark" "blue" "background."
#>
#> [[3]]
#> [1] "It's" "easy" "to" "tell" "the" "depth" "of" "a" "well."
str_split(x, pattern = " ", simplify = TRUE)
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] "The" "birch" "canoe" "slid" "on" "the" "smooth" "planks."
#> [2,] "Glue" "the" "sheet" "to" "the" "dark" "blue" "background."
#> [3,] "It's" "easy" "to" "tell" "the" "depth" "of" "a"
#> [,9]
#> [1,] ""
#> [2,] ""
#> [3,] "well."
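If stringr is not at hand, base R's strsplit() also returns a list, and picking one piece per sentence is again a small subsetting step. This is a sketch under that assumption; x3, parts and first_word are illustrative names, and x3 simply repeats the three sentences above so the block runs on its own.

```r
x3 <- c("The birch canoe slid on the smooth planks.",
        "Glue the sheet to the dark blue background.",
        "It's easy to tell the depth of a well.")

# strsplit() is the base R counterpart of str_split(): one list element per sentence
parts <- strsplit(x3, " ")
lengths(parts)
#> [1] 8 8 9

# Take the first word of every sentence; vapply() guarantees a character vector
first_word <- vapply(parts, function(p) p[1], character(1))
first_word
#> [1] "The"  "Glue" "It's"
```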
(2) Splitting a vector
This is really just a subsetting problem! Get a feel for it:
x[1]
#> [1] "The birch canoe slid on the smooth planks."
x[2:3]
#> [1] "Glue the sheet to the dark blue background."
#> [2] "It's easy to tell the depth of a well."
x[1:2]
#> [1] "The birch canoe slid on the smooth planks."
#> [2] "Glue the sheet to the dark blue background."
x[3]
#> [1] "It's easy to tell the depth of a well."
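Besides positive positions, negative indices and head()/tail() are handy for cutting a vector in two. A small sketch; v, k, front and back are made-up names.

```r
v <- c("a", "b", "c", "d", "e")

# A negative index drops elements: everything except the first
v[-1]
#> [1] "b" "c" "d" "e"

# Split at position k into a front part and a back part
k <- 2
front <- head(v, k)    # first k elements
back  <- tail(v, -k)   # everything after the first k
front
#> [1] "a" "b"
back
#> [1] "c" "d" "e"

# Joining the parts with c() restores the original
identical(c(front, back), v)
#> [1] TRUE
```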
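(3) Joining two vectors element by element
The key argument is the separator, sep. It defaults to a space in paste and to an empty string in str_c, and you can set it to anything you like.
y <- 1:3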
y
#> [1] 1 2 3
paste(y,x)
#> [1] "1 The birch canoe slid on the smooth planks."
#> [2] "2 Glue the sheet to the dark blue background."
#> [3] "3 It's easy to tell the depth of a well."
paste(y,x,sep="、")
#> [1] "1、The birch canoe slid on the smooth planks."
#> [2] "2、Glue the sheet to the dark blue background."
#> [3] "3、It's easy to tell the depth of a well."
str_c(y,x)
#> [1] "1The birch canoe slid on the smooth planks."
#> [2] "2Glue the sheet to the dark blue background."
#> [3] "3It's easy to tell the depth of a well."
str_c(y,x,sep = ",")
#> [1] "1,The birch canoe slid on the smooth planks."
#> [2] "2,Glue the sheet to the dark blue background."
#> [3] "3,It's easy to tell the depth of a well."
# paste0 has no separator! It is equivalent to paste(y, x, sep = "")
paste0(y,x)
#> [1] "1The birch canoe slid on the smooth planks."
#> [2] "2Glue the sheet to the dark blue background."
#> [3] "3It's easy to tell the depth of a well."
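One place where paste and str_c visibly differ is missing values. The sketch below assumes stringr is installed; the input values are made up.

```r
library(stringr)

# paste() turns NA into the text "NA"
paste("a", NA)
#> [1] "a NA"

# str_c() propagates the missing value instead
str_c("a", NA)
#> [1] NA
```

So if the data may contain NA, it is worth checking which behavior you want before choosing between the two.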
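(4) Collapsing a character vector into one long string
That is, merging the three elements of a single vector into one string! The argument for this is collapse.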
x
#> [1] "The birch canoe slid on the smooth planks."
#> [2] "Glue the sheet to the dark blue background."
#> [3] "It's easy to tell the depth of a well."
str_c(x,collapse = "//")
#> [1] "The birch canoe slid on the smooth planks.//Glue the sheet to the dark blue background.//It's easy to tell the depth of a well."
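Base R's paste() takes the same collapse argument, and it can be combined with sep: sep joins the vectors element by element first, then collapse glues the results into one string. A minimal sketch with made-up inputs a and b:

```r
a <- c("x", "y", "z")
b <- 1:3

# collapse only: one vector becomes one string
paste(a, collapse = "//")
#> [1] "x//y//z"

# sep and collapse together: pair up first, then glue
paste(a, b, sep = "-", collapse = "+")
#> [1] "x-1+y-2+z-3"
```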
(5) Joining two vectors end to end
This really shouldn't count as a problem! Just use c(); it's only that many people don't know about it.
c(y,x)
#> [1] "1"
#> [2] "2"
#> [3] "3"
#> [4] "The birch canoe slid on the smooth planks."
#> [5] "Glue the sheet to the dark blue background."
#> [6] "It's easy to tell the depth of a well."
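Because an atomic vector holds a single type, c() silently converts the numbers to character in the output above. If the original types matter, a list keeps them. A sketch with illustrative names nums, txt, joined and kept:

```r
nums <- 1:3
txt  <- c("a", "b")

# Mixing integer and character in c() yields a character vector
joined <- c(nums, txt)
class(joined)
#> [1] "character"

# A list keeps each part with its own type
kept <- list(nums = nums, txt = txt)
class(kept$nums)
#> [1] "integer"
```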
2. Data frames
Prepare the package and the data
if(!require(tidyr))install.packages("tidyr")
library(tidyr)
df <- data.frame(hb=c("a,d","b,e","c,f"))
df
#> hb
#> 1 a,d
#> 2 b,e
#> 3 c,f
(1) Functions for splitting and combining columns
unite and separate
First, split the hb column into two columns at the comma.
df2 <- separate(df, hb, into = c("h", "b"), sep = ",")
df2
#> h b
#> 1 a d
#> 2 b e
#> 3 c f
Then combine them back
df3 <- unite(df2, col = "hb", h, b, sep = ",")
df3
#> hb
#> 1 a,d
#> 2 b,e
#> 3 c,f
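identical(df,df3)
#> [1] FALSE
# A reunion, at least in content! The values match; identical() still reports FALSE in this run, most likely because data.frame() (before R 4.0) stored hb as a factor by default, while unite() returns a character column.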
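Why can identical() say FALSE when the two printed data frames look the same? One likely cause is the column type: before R 4.0, data.frame() turned strings into factors by default. The sketch below sets stringsAsFactors explicitly so that it behaves the same in any R version; df_f and df_c are illustrative names.

```r
df_f <- data.frame(hb = c("a,d", "b,e", "c,f"), stringsAsFactors = TRUE)
df_c <- data.frame(hb = c("a,d", "b,e", "c,f"), stringsAsFactors = FALSE)

class(df_f$hb)
#> [1] "factor"
class(df_c$hb)
#> [1] "character"

# Same printed content, different types
identical(df_f, df_c)
#> [1] FALSE

# Comparing the values as character shows they match
identical(as.character(df_f$hb), df_c$hb)
#> [1] TRUE
```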
Next come cbind, rbind, and merge
Example data
test1 <- data.frame(x = c('b','e','f'),
                    z = c("A","B","C"),
                    stringsAsFactors = FALSE)
test1
#> x z
#> 1 b A
#> 2 e B
#> 3 f C
test2 <- data.frame(x = c('a','b','c','d','e','f'),
                    y = c(1,2,3,4,5,6),
                    stringsAsFactors = FALSE)
test2
#> x y
#> 1 a 1
#> 2 b 2
#> 3 c 3
#> 4 d 4
#> 5 e 5
#> 6 f 6
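Let's combine them~
# cbind simply puts the columns side by side: the row counts must match, or the larger must be a whole multiple of the smaller (as here, where test1's 3 rows are recycled to fill 6 rows); otherwise it throws an error. rbind stacks rows and requires the two data frames to have the same columns.
cd <- cbind(test1, test2)
cd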
#> x z x y
#> 1 b A a 1
#> 2 e B b 2
#> 3 f C c 3
#> 4 b A d 4
#> 5 e B e 5
#> 6 f C f 6
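When cbind and rbind refuse to combine data frames can be checked with tryCatch(). This is a sketch with made-up data frames a2, b3 and b4; fails() is a hypothetical helper written for this example, not a base R function.

```r
a2 <- data.frame(a = 1:2)
b3 <- data.frame(b = 1:3)
b4 <- data.frame(b = 1:4)

# Hypothetical helper: TRUE if evaluating the expression throws an error
fails <- function(expr) {
  inherits(tryCatch(expr, error = function(e) e), "error")
}

fails(cbind(a2, b3))  # 3 is not a multiple of 2
#> [1] TRUE
fails(cbind(a2, b4))  # 4 is a multiple of 2, so a2 is recycled
#> [1] FALSE
fails(rbind(a2, b3))  # column names differ
#> [1] TRUE
```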
# Merge on the x column; by default only rows whose x value appears in both data frames are kept
merge(test1,test2,by="x")
#> x z y
#> 1 b A 2
#> 2 e B 5
#> 3 f C 6
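By default merge keeps only the keys present in both data frames. Its all, all.x and all.y arguments keep unmatched rows and fill the gaps with NA. The sketch rebuilds the two example data frames under the names t1 and t2 so that it runs on its own; full is an illustrative name.

```r
t1 <- data.frame(x = c("b", "e", "f"),
                 z = c("A", "B", "C"),
                 stringsAsFactors = FALSE)
t2 <- data.frame(x = c("a", "b", "c", "d", "e", "f"),
                 y = 1:6,
                 stringsAsFactors = FALSE)

# all = TRUE keeps every key found in either data frame
full <- merge(t1, t2, by = "x", all = TRUE)
full
#>   x    z y
#> 1 a <NA> 1
#> 2 b    A 2
#> 3 c <NA> 3
#> 4 d <NA> 4
#> 5 e    B 5
#> 6 f    C 6
```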
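Want to split them apart again? There is no going back from merge, but a plain cbind or rbind can be undone, and that is simply a matter of subsetting the data frame! (Here the first part comes back with test1's rows repeated to six, because that is how cbind stored them.)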
cd[,1:ncol(test1)]
#> x z
#> 1 b A
#> 2 e B
#> 3 f C
#> 4 b A
#> 5 e B
#> 6 f C
cd[, (ncol(test1) + 1):ncol(cd)]
#> x y
#> 1 a 1
#> 2 b 2
#> 3 c 3
#> 4 d 4
#> 5 e 5
#> 6 f 6
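The same subsetting idea undoes rbind, this time by rows. A sketch with made-up data frames top and bottom:

```r
top    <- data.frame(x = c("a", "b"), y = 1:2, stringsAsFactors = FALSE)
bottom <- data.frame(x = c("c", "d", "e"), y = 3:5, stringsAsFactors = FALSE)

rb <- rbind(top, bottom)
nrow(rb)
#> [1] 5

# The first nrow(top) rows are top, the remaining rows are bottom
top_again    <- rb[seq_len(nrow(top)), ]
bottom_again <- rb[-seq_len(nrow(top)), ]

identical(top_again$x, top$x)
#> [1] TRUE
identical(bottom_again$y, bottom$y)
#> [1] TRUE
```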
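Just starting out in bioinformatics? We're honored to help you take the first step.
We are 生信星球 (Bioinformatics Planet), a bioinformatics knowledge platform that is easy to follow and doesn't show off jargon. Because this account was only created in 2018, it has no comment feature. If you need help or have suggestions, leave a message in the backend, contact us on WeChat, or send an email to jieandze1314@gmail.com; every message will be read~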