Quickly Creating Three-Line Tables in R
Try to learn everything about something!
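A baseline characteristics table is indispensable in clinical research, and it is usually the first table in your paper!
We usually build this table in Word or Excel, but that is tedious and involves endless copying and pasting.
The tableone package introduced today was built specifically for baseline tables. It is similar to the compareGroups package covered in an earlier post, but it goes beyond baseline tables and can handle many other descriptive-statistics tasks.
It supports continuous and categorical variables, reports p-values automatically, and also supports weighted data, so the whole job is done in one step!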
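Installation
Usage
Basic descriptive statistics
Selecting variables & specifying variable types
Showing all levels
Non-normally distributed variables
Stratified display
Export
Comparing the packages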
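Installation
tableone is available on CRAN, so it can be installed with install.packages(); the development version can be installed from GitHub.
install.packages("tableone")
# Optional: development version from GitHub
# install.packages("devtools")
# devtools::install_github(repo = "kaz-yos/tableone", ref = "develop")
If you have trouble installing R packages, add me on WeChat or leave a message in the comments.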
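Usage
For ease of comparison, we again use the predimed dataset from the earlier post.
PREDIMED is a multicenter trial conducted in Spain in which participants at vascular risk, but without cardiovascular disease at enrollment, were randomly assigned to one of three diets: a Mediterranean diet supplemented with extra-virgin olive oil (MedDiet+VOO), a Mediterranean diet supplemented with mixed nuts (MedDiet+Nuts), or a control diet (advice to reduce dietary fat). The primary endpoint was the occurrence of a cardiovascular event (myocardial infarction, stroke, or death from cardiovascular causes).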
library(compareGroups)
library(tableone)
data("predimed")
str(predimed)
## 'data.frame': 6324 obs. of 15 variables:
## $ group : Factor w/ 3 levels "Control","MedDiet + Nuts",..: 1 1 3 2 3 1 3 3 1 1 ...
## ..- attr(*, "label")= chr "Intervention group"
## $ sex : Factor w/ 2 levels "Male","Female": 1 1 2 1 2 1 2 1 1 1 ...
## ..- attr(*, "label")= chr "Sex"
## $ age : num 58 77 72 71 79 63 75 66 71 76 ...
## ..- attr(*, "label")= chr "Age"
## $ smoke : Factor w/ 3 levels "Never","Current",..: 3 2 3 3 1 3 1 1 3 2 ...
## ..- attr(*, "label")= chr "Smoking"
## $ bmi : num 33.5 31.1 30.9 27.7 35.9 ...
## ..- attr(*, "label")= chr "Body mass index"
## $ waist : num 122 119 106 118 129 143 88 85 90 79 ...
## ..- attr(*, "label")= chr "Waist circumference"
## $ wth : num 0.753 0.73 0.654 0.694 0.806 ...
## ..- attr(*, "label")= chr "Waist-to-height ratio"
## $ htn : Factor w/ 2 levels "No","Yes": 1 2 1 2 2 2 1 2 2 2 ...
## ..- attr(*, "label")= chr "Hypertension"
## $ diab : Factor w/ 2 levels "No","Yes": 1 2 2 1 1 2 2 2 1 2 ...
## ..- attr(*, "label")= chr "Type-2 diabetes"
## $ hyperchol: Factor w/ 2 levels "No","Yes": 2 1 1 2 2 2 2 1 2 1 ...
## ..- attr(*, "label")= chr "Dyslipidemia"
## $ famhist : Factor w/ 2 levels "No","Yes": 1 1 2 1 1 1 1 2 1 1 ...
## ..- attr(*, "label")= chr "Family history of premature CHD"
## $ hormo : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 NA NA 1 1 1 ...
## ..- attr(*, "label")= chr "Hormone-replacement therapy"
## $ p14 : num 10 10 8 8 9 9 8 9 14 9 ...
## ..- attr(*, "label")= chr "MeDiet Adherence score"
## $ toevent : num 5.37 6.1 5.95 2.91 4.76 ...
## ..- attr(*, "label")= chr "follow-up to main event (years)"
## $ event : Factor w/ 2 levels "No","Yes": 2 1 1 2 1 2 1 1 2 1 ...
## ..- attr(*, "label")= Named chr "AMI, stroke, or CV Death"
## .. ..- attr(*, "names")= chr "varlabel"
Basic descriptive statistics
First, the basic descriptive functionality: the CreateTableOne() function summarizes the whole dataset:
(tab <- CreateTableOne(data = predimed))
##
## Overall
## n 6324
## group (%)
## Control 2042 (32.3)
## MedDiet + Nuts 2100 (33.2)
## MedDiet + VOO 2182 (34.5)
## sex = Female (%) 3645 (57.6)
## age (mean (SD)) 67.01 (6.17)
## smoke (%)
## Never 3892 (61.5)
## Current 858 (13.6)
## Former 1574 (24.9)
## bmi (mean (SD)) 29.97 (3.82)
## waist (mean (SD)) 100.36 (10.59)
## wth (mean (SD)) 0.63 (0.07)
## htn = Yes (%) 5235 (82.8)
## diab = Yes (%) 3002 (47.5)
## hyperchol = Yes (%) 4578 (72.4)
## famhist = Yes (%) 1429 (22.6)
## hormo = Yes (%) 97 ( 1.7)
## p14 (mean (SD)) 8.68 (1.94)
## toevent (mean (SD)) 4.36 (1.69)
## event = Yes (%) 252 ( 4.0)
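Selecting variables & specifying variable types
The vars argument specifies which variables to keep, and the factorVars argument specifies which of them are categorical.
The categorical variables in predimed are already stored as factors, however, so factorVars could be omitted here.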
CreateTableOne(data = predimed,
vars = c("group","sex","bmi","waist","wth"),
factorVars = c("group","sex")
)
##
## Overall
## n 6324
## group (%)
## Control 2042 (32.3)
## MedDiet + Nuts 2100 (33.2)
## MedDiet + VOO 2182 (34.5)
## sex = Female (%) 3645 (57.6)
## bmi (mean (SD)) 29.97 (3.82)
## waist (mean (SD)) 100.36 (10.59)
## wth (mean (SD)) 0.63 (0.07)
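Because the categorical variables in predimed are already factors, the effect of factorVars is easier to see on data where categories are stored as numeric codes. The following is a minimal sketch on a made-up data frame (toy and its columns are hypothetical and not part of predimed):

```r
library(tableone)

# Hypothetical toy data: diabetes and stage are numeric codes, not factors
toy <- data.frame(
  age      = c(58, 77, 72, 71, 79, 63, 75, 66),
  diabetes = c(0, 1, 1, 0, 0, 1, 1, 1),
  stage    = c(1, 2, 3, 2, 1, 3, 2, 1)
)
vars <- c("age", "diabetes", "stage")

# Without factorVars the numeric codes are summarized as continuous variables
tab_num <- CreateTableOne(data = toy, vars = vars)

# With factorVars they are summarized as counts and percentages
tab_cat <- CreateTableOne(data = toy, vars = vars,
                          factorVars = c("diabetes", "stage"))

# print() returns the formatted table as a character matrix
m_num <- print(tab_num, printToggle = FALSE)
m_cat <- print(tab_cat, printToggle = FALSE)

rownames(m_num)
rownames(m_cat)
```

Comparing the two sets of row names shows which variables were treated as continuous and which as categorical.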
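Showing all levels
In the first table, many categorical variables were collapsed: two-level variables such as sex, htn, and diab show only one of their levels (for example Yes). Adding showAllLevels = TRUE to print() displays every level!
print(tab, showAllLevels = TRUE)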
##
## level Overall
## n 6324
## group (%) Control 2042 (32.3)
## MedDiet + Nuts 2100 (33.2)
## MedDiet + VOO 2182 (34.5)
## sex (%) Male 2679 (42.4)
## Female 3645 (57.6)
## age (mean (SD)) 67.01 (6.17)
## smoke (%) Never 3892 (61.5)
## Current 858 (13.6)
## Former 1574 (24.9)
## bmi (mean (SD)) 29.97 (3.82)
## waist (mean (SD)) 100.36 (10.59)
## wth (mean (SD)) 0.63 (0.07)
## htn (%) No 1089 (17.2)
## Yes 5235 (82.8)
## diab (%) No 3322 (52.5)
## Yes 3002 (47.5)
## hyperchol (%) No 1746 (27.6)
## Yes 4578 (72.4)
## famhist (%) No 4895 (77.4)
## Yes 1429 (22.6)
## hormo (%) No 5564 (98.3)
## Yes 97 ( 1.7)
## p14 (mean (SD)) 8.68 (1.94)
## toevent (mean (SD)) 4.36 (1.69)
## event (%) No 6072 (96.0)
## Yes 252 ( 4.0)
For two-level variables, the cramVars argument achieves a similar effect, but note that both levels are then shown on a single line:
print(tab, cramVars = c("sex","htn","diab"))
##
## Overall
## n 6324
## group (%)
## Control 2042 (32.3)
## MedDiet + Nuts 2100 (33.2)
## MedDiet + VOO 2182 (34.5)
## sex = Male/Female (%) 2679/3645 (42.4/57.6)
## age (mean (SD)) 67.01 (6.17)
## smoke (%)
## Never 3892 (61.5)
## Current 858 (13.6)
## Former 1574 (24.9)
## bmi (mean (SD)) 29.97 (3.82)
## waist (mean (SD)) 100.36 (10.59)
## wth (mean (SD)) 0.63 (0.07)
## htn = No/Yes (%) 1089/5235 (17.2/82.8)
## diab = No/Yes (%) 3322/3002 (52.5/47.5)
## hyperchol = Yes (%) 4578 (72.4)
## famhist = Yes (%) 1429 (22.6)
## hormo = Yes (%) 97 ( 1.7)
## p14 (mean (SD)) 8.68 (1.94)
## toevent (mean (SD)) 4.36 (1.69)
## event = Yes (%) 252 ( 4.0)
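Non-normally distributed variables
By default, continuous variables are treated as normally distributed and summarized as mean (SD). Variables that are not normally distributed are better summarized as median [IQR], printed as the median followed by the 25th and 75th percentiles. The nonnormal argument specifies these variables.
print(tab,
      showAllLevels = TRUE,
      nonnormal = c("p14","toevent")
)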
##
## level Overall
## n 6324
## group (%) Control 2042 (32.3)
## MedDiet + Nuts 2100 (33.2)
## MedDiet + VOO 2182 (34.5)
## sex (%) Male 2679 (42.4)
## Female 3645 (57.6)
## age (mean (SD)) 67.01 (6.17)
## smoke (%) Never 3892 (61.5)
## Current 858 (13.6)
## Former 1574 (24.9)
## bmi (mean (SD)) 29.97 (3.82)
## waist (mean (SD)) 100.36 (10.59)
## wth (mean (SD)) 0.63 (0.07)
## htn (%) No 1089 (17.2)
## Yes 5235 (82.8)
## diab (%) No 3322 (52.5)
## Yes 3002 (47.5)
## hyperchol (%) No 1746 (27.6)
## Yes 4578 (72.4)
## famhist (%) No 4895 (77.4)
## Yes 1429 (22.6)
## hormo (%) No 5564 (98.3)
## Yes 97 ( 1.7)
## p14 (median [IQR]) 9.00 [7.00, 10.00]
## toevent (median [IQR]) 4.79 [2.86, 5.79]
## event (%) No 6072 (96.0)
## Yes 252 ( 4.0)
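How do you decide which variables belong in nonnormal? One possible approach, sketched below, is to inspect the detailed summary of the TableOne object and compare mean and median directly. The choice of p14 and toevent simply follows the example above; it is an assumption, not the result of a formal normality test.

```r
library(tableone)
library(compareGroups)   # only needed for the predimed dataset
data("predimed")

tab <- CreateTableOne(data = predimed)

# Detailed summary: missingness, quantiles, skewness and kurtosis
# for continuous variables, plus counts for categorical ones
summary(tab)

# A quick base-R comparison of mean and median for the candidates
candidates <- c("p14", "toevent")
res <- sapply(predimed[candidates], function(x) {
  c(mean = mean(x, na.rm = TRUE), median = median(x, na.rm = TRUE))
})
res
```

A clear gap between mean and median, or a large skewness value in the summary, suggests reporting the variable as median [IQR].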
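Stratified display
When writing a paper, however, you usually need to present the statistics by group and then test whether the groups differ!
The strata argument does this and reports p-values automatically.
tab_s <- CreateTableOne(data = predimed, vars = colnames(predimed)[-1], strata = "group")
Show the table with all levels expanded:
print(tab_s, showAllLevels = TRUE)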
## Stratified by group
## level Control MedDiet + Nuts MedDiet + VOO
## n 2042 2100 2182
## sex (%) Male 812 (39.8) 968 (46.1) 899 (41.2)
## Female 1230 (60.2) 1132 (53.9) 1283 (58.8)
## age (mean (SD)) 67.34 (6.28) 66.68 (6.02) 67.02 (6.21)
## smoke (%) Never 1282 (62.8) 1259 (60.0) 1351 (61.9)
## Current 270 (13.2) 296 (14.1) 292 (13.4)
## Former 490 (24.0) 545 (26.0) 539 (24.7)
## bmi (mean (SD)) 30.28 (3.96) 29.69 (3.77) 29.94 (3.71)
## waist (mean (SD)) 100.84 (10.77) 100.19 (10.56) 100.08 (10.44)
## wth (mean (SD)) 0.63 (0.07) 0.62 (0.06) 0.63 (0.06)
## htn (%) No 331 (16.2) 362 (17.2) 396 (18.1)
## Yes 1711 (83.8) 1738 (82.8) 1786 (81.9)
## diab (%) No 1072 (52.5) 1150 (54.8) 1100 (50.4)
## Yes 970 (47.5) 950 (45.2) 1082 (49.6)
## hyperchol (%) No 563 (27.6) 561 (26.7) 622 (28.5)
## Yes 1479 (72.4) 1539 (73.3) 1560 (71.5)
## famhist (%) No 1580 (77.4) 1640 (78.1) 1675 (76.8)
## Yes 462 (22.6) 460 (21.9) 507 (23.2)
## hormo (%) No 1811 (98.3) 1835 (98.4) 1918 (98.2)
## Yes 31 ( 1.7) 30 ( 1.6) 36 ( 1.8)
## p14 (mean (SD)) 8.44 (1.94) 8.81 (1.90) 8.77 (1.97)
## toevent (mean (SD)) 4.09 (1.74) 4.31 (1.70) 4.64 (1.60)
## event (%) No 1945 (95.2) 2030 (96.7) 2097 (96.1)
## Yes 97 ( 4.8) 70 ( 3.3) 85 ( 3.9)
## Stratified by group
## p test
## n
## sex (%) <0.001
##
## age (mean (SD)) 0.003
## smoke (%) 0.444
##
##
## bmi (mean (SD)) <0.001
## waist (mean (SD)) 0.045
## wth (mean (SD)) <0.001
## htn (%) 0.249
##
## diab (%) 0.017
##
## hyperchol (%) 0.423
##
## famhist (%) 0.581
##
## hormo (%) 0.850
##
## p14 (mean (SD)) <0.001
## toevent (mean (SD)) <0.001
## event (%) 0.064
##
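In a stratified table, the nonnormal argument changes not only the summary but also the test: variables listed there are compared with a nonparametric test instead of the default one. The sketch below also requests standardized mean differences via smd = TRUE; treating p14 and toevent as non-normal is again an assumption carried over from the earlier example.

```r
library(tableone)
library(compareGroups)   # only needed for the predimed dataset
data("predimed")

tab_s <- CreateTableOne(data = predimed,
                        vars = colnames(predimed)[-1],
                        strata = "group")

# printToggle = FALSE suppresses console output and returns a character matrix
m <- print(tab_s,
           nonnormal = c("p14", "toevent"),
           smd = TRUE,
           printToggle = FALSE)

# The "test" column marks the variables that used the nonparametric test
m[, c("p", "test")]
```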
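Export
tableone does not provide a direct way to export to Word, however; the table can only be exported to a csv file, which is a bit of a letdown.
tab_sv <- print(tab_s, showAllLevels = TRUE, printToggle = FALSE)
write.csv(tab_sv, file = "tab_sv.csv")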
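The exported csv is easier to clean up if the padding used for console alignment is removed first. The sketch below uses the noSpaces argument of print() for that and moves the row names into a regular column. The last step is optional and assumes the separate flextable package is installed; it is one possible route to a .docx file, not something tableone itself provides. The file names are placeholders.

```r
library(tableone)
library(compareGroups)   # only needed for the predimed dataset
data("predimed")

tab_s <- CreateTableOne(data = predimed,
                        vars = colnames(predimed)[-1],
                        strata = "group")

# noSpaces = TRUE strips the alignment padding from every cell
tab_mat <- print(tab_s, showAllLevels = TRUE,
                 printToggle = FALSE, noSpaces = TRUE)

# Keep the variable names as an ordinary first column
tab_out <- cbind(variable = rownames(tab_mat), tab_mat)
rownames(tab_out) <- NULL

csv_path <- file.path(tempdir(), "tab_sv.csv")   # placeholder path
write.csv(tab_out, file = csv_path, row.names = FALSE)

# Optional: write a Word file if flextable happens to be installed
if (requireNamespace("flextable", quietly = TRUE)) {
  ft <- flextable::flextable(as.data.frame(tab_out, stringsAsFactors = FALSE))
  flextable::save_as_docx(ft, path = file.path(tempdir(), "tab_sv.docx"))
}
```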
Comparing the packages
As you can see, tableone needs 2 lines of code to produce a table:
tab_s <- CreateTableOne(data = predimed, vars = colnames(predimed)[-1], strata = "group")
print(tab_s, showAllLevels = TRUE)
## Stratified by group
## level Control MedDiet + Nuts MedDiet + VOO
## n 2042 2100 2182
## sex (%) Male 812 (39.8) 968 (46.1) 899 (41.2)
## Female 1230 (60.2) 1132 (53.9) 1283 (58.8)
## age (mean (SD)) 67.34 (6.28) 66.68 (6.02) 67.02 (6.21)
## smoke (%) Never 1282 (62.8) 1259 (60.0) 1351 (61.9)
## Current 270 (13.2) 296 (14.1) 292 (13.4)
## Former 490 (24.0) 545 (26.0) 539 (24.7)
## bmi (mean (SD)) 30.28 (3.96) 29.69 (3.77) 29.94 (3.71)
## waist (mean (SD)) 100.84 (10.77) 100.19 (10.56) 100.08 (10.44)
## wth (mean (SD)) 0.63 (0.07) 0.62 (0.06) 0.63 (0.06)
## htn (%) No 331 (16.2) 362 (17.2) 396 (18.1)
## Yes 1711 (83.8) 1738 (82.8) 1786 (81.9)
## diab (%) No 1072 (52.5) 1150 (54.8) 1100 (50.4)
## Yes 970 (47.5) 950 (45.2) 1082 (49.6)
## hyperchol (%) No 563 (27.6) 561 (26.7) 622 (28.5)
## Yes 1479 (72.4) 1539 (73.3) 1560 (71.5)
## famhist (%) No 1580 (77.4) 1640 (78.1) 1675 (76.8)
## Yes 462 (22.6) 460 (21.9) 507 (23.2)
## hormo (%) No 1811 (98.3) 1835 (98.4) 1918 (98.2)
## Yes 31 ( 1.7) 30 ( 1.6) 36 ( 1.8)
## p14 (mean (SD)) 8.44 (1.94) 8.81 (1.90) 8.77 (1.97)
## toevent (mean (SD)) 4.09 (1.74) 4.31 (1.70) 4.64 (1.60)
## event (%) No 1945 (95.2) 2030 (96.7) 2097 (96.1)
## Yes 97 ( 4.8) 70 ( 3.3) 85 ( 3.9)
## Stratified by group
## p test
## n
## sex (%) <0.001
##
## age (mean (SD)) 0.003
## smoke (%) 0.444
##
##
## bmi (mean (SD)) <0.001
## waist (mean (SD)) 0.045
## wth (mean (SD)) <0.001
## htn (%) 0.249
##
## diab (%) 0.017
##
## hyperchol (%) 0.423
##
## famhist (%) 0.581
##
## hormo (%) 0.850
##
## p14 (mean (SD)) <0.001
## toevent (mean (SD)) <0.001
## event (%) 0.064
##
whereas compareGroups needs only 1 line:
descrTable(group ~ ., data = predimed)
##
## --------Summary descriptives table by 'Intervention group'---------
##
## ____________________________________________________________________________________
## Control MedDiet + Nuts MedDiet + VOO p.overall
## N=2042 N=2100 N=2182
## ˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉ
## Sex: <0.001
## Male 812 (39.8%) 968 (46.1%) 899 (41.2%)
## Female 1230 (60.2%) 1132 (53.9%) 1283 (58.8%)
## Age 67.3 (6.28) 66.7 (6.02) 67.0 (6.21) 0.003
## Smoking: 0.444
## Never 1282 (62.8%) 1259 (60.0%) 1351 (61.9%)
## Current 270 (13.2%) 296 (14.1%) 292 (13.4%)
## Former 490 (24.0%) 545 (26.0%) 539 (24.7%)
## Body mass index 30.3 (3.96) 29.7 (3.77) 29.9 (3.71) <0.001
## Waist circumference 101 (10.8) 100 (10.6) 100 (10.4) 0.045
## Waist-to-height ratio 0.63 (0.07) 0.62 (0.06) 0.63 (0.06) <0.001
## Hypertension: 0.249
## No 331 (16.2%) 362 (17.2%) 396 (18.1%)
## Yes 1711 (83.8%) 1738 (82.8%) 1786 (81.9%)
## Type-2 diabetes: 0.017
## No 1072 (52.5%) 1150 (54.8%) 1100 (50.4%)
## Yes 970 (47.5%) 950 (45.2%) 1082 (49.6%)
## Dyslipidemia: 0.423
## No 563 (27.6%) 561 (26.7%) 622 (28.5%)
## Yes 1479 (72.4%) 1539 (73.3%) 1560 (71.5%)
## Family history of premature CHD: 0.581
## No 1580 (77.4%) 1640 (78.1%) 1675 (76.8%)
## Yes 462 (22.6%) 460 (21.9%) 507 (23.2%)
## Hormone-replacement therapy: 0.850
## No 1811 (98.3%) 1835 (98.4%) 1918 (98.2%)
## Yes 31 (1.68%) 30 (1.61%) 36 (1.84%)
## MeDiet Adherence score 8.44 (1.94) 8.81 (1.90) 8.77 (1.97) <0.001
## follow-up to main event (years) 4.09 (1.74) 4.31 (1.70) 4.64 (1.60) <0.001
## AMI, stroke, or CV Death: 0.064
## No 1945 (95.2%) 2030 (96.7%) 2097 (96.1%)
## Yes 97 (4.75%) 70 (3.33%) 85 (3.90%)
## ˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉ
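It can also export directly to Word, producing a three-line table right away!
Which one to use? I don't need to spell it out, do I?
For how to use the compareGroups package, see the earlier post: "Generating a baseline table with 1 line of code using the compareGroups package" (使用compareGroups包1行代码生成基线资料表)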
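That's all for today; I hope it helps! Likes, shares, and follows are welcome!
Feel free to leave a message in the comments or add me on WeChat directly!
Follow the WeChat public account: 医学和生信笔记
The 医学和生信笔记 public account mainly shares: 1. general medical knowledge and colorectal surgery knowledge; 2. data analysis, visualization, and machine learning with R and Python; 3. bioinformatics learning materials and my own study notes!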