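Generating a baseline characteristics table in one line of code with the compareGroups package
With a single line of code, this package can produce baseline characteristics tables, univariate analysis tables, multivariate analysis tables, and more. The results can be exported directly to csv, Excel, Word, Markdown, LaTeX, or PDF, and the output looks good, which saves a great deal of time.
In my view it has only one minor drawback: the choice of statistical methods is limited.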
Installation
Usage
Parameter tuning
Installation
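# Choose one of the two options
# Option 1: release version from CRAN
install.packages("compareGroups")
# Option 2: development version from GitHub (requires the devtools package)
# library(devtools)
devtools::install_github(repo = "isubirana/compareGroups")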
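Usage
A single line of code is now enough to generate a baseline characteristics table. The demonstration below uses the predimed dataset that ships with the package.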
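"PREDIMED is a multicenter trial conducted in Spain. Participants who were at vascular risk but had no cardiovascular disease at enrollment were randomly assigned to one of three diets: a Mediterranean diet supplemented with extra-virgin olive oil (MedDiet + VOO), a Mediterranean diet supplemented with mixed nuts (MedDiet + Nuts), or a control diet (advice to reduce dietary fat). The primary endpoint was the occurrence of a cardiovascular event (myocardial infarction, stroke, or death from cardiovascular causes)."
This dataset is about as clinical as it gets! Anyone doing clinical research will find it familiar, which makes it easy to swap in your own data.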
library(compareGroups)
## Warning: package 'compareGroups' was built under R version 4.1.3
data("predimed")
dim(predimed)
## [1] 6324 15
str(predimed)
## 'data.frame': 6324 obs. of 15 variables:
## $ group : Factor w/ 3 levels "Control","MedDiet + Nuts",..: 1 1 3 2 3 1 3 3 1 1 ...
## ..- attr(*, "label")= chr "Intervention group"
## $ sex : Factor w/ 2 levels "Male","Female": 1 1 2 1 2 1 2 1 1 1 ...
## ..- attr(*, "label")= chr "Sex"
## $ age : num 58 77 72 71 79 63 75 66 71 76 ...
## ..- attr(*, "label")= chr "Age"
## $ smoke : Factor w/ 3 levels "Never","Current",..: 3 2 3 3 1 3 1 1 3 2 ...
## ..- attr(*, "label")= chr "Smoking"
## $ bmi : num 33.5 31.1 30.9 27.7 35.9 ...
## ..- attr(*, "label")= chr "Body mass index"
## $ waist : num 122 119 106 118 129 143 88 85 90 79 ...
## ..- attr(*, "label")= chr "Waist circumference"
## $ wth : num 0.753 0.73 0.654 0.694 0.806 ...
## ..- attr(*, "label")= chr "Waist-to-height ratio"
## $ htn : Factor w/ 2 levels "No","Yes": 1 2 1 2 2 2 1 2 2 2 ...
## ..- attr(*, "label")= chr "Hypertension"
## $ diab : Factor w/ 2 levels "No","Yes": 1 2 2 1 1 2 2 2 1 2 ...
## ..- attr(*, "label")= chr "Type-2 diabetes"
## $ hyperchol: Factor w/ 2 levels "No","Yes": 2 1 1 2 2 2 2 1 2 1 ...
## ..- attr(*, "label")= chr "Dyslipidemia"
## $ famhist : Factor w/ 2 levels "No","Yes": 1 1 2 1 1 1 1 2 1 1 ...
## ..- attr(*, "label")= chr "Family history of premature CHD"
## $ hormo : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 NA NA 1 1 1 ...
## ..- attr(*, "label")= chr "Hormone-replacement therapy"
## $ p14 : num 10 10 8 8 9 9 8 9 14 9 ...
## ..- attr(*, "label")= chr "MeDiet Adherence score"
## $ toevent : num 5.37 6.1 5.95 2.91 4.76 ...
## ..- attr(*, "label")= chr "follow-up to main event (years)"
## $ event : Factor w/ 2 levels "No","Yes": 2 1 1 2 1 2 1 1 2 1 ...
## ..- attr(*, "label")= Named chr "AMI, stroke, or CV Death"
## .. ..- attr(*, "names")= chr "varlabel"
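That is what the data look like. Most columns are common baseline variables such as sex, age, BMI, waist circumference, diabetes, hypertension, and family history, and one column records whether a cardiovascular event occurred.
Note: categorical variables must be converted to factors!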
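As a minimal sketch of that factor conversion in base R: the data frame df and its columns below are hypothetical and stand in for your own data, where categorical variables often arrive as numeric codes or character strings.

```r
# Hypothetical data: 'sex' is stored as a numeric code, 'htn' as text
df <- data.frame(
  sex = c(1, 2, 2, 1, 2),
  htn = c("No", "Yes", "Yes", "No", "Yes"),
  age = c(58, 77, 72, 71, 79)
)

# Convert the categorical columns to factors with explicit levels and labels
df$sex <- factor(df$sex, levels = c(1, 2), labels = c("Male", "Female"))
df$htn <- factor(df$htn, levels = c("No", "Yes"))

# Check the result: sex and htn are now factors, age stays numeric
str(df)
```

Setting the levels explicitly keeps the order of the categories, and therefore the row order in the resulting table, under your control.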
restab <- descrTable(group ~ ., data = predimed)
restab
##
## --------Summary descriptives table by 'Intervention group'---------
##
## ____________________________________________________________________________________
## Control MedDiet + Nuts MedDiet + VOO p.overall
## N=2042 N=2100 N=2182
## ˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉ
## Sex: <0.001
## Male 812 (39.8%) 968 (46.1%) 899 (41.2%)
## Female 1230 (60.2%) 1132 (53.9%) 1283 (58.8%)
## Age 67.3 (6.28) 66.7 (6.02) 67.0 (6.21) 0.003
## Smoking: 0.444
## Never 1282 (62.8%) 1259 (60.0%) 1351 (61.9%)
## Current 270 (13.2%) 296 (14.1%) 292 (13.4%)
## Former 490 (24.0%) 545 (26.0%) 539 (24.7%)
## Body mass index 30.3 (3.96) 29.7 (3.77) 29.9 (3.71) <0.001
## Waist circumference 101 (10.8) 100 (10.6) 100 (10.4) 0.045
## Waist-to-height ratio 0.63 (0.07) 0.62 (0.06) 0.63 (0.06) <0.001
## Hypertension: 0.249
## No 331 (16.2%) 362 (17.2%) 396 (18.1%)
## Yes 1711 (83.8%) 1738 (82.8%) 1786 (81.9%)
## Type-2 diabetes: 0.017
## No 1072 (52.5%) 1150 (54.8%) 1100 (50.4%)
## Yes 970 (47.5%) 950 (45.2%) 1082 (49.6%)
## Dyslipidemia: 0.423
## No 563 (27.6%) 561 (26.7%) 622 (28.5%)
## Yes 1479 (72.4%) 1539 (73.3%) 1560 (71.5%)
## Family history of premature CHD: 0.581
## No 1580 (77.4%) 1640 (78.1%) 1675 (76.8%)
## Yes 462 (22.6%) 460 (21.9%) 507 (23.2%)
## Hormone-replacement therapy: 0.850
## No 1811 (98.3%) 1835 (98.4%) 1918 (98.2%)
## Yes 31 (1.68%) 30 (1.61%) 36 (1.84%)
## MeDiet Adherence score 8.44 (1.94) 8.81 (1.90) 8.77 (1.97) <0.001
## follow-up to main event (years) 4.09 (1.74) 4.31 (1.70) 4.64 (1.60) <0.001
## AMI, stroke, or CV Death: 0.064
## No 1945 (95.2%) 2030 (96.7%) 2097 (96.1%)
## Yes 97 (4.75%) 70 (3.33%) 85 (3.90%)
## ˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉˉ
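A table like this is already fairly comprehensive, and it can be exported straight to Word for minor adjustments.
export2word(restab, file = 'table1.docx') # exported directly in three-line table format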
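The other export formats mentioned at the top follow the same pattern. The sketch below uses export2csv() and export2latex() from compareGroups. The output directory (a temporary folder) and the file names are assumptions made for illustration, and the table is rebuilt on a few variables so that the example stands alone.

```r
library(compareGroups)
data("predimed")

# Rebuild a small table so this example is self-contained
restab <- descrTable(group ~ age + sex + bmi, data = predimed)

# Assumed output location: a temporary directory
out_dir <- tempdir()
csv_file <- file.path(out_dir, "table1.csv")
tex_file <- file.path(out_dir, "table1.tex")

# Write the same table as CSV and as LaTeX code
export2csv(restab, file = csv_file)
export2latex(restab, file = tex_file)
```

Replace out_dir with your project folder to keep the files.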
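Parameter tuning
There are a great many parameters that can be adjusted. Here is a brief introduction to a few of the most commonly used ones.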
descrTable(formula, data, subset, na.action = NULL, y = NULL, Xext = NULL,
selec = NA, method = 1, timemax = NA, alpha = 0.05, min.dis = 5, max.ylev = 5,
max.xlev = 10, include.label = TRUE, Q1 = 0.25, Q3 = 0.75, simplify = TRUE,
ref = 1, ref.no = NA, fact.ratio = 1, ref.y = 1, p.corrected = TRUE,
compute.ratio = TRUE, include.miss = FALSE, oddsratio.method = "midp",
chisq.test.perm = FALSE, byrow = FALSE, chisq.test.B = 2000, chisq.test.seed = NULL,
Date.format = "d-mon-Y", var.equal = TRUE, conf.level = 0.95, surv = FALSE,
riskratio = FALSE, riskratio.method = "wald", compute.prop = FALSE,
lab.missing = "'Missing'",
hide = NA, digits = NA, type = NA, show.p.overall = TRUE,
show.all, show.p.trend, show.p.mul = FALSE, show.n, show.ratio =
FALSE, show.descr = TRUE, show.ci = FALSE, hide.no = NA, digits.ratio = NA,
show.p.ratio = show.ratio, digits.p = 3, sd.type = 1, q.type = c(1, 1),
extra.labels = NA, all.last = FALSE)
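As you can see, there are a lot of parameters. That is because descrTable() combines the arguments of two functions, compareGroups() and createTable(). Building a table used to take two steps, and now a single call to descrTable() is enough.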
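formula: the left-hand side may be empty. In that case the whole sample is described and no statistical tests are performed.
subset and selec: select part of the data, for example subset = sex == "Female" or selec = list(hormo = sex == "Female", waist = waist > 20).
method: by default, continuous variables are treated as normally distributed and tested with ANOVA. For example, method = c(waist = 2) treats waist as non-normally distributed. Possible values:
  1: normally distributed (the default). Uses ANOVA and the t-test, with Tukey's method for multiple comparisons.
  2: non-normally distributed. Uses the Kruskal-Wallis test.
  3: categorical. Uses the chi-squared test or Fisher's exact test.
  NA: uses the Shapiro-Wilk test to decide whether the variable is normally distributed and then picks the appropriate method. Two further parameters apply here:
    alpha: the significance threshold for the Shapiro-Wilk test.
    min.dis: a variable with fewer distinct values than this is converted to a categorical variable.
max.ylev and max.xlev: the maximum number of levels allowed for the response variable and for the explanatory variables.
simplify: TRUE by default. Drops factor levels with zero observations.
ref and ref.no: change the reference level.
fact.ratio: the unit of increase used when computing the OR or RR. For example, fact.ratio = c(age = 10, bmi = 2) means per 10-unit increase in age and per 2-unit increase in bmi.
digits: the number of decimal places.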
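A few of these parameters in action, as a sketch on the predimed data. The variable selections and the object names (tab_all, tab_np, tab_f, tab_or) are chosen here purely for illustration.

```r
library(compareGroups)
data("predimed")

# Empty left-hand side: describe the whole sample, no tests
tab_all <- descrTable(~ age + sex + bmi, data = predimed)

# method: treat waist as non-normally distributed
tab_np <- descrTable(group ~ age + waist + bmi, data = predimed,
                     method = c(waist = 2))

# subset: restrict the analysis to women
tab_f <- descrTable(group ~ age + hormo + waist, data = predimed,
                    subset = sex == "Female")

# fact.ratio: with a binary response, report the OR per 10 years of age
# and per 2 units of BMI (show.ratio = TRUE displays the ratio column)
tab_or <- descrTable(htn ~ age + bmi, data = predimed,
                     show.ratio = TRUE,
                     fact.ratio = c(age = 10, bmi = 2))

tab_np
```

Printing any of these objects shows the table in the console, exactly as with restab above.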
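The package can also be used for survival analysis: just build the formula with Surv(). It ships with some visualization methods as well, although the plots are not especially attractive. A graphical user interface is also provided.
For more, see the official vignette [1].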
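A sketch of the survival use case, following the pattern I recall from the package vignette: a Surv object is stored as a new column and then used as the response. The column name tmain and the choice of covariates are assumptions for illustration.

```r
library(compareGroups)
library(survival)
data("predimed")

# Build the time-to-event variable from follow-up time and event status
predimed$tmain <- with(predimed, Surv(toevent, event == "Yes"))
attr(predimed$tmain, "label") <- "AMI, stroke, or CV Death"

# Time-to-event response; show.ratio = TRUE adds the hazard ratio column
tab_surv <- descrTable(tmain ~ group + age + sex, data = predimed,
                       show.ratio = TRUE)
tab_surv
```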
References
[1] compareGroups official vignette: https://cran.r-project.org/web/packages/compareGroups/vignettes/compareGroups_vignette.html