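Visualizing regression models with ggnostic and ggcoef
Let's get started with today's topic.
All of the functions below come from the R package GGally.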
ggbivariate()
Shows the relationship between one outcome variable and a set of other variables.
library(GGally)
data(tips, package = "reshape")
ggbivariate(tips, outcome = "smoker", explanatory = c("day","time","sex","tip"))
ggbivariate(tips, outcome = "total_bill", explanatory = c("day", "time", "sex", "tip"))
ggbivariate(tips, "smoker")
Custom colors
The result can finally be combined directly with ggplot2!
library(ggplot2)
ggbivariate(tips, outcome = "smoker", explanatory = c("day", "time", "sex", "tip")) +
  scale_fill_brewer(type = "qual")
The theme can of course also be changed with ordinary ggplot2 syntax, so that is not covered in detail here.
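As a minimal sketch of that kind of theming (the choice of theme_bw() and of the legend position is illustrative and not from the original post; it assumes the GGally, ggplot2 and reshape packages are installed):

```r
library(GGally)
library(ggplot2)

data(tips, package = "reshape")

# ggbivariate() returns a plot matrix; ggplot2 themes can be added with `+`
p <- ggbivariate(tips, outcome = "smoker", explanatory = c("day", "time", "sex", "tip")) +
  theme_bw() +                        # a complete theme replaces earlier theme settings
  theme(legend.position = "bottom")   # so the legend position is set again afterwards
p
```

A complete theme such as theme_bw() resets every theme element, which is why the legend position is specified after it rather than before.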
ggnostic()
This function visualizes a fitted regression model directly.
The data come from Example 15-1 in the earlier medical statistics series:
df15_1 <- data.frame(
  cho = c(5.68,3.79,6.02,4.85,4.60,6.05,4.90,7.08,3.85,4.65,4.59,4.29,7.97,
          6.19,6.13,5.71,6.40,6.06,5.09,6.13,5.78,5.43,6.50,7.98,11.54,5.84,
          3.84),
  tg = c(1.90,1.64,3.56,1.07,2.32,0.64,8.50,3.00,2.11,0.63,1.97,1.97,1.93,
         1.18,2.06,1.78,2.40,3.67,1.03,1.71,3.36,1.13,6.21,7.92,10.89,0.92,
         1.20),
  ri = c(4.53, 7.32,6.95,5.88,4.05,1.42,12.60,6.75,16.28,6.59,3.61,6.61,7.57,
         1.42,10.35,8.53,4.53,12.79,2.53,5.28,2.96,4.31,3.47,3.37,1.20,8.61,
         6.45),
  hba = c(8.2,6.9,10.8,8.3,7.5,13.6,8.5,11.5,7.9,7.1,8.7,7.8,9.9,6.9,10.5,8.0,
          10.3,7.1,8.9,9.9,8.0,11.3,12.3,9.8,10.5,6.4,9.6),
  fpg = c(11.2,8.8,12.3,11.6,13.4,18.3,11.1,12.1,9.6,8.4,9.3,10.6,8.4,9.6,10.9,
          10.1,14.8,9.1,10.8,10.2,13.6,14.9,16.0,13.2,20.0,13.3,10.4)
)
str(df15_1)
## 'data.frame': 27 obs. of 5 variables:
## $ cho: num 5.68 3.79 6.02 4.85 4.6 6.05 4.9 7.08 3.85 4.65 ...
## $ tg : num 1.9 1.64 3.56 1.07 2.32 0.64 8.5 3 2.11 0.63 ...
## $ ri : num 4.53 7.32 6.95 5.88 4.05 ...
## $ hba: num 8.2 6.9 10.8 8.3 7.5 13.6 8.5 11.5 7.9 7.1 ...
## $ fpg: num 11.2 8.8 12.3 11.6 13.4 18.3 11.1 12.1 9.6 8.4 ...
First, look at how each variable relates to the outcome:
ggbivariate(df15_1, outcome = "fpg")
Fit the regression model:
f <- lm(fpg ~ cho + tg + ri + hba, data = df15_1)
summary(f)
##
## Call:
## lm(formula = fpg ~ cho + tg + ri + hba, data = df15_1)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.6268 -1.2004 -0.2276  1.5389  4.4467
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   5.9433     2.8286   2.101   0.0473 *
## cho           0.1424     0.3657   0.390   0.7006
## tg            0.3515     0.2042   1.721   0.0993 .
## ri           -0.2706     0.1214  -2.229   0.0363 *
## hba           0.6382     0.2433   2.623   0.0155 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.01 on 22 degrees of freedom
## Multiple R-squared: 0.6008, Adjusted R-squared: 0.5282
## F-statistic: 8.278 on 4 and 22 DF, p-value: 0.0003121
Visualize the regression model (diagnostic plots):
ggnostic(f)
## `geom_smooth()` using method = 'loess'
## `geom_smooth()` using method = 'loess'
## `geom_smooth()` using method = 'loess'
## `geom_smooth()` using method = 'loess'
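This plot is quite comprehensive: it marks whether each explanatory variable is statistically significant and plots the residuals, sigma, hat values, and Cook's distance against every explanatory variable.
More information can be added through the columnsY argument: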
ggnostic(f, columnsY = c("fpg", ".fitted", ".se.fit", ".resid", ".std.resid", ".hat", ".sigma", ".cooksd"))
## `geom_smooth()` using method = 'loess'
## `geom_smooth()` using method = 'loess'
## `geom_smooth()` using method = 'loess'
## `geom_smooth()` using method = 'loess'
## `geom_smooth()` using method = 'loess'
## `geom_smooth()` using method = 'loess'
## `geom_smooth()` using method = 'loess'
## `geom_smooth()` using method = 'loess'
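The diagnostic quantities named in columnsY can also be inspected as plain numbers. The following is a minimal base R sketch, not the way ggnostic() itself computes them; as an assumption for self-containment it uses the built-in mtcars data instead of df15_1, and the 4 / n cutoff for Cook's distance is only a common rule of thumb.

```r
# Assumption: mtcars stands in for the post's df15_1 so the block runs on its own
fit <- lm(mpg ~ wt + hp, data = mtcars)

# One row per observation, with columns named like the ggnostic() rows
diagnostics <- data.frame(
  .fitted    = fitted(fit),           # fitted values
  .resid     = residuals(fit),        # raw residuals
  .std.resid = rstandard(fit),        # standardized residuals
  .hat       = hatvalues(fit),        # leverage (diagonal of the hat matrix)
  .sigma     = influence(fit)$sigma,  # residual SD with each observation left out
  .cooksd    = cooks.distance(fit)    # Cook's distance
)

head(diagnostics)

# Hat values sum to the number of estimated coefficients (3 here)
sum(diagnostics$.hat)

# Observations whose Cook's distance exceeds the 4 / n rule of thumb
diagnostics[diagnostics$.cooksd > 4 / nrow(mtcars), ]
```

Looking at the table alongside the plot makes it easier to identify which specific observation is behind an unusual point.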
ggnostic() and ggbivariate() are very practical for visualizing regression analyses in medical statistics.
ggcoef_model()
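Plots regression coefficients directly and supports both linear and logistic regression. There is also ggcoef(), which is used in much the same way.
Linear regression: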
data(tips, package = "reshape")
mod_simple <- lm(tip ~ day + time + total_bill, data = tips)
ggcoef_model(mod_simple)
Logistic regression:
d_titanic <- as.data.frame(Titanic)
d_titanic$Survived <- factor(d_titanic$Survived, c("No", "Yes"))
mod_titanic <- glm(
  Survived ~ Sex * Age + Class,
  weights = Freq,
  data = d_titanic,
  family = binomial
)
ggcoef_model(mod_titanic, exponentiate = TRUE)
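With exponentiate = TRUE the coefficients of a logistic model are shown on the odds-ratio scale rather than the log-odds scale. A minimal base R sketch of that transformation follows; it uses Wald intervals from confint.default() as a simplifying assumption, so the interval limits may differ slightly from those drawn in the plot.

```r
# Same model as above, rebuilt here so the block runs on its own
d_titanic <- as.data.frame(Titanic)
d_titanic$Survived <- factor(d_titanic$Survived, c("No", "Yes"))
mod_titanic <- glm(
  Survived ~ Sex * Age + Class,
  weights = Freq,
  data = d_titanic,
  family = binomial
)

# Coefficients and Wald intervals are on the log-odds scale;
# exponentiating them gives odds ratios with their intervals
or_table <- exp(cbind(OR = coef(mod_titanic), confint.default(mod_titanic)))
round(or_table, 3)
```

An odds ratio above 1 means higher odds of survival relative to the reference level, and below 1 means lower odds.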
Custom variable labels are supported:
library(labelled)
# Called directly instead of through %>%, so no pipe package has to be attached
tips_labelled <- set_variable_labels(
  tips,
  day = "Day of the week",
  time = "Lunch or Dinner",
  total_bill = "Bill's total"
)
mod_labelled <- lm(tip ~ day + time + total_bill, data = tips_labelled)
ggcoef_model(mod_labelled)
The axis label, title, color scale, and theme can all be changed:
ggcoef_model(mod_simple) +
  xlab("Coefficients") +
  ggtitle("Custom title") +
  scale_color_brewer(palette = "Set1") +
  theme(legend.position = "right")
## Scale for 'colour' is already present. Adding another scale for 'colour',
## which will replace the existing scale.
Compare several regression models at once:
mod1 <- lm(Fertility ~ ., data = swiss)
mod2 <- step(mod1, trace = 0)
mod3 <- lm(Fertility ~ Agriculture + Education * Catholic, data = swiss)
models <- list("Full model" = mod1, "Simplified model" = mod2, "With interaction" = mod3)
ggcoef_compare(models)
With one facet per model:
ggcoef_compare(models, type = "faceted")
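To read the compared coefficients as numbers, the same three models can be lined up in a table. This is a minimal base R sketch; the helper names all_terms and coef_table are illustrative and not part of GGally.

```r
# Same models as above, rebuilt here so the block runs on its own
mod1 <- lm(Fertility ~ ., data = swiss)
mod2 <- step(mod1, trace = 0)
mod3 <- lm(Fertility ~ Agriculture + Education * Catholic, data = swiss)
models <- list("Full model" = mod1, "Simplified model" = mod2, "With interaction" = mod3)

# Union of all term names across the models
all_terms <- unique(unlist(lapply(models, function(m) names(coef(m)))))

# One column per model; a term missing from a model is NA
coef_table <- sapply(models, function(m) coef(m)[all_terms])
rownames(coef_table) <- all_terms
round(coef_table, 3)
```

The NA cells correspond to terms that a model does not include, which appear in the plot as missing points for that model.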
GGally is a very capable package and well worth learning to use properly.