R 语言 SEM 笔记:结构方程模型的多组分析(Multigroup Analysis)
目录
引言
1 包
2 数据
3 拟合模型:原文代码
4 多组分析:出现 bug
5 分析问题
5.1 随机效应指定问题以及 nlme 的 lme 函数收敛问题
5.2 更换为使用 glmmTMB 拟合
6 多组分析:解决 bug
7 basisSet2.R 和 multigroup2.R 的代码
7.1 basisSet2.R
7.2 multigroup2.R
结语
引言
结构方程的多组分析回答一个问题:结构方程模型中,哪些路径会根据分组而变化,哪些不会?
步骤是:1、先拟合一个整体模型;2、再检验模型分组后,哪些路径(在统计学上)是全局不变的(constrained),哪些是随着分组改变的?
类似先做方差分析,再做事后检验?
具体参考:
Introduction to Multigroup Analysis
https://jslefche.github.io/sem_book/multigroup-analysis.html#introduction-to-multigroup-analysis
1 包
library(tidyverse)
library(nlme)
library(glmmTMB)
library(piecewiseSEM)
2 数据
本文数据、原代码来自以下论文的 Open Research 部分
原代码的 SEM 多组分析 multigroup 代码无法运行,本文中对此代码做了改动
Understanding temporal variability across trophic levels and spatial scales in freshwater ecosystems
https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.4219
读取数据
# 数据
local.metr.full4 <- read.csv("local.metr.full4.csv", header = TRUE)
# scale 且转换为 numeric 数值型
scale2 <- function(x) as.numeric(scale(x))
# 数据标准化
local.metr.full4.scaled <- local.metr.full4 |>
mutate_if(is.numeric, scale2) |>
as.data.frame()
3 拟合模型:原文代码
又有小 bug:summary 在 R Notebook 里无法获取作 Tests of directed separation
summary(site.cv0, .progressBar = FALSE) 需要粘贴到 R 控制台 Console 里运行
site.cv0 <- psem(
lme(
cv_comm_site ~ synchrony_comm_site + mean_cv_species_site + bio15 +
Gen_sampled + S,
random = ~ 1 | Metacom / Meta_troph,
data = local.metr.full4.scaled
),
lme(
synchrony_comm_site ~ S + Gen_sampled,
random = ~ 1 | Metacom / Meta_troph,
data = local.metr.full4.scaled
),
lme(
mean_cv_species_site ~ S + bio15 + Gen_sampled,
random = ~ 1 | Metacom / Meta_troph,
data =local.metr.full4.scaled
)
)
# 看看模型拟合
# summary(site.cv0, .progressBar = FALSE)
4 多组分析:出现 bug
报错了!
Error in
[.data.frame
(data, , var) : undefined columns selected
mt.cv0 <- multigroup(
site.cv0,
group = "New_tr_g",
standardize = "range"
)
mt.cv0
5 分析问题
5.1 随机效应指定问题以及 nlme 的 lme 函数收敛问题
site.cv0 模型出现 Warning message: Check model convergence: log-likelihood estimates lead to negative Chi-squared 这是因为有的随机效应太小
检查发现,这个问题出在 Metacom 这个随机效应的 StdDev 只有 0.0002506188
随机效应里去掉 Metacom,发现仍然出现警告,说明模型还是没有收敛
summary(site.cv0, .progressBar = FALSE) 需要粘贴到 R 控制台 Console 里运行
# 逐个检查 psem 里的模型,发现
lme(
synchrony_comm_site ~ S + Gen_sampled,
random = ~ 1 | Metacom / Meta_troph,
data = local.metr.full4.scaled
)
## Linear mixed-effects model fit by REML
## Data: local.metr.full4.scaled
## Log-restricted-likelihood: -624.3329
## Fixed: synchrony_comm_site ~ S + Gen_sampled
## (Intercept) S Gen_sampled
## -0.04030877 -0.38854654 -0.20262683
##
## Random effects:
## Formula: ~1 | Metacom
## (Intercept)
## StdDev: 0.0002507212
##
## Formula: ~1 | Meta_troph %in% Metacom
## (Intercept) Residual
## StdDev: 0.4234018 0.7825915
##
## Number of Observations: 501
## Number of Groups:
## Metacom Meta_troph %in% Metacom
## 30 49
# 问题出在 Metacom 这个随机效应的 StdDev 只有 0.0002506188
# Random effects:
# Formula: ~1 | Metacom
# (Intercept)
# StdDev: 0.0002506188
# 随机效应里去掉 Metacom,但是仍然报错
site.cv0 <- psem(
lme(
cv_comm_site ~ synchrony_comm_site + mean_cv_species_site + bio15 +
Gen_sampled + S,
random = ~ 1 | Metacom / Meta_troph,
data = local.metr.full4.scaled
),
lme(
synchrony_comm_site ~ S + Gen_sampled,
random = ~ 1 | Meta_troph, # 随机效应里去掉 Metacom
data = local.metr.full4.scaled
),
lme(
mean_cv_species_site ~ S + bio15 + Gen_sampled,
random = ~ 1 | Metacom / Meta_troph,
data =local.metr.full4.scaled
)
)
# 看看模型拟合,发现模型还是没有收敛
# summary(site.cv0, .progressBar = FALSE)
5.2 更换为使用 glmmTMB 拟合
更换为使用 glmmTMB 拟合,因为 glmmTMB 使用更好的优化器 optimizer
不再报错了!
Chi-Squared = 3.665 with P-value = 0.16 and on 2 degrees of freedom 也正常显示
summary(site.cv0, .progressBar = FALSE) 需要粘贴到 R 控制台 Console 里运行
# 更换为使用 glmmTMB 拟合,因为 glmmTMB 使用更好的优化器 optimizer
site.cv0 <- psem(
glmmTMB(
cv_comm_site ~ synchrony_comm_site + mean_cv_species_site + bio15 +
Gen_sampled + S + (1 | Metacom / Meta_troph),
data = local.metr.full4.scaled,
control = glmmTMBControl(optimizer = optim, optArgs = list(method = "BFGS"))
),
glmmTMB(
synchrony_comm_site ~ S + Gen_sampled + (1 | Metacom / Meta_troph), # 通过更改 glmmTMBControl 也可以解决 isSingular 问题
data = local.metr.full4.scaled,
control = glmmTMBControl(optimizer = optim, optArgs = list(method = "BFGS"))
),
glmmTMB(
mean_cv_species_site ~ S + bio15 + Gen_sampled + (1 | Metacom / Meta_troph),
data = local.metr.full4.scaled,
control = glmmTMBControl(optimizer = optim, optArgs = list(method = "BFGS"))
)
)
# 看看模型拟合
# summary(site.cv0, .progressBar = FALSE)
6 多组分析:解决 bug
mt.cv0 需要粘贴到 R 控制台 Console 里运行
得到最终 Fisher’s C = 6.354 with P-value = 0.174 and on 4 degrees of freedom
source("basisSet2.R")
source("multigroup2.R")
# 用 glmmTMB 拟合,但是又想做 multigroup 分析的小技巧
site.cv0$data$New_tr_g <- local.metr.full4.scaled$New_tr_g
# 多组分析
mt.cv0 <- multigroup2(
site.cv0,
group = "New_tr_g",
standardize = "range"
)
## Warning in finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old): Model convergence
## problem; non-positive-definite Hessian matrix. See vignette('troubleshooting')
## Warning in finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old): Model convergence
## problem; non-positive-definite Hessian matrix. See vignette('troubleshooting')
## Warning in finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old): Model convergence
## problem; non-positive-definite Hessian matrix. See vignette('troubleshooting')
# 还是有收敛问题的警告,但是目前我没办法做更多了
mt.cv0$Cstat |> print()
## Fisher.C df P.Value
## 1 6.345 4 0.175
mt.cv0$global |> print()
## Response Predictor
## 7 cv_comm_site synchrony_comm_site
## 11 cv_comm_site S
## 16 synchrony_comm_site Gen_sampled
## 22 mean_cv_species_site bio15
mt.cv0$anovaInts |> print()
## Response Predictor Test.Stat DF P.Value
## 7 cv_comm_site synchrony_comm_site:New_tr_g 285.6 1 0.0923
## 8 cv_comm_site New_tr_g:mean_cv_species_site 285.6 1 0.0000 ***
## 9 cv_comm_site New_tr_g:bio15 285.6 1 0.0000 ***
## 10 cv_comm_site New_tr_g:Gen_sampled 285.6 1 0.0000 ***
## 11 cv_comm_site New_tr_g:S 285.6 1 0.1032
## 15 synchrony_comm_site S:New_tr_g 33.4 1 0.0008 ***
## 16 synchrony_comm_site New_tr_g:Gen_sampled 33.4 1 0.0854
## 21 mean_cv_species_site S:New_tr_g 0.1 1 0.0000 ***
## 22 mean_cv_species_site New_tr_g:bio15 0.1 1 0.0904
## 23 mean_cv_species_site New_tr_g:Gen_sampled 0.1 1 0.0045 **
mt.cv0$group.coefs |> print()
## $Primary
## Response Predictor Estimate Std.Error DF Crit.Value
## 1 cv_comm_site synchrony_comm_site 0.5796 0.0333 501 17.4028
## 2 cv_comm_site mean_cv_species_site 0.7381 0.1225 97 6.0252
## 3 cv_comm_site bio15 0.0721 0.0845 97 0.8537
## 4 cv_comm_site Gen_sampled -0.2023 0.2444 97 -0.8278
## 5 cv_comm_site S 0.0820 0.0504 501 1.6280
## 6 synchrony_comm_site S -0.1208 0.1412 97 -0.8553
## 7 synchrony_comm_site Gen_sampled -0.2032 0.0631 501 -3.2205
## 8 mean_cv_species_site S -0.3374 0.0654 97 -5.1564
## 9 mean_cv_species_site bio15 0.2267 0.0750 501 3.0230
## 10 mean_cv_species_site Gen_sampled 0.5324 0.2864 97 1.8586
## P.Value Std.Estimate
## 1 0.0000 0.529 *** c
## 2 0.0000 0.4426 ***
## 3 0.3933 0.0443
## 4 0.4078 -0.0698
## 5 0.1035 0.0597 c
## 6 0.3924 -0.0963
## 7 0.0013 -0.0768 ** c
## 8 0.0000 -0.4093 ***
## 9 0.0025 0.232 ** c
## 10 0.0631 0.3062
##
## $Secondary
## Response Predictor Estimate Std.Error DF Crit.Value
## 1 cv_comm_site synchrony_comm_site 0.5796 0.0333 501 17.4028
## 2 cv_comm_site mean_cv_species_site 0.5348 0.0652 208 8.2019
## 3 cv_comm_site bio15 0.2252 0.0887 208 2.5389
## 4 cv_comm_site Gen_sampled -85.7565 42.5758 208 -2.0142
## 5 cv_comm_site S 0.0820 0.0504 501 1.6280
## 6 synchrony_comm_site S -0.3548 0.0876 208 -4.0520
## 7 synchrony_comm_site Gen_sampled -0.2032 0.0631 501 -3.2205
## 8 mean_cv_species_site S -0.0814 0.0773 208 -1.0536
## 9 mean_cv_species_site bio15 0.2267 0.0750 501 3.0230
## 10 mean_cv_species_site Gen_sampled 67.3925 45.1464 208 1.4928
## P.Value Std.Estimate
## 1 0.0000 0.7427 *** c
## 2 0.0000 0.4504 ***
## 3 0.0111 0.1627 *
## 4 0.0440 -0.1726 *
## 5 0.1035 0.0487 c
## 6 0.0001 -0.1645 ***
## 7 0.0013 -3e-04 ** c
## 8 0.2921 -0.0575
## 9 0.0025 0.1945 ** c
## 10 0.1355 0.1611
##
## $Tertiary
## Response Predictor Estimate Std.Error DF Crit.Value
## 1 cv_comm_site synchrony_comm_site 0.5796 0.0333 501 17.4028
## 2 cv_comm_site mean_cv_species_site 0.3647 0.0589 173 6.1919
## 3 cv_comm_site bio15 -0.2048 0.0765 173 -2.6778
## 4 cv_comm_site Gen_sampled -95.3207 41.8872 173 -2.2756
## 5 cv_comm_site S 0.0820 0.0504 501 1.6280
## 6 synchrony_comm_site S -0.6758 0.1396 173 -4.8397
## 7 synchrony_comm_site Gen_sampled -0.2032 0.0631 501 -3.2205
## 8 mean_cv_species_site S -0.1303 0.1953 173 -0.6673
## 9 mean_cv_species_site bio15 0.2267 0.0750 501 3.0230
## 10 mean_cv_species_site Gen_sampled -45.5872 87.2956 173 -0.5222
## P.Value Std.Estimate
## 1 0.0000 0.4898 *** c
## 2 0.0000 0.3353 ***
## 3 0.0074 -0.1323 **
## 4 0.0229 -0.1157 *
## 5 0.1035 0.043 c
## 6 0.0000 -0.4190 ***
## 7 0.0013 -3e-04 ** c
## 8 0.5046 -0.0743
## 9 0.0025 0.1593 ** c
## 10 0.6015 -0.0602
##
## $Producers
## Response Predictor Estimate Std.Error DF Crit.Value
## 1 cv_comm_site synchrony_comm_site 0.5796 0.0333 501 17.4028
## 2 cv_comm_site mean_cv_species_site 0.8441 0.4189 23 2.0150
## 3 cv_comm_site bio15 -0.1403 0.1560 23 -0.8990
## 4 cv_comm_site Gen_sampled 0.0222 0.0562 23 0.3942
## 5 cv_comm_site S 0.0820 0.0504 501 1.6280
## 6 synchrony_comm_site S -0.3341 0.4495 23 -0.7432
## 7 synchrony_comm_site Gen_sampled -0.2032 0.0631 501 -3.2205
## 8 mean_cv_species_site S -0.0626 0.0490 23 -1.2772
## 9 mean_cv_species_site bio15 0.2267 0.0750 501 3.0230
## 10 mean_cv_species_site Gen_sampled -0.0212 0.0951 23 -0.2229
## P.Value Std.Estimate
## 1 0.0000 0.7481 *** c
## 2 0.0439 0.3247 *
## 3 0.3687 -0.1375
## 4 0.6934 0.0423
## 5 0.1035 0.0696 c
## 6 0.4574 -0.2196
## 7 0.0013 -0.3004 ** c
## 8 0.2015 -0.1381
## 9 0.0025 0.5777 ** c
## 10 0.8236 -0.1051
7 basisSet2.R 和 multigroup2.R 的代码
需要先保存为 basisSet2.R 和 multigroup2.R 两个文件,然后 source 调用
7.1 basisSet2.R
basisSet2 <- function(modelList.with.data, direction = NULL, interactions = FALSE) {
amat <- getDAG(modelList.with.data)
b <- lapply(1:nrow(amat), function(i) {
lapply(
i:ncol(amat),
function(j) {
if (amat[i, j] != 0 | i == j) {
NULL
} else {
cvars <- c(rownames(amat)[amat[, rownames(amat)[i], drop = FALSE] == 1], rownames(amat)[amat[, rownames(amat)[j], drop = FALSE] == 1])
cvars <- cvars[!duplicated(cvars)]
c(rownames(amat)[i], rownames(amat)[j], cvars)
}
}
)
})
b <- unlist(b, recursive = FALSE)
b <- b[!sapply(b, is.null)]
if (length(b) > 0) {
b <- piecewiseSEM:::filterExogenous(b, modelList.with.data, amat)
b <- piecewiseSEM:::filterSmoothed(b, modelList.with.data)
b <- piecewiseSEM:::filterExisting(b, modelList.with.data)
b <- piecewiseSEM:::filterInteractions(b, interactions)
b <- piecewiseSEM:::removeCerror(b, modelList.with.data)
b <- piecewiseSEM:::reverseAddVars(b, modelList.with.data, amat)
b <- piecewiseSEM:::reverseNonLin(b, modelList.with.data, amat)
b <- piecewiseSEM:::fixCatDir(b, modelList.with.data) # Here is the cause of the problem!
if (!is.null(direction)) {
b <- piecewiseSEM:::specifyDir(b, direction)
}
}
class(b) <- "basisSet"
return(b)
}
7.2 multigroup2.R
multigroup2 <- function(modelList, group, standardize = "scale", standardize.type = "latent.linear", test.type = "III") {
name <- deparse(match.call()$modelList)
data <- modelList$data
# piecewiseSEM:::removeData(modelList, formulas = 1) remove data from modelList BUT piecewiseSEM:::fixCatDir(b, modelList) need `modelList$data`
modelListRaw <- modelList
modelList <- piecewiseSEM:::removeData(modelList, formulas = 1)
intModelList <- lapply(modelList, function(i) {
rhs2 <- paste(paste(piecewiseSEM:::all.vars_trans(i)[-1], "*", group), collapse = " + ")
i <- update(i, formula(paste(". ~ ", rhs2)))
return(i)
})
newModelList <- lapply(unique(data[, group]), function(i) {
update(as.psem(modelList), data = data[data[, group] == i, ])
})
names(newModelList) <- unique(data[, group])
coefsList <- lapply(
newModelList,
coefs,
standardize,
standardize.type,
test.type
)
names(coefsList) <- unique(data[, group])
coefTable <- coefs(modelList, standardize, standardize.type, test.type)
anovaTable <- anova(as.psem(intModelList))[[1]]
anovaInts <- anovaTable[grepl(":", anovaTable$Predictor), ]
global <- anovaInts[anovaInts$P.Value >= 0.05, c("Response", "Predictor")]
global$Predictor <- sub(":", "\\1", sub(group, "\\1", global$Predictor))
if (nrow(global) == nrow(anovaInts)) {
newCoefsList <- list(global = coefTable)
} else {
newCoefsList <- lapply(names(coefsList), function(i) {
ct <- as.matrix(coefsList[[i]])
idx <- which(apply(ct[, 1:2], 1, paste, collapse = "___") %in% apply(global[, 1:2], 1, paste, collapse = "___"))
ct[idx, ] <- as.matrix(coefTable[idx, ])
ct <- cbind(ct, ifelse(1:nrow(ct) %in% idx, "c", ""))
for (j in 1:nrow(ct)) {
if (ct[j, ncol(ct)] == "c") {
model <- modelList[[which(sapply(piecewiseSEM:::listFormula(modelList), function(x) {
piecewiseSEM:::all.vars.merMod(x)[1] == ct[j, "Response"]
}))]]
data. <- data[data[, group] == i, ]
sd.x <- piecewiseSEM:::GetSDx(model, modelList, data., standardize)
sd.x <- sd.x[which(names(sd.x) == ct[j, "Predictor"])]
sd.y <- piecewiseSEM:::GetSDy(model, data., standardize, standardize.type)
new.coef <- as.numeric(ct[j, "Estimate"]) * (sd.x / sd.y)
ct[j, "Std.Estimate"] <- ifelse(length(new.coef) > 0, round(as.numeric(new.coef), 4), "-")
}
}
ct <- as.data.frame(ct)
ct[is.na(ct)] <- "-"
names(ct)[(ncol(ct) - 1):ncol(ct)] <- ""
return(ct)
})
names(newCoefsList) <- names(coefsList)
}
if (nrow(global) == nrow(anovaInts)) {
gof <- piecewiseSEM::fisherC(modelList)
} else {
# b <- piecewiseSEM:::basisSet(modelList) # Cause of the problem
b <- basisSet2(modelListRaw) # A solution
cf <- coefTable[coefTable$Response %in% global$Response & coefTable$Predictor %in% global$Predictor, ]
b <- lapply(
X = b,
FUN = function(i) {
for (j in 3:length(i)) {
value <- cf[cf$Response == i[2] & cf$Predictor == i[j], "Estimate"]
if (length(value) != 0) {
i[j] <- paste0("offset(", value, "*", i[j], ")")
}
}
return(i)
}
)
if (length(b) == 0) {
b <- NULL
}
gof <- fisherC(modelList, basis.set = b)
}
ret <- list(
name = name,
group = group,
global = global,
anovaInts = anovaInts,
group.coefs = newCoefsList,
Cstat = gof
)
class(ret) <- "multigroup.psem"
return(ret)
}
结语
代码和数据(原代码中有数据预处理步骤)都是全部公开的。如果您想“即得即用”——获取此份代码的 .R 文件和 .csv 数据,请打赏或帮助转发:
打赏 3 元或以上,或
转发票圈保持好友可见 12 小时以上,并且点文末右下角的“在看”,或
转发到群聊(最好是科研相关的群),并且点“在看”
然后,私信我留下邮箱即可获取(完成后,务必留下邮箱)
更多独家代码请查看本公众号其他推送:
① R 语言|全球独家|鼠标修改布局|结构方程模型可视化工具教程
② 结构方程模型|可视化升级:完全用 R 语言实现 Manuel Delgado-Baquerizo 风格的 SEM 图
③ R 语言结构方程模型:尝试用 R 画一个 PNAS 文章的 SEM 图
④ R 语言 SEM 笔记:复合变量(composite)的构造案例
⑤ R 语言 SEM 笔记:使用 ggplot2 衍生包画 Nature 结构方程模型图