R 语言 SEM 笔记：结构方程模型的多组分析（Multigroup Analysis）

原创 ecologyR ecologyR 2024-03-25

引言

结构方程的多组分析回答一个问题：结构方程模型中，哪些路径会根据分组而变化，哪些不会？

步骤是：1、先拟合一个整体模型；2、再检验模型分组后，哪些路径（在统计学上）是全局不变的（constrained），哪些是随着分组改变的？

类似先做方差分析，再做事后检验？

具体参考：

Introduction to Multigroup Analysis

https://jslefche.github.io/sem_book/multigroup-analysis.html#introduction-to-multigroup-analysis

1 包

library(tidyverse)
library(nlme)
library(glmmTMB)
library(piecewiseSEM)

2 数据

本文数据、原代码来自以下论文的 Open Research 部分

原代码的 SEM 多组分析 multigroup 代码无法运行，本文中对此代码做了改动

Understanding temporal variability across trophic levels and spatial scales in freshwater ecosystems

https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.4219

读取数据

# 数据
local.metr.full4 <- read.csv("local.metr.full4.csv", header = TRUE)

# scale 且转换为 numeric 数值型
scale2 <- function(x) as.numeric(scale(x))

# 数据标准化
local.metr.full4.scaled <- local.metr.full4 |> 
  mutate_if(is.numeric, scale2) |>
  as.data.frame()

3 拟合模型：原文代码

又有小 bug：summary 在 R Notebook 里无法获取作 Tests of directed separation
summary(site.cv0, .progressBar = FALSE) 需要粘贴到 R 控制台 Console 里运行

site.cv0 <- psem(
  lme(
    cv_comm_site ~ synchrony_comm_site + mean_cv_species_site + bio15 +
      Gen_sampled + S,
    random = ~ 1 | Metacom / Meta_troph,
    data = local.metr.full4.scaled
  ),
  lme(
    synchrony_comm_site ~ S + Gen_sampled,
    random = ~ 1 | Metacom / Meta_troph,
    data = local.metr.full4.scaled
  ),
  lme(
    mean_cv_species_site ~ S + bio15 + Gen_sampled,
    random = ~ 1 | Metacom / Meta_troph,
    data =local.metr.full4.scaled
  )
)

# 看看模型拟合

# summary(site.cv0, .progressBar = FALSE)

4 多组分析：出现 bug

报错了！

Error in [.data.frame(data, , var) : undefined columns selected

mt.cv0 <- multigroup(
  site.cv0,
  group = "New_tr_g",
  standardize = "range"
)
mt.cv0

5 分析问题

5.1 随机效应指定问题以及 nlme 的 lme 函数收敛问题

site.cv0 模型出现 Warning message: Check model convergence: log-likelihood estimates lead to negative Chi-squared 这是因为有的随机效应太小
检查发现，这个问题出在 Metacom 这个随机效应的 StdDev 只有 0.0002506188
随机效应里去掉 Metacom，发现仍然出现警告，说明模型还是没有收敛
summary(site.cv0, .progressBar = FALSE) 需要粘贴到 R 控制台 Console 里运行

# 逐个检查 psem 里的模型，发现
lme(
    synchrony_comm_site ~ S + Gen_sampled,
    random = ~ 1 | Metacom / Meta_troph,
    data = local.metr.full4.scaled
  )

## Linear mixed-effects model fit by REML
##   Data: local.metr.full4.scaled 
##   Log-restricted-likelihood: -624.3329
##   Fixed: synchrony_comm_site ~ S + Gen_sampled 
## (Intercept)           S Gen_sampled 
## -0.04030877 -0.38854654 -0.20262683 
## 
## Random effects:
##  Formula: ~1 | Metacom
##          (Intercept)
## StdDev: 0.0002507212
## 
##  Formula: ~1 | Meta_troph %in% Metacom
##         (Intercept)  Residual
## StdDev:   0.4234018 0.7825915
## 
## Number of Observations: 501
## Number of Groups: 
##                 Metacom Meta_troph %in% Metacom 
##                      30                      49

# 问题出在 Metacom 这个随机效应的 StdDev 只有 0.0002506188
# Random effects:
#  Formula: ~1 | Metacom
#          (Intercept)
# StdDev: 0.0002506188

# 随机效应里去掉 Metacom，但是仍然报错
site.cv0 <- psem(
  lme(
    cv_comm_site ~ synchrony_comm_site + mean_cv_species_site + bio15 +
      Gen_sampled + S,
    random = ~ 1 | Metacom / Meta_troph,
    data = local.metr.full4.scaled
  ),
  lme(
    synchrony_comm_site ~ S + Gen_sampled,
    random = ~ 1 | Meta_troph, # 随机效应里去掉 Metacom
    data = local.metr.full4.scaled
  ),
  lme(
    mean_cv_species_site ~ S + bio15 + Gen_sampled,
    random = ~ 1 | Metacom / Meta_troph,
    data =local.metr.full4.scaled
  )
)

# 看看模型拟合，发现模型还是没有收敛

# summary(site.cv0, .progressBar = FALSE)

5.2 更换为使用 glmmTMB 拟合

更换为使用 glmmTMB 拟合，因为 glmmTMB 使用更好的优化器 optimizer

不再报错了！
Chi-Squared = 3.665 with P-value = 0.16 and on 2 degrees of freedom 也正常显示
summary(site.cv0, .progressBar = FALSE) 需要粘贴到 R 控制台 Console 里运行

# 更换为使用 glmmTMB 拟合，因为 glmmTMB 使用更好的优化器 optimizer
site.cv0 <- psem(
  glmmTMB(
    cv_comm_site ~ synchrony_comm_site + mean_cv_species_site + bio15 +
      Gen_sampled + S + (1 | Metacom / Meta_troph),
    data = local.metr.full4.scaled,
    control = glmmTMBControl(optimizer = optim, optArgs = list(method = "BFGS"))
  ),
  glmmTMB(
    synchrony_comm_site ~ S + Gen_sampled + (1 | Metacom / Meta_troph), # 通过更改 glmmTMBControl 也可以解决 isSingular 问题
    data = local.metr.full4.scaled,
    control = glmmTMBControl(optimizer = optim, optArgs = list(method = "BFGS"))
  ),
  glmmTMB(
    mean_cv_species_site ~ S + bio15 + Gen_sampled + (1 | Metacom / Meta_troph),
    data = local.metr.full4.scaled,
    control = glmmTMBControl(optimizer = optim, optArgs = list(method = "BFGS"))
  )
)

# 看看模型拟合

# summary(site.cv0, .progressBar = FALSE)

6 多组分析：解决 bug

mt.cv0 需要粘贴到 R 控制台 Console 里运行
得到最终 Fisher’s C = 6.354 with P-value = 0.174 and on 4 degrees of freedom

source("basisSet2.R")
source("multigroup2.R")

# 用 glmmTMB 拟合，但是又想做 multigroup 分析的小技巧
site.cv0$data$New_tr_g <- local.metr.full4.scaled$New_tr_g

# 多组分析
mt.cv0 <- multigroup2(
  site.cv0,
  group = "New_tr_g",
  standardize = "range"
)

## Warning in finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old): Model convergence
## problem; non-positive-definite Hessian matrix. See vignette('troubleshooting')

## Warning in finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old): Model convergence
## problem; non-positive-definite Hessian matrix. See vignette('troubleshooting')

## Warning in finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old): Model convergence
## problem; non-positive-definite Hessian matrix. See vignette('troubleshooting')

# 还是有收敛问题的警告，但是目前我没办法做更多了
mt.cv0$Cstat |> print()

##   Fisher.C df P.Value
## 1    6.345  4   0.175

mt.cv0$global |> print()

##                Response           Predictor
## 7          cv_comm_site synchrony_comm_site
## 11         cv_comm_site                   S
## 16  synchrony_comm_site         Gen_sampled
## 22 mean_cv_species_site               bio15

mt.cv0$anovaInts |> print()

##                Response                     Predictor Test.Stat DF P.Value    
## 7          cv_comm_site  synchrony_comm_site:New_tr_g     285.6  1  0.0923    
## 8          cv_comm_site New_tr_g:mean_cv_species_site     285.6  1  0.0000 ***
## 9          cv_comm_site                New_tr_g:bio15     285.6  1  0.0000 ***
## 10         cv_comm_site          New_tr_g:Gen_sampled     285.6  1  0.0000 ***
## 11         cv_comm_site                    New_tr_g:S     285.6  1  0.1032    
## 15  synchrony_comm_site                    S:New_tr_g      33.4  1  0.0008 ***
## 16  synchrony_comm_site          New_tr_g:Gen_sampled      33.4  1  0.0854    
## 21 mean_cv_species_site                    S:New_tr_g       0.1  1  0.0000 ***
## 22 mean_cv_species_site                New_tr_g:bio15       0.1  1  0.0904    
## 23 mean_cv_species_site          New_tr_g:Gen_sampled       0.1  1  0.0045  **

mt.cv0$group.coefs |> print()

## $Primary
##                Response            Predictor Estimate Std.Error  DF Crit.Value
## 1          cv_comm_site  synchrony_comm_site   0.5796    0.0333 501    17.4028
## 2          cv_comm_site mean_cv_species_site   0.7381    0.1225  97     6.0252
## 3          cv_comm_site                bio15   0.0721    0.0845  97     0.8537
## 4          cv_comm_site          Gen_sampled  -0.2023    0.2444  97    -0.8278
## 5          cv_comm_site                    S   0.0820    0.0504 501     1.6280
## 6   synchrony_comm_site                    S  -0.1208    0.1412  97    -0.8553
## 7   synchrony_comm_site          Gen_sampled  -0.2032    0.0631 501    -3.2205
## 8  mean_cv_species_site                    S  -0.3374    0.0654  97    -5.1564
## 9  mean_cv_species_site                bio15   0.2267    0.0750 501     3.0230
## 10 mean_cv_species_site          Gen_sampled   0.5324    0.2864  97     1.8586
##    P.Value Std.Estimate      
## 1   0.0000        0.529 *** c
## 2   0.0000       0.4426 ***  
## 3   0.3933       0.0443      
## 4   0.4078      -0.0698      
## 5   0.1035       0.0597     c
## 6   0.3924      -0.0963      
## 7   0.0013      -0.0768  ** c
## 8   0.0000      -0.4093 ***  
## 9   0.0025        0.232  ** c
## 10  0.0631       0.3062      
## 
## $Secondary
##                Response            Predictor Estimate Std.Error  DF Crit.Value
## 1          cv_comm_site  synchrony_comm_site   0.5796    0.0333 501    17.4028
## 2          cv_comm_site mean_cv_species_site   0.5348    0.0652 208     8.2019
## 3          cv_comm_site                bio15   0.2252    0.0887 208     2.5389
## 4          cv_comm_site          Gen_sampled -85.7565   42.5758 208    -2.0142
## 5          cv_comm_site                    S   0.0820    0.0504 501     1.6280
## 6   synchrony_comm_site                    S  -0.3548    0.0876 208    -4.0520
## 7   synchrony_comm_site          Gen_sampled  -0.2032    0.0631 501    -3.2205
## 8  mean_cv_species_site                    S  -0.0814    0.0773 208    -1.0536
## 9  mean_cv_species_site                bio15   0.2267    0.0750 501     3.0230
## 10 mean_cv_species_site          Gen_sampled  67.3925   45.1464 208     1.4928
##    P.Value Std.Estimate      
## 1   0.0000       0.7427 *** c
## 2   0.0000       0.4504 ***  
## 3   0.0111       0.1627   *  
## 4   0.0440      -0.1726   *  
## 5   0.1035       0.0487     c
## 6   0.0001      -0.1645 ***  
## 7   0.0013       -3e-04  ** c
## 8   0.2921      -0.0575      
## 9   0.0025       0.1945  ** c
## 10  0.1355       0.1611      
## 
## $Tertiary
##                Response            Predictor Estimate Std.Error  DF Crit.Value
## 1          cv_comm_site  synchrony_comm_site   0.5796    0.0333 501    17.4028
## 2          cv_comm_site mean_cv_species_site   0.3647    0.0589 173     6.1919
## 3          cv_comm_site                bio15  -0.2048    0.0765 173    -2.6778
## 4          cv_comm_site          Gen_sampled -95.3207   41.8872 173    -2.2756
## 5          cv_comm_site                    S   0.0820    0.0504 501     1.6280
## 6   synchrony_comm_site                    S  -0.6758    0.1396 173    -4.8397
## 7   synchrony_comm_site          Gen_sampled  -0.2032    0.0631 501    -3.2205
## 8  mean_cv_species_site                    S  -0.1303    0.1953 173    -0.6673
## 9  mean_cv_species_site                bio15   0.2267    0.0750 501     3.0230
## 10 mean_cv_species_site          Gen_sampled -45.5872   87.2956 173    -0.5222
##    P.Value Std.Estimate      
## 1   0.0000       0.4898 *** c
## 2   0.0000       0.3353 ***  
## 3   0.0074      -0.1323  **  
## 4   0.0229      -0.1157   *  
## 5   0.1035        0.043     c
## 6   0.0000      -0.4190 ***  
## 7   0.0013       -3e-04  ** c
## 8   0.5046      -0.0743      
## 9   0.0025       0.1593  ** c
## 10  0.6015      -0.0602      
## 
## $Producers
##                Response            Predictor Estimate Std.Error  DF Crit.Value
## 1          cv_comm_site  synchrony_comm_site   0.5796    0.0333 501    17.4028
## 2          cv_comm_site mean_cv_species_site   0.8441    0.4189  23     2.0150
## 3          cv_comm_site                bio15  -0.1403    0.1560  23    -0.8990
## 4          cv_comm_site          Gen_sampled   0.0222    0.0562  23     0.3942
## 5          cv_comm_site                    S   0.0820    0.0504 501     1.6280
## 6   synchrony_comm_site                    S  -0.3341    0.4495  23    -0.7432
## 7   synchrony_comm_site          Gen_sampled  -0.2032    0.0631 501    -3.2205
## 8  mean_cv_species_site                    S  -0.0626    0.0490  23    -1.2772
## 9  mean_cv_species_site                bio15   0.2267    0.0750 501     3.0230
## 10 mean_cv_species_site          Gen_sampled  -0.0212    0.0951  23    -0.2229
##    P.Value Std.Estimate      
## 1   0.0000       0.7481 *** c
## 2   0.0439       0.3247   *  
## 3   0.3687      -0.1375      
## 4   0.6934       0.0423      
## 5   0.1035       0.0696     c
## 6   0.4574      -0.2196      
## 7   0.0013      -0.3004  ** c
## 8   0.2015      -0.1381      
## 9   0.0025       0.5777  ** c
## 10  0.8236      -0.1051

7 basisSet2.R 和 multigroup2.R 的代码

需要先保存为 basisSet2.R 和 multigroup2.R 两个文件，然后 source 调用

7.1 basisSet2.R

basisSet2 <- function(modelList.with.data, direction = NULL, interactions = FALSE) {
  amat <- getDAG(modelList.with.data)
  b <- lapply(1:nrow(amat), function(i) {
    lapply(
      i:ncol(amat),
      function(j) {
        if (amat[i, j] != 0 | i == j) {
          NULL
        } else {
          cvars <- c(rownames(amat)[amat[, rownames(amat)[i], drop = FALSE] == 1], rownames(amat)[amat[, rownames(amat)[j], drop = FALSE] == 1])
          cvars <- cvars[!duplicated(cvars)]
          c(rownames(amat)[i], rownames(amat)[j], cvars)
        }
      }
    )
  })
  b <- unlist(b, recursive = FALSE)
  b <- b[!sapply(b, is.null)]
  if (length(b) > 0) {
    b <- piecewiseSEM:::filterExogenous(b, modelList.with.data, amat)
    b <- piecewiseSEM:::filterSmoothed(b, modelList.with.data)
    b <- piecewiseSEM:::filterExisting(b, modelList.with.data)
    b <- piecewiseSEM:::filterInteractions(b, interactions)
    b <- piecewiseSEM:::removeCerror(b, modelList.with.data)
    b <- piecewiseSEM:::reverseAddVars(b, modelList.with.data, amat)
    b <- piecewiseSEM:::reverseNonLin(b, modelList.with.data, amat)
    b <- piecewiseSEM:::fixCatDir(b, modelList.with.data) # Here is the cause of the problem!
    if (!is.null(direction)) {
      b <- piecewiseSEM:::specifyDir(b, direction)
    }
  }
  class(b) <- "basisSet"
  return(b)
}

7.2 multigroup2.R

multigroup2 <- function(modelList, group, standardize = "scale", standardize.type = "latent.linear", test.type = "III") {
  name <- deparse(match.call()$modelList)
  data <- modelList$data
  # piecewiseSEM:::removeData(modelList, formulas = 1) remove data from modelList BUT piecewiseSEM:::fixCatDir(b, modelList) need `modelList$data`
  modelListRaw <- modelList
  modelList <- piecewiseSEM:::removeData(modelList, formulas = 1)
  intModelList <- lapply(modelList, function(i) {
    rhs2 <- paste(paste(piecewiseSEM:::all.vars_trans(i)[-1], "*", group), collapse = " + ")
    i <- update(i, formula(paste(". ~ ", rhs2)))
    return(i)
  })
  newModelList <- lapply(unique(data[, group]), function(i) {
    update(as.psem(modelList), data = data[data[, group] == i, ])
  })
  names(newModelList) <- unique(data[, group])
  coefsList <- lapply(
    newModelList,
    coefs,
    standardize,
    standardize.type,
    test.type
  )
  names(coefsList) <- unique(data[, group])
  coefTable <- coefs(modelList, standardize, standardize.type, test.type)
  anovaTable <- anova(as.psem(intModelList))[[1]]
  anovaInts <- anovaTable[grepl(":", anovaTable$Predictor), ]
  global <- anovaInts[anovaInts$P.Value >= 0.05, c("Response", "Predictor")]
  global$Predictor <- sub(":", "\\1", sub(group, "\\1", global$Predictor))
  if (nrow(global) == nrow(anovaInts)) {
    newCoefsList <- list(global = coefTable)
  } else {
    newCoefsList <- lapply(names(coefsList), function(i) {
      ct <- as.matrix(coefsList[[i]])
      idx <- which(apply(ct[, 1:2], 1, paste, collapse = "___") %in% apply(global[, 1:2], 1, paste, collapse = "___"))
      ct[idx, ] <- as.matrix(coefTable[idx, ])
      ct <- cbind(ct, ifelse(1:nrow(ct) %in% idx, "c", ""))
      for (j in 1:nrow(ct)) {
        if (ct[j, ncol(ct)] == "c") {
          model <- modelList[[which(sapply(piecewiseSEM:::listFormula(modelList), function(x) {
            piecewiseSEM:::all.vars.merMod(x)[1] == ct[j, "Response"]
          }))]]
          data. <- data[data[, group] == i, ]
          sd.x <- piecewiseSEM:::GetSDx(model, modelList, data., standardize)
          sd.x <- sd.x[which(names(sd.x) == ct[j, "Predictor"])]
          sd.y <- piecewiseSEM:::GetSDy(model, data., standardize, standardize.type)
          new.coef <- as.numeric(ct[j, "Estimate"]) * (sd.x / sd.y)
          ct[j, "Std.Estimate"] <- ifelse(length(new.coef) > 0, round(as.numeric(new.coef), 4), "-")
        }
      }
      ct <- as.data.frame(ct)
      ct[is.na(ct)] <- "-"
      names(ct)[(ncol(ct) - 1):ncol(ct)] <- ""
      return(ct)
    })
    names(newCoefsList) <- names(coefsList)
  }
  if (nrow(global) == nrow(anovaInts)) {
    gof <- piecewiseSEM::fisherC(modelList)
  } else {
    # b <- piecewiseSEM:::basisSet(modelList) # Cause of the problem
    b <- basisSet2(modelListRaw) # A solution
    cf <- coefTable[coefTable$Response %in% global$Response & coefTable$Predictor %in% global$Predictor, ]
    b <- lapply(
      X = b,
      FUN = function(i) {
        for (j in 3:length(i)) {
          value <- cf[cf$Response == i[2] & cf$Predictor == i[j], "Estimate"]
          if (length(value) != 0) {
            i[j] <- paste0("offset(", value, "*", i[j], ")")
          }
        }
        return(i)
      }
    )
    if (length(b) == 0) {
      b <- NULL
    }
    gof <- fisherC(modelList, basis.set = b)
  }
  ret <- list(
    name = name,
    group = group,
    global = global,
    anovaInts = anovaInts,
    group.coefs = newCoefsList,
    Cstat = gof
  )
  class(ret) <- "multigroup.psem"
  return(ret)
}

结语

代码和数据（原代码中有数据预处理步骤）都是全部公开的。如果您想“即得即用”——获取此份代码的 .R 文件和 .csv 数据，请打赏或帮助转发：

打赏 3 元或以上，或
转发票圈保持好友可见 12 小时以上，并且点文末右下角的“在看”，或
转发到群聊（最好是科研相关的群），并且点“在看”

然后，私信我留下邮箱即可获取（完成后，务必留下邮箱）

更多独家代码请查看本公众号其他推送：

① R 语言｜全球独家｜鼠标修改布局｜结构方程模型可视化工具教程

② 结构方程模型｜可视化升级：完全用 R 语言实现 Manuel Delgado-Baquerizo 风格的 SEM 图

③ R 语言结构方程模型：尝试用 R 画一个 PNAS 文章的 SEM 图

④ R 语言 SEM 笔记：复合变量（composite）的构造案例

⑤ R 语言 SEM 笔记：使用 ggplot2 衍生包画 Nature 结构方程模型图

⑥ R 语言 SEM：结构方程模型合集

继续滑动看下一个

ecologyR

向上滑动看下一个

宾曰语云被法学教授投诉：严重侵权，“违法犯罪”！

当“上帝”变为“老天爷”

京东Plus的隐藏特权，很多会员都没领取，白交了会员费...

呼吁四川大学澄清：1998年1月，川大有多少个“姜涛与爱人程月玲”？

二湘：朱令去世一周年，清华学子控诉清华在朱令案中的冷血和无耻

R 语言 SEM 笔记：结构方程模型的多组分析（Multigroup Analysis）

目录

引言

1 包

2 数据

3 拟合模型：原文代码

4 多组分析：出现 bug

5 分析问题

5.1 随机效应指定问题以及 nlme 的 lme 函数收敛问题

5.2 更换为使用 glmmTMB 拟合

6 多组分析：解决 bug

7 basisSet2.R 和 multigroup2.R 的代码

7.1 basisSet2.R

7.2 multigroup2.R

结语

您可能也对以下帖子感兴趣

宾曰语云被法学教授投诉：严重侵权，“违法犯罪”！

当“上帝”变为“老天爷”

京东Plus的隐藏特权，很多会员都没领取，白交了会员费...

呼吁四川大学澄清：1998年1月，川大有多少个“姜涛与爱人程月玲”？

二湘：朱令去世一周年，清华学子控诉清华在朱令案中的冷血和无耻

生成图片，分享到微信朋友圈

R 语言 SEM 笔记：结构方程模型的多组分析（Multigroup Analysis）

目录

引言

1 包

2 数据

3 拟合模型：原文代码

4 多组分析：出现 bug

5 分析问题

5.1 随机效应指定问题以及 nlme 的 lme 函数收敛问题

5.2 更换为使用 glmmTMB 拟合

6 多组分析：解决 bug

7 basisSet2.R 和 multigroup2.R 的代码

7.1 basisSet2.R

7.2 multigroup2.R

结语

您可能也对以下帖子感兴趣