SEM notes in R: a worked example of constructing a composite variable
Introduction
Perhaps the best-known paper using composite variables is the one James B. Grace and colleagues published in Nature:
# Integrative modelling reveals mechanisms linking productivity and plant species richness
What is a composite variable?
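Besides latent variables, composite variables are another way to represent complex multivariate concepts in structural equation modeling (SEM). The key difference is the direction of influence. A latent variable is an unobservable concept, and its measured indicators are observable manifestations of it: the latent variable drives the indicators. A composite variable instead arises from the combined effects of its measured variables: the indicators jointly form the composite, and the composite has no error term of its own.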
https://jslefche.github.io/sem_book/composite-variables.html#constructing-a-composite-variable
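To make the composite side of this distinction concrete, here is a minimal base-R sketch on simulated data (not the keeley dataset). A composite is simply a fixed weighted sum of its components. Regressing it on those components therefore recovers the weights with zero residual. The weights 3.11 and -2.14 are borrowed from the lavaan model used later in this post, purely for illustration.

```r
# Simulated cover values (hypothetical, not the keeley data)
set.seed(1)
cover   <- runif(90, min = 0.05, max = 1.5)
coversq <- cover^2

# Fixed weights, borrowed from the lavaan model in this post for illustration
w <- c(cover = 3.11, coversq = -2.14)
composite <- w[["cover"]] * cover + w[["coversq"]] * coversq  # no error term

# Regressing the composite on its components returns the weights exactly
fit <- lm(composite ~ cover + coversq)
max_resid <- max(abs(residuals(fit)))
print(coef(fit))
print(max_resid)
```

A latent variable would behave differently. Each indicator would carry its own measurement error, so no such exact reconstruction would be possible.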
For background, see the following must-read resource for SEM beginners by Jon Lefcheck, who is also the author of the piecewiseSEM package:
# https://jslefche.github.io/sem_book
1. Packages
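library(ggplot2)      # plotting
library(ggpmisc)      # stat_poly_eq() for fitted-equation labels
library(piecewiseSEM) # provides the keeley dataset
library(lavaan)       # fitting the SEM
library(semPlot)      # semPaths() path diagrams
# cowplot must also be installed; it is called below via cowplot::theme_minimal_grid()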
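Before running the code, it may help to check which of these packages are installed. This small base-R sketch includes cowplot because the plotting code calls it through `cowplot::`.

```r
# Packages used in this post; cowplot is only called via cowplot::
pkgs <- c("ggplot2", "ggpmisc", "piecewiseSEM", "lavaan", "semPlot", "cowplot")

# requireNamespace() returns FALSE (without an error) for packages not installed
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Not installed yet: ", paste(missing_pkgs, collapse = ", "),
          "; install them with install.packages().")
}
```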
2. Constructing the composite variable
This follows the worked example in https://jslefche.github.io/sem_book, except that the data here are standardized:
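# https://jslefche.github.io/sem_book/composite-variables.html#grace-dat.keeley-revisited-a-worked-example
model <- "
  # composite formed from cover and its square, with fixed weights
  composite <~ 3.11 * cover + -2.14 * coversq
  # richness depends on the composite and on fire severity
  rich ~ composite + firesev
  cover ~ firesev
  # covariances of coversq with cover (residual) and with firesev
  cover ~~ coversq
  firesev ~~ coversq
"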
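Note that the relationship between cover and rich may be nonlinear, which is why a quadratic term (coversq) is included. Plotting rich against cover with a quadratic fit: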
dat.keeley <- piecewiseSEM::keeley
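# nonlinear relationship: scatter plot with a quadratic fit
theme_set(
  cowplot::theme_minimal_grid(
    font_family = "Open Sans",  # requires the Open Sans font to be installed
    font_size = 15,
    line_size = 0.7
  )
)
dat.keeley |>
  ggplot(aes(cover, rich)) +
  geom_point(
    size = 2,
    color = "dodgerblue"
  ) +
  # quadratic regression curve; alpha = 0 hides the confidence band
  geom_smooth(
    method = "lm",
    formula = y ~ poly(x, 2),
    color = "dodgerblue",
    fill = "dodgerblue",
    linewidth = 1,
    alpha = 0
  ) +
  # annotate with R^2 and p-value of the same quadratic fit
  stat_poly_eq(
    mapping = use_label(c("r2", "p")),
    method = "lm",
    formula = y ~ poly(x, 2),
    size = 5,
    color = "dodgerblue",
    family = "Open Sans"
  )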
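The quadratic term can be checked in two quick ways. This base-R sketch uses simulated data, not the keeley data. First, a nested-model F test asks whether adding the squared term improves on a straight line. Second, comparing fitted values shows that the orthogonal `poly(x, 2)` used in `geom_smooth()` and `stat_poly_eq()` describes the same curve as the raw `cover + coversq` terms used in the SEM.

```r
# Simulated hump-shaped data (hypothetical, not the keeley data)
set.seed(3)
cover <- runif(90, min = 0.05, max = 1.5)
rich  <- 25 + 60 * cover - 35 * cover^2 + rnorm(90, sd = 6)
d <- data.frame(rich, cover, coversq = cover^2)

m_lin  <- lm(rich ~ cover, data = d)
m_quad <- lm(rich ~ cover + coversq, data = d)  # raw terms, as in the SEM
m_poly <- lm(rich ~ poly(cover, 2), data = d)   # orthogonal terms, as in the plot

# Nested-model F test: does adding coversq improve on a straight line?
print(anova(m_lin, m_quad))

# Raw and orthogonal polynomials span the same space, so the curves coincide
same_curve <- isTRUE(all.equal(fitted(m_quad), fitted(m_poly)))
print(same_curve)
```

The individual coefficients of the two parameterizations differ because `poly()` builds orthogonal polynomials. The fitted curve and R² are identical.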
Compute the quadratic term and add it to the data:
# polynomial term
dat.keeley$coversq <- dat.keeley$cover ^ 2
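Note the order of operations: coversq is computed here, before `scale()`. The standardized column is therefore `scale(cover^2)`, not the square of the standardized cover. The sketch below uses simulated cover values. Once cover and an intercept are in the model, the two versions carry the same information, because each is an exact linear function of the other plus cover. The order used in this post, however, leaves coversq much more strongly correlated with cover, which may affect how stably the individual terms are estimated.

```r
# Simulated cover values (hypothetical, not the keeley data)
set.seed(4)
cover <- runif(90, min = 0.05, max = 1.5)
z <- as.numeric(scale(cover))               # standardized cover

sq_then_scale <- as.numeric(scale(cover^2)) # order used in the post
scale_then_sq <- z^2                        # square of the standardized cover

# Correlation of each squared term with the linear term
r_post    <- cor(z, sq_then_scale)
r_centred <- cor(z, scale_then_sq)
print(c(post_order = r_post, square_after_scaling = r_centred))

# sq_then_scale is an exact linear combination of 1, z and z^2,
# so both versions span the same space once z is in the model
span_resid <- max(abs(residuals(lm(sq_then_scale ~ z + scale_then_sq))))
print(span_resid)
```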
Standardize all columns:
# scale the data
dat.keeley <- data.frame(scale(dat.keeley))
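Once every column is standardized, regression coefficients are on a different scale than on the raw data, and so are any composite weights derived from them. Each coefficient becomes b * sd(x) / sd(y). One way to obtain fixed weights that match the standardized data is to take the partial coefficients of cover and coversq from `lm(rich ~ cover + coversq + firesev)` and paste them into the lavaan string. This is an assumption on my part, not necessarily how 3.11 and -2.14 were obtained.

The sketch below does both steps on simulated data with keeley-like column names. With the real data, the `lm()` call would be run on the standardized `dat.keeley`.

```r
# Simulated raw data with keeley-like column names (values are made up)
set.seed(2)
n <- 90
firesev <- rnorm(n, mean = 4.5, sd = 1.5)
cover   <- pmax(0.05, 1.2 - 0.1 * firesev + rnorm(n, sd = 0.25))
rich    <- 20 + 60 * cover - 30 * cover^2 - 2 * firesev + rnorm(n, sd = 8)
raw <- data.frame(rich, cover, coversq = cover^2, firesev)
std <- data.frame(scale(raw))  # standardized, as in the post

# 1) Standardized coefficients equal raw coefficients times sd(x) / sd(y)
vars   <- c("cover", "coversq", "firesev")
b_raw  <- coef(lm(rich ~ cover + coversq + firesev, data = raw))[vars]
b_std  <- coef(lm(rich ~ cover + coversq + firesev, data = std))[vars]
b_conv <- b_raw * vapply(raw[vars], sd, numeric(1)) / sd(raw$rich)
print(rbind(raw = b_raw, standardized = b_std, converted = b_conv))

# 2) Paste the standardized weights of cover and coversq into a lavaan string
model_est <- sprintf(
  "composite <~ %.3f * cover + %.3f * coversq
rich ~ composite + firesev
cover ~ firesev
cover ~~ coversq
firesev ~~ coversq",
  b_std[["cover"]], b_std[["coversq"]]
)
cat(model_est, "\n")

# 3) OLS check: with these weights the composite carries the combined effect
#    of cover and coversq, so its coefficient comes out as 1
std$composite <- b_std[["cover"]] * std$cover + b_std[["coversq"]] * std$coversq
b_comp <- coef(lm(rich ~ composite + firesev, data = std))[["composite"]]
print(b_comp)
```

The last check reflects a property of ordinary least squares. The full regression's solution is also available to the restricted model, so the restricted model reproduces it exactly.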
Fit the model with lavaan:
# fit SEM
sem.fit <- sem(model, data = dat.keeley)
summary(sem.fit)
## lavaan 0.6.15 ended normally after 45 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 9
##
## Number of observations 90
##
## Model Test User Model:
##
## Test statistic 1.844
## Degrees of freedom 1
## P-value (Chi-square) 0.174
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Composites:
## Estimate Std.Err z-value P(>|z|)
## composite <~
## cover 58.000
## coversq -28.578
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## rich ~
## composite 0.008 0.003 2.386 0.017
## firesev -0.248 0.107 -2.312 0.021
## cover ~
## firesev -0.437 0.095 -4.611 0.000
##
## Covariances:
## Estimate Std.Err z-value P(>|z|)
## .cover ~~
## coversq 0.794 0.121 6.588 0.000
## coversq ~~
## firesev -0.376 0.112 -3.369 0.001
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .rich 0.804 0.120 6.708 0.000
## .cover 0.800 0.119 6.708 0.000
## coversq 0.989 0.147 6.708 0.000
## firesev 0.989 0.147 6.708 0.000
## composite 0.000
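The p-values in this output can be reproduced from the reported statistics in base R. The model chi-square test uses the upper tail of a chi-square distribution, and each z-value gets a two-sided normal p-value. The numbers below are copied from the output above.

lavaan computes z as Estimate / Std.Err from unrounded values. Dividing the rounded printed numbers (0.008 / 0.003) therefore does not reproduce 2.386 exactly. The composite's variance of 0.000 has no standard error, meaning it is fixed. This is consistent with a composite having no error term.

```r
# Model chi-square test: statistic 1.844 on 1 degree of freedom
p_chisq <- pchisq(1.844, df = 1, lower.tail = FALSE)

# Two-sided Wald tests: p = 2 * P(Z > |z|)
z <- c(composite = 2.386, firesev = -2.312, cover_on_firesev = -4.611)
p_wald <- 2 * pnorm(abs(z), lower.tail = FALSE)

print(round(c(chisq = p_chisq, p_wald), 3))
```

With p = 0.174 > 0.05, the chi-square test does not indicate a significant misfit between the model and the data.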
3. Plotting
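# plot SEM
par(family = "Open Sans", cex = 1.3)  # base-graphics font and text scaling
semPaths(
  sem.fit,
  what = "std",          # edge width/color based on standardized estimates
  whatLabels = "std",    # edge labels show standardized estimates
  residuals = FALSE,     # hide residual variances
  # style = "lisrel",
  layout = "tree2",
  fade = FALSE,          # do not fade weaker edges
  edge.color = "dodgerblue3",
  edge.label.cex = 1,
  label.cex = 2,
  label.prop = 0.8,
  edge.label.color = "black",
  nCharNodes = 20,       # number of characters to abbreviate node labels
  shapeMan = "square",   # shape of manifest (observed) variables
  sizeMan = 5,
  curve = 1
)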
Conclusion
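Since rich (species richness) is a count, it is generally assumed to follow a Poisson distribution. The lavaan fit above, however, treats it as a continuous, standardized variable. Could the model be refined further along these lines?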
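One possible direction is to fit the richness submodel as a Poisson GLM, keeping rich on its original count scale (after `scale()` it is no longer a count). This base-R sketch uses simulated data with hypothetical coefficients; it is not a re-analysis of the keeley data.

The composite weights come from a Poisson GLM on the components. The composite then enters `glm(rich ~ composite + firesev, family = poisson)`. As in the Gaussian case, its coefficient comes out at 1, because the full model's maximum-likelihood solution also lies within the restricted model.

```r
# Simulated stand-in data (hypothetical coefficients, NOT the keeley data)
set.seed(6)
n <- 90
firesev <- rnorm(n, mean = 4.5, sd = 1.5)
cover   <- pmin(1.5, pmax(0.05, 1.2 - 0.1 * firesev + rnorm(n, sd = 0.25)))
d <- data.frame(firesev, cover, coversq = cover^2)
eta <- 3 + 1.2 * d$cover - 0.6 * d$coversq - 0.08 * d$firesev
d$rich <- rpois(n, lambda = exp(eta))   # counts, left unstandardized

# Step 1: composite weights from a Poisson GLM on the components
w <- coef(glm(rich ~ cover + coversq + firesev, family = poisson, data = d))
d$composite <- w[["cover"]] * d$cover + w[["coversq"]] * d$coversq

# Step 2: submodels; rich is Poisson, cover stays Gaussian
m_rich  <- glm(rich ~ composite + firesev, family = poisson, data = d)
m_cover <- lm(cover ~ firesev, data = d)
print(summary(m_rich)$coefficients)
```

piecewiseSEM's `psem()` accepts fitted `lm` and `glm` objects, so `m_cover` and `m_rich` could in principle be assembled into a piecewise SEM. That step is not shown here and has not been checked against the keeley data.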