
Replication | Polynomial Regression in R

萜妹 萜心话 2023-05-14

Hello everyone! I finally found an article that uses polynomial regression and makes its data and code public, so today let's replicate it together.

Flynn, F. J., & Lide, C. R. (2022). Communication Miscalibration: The Price Leaders Pay for Not Sharing Enough. Academy of Management Journal. https://doi.org/10.5465/amj.2021.0245

Overview

This article seeks to show that, when a leader's communication is miscalibrated, under-communication is more likely than over-communication to create the perception that the leader lacks empathy, which in turn lowers employees' evaluations of the leader's ability.

Study 3 Result

The hypotheses are:

  • Hypothesis 1: Managers are more likely to be seen as under-communicating than over-communicating.
  • Hypothesis 2: When a leader's communication is seen as well-calibrated, employees give more favorable evaluations than when it is miscalibrated.
  • Hypothesis 3a: Leaders who under-communicate are seen as less empathic than leaders who over-communicate.
  • Hypothesis 3b: Leaders who under-communicate are seen as having lower leadership ability than leaders who over-communicate.
  • Hypothesis 4: The negative effect of under-communication on leadership evaluations is mediated by a perceived lack of empathy.

The studies are:

  • Study 1a: uses archival 360-degree feedback data, with under- vs. over-communication coded by hand, to test Hypothesis 1;
  • Study 1b: recruits 400 participants who rate communication with their leader on a scale (1 = under-communicates, 5 = well-calibrated, 9 = over-communicates), to test Hypothesis 1;
  • Study 2: recruits 350 participants and manipulates the leader's communication (3 conditions), using one-way ANOVA to test Hypotheses 2-4;
  • Study 3: uses polynomial regression to test Hypotheses 1-4.

This post focuses on the polynomial-regression workflow, using Study 3 as the example.

Public Data

Download link: https://osf.io/fkqhb/?view_only=9415b6468fbf4e8f9041af0dac2c8d43


The author very kindly shared the .Rmd file directly; just download it and open it.

Replicating the Results

Software Setup

Installing R Packages

After you open the .Rmd file, RStudio will most likely warn you that some packages are not installed. Just click [Install].


Preliminaries

# set knitting options
knitr::opts_chunk$set(echo = FALSE)
knitr::opts_chunk$set(warning = FALSE)
knitr::opts_chunk$set(message = FALSE)
knitr::opts_chunk$set(comment = NA)

These lines set knitr chunk options; they only affect how the rendered output reads and have nothing to do with the data analysis.

# load packages
library("knitr") # for rendering the RMarkdown file
library("kableExtra") # for rendering tables
library("broom") # for tyding-up data sets
library("tibble") # explore variables & values in the data frame
library("magrittr") # use pipe operator
library("patchwork") # for organizing plots
library("corrr") # for correlations
library("GGally") # to vizualize relationships between variables
library("mediation") # for mediation analysis
library("emmeans") # for planned contrasts
library("reghelper") # for simple slopes analysis
library("lme4") # for linear mixed effects models
library("lavaan") # for mediation
library("semPlot") # for mediation
library("ggpubr") # for plots
library("car")
library("RSA")
library("Rmisc") # for error bars on plots
library("rstatix") # for basic stats tests
library("tidyverse") # data wrangling and visualization

Load the packages.

# set graph themes
source("psych290RcodeV1.R")

This step throws an error when I run it, presumably because I don't have the file "psych290RcodeV1.R". It doesn't affect the data analysis, though.

# IMPORT CLEANED DATA
df.data <- read.csv("data_Study3.csv")

The author's original line reads "data/data_Study3.csv", but in the public repository the .Rmd and the .csv sit in the same folder, so I adjusted the path.

Hypothesis 1

# paired t-test
t.test(df.data$actual, df.data$ought,
       paired = TRUE)

This runs a paired-samples t-test on the actual and ought columns.

Result

This produces the numbers highlighted in orange in the paper.

Results in the paper
## group-level descriptive stats
df.data %>%
  select(actual, ought) %>%
  get_summary_stats(., type = "mean_sd") %>%
  select(-n)

This computes descriptive statistics for the actual and ought columns, producing the numbers highlighted in yellow.

Result
# effect size
## transform data
df.data_long <- df.data %>%
  select(participant, ought, actual) %>%
  pivot_longer(., cols = c(ought, actual),
               names_to = "scale")

This converts the wide data (one row per ID holding multiple variables; 222 rows here) into long data (one row per ID-variable pair; 444 rows here).

Wide vs. long data
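To see what this reshape does, here is a minimal sketch on made-up toy data (the column names mirror the study's, but the values are invented):

```r
library(tidyr)

# Toy wide data: one row per participant, two measure columns (invented values)
df_wide <- data.frame(participant = 1:2,
                      ought  = c(4, 5),
                      actual = c(3, 2))

# pivot_longer stacks the two columns: one row per participant-scale pair
df_long <- pivot_longer(df_wide, cols = c(ought, actual),
                        names_to = "scale", values_to = "value")

nrow(df_wide)  # 2 rows in wide form
nrow(df_long)  # 4 rows in long form (2 participants x 2 scales)
```

The same logic scales the study's 222 wide rows up to 444 long rows.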
## calculate cohen's d
cohens_d(df.data_long, value ~ scale, paired = TRUE)

This computes the effect size Cohen's d for the difference, producing the numbers highlighted in green.

Hypothesis 2: Congruence Is Better

Data Preparation

#-------------------
# DATA PREPARATION
#-------------------
df <- df.data %>%
  select(ability, empathy, X = actual, Y = ought)

This selects the four columns ability, empathy, actual, and ought from df.data to form a new data frame named df, renaming actual to X and ought to Y.

# center predictors at the grand mean
grandmean <- mean(c(df$X, df$Y), na.rm=T)
df$X <- df$X-grandmean
df$Y <- df$Y-grandmean

This is grand-mean centering: compute the mean over all X and Y values pooled together, then subtract it from X and from Y.
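A quick toy illustration of the same grand-mean centering (the numbers are invented, not the study's):

```r
# Invented toy values for two predictors
X <- c(1, 2, 3)
Y <- c(2, 4, 6)

# One grand mean pooled over BOTH predictors, then subtracted from each;
# this keeps X and Y on the same scale, which response-surface analysis requires
grandmean <- mean(c(X, Y))   # = 3
X_c <- X - grandmean
Y_c <- Y - grandmean

mean(c(X_c, Y_c))  # 0: the pooled values are now centered at zero
```

Note that each variable is not centered at its own mean; the single pooled mean preserves the meaning of the X = Y congruence line.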

# add squared and interaction terms of the predictors (required to inspect multicollinearity)
df$X2 <- df$X^2
df$XY <- df$X*df$Y
df$Y2 <- df$Y^2

This computes the squared terms and the interaction term.

# inspect potential multicollinearity using the variance inflation factor (VIF),
# (see Fox, 2016 for a discussion of VIFs and their cutoffs)
lm_sqd <- lm(ability ~ X + Y + X2 + XY + Y2, data=df)
vif(lm_sqd)

This inspects potential multicollinearity using the variance inflation factor (VIF); see Fox (2016) for a discussion of VIFs and their cutoffs.

Result

None of the author's VIF values exceed 3, so multicollinearity is not a concern.

Testing the Hypothesis

#------------------------------------------------------------------------
# TEST THE CONGRUENCE HYPOTHESIS FOR THE OUTCOME VARIABLE ability
#------------------------------------------------------------------------

# ESTIMATE THE POLYNOMIAL MODEL
rsa_sqd <- RSA(ability ~ X*Y, data=df, model="full")

This swaps the traditional regression call (lm or glm) for a polynomial-regression/response-surface analysis (RSA).

The meaning is the same as before: fit a polynomial regression of the outcome ability on X and Y, using the data in df, with model = "full", and store the result as rsa_sqd for later use.

# GET PARAMETERS
## congruence hypothesis supported if...
### 1) p10 is n.s.
### 2) p11 confidence interval includes 1
### 3) a3 is n.s.
### 4) a4 is significant and negative


getPar(rsa_sqd) %>%
  filter(label %in% c("b0", "b1", "b2", "b3", "b4", "b5",
                      "p10", "p11",
                      "a1", "a2", "a3", "a4")) %>%
  mutate_at(.vars = vars(c(est, se, pvalue, ci.lower, ci.upper)),
            .funs = ~ round(., 4)) %>%
  select(-z)

### *NOTE* - a1 is significant, meaning there is a linear common main effect of X and Y (i.e., the effect of congruence on perceived ability strengthens as employees report greater quantities of actual and expected communication)

Nothing needs changing here; just run it as-is.

Result

The congruence hypothesis is supported if: (1) along the line of incongruence (L = -T), the slope a3 is non-significant and the curvature a4 is negative and significant; and (2) the first principal axis does not deviate from the line of congruence (L = T), i.e., its slope p11 = 1 and its intercept p10 = 0. All of these conditions hold here, so the hypothesis is supported.
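These criteria can be written out as explicit checks. The sketch below runs on a toy parameter table whose columns mimic getPar() output (label, est, pvalue, ci.lower, ci.upper); the numbers are invented for illustration, not the study's estimates:

```r
# Sketch: encode the congruence criteria as explicit checks.
# 'pars' is a toy stand-in for getPar(rsa_sqd) output; all values are invented.
pars <- data.frame(
  label    = c("p10", "p11", "a3", "a4"),
  est      = c( 0.05,  0.98, -0.03, -0.20),
  pvalue   = c( 0.60,    NA,  0.70,  0.001),
  ci.lower = c(-0.10,  0.90, -0.12, -0.30),
  ci.upper = c( 0.20,  1.06,  0.06, -0.10)
)
par_row <- function(lbl) pars[pars$label == lbl, ]

congruence_supported <-
  par_row("p10")$pvalue > .05 &&                                    # p10 n.s. (intercept ~ 0)
  par_row("p11")$ci.lower <= 1 && par_row("p11")$ci.upper >= 1 &&   # CI of p11 includes 1
  par_row("a3")$pvalue > .05 &&                                     # a3 n.s. (no slope along L = -T)
  par_row("a4")$pvalue < .05 && par_row("a4")$est < 0               # a4 negative and significant

congruence_supported  # TRUE for these toy values
```

With the real rsa_sqd object, the same checks would be applied to the filtered getPar() table shown above.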

Results in the paper

Plotting

#------------------------------------------------------------------------
# PLOTS
#------------------------------------------------------------------------

# ESTIMATED MODEL
plot(rsa_sqd, legend=F, param=F,
     axes=c("LOC", "LOIC", "PA1"),
     project=c("LOC", "LOIC", "PA1"),
     xlab="Perceived\nQuantity", ylab="Preferred\nQuantity", zlab="Perceived Ability")

ESTIMATED MODEL

This is the figure that appears in the author's paper.

# CONTOUR MODEL
plot(rsa_sqd, type="contour",
     xlab="Actual X", ylab="Ought Y", zlab="Ability Z")

CONTOUR MODEL

# INTERACTIVE MODEL
plot(rsa_sqd, type="interactive",
     xlab="Actual X", ylab="Ought Y", zlab="Ability Z")

INTERACTIVE MODEL

The last two are just alternative ways of drawing the same surface; they don't have to be reported.

Hypothesis 3b: Asymmetry

# t-test
df.data %>%
  filter(communication %in% c("over", "under")) %>%
  t.test(ability ~ communication, data = .,
         var.equal = TRUE)

## group-level descriptive stats
df.data %>%
  filter(communication %in% c("over", "under")) %>%
  group_by(communication) %>%
  get_summary_stats(ability, type = "mean_sd") %>%
  select(-n)

This is an independent-samples t-test, much the same as for H1, so I won't walk through it again.

# effect size
lsr::cohensD(df.data$ability[df.data$communication == "under"],
             df.data$ability[df.data$communication == "over"])

This computes the effect size Cohen's d in a different way; the result is a single number rather than a table.
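For intuition, the pooled-SD Cohen's d that lsr::cohensD computes by default can be reproduced by hand on invented toy groups:

```r
# Two invented groups (toy values, not the study's data)
g1 <- c(4, 5, 6, 5)
g2 <- c(2, 3, 3, 4)

# Pooled standard deviation: weighted average of the two group variances
sp <- sqrt(((length(g1) - 1) * var(g1) + (length(g2) - 1) * var(g2)) /
             (length(g1) + length(g2) - 2))

# Cohen's d: mean difference expressed in pooled-SD units
d <- (mean(g1) - mean(g2)) / sp
d  # about 2.45 for these toy numbers
```

The formula makes clear why the output is a single standardized number rather than a table.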

Results in the paper
# visualization
ability_errorBars <- summarySE(df.data, measurevar = "ability", groupvars = "communication") %>%
  mutate_at(.vars = vars(communication),
            .funs = ~ recode(., `under` = "Under-Communicates",
                             `calibrated` = "Well-Calibrated",
                             `over` = "Over-Communicates")) %>%
  mutate_at(.vars = vars(communication),
            .funs = ~ factor(., levels = c("Under-Communicates", "Well-Calibrated", "Over-Communicates")))

ggplot(data = ability_errorBars,
       mapping = aes(x = communication,
                     y = ability)) +
  geom_bar(position = "dodge", stat = "identity", color = "black", width = .5) +
  geom_errorbar(aes(ymin = ability - se, ymax = ability + se),
                width = .15, position = position_dodge(.5)) +
  theme_Publication_leaveLegend() +
  scale_fill_grey() +
  coord_cartesian(ylim = c(0, 8)) +
  scale_y_continuous(limits = c(0, 8), expand = c(0, 0)) +
  labs(x = "\nLeader Communication Calibration",
       y = "Perceived Ability\n",
       title = "")
# ggsave("Study 3_leader ability_withSE.png", dpi = 800)

This is the visualization step. It errors for me because I couldn't load "psych290RcodeV1.R" earlier; I'll replicate it once that's sorted out.

Hypothesis 4: Mediation

# define the block variable function
block.poly = function(y, m, x, z, controls, data, boot.n) {
  require(lavaan)
  # Setup the variables that will be used in the analysis
  data$b_y = data[,y]
  data$b_m = data[,m]
  data$b_x = data[,x]
  data$b_z = data[,z]

  # standardize and create the polynomial components
  data$b_x_std = scale(data$b_x, center=T, scale=T)
  data$b_z_std = scale(data$b_z, center=T, scale=T)
  data$b_x_std_sq = data$b_x_std^2
  data$b_z_std_sq = data$b_z_std^2
  data$b_x_stdXb_z_std = data$b_x_std*data$b_z_std

  # First, create the block variable for the polynomial components predicting the mediator
  frmla = as.formula("b_m ~ b_x_std + b_z_std + b_x_std_sq + b_x_stdXb_z_std + b_z_std_sq")
  bl_1 = lm(frmla, data=data)
  bl_1_co = summary(bl_1)$coefficients[2:6,1]
  bl_1_vars = names(bl_1_co)

  # Second, create the block info predicting the outcome
  frmla = as.formula(paste(y,"~",paste0(controls, collapse="+"),"+b_x_std + b_z_std + b_x_std_sq + b_x_stdXb_z_std + b_z_std_sq", sep=""))
  bl_2 = lm(frmla, data=data)
  bl_2_co = summary(bl_2)$coefficients[(length(controls)+2):(length(controls)+6),1]
  bl_2_vars = names(bl_2_co)

  data$bl_1_var = 0
  data$bl_2_var = 0
  for(i in 1:5) {
    data$bl_1_var = data$bl_1_var + bl_1_co[i]*data[,bl_1_vars[i]]
    data$bl_2_var = data$bl_2_var + bl_2_co[i]*data[,bl_2_vars[i]]
  }

  line1 = paste("b_m ~ ",paste0(controls, collapse="+"), "+ b1*bl_1_var", sep="")
  line2 = paste("b_y ~ ",paste0(controls, collapse="+"), "+ bl_2_var + b2*b_m", sep="")
  line3 = "a1 := b1*b2"
  path.model = paste(line1, line2, line3, sep="\n")
  o = sem(model=path.model, data=data, se="boot", bootstrap=boot.n)
  return(o)
}

# the defined parameter a1 is the indirect effect of the congruence variables (perceived and preferred communication quantity) on the ultimate outcome (perceived ability) through the mediator (perceived empathy)

This part is fixed boilerplate: it defines a reusable routine. To use it, we only need to supply x, m, y, z, and the corresponding data frame.

# 5000 bootstrapped samples (no controls)
out = block.poly(y="ability", m="empathy", x="actual", z="ought", controls=NULL,
                 data=df.data, boot.n=5000)
parameterestimates(out, standardized = TRUE) %>%
  filter(label %in% c("b1", "b2", "a1")) %>%
  select(label, unstd_b = est, std_B = std.all, se, pvalue, ci.lower, ci.upper)
Result
Results in the paper
# 5000 bootstrapped samples (controlling for job satisfaction)
out_controlJS = block.poly(y="ability", m="empathy", x="actual", z="ought",
                           controls="job_satisfaction", data=df.data, boot.n=5000)
parameterestimates(out_controlJS, standardized = TRUE) %>%
  filter(label %in% c("b1", "b2", "a1")) %>%
  select(label, unstd_b = est, std_B = std.all, se, pvalue, ci.lower, ci.upper)

If you have control variables, pass them through the controls argument.

Other Notes

At the end, the author also provides descriptive statistics, reliabilities, and so on. Space is limited here, so I'll leave those for interested readers to try on their own.


And that's it for this post! It took far longer than I expected, which is why it's arriving so late (TAT).

Still, working through it myself was very rewarding. I highly recommend trying it if you're interested in polynomial regression.

Reference:

Fox, J. (2016). Applied Regression Analysis and Generalized Linear Models (3rd ed.). Thousand Oaks, CA: Sage Publications.

Next up: Exemplar | Close Reading of Top-Journal Introductions (Part 10)

Previous posts:

Tips | Polynomial Regression: A How-To
Exemplar | Studying Change with Polynomial Regression
Replication | Experimental Tests in R
Replication | Moderation Plots with Confidence Intervals
