
Replication | Polynomial Regression in R

萜妹 萜心话 2023-05-14

Hello everyone! I finally found an article that uses polynomial regression and makes its data and code public, so today let's replicate it together.

Flynn, F. J., & Lide, C. R. (2022). Communication Miscalibration: The Price Leaders Pay for Not Sharing Enough. Academy of Management Journal. https://doi.org/10.5465/amj.2021.0245

Overview

This article seeks to show that, when a leader's communication is miscalibrated, under-communication is more likely than over-communication to create the perception that the leader lacks empathy, which in turn lowers employees' evaluations of the leader's ability.

Study 3 Result

The hypotheses are:

  • Hypothesis 1: Managers are more likely to be seen as under-communicating than over-communicating.
  • Hypothesis 2: When a leader's communication is seen as well-calibrated, employees give more favorable evaluations than when it is miscalibrated.
  • Hypothesis 3a: Leaders who under-communicate are seen as less empathic than leaders who over-communicate.
  • Hypothesis 3b: Leaders who under-communicate are seen as having lower leadership ability than leaders who over-communicate.
  • Hypothesis 4: The negative effect of under-communication on leadership evaluations is mediated by a perceived lack of empathy.

The studies are:

  • Study 1a: uses archival 360-degree feedback data, with under- vs. over-communication coded by hand, to test Hypothesis 1;
  • Study 1b: recruits 400 participants who rate communication with their leader on a scale (1 = under-communicates, 5 = well-calibrated, 9 = over-communicates), to test Hypothesis 1;
  • Study 2: recruits 350 participants and manipulates the leader's communication (3 conditions), using one-way ANOVA to test Hypotheses 2-4;
  • Study 3: uses polynomial regression to test Hypotheses 1-4.

This post focuses on the polynomial-regression workflow, using Study 3 as the example.

Public Data

Download link: https://osf.io/fkqhb/?view_only=9415b6468fbf4e8f9041af0dac2c8d43


The author very kindly shared the .Rmd file directly; just download it and open it.

Replicating the Results

Software Setup

Installing R Packages

After you open the .Rmd file, RStudio will most likely warn you that some packages are not installed. Just click [Install].


Preliminaries

# set knitting options
knitr::opts_chunk$set(echo = FALSE)
knitr::opts_chunk$set(warning = FALSE)
knitr::opts_chunk$set(message = FALSE)
knitr::opts_chunk$set(comment = NA)

These lines set knitr chunk options; they only affect how the rendered output reads and have nothing to do with the data analysis.

# load packages
library("knitr") # for rendering the RMarkdown file
library("kableExtra") # for rendering tables
library("broom") # for tyding-up data sets
library("tibble") # explore variables & values in the data frame
library("magrittr") # use pipe operator
library("patchwork") # for organizing plots
library("corrr") # for correlations
library("GGally") # to vizualize relationships between variables
library("mediation") # for mediation analysis
library("emmeans") # for planned contrasts
library("reghelper") # for simple slopes analysis
library("lme4") # for linear mixed effects models
library("lavaan") # for mediation
library("semPlot") # for mediation
library("ggpubr") # for plots
library("car")
library("RSA")
library("Rmisc") # for error bars on plots
library("rstatix") # for basic stats tests
library("tidyverse") # data wrangling and visualization

Load the packages.

# set graph themes
source("psych290RcodeV1.R")

This step throws an error when I run it, presumably because I don't have the file "psych290RcodeV1.R". It doesn't affect the data analysis, though.

# IMPORT CLEANED DATA
df.data <- read.csv("data_Study3.csv")

The author's original line reads "data/data_Study3.csv", but in the public repository the .Rmd and the .csv sit in the same folder, so I adjusted the path.

Hypothesis 1

# paired t-test
t.test(df.data$actual, df.data$ought,
       paired = TRUE)

This runs a paired-samples t-test on the actual and ought columns.

Result

This produces the numbers highlighted in orange in the paper.

Results in the paper
## group-level descriptive stats
df.data %>%
  select(actual, ought) %>%
  get_summary_stats(., type = "mean_sd") %>%
  select(-n)

This computes descriptive statistics for the actual and ought columns, producing the numbers highlighted in yellow.

Result
# effect size
## transform data
df.data_long <- df.data %>%
  select(participant, ought, actual) %>%
  pivot_longer(., cols = c(ought, actual),
               names_to = "scale")

This converts the wide data (one row per ID holding multiple variables; 222 rows here) into long data (one row per ID-variable pair; 444 rows here).

Wide vs. long data
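To see what this reshape does, here is a minimal sketch on made-up toy data (the column names mirror the study's, but the values are invented):

```r
library(tidyr)

# Toy wide data: one row per participant, two measure columns (invented values)
df_wide <- data.frame(participant = 1:2,
                      ought  = c(4, 5),
                      actual = c(3, 2))

# pivot_longer stacks the two columns: one row per participant-scale pair
df_long <- pivot_longer(df_wide, cols = c(ought, actual),
                        names_to = "scale", values_to = "value")

nrow(df_wide)  # 2 rows in wide form
nrow(df_long)  # 4 rows in long form (2 participants x 2 scales)
```

The same logic scales the study's 222 wide rows up to 444 long rows.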
## calculate cohen's d
cohens_d(df.data_long, value ~ scale, paired = TRUE)

This computes the effect size Cohen's d for the difference, producing the numbers highlighted in green.

Hypothesis 2: Congruence Is Better

Data Preparation

#-------------------
# DATA PREPARATION
#-------------------
df <- df.data %>%
  select(ability, empathy, X = actual, Y = ought)

This selects the four columns ability, empathy, actual, and ought from df.data to form a new data frame named df, renaming actual to X and ought to Y.

# center predictors at the grand mean
grandmean <- mean(c(df$X, df$Y), na.rm=T)
df$X <- df$X-grandmean
df$Y <- df$Y-grandmean

This is grand-mean centering: compute the mean over all X and Y values pooled together, then subtract it from X and from Y.
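A quick toy illustration of the same grand-mean centering (the numbers are invented, not the study's):

```r
# Invented toy values for two predictors
X <- c(1, 2, 3)
Y <- c(2, 4, 6)

# One grand mean pooled over BOTH predictors, then subtracted from each;
# this keeps X and Y on the same scale, which response-surface analysis requires
grandmean <- mean(c(X, Y))   # = 3
X_c <- X - grandmean
Y_c <- Y - grandmean

mean(c(X_c, Y_c))  # 0: the pooled values are now centered at zero
```

Note that each variable is not centered at its own mean; the single pooled mean preserves the meaning of the X = Y congruence line.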

# add squared and interaction terms of the predictors (required to inspect multicollinearity)
df$X2 <- df$X^2
df$XY <- df$X*df$Y
df$Y2 <- df$Y^2

This computes the squared terms and the interaction term.

# inspect potential multicollinearity using the variance inflation factor (VIF),
# (see Fox, 2016 for a discussion of VIFs and their cutoffs)
lm_sqd <- lm(ability ~ X + Y + X2 + XY + Y2, data=df)
vif(lm_sqd)

This inspects potential multicollinearity using the variance inflation factor (VIF); see Fox (2016) for a discussion of VIFs and their cutoffs.

Result

None of the author's VIF values exceed 3, so multicollinearity is not a concern.

Testing the Hypothesis

#------------------------------------------------------------------------
# TEST THE CONGRUENCE HYPOTHESIS FOR THE OUTCOME VARIABLE ability
#------------------------------------------------------------------------

# ESTIMATE THE POLYNOMIAL MODEL
rsa_sqd <- RSA(ability ~ X*Y, data=df, model="full")

This swaps the traditional regression call (lm or glm) for a polynomial-regression/response-surface analysis (RSA).

The meaning is the same as before: fit a polynomial regression of the outcome ability on X and Y, using the data in df, with model = "full", and store the result as rsa_sqd for later use.

# GET PARAMETERS
## congruence hypothesis supported if...
### 1) p10 is n.s.
### 2) p11 confidence interval includes 1
### 3) a3 is n.s.
### 4) a4 is significant and negative


getPar(rsa_sqd) %>%
  filter(label %in% c("b0", "b1", "b2", "b3", "b4", "b5",
                      "p10", "p11",
                      "a1", "a2", "a3", "a4")) %>%
  mutate_at(.vars = vars(c(est, se, pvalue, ci.lower, ci.upper)),
            .funs = ~ round(., 4)) %>%
  select(-z)

### *NOTE* - a1 is significant, meaning there is a linear common main effect of X and Y (i.e., the effect of congruence on perceived ability strengthens as employees report greater quantities of actual and expected communication)

Nothing needs changing here; just run it as-is.

Result

The congruence hypothesis is supported if: (1) along the line of incongruence (L = -T), the slope a3 is non-significant and the curvature a4 is negative and significant; and (2) the first principal axis does not deviate from the line of congruence (L = T), i.e., its slope p11 = 1 and its intercept p10 = 0. All of these conditions hold here, so the hypothesis is supported.
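These criteria can be written out as explicit checks. The sketch below runs on a toy parameter table whose columns mimic getPar() output (label, est, pvalue, ci.lower, ci.upper); the numbers are invented for illustration, not the study's estimates:

```r
# Sketch: encode the congruence criteria as explicit checks.
# 'pars' is a toy stand-in for getPar(rsa_sqd) output; all values are invented.
pars <- data.frame(
  label    = c("p10", "p11", "a3", "a4"),
  est      = c( 0.05,  0.98, -0.03, -0.20),
  pvalue   = c( 0.60,    NA,  0.70,  0.001),
  ci.lower = c(-0.10,  0.90, -0.12, -0.30),
  ci.upper = c( 0.20,  1.06,  0.06, -0.10)
)
par_row <- function(lbl) pars[pars$label == lbl, ]

congruence_supported <-
  par_row("p10")$pvalue > .05 &&                                    # p10 n.s. (intercept ~ 0)
  par_row("p11")$ci.lower <= 1 && par_row("p11")$ci.upper >= 1 &&   # CI of p11 includes 1
  par_row("a3")$pvalue > .05 &&                                     # a3 n.s. (no slope along L = -T)
  par_row("a4")$pvalue < .05 && par_row("a4")$est < 0               # a4 negative and significant

congruence_supported  # TRUE for these toy values
```

With the real rsa_sqd object, the same checks would be applied to the filtered getPar() table shown above.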

Results in the paper

Plotting

#------------------------------------------------------------------------
# PLOTS
#------------------------------------------------------------------------

# ESTIMATED MODEL
plot(rsa_sqd, legend=F, param=F,
     axes=c("LOC", "LOIC", "PA1"),
     project=c("LOC", "LOIC", "PA1"),
     xlab="Perceived\nQuantity", ylab="Preferred\nQuantity", zlab="Perceived Ability")

ESTIMATED MODEL

This is the figure that appears in the author's paper.

# CONTOUR MODEL
plot(rsa_sqd, type="contour",
     xlab="Actual X", ylab="Ought Y", zlab="Ability Z")

CONTOUR MODEL

# INTERACTIVE MODEL
plot(rsa_sqd, type="interactive",
     xlab="Actual X", ylab="Ought Y", zlab="Ability Z")

INTERACTIVE MODEL

The last two are just alternative ways of drawing the same surface; they don't have to be reported.

Hypothesis 3b: Asymmetry

# t-test
df.data %>%
  filter(communication %in% c("over", "under")) %>%
  t.test(ability ~ communication, data = .,
         var.equal = TRUE)

## group-level descriptive stats
df.data %>%
  filter(communication %in% c("over", "under")) %>%
  group_by(communication) %>%
  get_summary_stats(ability, type = "mean_sd") %>%
  select(-n)

This is an independent-samples t-test, much the same as for H1, so I won't walk through it again.

# effect size
lsr::cohensD(df.data$ability[df.data$communication == "under"],
             df.data$ability[df.data$communication == "over"])

This computes the effect size Cohen's d in a different way; the result is a single number rather than a table.
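For intuition, the pooled-SD Cohen's d that lsr::cohensD computes by default can be reproduced by hand on invented toy groups:

```r
# Two invented groups (toy values, not the study's data)
g1 <- c(4, 5, 6, 5)
g2 <- c(2, 3, 3, 4)

# Pooled standard deviation: weighted average of the two group variances
sp <- sqrt(((length(g1) - 1) * var(g1) + (length(g2) - 1) * var(g2)) /
             (length(g1) + length(g2) - 2))

# Cohen's d: mean difference expressed in pooled-SD units
d <- (mean(g1) - mean(g2)) / sp
d  # about 2.45 for these toy numbers
```

The formula makes clear why the output is a single standardized number rather than a table.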

Results in the paper
# visualization
ability_errorBars <- summarySE(df.data, measurevar = "ability", groupvars = "communication") %>%
  mutate_at(.vars = vars(communication),
            .funs = ~ recode(., `under` = "Under-Communicates",
                             `calibrated` = "Well-Calibrated",
                             `over` = "Over-Communicates")) %>%
  mutate_at(.vars = vars(communication),
            .funs = ~ factor(., levels = c("Under-Communicates", "Well-Calibrated", "Over-Communicates")))

ggplot(data = ability_errorBars,
       mapping = aes(x = communication,
                     y = ability)) +
  geom_bar(position = "dodge", stat = "identity", color = "black", width = .5) +
  geom_errorbar(aes(ymin = ability - se, ymax = ability + se),
                width = .15, position = position_dodge(.5)) +
  theme_Publication_leaveLegend() +
  scale_fill_grey() +
  coord_cartesian(ylim = c(0, 8)) +
  scale_y_continuous(limits = c(0, 8), expand = c(0, 0)) +
  labs(x = "\nLeader Communication Calibration",
       y = "Perceived Ability\n",
       title = "")
# ggsave("Study 3_leader ability_withSE.png", dpi = 800)

This is the visualization step. It errors for me because I couldn't load "psych290RcodeV1.R" earlier; I'll replicate it once that's sorted out.

Hypothesis 4: Mediation

# define the block variable function
block.poly = function(y, m, x, z, controls, data, boot.n) {
  require(lavaan)
  # Setup the variables that will be used in the analysis
  data$b_y = data[,y]
  data$b_m = data[,m]
  data$b_x = data[,x]
  data$b_z = data[,z]

  # standardize and create the polynomial components
  data$b_x_std = scale(data$b_x, center=T, scale=T)
  data$b_z_std = scale(data$b_z, center=T, scale=T)
  data$b_x_std_sq = data$b_x_std^2
  data$b_z_std_sq = data$b_z_std^2
  data$b_x_stdXb_z_std = data$b_x_std*data$b_z_std

  # First, create the block variable for the polynomial components predicting the mediator
  frmla = as.formula("b_m ~ b_x_std + b_z_std + b_x_std_sq + b_x_stdXb_z_std + b_z_std_sq")
  bl_1 = lm(frmla, data=data)
  bl_1_co = summary(bl_1)$coefficients[2:6,1]
  bl_1_vars = names(bl_1_co)

  # Second, create the block info predicting the outcome
  frmla = as.formula(paste(y,"~",paste0(controls, collapse="+"),"+b_x_std + b_z_std + b_x_std_sq + b_x_stdXb_z_std + b_z_std_sq", sep=""))
  bl_2 = lm(frmla, data=data)
  bl_2_co = summary(bl_2)$coefficients[(length(controls)+2):(length(controls)+6),1]
  bl_2_vars = names(bl_2_co)

  data$bl_1_var = 0
  data$bl_2_var = 0
  for(i in 1:5) {
    data$bl_1_var = data$bl_1_var + bl_1_co[i]*data[,bl_1_vars[i]]
    data$bl_2_var = data$bl_2_var + bl_2_co[i]*data[,bl_2_vars[i]]
  }

  line1 = paste("b_m ~ ",paste0(controls, collapse="+"), "+ b1*bl_1_var", sep="")
  line2 = paste("b_y ~ ",paste0(controls, collapse="+"), "+ bl_2_var + b2*b_m", sep="")
  line3 = "a1 := b1*b2"
  path.model = paste(line1, line2, line3, sep="\n")
  o = sem(model=path.model, data=data, se="boot", bootstrap=boot.n)
  return(o)
}

# the defined parameter a1 is the indirect effect of the congruence variables (perceived and preferred communication quantity) on the ultimate outcome (perceived ability) through the mediator (perceived empathy)

This part is fixed boilerplate: it defines a reusable routine. To use it, we only need to supply x, m, y, z, and the corresponding data frame.

# 5000 bootstrapped samples (no controls)
out = block.poly(y="ability", m="empathy", x="actual", z="ought", controls=NULL,
                 data=df.data, boot.n=5000)
parameterestimates(out, standardized = TRUE) %>%
  filter(label %in% c("b1", "b2", "a1")) %>%
  select(label, unstd_b = est, std_B = std.all, se, pvalue, ci.lower, ci.upper)
Result
Results in the paper
# 5000 bootstrapped samples (controlling for job satisfaction)
out_controlJS = block.poly(y="ability", m="empathy", x="actual", z="ought",
                           controls="job_satisfaction", data=df.data, boot.n=5000)
parameterestimates(out_controlJS, standardized = TRUE) %>%
  filter(label %in% c("b1", "b2", "a1")) %>%
  select(label, unstd_b = est, std_B = std.all, se, pvalue, ci.lower, ci.upper)

If you have control variables, pass them through the controls argument.

Other Notes

At the end, the author also provides descriptive statistics, reliabilities, and so on. Space is limited here, so I'll leave those for interested readers to try on their own.


And that's it for this post! It took far longer than I expected, which is why it's arriving so late (TAT).

Still, working through it myself was very rewarding. I highly recommend trying it if you're interested in polynomial regression.

Reference:

Fox, J. (2016). Applied Regression Analysis and Generalized Linear Models (3rd ed.). Thousand Oaks, CA: Sage Publications.

Next up: Exemplar | Close Reading of Top-Journal Introductions (Part 10)

Previous posts:

Tips | Polynomial Regression: A How-To
Exemplar | Studying Change with Polynomial Regression
Replication | Experimental Tests in R
Replication | Moderation Plots with Confidence Intervals
