Software Applications | Stata: A Review of Causal Inference Methods and Their Implementation in Stata

数据Seminar 2022-12-31

The following article is from 连享会 Author 连享会

Author: 胡文涛 (Renmin University of China)
Email: hwtxwhr@163.com
Reposted from the WeChat public account 连享会 (ID: lianxh_cn)

Editor's note: this article is largely an abridged translation of the tutorial below, with thanks to its authors.
Source: Smith M J, Maringe C, Rachet B, et al. Tutorial: Introduction to computational causal inference using reproducible Stata, R and Python code[J]. arXiv preprint arXiv:2012.09920, 2020.
PDF: https://arxiv.org/pdf/2012.09920.pdf
Code: https://github.com/migariane/TutorialCausalInferenceEstimators

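1. Background

1.1 The potential outcomes framework

1.2 Three assumptions

2. The G-formula: formal definition and application

2.1 Non-parametric G-formula

2.2 Parametric G-formula

3. Inverse probability of treatment weighting (IPTW)

3.1 IPTW based on the propensity score

3.2 Marginal structural models with stabilized weights

3.3 Inverse probability weighting plus regression adjustment

4. Augmented inverse probability weighting

5. Data-adaptive estimation: ensemble learning targeted maximum likelihood estimation

6. Conclusion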

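1. Background

Whether in health research or in policy evaluation, the data available for identifying causal relationships are often purely observational, so a careful study design is needed to deal with confounding. A simple approach is to control for the confounders in a regression, but this is often not enough. This article introduces some recent developments in causal inference that handle confounding by building on the classical method of standardization.

We present the methods in the order in which they were developed, explain the principle behind each, and show how to implement it in Stata, so that readers can both follow how the methods evolved and apply them in their own research.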

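1.1 The potential outcomes framework

The causal effect of an intervention on an outcome is defined as a contrast between the potential outcomes under different levels of the intervention (i.e. $Y(1)$ and $Y(0)$), a framework formalized by Rubin. Consider the effect of right heart catheterisation (RHC) on short-term mortality ($Y$) among ICU patients, and write $A$ for the treatment indicator. The average treatment effect (ATE) of RHC on mortality is

$$ATE = E[Y(1) - Y(0)]$$

For each patient we observe only one outcome. The counterfactual is never observed: $Y(1)$ is unknown for untreated patients and $Y(0)$ is unknown for treated patients. If, however, patients were randomized to receive RHC, an unbiased estimate of the ATE could be written as

$$\widehat{ATE} = E[Y \mid A = 1] - E[Y \mid A = 0]$$

because randomization makes $E[Y(1)] = E[Y \mid A = 1]$ and $E[Y(0)] = E[Y \mid A = 0]$.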
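
The step from the causal contrast to the observed difference in means can be spelled out in two lines. The derivation below is a sketch that uses randomization (treatment $A$ independent of the potential outcomes) together with the consistency assumption described in Section 1.2:

```latex
\begin{aligned}
E[Y(1)] &= E[Y(1) \mid A = 1] && \text{(randomization: } Y(1) \perp A \text{)} \\
        &= E[Y \mid A = 1]    && \text{(consistency: } Y = Y(1) \text{ when } A = 1 \text{)}
\end{aligned}
```

The same argument with $A = 0$ gives $E[Y(0)] = E[Y \mid A = 0]$, and subtracting the two recovers the difference in observed means.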

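1.2 Three assumptions

Randomization greatly simplifies causal inference, but randomizing is often unethical in clinical settings and is not available in observational studies. Since causal inference usually relies on observational data, we need the following three (untestable) assumptions about the observed data:

Counterfactual consistency: for every individual who received the intervention, the observed outcome equals the potential outcome under the intervention. Likewise, for every individual who did not receive it, the observed outcome equals the potential outcome under no intervention. Formally:

$$Y = A\,Y(1) + (1 - A)\,Y(0)$$

Conditional exchangeability: in a randomized study, exchangeability holds by design, both marginally and fully. It means that the treated, had they not been treated, would have had the same average outcome as the untreated, and vice versa. This cannot be guaranteed in an observational study, but it can be assumed to hold if, within levels of the measured confounders $W$, the unmeasured risk factors for the outcome are equally distributed between the treated and the untreated. Formally:

$$Y(a) \perp A \mid W, \quad a \in \{0, 1\}$$

Conditional mean independence:

$$E[Y(a) \mid A = 1, W] = E[Y(a) \mid A = 0, W] = E[Y(a) \mid W]$$

Positivity: if every individual received the same level of the intervention, the average causal effect could not be computed. Treatment must be assigned in such a way that, with near certainty, some individuals end up at each level of the intervention. In other words, the probability of being assigned to each treatment level must be greater than 0 within every stratum of the confounders. For a binary confounder $W$, this means that $P(A = 1 \mid W = 1)$, $P(A = 1 \mid W = 0)$, $P(A = 0 \mid W = 1)$ and $P(A = 0 \mid W = 0)$ are all greater than 0.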
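
Taken together, the three assumptions are what allow a mean potential outcome to be written in terms of observed quantities. A short sketch of the argument follows, for a discrete confounder $W$ (the discreteness is an assumption made here only to keep the notation simple):

```latex
\begin{aligned}
E[Y(a)] &= \sum_{w} E[Y(a) \mid W = w]\, P(W = w)
          && \text{(law of total expectation)} \\
        &= \sum_{w} E[Y(a) \mid A = a, W = w]\, P(W = w)
          && \text{(conditional exchangeability)} \\
        &= \sum_{w} E[Y \mid A = a, W = w]\, P(W = w)
          && \text{(consistency)}
\end{aligned}
```

Positivity guarantees that every conditional expectation in the last line is defined, because each stratum $w$ with $P(W = w) > 0$ contains individuals with $A = a$. Taking the difference between $a = 1$ and $a = 0$ gives the standardized contrast used in Section 2.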

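2. The G-formula: formal definition and application

2.1 Non-parametric G-formula

In epidemiological research, regression models (linear, logistic, Cox, etc.) are routinely used to adjust for confounding in order to estimate causal effects, which assumes that the effect of the confounders is constant. In an observational study, however, the confounders to be adjusted for may vary over time (time-dependent confounding) or with individual characteristics, which biases the estimate. When two populations differ in their age distribution, for example, one has to standardize to the age distribution of one of them or to an external standard.

The G-formula generalizes classical standardization. For a binary treatment $A \in \{0, 1\}$ it yields an unconfounded marginal estimate of the ATE:

$$ATE = \sum_{w} \left[ \sum_{y} y\,P(Y = y \mid A = 1, W = w) - \sum_{y} y\,P(Y = y \mid A = 0, W = w) \right] P(W = w)$$

where

$$P(W = w) = \sum_{y, a} P(W = w, A = a, Y = y)$$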
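
It may help to see the formula at work on numbers small enough to check by hand. The sketch below uses a made-up table of 200 patients with one binary confounder; the variable names trt, conf and died and all cell counts are illustrative assumptions, not the RHC data. By hand, the stratum-specific risk differences are 0.4 - 0.2 = 0.2 for conf = 0 and 0.7 - 0.5 = 0.2 for conf = 1, so the standardized ATE is 0.2 × 0.35 + 0.2 × 0.65 = 0.2, whereas the crude difference in means is 78/120 - 25/80 = 0.3375.

```stata
* Toy data (hypothetical): one row per (trt, conf, died) cell, n = patients in the cell
clear
input trt conf died n
0 0 0 40
0 0 1 10
0 1 0 15
0 1 1 15
1 0 0 12
1 0 1  8
1 1 0 30
1 1 1 70
end
* One observation per patient (200 in total)
expand n

* Marginal distribution of the confounder, P(C = 1)
quietly summarize conf
scalar pc1 = r(mean)

* Conditional outcome means E(Y | A = a, C = c), stored as scalars m00 ... m11
forvalues a = 0/1 {
    forvalues c = 0/1 {
        quietly summarize died if trt == `a' & conf == `c'
        scalar m`a'`c' = r(mean)
    }
}

* G-formula: stratum-specific risk differences weighted by P(C = c)
scalar ate_g = (m11 - m01) * pc1 + (m10 - m00) * (1 - pc1)

* Crude (confounded) difference in means, for comparison
quietly regress died i.trt
scalar crude = _b[1.trt]

display "G-formula ATE = " ate_g "   crude difference = " crude
```

The gap between the two numbers comes entirely from the confounder: treated patients are concentrated in the stratum with the higher baseline risk, which inflates the crude contrast.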

The following example shows how to estimate the non-parametric G-formula in Stata using the RHC data.

set more off
lxhuse rhc.dta, clear
global Y death_d30 // outcome
global A rhc       // treatment
global C gender    // one confounder singled out from the set, to keep the exposition simple
global W gender age edu race carcinoma // the full set of five confounders
regress $Y $A $C  // Risk difference = 7.4%; 95% CI (4.84, 9.86); p < 0.001

*Non-parametric G-Formula for the ATE
proportion $C // marginal probabilities of C
matrix m = e(b)
gen genderf = m[1,1]
sum genderf
gen genderm = m[1,2]
sum genderm
cnssc install sumup, replace
sumup $Y, by($A $C) // conditional means of the outcome within each level (0, 1) of A and C
matrix y00 = r(Stat1) // [6,1] matrix for E(Y|A=0,C=0)
matrix y01 = r(Stat2) // [6,1] matrix for E(Y|A=0,C=1)
matrix y10 = r(Stat3) // [6,1] matrix for E(Y|A=1,C=0)
matrix y11 = r(Stat4) // [6,1] matrix for E(Y|A=1,C=1)
// see "matrix list y00": position subscript [3,1] is the one of interest
// Apply the G-formula. Despite their names, EY1 and EY0 hold the risk differences
// within C=1 and C=0, each weighted by the marginal probability of that stratum
gen EY1 = (y11[3,1] - y01[3,1]) * genderm
gen EY0 = (y10[3,1] - y00[3,1]) * genderf
qui: mean EY1 EY0
matrix ATE = r(table)
display "The ATE is: " ATE[1,1] + ATE[1,2] // sum over the two strata
drop EY1 EY0
// Also one can try
gen ATE = (y11[3,1] - y01[3,1]) * genderm + (y10[3,1] - y00[3,1]) * genderf
qui sum ATE

*Non-parametric G-Formula for the ATT
proportion $C if $A == 1 // distribution of C among the treated
matrix m = e(b)
gen genderfatet = m[1,1]
gen gendermatet = m[1,2]
// G-formula
gen EY1 = (y11[3,1] - y01[3,1]) * gendermatet
gen EY0 = (y10[3,1] - y00[3,1]) * genderfatet
qui: mean EY1 EY0
matrix ATT = r(table)
display "The ATT is: " ATT[1,1] + ATT[1,2] // sum over the two strata
drop EY1 EY0
// Also one can try
gen ATT = (y11[3,1] - y01[3,1]) * gendermatet + (y10[3,1] - y00[3,1]) * genderfatet
qui sum ATT
drop ATT

*Statistical inference using the bootstrap
// ATE
capture program drop ATE
program define ATE, rclass // As before but now define a program to estimate the ATE
capture drop y1
capture drop y0
capture drop ATE
sumup $Y, by($A $C)
matrix y00 = r(Stat1)
matrix y01 = r(Stat2)
matrix y10 = r(Stat3)
matrix y11 = r(Stat4)
gen ATE = (y11[3,1] - y01[3,1]) * genderm + (y10[3,1] - y00[3,1]) * genderf
qui sum ATE
return scalar ate = `r(mean)'
end
qui bootstrap r(ate), reps(1000): ATE // Bootstrap 1000 estimates of the ATE
estat boot, all
drop ATE

// ATT
capture program drop ATT
program define ATT, rclass
capture drop y0
capture drop ATT
sumup $Y, by($A $C)
matrix y00 = r(Stat1)
matrix y01 = r(Stat2)
matrix y10 = r(Stat3)
matrix y11 = r(Stat4)
gen ATT = (y11[3,1] - y01[3,1]) * gendermatet + (y10[3,1] - y00[3,1]) * genderfatet
qui sum ATT
return scalar att = `r(mean)'
end
qui bootstrap r(att), reps(1000): ATT
estat boot, all
drop ATT

*The G-formula can also be applied through a saturated regression model
regress $Y ibn.$A ibn.$A#c.($C), noconstant vce(robust) coeflegend
// coeflegend displays the coefficient legend, i.e. the _b[] names used below
predictnl ATE = (_b[1bn.rhc] + _b[1bn.rhc#c.gender] * gender) ///
              - (_b[0bn.rhc] + _b[0bn.rhc#c.gender] * gender)
mean ATE

*Alternatively, use the margins command
regress $Y ibn.$A ibn.$A#c.($C), noconstant vce(robust) // Fully saturated model specification
margins $A, vce(unconditional) // marginal mean of the outcome at each level of A
margins r.$A, contrast(nowald)  // contrast of the marginal means between treatment groups

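2.2 Parametric G-formula

In contrast to the non-parametric approach, which assumes no probability distribution, the parametric approach assumes that a specific probability distribution fits the data and computes the probabilities from its parameters. To compute the ATE, we first regress the outcome on the confounders separately within each treatment group (using a simple linear regression model) and then contrast the expected probabilities between the two treatment groups. Algebraically, the parametric G-formula estimate of the ATE is

$$\widehat{ATE} = \frac{1}{n} \sum_{i=1}^{n} \left[ \hat{E}(Y \mid A = 1, W_i) - \hat{E}(Y \mid A = 0, W_i) \right]$$

We now compute it directly in Stata: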

regress $Y $C if $A == 1 // Expected probability amongst those with RHC
predict double y1hat
regress $Y $C if $A == 0 // Expected probability amongst those without RHC
predict double y0hat
mean y1hat y0hat         // Difference between the expected conditional probabilities
lincom _b[y1hat] - _b[y0hat] // ATE and biased confidence interval

*Check the parametric regression adjustment with the teffects command
teffects ra ($Y $C) ($A) // Parametric G-Formula implementation in Stata
*Bootstrap inference for the parametric regression adjustment
capture program drop ATE
program define ATE, rclass
capture drop y1
capture drop y0
reg $Y $C if $A == 1
predict double y1, xb
quiet sum y1
reg $Y $C if $A == 0
predict double y0, xb
quiet sum y0
mean y1 y0
lincom _b[y1] - _b[y0]
return scalar ace = `r(estimate)'
end
qui bootstrap r(ace), reps(1000): ATE
estat boot, all
drop ATE

*Parametric multivariable regression adjustment with the G-formula
capture drop y1hat y0hat // created in the single-confounder example above
regress $Y $W if $A == 1 // Saturated regression model with all confounders for those with RHC
predict double y1hat
regress $Y $W if $A == 0 // Saturated regression model with all confounders for those without RHC
predict double y0hat
mean y1hat y0hat // ATE is the difference in expectations
lincom _b[y1hat] - _b[y0hat] // This command gives a biased confidence interval for the ATE

teffects ra ($Y $W) ($A) // check with the teffects command

*Parametric multivariable regression adjustment with the margins command
regress $Y ibn.$A ibn.$A#c.($W), noconstant vce(robust) // Fully saturated model
margins $A, vce(unconditional)
margins r.$A, contrast(nowald) // ATE and Delta method for the standard error and 95%CI

*Bootstrap inference for the multivariable parametric regression adjustment
capture program drop ATE
program define ATE, rclass
capture drop y1
capture drop y0
reg $Y $W if $A == 1
predict double y1, xb
quiet sum y1
reg $Y $W if $A == 0
predict double y0, xb
quiet sum y0
mean y1 y0
lincom _b[y1] - _b[y0]
return scalar ace = `r(estimate)'
end
qui bootstrap r(ace), reps(1000) dots: ATE
estat boot, all

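3. Inverse probability of treatment weighting (IPTW)

3.1 IPTW based on the propensity score

In an observational study, individuals with certain characteristics may be more likely than others to be exposed to the treatment ($A$). Suppose some individuals were exposed even though, given the characteristics captured in their vector of confounders, they were unlikely to be. To balance the risk of exposure, we upweight the outcomes of these individuals by the inverse of their probability of receiving the treatment or exposure (i.e. the propensity score), so that they stand in not only for themselves but also for individuals with similar characteristics who were not exposed.

We do the same for patients who were unlikely to be unexposed. The resulting data set is unchanged except that $A$ and $W$ are now independent. A comparison of $Y(1)$ and $Y(0)$ therefore gives the marginal causal effect under the three identification assumptions, with the further assumption that the propensity score model is correctly specified. The IPTW estimator thus computes the same quantity as the parametric G-formula, but on the basis of the inverse probability of treatment or exposure.

These estimators were originally inspired by the classical Horvitz and Thompson survey estimator, which upweights the observed outcomes by the inverse probability of being observed in order to account for the missingness process. Both regression adjustment and IPTW are therefore computational implementations of the G-formula, the generalization of standardization.

The IPTW estimand for the ATE is

$$ATE = \frac{1}{n} \sum_{i=1}^{n} \frac{I(A_i = 1)\,Y_i}{P(A_i = 1 \mid W_i)} - \frac{1}{n} \sum_{i=1}^{n} \frac{I(A_i = 0)\,Y_i}{P(A_i = 0 \mid W_i)}$$

where $I(A_i = a)$ is an indicator that equals 1 when $A_i = a$ and 0 otherwise, for $a \in \{0, 1\}$. The computation is as follows: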

*Computation of the IPTW estimator for the ATE
logit $A $W, vce(robust) nolog // Propensity score model for the exposure
predict double ps // Propensity score prediction
generate double ipw1 = ($A == 1) / ps // Sampling weights for the treated group
generate double ipw0 = ($A == 0) / (1 - ps) // Sampling weights for the non-treated group
regress $Y [pw = ipw1] // Weighted outcome probability among treated
scalar Y1 = _b[_cons]
regress $Y [pw = ipw0] // Weighted outcome probability among non-treated
scalar Y0 = _b[_cons]
display "ATE = " Y1 - Y0
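
As a check on the weighting logic, the same kind of calculation can be run on a toy table small enough to verify by hand. The data below are made up for illustration (trt, conf, died and the cell counts are assumptions, not the RHC data). With a single binary confounder the propensity score is just the share treated in each stratum, 20/70 for conf = 0 and 100/130 for conf = 1, and the weighted outcome means work out to 119/200 = 0.595 among the treated and 79/200 = 0.395 among the untreated, a difference of 0.2.

```stata
* Toy data (hypothetical): one row per (trt, conf, died) cell, n = patients in the cell
clear
input trt conf died n
0 0 0 40
0 0 1 10
0 1 0 15
0 1 1 15
1 0 0 12
1 0 1  8
1 1 0 30
1 1 1 70
end
* One observation per patient (200 in total)
expand n

* Propensity score from a logit model saturated in the single binary confounder
quietly logit trt i.conf
predict double ps, pr

* Inverse probability of treatment weights: 1/ps for treated, 1/(1-ps) for untreated
generate double ipw = cond(trt == 1, 1 / ps, 1 / (1 - ps))

* Weighted regression of the outcome on treatment; the coefficient on trt
* is the difference between the two weighted outcome means
regress died i.trt [pw = ipw]
scalar ate_ipw = _b[1.trt]
display "IPTW ATE = " ate_ipw
```

Standardizing the same table directly, with stratum risk differences of 0.2 and 0.2 weighted by 0.35 and 0.65, also gives 0.2, which illustrates why the text describes both estimators as implementations of the G-formula.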

*Bootstrap computation for the IPTW estimator
cap program drop ATE
program define ATE, rclass
capture drop y1
capture drop y0
regress $Y [pw = ipw1]
matrix y1 = e(b)
gen double y1 = y1[1,1]
regress $Y [pw = ipw0]
matrix y0 = e(b)
gen double y0 = y0[1,1]
mean y1 y0
lincom _b[y1] - _b[y0]
return scalar ace = `r(estimate)'
end
qui bootstrap r(ace), reps(1000): ATE
estat boot, all
teffects ipw ($Y) ($A $W, logit), nolog vsquish

*Box 19: Assessing IPTW balance
qui teffects ipw ($Y) ($A $W)
tebalance summarize // Stata's tebalance

// tebalance by hand (gender)
egen genderst = std(gender) // Standardization
capture drop ps // ps already exists from the IPTW computation above
logistic $A $W // Propensity score
predict double ps
gen ipw = .
replace ipw = ($A == 1) / ps if $A == 1
replace ipw = ($A == 0) / (1 - ps) if $A == 0
regress genderst $A // Raw standardized difference
regress genderst $A [pw = ipw] // Weighted standardized difference

*Box 20: Assessing IPTW overlap by hand
sort $A
by $A: summarize ps
kdensity ps if $A == 1, generate(x1pointsa d1A) nograph n(10000) // Non-parametric kernel density estimate of ps for RHC = 1
kdensity ps if $A == 0, generate(x0pointsa d0A) nograph n(10000) // Non-parametric kernel density estimate of ps for RHC = 0
label variable d1A "density for RHC = 1"
label variable d0A "density for RHC = 0"
twoway (line d0A x0pointsa, yaxis(1)) (line d1A x1pointsa, yaxis(2))

*Box 21: Assessing overlap using teffects overlap
qui: teffects ipw ($Y) ($A $W, logit), nolog vsquish
teffects overlap

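3.2 Marginal structural models with stabilized weights

A marginal structural model (MSM) is a weighted regression model with $Y$ as the outcome and the treatment or exposure $A$ as the explanatory variable. The MSM coefficient on $A$ is the estimate of the ATE. To fit the MSM, we use as sampling weights the inverse of the treatment probability (propensity score) computed for IPTW, i.e. Horvitz-Thompson weights.

However, large weights can inflate the variance of the ATE estimate in a non-saturated MSM. Large weights are a symptom of potential near-violations of the positivity assumption. To address this, stabilized weights can be used. They differ from the weights above only in that, instead of taking the plain inverse, they divide the baseline probability of the treatment actually received (estimated from a model with no covariates) by the probability of that treatment given the covariates.

Stabilized weights have a mean of 1. ATE estimates based on them therefore have smaller variance and are more robust to near-violations of positivity. Finally, for statistical inference we use vce(robust) to estimate the standard error of the ATE.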
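
The claim about the mean can be verified in one line. The sketch below writes $SW$ for the stabilized weight (a symbol introduced here for convenience) and uses only the definition of the weight and the fact that $E[I(A = a) \mid W] = P(A = a \mid W)$:

```latex
\begin{aligned}
SW &= \frac{P(A = a)}{P(A = a \mid W)} \quad \text{for an individual with } A = a, \\
E[SW] &= \sum_{a \in \{0, 1\}} E\!\left[ \frac{I(A = a)\, P(A = a)}{P(A = a \mid W)} \right]
       = \sum_{a \in \{0, 1\}} P(A = a)\, E\!\left[ \frac{E[I(A = a) \mid W]}{P(A = a \mid W)} \right]
       = \sum_{a \in \{0, 1\}} P(A = a) = 1 .
\end{aligned}
```

By the same argument the unstabilized weights of a binary treatment have expectation 2, so comparing the sample means of the two sets of weights with these values is a quick diagnostic of the propensity score model.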

* Computation of the IPTW estimator for the ATE using a MSM
// baseline treatment probabilities
logit $A, vce(robust) nolog
predict double nps, pr
// propensity score model
logit $A $W, vce(robust) nolog
predict double dps, pr
// Unstabilized weight
capture drop ipw // ipw already exists from the balance check above
gen ipw = .
replace ipw = ($A == 1) / dps if $A == 1
replace ipw = ($A == 0) / (1 - dps) if $A == 0
sum ipw

// Stabilized weight
gen sws = .
replace sws = nps / dps if $A == 1
replace sws = (1 - nps) / (1 - dps) if $A == 0
sum sws

// MSM
reg $Y $A [pw = ipw], vce(robust) // MSM unstabilized weight
reg $Y $A [pw = sws], vce(robust) // MSM stabilized weight

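3.3 Inverse probability weighting plus regression adjustment

IPTW-RA (inverse probability weighting plus regression adjustment) is an estimator built by combining G-computation regression adjustment with the stabilized IPTW estimate. If the propensity score model for the treatment is correctly specified but the regression equation is misspecified, IPTW-RA helps to correct the estimator. When the regression function is correctly specified, the weights do not affect the consistency of the estimator, even if the model from which they are derived is misspecified.

Combining the two approaches can only increase the chance of obtaining a consistent estimate of the ATE, which is why estimators that combine the two modelling strategies are called doubly robust. When only G-computation is used and identifiability is threatened by sparsity and near-violations of positivity, the analyst can only rely on extrapolation of the treatment effect. Adding IPTW to the regression adjustment makes it possible to assess treatment balance and possible positivity violations, which makes researchers more aware of the limitations of their causal model.

We also encourage researchers, where possible, to explore the non-parametric G-formula (using only a few of the most important confounders) to really understand the data at hand and to identify, in a finite sample, potential problems related to the curse of dimensionality (i.e. empty cells, with zero observations, for some combinations of the variables whose conditional probabilities are needed to implement the G-formula).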
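
One simple way to look for such empty cells is to tabulate the joint distribution of the treatment and the confounders and count the combinations with no observations. The sketch below uses Stata's contract command with its zero option on a made-up data set (trt, conf and the counts are illustrative assumptions) in which no treated patient has conf = 0:

```stata
* Toy data (hypothetical): n patients in each observed (trt, conf) combination
clear
input trt conf n
0 0 50
0 1 30
1 1 100
end
expand n

* Collapse to one row per (trt, conf) combination; the zero option also keeps
* combinations of observed values that never occur together, with frequency 0
contract trt conf, zero freq(cellsize)
list, noobs

* Number of empty cells: a non-zero count signals a positivity problem
count if cellsize == 0
```

On real data the varlist would hold the treatment and the categorical confounders; continuous confounders would first need to be grouped, and the number of cells grows quickly with each variable added.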

*Computation of the IPTW-RA estimator for the ATE and bootstrap for statistical inference
capture program drop ATE
program define ATE, rclass
capture drop y1
capture drop y0
reg $Y $W if $A == 1 [pw = sws]
predict double y1, xb
quiet sum y1
return scalar y1 = `r(mean)'
reg $Y $W if $A == 0 [pw = sws]
predict double y0, xb
quiet sum y0
return scalar y0 = `r(mean)'
mean y1 y0
lincom _b[y1] - _b[y0]
return scalar ace = `r(estimate)'
end
qui bootstrap r(ace), reps(10): ATE
estat boot, all

teffects ipwra ($Y $W) ($A $W), pom coeflegend
nlcom _b[POmeans:1.rhc] / _b[POmeans:0bn.rhc] // marginal RR and 95%CI (Delta method)

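4. Augmented inverse probability weighting

The AIPTW (augmented inverse probability weighting) estimator is an IPTW estimator that includes an augmentation term, which corrects the estimator when the treatment model is misspecified. When the treatment model is correctly specified, the augmentation term vanishes as the sample size grows. The AIPTW estimator is therefore more efficient than IPTW.

However, like IPTW, AIPTW performs poorly when the predicted treatment probabilities are too close to zero or one (i.e. under near-violations of positivity). The augmentation term has expectation zero and combines the propensity score with the regression-adjusted outcome predictions. AIPTW thus integrates two parametric models, an outcome model and a treatment model. If either of the two is correctly specified, the AIPTW estimator is consistent, which is what makes it doubly robust.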
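
For reference, a standard way of writing the AIPTW estimator is given below. This is the textbook form, stated here as an assumption about what the estimating equation looks like rather than quoted from the source; $\hat{g}(W)$ denotes the estimated propensity score $P(A = 1 \mid W)$ and $\hat{Q}(a, W)$ the estimated outcome regression $E(Y \mid A = a, W)$, both notational choices made for this sketch:

```latex
\widehat{ATE}_{AIPTW} = \frac{1}{n} \sum_{i=1}^{n} \left[
    \frac{A_i \left( Y_i - \hat{Q}(1, W_i) \right)}{\hat{g}(W_i)}
  - \frac{(1 - A_i) \left( Y_i - \hat{Q}(0, W_i) \right)}{1 - \hat{g}(W_i)}
  + \hat{Q}(1, W_i) - \hat{Q}(0, W_i)
\right]
```

The last two terms are the G-computation estimate and the first two are inverse-probability-weighted residuals. If the outcome model is correct, the residuals average to zero whatever $\hat{g}$ is; if the treatment model is correct, the weighted residuals remove the bias of a misspecified $\hat{Q}$. The two residual terms enter with opposite signs and arm-specific weights, so they do not cancel in the difference.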

The estimator can be implemented in Stata as follows:

* Computation of the AIPTW estimator for the ATE and bootstrap for statistical inference
// Step (i): prediction model for the outcome using G-computation regression adjustment
capture drop Q*
qui glm $Y $A $W, fam(bin)
predict double QAW, mu
qui glm $Y $W if $A == 1, fam(bin)
predict double Q1W, mu
qui glm $Y $W if $A == 0, fam(bin)
predict double Q0W, mu

// Step (ii): prediction model for the treatment
capture drop dps
qui logit $A $W
predict double dps, pr

// Step (iii): Estimation equation based on analytical formula 5
// The residual is weighted by A/ps in y1 and by (1-A)/(1-ps) in y0. Adding the same
// positively weighted residual to both would cancel in y1 - y0 and reduce the
// estimate to plain G-computation.
capture drop y1
capture drop y0
gen double y1 = ($A == 1) / dps * ($Y - QAW) + Q1W
gen double y0 = ($A == 0) / (1 - dps) * ($Y - QAW) + Q0W
mean y1 y0
lincom _b[y1] - _b[y0]

// Step (iv): Bootstrap confidence intervals
capture program drop ATE
program define ATE, rclass
capture drop y1
capture drop y0
capture drop Q*
capture drop dps
qui glm $Y $A $W, fam(bin)
predict double QAW, mu
qui glm $Y $W if $A == 1, fam(bin)
predict double Q1W, mu
qui glm $Y $W if $A == 0, fam(bin)
predict double Q0W, mu
qui logit $A $W // refit the treatment model in each bootstrap sample
predict double dps, pr
gen double y1 = ($A == 1) / dps * ($Y - QAW) + Q1W
quiet sum y1
return scalar y1 = `r(mean)'
gen double y0 = ($A == 0) / (1 - dps) * ($Y - QAW) + Q0W
quiet sum y0
return scalar y0 = `r(mean)'
mean y1 y0
lincom _b[y1] - _b[y0]
return scalar ate = `r(estimate)'
end
qui bootstrap r(ate), reps(1000): ATE
estat boot, all

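5. Data-adaptive estimation: ensemble learning targeted maximum likelihood estimation

Targeted maximum likelihood estimation (TMLE) is a semi-parametric, doubly robust estimation method. By allowing flexible estimation with non-parametric, data-adaptive machine learning methods, it brings the targeted estimate closer to the true model specification and so reduces the bias of the initial estimate. It therefore requires weaker assumptions than comparable methods. The advantages of TMLE have been demonstrated repeatedly in simulation studies and applied analyses, and there is evidence that, compared with other doubly robust estimators such as IPTW-RA and AIPTW, TMLE tends to give the least biased estimates.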
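
The targeting step can be summarized in three equations. The notation below ($\hat{Q}^{0}$ for the initial outcome prediction, $\hat{Q}^{1}$ for its update, $\hat{g}$ for the estimated propensity score, $H_1$ and $H_0$ for the clever covariates) is introduced for this sketch, which describes the two-covariate logistic fluctuation for a binary outcome. It is one common formulation, not the only one:

```latex
\begin{aligned}
H_1(A, W) &= \frac{A}{\hat{g}(W)}, \qquad H_0(A, W) = \frac{1 - A}{1 - \hat{g}(W)} \\
\operatorname{logit} \hat{Q}^{1}(A, W) &= \operatorname{logit} \hat{Q}^{0}(A, W)
    + \hat{\epsilon}_1 H_1(A, W) + \hat{\epsilon}_0 H_0(A, W) \\
\widehat{ATE}_{TMLE} &= \frac{1}{n} \sum_{i=1}^{n}
    \left[ \hat{Q}^{1}(1, W_i) - \hat{Q}^{1}(0, W_i) \right]
\end{aligned}
```

Here $\hat{\epsilon}_1$ and $\hat{\epsilon}_0$ are estimated by a logistic regression of $Y$ on $H_1$ and $H_0$ with no intercept and with $\operatorname{logit} \hat{Q}^{0}(A, W)$ as offset. The updated predictions are then evaluated at $A = 1$ and $A = 0$ for every individual and averaged.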

A by-hand implementation in Stata is shown below:

* Computational implementation of TMLE by hand
* Step 1: prediction model for the outcome Q0 (g-computation)
glm $Y $A $W, fam(binomial)
predict double QAW_0, mu
gen aa = $A
replace $A = 0
predict double Q0W_0, mu
replace $A = 1
predict double Q1W_0, mu
replace $A = aa
drop aa
// Q to logit scale, using the Step 1 predictions QAW_0, Q1W_0 and Q0W_0
gen logQAW = log(QAW_0 / (1 - QAW_0))
gen logQ1W = log(Q1W_0 / (1 - Q1W_0))
gen logQ0W = log(Q0W_0 / (1 - Q0W_0))

* Step 2: prediction model for the treatment g0 (IPTW)
glm $A $W, fam(binomial)
predict gw, mu
gen double H1W = $A / gw
gen double H0W = (1 - $A) / (1 - gw)

* Step 3: Computing the clever covariate H(A,W) and estimating the parameter (epsilon) (MLE)
glm $Y H1W H0W, fam(binomial) offset(logQAW) noconstant
mat a = e(b)
gen eps1 = a[1,1]
gen eps2 = a[1,2]

* Step 4: update from Q0W and Q1W to Q0W_1 and Q1W_1
gen double Q1W_1 = exp(eps1 / gw + logQ1W) / (1 + exp(eps1 / gw + logQ1W))
gen double Q0W_1 = exp(eps2 / (1 - gw) + logQ0W) / (1 + exp(eps2 / (1 - gw) + logQ0W))

* Step 5: Targeted estimate of the ATE
gen ATE = (Q1W_1 - Q0W_1)
summ ATE
global ATE = r(mean)

* Step 6: Statistical inference (functional Delta method)
qui sum Q1W_1
gen EY1tmle = r(mean)
qui sum Q0W_1
gen EY0tmle = r(mean)

gen d1 = ($A * ($Y - Q1W_1) / gw) + Q1W_1 - EY1tmle
gen d0 = ((1 - $A) * ($Y - Q0W_1) / (1 - gw)) + Q0W_1 - EY0tmle

gen IC = d1 - d0
qui sum IC
gen varIC = r(Var) / r(N)

global LCI = $ATE - 1.96 * sqrt(varIC)
global UCI = $ATE + 1.96 * sqrt(varIC)
display "ATE: " %05.4f $ATE _col(15) "95%CI: " %05.4f $LCI ", " %05.4f $UCI

preserve
eltmle $Y $A $W, tmle // install via "ssc install eltmle" or "github install migariane/eltmle"
restore

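6. Conclusion

In summary, the methods introduced in this article comprise estimation of the G-formula (non-parametric or parametric), which generalizes standardization, and inverse probability of treatment weighting (IPTW). Other estimators based on matching strategies are not discussed here for reasons of space; see the original tutorial for details.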
