2021泛太平洋因果推断大会总结
导语
为了更多地推动因果推断学科的发展,聚集国内外因果推断的一线科研工作者,共同讨论因果科学的最新进展,北京大学讲席教授、北京大学公共卫生学院生物统计系主任、北京大学北京国际数学研究中心生物统计和信息研究室主任周晓华等发起了泛太平洋因果推断大会,会议于2021年9月11日至12日举行。泛太平洋因果推断大会是因果推断领域国际上最领先的会议之一。本次会议共邀请了28位嘉宾作报告,探讨因果推断领域的新进展。会议不仅设有专家学者报告,还组织了因果推理竞赛,邀请了获奖团队报告。累计超过1000人次注册会议,超过5000人次观看会议直播。
以下我们为大家梳理了本次因果推断大会的报告摘要与视频回放,经过报告人的同意,部分回放会公开在集智学园的平台,点击文末“阅读原文”即可跳转到课程页面。其中有些尚未发表的工作报告人不希望公开,错过直播的同学可能就要继续等待了~
Is being an only child harmful to psychological health? Evidence from a local instrumental variable analysis of the China One-Child Policy
Fan Li(Duke University)
Fan Li
Personal Website:
https://www2.stat.duke.edu/~fl35/
Two Dogmas of Methodology
Clark Glymour(Carnegie Mellon University)
Clark Glymour
Personal Website:
https://www.cmu.edu/dietrich/philosophy/people/emeritus/glymour.html
Combining Observational and Experimental Data for Personalized Causal Prediction
Sofia Triantafillou(University of Pittsburgh)
Sofia Triantafillou
Personal Website:
https://www.dbmi.pitt.edu/node/54091
Personalized Dynamic Treatment Regimes in Continuous Time: A Bayesian Approach for Optimizing Clinical Decisions with Timing
Yanxun Xu(Johns Hopkins University)
Yanxun Xu
该研究的一个背景是,在患者进行肾移植后,医生对患者进行随访,确定新的治疗和下一次随访日期。我们要研究的问题是,肌酐随时间如何变化?肌酐如何影响生存?医生如何治疗患者?最后需要找到最优的治疗策略。这涉及到四个模型,分别是纵向模型、生存模型、剂量量模型和随访模型。
我们记录了不同时间的患者临床测量值和医生决策,思路是用贝叶斯联合模型拟合模型。用线性混合效应模型把纵向模型和剂量模型联系起来,再通过把纵向效应、剂量效应、随访模型纳入到危险率函数中,把生存模型和其余三个模型联系起来。利用贝叶斯联合模型,为未知参数设定先验分布,就能获得后验分布。最后,最大化一个收益函数,得到最优治疗方案。作者把该方法应用于实际的肾移植数据上。
Yanxun Xu is an assistant professor in the Department of Applied Mathematics and Statistics. Her research focuses on Bayesian statistics; cancer genomics; clinical trial design; graphical models; nonparametric Bayesian statistical inference for big data analysis; high-throughput genomic date; and proteomics data. She earned her doctorate (2013) at Rice University; her master’s (2010) at Texas Tech University; and her bachelor’s (2007) at Beijing University of Aeronautics and Astronautics.
Personal Website:
https://engineering.jhu.edu/ams/faculty/yanxun-xu/
Higher order Targeted Maximum Likelihood Estimation and its Applications
Mark van der Laan(UCBerkeley)
Mark van der Laan
Personal Website:
https://statistics.berkeley.edu/people/mark-van-der-laan
Decomposition, Identification and Multiply Robust Estimation of Natural Mediation Effects with Multiple Mediators
Fan Xia (National Alzheimer's Coordinating Center, University of Washington, Seattle)
在因果推断中,识别性假设帮助我们找到直接的目标估计量,统计假设帮助我们得到估计量。处理变量和结局变量中间可能有多条路径,我们把因果作用分解为两部分:总(自然)直接作用和总(自然)非直接作用。当存在多个中介时,特别是如果因果结构未知,我们不能假设中介变量之间的因果结构,所以考虑这样的分解:总非直接作用=通过第k个中介的“出”非直接作用(EIE)+中介之间的交互作用(INT)。
这样分解的好处是,当对中介变量的干预不会影响中介的潜在结果时,EIE就等于传统的总非直接作用。如果中介之间没有交互作用,INT等于零。作者介绍了两组识别性假设,用来识别平均因果作用或各个分解的子作用,并在这些假设下建立工作参数模型,研究了矩估计和四重稳健估计。
Fan Xia is a postdoctoral fellow at NACC. Her primary areas of interest including causal inference and cluster randomized trials, especially causal mediation analysis and stepped wedge designs.
机器学习方法在许多领域取得了巨大的成功,但绝大部分方法都是基于独立同分布假定(这里指训练集的分布与测试集的分布一致),这极大地限制了它们的应用范围。当测试集与训练集的分布存在较大差异时,模型在测试集的表现将会很差,因而模型缺乏稳定性,这种现象称为out-of-distribution generalization problem。
关于这一问题,Cui首先引入了稳定学习这一概念,它指在不同测试集分布下表现都良好且稳定的方法。其次指出要解决这一问题的关键技术是找出因果特征。以AI识别小狗的为例,背景中的草地可以看作是相关信息,它是导致算法泛化能力弱的主要原因。要使得算法在不同的背景下都能较好地识别出小狗,需要排除这部分相关信息,找出刻画小狗特征的因果特征。
通过借鉴因果推断中加权以使特征变量保持平衡,从而去除混杂的思路,Cui提出了一系列基于加权的稳定学习方法,并介绍了其中的一些算法细节和理论结果。稳定学习方法选择权重的目标是尽量剔除特征变量间的相关性,以剔除图片中关联信息、保留因果信息,从而产生更稳定的估计。
此外,Cui从因果不变预测的角度出发,提出了一个获得稳定学习的异质性风险极小化框架。
Associate Professor (Tenured),Lab of Media and Network,Department of Computer Science and Technology. He has published more than 100 papers in famous conferences and periodicals in the field of data mining and multimedia, and has won the best paper awards in 7 international conferences and journals. He won the ACM China New Star Award in 2015 and the CCF-IEEECS Young Scientist Award in 2018. He is currently an outstanding member of CCF and a senior member of IEEE.
通过假设检验的方法选出一些感兴趣的参数后,再对这些选出的参数作统计推断的方法称为选择推断或事后推断。例如:在一个研究财政补贴对慈善捐赠影响的RCT试验中,t-检验结果如下:
结果表明,财政补贴只对$given including match和response rate(红色部分)有显著影响。若我们只报告这两个结果,直接将这两个结果展现出来的做法是不正确的。因为这两个结果已经是经过我们删选过后的结果。事实上,现实生活中经常出现类似”报喜不报忧”的问题,我们只将那些显著的结果报告出来,而将不显著的结果隐藏起来,这样展现出的结论是有偏的。我们想知道在选出这些变量中,排除“报喜不报忧”引起的偏差后,究竟还有多少个变量(称为effect size)是显著的。Okui提出利用截断正态分布,提出了一种确定effect size的方法。
Ryo is an Associate Professor in Department of Economics, Seoul National University. His research interest includes Microeconometrics, Applied microeconomics And Experimental Economics.
根据Neyman-Rubin模型,在随机对照试验中,平均处理效应(ATE)的无偏估计为处理组与控制组的结局均值之差,但这个估计没有用上协变量的信息。实际上,尽管协变量与处理变量独立,但它可能会影响潜在结果,因此从直觉上看,加入协变量的信息是能够改善ATE估计效率的。
已有研究表明,若结局为连续型变量时,直接将协变量放入到线性回归模型中,若限定处理组与控制组模型中协变量的系数一样,这样得到的ATE估计并不一定会改善直接用结局均值之差得到的估计;有趣的是,若允许处理组与控制组模型中协变量的系数不一样,无论线性模型是否指定正确,这样得到的ATE估计一定会改善直接用结局均值之差得到的估计,这个结果即便在高维情形下也成立。
一个自然的问题是:若结局为二值变量时,使用逻辑回归进行回归调整,上述结论依然成立吗?结论是不一定成立。Jia发现:逻辑回归调整得到的ATE估计是一个渐近正态的相合估计;若逻辑回归指定正确,则逻辑回归调整估计比用简单结局均值之差得到的估计更有效率,也比用线性回归调整估计更有效率;若逻辑回归指定错误,类似的结论不成立。因此,当Y为二值变量时的结论与Y为连续型变量的结论不对称。
通过分析这种不对称性的原因,Jia提出了基于广义估计方程(GEE)差异ATE估计,这个估计是有一个有效估计,因而比用简单结局均值之差得到的估计更有效。
Jinzhu JIA, researcher and doctoral supervisor, School of Public Health, Peking University. He graduated from Peking University in January 2009. From January 2009 to December 2010, UCBerkeley postdoctoral. From January 2011 to January 2018, he worked in the Department of probability and Statistics of the School of Mathematical Sciences of Peking University and the Statistics Center of Peking University, during which he visited Harvard University for one year. He joined the School of Public Health of Peking University in February 2018. The main research interests are high-dimensional statistical inference, big data analysis, statistical machine learning, causal inference, biological statistics and so on. He has published many papers in the fields of theoretical research on variable selection methods, the application of high-dimensional data and big data's statistical learning, and causal inference. He serves as deputy secretary-general of China probability and Statistics Society, executive director of Young statisticians' Association, director of Computing Statistics Branch of Field Statistics Research Society, and director of High-dimensional data Statistics Association of Field Statistics Research Society.
随着基础科学的进步,所收集到的数据越来越多,精准医疗在现代生物医学研究中备受关注,它旨在根据病人特征定制化地进行处理。精准医疗的成功依赖于精确稳定的估计个性化治疗规则的统计方法。
当前估计个性化治疗规则的方法可大体分为两类。一类是直接最小化或最大化总体的平均结局,这类方法通常将优化问题转化为一个加权的分类问题,然后采用一些机器学习算法进行估计,其局限性在于难以获得有效的置信区间;另一类方法是通过估计异质性因果效应和相应的置信区间或置信带来选择最优的个性化治疗规则。
Zhou提出的一系列方法属于第二类,但允许数据是高维的且未观测混杂存在。针对不同的结局类型,Zhou首先定义了不同的协变量特异因果作用CSTE (covariate-specific treatment effect)曲线,表示异质性因果效应,具有很好的因果解释。在高维情形下,当观察数据中存在未观测混杂时,Zhou通过工具变量定义了局部CSTE曲线,讨论了识别性,并给出了估计和相应的置信区间。
Professor and doctoral supervisor of Peking University, currently head of Department of Biostatistics, School of Public Health, Peking University, Director of data Center of traditional Chinese Medicine University of Beijing big data Research Institute, Deputy Director of big data Center of Medical and Health, Director of Biostatistics Laboratory of Beijing International Mathematical Research Center, Chairman of China Branch of International Association of Biological Statistics, President of Biomedical Statistics Branch of China Society for Field Statistics, member of American Association for the Advancement of Science. Member of the American Statistical Society and member of the International Institute of Statistics. At present, he is a top journal of biostatistics, associate editor of StatisticsinMedicine, and editor of Biostatistics&Epidemiology, China Branch of the International Society for Biostatistics. In the top international journal of statistics and biostatistics, J.R.Statist.Soc.B Journal of the American Statistical Association, Biometrika, Ann.Statist,Biometrics,Stat.Med. More than 240 SCI academic papers have been published, of which more than 130 are the first or correspondent authors.
考虑类别型的随机变量。一组变量V上的因果贝叶斯网络包含有向无环图以及一组条件概率函数,由此可以根据干预准则得到一系列干预条件分布。这些条件概率和外生机制组成了因果模块,干预打破了相应模块,但不改变其他模块。我们可以用类别理论来表述,类别由对象和箭头组成,箭头从域指向上域。称F是函子,如果F保持域和上域、保持复合和恒同。带有单积的类别称为单类别。因果理论建立在有向无环图上,态射表示因果机制或因果过程。在因果理论中,可以定义从干预变量到结果变量的因果效应态射。讲者用Trek分离的概念说明了因果效应可分解的条件。这一研究反映了do演算是因果理论的核心。
Zhang Jiji received his PhD in Logic, Computation, and Methodology from the Department of Philosophy at Carnegie Mellon University in 2006, and taught previously at California Institute of Technology and Lingnan University, before joining the Department of Religion and Philosophy at Hong Kong Baptist University in 2021. His philosophical interests lie mainly in philosophy of science, formal epistemology, and logic. The interdisciplinary part of his research centers around the topic of causation, addressing both the epistemological and logical aspects of causal reasoning, and the statistical and computational aspects of causal modelling and discovery. His work has appeared in both premier journals in philosophy, such as Journal of Philosophical Logic, British Journal for the Philosophy of Science, Philosophy of Science, Synthese, etc., and in leading venues in computer science and statistics, such as Artificial Intelligence, Journal of Machine Learning Research, Statistical Science, as well as some top conference proceedings in the field of Artificial Intelligence. With the new opportunities provided by the Ethical and Theoretical AI lab, he aims to work with colleagues across and beyond the university to apply causal modelling tools to shed new light on some important issues with implications for AI ethics, including especially machine learning interpretability, algorithmic bias, and AI-powered personalized medicine.
在众多隐变量中发现因果关系在很多领域都受到了关注。如何发现隐变量?如何从观测数据中确定隐变量的因果关系?当存在四个观测变量时,可以用Triad条件判断它们之上存在几个隐变量,但不能识别隐变量之间的方向,而且当只有三个变量时无法确定隐变量的数量。能否用高阶的矩信息完成任务呢?对于非高斯独立噪声的情形,可以用变量关系的不对称来判断隐变量之间的方向。进而,讲者提出广义独立噪声条件,用观测变量作为代理变量,可以得到一些关于协方差的条件。Triad条件可以看作是广义独立噪声条件的特例。最后,讲者考虑了线性非高斯隐变量模型。基于广义独立噪声模型,先找出因果集群,然后确定隐变量之间的方向。
Ruichu Cai is a professor and doctoral supervisor at the School of Computer Science, Guangdong University of Technology. Received a doctorate in engineering from South China University of Technology in 2010 and entered Guangdong University of Technology; was named associate professor in 2011; was named professor and doctoral supervisor in 2015; went to National University of Singapore during 2007-2009 and 2013-2014 Visited and studied with UIUC Advanced Digital Science Research Center. He has presided over 2 National Natural Science Foundation of China, 1 Provincial Outstanding Youth Fund, 1 Pearl River Science and Technology Rising Star and other projects. Dr. Cai focuses on research in fields such as causality discovery and high-dimensional data mining. More than 30 papers have been published, including important conferences in the fields of ICML, SIGMOD, SDM, and internationally renowned journals such as TNNLS, Bioinformatics, TKDE, NN, and PR; 4 authorized invention patents, 2 of which have been implemented in NetEase’s mailbox; related achievements have been implemented successively Won the second prize of Provincial Science and Technology Award (the fourth completer), and the first prize of Provincial Science and Technology Award (the third completer).
极值理论主要用于分析一些罕见现象:金融危机,洪灾,台风,火灾等。用统计的语言,即关心的事件都处于所有事件分布的尾部。最近,对极端事件的归因分析引起了广泛的关注,它旨在找出哪些是引起极端事件发生的风险因素。
Engelke的报告分为两部分。第一部分为重尾模型中的因果发现方法。Engelke首先介绍了因果尾系数(causal tail coefficient)的定义,它可用于任意两个变量因果的方向。对于任意p个变量,因果尾系数可用于恢复它们的因果序,并有理论保证。
第二部分讨论重尾分布的分位数处理效(QTE)的估计。QTE有着广泛的应用场景,如它可以回答下列问题:一个教育计划能够增加最穷的0.1%的人多少收入?Engelke给出了极值QTE(当分位数趋于1时的QTE)的估计,并讨论了极值QTE估计的渐近性质。
Sebastian is assistant professor at the Research Center for Statistics at the University of Geneva, where is holding an Eccellenza grant. He was visiting professor at the Department of Statistical Sciences at the University of Toronto from 2018 – 2019. Previously he was an Ambizione fellow at EPF Lausanne with Anthony Davison. Sebastian did his studies in Mathematics at University of Göttingen and UC Berkeley, and he finished his PhD as a Deutsche Telekom Foundation fellow in 2013 at the University of Göttingen with Martin Schlather. His research interests are in extreme value theory, spatial statistics, graphical models and data science. Since 2018, Sebastian is Associate Editor of the Springer journal Extremes and the Scandinavian Journal of Statistics.
Miao首先介绍了Paradata。Paradata指跟踪调查数据收集过程的记录,如电话访谈的日期,访谈时间,沟通方式,访谈对象的态度等。这些信息不属于结构化的数据,但它也提供了一些信息。因此,合理地利用Paradata能改善调查的质量。回叫数据为Paradata中的一种,它记录了尝试访谈的次数,提供了关于数据缺失机制的信息。
结构化数据中经常会出现缺失数据。非随机缺失MNAR的难点在于估计量通常是不可识别的,目前解决方法分为三类:约束参数模型、工具变量方法和影子变量方法。这三类方法都需要强的假设条件,实际中未必满足。
Miao提出了一种利用回叫数据处理MNAR的方法,通过它将约束参数模型推广到非参模型,讨论了可识别性假设,得到了半参有效界,并给出了一系列半参有效估计。
Wang is currently an assistant professor in the Department of Probability and Statistics at Peking University. He studied undergraduate and Ph.D. in the School of Mathematical Sciences at Peking University from 2008 to 2017. He did postdoctoral research at the Department of Biostatistics at Harvard University from 2017 to 2018. He joined Peking University in 2018.
Michael Elliott is a Professor of Biostatistics at the University of Michigan School of Public Health and Research Scientist at the Institute for Social Research. He received his PhD in biostatistics in 1999 from the University of Michigan. Prior to joining the University of Michigan in 2005, he held an appointment as an Assistant Professor at the Department of Biostatistics and Epidemiology at the University of Pennsylvania School of Medicine, and prior to that as a Visiting Professor of Biostatistics at the University of Michigan School of Public Health and as a Visiting Research Scientist at the University of Michigan Transportation Research Institute. Dr. Elliott's statistical research interests focus around the broad topic of "missing data," including the design and analysis of sample surveys, casual and counterfactual inference, and latent variable models. He has worked closely with collaborators in injury research, pediatrics, women's health, and the social determinants of physical and mental health. Dr. Elliott serves as an Associate Editor for the Journal of the American Statistical Association and the Journal of Survey Statistics and Methodology.
传统因果推断往往是考虑处理对于某个实值的结果变量的因果效应,而讲者则关心的是处理对于分布函数的影响。这是因为在讲者所研究的问题是关于结婚对于运动量的分布情况的影响。当结局变量是一个分布时,首先要定义关于分布的平均,以及两个分布之间的差值。对于第一个问题,讲者提出利用Wasserstein barycenter来得到不同分布之间的平均值,讲者指出这样得到的平均能够反映出原来分布的一些性质;而对于第二个问题,讲者利用分位数给出了比较合适的潜在结果之间的差,并具有因果的含义。在定义好因果参数后,讲者给出了类似于实值结局变量时的估计方法,包括逆概率加权,回归以及双稳健方法。
Linbo Wang received his PhD in Biostatistics from University of Washington in 2016. Prior to joining the University of Toronto, he spent two years at Harvard Causal Inference Program. His research interest includes causal inference, graphical models, and modern statistical inference in infinite-dimensional models. He is the recipient of several research awards, including a NSERC Discovery Accelerator Supplement in 2019.
在因果推断问题中,我们往往会给出潜在结果的模型,此时我们如何利用我们设定的模型模拟出数据的真实分布并不是一个简单的问题。讲者给出了一种参数化以及模拟数据分布的解决方案。讲者将数据分布做了一种分解,分成了我们感兴趣部分的模型,已有的分布模型,以及前两者之间的依赖关系分布。而依赖关系可以利用copula来表示。从而利用这样的分解,讲者通过拒绝性采样的方法就能够模拟出所需要的数据分布。
Robin is an Associate Professor in Statistics, and a fellow of Jesus College. He received his PhD in Statistics from the University of Washington in 2011, and was a Postdoctoral Research Fellow at the Statistical Laboratory in Cambridge from 2011 to 2013. His research interests include graphical models, causal inference, latent variable models and algebraic and semi-parametric statistics.
讲者主要介绍了纵向数据中变量的观测时间不规则时的统计分析方法。在SNMM模型中一般假定所有数据都是在固定时间点上测得,然而这在真实世界往往并不能满足。讲者在无未观测混杂的条件下,先讨论了在离散时间情形下的因果作用估计方法,然后利用鞅等工具提出了连续时间情形下的估计方法。同时该方法具有双稳健以及局部有效的性质。此外,当存在可忽略删失时,讲者指出只用完全数据分析是逆概率加权估计量中最优的。
Shu Yang graduated from Iowa State University in 2014 with major in Mathematics and comajor in Statistics working with J.K. Kim and Z. Zhu. After graduation, she joined Harvard TH Chan School of Public Health as a post-doc with Judith Lok. She then joined NC State as a faculty member since 2016.
未观测混杂是因果分析的一个很重要的问题,讲者介绍了一种新的能够处理未观测混杂的统计推断方法。讲者首先回顾了可忽略性假定成立是识别因果作用所需的条件。讲者认为很多时候,我们测量的并不是精确的混杂变量,而是混杂变量的代理。此时想要识别因果作用,讲者指出,当存在两个代理变量,且存在bridge function时,因果作用就可以识别。这样的识别条件大大减弱了工具变量识别因果作用的严格条件。同时讲者也简单讲解了非参情形下可以利用minimax方法来估计bridge function。当在线性模型下,讲者给出了类似于工具变量的两阶段最小二乘方法,这使得新提出的方法在应用时非常方便。
Eric J. Tchetgen Tchetgen is the Luddy Family President’s Distinguished Professor at the Wharton School of the University of Pennsylvania.
Professor Tchetgen Tchetgen comes to the University of Pennsylvania from Harvard University, where he has served since 2008 as Professor of Biostatistics and Epidemiologic Methods with joint appointments in the departments of Biostatistics and Epidemiology at the T.H. Chan School of Public Health.
He researches infectious diseases, including HIV/AIDS, and the role of genetic and social factors in the patterns, causes, and effects of public health. Professor Tchetgen Tchetgen has received grants from the National Institutes of Health and the Centers for Disease Control.
He completed his Ph.D. in Biostatistics at Harvard University in 2006 under the supervision of Professor James M. Robins. He received his B.S. in Electrical Engineering from Yale University in 1999.
孟德尔随机化在基因相关的研究中越来越重要。最初大家一般直接将基因变异当作工具变量进行分析,讲者指出实际上这些基因变异往往相关且会有基因多效性,导致工具变量的假定不能满足。之前俄又一些处理基因多效性的方法,但是这些方法还需要其他一些假定,例如InSIDE假定等。讲者为了解决这一问题,提出了MRCIP方法,该方法利用了参数模型,其中包括随机效应模型,并利用了PRW-EM算法来进行优化。讲者利用该方法分析了冠状动脉疾病数据,通过数据分析,讲者认为InSIDE假定很可能不成立,从而基于该假定的方法很可能得到有偏估计。
Dr. Liu received his doctorate in biostatistics from Harvard University, advised by Prof. Xihong Lin. He worked on the Wall street as a quantitative strategist in NYC before joining HKU. His current research interests are: Statistical inference for massive data, Big Data Analytics, Causal Mediation Analysis, Machine Learning, Signal Detection, Statistical Genetics and Genomics.
讲者首先介绍了在实验设计中应用很广的群体随机试验,并回顾了之前处理这种实验数据的方法,包括广义估计方程,混合效应模型,多层次模型等。然而这些模型往往是假定模型指定正确时,估计出的参数才具有因果含义,当模型错误设定时,这些方法就失效了。讲者提出了新的估计方法。在分析中,讲者使用有限样本的框架,只有处理的分配是随机化的。讲者发现用所有的个体数据做回归的得到的估计量反而比利用每个组内变量的平均做回归的渐近方差更大。
Peng Ding is currently an assistant professor in the Department of Statistics at the University of California, Berkeley. From 2004 to 2011, he received a bachelor's degree in mathematics and economics and a master's degree in statistics from Peking University. He received a Ph.D. in statistics from Harvard University in 2011-2015, and then did postdoctoral research in the Department of Epidemiology at Harvard School of Public Health.
讲者首先回顾了经典的两阶段最小二乘方法以及Anderson-Rubin估计方法。但是在观察性研究中,可能还会存在未观测的协变量,讲者关心的是当将未观测的协变量加入回归模型时,会对统计推断造成什么影响。讲者对于这个问题进行了敏感性分析。讲者指出如果未观察到的协变量能够使得第一阶段的回归系数改变符号,那么该未观测的协变量将可以使得最终得到的因果作用可正可负,既不包含该协变量的回归并不能得到有用的结论。然后讲者也介绍了最差情况下的置信区间的构造方法,并解释了实际数据的分析结果。
Carlos is a PhD candidate in the Department of Statistics at the University of California, Los Angeles (2016-2021). Starting this Fall 2021, He is joining the Department of Statistics at the University of Washington as an Assistant Professor.His research focuses on developing new causal and statistical methods for transparent and robust causal claims in empirical sciences.
讲者研究了孟德尔随机化中的弱工具变量的问题,即工具变量往往与处理相关性很小。此时讲者证明了在一些情况下,逆概率加权方法会得到有偏的估计,这也在一些实际应用中得到了验证。与此同时很多时候孟德尔随机化中的工具变量会直接影响结局变量。讲者提出了纠偏逆概率加权估计量,并证明了该估计量的无偏性和渐近正态性。最后讲者将该方法应用到了BMI-CAD数据集上,同时讲者也建议纠偏逆概率加权方法可以作为复杂数据分析的一个初步分析,能够提供一个简单的初始结论。
Ting Ye is an Assistant Professor of Biostatistics at the University of Washington. Before joining UW, she was a postdoctoral fellow in the Statistics Department of the Wharton School, University of Pennsylvania (mentored by Dylan Small and Sean Hennessy).
比例风险(hazard ratio)是流行病和药学中的一个重要概念,尽管缺乏明确的因果解释,比例风险模型仍然是有用而合理的近似。问题是,危险率不一定是比例的,或者协变量建模可能有误。当模型设定错误,部分似然估计收敛到哪里是不确定的。可以把Cox模型变复杂,但这会导致简单性和正确性的取舍。那么该怎么办呢?作者认为,应该以一个非参数的待估量为目标,而不是以模型为目标,从而发展基于有效影响函数的非参数推断,推断不依赖于模型是否正确。
假设希望使用Cox比例风险模型,然后总结给定某个协变量时暴露变量对事件发生时间结局的关联性。对于二元暴露变量,可以用对数风险率来总结。作者考虑了风险率随协变量或时间变化的情形,用加权平均来总结。这样就提出了一个与模型无关的待估量,可以用插入估计量或机器学习方法来估计。通过计算影响函数,可用数理统计的渐进理论消除插入偏差。最后,作者用模拟实验比较了一些方法。
Stijn Vansteelandt graduated as Master in Mathematics at Ghent University in 1998, and obtained a PhD in Mathematics (Statistics) in 2002 at the same university. After postdoctoral research at the Department of Biostatistics of the Harvard School of Public Health, he returned to Ghent University in 2004, where he is now Full Professor (80%) in the Department of Applied Mathematics, Computer Science and Statistics. He is furthermore Professor of Statistical Methodology (20%) in the Department of Medical Statistics at the London School of Hygiene and Tropical Medicine.
作者把传统的事件发生时间数据推广到多状态数据上,感兴趣的是基线处理对多状态结局(过程)的作用。特别地,将该方法应用于挪威的一项旨在增加就业和复业的政府项目。作者考虑了6个状态,包括就业、教育、全时生病、部分时间生病、失业和死亡,给出了转移图。整个过程可以被看作是一个随机过程,内含随时间变化的转移密度。状态之间的个体的分布可以用占用概率描述,我们想要估计基线处理对应的反事实占用概率,平均因果作用或其他感兴趣的量可通过反事实真用概率定义。占用概率通过Aalen-Johansen估计量估计。如果有协变量,用逆概率加权(倾向得分)来平均是很容易的。最后,作者用包含45万次转移的实际数据说明了该方法,并指出了几个值得未来进一步研究的地方。
Jon Michael Gran is an associate professor at the department of biostatistics of University of Oslo. His current projects are Causal and statistical models for patient outcomes after traumatic injury, and Effects of workplace initiatives on sick leave and work participation - new statistical and causal models to utilise population registries.
在很多实际场景中,资源事有限的,例如疫苗、器官等等,资源有限的情况下进行因果推断是有挑战性的,例如病人之间有因果联系(产生了非独立同分布样本)、反事实策略依赖于全体病人的特征。考虑n个因果关联的个体,只有前N个个体的数据,观测的信息包括临床特征、处理、生存结局。一般的动态治疗策略被定义为从患者的特征到处理的映射,作者采用的方法纳入了全部患者的特征,感兴趣的问题是识别在治疗策略g下n个基线个体的平均生存率,称为g下的聚类平均潜在结果。
对于一个统计模型,包含一个条件集A,使得结构方程的正则条件成立,这产生给定过去测量后独立同分布的结局以及荣誉参数的相合估计;以及一个条件集B,满足正性、可交换性、一致性,用于反事实识别。
Mats is a tenure-track assistant professor of statistics at the Department of Mathematics, EPFL. His research focuses on methods for causal inference. He is particularly interested in settings with exposures and outcomes that depend on time, that is, longitudinal data. Many of his works are inspired by applications in (bio)medicine.
Before he came to EPFL, he was privileged to work with Miguel Hernán and other excellent researchers at Harvard School of Public Health as a Kolokotrones research fellow and Fulbright Research Scholar. He also had the pleasure of being a part-time postdoctoral researcher under supervision of Kjetil Røyslandand Odd Aalen at the University of Oslo. Before he became a full time academic, he had a short career as resident doctor in internal medicine.
Mats received his MD, Dr.Philos in Neuroscience and BSc in Mathematics from the University of Oslo. He also hold a Msc in Statistics from the University of Oxford.
已有方法在试验性研究中使用机器学习算法估计异质性治疗效应、构造个性化治疗规则。但在实际中,机器学习算法管用吗?我们应当经验地评估机器学习算法的表现,从而避免假设机器学习算法的好性质、精确量化不确定性,并使样本量小的时候也适用。我们希望用一个随机化试验来评估一般的个性化治疗策略。在Neyman推断的框架下,针对平均因果作用的度量,可以评估偏差和方差,一个好的个性化治疗策略应当比随机分配更好。作者介绍了通过交叉拟合估计和评估个性化治疗策略(难点是同时考虑估计的不确定性和评估的不确定性),以及预算约束下的评估。最后,作者介绍了模拟实验和真实数据分析结果,并指出了可以期待的扩展,如扩展到异质性因果作用。
Kosuke Imai is a professor in the Department of Government and the Department of Statistics at Harvard University. He is also an affiliate of the Institute for Quantitative Social Science. Before moving to Harvard in 2018, Imai taught at Princeton University for 15 years where he was the founding director of the Program in Statistics and Machine Learning. In addition, Imai served as the President of the Society for Political Methodology from 2017 to 2019 and was elected fellow in 2017. He has been Professor of Visiting Status in the Faculty of Law and Graduate Schools of Law and Politics at the University of Tokyo.
因果科学读书会第三季启动
“因果”并不是一个新概念,而是一个已经在多个学科中使用了数十年的分析技术。通过前两季的分享,我们主要梳理了因果科学在计算机领域的前沿进展。如要融会贯通,我们需要回顾数十年来在社会学、经济学、医学、生物学等多个领域中,都是使用了什么样的因果模型、以什么样的范式、解决了什么样的问题。我们还要尝试进行对比和创新,看能否以现在的眼光,用其他的模型,为这些研究提供新的解决思路。
由智源社区、集智俱乐部联合举办的因果科学与Causal AI读书会第三季,将主要面向两类人群:如果你从事计算机相关方向研究,希望为不同领域引入新的计算方法,通过大数据、新算法得到新成果,可以通过读书会各个领域的核心因果问题介绍和论文推荐快速入手;如果你从事其他理工科或人文社科领域研究,也可以通过所属领域的因果研究综述介绍和研讨已有工作的示例代码,在自己的研究中快速开始尝试部署结合因果的算法。
第三季因果科学与Causal AI读书会从2021年10月24日开始,每周日上午 10:00-12:00举办。共11-12期,每周一期。持续时间预计 2-3 个月。
报名:(长期有效)
扫码报名
第一步:扫码填写报名信息。
第二步:信息填写之后,进入付款流程,提交保证金299元。(符合退费条件后可退费。)
第三步:添加负责人微信,拉入对应的读书会讨论群。
详情请见:
因果+X:解决多学科领域的因果问题 | 因果科学读书会第三季启动
点击“阅读原文”,即可学习课程