Bayesian regression models: choosing prior distributions, with Bayesian logistic regression as an example
The dispute between the Bayesian and frequentist schools has never ceased, because the two view the world differently. Bayesians have never accepted that the frequency obtained from observation simply is the probability; instead, they posit a plausible distribution based on subjective prior knowledge and then use new data to move ever closer to the true probability. Of course, when the data sample is large enough, the difference between the prior and the true probability disappears, but the Bayesian approach is relatively more favorable to modeling by experienced experts, because it makes full use of the important information contained in prior knowledge.
Introduction
We provide a convenient and elegant way of fitting Bayesian regression models by simply prefixing the estimation command with bayes. You can choose from 45 supported estimation commands. All of Stata’s existing Bayesian features are supported by the new bayes prefix.
You can use default priors for model parameters or select from many prior distributions. I will demonstrate the use of the bayes prefix for fitting a Bayesian logistic regression model and explore the use of Cauchy priors (available as of the update on July 20, 2017) for regression coefficients.
A common problem for Bayesian practitioners is the choice of priors for the coefficients of a regression model. The conservative approach of specifying very weak or completely uninformative priors is considered to be data-driven and objective, but is at odds with the Bayesian paradigm.
Noninformative priors may also be insufficient for resolving some common regression problems such as the separation problem in logistic regression. On the other hand, in the absence of strong prior knowledge, there are no general rules for choosing informative priors. In this article, I follow some recommendations from Gelman et al. (2008) for providing weakly informative Cauchy priors for the coefficients of logistic regression models and demonstrate how these priors can be specified using the bayes prefix command.
Data
I consider a version of the well-known Iris dataset (Fisher 1936) that describes three iris plants using their sepal and petal shapes. The binary variable virg distinguishes the Iris virginica class from the Iris versicolor and Iris setosa classes. The variables slen and swid describe sepal length and width, and plen and pwid describe petal length and width. These four variables are standardized so that they have mean 0 and standard deviation 0.5.
Standardizing the variables used as covariates in a regression model is recommended by Gelman et al. (2008) to apply common prior distributions to the regression coefficients. This approach is also favored by other researchers, for example, Raftery (1996).
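The posted dataset irisstd.dta already contains the variables in this standardized form, but here is a minimal sketch of how such a standardization could be done in Stata, centering each covariate and dividing by twice its standard deviation so the result has standard deviation 0.5:

    // Sketch: rescale covariates to mean 0, sd 0.5 (Gelman et al. 2008).
    // irisstd.dta already ships with the variables in this form.
    foreach v of varlist slen swid plen pwid {
        quietly summarize `v'
        quietly replace `v' = (`v' - r(mean)) / (2 * r(sd))
    }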
For validation purposes, I withhold the first and last observations from being used in model estimation. I generate the indicator variable touse that will mark the estimation subsample.
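A one-line sketch of such an indicator, assuming the observations appear in their original order:

    // Mark all observations except the first and the last for estimation
    generate touse = !inlist(_n, 1, _N)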
Models
I first run a standard logistic regression model with outcome variable virg and predictors slen, swid, plen, and pwid.
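Based on that description, the call is along these lines, restricted to the estimation subsample:

    // Standard (maximum likelihood) logistic regression
    logit virg slen swid plen pwid if touse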
The logit command issues a note that some observations are completely determined, because the continuous covariates, especially pwid, have many repeated values.
I then fit a Bayesian logistic regression model by prefixing the above command with bayes. I also specify a random-number seed for reproducibility.
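The seed value below is arbitrary; any fixed value makes the results reproducible:

    // Bayesian logistic regression with default priors
    bayes, rseed(12345): logit virg slen swid plen pwid if touse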
By default, normal priors with mean 0 and standard deviation 100 are used for the intercept and regression coefficients. Default normal priors are provided for convenience, so users can see the naming conventions for parameters to specify their own priors. The chosen priors are selected to be fairly uninformative but may not be so for parameters with large values.
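Because Stata's normal() prior density is parameterized by mean and variance, the defaults correspond to an explicit specification along these lines (a sketch; {virg:} refers to all parameters in the virg equation):

    // Equivalent explicit statement of the default priors
    bayes, prior({virg:}, normal(0, 10000)) rseed(12345): ///
        logit virg slen swid plen pwid if touse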
The bayes: logit command produces estimates that are much higher in absolute value than the corresponding maximum likelihood estimates. For example, the posterior mean estimate for the coefficient of the plen variable, {virg:plen}, is about 60. This means that a unit change in plen results in a 60-unit change in the outcome on the logistic scale, which is very large. The corresponding maximum likelihood estimate is about 33. Given that the default priors are vague, can we explain this difference? Let's look at the sample posterior distributions of the regression coefficients. I draw histograms using the bayesgraph histogram command:
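For example, the following plots all model parameters in one graph (the byparm option combines the individual histograms):

    // Histograms of the sample posterior distributions
    bayesgraph histogram _all, byparm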
Under vague priors, posterior modes are expected to be close to the MLEs. Here, however, all posterior distributions are skewed, so the sample posterior means are much larger in absolute value than the posterior modes and thus differ from the MLEs.
Gelman et al. (2008) suggest applying Cauchy priors to the regression coefficients when the data are standardized so that all continuous variables have standard deviation 0.5: specifically, a scale of 10 for the intercept and a scale of 2.5 for the regression coefficients. This choice is based on the observation that a change of 5 units on the logistic scale moves the outcome probability from 0.01 to 0.5, or from 0.5 to 0.99, so effects much larger than 5 for a unit change in a predictor are implausible a priori.
The Cauchy priors are centered at 0, because the covariates are centered at 0.
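The refitted model, as a sketch (the seed is again arbitrary):

    // Weakly informative Cauchy priors: scale 2.5 for coefficients,
    // scale 10 for the intercept, all centered at 0
    bayes, prior({virg:slen swid plen pwid}, cauchy(0, 2.5)) ///
           prior({virg:_cons}, cauchy(0, 10))                ///
           rseed(12345): logit virg slen swid plen pwid if touse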
We can use bayesgraph diagnostics to verify that there are no convergence problems with the model; I do not reproduce the resulting graphs here.
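For instance, to inspect the trace, autocorrelation, histogram, and density plots for a single coefficient:

    // Graphical convergence diagnostics for {virg:plen}
    bayesgraph diagnostics {virg:plen}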
The posterior mean estimates in this model are about a third as large in absolute value as those from the model with the default vague normal priors, and they are closer to the maximum likelihood estimates. For example, the posterior mean estimate for {virg:plen} is now only about 21.
The estimated log-marginal likelihood of the model, -18.7, is higher than that of the model with default normal priors, -20.6, which indicates that the model with independent Cauchy priors fits the data better.
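The log marginal-likelihood appears in the header of each bayes output. To see the two models side by side, one can save each simulation with the saving() option, store the results, and call bayesstats ic; this is a sketch, and the file and model names below are mine, not from the original post:

    // Fit and store both models, then compare fit statistics
    bayes, rseed(12345) saving(sim_normal): ///
        logit virg slen swid plen pwid if touse
    estimates store m_normal
    bayes, prior({virg:slen swid plen pwid}, cauchy(0, 2.5)) ///
           prior({virg:_cons}, cauchy(0, 10))                ///
           rseed(12345) saving(sim_cauchy):                  ///
        logit virg slen swid plen pwid if touse
    estimates store m_cauchy
    bayesstats ic m_normal m_cauchy   // reports DIC and log marginal-likelihood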
Predictions
Now that we are satisfied with our model, we can perform some postestimation. All Bayesian postestimation features work after the bayes prefix just as they do after the bayesmh command. Below, I show examples of obtaining out-of-sample predictions. Say we want to make predictions for the first and last observations in our dataset, which were not used for fitting the model. The first observation is not from the Iris virginica class, but the last one is.
We can use the bayesstats summary command to predict the outcome class by applying the invlogit() transformation to the desired linear combination of predictors.
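A sketch of that computation, assuming the standard 150-observation Iris data so that the withheld observations are 1 and 150:

    // Posterior predictive probabilities for the two withheld observations
    bayesstats summary                                             ///
        (pr1:   invlogit({virg:_cons} + {virg:slen}*slen[1] +     ///
                {virg:swid}*swid[1] + {virg:plen}*plen[1] +       ///
                {virg:pwid}*pwid[1]))                             ///
        (pr150: invlogit({virg:_cons} + {virg:slen}*slen[150] +   ///
                {virg:swid}*swid[150] + {virg:plen}*plen[150] +   ///
                {virg:pwid}*pwid[150]))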
The posterior mean probability for the first observation to belong to the class Iris virginica is estimated to be essentially zero, 7.3e-10. In contrast, the estimated probability for the last observation is about 0.91. Both predictions agree with the observed classes.
The dataset used in this post is available here: irisstd.dta
References
Fisher, R. A. 1936. The use of multiple measurements in taxonomic problems. Annals of Eugenics 7: 179–188.
Gelman, A., A. Jakulin, M. G. Pittau, and Y.-S. Su. 2008. A weakly informative default prior distribution for logistic and other regression models. Annals of Applied Statistics 2: 1360–1383.
Raftery, A. E. 1996. Approximate Bayes factors and accounting for model uncertainty in generalized linear models. Biometrika 83: 251–266.