Stata: An Introduction to Lasso Regression (lasso)
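1. Introduction
Lasso was originally an acronym for "least absolute shrinkage and selection operator". Today, lasso is treated as a word rather than an acronym.
Lasso is used for prediction, for model selection, and as a component of estimators that perform inference. Elastic net and square-root lasso are designed for model selection and prediction. Stata's lasso, elasticnet, and sqrtlasso commands implement these methods. lasso and elasticnet fit continuous, binary (0/1), and count outcomes, while sqrtlasso fits continuous outcomes only.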
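As a rough sketch of how the three commands are invoked, the following uses the auto dataset that ships with Stata. The covariate list and the alpha() values are illustrative assumptions, not recommendations.

```stata
sysuse auto, clear

* Illustrative list of candidate covariates (an assumption for this sketch)
local xvars headroom weight turn gear_ratio price trunk length displacement

* Lasso for a continuous outcome; rseed() makes the CV folds reproducible
lasso linear mpg `xvars', rseed(12345)

* Elastic net: cross-validation also chooses among the listed alpha values
elasticnet linear mpg `xvars', alpha(0.25 0.5 0.75) rseed(12345)

* Square-root lasso handles continuous outcomes only, so it takes no model keyword
sqrtlasso mpg `xvars', rseed(12345)
```

Note that sqrtlasso is the only one of the three that omits the model name after the command.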
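Stata also provides lasso commands for inference. They use lassos to select the control variables that appear in the model, and they estimate coefficients and standard errors for a specified subset of covariates.
Stata's lasso inference commands implement methods known as double selection, partialing out, and cross-fit partialing out. With each of these methods, linear, logistic, or Poisson regression can be used to model continuous, binary, or count outcomes. Partialing out and cross-fit partialing out also allow for endogenous covariates in linear models.
Stata also provides a specialized lasso inference command for estimating treatment effects while using lassos to select control variables. telasso estimates the average treatment effect (ATE), the average treatment effect on the treated (ATET), or potential-outcome means (POMs); see [TE] telasso.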
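The three inference methods for a linear model correspond to the commands dsregress, poregress, and xporegress. The sketch below is a minimal illustration on the auto dataset; treating weight as the covariate of interest and the remaining variables as candidate controls is an assumption made only for this example.

```stata
sysuse auto, clear

* Candidate controls that the lassos may select from (illustrative choice)
local controls headroom turn gear_ratio price trunk length displacement

* Double selection: coefficient and standard error are reported for weight only
dsregress mpg weight, controls(`controls')

* Partialing out
poregress mpg weight, controls(`controls')

* Cross-fit partialing out; rseed() fixes the random split into cross-fit folds
xporegress mpg weight, controls(`controls') rseed(12345)
```

In each case the output reports inference only for the variable of interest; the selected controls are not given coefficients or standard errors.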
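Lasso is a method for selecting and fitting the covariates that appear in a model. The lasso command can fit linear, logit, probit, and Poisson models. Consider a linear model of y on x1, x2, ..., xp. Ordinarily, you would fit this model by typing
regress y x1 x2 ... xp
Now suppose you are unsure which variables (covariates) belong in the model, although you are confident that some of them do and that their number is small relative to n, the number of observations in the dataset. In that case, you can type
lasso linear y x1 x2 ... xp
You can specify hundreds or even thousands of covariates. You can even specify more covariates than you have observations! The covariates you specify are the potential covariates from which lasso selects.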
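Lasso has three uses:
1. Lasso for prediction.
2. Lasso for model selection.
3. Lasso for inference.
By prediction, we mean predicting the value of an outcome conditional on a set of potential regressors, both within the sample and outside it.
By model selection, we mean selecting a set of variables that predicts the outcome well. We do not mean selecting the variables of the true model or giving the coefficients a scientific interpretation. Instead, we mean selecting variables that are strongly related to the outcome in one dataset and testing whether those same variables predict the outcome well in other datasets.
By inference, we mean interpreting the coefficients of the fitted model and assigning meaning to them. Inference is concerned with estimating the effects of variables in the true model, together with standard errors, confidence intervals, p-values, and so on.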
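The idea of selecting variables in one dataset and checking them in another can be sketched with splitsample and lassogof. The 75/25 split, the variable name sample, and the covariate list below are assumptions chosen for illustration.

```stata
sysuse auto, clear

* Randomly assign each observation to a training (1) or testing (2) group
splitsample, generate(sample) split(0.75 0.25) rseed(12345)

* Fit the lasso on the training group only
lasso linear mpg headroom weight turn gear_ratio price trunk length displacement ///
    if sample == 1, rseed(12345)

* Compare goodness of fit in the training and testing groups
lassogof, over(sample)
```

A large gap between the two groups in the lassogof table would suggest that the selected variables do not carry over well to new data.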
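To fit a linear lasso model, we might type
. lasso linear y x1-x500
Lasso will select a subset of the variables x1-x500 to use for prediction.
If we have a binary (0/1) outcome variable, we can fit a logit model by typing
. lasso logit y x1-x500
or a probit model by typing
. lasso probit y x1-x500
For a count outcome, we can fit a Poisson model by typing
. lasso poisson y x1-x500
After any of these lasso commands, we can use predict to obtain predictions of y.
For examples that show how to use the lasso command to fit models suitable for prediction, see Remarks and examples in [LASSO] lasso.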
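A minimal sketch of obtaining predictions after a lasso follows, again on the auto dataset. The covariate list and the new variable names mpg_hat and mpg_hat_post are hypothetical choices for this example.

```stata
sysuse auto, clear
lasso linear mpg headroom weight turn gear_ratio price trunk length displacement, ///
    rseed(12345)

* Predictions based on the penalized coefficients (the default after lasso)
predict mpg_hat

* Predictions based on postselection coefficients, that is, an unpenalized
* refit on the selected variables
predict mpg_hat_post, postselection

summarize mpg mpg_hat mpg_hat_post
```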
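2. Syntax
lasso model depvar [(alwaysvars)] othervars [if] [in] [weight] [, options]
model is one of linear, logit, probit, or poisson.
alwaysvars are variables that are always included in the model.
othervars are variables that lasso will choose to include in or exclude from the model.
noconstant suppresses the constant term.
selection(sel_method) specifies the method used to select the lasso penalty parameter lambda* from the set of possible lambdas.
[no]log displays or suppresses the iteration log.
rseed(#) sets the random-number seed.
grid(#_g[, ratio(#) min(#)]) specifies the set of possible lambdas using a logarithmic grid with #_g grid points.
The available values of sel_method in selection(sel_method) are:
cv[, cv_opts]: select lambda* using CV; the default
adaptive[, adapt_opts cv_opts]: select lambda* using an adaptive lasso
plugin[, plugin_opts]: select lambda* using a plugin iterative formula
bic[, bic_opts]: select lambda* using the BIC function
none: do not select lambda*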
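The syntax elements above can be combined as in the following sketch. Forcing weight into the model, using 5 folds, and using a 50-point grid are assumptions made purely to illustrate the options, and selection(bic) may not be available in older Stata releases.

```stata
sysuse auto, clear
local xvars headroom turn gear_ratio price trunk length displacement

* (weight) in parentheses is an alwaysvar: it is kept in the model, not selected
lasso linear mpg (weight) `xvars', selection(plugin)

* Adaptive lasso; it relies on cross-validation, so set the seed
lasso linear mpg weight `xvars', selection(adaptive) rseed(12345)

* Cross-validation with 5 folds over a grid of 50 candidate lambdas
lasso linear mpg weight `xvars', selection(cv, folds(5)) grid(50) rseed(12345)

* Select lambda* by minimizing the BIC
lasso linear mpg weight `xvars', selection(bic)
```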
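3. Example
We use the auto dataset to demonstrate the lasso command.
sysuse auto
This dataset is not a natural candidate for lasso, but it works well for demonstration.
We now fit a lasso with lasso linear:
lasso linear mpg i.foreign i.rep78 headroom weight turn gear_ratio price ///
    trunk length displacement, selection(cv, alllambdas) stop(0) rseed(12345)
What the options mean:
selection(cv, alllambdas) selects the tuning parameter by cross-validation (10-fold by default).
rseed(12345) sets the random-number seed to 12345 for the random partition of the sample into 10 folds.
The results are: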
. sysuse auto
(1978 automobile data)
. lasso linear mpg i.foreign i.rep78 headroom weight turn gear_ratio price trunk length displacement, selection(cv, alllambda
> s) stop(0) rseed(12345)
Evaluating up to 100 lambdas in grid ...
Grid value 1: lambda = 4.69114 no. of nonzero coef. = 0
Grid value 2: lambda = 4.274392 no. of nonzero coef. = 2
Grid value 3: lambda = 3.894667 no. of nonzero coef. = 2
Grid value 4: lambda = 3.548676 no. of nonzero coef. = 2
Grid value 5: lambda = 3.233421 no. of nonzero coef. = 2
Grid value 6: lambda = 2.946173 no. of nonzero coef. = 2
Grid value 7: lambda = 2.684443 no. of nonzero coef. = 2
Grid value 8: lambda = 2.445964 no. of nonzero coef. = 2
Grid value 9: lambda = 2.228672 no. of nonzero coef. = 2
Grid value 10: lambda = 2.030683 no. of nonzero coef. = 2
Grid value 11: lambda = 1.850282 no. of nonzero coef. = 2
Grid value 12: lambda = 1.685908 no. of nonzero coef. = 2
Grid value 13: lambda = 1.536137 no. of nonzero coef. = 2
Grid value 14: lambda = 1.399671 no. of nonzero coef. = 2
Grid value 15: lambda = 1.275328 no. of nonzero coef. = 3
Grid value 16: lambda = 1.162031 no. of nonzero coef. = 3
Grid value 17: lambda = 1.0588 no. of nonzero coef. = 3
Grid value 18: lambda = .9647388 no. of nonzero coef. = 3
Grid value 19: lambda = .8790341 no. of nonzero coef. = 4
Grid value 20: lambda = .8009431 no. of nonzero coef. = 5
Grid value 21: lambda = .7297895 no. of nonzero coef. = 6
Grid value 22: lambda = .664957 no. of nonzero coef. = 6
Grid value 23: lambda = .6058841 no. of nonzero coef. = 6
Grid value 24: lambda = .552059 no. of nonzero coef. = 6
Grid value 25: lambda = .5030156 no. of nonzero coef. = 6
Grid value 26: lambda = .4583291 no. of nonzero coef. = 6
Grid value 27: lambda = .4176124 no. of nonzero coef. = 6
Grid value 28: lambda = .3805129 no. of nonzero coef. = 6
Grid value 29: lambda = .3467091 no. of nonzero coef. = 6
Grid value 30: lambda = .3159085 no. of nonzero coef. = 7
Grid value 31: lambda = .287844 no. of nonzero coef. = 8
Grid value 32: lambda = .2622728 no. of nonzero coef. = 8
Grid value 33: lambda = .2389732 no. of nonzero coef. = 8
Grid value 34: lambda = .2177434 no. of nonzero coef. = 8
Grid value 35: lambda = .1983997 no. of nonzero coef. = 8
Grid value 36: lambda = .1807744 no. of nonzero coef. = 8
Grid value 37: lambda = .1647149 no. of nonzero coef. = 8
Grid value 38: lambda = .1500821 no. of nonzero coef. = 8
Grid value 39: lambda = .1367492 no. of nonzero coef. = 8
Grid value 40: lambda = .1246008 no. of nonzero coef. = 8
Grid value 41: lambda = .1135316 no. of nonzero coef. = 8
Grid value 42: lambda = .1034458 no. of nonzero coef. = 8
Grid value 43: lambda = .0942559 no. of nonzero coef. = 9
Grid value 44: lambda = .0858825 no. of nonzero coef. = 10
Grid value 45: lambda = .0782529 no. of nonzero coef. = 11
Grid value 46: lambda = .0713012 no. of nonzero coef. = 11
Grid value 47: lambda = .064967 no. of nonzero coef. = 11
Grid value 48: lambda = .0591955 no. of nonzero coef. = 11
Grid value 49: lambda = .0539367 no. of nonzero coef. = 11
Grid value 50: lambda = .0491451 no. of nonzero coef. = 11
Grid value 51: lambda = .0447792 no. of nonzero coef. = 12
Grid value 52: lambda = .0408011 no. of nonzero coef. = 12
Grid value 53: lambda = .0371765 no. of nonzero coef. = 13
Grid value 54: lambda = .0338738 no. of nonzero coef. = 13
Grid value 55: lambda = .0308646 no. of nonzero coef. = 13
Grid value 56: lambda = .0281226 no. of nonzero coef. = 13
Grid value 57: lambda = .0256243 no. of nonzero coef. = 13
Grid value 58: lambda = .0233479 no. of nonzero coef. = 13
Grid value 59: lambda = .0212738 no. of nonzero coef. = 13
Grid value 60: lambda = .0193839 no. of nonzero coef. = 13
Grid value 61: lambda = .0176618 no. of nonzero coef. = 13
Grid value 62: lambda = .0160928 no. of nonzero coef. = 13
Grid value 63: lambda = .0146632 no. of nonzero coef. = 13
Grid value 64: lambda = .0133605 no. of nonzero coef. = 13
Grid value 65: lambda = .0121736 no. of nonzero coef. = 13
Grid value 66: lambda = .0110922 no. of nonzero coef. = 13
Grid value 67: lambda = .0101068 no. of nonzero coef. = 13
Grid value 68: lambda = .0092089 no. of nonzero coef. = 13
Grid value 69: lambda = .0083908 no. of nonzero coef. = 13
Grid value 70: lambda = .0076454 no. of nonzero coef. = 13
Grid value 71: lambda = .0069662 no. of nonzero coef. = 13
Grid value 72: lambda = .0063473 no. of nonzero coef. = 13
Grid value 73: lambda = .0057835 no. of nonzero coef. = 13
Grid value 74: lambda = .0052697 no. of nonzero coef. = 13
Grid value 75: lambda = .0048015 no. of nonzero coef. = 13
Grid value 76: lambda = .004375 no. of nonzero coef. = 13
Grid value 77: lambda = .0039863 no. of nonzero coef. = 13
Grid value 78: lambda = .0036322 no. of nonzero coef. = 13
Grid value 79: lambda = .0033095 no. of nonzero coef. = 13
Grid value 80: lambda = .0030155 no. of nonzero coef. = 13
Grid value 81: lambda = .0027476 no. of nonzero coef. = 13
Grid value 82: lambda = .0025035 no. of nonzero coef. = 13
Grid value 83: lambda = .0022811 no. of nonzero coef. = 13
Grid value 84: lambda = .0020785 no. of nonzero coef. = 13
Grid value 85: lambda = .0018938 no. of nonzero coef. = 13
Grid value 86: lambda = .0017256 no. of nonzero coef. = 13
Grid value 87: lambda = .0015723 no. of nonzero coef. = 13
Grid value 88: lambda = .0014326 no. of nonzero coef. = 13
Grid value 89: lambda = .0013053 no. of nonzero coef. = 13
Grid value 90: lambda = .0011894 no. of nonzero coef. = 13
Grid value 91: lambda = .0010837 no. of nonzero coef. = 13
Grid value 92: lambda = .0009874 no. of nonzero coef. = 13
Grid value 93: lambda = .0008997 no. of nonzero coef. = 13
Grid value 94: lambda = .0008198 no. of nonzero coef. = 13
Grid value 95: lambda = .000747 no. of nonzero coef. = 13
Grid value 96: lambda = .0006806 no. of nonzero coef. = 13
Grid value 97: lambda = .0006201 no. of nonzero coef. = 13
Grid value 98: lambda = .000565 no. of nonzero coef. = 13
Grid value 99: lambda = .0005149 no. of nonzero coef. = 13
Grid value 100: lambda = .0004691 no. of nonzero coef. = 13
10-fold cross-validation with 100 lambdas ...
Fold 1 of 10: 10....20....30....40....50....60....70....80....90....100
Fold 2 of 10: 10....20....30....40....50....60....70....80....90....100
Fold 3 of 10: 10....20....30....40....50....60....70....80....90....100
Fold 4 of 10: 10....20....30....40....50....60....70....80....90....100
Fold 5 of 10: 10....20....30....40....50....60....70....80....90....100
Fold 6 of 10: 10....20....30....40....50....60....70....80....90....100
Fold 7 of 10: 10....20....30....40....50....60....70....80....90....100
Fold 8 of 10: 10....20....30....40....50....60....70....80....90....100
Fold 9 of 10: 10....20....30....40....50....60....70....80....90....100
Fold 10 of 10: 10....20....30....40....50....60....70....80....90....100
... cross-validation complete
Lasso linear model No. of obs = 69
No. of covariates = 15
Selection: Cross-validation No. of CV folds = 10
--------------------------------------------------------------------------
| No. of Out-of- CV mean
| nonzero sample prediction
ID | Description lambda coef. R-squared error
---------+----------------------------------------------------------------
1 | first lambda 4.69114 0 0.0049 33.74852
40 | lambda before .1246008 8 0.6225 12.80314
* 41 | selected lambda .1135316 8 0.6226 12.79854
42 | lambda after .1034458 8 0.6218 12.82783
100 | last lambda .0004691 13 0.5734 14.46932
--------------------------------------------------------------------------
* lambda selected by cross-validation.
.
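This command is explained in full in [LASSO] lasso. Of particular interest here are the suboption alllambdas and the option stop(0). Together, they ensure that all 100 default grid values are searched during cross-validation. Otherwise, lasso stops searching as soon as it has found an optimum or one of its other stopping rules is met.
The table above shows that, by the cross-validation criterion, the optimal value of the tuning parameter lambda is .1135316, which yields 8 nonzero coefficients. The corresponding out-of-sample R-squared is 0.6226.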
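To see how the cross-validation function behaves around the selected lambda, one option is to follow the fit with cvplot and lassoknots. The sketch below refits the same model quietly so that it is self-contained; it is an illustration, not part of the original session.

```stata
sysuse auto, clear
quietly lasso linear mpg i.foreign i.rep78 headroom weight turn gear_ratio price ///
    trunk length displacement, selection(cv, alllambdas) stop(0) rseed(12345)

* Plot the CV function against lambda; the selected lambda is marked on the graph
cvplot

* Table of knots: the lambdas at which variables enter or leave the model
lassoknots
```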
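Plotting the coefficient paths for this lasso is as simple as typing coefpath:
coefpath
The x axis shows the sum of the absolute values of the penalized coefficients (the L1-norm), running from 0 to 15. Each line traces the penalized coefficient of one standardized covariate in our model. These graphs are popular but have a drawback: they can be interpreted only when there are few covariates, whereas lasso is typically most useful when there are many.
Usually there are so many variables that no individual path is of interest. These data are small enough that we can look at each covariate. Let's turn on the legend, place it beside the graph, and lay it out in a single column:
coefpath, lineopts(lwidth(thick)) legend(on position(3) cols(1)) xsize(4.2)
To display the lasso coefficients, type
lassocoef, display(coef) sort(coef)
The results are:
. lassocoef, display(coef) sort(coef)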
------------------------
| active
-------------+----------
length | -2.930892
0.foreign | 1.427544
gear_ratio | 1.334716
|
rep78 |
5 | 1.275659
|
turn | -.7169113
|
rep78 |
3 | -.3124912
|
weight | -.295765
price | -.291902
_cons | 0
------------------------
Legend:
b - base level
e - empty cell
o - omitted