Stata: 10 articles you won't regret reading, with programming code and annotations
Email: econometrics666@sina.cn
**Copyrights @计量经济圈 (ID: econometrics666)
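3. 115 pages of tips for using Stata efficiently (printable PDF)
4. Companion datasets for "高级计量经济学及Stata应用" (Advanced Econometrics and Stata Applications) and "Stata十八讲" (Eighteen Lectures on Stata)
6. The 500 most widely used Stata programs worldwide
8. A handbook of Stata programs covering reg3, multiple regression, panel data, analysis of variance, and tests and corrections for heteroskedasticity and autocorrelation
9. Stata's statistical features, graphics, and learning resources, all in one article
If it is unclear what any of the programs below does, you can ask about it in the community.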
http://fmwww.bc.edu/repec/bocode/t/textEditors.html#notepadplus
http://personal.lse.ac.uk/lembcke/ecStata/2010/MResStataNotesOct2010PartA.pdf
http://homepages.rpi.edu/~simonk/pdf/UsefulStataCommands.pdf
http://statadaily.com/tag/notepad/
http://personal.lse.ac.uk/lembcke/ecStata/2009/MResStataNotesFeb2009PartB.pdf
http://fmwww.bc.edu/GStat/docs/StataMLNL.pdf
*Program 1
capture program drop argdisp
program argdisp
    version 13
    args first second third             // assign positional arguments to local macros
    display "1st argument = `first'"
    display "2nd argument = `second'"
    display "3rd argument = `third'"
end
argdisp cat dog mouse                   // first is cat, second is dog, third is mouse
argdisp 3.456 2+5-12 X3+cat             // first is 3.456, second is 2+5-12, third is X3+cat (arguments are split on spaces, not evaluated)
*Program 2
capture program drop myprog
program myprog
    version 15
    syntax varlist [if] [in] [, adjust(real 1) title(string)]
    display
    if "`title'" != "" {
        display "`title':"
    }
    foreach var of local varlist {
        quietly summarize `var' `if' `in'
        display "`var'" " " r(mean)*`adjust'
    }
end
webuse auto.dta, clear
myprog mpg price                        // displays the mean of each variable
myprog mpg weight if foreign==1         // means restricted to foreign==1
myprog mpg weight if foreign==1, title("My title")
myprog mpg weight if foreign==1, title("My title") adjust(2)
**Program 3
capture program drop doavar
program doavar
    version 15
    args touse name value
    qui summarize `name' if `touse'
    display "`name'" " " r(mean)*`value'
end
**Program 4
capture program drop myprog
program myprog
    version 15
    syntax varlist [if] [in] [, adjust(real 1) title(string)]
    marksample touse
    display
    if "`title'" != "" {
        display "`title':"
    }
    foreach var of local varlist {
        doavar `touse' `var' `adjust'
    }
end
webuse auto.dta, clear
doavar mpg weight trunk                 // mean of weight times the first observation of trunk (mpg serves as the if condition)
myprog mpg weight trunk                 // means of these three variables
**Program 5
capture program drop lnsim
program lnsim
    version 15
    tempname sim                        // temporary name for the postfile handle
    postfile `sim' mean var meansd sd using results, replace   // four variables stored in results.dta
    quietly {
        forvalues i = 1/10000 {
            drop _all
            set obs 100
            gen z = exp(rnormal())      // lognormal random numbers
            sum z
            post `sim' (r(mean)) (r(Var)) (r(mean)/r(sd)) (r(sd))   // mean, variance (summarize returns it as r(Var)), mean/sd, sd
        }
    }
    postclose `sim'
end
set seed 12345
lnsim
use results, clear
describe
sum
*Create new variables or enter new values
webuse genxmpl2, clear
generate str9 lastname = word(name, 2)
**Enter new variables by hand
input x
1
2
end
input double (y z)
3 4
5 6
end
input str2 s
ab
cd
end
*Snapshots: a restore mechanism similar to preserve
webuse auto, clear
snapshot erase _all                     // clear any existing snapshots first
snapshot save, label("before changes")
generate gpm = 1/mpg                    // create a new variable
label variable gpm "gallons per mile"
snapshot save, label("after changes")   // save the modified data
drop gpm                                // now drop gpm
snapshot list                           // see how many snapshots exist
snapshot restore 2                      // restore snapshot 2, which contains gpm
describe                                // inspect the restored data
snapshot restore 1
snapshot list
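For comparison with the snapshot workflow, here is a minimal sketch of preserve and restore on the same auto dataset. The subsetting step (keep if foreign == 1) is only an illustrative assumption; any temporary change to the data would do.

```stata
sysuse auto, clear
preserve                                // save a temporary copy of the data in memory
keep if foreign == 1                    // illustrative change: keep foreign cars only
summarize mpg                           // work with the reduced data
restore                                 // bring back the full dataset
count                                   // all observations are back
```

Unlike snapshot, preserve holds a single copy, and inside a do-file or program the data are restored automatically when it ends unless restore, not is issued.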
*Program-related features
program dir                             // lists the programs currently loaded in memory
capture program drop rng
program rng
    args n a b
    if "`b'" == "" {
        display "You must type three arguments: n a b"
        exit
    }
    drop _all
    set obs `n'
    gen x = (_n-1)/(_N-1)*(`b'-`a')+`a'
end
rng 10 2 3                              // run the program
list x in 1/10                          // show the result
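**Program 6
capture program drop smooth
program smooth
    args v1 v2
    confirm variable `v1'               // check that this variable exists in the dataset
    confirm new variable `v2'           // check that this variable does not exist yet
    // three-point moving average; the first and last observations are left unchanged
    gen `v2' = cond(_n==1 | _n==_N, `v1', (`v1'[_n-1]+`v1'+`v1'[_n+1])/3)
end
webuse auto.dta, clear
smooth mpg new_mpg
list new_mpg in 1/10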
**Macro extended functions
local logitprops: properties logit      // properties of the logit command
di "`logitprops'"
*Exporting with putexcel
putexcel set results                    // write to results.xlsx
putexcel A1 = "Variable" B1 = "Mean" C1 = "Std. Dev.", border(bottom)
sysuse auto, clear
summarize mpg
return list
putexcel A2 = "mpg" B2 = `r(mean)' C2 = `r(sd)', nformat(number_d2)
*Export a tabulation table
sysuse auto, clear
putexcel set results                    // write to results.xlsx
tab foreign, matcell(cell) matrow(rows)
putexcel A1=("Car type") B1=("Freq.")
putexcel A2=matrix(rows) B2=matrix(cell)
putexcel A4=("Total") B4=(r(N))
*Export individual regression results
sysuse auto.dta, clear
regress price turn gear
putexcel set "results.xls", sheet("regress results")
putexcel F1=("Number of obs") G1=(e(N))
putexcel F2=("F") G2=(e(F))
putexcel F3=("Prob > F") G3=(Ftail(e(df_m), e(df_r),e(F)))
putexcel F4=("R-squared") G4=(e(r2))
putexcel F5=("Adj R-squared") G5=(e(r2_a))
putexcel F6=("Root MSE") G6=(e(rmse))
matrix a=r(table)'
matrix a=a[.,1..6]
putexcel A8=matrix(a)
*Programming with quietly
capture program drop myprog
program myprog
    quietly {
        regress `1' `2'
        predict resid, resid
        sort resid
        summarize resid, detail
    }
    list `1' `2' resid if resid<r(p5) | resid>r(p95)
    drop resid
end
sysuse auto.dta, clear
myprog mpg price                        // lists mpg, price, and resid for residuals below the 5th or above the 95th percentile
*Show the variables dropped because of collinearity
sysuse auto.dta, clear
gen tt= turn+ trunk
_rmcoll turn trunk tt
display r(varlist)
_rmcoll i.rep78
display r(varlist)
_rmcoll rep78#foreign
display r(varlist)
* Fragment for use inside an estimation program; the ... mark elided parts, so it does not run on its own
syntax varlist [fweight iweight] ... [, noCONStant ... ]
marksample touse
if "`weight'" != "" {
    tempvar w
    quietly gen double `w' = `exp' if `touse'
    local wgt [`weight'=`w']
}
else local wgt /* is nothing */
gettoken depvar xvars : varlist
_rmcoll `xvars' `wgt' if `touse', `constant'
local xvars `r(varlist)'
*Record how long a program takes to run
capture program drop tester
sysuse auto.dta, clear
program tester
    version 15
    timer clear 1
    forvalues repeat=1(1)1 {
        timer on 1
        logit foreign trunk price rep78 // the command being timed
        timer off 1
    }
    timer list 1
end
tester                                  // run the timed program
sum turn
logit foreign trunk price if length > 190
marksample touse
reg headroom trunk price if `touse'     // touse is a temporary variable, so refer to it through the local macro
*Scalar operations
sysuse auto.dta, clear
sum mpg, meanonly
scalar m1=r(mean)
sum trunk, meanonly
scalar m2=r(mean)
scalar df=m1-m2
dis df
scalar list                             // display all scalars
gen newvar1=mpg*m1
dis newvar1
gen newvar2 = mpg*scalar(m1)            // preferred: scalar() cannot be confused with a variable named m1
dis newvar2
*Build a simple program
capture program drop mysub
sysuse auto.dta, clear
program mysub
    args m1 m2 m3
    logit `m1' `m2' `m3'
end
capture program drop myprog
program myprog
    capture drop z                      // do not stop if z does not exist yet
    set obs 100
    gen z=uniform()
    sum z
    gen m1 = r(mean)
    mysub foreign m1 trunk
end
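*Check whether the data have changed since estimation
sysuse auto.dta, clear
logit foreign trunk price, vce(cluster make)
predict xb
signestimationsample foreign trunk price
checkestimationsample                   // returns silently if the data are unchanged
* In the two templates below, lhsvar, rhsvars, othervars, and clustervar are placeholders for your own variables
quietly tsset                           // time-series data
signestimationsample `r(timevar)' lhsvar rhsvars othervars
quietly xtset                           // panel data
signestimationsample `r(panelvar)' `r(timevar)' lhsvar rhsvars clustervar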
*Make Stata wait 10 seconds before running the next command
sleep 10000
*SMCL: Stata markup and control language
display "{title: this is SMCL, too}"
display "now we will try {help summarize: clicking}"
display "You can also run Stata commands by {stata summarize mpg: clicking}"
display "{center: The use of {ul:SMCL} in help files}"
display "{text}the variable mpg has mean {result: 21.3} in the sample"
display "{text}mpg {c |} {result}21.3"
display "{text}mpg {c |} {result:21.3}"
display "{error: variable not found}"
display "{txt}the variable mpg has mean {res:21.3} in the sample"
display "When using the {cmd:summarize} command, specify"
display "{cmdab:su:mmarize}[{it:varlist}][{it:weight}][{cmdab:if} {it:exp}]"
display "{opt replace}"
display "{opt bseunit(varname)}"
display "{opt f:ormat}"
display "{opt sep:arator(#)}"
display "{hilite:[R] anova} for more details"
display "{* this text will be ignored}"
display "{hiline 20}"
display "{dup 20: A}"
display "{manhelpi mta M:Mata Reference Manual}"
display "{{pstd}You can change the style of the text using the {cmd} directive; see {help example##cmd} below}"
display "{help epitab}"
display "{newvar}"
display "{search anova: click here} for the latest info on ANOVA"
display `"you can {browse "http://www.stata.com":visit the Stata website}"'
display `"see {view "http://www.stata.com/man/readme.smcl"}"'
*An SMCL-related program
program example2
display as text "{p}"
display "Below we will call a subroutine to contribute a sentence"
display "to this paragraph being constructed by example2:"
example2_subroutine
display "The text that example2_subroutine contributed became"
display "part of this single paragraph. Now we will end the paragraph."
end
program example2_subroutine
display "This sentence is being displayed by"
display "example2_subroutine"
end
*sortpreserve restores the original sort order when the program ends
capture program drop myprog
program myprog, sortpreserve
    args i j
    sort `i' `j'
    mysubcalculation `i' `j'
end
program mysubcalculation, sortpreserve
    args i j
    sort `j' `i'
end
sysuse auto.dta, clear
myprog mpg trunk
program myprog2, byable(recall) sortpreserve
    syntax varname [if] [in]
    marksample touse
    sort `touse' `varlist'              // syntax varname stores the name in the local macro varlist
    summarize `varlist' if `touse'
end
sysuse auto.dta, clear
myprog2 price
*byable lets a program take the by prefix and run group by group
program myprog1, byable(recall)
    syntax [varlist] [if] [in]
    marksample touse
    summarize `varlist' if `touse'
end
sysuse auto.dta, clear
by foreign: myprog1 price trunk weight
program myprog3, byable(onecall) sortpreserve
    syntax newvarname =exp [if] [in]
    marksample touse, novarlist
    tempvar rhs
    quietly {
        gen double `rhs' `exp' if `touse'      // exp already contains the equal sign
        sort `touse' `_byvars' `rhs'
        by `touse' `_byvars': gen `typlist' `varlist' = /*
            */ `rhs' - `rhs'[_n-1] if `touse'
    }
end
myprog3 mpg_new= mpg^2
*The syntax command
capture program drop myprog
program myprog
    version 15
    syntax varlist [if] [in] [, adjust(real 1) title(string)]
    display "varlist contains |`varlist'|"
    display " if contains |`if'|"
    display " in contains |`in'|"
    display "adjust contains |`adjust'|"
    display "title contains |`title'|"
end
sysuse auto.dta, clear
myprog mpg weight if foreign in 1/20, title("My results") adjust(2.5)   // run the program
capture program drop myprog
program myprog
    version 15
    syntax varlist [if] [in] [, adjust(real 1) title(string)]
    marksample touse                    // mark the estimation sample
    display
    if "`title'" != "" {
        display "`title':"
    }
    foreach var of local varlist {
        quietly sum `var' if `touse'
        display %9s "`var'" " " %9.0g r(mean)*`adjust'
    }
end
sysuse auto.dta, clear
myprog mpg weight if foreign, title("My results") adjust(2.5)   // run the program
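*gettoken: peel tokens off a string
local str "cat+dog mouse++horse"
gettoken left : str                     // left gets the text before the first space; str is unchanged because no second macro is named
display `"`left'"'
display `"`str'"'
gettoken left str : str, parse(" +")    // left gets the text before the +; the rest, starting with the +, goes into str
display `"`left'"'
display `"`str'"'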
*Count how many panel units have a variable that stays unchanged across the years observed
webuse nlswork, clear
xtset idcode
keep if idcode<=6
keep idcode year union
quietly {
    local r=0
    gen n=0
    forvalues j=1/6 {
        duplicates tag union if idcode==`j', gen(union`j')
        tab union`j'
        return list
        scalar i`j'=r(r)
        if i`j'==1 {
            replace n = `r' + 1
            local r=`r'+1
        }
    }
}
sum n
display as text "In total, " r(mean) " units show no change"
*sysdir: system directories
sysdir
sysdir set OLDPLACE "d:\ado"            // change the current OLDPLACE path
adopath                                 // search path for ado-files
set adosize 1000                        // ado-file cache size in kilobytes (allowed range is 10 to 1000)
*tabdisp displays tables and has some similarity to list
webuse tabdxmpl1, clear
tabdisp a b, cell(c)                    // shows the value of c for each combination of a and b
sysuse auto2, clear
tabdisp make, cell(mpg weight displ rep78)   // table of mpg, weight, displ, and rep78 by make
collapse (mean) mpg, by(foreign rep78)
tabdisp foreign rep78, cell(mpg)        // shows the collapsed means as a two-way table
tabdisp foreign rep78, cell(mpg) format(%9.2f) center   // same table with a different number format
webuse tabdxmpl3, clear
tabdisp agecat sex party, c(reaction) center   // a three-way table
webuse tabdxmpl4, clear
tabdisp sex response, cell(pop) missing // show missing values as well
webuse tabdxmpl5, clear
tabdisp sex response, cell(pop) total   // show the totals
*Macros
local x 0                               // start from a defined value
local ++x                               // same as: local x=`x'+1
local x=`x'+1
sysuse auto.dta, clear
global x : type mpg                     // macro extended function: storage type of mpg
dis "$x"
global x2 : variable label mpg          // macro extended function: variable label of mpg
dis "$x2"
constraint 1 price = weight             // constraint 1 is price = weight
constraint 2 mpg > 20
local myname : constraint 2             // store the definition of constraint 2 in a macro
macro list _myname                      // display that macro
local aname : constraint dir
macro list _aname
local today c(current_date)             // holds the expression c(current_date)
dis `today'                             // evaluates it and shows the current date
dis c(N)
dis c(current_time)
dis c(max_N_theory)
dis c(max_matsize)
dis c(max_macrolen)
dis c(mindouble)
dis c(Weekdays)
constraint 1 price = weight
local myname: constraint 1
macro list _myname
local lmyname: strlen local myname
macro list _lmyname
local string "a or b or c or d"
global newstr: subinstr local string "c" "sand"
display "$newstr"
local string2 : subinstr global newstr "or" "and", all count(local n)
display "`string2'"
local x 5
display "`x++'"                         // displays 5, then increments x to 6
display "`x'"                           // displays 6
format `:format gear_ratio' headroom    // give headroom the same display format as gear_ratio
*tempfile: a typical use (var1, var2, and xvar are placeholder variable names)
preserve                                // preserve user's data
keep var1 var2 xvar
tempfile master part1                   // declare temporary files
save "`master'"
drop var2
save "`part1'"
use "`master'", clear
drop var1
rename var2 var1
append using "`part1'"
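The fragment above uses the placeholder variables var1, var2, and xvar. A minimal runnable sketch of the same pattern follows, assuming the auto dataset with mpg, trunk, and price standing in for var1, var2, and xvar; the choice of variables is purely illustrative.

```stata
sysuse auto, clear
preserve                                // keep the user's data safe
keep mpg trunk price
tempfile master part1                   // temporary files, erased automatically
save "`master'"
drop trunk
save "`part1'"                          // part1 holds mpg and price
use "`master'", clear
drop mpg
rename trunk mpg                        // trunk now plays the role of mpg
append using "`part1'"                  // stack the two pieces
count                                   // twice the original number of observations
restore                                 // back to the original auto data
```

Because both files are declared with tempfile, nothing is left on disk afterwards, and restore returns the dataset to its state before preserve.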
*tokenize splits a string into tokens stored in the macros 1, 2, 3, and so on
tokenize some words
display "1=|`1'|, 2=|`2'|, 3=|`3'|"
tokenize "some more words"
display "1=|`1'|, 2=|`2'|, 3=|`3'|, 4=|`4'|"
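*Generate new variables
clear
set obs 100
gen x=uniform()
generate y = x[_n]                      // y is identical to x
generate xlag = x[_n-1]                 // lag of x (previous observation), like L.x in time-series data
generate xlead = x[_n+1]                // lead of x (next observation), like F.x in time-series data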
*Confidence intervals (cii is the immediate form)
sysuse auto, clear
ci means mpg price, level(90)           // CIs for the means of mpg and price, assuming normality
webuse petri, clear
ci means count, poisson                 // CI for the mean of count, assuming a Poisson distribution
webuse promonone, clear
ci proportions promoted                 // CI for the proportion of the binary variable promoted
ci proportions promoted, wilson
ci proportions promoted, agresti
ci proportions promoted, jeffreys
webuse peas_normdist, clear
ci variances weight                     // CI for the variance of weight
ci variances weight, sd bonett level(90)
cii means 166 19509 4379                // CI for a mean given 166 observations, mean 19509, and standard deviation 4379
cii means 166 19509 4379, level(90)
cii proportions 10 1
cii variances 15 2.1
*trace: find out which statement went wrong
capture program drop myprog
program myprog
    version 15
    syntax varname , [Prefix(string)]
    local newname "`prefix'`varname'"
    local newname "new"
end
sysuse auto.dta, clear
set trace on                            // shows where things go wrong
myprog mpg, prefix("new")
**A nested-program example
capture program drop simple
program simple                          // a simple program
    version 15
    args msg
    if "`msg'"=="hello" {
        display "you said hello"
    }
    else display "you did not say hello"
    display "good-bye"
end
set trace on                            // trace program execution
simple hello
simple no
capture program drop myprog2
program myprog2
    args msg
    simple "`msg'"
    display "good"
end
capture program drop myprog1
program myprog1
    args msg
    myprog2 "`msg'"
    display "bye"
end
set trace on                            // select and run the following lines together
set tracenumber on                      // prefix each traced line with its nesting level
set tracedepth 2                        // trace only the first two levels of nested programs
myprog1 hello
set tracedepth 32000
set tracenumber off
*unab expands abbreviated variable names into full names
* The calls with junk, max(1), and min(3) each stop with an error (no such variable, too many variables, too few variables)
sysuse auto, clear
unab x : mpg wei for, name(myopt())
display "`x'"
unab x : junk
unab x : mpg wei, max(1) name(myopt())
unab x : mpg wei, max(1) name(myopt()) min(0)
unab x : mpg wei, min(3) name(myopt())
unab x : mpg wei, min(3) name(myopt()) max(10)
unab x : mpg wei, min(3) max(10)
gen time = _n                           // set up time-series data
tsset time
tsunab mylist : l(1/3).mpg
display "`mylist'"
tsunab mylist : l(1/3).(price turn displ)
di "`mylist'"
unab varn : mp
display "`varn'"
set varabbrev off                       // with varabbrev off, unab no longer accepts abbreviations
unab varn : mp
set varabbrev on
unab varn : mp
display "`varn'"
*unabcmd expands an abbreviated built-in command name into its full name
unabcmd gen
return list                             // shows the full name
unabcmd kappa
return list
*viewsource shows the source of any ado or Mata file
viewsource ml.ado
viewsource xtreg.ado
viewsource panelsetup.mata
*Looping with while
capture program drop demo
program demo
    local i=1
    while `i'>0 {
        display "i is now `i'"
        local i=`i'-1                   // can also be written as local --i
    }
    display "done"
end
set trace on
demo i=2                                // the argument is ignored; the loop runs once because i starts at 1
*nobreak keeps a block from being interrupted; break is triggered with Ctrl+Pause/Break
capture program drop breakprocess
program breakprocess
    args myv
    nobreak {
        rename `myv' Result
        list Result in 1/5
        rename Result `myv'
    }
end
sysuse auto.dta, clear
set trace on
breakprocess mpg
*Export a regression's variance-covariance matrix
capture program drop yourprog
program yourprog
    args var2 var3 var4 var5 var6 var7
    global alpha "B1 C1 D1 E1 F1 G1"    // header cells for the six regressors
    matrix list e(V)
    matrix x = e(V)
    putexcel set var_cov, replace
    forvalues i=2/7 {
        local j : word `=`i'-1' of $alpha
        putexcel A`i'=("`var`i''") `j'=("`var`i''")   // row and column label for each regressor
    }
    putexcel A8=("_cons")
    putexcel H1=("_cons")
    putexcel A9=("variance-covariance matrix")
    putexcel B2=matrix(x)
end
The corresponding do-files are all available in the 计量经济圈 community, where they can be downloaded for reference.