
The most useful Stata pointers are all here: irreplaceable material

FILE MANAGEMENT

Gentzkow and Shapiro (2014) “Code and Data for the Social Sciences: A Practitioner’s Guide.” - I strongly recommend reading this before embarking on your very first empirical research project. The guide introduces you to many useful data management concepts developed in computer science, which will save you a great deal of time over the increasingly long journey of conducting a piece of empirical research in economics. The most important are Chapters 2, 4 and 5, which help you organize your data files and your millions of Stata do-files (no joking: by the time you publish your empirical paper, you will have tons of Stata code).

TUTORIALS

Essam and Hughes (2016) Stata Cheat Sheets --- All the important Stata commands at a glance. (HT: Marc Bellemare)

Lembcke (2009) “Introduction to Stata” and “Advanced Stata Topics” --- These are the Stata course lecture notes for PhD students at the Department of Economics, LSE. Since 2004, each year’s course instructor has updated and expanded them. I took the course in 2004, but the current version of the lecture notes covers much more than what I learned in the course. You will learn a lot from them. In particular, “Advanced Stata Topics” touches on how to write and publish your own Stata programme, maximum likelihood estimation in Stata, and how to use Mata (Stata’s matrix programming language), topics that are usually not covered in a Stata course for economists.

Using Stata to Analyze Survey Data by Nicholas Minot (IFPRI): This is an excellent introduction to Stata specifically tailored for would-be development economists.

Maybe useful:

A. Colin Cameron and P. K. Trivedi Microeconometrics: Methods and Applications 

Germán Rodríguez “Stata Tutorial” Princeton University

Phil Bardsley, Kim Chantala, and Dan Blanchette "Stata Tutorial" University of North Carolina at Chapel Hill

Stata Starter Kit by UCLA Academic Technology Service

INTRODUCTION

What Stata can/can't do by A. Colin Cameron (Dept. of Economics, University of California, Davis)

ADO FILES

To install an ado file, type "ssc install xxx" (where xxx should be replaced with the name of the ado file) in your Stata interactive session.
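As a minimal sketch of the installation step, the lines below install one package from SSC and then check that Stata can find it. The package name estout is only an illustration (it is discussed later in this list); substitute whichever ado file you need.

```stata
* Install a user-written package from the SSC archive.
* "estout" is used purely as an example package name.
ssc install estout, replace

* Confirm that one of its commands is now found on the ado-path.
which esttab
```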

DO FILES

Making do-files is essential because it allows other researchers to replicate your empirical analysis. It has increasingly become the norm among empirical researchers to post on their websites the Stata do-files used to produce the results in published papers. Here are some websites on how to make do-files.

Michael S. Hill (2015) "In Stata coding, Style is the Essential: A brief commentary on do-file style"

Stata Tutorial by Carolina Population Center, University of North Carolina

An Introduction to Stata by Aimee Chin at MIT

Stata section of Guide to Genetic Analysis by Centre for Integrated Genomic Medical Research (Links to example do-files are dead, but it contains some information on editor software.)

Using external text editors to write do files by Friedrich Huebler

RA Manual Notes on Writing Code, by Matthew Gentzkow and Jesse M. Shapiro (2012), offers best practices in computer programming that are useful for writing Stata do files (and scripts for other software).

Stata help for timer: A useful command if you run a do file that contains a command that takes a very long time to execute (e.g. a regression with a lot of fixed effects).
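A rough sketch of how the timer command can wrap a slow step in a do file. The dataset (Stata's bundled auto data) and the regression are placeholders chosen only so the example runs; the timer number 1 is arbitrary.

```stata
* Time one step of a do file with timer number 1.
sysuse auto, clear

timer clear 1
timer on 1
regress price mpg weight i.rep78   // stand-in for a slow estimation command
timer off 1

* Report the elapsed time (in seconds) recorded by timer 1.
timer list 1
```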

If you use Stata/MP on cluster computing facilities, see Stata Help: statamp.

READING FILES

Every data analysis begins with opening a data file. First, look at this website for the jargon on data formats. (The description of rectangular files is wrong, though.)

Stata Help infiling: Official guide on which command to use for reading different types of data.

Excel

Excel files can finally be imported by a Stata command: import excel.
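For illustration, a typical call might look like the sketch below. The file name and sheet name are hypothetical; the firstrow option tells Stata to treat the first row of the sheet as variable names.

```stata
* Hypothetical workbook and sheet names; adjust to your own file.
* Note the forward slash in the path.
import excel using "data/survey.xlsx", sheet("Sheet1") firstrow clear

* Check what was read in.
describe
```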

For earlier versions of Stata to read an Excel file, follow this blog entry. Make sure to use the forward slash (/) rather than the backslash (\) in the path name. It should then work.

Stata

There is a useful ado program named USE10, which allows you to read Stata version 10 data with Stata version 9. Type “ssc install use10” to install it.

SPSS

To read SPSS data files, use the usespss ado. (HT: David McKenzie.)

CSV

If each data entry is separated by a comma (called the CSV format), use INSHEET.

If your data includes an identification number with more than 7 digits, make sure you include the double option in the insheet command. Read Stata Help for data_type for details.
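A sketch of the point about long identification numbers, assuming a hypothetical file data/households.csv with a hypothetical numeric column household_id. The default float storage type is only exact for integers up to about 16.7 million, which is why the double option matters here.

```stata
* Hypothetical file and variable names.
* "double" stores numeric columns as doubles so that long ID numbers
* are not rounded when they are read in.
insheet using "data/households.csv", comma names double clear

* Display the ID in full rather than in scientific notation.
format household_id %12.0f
list household_id in 1/5
```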

Tab-delimited

If the separator is a tab or a space, use INFILE.

Fixed format

If the data file is in the fixed format (no separator between data entries; entries are identified by column numbers), it's more tricky. There are three cases:

(1) If it's a flat file (each single line represents one observation), see Stata: How to Write a Dictionary Program to Read Raw Data by the Electronic Data Service (EDS) at Columbia University;

(2) If it's a rectangular file (a fixed number of lines represents one observation), see "Example of a Program to Read Data with Multiple Records/Case" at the bottom of Stata: How to Write a Dictionary Program to Read Raw Data by the Electronic Data Service (EDS) at Columbia University;

(3) If it's a hierarchical file (a flexible number of lines represents one observation, as in the World Fertility Surveys), see Stata: How to Read Hierarchical Files in Stata by the Electronic Data Service (EDS) at Columbia University.

From scratch

To create a dataset from scratch, first type “drop _all” and then type “set obs #”, where # is the number of observations in this new dataset. Then create variables with the generate command etc. For a small dataset, you can use the INPUT command to enter the data directly.
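Both routes described above can be sketched as follows. The variable names, the number of observations, and the values typed in are made up for illustration.

```stata
* Route 1: an empty dataset with 100 observations, filled by generate.
drop _all
set obs 100
generate id = _n            // running observation number
set seed 12345
generate x = runiform()     // a uniform random draw per observation

* Route 2: a small dataset typed in directly with input.
clear
input id str10 name score
1 "Ann"  3.5
2 "Bob"  4.0
3 "Chen" 2.5
end
list
```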

Multiple files in the same directory

To read many files in the same directory and append them all, see Append Many Files by UCLA.
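One possible way to do this (not necessarily the approach on the UCLA page) is to collect the file names in a local macro and loop over them. The directory name rawdata is a placeholder, and the sketch assumes every .dta file in it has compatible variables.

```stata
* Collect the names of all .dta files in a hypothetical folder "rawdata".
local files : dir "rawdata" files "*.dta"

* Start from an empty dataset and append each file in turn.
clear
foreach f of local files {
    append using "rawdata/`f'"
}

* Check the combined dataset.
describe, short
```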

EDITING DATA STRUCTURE

Before starting to edit the data itself, you need to edit the structure of data files: reshape, append, and merge.

RESHAPE: Whenever you use datasets downloaded from World Development Indicators, you need to do this.

Using Stata's RESHAPE command, by Amy Yuen at Electronic Data Center of Emory University General Libraries
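As a small illustration of the wide-to-long conversion that country-by-year downloads usually require, the sketch below types in a made-up wide dataset (country codes and values are invented) and reshapes it.

```stata
* Made-up wide data: one row per country, one gdp column per year.
clear
input str3 country gdp1990 gdp1991 gdp1992
"AAA" 1.0 1.1 1.2
"BBB" 2.0 2.1 2.3
end

* Convert to long format: one row per country-year.
* The stub "gdp" becomes the variable; the numeric suffix becomes "year".
reshape long gdp, i(country) j(year)
list, sepby(country)
```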

APPEND/MERGE: Good empirical research often relies on the use of two or more completely different datasets. So you need to append or merge different datasets before starting analysis.

ISID: When you want to merge two datasets that do not share a common unique identifier but do share the same variables (e.g. birth date, birth region), the ISID command lets you check whether a certain set of variables uniquely identifies observations. See Stata Help on ISID.
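A sketch of that workflow, with hypothetical file and variable names (births_a.dta, births_b.dta, birth_date, birth_region). isid stops with an error if the listed variables do not uniquely identify observations, so the merge only runs when the check passes. The merge 1:1 syntax assumes Stata 11 or later.

```stata
* Hypothetical datasets and key variables.
* Check the key in the using dataset first.
use "births_b.dta", clear
isid birth_date birth_region

* Then check it in the master dataset and merge one-to-one.
use "births_a.dta", clear
isid birth_date birth_region
merge 1:1 birth_date birth_region using "births_b.dta"

* See how many observations matched.
tabulate _merge
```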

Stata Tutorial Part 4: Manipulating Files, by Syracuse University Library

DATA PROCESSING

How to create dummy variables by Stata FAQs

Create a new dataset by hand by Carolina Population Centre, University of North Carolina

List of math functions by Stata Help - can be used in combination with the generate command to edit variables.

List of operators by Stata Help

Date variables by Data and Statistical Services, Princeton University --- This webpage tells you how to convert date variables into different formats (e.g. convert the variables of year, month, and day into one date variable etc.).
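For instance, combining separate year, month, and day variables into one Stata date could look like the sketch below. The variable names yr, mo, and dy and the values are invented for the example.

```stata
* Made-up data with the date split over three variables.
clear
input yr mo dy
2010  5 27
2014 12  1
end

* Daily date: days since 01jan1960, shown in a readable format.
generate date = mdy(mo, dy, yr)
format date %td

* Monthly date: months since 1960m1.
generate month_id = ym(yr, mo)
format month_id %tm

list
```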

To categorize observations by percentile bins, use the command xtile. See this Statalist message.
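A minimal example using Stata's bundled auto data; the choice of four bins (quartiles) is arbitrary.

```stata
sysuse auto, clear

* Assign each car to a quartile of the price distribution (1 = cheapest).
xtile price_q = price, nquantiles(4)

tabulate price_q
```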

UNIQUE: Stata module to report number of unique values in variable(s) --- Sometimes this ado command is useful. For example, you may want to know whether a particular variable takes more than one value for each group of observations. To see the details, type “ssc install unique” to install the ado file and then type “help unique” for its help.

REGEXM: useful if you want to identify observations whose string variable contains a particular set of letters.
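As a quick illustration with the bundled auto data, the sketch below flags cars whose make starts with "Buick". The new variable name is_buick is just an example.

```stata
sysuse auto, clear

* regexm() returns 1 if the string matches the regular expression, 0 otherwise.
* "^Buick" matches values of make that begin with "Buick".
generate byte is_buick = regexm(make, "^Buick")

list make if is_buick
```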

Loop over all values of a particular variable: there is a lesser-known command, LEVELSOF, which returns the list of all distinct values of the specified variable in r(levels) and can also store that list in a local macro through its local() option.
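A sketch of such a loop over the values of a numeric variable, using the bundled auto data. The macro name levels is arbitrary.

```stata
sysuse auto, clear

* Store the distinct non-missing values of rep78 in the local macro "levels".
levelsof rep78, local(levels)

* Run a command separately for each value.
foreach v of local levels {
    display "rep78 = `v'"
    summarize price if rep78 == `v'
}
```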

SUMMARY STATISTICS

ESTPOST - This is part of the ESTOUT ado file package. It automates the process of creating a table of summary statistics. Highly recommended.
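A minimal sketch of a summary statistics table, assuming the estout package is already installed from SSC. The dataset and the variable list are placeholders.

```stata
* Requires the estout package (ssc install estout).
sysuse auto, clear

* Post the summary statistics as if they were estimation results...
estpost summarize price mpg weight

* ...and tabulate them with one column per statistic.
esttab, cells("mean sd min max") nomtitle nonumber
```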

Section 6 (pages 33-43) of Using Stata for Survey Data Analysis by Nick Minot at IFPRI --- Very useful, especially if you are analyzing household survey data.

How to conduct a t-test for survey data, by UCLA Academic Technology Service --- Useful if each observation in your data needs to be weighted according to the sampling method. See also how to use the SVY command.

Generating Regression and Summary Statistics Tables in Stata: A Checklist and Code, by Matthew Groh (May, 2014) --- Provides an example do file that uses the MAT2TXT Stata module.

ESTIMATIONS

Overview of Stata estimation commands

Stata Textbook Examples: Econometric Analysis of Cross Section and Panel Data by Jeffrey M. Wooldridge, by UCLA Academic Technology Service --- First, find an example of the estimation method you want to conduct in Wooldridge's graduate econometrics textbook. Then go to this webpage to see which Stata command does the estimation you want.

Beyond simple OLS estimation by UCLA Academic Technology Service - robust estimation, clustering, quantile regression, linear hypothesis testing, errors-in-variables regression (eivreg), censored/truncated data, SUR, multivariate regression, etc.

Fixed effects estimation

The XTREG command with the FE option (i.e. fixed effects estimation) has recently been modified. See what’s new in Stata 10 (items 4, 5, and 7 in particular) and in Stata 11 (the fourth bullet point in particular).

Fixed Effects Estimation (xtreg command with fe option) by Stata FAQ - explains why there is a constant term in the estimation result table.

Differences among within, between, and overall R-squared obtained by the xtreg, fe command by Justin Smith (15 August 2006)

R squared in Fixed Effects Estimation by Stata FAQ - explains why the reported R squared differs between xtreg, fe and areg. See also this note by Indiana University Information Technology Services. Theoretical background can be found in Hayashi's Econometrics textbook (pages 333-4), for example. (This issue seems to be outdated, as the xtreg command was improved in Stata version 10 and higher.)

If you notice that the areg command and the xtreg command with the fe option produce different clustered standard errors, read this.
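To see the comparison for yourself, the sketch below runs both commands on the nlswork example data (downloaded by webuse, so an internet connection is assumed). The regressors are chosen arbitrarily. The coefficients should coincide, while the clustered standard errors typically differ slightly because the two commands apply different degrees-of-freedom adjustments.

```stata
* Example panel data shipped via Stata's web repository.
webuse nlswork, clear
xtset idcode

* Fixed effects via xtreg, clustering on the panel identifier.
xtreg ln_wage age tenure, fe vce(cluster idcode)

* The same model via areg, absorbing the individual dummies.
areg ln_wage age tenure, absorb(idcode) vce(cluster idcode)
```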

Prais-Winsten panel regression: use the XTPCSE command. Examples include Rohlfs et al. (2010).

Weighted least squares estimation

Weighted Least Squares when the variance of the error term is known by Stata Help

Choosing the Correct Weight Syntax by UNC Carolina Population Center - if you wonder what pweight, fweight, aweight, and iweight really mean.

Weighted Least Squares Regression by UCLA Academic Technology Service (See Deaton (1997) The Analysis of Household Surveys, pp.67-73, for the use of weighted least squares in the context of survey design.)
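The weight syntax itself is short. The sketch below only shows where the weight goes in a regression command; using the car's weight variable from the auto data as a weight is purely artificial and is done only so that the lines run.

```stata
sysuse auto, clear

* Analytic weights: appropriate when each observation is a group mean
* and the weight is proportional to the group size.
regress price mpg [aweight = weight]

* Probability (sampling) weights: the weight is the inverse of the
* probability of being sampled; robust standard errors are reported.
regress price mpg [pweight = weight]
```

The point estimates are the same in both runs; only the standard errors differ.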

probit, logit, and other nonlinear regressions

MARGINS: a command introduced in version 11 that reports the average value of the predicted dependent variable at each specified value of the regressors (if I understand correctly). Useful for interpreting estimated coefficients from nonlinear regressions, as explained by SSCC at University of Wisconsin-Madison.
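A short sketch of both uses after a probit, with the auto data and an arbitrary specification as placeholders.

```stata
sysuse auto, clear
probit foreign mpg weight

* Average marginal effects of every regressor on Pr(foreign = 1).
margins, dydx(*)

* Average predicted probability at selected values of mpg,
* leaving the other regressors at their observed values.
margins, at(mpg = (15 20 25 30))
```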

INTEFF: this is an ado package to correctly estimate the magnitude and standard errors of the effect of an interaction term in nonlinear models such as probit and logit. See Ai and Norton (2003) for details. This command, however, does not work if there are quite a few dummy variables among the regressors. It seems that the MARGINS and MARGINSPLOT commands supersede INTEFF.

Event study

How to conduct an event study estimation with Stata by Data and Statistical Services, Princeton University

Attrition bias

Lee (2009)’s treatment effects bounds. In the case of attrition bias, this method is now the industry standard. You can now easily do it in Stata with the leebounds command.

Standard errors

Bootstrapping: See Lecture 4 (pages 6-8) in Programming in Stata, RLAB Data Service, London School of Economics.
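For reference, a minimal sketch of bootstrapped standard errors for a regression using Stata's built-in bootstrap prefix (not necessarily the approach taken in the LSE notes). The number of replications and the seed are arbitrary.

```stata
sysuse auto, clear

* Prefix form: resample observations 200 times and re-run the regression.
bootstrap, reps(200) seed(12345): regress price mpg weight

* Equivalent option form available on many estimation commands.
regress price mpg weight, vce(bootstrap, reps(200) seed(12345))
```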

X_OLS: Timothy Conley's standard error correction for spatial correlation. This is the standard way of calculating standard errors in the literature when you use data where outcomes and regressors are spatially correlated.

Douglas Miller’s Stata code page contains a Stata do file to execute Cameron, Gelbach, and Miller (2008)’s Wild Bootstrap standard error clustering method, which is increasingly popular among applied microeconometric researchers when the number of clusters is small.

Matching estimation

CEM: Coarsened Exact Matching, by Iacus, King, and Porro (2008), for creating a control group whose observables are balanced against the treated group ex ante. Used by Azoulay, Zivin, and Wang (2010).

Matching Estimators ado file by Abadie, Drukker, Herr, and Imbens

Synth by Abadie, Diamond, and Hainmueller --- A method to estimate the treatment effect from observational data when only one unit is treated.

Pair-wise Mahalanobis matching with an optimal greedy algorithm: See page 209 of Bruhn and McKenzie (2009). This article’s replication data file (click “Download Data Set” on this webpage) contains Stata code for this matching method.

AFTER EACH REGRESSION IS RUN...

How to interpret output tables that appear after executing estimation commands such as summarize, regress, logistic, etc. by UCLA Academic Technology Service

reformat ado-file, by Sealed Envelope Ltd. - This ado-file is useful when you have tons of fixed effects (e.g. country dummies) and are interested in the coefficients on these dummies.

Stata Class 3, by Stas Kolenikov, Duke University - introduces commands used after estimation for plotting residuals etc.

From version 10, you can save estimation results to disk with the command estimates save. As a result, it is no longer necessary to install the ESTSAVE ado.
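A sketch of saving results to disk and reloading them later. The file name price_model and the regression are placeholders.

```stata
sysuse auto, clear
regress price mpg weight

* Write the current estimation results to price_model.ster.
estimates save "price_model", replace

* Later (even in a new session): load the results back into memory.
clear
estimates use "price_model"

* Typing the estimation command without arguments replays the stored results.
regress
```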

parmest ado-file allows you to create a Stata data file of coefficient estimates along with t-values and p-values. By default, Stata does not store t-values and p-values after regressions. This ado-file is useful if you need to use t-values and/or p-values after each regression is run.

REPORT ESTIMATION RESULTS

ESTOUT - A great ado-file package to create a table of regression results in the text file format, in the HTML format, or in the TeX format! It's more versatile than OUTREG2 (see below). It is slightly complicated, but it's worth paying the fixed cost of learning how to use it. To minimize the fixed cost, follow these steps:

To install the package, see here.

First, learn how to use ESTSTO by reading this.

Then, learn how to use ESTTAB by reading this.

Only for fancier things do you need to learn ESTOUT (the more flexible version of ESTTAB) and ESTADD (the more flexible version of ESTSTO's ADDSCALARS option).

With the ESTOUT package, you can easily create a summary statistics table!

The ESTOUT package also allows you to include "YES" or "NO" to indicate whether a certain set of fixed effects is controlled for (a standard practice in labor economics type research). See this document.
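The steps above can be sketched as follows, assuming the estout package is installed. The regressions, the stored names m1 and m2, the added macro repair_fe, and the output file name are all illustrative, and the "YES"/"NO" row is produced here with estadd local, which may differ from the method in the linked document.

```stata
* Requires the estout package (ssc install estout).
sysuse auto, clear
eststo clear

* Model 1: no repair-record dummies.
regress price mpg weight
estadd local repair_fe "NO"
eststo m1

* Model 2: with repair-record dummies.
regress price mpg weight i.rep78
estadd local repair_fe "YES"
eststo m2

* Show only the main coefficients, with a YES/NO row for the dummies.
esttab m1 m2, se r2 keep(mpg weight) scalars("repair_fe Repair-record FE")

* The same table written to a TeX file.
esttab m1 m2 using "results.tex", replace se r2 keep(mpg weight) ///
    scalars("repair_fe Repair-record FE")
```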

Generating Regression and Summary Statistics Tables in Stata: A Checklist and Code, by Matthew Groh (May, 2014) --- If you prefer creating regression tables in the Excel format.

TABOUT - Seems to be a very useful ado for automating the process of creating any kind of table formatted to appear in an academic paper. Example Stata do files mentioned in this tutorial can be downloaded at the author’s website.

OUTREG2.ado - An improved version of OUTREG.ado (see below). It's less versatile than ESTOUT, but it's more flexible in producing a TeX file. One problem is that, after fixed effects estimation (areg or xtreg, fe), the nocons option does not work.

How to use outreg.ado, by Kellogg Research Computing, Northwestern University - probably the most useful explanation of the outreg ado file, including the PDF file of the outreg help file. When you want to use the addstat option for reporting more than 10 statistics, outreg does not work properly. A solution can be found here (Statalist archives). (If you want to further convert the resulting Excel file into a LaTeX format, download EXCEL2LATEX here and extract the downloaded zip file into "C:\Documents and Settings\username\Application Data\Microsoft\AddIns" (where "username" is your own username). Then open Excel, click "Tools - Add-Ins...", and check the box for Excel2Latex. You'll see a new small icon in the toolbars. Select the table you want to convert and then click the icon. Now you can create a TeX file of your table.)

How to report multinomial logit regression results with OUTREG, by Statalist

GRAPHICS

Online Tutorial for Making Graphs by Stata Corp. - An excellent website in the sense that you can choose the visual image (rather than picking words like “bar graphs”, “scatter plots”, etc.) to learn how to make various types of graph.

How to make various types of graph (Follow links below the heading of "Graphics") by UCLA Academic Technology Service - Useful if you want to make twoway graphs.

BY option for GRAPH command by Stata Help - this is how to make graphs for each category (e.g. country by country).
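A one-line illustration with the auto data, where the foreign/domestic indicator plays the role of the category:

```stata
sysuse auto, clear

* One scatter plot panel per value of foreign.
twoway scatter price mpg, by(foreign)
```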

BINSCATTER - A Stata package written by Michael Stepner, which allows you to create a scatter plot from (literally) millions of observations by grouping observations into several intervals of the x variable and plotting the average value of the y variable for each group. (HT: David Seim)
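A minimal sketch, assuming the package has been installed with "ssc install binscatter". The tiny auto dataset and the choice of 10 bins are for illustration only; the command is most useful with far larger data.

```stata
* Requires the binscatter package (ssc install binscatter).
sysuse auto, clear

* Group mpg into 10 equal-sized bins and plot the mean price in each bin.
binscatter price mpg, nquantiles(10)
```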

Nonparametric regression curve in a scatter plot - search for "nonparametric".

Draw kernel density functions for each group in the same graph by UCLA Academic Technology Service
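One way to overlay the densities (not necessarily the one shown on the UCLA page) is to combine two kdensity plots inside twoway. The dataset and the grouping variable are placeholders.

```stata
sysuse auto, clear

* Overlay the kernel density of price for domestic and foreign cars.
twoway (kdensity price if foreign == 0) ///
       (kdensity price if foreign == 1), ///
       legend(order(1 "Domestic" 2 "Foreign")) xtitle("Price")
```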

Guide to creating PNG images with Stata by Friedrich Huebler

How to create animated graphics using Stata, by Chuck Huber.

How to create a map from Stata by Friedrich Huebler

Drawing social networks in Stata with Netplot by Rense Corten --- if you are analyzing social network data.

PROGRAMMING

Programming in Stata, RLAB Data Service, London School of Economics: these are lecture notes for a Stata course at the Department of Economics, LSE. Lectures 3 to 5 deal with how to make your own program with Stata (macro, looping, ado-file, etc.). Very useful.

How to display variable labels: See this Statalist message by Nick Cox on 27 May, 2010.
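One common way to do this (which may or may not be the one in the Statalist message) is to copy the label into a local macro with an extended macro function. The variable list below is a placeholder.

```stata
sysuse auto, clear

* Copy the variable label of mpg into a local macro and display it.
local lbl : variable label mpg
display "`lbl'"

* The same inside a loop over several variables.
foreach v of varlist price mpg weight {
    local lbl : variable label `v'
    display "`v': `lbl'"
}
```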

The CAPTURE command is useful when executing a do file, especially when you want to conduct different data processing steps depending on whether there is an error. The outcome is stored in the return code _rc, so the check can be written as “if _rc==0” in the Stata code (_rc equals 0 when no error occurred and is nonzero otherwise). See the paragraphs below the heading “If as a Way to Control Program Flow” in this webpage.
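A small sketch of the pattern: check whether a variable exists and branch on the result. The variable name income is hypothetical (it does not exist in the auto data, so the error branch runs).

```stata
sysuse auto, clear

* capture suppresses the error and records the return code in _rc.
capture confirm variable income

if _rc != 0 {
    * Nonzero return code: the variable was not found.
    display "income not found (return code " _rc "); creating it as missing"
    generate income = .
}
else {
    * Zero return code: the command succeeded.
    summarize income
}
```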

How do I run Stata in batch mode? (Stata FAQ): if you want to run a do file without launching Stata interactively in Unix

TROUBLESHOOTING

If you always type “set memory 900m” after launching Stata because you use a large dataset, read this.

If you run Stata on Windows and encounter an error message "op. sys. refuses to provide memory, r(909)", you may want to consider ditching Windows. Here's why.

If you encounter an error message "insufficient disk space, r(699)", see this Stata FAQ article.

If you encounter a warning message “Warning: variance matrix is nonsymmetric or highly singular”, see this post in Statalist by Jeff Pitblado of Stata Corp.

If you encounter an error message “could not rename c:\ado\plus\stata.trk to c:\ado\plus\backup.trk r(699);” when you try to install an ado file with the “ssc install” command, read pages 47-48 of Lembcke (2009) “Introduction to Stata”. Unfortunately, this method does not change the Stata setting permanently. Every time you use an ado file, you have to do this.

FROM STATA TO OTHER SOFTWARE

Export tables to Excel, written by Kevin Crow on The Stata Blog.

How to transform dta file into csv file, by UCLA Academic Technology Service. If the data contains many decimal places, make sure to use the format command before the outsheet command so that Stata does not round values unexpectedly. If you don’t need the top row containing variable names, use the noname option.
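A sketch of that export, with a made-up ratio variable and output file name. The display format set by format controls how many decimals are written to the file.

```stata
sysuse auto, clear

* A variable with many decimal places.
generate double ratio = price / weight

* Fix the display format first so the exported values keep six decimals.
format ratio %12.6f

* Write a comma-separated file without the header row of variable names.
outsheet make price ratio using "auto_export.csv", comma nonames replace
```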

Order command by Stata Help - if you want to change the order of variables in the table you create from the Stata dataset.

How to edit Stata graphs in Microsoft Word, by Stata FAQ

Stata tools for LaTeX, by UCLA Academic Technology Service - for those of you who write empirical papers with LaTeX.

TEXTBOOK EXAMPLES

Stata commands for examples in Wooldridge's graduate level textbook Econometric Analysis of Cross Section and Panel Data, by UCLA Academic Technology Service

Stata commands for examples in Wooldridge's undergrad level textbook Introductory Econometrics: A Modern Approach, by Boston College Academic Technology Support

Stata commands for Greene's textbook Econometric Analysis (4th ed.), by UCLA Academic Technology Service

Accessible readings behind Stata commands

IVREG2

Murray, Michael P. (2006) "Avoiding Invalid Instruments and Coping with Weak Instruments," Journal of Economic Perspectives, 20(4), p. 128.

CLUSTER option for REGRESS

Deaton (1997) The Analysis of Household Surveys, pp.74-77.

Bertrand et al. (2004) "How Much Should We Trust Differences-in-differences Estimates?," Quarterly Journal of Economics, vol.119, p.271.

KDENSITY

Deaton (1997) The Analysis of Household Surveys, pp.171-76.

The following websites may or may not be useful (I haven't checked them yet):

Tips for using Stata 10, by Survey Design and Analysis Services Pty Ltd

Roger Newson's Stata ado files

Useful Links by Kellogg Research Computing, Northwestern University

Stata materials by Stas Kolenikov, Duke University - includes very graphically well-presented Stata course notes.

Stata ado files by Sealed Envelope Ltd.

Stata Tutorial at University of Essex

Eszter Hargittai's Stata Goodies Page

Stata resources developed by Johannes Schmieder

