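Code and data for the cross-sectional (cohort) DID in an AER paper are now open for download! The latest research from four Chinese scholars!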

Econometrics Circle (计量经济圈), 2022-05-11


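We previously introduced this paper in the post "Four Chinese scholars (the 'F4' of Chinese academia) publish in the AER! Sent-down educated youth and rural education!". Today we share the paper's original data and programs; readers who need them can follow the program notes to reproduce the results.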

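The method used in this paper has been covered here several times, for example: 1. An introduction to DID with cross-sectional data: the paradigm for difference-in-differences policy evaluation on a cross-section; 2. A step-by-step guide to the code for cross-sectional DID; 3. Cross-sectional DID: fixed effects, placebo tests, permutation tests, and handling other external shocks.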

Yi Chen, Ziying Fan, Xiaomin Gu, and Li-An Zhou. "Arrival of Young Talent: The Send-Down Movement and Rural Education in China." American Economic Review, 2020.
This paper estimates the effects of the send-down movement during the Cultural Revolution, when about 16 million urban youth were mandated to resettle in the countryside, on rural education. Using a county-level dataset compiled from local gazetteers and population censuses, we show that greater exposure to the sent-down youths significantly increased rural children's educational achievement. This positive effect diminished after the urban youth left the countryside in the late 1970s but never disappeared. Rural children who interacted with the sent-down youths were also more likely to pursue more-skilled occupations, marry later, and have smaller families than those who did not.

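Data files: [image not reproduced]

Program files: [image not reproduced]

In the original post, a QR code (long-press to open) links to the usage notes for the data and programs.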

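Note: the code below is easier to read on a computer.

Below is the code in the program 1-Table_Census_1990, which produces the main results; the other programs can be downloaded at the end of the post.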

/******************************************************************************
This do-file carries out the analysis using the 1990 census.

Input data files:
    census_1990_clean.dta
    county_year_data.dta

Output files:
    Table 2.txt (Summary Statistics)
    Table 3.txt Columns (1)--(7) (The Effect of SDYs on the Educational Attainment of Rural Children)
    Table 4.txt (Heterogeneous Effect of SDYs)
    Table 6.txt (Addressing Various Confounding Factors)
    Table 8.txt Columns (1)--(7) (The Lasting Effect of SDYs on Outcomes other than Education)
    Figure 3.txt, census 1990
******************************************************************************/

**********************************************

*Preparation: *

*Compute speed of school construction program*

**********************************************

use "$path1B\county_year_data.dta", clear

foreach s in secondary primary {

sort countyid year

bysort countyid: egen Min_year = min(year) if inrange(year,1964,1966)&!missing(school_`s')

bysort countyid: egen Max_year = max(year) if inrange(year,1975,1977)&!missing(school_`s')

bysort countyid: egen min_year = mean(Min_year)

bysort countyid: egen max_year = mean(Max_year)

gen Min_school = school_`s' if year == min_year

gen Max_school = school_`s' if year == max_year

bysort countyid: egen min_school = mean(Min_school)

bysort countyid: egen max_school = mean(Max_school)

gen `s'_speed = (max_school-min_school)/(max_year-min_year)

drop Min_* Max_* min_* max_*

}

drop if secondary_speed ==. | primary_speed == .

keep countyid secondary_speed primary_speed

duplicates drop

save "$path1A\rural_school_expansion.dta", replace
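The preparation step above measures, for each county, how fast the number of schools grew between a start year (the first year with data in 1964-1966) and an end year (the last year with data in 1975-1977). The following is a minimal sketch of the same calculation on made-up data. The county IDs and school counts are hypothetical, and it uses `cond()` inside `egen` instead of the temporary `Min_*`/`Max_*` variables, so it is a compact restatement rather than the authors' code.

```stata
clear
input countyid year school_primary
1 1964 10
1 1965 12
1 1976 34
2 1965 20
2 1975 50
2 1977 56
end

* First year with data in the 1964-1966 window, last year with data in 1975-1977
bysort countyid: egen min_year = min(cond(inrange(year,1964,1966) & !missing(school_primary), year, .))
bysort countyid: egen max_year = max(cond(inrange(year,1975,1977) & !missing(school_primary), year, .))

* Number of schools in those two years, spread to every row of the county
bysort countyid: egen min_school = mean(cond(year == min_year, school_primary, .))
bysort countyid: egen max_school = mean(cond(year == max_year, school_primary, .))

* Average number of schools added per year
gen primary_speed = (max_school - min_school)/(max_year - min_year)
list countyid year primary_speed, sepby(countyid)
```

In this toy example county 1 goes from 10 schools in 1964 to 34 in 1976, and county 2 from 20 in 1965 to 56 in 1977.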

********************************************************************************

* *

* Step 1: Data Preparation and Summary Statistics *

* *

********************************************************************************

use "$path1B\census_1990_clean.dta", clear

*******************

*Control:1946-1955*

*Treat: 1956-1969*

*******************

gen treat = inrange(year_birth,1956,1969) if inrange(year_birth,1946,1969)
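The `gen ... = inrange(...) if inrange(...)` pattern yields three states: 1 for treated cohorts, 0 for control cohorts, and missing for everyone outside the 1946-1969 window, so that later regressions drop those observations automatically. A small sketch on hypothetical birth years:

```stata
clear
input year_birth
1940
1946
1955
1956
1969
1970
end

* 1 = born 1956-1969, 0 = born 1946-1955, missing otherwise
gen treat = inrange(year_birth,1956,1969) if inrange(year_birth,1946,1969)
list
```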

******************

* Define Globals *

******************

global var_abs_cohort "region1990 prov#year_birth c.primary_base#year_birth c.junior_base#year_birth"

global var_abs_cohort2 "region1990 prov#year_birth c.primary_base_older#year_birth c.junior_base_older#year_birth"
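Read together with the `reghdfe` calls in this do-file, the absorb list stored in `var_abs_cohort` appears to correspond to the specification below. This is a reading of the code, not a quotation from the paper. The subscripts are an assumed notation: i for the individual, g for the birth cohort, c for the county (`region1990`), and p for the province.

```latex
\begin{equation*}
Y_{i,g,c,p} = \beta_1 \bigl(\mathrm{SDY}_c \times \mathrm{Treat}_g\bigr)
  + \beta_2\,\mathrm{male}_i + \beta_3\,\mathrm{han}_i
  + \lambda_c + \mu_{p,g}
  + \theta_g\,\mathrm{PrimaryBase}_c + \pi_g\,\mathrm{JuniorBase}_c
  + \varepsilon_{i,g,c,p},
\qquad \mathrm{Treat}_g = \mathbf{1}[1956 \le g \le 1969]
\end{equation*}
```

Here `region1990` supplies the county fixed effect, `prov#year_birth` the province-by-cohort fixed effect, and `c.primary_base#year_birth` and `c.junior_base#year_birth` the cohort-specific slopes on the county's baseline graduation rates.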

************************************

* Generate County Characteristics *

************************************

*rural school expansion

merge m:1 countyid using "$path1A\rural_school_expansion.dta", nogenerate keep(1 3)

gen speed_primary_density = primary_speed/(pop1964/1000)

gen speed_secondary_density = secondary_speed/(pop1964/1000)

*intensity of the Great Famine

gen famine = inrange(year_birth,1959,1961) if rural == 1

gen nonfamine = inrange(year_birth,1955,1957) if rural == 1

bysort region1990: egen sum_famine = sum(famine)

bysort region1990: egen sum_nonfamine = sum(nonfamine)

gen ins_famine = 1-sum_famine/sum_nonfamine

drop famine nonfamine sum_famine sum_nonfamine
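The famine-intensity measure above compares the size of the cohorts born during the famine (1959-1961) with the size of the cohorts born just before it (1955-1957) in the same county: the smaller the famine cohorts, the closer `ins_famine` is to 1. A minimal sketch with two hypothetical regions, using `egen total()` (the documented name for the `sum()` function used above):

```stata
clear
input region1990 year_birth
1 1955
1 1956
1 1956
1 1957
1 1959
1 1960
2 1955
2 1956
2 1957
2 1959
2 1960
2 1961
end

gen famine    = inrange(year_birth,1959,1961)
gen nonfamine = inrange(year_birth,1955,1957)
bysort region1990: egen sum_famine    = total(famine)
bysort region1990: egen sum_nonfamine = total(nonfamine)

* Share of the pre-famine cohort size that is "missing" from the famine cohorts
gen ins_famine = 1 - sum_famine/sum_nonfamine
list region1990 sum_famine sum_nonfamine ins_famine, sepby(region1990)
```

Region 1 has 2 famine-cohort and 4 pre-famine observations, region 2 has 3 of each.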

*extract data for county-level information

preserve

generate cr_info = [victims_cr!=.]

generate grain_info = [grain_output!=.]

generate school_info = !missing(speed_primary_density,speed_secondary_density )

foreach var in yedu primary_graduate junior_graduate {

replace `var' = . if treat !=0 | rural != 1 // only keep the baseline

}

collapse (mean) countyid pop1964 sdy_density han_ethn primary_graduate junior_graduate victims_cr cr_info grain_info school_info ins_famine, by(region1990)

gen prov = floor(region1990/10000)

save "$path1A\census_1990_county_char.dta", replace

restore

*******************************************************************

*Table 2: Summary Statistics of the 1% Sample from the 1990 Census*

*******************************************************************

gen age = 1990 - year_birth

outsum yedu primary_graduate junior_graduate male han_ethn age if treat==0 & rural==1 using "$path4\Table2.txt", replace

outsum yedu primary_graduate junior_graduate male han_ethn age if treat==0 & rural==0 using "$path4\Table2.txt", append

outsum yedu primary_graduate junior_graduate male han_ethn age if treat==1 & rural==1 using "$path4\Table2.txt", append

outsum yedu primary_graduate junior_graduate male han_ethn age if treat==1 & rural==0 using "$path4\Table2.txt", append

********************************************************************************

* *

* Step 2: Main Results *

* *

********************************************************************************

*****************************************************************************

*Table 3: The Effect of SDYs on the Educational Attainment of Rural Children*

*Columns (1)--(7) *

*****************************************************************************

foreach var in yedu primary_graduate junior_graduate {

forvalues i = 1/2 {

if (`i'==1) reghdfe `var' c.sdy_density#c.treat male han_ethn if rural==1, absorb($var_abs_cohort) cluster(region1990)

if (`i'==2) reghdfe `var' c.sdy_density#c.treat male han_ethn if rural==0, absorb($var_abs_cohort) cluster(region1990)

summ `var' if e(sample)&treat==0

local mean = r(mean)

if (("`var'"=="yedu")&(`i'==1)) outreg2 using "$path4\Table3.txt", replace se nonotes nocons noaster nolabel text addstat(Mean,`mean') keep(c.sdy_density#c.treat male han_ethn) sortvar(c.sdy_density#c.treat male han_ethn)

if (("`var'"!="yedu")|(`i'!=1)) outreg2 using "$path4\Table3.txt", append se nonotes nocons noaster nolabel text addstat(Mean,`mean') keep(c.sdy_density#c.treat male han_ethn) sortvar(c.sdy_density#c.treat male han_ethn)

} // end forvalues i

} // end foreach var

keep if rural==1

drop rural // for the remaining analysis, we only use the rural sample

gen treat_placebo = inrange(year_birth,1951,1955) if inrange(year_birth,1946,1955)

reghdfe yedu c.sdy_density#c.treat_placebo male han_ethn, absorb($var_abs_cohort) cluster(region1990)

outreg2 using "$path4\Table3.txt", append se nonotes nocons noaster nolabel text keep(c.sdy_density#c.treat c.sdy_density#c.treat_placebo male han_ethn) sortvar(c.sdy_density#c.treat c.sdy_density#c.treat_placebo male han_ethn)

drop treat_placebo
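To see what the coefficient on `c.sdy_density#c.treat` picks up, the following sketch simulates noise-free data with a known effect and recovers it. Everything here is hypothetical (20 counties, one person per cohort, a true effect of 2), and it uses the built-in `areg` with cohort dummies as a stand-in for the user-written `reghdfe`, so it illustrates the cohort DID idea rather than the paper's full specification.

```stata
clear
set seed 12345
set obs 20                                  // 20 hypothetical counties
gen county = _n
gen double sdy = runiform()                 // made-up county-level SDY density
expand 24                                   // one person per birth cohort
bysort county: gen year_birth = 1945 + _n   // cohorts 1946-1969
gen treat = inrange(year_birth,1956,1969)

* Outcome = county effect + cohort trend + 2 * (SDY density x treated cohort), no noise
gen double yedu = 5 + 0.1*county + 0.05*(year_birth - 1946) + 2*sdy*treat

* County fixed effects absorbed, cohort fixed effects as dummies
areg yedu c.sdy#c.treat i.year_birth, absorb(county) vce(cluster county)
display _b[c.sdy#c.treat]
```

The main effects of `sdy` and `treat` are absorbed by the county and cohort fixed effects, so only the interaction is estimated, which is why the do-file never includes them separately.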

****************************************************************************************

*Inputs for Figure 3: Effect of SDYs on the Educational Attainment of Different Cohorts*

*Panel B: Census 1990 *

****************************************************************************************

compress

forvalues y = 1946/1969 {

gen I`y' = sdy_density*[year_birth==`y']

}

reghdfe yedu I1946-I1969 male han_ethn, absorb($var_abs_cohort2) cluster(region1990)

outreg2 using "$path4\Figure3.txt", replace sideway noparen se nonotes nocons noaster nolabel text keep(I1946-I1969) sortvar(I1946-I1969)

drop I1946-I1969

drop if inrange(year_birth,1941,1945) // these cohorts serve as the baseline in the above regression, and will not be used in the following analysis.
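The Figure 3 regression replaces the single treated-cohort interaction with one interaction per birth cohort, leaving the oldest cohorts out as the reference group. A minimal sketch of how the `I`y'` variables are built, on hypothetical data with 1945 as the omitted cohort and parentheses in place of the square brackets used above:

```stata
clear
input year_birth sdy_density
1945 0.2
1946 0.2
1947 0.2
1945 0.5
1946 0.5
1947 0.5
end

* One interaction per non-baseline cohort: SDY density if born in year y, else 0
forvalues y = 1946/1947 {
    gen I`y' = sdy_density*(year_birth==`y')
}
list, sepby(sdy_density)
```

Each coefficient on `I`y'` is then read as the effect of SDY density on cohort y relative to the omitted cohorts.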

***************************************

*Table 4: Heterogeneous Effect of SDYs*

***************************************

forvalues i = 1/8 {

if (`i'==1) reghdfe yedu c.sdy_density#c.treat male han_ethn if male==1 , absorb($var_abs_cohort) cluster(region1990)

if (`i'==2) reghdfe yedu c.sdy_density#c.treat male han_ethn if male==0 , absorb($var_abs_cohort) cluster(region1990)

if (`i'==3) reghdfe yedu c.sdy_density#c.treat male han_ethn if edu_base<5.5 , absorb($var_abs_cohort) cluster(region1990)

if (`i'==4) reghdfe yedu c.sdy_density#c.treat male han_ethn if (edu_base>=5.5&edu_base<.), absorb($var_abs_cohort) cluster(region1990)

if (`i'==5) reghdfe primary_graduate c.sdy_density#c.treat male han_ethn if edu_base<5.5 , absorb($var_abs_cohort) cluster(region1990)

if (`i'==6) reghdfe junior_graduate c.sdy_density#c.treat male han_ethn if edu_base<5.5 , absorb($var_abs_cohort) cluster(region1990)

if (`i'==7) reghdfe primary_graduate c.sdy_density#c.treat male han_ethn if (edu_base>=5.5&edu_base<.), absorb($var_abs_cohort) cluster(region1990)

if (`i'==8) reghdfe junior_graduate c.sdy_density#c.treat male han_ethn if (edu_base>=5.5&edu_base<.), absorb($var_abs_cohort) cluster(region1990)

summ yedu if e(sample)&treat==0

local mean1 = r(mean)

summ primary_graduate if e(sample)&treat==0

local mean2 = r(mean)

summ junior_graduate if e(sample)&treat==0

local mean3 = r(mean)

if (`i'==1) outreg2 using "$path4\Table4.txt", replace se nonotes nocons noaster nolabel text addstat(Mean1,`mean1',Mean2,`mean2',Mean3,`mean3') keep(c.sdy_density#c.treat)

if (`i'!=1) outreg2 using "$path4\Table4.txt", append se nonotes nocons noaster nolabel text addstat(Mean1,`mean1',Mean2,`mean2',Mean3,`mean3') keep(c.sdy_density#c.treat)

}

********************************************************************************

* *

* Step 3: Contemporaneous Events and Other Outcome Variables *

* *

********************************************************************************

*************************************************

*Table 6: Addressing Various Confounding Factors*

*************************************************

*grain productivity

replace grain_output = grain_output/pop1964

*Cultural Revolution

replace victims_cr = victims_cr/pop1964

gen treat_cr1 = inrange(year_birth,1954,1961)

gen treat_cr2 = inrange(year_birth,1962,1968)

*great famine

gen famine_cohort1 = inrange(year_birth,1955,1958)

gen famine_cohort2 = inrange(year_birth,1959,1961)

*prepare for the interaction terms between school expansion program and SDY

reghdfe yedu c.sdy_density#c.treat c.speed_primary_density#c.treat c.speed_secondary_density#c.treat male han_ethn, absorb($var_abs_cohort) cluster(region1990)

summ speed_secondary_density if e(sample)==1

gen DV_secondary_density = speed_secondary_density - r(mean)

summ speed_primary_density if e(sample)==1

gen DV_primary_density = speed_primary_density - r(mean)

summ sdy_density if e(sample)==1

gen DV_sdy_density = sdy_density - r(mean)

local newvar1 "c.grain_output#c.treat"

local newvar2 "c.speed_primary_density#c.treat c.speed_secondary_density#c.treat"

local newvar3 "c.speed_primary_density#c.treat c.speed_secondary_density#c.treat c.DV_sdy_density#c.treat#c.DV_primary_density c.DV_sdy_density#c.treat#c.DV_secondary_density"

local newvar4 "c.victims_cr#c.treat_cr1 c.victims_cr#c.treat_cr2"

local newvar5 "c.ins_famine#c.famine_cohort1 c.ins_famine#c.famine_cohort2"

local newvar6 "`newvar1' `newvar2' `newvar4' `newvar5'"

local newvar_r c.grain_output#c.treat c.speed_primary_density#c.treat c.speed_secondary_density#c.treat c.DV_sdy_density#c.treat#c.DV_primary_density c.DV_sdy_density#c.treat#c.DV_secondary_density ///
    c.victims_cr#c.treat_cr1 c.victims_cr#c.treat_cr2 c.ins_famine#c.famine_cohort1 c.ins_famine#c.famine_cohort2

capture gen sample = .

forvalues i = 1/6 {

if (`i'==1) local comm "replace"

if (`i'!=1) local comm "append"

reghdfe yedu c.sdy_density#c.treat `newvar`i'' male han_ethn, absorb($var_abs_cohort) cluster(region1990)

outreg2 using "$path4\Table6_A.txt", `comm' se nonotes nocons noaster nolabel text keep(c.sdy_density#c.treat `newvar_r') sortvar(c.sdy_density#c.treat `newvar_r')

replace sample = e(sample)

reghdfe yedu c.sdy_density#c.treat male han_ethn if sample == 1, absorb($var_abs_cohort) cluster(region1990)

outreg2 using "$path4\Table6_B.txt", `comm' se nonotes nocons noaster nolabel text keep(c.sdy_density#c.treat)

}
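The `DV_` variables in the Table 6 code are demeaned versions of SDY density and school-expansion speed. A short derivation shows why the triple interaction is built from demeaned variables. The notation is assumed for illustration: S is SDY density, P is one school-expansion measure, T is the treated-cohort dummy, and only one P is shown although the code includes both the primary and secondary measures.

```latex
\begin{align*}
\mathrm{yedu} &= \beta_1\, S\,T + \gamma\, P\,T
  + \delta\,(S-\bar S)\,(P-\bar P)\,T + \dots \\
\left.\frac{\partial\,\mathrm{yedu}}{\partial S}\right|_{T=1}
  &= \beta_1 + \delta\,(P-\bar P)
\end{align*}
```

At the sample mean of the school-expansion measure the second term vanishes, so the coefficient on `c.sdy_density#c.treat` keeps its interpretation as the effect of SDY density for a county with average school expansion.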

**********************************************************************

*Table 8: The Lasting Effect of SDYs on Outcomes other than Education*

*Columns (1)--(7) *

**********************************************************************

gen senior_high = [yedu > 9] if yedu>=9 & yedu<.

/*According to our definition of yedu, junior high graduates receive 9 years of education.

Going beyond 9 years of education is equivalent to going beyond junior high education.*/

gen occ_highskill = inlist(occisco,2,3) if !inlist(occisco,1,99)

forvalues i = 1/7 {

if (`i'==1) reghdfe senior_high c.sdy_density#c.treat male han_ethn, absorb($var_abs_cohort) cluster(region1990)

if (`i'==2) reghdfe laborforce c.sdy_density#c.treat male han_ethn, absorb($var_abs_cohort) cluster(region1990)

if (`i'==3) reghdfe laborforce c.sdy_density#c.treat yedu male han_ethn, absorb($var_abs_cohort) cluster(region1990)

if (`i'==4) reghdfe occ_highskill c.sdy_density#c.treat male han_ethn, absorb($var_abs_cohort) cluster(region1990)

if (`i'==5) reghdfe occ_highskill c.sdy_density#c.treat yedu male han_ethn, absorb($var_abs_cohort) cluster(region1990)

if (`i'==6) reghdfe teacher c.sdy_density#c.treat male han_ethn, absorb($var_abs_cohort) cluster(region1990)

if (`i'==7) reghdfe teacher c.sdy_density#c.treat yedu male han_ethn, absorb($var_abs_cohort) cluster(region1990)

summ `e(depvar)' if e(sample)&treat==0

local mean = r(mean)

if (`i'==1) outreg2 using "$path4\Table8.txt", replace se nonotes nocons noaster nolabel text addstat(Mean,`mean') keep(c.sdy_density#c.treat yedu) sortvar(c.sdy_density#c.treat yedu)

if (`i'!=1) outreg2 using "$path4\Table8.txt", append se nonotes nocons noaster nolabel text addstat(Mean,`mean') keep(c.sdy_density#c.treat yedu) sortvar(c.sdy_density#c.treat yedu)

}

Code for the other tables in the paper:

/******************************************************************************
Output files:
    Table 3.txt Column (8) (The Effect of SDYs on the Educational Attainment of Rural Children)
    Table 5.txt (Effects of SDYs on the Supply of Local Teachers and Educational Fiscal Expenses, 1955--1977)
    Table 7.txt (The Effect of SDYs on Local People's Locus of Control)
    Table 8.txt Columns (8)--(10) (The Lasting Effect of SDYs on Outcomes other than Education)
    Figure 3.txt, census 1982/2010
******************************************************************************/

********************************************************************************

* *

* Step 1: Analysis using the 1982 Census *

* *

********************************************************************************

global var_abs_cohort2 "region1982 prov#year_birth c.primary_base_older#year_birth c.junior_base_older#year_birth"

use "$path1B\census_1982_clean.dta", clear

****************************************************************************************

*Inputs for Figure 3: Effect of SDYs on the Educational Attainment of Different Cohorts*

*Panel A: Census 1982 *

****************************************************************************************

forvalues y = 1946/1962 {

gen I`y' = sdy_density*[year_birth==`y']

}

reghdfe yedu I1946-I1962 male han_ethn, absorb($var_abs_cohort2) cluster(region1982)

outreg2 using "$path4\Figure3.txt", append sideway noparen se nonotes nocons noaster nolabel text keep(I1946-I1962) sortvar(I1946-I1962)

********************************************************************************

* *

* Step 2: Analysis using the 2000 Census *

* *

********************************************************************************

global var_abs_cohort "region2000 prov#year_birth c.primary_base#year_birth c.junior_base#year_birth"

global var_abs_cohort2 "region2000 prov#year_birth c.primary_base_older#year_birth c.junior_base_older#year_birth"

use "$path1B\census_2000_clean.dta", clear

*****************************************************************************

*Table 3: The Effect of SDYs on the Educational Attainment of Rural Children*

*Columns (8) *

*****************************************************************************

gen treat_placebo = inrange(year_birth,1975,1979) if inrange(year_birth,1970,1979)

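reghdfe yedu c.sdy_density#c.treat_placebo male han_ethn, absorb($var_abs_cohort) cluster(region2000)

* Append the placebo estimate to Table 3 as column (8); the export options are assumed to mirror columns (1)--(7).
outreg2 using "$path4\Table3.txt", append se nonotes nocons noaster nolabel text keep(c.sdy_density#c.treat_placebo male han_ethn) sortvar(c.sdy_density#c.treat_placebo male han_ethn)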

drop treat_placebo

****************************************************************************************

*Inputs for Figure 3: Effect of SDYs on the Educational Attainment of Different Cohorts*

*Panel C: Census 2000 *

****************************************************************************************

forvalues y = 1946/1979 {

gen I`y' = sdy_density*[year_birth==`y']

}

reghdfe yedu I1946-I1979 male han_ethn, absorb($var_abs_cohort2) cluster(region2000)

outreg2 using "$path4\Figure3.txt", append sideway noparen se nonotes nocons noaster nolabel text keep(I1946-I1979) sortvar(I1946-I1979)

drop I1946-I1979

**********************************************************************

*Table 8: The Lasting Effect of SDYs on Outcomes other than Education*

*Columns (8)--(9) *

**********************************************************************

gen treat = inrange(year_birth,1956,1969) if inrange(year_birth,1946,1969)

forvalues i = 1/2 {

if (`i'==1) reghdfe age_marry1st c.sdy_density#c.treat male han_ethn, absorb($var_abs_cohort) cluster(region2000)

if (`i'==2) reghdfe n_child c.sdy_density#c.treat male han_ethn, absorb($var_abs_cohort) cluster(region2000)

summ `e(depvar)' if e(sample)&treat==0

local mean = r(mean)

outreg2 using "$path4\Table8.txt", append se nonotes nocons noaster nolabel text addstat(Mean,`mean') keep(c.sdy_density#c.treat)

}

********************************************************************************

* *

* Step 3: Analysis using the 2010 Census *

* *

********************************************************************************

global var_abs_cohort "region2010 prov#year_birth c.primary_base#year_birth c.junior_base#year_birth"

use "$path1B\census_2010_clean.dta", clear

rename treat_p treat

**********************************************************************

*Table 8: The Lasting Effect of SDYs on Outcomes other than Education*

*Columns (10) *

**********************************************************************

reghdfe yedu c.sdy_density#c.treat male han_ethn, absorb($var_abs_cohort) cluster(region2010)

summ `e(depvar)' if e(sample)&treat==0

local mean = r(mean)

outreg2 using "$path4\Table8.txt", append se nonotes nocons noaster nolabel text addstat(Mean,`mean') keep(c.sdy_density#c.treat) sortvar(c.sdy_density#c.treat)

********************************************************************************

* *

* Step 4: Analysis using the 2010 CFPS *

* *

********************************************************************************

global var_abs_cohort "region2010_h prov#year_birth c.primary_base#year_birth c.junior_base#year_birth"

use "$path1B\CFPS_2010_clean.dta", clear

gen treat = inrange(year_birth,1956,1969) if inrange(year_birth,1946,1969)

eststo clear

foreach var of varlist LOC LOC_education LOC_talent LOC_effort LOC_hard_work LOC_intellect LOC_F_SES LOC_F_wealth LOC_F_connection LOC_luck LOC_connection {

eststo: reghdfe `var' c.sdy_density#c.treat male han_ethn, vce(cluster region2010_h) absorb($var_abs_cohort)

}

outreg2 [*] using "$path4\Table7_A.txt", se nonotes nocons noaster nolabel bdec(3) text replace keep(c.sdy_density#c.treat) sortvar(c.sdy_density#c.treat)

eststo clear

foreach var of varlist LOC LOC_education LOC_talent LOC_effort LOC_hard_work LOC_intellect LOC_F_SES LOC_F_wealth LOC_F_connection LOC_luck LOC_connection {

eststo: reghdfe `var' c.sdy_density#c.treat male han_ethn yedu, vce(cluster region2010_h) absorb($var_abs_cohort)

}

outreg2 [*] using "$path4\Table7_B.txt", se nonotes nocons noaster nolabel bdec(3) text replace keep(c.sdy_density#c.treat yedu) sortvar(c.sdy_density#c.treat yedu)

********************************************************************************

* *

* Step 5: Analysis using our county-by-year data *

* *

********************************************************************************

use "$path1B\county_year_data.dta", clear

keep if inrange(year,1955,1977)

drop if sdy_density == .

gen postSDY = [year >= 1968] if inrange(year,1955,1977)

foreach i in pri sec {

foreach j in total state nonst {

gen ratio_`i'_`j' = tch_`i'_`j'/pop1964

} // end foreach j

} // end foreach i

gen fiscal_edu_pc = log(10000*fiscal_edu/pop1964)

eststo clear

forvalues i = 1/7 {

if (`i'==1) reghdfe ratio_pri_total c.sdy_density#c.postSDY, cluster(countyid) absorb(countyid year#prov)

if (`i'==2) reghdfe ratio_pri_state c.sdy_density#c.postSDY, cluster(countyid) absorb(countyid year#prov)

if (`i'==3) reghdfe ratio_pri_nonst c.sdy_density#c.postSDY, cluster(countyid) absorb(countyid year#prov)

if (`i'==4) reghdfe ratio_sec_total c.sdy_density#c.postSDY, cluster(countyid) absorb(countyid year#prov)

if (`i'==5) reghdfe ratio_sec_state c.sdy_density#c.postSDY, cluster(countyid) absorb(countyid year#prov)

if (`i'==6) reghdfe ratio_sec_nonst c.sdy_density#c.postSDY, cluster(countyid) absorb(countyid year#prov)

if (`i'==7) reghdfe fiscal_edu_pc c.sdy_density#c.postSDY, cluster(countyid) absorb(countyid year#prov)

unique countyid if e(sample)

local count = r(unique)

if (`i'==1) outreg2 using "$path4\Table5.txt", se nonotes nocons noaster nolabel bdec(3) text replace addstat(Ncounty,`count') keep(c.sdy_density#c.postSDY)

if (`i'!=1) outreg2 using "$path4\Table5.txt", se nonotes nocons noaster nolabel bdec(3) text append addstat(Ncounty,`count') keep(c.sdy_density#c.postSDY)

}
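The Table 5 regressions are a county-by-year difference-in-differences. Reading off the `reghdfe` calls, with `absorb(countyid year#prov)` and `postSDY` switching on in 1968, the specification appears to be the following, where c indexes counties, t years, and p(c) the province of county c (notation assumed for illustration):

```latex
\begin{equation*}
y_{ct} = \beta\,\bigl(\mathrm{SDY}_c \times \mathbf{1}[t \ge 1968]\bigr)
  + \lambda_c + \mu_{p(c),t} + \varepsilon_{ct}
\end{equation*}
```

The outcome is a teacher-to-population ratio or log per capita educational fiscal expense, the county fixed effect absorbs the level of SDY density, and the province-by-year fixed effect absorbs the post-1968 indicator.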

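The following code produces the figures in the paper:

/******************************************************************************
Output files:
    Figure1.pdf (Number of SDYs by Resettlement, 1962--1979)
    Figure3.pdf (Effect of SDYs on the Educational Attainment of Different Cohorts)
******************************************************************************/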

**************************************************************************

*Figure 1: Number of SDYs by Resettlement, 1962--1979 (Source: Gu (2009))*

**************************************************************************

clear

* Counts are in units of 10,000 persons; they are rescaled to millions below.
input str9 year total rural_village collective_farm state_farm
1962-1966 129.28  87.06  0     42.22
1967-1968 199.68 165.96  0     33.72
1969      267.38 220.44  0     46.94
1970      106.4   74.99  0     31.41
1971       74.83  50.21  0     24.62
1972       67.39  50.26  0     17.13
1973       89.61  80.64  0      8.97
1974      172.48 119.19 34.63  18.66
1975      236.86 163.45 49.68  23.73
1976      188.03 122.86 41.51  23.66
1977      171.68 113.79 41.9   15.99
1978       48.09  26.04 18.92   3.13
1979       24.77   7.32 16.44   1.01
end

foreach var in total rural_village collective_farm state_farm {

replace `var'= `var'/100

}

gen v_temp = rural_village + collective_farm

encode year, generate(period)

twoway bar rural_village period, barw(0.6) base(0) color(gs2) ///
    || rbar v_temp rural_village period, barw(0.6) color(gs12) ///
    || rbar total v_temp period, barw(0.6) color(gs7) ///
    ||, graphregion(fcolor(gs16) lcolor(gs16)) plotregion(lcolor(gs16) margin(zero)) ///
    ylabel(0(0.5)3, angle(0) format(%12.1f)) ytitle("Number of SDYs (Million)", margin(medium)) ///
    xlabel(1(1)13, noticks valuelabel angle(90)) xtitle("Year") ///
    legend(label(1 "Rural Villages") label(2 "Collective Farms") label(3 "State Farms") ring(0) pos(2) colgap(*0.5) )

graph export "$path4\Figure1.pdf",replace
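Figure 1 is a stacked bar chart built by hand: `bar` draws rural villages from 0, the first `rbar` draws collective farms from `rural_village` up to `v_temp`, and the second `rbar` draws state farms from `v_temp` up to `total`. This only stacks correctly if the three categories add up to the total. A quick check on three rows copied from the input block above (`top` is a helper variable introduced here for illustration):

```stata
clear
input str9 year total rural_village collective_farm state_farm
1973 89.61 80.64 0 8.97
1974 172.48 119.19 34.63 18.66
1975 236.86 163.45 49.68 23.73
end

gen v_temp = rural_village + collective_farm   // top of the collective-farm segment
gen top    = v_temp + state_farm               // top of the state-farm segment
list year rural_village v_temp top total
```

The tolerance in any such comparison should allow for rounding, since the variables are stored as floats.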

*****************************************************************************

*Figure 3: Effect of SDYs on the Educational Attainment of Different Cohorts*

*****************************************************************************

insheet using "$path4\Figure3.txt", clear

keep if inrange(_n,5,38)

gen year = substr(v1,2,4)

rename (v2 v3 v4 v5 v6 v7)(coef1990 se1990 coef1982 se1982 coef2000 se2000)

destring, force replace

keep year coef* se*

reshape long coef se, i(year) j(data)

drop if coef == .

gen lb = coef - 1.96*se

gen ub = coef + 1.96*se

gen y_overlap = min(max(year-1955,0),max(1970-year,0),6)

sort data year
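The `y_overlap` series plotted on the second axis is labeled "Overlapped Years in Primary Schools" in the legend. As the formula is written, it rises by one per birth year after 1955, is capped at 6, and falls back to zero by the 1970 cohort. A small sketch evaluating the same expression for a few hypothetical birth years:

```stata
clear
input year
1950
1958
1962
1968
1972
end

* Same expression as in the figure code: zero outside 1955-1970, capped at 6
gen y_overlap = min(max(year-1955,0), max(1970-year,0), 6)
list
```

The 1950 and 1972 cohorts have no overlap, while the 1962 cohort reaches the cap of 6.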

twoway line lb year if data==1982, sort lpattern(dash) lcolor(gs8) yaxis(1) ///
    || line ub year if data==1982, sort lpattern(dash) lcolor(gs8) ///
    || line coef year if data==1982, lwidth(thick) lcolor(black) yaxis(1) ///
    || line y_overlap year if data==1982, sort lpattern(dash_dot) lwidth(thick) lcolor(gs8) yaxis(2) ///
    ||, graphregion(fcolor(gs16) lcolor(gs16)) plotregion(lcolor(gs16) margin(zero)) ///
    ylabel(-4(2)8, labsize(small) angle(0) format(%12.0f) axis(1)) ytitle("Coefficients", size(small) axis(1)) ///
    ylabel(0(2)6, labsize(small) angle(0) format(%12.0f) axis(2)) ytick(-6 0(1)6 12,axis(2)) ytitle("Years of Overlap", size(small) axis(2)) ///
    xlabel(1945(5)1980, labsize(small)) xtick(1945(5)1980) xtitle("Birth Cohort", size(small)) ///
    xline(1955 1970, lpattern(solid) lwidth(thin) lcolor(black)) ///
    title("Panel A - Census 1982", size(small) margin(small)) ///
    yline(0, lpattern(solid) lwidth(thin) lcolor(black)) legend(off) fxsize(70) fysize(60)

graph save a,replace

twoway line lb year if data==1990, lpattern(dash) lcolor(gs8) yaxis(1) ///
    || line ub year if data==1990, lpattern(dash) lcolor(gs8) ///
    || line coef year if data==1990, lwidth(thick) lcolor(black) yaxis(1) ///
    || line y_overlap year if data==1990, lpattern(dash_dot) lwidth(thick) lcolor(gs8) yaxis(2) ///
    ||, graphregion(fcolor(gs16) lcolor(gs16)) plotregion(lcolor(gs16) margin(zero)) ///
    ylabel(-4(2)8, labsize(small) angle(0) format(%12.0f) axis(1)) ytitle("Coefficients", size(small) axis(1)) ///
    ylabel(0(2)6, labsize(small) angle(0) format(%12.0f) axis(2)) ytick(-6 0(1)6 12,axis(2)) ytitle("Years of Overlap", size(small) axis(2)) ///
    xlabel(1945(5)1980, labsize(small)) xtick(1945(5)1980) xtitle("Birth Cohort", size(small)) ///
    xline(1955 1970, lpattern(solid) lwidth(thin) lcolor(black)) ///
    title("Panel B - Census 1990", size(small) margin(small)) ///
    yline(0, lpattern(solid) lwidth(thin) lcolor(black)) legend(off) fxsize(70) fysize(60)

graph save b,replace

twoway line lb year if data==2000, lpattern(dash) lcolor(gs8) yaxis(1) ///
    || line ub year if data==2000, lpattern(dash) lcolor(gs8) ///
    || line coef year if data==2000, lwidth(thick) lcolor(black) yaxis(1) ///
    || line y_overlap year if data==2000, lpattern(dash_dot) lwidth(thick) lcolor(gs8) yaxis(2) ///
    ||, graphregion(fcolor(gs16) lcolor(gs16)) plotregion(lcolor(gs16) margin(zero)) ///
    ylabel(-3(1)6, labsize(small) angle(0) format(%12.0f) axis(1)) ytitle("Coefficients", size(small) axis(1)) ///
    ylabel(0(2)6, labsize(small) angle(0) format(%12.0f) axis(2)) ytick(-6 0(1)6 12,axis(2)) ytitle("Years of Overlap", size(small) axis(2)) ///
    xlabel(1945(5)1980, labsize(small)) xtick(1945(5)1980) xtitle("Birth Cohort", size(small)) ///
    legend(order(3 1 4) label(3 "Coefficient") label(1 "95% CI") label(4 "Overlapped Years in" "Primary Schools") col(2) size(small) margin(tiny)) ///
    xline(1955 1970, lpattern(solid) lwidth(thin) lcolor(black)) ///
    title("Panel C - Census 2000", size(small) margin(small)) ///
    yline(0, lpattern(solid) lwidth(thin) lcolor(black)) fxsize(65) fysize(80)

graph save c,replace

twoway || connected coef year if data==1982, lwidth(medthick) msymbol(triangle) color(black) ///
    || line coef year if data==1990, lwidth(medthick) color(gs6) ///
    || connected coef year if data==2000, lwidth(medthick) msymbol(square) color(gs12) ///
    ||, graphregion(fcolor(gs16) lcolor(gs16)) plotregion(lcolor(gs16) margin(zero)) ///
    ylabel(-2(1)5, labsize(small) angle(0) format(%12.0f)) ytitle("Coefficients", size(small)) ///
    xlabel(1945(5)1980, labsize(small)) xtick(1945(5)1980) xtitle("Birth Cohort", size(small)) ///
    legend(label(1 "Census 1982") label(2 "Census 1990") label(3 "Census 2000") col(2) size(small)) ///
    xline(1955 1970, lpattern(solid) lwidth(thin) lcolor(black)) ///
    title("Panel D - Three Censuses in One Graph", size(small) margin(small)) ///
    yline(0, lpattern(solid) lwidth(thin) lcolor(black)) fxsize(70) fysize(80)

graph save d,replace

graph combine a.gph b.gph c.gph d.gph, graphregion(fcolor(gs16) lcolor(gs16))

graph export "$path4\Figure3.pdf",replace

erase a.gph

erase b.gph

erase c.gph

erase d.gph

erase "$path4\Figure3.txt"

Code for the appendix tables:

********************************************************************************

* *

* Tables in Appendix A *

* *

********************************************************************************

use "$path1B\census_1990_clean.dta", clear

global var_abs_cohort "region1990 prov#year_birth c.primary_base#year_birth c.junior_base#year_birth"

keep if inrange(year_birth,1946,1969)

gen treat = inrange(year_birth,1956,1969) if inrange(year_birth,1946,1969)

***********************************************************

* generate base education level for both rural and urban *

***********************************************************

gen edu_temp_urban = yedu if treat == 0 & rural == 0

gen prefec = floor(region1990/100)

bysort region1990: egen edu_base_urban1 = mean(edu_temp_urban)

bysort prefec : egen edu_base_urban2 = mean(edu_temp_urban)

bysort prov : egen edu_base_urban3 = mean(edu_temp_urban)

drop edu_temp_urban

**************************************************

* Table A1: Knowledge Gap and the Effect of SDYs *

**************************************************

forvalues i = 1/3 {

gen edu_base_diff`i' = edu_base_urban`i' - edu_base

summ edu_base_diff`i' if !missing(yedu,sdy_density,edu_base_diff`i',treat) & rural==1

gen DV_edu_base_diff`i' = edu_base_diff`i' - r(mean)

summ sdy_density if !missing(yedu,sdy_density,edu_base_diff`i',treat) & rural==1

gen DV_sdy_density = sdy_density - r(mean)

reghdfe yedu c.sdy_density#c.treat c.treat#c.edu_base_diff`i' c.DV_sdy_density#c.treat#c.DV_edu_base_diff`i' male han_ethn if rural==1, absorb($var_abs_cohort) cluster(region1990)

if (`i'==1) outreg2 using "$path4\TableA1.txt", replace se nonotes nocons noaster nolabel text keep(c.sdy_density#c.treat c.DV_sdy_density#c.treat#c.DV_edu_base_diff`i')

if (`i'!=1) outreg2 using "$path4\TableA1.txt", append se nonotes nocons noaster nolabel text keep(c.sdy_density#c.treat c.DV_sdy_density#c.treat#c.DV_edu_base_diff`i')

drop DV_edu_base_diff`i' DV_sdy_density

}

drop edu_base_diff* edu_base_urban*

*******************************************************

* Table A3: The Effect of SDYs on Occupational Choice *

*******************************************************

forvalues i = 1/9 {

gen O`i' = [occisco==`i'] if occisco!=99

}

forvalues i = 1/9 {

reghdfe O`i' c.sdy_density#c.treat male han_ethn if rural==1, absorb($var_abs_cohort) cluster(region1990)

summ `e(depvar)' if e(sample)&treat==0

local mean = r(mean)

if (`i'==1) outreg2 using "$path4\TableA3.txt", replace se nonotes nocons noaster nolabel text addstat(Mean,`mean') keep(c.sdy_density#c.treat)

if (`i'!=1) outreg2 using "$path4\TableA3.txt", append se nonotes nocons noaster nolabel text addstat(Mean,`mean') keep(c.sdy_density#c.treat)

}

drop O1-O9 occisco

********************************************************************************

* *

* Tables in Appendix B *

* *

********************************************************************************

*********************************************************************

* Prepare the information availability of county-by-year level data *

*********************************************************************

use "$path1B\county_year_data.dta", clear

keep if inrange(year,1955,1977)

keep if sdy_density != .

bysort countyid: egen count1 = count(tch_sec_total)

bysort countyid: egen count2 = count(tch_pri_total)

bysort countyid: egen count3 = count(fiscal_edu)

gen share_nonmissing_teacher = (count1 + count2)/46

gen share_nonmissing_fiscal = (count3)/23

keep countyid share_nonmissing_teacher share_nonmissing_fiscal

duplicates drop

save "$path1A\teacher_fiscal_info.dta", replace

*****************************************

* Table B1: Count of Number of Counties *

*****************************************

use "$path1B\county_data.dta", clear

drop if region1990 == .

merge 1:1 countyid using "$path1A\teacher_fiscal_info.dta", nogenerate keep(1 3)

merge m:1 countyid using "$path1A\rural_school_expansion.dta", nogenerate keep(1 3)

replace share_nonmissing_teacher = 0 if share_nonmissing_teacher == .

replace share_nonmissing_fiscal = 0 if share_nonmissing_fiscal == .

unique region1990

scalar r1 = r(unique) // number in Panel A, row 1

gen prov = floor(region1990/10000)

unique region1990 if !inlist(prov,11,12,31)

scalar r2 = r(unique) // number in Panel A, row 2

unique region1990 if !inlist(prov,11,12,31) & district!=1

scalar r3 = r(unique) // number in Panel A, row 3

unique region1990 if !inlist(prov,11,12,31) & district!=1 & sdy!=.

scalar r4 = r(unique) // number in Panel A, row 4

unique region1990 if !inlist(prov,11,12,31) & district!=1 & sdy!=. & pop1964!=.

scalar r5 = r(unique) // number in Panel A, row 5

*Panel B is conditional on "core counties"

keep if !inlist(prov,11,12,31) & district!=1 & sdy!=. & pop1964!=.

scalar r6 = .

unique region1990 if grain_output !=.

scalar r9 = r(unique) // number in Panel B, row 3

unique region1990 if !missing(secondary_speed,primary_speed)

scalar r12 = r(unique) // number in Panel B, row 6

unique region1990 if !missing(victims_cr)

scalar r13 = r(unique) // number in Panel B, row 7

*The following variables do not need to appear in the 1990 census.

use "$path1B\county_data.dta", clear

merge 1:1 countyid using "$path1A\teacher_fiscal_info.dta", nogenerate keep(1 3)

merge m:1 countyid using "$path1A\rural_school_expansion.dta", nogenerate keep(1 3)

replace share_nonmissing_teacher = 0 if share_nonmissing_teacher == .

replace share_nonmissing_fiscal = 0 if share_nonmissing_fiscal == .

gen prov = floor(region1990/10000)

keep if !inlist(prov,11,12,31) & district!=1 & sdy!=. & pop1964!=.

unique countyid if share_nonmissing_teacher > 0

scalar r7 = r(unique) // number in Panel B, row 1

summ share_nonmissing_teacher if share_nonmissing_teacher > 0

scalar r8 = r(mean) // number in Panel B, row 2

unique countyid if share_nonmissing_fiscal > 0

scalar r10 = r(unique) // number in Panel B, row 4

summ share_nonmissing_fiscal if share_nonmissing_fiscal > 0

scalar r11 = r(mean) // number in Panel B, row 5

clear

set obs 13

gen num = .

forvalues i = 1/13 {

replace num = r`i' in `i'

}

outsheet using "$path4\TableB1.txt", replace

**************************************************************************************************

* Table B2: Correlation between County-level Information Availability and County Characteristics *

**************************************************************************************************

use "$path1A\census_1990_county_char.dta", clear

merge 1:1 countyid using "$path1A\teacher_fiscal_info.dta", keep(1 3) nogenerate

gen minority = 1 - han_ethn

replace victims_cr = victims_cr/pop1964

replace share_nonmissing_teacher = 0 if share_nonmissing_teacher == .

replace share_nonmissing_fiscal = 0 if share_nonmissing_fiscal == .

eststo clear

eststo: reghdfe sdy_density primary_graduate junior_graduate minority ins_famine, vce(robust) absorb(prov)

eststo: reghdfe sdy_density victims_cr primary_graduate junior_graduate minority ins_famine, vce(robust) absorb(prov)

eststo: reghdfe share_nonmissing_teacher sdy_density primary_graduate junior_graduate minority ins_famine, vce(robust) absorb(prov)

eststo: reghdfe grain_info sdy_density primary_graduate junior_graduate minority ins_famine, vce(robust) absorb(prov)

eststo: reghdfe share_nonmissing_fiscal sdy_density primary_graduate junior_graduate minority ins_famine, vce(robust) absorb(prov)

eststo: reghdfe school_info sdy_density primary_graduate junior_graduate minority ins_famine, vce(robust) absorb(prov)

eststo: reghdfe cr_info sdy_density primary_graduate junior_graduate minority ins_famine, vce(robust) absorb(prov)

global var_order "sdy_density victims_cr ins_famine primary_graduate junior_graduate minority"

outreg2 [*] using "$path4\TableB2.txt", replace se nonotes nocons noaster nolabel text keep($var_order) sortvar($var_order)

********************************************************************************

* *

* Tables in Appendix C *

* *

********************************************************************************

************************************************************************

* Table C1: Comparing the Number of Received SDYs from County-aggregate*

* with that from National Report in Each Province, Column (1) *

* *

* Note: Column (2) comes from Gu (2009) *

************************************************************************

use "$path1B\county_data.dta", clear

drop if region1990 == .

gen prov = floor(region1990/10000)

drop if inlist(prov,11,12,31,54) // Drop Beijing, Tianjin, Shanghai, and Tibet

replace prov = 44 if prov == 46 // Hainan was part of Guangdong during the movement

replace prov = 51 if prov == 50 // Chongqing was part of Sichuan during the movement

collapse (sum) sdy, by(prov)

replace sdy = sdy/1000

outsheet using "$path4\TableC1_column1.txt", replace

********************************************************************************

* *

* Tables in Appendix D *

* *

********************************************************************************

use "$path1B\census_1990_clean.dta", clear

global var_abs_cohort "region1990 prov#year_birth c.primary_base#year_birth c.junior_base#year_birth"

gen treat = inrange(year_birth,1956,1969) if inrange(year_birth,1946,1969)

gen treat_alt = inrange(year_birth,1956,1969) if inrange(year_birth,1946,1969)&!inrange(year,1953,1955)

gen treat_junior = inrange(year_birth,1953,1969) if inrange(year_birth,1943,1969)

keep if inrange(year_birth,1943,1969)

************************************************************

* Table D1: Robustness Check with Different Specifications *

************************************************************

*alternative densities of SDY

bysort region1990: egen cohort_size = sum(treat)

gen sdy_density_alt = sdy_density*pop1964/(100*cohort_size)

drop cohort_size

*alternative exposure to SDY

gen primary_overlap = min(max(year-1955,0),max(1970-year,0),6) if inrange(year_birth,1946,1969)

gen junior_overlap = min(max(year-1952,0),max(1970-year,0),9) if inrange(year_birth,1943,1969)

global var_order "c.sdy_density#c.treat c.sdy_density_alt#c.treat c.sdy_density#c.primary_overlap c.sdy_density#c.treat_alt c.sdy_density#c.treat_junior c.sdy_density#c.junior_overlap"

gen temp_var = .

forvalues i = 1/8 {

if (`i'>=1&`i'<=3) local dep_var "c.sdy_density#c.treat"

if (`i'==4) local dep_var "c.sdy_density_alt#c.treat"

if (`i'==5) local dep_var "c.sdy_density#c.primary_overlap"

if (`i'==6) local dep_var "c.sdy_density#c.treat_alt"

if (`i'==7) local dep_var "c.sdy_density#c.treat_junior"

if (`i'==8) local dep_var "c.sdy_density#c.junior_overlap"

if (`i'==1) local cond "& year_birth<=1966"

if (`i'==2) local cond "& year_birth<=1963"

if (`i'==3) local cond "& year_birth<=1960"

if (`i'>3) local cond ""

reghdfe yedu `dep_var' male han_ethn if rural==1 `cond', absorb($var_abs_cohort) cluster(region1990)

if (`i'==1) outreg2 using "$path4\TableD1_A.txt", se nonotes nocons noaster nolabel text replace keep($var_order) sortvar($var_order)

if (`i'!=1) outreg2 using "$path4\TableD1_A.txt", se nonotes nocons noaster nolabel text append keep($var_order) sortvar($var_order)

replace temp_var = `dep_var' // to make the table easier to read, Panel B reports the corresponding coefficients to Panel A

reghdfe yedu temp_var male han_ethn if rural==0 `cond', absorb($var_abs_cohort) cluster(region1990)

if (`i'==1) outreg2 using "$path4\TableD1_B.txt", se nonotes nocons noaster nolabel text replace keep(temp_var)

if (`i'!=1) outreg2 using "$path4\TableD1_B.txt", se nonotes nocons noaster nolabel text append keep(temp_var)

}

*************************************

* Table D2: Other Robustness Checks *

*************************************

eststo clear

*drop nine Third-Frontier provinces

eststo: reghdfe yedu c.sdy_density#c.treat male han_ethn if rural==1 & !inlist(prov,51,52,61,62,42,43,53,64,65), absorb($var_abs_cohort) cluster(region1990)

eststo: reghdfe yedu c.sdy_density#c.treat male han_ethn if rural==0 & !inlist(prov,51,52,61,62,42,43,53,64,65), absorb($var_abs_cohort) cluster(region1990)

*drop five provinces whose SDY counts do not match well between local gazetteers and national reports

eststo: reghdfe yedu c.sdy_density#c.treat male han_ethn if rural==1 & !inlist(prov,14,23,53,64,65), absorb($var_abs_cohort) cluster(region1990)

eststo: reghdfe yedu c.sdy_density#c.treat male han_ethn if rural==0 & !inlist(prov,14,23,53,64,65), absorb($var_abs_cohort) cluster(region1990)

*impose stronger assumptions on migration history

eststo: reghdfe yedu c.sdy_density#c.treat male han_ethn if rural==1 & local_1985==1, absorb($var_abs_cohort) cluster(region1990)

eststo: reghdfe yedu c.sdy_density#c.treat male han_ethn if rural==0 & local_1985==1, absorb($var_abs_cohort) cluster(region1990)

*drop individuals whose education level makes them eligible for hukou transition

eststo: reghdfe yedu c.sdy_density#c.treat male han_ethn if rural==1 & hukou_transit==0, absorb($var_abs_cohort) cluster(region1990)

eststo: reghdfe yedu c.sdy_density#c.treat male han_ethn if rural==0 & hukou_transit==0, absorb($var_abs_cohort) cluster(region1990)

*drop counties whose SDY numbers end in zero

gen last_digit = mod(sdy,10)

eststo: reghdfe yedu c.sdy_density#c.treat male han_ethn if rural==1 & last_digit!=0, absorb($var_abs_cohort) cluster(region1990)

eststo: reghdfe yedu c.sdy_density#c.treat male han_ethn if rural==0 & last_digit!=0, absorb($var_abs_cohort) cluster(region1990)

outreg2 [*] using "$path4\TableD2.txt", se nonotes nocons noaster nolabel text replace keep(c.sdy_density#c.treat)

erase "$path1A\census_1990_county_char.dta"

erase "$path1A\rural_school_expansion.dta"

erase "$path1A\teacher_fiscal_info.dta"
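
For readers who want to sanity-check the alternative exposure measures built in the Table D1 block (`primary_overlap` and `junior_overlap`), here is a minimal Python sketch of the same min/max formula. It is not part of the replication package. It assumes that `year` in the do-file resolves to birth year (Stata accepts unambiguous variable-name abbreviations, so this is a guess rather than something the source confirms). The helper name `overlap_years` and its parameters are made up for illustration.

```python
def overlap_years(birth_year, start_offset, cap, first_cohort, last_cohort=1969):
    """Mirror of min(max(b - start_offset, 0), max(1970 - b, 0), cap).

    Returns None outside the cohort window, where Stata would leave the
    generated variable missing because of the `if inrange(...)` qualifier.
    """
    if not first_cohort <= birth_year <= last_cohort:
        return None
    return min(max(birth_year - start_offset, 0),
               max(1970 - birth_year, 0),
               cap)


def primary_overlap(birth_year):
    # Matches: min(max(year-1955,0), max(1970-year,0), 6) for cohorts 1946-1969
    return overlap_years(birth_year, start_offset=1955, cap=6, first_cohort=1946)


def junior_overlap(birth_year):
    # Matches: min(max(year-1952,0), max(1970-year,0), 9) for cohorts 1943-1969
    return overlap_years(birth_year, start_offset=1952, cap=9, first_cohort=1943)


if __name__ == "__main__":
    for b in (1950, 1956, 1961, 1965, 1969):
        print(b, primary_overlap(b), junior_overlap(b))
```

The measure is zero for cohorts born before the window opens, rises by one per birth year up to the cap, and falls again for the youngest cohorts, unlike the binary `treat` indicator.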

The following code produces the figures in the appendix:

Output files:

Figure A1.pdf (Trends of Real Educational Expenditures in Local Gazetteers)

Figure A2.pdf (The Process of China's Secondary Education Expansion since the Late 1960s)

Figure C1.pdf (Benford's Law and Data Quality on SDYs)

Figure C2.pdf (Number of SDYs Estimated from CFPS 2010)

Figure E1.pdf (Estimating the Effect of SDYs using the Synthetic Control Method)

********************************************************************************

* *

* Figures in Appendix A *

* *

********************************************************************************

************************************************************************

*Figure A1: Trends of Real Educational Expenditures in Local Gazetteers*

************************************************************************

use "$path1B\county_year_data.dta", clear

keep if inrange(year,1950,1990)

rename fiscal_edu fiscal_edu_county

collapse (mean) fiscal_edu_county, by(year)

merge 1:1 year using "$path1B\NBS_data.dta", nogenerate keepusing(fiscal_edu price_deflator)

replace fiscal_edu_national = fiscal_edu_national/price_deflator

replace fiscal_edu_county = fiscal_edu_county/price_deflator

twoway line fiscal_edu_national year, yaxis(1) lcolor(black) lpattern(solid) ///

|| line fiscal_edu_county year, yaxis(2) lcolor(gs8) lpattern(dash) ///

||, graphregion(fcolor(gs16) lcolor(gs16)) plotregion(lcolor(gs16) margin(zero)) ///

ylabel(0(50)250, angle(0) format(%12.0f) axis(1)) ytitle("National Fiscal Educational Expenditures""from NBS (100 million RMB)", margin(medium) axis(1)) ///

ylabel(0(100)500, angle(0) format(%12.0f) axis(2)) ytitle("Educational Expenditures Per County""from Local Gazetteers (10,000 RMB)", margin(medium) axis(2)) ///

xlabel(1950(5)1990) xtick(1950(5)1990) xtitle("Year") ///

legend(label(1 "National Fiscal Educational""Expenditures from NBS") label(2 "Educational Expenditures Per County""from Local Gazetteers") col(1) size(medsmall))

graph export "$path4\FigureA1.pdf",replace

**************************************************************************************

*Figure A2: The Process of China's Secondary Education Expansion since the Late 1960s*

**************************************************************************************

use "$path1B\county_year_data.dta", clear

keep if sdy_density !=.

collapse (mean) school_primary school_secondary, by(year)

twoway line school_primary year, yaxis(1) lcolor(black) lpattern(solid) ///

|| line school_secondary year, yaxis(2) lcolor(gs8) lpattern(dash) ///

||, graphregion(fcolor(gs16) lcolor(gs16)) plotregion(lcolor(gs16) margin(zero)) ///

ylabel(0(150)600, angle(0) format(%12.0f) axis(1)) ytitle("# Primary Schools per County", margin(medium) axis(1)) ///

ylabel(0(20)80, angle(0) format(%12.0f) axis(2)) ytitle("# Secondary Schools per County", margin(medium) axis(2)) ///

xlabel(1950(5)1990) xtick(1950(5)1990) xtitle("Year") title("Panel A - Summary Statistics from Local Gazetteers",size(medium) margin(medium)) ///

legend(label(1 "# Primary Schools per County") label(2 "# Secondary Schools per County") col(1) size(medsmall))

graph save a, replace

use "$path1B\NBS_data.dta", clear

twoway line primary_stu year, yaxis(1) lcolor(black) lpattern(solid) ///

|| line secondary_stu year, yaxis(2) lcolor(gs8) lpattern(dash) ///

||, graphregion(fcolor(gs16) lcolor(gs16)) plotregion(lcolor(gs16) margin(zero)) ///

ylabel(0(400)1600, angle(0) format(%12.0f) axis(1)) ytitle("# Primary Students per 10,000", margin(medium) axis(1)) ///

ylabel(0(200)800, angle(0) format(%12.0f) axis(2)) ytitle("# Secondary Students per 10,000", margin(medium) axis(2)) ///

xlabel(1950(5)1990) xtick(1950(5)1990) xtitle("Year") title("Panel B - National-level Statistics",size(medium) margin(medium)) ///

legend(label(1 "# Primary Students per 10,000") label(2 "# Secondary Students per 10,000") col(1) size(medsmall))

graph save b,replace

graph combine a.gph b.gph, rows(2) graphregion(fcolor(gs16) lcolor(gs16)) xsize(13.5) ysize(20)

graph export "$path4\FigureA2.pdf",replace

erase a.gph

erase b.gph

********************************************************************************

* *

* Figures in Appendix C *

* *

********************************************************************************

***************************************************

*Figure C1: Benford's Law and Data Quality on SDYs*

***************************************************

use "$path1B\county_data.dta", clear

drop if region1990 == .

gen prov = floor(region1990/10000)

keep if !inlist(prov,11,12,31) & district!=1 & sdy_density!=. // keep the sample corresponding to our main analysis

keep sdy region1990

firstdigit sdy, percent

/* Output of the command above:

. firstdigit sdy, percent

         n    chi-sq.   P-value    digit   observed   expected
----------------------------------------------------------------
sdy   1773       6.81    0.5575        1      28.54      30.10
                                       2      17.65      17.61
                                       3      12.86      12.49
                                       4       9.53       9.69
                                       5       8.97       7.92
                                       6       6.32       6.69
                                       7       6.32       5.80
                                       8       4.74       5.12
                                       9       5.08       4.58
*/

clear

input str9 digit data benford

1 28.54 30.10

2 17.65 17.61

3 12.86 12.49

4 9.53 9.69

5 8.97 7.92

6 6.32 6.69

7 6.32 5.80

8 4.74 5.12

9 5.08 4.58

end // the input comes from the results of firstdigit, as shown above

graph bar data benford, over(digit) bargap(0) bar(1,color(gs0)) bar(2,color(gs12)) ///

graphregion(fcolor(gs16) lcolor(gs16)) plotregion(lcolor(gs16) margin(zero)) ///

ylabel(0(5)30, angle(0) format(%12.0f)) ytitle("Percentage Points", margin(medium)) ///

b1title("First Digit", margin(medium)) ///

legend(label(1 "Data on SDYs") label(2 "Benford's Law") ring(0) pos(2) colgap(*0.5))

graph export "$path4\FigureC1.pdf",replace
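
As a cross-check on the Figure C1 block above, the expected column of the quoted `firstdigit` output can be reproduced from Benford's law, under which the share of leading digit d is log10(1 + 1/d). The Python sketch below is not part of the replication package. It recomputes those shares and a Pearson chi-square statistic from the rounded percentages and n = 1773 shown in the output. The function names are illustrative.

```python
import math

# Observed first-digit shares (percent) and sample size, copied from the
# firstdigit output quoted in the do-file.
OBSERVED_PCT = [28.54, 17.65, 12.86, 9.53, 8.97, 6.32, 6.32, 4.74, 5.08]
N_COUNTIES = 1773


def benford_expected_pct():
    """Expected share (percent) of leading digits 1..9 under Benford's law."""
    return [100 * math.log10(1 + 1 / d) for d in range(1, 10)]


def chi_square(observed_pct, expected_pct, n):
    """Pearson chi-square computed from percentage shares and sample size n.

    With counts O = n*o/100 and E = n*e/100, each term (O-E)^2/E
    simplifies to n*(o-e)^2/(100*e).
    """
    return sum(n * (o - e) ** 2 / (100 * e)
               for o, e in zip(observed_pct, expected_pct))


if __name__ == "__main__":
    expected = benford_expected_pct()
    for digit, (o, e) in enumerate(zip(OBSERVED_PCT, expected), start=1):
        print(f"{digit}  observed={o:5.2f}  expected={e:5.2f}")
    print("chi-square:", round(chi_square(OBSERVED_PCT, expected, N_COUNTIES), 2))
```

Because the inputs are rounded to two decimals, the statistic should land near, not exactly on, the 6.81 reported by `firstdigit`.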

****************************************************

*Figure C2: Number of SDYs Estimated from CFPS 2010*

****************************************************

/*Note: Plotting this graph requires the original CFPS 2010 data.

We provide the output numbers directly below. They can be

replicated with the following code. */

use "$path2B\cfps2010adult_201906",clear

rename qg101_a_1 sdy_start

rename qa1y_best year_birth

keep if qg1_s_1r==1|qg1_s_2r==1 // keep SDY sample

keep sdy_start rswt_nat

keep if inrange(sdy_start,1962,1979)

recode sdy_start (1962/1966=1)(1967/1968=2), gen(period)

replace period = sdy_start-1966 if inrange(sdy_start,1969,1979)

collapse (sum) rswt_nat, by(period)

gen cfps_impute = rswt_nat/10000

drop rswt_nat

list

clear

input str9 year sdy_national cfps_impute

1962-1966 129.28 160.266
1967-1968 199.68 254.5743
1969 267.38 162.5964
1970 106.4 113.5627
1971 74.83 77.62477
1972 67.39 84.00252
1973 89.61 52.58943
1974 172.48 170.2201
1975 236.86 209.5235
1976 188.03 173.3829
1977 171.68 105.0372
1978 48.09 32.24603
1979 24.77 3.03945

end

replace sdy_national= sdy_national/100

replace cfps_impute = cfps_impute/100

encode year, generate(period)

graph bar sdy_national cfps_impute, over(period,label(angle(90))) bargap(0) bar(1,color(gs0)) bar(2,color(gs12)) ///

graphregion(fcolor(gs16) lcolor(gs16)) plotregion(lcolor(gs16) margin(zero)) ///

ylabel(0(0.5)3, angle(0) format(%12.1f)) ytitle("Number of SDYs (Million)", margin(medium)) ///

legend(label(1 "National Reports") label(2 "CFPS Estimates") ring(0) pos(2) colgap(*0.5) )

graph export "$path4\FigureC2.pdf",replace

********************************************************************************

* *

* Figures in Appendix E *

* *

********************************************************************************

****************************************

* Synthetic Control, Step 1: *

* prepare county-level characteristics *

****************************************

use "$path1B\census_1990_clean.dta", clear

keep if rural == 1

gen minority = 1 - han_ethn

gen famine = inrange(year_birth,1959,1961)

gen nonfamine = inrange(year_birth,1955,1957)

replace minority = . if !inrange(year_birth,1946,1955)

replace primary_graduate = . if !inrange(year_birth,1946,1955)

replace junior_graduate = . if !inrange(year_birth,1946,1955)

collapse (mean) minority primary_graduate junior_graduate (sum) famine nonfamine, by(region1990)

gen ins_famine = 1 - famine/nonfamine

drop famine nonfamine

save "$path1A\census_1990_county_SC.dta", replace

****************************************************************

* Synthetic Control, Step 2: *

* pick up counties with sufficient observations in each cohort *

****************************************************************

use "$path1B\census_1990_clean.dta", clear

keep if inrange(year_birth,1946,1969)

keep if rural==1

collapse (count) N = yedu, by(year_birth region1990)

drop if N < 30

bysort region1990: gen balance = _N

keep if balance == 24

drop balance

keep region1990

duplicates drop

save "$path1A\census_1990_county_list.dta", replace

*****************************************************

* Synthetic Control, Step 3: *

* prepare county-by-cohort data for the SC analysis *

*****************************************************

use "$path1B\county_data.dta", clear

drop if region1990 == . |countyid == .

drop region1982 region2000 region2010

*SC requires counties to have complete information

merge 1:1 region1990 using "$path1A\census_1990_county_list.dta", keep(3) nogenerate

merge 1:1 region1990 using "$path1A\census_1990_county_SC.dta", keep(3) nogenerate

replace grain_output = grain_output/pop1964

keep region1990 sdy_density grain_output minority ins_famine urbanratio64 primary_graduate junior_graduate

keep if !missing(sdy_density,grain_output,minority,ins_famine,urbanratio64)

sort sdy_density

gen N = _N

gen treat = [_n > (N+1)/2]

drop N

tempfile temp

save `temp', replace

use "$path1B\census_1990_clean.dta", clear

keep if inrange(year_birth,1946,1969)

keep if rural==1

collapse (mean) yedu, by(year_birth region1990)

merge m:1 region1990 using `temp', keep(3) nogenerate

tsset region1990 year_birth

save "$path1A\synth_analysis.dta", replace

*****************************************************

* Synthetic Control, Step 4: *

* extended Abadie synthetic control (SC) method *

*****************************************************

use "$path1A\synth_analysis.dta", clear

gen D = [year_birth>=1956]*treat

drop if D==.

parallel initialize

synth_runner yedu yedu(1946(1)1955) primary_graduate(1955) junior_graduate(1955) grain_output(1955) urbanratio64(1955) minority(1955) ins_famine(1955), d(D) gen_var parallel

effect_graphs

pval_graphs

matrix P = e(pvals_std)

save "$path1A\synth_results.dta", replace

clear

svmat P, names(matcol)

gen I = 1

reshape long Pc, i(I) j(lead)

drop I

rename Pc p_vals

tempfile temp

save `temp', replace

use "$path1A\synth_results.dta", clear

merge m:1 lead using `temp', nogenerate

save "$path1A\synth_results.dta", replace

*****************************************************************************

*Figure E1: Estimating the Effect of SDYs using the Synthetic Control Method*

*****************************************************************************

use "$path1A\synth_results.dta", clear

keep if treat == 1

collapse (mean) p_vals yedu yedu_synth, by(year_birth)

gen effect = yedu - yedu_synth

twoway line yedu year_birth, lcolor(black) || line yedu_synth year_birth, lpattern(dash) lcolor(black) ///

||, graphregion(fcolor(gs16) lcolor(gs16)) plotregion(lcolor(gs16) margin(zero)) ///

ylabel(5(0.5)8, angle(0) format(%12.1f)) ytitle("Average Years of Education", margin(medium)) ///

xlabel(1945(5)1970) xtick(1945(5)1970) xtitle("Birth Cohort") ///

legend(label(1 "Treated (Counties with Higher""Density of SDY)") label(2 "Synthetic Control from Counties""with Lower Density of SDY") col(1) size(small) ring(0) pos(4) colgap(*0.5)) ///

xline(1955, lpattern(solid) lwidth(thin) lcolor(black)) ///

title("Treatment versus Control",size(medium) margin(medium))

graph save a,replace

twoway line effect year_birth, yaxis(1) lcolor(black) || scatter p_vals year_birth, yaxis(2) msymbol(square) mcolor(black) ///

||, graphregion(fcolor(gs16) lcolor(gs16)) plotregion(lcolor(gs16) margin(zero)) ///

ylabel(-0.1(0.1)0.4, angle(0) format(%12.1f) axis(1)) ytitle("Treatment Effect", margin(medium) axis(1)) ///

ylabel(0(0.2)1, angle(0) format(%12.1f) axis(2)) ytick(-0.25 0(0.2)1,axis(2)) ytitle("p-values", margin(medium) axis(2)) ///

xlabel(1945(5)1970) xtick(1945(5)1970) xtitle("Birth Cohort") ///

legend(label(1 "Treatment Effect") label(2 "p-values") col(1) size(small) ring(0) pos(3) colgap(*0.5)) ///

xline(1955, lpattern(solid) lwidth(thin) lcolor(black)) ///

title("Treatment Effect and P-values",size(medium) margin(medium))

graph save b,replace

graph combine a.gph b.gph, rows(1) graphregion(fcolor(gs16) lcolor(gs16)) xsize(18) ysize(8)

graph export "$path4\FigureE1.pdf",replace

erase a.gph

erase b.gph

erase "$path1A\census_1990_county_list.dta"

erase "$path1A\census_1990_county_SC.dta"

erase "$path1A\synth_analysis.dta"

erase "$path1A\synth_results.dta"
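
The treatment assignment in Steps 3 and 4 of the synthetic control code (`treat = [_n > (N+1)/2]` after sorting by `sdy_density`, then `D = [year_birth>=1956]*treat`) can be sketched in Python as follows. This is an illustration, not part of the replication package: the function names and the toy county labels are hypothetical, and ties in density are broken by input order here, whereas Stata's `sort` orders tied observations arbitrarily.

```python
def assign_treatment(densities):
    """Flag counties whose density rank lies strictly above (n + 1) / 2.

    `densities` maps a county identifier to its SDY density. With an odd
    number of counties the median county ends up in the control group.
    """
    ranked = sorted(densities, key=densities.get)  # ascending, like `sort sdy_density`
    n = len(ranked)
    return {county: int(rank > (n + 1) / 2)
            for rank, county in enumerate(ranked, start=1)}


def post_treatment_flag(birth_year, treat):
    """D = [year_birth >= 1956] * treat, the indicator passed to the d() option."""
    return int(birth_year >= 1956) * treat


if __name__ == "__main__":
    toy = {"a": 0.5, "b": 1.2, "c": 0.1, "d": 3.0, "e": 2.0}  # made-up densities
    treat = assign_treatment(toy)
    print(treat)
    print(post_treatment_flag(1960, treat["d"]), post_treatment_flag(1950, treat["d"]))
```

Counties in the upper half of the density distribution form the treated group, and the lower half supplies the donor pool for the synthetic control.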

Long-press the QR code to download all the data and programs. Extraction code: jlsq
