
Code and data for an AER cross-sectional (cohort) DID paper are now open for download! The latest research by four Chinese scholars!

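Everyone working in econometrics follows this account.
Email: econometrics666@sina.cn
The code for every Econometrics Circle methodology series, along with macro and micro databases and various software, is shared in our community. You are welcome to join the Econometrics Circle community.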

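We previously introduced this paper in "The 'F4' of Chinese academia publish in the AER! The send-down movement of educated youth and rural education". Today we share the paper's original data and code. Anyone who needs them can reproduce the results by following the instructions that come with the code.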

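The method used in this paper has been covered several times on this account, for example: 1. Cross-sectional DID explained: the paradigm for difference-in-differences policy evaluation with cross-sectional data; 2. A step-by-step guide to running cross-sectional DID; 3. Cross-sectional DID: fixed effects, placebo tests, permutation tests, and handling other external shocks.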
Yi Chen, Ziying Fan, Xiaomin Gu, and Li-An Zhou. "Arrival of Young Talent: The Send-Down Movement and Rural Education in China." American Economic Review, 2020.
This paper estimates the effects of the send-down movement during the Cultural Revolution— when about 16 million urban youth were mandated to resettle in the countryside— on rural education. Using a county-level dataset compiled from local gazetteers and population censuses, we show that greater exposure to the sent-down youths significantly increased rural children's educational achievement. This positive effect diminished after the urban youth left the countryside in the late 1970s but never disappeared. Rural children who interacted with the sent-down youths were also more likely to pursue more-skilled occupations, marry later, and have smaller families than those who did not.
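The paper identifies the effect with a cohort-by-county difference-in-differences. Rural cohorts born 1956-1969 (treated) are compared with cohorts born 1946-1955 (control), across counties with different SDY densities.

The Python sketch below illustrates that logic under simplifying assumptions. It uses one outcome average per county and cohort group, no individual controls, and no province-by-cohort or baseline-education trends; the `reghdfe` specification in the do-file adds all of these. In this stripped-down two-group case, the coefficient on density × treat equals the slope from regressing each county's treated-minus-control difference on its density. The helpers `cohort_group` and `did_slope` and the county records are hypothetical.

```python
from statistics import mean


def cohort_group(year_birth):
    """1 = treated (born 1956-1969), 0 = control (born 1946-1955), None = outside the window."""
    if 1956 <= year_birth <= 1969:
        return 1
    if 1946 <= year_birth <= 1955:
        return 0
    return None


def did_slope(records):
    """Two-group DID with a continuous treatment intensity.

    records: iterable of (county, sdy_density, year_birth, yedu) tuples.
    Returns the OLS slope of (treated mean - control mean) on sdy_density
    across counties. On county-by-group cell means this equals the coefficient
    on density x treat in a regression with county and cohort-group fixed effects.
    """
    cells, density = {}, {}
    for county, dens, year_birth, yedu in records:
        group = cohort_group(year_birth)
        if group is None:
            continue  # like `treat` in the do-file: missing outside 1946-1969
        cells.setdefault((county, group), []).append(yedu)
        density[county] = dens
    # only counties observed in both cohort groups contribute
    counties = [c for c in density if (c, 0) in cells and (c, 1) in cells]
    diffs = [mean(cells[(c, 1)]) - mean(cells[(c, 0)]) for c in counties]
    xs = [density[c] for c in counties]
    x_bar, d_bar = mean(xs), mean(diffs)
    cov = sum((x - x_bar) * (d - d_bar) for x, d in zip(xs, diffs))
    var = sum((x - x_bar) ** 2 for x in xs)
    return cov / var


# Hypothetical data built so that the true effect of density x treat is 2.0
data = [
    ("A", 0.0, 1950, 5.0), ("A", 0.0, 1960, 5.5), ("A", 0.0, 1940, 3.0),
    ("B", 1.0, 1950, 6.0), ("B", 1.0, 1960, 8.5),
    ("C", 2.0, 1950, 4.0), ("C", 2.0, 1960, 8.5),
]
print(did_slope(data))  # 2.0
```

County fixed effects remove each county's level (A, B and C start from different control means). The common treated-cohort shift of 0.5 ends up in the intercept of the difference regression, so only the density gradient is left in the slope.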

Note: the code below is easier to read on a computer.


Below is the code of the do-file 1-Table_Census_1990, which produces the main results. The other do-files can be downloaded at the end of the article.

/******************************************************************************

This do-file carries out the analysis using the 1990 census.

Input data files:
census_1990_clean.dta
county_year_data.dta

Output files:
Table2.txt (Summary Statistics)
Table3.txt Columns (1)--(7) (The Effect of SDYs on the Educational Attainment of Rural Children)
Table4.txt (Heterogeneous Effect of SDYs)
Table6_A.txt, Table6_B.txt (Addressing Various Confounding Factors)
Table8.txt Columns (1)--(7) (The Lasting Effect of SDYs on Outcomes other than Education)

Figure3.txt (inputs for Figure 3, Panel B: Census 1990)
*/
**********************************************
*Preparation:                                *
*Compute speed of school construction program*
**********************************************
use "$path1B\county_year_data.dta", clear

foreach s in secondary primary {
sort countyid year

bysort countyid: egen Min_year = min(year) if inrange(year,1964,1966)&!missing(school_`s')
bysort countyid: egen Max_year = max(year) if inrange(year,1975,1977)&!missing(school_`s')

bysort countyid: egen min_year = mean(Min_year)
bysort countyid: egen max_year = mean(Max_year)

gen Min_school = school_`s' if year == min_year
gen Max_school = school_`s' if year == max_year

bysort countyid: egen min_school = mean(Min_school)
bysort countyid: egen max_school = mean(Max_school)

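* speed = change in the number of schools per year, from the first year with data in 1964-66 to the last year with data in 1975-77
gen `s'_speed = (max_school-min_school)/(max_year-min_year)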
drop Min_* Max_* min_* max_*
}

drop if secondary_speed ==. | primary_speed == .

keep countyid secondary_speed primary_speed
duplicates drop

save "$path1A\rural_school_expansion.dta", replace
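The preparation step measures how fast schools were built in each county. It takes the number of schools in the last year with data in 1975-1977, subtracts the number in the first year with data in 1964-1966, and divides by the number of years in between.

Below is a minimal Python sketch of the same rule for a single county. It assumes school counts arrive as a year-to-count dictionary; that layout is hypothetical, since the do-file works on the long county-by-year file. A second hypothetical helper, `speed_per_thousand`, mirrors the per-1,000 scaling applied in Step 1 (`speed_primary_density`, `speed_secondary_density`). That scaling reads as "per 1,000 residents" only if `pop1964` is measured in persons.

```python
def construction_speed(school_counts):
    """Annual growth in the number of schools for one county.

    school_counts: dict {year: number of schools}; years with missing data are absent.
    start = earliest year with data in 1964-1966, end = latest year with data in 1975-1977,
    speed = (schools_end - schools_start) / (end - start).
    Returns None when either window has no data (the do-file drops such counties).
    """
    start_years = [y for y in school_counts if 1964 <= y <= 1966]
    end_years = [y for y in school_counts if 1975 <= y <= 1977]
    if not start_years or not end_years:
        return None
    start, end = min(start_years), max(end_years)
    return (school_counts[end] - school_counts[start]) / (end - start)


def speed_per_thousand(speed, pop1964):
    """Step 1 scaling: speed / (pop1964 / 1000)."""
    return speed / (pop1964 / 1000)


county = {1964: 40, 1966: 45, 1970: 52, 1976: 64}  # hypothetical school counts
speed = construction_speed(county)
print(speed)                                     # 2.0
print(speed_per_thousand(speed, 200_000))        # 0.01
print(construction_speed({1965: 30, 1970: 50}))  # None
```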


********************************************************************************
*                                                                              *
*             Step 1: Data Preparation and Summary Statistics                  *
*                                                                              *
********************************************************************************
use "$path1B\census_1990_clean.dta", clear

*******************
*Control:1946-1955*
*Treat:  1956-1969*
*******************
gen treat = inrange(year_birth,1956,1969) if inrange(year_birth,1946,1969)

******************
* Define Globals *
******************
global var_abs_cohort "region1990 prov#year_birth c.primary_base#year_birth c.junior_base#year_birth"
global var_abs_cohort2 "region1990 prov#year_birth c.primary_base_older#year_birth c.junior_base_older#year_birth"

************************************
* Generate County Characteristics  *
************************************
*rural school expansion
merge m:1 countyid using "$path1A\rural_school_expansion.dta", nogenerate keep(1 3)

gen speed_primary_density   = primary_speed/(pop1964/1000)
gen speed_secondary_density = secondary_speed/(pop1964/1000)

*intensity of the Great Famine
gen famine = inrange(year_birth,1959,1961) if rural == 1
gen nonfamine = inrange(year_birth,1955,1957) if rural == 1

bysort region1990: egen sum_famine = sum(famine)
bysort region1990: egen sum_nonfamine = sum(nonfamine)

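* famine intensity: 1 minus the ratio of rural respondents born in 1959-61 to those born in 1955-57, by county
gen ins_famine = 1-sum_famine/sum_nonfamine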
drop famine nonfamine sum_famine sum_nonfamine

*extract data for county-level information
preserve
generate cr_info = [victims_cr!=.]
generate grain_info = [grain_output!=.]
generate school_info = !missing(speed_primary_density,speed_secondary_density )

foreach var in yedu primary_graduate junior_graduate {
replace `var' = . if treat !=0 | rural != 1 // only keep the baseline
}

collapse (mean) countyid pop1964 sdy_density han_ethn primary_graduate junior_graduate victims_cr cr_info grain_info school_info ins_famine, by(region1990)
gen prov = floor(region1990/10000)

save "$path1A\census_1990_county_char.dta", replace
restore

*******************************************************************
*Table 2: Summary Statistics of the 1% Sample from the 1990 Census*
*******************************************************************
gen age = 1990 - year_birth
outsum yedu primary_graduate junior_graduate male han_ethn age if treat==0 & rural==1 using "$path4\Table2.txt", replace
outsum yedu primary_graduate junior_graduate male han_ethn age if treat==0 & rural==0 using "$path4\Table2.txt", append
outsum yedu primary_graduate junior_graduate male han_ethn age if treat==1 & rural==1 using "$path4\Table2.txt", append
outsum yedu primary_graduate junior_graduate male han_ethn age if treat==1 & rural==0 using "$path4\Table2.txt", append
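Step 1 measures the intensity of the Great Famine in each county by the shortfall of rural cohorts born during the famine relative to those born just before it. The formula is `ins_famine = 1 - sum_famine/sum_nonfamine`, where the two sums count rural respondents born in 1959-1961 and in 1955-1957.

Below is a small Python sketch of the same arithmetic for one county, using hypothetical birth years. Returning `None` when the denominator is zero mimics Stata producing a missing value.

```python
def famine_intensity(birth_years):
    """ins_famine for one county from the birth years of its rural respondents.

    1 - (# born 1959-1961) / (# born 1955-1957): the proportional shortfall of
    famine-era cohorts relative to the three years before the famine.
    """
    famine = sum(1 for y in birth_years if 1959 <= y <= 1961)
    nonfamine = sum(1 for y in birth_years if 1955 <= y <= 1957)
    if nonfamine == 0:
        return None  # Stata would return missing
    return 1 - famine / nonfamine


# Hypothetical county: 100 rural respondents born 1955-57, 50 born 1959-61
birth_years = ([1955] * 40 + [1956] * 30 + [1957] * 30 + [1958] * 25
               + [1959] * 20 + [1960] * 15 + [1961] * 15)
print(famine_intensity(birth_years))  # 0.5
```

People born in 1958 count in neither sum, just as in the do-file.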


********************************************************************************
*                                                                              *
*                          Step 2: Main Results                                *
*                                                                              *
********************************************************************************
*****************************************************************************
*Table 3: The Effect of SDYs on the Educational Attainment of Rural Children*
*Columns (1)--(7)                                                           *
*****************************************************************************
foreach var in yedu primary_graduate junior_graduate {
forvalues i = 1/2 {
if (`i'==1) reghdfe `var' c.sdy_density#c.treat male han_ethn if rural==1, absorb($var_abs_cohort) cluster(region1990)
if (`i'==2) reghdfe `var' c.sdy_density#c.treat male han_ethn if rural==0, absorb($var_abs_cohort) cluster(region1990)

summ `var' if e(sample)&treat==0
local mean = r(mean)
if (("`var'"=="yedu")&(`i'==1)) outreg2 using "$path4\Table3.txt", replace se nonotes nocons noaster nolabel text addstat(Mean,`mean') keep(c.sdy_density#c.treat male han_ethn) sortvar(c.sdy_density#c.treat male han_ethn)
if (("`var'"!="yedu")|(`i'!=1)) outreg2 using "$path4\Table3.txt", append  se nonotes nocons noaster nolabel text addstat(Mean,`mean') keep(c.sdy_density#c.treat male han_ethn) sortvar(c.sdy_density#c.treat male han_ethn)
}
}

keep if rural==1
drop rural // for the remaining analysis, we only use the rural sample

gen treat_placebo = inrange(year_birth,1951,1955) if inrange(year_birth,1946,1955)

reghdfe yedu c.sdy_density#c.treat_placebo male han_ethn, absorb($var_abs_cohort) cluster(region1990)
outreg2 using "$path4\Table3.txt", append se nonotes nocons noaster nolabel text keep(c.sdy_density#c.treat c.sdy_density#c.treat_placebo male han_ethn) sortvar(c.sdy_density#c.treat c.sdy_density#c.treat_placebo male han_ethn)

drop treat_placebo


****************************************************************************************
*Inputs for Figure 3: Effect of SDYs on the Educational Attainment of Different Cohorts*
*Panel B: Census 1990                                                                  *
****************************************************************************************
compress

forvalues y = 1946/1969 {
gen I`y' = sdy_density*[year_birth==`y']
}

reghdfe yedu I1946-I1969 male han_ethn, absorb($var_abs_cohort2) cluster(region1990)
outreg2 using "$path4\Figure3.txt", replace sideway noparen se nonotes nocons noaster nolabel text keep(I1946-I1969) sortvar(I1946-I1969)

drop I1946-I1969
drop if inrange(year_birth,1941,1945) // these cohorts serve as the baseline in the above regression, and will not be used in the following analysis.
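For Figure 3, the do-file replaces the single `treat` dummy with one regressor per birth cohort, `I<y> = sdy_density × 1[year_birth = y]`. The 1941-1945 cohorts get no regressor, so they serve as the baseline, as the comment on the `drop` line says. Each coefficient then traces how much more strongly schooling in cohort y rises with SDY density than it does in that baseline.

The hypothetical Python helper below builds those regressors for one person. The 1982 and 2000 runs use the same construction with cohort ranges 1946-1962 and 1946-1979.

```python
def cohort_interactions(sdy_density, year_birth, first=1946, last=1969):
    """Figure 3 regressors for one person: I<y> = sdy_density * 1[year_birth == y].

    Cohorts outside first..last (1941-1945 in the 1990 sample) get all zeros,
    so they form the omitted baseline group.
    """
    return {f"I{y}": sdy_density * (year_birth == y) for y in range(first, last + 1)}


row_1960 = cohort_interactions(0.8, 1960)
row_1943 = cohort_interactions(0.8, 1943)
print(len(row_1960))                                  # 24
print({k: v for k, v in row_1960.items() if v != 0})  # {'I1960': 0.8}
print(sum(row_1943.values()))                         # 0.0
```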

***************************************
*Table 4: Heterogeneous Effect of SDYs*
***************************************
forvalues i = 1/8 {
if (`i'==1) reghdfe yedu             c.sdy_density#c.treat male han_ethn if male==1                   , absorb($var_abs_cohort) cluster(region1990)
if (`i'==2) reghdfe yedu             c.sdy_density#c.treat male han_ethn if male==0                   , absorb($var_abs_cohort) cluster(region1990)

if (`i'==3) reghdfe yedu             c.sdy_density#c.treat male han_ethn if edu_base<5.5              , absorb($var_abs_cohort) cluster(region1990)
if (`i'==4) reghdfe yedu             c.sdy_density#c.treat male han_ethn if (edu_base>=5.5&edu_base<.), absorb($var_abs_cohort) cluster(region1990)

if (`i'==5) reghdfe primary_graduate c.sdy_density#c.treat male han_ethn if edu_base<5.5              , absorb($var_abs_cohort) cluster(region1990)
if (`i'==6) reghdfe junior_graduate  c.sdy_density#c.treat male han_ethn if edu_base<5.5              , absorb($var_abs_cohort) cluster(region1990)

if (`i'==7) reghdfe primary_graduate c.sdy_density#c.treat male han_ethn if (edu_base>=5.5&edu_base<.), absorb($var_abs_cohort) cluster(region1990)
if (`i'==8) reghdfe junior_graduate  c.sdy_density#c.treat male han_ethn if (edu_base>=5.5&edu_base<.), absorb($var_abs_cohort) cluster(region1990)

summ yedu             if e(sample)&treat==0
local mean1 = r(mean)
summ primary_graduate if e(sample)&treat==0
local mean2 = r(mean)
summ junior_graduate  if e(sample)&treat==0
local mean3 = r(mean)

if (`i'==1) outreg2 using "$path4\Table4.txt", replace se nonotes nocons noaster nolabel text addstat(Mean1,`mean1',Mean2,`mean2',Mean3,`mean3') keep(c.sdy_density#c.treat)
if (`i'!=1) outreg2 using "$path4\Table4.txt", append  se nonotes nocons noaster nolabel text addstat(Mean1,`mean1',Mean2,`mean2',Mean3,`mean3') keep(c.sdy_density#c.treat)
}


********************************************************************************
*                                                                              *
*          Step 3: Contemporaneous Events and Other Outcome Variables          *
*                                                                              *
********************************************************************************
*************************************************
*Table 6: Addressing Various Confounding Factors*
*************************************************
*grain productivity
replace grain_output = grain_output/pop1964

*Cultural Revolution
replace victims_cr = victims_cr/pop1964

gen treat_cr1 = inrange(year_birth,1954,1961)
gen treat_cr2 = inrange(year_birth,1962,1968)

*great famine
gen famine_cohort1 = inrange(year_birth,1955,1958)
gen famine_cohort2 = inrange(year_birth,1959,1961)

*prepare for the interaction terms between school expansion program and SDY
reghdfe yedu c.sdy_density#c.treat c.speed_primary_density#c.treat c.speed_secondary_density#c.treat male han_ethn, absorb($var_abs_cohort) cluster(region1990)

summ speed_secondary_density if e(sample)==1
gen DV_secondary_density = speed_secondary_density - r(mean)

summ speed_primary_density if e(sample)==1
gen DV_primary_density = speed_primary_density - r(mean)

summ sdy_density if e(sample)==1
gen DV_sdy_density = sdy_density - r(mean)

local newvar1  "c.grain_output#c.treat"
local newvar2  "c.speed_primary_density#c.treat c.speed_secondary_density#c.treat"
local newvar3  "c.speed_primary_density#c.treat c.speed_secondary_density#c.treat c.DV_sdy_density#c.treat#c.DV_primary_density c.DV_sdy_density#c.treat#c.DV_secondary_density"
local newvar4  "c.victims_cr#c.treat_cr1 c.victims_cr#c.treat_cr2"
local newvar5  "c.ins_famine#c.famine_cohort1 c.ins_famine#c.famine_cohort2"
local newvar6  "`newvar1' `newvar2' `newvar4' `newvar5'"
local newvar_r c.grain_output#c.treat c.speed_primary_density#c.treat c.speed_secondary_density#c.treat c.DV_sdy_density#c.treat#c.DV_primary_density c.DV_sdy_density#c.treat#c.DV_secondary_density ///
c.victims_cr#c.treat_cr1 c.victims_cr#c.treat_cr2 c.ins_famine#c.famine_cohort1 c.ins_famine#c.famine_cohort2

capture gen sample = .

forvalues i = 1/6 {
if (`i'==1) local comm "replace"
if (`i'!=1) local comm "append"

reghdfe yedu c.sdy_density#c.treat `newvar`i'' male han_ethn, absorb($var_abs_cohort) cluster(region1990)
outreg2 using "$path4\Table6_A.txt", `comm' se nonotes nocons noaster nolabel text keep(c.sdy_density#c.treat `newvar_r') sortvar(c.sdy_density#c.treat `newvar_r') 

replace sample = e(sample)
reghdfe yedu c.sdy_density#c.treat             male han_ethn if sample == 1, absorb($var_abs_cohort) cluster(region1990)
outreg2 using "$path4\Table6_B.txt", `comm' se nonotes nocons noaster nolabel text keep(c.sdy_density#c.treat)
}
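Table 6, column (3) adds triple interactions of SDY density with the school-construction speeds. It builds them from mean-centered copies (`DV_sdy_density`, `DV_primary_density`, `DV_secondary_density`), each centered on the estimation sample.

Centering does not change the fitted values. Expanding `(s - s̄)(m - m̄)T` produces only extra `s·T`, `m·T` and `T` terms, and all of these are already in the model or absorbed by the cohort fixed effects. Centering simply moves part of the interaction into the main terms. The reported coefficient on `c.sdy_density#c.treat` then becomes the SDY effect at the sample-mean construction speed.

The Python sketch below uses one moderator and hypothetical coefficients. It maps the uncentered parameters to the centered ones and checks that both parametrizations give the same predictions. With both speed measures in the model, the same shift applies once per moderator, so the main effect becomes `a + b_p·m̄_p + b_s·m̄_s`.

```python
import math


def recenter(a, c, b, s_bar, m_bar):
    """Map y = a*s*T + c*m*T + b*s*m*T to
    y = a2*s*T + c2*m*T + b*(s - s_bar)*(m - m_bar)*T + d2*T,
    with s = sdy_density, m = construction speed, T = treat.
    """
    a2 = a + b * m_bar        # SDY effect evaluated at the mean construction speed
    c2 = c + b * s_bar
    d2 = -b * s_bar * m_bar   # loads on T alone: absorbed by the cohort fixed effects
    return a2, c2, d2


def fit_raw(s, m, T, a, c, b):
    return a * s * T + c * m * T + b * s * m * T


def fit_centered(s, m, T, a2, c2, b, d2, s_bar, m_bar):
    return a2 * s * T + c2 * m * T + b * (s - s_bar) * (m - m_bar) * T + d2 * T


a, c, b, s_bar, m_bar = 0.5, 0.25, 0.25, 1.5, 3.0  # hypothetical values
a2, c2, d2 = recenter(a, c, b, s_bar, m_bar)
print(a2, c2, d2)  # 1.25 0.625 -1.125

points = [(0.0, 0.0, 1), (2.0, 5.0, 1), (1.0, 4.0, 0), (3.0, 1.0, 1)]
same = all(
    math.isclose(fit_raw(s, m, T, a, c, b),
                 fit_centered(s, m, T, a2, c2, b, d2, s_bar, m_bar), abs_tol=1e-12)
    for s, m, T in points
)
print(same)  # True
```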

**********************************************************************
*Table 8: The Lasting Effect of SDYs on Outcomes other than Education*
*Columns (1)--(7)                                                    *
**********************************************************************
gen senior_high = [yedu > 9] if yedu>=9 & yedu<. 
/*According to our definition of yedu, junior high graduates receive 9 years of education.
Going beyond 9 years of education is equivalent to going beyond junior high education.*/

gen occ_highskill = inlist(occisco,2,3) if !inlist(occisco,1,99)

forvalues i = 1/7 {
if (`i'==1) reghdfe senior_high   c.sdy_density#c.treat      male han_ethn, absorb($var_abs_cohort) cluster(region1990)

if (`i'==2) reghdfe laborforce    c.sdy_density#c.treat      male han_ethn, absorb($var_abs_cohort) cluster(region1990)
if (`i'==3) reghdfe laborforce    c.sdy_density#c.treat yedu male han_ethn, absorb($var_abs_cohort) cluster(region1990)

if (`i'==4) reghdfe occ_highskill c.sdy_density#c.treat      male han_ethn, absorb($var_abs_cohort) cluster(region1990)
if (`i'==5) reghdfe occ_highskill c.sdy_density#c.treat yedu male han_ethn, absorb($var_abs_cohort) cluster(region1990)

if (`i'==6) reghdfe teacher       c.sdy_density#c.treat      male han_ethn, absorb($var_abs_cohort) cluster(region1990)
if (`i'==7) reghdfe teacher       c.sdy_density#c.treat yedu male han_ethn, absorb($var_abs_cohort) cluster(region1990)

summ `e(depvar)' if e(sample)&treat==0
local mean = r(mean)

if (`i'==1) outreg2 using "$path4\Table8.txt", replace se nonotes nocons noaster nolabel text addstat(Mean,`mean') keep(c.sdy_density#c.treat yedu) sortvar(c.sdy_density#c.treat yedu)
if (`i'!=1) outreg2 using "$path4\Table8.txt", append  se nonotes nocons noaster nolabel text addstat(Mean,`mean') keep(c.sdy_density#c.treat yedu) sortvar(c.sdy_density#c.treat yedu)
}
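Table 8 defines two outcomes on restricted samples using Stata's `generate newvar = exp if cond` idiom. That idiom leaves `newvar` missing wherever `cond` is false, so those observations drop out of the regressions:

- `senior_high` is defined only for people with at least 9 years of schooling, that is, junior-high graduates according to the do-file comment.
- `occ_highskill` excludes `occisco` codes 1 and 99.

Below is a Python sketch of the same rules, with `None` standing in for Stata's missing value. It assumes integer `occisco` codes. What each code means is defined in the cleaned census file and is not shown here.

```python
def senior_high(yedu):
    """gen senior_high = [yedu > 9] if yedu>=9 & yedu<.   (None plays Stata's missing)."""
    if yedu is None or yedu < 9:
        return None          # outside the sample: fewer than 9 years, or missing
    return int(yedu > 9)     # 1 = schooling beyond junior high, 0 = exactly 9 years


def occ_highskill(occisco):
    """gen occ_highskill = inlist(occisco,2,3) if !inlist(occisco,1,99)."""
    if occisco in (1, 99):
        return None          # codes 1 and 99 are excluded from the sample
    return int(occisco in (2, 3))


print([senior_high(y) for y in (6, 9, 12, None)])    # [None, 0, 1, None]
print([occ_highskill(o) for o in (1, 2, 3, 6, 99)])  # [None, 1, 1, 0, None]
```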




Code for the other tables in the paper:

/*
Output files:
Table3.txt Column (8) (The Effect of SDYs on the Educational Attainment of Rural Children)
Table5.txt (Effects of SDYs on the Supply of Local Teachers and Educational Fiscal Expenses, 1955--1977)
Table7_A.txt, Table7_B.txt (The Effect of SDYs on Local People's Locus of Control)
Table8.txt Columns (8)--(10) (The Lasting Effect of SDYs on Outcomes other than Education)

Figure3.txt (inputs for Figure 3, Panel A: Census 1982, and Panel C: Census 2000)
*/
********************************************************************************
*                                                                              *
*                    Step 1: Analysis using the 1982 Census                    *
*                                                                              *
********************************************************************************
global var_abs_cohort2 "region1982 prov#year_birth c.primary_base_older#year_birth c.junior_base_older#year_birth"

use "$path1B\census_1982_clean.dta", clear


****************************************************************************************
*Inputs for Figure 3: Effect of SDYs on the Educational Attainment of Different Cohorts*
*Panel A: Census 1982                                                                  *
****************************************************************************************
forvalues y = 1946/1962 {
gen I`y' = sdy_density*[year_birth==`y']
}

reghdfe yedu I1946-I1962 male han_ethn, absorb($var_abs_cohort2) cluster(region1982)
outreg2 using "$path4\Figure3.txt", append sideway noparen se nonotes nocons noaster nolabel text keep(I1946-I1962) sortvar(I1946-I1962)





********************************************************************************
*                                                                              *
*                    Step 2: Analysis using the 2000 Census                    *
*                                                                              *
********************************************************************************
global var_abs_cohort  "region2000 prov#year_birth c.primary_base#year_birth c.junior_base#year_birth"
global var_abs_cohort2 "region2000 prov#year_birth c.primary_base_older#year_birth c.junior_base_older#year_birth"

use "$path1B\census_2000_clean.dta", clear


*****************************************************************************
*Table 3: The Effect of SDYs on the Educational Attainment of Rural Children*
*Columns (8)                                                                *
*****************************************************************************
gen treat_placebo = inrange(year_birth,1975,1979) if inrange(year_birth,1970,1979)

reghdfe yedu c.sdy_density#c.treat_placebo male han_ethn, absorb($var_abs_cohort) cluster(region2000)
outreg2 using "$path4\Table3.txt", append se nonotes nocons noaster nolabel text keep(c.sdy_density#c.treat c.sdy_density#c.treat_placebo male han_ethn) sortvar(c.sdy_density#c.treat c.sdy_density#c.treat_placebo male han_ethn)

drop treat_placebo

****************************************************************************************
*Inputs for Figure 3: Effect of SDYs on the Educational Attainment of Different Cohorts*
*Panel C: Census 2000                                                                  *
****************************************************************************************
forvalues y = 1946/1979 {
gen I`y' = sdy_density*[year_birth==`y']
}

reghdfe yedu I1946-I1979 male han_ethn, absorb($var_abs_cohort2) cluster(region2000)
outreg2 using "$path4\Figure3.txt", append sideway noparen se nonotes nocons noaster nolabel text keep(I1946-I1979) sortvar(I1946-I1979)

drop I1946-I1979

**********************************************************************
*Table 8: The Lasting Effect of SDYs on Outcomes other than Education*
*Columns (8)--(9)                                                    *
**********************************************************************
gen treat = inrange(year_birth,1956,1969) if inrange(year_birth,1946,1969)

forvalues i = 1/2 {
if (`i'==1) reghdfe age_marry1st c.sdy_density#c.treat male han_ethn, absorb($var_abs_cohort) cluster(region2000)
if (`i'==2) reghdfe n_child      c.sdy_density#c.treat male han_ethn, absorb($var_abs_cohort) cluster(region2000)
summ `e(depvar)' if e(sample)&treat==0
local mean = r(mean)
outreg2 using "$path4\Table8.txt", append se nonotes nocons noaster nolabel text addstat(Mean,`mean') keep(c.sdy_density#c.treat)
}

********************************************************************************
*                                                                              *
*                    Step 3: Analysis using the 2010 Census                    *
*                                                                              *
********************************************************************************
global var_abs_cohort  "region2010 prov#year_birth c.primary_base#year_birth c.junior_base#year_birth"

use "$path1B\census_2010_clean.dta", clear
rename treat_p treat

**********************************************************************
*Table 8: The Lasting Effect of SDYs on Outcomes other than Education*
*Columns (10)                                                        *
**********************************************************************
reghdfe yedu c.sdy_density#c.treat male han_ethn, absorb($var_abs_cohort) cluster(region2010)
summ `e(depvar)' if e(sample)&treat==0
local mean = r(mean)
outreg2 using "$path4\Table8.txt", append se nonotes nocons noaster nolabel text addstat(Mean,`mean') keep(c.sdy_density#c.treat) sortvar(c.sdy_density#c.treat)


********************************************************************************
*                                                                              *
*                    Step 4: Analysis using the 2010 CFPS                      *
*                                                                              *
********************************************************************************
global var_abs_cohort "region2010_h prov#year_birth c.primary_base#year_birth c.junior_base#year_birth"

use "$path1B\CFPS_2010_clean.dta", clear

gen treat = inrange(year_birth,1956,1969) if inrange(year_birth,1946,1969)

eststo clear
foreach var of varlist LOC LOC_education LOC_talent LOC_effort LOC_hard_work LOC_intellect LOC_F_SES LOC_F_wealth LOC_F_connection LOC_luck LOC_connection {
eststo: reghdfe `var' c.sdy_density#c.treat male han_ethn, vce(cluster region2010_h) absorb($var_abs_cohort)
}
outreg2 [*] using "$path4\Table7_A.txt", se nonotes nocons noaster nolabel bdec(3) text replace keep(c.sdy_density#c.treat) sortvar(c.sdy_density#c.treat)

eststo clear
foreach var of varlist LOC LOC_education LOC_talent LOC_effort LOC_hard_work LOC_intellect LOC_F_SES LOC_F_wealth LOC_F_connection LOC_luck LOC_connection {
eststo: reghdfe `var' c.sdy_density#c.treat male han_ethn yedu, vce(cluster region2010_h) absorb($var_abs_cohort)
}
outreg2 [*] using "$path4\Table7_B.txt", se nonotes nocons noaster nolabel bdec(3) text replace keep(c.sdy_density#c.treat yedu) sortvar(c.sdy_density#c.treat yedu)


********************************************************************************
*                                                                              *
*          Step 5: Analysis using our county-by-year data                      *
*                                                                              *
********************************************************************************
use "$path1B\county_year_data.dta", clear

keep if inrange(year,1955,1977)
drop if sdy_density == .

gen postSDY = [year >= 1968] if inrange(year,1955,1977)

foreach i in pri sec {
foreach j in total state nonst {
gen ratio_`i'_`j' = tch_`i'_`j'/pop1964
}
}

gen fiscal_edu_pc = log(10000*fiscal_edu/pop1964)

eststo clear

forvalues i = 1/7 {
if (`i'==1) reghdfe ratio_pri_total c.sdy_density#c.postSDY, cluster(countyid) absorb(countyid year#prov)
if (`i'==2) reghdfe ratio_pri_state c.sdy_density#c.postSDY, cluster(countyid) absorb(countyid year#prov)
if (`i'==3) reghdfe ratio_pri_nonst c.sdy_density#c.postSDY, cluster(countyid) absorb(countyid year#prov)
if (`i'==4) reghdfe ratio_sec_total c.sdy_density#c.postSDY, cluster(countyid) absorb(countyid year#prov)
if (`i'==5) reghdfe ratio_sec_state c.sdy_density#c.postSDY, cluster(countyid) absorb(countyid year#prov)
if (`i'==6) reghdfe ratio_sec_nonst c.sdy_density#c.postSDY, cluster(countyid) absorb(countyid year#prov)
if (`i'==7) reghdfe fiscal_edu_pc   c.sdy_density#c.postSDY, cluster(countyid) absorb(countyid year#prov)

unique countyid if e(sample)
local count = r(unique)

if (`i'==1) outreg2 using "$path4\Table5.txt", se nonotes nocons noaster nolabel bdec(3) text replace addstat(Ncounty,`count') keep(c.sdy_density#c.postSDY) 
if (`i'!=1) outreg2 using "$path4\Table5.txt", se nonotes nocons noaster nolabel bdec(3) text append  addstat(Ncounty,`count') keep(c.sdy_density#c.postSDY) 
}
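Table 5 switches to the county-by-year panel for 1955-1977. The outcomes are teacher counts scaled by 1964 population and the log of education spending scaled by `pop1964/10000`. If `pop1964` is in persons, the latter is spending per 10,000 residents of 1964. The treatment is `sdy_density × postSDY`, estimated with county and province-by-year fixed effects.

The Python sketch below reproduces only the variable construction for one county-year. The argument names mirror the do-file. The units of `fiscal_edu` are not stated in the code, so the 10000 scaling is copied as-is.

```python
import math


def county_year_row(year, tch_pri_total, fiscal_edu, pop1964):
    """Table 5 variables for one county-year; None outside 1955-1977 (dropped in the do-file)."""
    if not 1955 <= year <= 1977:
        return None
    return {
        "postSDY": int(year >= 1968),                # 1 from 1968 on, as in the do-file
        "ratio_pri_total": tch_pri_total / pop1964,  # primary teachers / 1964 population
        "fiscal_edu_pc": math.log(10000 * fiscal_edu / pop1964),
    }


print(county_year_row(1967, 500, 50, 250_000)["postSDY"])  # 0
row = county_year_row(1968, 500, 50, 250_000)
print(row["postSDY"], row["ratio_pri_total"])  # 1 0.002
print(round(row["fiscal_edu_pc"], 4))          # 0.6931
print(county_year_row(1980, 500, 50, 250_000))  # None
```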




The code below produces the figures in the paper:

/*
Output files:
Figure1.pdf (Number of SDYs by Resettlement, 1962--1979)
Figure3.pdf (Effect of SDYs on the Educational Attainment of Different Cohorts)
*/
**************************************************************************
*Figure 1: Number of SDYs by Resettlement, 1962--1979 (Source: Gu (2009))*
**************************************************************************
clear
input str9 year total rural_village collective_farm state_farm
1962-1966 129.28 87.06 0     42.22
1967-1968 199.68 165.96 0     33.72
1969     267.38 220.44 0     46.94
1970     106.4 74.99 0     31.41
1971     74.83 50.21 0     24.62
1972     67.39 50.26 0     17.13
1973     89.61 80.64 0     8.97
1974     172.48 119.19 34.63 18.66
1975     236.86 163.45 49.68 23.73
1976     188.03 122.86 41.51 23.66
1977     171.68 113.79 41.9 15.99
1978     48.09 26.04 18.92 3.13
1979     24.77 7.32 16.44 1.01
end
foreach var in total rural_village collective_farm state_farm {
replace `var'= `var'/100
}

gen v_temp = rural_village + collective_farm
encode year, generate(period)

twoway bar rural_village period, barw(0.6) base(0) color(gs2) ///
|| rbar v_temp rural_village period, barw(0.6) color(gs12) ///
|| rbar total v_temp period, barw(0.6) color(gs7) ///
||, graphregion(fcolor(gs16) lcolor(gs16)) plotregion(lcolor(gs16) margin(zero)) ///
     ylabel(0(0.5)3, angle(0) format(%12.1f)) ytitle("Number of SDYs (Million)", margin(medium)) ///
xlabel(1(1)13, noticks valuelabel angle(90)) xtitle("Year") ///
legend(label(1 "Rural Villages") label(2 "Collective Farms") label(3 "State Farms") ring(0) pos(2) colgap(*0.5)  ) 
graph export "$path4\Figure1.pdf",replace
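Figure 1 is a stacked bar chart with three segments per period:

- rural villages form the bottom segment;
- `v_temp = rural_village + collective_farm` marks the top of the collective-farm segment;
- state farms fill the rest, up to `total`.

The code divides every series by 100 and labels the axis in millions, so the input figures appear to be in units of 10,000 persons. That unit is inferred, not stated in the code. The Python sketch below rebuilds the segment boundaries from the same numbers. It also checks that the three categories add up to the total in every period, which they do in the input block.

```python
# Figure 1 input copied from the do-file; units appear to be 10,000 persons (assumption)
ROWS = [
    # period, total, rural_village, collective_farm, state_farm
    ("1962-1966", 129.28, 87.06, 0.0, 42.22),
    ("1967-1968", 199.68, 165.96, 0.0, 33.72),
    ("1969", 267.38, 220.44, 0.0, 46.94),
    ("1970", 106.4, 74.99, 0.0, 31.41),
    ("1971", 74.83, 50.21, 0.0, 24.62),
    ("1972", 67.39, 50.26, 0.0, 17.13),
    ("1973", 89.61, 80.64, 0.0, 8.97),
    ("1974", 172.48, 119.19, 34.63, 18.66),
    ("1975", 236.86, 163.45, 49.68, 23.73),
    ("1976", 188.03, 122.86, 41.51, 23.66),
    ("1977", 171.68, 113.79, 41.9, 15.99),
    ("1978", 48.09, 26.04, 18.92, 3.13),
    ("1979", 24.77, 7.32, 16.44, 1.01),
]


def stacked_bars(rows):
    """Return (period, [(bottom, top) for village, collective, state]) in millions."""
    bars = []
    for period, total, village, collective, state in rows:
        # the three categories must add up to the total
        assert abs(total - (village + collective + state)) < 1e-6, period
        village, collective, total = village / 100, collective / 100, total / 100
        v_temp = village + collective  # top of the collective-farm segment
        bars.append((period, [(0.0, village), (village, v_temp), (v_temp, total)]))
    return bars


bars = stacked_bars(ROWS)
peak = max(ROWS, key=lambda r: r[1])
print(peak[0], round(peak[1] / 100, 2))  # 1969 2.67
```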


*****************************************************************************
*Figure 3: Effect of SDYs on the Educational Attainment of Different Cohorts*
*****************************************************************************
insheet using "$path4\Figure3.txt", clear
keep if inrange(_n,5,38)
gen year = substr(v1,2,4)

rename (v2 v3 v4 v5 v6 v7)(coef1990 se1990 coef1982 se1982 coef2000 se2000)
destring, force replace
keep year coef* se*

reshape long coef se, i(year) j(data)
drop if coef == .

gen lb = coef - 1.96*se
gen ub = coef + 1.96*se
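* right-axis series, overlapped years in primary school: 0 up to the 1955 cohort, capped at 6, back to 0 from the 1970 cohort on
gen y_overlap = min(max(year-1955,0),max(1970-year,0),6)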
sort data year

twoway line lb year if data==1982, sort lpattern(dash) lcolor(gs8) yaxis(1) ///
|| line ub year if data==1982, sort lpattern(dash) lcolor(gs8) ///
|| line coef year if data==1982, lwidth(thick) lcolor(black)  yaxis(1) ///
|| line y_overlap year if data==1982, sort lpattern(dash_dot) lwidth(thick) lcolor(gs8) yaxis(2) ///
||, graphregion(fcolor(gs16) lcolor(gs16)) plotregion(lcolor(gs16) margin(zero)) ///
     ylabel(-4(2)8, labsize(small) angle(0) format(%12.0f) axis(1)) ytitle("Coefficients", size(small) axis(1)) ///
     ylabel(0(2)6, labsize(small) angle(0) format(%12.0f) axis(2)) ytick(-6 0(1)6 12,axis(2)) ytitle("Years of Overlap", size(small) axis(2)) ///
xlabel(1945(5)1980, labsize(small)) xtick(1945(5)1980) xtitle("Birth Cohort", size(small)) ///
xline(1955 1970, lpattern(solid) lwidth(thin) lcolor(black)) ///
title("Panel A - Census 1982", size(small) margin(small)) ///
yline(0, lpattern(solid) lwidth(thin) lcolor(black)) legend(off) fxsize(70) fysize(60)
graph save a,replace 

twoway line lb year if data==1990, lpattern(dash) lcolor(gs8) yaxis(1) ///
|| line ub year if data==1990, lpattern(dash) lcolor(gs8) ///
|| line coef year if data==1990, lwidth(thick) lcolor(black) yaxis(1) ///
|| line y_overlap year if data==1990, lpattern(dash_dot) lwidth(thick) lcolor(gs8) yaxis(2) ///
||, graphregion(fcolor(gs16) lcolor(gs16)) plotregion(lcolor(gs16) margin(zero)) ///
     ylabel(-4(2)8, labsize(small) angle(0) format(%12.0f) axis(1)) ytitle("Coefficients", size(small) axis(1)) ///
     ylabel(0(2)6, labsize(small) angle(0) format(%12.0f) axis(2)) ytick(-6 0(1)6 12,axis(2)) ytitle("Years of Overlap", size(small) axis(2)) ///
xlabel(1945(5)1980, labsize(small)) xtick(1945(5)1980) xtitle("Birth Cohort", size(small)) ///
xline(1955 1970, lpattern(solid) lwidth(thin) lcolor(black)) ///
title("Panel B - Census 1990", size(small) margin(small)) ///
yline(0, lpattern(solid) lwidth(thin) lcolor(black)) legend(off) fxsize(70) fysize(60)
graph save b,replace 

twoway line lb year if data==2000, lpattern(dash) lcolor(gs8) yaxis(1) ///
|| line ub year if data==2000, lpattern(dash) lcolor(gs8) ///
|| line coef year if data==2000, lwidth(thick) lcolor(black) yaxis(1) ///
|| line y_overlap year if data==2000, lpattern(dash_dot) lwidth(thick) lcolor(gs8) yaxis(2) ///
||, graphregion(fcolor(gs16) lcolor(gs16)) plotregion(lcolor(gs16) margin(zero)) ///
     ylabel(-3(1)6, labsize(small) angle(0) format(%12.0f) axis(1)) ytitle("Coefficients", size(small) axis(1)) ///
     ylabel(0(2)6, labsize(small) angle(0) format(%12.0f) axis(2)) ytick(-6 0(1)6 12,axis(2)) ytitle("Years of Overlap", size(small) axis(2)) ///
xlabel(1945(5)1980, labsize(small)) xtick(1945(5)1980) xtitle("Birth Cohort", size(small)) ///
legend(order(3 1 4)label(3 "Coefficient") label(1 "95% CI") label(4 "Overlapped Years in""Primary Schools") col(2) size(small) margin(tiny)) ///
xline(1955 1970, lpattern(solid) lwidth(thin) lcolor(black)) ///
title("Panel C - Census 2000", size(small) margin(small)) ///
yline(0, lpattern(solid) lwidth(thin) lcolor(black)) fxsize(65) fysize(80)
graph save c,replace 

twoway || connected coef year if data==1982, lwidth(medthick) msymbol(triangle) color(black) ///
|| line coef year if data==1990, lwidth(medthick) color(gs6) ///
|| connected coef year if data==2000, lwidth(medthick) msymbol(square) color(gs12) ///
||, graphregion(fcolor(gs16) lcolor(gs16)) plotregion(lcolor(gs16) margin(zero)) ///
     ylabel(-2(1)5, labsize(small) angle(0) format(%12.0f)) ytitle("Coefficients", size(small)) ///
xlabel(1945(5)1980, labsize(small)) xtick(1945(5)1980) xtitle("Birth Cohort", size(small)) ///
legend(label(1 "Census 1982") label(2 "Census 1990") label(3 "Census 2000") col(2) size(small)) ///
xline(1955 1970, lpattern(solid) lwidth(thin) lcolor(black)) ///
title("Panel D - Three Censuses in One Graph", size(small) margin(small)) ///
yline(0, lpattern(solid) lwidth(thin) lcolor(black)) fxsize(70) fysize(80)
graph save d,replace 


graph combine a.gph b.gph c.gph d.gph, graphregion(fcolor(gs16) lcolor(gs16))
graph export "$path4\Figure3.pdf",replace

erase a.gph
erase b.gph
erase c.gph
erase d.gph
erase "$path4\Figure3.txt"
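The Figure 3 step reads the outreg2 output back in and does three things:

- it recovers the cohort from row labels such as `I1960` with `substr(v1,2,4)`;
- it draws the 95% bands as coef ± 1.96·se;
- it plots the overlap measure `min(max(year-1955,0), max(1970-year,0), 6)` on the right axis.

Below are the same three pieces in Python, as a sketch for checking values by hand. The row-label format is inferred from the `keep(I1946-I1969)` option and is an assumption.

```python
def y_overlap(year_birth):
    """Same formula as the do-file: min(max(year-1955,0), max(1970-year,0), 6)."""
    return min(max(year_birth - 1955, 0), max(1970 - year_birth, 0), 6)


def ci95(coef, se):
    """Lower and upper ends of the dashed 95% band: coef -/+ 1.96 * se."""
    return coef - 1.96 * se, coef + 1.96 * se


def cohort_from_label(label):
    """Stata's substr(v1, 2, 4) is 1-based, so it maps to Python's label[1:5]."""
    return int(label[1:5])


print([y_overlap(y) for y in (1950, 1955, 1958, 1961, 1964, 1967, 1970, 1975)])
# [0, 0, 3, 6, 6, 3, 0, 0]
print(cohort_from_label("I1960"))  # 1960
lo, hi = ci95(1.0, 0.5)
print(round(lo, 2), round(hi, 2))  # 0.02 1.98
```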



Code for the appendix tables:

********************************************************************************
*                                                                              *
*                            Tables in Appendix A                              *
*                                                                              *
********************************************************************************
use "$path1B\census_1990_clean.dta", clear
global var_abs_cohort "region1990 prov#year_birth c.primary_base#year_birth c.junior_base#year_birth"

keep if inrange(year_birth,1946,1969)
gen treat = inrange(year_birth,1956,1969) if inrange(year_birth,1946,1969)

***********************************************************
* generate base education level for both rural and urban  *
***********************************************************
gen edu_temp_urban = yedu if treat == 0 & rural == 0 

gen prefec = floor(region1990/100)
bysort region1990: egen edu_base_urban1 = mean(edu_temp_urban)
bysort prefec    : egen edu_base_urban2 = mean(edu_temp_urban)
bysort prov      : egen edu_base_urban3 = mean(edu_temp_urban)

drop edu_temp_urban

**************************************************
* Table A1: Knowledge Gap and the Effect of SDYs *
**************************************************
forvalues i = 1/3 {
gen edu_base_diff`i' = edu_base_urban`i' - edu_base

summ edu_base_diff`i' if !missing(yedu,sdy_density,edu_base_diff`i',treat) & rural==1
gen DV_edu_base_diff`i' = edu_base_diff`i' - r(mean)

summ sdy_density if !missing(yedu,sdy_density,edu_base_diff`i',treat) & rural==1
gen DV_sdy_density = sdy_density - r(mean)

reghdfe yedu c.sdy_density#c.treat c.treat#c.edu_base_diff`i' c.DV_sdy_density#c.treat#c.DV_edu_base_diff`i' male han_ethn if rural==1, absorb($var_abs_cohort) cluster(region1990)
if (`i'==1) outreg2 using "$path4\TableA1.txt", replace se nonotes nocons noaster nolabel text keep(c.sdy_density#c.treat c.DV_sdy_density#c.treat#c.DV_edu_base_diff`i')
if (`i'!=1) outreg2 using "$path4\TableA1.txt", append  se nonotes nocons noaster nolabel text keep(c.sdy_density#c.treat c.DV_sdy_density#c.treat#c.DV_edu_base_diff`i')

drop DV_edu_base_diff`i' DV_sdy_density
}
drop edu_base_diff* edu_base_urban*
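Table A1 demeans `sdy_density` and the knowledge gap before forming the triple interaction. The sketch below uses hypothetical, noise-free data to show why. The variables `d`, `t` and `g` stand in for `sdy_density`, `treat` and `edu_base_diff`, and all coefficients are invented. The county and cohort fixed effects are replaced by the main effects of `d`, `t` and `g` for simplicity.

With the triple term built from centered variables, the coefficient on `c.d#c.t` equals the exposure effect at the sample-mean gap. The triple term still recovers how that effect changes with the gap.

```stata
* Hypothetical, noise-free data (not the paper's): d ~ sdy_density,
* t ~ treat, g ~ edu_base_diff
clear
set obs 200
gen double d = mod(_n, 10)
gen double t = _n > 100
gen double g = mod(_n, 7)
* Assumed true model: the effect of d for treated cohorts is 2 + 0.5*g
gen double y = 1 + 0.3*d + 0.4*t + 0.2*g + (2 + 0.5*g)*d*t + 0.1*t*g

* Demean both variables entering the triple interaction, as in Table A1
summarize g, meanonly
scalar gbar = r(mean)
gen double g_dm = g - scalar(gbar)
summarize d, meanonly
gen double d_dm = d - r(mean)

* Two-way terms use the raw variables; only the triple term is centered
regress y d t g c.d#c.t c.t#c.g c.d_dm#c.t#c.g_dm
* The d#t coefficient equals the effect at the mean gap, 2 + 0.5*mean(g)
display "b[d#t] = " %6.4f _b[c.d#c.t] "   2 + 0.5*mean(g) = " %6.4f 2+0.5*scalar(gbar)
```

Without demeaning, the `c.d#c.t` coefficient would instead measure the effect at a gap of zero. That point may lie outside the observed range of the gap.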


*******************************************************
* Table A3: The Effect of SDYs on Occupational Choice *
*******************************************************
forvalues i = 1/9 {
gen O`i' = [occisco==`i'] if occisco!=99
}

forvalues i = 1/9 {
reghdfe O`i' c.sdy_density#c.treat male han_ethn if rural==1, absorb($var_abs_cohort) cluster(region1990)

summ `e(depvar)' if e(sample)&treat==0
local mean = r(mean)

if (`i'==1) outreg2 using "$path4\TableA3.txt", replace se nonotes nocons noaster nolabel text addstat(Mean,`mean') keep(c.sdy_density#c.treat)
if (`i'!=1) outreg2 using "$path4\TableA3.txt", append  se nonotes nocons noaster nolabel text addstat(Mean,`mean') keep(c.sdy_density#c.treat)
}
drop O1-O9 occisco
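In Table A3, each `O`i'` is a 0/1 indicator for one occupation group, generated `if occisco!=99`. Individuals coded 99 therefore get a missing value rather than a zero, so they drop out of every occupation regression instead of being counted as "not in group `i'`".

The sketch below uses hypothetical codes and assumes that 99 marks an unknown occupation. It writes the condition with ordinary parentheses.

```stata
* Hypothetical occupation codes; 99 is assumed to mean "unknown"
clear
input occisco
 1
 2
 9
99
 2
end
* 0/1 indicator for occupation group 2; missing (not 0) when occisco == 99
gen O2 = (occisco == 2) if occisco != 99
list occisco O2, noobs
```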

********************************************************************************
*                                                                              *
*                            Tables in Appendix B                              *
*                                                                              *
********************************************************************************
*********************************************************************
* Prepare the information availability of county-by-year level data *
*********************************************************************
use "$path1B\county_year_data.dta", clear
keep if inrange(year,1955,1977)
keep if sdy_density != .

bysort countyid: egen count1 = count(tch_sec_total)
bysort countyid: egen count2 = count(tch_pri_total)
bysort countyid: egen count3 = count(fiscal_edu)

gen share_nonmissing_teacher = (count1 + count2)/46 // 23 years (1955-1977) x 2 teacher series
gen share_nonmissing_fiscal = (count3)/23           // 23 years (1955-1977) x 1 fiscal series

keep countyid share_nonmissing_teacher share_nonmissing_fiscal
duplicates drop

save "$path1A\teacher_fiscal_info.dta", replace

*****************************************
* Table B1: Count of Number of Counties *
*****************************************
use "$path1B\county_data.dta", clear

drop if region1990 == .
merge 1:1 countyid using "$path1A\teacher_fiscal_info.dta", nogenerate keep(1 3)
merge m:1 countyid using "$path1A\rural_school_expansion.dta", nogenerate keep(1 3)
replace share_nonmissing_teacher = 0 if share_nonmissing_teacher == .
replace share_nonmissing_fiscal =  0 if share_nonmissing_fiscal  == .

unique region1990
scalar r1 = r(unique) // number in Panel A, row 1

gen prov = floor(region1990/10000)
unique region1990 if !inlist(prov,11,12,31)
scalar r2 = r(unique) // number in Panel A, row 2

unique region1990 if !inlist(prov,11,12,31) & district!=1
scalar r3 = r(unique) // number in Panel A, row 3

unique region1990 if !inlist(prov,11,12,31) & district!=1 & sdy!=.
scalar r4 = r(unique) // number in Panel A, row 4

unique region1990 if !inlist(prov,11,12,31) & district!=1 & sdy!=. & pop1964!=.
scalar r5 = r(unique) // number in Panel A, row 5

*Panel B is conditional on "core counties"
keep if !inlist(prov,11,12,31) & district!=1 & sdy!=. & pop1964!=.
scalar r6 = . // empty row separating Panel A (r1-r5) from Panel B (r7-r13)

unique region1990 if grain_output !=.
scalar r9 = r(unique) // number in Panel B, row 3

unique region1990 if !missing(secondary_speed,primary_speed)
scalar r12 = r(unique) // number in Panel B, row 6

unique region1990 if !missing(victims_cr)
scalar r13 = r(unique) // number in Panel B, row 7

*The following counts do not require the county to appear in the 1990 census.
use "$path1B\county_data.dta", clear
merge 1:1 countyid using "$path1A\teacher_fiscal_info.dta", nogenerate keep(1 3)
merge m:1 countyid using "$path1A\rural_school_expansion.dta", nogenerate keep(1 3)
replace share_nonmissing_teacher = 0 if share_nonmissing_teacher == .
replace share_nonmissing_fiscal =  0 if share_nonmissing_fiscal  == .

gen prov = floor(region1990/10000)
keep if !inlist(prov,11,12,31) & district!=1 & sdy!=. & pop1964!=.

unique countyid if share_nonmissing_teacher > 0
scalar r7 = r(unique) // number in Panel B, row 1
summ share_nonmissing_teacher if share_nonmissing_teacher > 0 
scalar r8 = r(mean) // number in Panel B, row 2

unique countyid if share_nonmissing_fiscal > 0
scalar r10 = r(unique) // number in Panel B, row 4
summ share_nonmissing_fiscal if share_nonmissing_fiscal > 0 
scalar r11 = r(mean) // number in Panel B, row 5

clear
set obs 13
gen num = .
forvalues i = 1/13 {
replace num = r`i' in `i'
}
outsheet using "$path4\TableB1.txt", replace

**************************************************************************************************
* Table B2: Correlation between County-level Information Availability and County Characteristics *
**************************************************************************************************
use "$path1A\census_1990_county_char.dta", clear
merge 1:1 countyid using "$path1A\teacher_fiscal_info.dta", keep(1 3) nogenerate

gen minority = 1 - han_ethn
replace victims_cr = victims_cr/pop1964
replace share_nonmissing_teacher = 0 if share_nonmissing_teacher == .
replace share_nonmissing_fiscal  =  0 if share_nonmissing_fiscal  == .

eststo clear
eststo: reghdfe sdy_density                          primary_graduate junior_graduate minority ins_famine, vce(robust) absorb(prov)
eststo: reghdfe sdy_density              victims_cr  primary_graduate junior_graduate minority ins_famine, vce(robust) absorb(prov)

eststo: reghdfe share_nonmissing_teacher sdy_density primary_graduate junior_graduate minority ins_famine, vce(robust) absorb(prov)
eststo: reghdfe grain_info               sdy_density primary_graduate junior_graduate minority ins_famine, vce(robust) absorb(prov)
eststo: reghdfe share_nonmissing_fiscal  sdy_density primary_graduate junior_graduate minority ins_famine, vce(robust) absorb(prov)
eststo: reghdfe school_info              sdy_density primary_graduate junior_graduate minority ins_famine, vce(robust) absorb(prov)
eststo: reghdfe cr_info                  sdy_density primary_graduate junior_graduate minority ins_famine, vce(robust) absorb(prov)

global var_order "sdy_density victims_cr ins_famine primary_graduate junior_graduate minority"
outreg2 [*] using "$path4\TableB2.txt", replace se nonotes nocons noaster nolabel text keep($var_order) sortvar($var_order)

********************************************************************************
*                                                                              *
*                            Tables in Appendix C                              *
*                                                                              *
********************************************************************************
************************************************************************
* Table C1: Comparing the Number of Received SDYs from County-aggregate*
* with that from National Report in Each Province, Column (1)          *
*                                                                      *
* Note: Column (2) comes from Gu (2009)                                *
************************************************************************
use "$path1B\county_data.dta", clear

drop if region1990 == .
gen prov = floor(region1990/10000)

drop if inlist(prov,11,12,31,54) // Drop Beijing, Tianjin, Shanghai, and Tibet
replace prov = 44 if prov == 46 // Hainan was part of Guangdong during the movement
replace prov = 51 if prov == 50 // Chongqing was part of Sichuan during the movement

collapse (sum) sdy, by(prov)
replace sdy = sdy/1000
outsheet using "$path4\TableC1_column1.txt", replace

********************************************************************************
*                                                                              *
*                            Tables in Appendix D                              *
*                                                                              *
********************************************************************************
use "$path1B\census_1990_clean.dta", clear

global var_abs_cohort "region1990 prov#year_birth c.primary_base#year_birth c.junior_base#year_birth"


gen treat        = inrange(year_birth,1956,1969) if inrange(year_birth,1946,1969)
gen treat_alt    = inrange(year_birth,1956,1969) if inrange(year_birth,1946,1969)&!inrange(year_birth,1953,1955)
gen treat_junior = inrange(year_birth,1953,1969) if inrange(year_birth,1943,1969)

keep if inrange(year_birth,1943,1969) 

************************************************************
* Table D1: Robustness Check with Different Specifications *
************************************************************
*alternative densities of SDY
bysort region1990: egen cohort_size = sum(treat)
gen sdy_density_alt = sdy_density*pop1964/(100*cohort_size)
drop cohort_size

*alternative exposure to SDY
gen primary_overlap = min(max(year_birth-1955,0),max(1970-year_birth,0),6) if inrange(year_birth,1946,1969)
gen junior_overlap  = min(max(year_birth-1952,0),max(1970-year_birth,0),9) if inrange(year_birth,1943,1969)

global var_order "c.sdy_density#c.treat c.sdy_density_alt#c.treat c.sdy_density#c.primary_overlap c.sdy_density#c.treat_alt c.sdy_density#c.treat_junior c.sdy_density#c.junior_overlap"

gen temp_var = .
forvalues i = 1/8 {
if (`i'>=1&`i'<=3) local dep_var "c.sdy_density#c.treat"
if (`i'==4) local dep_var "c.sdy_density_alt#c.treat"
if (`i'==5) local dep_var "c.sdy_density#c.primary_overlap"
if (`i'==6) local dep_var "c.sdy_density#c.treat_alt"
if (`i'==7) local dep_var "c.sdy_density#c.treat_junior"
if (`i'==8) local dep_var "c.sdy_density#c.junior_overlap"

if (`i'==1) local cond "& year_birth<=1966"
if (`i'==2) local cond "& year_birth<=1963"
if (`i'==3) local cond "& year_birth<=1960"
if (`i'>3)  local cond ""

reghdfe yedu `dep_var' male han_ethn if rural==1 `cond', absorb($var_abs_cohort) cluster(region1990)
if (`i'==1) outreg2 using "$path4\TableD1_A.txt", se nonotes nocons noaster nolabel text replace keep($var_order) sortvar($var_order)
if (`i'!=1) outreg2 using "$path4\TableD1_A.txt", se nonotes nocons noaster nolabel text append  keep($var_order) sortvar($var_order)

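* To make the table easier to read, Panel B reports the coefficient corresponding to Panel A.
* Build the product explicitly (e.g. "c.sdy_density#c.treat" -> "sdy_density*treat")
* to avoid relying on factor-variable notation inside an expression.
local dep_expr = subinstr(subinstr("`dep_var'", "c.", "", .), "#", "*", .)
replace temp_var = `dep_expr'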

reghdfe yedu temp_var male han_ethn if rural==0 `cond', absorb($var_abs_cohort) cluster(region1990)
if (`i'==1) outreg2 using "$path4\TableD1_B.txt", se nonotes nocons noaster nolabel text replace keep(temp_var)
if (`i'!=1) outreg2 using "$path4\TableD1_B.txt", se nonotes nocons noaster nolabel text append  keep(temp_var)
}
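The "alternative exposure" used in Table D1 replaces the treated-cohort dummy with a capped count. The sketch below evaluates the primary-school formula on birth cohorts 1946-1969, treating its argument as the birth year. It needs no data.

The result is tent-shaped:

- zero for cohorts born in 1955 or earlier;
- rising by one per cohort from 1956;
- capped at six for the 1961-1964 cohorts;
- falling back to one for the 1969 cohort.

```stata
* Evaluate the Table D1 primary-school exposure formula on cohorts 1946-1969
clear
set obs 24
gen year_birth = 1945 + _n
gen primary_overlap = min(max(year_birth-1955,0), max(1970-year_birth,0), 6)
list year_birth primary_overlap, noobs sep(0)
```

The junior-school version has the same shape, with 1952 as the starting point and a cap of nine.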


*************************************
* Table D2: Other Robustness Checks *
*************************************

eststo clear
*drop nine Third-Frontier provinces
eststo: reghdfe yedu c.sdy_density#c.treat male han_ethn if rural==1 & !inlist(prov,51,52,61,62,42,43,53,64,65), absorb($var_abs_cohort) cluster(region1990)
eststo: reghdfe yedu c.sdy_density#c.treat male han_ethn if rural==0 & !inlist(prov,51,52,61,62,42,43,53,64,65), absorb($var_abs_cohort) cluster(region1990)

*drop five provinces whose SDY numbers do not match well between local gazetteers and national reports
eststo: reghdfe yedu c.sdy_density#c.treat male han_ethn if rural==1 & !inlist(prov,14,23,53,64,65), absorb($var_abs_cohort) cluster(region1990)
eststo: reghdfe yedu c.sdy_density#c.treat male han_ethn if rural==0 & !inlist(prov,14,23,53,64,65), absorb($var_abs_cohort) cluster(region1990)

*impose stronger assumptions on migration history
eststo: reghdfe yedu c.sdy_density#c.treat male han_ethn if rural==1 & local_1985==1, absorb($var_abs_cohort) cluster(region1990)
eststo: reghdfe yedu c.sdy_density#c.treat male han_ethn if rural==0 & local_1985==1, absorb($var_abs_cohort) cluster(region1990)

*drop individuals whose education made them eligible for a hukou conversion
eststo: reghdfe yedu c.sdy_density#c.treat male han_ethn if rural==1 & hukou_transit==0, absorb($var_abs_cohort) cluster(region1990)
eststo: reghdfe yedu c.sdy_density#c.treat male han_ethn if rural==0 & hukou_transit==0, absorb($var_abs_cohort) cluster(region1990)

*drop counties whose SDY numbers end in zero
gen last_digit = mod(sdy,10)

eststo: reghdfe yedu c.sdy_density#c.treat male han_ethn if rural==1 & last_digit!=0, absorb($var_abs_cohort) cluster(region1990)
eststo: reghdfe yedu c.sdy_density#c.treat male han_ethn if rural==0 & last_digit!=0, absorb($var_abs_cohort) cluster(region1990)

outreg2 [*] using "$path4\TableD2.txt", se nonotes nocons noaster nolabel text replace keep(c.sdy_density#c.treat)
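The last check in Table D2 drops counties whose SDY count ends in zero. A plausible reading, which is our assumption since the code comment gives no rationale, is that round numbers flag estimated rather than counted figures. If last digits were uniformly distributed, about 10% of counts would end in zero, so a much larger share would suggest rounding.

The sketch below shows the check on hypothetical counts.

```stata
* Hypothetical county SDY counts (not the paper's data)
clear
input sdy
1200
 853
3000
 417
  90
1561
end
gen last_digit = mod(sdy, 10)
gen ends_in_zero = (last_digit == 0)
summarize ends_in_zero, meanonly
scalar share_zero = r(mean)
* Under uniformly distributed last digits the expected share would be 0.10
display "share ending in zero: " %4.2f scalar(share_zero)
```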

erase "$path1A\census_1990_county_char.dta"
erase "$path1A\rural_school_expansion.dta"
erase "$path1A\teacher_fiscal_info.dta"



The following code produces the figures in the appendix:
/*
Output files:
Figure A1.pdf (Trends of Real Educational Expenditures in Local Gazetteers)
Figure A2.pdf (The Process of China's Secondary Education Expansion since the Late 1960s)
Figure C1.pdf (Benford's Law and Data Quality on SDYs)
Figure C2.pdf (Number of SDYs Estimated from CFPS 2010)
Figure E1.pdf (Estimating the Effect of SDYs using the Synthetic Control Method)
*/
********************************************************************************
*                                                                              *
*                            Figures in Appendix A                             *
*                                                                              *
********************************************************************************
************************************************************************
*Figure A1: Trends of Real Educational Expenditures in Local Gazetteers*
************************************************************************
use "$path1B\county_year_data.dta", clear
keep if inrange(year,1950,1990)
rename fiscal_edu fiscal_edu_county

collapse (mean) fiscal_edu_county, by(year)
merge 1:1 year using "$path1B\NBS_data.dta", nogenerate keepusing(fiscal_edu_national price_deflator)

replace fiscal_edu_national = fiscal_edu_national/price_deflator
replace fiscal_edu_county   = fiscal_edu_county/price_deflator

twoway line fiscal_edu_national year, yaxis(1) lcolor(black) lpattern(solid) ///
|| line fiscal_edu_county year, yaxis(2) lcolor(gs8) lpattern(dash) ///
||, graphregion(fcolor(gs16) lcolor(gs16)) plotregion(lcolor(gs16) margin(zero)) ///
     ylabel(0(50)250, angle(0) format(%12.0f) axis(1)) ytitle("National Fiscal Educational Expenditures""from NBS (100 million RMB)", margin(medium) axis(1)) ///
     ylabel(0(100)500, angle(0) format(%12.0f) axis(2)) ytitle("Educational Expenditures Per County""from Local Gazetteers (10,000 RMB)", margin(medium) axis(2)) ///
xlabel(1950(5)1990) xtick(1950(5)1990) xtitle("Year") ///
legend(label(1 "National Fiscal Educational""Expenditures from NBS") label(2 "Educational Expenditures Per County""from Local Gazetteers") col(1) size(medsmall))
graph export "$path4\FigureA1.pdf",replace


**************************************************************************************
*Figure A2: The Process of China's Secondary Education Expansion since the Late 1960s*
**************************************************************************************
use "$path1B\county_year_data.dta", clear
keep if sdy_density !=.

collapse (mean) school_primary school_secondary, by(year)

twoway line school_primary year, yaxis(1) lcolor(black) lpattern(solid) ///
|| line school_secondary year, yaxis(2) lcolor(gs8) lpattern(dash) ///
||, graphregion(fcolor(gs16) lcolor(gs16)) plotregion(lcolor(gs16) margin(zero)) ///
     ylabel(0(150)600, angle(0) format(%12.0f) axis(1)) ytitle("# Primary Schools per County", margin(medium) axis(1)) ///
     ylabel(0(20)80, angle(0) format(%12.0f) axis(2)) ytitle("# Secondary Schools per County", margin(medium) axis(2)) ///
xlabel(1950(5)1990) xtick(1950(5)1990) xtitle("Year") title("Panel A - Summary Statistics from Local Gazetteers",size(medium) margin(medium)) ///
legend(label(1 "# Primary Schools per County") label(2 "# Secondary Schools per County") col(1) size(medsmall)) 
graph save a, replace

use "$path1B\NBS_data.dta", clear

twoway line primary_stu year, yaxis(1) lcolor(black) lpattern(solid) ///
|| line secondary_stu year, yaxis(2) lcolor(gs8) lpattern(dash) ///
||, graphregion(fcolor(gs16) lcolor(gs16)) plotregion(lcolor(gs16) margin(zero)) ///
     ylabel(0(400)1600, angle(0) format(%12.0f) axis(1)) ytitle("# Primary Students per 10,000", margin(medium) axis(1)) ///
     ylabel(0(200)800, angle(0) format(%12.0f) axis(2)) ytitle("# Secondary Students per 10,000", margin(medium) axis(2)) ///
xlabel(1950(5)1990) xtick(1950(5)1990) xtitle("Year") title("Panel B - National-level Statistics",size(medium) margin(medium)) ///
legend(label(1 "# Primary Students per 10,000") label(2 "# Secondary Students per 10,000") col(1) size(medsmall))
graph save b,replace 

graph combine a.gph b.gph, rows(2) graphregion(fcolor(gs16) lcolor(gs16)) xsize(13.5) ysize(20)
graph export "$path4\FigureA2.pdf",replace

erase a.gph
erase b.gph 

********************************************************************************
*                                                                              *
*                            Figures in Appendix C                             *
*                                                                              *
********************************************************************************
***************************************************
*Figure C1: Benford's Law and Data Quality on SDYs*
***************************************************
use "$path1B\county_data.dta", clear

drop if region1990 == .
gen prov = floor(region1990/10000)
keep if !inlist(prov,11,12,31) & district!=1 & sdy_density!=. // keep the sample corresponding to our main analysis

keep sdy region1990

firstdigit sdy, percent
/*
. firstdigit sdy, percent

          n   chi-sq.  P-value   digit   observed   expected
------------------------------------------------------------
sdy    1773      6.81   0.5575       1      28.54      30.10
                                     2      17.65      17.61
                                     3      12.86      12.49
                                     4       9.53       9.69
                                     5       8.97       7.92
                                     6       6.32       6.69
                                     7       6.32       5.80
                                     8       4.74       5.12
                                     9       5.08       4.58
*/

clear
input str9 digit data benford
1      28.54      30.10
2      17.65      17.61
3      12.86      12.49
4       9.53       9.69
5       8.97       7.92
6       6.32       6.69
7       6.32       5.80
8       4.74       5.12
9       5.08       4.58
end // the input comes from the results of firstdigit, as shown above

graph bar benford data, over(digit) bargap(0)  bar(1,color(gs0)) bar(2,color(gs12)) ///
graphregion(fcolor(gs16) lcolor(gs16)) plotregion(lcolor(gs16) margin(zero)) ///
     ylabel(0(5)30, angle(0) format(%12.0f)) ytitle("Percentage Points", margin(medium)) ///
     b1title("First Digit", margin(medium)) ///
legend(label(1 "Benford's Law") label(2 "Data on SDYs") ring(0) pos(2) colgap(*0.5)) // bar 1 = benford, bar 2 = data
graph export "$path4\FigureC1.pdf",replace 
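The "expected" column printed by the user-written `firstdigit` command (from SSC) follows Benford's law, $P(d) = \log_{10}(1 + 1/d)$. The sketch below rebuilds that column from the percentages reproduced above. It also recovers the county counts implied by n = 1773 and recomputes a Pearson chi-square with 8 degrees of freedom. That this is exactly the statistic `firstdigit` reports is our assumption, but the numbers match its output.

```stata
* Rebuild the firstdigit table above from Benford's law, P(d) = log10(1 + 1/d)
clear
input digit observed
1 28.54
2 17.65
3 12.86
4  9.53
5  8.97
6  6.32
7  6.32
8  4.74
9  5.08
end
scalar nobs = 1773                                  // n reported by firstdigit
gen double benford  = log10(1 + 1/digit)            // expected share of first digit d
gen double expected = 100*benford                   // in percent, as in the table
gen n_cty = round(observed/100*scalar(nobs))        // counts implied by the rounded percentages
* Pearson chi-square against the Benford expected counts
gen double chi2_term = (n_cty - scalar(nobs)*benford)^2/(scalar(nobs)*benford)
quietly summarize chi2_term
scalar chi2_stat = r(sum)
scalar p_value   = chi2tail(8, scalar(chi2_stat))   // 9 categories -> 8 degrees of freedom
list digit observed expected n_cty, noobs sep(0)
display "chi2 = " %5.2f scalar(chi2_stat) "   p-value = " %6.4f scalar(p_value)
```

The recomputed statistic is about 6.81 with a p-value of about 0.56, in line with the `firstdigit` output above. The first-digit distribution of the SDY counts gives no sign of departing from Benford's law.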

****************************************************
*Figure C2: Number of SDYs Estimated from CFPS 2010*
****************************************************
/*Note: Computing the CFPS estimates requires the original CFPS 2010 data,
so the resulting numbers are entered directly below. They can be
replicated with the following (commented-out) code. */


/*
use "$path2B\cfps2010adult_201906",clear

rename qg101_a_1 sdy_start
rename qa1y_best year_birth

keep if qg1_s_1r==1|qg1_s_2r==1 // keep SDY sample
keep sdy_start rswt_nat

keep if inrange(sdy_start,1962,1979)
recode sdy_start (1962/1966=1)(1967/1968=2), gen(period)
replace period = sdy_start-1966 if inrange(sdy_start,1969,1979)

collapse (sum) rswt_nat, by(period)
gen cfps_impute = rswt_nat/10000
drop rswt_nat

list
*/

clear 
input str9 year sdy_national cfps_impute
1962-1966  129.28  160.266
1967-1968  199.68  254.5743
1969       267.38  162.5964
1970       106.4   113.5627
1971        74.83   77.62477
1972        67.39   84.00252
1973        89.61   52.58943
1974       172.48  170.2201
1975       236.86  209.5235
1976       188.03  173.3829
1977       171.68  105.0372
1978        48.09   32.24603
1979        24.77    3.03945
end

replace sdy_national= sdy_national/100
replace cfps_impute = cfps_impute/100

encode year, generate(period)

graph bar sdy_national cfps_impute, over(period,label(angle(90))) bargap(0) bar(1,color(gs0)) bar(2,color(gs12)) ///
graphregion(fcolor(gs16) lcolor(gs16)) plotregion(lcolor(gs16) margin(zero)) ///
     ylabel(0(0.5)3, angle(0) format(%12.1f)) ytitle("Number of SDYs (Million)", margin(medium)) ///
legend(label(1 "National Reports") label(2 "CFPS Estimates") ring(0) pos(2) colgap(*0.5)  ) 
graph export "$path4\FigureC2.pdf",replace 
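The commented-out CFPS code groups departure years into periods. Those periods must line up with the 13 rows typed into the `input` block for Figure C2: 1962-1966, 1967-1968, then each year from 1969 to 1979.

The sketch below applies the same `recode` and `replace` rules to the years 1962-1979 and confirms that they produce 13 periods.

```stata
* Departure years 1962-1979, coded into periods as in the commented-out CFPS code
clear
set obs 18
gen sdy_start = 1961 + _n
recode sdy_start (1962/1966=1)(1967/1968=2), gen(period)
replace period = sdy_start-1966 if inrange(sdy_start,1969,1979)
list sdy_start period, noobs sep(0)
```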

********************************************************************************
*                                                                              *
*                            Figures in Appendix E                             *
*                                                                              *
********************************************************************************
****************************************
* Synthetic Control, Step 1:           *
* prepare county-level characteristics *
****************************************
use "$path1B\census_1990_clean.dta", clear
keep if rural == 1

gen minority = 1 - han_ethn
gen famine = inrange(year_birth,1959,1961)
gen nonfamine = inrange(year_birth,1955,1957)

replace minority         = . if  !inrange(year_birth,1946,1955)
replace primary_graduate = . if  !inrange(year_birth,1946,1955)
replace junior_graduate  = . if  !inrange(year_birth,1946,1955)

collapse (mean) minority primary_graduate junior_graduate (sum) famine nonfamine, by(region1990)

gen ins_famine = 1 - famine/nonfamine
drop famine nonfamine
save "$path1A\census_1990_county_SC.dta", replace

****************************************************************
* Synthetic Control, Step 2:                                   *
* select counties with sufficient observations in each cohort  *
****************************************************************
use "$path1B\census_1990_clean.dta", clear

keep if inrange(year_birth,1946,1969) 
keep if rural==1

collapse (count) N = yedu, by(year_birth region1990)

drop if N < 30
bysort region1990: gen balance = _N

keep if balance == 24
drop balance

keep region1990
duplicates drop
save "$path1A\census_1990_county_list.dta", replace
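Step 2 keeps a county only if every one of the 24 cohorts (1946-1969) has at least 30 rural respondents. This makes the county-by-cohort panel fed to the synthetic control balanced.

The sketch below applies the same `collapse`, `drop` and `_N == 24` logic to hypothetical data. There are three counties, and county 3 is given only 10 respondents in the 1960 cohort.

```stata
clear
set obs 3
gen region1990 = _n                                 // three hypothetical counties
expand 24
bysort region1990: gen year_birth = 1945 + _n       // cohorts 1946-1969
expand 35                                           // 35 respondents per county-cohort
bysort region1990 year_birth: drop if region1990 == 3 & year_birth == 1960 & _n > 10
gen yedu = 9                                        // placeholder outcome
collapse (count) N = yedu, by(year_birth region1990)
drop if N < 30                                      // county 3 loses its 1960 cell
bysort region1990: gen balance = _N
keep if balance == 24                               // keep only fully balanced counties
keep region1990
duplicates drop
list, noobs
```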

*****************************************************
* Synthetic Control, Step 3:                        *
* prepare county-by-cohort data for the SC analysis *
*****************************************************
use "$path1B\county_data.dta", clear
drop if region1990 == . |countyid == .
drop region1982 region2000 region2010

*SC requires counties to have complete information
merge 1:1 region1990 using "$path1A\census_1990_county_list.dta", keep(3) nogenerate
merge 1:1 region1990 using "$path1A\census_1990_county_SC.dta", keep(3) nogenerate
replace grain_output = grain_output/pop1964

keep region1990 sdy_density grain_output minority ins_famine urbanratio64 primary_graduate junior_graduate
keep if !missing(sdy_density,grain_output,minority,ins_famine,urbanratio64)

sort sdy_density
gen N = _N
gen treat = [_n > (N+1)/2] 
drop N

tempfile temp
save `temp', replace


use "$path1B\census_1990_clean.dta", clear

keep if inrange(year_birth,1946,1969) 
keep if rural==1

collapse (mean) yedu, by(year_birth region1990)
merge m:1 region1990 using `temp', keep(3) nogenerate

tsset region1990 year_birth
save "$path1A\synth_analysis.dta", replace
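Step 3 splits counties at the median of `sdy_density` with `treat = [_n > (N+1)/2]` after sorting. The sketch below uses hypothetical, already-sorted densities with no ties to show how the rule behaves:

- with an odd number of counties, the median county goes to the control group;
- with an even number, the split is exactly half and half.

```stata
* Hypothetical: apply the Step 3 rule to 5 and to 6 counties
forvalues n = 5/6 {
    clear
    quietly set obs `n'
    gen sdy_density = _n                 // already sorted; no ties
    sort sdy_density
    gen treat = (_n > (_N+1)/2)          // same rule as treat = [_n > (N+1)/2]
    quietly count if treat
    display "`n' counties: " r(count) " treated, " `n'-r(count) " control"
}
```

If several counties share the same density, `sort` orders them arbitrarily. Sorting on a second key, for example `sort sdy_density region1990`, would make the split reproducible across runs.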

*****************************************************
* Synthetic Control, Step 4:                        *
* extended Abadie synthetic control (SC) method     *
*****************************************************
use "$path1A\synth_analysis.dta", clear

gen D = [year_birth>=1956]*treat
drop if D==.

parallel initialize 

synth_runner yedu yedu(1946(1)1955) primary_graduate(1955) junior_graduate(1955) grain_output(1955) urbanratio64(1955) minority(1955) ins_famine(1955), d(D) gen_vars parallel

effect_graphs
pval_graphs

matrix P = e(pvals_std)
save "$path1A\synth_results.dta", replace

clear
svmat P, names(matcol)
gen I = 1
reshape long Pc, i(I) j(lead)
drop I

rename Pc p_vals

tempfile temp
save `temp', replace

use "$path1A\synth_results.dta", clear
merge m:1 lead using `temp', nogenerate
save "$path1A\synth_results.dta", replace

*****************************************************************************
*Figure E1: Estimating the Effect of SDYs using the Synthetic Control Method*
*****************************************************************************
use "$path1A\synth_results.dta", clear

keep if treat == 1
collapse (mean) p_vals yedu yedu_synth, by(year_birth)
gen effect = yedu - yedu_synth

twoway line yedu year_birth, lcolor(black) || line yedu_synth year_birth, lpattern(dash) lcolor(black) ///
||, graphregion(fcolor(gs16) lcolor(gs16)) plotregion(lcolor(gs16) margin(zero)) ///
     ylabel(5(0.5)8, angle(0) format(%12.1f)) ytitle("Average Years of Education", margin(medium)) ///
xlabel(1945(5)1970) xtick(1945(5)1970) xtitle("Birth Cohort") ///
legend(label(1 "Treated (Counties with Higher""Density of SDY)") label(2 "Synthetic Control from Counties""with Lower Density of SDY") col(1) size(small) ring(0) pos(4) colgap(*0.5)) ///
xline(1955, lpattern(solid) lwidth(thin) lcolor(black))  ///
title("Treatment versus Control",size(medium) margin(medium))
graph save a,replace 

twoway line effect year_birth, yaxis(1) lcolor(black) || scatter p_vals year_birth, yaxis(2) msymbol(square) mcolor(black) ///
||, graphregion(fcolor(gs16) lcolor(gs16)) plotregion(lcolor(gs16) margin(zero)) ///
     ylabel(-0.1(0.1)0.4, angle(0) format(%12.1f) axis(1)) ytitle("Treatment Effect", margin(medium) axis(1)) ///
     ylabel(0(0.2)1, angle(0) format(%12.1f) axis(2)) ytick(-0.25 0(0.2)1,axis(2)) ytitle("p-values", margin(medium) axis(2)) ///
xlabel(1945(5)1970) xtick(1945(5)1970) xtitle("Birth Cohort") ///
legend(label(1 "Treatment Effect") label(2 "p-values") col(1) size(small) ring(0) pos(3) colgap(*0.5)) ///
xline(1955, lpattern(solid) lwidth(thin) lcolor(black))  ///
title("Treatment Effect and P-values",size(medium) margin(medium))
graph save b,replace 

graph combine a.gph b.gph, rows(1) graphregion(fcolor(gs16) lcolor(gs16)) xsize(18) ysize(8)
graph export "$path4\FigureE1.pdf",replace

erase a.gph
erase b.gph

erase "$path1A\census_1990_county_list.dta"
erase "$path1A\census_1990_county_SC.dta"
erase "$path1A\synth_analysis.dta"
erase "$path1A\synth_results.dta"
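The right panel of Figure E1 plots, for each birth cohort $b$, the average gap between treated counties and their synthetic controls. Our reading rests on two assumptions. First, `synth_runner` builds a separate donor-weighted control for each treated county. Second, it uses the standard Abadie weights, which are nonnegative, sum to one, and are chosen to match cohorts 1946-1955 on `yedu` and the listed predictors.

Under those assumptions, the `effect` variable computed after the `collapse` corresponds to the equation below. The notation is as follows:

- $\mathcal{T}$ is the set of `treat==1` (above-median density) counties, and $N_1$ is its size.
- $\mathcal{C}$ is the set of control counties.
- $Y_{ib}$ is `yedu`.
- The weighted sum is `yedu_synth`.

```latex
$$
\hat{\tau}_b = \frac{1}{N_1}\sum_{i\in\mathcal{T}}\Big(Y_{ib} - \sum_{j\in\mathcal{C}} w_{ij}\,Y_{jb}\Big),
\qquad w_{ij}\ge 0,\quad \sum_{j\in\mathcal{C}} w_{ij}=1
$$
```

Because a mean of differences equals a difference of means, collapsing `yedu` and `yedu_synth` by cohort first and then subtracting gives the same $\hat{\tau}_b$.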


The original post provides a QR code for downloading all data and programs (extraction code: jlsq).