An Introductory Tutorial on Difference-in-Differences (DID), with Code and Data
Note: the data and code for this article are available through the "Read the original" link at the bottom of the page.



1. Load the example data
use "Panel101.dta", clear
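2. Create the variables
Create a dummy variable that marks when treatment begins. Assume treatment starts in 1994: years before 1994 take the value 0, and years from 1994 onward take the value 1.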
gen time = (year>=1994) & !missing(year)
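Create a dummy variable that identifies the group exposed to treatment. In this example, assume the countries/regions coded 5, 6 and 7 were treated (=1). Countries 1-4 were not treated (=0).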
gen treated = (country>4) & !missing(country)
Multiply the time dummy by the treatment dummy. We call this interaction term "did".
gen did = time*treated
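A quick tabulation is one way to confirm that the dummies were built as intended. The sketch below does not read Panel101.dta; it builds a hypothetical balanced panel of 7 countries over 1990-1999 (an assumed layout, chosen to match the 70 observations reported in the regression output) and applies the same three `gen` commands.

```stata
* Hypothetical stand-in for Panel101.dta: 7 countries x 10 years (assumed layout)
clear
set obs 70
gen country = ceil(_n/10)
gen year    = 1989 + _n - 10*(country-1)

* Same definitions as in the tutorial
gen time    = (year>=1994) & !missing(year)
gen treated = (country>4) & !missing(country)
gen did     = time*treated

* Cross-tabulate group by period to inspect the four cells
tab treated time
```

Under this assumed layout, did equals 1 only for countries 5-7 in 1994 or later, so the number of observations with did == 1 should match the treated, post-1994 cell of the table.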
3. Estimate the DID
reg y time treated did, r

. reg y time treated did, r

Linear regression                               Number of obs     =         70
                                                F(3, 66)          =       2.17
                                                Prob > F          =     0.0998
                                                R-squared         =     0.0827
                                                Root MSE          =    3.0e+09

------------------------------------------------------------------------------
             |               Robust
           y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        time |   2.29e+09   9.00e+08     2.54   0.013     4.92e+08    4.09e+09
     treated |   1.78e+09   1.05e+09     1.70   0.094    -3.11e+08    3.86e+09
         did |  -2.52e+09   1.45e+09    -1.73   0.088    -5.42e+09    3.81e+08
       _cons |   3.58e+08   7.61e+08     0.47   0.640    -1.16e+09    1.88e+09
------------------------------------------------------------------------------
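The coefficient on did is the difference-in-differences estimate. The effect is significant at the 10% level, and the treatment has a negative effect.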
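To see why the coefficient on did is the difference-in-differences, it may help to write out the regression and the four group means it implies. The symbols β0 to β3 and ε are notation introduced here for illustration; they are not used in the original tutorial.

```latex
\begin{aligned}
y_{it} &= \beta_0 + \beta_1\,\mathrm{time}_t + \beta_2\,\mathrm{treated}_i
          + \beta_3\,(\mathrm{time}_t \times \mathrm{treated}_i) + \varepsilon_{it} \\
E[y \mid \text{control, before}] &= \beta_0 \\
E[y \mid \text{control, after}]  &= \beta_0 + \beta_1 \\
E[y \mid \text{treated, before}] &= \beta_0 + \beta_2 \\
E[y \mid \text{treated, after}]  &= \beta_0 + \beta_1 + \beta_2 + \beta_3 \\
\beta_3 &= \bigl(E[y \mid \text{treated, after}] - E[y \mid \text{treated, before}]\bigr)
         - \bigl(E[y \mid \text{control, after}] - E[y \mid \text{control, before}]\bigr)
\end{aligned}
```

In the output above, β0 corresponds to _cons (3.58e+08), β1 to time (2.29e+09), β2 to treated (1.78e+09) and β3 to did (-2.52e+09).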
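4. Method 2
With the hashtag (##) factor-variable operator there is no need to generate the interaction term by hand. Estimate with the following command: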
reg y time##treated, r

. reg y time##treated, r

Linear regression                               Number of obs     =         70
                                                F(3, 66)          =       2.17
                                                Prob > F          =     0.0998
                                                R-squared         =     0.0827
                                                Root MSE          =    3.0e+09

-------------------------------------------------------------------------------
              |               Robust
            y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
--------------+----------------------------------------------------------------
       1.time |   2.29e+09   9.00e+08     2.54   0.013     4.92e+08    4.09e+09
    1.treated |   1.78e+09   1.05e+09     1.70   0.094    -3.11e+08    3.86e+09
              |
time##treated |
         1 1  |  -2.52e+09   1.45e+09    -1.73   0.088    -5.42e+09    3.81e+08
              |
        _cons |   3.58e+08   7.61e+08     0.47   0.640    -1.16e+09    1.88e+09
-------------------------------------------------------------------------------
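The coefficient on the interaction produced by time##treated (the "1 1" row) is the DID estimator, the same quantity as the coefficient on "did" in the previous example. The effect is significant at the 10% level, and the treatment has a negative effect.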
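The equivalence of the two specifications can be checked directly. The following is a minimal sketch on simulated data, not on Panel101.dta: the panel layout, the seed and the coefficients used to generate y are all hypothetical. It relies on Stata's naming of the interaction coefficient as `_b[1.time#1.treated]` after a factor-variable regression.

```stata
* Hypothetical data: 7 countries x 10 years, outcome simulated for illustration
clear
set seed 12345
set obs 70
gen country = ceil(_n/10)
gen year    = 1989 + _n - 10*(country-1)
gen time    = (year>=1994)
gen treated = (country>4)
gen did     = time*treated
gen y       = 1 + 2*time + 1.5*treated - 2.5*did + rnormal()

* Method 1: hand-built interaction
reg y time treated did, r
scalar b_manual = _b[did]

* Method 2: factor-variable notation
reg y time##treated, r
scalar b_factor = _b[1.time#1.treated]

* The two estimates should coincide
display "manual: " b_manual "   factor: " b_factor
```

The factor-variable form has the practical advantage that postestimation commands such as margins know the two dummies are related, whereas a hand-built did is just another regressor to Stata.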
5. Using the "diff" command
The diff command is a user-written external command. To download and install it, type:
ssc install diff
Estimate with the diff command:
diff y, t(treated) p(time)
Note: the options t(treated) and p(time) in parentheses specify the treatment and time-period variables; see the "basic" method above.
diff y, t(treated) p(time)

Number of observations in the DIFF-IN-DIFF: 70
            Baseline    Follow-up
   Control: 16          24          40
   Treated: 12          18          30
            28          42
--------------------------------------------------------
 Outcome var.   | y        | S. Err. |   t     |  P>|t|
----------------+----------+---------+---------+--------
Baseline        |          |         |         |
   Control      | 3.6e+08  |         |         |
   Treated      | 2.1e+09  |         |         |
   Diff (T-C)   | 1.8e+09  | 1.1e+09 | 1.58    | 0.120
Follow-up       |          |         |         |
   Control      | 2.6e+09  |         |         |
   Treated      | 1.9e+09  |         |         |
   Diff (T-C)   | -7.4e+08 | 9.2e+08 | -0.81   | 0.422
                |          |         |         |
Diff-in-Diff    | -2.5e+09 | 1.5e+09 | -1.73   | 0.088*
--------------------------------------------------------
R-square:    0.08
* Means and Standard Errors are estimated by linear regression
**Inference: *** p<0.01; ** p<0.05; * p<0.1
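The Diff-in-Diff row can be reproduced by hand from the four cell means in the table. The sketch below simply redoes that arithmetic with Stata scalars; the scalar names are made up for illustration, and the inputs are the rounded means as printed, so the results only approximate the reported rows.

```stata
* Cell means as printed (rounded) in the diff output
* Control and Treated at Baseline
scalar c_base   = 3.6e+08
scalar t_base   = 2.1e+09
* Control and Treated at Follow-up
scalar c_follow = 2.6e+09
scalar t_follow = 1.9e+09

* Treated-minus-control gap in each period
scalar diff_base   = t_base - c_base
scalar diff_follow = t_follow - c_follow

* Difference-in-differences: change in the gap between periods
scalar did_hand = diff_follow - diff_base

display "Baseline  Diff (T-C): " diff_base
display "Follow-up Diff (T-C): " diff_follow
display "Diff-in-Diff:         " did_hand
```

The hand-computed values are 1.74e+09, -7.0e+08 and -2.44e+09, against the printed 1.8e+09, -7.4e+08 and -2.5e+09. The gaps come from the table showing each mean to only two significant digits.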
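6. Parallel trends test for DID
First generate the interactions between the year dummies and the treatment-group dummy. Here we compare the three years before and the three years after the policy.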
gen period = year - 1994
forvalues i = 3(-1)1 {
    gen pre_`i' = (period == -`i' & treated == 1)
}
gen current = (period == 0 & treated == 1)
forvalues j = 1(1)3 {
    gen time_`j' = (period == `j' & treated == 1)
}
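Next, run a regression with these interaction terms as explanatory variables and store the results under the name reg for the test that follows.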
xtreg y time treated pre_* current time_* i.year, fe
est sto reg
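xtreg requires the panel structure to be declared with xtset, a step the tutorial does not show; whether Panel101.dta is saved with that declaration is not stated here. As a numerical complement to the plot, the pre-period terms can also be tested jointly. The sketch below illustrates both points on simulated data: the panel layout, seed and data-generating coefficients are hypothetical, and the joint test is an addition that the original tutorial does not run.

```stata
* Hypothetical data: 7 countries x 10 years, outcome simulated for illustration
clear
set seed 2024
set obs 70
gen country = ceil(_n/10)
gen year    = 1989 + _n - 10*(country-1)
gen time    = (year>=1994)
gen treated = (country>4)
gen y       = 2*time + 1.5*treated - 2.5*time*treated + rnormal()

* Event-time interactions, as in the tutorial
gen period = year - 1994
forvalues i = 3(-1)1 {
    gen pre_`i' = (period == -`i' & treated == 1)
}
gen current = (period == 0 & treated == 1)
forvalues j = 1(1)3 {
    gen time_`j' = (period == `j' & treated == 1)
}

* Declare the panel, then estimate with country fixed effects
xtset country year
xtreg y time treated pre_* current time_* i.year, fe

* Joint test that all three pre-period coefficients are zero
test pre_3 pre_2 pre_1
display "p-value, joint test of pre-period terms: " r(p)
```

Stata will report time and treated as omitted in this specification, since time is collinear with the year dummies and treated does not vary within country. A large p-value on the joint test is consistent with parallel pre-trends but does not prove them.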
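Plot the results with the coefplot command and check whether the coefficients for the years before 1994 fluctuate around zero and whether the coefficients after 1994 are significantly negative.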
coefplot reg, keep(pre_* current time_*) vertical recast(connect) yline(0) xline(3, lp(dash))

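The results show that the pre-policy coefficients do fluctuate around zero, while the coefficient one year after the policy is significantly negative but quickly returns to near zero. This suggests that the treatment and control groups are indeed comparable, and that the policy effect may appear one year after enactment and then fade quickly.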
