Fixed-Effects and Random-Effects Models (2014. 6. 21.)
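Review: Assumptions for estimating panel data by pooled OLS

- The expected value of the error term is 0 for every panel unit at every time point.
- The variance of the error term is $\sigma^2$ for every panel unit at every time point (homoskedasticity): the error variance does not change across panel units or over time.
- The error terms of different panel units are uncorrelated: there is no contemporaneous correlation.
- The error terms of the same unit at different time points are uncorrelated: there is no autocorrelation (serial correlation).
- The error term is uncorrelated with the explanatory variables: the explanatory variables are exogenous.

When any of these assumptions is violated, the OLS estimator may be problematic, and in panel data the error term is likely to exhibit heteroskedasticity or autocorrelation.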
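Review: Panel data analysis with GLS

/* When the assumptions on the error covariance matrix are violated, use GLS to obtain an efficient estimator */
. xtgls fatal perinck spircons    /* GLS, generalized least squares */

Cross-sectional time-series FGLS regression

Coefficients:  generalized least squares
Panels:        homoskedastic
Correlation:   no autocorrelation

Estimated covariances      =         1          Number of obs      =       336
Estimated autocorrelations =         0          Number of groups   =        48
Estimated coefficients     =         3          Time periods       =         7
                                                Wald chi2(2)       =    131.51
Log likelihood             = -232.0139          Prob > chi2        =    0.0000

------------------------------------------------------------------------------
       fatal |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     perinck |  -.1493585   .0131226   -11.38   0.000    -.1750783   -.1236387
    spircons |   .1685464   .0432517     3.90   0.000     .0837746    .2533182
       _cons |   3.817989   .1647112    23.18   0.000     3.495161    4.140817
------------------------------------------------------------------------------

Notes:
- "Panels: homoskedastic" means homoskedasticity is assumed.
- The coefficient estimates are identical to OLS; the standard errors are slightly smaller than the OLS ones.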
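Review: Panel data analysis with GLS (heteroskedasticity across panel units)

. xtgls fatal perinck spircons, panel(hetero)

Cross-sectional time-series FGLS regression

Coefficients:  generalized least squares
Panels:        heteroskedastic
Correlation:   no autocorrelation

Estimated covariances      =        48          Number of obs      =       336
Estimated autocorrelations =         0          Number of groups   =        48
Estimated coefficients     =         3          Time periods       =         7
                                                Wald chi2(2)       =    234.36
                                                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
       fatal |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     perinck |  -.1206468   .0086248   -13.99   0.000     -.137551   -.1037425
    spircons |   .0777905   .0341464     2.28   0.023     .0108648    .1447161
       _cons |   3.544183   .1024056    34.61   0.000     3.343472    3.744894
------------------------------------------------------------------------------

Notes:
- "Panels: heteroskedastic" means heteroskedasticity across panel units is assumed.
- "Estimated covariances = 48" means an error variance was estimated for each of the 48 panel units.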
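Review: Panel data analysis with GLS (heteroskedasticity across panel units), stored results

. xtgls fatal perinck spircons, panel(hetero)

/* After estimation, check what is stored in e-class results. */
. ereturn list

scalars:
                  e(N) =  336     (total sample size, 48 x 7)
                e(N_g) =  48      (number of panel units)
                e(N_t) =  7       (largest number of time-series observations among panel units)
              e(g_min) =  7
              e(g_avg) =  7
              e(g_max) =  7

/* List the actual values of the matrix: the estimated error covariance matrix */
. mat list e(Sigma)

symmetric e(Sigma)[48,48]
            c1         c2         c3         c4         c5
r1   .09190281
r2           0  .46693412
r3           0          0  .05563616
r4           0          0          0  .11937386
r5           0          0          0          0  .01913816

(Excerpt: only the first five rows and columns are shown. The off-diagonal elements are 0 because only panel-specific variances are estimated.)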
Review: Testing for heteroskedasticity across panel units

/* Restricted model: the error variance is the same for all panel groups. */
. xtgls fatal perinck spircons
. estimates store R_model
  (Log likelihood = -232.0139)

/* Unrestricted model: the error variance differs across panel groups. */
. xtgls fatal perinck spircons, panel(hetero) igls nolog
. estimates store UR_model
  (Log likelihood = -125.482)

/* LR (likelihood ratio) test */
. lrtest UR_model R_model, df(47)

Likelihood-ratio test                                 LR chi2(47) =    213.06
(Assumption: R_model nested in UR_model)              Prob > chi2 =    0.0000

Notes:
- df(47): 48 variances (unrestricted model) minus 1 variance (restricted model).
- The null hypothesis of homoskedastic errors is rejected.
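The LR statistic reported by lrtest can be checked by hand from the two log likelihoods shown on this slide. The following is a minimal Python sketch of that arithmetic only; the variable names are illustrative and are not Stata results.

```python
# Log likelihoods copied from the two xtgls runs on the slide.
ll_restricted = -232.0139    # homoskedastic errors (1 variance)
ll_unrestricted = -125.482   # panel-specific variances (48 variances)

# LR = 2 * (lnL_unrestricted - lnL_restricted)
lr_stat = 2 * (ll_unrestricted - ll_restricted)

# Degrees of freedom = number of extra variance parameters.
n_panels = 48
df = n_panels - 1

print(f"LR chi2({df}) = {lr_stat:.2f}")
```

The printed value agrees with the LR chi2(47) = 213.06 in the lrtest output.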
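Panel data analysis with GLS (contemporaneous correlation)

Contemporaneous correlation: $\mathrm{corr}(\epsilon_{it}, \epsilon_{jt}) \neq 0$ for all $i \neq j$.

At time $t$, the error terms of different panel units are correlated with each other. Heteroskedasticity is also assumed.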
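Panel data analysis with GLS (contemporaneous correlation)

. xtgls fatal perinck spircons, panel(corr)

Cross-sectional time-series FGLS regression

Coefficients:  generalized least squares
Panels:        heteroskedastic with cross-sectional correlation
Correlation:   no autocorrelation

Estimated covariances      =      1176          Number of obs      =       336
Estimated autocorrelations =         0          Number of groups   =        48
Estimated coefficients     =         3          Time periods       =         7
                                                Wald chi2(2)       =     22.85
                                                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
       fatal |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     perinck |  -.1340037   .0282983    -4.74   0.000    -.1894673   -.0785401
    spircons |   .1456812   .0765297     1.90   0.057    -.0043144    .2956767
       _cons |   3.654852    .322744    11.32   0.000     3.022285    4.287418
------------------------------------------------------------------------------
Note: you estimated at least as many quantities as you have observations.

Notes:
- Heteroskedasticity and contemporaneous correlation are assumed together.
- Estimated covariances: $n(n+1)/2 = (48 \times 49)/2 = 1{,}176$.
- The number of estimated parameters (1,176) exceeds the number of observations (336), so the reliability of the estimates is questionable.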
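The warning in this output follows from simple counting: a full cross-sectional covariance matrix has n(n+1)/2 distinct elements. A small Python sketch of the count, using the panel dimensions from the slide (variable names are illustrative):

```python
n_panels = 48    # number of panel units
n_periods = 7    # time periods per unit

# Total observations in the balanced panel.
n_obs = n_panels * n_periods

# Distinct elements of a symmetric n x n covariance matrix:
# n variances plus n(n-1)/2 covariances.
n_cov = n_panels * (n_panels + 1) // 2

print(n_obs, n_cov, n_cov > n_obs)
```

With 48 panels observed for only 7 periods, the covariance parameters alone outnumber the observations, which is why the estimates should be treated with caution.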
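Panel data analysis with GLS (heteroskedasticity and autocorrelation)

. xtgls fatal perinck spircons, corr(ar1) panel(hetero)

Cross-sectional time-series FGLS regression

Coefficients:  generalized least squares
Panels:        heteroskedastic
Correlation:   common AR(1) coefficient for all panels  (0.8166)

Estimated covariances      =        48          Number of obs      =       336
Estimated autocorrelations =         1          Number of groups   =        48
Estimated coefficients     =         3          Time periods       =         7
                                                Wald chi2(2)       =     27.48
                                                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
       fatal |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     perinck |  -.0563742   .0119293    -4.73   0.000    -.0797552   -.0329933
    spircons |  -.0416077   .0480484    -0.87   0.387    -.1357808    .0525655
       _cons |   2.821765   .1703268    16.57   0.000      2.48793    3.155599
------------------------------------------------------------------------------

Note: 52 parameters must be estimated (48 variances, 1 autocorrelation coefficient, and 3 regression coefficients).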
Between-Effects Model
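Between-Effects Model

This model ignores the time-series dimension of the panel data and uses only the variation between units. The coefficients are estimated from the mean of each unit's time-series observations.

$y_{it} = \alpha + \beta x_{it} + u_i + e_{it}$   (7.2)

- $i$: individual
- $t$: time
- $u_i$: error component capturing the time-invariant characteristics of panel unit $i$
- $e_{it}$: idiosyncratic error that varies over both time and panel units

$\bar{y}_i = \alpha + \beta \bar{x}_i + u_i + \bar{e}_i$   (7.4)

Equation (7.4) converts the panel data into cross-sectional data. Because it ignores the time-series information and uses only the cross-sectional information, the between estimator is not efficient.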
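The between transformation can be sketched in a few lines of Python: collapse each unit to its time averages, then run OLS on the averages. The toy panel below is made up purely for illustration, and the helper names (`group_means`, `ols_slope_intercept`) are hypothetical, not part of any library.

```python
from statistics import mean

# Hypothetical toy panel: {unit: [(x, y), ...]}; values are invented for illustration.
panel = {
    "A": [(1.0, 2.1), (2.0, 2.9), (3.0, 4.2)],
    "B": [(4.0, 6.8), (5.0, 8.1), (6.0, 9.0)],
    "C": [(2.0, 5.0), (3.0, 6.1), (4.0, 6.9)],
}

def group_means(panel):
    """Collapse each unit's time series to (mean of x, mean of y)."""
    return [
        (mean(x for x, _ in obs), mean(y for _, y in obs))
        for obs in panel.values()
    ]

def ols_slope_intercept(points):
    """One-regressor OLS on a list of (x, y) points."""
    xbar = mean(x for x, _ in points)
    ybar = mean(y for _, y in points)
    sxy = sum((x - xbar) * (y - ybar) for x, y in points)
    sxx = sum((x - xbar) ** 2 for x, _ in points)
    beta = sxy / sxx
    return beta, ybar - beta * xbar

# Between estimator: OLS on one averaged observation per unit.
means = group_means(panel)
beta_be, alpha_be = ols_slope_intercept(means)
print(means)
print(beta_be, alpha_be)
```

Note that the regression uses only three data points (one per unit), which illustrates why the between estimator discards within-unit information.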
Between-Effects Model: data

. use P_data7_1, clear
(National Longitudinal Survey.  Young Women 14-26 years of age in 1968)

. tsset id year
       panel variable:  idcode (strongly balanced)
        time variable:  year, 68 to 73
                delta:  1 unit
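Pooled OLS Model

. reg ln_wage ttl_exp tenure black

      Source |       SS       df       MS              Number of obs =    1251
-------------+------------------------------           F(  3,  1247) =   57.98
       Model |  20.0204514     3  6.67348381           Prob > F      =  0.0000
    Residual |  143.528557  1247  .115099084           R-squared     =  0.1224
-------------+------------------------------           Adj R-squared =  0.1203
       Total |  163.549009  1250  .130839207           Root MSE      =  .33926

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     ttl_exp |   .0379981   .0079441     4.78   0.000     .0224128    .0535835
      tenure |   .0207316   .0082092     2.53   0.012     .0046261     .036837
       black |   -.146449   .0220408    -6.64   0.000      -.18969   -.1032079
       _cons |   1.535136   .0204658    75.01   0.000     1.494985    1.575287
------------------------------------------------------------------------------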
Between-Effects Model: estimation

/* Between-effects model: estimated from the panel means */
. xtreg ln_wage ttl_exp tenure black, be

Between regression (regression on group means)  Number of obs      =      1251
Group variable: idcode                          Number of groups   =       213

R-sq:  within  = 0.1608                         Obs per group: min =         3
       between = 0.1005                                        avg =       5.9
       overall = 0.1200                                        max =         6

                                                F(3,209)           =      7.78
sd(u_i + avg(e_i.))=  .2904677                  Prob > F           =    0.0001

Notes:
- The between R-sq (0.1005) is the coefficient of determination of equation (7.4).
- sd(u_i + avg(e_i.)) is the standard deviation of $(u_i + \bar{e}_i)$.
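Between-Effects Model: results

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     ttl_exp |   .0258706   .0308187     0.84   0.402    -.0348847     .086626
      tenure |   .0364367   .0264689     1.38   0.170    -.0157435    .0886169
       black |  -.1424734   .0457008    -3.12   0.002    -.2325671   -.0523797
       _cons |   1.541204   .0721162    21.37   0.000     1.399036    1.683373
------------------------------------------------------------------------------

Interpretation: holding other conditions equal, if individual A has one more unit of total work experience (ttl_exp) than individual B, A earns on average about 2.6% more than B.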
Fixed-Effects Model
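Panel data analysis: the fixed-effects model

The error component $u_i$ is treated not as a random variable but as a parameter to be estimated.

$y_{it} = \alpha + \beta x_{it} + u_i + e_{it}$,   $i = 1, 2, \dots, n$ and $t = 1, 2, \dots, T$
$\phantom{y_{it}} = (\alpha + u_i) + \beta x_{it} + e_{it}$

The slope parameter $\beta$ is the same for all panel units, but the intercept $(\alpha + u_i)$ differs across panel units.

- $i$: individual
- $t$: time
- $u_i$: error component capturing the time-invariant characteristics of panel unit $i$
- $e_{it}$: idiosyncratic error that varies over both time and panel units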
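Panel data analysis: estimating the fixed-effects model

$y_{it} = \alpha + \beta x_{it} + u_i + e_{it}$   (8.1)
$\phantom{y_{it}} = (\alpha + u_i) + \beta x_{it} + e_{it}$   (8.2)
$\bar{y}_i = \alpha + \beta \bar{x}_i + u_i + \bar{e}_i$   (8.3)

Equation (8.3) is the between model built from the panel-group means.

Estimation method 1: within transformation. Subtracting (8.3) from (8.1) removes $\alpha$ and $u_i$:

$(y_{it} - \bar{y}_i) = \beta (x_{it} - \bar{x}_i) + (e_{it} - \bar{e}_i)$   (8.4)

Estimation method 2: LSDV (least squares dummy variable) estimation, using a dummy variable for each panel unit:

$y_{it} = \alpha_i + \beta x_{it} + e_{it}$

To obtain good estimates of $\alpha_i$, the number of panel units ($n$) should not be large and $T$ within each panel group should be sufficiently large.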
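The effect of the within transformation can be seen on a small, noise-free example. The Python sketch below builds a hypothetical panel in which $u_i$ is correlated with $x_{it}$ by construction; all numbers and names are invented for illustration. Pooled OLS gets the slope wrong, while demeaning by unit recovers the true $\beta$.

```python
from statistics import mean

# Hypothetical noise-free panel: y_it = u_i + 2 * x_it.
# u_i is larger for units with smaller x, so u_i and x are correlated.
true_beta = 2.0
unit_effects = {"A": 10.0, "B": 0.0, "C": -10.0}
x_values = {"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0], "C": [7.0, 8.0, 9.0]}

rows = [
    (i, x, unit_effects[i] + true_beta * x)
    for i, xs in x_values.items()
    for x in xs
]

def slope(points):
    """One-regressor OLS slope for a list of (x, y) points."""
    xbar = mean(x for x, _ in points)
    ybar = mean(y for _, y in points)
    sxy = sum((x - xbar) * (y - ybar) for x, y in points)
    sxx = sum((x - xbar) ** 2 for x, _ in points)
    return sxy / sxx

# Pooled OLS ignores u_i entirely.
pooled_beta = slope([(x, y) for _, x, y in rows])

# Within transformation: subtract each unit's own means of x and y.
xbar_i = {i: mean(x for j, x, _ in rows if j == i) for i in x_values}
ybar_i = {i: mean(y for j, _, y in rows if j == i) for i in x_values}
demeaned = [(x - xbar_i[i], y - ybar_i[i]) for i, x, y in rows]
within_beta = slope(demeaned)

print(pooled_beta, within_beta)
```

In this constructed example the pooled slope even has the wrong sign, whereas the within slope equals the true value of 2, because demeaning removes $u_i$ from the equation.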
Fixed-Effects Model (within transformation): data

. use P_data8_1, clear

. tsset company year
       panel variable:  company (strongly balanced)
        time variable:  year, 1935 to 1954
                delta:  1 unit
Fixed-Effects Model (within estimation)

. xtreg invest mvalue kstock, fe    /* fe: fixed effects */

Fixed-effects (within) regression               Number of obs      =       200
Group variable: company                         Number of groups   =        10

R-sq:  within  = 0.7668                         Obs per group: min =        20
       between = 0.8194                                        avg =      20.0
       overall = 0.8060                                        max =        20

                                                F(2,188)           =    309.01
corr(u_i, Xb)  = -0.1517                        Prob > F           =    0.0000

Notes:
- corr(u_i, Xb) is the estimated correlation between the error component $u_i$ and $x_{it}$.
- The fixed-effects model estimates an equation from which $u_i$ has been removed, so the coefficient estimates remain consistent even if $u_i$ and $x_{it}$ are correlated.
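Fixed-Effects Model (within estimation): results

------------------------------------------------------------------------------
      invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      mvalue |   .1101238   .0118567     9.29   0.000     .0867345    .1335131
      kstock |   .3100653   .0173545    17.87   0.000     .2758308    .3442999
       _cons |  -58.74393   12.45369    -4.72   0.000    -83.31086     -34.177
-------------+----------------------------------------------------------------
     sigma_u |  85.732501
     sigma_e |  52.767964
         rho |  .72525012   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(9, 188) =    49.18              Prob > F = 0.0000

Notes:
- Interpretation: when a company's mvalue increases by 1 unit, that company's invest increases by 0.11 on average.
- sigma_u is the standard deviation of $u_i$; sigma_e is the standard deviation of $e_{it}$.
- rho is the share of the total error variance that is due to the unit-specific error component. It lies between 0 and 1; a value close to 1 means it is important to account for time-invariant unit characteristics.
- F test: the null hypothesis is $u_i = 0$ for all $i$. If the null is not rejected, choose pooled OLS; if it is rejected, as it is here, choose the fixed-effects model.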
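The rho value in this output is the variance share $\sigma_u^2 / (\sigma_u^2 + \sigma_e^2)$. A short Python check using the sigma_u and sigma_e values printed on the slide (variable names are illustrative):

```python
# Standard deviations copied from the xtreg, fe output on the slide.
sigma_u = 85.732501   # unit-specific error component
sigma_e = 52.767964   # idiosyncratic error

# Fraction of the total error variance that is due to u_i.
rho = sigma_u**2 / (sigma_u**2 + sigma_e**2)

print(round(rho, 4))
```

The result reproduces the reported rho of about 0.725.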
Fixed-Effects Model (within estimation): unbalanced panel data

. use P_data8_2, clear

. tsset id year
       panel variable:  idcode (unbalanced)
        time variable:  year, 68 to 88, but with gaps
                delta:  1 unit
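Fixed-Effects Model (within estimation): time-invariant regressors

$y_{it} = (\alpha + u_i) + \beta x_{it} + \gamma z_i + e_{it}$,   where $z_i$ is a time-invariant explanatory variable

. xtreg ln_wage ttl_exp tenure black, fe

Fixed-effects (within) regression               Number of obs      =     28101
Group variable: idcode                          Number of groups   =      4699

R-sq:  within  = 0.1438                         Obs per group: min =         1
       between = 0.2743                                        avg =       6.0
       overall = 0.1905                                        max =        15

                                                F(2,23400)         =   1965.74
corr(u_i, Xb)  = 0.1736                         Prob > F           =    0.0000

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     ttl_exp |   .0245398   .0006875    35.70   0.000     .0231923    .0258873
      tenure |   .0123934   .0009012    13.75   0.000     .0106269    .0141599
       black |  (dropped)
       _cons |    1.48521   .0036176   410.55   0.000     1.478119    1.492301
-------------+----------------------------------------------------------------
     sigma_u |  .37652594
     sigma_e |  .29564006
         rho |  .61861848   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(4698, 23400) =     7.40         Prob > F = 0.0000

Note: no coefficient is reported for black (whether the individual is black), because it does not vary over time. The black variable is absorbed into $u_i$, the error component that captures unit heterogeneity.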
Fixed-Effects Model (LSDV estimation): all dummies, no constant

. use P_data8_1, clear
. tsset company year

/* Create dummy variables for every panel unit (company) and exclude the constant. */
. xi, noomit: reg invest mvalue kstock i.company, nocons

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F( 12,   188) =  391.97
       Model |    13097228    12  1091435.66           Prob > F      =  0.0000
    Residual |  523478.114   188  2784.45805           R-squared     =  0.9616
-------------+------------------------------           Adj R-squared =  0.9591
       Total |  13620706.1   200  68103.5304           Root MSE      =  52.768

------------------------------------------------------------------------------
      invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      mvalue |   .1101238   .0118567     9.29   0.000     .0867345    .1335131
      kstock |   .3100653   .0173545    17.87   0.000     .2758308    .3442999
 _Icompany_1 |  -70.29669   49.70796    -1.41   0.159    -168.3537    27.76035
 _Icompany_2 |   101.9058   24.93832     4.09   0.000     52.71093    151.1007
 _Icompany_3 |  -235.5718   24.43162    -9.64   0.000    -283.7672   -187.3765
 _Icompany_4 |  -27.80929   14.07775    -1.98   0.050    -55.57995   -.0386303
 _Icompany_5 |  -114.6168   14.16543    -8.09   0.000    -142.5604   -86.67319
 _Icompany_6 |  -23.16129   12.66874    -1.83   0.069    -48.15244    1.829856
 _Icompany_7 |  -66.55347   12.84297    -5.18   0.000    -91.88833   -41.21862
 _Icompany_8 |  -57.54565   13.99315    -4.11   0.000    -85.14941    -29.9419
 _Icompany_9 |  -87.22227   12.89189    -6.77   0.000    -112.6536   -61.79091
_Icompany_10 |  -6.567843   11.82689    -0.56   0.579    -29.89831    16.76262
------------------------------------------------------------------------------

Note: the mvalue and kstock estimates are identical to the within estimates.
Fixed-Effects Model (LSDV estimation): constant included, one dummy omitted

/* Include the constant and omit one of the panel-unit dummies. */
. xi: reg invest mvalue kstock i.company
i.company         _Icompany_1-10      (naturally coded; _Icompany_1 omitted)

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F( 11,   188) =  288.50
       Model |   8836465.8    11  803315.073           Prob > F      =  0.0000
    Residual |  523478.114   188  2784.45805           R-squared     =  0.9441
-------------+------------------------------           Adj R-squared =  0.9408
       Total |  9359943.92   199  47034.8941           Root MSE      =  52.768

------------------------------------------------------------------------------
      invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      mvalue |   .1101238   .0118567     9.29   0.000     .0867345    .1335131
      kstock |   .3100653   .0173545    17.87   0.000     .2758308    .3442999
 _Icompany_2 |   172.2025   31.16126     5.53   0.000     110.7319    233.6732
 _Icompany_3 |  -165.2751   31.77556    -5.20   0.000    -227.9576   -102.5927
 _Icompany_4 |    42.4874   43.90987     0.97   0.334    -44.13197    129.1068
 _Icompany_5 |  -44.32013   50.49225    -0.88   0.381    -143.9243    55.28406
 _Icompany_6 |   47.13539   46.81068     1.01   0.315    -45.20629    139.4771
 _Icompany_7 |   3.743212   50.56493     0.07   0.941    -96.00433    103.4908
 _Icompany_8 |   12.75103   44.05263     0.29   0.773    -74.14994      99.652
 _Icompany_9 |  -16.92558   48.45326    -0.35   0.727    -112.5075    78.65636
_Icompany_10 |   63.72884   50.33023     1.27   0.207    -35.55572    163.0134
       _cons |  -70.29669   49.70796    -1.41   0.159    -168.3537    27.76035
------------------------------------------------------------------------------

Note: the mvalue and kstock estimates are identical to the within estimates.
Fixed-Effects Model (LSDV estimation): joint test of the unit dummies

/* F test of whether the coefficients on the 9 panel-unit dummies are all 0 */
. testparm _Icompany_2-_Icompany_10

 ( 1)  _Icompany_2 = 0
 ( 2)  _Icompany_3 = 0
 ( 3)  _Icompany_4 = 0
 ( 4)  _Icompany_5 = 0
 ( 5)  _Icompany_6 = 0
 ( 6)  _Icompany_7 = 0
 ( 7)  _Icompany_8 = 0
 ( 8)  _Icompany_9 = 0
 ( 9)  _Icompany_10 = 0

       F(  9,   188) =   49.18
            Prob > F =    0.0000

Note: the null hypothesis that all the coefficients are 0 is rejected at a very low significance level.
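Fixed-Effects Model (LSDV estimation): many panel groups

. use P_data8_2, clear
. tsset id year

When the number of panel groups is large, LSDV estimation produces very long output or takes a long time to run. The areg command can be used instead of reg.

. areg ln_wage ttl_exp tenure black, absorb(id)    /* absorb(): the variable identifying the panel units */
(dropping black because it does not vary within category)

Linear regression, absorbing indicators         Number of obs      =     28101
                                                F(  2, 23400)      =   1965.74
                                                Prob > F           =    0.0000
                                                R-squared          =    0.6812
                                                Adj R-squared      =    0.6171
                                                Root MSE           =    .29564

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     ttl_exp |   .0245398   .0006875    35.70   0.000     .0231923    .0258873
      tenure |   .0123934   .0009012    13.75   0.000     .0106269    .0141599
       black |  (dropped)
       _cons |    1.48521   .0036176   410.55   0.000     1.478119    1.492301
-------------+----------------------------------------------------------------
      idcode |   F(4698, 23400) =      7.660   0.000        (4699 categories)

Notes:
- The coefficient estimates are identical to the within estimation results.
- "4699 categories" means there are 4,699 panel units.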
Fixed-Effects Model (LSDV estimation): unit and time effects

Unit heterogeneity ($u_i$) and time heterogeneity ($v_t$) are considered together:

$y_{it} = \alpha + \beta x_{it} + u_i + v_t + e_{it}$

. use P_data8_3, clear
. tsset company year

/* Add dummy variables for the years */
. xi: xtreg invest mvalue kstock i.year, fe
i.year            _Iyear_1935-1939    (naturally coded; _Iyear_1935 omitted)

Fixed-effects (within) regression               Number of obs      =        50
Group variable: company                         Number of groups   =        10

R-sq:  within  = 0.6380                         Obs per group: min =         5
       between = 0.5577                                        avg =       5.0
       overall = 0.5628                                        max =         5

                                                F(6,34)            =      9.99
corr(u_i, Xb)  = 0.1638                         Prob > F           =    0.0000
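Fixed-Effects Model (LSDV estimation): unit and time effects, results

------------------------------------------------------------------------------
      invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      mvalue |   .0599551   .0117349     5.11   0.000      .036107    .0838032
      kstock |  -.3416076   .1099002    -3.11   0.004    -.5649518   -.1182634
 _Iyear_1936 |   10.61359   12.52288     0.85   0.403    -14.83597    36.06315
 _Iyear_1937 |   24.93282   14.52805     1.72   0.095    -4.591731    54.45737
 _Iyear_1938 |    22.7716   14.48812     1.57   0.125    -6.671813      52.215
 _Iyear_1939 |   17.27716   16.03763     1.08   0.289    -15.31523    49.86954
       _cons |   51.61509   13.40755     3.85   0.000     24.36768    78.86251
-------------+----------------------------------------------------------------
     sigma_u |  83.765483
     sigma_e |  26.122686
         rho |  .91136638   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(9, 34) =    31.22               Prob > F = 0.0000

/* F test of whether the year dummies are jointly significant */
. testparm _Iyear_1936-_Iyear_1939

 ( 1)  _Iyear_1936 = 0
 ( 2)  _Iyear_1937 = 0
 ( 3)  _Iyear_1938 = 0
 ( 4)  _Iyear_1939 = 0

       F(  4,    34) =    0.95
            Prob > F =    0.4467

Note: the year dummies are not jointly significant, so there is no evidence of time-specific effects.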
Random-Effects Model
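Panel data analysis: the random-effects model

$y_{it} = \alpha + \beta x_{it} + u_i + e_{it}$,   $i = 1, 2, \dots, n$ and $t = 1, 2, \dots, T$
$\phantom{y_{it}} = (\alpha + u_i) + \beta x_{it} + e_{it}$

$u_i \sim N(0, \sigma_u^2)$,   $e_{it} \sim N(0, \sigma_e^2)$

Here $u_i$ is assumed to be a random variable. If the assumption $\mathrm{cov}(x_{it}, u_i) = 0$ holds, estimating the transformed model below by OLS gives an estimator that is both consistent and efficient:

$(y_{it} - \theta \bar{y}_i) = \alpha (1 - \theta) + \beta (x_{it} - \theta \bar{x}_i) + [u_i (1 - \theta) + (e_{it} - \theta \bar{e}_i)]$

- $\theta = 0$: equivalent to estimation by pooled OLS.
- $\theta = 1$: equivalent to the within regression model.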
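Panel data analysis: properties of the random-effects model

- The parameters are estimated as a weighted average of the between-effects model and the fixed-effects model.
- Advantage: it uses both between-panel and within-panel information, and it can estimate the effects of time-invariant variables.
- Disadvantage: if the explanatory variables are not exogenous, the parameter estimates are not consistent.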
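Random-Effects Model: estimation

. use P_data10_1, clear
. tsset company year

/* Because of autocorrelation in the error term (the composite error u_i + e_it shares u_i across periods), the model is estimated by GLS (generalized least squares). */
. xtreg invest mvalue kstock, re theta

Random-effects GLS regression                   Number of obs      =       200
Group variable: company                         Number of groups   =        10

R-sq:  within  = 0.7668                         Obs per group: min =        20
       between = 0.8196                                        avg =      20.0
       overall = 0.8061                                        max =        20

Random effects u_i ~ Gaussian                   Wald chi2(2)       =    657.67
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000
theta              = .86122362

Notes:
- "Random effects u_i ~ Gaussian": the estimation assumes that $u_i$ follows a normal distribution.
- theta is the estimate of the weight $\theta$ used in the random-effects transformation. Because the panel is balanced, the same value applies to every panel unit.
- The estimated theta is close to 1, so the coefficient estimates are similar to the fixed-effects results.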
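Random-Effects Model: results

------------------------------------------------------------------------------
      invest |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      mvalue |   .1097811   .0104927    10.46   0.000     .0892159    .1303464
      kstock |    .308113   .0171805    17.93   0.000     .2744399    .3417861
       _cons |  -57.83441   28.89893    -2.00   0.045    -114.4753   -1.193537
-------------+----------------------------------------------------------------
     sigma_u |   84.20095
     sigma_e |  52.767964
         rho |  .71800838   (fraction of variance due to u_i)
------------------------------------------------------------------------------

Note: rho is the share of the total error variance that is due to the panel-group error component $u_i$.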
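The theta printed by `xtreg, re theta` can be reproduced from sigma_u, sigma_e, and the number of periods. The slides do not show the formula; the sketch below assumes the standard balanced-panel expression $\theta = 1 - \sqrt{\sigma_e^2 / (T \sigma_u^2 + \sigma_e^2)}$, with $T = 20$ taken from "Obs per group" in the output. Variable names are illustrative.

```python
import math

# Values copied from the random-effects output on the slides.
sigma_u = 84.20095    # sd of the unit-specific error component
sigma_e = 52.767964   # sd of the idiosyncratic error
T = 20                # observations per company (balanced panel)

# Assumed standard formula for the quasi-demeaning weight in a balanced panel.
theta = 1 - math.sqrt(sigma_e**2 / (T * sigma_u**2 + sigma_e**2))

print(round(theta, 4))
```

The result is about 0.861, in line with the reported theta. A larger $T$ or a larger $\sigma_u$ relative to $\sigma_e$ pushes theta toward 1, that is, toward the within (fixed-effects) estimator.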
Random-Effects Model by Maximum Likelihood

. use P_data10_1, clear
. tsset company year

If both $u_i$ and $e_{it}$ are assumed to be normally distributed, the coefficients can be estimated by maximum likelihood estimation (MLE). Except for unbalanced panels with a small sample, the GLS and MLE results are essentially the same.

/* mle: use maximum likelihood; nolog: do not display the iterations that maximize the log likelihood, show only the final result */
. xtreg invest mvalue kstock, mle nolog

Random-effects ML regression                    Number of obs      =       200
Group variable: company                         Number of groups   =        10

Random effects u_i ~ Gaussian                   Obs per group: min =        20
                                                               avg =      20.0
                                                               max =        20

                                                LR chi2(2)         =    293.43
Log likelihood  = -1095.257                     Prob > chi2        =    0.0000
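Random-Effects Model by Maximum Likelihood: results

------------------------------------------------------------------------------
      invest |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      mvalue |   .1097626   .0103389    10.62   0.000     .0894988    .1300265
      kstock |    .307942   .0171006    18.01   0.000     .2744254    .3414585
       _cons |   -57.7672   27.70004    -2.09   0.037    -112.0583   -3.476114
-------------+----------------------------------------------------------------
    /sigma_u |   80.29729   18.37811                      51.27213    125.7536
    /sigma_e |   52.49255    2.69306                      47.47094    58.04534
         rho |   .7005943   .0985226                      .4881266    .8603709
------------------------------------------------------------------------------
Likelihood-ratio test of sigma_u=0: chibar2(01)=  193.09 Prob>=chibar2 = 0.000

Note: if the null hypothesis (sigma_u = 0) were not rejected, the variance of the unit-specific error component would be 0, meaning there is no heterogeneity across panel groups and the random-effects model would serve no purpose. Here the null is rejected (Prob>=chibar2 = 0.000).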
Comparing Estimates Across Models

. use P_data10_2, clear
. tsset id year

. qui xtreg ln_wage ttl_exp tenure black, be
. estimates store BE_model    /* store the estimation results */

. qui xtreg ln_wage ttl_exp tenure black, fe
. estimates store FE_model

. qui xtreg ln_wage ttl_exp tenure black, re
. estimates store RE_model

/* Tabulate the coefficient estimates */
. estimates table BE_model FE_model RE_model, b(%9.3f) star(0.01 0.05 0.1)
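Comparing Estimates Across Models: results

--------------------------------------------------------------
    Variable |   BE_model        FE_model        RE_model
-------------+------------------------------------------------
     ttl_exp |     0.042***        0.025***        0.027***
      tenure |     0.029***        0.012***        0.014***
       black |    -0.115***        0.000          -0.124***
       _cons |     1.371***        1.485***        1.503***
--------------------------------------------------------------

Stars follow star(0.01 0.05 0.1): *** p<0.01, ** p<0.05, * p<0.1. The 0.000 entry for black under FE_model is not an estimate; black is dropped in the fixed-effects model because it does not vary over time.

Interpretation:
- black (random-effects model): black individuals earn on average about 12.4% less than non-black individuals.
- Between model: holding other conditions equal, if individual A has 1 more year of experience than individual B, A's wage is on average 4.2% higher.
- Fixed-effects model: for a given individual A whose wage is observed over several years, holding other conditions equal, a 1-year increase in work experience raises the wage by 2.5% on average.
- Random-effects model: holding other conditions equal, a 1-year increase in work experience raises the wage by 2.7% on average.

In the random-effects model, the effect of an explanatory variable is interpreted as applying equally between panel groups and within a panel group: if individual A has 1 more year of work experience than individual B, A's wage is 2.7% higher, and if A's own experience increases by 1 year, A's wage also rises by 2.7%.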
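The percentage readings above use the common approximation that a coefficient $b$ in a log-wage equation corresponds to a $100 \times b$ percent change. For larger coefficients the exact change is $100 \times (e^{b} - 1)$. The Python sketch below compares the two for the rounded coefficients in the table; `pct_effect` is a hypothetical helper name, and the dictionary labels are only for illustration.

```python
import math

def pct_effect(coef):
    """Exact percentage change in the wage implied by a log-wage coefficient."""
    return 100 * (math.exp(coef) - 1)

# Rounded coefficients copied from the estimates table.
coefs = {
    "ttl_exp (BE)": 0.042,
    "ttl_exp (FE)": 0.025,
    "ttl_exp (RE)": 0.027,
    "black (RE)": -0.124,
}

for name, b in coefs.items():
    print(f"{name}: approx {100 * b:.1f}%, exact {pct_effect(b):.2f}%")
```

For the small experience coefficients the two readings nearly coincide; for black in the random-effects model the exact figure is roughly 11.7% lower rather than 12.4%.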
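Choosing Between the Fixed-Effects and Random-Effects Models

Reasoning about $u_i$, which represents the characteristics of the panel units in the data:
- If the panel units are a random sample drawn from a population, $u_i$ can be assumed to follow a probability distribution. Example: the Korean Labor and Income Panel Study data.
- If the panel units are not a random sample but a specific population itself, $u_i$ cannot be assumed to follow a probability distribution. Example: a panel of OECD countries.

Hausman test:
- In econometric theory, if the assumption $\mathrm{cov}(x_{it}, u_i) = 0$ holds, the fixed-effects and random-effects estimators are both consistent and therefore similar.
- $H_0$: $\mathrm{cov}(x_{it}, u_i) = 0$
- $H_1$: $\mathrm{cov}(x_{it}, u_i) \neq 0$
- If the null hypothesis is true, both estimators are consistent, so there is no systematic difference between them; the random-effects model is more efficient.
- If the null hypothesis is false, the random-effects estimator is not consistent, so it differs from the fixed-effects estimator; choose the fixed-effects model.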
Hausman Test

. use P_data11_4, clear
. tsset state year

. qui xtreg fatal unrate perinck beertax spircon, fe
. estimates store FE

. qui xtreg fatal unrate perinck beertax spircon, re
. estimates store RE

/* The fixed-effects results must be listed first, followed by the random-effects results. */
. hausman FE RE
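Hausman Test: results

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |       FE           RE         Difference          S.E.
-------------+----------------------------------------------------------------
      unrate |   -.0290499    -.0491381        .0200882               .
     perinck |    .1047103    -.0110727         .115783        .0067112
     beertax |   -.4840728     .0442768       -.5283495        .1090815
    spircons |    .8169652     .3024711         .514494        .0462668
------------------------------------------------------------------------------
                           b = consistent under Ho and Ha; obtained from xtreg
            B = inconsistent under Ha, efficient under Ho; obtained from xtreg

    Test:  Ho:  difference in coefficients not systematic

                  chi2(4) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =      130.93
                Prob>chi2 =      0.0000
                (V_b-V_B is not positive definite)

Note: the null hypothesis is rejected at the 1% significance level. The random-effects estimator is not consistent, so choosing the fixed-effects model is more appropriate.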
Lab: Medical Panel Data
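Variable   Description
--------   ------------------------------------------------------------------
id         Individual ID
hdid       Household ID
gender     Sex: 1=male, 0=female
gage       Age group (10-year bands): 1: 0-9 (gage_1), 2: 10-19 (gage_2), 3: 20-29 (gage_3), 4: 30-39 (gage_4), 5: 40-49 (gage_5), 6: 50-59 (gage_6), 7: 60-69 (gage_7), 8: 70-79 (gage_8), 9: 80+ (gage_9)
agegr      Age group: 1: 20-34 (agegr_1), 2: 35-49 (agegr_2), 3: 50-64 (agegr_3), 4: 65-74 (agegr_4), 5: 75+ (agegr_5)
elderly    0: 20-64, 1: 65+
gmarry     Marital status: 1=married (gmarry_1), 2=separated/widowed/divorced (gmarry_2), 3=single (gmarry_3)
edu        Education level: 1=elementary school or less (edu_1), 2=middle school (edu_2), 3=high school (edu_3), 4=college or more (edu_4)
insur      Health insurance: 1=health insurance (insur_1), 2=Medical Aid and special cases (insur_2), 3=uninsured, disqualified, or benefits suspended (insur_3)
cdcount    Number of chronic diseases
eincome    Household income adjusted for household size
oucount    Number of outpatient visits
comply     1: tends to take diabetes medication as prescribed, 0: tends not to take diabetes medication as prescribed
burden     1: a very large burden on the household, 2: some burden on the household, 3: manageable, 4: little burden on the household, 5: no burden on the household at all (2008 baseline value)
year       Year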
Research Objective

Analyze whether the number of chronic diseases of diabetes patients affects the number of outpatient visits.
Research Results

. tsset id year
       panel variable:  id (unbalanced)
        time variable:  year, 2008 to 2010, but with gaps
                delta:  1 unit

Note: the panel is unbalanced, and there are time gaps between 2008 and 2010.
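Lab: Descriptive Statistics

. xttrans comply, freq

           |        comply
    comply |         0          1 |     Total
-----------+----------------------+----------
         0 |        48        135 |       183
           |     26.23      73.77 |    100.00
-----------+----------------------+----------
         1 |       122      1,074 |     1,196
           |     10.20      89.80 |    100.00
-----------+----------------------+----------
     Total |       170      1,209 |     1,379
           |     12.33      87.67 |    100.00

Note: a person who currently adheres to diabetes medication has about a 90% probability of adhering in the following year.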
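The percentages in the xttrans table are row percentages of the transition counts. The Python sketch below recomputes them from the counts on the slide; `row_percentages` is a hypothetical helper name, not a Stata or library function.

```python
# Transition counts copied from the xttrans output:
# counts[current_state][next_state]
counts = {
    0: {0: 48, 1: 135},
    1: {0: 122, 1: 1074},
}

def row_percentages(counts):
    """Convert each row of counts into percentages that sum to 100."""
    result = {}
    for state, row in counts.items():
        total = sum(row.values())
        result[state] = {nxt: 100 * n / total for nxt, n in row.items()}
    return result

probs = row_percentages(counts)
for state, row in probs.items():
    print(state, {nxt: round(p, 2) for nxt, p in row.items()})
```

The entry for staying adherent (state 1 to state 1) is 1,074 out of 1,196, which is the roughly 90% quoted in the note.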
Lab: Random-Effects Model

. xi: xtreg oucount gender agegr_2-agegr_5 cdcount, re

Random-effects GLS regression                   Number of obs      =      2202
Group variable: id                              Number of groups   =       828

R-sq:  within  = 0.0316                         Obs per group: min =         1
       between = 0.1887                                        avg =       2.7
       overall = 0.1507                                        max =         3

Random effects u_i ~ Gaussian                   Wald chi2(6)       =    218.32
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
     oucount |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      gender |  -4.673196   1.826831    -2.56   0.011    -8.253719   -1.092673
     agegr_2 |   2.299541   8.701522     0.26   0.792    -14.75513    19.35421
     agegr_3 |   4.538715   8.352214     0.54   0.587    -11.83132    20.90875
     agegr_4 |   11.65308   8.380605     1.39   0.164    -4.772602    28.07877
     agegr_5 |   10.50367   8.585024     1.22   0.221    -6.322668    27.33001
     cdcount |   3.549424    .284716    12.47   0.000     2.991391    4.107457
       _cons |   12.25394   8.298183     1.48   0.140      -4.0102    28.51808
-------------+----------------------------------------------------------------
     sigma_u |  22.827367
     sigma_e |  17.827798
         rho |  .62114297   (fraction of variance due to u_i)
------------------------------------------------------------------------------

Interpretation: men have fewer outpatient visits than women (about 4.7 fewer on average), and each additional chronic disease increases the number of outpatient visits by about 3.5 on average.
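Assignment

- Does medication adherence among diabetes patients affect the number of outpatient visits?
- Does the type of health insurance affect the number of outpatient visits?
- Does the number of chronic diseases affect income?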
Thank you.