Chapter 7 Analysis of Variance


Chapter 8 Experimental Design and Analysis of Variance (ANalysis Of VAriance, ANOVA)
Updated 2018/4/30

8.1 Introduction

Analysis of variance (ANOVA): a technique that divides the total variation into several components and measures how much each source of variation contributes to the total variation.

Aims:
- estimation and hypothesis testing for the population variances
- estimation and hypothesis testing for the population means

Motivation

- Two groups to compare -> t-test.
- More than two groups -> one option is to form every pair of groups and run a t-test on each pair (pairwise t-tests). This is cumbersome and can lead to theoretically wrong conclusions (the multiple-comparisons problem). It is also inefficient, because each test uses only part of the data.
- Comparing three or more groups using all of the data -> analysis of variance (ANOVA).
- Response variable: continuous. Explanatory variable: categorical.

8.2 One-way analysis of variance: one explanatory variable

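Completely randomized design

Definition: treatments are assigned to the experimental units at random (randomization), and the treatment effects are then compared.

Sample observed under a completely randomized design:

         Treatment
         1              2              3              ...   k
         x_{11}         x_{12}         x_{13}         ...   x_{1k}
         x_{21}         x_{22}         x_{23}         ...   x_{2k}
         x_{31}         x_{32}         x_{33}         ...   x_{3k}
         ...            ...            ...                  ...
         x_{n_1 1}      x_{n_2 2}      x_{n_3 3}      ...   x_{n_k k}
Total    T_{.1}         T_{.2}         T_{.3}         ...   T_{.k}         T_{..}
Mean     \bar{x}_{.1}   \bar{x}_{.2}   \bar{x}_{.3}   ...   \bar{x}_{.k}   \bar{x}_{..}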

One-way analysis of variance

Example 8.2.1 Comparison of the selenium concentration of meat according to the age of the cattle

Age group of cattle: A, B, C, D (113 observations; the group sizes used in Table 8.2.6 are n_A = 22, n_B = 14, n_C = 29, n_D = 48)

1820 1483 191 724 1020 1652 775 752 2588 1723 1098 613 805 1309 1393 804 2670 727 644 918 631 1002 533 1182 1022 1463 136 949 641 966 734 1243 1555 1777 1605 877 760 788 485 985 222 1129 1247 1368 1085 472 449 1295 1197 1529 1692 775 471 236 1676 1249 1422 697 1307 771 831 754 1520 445 849 344 869 698 937 489 990 1199 961 513 167 1022 2575 489 429 239 731 824 1073 1426 2408 798 944 1130 448 948 1846 1064 631 1096 1034 991 222 1088 629 1016 1261 590 721 912 1025 42 994 375 1383 948 767 1781 1187

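Model

\[ x_{ij} = \mu_j + \epsilon_{ij} \]

x_{ij}: the ij-th observation; \mu_j: mean of the j-th treatment group; \epsilon_{ij}: error of the ij-th observation.

Equivalently,

\[ x_{ij} = \mu + \tau_j + \epsilon_{ij}, \qquad \mu = \frac{1}{k}\sum_{j=1}^{k}\mu_j, \qquad \tau_j = \mu_j - \mu \]

\mu: grand mean; \tau_j: effect of the j-th treatment group.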

Model assumptions

1. $\epsilon_{ij} \sim N(0, \sigma^2)$, independent.
2. That is, the errors have zero mean, equal variance (homogeneity), normality, and independence.

Hypotheses of the model

\[ H_0: \mu_1 = \mu_2 = \dots = \mu_k \]
H_A: not all \mu_j are equal.

[Figure: group distributions in two cases: same variances and same means; same variances but different means]

Total sum of squares (SST)

\[ SST = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij} - \bar{x}_{..})^2 = \sum_{j=1}^{k}\sum_{i=1}^{n_j} x_{ij}^2 - \frac{T_{..}^2}{N} \]

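\[ SST = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij} - \bar{x}_{..})^2 = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij} - \bar{x}_{.j} + \bar{x}_{.j} - \bar{x}_{..})^2 \]
\[ = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij} - \bar{x}_{.j})^2 + 2\sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij} - \bar{x}_{.j})(\bar{x}_{.j} - \bar{x}_{..}) + \sum_{j=1}^{k}\sum_{i=1}^{n_j}(\bar{x}_{.j} - \bar{x}_{..})^2 \]
\[ = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij} - \bar{x}_{.j})^2 + \sum_{j=1}^{k} n_j(\bar{x}_{.j} - \bar{x}_{..})^2 \]

The cross-product term vanishes because $\sum_{i=1}^{n_j}(x_{ij} - \bar{x}_{.j}) = 0$ within every group j.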
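The decomposition above can be checked numerically. The following is a minimal sketch on made-up data (the three groups are hypothetical, not taken from the lecture); it computes SST both by definition and by the shortcut formula, and confirms that the cross-product term is zero.

```python
# Hypothetical data: three groups with unequal sizes (not from the lecture).
groups = [
    [12.0, 15.0, 11.0, 14.0],
    [18.0, 20.0, 17.0],
    [9.0, 8.0, 12.0, 10.0, 11.0],
]

all_values = [x for g in groups for x in g]
N = len(all_values)
grand_mean = sum(all_values) / N
group_means = [sum(g) / len(g) for g in groups]

# SST by definition and by the shortcut formula sum(x^2) - T..^2 / N
sst = sum((x - grand_mean) ** 2 for x in all_values)
sst_shortcut = sum(x * x for x in all_values) - sum(all_values) ** 2 / N

# Within-group (SSW) and among-group (SSA) sums of squares
ssw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
ssa = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))

# Cross-product term of the expansion; it should be zero up to rounding
cross = sum((x - m) * (m - grand_mean)
            for g, m in zip(groups, group_means) for x in g)

print(f"SST = {sst:.4f} (shortcut {sst_shortcut:.4f})")
print(f"SSW + SSA = {ssw + ssa:.4f}, cross term = {cross:.2e}")
```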

\[ SST = SSW + SSA \]

SSW: within-group sum of squares. SSA: among-group (between-group) sum of squares.

\[ \text{variance ratio (VR)} = \frac{MSA}{MSW} = \frac{\text{among-group mean square}}{\text{within-group mean square}} \]

Larger VR -> larger among-group variation relative to the within-group variation -> the groups are different -> bigger group effect.

Heritability example

ANOVA table

Source           Sum of squares                                                         df      Mean square          Variance ratio (F)
Between groups   SSA = \sum_{j=1}^{k} n_j (\bar{x}_{.j} - \bar{x}_{..})^2               k - 1   MSA = SSA/(k - 1)    MSA/MSW
Within groups    SSW = \sum_{j=1}^{k}\sum_{i=1}^{n_j} (x_{ij} - \bar{x}_{.j})^2         N - k   MSW = SSW/(N - k)
Total            SST = \sum_{j=1}^{k}\sum_{i=1}^{n_j} (x_{ij} - \bar{x}_{..})^2         N - 1
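As an illustration of how the table entries are filled in, here is a small sketch. The function name `one_way_anova` and the data are hypothetical; the formulas are the ones in the table above.

```python
def one_way_anova(groups):
    """Return the one-way ANOVA table entries for a list of groups."""
    k = len(groups)
    N = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / N
    means = [sum(g) / len(g) for g in groups]

    ssa = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    msa = ssa / (k - 1)   # between-group mean square
    msw = ssw / (N - k)   # within-group mean square
    return {
        "SSA": ssa, "df_between": k - 1, "MSA": msa,
        "SSW": ssw, "df_within": N - k, "MSW": msw,
        "SST": ssa + ssw, "df_total": N - 1,
        "F": msa / msw,
    }

# Hypothetical data chosen so the arithmetic is easy to follow by hand.
table = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
for name, value in table.items():
    print(f"{name:>10}: {value:.4f}")
```

For these made-up groups the group means are 2, 3 and 6, so SSA = 26 and SSW = 6, giving MSA = 13, MSW = 1 and a variance ratio of 13.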

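ANOVA table: expected mean squares

\[ E(MSW) = \sigma^2, \qquad E(MSA) = \sigma^2 + \frac{1}{k-1}\sum_{j=1}^{k} n_j \tau_j^2 \]

With $\sigma_A^2 = \frac{1}{k-1}\sum_{j=1}^{k}\tau_j^2$, the null hypothesis $\mu_1 = \dots = \mu_k$ is equivalent to $\sigma_A^2 = 0$. Under the null hypothesis, MSA and MSW estimate the same variance. Under the alternative hypothesis, the variance estimate from MSA tends to be bigger than that from MSW, so H_0 is rejected for large values of MSA/MSW: a right-tailed test.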

Program

* SAS code;
PROC IMPORT OUT= WORK.sele
    DATAFILE= "E:\kim\yes\myweb\int\2018\newlectureNote\data\sele.csv"
    DBMS=CSV REPLACE;
    GETNAMES=YES;
    DATAROW=2;
RUN;

proc anova data=sele;
    class group;
    model value=group;
run;

# R code
> sele <- read.table('e:\\kim\\yes\\myweb\\int\\2018\\newlecturenote\\data\\sele.csv', sep=',', header=T)
> aov(value~group, data=sele)
> boxplot(value~group, data=sele)

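Call:
   aov(formula = value ~ Group, data = sele)

Terms:
                   Group Residuals
Sum of Squares   5931208  23026500
Deg. of Freedom        3       109

Residual standard error: 459.6219
Estimated effects may be unbalanced

Reading the output: Group is the between-group row and Residuals is the within-group row. The total sum of squares is 5931208 + 23026500 = 28957708 with 3 + 109 = 112 degrees of freedom, and the variance ratio is VR = (5931208/3)/(23026500/109), approximately 9.36.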
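The `aov` printout gives only sums of squares and degrees of freedom, so the mean squares and the variance ratio have to be derived from them. A minimal sketch of that arithmetic, using only the four numbers shown in the output (variable names are illustrative):

```python
# Numbers copied from the aov output for the selenium example.
ss_between, df_between = 5931208.0, 3
ss_within, df_within = 23026500.0, 109

ms_between = ss_between / df_between        # MSA
ms_within = ss_within / df_within           # MSW
variance_ratio = ms_between / ms_within     # F statistic
residual_se = ms_within ** 0.5              # should match "Residual standard error"

print(f"MSA = {ms_between:.1f}")
print(f"MSW = {ms_within:.1f}")
print(f"VR  = {variance_ratio:.3f}")
print(f"Residual standard error = {residual_se:.4f}")
```

The square root of MSW reproduces the residual standard error reported by R, which is a quick consistency check on the transcribed numbers.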

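Multiple comparisons

Example: use significance level $\alpha$ for each individual test. Let

H_{01}: \mu_1 = 0, with P(do not reject H_{01} | H_{01} is true) = 1 - \alpha
H_{02}: \mu_2 = 0, with P(do not reject H_{02} | H_{02} is true) = 1 - \alpha

Then, for H_0 = (H_{01} and H_{02}) and assuming the two tests are independent,

\[ P(\text{do not reject } H_0 \mid H_0) = P(\text{do not reject } H_{01} \text{ and do not reject } H_{02} \mid H_0) = (1-\alpha)(1-\alpha) = (1-\alpha)^2 \]

In general, if we test $\mu_1 = \mu_2 = \dots = \mu_k = 0$ with k separate tests, the probability of rejecting none of them under H_0 is $(1-\alpha)^k$.

With $\alpha = 0.05$ and k = 4: $1 - (0.95)^4 = 1 - 0.8145 = 0.1855$. The overall $\alpha$ is 0.1855, not 0.05 -> inflated type I error.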
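The inflation of the overall type I error can be tabulated directly. This is a small sketch assuming independent tests, as in the derivation above; the function name is illustrative.

```python
def familywise_error_rate(alpha, m):
    """Probability of at least one false rejection among m independent tests."""
    return 1.0 - (1.0 - alpha) ** m

for m in (1, 2, 4, 10):
    print(f"m = {m:2d}: overall alpha = {familywise_error_rate(0.05, m):.4f}")
```

For m = 4 this gives the 0.1855 quoted in the text.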

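Bonferroni correction

Set the individual significance level to $\alpha/m$ so that the overall significance level is about $\alpha$ for m multiple tests.

For m = 4 and $\alpha = 0.05$: the per-test level that gives an overall level of exactly 0.05 for independent tests is $1 - 0.95^{1/4} \approx 0.0127$, which is close to $0.05/4 = 0.0125$.

Example: with 10 hypotheses,
- individual level 0.05 -> multiple-comparisons problem (too many false findings)
- individual level 0.05/10 = 0.005

This threshold is often called the Bonferroni-corrected significance level.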
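A short sketch of the correction follows. The p-values are made up for illustration, and the overall level is computed under the assumption of independent tests.

```python
def bonferroni_level(alpha, m):
    """Per-test significance level for m tests at overall level alpha."""
    return alpha / m

def overall_level(per_test_alpha, m):
    """Overall type I error for m independent tests at the given per-test level."""
    return 1.0 - (1.0 - per_test_alpha) ** m

alpha = 0.05
for m in (4, 10):
    per_test = bonferroni_level(alpha, m)
    print(f"m = {m:2d}: per-test level = {per_test:.4f}, "
          f"overall level = {overall_level(per_test, m):.4f}")

# Hypothetical p-values from 10 tests; only those below 0.05/10 are declared significant.
p_values = [0.001, 0.004, 0.012, 0.03, 0.04, 0.2, 0.35, 0.5, 0.7, 0.9]
threshold = bonferroni_level(alpha, len(p_values))
significant = [p for p in p_values if p < threshold]
print(f"threshold = {threshold:.4f}, significant p-values: {significant}")
```

Without the correction five of these ten made-up p-values would fall below 0.05; with it only two remain.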

Detecting pairwise differences (tests of the difference between two population means for each pair of treatment groups)

After rejecting $H_0: \mu_1 = \mu_2 = \dots = \mu_5$, which pairs of means differ significantly?

1. LSD (least significant difference) test
2. Duncan's new multiple range test
3. Tukey's HSD

From liberal to conservative: Duncan, LSD, SNK, Tukey HSD, Scheffe

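3. Tukey's HSD (honestly significant difference) test

When all the $n_j$ are the same ($n_j = n$):

\[ HSD = q_{\alpha,k,N-k}\sqrt{\frac{MSE}{n}} \]

When the $n_j$ differ:

\[ HSD^* = q_{\alpha,k,N-k}\sqrt{\frac{MSE}{n_j^*}} \]

where $n_j^*$ is the sample size of the smaller of the two cells being compared.

$q_{\alpha,k,N-k}$: critical value of the distribution of $(\bar{y}_{max} - \bar{y}_{min})/\sqrt{S^2/n}$ (the studentized range); $\alpha$: significance level; k: number of groups; N - k: error degrees of freedom. MSE is the within-group mean square (MSW) of the one-way ANOVA table.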

Example 8.2.2 Pairwise differences between the group means

      A     B         C         D
A     -     455.72    574.54    596.63
B           -         118.82    140.91
C                     -          22.10
D                               -

Table 8.2.6 Pairwise comparisons by Tukey's HSD test

Each threshold is computed as HSD* = 3.690 \sqrt{(211252/2)(1/n_i + 1/n_j)}.

Null hypothesis         HSD*                                                 Result
H_0: \mu_A = \mu_B      3.690 \sqrt{(211252/2)(1/22 + 1/14)} = 409.99        455.72 > 409.99, so H_0 is rejected.
H_0: \mu_A = \mu_C      3.690 \sqrt{(211252/2)(1/22 + 1/29)} = 339.05        574.54 > 339.05, so H_0 is rejected.
H_0: \mu_A = \mu_D      3.690 \sqrt{(211252/2)(1/22 + 1/48)} = 308.75        596.63 > 308.75, so H_0 is rejected.
H_0: \mu_B = \mu_C      3.690 \sqrt{(211252/2)(1/14 + 1/29)} = 390.27        118.82 < 390.27, so H_0 is not rejected.
H_0: \mu_B = \mu_D      3.690 \sqrt{(211252/2)(1/14 + 1/48)} = 364.25        140.91 < 364.25, so H_0 is not rejected.
H_0: \mu_C = \mu_D      3.690 \sqrt{(211252/2)(1/29 + 1/48)} = 282.05         22.10 < 282.05, so H_0 is not rejected.
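The thresholds in Table 8.2.6 can be recomputed from the quantities quoted there. The sketch below takes the critical value 3.690, MSE = 211252, the group sizes and the mean differences from the tables above; the variable and function names are illustrative. Small differences in the last digit relative to the table come from rounding.

```python
from itertools import combinations
from math import sqrt

Q_CRIT = 3.690      # studentized-range critical value quoted in Table 8.2.6
MSE = 211252.0      # within-group mean square quoted in Table 8.2.6
sizes = {"A": 22, "B": 14, "C": 29, "D": 48}

# Absolute differences between group means (Example 8.2.2)
diffs = {
    ("A", "B"): 455.72, ("A", "C"): 574.54, ("A", "D"): 596.63,
    ("B", "C"): 118.82, ("B", "D"): 140.91, ("C", "D"): 22.10,
}

def hsd(n_i, n_j):
    """Threshold for unequal group sizes, in the form used in Table 8.2.6."""
    return Q_CRIT * sqrt(MSE / 2 * (1 / n_i + 1 / n_j))

results = {}
for a, b in combinations(sorted(sizes), 2):
    threshold = hsd(sizes[a], sizes[b])
    reject = diffs[(a, b)] > threshold
    results[(a, b)] = (threshold, reject)
    verdict = "reject H0" if reject else "do not reject H0"
    print(f"{a} vs {b}: diff = {diffs[(a, b)]:7.2f}, HSD* = {threshold:7.2f} -> {verdict}")
```

Only the three comparisons involving group A exceed their thresholds, in agreement with the table.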

proc anova;
    class group;
    model value = group;
    means group / tukey;
run;

Homework 1-8
-> For the following problems, make the ANOVA tables using the formulae (you may use MS Excel), then compare your results with the results from SAS.

8.3 Randomized complete block design and two-way ANOVA

R. A. Fisher (1925): to compare the yields of certain species, the land is divided into blocks (block = land) and the randomization (of the other factors) is carried out within each block.

Block    Treatment
         1              2              3              ...   k              Total     Mean
1        x_{11}         x_{12}         x_{13}         ...   x_{1k}         T_{1.}    \bar{x}_{1.}
2        x_{21}         x_{22}         x_{23}         ...   x_{2k}         T_{2.}    \bar{x}_{2.}
3        x_{31}         x_{32}         x_{33}         ...   x_{3k}         T_{3.}    \bar{x}_{3.}
...      ...            ...            ...                  ...            ...       ...
n        x_{n1}         x_{n2}         x_{n3}         ...   x_{nk}         T_{n.}    \bar{x}_{n.}
Total    T_{.1}         T_{.2}         T_{.3}         ...   T_{.k}         T_{..}
Mean     \bar{x}_{.1}   \bar{x}_{.2}   \bar{x}_{.3}   ...   \bar{x}_{.k}             \bar{x}_{..}

Example 8.3.1 Treatment duration (days) by drug

Age group      Drug A   Drug B   Drug C   Sum   Average
Under 20         11        8       10      29     9.667
20 to 29          6        5       11      22     7.333
30 to 39          7       10       13      30    10
40 to 49          9       12       13      34    11.333
50 or over       10       17       15      42    14
Sum              43       52       62     157
Average         8.6     10.4     12.4            10.467

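Model

\[ x_{ij} = \mu + \beta_i + \tau_j + \epsilon_{ij} \]

\beta_i: block effect; \tau_j: treatment effect.

\[ \epsilon_{ij} = x_{ij} - (\mu + \beta_i + \tau_j) \sim N(0, \sigma^2) \]

Hypotheses

H_0: \tau_j = 0, j = 1, 2, ..., k
H_A: "all \tau_j = 0" is not true, that is, some \tau_j \neq 0.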

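\[ SST = \sum_{j=1}^{k}\sum_{i=1}^{n}(x_{ij} - \bar{x}_{..})^2 \]
\[ = \sum_{j=1}^{k}\sum_{i=1}^{n}(\bar{x}_{i.} - \bar{x}_{..})^2 + \sum_{j=1}^{k}\sum_{i=1}^{n}(\bar{x}_{.j} - \bar{x}_{..})^2 + \sum_{j=1}^{k}\sum_{i=1}^{n}(x_{ij} - \bar{x}_{i.} - \bar{x}_{.j} + \bar{x}_{..})^2 \]
\[ SST = SSBl + SSTr + SSE \]
\[ \text{df}: \; nk - 1 = (n - 1) + (k - 1) + (n - 1)(k - 1) \]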

ANOVA table

Source       Sum of squares   Degrees of freedom   Mean square                     F
Treatment    SSTr             k - 1                MSTr = SSTr/(k - 1)             MSTr/MSE
Block        SSBl             n - 1                MSBl = SSBl/(n - 1)
Error        SSE              (n - 1)(k - 1)       MSE = SSE/[(n - 1)(k - 1)]
Total        SST              kn - 1

R code and output:

> response <- c(11, 6, 7, 9, 10, 8, 5, 10, 12, 17, 10, 11, 13, 13, 15)
> drug <- factor(rep(c('a','b','c'), each=5))
> age <- factor(rep(1:5, 3))   # error in the textbook
> dat <- data.frame(response=response, drug=drug, age=age)
> anova(lm(response~drug+age, data=dat))
Analysis of Variance Table

Response: response
          Df Sum Sq Mean Sq F value  Pr(>F)
drug       2 36.133 18.0667  3.4522 0.08300 .
age        4 71.733 17.9333  3.4268 0.06505 .
Residuals  8 41.867  5.2333
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

SAS code:

data d;
    do j=1 to 3; do i=1 to 5; drug=j; output; end; end;
run;
data a;
    do i=1 to 3; do j=1 to 5; age=j; output; end; end;
run;
data cc;
    input response @@;
    cards;
11 6 7 9 10 8 5 10 12 17 10 11 13 13 15
;
run;
data res;
    merge a(keep=age) d(keep=drug) cc;
run;
proc print data=res; run;
proc anova;
    class age drug;
    model response = drug age;
run;
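The sums of squares for the randomized complete block design can also be computed by hand from the Example 8.3.1 data. The sketch below applies the decomposition SST = SSBl + SSTr + SSE directly; the variable names are illustrative, and the results can be compared with the R output for this example (36.133, 71.733 and 41.867).

```python
# Example 8.3.1: rows are age-group blocks, columns are drugs A, B, C.
data = [
    [11, 8, 10],
    [6, 5, 11],
    [7, 10, 13],
    [9, 12, 13],
    [10, 17, 15],
]
n = len(data)        # number of blocks
k = len(data[0])     # number of treatments

grand = sum(sum(row) for row in data) / (n * k)
block_means = [sum(row) / k for row in data]
trt_means = [sum(row[j] for row in data) / n for j in range(k)]

sst = sum((x - grand) ** 2 for row in data for x in row)
ssbl = k * sum((m - grand) ** 2 for m in block_means)
sstr = n * sum((m - grand) ** 2 for m in trt_means)
sse = sum((data[i][j] - block_means[i] - trt_means[j] + grand) ** 2
          for i in range(n) for j in range(k))

mstr = sstr / (k - 1)
msbl = ssbl / (n - 1)
mse = sse / ((n - 1) * (k - 1))
f_trt = mstr / mse
f_block = msbl / mse

print(f"SSTr = {sstr:.3f}, SSBl = {ssbl:.3f}, SSE = {sse:.3f}, SST = {sst:.3f}")
print(f"F(treatment) = {f_trt:.4f}, F(block) = {f_block:.4f}")
```

The p-values are not computed here because the F distribution is not available in the Python standard library; in R they come from `pf`.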

8.4 Factorial design and two-way ANOVA

Reduction of response time = drug level (low, medium, high) * age group (middle-aged, elderly)

Without interaction:

Factor A (age)          Factor B (drug dose)
                        j=1    j=2    j=3
Middle-aged (i=1)         5     10     20
Elderly (i=2)            10     15     25

With interaction:

Factor A (age)          Factor B (drug dose)
                        j=1    j=2    j=3    (j=2) - (j=1)    (j=3) - (j=2)
Middle-aged (i=1)         5     10     20          5               10
Elderly (i=2)            15     10      5         -5               -5
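One way to see the difference between the two tables of cell means is to remove the grand mean and the row and column effects from each cell; what remains is the interaction. The following sketch does this for both tables. The function name is illustrative.

```python
def interaction_terms(table):
    """Cell mean minus row mean minus column mean plus grand mean, per cell."""
    rows, cols = len(table), len(table[0])
    grand = sum(sum(r) for r in table) / (rows * cols)
    row_means = [sum(r) / cols for r in table]
    col_means = [sum(table[i][j] for i in range(rows)) / rows for j in range(cols)]
    return [[table[i][j] - row_means[i] - col_means[j] + grand
             for j in range(cols)] for i in range(rows)]

# Cell means from the two tables (rows: middle-aged, elderly; columns: j = 1, 2, 3)
without_interaction = [[5, 10, 20], [10, 15, 25]]
with_interaction = [[5, 10, 20], [15, 10, 5]]

terms_without = interaction_terms(without_interaction)
terms_with = interaction_terms(with_interaction)

for label, terms in (("without", terms_without), ("with", terms_with)):
    print(label, [[round(t, 3) for t in row] for row in terms])
```

In the first table every term is zero, because the elderly group is always 5 units above the middle-aged group regardless of dose. In the second table the terms are far from zero, because the effect of dose runs in opposite directions in the two age groups.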

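Two-factor completely randomized design

            Factor B
Factor A    1                        2                        ...   b                        Total      Mean
1           x_{111}, ..., x_{11n}    x_{121}, ..., x_{12n}    ...   x_{1b1}, ..., x_{1bn}    T_{1..}    \bar{x}_{1..}
2           x_{211}, ..., x_{21n}    x_{221}, ..., x_{22n}    ...   x_{2b1}, ..., x_{2bn}    T_{2..}    \bar{x}_{2..}
...         ...                      ...                            ...                      ...        ...
a           x_{a11}, ..., x_{a1n}    x_{a21}, ..., x_{a2n}    ...   x_{ab1}, ..., x_{abn}    T_{a..}    \bar{x}_{a..}
Total       T_{.1.}                  T_{.2.}                  ...   T_{.b.}                  T_{...}
Mean        \bar{x}_{.1.}            \bar{x}_{.2.}            ...   \bar{x}_{.b.}                       \bar{x}_{...}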

Example 8.4.2 Length of a nurse's home visit = age of the nurse, disease of the patient

Model

\[ x_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijk}, \qquad i = 1, \dots, a, \quad j = 1, \dots, b, \quad k = 1, \dots, n \]

Hypotheses

Factor A:
H_0: \alpha_i = 0, i = 1, ..., a
H_A: not all \alpha_i = 0.

Factor B:
H_0: \beta_j = 0, j = 1, ..., b
H_A: not all \beta_j = 0.

Interaction:
H_0: (\alpha\beta)_{ij} = 0, i = 1, ..., a, j = 1, ..., b
H_A: not all (\alpha\beta)_{ij} = 0.

\[ SST = SSA + SSB + SSAB + SSE \]
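The two-way decomposition SST = SSA + SSB + SSAB + SSE can be verified on a small balanced layout. The data below are hypothetical (a = 2, b = 3, n = 2 observations per cell), with cell means chosen to match the with-interaction table of section 8.4; they are not the nurse data of Example 8.4.2.

```python
# Hypothetical balanced data: x[i][j] holds the n observations of cell (i, j).
a, b, n = 2, 3, 2
x = [
    [[4.0, 6.0], [9.0, 11.0], [19.0, 21.0]],
    [[14.0, 16.0], [9.0, 11.0], [4.0, 6.0]],
]

grand = sum(v for row in x for cell in row for v in cell) / (a * b * n)
cell_mean = [[sum(x[i][j]) / n for j in range(b)] for i in range(a)]
a_mean = [sum(cell_mean[i]) / b for i in range(a)]
b_mean = [sum(cell_mean[i][j] for i in range(a)) / a for j in range(b)]

ssa = b * n * sum((m - grand) ** 2 for m in a_mean)
ssb = a * n * sum((m - grand) ** 2 for m in b_mean)
ssab = n * sum((cell_mean[i][j] - a_mean[i] - b_mean[j] + grand) ** 2
               for i in range(a) for j in range(b))
sse = sum((v - cell_mean[i][j]) ** 2
          for i in range(a) for j in range(b) for v in x[i][j])
sst = sum((v - grand) ** 2 for row in x for cell in row for v in cell)

# Degrees of freedom for A, B, AB and error add up to abn - 1.
df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "E": a * b * (n - 1)}

print(f"SSA = {ssa:.3f}, SSB = {ssb:.3f}, SSAB = {ssab:.3f}, SSE = {sse:.3f}")
print(f"SST = {sst:.3f}, sum of parts = {ssa + ssb + ssab + sse:.3f}")
print(f"df = {df}, total = {sum(df.values())}")
```

Each mean square is the sum of squares divided by its degrees of freedom, and each of the three F ratios uses MSE = SSE/[ab(n - 1)] as the denominator.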

PROC IMPORT OUT= WORK.nurse
    DATAFILE= "E:\kim\yes\myweb\int\2018\newlectureNote\data\nurse.csv"
    DBMS=CSV REPLACE;
    GETNAMES=YES;
    DATAROW=2;
RUN;

proc anova;
    class a b;
    model time = a b a*b;
run;

> qf(0.95,3,64)
[1] 2.748191
> qf(0.95,9,64)
[1] 2.029792
> qf(0.95,15,64)
[1] 1.825586
> 1-pf(67.95,3,64)
[1] 0
> 1-pf(27.27,9,64)
[1] 0
> 1-pf(4.61,15,64)
[1] 7.473861e-06

Miscellaneous

- Log transformation: when the normality assumption is violated (e.g. income, concentration).
- Nonparametric approach: when normality is still problematic even after the variable transformation, or when the sample size is too small to check normality.

One-way ANOVA: types of sum of squares

*  Type I: sequential (if we know the relative importance of the variables)
   Type II: partial, without interaction terms
** Type III: partial, with interactions (if we don't know the relative importance of the variables)
   Type IV: for designs with missing cells (if there are none, same as Type III)

*, **: defaults

Model: $Y_{ij} = \mu + A_i + \epsilon_{ij}$

One-way ANOVA, mod12.sas

/* File : mod12.sas
   To demonstrate one way ANOVA */
filename in 'd:\intro\taillite.dat';
data one;
    infile in;
    input id vehtype group position speedzn resptime follotme folltmec;
    if group = 1;
run;
proc sort; by vehtype;
proc means;
    var resptime;
    by vehtype;
    title 'Means of Response Time by Vehicle Type';
run;
proc gplot;
    plot resptime*vehtype;
    symbol i=box;
    title 'Box Plot Response Time by Vehicle Type';
run;
proc anova;
    class vehtype;
    model resptime = vehtype;
    means vehtype / tukey lines bon cldiff scheffe snk lsd;
    title 'One way Anova for Tail Light Study';
    title2;
run;

Two-way ANOVA, mod13.sas

/* File : mod13.sas
   To demonstrate Two way ANOVA */
filename stiff 'd:\intro\dummy.dat';
data one;
    infile stiff;
    input species $ impactor $ stiff1 stiff2 calcium magnesm;
run;
proc gchart;
    block species / group=impactor sumvar=stiff1 type=mean;
    title 'Block Chart of Stiff1 by Impactor and Species';
run;
proc anova;
    class species impactor;
    model stiff1 = species impactor species*impactor;
    means species impactor / duncan lines;
    title 'Two way Anova Dummy Data';
run;