Chapter 7 Analysis of Variance


Chapter 7 Analysis of Variance (ANalysis Of VAriance, ANOVA) 2014/4/29

7.1 Introduction

Analysis of variance (ANOVA): a technique that divides the total variation into several components and identifies how much each source of variation contributes to the total.

Aims:
- Estimation and hypothesis testing for the population variances
- Estimation and hypothesis testing for the population means

Motivation

- Two groups to compare -> t-test.
- More than two groups -> one could pick the groups two at a time and run several pairwise t-tests. However:
  - This is cumbersome and can lead to theoretically wrong conclusions (the multiple-comparisons problem).
  - Each test uses only part of the data rather than all of it, so efficiency is lost.
- Comparing three or more groups using the whole data set -> analysis of variance (ANOVA).
- The response variable is continuous; the explanatory variable is categorical.

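7.2 Completely Randomized Design

Definition: treatments are assigned to the experimental units at random (complete randomization), so that the treatment effects can be assessed.

[Table: data layout by treatment, with totals and averages]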

One-way analysis of variance

Example 7.2.1: Effect of glucose on insulin release (glucose & insulin)

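Model

$$x_{ij} = \mu_j + \varepsilon_{ij} = \mu + \tau_j + \varepsilon_{ij}$$

- $x_{ij}$: the ij-th observation (i-th observation in the j-th treatment group)
- $\mu_j$: mean of the j-th treatment group
- $\varepsilon_{ij}$: error of the ij-th observation
- $\mu = \frac{1}{k}\sum_{j=1}^{k}\mu_j$: grand mean
- $\tau_j = \mu_j - \mu$: effect of the j-th treatment group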
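The definitions of the grand mean and the treatment effects can be checked numerically. The sketch below is only illustrative: it assumes that the five rounded sample means quoted in these slides for the glucose example (2.60, 2.61, 5.00, 6.81, 7.10, in ascending order) stand in for the unknown population means $\mu_j$, and the variable names are hypothetical.

```python
# Rounded sample means of the five glucose groups, in ascending order.
# Assumption: they are used here in place of the unknown population means mu_j.
group_means = [2.60, 2.61, 5.00, 6.81, 7.10]

k = len(group_means)
grand_mean = sum(group_means) / k                  # mu = (1/k) * sum of mu_j
effects = [m - grand_mean for m in group_means]    # tau_j = mu_j - mu

print(f"grand mean = {grand_mean:.3f}")
for j, tau in enumerate(effects, start=1):
    print(f"tau_{j} = {tau:+.3f}")

# By construction the effects sum to zero (up to floating-point error).
print(f"sum of effects = {sum(effects):.1e}")
```

Because each $\tau_j$ is a deviation from the average of the $\mu_j$, the effects always sum to zero.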

Model assumptions

1. $\varepsilon_{ij} \sim N(0, \sigma^2)$, independent
2. That is: zero mean, equal variance (homogeneity), normality, independence

Hypotheses of the model

$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$
$H_A$: not all the $\mu_j$ are the same

[Figure: two panels of group distributions. Panel 1: same variances and same means. Panel 2: same variances but different means.]

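Total sum of squares (SST)

$$SST = \sum_{j=1}^{k}\sum_{i=1}^{n_j}\left(x_{ij} - \bar{x}_{..}\right)^2 = \sum_{j=1}^{k}\sum_{i=1}^{n_j} x_{ij}^2 - \frac{T_{..}^2}{N}$$

where $T_{..}$ is the grand total of all observations and $N$ is the total number of observations.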

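$$SST = \sum_{j=1}^{k}\sum_{i=1}^{n_j}\left(x_{ij} - \bar{x}_{..}\right)^2 = \sum_{j=1}^{k}\sum_{i=1}^{n_j}\left(x_{ij} - \bar{x}_{.j} + \bar{x}_{.j} - \bar{x}_{..}\right)^2$$

$$= \sum_{j=1}^{k}\sum_{i=1}^{n_j}\left(x_{ij} - \bar{x}_{.j}\right)^2 + 2\sum_{j=1}^{k}\sum_{i=1}^{n_j}\left(x_{ij} - \bar{x}_{.j}\right)\left(\bar{x}_{.j} - \bar{x}_{..}\right) + \sum_{j=1}^{k}\sum_{i=1}^{n_j}\left(\bar{x}_{.j} - \bar{x}_{..}\right)^2$$

$$= \sum_{j=1}^{k}\sum_{i=1}^{n_j}\left(x_{ij} - \bar{x}_{.j}\right)^2 + \sum_{j=1}^{k} n_j\left(\bar{x}_{.j} - \bar{x}_{..}\right)^2$$

The cross-product term vanishes because $\sum_{i=1}^{n_j}(x_{ij} - \bar{x}_{.j}) = 0$ within every group.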

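SST = SSW + SSA

- SSW: within-group sum of squares, $\sum_{j}\sum_{i}(x_{ij} - \bar{x}_{.j})^2$
- SSA: among (between)-group sum of squares, $\sum_{j} n_j(\bar{x}_{.j} - \bar{x}_{..})^2$

Variance ratio: VR = MSA / MSW = (among-group mean square) / (within-group mean square)

-> A larger VR means the among-group variation is large relative to the within-group variation: the groups are different, i.e. the group effect is large.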

Heritability example

ANOVA table

Source            Sum of squares   df      Mean square          Variance ratio
Between groups    SSA              k - 1   MSA = SSA / (k - 1)  VR = MSA / MSW
Within groups     SSW              N - k   MSW = SSW / (N - k)
Total             SST              N - 1

SAS program

* file eg7_2_1.sas ;
data insul;
  * the trailing @@ lets one data line hold several glu-ins pairs ;
  input glu ins @@;
  cards;
1 1.53 1 1.61 1 3.75 1 2.89 1 3.26
2 3.15 2 3.96 2 3.59 2 1.89 2 1.45 2 1.56
3 3.89 3 3.68 3 5.70 3 5.62 3 5.79 3 5.33
4 8.18 4 5.64 4 7.36 4 5.33 4 8.82 4 5.26 4 7.10
5 5.86 5 5.46 5 5.69 5 6.49 5 7.81 5 9.03 5 7.49 5 8.98
;
run;

* group sums and means, the data are already sorted by glu ;
proc means sum mean;
  by glu;
  var ins;
run;

* one-way ANOVA of insulin on glucose level ;
proc anova;
  class glu;
  model ins = glu;
run;

The ANOVA Procedure
Dependent Variable: ins

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               4      121.1854282    30.2963570     19.78   <.0001
Error              27       41.3573937     1.5317553
Corrected Total    31      162.5428219

(Slide annotations: Model = between groups, Error = within groups, Corrected Total = total, F Value = VR.)

R-Square   Coeff Var   Root MSE   ins Mean
0.745560    24.27491   1.237641   5.098438

Source   DF      Anova SS   Mean Square   F Value   Pr > F
glu       4   121.1854282    30.2963570     19.78   <.0001
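As a cross-check on the SAS output, the sums of squares can be computed directly from the formulas SST = SSW + SSA and VR = MSA / MSW. The following is a minimal sketch in plain Python, not the method SAS uses internally; the function name `one_way_anova` and the dictionary layout are assumptions made for illustration, and the data are copied from the SAS data step above.

```python
# Insulin values by glucose level, copied from the SAS data step (eg7_2_1.sas).
data = {
    1: [1.53, 1.61, 3.75, 2.89, 3.26],
    2: [3.15, 3.96, 3.59, 1.89, 1.45, 1.56],
    3: [3.89, 3.68, 5.70, 5.62, 5.79, 5.33],
    4: [8.18, 5.64, 7.36, 5.33, 8.82, 5.26, 7.10],
    5: [5.86, 5.46, 5.69, 6.49, 7.81, 9.03, 7.49, 8.98],
}


def one_way_anova(groups):
    """Return the one-way ANOVA table entries for a dict {label: [values]}."""
    all_values = [x for values in groups.values() for x in values]
    n_total = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n_total

    # Total, among-group and within-group sums of squares.
    sst = sum((x - grand_mean) ** 2 for x in all_values)
    ssa = sum(len(v) * (sum(v) / len(v) - grand_mean) ** 2
              for v in groups.values())
    ssw = sum((x - sum(v) / len(v)) ** 2
              for v in groups.values() for x in v)

    msa = ssa / (k - 1)          # among-group mean square
    msw = ssw / (n_total - k)    # within-group mean square
    return {
        "SSA": ssa, "SSW": ssw, "SST": sst,
        "df_among": k - 1, "df_within": n_total - k,
        "MSA": msa, "MSW": msw, "VR": msa / msw,
    }


result = one_way_anova(data)
for name, value in result.items():
    print(f"{name:10s} {value:.4f}")
```

The printed SSA, SSW, SST and VR can be compared line by line with the Model, Error, Corrected Total and F Value entries of the SAS table.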

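Multiple comparisons

Example: each test is run at significance level $\alpha$. (Below, $\alpha_1, \alpha_2, \ldots$ denote the parameters being tested, not significance levels.)

$H_{01}: \alpha_1 = 0$, with $P(\text{do not reject } H_{01} \mid H_{01} \text{ is true}) = 1 - \alpha$
$H_{02}: \alpha_2 = 0$, with $P(\text{do not reject } H_{02} \mid H_{02} \text{ is true}) = 1 - \alpha$

Let $H_0 = H_{01}$ and $H_{02}$. If the two tests are independent, then

$$P(\text{do not reject } H_0 \mid H_0) = P(\text{do not reject } H_{01} \text{ and do not reject } H_{02} \mid H_0) = (1-\alpha)(1-\alpha) = (1-\alpha)^2$$

In general, if we want to test $\alpha_1 = \alpha_2 = \cdots = \alpha_k = 0$, the probability of no false rejection is $(1-\alpha)^k$.

With $k = 4$ and $\alpha = 0.05$: $1 - (0.95)^4 = 1 - 0.8145 = 0.1855$. The overall $\alpha$ is 0.1855, not 0.05 -> inflated type I error!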
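The inflation of the overall type I error can be tabulated for several numbers of tests. This is a small sketch under the same assumption as the derivation (independent tests, all null hypotheses true); the function name `familywise_error` is hypothetical.

```python
def familywise_error(alpha, k):
    """Probability of at least one false rejection among k independent
    tests, each run at level alpha, when every null hypothesis is true."""
    return 1 - (1 - alpha) ** k


for k in (1, 2, 4, 10):
    print(f"k = {k:2d}: overall alpha = {familywise_error(0.05, k):.4f}")
```

For k = 4 this gives the 0.1855 from the slide, and the value keeps growing as more tests are added.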

Bonferroni correction: set the individual significance level to $\alpha/m$, so that the overall significance level is about $\alpha$ for $m$ multiple tests.

$m = 4$: $\left(1 - \frac{0.05}{4}\right)^4 \approx 0.95 = 1 - 0.05$

Example: when we have 10 hypotheses,
- individual threshold p = 0.05 -> multiple-comparisons problem (too many false findings)
- individual threshold p = 0.05 / 10 = 0.005

This is often called the Bonferroni-corrected p-value threshold.
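A short sketch of the correction, again assuming independent tests; the helper names `bonferroni_level` and `overall_level` are made up for this illustration.

```python
def bonferroni_level(alpha, m):
    """Per-test significance level under the Bonferroni correction."""
    return alpha / m


def overall_level(per_test_alpha, m):
    """Overall type I error for m independent tests run at per_test_alpha."""
    return 1 - (1 - per_test_alpha) ** m


for m in (4, 10):
    per_test = bonferroni_level(0.05, m)
    print(f"m = {m:2d}: per-test level = {per_test:.4f}, "
          f"overall level = {overall_level(per_test, m):.4f}")
```

The overall level stays slightly below 0.05 in both cases, which is why the correction is described as keeping the overall level "about" $\alpha$.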

Detecting pairwise differences

After rejecting $H_0: \mu_1 = \mu_2 = \cdots = \mu_5$, which pairs have larger differences?

1. LSD (least significant difference) test
2. Duncan's new multiple range test

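3. Tukey's HSD (honestly significant difference) test

When the $n_j$'s are the same:

$$HSD = q_{\alpha,\,k,\,N-k}\sqrt{\frac{MSE}{n}}$$

When the $n_j$'s differ:

$$HSD^* = q_{\alpha,\,k,\,N-k}\sqrt{\frac{MSE}{n_j^*}}$$

$n_j^*$: sample size of the smaller cell (the smaller of the two groups being compared)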

Example 7.2.2: Pairwise mean differences for the glucose example

Table 7.2.6: Pairwise comparisons by Tukey's HSD test

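e.g. critical value by interpolation

The tabulated value is q = 4.17 at 24 error degrees of freedom and q = 4.10 at 30; we need the value at 27.

0.07 : x = (30 - 24) : (27 - 24), where 0.07 = 4.17 - 4.10
6x = 0.07 × 3
x = (0.07 × 3) / 6 = 0.035
q = 4.17 - 0.035 = 4.135 ≈ 4.14

Ordered group means: 2.60, 2.61, 5.00, 6.81, 7.10 (glu = 2, 1, 3, 4, 5)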
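The interpolation and the HSD* comparison can be put together in a few lines. The sketch below assumes the HSD* formula with the smaller of the two sample sizes, the interpolated critical value, and the error mean square 1.531755 reported by SAS for these data; the helper `interpolate` and the variable names are hypothetical. SAS itself reports a slightly different critical value (4.13047), so this is a hand-calculation check rather than a reproduction of PROC ANOVA.

```python
from itertools import combinations
from math import sqrt

# Insulin values by glucose level, copied from the SAS data step.
data = {
    1: [1.53, 1.61, 3.75, 2.89, 3.26],
    2: [3.15, 3.96, 3.59, 1.89, 1.45, 1.56],
    3: [3.89, 3.68, 5.70, 5.62, 5.79, 5.33],
    4: [8.18, 5.64, 7.36, 5.33, 8.82, 5.26, 7.10],
    5: [5.86, 5.46, 5.69, 6.49, 7.81, 9.03, 7.49, 8.98],
}
MSE = 1.531755  # error mean square taken from the SAS output


def interpolate(x, x0, y0, x1, y1):
    """Linear interpolation between the points (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Tabulated q values at 24 and 30 error df, interpolated to 27 df.
q = interpolate(27, 24, 4.17, 30, 4.10)
print(f"interpolated q = {q:.3f}")

means = {g: sum(v) / len(v) for g, v in data.items()}
sizes = {g: len(v) for g, v in data.items()}

significant = []
for a, b in combinations(sorted(data), 2):
    n_star = min(sizes[a], sizes[b])      # sample size of the smaller cell
    hsd_star = q * sqrt(MSE / n_star)     # HSD* for this pair
    diff = abs(means[a] - means[b])
    if diff > hsd_star:
        significant.append((a, b))
    print(f"{a} vs {b}: |diff| = {diff:.4f}, HSD* = {hsd_star:.4f}")

print("pairs declared different:", significant)
```

The list of flagged pairs can then be set against the pairs marked *** in the SAS Tukey output for the same data.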

proc anova;
  class glu;
  model ins = glu;
  means glu / tukey;
run;

The ANOVA Procedure
Tukey's Studentized Range (HSD) Test for ins

NOTE: This test controls the Type I experimentwise error rate.

Alpha                                      0.05
Error Degrees of Freedom                     27
Error Mean Square                      1.531755
Critical Value of Studentized Range     4.13047

Comparisons significant at the 0.05 level are indicated by ***.

glu          Difference        Simultaneous 95%
Comparison   Between Means     Confidence Limits
5 - 4           0.2884         -1.5824    2.1592
5 - 3           2.0996          0.1474    4.0518   ***
5 - 1           4.4933          2.4325    6.5540   ***
5 - 2           4.5013          2.5491    6.4534   ***
4 - 5          -0.2884         -2.1592    1.5824
4 - 3           1.8112         -0.1999    3.8223
4 - 1           4.2049          2.0883    6.3214   ***
4 - 2           4.2129          2.2018    6.2239   ***
3 - 5          -2.0996         -4.0518   -0.1474   ***
3 - 4          -1.8112         -3.8223    0.1999
3 - 1           2.3937          0.2048    4.5825   ***
3 - 2           2.4017          0.3147    4.4886   ***
1 - 5          -4.4933         -6.5540   -2.4325   ***
1 - 4          -4.2049         -6.3214   -2.0883   ***
1 - 3          -2.3937         -4.5825   -0.2048   ***
1 - 2           0.0080         -2.1808    2.1968
2 - 5          -4.5013         -6.4534   -2.5491   ***
2 - 4          -4.2129         -6.2239   -2.2018   ***
2 - 3          -2.4017         -4.4886   -0.3147   ***
2 - 1          -0.0080         -2.1968    2.1808

Homework

Make the ANOVA tables for the following exercises using the formulae (you may use MS Excel), and compare your results with the results from SAS.

- Exercise 7.2.2
- Exercise 7.2.7

7.3 Randomized Complete Block Design

R. A. Fisher (1925): to compare the yields of certain species.

The land is divided into blocks (block = land), and randomization (of other factors) is carried out within each block.

[Table: data layout, blocks × treatments, with totals and averages for rows and columns]

Example 7.3.1: Number of days to learn how to use a dental device

[Table: age × teaching methods]

Model

$$x_{ij} = \mu + \beta_i + \tau_j + e_{ij}$$

- $\beta_i$: block effect
- $\tau_j$: treatment effect
- $e_{ij} = x_{ij} - (\mu + \beta_i + \tau_j) \sim N(0, \sigma^2)$

Hypotheses

$H_0: \tau_j = 0$, $j = 1, 2, \ldots, k$
$H_A$: "all $\tau_j = 0$" is not true, i.e. some $\tau_j \neq 0$

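$$SST = \sum_{j=1}^{k}\sum_{i=1}^{n}\left(x_{ij} - \bar{x}_{..}\right)^2$$

$$= \sum_{j=1}^{k}\sum_{i=1}^{n}\left(\bar{x}_{i.} - \bar{x}_{..}\right)^2 + \sum_{j=1}^{k}\sum_{i=1}^{n}\left(\bar{x}_{.j} - \bar{x}_{..}\right)^2 + \sum_{j=1}^{k}\sum_{i=1}^{n}\left(x_{ij} - \bar{x}_{i.} - \bar{x}_{.j} + \bar{x}_{..}\right)^2$$

SST = SSBl + SSTr + SSE

df: $nk - 1 = (n - 1) + (k - 1) + (n - 1)(k - 1)$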
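The partition SST = SSBl + SSTr + SSE can be verified numerically. The sketch below uses a small table of made-up numbers (hypothetical data, not taken from the slides or from any exercise), with rows as blocks and columns as treatments; all names are illustrative.

```python
# HYPOTHETICAL data for illustration only: rows are blocks (i = 1..n),
# columns are treatments (j = 1..k). These numbers are not from the slides.
x = [
    [12, 15, 14],
    [10, 14, 13],
    [11, 16, 15],
    [9, 12, 12],
]

n = len(x)       # number of blocks
k = len(x[0])    # number of treatments
grand = sum(map(sum, x)) / (n * k)
block_means = [sum(row) / k for row in x]
trt_means = [sum(x[i][j] for i in range(n)) / n for j in range(k)]

sst = sum((x[i][j] - grand) ** 2 for i in range(n) for j in range(k))
ssbl = k * sum((m - grand) ** 2 for m in block_means)   # blocks
sstr = n * sum((m - grand) ** 2 for m in trt_means)     # treatments
sse = sum((x[i][j] - block_means[i] - trt_means[j] + grand) ** 2
          for i in range(n) for j in range(k))

mstr = sstr / (k - 1)
mse = sse / ((n - 1) * (k - 1))

print(f"SST  = {sst:.4f}")
print(f"SSBl = {ssbl:.4f}, SSTr = {sstr:.4f}, SSE = {sse:.4f}")
print(f"SSBl + SSTr + SSE = {ssbl + sstr + sse:.4f}")
print(f"VR for treatments = MSTr / MSE = {mstr / mse:.4f}")
```

The same three component sums appear as the block, treatment and error rows of the ANOVA table for this design.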

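ANOVA table

Source       Sum of squares   df               Mean square                     Variance ratio
Treatments   SSTr             k - 1            MSTr = SSTr / (k - 1)           VR = MSTr / MSE
Blocks       SSBl             n - 1            MSBl = SSBl / (n - 1)
Error        SSE              (n - 1)(k - 1)   MSE = SSE / ((n - 1)(k - 1))
Total        SST              nk - 1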

Homework

- Exercise 7.3.4 (SAS)

7.4 Factorial Design

Reduction of response time = drug level (low, medium, high) × age group (middle-aged, elderly)

Without interaction

Factor A (age)         Factor B (drug level)
                       j=1    j=2    j=3
Middle-aged (i=1)        5     10     20
Elderly (i=2)           10     15     25

[Figure: reduction of response time plotted against drug level for each age group, and against age for each drug level]

With interaction

Factor A (age)         Factor B (drug level)     Differences
                       j=1    j=2    j=3         (j=2)-(j=1)   (j=3)-(j=2)
Middle-aged (i=1)        5     10     20              5            10
Elderly (i=2)           15     10      5             -5            -5
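The two tables of cell means can be compared numerically. In this sketch the interaction term of a cell is taken to be (cell mean) - (row mean) - (column mean) + (grand mean), which is zero for every cell exactly when the two factors act additively; the function name `interaction_terms` is an assumption made for illustration, and the cell means are the ones in the two tables above.

```python
# Cell means from the two slides: rows are age groups (i), columns are
# drug levels (j).
no_interaction = [[5, 10, 20], [10, 15, 25]]
with_interaction = [[5, 10, 20], [15, 10, 5]]


def interaction_terms(table):
    """Return cell - row mean - column mean + grand mean for every cell."""
    rows, cols = len(table), len(table[0])
    grand = sum(map(sum, table)) / (rows * cols)
    row_means = [sum(r) / cols for r in table]
    col_means = [sum(table[i][j] for i in range(rows)) / rows
                 for j in range(cols)]
    return [[table[i][j] - row_means[i] - col_means[j] + grand
             for j in range(cols)] for i in range(rows)]


for name, table in [("without interaction", no_interaction),
                    ("with interaction", with_interaction)]:
    print(name)
    for row in interaction_terms(table):
        print("  ", [round(v, 3) for v in row])
```

In the first table the age difference is the same at every drug level, so all interaction terms vanish; in the second the age difference changes sign across drug levels and the terms are far from zero.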

Two-factor completely randomized design

[Table: data layout, Factor A × Factor B]

Example 7.4.2: Length of a nurse's home visit = age of the nurse, disease of the patient

Model

$$x_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + e_{ijk}, \quad i = 1, \ldots, a; \; j = 1, \ldots, b; \; k = 1, \ldots, n$$

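Hypotheses

$H_0: \alpha_i = 0$, $i = 1, \ldots, a$
$H_A$: not $H_0$, i.e. $\alpha_i \neq 0$ for some $i$

$H_0: \beta_j = 0$, $j = 1, \ldots, b$
$H_A$: not $H_0$, i.e. $\beta_j \neq 0$ for some $j$

$H_0: (\alpha\beta)_{ij} = 0$, $i = 1, \ldots, a$; $j = 1, \ldots, b$
$H_A$: not $H_0$, i.e. $(\alpha\beta)_{ij} \neq 0$ for some $i, j$

SST = SSA + SSB + SSAB + SSE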
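The degrees of freedom partition in parallel with the sums of squares. The block below writes this out for the two-factor model; the numerical line assumes $a = b = 4$ and $n = 5$, values that are not stated in the slides but would be consistent with the degrees of freedom 3, 9, 15 and 64 that appear in the R calls in this section.

```latex
\begin{align*}
\underbrace{abn - 1}_{\text{total}}
  &= \underbrace{(a-1)}_{A} + \underbrace{(b-1)}_{B}
   + \underbrace{(a-1)(b-1)}_{AB} + \underbrace{ab(n-1)}_{\text{error}} \\
% Assumed values a = b = 4, n = 5 (not given in the source):
79 &= 3 + 3 + 9 + 64, \qquad
\text{treatments: } ab - 1 = 3 + 3 + 9 = 15
\end{align*}
```

The treatment degrees of freedom $ab - 1$ are the sum of the two main-effect terms and the interaction term, mirroring SSTr = SSA + SSB + SSAB.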

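[Table: ANOVA table with rows for the factors, treatments, error and total]

Critical values and p-values in R (the second and third arguments are the numerator and denominator degrees of freedom):

> qf(0.95, 3, 64)
[1] 2.748191
> qf(0.95, 9, 64)
[1] 2.029792
> qf(0.95, 15, 64)
[1] 1.825586
> 1 - pf(67.95, 3, 64)
[1] 0
> 1 - pf(27.27, 9, 64)
[1] 0
> 1 - pf(4.61, 15, 64)
[1] 7.473861e-06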

Homework

- Exercise 7.4.2 (SAS)
- Exercise 7.4.3 (SAS)

7.5 Miscellaneous

- Log transformation: when the normality assumption is violated (e.g. income, concentration).
- Nonparametric approach: when normality is still problematic even after the variable transformation, or when the sample size is too small to check normality.

One-Way ANOVA: Types of Sums of Squares

*  Type I: sequential (if we know the relative importance of the variables)
   Type II: partial, without interaction terms
** Type III: partial, with interactions (if we don't know the relative importance of the variables)
   Type IV: for designs with missing cells (if there are none, same as Type III)

*, **: defaults

Model: $Y_{ij} = \mu + A_i + \varepsilon_{ij}$

One-Way ANOVA, mod12.sas

/* File : mod12.sas
   To demonstrate one-way ANOVA */
filename in 'd:\intro\taillite.dat';

data one;
  infile in;
  input id vehtype group position speedzn resptime follotme folltmec;
  if group = 1;   /* keep only the observations with group = 1 */
run;

/* BY-group processing below needs the data sorted by vehtype */
proc sort; by vehtype;

proc means;
  var resptime;
  by vehtype;
  title 'Means of Response Time by Vehicle Type';
run;

proc gplot;
  plot resptime*vehtype;
  symbol i=box;   /* draw a box plot for each vehicle type */
  title 'Box Plot Response Time by Vehicle Type';
run;

proc anova;
  class vehtype;
  model resptime = vehtype;
  /* request several multiple-comparison procedures at once */
  means vehtype / tukey lines bon cldiff scheffe snk lsd;
  title 'One way ANOVA for Tail Light Study';
  title2;
run;

Two-Way ANOVA, mod13.sas

/* File : mod13.sas
   To demonstrate two-way ANOVA */
filename stiff 'd:\intro\dummy.dat';

data one;
  infile stiff;
  input species $ impactor $ stiff1 stiff2 calcium magnesm;
run;

/* block chart of the mean of stiff1 for each species, grouped by impactor */
proc gchart;
  block species / group=impactor sumvar=stiff1 type=mean;
  title 'Block Chart of Stiff1 by Impactor and Species';
run;

/* PROC ANOVA is intended for balanced data; PROC GLM is the usual choice
   when the cell sizes are unequal */
proc anova;
  class species impactor;
  model stiff1 = species impactor species*impactor;
  means species impactor / duncan lines;
  title 'Two way ANOVA Dummy Data';
run;