Chapter 11 Nonparametric and Distribution-Free Statistics


Chapter 12 Nonparametric Statistics (nonparametric analysis), 2017/6/5

9.1 Introduction
Parametric approach
* assumes a distribution for the population
* the distribution is a function of its parameters
* knowing the parameters determines the distribution (and hence the characteristics of the population) completely
* estimation and testing of the parameters are the main problems
* if the distributional assumption is wrong, the whole argument breaks down and all results of the analysis are questionable

Nonparametric approach
* does not assume a population distribution (distribution-free methods)
* uses the order (ranks) of the data
* if the parametric assumptions are valid, the parametric method is more efficient (smaller variance, hence more power and typically smaller p-values)

data               mean    median
1,2,3,4,5          3       3
1,2,3,4,5,100      19.17   3.5

The median is robust to outliers, whereas the mean is sensitive to them: the median stays at 3.5 even if 100 is replaced by 10000000. Nonparametric methods typically use the order of the data, not the values themselves.
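A small R check of the table above (a sketch, using only the two data sets on the slide plus the 10000000 replacement mentioned in the text): the outlier drags the mean but leaves the median and the ranks unchanged, which is why rank-based methods inherit this robustness.

```r
# One extreme value moves the mean, not the median
x  <- c(1, 2, 3, 4, 5)
x2 <- c(x, 100)        # add one outlier
x3 <- c(x, 10000000)   # make the outlier far more extreme

c(mean = mean(x),  median = median(x))    # 3 and 3
c(mean = mean(x2), median = median(x2))   # about 19.17 and 3.5
c(mean = mean(x3), median = median(x3))   # mean is huge, median is still 3.5

# Ranks only encode the order, so they are identical for x2 and x3
rank(x2)
rank(x3)
```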

Parametric vs. nonparametric methods
* Nonparametric methods do not assume a distribution (e.g., normality) for the data.
* They typically use ranks rather than the mean and variance.
* If the distributional assumptions (e.g., normality) hold, nonparametric methods are less efficient (larger variance).
* They give robust results (insensitive to outliers).

12.2 Measurement scales
* Nominal scale: categories only, e.g., male/female; Seoul/Busan (NY/LA)
* Ordinal scale: order is meaningful, e.g., high/medium/low
* Interval scale: order and absolute differences are meaningful
* Ratio scale: ratios are meaningful as well

12.3 Sign test
Ex 12.3.1 Scores of 15 students

No   Score     No   Score
1     75        9    82
2     90       10   103
3     86       11    88
4    110       12   124
5    115       13   110
6     94       14    77
7    132       15    99
8     74

Hypotheses: H0: median = 102, HA: median ≠ 102

Scores above (+) or below (-) the hypothesized median (102):

Student  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15
Sign     -  -  -  +  +  -  +  -  -  +   -   +   +   -   -

There are 6 + signs and 9 - signs.

Decision rules:
* HA: P(+) > P(-), i.e., median > 102: reject H0 if there are enough + signs
* HA: P(+) < P(-), i.e., median < 102: reject H0 if there are enough - signs
* HA: P(+) ≠ P(-), i.e., median ≠ 102: reject H0 if there are enough + signs or enough - signs

In Ex 12.3.1, H0: median = 102 and HA: P(+) ≠ P(-). Under H0, the number of + signs out of 15 follows Bin(15, 1/2).

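Test statistic:
$$P(K \le 6 \mid n = 15, p = 0.5) = \binom{15}{0}0.5^{0}\,0.5^{15} + \binom{15}{1}0.5^{1}\,0.5^{14} + \cdots + \binom{15}{6}0.5^{6}\,0.5^{9} = 0.3036$$
The alternative is two-sided, so p = 2 × 0.3036 = 0.6072 and we cannot reject H0.

[Sign test for paired comparisons] Use the sign (+ or -) of the difference within each pair. The sign test can thus be applied to paired observations (like a paired t-test).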
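The binomial calculation above can be reproduced in R as a quick check. This is a sketch: the variable names are illustrative, and `binom.test` is used only to confirm the hand computation of the two-sided p-value.

```r
# Sign test for H0: median = 102 (Ex 12.3.1), computed from the binomial distribution
score <- c(75, 90, 86, 110, 115, 94, 132, 74, 82, 103, 88, 124, 110, 77, 99)
m0 <- 102

d <- score - m0
d <- d[d != 0]               # observations equal to m0 carry no sign and are dropped
n <- length(d)               # 15
n_plus <- sum(d > 0)         # number of + signs: 6

# P(K <= 6) for K ~ Bin(15, 0.5), written out term by term
p_lower <- sum(choose(n, 0:n_plus) * 0.5^(0:n_plus) * 0.5^(n - 0:n_plus))
# Two-sided p-value: double the smaller tail (the null distribution is symmetric)
p_upper <- pbinom(n_plus - 1, n, 0.5, lower.tail = FALSE)   # P(K >= 6)
p_two   <- min(1, 2 * min(p_lower, p_upper))

c(one_sided = p_lower, two_sided = p_two)
binom.test(n_plus, n, p = 0.5)$p.value   # exact binomial test, same two-sided value
```

The one-sided value 0.3036 and the two-sided value 0.6072 match the SAS sign-test output shown next.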

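data sign;
  input score @@;
  datalines;
75 90 86 110 115 94 132 74 82 103 88 124 110 77 99
;
run;

/* Tests for location H0: mu0 = 102 (the sign test is among them) */
proc univariate data=sign mu0=102;
  var score;
run;

Sign test p-value: two-sided 0.6072; one-sided 0.6072/2 = 0.3036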

Ex 12.3.2 (comparing paired groups) Dental hygiene scores

id   Instructed in tooth brushing (X_i)   Not instructed (Y_i)
1    1.6                                  2
2    2                                    2
3    3.7                                  4.1
4    3.5                                  2.4
5    3.3                                  4.2
6    2.4                                  3.6
7    2                                    3.5
8    1.5                                  3
9    1.5                                  2.5
10   2.1                                  2.5
11   3.6                                  2.5
12   2.3                                  2.5

Hypotheses
H0: the median of the differences X_i - Y_i is 0, i.e., P(+) = P(-)
HA: the median of the differences is negative, i.e., P(+) < P(-)

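Test statistic: the number of + signs.

id          1  2  3  4  5  6  7  8  9  10  11  12
X_i - Y_i   -  0  -  +  -  -  -  -  -  -   +   -

Dropping the zero difference (id 2) leaves n = 11 pairs, with 2 + signs.
$$P(K \le 2 \mid n = 11, p = 0.5) = \sum_{r=0}^{2} \binom{11}{r} 0.5^{r}\, 0.5^{11-r}$$
pbinom(2, 11, 0.5) = 0.0327 < 0.05, so H0 is rejected at α = 0.05.

[Sign test using the right tail]
[Sample size]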
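A minimal R sketch of the paired sign test, using the X_i and Y_i values from the table in Ex 12.3.2. It drops the zero difference, counts the + signs, and evaluates the lower binomial tail; `binom.test(..., alternative = "less")` gives the same one-sided p-value.

```r
# Paired sign test (Ex 12.3.2): HA is that the median of X - Y is negative
x <- c(1.6, 2, 3.7, 3.5, 3.3, 2.4, 2, 1.5, 1.5, 2.1, 3.6, 2.3)  # instructed
y <- c(2, 2, 4.1, 2.4, 4.2, 3.6, 3.5, 3, 2.5, 2.5, 2.5, 2.5)    # not instructed

d <- x - y
d <- d[d != 0]          # drop the tie (id 2)
n <- length(d)          # 11 informative pairs
k <- sum(d > 0)         # number of + signs: 2

pbinom(k, n, 0.5)                                # one-sided p-value, about 0.0327
binom.test(k, n, p = 0.5, alternative = "less")  # same p-value, plus a confidence interval
```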

data pair;
  input edu noedu;
  diff = noedu - edu;   /* Y - X, so the signs are reversed relative to X - Y */
  datalines;
1.5 2.0
2.0 2.0
3.5 4.0
3.0 2.5
3.5 4.0
2.5 3.0
2.0 3.5
1.5 3.0
1.5 2.5
2.0 2.5
3.0 2.5
2.0 2.5
;
run;

proc univariate data=pair;
  var diff;
run;

Sign test p-value: two-sided 0.0654; one-sided 0.0654/2 = 0.0327

12.4 Wilcoxon signed-rank test for location
H0: median = 5.05, HA: median ≠ 5.05

obs    d_i = x_i - 5.05   rank of |d_i|   signed rank
4.90   -0.15               1               -1
4.10   -0.95               7               -7
6.73    1.68               9                9
7.27    2.22              13               13
7.42    2.37              14               14
7.50    2.45              15               15
6.76    1.71              10               10
4.64   -0.41               3               -3
5.98    0.93               6                6
3.14   -1.91              12              -12
3.24   -1.81              11              -11
5.80    0.75               5                5
6.17    1.12               8                8
5.39    0.34               2                2
5.78    0.73               4                4

W+ = 86, W- = 34. Test statistic: W = W+ - W- = 52. Reject H0 if W is too large or too small.

> wilcox.test(c(4.90,4.1,6.73,7.27,7.42,7.5,6.76,4.64,5.98,3.14,3.24,5.8,6.17,5.39,5.78), mu=5.05)
p-value = 0.1514, so H0 is not rejected at α = 0.05.
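The table above can be rebuilt in a few lines of R. This sketch ranks |d_i|, splits the rank sum by sign, and forms W = W+ - W-; R's `wilcox.test` reports W+ as its statistic V, which serves as a cross-check.

```r
x <- c(4.90, 4.1, 6.73, 7.27, 7.42, 7.5, 6.76, 4.64, 5.98, 3.14, 3.24, 5.8, 6.17, 5.39, 5.78)
d <- x - 5.05                  # deviations from the hypothesized center
r <- rank(abs(d))              # ranks of |d_i|
data.frame(x, d = round(d, 2), rank = r, signed_rank = sign(d) * r)

W_plus  <- sum(r[d > 0])       # 86
W_minus <- sum(r[d < 0])       # 34
W       <- W_plus - W_minus    # 52

c(W_plus = W_plus, W_minus = W_minus, W = W)
wilcox.test(x, mu = 5.05)      # its statistic V equals W_plus
```

Since W+ + W- always equals n(n+1)/2 = 120 here, W+ alone carries the same information as W, which is why R reports only V.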

12.5 Median test
H0: median(rural) = median(urban)

Mental health scores
Urban (16 subjects): 35, 26, 27, 21, 27, 47, 46, 32, 25, 27, 45, 46, 33, 23, 25, 41
Rural (12 subjects): 29, 50, 43, 22, 42, 26, 50, 37, 34, 31, 38, 42
The median of all 28 scores is 33.5.

                                 Urban   Rural   Total
Number of values above median      6       8      14
Number of values below median     10       4      14
Total                             16      12      28

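Under H0, the rows and columns of the 2×2 table are independent. With a = 6, b = 8, c = 10, d = 4:
$$\chi^2 = \frac{n(ad - bc)^2}{(a+c)(b+d)(a+b)(c+d)} = \frac{28(6 \cdot 4 - 8 \cdot 10)^2}{16 \cdot 12 \cdot 14 \cdot 14} = 2.33$$
Since 2.33 < 3.841 (the 5% critical value of χ² with 1 df) and also 2.33 < 2.706 (the 10% critical value), p > 0.10. We do not reject H0: there is no evidence that the medians of the two groups differ.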
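A sketch of the whole median test in R, assuming the flattened score table is read as 16 urban and 12 rural scores (the lists below; this reading reproduces the slide's counts 6/10 and 8/4). It counts values above and below the pooled median and applies the 2×2 chi-square formula; `chisq.test(..., correct = FALSE)` is only a cross-check.

```r
urban <- c(35, 26, 27, 21, 27, 47, 46, 32, 25, 27, 45, 46, 33, 23, 25, 41)
rural <- c(29, 50, 43, 22, 42, 26, 50, 37, 34, 31, 38, 42)

med <- median(c(urban, rural))          # pooled median: 33.5

# 2x2 table: rows = above / below the pooled median, columns = urban / rural
tab <- rbind(above = c(urban = sum(urban > med), rural = sum(rural > med)),
             below = c(urban = sum(urban < med), rural = sum(rural < med)))

a  <- tab["above", "urban"]; b <- tab["above", "rural"]
c_ <- tab["below", "urban"]; d <- tab["below", "rural"]
n  <- sum(tab)

chi2 <- n * (a * d - b * c_)^2 / ((a + c_) * (b + d) * (a + b) * (c_ + d))
chi2                                        # about 2.33
pchisq(chi2, df = 1, lower.tail = FALSE)    # p-value, about 0.13
chisq.test(tab, correct = FALSE)$statistic  # same statistic, as a cross-check
```

No score equals the pooled median here; with data that do, a convention for such values (e.g., dropping them) would have to be chosen.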

12.6 Mann-Whitney test
Assumptions (sample sizes n and m for the two groups):
1. The samples are drawn independently and at random.
2. The measurements are at least on an ordinal scale.
3. The two populations have exactly the same shape and differ only in their medians.

Ex 12.6.1 Weight
Group 1 (X, m = 18): 252, 240, 205, 200, 170, 170, 320, 148, 214, 254, 164, 288, 138, 240, 217, 240, 302, 312
Group 2 (Y, n = 16): 185, 310, 212, 238, 184, 136, 200, 270, 280, 264, 270, 210, 192, 126, 220, 295

Is the population median of group 1 smaller than that of group 2?
H0: M_X ≥ M_Y vs HA: M_X < M_Y

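Ranks in the combined sample (tied values share the average rank):
126 (1), 136 (2), 138 (3), 148 (4), 164 (5), 170 (6.5), 170 (6.5), 184 (8), 185 (9), 192 (10), 200 (11.5), 200 (11.5), 205 (13), 210 (14), 212 (15), 214 (16), 217 (17), 220 (18), 238 (19), 240 (21), 240 (21), 240 (21), 252 (23), 254 (24), 264 (25), 270 (26.5), 270 (26.5), 280 (28), 288 (29), 295 (30), 302 (31), 310 (32), 312 (33), 320 (34)

Rank sum of X: W = 319.5
$$U = W - \frac{m(m+1)}{2} = 319.5 - \frac{18(18+1)}{2} = 148.5$$
Rule: reject H0 if U is small enough. Here U = 148.5 is not small; it even exceeds its null expectation mn/2 = 18 · 16/2 = 144. The evidence is not enough to reject H0.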
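The rank sum and U can be checked in R. The grouping below is an assumption about how the flattened weight table splits into the two groups (first 18 values = X, last 16 = Y); it is the split that reproduces the slide's W = 319.5. `wilcox.test` reports the same U as its statistic W.

```r
# Assumed grouping of the flattened table (it reproduces W = 319.5 on the slide)
x <- c(252, 240, 205, 200, 170, 170, 320, 148, 214,
       254, 164, 288, 138, 240, 217, 240, 302, 312)   # group 1 (X), m = 18
y <- c(185, 310, 212, 238, 184, 136, 200, 270,
       280, 264, 270, 210, 192, 126, 220, 295)        # group 2 (Y), n = 16

m <- length(x); n <- length(y)
r <- rank(c(x, y))                 # ranks in the combined sample, ties get midranks
W <- sum(r[seq_len(m)])            # rank sum of X
U <- W - m * (m + 1) / 2           # Mann-Whitney U

c(W = W, U = U, null_mean = m * n / 2)

# Normal approximation with tie and continuity corrections; HA: M_X < M_Y
wilcox.test(x, y, alternative = "less", exact = FALSE)
```

`exact = FALSE` is chosen because the data contain ties, for which R cannot compute the exact null distribution.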

install.packages('coin')   # only needed once
library(coin)
xx <- c(252, 240, 205, 200, 170, 170, 320, 148, 214,
        254, 164, 288, 138, 240, 217, 240, 302, 312)   # group 1 (X), 18 values
yy <- c(185, 310, 212, 238, 184, 136, 200, 270,
        280, 264, 270, 210, 192, 126, 220, 295)        # group 2 (Y), 16 values
dat <- data.frame(val = c(xx, yy), group = factor(rep(1:2, c(18, 16))))
wilcox_test(val ~ group, data = dat, distribution = 'exact')
# Output: "Exact Wilcoxon-Mann-Whitney Test, data: val by group (1, 2)" with the
# standardized Z statistic and the exact p-value;
# alternative hypothesis: true mu is not equal to 0

11.6 Kolmogorov-Smirnov (K-S) goodness-of-fit test
Is the sample consistent with a hypothesized cumulative distribution? (The two-sample version asks whether two populations have the same distribution.)

H0: F_S(x) = F_T(x) for all x
HA: F_S(x) ≠ F_T(x) for at least one x

\hat F_S(x): sample cumulative distribution function, the proportion of sample values ≤ x
F_T(x): hypothesized (population) cumulative distribution function, Pr(X ≤ x)

Test statistic:
$$D = \sup_x \left| \hat F_S(x) - F_T(x) \right|$$

Computation, Ex 11.6.1: Is fasting blood glucose normally distributed, with mean 80 and standard deviation 6?

Data (n = 36): 75 92 80 80 83 72 83 77 81 77 75 81 80 92 72 77 78 76 77 86 77 92 80 78 67 78 92 67 80 81 87 76 80 87 77 86

x     frequency   cumulative frequency   F_S(x)
67    2            2                     0.0556
72    2            4                     0.1111
75    2            6                     0.1667
76    2            8                     0.2222
77    6           14                     0.3889
78    3           17                     0.4722
80    6           23                     0.6389
81    3           26                     0.7222
83    2           28                     0.7778
86    2           30                     0.8333
87    2           32                     0.8889
92    4           36                     1.0000
Total 36

x     z = (x - 80)/6   F_T(x)   F_S(x)   |F_S(x) - F_T(x)|
67    -2.17            0.0150   0.0556   0.0406
72    -1.33            0.0918   0.1111   0.0193
75    -0.83            0.2033   0.1667   0.0366
76    -0.67            0.2514   0.2222   0.0292
77    -0.50            0.3085   0.3889   0.0804
78    -0.33            0.3707   0.4722   0.1015
80     0.00            0.5000   0.6389   0.1389
81     0.17            0.5675   0.7222   0.1547
83     0.50            0.6915   0.7778   0.0863
86     1.00            0.8413   0.8333   0.0080
87     1.17            0.8790   0.8889   0.0099
92     2.00            0.9772   1.0000   0.0228

D = 0.1547 < 0.221 (two-sided critical value for n = 36, α = 0.05), so H0 is not rejected. (F_T uses z rounded to two decimals; with unrounded z the largest difference, at x = 81, is 0.1560.)
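A short R sketch of the K-S computation for the glucose data: it evaluates the sample CDF at each distinct value (and just below it, since the supremum can also occur on the left of a jump) against the hypothesized N(80, 6²) CDF. `ks.test` is used only as a cross-check; it warns about ties, so the warning is suppressed here.

```r
xx <- c(75,92,80,80,83,72,83,77,81,77,75,81,80,92,72,77,78,76,77,86,77,92,80,78,
        67,78,92,67,80,81,87,76,80,87,77,86)
x  <- sort(unique(xx))
Fs <- ecdf(xx)(x)                     # sample CDF at each distinct value
Fs_left <- c(0, head(Fs, -1))         # sample CDF just below each value
Ft <- pnorm(x, mean = 80, sd = 6)     # hypothesized N(80, 6^2) CDF

data.frame(x, Fs = round(Fs, 4), Ft = round(Ft, 4), diff = round(abs(Fs - Ft), 4))
D <- max(abs(Fs - Ft), abs(Fs_left - Ft))
D                                     # about 0.156, attained at x = 81

suppressWarnings(ks.test(xx, "pnorm", mean = 80, sd = 6))$statistic
```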

> xx <- c(75,92,80,80,83,72,83,77,81,77,75,81,80,92,72,77,78,76,77,86,77,92,80,78,
+         67,78,92,67,80,81,87,76,80,87,77,86)
> ks.test(xx, 'pnorm', mean = 80, sd = 6)

        One-sample Kolmogorov-Smirnov test
data:  xx
D = 0.15604, p-value = 0.3447
alternative hypothesis: two-sided

Warning message:
In ks.test(xx, "pnorm", mean = 80, sd = 6) :
  ties should not be present for the Kolmogorov-Smirnov test
Because the data contain ties, an approximate p-value is used.

Table of K-S critical values: http://www.mathematik.unikl.de/~schwaar/exercises/tabellen/table_kolmogorov.pdf

12.8 Kruskal-Wallis one-way ANOVA
Hypotheses
H0: the k samples come from the same distribution.
HA: at least one sample comes from a distribution with a larger or smaller location than the others.

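Under H0, the rank sums R_1, R_2, ..., R_k of the groups should be similar (relative to the group sizes). The statistic is essentially a weighted sum of squared deviations of the group mean ranks from the overall mean rank:
$$H = \frac{12}{n(n+1)} \sum_{j=1}^{k} n_j \left( \bar R_j - \frac{n+1}{2} \right)^2, \qquad \bar R_j = \frac{R_j}{n_j}$$
If the R_j are similar, this sum is small, so H is small and we cannot reject H0.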

Ex 12.8.1
$$H = \frac{12}{n(n+1)} \sum_{j=1}^{k} \frac{R_j^2}{n_j} - 3(n+1) \;\sim\; \chi^2_{k-1} \ \text{(approximately)}$$

Original values
A       B       C
12.01   3.67    55.63
29.44   4.05    27.88
28.02   6.49    66.81
38.33   21.12   46.27
55.91   1.11    31.19

Ranks
A    B    C
5    2    13
9    3    7
8    4    15
11   6    12
14   1    10
R_j: 47   16   57

$$H = \frac{12}{15(16)} \left( \frac{47^2}{5} + \frac{16^2}{5} + \frac{57^2}{5} \right) - 3(15 + 1) = 9.14$$
From the exact table (page 486), p < 0.009.
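The hand computation of H can be mirrored in R as a sketch: rank the pooled values, sum the ranks by group, and plug them into the formula. Group labels A/B/C follow the `rep(1:3, 5)` layout used in the R call on the slides; `kruskal.test` is a cross-check (no ties here, so no tie correction applies).

```r
xx <- c(12.01, 3.67, 55.63, 29.44, 4.05, 27.88, 28.02, 6.49, 66.81,
        38.33, 21.12, 46.27, 55.91, 1.11, 31.19)
group <- factor(rep(c("A", "B", "C"), 5))

r  <- rank(xx)
Rj <- tapply(r, group, sum)      # rank sums: A = 47, B = 16, C = 57
nj <- tapply(r, group, length)   # 5 per group
n  <- length(xx)

H <- 12 / (n * (n + 1)) * sum(Rj^2 / nj) - 3 * (n + 1)
H                                                  # 9.14
pchisq(H, df = nlevels(group) - 1, lower.tail = FALSE)   # about 0.0104
kruskal.test(xx, group)$statistic                  # same value
```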

> xx <- c(12.01,3.67,55.63,29.44,4.05,27.88,28.02,6.49,66.81,38.33,21.12,46.27,55.91,1.11,31.19)
> dat <- data.frame(val = xx, group = factor(rep(1:3, 5)))
> library(coin)
> kruskal_test(val ~ group, data = dat)

        Asymptotic Kruskal-Wallis Test
data:  val by group (1, 2, 3)
chi-squared = 9.14, df = 2, p-value = 0.01036

(Base R's kruskal.test(val ~ group, data = dat) gives the same chi-squared and p-value.)

Ex 12.8.2 Treatment cost per bed by drug type (A to E); ranks in parentheses

A (n = 10): 17.38 (11), 15.20 (2), 14.76 (1), 16.88 (7), 17.02 (10), 26.67 (17), 15.75 (4), 16.02 (5), 15.30 (3), 16.98 (8); R_1 = 68
B (n = 8): 52.59 (35), 44.55 (28), 44.80 (29), 43.25 (27), 50.75 (32), 52.25 (34), 46.13 (30), 48.87 (31); R_2 = 246
C (n = 9): 27.87 (20), 24.00 (12), 26.55 (16), 25.00 (13), 27.55 (19), 25.92 (14), 26.01 (15), 16.48 (6), 17.00 (9); R_3 = 124
D (n = 7): 34.55 (26), 31.15 (22), 30.50 (21), 31.25 (23), 32.75 (24), 33.00 (25), 27.30 (18); R_4 = 159
E (n = 7): 60.77 (40), 59.99 (38), 58.94 (37), 57.05 (36), 60.50 (39), 61.50 (41), 51.10 (33); R_5 = 264

$$H = \frac{12}{41(41+1)} \left( \frac{68^2}{10} + \frac{246^2}{8} + \frac{124^2}{9} + \frac{159^2}{7} + \frac{264^2}{7} \right) - 3(41 + 1) = 36.39$$
p-value: pchisq(36.39, 4, lower.tail = FALSE) = 2.4 × 10^-7

> val <- c(17.38,15.20,14.76,16.88,17.02,26.67,15.75,16.02,15.30,16.98,
+          52.59,44.55,44.80,43.25,50.75,52.25,46.13,48.87,
+          27.87,24.00,26.55,25.00,27.55,25.92,26.01,16.48,17.00,
+          34.55,31.15,30.50,31.25,32.75,33.00,27.30,
+          60.77,59.99,58.94,57.05,60.50,61.50,51.10)
> group <- factor(rep(c('a','b','c','d','e'), c(10, 8, 9, 7, 7)))
> dat <- data.frame(val, group)
> kruskal.test(val ~ group, data = dat)

        Kruskal-Wallis rank sum test
data:  val by group
Kruskal-Wallis chi-squared = 36.394, df = 4, p-value = 2.401e-07

12.9 Friedman two-way ANOVA
Ex 12.9.1 Physical therapists' rankings of three low-volt electrical stimulators

Therapist   Device A   Device B   Device C
1           2          3          1
2           2          3          1
3           2          3          1
4           1          3          2
5           3          2          1
6           1          2          3
7           2          3          1
8           1          3          2
9           1          3          2
R_j         15         25         14

H0: the three devices perform equally.
HA: at least one device performs differently.

$$\chi_r^2 = \frac{12}{9 \cdot 3 \cdot (3+1)} \left[ 15^2 + 25^2 + 14^2 \right] - 3(9)(3+1) = 8.222$$
From Table B(a), p = 0.016, so H0 is rejected at significance level 0.05.

> val <- c(2,3,1, 2,3,1, 2,3,1, 1,3,2, 3,2,1, 1,2,3, 2,3,1, 1,3,2, 1,3,2)
> group <- factor(rep(1:3, 9))
> id <- factor(rep(1:9, each = 3))
> friedman.test(val, group, id)

        Friedman rank sum test
data:  val, group and id
Friedman chi-squared = 8.2222, df = 2, p-value = 0.01639
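A sketch of the Friedman statistic computed directly from the rank matrix (rows = therapists as blocks, columns = devices). Passing the matrix to `friedman.test` is an alternative to the (y, groups, blocks) call above and should give the same statistic.

```r
rk <- matrix(c(2,3,1, 2,3,1, 2,3,1, 1,3,2, 3,2,1, 1,2,3, 2,3,1, 1,3,2, 1,3,2),
             ncol = 3, byrow = TRUE, dimnames = list(NULL, c("A", "B", "C")))
b  <- nrow(rk)                     # 9 blocks (therapists)
k  <- ncol(rk)                     # 3 treatments (devices)
Rj <- colSums(rk)                  # 15 25 14

chi2r <- 12 / (b * k * (k + 1)) * sum(Rj^2) - 3 * b * (k + 1)
chi2r                                          # 8.222
pchisq(chi2r, df = k - 1, lower.tail = FALSE)  # about 0.0164
friedman.test(rk)                              # matrix form: rows are blocks
```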

12.10 Spearman rank correlation coefficient
Two-sided test
H0: X and Y are independent. HA: X and Y are not independent.
One-sided tests
H0: X and Y are independent. HA: X and Y are positively associated (directly proportional).
H0: X and Y are independent. HA: X and Y are negatively associated (inversely proportional).

Ex 12.10

id   X     Y        id   X     Y
1    500   525      10   50    60
2    475   130      11   175   105
3    390   325      12   130   148
4    325   190      13   76    75
5    325   90       14   200   250
6    205   295      15   174   102
7    200   180      16   201   151
8    75    74       17   125   130
9    230   420

id   rank(X)   rank(Y)   d_i    d_i²
1    17        17         0.0    0.00
2    16        7.5        8.5   72.25
3    15        15         0.0    0.00
4    13.5      12         1.5    2.25
5    13.5      4          9.5   90.25
6    11        14        -3.0    9.00
7    8.5       11        -2.5    6.25
8    2         2          0.0    0.00
9    12        16        -4.0   16.00
10   1         1          0.0    0.00
11   7         6          1.0    1.00
12   5         9         -4.0   16.00
13   3         3          0.0    0.00
14   8.5       13        -4.5   20.25
15   6         5          1.0    1.00
16   10        10         0.0    0.00
17   4         7.5       -3.5   12.25
Σd_i² = 246.5

Test procedure
1. Rank X and Y separately.
2. Compute d_i = rank(X_i) - rank(Y_i).
3. Compute Σd_i².

$$r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6(246.5)}{17(17^2 - 1)} = 0.698 > 0.4853 \ \text{(Table C)}$$

With a negative association the d_i are large, so Σd_i² is large and r_s is small; with a positive association Σd_i² is small and r_s is large. A sufficiently large r_s leads to rejecting H0 (independence): we conclude that X and Y are positively associated.
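The three steps above translate directly into R; this is a sketch using the X and Y values of Ex 12.10. With ties (X has two pairs, Y one pair), `cor(..., method = "spearman")` computes the Pearson correlation of the midranks, so it can differ slightly from the 1 - 6Σd²/(n(n²-1)) shortcut used on the slide.

```r
X <- c(500, 475, 390, 325, 325, 205, 200, 75, 230, 50, 175, 130, 76, 200, 174, 201, 125)
Y <- c(525, 130, 325, 190, 90, 295, 180, 74, 420, 60, 105, 148, 75, 250, 102, 151, 130)

d  <- rank(X) - rank(Y)                   # steps 1 and 2: rank separately, take differences
n  <- length(X)
sum(d^2)                                  # step 3: 246.5
rs <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))
rs                                        # about 0.698

cor(X, Y, method = "spearman")            # Pearson r of the midranks (tie-aware)
```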

Ex 12.10.2 (n > 30)

id   Age (X)   Mineral concentration (Y)      id   Age (X)   Mineral concentration (Y)
1    82        169.62                         19   50        4.48
2    85        48.94                          20   71        46.93
3    83        41.16                          21   54        30.91
4    64        63.95                          22   62        34.27
5    82        21.09                          23   47        41.44
6    53        5.40                           24   66        109.88
7    26        6.33                           25   34        2.78
8    47        4.26                           26   46        4.17
9    37        3.62                           27   27        6.57
10   49        4.82                           28   54        61.73
11   65        108.22                         29   72        47.59
12   40        10.20                          30   41        10.46
13   32        2.69                           31   35        3.06
14   50        6.16                           32   75        49.57
15   62        23.87                          33   50        5.55
16   33        2.70                           34   76        50.23
17   36        3.15                           35   28        6.81
18   53        60.59

Ex 12.10.2 (n > 30), ranks

id   rank(X)  rank(Y)  d_i     d_i²        id   rank(X)  rank(Y)  d_i     d_i²
1    32.5     35       -2.5    6.25        19   17       9         8.0    64.00
2    35       27        8.0    64.00       20   28       25        3.0    9.00
3    34       23       11.0    121.00      21   21.5     21        0.5    0.25
4    25       32       -7.0    49.00       22   23.5     22        1.5    2.25
5    32.5     19       13.5    182.25      23   13.5     24      -10.5    110.25
6    19.5     11        8.5    72.25       24   27       34       -7.0    49.00
7    1        14      -13.0    169.00      25   6        3         3.0    9.00
8    13.5     8         5.5    30.25       26   12       7         5.0    25.00
9    9        6         3.0    9.00        27   2        15      -13.0    169.00
10   15       10        5.0    25.00       28   21.5     31       -9.5    90.25
11   26       33       -7.0    49.00       29   29       26        3.0    9.00
12   10       17       -7.0    49.00       30   11       18       -7.0    49.00
13   4        1         3.0    9.00        31   7        4         3.0    9.00
14   17       13        4.0    16.00       32   30       28        2.0    4.00
15   23.5     20        3.5    12.25       33   17       12        5.0    25.00
16   5        2         3.0    9.00        34   31       29        2.0    4.00
17   8        5         3.0    9.00        35   3        16      -13.0    169.00
18   19.5     30      -10.5    110.25
Σd_i² = 1788.5

$$r_s = 1 - \frac{6(1788.5)}{35(35^2 - 1)} = 0.75, \qquad Z = r_s \sqrt{n - 1} = 0.75\sqrt{34} = 4.37 > 1.96$$
so H0 is rejected.

A large Σd_i² (negative association) gives a small r_s and a large negative Z; a small Σd_i² (positive association) gives a large r_s and a large positive Z. For the two-sided test at α = 0.05, reject H0 if |Z| > 1.96.
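For n > 30 the slide switches to the normal approximation Z = r_s √(n - 1). A sketch in R with the age and mineral-concentration data of Ex 12.10.2 (entered in id order 1 to 35):

```r
age <- c(82, 85, 83, 64, 82, 53, 26, 47, 37, 49, 65, 40, 32, 50, 62, 33, 36, 53,
         50, 71, 54, 62, 47, 66, 34, 46, 27, 54, 72, 41, 35, 75, 50, 76, 28)
mineral <- c(169.62, 48.94, 41.16, 63.95, 21.09, 5.40, 6.33, 4.26, 3.62, 4.82,
             108.22, 10.20, 2.69, 6.16, 23.87, 2.70, 3.15, 60.59, 4.48, 46.93,
             30.91, 34.27, 41.44, 109.88, 2.78, 4.17, 6.57, 61.73, 47.59, 10.46,
             3.06, 49.57, 5.55, 50.23, 6.81)

n  <- length(age)                         # 35
d  <- rank(age) - rank(mineral)           # tied ages receive midranks
rs <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))
z  <- rs * sqrt(n - 1)                    # large-sample test statistic

c(sum_d2 = sum(d^2), rs = rs, z = z)
2 * pnorm(abs(z), lower.tail = FALSE)     # two-sided p-value
```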

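12.11 Nonparametric regression
Ex 12.11.1 [Theil's method]

Slope:
$$\hat\beta_1 = \operatorname{median}\{ S_{ij} : 1 \le i < j \le n \}, \qquad S_{ij} = \frac{y_j - y_i}{x_j - x_i}$$
For example,
$$S_{12} = \frac{164 - 163}{57.4 - 53.9} = 0.286$$

Testosterone (Y):   163    164    156    151    152    167    165    153    155
Citric acid (X):    53.9   57.4   41.0   40.0   42.0   64.4   59.1   49.9   43.2

The 36 pairwise slopes S_ij, in increasing order:
-4.000  -0.455  -0.337  -0.299  0.127  0.202  0.286  0.377  0.381  0.385  0.429  0.470
 0.488   0.497   0.500   0.543  0.566  0.588  0.629  0.634  0.656  0.670  0.733  0.747
 0.748   0.760   0.779   0.863  0.924  0.966  1.250  1.304  1.467  2.500  2.500  5.000
The median is the average of the 18th and 19th values: β̂1 = (0.588 + 0.629)/2 ≈ 0.609.

Estimating the intercept:
$$\hat\beta_0 = \operatorname{median}\{ y_1 - \hat\beta_1 x_1, \ldots, y_n - \hat\beta_1 x_n \}$$
or the median of the pairwise means of these values:
$$\hat\beta_0 = \operatorname{median}\left\{ \tfrac{1}{2}\left[ (y_i - \hat\beta_1 x_i) + (y_j - \hat\beta_1 x_j) \right] : i \le j \right\}$$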
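Theil's estimator is short enough to write out in base R. This sketch forms all 36 pairwise slopes with `combn`, takes their median, and uses the first intercept estimator above (the median of y_i - β̂1 x_i). No two x values coincide here, so no slope is undefined.

```r
y <- c(163, 164, 156, 151, 152, 167, 165, 153, 155)            # testosterone
x <- c(53.9, 57.4, 41.0, 40.0, 42.0, 64.4, 59.1, 49.9, 43.2)   # citric acid

ij <- combn(length(x), 2)                              # all index pairs i < j (2 x 36)
S  <- (y[ij[2, ]] - y[ij[1, ]]) / (x[ij[2, ]] - x[ij[1, ]])

length(S)                       # 36 pairwise slopes
round(sort(S), 3)
b1 <- median(S)                 # Theil slope, about 0.609
b0 <- median(y - b1 * x)        # intercept: median of y_i - b1 * x_i, about 128.71
c(b0 = b0, b1 = b1)
```

Because the slope is a median, the single extreme pair slope of 5.000 (or -4.000) has no more influence than any other pair, unlike in least squares.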

Mod20.sas

/* File name: mod20.sas
   Nonparametric one-way ANOVA */
options pageno=1 nodate ls=130 ps=60 nocenter;
filename inbrakes 'd:\myweb\intro\taillite.dat';

data one;
  infile inbrakes;
  input id vehtype group positn speedzn resptime follotme folltmec;
  if group=1;
  label vehtype='Vehicle Type'
        group='Group - Light On=1 Light Off=2'
        positn='Light Position'
        speedzn='Speed Zone'
        resptime='Response Time'
        follotme='Following Time in Video Frames'
        folltmec='Following Time in Categories';
run;

proc sort; by vehtype; run;

/* One-way ANOVA to see the effect of vehicle type */
proc anova;
  class vehtype;
  model resptime=vehtype;
  title 'Parametric ANOVA analysis';
run;

/* What's wrong with this? We didn't check the normality assumption.
   Use proc univariate to check normality within each vehicle type. */
proc univariate normal plot;
  var resptime;
  by vehtype;
  title 'Normality Check';
run;

/* NOT NORMALLY DISTRIBUTED -> NONPARAMETRIC ANOVA
   (Wilcoxon scores for more than two groups give the Kruskal-Wallis test) */
proc npar1way wilcoxon;
  class vehtype;
  var resptime;
  title 'Nonpara One-Way ANOVA for Tail Light Study';
run;

/* The other way is a transformation: take logs so that the response
   is closer to normally distributed. */
data t;
  set one;
  t=log(resptime);
  label t='ln (response time)';
run;

proc sort; by vehtype; run;
proc univariate normal plot;
  var t;
  by vehtype;
  title 'Normality Check for transformed variable';
run;

/* The transformed variable seems to be normally distributed,
   so we can do a parametric ANOVA under the normality assumption. */
proc anova;
  class vehtype;
  model t=vehtype;
  title 'ANOVA for the log transformed response time';
run;

Nonparametric Smoothing (1)
Smoothing: consider an X-Y scatter plot and draw a regression curve that requires no parametric assumptions.
* The regression curve need not be linear.
* The regression curve is determined entirely by the data.
Two components of smoothing:
* Kernel function: how the weighted mean is calculated.
* Bandwidth: the width of the window (span); it determines the smoothness of the regression curve. The wider the window, the smoother the curve.
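To make the two components concrete, here is a sketch of a Nadaraya-Watson kernel smoother in R: at each point the fit is a kernel-weighted mean of y, and the bandwidth h sets the window width. The helper `kernel_smooth` is hypothetical (not from the slides); the uniform, triangular and normal kernels correspond to the kernels named on the following slides, and the small data set is the one from the SAS spline example later in this section. Lowess, shown later, differs in that it fits weighted local lines rather than weighted means.

```r
# Hypothetical helper: Nadaraya-Watson smoother (local weighted mean)
kernel_smooth <- function(x, y, h, kernel = c("uniform", "triangular", "normal"),
                          grid = x) {
  kernel <- match.arg(kernel)
  K <- switch(kernel,
              uniform    = function(u) 0.5 * (abs(u) <= 1),   # equal weights inside the window
              triangular = function(u) pmax(1 - abs(u), 0),   # weights fall off linearly
              normal     = dnorm)                              # Gaussian weights, no hard cutoff
  sapply(grid, function(x0) {
    w <- K((x - x0) / h)        # kernel weight of each observation for the point x0
    sum(w * y) / sum(w)         # weighted mean of y
  })
}

x <- 1:8
y <- c(4, 9, 20, 25, 1, 5, -4, 12)

kernel_smooth(x, y, h = 1.5, kernel = "normal")   # narrow window: follows the data closely
kernel_smooth(x, y, h = 4,   kernel = "normal")   # wider window: smoother curve
```

In the limits, a very narrow triangular window reproduces the data exactly, and a very wide uniform window returns the overall mean of y at every point.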

Nonparametric Smoothing (2): uniform kernel (figure)

Nonparametric Smoothing (3): triangular kernel (figure)

Nonparametric Smoothing (4): normal kernel (figure)

Nonparametric Smoothing (5): default lowess line, span = 0.5 (figure)

Nonparametric Smoothing (6): lowess line, span = 0.2 (figure)

Nonparametric Smoothing (7): lowess line, span = 0.1 (figure)

data a;
  input x y @@;
  datalines;
1 4  2 9  3 20  4 25  5 1  6 5  7 -4  8 12
;
run;

/* SMnn spline interpolation: the larger nn, the smoother the curve.
   Note that x is sorted. */
title "sm45 spline smoother";
symbol1 interpol=sm45 value=circle height=2;
proc gplot data=a;
  plot y*x;
run;
quit;

title "sm70 spline smoother";
symbol1 interpol=sm70 value=circle height=2;
proc gplot data=a;
  plot y*x;
run;
quit;

title "sm20 spline smoother";
symbol1 interpol=sm20 value=circle height=2;
proc gplot data=a;
  plot y*x;
run;
quit;

require(graphics)
plot(cars, main = "lowess(cars)")
lines(lowess(cars), col = 2)           # default span f = 2/3
lines(lowess(cars, f = .2), col = 3)   # smaller span: a wigglier curve
legend(5, 120, c(paste("f = ", c("2/3", ".2"))), lty = 1, col = 2:3)

data <- read.csv("http://hosting03.snu.ac.kr/~hokim/isee2010/data2010.csv", sep = ",")
head(data)
data$date <- as.Date(data$date)
sl <- subset(data, ccode == 11)

boxplot(meanpm10 ~ yy, ylab = expression(PM[10]), axes = TRUE, data = sl)

plot(sl$date, sl$meanpm10, ylab = expression(PM[10]), xaxt = 'n', cex = 0.6)
x.at <- seq(as.Date("2000-01-01"), as.Date("2007-12-31"), "year")
xname <- c("'00-01-01", "'01-01-01", "'02-01-01", "'03-01-01",
           "'04-01-01", "'05-01-01", "'06-01-01", "'07-01-01")
axis(side = 1, at = x.at, labels = xname)

# lowess() cannot handle missing values: locate them and fill the gap
table(is.na(sl$meanpm10))
which(is.na(sl$meanpm10))
sl[829, "meanpm10"] <- (sl[828, "meanpm10"] + sl[830, "meanpm10"]) / 2   # average of the two neighbors
sl[829, "meanpm10"]

# Three panels, one per span f (set the layout before plotting)
par(mfrow = c(3, 1))

plot(sl$date, sl$meanpm10, ylab = expression(PM[10]), xlab = "date", main = "(a) f = .1", xaxt = 'n', cex = 0.6)
lines(lowess(sl$date, sl$meanpm10, f = 0.1), col = "red", lwd = 2)
axis(side = 1, at = x.at, labels = xname)

plot(sl$date, sl$meanpm10, ylab = expression(PM[10]), xlab = "date", main = "(b) f = .05", xaxt = 'n', cex = 0.6)
lines(lowess(sl$date, sl$meanpm10, f = 0.05), col = "red", lwd = 2)
axis(side = 1, at = x.at, labels = xname)

plot(sl$date, sl$meanpm10, ylab = expression(PM[10]), xlab = "date", main = "(c) f = .5", xaxt = 'n', cex = 0.6)
lines(lowess(sl$date, sl$meanpm10, f = 0.5), col = "red", lwd = 2)
axis(side = 1, at = x.at, labels = xname)