Chapter 12 Nonparametric Statistics (nonparametric analysis) 2017/6/5
12.1 Introduction
Parametric approach
* Assumes a distribution for the population.
* The distribution is a function of its parameters: once the parameters are known, the distribution is completely known.
* Estimating and testing the parameters are the main problems.
* If the distributional assumption is wrong, all results of the analysis become questionable.
12.1 Introduction (continued)
Nonparametric approach
* Does not assume a distribution for the population (distribution-free method).
* Uses the order (ranks) of the data.
* If the parametric assumptions are reasonable, the parametric method is considerably more efficient (smaller variance, smaller p-values).
data            mean    median
1,2,3,4,5       3       3
1,2,3,4,5,100   19.17   3.5

The median is robust to outliers, whereas the mean is sensitive to them: the median stays at 3.5 even if 100 is replaced by 10000000.
Nonparametric methods typically use the order of the data, not the values themselves.
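The robustness claim in the table can be checked with a few lines. This is a minimal sketch in Python (the slides themselves use SAS and R); the variable names are illustrative only.

```python
from statistics import mean, median

base = [1, 2, 3, 4, 5]
with_outlier = base + [100]
with_huge_outlier = base + [10_000_000]

# The mean moves with the outlier; the median of the six values does not.
for data in (base, with_outlier, with_huge_outlier):
    print(data, "mean =", round(mean(data), 2), "median =", median(data))
```

Only the position of the extreme value in the ordering matters to the median, which is the idea behind rank-based methods.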
Parametric vs. nonparametric methods
* Nonparametric methods make no distributional (e.g., normality) assumption about the data.
* They typically use ranks rather than the mean and variance.
* If the distributional assumption (e.g., normality) does hold, nonparametric methods are less efficient (larger variance) than parametric ones.
* They give robust results (not sensitive to outliers).
12.2 Measurement scale
* Nominal scale: male, female; Seoul, Busan (NY, LA)
* Ordinal scale: high, medium, low
* Interval scale: order is meaningful, and so are absolute differences
* Ratio scale: ratios are meaningful as well
12.3 Sign Test
Ex 12.3.1
No   Score     No   Score
1    75        9    82
2    90        10   103
3    86        11   88
4    110       12   124
5    115       13   110
6    94        14   77
7    132       15   99
8    74
Hypotheses H0: Median = 102, HA: Median ≠ 102
Scores above (+) or below (-) the hypothesized median (102)
No     1  2  3  4  5  6  7  8  9  10 11 12 13 14 15
Sign   -  -  -  +  +  -  +  -  -  +  -  +  +  -  -
(6 plus signs, 9 minus signs)

Decision rule
HA: P(+) > P(-), i.e. Median > 102: enough +'s -> reject H0
HA: P(+) < P(-), i.e. Median < 102: enough -'s -> reject H0
HA: P(+) ≠ P(-), i.e. Median ≠ 102: enough +'s or -'s -> reject H0

In Ex 12.3.1: H0: Median = 102, i.e. P(+) = P(-); HA: P(+) ≠ P(-)
Under H0, the number of +'s out of 15 ~ Bin(15, 1/2)
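Test statistic
$P(k \le 6 \mid 15, 0.5) = \binom{15}{0} 0.5^{0}\, 0.5^{15} + \binom{15}{1} 0.5^{1}\, 0.5^{14} + \cdots + \binom{15}{6} 0.5^{6}\, 0.5^{9} = 0.3036$
We cannot reject H0.

[Sign test for paired comparisons]
Uses only whether each difference between paired observations is + or -.
The sign test can be applied to paired observations (like the paired t-test).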
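The binomial tail above can be reproduced directly from the scores of Ex 12.3.1. The following is a minimal Python sketch; `binom_cdf` is a hypothetical helper written for this illustration, and doubling the one-sided tail for the two-sided p-value assumes the symmetric Bin(n, 1/2) null distribution.

```python
from math import comb

scores = [75, 90, 86, 110, 115, 94, 132, 74, 82, 103, 88, 124, 110, 77, 99]
m0 = 102  # hypothesized median

n_plus = sum(s > m0 for s in scores)
n_minus = sum(s < m0 for s in scores)
n = n_plus + n_minus  # observations equal to m0 would be dropped

def binom_cdf(k, n, p=0.5):
    """P(K <= k) for K ~ Bin(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(k + 1))

p_one_sided = binom_cdf(min(n_plus, n_minus), n)
p_two_sided = min(1.0, 2 * p_one_sided)
print(n_plus, n_minus, round(p_one_sided, 4), round(p_two_sided, 4))
```

The two values correspond to the one-sided 0.3036 on the slide and its two-sided counterpart 0.6072.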
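data sign;
  input score @@;
datalines;
75 90 86 110 115 94 132 74 82 103 88 124 110 77 99
;
run;
/* mu0=102 sets the null value used by the location tests, including the sign test */
proc univariate mu0=102;
run;
Output: 2-sided p-value = .6072; 1-sided p-value = .6072/2 = .3036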
Ex 12.3.2 (comparison of paired groups; paired data)
Dental hygiene score
id   Instructed in toothbrushing (X_i)   Not instructed (Y_i)
1    1.6                                 2
2    2                                   2
3    3.7                                 4.1
4    3.5                                 2.4
5    3.3                                 4.2
6    2.4                                 3.6
7    2                                   3.5
8    1.5                                 3
9    1.5                                 2.5
10   2.1                                 2.5
11   3.6                                 2.5
12   2.3                                 2.5
Hypotheses
H0: the median of the differences is 0, P(+) = P(-)
HA: the median of the differences is negative, P(+) < P(-)
Test statistic: number of (+) signs
id          1  2  3  4  5  6  7  8  9  10 11 12
X_i - Y_i   -  0  -  +  -  -  -  -  -  -  +  -
Dropping the one zero difference leaves n = 11 pairs, 2 of them with a + sign.
$P(k \le 2 \mid 11, p = 0.5) = \sum_{r=0}^{2} \binom{11}{r} 0.5^{r}\, 0.5^{11-r}$
pbinom(2,11,0.5) = 0.0327 < 0.05 -> reject H0 at α = 0.05.
[Sign test using the right tail]
[Sample size]
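The paired version only needs the signs of X_i - Y_i. As a sketch in Python, using the table values of Ex 12.3.2 as printed on the slide (variable names are illustrative), the one-sided p-value is the lower binomial tail at the number of + signs:

```python
from math import comb

instructed = [1.6, 2, 3.7, 3.5, 3.3, 2.4, 2, 1.5, 1.5, 2.1, 3.6, 2.3]      # X_i
not_instructed = [2, 2, 4.1, 2.4, 4.2, 3.6, 3.5, 3, 2.5, 2.5, 2.5, 2.5]    # Y_i

n_plus = sum(x > y for x, y in zip(instructed, not_instructed))
n_minus = sum(x < y for x, y in zip(instructed, not_instructed))
n = n_plus + n_minus  # the tied pair (id 2) is dropped

# HA: P(+) < P(-), so few + signs are evidence against H0.
p_value = sum(comb(n, r) for r in range(n_plus + 1)) / 2**n
print(n_plus, n, round(p_value, 4))
```

Comparing the pairs directly, rather than subtracting first, avoids floating-point noise in deciding which differences are exactly zero.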
data pair;
  input edu noedu;
  diff = noedu - edu;   /* not instructed minus instructed */
datalines;
1.5 2.0
2.0 2.0
3.5 4.0
3.0 2.5
3.5 4.0
2.5 3.0
2.0 3.5
1.5 3.0
1.5 2.5
2.0 2.5
3.0 2.5
2.0 2.5
;
run;
proc univariate;
  var diff;
run;
Output: 2-sided p-value = .0654; 1-sided p-value = .0654/2 = .0327
12.4 Wilcoxon signed-rank test for location (Wilcoxon's signed rank test)
obs (x_i)   d_i = x_i - 5.05   rank of |d_i|   signed rank
4.90        -0.15              1               -1
4.10        -0.95              7               -7
6.73        +1.68              9               +9
7.27        +2.22              13              +13
7.42        +2.37              14              +14
7.50        +2.45              15              +15
6.76        +1.71              10              +10
4.64        -0.41              3               -3
5.98        +0.93              6               +6
3.14        -1.91              12              -12
3.24        -1.81              11              -11
5.80        +0.75              5               +5
6.17        +1.12              8               +8
5.39        +0.34              2               +2
5.78        +0.73              4               +4
W+ = 86, W- = -34
H0: mean = 5.05, HA: mean ≠ 5.05
Test statistic: W = W+ + W- = 86 - 34 = 52
Reject H0 if W is too large or too small.
> wilcox.test(c(4.90,4.1,6.73,7.27,7.42,7.5,6.76,4.64,5.98,3.14,3.24,5.8,6.17,5.39,5.78), mu=5.05)
p-value = 0.1514
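The signed-rank sums can be recomputed from the raw observations. This is a minimal Python sketch under the slide's null value 5.05; `average_ranks` is a hypothetical helper (mid-ranks for ties), and `w_minus` is reported here as a positive sum of the ranks attached to negative differences.

```python
observations = [4.90, 4.1, 6.73, 7.27, 7.42, 7.5, 6.76, 4.64,
                5.98, 3.14, 3.24, 5.8, 6.17, 5.39, 5.78]
mu0 = 5.05

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

# Round to the data's two decimals so floating-point noise cannot create or break ties.
diffs = [round(x - mu0, 2) for x in observations]
diffs = [d for d in diffs if d != 0]          # zero differences would be dropped
ranks = average_ranks([abs(d) for d in diffs])

w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
print(w_plus, w_minus, w_plus - w_minus)
```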
12.5 Median Test
H0: Median(rural) = Median(urban)
Mental health score
urban (n = 16): 35 26 27 21 27 38 23 25 25 27 45 46 33 26 46 41
rural (n = 12): 29 50 43 22 42 47 42 32 50 37 34 31

                                  urban   rural   total
# of values above the median      6       8       14
# of values below the median      10      4       14
total                             16      12      28
Under H0, the row and column of the 2×2 contingency table are independent.
$X^2 = \frac{n(ad - bc)^2}{(a+c)(b+d)(a+b)(c+d)} = \frac{28\,[(6)(4) - (10)(8)]^2}{(16)(12)(14)(14)} = 2.33 < 3.841 = \chi^2_{0.95}(1)$
Also 2.33 < 2.706 = $\chi^2_{0.90}(1)$, so p > 0.10.
Do not reject H0: there is no evidence that the medians of the two groups differ.
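The 2×2 counts and the chi-square statistic can be rebuilt from the raw scores. In this Python sketch the split of the 28 scores into urban and rural is an assumption: it is read off the flattened slide table so that it matches the stated group sizes (16 and 12) and cell counts.

```python
from statistics import median

# Assumption: group membership reconstructed from the slide's 16/12 split.
urban = [35, 26, 27, 21, 27, 38, 23, 25, 25, 27, 45, 46, 33, 26, 46, 41]
rural = [29, 50, 43, 22, 42, 47, 42, 32, 50, 37, 34, 31]

grand_median = median(urban + rural)

a = sum(v > grand_median for v in urban)   # urban, above the median
b = sum(v > grand_median for v in rural)   # rural, above the median
c = sum(v < grand_median for v in urban)   # urban, below the median
d = sum(v < grand_median for v in rural)   # rural, below the median
n = a + b + c + d

chi2 = n * (a * d - b * c) ** 2 / ((a + c) * (b + d) * (a + b) * (c + d))
print(grand_median, (a, b, c, d), round(chi2, 2))
```

The statistic is then compared with the chi-square critical value on 1 degree of freedom, as on the slide.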
12.6 Mann-Whitney test
Assumptions (the two samples have sizes n and m, respectively):
1. The samples are drawn independently and at random.
2. The measurement scale is at least ordinal.
3. The two populations have the same distribution except for location: the shapes are exactly the same and only the medians may differ.
Ex 12.6.1 Weight
Group 1 (X)     Group 2 (Y)
252   254       185   280
240   164       310   264
205   288       212   270
200   138       238   210
170   240       184   192
170   217       136   126
320   240       200   220
148   302       270   295
214   312
(18 observations in group 1, 16 in group 2)
Can we say that the population median of group 1 is smaller than that of group 2?
H0: M_X ≥ M_Y vs HA: M_X < M_Y
Group 1   rank      Group 2   rank
138       3         126       1
148       4         136       2
164       5         184       8
170       6.5       185       9
170       6.5       192       10
200       11.5      200       11.5
205       13        210       14
214       16        212       15
217       17        220       18
240       21        238       19
240       21        264       25
240       21        270       26.5
252       23        270       26.5
254       24        280       28
288       29        295       30
302       31        310       32
312       33
320       34
Total     319.5

W = rank sum of X = 319.5
$U = W - \frac{m(m+1)}{2} = 319.5 - \frac{18(18+1)}{2} = 148.5$, where m = 18 is the size of group 1.
Under H0, E(U) = (18)(16)/2 = 144.
Rule: reject H0 if U is small enough.
Reported p-value = 0.14 (from the R run for this example). The evidence is not enough to reject H0.
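The rank sum and U can be recomputed from the raw weights. In this Python sketch the assignment of values to the two groups is an assumption: the first two columns of the flattened table are taken as group 1 and the last two as group 2, which is the split that reproduces the slide's rank sum. `average_ranks` is a hypothetical helper, not a library call.

```python
# Assumption: columns 1-2 of the slide table are group 1 (X), columns 3-4 are group 2 (Y).
x = [252, 254, 240, 164, 205, 288, 200, 138, 170,
     240, 170, 217, 320, 240, 148, 302, 214, 312]
y = [185, 280, 310, 264, 212, 270, 238, 210,
     184, 192, 136, 126, 200, 220, 270, 295]

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

ranks = average_ranks(x + y)
w_x = sum(ranks[: len(x)])                  # rank sum of group 1
u_x = w_x - len(x) * (len(x) + 1) / 2       # Mann-Whitney U for group 1
expected_u = len(x) * len(y) / 2            # E(U) under H0
print(w_x, u_x, expected_u)
```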
> install.packages('coin')
> library(coin)
> # Note: as typed, xx has 17 values and yy has 18 (nine values entered twice),
> # so these vectors are not the 18/16 split of Ex 12.6.1; the output below belongs to this input.
> xx <- c(252,240,205,200,170,170,320,148,214,185,310,212,238,184,136,200,270)
> yy <- c(254,164,288,138,240,217,240,302,312,254,164,288,138,240,217,240,302,312)
> dat <- data.frame(val=c(xx,yy), group=factor(rep(1:2, c(17,18))))
> wilcox_test(val~group, data=dat, distribution='exact')

        Exact Wilcoxon-Mann-Whitney Test
data:  val by group (1, 2)
Z = -1.4882, p-value = 0.1404
alternative hypothesis: true mu is not equal to 0
12.7 Kolmogorov-Smirnov (K-S) goodness-of-fit test
Are the cumulative distributions the same? Are the distributions of two populations the same?
H0: F_S(x) = F_T(x)
HA: F_S(x) ≠ F_T(x)
F_S(x): sample cumulative distribution function, the proportion of sample values ≤ x
F_T(x): population (hypothesized) cumulative distribution function, Pr(X ≤ x)
Test statistic: $D = \sup_x \lvert F_S(x) - F_T(x) \rvert$
Calculation, Ex 12.7.1
Does the fasting blood glucose level follow a normal distribution?
75 92 80 80 83 72 83 77 81 77 75 81 80 92 72 77 78 76
77 86 77 92 80 78 67 78 92 67 80 81 87 76 80 87 77 86

x       frequency   cumulative frequency   F_S(x)
67      2           2                      0.0556
72      2           4                      0.1111
75      2           6                      0.1667
76      2           8                      0.2222
77      6           14                     0.3889
78      3           17                     0.4722
80      6           23                     0.6389
81      3           26                     0.7222
83      2           28                     0.7778
86      2           30                     0.8333
87      2           32                     0.8889
92      4           36                     1.0000
total   36
x     z = (x - 80)/6   F_S(x)    F_T(x)    |F_S(x) - F_T(x)|
67    -2.17            0.0556    0.0150    0.0406
72    -1.33            0.1111    0.0918    0.0193
75    -0.83            0.1667    0.2033    0.0366
76    -0.67            0.2222    0.2514    0.0292
77    -0.50            0.3889    0.3085    0.0804
78    -0.33            0.4722    0.3707    0.1015
80     0.00            0.6389    0.5000    0.1389
81     0.17            0.7222    0.5675    0.1547
83     0.50            0.7778    0.6915    0.0863
86     1.00            0.8333    0.8413    0.0080
87     1.17            0.8889    0.8790    0.0099
92     2.00            1.0000    0.9772    0.0228
F_T(x) is constant at the tabulated value between consecutive observed x values, e.g. on [67,72), [72,75), ..., [92, ∞).
D = 0.1547 < 0.221 (critical value), so H0 is not rejected.
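The same D can be computed without rounding z to two decimals. This is a minimal Python sketch, assuming the hypothesized N(80, 6) distribution from the slide; `normal_cdf` is a hypothetical helper built on `math.erf`. It checks the empirical CDF both at each observed value and just below it, which is how the supremum of a step function against a continuous CDF is normally evaluated.

```python
from math import erf, sqrt

glucose = [75, 92, 80, 80, 83, 72, 83, 77, 81, 77, 75, 81, 80, 92, 72, 77, 78, 76,
           77, 86, 77, 92, 80, 78, 67, 78, 92, 67, 80, 81, 87, 76, 80, 87, 77, 86]
mu, sigma = 80.0, 6.0  # hypothesized mean and standard deviation

def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2) evaluated at x."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

n = len(glucose)
d_stat = 0.0
for x in sorted(set(glucose)):
    f_below = sum(v < x for v in glucose) / n    # empirical CDF just before the jump at x
    f_at = sum(v <= x for v in glucose) / n      # empirical CDF at x
    f_t = normal_cdf(x, mu, sigma)
    d_stat = max(d_stat, abs(f_at - f_t), abs(f_below - f_t))

print(round(d_stat, 5))
```

The small gap between this value and the hand-computed 0.1547 comes only from rounding z in the table lookup.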
Table of critical values: http://www.mathematik.unikl.de/~schwaar/exercises/tabellen/table_kolmogorov.pdf

> xx <- c(75,92,80,80,83,72,83,77,81,77,75,81,80,92,72,77,78,76,77,86,77,92,80,78,
+         67,78,92,67,80,81,87,76,80,87,77,86)
> ks.test(xx, 'pnorm', mean=80, sd=6)

        One-sample Kolmogorov-Smirnov test
data:  xx
D = 0.15604, p-value = 0.3447
alternative hypothesis: two-sided

Warning message:
In ks.test(xx, "pnorm", mean = 80, sd = 6) :
  ties should not be present for the Kolmogorov-Smirnov test
-> An approximate p-value is used.
12.8 Kruskal-Wallis One-way ANOVA
Hypotheses
H0: the k samples come from the same distribution.
HA: at least one sample comes from a distribution with a larger or smaller location parameter than the others.
Under H0, the rank sums R_1, R_2, ..., R_k of the groups are similar.
The statistic is essentially of the form $\sum_i (R_i - \bar{R})^2$: if the R_i are similar, $\sum_i (R_i - \bar{R})^2$ is small, so H is small and we cannot reject H0.
Ex 12.8.1
$H = \frac{12}{n(n+1)} \sum_{j=1}^{k} \frac{R_j^2}{n_j} - 3(n+1) \sim \chi^2_{k-1}$

Original values (response)        Ranks
A       B       C                 A     B     C
12.01   3.67    55.63             5     2     13
29.44   4.05    27.88             9     3     7
28.02   6.49    66.81             8     4     15
38.33   21.12   46.27             11    6     12
55.91   1.11    31.19             14    1     10
                          R_j     47    16    57

$H = \frac{12}{15(16)} \left( \frac{47^2}{5} + \frac{16^2}{5} + \frac{57^2}{5} \right) - 3(15+1) = 9.14$
P < 0.009 (page 486)
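The H statistic of Ex 12.8.1 can be reproduced from the raw responses. This is a minimal Python sketch; it assumes there are no tied values (true for this data set), so plain positions in the sorted pooled sample serve as ranks. The closed-form p-value `exp(-H/2)` is the chi-square upper tail for exactly 2 degrees of freedom and would not apply to other k.

```python
from math import exp

groups = {
    "A": [12.01, 29.44, 28.02, 38.33, 55.91],
    "B": [3.67, 4.05, 6.49, 21.12, 1.11],
    "C": [55.63, 27.88, 66.81, 46.27, 31.19],
}

pooled = sorted(v for values in groups.values() for v in values)
rank_of = {v: i + 1 for i, v in enumerate(pooled)}   # valid because there are no ties

n = len(pooled)
rank_sums = {name: sum(rank_of[v] for v in values) for name, values in groups.items()}

h_stat = 12 / (n * (n + 1)) * sum(
    rank_sums[name] ** 2 / len(values) for name, values in groups.items()
) - 3 * (n + 1)

p_approx = exp(-h_stat / 2)   # chi-square upper tail, df = k - 1 = 2 only
print(rank_sums, round(h_stat, 2), round(p_approx, 5))
```

The approximate p-value is the asymptotic one; the slide's P < 0.009 is taken from an exact table, which is why the two differ slightly.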
> xx <- c(12.01,3.67,55.63,29.44,4.05,27.88,28.02,6.49,66.81,38.33,21.12,46.27,55.91,1.11,31.19)
> dat <- data.frame(val=xx, group=factor(rep(1:3,5)))
> kruskal.test(val~group, data=dat)

        Asymptotic Kruskal-Wallis Test
data:  val by group (1, 2, 3)
chi-squared = 9.14, df = 2, p-value = 0.01036
Ex 12.8.2 Treatment cost by drug type (per bed, by hospital type); ranks in parentheses
Drug type
A           B           C           D           E
17.38(11)   52.59(35)   27.87(20)   34.55(26)   60.77(40)
15.20(2)    44.55(28)   24.00(12)   31.15(22)   59.99(38)
14.76(1)    44.80(29)   26.55(16)   30.50(21)   58.94(37)
16.88(7)    43.25(27)   25.00(13)   31.25(23)   57.05(36)
17.02(10)   50.75(32)   27.55(19)   32.75(24)   60.50(39)
26.67(17)   52.25(34)   25.92(14)   33.00(25)   61.50(41)
15.75(4)    46.13(30)   26.01(15)   27.30(18)   51.10(33)
16.02(5)    48.87(31)   16.48(6)
15.30(3)                17.00(9)
16.98(8)
R_1 = 68    R_2 = 246   R_3 = 124   R_4 = 159   R_5 = 264

$H = \frac{12}{41(41+1)} \left( \frac{68^2}{10} + \frac{246^2}{8} + \frac{124^2}{9} + \frac{159^2}{7} + \frac{264^2}{7} \right) - 3(41+1) = 36.39$
pchisq(36.39, 4, lower=F) = 2.4 × 10^-7
> val <- c(17.38,15.20,14.76,16.88,17.02,26.67,15.75,16.02,15.30,16.98,
+          52.59,44.55,44.80,43.25,50.75,52.25,46.13,48.87,
+          27.87,24.00,26.55,25.00,27.55,25.92,26.01,16.48,17.00,
+          34.55,31.15,30.50,31.25,32.75,33.00,27.30,
+          60.77,59.99,58.94,57.05,60.50,61.50,51.10)
> group <- factor(rep(c('a','b','c','d','e'), c(10,8,9,7,7)))
> dat <- data.frame(val, group)
> kruskal.test(val~group, data=dat)

        Kruskal-Wallis rank sum test
data:  val by group
Kruskal-Wallis chi-squared = 36.394, df = 4, p-value = 2.401e-07
12.9 Friedman's 2-way ANOVA
Ex 12.9.1 Physical therapists' rankings of three low-volt electrical stimulators
              Medical device
Therapist     A     B     C
1             2     3     1
2             2     3     1
3             2     3     1
4             1     3     2
5             3     2     1
6             1     2     3
7             2     3     1
8             1     3     2
9             1     3     2
R_j           15    25    14
H0: the three devices perform equally (the three devices are equivalent).
HA: at least one device performs differently (they are not equivalent).
$\chi_r^2 = \frac{12}{nk(k+1)} \sum_j R_j^2 - 3n(k+1) = \frac{12}{9(3)(3+1)} \left[ 15^2 + 25^2 + 14^2 \right] - 3(9)(3+1) = 8.222$
[Table B(a)] -> p = 0.016. Reject H0 at significance level 0.05.
> val <- c(2,3,1,2,3,1,2,3,1,1,3,2,3,2,1,1,2,3,2,3,1,1,3,2,1,3,2)
> group <- factor(rep(1:3,9))
> id <- factor(rep(1:9,each=3))
> friedman.test(val, group, id)

        Friedman rank sum test
data:  val, group and id
Friedman chi-squared = 8.2222, df = 2, p-value = 0.01639
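The Friedman statistic follows directly from the column rank sums. A minimal Python sketch with the rankings of Ex 12.9.1 is shown below; the closed-form tail `exp(-x/2)` is used only because the reference distribution here is chi-square with k - 1 = 2 degrees of freedom.

```python
from math import exp

# One row per therapist: ranks given to devices A, B, C.
ranks = [
    [2, 3, 1], [2, 3, 1], [2, 3, 1],
    [1, 3, 2], [3, 2, 1], [1, 2, 3],
    [2, 3, 1], [1, 3, 2], [1, 3, 2],
]

n = len(ranks)       # blocks (therapists)
k = len(ranks[0])    # treatments (devices)
rank_sums = [sum(row[j] for row in ranks) for j in range(k)]

chi2_r = 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
p_approx = exp(-chi2_r / 2)   # chi-square upper tail, valid for df = 2 only
print(rank_sums, round(chi2_r, 4), round(p_approx, 5))
```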
12.10 Spearman rank correlation coefficient
Two-sided test
H0: X and Y are independent. HA: X and Y are not independent.
One-sided tests
H0: X and Y are independent. HA: X and Y have a direct (positive) association.
H0: X and Y are independent. HA: X and Y have an inverse (negative) association.
Ex 12.10.1
id   X     Y        id   X     Y
1    500   525      10   50    60
2    475   130      11   175   105
3    390   325      12   130   148
4    325   190      13   76    75
5    325   90       14   200   250
6    205   295      15   174   102
7    200   180      16   201   151
8    75    74       17   125   130
9    230   420

id   rank(X)   rank(Y)   d_i     d_i^2
1    17        17         0.0     0.00
2    16        7.5        8.5    72.25
3    15        15         0.0     0.00
4    13.5      12         1.5     2.25
5    13.5      4          9.5    90.25
6    11        14        -3.0     9.00
7    8.5       11        -2.5     6.25
8    2         2          0.0     0.00
9    12        16        -4.0    16.00
10   1         1          0.0     0.00
11   7         6          1.0     1.00
12   5         9         -4.0    16.00
13   3         3          0.0     0.00
14   8.5       13        -4.5    20.25
15   6         5          1.0     1.00
16   10        10         0.0     0.00
17   4         7.5       -3.5    12.25
                         Σ d_i^2 = 246.5
Steps of the test
1. Rank X and Y separately.
2. d_i = rank(x_i) - rank(y_i)
3. Compute Σ d_i^2.
$r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6(246.5)}{17(17^2 - 1)} = 0.698 > 0.4853$ (Table C)

Inverse (negative) association -> large Σ d_i^2 -> small r_s
Direct (positive) association -> small Σ d_i^2 -> large r_s
r_s is large enough -> reject H0: independence.
We conclude that there is a positive association between X and Y.
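The three steps can be traced on the 17 pairs of the example. This is a minimal Python sketch; `average_ranks` is a hypothetical helper that assigns mid-ranks to ties, and the formula is the slide's shortcut based on Σ d_i^2 (applied here with tied ranks, as the slide does).

```python
x = [500, 475, 390, 325, 325, 205, 200, 75, 230, 50, 175, 130, 76, 200, 174, 201, 125]
y = [525, 130, 325, 190, 90, 295, 180, 74, 420, 60, 105, 148, 75, 250, 102, 151, 130]

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

rank_x = average_ranks(x)                                    # step 1
rank_y = average_ranks(y)
d = [rx - ry for rx, ry in zip(rank_x, rank_y)]              # step 2
sum_d2 = sum(di * di for di in d)                            # step 3

n = len(x)
r_s = 1 - 6 * sum_d2 / (n * (n * n - 1))
print(sum_d2, round(r_s, 3))
```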
Ex 12.10.2 (when n > 30)
id   Age (X)   Mineral conc. (Y)     id   Age (X)   Mineral conc. (Y)
1    82        169.62                19   50        4.48
2    85        48.94                 20   71        46.93
3    83        41.16                 21   54        30.91
4    64        63.95                 22   62        34.27
5    82        21.09                 23   47        41.44
6    53        5.40                  24   66        109.88
7    26        6.33                  25   34        2.78
8    47        4.26                  26   46        4.17
9    37        3.62                  27   27        6.57
10   49        4.82                  28   54        61.73
11   65        108.22                29   72        47.59
12   40        10.20                 30   41        10.46
13   32        2.69                  31   35        3.06
14   50        6.16                  32   75        49.57
15   62        23.87                 33   50        5.55
16   33        2.70                  34   76        50.23
17   36        3.15                  35   28        6.81
18   53        60.59
Ex 12.10.2 (when n > 30)
id   rank(X)   rank(Y)   d_i     d_i^2      id   rank(X)   rank(Y)   d_i     d_i^2
1    32.5      35        -2.5    6.25       19   17        9          8.0    64.00
2    35        27         8.0    64.00      20   28        25         3.0    9.00
3    34        23        11.0    121.00     21   21.5      21         0.5    0.25
4    25        32        -7.0    49.00      22   23.5      22         1.5    2.25
5    32.5      19        13.5    182.25     23   13.5      24       -10.5    110.25
6    19.5      11         8.5    72.25      24   27        34        -7.0    49.00
7    1         14       -13.0    169.00     25   6         3          3.0    9.00
8    13.5      8          5.5    30.25      26   12        7          5.0    25.00
9    9         6          3.0    9.00       27   2         15       -13.0    169.00
10   15        10         5.0    25.00      28   21.5      31        -9.5    90.25
11   26        33        -7.0    49.00      29   29        26         3.0    9.00
12   10        17        -7.0    49.00      30   11        18        -7.0    49.00
13   4         1          3.0    9.00       31   7         4          3.0    9.00
14   17        13         4.0    16.00      32   30        28         2.0    4.00
15   23.5      20         3.5    12.25      33   17        12         5.0    25.00
16   5         2          3.0    9.00       34   31        29         2.0    4.00
17   8         5          3.0    9.00       35   3         16       -13.0    169.00
18   19.5      30       -10.5    110.25
Σ d_i^2 = 1788.5

$r_s = 1 - \frac{6(1788.5)}{35(35^2 - 1)} = 0.75$
$Z = r_s \sqrt{n - 1} = 0.75 \sqrt{34} = 4.37 > 1.96$ -> reject H0

Large Σ d_i^2 (inverse association) gives a large negative Z; small Σ d_i^2 (direct association) gives a large positive Z.
If $|Z| > z_{1-\alpha/2}$, reject H0.
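The large-sample step is plain arithmetic once Σ d_i^2 is known. As a short Python sketch, taking n = 35 and Σ d_i^2 = 1788.5 from the table and 1.96 as the two-sided 5% normal critical value:

```python
from math import sqrt

n = 35
sum_d2 = 1788.5          # from the rank table of Ex 12.10.2

r_s = 1 - 6 * sum_d2 / (n * (n**2 - 1))
z = r_s * sqrt(n - 1)    # approximately standard normal under H0 for large n
reject_h0 = abs(z) > 1.96
print(round(r_s, 4), round(z, 2), reject_h0)
```

Carrying r_s unrounded gives the same Z to two decimals as the slide's 0.75 × √34.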
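12.11 Nonparametric regression
Ex. 12.11.1 [Theil's method]
$\hat{\beta}_1 = \mathrm{median}\{ S_{12}, \ldots, S_{n-1,n} \}$, $S_{ij} = \frac{y_j - y_i}{x_j - x_i}$, e.g. $S_{12} = \frac{164 - 163}{57.4 - 53.9} = 0.285$

Testosterone (Y)   163    164    156    151    152    167    165    153    155
Citric acid (X)    53.9   57.4   41.0   40.0   42.0   64.4   59.1   49.9   43.2

Pairwise slopes S_ij (row i, column j):
i\j   2       3       4       5       6       7       8       9
1     0.285   0.543   0.863   0.924   0.380   0.384   2.500   0.747
2             0.487   0.747   0.779   0.428   0.588   1.466   0.633
3                     5.000   -4.000  0.470   0.497   -0.337  -0.454
4                             0.500   0.655   0.732   0.202   1.250
5                                     0.669   0.760   0.126   2.500
6                                             0.377   0.965   0.566
7                                                     1.304   0.628
8                                                             -0.298

Estimating the intercept
$\hat{\beta}_0 = \mathrm{median}\{ y_1 - \hat{\beta}_1 x_1, \ldots, y_n - \hat{\beta}_1 x_n \}$
or, alternatively, the median of the pairwise means of these terms:
$\hat{\beta}_0 = \mathrm{median}\{ \mathrm{mean}(y_i - \hat{\beta}_1 x_i,\; y_j - \hat{\beta}_1 x_j) \}$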
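Theil's estimator can be sketched in a few lines of Python from the nine (X, Y) pairs of the example. The variable names are illustrative, and only the first intercept estimator (the median of y_i - b1 x_i) is shown; the slope is simply the median of all 36 pairwise slopes.

```python
from itertools import combinations
from statistics import median

y = [163, 164, 156, 151, 152, 167, 165, 153, 155]             # testosterone
x = [53.9, 57.4, 41.0, 40.0, 42.0, 64.4, 59.1, 49.9, 43.2]    # citric acid

# All pairwise slopes S_ij = (y_j - y_i) / (x_j - x_i), i < j.
slopes = [(y[j] - y[i]) / (x[j] - x[i]) for i, j in combinations(range(len(x)), 2)]

slope = median(slopes)
intercept = median(yi - slope * xi for xi, yi in zip(x, y))
print(len(slopes), round(slope, 4), round(intercept, 2))
```

Because a median is used at both steps, a single aberrant point changes only a few of the pairwise slopes and has little effect on the fitted line.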
Mod20.sas
/* File name : mod20.sas
   Nonparametric One-Way Anova */
options pageno=1 nodate ls=130 ps=60 nocenter;
filename inbrakes 'd:\myweb\intro\taillite.dat';
data one;
  infile inbrakes;
  input id vehtype group positn speedzn resptime follotme folltmec;
  if group=1;
  label vehtype='vehicle Type'
        group='group - Light On=1 Light Off=2'
        positn='light Position'
        speedzn='speed Zone'
        resptime='response Time'
        follotme='following Time in Video Frames'
        folltmec='following Time in Categories';
run;
proc sort; by vehtype;
/* Let's do one-way ANOVA to see the effect of vehicle type */
proc anova;
  class vehtype;
  model resptime=vehtype;
  title 'Parametric ANOVA analysis';
run;
/* What's wrong with this? We didn't check the normality assumption.
   Let's do proc univariate to check the normality */
proc univariate normal plot;
  var resptime;
  by vehtype;
  title 'Normality Check';
run;
/* NOT NORMALLY DISTRIBUTED >> NONPARAMETRIC ANOVA */
proc npar1way wilcoxon;
  class vehtype;
  var resptime;
  title 'Nonpara One-Way ANOVA for Tail Light Study';
run;
/* The other way is transformation. Let's take the log transformation
   so that we have a normal distribution. */
data t;
  set one;
  t=log(resptime);
  label t='ln (response time)';
run;
proc sort; by vehtype;
proc univariate normal plot;
  var t;
  by vehtype;
  title 'Normality Check for transformed variable';
run;
/* The transformed variable seems to be normally distributed.
   Then we can do parametric ANOVA with the normality assumption. */
proc anova;
  class vehtype;
  model t=vehtype;
  title 'ANOVA for the log transformed response time';
run;
Nonparametric Smoothing (1)
Smoothing
* Consider an X-Y plot.
* Draw a regression line that requires no parametric assumptions.
* The regression line need not be linear.
* The regression line depends entirely on the data.
Two components of smoothing
* Kernel function: how to calculate the weighted mean
* Bandwidth: width of the window (span); determines the smoothness of the regression line; wider -> smoother
Nonparametric Smoothing (2) Uniform Kernel
Nonparametric Smoothing (3) Triangular Kernel
Nonparametric Smoothing (4) Normal Kernel
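To make the roles of the kernel and the bandwidth concrete, here is a minimal Python sketch of a kernel-weighted local mean (a Nadaraya-Watson style smoother). It is an illustration under stated assumptions rather than the procedure behind the slides' figures: the kernel formulas are the usual textbook forms, the function names are hypothetical, and the eight (x, y) points are borrowed from the small SAS data set used later in these slides.

```python
from math import exp, pi, sqrt

def uniform_kernel(u):
    return 0.5 if abs(u) <= 1 else 0.0

def triangular_kernel(u):
    return max(0.0, 1.0 - abs(u))

def normal_kernel(u):
    return exp(-0.5 * u * u) / sqrt(2 * pi)

def kernel_smooth(x0, xs, ys, kernel, bandwidth):
    """Weighted mean of ys, with weights kernel((x0 - x) / bandwidth)."""
    weights = [kernel((x0 - x) / bandwidth) for x in xs]
    total = sum(weights)
    if total == 0:
        return None  # no data inside the window
    return sum(w * y for w, y in zip(weights, ys)) / total

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [4, 9, 20, 25, 1, 5, -4, 12]

for name, kernel in (("uniform", uniform_kernel),
                     ("triangular", triangular_kernel),
                     ("normal", normal_kernel)):
    narrow = [round(kernel_smooth(x, xs, ys, kernel, 1.0), 1) for x in xs]
    wide = [round(kernel_smooth(x, xs, ys, kernel, 4.0), 1) for x in xs]
    print(name, "bandwidth 1:", narrow)
    print(name, "bandwidth 4:", wide)
```

With the wider bandwidth the fitted values vary less from point to point, which is the "wider -> smoother" behaviour described above.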
Nonparametric Smoothing (5) Default Lowess line: Span=0.5
Nonparametric Smoothing (6) Lowess line: Span=0.2
Nonparametric Smoothing (7) Lowess line: Span=0.1
data A;
  input x y @@;
datalines;
1 4 2 9 3 20 4 25 5 1 6 5 7 -4 8 12
;
title "sm45 spline smoother";
proc gplot data=a;
  plot y*x;
  symbol1 interpol=sm45 value=circle height=2;
  /* note that x is sorted */
run;
title "sm70 spline smoother";
proc gplot data=a;
  plot y*x;
  symbol1 interpol=sm70 value=circle height=2;
  /* note that x is sorted */
run;
title "sm20 spline smoother";
proc gplot data=a;
  plot y*x;
  symbol1 interpol=sm20 value=circle height=2;
  /* note that x is sorted */
run;
require(graphics)
plot(cars, main = "lowess(cars)")
lines(lowess(cars), col = 2)
lines(lowess(cars, f = .2), col = 3)
legend(5, 120, c(paste("f = ", c("2/3", ".2"))), lty = 1, col = 2:3)
data <- read.csv("http://hosting03.snu.ac.kr/~hokim/isee2010/data2010.csv", sep=",")
head(data)
data$date <- as.Date(data$date)
sl <- subset(data, ccode==11)

boxplot(meanpm10~yy, ylab=expression(pm[10]), axes=TRUE, data=sl)
plot(sl$date, sl$meanpm10, ylab=expression(pm[10]), xaxt='n', cex=0.6)
x.at <- seq(as.Date("2000-01-01"), as.Date("2007-12-31"), "year")
xname <- c("'00-01-01","'01-01-01","'02-01-01","'03-01-01",
           "'04-01-01","'05-01-01","'06-01-01","'07-01-01")
axis(side=1, at=x.at, labels=xname)

# Locate the missing value and replace it by the mean of its two neighbours
table(is.na(sl$meanpm10))
which(is.na(sl$meanpm10))
sl[829,"meanpm10"] <- (sl[828,"meanpm10"] + sl[830,"meanpm10"])/2
sl[829,"meanpm10"]

# Three panels stacked vertically, one per lowess span f
par(mfrow=c(3,1))
plot(sl$date, sl$meanpm10, ylab=expression(pm[10]), xlab="date", main="(a)f=.1", xaxt='n', cex=0.6)
lines(lowess(sl$date, sl$meanpm10, f=0.1), col="red", lwd=2)
axis(side=1, at=x.at, labels=xname)
plot(sl$date, sl$meanpm10, ylab=expression(pm[10]), xlab="date", main="(b)f=.05", xaxt='n', cex=0.6)
lines(lowess(sl$date, sl$meanpm10, f=0.05), col="red", lwd=2)
axis(side=1, at=x.at, labels=xname)
plot(sl$date, sl$meanpm10, ylab=expression(pm[10]), xlab="date", main="(c)f=.5", xaxt='n', cex=0.6)
lines(lowess(sl$date, sl$meanpm10, f=0.5), col="red", lwd=2)
axis(side=1, at=x.at, labels=xname)