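Introduction to Statistical Analysis with R Commander: A Free, Advanced Statistics Program with Enhanced Ease of Use
Ho Kim (김호), Graduate School of Public Health, Seoul National University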
Useful sites
R is free software with powerful tools.
The Comprehensive R Archive Network (CRAN): http://cran.r-project.org/ -> Windows -> base -> R-2.9.2-win32.exe
Textbook: Simple R by John Verzani, http://cran.r-project.org/doc/contrib/verzani-SimpleR.pdf
Features of R
R is free.
R is open source and runs on UNIX, Windows, and Macintosh.
R has an excellent built-in help system.
R has excellent graphing capabilities.
Students can easily migrate to the commercially supported S-Plus program if commercial software is desired.
R's language has a powerful, easy-to-learn syntax with many built-in statistical functions.
The language is easy to extend with user-written functions.
R is a computer programming language. Programmers will find it more familiar than many other statistics packages, and for new computer users the next leap to programming will not be so large.
Running R
Getting started with R Commander
To use R Commander, first install and launch R on your PC, then install the Rcmdr package (for example with install.packages("Rcmdr")).
Load the package from the R console (package names are case-sensitive):
> library(Rcmdr)
The R Commander windows
Importing datasets
Clicking the data-set box lets you choose which dataset to make active.
Comparing means
The Statistics -> Means menu offers the following options; learn how to use each of them:
Single-sample t-test
Independent samples t-test
Paired t-test
One-way ANOVA
Multi-way ANOVA
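For reference, each menu item above roughly corresponds to a standard base-R function. The sketch below uses datasets that ship with R (sleep, ToothGrowth, PlantGrowth, warpbreaks) instead of the course files, and the object names are illustrative only. It is not necessarily the exact code R Commander generates; R Commander shows its own generated commands in its script window.

```r
# Base-R equivalents of the Statistics > Means menu items (illustrative sketch).
# All datasets used here are built into R.

# Single-sample t-test: H0: mu = 0 for the extra hours of sleep
one_sample <- t.test(sleep$extra, mu = 0)

# Independent samples t-test: tooth length by supplement type
# (R's default is Welch's test, i.e. var.equal = FALSE)
indep <- t.test(len ~ supp, data = ToothGrowth)

# Paired t-test: sleep holds the same 10 patients (ID 1-10, same order) under two drugs
paired <- with(sleep, t.test(extra[group == "1"], extra[group == "2"], paired = TRUE))

# One-way ANOVA: plant weight by treatment group
oneway <- aov(weight ~ group, data = PlantGrowth)

# Multi-way (here two-way) ANOVA with interaction: breaks by wool and tension
twoway <- aov(breaks ~ wool * tension, data = warpbreaks)

one_sample
summary(oneway)
summary(twoway)
```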
Problem 1.
1.1 Read the Pepers.xls data and test whether the mean of the angle variable is 0. State the null and alternative hypotheses precisely as equations.
Pepers.xls: single-sample t-test
Statistics > Means > Single-sample t-test (the test value can be adjusted)
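A minimal sketch of Problem 1 in plain R, assuming the angle column has been read into a numeric vector. Because Pepers.xls is not reproduced here, angle is simulated and its values are hypothetical. For 1.1 the hypotheses are H0: μ = 0 versus H1: μ ≠ 0; for 1.2 they become H0: μ = 2 versus H1: μ ≠ 2. The mu argument of t.test() plays the role of the adjustable test value in the dialog.

```r
# Hypothetical stand-in for the angle variable in Pepers.xls (simulated, not real data)
set.seed(1)
angle <- rnorm(25, mean = 1, sd = 2)

# Problem 1.1: H0: mu = 0 versus H1: mu != 0 (two-sided)
res0 <- t.test(angle, mu = 0, alternative = "two.sided")

# Problem 1.2: change the test value to 2, i.e. H0: mu = 2 versus H1: mu != 2
res2 <- t.test(angle, mu = 2)

res0$statistic   # t = (sample mean - mu0) / (sd / sqrt(n))
res0$p.value
res2$p.value
```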
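1.2 Suppose the mean of angle is already known to be 2. If you want to use these data to argue that this existing belief is not true, what analysis could you perform? Write the null and alternative hypotheses.
* Carry out the test above with R Commander and draw a conclusion.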
Pepers.xls: single-sample t-test
Statistics > Summaries > Shapiro-Wilk test of normality (distribution tested: normal)
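The same normality check in plain R, again on simulated (hypothetical) angle values. shapiro.test() from the stats package performs the Shapiro-Wilk test; its null hypothesis is that the data come from a normal distribution.

```r
# Hypothetical stand-in for the angle variable (simulated, not real data)
set.seed(1)
angle <- rnorm(25, mean = 1, sd = 2)

# Shapiro-Wilk test: H0 = the data are normally distributed
sw <- shapiro.test(angle)
sw
# A small p-value (e.g. < 0.05) is evidence against normality;
# a large p-value only means normality was not rejected.
```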
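Problem 2.
2.1 Read the Pulse.xls data and, looking at the pre and post variables, explain which analysis should be performed.
* State the null and alternative hypotheses precisely as equations.
2.2 Test the hypotheses above with both a parametric and a nonparametric method using R Commander, and draw a statistical conclusion.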
Pulse.xls: two related samples (paired test)
Statistics > Means > Paired t-test
Pulse.xls: two related samples (paired test)
Statistics > Nonparametric tests > Paired-samples Wilcoxon test
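A minimal sketch of Problem 2.2 in plain R. Pulse.xls is not reproduced, so the pre and post vectors below are simulated and hypothetical. With d = post - pre, the parametric test is H0: μ_d = 0 versus H1: μ_d ≠ 0; the Wilcoxon signed-rank test is its nonparametric counterpart on the same pairs.

```r
# Hypothetical stand-in for the pre/post pulse measurements (simulated, not real data)
set.seed(2)
pre  <- round(rnorm(15, mean = 72, sd = 8))
post <- pre + round(rnorm(15, mean = 5, sd = 4))

# Parametric: paired t-test, equivalent to a one-sample t-test on post - pre
pt <- t.test(post, pre, paired = TRUE)

# Nonparametric: Wilcoxon signed-rank test on the same pairs.
# exact = FALSE uses the normal approximation, which copes with the ties
# and zero differences that rounded pulse values can produce.
wt <- wilcox.test(post, pre, paired = TRUE, exact = FALSE)

pt
wt
```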
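Problem 3.
3.1 Read the insul.xls data and explain the purpose of analyzing it.
3.2 Explore the data (Statistics > Summaries) with R Commander and interpret the results.
3.3 If the 5 glucose groups are compared, state the null and alternative hypotheses precisely as equations.
3.4 Run an ANOVA with R Commander and interpret the results.
3.5 Perform a post hoc analysis and explain the differences between the groups.
3.6 If conc = 1, 2 are combined into one group and conc = 4, 5 into another (a comparison of 2 groups), explain which methods are possible and carry out the analysis with R Commander.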
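insul.xls
An animal experiment on the effect of glucose on insulin secretion: pancreatic tissue samples were given glucose at 5 different concentrations, and the amount of insulin secreted was measured.
Describing each group: Statistics > Summaries (choose according to your purpose), Graphs (choose according to your purpose)
The variable conc must be declared as a factor!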
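A minimal sketch of declaring conc as a factor and summarizing insulin by group in plain R. The insul.xls values are not reproduced, so the data frame insul_df below is simulated and hypothetical (5 concentrations, 6 samples each).

```r
# Hypothetical stand-in for insul.xls (simulated, not real data)
set.seed(3)
conc  <- rep(1:5, each = 6)
insul <- rnorm(30, mean = c(10, 11, 15, 20, 21)[conc], sd = 2)
insul_df <- data.frame(conc, insul)

# conc is stored as numbers, so declare it a factor before group-wise analysis
insul_df$conc <- factor(insul_df$conc)

# Group-wise descriptive statistics
tapply(insul_df$insul, insul_df$conc, mean)
tapply(insul_df$insul, insul_df$conc, sd)
table(insul_df$conc)
```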
Graphs -> Boxplot
conc 1, 2 < conc 3 < conc 4, 5
insul.xls: running the ANOVA
Statistics > Means > One-way ANOVA
Select the "Pairwise comparisons of means" option; Tukey's method is the default for the post hoc analysis.
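A sketch of the one-way ANOVA and Tukey post hoc comparisons in base R, on simulated (hypothetical) insul_df data. The null hypothesis is H0: μ1 = μ2 = μ3 = μ4 = μ5. TukeyHSD() is used here as a base-R substitute; the code R Commander generates for its pairwise comparisons may use a different function, so numbers can differ slightly in presentation.

```r
# Hypothetical stand-in for insul.xls (simulated, not real data)
set.seed(3)
conc  <- rep(1:5, each = 6)
insul <- rnorm(30, mean = c(10, 11, 15, 20, 21)[conc], sd = 2)
insul_df <- data.frame(conc = factor(conc), insul = insul)

# One-way ANOVA: H0: all five group means are equal
fit <- aov(insul ~ conc, data = insul_df)
summary(fit)

# Tukey's honest significant difference for all pairwise comparisons
tk <- TukeyHSD(fit, "conc")
tk
```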
Insul.xls: t-test comparing (1,2) vs (4,5)
Recoding the variable: Data > Manage variables in active data set > Recode variables > select the variable (conc)
New variable name or prefix for multiple recodes: new
Enter recode directives: 1:2=1; 3=NA; 4:5=2 (conc = 3 is treated as missing)
Before the t-test, first test the equal-variance assumption: Statistics > Variances > Two variances F-test. Equal variances between the two groups are confirmed.
Statistics > Means > Independent samples t-test: test the difference in mean insul between the levels of new (with variances set as equal). A significant difference is observed.
Testing the variance ratio of the two groups
Statistics > Variances > Two variances F-test
Independent samples t-test assuming equal variances
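A base-R sketch of the recode, F-test, and pooled-variance t-test steps above, on simulated (hypothetical) data. The recode uses ifelse() rather than the dialog's recode directives; the result is the same grouping (conc 1, 2 -> 1; conc 3 -> NA; conc 4, 5 -> 2). The formula interfaces drop the NA rows before testing.

```r
# Hypothetical stand-in for insul.xls (simulated, not real data)
set.seed(3)
conc  <- rep(1:5, each = 6)
insul <- rnorm(30, mean = c(10, 11, 15, 20, 21)[conc], sd = 2)

# Recode: conc 1,2 -> group 1; conc 3 -> NA (missing); conc 4,5 -> group 2
new <- factor(ifelse(conc %in% 1:2, 1, ifelse(conc %in% 4:5, 2, NA)))
insul_df <- data.frame(insul, new)

# F-test of H0: sigma1^2 / sigma2^2 = 1
vt <- var.test(insul ~ new, data = insul_df)

# Pooled-variance (Student) t-test, used when equal variances are plausible
tt <- t.test(insul ~ new, data = insul_df, var.equal = TRUE)

vt
tt
```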
Insul.xls: nonparametric test comparing (1,2) vs (4,5)
After creating the new variable in the same way: Statistics > Nonparametric tests > Two-sample Wilcoxon test
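The nonparametric version in base R: wilcox.test() with a two-level grouping factor performs the Wilcoxon rank-sum (Mann-Whitney) test. The data are again simulated and hypothetical, recoded the same way as for the t-test.

```r
# Hypothetical stand-in for insul.xls (simulated, not real data)
set.seed(3)
conc  <- rep(1:5, each = 6)
insul <- rnorm(30, mean = c(10, 11, 15, 20, 21)[conc], sd = 2)
new   <- factor(ifelse(conc %in% 1:2, 1, ifelse(conc %in% 4:5, 2, NA)))

# Two-sample Wilcoxon rank-sum test; rows with new = NA (conc = 3) are omitted
wr <- wilcox.test(insul ~ new)
wr
```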
taillite2.sav data
vehtype = 'Vehicle Type'
group = 'Group' (Light On = 1, Light Off = 2)
position = 'Light Position'
speedzn = 'Speed Zone'
resptime = 'Response Time'
follotme = 'Following Time in Video Frames'
folltmec = 'Following Time in Categories'
Analyze the difference in resptime (continuous) across vehtype (discrete) => analysis of variance?
Analyze only the cases with group = 1.
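Problem 4.
4.1 Read the taillite2.sav data and explain the purpose of analyzing it.
4.2 Use ANOVA to test whether resptime differs by vehtype.
4.3 Test the raw data for normality and state your conclusion.
4.4 Use a nonparametric method to test whether resptime differs by vehtype.
4.5 Apply a log transformation and test for normality.
4.6 Run the ANOVA using the log-transformed variable.
4.7 Run the nonparametric test after the log transformation.
4.8 Compare and explain the results of 4.2 vs 4.6 and of 4.4 vs 4.7.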
taillite2.sav data: trying ANOVA
Statistics > Means > One-way ANOVA
Response variable: resptime, Groups: vehtype
The group variable must be converted to a factor beforehand (Data > Manage variables in active data set > Convert numeric variables to factors).
Resptime differs significantly by vehtype!???
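A sketch of the subset-then-ANOVA steps in base R. taillite2.sav is not reproduced, so the taillite2 data frame below is simulated and hypothetical: 3 vehicle types, both light groups, and right-skewed (log-normal) response times.

```r
# Hypothetical stand-in for taillite2.sav (simulated, not real data)
set.seed(4)
taillite2 <- data.frame(
  vehtype  = rep(1:3, each = 20, times = 2),
  group    = rep(1:2, each = 60),
  resptime = rlnorm(120, meanlog = 0, sdlog = 0.6)
)

# Keep only group = 1 (light on), then declare vehtype a factor
tl <- subset(taillite2, group == 1)
tl$vehtype <- factor(tl$vehtype)

# One-way ANOVA of resptime by vehtype on the raw scale
fit_raw <- aov(resptime ~ vehtype, data = tl)
summary(fit_raw)
```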
taillite2.sav data: normality test
Statistics > Summaries > Shapiro-Wilk test of normality
To test normality separately for each vehtype, edit the command as follows:
by(taillite2$resptime, taillite2$vehtype, shapiro.test)
Normality is not satisfied!! The conclusion drawn from the ANOVA is therefore questionable!!
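The by() command above can be run as-is; the sketch below does so on simulated (hypothetical) skewed data and also shows one way to collect only the p-values with tapply(). The data frame and the names sw_by and sw_p are illustrative assumptions.

```r
# Hypothetical stand-in for taillite2 (simulated log-normal response times)
set.seed(4)
taillite2 <- data.frame(vehtype  = factor(rep(1:3, each = 20)),
                        resptime = rlnorm(60, meanlog = 0, sdlog = 0.6))

# Full Shapiro-Wilk output for each vehtype level
sw_by <- by(taillite2$resptime, taillite2$vehtype, shapiro.test)
sw_by

# Only the p-values, one per vehtype level
sw_p <- tapply(taillite2$resptime, taillite2$vehtype, function(v) shapiro.test(v)$p.value)
sw_p
```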
taillite2.sav data: trying a nonparametric method (Kruskal-Wallis test)
Statistics > Nonparametric tests > Kruskal-Wallis test
p = 0.259: no significant difference between the groups!!
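A base-R sketch of the Kruskal-Wallis test on simulated (hypothetical) data. It also illustrates a point relevant to 4.7 and 4.8: the test uses only the ranks of the observations, and log() is strictly increasing for positive values, so the ranks, and therefore the test result, are the same before and after a log transformation.

```r
# Hypothetical stand-in for taillite2 (simulated log-normal response times)
set.seed(4)
vehtype  <- factor(rep(1:3, each = 20))
resptime <- rlnorm(60, meanlog = 0, sdlog = 0.6)

# Kruskal-Wallis rank-sum test: H0 = resptime has the same distribution for every vehtype
kw_raw <- kruskal.test(resptime ~ vehtype)

# log() preserves the ordering of positive values, so the ranks do not change
kw_log <- kruskal.test(log(resptime) ~ vehtype)

c(raw = kw_raw$p.value, log = kw_log$p.value)
```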
taillite2.sav data: log transformation
Data > Manage variables in active data set > Compute new variable
New variable name: lresp
Expression to compute: log(resptime)
For the normality test of lresp, edit the command in the same way:
by(taillite2$lresp, taillite2$vehtype, shapiro.test)
taillite2.sav data: ANOVA again, this time using lresp!
p = 0.063. What is the conclusion?
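A base-R sketch of steps 4.5 and 4.6 on simulated (hypothetical) data: compute lresp, check normality within each vehtype, and rerun the ANOVA on the log scale. log() is only defined for positive values, which holds for response times.

```r
# Hypothetical stand-in for taillite2 (simulated log-normal response times)
set.seed(4)
taillite2 <- data.frame(vehtype  = factor(rep(1:3, each = 20)),
                        resptime = rlnorm(60, meanlog = 0, sdlog = 0.6))

# Same as Compute new variable with the expression log(resptime); requires resptime > 0
taillite2$lresp <- log(taillite2$resptime)

# Normality of the log-transformed response within each vehtype
by(taillite2$lresp, taillite2$vehtype, shapiro.test)

# One-way ANOVA on the log scale
fit_log <- aov(lresp ~ vehtype, data = taillite2)
summary(fit_log)
```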
electric.xls analysis
housize = 'House Size'
income = 'Family Income'
aircapac = 'Air Conditioning Capacity'
applindx = 'Appliance Index'
family = 'Number of Family Members'
peak = 'Peak Hour Electric Load'
Purpose: select the variables that affect peak (peak-hour electric load) and build a regression equation.
Statistics > Fit models > Linear regression
To select the model by the stepwise method, you must write the command yourself (using the step(model) function).
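Problem 5.
5.1 Read the electric.xls data and explain the purpose of analyzing it.
5.2 Run a regression analysis with peak as the dependent variable using stepwise selection, and interpret the results (exclude the family variable).
Statistics -> Fit models -> Linear regression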
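A sketch of Problem 5.2 in base R with simulated data; the electric.xls values are not reproduced, and all numbers, including the coefficients used to generate peak, are hypothetical. family is left out of the model as the problem requires. Note that step() selects terms by AIC rather than by F-test p-values, so its result can differ from p-value-based stepwise procedures in other software.

```r
# Hypothetical stand-in for electric.xls (simulated, not real data)
set.seed(5)
n <- 50
electric <- data.frame(
  housize  = rnorm(n, 2000, 400),
  income   = rnorm(n, 50, 10),
  aircapac = rnorm(n, 3, 0.5),
  applindx = rnorm(n, 10, 2),
  family   = sample(2:6, n, replace = TRUE)
)
# Hypothetical data-generating rule for peak
electric$peak <- with(electric, 0.002 * housize + 1.5 * aircapac + 0.3 * applindx + rnorm(n, 0, 1))

# Full model without family, then stepwise selection by AIC in both directions
full <- lm(peak ~ housize + income + aircapac + applindx, data = electric)
best <- step(full, direction = "both", trace = 0)
summary(best)
```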
3D graphics
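Rcmdr
Provides a convenient graphical environment for researchers who are new to R.
Some shortcomings remain, but continued updates are expected.
Korean-language menus and support for a variety of Korean fonts are still needed.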