
Evolutionary Optimization of a Collection of Variable-Length Subpatterns for Pattern Classification

August 2008

Evolutionary Optimization of a Collection of Variable-Length Subpatterns for Pattern Classification

Submitted May 2008; approved June 2008.

Thesis committee: ( ), ( ), Robert Ian McKay ( )

Abstract

Rules generated by rule-based classifiers are easy to interpret, but their decision boundaries are limited to sets of axis-parallel hypercubes, and the learned rule sets are typically only locally optimal. This thesis proposes a classification model built from a collection of variable-length subpatterns whose decision boundaries may be hypercubes or hyperspheres, and evaluates it on benchmark datasets from the UCI repository, where it attains competitive accuracy.

Keywords: rule-based classification, decision boundary, subpattern, random sampling
Student number: 2006-21175

Contents

I. Introduction ... 1
II. Related Work ... 4
  2.1 Rules ... 4
  2.2 Rule-Based Classification ... 5
  2.3 Inductive Logic Programming ... 5
  2.4 Association Rule-Based Classification ... 6
  2.5 Decision Trees ... 7
III. Proposed Method ... 9
  3.1 Learning Algorithm ... 9
  3.2 Subpattern Generation ... 10
  3.3 Subpattern Validity ... 11
  3.4 Classification ... 12
  3.5 Fitness ... 14
  3.6 Updating the Distributions ... 15
  3.7 Discussion ... 16
IV. Experiments ... 18
  4.1 Data ... 18
  4.2 Experimental Setup ... 18
  4.3 SONAR ... 19
  4.4 SPECTF ... 22
  4.5 IONOSPHERE ... 24
  4.6 Wisconsin Diagnostic Breast Cancer (WDBC) ... 26
  4.7 Discussion ... 29
V. Conclusion ... 31
  5.1 Summary ... 31
  5.2 Future Work ... 32
References ... 33

ABSTRACT ... 36

List of Figures
1. A decision boundary formed as a set of hypercubes ... 2
2. An example decision tree ... 8
3. The learning algorithm ... 9
4. Examples of subpattern generation ... 10
5. The pattern classification algorithm ... 13
6. Accuracy during training on SONAR ... 19
7. Distribution of subpattern lengths after training on SONAR ... 20
8. Accuracy on SONAR by validity threshold r and decision-boundary type ... 21
9. Accuracy during training on SPECTF ... 22
10. Distribution of subpattern lengths after training on SPECTF ... 23
11. Accuracy on SPECTF by validity threshold r and decision-boundary type ... 24
12. Accuracy during training on IONOSPHERE ... 24
13. Distribution of subpattern lengths after training on IONOSPHERE ... 25
14. Accuracy on IONOSPHERE by validity threshold r and decision-boundary type ... 26
15. Accuracy during training on WDBC ... 26
16. Distribution of subpattern lengths after training on WDBC ... 28
17. Accuracy on WDBC by validity threshold r and decision-boundary type ... 28
18. Test accuracy by the number of subpatterns per training pattern ... 30

List of Tables
1. Example learned rules ... 1
2. Experimental data ... 18
3. Test accuracy comparison on SONAR ... 20
4. Test accuracy comparison on SPECTF ... 22
5. Test accuracy comparison on IONOSPHERE ... 25
6. Test accuracy comparison on WDBC ... 27

I. Introduction

A rule-based classifier expresses what it learns as a set of explicit rules such as those in Table 1, so the learned model is easy to interpret, and rules with low relevance to classification can be inspected or removed directly.

Table 1. Example learned rules

(Weather = Sunny, Temperature = High, Humidity = High, Wind = Weak) → Do not play tennis
(Weather = Sunny, Temperature = Low, Humidity = Normal) → Play tennis
(Weather = Overcast, Humidity = High, Wind = Strong) → Do not play tennis

The decision boundary induced by such a rule set is a union of hypercubes parallel to the axes of the input space, of the kind shown in Figure 1.

Figure 1. A decision boundary formed as a set of hypercubes.

Restricting decision boundaries to axis-parallel hypercubes limits the shapes a rule-based classifier can represent, and most rule learners find only locally optimized rule sets.

To ease these limits, the model proposed in this thesis also admits hyperspheres as decision boundaries. It builds on earlier work on evolutionary hypernetwork classifiers, which have been applied to microRNA expression profile analysis [1], aptamer-based disease diagnosis [2], text classification on a simulated DNA computer [3], and general pattern classification [4].

The rest of the thesis is organized as follows. Chapter 2 reviews related work. Chapter 3 presents the proposed model. Chapter 4 reports experiments on four datasets from the UCI Machine Learning Repository [6]. Chapter 5 concludes.

II. Related Work

2.1 Rules

A rule consists of an antecedent and a consequent, written antecedent → consequent: when a pattern satisfies every condition in the antecedent, the rule assigns the class named in the consequent. Conditions may test the values of nominal attributes for equality or bound numeric attributes by intervals, and a rule need not mention every attribute of a pattern. For example, a rule of the form

(a1 < 25, 65 < a3 < 75, a4 > 1.0) → class 1

(the attribute names a1-a4 stand in for names lost from the source) tests three of the four attributes of a pattern; the pattern (a1 = 23, a2 = 180, a3 = 68, a4 = 1.2) satisfies all three conditions and is therefore assigned class 1.
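To make the matching semantics concrete, here is a minimal Python sketch; the dictionary layout, the function name, and the attribute names are illustrative assumptions, not notation from the thesis.

```python
def rule_fires(antecedent, pattern):
    """Return True when the pattern satisfies every condition in the antecedent."""
    return all(test(pattern[attr]) for attr, test in antecedent.items())

# The numeric rule from the text: three conditions on a four-attribute pattern.
antecedent = {
    "a1": lambda v: v < 25,
    "a3": lambda v: 65 < v < 75,
    "a4": lambda v: v > 1.0,
}
pattern = {"a1": 23, "a2": 180, "a3": 68, "a4": 1.2}
print(rule_fires(antecedent, pattern))  # True, so the rule assigns class 1
```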

2.2 Rule-Based Classification

Rule-based classifiers learn sets of rules of the kind described in Section 2.1 directly from training data. A common strategy is sequential covering: learn one rule, remove the training patterns it covers, and repeat on the remainder. Representative sequential-covering learners are AQ [7], CN2 [8], and RIPPER [9].

2.3 Inductive Logic Programming

The rules of Section 2.2 are propositional. Inductive logic programming instead learns more expressive first-order Horn rules. FOIL (First-Order Inductive Learner) [10] is a representative system and likewise follows a sequential-covering strategy.

Because first-order rules can express relations between objects, this line of work is also called relational learning.

2.4 Association Rule-Based Classification

Association rule-based classifiers mine class association rules from the training data and classify new patterns with them. Representative systems are CBA (Classification Based on Associations) [11], CMAR (Classification Based on Multiple Association Rules) [12], CPAR (Classification Based on Predictive Association Rules) [13], and MCAR (Multi-class Classification Based on Association Rule) [14]. CBA mines rules breadth-first with the Apriori algorithm [15], keeping those that meet minimum support and confidence thresholds. CMAR mines rules depth-first over an FP-tree [16]. CPAR grows rules greedily in the manner of FOIL.

Like CMAR, CPAR classifies with multiple matching rules rather than a single best rule, and MCAR ranks its mined rules by support and confidence before using them for classification.

2.5 Decision Trees

A decision tree classifies a pattern by routing it from the root to a leaf through a sequence of attribute tests. Representative learners are ID3 [17], C4.5 [18], and CART [19], which select split attributes by information gain, gain ratio, and the Gini index, respectively; a sketch of the information-gain computation follows Figure 2, which shows an example tree.

Figure 2. An example decision tree [20].
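For reference, a minimal Python sketch of the entropy-based information-gain criterion named above (the Gini index used by CART is computed analogously); the function names and the toy data are assumptions of this sketch.

```python
import numpy as np

def entropy(y):
    """Shannon entropy of an integer-labeled class vector."""
    if len(y) == 0:
        return 0.0
    p = np.bincount(y) / len(y)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def information_gain(y, mask):
    """Entropy reduction from splitting y into y[mask] and y[~mask]."""
    n = len(y)
    return entropy(y) - (mask.sum() / n) * entropy(y[mask]) \
                      - ((~mask).sum() / n) * entropy(y[~mask])

y = np.array([0, 0, 1, 1, 1, 0])                          # toy class labels
mask = np.array([True, True, True, False, False, False])  # candidate split
print(information_gain(y, mask))
```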

III. Proposed Method

3.1 Learning Algorithm

The proposed model learns a collection of variable-length subpatterns (CVLS). Figure 3 shows the learning algorithm.

Input: a set of n training patterns
Output: a set of d subpatterns

1. Standardize the training patterns, and initialize the probability distribution D over subpattern lengths and the probability distribution B over decision-boundary types to uniform distributions.
2. Generate d subpatterns, each a subset of randomly chosen attributes of a training pattern, to form the subpattern set S. Each subpattern's length and decision boundary are drawn from D and B, respectively.
3. Classify the training patterns with the subpattern set, placing correctly classified patterns in Xc and misclassified patterns in Xw. Terminate if every training pattern is classified correctly.
4. Compute the fitness of every subpattern.
5. Update D and B from the length and decision-boundary distributions of a fixed number of the highest-fitness subpatterns.
6. Replace a fixed number of the lowest-fitness subpatterns with subpatterns newly generated from the training patterns, drawing each new subpattern's length from D and its decision boundary from B.
7. Return to step 3; terminate once the loop has run a specified number of times.

Figure 3. The learning algorithm.
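The Python sketch below shows one way to organize this loop. The function names, the replacement fraction, and the default parameter values are assumptions, and the helpers (sample_subpattern, classify, fitness, and the two distribution estimators) are sketched in the sections that follow.

```python
import numpy as np

def train_cvls(X, y, d=200, n_iter=30, frac=0.2, r=2.0, rng=None):
    """Sketch of the loop in Figure 3; d, n_iter, frac, and r are assumed values."""
    rng = rng if rng is not None else np.random.default_rng(0)
    n, m = X.shape
    # Step 1: standardize; uniform length distribution D and boundary distribution B.
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    D = np.full(m, 1.0 / m)              # D[k] = P(subpattern length = k + 1)
    B = np.array([0.5, 0.5])             # [P(hypercube), P(hypersphere)]
    # Step 2: generate the initial subpattern set S.
    S = [sample_subpattern(X, y, D, B, rng) for _ in range(d)]
    for _ in range(n_iter):
        # Step 3: classify the training set with the current collection.
        pred = np.array([classify(S, x, r) for x in X])
        if np.all(pred == y):
            break
        # Step 4: fitness of every subpattern (Section 3.5).
        fit = np.array([fitness(s, X, y, pred, r) for s in S])
        order = np.argsort(fit)
        survivors = [S[i] for i in order[int(d * frac):]]
        # Step 5: re-estimate D and B from the high-fitness survivors (Section 3.6).
        D = length_distribution(survivors, m)
        B = boundary_distribution(survivors)
        # Step 6: replace the low-fitness tail with fresh subpatterns.
        for i in order[:int(d * frac)]:
            S[i] = sample_subpattern(X, y, D, B, rng)
    return S
```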

3.2 Subpattern Generation

A subpattern is generated by choosing a training pattern, selecting a random subset of its attributes, and copying the selected attribute values together with the pattern's class label. For example, given the two 5-dimensional training patterns (0, 1, 0, 0, 1, true) and (1, 1, 1, 0, 0, false), subpatterns such as the following can be generated:

(f1 = 0, f3 = 0, f5 = 1, class = true)
(f2 = 1, f4 = 0, class = true)
(f2 = 1, f3 = 1, f4 = 0, class = false)
(f1 = 1, f2 = 1, f4 = 0, f5 = 0, class = false)

Figure 4. Examples of subpattern generation.

The number of selected attributes is drawn from D, whose support runs up to the dimensionality of the patterns, and the decision-boundary type is drawn from B. Because a subpattern contains only a subset of the attributes, training patterns with missing values can still contribute subpatterns over their observed attributes.
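A minimal sketch of this sampling step; the dictionary layout and field names are assumptions carried through the later sketches.

```python
import numpy as np

def sample_subpattern(X, y, D, B, rng):
    """Draw one subpattern: a random attribute subset of a random training pattern."""
    n, m = X.shape
    j = rng.integers(n)                       # source training pattern
    length = int(rng.choice(m, p=D)) + 1      # subpattern length, drawn from D
    attrs = np.sort(rng.choice(m, size=length, replace=False))
    boundary = str(rng.choice(["hypercube", "hypersphere"], p=B))
    return {"attrs": attrs, "values": X[j, attrs].copy(),
            "cls": y[j], "boundary": boundary}
```

On the two 5-dimensional patterns of Figure 4, repeated calls produce subpatterns like (f2 = 1, f4 = 0, class = true), with lengths distributed according to D.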

3.3 Subpattern Validity

A subpattern votes on an input pattern only when it is valid for that pattern, and validity is decided over just the attributes the subpattern contains. For a subpattern $p$ with a hypercube decision boundary, validity for an input pattern $x$ requires every attribute of $p$ to lie within a threshold $r$ of the stored value:

$$|x_i - p_i| \le r \quad \text{for every attribute } i \text{ of } p. \tag{1}$$

For a subpattern with a hypersphere decision boundary, validity requires the distance between the input and the stored values, normalized by the number of attributes in the subpattern, to fall within the same threshold:

$$\sqrt{\frac{1}{|p|} \sum_{i \in p} (x_i - p_i)^2} \le r, \tag{2}$$

where $|p|$ denotes the number of attributes of $p$. Normalizing by $|p|$ keeps the threshold $r$ comparable across subpatterns of different lengths; the effect of varying $r$ is examined in Chapter 4.

3.4 Classification

An input pattern is classified by the votes of all subpatterns that are valid for it, following the algorithm in Figure 5.
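Read together, Eqs. (1) and (2) amount to the following test, sketched against the subpattern layout assumed in Section 3.2.

```python
import numpy as np

def is_valid(s, x, r):
    """Validity of subpattern s for input pattern x under its decision boundary."""
    diff = x[s["attrs"]] - s["values"]
    if s["boundary"] == "hypercube":
        # Eq. (1): every selected attribute within r of the stored value.
        return bool(np.all(np.abs(diff) <= r))
    # Eq. (2): length-normalized Euclidean distance within r.
    return bool(np.sqrt(np.mean(diff ** 2)) <= r)
```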

Input: the subpattern set and an m-dimensional input pattern
Output: the class of the input pattern

1. Extract every subpattern that satisfies Eq. (1) or Eq. (2), according to the decision boundary assigned to it, for the input pattern.
2. Assign the input pattern the most frequent class among the extracted subpatterns (majority voting).

Figure 5. The pattern classification algorithm.

Because the predicted class is decided by the votes of many subpatterns rather than by a single rule, no individual subpattern has to fit the training data exactly, which helps the model resist overfitting.
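A sketch of the voting step, reusing is_valid from Section 3.3; the tie-breaking behavior and the fallback class returned when no subpattern fires are assumptions.

```python
from collections import Counter

def classify(S, x, r, default=0):
    """Majority vote over all subpatterns in S that are valid for x (Figure 5)."""
    votes = [s["cls"] for s in S if is_valid(s, x, r)]
    if not votes:
        return default            # no subpattern fired; fall back
    return Counter(votes).most_common(1)[0][0]
```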

3.5 Fitness

The fitness of a subpattern s_i is computed from four counts collected while classifying the training set with the current collection, as in Eq. (3): over the misclassified patterns Xw, the numbers mw and mr of patterns on which s_i votes for the wrong and for the right class, and over the correctly classified patterns Xc, the corresponding counts cw and cr. The counts are combined so that fitness lies between 0 and 1, and, in the spirit of boosting [21], patterns that the current collection misclassifies contribute more strongly, steering newly generated subpatterns toward the difficult patterns.
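The exact combination used in Eq. (3) is not legible in the source; the sketch below only illustrates the four counts and one boosting-flavored way of combining them, which is an assumption of this sketch rather than the thesis's formula.

```python
import numpy as np

def fitness(s, X, y, pred, r, boost=2.0):
    """Illustrative fitness in [0, 1]; 'boost' and the final ratio are assumed."""
    covered = np.array([is_valid(s, x, r) for x in X])
    right = covered & (y == s["cls"])      # s votes for the true class
    wrong = covered & (y != s["cls"])      # s votes for a wrong class
    miss = pred != y                       # Xw: patterns the collection misses
    cr, cw = np.sum(right & ~miss), np.sum(wrong & ~miss)  # counts over Xc
    mr, mw = np.sum(right & miss), np.sum(wrong & miss)    # counts over Xw
    good, bad = cr + boost * mr, cw + boost * mw           # weight Xw more
    return float(good / (good + bad)) if good + bad > 0 else 0.0
```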

3.6 Updating the Distributions

After each iteration, the length distribution D is re-estimated from the lengths of the highest-fitness subpatterns. Among subpatterns that classify equally well, shorter ones are preferred (Occam's razor), so D gradually shifts toward the shortest lengths the data supports.

The boundary distribution B is updated in the same way, from the decision-boundary types of the highest-fitness subpatterns, so generation gradually favors whichever boundary shape fits the data better.
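One simple way to realize this re-estimation; the smoothing constant, which keeps rare lengths and boundary types from dying out entirely, is an assumption of the sketch.

```python
import numpy as np

def length_distribution(survivors, m, smooth=1e-3):
    """New D: smoothed histogram of the surviving subpatterns' lengths."""
    counts = np.full(m, smooth)
    for s in survivors:
        counts[len(s["attrs"]) - 1] += 1.0
    return counts / counts.sum()

def boundary_distribution(survivors, smooth=1e-3):
    """New B: smoothed frequencies of the two decision-boundary types."""
    counts = np.full(2, smooth)
    for s in survivors:
        counts[0 if s["boundary"] == "hypercube" else 1] += 1.0
    return counts / counts.sum()
```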

3.7 Discussion

The model generates and evaluates many random combinations of attributes in parallel, so it explores a far larger space of candidate rules than greedy rule learners do and is less likely to settle on a locally optimized rule set. Its random-sampling, massively parallel character also relates it to hardware realizations of similar classification schemes, such as the DNA computing-inspired pattern recognition chip of [22].

IV. Experiments

4.1 Data

CVLS was evaluated on four datasets from the UCI Machine Learning Repository, summarized in Table 2.

Table 2. Experimental data

Name                                        Attributes   Records
SONAR                                       60           208
SPECTF                                      44           training: 80, test: 187
Wisconsin Diagnostic Breast Cancer (WDBC)   31           569
IONOSPHERE                                  34           351

4.2 Experimental Setup

CVLS was compared against support vector machines, k-nearest neighbors, AdaBoost, random forests, a decision tree, and naive Bayes, a set of baselines chosen following the empirical comparison of [23]. AdaBoost was run for 50 iterations and the random forest used 50 trees; all baselines were run in Weka 3.5.7 [24]. Results are averaged over 10-fold stratified cross-validation, except on SPECTF, which is distributed with a fixed training/test split that was used as given.
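As a rough equivalent of this protocol outside Weka, the same 10-fold stratified evaluation can be sketched in scikit-learn; the OpenML dataset name and the choice of baseline here are assumptions, not part of the thesis's setup.

```python
from sklearn.datasets import fetch_openml
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# SONAR fetched from OpenML (name assumed); a CART-style tree stands in for C4.5.
X, y = fetch_openml("sonar", version=1, return_X_y=True, as_frame=False)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=cv)
print(f"mean accuracy {scores.mean():.4f} +/- {scores.std():.4f}")
```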

4.3 SONAR

Figure 6 shows how classification accuracy on SONAR changes during training, rising from roughly 0.83 to roughly 0.88 over 30 iterations.

Figure 6. Accuracy during training on SONAR (iterations 0-30; accuracy 0.82-0.90).

Table 3 compares test accuracy on SONAR; CVLS attains the highest mean accuracy of the classifiers compared.

Table 3. Test accuracy comparison on SONAR

Classifier                       Mean accuracy (%)   Std. dev.
CVLS                             87.7404             1.798869
SVM (Pearson VII kernel [25])    86.29808            1.225735
kNN (k = 1)                      86.15385            0.871897
AdaBoost (50 iterations)         84.66347            1.330242
Random Forest (50 trees)         84.18271            1.544634
Decision Tree (C4.5)             73.60576            2.618124
Naive Bayes                      67.69233            1.082175

Figure 7 shows the distribution of subpattern lengths after training on SONAR. Hypercube subpatterns average 15.78 attributes and hypersphere subpatterns 19.52, for an overall average of 17.99.

Figure 7. Distribution of subpattern lengths after training on SONAR (lengths 0-60; averages 15.7792 hypercube, 19.5184 hypersphere, 17.9887 overall).

Figure 8 shows how accuracy on SONAR varies with the threshold r of the validity tests (1) and (2) and with the decision-boundary composition of the collection, across four settings: hypercube only, hypersphere only, and two mixtures (equal and unequal ratios) of the two, for r from 1.5 to 2.5.

Figure 8. Accuracy on SONAR by validity threshold r and decision-boundary type (r = 1.5-2.5; accuracy 0.835-0.885).

4.4 SPECTF

Figure 9 shows accuracy during training on SPECTF.

Figure 9. Accuracy during training on SPECTF (iterations 0-30; accuracy 0.35-0.80).

Table 4 compares test accuracy on SPECTF; CVLS again attains the highest accuracy.

Table 4. Test accuracy comparison on SPECTF

Classifier                      Accuracy (%)
CVLS                            76.2567
SVM (Pearson VII kernel)        74.3316
AdaBoost (50 iterations)        73.262
Random Forest (50 trees)        71.6578
Naive Bayes                     69.5187
Decision Tree (C4.5)            67.9144
kNN (k = 1)                     61.4973

Figure 10 shows the distribution of subpattern lengths after training on SPECTF; the learned subpatterns are short, averaging roughly five attributes.

Figure 10. Distribution of subpattern lengths after training on SPECTF (averages 5.0724 hypercube, 5.6886 hypersphere, 5.3083 overall).

Figure 11 shows accuracy on SPECTF by the validity threshold r and the decision-boundary type.

Figure 11. Accuracy on SPECTF by validity threshold r and decision-boundary type (r = 1.5-2.5; accuracy 69-78%).

4.5 IONOSPHERE

Figure 12 shows accuracy during training on IONOSPHERE, which starts near 80% and reaches about 93% within five iterations.

Figure 12. Accuracy during training on IONOSPHERE (iterations 0-30; accuracy 0.65-1.00).

Table 5 compares test accuracy on IONOSPHERE, where CVLS is competitive but not the best performer.

Table 5. Test accuracy comparison on IONOSPHERE

Classifier                      Mean accuracy (%)   Std. dev.
SVM (Pearson VII kernel)        94.47294            0.306258
AdaBoost (50 iterations)        93.84616            0.54056
CVLS                            93.6752             0.612517
Random Forest (50 trees)        93.56126            0.42893
Decision Tree (C4.5)            89.74360            1.433965
kNN (k = 1)                     87.09403            0.485167
Naive Bayes                     82.16516            0.48791

Figure 13 shows the distribution of subpattern lengths after training on IONOSPHERE.

Figure 13. Distribution of subpattern lengths after training on IONOSPHERE (averages 14.3041 hypercube, 10.7683 hypersphere, 13.3606 overall).

Figure 14 shows accuracy on IONOSPHERE by the validity threshold r and the decision-boundary type; the behavior resembles that on SPECTF.

Figure 14. Accuracy on IONOSPHERE by validity threshold r and decision-boundary type (r = 1.5-2.5; accuracy 0.915-0.945).

4.6 Wisconsin Diagnostic Breast Cancer (WDBC)

Figure 15 shows accuracy during training on WDBC, which exceeds 95% within about five iterations.

Figure 15. Accuracy during training on WDBC (iterations 0-30; accuracy 0.94-0.97).

Table 6 compares test accuracy on WDBC. The task is comparatively easy: even naive Bayes exceeds 92%. As on IONOSPHERE, CVLS is competitive but not the best performer.

Table 6. Test accuracy comparison on WDBC

Classifier                      Mean accuracy (%)   Std. dev.
SVM (Pearson VII kernel)        97.4165             0.118609
AdaBoost (50 iterations)        96.8717             0.272256
Random Forest (50 trees)        96.36204            0.36160
CVLS                            95.8699             0.372823
kNN (k = 1)                     95.67662            0.30101
Decision Tree (C4.5)            93.00528            0.976077
Naive Bayes                     92.7724             0.146670

Figure 16 shows the distribution of subpattern lengths after training on WDBC; most lengths fall between 10 and 20 attributes.

Figure 16. Distribution of subpattern lengths after training on WDBC (averages 15.3387 hypercube, 15.3077 hypersphere, 15.3213 overall).

Figure 17 shows accuracy on WDBC by the validity threshold r and the decision-boundary type.

Figure 17. Accuracy on WDBC by validity threshold r and decision-boundary type (r = 1.5-2.5; accuracy 0.956-0.9605).

4.7 Discussion

Across the four datasets, CVLS attained the best accuracy on SONAR and SPECTF and remained competitive on IONOSPHERE and WDBC. SONAR and SPECTF are the harder problems, where naive Bayes stays in the 60-70% range, and it is on these harder problems that CVLS shows its largest advantage; on the easier IONOSPHERE and WDBC tasks, most of the classifiers perform well and the margins between them are small.

Figure 18 shows how test accuracy on each dataset changes with the number of subpatterns generated per training pattern.

Figure 18. Test accuracy by the number of subpatterns per training pattern (0-70) on SPECTF, SONAR, IONOSPHERE, and WDBC.

V. Conclusion

5.1 Summary

This thesis proposed a classification model that evolves a collection of variable-length subpatterns, each a randomly sampled subset of a training pattern's attributes carrying either a hypercube or a hypersphere decision boundary. Because many combinations of attributes are generated and tested during learning, the model can improve on the locally optimized solutions and axis-parallel decision boundaries of conventional rule-based classifiers. Experiments on four datasets from the UCI Machine Learning Repository showed competitive accuracy, with the largest gains on the harder problems.

5.2 Future Work

Several directions for extending CVLS remain open for future work.

References

[1] S. Kim, S.-J. Kim, and B.-T. Zhang, "Evolving hypernetwork classifiers for microRNA expression profile analysis," IEEE Congress on Evolutionary Computation (CEC 2007), pp. 313-319, 2007.
[2] J.-W. Ha, J.-H. Eom, S.-C. Kim, and B.-T. Zhang, "Evolutionary hypernetwork models for aptamer-based cardiovascular disease diagnosis," Genetic and Evolutionary Computation Conference (GECCO 2007), pp. 2709-2716, 2007.
[3] S. Kim, M.-O. Heo, and B.-T. Zhang, "Text classifiers evolved on a simulated DNA computer," IEEE Congress on Evolutionary Computation (CEC 2006), pp. 9196-9202, 2006.
[4] J.-K. Kim and B.-T. Zhang, "Evolving hypernetworks for pattern classification," IEEE Congress on Evolutionary Computation (CEC 2007), pp. 1856-1862, 2007.
[5] A. Wojna, "Combination of metric-based and rule-based classification," Lecture Notes in Artificial Intelligence, vol. 3641, pp. 501-511, 2005.
[6] A. Asuncion and D. J. Newman, UCI Machine Learning Repository [http://www.ics.uci.edu/~mlearn/MLRepository.html], Irvine, CA: University of California, School of Information and Computer Science, 2007.
[7] J. Hong, I. Mozetic, and R. S. Michalski, "AQ15: Incremental learning of attribute-based descriptions from examples, the method and user's guide," Report ISG 85-5, UIUCDCS-F-86-949, Dept. of Computer Science, University of Illinois at Urbana-Champaign, 1986.
[8] P. Clark and T. Niblett, "The CN2 induction algorithm," Machine Learning, 3:261-283, 1989.
[9] W. Cohen, "Fast effective rule induction," Proc. 1995 International Conference on Machine Learning (ICML 95), pp. 115-123, Tahoe City, CA, 1995.
[10] J. R. Quinlan and R. M. Cameron-Jones, "FOIL: A midterm report," Proc. 1993 European Conference on Machine Learning, pp. 3-20, Vienna, Austria, 1993.
[11] B. Liu, W. Hsu, and Y. Ma, "Integrating classification and association rule mining," Proc. 1998 International Conference on Knowledge Discovery and Data Mining (KDD 98), pp. 80-86, New York, NY, 1998.
[12] W. Li, J. Han, and J. Pei, "CMAR: Accurate and efficient classification based on multiple class-association rules," Proc. 2001 International Conference on Data Mining (ICDM 01), pp. 369-376, San Jose, CA, 2001.
[13] X. Yin and J. Han, "CPAR: Classification based on predictive association rules," Proc. 2003 SIAM International Conference on Data Mining (SDM 03), pp. 331-335, San Francisco, CA, 2003.
[14] F. Thabtah, P. Cowling, and Y. Peng, "MCAR: Multi-class classification based on association rule," Proc. 3rd ACS/IEEE International Conference on Computer Systems and Applications, pp. 33-39, Cairo, Egypt, 2005.
[15] R. Agrawal and R. Srikant, "Fast algorithms for mining association rules in large databases," Proc. 20th International Conference on Very Large Databases, pp. 487-499, Santiago, Chile, 1994.
[16] J. Han, J. Pei, and Y. Yin, "Mining frequent patterns without candidate generation," Proc. 2000 ACM-SIGMOD International Conference on Management of Data (SIGMOD 00), pp. 1-12, Dallas, TX, 2000.
[17] J. R. Quinlan, "Induction of decision trees," Machine Learning, 1:81-106, 1986.
[18] J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993.
[19] L. Breiman, J. Friedman, R. Olshen, and C. Stone, Classification and Regression Trees, Wadsworth International Group, 1984.
[20] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2006.
[21] Y. Freund and R. E. Schapire, "A short introduction to boosting," Journal of Japanese Society for Artificial Intelligence, 14(5):771-780, 1999.
[22] J.-K. Kim, B. S. Kim, O. H. Kwon, S. K. Hwang, J.-W. Ha, C.-H. Park, D. J. Chung, C. H. Lee, J. Park, and B.-T. Zhang, "A DNA computing-inspired silicon chip for pattern recognition," Preliminary Proceedings of the 13th International Meeting on DNA Computing (DNA13), p. 373, 2007.
[23] R. Caruana and A. Niculescu-Mizil, "An empirical comparison of supervised learning algorithms," Proc. 23rd International Conference on Machine Learning (ICML 06), pp. 161-168, 2006.
[24] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, 2nd ed., Morgan Kaufmann, San Francisco, 2005.
[25] B. Uestuen, W. J. Melssen, and L. M. C. Buydens, "Facilitating the application of support vector regression by using a universal Pearson VII function based kernel," Chemometrics and Intelligent Laboratory Systems, vol. 81, pp. 29-40, 2006.

ABSTRACT

Joo-Kyung Kim
School of Computer Science and Engineering
The Graduate School
Seoul National University

Rules generated by rule-based classifiers are easy to interpret, and rules with low relevance to classification can be controlled directly. However, most current rule-based classifiers find only locally optimized solutions, and their decision boundaries are restricted to sets of hypercubes parallel to the axes of the input space. In this thesis, we introduce a subpattern-based classification model to overcome these limits. A subpattern is a subset of randomly selected attributes of a training pattern. After being generated randomly, subpatterns either remain in the collection or are replaced, according to their fitness values. Because many combinations of subpatterns are tested in this process, the resulting solution can be more accurate than locally optimized ones. Moreover, the proposed model allows both hypercubes and hyperspheres as the shapes of decision boundaries, so it can represent more varied decision boundaries. On UCI benchmark data, the proposed model demonstrated competitive accuracy, performing especially well on the problems of relatively high complexity.

Keywords: rule-based classification, decision boundary, subpattern, random sampling
Student number: 2006-21175