Evolutionary Optimization of a Collection of Variable-Length Subpatterns for Pattern Classification

Joo-Kyung Kim

School of Computer Science and Engineering
The Graduate School
Seoul National University
August 2008
Abstract

Rules generated by rule-based classifiers are easy to interpret, and rules with low relevance to classification can be controlled directly. However, most existing rule-based classifiers find only locally optimized solutions, and their decision boundaries are limited to sets of hypercubes parallel to the axes of the input space. This thesis introduces a subpattern-based classification model that addresses these limits. A subpattern is a subset of randomly selected attributes of a training pattern; after random generation, subpatterns are kept or replaced according to their fitness, and their decision boundaries may take the form of hypercubes or hyperspheres. Experiments on UCI datasets show competitive accuracy, especially on problems of relatively high complexity.

Keywords: rule-based classification, decision boundary, subpattern, random sampling
Student Number: 2006-21175
1. Introduction

Rules generated by rule-based classifiers are easy to interpret, and rules that contribute little to classification can be controlled directly. Table 1 shows examples of learned rules.

Table 1. Examples of training rules
(Outlook = Sunny, Temperature = High, Humidity = High, Wind = Weak) -> Do not play tennis
(Outlook = Sunny, Temperature = Low, Humidity = Normal) -> Play tennis
(Outlook = Overcast, Humidity = High, Wind = Strong) -> Do not play tennis

In most rule-based classifiers, however, the decision boundary is represented only as a set of hypercubes parallel to the axes of the input space, as illustrated in Figure 1.
Figure 1. A decision boundary formed as a set of hypercubes
The proposed model also allows hyperspheres as decision boundaries. Related models that evolve collections of randomly sampled attribute subsets include evolutionary hypernetwork classifiers [1][2][3][4].

The remainder of this thesis is organized as follows. Section 2 reviews related work. Section 3 describes the proposed model. Section 4 reports experiments on four datasets from the UCI Machine Learning Repository [6]. Section 5 concludes.
2. Related Work

2.1. Rules

A rule consists of an antecedent and a consequent (antecedent -> consequent). The antecedent is a conjunction of conditions on attribute values, and the consequent assigns a class. Rules such as those in Table 1 test categorical attributes for equality. For numeric attributes, conditions are expressed as intervals. For example, the rule

(f1 < 25, 65 < f3 < 75, f4 > 1.0) -> class 1

assigns class 1 to any pattern whose attributes satisfy all three conditions; the pattern (f1 = 23, f2 = 180, f3 = 68, f4 = 1.2) satisfies the rule and is therefore classified as class 1.
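To make the matching semantics concrete, the following sketch (Python; the rule encoding and helper names are illustrative, not from the thesis) represents the numeric rule above as interval conditions and tests the example pattern against it.

    # A propositional rule as numeric interval conditions: (low, high) bounds,
    # with None for an open side. Attribute indices are 0-based (f1 -> 0).
    rule = {
        "conditions": {0: (None, 25.0),   # f1 < 25
                       2: (65.0, 75.0),   # 65 < f3 < 75
                       3: (1.0, None)},   # f4 > 1.0
        "consequent": 1,                  # -> class 1
    }

    def matches(rule, pattern):
        """True if every (low, high) condition holds for the pattern."""
        for i, (low, high) in rule["conditions"].items():
            if low is not None and not pattern[i] > low:
                return False
            if high is not None and not pattern[i] < high:
                return False
        return True

    pattern = [23.0, 180.0, 68.0, 1.2]    # (f1=23, f2=180, f3=68, f4=1.2)
    print(matches(rule, pattern))         # True -> classify as class 1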
2.2. Rule-Based Classification

Rule-based classifiers learn a set of rules of the form described in Section 2.1. Most of them follow the sequential covering strategy: rules are learned one at a time, and the training patterns covered by each new rule are removed before the next rule is learned. Representative sequential-covering algorithms include AQ [7], CN2 [8], and RIPPER [9].

2.3. Inductive Logic Programming

The rules of Section 2.2 are propositional rules. Inductive logic programming learns more expressive first-order Horn rules, which can contain variables. A representative system is FOIL (First-Order Inductive Learner) [10], which also follows the sequential covering strategy.
Because first-order rules can express relations among objects, this approach is also called relational learning.

2.4. Association Rule-based Classification

Associative classifiers mine class association rules from the training data and classify with them. Representative methods include CBA (Classification Based on Associations) [11], CMAR (Classification Based on Multiple Association Rules) [12], CPAR (Classification based on Predictive Association Rules) [13], and MCAR (Multi-class Classification based on Association Rule) [14]. CBA mines rules with a breadth-first search in the style of Apriori [15], keeping rules that meet minimum support and confidence. CMAR mines rules with a depth-first search over an FP-tree [16]. CPAR generates rules with a FOIL-style greedy procedure.
Like CMAR, MCAR classifies with multiple matching rules, ranking them by support and confidence.

2.5. Decision Trees

A decision tree classifies a pattern by testing one attribute at each internal node and following the branches down to a leaf labeled with a class. Representative algorithms such as ID3 [17], C4.5 [18], and CART [19] grow trees by selecting split attributes with measures such as information gain, gain ratio, and the Gini index. Figure 2 shows an example of a decision tree.
Figure 2. An example of a decision tree [20]
3. The Proposed Model

3.1. Learning Algorithm

The proposed model (CVLS) evolves a collection of variable-length subpatterns. Figure 3 summarizes the learning algorithm; a code sketch follows the figure.

Input: a set of n training patterns
Output: a set of d subpatterns
1. Standardize the training patterns, and initialize the probability distribution D for subpattern-length selection and the probability distribution B for decision-boundary selection to uniform distributions.
2. Generate d subpatterns, each a subset of randomly chosen attributes of a training pattern, to form the subpattern set S. The length and the decision boundary of each subpattern are drawn from D and B, respectively.
3. Classify the training patterns with the subpattern set. Put correctly classified patterns into Xc and misclassified patterns into Xw. Terminate if every training pattern is classified correctly.
4. Compute the fitness of each subpattern.
5. Update D and B with the length and decision-boundary distributions of a fixed number of the highest-fitness subpatterns.
6. Replace a fixed number of the lowest-fitness subpatterns with subpatterns newly generated from the training patterns; the lengths and decision boundaries of the new subpatterns are drawn from D and B.
7. Return to step 3. Terminate when the loop has been repeated a specified number of times.

Figure 3. The learning algorithm
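The following Python sketch puts Figure 3 together end to end on toy data. It is a minimal reading of the algorithm, not the original implementation: the validity radius r, the fitness proxy, and the population sizes (200 subpatterns, top 100 for the update, worst 50 replaced) are illustrative assumptions, since the exact counts in steps 5 and 6 are not specified here.

    import numpy as np

    rng = np.random.default_rng(0)

    def generate_subpattern(X, y, length_probs, boundary_probs):
        """Steps 2/6: sample a training pattern, then a random attribute subset."""
        i = rng.integers(len(X))
        length = rng.choice(np.arange(1, X.shape[1] + 1), p=length_probs)
        attrs = rng.choice(X.shape[1], size=length, replace=False)
        boundary = rng.choice(["cube", "sphere"], p=boundary_probs)
        return {"attrs": attrs, "values": X[i, attrs],
                "boundary": boundary, "cls": y[i]}

    def is_valid(sp, x, r=2.0):
        """Eqs. (1)/(2): hypercube or hypersphere validity (see Section 3.3)."""
        d = x[sp["attrs"]] - sp["values"]
        if sp["boundary"] == "cube":
            return np.all(np.abs(d) <= r)
        return np.sum(d * d) <= r * r * len(sp["attrs"])

    def classify(S, x):
        """Figure 5: majority vote over all valid subpatterns."""
        votes = [sp["cls"] for sp in S if is_valid(sp, x)]
        return max(set(votes), key=votes.count) if votes else None

    # Toy standardized data (step 1).
    X = rng.standard_normal((40, 8))
    y = rng.integers(0, 2, 40)
    X = (X - X.mean(0)) / X.std(0)
    D = np.full(8, 1 / 8)                 # uniform length distribution
    B = np.array([0.5, 0.5])              # uniform boundary distribution
    S = [generate_subpattern(X, y, D, B) for _ in range(200)]

    for it in range(30):
        pred = np.array([classify(S, x) for x in X])          # step 3
        if np.all(pred == y):
            break
        # Step 4 (proxy fitness): fraction of patterns matched with the
        # correct class; the thesis's Eq. (3) is discussed in Section 3.5.
        fit = np.array([np.mean([is_valid(sp, x) and sp["cls"] == c
                                 for x, c in zip(X, y)]) for sp in S])
        top = np.argsort(fit)[-100:]                          # step 5
        lengths = np.array([len(S[i]["attrs"]) for i in top])
        D = np.bincount(lengths - 1, minlength=8) / len(top)
        B = np.array([np.mean([S[i]["boundary"] == b for i in top])
                      for b in ("cube", "sphere")])
        for i in np.argsort(fit)[:50]:                        # step 6
            S[i] = generate_subpattern(X, y, D, B)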
3.2. Subpattern Generation

A subpattern is generated by selecting a training pattern at random and then sampling a subset of its attributes; the subpattern stores the sampled attribute values together with the class of the source pattern. Figure 4 shows subpatterns that can be generated from two 5-dimensional training patterns, (0, 1, 0, 0, 1, true) and (1, 1, 1, 0, 0, false):

(f1 = 0, f3 = 0, f5 = 1, class = true)
(f2 = 1, f4 = 0, class = true)
(f2 = 1, f3 = 1, f4 = 0, class = false)
(f1 = 1, f2 = 1, f4 = 0, f5 = 0, class = false)

Figure 4. Examples of subpattern generation

The length of a subpattern is drawn from D, a distribution over lengths from 1 up to the dimensionality of the patterns, and its decision-boundary shape is drawn from B. Because a subpattern uses only a subset of the attributes, patterns with missing values can still generate and match subpatterns on their observed attributes.
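A short sketch (Python; the function name is illustrative) reproduces the Figure 4 example by sampling random attribute subsets from the two training patterns:

    import random

    patterns = [((0, 1, 0, 0, 1), True), ((1, 1, 1, 0, 0), False)]

    def make_subpattern(patterns, length):
        """Pick a training pattern, then a random attribute subset of it."""
        x, cls = random.choice(patterns)
        attrs = sorted(random.sample(range(len(x)), length))
        return {**{f"f{i + 1}": x[i] for i in attrs}, "class": cls}

    random.seed(1)
    for _ in range(4):
        print(make_subpattern(patterns, random.randint(2, 4)))
    # e.g. {'f2': 1, 'f4': 0, 'class': True} -- cf. Figure 4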
3.3. Validity of a Subpattern for an Input Pattern

Whether a subpattern is valid for (matches) an input pattern p is decided by the decision boundary assigned to the subpattern. Under a hypercube boundary, a subpattern s with attribute set A is valid for p when every selected attribute lies within radius r:

    |p_i - s_i| <= r  for all i in A.    (1)

Under a hypersphere boundary, s is valid for p when

    sum_{i in A} (p_i - s_i)^2 <= r^2 * k,    (2)

where k is the number of attributes of the subpattern; the factor k scales the hypersphere radius with the subpattern length so that the two boundary shapes cover comparable regions. Because the attributes are standardized, the radius parameter r is measured in standard deviations; Section 4 examines values of r between 1.5 and 2.5.
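In code, the two criteria read as follows. This is a sketch of the criteria as stated above; the subpattern is represented simply by its stored attribute values, aligned with the corresponding attributes of the input:

    def valid_hypercube(sub_vals, x_vals, r):
        """Eq. (1): every selected attribute within radius r."""
        return all(abs(x - s) <= r for s, x in zip(sub_vals, x_vals))

    def valid_hypersphere(sub_vals, x_vals, r):
        """Eq. (2): squared distance within r^2 * k over the k attributes."""
        k = len(sub_vals)
        d2 = sum((x - s) ** 2 for s, x in zip(sub_vals, x_vals))
        return d2 <= r * r * k

    # A 3-attribute subpattern against the matching attributes of an input;
    # the two shapes can disagree on the same pair:
    s, x = [0.1, -0.4, 1.2], [0.8, 0.2, -0.9]
    print(valid_hypercube(s, x, 2.0), valid_hypersphere(s, x, 2.0))  # False True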
3.4. Pattern Classification

A new pattern is classified by the algorithm in Figure 5.

Input: the subpattern set and an m-dimensional input pattern
Output: the class of the input pattern
1. Extract every subpattern that satisfies Eq. (1) or Eq. (2) for the input pattern, according to the decision boundary assigned to that subpattern.
2. Assign to the input pattern the most frequent class among the extracted subpatterns (majority voting).

Figure 5. The pattern classification algorithm

Because the prediction is a vote over many partially overlapping subpatterns rather than a single rule, the model is less prone to overfitting individual training patterns.
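Step 2 of Figure 5 is a plain majority vote; a minimal sketch:

    from collections import Counter

    def vote(valid_classes):
        """Majority vote over the classes of the valid subpatterns.
        Returns None when no subpattern is valid; tie-breaking is not
        specified in the text (Counter keeps first-seen order)."""
        if not valid_classes:
            return None
        return Counter(valid_classes).most_common(1)[0][0]

    print(vote([True, False, True, True]))  # True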
3.5. Fitness of a Subpattern

The fitness of a subpattern s_i is defined by Eq. (3) from four counts: mr and mw, the numbers of patterns in Xw (the currently misclassified set) for which s_i is valid and votes for the right and the wrong class, respectively, and cr and cw, the corresponding counts over Xc (the correctly classified set). The fitness takes a value between 0 and 1 and weights patterns in Xw more heavily, so that subpatterns which help on the patterns the current collection misclassifies are preferred, in the spirit of boosting [21].
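As a concrete stand-in for Eq. (3), whose exact algebraic form is not reproduced here, the following sketch computes a [0, 1] fitness from the four counts; the weight w > 1 on the misclassified set and the normalization are assumptions consistent with the description above:

    def fitness(mr, mw, cr, cw, w=2.0):
        """Weighted fraction of right-class matches; matches on the
        misclassified set Xw count w times as much as matches on Xc."""
        matched = w * (mr + mw) + cr + cw
        if matched == 0:
            return 0.0                    # the subpattern never fires
        return (w * mr + cr) / matched    # in [0, 1]

    print(fitness(mr=3, mw=1, cr=10, cw=2))  # 0.8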
3.6. Updating the Distributions D and B

D governs the lengths of newly generated subpatterns. Short subpatterns are more general: they impose fewer conditions, so each one is valid for more input patterns, and by Occam's razor simpler subpatterns are preferable when they classify equally well. Rather than fixing this trade-off in advance, the model re-estimates D at every iteration from the lengths of the highest-fitness subpatterns, so the collection drifts toward the lengths that actually perform well on the data at hand.
The decision-boundary distribution B is updated in the same way, from the boundary shapes of the highest-fitness subpatterns, so that the mixture of hypercubes and hyperspheres also adapts to the data.
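Concretely, the step-5 update could look like the following sketch (NumPy; the top-k selection and plain histogram normalization are illustrative assumptions, and subpattern lengths are assumed to lie in 1..n_dims):

    import numpy as np

    def update_distributions(subpatterns, fitnesses, n_top, n_dims):
        """Refit D (over lengths 1..n_dims) and B (cube vs. sphere) to the
        empirical distributions of the n_top highest-fitness subpatterns."""
        top = np.argsort(fitnesses)[-n_top:]
        lengths = np.array([len(subpatterns[i]["attrs"]) for i in top])
        D = np.bincount(lengths - 1, minlength=n_dims) / n_top
        B = np.array([np.mean([subpatterns[i]["boundary"] == b for i in top])
                      for b in ("cube", "sphere")])
        return D, B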
3.7. Implementation in Hardware

Because each subpattern is evaluated independently of the others, the model is naturally parallel. It is therefore well suited to massively parallel substrates such as the DNA computing-inspired silicon chip of [22], on which a large collection of subpatterns can be matched against an input pattern simultaneously.
4. Experiments

4.1. Datasets

CVLS was evaluated on four datasets from the UCI Machine Learning Repository [6], summarized in Table 2. In every experiment, learning ran for 30 iterations.

Table 2. Experimental datasets
Name                                        Attributes   Records
SONAR                                       60           208
SPECTF                                      44           training: 80, test: 187
Wisconsin Diagnostic Breast Cancer (WDBC)   31           569
IONOSPHERE                                  34           351

4.2. Compared Classifiers and Protocol

CVLS was compared against classifiers known to perform well across supervised learning tasks [23]: a support vector machine with the Pearson VII kernel [25], k-nearest neighbors (k = 1), AdaBoost with 50 iterations, a random forest with 50 trees, the C4.5 decision tree, and Naive Bayes, all run in Weka 3.5.7 [24]. Each experiment used 10-fold stratified cross-validation, except SPECTF, which comes with a fixed training/test split.
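For readers reproducing the protocol outside Weka, an equivalent stratified 10-fold evaluation in scikit-learn (an assumption; the thesis itself used Weka 3.5.7) looks like this, shown with the 1-NN baseline on scikit-learn's built-in copy of the WDBC data:

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import StratifiedKFold, cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_breast_cancer(return_X_y=True)   # WDBC, as shipped with sklearn
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=1), X, y, cv=cv)
    print(f"mean accuracy: {scores.mean():.4f} +/- {scores.std():.4f}")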
4.3. SONAR

Figure 6 plots classification accuracy on SONAR over the 30 learning iterations.

Figure 6. Change of accuracy during SONAR training (x-axis: iteration, 0-30; y-axis: accuracy, 0.82-0.90)

Table 3 compares test accuracies on SONAR. CVLS achieved the highest mean accuracy among all the compared classifiers.
Table 3. Test accuracy comparison on SONAR
Classifier                      Mean accuracy (%)   Standard deviation
CVLS                            87.7404             1.798869
SVM (Pearson VII kernel [25])   86.29808            1.225735
kNN (k=1)                       86.15385            0.871897
AdaBoost (50 iterations)        84.66347            1.330242
Random Forest (50 trees)        84.18271            1.544634
Decision Tree (C4.5)            73.60576            2.618124
Naive Bayes                     67.69233            1.082175

Figure 7 shows the distribution of subpattern lengths after learning on SONAR, separated by decision-boundary shape. The average length was 15.78 for hypercube subpatterns, 19.52 for hypersphere subpatterns, and 17.99 overall.

Figure 7. Distribution of subpattern lengths after SONAR training (x-axis: subpattern length, 0-60; y-axis: frequency)
Figure 8 shows test accuracy on SONAR as a function of the radius parameter r in the validity criteria (1) and (2) and of the decision-boundary configuration, over four settings: all hypercubes, all hyperspheres, a mixture with equal ratio, and a mixture with a different ratio, for r from 1.5 to 2.5.

Figure 8. Accuracy on SONAR by the validity criterion's r and the decision-boundary shape (y-axis: accuracy, 0.835-0.885)
4.4. SPECTF

Figure 9 plots classification accuracy on SPECTF during learning.

Figure 9. Change of accuracy during SPECTF training (x-axis: iteration, 0-30; y-axis: accuracy, 0.35-0.80)

Table 4 compares test accuracies on SPECTF; because SPECTF uses a single fixed training/test split, no standard deviations are reported. CVLS again achieved the highest accuracy.

Table 4. Test accuracy comparison on SPECTF
Classifier                      Mean accuracy (%)
CVLS                            76.2567
SVM (Pearson VII kernel)        74.3316
AdaBoost (50 iterations)        73.262
Random Forest (50 trees)        71.6578
Naive Bayes                     69.5187
Decision Tree (C4.5)            67.9144
kNN (k=1)                       61.4973
Figure 10 shows the distribution of subpattern lengths after learning on SPECTF. The subpatterns are much shorter than on SONAR, averaging 5.07 for hypercubes, 5.69 for hyperspheres, and 5.31 overall.

Figure 10. Distribution of subpattern lengths after SPECTF training (x-axis: subpattern length, 0-45; y-axis: frequency)

Figure 11 shows the effect of r and the decision-boundary configuration on SPECTF.
Figure 11. Accuracy on SPECTF by the validity criterion's r and the decision-boundary shape (y-axis: accuracy, 69-78%)

4.5. IONOSPHERE

Figure 12 plots classification accuracy on IONOSPHERE during learning; accuracy exceeds 93% within about 5 iterations.

Figure 12. Change of accuracy during IONOSPHERE training (x-axis: iteration, 0-30; y-axis: accuracy, 0.65-1.0)

Table 5 compares test accuracies on IONOSPHERE.
On this dataset CVLS was slightly behind SVM and AdaBoost but remained competitive.

Table 5. Test accuracy comparison on IONOSPHERE
Classifier                      Mean accuracy (%)   Standard deviation
SVM (Pearson VII kernel)        94.47294            0.306258
AdaBoost (50 iterations)        93.84616            0.54056
CVLS                            93.6752             0.612517
Random Forest (50 trees)        93.56126            0.42893
Decision Tree (C4.5)            89.74360            1.433965
kNN (k=1)                       87.09403            0.485167
Naive Bayes                     82.16516            0.48791

Figure 13 shows the distribution of subpattern lengths after learning on IONOSPHERE; the average length was 14.30 for hypercubes, 10.77 for hyperspheres, and 13.36 overall.

Figure 13. Distribution of subpattern lengths after IONOSPHERE training (x-axis: subpattern length, 0-35; y-axis: frequency)

Figure 14 shows the effect of r and the decision-boundary configuration on IONOSPHERE, in the same format as for SPECTF.
Figure 14. Accuracy on IONOSPHERE by the validity criterion's r and the decision-boundary shape (y-axis: accuracy, 0.915-0.945)

4.6. Wisconsin Diagnostic Breast Cancer (WDBC)

Figure 15 plots classification accuracy on WDBC during learning; accuracy exceeds 95% within about 5 iterations.

Figure 15. Change of accuracy during WDBC training (x-axis: iteration, 0-30; y-axis: accuracy, 0.94-0.97)
Table 6 compares test accuracies on WDBC. WDBC is comparatively easy: even Naive Bayes exceeds 92%. As on IONOSPHERE, CVLS trailed the strongest classifiers slightly.

Table 6. Test accuracy comparison on WDBC
Classifier                      Mean accuracy (%)   Standard deviation
SVM (Pearson VII kernel)        97.4165             0.118609
AdaBoost (50 iterations)        96.8717             0.272256
Random Forest (50 trees)        96.36204            0.36160
CVLS                            95.8699             0.372823
kNN (k=1)                       95.67662            0.30101
Decision Tree (C4.5)            93.00528            0.976077
Naive Bayes                     92.7724             0.146670

Figure 16 shows the distribution of subpattern lengths after learning on WDBC; most lengths fall between 10 and 20, with nearly identical averages for the two boundary shapes.
Figure 16. Distribution of subpattern lengths after WDBC training (average length 15.34 for hypercubes, 15.31 for hyperspheres, 15.32 overall; x-axis: subpattern length, 0-30; y-axis: frequency)

Figure 17 shows the effect of r and the decision-boundary configuration on WDBC.

Figure 17. Accuracy on WDBC by the validity criterion's r and the decision-boundary shape (y-axis: accuracy, 0.956-0.9605)
4.7. Discussion

Across the four datasets, CVLS achieved the best accuracy on SONAR and SPECTF and was close behind the best classifiers on IONOSPHERE and WDBC. Notably, SONAR and SPECTF are the datasets on which Naive Bayes stays in the 60% range, suggesting strong attribute interactions; CVLS performs especially well on such problems of relatively high complexity, while every classifier does well on the easier IONOSPHERE and WDBC tasks.
Figure 18 shows how test accuracy varies with the number of subpatterns generated per training pattern on the four datasets.

Figure 18. Test accuracy by the number of subpatterns per training pattern (x-axis: 0-70; y-axis: accuracy, 0.65-1.0; curves for SPECTF, SONAR, IONOSPHERE, and WDBC)
5. Conclusion

5.1. Summary

This thesis proposed a classification model that evolves a collection of variable-length subpatterns. Subpatterns are random attribute subsets of training patterns; they are generated, evaluated by fitness, and selectively replaced, while the distributions governing their lengths and decision-boundary shapes are adapted during learning. Because the decision boundary may combine hypercubes and hyperspheres, the model can represent more varied boundaries than conventional rule-based classifiers, and because many combinations of subpatterns are explored, it is less prone to settling on locally optimized solutions. Experiments on four datasets from the UCI Machine Learning Repository showed accuracy competitive with widely used classifiers.
5.2. Future Work

Work remains on extending CVLS beyond the settings studied here.
References

[1] S. Kim, S.-J. Kim, and B.-T. Zhang, Evolving hypernetwork classifiers for microRNA expression profile analysis, IEEE Congress on Evolutionary Computation (CEC '07), pp. 313-319, 2007.
[2] J.-W. Ha, J.-H. Eom, S.-C. Kim, and B.-T. Zhang, Evolutionary hypernetwork models for aptamer-based cardiovascular disease diagnosis, The Genetic and Evolutionary Computation Conference (GECCO '07), pp. 2709-2716, 2007.
[3] S. Kim, M.-O. Heo, and B.-T. Zhang, Text classifiers evolved on a simulated DNA computer, IEEE Congress on Evolutionary Computation (CEC '06), pp. 9196-9202, 2006.
[4] J.-K. Kim and B.-T. Zhang, Evolving hypernetworks for pattern classification, IEEE Congress on Evolutionary Computation (CEC '07), pp. 1856-1862, 2007.
[5] A. Wojna, Combination of Metric-Based and Rule-Based Classification, Lecture Notes in Artificial Intelligence, vol. 3641, pp. 501-511, 2005.
[6] A. Asuncion and D.J. Newman, UCI Machine Learning Repository [http://www.ics.uci.edu/~mlearn/mlrepository.html], Irvine, CA: University of California, School of Information and Computer Science, 2007.
[7] J. Hong, I. Mozetic, and R. S. Michalski, AQ15: Incremental learning of attribute-based descriptions from examples, the method and user's guide, Report ISG 85-5, UIUCDCS-F-86-949, Dept. of Computer Science, University of Illinois at Urbana-Champaign, 1986.
[8] P. Clark and T. Niblett, The CN2 induction algorithm, Machine Learning, 3:261-283, 1989.
[9] W. Cohen, Fast effective rule induction, Proc. 1995 International Conference on Machine Learning (ICML '95), pp. 115-123, Tahoe City, CA, 1995.
[10] J. R. Quinlan and R. M. Cameron-Jones, FOIL: A midterm report, Proc. 1993 European Conference on Machine Learning, pp. 3-20, Vienna, Austria, 1993.
[11] B. Liu, W. Hsu, and Y. Ma, Integrating Classification and Association Rule Mining, Proc. 1998 International Conference on Knowledge Discovery and Data Mining (KDD '98), pp. 80-86, New York, NY, 1998.
[12] W. Li, J. Han, and J. Pei, CMAR: Accurate and Efficient Classification Based on Multiple Class-Association Rules, Proc. 2001 International Conference on Data Mining (ICDM '01), pp. 369-376, San Jose, CA, 2001.
[13] X. Yin and J. Han, CPAR: Classification based on Predictive Association Rules, Proc. 2003 SIAM International Conference on Data Mining (SDM '03), pp. 331-335, San Francisco, CA, 2003.
[14] F. Thabtah, P. Cowling, and Y. Peng, MCAR: Multi-class Classification based on Association Rule, Proc. 3rd ACS/IEEE International Conference on Computer Systems and Applications, pp. 33-39, Cairo, Egypt, 2005.
[15] R. Agrawal and R. Srikant, Fast algorithms for mining association rules in large databases, Proc. 20th International Conference on Very Large Databases, pp. 487-499, Santiago, Chile, 1994.
[16] J. Han, J. Pei, and Y. Yin, Mining frequent patterns without candidate generation, Proc. 2000 ACM-SIGMOD International Conference on Management of Data (SIGMOD '00), pp. 1-12, Dallas, TX, 2000.
[17] J. R. Quinlan, Induction of decision trees, Machine Learning, 1:81-106, 1986.
[18] J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993.
[19] L. Breiman, J. Friedman, R. Olshen, and C. Stone, Classification and Regression Trees, Wadsworth International Group, 1984.
[20] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2006.
[21] Y. Freund and R. E. Schapire, A Short Introduction to Boosting, Journal of Japanese Society for Artificial Intelligence, 14(5):771-780, 1999.
[22] J.-K. Kim, B.S. Kim, O.H. Kwon, S.K. Hwang, J.-W. Ha, C.-H. Park, D.J. Chung, C.H. Lee, J. Park, and B.-T. Zhang, A DNA computing-inspired silicon chip for pattern recognition, Preliminary Proceedings of the 13th International Meeting on DNA Computing (DNA13), p. 373, 2007.
[23] R. Caruana and A. Niculescu-Mizil, An empirical comparison of supervised learning algorithms, Proc. 23rd International Conference on Machine Learning (ICML '06), pp. 161-168, 2006.
[24] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, 2nd Edition, Morgan Kaufmann, San Francisco, 2005.
[25] B. Uestuen, W.J. Melssen, and L.M.C. Buydens, Facilitating the application of Support Vector Regression by using a universal Pearson VII function based kernel, Chemometrics and Intelligent Laboratory Systems, vol. 81, pp. 29-40, 2006.
ABSTRACT

Joo-Kyung Kim
School of Computer Science and Engineering
The Graduate School
Seoul National University

Rules generated by rule-based classifiers are easy to interpret, and rules with low relevance to classification can be controlled directly. However, most current rule-based classifiers find only locally optimized solutions, and their decision boundaries are represented only as sets of hypercubes parallel to the axes of the input space. This thesis introduces a subpattern-based classification model to overcome these limits. A subpattern is a subset of randomly selected attributes of a training pattern. After being generated randomly, subpatterns either remain in the collection or are replaced according to their fitness values. Because many combinations of subpatterns are tested in this process, the resulting solutions can be better than locally optimized ones. Moreover, the proposed model allows both hypercubes and hyperspheres as decision-boundary shapes, and can therefore represent more varied decision boundaries. On UCI datasets the proposed model demonstrated competitive accuracy, performing especially well on problems of relatively high complexity.

Keywords: rule-based classification, decision boundary, subpattern, random sampling
Student Number: 2006-21175