16 양동일(287~297).hwp


A Method for Recognizing Korean Abnormal Language for Spam Mail Filtering

Vol. 15, No. 2, April 2011

Hee-Kook Ahn*, Uk-Pyo Han*, Seung-Ho Shin*, Dong-Il Yang**, and Hee-Young Roh*
(안희국*, 한욱표*, 신승호*, 양동일**, 노희영*)

Abstract

As electronic mail has become widely used for its convenience and speed of communication, the volume of malicious and advertising spam has grown, causing many social and economic problems. A number of approaches have been proposed to alleviate the impact of spam. These approaches can be categorized into pre-acceptance and post-acceptance methods. Post-acceptance methods include Bayesian filters, collaborative filtering, and e-mail prioritization, all of which operate on words or sentences. Spammers, however, alter those features to evade filtering systems. In Korean, abnormal usages can be far more varied than in other languages because each syllable is composed of a chosung (initial consonant), a jungsung (vowel), and a jongsung (final consonant). Existing regular expressions and learning algorithms are limited in how promptly and efficiently they can respond to such changes. We therefore present a method for recognizing Korean abnormal language (Koral) to improve the accuracy and efficiency of filtering systems. The method operates on syllabic units rather than words and is based on the Smith-Waterman algorithm. In experiments on filter keywords and e-mails extracted from a mail server, we confirmed that Koral is recognized accurately according to the similarity level, and that the required time and space costs are within acceptable limits.

Key words: Spam Mail Filtering, Korean Abnormal Language, Smith-Waterman Algorithm, Keyword Similarity

Manuscript dates: 2011-03-23; 2011-03-24 (2011-04-23); 2011-04-30

I. Introduction

[...] 2003 [...] 2009 [...] 2.16 [...] 2008 [...] 2.12 [...] 0.04 [...] [1]. [...] (spam) [...] (nonspam) [2]. [...] (pre-acceptance) [...] (post-acceptance) [3]. [...] (Domain, IP, E-mail address) [...] [4-8], [...] (Support Vector Machine) [...] [9, 10]. [...] 97% [11], [...] (false positive) 15% [...]. SVM [...] 98.9% [12]. [...] (ISP) [...]. [...] (Dynamic Programming Algorithm, DPA) [...] DPA [...].

[...] Section II [...] Smith-Waterman [...], Section III [...], Section IV [...], Section V [...], Section VI [...].

II. Related Work

2-1 Dynamic Programming Algorithm (DPA)

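[...] 1970 [...] Needleman & Wunsch [...] DPA [...] divide & conquer [...] [13, 14]. [...] two strings S and T over an alphabet (a, b, c, ..., z) [...] a scoring function σ [...].

1. For two characters x and y, σ(x, y) is the score of aligning x with y, as given by the scoring function σ.

For example, let a match score +2 and a mismatch or a space score -1 (penalty). For the characters a and c:

\[
\sigma(a, a) = \sigma(c, c) = +2, \qquad \sigma(a, c) = \sigma(a, -) = \sigma(-, c) = -1
\]

In Fig. 1, the alignment of x = acbcdb and y = cadbd scores 6 - 5 = 1.

Fig. 1. An example of sequence alignment.

[...] S [...] T [...]. V(i, j) is the value of the optimal alignment of the prefixes S[1]...S[i] and T[1]...T[j], so the value of the optimal alignment of S and T is V(n, m). The recurrences below write this value as A(i, j). Aligning the first i characters of S with zero characters of T [...] gives A(i, 0).

Base conditions:

\[
A(0,0) = 0, \qquad
A(i,0) = A(i-1,0) + \sigma(S[i], -) \ \ (i > 0), \qquad
A(0,j) = A(0,j-1) + \sigma(-, T[j]) \ \ (j > 0)
\]

Recurrence (0 < i, 0 < j):

\[
A(i,j) = \max
\begin{cases}
A(i-1,j-1) + \sigma(S[i], T[j]) \\
A(i-1,j) + \sigma(S[i], -) \\
A(i,j-1) + \sigma(-, T[j])
\end{cases}
\]

[...] S = BASEBALL, T = BASKETBALL [...] Fig. 2.

2. [...] S [...] T [...] (optimal alignment) [...] strings. [...] DPA [...] DPA [...] Needleman-Wunsch [...].

Fig. 2. S, T matrix of DPA.

[...] n, m [...] Fig. 3. [...] |S| = n, |T| = m [...] strings S, T [...]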
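The base conditions and recurrence above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation (the paper reports using Microsoft C++). The names `sigma` and `global_alignment_score` are hypothetical, and the scores (match +2, mismatch and space -1) are the example values quoted in this section.

```python
def sigma(x, y, match=2, penalty=-1):
    """Scoring function: +2 for a match, -1 for a mismatch or a space ('-')."""
    if x == "-" or y == "-":
        return penalty
    return match if x == y else penalty


def global_alignment_score(s, t):
    """Fill the DPA matrix A and return A(n, m), the global alignment score."""
    n, m = len(s), len(t)
    a = [[0] * (m + 1) for _ in range(n + 1)]
    # Base conditions: a prefix aligned against nothing costs one space each.
    for i in range(1, n + 1):
        a[i][0] = a[i - 1][0] + sigma(s[i - 1], "-")
    for j in range(1, m + 1):
        a[0][j] = a[0][j - 1] + sigma("-", t[j - 1])
    # Recurrence for 0 < i, 0 < j.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a[i][j] = max(
                a[i - 1][j - 1] + sigma(s[i - 1], t[j - 1]),
                a[i - 1][j] + sigma(s[i - 1], "-"),
                a[i][j - 1] + sigma("-", t[j - 1]),
            )
    return a[n][m]


print(global_alignment_score("BASEBALL", "BASKETBALL"))  # 14
```

For BASEBALL and BASKETBALL the score is 14: all eight characters of BASEBALL match in order (8 x 2 = 16), and the two extra characters K and T each cost one space (-2).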

Fig. 3. An optimal search of DPA.

Fig. 4. Result of optimal alignment with DPA.

2-2 Smith-Waterman algorithm

[...] DPA [...] m [...] n [...] Smith-Waterman [15]. [...] 0 [...] 0 [...].

Base conditions:

\[
A(i,0) = 0, \qquad A(0,j) = 0
\]

Recurrence (0 < i, 0 < j):

\[
A(i,j) = \max
\begin{cases}
0 \\
A(i-1,j-1) + \sigma(S[i], T[j]) \\
A(i-1,j) + \sigma(S[i], -) \\
A(i,j-1) + \sigma(-, T[j])
\end{cases}
\]

[...] Fig. 4 [...] S = abcxdex, T = xxxcde, match: +2, mismatch: -1, space: -1 [...]. [...] Fig. 5 [...] A(6,6) = 5 [...] 0 [...].

Fig. 5. Recovery process of the Smith-Waterman algorithm.

In Fig. 6, the local alignment contains 3 matches and 1 penalized position, so its score is 3(+2) + 1(-1) = 5.

Fig. 6. Partial alignment of the Smith-Waterman algorithm.

III. Analysis of Spam Mail Types

[...]

3-1 Analysis of Spam Mail Types

[...] 2009 [...] 459,885
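The Smith-Waterman recurrence and its traceback can be sketched as follows, using the example values quoted above (S = abcxdex, T = xxxcde, match +2, mismatch and space -1). The function name `smith_waterman` and the tie-breaking order in the traceback (diagonal first, then a space in T, then a space in S) are assumptions made for illustration and are not taken from the paper.

```python
def smith_waterman(s, t, match=2, penalty=-1):
    """Return (best score, (i, j) of the best cell, aligned s, aligned t)."""
    n, m = len(s), len(t)
    a = [[0] * (m + 1) for _ in range(n + 1)]  # A(i,0) = A(0,j) = 0
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = a[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else penalty)
            a[i][j] = max(0, diag, a[i - 1][j] + penalty, a[i][j - 1] + penalty)
            if a[i][j] > best:
                best, best_pos = a[i][j], (i, j)
    # Trace back from the best cell until a cell holding 0 is reached.
    i, j = best_pos
    out_s, out_t = [], []
    while i > 0 and j > 0 and a[i][j] > 0:
        diag = a[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else penalty)
        if a[i][j] == diag:
            out_s.append(s[i - 1])
            out_t.append(t[j - 1])
            i, j = i - 1, j - 1
        elif a[i][j] == a[i - 1][j] + penalty:
            out_s.append(s[i - 1])
            out_t.append("-")
            i -= 1
        else:
            out_s.append("-")
            out_t.append(t[j - 1])
            j -= 1
    return best, best_pos, "".join(reversed(out_s)), "".join(reversed(out_t))


print(smith_waterman("abcxdex", "xxxcde"))  # (5, (6, 6), 'cxde', 'c-de')
```

The best cell is A(6,6) = 5, and the recovered local alignment cxde / c-de has three matches and one space, which agrees with 3(+2) + 1(-1) = 5.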

[...] 44,678 [...] 9.5% [...] 89% [...] 419,126 [...] 1.2% [...] 6,081 [...]. [...] 40%, [...] 20%, [...] 15%, [...] 5% [...]. [...] 5 [...]: 1 [...]; 2 [...] HTML [...]; 3 [...] URL [...]; 4 [...]; 5 [...]. [...] URL [...] HTML [...] meta-characters [...].

Fig. 7. Examples of Korean spam mail and response methods.

[...]

Fig. 8. Cases that ordinary regular expressions cannot handle.

[...] 3 [...]

3-2 Pattern Analysis of Korean Abnormal Language

[...]

girl → gir1, drag → dr@g

Fig. 9. An example of abnormal English.

[...] B, ㅂ1 [...]

Fig. 10. An example of abnormal Korean.

[...]

1. [...] Example) [...]: ㄷ H [...]
2. [...] Example) [...]
3. [...] Example) [...]
4. [...] Example) [...]

[...] 1 [...]. 2 [...]. 3 [...]. 4 [...].

Fig. 11. Characteristics of abnormal Korean.

IV. Koral (Korean Abnormal Language) Recognition Module

[...] Smith-Waterman [...] Koral [...].

4-1 Characteristics of DPA for Recognizing Korean Abnormal Language

[...]

Fig. 12. Analysis of DPA characteristics I.

[...] match: +2, mismatch: -1, space: -1 [...] "ㄷ H [...]" [...] 7 [...] +2 [...] 10 [...] -5 [...]. [...] Eq. (1) [...] similarity (sim) [...] 7/10 = 0.7. [...] 0.875. [...] 0.555. [...] 15 [...] 0.437 [...].

Example) [...] & [...]: 0.85; [...] & ㄷH [...]: 0.85

[...] Needleman-Wunsch [...] Smith-Waterman [...] 0.954 [...].

4-2 Details of the Algorithm

[...]

Fig. 13. Analysis of DPA characteristics II.

[...] Fig. 13 [...]
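The 7/10 = 0.7 figure above can be reproduced under two assumptions that are not legible in this section: that text is compared jamo by jamo (chosung, jungsung, jongsung), and that similarity is the local alignment score divided by the highest score the keyword could reach (match score times keyword length in jamo). The sketch below uses the keyword 대출 ("loan", the keyword named in Fig. 15) and the variant ㄷH출. The names `to_jamo`, `local_score`, and `similarity` are hypothetical.

```python
CHO = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"              # 19 initial consonants
JUNG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"         # 21 vowels
JONG = "ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ"  # 27 finals


def to_jamo(text):
    """Split precomposed Hangul syllables into jamo; keep other characters."""
    out = []
    for ch in text:
        code = ord(ch) - 0xAC00
        if 0 <= code < 11172:  # Hangul Syllables block, U+AC00..U+D7A3
            cho, rest = divmod(code, 21 * 28)
            jung, jong = divmod(rest, 28)
            out.append(CHO[cho])
            out.append(JUNG[jung])
            if jong:
                out.append(JONG[jong - 1])
        else:
            out.append(ch)
    return out


def local_score(a_seq, b_seq, match=2, penalty=-1):
    """Best Smith-Waterman (local alignment) score of two sequences."""
    prev = [0] * (len(b_seq) + 1)
    best = 0
    for x in a_seq:
        cur = [0]
        for j, y in enumerate(b_seq, start=1):
            diag = prev[j - 1] + (match if x == y else penalty)
            cur.append(max(0, diag, prev[j] + penalty, cur[j - 1] + penalty))
            best = max(best, cur[j])
        prev = cur
    return best


def similarity(keyword, text, match=2):
    """Assumed normalization: local score / (match * keyword length in jamo)."""
    key = to_jamo(keyword)
    return local_score(key, to_jamo(text)) / (match * len(key))


print(to_jamo("대출"))              # ['ㄷ', 'ㅐ', 'ㅊ', 'ㅜ', 'ㄹ']
print(similarity("대출", "ㄷH출"))  # 0.7
```

The keyword decomposes into five jamo, so its maximum score is 10. The variant matches four of them and mismatches one (H against ㅐ), giving 4(+2) + 1(-1) = 7 and a similarity of 0.7.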

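[...] Koral [...].

Fig. 14. Structure of spam mail filtering system.

[...] Fig. 14 [...] Koral [...]. [...] Smith-Waterman [...] 3.2 [...]. [...] Koral [...]:

1. [...]
2. [...]
3. [...] 6 [...]
4. [...]
5. [...] spam [...] nonspam [...]

V. Experiments

[...] Koral [...].

5-1 Experimental Environment

[...] Intel Pentium Processor 1.73 GHz, 1 GB RAM, Windows XP Professional [...] Microsoft C++ [...]. [...] 6 [...] 3019 [...] 50 [...] Koral [...]. [...] subject [...] body [...]. [...] DPA [...] 100, 204, 393, 807, 1875, 3794 [...]. [...] 2081 [...] 44192 [...] 28 [...]. [...] subject [...] body [...] 1:1 [...] 28 [...] 100, 204, 393, 807, 1875, 3794 [...]. match: +2, mismatch: -1, space: -1.

5-2 Experimental Results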

Fig. 15. Similarity values for "대출" (loan).

[...] 1 [...] 60% [...]. [...] 0.7 [...] i [...]. [...] ^*, ^, ^^, ㄷ H [...]. [...] 49 [...].

Fig. 16. Required time by mail size.

Fig. 17. Required space by mail size.

[...] Fig. 17 [...] 2 bytes [...]. [...] 6,493 [...] 363 KB [...].

VI. Conclusion

[...] Smith-Waterman [...] A [...] subject [...] body [...] A [...].
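The space figure quoted above is consistent with a full alignment matrix of 2-byte cells: 28 x 6,493 x 2 = 363,608 bytes, or about 363 KB. The sketch below only checks that arithmetic. Reading 28 as the keyword length in comparison units and 6,493 as the mail length is an assumption, since the surrounding sentences are not legible, and `matrix_bytes` is a hypothetical helper name.

```python
def matrix_bytes(keyword_len, mail_len, cell_bytes=2):
    """Bytes needed for a keyword_len x mail_len matrix of fixed-size cells."""
    return keyword_len * mail_len * cell_bytes


print(matrix_bytes(28, 6493))  # 363608 bytes, roughly 363 KB
```

Under this reading the space cost grows linearly with mail size for a fixed keyword, which matches the axis named in the Fig. 17 caption.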

References

[1] [...], "2010 [...] (National Informatization Protection White Paper)," pp. 107-109, 2010.
[2] [...], 13 2, 12, 2004.
[3] L. H. Gomes and C. Cazita, "Characterizing a Spam Traffic," in Proc. 2004 Internet Measurement Conference, Taormina, Sicily, Italy, Oct. 2004.
[4] V. Keselj, E. Milios, A. Tuttle, S. Wang, and R. Zhang, "TREC 2005 Spam Track: Spam Filtering Using N-gram-based Techniques," Proceedings of Text REtrieval Conference, 2005.
[5] [...], 31 8, pp. 1092-1100, 2004.
[6] R. Segal, "IBM SpamGuru on the TREC 2005 Spam Track," Proceedings of Text REtrieval Conference, 2005.
[7] A. Bratko and B. Filipic, "Spam Filtering Using Character-Level Markov Models: Experiments for the TREC 2005 Spam Track," Proceedings of Text REtrieval Conference, 2005.
[8] L. A. Breyer, "DBACL at the TREC 2005," Proceedings of Text REtrieval Conference, 2005.
[9] http://www.csie.ntu.edu.tw/~cjlin/libsvm
[10] [...], [...] URL [...], [...] B, 15-B 1, pp. 61-68, 2008.
[11] F. Zhou, L. Zhuang, B. Zhao, L. Huang, A. Joseph, and J. Kubiatowicz, "Approximate object location and spam filtering on peer-to-peer systems," in Proc. Middleware, Rio de Janeiro, Brazil, June 2003.
[12] [...], [...] B, 17-B 3, pp. 249-254, 2010.
[13] S. B. Needleman and C. D. Wunsch, "A general method applicable to the search for similarities in the amino acid sequences of two proteins," Journal of Molecular Biology, vol. 48, pp. 443-453, 1970.
[14] R. A. Wagner and M. J. Fischer, "The string-to-string correction problem," J. ACM, vol. 21, pp. 168-173, Jan. 1974.
[15] T. F. Smith and M. S. Waterman, "Identification of common molecular subsequences," Journal of Molecular Biology, vol. 147(1), pp. 195-197, Mar. 1981.

Hee-Kook Ahn (안희국, 安熙國): 2003-02 [...]; 2007-02 [...]; 2011 [...].
Uk-Pyo Han (한욱표, 韓旭彪): 1993-02 [...]; 1996-02 [...]; 2003 [...]; 2007-08 [...]; 2011 [...].
Seung-Ho Shin (신승호, 辛承浩): 1998 [...]; 2000 [...]; 2001 [...]; 2005 [...]; 2011 [...].
Dong-Il Yang (양동일, 梁東一): 2004-02 [...]; 2007-08 [...]; 2011 [...].

Hee-Young Roh (노희영, 盧熙瑩): 1972 [...]; 1978 [...] (Vordiploma); 1982 [...] (Diploma); 1983 [...]; 1984-02 ~ [...].