(Special Paper) JBE Vol. 24, No. 1, January 2019
https://doi.org/10.5909/jbe.2019.24.1.58
ISSN 2287-9137 (Online), ISSN 1226-7953 (Print)

A Study on Named Entity Recognition for Effective Dialogue Information Prediction

Myunghyun Go a), Hakdong Kim a), Heonyeong Lim a), Yurim Lee b), Minkyu Jee c), and Wonil Kim d)

Abstract

Recognition of named entities, such as proper nouns in conversational sentences, is the most fundamental and important field of study for efficient dialogue information prediction. The most important part of a task-oriented dialogue system is recognizing which attributes an object mentioned in a conversation has. The named entity recognition model recognizes named entities through preprocessing, word embedding, and prediction steps applied to the dialogue sentence. This study aims at using a user-defined dictionary in the preprocessing stage and finding optimal parameters in the word embedding stage for efficient dialogue information prediction. In order to test the designed named entity recognition model, we selected the field of daily chemical products and constructed a named entity recognition model that can be applied to a task-oriented dialogue system in the related domain.

Keywords: Task-Oriented Dialogue System, Word Embedding, NER (Named Entity Recognition), Bi-LSTM

a) Department of Digital Contents, Sejong University
b) Department of Artificial Intelligence and Linguistic Engineering, Sejong University
c) Department of Software Convergence, Sejong University
d) Department of Software, Sejong University

Corresponding Author: Wonil Kim, E-mail: wikim@sejong.ac.kr, Tel: +82-3408-3795, ORCID: https://orcid.org/0000-0002-1489-8427
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (2015R1D1A1A01060693).
Manuscript received November 15, 2018; Revised December 31, 2018; Accepted December 31, 2018.
Copyright 2016 Korean Institute of Broadcast and Media Engineers. All rights reserved. This is an Open-Access article distributed under the terms of the Creative Commons BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited and not altered.
1. Task-Oriented Dialogue System

A task-oriented dialogue system is a dialogue system designed to accomplish a specific task through conversation with the user, and is often implemented in the form of a chatbot [1]. The most important part of such a system is recognizing which attributes an object mentioned in the conversation has, which makes named entity recognition fundamental to dialogue information prediction.

2. Named Entity Recognition

Named entity recognition identifies proper nouns and other domain-specific expressions in text. Research in this area began in earnest with MUC-6 (Message Understanding Conference) in 1995. Named entities are typically annotated with the BIO tagging scheme, in which B (Beginning) marks the first token of an entity, I (Inside) marks the remaining tokens of that entity, and O (Outside) marks tokens that belong to no entity [2]. Recent work applies character-based LSTM-CRF models [3] and Bi-LSTM models using word vectors with Korean-specific features [4].

3. LSTM

The Long Short-Term Memory (LSTM) network [5] is a variant of the Recurrent Neural Network (RNN) [6]. By introducing gated memory cells, the LSTM mitigates the long-term dependency problem that limits the plain RNN on long sequences.
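The BIO scheme described in Section 2 can be sketched in a few lines of Python. The tokens, spans, and the helper name `bio_tags` below are illustrative, with tag abbreviations (PS for PERSON, LC for LOCATION) taken from the tag sets discussed later:

```python
def bio_tags(tokens, entities):
    """Assign BIO labels to a token list.

    entities: list of (start, end_exclusive, tag) spans over token indices.
    """
    labels = ["O"] * len(tokens)           # O (Outside): not part of any entity
    for start, end, tag in entities:
        labels[start] = "B-" + tag         # B (Beginning): first token of the entity
        for i in range(start + 1, end):
            labels[i] = "I-" + tag         # I (Inside): remaining entity tokens
    return labels

# Hypothetical tokenized sentence with a PERSON (PS) and a LOCATION (LC) span
tokens = ["John", "visited", "Seoul", "station"]
print(bio_tags(tokens, [(0, 1, "PS"), (2, 4, "LC")]))
# ['B-PS', 'O', 'B-LC', 'I-LC']
```
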
However, the LSTM still reads the sequence in a single direction, so each prediction can use only the preceding context. The Bidirectional LSTM (Bi-LSTM) [7][8] combines a forward LSTM and a backward LSTM so that both the left and the right context contribute to each prediction. Figure 1 illustrates the model: in the forward direction, the state at time step t for word W2 has read the preceding words W0 and W1; in the backward direction, it has read the following words W4 and W3. The two hidden states are combined to predict the output at each position.

Fig. 1. Introduction of Bi-LSTM Model

1. Named Entity Tag Set

The TTA (Telecommunications Technology Association) standard for named entity recognition [9] defines a tag set of 15 categories. For the daily chemical products domain targeted in this study, we use a modified tag set; Table 1 compares the TTA standard tag set with the proposed one.

Table 1. Comparison Between TTA Standard and Proposal Tag Set

  TTA Standard Tag Set          Proposal Tag Set
  Meaning           Tag         Meaning           Tag
  PERSON            PS          PERSON            PS
  LOCATION          LC          LOCATION          LC
  ORGANIZATION      OG          ORGANIZATION      OG
  ARTIFACTS         AF          -                 -
  DATE              DT          DATE              DT
  TIME              TI          TIME              TI
  -                 -           BRAND             BR
  CIVILIZATION      CV          MODEL NAME        MN
  ANIMAL            AM          PRODUCT TYPE      PT
  PLANT             PT          SHAPE TYPE        ST
  QUANTITY          QT          QUANTITY          QT
  STUDY_FIELD       FD          MATERIAL          MT
  THEORY            TR          INFLOW ROUTE      IR
  EVENT             EV          -                 -
  MATERIAL          MT          -                 -
  TERM              TM          -                 -

2. Named Entity Recognition Model

2.1 Preprocessing
Fig. 2. Character Level Convolutional Neural Network

In the preprocessing step, the input sentence is tokenized into morphemes, each annotated with a part-of-speech tag, e.g. (…, 'NNP'), (…, 'NNP'), (…, 'XSN') for proper nouns and a noun-derivational suffix. A user-defined dictionary of domain terms is applied at this stage so that domain-specific names are tokenized as intended.

2.2 Word Embedding

The word embedding step converts each token into a vector. Word-level methods include one-hot encoding, Word2Vec [10], and GloVe [11]. In addition, a character-level convolutional neural network (Char-CNN) [12] builds a representation of each word from its characters, as shown in Fig. 2: the character embeddings of a word are stacked into a 2D matrix, convolved with filters of several sizes, reduced by max-pooling, and regularized with dropout.

2.3 Neural Network Model

The prediction step uses an LSTM-based network, specifically a Bi-LSTM. Figure 3 shows the Bi-LSTM model used in this study. The Bi-LSTM output at each position is passed through a fully connected layer to produce the tag scores; a CRF (Conditional Random Field) [13] can be applied on top to model tag-transition constraints.
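The forward/backward reading behind the Bi-LSTM layer can be illustrated with a toy pass in plain Python. This is not an actual LSTM; it only records which neighbors each direction has read so far, mirroring the W0/W1 (forward) and W4/W3 (backward) context around W2 described in the introduction:

```python
def bidirectional_states(seq):
    """Toy bidirectional pass: each position sees its left and right context.

    A real Bi-LSTM replaces these context lists with learned hidden states;
    here we simply collect the tokens each direction has read before step t.
    """
    n = len(seq)
    forward = []                        # forward[t]: tokens read left-to-right before t
    ctx = []
    for tok in seq:
        forward.append(list(ctx))
        ctx.append(tok)
    backward = [None] * n               # backward[t]: tokens read right-to-left before t
    ctx = []
    for t in range(n - 1, -1, -1):
        backward[t] = list(ctx)
        ctx.append(seq[t])
    # Concatenating both directions gives each position full sentence context
    return [(f, b) for f, b in zip(forward, backward)]

states = bidirectional_states(["W0", "W1", "W2", "W3", "W4"])
# At t = 2, the forward pass has seen W0, W1 and the backward pass W4, W3.
print(states[2])  # (['W0', 'W1'], ['W4', 'W3'])
```
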
Fig. 3. Bi-LSTM Neural Network Model

4. Experiment

The experiment used KoNLPy's Komoran analyzer [14] for morphological analysis, gensim word2vec [15] and GloVe [16] for word embedding, and Google's TensorFlow [17] for the neural network. Figure 4 shows the architecture of the named entity recognition model.

Fig. 4. Named Entity Recognition Model Architecture

The dialogue corpus for the target domain consisted of 1,500 sentences annotated in the BIO format, with 937 … . The embedding corpus contained 40,000 … . Word2Vec was trained with the Skip-gram model (parameter settings of 6 and 20). GloVe was trained on the same corpus as Word2Vec. The resulting vocabularies contained 50,749 words for Word2Vec and 125,706 words for GloVe.
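The Skip-gram model mentioned above learns embeddings by predicting the words surrounding each center word within a context window. A minimal sketch of how the (center, context) training pairs are generated; the sentence and the window size here are hypothetical examples, not the paper's settings:

```python
def skipgram_pairs(tokens, window=2):
    """Generate (center, context) training pairs as in the Skip-gram model."""
    pairs = []
    for i, center in enumerate(tokens):
        # Context = up to `window` tokens on each side of the center word
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

print(skipgram_pairs(["buy", "mild", "shampoo"], window=2))
# [('buy', 'mild'), ('buy', 'shampoo'), ('mild', 'buy'),
#  ('mild', 'shampoo'), ('shampoo', 'buy'), ('shampoo', 'mild')]
```

In the actual model these pairs are fed to a shallow network whose hidden weights become the word vectors; tools such as gensim perform this step internally.
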
All configurations were trained under the same conditions (20 …, 128 …, and a Bi-LSTM layer with 600 …). Table 2 reports the accuracy and F1 score of every configuration. The best result, 96.427% accuracy with an F1 score of 83.431, was obtained using the user dictionary with Word2Vec embedding, a word dimension of 150, Filter A, and a character dimension of 100. Tables 3 and 4 summarize the results by user dictionary and embedding model, and by embedding model and filter shape, respectively. Applying the user-defined dictionary improves the average F1 score by roughly 18 points (80.319 vs. 62.158).

Table 2. Proposed Model Experiment Result (Accuracy / F1 score)
(Filter A: sizes {2, 3}; Filter B: sizes {2, 3, 4, 5}; Filter C: sizes {2, 4, 6, 8}; each filter evaluated at character dimensions 100 and 150)

  User dict  Embedding  Word dim  Filter A (100)  Filter A (150)  Filter B (100)  Filter B (150)  Filter C (100)  Filter C (150)
  use        Word2Vec   100       95.961/81.864   95.132/78.762   95.536/80.249   95.541/80.350   95.718/80.880   95.528/80.297
  use        Word2Vec   150       96.427/83.431   95.154/79.512   95.548/80.541   95.397/79.747   95.761/80.317   95.504/80.528
  use        Glove      100       95.045/76.904   96.098/82.034   95.387/80.341   95.268/80.214   95.509/80.050   95.251/80.192
  use        Glove      150       95.467/80.150   95.169/79.369   95.476/80.744   95.310/80.019   95.667/80.763   95.395/80.400
  not use    Word2Vec   100       93.083/65.513   92.160/61.414   92.383/61.300   92.610/61.518   92.346/63.154   92.995/61.124
  not use    Word2Vec   150       93.051/63.358   91.782/58.732   92.888/60.797   92.616/61.130   92.955/63.169   92.655/62.648
  not use    Glove      100       92.561/63.424   92.888/63.599   92.598/63.566   92.675/62.827   92.282/60.491   92.437/61.862
  not use    Glove      150       92.818/63.578   92.589/60.729   92.999/63.331   92.608/61.964   92.599/63.071   91.844/59.497

Table 3. Summary of Experiment Result (user dictionary - embedding model)

  User dictionary  Embedding model  Accuracy average  F1 score average
  use              Word2Vec         95.601            80.540
  use              Glove            95.420            80.098
  use              total            95.510            80.319
  not use          Word2Vec         92.627            61.988
  not use          Glove            92.575            62.328
  not use          total            92.601            62.158

Table 4. Summary of Experiment Result (embedding model - filter shape)

  Embedding model  Filter shape  Accuracy average  F1 score average
  Word2Vec         Filter A      94.094            71.573
  Word2Vec         Filter B      94.065            70.704
  Word2Vec         Filter C      94.183            71.515
  Word2Vec         total         94.114            71.264
  Glove            Filter A      94.079            71.223
  Glove            Filter B      94.040            71.626
  Glove            Filter C      93.873            70.791
  Glove            total         93.998            71.213
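The accuracy and F1 scores in Tables 2-4 compare predicted tag sequences against the gold annotation. A minimal sketch of token-level accuracy and entity-level F1 under the BIO scheme; the sequences are hypothetical, and the paper's exact evaluation script may differ:

```python
def extract_entities(labels):
    """Collect (start, end_exclusive, tag) spans from a BIO label sequence."""
    spans, start, tag = [], None, None
    for i, lab in enumerate(labels + ["O"]):      # sentinel flushes the last span
        if lab.startswith("B-") or lab == "O":
            if start is not None:                 # close the span in progress
                spans.append((start, i, tag))
                start, tag = None, None
            if lab.startswith("B-"):              # open a new span
                start, tag = i, lab[2:]
        # an "I-" label simply continues the current span
    return spans

def accuracy_f1(gold, pred):
    """Token-level accuracy and entity-level F1 (exact span + tag match)."""
    acc = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    g, p = set(extract_entities(gold)), set(extract_entities(pred))
    tp = len(g & p)                               # entities predicted exactly right
    prec = tp / len(p) if p else 0.0
    rec = tp / len(g) if g else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return acc, f1

gold = ["B-PS", "O", "B-LC", "I-LC"]
pred = ["B-PS", "O", "B-LC", "O"]                 # LC span truncated: partial credit
print(accuracy_f1(gold, pred))  # (0.75, 0.5)
```

Entity-level F1 is stricter than token accuracy, which is why the F1 columns in Table 2 vary far more than the accuracy columns.
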
5. Conclusion

This study designed a named entity recognition model for task-oriented dialogue systems in the daily chemical products domain and examined the effect of a user-defined dictionary in the preprocessing stage and of embedding parameters on recognition performance. Comparing Word2Vec and GloVe (Table 4), the two embedding models performed similarly, while applying the user-defined dictionary improved both accuracy and F1 score markedly. As future work, we plan to extend the Bi-LSTM model with a CRF layer.

(References)
[1] J. Huang, O. Kwon, K. Lee, and Y. Kim, "A Chatter Bot for a Task-Oriented Dialogue System," KIPS Transactions on Software and Data Engineering, Vol. 6, No. 11, pp. 499-506, Nov. 2017.
[2] D. Nadeau and S. Sekine, "A survey of named entity recognition and classification," Lingvisticae Investigationes, Vol. 30, No. 1, pp. 3-26, Jan. 2007.
[3] S. Na and J. Min, "Character-Based LSTM CRFs for Named Entity Recognition," Proceedings of KISS Korea Computer Congress, pp. 729-731, Jun. 2016.
[4] S. Nam, Y. Hahm, and K. Choi, "Application of Word Vector with Korean Specific Feature to Bi-LSTM model for Named Entity Recognition," Human & Cognitive Language Technology (HCLT 2017), Oct. 2017.
[5] S. Hochreiter and J. Schmidhuber, "Long Short-Term Memory," Neural Computation, Vol. 9, No. 8, pp. 1735-1780, Nov. 1997.
[6] J. L. Elman, "Finding Structure in Time," Cognitive Science, Vol. 14, No. 2, pp. 179-211, Mar. 1990.
[7] G. Lample, M. Ballesteros, S. Subramanian, K. Kawakami, and C. Dyer, "Neural Architectures for Named Entity Recognition," Proceedings of NAACL-HLT 2016, pp. 260-270, Jun. 2016.
[8] X. Ma and E. Hovy, "End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF," arXiv preprint arXiv:1603.01354, 2016.
[9] TTA, "Tag Set and Tagged Corpus for Named Entity Recognition," TTAK.KO-10.0852, 2015.
[10] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed Representations of Words and Phrases and their Compositionality," Advances in Neural Information Processing Systems, pp. 3111-3119, 2013.
[11] J. Pennington, R. Socher, and C. Manning, "GloVe: Global Vectors for Word Representation," Proceedings of EMNLP 2014, pp. 1532-1543, Oct. 2014.
[12] X. Zhang, J. Zhao, and Y. LeCun, "Character-level Convolutional Networks for Text Classification," Advances in Neural Information Processing Systems 28 (NIPS 2015), Vol. 1, pp. 649-657, 2015.
[13] C. Sutton and A. McCallum, "An Introduction to Conditional Random Fields for Relational Learning," Foundations and Trends in Machine Learning, Vol. 2, 2006.
[14] E. Park and S. Cho, "KoNLPy: Korean natural language processing in Python," The 26th Annual Conference on Human & Cognitive Language Technology, pp. 133-136, Oct. 2014.
[15] Gensim: Topic Modelling for Humans, https://radimrehurek.com/gensim (accessed Jul. 1, 2018).
[16] GloVe: Global Vectors for Word Representation, https://nlp.stanford.edu/projects/glove/ (accessed Jun. 1, 2018).
[17] TensorFlow, https://www.tensorflow.org (accessed Jun. 1, 2018).
Author Information

Myunghyun Go
- 2016: B.S., Department of Digital Contents, Sejong University
- 2016 ~ present: M.S. course, Department of Digital Contents, Sejong University
- Research interests: text mining, machine learning, deep learning
- ORCID: https://orcid.org/0000-0002-6036-4717

Hakdong Kim
- 2016: B.S., Department of Computer Engineering, Kyungsung University
- 2017 ~ present: integrated M.S./Ph.D. course, Department of Digital Contents, Sejong University
- Research interests: machine learning, deep learning, natural language processing
- ORCID: https://orcid.org/0000-0003-3816-1224

Heonyeong Lim
- 2017: B.S., Department of Digital Contents, Sejong University
- 2017 ~ present: M.S. course, Department of Digital Contents, Sejong University
- Research interests: computer vision, machine learning, deep learning
- ORCID: https://orcid.org/0000-0002-8547-6248

Yurim Lee
- 2018: B.S., Department of Digital Contents, Sejong University
- 2018 ~ present: M.S. course, Department of Artificial Intelligence and Linguistic Engineering, Sejong University
- Research interests: text mining, natural language processing, deep learning
- ORCID: https://orcid.org/0000-0001-8309-090x

Minkyu Jee
- 2018: B.S., Department of Astronomy and Space Science, Sejong University
- 2018 ~ present: M.S. course, Department of Software Convergence, Sejong University
- Research interests: text mining, machine learning, deep learning
- ORCID: https://orcid.org/0000-0002-3089-1452
Wonil Kim
- Dec. 1981 ~ Jul. 1985: …
- 1982: …
- 1987 ~ 1990: …
- 1994 ~ 2000: …
- Jan. 2000 ~ Mar. 2001: Technical Staff, Bhasha INC
- Mar. 2002 ~ Aug. 2003: BK …
- Sep. 2003 ~ Feb. 2017: …
- Mar. 2017 ~ present: Department of Software, Sejong University
- ORCID: https://orcid.org/0000-0002-1489-8427