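Semantic Processing of Engineering Documents in PLM Environment
KAIST Department of Industrial and Systems Engineering
* Prof. 서효원 (Hyo-Won Suh)
* 전상민, Ph.D. student / Hankook Tire
* 김경근, Ph.D. student / Agency for Defense Development
* 최승아, M.S. student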
Contents
1. Background
2. New Approach
3. Research Trend & Paper Introduction
4. Introduction of Basic Algorithm
5. Case Study 1, 2, 3
6. Conclusion
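1. Background (AS-IS)
- Engineering documents multiply rapidly during product development (requirements, design, analysis, manufacturing, testing, mass production).
- Where is the document? How can it be found? Can the desired document be obtained quickly?
- PLM has reached the stage of widespread adoption, stabilization, and sophistication, so exploring and searching documents now matters more than storing and managing them.
- Existing engineering document search relies on keyword search, which returns too many candidates to choose from.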
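1. Background (TO-BE)
- For efficient engineering document search: manage text-based contents rather than document packages, and use semantic search rather than keyword search.
- Semantic search requires building the semantics of the information; on that basis, documents undergo Semantic Processing.
  * Semantic Processing = Syntax Processing (NLP) + Semantic Processing (Ontology)
- Engineering document management based on Semantic Processing aims to improve: information retrieval efficiency, information reusability, information integration.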
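2. New Approach
[Figure: overall architecture. Information production side: Engineers, engineering documents (WS/SS/NS), PCD, Data Base, Neutral Data, WS data, Engineer SN data. Center: Semantic Processor (syntactic analysis, semantic analysis), Data & Knowledge Base, reference semantic model (Taxonomy, Folksonomy). Information consumption side: CCD, SNS users, semantic search.]
Key elements: S-NL (simplified natural language processing), domain-specific reference model ontology, semantic representation, semantic similarity evaluation, self-based search.
Legend:
- UC: user created
- WS/SS/NS: well/semi/non structured
- PCD: producer-centric data
- CCD: consumer-centric data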
Research Trend (1/2)
1. Wu Ying-Han; Shaw Heiu-Jou, Document based knowledge base engineering method for ship basic design, OCEAN ENGINEERING Volume: 38 Issue: 13 Pages: 1508-1521, SEP 2011
2. Wang Han-Hsiang; Boukamp Frank; Elghamrawy Tar, Ontology-Based Approach to Context Representation and Reasoning for Managing Context-Sensitive Construction Information, JOURNAL OF COMPUTING IN CIVIL ENGINEERING Volume: 25 Issue: 5 Pages: 331-346, SEP-OCT 2011
3. Liu S.; McMahon C. A.; Culley S. J., A review of structured document retrieval (SDR) technology to improve information access performance in engineering document management, COMPUTERS IN INDUSTRY Volume: 59 Issue: 1 Pages: 3-16, JAN 2008
4. S. Liu, C.A. McMahon, M.J. Darlington, S.J. Culley, P.J. Wild, A computational framework for retrieval of document fragments based on decomposition schemes in engineering information management, Advanced Engineering Informatics 20 (2006) 401-413
5. Zhanjun, L., Karthik, R., A., (2007), "Ontology-based design information extraction and retrieval," Artificial Intelligence for Engineering Design, Analysis and Manufacturing 21, pp. 137-154.
6. Zhanjun Li, Victor Raskin, Karthik Ramani, Developing Engineering Ontology for Information Retrieval, Journal of Computing and Information Science in Engineering, 3.2008, vol 8
7. Zhanjun Li, Maria C. Yang, Karthik Ramani, A methodology for engineering ontology acquisition and validation, Artificial Intelligence for Engineering Design, Analysis and Manufacturing, 2009, vol 23
Research Trend (2/2)
8. Zhanjun Li, Min Liu, David C. Anderson, Karthik Ramani, Semantic-based design knowledge annotation and retrieval, Proceedings of IDETC/CIE 2005, ASME 2005 International Design Engineering Technical Conferences & Computers and Information in Engineering Conference, September 24-28, 2005, Long Beach, California, USA
9. Deeptimahanti Deva Kumar, Ratna Sanyal (2008), Static UML Model Generator from Analysis of Requirements (SUGAR), 2008 Advanced Software Engineering & Its Applications, pp. 77-84.
10. Lin, JX; Fox, MS; Bilgic, T (1996), A Requirement Ontology for Engineering Design, Concurrent Engineering-Research and Applications, Vol 4, Issue 3, pp. 279-291.
11. Soner, K., Ozgur, A., Orkunt, S., Samet, A., Nihan, K.C., Ferda, N.A., (2012), "An ontology-based retrieval system using semantic indexing," Information Systems, 37, pp. 294-305.
12. Lin, M., H., (2009), "An optimal workload-based data allocation approach for multidisk databases," Data and Knowledge Engineering, 68, pp. 499-508.
13. Patricia, L., (2000), "Information extraction from documents for automating software testing," Artificial Intelligence in Engineering, 14, pp. 63-69.
14. Module-based Failure Propagation (MFP) model for FMEA, Int J Adv Manuf Technol, Kyoung-Won Noh, Hong-Bae Jun, Jae-Hyun Lee, Gyu-Bong Lee, Hyo-Won Suh, 2011
15. A Functional Basis for Engineering Design: Reconciling and Evolving Previous Efforts, NIST Technical Note 1447, Julie Hirtz, Robert B. Stone, Daniel A. McAdams, Simon Szykman, and Kristin L. Wood, 2002
Paper Introduction: Ontology-based design information extraction and retrieval
ZHANJUN LI and KARTHIK RAMANI
Artificial Intelligence for Engineering Design, Analysis and Manufacturing (2007), 21, 137-154.
1. Abstract
- The increasing complexity of the product design process has made the number of design documents explode.
- For design information retrieval: shallow natural language processing (NLP) is combined with domain-specific design semantics/ontology to turn text/unstructured content into a structured, semantic-based representation.
  DOC (Text) -> Linguistic Pattern -> Design Concept & Relationship -> Application-Specific Design Semantics
- To improve the performance of design information retrieval: ontology-based query processing was developed, so that users' requests are interpreted based on domain-specific meanings.
  Query -> Design Concept -> Concept Scoring & Pairing -> Doc. Retrieval
2. System Architecture & Functional Diagram
ODART: Ontology-based Design document Analysis and Retrieval Tool
[Figure: ODART architecture. Labels: syntactic analysis, semantic analysis, Doc Semantics, Query Semantics, Query, Processed Query.]
3. Ontology Modeling
[Figure: Taxonomy <Linguistic Knowledge>; Reference Model <Domain Knowledge>.]
4. Design Semantic Extraction
[Figures: <Linguistic Knowledge>, <Design Semantic/Taxonomy Model>, <Device Taxonomy>, <Domain Knowledge>.]
5. Evaluation
Example query: "Find products having DC motors"
Introduction of Basic Algorithm for Semantic Document Processing
1. Algorithm Outline
- Document processing
- Semantic text processing
- Semantic CAD processing
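2. Document Processing
Engineering DOC (structure / text) -> Bayesian classification -> classified document -> POI Extractor -> title (keywords), structure (hierarchy), text (tables)
Text -> TF-IDF -> text topic
Text -> Tokenization -> unit sentences
IE, OP -> index / instance / structured sentences
Abbreviations:
- TF-IDF: Term Frequency - Inverse Document Frequency
- IE: Information Extraction
- OP: Ontology Population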
2.1. Key Algorithm: Bayesian classification
Input: document, e.g. FMEA Doc.
Algorithm: Bayesian classification
Output: classified document, e.g. Class 1
Brief Algorithm Description: Doc -> feature generation -> classifier (model) built from a training data set -> prediction -> classification
Example training data set:
  w1          w2                Class
  FMEA        Failure Analysis  1
  Evaluation  Measure           1
  Concept     Design            2
  Detail      Design            2
  CAD         Drawing           3
  ...
Input: (FMEA, Failure Analysis) -> Output: Class 1
Reference used: training data sets
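The classification step can be sketched with a small multinomial naive Bayes classifier over the training rows listed on the slide. This is only an illustration: the slide does not say which Bayesian variant or feature generation is used, so the add-one smoothing and the treatment of w1 and w2 as plain word features are assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Training rows copied from the slide: (w1, w2) features and a class label.
TRAINING = [
    (("FMEA", "Failure Analysis"), 1),
    (("Evaluation", "Measure"), 1),
    (("Concept", "Design"), 2),
    (("Detail", "Design"), 2),
    (("CAD", "Drawing"), 3),
]


def train(rows):
    """Count documents per class and feature occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in rows:
        class_counts[label] += 1
        for word in words:
            word_counts[label][word.lower()] += 1
            vocab.add(word.lower())
    return class_counts, word_counts, vocab


def predict(model, words):
    """Return the class with the highest log posterior (add-one smoothing, an assumption)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in words:
            count = word_counts[label][word.lower()]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING)
print(predict(model, ("FMEA", "Failure Analysis")))
```

With these five rows the input (FMEA, Failure Analysis) is assigned to Class 1, which agrees with the slide's example output.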
2.2. Key Algorithm: Apache POI Extractor API
Input: classified document
Algorithm: Apache POI Extractor API
Output: extracted title, structure, and text
Example output:
- Title: FMEA (FAILURE MODE AND EFFECTS ANALYSIS) Doc.
- Structure (meta-data): images: 0, tables: 1, text: 11 lines, author: Mr. An, document creation date: 2010.3.20
- Text: FMEA (FAILURE MODE AND EFFECTS ANALYSIS) Doc. Reporting Person : Mr. An Reporting Date : 2010.05.01
- Overall assessment: There are mal-functions in Filtering to collect dust and remove gas. It makes a fan not purifying polluted air
2.3. Key Algorithm: Inverted Index
Input: document (text)
Algorithm: Inverted Index
Output: inverted index
Brief Algorithm Description: document -> tokenization -> build the document's token list -> token normalization -> index document information per token
Example
Sample document: FMEA (FAILURE MODE AND EFFECTS ANALYSIS) Doc. Reporting Person : Mr. An. Reporting Date : 2010.05.01. There are mal-functions in Filtering to collect dust and remove gas. It makes a fan not purifying polluted air
Stopword list: A, And, Around, Every, For, From, In, Is, It, ...
  No.  Term       Document ID
  1    FMEA       1
  2    FAILURE    1,2
  3    Design     1,3
  4    Reporting  1,2,4
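The tokenize, normalize, and index steps can be sketched as follows. This is a minimal illustration, not the authors' implementation: the stopword set is the partial list shown on the slide, document 1 is a shortened form of the slide's sample text, and documents 2 to 4 are hypothetical stand-ins, so the posting lists only partly match the slide's table.

```python
import re
from collections import defaultdict

# Partial stopword list as shown on the slide.
STOPWORDS = {"a", "and", "around", "every", "for", "from", "in", "is", "it"}


def tokenize(text):
    """Lowercase the text, split it into alphanumeric tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def build_inverted_index(docs):
    """Map each normalized term to the sorted list of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


DOCS = {
    1: "FMEA (FAILURE MODE AND EFFECTS ANALYSIS) Doc. Reporting Person: Mr. An.",
    2: "Failure report. Reporting Date: 2010.05.01.",  # hypothetical document
    3: "Design review of the filtering fan.",          # hypothetical document
    4: "Reporting template.",                          # hypothetical document
}

index = build_inverted_index(DOCS)
print(index["failure"], index["reporting"])
```

Storing document IDs in a set during construction keeps each posting list free of duplicates when a term occurs several times in one document.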
2.4. Key Algorithm: TF-IDF
Input: document (text)
Algorithm: TF-IDF
Output: text weighting (importance)
Example
Sample document: FMEA (FAILURE MODE AND EFFECTS ANALYSIS) Doc. Reporting Person : Mr. An. Reporting Date : 2010.05.01. There are mal-functions in Filtering to collect dust and remove gas. It makes a fan not purifying polluted air
Term counts:
           Doc.1  Doc.2  Doc.3
  FMEA     4      2      1
  Failure  5      2      4
  Lack     3      0      0
  Fan      13     0      0
Weights:
           Doc.1  Doc.2   Doc.3
  FMEA     0.365  0.5228  0.157
  Failure  0.406  0.5228  0.365
  Lack     0.602  0       0
  Fan      1.146  0       0
Output: Fan is the most important word.
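As a rough illustration of TF-IDF weighting, the sketch below applies the textbook form, weight = tf * log10(N / df), to the term counts from the slide. The slide does not state its exact formula and its weights are not reproduced by this variant, so treat the formula choice as an assumption; only the conclusion (Fan ranks highest in Doc.1) is compared.

```python
import math

# Term counts per document (Doc.1, Doc.2, Doc.3), copied from the slide.
COUNTS = {
    "FMEA":    [4, 2, 1],
    "Failure": [5, 2, 4],
    "Lack":    [3, 0, 0],
    "Fan":     [13, 0, 0],
}


def tf_idf(counts):
    """Compute tf * log10(N / df) for every term and document."""
    n_docs = len(next(iter(counts.values())))
    weights = {}
    for term, tfs in counts.items():
        df = sum(1 for tf in tfs if tf > 0)  # number of documents containing the term
        idf = math.log10(n_docs / df) if df else 0.0
        weights[term] = [tf * idf for tf in tfs]
    return weights


weights = tf_idf(COUNTS)
top_doc1 = max(weights, key=lambda term: weights[term][0])
print(top_doc1)
```

Under this variant, terms that occur in every document (FMEA, Failure) get weight zero, while Fan, which is frequent in Doc.1 and absent elsewhere, gets the highest weight there.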
3. Semantic Text Processing & Retrieval
Document side: unit sentence -> Tokenization -> words -> POS tagging -> POS-tagged words -> Concept Disambiguation -> concept extraction -> Joining Concept Relationship -> Concept Indexing -> Semantic Doc. Representation
Query side: query (sentence) -> Tokenization -> words -> POS tagging -> POS-tagged words -> Concept Disambiguation -> concept extraction -> Joining Concept Relationship
Resources: Lexicon, Domain Ontology
Retrieval: similarity calculation (Vector Space Model) -> document retrieval
3.1. Key Algorithm: Tokenization
Input: sentence
Algorithm: Tokenization
Output: tokens (words)
Brief Algorithm Description: sentence -> word delimiters (hyphen, ellipsis, period, ...) -> token generation
Example:
  A second DC motor rotates air that makes a smell removed
  A/ second/ DC/ motor/ rotates/ air/ that/ makes/ a/ smell/ removed/
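A delimiter-based tokenizer of this kind can be sketched in a few lines. The delimiter set below (whitespace, hyphen, ellipsis, period, and common punctuation) is an assumption based on the delimiters named on the slide.

```python
import re


def tokenize(sentence):
    """Split a sentence on whitespace and punctuation delimiters, dropping empty pieces."""
    return [tok for tok in re.split(r"[\s\-.,;:!?…]+", sentence) if tok]


tokens = tokenize("A second DC motor rotates air that makes a smell removed.")
print("/ ".join(tokens))
```

Splitting on hyphens and periods is a simplification: it would also break apart strings such as "mal-functions" or the date "2010.05.01", so a production tokenizer would need rules for those cases.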
3.2. Key Algorithm: POS (Part-of-Speech) tagging
Input: sentence
Algorithm: POS (Part-of-Speech) tagging
Output: POS-tagged words
Brief Algorithm Description: POS tagging using the Lexicon DB
Example:
  A/ second/ DC/ motor/ rotates/ air/ that/ makes/ a/ smell/ removed.
  A<DT> second<JJ> DC<NN> motor<NN> rotates<VBZ> air<NN> that<WDT> makes<VBZ> a<DT> smell<NN> removed<VBZ>.
Tag set:
- DT: Determiner
- JJ: Adjective
- NN: Noun, singular or mass
- VBZ: Verb, 3rd person singular present
- TO: to
- CD: Cardinal number
- NNS: Noun, plural
- RB: Adverb
- CC: Coordinating conjunction
Reference used: Lexicon DB
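The lookup of tags in a lexicon can be illustrated with a plain dictionary. This is a deliberately simplified sketch: the `LEXICON` contents and the fallback to NN for unknown words are assumptions, and real taggers also use context to resolve words that can take several tags.

```python
# Hypothetical miniature lexicon standing in for the Lexicon DB.
LEXICON = {
    "a": "DT",
    "second": "JJ",
    "dc": "NN",
    "motor": "NN",
    "rotates": "VBZ",
    "air": "NN",
    "makes": "VBZ",
    "smell": "NN",
    "fan": "NN",
}


def pos_tag(tokens, lexicon=LEXICON, default="NN"):
    """Tag each token by case-insensitive lexicon lookup; unknown words get the default tag."""
    return [(tok, lexicon.get(tok.lower(), default)) for tok in tokens]


tagged = pos_tag(["DC", "motor", "rotates", "fan"])
print(" ".join(f"{word}<{tag}>" for word, tag in tagged))
```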
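3.3. Key Algorithm: Korean Morphological Analysis
Input: Korean sentence
Algorithm: Korean morphological analysis
Output: Korean POS tagging
Brief Algorithm Description: read the text content -> Korean Morphological Analysis -> Korean POS Tagging
Resources: System Dictionary (283,948 words), User Dictionary (Domain Lexicon DB), Number Dictionary, Tag Set Table
Tool: HanNanum Korean morphological analyzer (HanNanum v0.8.4)
Example:
  Input: 유도탄 중량을 현재 100kg에서 80kg으로 감량 설계 추진 요망 ("Request to proceed with a design that reduces the missile weight from the current 100 kg to 80 kg")
  Output: 유도탄/NC 중량/NC 100/NN kg/F 에/PV 80/NN kg/F 감량/NC 설계/NC 추진/NC 요망/NC
Reference used: System Dictionary, User Dictionary, Number Dictionary, Tag Set Table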
3.4. Key Algorithm: Concept Disambiguation
Input: sentence
Algorithm: Concept Disambiguation
Output: the concept most similar to the sentence
Brief Algorithm Description
- Wm: weight of each word in the phrase
  - phrase contains only one word: Wm = 1
  - no match: Wm = 0
  - rightmost matched word: Wm = 0.55 (the best-matching word gets the highest score)
  - remaining words: split 0.45 equally
Example: phrase "second DC Motor", with weights 0.225 / 0.225 / 0.55
- Concept Motor: Tscore = (1 * (0 + 0 + 0.55)) / 3 = 0.183
- Concept AC Motor: Tscore = (1 * (0 + 0 + 0.55)) / 3 = 0.183
- Concept DC Motor: Tscore = (2 * (0 + 0.225 + 0.55)) / 3 = 0.516
Among the candidate concepts (Motor, AC-Motor, DC-Motor), DC Motor is selected because it has the highest score.
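The worked example suggests a score of the form Tscore = (number of matched words * sum of matched weights) / phrase length. The sketch below implements that reading; the general formula is inferred from the three example calculations, and giving 0.55 to the rightmost word of the phrase is an assumption, so it should not be taken as the authors' exact definition.

```python
def phrase_weights(phrase_words):
    """Weights Wm: 1 for a single word, else 0.55 for the rightmost word and 0.45 split over the rest."""
    n = len(phrase_words)
    if n == 1:
        return [1.0]
    rest = 0.45 / (n - 1)
    return [rest] * (n - 1) + [0.55]


def t_score(phrase, concept):
    """Score a candidate concept against a phrase (formula inferred from the slide's example)."""
    phrase_words = phrase.lower().split()
    concept_words = set(concept.lower().split())
    weights = phrase_weights(phrase_words)
    # Unmatched words contribute weight 0, so only matched weights are collected.
    matched = [w for word, w in zip(phrase_words, weights) if word in concept_words]
    return len(matched) * sum(matched) / len(phrase_words)


def best_concept(phrase, concepts):
    """Pick the candidate concept with the highest score."""
    return max(concepts, key=lambda concept: t_score(phrase, concept))


candidates = ["Motor", "AC Motor", "DC Motor"]
for concept in candidates:
    print(concept, round(t_score("second DC Motor", concept), 3))
print(best_concept("second DC Motor", candidates))
```

For the phrase "second DC Motor" this gives about 0.183 for Motor and AC Motor and about 0.517 for DC Motor (the slide shows 0.516), so DC Motor is selected.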
3.5. Key Algorithm: Joining Concept Relationship
Input: sentence
Algorithm: Joining Concept Relationship
Brief Algorithm Description: joining by syntax rule
Example:
  Input: DC motor rotates fan.
  Tagged: DC<NN> motor<NN> rotates<VBZ> fan<NN>.
  Syntax rule: NN^VBZ^NN -> VBZ(NN,NN)
  Result: Rotate(DC motor, fan)
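The syntax rule NN^VBZ^NN -> VBZ(NN,NN) can be sketched as a pattern match over POS-tagged tokens. Merging consecutive nouns into one compound noun before matching is an assumption made here so that "DC motor" is treated as a single argument; the function names are hypothetical.

```python
def merge_nouns(tagged):
    """Merge runs of consecutive NN tokens into one compound noun (an assumption of this sketch)."""
    merged = []
    for word, tag in tagged:
        if tag == "NN" and merged and merged[-1][1] == "NN":
            merged[-1] = (merged[-1][0] + " " + word, "NN")
        else:
            merged.append((word, tag))
    return merged


def join_relations(tagged):
    """Apply the rule NN^VBZ^NN -> VBZ(NN, NN) and return (verb, subject, object) triples."""
    chunks = merge_nouns(tagged)
    relations = []
    for i in range(len(chunks) - 2):
        (subj, t1), (verb, t2), (obj, t3) = chunks[i:i + 3]
        if (t1, t2, t3) == ("NN", "VBZ", "NN"):
            relations.append((verb, subj, obj))
    return relations


relations = join_relations([("DC", "NN"), ("motor", "NN"), ("rotates", "VBZ"), ("fan", "NN")])
for verb, subj, obj in relations:
    print(f"{verb}({subj}, {obj})")
```

For the sentence "DC motor rotates fan" the sketch yields the relation rotates(DC motor, fan); reducing the verb to its base form would need an additional lemmatization step that is omitted here.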
3.6. Key Algorithm: Concept Index
Input: document (text)
Algorithm: Concept Index
Output: concept index
Brief Algorithm Description: document -> concept extraction -> Concept Disambiguation -> list of concepts in the document -> index document information per concept
Example
Sample document: FMEA (FAILURE MODE AND EFFECTS ANALYSIS) Doc. Reporting Person : Mr. An. Reporting Date : 2010.05.01. There are mal-functions in Filtering to collect dust and remove gas. It makes a fan not purifying polluted air
Extracted concepts: Report, FMEA, Failure, Dust, Gas
  No.  Term     Document ID
  1    FMEA     1
  2    FAILURE  1,2
  3    Dust     1,3
  4    Gas      1,2,4
3.7. Key Algorithm: Vector Space Model (Similarity)
Input: query, documents
Algorithm: Vector Space Model (Similarity)
Output: the document most similar to the query
Example: using the score of each word in the documents (e.g. FMEA, Failure, Dust, Gas), D2 and D3 are the most similar.
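Similarity in a vector space model is commonly measured with the cosine of the angle between term vectors. The sketch below assumes cosine similarity, which the slide does not name explicitly, and because the slide does not list its per-word scores, it reuses the term counts from the TF-IDF example in section 2.4 (FMEA, Failure, Lack, Fan) as stand-in vectors.

```python
import math
from itertools import combinations

# Stand-in vectors over (FMEA, Failure, Lack, Fan), borrowed from the counts in section 2.4.
DOC_VECTORS = {
    "D1": [4, 5, 3, 13],
    "D2": [2, 2, 0, 0],
    "D3": [1, 4, 0, 0],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank(query_vector, docs):
    """Return document IDs sorted from most to least similar to the query vector."""
    return sorted(docs, key=lambda d: cosine(query_vector, docs[d]), reverse=True)


most_similar_pair = max(
    combinations(DOC_VECTORS, 2),
    key=lambda pair: cosine(DOC_VECTORS[pair[0]], DOC_VECTORS[pair[1]]),
)
print(most_similar_pair)
print(rank([0, 0, 0, 1], DOC_VECTORS)[0])  # query containing only the term "Fan"
```

With these stand-in vectors, D2 and D3 form the most similar pair, and a query consisting only of "Fan" ranks D1 first.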
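4. Semantic CAD Model Extraction and Retrieval
[Figure: processing flow. Labels: DOC, CAD, key parameter extraction, feature extraction, feature attribute extraction, Feature Disambiguation, Feature XML, Feature Indexing, Query. Resources: Lexicon, Domain Ontology.]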
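Case Study Outline
- CASE 1. Document-based CAD model retrieval
- CASE 2. Conceptual design of Semantic Issue management system
- CASE 3. Retrieval of similar past FMEA documents
[Figure: the cases positioned on the overall architecture. Labels: PCD, Data Base, Neutral Data, CCD, Engineers, engineering documents (WS/SS/NS), Semantic Processor, Data & Knowledge Base, semantic search, WS data, Engineer SN data, reference semantic model, SNS users, information production side, information consumption side.]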
CASE STUDY 1: Document-based CAD Model Retrieval
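Case Study 1: Document-based CAD Model Retrieval (1/16)
1. Problems with CAD model retrieval for reuse
CAD model reuse
- In product design, 80% of CAD models are reused.
- Successful reuse yields a 50% cost reduction.
Current CAD model search methods
- Search by CAD file name
- Search based on information (BOM) held in information systems (PLM, PDM)
- Limited ability to return what the user intended (48% of search results are unusable [1])
For more effective retrieval and reuse
- Semantic CAD model retrieval is needed, rather than search based simply on names and stored information.
- Knowledge about CAD models needs to be extracted and reused.
[1] Li, M., Y. F. Zhang*, J. Y. H. Fuh and Z. M. Qiu, "Towards effective mechanical design reuse: feature-based CAD model retrieval on general shapes and partial shapes," JOURNAL OF MECHANICAL DESIGN, (2009).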
Case Study 1: Document-based CAD Model Retrieval (2/16)
2. CAD model reuse example
[Figure: tire development process. <PLM> process stages: Engineering Requirement, Design Concept, Design, Simulation, Manuf. Spec., Test, Production. <Detail Design> sub-processes in <CAD&CAE>: Pattern Design, Pattern Simulation, Construction Design, Sidewall Design, Pre-Process (Meshing), FEA Simulation, Post-Process (Report). Outputs: QFD, DFMEA, B/M Report, Basic Spec. (Dimension, Construction, Material), Drawing, Mold Drawing, M-BOM, CAD, Mesh, ODB, Report, Tire Test Result, Performance Test. Feedback: Re-design, Update Simulation Accuracy. Legend: output, process.]
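Case Study 1: Document-based CAD Model Retrieval (3/16)
3. Document-based CAD model retrieval process (1/2)
[Figure: a document is converted into a Semantic document model and a CAD model into a Semantic CAD model; similarity is calculated between the two models at the concept level.]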
Case Study 1: Document-based CAD Model Retrieval (4/16)
3. Document-based CAD model retrieval process (2/2)
Document side (DOC):
- Syntactic Analysis: document text extraction (table information), Tokenization, Sentence Segmentation. Resource: Lexicon DB
- Semantic Analysis: concept extraction, Concept Disambiguation, Concept Joining. Resource: Ontology DB
- Doc Representation: concept XML generation, Concept Indexing, Semantic document model generation (OWL). Resource: Semantic Model DB
CAD side (DOC, CAD):
- Feature Extraction: key parameter extraction, feature extraction, feature attribute extraction
- Feature Analysis: Feature Disambiguation, feature relationship extraction
- CAD Representation: feature XML generation, Feature Indexing, Semantic CAD model generation
CASE STUDY 2: Conceptual Design of Semantic Issue Management System
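Case Study 2: Conceptual design of Semantic Issue management system (1/19)
1. AS-IS
1) Meetings, discussions, and decisions of many kinds take place every day in the system development project, but their content reaches only the attendees and related people (those who receive the minutes by e-mail).
2) Most of the discussion concerns the specifications and design of the system or its subsystems, yet classifying that content (by system structure) and reflecting the resulting changes must currently all be done by hand by the database administrator.
3) This causes serious overhead.
4) Changes are not reflected in real time, and omissions can easily occur.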
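Case Study 2: Conceptual design of Semantic Issue management system (2/19)
2. TO-BE
Proposal
1) Build the organization breakdown structure (OBS), work breakdown structure (WBS), system breakdown structure (SBS), and requirement breakdown structure (RBS) as ontology models, forming the Reference Ontology Model.
2) Enter meeting and decision data as minutes (documents) or as well-structured data produced with SE tools.
3) Use the Reference Ontology Model to semantically assign and reorganize the input data by organization, task, and system.
TO-BE
1) When engineers log in to the PLM system at the start of the day, they can always see, in real time, what has been discussed or decided regarding their own work (or their organization, assigned system, or requirements).
2) They can see, in real time, the issues they currently have to resolve (Action items, in the case of meetings).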
Case Study 2: Conceptual design of Semantic Issue management system (3/19)
3. TO-BE Work Flow
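Conclusion
- Engineering documents related to product development are multiplying rapidly, so an efficient search method is needed.
- PLM has reached the stage of widespread adoption, stabilization, and sophistication, so exploration and search functions are becoming more prominent.
- Keyword search returns too many candidates to choose from.
- Engineering document management and search based on Semantic Processing aims to improve: information retrieval efficiency, information reusability, information integration.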
Question and Answer
THANK YOU