Research Trends in Opinion Retrieval. June 26, 2013. Seung-Wook Lee, Natural Language Processing Lab, Korea University


Research Trends in Opinion Retrieval
June 26, 2013
Seung-Wook Lee (swlee@nlp.korea.ac.kr), Natural Language Processing Lab, Korea University

Contents
1. Introduction
2. Opinion Mining & Opinion Retrieval
3. Research Trends in Opinion Retrieval
4. Conclusion
5. Appendix: Datasets and Performance Evaluation, References

Introduction (1)
The growth of social media

Introduction (2)
Rapid growth of individual-centered media: blogs, forums, bulletin boards, and social network services (SNS), where people freely express personal opinions and feelings on a wide range of topics.
Opinion: "A belief about matters commonly considered to be subjective, i.e., it is based on that which is less than absolutely certain, and is the result of emotion or interpretation of facts. This may refer to unsubstantiated information, in contrast to knowledge and fact-based belief." (Wikipedia)

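Introduction (3)
Service providers, manufacturers, government offices, and other organizations try to take people's evaluations into account in order to improve the quality of their products, services, and policies.
The large volume of subjective feedback accumulated online can serve as an important potential information source for improving quality and planning marketing strategy. Traditional surveys filled out by people consume too much time and manpower.
Ordinary users also consult what others think when making decisions. They read online reviews before buying a product, and most online shopping malls provide ratings and product reviews.
Opinion analysis has emerged as an important issue in both industry and academia.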



Opinion Mining
Opinion Mining / Sentiment Analysis: "The application of natural language processing, computational linguistics, and text analytics to identify and extract subjective information in source materials." (Wikipedia)
Tasks shown around Opinion Mining in the diagram: opinion word dictionary construction, opinion summarization, polarity classification, opinion extraction, automatic feature extraction, opinion retrieval.

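Opinion Retrieval (1)
Diagram: from a document collection, information (ad-hoc) retrieval selects the relevant documents and sentiment analysis selects the subjective documents. Opinion retrieval targets the documents that are both relevant and subjective.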

Opinion Retrieval (2)
Query: iphone 5
Relevant only (information (ad-hoc) retrieval): "iphone 5's spec", "Battery life of iphone 5"
Subjective only (sentiment analysis): "Nikon D4 is amazing", "The spaghetti was awful"
Neither relevant nor subjective: "The history of computer science"
Relevant and subjective (opinion retrieval): "iphone 5 is the best", "iphone 5 is not innovative any more"

Opinion Retrieval (3)
Query: iphone 5

Opinion Retrieval (4)
Query: Barack Obama

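Opinion Retrieval (5)
Role as a preprocessing step for opinion mining.
Opinion retrieval ranks documents that contain opinions related to a specific topic.
Opinion mining is generally performed on a specific target (target-specific), such as a person or a product.
If relevance and subjectivity can be measured for each document, mining can be carried out in priority order.
Filtering out non-relevant documents and documents without opinions can make the mining system more efficient.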

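Opinion Retrieval (6)
The easiest approach? Add opinion words to the query of an existing retrieval system and search.
Example: Windows 8 → Windows 8 amazing awkward defects flaws
Limitations:
No method for term weighting between query words and opinion words.
Topic drift.
Only a limited number of opinion words can be used.
Low efficiency.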
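The naive approach and its topic drift problem can be sketched in a few lines. This is only an illustration under stated assumptions: `expand_query` and `match_count` are hypothetical helpers, the opinion words are the four from the slide's Windows 8 example, and matching is a plain unweighted token count rather than a real retrieval model.

```python
# Hypothetical opinion words, taken from the slide's Windows 8 example.
OPINION_WORDS = ["amazing", "awkward", "defects", "flaws"]


def expand_query(query, opinion_words=OPINION_WORDS):
    """Append opinion words to the lowercased query terms."""
    return query.lower().split() + list(opinion_words)


def match_count(terms, document):
    """Count document tokens that match any query term, all weighted equally."""
    term_set = set(terms)
    return sum(1 for token in document.lower().split() if token in term_set)


expanded = expand_query("Windows 8")
on_topic = "windows 8 has an awkward start screen"
off_topic = "amazing food but awkward service with flaws and defects"

print(expanded)
print(match_count(expanded, on_topic), match_count(expanded, off_topic))
```

Because every term carries the same weight, the off-topic restaurant review matches more expanded terms than the document that is actually about Windows 8, which is the topic drift the slide warns about.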



Text REtrieval Conference (TREC)
Source: Information Retrieval: A Health and Biomedical Perspective, Third Edition, William Hersh, M.D.

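Two-phase approach (1)
Compute the topical relevance and the subjectivity of a document separately, then combine them linearly.
The simplest and most widely used methodology.
Relevance is computed with an existing information retrieval model, such as BM25 or a language model, used as is.
Research has concentrated on methods for computing subjectivity.
Lacks a theoretical foundation.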

Two-phase approach (2)
Query: iphone 5
D1: "Apple iphone Steve Jobs amazing iphone .." (opinion word: amazing)
Topic: 2/100 = 0.02, Opinion: 1/100 = 0.01, Final: λ × 0.02 + (1 − λ) × 0.01
D2: "non-innovative iphone nice Samsung Galaxy Note 2" (opinion words: non-innovative, nice)
Topic: 1/100 = 0.01, Opinion: 2/100 = 0.02, Final: λ × 0.01 + (1 − λ) × 0.02
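The worked example on this slide can be reproduced with a minimal sketch. The function names and the two λ values tried here are assumptions for illustration. The counts (2 and 1 query-term occurrences, 1 and 2 opinion words, 100-word documents) come from the slide, where both scores are simple relative frequencies.

```python
def relative_frequency(count, doc_length):
    """Score used on the slide: occurrences divided by document length."""
    return count / doc_length


def two_phase_score(topic_score, opinion_score, lam):
    """Linear interpolation of topical relevance and subjectivity."""
    return lam * topic_score + (1.0 - lam) * opinion_score


# (query-term count, opinion-word count, document length) from the slide.
docs = {"D1": (2, 1, 100), "D2": (1, 2, 100)}


def rank(docs, lam):
    """Return document names sorted by the combined score, best first."""
    scored = {
        name: two_phase_score(
            relative_frequency(topic, length),
            relative_frequency(opinion, length),
            lam,
        )
        for name, (topic, opinion, length) in docs.items()
    }
    return sorted(scored, key=scored.get, reverse=True)


print(rank(docs, lam=0.7))  # weight relevance more heavily
print(rank(docs, lam=0.3))  # weight subjectivity more heavily
```

The ranking of D1 and D2 flips depending on λ, which shows how much the final result depends on a parameter that the two-phase approach has no principled way to set.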

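Integrated approach
As the two-phase approach showed high efficiency and retrieval performance, a variety of theoretically grounded studies have recently been introduced.
Based on the vector space model [Vechtomova, 2007]: assigns weights to the opinion words that occur in a document.
Based on query expansion [Huang and Croft, 2009]: within the language modeling framework, treats query-dependent and query-independent opinion words as expansion terms of the query.
Based on generative models [Zhang and Ye, 2008], [Lee et al., 2011]: considers relevance and subjectivity simultaneously.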

Key ranking factors (1)
Opinion words
Most studies use a pre-built opinion word lexicon.
The frequency of opinion words within a document is used in most studies [Kovacevic and Huang, 2008], [Clark et al., 2006], [Hannah et al., 2007].
Various techniques have been introduced for computing the importance of individual opinion words:
Machine learning [Joshi et al., 2006]
A diversity measure [Hannah et al., 2007]
Probabilities estimated from a training set [Huang et al., 2007]
Part-of-speech information [Wiebe et al., 2004], [Yang et al., 2006]
The Web [Oard et al., 2006], [Turney and Littman, 2003]
WordNet or SentiWordNet [Kim and Hovy, 2005], [Bermingham et al., 2008], [Na et al., 2009]
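A lexicon-based subjectivity score that combines opinion word frequency with per-word importance might look like the sketch below. The lexicon entries and their weights are invented for illustration and do not come from General Inquirer, SentiWordNet, or any of the cited systems. Normalizing by document length is likewise an assumption.

```python
from collections import Counter

# Hypothetical lexicon: opinion word -> importance weight (made-up values).
LEXICON = {"amazing": 0.9, "awful": 0.8, "best": 0.7, "nice": 0.5}


def opinion_score(document, lexicon=LEXICON):
    """Weighted opinion-word frequency, normalized by document length."""
    tokens = document.lower().split()
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    weighted = sum(weight * counts[word] for word, weight in lexicon.items())
    return weighted / len(tokens)


print(opinion_score("iphone 5 is the best"))
print(opinion_score("the history of computer science"))
```

The techniques listed on the slide differ mainly in how the weights in such a lexicon are obtained.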

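Key ranking factors (2)
Proximity
Going beyond the bag-of-words assumption, information about where two words occur is known to help retrieval ranking.
In opinion retrieval as well, the positions of query words and opinion words are treated as important.
Opinion words adjacent to query terms are more likely to express opinions that are relevant to the topic.
Only the scores of opinion words occurring near query terms are considered [Yang et al., 2006].
In a vector space model, only opinion words adjacent to query terms are considered [Vechtomova, 2007].
Only opinion words occurring within a window of predefined size (e.g., 10 words) around the query are considered [Zhou et al., 2007].
Various kinds of proximity-based kernel functions are compared [Gerani et al., 2010].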
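Two of the proximity ideas above, a fixed window and a distance kernel, can be sketched as follows. Both functions are hypothetical illustrations, not the cited systems. The Gaussian kernel and `sigma=5.0` are picked here only as one plausible kernel shape, and the slide does not say which kernels were compared.

```python
import math


def window_opinion_count(tokens, query_terms, opinion_words, window=10):
    """Count opinion words within `window` tokens of any query term."""
    query_positions = [i for i, t in enumerate(tokens) if t in query_terms]
    count = 0
    for i, token in enumerate(tokens):
        if token in opinion_words and any(
            abs(i - q) <= window for q in query_positions
        ):
            count += 1
    return count


def gaussian_proximity_score(tokens, query_terms, opinion_words, sigma=5.0):
    """Sum a Gaussian weight for each opinion word by distance to the nearest query term."""
    query_positions = [i for i, t in enumerate(tokens) if t in query_terms]
    if not query_positions:
        return 0.0
    score = 0.0
    for i, token in enumerate(tokens):
        if token in opinion_words:
            nearest = min(abs(i - q) for q in query_positions)
            score += math.exp(-(nearest ** 2) / (2 * sigma ** 2))
    return score


tokens = (
    "the iphone is amazing but the spaghetti served much later "
    "that evening at dinner was awful"
).split()
query_terms = {"iphone"}
opinion_words = {"amazing", "awful"}

print(window_opinion_count(tokens, query_terms, opinion_words, window=10))
print(gaussian_proximity_score(tokens, query_terms, opinion_words))
```

In this example "amazing" sits two tokens from the query term and counts fully, while "awful" is fourteen tokens away, so the window drops it and the kernel gives it only a small weight.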


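Conclusion
Opinion retrieval, which combines information retrieval and opinion mining techniques, has advanced by taking a variety of factors into account: opinion word occurrence, the importance of individual opinion words, proximity to query terms, and the method of combination with the information retrieval model.
Future research topics:
More active use of linguistic clues: syntactic parsing, semantic analysis, named entity recognition, anaphora resolution.
Noise removal techniques: spam detection, HTML tag handling, blog template removal.
Extension to domains such as SNS and microblogs: devising methods for text that is shorter and contains many spelling errors, as on Twitter and Facebook.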

Thank you.


Datasets
Evaluation collection: TREC Blogs06, 3.2M permalinks (documents)
Queries: 150 topics released in 2006, 2007, and 2008
Opinion word lexicons: General Inquirer (3,600 words), SentiWordNet

Topic example

Title List
Topic-06: March of the Penguins, larry summers, state of the union, Ann Coulter, abramoff bush, macbook pro, jon stewart, super bowl ads, letting india into the club, arrested development, mardi gras, blackberry, netflix, colbert report, basque, Whole Foods, cheney hunting, joint strike fighter, muhammad cartoon, barry bonds, cindy sheehan, brokeback mountain, bruce bartlett, coretta scott king, american idol, life on mars, sonic food industry, jihad, hybrid car, natalie portman, Fox News Report, seahawks, heineken, Qualcomm, shimano, west wing, World Trade Organization, audi, scientology, olympics, intel, Jim Moran, zyrtec, board chess, Oprah, global warming, ariel sharon, Business Intelligence Resources, cholesterol, mcdonalds
Topic-07: jstor, lactose gas, Steve jobs, alterman, king funeral, davos, brrreeeport, carrie underwood, Barilla, Aperto Networks, SCI FI CHANNEL, nasa, sag awards, northernvoice, allianz, dice com, snopes, varanasi, pfizer, andrew coyne, Christianity Today, howard stern, challenger, mark driscoll, mashup camp, hawthorne heights, oscar fashion, big love, brand manager, ikea, fort mcmurray, goobuntu, winter olympics, cointreau, mozart, grammys, LexisNexis, plug awards, Beggin Strips, Lance Armstrong, teri hatcher, lawful access, censure, Opera Software OR Opera Browser OR Opera Mobile OR Opera Mini, bolivia, tivo, sasha cohen, sorbonne, ford bell, Hitachi Data Systems
Topic-08: Carmax, Wikipedia primary source, Jiffy Lube, Starbucks, Windows Vista, Mark Warner for President, women in Saudi Arabia, UN Commission on Human Rights, Frank Gehry architecture, Picasa, Chipotle Restaurant, Ed Norton, Iceland European Union, tax break for hybrid automobiles, Whole Foods wind energy, Papa John's Pizza, Mahmoud Ahmadinejad, MythBusters, China one child law, intelligent design, Sheep and Wool Festival, Subway Sandwiches, Yojimbo, Zillow, Nancy Grace, flag burning, NAFTA, Oregon Death with Dignity Act, Morgan Freeman, System of a Down, Sew Fast Sew Easy, I Walk the Line, World Bank, Ruth Rendell, Mayo Clinic, Project Runway, New York Philharmonic Orchestra, israeli government, The Geek Squad, TomTom, federal shield law, David Irving, A Million Little Pieces, talk show hosts, Women on Numb3rs, universal health care, Trader Joe's, Sopranos, YouTube, George Clooney

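Relevance judgments
For each topic, documents are assessed step by step at the document level: whether they are relevant, and whether they contain opinions.
Built with the pooling method: the top-ranked documents from the results submitted by the participating teams are merged to form the judgment set.
Figure labels: Topical Retrieval, Opinion Retrieval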
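Pooling, as described above, amounts to taking the union of the top-ranked documents from each submitted run. A minimal sketch follows. The function name, the run contents, and the pool depth are all illustrative assumptions, and the actual depth used for the TREC Blog Track is not stated on the slide.

```python
def build_pool(runs, depth):
    """Merge the top `depth` documents of every run into one set to be judged."""
    pool = set()
    for ranked_docs in runs:
        pool.update(ranked_docs[:depth])
    return pool


# Hypothetical ranked lists submitted by two participating teams.
runs = [
    ["d1", "d2", "d3", "d7"],
    ["d2", "d4", "d5", "d1"],
]

print(sorted(build_pool(runs, depth=2)))
```

Only the pooled documents are shown to assessors, so documents that no run ranked highly remain unjudged.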

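Evaluation measures
Evaluation uses standard information retrieval measures: precision at 10 (P@10) and mean average precision (MAP).
Retrieval results are evaluated separately from two perspectives: topical retrieval and opinion retrieval.
Four measures are therefore used: Topic MAP, Opinion MAP, Topic P@10, Opinion P@10.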
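The four measures follow from evaluating the same ranking against two judgment sets. The sketch below uses the standard textbook definitions of precision at k and average precision. The function names, document IDs, and judgment sets are made up for illustration, and this is not the official evaluation tool.

```python
def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top k results that are in the judged-relevant set."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant document appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(results):
    """Average of per-topic AP over (ranked list, relevant set) pairs."""
    return sum(average_precision(r, rel) for r, rel in results) / len(results)


ranked = ["a", "b", "c", "d"]
topic_relevant = {"a", "c"}   # judged relevant to the topic
opinion_relevant = {"c"}      # judged relevant and opinionated

print(average_precision(ranked, topic_relevant))
print(average_precision(ranked, opinion_relevant))
print(precision_at_k(ranked, topic_relevant, k=2))
```

Since opinionated documents are a subset of the relevant ones, the opinion scores for a given ranking cannot have more hits than the topic scores.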

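Baseline
The performance of information retrieval and opinion retrieval is measured with a language model (query likelihood model).
Each topic set shows a similar trend as the smoothing parameter μ varies.
Information retrieval and opinion retrieval are strongly correlated.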
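A query likelihood scorer with a smoothing parameter μ can be sketched as below. The slide does not name the smoothing method, so Dirichlet prior smoothing is an assumption here, chosen because μ is its conventional parameter. The toy collection, the function name, and the default `mu=2000.0` are likewise illustrative.

```python
import math
from collections import Counter


def query_likelihood(query_terms, doc_tokens, collection_counts,
                     collection_length, mu=2000.0):
    """Log P(query | document) under a Dirichlet-smoothed unigram model (assumed)."""
    doc_counts = Counter(doc_tokens)
    doc_length = len(doc_tokens)
    log_prob = 0.0
    for term in query_terms:
        p_collection = collection_counts.get(term, 0) / collection_length
        p = (doc_counts[term] + mu * p_collection) / (doc_length + mu)
        if p == 0.0:
            # Term unseen in the whole collection: the query cannot be generated.
            return float("-inf")
        log_prob += math.log(p)
    return log_prob


docs = [
    "iphone 5 is the best".split(),
    "the history of computer science".split(),
]
collection_counts = Counter(token for doc in docs for token in doc)
collection_length = sum(len(doc) for doc in docs)

for doc in docs:
    print(query_likelihood(["iphone"], doc, collection_counts, collection_length))
```

A larger μ pulls every document's estimate toward the collection model, so the gap between the two scores shrinks as μ grows.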

Baseline Correlation

Performance of TREC participants
Query expansion, phrasal indexing, passage-based search, ...

Reference (1)
Vechtomova, O. (2007). Using subjective adjectives in opinion retrieval from blogs. In TREC 2007: Proceedings of the Sixteenth Text REtrieval Conference, Gaithersburg, Maryland, USA.
Huang and Croft (2009). A unified relevance model for opinion retrieval. In Proceedings of the 18th ACM Conference on Information and Knowledge Management.
Zhang, M. and X. Ye (2008). A generation model to unify topic relevance and lexicon-based sentiment for opinion retrieval. In SIGIR 08: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, New York, NY, USA, pp. 411-418. ACM.
Lee, S.-W., J.-T. Lee, Y.-I. Song, K.-S. Han, and H.-C. Rim (2011). A new generative opinion retrieval model integrating multiple ranking factors. Journal of Intelligent Information Systems (JIIS), May 2011.
Kovacevic, M. and X. Huang (2008). York University at TREC 2008: Blog Track. In TREC 2008: Proceedings of the Seventeenth Text REtrieval Conference, Gaithersburg, Maryland, USA.
Clark, M., U. C. Beresi, S. Watt, and D. Harper (2006). RGU at the TREC blog track. In TREC 2006: Proceedings of the Fifteenth Text REtrieval Conference, Gaithersburg, Maryland, USA.
Hannah, D., C. Macdonald, J. Peng, B. He, and I. Ounis (2007). University of Glasgow at TREC 2007: Experiments in blog and enterprise tracks with Terrier. In TREC 2007: Proceedings of the Sixteenth Text REtrieval Conference, Gaithersburg, Maryland, USA.
Joshi, H., C. Bayrak, and X. Xu (2006). UALR at TREC: Blog track. In TREC 2006: Proceedings of the Fifteenth Text REtrieval Conference, Gaithersburg, Maryland, USA.
Hoang, L., S.-W. Lee, G.-W. Hong, J.-Y. Lee, and H.-C. Rim (2008). A Hybrid Method for Opinion finding Task (KUNLP at TREC 2008 Blog Track). In TREC 2008: Proceedings of the Seventeenth Text REtrieval Conference, Gaithersburg, Maryland, USA.

Reference (2)
Wiebe, J., T. Wilson, R. Bruce, M. Bell, and M. Martin (2004). Learning subjective language. Computational Linguistics 30(3), 277-308.
Yang, K., N. Yu, A. Valerio, and H. Zhang (2006). WIDIT in TREC-2006 Blog Track. In TREC 2006: Proceedings of the Fifteenth Text REtrieval Conference, Gaithersburg, Maryland, USA.
Oard, D., T. Elsayed, J. Wang, Y. Wu, P. Zhang, E. Abels, J. Lin, and D. Soergel (2006). TREC-2006 at Maryland: Blog, Enterprise, Legal and QA Tracks. In TREC 2006: Proceedings of the Fifteenth Text REtrieval Conference, Gaithersburg, Maryland, USA.
Turney, P. and M. Littman (2003). Measuring praise and criticism: inference of semantic orientation from association. ACM Transactions on Information Systems, Volume 21, pp. 315-346.
Kim, S.-M. and E. H. Hovy (2005). Automatic detection of opinion bearing words and sentences. In IJCNLP-05: Companion Volume to the Proceedings of the Second International Joint Conference on Natural Language Processing.
Bermingham, A., A. F. Smeaton, J. Foster, and D. Hogan (2008). DCU at the TREC 2008 Blog Track. In TREC 2008: Proceedings of the Seventeenth Text REtrieval Conference, Gaithersburg, Maryland, USA.
Na, S.-H. and H. T. Ng (2009). A 2-poisson model for probabilistic coreference of named entities for improved text retrieval. In Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 09, New York, NY, USA, pp. 275-282. ACM.
Zhou, G., H. Joshi, and C. Bayrak (2007). Topic categorization for relevancy and opinion detection. In TREC 2007: Proceedings of the Sixteenth Text REtrieval Conference, Gaithersburg, Maryland, USA.
Gerani, S., M. J. Carman, and F. Crestani (2010). Proximity-based opinion retrieval. In Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 10, New York, NY, USA, pp. 403-410. ACM.