Building an AI VoiceBot That Understands Heart and Voice. Ahn Tae-gyu (안태규), Deputy General Manager, and Lee Ji-eun (이지은), Manager; Client Solutions Professional, Data Science & AI, IBM Korea
Agenda: How to Start VoiceBot, Speech to Text, Text to Speech, Watson Assistant, SIP Service, Watson Voice Agent, Lessons Learned
Let's build a VoiceBot
Creating an IBM ID and AI service APIs: https://cloud.ibm.com/login
Creating an IBM ID and AI service APIs (Aibril): https://www.aibril.com/web/user/userjoin/createuserjoinchoice.do
Watson Speech to Text: Watson's Korean speech recognition API
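Speech To Text: Watson STT
Example utterance (about 4.3 seconds): "차가 퍼져서 갓길로 견인이 필요해" ("My car broke down and needs to be towed to the shoulder")
Input Streaming: real time; shortens the time needed to recognize the customer's utterance.
Request By File (MP3 / WAV): near real time; adds the time to scan and analyze the audio file.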
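Speech To Text - Demo
Narrowband (8 kHz): telephone conversations; the sampling rate used in call-center projects and similar work.
Broadband (16 kHz): ordinary speech and general conversation; used in AI speakers and similar devices.
https://speech-to-text-demo.ng.bluemix.net/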
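The narrowband/broadband split can be sketched as a small lookup. This is only an illustration: the helper name `choose_korean_model` is hypothetical, and the model identifiers follow the service's usual `ko-KR_BroadbandModel` / `ko-KR_NarrowbandModel` naming, which should be checked against the current model list.

```python
# Sketch: pick a Korean STT base model from the audio sampling rate.
# The helper is hypothetical; verify the model names against the service's model list.

def choose_korean_model(sample_rate_hz: int) -> str:
    """Return the base model name that matches the audio's sampling rate."""
    if sample_rate_hz >= 16000:
        return "ko-KR_BroadbandModel"   # general speech, e.g. AI speakers
    if sample_rate_hz >= 8000:
        return "ko-KR_NarrowbandModel"  # telephone / call-center audio
    raise ValueError(f"sampling rate too low for STT: {sample_rate_hz} Hz")


if __name__ == "__main__":
    print(choose_korean_model(8000))
    print(choose_korean_model(16000))
```

For a VoiceBot that answers phone calls, the audio arrives over the telephone network, so the narrowband model is the natural starting point.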
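Typical Speech To Text training-data creation process (1/2): Training data is produced either by writing transcripts for collected audio files or by recording audio for existing transcripts. Text data trains the LM (Language Model); audio data trains the AM (Acoustic Model). Data and AI Forum 2019
Typical Speech To Text training-data creation process (2/2): Without a person manually marking time frames in the audio files, lightly supervised learning (Light Supervised Learning) trains on the audio using the text supplied in advance. This shortens the time needed to create training data.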
Checking the STT credentials (IBM Cloud): copy the API Key and URL (they are needed later when calling the API).
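The curl commands in this deck pass the key as `-u "apikey:{apikey}"`, which is HTTP Basic authentication with the literal user name `apikey`. A minimal sketch of the equivalent header, assuming a hypothetical helper name `basic_auth_header`:

```python
import base64

# Sketch: build the Authorization header that curl's -u "apikey:{apikey}" produces.
# The helper name is hypothetical; the API key value comes from the IBM Cloud credentials page.

def basic_auth_header(api_key: str) -> dict:
    """Return an HTTP Basic auth header with the fixed user name 'apikey'."""
    raw = f"apikey:{api_key}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


if __name__ == "__main__":
    # Placeholder key for illustration only.
    print(basic_auth_header("my-api-key"))
```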
STT Acoustic Model Customization
1. Create a custom acoustic model (Acoustic Model)
    curl -X POST -u "apikey:{apikey}" --header "Content-Type: application/json" --data "{\"name\": \"Example acoustic model\", \"base_model_name\": \"ko-KR_BroadbandModel\", \"description\": \"Example custom acoustic model\"}" "https://stream.watsonplatform.net/speech-to-text/api/v1/acoustic_customizations"
Example curl run: the customization ID of the newly created custom AM is returned.
https://cloud.ibm.com/docs/services/speech-to-text?topic=speech-to-text-acoustic&locale=ko
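Escaping quotes by hand inside `--data` is error-prone, so it can help to build the JSON body programmatically. The sketch below only constructs the request object and does not send it; the function name is hypothetical, authentication is left out, and the base URL is the one quoted in this deck.

```python
import json
import urllib.request

# Base URL as quoted in this deck; a real instance URL may differ.
STT_URL = "https://stream.watsonplatform.net/speech-to-text/api"


def build_create_acoustic_model_request(name, description,
                                        base_model="ko-KR_BroadbandModel"):
    """Build (but do not send) the POST request that creates a custom acoustic model."""
    body = json.dumps({
        "name": name,
        "base_model_name": base_model,
        "description": description,
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{STT_URL}/v1/acoustic_customizations",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_create_acoustic_model_request(
        "Example acoustic model", "Example custom acoustic model")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Sending the request would additionally need the Basic auth header and a call such as `urllib.request.urlopen(req)`.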
STT Acoustic Model Customization
2. Add audio files to the custom acoustic model
(1) Adding a single audio file
    curl -X POST -u "apikey:{apikey}" --header "Content-Type: audio/wav" --data-binary @audio1.wav "https://stream.watsonplatform.net/speech-to-text/api/v1/acoustic_customizations/{customization_id}/audio/audio1"
(2) Adding several audio files as one zip file
    curl -X POST -u "apikey:{apikey}" --header "Content-Type: application/zip" --header "Contained-Content-Type: audio/l16;rate=16000" --data-binary @audio2.zip "https://stream.watsonplatform.net/speech-to-text/api/v1/acoustic_customizations/{customization_id}/audio/audio2"
Caution: mind the spacing in the command; a missing or extra space causes an error.
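Preparing the zip archive for option (2) can be scripted. The following is a sketch using Python's standard `zipfile` module; the function name `bundle_audio` is hypothetical, and the ASCII-only check reflects this deck's caution that Korean file names lead to an invalid-name error.

```python
import zipfile
from pathlib import Path


def bundle_audio(paths, archive_path):
    """Write the given audio files into one flat zip archive and return its path."""
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for p in map(Path, paths):
            # Reject non-ASCII names before adding the file to the archive.
            if not p.name.isascii():
                raise ValueError(f"use an English (ASCII) file name: {p.name}")
            # arcname drops directory components so the archive stays flat.
            zf.write(p, arcname=p.name)
    return Path(archive_path)
```

The resulting file can then be uploaded with `--data-binary @audio2.zip` as in the command above.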
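STT Acoustic Model Customization
3. Check the audio files added to the custom acoustic model
    curl -X GET -u "apikey:{apikey}" "https://stream.watsonplatform.net/speech-to-text/api/v1/acoustic_customizations/{customization_id}/audio/audio1"
Example curl run, showing the status of the audio resource: 1) processing in progress (audio being analyzed), 2) analysis complete.
* Caution: when using the curl command, a Korean audio file name causes an invalid error, so use English file names.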
STT Acoustic Model Customization
4. Train the custom acoustic model
    curl -X POST -u "apikey:{apikey}" "https://stream.watsonplatform.net/speech-to-text/api/v1/acoustic_customizations/{customization_id}/train"
Example curl run.
* Caution: an error occurs if the total audio duration is 10 minutes or less, or more than 12,000 minutes.
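A quick pre-check of the total audio duration can avoid a failed training request. This sketch takes the two thresholds exactly as stated on the slide; the service's precise limits may differ, and the function name `ready_to_train` is hypothetical.

```python
# Thresholds as stated on the slide (assumed, not verified against the service).
MIN_SECONDS = 10 * 60        # total audio must exceed 10 minutes
MAX_SECONDS = 12_000 * 60    # total audio must not exceed 12,000 minutes


def ready_to_train(durations_seconds):
    """Return True if the summed audio duration falls inside the slide's limits."""
    total = sum(durations_seconds)
    return MIN_SECONDS < total <= MAX_SECONDS


if __name__ == "__main__":
    print(ready_to_train([300, 200]))        # 500 s in total
    print(ready_to_train([300, 200, 400]))   # 900 s in total
```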
STT Language Model Customization
1. Create a custom language model (Language Model)
    curl -X POST -u "apikey:{apikey}" --header "Content-Type: application/json" --data "{\"name\": \"Example model\", \"base_model_name\": \"ko-KR_BroadbandModel\", \"description\": \"Example custom language model\"}" "https://stream.watsonplatform.net/speech-to-text/api/v1/customizations"
Example curl run: the customization ID of the newly created custom LM is returned.
https://cloud.ibm.com/docs/services/speech-to-text?topic=speech-to-text-languagecreate&locale=ko
STT Language Model Customization
2. Add a corpus to the custom language model
    curl -X POST -u "apikey:{apikey}" --data-binary @healthcare.txt "https://stream.watsonplatform.net/speech-to-text/api/v1/customizations/{customization_id}/corpora/healthcare"
Example curl run.
* Caution: make sure the text file is encoded as UTF-8.
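Korean text files saved on older systems are often EUC-KR or CP949 rather than UTF-8, so it is worth checking the corpus before uploading. A minimal sketch, with the hypothetical helper name `is_utf8`:

```python
def is_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(is_utf8("가".encode("utf-8")))
    print(is_utf8("가".encode("euc-kr")))
```

In practice the bytes would come from reading the corpus file in binary mode, for example `open("healthcare.txt", "rb").read()`.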
STT Language Model Customization
3. Add words to the custom language model
    curl -X POST -u "apikey:{apikey}" --header "Content-Type: application/json" --data "{\"words\": [{\"word\": \"HHonors\", \"sounds_like\": [\"힐튼아너스\", \"H. honors\"], \"display_as\": \"HHonors\"}, {\"word\": \"IEEE\", \"sounds_like\": [\"아이트리플이\"]}]}" "https://stream.watsonplatform.net/speech-to-text/api/v1/customizations/{customization_id}/words"
Speech that sounds like "힐튼아너스" is recognized as the word HHonors.
Speech that sounds like "H. honors" is also recognized as HHonors and displayed with the spelling HHonors.
Speech that sounds like "아이트리플이" is recognized as the word IEEE.
* Words can also be added from a JSON file.
* JSON file contents (words.json):
    {"words": [
      {"word": "HHonors", "sounds_like": ["hilton honors", "H. honors"], "display_as": "HHonors"},
      {"word": "IEEE", "sounds_like": ["I. triple E."]}
    ]}
* curl command (when using the JSON file):
    curl -X POST -u "apikey:{apikey}" --header "Content-Type: application/json" --data-binary @words.json "https://stream.watsonplatform.net/speech-to-text/api/v1/customizations/{customization_id}/words"
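When the word list grows, generating the JSON file from data is less error-prone than editing it by hand. The sketch below builds a payload with the same `word` / `sounds_like` / `display_as` fields used on this slide; the function name and the tuple layout of its input are assumptions made for illustration.

```python
import json


def make_words_payload(entries):
    """Build the JSON text for the words request.

    entries: iterable of (word, sounds_like_list, display_as_or_None) tuples.
    """
    words = []
    for word, sounds_like, display_as in entries:
        item = {"word": word, "sounds_like": list(sounds_like)}
        if display_as is not None:
            # display_as is optional, so it is only emitted when given.
            item["display_as"] = display_as
        words.append(item)
    # ensure_ascii=False keeps Korean pronunciations readable in the file.
    return json.dumps({"words": words}, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    print(make_words_payload([
        ("HHonors", ["힐튼아너스", "H. honors"], "HHonors"),
        ("IEEE", ["아이트리플이"], None),
    ]))
```

Writing the returned string to `words.json` with UTF-8 encoding gives a file that can be passed to curl with `--data-binary @words.json`.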
STT Language Model Customization
4. Check the corpora added to the custom language model
    curl -X GET -u "apikey:{apikey}" "https://stream.watsonplatform.net/speech-to-text/api/v1/customizations/{customization_id}/corpora"
Example curl run.
OOV (Out Of Vocabulary) words: the service parses the contents of the corpus file and extracts the words that are not in the base vocabulary.
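The idea of OOV extraction can be illustrated with a toy example. This is not how the service itself parses Korean text: whitespace tokenization, the punctuation list, and the tiny base vocabulary below are all simplifying assumptions, and `find_oov_words` is a hypothetical helper.

```python
def find_oov_words(corpus_text, base_vocabulary):
    """Return corpus words missing from the base vocabulary, in first-seen order."""
    seen = set()
    oov = []
    for token in corpus_text.split():
        word = token.strip(".,!?\"'()")  # drop surrounding punctuation
        if word and word not in base_vocabulary and word not in seen:
            seen.add(word)
            oov.append(word)
    return oov


if __name__ == "__main__":
    vocab = {"IBM", "멤버십"}
    print(find_oov_words("IBM HHonors 멤버십, IBM HHonors.", vocab))
```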
STT Language Model Customization
5. Train the custom language model
    curl -X POST -u "apikey:{apikey}" "https://stream.watsonplatform.net/speech-to-text/api/v1/customizations/{customization_id}/train"
Public APIs for STT customization: https://cloud.ibm.com/apidocs/speech-to-text
Watson Text to Speech: Watson's Korean speech synthesis API
Text To Speech - Demo: The Korean version of TTS is available by contacting IBM Korea directly, and it can also be tried through Aibril. https://speech-to-text-demo.ng.bluemix.net/ https://aibril-tts-demo-korean.sk.kr.mybluemix.net/
Checking the TTS credentials (Aibril): copy the URL, username, and password values (they are needed later when calling the API).
TTS Voice Synthesize
1. Korean speech synthesis: voice = youngmi, yuna
    curl -X POST -u "{username}" --header "Content-Type: application/json" --header "Accept: audio/wav" --data "{\"text\":\"hello world\"}" --output hello_world.wav "https://stream.aibril-watson.kr/text-to-speech/api/v1/synthesize?voice=yuna"
Example curl run: curl prompts for the password, then the response is written to hello_world.wav.
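The voice is selected through the `voice` query parameter. A small sketch that builds the synthesize URL and rejects voices other than the two named on this slide; the base URL is the Aibril endpoint quoted in this deck, and `synthesize_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Endpoint and voice names as quoted in this deck; other deployments may differ.
TTS_URL = "https://stream.aibril-watson.kr/text-to-speech/api"
KOREAN_VOICES = ("youngmi", "yuna")


def synthesize_url(voice: str) -> str:
    """Return the /v1/synthesize URL for one of the Korean voices."""
    if voice not in KOREAN_VOICES:
        raise ValueError(f"unknown Korean voice: {voice}")
    return f"{TTS_URL}/v1/synthesize?{urlencode({'voice': voice})}"


if __name__ == "__main__":
    print(synthesize_url("yuna"))
```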
Creating a TTS custom model
2. Create a Korean custom model
    curl -X POST -u "{username}" --header "Content-Type: application/json" --data "{\"name\":\"first Model\", \"language\":\"ko-KR\", \"description\":\"first custom voice model\"}" "https://stream.aibril-watson.kr/text-to-speech/api/v1/customizations"
Example curl run: the customization ID of the custom model is returned.
Creating a TTS custom model
3. Query the Korean custom model
    curl -X GET -u "{username}:{password}" "https://stream.aibril-watson.kr/text-to-speech/api/v1/customizations/{customization_id}"
Example curl run: the creation details of the custom model are returned.
TTS - Speech Synthesis Markup Language (SSML)
SSML is a markup language that controls speech synthesis by specifying pronunciation, volume, pitch, rate, and other attributes.
With SSML you can, for example, adjust the pitch or correct a specific pronunciation with sub alias.
/v1/synthesize using SSML:
    curl -X POST -u "{username}:{password}" --header "Content-Type: application/json" --data "{\"text\":\"ibm 에서 신규 법인폰을 개통하기 위해서는 <sub alias='유심'>USIM</sub> 구매가 필요합니다.\"}" --output hello_world3.ogg "https://stream.aibril-watson.kr/text-to-speech/api/v1/synthesize?voice=yuna"
The <sub> tag indicates that, when speech is synthesized, the text given in the alias attribute replaces the text enclosed in the tag. alias is the only attribute of this tag and it is required; an error occurs if it is not defined.
https://cloud.ibm.com/docs/services/text-to-speech?topic=text-to-speech-ssml
https://www.aibril.com/doc/texttospeech/008.html
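Building the `<sub>` element and the JSON body in code avoids the nested-quote escaping seen in the curl command. A sketch using only the standard library; `sub_alias` and `synthesize_body` are hypothetical helper names.

```python
import json
from xml.sax.saxutils import escape, quoteattr


def sub_alias(written: str, spoken: str) -> str:
    """Return a <sub> element that makes `written` be read aloud as `spoken`."""
    # quoteattr adds the surrounding quotes and escapes the attribute value.
    return f"<sub alias={quoteattr(spoken)}>{escape(written)}</sub>"


def synthesize_body(text_with_ssml: str) -> str:
    """Wrap text (optionally containing SSML) in the JSON body for /v1/synthesize."""
    return json.dumps({"text": text_with_ssml}, ensure_ascii=False)


if __name__ == "__main__":
    sentence = f"{sub_alias('USIM', '유심')} 구매가 필요합니다."
    print(synthesize_body(sentence))
```

`json.dumps` takes care of escaping the double quotes inside the SSML attribute, so the resulting string can be sent as the request body unchanged.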
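Watson Assistant: Watson's Korean conversational service. Intent, Entity, Dialog
ChatBot? VoiceBot? A chatbot supports a variety of UI and UX. But what about a voice bot? It has only the text used in the conversation. TEXT ONLY!!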
Watson Assistant Configuration
Watson Assistant
Intent Training
Entity Training
Dialog Configuration
SIP Service: Twilio Inbound Call
SIP (Session Initiation Protocol) Service: Twilio. Testing can be done with a free trial account from a provider such as Twilio. https://www.twilio.com/try-twilio https://cloud.ibm.com/docs/services/voice-agent?topic=voice-agent-connect&locale=ko#twilio-setup
SIP (Session Initiation Protocol) Service: Twilio. A service that receives phone calls (Inbound Call) can be implemented through SIP Trunking.
SIP (Session Initiation Protocol) Service: Twilio. Set the Voice Agent endpoint in the SIP service; a DR endpoint can be added if needed. Register the Voice Agent's SIP address under [Elastic SIP Trunking dashboard] - [Create SIP Trunk] - [Origination] - [SIP URI].
SIP (Session Initiation Protocol) Service: Twilio. In the [Numbers] tab, get a free US number through [Get Started with Phone Numbers], or purchase a Korean phone number through [Buy a Number]. The newly issued number must then be assigned under [Elastic SIP Trunking] - [Numbers]. Call logs and billing can be checked through Elastic SIP.
Watson Voice Agent
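What is Watson Voice Agent?
Watson Voice Agent setup: enter the phone number issued by the SIP service.
Watson Voice Agent setup, Conversation: select the Watson Assistant instance and enter its credentials.
Watson Voice Agent setup, Speech To Text: select the STT instance and enter its credentials.
Watson Voice Agent setup, Text To Speech: select the TTS instance and enter its credentials.
Watson Voice Agent setup, Event forwarding: if there is no Cloudant instance, select "Create new service instance".
Watson Voice Agent setup: voice agent configuration complete.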
Voice Agent > Event Forwarding
Three kinds of event forwarding are supported: Call Detail Record (CDR) events, Transcription events, and Watson Assistant turn events.
* Events can also be stored in a NoSQL DB or IBM Cloudant DB.
1) Call detail record (CDR) event: contains summary information for a single call, such as the start and end times, the reason the call ended, and details of the conversation transactions.
2) IBM Watson Assistant turn event: contains the immediate JSON response that Voice Gateway receives after each request to Watson Assistant.
3) Transcription event: published each time an utterance is detected; it includes the utterance text, the confidence score, and session information. Supported in version 1.0.0.2 and later.
Now let's make a phone call!
Thank You
Notices and disclaimers
Copyright 2019 by International Business Machines Corporation (IBM). No part of this document may be reproduced or transmitted in any form without written permission from IBM.
U.S. Government Users Restricted Rights: use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM.
Information in these presentations (including information relating to products that have not yet been announced by IBM) has been reviewed for accuracy as of the date of initial publication and could include unintentional technical or typographical errors. IBM shall have no responsibility to update this information. This document is distributed "as is" without any warranty, either express or implied. In no event shall IBM be liable for any damage arising from the use of this information, including but not limited to, loss of data, business interruption, loss of profit or loss of opportunity. IBM products and services are warranted according to the terms and conditions of the agreements under which they are provided.
IBM products are manufactured from new parts or new and used parts. In some cases, a product may not be new and may have been previously installed. Regardless, our warranty terms apply.
Any statements regarding IBM's future direction, intent or product plans are subject to change or withdrawal without notice.
Performance data contained herein was generally obtained in a controlled, isolated environments. Customer examples are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual performance, cost, savings or other results in other operating environments may vary.
References in this document to IBM products, programs, or services does not imply that IBM intends to make such products, programs or services available in all countries in which IBM operates or does business.
Workshops, sessions and associated materials may have been prepared by independent session speakers, and do not necessarily reflect the views of IBM. All materials and discussions are provided for informational purposes only, and are neither intended to, nor shall constitute legal or other guidance or advice to any individual participant or their specific situation.
It is the customer's responsibility to insure its own compliance with legal requirements and to obtain advice of competent legal counsel as to the identification and interpretation of any relevant laws and regulatory requirements that may affect the customer's business and any actions the customer may need to take to comply with such laws. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the customer is in compliance with any law.
Notices and disclaimers continued
Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products in connection with this publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. IBM does not warrant the quality of any third-party products, or the ability of any such third-party products to interoperate with IBM's products. IBM expressly disclaims all warranties, expressed or implied, including but not limited to, the implied warranties of merchantability and fitness for a particular, purpose.
The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents, copyrights, trademarks or other intellectual property right.
IBM, the IBM logo, ibm.com, Aspera, Bluemix, Blueworks Live, CICS, Clearcase, Cognos, DOORS, Emptoris, Enterprise Document Management System, FASP, FileNet, Global Business Services, Global Technology Services, IBM ExperienceOne, IBM SmartCloud, IBM Social Business, Information on Demand, ILOG, Maximo, MQIntegrator, MQSeries, Netcool, OMEGAMON, OpenPower, PureAnalytics, PureApplication, purecluster, PureCoverage, PureData, PureExperience, PureFlex, purequery, purescale, PureSystems, QRadar, Rational, Rhapsody, Smarter Commerce, SoDA, SPSS, Sterling Commerce, StoredIQ, Tealeaf, Tivoli, Trusteer, Unica, urban{code}, Watson, WebSphere, Worklight, X-Force and System z Z/OS, are trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at: www.ibm.com/legal/copytrade.shtml.