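Time Series Data Analysis through Machine Learning and Applications to Financial Market Prediction
Jaesik Choi, School of Electrical and Computer Engineering, Ulsan National Institute of Science and Technology (UNIST)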
Face Recognition
Facebook's face recognizer (DeepFace) achieves recognition performance close to that of humans.
Task: name the celebrity shown in a photo
Human accuracy: 97.5% vs. DeepFace accuracy: 97.35% (March 2014)
Object Recognition
ImageNet (http://image-net.org): more than 15 million images (22,000 object categories)
ImageNet Large Scale Visual Recognition Challenge (ILSVRC)
Task: for a given image, propose 5 of 1,000 object classes
Prediction error of pre-deep-learning systems (SVM, ensembles), 2012: > 26.2%
Deep learning systems:
- 2012 AlexNet (University of Toronto): 15.3%
- 2014 GoogLeNet (Google): 6.7%
- 2015 ResNet (Microsoft): 3.6%
- 2016 Trimps (Chinese security research institute): 3.0%
Go (AlphaGo)
AlphaGo (https://deepmind.com/alpha-go)
AlphaGo won its match against professional player Lee Sedol 4 to 1 (March 2016)
The Deep Learning Revolution - NVIDIA https://www.youtube.com/watch?v=dy0hjwltsye
Artificial Intelligence
What is the future of AI?
Projected impact of disruptive technologies on the world economy in 2025 (McKinsey, 2013)
Automation of Knowledge Work
Data collection / analysis / processing
Automation of Knowledge Work: Finance and Insurance
SOURCE: https://public.tableau.com/profile/mckinsey.analytics#!/vizhome/automationbysector/wheremachinescanreplacehumans
Financial Time Series Analysis
AI-based Startups for Fintech
"What is fear?" Cool-headed AI earns a 600% return while the stock market swings (ChosunBiz, March 18, 2017)
An Ensemble of Models for Stock Market Prediction
Problem: Predicting S&P 500 from 1992 to 2015
Model: Deep neural networks + gradient-boosted trees + random forests
- Before using AI (1992 ~ 2001/3): 200%
- Applying AI methods (2001/4 ~ 2008/8): 22%
- Crisis (2008/09 ~ 2009/12): 400%
- Recent models (2010/1 ~): 0%
Krauss et al., Deep neural networks, gradient-boosted trees, random forests: Statistical arbitrage on the S&P 500, European Journal of
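A minimal sketch of how three models can be combined into one ensemble signal. The slide only states that a deep neural network, gradient-boosted trees, and a random forest are combined; the equal weighting, the long/short ranking rule, the function name `ensemble_rank`, and all probabilities below are assumptions made for illustration, not figures from the cited study.

```python
def ensemble_rank(prob_by_model, k):
    """Average each stock's predicted 'outperform' probability across models,
    then pick the k highest-ranked stocks (long) and k lowest-ranked (short)."""
    tickers = list(next(iter(prob_by_model.values())))
    n_models = len(prob_by_model)
    # Equal-weight average of the per-model probabilities (assumed weighting).
    avg = {t: sum(p[t] for p in prob_by_model.values()) / n_models for t in tickers}
    ranked = sorted(tickers, key=lambda t: avg[t], reverse=True)
    return ranked[:k], ranked[-k:], avg


# Hypothetical next-day probabilities for four placeholder stocks.
predictions = {
    "deep_neural_network": {"A": 0.60, "B": 0.50, "C": 0.40, "D": 0.30},
    "gradient_boosted_trees": {"A": 0.70, "B": 0.40, "C": 0.50, "D": 0.20},
    "random_forest": {"A": 0.80, "B": 0.45, "C": 0.48, "D": 0.40},
}

long_side, short_side, averaged = ensemble_rank(predictions, k=1)
print("long:", long_side, "short:", short_side)
```

Averaging tends to smooth out the idiosyncratic errors of the individual models, which is the usual motivation for an ensemble of this kind.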
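Roughly a quarter of Goldman Sachs's approximately 35,000 employees are computer engineers (Goldman Sachs CFO, 2017 CSE Symposium)
"Traders cut from 600 to 2: Goldman Sachs becomes an IT company" (Economy Chosun, February 22, 2017)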
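The Relational Automatic Statistician (UNIST)
An AI system that analyzes changes in, and relationships among, multiple time series
Training: multiple stock time series + Bayesian multiple kernel learning
Query: stock data + analysis of relationships between stocks
Output: automatic report writing / change prediction
UNIST system: 40% lower prediction error than the MIT/Cambridge analysis system (June 2016)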
Deep Learning
Perceptron
Nonlinear Transform
Linearly Separable Classes in a Multilayer Perceptron
After a nonlinear transformation, red and blue are linearly separable*
* Y. LeCun, Y. Bengio, G. Hinton (2015). Deep Learning. Nature 521, 436-444.
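The point about nonlinear transformations can be illustrated with a small sketch. The XOR data set, the hand-picked product feature `x1 * x2`, and the helper names are assumptions chosen for illustration; in a multilayer perceptron the transformation is learned by the hidden layers rather than written by hand.

```python
def perceptron_train(samples, labels, epochs=100):
    """Classic perceptron rule. Returns (weights, bias, converged), where
    converged is True if some epoch finished without a single mistake."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            pred = 1 if score > 0 else 0
            if pred != y:
                delta = y - pred  # +1 or -1
                w = [wi + delta * xi for wi, xi in zip(w, x)]
                b += delta
                errors += 1
        if errors == 0:
            return w, b, True
    return w, b, False


# XOR: the two classes cannot be split by a line in the original 2-D space.
points = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 0]


def transform(p):
    """Hand-picked nonlinear map: append the product of the two inputs."""
    x1, x2 = p
    return (x1, x2, x1 * x2)


_, _, ok_raw = perceptron_train(points, labels)
w, b, ok_lifted = perceptron_train([transform(p) for p in points], labels)
print("separable in raw space:", ok_raw)
print("separable after transform:", ok_lifted)
```

In the raw space the perceptron keeps making mistakes for every epoch, while in the transformed space it reaches a mistake-free pass after a few epochs.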
How Does Deep Learning Work? https://www.youtube.com/watch?v=he4t7zekob
Learning Feature Hierarchy - Examples
Recognizing drawings
Recognizing human faces
LeNet-5 (1989) vs GoogLeNet (2014)
LeNet-5: recognizing digits using a neural network with 5 layers
Recognizing human faces
Deep Learning Artistic Style
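Soybean Yield Prediction from Satellite Imagery (United States)
AI-based soybean yield prediction from satellite imagery at the city / county level [US USDA: forecasts at the state level]
Soybean yield forecast (Bu/Ac)
          2016/08   2016/09   2016/10
KERNEL    > 50      51.0      52.1
USDA      48.9      50.6      51.4
http://www.telluslabs.com/2016/10/12/telluslabs-forecasts-lead-usda-reports-corn-soy/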
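Energy Use (Carbon Emission Reduction)
https://environment.google/approach/
Google says that it emits 1.5m tonnes of carbon annually but claims that its data centres consume 50% less energy than the industry average.
Need: Energy consumption must be reduced, and more efficient ways of using energy found, in order to cut greenhouse gas emissions.
Details: Build a neural network model from citizens' energy usage data (temperature, usage, delivery speed, etc.).
Outcome: The trained model can be used to look for ways to reduce energy use.
Technologies used: Machine Learning / Deep Learning / Optimization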
Energy Use (Carbon Emission Reduction)
www.google.com/about/datacenters/efficiency/internal/assets/machine-learning-applicationsfor-datacenter-optimization-finalv2.pdf
1st-ranked model in Kaggle
https://www.kaggle.com/c/otto-group-product-classification-challenge/discussion/14335#133321
Predicting Markets with Google Trends
Problem: Invest in stocks based on Google search keywords
Goal: Maximize return
Rule: Sell stocks at the close of the first trading day of the week when the search volume for "debt" is above its 3-week average. Otherwise, buy stocks at the close of the first trading day of the week.
T. Preis, H. S. Moat and H. E. Stanley, Quantifying Trading Behavior in Financial Markets Using Google Trends, Scientific Reports, 2013.
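The trading rule can be sketched in a few lines. This is a simplified reading of the slide, not the authors' code: it assumes weekly data, compares last week's search volume with the mean of the three weeks before it, and closes every position one week later. The function name `trends_strategy` and all numbers are hypothetical.

```python
def trends_strategy(search_volume, prices, window=3):
    """Apply the weekly rule and return (positions, cumulative wealth).

    search_volume[t] : relative search volume for the keyword in week t
    prices[t]        : closing price on the first trading day of week t
    """
    positions = []
    wealth = 1.0
    for t in range(window + 1, len(prices) - 1):
        last_week = search_volume[t - 1]
        # Mean of the `window` weeks that precede last week.
        baseline = sum(search_volume[t - 1 - window:t - 1]) / window
        if last_week > baseline:
            # Rising interest: sell at p(t), buy back at p(t + 1).
            positions.append("short")
            wealth *= prices[t] / prices[t + 1]
        else:
            # Falling or flat interest: buy at p(t), sell at p(t + 1).
            positions.append("long")
            wealth *= prices[t + 1] / prices[t]
    return positions, wealth


# Hypothetical weekly data, for illustration only.
volume = [10, 10, 10, 20, 5, 5, 5, 30]
prices = [100.0, 100.0, 100.0, 100.0, 100.0, 80.0, 100.0, 110.0]

positions, wealth = trends_strategy(volume, prices)
print(positions, round(wealth, 5))
```

With this toy data the spike in week 3 triggers a short position for week 4, and the quieter weeks that follow produce long positions.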
Dataminr Event Detection Technology
Source: KB Knowledge Vitamin (KB 지식비타민), September 2015
Reading Texts for Predicting the Future
Figure components: smooth function (length scale: y weeks); rapidly varying smooth function (length scale: z hours); constant function; sudden drop between 9/12/01 and 9/15/01
Text sources: quarterly reports, news
Deep Learning and Time Series Data Analysis
Residual Network (ResNet, He et al., 2015)
Residual learning
3.6% error in the ImageNet Challenge, 2015
Comparison of ResNet
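The idea of residual learning can be sketched as follows: the block adds its input back to the output of a small learned function, so the layers only need to model the residual F(x). This is a simplified illustration using fully connected layers in plain Python; the actual networks use convolutions, batch normalization, and a final activation, and the names and weights below are assumptions.

```python
def relu(v):
    return [max(0.0, a) for a in v]


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def residual_block(x, w1, w2):
    """Compute y = x + F(x), with F(x) = W2 * relu(W1 * x).
    The term 'x +' is the skip connection."""
    fx = matvec(w2, relu(matvec(w1, x)))
    return [xi + fi for xi, fi in zip(x, fx)]


x = [1.0, -2.0]

# With all-zero weights, F(x) = 0 and the block reduces to the identity map.
zeros = [[0.0, 0.0], [0.0, 0.0]]
y_identity = residual_block(x, zeros, zeros)

# With small non-zero weights, the block applies a correction on top of x.
w1 = [[1.0, 0.0], [0.0, 1.0]]
w2 = [[0.5, 0.0], [0.0, 0.5]]
y = residual_block(x, w1, w2)
print(y_identity, y)
```

Because the identity mapping is available for free, stacking many such blocks does not force the signal through every nonlinear layer, which is one common explanation for why very deep residual networks remain trainable.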
Densely Connected Convolutional Networks (DenseNet, Huang et al., 2016)
Better performance than ResNet
CIFAR-10 error: 3.74% (ResNet: 4.62%)
CIFAR-100 error: 19.25% (ResNet: 22.71%)
Recurrent Convolutional Neural Layers (RCNN, Liang and Hu, 2015)
Recurrent Convolutional Layer (RCL)
* Figure is drawn by Subin Yi
RCNN on EEG Analysis
Events: Hand Start, First Digit Touch, Lift Off, Replace, Both Released
* Joint work with Azamatbek Akhmedov
Luciw et al., Multi-channel EEG recordings during 3,936 grasp and lift trials with varying weight and friction, Scientific Data (Nature), 2014
RCNN on EEG Analysis
One chunk of data: (3584, 32)
RCNN on EEG Analysis
Convolutional layer: (1, 3584)
Applying RCL, each followed by max pooling:
- RCL: (1, 896)
- RCL: (1, 224)
- RCL: (1, 56)
- RCL: (1, 14)
- Max pooling: (1, 7)
Fully connected (6)
97.687%
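The layer sizes above can be checked with a short calculation. The pooling factors (4, 4, 4, 4, 2) are not stated on the slide; they are inferred here from the listed lengths, and the sketch assumes that the convolutional and recurrent convolutional layers keep the temporal length unchanged.

```python
def trace_lengths(input_length, pool_sizes):
    """Return the temporal length after each max-pooling step."""
    lengths = [input_length]
    for p in pool_sizes:
        if lengths[-1] % p != 0:
            raise ValueError(f"length {lengths[-1]} is not divisible by {p}")
        lengths.append(lengths[-1] // p)
    return lengths


# Assumed pooling factors, inferred from the lengths listed on the slide.
pool_sizes = [4, 4, 4, 4, 2]
lengths = trace_lengths(3584, pool_sizes)
print(lengths)  # [3584, 896, 224, 56, 14, 7]
```

The final length of 7 is what the fully connected layer with 6 outputs receives along the time axis.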
Manufacturing: Complex Systems
Time series data from about 400 sensors
Deep learning model
Molten iron temperature prediction
Grouped CNN / Grouped RCNN (Yi, Ju and Choi, 2017)
Grouped CNN / Grouped RCNN (Yi, Ju and Choi, 2017)
Data collected from 148 sensors over 12,654 time steps
Dataset: US Groundwater (Yi, Ju and Choi, 2017)
Collected from 88 sites over 28 years
Dataset: US Groundwater (Yi, Ju and Choi, 2017)
Figure panels: Groundwater, Drone
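Conclusion
- Advances in AI and machine learning are expected to have a large impact on the recognition, analysis, and prediction of time series data.
- Progress in deep learning for image and video recognition also benefits time series recognition.
- Recently, deep learning methods that make effective use of skip layers have shown good performance.
- Besides treating time series with dynamic models (RNN/LSTM), models that apply a CNN to specific segments of a time series also appear to be effective.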
Thank you jaesik@unist.ac.kr