Machine Learning Credit Scoring
Prof. Choi Dae-woo (최대우), Department of Statistics, Hankuk University of Foreign Studies
Emergence of Disruptive New Business: Attack by FinTech Startups
Emergence of Disruptive New Business
Deep Dive - Airbnb: social, mobile
Deep Dive - Airbnb: analytics, cloud
SMAC Business Advantage
Circulation Apps & Sensor Analytics Big Data
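Scoring 101
Step 1: Collect data from existing customers (Y = 0 for a bad customer, Y = 1 otherwise).
Step 2: Fit a model Y = f(income, occupation, etc.).
[Figure: Default (0 to 1.2) plotted against Income (0 to 50,000), with normal and default cases marked]
Step 3: For each new loan applicant, compute the probability of being a bad customer.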
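The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the model used in the lecture: the customer data are synthetic, income is the only predictor and is expressed in units of 10,000, and f is assumed to be a logistic regression fitted by plain gradient descent. The names `fit_logistic` and `prob_bad` are hypothetical.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(xs, ys, lr=0.1, epochs=1000):
    """Fit P(Y = 1 | x) = sigmoid(b0 + b1 * x) by batch gradient descent."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y  # gradient of the log-loss
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

# Step 1: "existing customers" (synthetic; Y = 0 bad, Y = 1 otherwise).
rng = random.Random(42)
incomes = [rng.uniform(0.0, 5.0) for _ in range(300)]  # units of 10,000
labels = [1 if rng.random() < sigmoid(-2.0 + 1.2 * x) else 0 for x in incomes]

# Step 2: fit Y = f(income).
b0, b1 = fit_logistic(incomes, labels)

# Step 3: probability that a new applicant is a bad customer.
def prob_bad(income):
    return 1.0 - sigmoid(b0 + b1 * income)

print(round(prob_bad(0.5), 3), round(prob_bad(4.5), 3))
```

In this synthetic setup higher income raises the probability of Y = 1, so `prob_bad` falls as income rises. A real scoring model would use many predictors and a held-out validation sample.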
Scoring 101 - Scorecard
William Fair, Earl J. Isaac (1921~1983)
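New Technology, New Paradigm
Deep Learning, Real-Time Analytics, Machine Learning, Big Data, FinTech, Social Media, Data Science, Artificial Intelligence
Rather than vague fear of, or longing for, disruptive innovation built on new technology, what matters is accurate theoretical understanding and experimental trials.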
Modeling Culture
Definition of model
The data modeling culture: response variables = f(predictor variables, random noise, parameters).
[Diagram: x -> (linear regression, logistic regression, Cox model) -> y]
The algorithmic modeling culture: the approach is to find a function f(x), an algorithm that operates on x to predict the response y.
[Diagram: x -> (unknown) -> y, approximated by decision trees and neural nets]
Modeling Culture
The data modeling culture: fitting
The algorithmic modeling culture: training
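The contrast between the two cultures can be made concrete with a toy example. The sketch below is only an illustration on made-up data. `fit_line` assumes a parametric form y = a + b x + noise and estimates its parameters (fitting). `train_stump` treats the mechanism as unknown and searches for a one-split decision rule that predicts y (training). Both function names are hypothetical.

```python
def fit_line(xs, ys):
    """Data modeling: assume y = a + b*x + noise, estimate a and b by least squares."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b

def train_stump(xs, ys):
    """Algorithmic modeling: search for the single split that minimizes squared error."""
    best = None
    for t in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    return best[1], best[2], best[3]

def sse(predict, xs, ys):
    """Sum of squared prediction errors."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys))

# Made-up data with a step at x = 5, which a straight line cannot capture.
xs = list(range(10))
ys = [0.0 if x < 5 else 1.0 for x in xs]

a, b = fit_line(xs, ys)
t, lm, rm = train_stump(xs, ys)

line_sse = sse(lambda x: a + b * x, xs, ys)
stump_sse = sse(lambda x: lm if x < t else rm, xs, ys)
print(line_sse, stump_sse)
```

The data were chosen to favor the stump. On data that really are linear, the parametric fit would do as well or better and its coefficients would be directly interpretable.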
Candidate Algorithms and Random Grid Search for Finding the Best Model
Of the algorithms provided by H2O, three that perform well on scoring are modeled over many combinations of parameter settings.
Machine Learning Platform
Random Forest: decision tree depth, number of decision trees, sample rate
Gradient Boosting Machine: decision tree depth, number of decision trees, learning rate, sample rate
Deep NN: hidden layers and number of nodes, dropout ratio, epochs
Random Grid Search
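The idea of a random grid search can be sketched without any ML library: list every combination of the parameter grid, draw a random subset, evaluate each draw, and keep the best. The code below is a sketch under assumptions and does not call H2O. The grid keys borrow H2O-style GBM parameter names for readability, and `placeholder_score` is a made-up stand-in for training a model and measuring its validation performance.

```python
import itertools
import random

def random_grid_search(grid, evaluate, max_models, seed=0):
    """Evaluate a random subset of the full parameter grid, best score first."""
    names = sorted(grid)
    combos = [dict(zip(names, values))
              for values in itertools.product(*(grid[n] for n in names))]
    rng = random.Random(seed)
    sampled = rng.sample(combos, min(max_models, len(combos)))  # no repeats
    results = [(evaluate(params), params) for params in sampled]
    results.sort(key=lambda item: item[0], reverse=True)
    return results

# Grid over the GBM settings listed above (values are illustrative).
gbm_grid = {
    "max_depth": [3, 5, 7, 9],         # decision tree depth
    "ntrees": [50, 100, 200],          # number of decision trees
    "learn_rate": [0.01, 0.05, 0.1],   # learning rate
    "sample_rate": [0.6, 0.8, 1.0],    # row sample rate
}

def placeholder_score(params):
    """Hypothetical stand-in for 'train a model, return validation AUC'."""
    return (0.80
            - 0.01 * abs(params["max_depth"] - 5)
            - abs(params["learn_rate"] - 0.05)
            - 0.05 * abs(params["sample_rate"] - 0.8))

results = random_grid_search(gbm_grid, placeholder_score, max_models=10, seed=1)
best_score, best_params = results[0]
print(best_score, best_params)
```

The full grid here has 108 combinations, and the search evaluates only 10 of them. That trade-off is the point of a random search when each evaluation means training a model.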
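Issues in ML-Based Credit Scoring
Approval issues for the risk management framework
Resistance to new methods
The required information must be produced in real time (variety, external information, etc.)
The issue of explaining the reasons for rejection to rejected applicants
ML has problems as a replacement for the scorecard
Communication with supervisory authorities
Explainability
Real-time response
Decision making centered on customer benefit
Loan product applications now take place online and on mobile
Emphasis on discovering hidden GOOD cases rather than searching for BAD cases
However, this raises the issue of increased risk, and it is tied to lower portfolio quality and the resulting increase in loan-loss provisions.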
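Use Case - Combining with a Machine Learning Score
Combining the scorecard with a score produced by a machine learning algorithm has the following advantage: the ML score serves as new information that helps overcome the limits of the existing score, so additional good customers can be identified within the middle grades.
Scorecard + machine learning score combined model
[Figure: scorecard score against score by machine learning. The scorecard score band 661-675 has a bad rate of 6.2%; split by ML score, the bad rate ranges from 1.5% to 11.6%.]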
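The effect shown in the figure, one scorecard band splitting into segments with very different bad rates, can be sketched as follows. The applicant records are made up for illustration and do not reproduce the slide's figures. The ML score is assumed to run from 0 to 1 with higher meaning better, and the cut-off of 0.5 and the function names are hypothetical.

```python
def bad_rate(records):
    """Share of records flagged bad (third field equals 1)."""
    return sum(r[2] for r in records) / len(records)

def split_band_by_ml(records, low, high, ml_cut):
    """Bad rate of one scorecard band, overall and split at an ML score cut-off."""
    band = [r for r in records if low <= r[0] <= high]
    high_ml = [r for r in band if r[1] >= ml_cut]
    low_ml = [r for r in band if r[1] < ml_cut]
    return bad_rate(band), bad_rate(high_ml), bad_rate(low_ml)

# (scorecard score, ML score, is_bad) -- made-up records.
applicants = [
    (661, 0.95, 0), (663, 0.91, 0), (664, 0.88, 0), (666, 0.84, 0),
    (668, 0.79, 0), (670, 0.72, 1), (672, 0.66, 0), (675, 0.58, 0),
    (662, 0.41, 1), (667, 0.33, 0), (671, 0.27, 1), (674, 0.12, 0),
    (640, 0.20, 1), (700, 0.90, 0),  # outside the 661-675 band, ignored
]

overall, high_ml, low_ml = split_band_by_ml(applicants, 661, 675, ml_cut=0.5)
print(overall, high_ml, low_ml)
```

Within a band that the scorecard treats as homogeneous, the high-ML segment is where additional good customers would be sought.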
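Case Study - Combining with a Machine Learning Score
This case combines the scorecard and the machine learning score to predict GOOD.

Retention rate by combined grade

| RF grade / Scorecard grade | Grade 1 (99.0%) | Grade 2 (98.0%) | Grade 3 (97.0%) | Grade 4 (95.5%) | Grade 5 (93.8%) | Grade 6 (91.6%) | Grade 7 (88.8%) | Grade 8 (84.3%) | Grade 9 (79.7%) | Grade 10 (74.8%) | Grade 11 (69.1%) | Grade 12 (63.0%) | Grade 13 (55.0%) | Grade 14 (46.1%) | Grade 15 (31.1%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Grade 1 | 96.15% | 97.54% | 97.03% | 96.52% | 96.12% | 96.58% | 95.08% | 100.00% | 100.00% | - | - | - | - | - | - |
| Grade 2 | 100.00% | 98.51% | 94.62% | 94.37% | 96.08% | 94.69% | 95.51% | 94.52% | 100.00% | - | - | - | - | - | - |
| Grade 3 | - | 95.00% | 95.74% | 92.79% | 95.15% | 91.61% | 92.32% | 88.20% | 86.49% | 83.33% | 0.00% | - | - | - | - |
| Grade 4 | 100.00% | 86.67% | 88.89% | 90.21% | 95.64% | 91.49% | 91.80% | 89.08% | 85.06% | 87.50% | 100.00% | 100.00% | - | - | - |
| Grade 5 | - | 100.00% | 100.00% | 92.42% | 89.61% | 90.85% | 89.07% | 88.68% | 90.64% | 91.53% | 86.67% | 100.00% | - | - | - |
| Grade 6 | - | 100.00% | 88.89% | 92.86% | 90.18% | 87.32% | 87.10% | 85.55% | 85.93% | 85.83% | 89.74% | 60.00% | - | - | - |
| Grade 7 | - | - | 100.00% | 63.64% | 89.71% | 88.17% | 84.22% | 83.41% | 85.22% | 82.87% | 73.97% | 90.48% | 100.00% | 100.00% | - |
| Grade 8 | - | 100.00% | 83.33% | 83.33% | 78.95% | 87.39% | 84.29% | 82.64% | 80.56% | 77.48% | 78.77% | 73.33% | 62.50% | 100.00% | - |
| Grade 9 | - | - | 50.00% | 88.89% | 80.00% | 77.94% | 78.95% | 77.96% | 77.83% | 75.42% | 73.18% | 68.71% | 69.77% | 63.64% | 100.00% |
| Grade 10 | - | 100.00% | 100.00% | 75.00% | 72.73% | 56.86% | 58.54% | 61.45% | 60.20% | 59.38% | 59.65% | 47.69% | 44.37% | 35.67% | 35.29% |

Composition share by combined grade

| RF grade / Scorecard grade | Grade 1 | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Grade 6 | Grade 7 | Grade 8 | Grade 9 | Grade 10 | Grade 11 | Grade 12 | Grade 13 | Grade 14 | Grade 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Grade 1 | 0.16% | 1.27% | 2.62% | 3.14% | 1.85% | 0.73% | 0.19% | 0.03% | 0.01% | - | - | - | - | - | - |
| Grade 2 | 0.02% | 0.21% | 0.81% | 2.16% | 3.34% | 2.35% | 0.83% | 0.23% | 0.04% | - | - | - | - | - | - |
| Grade 3 | - | 0.06% | 0.29% | 1.00% | 2.45% | 3.42% | 2.07% | 0.56% | 0.12% | 0.04% | 0.00% | - | - | - | - |
| Grade 4 | 0.00% | 0.05% | 0.17% | 0.45% | 1.36% | 3.15% | 3.16% | 1.26% | 0.27% | 0.12% | 0.01% | 0.00% | - | - | - |
| Grade 5 | - | 0.01% | 0.06% | 0.21% | 0.72% | 1.94% | 3.74% | 2.45% | 0.63% | 0.18% | 0.05% | 0.00% | - | - | - |
| Grade 6 | - | 0.00% | 0.03% | 0.13% | 0.35% | 1.08% | 2.93% | 3.33% | 1.62% | 0.40% | 0.12% | 0.02% | - | - | - |
| Grade 7 | - | - | 0.00% | 0.07% | 0.21% | 0.58% | 1.94% | 3.50% | 2.60% | 0.78% | 0.23% | 0.07% | 0.02% | 0.01% | - |
| Grade 8 | - | 0.00% | 0.02% | 0.04% | 0.12% | 0.35% | 1.03% | 2.91% | 3.44% | 1.41% | 0.56% | 0.09% | 0.02% | 0.01% | - |
| Grade 9 | - | - | 0.01% | 0.03% | 0.05% | 0.21% | 0.53% | 1.74% | 3.05% | 2.59% | 1.12% | 0.46% | 0.13% | 0.07% | 0.00% |
| Grade 10 | - | 0.00% | 0.01% | 0.01% | 0.03% | 0.16% | 0.26% | 0.56% | 1.59% | 2.33% | 2.15% | 1.35% | 0.91% | 0.53% | 0.11% |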
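Both tables above are cross-tabulations of two grade assignments: one gives the retention rate in each cell, the other gives each cell's share of all customers. A minimal sketch of that computation follows. The records are made up, and because the source does not make clear which grading system runs along the rows, the fields are simply called `row_grade` and `col_grade` (hypothetical names).

```python
from collections import defaultdict

def cross_tab(records):
    """Per-cell retention rate and composition share for (row_grade, col_grade)."""
    counts = defaultdict(int)
    kept = defaultdict(int)
    for row_grade, col_grade, retained in records:
        cell = (row_grade, col_grade)
        counts[cell] += 1
        if retained:
            kept[cell] += 1
    total = len(records)
    return {
        cell: {"retention_rate": kept[cell] / n, "share": n / total}
        for cell, n in counts.items()
    }

# (row_grade, col_grade, retained) -- made-up records.
customers = [
    (1, 1, True), (1, 1, True),
    (1, 2, True), (1, 2, False),
    (2, 2, False), (2, 2, True), (2, 2, True),
    (2, 3, False),
]

table = cross_tab(customers)
for cell in sorted(table):
    print(cell, table[cell])
```

Cells with very few customers give unstable rates. The 0.00% and 100.00% entries in sparsely populated cells of the tables above are most likely the same small-sample effect, so the share table should be read alongside the rate table.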
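Technologies Required in the Big Data Era
Big Data era
Data source: Social + External
Data form: Unstructured
Data collected: Event captured
Use of analysis results: Dynamic data visualization + Analytics in war room
Analysis environment: Distributed process + cloud
Required technologies
Data processing and computing technology based on distributed processing
Design and development of business rules for real-time response
Analysis automation using a statistical engine
Data Visualization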
Data Analytic Process
Data Analytic Tools
Who is Data Scientist?
Q & A