IBM Software Group 2003. 11. 13 이지은 IBM Information management 팀
Agenda
Part 1 데이터웨어하우징동향 : 현재와미래
Trends in Data Warehousing 대용량 + 활용성 + 실시간 1. Data Warehouse Reengineering or Rebuilding Rigorous Measurement of RO I 2. Data Warehouse is essential in Customer Relationship Management 3. Integration 4. Proliferation of Data Sources 5. Growing number of end users 6. More Complex Queries 7. Exploding Data Volumes 8. Data Warehouse for real time analysis and actions 9. Increased Analytics
Mid-1980s: Data Warehousing 일관성 Business data warehouse Reconciling disparate data Data marts Single version of the truth Creates historical record Characteristics Historical data Meta data Separation of informational & operational needs Business data warehouse Structured data Unidirectional data flow Trusted sources Basic technical metadata Operational systems
Late-1990s: Operational data stores and data marts Immediate information Operational data store (ODS) Near real-time Integrating related data Relative/partial truth Characteristics Recent and historical data Merging of informational & operational needs Structured data (largely) Bidirectional data flow Trusted sources Comprehensive technical metadata Meta data (Operational) Data marts Business data warehouse Operational data store Operational systems
The vision: Comprehensive integration of information Integrated information Real-time knowledge Integrating all information Complete truth Characteristics Immediate and historical data Fully merged informational & operational needs Structured and unstructured data Bidirectional data flow and access Intelligent caching Trusted and untrusted sources Complete business & technical metadata Meta data (Operational) Data marts Business data warehouse Information integration Operational data store Operational systems Untrusted & unstructured sources (e.g. Internet)
Three ways of extending a data warehouse 1. Accessing and joining real-time data From databases, applications and queues 2. Accessing and joining unstructured data From content stores, web-pages and e-mails 3. Combining data from multiple data marts From disparate data warehouses
Extending DW : Federation allows access to and joining of real-time data with an existing warehouse. Client BI Tool Application Data marts BDW Information Integrator ODS DBMS Application Database Database Operational systems Existing Operational Systems
Client IBM Software Group Information Management Software Extending DW: Federation allows access to XML and unstructured content. BI Tool Data marts BDW Information Integrator ODS XML DBMS Content Operational systems Existing Content Systems
Extending DW: Federation enables joining existing data marts (and data warehouses) together. Client BI Tool Data marts Information Integrator BDW RDDBS ODS Data Mart BDW Operational systems Second data warehouse
Data warehousing has evolved beyond static data. IBM Information Integration delivers information Mid-80s: Data warehouse Point-in-time structured data Integrated within the business intelligence environment Mid-90s: Operational data store Near real-time structured data Integrated within a limited internal environment Today: Information integration => Extending DW Real-time structured and unstructured information Integrated wherever required
Part 2 실시간비즈니스인텔리젼스 : Embedded analysis
Everywhere : Demands of real-time Executive Dashboards Key Performance Indicators SCM, ERP, CRM Balanced Scorecar d Business Performance Management Operational BI Real Time Enterprise Zero Latency Enterprise Real Time BI Real Time Enterprise Information Integration Integration Brokers Enterprise Application Integration
Real-time BI and lifespan of data Operational System EAI / Eii ETL & Data Warehouse Real Time BI tape This Minute Tonight 12-36 months Archives
Real time BI Major factors Parallel ETL Engines MQSeries queues Continuous Loading Concurrent User Queries ODS Warehouse Embedded Analysis with data mining, rules, campaigns Consumers Replication Web services Information Integration Alerts, triggers, KPIs, Analytics Corporate Dashboard
Real time ETL needs Parallelism & Non-stop Loading BI users MQ new data Import Transforms Aggregate Load www Call Center
Real time BI Federation Client BI Tool Monitors & Dashboards wrapper Application Data marts Information integration Federation Data Warehouse ODS Operational systems wrapper DBMS Database wrapper Application Operational Systems
Real-time BI with Embedded analysis 애플리케이션에 BI 의로직이탑재됨을의미합니다. 실시간으로분석하고그결과를사용자가자신의애플리케이션에서보도록합니다. BI Reports Portal KPIs Call Center Web Site Data Mining Data Warehouse Production Database
Real time BI things to consider Style of application & Use OLTP Integration or BI 10 records integrated or 10M Determine level of detail required Determine latency required How quickly can the LOB use the knowledge? Will business processes have to change? Is the price worth the performance? Take only what you need Real time is expensive LOB must be able to take action real time parallel transform & load Warehouse change data capture MQSeries
Part 3 Real Time Enterprise 를위한 BI 플랫폼 : IBM Data Warehouse Edition
The Framework for Business Intelligence SQL XML PMML Administration OLAP Mining Statistics ETL Information Integration
DWE ( Data Warehouse Edition) the BI Framework H y p e r i o n Client & ISV Applications Catalog OLAP & RDBMS Metadata Java SP SP SQL Cube View Cube View Detail Data Web Services XML MQT on MDC federated data access Sybase SQL Server Oracle IDS VSAM IMS etc.
Data Warehouse Edition Components Enterprise Server Edition Database Partitioning Feature Cube Views Intelligent Miner Scoring, Modeling & Visualization Warehouse Center & Manager Office Connect Enterprise Edition Query Patroller Information Integrator Foundation of BI platform Shared-nothing scalability Unified engine for OLAP Data mining inside ETL transforms & scheduling Excel connectivity to Workload management Real time DW
IBM DWE for Extending DW UDB Information Integrator DataStage parallel extender/ Warehouse Manager Information Integrator UDB Extending DW 데이터통합 marts ETL Data Stage Warehouse Manager
UDB : Optimized platform for BI 고가용성 Fail-over 를지원하여안정적인서비스제공및사례를통한우수성입증 고성능 Performance 대용량데이터를위한병렬처리및 Optimization 복잡한질의, 대수사용자의질의에도빠른응답속도지원 데이터웨어하우스를위한인덱스기법제공 대용량데이터를위한다양한 Partition 기법 안정성 stability 연계성 Integration Shared-nothing 아키텍처를사용하여 1000 노드까지의선형적확장성보장 자율컴퓨팅을통한효율적데이터관리 자원최적화하여스스로관리하는기능 확장성 Scalability 이기종데이터소스에투명한접근허용및데이터교환을위한투명한 Architecture 제공및관리의용이함제공
Information Integration extends the warehouse APPC, TCP/IP 390 400 Information Integrator: heterogeneous data federation technology for structured data and content Engine Enhancements for Federated - Optimizer -Rewrite -Runtime wrapper architecture DRDA Driver LAN Driver Oracle SQL* Net Informix client Sybase Open Client MS SQL Srvr ODBC Client ODBC TCP/IP APPC NetBIOS TCP/IP TCP/IP Windows UNIX Oracle Informix Sybase Flatfile MS SQL Server Documentum Blast Excel Any ODBC Data source XML
WPS v1.2 - The Cutting Edge Address: http://the_call_center/customer/lookup Internet IBM Software Group Information Management Software Data Warehouse Manager and Center ETL Data Warehouse Center ETL ƒ Basic administration console for ETL ƒ Database schema & user maintenance ƒ Access to most RDBMS's & flat files ƒ Schedules and monitors database tasks ƒ 150+ data transformations ƒ Loads data into data warehouses ƒ CWMI Standards adherence Warehouse Manager adds: ƒ Extracts & transformations on remote servers via agents ƒ Transformation library Tool Hel File Edit View s p x Back Forward Stop RefreshHome Warehouse Center 3m extract, transform, load, schedule, administrate home Searc h 1 cust-nbr 2 acct-code 3 first name 4 last name 5 street 6 city 7 zip 8 country Histor y Mail Print 07/02/2001 19:29:20 1 cust-number 2 acct_type 3 F-name 4 L-name 5 street 6 city 7 postal_code 8 country IBM Warehouse Manager ETL agents Information Catalog Metadata
IBM DWE for Embedded Analysis Mining Extenders Cube Views OLAP Server Intelligent Miner Embedded Analysis Real-time BI Mining Extenders Cube Views OLAP Server Intelligent Miner
Intelligent Miner Modeling, Scoring, Visualization : UDB can do data mining, even real time mining SQL Modeling Intelligent Miner Answers scores Data Warehouse PMML Model SQL Scoring Neural Nets Associations Time Series Radial Basis Decision Trees Regression clusters predict ions Outliers Decisions PM_usag e multi_lines wireless_rm T_rev98 call_card.47 MM_usag e multi_lines ISP_usa Call_waiting wireless_3t.44 MM_usag e T_rev98 Call_waiting call_card PM_usage.43 visualize res_bldg ISP_amt call_card ISP_amt wireless_3t MM_usag e Sys_rx_ft T_rev98 multi_lines Rtx.36.33 rural_zip wireless_rm ISP_amt PM_usage call_card.29 Call_waiting multi_lines PM_usag e MM_usag e ISP_amt.27 MM_usag e T_rev98 rural_zip multi_lines PM_usage.19
IBM Embedded Analysis Strategy : Database-centric IM Visualization Mining task stored as LOB Database Task IM Modeling Stored Procedure...stand-alone show Mining model stored as LOB Data Training data web Model IM Scoring Data Apply Function Data to which the model is applied...as applet Results of applying the model Mining result
Data Mining architecture IBM Embedded analysis Strategy Background Workbench Analyst RDBMS Extenders SQL invokes extender Application Embedded Programmer PM_usage MM_usage MM_usage res_bldg ISP_amt rural_zip Call_waiting MM_usage multi_lines multi_lines T_rev98 call_card ISP_amt wireless_rm multi_lines T_rev98 wireless_rm ISP_usa Call_waiting wireless_3t MM_usage ISP_amt PM_usage rural_zip T_rev98 Call_waiting call_card Sys_rx_ft T_rev98 PM_usage MM_usage multi_lines call_card wireless_3t PM_usage multi_lines Rtx call_card ISP_amt PM_usage.47.44.43.36.33.29.27.19 Instance Instance Consumer Packaged Applications algorithm extract Instance PMML Advocate data warehouse Instance
Type attributes Type player Type player Type attributes Type player Type attributes T ype player Type player Type attributes Type player Ty pe player IBM Software Group Information Management Software IBM Embedded Analysis Strategy 2 way fits all Applications with embedded Mining Scheduler Mining Work Bench Applications with em bedded scoring Scheduler JO b M odel C alibration Scheduler Ty pe play er Segmentation JO b Model Calibration Scheduler Segmentation Business Objects Application Integration Layer: CLI/JDBC/SQLJ/ODBC IM Modeling API Mining Run IM Scoring API Analytical Data Mart Model Transportation Modeling Environment Operational Data Store Data Models Scores Scoring Environment
IBM Embedded Analysis Strategy : Workbench+DB-centri Data Analyst Data Mining Workbench Models from a consultant, solution provider, or central support group within an enterprise. Models can be exchanged among data mining tools from compliant vendors. Added value: could merge purchased data (such as demographic or industry-specific data) with internal data. Data Warehouse Selected Data Transformed Data Select Transform Mine Assimilate Historical Data Extracted Information Assimilated Information mining model PMML model IM Scoring SQL UDF Scored Data
활용예 1: 고객특성분석및활용 비즈니스이슈 고객의특성을파악하여마케팅전략을위한타겟그룹개발필요 기존고객특성파악정보를새로운고객에게적용하여각특성에맞는적절한상품추천필요 기존고객의행동특성변화에대응필요 접근방법 고객행동정보기반의클러스터링수행 서로다른특성그룹파악 고객세분화모델을운영시스템으로보냄 새로운고객에대한스코어링수행 기존고객에대한스코어링수행 리포트에스코어링결과를보여줌
Type attributes Type player Type player 활용예 1: 고객특성분석및활용개념도 Mining Work Bench Applications with embedded scoring Scheduler JOb Type player Model Calibration Type player Scheduler Type attributes Segmentation OLAP Application Integration Layer: CLI/JDBC/SQLJ/ODBC IM Scoring API Mines the data using Demographic Clustering xfer model Cluster model Data Warehouse Operational Data Store Data Models Scores Analytical Data Mart Modeling Environment Scoring Environment
활용예 2: Real time Promotion - embedded mining 특정고객군별연관상품분석을실시간으로수행하여 CRM 활동에활용 Applications with embedded mining Scheduler JOb Model Calibration Scheduler Transactional Data applications that search for association rules Application Integration Layer: CLI/JDBC/SQLJ or ODBC Business Objects IM Visualization IM Modeling API Mines the data Data Warehouse Analytical Data Mart Modeling Environment
Cube Views : UDB can recognize OLAP Meta Data Bridge OLAP Metadata Meta Data Bridge OLAP Metadata Hyperion OLAP Metadata OLAP Metadata OLAP Metadata OLAP Metadata DML DDL RDBMS Metadata Data Warehouse DATA OLAP Metadata OLAP Metadata BUSINESS OBJECTS MQT s OLAP Metadata Model & ETL Tool Metadata Optimization Advisor OLAP Metadata BI Tool Metadata
OLAP Center Architecture OLAP Center Cube Views XML Import File XML Output File JDBC Excel w/ Office Connect Detail Data Catalog OLAP Metadata MQT & index Metadata Access XML SQL Custom App Data Access
Performance Advisor Administrator Cube Views Model OLAP Metadata Catalog Tables Cube Views Base Tables Time & Space constraints Optimization hints Model Information Statistics Data Samples Performance Advisor MQT's
MQT 활용 Optimization Advisor Cube Views MQT 에있는내용을검색하는쿼리가발생하면쿼리를다시생성 MQT 에서집계된데이터를바로읽어옴. Query select s.region, SUM(f.SALES) from FACT f, STORE s where f.store_key=s.store_key and s.region= Pacific group by s.region Rewritten Query select REGION, SALES from MYMQT where REGION= Pacific Optimizer MQT create table MYMQT as ( select s.region, SUM(f.SALES) from FACT f, STORE s where f.store_key=s.store_key group by s.region ) REGION SALES Eastern $ 155,037.12 Mid Atlantic $ 37,534.96 Mid West $ 233,134.89 Mountain $ 74,019.26 Pacific $ 116,960.98 South East $ 81,799.49 South West $ 82,916.89
DW 시스템과 Cube Views Benefit for Everybody 쿼리성능향상 Cube Views 일반사용자및분석가 일반사용자들이 OLAP 툴등을이용하여데이터베이스에쿼리를수행하게될경우, 이미생성되어있는 MQT 에대하여쿼리가수행되어쿼리응답성능향상된다. 이때, 사용자는기존과같은환경에서같은쿼리를수행한다. MQT 의존재여부에영향을받지않는다. 다차원모델링 다양한툴과메타데이터공유 Cube Views 에서관리되는메타데이터는 ETL, 모델링툴에서부터가져오고각 OLAP 툴로가져가서작업이가능하다. 정보계담당자 분석환경에최적화된자동요약테이블생성 자동요약테이블을사용자쿼리유형, OLAP 메타데이터등을고려하여최적화하여작성할수있어관리자가별도의노력을들여요약테이블을구성하지않아도된다.
진정한 BI 의실현 : Integrated Approach with IBM DWE IBM methodology Experience Real-time BI Embedded Analysis Data Warehouse Edition UDB for EDW Technology Extending data warehouse