Choose the Right Data Integration Solution ; Best Practices on EII/EAI/ETL IBM DB2 Technical Sales BI Team (byrhee@kr.ibm.com) 2005 IBM Corporation
Agenda I. II. ETL, EII, EAI III. ETL, EII, EAI Best Practice IV. V. 2
; Integration People Processes Information Wired / Wireless Devices application asset 24x7 access / real-time legacy packaged data source data type application / DW DM IT 40% 85% 40% 30~50%... 3
Horizontal Integration is the New Challenge Bridging the gap between business transformation and IT Customer Connections Internal Systems Supplier Networks 4
Integration Challenges repository merge format semantic layer Integration Governance Model,, data flow 5
vs. 3 Layer Business Layer Business rules Data Layer ; update Presentation Layer ; web, PDAs, etc. Application level EII (data layer level) Consolidated database level workflow action 6
Design Develop data mart Visible Costs 10% of costs Evolve Operate Change business rules Upgrade to new version Migrate from development to production Investigate bottlenecks Reuse in another project Change OS & DB Upgrade source systems Invisible Costs 90% of costs. 7
; OLAP ( / ) Executive Area Ad-hoc Query Require Specific Area / / / Recursive Summary Area Subject Area Raw Data Area L5 L4 L3, L2 L1 / 2,3 / 1 / 3 / / 1 / / 8
vs. : Current Data dynamic join / : Local performance / / (ETL) / Result Set (Replication) (near) real time 9
Agenda I. II. ETL, EII, EAI III. ETL, EII, EAI Best Practice IV. V. 10
EAI App1 App2 App3 MQIn Formatting & Mapping RouteToLabel MQOut EII ETL Data Warehouse Data Mart Data Warehouse Data Mart Data Mart 11
ETL ETL Extract( ), Transform( ), Load( ),,,,. ETL tool Extract / Transform / Load : parallel concurrent impact analysis 12
DataStage Architecture DB2 UDB Informix Oracle Microsoft SQL Server Sybase UniVerse UniData Sequential Mainframe QSAM VSAM DB2 IMS IDMS Adabas Datacom Others Any SQL Native API ODBC Hash File Sequential or FTP Change Data Capture Mainframe Data Access Windows Client DataStage Manager Designer Director Administrator Repository Server Windows NT/2000 Server ODBC Sequential or FTP Bulk Loaders Sun Solaris, HP-UX, IBM AIX, Compaq ; Parallel Extender Meta Data Exchange Native API Hash File Meta data Oracle 7 & 8 OCI Informix CLI Sybase OC UniVerse UniData Staged Data Meta data Repository Data Warehouse Load Utilities Bulk Loaders DB2 UDB Informix RedBrick Oracle BCP (Sybase & Microsoft) Sybase IQ 13
DataStage ETL DB Import - DBMS - - XML(Web) ETL GUI Drag & Drop 400 GUI,,,, 14
EII,. Marketing e-commerce Fulfillment Integrated Information Enterprise Information Integration / ; read/write, 15
WebSphere Information Integrator WS Information Integrator DB2 Catalog wrapper, server, nickname Server Nickname Remote functions Nickname Nickname Nickname Server Server Wrapper Wrapper : (Oracle, Sybase, MS SQL..) Server : Nickname : II : Global catalog User data MQT 16
WebSphere Information Integrator ; How it works WS II DB2 Nickname DB2 catalog Federated Query DB2 cost-based optimizer Optimizer query WS II. Cost-based optimizer Wrapper pushdown query Client library native client library. WS Information Integrator DB2 cost-based optimizer Local Execution Plan + Remote SQL Wrapper Client library Nickname Nickname Table 17
Federation (aka Enterprise Information Integration) SQL SQL Content Mainframe databases Mainframe files Relational databases XML Web services Packaged applications Web Other Collaboration Systems Content Repositories and Imaging Systems Workflow systems 18
EAI EAI (OS, DBMS ) (ERP, SCP, DW, CRM, Legacy ),, (Interface & Integration) Interface.,. Enterprise Application Integration 19
EAI EAI Adapter ; / Middleware ; ; DB, File ERP, CRM Middleware(MQ) ; / (Integration Broker ) ;, Integration Broker(WMQI) ; (Transformation) ; (Routing) HUB Broker Rules Format DB DB/File Information Adapter Adapter Integrator DBMS DBMS DBMS SAM FileSAM File File WBI WMQ 20
WebSphere MQ A B, (Queue) ( ) / (Queue), (Queue). (Queue) FIFO (First In First Out) FIFO (First In First Out),,, API (C/C++, Java, COBOL ) MOM (Message-Oriented Middleware) 78%, 87% / (Synchronous/Asynchronous Processing) ; A B B (Triggering) B.. (Assured Data Delivery) ; " " (Commercial Environment). 21
Agenda I. II. ETL, EII, EAI III. ETL, EII, EAI Best Practice IV. V. 22
Strength ETL vs. EII vs. EAI Strengths and Challenges ETL EII Extract, Transform, Load Middleware DW DM, Websphere MQ Websphere Information Integrator DataStage structured data structured data unstructured Batch data( ) focus real-time data read/write, metadata API real-time near real-time remote source global access event,,, GUI view workflow impact analysis metadata, metadata Data Grid EAI 23
Challenge ETL vs. EII vs. EAI Strengths and Challenges ETL EII ; SQL - key match data type mismatch 10 staging ; resource ; ~ out-of-sync DW network metadata ; metadata import/export Semantic integrity network EAI ETL tool 24
ETL vs. EII vs. EAI Data Flow Data ETL source to target Batch Job Daily - Monthly Best Medium Low /Metadata ETL Job SQL ESQL, view, DB database object catalog metadata FTP direct database connection EII Query time - Query (SQL) managed Real-time Direct database connection Transaction triggered Transaction managed (Near) Real-time Messaging Very large Medium Small ~ ~ 10 pipe SQL broker semantic transformation EAI 25
ETL vs. EII vs. EAI Event Monitoring Very Limited Versioning Workflow Control ETL Full support EII, Job flow None Job, error exception handling Limited trigger Limited support custom build EAI Best event Limited support custom build rule 26
ETL Best Practices ETL tool ETL I/O bound staging step storage I/O lookup data file ETL data mart data mart locking Key to running many concurrent processes in parallel Query, Load, Backup 27
EII Best Practices Marketinge-CommerceFulfillment Integrated Information ad-hoc WII query WII query DB2 Query Patroller remote source operation WII permanent basis WII virtual warehouse, ad-hoc remote federated query Remote data target access remote federated server remote 28
EAI Best Practices Point-to-point Hub broker workflow trace 29
EII vs. EAI vs. ETL When to use ETL When to use EII select repository connect EDW SQL When to use EAI join query Combination is normally used 30
Agenda I. II. ETL, EII, EAI III. ETL, EII, EAI Best Practice IV. V. 31
Extending the Data Warehouse Data Warehouse WS Information Integrator BI Nonrelational data Real-time data remote data(, ) DB2 ; remote data query remote data ; optimization access path target 32
POS DW Store A App WebSphere MQ DataStage Store B App WebSphere MQ Store C App WebSphere MQ ) MQStage DataStage Job. MQ DB2. 33
DataStage can leverage WS II ETL job Federation lookup join unstructured data fetch SOA external Web Services DataStage extract clean/transform lookup load DB2 II (Federation) II (Federation) source1 source2 source3 sourcea sourceb 34
WII MQ CICS/VSAM BI Portal - EDW E-Commerce Mainfram WPS WBI Server DB2 SQL via JDBC WBI JDBC Adapter DB2 SQL via JDBC WS II ODBC Wrapper MS-SQL Wrapper Oracle Wrapper SQL via ODBC WS II Classic Federation MS-SQL Oracle WBI JDBC Adapter Oracle Call Center EDW DB2 35
Agenda I. II. ETL, EII, EAI III. ETL, EII, EAI Best Practice IV. V. 36
Session Summary ; EAI App1 App2 App3 MQIn Formatting & Mapping RouteToLabel MQOut EII ETL Data Warehouse Data Mart Data Warehouse Data Mart Data Mart 37
Session Summary ; Quiz Batch Job ETL - DataStage right time EII WebSphere Information Integrator real time EAI WebSphere MQ 38