ETL overview: data warehouse; ETL tool/system; ETL process; data warehouse platform; database and access method; data source; operational data; data modeling; refresh/replication; metadata; bulk load; near real-time; user; data mart; enterprise warehouse
ETL design questions: SoR (System of Record) table/file and column? Target column? Transformation? Process flow for cleansing and transformation? Metadata? Load/refresh?
ETL Process: ETL process flow
ETL Process (diagram labels): metadata; scheduling; DSA; SOR; parsing; EDW; clickstream data
ETL Process: System of Record (SoR) as data source; SoR; DSA; change data capture; time stamp
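The slide above lists change data capture and time stamp next to SoR and DSA. One common reading is a time stamp based extract: only rows changed since the previous run are copied from the SoR into the DSA. The sketch below illustrates that idea under stated assumptions; the column name updated_at, the sample rows, and the function name extract_changed_rows are hypothetical and do not come from the slides.

```python
from datetime import datetime


def extract_changed_rows(sor_rows, last_extract_time):
    """Return rows whose time stamp is later than the previous extraction."""
    return [row for row in sor_rows if row["updated_at"] > last_extract_time]


# Hypothetical SoR rows; "updated_at" is an assumed time stamp column.
sor_rows = [
    {"id": 1, "name": "A", "updated_at": datetime(2024, 1, 1, 9, 0)},
    {"id": 2, "name": "B", "updated_at": datetime(2024, 1, 2, 9, 0)},
    {"id": 3, "name": "C", "updated_at": datetime(2024, 1, 3, 9, 0)},
]
last_extract_time = datetime(2024, 1, 1, 12, 0)

# Only rows changed since the last run are staged in the DSA.
dsa_rows = extract_changed_rows(sor_rows, last_extract_time)
changed_ids = [row["id"] for row in dsa_rows]
print(changed_ids)
```

A time stamp filter of this kind cannot see deleted rows, which is one reason log based capture is often used alongside it.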
ETL Process: data; DSA; SOR
ETL Process: source; DSA; SOR; match-merge/purge rule; row
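The match-merge/purge rule named above can be sketched as follows. This is a minimal illustration, assuming that rows from different sources match when a normalized name and phone number agree, and that the first row survives while later rows only fill its empty columns. The match rule, the survivorship rule, and all column names are assumptions, not taken from the slides.

```python
def match_key(row):
    """Assumed match rule: same name and phone after normalization."""
    return (row["name"].strip().lower(), row["phone"].replace("-", ""))


def match_merge(rows):
    """Merge rows sharing a match key.

    The first row seen for a key survives; later rows only fill
    columns that are still empty, and the duplicates are purged.
    """
    merged = {}
    for row in rows:
        key = match_key(row)
        if key not in merged:
            merged[key] = dict(row)
        else:
            for col, value in row.items():
                if not merged[key].get(col):
                    merged[key][col] = value
    return list(merged.values())


# Hypothetical rows from two sources staged in the DSA.
source_a = [{"name": "Kim Lee", "phone": "010-1234-5678", "email": ""}]
source_b = [
    {"name": " kim lee ", "phone": "01012345678", "email": "kim@example.com"},
    {"name": "Park", "phone": "010-0000-0000", "email": "park@example.com"},
]

result = match_merge(source_a + source_b)
print(result)
```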
ETL Process: data demographics (row count, distinct value count, value variance); domain violation; aggregation; outer join analysis; Cartesian product analysis; business rule validation; report
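The data demographics checks listed above (row count, distinct value count, value variance) and the domain violation check can be computed per column. The sketch below is one possible way to do it with the Python standard library; the function name data_demographics, the sample column, and the allowed domain are hypothetical.

```python
from statistics import pvariance


def data_demographics(values, domain=None):
    """Profile one column: row count, distinct count, variance, domain violations."""
    non_null = [v for v in values if v is not None]
    report = {
        "row_count": len(values),
        "distinct_value_count": len(set(non_null)),
        # Population variance of the non-null values.
        "value_variance": pvariance(non_null) if non_null else None,
    }
    if domain is not None:
        # Values outside the allowed domain are reported as violations.
        report["domain_violations"] = [v for v in non_null if v not in domain]
    return report


# Hypothetical "age" column with an assumed valid domain of 0 to 149.
ages = [10, 20, 20, 30, 200]
report = data_demographics(ages, domain=range(0, 150))
print(report)
```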
ETL Process: load operation; incremental update; time stamp; CDC (Change Data Capture); frequent load (store & forward); bulk load; near real-time; EAI; peer-to-peer; log audit file; before image; after image; refresh: insert, mass insert, update, delete, mass delete; DB utility: check data, load, reorg, recover
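The slide above names before image and after image alongside insert, update, and delete. One way these ideas can fit together is to compare the two images by primary key and derive the operations to apply to the target. The sketch below assumes that interpretation; the function name diff_images and the sample rows are hypothetical.

```python
def diff_images(before, after):
    """Compare before and after images keyed by primary key."""
    inserts = [after[k] for k in after if k not in before]
    deletes = [before[k] for k in before if k not in after]
    updates = [after[k] for k in after if k in before and after[k] != before[k]]
    return inserts, updates, deletes


# Hypothetical images of the same table, keyed by "id".
before_image = {
    1: {"id": 1, "qty": 5},
    2: {"id": 2, "qty": 7},
    3: {"id": 3, "qty": 9},
}
after_image = {
    1: {"id": 1, "qty": 5},
    2: {"id": 2, "qty": 8},
    4: {"id": 4, "qty": 1},
}

inserts, updates, deletes = diff_images(before_image, after_image)
print(inserts, updates, deletes)
```

Unchanged rows (id 1 here) produce no operation, which is what keeps an incremental update smaller than a full refresh.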
ETL Process, Metadata: ETL metadata
ETL Process, Metadata: Meta Data ROI Curve: data definition reporting; data quality tracking; metadata-driven business user interface; decision support impact analysis; enterprise-wide impact analysis; metadata-controlled system
ETL Process, Metadata: Metadata Exchange Standard Model (Meta Data Council): Tool A through Tool E, each with a tool profile; user configuration; standard access framework; standard API; standard metadata model
ETL Process, Metadata: metadata model
Data Quality: Data Quality Management; DW, BI; correction and cleansing; preventing; data cleansing, data integration, data enrichment; application; Data Quality Report; Enterprise Data Quality Management; Data Quality Tracking
Data Quality (DQ): DQ process; DQ process owner; sponsor; DQ diagram; DQ test
ETL - Phase / Task: Master Plan; source data; target data; ETL review; source/target system interface; quality; data
ETL - Phase / Task: ETL process; data mapping; transformation; ETL program; plan; set up; module test; test; program source; load script
ETL - Phase / Task: data sampling; prototype test; ETL plan; prototyping plan; test load plan
ETL - Highlight: target table; target column; target data; target DBMS; source table; source column; source data; source DBMS; key man; source/target interface; data mapping; transformation; data load; ETL process flow; program; plan; ETL set up; module; data sampling; prototyping; prototype test; ETL program; test
ETL Best Practice: SoR (system of record); BQ (business question) / DS (data source) matrix; DT (data transformation) rule; NFD (not found data); DQ (data quality); data profiling on data (sample); DC (data cleansing); DSI (data source gap & issue); ETL review & BQ review
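The BQ/DS matrix and NFD (not found data) items above can be illustrated with a small cross-reference: for each business question, list which data source supplies which required element, and report the elements that no source supplies. This is a sketch under assumptions; the questions, sources, element names, and the function name build_bq_ds_matrix are all hypothetical.

```python
# Hypothetical business questions and the data elements each one needs.
business_questions = {
    "BQ1 monthly sales by region": {"sales_amount", "region", "sale_date"},
    "BQ2 churn by customer segment": {"customer_id", "segment", "churn_flag"},
}

# Hypothetical data sources and the elements they provide.
data_sources = {
    "orders_db": {"sales_amount", "sale_date", "customer_id"},
    "crm": {"customer_id", "segment", "region"},
}


def build_bq_ds_matrix(questions, sources):
    """Return (matrix, not_found): elements per BQ/DS cell, and NFD per BQ."""
    available = set().union(*sources.values())
    matrix = {}
    not_found = {}
    for bq, needed in questions.items():
        matrix[bq] = {ds: sorted(needed & provided) for ds, provided in sources.items()}
        not_found[bq] = sorted(needed - available)
    return matrix, not_found


matrix, not_found = build_bq_ds_matrix(business_questions, data_sources)
print(matrix)
print(not_found)
```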
ETL Best Practice: DE (data extract); DT (data transformation) rule; DC (data cleansing) rule; ETL process; ETL program; program DFD (data flow diagram); ETL metadata; data model; ETL mapping & cleansing; data mapping matrix; ETL process flow diagram; ETL plan; metadata model
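A data mapping matrix with DT rules, as named above, can be represented as a table from target column to source column plus a transformation. The sketch below shows one possible shape; the column names, the rules chosen, and the function name apply_mapping are hypothetical.

```python
# Hypothetical mapping matrix: target column -> (source column, DT rule).
mapping_matrix = {
    "customer_name": ("CUST_NM", str.strip),
    "sales_amount": ("SALE_AMT", float),
    "country_code": ("CNTRY", str.upper),
}


def apply_mapping(source_row, matrix):
    """Build one target row by applying each DT rule to its source column."""
    return {
        target: rule(source_row[source])
        for target, (source, rule) in matrix.items()
    }


# Hypothetical source row as extracted from the SoR.
source_row = {"CUST_NM": "  Hong  ", "SALE_AMT": "120.50", "CNTRY": "kr"}
target_row = apply_mapping(source_row, mapping_matrix)
print(target_row)
```

Keeping the matrix as data rather than code means the same structure could also be stored as ETL metadata.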
ETL Best Practice: ETL program; program DFD; ETL plan; program source/script; ETL prototyping; ETL tuning: ETL process tuning, ETL program tuning
ETL: Metadata Repository; ETL Manager; ETL Admin; PM; ETL data; DSA; SOR; error
ETL Scheduler Process: USER; JOB; loading rule; JOB message; ETL LOG; ETL logging; scheduling; JOB log. Process Manager: fork(); Init OK; Start ETL Process; Status Check; ETL Status Response; process terminate code; Kill; Kill OK; JOB; exit
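The Process Manager messages above (start, status check, kill, exit) can be sketched as a small class. This is only an illustration, assuming one child process per JOB; it uses Python's subprocess module in place of the fork() call named on the slide, and the class name, method names, and job name are hypothetical.

```python
import subprocess
import sys


class ProcessManager:
    """Start, check, and kill ETL jobs, one child process per JOB."""

    def __init__(self):
        self.jobs = {}

    def start(self, job_name, argv):
        # subprocess.Popen stands in for the fork() step on the slide.
        self.jobs[job_name] = subprocess.Popen(argv)

    def status(self, job_name):
        # poll() returns None while the child is still running.
        code = self.jobs[job_name].poll()
        return "running" if code is None else f"terminated({code})"

    def kill(self, job_name):
        proc = self.jobs[job_name]
        proc.kill()
        proc.wait()  # reap the child so its terminate code is available
        return proc.returncode


manager = ProcessManager()
# A sleeping Python child stands in for a long-running ETL job.
manager.start("load_job", [sys.executable, "-c", "import time; time.sleep(30)"])
status_before = manager.status("load_job")
exit_code = manager.kill("load_job")
status_after = manager.status("load_job")
print(status_before, exit_code, status_after)
```

The terminate code returned by kill differs by platform, so callers should treat any non-zero value as an abnormal exit.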
Metadata Data Model(sample) ETL
ETL: R&D data; product data; customer data; sales data; market data; G/L data; revenue data; external data; DW data; operational system; business intelligence