Big Data Architecture and Directions for a Successful Implementation
Anna Choi, Director, Big Data Industry Architect

Contents
1. Definition of Big Data
2. The Need for Big Data
3. Capabilities and Architecture for Big Data
4. Big Data Application Patterns

Environmental Changes That Made Big Data Necessary
- 90% of the world's data was created in the last two years
- 80% of the world's data today is unstructured
- 20% of available data can be processed by traditional systems
- 1 in 2 business leaders do not have access to the data they need
- 83% of CIOs cited Business Intelligence (BI) and analytics as part of their visionary plan
- Top performers are 5.4X more likely to use business analytics
Source: GigaOM, Software Group, IBM Institute for Business Value

Definition of Big Data: An Expanded Scope of Information
A greatly expanded scope of information
- Integration creates a cross-enterprise view
- External data adds depth to internal data
New kinds of data and analytics
- New sources of information generated by pervasive devices
- Complex analysis simplified through the availability of maturing tools
Real-time information streaming
- Digital feeds from sensors, social, and syndicated data
- Instant awareness and accelerated decision making

Definition of Big Data: Characteristics by Attribute and Use
- Volume (Data at Rest): terabytes to exabytes of existing data to process
- Velocity (Data in Motion): streaming data, milliseconds to seconds to respond
- Variety (Data in Many Forms): structured, semi-structured, unstructured, text, and multimedia
- Veracity (Data in Doubt): uncertainty from inconsistency, ambiguities, and so forth
Diagram labels: Data <<20%, Content >>80%; Traditional Enterprise Data; Social Data from and about People; Physical Sensors & Streams

The Value of Big Data: Adding Confidence to Decision Making
What if a...
- Chief Executive Officer could make better business decisions using accurate data across all time horizons: past, present, and future?
- Chief Information Officer could analyze oceans of machine-generated logs to predict which components or equipment in the datacenter are likely to fail, and thereby avert a disruption during a critical quarter end?
- Chief Finance Officer could streamline compliance and understand risk exposure across businesses and regions?
- Chief Product Designer could consider the risk and profitability of the entire customer relationship chain and lifetime value, and predict demand trends when designing and pricing new product offerings?
- Chief Marketing Officer could predict the right offer for the right customer at the right time and improve customer intimacy or prevent churn?
- Chief Risk Officer could use anti-fraud predictive analytics to detect and prevent rapid-fire anomalous transactions or wire transfers identified as having a high probability of fraud?

Capabilities for Big Data
A solid information foundation
- Integrated, secure, and governed data is a foundational requirement for big data
- Most organizations that have not started big data efforts lack integrated information stores, security, and governance
Scalability
- Scalable storage infrastructures enable larger workloads; adoption levels indicate volume is the first big data priority
- High-capacity warehouses support the variety of data, a close second priority
- A significant percentage of organizations are currently piloting Hadoop and NoSQL engines, supporting the notion of exponential growth ahead
Chart labels: Information integration; Scalable storage infrastructure; High-capacity DW; Security and governance
Note: Respondents with active big data efforts were asked which platform components were either currently in pilot or installed within their organization.

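Big Data Architecture
(Architecture diagram; the recoverable labels are listed below.)
Stages: Data collection; Data storage / processing; Data analysis / visualization
Structured sources: core banking systems, CRM, ERP, others
Unstructured sources: Log, Web, Text, Doc., Image, Social Media
Real-time Analytic Zone: Ingest; Filter/Transform; Extract, Annotate; Correlate, Classify
Hadoop Platform Zone: Ingest; Analytics MapReduce; Documents in Variety of Formats; Hive/HBase Col Stores; Indexes, facets; Models
Warehousing Zone: Enterprise Warehouse; Data Marts
Analytics & Visualization Zone: user queries; dashboards; analytic models; Visualizer; Search; behavior and content analytics
Cross-cutting zones: Metadata and Governance Zone; Development & Maintenance Zone
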
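The zone layout above can be read as a routing decision made at ingest time. The following is a minimal sketch of that reading, not something the slide specifies: the rule that structured sources land in the Warehousing Zone, unstructured sources in the Hadoop Platform Zone, and anything that must be handled in motion in the Real-time Analytic Zone is an assumption, and `Record`, `route`, and the `streaming` flag are hypothetical names.

```python
from dataclasses import dataclass

# Source labels follow the slide; grouping them into sets is illustrative.
STRUCTURED = {"core_banking", "crm", "erp"}
UNSTRUCTURED = {"log", "web", "text", "document", "image", "social_media"}


@dataclass(frozen=True)
class Record:
    source: str              # one of the labels above
    streaming: bool = False  # hypothetical flag: must be processed in motion


def route(record: Record) -> str:
    """Return the name of the zone that should receive the record (assumed rule)."""
    if record.source not in STRUCTURED | UNSTRUCTURED:
        raise ValueError(f"unknown source: {record.source!r}")
    if record.streaming:
        return "Real-time Analytic Zone"
    if record.source in STRUCTURED:
        return "Warehousing Zone"
    return "Hadoop Platform Zone"


if __name__ == "__main__":
    for rec in (Record("crm"), Record("log"), Record("log", streaming=True)):
        print(rec.source, "->", route(rec))
```

In a real deployment the routing would more likely be driven by metadata held in the Metadata and Governance Zone than by hard-coded sets.
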
Big Data Architecture: IBM Big Data Solution Mapping
(Architecture diagram; the recoverable labels and product placements are listed below.)
Stages: Data collection; Data storage / processing; Data analysis / visualization
Structured sources: G-ERP, GSCM, G-MES, PLM
Unstructured sources: Log, Web, Text, Doc., Image, Social Media
Ingest: DataStage
Real-time Analytic Zone (InfoSphere Streams): Filter/Transform; Extract, Annotate; Correlate, Classify
Hadoop Platform Zone (InfoSphere BigInsights): Analytics MapReduce; Documents in Variety of Formats; Hive/HBase Col Stores; Indexes, facets; Models
Warehousing Zone (PureData for Analytics): Enterprise Warehouse; Data Marts
Analytics & Visualization Zone: user queries (Cognos BI); dashboards (Cognos RTM); analytic models (SPSS); Visualizer (Data Explorer); Search (ICA); Unica, Content Analytics, NetInsight
Metadata and Governance Zone: Information Server
Development & Maintenance Zone

Big Data Platform > IBM Big Data Platform
1. Exploring and discovering big data
 - Big data virtualization and federated analysis
 - InfoSphere Data Explorer (Vivisimo Velocity)
2. Analyzing raw data
 - Structured and unstructured data analysis
 - Cost-effective analysis of large data volumes
 - InfoSphere BigInsights
3. Real-time analysis of data in motion
 - Stream data analysis
 - InfoSphere Streams
4. Purpose-optimized data warehouse
 - PureData Systems
Diagram labels, Analytic Applications: BI / Reporting; Exploration / Visualization; Functional App; Industry App; Predictive Analytics; Content Analytics
Diagram labels, IBM Big Data Platform: Visualization & Discovery; Application Development; Systems Management; Accelerators; Hadoop System; Stream Computing; Data Warehouse; Information Integration & Governance

Big Data Platform > Data Explorer: Understanding and Visualizing Big Data Sources
InfoSphere Data Explorer
- Structured and unstructured data analysis (without copying the data)
- Unique index creation
- Scalability across connected sources
- Improved data navigation
- Clustering by data pattern
- Real-time analysis of federated data through a unified interface (generating insight and ROI)
- Contextual analysis
- Text analysis
- Permission-based data integration
- Query-based retrieval
- Integration with big data applications
- Web-based user interface
Benefits
- Improve customer service and reduce call times
- Create a unified view of ALL information for real-time monitoring
- Increase productivity and leverage past work, increasing speed to market
- Analyze customer analytics and data to unlock true customer value
- Identify areas of information risk and ensure data compliance

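Big Data Platform > BigInsights: Hadoop-Based Large-Scale Data Storage and Analysis
Performance and workload optimization
 - Adaptive MapReduce, Compression, Indexing, Flexible Scheduler
Enhanced analytics
 - BigSheets, a spreadsheet-style visualization tool for searching and exploring data; text analytics engine; analytics accelerators
Enhancements for enterprise environments
 - Role-based security; connectors for integration with other systems
Easy administration and development
 - Web administration console; cluster, system, and job monitoring; Eclipse development environment; the Jaql language for easier development
Diagram labels: User Interfaces (visualization, development tools, administration console); Integration (Databases, Content Management, Information Governance); Accelerators (text analytics, application accelerators); BigInsights Engine (MapReduce + workload management, indexing, security); Apache Hadoop
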
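For readers new to the MapReduce model that BigInsights builds on, the three phases (map, shuffle, reduce) can be illustrated with a word count. This is a single-process sketch in plain Python for illustration only; it does not use the Hadoop or BigInsights APIs, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values of one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big insights"]))
```

On a cluster the map and reduce functions run in parallel on different nodes and the framework performs the shuffle; the sketch only shows the data flow.
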
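Big Data Platform > Streams: Real-Time Analysis Through Stream Computing
High data throughput
 - Very low latency; high-volume data processing
Support for diverse data types
 - Processes data in any form, including data that is traditionally hard to process or that requires a fast response
Distributed operator processing
 - Efficient use of CPU cores; very high-speed data exchange
Easy administration and development
 - Automatic placement of operators; applications can be added without downtime; built-in adapters; analytic logic can be added or changed while in operation
Analytics shown: data mining (in Streams); statistical analysis (in Streams); text analysis (in Streams); acoustic analysis (Research); advanced mathematical models (Research); predictive analytics (Research); geospatial data (Research); image and video (open source integration)
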
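The operator model described above, in which small processing steps are chained and each record flows through them as it arrives, can be sketched with Python generators. This is an illustrative analogy only: it is not SPL or any InfoSphere Streams API, it runs in a single process, and the operator names (`parse`, `tumbling_mean`, `above`) are hypothetical.

```python
from typing import Iterable, Iterator, List


def parse(lines: Iterable[str]) -> Iterator[float]:
    """Source-side operator: convert raw text readings to floats, skipping bad rows."""
    for line in lines:
        try:
            yield float(line)
        except ValueError:
            continue


def tumbling_mean(values: Iterable[float], size: int) -> Iterator[float]:
    """Aggregate operator: emit the mean of each full, non-overlapping window."""
    window: List[float] = []
    for value in values:
        window.append(value)
        if len(window) == size:
            yield sum(window) / size
            window = []


def above(values: Iterable[float], threshold: float) -> Iterator[float]:
    """Filter operator: pass through only values greater than the threshold."""
    for value in values:
        if value > threshold:
            yield value


if __name__ == "__main__":
    readings = ["1.0", "3.0", "oops", "10.0", "20.0", "5.0"]
    for alert in above(tumbling_mean(parse(readings), size=2), threshold=5.0):
        print("window mean over threshold:", alert)
```

Because generators are lazy, each reading passes through the whole chain before the next one is read, which mirrors processing data in motion rather than storing it first.
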
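Big Data Platform > PureData Systems: Optimized for Analytics
PureData System for Analytics: data services for analytics (Analytics, Database, Storage, Server)
Speed
 - 10 to 100 times faster than existing custom-built systems
 - 20 times greater concurrency and throughput for tactical queries
 - Patented hardware acceleration technology
Simplicity
 - Ready to load data within hours
 - No database indexing, tuning, or storage management required
Scalability
 - Petabyte-scale user data capacity
Smart
 - Designed to run complex analytics within minutes
 - The most powerful in-database analytics
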
Big Data Application Patterns 1. Big Data Virtualization / Exploration
Explore and mine big data to find what is interesting and relevant to the business for better decision making
Requirements
- Explore new data sources for potential value beyond existing enterprise data and content, without a hypothesis
- Given a hypothesis, mine for what is relevant to a business imperative
- Relate dissimilar sources of information in context and assess the business value of unstructured content
- Use visualization, algorithms, and processing to uncover patterns of interest
- Prevent exposure of sensitive information while exploring
Industry Examples
- Customer service knowledge portal
- Insurance catastrophe modeling
- Automotive features and pricing optimization
- Chemicals and Petroleum condition-based maintenance
- Life Sciences drug effectiveness

Big Data Application Patterns 2. Customer-Centric Data Integration
Optimize every customer interaction by knowing everything about them
Requirements
- Create a connected picture of your customer by tying internal and external information together
- Relate and mine enterprise data, master data, call logs, enterprise content, and new sources for actionable insight
- Analyze social media and external sources to uncover how customers feel about your products and the company
- Add value by optimizing every client interaction
Industry Examples
- Smart meter analysis
- Telco data location monetization
- Retail marketing optimization
- Travel and Transport customer analytics and loyalty marketing
- Financial Services Next Best Action and customer retention
- Automotive warranty claims

Big Data Application Patterns 3. Operational Analytics / Optimization
Apply analytics to machine data for greater operational efficiency
Requirements
- Analyze mass volumes of machine data with sub-second latency to identify events of interest as they occur
- Apply predictive models and rules to identify potential anomalies or opportunities as they occur
- Understand service levels in real time by combining operational and enterprise data
- Monitor systems to proactively increase operational efficiency and avoid service degradation or outages
Industry Examples
- Automotive advanced condition monitoring
- Chemical and Petroleum condition-based maintenance
- Energy and Utility condition-based maintenance
- Telco campaign management
- Travel and Transport real-time predictive maintenance

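One simple way to "apply rules to identify potential anomalies as they occur" is a rolling z-score over recent machine readings. The sketch below assumes that technique; the slide does not name a specific method, and the function name, window size, and threshold are illustrative choices rather than recommended settings.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Deque, Iterable, Iterator, Tuple


def detect_anomalies(
    readings: Iterable[float],
    window: int = 5,
    z_threshold: float = 3.0,
) -> Iterator[Tuple[int, float]]:
    """Yield (index, value) for readings far from the mean of the recent window."""
    history: Deque[float] = deque(maxlen=window)
    for index, value in enumerate(readings):
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            if sigma > 0 and abs(value - mu) / sigma > z_threshold:
                yield index, value
                continue  # keep the outlier out of the baseline
        history.append(value)


if __name__ == "__main__":
    sensor = [10.0, 10.2, 9.8, 10.1, 9.9, 30.0, 10.0]
    for index, value in detect_anomalies(sensor):
        print(f"anomaly at reading {index}: {value}")
```

The detector keeps only the last `window` readings in memory, so it can run over an unbounded feed; a production system would typically replace the fixed threshold with a trained predictive model.
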
Big Data Application Patterns 4. Data Warehouse Expansion / Augmentation
Exploit technology advances to deliver more value from an existing data warehouse investment while reducing cost
Requirements
- Add streaming and unstructured data sources to existing data warehouse investments
- Optimize data warehouse storage and provide a query-able archive
- Rationalize the data warehouse for greater simplicity and lower cost
- Provide better query performance to enable complex analytical applications
- Deliver insights to business operations for real-time decision making
- Achieve performance and scale with predictive analytics and business intelligence
Examples
- Pre-Processing Hub
- Query-able Archive
- Exploratory Analysis
- Operational Reporting
- Real-time Scoring
- Segmentation and Modeling

Big Data Application Patterns 4. Data Warehouse Expansion / Augmentation
(Diagram of three usage scenarios; the recoverable labels are listed below.)
Scenarios: 1 Pre-Processing Hub; 2 Query-able Archive; 3 Exploratory Analysis
Products shown: BigInsights; Streams; Information Integration; Data Explorer; Cognos BI; SPSS Modeler; Data Warehouse
Captions: Landing zone for all data; Real-time processing; Find and view the data; Combine with unstructured information; Offload analytics for microsecond latency

Big Data Application Patterns 5. Security / Risk Management
Enhance traditional security solutions to prevent crime by analyzing all types and sources of big data
Requirements
Enhanced Intelligence and Surveillance Insight: analyze data in motion and at rest to
- Find associations
- Uncover patterns and facts
- Maintain currency of information
Real-time Cyber Attack Prediction and Mitigation: analyze network traffic to
- Discover new threats sooner
- Detect known complex threats
- Take action in real time
Crime Prediction and Protection: analyze telco and social data to
- Gather criminal evidence
- Prevent criminal activities
- Proactively apprehend criminals
Industry Examples
- Government threat and crime prediction and prevention
- Insurance claims fraud

THINK ibm.com/bigdata ibm.com/smarteranalytics