Cloud Computing: Overview and Technology
Jaesun Han
CEO, NexR; Chairman, Korea Cloud Computing Research Association; Adjunct Professor, KAIST Information & Media MBA
jshan@nexr.co.kr
Next Revolution, Toward Open Platform

Agenda
Introduction to Cloud Computing
- Background and definition of cloud computing
- Industry and market trends
- Classification of cloud computing
Cloud Computing Technologies and Use Cases
- Amazon Cloud Infrastructure
- Google App Engine
- Hadoop Platform
- Eucalyptus Platform
Cloud Computing Research Issues and Open Challenges

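Background: The Shift in the Computing Paradigm
Figure labels: Burden Iron Works; Edison Power Plant & Power Grid; Corporate Data Center, PC; Cloud Computing Center & Internet
Figure caption: The computing paradigm shift, inferred by analogy with the transformation of the electricity industry
Change in how computing resources are owned
- Growing outsourcing of in-house IT resources and services
- Division of labor and economies of scale
Expansion of Internet-based services
- Software and content delivered as online services
- Reliable service delivery made possible by high-speed networks
Result: cloud computing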

Definition: Cloud Computing
Enterprise perspective
- Providing IT infrastructure and environment to develop/host/run services and apps, on demand, with pay-as-you-go pricing, as a service
End-user perspective
- Providing resources and services to store data and run applications, on any device, anytime, anywhere, as a service
Gartner ranked cloud computing second among its top 10 strategic technologies for 2009. (The first, virtualization, is itself a foundational technology for cloud computing.)

Characteristics and Advantages of Cloud Computing
Characteristics
- Prescripted & Abstracted Infrastructure
- Fully Virtualized
- Equipped with Dynamic Infrastructure Software
- Pay by Consumption
- Free of Long-Term Contracts
- Application and OS Independent
- Free of Software or Hardware Installation
Source: Is Cloud Computing Ready For The Enterprise, Forrester Research
Advantages
- Economies of scale
- Cost: no upfront CapEx (capital expenditure); pay-as-you-go pricing model
- Scalability: scale capacity on demand; handle dynamic workloads
- Productivity: easy to use; reduced time-to-market
- Maintenance: easy or no management; instant software updates

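The Economics of Cloud Computing
Provider perspective: economies of scale
- Table (contents not recoverable): comparison of data centers with ~1,000 servers and ~50,000 servers, 2006 figures
User perspective: cost reduction
Additional factors to consider
- Power, cooling, and floor-space costs
- Operations and management costs
Source: Above the Clouds: A Berkeley View of Cloud Computing, UC Berkeley TR 2009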

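Classification of Cloud Computing
Public Cloud
- Cloud Services/Applications (Software as a Service): delivers the end service. Offerings: services + infrastructure resources. Target: individuals + enterprises. Examples: Apple MobileMe, Google Apps, Nokia Ovi, Salesforce.com Apps, etc.
- Cloud Platform (Platform as a Service): provides development and operating environments together with IT infrastructure resources. Offerings: development environment + infrastructure resources. Target: enterprises. Examples: Google App Engine, force.com, MS Azure, Facebook F8, Bungee Labs, etc.
- Cloud Infrastructure (Infrastructure as a Service): provides IT infrastructure resources. Offerings: infrastructure resources. Target: enterprises. Examples: Amazon S3 & EC2, Joyent, GoGrid, AT&T, etc.
Private Cloud
- Enterprise Cloud Computing: IBM, HP, SUN, Redhat, EMC, Dell, etc.
- Cloud Computing Software: Hadoop, 3Tera, Xen, VMware, NexR VCC, Eucalyptus, Enomaly ECP, OpenNebula, etc.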

Public Cloud Players
Figure: players grouped under Cloud Service, Cloud Platform, and Cloud Infrastructure

Amazon Web Services (AWS)
- Opened its cloud infrastructure services in 2004
- In 2008, AWS traffic exceeded that of the Amazon.com site
- Expanded from SQS, EC2, and S3 to a broader range of infrastructure resources such as SimpleDB and CloudFront
- The first and most successful cloud computing service
Figure labels (service layers): E-Commerce Service; Data as a Service; People as a Service; Search as a Service; Infrastructure as a Service
Figure labels (services): Historical Pricing, Mechanical Turk, Alexa Web Info. Service, Alexa Top Sites, Alexa Site Thumbnail, Alexa Web Search Platform, Simple Queue Service, Simple Storage Service, Elastic Compute Cloud, SimpleDB
Cloud infrastructure pricing
- S3: $0.15 per GB-month
- EC2: $0.10 per instance-hour
- SimpleDB: $1.50 per GB-month

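Amazon Cloud Infrastructure Use Cases
Online video mixing service
- Used Amazon Cloud Infrastructure services instead of its own infrastructure
- Sudden surge in users: from 25,000 to 250,000 over 3 days, with up to 20,000 new registrations per hour
- Responded quickly with EC2: from 50 to 4,000 instances over 5 days, with up to 40 new instances per hour
- A success story showing that cloud computing suits online services
The New York Times
- Project to convert TIFF images of 11 million articles (1851-1980) to PDF
- Used Amazon EC2 and S3 instead of buying new hardware, and the open-source Hadoop platform instead of buying software
- Time taken: 1 day. Cost: ?
- A success story showing that cloud computing suits batch jobs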

Analysis of the New York Times Case
System architecture
- Amazon S3: TIFF images (4 TB) in, PDFs (1.5 TB) out
- Amazon EC2 (100 instances): Hadoop MapReduce, launched from an AMI
Cost
- S3 storage: 5.5 TB
- Data transfer-in: 4 TB
- EC2 instances: 100 x 24 hours
- Total: only $1,465
http://calculator.s3.amazonaws.com/calc5.html
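As a rough check on the $1,465 total, the three usage items can be priced with the per-unit rates quoted on the AWS slide (S3 at $0.15 per GB-month, EC2 at $0.10 per instance-hour). The inbound transfer rate of $0.10 per GB, the conversion of 1 TB to 1,000 GB, and billing the storage for one full month are assumptions made for this sketch; they are not stated in the slides.

```python
# Prices are kept in cents so the arithmetic stays exact.
S3_STORAGE_CENTS_PER_GB_MONTH = 15   # from the AWS pricing slide
EC2_CENTS_PER_INSTANCE_HOUR = 10     # from the AWS pricing slide
TRANSFER_IN_CENTS_PER_GB = 10        # assumption: not stated in the slides


def job_cost_cents(storage_gb, transfer_in_gb, instances, hours):
    """Return the cost of each usage item and the total, in cents."""
    storage = storage_gb * S3_STORAGE_CENTS_PER_GB_MONTH  # one month assumed
    transfer = transfer_in_gb * TRANSFER_IN_CENTS_PER_GB
    compute = instances * hours * EC2_CENTS_PER_INSTANCE_HOUR
    return {
        "storage": storage,
        "transfer_in": transfer,
        "compute": compute,
        "total": storage + transfer + compute,
    }


# 5.5 TB and 4 TB converted with 1 TB = 1,000 GB (assumption).
cost = job_cost_cents(storage_gb=5500, transfer_in_gb=4000, instances=100, hours=24)
for item, cents in cost.items():
    print(f"{item:12s} ${cents / 100:,.2f}")
```

Under these assumptions the items come to $825 for storage, $400 for transfer, and $240 for compute, which matches the total on the slide.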

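Amazon EC2 & S3 Technical Architecture
- AWS Interface (SOAP, REST): open interface technologies (Web Services, SOA, Open API, etc.)
- Amazon S3 with the S3 Manager: large-scale distributed storage technologies (distributed file system, distributed data store, distributed query language, distributed cache, etc.)
- Amazon EC2 with the EC2 Manager: AMIs and a pool of EC2 instances running on EC2 hosts under the Xen hypervisor; virtualization technologies (server, storage, and network virtualization, etc.)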

Ways to Use Amazon AWS
- Command line tools
- AWS Management Console (web interface)
- Third-party management tools and services
  - Elasticfox: Firefox plug-in for Amazon EC2
  - S3 Firefox Organizer: Firefox plug-in for Amazon S3
  - RightScale: management service for EC2 & S3
- SOAP and Query interfaces
- Programming libraries: Java, Python, Ruby, PHP, Perl, C#, VB.Net, etc.

Google App Engine
Run your web applications on Google's infrastructure
http://code.google.com/appengine/
Cloud platform service
- Google infrastructure resources offered for free (launched in 2008): 500 MB storage, 10 GB bandwidth in and out per day, 5 million page views per month
- Additional resources available under usage-based pricing: CPU $0.10 per hour, storage $0.15 per GB-month
- Python and Java web development environments
- System-level features such as performance, scalability, and failure handling
Google App Engine technology
- Scalable service infrastructure: built on the Google platform
- Python runtime and a variety of service Open APIs
- Software Development Kit
- Web-based Admin Console
- Scalable datastore (GFS, Bigtable, Memcached, etc.)

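Google App Engine Platform Technology
Figure labels (hosted platform)
- Web-based Admin Console: management and status information
- Apps
- Service APIs: Account, Image, Mail, URL Fetch, Datastore, Memcache
- Python Runtime: service execution environment
- Memcache: global memory cache
- Bigtable: distributed database
- GFS: distributed file system
- Commodity server cluster
Figure labels (App Engine SDK, used for development and testing, then upload)
- Local version of the web server and APIs
- Uploader
- Python frameworks: webapp, Django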

GAE HelloWorld Programming
The slide overlays three versions of helloworld.py; they are separated below.

app.yaml

    application: helloworld
    version: 1
    runtime: python
    api_version: 1

    handlers:
    - url: /.*
      script: helloworld.py

helloworld.py (version 1: plain response)

    from google.appengine.ext import webapp
    from google.appengine.ext.webapp.util import run_wsgi_app

    class MainPage(webapp.RequestHandler):
        def get(self):
            self.response.headers['Content-Type'] = 'text/plain'
            self.response.out.write('Hello, webapp World!')

    application = webapp.WSGIApplication(
        [('/', MainPage)],
        debug=True)

    def main():
        run_wsgi_app(application)

    if __name__ == "__main__":
        main()

helloworld.py (version 2: Users API; the rest of the file is as in version 1)

    from google.appengine.api import users

    class MainPage(webapp.RequestHandler):
        def get(self):
            user = users.get_current_user()
            if user:
                self.response.headers['Content-Type'] = 'text/plain'
                self.response.out.write('Hello, ' + user.nickname())
            else:
                self.redirect(users.create_login_url(self.request.uri))

helloworld.py (version 3: Datastore; the listing is cut off on the slide)

    from google.appengine.ext import db

    class Greeting(db.Model):  # defining data model
        author = db.UserProperty()
        content = db.StringProperty(multiline=True)
        date = db.DateTimeProperty(auto_now_add=True)

    class Guestbook(webapp.RequestHandler):  # storing data
        def post(self):
            greeting = Greeting()
            if users.get_current_user():
                greeting.author = users.get_current_user()
            greeting.content = self.request.get('content')
            greeting.put()
            self.redirect('/')

    class MainPage(webapp.RequestHandler):  # querying data
        def get(self):
            greetings = db.GqlQuery("SELECT * FROM Greeting ORDER BY date DESC LIMIT 10")
            for greeting in greetings:
                if greeting.author:

Testing the app

    google_appengine/dev_appserver.py helloworld/
    http://localhost:8080/

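Cloud Software Example: Hadoop
- Large-scale distributed data storage and processing system
- A clone of the Google platform
- Apache open source project
- Grew out of the distribution issues of the Nutch open-source search engine
- Software solution (Java-based) for storing and processing large data sets in a distributed way on clusters of low-cost commodity servers
- Has formed an ecosystem with numerous sub-projects
- Powered by Hadoop
- Biggest Hadoop cluster: 20,000 servers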

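Hadoop Platform Technology
Hadoop component and its Google Platform counterpart
- Nutch (open source search engine): Google Search
- MapReduce (distributed data processing system): MapReduce
- HBase (distributed database): Bigtable
- HDFS (distributed file system): GFS
Both run on commodity server clusters.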

Open Platform Ecosystem
An open source platform encourages voluntary participation by developers, which lets it build an ecosystem that can compete with commercial platforms.
- Hadoop: HDFS, MapReduce, HBase, HOD, Streaming, Fuse-DFS, EC2 support
- NexR VCC: Hadoop on virtualization?
- Cascading: workflow management for Hadoop MapReduce
- Yahoo Pig: query language interface on Hadoop
- Yahoo Zookeeper: distributed management
- IBM MapReduce Tools: Eclipse plug-in for MapReduce programs
- Facebook Hive: data warehousing on Hadoop
- Parhely: ORM for HBase
- Katta: distributed indexing with Hadoop
- Mahout & Hama: machine learning using Hadoop MapReduce
Reference: Korea Hadoop Community, http://www.hadoop.or.kr

Cloud Software Example: Eucalyptus
- Overlay software that lets researchers investigate and experiment with aspects of IaaS-style cloud computing
- Implements the lowest level of cloud computing systems
- Users allocate and de-allocate entire VM instances on demand
- Designed to install easily on common academic cluster configurations
- Modularized so that researchers can replace logical components
- Open source
- Easily instrumented in support of experiments
- Flexible user interface (currently compatible with Amazon EC2)
- Researchers can download Eucalyptus and install an EC2-compatible cloud computing system atop existing resources

Eucalyptus Architecture
Figure labels: Cloud Interface, Cloud Controller, Cluster Controller, Node Controller
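The layering named on this slide can be pictured as a request that travels from the cloud controller to a cluster controller and finally to a node controller that hosts the VM. The toy model below is only an illustration of that hierarchy: the class fields, the slot counts, and the first-fit placement rule are all assumptions made for the sketch and do not describe how Eucalyptus actually schedules instances.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NodeController:
    """Hosts VM instances on one physical machine (hypothetical model)."""
    name: str
    free_slots: int
    instances: List[str] = field(default_factory=list)

    def run_instance(self, instance_id: str) -> bool:
        if self.free_slots == 0:
            return False
        self.free_slots -= 1
        self.instances.append(instance_id)
        return True


@dataclass
class ClusterController:
    """Picks a node inside one cluster (first fit, an assumption)."""
    name: str
    nodes: List[NodeController]

    def schedule(self, instance_id: str) -> Optional[str]:
        for node in self.nodes:
            if node.run_instance(instance_id):
                return node.name
        return None


@dataclass
class CloudController:
    """Entry point behind the cloud interface; tries clusters in order."""
    clusters: List[ClusterController]

    def run_instance(self, instance_id: str) -> Optional[str]:
        for cluster in self.clusters:
            node_name = cluster.schedule(instance_id)
            if node_name is not None:
                return f"{cluster.name}/{node_name}"
        return None  # no capacity anywhere


cloud = CloudController([
    ClusterController("cluster-a", [NodeController("node-1", free_slots=1)]),
    ClusterController("cluster-b", [NodeController("node-2", free_slots=2)]),
])
placements = [cloud.run_instance(f"i-{n}") for n in range(4)]
print(placements)
```

With three slots in total, the fourth request finds no capacity and gets `None`, which is the point where a real system would reject or queue the request.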

Open Source Components
- Axis2 and Axis2c version 1.4.0
- Hibernate 3.2.2
- HSQLDB 1.8.0
- jetty 6.1.9
- JiBX
- Mule 2.0.1
- Rampart version 1.3
- libvirt version 0.4.2
- socat-1.6.0
- VDE version 2.2.0-pre2

Other Cloud Software Examples
Globus/Nimbus
- Client-side cloud computing interface to the Globus-enabled TeraPort cluster at U of C
- Based on GT4 and the Globus Virtual Workspace Service
- Shares the upsides and downsides of Globus-based grid technologies
Enomalism (now called ECP)
- Start-up company distributing open source REST APIs
Reservoir
- European open cloud project
- Many layers of cloud services and tools
- Ambitious and wide-reaching, but not yet accessible as an implementation

Example of Programming in Cloud Computing
GrepTheWeb
- Grep (filter) actual web documents with a regular expression
Why cloud computing is needed
- Large dataset (even hundreds of TB), complex regex, unknown request patterns
Amazon S3, EC2, SQS, SimpleDB + Hadoop MapReduce
- S3: retrieving input datasets and storing the output dataset
- SQS: buffering requests, acting as glue between controllers
- SimpleDB: storing intermediate status, logs, and user data about tasks
- EC2: running a large distributed-processing Hadoop cluster on demand
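The core of GrepTheWeb, filtering documents with a regular expression, maps naturally onto a map step and a grouping step. The sketch below runs that idea in a single process; the `documents` dictionary is a hypothetical stand-in for the input dataset that the real system reads from S3, and the function names are made up for illustration. None of the S3, SQS, SimpleDB, or EC2 plumbing is modelled.

```python
import re
from collections import defaultdict


def grep_map(doc_url, text, pattern):
    """Map step: emit (url, line) for every line that matches the pattern."""
    regex = re.compile(pattern)
    return [(doc_url, line) for line in text.splitlines() if regex.search(line)]


def grep_reduce(pairs):
    """Reduce step: group the matching lines by document URL."""
    grouped = defaultdict(list)
    for url, line in pairs:
        grouped[url].append(line)
    return dict(grouped)


# Hypothetical input; in GrepTheWeb this would be a large crawl stored in S3.
documents = {
    "http://example.com/a": "cloud computing\nweb hosting",
    "http://example.com/b": "grid computing\nCloud platform",
    "http://example.com/c": "no match here",
}

pairs = []
for url, text in documents.items():
    pairs.extend(grep_map(url, text, r"[Cc]loud"))

matches = grep_reduce(pairs)
print(matches)
```

Because each document is filtered independently, the map step can be spread over as many workers as the dataset requires, which is why an on-demand Hadoop cluster fits this workload.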

Programming in Cloud Computing
Programming loosely coupled systems
- Programming with separate cloud computing resources: computing, storage, database, queue, etc.
- Using messaging queues
- Supporting concurrency, high availability, and load spikes
Programming elastic resources
- "As a service" model for accessing and controlling resources
- Almost-zero infrastructure before and after the execution
Programming with scalable ingredients
- Scale capacity on demand
- Be a pessimist when using cloud resources
Thinking parallel
- Low cost and easy management for large clusters
- Multi-threading and multi-node programming
- Programming with a share-nothing philosophy
Programming cost-effectively
- Usage-based costing
- Infrastructure cost moves from CAPEX to OPEX
- Difficult to predict the overall cost (rethinking ROI)
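The advice about messaging queues can be illustrated with a producer that only talks to a queue and workers that only read from it, so either side can be scaled or restarted without the other knowing. In AWS the queue would be a service such as SQS; in this sketch Python's in-process `queue.Queue` stands in for it, and the squaring task and the worker count of three are arbitrary choices made for the example.

```python
import queue
import threading


def worker(tasks, results):
    """Pull tasks until a None sentinel arrives; share nothing but the queues."""
    while True:
        item = tasks.get()
        if item is None:          # sentinel: no more work for this worker
            tasks.task_done()
            break
        results.put((item, item * item))
        tasks.task_done()


tasks = queue.Queue()
results = queue.Queue()

# Start the workers; more can be added without changing the producer.
workers = [threading.Thread(target=worker, args=(tasks, results)) for _ in range(3)]
for thread in workers:
    thread.start()

# Producer side: enqueue the work, then one sentinel per worker.
for n in range(10):
    tasks.put(n)
for _ in workers:
    tasks.put(None)

tasks.join()                      # wait until every task has been processed
for thread in workers:
    thread.join()

squares = dict(results.get() for _ in range(results.qsize()))
print(squares)
```

The queue absorbs bursts of requests, so a load spike lengthens the backlog instead of overloading the workers.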

Cloud Computing Applications
Data-intensive computing
- Document processing: convert hundreds of thousands of documents from Microsoft Word to PDF; OCR millions of pages or images into raw searchable text
- Image processing: create thumbnails or low-resolution variants of an image; resize millions of images
- Video transcoding: transcode AVI to MPEG movies
- Indexing: create an index of web crawl data
- Data mining: perform search over millions of records
Batch processing systems
- Back-office applications (in the financial, insurance, or retail sectors)
- Log analysis: analyze logs and generate daily or weekly reports
- Nightly builds: perform automated builds of the source code repository every night in parallel
- Automated unit testing and deployment testing: test, deploy, and perform automated unit testing (functional, load, quality) on different deployment configurations every night
Websites
- Websites that sleep at night and auto-scale during the day
- Instant websites: websites for conferences or events (Super Bowl, sports tournaments)
- Promotion websites
- Seasonal websites: websites that only run during the tax season or the holiday season (Black Friday or Christmas)
Source: Cloud Architectures, Jinesh Varia

MapReduce: Programming for Data-Intensive Computing
Distributed processing framework invented by Google
- map(k1, v1) -> list(k2, v2)
- reduce(k2, list(v2)) -> list(v2)
- Proposed for parallel processing of large data sets
- The framework handles parallelization, fault tolerance, and data distribution
Applications
- Log analysis, search indexing, collaborative filtering, clustering, machine learning, data mining, etc.
Features
- Mapper locality
- Overlap of map, shuffle, and sort
- Speculative execution

How MapReduce Works
- map(k, v) -> list(k', v')
- reduce(k', list(v')) -> list(v')
Figure: logical processing flow of MapReduce; parallel execution of tasks
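The two signatures can be made concrete with a small single-process engine that applies a mapper to every input record, groups the intermediate values by key, and hands each group to a reducer. This is a sketch of the programming model only; it does none of the distribution, fault tolerance, or scheduling that a real framework provides, and the log lines used as input are hypothetical.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run mapper over (k1, v1) records, group by k2, then reduce each group."""
    groups = defaultdict(list)
    for k1, v1 in records:
        for k2, v2 in mapper(k1, v1):       # map(k1, v1) -> list(k2, v2)
            groups[k2].append(v2)           # shuffle: collect values per key
    # reduce(k2, list(v2)) -> list(v2), visiting keys in sorted order
    return {k2: reducer(k2, groups[k2]) for k2 in sorted(groups)}


def status_mapper(line_number, line):
    """Emit (HTTP status, 1) for one log line of the form 'METHOD PATH STATUS'."""
    status = line.split()[-1]
    return [(status, 1)]


def count_reducer(status, ones):
    """Sum the counts collected for one status code."""
    return [sum(ones)]


# Hypothetical access-log records keyed by line number.
log_records = [
    (0, "GET /index.html 200"),
    (1, "GET /missing 404"),
    (2, "POST /login 200"),
]

status_counts = map_reduce(log_records, status_mapper, count_reducer)
print(status_counts)
```

The mapper and reducer never see each other's data structures, which is what allows a framework to run many copies of each on different machines.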

MapReduce Programming: WordCount Map

    package org.myorg;

    import java.io.IOException;
    import java.util.*;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.conf.*;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapred.*;
    import org.apache.hadoop.util.*;

    public class WordCount {

        public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
            private final static IntWritable one = new IntWritable(1);
            private Text word = new Text();

            // Emits (word, 1) for every token in the input line.
            public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
                String line = value.toString();
                StringTokenizer tokenizer = new StringTokenizer(line);
                while (tokenizer.hasMoreTokens()) {
                    word.set(tokenizer.nextToken());
                    output.collect(word, one);
                }
            }
        }

MapReduce Programming: WordCount Reduce

        public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
            // Sums the counts collected for one word.
            public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
                int sum = 0;
                while (values.hasNext()) {
                    sum += values.next().get();
                }
                output.collect(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(WordCount.class);
            conf.setJobName("wordcount");

            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(IntWritable.class);

            conf.setMapperClass(Map.class);
            conf.setCombinerClass(Reduce.class);
            conf.setReducerClass(Reduce.class);

            conf.setInputFormat(TextInputFormat.class);
            conf.setOutputFormat(TextOutputFormat.class);

            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));

            JobClient.runJob(conf);
        }
    }

MapReduce Programming: How WordCount Runs
Input files (copied by the user from local disk to HDFS)
- file01.txt: Hello World Bye World
- file02.txt: Hello Hadoop Goodbye Hadoop
The user submits the job (wordcount) to the JobTracker.
Map (M) output
- file01.txt: (Hello, 1) (World, 1) (Bye, 1) (World, 1)
- file02.txt: (Hello, 1) (Hadoop, 1) (Goodbye, 1) (Hadoop, 1)
Sorter output
- (Bye, 1) (Goodbye, 1) (Hadoop, 1) (Hadoop, 1) (Hello, 1) (Hello, 1) (World, 1) (World, 1)
Reduce (R) output
- (Bye, 1) (Goodbye, 1) (Hadoop, 2) (Hello, 2) (World, 2)
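The data flow on this slide can be replayed step by step in a few lines of Python. The sketch uses the two input files shown on the slide and reproduces the map, sort, and reduce stages in a single process; it follows the figure and therefore leaves out the combiner that the Java job configures, and it is not Hadoop code.

```python
from itertools import groupby

# The two input files from the slide.
files = {
    "file01.txt": "Hello World Bye World",
    "file02.txt": "Hello Hadoop Goodbye Hadoop",
}


def mapper(line):
    """Map: emit (word, 1) for every word in the line."""
    return [(word, 1) for word in line.split()]


# Map stage: one mapper call per file, outputs concatenated.
mapped = [pair for line in files.values() for pair in mapper(line)]

# Sort stage: order the pairs by word so equal keys become adjacent.
sorted_pairs = sorted(mapped, key=lambda pair: pair[0])

# Reduce stage: sum the counts of each run of equal words.
reduced = [
    (word, sum(count for _, count in group))
    for word, group in groupby(sorted_pairs, key=lambda pair: pair[0])
]

print(mapped)
print(sorted_pairs)
print(reduced)
```

The final list matches the reducer output shown on the slide: Bye 1, Goodbye 1, Hadoop 2, Hello 2, World 2.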

Real-World MapReduce Programming (Google)
Figure: workflow of the MapReduce programs in the indexing part of the Google search engine
Figure borrowed from Michael Kleber's presentation

Research in MapReduce
Performance issues
- Evaluating MapReduce for Multi-core and Multiprocessor Systems (HPCA 2007)
- Improving MapReduce Performance in Heterogeneous Environments (OSDI 2008)
Applications and algorithm issues
- Map-reduce for machine learning on multicore (NIPS 2007)
- MRPGA: An Extension of MapReduce for Parallelizing Genetic Algorithms (eScience 2008)
- CloudBLAST: Combining MapReduce and Virtualization on Distributed Resources for Bioinformatics Applications (eScience 2008)
- MapReduce for Data Intensive Scientific Analyses (eScience 2008)
- Apache Mahout Project: Implementing Machine Learning Algorithms in MapReduce
- CloudBurst: Highly Sensitive Read Mapping with MapReduce (Oxford Bioinformatics 2009)
Frameworks and implementation issues
- MapReduce for the Cell B.E. Architecture (TR 2007)
- Mars: A MapReduce Framework on Graphics Processors (PACT 2008)
- A Map Reduce Framework for Programming Graphics Processors (STMCS 2008)
Workflow and language extension issues
- Pig Latin: A Not-So-Foreign Language for Data Processing (SIGMOD 2008)
- Map-reduce-merge: simplified relational data processing on large clusters (SIGMOD 2007)
- Interpreting the data: Parallel analysis with Sawzall (Scientific Programming Journal 2005)
- Facebook Hive: Data warehousing using Hadoop & MapReduce
- Cascading: Data processing workflow on a Hadoop cluster
- NexR MR.Flow: MapReduce workflow management service

Problems of Cloud Computing
From the Forrester Research report
- Concerns about stability
- Few big-name players offering clouds
- Few enterprise reference accounts
- Concerns around security
- Lack of commercial ISV support
- Little geographic locality
- Not for the faint-of-tech
- Not very enterprise friendly
Other problems
- Integration with in-house systems
- Application licensing complexity
- Privacy
- Constant network connectivity
- Confidence in service providers
- Open standards
- Interoperability between services

Cloud Computing Incidents Database
Source: CloudComputing:Incidents Database, Wikipedia

Service Outage Cases
Amazon S3 outage
- 8 hours on July 20, 2008 (affected: all)
- Cause: design fault (server-to-server communication)
Flexiscale outage
- 2 days from August 26, 2008 (affected: all)
- Cause: engineer mistake
Gmail outage
- 2 hours on August 11, 2008 (affected: many)
- Cause: change management
Apple MobileMe outage
- Several hours on July 10, 2008 (affected: many)
- Cause: migration from .Mac to MobileMe
Source: CloudComputing:Incidents Database, Wikipedia

Service Closure Cases
MediaMax/Linkup
- Cloud storage service
- Lost half of its users' files in July 2007
- 20,000 paying users were affected
- The service finally closed in July 2008
Zimki
- Early cloud platform service (from 2006)
- Service closed in December 2007
- Caused by the end of investment
Source: CloudComputing:Incidents Database, Wikipedia

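Solutions
- Use multiple cloud computing services (redundancy across cloud services)
- Standardize the technology (interfaces, development environments, SLAs, etc.)
- Develop and standardize inter-cloud interoperation technology: Cloud Federation
- Guarantee service levels and provide QoS based on SLAs
- Secure data through encryption and virtualization technology
- Comply with national regulations through regional data centers
- Adopt usage-based license models and volume purchase policies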
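The case for using more than one cloud service can be made with a back-of-the-envelope availability estimate. The sketch below takes the 8-hour Amazon S3 outage from the outage slide as its input; treating July 2008 as the measurement window, giving the second provider the same availability, and assuming the two providers fail independently are all simplifications introduced here, not claims from the slides.

```python
HOURS_IN_JULY = 31 * 24  # measurement window (assumption): the whole month


def availability(outage_hours, window_hours):
    """Fraction of the window during which the service was up."""
    return 1 - outage_hours / window_hours


def redundant_availability(*availabilities):
    """Availability when at least one of several independent services is up."""
    all_down = 1.0
    for a in availabilities:
        all_down *= 1 - a          # probability that every service is down
    return 1 - all_down


single = availability(outage_hours=8, window_hours=HOURS_IN_JULY)
dual = redundant_availability(single, single)   # same figure assumed for both

print(f"single provider: {single:.4%}")
print(f"two providers:   {dual:.4%}")
```

Correlated failures, such as a shared network path or a common software fault, would make the real gain smaller than this estimate suggests.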

Top 10 Obstacles to and Opportunities for Adoption and Growth of Cloud Computing
Source: Above the Clouds: A Berkeley View of Cloud Computing, UC Berkeley TR 2009

Thank You!
Jaesun Han
jshan@nexr.co.kr
Korea Hadoop Community: http://www.hadoop.or.kr