Big Data Technology Overview, 2016/8/20 ~ 9/3, 윤형기 (hky@openwith.net)

D2 http://www.openwith.net

Hadoop MR v1 and v2

Hadoop 1 MR Daemons

Why It Is Needed

Feature: Function
Multi-tenancy: YARN allows multiple access engines to use Hadoop as the common standard for batch, interactive and real-time engines that can simultaneously access the same data set. Multi-tenant data processing improves an enterprise's return on its Hadoop investments.
Cluster Utilization: Dynamic allocation of cluster resources improves MR job performance.
Scalability: Improved scheduling strengthens scalability (thousands of nodes managing PBs of data).

Hadoop 1 Limitations

Scalability:
  - Max cluster size: 4,000 nodes
  - Max concurrent tasks: 40,000
  - Coarse synchronization in the JobTracker
NameNode is a weak point:
  - Failure kills all queued and running jobs
Re-startability:
  - Restart is very tricky due to complex state
Low resource utilization:
  - Hard partition of resources into map and reduce slots
Limited to MR:
  - Doesn't support other programs
  - Iterative application implementations are 10x slower
Lack of wire-compatible protocols:
  - Client and cluster must be of the same version
  - Applications and workflows cannot migrate to different clusters

Hadoop 2 Design Concept

The JobTracker's role is split into two functions:
  - cluster resource management
  - application life-cycle management
MR becomes a user library, or one of the applications residing in Hadoop.

MRv2 Progress

MRv1 vs. MRv2

How It Works: Overview

The JobTracker/TaskTracker functions are split into finer-grained roles:
  - a global ResourceManager
  - a per-application ApplicationMaster
  - a per-node slave NodeManager
  - a per-application Container running on a NodeManager
The ResourceManager and NodeManager are newly introduced.

ResourceManager
  - Manages resource requests among applications (arbitrates resources among applications).
  - Allocates resources to applications through its scheduler.
ApplicationMaster
  - A framework-specific entity that obtains the resource containers it needs from the scheduler.
  - Negotiates with the ResourceManager, then runs its component tasks through the NodeManager(s).
  - Also tracks status and monitors progress.
NodeManager
  - A per-machine slave responsible for launching the applications' containers, monitoring their resource usage (CPU, memory, disk, network) and reporting the same to the ResourceManager.
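The division of labor above can be sketched as a toy model. This is a minimal illustration in plain Java, not the YARN API: `ToyResourceManager`, `registerNode`, `allocate`, `freeOn`, and the first-fit, memory-only policy are all hypothetical simplifications chosen for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: none of these classes belong to the real YARN API. */
final class ToyYarn {

    /** A container: a slice of one node's memory granted to one application. */
    static final class Container {
        final String appId;
        final String nodeId;
        final int memoryMb;

        Container(String appId, String nodeId, int memoryMb) {
            this.appId = appId;
            this.nodeId = nodeId;
            this.memoryMb = memoryMb;
        }

        @Override
        public String toString() {
            return appId + " -> " + nodeId + " (" + memoryMb + " MB)";
        }
    }

    /** Stands in for the global ResourceManager and its scheduler. */
    static final class ToyResourceManager {
        // Free memory per node, as each NodeManager would report it.
        private final Map<String, Integer> freeMb = new LinkedHashMap<>();

        void registerNode(String nodeId, int memoryMb) {
            freeMb.put(nodeId, memoryMb);
        }

        /** First-fit: grants up to `count` containers of `memoryMb` each. */
        List<Container> allocate(String appId, int count, int memoryMb) {
            List<Container> granted = new ArrayList<>();
            for (Map.Entry<String, Integer> node : freeMb.entrySet()) {
                while (granted.size() < count && node.getValue() >= memoryMb) {
                    node.setValue(node.getValue() - memoryMb);
                    granted.add(new Container(appId, node.getKey(), memoryMb));
                }
            }
            return granted;
        }

        int freeOn(String nodeId) {
            return freeMb.get(nodeId);
        }
    }

    public static void main(String[] args) {
        ToyResourceManager rm = new ToyResourceManager();
        rm.registerNode("node1", 4096);
        rm.registerNode("node2", 4096);
        // A per-application ApplicationMaster would issue a request like this one.
        for (Container c : rm.allocate("app-1", 5, 1024)) {
            System.out.println(c);
        }
    }
}
```

In this sketch the request for five 1 GB containers fills node1 first and spills the fifth container onto node2. A real scheduler also weighs queues, locality, and CPU, which the toy model ignores.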

Hadoop Programming

Hadoop 1.0 → 2.0

Hadoop 1.0: HDFS + MR
Hadoop 2.0 = Hadoop v1.0 +
  HDFS
    - HA support for the HDFS NameNode, using ZooKeeper for failure detection and active NameNode election
    - HDFS Federation
    - HDFS snapshots
    - Heterogeneous storage hierarchy support
    - In-memory data caching
  YARN

Hadoop 2.0

YARN = central resource scheduler = ResourceManager + NodeManager + container (= a unit of resource allocation)
Split out of the JobTracker:
  » Cluster management & job scheduling → RM
  » Job coordination → ApplicationMaster (this shift of allocation-coordination responsibilities reduces the burden on the RM)
  » + new JobHistoryServer

Hadoop 1.0

Hadoop 2.0 and YARN. Source: http://www.edureka.co/blog/introduction-tohadoop-2-0-and-advantages-of-hadoop-2-0/

YARN

Source: http://www.edureka.co/blog/introduction-tohadoop-2-0-and-advantages-of-hadoop-2-0/

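YARN Features

(1) The JobTracker is split into the RM and the ApplicationMaster
  There is one RM per YARN cluster, one AM per application, and one NM on each server in the cluster.
(2) Efficient resource management
  The NM on each server runs tasks and manages the resources they need, so the Hadoop 1.0 notion of a fixed number of Mapper and Reducer slots no longer exists. In Hadoop 2.0, Mappers and Reducers all run inside containers, and the containers themselves are sized according to the resource state of the whole cluster and the resource demands of the requested job.
(3) Wider scalability
  The previous limits of 4,000 nodes and 40,000 tasks are overcome.
(4) Support for diverse distributed processing environments
  Spark, Hama, Giraph, and others. In addition, SAP, IBM, EMC and others are working on integrating their own solutions.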
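The difference between fixed slots and containers in point (2) can be illustrated with simple arithmetic. The sketch below is a toy comparison, not Hadoop code: the 8 GB node, the 4 + 4 slot split, the 1 GB container size, and both method names are assumptions made for this example, and only memory is modeled.

```java
/** Toy comparison; the numbers and method names are illustrative assumptions. */
final class SlotVsContainer {

    /** Hadoop 1 style: map tasks may only use map slots, even if reduce slots sit idle. */
    static int runnableWithSlots(int mapSlots, int pendingMapTasks) {
        return Math.min(mapSlots, pendingMapTasks);
    }

    /** YARN style: any task may use any free memory, in container-sized pieces. */
    static int runnableWithContainers(int nodeMemoryMb, int containerMb, int pendingTasks) {
        return Math.min(nodeMemoryMb / containerMb, pendingTasks);
    }

    public static void main(String[] args) {
        int pendingMaps = 8; // map-only phase: no reduce task is ready yet
        // Node with 8 GB, hard-partitioned into 4 map slots + 4 reduce slots of 1 GB each.
        System.out.println("slots:      " + runnableWithSlots(4, pendingMaps));
        // The same node under YARN with 1 GB containers.
        System.out.println("containers: " + runnableWithContainers(8192, 1024, pendingMaps));
    }
}
```

Under these assumptions only four of the eight map tasks can run at once with slots, because the four reduce slots stay idle, while all eight fit when the same memory is handed out as containers.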

YARN Components

(1) RM: exists once per cluster; responsible for cluster-wide resource management and the scheduling of tasks.
  a. Scheduler
  b. Application Manager
  c. Resource Tracker
(2) NodeManager: monitors the resource usage of its containers and reports it to the ResourceManager.
  a. ApplicationMaster: acts as the master for a single program
  b. Container: every job is split into multiple tasks, and each task runs inside one container.

YARN usage:

MR Programming

Data Types

[Lab] Streaming pipes

Data: The Hound of the Baskervilles from www.gutenberg.org, saved as input.txt
Scripts: mapper1.py, mapper2.py, mapper3.py, reducer2.py, reducer3.py

$ ./mapper1.py < input.txt
$ ./mapper2.py < input.txt
$ ./mapper2.py < input.txt | sort
$ ./mapper2.py < input.txt | sort | ./reducer2.py
$ ./mapper3.py < input.txt | sort | ./reducer2.py
$ ./mapper3.py < input.txt | sort | ./reducer3.py | sort -r
$ ./mapper3.py < input.txt | sort | ./reducer3.py | sort -r | head -n 3
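The `mapper | sort | reducer` shape of these commands can be imitated in-process. The slides do not show the contents of the mapper and reducer scripts, so the sketch below assumes a word-count job; `StreamingSketch` and its methods are hypothetical names, and the tab-separated `word<TAB>count` line format is an assumption borrowed from the usual streaming convention.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** In-process imitation of `mapper | sort | reducer`; assumes a word-count job. */
final class StreamingSketch {

    /** Mapper: emits one "word<TAB>1" line per token. */
    static List<String> map(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            for (String token : line.trim().toLowerCase().split("\\s+")) {
                if (!token.isEmpty()) {
                    out.add(token + "\t1");
                }
            }
        }
        return out;
    }

    /** Reducer: sums the counts of consecutive lines that share a key. */
    static List<String> reduce(List<String> sortedLines) {
        List<String> out = new ArrayList<>();
        String currentKey = null;
        int sum = 0;
        for (String line : sortedLines) {
            String[] kv = line.split("\t");
            if (!kv[0].equals(currentKey)) {
                if (currentKey != null) {
                    out.add(currentKey + "\t" + sum);
                }
                currentKey = kv[0];
                sum = 0;
            }
            sum += Integer.parseInt(kv[1]);
        }
        if (currentKey != null) {
            out.add(currentKey + "\t" + sum);
        }
        return out;
    }

    /** The sort stage sits between map and reduce, like the shell pipe. */
    static List<String> run(List<String> lines) {
        List<String> mapped = map(lines);
        Collections.sort(mapped);
        return reduce(mapped);
    }

    public static void main(String[] args) {
        for (String line : run(Arrays.asList("the hound the", "hound of the"))) {
            System.out.println(line);
        }
    }
}
```

The reducer only compares each line with the previous one, which is why the sort stage is required: it guarantees that all lines for one key arrive next to each other.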

[Lab] MapReduce basics: MR and computational flows

[Lab] MR for WordCount

[Lab] MR for WordCount + adding a Combiner

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  // Mapper: emits (word, 1) for every token in the input line.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer (also usable as the combiner): sums the counts for each word.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: wordcount <in> <out>");
      System.exit(2);
    }
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    /**** To enable Combiner, uncomment! ****/
    // job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
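What enabling the combiner changes can be shown without a cluster. The sketch below is a toy model of the shuffle in plain Java, not Hadoop code: `CombinerSketch`, `localSums`, `shuffledRecords`, and `reduce` are hypothetical names, and each "split" is assumed to be the token list handled by one map task. Reusing the reducer as a combiner is safe here because summation is associative and commutative; Hadoop treats the combiner as an optional optimization, so the job must give the same answer with or without it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of the shuffle; not Hadoop code. */
final class CombinerSketch {

    /** What the combiner does: the reducer's sum, applied to one map task's output. */
    static Map<String, Integer> localSums(List<String> tokens) {
        Map<String, Integer> sums = new TreeMap<>();
        for (String token : tokens) {
            sums.merge(token, 1, Integer::sum);
        }
        return sums;
    }

    /** Number of (word, count) records one map task sends to the shuffle. */
    static int shuffledRecords(List<String> splitTokens, boolean useCombiner) {
        if (!useCombiner) {
            return splitTokens.size();        // one (word, 1) record per token
        }
        return localSums(splitTokens).size(); // one (word, n) record per distinct word
    }

    /** Reducer: merges the partial sums from all map tasks into final counts. */
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, n) -> total.merge(word, n, Integer::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> split1 = Arrays.asList("a", "b", "a");
        List<String> split2 = Arrays.asList("a", "a");
        System.out.println("without combiner: "
            + (shuffledRecords(split1, false) + shuffledRecords(split2, false)));
        System.out.println("with combiner:    "
            + (shuffledRecords(split1, true) + shuffledRecords(split2, true)));
        System.out.println("final counts:     "
            + reduce(Arrays.asList(localSums(split1), localSums(split2))));
    }
}
```

With these two splits, five records cross the shuffle without the combiner and three with it, while the final counts are unchanged.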

[Lab] Analytics with MR

A Hadoop MR program that finds the mean, maximum, and minimum file size in a web log.
Data: weblog dataset from ftp://ita.ee.lbl.gov/traces/nasa_access_log_jul95.gz

import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/**
 * @author Srinath Perera (hemapani@apache.org)
 * @author Thilina Gunarathne (thilina@apache.org)
 */
public class MsgSizeAggregateMapReduce extends Configured implements Tool {

  public static void main(String[] args) throws Exception {
    int res = ToolRunner.run(new Configuration(), new MsgSizeAggregateMapReduce(), args);
    System.exit(res);
  }

  @Override
  public int run(String[] args) throws Exception {
    if (args.length != 2) {
      System.err.println("Usage: <input_path> <output_path>");
      return -1;
    }
    /* input parameters */
    String inputPath = args[0];
    String outputPath = args[1];

    Job job = Job.getInstance(getConf(), "WebLogMessageSizeAggregator");
    job.setJarByClass(MsgSizeAggregateMapReduce.class);
    job.setMapperClass(AMapper.class);
    job.setReducerClass(AReducer.class);
    job.setNumReduceTasks(1); // a single reducer sees every size
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.setInputPaths(job, new Path(inputPath));
    FileOutputFormat.setOutputPath(job, new Path(outputPath));
    return job.waitForCompletion(true) ? 0 : 1;
  }

  // Mapper: extracts the response size from each log line and emits ("msgSize", size).
  public static class AMapper extends Mapper<Object, Text, Text, IntWritable> {
    public static final Pattern httplogPattern = Pattern.compile(
        "([^\\s]+) - - \\[(.+)\\] \"([^\\s]+) (/[^\\s]*) HTTP/[^\\s]+\" [^\\s]+ ([0-9]+)");

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      Matcher matcher = httplogPattern.matcher(value.toString());
      if (matcher.matches()) { // lines that do not match are skipped
        int size = Integer.parseInt(matcher.group(5));
        context.write(new Text("msgSize"), new IntWritable(size));
      }
    }
  }

  // Reducer: computes the mean, maximum, and minimum of all sizes.
  public static class AReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      double tot = 0;
      int count = 0;
      int min = Integer.MAX_VALUE;
      int max = 0;
      for (IntWritable v : values) {
        int value = v.get();
        tot = tot + value;
        count++;
        if (value < min) {
          min = value;
        }
        if (value > max) {
          max = value;
        }
      }
      context.write(new Text("Mean"), new IntWritable((int) (tot / count)));
      context.write(new Text("Max"), new IntWritable(max));
      context.write(new Text("Min"), new IntWritable(min));
    }
  }
}
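The parsing and aggregation logic can be tried without Hadoop. The sketch below reuses the same regular expression as AMapper and the same mean/max/min logic as AReducer in plain Java; `LogSizeSketch`, `parseSize`, and `aggregate` are hypothetical names, and the two log lines are made-up examples shaped like the access-log format the pattern expects, not lines taken from the dataset.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Plain-Java sketch of the mapper's parsing and the reducer's aggregation. */
final class LogSizeSketch {

    // Same pattern as AMapper: host, timestamp, method, path, status, size.
    static final Pattern HTTP_LOG_PATTERN = Pattern.compile(
        "([^\\s]+) - - \\[(.+)\\] \"([^\\s]+) (/[^\\s]*) HTTP/[^\\s]+\" [^\\s]+ ([0-9]+)");

    /** Returns the response size, or -1 when the line does not match the pattern. */
    static int parseSize(String line) {
        Matcher m = HTTP_LOG_PATTERN.matcher(line);
        return m.matches() ? Integer.parseInt(m.group(5)) : -1;
    }

    /** Returns {mean, max, min}; expects at least one size. */
    static int[] aggregate(int[] sizes) {
        double tot = 0;
        int min = Integer.MAX_VALUE;
        int max = 0;
        for (int size : sizes) {
            tot += size;
            min = Math.min(min, size);
            max = Math.max(max, size);
        }
        return new int[] {(int) (tot / sizes.length), max, min};
    }

    public static void main(String[] args) {
        // Made-up sample lines, for illustration only.
        String ok = "host1.example.com - - [01/Jul/1995:00:00:01 -0400] "
            + "\"GET /index.html HTTP/1.0\" 200 6245";
        String noSize = "host2.example.com - - [01/Jul/1995:00:00:09 -0400] "
            + "\"GET /images/logo.gif HTTP/1.0\" 304 -";
        System.out.println(parseSize(ok));
        System.out.println(parseSize(noSize));
        int[] stats = aggregate(new int[] {10, 20, 60});
        System.out.println("Mean=" + stats[0] + " Max=" + stats[1] + " Min=" + stats[2]);
    }
}
```

A line whose size field is "-" rather than a number does not match `([0-9]+)`, so the mapper silently drops it and it does not contribute to the mean.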

Problems with YARN

Complexity
  - Protocols are at a very low level and very verbose
Not well suited to long-running jobs
Application doesn't survive a Master crash
No built-in communication between container and master
Hard to debug

Hadoop: Strengths, Weaknesses, and Responses

Strengths of Hadoop
  - commodity h/w
  - scale-out
  - fault-tolerance
  - flexibility by MR
Weaknesses of Hadoop
  - MR!
  - Missing: schema, optimizer, index, view, ...
  - Lack of compatibility with existing tools
Solution: Hive
  - SQL-to-MR compiler + execution engine
  - Pluggable storage layer (SerDes)
Open issues
  - Hive: ANSI SQL, UDF, ...
  - MR latency overhead
  - Work still in progress