A Scalable Packet Analysis Tool Based on Hadoop
Yeonhee Lee, Data Networks Lab, Chungnam National University (yhlee06@cnu.ac.kr)
Agenda
- Internet traffic measurement
- Apache Hadoop
- A Hadoop-based traffic analysis system
- Traffic analysis examples with Hadoop
Intro: What Is Traffic?
Intro: Data Explosion
Global Trend: Data Explosion
Intro: Internet Traffic Measurement
Intro: Issues in Traffic Measurement
Legacy Traffic Measurement Tools: The Tradeoff Between Performance and Price
Idea: A Hadoop-Based Traffic Analysis System (Traffic Analysis with MapReduce)
Hadoop and MapReduce
What is Hadoop?
- An open-source framework for distributed applications
- Scales out to clusters of thousands of nodes
- Well suited to processing very large data sets
- Based on designs proposed by Google (GFS and MapReduce)
- HDFS + MapReduce
- HDFS maintains a configured number of replicas of each data block
Hadoop and MapReduce: The Distributed File System, HDFS
Hadoop and MapReduce: The MapReduce Processing Flow
Hadoop and MapReduce: The MapReduce Processing Flow
- Map(k1, v1) -> list(k2, v2)
- Reduce(k2, list[v2]) -> (k2, v3)
[Diagram: a large input is divided into splits; each split feeds a Mapper, intermediate (key, value) pairs are spilled to temporary disk, and Reducers merge them into the final result.]
Mapper: 1. reads records as <key, value> pairs; 2. extracts and emits new <key, value> pairs.
Reducer: 1. receives each key together with its list[value]; 2. processes the list[value] for that key.
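The flow above can be sketched in plain Java as a local, in-memory analogue of the contract (this is an illustration only, not the Hadoop API; the class and interface names are made up for this sketch):

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// A minimal in-memory illustration of the MapReduce contract:
// Map(k1, v1) -> list(k2, v2), then Reduce(k2, list[v2]) -> (k2, v3).
// Plain Java, not Hadoop; all names here are illustrative.
public class MiniMapReduce {

    // "Map" phase: one input record yields a list of (key, value) pairs.
    interface Mapper { List<Map.Entry<String, Integer>> map(String record); }

    // "Reduce" phase: a key plus all its values yields one result value.
    interface Reducer { int reduce(String key, List<Integer> values); }

    public static Map<String, Integer> run(List<String> records, Mapper m, Reducer r) {
        // Group intermediate pairs by key: the stand-in for shuffle/sort.
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (String rec : records)
            for (Map.Entry<String, Integer> kv : m.map(rec))
                groups.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                      .add(kv.getValue());
        // Apply the reducer once per distinct key.
        Map<String, Integer> out = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : groups.entrySet())
            out.put(e.getKey(), r.reduce(e.getKey(), e.getValue()));
        return out;
    }

    public static void main(String[] args) {
        // WordCount expressed against this contract: emit (word, 1) per
        // token, then count the values collected for each word.
        Map<String, Integer> counts = run(
            Arrays.asList("to be or", "not to be"),
            rec -> {
                List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
                for (String w : rec.split(" "))
                    pairs.add(new AbstractMap.SimpleEntry<>(w, 1));
                return pairs;
            },
            (word, ones) -> ones.size());
        System.out.println(counts);
    }
}
```

The `run` method plays the role of the framework: it shuffles the mapper's output into per-key groups and invokes the reducer once per key, which is exactly the guarantee the WordCount slides that follow rely on.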
Hadoop and MapReduce: The WordCount Example
- Counts the number of occurrences of each word in a body of text
- A simple way to characterize the text as a whole
Hadoop and MapReduce
WordCount Example: Mapper

public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();

  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String line = value.toString();
    StringTokenizer tokenizer = new StringTokenizer(line);
    while (tokenizer.hasMoreTokens()) {
      word.set(tokenizer.nextToken());
      context.write(word, one);
    }
  }
}
Hadoop and MapReduce
WordCount Example: Reducer

public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable val : values) {
      sum += val.get();
    }
    context.write(key, new IntWritable(sum));
  }
}
Hadoop and MapReduce
WordCount Example: Job Driver

public static void main(String[] args) throws Exception {
  Configuration conf = new Configuration();
  Job job = new Job(conf, "wordcount");
  job.setOutputKeyClass(Text.class);
  job.setOutputValueClass(IntWritable.class);
  job.setMapperClass(Map.class);
  job.setReducerClass(Reduce.class);
  job.setInputFormatClass(TextInputFormat.class);
  job.setOutputFormatClass(TextOutputFormat.class);
  FileInputFormat.addInputPath(job, new Path(args[0]));
  FileOutputFormat.setOutputPath(job, new Path(args[1]));
  job.waitForCompletion(true);
}
A Hadoop-Based Traffic Analysis System
System Overview
[Diagram: architecture of the P3 trace analyzer (P3TA).]
- Data source: traffic packet collector/loader (Jpcap, HDFS)
- Data cleansing (HDFS, MapReduce): I/O formats, including PcapInputFormat plus binary and text Input/OutputFormats
- Packet analyzers: DDoS analysis, web analysis, transport-layer analysis, IP-layer analysis
- User interface: visualization
References:
- A Hadoop-based Packet Trace Processing Tool, TMA 2011
- Detecting DDoS Attacks with Hadoop, ACM CoNEXT 2011 Student Workshop
Hadoop and MapReduce
How are packets fed to MapReduce as input?
- A text file can be split at newline boundaries, but a packet trace is binary: if a packet record straddles two HDFS blocks stored on different nodes, how can a Map task locate the start of a packet?
- Sample bytes at a block boundary (Block2/Block3); the recurring pattern 68 2B AD 4C marks pcap record timestamps:
  00 00 68 2B AD 4C 38 A4 04 00 5C 00 00 00 5C 00 00 00 FF FF 00 21 B5 01 68 2B AD 4C 2B 1C 07 00 3C 00 00 00 3C 00 00 00 01 80
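One plausible heuristic, sketched below, is to slide over the block byte by byte and accept an offset as a record boundary only when the 16-byte pcap record header at that offset looks sane: its timestamp falls inside the capture period, its captured length equals its wire length (as in a full-snaplen capture), and the length is within the snap limit. The class name, constants, and method here are illustrative, not the actual PcapInputFormat code:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch of a pcap record-boundary heuristic for an HDFS block that
// starts mid-packet. Assumes little-endian pcap per-record headers:
// ts_sec (4 bytes), ts_usec (4), incl_len (4), orig_len (4).
public class PcapBoundaryFinder {
    static final int HEADER_LEN = 16;     // pcap per-record header size
    static final int MAX_CAPLEN = 65535;  // typical snaplen upper bound

    // Returns the first offset that looks like a record boundary, or -1.
    public static int findBoundary(byte[] block, long tsMin, long tsMax) {
        ByteBuffer buf = ByteBuffer.wrap(block).order(ByteOrder.LITTLE_ENDIAN);
        for (int off = 0; off + HEADER_LEN <= block.length; off++) {
            long tsSec   = buf.getInt(off) & 0xFFFFFFFFL;       // ts_sec
            long caplen  = buf.getInt(off + 8) & 0xFFFFFFFFL;   // incl_len
            long wirelen = buf.getInt(off + 12) & 0xFFFFFFFFL;  // orig_len
            if (tsSec >= tsMin && tsSec <= tsMax
                    && caplen == wirelen
                    && caplen > 0 && caplen <= MAX_CAPLEN) {
                return off;  // a plausible packet record starts here
            }
        }
        return -1;
    }
}
```

A production implementation would typically also check that the next header implied by `caplen` is itself plausible before committing, to reduce false positives on payload bytes that happen to resemble a header.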
Traffic Analysis
- Per-website metrics: number of visitors, traffic volume, visitor counts by URL response time
- Retransmission rate and packet loss rate: indicators of network congestion
- Traffic volume and throughput
Benchmarking Test
[Chart: job completion time in minutes versus trace size (10, 100, 200, 400 GB) for PcapRate, CoralReef, and P3(H,10); on the largest trace, roughly 90.3 minutes versus 12.5 minutes.]
[Chart: speed-up of P3(H,10) over CoralReef, ranging from 2.9x to 7.5x (2.9, 6.4, 7.3, 7.5) across the 10 to 400 GB traces.]
Packet-Analysis MapReduce Programming
1. Which employees in our company use the Internet the most?
- PacketCount: computes the number of packets per IP address from a collection of packets
Packet-Analysis MapReduce Programming
PacketCount Example: Mapper

public static class Map
    extends Mapper<LongWritable, BytesWritable, Text, IntWritable> {
  private final int MIN_PKT_SIZE = 42;
  private final static IntWritable one = new IntWritable(1);
  private PacketStatsWritable ps = new PacketStatsWritable();

  public void map(LongWritable key, BytesWritable value, Context context)
      throws IOException, InterruptedException {
    if (value.getBytes().length < MIN_PKT_SIZE) return;
    ps.parse(value.getBytes());          // extract each packet field
    context.write(ps.getSrc_ip(), one);
  }
}
Packet-Analysis MapReduce Programming
PacketCount Example: Reducer

public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable val : values) {
      sum += val.get();
    }
    context.write(key, new IntWritable(sum));
  }
}
Packet-Analysis MapReduce Programming
PacketCount Example: Job Driver

public static void main(String[] args) throws Exception {
  Configuration conf = new Configuration();
  Job job = new Job(conf, "packetcount");
  job.setOutputKeyClass(Text.class);
  job.setOutputValueClass(IntWritable.class);
  job.setMapperClass(Map.class);
  job.setReducerClass(Reduce.class);
  job.setInputFormatClass(PcapInputFormat.class);
  job.setOutputFormatClass(TextOutputFormat.class);
  FileInputFormat.addInputPath(job, new Path(args[0]));
  FileOutputFormat.setOutputPath(job, new Path(args[1]));
  job.waitForCompletion(true);
}
Packet-Analysis MapReduce Programming
2. Which websites are popular in our company?
Packet-Analysis MapReduce Programming
2. Which websites are popular in our company?
1) What is the list of unique users who accessed a given server?
Packet-Analysis MapReduce Programming
Unique Users of a Server: Mapper

public static class Map
    extends Mapper<LongWritable, BytesWritable, TextPair, IntWritable> {
  private final int MIN_PKT_SIZE = 42;
  private final static IntWritable one = new IntWritable(1);

  public void map(LongWritable key, BytesWritable value, Context context)
      throws IOException, InterruptedException {
    if (value.getBytes().length < MIN_PKT_SIZE) return;
    PacketStatsWritable ps = new PacketStatsWritable();
    if (ps.parse(value.getBytes()))  // extract the packet's fields
      context.write(new TextPair(ps.getDst_ip(), ps.getSrc_ip()), one);
  }
}
Packet-Analysis MapReduce Programming
Unique Users of a Server: Reducer

public static class Reduce
    extends Reducer<TextPair, IntWritable, TextPair, IntWritable> {
  private final static IntWritable one = new IntWritable(1);

  public void reduce(TextPair key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    context.write(key, one);  // emit each (server, client) pair exactly once
  }
}
Packet-Analysis MapReduce Programming
Unique Users of a Server: Job Driver

public static void main(String[] args) throws Exception {
  Configuration conf = new Configuration();
  Job job = new Job(conf, "PopularityCountUp");
  job.setOutputKeyClass(TextPair.class);
  job.setOutputValueClass(IntWritable.class);
  job.setMapperClass(Map.class);
  job.setReducerClass(Reduce.class);
  job.setInputFormatClass(PcapInputFormat.class);
  job.setOutputFormatClass(SequenceFileOutputFormat.class);
  FileInputFormat.addInputPath(job, new Path(args[0]));
  FileOutputFormat.setOutputPath(job, new Path(args[1]));
  job.waitForCompletion(true);
}
Packet-Analysis MapReduce Programming
2. Which websites are popular in our company?
2) How many users accessed a given server at least once?
Packet-Analysis MapReduce Programming
Per-Website Unique User CountUp: Mapper

public static class Map extends Mapper<TextPair, IntWritable, Text, IntWritable> {
  private final static IntWritable one = new IntWritable(1);

  public void map(TextPair key, IntWritable value, Context context)
      throws IOException, InterruptedException {
    context.write(key.getFirst(), one);  // key.getFirst() is the server IP
  }
}
Packet-Analysis MapReduce Programming
Per-Website Unique User CountUp: Reducer

public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable val : values) {
      sum += val.get();
    }
    context.write(key, new IntWritable(sum));
  }
}
Packet-Analysis MapReduce Programming
Per-Website Unique User CountUp: Job Driver

public static void main(String[] args) throws Exception {
  Configuration conf = new Configuration();
  Job job = new Job(conf, "popularitygen");
  job.setOutputKeyClass(Text.class);
  job.setOutputValueClass(IntWritable.class);
  job.setMapperClass(Map.class);
  job.setReducerClass(Reduce.class);
  // This second job reads the (TextPair, IntWritable) sequence file
  // produced by the first job, not the raw pcap trace.
  job.setInputFormatClass(SequenceFileInputFormat.class);
  job.setOutputFormatClass(SequenceFileOutputFormat.class);
  FileInputFormat.addInputPath(job, new Path(args[0]));
  FileOutputFormat.setOutputPath(job, new Path(args[1]));
  job.waitForCompletion(true);
}
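Chained together, the two jobs compute the number of distinct clients per server: the first collapses (server, client) pairs to unique pairs, the second counts pairs per server. The same composition can be simulated locally in plain Java (illustrative names, no Hadoop dependency):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// A local, in-memory sketch of the two chained jobs above.
public class PopularityCount {

    // Stage 1: emulate the "unique users" job by deduplicating
    // (server, client) pairs; the Hadoop shuffle does this via the key.
    public static Set<List<String>> uniquePairs(List<String[]> packets) {
        Set<List<String>> pairs = new HashSet<>();
        for (String[] p : packets)                 // p = {srcIp, dstIp}
            pairs.add(Arrays.asList(p[1], p[0]));  // key = (server, client)
        return pairs;
    }

    // Stage 2: emulate the count-up job by counting pairs per server.
    public static Map<String, Integer> countUp(Set<List<String>> pairs) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> pair : pairs)
            counts.merge(pair.get(0), 1, Integer::sum);
        return counts;
    }

    public static void main(String[] args) {
        List<String[]> packets = Arrays.asList(
            new String[]{"10.0.0.1", "93.184.216.34"},
            new String[]{"10.0.0.1", "93.184.216.34"},  // repeat visit
            new String[]{"10.0.0.2", "93.184.216.34"});
        System.out.println(countUp(uniquePairs(packets)));
    }
}
```

Note that a repeat visit by the same client is counted once, which is exactly why the first job is needed before the count-up: counting raw packets per server would measure traffic volume, not popularity.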
CNU Flow Monitoring System
- Flow analysis pipeline using Hadoop + Hive
- PcapInputFormat download: https://sites.google.com/a/networks.cnu.ac.kr/yhlee/download