Hadoop Tutorial - Installation and Execution
2008. 7. 17
Jaesun Han (CEO, NexR)
jshan0000@gmail.com
http://www.web2hub.com
H.P: 016-405-5469
Hadoop Introduction - Brief History
- Started in 2005 by Doug Cutting (developer of Lucene & Nutch)
- Grew out of the need to scale the Nutch open source search engine across distributed machines
- 2006: full backing from Yahoo (hired Doug Cutting and a dedicated team)
- 2008: promoted to an Apache top-level project; current release (as of April 2008) is 0.16.3

Hadoop
- Java-based
- Apache license
- Many components: HDFS, HBase, MapReduce, Hadoop On Demand (HOD), Streaming, HQL, Hama, Mahout, etc.
Hadoop Architecture
- Nutch: open source search engine            (Google counterpart: Google Search)
- MapReduce: distributed data processing system   (Google counterpart: MapReduce)
- HBase: distributed database                 (Google counterpart: Bigtable)
- HDFS: distributed file system               (Google counterpart: GFS)
- Runs on a cluster of commodity PC servers
Hadoop Versions
  Version   Release
  0.16.3    2008.4.16
  0.16.4    2008.5.5
  0.17.0    2008.5.20
  0.17.1    2008.6.23   <- current stable version
  0.17.2    not released yet
  0.18.0    not released yet
  0.19.0    not released yet
HBase Versions
  Version   Release
  0.1.0     2008.3.27
  0.1.1     2008.4.11
  0.1.2     2008.5.13
  0.1.3     2008.6.27   <- current stable version
  0.2       not released yet
Hadoop Project Issue Tracking
http://issues.apache.org/jira/browse/hadoop
Hadoop Project Status
Cumulative counts of issues created and issues resolved over the last 100 days (chart)
Hadoop Installation
1. Download hadoop-0.17.1.tar.gz and unpack it.
2. Edit conf/hadoop-env.sh:
   export JAVA_HOME=/usr/java/jdk1.6.0_03
3. Edit conf/hadoop-site.xml (copy the properties you need from conf/hadoop-default.xml and override them):
   <property>
     <name>hadoop.tmp.dir</name>
     <value>/home/${user.name}/tmp/hadoop-0.17.1-${user.name}</value>
     <description>A base for other temporary directories.</description>
   </property>
   <property>
     <name>fs.default.name</name>
     <value>hdfs://192.168.1.2:9000/</value>
   </property>
   <property>
     <name>mapred.job.tracker</name>
     <value>192.168.1.2:9001</value>
   </property>
   <property>
     <name>dfs.replication</name>
     <value>3</value> <!-- set to 1 to reduce warnings when running on a single node -->
   </property>
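For reference, conf/hadoop-site.xml is a complete XML document: the <property> elements above go inside a <configuration> root element. A minimal sketch of the file skeleton, using the same addresses as on this slide:

   <?xml version="1.0"?>
   <configuration>
     <property>
       <name>fs.default.name</name>
       <value>hdfs://192.168.1.2:9000/</value>
     </property>
     <property>
       <name>mapred.job.tracker</name>
       <value>192.168.1.2:9001</value>
     </property>
     <!-- hadoop.tmp.dir and dfs.replication as shown on this slide -->
   </configuration>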
Hadoop Installation (cont.)
4. Edit conf/masters:
   192.168.1.2
5. Edit conf/slaves:
   192.168.1.3
   192.168.1.4
   192.168.1.5
6. Register the ssh public key:
   $ ssh-keygen
   $ ssh-copy-id -i ~/.ssh/id_rsa.pub id@server   // id and address of the server to connect to
7. Format HDFS:
   $ bin/hadoop namenode -format
8. Start:
   $ bin/start-all.sh
9. Stop:
   $ bin/stop-all.sh
% Use the same HADOOP_HOME path on the master and all slaves; when started from the master, it is kept in sync with the slaves via rsync.
% If something goes wrong, check the iptables configuration: either remove the iptables rules or open the ports Hadoop uses (see the port list below).
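A quick sanity check after bin/start-all.sh (not on the original slide): jps should list the Hadoop daemons on each machine, roughly as follows:

   $ jps    // on the master
   <pid> NameNode
   <pid> SecondaryNameNode
   <pid> JobTracker
   $ jps    // on each slave
   <pid> DataNode
   <pid> TaskTracker

Ports to open if iptables stays enabled, with the configuration above: 9000 (fs.default.name) and 9001 (mapred.job.tracker), plus the Hadoop 0.17 defaults 50010 (DataNode data transfer), 50070 (NameNode web UI), 50030 (JobTracker web UI), 50060 (TaskTracker web UI), 50075 (DataNode web UI), and 50090 (SecondaryNameNode).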
Hadoop Execution - DFS
$ bin/hadoop dfs
Usage: java FsShell
  [-ls <path>] [-lsr <path>]
  [-du <path>] [-dus <path>] [-count <path>]
  [-mv <src> <dst>] [-cp <src> <dst>] [-rm <path>] [-rmr <path>] [-expunge]
  [-put <localsrc> ... <dst>]
  [-copyFromLocal <localsrc> ... <dst>]
  [-moveFromLocal <localsrc> ... <dst>]
  [-get [-ignoreCrc] [-crc] <src> <localdst>]
  [-getmerge <src> <localdst> [addnl]]
  [-cat <src>] [-text <src>]
  [-copyToLocal [-ignoreCrc] [-crc] <src> <localdst>]
  [-moveToLocal [-crc] <src> <localdst>]
  [-mkdir <path>]
  [-setrep [-R] [-w] <rep> <path/file>]
  [-touchz <path>] [-test -[ezd] <path>] [-stat [format] <path>] [-tail [-f] <file>]
  [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
  [-chown [-R] [OWNER][:[GROUP]] PATH...]
  [-chgrp [-R] GROUP PATH...]
  [-help [cmd]]
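The same file operations are also available from Java through the FileSystem API. A minimal sketch, not from the original slides: the class name DfsExample and the file name file03.txt are made up for illustration, and hadoop-site.xml is assumed to be on the classpath.

   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.fs.FSDataOutputStream;
   import org.apache.hadoop.fs.FileSystem;
   import org.apache.hadoop.fs.Path;

   public class DfsExample {
       public static void main(String[] args) throws Exception {
           Configuration conf = new Configuration();   // reads hadoop-site.xml from the classpath
           FileSystem fs = FileSystem.get(conf);       // connects to fs.default.name

           Path dir = new Path("wordcount/input");     // relative paths resolve under /user/<name>
           if (!fs.exists(dir)) {
               fs.mkdirs(dir);                         // equivalent of: bin/hadoop dfs -mkdir
           }

           FSDataOutputStream out = fs.create(new Path(dir, "file03.txt"));
           out.writeBytes("Hello HDFS\n");             // equivalent of: bin/hadoop dfs -put
           out.close();
       }
   }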
Hadoop Execution - MapReduce
1. Pack the compiled MapReduce classes into a jar file (a build sketch follows this slide):
$ jar -cvf wordcount.jar -C wordcount_classes/ .
2. Create the input directory on DFS and copy the input files into it:
$ bin/hadoop dfs -mkdir wordcount/input
$ bin/hadoop dfs -ls wordcount
/user/jshan/wordcount/input <dir>
$ bin/hadoop dfs -put file01.txt wordcount/input   // file01.txt is a local file
$ bin/hadoop dfs -put file02.txt wordcount/input   // file02.txt is a local file
$ bin/hadoop dfs -ls wordcount/input
/user/jshan/wordcount/input/file01.txt <r 1>
/user/jshan/wordcount/input/file02.txt <r 1>
$ bin/hadoop dfs -cat wordcount/input/file01.txt
Hello World Bye World
$ bin/hadoop dfs -cat wordcount/input/file02.txt
Hello Hadoop Goodbye Hadoop
3. Run MapReduce:
$ bin/hadoop jar wordcount.jar org.myorg.WordCount /user/jshan/wordcount/input /user/jshan/wordcount/output
$ bin/hadoop dfs -cat /user/jshan/wordcount/output/part-00000
Bye 1
Goodbye 1
Hadoop 2
Hello 2
World 2
Source: http://hadoop.apache.org/core/docs/r0.17.1/mapred_tutorial.html
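One step the slide presumes: the wordcount_classes directory in step 1 comes from compiling WordCount.java (shown on the next two slides) against the Hadoop core jar. A sketch, assuming the 0.17.1 jar name in the unpacked distribution:

   $ mkdir wordcount_classes
   $ javac -classpath ${HADOOP_HOME}/hadoop-0.17.1-core.jar -d wordcount_classes WordCount.java
   $ jar -cvf wordcount.jar -C wordcount_classes/ .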
Hadoop MapReduce Programming

package org.myorg;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class WordCount {

    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        // Emit (token, 1) for every whitespace-separated token in the line.
        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output,
                        Reporter reporter) throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }
Hadoop MapReduce Programming (cont.)

    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        // Sum all the counts emitted for the same word.
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output,
                           Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);   // reuse Reduce as a map-side combiner
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
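A note on the setCombinerClass call: Reduce can double as the combiner because word counting is an associative, commutative sum. The combiner pre-aggregates the (word, 1) pairs on each map node before the shuffle, which reduces network traffic when the input contains many repeated words.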
Hadoop DFS Administration Tool
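The DFS admin console shown here is the NameNode's built-in web UI; with the configuration on the installation slides it would be reachable at http://192.168.1.2:50070/ (the Hadoop 0.17 default port, assuming dfs.http.address was not changed). It reports cluster capacity, live/dead DataNodes, and lets you browse the file system.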
Hadoop MapReduce Administration Tool
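Likewise, the MapReduce admin console is the JobTracker's web UI, by default at port 50030 in Hadoop 0.17 (e.g. http://192.168.1.2:50030/); running, completed, and failed jobs and their map/reduce task progress can be inspected there.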