Big Data Platform Operations: Common Issues and Efficient Management Practices
OPUSLab Chief Architect Byung-Gon Kim
Hadoop Installation: Cautions and Considerations

OS
- For Cloudera CDH, use the Oracle JDK, not OpenJDK; note that JDK version constraints apply.
- Install the JDK from the RPM package.
- NTP time synchronization is the most important item (set up NTP if it is not already configured).
- Turn off iptables and SELinux.
- Turn off Transparent Huge Pages (THP). THP is enabled by default since RHEL 6; it was intended to improve performance but often degrades it, so disabling it is recommended.
- For kernels older than 2.6.32-303, set vm.swappiness to 0; for later kernels, set it to 1.

Network
- DNS is required when Hadoop Security is applied (for clusters of 10 or more nodes, prefer DNS over /etc/hosts).
- Turn off IPv6.
- DNS must resolve both forward and reverse lookups (problems occur when reverse lookup fails).
- When mapping IP addresses to hostnames, the hostname must be an FQDN (Fully Qualified Domain Name), e.g. 10.10.10.1 hadoop1.example.com hadoop1
- Avoid hosts with two IP addresses.

Disk
- Configure the disks backing HDFS on DataNodes as JBOD (Just a Bunch Of Disks), not RAID, and do not use LVM.
- ext3 is not recommended on RHEL 6.x; xfs is the default on later releases.
- The OS reserves 5% of each filesystem by default; reduce it to 0% (on large volumes even 5% is substantial).
- In the BIOS, disable IDE emulation for the SATA driver.

Etc.
- The versions of the other servers required by Cloudera Manager and Cloudera CDH services are important. For example, using an uncertified MySQL version causes installation problems that take significant time and effort to resolve.
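The kernel-level settings above can be captured as small configuration fragments. A sketch, assuming the usual RHEL 6 file locations; pick the vm.swappiness value that matches your kernel version as noted above:

```shell
# /etc/sysctl.conf -- kernels 2.6.32-303 and later use 1, older kernels use 0
vm.swappiness = 1

# /etc/rc.local -- disable Transparent Huge Pages at boot
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
```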
YARN Scheduler

When many users share a Hadoop cluster, a policy is needed for managing the cluster's resources. Hadoop provides the Capacity Scheduler and the Fair Scheduler out of the box to manage resource allocation.

Fair Scheduler
- Manages resources so that all jobs get a fair share of the cluster.
- Minimizes the hard problem of dividing scarce resources among many users.
- Does not reserve resources in advance; resources are assigned dynamically on demand.
- Schedules on memory only.
- Policies: FairSharePolicy, FifoPolicy, DominantResourceFairnessPolicy.

Capacity Scheduler
- Aims at a multi-tenant cluster that maximizes utilization of the Hadoop cluster.
- Schedules on both memory and CPU.
- Prevents a single application, user, or queue from monopolizing resources.
- Multi-tenancy support: idle capacity can be assigned to other queues, strict ACLs can be applied per queue, and capacity guarantees are provided.
- Supports hierarchical queues; configuration can be changed safely at runtime with minimal impact; supports mapping users and groups to queues.
- Within a queue, jobs from the same user are processed FIFO, so jobs submitted later are left pending behind earlier ones.
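As a concrete sketch of the Fair Scheduler side (the queue names and numbers here are illustrative, not from the original deck), queues and their shares are declared in the allocation file referenced by yarn.scheduler.fair.allocation.file:

```xml
<!-- fair-scheduler.xml (allocation file); queue names are illustrative -->
<allocations>
  <queue name="analytics">
    <minResources>10240 mb, 10 vcores</minResources>
    <maxResources>40960 mb, 40 vcores</maxResources>
    <weight>2.0</weight>
    <schedulingPolicy>fair</schedulingPolicy>
  </queue>
  <queue name="etl">
    <weight>1.0</weight>
  </queue>
</allocations>
```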
YARN Scheduler

Establish, configure, and apply a policy that limits the resources (memory, vcores, etc.) each user of the Hadoop cluster may consume. Each scheduler behaves differently, so the configuration must follow the characteristics of the scheduler in use.

[Diagrams: FIFO Scheduler, Capacity Scheduler, Fair Scheduler behavior]
Capacity Scheduler Configuration
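As a concrete sketch (queue names and percentages are illustrative, not from the original deck), a two-queue capacity-scheduler.xml might look like:

```xml
<!-- capacity-scheduler.xml: two queues under root; names and numbers are illustrative -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>default,batch</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.capacity</name>
  <value>70</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.batch.capacity</name>
  <value>30</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.batch.maximum-capacity</name>
  <value>50</value>
</property>
```

Capacities within one parent must sum to 100; maximum-capacity lets a queue borrow idle capacity up to that ceiling.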
Cloudera Manager 의 Scheduler Configuration
Node Label

- When servers of different specifications are mixed in one cluster (e.g. after an expansion), performance differences arise.
- Servers purchased two years ago have lower specifications (CPU, RAM, etc.) than servers added later.
- As a result, when MapReduce or Spark jobs run, the tasks placed on the older servers drag down the performance of the whole job.
- There are also cases where certain users' jobs should be assigned to servers of a particular specification.
Node Label

- Node Labels first appeared in Hadoop 2.6; applying them effectively partitions one cluster into several.
- There are current issues that make them hard to use directly from MapReduce, Hive, etc.; improvements are expected.
- Cloudera CDH does not currently support Node Labels.
- Only the Capacity Scheduler supports them (scheduling vcore and memory resources).
- Scheduling can incur excessive latency.
- A YARN application can specify only one label at a time.
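A minimal sketch of turning node labels on (the label name x, the hostname, and the NameNode URI are illustrative):

```xml
<!-- yarn-site.xml -->
<property>
  <name>yarn.node-labels.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.node-labels.fs-store.root-dir</name>
  <value>hdfs://namenode:8020/yarn/node-labels</value>
</property>
```

Labels are then registered and assigned with `yarn rmadmin -addToClusterNodeLabels "x(exclusive=true)"` and `yarn rmadmin -replaceLabelsOnNode "host1=x"`, and each Capacity Scheduler queue must list the labels it may access via `yarn.scheduler.capacity.<queue-path>.accessible-node-labels`.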
Node Label

# sudo su yarn
# hadoop jar hadoop-yarn-applications-distributedshell.jar \
    -shell_command "sleep 100" \
    -jar hadoop-yarn-applications-distributedshell.jar \
    -num_containers 30 \
    -queue a1 \
    -node_label_expression x
HDFS Rebalancing

- When a Hadoop cluster runs low on HDFS capacity and is expanded, capacity is no longer evenly distributed across disks and servers.
- New data is distributed appropriately, but old data stays on the existing servers.
- The storage usage across the servers that make up HDFS therefore has to be balanced (HDFS rebalancing).
- The more data there is, the longer rebalancing takes.
- Two kinds: disk balancing between disks within the same node, and storage balancing between different nodes.
- For HBase, Impala, etc., performance can drop sharply after rebalancing, so follow-up actions such as restarts are mandatory.
HDFS Rebalancing: Disk Balancing

- Disk balancing applies when the JBOD-based disks within a single server become unevenly filled.
- The imbalance between disks, caused by adding disks or heavy use, needs to be resolved.
- Round-Robin and Available Space volume choosing policies can be applied (the Available Space Choosing Policy is not recommended).
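The choosing policy mentioned above is selected with a single DataNode property. This sketch shows how the (not generally recommended) available-space policy would be enabled; the default is the round-robin policy:

```xml
<!-- hdfs-site.xml: volume choosing policy for new block placement on a DataNode -->
<property>
  <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
  <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>
```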
HDFS Rebalancing: Disk Balancing

To enable disk balancing, set dfs.disk.balancer.enabled to true in Cloudera Manager as follows.
HDFS Rebalancing: Disk Balancing

# df -h
/var/disk1  5.8G  3.6G  1.9G  66%  /mnt/disk1
/var/disk2  5.8G   13M  5.5G   1%  /mnt/disk2

# hdfs diskbalancer -plan lei-dn-3.example.org
16/08/19 18:04:01 INFO planner.GreedyPlanner: Starting plan for Node : lei-dn-3.example.org:20001
16/08/19 18:04:01 INFO planner.GreedyPlanner: Disk Volume set 03922eb1-63af-4a16-bafe-fde772aee2fa Type : DISK plan completed.
16/08/19 18:04:01 INFO planner.GreedyPlanner: Compute Plan for Node : lei-dn-3.example.org:20001 took 5 ms
16/08/19 18:04:01 INFO command.Command: Writing plan to : /system/diskbalancer/2016-Aug-19-18-04-01

# hdfs diskbalancer -execute /system/diskbalancer/2016-Aug-17-17-03-56/172.26.10.16.plan.json
16/08/17 17:22:08 INFO command.Command: Executing "execute plan" command

# df -h
Filesystem   Size  Used  Avail  Use%  Mounted on
/var/disk1   5.8G  2.1G   3.5G   37%  /mnt/disk1
/var/disk2   5.8G  1.6G   4.0G   29%  /mnt/disk2
HDFS Rebalancing Server Storage Balancing by Cloudera Manager
HDFS Rebalancing: Server Storage Balancing by Hadoop CLI

$ hdfs dfsadmin -setBalancerBandwidth 100000000
$ hdfs balancer -Dfs.defaultFS=hdfs://<NN_HOSTNAME>:8020 \
    -Ddfs.balancer.movedWinWidth=5400000 \
    -Ddfs.balancer.moverThreads=1000 \
    -Ddfs.balancer.dispatcherThreads=200 \
    -Ddfs.datanode.balance.max.concurrent.moves=5 \
    -Ddfs.balance.bandwidthPerSec=100000000 \
    -Ddfs.balancer.max-size-to-move=10737418240 \
    -threshold 5
HDFS Transparent Encryption

- Encryption and decryption happen in the HDFS client.
- Key management (in Cloudera's case, the Cloudera Navigator Key Trustee Server) lives outside HDFS, so HDFS never has access to unencrypted data or to the encryption keys.
- The OS and HDFS handle only encrypted HDFS data, which counters OS- and filesystem-level threats.
HDFS Transparent Encryption

- Encrypts specific areas (zones) of HDFS: Encryption Zones.
- Data in an encryption zone is encrypted and decrypted transparently.
- An administrator assigns an encryption zone key (EZ key) when the zone is created.
- Each file in an encryption zone gets its own encryption key, called the Data Encryption Key (DEK).
- The DEK is encrypted with the EZ key to produce the EDEK.
- EDEKs are stored in the NameNode as part of each file's metadata.
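Transparent encryption assumes a KMS that HDFS clients and the NameNode can reach. A minimal sketch of pointing them at one (the host and port are illustrative):

```xml
<!-- core-site.xml: key provider used for encryption zone operations -->
<property>
  <name>hadoop.security.key.provider.path</name>
  <value>kms://http@kms-host:16000/kms</value>
</property>
```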
HDFS Transparent Encryption
HDFS Transparent Encryption

# As the normal user, create a new encryption key
hadoop key create mykey

# As the super user, create a new empty directory and make it an encryption zone
hadoop fs -mkdir /zone
hdfs crypto -createZone -keyName mykey -path /zone

# chown it to the normal user
hadoop fs -chown myuser:myuser /zone

# As the normal user, put a file in, read it out
hadoop fs -put helloWorld /zone
hadoop fs -cat /zone/helloWorld

# As the normal user, get encryption information from the file
hdfs crypto -getFileEncryptionInfo -path /zone/helloWorld
# console output:
# {cipherSuite: {name: AES/CTR/NoPadding, algorithmBlockSize: 16},
#  cryptoProtocolVersion: CryptoProtocolVersion{description='Encryption zones', version=1, unknownValue=null},
#  edek: 2010d301afbd43b58f10737ce4e93b39, iv: ade2293db2bab1a2e337f91361304cb3,
#  keyName: mykey, ezKeyVersionName: mykey@0}
256-bit HDFS Encryption Performance Improvement

- Hadoop HDFS encryption uses the AES-CTR (Advanced Encryption Standard, Counter mode) algorithm by default.
- Either AES-128 or AES-256 can be used. AES-128 ships with the JDK; AES-256 requires a separate install because of US export regulations.
- AES-256 requires installing the JCE (Java Cryptography Extension) Unlimited Strength policy files: http://www.oracle.com/technetwork/java/javase/downloads/jce-7-download-432124.html
- For AES performance, AES-NI can be applied (hardware acceleration on Intel CPUs encrypts with minimal performance loss).
- Intel CPUs since Westmere (released in 2010) ship with the Intel Advanced Encryption Standard Instructions (AES-NI).
- Activated only in the JDK Server VM.

$ wget http://download.oracle.com/otn-pub/java/jce/7/UnlimitedJCEPolicyJDK7.zip
$ unzip UnlimitedJCEPolicyJDK7.zip
$ cp local_policy.jar $JAVA_HOME/jre/lib/security
$ cp US_export_policy.jar $JAVA_HOME/jre/lib/security
256-bit HDFS Encryption Performance Improvement

- CDH requires additional steps to optimize HDFS Transparent Encryption performance.
- For further speedup, install OpenSSL's libcrypto.so on the HDFS and MapReduce client hosts.

$ sudo yum install openssl-devel
$ wget http://mirror.centos.org/centos/6/os/x86_64/packages/openssl-1.0.1e-30.el6.x86_64.rpm
$ rpm2cpio openssl-1.0.1e-30.el6.x86_64.rpm | cpio -idmv
$ sudo mkdir -p /var/lib/hadoop/extra/native
$ sudo cp ./usr/lib64/libcrypto.so.1.0.1e /var/lib/hadoop/extra/native/libcrypto.so

$ hadoop checknative
14/12/12 13:48:39 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
14/12/12 13:48:39 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop:  true /usr/lib/hadoop/lib/native/libhadoop.so.1.0.0
zlib:    true /lib64/libz.so.1
snappy:  true /usr/lib64/libsnappy.so.1
lz4:     true revision:99
bzip2:   true /lib64/libbz2.so.1
openssl: true /usr/lib64/libcrypto.so
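Once libcrypto.so is in place, the OpenSSL-backed AES-CTR codec can be preferred over the pure-Java JCE codec. A core-site.xml sketch:

```xml
<!-- core-site.xml: try the OpenSSL codec first, fall back to the JCE codec -->
<property>
  <name>hadoop.security.crypto.codec.classes.aes.ctr.nopadding</name>
  <value>org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec,org.apache.hadoop.crypto.JceAesCtrCryptoCodec</value>
</property>
```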
256-bit HDFS Encryption Performance Improvement (benchmark charts)
MapReduce Performance Profiling

- Since Spark, MapReduce is often treated like a relic of the past, but it remains a stable and capable parallel/distributed framework.
- MapReduce work is split into Map and Reduce tasks that run across many JVMs on Hadoop worker nodes.
- Various performance problems surface during development, and tuning a MapReduce job is always a hard problem.
- Environmental factors aside, it is worth considering how to profile a MapReduce job from the Java performance perspective.
MapReduce Performance Profiling

- When MapReduce code uses external Java libraries, consumes excessive memory, or evaluates complex expressions, jobs can become long-running or fail with Out Of Memory (OOM) errors.
- The MapReduce framework runs a profiler when the mapreduce.task.profile option is true.
- With profiling enabled, a Java profiler is attached to two or three of the Map and Reduce tasks.

Configuration API:
- Enable the profiler: Configuration.setBoolean(MRJobConfig.TASK_PROFILE, true)
- Set how many map/reduce tasks to profile: Configuration.set(MRJobConfig.NUM_{MAP|REDUCE}_PROFILES, "0-2")
- Set the profiling options: Configuration.set(MRJobConfig.TASK_PROFILE_PARAMS, "-agentlib:hprof=cpu=samples,heap=sites,force=n,thread=y,verbose=n,file=%s")
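The same switches can be set as plain configuration rather than through the Java API. A mapred-site.xml sketch mirroring the options above:

```xml
<!-- mapred-site.xml: profile the first three map and reduce tasks of each job -->
<property>
  <name>mapreduce.task.profile</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.task.profile.maps</name>
  <value>0-2</value>
</property>
<property>
  <name>mapreduce.task.profile.reduces</name>
  <value>0-2</value>
</property>
<property>
  <name>mapreduce.task.profile.params</name>
  <value>-agentlib:hprof=cpu=samples,heap=sites,force=n,thread=y,verbose=n,file=%s</value>
</property>
```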
MapReduce Performance Profiling

The MapReduce framework runs a profiler when mapreduce.task.profile is true; with profiling enabled, a Java profiler is attached to two or three of the Map and Reduce tasks.

# hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.12.0.jar wordcount \
    -D mapreduce.task.profile=true \
    /root/wordcount/input /root/wordcount/output
MapReduce Performance Profiling

Applying the YourKit Java Profiler to Hadoop MapReduce:

# hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.12.0.jar wordcount \
    -D mapreduce.task.profile=true \
    -D mapreduce.task.profile.params=-agentpath:/usr/local/yourkit/bin/linux-x86-64/libyjpagent.so=onexit=snapshot,dir=/snapshots \
    /root/wordcount/input /root/wordcount/output
MapReduce Performance Profiling

The generated snapshot can be loaded into the YourKit Java Profiler to analyze the profiling results.
Hadoop EcoSystem Realtime Monitoring

- Cloudera Manager provides a wide range of monitoring features, but when building your own tooling or supplementing its capabilities, a realtime monitoring system can be developed in-house.
- Realtime information can be collected with Hadoop Metrics2, the Oozie event listener, bytecode instrumentation, and similar hooks to build a monitoring system.

[Architecture diagram: YARN monitoring, the History Server, the Oozie Server, and Cloudera Manager feed a monitoring service; an alert service notifies operators when failures occur.]
Hadoop EcoSystem Realtime Monitoring: Hadoop Metrics2 System

- Hadoop provides the Metrics2 system to collect performance metrics from Hadoop services (NameNode, DataNode, ...).
- Metrics can be collected by configuring sinks in the conf/hadoop-metrics2.properties file.
Hadoop EcoSystem Realtime Monitoring: Hadoop Metrics2 System

Hadoop Metrics2 is built into the ResourceManager, NodeManager, NameNode, DataNode, and other services.

# syntax: [prefix].[source|sink].[instance].[options]

# Here we define a file sink with the instance name foo
*.sink.foo.class=org.apache.hadoop.metrics2.sink.FileSink

# Now we specify the filename for every prefix/daemon that is used for
# dumping metrics to this file. Notice each of the following lines is
# associated with one of those prefixes.
namenode.sink.foo.filename=/tmp/namenode-metrics.out
secondarynamenode.sink.foo.filename=/tmp/secondarynamenode-metrics.out
datanode.sink.foo.filename=/tmp/datanode-metrics.out
resourcemanager.sink.foo.filename=/tmp/resourcemanager-metrics.out
nodemanager.sink.foo.filename=/tmp/nodemanager-metrics.out
maptask.sink.foo.filename=/tmp/maptask-metrics.out
reducetask.sink.foo.filename=/tmp/reducetask-metrics.out
mrappmaster.sink.foo.filename=/tmp/mrappmaster-metrics.out

# We here define another file sink with a different instance name bar
*.sink.bar.class=org.apache.hadoop.metrics2.sink.FileSink

# The following line specifies the filename for the nodemanager daemon
# associated with this instance. Note that the nodemanager metrics are
# dumped into two different files. Typically you'll use a different sink type
# (e.g. ganglia), but here having two file sinks for the same daemon can
# only be useful when different filtering strategies are applied to each.
nodemanager.sink.bar.filename=/tmp/nodemanager-metrics-bar.out
Hadoop EcoSystem Realtime Monitoring Hadoop Metrics 2 System
Cloudera Manager Alerts

Cloudera Manager can apply failure alerts at the level of individual Hadoop services (e.g. the Oozie Server).
Hadoop EcoSystem Realtime Monitoring: Apache Oozie

- Apache Oozie serves as the batch job executor of the Hadoop EcoSystem.
- It runs as an independent server and provides workflow and batch job management through an open API and a CLI.
Hadoop EcoSystem Realtime Monitoring: Oozie Actions

- Oozie's main features are Workflows, Coordinators (scheduled batch jobs), and Bundles (groups of batch jobs).
- A workflow connects one or more pieces of work as a series of actions.
- Depending on their type, actions are either synchronous or asynchronous.
Hadoop EcoSystem Realtime Monitoring Oozie Workflow
Hadoop EcoSystem Realtime Monitoring: Oozie Event Listeners

- Oozie is the component that actually runs batch jobs, so monitoring job execution is important.
- To raise alerts on failure, implement an Oozie event listener and collect the relevant information.
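A sketch of registering such a listener, where com.example.AlertJobEventListener is a hypothetical class extending Oozie's JobEventListener:

```xml
<!-- oozie-site.xml: register a custom job event listener (class name is hypothetical) -->
<property>
  <name>oozie.service.EventHandlerService.event.listeners</name>
  <value>com.example.AlertJobEventListener</value>
</property>
```

The listener class would override the workflow/coordinator job event callbacks to push failures to an alerting channel; note that Oozie's EventHandlerService must also be enabled (via oozie.services.ext) for listeners to fire.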
Hadoop EcoSystem Realtime Monitoring: Oozie Strengths and Weaknesses

Strengths
- A scheduler optimized for the Hadoop EcoSystem.

Weaknesses
- Developing a workflow is complex compared with a Unix/Linux shell script.
- Workflows must be uploaded to HDFS before they can run.
- As execution history accumulates and the number of registered Workflow/Coordinator/Bundle jobs grows, the Web UI slows down sharply.
- No built-in failure alerting.
- Registration and management procedures are cumbersome.
- Unintuitive Web UI.
- Versioning workflows is difficult.
References

- https://fharenheit.atlassian.net/wiki/spaces/kb/blog/2017/11/22/207126531/hadoop+2+mapreduce+profiling
- http://www.informit.com/articles/article.aspx?p=2755708&seqnum=5
- https://blog.cloudera.com/blog/2015/01/how-to-deploy-apache-hadoop-clusters-like-a-boss/
- https://www.slideshare.net/hadoop_summit/node-labels-in-yarn-49792443
- https://community.hortonworks.com/articles/72450/node-labels-configuration-on-yarn.html
- https://blog.cloudera.com/blog/2016/10/how-to-use-the-new-hdfs-intra-datanode-disk-balancer-in-apache-hadoop/
- https://www.cloudera.com/documentation/enterprise/5-11-x/topics/cdh_sg_hdfs_encryption.html#concept_aj3_q3w_hp
- https://www.cloudera.com/documentation/enterprise/5-11-x/topics/sg_optimize_hdfs_encryption.html#concept_t3z_vmx_yt
- http://ggoals.tistory.com/77
Thank you for listening