In-memory cluster computing framework
Specialized for iterative workloads such as machine learning, compared to Hadoop MapReduce
Started in 2009 at UC Berkeley AMPLab as a Mesos application
Spark paper published in 2010; RDD paper published in 2012
Became an Apache project in 2013; Apache Top-Level Project in 2014
As of Nov 2018, v2.4.0 released
https://github.com/apache/spark
[Figure: Spark stack and benchmark. Libraries: Spark SQL, Spark Streaming, MLlib, GraphX on top of Spark Core. Languages: R, SQL, Python, Scala, Java. Cluster managers: Standalone, YARN, Mesos, Local. Data sources: HDFS, RDB, HBase, S3. Chart: logistic regression running time, Hadoop 110 s vs Spark 0.9 s.]
Word count example: the input lines "I am a boy / you are a girl / I am happy / you are happy" are split into words, mapped to (word, 1) pairs, shuffled by key, and reduced to counts (I: 2, am: 2, a: 2, boy: 1, you: 2, are: 2, girl: 1, happy: 2).

val textFile = sc.textFile("hdfs://path/to/file")
val counts = textFile.flatMap(line => line.split("\\s+"))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
counts.saveAsTextFile("hdfs://path/to/output")
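The same flatMap → map → reduceByKey pipeline can be checked locally without a cluster. A minimal Python sketch of the shuffle-and-reduce semantics (pure stdlib, no Spark; the function name is illustrative):

```python
from collections import defaultdict

def word_count(lines):
    """Local re-implementation of the word-count pipeline above:
    flatMap (split lines into words) -> map (pair each word with 1)
    -> reduceByKey(_ + _) (sum the 1s per word)."""
    pairs = [(word, 1) for line in lines for word in line.split()]
    counts = defaultdict(int)
    for word, n in pairs:  # merge values per key, like reduceByKey
        counts[word] += n
    return dict(counts)
```

Running it on the slide's four lines reproduces the counts shown in the diagram.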
[Figure: Spark Standalone architecture. Client Node runs the Driver Program with a SparkContext; Master Node runs the Spark Master; Slave Nodes run Spark Workers that execute Tasks.]
Brightics AI Spark Cluster
'15.1 → '15.12 → '16.12 → '17.6 → '17.12 → '18.6 → '18.11
Spark: 1.2.x → 1.4.x → 1.6.x → 2.3.x
C/S-based: Before Brightics → Brightics 1.0 → Brightics 1.5 → Brightics 2.0–2.1 → Brightics 2.2–2.3
Web-based: Brightics Suite → Brightics 2.0 Suite → Brightics AI 2.5 Suite → Brightics AI 3.0 Suite → Brightics AI 3.5 Suite → Brightics AI 3.6 Suite
Open Source: Brightics Studio 1.0
'15.1 → '15.12 → '16.12 → '17.6 → '17.12 → '18.6 → '18.11
Spark: 1.2.x → 1.4.x → 1.6.x → 2.3.x
Step 1: Spark adoption / Step 2: Spark Job Server adoption / Step 3: Brightics Agent development / then Spark Fair Scheduler adoption (+DRA) and Containerization
C/S-based: Before Brightics → Brightics 1.0 → Brightics 1.5 → Brightics 2.0–2.1 → Brightics 2.2–2.3
Web-based: Brightics Suite → Brightics 2.0 Suite → Brightics AI 2.5 Suite → Brightics AI 3.0 Suite → Brightics AI 3.5 Suite → Brightics AI 3.6 Suite
Open Source: Brightics Studio 1.0
[Figure: legacy pipelines on the cluster. Each step (DB Unload, Filter, Correlation, DB Write) runs as a separate Hive/MapReduce/Sqoop jar job (query.jar, filter.jar, corr.jar, ttest.jar, dbunload.jar, dbload.jar), with intermediate results written to disk between steps, repeated per pipeline.]
"We can analyze a huge amount of data, but it takes far too long."
[Figure: the same pipelines with the MapReduce jars replaced by Spark jobs ([MR] → [Spark] for ttest.jar, filter.jar, corr.jar).]
"Analysis time has been cut dramatically. Now we can analyze the entire process at once."
Performance: "Even a simple function takes more than 10 seconds to run."
Resources: "While someone else is running a job, I can't use Spark."
Performance: "Even a simple function takes more than 10 seconds to run."
Resources: "While someone else is running a job, I can't use Spark."
→ A new Spark job execution environment is needed.
Open-source project released by Ooyala, a media platform company
Provides a REST API to manage Spark jobs, jars, and contexts
Keeps SparkContexts alive so that multiple jobs can share them
Built with Scala, Spray, Slick, and related technologies
As of Oct 31, 2018, v0.8.0 released
https://github.com/spark-jobserver/spark-jobserver
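Following the spark-jobserver README, the REST workflow can be driven with curl roughly as follows (host, port, jar, class, and context names are placeholders; a running Job Server is assumed):

```shell
# 1) Register a jar under an app name
curl --data-binary @wordcount.jar localhost:8090/jars/test

# 2) Create a long-lived, shared context
curl -d "" 'localhost:8090/contexts/brtc?num-cpu-cores=2&memory-per-node=512m'

# 3) Run a job in that context (sync=true waits for the result)
curl -d 'input.string = a b c a b' \
  'localhost:8090/jobs?appName=test&classPath=spark.jobserver.WordCountExample&context=brtc&sync=true'

# 4) Check a job's status and result later by id
curl localhost:8090/jobs/<job-id>
```

Because the context in step 2 stays alive, every job in step 3 reuses the same SparkContext instead of paying JVM and context startup cost per job.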
[Figure: Spark Job Server components, covering the flow Jar registration → Context creation → Job execution → Job result check. The Client calls the Web API; inside the Job Server sit SparkContextFactory, LocalContextSupervisorActor, JobManagerActor, JarManager, and JobDAO, which talk to the Spark Master and Spark Workers.]
Context creation flow (Jar registration → Context creation → Job execution → Job result check):
1) Context creation request: curl -X POST localhost:8090/contexts/brtc
2) Context creation: JobManagerActor launches a SparkContext through SparkContextFactory
3) Spark app execution: the SparkContext is created and executors run on the Spark Workers
4) Jar loading: handled with addJar on the SparkContext
(Components involved: Web API, SparkContextFactory, LocalContextSupervisorActor, JobManagerActor, JobDAO, JarManager; Spark Master and Spark Workers.)
"Model execution fails." "Spark Job Server hangs." "Spark keeps dying."
Users preferred running their own scripts over the provided analysis functions; after the Script feature shipped, many users flocked to it.
Memory usage kept growing until Spark Job Server was restarted, eventually causing an OOM in the Spark Job Server (driver).
Users preferred running their own scripts over the provided analysis functions; after the Script feature shipped, many users flocked to it.
Memory usage kept growing until Spark Job Server was restarted, eventually causing an OOM in the Spark Job Server (driver).
→ Remove Spark Job Server and implement a dedicated job-execution Agent.
Script interpretation: SparkILoop interpreter → FSC compilation
Script execution: implemented as Spark jobs → the Agent's own logic
Jar registration: user jar upload → addJar inside the Agent
SparkContext management: multiple SparkContexts → a single SparkContext
Slimming down: simplified structure, unnecessary logic removed (18,160K → 146K)
[Figure: Brightics Agents, each holding one SparkContext, submitting jobs to the Spark cluster.]
1) One execution fans out into parallel jobs, e.g. Group By.
2) One execution runs as a sequence of single jobs, e.g. word count.
3) One user drives several Agents, e.g. an administrator or batch jobs.
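Pattern 1 relies on the fact that Spark's scheduler is thread-safe: separate driver threads can submit independent jobs to one shared SparkContext concurrently. A minimal Python sketch of that fan-out pattern (plain threads and local sums stand in for Spark job submission; all names are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def run_job(partition):
    """Stand-in for one Spark job, e.g. one branch of a Group By.
    In the real Agent each call would submit a job against the
    shared SparkContext; here we just aggregate locally."""
    return sum(partition)

def run_parallel(partitions):
    # Fan the independent jobs out over a thread pool, the way one
    # driver thread per job would against a single SparkContext.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(run_job, partitions))
```

pool.map preserves input order, so results line up with the submitted partitions even though the jobs overlap in time.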
[Figure: execution models compared. Hadoop MapReduce: Filter → Correlation → Unload with disk I/O between every step. Plain Spark: a SparkContext per step, still going through disk between contexts. Spark with Brightics Agent: a single SparkContext runs Filter → Correlation → Unload end to end, touching disk only at the end.]
'15.1 → '15.12 → '16.12 → '17.6 → '17.12 → '18.6 → '18.11
Spark: 1.2.x → 1.4.x → 1.6.x → 2.3.x
Milestones: Spark adoption, Spark Job Server adoption, Brightics Agent development, Spark Fair Scheduler adoption (+DRA), Containerization
C/S-based: Before Brightics → Brightics 1.0 → Brightics 1.5 → Brightics 2.0–2.1 → Brightics 2.2–2.3
Web-based: Brightics Suite → Brightics 2.0 Suite → Brightics AI 2.5 Suite → Brightics AI 3.0 Suite → Brightics AI 3.5 Suite → Brightics AI 3.6 Suite
Open Source: Brightics Studio 1.0
Spark Cluster Driver Program SparkContext Job #1 Job #2
Spark Cluster Driver Program SparkContext Job #2
Job vs Job: Spark Fair Scheduler
Q. What if a job that crunches large data for a long time is already running?
Spark Cluster Driver Program SparkContext Job #2
A. Use the Spark Fair Scheduler to resolve contention between jobs.
Spark offers two job scheduling modes: FIFO (default) and FAIR.
FIFO runs jobs one at a time; FAIR runs jobs concurrently, giving them equal shares of resources (tasks).
scala> conf.set("spark.scheduler.mode", "FAIR")
To configure the SparkContext's scheduler pools, define them in conf/fairscheduler.xml.
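A minimal conf/fairscheduler.xml defining the two pools used in this deck might look like this (pool names, weights, and minShare values are illustrative; the element names follow the Spark job-scheduling docs):

```xml
<?xml version="1.0"?>
<allocations>
  <pool name="DefaultJob">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>2</minShare>
  </pool>
  <pool name="FasterJob">
    <schedulingMode>FAIR</schedulingMode>
    <weight>4</weight>
    <minShare>2</minShare>
  </pool>
</allocations>
```

Jobs that do not set a pool land in the default pool; any pool not listed here is created on first use with default settings (weight 1, minShare 0).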
Spark Cluster Driver Program SparkContext
Q. What if every resource is already occupied, but another job must run urgently?
Spark Cluster Driver Program SparkContext
A. Pool options let you control job priority.
Set the pool before running a job, and the job executes in the pool you chose.
scala> sc.setLocalProperty("spark.scheduler.pool", "FasterJob")
Spark Cluster Driver Program SparkContext POOL : DefaultJob
Spark Cluster Driver Program SparkContext POOL : DefaultJob POOL : FasterJob
The default pool handles status checks and data-inspection tasks; the other pools run regular Spark jobs.
When jobs run concurrently in several pools, a pool with a higher weight gets more resources.
When a job runs in a pool, the pool tries to obtain at least minShare cores.
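The interaction between weight and minShare can be illustrated with a toy allocation model (this is a deliberately simplified sketch, not Spark's actual scheduling algorithm; the function and pool names are made up):

```python
def fair_share(total_cores, pools):
    """Toy model of weighted fair sharing with minShare floors.

    pools: dict name -> {"weight": int, "minShare": int}
    Step 1: every active pool first receives its minShare of cores.
    Step 2: the remaining cores are split in proportion to weight.
    """
    alloc = {name: p["minShare"] for name, p in pools.items()}
    remaining = total_cores - sum(alloc.values())
    total_weight = sum(p["weight"] for p in pools.values())
    for name, p in pools.items():
        alloc[name] += remaining * p["weight"] // total_weight
    return alloc
```

With 12 cores, DefaultJob (weight 1, minShare 2) and FasterJob (weight 3, minShare 2) end up with 4 and 8 cores: both floors are honored first, and the higher-weight pool takes the larger slice of what is left.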
Application vs Application: Dynamic Resource Allocation
Spark Cluster Driver Program SparkContext
Spark Cluster Driver Program SparkContext Driver Program SparkContext
Q. What if too many jobs pile up on a single SparkContext?
Driver Program SparkContext Spark Cluster Driver Program SparkContext
A. Dynamic Resource Allocation lets applications share resources efficiently.
In Spark Standalone, the unit of Dynamic Resource Allocation is the executor.
Executor count depends on the total core count (--total-executor-cores, spark.cores.max) and the cores per executor (--executor-cores).
When idle: executors start from the initial allocation (spark.dynamicAllocation.initialExecutors) and are released down to the minimum (spark.dynamicAllocation.minExecutors).
When a job starts: it uses as much of the available resources as it can; the ceiling is controlled with spark.dynamicAllocation.maxExecutors and related settings.
When a job completes: executors are released back to the minimum, and any other running application can use the freed resources.
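Put together in spark-defaults.conf, a representative DRA setup might look like this (the values are illustrative; the external shuffle service must be enabled for DRA to release executors safely):

```properties
spark.dynamicAllocation.enabled              true
spark.shuffle.service.enabled                true
spark.dynamicAllocation.initialExecutors     2
spark.dynamicAllocation.minExecutors         1
spark.dynamicAllocation.maxExecutors         10
spark.dynamicAllocation.executorIdleTimeout  60s
```

With this, an idle application shrinks to one executor after the idle timeout, and a busy one grows up to ten executors as long as the cluster has free cores.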
'15.1 → '15.12 → '16.12 → '17.6 → '17.12 → '18.6 → '18.11
Spark: 1.2.x → 1.4.x → 1.6.x → 2.3.x
Milestones: Spark adoption, Spark Job Server adoption, Brightics Agent development, Spark Fair Scheduler adoption (+DRA), Containerization
C/S-based: Before Brightics → Brightics 1.0 → Brightics 1.5 → Brightics 2.0–2.1 → Brightics 2.2–2.3
Web-based: Brightics Suite → Brightics 2.0 Suite → Brightics AI 2.5 Suite → Brightics AI 3.0 Suite → Brightics AI 3.5 Suite → Brightics AI 3.6 Suite
Open Source: Brightics Studio 1.0
Local Mode for Single User Single Context for a Group Multiple Contexts for a Company Multiple Clusters for a Cloud
Demo
1. Compare execution times with and without Spark Job Server.
2. Run jobs under the FIFO scheduler and the FAIR scheduler.
Q & A
Thank you