Web Analytics at Scale with Elasticsearch @ naver.com, Part 2 - Lessons Learned
Jason Heo (허정수) / NAVER, jason.heo.sde@gmail.com
Agenda
Introduction: Content Consumption Statistics (콘텐츠소비통계)
Part I - Architecture: Initial Architecture -> Problems & Solutions -> Proven Architecture; Data Pipelines (presented at the 2017.06.22 meetup)
Part II - Lessons Learned: performance tuning tips, operations tips
Part 1 presentation video: https://youtu.be/mc9gy-5d60w?t=10m40s
Content Consumption Statistics (콘텐츠소비통계)
Used by various NAVER services: NAVER Blog (launched 2016.06), OOO service (launched 2016.09), YYY service (launched 2017.07), XXX service (planned for 2017.10)
Common statistics platform (development started 2016.01)
For NAVER users: a service for NAVER users, not for internal employees
<Blog front end> <Blog statistics menu>
Goal High Throughput Low Latency Ease of Use
Architecture (diagram): End Users, Node.js, Batch ES Cluster, Realtime ES Loader, SparkSQL, nginx access log, Logstash, Realtime ES Cluster, nbase-arc (Redis Cluster), Scoreboard Loader, Parquet Files, Parquet Loader, SparkSQL / Impala / Zeppelin (ad-hoc requests & internal metrics), Kafka 1 (Raw Log), Transform, Kafka 2 (Refined Log)
Versions
1. Elasticsearch 2.3 & es-hadoop 2.3
2. Logstash 2.1
3. Spark 1.6
4. JDK 1.8 for ES, 1.7 for Spark
5. CDH 5.8
6. Storm 0.10
7. CentOS 7.2
8. Kafka 0.9
9. nbase-arc 1.3
Agenda
Introduction: Content Consumption Statistics
Part I - Architecture: Initial Architecture -> Problems & Solutions -> Proven Architecture; Data Pipelines
Part II - Lessons Learned: performance tuning tips, operations tips (covered in this August 10 meetup)
Execution Hint (1)
{
  "query": {
    "match": { ... }
  },
  "aggs": {
    ...: {
      "terms": {
        "field": "u",
        "execution_hint": "map"
      }
    }
  }
}
Execution Hint (2)
Execution order for the SQL: SELECT u, COUNT(*) FROM tab WHERE <condition> GROUP BY u
1. Fetch the documents matching <condition>
2. Aggregate them on the u field
Expected execution time: proportional to the number of matching documents. If zero documents match, it should take close to 0 seconds, since there is nothing to aggregate.
(With the default execution mode, the terms aggregation builds global ordinals over the whole field, so even a query that matches few documents can stay slow; "map" aggregates only the values of the matching documents.)
Execution Hint (3)
Experiment results (chart): query response time vs. number of matching documents.
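Putting the pieces together, the request corresponding to the SQL above would look roughly like this (a minimal sketch; the index name tab comes from the SQL example, and the filter field/value are placeholder assumptions):
GET /tab/_search
{
  "size": 0,
  "query": {
    "term": { "some_field": "some_value" }
  },
  "aggs": {
    "by_u": {
      "terms": {
        "field": "u",
        "execution_hint": "map"
      }
    }
  }
}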
JVM Tuning (1)
Stop-The-World phase: Full GC itself is not the problem, but it occasionally causes a long STW pause.
[INFO ][monitor.jvm ] [hostname] [gc][old][109757][7966] duration [15.9s], collections [2]/[16.2s], total [15.9s]/[12.8m], memory [12.9gb]->[11.2gb]/[14.5gb], all_pools {[young] [1.2gb]->[146.1mb]/[1.2gb]}{[survivor] [394.7mb]->[0b]/[438.8mb]}{[old] [11.3gb]->[11gb]/[12.8gb]}
<= no response at all for 16 seconds
<Excerpt from the ES log>
JVM Tuning (2)
Heap usage graph before tuning (chart).
JVM Tuning (3)
Heap usage graphs while loading data, with different GC options per node (charts: <Default GC Option> vs. <GC Tuning>)
JVM options that reduce promotion into the old generation:
-XX:MaxTenuringThreshold=15
-XX:NewRatio=7
-XX:SurvivorRatio=3
-XX:-UseAdaptiveSizePolicy
g1 gc (1)
100B docs are indexed; 5 nodes in the cluster: 3 nodes with cms gc, 2 nodes with g1 gc
<Disclaimer> elastic.co may recommend G1 GC someday, but not for now
<g1 gc options> (from https://wiki.apache.org/solr/shawnheisey#gc_tuning)
-XX:+UseG1GC
-XX:+PerfDisableSharedMem
-XX:+ParallelRefProcEnabled
-XX:G1HeapRegionSize=8m
-XX:MaxGCPauseMillis=250
-XX:InitiatingHeapOccupancyPercent=75
-XX:+UseLargePages
-XX:+AggressiveOpts
g1 gc (2)
GC counters from the node stats API (GET /_nodes/<hostname>/stats/jvm). Which one looks better?
<cms gc>
"gc": {
  "collectors": {
    "young": {
      "collection_count": 141144,
      "collection_time": "1.7h",
      "collection_time_in_millis": 6295572
    },
    "old": {
      "collection_count": 13005,
      "collection_time": "20.6m",
      "collection_time_in_millis": 1241134
    }
  }
}
<g1 gc>
"gc": {
  "collectors": {
    "young": {
      "collection_count": 117135,
      "collection_time": "1.4h",
      "collection_time_in_millis": 5061268
    },
    "old": {
      "collection_count": 2,
      "collection_time": "27s",
      "collection_time_in_millis": 27075
    }
  }
}
g1 gc (3)
STW with g1 gc took longer than with cms gc.
<cms gc> STW occurred 1 time, 16.2s:
[INFO ][monitor.jvm ] [hostname] [gc][old][109757][7966] duration [15.9s], collections [2]/[16.2s], total [15.9s]/[12.8m], memory [12.9gb]->[11.2gb]/[14.5gb], all_pools {[young] [1.2gb]->[146.1mb]/[1.2gb]}{[survivor] [394.7mb]->[0b]/[438.8mb]}{[old] [11.3gb]->[11gb]/[12.8gb]}
<g1 gc> STW occurred 2 times, 28.7s in total:
[2017-01-02 01:47:16,525][WARN ][monitor.jvm ] [hostname] [gc][old][111127][1] duration [14.4s], collections [1]/[15.2s], total [14.4s]/[14.4s], memory [13.5gb]->[11.2gb]/[15gb], all_pools {[young] [176mb]->[40mb]/[0b]}{[survivor] [96mb]->[0b]/[0b]}{[old] [13.2gb]->[11.2gb]/[15gb]}
[2017-01-02 03:28:27,815][WARN ][monitor.jvm ] [hostname] [gc][old][117128][2] duration [12.6s], collections [1]/[13.5s], total [12.6s]/[27s], memory [14.1gb]->[11gb]/[15gb], all_pools {[young] [320mb]->[112mb]/[0b]}{[survivor] [96mb]->[0b]/[0b]}{[old] [13.8gb]->[10.9gb]/[15gb]}
Circuit Breaker (1)
GROUP BY on two or more high-cardinality fields causes OOM:
SELECT c, u, COUNT(*) FROM monthly_idx // an index with billions of documents
GROUP BY c, u
Excessive memory usage -> continuous Full GCs -> no response to any query -> the only way out was a full restart of ES.
Circuit Breaker (2)
PUT /_cluster/settings
{
  "persistent": {
    "indices.breaker.request.limit": "2.5%"
  }
}
If a request tries to use more than 2.5% of the heap, that query fails, but the cluster as a whole no longer becomes unresponsive.
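To check whether, and how often, the breaker is actually tripping, the breaker section of the node stats API can be inspected (a sketch; the response lists the limit size, estimated size, and a tripped counter per breaker):
GET /_nodes/stats/breaker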
Index trash feature (1) - prerequisite concept: alias
While loading: the client queries daily_2017.01.01 (alias); the alias does not exist yet, so no data is returned.
daily_2017.01.01 (alias) -> daily_2017.01.01_ver_1 (actual index)
Advantage: prevents partial data from being served (all or nothing).
Index trash feature (2) - prerequisite concept: alias
Loading complete: the client queries daily_2017.01.01 (alias) and the data in ver_1 is returned.
daily_2017.01.01 (alias) -> daily_2017.01.01_ver_1 (actual index)
The alias is created only after the data has been fully loaded.
Index trash feature (3) - prerequisite concept: alias
Re-loading: the client queries daily_2017.01.01 (alias) and the data in ver_2 is returned.
After the re-load completes, the alias is switched: daily_2017.01.01 (alias) -> daily_2017.01.01_ver_2 (actual index), away from daily_2017.01.01_ver_1.
Rollback is also possible (point the alias back to ver_1).
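The alias switch described above can be done atomically with the _aliases API (a minimal sketch following the naming convention on these slides):
POST /_aliases
{
  "actions": [
    { "remove": { "index": "daily_2017.01.01_ver_1", "alias": "daily_2017.01.01" } },
    { "add":    { "index": "daily_2017.01.01_ver_2", "alias": "daily_2017.01.01" } }
  ]
}
Because both actions are applied in a single call, clients never see a moment where the alias points to neither version.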
Index trash feature (4) - deleting an index
Just detach the alias; the data is no longer visible to the client.
daily_2017.01.01 (alias, detached) / daily_2017.01.01_ver_1 (actual index) -> .trash (alias)
Indexes attached to the .trash alias are deleted periodically.
Index trash feature (5) - if an index was deleted by mistake
Just swap the alias back: re-attach daily_2017.01.01 (alias) to daily_2017.01.01_ver_1 (actual index) and detach it from .trash.
Index trash feature (6)
Instead of DELETE /daily_2017.01.01_ver1, move the index to the trash alias:
POST /_aliases
{
  "actions": [
    { "remove": { "indices": ["daily_2017.01.01_ver1"], "alias": "*" } },
    { "add":    { "indices": ["daily_2017.01.01_ver1"], "alias": ".trash" } }
  ]
}
Choosing the right number and size of shards
Query response time by shard size, for an index of 200 million documents:
Num of shards | Docs per shard | Shard size | Query 1 (sec) | Query 2 (sec) | Query 3 (sec)
5  | 40M | 17GB  | 0.6524 | 0.7728 | 0.8876
10 | 20M | 8.5GB | 0.5328 | 0.5554 | 0.4526
20 | 10M | 4.2GB | 0.8972 | 0.5044 | 0.5578
The response-time difference across shard sizes is not large. We keep shard sizes under 10GB. If you do not have many indexes, (number of cores * 2) shards per index is a good starting point.
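For reference, the shard count has to be chosen at index creation time; a minimal sketch (the numbers are illustrative, not our production values):
PUT /daily_2017.01.01_ver_1
{
  "settings": {
    "number_of_shards": 10,
    "number_of_replicas": 1
  }
}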
Reduce Disk Size
Disabling the _all field: 18.6% size reduction
Disabling the _source field: 20% size reduction
Think twice before disabling the _source field (you lose the ability to reindex or to see the original document).
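These are mapping-level switches; a minimal sketch for an ES 2.x mapping (the index_name/doc_type names reuse the convention from the loading scripts later in this deck; whether to actually disable _source depends on the caveat above):
PUT /index_name
{
  "mappings": {
    "doc_type": {
      "_all":    { "enabled": false },
      "_source": { "enabled": false }
    }
  }
}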
Logstash options for exactly-once (1)
Options for the file input:
  start_position => "beginning"   (for log rotation)
  http://jason-heo.github.io/elasticsearch/2016/02/28/logstash-offset.html
Options for the Kafka output:
  acks => "all"
  retries => n
Logstash options for exactly-once (2)
(timeline diagram: access_log is rotated and a new access_log file is created; Logstash only recognizes the new file up to discover_interval later, so with start_position => "end" the events written in between are lost)
stat_interval (default 1s): how often already-known files are checked for new data
discover_interval (default 15s): how often Logstash looks for new files matching the path pattern
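Put together, the file input side looks roughly like this (a sketch; the path is a placeholder, and the interval values shown are the defaults):
input {
  file {
    path => "/path/to/access_log*"   # placeholder path
    start_position => "beginning"    # read newly discovered (rotated-in) files from the start
    stat_interval => 1               # default: check known files for updates every 1s
    discover_interval => 15          # default: look for new files matching the pattern every 15s
  }
}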
Logstash options for exactly-once (3)
Options for the Kafka output:
output {
  kafka {
    ...
    compression_type => 'gzip'
    acks => "all"    # default: 1
    retries => 5     # default: 0
  }
}
acks => "all": the leader waits for the acks from all followers. Pros: strongest available guarantee. Cons: slow.
cf) acks => "1" means the leader responds without waiting for the followers' acks.
(diagram: Broker 1 (leader) receives acks from Broker 2 ... Broker n (followers))
Nested Document format (1)
c: blogger id, u: url, g: gender, a: age
<Flattened Doc>
[
  { "c": "blogger1", "u": "url1", "g": "m", "a": "1", "pv": 10 },
  { "c": "blogger1", "u": "url1", "g": "f", "a": "2", "pv": 20 }
]
<Nested Doc>
[
  {
    "c": "blogger1",
    "u": "url1",
    "page_views": [
      { "g": "m", "a": "1", "pv": 10 },
      { "g": "f", "a": "2", "pv": 20 }
    ]
  }
]
Nested Document format (2)
The usual storage model - Flattened Doc Model
<Document format>
[
  { "c": "blogger1", "u": "url1", "g": "m", "a": "1", "pv": 10 },
  { "c": "blogger1", "u": "url1", "g": "f", "a": "2", "pv": 20 }
]
c and u are duplicated across documents.
<Loading script>
import org.elasticsearch.spark.sql._   // enables saveToEs on DataFrame

sqlContext.sql("""
  SELECT c, u, g, a, COUNT(*) AS pv
  FROM logs
  GROUP BY c, u, g, a
""").saveToEs("index_name/doc_type")
Nested Document format (3)
Nested Doc Model
<Document format>
[
  {
    "c": "blogger1",
    "u": "url1",
    "page_views": [
      { "g": "m", "a": "1", "pv": 10 },
      { "g": "f", "a": "2", "pv": 20 }
    ]
  }
]
Duplication removed.
<Loading script>
import org.elasticsearch.spark.sql._   // enables saveToEs on DataFrame

case class PageView(g: String, a: String, pv: Integer)
sqlContext.udf.register("page_view", (g: String, a: String, pv: Integer) => PageView(g, a, pv))
sqlContext.sql("""
  SELECT c, u, COLLECT_LIST(page_view) AS page_views
  FROM (
    SELECT c, u, page_view(g, a, pv) AS page_view
    FROM (
      SELECT c, u, g, a, COUNT(*) AS pv
      FROM logs
      GROUP BY c, u, g, a
    ) t1
  ) t2
  GROUP BY c, u
""").saveToEs("index_name/doc_type")
Nested Document format (4)
Pros
- Data size is 49% smaller than the Flattened Model
- Bulk loading is 52% faster than the Flattened Model (including the extra processing time)
Cons
- Extra processing with SparkSQL is required, but the bottleneck is saving the result to ES, so the extra processing time is not a problem
- ES gets slower when a nested field has too many children, so use this model only when the number of children is small
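For the nested model, the page_views field has to be declared as nested in the mapping; a minimal sketch for ES 2.x (the field types are assumptions based on the sample documents; the slides do not show the actual mapping):
PUT /index_name
{
  "mappings": {
    "doc_type": {
      "properties": {
        "c": { "type": "string", "index": "not_analyzed" },
        "u": { "type": "string", "index": "not_analyzed" },
        "page_views": {
          "type": "nested",
          "properties": {
            "g":  { "type": "string", "index": "not_analyzed" },
            "a":  { "type": "string", "index": "not_analyzed" },
            "pv": { "type": "integer" }
          }
        }
      }
    }
  }
}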
Combined field (1)
Initial schema:
{
  "properties": {
    ...
    "c":    { ... },
    "type": { ... },
    ...
  }
}
Query pattern: by c alone: 5%; by type alone: 3%; by both fields ANDed together: 92%
All of these query patterns must be supported. Note: ES has no composite-key concept.
Combined field (2)
Add one extra field, ctype, that combines c and type:
<schema>
{
  "properties": {
    ...
    "c":     { ... },
    "type":  { ... },
    "ctype": { ... }
  }
}
<Document example>
{
  "c": "blogger_id",
  "type": "channel_pv",
  "ctype": "blogger_id:channel_pv",
  "pv": 10
}
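A query on the combined field can then replace the two-term bool query (a minimal sketch; the index name is a placeholder and the values follow the example document above):
Before (two fields):
GET /index_name/_search
{
  "query": {
    "bool": {
      "must": [
        { "term": { "c": "blogger_id" } },
        { "term": { "type": "channel_pv" } }
      ]
    }
  }
}
After (one combined field):
GET /index_name/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": { "ctype": "blogger_id:channel_pv" }
      }
    }
  }
}
Wrapping the term in constant_score keeps it in filter context, which matches the ConstantScoreQuery seen in the profile output on the next slide.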
Combined field (3)
Response time improved by about 40% (on a page cache miss).
<ES query profile results>
{ "query_type": "BooleanQuery",       "lucene": "+c:blogger_id +type:channel_cv",               "time": "269.686223ms" }
{ "query_type": "ConstantScoreQuery", "lucene": "ConstantScore(ctype:c:blogger_id:channel_cv)", "time": "124.790570ms" }
Improving partial-field reads of a single doc (1)
Reading the data from the _source field:
<SQL>
SELECT pv FROM tab WHERE primary_key = 'xx'
<DSL>
{
  "query": {
    "bool": {
      "must": [
        { "term": { "primary_key": "xx" } }
      ]
    }
  },
  "_source": { "includes": ["pv"] }
}
Improving partial-field reads of a single doc (2)
Reading the data from doc values:
<SQL>
SELECT MAX(pv) FROM tab WHERE primary_key = 'xx'
<DSL>
{
  "query": { ... },
  "aggregations": {
    "MAX(pv)": {
      "max": { "field": "pv" }
    }
  }
}
Since exactly one document matches, pv = MAX(pv) = MIN(pv) = AVG(pv).
Improving partial-field reads of a single doc (3)
Query | Read method | Throughput (QPS) | Avg latency (ms)
Q1 | _source   | 4,604 | 107
Q1 | Doc Value | 7,484 | 66
Q2 | _source   | 5,024 | 98
Q2 | Doc Value | 7,595 | 65
Improving partial-field reads of a single doc (4)
(diagram comparing the _source read path and the Doc Value read path)
Improving partial-field reads of a single doc (5)
ES 5.x adds "docvalue_fields"; we have not yet tested whether it can serve the same purpose as the approach on the previous slides.
GET /_search
{
  "query": { "match_all": {} },
  "docvalue_fields": ["test1", "test2"]
}
Segment Merge (1)
1 + 1 < 2: merging two segments into one uses fewer resources.
Segment Merge (2) https://github.com/exo-archives/exo-es-search
Segment Merge (3)
POST /index-name/_forcemerge?max_num_segments=1
Lucene memory: 36.8% reduction; index size: 15% reduction.
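To see how many segments each shard has before and after the merge, the cat segments API can be used (a sketch; index-name is a placeholder):
GET /_cat/segments/index-name?v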
Segment Merge (4)
Segment Merge (5)
(chart: heap usage if segment merges had not been done)
Merging only delays the point at which the heap fills up; it is not a complete solution to this problem.
Segment Merge (6)
Caution: occasionally the heap usage actually increases after a merge.
Q&A
Q. When integrating Elasticsearch with Spark, what should we watch out for, and what synergy do we get from using them together?
A.
WRITE perspective: loading is easy; just call saveToEs() on a DataFrame and the data is ingested automatically; es-hadoop takes care of error handling; many options are available; loading progress is easy to track through Spark job monitoring.
READ perspective: convenient; can JOIN with various other data sources; index backups are easy; filter pushdown.
Caveats:
- Write: beyond a certain point, adding more Spark workers only increases CPU usage; the indexing rate stays the same.
- Read: it is best to match the number of Spark workers to the number of shards.