이슈및장애 OutOfMemory 장애조치방법 2013. 11. 01
목차 1. 개요... 3 2. 현상... 3 3. 문제분석... 5 4. 문제해결... 13 2
1. 개요 OutOfMemory Exception 정의 Garbage Collector 가새로운 Object 를유지하기위해충분한메모리공간을확보하지못할때발생함. Vendor 사의 JVM 메모리모델방식을참조하여야함. OutOfMemory 는이하 OOM 이라약어로표기함. 2. 현상 Heap Memory 가 Full 인경우 1) 응용프로그램의특정시점에서의과도한메모리사용에의한경우 ( 급진적증가 ) 응용프로그램에서과도하게 heap 메모리를사용하여 OOM 이발생하는경우는이에러가발생하는시점의 Thread dump 를통하여어느프로그램인지를판별할수있습니다. 특정시점에서 Thread dump 를 2~3 번정도시도하여야정확하게어느로직에서문제를야기시키는지명확해집니다. 2) 메모리릭에의해조금씩메모리가쌓여가는경우 ( 점진적증가 ) JVM 상의응용프로그램중에 GC 대상에서제외되는영역에 Data 를조금씩쌓고있다는얘기입니다. GC 대상제외영역을표기하면다음과같습니다. HttpSession Data(Session timeout 및 invalidate 시까지제외 ) Static 변수서블릿또는 JSP 의멤버변수 (WAS 기동후최초호출시인스턴스화되어 WAS 가내려갈때까지사라지지않음 ) EJB 의멤버변수 (Pool 안에서객체가존재하는한 GC 대상에서제외 ) Heap Memory 가 Full 이아닌경우 1) Perm 영역이 Full 이되는경우 Permanent generation is full... increase MaxPermSize (current capacity is set to: 134217728 bytes) [Full GC[Unloading class sun.reflect.generatedserializationconstructoraccessor31282] 810207K->802132K(1013632K), 8.3480617 secs] <GC: 2 4 2465625.831280 10348 0 31 113802808 105534632 286326784 0 0 35782656 715849728 715848840 715849728 134217720 134023296 134217728 8.348677 8.348677 > Passwd Check ============= 3
<2005. 5. 19. 오전 9 시 32 분 23 초 KST> <Error> <HTTP> <BEA-101017> <[ServletContext(id=2536415,name=/,context-path=)] Root cause of ServletException. java.lang.outofmemoryerror 위와같은에러는 JVM 에서 Perm 영역이 Full 이났을때나는경우입니다. Perm 영역에는 Class Object 및관련된 meta data 가로드되는영역인데사이트가매우많은수의 Class 를사용하는경우늘려줘야하는경우도있습니다. 사이즈가얼마가적당한가에대한답은없고 Max size 까지늘어나지않고일정한사이즈를유지하면안정적으로운영된다고볼수있습니다. 해당에러발생시 XX:MaxPermSize=256m 과같이사이즈를늘려주는데해당조치에도 Perm 영역이늘어난다면 WAS 및 JDK Bug 로보는것이일반적입니다. Native(C) heap 증가로인한경우 1) MAXDSIZ 관련 Exception java.lang.outofmemoryerror: requested 20643840 bytes for OneContigSpaceCardGeneration::grow_to_reserved. Out of swap space? Possible causes: - not enough swap space left, or - kernel parameter MAXDSIZ is very small. => low system memory? commit_memory fails: 2 Java out of memory messages are marked with pid: 14574 in /var/adm/syslog/syslog.log MAXDSIZ 란 Process 의데이터세그먼트의최대사이즈를의미하는데, 기본값은 64MB 로대부분의 응용프로그램에서는너무작은값입니다. 이값을최대적정수치에두어야하는데이영역이 Full 이날 경우에발생하는 OOM 입니다. JDK 도내부적으로 OS native library 를사용하고있고, 응용프로그램및 WAS 단에서 JNI 를사용한다고본다면 위와같은에러에충분히직면할수있다고봅니다. 해당 Kernel 은 HP 에있는 Kernel parameter 이므로값이적절하게설정되어있는지체크합니다. 2) Thread 생성실패 [2006.08.01 14:34:18][0] [JeusServer] edslang_container1 is starting... Error occurred during initialization of VM java.lang.outofmemoryerror: unable to create new native thread [2006.08.01 14:34:20][0] [VirtualTerminal] edslang_container1 initialization failed [2006.08.01 14:34:20][0] [JeusServer] fail to start [edslang_container1] 4
Thread 관련프로그램실수로인해과도한 Thread 를생성할때도과도하게메모리를사용할수있으므로 JVM thread dump 를통해과도한 thread 가생성되지않았는지살펴보셔야합니다. 그리고위에러는 max_thread_proc 라는 Kernel parameter 가너무작기때문에날수도있으므로 HPJtune 을통해해당시스템의적절한값을산정합니다. 3. 문제분석 JEUS 상의문제분석방법 JEUS Server log 분석 1) OutOfMemory 인데어떤유형의 OOM 인지대략적으로파악을합니다. 2) Log size 가크면 split 같은 command 를이용하여 file 을자르거나 Log pattern 검색스크립트같은것을 작성하여필터링한로그를가지고따로확인합니다. webadmin 의 gc 모니터링 [Wed Feb 21 11:25:22 KST 2007] trials = 4 < memory information > VM Total Memory = 37812736 Bytes VM Free Memory = 21827768 Bytes 위와같은모니터링이효과적일때는급진적으로 OOM 발생시점이며위사항과더불어 Thread dump 를통하여확인할수있습니다. 점진적으로느릿느릿증가하는추세는 GC log 를분석하는편이좋습니다. GC log 분석 gc.log 를생성하여 tail f gc.log 를하여 Full GC 횟수및시간등을파악한다. 원인파악을위한 JVM 옵션및 JEUS 옵션 1.SUN 및 HP - -Xloggc:${LOG_HOME}/gc.log 2.IBM - -verbose:gc -Xverbosegclog:${LOG_HOME}/gc.log Thread dump 를통한분석 급진적증가일때아래와같은 thread dump 를확인하여문제분석을할수가있습니다. 3XMTHREADINFO "webtob2-hth0(10.10.123.23:9900)-w3" (TID:0x7000000000ACDE8, sys_thread_t:0x118365748, state:r, native ID:0x494D) prio=5 5
4XESTACKTRACE at java.lang.stringbuffer.append(stringbuffer.java(compiled Code)) 4XESTACKTRACE at hkmc.common_hkbs.utility.jsphelper.replace(jsphelper.java(compiled Code)) 4XESTACKTRACE at jeus_jspwork._kmc._kb._403_kb06_5ftext._jspservice(_403_kb06_5ftext.java:164) 4XESTACKTRACE at jeus.servlet.jsp.httpjspbase.service(httpjspbase.java(compiled Code)) 4XESTACKTRACE at javax.servlet.http.httpservlet.service(httpservlet.java(compiled Code)) 4XESTACKTRACE at jeus.servlet.jsp.jspservletwrapper.executeservlet(jspservletwrapper.java(compiled Code)) 4XESTACKTRACE at jeus.servlet.engine.requestdispatcherimpl.forward(requestdispatcherimpl.java(compiled Code)) 4XESTACKTRACE at hkmc.common_hkbs.base.hkmcbaseservlet.printjsppage(hkmcbaseservlet.java:172) 4XESTACKTRACE at hkmc.hkbs.kb.servlet.kb06_textservlet.performtask(kb06_textservlet.java:51) 4XESTACKTRACE at hkmc.common_hkbs.base.hkmcbaseservlet.performpretask(hkmcbaseservlet.java:110) 4XESTACKTRACE at hkmc.common_hkbs.base.hkmcbaseservlet.dopost(hkmcbaseservlet.java:78) 4XESTACKTRACE at javax.servlet.http.httpservlet.service(httpservlet.java(compiled Code)) 4XESTACKTRACE at javax.servlet.http.httpservlet.service(httpservlet.java(compiled Code)) 4XESTACKTRACE at jeus.servlet.engine.servletwrapper.executeservlet(servletwrapper.java(compiled Code)) 4XESTACKTRACE jeus.servlet.engine.servletwrapper.execute(servletwrapper.java(compiled Code)) 4XESTACKTRACE at jeus.servlet.servlets.workerservlet.execute(workerservlet.java(compiled Code)) 4XESTACKTRACE at jeus.servlet.engine.webtobrequestprocessor.run(webtobrequestprocessor.java(compiled Code)) 3XHNATIVESTACK Native Stack NULL ------------ 3XHSTACKLINE at 0x90000000143702C in DumpThreadDetails 3XHSTACKLINE at 0x9000000012B231C in xmdumpthreadshelper 3XHSTACKLINE at 0x9000000012B25E8 in xmdumproutine 3XHSTACKLINE at 0x9000000012D8F40 in dggeneratejavacore 3XHSTACKLINE at 0x9000000012D746C in dgdumphandler 3XHSTACKLINE at 0x900000001354C7C in manageallocfailure 3XHSTACKLINE at 0x900000001311044 in lockedheapalloc 3XHSTACKLINE at 0x9000000013139CC in realobjalloc 3XHSTACKLINE at 0x900000001314884 in targetedallocmiddlewarearray 3XHSTACKLINE at 0x90000000149DE3C in _jit_newarray 3XHSTACKLINE at 0x90000000169BBF0 in java_lang_compiler_start 6
APM 및타사디버깅툴을이용한분석방법 SysMaster 를이용한원인파악 1) GC 분석및메모리사용량분석 20 시경에 GC count 및 time 이증가, 메모리사용량은급격히감소함. 확인결과배치작업으로인하여일시적으로튀는현상임. 2) Java Collection 객체분석 JVM 상의객체를분석하여 put 개수가비정상적으로발생하면메모리릭의심 7
IBM 의 Heap analyzer 분석법 Heap Analyzer 를통해서 Memory leak 을분석을위해아래와같이 Tree view 를확인하고 Memory 를많이차지하거나 leak 이있을의심스러운곳을마우스오른쪽클릭을하여 locate a leak suspect 부분을누르면아래와같은화면을확인할수가있습니다. 아래분석결과를보면 110M 에서 8M 로급격히떨어지는부분을확인할수가있는데, 모두 java.util.hashtable( 대략 5K) 이모여서 110M 정도의메모리를소요하고있음을확인할수있습니다. 대략적으론이렇고어느어플리케이션에서일으키는지확인하려면 heapdump 발생후에발생한 javacorexxx.txt 를분석해야지알수가있습니다. HP 의 Jmeter 분석법 1, SUN, HP HP Jmeter 를활용하기위해선 JVM option 에아래와같은옵션을추가해주어야합니다. -Xrunhprof:heap=all,cutoff=0 HP jmeter 실행은아래와같이합니다. Memory dump size 에따라 heap 을조절하여야합니다. java -mx128m -jar HPjmeter.jar 8
위와같이하고 kill -3 java_pid 와같이하여분석하고자하는 Container 에서의 memory snapshot 을 jmeter 를 통해확인할수있게됩니다. 주의할점은운영시위와같은 JVM 옵션으로인해 performance 가저하될수 있으므로꼭확인할수있는상황에서만쓰도록한다. Window-> New Window 그리고다른 metric 을선택합니다. Metric->Reference Graph Tree 를누르면 object reference graph 를볼수있다. 9
아래그림은 Guess -> Memory leaks 를누르면모든전체 heap 을 HPjmeter 가분석한결과입니다. 10
HP 의 Jtune 분석법 1.HP HP 에서의 GC log 분석은 HPjtune 을가지고합니다. 1)Heap usage 패널을통해 Application 이 object 들을어떻게유지하고있는지확인할수있다. X 축은시간을보여주고 Y 축은 GC 후에 Heap size 를보여줍니다. 11
2)Duration 패널을통해 GC 의지속기간을파악할수있다. 우리는아래그림을통하여 Full gc 를하는데많은시간이걸린것을확인할수있다. 기타다른도구들의문제분석 위와같이 Optimizeit 을통해서 OOM 분석을할수있습니다. Optimizeit setting 방법은 technet 을참조합니다. 12
4. 문제해결 문제발생원인 ( 예제 ) 환경 : AIX5.2, JEUS 4.X, IBM JDK 1.4.1 64bit 발생상황 : WAS 의 performance 가급격히떨어져서 respose time 이느려지고 JEUS Server log 를확인한결과 OOM 이발생하였음 AIX 에서는 dump 위치를고정시켜서 /was/itgwas01/dump 밑에위현상발생시점의 javacore 및 heapdump 가저장되어있었음. 해당부분에대한 pattern 을분석한결과 kb06_text.jsp 에서특정 method 가문제를야기시키는것을확인할수있었음 (Kb06_Text.jsp) 3XMTHREADINFO "webtob2-hth0(10.10.123.23:9900)-w3" (TID:0x7000000000ACDE8, sys_thread_t:0x118365748, state:r, native ID:0x494D) prio=5 4XESTACKTRACE at java.lang.stringbuffer.append(stringbuffer.java(compiled Code)) 4XESTACKTRACE at hkmc.common_hkbs.utility.jsphelper.replace(jsphelper.java(compiled Code)) 4XESTACKTRACE at jeus_jspwork._kmc._kb._403_kb06_5ftext._jspservice(_403_kb06_5ftext.java:164) 4XESTACKTRACE at jeus.servlet.jsp.httpjspbase.service(httpjspbase.java(compiled Code)) 4XESTACKTRACE at javax.servlet.http.httpservlet.service(httpservlet.java(compiled Code)) 4XESTACKTRACE at jeus.servlet.jsp.jspservletwrapper.executeservlet(jspservletwrapper.java(compiled Code)) 4XESTACKTRACE at jeus.servlet.engine.requestdispatcherimpl.forward(requestdispatcherimpl.java(compiled Code)) 4XESTACKTRACE at hkmc.common_hkbs.base.hkmcbaseservlet.printjsppage(hkmcbaseservlet.java:172) 4XESTACKTRACE at hkmc.hkbs.kb.servlet.kb06_textservlet.performtask(kb06_textservlet.java:51) 4XESTACKTRACE at hkmc.common_hkbs.base.hkmcbaseservlet.performpretask(hkmcbaseservlet.java:110) 4XESTACKTRACE at hkmc.common_hkbs.base.hkmcbaseservlet.dopost(hkmcbaseservlet.java:78) 4XESTACKTRACE at javax.servlet.http.httpservlet.service(httpservlet.java(compiled Code)) 4XESTACKTRACE at javax.servlet.http.httpservlet.service(httpservlet.java(compiled Code)) 4XESTACKTRACE at jeus.servlet.engine.servletwrapper.executeservlet(servletwrapper.java(compiled Code)) 4XESTACKTRACE at jeus.servlet.engine.servletwrapper.execute(servletwrapper.java(compiled Code)) 4XESTACKTRACE at jeus.servlet.servlets.workerservlet.execute(workerservlet.java(compiled Code)) 4XESTACKTRACE at jeus.servlet.engine.webtobrequestprocessor.run(webtobrequestprocessor.java(compiled Code)) 3XHNATIVESTACK Native Stack NULL ------------ 13
3XHSTACKLINE at 0x90000000143702C in DumpThreadDetails 3XHSTACKLINE at 0x9000000012B231C in xmdumpthreadshelper 3XHSTACKLINE at 0x9000000012B25E8 in xmdumproutine 3XHSTACKLINE at 0x9000000012D8F40 in dggeneratejavacore 3XHSTACKLINE at 0x9000000012D746C in dgdumphandler 3XHSTACKLINE at 0x900000001354C7C in manageallocfailure 3XHSTACKLINE at 0x900000001311044 in lockedheapalloc 3XHSTACKLINE at 0x9000000013139CC in realobjalloc 3XHSTACKLINE at 0x900000001314884 in targetedallocmiddlewarearray 3XHSTACKLINE at 0x90000000149DE3C in _jit_newarray 3XHSTACKLINE at 0x90000000169BBF0 in java_lang_compiler_start 아래는 heapdump 분석결과입니다. 위 dump 분석결과를보면정확히 JSP 라인수는알수가없으나 StringBuffer 관련 obj 를많이담을때 발생한다는것을알수가있었습니다. 해당라인을어느정도확인하시려면 javacore dump 와같이분석을 하셔야합니다. 문제의원인은아래 class 의 replace method 에있었음을위 heapdump 및 javacore 의분석을통해알수있었으며상황재현은 Optimizeit 이라는 tool 을이용하여고객에게상황재현을시켜주었습니다. JSPHelper.replace(vo.A01_PTNM,""," ") 위에러는로직버그인데 Null check 에대한 exception 이정의가안되어있어서 StringBuffer 에 append 하는과정에서과도한메모리를사용하는것으로결론이났습니다. 14
Copyright 2013 TmaxSoft Co., Ltd. All Rights Reserved. TmaxSoft Co., Ltd. Trademarks Tmax, WebtoB, WebT, JEUS, ProFrame, SysMaster and OpenFrame are registered trademarks of TmaxSoft Co., Ltd. Other products, titles or services may be registered trademarks of their respective companies. Contact Information TmaxSoft can be contacted at the following addresses to arrange for a consulting team to visit your company and discuss your options for legacy modernization. Korea - TmaxSoft Co., Ltd. Corporate Headquarters 272-6 Seohyeon-dong, Bundang-gu, Seongnam-si, South Korea, 463-824 Tel : (+82) 31-8018-1708 Fax : (+82) 31-8018- 1710 Website : http://tmaxsoft.com U.S.A. - TmaxSoft Inc. 560 Sylvan Avenue Englewood Cliffs, NJ 07632, USA Tel : (+1) 201-567-8266 Fax : (+1) 201-567- 7339 Website : http://us.tmaxsoft.com Japan TmaxSoft Japan Co., Ltd. 5F Sanko Bldg, 3-12-16 Mita, Minato-Ku, Tokyo, 108-0073 Japan Tel : (+81) 3-5765-2550 Fax: (+81) 3-5765- 2567 Website : http://jp.tmaxsoft.com China TmaxSoft China Co., Ltd. Room 1101, Building B, Recreo International Center, East Road Wang Jing, Chaoyang District, Beijing, 100102, P.R.C Tel : (+86) 10-5783-9188 Fax: (+86) 10-5783- 9188(#800) Website : http://cn.tmaxsoft.com China(JV) Upright(Beijing) Software Technology Co., Ltd Room 1102, Building B, Recreo International Center, East Road Wang Jing, Chaoyang District, Beijing, 100102, P.R.C Tel : (+86) 10-5783-9188 Fax: (+86) 10-5783- 9188(#800) Website : www.uprightsoft.com TD-JSRP-C1101014 15