Development of a Sort-Last Based Parallel Visualization Algorithm for Efficient Postprocessing of Very Large-Scale Finite Element Analysis Results


Abstract

The need for large-scale finite element analysis is increasing with the demand for reliable design of aerospace structures and for highly accurate analysis. To meet this need, various parallel computing algorithms that make intensive use of HPC resources are being actively developed. However, research on efficient parallel postprocessing algorithms for large-scale analysis data has not been carried out sufficiently. This thesis therefore investigates the characteristics of parallel rendering methods and proposes a parallel visualization algorithm for efficiently visualizing the massive data generated by large-scale parallel finite element analysis. The proposed algorithm uses a sort-last sparse approach, which makes it highly compatible with the domain-wise computation of parallel finite element analysis. A virtual communication network tableau algorithm is also proposed to reduce the parallel visualization overhead in the image composition step of sort-last parallel rendering. In addition, a parallel visualization algorithm with data compression is proposed to improve performance in low-speed network environments such as GRID computing environments. Several benchmark tests are carried out with the in-house software developed in this work, and the performance of the proposed algorithms is examined closely.

Contents
Abstract
1.
2. 2.1, 2.2, 2.3, 2.4
3. 3.1, 3.2, 3.3
4. 4.1 (4.1.1, 4.1.2, 4.1.3), 4.2 (4.2.1, 4.2.2, 4.2.3)
5. 5.1 Example 1, 5.2 Example 2 (LS-Dyna), 5.3 Example 3 (ATLAS V500)
6.
7.
References
Appendix A. nxview (A.1 to A.8; A.8 nxview)
Appendix B. MPI
Appendix C. C.1 Psychovisual Redundancy, C.2 Bit-plane decomposition
Appendix D. IPSAP

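Nomenclature
i : visualization node index
N : number of visualization nodes (CPUs)
W, H : screen width and height
W_j^i, H_j^i : width and height of the intersection between visualization node i and virtual screen j
PW_i, PH_i : width and height of pre-detected overlap region i
T : time of one network communication step
X_i^max, X_i^min : maximum and minimum x screen coordinates of the image of node i
Y_i^max, Y_i^min : maximum and minimum y screen coordinates of the image of node i
Γ : data per pixel (RGBA, Z-buffer depth)
PDSL : Pre-Detection Sort-Last
VCNT : Virtual Communication Network Tableau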

List of Figures
Fig. 1 Graphic pipeline using sort-first algorithm
Fig. 2 A concept of sort-first algorithm
Fig. 3 Graphic pipeline using sort-last algorithm
Fig. 4 A concept of sort-last algorithm
Fig. 5 Sort-first is connected to the parallel solver
Fig. 6 Sort-last is connected to the parallel solver
Fig. 7 The example of parallel visualization [ Pantheon ]
Fig. 8 The result of time on the parallel visualization
Fig. 9 The result of time ratio on the parallel overhead
Fig. 10 The chart of visualization
Fig. 11 Finite Element Model A and B for the rendering test using sort-last
Fig. 12 The rendering time result according to viewpoint
Fig. 13 An example of parallel visualization using 4 visualization cluster nodes
Fig. 14 Sort-Last full method
Fig. 15 Sort-Last sparse method
Fig. 16 Pre-Detection processing
Fig. 17 Pre-Detection Sort-Last sparse algorithm
Fig. 18 The chart of PDSL
Fig. 19 Speed-up result of FE analysis data on the FE model Pantheon
Fig. 20 Parallel efficiency result of FE analysis data on the FE model Pantheon
Fig. 21 The time result of depth comparison on the Sort-Last algorithms
Fig. 22 The result of time ratio on the Sort-Last full algorithm
Fig. 23 The result of time ratio on the Pre-Detection Sort-Last algorithm
Fig. 24 Binary tree communication structure
Fig. 25 The network communication time analysis between pipeline and binary tree communication algorithms
Fig. 26 The characteristic of binary tree structural communication algorithm
Fig. 27 Inefficiency of binary tree communication structure on parallel visualization using synchronized user input signal
Fig. 28 A concept of Virtual Communication Network Tableau
Fig. 29 Screen detection and division using Pre-Detection information
Fig. 30 Examples of screen division
Fig. 31 Virtual Communication Network Tableau
Fig. 32 An example of intersection region between visualization node 4 and virtual screen 3
Fig. 33 Virtual Communication Network Tableau using weight factor as the size of network communication data
Fig. 34 Minimization of network communication on Virtual Communication Network Tableau using weight factor
Fig. 35 Chart of Virtual Communication Network Tableau
Fig. 36 Speed-up result of parallel visualization algorithms according to network communication structure
Fig. 37 Parallel efficiency result of parallel visualization algorithms according to network communication structure
Fig. 38 Depth comparison time result on Pre-Detection Sort-Last algorithm
Fig. 39 Network communication time result on Pre-Detection Sort-Last algorithm
Fig. 40 Parallel overhead time result on Pre-Detection Sort-Last algorithm
Fig. 41 The result of time ratio using Virtual Communication Network Tableau on Pre-Detection Sort-Last algorithm
Fig. 42 Other components of parallel overhead using Virtual Communication Network Tableau on Pre-Detection Sort-Last algorithm
Fig. 43 The result of time ratio according to network communication structure on Pre-Detection Sort-Last algorithm
Fig. 44 The chart of PDSL using data compression
Fig. 45 One Dimensional Run Length data encoding and decoding
Fig. 46 Original analysis data image and IGS data image
Fig. 47 RGB value data
Fig. 48 Finite Element Model ( HEXA element model )
Fig. 49 The result of time according to compression algorithms
Fig. 50 Time analysis of One-Dimensional Run Length using Virtual Communication Network Tableau and PDSL
Fig. 51 Time analysis of One-Dimensional Run Length using binary-tree structure
Fig. 52 Parallel overhead result of binary-tree communication structure on 1Gbps LAN
Fig. 53 Parallel graining
Fig. 54 The chart of binary-tree communication structure using One-Dimensional Run Length encoding and decoding
Fig. 55 Speed-up result of binary-tree communication structure on 100Mbps LAN
Fig. 56 Network maximum data analysis on binary-tree communication structure
Fig. 57 The chart of Virtual Communication Network Tableau using Compression algorithm
Fig. 58 Time result of parallel visualization algorithms based on 100Mbps LAN
Fig. 59 Parallel efficiency result of parallel visualization algorithm based on 100Mbps LAN using Virtual Communication Network Tableau
Fig. 60 The amount of network data according to algorithms on 100Mbps LAN
Fig. 61 Network communication time result according to algorithms on 100Mbps LAN
Fig. 62 Time analysis of Virtual Communication Network Tableau on 100Mbps LAN
Fig. 63 Finite Element model example 1
Fig. 64 Speed-up result of example 1
Fig. 65 The result of time ratio according to network communication structure on example 1
Fig. 66 Finite Element model example 2
Fig. 67 The result of example 2 using onboard graphic clusters
Fig. 68 The time result of example 2 using Geforce FX 5700
Fig. 69 Parallel overhead result on parallel visualization algorithms and network communication algorithms in example 2
Fig. 70 Time analysis using binary tree network communication on PDSL in example 2
Fig. 71 Time analysis using Virtual Communication Network Tableau on PDSL in example 2
Fig. 72 Finite Element model example 3
Fig. 73 The time result of example 3
Fig. 74 The result of time ratio according to network communication structure on example 3
Fig. A.1
Fig. A.2
Fig. A.3
Fig. A.4
Fig. A.5
Fig. A.6
Fig. A.7
Fig. A.8
Fig. B.1
Fig. C.1 Image resampling
Fig. C.2 Uniform quantization to 16 levels
Fig. C.3 IGS quantization to 16 levels
Fig. C.4 24bit Original RGB Image
Fig. C.5 Uniform quantization image
Fig. C.6 IGS quantization image
Fig. C.7 One dimensional run-length code with Bit-plane decomposition
Fig. C.8 Bit-plane decomposition
Fig. C.9 Image quantization and Bit-plane decomposition
Fig. D.1
Fig. D.2 IPSAP
Fig. D.3 Perspective mode
Fig. D.4 Orthogonal mode
Fig. D.5
Fig. D.6
Fig. D.7
Fig. D.8
Fig. D.9
Fig. D.10

List of Tables
Table 1.
Table 2.
Table 3. The analysis of sort-last full method
Table 4. The analysis of sort-last sparse method
Table 5. The analysis of Pre-Detection Sort-last algorithm
Table 6. MPI non-blocking mode communication using C++
Table 7. Step #1 on Virtual Communication Network Tableau using C++
Table 8. Step #2 on Virtual Communication Network Tableau using C++
Table 9. Step #3 on Virtual Communication Network Tableau using C++
Table 10. Step #4 on Virtual Communication Network Tableau using C++
Table 11. Step #5 on Virtual Communication Network Tableau using C++
Table 12. Step #6 on Virtual Communication Network Tableau using C++
Table 13. 800 × 600
Table A.1
Table A.2
Code A.1 OpenGL
Code A.2
Code A.3
Code B.1
Code B.2 Data stream
Table C.1 IGS quantization to 16 levels

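1. Introduction

High performance computing (HPC) programs such as ASCI (Accelerated Strategic Computing Initiative) [1] have produced systems such as ASCI Q, ASCI White, ASCI Red, and ASCI Blue-Pacific. They have also produced parallel analysis codes such as Salinas [1], which was developed under ASCI. GeoFEM [2,3] is another such code. Parallel finite element codes also include Pegasus [4,5] and IPSAP [5]. The TOP500 list (http://www.top500.org) ranks the HPC systems on which such codes run.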

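GeoFEM results were reported in 2003 [3]. The results of codes such as Salinas and GeoFEM must also be postprocessed. Their visualization is closely tied to technologies such as 3D CAD/CAM, Digital Mockup, VP (Virtual Prototyping)/VM (Virtual Manufacturing), and virtual development. The 2001 and 2002 reviews of interactive computer graphics in Aerospace America [6,7] also highlight this area.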

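Alongside the parallel solvers of ASCI, Salinas, and GeoFEM, cluster graphics systems such as WireGL [8] have been developed. WireGL has been applied, for example, to CFD (Computational Fluid Dynamics) data and is based on the sort-first algorithm.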

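2.

2.1

Among the classes of parallel rendering identified in [9] are the sort-first and sort-last methods. The sort-last methods are further divided into sort-last full and sort-last sparse. Fig. 1 shows the graphics pipeline of the sort-first algorithm, and Fig. 2 shows its concept.

Fig. 1 Graphic pipeline using sort-first algorithm

Fig. 2 A concept of sort-first algorithm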

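In the sort-first approach of Fig. 2, each node is responsible for a region of the screen. The rendering load therefore depends on how the primitives are distributed over the screen, which makes load balancing a central issue. Image composition for sort-last rendering is treated in [10], and pixel merging in [11]. Fig. 3 shows the sort-last pipeline.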

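Figs. 3 and 4 show the graphics pipeline and the concept of the sort-last algorithm. In sort-last rendering, each node renders its own part of the data. The depth comparison and network communication needed to composite the partial images make up the parallel overhead.

Fig. 3 Graphic pipeline using sort-last algorithm

Fig. 4 A concept of sort-last algorithm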

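2.2

Figs. 5 and 6 show the sort-first and sort-last algorithms connected to a parallel solver. With sort-first (Fig. 5), the results computed by each solver process must be redistributed according to screen regions before rendering. With sort-last (Fig. 6), each node can render the subdomain it computed. This fits the domain-wise computation of parallel finite element analysis [12].

Fig. 5 Sort-first is connected to the parallel solver

Fig. 6 Sort-last is connected to the parallel solver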

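The drawback of sort-last is its high demand on network bandwidth [9], because the partial images of all nodes must be exchanged and composited. To reduce this demand, this thesis proposes the Pre-Detection Sort-Last sparse algorithm.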

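2.3

Figs. 8 and 9 analyze the parallel visualization time of the Pantheon model of Fig. 7 on up to 12 visualization nodes. The model in Fig. 7 has 273,157 nodes and 1,329,027 elements.

Fig. 7 The example of parallel visualization [ Pantheon ]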

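[Figure: parallelization time analysis of Sort-Last full. Stacked rendering time, sort-last depth comparison time, network communication time, and other parallelization time (s) for 1 and 12 visualization nodes. Annotation: 12 nodes vs. 1 node, 42.9 %.]
Fig. 8 The result of time on the parallel visualization

The measurements in Fig. 8 were made on a 1 Gbps network. With 12 nodes, the parallel efficiency of Sort-Last full is only about 43% (42.9%). Rendering is not the only cost, because the parallel overhead of compositing is added to it.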

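[Figure: parallelization time ratio (%) of Sort-Last full for 1 and 12 visualization nodes. For 12 nodes: rendering 6.2784 s, sort-last depth comparison 3.58659 s, network communication 3.41836 s, other parallelization time 1.14995 s. For 1 node: 74.3614 s.]
Fig. 9 The result of time ratio on the parallel overhead

With 12 nodes (Fig. 9), the total time is 14.5 s. Of this, rendering takes 43.4%, sort-last depth comparison 24.8%, network communication 23.6%, and other parallelization work 7.9%. More than half of the time is therefore parallel overhead.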

To cut the depth comparison and network communication overhead, this thesis proposes the Pre-Detection Sort-Last sparse algorithm (PDSL) and the Virtual Communication Network Tableau (VCNT).

2.4

Fig. 10 shows the flow chart of the visualization process.

Fig. 10 The chart of visualization

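The rendering time also depends on the viewpoint [12]. To examine this, the two finite element models A and B of Fig. 11 were rendered with sort-last from the front and from the side. Fig. 12 compares the rendering times.

[Figure: side and front views of Finite Element Model A and Finite Element Model B]
Fig. 11 Finite Element Model A and B for the rendering test using sort-last

[Figure: rendering time (s) of FE models A and B for front-view and side-view rendering]
Fig. 12 The rendering time result according to viewpoint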

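As Fig. 12 shows, the rendering time of the same model changes with the viewpoint. Each measurement is therefore taken while rotating the model through 360 degrees in 24 steps. The test environment is listed in Tables 1 and 2.

Table 1.
CPU              P4 2.8GHz
Memory           1 GByte / node
Graphic card     Geforce FX 5700
Network card     1 Gbps / node
Switching HUB    1Gbps HUB / 100Mbps HUB

Table 2.
Operating System   Linux
Kernel version     2.4.18-15hl
Compiler version   gcc-3.2
QT                 3.0.5
LAM MPI            7.0
OpenGL             1.3 (GLUT)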

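3.

3.1 Sort-Last full and Sort-Last sparse

Fig. 13 shows an example of parallel visualization on 4 visualization cluster nodes (4 CPUs), each rendering its own part of the model. Sort-last methods are divided into the Sort-Last full method and the Sort-Last sparse method.

Fig. 13 An example of parallel visualization using 4 visualization cluster nodes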

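In the Sort-Last full method (Fig. 14), every node renders its subdomain into a full-screen image. The full-screen color and depth buffers of all nodes are then composited by pixel merging [11], which keeps the nearest fragment at each pixel.

Fig. 14 Sort-Last full method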
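The pixel merging step can be sketched as a per-pixel depth test between two frames. This is a minimal illustration under stated assumptions, not the thesis implementation. The hypothetical `Frame` type stores 8-bit RGBA and one float depth per pixel. A smaller depth means nearer to the viewer, as with OpenGL's default `GL_LESS` test, and the background depth is 1.0.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One node's rendered frame (hypothetical layout): RGBA color and one depth per pixel.
// Smaller depth = nearer to the viewer; background pixels have depth 1.0f.
struct Frame {
    int width = 0, height = 0;
    std::vector<std::uint8_t> rgba;  // 4 bytes per pixel
    std::vector<float> depth;        // 1 float per pixel
};

// Merge `incoming` into `local` pixel by pixel, keeping the nearer fragment.
// Returns the number of pixels taken from `incoming`.
std::size_t compositeByDepth(Frame& local, const Frame& incoming) {
    std::size_t replaced = 0;
    const std::size_t n = static_cast<std::size_t>(local.width) * local.height;
    for (std::size_t p = 0; p < n; ++p) {
        if (incoming.depth[p] < local.depth[p]) {  // depth test
            local.depth[p] = incoming.depth[p];
            for (int c = 0; c < 4; ++c) local.rgba[4 * p + c] = incoming.rgba[4 * p + c];
            ++replaced;
        }
    }
    return replaced;
}
```

Merging N full frames in this way touches all W × H pixels at every merge. This is the cost that Table 3 expresses.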

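Table 3 summarizes the data handled by the Sort-Last full method with N visualization nodes (N CPUs).

Table 3. The analysis of sort-last full method
Total data exchanged: $\sum_{i=1}^{N} \Gamma W H$
Image region of node i: $W \times H$
N: number of visualization nodes [CPUs]
Γ: data per pixel [RGBA, Z-buffer depth]
W: screen width
H: screen height

Because every node exchanges the whole screen, the data grow with N regardless of how small each subdomain appears on the screen. Fig. 15 shows the Sort-Last sparse method, which avoids this.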

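Fig. 15 Sort-Last sparse method

In the Sort-Last sparse method, each of the N nodes (N CPUs) exchanges only the screen region actually covered by its subdomain, as analyzed in Table 4.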

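Table 4. The analysis of sort-last sparse method
Total data exchanged: $\sum_{i=1}^{N} \Gamma W_i H_i$
Image region of node i: $W_i \times H_i$
N: number of visualization nodes [CPUs]
Γ: data per pixel [RGBA, Z-buffer depth]
W, H: screen width and height
$X_i^{max}$, $X_i^{min}$: maximum and minimum x coordinates of the image of node i
$Y_i^{max}$, $Y_i^{min}$: maximum and minimum y coordinates of the image of node i
$W_i = X_i^{max} - X_i^{min} \le W$, $H_i = Y_i^{max} - Y_i^{min} \le H$

Since $W_i \le W$ and $H_i \le H$, Sort-Last sparse exchanges less data than Sort-Last full. However, each node's region still contains pixels that no other node overlaps, and these need no depth comparison. Removing them leads to the Pre-Detection Sort-Last sparse algorithm.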
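A node can find its own region, from $X_i^{min}$ to $Y_i^{max}$, by scanning its depth buffer for drawn pixels. The sketch below is an illustration under stated assumptions:

- the buffer was cleared to 1.0, which is the OpenGL default;
- pixels are counted inclusively, so the width is $X_i^{max} - X_i^{min} + 1$;
- `boundingBox`, `fullBytes`, and `sparseBytes` are hypothetical helper names;
- Γ is passed in bytes, for example 8 for RGBA plus a float depth.

```cpp
#include <cstddef>
#include <vector>

// Screen-space bounding box of the pixels a node actually drew (inclusive bounds).
struct Box { int xmin, xmax, ymin, ymax; bool empty; };

// Scan a W x H depth buffer; pixels with depth < 1.0f are treated as drawn
// (assumption: the buffer was cleared to 1.0f before rendering).
Box boundingBox(const std::vector<float>& depth, int W, int H) {
    Box b{W, -1, H, -1, true};
    for (int y = 0; y < H; ++y)
        for (int x = 0; x < W; ++x)
            if (depth[static_cast<std::size_t>(y) * W + x] < 1.0f) {
                if (x < b.xmin) b.xmin = x;
                if (x > b.xmax) b.xmax = x;
                if (y < b.ymin) b.ymin = y;
                if (y > b.ymax) b.ymax = y;
                b.empty = false;
            }
    return b;
}

// Bytes one node sends: Gamma * W * H for Sort-Last full (Table 3) ...
long long fullBytes(int W, int H, int gamma) { return 1LL * gamma * W * H; }

// ... and Gamma * W_i * H_i for Sort-Last sparse (Table 4).
long long sparseBytes(const Box& b, int gamma) {
    if (b.empty) return 0;
    return 1LL * gamma * (b.xmax - b.xmin + 1) * (b.ymax - b.ymin + 1);
}
```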

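3.2 Pre-Detection Sort-Last sparse Algorithm (PDSL)

In the pre-detection step (Fig. 16), the nodes first share the bounding coordinates of their images. For every pair of nodes i and j, they then determine the region k where the two images overlap. Pixel merging, as in Figs. 14 and 15, is applied only to these overlap regions [13]. The overlap region is

$$PX_k^{max} = \min[X_i^{max}, X_j^{max}], \quad PX_k^{min} = \max[X_i^{min}, X_j^{min}]$$
$$PY_k^{max} = \min[Y_i^{max}, Y_j^{max}], \quad PY_k^{min} = \max[Y_i^{min}, Y_j^{min}]$$

where X and Y denote the maximum and minimum x and y screen coordinates of each image.

[Figure: image bounding boxes of nodes i and j and their overlap region k]
Fig. 16 Pre-Detection processing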
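The pre-detection formulas translate directly into code:

- the hypothetical `Rect` holds inclusive pixel bounds;
- `overlap` implements the $PX_k$ and $PY_k$ expressions above;
- `pairwiseOverlapPixels` sums $PW_k \times PH_k$ over every pair of nodes.

Summing over all pairs is an assumption about how the regions are aggregated, made only for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Rect { int xmin, xmax, ymin, ymax; };  // inclusive pixel bounds

// Pre-detection overlap of the images of nodes i and j:
// PX_max = min(X_i^max, X_j^max), PX_min = max(X_i^min, X_j^min), and likewise for y.
// Returns false when the two images do not overlap.
bool overlap(const Rect& a, const Rect& b, Rect& out) {
    out.xmax = std::min(a.xmax, b.xmax);
    out.xmin = std::max(a.xmin, b.xmin);
    out.ymax = std::min(a.ymax, b.ymax);
    out.ymin = std::max(a.ymin, b.ymin);
    return out.xmin <= out.xmax && out.ymin <= out.ymax;
}

// Sum of PW_k * PH_k over all node pairs i < j (pixels that need a depth comparison).
long long pairwiseOverlapPixels(const std::vector<Rect>& boxes) {
    long long total = 0;
    for (std::size_t i = 0; i < boxes.size(); ++i)
        for (std::size_t j = i + 1; j < boxes.size(); ++j) {
            Rect k;
            if (overlap(boxes[i], boxes[j], k))
                total += 1LL * (k.xmax - k.xmin + 1) * (k.ymax - k.ymin + 1);
        }
    return total;
}
```

Two 10 × 10 images that share a 5 × 5 corner need only 25 compared pixels, while a third, disjoint image adds nothing.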

Fig. 17 illustrates the Pre-Detection Sort-Last sparse algorithm, and Fig. 18 shows its flow chart.

Fig. 17 Pre-Detection Sort-Last sparse algorithm

Fig. 18 The chart of PDSL

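Table 5. The analysis of Pre-Detection Sort-last algorithm
Total data: $\sum_{k} \Gamma\, PW_k\, PH_k$
Overlap region k of nodes i and j: $PW_k \times PH_k$
N: number of visualization nodes [CPUs]
Γ: data per pixel [RGBA, Z-buffer depth]
W, H: screen width and height
$X_i^{max}$, $X_i^{min}$, $Y_i^{max}$, $Y_i^{min}$: maximum and minimum x and y coordinates of the image of node i
$W_i = X_i^{max} - X_i^{min} \le W$, $H_i = Y_i^{max} - Y_i^{min} \le H$
$PW_k = \min[X_i^{max}, X_j^{max}] - \max[X_i^{min}, X_j^{min}]$, with $PW_k \le W_k \le W$
$PH_k = \min[Y_i^{max}, Y_j^{max}] - \max[Y_i^{min}, Y_j^{min}]$, with $PH_k \le H_k \le H$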

Because $PW_k \le W_k \le W$ and $PH_k \le H_k \le H$, both the amount of communicated data and the amount of depth comparison satisfy PDSL ≤ SL-sparse ≤ SL-full.

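3.3

The three sort-last algorithms were tested in the environment of Tables 1 and 2 with the Pantheon model of Fig. 7. Fig. 19 shows the speed-up on up to 12 visualization nodes, and Fig. 20 shows the parallel efficiency.

[Figure: parallel visualization speed-up of Sort-Last full, Pre-Detection Sort-Last sparse, Sort-Last sparse, and ideal vs. number of visualization nodes (1 to 12)]
Fig. 19 Speed-up result of FE analysis data on the FE model Pantheon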

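[Figure: parallel efficiency (%) of Sort-Last full, Pre-Detection Sort-Last sparse, Sort-Last sparse, and ideal vs. number of visualization nodes (1 to 12)]
Fig. 20 Parallel efficiency result of FE analysis data on the FE model Pantheon

With 12 nodes (Fig. 20), the parallel efficiency is 42.9% for Sort-Last full, 56.2% for Sort-Last sparse, and 59.6% for PDSL. Fig. 21 compares the depth comparison time of the three algorithms.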

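[Figure: depth comparison time (s) of Sort-Last full, Sort-Last sparse, and Pre-Detection Sort-Last sparse vs. number of visualization nodes (1 to 12)]
Fig. 21 The time result of depth comparison on the Sort-Last algorithms

Figs. 22 and 23 break down the time of Sort-Last full and PDSL. With 12 nodes, the parallel overhead accounts for 57.1% of the total time with Sort-Last full (Fig. 22). With PDSL (Fig. 23), it accounts for only 40.4%.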

[Figure: time ratio (%) of rendering, sort-last depth comparison, network communication, and other parallelization time for 1 to 12 visualization nodes, Sort-Last full]
Fig. 22 The result of time ratio on the Sort-Last full algorithm

[Figure: the same time ratio analysis for Pre-Detection Sort-Last sparse]
Fig. 23 The result of time ratio on the Pre-Detection Sort-Last algorithm

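4.

Figs. 8 and 9 and the results of Chapter 3 show that network communication remains a large part of the parallel overhead. This chapter compares two ways of organizing the communication. The first is binary tree communication, a structural communication scheme. The second is the Virtual Communication Network Tableau, a non-structural communication scheme.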

4.1

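4.1.1 (Structural communication)

Fig. 24 shows the binary tree communication structure applied to the example of Fig. 13 [12,13].

[Figure: binary tree communication among visualization nodes 1 to 4]
Fig. 24 Binary tree communication structure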

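For N nodes, let T be the time of one communication step. The binary tree then needs

$$T \log_2 M, \qquad M = \min\{\, x \mid x = 2^a \ge N,\ a = 1, 2, 3, \dots \}$$

Fig. 25 compares the network communication time of pipeline communication and binary tree communication.

Fig. 25 The network communication time analysis between pipeline and binary tree communication algorithms

The binary tree therefore finishes in $T \log_2 M$, whereas a pipeline needs a number of steps that grows linearly with N.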
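The step count $\log_2 M$ and one possible pairing can be sketched as follows. The reduction toward rank 0 with strides 1, 2, 4, ... is an assumed concrete tree, used only for illustration; Fig. 24 may pair the nodes differently. `treeSize`, `treeSteps`, and `treeSchedule` are hypothetical names.

```cpp
#include <utility>
#include <vector>

// M = smallest power of two with M >= N (a >= 1, so M >= 2), as in the formula above.
int treeSize(int n) {
    int m = 2;
    while (m < n) m *= 2;
    return m;
}

// Number of communication steps log2(M); the total time is T * log2(M).
int treeSteps(int n) {
    int steps = 0;
    for (int m = treeSize(n); m > 1; m /= 2) ++steps;
    return steps;
}

// (sender, receiver) pairs of each step of a binary-tree reduction toward rank 0.
// In the step with stride s, ranks s, 3s, 5s, ... (below n) send to rank r - s.
std::vector<std::vector<std::pair<int, int>>> treeSchedule(int n) {
    std::vector<std::vector<std::pair<int, int>>> schedule;
    for (int stride = 1; stride < n; stride *= 2) {
        std::vector<std::pair<int, int>> step;
        for (int r = stride; r < n; r += 2 * stride)
            step.push_back({r, r - stride});
        schedule.push_back(step);
    }
    return schedule;
}
```

For N = 12, M = 16 and four steps are needed. The step count rises exactly when N passes a power of two, that is, at N = 3, 5, and 9.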

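[Figure: binary tree structure characteristic, Sort-Last full. Network communication time, depth comparison time, and computation time for parallelization (s) vs. number of visualization nodes (1 to 12).]
Fig. 26 The characteristic of binary tree structural communication algorithm

Fig. 26 shows that the time increases in steps at 3, 5, and 9 nodes. These are the points where M increases, and with it the number of communication steps $\log_2 M$. Fig. 27 illustrates a further inefficiency of the binary tree structure in parallel visualization driven by a synchronized user input signal. MPI non-blocking communication alone does not remove this inefficiency.

Fig. 27 Inefficiency of binary tree communication structure on parallel visualization using synchronized user input signal

The fixed tree also makes load balancing difficult. Section 4.1.2 therefore proposes the Virtual Communication Network Tableau, a non-structural communication scheme.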

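4.1.2 (Non-structural communication)

To overcome the limitations of the binary tree structure of Section 4.1.1, this section proposes the Virtual Communication Network Tableau, a non-structural communication scheme. Fig. 28 shows its concept. The screen is divided into virtual screens, and each node composites one of them. Every node sends its data directly to the owner of each virtual screen that its image overlaps.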

Fig. 28 A concept of Virtual Communication Network Tableau

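Applying the concept of Fig. 28 raises three questions, which are answered in turn below:

1. How should the screen be divided?
2. Which node must exchange which region with which other node?
3. How can these exchanges be performed without blocking (MPI non-blocking communication)?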

Fig. 29 shows how the screen is detected and divided using the pre-detection information.

Fig. 29 Screen detection and division using Pre-Detection information

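The pre-detection information of Fig. 18 determines the screen region in Fig. 29. This region is then divided into one virtual screen per node. Fig. 30 shows examples of screen division.

Fig. 30 Examples of screen division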

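The result is recorded in the table of Fig. 31. The entry in row i and column j is 1 if node i must send data to the node that owns virtual screen j, and 0 otherwise. This table is the Virtual Communication Network Tableau.

Fig. 31 Virtual Communication Network Tableau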
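Building the tableau amounts to intersecting every node's image region with every virtual screen. The sketch below makes two assumptions for illustration: node j owns virtual screen j, and all regions are inclusive pixel rectangles. It returns the pixel counts, which later serve as weights, and a 0/1 view of the tableau. All names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

struct Rect { int xmin, xmax, ymin, ymax; };  // inclusive pixel bounds

// Pixel count of the intersection of a and b, 0 if they are disjoint.
long long intersectArea(const Rect& a, const Rect& b) {
    int x0 = a.xmin > b.xmin ? a.xmin : b.xmin;
    int x1 = a.xmax < b.xmax ? a.xmax : b.xmax;
    int y0 = a.ymin > b.ymin ? a.ymin : b.ymin;
    int y1 = a.ymax < b.ymax ? a.ymax : b.ymax;
    if (x0 > x1 || y0 > y1) return 0;
    return 1LL * (x1 - x0 + 1) * (y1 - y0 + 1);
}

// tableau[i][j]: pixels node i must send to the owner of virtual screen j.
// The diagonal stays zero because a node does not send to itself.
std::vector<std::vector<long long>> buildTableau(const std::vector<Rect>& nodes,
                                                 const std::vector<Rect>& screens) {
    std::vector<std::vector<long long>> t(nodes.size(),
                                          std::vector<long long>(screens.size(), 0));
    for (std::size_t i = 0; i < nodes.size(); ++i)
        for (std::size_t j = 0; j < screens.size(); ++j)
            if (i != j) t[i][j] = intersectArea(nodes[i], screens[j]);
    return t;
}

// 0/1 form of the tableau: 1 where communication is needed.
std::vector<std::vector<int>> toBinary(const std::vector<std::vector<long long>>& t) {
    std::vector<std::vector<int>> b;
    for (const auto& row : t) {
        std::vector<int> r;
        for (long long v : row) r.push_back(v != 0 ? 1 : 0);
        b.push_back(r);
    }
    return b;
}
```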

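A node may send to and receive from several nodes at the same time, so blocking communication can lead to deadlock. The exchanges in the tableau of Fig. 31 are therefore performed with C++ MPI non-blocking communication, as in Table 6. Diagonal entries need no communication.

Table 6. MPI non-blocking mode communication using C++
MPI::Request send;
MPI::Request recv;
// Arguments: buffer, element count, datatype, peer rank, message tag
send = MPI::COMM_WORLD.Isend(send_buf, count, datatype, dest, tag);
recv = MPI::COMM_WORLD.Irecv(recv_buf, count, datatype, source, tag);
send.Wait();
recv.Wait();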

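The amount of data involved is expressed by equations (1) to (3), where i denotes a visualization node and j a virtual screen:

$$\Gamma\, W_j H_j \qquad (1)$$
$$\Gamma\, W_j^i H_j^i \qquad (2)$$
$$\Gamma\, W_i^j H_i^j \qquad (3)$$

Fig. 32 An example of intersection region between visualization node 4 and virtual screen 3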

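Fig. 32 illustrates equation (3) for visualization node 4 and virtual screen 3. Using the data size of equation (3) as a weight factor in each entry of the tableau of Fig. 31 gives the weighted tableau of Fig. 33.

Fig. 33 Virtual Communication Network Tableau using weight factor as the size of network communication data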

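With these weights, virtual screens can be assigned to nodes so that each node keeps a screen it overlaps heavily and sends as little data as possible. Fig. 34 shows the resulting minimization of network communication for the weighted tableau of Fig. 33.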

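Fig. 34 Minimization of network communication on Virtual Communication Network Tableau using weight factor

Compare node 1 in Figs. 33 and 34. When node 1 owns virtual screen 1 (Fig. 33), it sends the data of equation (4). When it owns virtual screen 3 (Fig. 34), it sends only the data of equation (5), about 46% less:

25491 (Screen 2) + 25491 (Screen 3) + 202 (Screen 4) = 51184   (4)
2048 (Screen 1) + 25491 (Screen 2) + 202 (Screen 4) = 27741   (5)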
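Equations (4) and (5) can be reproduced in a few lines. The row below holds node 1's intersection sizes with virtual screens 1 to 4, as read off those equations: 2048, 25491, 25491, and 202. `sendCost` and `bestScreen` are hypothetical helpers. The greedy rule "keep the free screen with the largest intersection" is an illustrative assumption, not necessarily the thesis's exact assignment procedure.

```cpp
#include <vector>

// Data a node must send if it keeps virtual screen `owned`:
// the sum of its intersections with every other screen.
long long sendCost(const std::vector<long long>& row, int owned) {
    long long total = 0;
    for (int j = 0; j < static_cast<int>(row.size()); ++j)
        if (j != owned) total += row[j];
    return total;
}

// Pick the free screen with the largest intersection, i.e. the smallest sendCost.
// Ties go to the lower index; returns -1 if every screen is already taken.
int bestScreen(const std::vector<long long>& row, const std::vector<bool>& taken) {
    int best = -1;
    for (int j = 0; j < static_cast<int>(row.size()); ++j)
        if (!taken[j] && (best < 0 || row[j] > row[best])) best = j;
    return best;
}
```

With all screens free, screens 2 and 3 tie at a send cost of 27741, and the sketch returns screen 2 (index 1). If screen 2 is already taken by another node, it returns screen 3 (index 2), which matches node 1's assignment in Fig. 34.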

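The communication partners follow from the weighted tableau rather than from a fixed tree, so the scheme is a non-structural communication method. Fig. 35 shows the flow chart of the Virtual Communication Network Tableau.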

Fig. 35 Chart of Virtual Communication Network Tableau

The flow of Fig. 35 is implemented in C++ in six steps (Tables 7 to 12). Step 1 (Table 7) gathers the pre-detection information of all nodes with MPI Allgather. This is the information used in Fig. 29.

Table 7. Step #1 on Virtual Communication Network Tableau using C++
unsigned short int PD_inf[np][4];  // Pre-Detection information of all nodes
unsigned short int my_pos[4];      // Send buffer for this node's information
for (i = 0; i < 4; i++)
    my_pos[i] = PD[i];             // Pre-Detection information of this node
MPI::COMM_WORLD.Allgather(my_pos, 4, MPI::UNSIGNED_SHORT,
                          PD_inf, 4, MPI::UNSIGNED_SHORT);

Step 2 (Table 8) uses the information gathered in Table 7 to find the region that encloses the images of all nodes.

Table 8. Step #2 on Virtual Communication Network Tableau using C++
unsigned short int bp[4];  // Region enclosing all images: x min, x max, y min, y max
bp[0] = width();           // Initial value for x min
bp[1] = 0;                 // Initial value for x max
bp[2] = height();          // Initial value for y min
bp[3] = 0;                 // Initial value for y max
for (i = 0; i < np; i++)
    for (j = 0; j < 4; j++)
    {
        if ((j == 0 || j == 2) && (PD_inf[i][j] <= bp[j])) bp[j] = PD_inf[i][j];  // Min
        if ((j == 1 || j == 3) && (PD_inf[i][j] >= bp[j])) bp[j] = PD_inf[i][j];  // Max
    }

Step 3 (Table 9) divides the region of Table 8 into np horizontal bands, one virtual screen per node, as in Fig. 30. It then gathers the division of all nodes with Allgather. The para_range function follows [14]. It returns the start and end rows of the band of node myrank.

Table 9. Step #3 on Virtual Communication Network Tableau using C++
// Additional function for screen division: splits rows n1..n2 among nprocs nodes
void para_range(unsigned short int n1, unsigned short int n2, int nprocs, int myrank,
                unsigned short int *ista, unsigned short int *iend)
{
    int iwork1, iwork2;
    iwork1 = (n2 - n1 + 1) / nprocs;
    iwork2 = (n2 - n1 + 1) % nprocs;
    *ista = myrank * iwork1 + n1 + std::min(myrank, iwork2);
    *iend = *ista + iwork1 - 1;
    if (iwork2 > myrank) *iend = *iend + 1;
}

unsigned short int pbp[np][4];  // Virtual screens of all nodes
para_range(bp[2], bp[3], np, my_rank, bp + 2, bp + 3);  // This node's band (y min, y max)
MPI::COMM_WORLD.Allgather(bp, 4, MPI::UNSIGNED_SHORT,
                          pbp, 4, MPI::UNSIGNED_SHORT);  // Gather all virtual screens

Step 4 (Table 10) computes the intersection regions illustrated in Fig. 32.
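The band division of Table 9 can be checked without MPI. The sketch repeats the para_range arithmetic with int instead of unsigned short. It applies that arithmetic for every rank, producing the table that the Allgather of Table 9 assembles as pbp. `Rect` and `divideScreen` are hypothetical names.

```cpp
#include <algorithm>
#include <vector>

struct Rect { int xmin, xmax, ymin, ymax; };  // inclusive pixel bounds

// Same arithmetic as para_range in Table 9, with int arguments.
void para_range(int n1, int n2, int nprocs, int myrank, int* ista, int* iend) {
    int iwork1 = (n2 - n1 + 1) / nprocs;  // rows per band
    int iwork2 = (n2 - n1 + 1) % nprocs;  // leftover rows go to the first bands
    *ista = myrank * iwork1 + n1 + std::min(myrank, iwork2);
    *iend = *ista + iwork1 - 1;
    if (iwork2 > myrank) *iend = *iend + 1;
}

// Virtual screen of every rank: the enclosing region bp cut into np horizontal bands.
std::vector<Rect> divideScreen(const Rect& bp, int np) {
    std::vector<Rect> screens;
    for (int r = 0; r < np; ++r) {
        Rect s = bp;  // full width of the enclosing region
        para_range(bp.ymin, bp.ymax, np, r, &s.ymin, &s.ymax);
        screens.push_back(s);
    }
    return screens;
}
```

For rows 100 to 109 and three nodes, the bands are rows 100 to 103, 104 to 106, and 107 to 109. The first (n2 − n1 + 1) mod nprocs bands each receive one extra row.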

Table 10. Step #4 on Virtual Communication Network Tableau using C++
unsigned short int sbis[np][4];  // Intersection of this node's image with each virtual screen
unsigned int f_srrc[np];         // Send size information (pixels per virtual screen)
for (i = 0; i < np; i++)
    f_srrc[i] = bisection(PD_inf[my_rank], pbp[i], sbis[i]);

int PostProcessor::bisection(unsigned short int sec_a[4], unsigned short int sec_b[4],
                             unsigned short int output[4])
{
    output[0] = width();   // Initial output value x min
    output[1] = 0;         // Initial output value x max
    output[2] = height();  // Initial output value y min
    output[3] = 0;         // Initial output value y max
    if ((sec_b[0] <= sec_a[0]) && (sec_a[0] <= sec_b[1])) output[0] = sec_a[0];  // x min
    if ((sec_a[0] <= sec_b[0]) && (sec_b[0] <= sec_a[1])) output[0] = sec_b[0];  // x min
    if (output[0] == width())   // There is no intersection region.
    {
        output[0] = 0; output[2] = 0;
        return 0;
    }
    if ((sec_b[2] <= sec_a[2]) && (sec_a[2] <= sec_b[3])) output[2] = sec_a[2];  // y min
    if ((sec_a[2] <= sec_b[2]) && (sec_b[2] <= sec_a[3])) output[2] = sec_b[2];  // y min
    if (output[2] == height())  // There is no intersection region.
    {
        output[0] = 0; output[2] = 0;
        return 0;
    }
    if ((sec_b[0] <= sec_a[1]) && (sec_a[1] <= sec_b[1])) output[1] = sec_a[1];  // x max
    if ((sec_a[0] <= sec_b[1]) && (sec_b[1] <= sec_a[1])) output[1] = sec_b[1];  // x max
    if ((sec_b[2] <= sec_a[3]) && (sec_a[3] <= sec_b[3])) output[3] = sec_a[3];  // y max
    if ((sec_a[2] <= sec_b[3]) && (sec_b[3] <= sec_a[3])) output[3] = sec_b[3];  // y max
    return ((output[1] - output[0] + 1) * (output[3] - output[2] + 1));  // Pixels to communicate
}

As illustrated in Fig. 32, Table 10 stores in f_srrc the size of the intersection between this node's image and each virtual screen. Step 5 (Table 11) gathers the f_srrc arrays of all nodes into the tableau check_srrc. From this tableau, each node reads off the nodes it sends to (its row) and receives from (its column).

Table 11. Step #5 on Virtual Communication Network Tableau using C++
unsigned int check_srrc[np][np];  // Tableau: check_srrc[s][r] = pixels node s sends to node r
MPI::COMM_WORLD.Allgather(f_srrc, np, MPI::UNSIGNED,
                          check_srrc, np, MPI::UNSIGNED);
int send_sum;       // Total send count
int recv_sum;       // Total recv count
int send_node[np];  // Send node information
int recv_node[np];  // Recv node information
send_sum = 0;
recv_sum = 0;
for (i = 0; i < np; i++)            // Check send and recv nodes
{
    if (i != my_rank)               // No send/recv with the node itself
    {
        if (check_srrc[my_rank][i] != 0) send_node[send_sum++] = i;  // Row-major
        if (check_srrc[i][my_rank] != 0) recv_node[recv_sum++] = i;  // Column-major
    }
}
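The row and column rule of Table 11 is easy to test on a small tableau without MPI. In this sketch the gathered tableau is passed in directly. `Peers` and `findPeers` are hypothetical names.

```cpp
#include <vector>

struct Peers { std::vector<int> send, recv; };

// Mirrors the loop of Table 11 without MPI: row `me` of the gathered tableau
// lists the nodes this node sends to, and column `me` the nodes it receives from.
Peers findPeers(const std::vector<std::vector<unsigned>>& tableau, int me) {
    Peers p;
    const int np = static_cast<int>(tableau.size());
    for (int i = 0; i < np; ++i) {
        if (i == me) continue;                        // no exchange with itself
        if (tableau[me][i] != 0) p.send.push_back(i);  // row of this node
        if (tableau[i][me] != 0) p.recv.push_back(i);  // column of this node
    }
    return p;
}
```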

Step 6 (Table 12) allocates a receive buffer for each node found in Table 11. It also initializes the local buffer of this node's virtual screen.

Table 12. Step #6 on Virtual Communication Network Tableau using C++
// sn: index of the virtual screen composited by this node; pix: bytes per pixel
float **trd;                           // Temporary recv depth buffers
unsigned char **trp;                   // Temporary recv pixel buffers
unsigned short int rbis[recv_sum][4];  // Recv intersection information
int b_size;
if (recv_sum != 0)
{
    trd = new float *[recv_sum];          // One depth buffer per recv node
    trp = new unsigned char *[recv_sum];  // One pixel buffer per recv node
}
for (i = 0; i < recv_sum; i++)            // Allocate per recv node
{
    b_size = bisection(PD_inf[recv_node[i]], pbp[sn], rbis[i]);
    trd[i] = new float[b_size];
    trp[i] = new unsigned char[b_size * pix];
}
int bs[2];                    // Local buffer size
unsigned char *buffer_pixel;  // Pixel buffer
float *buffer_depth;          // Depth buffer
bs[0] = pbp[sn][1] - pbp[sn][0] + 1;  // Local buffer width
bs[1] = pbp[sn][3] - pbp[sn][2] + 1;  // Local buffer height
buffer_pixel = new unsigned char[bs[0] * bs[1] * pix];
buffer_depth = new float[bs[0] * bs[1]];
for (i = pbp[sn][2]; i <= pbp[sn][3]; i++)      // Y
    for (j = pbp[sn][0]; j <= pbp[sn][1]; j++)  // X
    {
        // If (j, i) lies in this node's own image: initialize with its own virtual-screen data.
        // Otherwise: depth = 1, pixel = 0 or 1 (background).
    }

The data are then exchanged with the MPI non-blocking communication of Table 6.

Together, these steps implement the flow of Fig. 35. Section 4.1.3 evaluates their performance.

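4.1.3

The communication schemes of Sections 4.1.1 and 4.1.2 were compared in the environment of Tables 1 and 2, using the model of Fig. 7 (273,157 nodes, 1,329,027 elements). Fig. 36 shows the speed-up.

[Figure: parallel visualization speed-up of Sort-Last full, Sort-Last sparse, Pre-Detection Sort-Last sparse, Pre-Detection Sort-Last sparse + Virtual Communication Network Tableau, and ideal vs. number of visualization nodes (1 to 12)]
Fig. 36 Speed-up result of parallel visualization algorithms according to network communication structure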

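Fig. 37 shows the parallel efficiency corresponding to Fig. 36.

[Figure: parallel efficiency (%) of Sort-Last full, Sort-Last sparse, Pre-Detection Sort-Last sparse, Pre-Detection Sort-Last sparse + Virtual Communication Network Tableau, and ideal vs. number of visualization nodes (1 to 12)]
Fig. 37 Parallel efficiency result of parallel visualization algorithms according to network communication structure

Adding the Virtual Communication Network Tableau of Section 4.1.2 to PDSL improves the parallel efficiency by 10.4%.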

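Fig. 38 compares the depth comparison time of PDSL under the two communication schemes.

[Figure: depth comparison time (s) of Pre-Detection Sort-Last sparse with binary tree communication and with the virtual communication network vs. number of visualization nodes (1 to 12)]
Fig. 38 Depth comparison time result on Pre-Detection Sort-Last algorithm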

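Fig. 39 compares the network communication time of the two schemes.

[Figure: network communication time (s) of Pre-Detection Sort-Last sparse with binary tree communication and with the virtual communication network vs. number of visualization nodes (1 to 12)]
Fig. 39 Network communication time result on Pre-Detection Sort-Last algorithm

The binary tree curve in Fig. 39 can be compared with the binary tree characteristic of Fig. 26.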

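Fig. 40 compares the resulting parallel overhead time.

[Figure: parallel overhead time (s) of Pre-Detection Sort-Last sparse with binary tree communication and with the virtual communication network vs. number of visualization nodes (1 to 12)]
Fig. 40 Parallel overhead time result on Pre-Detection Sort-Last algorithm

The difference in overhead shown in Fig. 40 accounts for the difference in efficiency in Fig. 37. Fig. 41 breaks down the time of PDSL with the tableau.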

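[Figure: time ratio (%) of rendering, sort-last depth comparison, network communication, and other parallelization time for 1 to 12 visualization nodes, Pre-Detection Sort-Last sparse with the virtual communication network]
Fig. 41 The result of time ratio using Virtual Communication Network Tableau on Pre-Detection Sort-Last algorithm

With 12 nodes, rendering accounts for 73.6% of the total time. A large part of the remaining overhead is graphics buffer access time, which Fig. 42 breaks down.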

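[Figure: other parallelization time (s) of the virtual communication network tableau. Graphics buffer read, buffer delete, buffer setup, and graphics buffer write times vs. number of visualization nodes (1 to 12).]
Fig. 42 Other components of parallel overhead using Virtual Communication Network Tableau on Pre-Detection Sort-Last algorithm

Fig. 43 compares the time ratios of PDSL with the two communication structures at 12 nodes.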

[Figure: time ratio (%) of rendering, sort-last depth comparison, network communication, and other parallelization time of Pre-Detection Sort-Last sparse with the virtual communication network and with the binary tree structure]
Fig. 43 The result of time ratio according to network communication structure on Pre-Detection Sort-Last algorithm


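4.2

The results of Section 4.1 were obtained on a 1 Gbps network. Many environments offer only 100 Mbps or 10 Mbps. This is the case in particular for GRID computing environments [15], such as NASA IPG (Information Power Grid) [16] and the EU DataGrid Project [17]. As Table 13 shows, a single 800 × 600 frame amounts to 25.6 Mbit. Interactive visualization at 24 FPS (frames per second, Hz) therefore requires 614 Mbit/s.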

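A 1 Gbit LAN with category 5 cable can carry this rate. At 100 Mbps or 10 Mbps, however, 24 FPS is out of reach unless the amount of transferred data is reduced.

Table 13. 800 × 600
RGB (1 byte × 3):   800 × 600 × 3 bytes = 1.37 Mbyte
Depth (4 bytes):    800 × 600 × 4 bytes = 1.83 Mbyte
Total:              3.2 Mbyte = 25.6 Mbit
At 24 FPS:          (1.37 + 1.83) × 24 = 76.8 Mbyte/s = 614 Mbit/s

Fig. 44 shows the flow of PDSL with data compression. The data are encoded before they are sent and decoded after they are received. Encoding and decoding add computation, so their load must also be balanced among the nodes.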
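The arithmetic of Table 13 can be reproduced as follows. The sketch assumes, as the table's figures imply, that one Mbyte means 2^20 bytes: 800 × 600 × 3 bytes equals 1.37 such units. Without the table's rounding to 3.2 Mbyte, it gives about 615 Mbit/s rather than 614. `FrameBudget` and `frameBudget` are hypothetical names.

```cpp
// Data per frame for a W x H screen: 3 bytes of RGB and a 4-byte depth per pixel.
// Units follow Table 13: 1 Mbyte = 2^20 bytes, 1 Mbit = 2^20 bits.
struct FrameBudget { double rgbMbyte, depthMbyte, totalMbit, mbitPerSecond; };

FrameBudget frameBudget(int W, int H, int fps) {
    const double mega = 1024.0 * 1024.0;
    FrameBudget f;
    f.rgbMbyte = 3.0 * W * H / mega;                  // 1 byte each for R, G, B
    f.depthMbyte = 4.0 * W * H / mega;                // 4-byte depth value
    f.totalMbit = (f.rgbMbyte + f.depthMbyte) * 8.0;  // bytes -> bits
    f.mbitPerSecond = f.totalMbit * fps;              // required bandwidth
    return f;
}
```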

Fig. 44 The chart of PDSL using data compression

4.2.1

In the flow of Fig. 44, encoding and decoding are added to every exchange, so the compression algorithm must be fast as well as effective. This thesis uses One-Dimensional Run Length encoding/decoding [18] (Fig. 45), whose encoding and decoding are simple and fast.

Fig. 45 One Dimensional Run Length data encoding and decoding
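One-Dimensional Run Length coding replaces each run of identical values with the value and the run length. The sketch below works byte by byte on one channel. It stores each run as a (value, length) byte pair, with runs capped at 255. This layout is an assumption for illustration, and Fig. 45 may arrange the data differently. `rleEncode` and `rleDecode` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Encode as (value, run length) byte pairs; runs are capped at 255 so the length fits in a byte.
std::vector<std::uint8_t> rleEncode(const std::vector<std::uint8_t>& in) {
    std::vector<std::uint8_t> out;
    std::size_t i = 0;
    while (i < in.size()) {
        const std::uint8_t v = in[i];
        std::size_t run = 1;
        while (i + run < in.size() && in[i + run] == v && run < 255) ++run;
        out.push_back(v);
        out.push_back(static_cast<std::uint8_t>(run));
        i += run;
    }
    return out;
}

// Expand each (value, run length) pair back into `run length` copies of `value`.
std::vector<std::uint8_t> rleDecode(const std::vector<std::uint8_t>& in) {
    std::vector<std::uint8_t> out;
    for (std::size_t i = 0; i + 1 < in.size(); i += 2)
        out.insert(out.end(), static_cast<std::size_t>(in[i + 1]), in[i]);
    return out;
}
```

Large uniformly colored areas in contour images produce long runs, which is what makes such a simple code effective.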

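In the data of Fig. 45, each RGB component occupies 8 bits (1 byte) and each depth value 32 bits (4 bytes). Bit-plane decomposition of the RGB data before encoding is described in Appendix C. To raise the compression ratio of One-Dimensional Run Length coding, the RGB values are quantized from 8 bits to 4 bits with uniform quantization or IGS (Improved Gray Scale) quantization. Fig. 46 compares the original analysis data image with the IGS-quantized image.

[Figure: original analysis data image and IGS data image]
Fig. 46 Original analysis data image and IGS data image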
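The two quantizers can be sketched for one 8-bit channel:

- Uniform quantization keeps the upper 4 bits.
- IGS quantization, as usually defined in image-processing textbooks, first adds the low 4 bits of the previous sum, unless the pixel's upper 4 bits are all 1. This breaks up false contours.

This is a generic sketch, and the details in Appendix C may differ. The function names are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Uniform quantization 8 bit -> 4 bit: keep the 4 most significant bits.
std::vector<std::uint8_t> uniformQuantize4(const std::vector<std::uint8_t>& in) {
    std::vector<std::uint8_t> out;
    out.reserve(in.size());
    for (std::uint8_t v : in) out.push_back(static_cast<std::uint8_t>(v >> 4));
    return out;
}

// IGS quantization 8 bit -> 4 bit: add the low 4 bits of the previous sum before
// truncating, except when the pixel's high 4 bits are 1111 (the sum would overflow).
std::vector<std::uint8_t> igsQuantize4(const std::vector<std::uint8_t>& in) {
    std::vector<std::uint8_t> out;
    out.reserve(in.size());
    unsigned sum = 0;
    for (std::uint8_t v : in) {
        sum = ((v & 0xF0) == 0xF0) ? v : v + (sum & 0x0F);
        out.push_back(static_cast<std::uint8_t>(sum >> 4));
    }
    return out;
}
```

For the input 108, 139, 135, 244, IGS produces 6, 9, 8, 15, whereas uniform quantization produces 6, 8, 8, 15.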

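Uniform and IGS quantization are described in Appendix C. Fig. 47 compares the R, G, and B values of the original 256-level image with those of the 16-level uniform and IGS quantized images.

[Figure: R, G, B values of the original 256-level image, the 16-level uniform quantization image, and the 16-level IGS quantization image]
Fig. 47 RGB value data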

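Fig. 49 shows the effect of quantization on encoding and decoding on a 100 Mbps network. The test uses the HEXA element model of Fig. 48 with One-Dimensional Run Length (ODRL) coding and with uniform or IGS quantization.

Fig. 48 Finite Element Model ( HEXA element model )

[Figure: time (s) by algorithm. PDSL: 23.712; 1, PDSL + ODRL: 13.103; 2, PDSL + ODRL + IGS: 13.4295; 3, PDSL + ODRL + Uniform: 13.1349.]
Fig. 49 The result of time according to compression algorithms

Compression reduces the time from 23.7 s to about 13.1 s. Adding quantization does not reduce it further.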

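Fig. 50 shows the network communication, encoding, and decoding time of One-Dimensional Run Length coding with the Virtual Communication Network Tableau and PDSL on a 1 Gbps network.

[Figure: network communication time, encoding time, and decoding time (s) with compression, Pre-Detection Sort-Last sparse on 1 Gbps LAN (virtual communication network tableau), vs. number of visualization nodes (1 to 12)]
Fig. 50 Time analysis of One-Dimensional Run Length using Virtual Communication Network Tableau and PDSL

These times can be compared with the uncompressed communication times of Fig. 39. Fig. 51 shows the same analysis with binary-tree communication.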

[Figure: network communication time, encoding time, and decoding time (s) with compression, Pre-Detection Sort-Last sparse on 1 Gbps LAN (binary tree communication), vs. number of visualization nodes (1 to 12)]
Fig. 51 Time analysis of One-Dimensional Run Length using binary-tree structure

[Figure: parallel computation time (s) on 1 Gbps LAN with binary tree communication, Pre-Detection Sort-Last sparse with and without the compression algorithm, vs. number of visualization nodes (1 to 12)]
Fig. 52 Parallel overhead result of binary-tree communication structure on 1Gbps LAN

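Fig. 52 compares the parallel overhead with and without compression on the 1 Gbps network, where the uncompressed communication times are those of Fig. 39. Sections 4.2.2 and 4.2.3 evaluate the compression on a 100 Mbps network.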

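4.2.2

This section applies compression to the binary tree communication of Section 4.1.1. Fig. 53 illustrates fine-grained and coarse-grained parallelism [19]. In the binary tree, data are encoded and decoded at every step of the tree, so the compression is fine-grained.

Fig. 53 Parallel graining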

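Fig. 54 shows the flow. In each exchange, node 1 encodes and node 2 decodes. The encoding and decoding work is therefore split between the two nodes, which keeps the load balanced [20] but makes the scheme fine-grained. Section 4.2.3 instead uses coarse-grained compression.

Fig. 54 The chart of binary-tree communication structure using One-Dimensional Run Length encoding and decoding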

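Section 4.2.1 examined encoding and decoding on a 1 Gbps network. Fig. 55 shows the speed-up on a 100 Mbps network.

[Figure: speed-up with binary tree communication on 100 Mbps network. Pre-Detection Sort-Last sparse with and without the compression method, and ideal, vs. number of visualization nodes (1 to 12).]
Fig. 55 Speed-up result of binary-tree communication structure on 100Mbps LAN

On the 100 Mbps network, compression improves the performance by 14%. Fig. 56 shows that it reduces the maximum amount of communicated data by 32%.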

[Figure: maximum communication volume (Mbit) with binary tree communication on 100 Mbps network. Pre-Detection Sort-Last sparse with and without the compression method vs. number of visualization nodes (1 to 12).]
Fig. 56 Network maximum data analysis on binary-tree communication structure

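4.2.3

This section applies compression to the Virtual Communication Network Tableau of Figs. 28 and 35, and Fig. 57 shows the flow. The tableau exchanges data with MPI non-blocking communication. The encoding and decoding can therefore be organized as coarse-grained work (Fig. 53), in contrast to the fine-grained scheme of Section 4.2.2.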

Fig. 57 The chart of Virtual Communication Network Tableau using Compression algorithm

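Fig. 58 shows the time of the parallel visualization algorithms on a 100 Mbps network.

[Figure: time (s) on 100 Mbps LAN vs. number of visualization nodes (1 to 12) for: Sort-Last full; Sort-Last sparse; Pre-Detection Sort-Last sparse; PDSL + compression; PDSL + virtual communication network tableau; PDSL + virtual communication network tableau + compression]
Fig. 58 Time result of parallel visualization algorithms based on 100Mbps LAN

Fig. 59 shows the parallel efficiency of the PDSL variants.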

[Figure: parallel efficiency (%) on 100 Mbps LAN vs. number of visualization nodes (1 to 12) for: PDSL; PDSL + compression; PDSL + virtual communication network tableau; PDSL + virtual communication network tableau + compression]
Fig. 59 Parallel efficiency result of parallel visualization algorithm based on 100Mbps LAN using Virtual Communication Network Tableau

Fig. 60 shows the maximum amount of transferred data for the same variants, and Fig. 61 shows their network communication time.

[Figure: maximum data transfer (Mbit) on 100 Mbps LAN of the four PDSL variants vs. number of visualization nodes (1 to 12)]
Fig. 60 The amount of network data according to algorithms on 100Mbps LAN

[Figure: network communication time (s) on 100 Mbps LAN of the four PDSL variants vs. number of visualization nodes (1 to 12)]
Fig. 61 Network communication time result according to algorithms on 100Mbps LAN

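Figs. 60 and 61 compare the transferred data and the communication time of the four variants on the 100 Mbps network. The resulting parallel efficiency is given in Fig. 59.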

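Fig. 62 breaks down the time of PDSL with and without compression, using the Virtual Communication Network Tableau at 100 Mbps.

[Figure: time ratio (%) of rendering, encoding, sort-last depth comparison, network communication, decoding, and other parallelization time with the virtual communication network tableau on 100 Mbps LAN, for PDSL and for PDSL + compression]
Fig. 62 Time analysis of Virtual Communication Network Tableau on 100Mbps LAN

Fig. 62 shows the share of encoding and decoding time in the total time when compression is used.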

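5.

In addition to the Pantheon model of Fig. 7, the proposed algorithms were applied to three further finite element models.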

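5.1 Example 1

Example 1 (Fig. 63) is a HEXA finite element model with 1,000,000 elements. Figs. 64 and 65 show the results on a 1 Gbps network.

Fig. 63 Finite Element model example 1

[Figure: parallel speed-up on 1 Gbps network, 1,000,000 HEXA FE model. Sort-Last full, Sort-Last sparse, PDSL, PDSL + virtual communication network tableau, and ideal vs. number of visualization nodes (1 to 12).]
Fig. 64 Speed-up result of example 1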

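Fig. 65 compares the time ratios of PDSL with the two communication structures for this model.

[Figure: time ratio of rendering, depth comparison, network communication, and other parallelization time on 1 Gbps network, 1,000,000 HEXA FE model, with the virtual communication network tableau and with binary tree communication]
Fig. 65 The result of time ratio according to network communication structure on example 1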

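5.2 Example 2 (LS-Dyna model)

Example 2 (Fig. 66) is an LS-Dyna finite element model.

Fig. 66 Finite Element model example 2

Fig. 67 shows the results on a cluster of 8 Pentium IV 2.4 GHz nodes with onboard graphics. With 8 nodes, the parallel efficiency reaches 74%.

Fig. 67 The result of example 2 using onboard graphic clusters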

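Fig. 68 shows the results on the cluster with Geforce FX 5700 graphics cards.

[Figure: visualization time (s) on 1 Gbps network, LS-Dyna FE model. Sort-Last full, Sort-Last sparse, PDSL, and PDSL + virtual communication network tableau vs. number of visualization nodes (1 to 12).]
Fig. 68 The time result of example 2 using Geforce FX 5700

With 8 nodes, the visualization time is 18 s in Fig. 67 and 3.2 s in Fig. 68. Fig. 69 compares the parallel overhead of the algorithms.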

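[Figure: additional time for parallelization (s) on 1 Gbps network, LS-Dyna FE model. Sort-Last full, Sort-Last sparse, PDSL, and PDSL + virtual communication network tableau vs. number of visualization nodes (1 to 12).]
Fig. 69 Parallel overhead result on parallel visualization algorithms and network communication algorithms in example 2

Figs. 70 and 71 break down the time of PDSL with binary tree communication and with the Virtual Communication Network Tableau.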

[Figure: rendering, depth comparison, network communication, and other parallelization time (s) of PDSL with binary tree communication, LS-Dyna FE model, vs. number of visualization nodes (1 to 12)]
Fig. 70 Time analysis using binary tree network communication on PDSL in example 2

[Figure: the same time analysis with the virtual communication network tableau]
Fig. 71 Time analysis using Virtual Communication Network Tableau on PDSL in example 2

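In Fig. 70, the time with binary tree communication rises at 5 and 9 nodes, where the number of tree steps $\log_2 M$ increases (Section 4.1.1). Fig. 71 shows the corresponding times with the Virtual Communication Network Tableau.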

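5.3 Example 3 (ATLAS V500 launch vehicle)

Example 3 (Fig. 72) is a finite element model of the ATLAS V500 launch vehicle.

ATLAS V500 Finite Element model
Node              400,517
Element (Solid)   255,550
Fig. 72 Finite Element model example 3

Figs. 73 and 74 show the visualization time and the time ratios on a 1 Gbps network.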

Fig. 73 The time result of example 3 (visualization time on a 1 Gbps network for the ATLAS V500 launch vehicle model: time [s] versus number of visualization nodes, 1 to 12, for the sort-last full, sort-last sparse, PDSL, and PDSL + virtual communication network tableau methods)

Fig. 74 The result of time ratio according to network communication structure on example 3 (PDSL on a 1 Gbps network: share [%] of rendering, depth comparison, network communication, and other parallelization time for the virtual communication network tableau versus the binary tree communication structure)

6. (Pre-Detection Sort-Last sparse algorithm) (Virtual Communication Network Tableau).,., 16.,,.[21]. 1600 1400,.. 4 1, 33, 34. 35,. 30

,.,..,.. 1600 1400,...,..,.,,. - 96 -

7. /.,..,.,.,.... 100Mbit 10Mbit

,.,.,,,,,.,,.

[1] Bhardwaj, M., Pierson, K., Reese, G., Walsh, T., Day, D., Alvin, K., Peery, J., Farhat, C., Lesoinne, M., "Salinas: A Scalable Software for High-Performance Structural and Solid Mechanics Simulations," Proceedings of the IEEE/ACM SC2002 Conference, November 2002, p. 35.
[2] Yagawa, G., Okuda, H., Nakamura, H., "GeoFEM: Multi-Purpose Parallel FEM System for Solid Earth," Fourth World Congress on Computational Mechanics, Vol. 2, 1998, p. 1048.
[3] Okuda, H., "Development of Solid Earth Simulation Platform," FY2002 Report of Earth Simulator Results, 2003.
[4] 29, 3, 2001, pp. 28-37.
[5] Kim, J.S., Lee, C.S., Kim, J.H., Joh, M.S., Lee, S.S., "IPSAP: A High-performance Parallel Finite Element Code for Large-scale Structural Analysis Based on Domain-wise Multifrontal Technique," Proceedings of the ACM/IEEE SC2003 Conference, 2003, p. 32.
[6] Edwards, D. E., "Interactive Computer Graphics," Aerospace America, December 2001, p. 77.
[7] Horowitz, J. G., "Interactive Computer Graphics," Aerospace America, December 2002.
[8] Humphreys, G., Eldridge, M., Buck, I., Stoll, G., Everett, M., Hanrahan, P., "WireGL: A Scalable Graphics System for Clusters," Proceedings of SIGGRAPH 2001, pp. 129-140.
[9] Molnar, S., Cox, M., Ellsworth, D., Fuchs, H., "A Sorting Classification of Parallel Rendering," IEEE Computer Graphics and Applications, Vol. 14, No. 4, 1994, pp. 23-32.
[10] Lee, T. Y., Raghavendra, C. S., Nicholas, J. B., "Image Composition Schemes for Sort-Last Polygon Rendering on 2D Mesh Multicomputers," IEEE Transactions on Visualization and Computer Graphics, Vol. 2, No. 3, 1996, pp. 202-217.
[11] Foley, J., van Dam, A., Feiner, S., Hughes, J., "Computer Graphics: Principles and Practice," Addison-Wesley, second edition, 1997, pp. 668-672.
[12] 32, 10, 2004, pp. 38-45.
[13] MPI, 2003, 2003 4, pp. 164-168.
[14] MPI, 2003, p. 97.
[15] Foster, I., Kesselman, C., Tuecke, S., "The Anatomy of the Grid: Enabling Scalable Virtual Organizations," International Journal of Supercomputer Applications, Vol. 15, No. 3, 2001.
[16] http://www.ipg.nasa.gov
[17] http://www.eu-datagrid.org
[18] Gonzalez, R. C., Woods, R. E., "Digital Image Processing," Prentice Hall, second edition, 2002, pp. 409-514.
[19] Kumar, V., Grama, A., Gupta, A., Karypis, G., "Introduction to Parallel Computing," The Benjamin/Cummings Publishing Company, 1994, pp. 124-126.
[20] 2004, 2004 11, pp. 567-571.
[21] 2004, 2004 11, pp. 562-566.

A. nxview [nxview is n times View: Visualization with n node cluster]

A.1

Table. A.1 Cluster
  Cluster: 16 nodes
  CPU: 16 × P4 2.4 GHz
  Memory: 1 GByte / node, DDR
  Network configuration: network card 1 GBit / node; switching hub Asante 1 GBit 16-port hub

Table. A.2 Operating System
  OS: Red Hat 8.0, Hancom Linux 3.0
  Kernel version: Linux 2.4.18-15hl
  Compiler version: gcc-3.2
  Library: Qt 3.0.5, LAM/MPI 6.5.6, OpenGL 1.3, GLUT

1. 2 30 ( 1,000,000 DOF) 16 1 FPS. 24 FPS,. 3,,. 4. 5,.

A.2., (Rendering pipeline) Geometry Rasterization,, GPU PC. Geometry pipeline Rasterization, Geometry, (Sort-first algorithm) (Sort-last algorithm). 1 [Sort-First algorithm] Stanford University WireGL A.1. Open Source Chromium project. A.1 (Load balancing),

(Element),. 2 [Sort-Last algorithm] A.1., (Sort-Last algorithm). A.1,. Fig. A.1
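In the sort-last approach each node renders only the elements of its own subdomain, and the partial images are merged afterwards by comparing depth (Z-buffer) values pixel by pixel. A minimal full-frame sketch of that merge, assuming both partial images cover the whole window in the formats read by glReadPixels in Code A.1 (GL_RGB/GL_UNSIGNED_BYTE color, GL_FLOAT depth); `PartialImage` and `composite` are hypothetical names, not taken from nxview.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One partial image from a visualization node: RGB color (3 bytes per pixel)
// plus one depth value per pixel (1.0 = background after clearing).
struct PartialImage {
    int width;
    int height;
    std::vector<std::uint8_t> rgb;  // width * height * 3
    std::vector<float> depth;       // width * height
};

// Sort-last composition: merge `incoming` into `local` pixel by pixel,
// keeping whichever fragment is closer to the viewer (smaller depth).
void composite(PartialImage& local, const PartialImage& incoming) {
    assert(local.width == incoming.width && local.height == incoming.height);
    const std::size_t n = static_cast<std::size_t>(local.width) * local.height;
    assert(local.depth.size() == n && incoming.depth.size() == n);
    for (std::size_t p = 0; p < n; ++p) {
        if (incoming.depth[p] < local.depth[p]) {
            local.depth[p] = incoming.depth[p];
            for (int k = 0; k < 3; ++k) local.rgb[3 * p + k] = incoming.rgb[3 * p + k];
        }
    }
}
```

Ties keep the local fragment here; Code A.2 uses `>=` and therefore lets the received fragment win. Either is a valid z-test convention as long as all nodes use the same one.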

A.3 1 (Pipeline communication structure) A.2,,. 2 (Binary tree communication structure),. Fig. A.2
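The two communication structures differ mainly in how many sequential steps the image composition needs. A hedged sketch that only enumerates who sends to whom, assuming that in the pipeline structure each image moves one rank down a chain towards rank 0 (the direction is an assumption) and that the binary tree follows the rank test used in Code A.2; `Transfer`, `pipeline_schedule` and `binary_tree_schedule` are hypothetical names.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// One transfer in the composition phase: `first` sends its partial image to
// `second`, which composites it into its own.
using Transfer = std::pair<int, int>;  // (from, to)

// Pipeline structure: each step depends on the previous one, so np nodes
// need np - 1 sequential steps to deliver the final image to rank 0.
std::vector<std::vector<Transfer>> pipeline_schedule(int np) {
    assert(np >= 1);
    std::vector<std::vector<Transfer>> steps;
    for (int r = np - 1; r >= 1; --r) steps.push_back(std::vector<Transfer>{Transfer(r, r - 1)});
    return steps;
}

// Binary tree structure: at level kk (stride = 2^(kk-1)), every rank divisible
// by 2^kk receives from rank + 2^(kk-1) if that rank exists. Transfers within a
// level are independent, so ceil(log2(np)) levels suffice.
std::vector<std::vector<Transfer>> binary_tree_schedule(int np) {
    assert(np >= 1);
    std::vector<std::vector<Transfer>> levels;
    for (int stride = 1; stride < np; stride *= 2) {
        std::vector<Transfer> level;
        for (int r = 0; r + stride < np; r += 2 * stride) level.push_back({r + stride, r});
        levels.push_back(level);
    }
    return levels;
}
```

For 12 visualization nodes, the largest configuration in Chapter 5, this gives 11 sequential pipeline steps against 4 tree levels.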

A.4 A.3. FASTA (in-house FEM analysis program), NASTRAN.

Fig. A.3 (screenshot labels: Node/Element information; Disp./Stress/Strain information; Scale factor; Option information; Color index box with Displacement (3 components), Stress (7) and Strain (6); Bench marker with Network speed, Network and Frame per second)

Node/Element information
Displacement/Stress/Strain information
Scale factor - Scale, Scale
(Color index box)
  Displacement color index - X, Y, Z displacement
  Stress color index - XX, YY, ZZ, XY, YZ, ZX, von Mises stress
  Strain color index - XX, YY, ZZ, XY, YZ, ZX strain
(Bench marker)
  Network speed - megabits [Mbits per second]
  Network
  Frame per second - 1
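The stress color index offers a von Mises option next to the six tensor components. A minimal sketch of the standard von Mises equivalent stress computed from those six components; whether nxview derives it this way or reads it directly from the FASTA/NASTRAN results is not stated here, and `Stress` and `von_mises` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Six independent components of a symmetric 3-D stress tensor, in the
// XX, YY, ZZ, XY, YZ, ZX order listed for the color index box.
struct Stress {
    double xx, yy, zz, xy, yz, zx;
};

// von Mises equivalent stress:
// sqrt( ((sxx-syy)^2 + (syy-szz)^2 + (szz-sxx)^2) / 2 + 3 (txy^2 + tyz^2 + tzx^2) )
double von_mises(const Stress& s) {
    const double normal = (s.xx - s.yy) * (s.xx - s.yy)
                        + (s.yy - s.zz) * (s.yy - s.zz)
                        + (s.zz - s.xx) * (s.zz - s.xx);
    const double shear = s.xy * s.xy + s.yz * s.yz + s.zx * s.zx;
    const double radicand = 0.5 * normal + 3.0 * shear;
    assert(radicand >= 0.0);  // a sum of squares cannot be negative
    return std::sqrt(radicand);
}
```

A uniaxial stress returns its own magnitude and a purely hydrostatic state returns zero, which is why von Mises is a convenient single scalar for contour display.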

A.5 (Keyboard) / A.4, Scale factor

Fig. A.4

(Mouse),, A.5

Fig. A.5 (labels: Mesh, Color / Model information, Color index box, Orthogonal viewport (x, y, z), Node/Element Selection (single option), OpenGL vertex array (single option), Bench mark test)

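Parallel processing option - A user-defined option that draws, in the current window, the screen image of a visualization cluster node chosen by the user. As shown in Fig. A.6, it can be set so that the partial screen each visualization node contributes to the full view can be inspected.

Fig. A.6 Visualization of the analysis result on each visualization node through the parallel processing option (panels: Cluster Node 0, Cluster Node 1, Cluster Node 2, Cluster Node 3)

Node/element selection information option - Supported only in the sequential program, not in parallel visualization. When a node or element of the model is selected, the option displays that node's or element's information on the screen, as shown in Fig. A.7.

Fig. A.7 Information display by node and element selection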

Bench Test option - option 24. - 24 1. -, -. A.6 1 2 100 (3,090,903 DOF, 6,000,000 Polygon) 8. Fig. A.8, 16 55%.

A.7 OpenGL MPI,. A.1. OpenGL glReadPixels RGB (frame buffer).. RGB,. GL_FLOAT 0~1, GL_UNSIGNED_BYTE 0~255,.

Code A.1 OpenGL

```cpp
void PostProcessor::paintGL()
{
    // ...
    // Read the depth (Z) buffer of the whole w x h window as floats in [0, 1]
    glReadPixels(0, 0, w, h, GL_DEPTH_COMPONENT, GL_FLOAT, depth_all);
    // ...
    // Read the color buffer as 8-bit RGB (0~255 per channel)
    glReadPixels(,1, 2, 1, 1, GL_RGB, GL_UNSIGNED_BYTE, 4);
    // ... Z-buffer ...
}
```

Code 2,,

(, 0 ),.

Code A.2

```cpp
// code #2. Second algorithm
// -- 2 ** == (n-1)
// -- [Test #1] Master computer - 2003. 1.9
// -- [Test #2] 2^n algorithm
// Binary-tree composition level kk: a rank divisible by 2^kk receives the
// partial image of rank + 2^(kk-1), provided that rank exists.
if ( ( my_rank == (int)(pow(2.0,kk)*(floor(my_rank/pow(2.0,kk)))) )
     && ( my_rank != (np-1) )
     && ( my_rank+pow(2.0,kk-1) < np ) )   // depth
{
    pixel_pick[0] = w;   // x min (w = not set)
    pixel_pick[1] = 0;   // x max
    pixel_pick[2] = h;   // y min (h = not set)
    pixel_pick[3] = 0;   // y max
    // Rank
    // intersection ----------------------------------------
    // ##6. Intersection of bounding boxes pixel_recv[0] and pixel_recv[1]
    // x position
    if ((pixel_recv[0][0]<=pixel_recv[1][0]) && (pixel_recv[1][0]<=pixel_recv[0][1])) pixel_pick[0] = pixel_recv[1][0];
    if ((pixel_recv[0][0]<=pixel_recv[1][1]) && (pixel_recv[1][1]<=pixel_recv[0][1])) pixel_pick[1] = pixel_recv[1][1];
    if ((pixel_recv[1][0]<=pixel_recv[0][0]) && (pixel_recv[0][0]<=pixel_recv[1][1])) pixel_pick[0] = pixel_recv[0][0];
    if ((pixel_recv[1][0]<=pixel_recv[0][1]) && (pixel_recv[0][1]<=pixel_recv[1][1])) pixel_pick[1] = pixel_recv[0][1];
    // y position
    if ((pixel_recv[0][2]<=pixel_recv[1][2]) && (pixel_recv[1][2]<=pixel_recv[0][3])) pixel_pick[2] = pixel_recv[1][2];
    if ((pixel_recv[0][2]<=pixel_recv[1][3]) && (pixel_recv[1][3]<=pixel_recv[0][3])) pixel_pick[3] = pixel_recv[1][3];
    if ((pixel_recv[1][2]<=pixel_recv[0][2]) && (pixel_recv[0][2]<=pixel_recv[1][3])) pixel_pick[2] = pixel_recv[0][2];
    if ((pixel_recv[1][2]<=pixel_recv[0][3]) && (pixel_recv[0][3]<=pixel_recv[1][3])) pixel_pick[3] = pixel_recv[0][3];
    // If no bound was found, reset the min sentinels to 0
    if (pixel_pick[0] == w) pixel_pick[0] = 0;
    if (pixel_pick[2] == h) pixel_pick[2] = 0;
    //---------------------------------------------------------------
    // ##7. Depth comparison inside the intersection only
    // -- RGB index: i * width * 3 + j * 3 + k
    if ( (pixel_pick[1] != 0) && (pixel_pick[3] != 0) )
        for (i = pixel_pick[2]; i <= pixel_pick[3]; i++)
            for (j = pixel_pick[0]; j <= pixel_pick[1]; j++) {
                // take the received fragment if it is at least as close (smaller or equal depth)
                if ( depth_p[(i-pixel_p[2])*w_p+(j-pixel_p[0])]
                     >= depth_recv[0][(i-pixel_recv[0][2])*w_recv[0]+(j-pixel_recv[0][0])] ) {
                    depth_p[(i-pixel_p[2])*w_p+(j-pixel_p[0])] =
                        depth_recv[0][(i-pixel_recv[0][2])*w_recv[0]+(j-pixel_recv[0][0])];
                    for (k = 0; k < 3; k++)
                        draw_p[(i-pixel_p[2])*w_p*3+(j-pixel_p[0])*3+k] =
                            draw_recv[0][(i-pixel_recv[0][2])*w_recv[0]*3+(j-pixel_recv[0][0])*3+k];
                }
            }
    // depth number
    // -- Before the next level, copy the composited result into the receive buffers
    if (kk != depth_number) {
        delete[] draw_recv[0];   // the buffers are allocated with new[], so delete[] is required
        delete[] depth_recv[0];
        for (i = 0; i < 4; i++) pixel_recv[0][i] = pixel_p[i];
        w_recv[0] = pixel_recv[0][1]-pixel_recv[0][0]+1;
        h_recv[0] = pixel_recv[0][3]-pixel_recv[0][2]+1;
        draw_recv[0]  = new unsigned char [h_recv[0]*w_recv[0]*3];
        depth_recv[0] = new float [h_recv[0]*w_recv[0]];
        for (i = 0; i < h_p; i++)
            for (j = 0; j < w_p; j++) {
                depth_recv[0][i*w_recv[0]+j] = depth_p[i*w_p+j];
                for (k = 0; k < 3; k++)
                    draw_recv[0][i*w_recv[0]*3+j*3+k] = draw_p[i*w_p*3+j*3+k];
            }
        delete[] draw_p;
        delete[] depth_p;
    }
    // ...
}
```
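The eight conditionals under ##6 in Code A.2 compute the overlap of the two screen-space bounding boxes one bound at a time. The same overlap can be written with max/min, as in this minimal sketch; `Box` and `intersect` are hypothetical names, and the field order mirrors pixel_pick[0..3] (x min, x max, y min, y max, inclusive).

```cpp
#include <algorithm>
#include <cassert>

// Screen-space bounding box with inclusive pixel bounds, in the same order as
// pixel_pick[]: x min, x max, y min, y max.
struct Box {
    int xmin, xmax, ymin, ymax;
    bool empty() const { return xmin > xmax || ymin > ymax; }
};

// Overlap of two boxes. Only pixels inside it can need a depth comparison:
// elsewhere at most one of the two partial images contains drawn geometry.
Box intersect(const Box& a, const Box& b) {
    assert(!a.empty() && !b.empty());
    return Box{std::max(a.xmin, b.xmin), std::min(a.xmax, b.xmax),
               std::max(a.ymin, b.ymin), std::min(a.ymax, b.ymax)};
}
```

An explicit emptiness test also avoids relying on the w/h sentinels and on the `pixel_pick[1] != 0` and `pixel_pick[3] != 0` checks, which would skip a genuine overlap that ends at column 0 or row 0.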

Code A.3

```cpp
void PostProcessor::paintGL()
{
    // ...
    glReadPixels(0, 0, w, h, GL_DEPTH_COMPONENT, GL_FLOAT, depth_all);
    pixel_recv[0][0] = w;  // pixel x min init.
    pixel_recv[0][1] = 0;  // pixel x max init.
    pixel_recv[0][2] = h;  // pixel y min init.
    pixel_recv[0][3] = 0;  // pixel y max init.
    // frame
    // Scan the whole width x height frame for pixels where geometry was drawn
    // (0 < depth < 1; cleared background pixels keep depth 1.0)
    for (i = 0; i < h; i++) {
        for (j = 0; j < w; j++) {
            if ((depth_all[i*w+j] < 1.0) && (depth_all[i*w+j] > 0.0)) {
                if (pixel_recv[0][0] >= j) pixel_recv[0][0] = j;
                if (pixel_recv[0][1] <= j) pixel_recv[0][1] = j;
                if (pixel_recv[0][2] >= i) pixel_recv[0][2] = i;
                if (pixel_recv[0][3] <= i) pixel_recv[0][3] = i;
            }
        }
    }
    // If nothing was drawn, reset the min sentinels to 0
    if (pixel_recv[0][0] == w) pixel_recv[0][0] = 0;
    if (pixel_recv[0][2] == h) pixel_recv[0][2] = 0;
    w_recv[0] = pixel_recv[0][1]-pixel_recv[0][0]+1;
    h_recv[0] = pixel_recv[0][3]-pixel_recv[0][2]+1;
    draw_recv[0]  = new unsigned char [h_recv[0]*w_recv[0]*3];
    depth_recv[0] = new float [h_recv[0]*w_recv[0]];
    glReadPixels(,1, 2, 1, 1, GL_RGB, GL_UNSIGNED_BYTE, 4);
    // ... Z-buffer ...
}
```

(Diagram: a frame of width (w) and height (h); the bounding box of the drawn pixels spans pixel_recv[0][0] to pixel_recv[0][1] in x and pixel_recv[0][2] to pixel_recv[0][3] in y.)
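Code A.3 finds, before any image is exchanged, the smallest rectangle that contains the pixels this node has actually drawn, so that only that region has to be read back and sent. A standalone sketch of the same scan together with the payload size it implies, assuming a row-major depth buffer and the RGB/float formats of Code A.1; `PixelBox`, `detect_box` and `region_bytes` are hypothetical names. Unlike Code A.3 it flags an empty result explicitly instead of resetting the w/h sentinels to 0.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Screen-space bounding box of the pixels a node has drawn (inclusive indices).
struct PixelBox {
    int xmin, xmax, ymin, ymax;
    bool found;  // false when nothing was drawn
};

// Scan a row-major depth buffer of w * h floats (GL_DEPTH_COMPONENT / GL_FLOAT)
// and bound the pixels whose depth lies strictly between 0 and 1; pixels left
// at the cleared background value 1.0 are skipped.
PixelBox detect_box(const std::vector<float>& depth, int w, int h) {
    assert(w > 0 && h > 0 && depth.size() == static_cast<std::size_t>(w) * h);
    PixelBox box{w, -1, h, -1, false};
    for (int i = 0; i < h; ++i) {
        for (int j = 0; j < w; ++j) {
            const float d = depth[static_cast<std::size_t>(i) * w + j];
            if (d > 0.0f && d < 1.0f) {
                if (j < box.xmin) box.xmin = j;
                if (j > box.xmax) box.xmax = j;
                if (i < box.ymin) box.ymin = i;
                if (i > box.ymax) box.ymax = i;
                box.found = true;
            }
        }
    }
    return box;
}

// Bytes to send for a region: 3 bytes of RGB (GL_RGB / GL_UNSIGNED_BYTE)
// plus one float of depth (GL_FLOAT) per pixel.
std::size_t region_bytes(const PixelBox& box) {
    if (!box.found) return 0;
    const std::size_t pw = static_cast<std::size_t>(box.xmax - box.xmin + 1);
    const std::size_t ph = static_cast<std::size_t>(box.ymax - box.ymin + 1);
    return pw * ph * (3 + sizeof(float));
}
```

With 4-byte floats a full 1024 × 1024 frame costs 7 bytes per pixel, about 7.3 MB, so a subdomain that covers only part of the window saves transfer volume in proportion to the area it leaves empty.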

, pixel_recv[][], pixel_recv[0][], pixel_recv[1][].,., 3. 0.

A.8 nxview
  [2002. 9. 30]
  [2002. 11. 29] (WireGL)
  [2002. 12. 2]
  [2003. 1. 19]
  [2003. 2. 20]
  [2003. 3. 3] Prototype (0.03)
  [2003. 4. 16] nxview 0.03
  [2003. 11. 12] nxview 0.05, Screen space
  [2004. 1. 6] nxview 0.06, Non-blocking mode
  [2004. 7. 12] nxview 0.08
  [2004. 10. 7] nxview 1.00

B. MPI. MPI CPU MPI, GUI,., CPU CPU GUI, GUI,,,., Thread programming,. B.1. Fig. B.1

Code B.1

```cpp
// MPI event ---------------------------------------------------------------
// event signal
// -- [2002. 12. 13] trackball event
// -- mouse event
//
// Rank 0 captures the GUI event and sends it to every other rank;
// the other ranks replay it so that all nodes render the same view.
// -- event data is carried in data_stream (6 ints)
MPI::COMM_WORLD.Barrier();
bench_mpi_temp = MPI::Wtime();
if (my_rank == 0) {
    for (i = 1; i < np; i++)
        MPI::COMM_WORLD.Send(data_stream, 6, MPI::INT, i, 100);
} else {
    MPI::COMM_WORLD.Recv(data_stream, 6, MPI::INT, 0, 100);  // tag=100 mouse event
    if (data_stream[0] == 1) signal_press_mpi(data_stream[1], data_stream[2], data_stream[3]);
    if (data_stream[0] == 2) signal_release_mpi(data_stream[1], data_stream[2], data_stream[4], data_stream[5], data_stream[3]);
    if (data_stream[0] == 3) signal_move_mpi(data_stream[1], data_stream[2], data_stream[3]);
    // Popup-menu events: applied only when the event code changes
    if (parallel_event != data_stream[0]) {
        if (data_stream[0] == 200) menuoption(data_stream[1]);      // menu option
        if (data_stream[0] == 201) setwhatview(data_stream[1]);     // color option
        if (data_stream[0] == 202) setscalefactor(data_stream[1]);  // scale option
        if (data_stream[0] == 210) {                                // parallel cpu select
            select_cpu = data_stream[1];
            entire_shape = (FType)data_stream[2];
            if (data_stream[2] == 1) {
                accel_mode = PostProcessor::PFALSE;
                surface_mode = PostProcessor::PFALSE;
            }
            if (data_stream[2] == 0) {
                accel_mode = PostProcessor::PFALSE;
                surface_mode = PostProcessor::PTRUE;
            }
            updatelist();
        }
    }
    parallel_event = data_stream[0];
}
MPI::COMM_WORLD.Barrier();
if (bench_test == PostProcessor::PTRUE) bench_mpi += MPI::Wtime() - bench_mpi_temp;
//------------------------------------------------------------------------------------
```

, Mouse event Keyboard event., GUI Qt C++, MPI,. nxview data stream. Tag, Tag, Code B.1. B.1,,. Code B.1 paintGL(),. data_stream. Code B.2 mousePressEvent Mouse/Keyboard event data_stream data_stream., Code B.2 (rank0). (rank0) dummy signal.

Code B.2 Data stream

```cpp
void PostProcessor::timerEvent(QTimerEvent *)
{
    // MPI event -------------------------------------------------------------
    // rank0: dummy data (event code 0 = no GUI event this frame)
    if (my_rank == 0) data_stream[0] = 0;
    //--------------------------------------------------------------------------------
    updateGL();
}

void PostProcessor::mousePressEvent(QMouseEvent * event)
{
    // MPI event ------------------------------------------------------------
    if (my_rank == 0) {
        data_stream[0] = 1;                    // mouse press event
        data_stream[1] = event->x();           // mouse x position
        data_stream[2] = event->y();           // mouse y position
        data_stream[3] = (int)event->state();  // mouse state information
        data_stream[4] = event->globalX();     // global X
        data_stream[5] = event->globalY();     // global Y
    }
    //---------------------------------------------------------------------------------
    signal_press_mpi(event->x(), event->y(), (int)event->state());
    updateGL();
}
```
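Code B.1 and Code B.2 together define a small protocol: every frame, rank 0 sends a six-integer data_stream whose first element is an event code (0 for the timer's dummy event, 1 to 3 for mouse press, release and move, 200 to 210 for popup-menu actions). A hedged sketch of the receiving side's decision logic, separated from MPI and Qt so it can be exercised on its own; `DataStream`, `EventDecoder` and the returned labels are hypothetical, with the labels taken from the comments in Code B.1.

```cpp
#include <array>
#include <cassert>
#include <string>

// The six-integer message rank 0 sends each frame: element 0 is the event
// code, elements 1..5 are its arguments.
using DataStream = std::array<int, 6>;

// Mirrors the non-root branch of Code B.1: mouse events (1-3) are replayed
// every time, while menu events (200, 201, 202, 210) are applied only when the
// code differs from the previous message (the parallel_event check).
class EventDecoder {
public:
    // Returns a label for the action that would be taken (illustration only).
    std::string apply(const DataStream& d) {
        std::string action = "none";  // code 0: dummy event from timerEvent
        switch (d[0]) {
            case 1: action = "mouse press"; break;
            case 2: action = "mouse release"; break;
            case 3: action = "mouse move"; break;
            default: break;
        }
        if (d[0] != last_event_) {
            switch (d[0]) {
                case 200: action = "menu option"; break;
                case 201: action = "color option"; break;
                case 202: action = "scale option"; break;
                case 210: action = "parallel cpu select"; break;
                default: break;
            }
        }
        last_event_ = d[0];
        return action;
    }

private:
    int last_event_ = -1;  // hypothetical initial value: no event seen yet
};
```

Because timerEvent resets the code to 0 on rank 0, a menu action chosen twice in a row is presumably still seen as a change as long as a timer frame falls in between. The one-to-all distribution itself is done in Code B.1 with a loop of point-to-point sends; MPI's collective MPI_Bcast implements the same pattern and is a common alternative, although it is not compared here.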

C. C.1 Psychovisual Redundancy,..,..,. C.1, Image resampling,,. Fig. C.1 Image resampling
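Image resampling reduces the number of pixels to transmit: halving a 1024 × 1024 image to 512 × 512 cuts the pixel count by a factor of four. Fig. C.1 does not specify the filter, so this minimal sketch assumes simple 2 × 2 block averaging on one 8-bit channel (apply it to R, G and B separately); `downsample2x` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Halve the resolution of an 8-bit, single-channel, row-major image by
// averaging each 2x2 block (rounded to nearest). w and h must be even.
std::vector<std::uint8_t> downsample2x(const std::vector<std::uint8_t>& img, int w, int h) {
    assert(w % 2 == 0 && h % 2 == 0);
    assert(img.size() == static_cast<std::size_t>(w) * h);
    const int ow = w / 2, oh = h / 2;
    std::vector<std::uint8_t> out(static_cast<std::size_t>(ow) * oh);
    for (int y = 0; y < oh; ++y) {
        for (int x = 0; x < ow; ++x) {
            const std::size_t r0 = static_cast<std::size_t>(2 * y) * w + 2 * x;  // top-left of block
            const std::size_t r1 = r0 + w;                                        // row below
            const int sum = img[r0] + img[r0 + 1] + img[r1] + img[r1 + 1];
            out[static_cast<std::size_t>(y) * ow + x] = static_cast<std::uint8_t>((sum + 2) / 4);
        }
    }
    return out;
}
```

Applying it twice takes 1024 × 1024 down to 256 × 256, a sixteenfold reduction in pixels, at the cost of fine detail.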

C.1 1024 × 1024 1024 × 1024 512 × 512, 256 × 256... RGB 8bit, 256. C.2 4bit Uniform quantization 16. 8bit 4bit 2:1,.

Fig. C.2 Uniform quantization to 16 levels

, C.3 Improved gray-scale (IGS) quantization. IGS quantization Uniform quantization 4bit, 4 4,.
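4-bit uniform quantization maps each 8-bit channel value to one of 16 levels, which halves the data (the 2:1 ratio above). A minimal sketch, assuming the usual truncation to the most significant bits; `quantize_code` and `dequantize` are hypothetical names. Because neighboring values that straddle a level boundary jump by a whole step, truncation of this kind produces the false contours that IGS quantization is meant to break up.

```cpp
#include <cassert>
#include <cstdint>

// Uniform quantization of an 8-bit value to 2^bits levels by keeping the
// `bits` most significant bits (16 levels for bits = 4).
std::uint8_t quantize_code(std::uint8_t v, int bits) {
    assert(bits >= 1 && bits <= 8);
    return static_cast<std::uint8_t>(v >> (8 - bits));
}

// Map a code back to the 8-bit range for display (discarded bits become 0).
std::uint8_t dequantize(std::uint8_t code, int bits) {
    assert(bits >= 1 && bits <= 8);
    return static_cast<std::uint8_t>(code << (8 - bits));
}
```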

Fig. C.3 IGS quantization to 16 levels

Table. C.1 IGS quantization to 16 levels
  Pixel    Gray Level   Sum        IGS code
  i - 1    N/A          00000000   N/A
  i        01101100     01101100   0110
  i + 1    10001011     10010111   1001
  i + 2    10000111     10001110   1000
  i + 3    11110100     11110100   1111

C.1, 0 Sum 4 Gray Level 4 5. 2 i+3 4 1111, 5,. Image quantization resampled image C.4, C.5, C.6., 16level quantization image,

.,. Uniform quantization image IGS quantization image. C.4, C.5, C.6 4. quantization,,., 8,9,. quantization. Fig. C.4 24bit Original RGB Image

Fig. C.5 Uniform quantization image
Fig. C.6 IGS quantization image
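The procedure of Table C.1 takes only a few lines of code. A minimal sketch of IGS quantization over one 8-bit channel taken in scan order; `igs_quantize` is a hypothetical name, and the test feeds in the four gray levels of Table C.1.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Improved gray-scale (IGS) quantization of 8-bit values to 4-bit codes.
// The low nibble of the previous sum is added to the current pixel before
// truncation, which breaks up false contours. If the pixel's high nibble is
// already 1111, nothing is added, so the sum never exceeds 8 bits.
std::vector<std::uint8_t> igs_quantize(const std::vector<std::uint8_t>& pixels) {
    std::vector<std::uint8_t> codes;
    codes.reserve(pixels.size());
    unsigned sum = 0;  // row "i - 1" of Table C.1 starts from 00000000
    for (std::uint8_t p : pixels) {
        const unsigned carry = ((p & 0xF0u) == 0xF0u) ? 0u : (sum & 0x0Fu);
        sum = p + carry;
        assert(sum <= 0xFFu);
        codes.push_back(static_cast<std::uint8_t>(sum >> 4));  // high nibble = IGS code
    }
    return codes;
}
```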

C.2 Bit-plane decomposition

1 Image quantization. quantization. Bit-plane decomposition. Image quantization,,, bit-plane coding. m-bit gray-scale C.1.

$$a_{m-1}2^{m-1} + a_{m-2}2^{m-2} + \cdots + a_1 2^1 + a_0 2^0 \qquad \text{(C.1)}$$

Bit plane. C.7. 1 quantization bit-plane decomposition.

Fig. C.7 One-dimensional run-length code with bit-plane decomposition
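By Eq. (C.1), bit plane b of an image collects the coefficient a_b of every pixel, and Fig. C.7 codes each plane with one-dimensional run-length coding. A minimal sketch over a single image row, assuming runs are stored as (bit value, length) pairs; that storage format and the names `bit_plane` and `run_length` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Bit plane b (0 = least significant, 7 = most significant) of an 8-bit row:
// the coefficient a_b of 2^b in each pixel value, as in Eq. (C.1).
std::vector<std::uint8_t> bit_plane(const std::vector<std::uint8_t>& row, int b) {
    assert(b >= 0 && b <= 7);
    std::vector<std::uint8_t> plane;
    plane.reserve(row.size());
    for (std::uint8_t v : row) plane.push_back(static_cast<std::uint8_t>((v >> b) & 1u));
    return plane;
}

// One-dimensional run-length code of a binary row as (bit value, run length) pairs.
std::vector<std::pair<std::uint8_t, int>> run_length(const std::vector<std::uint8_t>& bits) {
    std::vector<std::pair<std::uint8_t, int>> runs;
    for (std::uint8_t bit : bits) {
        if (!runs.empty() && runs.back().first == bit) ++runs.back().second;
        else runs.emplace_back(bit, 1);
    }
    return runs;
}
```

For a row of five 127s followed by five 128s, every one of the eight binary planes changes value at the boundary, so each plane needs two runs.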

C.7 Bit plane One-dimensional run-length coding. 10 5 6 127(01111111) 128(10000000), 127 128 binary,. C.2 m-bit Gray code.

$$g_i = a_i \oplus a_{i+1}, \quad 0 \le i \le m-2, \qquad g_{m-1} = a_{m-1} \qquad \text{(C.2)}$$

where ⊕ is the exclusive OR operator., 127 01000000, 128 11000000 Bit-plane run-length coding. C.8 8bit 256level bit-plane decomposition. bit-plane decomposition, Gray coded bit plane image. C.8 C.4, bit-plane RED bit-plane. 8 bit-plane 4 bit-plane. bit-plane 5 6 bit, Gray-coded image bit, Gray-coded image.
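Eq. (C.2) is the standard reflected binary Gray code, which is usually computed with one shift and one XOR. A minimal sketch; `to_gray` and `from_gray` are hypothetical names. The test checks the 127/128 example from the text: in binary the two values differ in all eight bit planes, while their Gray codes differ only in the most significant plane, so the other seven Gray-coded planes stay constant across that edge.

```cpp
#include <cassert>
#include <cstdint>

// m-bit Gray code of Eq. (C.2) for m = 8: g_i = a_i XOR a_{i+1} for i < 7,
// g_7 = a_7. Shifting right by one lines each a_{i+1} up with a_i.
std::uint8_t to_gray(std::uint8_t a) {
    return static_cast<std::uint8_t>(a ^ (a >> 1));
}

// Inverse: a_7 = g_7, then a_i = g_i XOR a_{i+1}, i.e. XOR of all right shifts.
std::uint8_t from_gray(std::uint8_t g) {
    std::uint8_t a = g;
    for (std::uint8_t shift = g >> 1; shift != 0; shift >>= 1) a ^= shift;
    assert(to_gray(a) == g);  // self-check of the round trip
    return a;
}
```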

Fig. C.8 Bit-plane decomposition (original RGB image; the eight binary bit planes and the Gray-coded bit planes, shown for R bits 7, 6, 5 and 4)

bit-plane decomposition RGB, C.8 RGB. C.8 RGB Gray-coded image, bit-plane 5,6 Gray-coded image, bit-plane 4. bit-plane,. quantization RGB.,, Gray-code RGB. Gray-coded image RGB,,. C.9 1 quantization bit-plane decomposition. quantization, C.8 Gray-code bit-plane decomposition. Uniform quantization bit-plane decomposition IGS quantization bit-plane decomposition.

Fig. C.9 Image quantization and Bit-plane decomposition (original RGB image; R bits 7, 6, 5 and 4 for the eight binary bit planes, for uniform quantization with Gray-coded bits, and for IGS quantization with Gray-coded bits)

D. / IPSAP IPSAP.,... nxview D.1. D.2. Fig. D.1

Fig. D.2 IPSAP D.1. D.3, D.4 Orthogonal view mode Perspective view mode,.

. Fig. D.3 Perspective mode Fig. D.4 Orthogonal mode D.5 8,. Fig. D.5 8

IPSAP. D.6 400,517 (DOF: 1,201,551), 255,550. 8, 30 5. Fig. D.6 (5 ) Fig. D.7

D.7,,. D.8 7. 8. Fig. D.8 8

D.5.. D.9, 0 ~0.002 0.0001. Fig. D.9 ( ) D.9 X,. D.5,. D.10. 0 ~0.0014 0.0001.

Fig. D.10 (contour legend from 0.0000 to 0.0014 in steps of 0.0001)
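Figs. D.9 and D.10 appear to use contour legends with a fixed increment of 0.0001 (0 to 0.002 and 0 to 0.0014, respectively). A hedged sketch of assigning a result value to a legend band under such a scheme; the function name, the use of equal-width bands and the clamping of out-of-range values are assumptions, not details taken from nxview.

```cpp
#include <cassert>
#include <cmath>

// Index of the legend band containing `value` for a legend running from vmin
// to vmax in equal increments of `step`. Values outside the range are clamped
// to the first or last band.
int band_index(double value, double vmin, double vmax, double step) {
    assert(step > 0.0 && vmax > vmin);
    const long nbands = std::lround((vmax - vmin) / step);  // e.g. 20 bands for 0..0.002
    assert(nbands >= 1);
    long band = static_cast<long>(std::floor((value - vmin) / step));
    if (band < 0) band = 0;
    if (band > nbands - 1) band = nbands - 1;
    return static_cast<int>(band);
}
```

Rounding the band count with `std::lround` keeps floating-point noise in (vmax - vmin) / step from adding or dropping a band.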

...,,., 10,,,,,,.,....... 2004 2