
Basics of Parallel Programming with MPI: Point-to-Point Communication
Hongsuk Yi, KISTI Supercomputing Center, 2009-11-09

What is MPI?
MPI stands for Message Passing Interface: a standardized data-communication library for parallel programming. The MPI-1 standard was established by the MPI Forum in 1994.
[Figure: distributed-memory architecture; each of four processes has its own local memory, connected through a network]

The MPI Forum
The MPI Forum defines the MPI standards:
- MPI 1.0: June 1994
- MPI 1.1: June 12, 1995
- MPI-2: July 18, 1997

Goals and Scope of MPI
The goals of MPI are to provide source-code portability and to make efficient implementations possible. MPI includes a large set of features, among them support for heterogeneous parallel architectures (e.g., Grid environments). MPI-2 adds several important extra features while leaving MPI-1 unchanged.

MPI_COMM_WORLD
A communicator is a handle denoting a group of processes that can communicate with one another. Every MPI communication routine takes a communicator argument, and only processes sharing a communicator can communicate. MPI_COMM_WORLD is the communicator, defined when MPI_Init is called, that contains all processes available at program startup.
[Figure: 16 ranks of MPI_COMM_WORLD laid out across sockets 1-4 of node tachyon189]

Message Passing
In message passing, processes that each have their own local memory share data by exchanging messages. The programmer is responsible for everything: distributing the work, partitioning the data, and managing the communication. This makes coding harder, but the model is very flexible.
[Figure: four independent programs connected by a communication network]

MPI Messages
MPI data consists of an array of elements of a specific MPI datatype. The send and receive datatypes must match.

    MPI Data Type      C Data Type
    MPI_CHAR           signed char
    MPI_SHORT          signed short int
    MPI_INT            signed int
    MPI_LONG           signed long int
    MPI_FLOAT          float
    MPI_DOUBLE         double
    MPI_LONG_DOUBLE    long double

Basic Concepts of MPI
Processes and processors: MPI assigns work to processes; the processor-to-process mapping may be one-to-one or one-to-many.
A message must specify:
- which process sends
- where the data to send is located
- what data is sent
- how much is sent
- which process receives
- where the received data is stored
- how much the receiver must be prepared to accept

Basic Concepts of MPI
Tag: used to match and distinguish messages, so that arriving messages can be processed in the intended order; wildcards are allowed.
Communicator: a group of processes that are permitted to communicate with one another.
Rank: an identifier distinguishing the processes within a single communicator.

MPI Header Files
Include the header file:
- Fortran: INCLUDE 'mpif.h'
- C: #include "mpi.h"
The header declares the prototypes of the MPI subroutines and functions, and defines macros, MPI-related arguments, and datatypes.

Basic Concepts of MPI
Point-to-point communication: communication between exactly two processes; one sending process is matched with one receiving process.
Collective communication: several processes participate in the communication at once; one-to-many, many-to-one, and many-to-many patterns are all possible. Replacing a series of point-to-point calls with a single collective call leaves less room for error, and collectives are optimized and generally faster. A sketch of this replacement follows below.
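As an illustration, the sketch below contrasts the two styles: a hand-written broadcast built from MPI_Send/MPI_Recv versus a single MPI_Bcast. The function names and the one-integer payload are illustrative, and both functions assume they are called between MPI_Init and MPI_Finalize.

    #include <mpi.h>

    /* Point-to-point version: rank 0 sends the value to every other rank. */
    void broadcast_by_hand(int *data, int myrank, int nprocs)
    {
        MPI_Status status;
        int p;
        if (myrank == 0) {
            for (p = 1; p < nprocs; p++)
                MPI_Send(data, 1, MPI_INT, p, 0, MPI_COMM_WORLD);
        } else {
            MPI_Recv(data, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        }
    }

    /* Collective version: one optimized call replaces the whole loop. */
    void broadcast_collective(int *data)
    {
        MPI_Bcast(data, 1, MPI_INT, 0, MPI_COMM_WORLD);
    }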

MPI Reference Books
- MPI: A Message-Passing Interface Standard (1.1, June 12, 1995)
- MPI-2: Extensions to the Message-Passing Interface (July 18, 1997)
- MPI: The Complete Reference
- Using MPI: Portable Parallel Programming with the Message-Passing Interface
- Using MPI-2: Advanced Features of the Message-Passing Interface
- Parallel Programming with MPI

Basics of Parallel Programming with MPI

The 6 Essential MPI Commands

    int MPI_Init(int *argc, char ***argv)
    int MPI_Finalize(void)
    int MPI_Comm_size(MPI_Comm comm, int *size)
    int MPI_Comm_rank(MPI_Comm comm, int *rank)
    int MPI_Send(void *buf, int count, MPI_Datatype datatype, int dest,
                 int tag, MPI_Comm comm)
    int MPI_Recv(void *buf, int count, MPI_Datatype datatype, int source,
                 int tag, MPI_Comm comm, MPI_Status *status)
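These six calls are enough for a complete message-passing program. A minimal sketch, assuming at least two processes; the payload value 42 and tag 0 are illustrative:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int myrank, nprocs, msg = 42;            /* arbitrary payload */
        MPI_Status status;

        MPI_Init(&argc, &argv);                  /* initialize the MPI environment */
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);  /* total number of processes */
        MPI_Comm_rank(MPI_COMM_WORLD, &myrank);  /* this process's rank */

        if (myrank == 0 && nprocs > 1) {
            MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (myrank == 1) {
            MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
            printf("rank 1 received %d\n", msg);
        }

        MPI_Finalize();                          /* clean up all MPI state */
        return 0;
    }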

Initialization and Termination
MPI_Init(int *argc, char ***argv)
- Must be called before any other MPI routine, exactly once
- Placed right after the variable declarations
- Initializes the MPI environment

MPI_Finalize(void)
- Placed at the very end of the code
- Cleans up all MPI data structures
- Must be called once by every process, as its last MPI call

Setting Up MPI Processes
MPI_Comm_size(MPI_Comm comm, int *size)
- Returns the total number of processes in the communicator

MPI_Comm_rank(MPI_Comm comm, int *rank)
- Returns the ID of the calling process: its rank within the communicator
- With n processes, ranks run from 0 to n-1: 0 <= rank <= size-1

Message Passing: Send
MPI_Send(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)
- buf: starting address of the send buffer
- count: number of elements to send
- datatype: MPI datatype of each element (handle)
- dest: rank of the receiving process
- tag: message tag
- comm: MPI communicator (handle)

Example: MPI_Send(&x, 1, MPI_DOUBLE, manager, me, MPI_COMM_WORLD)

Message Passing: Receive
MPI_Recv(void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status)
- buf: starting address of the receive buffer
- count: number of elements to receive
- datatype: MPI datatype of each element (handle)
- source: rank of the sending process
- tag: message tag
- comm: MPI communicator (handle)
- status: stores information about the received message (in Fortran, an integer array of size MPI_STATUS_SIZE)

Blocking Receive
The receiver may use wildcards:
- receive from any process: MPI_ANY_SOURCE
- receive a message carrying any tag: MPI_ANY_TAG
The receiver's status argument records the actual sending process and tag, and MPI_Get_count returns the number of elements actually received; see the sketch below.
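A minimal sketch of a wildcard receive: the function name, the 100-element buffer, and the MPI_INT payload are illustrative, and the function assumes MPI_Init has already been called.

    #include <stdio.h>
    #include <mpi.h>

    /* Receive one message from anyone; report who sent it and how big it was. */
    void recv_any(void)
    {
        int buf[100], nrecv;
        MPI_Status status;

        /* Accept a message from any sender, with any tag. */
        MPI_Recv(buf, 100, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                 MPI_COMM_WORLD, &status);

        /* The status object tells us what actually arrived. */
        MPI_Get_count(&status, MPI_INT, &nrecv);
        printf("got %d ints from rank %d, tag %d\n",
               nrecv, status.MPI_SOURCE, status.MPI_TAG);
    }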

Message Passing: Blocking Communication Caveats
Standard send and receive are blocking communication. MPI_Recv completes only after the message has been fully received into the buffer. MPI_Send may or may not block until the message is received, depending on the implementation. Always watch out for deadlock.

Deadlock
Code in each MPI process:

    MPI_Ssend(..., right_rank, ...)
    MPI_Recv(..., left_rank, ...)

Every process blocks in its synchronous send, so the process to the right can never receive a message.
[Figure: ring of ranks 0-6, each sending right and receiving from the left]
If MPI is implemented with a synchronous protocol, this deadlock occurs even in standard-send (MPI_Send) mode. A deadlock-free variant is sketched below.
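One common fix, shown as a minimal sketch, is to let MPI pair the send and receive with MPI_Sendrecv, which cannot deadlock. The function name and the one-integer token are illustrative; the rank arithmetic assumes a ring of nprocs processes.

    #include <mpi.h>

    /* Shift one integer around the ring without deadlock.
       Assumes MPI_Init has been called. */
    void ring_shift(int *token)
    {
        int myrank, nprocs, right, left, recvbuf;
        MPI_Status status;

        MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
        right = (myrank + 1) % nprocs;           /* neighbor we send to */
        left  = (myrank - 1 + nprocs) % nprocs;  /* neighbor we receive from */

        /* Send right and receive from the left in one paired call. */
        MPI_Sendrecv(token,    1, MPI_INT, right, 0,
                     &recvbuf, 1, MPI_INT, left,  0,
                     MPI_COMM_WORLD, &status);
        *token = recvbuf;
    }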

Point-to-Point Communication
Exactly two processes participate. Communication takes place only within a communicator; the communicator and the ranks identify the source and destination processes.
[Figure: ranks 0-5 inside a communicator, with one rank marked source and another marked destination]

Point-to-Point Communication
Completion of communication means that the memory locations used in the transfer can safely be accessed again:
- send: the send variable may be reused once the communication completes
- receive: the receive variable may be used only after the communication completes
Blocking vs. non-blocking communication:
- blocking: the routine returns only after the communication has completed
- non-blocking: the routine returns as soon as the communication has started, regardless of completion; completion is checked afterwards

Communication Modes

    Mode               Blocking MPI routine   Non-blocking MPI routine
    Synchronous send   MPI_SSEND              MPI_ISSEND
    Ready send         MPI_RSEND              MPI_IRSEND
    Buffered send      MPI_BSEND              MPI_IBSEND
    Standard send      MPI_SEND               MPI_ISEND
    Receive            MPI_RECV               MPI_IRECV

Communication Modes
Standard send (MPI_SEND)
- minimal transfer time
- may block due to synchronous mode -> carries the risks of synchronous send
Synchronous send (MPI_SSEND)
- risk of deadlock, risk of serialization, risk of waiting -> idle time
- high latency / best bandwidth
Buffered send (MPI_BSEND)
- low latency / poor bandwidth
Ready send (MPI_RSEND)
- never use it, unless you have a 200% guarantee that the matching Recv has already been called, in the current version and all future versions of your code

Synchronous Send
MPI_SSEND (blocking synchronous send)
[Diagram: sender S waits until receiver R posts MPI_RECV; the data transfer then runs to completion; the receiving task waits until its buffer is filled]
- The send may start regardless of whether a matching receive has been posted
- The data transfer begins once the receiver is ready to receive
- The send completes when the receive has started and the transfer is finished
- The safest non-local send mode

Ready Send
MPI_RSEND (blocking ready send)
[Diagram: data transfer from the source starts immediately; the receiving task waits in MPI_RECV until its buffer is filled]
- The send starts assuming the receiver is already prepared to receive
- Sending while the receive has not been posted is an error
- Advantageous for performance; a non-local send mode

Buffered Send
MPI_BSEND (buffered send)
[Diagram: data is copied into a user-supplied buffer, then transferred; the receiving task waits in MPI_RECV]
- The send may start regardless of whether a matching receive has been posted
- The send completes as soon as the copy into the buffer finishes, independent of the receive
- The user manages the buffer space directly with MPI_Buffer_attach and MPI_Buffer_detach, as in the sketch below
- A local send mode
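A minimal sketch of a buffered send; the function name and the destination rank 1 are illustrative, and the function assumes MPI_Init has been called and that rank 1 posts a matching receive. Note that MPI_BSEND_OVERHEAD must be included in the attached buffer's size.

    #include <stdlib.h>
    #include <mpi.h>

    /* Send one double with MPI_Bsend through a user-supplied buffer. */
    void buffered_send(double *x)
    {
        int bufsize = sizeof(double) + MPI_BSEND_OVERHEAD;
        char *buffer = (char *)malloc(bufsize);

        MPI_Buffer_attach(buffer, bufsize);      /* hand the buffer to MPI */
        MPI_Bsend(x, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        MPI_Buffer_detach(&buffer, &bufsize);    /* blocks until the data is sent */
        free(buffer);
    }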

Checklist for Successful Communication
- On the sending side, specify the receiver's rank correctly
- On the receiving side, specify the sender's rank correctly
- Use the same communicator on both sides
- Make the message tags match
- Make the receive buffer large enough

Non-Blocking Communication
Communication is split into three stages:
1. Initiation: posting the non-blocking send or receive
2. Other work that does not touch the transferred data, overlapping communication with computation
3. Completion: waiting or testing
This removes the possibility of deadlock and reduces communication overhead.
Waiting: a wait routine blocks the process until the communication completes (non-blocking communication + wait = blocking communication).
Testing: a test routine returns true or false depending on whether the communication has completed. A sketch of the overlap follows below.
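A minimal sketch of overlapping communication with computation; the function names and the hypothetical do_local_work() (which must not touch sendbuf or recvbuf) are illustrative, and MPI_Init is assumed to have been called.

    #include <mpi.h>

    void do_local_work(void);   /* hypothetical computation, independent of the buffers */

    /* Exchange one double with a partner rank while computing. */
    void overlap_exchange(double *sendbuf, double *recvbuf, int partner)
    {
        MPI_Request reqs[2];
        MPI_Status  stats[2];

        MPI_Isend(sendbuf, 1, MPI_DOUBLE, partner, 0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Irecv(recvbuf, 1, MPI_DOUBLE, partner, 0, MPI_COMM_WORLD, &reqs[1]);

        do_local_work();                 /* overlap: compute while messages move */

        MPI_Waitall(2, reqs, stats);     /* only now is recvbuf safe to read */
    }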

Using Point-to-Point Communication
One-way and two-way communication; two-way communication must beware of deadlock.
[Figure: one-way case, rank 0's sendbuf going to rank 1's recvbuf; two-way case, ranks 0 and 1 each exchanging sendbuf and recvbuf]

One-Way Communication (1/2)
Blocking send, blocking receive:

    IF (myrank==0) THEN
      CALL MPI_SEND(sendbuf, icount, MPI_REAL, 1, itag, MPI_COMM_WORLD, ierr)
    ELSEIF (myrank==1) THEN
      CALL MPI_RECV(recvbuf, icount, MPI_REAL, 0, itag, MPI_COMM_WORLD, istatus, ierr)
    ENDIF

Non-blocking send, blocking receive:

    IF (myrank==0) THEN
      CALL MPI_ISEND(sendbuf, icount, MPI_REAL, 1, itag, MPI_COMM_WORLD, ireq, ierr)
      CALL MPI_WAIT(ireq, istatus, ierr)
    ELSEIF (myrank==1) THEN
      CALL MPI_RECV(recvbuf, icount, MPI_REAL, 0, itag, MPI_COMM_WORLD, istatus, ierr)
    ENDIF

One-Way Communication (2/2)
Blocking send, non-blocking receive:

    IF (myrank==0) THEN
      CALL MPI_SEND(sendbuf, icount, MPI_REAL, 1, itag, MPI_COMM_WORLD, ierr)
    ELSEIF (myrank==1) THEN
      CALL MPI_IRECV(recvbuf, icount, MPI_REAL, 0, itag, MPI_COMM_WORLD, ireq, ierr)
      CALL MPI_WAIT(ireq, istatus, ierr)
    ENDIF

Non-blocking send, non-blocking receive:

    IF (myrank==0) THEN
      CALL MPI_ISEND(sendbuf, icount, MPI_REAL, 1, itag, MPI_COMM_WORLD, ireq, ierr)
    ELSEIF (myrank==1) THEN
      CALL MPI_IRECV(recvbuf, icount, MPI_REAL, 0, itag, MPI_COMM_WORLD, ireq, ierr)
    ENDIF
    CALL MPI_WAIT(ireq, istatus, ierr)

Hands-On
1) MPI_Counting3s.c
2) Point-to-point communication performance test

Serial counting3s.c

    #include <stdlib.h>
    #include <stdio.h>
    #include <time.h>

    void dtime(double *t);   /* timer helper: stores wall-clock seconds in *t */

    int main(int argc, char **argv)
    {
        int i, j, *array, count = 0;
        const int length = 100000000, iters = 10;
        double stime, etime;

        array = (int *)malloc(length * sizeof(int));
        for (i = 0; i < length; i++)
            array[i] = i % 10;               /* 10% of the entries are 3 */

        dtime(&stime);
        for (j = 0; j < iters; j++) {
            for (i = 0; i < length; i++) {
                if (array[i] == 3) { count++; }
            }
        }
        dtime(&etime);

        printf("serial: Number of 3's: %d \t Elapsed Time = %12.8lf (sec)\n",
               count, etime - stime);
        return 0;
    }

tachyon190 $> ./c3s_serial.x
serial: Number of 3's: 100000000   Elapsed Time = 3.13399506 (sec)

mpi_counting3s (1/4)
[Figure: a global array (2 3 6 9 8 1 0 3 3 3 4 3 9 0 0 0 ...) split into chunks of size length_per_process across ranks 0-4]
1) Rank 0 initializes the global array gArray
2) Work decomposition: each myArray chunk has size length_per_process
3) Rank 0 sends a myArray chunk to each rank (MPI_Send)
4) Each rank receives its myArray (MPI_Recv) and starts counting 3s
5) Each rank sends its local count back to the master node
6) The master node sums the counts and prints the final global count
7) Time is measured with MPI_Wtime()

mpi_counting3s.c (2/4)

    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        const int length = 100000000, iters = 10;
        int myid, nprocs, length_per_process, i, j, p;
        int *myArray, *gArray, root, tag, myCount, gCount;
        double t1, t2;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);

        myCount = 0;
        root = 0;
        tag = 0;
        length_per_process = length / nprocs;
        myArray = (int *)malloc(length_per_process * sizeof(int));

mpi_counting3s.c (3/4)

    if (myid == root) {
        gArray = (int *)malloc(length * sizeof(int));
        for (i = 0; i < length; i++)
            gArray[i] = i % 10;
    }

    if (myid == root) {
        t1 = MPI_Wtime();
        /* Send chunk p to rank p+1; root keeps the last chunk for itself. */
        for (p = 0; p < nprocs - 1; p++) {
            for (i = 0; i < length_per_process; i++) {
                j = i + p * length_per_process;
                myArray[i] = gArray[j];
            }
            MPI_Send(myArray, length_per_process, MPI_INT, p + 1, tag,
                     MPI_COMM_WORLD);
        }
        /* Copy root's own chunk (the last one) into myArray. */
        for (i = 0; i < length_per_process; i++)
            myArray[i] = gArray[i + (nprocs - 1) * length_per_process];
    } else {
        MPI_Recv(myArray, length_per_process, MPI_INT, root, tag,
                 MPI_COMM_WORLD, &status);
    }

mpi_counting3s.c (4/4)

        for (j = 0; j < iters; j++) {
            for (i = 0; i < length_per_process; i++) {
                if (myArray[i] == 3)
                    myCount++;
            }
        }

        MPI_Reduce(&myCount, &gCount, 1, MPI_INT, MPI_SUM, root,
                   MPI_COMM_WORLD);

        if (myid == root) {
            t2 = MPI_Wtime();
            printf("nprocs=%d Number of 3's: %d Elapsed Time =%12.8lf(sec)\n",
                   nprocs, gCount, t2 - t1);
        }

        MPI_Finalize();
        return 0;
    }

$> ./serial.x
serial=1 Number of 3's: 100000000 Elapsed Time = 3.05412793 (sec)
$> mpirun -np 10 -machinefile hostname ./parallel.x -O3
nprocs=10 Number of 3's: 100000000 Elapsed Time = 0.22715100 (sec)

Example: mpi_multibandwidth.c

    case 1:   /* Send with Recv */
        MPI_Send(&msgbuf1, n, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
        MPI_Recv(&msgbuf1, n, MPI_CHAR, src, tag, MPI_COMM_WORLD, stats);
        break;
    case 2:   /* Send with Irecv */
        MPI_Send(&msgbuf1, n, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
        MPI_Irecv(&msgbuf1, n, MPI_CHAR, src, tag, MPI_COMM_WORLD, &reqs[0]);
        MPI_Wait(&reqs[0], stats);
        break;
    case 3:   /* Isend with Irecv */
        MPI_Isend(&msgbuf1, n, MPI_CHAR, dest, tag, MPI_COMM_WORLD, &reqs[0]);
        MPI_Irecv(&msgbuf1, n, MPI_CHAR, src, tag, MPI_COMM_WORLD, &reqs[1]);
        MPI_Waitall(2, reqs, stats);
        break;
    case 4:   /* Ssend with Recv */
        MPI_Ssend(&msgbuf1, n, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
        MPI_Recv(&msgbuf1, n, MPI_CHAR, src, tag, MPI_COMM_WORLD, stats);
        break;

LAST REVISED: 12/27/2001  Blaise Barney

Point-to-Point Communication

    case 5:   /* Ssend with Irecv */
        MPI_Ssend(&msgbuf1, n, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
        MPI_Irecv(&msgbuf1, n, MPI_CHAR, src, tag, MPI_COMM_WORLD, &reqs[0]);
        MPI_Wait(&reqs[0], stats);
        break;
    case 6:   /* Sendrecv */
        MPI_Sendrecv(&msgbuf1, n, MPI_CHAR, dest, tag,
                     &msgbuf2, n, MPI_CHAR, src,  tag,
                     MPI_COMM_WORLD, stats);
        break;
    case 7:   /* Issend with Irecv */
        MPI_Issend(&msgbuf1, n, MPI_CHAR, dest, tag, MPI_COMM_WORLD, &reqs[0]);
        MPI_Irecv(&msgbuf1, n, MPI_CHAR, src, tag, MPI_COMM_WORLD, &reqs[1]);
        MPI_Waitall(2, reqs, stats);
        break;
    case 8:   /* Issend with Recv */
        MPI_Issend(&msgbuf1, n, MPI_CHAR, dest, tag, MPI_COMM_WORLD, &reqs[0]);
        MPI_Recv(&msgbuf1, n, MPI_CHAR, src, tag, MPI_COMM_WORLD, stats);
        MPI_Wait(&reqs[0], stats);
        break;
    case 9:   /* Isend with Recv */
        MPI_Isend(&msgbuf1, n, MPI_CHAR, dest, tag, MPI_COMM_WORLD, &reqs[0]);
        MPI_Recv(&msgbuf1, n, MPI_CHAR, src, tag, MPI_COMM_WORLD, stats);
        MPI_Wait(&reqs[0], stats);
        break;

Time and Bandwidth

    t1 = MPI_Wtime();
    MPI_Isend(&msgbuf1, n, MPI_CHAR, dest, tag, MPI_COMM_WORLD, &reqs[0]);
    MPI_Recv(&msgbuf1, n, MPI_CHAR, src, tag, MPI_COMM_WORLD, stats);
    MPI_Wait(&reqs[0], stats);
    t2 = MPI_Wtime();

    thistime = t2 - t1;
    bw = ((double)nbytes * 2) / thistime;   /* a round trip moves 2 x nbytes */

Bw ~ 1200 MB/s

Output: mpi_multibandwidth.c

    **** MPI/POE Bandwidth Test ***
    Message start size  = 104857600 bytes
    Message finish size = 104857600 bytes
    Incremented by 100000 bytes per iteration
    Roundtrips per iteration = 10
    MPI_Wtick resolution = 1.000000e-05
    *************************
    task 0 is on tachyon192 partner= 1
    task 1 is on tachyon192 partner= 0
    ***********************************
    *** Case 1: Send with Recv
    Message size: 104857600   task pair 0-1: 1066.148 avg (MB/sec)
    *** Case 2: Send with Irecv
    Message size: 104857600   task pair 0-1: 1063.166 avg (MB/sec)
    *** Case 3: Isend with Irecv
    Message size: 104857600   task pair 0-1: 1062.683 avg (MB/sec)
    *** Case 4: Ssend with Recv
    Message size: 104857600   task pair 0-1: 1069.341 avg (MB/sec)
    *** Case 5: Ssend with Irecv
    Message size: 104857600   task pair 0-1: 1065.343 avg (MB/sec)
    *** Case 6: Sendrecv
    Message size: 104857600   task pair 0-1: 1070.993 avg (MB/sec)
    *** Case 7: Issend with Irecv
    Message size: 104857600   task pair 0-1: 1065.081 avg (MB/sec)
    *** Case 8: Issend with Recv

Project 2: Bandwidth & Latency
Measure the bandwidth of the multicore Barcelona chip.
[Figure: two layouts of cores 1-4 across sockets 1-4 of node tachyon189, comparing intra-socket and inter-socket communication paths]

Membench: What to Expect
[Figure: average cost per memory access vs. array size, one curve per stride s: the curve sits at the L1 hit time while the total size fits in L1, then rises toward memory time once the size exceeds L1]
A sketch of the underlying benchmark loop follows below.
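A minimal sketch of the classic membench-style loop: sizes, strides, and the repetition count are illustrative, and the clock()-based timer is a stand-in for a proper high-resolution wall clock.

    #include <stdio.h>
    #include <time.h>

    #define ARRAY_MAX (1 << 24)      /* 16M ints (64 MB), larger than any cache level */

    static int a[ARRAY_MAX];

    static double wtime(void)        /* crude timer; substitute a real wall clock */
    {
        return (double)clock() / CLOCKS_PER_SEC;
    }

    int main(void)
    {
        /* For each (size, stride) pair, touch the array repeatedly and
           report the average cost per access. */
        for (int size = 1024; size <= ARRAY_MAX; size *= 2) {
            for (int stride = 1; stride < size; stride *= 2) {
                long accesses = 0;
                double t0 = wtime();
                for (int rep = 0; rep < 10; rep++) {
                    for (int i = 0; i < size; i += stride) {
                        a[i]++;      /* one read + one write per access */
                        accesses++;
                    }
                }
                printf("size=%d stride=%d  %g sec/access\n",
                       size, stride, (wtime() - t0) / accesses);
            }
        }
        return 0;
    }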

Core-Memory Speed Gap: AMD 2350
Approximate access costs, roughly 1-2 cycles (L1), 10 cycles (L2), 50 cycles (L3), 500 cycles (memory):
- L1 (64 kB): 1.5 ns, 3 cycles
- L2 (512 kB): 15 cycles total = 3 (L1) + 3 (L1-L2) + 9 (L2 only)
- L3 (2 MB): 47 cycles total = 15 (L2) + 9 (L2-L3) + 23 (L3 only)

AMD 2350 Membenchmark
[Figure: membench results on the AMD 2350: L1 about 1 ns (2 cycles), L2 about 8 ns (16 cycles), L3 about 24 ns (48 cycles), memory about 176 ns (0.17 us); the plateaus also hint at the 256 B line size and 4 kB page / TLB behavior]

Bandwidth
[Figure: measured bandwidth results]

Q & A