HPC Azure - Scalable, Distributed Applications in Windows Azure
- 8 years ago
9 Nodes don't need to talk to each other, or need very little cross-node communication. Usually a parameter sweep, job splitting, or a search/comparison through data. Examples: Monte Carlo simulations, image/video rendering, genetic algorithms, sequence matching. Great workload for the cloud!
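A minimal sketch of this kind of workload, assuming a Monte Carlo estimate of pi as the stand-in job. The function names (`count_hits`, `estimate_pi`, `estimate_pi_parallel`) are illustrative and not from the slides. Each task gets its own seed and never talks to another task; only the final sum crosses task boundaries, which is why the work spreads across cloud nodes so easily.

```python
import random
from concurrent.futures import ProcessPoolExecutor


def count_hits(task):
    """Run one independent task: count random points inside the unit quarter circle."""
    seed, samples = task
    rng = random.Random(seed)  # private generator, no shared state between tasks
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(n_tasks, samples_per_task, executor=None):
    """Split the job into n_tasks independent pieces and combine the results."""
    tasks = [(seed, samples_per_task) for seed in range(n_tasks)]
    mapper = executor.map if executor is not None else map
    total_hits = sum(mapper(count_hits, tasks))  # the only "communication" step
    return 4.0 * total_hits / (n_tasks * samples_per_task)


def estimate_pi_parallel(n_tasks, samples_per_task):
    """Same computation, with local worker processes standing in for cloud nodes."""
    with ProcessPoolExecutor() as pool:
        return estimate_pi(n_tasks, samples_per_task, executor=pool)
```

Because the tasks are seeded independently, the serial and parallel versions return the same estimate; adding workers changes only the wall-clock time.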
10 Nodes need to talk to each other constantly. Requires a fast interconnection network (low latency and high throughput). Examples: automotive crash simulation, fluid dynamics, climate modeling, reservoir simulation, manufacturing modeling. More challenging, but possible with HPC VMs on Azure.
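To illustrate why such workloads stress the interconnect, here is a single-process sketch, assuming a 1D heat-diffusion problem split across several "nodes". It only simulates the communication pattern; a real code would use MPI over the network. All names are hypothetical. Every time step, each node needs the edge value of each neighbour (a halo exchange) before it can compute anything.

```python
def step_chunk(cells, left, right, alpha):
    """Advance one chunk by one explicit diffusion step, given its two halo values."""
    padded = [left] + cells + [right]
    return [
        padded[i] + alpha * (padded[i - 1] - 2.0 * padded[i] + padded[i + 1])
        for i in range(1, len(padded) - 1)
    ]


def run_single(cells, steps, alpha=0.25, bc=0.0):
    """Reference run: the whole rod on one node, fixed boundary value bc."""
    for _ in range(steps):
        cells = step_chunk(cells, bc, bc, alpha)
    return cells


def run_decomposed(cells, n_nodes, steps, alpha=0.25, bc=0.0):
    """Same problem split across n_nodes; counts the halo messages required."""
    if len(cells) % n_nodes != 0:
        raise ValueError("cell count must divide evenly across nodes")
    size = len(cells) // n_nodes
    chunks = [cells[k * size:(k + 1) * size] for k in range(n_nodes)]
    messages = 0
    for _ in range(steps):
        halos = []
        for k in range(n_nodes):
            # Each interior edge costs one message per neighbour per step.
            left = chunks[k - 1][-1] if k > 0 else bc
            right = chunks[k + 1][0] if k < n_nodes - 1 else bc
            messages += (k > 0) + (k < n_nodes - 1)
            halos.append((left, right))
        chunks = [step_chunk(c, l, r, alpha) for c, (l, r) in zip(chunks, halos)]
    return [x for c in chunks for x in c], messages
```

With 4 nodes the sketch needs 6 messages on every step, and no node can advance until its messages arrive, so per-message latency directly limits the step rate.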
11 [Diagram: example HPC cluster architecture. Networks: backbone network, customer's integrated network, 80Gbps IB network, 1GigE network, 10GigE network, 10/100 Ethernet network. Node types: C = Computing Node, L = Login Node, D = Debugging Node, S = Scheduler Node, I/O = I/O Gateway Node, M = Management Node, F/S = File Services Node. Other components: 24-port and 288-port IB switches (communication network), 48-port 10/100 switches (clustering, management network), MDS and OSS servers, shared home directory (20TB NAS storage), backup servers, data backup disk (67TB), tape storage (400TB), HSM disk (20TB), dual SAN switches, dual HSM servers.]
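14 HPC as a Service. The cloud on your terms. Usage-based, easy to use, resource management, large-scale scalability, cost-effectiveness, high availability, new insights, operating-cost perspective. Private Cloud, HYBRID, Public Cloud. COMPUTE, STORAGE, NETWORKING, BUSINESS INTELLIGENCE, DATA, API.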
16 Variety of VM sizes
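19 Type / Family / CPU / Cores / Mem per core / Storage
General Purpose: A1-4 (1/2/4/8 cores, 1.75 GB per core) and A5-7 (2/4/8 cores, 7 GB per core). Local HDD.
Optimized Compute: D1-4 (1/2/4/8 cores, 3.5 GB per core) and D11-14 (2/4/8/16 cores, 7 GB per core). 60% faster than A. Local SSD.
Optimized Compute: Dv2 1-5 (E5 v3, 2.4 GHz; 1/2/4/8/16 cores, 3.5 GB per core; larger-memory sizes 2/4/8/16 cores, 7 GB per core). 35% faster than D. Local SSD.
Optimized Compute: DS, DS v2. Local SSD + Premium storage.
Performance Optimized: G1-5 (E5 v3; 2/4/8/16/32 cores, 14 GB per core, up to 448 GB). Local SSD. GS: + Premium storage.
Compute Intensive with InfiniBand network: A8-9 (E5 v2, 2.6 GHz; 8/16 cores, 7 GB per core). Local HDD.
Compute Intensive: A10-11 (E5 v2, 2.6 GHz; 8/16 cores, 7 GB per core). Local HDD.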
20 Hardware designed for HPC. High CPU: 2x8-core processors per node, Sandy Bridge E5-2670 at 2.6 GHz. High memory: 128 GB, 1600 MHz DDR3. Fast interconnect: QDR InfiniBand for intra-deployment traffic, 10GigE for standard Azure traffic and internet access. Scratch storage: 2 TB per node. Available in 8-core/56 GB and 16-core/112 GB instances. RDMA for Linux and Windows. Available in 7 regions. Bare-metal-equivalent performance: ~ microsecond latency, >3 GB/sec non-blocking, 90% efficiency on Linpack. Example: linear scaling on NAMD. Next generation coming soon: H Series.
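As a rough worked example of what "90% efficiency on Linpack" means for one of these nodes: the sketch below computes theoretical peak double-precision throughput and the sustained figure at the quoted efficiency. The value of 8 floating-point operations per core per cycle is an assumption based on Sandy Bridge AVX (one 4-wide add plus one 4-wide multiply per cycle); it is not stated on the slide.

```python
SOCKETS = 2                # "2x8 core processors per node"
CORES_PER_SOCKET = 8
CLOCK_GHZ = 2.6            # E5-2670 base clock from the slide
FLOPS_PER_CYCLE = 8        # assumption: double-precision AVX on Sandy Bridge
LINPACK_EFFICIENCY = 0.90  # "90% efficiency on Linpack"


def peak_gflops(sockets, cores_per_socket, clock_ghz, flops_per_cycle):
    """Theoretical peak in GFLOPS: cores x clock x operations per cycle."""
    return sockets * cores_per_socket * clock_ghz * flops_per_cycle


def sustained_gflops(peak, efficiency):
    """Linpack efficiency is measured throughput divided by theoretical peak."""
    return peak * efficiency


peak = peak_gflops(SOCKETS, CORES_PER_SOCKET, CLOCK_GHZ, FLOPS_PER_CYCLE)
sustained = sustained_gflops(peak, LINPACK_EFFICIENCY)
print(f"peak per node: {peak:.1f} GFLOPS, at 90%: {sustained:.1f} GFLOPS")
```

Under these assumptions a 16-core node peaks at about 332.8 GFLOPS, so 90% efficiency corresponds to roughly 299.5 GFLOPS sustained per node.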
23 [Chart: Car to Car/Caravan Top Crunch benchmark on LS-DYNA MPP, up to 256 cores. X-axis: number of cores. Y-axis: time in seconds.]
24 [Chart: AWS C3 8X Large vs. Azure A9 run time for crash models/jobs. X-axis: number of cores. Y-axis: run time in seconds. Series: Azure A9 nodes (MPI over RDMA), AWS C3 8X Large (MPI over SR-IOV).]
25 Solution time goes from 11.8 minutes with one A9 VM (16 cores) to 1.5 minutes using 16 A9 VMs (256 cores). Cost of the compute for the job goes from $1. to $2.33. This is an 8x improvement in solution time at just over twice the cost.
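The trade-off can be checked with a few lines of arithmetic. This sketch assumes 16 VMs for the 256-core run (256 cores at 16 cores per A9 VM) and simple billing proportional to VM count times run time; both are assumptions, not figures from the slide.

```python
T_ONE_VM_MIN = 11.8  # solution time on one A9 VM, from the slide
T_MANY_MIN = 1.5     # solution time on the 256-core run, from the slide
N_VMS = 16           # assumption: 256 cores / 16 cores per A9 VM


def speedup(t_base, t_scaled):
    """How many times faster the scaled run finishes."""
    return t_base / t_scaled


def parallel_efficiency(t_base, t_scaled, n_units):
    """Speedup divided by the ideal (linear) speedup."""
    return speedup(t_base, t_scaled) / n_units


def relative_cost(t_base, t_scaled, n_units):
    """Cost ratio if billing is proportional to VM count x run time."""
    return (n_units * t_scaled) / t_base


s = speedup(T_ONE_VM_MIN, T_MANY_MIN)
e = parallel_efficiency(T_ONE_VM_MIN, T_MANY_MIN, N_VMS)
c = relative_cost(T_ONE_VM_MIN, T_MANY_MIN, N_VMS)
print(f"speedup {s:.2f}x, efficiency {e:.0%}, relative cost {c:.2f}x")
```

The result is a speedup of about 7.87x and a cost ratio of about 2.03x, consistent with the slide's "8x improvement" at "just over twice the cost".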
28 CUDA OpenGL
29 Size / Component (CPU cores: E5-2690v3)
NV6: 56 GB RAM; ~0.5 TB SSD; Azure Network; 1 x M60 GPU (1/2 physical card)
NV12: 112 GB RAM; ~1.0 TB SSD; Azure Network; 2 x M60 GPUs (1 physical card)
NV24: 224 GB RAM; ~2.0 TB SSD; Azure Network; 4 x M60 GPUs (2 physical cards)
NC6: 56 GB RAM; ~0.5 TB SSD; Azure Network; 1 x K80 GPU (1/2 physical card)
NC12: 112 GB RAM; ~1.0 TB SSD; Azure Network; 2 x K80 GPUs (1 physical card)
NC24: 224 GB RAM; ~2.0 TB SSD; Azure Network; 4 x K80 GPUs (2 physical cards)
NC24r: 224 GB RAM; ~2.0 TB SSD; Azure Network + dedicated RDMA backend; 4 x K80 GPUs (2 physical cards)
30 Stateless HPC as a service. No OS image to manage and maintain. No scheduler. Automated compute provisioning per job. Automatic job submission, data movement, and de-provisioning. MPI support in future. Hyperscale (thousands of cores).
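The per-job lifecycle described here (provision, move data, run, de-provision) can be sketched as follows. This is a purely illustrative, in-memory model: `provisioned_pool` and `run_job` are hypothetical names and do not correspond to any Azure API. The point of the sketch is that de-provisioning is tied to the job itself, so compute is released even when a task fails.

```python
from contextlib import contextmanager


@contextmanager
def provisioned_pool(n_nodes, log):
    """Hypothetical pool: nodes exist only for the duration of one job."""
    log.append(f"provision {n_nodes} nodes")
    try:
        yield [f"node-{i}" for i in range(n_nodes)]
    finally:
        # Runs on success and on failure, so no nodes are left running.
        log.append(f"de-provision {n_nodes} nodes")


def run_job(inputs, task, n_nodes):
    """Provision, stage data in, run tasks round-robin, collect output, release."""
    log = []
    results = []
    with provisioned_pool(n_nodes, log) as nodes:
        log.append("stage input data")
        for i, item in enumerate(inputs):
            node = nodes[i % len(nodes)]  # simple round-robin placement
            results.append((node, task(item)))
        log.append("collect output data")
    return results, log
```

For example, `run_job([1, 2, 3], lambda x: x * x, n_nodes=2)` returns the squared values tagged with the node that "ran" them, plus a log that starts with provisioning and ends with de-provisioning.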
31 Pay as you go. [Diagram: solution stack. User application or service; remote visualization (available in near future); application and scheduler (customer responsibility); O/S (Linux/Windows); high-speed network (IB), servers, storage, networking, and data center infrastructure (physical infrastructure in Azure).]
33 Altair PBS, SLURM, Tibco Data Synapse, HPC Pack; stage data; Cycle Computing, Rescale
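36 Development and risk analysis of derivatives on equities, credit, interest rates, foreign exchange, and more. Algorithmic trading. Monte Carlo simulation. Guarantee reserve adequacy analysis, pricing, risk hedging. Fixed-asset liquidity analysis. Portfolio modeling. Credit card scoring, pricing, and fraud risk calculation. Case-by-case default risk analysis.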
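As one concrete instance of the Monte Carlo pricing workloads listed here, the sketch below prices a European call option by simulation and compares it with the closed-form Black-Scholes value. The model (geometric Brownian motion) and all parameter values are illustrative assumptions, not taken from the slides.

```python
import math
import random


def mc_call_price(s0, strike, rate, sigma, t, n_paths, seed=0):
    """Monte Carlo price of a European call under geometric Brownian motion."""
    rng = random.Random(seed)
    drift = (rate - 0.5 * sigma ** 2) * t
    vol = sigma * math.sqrt(t)
    total_payoff = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)                # one standard normal draw per path
        s_t = s0 * math.exp(drift + vol * z)   # terminal asset price
        total_payoff += max(s_t - strike, 0.0)
    return math.exp(-rate * t) * total_payoff / n_paths


def bs_call_price(s0, strike, rate, sigma, t):
    """Closed-form Black-Scholes price, used here only as a cross-check."""
    d1 = (math.log(s0 / strike) + (rate + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)

    def cdf(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    return s0 * cdf(d1) - strike * math.exp(-rate * t) * cdf(d2)


mc = mc_call_price(100.0, 100.0, 0.05, 0.2, 1.0, n_paths=100_000)
bs = bs_call_price(100.0, 100.0, 0.05, 0.2, 1.0)
print(f"Monte Carlo: {mc:.3f}  Black-Scholes: {bs:.3f}")
```

Every path is independent, so a production run with millions of paths and many instruments splits across nodes with almost no communication.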
39 On-Demand; Reservations (discount and SLA); Low priority (spot)
40 G Series: largest VM available in the market (32 cores, 448 GB RAM, SSD). Only support InfiniBand and RDMA. Experts. * Operated by 21Vianet
43 Altair Corporate Presentation. Seung-wook Min (민승욱). Innovation Intelligence.
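44 Copyright 2014 Altair Engineering, Inc. Proprietary and Confidential. All rights reserved. Global company: more than 2,200 creative people, including engineers, scientists, developers, and designers, work with Altair in more than 45 offices across 22 countries on 6 continents.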
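45 Software: the core of Altair's business. Altair provides a high-performance computing environment built on simulation, optimization, and analysis technology, supporting fast, sound decisions that improve product performance. HyperWorks, solidThinking, PBS Works, Cloud Solutions, Partner Alliance.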
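46 Altair's own software and services. Engineering simulation and optimization technology. Concept design and development technology. Altair has a solid corporate structure, with product lines under unified brands. HPC and on-demand computing technology. Product engineering and development consulting. Human-centered industrial design and product strategy consulting. LED products that replace fluorescent lamps.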
47 Altair's customers: automotive, aerospace, heavy equipment, government, life/earth sciences, electronics/consumer goods, energy, architecture. About 5,000 customers worldwide.
48 Software
49 PBS Works Suite. Compute Manager: run, monitor, manage. Display Manager: visualize, access, collaborate. PBS Professional: schedule, prioritize, scale. PBS Analytics: view, optimize, forecast. Copyright 2015 Altair Engineering, Inc. Proprietary and Confidential. All rights reserved.
50 PBS Professional is: a powerful workload management tool, a job scheduler developed to support HPC environments and to manage HPC clusters, clouds, and supercomputers; a trusted solution proven on everything from small clusters to large, complex systems. Scalability and reliability spanning desktop to cloud. Top 3 job scheduling tool (per IDC). Policy-based scheduling. Accelerator scheduling (e.g. MIC, GPGPU, FPGA). Green Provisioning for power management. Topology-aware scheduling. EAL3+ security certification. Extensible framework. Mature, reliable technology. NASA's Workload Manager of Choice for All NAS HPC Resources. ~200k cores scheduled by PBS Professional. Schedule, Prioritize, Scale.
51 Scheduler: PBS Professional. Schedules users' analysis jobs to maximize the job efficiency of the HPC cluster. [Diagram: Jobs 1-6 placed on HPC compute resources over time, before adopting PBS Professional and after adoption (improved job efficiency).] Copyright 2012 Altair Engineering, Inc. Proprietary and Confidential. All rights reserved.
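The before/after picture can be illustrated with a toy model. This sketch is not PBS Professional's algorithm; it is a simple first-come-first-served packer with hypothetical job sizes, comparing a cluster that runs one job at a time with one that starts each job as soon as enough cores are free.

```python
import heapq


def makespan_one_at_a_time(jobs):
    """No scheduler: each job has the whole cluster to itself, one after another."""
    return sum(duration for _, duration in jobs)


def makespan_packed(jobs, total_cores):
    """FIFO packing: start each job, in order, once enough cores are free."""
    time = 0.0
    free = total_cores
    running = []  # min-heap of (finish_time, cores_held)
    for cores, duration in jobs:
        if cores > total_cores:
            raise ValueError("job requests more cores than the cluster has")
        while free < cores:
            # Wait for the earliest-finishing job to release its cores.
            finish, released = heapq.heappop(running)
            time = max(time, finish)
            free += released
        heapq.heappush(running, (time + duration, cores))
        free -= cores
    return max([time] + [finish for finish, _ in running])


# Hypothetical workload: (cores requested, run time in hours) on a 16-core cluster.
JOBS = [(8, 4), (8, 4), (4, 2), (4, 2), (16, 1), (8, 3)]
print(makespan_one_at_a_time(JOBS), makespan_packed(JOBS, total_cores=16))
```

For this workload the one-at-a-time total is 16 hours and the packed schedule finishes in 10 hours. Production schedulers add priorities, policies, and backfilling on top of this basic idea.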
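52 Compute Manager is a web portal for job submission, management, and monitoring that lets end users focus on their data and applications. Single pane for HPC jobs (zero client: browser only). Drag-and-drop input decks for HPC job submission (automatic option filling via custom features). View progress and check results immediately while the job runs. Check results through post-processing and remote visualization without copying result files to the client (e.g., graph energies). Edit and rerun directly within the input interface (no need to re-upload large input files). Secure, unified access, anytime, anywhere. Run, Monitor, Manage.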
53 Web portal solution: Compute Manager. An HPC web portal solution that gives users an easy, convenient environment. [Diagram: web portal for analysis (Compute Manager) connecting StarCCM+, Abaqus, Optistruct, LS-DYNA, Fluent, Nastran, and Radioss to the HPC cluster.]
54 Web portal solution: Compute Manager. Web-based job execution: users log in to Compute Manager from a web browser with an ID and password, then click the desired application to run a solving job.
55 Web portal solution: Compute Manager. Check analysis results in real time (animation generation).
56 Web portal solution: Compute Manager. Check analysis results in real time (Plot TOC generation).
57 Thank you.
