Advanced Storage Networking Support for Low Delay Data Relocation
Jae-Yong Koh (고재용), Ph.D., CEO, DataChorus Co., Ltd.
Jykoh@datachorus.com
Contents
  1. Network storage requirements, especially for data-intensive online services
  2. Big Pipe: file sharing and parallel storage
  3. Fast Fluid: high-speed transfer HBAs and switches
  4. Local Tank: CDN & P2P caching
  5. Conclusion
1. Network storage requirements, especially for data-intensive online services
  - Overall storage requirements
  - New online services
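Storage issues and responses
  Challenge: more than 50% of the IT budget is spent on storage
  Response: Storage Networking + Business Continuity
  Storage is becoming an independent infrastructure:
    - Exponential growth of digital data
    - Growing data management complexity
    - Growing importance of data: no loss tolerated, 24 x 7 non-stop service essential
  Storage Networking (SAN, NAS, iSCSI)
    - Management simplification techniques: consolidation, virtualization, capacity-on-demand, etc.
  Data and service protection techniques = Business Continuity (BC)
    - Backup, replication, mirroring, security, high availability clustering
    - Building a DR center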
Direction of storage evolution
  [Figure: timeline from 1999 to 2005 and beyond; stages shown: DAS, SAN and NAS, intelligent storage applications]
Storage networking: DAS vs. storage networking
  DAS (Direct Attached Storage): storage as a peripheral device attached to a server
    - Complex management: many management points
    - Vendor-dependent, leading to inefficient investment
    - Duplicated data wastes storage
    - Data consistency is hard to maintain
    - Limited capacity expansion, degraded service quality
    [Figure: clients A1 to An reach SUN, Windows NT, and HP servers, each with its own DAS]
  Storage Networking: bringing the network to storage
    - Scalability, reliability, and manageability through consolidation
    [Figure: clients A1 to An reach SUN, Windows NT, and HP servers, which share storage over a storage network]
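Storage networking: detailed comparison

  Element                      | SAN                                                    | NAS
  -----------------------------+--------------------------------------------------------+------------------------------------------------------------
  Hardware                     | Dedicated FC network                                   | Standard Ethernet
  Management simplicity (cost) | Complex to manage; FC specialists required; expensive  | Easy to manage; low cost
  Acquisition cost             | High; vendor lock-in concerns                          | Low; 100% interoperability between devices
  Online scalability           | No online capacity expansion                           | Online capacity expansion
  Intelligence                 | Resides in each server's software                      | Built into the NAS head, which provides consolidated management
  Protocol                     | Block level (SCSI)                                     | File level (NFS, CIFS, HTTP, FTP)
  Data sharing                 | Not possible                                           | Possible (through the NAS head's common FS)
  Performance                  | High speed                                             | Lower than SAN
  Geographic constraints       | Hard to distribute over a wide area                    | Not geographically constrained
  Application areas            | High-speed DB                                          | Files, multimedia, and other content (also applicable to DB)

  [Figure: SAN: Server A (FS A) and Server B (FS B) attach to dumb storage with no intelligence. NAS: Server A and Server B reach, over Ethernet, a NAS head with a common FS, i.e. intelligent storage with its own FS.]
  * FS = File System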
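Intelligent storage application: HA & DR = BC (Business Continuity)
  Accidents always happen, and maintenance requires downtime -> service continuity and data protection are needed
  HA (High Availability): guarantees non-stop service
    - Service must resume within seconds of a S/W or H/W failure
    - Implemented with clustering failover technology
  DR (Disaster Recovery): recovery from failures
    - On a failure or disaster, choose a suitable method by weighing these parameters:
        - How quickly recovery must complete
        - Cost: equipment and solution acquisition, network usage fees, etc.
    - Real-time data duplication (mirroring) is essential
        - Sync/async mirroring
        - Local or remote mirroring
  HA: clustering duplicates the service
  DR: mirroring duplicates the data as well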
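The difference between the sync and async mirroring options named on this slide can be sketched in a few lines of Python. This is a toy model, not any product's implementation: the class `MirroredVolume`, its `flush` method, and the `blocks_at_risk` metric are hypothetical names introduced only for illustration.

```python
from collections import deque


class MirroredVolume:
    """Toy block volume with a primary copy and one mirror copy (hypothetical)."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.primary = {}
        self.mirror = {}
        self.pending = deque()  # writes not yet applied to the mirror

    def write(self, block, data):
        self.primary[block] = data
        if self.synchronous:
            # Sync: the write completes only once the mirror also holds it.
            self.mirror[block] = data
        else:
            # Async: complete immediately, ship the write to the mirror later.
            self.pending.append((block, data))

    def flush(self):
        """Apply queued writes to the mirror, in order (background replication)."""
        while self.pending:
            block, data = self.pending.popleft()
            self.mirror[block] = data

    def blocks_at_risk(self):
        """Number of distinct blocks whose latest write has not reached the mirror."""
        return len({block for block, _ in self.pending})


sync_vol = MirroredVolume(synchronous=True)
async_vol = MirroredVolume(synchronous=False)
for vol in (sync_vol, async_vol):
    vol.write(0, b"a")
    vol.write(1, b"b")
```

In this sketch the sync volume never has blocks at risk but pays the mirror round trip on every write, while the async volume acknowledges at once and exposes the queued writes to loss until `flush` runs. That is the recovery-speed versus cost trade-off the slide asks the reader to weigh.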
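Intelligent storage application: Storage Networking + BC
  DAS/SAN-based BC (BC packages for servers, etc.)
    - High acquisition cost
        - Requires either an expensive storage-based product or a complex combination of server-based products
        - Per-seat license cost grows with every added server
        - Requires tuning to the OS, drivers, and add-on cards in use
    - Management cost
        - FC/Unix specialists required
        - On failure, disputes over responsibility arise among multiple vendors
  NAS-based BC (ClusStor)
    - The NAS head performs BC tasks alongside file serving
    - A single system sharply reduces acquisition cost and management complexity
    - Meets the requirements of storage management: data consolidation + data protection
    - Fast, reliable technical support from a single vendor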
The need for large-scale data transfer
  Ubiquitous Internet broadcast
    - Anyone can broadcast
    - Anyone can tune in
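New applications that need massive data
  Online services for personal content
    - Upload/download from an unspecified, large number of users
    - Mini-homepage, blog, DiCa (digital camera), web hard drive
  Massive data transfer
    - Video streaming
    - Ex) capturing and sharing video data on mobile phones
    - Traffic and storage requirements differ sharply from conventional broadcast-style web pages: 10x ~ 100x
  Others
    - Supercomputing
    - Bioinformatics, oil exploration, etc.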
Storage techniques for large-scale services
  Techniques that make large-scale transfer possible
    - Big Pipe: data sharing; NAS and other shared file systems
    - Fast Fluid: TOE, direct path, storage switch
    - Local Water Tank: CDN, caching
2. Big Pipe: file sharing and parallel file systems
  Systems for large-scale data transfer
  Data sharing approaches
    - NAS
    - Clustered FS
    - Distributed FS
    - Object Storage
Data Sharing
  What is data sharing?
    - Shared access to the same data from multiple servers
    - Changes to data become visible to all servers
  Why share data?
    - Better performance and scalability
        - A larger server can be very expensive
    - Concurrency
        - Use the same data for more than one application
        - Avoid replication or cloning
    - Administration
        - A consolidated shared resource has lower TCO
        - Data sharing increases the benefits of storage consolidation
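The SNIA shared storage model
  [Figure: the application sits on top of the storage domain. The storage domain consists of the file/record layer (database (DBMS), file system (FS)) and the block layer (block aggregation at host, network, and device level, above storage devices such as disks). Services run alongside the layers.]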
How to apply Data Sharing
  - High availability clusters (local & geographic)
  - Scaling applications
      - Web servers: read mostly, load balanced
      - Databases: mostly use direct I/O
  - Parallel applications and fast failover
  - Systems and applications consolidation/migration
  - Off-host processing, e.g., serverless backup, mirroring
  - Based on a shared file system
Data Sharing: Scale up by Scale out
  [Figure: server nodes running cluster software and shared storage software, connected through a storage network to shared disks]
Data Sharing: Resiliency
  [Figure: nodes running a parallel DB engine, cluster software, and shared storage software, connected through a storage network to shared disks; one node is marked "Panic"]
[Figure: landscape of data sharing products and technologies, grouped as "Parallel & Partitioned Applications", "Distributed, Cluster or SAN File System", and "Volume". Labels shown include IBM AFS, Actona ActaStor, DB2, CIFS, PPFS, ISO9660, PolyServe Matrix Server, ONStor, ST, SMB, EMC, Web FS, Coda, VERITAS CFS, MPP, Melio FS, Isilon IQ OneFS, RFS, OpenAFS, and Red Hat GFS.]
How is data shared?
  Sharing levels
    - Share at the volume level: volume manager, volume built by RAID controller
    - Share at file or file system (FS) level: Object Storage, EMC Highroad, SGI CXFS, OnStor, GoogleFS
    - Share at database or application level: IBM DB2, Sybase, Informix, Ingres, Oracle 9i RAC
  Issues
    - Sharing between homogeneous or heterogeneous OSes
    - Concurrently or serially
    - Storage or network
Data sharing at Volume level
  - The volume manager enables data sharing at a low level
  - Physical storage is merged or split -> logical volume
  - Logical volumes are made shareable
  - Concurrency control must, in most cases, be handled at a higher level
  Product examples
    - Veritas Volume Manager
    - IBM LVM
    - HP Shared LVM, Red Hat LVM
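The "merge physical storage into a logical volume" step can be illustrated with a small address-mapping sketch. It assumes simple concatenation of extents; real volume managers also support striping and mirroring, and the class `LogicalVolume` and the device names `diskA`/`diskB` are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


class LogicalVolume:
    """Concatenates physical extents into one logical block address space."""

    def __init__(self, extents):
        # extents: list of (device_name, size_in_blocks); names are illustrative
        self.extents = extents
        # Exclusive end address of each extent in the logical address space.
        self.ends = list(accumulate(size for _, size in extents))

    @property
    def size(self):
        return self.ends[-1] if self.ends else 0

    def resolve(self, logical_block):
        """Map a logical block number to (device_name, physical_block)."""
        if not 0 <= logical_block < self.size:
            raise IndexError("logical block out of range")
        i = bisect_right(self.ends, logical_block)
        start = self.ends[i - 1] if i else 0
        return self.extents[i][0], logical_block - start


lv = LogicalVolume([("diskA", 100), ("diskB", 50)])
```

Note that nothing in this mapping arbitrates between two servers writing the same logical block, which is why the slide places concurrency control at a higher level.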
Data sharing at File/FS level
  NAS
    - Dedicated file server
  SAN FS (NAS-SAN integration)
    - Metadata server: manages and uses metadata on disk
    - Other servers (clients) are connected to the SAN storage
    - Metadata is exchanged through the network; file blocks are transferred through the SAN
    - Good for video/media apps
  Cluster FS
    - All nodes understand the on-disk FS structure (symmetric)
    - Single FS image: same data view from all nodes
    - Good for web server farms
  Object Storage
    - Central metadata server
    - Data striped over Object Storage Devices (OSD)
NAS
  - NFS & CIFS
  - Asymmetric architecture
  - TCP/IP based
  [Figure: SNIA shared storage model showing four attachment options: 1. direct-attach (host with LVM and software RAID), 2. SN-attach (host with LVM over a storage network), 3. NAS head (hosts on a LAN reach a NAS head in front of the storage network), 4. NAS server (hosts on a LAN reach a self-contained NAS server). Block aggregation can take place at host, network, or device (disk array) level.]
Parallel sharing systems other than NAS
  [Figure: SNIA shared storage model locating shared LVM, SAN FS, cluster FS, and object storage relative to the file/record layer and the block layer, next to the NAS head and NAS server configurations]
  Services
    - Discovery, monitoring
    - Resource management, configuration
    - Security, billing
    - Redundancy management (backup, ...)
    - High availability (fail-over, ...)
    - Capacity planning
SAN FS
  - Hosts get file metadata from the FS/NAS controller, then access the data directly through the SAN (or IP-SAN)
  - The FS controller can also be a NAS server
  [Figure: file system metadata travels between hosts and the controller over the LAN; block accesses go from hosts (with LVM) through the storage network to the disk array]
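The split between the metadata path and the data path can be sketched as follows. This is an in-memory toy under stated assumptions: `SharedDisk`, `MetadataServer`, and `client_read` are hypothetical names, the 4-byte block size is chosen only to keep the example small, and file creation goes through the server purely for brevity.

```python
class SharedDisk:
    """Stands in for SAN-attached storage that every host can reach."""

    def __init__(self):
        self.blocks = {}  # block number -> bytes


class MetadataServer:
    """Holds file-to-block maps; it is not on the read data path."""

    def __init__(self, disk, block_size=4):
        self.disk = disk
        self.block_size = block_size
        self.layout = {}      # file name -> (list of block numbers, length)
        self.next_block = 0
        self.lookups = 0      # counts metadata requests from clients

    def create(self, name, data):
        blocks = []
        for off in range(0, len(data), self.block_size):
            self.disk.blocks[self.next_block] = data[off:off + self.block_size]
            blocks.append(self.next_block)
            self.next_block += 1
        self.layout[name] = (blocks, len(data))

    def lookup(self, name):
        self.lookups += 1
        return self.layout[name]


def client_read(mds, disk, name):
    """One metadata round trip, then direct block reads from shared storage."""
    blocks, length = mds.lookup(name)                  # control path (LAN)
    data = b"".join(disk.blocks[b] for b in blocks)    # data path (SAN)
    return data[:length]
```

A read of a large file costs the metadata server a single lookup regardless of file size, which is consistent with the earlier remark that this design suits video and media workloads.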
CFS (Cluster FS)
  How?
    - Shared VM/FS
    - Load balancing in front
  Advantages
    - Increased throughput
    - More efficient use of servers
    - Failure is transparent
    - High SLA
  [Figure: hosts with LVM (optionally with software RAID) and NAS heads run an optional cluster FS on top of a shared LVM, connected through the storage network to the disk array]
CFS (Cluster FS)
  Asymmetric implementation
    - The master node mounts the FS and manages logging & locking
    - Any node can become the master
    - The other nodes are clients for logging & locking
    - Node failover
        - Master node fails: both the system log and the locks must be restored
        - Other node fails: the failed node's locks must be released/recovered
  Symmetric implementation
    - All nodes mount the FS and access files
    - Symmetric lock management
    - Per-node logging
    - Node failover: recover the log & release/recover locks
CFS (Cluster FS)
  Lock management
    - DLM or GLM (Distributed/Global Lock Management)
    - Asymmetric (master node) or symmetric (all nodes)
    - Lock state propagation: to all nodes vs. local lock repository
    - Granularity: file, record, byte
    - Concurrent vs. serial
  Cache coherency
    - Modifications are visible to all nodes
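A minimal sketch of a cluster lock table, including the failover step of releasing a failed node's locks, might look like this. It assumes a single in-process table with file-level granularity and non-reentrant locks; `LockManager` and its method names are hypothetical, and a real DLM/GLM would distribute this state and handle queuing and recovery.

```python
class LockManager:
    """Toy cluster-wide lock table: shared (read) and exclusive (write) locks."""

    def __init__(self):
        self.holders = {}  # resource -> (mode, set of node names)

    def acquire(self, node, resource, mode):
        """Try to take a lock; return True on success, False on conflict."""
        if mode not in ("shared", "exclusive"):
            raise ValueError("mode must be 'shared' or 'exclusive'")
        held = self.holders.get(resource)
        if held is None:
            self.holders[resource] = (mode, {node})
            return True
        held_mode, nodes = held
        if mode == "shared" and held_mode == "shared":
            nodes.add(node)  # any number of readers may share the lock
            return True
        return False  # an exclusive lock conflicts with every other holder

    def release(self, node, resource):
        held = self.holders.get(resource)
        if held and node in held[1]:
            held[1].discard(node)
            if not held[1]:
                del self.holders[resource]

    def node_failed(self, node):
        """Failover step: drop every lock the failed node was holding."""
        for resource in list(self.holders):
            self.release(node, resource)
```

With this table, two nodes can read the same file concurrently while a writer is refused until both release, and calling `node_failed` frees a crashed node's locks so the survivors can proceed.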
CFS (Cluster FS)
  Pros
    - Nodes in the cluster access the same data
    - Good application portability (standard POSIX API)
    - Good for read-intensive apps
        - Better cache effect
  Cons
    - Scales to tens of nodes (NOT hundreds or thousands)
    - Based on expensive SAN technology
    - The more disks, the better the performance (depends on I/O parallelism)
    - A lock manager must be maintained (global vs. distributed)
    - Limited performance scalability
        - Concurrent writes to the same file
        - Write-intensive apps
Object Storage (Google FS or GFS)
  - A single master handles metadata
  - Many chunkservers hold the data
  - Files are broken into chunks
  - Many clients need access to possibly terabytes of data
Object Storage Architecture
  [Figure: the application calls a GFS client, which obtains chunk locations from the GFS master and exchanges chunk data directly with GFS chunkservers; each chunkserver stores chunks on a Linux file system]
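The master/chunkserver split described on the last two slides can be sketched as a toy in-memory model. This is not Google's implementation: the 4-byte `CHUNK_SIZE`, the two-replica round-robin placement, and the names `Master`, `ChunkServer`, `write_file`, and `read_file` are all assumptions made for illustration.

```python
CHUNK_SIZE = 4  # bytes; tiny on purpose, for illustration only


class ChunkServer:
    def __init__(self):
        self.chunks = {}  # chunk handle -> bytes


class Master:
    """Holds metadata only: which chunks form a file and where they live."""

    def __init__(self, servers, replicas=2):
        self.servers = servers
        self.replicas = replicas
        self.files = {}      # file name -> ordered list of chunk handles
        self.locations = {}  # chunk handle -> list of server indexes

    def allocate(self, name, index):
        handle = f"{name}#{index}"
        start = index % len(self.servers)
        # Assumed placement policy: consecutive servers, round robin.
        self.locations[handle] = [
            (start + r) % len(self.servers) for r in range(self.replicas)
        ]
        self.files.setdefault(name, []).append(handle)
        return handle


def write_file(master, name, data):
    for off in range(0, len(data), CHUNK_SIZE):
        handle = master.allocate(name, off // CHUNK_SIZE)
        for s in master.locations[handle]:
            master.servers[s].chunks[handle] = data[off:off + CHUNK_SIZE]


def read_file(master, name, failed=()):
    """Ask the master for locations, then fetch each chunk from a live replica."""
    out = []
    for handle in master.files[name]:
        live = [s for s in master.locations[handle] if s not in failed]
        if not live:
            raise IOError(f"no live replica for {handle}")
        out.append(master.servers[live[0]].chunks[handle])
    return b"".join(out)
```

Because chunk data never passes through the master, adding chunkservers adds both capacity and bandwidth, and a read still succeeds as long as one replica of every chunk survives.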
GFS: pros and cons
  Pros
    - Scalability: up to hundreds or thousands of nodes
    - Linear performance scale-up
    - Availability: failure protection
    - Performance: can be boosted by collaborative caching
    - Very cost-effective parallel architecture without a SAN
  Cons
    - Slow individual response time
    - Not suitable for simple, small storage applications
Parallel Database
  [Figure: two architectures side by side, each with nodes running a parallel DB engine, cluster software, and shared storage software over a storage network]
  - Parallel: shared disk
  - Partitioned: shared nothing
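The contrast between the two architectures can be sketched in a few lines. This is a deliberately simplified key-value model, not a database engine: `SharedNothingDB`, `SharedDiskDB`, `partition_for`, and the CRC32-based hash partitioning are assumptions chosen for illustration.

```python
import zlib


def partition_for(key, n_nodes):
    """Pick the node that owns a key; CRC32 is stable across runs."""
    return zlib.crc32(key.encode("utf-8")) % n_nodes


class SharedNothingDB:
    """Partitioned: each node owns a private slice of the data."""

    def __init__(self, n_nodes):
        self.nodes = [dict() for _ in range(n_nodes)]

    def put(self, key, value):
        self.nodes[partition_for(key, len(self.nodes))][key] = value

    def get(self, key):
        return self.nodes[partition_for(key, len(self.nodes))].get(key)


class SharedDiskDB:
    """Parallel: every engine reads and writes the same shared store."""

    def __init__(self, n_nodes):
        self.n_nodes = n_nodes
        self.disk = {}

    def put(self, key, value, via_node=0):
        # via_node only records that any engine may serve the request.
        self.disk[key] = value

    def get(self, key, via_node=0):
        return self.disk.get(key)
```

In the shared-nothing case a request must be routed to the owning node and a node failure makes its partition unavailable; in the shared-disk case any surviving node can serve any key, at the price of coordinating access to the shared storage.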
3. Fast Fluid: high-speed transfer HBAs and switches
  High-speed transfer approaches
    - TOE
    - Direct path
    - Storage switch
High-speed transfer systems
  TOE (TCP/IP Offload Engine)
    - Runs TCP/IP in hardware, offloading the work from the CPU
  Direct path
    - Direct path from Fibre Channel to Ethernet
    - Separates the data path from the control path
    - Collapses into a single step the complex path a data packet otherwise takes through the network HBA, network device driver, OS kernel, storage device driver, and storage HBA
  Storage switch
    - Runs storage applications (virtualization, snapshot, mirroring, etc.) on the network switch
4. Local Tank: CDN & P2P
Overlay multicast architectures
  [Figure: overlay multicast topology; legend: router, source, application end-point]
Infrastructure-based CDN [Akamai]
  + Well-provisioned
  [Figure: overlay topology; legend: router, source, application end-point, infrastructure server]
Application end-point CDN (P2P)
  + Instantly deployable
  + Enables ubiquitous broadcast
  [Figure: overlay topology; legend: router, source, application end-point]
Waypoint architecture [ESM]
  + Waypoints as insurance
  [Figure: overlay topology; legend: router, source, application end-point, waypoint (W)]
5. Conclusion
  Methods for fast data relocation
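Fast Data Relocation
  Big Pipe
    - Raise the parallelism of data transfer to improve throughput
    - Use shared file systems or parallel DB systems
  Fast Fluid
    - Improve the transfer protocol
    - Run TCP/IP or switch logic in hardware
  Local Tank
    - Store the data that is needed at the edge, reducing network traffic itself
    - CDN, P2P caching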
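The Local Tank idea, keeping needed data at the edge so that fewer requests travel back to the origin, can be sketched with a small cache. The LRU eviction policy, the class `EdgeCache`, and the `origin_fetch` callback are assumptions for illustration; the slides do not prescribe a particular caching policy.

```python
from collections import OrderedDict


class EdgeCache:
    """LRU cache placed near clients; misses go back to the origin."""

    def __init__(self, origin_fetch, capacity):
        self.origin_fetch = origin_fetch  # callable: key -> content
        self.capacity = capacity
        self.store = OrderedDict()
        self.origin_requests = 0          # traffic that left the edge

    def get(self, key):
        if key in self.store:
            self.store.move_to_end(key)   # mark as most recently used
            return self.store[key]
        self.origin_requests += 1
        value = self.origin_fetch(key)
        self.store[key] = value
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the least recently used
        return value
```

Comparing `origin_requests` with the total number of `get` calls gives the fraction of traffic that still crosses the network; the more popular content fits in the cache, the smaller that fraction becomes.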