Project 1: ARM CPU Architecture. Dankook University, Department of Computer Science, 2009. 백승재 (Seungjae Baek), ibanez1383@dankook.ac.kr, http://embedded.dankook.ac.kr/~ibanez1383
Lecture objectives
- Survey the ARM variants, their features, and recent trends
- Understand the ARM CPU architecture
- Become familiar with ARM assembly
ARM
- ARM? Advanced RISC Machines!
- Started in 1983 at Acorn Computers Ltd.
- By Hermann Hauser, Steve Furber, Sophie Wilson, Robert Heaton, Jamie Urquhart
- Designed for low power, low cost, simplicity, and small size
- Today, the ARM family accounts for approximately 75% of all embedded 32-bit RISC CPUs.
ARM
- ARM does not sell ARM processors; it sells only IP.
- What is the IP? A hard macrocell or a synthesizable core, ...
- The IP is sold to processor manufacturers.
- ARM also sells various development environments.
Supplementary textbooks
- Steve Furber, Addison-Wesley
- Andrew Sloss, Morgan Kaufmann
ARM Core & Family & Processor
Core | Family | Processors | Notes
ARMv7-A | Cortex | Cortex-A8, Cortex-A9, Cortex-A9 MPCore | 13 stage pipeline, superscalar, application profile, NEON, ...
ARMv7-R | Cortex | Cortex-R4(F) |
ARMv7-M | Cortex | Cortex-M3 |
ARMv6M | Cortex | Cortex-M1 |
ARMv6 | ARM11 | ARM1136J(F)-S | 8 stage pipeline, SIMD
ARMv6T2 | ARM11 | ARM1156T2(F)-S | 9 stage pipeline, Thumb-2
ARMv6KZ | ARM11 | ARM1176JZ(F)-S |
ARMv6K | ARM11 | ARM11 MPCore | 1~4 core SMP
ARMv5TEJ | ARM7TDMI | ARM7EJ-S | 5 stage pipeline, Jazelle, Enhanced DSP instructions
ARMv5TEJ | ARM9E | ARM926EJ-S |
ARMv5TEJ | ARM10E | ARM1026EJ-S |
ARMv5TE | ARM9E | ARM946E-S, ARM966E-S, ARM968E-S, ARM966HS |
ARMv5TE | ARM10E | ARM1020E, ARM1022E | 6 stage pipeline
ARMv5TE | XScale | 80200, IOP321, PXA210, PXA250, PXA255, PXA26x, PXA27x, Monahans, PXA900, IXP42x | 7 stage pipeline
ARMv4T | ARM7TDMI | ARM7TDMI, ARM719T, ARM720T, ARM740T | 3 stage pipeline, Thumb
ARMv4T | ARM9TDMI | ARM9TDMI, ARM920T, ARM922T, ARM940T |
ARMv4 | StrongARM | SA-110, SA-1110 |
ARMv4 | ARM8 | ARM810 | 5 stage pipeline, static branch prediction
ARMv3 | ARM6 | ARM60, ARM600, ARM610 | 32bit addr
ARMv3 | ARM7 | ARM700, ARM710, ARM7100, ARM7500, ARM7500FE |
ARMv2 | ARM2/3 | ARM1, ARM2, ARM2a | MUL instruction, MMU, CPU cache
ARMv1 | ARM1 | ARM1 |
Case Study: Cortex-A9 MPCore
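Case Study: OMAP 4
- Full HD 1080p video recording and playback
- DSLR-class 20-megapixel photography
- About one week of audio playback
- Web page loading more than 10 times faster than existing smartphones
- More than 7 times the computing performance, ...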
ARM Architecture Ver
ARM{x}{y}{z}{T}{D}{M}{I}{E}{J}{F}{S}
- x : product family
- y : MMU/MPU
- z : cache
- T : Thumb 16-bit decoder
- D : JTAG debug
- M : fast multiplier
- I : EmbeddedICE macrocell
- E : DSP extension instructions
- J : Jazelle
- F : VFP device
- S : synthesizable version
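
The naming scheme above can be read mechanically. The following is a minimal Python sketch, not an official ARM tool: the function name `decode_arm_name` and the rule "if three or more digits are present, the last two are y and z" are assumptions made for illustration, and older two-digit names such as ARM60 do not fit that rule.

```python
import re

# Meaning of each suffix letter, taken from the naming scheme on the slide.
SUFFIX_MEANING = {
    "T": "Thumb 16-bit decoder",
    "D": "JTAG debug",
    "M": "fast multiplier",
    "I": "EmbeddedICE macrocell",
    "E": "DSP extension instructions",
    "J": "Jazelle",
    "F": "VFP device",
    "S": "synthesizable version",
}


def decode_arm_name(name):
    """Split a name such as 'ARM926EJ-S' into x/y/z digits and feature letters."""
    match = re.fullmatch(r"ARM(\d+)([A-Z()\-]*)", name)
    if match is None:
        raise ValueError(f"not an ARM-style processor name: {name!r}")
    digits, letters = match.groups()
    # Assumption: with three or more digits, the last two are y (MMU/MPU) and z (cache).
    if len(digits) >= 3:
        x, y, z = digits[:-2], digits[-2], digits[-1]
    else:
        x, y, z = digits, None, None
    # Parentheses and hyphens are skipped; only known suffix letters are reported.
    features = [SUFFIX_MEANING[ch] for ch in letters if ch in SUFFIX_MEANING]
    return {"family": x, "mmu_mpu": y, "cache": z, "features": features}


print(decode_arm_name("ARM7TDMI"))
print(decode_arm_name("ARM926EJ-S"))
```

For ARM7TDMI this reports family 7 with the T, D, M, and I features; for ARM926EJ-S it reports family 9, y = 2, z = 6 with the E, J, and S features.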
ARM pipeline
3-stage
- Fetch : instruction fetch
- Decode : instruction decoding, operand read
- Execute : produce the ALU result and write it to the register
5-stage
- Fetch : instruction fetch
- Decode : instruction decoding, operand read
- Execute : produce the ALU result; for load/store instructions, compute the memory address
- Buffer/data : access data memory if needed; otherwise the result is buffered for one clock so that every instruction follows the same pipeline flow
- Write-back : store the result in the register file
6-stage
- Fetch, Issue, Decode, Execute, Memory, Write
7 or 8-stage
- Why?
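
To make the stage overlap concrete, here is a small Python sketch of an ideal pipeline with no stalls or hazards. The helper names `pipeline_schedule` and `total_cycles` are hypothetical, and real ARM cores deviate from this model whenever an instruction takes more than one cycle in a stage.

```python
def pipeline_schedule(instructions, stages):
    """Return {instruction: {stage: cycle}} for an ideal pipeline with no stalls."""
    schedule = {}
    for i, instr in enumerate(instructions):
        # Instruction i enters the first stage in cycle i + 1
        # and then advances exactly one stage per cycle.
        schedule[instr] = {stage: i + 1 + s for s, stage in enumerate(stages)}
    return schedule


def total_cycles(n_instructions, n_stages):
    """Cycles needed to complete n instructions: pipeline fill time plus one per extra instruction."""
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


THREE_STAGE = ["Fetch", "Decode", "Execute"]
FIVE_STAGE = ["Fetch", "Decode", "Execute", "Buffer/data", "Write-back"]

for instr, timing in pipeline_schedule(["i0", "i1", "i2"], THREE_STAGE).items():
    print(instr, timing)
print(total_cycles(3, len(THREE_STAGE)))
print(total_cycles(3, len(FIVE_STAGE)))
```

In the 3-stage schedule, `i2` is being fetched in the same cycle in which `i0` executes, which is one way to see why an executing ARM instruction observes a PC that is two instructions (8 bytes) ahead of its own address.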
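Some features of the ARM architecture
- Pipelined architecture
  - Certain instructions (load/store-multiple instructions) are allowed to execute over several cycles
  - Two source registers (Rn, Rm) and one result register (Rd)
- Inline barrel shifter
  - Operands can be preprocessed by the barrel shifter
- ARM 32-bit instruction set and Thumb 16-bit instruction set
  - Using the Thumb instruction set reduces code size by about 30%
- Conditional execution
  - Reduces the number of branch instructions, improving code size and performance
- Data forwarding
  - Operands can be forwarded between pipeline stages
- PC value ambiguity
  - The incremented PC value is stored in a separate register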
ARM Register
[Figure: register banks for the User, FIQ, IRQ, Supervisor, Undefined, and Abort modes, with unbanked and banked registers marked]
- Every mode sees r0 to r12, r13 (SP), r14 (LR), r15 (PC), and the CPSR
- FIQ, IRQ, Supervisor, Undefined, and Abort each have their own SPSR
- FIQ mode has banked copies of r8 to r14; IRQ, Supervisor, Undefined, and Abort have banked copies of r13 (SP) and r14 (LR)
- User mode and System mode share the same registers
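Processor Mode
- Privileged modes: six modes that can fully read and write the cpsr
  - abort mode (entered when a memory access fails)
  - FIQ and IRQ modes
  - supervisor mode (entered on reset; the OS kernel runs here)
  - system mode (a special case of user mode that can fully read and write the cpsr)
  - undefined mode (entered on an undefined instruction and similar cases)
- Non-privileged mode: user mode (for ordinary applications)
  - The control field of the cpsr is read-only; the condition flags can be read and written
- How to enter a specific mode
  - Set the mode bits of the CPSR from a privileged mode
  - Occurrence of an exception
Special registers
Purpose of each register
- R13 : SP (Stack Pointer)
  - Can be used as a general-purpose register when no stack is used
- R14 : LR (Link Register)
  - Holds the return address on a branch with link
  - Can be used as a general-purpose register when not needed
  - Avoiding the stack for return addresses can improve performance
- R15 : PC (Program Counter)
  - Can be used as an operand
  - mov pc, lr?
  - Nested branches?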
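Current Program Status Register
CPSR bit layout
  bit 31: N | bit 30: Z | bit 29: C | bit 28: V | bit 27: Q | bits 26-8: unused | bit 7: I | bit 6: F | bit 5: T | bits 4-0: Mode
CPSR[4:0]
CPSR[4:0] | Mode | Meaning | Register
10000 | User | User mode | user
10001 | FIQ | Fast interrupt handling | _fiq
10010 | IRQ | Normal interrupt handling | _irq
10011 | SVC | Software interrupt (SWI) handling | _svc
10111 | Abort | Memory fault handling | _abt
11011 | Undef | Undefined instruction trap handling | _und
11111 | System | Privileged OS task execution | user
CPSR[31:27]
Changed by comparison instructions and by instructions with the S suffix
- N: negative. The result of the last ALU operation was negative (the MSB of the 32-bit result is 1)
- Z: zero. The result of the last ALU operation was 0
- C: carry. The last ALU or shift operation produced a carry-out
- V: overflow. The last ALU operation overflowed into the sign bit
- Q: overflow or saturation occurred in an Enhanced DSP instruction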
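
The bit layout and mode table above translate directly into a decoder. This is a minimal Python sketch; the function name `decode_cpsr` and the sample value `0x600000D3` are chosen for illustration only.

```python
# Mode encodings from the CPSR[4:0] table on the slide.
MODE_NAMES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "SVC",
    0b10111: "Abort",
    0b11011: "Undef",
    0b11111: "System",
}


def decode_cpsr(cpsr):
    """Extract the condition flags, control bits, and mode name from a 32-bit CPSR value."""
    def bit(n):
        return (cpsr >> n) & 1

    return {
        "N": bit(31), "Z": bit(30), "C": bit(29), "V": bit(28), "Q": bit(27),
        "I": bit(7), "F": bit(6), "T": bit(5),
        "mode": MODE_NAMES.get(cpsr & 0x1F, "unknown"),
    }


# Illustrative value: Z and C set, IRQ and FIQ masked, ARM state, Supervisor mode.
print(decode_cpsr(0x600000D3))
```

Entering a mode from privileged code works in the other direction: the low five bits are replaced with the encoding of the target mode and the result is written back to the CPSR.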
ARM Exceptions
When an exception or interrupt occurs, the PC is set to a predetermined address.
Exception/INT | Abbreviation | Address
Reset | RESET | 0x0000 0000
Undefined Instruction | UNDEF | 0x0000 0004
Software Interrupt | SWI | 0x0000 0008
Prefetch Abort | PABT | 0x0000 000c
Data Abort | DABT | 0x0000 0010
Reserved | | 0x0000 0014
Interrupt Request | IRQ | 0x0000 0018
Fast Interrupt Request | FIQ | 0x0000 001c
So what is placed at address 0x00000000? A branch instruction to a label.
The boot process and interrupt handling are covered in detail later...
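
The vector addresses follow a simple pattern: eight consecutive 4-byte slots starting at address 0. The Python sketch below reproduces the table under that observation; `vector_address` and its `base` parameter are hypothetical names introduced here.

```python
# Vector order as listed in the exception table on the slide.
VECTORS = ["RESET", "UNDEF", "SWI", "PABT", "DABT", "RESERVED", "IRQ", "FIQ"]

VECTOR_SIZE = 4  # each slot holds exactly one 32-bit ARM instruction


def vector_address(name, base=0x00000000):
    """Return the address the PC is set to when the named exception occurs."""
    return base + VECTOR_SIZE * VECTORS.index(name)


for name in VECTORS:
    print(f"{name:<9}0x{vector_address(name):08x}")
```

Because each slot has room for only one instruction, a handler cannot live in the table itself, which is why the slot normally contains a branch to the real handler.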
Operand preprocessing: Barrel Shifter
[Figure: Rn feeds the ALU directly; Rm passes through the barrel shifter before reaching the ALU; the ALU result is written to Rd]
Operand form | Syntax
Immediate | #imm
Register | Rm
Logical shift left by immediate | Rm, LSL #imm
Logical shift left by register | Rm, LSL Rs
Logical shift right by immediate | Rm, LSR #imm
Logical shift right by register | Rm, LSR Rs
Arithmetic shift right by immediate | Rm, ASR #imm
Arithmetic shift right by register | Rm, ASR Rs
Rotate right by immediate | Rm, ROR #imm
Rotate right by register | Rm, ROR Rs
Rotate right with extend | Rm, RRX
Arithmetic Shift and Logical Shift
- A logical right shift fills the k vacated bits on the left with 0
- An arithmetic right shift fills the k vacated bits on the left with the original MSB (Most Significant Bit)
Example: X >> 8
  X                = 10000000001110000100000100000100
  Logical shift    = 00000000100000000011100001000001
  Arithmetic shift = 11111111100000000011100001000001
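
The shift types handled by the barrel shifter can be modelled on 32-bit values as follows. This is a Python sketch with hypothetical helper names (`lsl`, `lsr`, `asr`, `ror`, `rrx`); it assumes shift amounts between 0 and 32 and ignores the carry-out that the real shifter also produces, except for RRX.

```python
MASK32 = 0xFFFFFFFF


def lsl(x, k):
    """Logical shift left: vacated low bits are filled with 0."""
    return (x << k) & MASK32


def lsr(x, k):
    """Logical shift right: vacated high bits are filled with 0."""
    return (x & MASK32) >> k


def asr(x, k):
    """Arithmetic shift right: vacated high bits are filled with the original MSB."""
    x &= MASK32
    shifted = x >> k
    if x >> 31:
        shifted |= (MASK32 << (32 - k)) & MASK32
    return shifted


def ror(x, k):
    """Rotate right: bits leaving on the right re-enter on the left."""
    k %= 32
    x &= MASK32
    return ((x >> k) | (x << (32 - k))) & MASK32


def rrx(x, carry_in):
    """Rotate right with extend: 33-bit rotate through the carry flag. Returns (result, carry_out)."""
    x &= MASK32
    return ((carry_in & 1) << 31) | (x >> 1), x & 1


x = 0b10000000001110000100000100000100
print(format(lsr(x, 8), "032b"))
print(format(asr(x, 8), "032b"))
```

The two printed lines reproduce the logical and arithmetic results of X >> 8 from the slide. An operand such as `Rm, LSL #2` corresponds to `lsl(rm, 2)`, that is, multiplication by 4 at no extra instruction cost.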
Barrel shifter usage example
Worked example (1/4)
Worked example (2/4)
Worked example (3/4)
Worked example (4/4)
ARM Basic Instructions
binary | op | Description
0000 | AND | Rd = op1 AND op2
0001 | EOR | Rd = op1 XOR op2
0010 | SUB | Rd = op1 - op2
0011 | RSB | Rd = op2 - op1
0100 | ADD | Rd = op1 + op2
0101 | ADC | Rd = op1 + op2 + C
0110 | SBC | Rd = op1 - op2 + C - 1
0111 | RSC | Rd = op2 - op1 + C - 1
1000 | TST | op1 AND op2 -> CPSR
1001 | TEQ | op1 XOR op2 -> CPSR
1010 | CMP | op1 - op2 -> CPSR
1011 | CMN | op1 + op2 -> CPSR
1100 | ORR | Rd = op1 OR op2
1101 | MOV | Rd = op2
1110 | BIC | Rd = op1 AND (NOT op2)
1111 | MVN | Rd = NOT op2
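
The sixteen operations in the table can be modelled in a few lines. The Python sketch below is illustrative only: `data_processing` is a hypothetical name, subtraction is implemented as addition of the bitwise complement (so C = 1 means "no borrow"), and for the logical operations C and V are simply passed through, which ignores the carry-out that the barrel shifter can supply.

```python
MASK32 = 0xFFFFFFFF
NO_RESULT = ("TST", "TEQ", "CMP", "CMN")  # these only update the flags


def _add_with_carry(a, b, carry_in):
    """32-bit a + b + carry_in; returns (result, carry_out, signed_overflow)."""
    total = a + b + carry_in
    result = total & MASK32
    carry = total >> 32
    # Overflow: both inputs share a sign and the result has the other sign.
    overflow = ((~(a ^ b) & (a ^ result)) >> 31) & 1
    return result, carry, overflow


def data_processing(op, op1, op2, c_flag=0, v_flag=0):
    """Return (Rd or None, flags) for one data-processing operation."""
    op1 &= MASK32
    op2 &= MASK32
    not1, not2 = ~op1 & MASK32, ~op2 & MASK32
    carry, overflow = c_flag, v_flag
    if op in ("AND", "TST"):
        result = op1 & op2
    elif op in ("EOR", "TEQ"):
        result = op1 ^ op2
    elif op in ("SUB", "CMP"):
        result, carry, overflow = _add_with_carry(op1, not2, 1)
    elif op == "RSB":
        result, carry, overflow = _add_with_carry(op2, not1, 1)
    elif op in ("ADD", "CMN"):
        result, carry, overflow = _add_with_carry(op1, op2, 0)
    elif op == "ADC":
        result, carry, overflow = _add_with_carry(op1, op2, c_flag)
    elif op == "SBC":
        result, carry, overflow = _add_with_carry(op1, not2, c_flag)
    elif op == "RSC":
        result, carry, overflow = _add_with_carry(op2, not1, c_flag)
    elif op == "ORR":
        result = op1 | op2
    elif op == "MOV":
        result = op2
    elif op == "BIC":
        result = op1 & not2
    elif op == "MVN":
        result = not2
    else:
        raise ValueError(f"unknown operation: {op}")
    flags = {"N": result >> 31, "Z": int(result == 0), "C": carry, "V": overflow}
    return (None if op in NO_RESULT else result), flags


print(data_processing("ADD", 0x7FFFFFFF, 1))
print(data_processing("CMP", 1, 2))
```

The first call shows a signed overflow (N and V set); the second shows that CMP writes no destination register and leaves C clear because 1 - 2 borrows.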
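Worked example (1/4)
Worked example (2/4)
Worked example (3/4)
Worked example (4/4)
Worked example (1/7)
Worked example (2/7)
Worked example (3/7)
Worked example (4/7)
Worked example (5/7)
Worked example (6/7)
Worked example (7/7)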
PSR-related, MUL, and data transfer instructions
Instruction | Description
MRS{cond} Rd, <psr> | Transfer PSR contents to a register
MSR{cond} <psr>, Rm | Transfer register contents to the PSR
MUL{cond}{S} Rd, Rm, Rs | Rd = Rm * Rs
MLA{cond}{S} Rd, Rm, Rs, Rn | Rd = Rm * Rs + Rn
LDR{cond}{B} Rd, address{!} | Rd = contents of address
LDR{cond}{B} Rd, =expression | Rd = expression
STR{cond}{B} Rd, address{!} | contents of address = Rd
Multiple register data transfer
<LDM|STM>{cond}<mode> Rn{!}, {reg_list}{^}
Modes:
1. Post-increment addressing (IA)
2. Pre-increment addressing (IB)
3. Post-decrement addressing (DA)
4. Pre-decrement addressing (DB)
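
The four modes differ only in where the block of words sits relative to the base register Rn. The Python sketch below computes the addresses used by a transfer; `block_transfer_addresses` is a hypothetical helper, registers are given as plain numbers, and the returned base is the value written back to Rn only when `!` is present.

```python
def block_transfer_addresses(base, reg_list, mode):
    """Return ({register: address}, written_back_base) for LDM/STM with the given mode."""
    regs = sorted(reg_list)  # the lowest-numbered register always uses the lowest address
    size = 4 * len(regs)
    if mode == "IA":      # increment after: first word at Rn
        start, new_base = base, base + size
    elif mode == "IB":    # increment before: first word at Rn + 4
        start, new_base = base + 4, base + size
    elif mode == "DA":    # decrement after: last word at Rn
        start, new_base = base - size + 4, base - size
    elif mode == "DB":    # decrement before: last word at Rn - 4
        start, new_base = base - size, base - size
    else:
        raise ValueError(f"unknown addressing mode: {mode}")
    addresses = {reg: start + 4 * i for i, reg in enumerate(regs)}
    return addresses, new_base


for mode in ("IA", "IB", "DA", "DB"):
    addrs, new_base = block_transfer_addresses(0x1000, [0, 1, 2], mode)
    print(mode, {f"r{r}": hex(a) for r, a in addrs.items()}, hex(new_base))
```

Running the loop with Rn = 0x1000 and three registers shows the four possible placements of the same 12-byte block around the base address.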
Worked example (1/5)
Worked example (2/5)
Worked example (3/5)
Worked example (4/5)
Worked example (5/5)
Swap instruction & SWI
SWP{cond}{B} Rd, Rm, [Rn]
- temp = [Rn]; [Rn] = Rm; Rd = temp
- B: only bits 0 to 7 are affected (byte swap)
- Allows semaphore operations without disabling interrupts
SWI{cond} <expression>
- Software interrupt instruction
- The value of the expression is encoded in the low 24 bits of the SWI instruction
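
Both instructions can be sketched in Python. In the model below, memory is a plain dictionary from address to value, `swp`, `try_lock`, `swi_encode`, and `swi_number` are hypothetical helper names, and the atomicity that the hardware guarantees for SWP is simply assumed because the model is single-threaded.

```python
def swp(memory, rn, rm_value, byte=False):
    """Model SWP{B} Rd, Rm, [Rn]: store Rm at address Rn and return the old contents (Rd)."""
    mask = 0xFF if byte else 0xFFFFFFFF
    old = memory.get(rn, 0) & mask
    memory[rn] = rm_value & mask
    return old


def try_lock(memory, lock_addr):
    """Binary semaphore: swap in 1; the lock was free if the previous value was 0."""
    return swp(memory, lock_addr, 1) == 0


def swi_encode(number, cond=0xE):
    """Build an SWI instruction word: condition in bits 31-28, 1111 in bits 27-24, number below."""
    if not 0 <= number < (1 << 24):
        raise ValueError("SWI number must fit in 24 bits")
    return (cond << 28) | (0xF << 24) | number


def swi_number(instruction):
    """Recover the 24-bit SWI number, as an SWI handler would after reading the instruction."""
    return instruction & 0x00FFFFFF


mem = {0x100: 0}
print(try_lock(mem, 0x100))   # first caller acquires the lock
print(try_lock(mem, 0x100))   # second caller finds it taken
print(hex(swi_encode(0x11)))
```

Because the read and the write happen in one instruction, no other code can observe the lock between the test and the set, which is what makes interrupt masking unnecessary.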
Worked example (1/2)
Worked example (2/2)
Branch Instruction
- The 24-bit offset is shifted left by 2 bits, giving a reach of +/-32MB
- If the target is more than 32MB away, use a register instead
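
The +/-32MB figure follows from the encoding: a signed 24-bit word offset, shifted left by 2, is a signed 26-bit byte offset. The Python sketch below encodes and decodes B and BL under the usual ARM-state convention that the offset is relative to the instruction address plus 8; the function names are hypothetical.

```python
def encode_branch(instr_addr, target, link=False, cond=0xE):
    """Encode B (or BL when link=True) located at instr_addr and jumping to target."""
    offset = target - (instr_addr + 8)  # the PC reads as the instruction address + 8
    if offset % 4 != 0:
        raise ValueError("branch target must be word aligned")
    if not -(1 << 25) <= offset < (1 << 25):
        raise ValueError("target is outside the +/-32MB branch range")
    imm24 = (offset >> 2) & 0xFFFFFF
    return (cond << 28) | (0b101 << 25) | (int(link) << 24) | imm24


def decode_branch_target(instr_addr, instruction):
    """Recover the target address from an encoded B/BL instruction."""
    imm24 = instruction & 0xFFFFFF
    if imm24 & 0x800000:          # sign-extend the 24-bit field
        imm24 -= 1 << 24
    return instr_addr + 8 + (imm24 << 2)


print(hex(encode_branch(0x8000, 0x8000)))          # a branch to itself
print(hex(encode_branch(0x0, 0x100, link=True)))   # BL from address 0 to 0x100
```

A branch to its own address encodes as 0xEAFFFFFE, since the offset is -8 bytes, or -2 words. Targets that fail the range check are the case where the address has to be loaded into a register, for example by writing it to the PC.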
ARM Procedure Call Standard (APCS)
Register | APCS name | APCS role
0 | a1 | Argument 1 / integer result / scratch register
1 | a2 | Argument 2 / scratch register
2 | a3 | Argument 3 / scratch register
3 | a4 | Argument 4 / scratch register
4 | v1 | Register variable 1
5 | v2 | Register variable 2
6 | v3 | Register variable 3
7 | v4 | Register variable 4
8 | v5 | Register variable 5
9 | sb/v6 | Static base / register variable 6
10 | sl/v7 | Stack limit / register variable 7
11 | fp | Frame pointer
12 | ip | Scratch register / new sb in inter-link-unit calls
13 | sp | Lower end of current stack frame
14 | lr | Link address / scratch register
15 | pc | Program counter
Stack Why? When? Stack addressing mode
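Stack Mode
[Figure: four stack layouts, each showing the base and the SP between low and high addresses]
- Full Ascending: base at the low address, SP moves toward high addresses
- Empty Ascending: base at the low address, SP moves toward high addresses
- Full Descending: base at the high address, SP moves toward low addresses
- Empty Descending: base at the high address, SP moves toward low addresses
In a full stack the SP points at the last item pushed; in an empty stack it points at the next free slot.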
Addressing Mode and Stack
Addressing mode
- Post-increment addressing (IA)
- Pre-increment addressing (IB)
- Post-decrement addressing (DA)
- Pre-decrement addressing (DB)
Stack
- Full Ascending (FA)
- Empty Ascending (EA)
- Full Descending (FD)
- Empty Descending (ED)
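
The stack suffixes are aliases for the addressing-mode suffixes, and the alias depends on whether the instruction is a store (push) or a load (pop). The Python sketch below records the standard correspondence and demonstrates a full descending stack; `resolve`, `push_fd`, and `pop_fd` are hypothetical names, and memory is modelled as a dictionary.

```python
# (instruction, stack suffix) -> equivalent addressing-mode suffix
STACK_ALIASES = {
    ("STM", "FD"): "DB", ("LDM", "FD"): "IA",
    ("STM", "ED"): "DA", ("LDM", "ED"): "IB",
    ("STM", "FA"): "IB", ("LDM", "FA"): "DA",
    ("STM", "EA"): "IA", ("LDM", "EA"): "DB",
}


def resolve(op, stack_suffix):
    """Translate e.g. ('STM', 'FD') into the equivalent mnemonic 'STMDB'."""
    return op + STACK_ALIASES[(op, stack_suffix)]


def push_fd(memory, sp, values):
    """Push onto a full descending stack: decrement SP first, then store upward from it."""
    sp -= 4 * len(values)
    for i, value in enumerate(values):
        memory[sp + 4 * i] = value
    return sp


def pop_fd(memory, sp, count):
    """Pop from a full descending stack: load upward from SP, then increment SP."""
    values = [memory[sp + 4 * i] for i in range(count)]
    return values, sp + 4 * count


print(resolve("STM", "FD"), resolve("LDM", "FD"))
mem = {}
sp = push_fd(mem, 0x1000, [1, 2, 3])
print(hex(sp), pop_fd(mem, sp, 3))
```

Using the same stack suffix for the push and the pop keeps the pair consistent, so the programmer does not have to work out which addressing modes mirror each other.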
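Conditional execution
In the flag column, an upper-case letter means the flag is set and a lower-case letter means it is clear.
Mnemonic | Meaning | Condition flags
EQ | equal | Z
NE | not equal | z
CS, HS | carry set / unsigned higher or same | C
CC, LO | carry clear / unsigned lower | c
MI | minus / negative | N
PL | plus / positive or zero | n
VS | overflow | V
VC | no overflow | v
HI | unsigned higher | zC
LS | unsigned lower or same | Z or c
GE | signed greater than or equal | NV or nv
LT | signed less than | Nv or nV
GT | signed greater than | NzV or nzv
LE | signed less than or equal | Z or Nv or nV
AL | always (unconditional execution) | ignored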
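
Each mnemonic is a predicate over the N, Z, C, and V flags. The Python sketch below evaluates the conditions and derives the flags from a comparison; `condition_passed` and `cmp_flags` are hypothetical names, and `cmp_flags` models CMP as op1 plus the complement of op2 plus 1.

```python
MASK32 = 0xFFFFFFFF

CONDITIONS = {
    "EQ": lambda n, z, c, v: z == 1,
    "NE": lambda n, z, c, v: z == 0,
    "CS": lambda n, z, c, v: c == 1,
    "HS": lambda n, z, c, v: c == 1,
    "CC": lambda n, z, c, v: c == 0,
    "LO": lambda n, z, c, v: c == 0,
    "MI": lambda n, z, c, v: n == 1,
    "PL": lambda n, z, c, v: n == 0,
    "VS": lambda n, z, c, v: v == 1,
    "VC": lambda n, z, c, v: v == 0,
    "HI": lambda n, z, c, v: c == 1 and z == 0,
    "LS": lambda n, z, c, v: c == 0 or z == 1,
    "GE": lambda n, z, c, v: n == v,
    "LT": lambda n, z, c, v: n != v,
    "GT": lambda n, z, c, v: z == 0 and n == v,
    "LE": lambda n, z, c, v: z == 1 or n != v,
    "AL": lambda n, z, c, v: True,
}


def condition_passed(cond, n, z, c, v):
    """Return True if an instruction with this condition suffix would execute."""
    return CONDITIONS[cond](n, z, c, v)


def cmp_flags(op1, op2):
    """Flags (N, Z, C, V) after CMP op1, op2 on 32-bit values."""
    a, b = op1 & MASK32, ~op2 & MASK32
    total = a + b + 1
    result = total & MASK32
    overflow = ((~(a ^ b) & (a ^ result)) >> 31) & 1
    return result >> 31, int(result == 0), total >> 32, overflow


flags = cmp_flags(0xFFFFFFFF, 1)   # -1 versus 1 when signed, 4294967295 versus 1 when unsigned
print(condition_passed("LT", *flags), condition_passed("HI", *flags))
```

The last line shows why signed and unsigned conditions are separate: the same comparison is "less than" under LT and "higher" under HI.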
ARM Assembly Test: -O3 ARM compiler, -O3 IA compiler