S3C6410 & boot loader for real target Android system
Naver linuxkernel26 operator: 박철 (e2g1234@naver.com)
Course objectives
- Understand a real target Android HW system.
- Examine the ARM core and the S3C6410, which form the heart of a real target Android HW system.
- Learn how the Android SW platform is loaded by studying the basic functions and operation of the boot loader that a real target Android HW system needs in order to boot.
- Examine the characteristics and use of a custom-built boot loader in place of the widely used U-Boot.
Course outline
- This week
  - ARMv6: instruction set architecture
  - ARM1176JZF-S: ARM core
  - S3C6410: ARM SoC
  - A&W6410: HW platform for Android
  - Boot loader for Android system
- Next week
  - Kernel for Android
  - Android SW platform
  - Native app & Java app on Android SW platform
RISC Design Philosophy (1)
- The ARM core uses a RISC architecture
- RISC: Reduced Instruction Set Computer
  - Simple instructions that execute within a single cycle at a high clock speed
  - Complexity moves from hardware to software, where it is easier to provide greater flexibility and intelligence
  - Greater demands on the compiler
RISC Design Philosophy (2): 4 major design rules
1. Instructions
  - A reduced number of instruction classes that provide simple operations, each of which can execute in a single cycle
  - The compiler or programmer synthesizes complicated operations
  - Fixed length, which allows the pipeline to fetch future instructions before decoding the current instruction
  - CISC: variable size, and instructions take many cycles to execute
2. Pipelines
  - Instruction processing is broken down into smaller units that can be executed in parallel by pipelines
  - Ideally the pipeline advances by one step on each cycle for maximum throughput
  - Instructions can be decoded in one pipeline stage
  - No microcode as on CISC
RISC Design Philosophy (3): 4 major design rules
3. Registers
  - A large general-purpose register set
  - Any register can contain either data or an address
  - Registers act as a fast local memory store for all data processing operations
  - CISC: dedicated registers
4. Load-store architecture
  - Memory accesses are costly, so separating memory accesses from data processing provides an advantage: data items held in the register bank can be used multiple times without multiple memory accesses
  - CISC: data processing operations can act on memory directly
  - Ex) 80x86: ADD [BX], CX
ARM Design Philosophy
- The ARM core is not a pure RISC architecture
- Non-RISC ideas suitable for embedded applications
  - Variable cycle execution on certain instructions to save power, area, and code size (e.g. LDM, STM)
  - Barrel shifter to expand the capability of certain instructions
  - Thumb 16-bit instruction set to improve code density (about 30%)
  - Conditional execution of instructions, which improves code density and performance by reducing branch instructions
  - Enhanced instructions to perform digital signal processing type functions
- These features made the ARM one of the most commonly used 32-bit embedded processor cores
  - Over one billion ARM processors had been shipped worldwide by the end of 2001
ARM Nomenclature
ARM{x}{y}{z}{T}{D}{M}{I}{E}{J}{F}{-S}
- x: family
- y: memory management/protection unit
- z: cache
- T: Thumb 16-bit decoder
- D: JTAG debug
- M: fast multiplier
- I: EmbeddedICE macrocell (debug hardware)
- E: enhanced instructions (assumes TDMI)
- J: Jazelle (acceleration technology for the Java platform)
- F: vector floating-point unit
- S: synthesizable version
- Processor family: cores sharing the same H/W characteristics
  - ARM7TDMI, ARM740T, ARM720T => ARM7 family
- Examples: ARM926EJ-S, ARM1176JZF-S
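
The naming scheme above can be tried out with a small decoder. This is only a sketch: the function name `decode_arm_name` and the rule "with three or more digits, the last two are {y} and {z}" are assumptions made for illustration, not an official ARM parsing rule. Letters that the list above does not define (such as the Z in ARM1176JZF-S) are reported separately rather than guessed.

```python
import re

# Feature letters exactly as listed in the nomenclature above.
FEATURE_LETTERS = {
    "T": "Thumb 16-bit decoder",
    "D": "JTAG debug",
    "M": "fast multiplier",
    "I": "EmbeddedICE macrocell",
    "E": "enhanced instructions",
    "J": "Jazelle",
    "F": "vector floating-point unit",
}


def decode_arm_name(name):
    """Split an ARM core name into family, {y}{z} digits, and feature letters."""
    match = re.fullmatch(r"ARM(\d+)([A-Z]*)(-S)?", name)
    if match is None:
        raise ValueError("not an ARM core name: " + name)
    digits, letters, synth = match.groups()
    # Assumption: with three or more digits, the last two are {y} and {z}.
    if len(digits) >= 3:
        family, y, z = digits[:-2], digits[-2], digits[-1]
    else:
        family, y, z = digits, None, None
    return {
        "family": "ARM" + family,
        "y": y,
        "z": z,
        "features": [FEATURE_LETTERS[c] for c in letters if c in FEATURE_LETTERS],
        "unlisted": [c for c in letters if c not in FEATURE_LETTERS],
        "synthesizable": synth is not None,
    }


if __name__ == "__main__":
    for core in ("ARM7TDMI", "ARM926EJ-S", "ARM1176JZF-S"):
        print(core, decode_arm_name(core))
```
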
ARM Instruction Set Architecture (ISA) Version
| Version | Revision | Example core | ISA enhancement |
|---|---|---|---|
| 1 | ARMv1 | ARM1 | First ARM processor (1985), 26-bit addressing |
| 2 | ARMv2 | ARM2 | 32-bit multiplier, 32-bit coprocessor support |
| 2 | ARMv2a | ARM3 | On-chip cache, atomic SWP instruction, CP15 for cache management |
| 3 | ARMv3 | ARM6, ARM7DI | 32-bit addressing, separate CPSR and SPSR, new modes (undefined instruction, abort), MMU support (virtual memory) |
| 3 | ARMv3M | ARM7M | Signed & unsigned long multiply instructions |
| 4 | ARMv4 | StrongARM | Load-store instructions for signed and unsigned halfwords/bytes, new mode (system), reserve SWI space for architecturally defined operations |
| 4 | ARMv4T | ARM7TDMI, ARM9T | Thumb |
| 5 | ARMv5TE | ARM9E, ARM10E | Superset of the ARMv4T, enhanced multiply instructions, extra DSP-type instructions, faster multiply accumulate |
| 5 | ARMv5TEJ | ARM7EJ, ARM926EJ | Java acceleration |
| 6 | ARMv6 | ARM11 | Improved multiprocessor instructions, new multimedia instructions |
ARM Family Attribute Comparison
| Attribute | ARM7 | ARM9 | ARM10 | ARM11 |
|---|---|---|---|---|
| Pipeline depth | 3-stage | 5-stage | 6-stage | 8-stage |
| Typical MHz | 80 | 150 | 260 | 335 |
| mW/MHz | 0.06 | 0.19 | 0.5 | 0.4 |
| MIPS/MHz | 0.97 | 1.1 | 1.3 | 1.2 |
| Architecture | Von Neumann | Harvard | Harvard | Harvard |
| Multiplier | 8 x 32 | 8 x 32 | 16 x 32 | 16 x 32 |
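
The per-MHz figures in the comparison become easier to read once they are multiplied by the typical clock. The sketch below does only that arithmetic; the dictionary layout and function names are illustrative assumptions, and the numbers are copied from the comparison above.

```python
# Typical MHz, mW/MHz, and MIPS/MHz as listed in the comparison above.
FAMILIES = {
    "ARM7": {"mhz": 80, "mw_per_mhz": 0.06, "mips_per_mhz": 0.97},
    "ARM9": {"mhz": 150, "mw_per_mhz": 0.19, "mips_per_mhz": 1.1},
    "ARM10": {"mhz": 260, "mw_per_mhz": 0.5, "mips_per_mhz": 1.3},
    "ARM11": {"mhz": 335, "mw_per_mhz": 0.4, "mips_per_mhz": 1.2},
}


def typical_power_mw(family):
    """Power at the typical clock: MHz times mW/MHz."""
    row = FAMILIES[family]
    return row["mhz"] * row["mw_per_mhz"]


def typical_mips(family):
    """Throughput at the typical clock: MHz times MIPS/MHz."""
    row = FAMILIES[family]
    return row["mhz"] * row["mips_per_mhz"]


if __name__ == "__main__":
    for name in FAMILIES:
        print(name, round(typical_power_mw(name), 1), "mW,",
              round(typical_mips(name), 1), "MIPS")
```
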
ARM Processor Variants
| CPU core | MMU/MPU | Cache | Thumb | ISA |
|---|---|---|---|---|
| ARM7TDMI | None | None | Yes | v4t |
| ARM926EJ-S | MMU | Separate 8K/8K D+I | Yes | v5tej |
| ARM946E-S | MPU | Separate 8K/8K D+I | Yes | v5te |
| ARM1020E | MMU | Separate 32K/32K D+I | Yes | v5te |
| ARM1176 | MMU | Separate 16K/16K D+I | Yes | v6te |
| SA-1110 | MMU | Separate 8K/16K D+I | No | v4 |
| PXA255 | MMU | Separate 32K/32K D+I | Yes | v5te |
ARM9E-S Core
- 32-bit RISC: 32-bit registers, 32-bit instructions (longword aligned), 32-bit datapaths
- 5-stage pipeline
  1. Instruction fetch (PC)
  2. Decode, register read, issue checks (PC - 4)
  3. Shift, ALU, address calculation (PC - 8)
  4. Buffer and cache access
  5. Register write
ARMv6 pipeline stages
ARM9 organization
StrongARM core pipeline organization
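ARM1176JZF-S block diagram
- Peripherals are connected through AMBA 3.0 AXI
  - Four AXI interfaces are available, which reduces bottlenecks inside the SoC
- The ISA is an extended version of ARMv6
  - ARMv6: SIMD media support
  - ARMv6T2 (Thumb-2 support)
  - ARMv6Z (TrustZone, IEM)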
status registers
Mode dependent shadow register
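Features of the ARM1176JZF-S (1)
- SIMD: Single Instruction Multiple Data instruction extension for multimedia applications
  - 4x improvement for codec programs
- NEON: SIMD instruction extension for multimedia and DSP applications
  - 64/128-bit SIMD
- Thumb-2: Thumb version 2
  - Thumb: a subset of the ARM instructions re-encoded as 16-bit instructions
  - An improved Thumb with better performance, energy efficiency, and code density
  - 35% memory savings
- TrustZone security extensions
  - Hardware support for setting up a secure region of the address space
  - Provides a secure address space
- Note: NEON (ARMv7) and Thumb-2 (ARMv6T2) are extensions of the wider ARM family; the ARM1176JZF-S core itself implements the ARMv6 SIMD, Thumb, and TrustZone items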
Features of the ARM1176JZF-S (2)
- VFP (Vector Floating Point) coprocessor support
  - Coprocessor that accelerates floating-point arithmetic
- External coprocessor interface
- Instruction and data Memory Management Units (MMUs), managed using MicroTLB structures backed by a unified Main TLB
- Instruction and data caches, including a non-blocking data cache with Hit-Under-Miss (HUM)
- Virtually indexed and physically addressed caches
- 64-bit interface to both caches
Features of the ARM1176JZF-S (3)
- Level-one Tightly-Coupled Memory (TCM) that can be used as a local RAM with DMA
- Trace support
- JTAG-based debug
- DSP: extended 16-bit and 32-bit arithmetic capability for DSP applications
  - 70% performance improvement in audio applications
- Jazelle: extension for executing Java bytecode in hardware
  - 8x performance and 80% lower power compared with a JVM
TrustZone model
Features
- Powerful multimedia support
  - H.264, MPEG4, WMV
  - JPEG
  - 2D/3D accelerator
  - TV out
- Communication support
  - I2C, I2S, UART, IrDA, SPI
  - USB 2.0
  - MMC/SD, CE-ATA
  - AC97 audio
- MMI
  - Max 1024*1024 LCD
  - Touch screen
  - 8*8 keypad
Block diagram
S3C6410 features
- ARM1176JZF, 533/667 MHz, VFP/SIMD
- 65 nm low-power process
- DVFS power management
- Dedicated x32 mDDR/DDR, x32 mSDR/SDR
- WVGA or higher display resolution
- Hard-wired 3D GFX accelerator
  - 4M triangles/second
  - OpenGL ES 1.1/2.0
- Hard-wired multimedia (>WVGA)
  - MPEG-4 SP codec: SD/D1 >30 fps
  - H.264/263 BP codec: SD/D1 >30 fps
  - VC-1 (WMV9) decoder: SD/D1 >30 fps
  - JPEG/2D hardware
  - Hardware rotator & post processor
  - TV-out (DAC + image enhancer)
- 32-channel DMA
- Security hardware: DES/3DES, AES, SHA-1
- High-speed connectivity
  - UART interfacing BT EDR 2.0 up to 3 Mbps
  - High-speed SPI, 50 Mbps for mobile TV
  - USB 2.0 OTG
  - High-speed MMC 8-bit 50 MHz MMC+/eMMC
  - SDHC 4-bit 50 MHz for high-density SD cards/iNAND 2.0 and WiFi 802.11a/b/g
  - I2S for 5.1-channel Dolby and stereo audio
- BOM cost savings by integrating:
  - USB host 1.1/USB 2.0 OTG PHYs
  - 12-bit ADC with TS + built-in FETs
  - Direct boot/NAND FS for NAND SLC/MLC, moviNAND, iNAND, OneNAND
  - TV-out DAC integration
  - Built-in keypad controller
- Package: 424 pins, 13x13, 0.5 mm pitch FBGA
Memory map
Memory map: address 0 is remapped according to the configuration
AHB bus peripheral memory map
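System clock configuration
- Clock generator
  - Together with the power-management controller, this module makes up the System controller
- Clock types
  - ARMCLK: generated by the ARM PLL (APLL) and supplied only to the ARM core
  - HCLK: generated by the main PLL (MPLL) and supplied to AXI/AHB-bus peripherals
  - PCLK: generated by the MPLL and supplied to APB-bus peripherals
  - ECLK: generated by the extra PLL (EPLL) and used as the audio-related clock
- Bypass operation
- The clock for each peripheral can be enabled or disabled under software control to save power
System clock
1. Decide whether to synchronize with the CPU clock.
2. Configure the PLLs.
3. Route the clocks.
4. Divide the clocks.
5. Enable the clocks.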
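
The five steps above can be modelled as plain arithmetic. This is a simplified sketch, not the S3C6410 register sequence: the function name, the divider parameters, and the rule that synchronous mode takes the bus clocks from the APLL instead of the MPLL are all assumptions made for illustration, and the real clock tree has intermediate stages that are not modelled here.

```python
def derive_clocks(apll_hz, mpll_hz, sync, arm_div, hclk_div, pclk_div, enabled):
    """Return ARMCLK/HCLK/PCLK in Hz; gated-off clocks are reported as 0."""
    # Steps 1 and 3: the sync decision selects which PLL feeds the bus clocks.
    # Assumption: sync mode routes the APLL output to HCLK and PCLK.
    bus_source_hz = apll_hz if sync else mpll_hz
    # Step 4: divide each clock from its source (step 2, PLL setup, is the input).
    clocks = {
        "ARMCLK": apll_hz // arm_div,
        "HCLK": bus_source_hz // hclk_div,
        "PCLK": bus_source_hz // pclk_div,
    }
    # Step 5: only enabled clocks are passed on.
    return {name: (hz if name in enabled else 0) for name, hz in clocks.items()}


if __name__ == "__main__":
    # Illustrative values only: 532 MHz APLL, 266 MHz MPLL, asynchronous mode.
    all_on = {"ARMCLK", "HCLK", "PCLK"}
    print(derive_clocks(532_000_000, 266_000_000, False, 1, 2, 4, all_on))
```
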
PLL block diagram
PLL formula
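
The formula figure itself did not survive in this text. As a sketch, the code below assumes the relation usually quoted for the S3C6410 APLL and MPLL, FOUT = MDIV x FIN / (PDIV x 2^SDIV). The 12 MHz input clock and the divider values in the example are illustrative assumptions and should be checked against the S3C6410 user's manual before use.

```python
def pll_fout_hz(fin_hz, mdiv, pdiv, sdiv):
    """Assumed APLL/MPLL relation: FOUT = MDIV * FIN / (PDIV * 2**SDIV)."""
    if pdiv <= 0 or sdiv < 0:
        raise ValueError("PDIV must be positive and SDIV non-negative")
    return fin_hz * mdiv // (pdiv * (1 << sdiv))


if __name__ == "__main__":
    # Assumed example: 12 MHz crystal, MDIV=266, PDIV=3, SDIV=1.
    print(pll_fout_hz(12_000_000, 266, 3, 1))
```
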
Core clock
Memory and peripheral clock
NAND controller features
- Page size: 512 bytes and 2048 bytes
- NAND type: supports SLC and MLC
- ECC: 1-bit ECC for SLC and 4-bit ECC for MLC
- NAND boot: uses internal SRAM (8 KB)
- Pin configuration: indicates page size and address cycle
- Software mode: full access to the NAND device and RnB check
- SFR I/F: byte/half-word/word access
NAND classification
| Type | Block type | Page size | Pages per block | Part | Blocks | Size |
|---|---|---|---|---|---|---|
| SLC | Small block | 512 bytes | 32 | K9F1208 | 4096 | 64 MB |
| SLC | Large block | 2048 bytes | 64 | K9F2G08 | 2048 | 256 MB |
| MLC | Large block | 2048 bytes | 64 | K9F8G08 | 8192 | 1 GB |
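
The sizes in the classification follow from page size x pages per block x block count, and the same geometry turns a byte offset into a block, page, and column. The sketch below shows both calculations. The function names are illustrative, and 8192 blocks is assumed for the K9F8G08 because 2048 x 64 x 8192 gives exactly 1 GB.

```python
# Geometry taken from the classification above (K9F8G08 block count assumed 8192).
PARTS = {
    "K9F1208": {"page_bytes": 512, "pages_per_block": 32, "blocks": 4096},
    "K9F2G08": {"page_bytes": 2048, "pages_per_block": 64, "blocks": 2048},
    "K9F8G08": {"page_bytes": 2048, "pages_per_block": 64, "blocks": 8192},
}


def capacity_bytes(part):
    """Main-area capacity: page size * pages per block * blocks (spare area excluded)."""
    g = PARTS[part]
    return g["page_bytes"] * g["pages_per_block"] * g["blocks"]


def locate(part, offset):
    """Convert a byte offset in the main area into (block, page, column)."""
    g = PARTS[part]
    if not 0 <= offset < capacity_bytes(part):
        raise ValueError("offset outside device")
    page_index, column = divmod(offset, g["page_bytes"])
    block, page = divmod(page_index, g["pages_per_block"])
    return block, page, column


if __name__ == "__main__":
    for name in PARTS:
        print(name, capacity_bytes(name) // (1024 * 1024), "MB")
    print(locate("K9F2G08", 0x20000))
```
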
NAND Timing
K9K2G08 Functional Block Diagram
K9F1208 Array organization
K9F1G08 Array organization
K9F2G08 Array organization
K9F8G08 Array organization
SLC spare area
MLC spare area
Boot loader functions
- H/W initialization
  - Initializes the CPU clock, memory timing, interrupts, UART, GPIO, and so on
- Linux booting
  - Stores the kernel image in DDR, then jumps to the kernel image address
- Image downloading
  - Downloads the kernel image (zImage) and the file system to DDR
  - Downloading over USB or Ethernet is possible
- Flash control
  - Flash write and erase
  - Flash lock, unlock, and so on
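S3C6410 boot flow (1)
- Understand the memory map
  - Understand the overall memory map under the NAND boot mode setting
  - Relate the physical addresses of memory and peripherals to the block diagram
- Stepping stone operation
  - An 8 KB region of NAND is copied to the SRAM inside the 6410
  - The stepping stone runs
  - Part of the loader in NAND is copied to SRAM
  - Analyze the operations that involve SRAM and NAND
- Power-on reset
  - The ARM11 core starts executing at address 0 (the step loader runs)
  - Analyze the ARM power-on reset sequence
- The code in SRAM runs on the ARM11 core
  - The 8-stage process must be understood
  - At this point the code copies AnW_boot from NAND to DDR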
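
The two copy steps in this boot flow can be modelled with byte buffers. This is a toy simulation, not boot loader code: the function and variable names are hypothetical, the loader size is an arbitrary example, and the real copy goes through the NAND controller with page reads and ECC, none of which is modelled.

```python
STEPPING_STONE_BYTES = 8 * 1024  # size of the region copied to internal SRAM


def nand_boot(nand, loader_bytes):
    """Model NAND boot: 8 KB goes to SRAM, then the SRAM code copies the loader to DDR."""
    if loader_bytes > len(nand):
        raise ValueError("loader larger than NAND contents")
    # Hardware step at power-on: the first 8 KB of NAND lands in internal SRAM.
    sram = bytes(nand[:STEPPING_STONE_BYTES])
    # Software step: code running from SRAM copies the whole loader into DDR.
    ddr = bytes(nand[:loader_bytes])
    return sram, ddr


if __name__ == "__main__":
    fake_nand = bytes(range(256)) * 128  # 32 KB of placeholder contents
    sram, ddr = nand_boot(fake_nand, 24 * 1024)
    print(len(sram), len(ddr), sram == ddr[:STEPPING_STONE_BYTES])
```
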
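S3C6410 boot flow (2)
- Execution from DDR memory
- Basic HW initialization
  - Disable interrupts
  - Initialize the system clock
  - Initialize GPIO
  - Initialize the memory controller
  - Initialize the UART
- Menu-driven operation control
  - Download function: the boot loader, kernel, and ramdisk are transferred to DDR memory over USB
  - Fusing from DDR to NAND
  - Copying the images in NAND to DDR
  - Jumping to the kernel
  - Various device tests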
Boot loader screen
A&W6410-Ver01
Copyright (C) 2009 e2g(embedded expert group)
Support: http://www.e2g.org http://
Autoboot in progress, press any key to stop..
********************** Boot menu ********************
* 1. Bootloader Download & Flash fusing             *
* 2. Kernel Download & Flash fusing                 *
* 3. Ramdisk Download & Flash fusing                *
* 4. e-boot Download & Flash fusing                 *
* 5. wince Download & Flash fusing                  *
* 6. Jump to eboot                                  *
* 7. downloading & execute                          *
* a. Jump to Command Line                           *
*****************************************************
select the command number :
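
The menu maps one key to one sequence of actions, and autoboot is what happens when no key arrives. The sketch below expresses that as a lookup that returns a list of steps. It is a hypothetical model: the function name and step strings are placeholders, and the autoboot steps are assumed from the boot flow (copy the images from NAND to DDR, then jump to the kernel).

```python
# Menu entries 1-5 all follow the same "download, then fuse" pattern.
FUSE_TARGETS = {
    "1": "Bootloader",
    "2": "Kernel",
    "3": "Ramdisk",
    "4": "e-boot",
    "5": "wince",
}


def plan(key):
    """Return the ordered steps for a menu key; None means no key was pressed."""
    if key is None:
        # Assumed autoboot sequence, following the boot flow slides.
        return [
            "copy zImage from NAND to DDR",
            "copy Ramdisk.img from NAND to DDR",
            "jump to kernel",
        ]
    if key in FUSE_TARGETS:
        image = FUSE_TARGETS[key]
        return ["download " + image + " to DDR",
                "fuse " + image + " from DDR to NAND"]
    if key == "6":
        return ["jump to eboot"]
    if key == "7":
        return ["download image to DDR", "execute image"]
    if key == "a":
        return ["jump to command line"]
    return ["show menu again"]


if __name__ == "__main__":
    for k in (None, "2", "a", "x"):
        print(repr(k), plan(k))
```
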
Fusing & initialization process
[Figure: PC (Cupcake host file system: system source -> System.img, ramdisk source -> Ramdisk.img, kernel source -> zImage, bootloader source -> bootloader), DDR, NAND, and SRAM. The bootloader is copied and fused into NAND; on NAND boot, 8K of the bootloader is loaded into SRAM.]
Bootloader Download & Flash fusing
[Figure: the bootloader built on the PC (Cupcake host file system) is downloaded to DDR and then copied into NAND. Blocks shown: PC, DDR, NAND, SRAM.]
Kernel Download & Flash fusing
[Figure: the zImage built on the PC (Cupcake host file system) is downloaded to DDR and then copied into NAND. Blocks shown: PC, DDR, NAND, SRAM.]
Ramdisk Download & Flash fusing
[Figure: the Ramdisk.img built on the PC (Cupcake host file system) is downloaded to DDR and then copied into NAND. Blocks shown: PC, DDR, NAND, SRAM.]
Auto boot
[Figure: on NAND boot, 8K of the bootloader is loaded into SRAM; the bootloader, zImage, and Ramdisk.img are then copied from NAND to DDR. Blocks shown: PC (Cupcake host file system), DDR, NAND, SRAM.]
System download and write process
[Figure: blocks PC (Cupcake host file system, tftpd), DDR (Ramdisk, vmlinux), NAND, and SRAM. Labels on the arrows: download, tar system, System.tz, gunzip, System.img.]
Thank you. http://