Interrupt, Trap and System Call
May 2016
Dept. of Software, Dankook University
http://embedded.dankook.ac.kr/~baeksj
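
Classification of interrupts

  Interrupt
    External interrupt
    Trap
      fault: page fault, ...
      trap:  int, system call, ...
      abort: divide by zero, ... (on x86 the divide error is architecturally classified as a fault)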

Interrupt and trap handling

  H/W -> CPU
    Exceptions: divide-by-zero error, debug, nmi, int3
    Devices (through the PIC): timer, keyboard, floppy, HDD

  IDT (IVT): idt_table in the kernel
    0    divide_error
    1    debug
    2    nmi
    3    int3
    ...
    32   do_irq -> irq_desc (entries 0 to 223 in the figure) -> action list
           e.g. timer: timer_interrupt, floppy: floppy_interrupt
    ...
    128  system_call -> sys_call_table (entries 0 to N)
           sys_restart_syscall, sys_exit, ..., sys_write, ...
    ...
    255

System call handling (1/14)

  Level 3 (user level):    User program -> C library
  Call gate
  Level 0 (kernel level):  System call handler -> System call routine
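
System call handling (3/14)

  System call
    A task requests a service from the kernel (fork, read, socket, ...).
    It is handled by the trap mechanism, using vector 0x80 and the system call table.

  IDT
    0x0    divide_error()
           debug()
           nmi()
           ...
    0x80   system_call()

  sys_call_table (sysent[]) in the kernel
    0      sys_no_syscall()
    1      sys_exit()
    2      sys_fork()
    3      sys_read()
    4      sys_write()
    ...
    47     sys_getpid()
    ...
    255    sys_no_syscall()

  system_call() uses the system call number as an index into the table and calls the selected routine, for example sys_fork() or sys_read().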

System call handling process

  user task
    do system call
  libc.a
    push args
    save system call number
    make trap
  Kernel
    idt_table                 /* arch/i386/kernel/traps.c */
    system_call()             /* arch/i386/kernel/entry.S */
      catch trap through IDT
      call real handler function using sys_call_table
    sys_call_table            /* arch/i386/kernel/entry.S */
    real system call handler

System call handling process, example: fork

  user task
    main()
      fork()

  libc.a
    fork()
      movl $2, %eax
      int $0x80

  IDT
    0x0    divide_error()
           debug()
           nmi()
           ...
    0x80   system_call()

  Kernel
    ENTRY(system_call)        /* arch/i386/kernel/entry.S */
      SAVE_ALL
      ...
      call *SYMBOL_NAME(sys_call_table)(,%eax,4)
      ...
      ret_from_sys_call (schedule, signal, bh_active, nested interrupt handling)

    sys_call_table
      1    sys_exit()
      2    sys_fork()
      3    sys_read()
      4    sys_write()

    sys_fork()                /* arch/i386/kernel/process.c */
                              /* kernel/fork.c */
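
System call handling process

  (Figure annotation: the parameters passed in are saved on the kernel mode stack.)

  1. Save on the stack every register except eflags, cs, eip, ss and esp, which the control unit has already saved automatically.
  2. Store the descriptor of the current process in the ebx register.
  3. Check whether the ptrace field of current contains the PT_TRACESYS flag, that is, whether a debugger is tracing the program's system calls. If so, syscall_trace() is called twice, once before and once after the service routine.
  4. Check that the system call number is valid. If it is not, return immediately.
  5. Each entry of the dispatch table is 4 bytes, so multiply the system call number by 4 and add the start address of sys_call_table to obtain the pointer to the service routine, then call it. When the call returns, store the return value in the stack slot that holds the user-mode eax and jump to ret_from_sys_call, which ends the system call handler.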
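
System call implementation

  system_call(), sys_system_call()

  asmlinkage
    Allows a C function to be called from assembly code (on i386 it makes the compiler take all arguments from the stack).

  System call number
    Each system call is assigned a unique number.
    The numbers index the entries stored in sys_call_table.

  System call handler
    int $0x80: vector 128, a programmed exception.
    Sequence:
      1. Raise an interrupt to the kernel to use a system call (system call number in eax).
      2. Save the registers on the kernel mode stack.
      3. Call the system call service routine to do the work.
      4. Leave the handler through ret_from_sys_call().
    The handler must check that the parameters are legal and that the caller has permission.
    Use copy_to_user() and copy_from_user() to move data.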
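
System call implementation

  System call context
    The current pointer refers to the process that invoked the system call.
    While a system call runs, the kernel is in process context, so it may sleep and it is fully preemptible.
    Because it may sleep, a system call can use most of the kernel's facilities.
    Because it may be preempted, it must be re-entrant.
    Implement it to be as fast and as simple as possible.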
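
System call implementation

  Parameter verification
    When a parameter is an address, there are two ways to check it:
      1. Check that the linear address belongs to the address space of the process and, if it does, that the process has access rights to it.
      2. Check only that the linear address is lower than PAGE_OFFSET.
    verify_area() function
      Checks an address passed to a system call (equivalent to the access_ok macro).
    Accessing the process address space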
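
Implementing a new system call (1/9)

  Kernel modification: 4 steps
    1. Allocate a new system call number (allocate syscall_number).
    2. Register the new system call function in sys_call_table[].
    3. Implement the new system call function in the kernel.
    4. Compile the kernel and reboot.

  User application: 2 steps
    1. Write a user-level application that uses the new system call.
    2. Write the user application with a library (optional): ar, ranlib.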

Implementing a new system call (2/9)

  Example: implementing a new system call named newsyscall()

  1. Allocate a new system call number (~/arch/x86/syscalls/syscall_64.tbl)

Implementing a new system call (3/9)

  2. Register the new system call function in sys_call_table[]

    /* contents of arch/x86/syscalls/syscall_64.tbl */
    0     common   read         sys_read
    1     common   write        sys_write
    2     common   open         sys_open
    3     common   close        sys_close
    4     common   stat         sys_newstat
    5     common   fstat        sys_newfstat
    ...
    316   common   renameat2    sys_renameat2
    317   common   newsyscall   sys_newsyscall

  Resulting sys_call_table
    0     sys_read()
          sys_write()
          sys_open()
          sys_close()
          sys_newstat()
          ...
    316   sys_renameat2()
    317   sys_newsyscall()

Implementing a new system call (5/9)

  3. Implement the new system call function in the kernel

    /* contents of include/linux/syscalls.h */
    asmlinkage long sys_fork(void);
    asmlinkage long sys_vfork(void);
    asmlinkage long sys_newsyscall(void);

    /* contents of kernel/newfile.c */
    #include <linux/unistd.h>
    #include <linux/errno.h>
    #include <linux/kernel.h>
    #include <linux/sched.h>

    asmlinkage long sys_newsyscall(void)
    {
        /* "<0>" is the log-level prefix for the highest priority */
        printk("<0>hello Linux, I'm in Kernel\n");
        return 0;
    }
    EXPORT_SYMBOL_GPL(sys_newsyscall);

  Note that the function is printk(), not printf().

Implementing a new system call (6/9)

  4. Compile the kernel and reboot

  Before compiling the kernel, modify the Makefile as follows.

    # kernel/Makefile before the change
    obj-y = fork.o exec_domain.o panic.o \
            cpu.o exit.o itimer.o time.o softirq.o resource.o \
            sysctl.o sysctl_binary.o capability.o ptrace.o timer.o user.o \
            signal.o sys.o kmod.o workqueue.o pid.o task_work.o \
            extable.o params.o posix-timers.o \
            kthread.o sys_ni.o posix-cpu-timers.o \
            hrtimer.o nsproxy.o \
            notifier.o ksysfs.o cred.o reboot.o \
            async.o range.o groups.o smpboot.o

    # kernel/Makefile after the change (newfile.o appended)
    obj-y = fork.o exec_domain.o panic.o \
            cpu.o exit.o itimer.o time.o softirq.o resource.o \
            sysctl.o sysctl_binary.o capability.o ptrace.o timer.o user.o \
            signal.o sys.o kmod.o workqueue.o pid.o task_work.o \
            extable.o params.o posix-timers.o \
            kthread.o sys_ni.o posix-cpu-timers.o \
            hrtimer.o nsproxy.o \
            notifier.o ksysfs.o cred.o reboot.o \
            async.o range.o groups.o smpboot.o newfile.o

  Compile the kernel, then reboot.
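
Implementing a new system call (7/9)

  1. Write a user-level application that uses the new system call

    /* Version 1: call by number */
    #include <linux/unistd.h>
    #include <unistd.h>

    int main(void)
    {
        syscall(317);   /* must match the number allocated in syscall_64.tbl */
        return 0;
    }

    /* Version 2: call by symbolic name (requires headers that define __NR_newsyscall) */
    #include <linux/unistd.h>
    #include <unistd.h>

    int main(void)
    {
        syscall(__NR_newsyscall);
        return 0;
    }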

syscall macro

    /* /usr/include/unistd.h */
    extern long int syscall (long int __sysno, ...) __THROW;

    /* glibc/sysdeps/unix/sysv/linux/i386/syscall.S */
            .text
    ENTRY (syscall)
            PUSHARGS_6                /* Save register contents.  */
            _DOARGS_6(44)             /* Load arguments.  */
            movl 20(%esp), %eax       /* Load syscall number into %eax.  */
            ENTER_KERNEL              /* Do the system call.  */
            POPARGS_6                 /* Restore register contents.  */
            cmpl $-4095, %eax         /* Check %eax for error.  */
            jae SYSCALL_ERROR_LABEL   /* Jump to error handler if error.  */
    L(pseudo_end):
            ret                       /* Return to caller.  */
    PSEUDO_END (syscall)

    /* glibc/sysdeps/unix/sysv/linux/i386/sysdep.h */
    # define ENTER_KERNEL int $0x80

Implementing a new system call (8/9)

  2. Write the user application with a library

    $ vi newsys.c
    $ gcc -c newsys.c
    $ ar -r libnew.a newsys.o
    $ vi test.c
    $ gcc -O -o test test.c -L./ -lnew
    $ ./test

    /* newsys.c */
    #include <linux/unistd.h>
    #include <stdio.h>
    #include <unistd.h>

    int newsyscall(void)
    {
        return syscall(317);
    }

    /* test.c */
    #include <linux/unistd.h>

    int newsyscall(void);

    int main(void)
    {
        newsyscall();
        return 0;
    }

Implementing a new system call (9/9)

  Processing of the new system call

  user task
    main()
      newsyscall()

  library
    newsyscall()
      movl $317, %eax         /* number allocated for newsyscall */
      int $0x80

  IDT
    0x0    divide_error()
           debug()
           nmi()
           ...
    0x80   system_call()

  Kernel
    ENTRY(system_call)        /* arch/i386/kernel/entry.S */
      SAVE_ALL
      ...
      call *SYMBOL_NAME(sys_call_table)(,%eax,4)
      ...
      ret_from_sys_call (schedule, signal, bh_active, nested interrupt handling)

    sys_call_table
      1     sys_exit()
      2     sys_fork()
      3     sys_read()
      4     sys_write()
      ...
      317   sys_newsyscall()

    /* real handler */
    sys_newsyscall()
      printk( );

Virtual address translation

    #include <linux/unistd.h>
    #include <linux/errno.h>
    #include <linux/kernel.h>
    #include <linux/sched.h>
    #include <linux/mm_types.h>
    #include <linux/hugetlb.h>
    #include <linux/mm.h>

    /*
     * Walks the page tables of the calling process for 'address' and prints
     * the entry found at each level. Returns 0 on success, -EFAULT when no
     * page is mapped. flags is fixed at 0, so the FOLL_* branches are never taken.
     */
    asmlinkage long sys_newsyscall(unsigned long address)
    {
        struct vm_area_struct *vma = find_vma(current->mm, address);
        unsigned int flags = 0;
        unsigned char *pa;
        pgd_t *pgd;
        pud_t *pud;
        pmd_t *pmd;
        pte_t *ptep, pte;
        spinlock_t *ptl;
        struct page *page;
        struct mm_struct *mm = current->mm;

        page = follow_huge_addr(mm, address, flags & FOLL_WRITE);
        if (!IS_ERR(page)) {
            BUG_ON(flags & FOLL_GET);
            goto out;
        }

        page = NULL;
        pgd = pgd_offset(mm, address);
        if (pgd_none(*pgd) || unlikely(pgd_bad(*pgd)))
            goto no_page_table;

        pud = pud_offset(pgd, address);
        if (pud_none(*pud))
            goto no_page_table;
        if (pud_huge(*pud)) {
            BUG_ON(flags & FOLL_GET);
            page = follow_huge_pud(mm, address, pud, flags & FOLL_WRITE);
            goto out;
        }
        if (unlikely(pud_bad(*pud)))
            goto no_page_table;

        pmd = pmd_offset(pud, address);
        if (pmd_none(*pmd))
            goto no_page_table;
        if (pmd_huge(*pmd)) {
            BUG_ON(flags & FOLL_GET);
            page = follow_huge_pmd(mm, address, pmd, flags & FOLL_WRITE);
            goto out;
        }
        if (unlikely(pmd_bad(*pmd)))
            goto no_page_table;

        ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
        pte = *ptep;
        if (!pte_present(pte))
            goto no_page;
        if ((flags & FOLL_WRITE) && !pte_write(pte))
            goto unlock;

        page = vm_normal_page(vma, address, pte);
        if (unlikely(!page)) {
            if (flags & FOLL_DUMP)
                goto bad_page;
            page = pte_page(pte);
        }
        if (flags & FOLL_GET)
            get_page(page);

        /* kernel virtual address of the frame plus the offset within the page */
        pa = (unsigned char *)page_address(page);
        pa += address & (PAGE_SIZE - 1);

        printk("<0>[%s]: arg VA=[%lx]\n", __func__, address);
        printk("<0>[%s]: PGD(%p)=%lx\n", __func__, pgd, (unsigned long)pgd_val(*pgd));
        printk("<0>[%s]: PUD(%p)=%lx\n", __func__, pud, (unsigned long)pud_val(*pud));
        printk("<0>[%s]: PMD(%p)=%lx\n", __func__, pmd, (unsigned long)pmd_val(*pmd));
        printk("<0>[%s]: PTE(%p)=%lx\n", __func__, ptep, (unsigned long)pte_val(*ptep));
        printk("<0>[%s]: result PA=[%p], data=%d\n", __func__, pa, *(int *)pa);

    unlock:
        pte_unmap_unlock(ptep, ptl);
    out:
        return 0;

    bad_page:
        pte_unmap_unlock(ptep, ptl);
        return -EFAULT;

    no_page:
        pte_unmap_unlock(ptep, ptl);
        /* fall through: no resident page for this address */
    no_page_table:
        return -EFAULT;
    }

Virtual address translation result

  (Figure: output of the translation; the annotation marks the offset in the page frame.)
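
Extending the system call implementation (1/9)

  Passing arguments
  Analysis of existing system calls
  Getting kernel information
  Implementing a system call with module programming => see the module programming chapter

  Just Do It (百見不如一打: typing it once beats seeing it a hundred times)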
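
Extending the system call implementation (2/9)

  Passing arguments: show_mult(arg1, arg2, result)

  1. Allocate a new system call number: 192
  2. Register the new system call function in sys_call_table[]
  3. Implement the new system call function in the kernel

    #include <linux/unistd.h>
    #include <linux/kernel.h>
    #include <linux/errno.h>
    #include <linux/uaccess.h>

    asmlinkage int sys_show_mult(int x, int y, int __user *res)
    {
        int compute;

        /* access_ok() returns nonzero when the range is acceptable */
        if (!access_ok(VERIFY_WRITE, res, sizeof(*res))) {
            printk("error in cdang\n");
            return -EFAULT;
        }

        compute = x * y;
        printk("compute is %d\n", compute);

        /* copy_to_user() returns the number of bytes it could not copy */
        if (copy_to_user(res, &compute, sizeof(int)))
            return -EFAULT;
        return 0;
    }

  4. Compile the kernel and reboot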
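
Extending the system call implementation (3/9)

  Passing arguments: show_mult(arg1, arg2, result)

  1. User-level application

    #include <stdio.h>
    #include <unistd.h>
    #include <linux/unistd.h>

    int main(void)
    {
        int mult_ret = 0;
        int x = 2, y = 5;
        int i;

        /* the number must match the one allocated for show_mult in the kernel */
        i = syscall(192, x, y, &mult_ret);
        if (i < 0)
            perror("show_mult");

        printf("x is %d\ny is %d\nret is %d\n", x, y, mult_ret);
        return 0;
    }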
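
Extending the system call implementation (4/9)

  Getting kernel information: gettaskinfo(), header

    /* mystat.h: shared by the kernel function and the user application */
    #ifdef __KERNEL__
    #include <linux/kernel.h>
    #include <linux/sched.h>
    #include <linux/slab.h>
    #include <linux/uaccess.h>
    #include <linux/fs.h>
    #include <linux/fdtable.h>
    #else
    #include <sys/types.h>   /* pid_t for the user application */
    #endif

    struct mystat {
        pid_t pid;
        pid_t ppid;
        int stat;
        int priority;
        int policy;
        long utime;
        long stime;
        long starttime;
        unsigned long min_flt;
        unsigned long maj_flt;
        long open_files;
    };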

Extending the system call implementation (5/9)

  Getting kernel information: gettaskinfo(), kernel function

    #include "mystat.h"

    asmlinkage int sys_gettaskinfo(int id, struct mystat __user *user_buf)
    {
        struct mystat *buf;
        struct task_struct *search;
        long ret;
        int i, cnt = 0;

        /* never dereference the user pointer directly; only test it here */
        if (!user_buf)
            return -EINVAL;

        buf = kmalloc(sizeof(struct mystat), GFP_KERNEL);
        if (buf == NULL) {
            printk(KERN_ERR "[SM] buf is NULL\n");
            return -ENOMEM;
        }

        rcu_read_lock();   /* pid_task() must be called under RCU */
        search = pid_task(find_vpid(id), PIDTYPE_PID);
        if (!search) {
            rcu_read_unlock();
            kfree(buf);
            return -ESRCH;
        }
        printk(KERN_ERR "search pid: %d\n", search->pid);

        buf->pid = search->pid;
        buf->ppid = search->parent->pid;
        buf->stat = search->state;
        buf->priority = search->prio;
        buf->policy = search->policy;
        buf->utime = search->utime;
        buf->stime = search->stime;
        buf->starttime = search->start_time.tv_sec;
        buf->min_flt = search->min_flt;
        buf->maj_flt = search->maj_flt;

        /* count the open files among the first 32 descriptors */
        for (i = 0; i < 32; i++)
            if (search->files && search->files->fd_array[i] != NULL)
                cnt++;
        buf->open_files = cnt;
        rcu_read_unlock();

        ret = copy_to_user(user_buf, buf, sizeof(struct mystat));
        printk(KERN_ERR "[SM] copy_to_user return: %ld\n", ret);
        kfree(buf);
        return ret ? -EFAULT : 0;
    }

Extending the system call implementation (6/9)

  Getting kernel information: gettaskinfo(), user-level application

    #include <stdio.h>
    #include <unistd.h>
    #include <stdlib.h>
    #include "mystat.h"

    int main(int argc, char *argv[])
    {
        int task_number;
        struct mystat *mybuf;

        if (argc != 2) {
            printf("usage : a.out pid\n");
            exit(1);
        }
        task_number = atoi(argv[1]);

        mybuf = malloc(sizeof(struct mystat));
        printf("mybuf size : %zu\n", sizeof(struct mystat));
        if (mybuf == NULL)
            exit(1);

        /* the number must match the one allocated for gettaskinfo in the kernel */
        if (syscall(319, task_number, mybuf) < 0) {
            perror("gettaskinfo");
            free(mybuf);
            exit(1);
        }

        printf("pid is %d\n", (int)mybuf->pid);
        printf("ppid is %d\n", (int)mybuf->ppid);
        printf("state is %d\n", (int)mybuf->stat);
        printf("policy is %d\n", (int)mybuf->policy);
        printf("file count is %ld\n", mybuf->open_files);
        printf("start time is %ld\n", mybuf->starttime);

        free(mybuf);
        return 0;
    }

Extending the system call implementation (7/9)

  Analysis of existing system calls (simplified pseudocode)

  getpid
    asmlinkage int sys_getpid()
    {
        return current->tgid;
    }
    All tasks are connected in a doubly linked list (next_task, next_run).
    Global variables: init_task, current
    task[0]: init_task, task[1]: init process

  nice
    asmlinkage int sys_nice(new_priority)
    {
        current->priority = new_priority;
    }

  pause
    asmlinkage int sys_pause()
    {
        current->state = TASK_INTERRUPTIBLE;
        schedule();
    }

Extending the system call implementation (8/9)

  Analysis of existing system calls: fork

  sys_fork()                  /* arch/i386/kernel/process.c */
    do_fork()                 /* kernel/fork.c */
      - p = alloc_task_struct()
      - task structure initialize
      - copy_mm()
      - copy_thread()
      - wake_up_process(p)
      - return (p->pid)

  copy_thread()               /* arch/i386/kernel/process.c */
      - p->tss.eax = 0;
      - p->tss.eip = ret_from_fork;

  wake_up_process()           /* kernel/sched.c */
      - add_to_runqueue(p);
      - current->need_resched = 1

  ret_from_sys_call()         /* arch/i386/kernel/entry.S */
  schedule()                  /* kernel/sched.c */
      if (schedule parent) ... else (schedule child) ...

Extending the system call implementation (9/9)

  Analysis of existing system calls: exit

  sys_exit()                  /* kernel/exit.c */
    do_exit()                 /* kernel/exit.c */
      - sem_exit()
      - exit_mmap()
      - free_page_tables()
      - exit_files()
      - exit_thread()
      - handling each child process
      - current->state = TASK_ZOMBIE
      - schedule()

  notify_parent()             /* kernel/signal.c */