프로젝트 1 Linux Task Management 단국대학교컴퓨터학과 2009 백승재 ibanez1383@dankook.ac.kr k k http://embedded.dankook.ac.kr/~ibanez1383
Linux 의 task 개념이해 강의목표 Task 자료구조파악 Linux 의 task 관리방법파악
Task, Process and Thread 3 Process 실행상태에있는프로그램의 instance... 자원소유권의단위 Thread 디스패칭의단위, 실행흐름... 수행의단위 In Linux source code Process is Task and Thread is also Task
Linux Task Model 4 User Level Internal structure Process A Process B Process C : Process : Thread : task_struct struct 구조체 : Process Discriptor(8KB) ( 프로세스디스크립터 + 커널모드스택 ) : 커널모드스택 Kernel Level task_structstruct task_structstruct task_structstruct task_struct task_struct task_struct task_struct Kernel Thread A task_struct Kernel Thread B task_struct Scheduler
Flow controls Task 관련함수흐름 5 User App Library System call fork() fork() clone() sys_clone() vfork() vfork() vfork() sys_vfork() clone() clone() clone() sys_fork() pthread_create() pthread_create() clone() User Level Kernel Level do_fork() Using strace, ltrace, ctags and cscope kernel_thread()
Thread example(1/5) 6 _syscall0(pid_t, gettid); int main(void) { int pid; printf("before fork n n"); } if((pid = fork()) < 0){ printf("fork error n"); exit(-2); }else if (pid == 0){ printf("child, tgid=(%d), pid=(%d) n", getpid(), gettid()); }else{ } exit(0); i(0) printf("parent, tgid=(%d), pid=(%d) n", getpid(), gettid()); sleep(2);
_syscall0(pid_t, gettid); Thread example(2/5) 7 void *t_function(void *data) { int id; int i=0; pthread_t t_id; id = *((int *)data); } int main() { printf("child, tgid=(%d), pid=(%d), pthread_self()=(%d) n", (%d) getpid(), gettid(), pthread_self()); sleep(2); return (void *)(id*id); pthread_t p_thread[2]; int thr_id; int status; int a = 1; int b = 2; printf("before pthread_create() n"); if((thr_id = pthread_create(&p_thread[0], NULL, t_function, (void*)&a)) < 0){ perror("thread create error : "); exit(1); } if((thr_id = pthread_create(&p_thread[1], NULL, t_function, (void*)&b)) < 0){ perror("thread create error : "); exit(2); } pthread_join(p_thread[0], (void **)&status); printf("thread join : %d n", status); pthread_join(p_thread[1], (void **)&status); printf("thread thread join : %d n", status); } printf("parent, tgid=(%d), pid=(%d) n", getpid(), gettid()); return 0;
Thread example(3/5) 8
Thread example(4/5) 9 _syscall0(pid_t, gettid); int main(void) { pid_t pid; printf("before vfork n"); if((pid = vfork()) < 0){ printf("fork error n"); exit(-2); }else if (pid == 0){ printf("child child, tgid=(%d), pid=(%d) n", getpid(), gettid()); _exit(0); }else{ printf("parent, tgid=(%d), pid=(%d) n", getpid(), gettid()); } } exit(0);
Thread example(5/5) 10 _syscall0(pid_t, gettid); int sub_a(void *); int main(void) { int child_a_stack[4096], child_b_stack[4096]; printf("main:before clone n"); printf("parent, t tgid=(%d), pid=(%d) n", getpid(), gettid()); CLONE_THREAD option 추가시같은 tgid를가짐 clone (sub_a, (void *)(child_a_stack+4095), CLONE_VM, NULL); clone (sub_a, (void *)(child_b_stack+4095), CLONE_VM, NULL); } exit(0); int sub_a(void *arg) { printf("child, tgid=(%d), pid=(%d) n", getpid(), gettid()); exit(1); }
Task in Linux 11 POSIX interface 기반 구현은버전에따라약간씩다름 Linux 1.X User / kernel mode 지원 Thread 생성시 thread의 attribute를지정해줌으로써 thread mode를설정할수있음 Linux 2.X Kernel mode 만지원 (sys_clone) LinuxThreads라고불림 각 thread마다 task 자료구조할당 (MACH의 thread와는다름 ) POSIX incompatible 문제 (pid/tid, signal handling, ) Linux 2.6 NPTL (Native POSIX Thread Library)
Task in Linux 12 LinuxThreads vs NPTL
Task, Process and Thread 13 Task 는 사용자의요청을대리하며시스템자원을두고서로경쟁한다 자신의메모리공간 ( 코드, 변수, 스택 ) 과하드웨어레지스터를갖는다 태스크계층구조를갖는다 상태와전이를갖는다. /* test.c */ int glob = 6; char buf[] = a write to stdout n ; int main(void) { int var; pid_t pid; var = 88; write(stdout_fileno, buf, sizeof(buf)-1); printf( before fork n ); if ((pid = fork()) == 0) { /* child */ glob++; var++; } else sleep(2); /* parent */ printf( pid = %d, glob = %d, var = %d n, getpid(), glob, var); exit (0); } (Source : Adv. programming in the UNIX Env., pgm 8.1)
Process State Diagram(1/2) 14 New Exit Admit dispatch Release Ready Running Event Occurs Time-Out Event Wait Blocked Five-State Process Model (Source : OS(stallings))
Process State Diagram(2/2) 15 fork initial (idle) running exit zombie wait fork switch sleep, lock ready wakeup, unlock sleep swap swap suspended d ready suspended d sleep (Source : UNIX Internals)
Task State Diagram of LINUX(2.4) 16 create wait signal TASK_STOPPED signal TASK_ZOMBIE schedule exit TASK _ RUNNING (ready) TASK_RUNNING (running) preempt Event Occurs Event Wait TASK_INTERRUPTIBLE TASK_UNINTERRUPTIBLE
Task State Diagram of LINUX(2.6) 17 create TASK_STOPPED signal TASK_TRACED signal EXIT_ZOMBIE wait schedule exit EXIT_DEAD TASK _ RUNNING (ready) TASK_RUNNING (running) preempt Event Occurs Event Wait TASK_INTERRUPTIBLE TASK_UNINTERRUPTIBLE
문맥 (Context) : Context 커널이관리하는태스크의자원과수행환경집합 3 부분으로구성 : 시스템문맥 (system context), 메모리문맥 (memory context), 하드웨어문맥 (hardware context) system context task_structstruct segment table page table memory 18 file table fd thread structure eip eflags sp eax cs (TSS) hardware context swap or a.out disk memory context
System Context 19 태스크를관리하는정보집합 태스크정보 : pid, uid, euid, suid, 태스크상태 : 실행상태, READY 상태, 수면상태, 태스크의가족관계 : p_pptr, p_cptr, next_task, next_run 스케줄링정보 : policy, priority, counter, rt_priority, need_resched 태스크의메모리정보 : 세그먼트, 페이지 태스크가접근한파일정보 : file descriptor 시그널정보 쓰레드정보 그외 : 수행시간정보, 수행파일이름, 등등 (kernel dependent)
Task 와 file 20 파일관리자료구조 : fd, inode include/linux/sched.h task_struct... fs_struct files_struct... struct tfs_struct t{ atomic_t count; int umask; struct dentry *root, *pwd; } include/linux/dcache.h include/linux/sched.h struct files_struct { atomic_t count; int max_fds; struct file **fd; fd_set close_on_exec; fd_set open_fds; } include/linux/fs.h struct file { struct file *f_next, **f_pprev; struct dentry f_dentry;; struct file_operations *f_op; mode_t f_mode; loff_t f_pos; unsigned int f_count, f_flags;. struct dentry { int d_count, d_flags; struct inode *d_inode; struct dentry *d_parent;. unsigned char d_iname[] } include/linux/fs.h } struct inode {. }
Task 와 signal 21 시그널처리자료구조 task_structstruct. sig blocked signal sigpending. signal_struct count action[_nsig] siglock sigaction sa_handler sa_flags sa_restorer sa_mask sigset_t. 63 0 /* kernel/signal.c */ sys_signal(sig, handler) do_sigaction(sig, new_sa, old_sa) sigset _ t. 63 0 /* kernel/signal.c */ sys_kill(pid,sig) kill_proc_info(sig, info, pid) send_sig_info(sig, info, *t) sigaddset(t->signal, sig); t->sigpending = 1;
Task 연결구조 22 태스크연결구조 init_task task_struct task_struct task_struct task_struct next_task next_task next_task next_task... prev_task prev_task prev_task prev_task next_run prev_run next_run prev_run next_run prev_run run_queue where is run_queue? prev_task? next_task? 태스크가족관계 children.next task_structstruct parent children.prev parent, sibling.prev parent, sibling.next parent sibling.next sibling.next younger child child older child task_struct t t sibling.prev task_struct t t sibling.prev task_struct t t
Task 계층구조확인 (1/2) 23
Task 계층구조확인 (2/2) 24
Memory Context (1/5) 25 fork internal : compile results gcc test.c 0xffffffff 0xbfffffff kernel stack header text data bss stack movl %eax, [glob] addl %eax, 1 movl [glob], %eax... glob, buf var, pid 0x0 data text user s perspective (virtual memory) a.out : (ELF format) /*include/linux/elf.h /li / lf */
Memory Context (2/5) 26 fork internal : after loading (after run a.out) & before fork segment table memory task_struct pid = 11 (vm_area_struct) text var, pid stack data glob, buf In this figure, we assume that there is no paging mechanism.
Memory Context (3/5) 27 fork internal : after fork memory glob, buf task_struct pid = 11 segment data text var, pid stack task_struct pid = 12 segment data glob, buf var, pid stack address space : basic protection ti barrier
Memory Context (4/5) 28 fork internal : with COW (Copy on Write) mechanism after fork with COW after glob++ operation task_struct segment task_struct segment data pid = 11 text pid = 11 text stack stack task_struct segment task_struct segment pid = 12 data pid = 12 data memory memory
Memory Context (5/5) 29 새로운프로그램수행 : execve() internal memory task_struct segment data pid = 11 text stack text data a.out header textt data bss stack stack
Task 의 memory context 30 메모리관리자료구조 task_struct include/linux/sched.h, include/linux/mm.h, include/asm- i386/page.h mm_struct vm_area_struct text mm mmap map_count pgd... start_code, end_code start_data, end_data start_brk, brk start_stack arg_start, arg_end env_start, env_end; rss,... vm_start, vm_end vm_next vm_offset, vm_file vm_start, vm_end vm_next vm_offset, vm_file data swap or a.out pgd_t
Hardware Context (1/4) 31 brief reminds the 80x86 architecture ALU IN Control U. Registers OUT eip, eflags eax, ebx, ecx, edx, esi, edi, cs, ds, ss, es, fs,... cr0, cr1, cr2, cr3, LDTR, TR,...
Hardware Context(2/4) 32 stack 쓰레드자료구조 : CPU 추상화 stack 2 heap EIP heap data text movl $2, %eax pushl %eax addl %eax, %ebx ESP EAX.... registers 10 2 movl $10, %eax call func1 data text Address space for Task A Address space for Task B 다시 Task A 가수행되려면? 문맥교환 (Context Switch) thread structure
Hardware Context(3/4) 33 쓰레드자료구조 : CPU 추상화 stack stack 2 heap dt data EIP ESP EAX... heap data text movl $2, %eax pushl %eax addl %eax, %ebx Address space for Task A EIP ESP EAX registers EIP ESP EAX movl $10, %eax call func1 Address space for Task B text...... tss struct. for task A tss struct. for task B
Hardware Context(4/4) 34 문맥교환 (Context Switch) save context CPU restore context task TSS eip sp eflags eax ebx task TSS eip sp eflags eax ebx
Task 의 H/W context 35 쓰레드자료구조 : CPU 추상화 include/asm-i386/processor.h task_struct... struct thread_struct thread;... include/asm-arm/processor.h struct thread_struct { unsigned long r4; unsigned dlong r5; unsigned long r6; unsigned long r7; unsigned long r8; unsigned long r9; unsigned long sl; unsigned long fp; unsigned long pc;.. struct thread_struct { unsigned long esp0; unsigned short ss0; unsigned long esp1; unsigned short ss1; unsigned long esp2; unsigned short ss2; unsigned long cr3; unsigned long eip, eflags; unsigned long eax, ecx, edx, ebx; unsigned long esp; unsigned long ebp, esi, edi; unsigned short es, cs, ss, ds, fs, gs; unsigned short ldt;..
Kernel Stack 36 Thread union 커널은각태스크별로 8KB 메모리할당 thread_ info 구조체와 kernel stack alloc_thread_struct, free_thread_struct pt_regs CPU Stack Pointer Register stack current thread_info{ *task flags(need_resched ) } task_struct
Kernel mode 진입 37 INT, syscall kernel mode 로전환 control path : kernel mode 로진입하기위한일련의명령어 struct pt_regs? 현재 P 의상태를커널스택에저장하기위해사용 sys_open() 처럼인자가임의의개수인경우저장된 pt_regs구조체의정보중위에서부터차례로전달됨 sys_fork() 처럼 pt_regs를통째로받는경우도있음 변환한 IRQ 번호
Linux 의 Task 생성 38 프로세스관련 Interface fork(), vfork(), clone() do_fork() copy_process() dup_task_struct(): 새커널스택, 부모의 task_struct복사 get_pid() 를호출해새로운 PID할당 플래그에따라열린파일, 파일시스템정보, 시그널핸들러, 프로세스주소영역, namespace등을복제 / 공유 하위바이트는자식프로세스종료시부모프로세스에전달할시그널신호 ( 보통 SIGCHLD), 나머지 3바이트는아래와같음 CLONE_VM : 메모리디스크립터와모든 PT 공유 CLONE_FS : 루트 dir과 CWD나타내는 table과 umask 값공유 CLONE_FILES : 열린 file나타내는테이블공유 CLONE_PARENT : 새로만들어지는자식 P의부모를호출P의부모로설정 CLONE _PID : PID 공유 CLONE_PTRACE : ptrace() 로부모 P 추적중이라면, 자식 P도추적가능 CLONE_SIGHAND : 시그널핸들러나타내는 table공유 CLONE_THREAD : 자식P를부모P와같은스레드그룹에넣고, 자식P의 tgid필드도이에알맞게설정, CLONE_PARENT자동설정됨 CLONE_SIGNAL : CLONE_SIGHAND + CLONE_THREAD 멀티스레드 APP 에있는모든스레드에시그널송신가능 CLONE_VFORK : vfork() syst call서사용함 남은 time slice 를부모와자식간에분배 새로운프로세스의포인터반환
Linux 의 Task 제거 39 exit() do_exit() task_struct구조체의 flags멤버에 PF_EXITING플래그설정 exit_mm(), sem_exit(), exit_files(), exit_fs(), exit_namespace(), exit_sighand() 태스크의종료코드 task_struct 구조체의 exit_code 멤버 exit_notify() 부모에게시그널자식들의부모를같은스레드그룹의다른스레드나혹은 initit 프로세스로지정 고아라면? forget_original_parent() 를사용함 Reaper를프로세스의스레드그룹에있는다른태스크로설정 다른태스크가없으면 reaper 를 child_reaper 로설정 -> 이것이바로 initit 프로세스이다 태스크의상태를 TASK_ZOMBIE로설정 이후에는부모에게정보를전달하기위한커널스택과슬랩객체즉, thread_info와 task_struct구조체가남아있다. schedule() 함수를호출해새프로세스로스위칭
스케줄링기법 40 선점지원여부에따른분류 선점스케줄링 비선점스케줄링 우선순위변경지원여부에따른분류 정적우선순위스케줄링 동적우선순위스케줄링 구체적인스케줄링알고리즘 FIFO 스케줄링 다단계피드백큐스케줄링 SJT, SRT 스케줄링 EDF? RM? RR?...
Priority Inversion Problem 41 R/T TASK 1 TASK 2 TASK 3 3 1 3 2 TASK 3 1 해결책은? Priority inheritance task 3 의우선순위를 R/T task 1 의우선순위와같게하여수행후, R/T task 1이기다리는자원에대한처리가끝나면원래우선순위로복귀시킴 동적우선순위변경이가능해야한다
Multi-CPU 를위한리눅스특징 42 스케줄링 Run queue per each CPU T do_fork() Effective priority based O(1) scheduler Priority, affinity, interactivity, Load balancing wake_up_new_task() Run Queue schedule() T T CPU T Task list Run Queue schedule() T T T T T T T T CPU wake_up() load_balance() Run Queue schedule() T CPU
Linux Scheduling 43 Multi-level feed back queue based on priority Task의종류 실시간태스크 (SCHED_FIFO, SCHED_RR) 정적우선순위 0 ~ 99 일반태스크 (SCHED_OTHER) 동적우선순위 Nice -20 ~ 19 100 ~ 139 타임슬라이스 Interactivity를고려한 time slice와 priority의재계산 최소 10ms 기본 100ms 최대 200ms
Run Queue 44 실행큐 프로세서에있는실행가능한프로세스의목록 우선순위배열 O(1) scheduling 각실행큐는 2개의우선순위배열을가짐활성 (active), 비활성 (expired) 각우선순위레벨마다의실행가능프로세스큐를유지각큐에는해당우선순위레벨의실행가능한프로세스목록유지 우선순위비트맵사용 상수시간내에최고우선순위태스크선택가능 특정우선순위의태스크가 TASK_RUNNING상태가되면해당우선순위비트가 1로 set됨 sched_find_first_bit() d 이용첫번째 1 로 set 된비트찾음 같은우선순위끼리는라운드로빈
Run Queue 동작 45 타임슬라이스의재계산 기존의타임슬라이스재계산위한루프형태의루틴탈피단지, 활성과비활성배열을스위칭하는것으로끝남 schedule() 우선순위 7의실행가능태스크리스트를뒤짐 sched_find_first_set() queue[0] queue[1] queue[2] queue[7] 리스트의첫번째 프로세스를실행 queue[139] prio_array_t.bitmap prio_array_t.queue
schedule() 함수의호출 46 schedule() 함수 직접적인호출 (direct invocation) current process 가필요로하는자원을사용할수없어, 이 process 를당장블록해야하는경우 Process를 block 하려는 kernel routine의단계 current를적당한 wait queue에넣는다 current 의상태를 TASK_INTERRUPTIBLE 나 TASK_UNINTERRUPTIBLE 로만든다 schedule() 을호출 자원이사용가능한지검사하여불가하면 2단계로돌아감 사용가능하면 current 를대기큐에서제거 우회적인호출 (lazy invocation) current 의 need_resched 필드를 1 로설정해우회적호출함 쓰이는경우 current 가자신의 CPU 시간 quantum 을모두썼을때, update_process_times() 가설정함 임의의 process 가깨어났는데, 그 process 의우선순위가현재 process 의우선순위보다높을때. seched_setscheduler() 이나, sched_yield() 시스템콜호출시
CFS basic concept 47 If N tasks are on the Runqueue fair_clock(virtual clock) Real Clock * 1/N wait_runtime Waiting time of a task To sort tasks on the red-black tree fair_clock wait_runtime
2.6.23 부터도입된 CFS 스케줄러의자료구조 CPU CPU CPU runqueues struct rq q{ struct cfs_rq cfs; struct rt_rq rt; u64 clock, prev_clock_raw; u64 clock_max_delta; unsigned int clock_warps, clock_overflows; u64 tick_timestamp }; struct cfs_rq { struct load_weight load; unsigned long nr_running; u64 exec_clock; clock; u64 min_vruntime; struct rb_root tasks_timeline; struct rb_node *rb_leftmost; struct rb_node *rb_load_balance_curr; struct sched_entity entity *curr; unsigned long nr_spread_over; #ifdef CONFIG_FAIR_GROUP_SCHED struct rq *rq; struct list_head leaf_cfs_rq_list; #endif }; struct task_group *tg; struct rt_rq { struct rt_prio_array active; int rt_load_balance_idx; struct list_head *rt_load_balance_head, *rt_load_balance_curr; }; #define MAX_RT_PRIO MAX_USER_RT_PRIO struct rt_prio_array { DECLARE_BITMAP(bitmap, MAX_RT_PRIO+1); struct list_head queue[max_rt_prio]; };
task 와스케줄러관련구조체 struct task_struct { int prio, static_prio, normal_prio; struct list_head run_list; const struct sched_class *sched_class; struct sched_entity se; unsigned int policy; cpumask_t cpus_allowed; unsigned int time_slice; } struct sched_entity entity { struct load_weight load; struct rb_node run_node; unsigned int on_rq; u64 exec_start; u64 sum_exec_runtime; u64 vruntime; u64 prev_sum_exec_runtime; #ifdef CONFIG_FAIR_GROUP_SCHED FAIR GROUP struct sched_entity *parent; struct cfs_rq *cfs_rq; struct cfs_rq *my_q; #endif }; SCHED_NORMAL SCHED_BATCH SCHED_IDLE SCHED_FIFO SCHED_RR static const struct sched_class fair_sched_class = {.next = &idle_sched_class,.enqueue_task = enqueue_task_fair,.dequeue_task = dequeue_task_fair,.yield_task = yield_task_fair,.check_preempt_curr = check_preempt_wakeup,.pick_next_task = pick_next_task_fair,.put_prev_task = put_prev_task_fair, #ifdef CONFIG_SMP.load_balance = load_balance_fair,.move_one_task = move_one_task_fair, #endif.set_curr_task = set_curr_task_fair,.task_tick tick = task_tick_fair, tick.task_new = task_new_fair, }; const struct sched_class rt_sched_class = {.next = &fair_sched_class, class,.enqueue_task = enqueue_task_rt,.dequeue_task = dequeue_task_rt,.yield_task = yield_task_rt,.check_preempt_curr = check_preempt_curr_rt,.pick_next_task task = pick_next_task_rt, task.put_prev_task = put_prev_task_rt, #ifdef CONFIG_SMP.load_balance = load_balance_rt,.move_one_task = move_one_task_rt, #endif.set_curr_task = set_curr_task_rt,.task_tick = task_tick_rt, };
schedule() 함수의동작 /* kernel/schedc */ schedule() pick_next_task() if( RT 태스크가없으면 ) fair_sched_class의 pick_next_task() 함수호출 else RT class, fair class 순으로 pick_next_task() 함수를호출하여태스크선정
Test Code 51 #include <stdio.h> #include <unistd.h> #include <sched.h> int main(void) { struct sched_param param; int i, j; sched_getparam( 0, ¶m); printf(" nbefore set n"); printf(" Param.priority = %d n", param.sched_priority); printf(" Sched policy = %d n", sched_getscheduler(0)); 1 param.sched a ed_priority = 10; sched_setscheduler(0, SCHED_FIFO, ¶m); sched_getparam( 0, ¶m); } printf(" nfifo set n"); printf(" Param.priority = %d n",p param.sched_priority); printf(" Sched policy = %d n", sched_getscheduler(0)); 2 param.sched_priority = 20; sched_setscheduler(0, SCHED_RR, ¶m); sched_getparam( 0, ¶m); printf(" nrr set n"); printf(" Param.priority = %d n", param.sched_priority); printf(" Sched policy = %d n", sched_getscheduler(0)); return 0; 3 아래코드를화살표가가리키고있는세군데에각각넣고실행해보자. 실행도중키보드를누르면반응이있는가? 언제반응이있고, 언제없는가? for(i=0; i<100000; i++) for(j=0; j<100000; j++);
Context Switching 52 Context? 사용자주소공간 : code, data, stack, 공유메모리제어정보 : task_struct struct credentials : 보안유지위한정보, uid.gid 환경변수 H/W context : reg 값 진행과정 필요시 schedule() 가 context_switch() 호출 switch_mm() 을호출 이전프로세스의가상메모리매핑을새프로세스의것으로대체 switch_to() 를호출 프로세서상태를이전프로세스에서현재프로세스로전환 스택정보와프로세서레지스터값저장 / 복원
switch_to() in ARM (1/16) 53 Code Window Memory Dump Window CPU register window
switch_to() in ARM (2/16) 54
switch_to() in ARM (3/16) 55
switch_to() in ARM (4/16) 56
switch_to() in ARM (5/16) 57
switch_to() in ARM (6/16) 58
switch_to() in ARM (7/16) 59
switch_to() in ARM (8/16) 60
switch_to() in ARM (9/16) 61
switch_to() in ARM (10/16) 62
switch_to() in ARM (11/16) 63
switch_to() in ARM (12/16) 64
switch_to() in ARM (13/16) 65
switch_to() in ARM (14/16) 66
switch_to() in ARM (15/16) 67
switch_to() in ARM (16/16) 68