저작자표시 - 비영리 - 변경금지 2.0 대한민국 이용자는아래의조건을따르는경우에한하여자유롭게 이저작물을복제, 배포, 전송, 전시, 공연및방송할수있습니다. 다음과같은조건을따라야합니다 : 저작자표시. 귀하는원저작자를표시하여야합니다. 비영리. 귀하는이저작물을영리목적으로이용할수없습니다. 변경금지. 귀하는이저작물을개작, 변형또는가공할수없습니다. 귀하는, 이저작물의재이용이나배포의경우, 이저작물에적용된이용허락조건을명확하게나타내어야합니다. 저작권자로부터별도의허가를받으면이러한조건들은적용되지않습니다. 저작권법에따른이용자의권리는위의내용에의하여영향을받지않습니다. 이것은이용허락규약 (Legal Code) 을이해하기쉽게요약한것입니다. Disclaimer
工學博士學位請求論文 분산이기종컴퓨팅시스템을위한휴리스틱기반태스크스케줄링알고리즘에관한연구 A Study on Heuristic Based Task Scheduling Algorithms for Distributed Heterogeneous Computing Systems 2010 年 8 月 仁荷大學校大學院 電子工學科 ( 情報工學專攻 ) 尹完五
工學博士學位請求論文 분산이기종컴퓨팅시스템을위한휴리스틱기반태스크스케줄링알고리즘에관한연구 A Study on Heuristic Based Task Scheduling Algorithms for Distributed Heterogeneous Computing Systems 2010 年 8 月 指導敎授崔相昉 이論文을博士學位論文으로提出함 仁荷大學校大學院 電子工學科 ( 情報工學專攻 ) 尹完五
이論文을尹完五의博士學位論文으로認定함 2010 年 8 月 主審 副審 委員 委員 委員
- i -
- ii -
- iii -
- iv -
- v -
- vi -
- vii -
- viii -
_ _ _ _ - ix -
- x -
- xi -
- xii -
- xiii -
- xiv -
- 1 -
- 2 -
- 3 -
- 4 -
- 5 -
- 6 -
sub-task decomposition, dependence analysis scheduling, mapping execute subtask processor 1 subtask Application specification subtask subtask subtask subtask executable program subtask subtask processor 2 processor 3 subtask subtask - 7 -
Distributed Computing System Distributed Homogeneous Computing System Distributed Heterogeneous Computing System Dedicated Distributed Heterogeneous Computing System Shared Distributed Heterogeneous Computing System - 8 -
- 9 -
Meshes 1D 2D 3D Hypercubes Acyclic Linear network 5 X 4 Grid Part of acyclic torus 3D Cyclic Ring 5 X 4 Cyclic grid 2 X 2 X 2 Cyclic torus 4D - 10 -
Scheduling architectures Centralized Hierarchical Decentralized - 11 -
Parallel Program Independent Tasks Task Graph Non-DAG DAG Task Interaction Graph(TIG) Iterative Task Graph(ITG) - 12 -
- 13 -
v 1 v 1 v2 v3 v2 v3 v 1 v 2 v4 v5 v 3 v4 v5 v 6 v4 v5 v 6-14 -
- 15 -
- 16 -
// General List Scheduling Algorithm Calculate the priority of each task according to some predefined function PriorityList = {,,..., } is sorted by descending order of task priorities while PriorityList is not empty do Remove the first task from the PriorityList and assign it to an appropriate processor in order to optimize a predefined cost function endwhile - 17 -
max min // HCPT Algorithm Traverse the graph downward and compute for each task Traverse the graph upward and compute for each task s on the stack in reverse order of their while is not empty do if there is an unlisted parent of then the parent node on else the and enqueue it to endif endwhile while not the end of do dequeue from for each processor in the processor set do Compute Assign task to the processor that minimizes of endfor endwhile - 18 -
// HEFT Algorithm Compute for all tasks by traversing graph upward, starting from the exit task Sort the tasks in a scheduling list by decreasing order of value while there are unscheduled tasks in the list do Select the first task, from the list for scheduling for each processor in the processor-set do Compute value using the insertion-based scheduling policy Assign task to the processor that minimizes of task endfor endwhile - 19 -
- 20 -
//CPOP Algorithm Compute for all tasks by traversing graph upward, starting from the exit task Compute for all tasks by traversing graph downward, starting from the start task Compute for each task in the graph. where is start node. where is the set of tasks on critical path while is not exit task do Select where and endwhile Select the critical path processor ( ) which minimizes Initialize the priority queue with the start task while there is an unscheduled task in the priority queue do Select the highest priority task from priority queue if then else endif endwhile Assign the task on Assign the task to the processor which minimizes the Update the priority queue with successors of. if they become ready tasks - 21 -
//GCA Algorithm Initially, construct an array of Boolean and a stack on top of while is not empty do Peek task on the top if all are, for all or node is then node from top of and put into scheduling list else for each task, where do if = then Put _ into container endif endfor endif nodes from into by non-decreasing order according to their _ Reset empty endwhile - 22 -
//LMT Algorithm Procedure : LMT Level sort tasks for each level, in order, do Assign tasks in level using Min-Time endfor Procedure : Min-Time for each task do Let = average value of for all possible Adjust average values based upon when results will be needed endfor Sort groups in reverse order by for each task in sorted order do Find such that processor does not have a task assigned to it and is minimal Assign task to processor endfor - 23 -
log - 24 -
log //PETS Algorithm Level sort tasks for all tasks at each level do Compute, and Tie, if any is broken based on value, the task with minimum value Receives the higher priority followed by the task with next minimum value and so on endfor Construct a priority queue using while there are unscheduled tasks in the queue do Select the first task, form the priority queue for scheduling for each processor in the processor set do Compute value using insertion based upon scheduling policy endfor Assign the task to processor, which minimizes the endwhile - 25 -
- 26 -
_ _ - 27 -
_ _ - 28 -
- 29 -
_ - 30 -
v 1 v 1 v 2 v 3 v 2 v 3 v 4 v 5 v 4 v 5 v 6 v 6 _ - 31 -
_ _ _ - 32 -
v s start node, pred( vs) = 0 v2 v3 parent node, pred( v5) v4 v5 v6 v 7 v 8 child node, succ( v5) v e exit node, succ( ve) = 0-33 -
- 34 -
- 35 -
Top vs Critical Path t _ level( vi) vi b _ level( vi) ve Bottom _ _ _ _ //Compute _ of Insert in inverse topological order into sequential list for each in do, _ _ for each do if _ then _, _ _ endif _ endfor endfor _ _ _ _ - 36 -
_ max _ _ _ _ max _ //Compute _ of Insert in topological order into sequential list // for each in do for each do if _ then _ endif _ endfor endfor _ _ _ _ _ _ _ - 37 -
i f max max max _ _ _ - 38 -
_ _ _ - 39 -
_ v s RP CP= RP1 2 v2 v3 ORPN v4 v5 v6 v e _ - 40 -
v s v2 v 3 v 4 v 5 v6 v 7 v8 v9 v e _ _ _ _ _ _ _ _ _ _ - 41 -
14 16 9 13 13 19 18 16.7 11 13 19 14.3 13 8 17 12.7 12 13 10 11.7 13 16 9 12.7 7 15 11 11 5 11 14 10 18 12 20 16.7 21 7 16 14.7 _ 108.1 105 98.7 93.1 77 - - 42 -
Iteration Stack Stack list Checked parents 1,,, - 2,, - 3,, - - 4, - 5-6, - 7 - - 8, - 9-10, - 11-12, - - 13 - - 43 -
//Priority Decision Phase Set the computation costs of nodes and communication costs of edges with mean values Compute _ and for each node according to Eqns. (2.4) and (4.7) Determine using Build using while is not empty do if there is an unlisted parent of then if then else endif else if then enqueue to endif endif endwhile - 44 -
node p1 p2 p3 p1 p2 p3 p1 p2 p3-45 -
- 46 -
_ _ _ _ _ _ _ _ _ _ _ _ _ - 47 -
Processor Group Node Processor selected 0 14 0 16 0 9 - - 27 40 27 46 9 27 - - 21 32 21 34 27 46 - - 32 39 55 70 55 66 - - 39 52 18 26 27 44 - - 39 51 26 39 27 37 - - 50 68 50 62 49 69 62 - - - 43 55 - - - 55 39 52 55 71 27 36 - - 53 58 55 66 53 67 - - 68 89 69 76 69 85 - - _ _ _ _ - 48 -
_ _ _ _ _ - 49 -
p1 p2 p3 p1 p2 p3 vs vs v s v s v 2 v 2 v 3 v 3 v 2 v 2 v 7 v 3 v v7 7 p1 p2 p3 p1 p2 p3 v s v s v 3 v 4 v 2 v 3 v 4 v 2 v 7 v 4 v 7 v 5 v 4 v 9 v 9 v 9-50 -
p1 p2 p3 p1 p2 p3 v s v s v 3 v 4 v 2 v 3 v 4 v 2 v 7 v 5 v 7 v 5 v 6 v 9 v 6 v 9 v 8 v 6 p1 p2 p3 p1 p2 p3 v s v s v 3 v 7 v 4 v 5 v 2 v 6 v 3 v 7 v 4 v 5 v 2 v 6 v 9 v 9 v 8 v 8 v e v e v e v e - 51 -
p1 p2 p3 p1 p2 p3 v s v s v 3 v 7 v 4 v 2 v 5 v 3 v 7 v 4 v 2 v v5 6 v 6 v 8 v 9 v 8 v 9 v e v e - 52 -
//Look-ahead Insertion Policy while there are ungrouped nodes in the list do if current node has a dependence with a node in and then Add to node to else,, add to node to endif endwhile while there are unscheduled nodes in the do if ( == 2) then // for each processor in the processor-set( ) do // Compute _ value using the insertion-based policy // endfor Assume assigning node _ to the processor that _ of node _ for each processor in the processor-set( ) do // Compute _ value using the insertion-based policy // endfor _ _ if ( and _ _ ) then Assign node _ to the processor Compute _ // _ if ( ) then Assign node _ to the processor Assign node _ to the processor else Assign node _ to the processor Assign node _ to the processor endif endif else for each processor in the processor-set( ) do // Compute _ value using the insertion-based policy // endfor endif endwhile Assign node _ to the processor that _ of node _ - 53 -
p1 p2 p3 v s p1 p2 p3 v s v 2 v3 v4 v v 5 6 v 5 v 4 v 3 v 2 v 7 v 6 v 8 v 9 v 8 v 7 v 9 v e v e - 54 -
- 55 -
v s v2 v 3 v 4 v 5 v6 v 7 v8 v9 v e - 56 -
max - 57 -
1 13 0 97.01 110.01 1 16.67 18 62.34 97.01 1 14.33 12 62.34 88.67 2 2 12.67 9 62.34 84.01 5 11.67 11 62.34 85.01 4 12.67 14 58.67 85.34 3 11 23 28.34 62.34 2 3 10 20.33 28.34 58.67 3 16.67 17.33 28.34 62.34 1 4 14.67 13.67 0 28.34 1-58 -
pk p k + 1 ect( v, p ) r k v r v j est( v, p ) = ect( v, p ) i k + 1 j k + 1 LDRT ( v, p ) i k v i ect( vi, p k + 1) v r + 1 est( v, p ) = ect( v, p ) i k r+ 1 k ect( v, p ) i k v i - 59 -
p k p k + 1 v r v j est( v, p ) = ect( v, p ) i k + 1 j k + 1 ect( v, p ) r k est( v, p ) = LDRT ( v, p ) i k i k ect( v, p ) i r+ 1 k est( v, p ) k v i ITS ( vr, v r + 1) k v i ect( vi, p k + 1) v r + 1 ect( v, p ) r+ 1 k max - 60 -
p k p k + 1 pk p k + 1 v j v j v r v r LDRT ( v, p ) est( v, p ) i r+ 1 k k ITS ( vr -1, vr ) k LDRT ( v, p ) i k ITS ( vr, v r + 1) k ect( v, p ) r+ 1 k v r + 1 v r + 2 est( v, p ) r+ 1 r+ 1 k ect( v, p ) k v r + 1 v r + 2-61 -
min min max - 62 -
Processors Ordered Tasks Processor selected 0 14 0 16 0 9 Null 3 27 40 27 46 9 27 1 3 21 32 21 34 27 46 1 1 32 45 23 39 27 36 1 3 32 44 20 33 36 46 1 2 32 45 33 41 36 53 1 2 32 39 55 70 55 66 3 1 64 82 43 55 64 84 2,4,5 2 68 73 55 66 68 82 2,4,6 2 77 98 66 73 77 93 7,8,9 2 log - 63 -
//Level Sorting Phase Level Sort the //number of level //Task Prioritizing Phase while (!= 0) do for each nodes at each level do endfor Compute, and Compute Construct a priority queue( ) using endwhile //Processor Selection Phase while there are unscheduling nodes in the do Select the highest priority task, from the for scheduling for each processor in the processor set do endfor Compute value using the MIP scheduling policy Assign the node to the processor which minimizes the of node endwhile p1 p2 p3 p1 p2 p3 p1 p2 p3 v s v s v s v 3 v 7 v 5 v 4 v 2 v 6 v 2 v 3 v 7 v 4 v 6 v 5 v 3 v 7 v 4 v 5 v 2 v 6 v 9 v 9 v 8 v 8 v 8 v e v 9 v e v e - 64 -
- 65 -
// Task Prioritization Phase Compute for all tasks by traversing graph upward Sort the tasks in a scheduling list by non-increasing order of values // Processor Selection Phase while there are unscheduled tasks in a scheduling list do Select the first task in a scheduling list for each processor in do Get_Latest_Data_Receive_Time(, ) endfor if _ and Duplicate_mode == true then endif Duplicate on processor Schedule to the that provides the _ endwhile // // // // // - 66 -
v s 20 14 22 v2 v 3 v 4 v 5 v6 v 7 v8 v9 12 20 19 16 25 10 17 16 14 12 13 25 16 23 21 23 24 12 21 17 17 12 12 10 16 23 17 v e 18.67 115.00 1 17.00 82.00 2 17.00 80.33 3 15.67 76.67 4 16.67 62.67 6 20.00 73.00 5 19.67 47.33 8 18.33 49.00 7 11.33 33.00 9 18.67 18.67 10-67 -
pk p k + 1 p k p k + 1 v r Dupl. v r v r v i ect ( v, p ) dup i k v i v i ect( v, p ) ect( v, p ) i k i k - 68 -
pk p k + 1 p k p k + 1 v r v r v i ect( v, p ) ect( v, p ) i k Dupl. v r i k [ ] PA p k [ ] PA p k v i ect ( v, p ) dup i k - 69 -
Procedure : Get_Latest_Data_Receive_Time(Node, Processor ) begin, for each parent of do // if is scheduled on then if then = endif else if + > then = + endif endfor endprocedure - 70 -
Procedure : TaskInsertion(Node, Processor ) begin InsertionMode = false if then return endif for each node already scheduled on, from last to first scheduled do// if < then if then if - then = InsertionMode = true endif else if - then = InsertionMode = true endif endif endfor = + endprocedure - 71 -
p k p k + 1 pk p k + 1 p k + 1 p k p k + 1 p k + 1 ( ) CIP v i ( ) CIP v i ( ) CIP v i LDRT ( v, p ) i k v r LDRT 2( vi, pk ) v r v r PA[ p k ] PA[ p k ] LDRT ( v, p ) i k PA[ p k ] LDRT 2( vi, pk ) LDRT ( v, p ) i k - 72 -
Procedure : TaskDuplication(Node, Processor ) begin //CASE 1 if or is already scheduled on then Duplicate_mode = false, = + endif //CASE 2 if and then Get_Latest_Data_Receive_Time(, ) // TaskInsertion(, ) // if InsertionMode == true then = + else if then Duplicate_mode = true, = + else Duplicate_mode = false, = + endif endif endif //CASE 3 if then Get_Latest_Data_Receive_Time(, ) TaskInsertion(, ) if InsertionMode == true then // // else endif endif Duplicate_mode = false, = + if then else endif endprocedure Duplicate_mode = true if then = + else = + endif Duplicate_mode = false, = + - 73 -
Processor selected Status 0 20 0 14 0 22 20 32 14 34 22 41 Dupl. 32 48 14 39 22 32 Dupl. 32 49 14 30 32 46 32 48 30 53 32 53 48 60 30 43 32 57 48 69 52 69 52 69 48 71 48 72 32 44 Inserted 56 68 43 55 69 79 69 85 72 95 69 86 Dupl. - 74 -
p1 p2 p3 p1 p2 p3 vs vs vs vs vs v s v 2 v 2 v 2-75 -
p1 p2 p3 p1 p2 p3 v s v s v s vs vs v s v 2 v 3 v 3 v v 4 2 v3 v 3 v v 4 4-76 -
p1 p2 p3 p1 p2 p3 v s v 2 v s v 4 v s v 3 vs vs v 4 v2 v s v 3 v 6 v6 v6 v 6 v 5 v 5 v 5-77 -
p1 p2 p3 p1 p2 p3 v s v s v s vs vs v s v 4 v2 v 3 v 4 v2 v 3 v 6 v 5 v 6 v 5 v 7 v 8 v8 v8 v 7 v 7 v 8 v 7-78 -
- 79 -
p1 p2 p3 p1 p2 p3 v s v4 v2 v v 5 6 v 9 v 9 v s v s v 3 v 7 v 8 v s v s v4 v2 v v 5 6 v 9 v 8 v s v 3 v 7 v v8 8 v 9 v e v e v e - 80 -
p1 p2 p3 p1 p2 p3 v s v s v4 v2 v v 5 6 v 9 v 8 v s v 3 v 7 v 8 vs v 2 v 6 v 9 vs v 3 v v 7 5 v s v 4 v 8 v e v e p p1 p2 p3 p1 p2 3 v s v s v s v s v 2 v 6 v 3 v 5 v 4 v 4 v 2 v 6 v 3 v 7 v 9 v 8 v 5 v 9 v 7 v 8 v e v e - 81 -
- 82 -
min - 83 -
min - 84 -
- 85 -
- 86 -
- 87 -
1 2 3 4 1 0 1 1 1 2 0 0 0 1 3 0 0 0 1 4 0 0 0 0 1 2 3 4 2 4 3 4 4-88 -
for for i f - 89 -
- 90 -
- 91 -
for to do : { for to do endfor } for to do : { for to do endfor endfor endfor } - 92 -
T 1,1 T1,2 T1,3 T1,4 T1,5 T1,6 T 2,2 T2,3 T2,4 T2,5 T2,6 T 3,3 T3,4 T3,5 T3,6 T 4,4 T4,5 T4,6 T 5,5 T 5,6-93 -
log FFT( ) =length( ) if () then return ( ) endif for to do endfor return( ) - 94 -
- 95 -
- 96 -
- 97 -
- 98 -
CPOP HCPT HEFT GCA Combined better 59982 63743 54318 59710 79.2% equal 7514 5031 7618 5891 8.7% worse 7504 6226 13064 9399 12.1% total 75000 75000 75000 75000 100% - 99 -
- 100 -
CPOP HCPT HEFT GCA Combined better 94741 99641 79743 90160 81% equal 6538 4398 9351 9507 6.6% worse 11221 8461 23406 12833 12.4% total 112500 112500 112500 112500 100% - 101 -
- 102 -
- 103 -
- 104 -
- 105 -
- 106 -
- 107 -
HCPT HCPT_LIP GCA GCA_LIP HEFT HEFT_LIP PETS PETS_LIP Original With LIP better 1942 16960 2899 10823 305 5680 968 5020 6.79% 42.77% equal 3598 8778 16515 16512 50.44% - 108 -
- 109 -
- 110 -
- 111 -
- 112 -
PETS HPS LMT Combined better 80625 84644 97126 77.7% equal 13349 11652 6252 9.3% worse 18526 16204 9122 13% total 112500 112500 112500 100% - 113 -
- 114 -
- 115 -
- 116 -
- 117 -
- 118 -
- 119 -
- 120 -
- 121 -
Node Algorithm HIDS HCPFD DPD HCT dup. ins. 50 12.53 5.19 7.79 7.32 7.01 100 18.63 13.31 12.16 11.07 10.97 300 32.60 42.64 24.56 21.84 22.14 500 39.82 50.12 35.76 33.89 32.23 750 53.82 69.46 52.15 50.74 46.93 HCPFD DPD HCT Combined better 110857 109139 108841 97.4% equal 1189 2128 1588 1.5% worse 454 1233 2071 1.1% total 112500 112500 112500 100% - 122 -
HRPS HCPT HEFT CPOP GCA Time complexity ELTS LMT PETS HPS Time complexity log log HIDS HCPFD DPD HCT Time complexity - 123 -
- 124 -
- 125 -
- 126 -
- 127 -
- 128 -
- 129 -
- 130 -
- 131 -
- 132 -
- 133 -
- 134 -