5 Collision Resolution by Progressive Overflow Progressive Overflow Linear Probing 51 How Progressive Overflow Works 기본개념 Collision 발생할때, 이후빈공간에삽입 ( 그림 104) End of file 일경우, 처음부터다시검색 ( 그림 105) Circular queue 의개념 Key 검색시, h(key) 수행 h(key) 에해당 Key 가없더라도, 계속검색 언제까지? 무한 loop의가능성? 영남대학교데이터베이스연구실 Algorithm: Chapter 10 (Page 20)
Key York Hash function Address 6 0 1 5 6 7 8 9 Novak Rosen Jasper Moreley York s home Address (busy) 2nd try (busy) 3rd try (busy) 4th try (open) York s actual address FIGURE 104 Collision i resolution with progressive overflow 영남대학교데이터베이스연구실 Algorithm: Chapter 10 (Page 21)
Key Blue 0 1 2 3 Hash routine Address 99 98 99 Jello Wrapping around FIGURE 105 Searching for an address beyond the end of a file 영남대학교데이터베이스연구실 Algorithm: Chapter 10 (Page 22)
52 Search Length 정의 주어진 Key 를검색하기위해필요한 disk 액세스수 그림 106 average search length = total search length total number of records Packing density 와의관계 ( 그림 107) 영남대학교데이터베이스연구실 Algorithm: Chapter 10 (Page 23)
Key Home Address Adams 20 Bates 21 Actual address 0 Home address Cole 21 20 Adams 20 1 Dean 22 Evans 20 21 22 23 24 25 Bates Cole Dean 21 21 22 Evans 20 5 Number of accesses needed to retrieve FIGURE 106 Illustration of the effects of clustering of a records As keys are clustered, the number of accesses required to access later keys can become large 영남대학교데이터베이스연구실 Algorithm: Chapter 10 (Page 24) 1 2 2
5 4 Average search length 3 2 1 20 40 60 80 100 Packing density FIGURE 107 Average search length versus packing density in a hashed file in which one record can be stored per address, progressive overflow is used to resolve collisions, and the file has just been loaded 영남대학교데이터베이스연구실 Algorithm: Chapter 10 (Page 25)
6 Storing More than One Record per Address: Buckets Bucket? A block of records that is retrieved in one disk access Hashing의결과로 bucket address 결정 Synonym 들을하나의 bucket 에저장 Bucket overflow? 영남대학교데이터베이스연구실 Algorithm: Chapter 10 (Page 26)
Key Home Address Green 30 Hall 30 Jenks 32 King 33 Land 33 Marx 33 Nutt 33 Bucket address 30 31 32 33 Green Hall Jenks Bucket contents King Land Marks (Nutt Is an overflow record) FIGURE 108 An illustration of buckets Each bucket can hold up to three records Only one synonym (Nutt) results in overflow 영남대학교데이터베이스연구실 Algorithm: Chapter 10 (Page 27)
61 Effects of Buckets on Performance Packing Density r packing density = bn b: number of records that fit in a bucket 예 N = 1,000 & r = 750 & b = 1 N = 500 & r = 750 & b = 2 각경우에대해 packing density 는동일 영남대학교데이터베이스연구실 Algorithm: Chapter 10 (Page 28)
성능향상 r/n 비율의증가 p(0) 의감소 Space utilization 증가 표 103 Overflow 발생확률감소 표 104 & 표 105 File without Buckets File with Buckets Number of records r = 750 r = 750 Number of addresses N = 1,000 N = 500 Bucket size b = 1 b = 2 Packing density 075 075 Ratio of records to address r/n = 075 r/n = 15 영남대학교데이터베이스연구실 Algorithm: Chapter 10 (Page 29)
TABLE 103 Poisson distributions for two different file organizations p(x) File without File with Buckets Buckets (r/n = 075) (r/n =15) 15) p(0) 0472 0223 p(1) 0354 0335 p(2) 0133 0251 p(3) 0033 0126 p(4) 0006 0047 p(5) 0001 0014 p(6) ) 0004 p(7) 0001 영남대학교데이터베이스연구실 Algorithm: Chapter 10 (Page 30)
TABLE 104 Synonyms y causing collisions as a percent of records for different packing densities and different bucket sizes Packing Bucket Size Density (%) 1 2 5 10 100 10 48 06 00 00 00 20 94 22 01 00 00 30 136 45 04 00 0 00 0 40 176 73 11 01 00 50 213 104 25 04 00 60 248 248 137 137 45 45 13 13 00 00 70 281 170 71 29 00 75 296 187 86 40 00 80 312 204 103 53 01 90 341 238 138 86 08 100 368 271 176 125 40 영남대학교데이터베이스연구실 Algorithm: Chapter 10 (Page 31)
TABLE 105 Average number of accesses required in a successful search hby progressive overflow Packing Bucket Size Density (%) 1 2 5 10 50 10 106 106 101 101 100 100 100 100 100 100 30 121 106 100 100 100 40 133 110 101 100 100 50 150 118 103 100 100 60 175 129 107 101 100 70 217 149 114 104 100 80 300 190 129 111 101 90 550 315 178 135 104 95 1050 1050 56 56 27 27 18 18 11 11 Adapted from Donald Knuth, The Art of Computer Programming, Vol 3, 1973, Addison-Wesley, Reading, Mass, Page 536 Reprinted with permission 영남대학교데이터베이스연구실 Algorithm: Chapter 10 (Page 32)
62 Implementation Issues Hashed Index는 fixed-length file B-Tree Index 를미리초기화 ( Empty 표시위해 ) Index page 구조와연관하여생각 삽입 / 삭제 / 갱신시 hash 함수와의연관관계고려 영남대학교데이터베이스연구실 Algorithm: Chapter 10 (Page 33)
Bucket Structure Counter field 필요 Empty slot 표현방법필요 An empty bucket : Two entries : A full bucket : 0 / / / / / / / / / / / / / / / / / / / / / / / / / 2 JONES ARNSWORTH / / / / / / / / / / / / / / / 5 JONES ARNSWORTH STOCKTON BRICE THROOP 영남대학교데이터베이스연구실 Algorithm: Chapter 10 (Page 34)
Initializing and Loading Initializing a File for Hashing 데이터를저장하기전에파일영역미리할당 & 초기화 Clustering의관점 record 의삽입및검색알고리즘이단순화 이렇게하지않을경우, 해결방법? Loading a Hash File 일반적인 fixed-length record file과의차이점 hash() 함수호출을통해서만데이터액세스 Key 검색및삽입과정 ( linear probing ) record 개수 > file 크기? 영남대학교데이터베이스연구실 Algorithm: Chapter 10 (Page 35)
7 Making Deletions Record 삭제시고려사항 Collision 으로인해 record 가다른위치에존재가능 Record 삭제후재배치필요 Free slot 은이후새로운 record 에할당가능하도록 예 : 그림 109 ~ 1010 영남대학교데이터베이스연구실 Algorithm: Chapter 10 (Page 36)
record Home address Actual address 4 Adams 5 5 5 Adams Jones 6 6 Morris 6 7 Smith 5 8 6 7 Jones Morris 8 Smith FIGURE 109 File organization before deletions 영남대학교데이터베이스연구실 Algorithm: Chapter 10 (Page 37)
4 5 6 7 8 Adams Jones Smith 5 6 7 8 Adams Jones # # # # # # Smith FIGURE 1010 FIGURE 101111 The same organization as in Fig 109, with Morris deleted The same file as in Fig 109 after the insertion of a tombstone for Morris 영남대학교데이터베이스연구실 Algorithm: Chapter 10 (Page 38)
71 Tombstones for Handling Deletions 기본개념 Record 삭제후, 삭제되었다는표시남기자 Free slot과는구분 예 : 그림 101111 장점 Search 중 tombstone 이발견되면, 계속 search Tombstone 위치에새로운 record 저장가능 주의사항 Searching delay ( 그림 1011에서 Smith 삭제시 ) 중복확인곤란 ( 그림 1011에서새로운 Smith 삽입 ) 영남대학교데이터베이스연구실 Algorithm: Chapter 10 (Page 39)
72 Effects of Deletions and Additions on Performance 문제점 거듭되는 deletion 의결과로, tombstone 과다존재 searching delay 초래 해결방안 Local reorganization at each deletion Complete reorganization after threshold 다른 collision resolution algorithm 사용 영남대학교데이터베이스연구실 Algorithm: Chapter 10 (Page 40)
8 Other Collision Resolution Techniques 81 Double Hashing 기본개념 Progressive overflow 의문제점 Hashing table에서 record cluster 발생가능 Collision i 발생시새로운함수로다시 hashinghi 장, 단점 Hashing table 에서 record 들을골고루분산 Overflow record의경우, 여러번의 disk seek 필요 영남대학교데이터베이스연구실 Algorithm: Chapter 10 (Page 41)
82 Chained Progressive Overflow Synonym 들을 pointer로연결 Overflow 시 synonym 들만 search 가능 Average search length가준다 ( 그림 1013 ) Home address 가다른 key 에할당될경우? two-pass loading Two-pass Loading 1 st pass: Home record 들을 load 2 nd pass: Overflow record 들을빈 slot 에할당 Question: 새로운 home record가나중에삽입? Chaining with a separate overflow area 영남대학교데이터베이스연구실 Algorithm: Chapter 10 (Page 42)
Average Search Length Key Home Address Actual Address Search Length Adams Bates Cole Dean Evans Flint 20 21 20 21 24 20 20 21 22 23 24 25 1 1 3 3 1 6 Average Search Length = ( 1 + 1 + 3 + 3 + 1 + 6 ) / 6 = 25 영남대학교데이터베이스연구실 Algorithm: Chapter 10 (Page 43)
Home Actual Address of Search address address Data next synonym length 20 20 Adams 22 1 21 21 Bates 23 1 20 22 Cole 25 2 21 23 Dean -1 2 24 24 Evans -1 1 20 25 Flint -1 3 FIGURE 1013 Hashing with chained progressive overflow Adams, Cole, and Flint are synonyms; Bates and Dean are synonyms 영남대학교데이터베이스연구실 Algorithm: Chapter 10 (Page 44)
83 Chaining with a Separate Overflow Area Hashing table을두부분으로분리 Prime data area: Home record 저장 Overflow area: Overflow record 들의 linked list 장, 단점 Potential home address 는항상 unused (Home address = free)? Store there Overflow area가다른cylinder 에존재할경우? 영남대학교데이터베이스연구실 Algorithm: Chapter 10 (Page 45)
Home Primary Overflow address data area area 20 21 22 23 24 Adams 0 1 Cole Bates 1 Dean Evans -1 0 1 2 3 Flint 2-1 -1 FIGURE 1014 Chaining to a separate overflow area, Adams, Cole, and Flint are synonyms; Bates and Dean are synonyms 영남대학교데이터베이스연구실 Algorithm: Chapter 10 (Page 46)