Chapter 4: Fundamental File Structure Concepts
Reference: M. J. Folk and B. Zoellick, File Structures, Addison-Wesley (1992).
TABLE OF CONTENTS
- Field and Record Organization
- Record Access
- More about Record Structures
- File Access and File Organization
Yeungnam University Database Laboratory. Algorithm: Chapter 4
1. Field and Record Organization
1.1 A Stream File
- Definition: a file is organized as a stream of bytes.
- Example: a file that stores name and address information (Program 1).
- Sample run
  Input:
    John Ames               Alan Mason
    123 Maple               90 Eastgate
    Stillwater, OK 74075    Ada, OK 74820
  Resulting file contents:
    AmesJohn123 MapleStillwaterOK74075MasonAlan90..
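Problems with a Stream File
- Information loss: the boundaries between units of information are not recorded, so the original values cannot be separated again.
- The concept of a field is needed.
- Field?
  - The smallest logically meaningful unit of information in a file
  - A conceptual tool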
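/* writstrm.c: creates a name and address file that is strictly a stream of bytes */
/* The slide includes "fileio.h", the course header, which is not shown here.
   The standard headers this program needs are listed directly instead. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>

#define out_str(fd, s) write((fd), (s), strlen(s))

/* read one line from stdin without its newline (used in place of gets()) */
static char *get_line(char *buf, int size)
{
    if (fgets(buf, size, stdin) == NULL)
        buf[0] = '\0';
    buf[strcspn(buf, "\n")] = '\0';
    return buf;
}

int main(void)
{
    char first[30], last[30], address[30], city[20];
    char state[15], zip[9], filename[15];
    int fd;

    printf("enter the name of the file you wish to create: ");
    get_line(filename, sizeof filename);
    /* O_CREAT requires a permission mode; 0644 is chosen here */
    if ((fd = open(filename, O_WRONLY | O_CREAT | O_EXCL, 0644)) < 0) {
        printf("file opening error --- program stopped\n");
        exit(1);
    }

    printf("\n\ntype in a last name (surname), or <CR> to exit\n");
    get_line(last, sizeof last);
    while (strlen(last) > 0) {
        printf("\nFirst Name:");
        get_line(first, sizeof first);
        printf("   Address:");
        get_line(address, sizeof address);
        printf("      City:");
        get_line(city, sizeof city);
        printf("     State:");
        get_line(state, sizeof state);
        printf("       Zip:");
        get_line(zip, sizeof zip);

        /* output the strings to the file, one after another, with no separators */
        out_str(fd, last);
        out_str(fd, first);
        out_str(fd, address);
        out_str(fd, city);
        out_str(fd, state);
        out_str(fd, zip);

        /* prepare for next entry */
        printf("\n\ntype in a last name, or <CR> to exit\n");
        get_line(last, sizeof last);
    }
    close(fd);
    return 0;
}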
1.2 Field Structures
Ways to represent fields within a file:
- Force the field into a predictable length
- Begin each field with a length indicator
- Place a delimiter at the end of each field
- Use a "keyword = value" expression
Force the Length of Fields
- Definition

struct {
    char last_name[10];
    char first_name[10];
    char address[15];
    char city[15];
    char state[2];
    char zip[9];
} set_of_fields;

- Advantages
  - Easy to implement; the most widely used method
- Disadvantages
  - Wastes storage space
  - Cannot store data larger than the allotted space
  - Hard to use when many of the fields are variable-length
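The trade-off listed above (padding wastes space, long values are cut off) can be seen in a minimal sketch of storing a string into a fixed-width field. The helper name put_fixed and the blank-padding convention are assumptions made for illustration; the slides do not show this routine.

```c
#include <string.h>

#define LAST_NAME_LEN 10   /* width of last_name in the struct on the slide */

/* Copy src into a fixed-width field (hypothetical helper).
 * Short values are padded with blanks; long values are truncated.
 * The field is not null-terminated: its length is implied by its width.
 * Returns the number of characters that did not fit (0 if nothing was lost). */
size_t put_fixed(char *field, size_t width, const char *src)
{
    size_t len = strlen(src);
    size_t n = len < width ? len : width;

    memcpy(field, src, n);               /* the part that fits */
    memset(field + n, ' ', width - n);   /* blank padding, possibly empty */
    return len - n;
}
```

Calling put_fixed(field, LAST_NAME_LEN, "Ames") fills six of the ten bytes with blanks, while a 17-character surname loses its last seven characters.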
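Ways to Implement Variable-Length Fields
- Begin Each Field with a Length Indicator
  - Store the length of the field at the start of the field.
  - If the field size is less than 256 bytes, the length fits in one character (one byte).
- Separate the Fields with Delimiters
  - A special character marks the end of each field.
  - What if the special character appears inside the field contents?
- "Keyword = Value" Expression to Identify Fields
  - Self-describing structure (Which field is this? Is a field missing?)
  - Wastes a great deal of storage space.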
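As a rough sketch of the length-indicator method, the functions below store each field as one length byte followed by the field's bytes, working on an in-memory buffer rather than a file. The names put_lfield and get_lfield are hypothetical. Because the reader never scans for a special character, the delimiter question raised above does not arise: a '|' inside a field is just data.

```c
#include <string.h>

/* Append one field to buf as [1-byte length][bytes] (hypothetical helper).
 * Returns the new write position, or 0 if the field is longer than
 * 255 bytes or does not fit in the buffer. */
size_t put_lfield(unsigned char *buf, size_t pos, size_t cap, const char *s)
{
    size_t len = strlen(s);

    if (len > 255 || pos + 1 + len > cap)
        return 0;
    buf[pos] = (unsigned char)len;      /* length indicator */
    memcpy(buf + pos + 1, s, len);      /* field contents */
    return pos + 1 + len;
}

/* Read the field that starts at pos into out as a C string.
 * end is the number of valid bytes in buf.
 * Returns the position of the next field, or 0 if there is no
 * complete field at pos or out is too small. */
size_t get_lfield(const unsigned char *buf, size_t pos, size_t end,
                  char *out, size_t outcap)
{
    size_t len;

    if (pos >= end)
        return 0;
    len = buf[pos];
    if (pos + 1 + len > end || len + 1 > outcap)
        return 0;
    memcpy(out, buf + pos + 1, len);
    out[len] = '\0';
    return pos + 1 + len;
}
```

Writing "Ames" and then "123|Maple" produces 15 bytes; reading them back with get_lfield returns both strings unchanged.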
Ames      John      123 Maple      Stillwater     OK74075377-1808
Mason     Alan      90 Eastgate    Ada            OK74820
(a) Field lengths fixed. Place blanks in the spaces where the phone number would go.

Ames|John|123 Maple|Stillwater|OK|74075|377-1808|
Mason|Alan|90 Eastgate|Ada|OK|74820||
(b) Delimiters are used to indicate the end of a field. Place the delimiter for the empty field immediately after the delimiter for the previous field.

Ames|...|Stillwater|OK|74075|377-1808|#Mason|...|90 Eastgate|Ada|OK|74820|#...
(c) Place the field for business phone at the end of the record. If the end-of-record mark is encountered, assume that the field is missing.

SURNAME=Ames|FIRSTNAME=John|STREET=123 Maple|...|ZIP=74075|PHONE=377-1808|#...
(d) Use a keyword to identify each field. If the keyword is missing, the corresponding field is assumed to be missing.

FIGURE 4.3 Four methods for organizing fields within records to account for possible missing fields. In the examples, the second record is missing the phone number.
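Method (d) can be sketched as a lookup over one record's text: a field is "missing" simply when its keyword is not found. This is an illustrative sketch, not code from the slides. The function name find_field is hypothetical, and it assumes '|' separates the fields and that the record has already been cut at its end-of-record mark.

```c
#include <string.h>

/* Look up keyword in a record of the form "KEY=value|KEY=value|..."
 * (hypothetical helper). Copies the value into out and returns 1,
 * or returns 0 if the keyword is missing from the record. */
int find_field(const char *rec, const char *keyword, char *out, size_t outcap)
{
    size_t klen = strlen(keyword);
    const char *p = rec;

    if (outcap == 0)
        return 0;
    while (*p != '\0') {
        const char *bar = strchr(p, '|');
        size_t flen = bar ? (size_t)(bar - p) : strlen(p);

        /* the field matches only if it starts with "keyword=" */
        if (flen > klen && strncmp(p, keyword, klen) == 0 && p[klen] == '=') {
            size_t vlen = flen - klen - 1;
            if (vlen >= outcap)
                vlen = outcap - 1;      /* truncate to fit the caller's buffer */
            memcpy(out, p + klen + 1, vlen);
            out[vlen] = '\0';
            return 1;
        }
        p += flen;                      /* skip this field */
        if (*p == '|')
            p++;                        /* and its delimiter */
    }
    return 0;
}
```

For the second record in the figure, find_field(rec, "PHONE", ...) returns 0, which the caller treats as a missing phone number.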
1.3 Reading a Stream of Fields
- Program: readstrm.c
- A way of grouping fields is needed.
- Record: a set of fields
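/* readstrm.c: reads a stream of delimited fields */
/* The slide includes "fileio.h", the course header, which is not shown here.
   The standard headers are listed directly, and DELIM_CHR is given an
   assumed value of '|'. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>

#define DELIM_CHR '|'   /* assumed value */
#define MAX_FLD   30    /* size of the field buffer s */

int readfield(int fd, char s[]);

int main(void)
{
    int fd, n, fld_count;
    char s[MAX_FLD], filename[15];

    printf("enter name of file to read: ");
    if (fgets(filename, sizeof filename, stdin) == NULL)
        exit(1);
    filename[strcspn(filename, "\n")] = '\0';
    if ((fd = open(filename, O_RDONLY)) < 0) {
        printf("file opening error --- program stopped\n");
        exit(1);
    }

    /* main program loop -- calls readfield() for as long as the function succeeds */
    fld_count = 0;
    while ((n = readfield(fd, s)) > 0)
        printf("\tfield # %3d: %s\n", ++fld_count, s);
    close(fd);
    return 0;
}

/* reads characters up to the next delimiter; returns the field length,
   which is 0 at end of file and also for an empty field */
int readfield(int fd, char s[])
{
    int i = 0;
    char c;

    while (read(fd, &c, 1) > 0 && c != DELIM_CHR)
        if (i < MAX_FLD - 1)    /* drop characters that would overflow s */
            s[i++] = c;
    s[i] = '\0';                /* append null to end string */
    return i;
}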
1.4 Record Structures
Definition of a record: a set of fields that belong together when the file is viewed in terms of a higher level of organization.
Ways to Organize a File into Records
- Fixed-length records (Figure 4.5 (a), (b))
  - Every record has the same size; the most widely used method.
  - Fixed-length records may still contain variable-length fields.
- Records with a fixed number of fields (Figure 4.5 (c))
  - Count fields modulo the number of fields per record.
- Store the record length at the start of each record (Figure 4.6 (a))
- Use an index (Figure 4.6 (b))
  - The file is split into a data file and an index file.
  - Can be expressed as a self-describing structure.
- Separate records with a delimiter (Figure 4.6 (c))
  - Delimiter: a blank or a special character
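"Counting fields modulo the number of fields" can be sketched as follows: every time the running field count reaches a multiple of the fields-per-record value, a record boundary has been crossed. The function names, the '|' delimiter, and the use of an in-memory string instead of a file are assumptions made for this illustration.

```c
#include <string.h>

#define FIELDS_PER_RECORD 6   /* six fields per record, as in Figure 4.5 (c) */

/* Count the complete records in a '|'-delimited stream (hypothetical helper). */
size_t count_records(const char *stream)
{
    size_t fields = 0, records = 0, i, len = strlen(stream);

    for (i = 0; i < len; i++) {
        if (stream[i] == '|') {
            fields++;
            if (fields % FIELDS_PER_RECORD == 0)
                records++;              /* a record ends on every sixth field */
        }
    }
    return records;
}

/* Return the offset where record recno (0-based) starts,
 * or -1 if the stream holds no such record. */
long record_offset(const char *stream, size_t recno)
{
    size_t fields = 0, i, len = strlen(stream);

    if (recno == 0)
        return len > 0 ? 0 : -1;
    for (i = 0; i < len; i++) {
        if (stream[i] != '|')
            continue;
        fields++;
        if (fields % FIELDS_PER_RECORD == 0 &&
            fields / FIELDS_PER_RECORD == recno)
            return i + 1 < len ? (long)(i + 1) : -1;
    }
    return -1;
}
```

Note that finding record n still requires scanning every earlier field, which is the main cost of this organization.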
Ames      John      123 Maple      Stillwater     OK74075
Mason     Alan      90 Eastgate    Ada            OK74820
(a) Fixed-length fields

Ames|John|123 Maple|Stillwater|OK|74075|       <- unused space ->
Mason|Alan|90 Eastgate|Ada|OK|74820|           <- unused space ->
(b) Fixed-length records with variable-length fields

Ames|John|123 Maple|Stillwater|OK|74075|Mason|Alan|90 Eastgate|Ada|OK|...
(c) Fixed number of fields per record

FIGURE 4.5 Three ways of making the lengths of records constant and predictable. (a) Counting bytes: fixed-length records with fixed-length fields. (b) Counting bytes: fixed-length records with variable-length fields. (c) Counting fields: six fields per record.
40Ames|John|123 Maple|Stillwater|OK|74075|36Mason|Alan|90 Eastgate|...
(a) Length indicator

Data file:  Ames|John|123 Maple|Stillwater|OK|74075|Mason|Alan|...
Index file: 00 40 ...
(b) Index file

Ames|John|123 Maple|Stillwater|OK|74075|#Mason|Alan|90 Eastgate|Ada|OK|...
(c) Delimiter

FIGURE 4.6 Record structures for variable-length records. (a) Beginning each record with a length indicator. (b) Using an index file to keep track of record addresses. (c) Placing the delimiter # at the end of each record.
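The index method of Figure 4.6 (b) can be sketched in memory: the data area holds the records back to back, and the index holds the byte offset where each record starts. The struct and function names below are hypothetical, and arrays stand in for the two files.

```c
#include <string.h>

#define DATA_CAP  1024
#define INDEX_CAP 32

struct indexed_file {
    char   data[DATA_CAP];      /* stands in for the data file */
    size_t data_len;            /* bytes used in data */
    size_t index[INDEX_CAP];    /* stands in for the index file: start offsets */
    size_t count;               /* number of records */
};

/* Append a record and remember where it starts. Returns 0, or -1 if full. */
int add_record(struct indexed_file *f, const char *rec)
{
    size_t len = strlen(rec);

    if (f->count == INDEX_CAP || f->data_len + len > DATA_CAP)
        return -1;
    f->index[f->count++] = f->data_len;
    memcpy(f->data + f->data_len, rec, len);
    f->data_len += len;
    return 0;
}

/* Copy record recno (0-based) into out as a C string.
 * A record ends where the next one starts, or at the end of the data.
 * Returns the record length, or -1 if recno is invalid or out is too small. */
int get_record(const struct indexed_file *f, size_t recno,
               char *out, size_t outcap)
{
    size_t start, end, len;

    if (recno >= f->count)
        return -1;
    start = f->index[recno];
    end = (recno + 1 < f->count) ? f->index[recno + 1] : f->data_len;
    len = end - start;
    if (len + 1 > outcap)
        return -1;
    memcpy(out, f->data + start, len);
    out[len] = '\0';
    return (int)len;
}
```

After adding the two records from the figure, the index holds 0 and 40, matching the "00 40" shown in the index file, and any record can be fetched without scanning the ones before it.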
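1.5 A Record Structure Using a Length Indicator
- Writing Variable-Length Records to a File
  - How should the length be stored ahead of the record?
  - Build the record in a buffer first, then compute its length.
  - The length can be represented in binary or in ASCII.
  - Program: writrec.c
- Representing the Record Length
  - Binary: space efficient. Two bytes hold values up to 32,767, whereas two ASCII digits hold values only up to 99. The length field has a fixed size.
  - ASCII: portable and printable.
- Reading Variable-Length Records from a File
  - Program: readrec.c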
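The "32,767 vs. 99" comparison can be made concrete with a sketch of both two-byte encodings. The function names are hypothetical. The binary version here writes the high byte first so that the result does not depend on the machine's byte order; that choice is an assumption of this sketch, since writrec.c writes the short in the machine's native layout.

```c
/* Store len as two ASCII digits. Returns 0, or -1 if len exceeds 99. */
int put_len_ascii(char out[2], unsigned len)
{
    if (len > 99)
        return -1;
    out[0] = (char)('0' + len / 10);
    out[1] = (char)('0' + len % 10);
    return 0;
}

/* Recover a length stored by put_len_ascii. */
unsigned get_len_ascii(const char in[2])
{
    return (unsigned)(in[0] - '0') * 10 + (unsigned)(in[1] - '0');
}

/* Store len as a two-byte binary integer, high byte first (assumed order).
 * Returns 0, or -1 if len exceeds 32,767. */
int put_len_binary(unsigned char out[2], unsigned len)
{
    if (len > 32767)
        return -1;
    out[0] = (unsigned char)(len >> 8);
    out[1] = (unsigned char)(len & 0xFF);
    return 0;
}

/* Recover a length stored by put_len_binary. */
unsigned get_len_binary(const unsigned char in[2])
{
    return ((unsigned)in[0] << 8) | in[1];
}
```

A 40-byte record gets the printable prefix "40" in ASCII form, while the binary form stores bytes 0 and 40, which are not printable but leave room for far longer records in the same two bytes.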
/* writrec.c: creates a name and address file using a fixed-length (2-byte)
   record length field ahead of each record */
/* The slide includes "fileio.h", the course header, which is not shown here.
   The standard headers are listed directly, and MAX_REC_SIZE and DELIM_STR
   are given assumed values. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>

#define MAX_REC_SIZE 512   /* assumed value */
#define DELIM_STR    "|"   /* assumed value */
#define fld_to_recbf(rb, fld) \
    do { strcat(rb, fld); strcat(rb, DELIM_STR); } while (0)

char recbf[MAX_REC_SIZE + 1];
char *prompt[] = {
    "Enter Last Name -- or <CR> to exit: ",
    "                        First name: ",
    "                           Address: ",
    "                              City: ",
    "                             State: ",
    "                               Zip: ",
    ""   /* null string to terminate the prompt loop */
};

/* read one line from stdin without its newline (used in place of gets()) */
static char *get_line(char *buf, int size)
{
    if (fgets(buf, size, stdin) == NULL)
        buf[0] = '\0';
    buf[strcspn(buf, "\n")] = '\0';
    return buf;
}

int main(void)
{
    char response[50], filename[15];
    int fd, i;
    short rec_lgth;

    printf("enter the name of the file you wish to create: ");
    get_line(filename, sizeof filename);
    /* O_CREAT requires a permission mode; 0644 is chosen here */
    if ((fd = open(filename, O_WRONLY | O_CREAT | O_EXCL, 0644)) < 0) {
        printf("file opening error --- program stopped\n");
        exit(1);
    }

    printf("\n\n%s", prompt[0]);
    get_line(response, sizeof response);
    while (strlen(response) > 0) {
        recbf[0] = '\0';
        fld_to_recbf(recbf, response);
        for (i = 1; *prompt[i] != '\0'; i++) {
            printf("%s", prompt[i]);
            get_line(response, sizeof response);
            fld_to_recbf(recbf, response);
        }

        /* write out the record length and buffer contents */
        rec_lgth = (short)strlen(recbf);
        write(fd, &rec_lgth, sizeof(rec_lgth));   /* store the record length */
        write(fd, recbf, rec_lgth);               /* store the record contents */

        /* prepare for next entry */
        printf("\n\n%s", prompt[0]);
        get_line(response, sizeof response);
    }

    /* close the file before leaving */
    close(fd);
    return 0;
}

/* question: How does the termination condition work in the loop:
       for (i = 1; *prompt[i] != '\0'; i++)
   What does the "i" refer to? Why do we need the "*"? */
/* readrec.c: reads variable-length records, each preceded by a 2-byte length */
/* The slide includes "fileio.h", the course header, which is not shown here.
   The standard headers are listed directly, and MAX_REC_SIZE and DELIM_CHR
   are given assumed values. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>

#define MAX_REC_SIZE 512   /* assumed value */
#define DELIM_CHR    '|'   /* assumed value */

int get_rec(int fd, char recbuff[]);
int get_fld(char field[], char recbuff[], short scan_pos, short rec_lgth);

int main(void)
{
    int fd, rec_count, fld_count, scan_pos;
    short rec_lgth;
    char filename[15], recbuff[MAX_REC_SIZE + 1];
    char field[MAX_REC_SIZE + 1];

    printf("enter a file name to read: ");
    if (fgets(filename, sizeof filename, stdin) == NULL)
        exit(1);
    filename[strcspn(filename, "\n")] = '\0';
    if ((fd = open(filename, O_RDONLY)) < 0) {
        printf("file opening error --- program stopped\n");
        exit(1);
    }

    rec_count = scan_pos = 0;
    while ((rec_lgth = get_rec(fd, recbuff)) > 0) {
        printf("Record %d\n", ++rec_count);
        fld_count = 0;
        while ((scan_pos = get_fld(field, recbuff, scan_pos, rec_lgth)) > 0)
            printf("\tField %d: %s\n", ++fld_count, field);
    }
    close(fd);
    return 0;
}

/* Q: Why can I assign 0 to scan_pos just once, outside of the while loop
   for records? */

/* reads the 2-byte length and then the record itself;
   returns the record length, or 0 at end of file */
int get_rec(int fd, char recbuff[])
{
    short rec_lgth;

    if (read(fd, &rec_lgth, sizeof(short)) != sizeof(short))   /* get record length */
        return 0;                                               /* return 0 if EOF */
    if (rec_lgth <= 0 || rec_lgth > MAX_REC_SIZE)               /* reject a length that cannot fit */
        return 0;
    return (int)read(fd, recbuff, rec_lgth);                    /* read record */
}

/* copies the next field of the record into field[];
   returns the position of the start of the following field, or 0 if none */
int get_fld(char field[], char recbuff[], short scan_pos, short rec_lgth)
{
    short fpos = 0;                  /* position in "field" array */

    if (scan_pos == rec_lgth)        /* if no more fields to read, */
        return 0;                    /* return scan_pos of 0.      */

    /* scanning loop */
    while (scan_pos < rec_lgth &&
           (field[fpos++] = recbuff[scan_pos++]) != DELIM_CHR)
        ;

    if (field[fpos - 1] == DELIM_CHR)   /* if last character is a field   */
        field[--fpos] = '\0';           /* delimiter, replace with null   */
    else
        field[fpos] = '\0';             /* otherwise, just ensure that the
                                           field is null-terminated */
    return scan_pos;                    /* return position of start of next field */
}