Commit | Line | Data |
---|---|---|
03d53b73 | 1 | .\" -*-nroff-*- |
2 | .de VS | |
3 | .sp 1 | |
4 | .RS | |
5 | .nf | |
6 | .ft B | |
7 | .. | |
8 | .de VE | |
9 | .ft R | |
10 | .fi | |
11 | .RE | |
12 | .sp 1 | |
13 | .. | |
14 | .de hP | |
15 | .IP | |
16 | .ft B | |
17 | \h'-\w'\\$1\ 'u'\\$1\ \c | |
18 | .ft P | |
19 | .. | |
20 | .ie t .ds o \(bu | |
21 | .el .ds o o | |
fbf20b5b | 22 | .TH hash 3 "2 August 1999" "Straylight/Edgeware" "mLib utilities library" |
03d53b73 | 23 | .SH "NAME" |
24 | hash \- low-level hashtable implementation | |
25 | .\" @hash_create | |
26 | .\" @hash_destroy | |
27 | .\" @hash_bin | |
28 | .\" @hash_extend | |
29 | .\" @hash_remove | |
30 | .\" @hash_mkiter | |
31 | .\" @hash_next | |
32 | .\" | |
33 | .\" @HASH_BIN | |
34 | .\" @HASH_MKITER | |
35 | .\" @HASH_NEXT | |
36 | .\" | |
37 | .SH "SYNOPSIS" | |
38 | .nf | |
39 | .B "#include <mLib/hash.h>" | |
40 | ||
4729aa69 MW |
41 | .B "typedef struct {" |
42 | .B "\h'4n'uint32 mask;" | |
43 | .B "\h'4n'hash_base **v;" | |
44 | .B "\h'4n'arena *a;" | |
45 | .B "} hash_table;" | |
46 | ||
47 | .B "typedef struct {" | |
48 | .B "\h'4n'hash_base *next;" | |
49 | .B "\h'4n'uint32 hash;" | |
50 | .B "} hash_base;" | |
51 | ||
52 | .B "typedef struct { ...\& } hash_iter;" | |
53 | ||
03d53b73 | 54 | .BI "void hash_create(hash_table *" t ", size_t " n ); |
55 | .BI "void hash_destroy(hash_table *" t ); | |
56 | .BI "hash_base **hash_bin(hash_table *" t ", uint32 " hash ); | |
57 | .BI "int hash_extend(hash_table *" t ); | |
db0e70a1 | 58 | .BI "void hash_remove(hash_table *" t ", hash_base *" b ); |
03d53b73 | 59 | .BI "void hash_mkiter(hash_iter *" i ", hash_table *" t ); |
60 | .BI "hash_base *hash_next(hash_iter *" i ); | |
61 | ||
62 | .BI "hash_base **HASH_BIN(hash_table *" t ", uint32 " hash ); | |
63 | .BI "void HASH_MKITER(hash_iter *" i ", hash_table *" t ); | |
64 | .BI "void HASH_NEXT(hash_iter *" i ", " b ); | |
65 | .fi | |
66 | .SH "OVERVIEW" | |
67 | The | |
68 | .B hash | |
69 | functions provide the basis for an extensible hashtable implementation. | |
70 | The implementation is not complete. Many decisions have been left to | |
71 | the user, including: | |
72 | .hP \*o | |
528c8b4d | 73 | how keys should be represented, hashed and compared; |
03d53b73 | 74 | .hP \*o |
528c8b4d | 75 | how objects contained within the table should be allocated; and |
03d53b73 | 76 | .hP \*o |
528c8b4d | 77 | when the hashtable should be extended. |
03d53b73 | 78 | .PP |
79 | A complete hashtable implementation will need to take the above | |
80 | decisions. If you just want a prepackaged solution, see | |
81 | .BR sym (3) | |
82 | which provides one. | |
83 | .SH "IMPLEMENTATION DETAILS" | |
84 | Each item in the hashtable is assigned a 32-bit integer | |
85 | .IR hash : | |
86 | a number computed somehow from the item's data such that two items which | |
87 | are considered equal will yield the same hash. Ideally, different items | |
88 | will yield different hashes. It is important for this implementation | |
89 | that all bits of the hash are similarly good. | |
90 | .PP | |
91 | In order to look up an item, the high bits of the hash are masked off | |
92 | and the remainder used as an index into a vector of | |
93 | .IR "bin lists" . | |
94 | Each bin contains a list of items with identical low-order bits of their | |
95 | hashes. | |
96 | .PP | |
97 | A table expansion involves doubling the size of the bin vector. Each | |
98 | bin list is then split into two, items being placed into a new bin | |
99 | depending on the setting of the next lowest hash bit. By expanding the | |
100 | hashtable as needed, lookups remain constant-time. | |
101 | .PP | |
102 | A low-level hashtable is represented by a | |
103 | .B hash_table | |
104 | structure. It contains two members: | |
105 | .TP | |
ff76c38f | 106 | .B "uint32 mask" |
03d53b73 | 107 | The current bitmask to be applied to hashes. This is one less than the |
108 | current number of bins in the hashtable, and is applied to hash values | |
109 | in order to decide which bin an item should be in. | |
110 | .TP | |
ff76c38f | 111 | .B "hash_base **v" |
03d53b73 | 112 | The bin vector. It is an array of pointers to hashtable items. |
113 | .PP | |
114 | A hashtable item consists of a | |
115 | .B hash_base | |
116 | structure. A full hashtable implementation will need to extend this | |
117 | structure by adding keying information and other data; the | |
118 | .B hash_base | |
119 | only contains the bare minimum of information needed to maintain the | |
120 | hashtable at a low level. It contains the following members: | |
121 | .TP | |
ff76c38f | 122 | .B "hash_base *next" |
03d53b73 | 123 | Pointer to the next item in the bin list. The final item has a null |
124 | .B next | |
125 | pointer. The entry in the bin vector is null if the bin list is empty. | |
126 | It is up to the high-level implementation to insert items into the list. | |
127 | .TP | |
ff76c38f | 128 | .B "uint32 hash" |
03d53b73 | 129 | The hash for this item. This must be the full 32-bit hash for the |
130 | current item. It is used during hashtable expansion to determine which | |
131 | bin an item should be moved to. | |
132 | .SH "FUNCTIONALITY PROVIDED" | |
133 | This section describes the functions and macros provided for building | |
134 | hashtables. Code examples are given throughout. They assume the | |
135 | following definitions: | |
136 | .VS | |
137 | /* --- A table of items --- */ | |
138 | ||
139 | typedef struct item_table { | |
140 | hash_table t; | |
141 | size_t load; | |
142 | }; | |
143 | ||
144 | /* --- An item --- */ | |
145 | ||
146 | typedef struct item { | |
147 | hash_base b; | |
148 | const char *k; | |
149 | /* ... */ | |
150 | } item; | |
151 | .VE | |
152 | The implementation presented here is simple but relatively bad. The | |
153 | source file | |
154 | .B sym.c | |
155 | presents a more realistic example, but is rather more complex. | |
156 | .SS "Initialization and finalization" | |
157 | An empty hashtable is initialized by calling | |
158 | .B hash_create | |
159 | with the address of a | |
160 | .B hash_table | |
161 | structure to be filled in and the initial number of hash bins to create. | |
162 | .PP | |
163 | For example, an item table might be initialized like this: | |
164 | .VS | |
165 | void item_createtab(item_table *t) | |
166 | { | |
167 | hash_create(&t->t, ITEM_INITSZ); | |
168 | t->load = ITEM_INITLOAD; | |
169 | } | |
170 | .VE | |
171 | A hashtable can be destroyed by calling | |
172 | .B hash_destroy | |
173 | with the address of the | |
174 | .B hash_table | |
175 | structure. This does not attempt to deallocate the individual items; | |
176 | that must be done beforehand. | |
177 | .PP | |
178 | The usual way to deallocate the individual hashtable items is using the | |
179 | iteration constructs described below. | |
180 | .VS | |
181 | void item_destroytab(item_table *t) | |
182 | { | |
183 | hash_iter i; | |
184 | hash_base *b; | |
185 | for (hash_mkiter(&i, &t->t); (b = hash_next(&i)) != 0; ) { | |
186 | item *ii = (item *)b; | |
187 | free(ii->k); | |
188 | /* ... */ | |
189 | DESTROY(ii); | |
190 | } | |
191 | hash_destroy(&t->t); | |
192 | } | |
193 | .VE | |
194 | .sp -1 | |
195 | .SS "Searching, adding and removing" | |
196 | Items must be searched for and added by hand. | |
197 | .PP | |
198 | The macro | |
199 | .B HASH_BIN | |
200 | returns the address of the bin list haed for a particular hashtable and | |
201 | hash value. The function | |
202 | .B hash_bin | |
203 | works the same way and provides the same result, but since the macro is | |
204 | very simple its use is preferred. However, it will evaluate its | |
205 | arguments multiple times. | |
206 | .PP | |
207 | Once the bin list has been found, it's fairly easy to search for an | |
208 | exact match. A simple search might look something like this: | |
209 | .VS | |
210 | item *lookup(item_table *t, const char *k) | |
211 | { | |
212 | uint32 h = hash(k); /* Hash @k@ somehow */ | |
213 | hash_base **bin = HASH_BIN(&t->t, h); | |
214 | hash_base *b; | |
215 | for (b = *bin; b; b = b->next) { | |
216 | item *i = (item *)b; | |
217 | if (h == i->b.hash && strcmp(k, i->k) == 0) | |
218 | return (i); | |
219 | } | |
220 | return (0); | |
221 | } | |
222 | .VE | |
223 | Insertion is also relatively trivial given the bin list head. Insertion | |
224 | may make the hashtable too large, so it might be necessary to extend | |
225 | it. Extension is performed by | |
226 | .BR hash_extend , | |
227 | which is passed only the address of the hashtable. It returns nonzero | |
228 | if extension was successful. | |
229 | .VS | |
230 | item *add(item_table *t, const char *k, /* ... */) | |
231 | { | |
232 | item *i; | |
233 | uint32 h; | |
234 | hash_base **bin; | |
235 | ||
236 | /* --- See if the item is already there --- */ | |
237 | ||
238 | if ((i = = lookup(t, k)) != 0) | |
239 | return (i); | |
240 | ||
241 | /* --- Make a new hashtable item --- */ | |
242 | ||
243 | i = CREATE(item); | |
244 | i->k = xstrdup(k); | |
245 | /* ... */ | |
246 | ||
247 | /* --- Link it into the bin list --- */ | |
248 | ||
249 | h = i->b.hash = hash(k); | |
250 | bin = HASH_BIN(&t->t, h); | |
251 | i->b.next = *bin; | |
252 | *bin = &i->b.next; | |
253 | ||
254 | /* --- Maybe extend the hashtable --- */ | |
255 | ||
256 | if (t->load) | |
257 | t->load--; | |
258 | else if (hash_extend(&t->t)) | |
259 | t->load = recalc_load(t); | |
260 | ||
261 | /* --- Done --- */ | |
262 | ||
263 | return (i); | |
264 | } | |
265 | .VE | |
266 | The | |
267 | .B sym | |
268 | implementation is rather more sophisticated in its approach but the idea | |
269 | is the same. In particular, | |
270 | .B sym | |
271 | provides a single interface for lookup and insertion which prevents the | |
272 | unnecessary rehashing performed by the above code. | |
273 | .PP | |
274 | Removal of items is more straightforward. The function | |
275 | .B hash_remove | |
276 | will unlink a given item from its bin list, after which point it is safe | |
277 | to remove. | |
278 | .SS "Iteration" | |
279 | Iteration allows code to be performed on all the items in a hashtable. | |
280 | This is done using an | |
281 | .I iterator | |
282 | object, represented by a | |
283 | .B hash_iter | |
284 | structure. An iterator is initialized by calling | |
285 | .BR hash_mkiter . | |
286 | Each call to | |
287 | .B hash_next | |
288 | yields a different item from the hashtable until there are none left, a | |
289 | condition signified by a null return value. | |
290 | .PP | |
291 | The macros | |
292 | .B HASH_MKITER | |
293 | and | |
294 | .B HASH_NEXT | |
295 | do the same jobs as the above functions. However, | |
296 | .B HASH_NEXT | |
297 | has a slightly bizarre argument passing convention: its second argument | |
298 | is an | |
299 | .I lvalue | |
300 | which is updated to contain the address of the next item. | |
301 | .PP | |
302 | The finalization code above contained an example of iteration. | |
303 | .SH "SEE ALSO" | |
304 | .BR sym (3), | |
305 | .BR mLib (3). | |
306 | .SH "AUTHOR" | |
9b5ac6ff | 307 | Mark Wooding, <mdw@distorted.org.uk> |