17 \h'-\w'\\$1\ 'u'\\$1\ \c
22 .TH hash 3 "2 August 1999" "Straylight/Edgeware" "mLib utilities library"
24 hash \- low-level hashtable implementation
39 .B "#include <mLib/hash.h>"
42 .B "\h'4n'uint32 mask;"
43 .B "\h'4n'hash_base **v;"
48 .B "\h'4n'hash_base *next;"
49 .B "\h'4n'uint32 hash;"
52 .B "typedef struct { ...\& } hash_iter;"
54 .BI "void hash_create(hash_table *" t ", size_t " n );
55 .BI "void hash_destroy(hash_table *" t );
56 .BI "hash_base **hash_bin(hash_table *" t ", uint32 " hash );
57 .BI "int hash_extend(hash_table *" t );
58 .BI "void hash_remove(hash_table *" t ", hash_base *" b );
59 .BI "void hash_mkiter(hash_iter *" i ", hash_table *" t );
60 .BI "hash_base *hash_next(hash_iter *" i );
62 .BI "hash_base **HASH_BIN(hash_table *" t ", uint32 " hash );
63 .BI "void HASH_MKITER(hash_iter *" i ", hash_table *" t );
64 .BI "void HASH_NEXT(hash_iter *" i ", " b );
69 functions provide the basis for an extensible hashtable implementation.
70 The implementation is not complete. Many decisions have been left to
73 how keys should be represented, hashed and compared;
75 how objects contained within the table should be allocated; and
77 when the hashtable should be extended.
79 A complete hashtable implementation will need to take the above
80 decisions. If you just want a prepackaged solution, see
83 .SH "IMPLEMENTATION DETAILS"
84 Each item in the hashtable is assigned a 32-bit integer
86 a number computed somehow from the item's data such that two items which
87 are considered equal will yield the same hash. Ideally, different items
88 will yield different hashes. It is important for this implementation
89 that all bits of the hash are similarly good.
91 In order to look up an item, the high bits of the hash are masked off
92 and the remainder used as an index into a vector of
94 Each bin contains a list of items with identical low-order bits of their
97 A table expansion involves doubling the size of the bin vector. Each
98 bin list is then split into two, items being placed into a new bin
99 depending on the setting of the next lowest hash bit. By expanding the
100 hashtable as needed, lookups remain constant-time.
102 A low-level hashtable is represented by a
104 structure. It contains two members:
107 The current bitmask to be applied to hashes. This is one less than the
108 current number of bins in the hashtable, and is applied to hash values
109 in order to decide which bin an item should be in.
112 The bin vector. It is an array of pointers to hashtable items.
114 A hashtable item consists of a
116 structure. A full hashtable implementation will need to extend this
117 structure by adding keying information and other data; the
119 only contains the bare minimum of information needed to maintain the
120 hashtable at a low level. It contains the following members:
123 Pointer to the next item in the bin list. The final item has a null
125 pointer. The entry in the bin vector is null if the bin list is empty.
126 It is up to the high-level implementation to insert items into the list.
129 The hash for this item. This must be the full 32-bit hash for the
130 current item. It is used during hashtable expansion to determine which
131 bin an item should be moved to.
132 .SH "FUNCTIONALITY PROVIDED"
133 This section describes the functions and macros provided for building
134 hashtables. Code examples are given throughout. They assume the
135 following definitions:
137 /* --- A table of items --- */
139 typedef struct item_table {
144 /* --- An item --- */
146 typedef struct item {
152 The implementation presented here is simple but relatively bad. The
155 presents a more realistic example, but is rather more complex.
156 .SS "Initialization and finalization"
157 An empty hashtable is initialized by calling
159 with the address of a
161 structure to be filled in and the initial number of hash bins to create.
163 For example, an item table might be initialized like this:
165 void item_createtab(item_table *t)
167 hash_create(&t->t, ITEM_INITSZ);
168 t->load = ITEM_INITLOAD;
171 A hashtable can be destroyed by calling
173 with the address of the
175 structure. This does not attempt to deallocate the individual items;
176 that must be done beforehand.
178 The usual way to deallocate the individual hashtable items is using the
179 iteration constructs described below.
181 void item_destroytab(item_table *t)
185 for (hash_mkiter(&i, &t->t); (b = hash_next(&i)) != 0; ) {
186 item *ii = (item *)b;
195 .SS "Searching, adding and removing"
196 Items must be searched for and added by hand.
200 returns the address of the bin list haed for a particular hashtable and
201 hash value. The function
203 works the same way and provides the same result, but since the macro is
204 very simple its use is preferred. However, it will evaluate its
205 arguments multiple times.
207 Once the bin list has been found, it's fairly easy to search for an
208 exact match. A simple search might look something like this:
210 item *lookup(item_table *t, const char *k)
212 uint32 h = hash(k); /* Hash @k@ somehow */
213 hash_base **bin = HASH_BIN(&t->t, h);
215 for (b = *bin; b; b = b->next) {
217 if (h == i->b.hash && strcmp(k, i->k) == 0)
223 Insertion is also relatively trivial given the bin list head. Insertion
224 may make the hashtable too large, so it might be necessary to extend
225 it. Extension is performed by
227 which is passed only the address of the hashtable. It returns nonzero
228 if extension was successful.
230 item *add(item_table *t, const char *k, /* ... */)
236 /* --- See if the item is already there --- */
238 if ((i = = lookup(t, k)) != 0)
241 /* --- Make a new hashtable item --- */
247 /* --- Link it into the bin list --- */
249 h = i->b.hash = hash(k);
250 bin = HASH_BIN(&t->t, h);
254 /* --- Maybe extend the hashtable --- */
258 else if (hash_extend(&t->t))
259 t->load = recalc_load(t);
268 implementation is rather more sophisticated in its approach but the idea
269 is the same. In particular,
271 provides a single interface for lookup and insertion which prevents the
272 unnecessary rehashing performed by the above code.
274 Removal of items is more straightforward. The function
276 will unlink a given item from its bin list, after which point it is safe
279 Iteration allows code to be performed on all the items in a hashtable.
280 This is done using an
282 object, represented by a
284 structure. An iterator is initialized by calling
288 yields a different item from the hashtable until there are none left, a
289 condition signified by a null return value.
295 do the same jobs as the above functions. However,
297 has a slightly bizarre argument passing convention: its second argument
300 which is updated to contain the address of the next item.
302 The finalization code above contained an example of iteration.
307 Mark Wooding, <mdw@distorted.org.uk>