@@@ man wip
[mLib] / struct / hash.3
CommitLineData
03d53b73 1.\" -*-nroff-*-
2.de VS
3.sp 1
4.RS
5.nf
6.ft B
7..
8.de VE
9.ft R
10.fi
11.RE
12.sp 1
13..
14.de hP
15.IP
16.ft B
17\h'-\w'\\$1\ 'u'\\$1\ \c
18.ft P
19..
20.ie t .ds o \(bu
21.el .ds o o
fbf20b5b 22.TH hash 3 "2 August 1999" "Straylight/Edgeware" "mLib utilities library"
03d53b73 23.SH "NAME"
24hash \- low-level hashtable implementation
25.\" @hash_create
26.\" @hash_destroy
27.\" @hash_bin
28.\" @hash_extend
29.\" @hash_remove
30.\" @hash_mkiter
31.\" @hash_next
32.\"
33.\" @HASH_BIN
34.\" @HASH_MKITER
35.\" @HASH_NEXT
36.\"
37.SH "SYNOPSIS"
38.nf
39.B "#include <mLib/hash.h>"
40
adec5584 41.ta 2n
4729aa69 42.B "typedef struct {"
adec5584
MW
43.B " uint32 mask;"
44.B " hash_base **v;"
45.B " arena *a;"
4729aa69
MW
46.B "} hash_table;"
47
48.B "typedef struct {"
adec5584
MW
49.B " hash_base *next;"
50.B " uint32 hash;"
4729aa69
MW
51.B "} hash_base;"
52
53.B "typedef struct { ...\& } hash_iter;"
54
03d53b73 55.BI "void hash_create(hash_table *" t ", size_t " n );
56.BI "void hash_destroy(hash_table *" t );
57.BI "hash_base **hash_bin(hash_table *" t ", uint32 " hash );
58.BI "int hash_extend(hash_table *" t );
db0e70a1 59.BI "void hash_remove(hash_table *" t ", hash_base *" b );
03d53b73 60.BI "void hash_mkiter(hash_iter *" i ", hash_table *" t );
61.BI "hash_base *hash_next(hash_iter *" i );
62
63.BI "hash_base **HASH_BIN(hash_table *" t ", uint32 " hash );
64.BI "void HASH_MKITER(hash_iter *" i ", hash_table *" t );
65.BI "void HASH_NEXT(hash_iter *" i ", " b );
66.fi
67.SH "OVERVIEW"
68The
69.B hash
70functions provide the basis for an extensible hashtable implementation.
71The implementation is not complete. Many decisions have been left to
72the user, including:
73.hP \*o
528c8b4d 74how keys should be represented, hashed and compared;
03d53b73 75.hP \*o
528c8b4d 76how objects contained within the table should be allocated; and
03d53b73 77.hP \*o
528c8b4d 78when the hashtable should be extended.
03d53b73 79.PP
80A complete hashtable implementation will need to take the above
81decisions. If you just want a prepackaged solution, see
82.BR sym (3)
83which provides one.
84.SH "IMPLEMENTATION DETAILS"
85Each item in the hashtable is assigned a 32-bit integer
86.IR hash :
87a number computed somehow from the item's data such that two items which
88are considered equal will yield the same hash. Ideally, different items
89will yield different hashes. It is important for this implementation
90that all bits of the hash are similarly good.
91.PP
92In order to look up an item, the high bits of the hash are masked off
93and the remainder used as an index into a vector of
94.IR "bin lists" .
95Each bin contains a list of items with identical low-order bits of their
96hashes.
97.PP
98A table expansion involves doubling the size of the bin vector. Each
99bin list is then split into two, items being placed into a new bin
100depending on the setting of the next lowest hash bit. By expanding the
101hashtable as needed, lookups remain constant-time.
102.PP
103A low-level hashtable is represented by a
104.B hash_table
105structure. It contains two members:
106.TP
ff76c38f 107.B "uint32 mask"
03d53b73 108The current bitmask to be applied to hashes. This is one less than the
109current number of bins in the hashtable, and is applied to hash values
110in order to decide which bin an item should be in.
111.TP
ff76c38f 112.B "hash_base **v"
03d53b73 113The bin vector. It is an array of pointers to hashtable items.
114.PP
115A hashtable item consists of a
116.B hash_base
117structure. A full hashtable implementation will need to extend this
118structure by adding keying information and other data; the
119.B hash_base
120only contains the bare minimum of information needed to maintain the
121hashtable at a low level. It contains the following members:
122.TP
ff76c38f 123.B "hash_base *next"
03d53b73 124Pointer to the next item in the bin list. The final item has a null
125.B next
126pointer. The entry in the bin vector is null if the bin list is empty.
127It is up to the high-level implementation to insert items into the list.
128.TP
ff76c38f 129.B "uint32 hash"
03d53b73 130The hash for this item. This must be the full 32-bit hash for the
131current item. It is used during hashtable expansion to determine which
132bin an item should be moved to.
133.SH "FUNCTIONALITY PROVIDED"
134This section describes the functions and macros provided for building
135hashtables. Code examples are given throughout. They assume the
136following definitions:
137.VS
138/* --- A table of items --- */
139
140typedef struct item_table {
141 hash_table t;
142 size_t load;
143};
144
145/* --- An item --- */
146
147typedef struct item {
148 hash_base b;
149 const char *k;
150 /* ... */
151} item;
152.VE
153The implementation presented here is simple but relatively bad. The
154source file
155.B sym.c
156presents a more realistic example, but is rather more complex.
157.SS "Initialization and finalization"
158An empty hashtable is initialized by calling
159.B hash_create
160with the address of a
161.B hash_table
162structure to be filled in and the initial number of hash bins to create.
163.PP
164For example, an item table might be initialized like this:
165.VS
166void item_createtab(item_table *t)
167{
168 hash_create(&t->t, ITEM_INITSZ);
169 t->load = ITEM_INITLOAD;
170}
171.VE
172A hashtable can be destroyed by calling
173.B hash_destroy
174with the address of the
175.B hash_table
176structure. This does not attempt to deallocate the individual items;
177that must be done beforehand.
178.PP
179The usual way to deallocate the individual hashtable items is using the
180iteration constructs described below.
181.VS
182void item_destroytab(item_table *t)
183{
184 hash_iter i;
185 hash_base *b;
186 for (hash_mkiter(&i, &t->t); (b = hash_next(&i)) != 0; ) {
187 item *ii = (item *)b;
188 free(ii->k);
189 /* ... */
190 DESTROY(ii);
191 }
192 hash_destroy(&t->t);
193}
194.VE
195.sp -1
196.SS "Searching, adding and removing"
197Items must be searched for and added by hand.
198.PP
199The macro
200.B HASH_BIN
201returns the address of the bin list haed for a particular hashtable and
202hash value. The function
203.B hash_bin
204works the same way and provides the same result, but since the macro is
205very simple its use is preferred. However, it will evaluate its
206arguments multiple times.
207.PP
208Once the bin list has been found, it's fairly easy to search for an
209exact match. A simple search might look something like this:
210.VS
211item *lookup(item_table *t, const char *k)
212{
213 uint32 h = hash(k); /* Hash @k@ somehow */
214 hash_base **bin = HASH_BIN(&t->t, h);
215 hash_base *b;
216 for (b = *bin; b; b = b->next) {
217 item *i = (item *)b;
218 if (h == i->b.hash && strcmp(k, i->k) == 0)
219 return (i);
220 }
221 return (0);
222}
223.VE
224Insertion is also relatively trivial given the bin list head. Insertion
225may make the hashtable too large, so it might be necessary to extend
226it. Extension is performed by
227.BR hash_extend ,
228which is passed only the address of the hashtable. It returns nonzero
229if extension was successful.
230.VS
231item *add(item_table *t, const char *k, /* ... */)
232{
233 item *i;
234 uint32 h;
235 hash_base **bin;
236
237 /* --- See if the item is already there --- */
238
239 if ((i = = lookup(t, k)) != 0)
240 return (i);
241
242 /* --- Make a new hashtable item --- */
243
244 i = CREATE(item);
245 i->k = xstrdup(k);
246 /* ... */
247
248 /* --- Link it into the bin list --- */
249
250 h = i->b.hash = hash(k);
251 bin = HASH_BIN(&t->t, h);
252 i->b.next = *bin;
253 *bin = &i->b.next;
254
255 /* --- Maybe extend the hashtable --- */
256
257 if (t->load)
258 t->load--;
259 else if (hash_extend(&t->t))
260 t->load = recalc_load(t);
261
262 /* --- Done --- */
263
264 return (i);
265}
266.VE
267The
268.B sym
269implementation is rather more sophisticated in its approach but the idea
270is the same. In particular,
271.B sym
272provides a single interface for lookup and insertion which prevents the
273unnecessary rehashing performed by the above code.
274.PP
275Removal of items is more straightforward. The function
276.B hash_remove
277will unlink a given item from its bin list, after which point it is safe
278to remove.
279.SS "Iteration"
280Iteration allows code to be performed on all the items in a hashtable.
281This is done using an
282.I iterator
283object, represented by a
284.B hash_iter
285structure. An iterator is initialized by calling
286.BR hash_mkiter .
287Each call to
288.B hash_next
289yields a different item from the hashtable until there are none left, a
290condition signified by a null return value.
291.PP
292The macros
293.B HASH_MKITER
294and
295.B HASH_NEXT
296do the same jobs as the above functions. However,
297.B HASH_NEXT
298has a slightly bizarre argument passing convention: its second argument
299is an
300.I lvalue
301which is updated to contain the address of the next item.
302.PP
303The finalization code above contained an example of iteration.
304.SH "SEE ALSO"
305.BR sym (3),
306.BR mLib (3).
307.SH "AUTHOR"
9b5ac6ff 308Mark Wooding, <mdw@distorted.org.uk>