\h'-\w'\\$1\ 'u'\\$1\ \c
.TH unihash 3 "5 July 2003" "Straylight/Edgeware" "mLib utilities library"
unihash \- simple and efficient universal hashing for hashtables
.B "#include <mLib/unihash.h>"
.B "unihash_info unihash_global;"
.BI "void unihash_setkey(unihash_info *" i ", uint32 " k );
.BI "uint32 UNIHASH_INIT(const unihash_info *" i );
.BI "uint32 unihash_hash(const unihash_info *" i ", uint32 " a ,
.BI " const void *" p ", size_t " sz );
.BI "uint32 unihash(const unihash_info *" i ", const void *" p ", size_t " sz );
.BI "uint32 UNIHASH(const unihash_info *" i ", const void *" p ", size_t " sz );
system implements a simple and relatively efficient
.IR "universal hashing family" .
Using such a universal hashing family means that it's provably
difficult for an adversary to choose input data whose hashes collide,
thus guaranteeing good average performance even on maliciously chosen
\- in addition to the data to be hashed, the function takes as input a
32-bit key. This key should be chosen at random each time the program
.SS "Preprocessing a key"
Before use, a key must be
into a large (16K) table which is used by the main hashing functions.
The preprocessing is done by
pass it a pointer to a
structure and the 32-bit key you've chosen, and it stores the table in
don't contain any pointers to other data and are safe to free when
you've finished with them; or you can just allocate them statically or
on the stack if that's more convenient.
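.PP
The layout of the 16K table is internal to the library and isn't described
here. As a rough illustration of why preprocessing pays off, the C sketch
below (all names hypothetical, not part of mLib) precomputes a smaller 4K
table so that multiplying a 32-bit accumulator by the key in
GF(2\*(ss32\*(se) costs four table lookups instead of a 32-step
shift-and-add loop. The reduction polynomial
x\*(ss32\*(se\ +\ x\*(ss22\*(se\ +\ x\*(ss2\*(se\ +\ x\ +\ 1
is an assumption chosen for the example; the library's own field may differ.

```c
#include <stdint.h>

/* ASSUMPTION: GF(2^32) = GF(2)[t]/(t^32 + t^22 + t^2 + t + 1); these are the
 * low 32 bits of that polynomial.  Illustrative only. */
#define SKETCH_POLY 0x00400007u

/* Multiply x by y in GF(2^32) by shift-and-add (at most 32 steps). */
static uint32_t gf32_mul(uint32_t x, uint32_t y)
{
  uint32_t r = 0;

  while (y) {
    if (y & 1) r ^= x;                  /* add x if this bit of y is set */
    y >>= 1;
    x = (x << 1) ^ ((x & 0x80000000u) ? SKETCH_POLY : 0);  /* x := x * t */
  }
  return r;
}

/* Hypothetical preprocessed key: tab[j][c] = (c * t^(8j)) * k, 4K in all. */
typedef struct { uint32_t tab[4][256]; } sketch_info;

static void sketch_setkey(sketch_info *i, uint32_t k)
{
  for (int j = 0; j < 4; j++)
    for (uint32_t c = 0; c < 256; c++)
      i->tab[j][c] = gf32_mul(c << (8 * j), k);
}

/* a * k with four lookups: multiplication by k is linear over GF(2), so the
 * product is the XOR of the products of a's four bytes. */
static uint32_t sketch_mulkey(const sketch_info *i, uint32_t a)
{
  return i->tab[0][a & 0xff] ^ i->tab[1][(a >> 8) & 0xff] ^
         i->tab[2][(a >> 16) & 0xff] ^ i->tab[3][a >> 24];
}
```

The table depends only on the key, which is why it can be computed once
and reused for every subsequent hash.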
.BI "const unihash_info *" i
A pointer to the precomputed tables for a key.
An accumulator value. This should be
.BI UNIHASH_INIT( i )
for the first chunk of a multi-chunk input, or the result of the
call for subsequent chunks.
A pointer to the start of a buffer containing this chunk of data.
The length of the chunk.
The function returns a new accumulator value, which is also the hash of
the data so far. So, to hash multiple chunks of data, do something like
uint32 a = UNIHASH_INIT(i);
a = unihash_hash(i, a, p_0, sz_0);
a = unihash_hash(i, a, p_1, sz_1);
a = unihash_hash(i, a, p_n, sz_n);
are convenient interfaces to
if you only want to hash one chunk.
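.PP
To make the accumulator convention concrete, here is a self-contained C
sketch of a polynomial hash with the same shape as
.BR unihash_hash ,
.B UNIHASH_INIT
and
.BR unihash .
It is not mLib's implementation: the names are hypothetical, the field
polynomial is assumed, and the initial accumulator is assumed to be the
key itself. It shows why feeding the data in chunks gives the same result
as hashing it in one go.

```c
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: GF(2^32) = GF(2)[t]/(t^32 + t^22 + t^2 + t + 1). */
#define SKETCH_POLY 0x00400007u

/* Multiply x by y in GF(2^32) by shift-and-add. */
static uint32_t gf32_mul(uint32_t x, uint32_t y)
{
  uint32_t r = 0;

  while (y) {
    if (y & 1) r ^= x;                  /* add x if this bit of y is set */
    y >>= 1;
    x = (x << 1) ^ ((x & 0x80000000u) ? SKETCH_POLY : 0);  /* x := x * t */
  }
  return r;
}

/* Hypothetical analogue of unihash_hash(): fold sz bytes into accumulator a
 * by Horner's rule, a := (a XOR byte) * k, one byte at a time. */
static uint32_t sketch_hash(uint32_t k, uint32_t a, const void *p, size_t sz)
{
  const unsigned char *q = p;

  while (sz--) a = gf32_mul(a ^ *q++, k);
  return a;
}

/* Hypothetical analogue of UNIHASH_INIT(): ASSUMED to be the key itself. */
static uint32_t sketch_init(uint32_t k) { return k; }

/* Hypothetical analogue of unihash(): the one-chunk convenience wrapper. */
static uint32_t sketch_unihash(uint32_t k, const void *p, size_t sz)
{
  return sketch_hash(k, sketch_init(k), p, sz);
}
```

Each call needs only the previous accumulator, so hashing `hel' and then
`lo' yields exactly the hash of `hello'.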
.SS "Global hash info table"
There's no problem with using the same key for several purposes, as long
as it's secret from all of your adversaries. Therefore, there is a
This initially contains information for a fixed key which the author
chose at random, but, if you need to, you can set a different key into it
it gets used to hash any data (otherwise your hash tables will become
.SS "Theoretical issues"
The hash function implemented by
.RI ( l \ +\ 1)/2\*(ss32\*(se-almost
is the length (in bytes) of the longest string you hash. That means
that, for any pair of strings
and any 32-bit value \*(*d, the probability taken over all choices of the
.IR H\*(usk\*(ue ( x )\ \c
.RI \ H\*(usk\*(ue ( y )\ =\ \*(*d
.RI ( l \ +\ 1)/2\*(ss32\*(se.
This fact is proven in the header file, but it requires more
sophisticated typesetting than is available here.
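.PP
As a worked example of the bound: if no string you hash is longer than
4095 bytes, then
.I l
=\ 4095 and the probability of any particular collision is at most
4096/2\*(ss32\*(se\ =\ 2\*(ss\-20\*(se, about one in a million. The two
helpers below are hypothetical (not part of mLib) and simply evaluate the
bound with and without the weak keys 0 and 1 excluded.

```c
#include <stddef.h>

/* Collision bound (l + 1)/2^32 for strings of at most l bytes, with the key
 * drawn uniformly from all 2^32 values.  Hypothetical helper. */
static double unihash_bound_sketch(size_t l)
{
  return ((double)l + 1.0) / 4294967296.0;   /* 2^32 */
}

/* The bound (l + 1)/(2^32 - 2) that applies when keys 0 and 1 are excluded. */
static double unihash_bound_no01_sketch(size_t l)
{
  return ((double)l + 1.0) / 4294967294.0;   /* 2^32 - 2 */
}
```

Excluding two keys changes the bound only in about the tenth significant
digit.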
The function evaluates a polynomial over GF(2\*(ss32\*(se) whose
coefficients are the bytes of the message and whose variable is the key.
Details are given in the header file.
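.PP
The following C sketch spells out one plausible reading of that
description; it is not the code in the header file. It assumes the field
polynomial
x\*(ss32\*(se\ +\ x\*(ss22\*(se\ +\ x\*(ss2\*(se\ +\ x\ +\ 1
and an accumulator that starts at the key
.IR k ,
so that an
.IR l -byte
string gives a polynomial of degree
.IR l \ +\ 1
in
.IR k .
It evaluates that polynomial once directly from powers of the key and once
by Horner's rule. In this sketch key 1 collapses the hash to an XOR of the
bytes, so reordered strings collide, and key 0 sends every string to 0.

```c
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: GF(2^32) = GF(2)[t]/(t^32 + t^22 + t^2 + t + 1). */
#define SKETCH_POLY 0x00400007u

static uint32_t gf32_mul(uint32_t x, uint32_t y)
{
  uint32_t r = 0;

  while (y) {
    if (y & 1) r ^= x;                  /* add x if this bit of y is set */
    y >>= 1;
    x = (x << 1) ^ ((x & 0x80000000u) ? SKETCH_POLY : 0);  /* x := x * t */
  }
  return r;
}

/* Direct evaluation of
 *   H_k(x) = k^(l+1) + x_0 k^l + x_1 k^(l-1) + ... + x_(l-1) k
 * where + is XOR.  The leading k^(l+1) term is an ASSUMPTION. */
static uint32_t poly_direct(uint32_t k, const unsigned char *x, size_t l)
{
  uint32_t h = 0, kp = k;               /* kp runs through k^1, k^2, ... */

  for (size_t i = l; i-- > 0; ) {       /* x_(l-1) gets k^1, x_0 gets k^l */
    h ^= gf32_mul(x[i], kp);
    kp = gf32_mul(kp, k);
  }
  return h ^ kp;                        /* kp is now k^(l+1) */
}

/* The same polynomial by Horner's rule, as a streaming hash computes it. */
static uint32_t poly_horner(uint32_t k, const unsigned char *x, size_t l)
{
  uint32_t a = k;

  for (size_t i = 0; i < l; i++) a = gf32_mul(a ^ x[i], k);
  return a;
}
```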
For best results, you should choose the key as a random 32-bit number
each time your program starts. Choosing a different key for different
hashtables isn't necessary. It's probably a good idea to avoid the keys
0 and 1. This raises the collision bound to
.RI ( l \ +\ 1)/(2\*(ss32\*(se\ \-\ 2)
(which isn't a significant increase) but eliminates keys for which the
hash's behaviour is particularly poor.
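.PP
A minimal sketch of choosing a key at startup, assuming a POSIX-like
system that provides
.BR /dev/urandom ;
the function name is hypothetical. It redraws until the key is neither
0 nor 1, and falls back to a weak, time-based value only if the random
device can't be read, which is not adequate against a real adversary.
The result would then be passed to
.BR unihash_setkey .

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: a fresh random key that is neither 0 nor 1. */
static uint32_t choose_key_sketch(void)
{
  uint32_t k = 0;
  FILE *fp = fopen("/dev/urandom", "rb");   /* ASSUMED to exist */

  if (fp) {
    /* Rejection sampling: redraw until the key is neither 0 nor 1. */
    do {
      if (fread(&k, sizeof k, 1, fp) != 1) { k = 0; break; }
    } while (k == 0 || k == 1);
    fclose(fp);
  }
  if (k == 0 || k == 1) {
    /* Weak fallback: predictable, NOT suitable against an adversary. */
    k = (uint32_t)time(NULL) ^ 0x9e3779b9u;
    if (k == 0 || k == 1) k = 0x9e3779b9u;
  }
  return k;
}
```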
actually performed better than
so if you just want to use it as a fast-ish hash with good statistical
properties, choose some fixed key
We emphasize that the proof of this function's collision behaviour is
dependent on any unproven assumptions (unlike many `proofs' of
cryptographic security, which actually reduce the security of some
construction to the security of its components). It's just a fact.
.BR unihash-mkstatic (3),
Mark Wooding (mdw@distorted.org.uk).