@@@ adjust bench timings
[mLib] / hash / unihash.3
CommitLineData
8fe3c82b 1.\" -*-nroff-*-
2.de VS
3.sp 1
4.RS
5.nf
6.ft B
7..
8.de VE
9.ft R
10.fi
11.RE
12.sp 1
13..
14.de hP
15.IP
16.ft B
17\h'-\w'\\$1\ 'u'\\$1\ \c
18.ft P
19..
20.ie t \{\
21. ds ss \s8\u
22. ds se \d\s0
23. ds us \s8\d
24. ds ue \u\s0
25. ds *d \(*d
26. ds >= \(>=
27.\}
28.el \{\
29. ds ss ^
30. ds se
31. ds us _
32. ds ue
33. ds *d \fIdelta\fP
34. ds >= >=
35.\}
36.TH unihash 3 "5 July 2003" "Straylight/Edgeware" "mLib utilities library"
37.SH NAME
38unihash \- simple and efficient universal hashing for hashtables
39.\" @unihash_setkey
40.\" @UNIHASH_INIT
41.\" @unihash_hash
42.\" @UNIHASH
43.\" @unihash
44.SH SYNOPSIS
45.nf
46.B "#include <mLib/unihash.h>"
47
4729aa69
MW
48.B "typedef struct { ...\& } unihash_info;"
49
6f444bda 50.B "unihash_info unihash_global;"
51
8fe3c82b 52.BI "void unihash_setkey(unihash_info *" i ", uint32 " k );
53.BI "uint32 UNIHASH_INIT(const unihash_info *" i );
3bc42912 54.BI "uint32 unihash_hash(const unihash_info *" i ", uint32 " a ,
55.BI " const void *" p ", size_t " sz );
8fe3c82b 56.BI "uint32 unihash(const unihash_info *" i ", const void *" p ", size_t " sz );
57.BI "uint32 UNIHASH(const unihash_info *" i ", const void *" p ", size_t " sz );
58.fi
59.SH DESCRIPTION
60The
61.B unihash
62system implements a simple and relatively efficient
63.IR "universal hashing family" .
64Using a such a universal hashing family means that it's provably
65difficult for an adversary to choose input data whose hashes collide,
66thus guaranteeing good average performance even on maliciously chosen
67data.
68.PP
69Unlike, say,
70.BR crc32 (3),
71the
72.B unihash
73function is
74.I keyed
75\- in addition to the data to be hashed, the function takes as input a
7632-bit key. This key should be chosen at random each time the program
77runs.
78.SS "Preprocessing a key"
79Before use, a key must be
80.I preprocessed
81into a large (16K) table which is used by the main hashing functions.
82The preprocessing is done by
83.BR unihash_setkey :
84pass it a pointer to a
85.B unihash_info
86structure and the 32-bit key you've chosen, and it stores the table in
87the structure.
88.PP
89Objects of type
90.B unihash_info
91don't contain any pointers to other data and are safe to free when
92you've finished with them; or you can just allocate them statically or
93on the stack if that's more convenient.
94.SS "Hashing data"
95The function
96.B unihash_hash
97takes as input:
98.TP
99.BI "const unihash_info *" i
100A pointer to the precomputed tables for a key.
101.TP
d4efbcd9 102.BI "uint32 " a
8fe3c82b 103An accumulator value. This should be
104.BI UNIHASH_INIT( i )
105for the first chunk of a multi-chunk input, or the result of the
106previous
107.B unihash_hash
108call for subsequent chunks.
109.TP
110.BI "const void *" p
111A pointer to the start of a buffer containing this chunk of data.
112.TP
113.BI "size_t " sz
114The length of the chunk.
115.PP
116The function returns a new accumulator value, which is also the hash of
117the data so far. So, to hash multiple chunks of data, do something like
118.VS
119uint32 a = UNIHASH_INIT(i);
120a = unihash_hash(i, a, p_0, sz_0);
121a = unihash_hash(i, a, p_1, sz_1);
122/* ... */
123a = unihash_hash(i, a, p_n, sz_n);
124.VE
125The macro
126.B UNIHASH
127and function
128.B unihash
129are convenient interfaces to
130.B unihash_hash
131if you only wanted to hash one chunk.
6f444bda 132.SS "Global hash info table"
133There's no problem with using the same key for several purposes, as long
134as it's secret from all of your adversaries. Therefore, there is a
135global
136.B unihash_info
137table define, called
138.BR unihash_global .
139This initially contains information for a fixed key which the author
140chose at random, but if you need to you can set a different key into it
141.I before
142it gets used to hash any data (otherwise your hash tables will become
143messed up).
8fe3c82b 144.SS "Theoretical issues"
145The hash function implemented by
146.B unihash
147is
148.RI ( l \ +\ 1)/2\*(ss32\*(se-almost
149XOR-universal, where
150.I l
151is the length (in bytes) of the longest string you hash. That means
152that, for any pair of strings
153.I x
154and
155.I y
156and any 32-bit value \*(*d, the probability taken over all choices of the
157key
158.I k
159that
160.IR H\*(usk\*(ue ( x )\ \c
161.BR xor \c
162.RI \ H\*(usk\*(ue ( y )\ =\ \*(*d
163is no greater than
164.RI ( l \ +\ 1)/2\*(ss32\*(se.
165.PP
166This fact is proven in the header file, but it requires more
167sophisticated typesetting than is available here.
168.PP
169The function evaluates a polynomial over GF(2\*(ss32\*(se) whose
170coefficients are the bytes of the message and whose variable is the key.
171Details are given in the header file.
172.PP
173For best results, you should choose the key as a random 32-bit number
174each time your program starts. Choosing a different key for different
175hashtables isn't necessary. It's probably a good idea to avoid the keys
1760 and 1. This raises the collision bound to
177.RI ( l \ +\ 1)/(2\*(ss32\*(se\ \-\ 2)
178(which isn't a significant increase) but eliminates keys for which the
179hash's behaviour is particularly poor.
180.PP
181In tests,
182.B unihash
183actually performed better than
184.BR crc32 ,
185so if you want to just use it as a fast-ish hash with good statistical
186properties, choose some fixed key
187.IR k \ \*(>=\ 2.
188.PP
189We emphasize that the proof of this function's collision behaviour is
190.I not
191dependent on any unproven assumptions (unlike many `proofs' of
192cryptographic security, which actually reduce the security of some
193construction to the security of its components). It's just a fact.
194.SH SEE ALSO
6f444bda 195.BR unihash-mkstatic (3),
8fe3c82b 196.BR crc32 (3),
197.BR mLib (3).
198.SH AUTHOR
9b5ac6ff 199Mark Wooding (mdw@distorted.org.uk).