mdw@git.distorted.org.uk Git - mLib/blob - hash/unihash.3.in

   1 .\" -*-nroff-*-
   2 .\"
   3 .\" Manual for universal hashing
   4 .\"
   5 .\" (c) 2003, 2005, 2007, 2009, 2023, 2024 Straylight/Edgeware
   6 .\"
   7 .
   8 .\"----- Licensing notice ---------------------------------------------------
   9 .\"
  10 .\" This file is part of the mLib utilities library.
  11 .\"
  12 .\" mLib is free software: you can redistribute it and/or modify it under
  13 .\" the terms of the GNU Library General Public License as published by
  14 .\" the Free Software Foundation; either version 2 of the License, or (at
  15 .\" your option) any later version.
  16 .\"
  17 .\" mLib is distributed in the hope that it will be useful, but WITHOUT
  18 .\" ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
  19 .\" FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Library General Public
  20 .\" License for more details.
  21 .\"
  22 .\" You should have received a copy of the GNU Library General Public
  23 .\" License along with mLib.  If not, write to the Free Software
  24 .\" Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307,
  25 .\" USA.
  26 .
  27 .\"--------------------------------------------------------------------------
  28 .so ../defs.man \" @@@PRE@@@
  29 .
  30 .\"--------------------------------------------------------------------------
  31 .TH unihash 3mLib "5 July 2003" "Straylight/Edgeware" "mLib utilities library"
  32 .\" @unihash_setkey
  33 .\" @UNIHASH_INIT
  34 .\" @unihash_hash
  35 .\" @UNIHASH
  36 .\" @unihash
  37 .
  38 .\"--------------------------------------------------------------------------
  39 .SH NAME
  40 unihash \- simple and efficient universal hashing for hashtables
  41 .
  42 .\"--------------------------------------------------------------------------
  43 .SH SYNOPSIS
  44 .
  45 .nf
  46 .B "#include <mLib/unihash.h>"
  47 .PP
  48 .B "typedef struct { ...\& } unihash_info;"
  49 .PP
  50 .B "unihash_info unihash_global;"
  51 .PP
  52 .BI "void unihash_setkey(unihash_info *" i ", uint32 " k );
  53 .BI "uint32 UNIHASH_INIT(const unihash_info *" i );
  54 .ta \w'\fBuint32 unihash_hash('u
  55 .BI "uint32 unihash_hash(const unihash_info *" i ", uint32 " a ,
  56 .BI "   const void *" p ", size_t " sz );
  57 .BI "uint32 unihash(const unihash_info *" i ", const void *" p ", size_t " sz );
  58 .BI "uint32 UNIHASH(const unihash_info *" i ", const void *" p ", size_t " sz );
  59 .fi
  60 .
  61 .\"--------------------------------------------------------------------------
  62 .SH DESCRIPTION
  63 The
  64 .B unihash
  65 system implements a simple and relatively efficient
  66 .IR "universal hashing family" .
  67 Using a such a universal hashing family means that it's provably
  68 difficult for an adversary to choose input data whose hashes collide,
  69 thus guaranteeing good average performance even on maliciously chosen
  70 data.
  71 .PP
  72 Unlike, say,
  73 .BR crc32 (3),
  74 the
  75 .B unihash
  76 function is
  77 .I keyed
  78 \- in addition to the data to be hashed, the function takes as input a
  79 32-bit key.  This key should be chosen at random each time the program
  80 runs.
  81 .
  82 .SS "Preprocessing a key"
  83 Before use, a key must be
  84 .I preprocessed
  85 into a large (16K) table which is used by the main hashing functions.
  86 The preprocessing is done by
  87 .BR unihash_setkey :
  88 pass it a pointer to a
  89 .B unihash_info
  90 structure and the 32-bit key you've chosen, and it stores the table in
  91 the structure.
  92 .PP
  93 Objects of type
  94 .B unihash_info
  95 don't contain any pointers to other data and are safe to free when
  96 you've finished with them; or you can just allocate them statically or
  97 on the stack if that's more convenient.
  98 .
  99 .SS "Hashing data"
 100 The function
 101 .B unihash_hash
 102 takes as input:
 103 .TP
 104 .BI "const unihash_info *" i
 105 A pointer to the precomputed tables for a key.
 106 .TP
 107 .BI "uint32 " a
 108 An accumulator value.  This should be
 109 .BI UNIHASH_INIT( i )
 110 for the first chunk of a multi-chunk input, or the result of the
 111 previous
 112 .B unihash_hash
 113 call for subsequent chunks.
 114 .TP
 115 .BI "const void *" p
 116 A pointer to the start of a buffer containing this chunk of data.
 117 .TP
 118 .BI "size_t " sz
 119 The length of the chunk.
 120 .PP
 121 The function returns a new accumulator value, which is also the hash of
 122 the data so far. So, to hash multiple chunks of data, do something like
 123 .VS
 124 uint32 a = UNIHASH_INIT(i);
 125 a = unihash_hash(i, a, p_0, sz_0);
 126 a = unihash_hash(i, a, p_1, sz_1);
 127 /* ...\& */
 128 a = unihash_hash(i, a, p_n, sz_n);
 129 .VE
 130 The macro
 131 .B UNIHASH
 132 and function
 133 .B unihash
 134 are convenient interfaces to
 135 .B unihash_hash
 136 if you only wanted to hash one chunk.
 137 .
 138 .SS "Global hash info table"
 139 There's no problem with using the same key for several purposes, as long
 140 as it's secret from all of your adversaries.  Therefore, there is a
 141 global
 142 .B unihash_info
 143 table define, called
 144 .BR unihash_global .
 145 This initially contains information for a fixed key which the author
 146 chose at random, but if you need to you can set a different key into it
 147 .I before
 148 it gets used to hash any data (otherwise your hash tables will become
 149 messed up).
 150 .
 151 .SS "Theoretical issues"
 152 The hash function implemented by
 153 .B unihash
 154 is
 155 .RI ( l \ +\ 1)/2\*(ss32\*(se-almost
 156 XOR-universal, where
 157 .I l
 158 is the length (in bytes) of the longest string you hash.  That means
 159 that, for any pair of strings
 160 .I x
 161 and
 162 .I y
 163 and any 32-bit value \*(*d, the probability taken over all choices of the
 164 key
 165 .I k
 166 that
 167 .IR H\*(usk\*(ue ( x )\  \c
 168 .BR xor \c
 169 .RI \  H\*(usk\*(ue ( y )\ =\ \*(*d
 170 is no greater than
 171 .RI ( l \ +\ 1)/2\*(ss32\*(se.
 172 .PP
 173 This fact is proven in the header file, but it requires more
 174 sophisticated typesetting than is available here.
 175 .PP
 176 The function evaluates a polynomial over GF(2\*(ss32\*(se) whose
 177 coefficients are the bytes of the message and whose variable is the key.
 178 Details are given in the header file.
 179 .PP
 180 For best results, you should choose the key as a random 32-bit number
 181 each time your program starts.  Choosing a different key for different
 182 hashtables isn't necessary.  It's probably a good idea to avoid the keys
 183 0 and 1.  This raises the collision bound to
 184 .RI ( l \ +\ 1)/(2\*(ss32\*(se\ \-\ 2)
 185 (which isn't a significant increase) but eliminates keys for which the
 186 hash's behaviour is particularly poor.
 187 .PP
 188 In tests,
 189 .B unihash
 190 actually performed better than
 191 .BR crc32 ,
 192 so if you want to just use it as a fast-ish hash with good statistical
 193 properties, choose some fixed key
 194 .IR k \ \*(>=\ 2.
 195 .PP
 196 We emphasize that the proof of this function's collision behaviour is
 197 .I not
 198 dependent on any unproven assumptions (unlike many `proofs' of
 199 cryptographic security, which actually reduce the security of some
 200 construction to the security of its components).  It's just a fact.
 201 .
 202 .SS "Practical issues"
 203 The implementation of
 204 .B unihash
 205 uses a (fairly large) table precomputed from the key.
 206 When a message is hashed,
 207 some of the message data
 208 and internal state of the hashing operation
 209 are leaked to other processes on the same hardware
 210 through the processor cache
 211 and other stateful microarchitectural features.
 212 It's possible for an adversary to determine the hashing key
 213 by observing this leakage.
 214 This is unlikely to be a major concern
 215 since local processes have other, more effective ways to deny service;
 216 but if is then
 217 .BR siphash (3)
 218 may be more appropriate.
 219 See that manual page for a comparison of the two.
 220 .
 221 .\"--------------------------------------------------------------------------
 222 .SH SEE ALSO
 223 .
 224 .BR unihash-mkstatic (1),
 225 .BR crc32 (3),
 226 .BR siphash (3),
 227 .BR mLib (3).
 228 .
 229 .\"--------------------------------------------------------------------------
 230 .SH AUTHOR
 231 .
 232 Mark Wooding (mdw@distorted.org.uk).
 233 .
 234 .\"----- That's all, folks --------------------------------------------------