.\" -*-nroff-*-
.de hP
.IP
.ft B
\h'-\w'\\$1\ 'u'\\$1\ \c
.ft P
..
.ie t .ds o \(bu
.el .ds o o
.TH hashsum 1 "29 July 2000" "Straylight/Edgeware" "Catacomb cryptographic library"
.SH NAME
hashsum \- compute and verify cryptographic checksums of files
.SH SYNOPSIS
.B hashsum
.RB [ \-f0ecbpv ]
.RB [ \-a
.IR algorithm ]
.RB [ \-E
.IR encoding ]
.IR files ...
.SH DESCRIPTION
The
.B hashsum
program generates and verifies cryptographic checksums (hashes) of
files.  A number of hashing algorithms are available.
.PP
The
.B hashsum
program's options and output were originally designed to be upwardly
compatible with the GNU
.BR md5sum (1)
program, but the two have diverged somewhat.  See the
.B "COMPATIBILITY NOTES"
section of this manual for details.
.PP
Usually,
.B hashsum
generates checksums of a collection of files named either on the command
line or read from standard input, and writes their hashes to standard
output using a simple file format.  However, given the
.B \-c
option, it will read in files in its usual output format and verify that
the named files have the reported hashes.
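.PP
As a rough illustration of the two modes described above, here is a
minimal Python sketch.  It is not how
.B hashsum
itself is implemented.  The helper names
.BR hash_file ,
.B file_line
and
.B check_line
are hypothetical, and the sketch assumes hex encoding, a space for the
flag character, and an algorithm name which Python's
.B hashlib
module happens to recognize.
.PP
.nf
.eo
```python
import hashlib


def hash_file(path, alg="md5"):
    """Return the hex digest of the file at `path`, read in binary mode."""
    h = hashlib.new(alg)
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()


def file_line(path, alg="md5"):
    """Build one file line: the hash, a space, the flag, then the filename."""
    flag = " "  # assumption: a space flag; a star would mean binary mode
    return hash_file(path, alg) + " " + flag + path


def check_line(line, alg="md5"):
    """Return True if the file named on `line` still has the recorded hash."""
    digest, rest = line.rstrip("\n").split(" ", 1)
    path = rest[1:]  # skip the one-character flag
    return hash_file(path, alg) == digest
```
.ec
.fi
.PP
Generating a line with
.B file_line
and later passing it to
.B check_line
mirrors the generate-then-verify cycle; any change to the file in
between makes the check fail.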
.SS "Options"
The
.B hashsum
program understands the following options:
.TP
.B "\-h, \-\-help"
Prints a help message to standard output and exits successfully.
.TP
.B "\-V, \-\-version"
Prints the program's version number to standard output and exits
successfully.
.TP
.B "\-u, \-\-usage"
Prints a brief usage summary to standard output and exits successfully.
.TP
.BR "\-l, \-\-list " [ \fIitem ...]
Show lists of hash functions and encodings supported.
.TP
.BI "\-a, \-\-algorithm=" alg
Use the hash algorithm
.IR alg .
If this option is not given, a default hashing algorithm is selected:
see
.B "Hashing algorithms"
below.
.TP
.BI "\-E, \-\-encoding=" encoding
Use the given
.I encoding
to represent hashes in the output.  This is not interoperable with other
programs, but it's handy, e.g., for building sha1 URNs.  The encodings
recognized are
.B hex
(the default),
.B base64
and
.BR base32 .
Type
.B hashsum \-\-list enc
for a list of supported encodings.
.TP
.B "\-f, \-\-files"
Each input file is considered to be a list of filenames which should be
read and hashed.  By default, the filenames are considered to be
whitespace-separated, although control characters can be escaped (see
.B "Escaping control characters"
below).
.TP
.B "\-0, \-\-null"
In conjunction with the
.B \-f
option above, reads null-terminated filenames, as emitted by GNU
.BR find (1)'s
.B \-print0
option, rather than whitespace-delimited filenames.  If the
.B \-c
option is also given, each file named in the list is itself a list of
filename/hash pairs to be checked.
.TP
.B "\-e, \-\-escape"
Escape control characters (see
.B "Escaping control characters"
below) in filenames when generating output.  Escaped
output is not compatible with
.BR md5sum (1),
but copes better with files containing newlines and other strange
control characters.
.TP
.B "\-c, \-\-check"
Check hashes.  Each input file is assumed to be in
.BR hashsum 's
output format.  It is read, and
.B hashsum
will verify that each named file has the correct hash.  Assuming that
the hash list is authentic (e.g., it has been digitally signed, or
obtained via some secure medium), this provides strong assurance that
the files listed have not been tampered with.
.TP
.B "\-b, \-\-binary"
Assume that the files to be hashed are binary files.  This doesn't make
any difference on Unix systems, although it might on other platforms
which draw a distinction.
.TP
.B "\-p, \-\-progress"
Display a progress indicator while hashing large files.  The progress
indicator is written to standard error.
.TP
.B "\-v, \-\-verbose"
In conjunction with the
.B \-c
option above, be verbose when checking files.
.PP
If no filenames are given on the command line, standard input is read.
Standard input does not have a filename.
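.PP
To make the three encodings accepted by
.B \-E
concrete, the following Python sketch renders one digest in each of
them.  The function names
.B encode_digest
and
.B sha1_urn
are hypothetical.  The sketch uses Python's standard alphabets and
padding; whether
.B hashsum
pads or cases its output in the same way is not stated in this manual,
so treat that as an assumption.  The
.B urn:sha1:
prefix followed by a Base-32 digest is the usual convention for SHA-1
URNs, again assumed here rather than taken from this manual.
.PP
.nf
.eo
```python
import base64
import hashlib


def encode_digest(digest, encoding="hex"):
    """Represent raw digest bytes as hex, base64 or base32 text."""
    if encoding == "hex":
        return digest.hex()
    if encoding == "base64":
        return base64.b64encode(digest).decode("ascii")
    if encoding == "base32":
        return base64.b32encode(digest).decode("ascii")
    raise ValueError("unknown encoding: " + encoding)


def sha1_urn(data):
    """Build a SHA-1 URN from bytes (assumed form: urn:sha1:<base32>)."""
    digest = hashlib.sha1(data).digest()
    return "urn:sha1:" + encode_digest(digest, "base32")
```
.ec
.fi
.PP
A 20-byte SHA-1 digest becomes 40 hex characters but only 32 Base-32
characters, which is one reason the shorter encodings are convenient in
URNs.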
.SS "Output format"
There are three types of line in
.BR hashsum 's
output format:
.IR directives ,
.IR "file lines" ,
and
.IR rubbish .
.PP
A
.I directive
begins with a hash
.RB (` # ')
character.  These directives are currently understood:
.TP
.BI "#hash " alg
Subsequent hashes in this file were generated using the algorithm
.IR alg .
.TP
.BI "#encoding " encoding
Subsequent hashes in this file are represented using the named
.IR encoding .
.TP
.BI "#escape"
Filenames in subsequent lines are written using the `escaped' format,
described below.
.PP
A
.I "file line"
consists of a hash, in the requested encoding, followed by a space, a
.IR flag ,
and the filename.  The
.I flag
is either a star
.RB (` * ')
to indicate that the file should be read in binary mode, or a space.
The rest of the line contains the filename.
.PP
A
.I rubbish
line is one which doesn't look like a directive or a file line.  Rubbish
lines are ignored.  Hence, you can apply PGP clear-signing to a
.B hashsum
file without preventing it from being read.
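.PP
The three line types can be told apart mechanically.  The Python sketch
below is one way to do it, under stated assumptions: hashes are taken
to be lower-case hex, every line beginning with a hash character is
reported as a directive whether or not its name is one of those listed
above, and the name
.B classify_line
is hypothetical.  The rules
.B hashsum
really uses to recognize a file line may well differ.
.PP
.nf
.eo
```python
import re

# Assumption: hex-encoded hash, one space, a flag (space or star), filename.
FILE_LINE = re.compile("^([0-9a-f]+) ([ *])(.*)$")


def classify_line(line):
    """Classify one line as a directive, a file line, or rubbish."""
    line = line.rstrip("\n")
    if line.startswith("#"):
        # Split "#hash sha256" into the name "hash" and argument "sha256".
        name, _, arg = line[1:].partition(" ")
        return ("directive", name, arg)
    m = FILE_LINE.match(line)
    if m:
        binary = m.group(2) == "*"
        return ("file", m.group(1), binary, m.group(3))
    # Anything else, such as PGP armour headers, is ignored as rubbish.
    return ("rubbish",)
```
.ec
.fi
.PP
Because the PGP header and signature lines of a clear-signed file fall
into the rubbish case, a reader built this way skips them without any
special handling.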
.SS "Escaping control characters"
When reading filenames to hash from a list of files or an escaped hash
list, the following rules are obeyed:
.hP \*o
An escaped string cannot contain unescaped, unquoted whitespace
characters.  If such a character is found, the string is considered to
have ended.
.hP \*o
A backslash
.RB (` \e ')
escapes the following character.  If the character is one of
.RB ` a ',
.RB ` b ',
.RB ` f ',
.RB ` n ',
.RB ` r ',
.RB ` t ',
or
.RB ` v ',
it is replaced by the control character for an audible alert, backspace,
form-feed, newline, carriage return, horizontal tab or vertical tab
respectively; other escaped characters are unchanged, although they lose
any special meaning they might have had.
.hP \*o
A section of text may be quoted by surrounding it by
.BR ' ... ' ,
.BR """" ... """" ,
or
.BR ` ... '
pairs.  Within a quoted section, whitespace characters may appear
unescaped.  The backslash may be used to quote control characters or the
quoting characters as usual.
.hP \*o
A word beginning with a hash
.RB (` # ')
character is considered to begin a
.I comment
which extends to the end of the current line.  The hash character may be
escaped as usual.
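.PP
The rules above can be sketched as a small Python tokenizer.  This is
an illustration written from the description in this section, not code
taken from
.BR hashsum ;
the name
.B split_escaped
is hypothetical, and the treatment of a trailing lone backslash (kept
literally) and of an unterminated quote (runs to the end of the line)
are assumptions, since the text does not cover them.
.PP
.nf
.eo
```python
# Letters which, after a backslash, stand for control characters.
ESCAPES = {"a": "\a", "b": "\b", "f": "\f", "n": "\n",
           "r": "\r", "t": "\t", "v": "\v"}

# Each opening quote character mapped to the character which closes it.
CLOSERS = {"'": "'", '"': '"', "`": "'"}


def split_escaped(line):
    """Split one line into words, undoing escapes and quoting."""
    words = []
    i, n = 0, len(line)
    while i < n:
        # Skip the whitespace which separates words.
        while i < n and line[i].isspace():
            i += 1
        # A word beginning with an unescaped hash starts a comment.
        if i >= n or line[i] == "#":
            break
        out = []
        closer = None  # closing quote we are waiting for, if any
        while i < n:
            ch = line[i]
            if ch == "\\" and i + 1 < n:
                nxt = line[i + 1]
                out.append(ESCAPES.get(nxt, nxt))
                i += 2
                continue
            if closer is not None:
                if ch == closer:
                    closer = None
                else:
                    out.append(ch)
            elif ch in CLOSERS:
                closer = CLOSERS[ch]
            elif ch.isspace():
                break  # unescaped, unquoted whitespace ends the word
            else:
                out.append(ch)
            i += 1
        words.append("".join(out))
    return words
```
.ec
.fi
.PP
Applied to one line of a filename list, the function yields the
filenames with escapes already resolved, so a name containing a space
or a newline comes back as a single item.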
.SS "Hashing algorithms"
The
.B hashsum
program understands several hashing algorithms:
.TP
.BR md2
Designed by Ron Rivest in 1989 and described in
RFC1319, MD2 is a really old and slow hash function.  Its security is
suspect too: only its checksum stands between it and collision-finding
attacks.  Use of MD2 is not recommended, though it's still used in
various standards.
.TP
.BR md4 " and " md5
Designed by Ron Rivest in 1990 and 1992 respectively and described in
RFCs 1186, 1320 and 1321, these two early hash functions are efficient
but cryptographically suspect: the MD4 algorithm has been shown not to
be collision-resistant and there are `pseudo-collisions' in MD5.
Despite this,
.B md5
has been used heavily since its introduction and is still popular.  MD4
is still useful when a fast non-cryptographic hash is wanted.
.TP
.B sha
Designed by the US National Security Agency as part of the Digital
Signature Standard, SHA-1 provides a longer output than
.B md4
and
.BR md5 ,
and is seen as being more secure.
.TP
.BR rmd128 ", " rmd160 ", " rmd256 " and " rmd320
Designed by Antoon Bosselaers, Hans Dobbertin and Bart Preneel in 1996
as a replacement for the earlier RIPEMD algorithm, RIPEMD160 provides
the same length output as SHA-1, but has been designed in the open by
experts.  RIPEMD128 is a shortened version of RIPEMD160 designed as a
drop-in replacement for MD4, MD5 and the old RIPEMD.  The 256 and
320-bit versions are efficient double-width extensions of the 128 and
160-bit hashes, although they may not offer any additional security.
.TP
.B tiger
Designed by Ross Anderson and Eli Biham to take advantage of 64-bit
processors, Tiger seems to be an efficient and strong hash function.
It's a relatively new algorithm, however, and should probably be
approached with an open-minded caution.
.TP
.BR sha256 ", " sha384 " and " sha512
Designed by the US National Security Agency to provide security
commensurate with the Advanced Encryption Standard, these hash functions
provide long outputs.  SHA-256 is fairly quick, though the longer
variants are slower on 32-bit hardware since they require 64-bit
arithmetic.  They're all very new at the moment, and should be
approached with an open-minded caution.
.PP
The default hashing algorithm is determined by looking at the name by
which the program was invoked, as passed to it in
.BR argv[0] :
if it has the form
.RI ` alg \c
.BR sum '
where
.I alg
is the name of a hash function, that hash becomes the default.  (Hence,
.B hashsum
can be used as a drop-in replacement for
.BR md5sum (1).)
If the program name doesn't match an algorithm, then
.B md5
is selected for compatibility with files generated by
.BR md5sum (1).
.PP
Note that the same default algorithm is used for both generating new
output files and checking existing ones.  If the algorithm is forced by
the
.B \-a
option,
.B hashsum
will emit a
.RB ` #hash '
directive in its output.
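.PP
The name-based selection of the default algorithm can be sketched in a
few lines of Python.  The function name
.B default_algorithm
is hypothetical, the set of names is simply the list given in this
section, and stripping the directory part of the program name before
matching is an assumption about how the name is examined.
.PP
.nf
.eo
```python
import os

# Algorithm names as listed in this manual.
ALGORITHMS = {"md2", "md4", "md5", "sha", "rmd128", "rmd160", "rmd256",
              "rmd320", "tiger", "sha256", "sha384", "sha512"}


def default_algorithm(argv0):
    """Pick the default hash from a program name of the form <alg>sum."""
    name = os.path.basename(argv0)
    if name.endswith("sum"):
        alg = name[:-len("sum")]
        if alg in ALGORITHMS:
            return alg
    # No match: fall back to md5 for compatibility with md5sum files.
    return "md5"
```
.ec
.fi
.PP
Under this scheme a link named
.B sha256sum
selects
.BR sha256 ,
while the plain name
.B hashsum
falls through to
.B md5
because `hash' is not an algorithm name.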
.SH "COMPATIBILITY NOTES"
Once upon a time, there was only the
.BR md5sum (1)
utility.  As its name suggested, it calculated MD5 hashes of files.  MD5
was shown to be weak, so the author wrote
.B hashsum
to do the same job with other, hopefully stronger, hash functions.  The
original
.B hashsum
program tried hard to be compatible with GNU
.BR md5sum (1),
but the latter has itself changed in incompatible ways since then;
.B hashsum
has intentionally not changed to match.
.PP
The following
.B hashsum
features are not found in the GNU Coreutils hashing utilities.
.hP
Filename escaping (the
.B \-e
option).
.hP
Magic comment lines in hash data to indicate algorithm selection, hash
encoding, and filename escaping.
.hP
Base-64 and Base-32 output.
.PP
Other differences are as follows.
.hP
Originally, if GNU
.B md5sum
was invoked without any filename arguments, it would print only the hash
of its stdin to stdout, which was very convenient for scripts which
manipulate hashes in nontrivial ways.  This behaviour was later changed,
and now the GNU Coreutils hashing utilities always print a filename or
.RB ` \- '
after the hash.  The
.B hashsum
program follows the original
.B md5sum
behaviour, and doesn't print a filename if no files were listed on the
command line.
.SH "SEE ALSO"
.BR md5sum (1),
.BR dsig (1),
.BR catsign (1),
.BR catcrypt (1).
.SH "AUTHOR"
Mark Wooding, <mdw@distorted.org.uk>