.\" -*-nroff-*-
.de hP
.IP
.ft B
\h'-\w'\\$1\ 'u'\\$1\ \c
.ft P
..
.ie t .ds o \(bu
.el .ds o o
.TH hashsum 1 "29 July 2000" "Straylight/Edgeware" "Catacomb cryptographic library"
.SH NAME
hashsum \- compute and verify cryptographic checksums of files
.SH SYNOPSIS
.B hashsum
.RB [ \-f0ecbjpv ]
.RB [ \-a
.IR algorithm ]
.RB [ \-E
.IR encoding ]
.IR files ...
.SH DESCRIPTION
The
.B hashsum
program generates and verifies cryptographic checksums (hashes) of
files. A number of hashing algorithms are available.
.PP
The
.B hashsum
program's options and output were originally designed to be upwardly
compatible with the GNU
.BR md5sum (1)
program, but the two have diverged somewhat. See the
.B "COMPATIBILITY NOTES"
section of this manual for details.
.PP
Usually,
.B hashsum
generates checksums of a collection of files named either on the command
line or read from standard input, and writes their hashes to standard
output using a simple file format. However, given the
.B \-c
option, it will read in files in its usual output format and verify that
the named files have the reported hashes.
.SS "Options"
The
.B hashsum
program understands the following options:
.TP
.B "\-h, \-\-help"
Prints a help message to standard output and exits successfully.
.TP
.B "\-V, \-\-version"
Prints the program's version number to standard output and exits
successfully.
.TP
.B "\-u, \-\-usage"
Prints a brief usage summary to standard output and exits successfully.
.TP
.BR "\-l, \-\-list " [ \fIitem ...]
Show lists of hash functions and encodings supported.
.TP
.BI "\-a, \-\-algorithm=" alg
Use the hash algorithm
.IR alg .
If this option is not given, a default hashing algorithm is selected:
see
.B "Hashing algorithms"
below.
.TP
.BI "\-E, \-\-encoding=" encoding
Use the given
.I encoding
to represent hashes in the output. This is not interoperable with other
programs, but it's handy, e.g., for building sha1 URNs. The encodings
recognized are
.B hex
(the default),
.B base64
and
.BR base32 .
Type
.B hashsum \-\-list enc
for a list of supported encodings.
.TP
.B "\-f, \-\-files"
Each input file is considered to be a list of filenames which should be
read and hashed. By default, the filenames are considered to be
whitespace-separated, although control characters can be escaped (see
.B "Escaping control characters"
below).
.TP
.B "\-0, \-\-null"
In conjunction with the
.B \-f
option above, reads null-terminated filenames, as emitted by GNU
.BR find (1)'s
.B \-print0
option, rather than whitespace-delimited filenames. If the
.B \-c
option is also given, each file named in the list is a list of
filename/hash pairs to be checked.
.TP
.B "\-e, \-\-escape"
Escape control characters (see
.B "Escaping control characters"
below) in filenames when generating output. Escaped
output is not compatible with
.BR md5sum (1),
but copes better with files containing newlines and other strange
control characters.
.TP
.B "\-c, \-\-check"
Check hashes. Each input file is assumed to be in
.BR hashsum 's
output format. It is read, and
.B hashsum
will verify that each named file has the correct hash. Assuming that
the hash list is authentic (e.g., it has been digitally signed, or
obtained via some secure medium), this provides strong assurance that
the files listed have not been tampered with.
.TP
.B "\-j, \-\-junk"
Report files whose hashes have not been checked. This is most useful in
conjunction with
.RB ` \-c ',
though it's valid without. The program merely prints warnings about
junk files when computing hashes, but will exit nonzero if any are found
when checking them.
.TP
.B "\-b, \-\-binary"
Assume that the files to be hashed are binary files. This doesn't make
any difference on Unix systems, although it might on other platforms
which draw a distinction.
.TP
.B "\-p, \-\-progress"
Display a progress indicator while hashing large files. The progress
indicator is written to standard error.
.TP
.B "\-v, \-\-verbose"
In conjunction with the
.B \-c
option above, be verbose when checking files.
.PP
If no filenames are given on the command line, standard input is read.
Standard input does not have a filename.
.SS "Output format"
There are three types of line in
.BR hashsum 's
output format:
.IR directives ,
.IR "file lines" ,
and
.IR rubbish .
.PP
A
.I directive
begins with a hash
.RB (` # ')
character. These directives are currently understood:
.TP
.BI "#hash " alg
Subsequent hashes in this file were generated using the algorithm
.IR alg .
.TP
.BI "#encoding " encoding
Subsequent hashes in this file are represented using the named
.IR encoding .
.TP
.BI "#escape"
Filenames in subsequent lines are written using the `escaped' format,
described below.
.PP
A
.I "file line"
consists of a hash, in the requested encoding, followed by a space, a
.IR flag ,
and the filename. The
.I flag
is either a star
.RB (` * ')
to indicate that the file should be read in binary mode, or a space.
The rest of the line contains the filename.
.PP
A
.I rubbish
line is one which doesn't look like a directive or a file line. Rubbish
lines are ignored. Hence, you can apply PGP clear-signing to a
.B hashsum
file without preventing it from being read.
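.PP
As an illustration only, the three line types described above could be
told apart roughly as in the following Python sketch. It is not taken
from the
.B hashsum
source: the function name
.B classify
is hypothetical, unknown
.RB ` # '
lines are assumed to be rubbish, and no check is made that the hash has
a plausible length for the algorithm in use.
.PP
.nf
```python
import re

# Directives named in this manual; anything else starting with '#' is
# assumed (for this sketch) to be rubbish.
DIRECTIVE = re.compile(r"^#(hash|encoding|escape)(?:\s+(\S.*))?$")

# A file line: hash, one space, a flag ('*' or space), then the filename.
FILE_LINE = re.compile(r"^(\S+) ([ *])(.+)$")


def classify(line):
    """Classify one line of a hash list (hypothetical helper)."""
    line = line.rstrip("\n")
    m = DIRECTIVE.match(line)
    if m:
        return ("directive", m.group(1), m.group(2))
    m = FILE_LINE.match(line)
    if m:
        digest, flag, name = m.groups()
        return ("file", digest, flag == "*", name)
    return ("rubbish", line)
```
.fi
.PP
With this sketch, the armour lines added by PGP clear-signing match
neither pattern and so are classified as rubbish.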
.SS "Escaping control characters"
When reading filenames to hash from a list of files or an escaped hash
list, the following rules are obeyed:
.hP \*o
An escaped string cannot contain unescaped, unquoted whitespace
characters. If such a character is found, the string is considered to
have ended.
.hP \*o
A backslash
.RB (` \e ')
escapes the following character. If the character is one of
.RB ` a ',
.RB ` b ',
.RB ` f ',
.RB ` n ',
.RB ` r ',
.RB ` t ',
or
.RB ` v ',
it is replaced by the control character for an audible alert, backspace,
form-feed, newline, carriage return, horizontal tab or vertical tab
respectively; other escaped characters are unchanged, although they lose
any special meaning they might have had.
.hP \*o
A section of text may be quoted by surrounding it with
.BR ' ... ' ,
.BR """" ... """" ,
or
.BR ` ... '
pairs. Within a quoted section, whitespace characters may appear
unescaped. The backslash may be used to quote control characters or the
quoting characters as usual.
.hP \*o
A word beginning with a hash
.RB (` # ')
character is considered to begin a
.I comment
which extends to the end of the current line. The hash character may be
escaped as usual.
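.PP
The rules above can be sketched in Python as follows. This is an
illustrative reading of the rules, not the parser used by
.BR hashsum :
the function name
.B split_escaped
is hypothetical, and the handling of a backslash at the very end of a
line (kept literally here) is an assumption.
.PP
.nf
```python
# Letters which, after a backslash, stand for a control character.
ESCAPES = {"a": "\a", "b": "\b", "f": "\f", "n": "\n",
           "r": "\r", "t": "\t", "v": "\v"}

# Opening quote character -> the character which closes it.
CLOSERS = {"'": "'", '"': '"', "`": "'"}


def split_escaped(line):
    """Split one line into unescaped words (hypothetical helper)."""
    words, cur = [], []
    in_word = False
    closer = None  # closing quote we are waiting for, if any
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line):
            # Backslash: translate known letters, keep others literally.
            nxt = line[i + 1]
            cur.append(ESCAPES.get(nxt, nxt))
            in_word = True
            i += 2
            continue
        if closer is not None:
            if ch == closer:
                closer = None
            else:
                cur.append(ch)  # whitespace is allowed inside quotes
        elif ch in CLOSERS:
            closer = CLOSERS[ch]
            in_word = True
        elif ch.isspace():
            if in_word:  # unquoted whitespace ends the current word
                words.append("".join(cur))
                cur, in_word = [], False
        elif ch == "#" and not in_word:
            break  # comment: ignore the rest of the line
        else:
            cur.append(ch)
            in_word = True
        i += 1
    if in_word:
        words.append("".join(cur))
    return words
```
.fi
.PP
Under these assumptions, a hash character only starts a comment when it
is the first character of a word; one appearing later in a word, or
following a backslash, is kept as part of the filename.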
.SS "Hashing algorithms"
The
.B hashsum
program understands several hashing algorithms:
.TP
.BR md2
Designed by Ron Rivest in 1989 and described in
RFC 1319, MD2 is a really old and slow hash function. Its security is
suspect too: only its checksum stands between it and collision-finding
attacks. Use of MD2 is not recommended, though it's still used in
various standards.
.TP
.BR md4 " and " md5
Designed by Ron Rivest in 1990 and 1992 respectively and described in
RFCs 1186, 1320 and 1321, these two early hash functions are efficient
but cryptographically suspect: the MD4 algorithm has been shown not to
be collision-resistant and there are `pseudo-collisions' in MD5.
Despite this,
.B md5
has been used heavily since its introduction and is still popular. MD4
is still useful when a fast non-cryptographic hash is wanted.
.TP
.B sha
Designed by the US National Security Agency as part of the Digital
Signature Standard, SHA-1 provides a longer output than
.B md4
and
.BR md5 ,
and is seen as being more secure.
.TP
.BR rmd128 ", " rmd160 ", " rmd256 " and " rmd320
Designed by Antoon Bosselaers, Hans Dobbertin and Bart Preneel in 1996
as a replacement for the earlier RIPEMD algorithm, RIPEMD160 provides
the same length output as SHA-1, but has been designed in the open by
experts. RIPEMD128 is a shortened version of RIPEMD160 designed as a
drop-in replacement for MD4, MD5 and the old RIPEMD. The 256 and
320-bit versions are efficient double-width extensions of the 128 and
160-bit hashes, although they may not offer any additional security.
.TP
.B tiger
Designed by Ross Anderson and Eli Biham to take advantage of 64-bit
processors, Tiger seems to be an efficient and strong hash function.
It's a relatively new algorithm, however, and should probably be
approached with an open-minded caution.
.TP
.BR sha256 ", " sha384 " and " sha512
Designed by the US National Security Agency to provide security
commensurate with the Advanced Encryption Standard, these hash functions
provide long outputs. SHA-256 is fairly quick, though the longer
variants are slower on 32-bit hardware since they require 64-bit
arithmetic. They're all very new at the moment, and should be
approached with an open-minded caution.
.PP
The default hashing algorithm is determined by the name under which the
program was invoked, as passed to it in
.BR argv[0] :
if it has the form
.RI ` alg \c
.BR sum '
where
.I alg
is the name of a hash function, that hash becomes the default. (Hence,
.B hashsum
can be used as a drop-in replacement for
.BR md5sum (1).)
If the program name doesn't match an algorithm, then
.B md5
is selected for compatibility with files generated by
.BR md5sum (1).
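.PP
The selection rule just described can be sketched in Python as follows.
This is only an illustration: the function name
.B default_algorithm
is hypothetical, and the set of names is simply the list from this
section, which may not match what a given build reports for
.BR "hashsum \-\-list" .
.PP
.nf
```python
import os

# Algorithm names listed in this section (an assumption for the sketch).
KNOWN = {"md2", "md4", "md5", "sha", "rmd128", "rmd160", "rmd256",
         "rmd320", "tiger", "sha256", "sha384", "sha512"}


def default_algorithm(argv0, known=KNOWN):
    """Pick the default hash from the invocation name (hypothetical)."""
    name = os.path.basename(argv0)
    if name.endswith("sum") and name[:-3] in known:
        return name[:-3]
    return "md5"  # fallback, for compatibility with md5sum(1)
```
.fi
.PP
So a link to the program named
.B sha256sum
would default to
.BR sha256 ,
while the name
.B hashsum
itself matches no algorithm and falls back to
.BR md5 .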
.PP
Note that the same default algorithm is used for both generating new
output files and checking existing ones. If the algorithm is forced by
the
.B \-a
option,
.B hashsum
will emit a
.RB ` #hash '
directive in its output.
.SH "COMPATIBILITY NOTES"
Once upon a time, there was only the
.BR md5sum (1)
utility. As its name suggested, it calculated MD5 hashes of files. MD5
was shown to be weak, so the author wrote
.B hashsum
to do the same job with other, hopefully stronger, hash functions. The
original
.B hashsum
program tried hard to be compatible with GNU
.BR md5sum (1),
but the latter has itself changed in incompatible ways since then;
.B hashsum
has intentionally not changed to match.
.PP
The following
.B hashsum
features are not found in the GNU Coreutils hashing utilities.
.hP
Filename escaping (the
.B \-e
option).
.hP
Magic comment lines in hash data to indicate algorithm selection, hash
encoding, and filename escaping.
.hP
Base-64 and Base-32 output.
.PP
Other differences are as follows.
.hP
Originally, if GNU
.B md5sum
was invoked without any filename arguments, it would print only the hash
of its stdin to stdout, which was very convenient for scripts which
manipulate hashes in nontrivial ways. This behaviour was later changed,
and now the GNU Coreutils hashing utilities always print a filename or
.RB ` \- '
after the hash. The
.B hashsum
program follows the original
.B md5sum
behaviour, and doesn't print a filename if no files were listed on the
command line.
.SH "SEE ALSO"
.BR md5sum (1),
.BR dsig (1),
.BR catsign (1),
.BR catcrypt (1).
.SH "AUTHOR"
Mark Wooding, <mdw@distorted.org.uk>