.\" -*-nroff-*-
.de hP
.IP
.ft B
\h'-\w'\\$1\ 'u'\\$1\ \c
.ft P
..
.ie t .ds o \(bu
.el .ds o o
.TH hashsum 1 "29 July 2000" "Straylight/Edgeware" "Catacomb cryptographic library"
.SH NAME
hashsum \- compute and verify cryptographic checksums of files
.SH SYNOPSIS
.B hashsum
.RB [ \-f0ecbpv ]
.RB [ \-a
.IR algorithm ]
.RB [ \-E
.IR encoding ]
.IR files ...
.SH DESCRIPTION
The
.B hashsum
program generates and verifies cryptographic checksums (hashes) of
files. A number of hashing algorithms are available.
.PP
The
.B hashsum
program's options and output were originally designed to be upwardly
compatible with the GNU
.BR md5sum (1)
program, but the two have diverged somewhat. See the
.B "COMPATIBILITY NOTES"
section of this manual for details.
.PP
Usually,
.B hashsum
generates checksums of a collection of files named either on the command
line or read from standard input, and writes their hashes to standard
output using a simple file format. However, given the
.B \-c
option, it will read in files in its usual output format and verify that
the named files have the reported hashes.
.SS "Options"
The
.B hashsum
program understands the following options:
.TP
.B "\-h, \-\-help"
Prints a help message to standard output and exits successfully.
.TP
.B "\-V, \-\-version"
Prints the program's version number to standard output and exits
successfully.
.TP
.B "\-u, \-\-usage"
Prints a brief usage summary to standard output and exits successfully.
.TP
.BR "\-l, \-\-list " [ \fIitem ...]
Show lists of hash functions and encodings supported.
.TP
.BI "\-a, \-\-algorithm=" alg
Use the hash algorithm
.IR alg .
If this option is not given, a default hashing algorithm is selected:
see
.B "Hashing algorithms"
below.
.TP
.BI "\-E, \-\-encoding=" encoding
Use the given
.I encoding
to represent hashes in the output. This is not interoperable with other
programs, but it's handy, e.g., for building sha1 URNs. The encodings
recognized are
.B hex
(the default),
.B base64
and
.BR base32 .
Type
.B hashsum \-\-list enc
for a list of supported encodings.
.TP
.B "\-f, \-\-files"
Each input file is considered to be a list of filenames which should be
read and hashed. By default, the filenames are considered to be
whitespace-separated, although control characters can be escaped (see
.B "Escaping control characters"
below).
.TP
.B "\-0, \-\-null"
In conjunction with the
.B \-f
option above, reads null-terminated filenames, as emitted by GNU
.BR find (1)'s
.B \-print0
option, rather than whitespace-delimited filenames. If the
.B \-c
option is also given, each file named in the list is read as a list of
filename/hash pairs to be checked.
.TP
.B "\-e, \-\-escape"
Escape control characters (see
.B "Escaping control characters"
below) in filenames when generating output. Escaped
output is not compatible with
.BR md5sum (1),
but copes better with files containing newlines and other strange
control characters.
.TP
.B "\-c, \-\-check"
Check hashes. Each input file is assumed to be in
.BR hashsum 's
output format. It is read, and
.B hashsum
will verify that each named file has the correct hash. Assuming that
the hash list is authentic (e.g., it has been digitally signed, or
obtained via some secure medium), this provides strong assurance that
the files listed have not been tampered with.
.TP
.B "\-b, \-\-binary"
Assume that the files to be hashed are binary files. This doesn't make
any difference on Unix systems, although it might on other platforms
which draw a distinction.
.TP
.B "\-p, \-\-progress"
Display a progress indicator while hashing large files. The progress
indicator is written to standard error.
.TP
.B "\-v, \-\-verbose"
In conjunction with the
.B \-c
option above, be verbose when checking files.
.PP
If no filenames are given on the command line, standard input is read.
Standard input does not have a filename.
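.PP
The check performed for each file under
.B \-c
can be sketched in Python. This is an illustration only, not
.BR hashsum 's
own code: the function name
.B check_file
is hypothetical, the hash is assumed to be hex-encoded, and the algorithm
names are those of Python's
.B hashlib
module, which need not match the names used by
.BR hashsum .
.PP
.nf
```python
import hashlib

def check_file(path, expected_hex, alg="sha256"):
    # Hash the file in fixed-size chunks so large files need little memory.
    h = hashlib.new(alg)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    # Hex digits may appear in either case, so compare in lower case.
    return h.hexdigest() == expected_hex.lower()
```
.fi
.PP
The result is only as trustworthy as the recorded hash itself, which is
why the hash list should be authenticated first.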
.SS "Output format"
There are three types of line in
.BR hashsum 's
output format:
.IR directives ,
.IR "file lines" ,
and
.IR rubbish .
.PP
A
.I directive
begins with a hash
.RB (` # ')
character. These directives are currently understood:
.TP
.BI "#hash " alg
Subsequent hashes in this file were generated using the algorithm
.IR alg .
.TP
.BI "#encoding " encoding
Subsequent hashes in this file are represented using the named
.IR encoding .
.TP
.BI "#escape"
Filenames in subsequent lines are written using the `escaped' format,
described below.
.PP
A
.I "file line"
consists of a hash, in the requested encoding, followed by a space, a
.IR flag ,
and the filename. The
.I flag
is either a star
.RB (` * ')
to indicate that the file should be read in binary mode, or a space.
The rest of the line contains the filename.
.PP
A
.I rubbish
line is one which doesn't look like a directive or a file line. Rubbish
lines are ignored. Hence, you can apply PGP clear-signing to a
.B hashsum
file without preventing it from being read.
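.PP
The three line types described above can be told apart roughly as
follows. This Python sketch is not
.BR hashsum 's
parser: the name
.B classify
is hypothetical, it accepts only hex-encoded hashes, it treats an
unrecognized
.RB ` # '
line as rubbish, and it does not undo filename escaping. All of these
are assumptions made for the example.
.PP
.nf
```python
import string

KNOWN_DIRECTIVES = ("hash", "encoding", "escape")

def classify(line):
    # Directives: '#' immediately followed by a known keyword.
    if line.startswith("#"):
        words = line[1:].split(None, 1)
        if words and words[0] in KNOWN_DIRECTIVES:
            arg = words[1] if len(words) > 1 else None
            return ("directive", words[0], arg)
        return ("rubbish",)
    # File lines: hash, one space, a flag ('*' or ' '), then the filename.
    digest, sep, rest = line.partition(" ")
    if (sep and len(rest) >= 2 and rest[0] in "* "
            and digest and all(c in string.hexdigits for c in digest)):
        return ("file", digest, rest[0] == "*", rest[1:])
    # Anything else, such as PGP armour lines, is ignored.
    return ("rubbish",)
```
.fi
.PP
A caller would feed it one line at a time with the trailing newline
removed, and skip every line classified as rubbish.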
.SS "Escaping control characters"
When reading filenames to hash from a list of files or an escaped hash
list, the following rules are obeyed:
.hP \*o
An escaped string cannot contain unescaped, unquoted whitespace
characters. If such a character is found, the string is considered to
have ended.
.hP \*o
A backslash
.RB (` \e ')
escapes the following character. If the character is one of
.RB ` a ',
.RB ` b ',
.RB ` f ',
.RB ` n ',
.RB ` r ',
.RB ` t ',
or
.RB ` v ',
it is replaced by the control character for an audible alert, backspace,
form-feed, newline, carriage return, horizontal tab or vertical tab
respectively; other escaped characters are unchanged, although they lose
any special meaning they might have had.
.hP \*o
A section of text may be quoted by surrounding it by
.BR ' ... ' ,
.BR """" ... """" ,
or
.BR ` ... '
pairs. Within a quoted section, whitespace characters may appear
unescaped. The backslash may be used to quote control characters or the
quoting characters as usual.
.hP \*o
A word beginning with a hash
.RB (` # ')
character is considered to begin a
.I comment
which extends to the end of the current line. The hash character may be
escaped as usual.
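.PP
The rules above can be sketched as a small word splitter in Python.
This is one reading of those rules, not
.BR hashsum 's
implementation: the name
.B split_words
is hypothetical, and the handling of a backslash at the very end of the
input (kept as an ordinary character) is an assumption. The backslash is
written as
.B chr(92)
only so that the listing contains no literal backslashes.
.PP
.nf
```python
BACKSLASH = chr(92)
CONTROL = {"a": chr(7), "b": chr(8), "f": chr(12), "n": chr(10),
           "r": chr(13), "t": chr(9), "v": chr(11)}
# Each opening quote character maps to the character that closes it.
CLOSERS = {"'": "'", '"': '"', "`": "'"}

def split_words(text):
    words, i, n = [], 0, len(text)
    while i < n:
        c = text[i]
        if c.isspace():
            i += 1
            continue
        if c == "#":
            # A word starting with '#' is a comment to the end of the line.
            while i < n and text[i] != chr(10):
                i += 1
            continue
        word, closer = [], None
        while i < n:
            c = text[i]
            if c == BACKSLASH and i + 1 < n:
                # Known letters become control characters; others are literal.
                word.append(CONTROL.get(text[i + 1], text[i + 1]))
                i += 2
                continue
            if closer is not None:
                if c == closer:
                    closer = None
                else:
                    word.append(c)
            elif c in CLOSERS:
                closer = CLOSERS[c]
            elif c.isspace():
                break
            else:
                word.append(c)
            i += 1
        words.append("".join(word))
    return words
```
.fi
.PP
Each element of the returned list is one filename with its quoting and
escaping removed.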
.SS "Hashing algorithms"
The
.B hashsum
program understands several hashing algorithms:
.TP
.BR md2
Designed by Ron Rivest in 1989 and described in
RFC 1319, MD2 is a really old and slow hash function. Its security is
suspect too: only its checksum stands between it and collision-finding
attacks. Use of MD2 is not recommended, though it's still used in
various standards.
.TP
.BR md4 " and " md5
Designed by Ron Rivest in 1990 and 1992 respectively and described in
RFCs 1186, 1320 and 1321, these two early hash functions are efficient
but cryptographically suspect: the MD4 algorithm has been shown not to
be collision-resistant and there are `pseudo-collisions' in MD5.
Despite this,
.B md5
has been used heavily since its introduction and is still popular. MD4
is still useful when a fast non-cryptographic hash is wanted.
.TP
.B sha
Designed by the US National Security Agency as part of the Digital
Signature Standard, SHA-1 provides a longer output than
.B md4
and
.BR md5 ,
and is seen as being more secure.
.TP
.BR rmd128 ", " rmd160 ", " rmd256 " and " rmd320
Designed by Antoon Bosselaers, Hans Dobbertin and Bart Preneel in 1996
as a replacement for the earlier RIPEMD algorithm, RIPEMD160 provides
the same length output as SHA-1, but has been designed in the open by
experts. RIPEMD128 is a shortened version of RIPEMD160 designed as a
drop-in replacement for MD4, MD5 and the old RIPEMD. The 256 and
320-bit versions are efficient double-width extensions of the 128 and
160-bit hashes, although they may not offer any additional security.
.TP
.B tiger
Designed by Ross Anderson and Eli Biham to take advantage of 64-bit
processors, Tiger seems to be an efficient and strong hash function.
It's a relatively new algorithm, however, and should probably be
approached with an open-minded caution.
.TP
.BR sha256 ", " sha384 " and " sha512
Designed by the US National Security Agency to provide security
commensurate with the Advanced Encryption Standard, these hash functions
provide long outputs. SHA-256 is fairly quick, though the longer
variants are slower on 32-bit hardware since they require 64-bit
arithmetic. They're all very new at the moment, and should be
approached with an open-minded caution.
.PP
The default hashing algorithm is determined by the name under which
.B hashsum
was invoked, as passed to it in
.BR argv[0] :
if it has the form
.RI ` alg \c
.BR sum '
where
.I alg
is the name of a hash function, that hash becomes the default. (Hence,
.B hashsum
can be used as a drop-in replacement for
.BR md5sum (1).)
If the program name doesn't match an algorithm, then
.B md5
is selected for compatibility with files generated by
.BR md5sum (1).
.PP
Note that the same default algorithm is used for both generating new
output files and checking existing ones. If the algorithm is forced by
the
.B \-a
option,
.B hashsum
will emit a
.RB ` #hash '
directive in its output.
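.PP
The rule for choosing the default algorithm can be sketched in Python as
follows. The function name
.B default_algorithm
is hypothetical, the set of names is simply the list documented in this
section, and stripping any leading directory from
.B argv[0]
is an assumption, since this manual does not say how a path is handled.
.PP
.nf
```python
import os

# Algorithm names as listed in this manual page.
KNOWN = {"md2", "md4", "md5", "sha", "rmd128", "rmd160", "rmd256",
         "rmd320", "tiger", "sha256", "sha384", "sha512"}

def default_algorithm(argv0):
    # Assumption: only the final path component of argv[0] matters.
    name = os.path.basename(argv0)
    if name.endswith("sum"):
        alg = name[:-len("sum")]
        if alg in KNOWN:
            return alg
    # No match: fall back to md5 for compatibility with md5sum(1).
    return "md5"
```
.fi
.PP
Under these assumptions a link named
.B sha256sum
selects
.BR sha256 ,
while the name
.B hashsum
itself matches no algorithm and so selects
.BR md5 .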
.SH "COMPATIBILITY NOTES"
Once upon a time, there was only the
.BR md5sum (1)
utility. As its name suggested, it calculated MD5 hashes of files. MD5
was shown to be weak, so the author wrote
.B hashsum
to do the same job with other, hopefully stronger, hash functions. The
original
.B hashsum
program tried hard to be compatible with GNU
.BR md5sum (1),
but the latter has itself changed in incompatible ways since then;
.B hashsum
has intentionally not changed to match.
.PP
The following
.B hashsum
features are not found in the GNU Coreutils hashing utilities.
.hP
Filename escaping (the
.B \-e
option).
.hP
Magic comment lines in hash data to indicate algorithm selection, hash
encoding, and filename escaping.
.hP
Base-64 and Base-32 output.
.PP
Other differences are as follows.
.hP
Originally, if GNU
.B md5sum
was invoked without any filename arguments, it would print only the hash
of its stdin to stdout, which was very convenient for scripts which
manipulate hashes in nontrivial ways. This behaviour was later changed,
and now the GNU Coreutils hashing utilities always print a filename or
.RB ` \- '
after the hash. The
.B hashsum
program follows the original
.B md5sum
behaviour, and doesn't print a filename if no files were listed on the
command line.
.SH "SEE ALSO"
.BR md5sum (1),
.BR dsig (1),
.BR catsign (1),
.BR catcrypt (1).
.SH "AUTHOR"
Mark Wooding, <mdw@distorted.org.uk>