.\" -*-nroff-*-
.de hP
.IP
.ft B
\h'-\w'\\$1\ 'u'\\$1\ \c
.ft P
..
.ie t .ds o \(bu
.el .ds o o
.TH hashsum 1 "29 July 2000" "Straylight/Edgeware" "Catacomb cryptographic library"
.SH NAME
hashsum \- compute and verify cryptographic checksums of files
.SH SYNOPSIS
.B hashsum
.RB [ \-f0ecbjpv ]
.RB [ \-a
.IR algorithm ]
.RB [ \-E
.IR encoding ]
.IR files ...
.SH DESCRIPTION
The
.B hashsum
program generates and verifies cryptographic checksums (hashes) of
files. A number of hashing algorithms are available.
.PP
The
.B hashsum
program's options and output were originally designed to be upwardly
compatible with the GNU
.BR md5sum (1)
program, but the two have diverged somewhat. See the
.B "COMPATIBILITY NOTES"
section of this manual for details.
.PP
Usually,
.B hashsum
generates checksums of a collection of files named either on the command
line or read from standard input, and writes their hashes to standard
output using a simple file format. However, given the
.B \-c
option, it will read in files in its usual output format and verify that
the named files have the reported hashes.
.SS "Options"
The
.B hashsum
program understands the following options:
.TP
.B "\-h, \-\-help"
Prints a help message to standard output and exits successfully.
.TP
.B "\-V, \-\-version"
Prints the program's version number to standard output and exits
successfully.
.TP
.B "\-u, \-\-usage"
Prints a brief usage summary to standard output and exits successfully.
.TP
.BR "\-l, \-\-list " [ \fIitem ...]
Show lists of hash functions and encodings supported.
.TP
.BI "\-a, \-\-algorithm=" alg
Use the hash algorithm
.IR alg .
If this option is not given, a default hashing algorithm is selected:
see
.B "Hashing algorithms"
below.
.TP
.BI "\-E, \-\-encoding=" encoding
Use the given
.I encoding
to represent hashes in the output. This is not interoperable with other
programs, but it's handy, e.g., for building sha1 URNs. The encodings
recognized are
.B hex
(the default),
.B base64
and
.BR base32 .
Type
.B hashsum \-\-list enc
for a list of supported encodings.
.TP
.B "\-f, \-\-files"
Each input file is considered to be a list of filenames which should be
read and hashed. By default, the filenames are considered to be
whitespace-separated, although control characters can be escaped (see
.B "Escaping control characters"
below).
.TP
.B "\-0, \-\-null"
In conjunction with the
.B \-f
option above, reads null-terminated filenames, as emitted by GNU
.BR find (1)'s
.B \-print0
option, rather than whitespace-delimited filenames. If the
.B \-c
option is also given, each file named in the list is a list of
filename/hash pairs to be checked.
.TP
.B "\-e, \-\-escape"
Escape control characters (see
.B "Escaping control characters"
below) in filenames when generating output. Escaped
output is not compatible with
.BR md5sum (1),
but copes better with files containing newlines and other strange
control characters.
.TP
.B "\-c, \-\-check"
Check hashes. Each input file is assumed to be in
.BR hashsum 's
output format. It is read, and
.B hashsum
will verify that each named file has the correct hash. Assuming that
the hash list is authentic (e.g., it has been digitally signed, or
obtained via some secure medium), this provides strong assurance that
the files listed have not been tampered with.
.TP
.B "\-j, \-\-junk"
Report files whose hashes have not been checked. This is most useful in
conjunction with
.RB ` \-c ',
though it's valid without. The program merely prints warnings about
junk files when computing hashes, but will exit nonzero if any are found
when checking them.
.TP
.B "\-b, \-\-binary"
Assume that the files to be hashed are binary files. This doesn't make
any difference on Unix systems, although it might on other platforms
which draw a distinction.
.TP
.B "\-p, \-\-progress"
Display a progress indicator while hashing large files. The progress
indicator is written to standard error.
.TP
.B "\-v, \-\-verbose"
In conjunction with the
.B \-c
option above, be verbose when checking files.
.PP
If no filenames are given on the command line, standard input is read.
Standard input does not have a filename.
.SS "Output format"
There are three types of line in
.BR hashsum 's
output format:
.IR directives ,
.IR "file lines" ,
and
.IR rubbish .
.PP
A
.I directive
begins with a hash
.RB (` # ')
character. These directives are currently understood:
.TP
.BI "#hash " alg
Subsequent hashes in this file were generated using the algorithm
.IR alg .
.TP
.BI "#encoding " encoding
Subsequent hashes in this file are represented using the named
.IR encoding .
.TP
.B "#escape"
Filenames in subsequent lines are written using the `escaped' format,
described below.
.PP
A
.I "file line"
consists of a hash, in the requested encoding, followed by a space, a
.IR flag ,
and the filename. The
.I flag
is either a star
.RB (` * ')
to indicate that the file should be read in binary mode, or a space.
The rest of the line contains the filename.
.PP
A
.I rubbish
line is one which doesn't look like a directive or a file line. Rubbish
lines are ignored. Hence, you can apply PGP clear-signing to a
.B hashsum
file without preventing it from being read.
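.PP
The three line types can be told apart mechanically. The following
Python fragment is only a sketch, not the program's own parser: the
function name
.B classify
is hypothetical, it recognizes hex-encoded hashes only, it expects each
line without its trailing newline, and it assumes that a line starting
with an unrecognized
.RB ` # '
keyword counts as rubbish.
.PP
.nf
```python
import string

HEX_DIGITS = set(string.hexdigits)
# The three directives named in this manual.
DIRECTIVES = ("#hash", "#encoding", "#escape")


def classify(line):
    """Classify one line (without its newline) of a hash list.

    Returns ('directive', name, argument),
            ('file', hash, is_binary, filename) or ('rubbish',).
    """
    if line.startswith("#"):
        keyword, _, arg = line.partition(" ")
        if keyword in DIRECTIVES:
            return ("directive", keyword[1:], arg.strip())
        # Assumption: unknown '#' lines are treated as rubbish.
        return ("rubbish",)
    # File line: hash, a space, a flag ('*' or ' '), then the filename.
    digest, sep, rest = line.partition(" ")
    looks_hex = bool(digest) and set(digest) <= HEX_DIGITS
    if sep and looks_hex and len(rest) >= 2 and rest[0] in "* ":
        return ("file", digest.lower(), rest[0] == "*", rest[1:])
    # Anything else (a PGP armour header, say) is ignored.
    return ("rubbish",)
```
.fi
.PP
A checker built on this sketch would update its current algorithm,
encoding and escaping state on each directive, verify each file line,
and skip rubbish.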
.SS "Escaping control characters"
When reading filenames to hash from a list of files or an escaped hash
list, the following rules are obeyed:
.hP \*o
An escaped string cannot contain unescaped, unquoted whitespace
characters. If such a character is found, the string is considered to
have ended.
.hP \*o
A backslash
.RB (` \e ')
escapes the following character. If the character is one of
.RB ` a ',
.RB ` b ',
.RB ` f ',
.RB ` n ',
.RB ` r ',
.RB ` t ',
or
.RB ` v ',
it is replaced by the control character for an audible alert, backspace,
form-feed, newline, carriage return, horizontal tab or vertical tab
respectively; other escaped characters are unchanged, although they lose
any special meaning they might have had.
.hP \*o
A section of text may be quoted by surrounding it by
.BR ' ... ' ,
.BR """" ... """" ,
or
.BR ` ... '
pairs. Within a quoted section, whitespace characters may appear
unescaped. The backslash may be used to quote control characters or the
quoting characters as usual.
.hP \*o
A word beginning with a hash
.RB (` # ')
character is considered to begin a
.I comment
which extends to the end of the current line. The hash character may be
escaped as usual.
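.PP
The rules above can be sketched as a small word splitter. This Python
fragment is an illustration under assumptions, not the program's own
code: the name
.B split_words
is hypothetical, it handles one line at a time, and it treats a
backslash at the very end of a line as an ordinary character, a case
the rules do not cover. The backslash and the control characters are
written as character codes so that the table is explicit.
.PP
.nf
```python
BACKSLASH = chr(92)
# Letter after a backslash -> the control character it stands for.
CONTROL = {"a": chr(7), "b": chr(8), "f": chr(12), "n": chr(10),
           "r": chr(13), "t": chr(9), "v": chr(11)}
# Opening quote -> matching closing quote.
CLOSERS = {"'": "'", '"': '"', "`": "'"}


def split_words(line):
    """Split one line into unescaped words, dropping any comment."""
    words, current, in_word, closer = [], [], False, None
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == BACKSLASH and i + 1 < len(line):
            # Escaped character: translate known letters, keep others.
            i += 1
            current.append(CONTROL.get(line[i], line[i]))
            in_word = True
        elif closer is not None:
            if ch == closer:
                closer = None        # end of the quoted section
            else:
                current.append(ch)   # whitespace is ordinary in quotes
        elif ch in CLOSERS:
            closer = CLOSERS[ch]
            in_word = True           # an empty quoted string is a word
        elif ch.isspace():
            if in_word:              # unquoted whitespace ends the word
                words.append("".join(current))
                current, in_word = [], False
        elif ch == "#" and not in_word:
            break                    # comment runs to the end of line
        else:
            current.append(ch)
            in_word = True
        i += 1
    if in_word:
        words.append("".join(current))
    return words
```
.fi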
.SS "Hashing algorithms"
The
.B hashsum
program understands several hashing algorithms:
.TP
.B md2
Designed by Ron Rivest in 1989 and described in RFC 1319, MD2 is a
really old and slow hash function. Its security is
suspect too: only its checksum stands between it and collision-finding
attacks. Use of MD2 is not recommended, though it's still used in
various standards.
.TP
.BR md4 " and " md5
Designed by Ron Rivest in 1990 and 1992 respectively and described in
RFCs 1186, 1320 and 1321, these two early hash functions are efficient
but cryptographically suspect: the MD4 algorithm has been shown not to
be collision-resistant and there are `pseudo-collisions' in MD5.
Despite this,
.B md5
has been used heavily since its introduction and is still popular. MD4
is still useful when a fast non-cryptographic hash is wanted.
.TP
.B sha
Designed by the US National Security Agency as part of the Digital
Signature Standard, SHA-1 provides a longer output than
.B md4
and
.BR md5 ,
and is seen as being more secure.
.TP
.BR rmd128 ", " rmd160 ", " rmd256 " and " rmd320
Designed by Antoon Bosselaers, Hans Dobbertin and Bart Preneel in 1996
as a replacement for the earlier RIPEMD algorithm, RIPEMD160 provides
the same length output as SHA-1, but has been designed in the open by
experts. RIPEMD128 is a shortened version of RIPEMD160 designed as a
drop-in replacement for MD4, MD5 and the old RIPEMD. The 256 and
320-bit versions are efficient double-width extensions of the 128 and
160-bit hashes, although they may not offer any additional security.
.TP
.B tiger
Designed by Ross Anderson and Eli Biham to take advantage of 64-bit
processors, Tiger seems to be an efficient and strong hash function.
It's a relatively new algorithm, however, and should probably be
approached with an open-minded caution.
.TP
.BR sha256 ", " sha384 " and " sha512
Designed by the US National Security Agency to provide security
commensurate with the Advanced Encryption Standard, these hash functions
provide long outputs. SHA-256 is fairly quick, though the longer
variants are slower on 32-bit hardware since they require 64-bit
arithmetic. They're all very new at the moment, and should be
approached with an open-minded caution.
.PP
The default hashing algorithm is determined by looking at the name by
which the program was invoked, as passed to it in
.BR argv[0] :
if it has the form
.RI ` alg \c
.BR sum '
where
.I alg
is the name of a hash function, that hash becomes the default. (Hence,
.B hashsum
can be used as a drop-in replacement for
.BR md5sum (1).)
If the program name doesn't match an algorithm, then
.B md5
is selected for compatibility with files generated by
.BR md5sum (1).
.PP
Note that the same default algorithm is used for both generating new
output files and checking existing ones. If the algorithm is forced by
the
.B \-a
option,
.B hashsum
will emit a
.RB ` #hash '
directive in its output.
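.PP
The selection of the default algorithm from the program name can be
sketched as follows. This Python fragment is illustrative only: the
function name
.B default_algorithm
is hypothetical, the algorithm set is copied from the list above, and
stripping leading directories from the name is an assumption rather
than documented behaviour.
.PP
.nf
```python
import os

# Algorithm names as listed in this manual.
ALGORITHMS = {"md2", "md4", "md5", "sha", "rmd128", "rmd160", "rmd256",
              "rmd320", "tiger", "sha256", "sha384", "sha512"}


def default_algorithm(argv0):
    """Pick the default hash from the name the program was run as."""
    # Assumption: only the final path component is examined.
    name = os.path.basename(argv0)
    if name.endswith("sum") and name[:-3] in ALGORITHMS:
        return name[:-3]
    # No match (e.g. plain 'hashsum'): fall back to md5.
    return "md5"
```
.fi
.PP
Under this sketch, a link to the program named
.B sha256sum
would default to
.BR sha256 ,
while the name
.B hashsum
itself falls back to
.BR md5 .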
.SH "COMPATIBILITY NOTES"
Once upon a time, there was only the
.BR md5sum (1)
utility. As its name suggested, it calculated MD5 hashes of files. MD5
was shown to be weak, so the author wrote
.B hashsum
to do the same job with other, hopefully stronger, hash functions. The
original
.B hashsum
program tried hard to be compatible with GNU
.BR md5sum (1),
but the latter has itself changed in incompatible ways since then;
.B hashsum
has intentionally not changed to match.
.PP
The following
.B hashsum
features are not found in the GNU Coreutils hashing utilities.
.hP \*o
Filename escaping (the
.B \-e
option).
.hP \*o
Magic comment lines in hash data to indicate algorithm selection, hash
encoding, and filename escaping.
.hP \*o
Base-64 and Base-32 output.
.PP
Other differences are as follows.
.hP \*o
Originally, if GNU
.B md5sum
was invoked without any filename arguments, it would print only the hash
of its stdin to stdout, which was very convenient for scripts which
manipulate hashes in nontrivial ways. This behaviour was later changed,
and now the GNU Coreutils hashing utilities always print a filename or
.RB ` \- '
after the hash. The
.B hashsum
program follows the original
.B md5sum
behaviour, and doesn't print a filename if no files were listed on the
command line.
.SH "SEE ALSO"
.BR md5sum (1),
.BR dsig (1),
.BR catsign (1),
.BR catcrypt (1).
.SH "AUTHOR"
Mark Wooding, <mdw@distorted.org.uk>