.\" -*-nroff-*-
.de hP
.IP
.ft B
\h'-\w'\\$1\ 'u'\\$1\ \c
.ft P
..
.ie t .ds o \(bu
.el .ds o o
.TH hashsum 1 "29 July 2000" "Straylight/Edgeware" "Catacomb cryptographic library"
.SH NAME
hashsum \- compute and verify cryptographic checksums of files
.SH SYNOPSIS
.B hashsum
.RB [ \-f0ecbv ]
.RB [ \-a
.IR algorithm ]
.IR files ...
.SH DESCRIPTION
The
.B hashsum
program generates and verifies cryptographic checksums (hashes) of
files. A number of hashing algorithms are available.
.PP
The
.B hashsum
program's options and output are designed to be upwardly compatible with
the GNU
.BR md5sum (1)
program.
.PP
Usually,
.B hashsum
generates checksums of a collection of files named either on the command
line or read from standard input, and writes their hashes to standard
output using a simple file format. However, given the
.B \-c
option, it reads files in its usual output format and verifies that
the named files have the reported hashes.
.SS "Options"
The
.B hashsum
program understands the following options:
.TP
.B "\-h, \-\-help"
Prints a help message to standard output and exits successfully.
.TP
.B "\-V, \-\-version"
Prints the program's version number to standard output and exits
successfully.
.TP
.B "\-u, \-\-usage"
Prints a brief usage summary to standard output and exits successfully.
.TP
.BI "\-a, \-\-algorithm=" alg
Use the hash algorithm
.IR alg .
If this option is not given, a default hashing algorithm is selected:
see
.B "Hashing algorithms"
below.
.TP
.B "\-l, \-\-list"
Prints a space-separated list of available hashing algorithms to
standard output and exits successfully.
.TP
.B "\-f, \-\-files"
Each input file is considered to be a list of filenames which should be
read and hashed. By default, the filenames are considered to be
whitespace-separated, although control characters can be escaped (see
.B "Escaping control characters"
below).
.TP
.B "\-0, \-\-null"
In conjunction with the
.B \-f
option above, reads null-terminated filenames, as emitted by GNU
.BR find (1)'s
.B \-print0
option, rather than whitespace-delimited filenames. If the
.B \-c
option is also given, each file named in the list is a list of
filename/hash pairs to be checked.
.TP
.B "\-e, \-\-escape"
Escape control characters (see
.B "Escaping control characters"
below) in filenames when generating output. Escaped
output is not compatible with
.BR md5sum (1),
but copes better with files containing newlines and other strange
control characters.
.TP
.B "\-c, \-\-check"
Check hashes. Each input file is assumed to be in
.BR hashsum 's
output format.
.B hashsum
reads it and verifies that each named file has the correct hash. Assuming
that the hash list is authentic (e.g., it has been digitally signed, or
obtained via some secure medium), this provides strong assurance that
the files listed have not been tampered with.
.TP
.B "\-b, \-\-binary"
Assume that the files to be hashed are binary files. This makes no
difference on Unix systems, although it might on other platforms
which draw a distinction.
.TP
.B "\-v, \-\-verbose"
In conjunction with the
.B \-c
option above, be verbose when checking files.
.PP
If no filenames are given on the command line, standard input is read.
Standard input does not have a filename.
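.PP
The following minimal Python sketch illustrates how the
.B \-f
and
.B \-0
options split a list of filenames. The function name
.I read_name_list
is hypothetical. The sketch also ignores the escaping rules that apply to
whitespace-separated lists. It is not
.BR hashsum 's
own code.
.nf
```python
def read_name_list(data: bytes, null: bool) -> list[bytes]:
    """Return the filenames named in a -f list (hypothetical helper)."""
    if null:
        # -0: each name ends with a NUL byte, as written by find -print0,
        # so names may safely contain spaces or newlines.
        return [name for name in data.split(bytes(1)) if name]
    # Default: names are separated by runs of whitespace.  The escaping
    # and quoting rules are deliberately left out of this sketch.
    return data.split()
```
.fi
.PP
With NUL termination, a filename containing a space or newline comes
through intact. With whitespace separation, the same name would have to be
escaped or quoted.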
.SS "Output format"
There are three types of line in
.BR hashsum 's
output format:
.IR directives ,
.IR "file lines" ,
and
.IR rubbish .
.PP
A
.I directive
begins with a hash
.RB (` # ')
character. Two directives are currently understood:
.TP
.BI "#hash " alg
Subsequent hashes in this file were generated using the algorithm
.IR alg .
.TP
.BI "#escape"
Filenames in subsequent lines are written using the `escaped' format,
described below.
.PP
A
.I "file line"
consists of a hash, in hexadecimal, followed by a space, a
.IR flag ,
and the filename. If the current hash algorithm produces
.IR n -bit
output, there must be
.IR n /4
hex digits of hash in a file line; for example, a 128-bit MD5 hash is
written as 32 hex digits. The
.I flag
is either a star
.RB (` * ')
to indicate that the file should be read in binary mode, or a space.
The rest of the line contains the filename.
.PP
A
.I rubbish
line is one which doesn't look like a directive or a file line. Rubbish
lines are ignored. Hence, you can apply PGP clear-signing to a
.B hashsum
file without preventing it from being read.
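.PP
The line rules above can be sketched in Python as a line classifier.
This is an illustration only, not
.BR hashsum 's
parser. Its limitations and assumptions are as follows:
.IP \(bu 2
The names
.I classify
and
.I HASH_BITS
are hypothetical.
.IP \(bu 2
Only a few algorithms are listed.
.IP \(bu 2
Escaped filenames are not decoded.
.IP \(bu 2
Treating an unrecognized
.RB ` # '
line as rubbish is an assumption.
.PP
.nf
```python
import re

# Output sizes in bits for a few of the algorithms described under
# "Hashing algorithms" (an illustrative subset, not the full list).
HASH_BITS = {"md5": 128, "sha": 160, "sha256": 256, "sha384": 384,
             "sha512": 512}

def classify(line: str, alg: str) -> tuple:
    """Classify one hash-list line as a directive, file line or rubbish."""
    if line.startswith("#"):
        name, _, arg = line[1:].partition(" ")
        if name in ("hash", "escape"):
            return ("directive", name, arg)
        # Assumption: an unrecognized directive is treated as rubbish.
        return ("rubbish",)
    ndigits = HASH_BITS[alg] // 4            # n-bit hash -> n/4 hex digits
    pattern = "([0-9a-fA-F]{%d}) ([ *])(.*)" % ndigits
    m = re.fullmatch(pattern, line)
    if m is None:
        return ("rubbish",)                  # ignored, e.g. PGP armour
    binary = m.group(2) == "*"               # star flag: read in binary mode
    return ("file", m.group(1).lower(), binary, m.group(3))
```
.fi
.PP
Because anything that fails both patterns is simply classified as
rubbish, the armour lines added by PGP clear-signing do not disturb the
reader.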
.SS "Escaping control characters"
When reading filenames to hash from a list of files or an escaped hash
list, the following rules are obeyed:
.hP \*o
An escaped string cannot contain unescaped, unquoted whitespace
characters. If such a character is found, the string is considered to
have ended.
.hP \*o
A backslash
.RB (` \e ')
escapes the following character. If the character is one of
.RB ` a ',
.RB ` b ',
.RB ` f ',
.RB ` n ',
.RB ` r ',
.RB ` t ',
or
.RB ` v ',
it is replaced by the control character for an audible alert, backspace,
form-feed, newline, carriage return, horizontal tab or vertical tab
respectively; other escaped characters are unchanged, although they lose
any special meaning they might have had.
.hP \*o
A section of text may be quoted by surrounding it with
.BR ' ... ' ,
.BR """" ... """" ,
or
.BR ` ... '
pairs. Within a quoted section, whitespace characters may appear
unescaped. The backslash may be used to escape control characters or the
quoting characters as usual.
.hP \*o
A word beginning with a hash
.RB (` # ')
character is considered to begin a
.I comment
which extends to the end of the current line. The hash character may be
escaped as usual.
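.PP
These rules can be sketched in Python as a small tokenizer. The function
.I read_words
is hypothetical. The listing writes backslash and newline as
.B chr(92)
and
.B chr(10)
so that it contains no escape sequences. Two behaviours are assumptions,
not documented:
.IP \(bu 2
An unterminated quote runs to the end of the input.
.IP \(bu 2
A word that starts with a quote is never a comment.
.PP
.nf
```python
BACKSLASH = chr(92)
# Escape letters and the control characters they stand for: alert,
# backspace, form feed, newline, carriage return, tab, vertical tab.
CONTROL = {"a": chr(7), "b": chr(8), "f": chr(12), "n": chr(10),
           "r": chr(13), "t": chr(9), "v": chr(11)}
# Opening quote character -> matching closing quote character.
QUOTES = {"'": "'", '"': '"', "`": "'"}

def read_words(text: str) -> list[str]:
    """Split text into words following the escaping rules (a sketch)."""
    words = []
    i, n = 0, len(text)
    while i < n:
        if text[i].isspace():          # skip whitespace between words
            i += 1
            continue
        if text[i] == "#":             # a word starting with # is a comment
            while i < n and text[i] != chr(10):
                i += 1
            continue
        word, close = [], None         # close: pending closing quote, if any
        while i < n:
            c = text[i]
            if c == BACKSLASH and i + 1 < n:
                nxt = text[i + 1]      # escaped: map a/b/f/n/r/t/v, else keep
                word.append(CONTROL.get(nxt, nxt))
                i += 2
                continue
            if close is not None:
                if c == close:
                    close = None       # end of quoted section
                else:
                    word.append(c)     # whitespace allowed inside quotes
            elif c in QUOTES:
                close = QUOTES[c]      # start of quoted section
            elif c.isspace():
                break                  # unquoted whitespace ends the word
            else:
                word.append(c)
            i += 1
        words.append("".join(word))
    return words
```
.fi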
.SS "Hashing algorithms"
The
.B hashsum
program understands several hashing algorithms:
.TP
.BR md2
Designed by Ron Rivest in 1989 and described in
RFC1319, MD2 is a really old and slow hash function. Its security is
suspect too: only its checksum stands between it and collision-finding
attacks. Use of MD2 is not recommended, though it's still used in
various standards.
.TP
.BR md4 " and " md5
Designed by Ron Rivest in 1990 and 1992 respectively and described in
RFCs 1186, 1320 and 1321, these two early hash functions are efficient
but cryptographically suspect: the MD4 algorithm has been shown not to
be collision-resistant and there are `pseudo-collisions' in MD5.
Despite this,
.B md5
has been used heavily since its introduction and is still popular. MD4
is still useful when a fast non-cryptographic hash is wanted.
.TP
.B sha
Designed by the US National Security Agency as part of the Digital
Signature Standard, SHA-1 provides a longer output than
.B md4
and
.BR md5 ,
and is seen as being more secure.
.TP
.BR rmd128 ", " rmd160 ", " rmd256 " and " rmd320
Designed by Antoon Bosselaers, Hans Dobbertin and Bart Preneel in 1996
as a replacement for the earlier RIPEMD algorithm, RIPEMD160 provides
the same length output as SHA-1, but was designed in the open by
experts. RIPEMD128 is a shortened version of RIPEMD160 designed as a
drop-in replacement for MD4, MD5 and the old RIPEMD. The 256- and
320-bit versions are efficient double-width extensions of the 128- and
160-bit hashes, although they may not offer any additional security.
.TP
.B tiger
Designed by Ross Anderson and Eli Biham to take advantage of 64-bit
processors, Tiger seems to be an efficient and strong hash function.
It's a relatively new algorithm, however, and should probably be
approached with open-minded caution.
.TP
.BR sha256 ", " sha384 " and " sha512
Designed by the US National Security Agency to provide security
commensurate with the Advanced Encryption Standard, these hash functions
provide long outputs. SHA-256 is fairly quick, though the longer
variants are slower on 32-bit hardware since they require 64-bit
arithmetic. They're all very new at the moment, and should be
approached with open-minded caution.
.PP
The default hashing algorithm is determined by looking at the name by
which the program was invoked, as passed to it in
.BR argv[0] :
if it has the form
.RI ` alg \c
.BR sum '
where
.I alg
is the name of a hash function, that hash becomes the default. (Hence,
.B hashsum
can be used as a drop-in replacement for
.BR md5sum (1).)
If the program name doesn't match an algorithm, then
.B md5
is selected for compatibility with files generated by
.BR md5sum (1).
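.PP
The selection rule can be sketched in Python as follows. The function
.I default_algorithm
is hypothetical. Stripping any leading directory from
.B argv[0]
before matching is an assumption.
.nf
```python
import os

# Algorithm names listed under "Hashing algorithms" in this manual.
KNOWN_ALGORITHMS = {"md2", "md4", "md5", "sha", "rmd128", "rmd160",
                    "rmd256", "rmd320", "tiger", "sha256", "sha384",
                    "sha512"}

def default_algorithm(argv0: str) -> str:
    """Choose the default hash from the program name, e.g. sha256sum."""
    name = os.path.basename(argv0)       # assumption: ignore the directory
    if name.endswith("sum") and name[:-3] in KNOWN_ALGORITHMS:
        return name[:-3]                 # 'algsum' -> 'alg'
    return "md5"                         # md5sum(1)-compatible fallback
```
.fi
.PP
For example, a link named
.B sha256sum
would default to
.BR sha256 .
The name
.B hashsum
itself matches no algorithm, since
.B hash
is not an algorithm name, so it falls back to
.BR md5 .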
.PP
Note that the same default algorithm is used for both generating new
output files and checking existing ones. If the algorithm is forced by
the
.B \-a
option,
.B hashsum
will emit a
.RB ` #hash '
directive in its output.
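.PP
The following Python sketch shows the shape of the output, including the
.RB ` #hash '
line written when an algorithm is forced. It is illustrative only:
.IP \(bu 2
.I make_hash_list
is hypothetical.
.IP \(bu 2
Python's
.B hashlib
stands in for Catacomb's own hash implementations.
.IP \(bu 2
Only algorithms that
.B hashlib
always provides are mapped;
.BR hashsum 's
.B sha
is SHA-1.
.PP
.nf
```python
import hashlib

# hashsum algorithm name -> hashlib name (only those hashlib always has).
HASHLIB_NAMES = {"md5": "md5", "sha": "sha1", "sha256": "sha256",
                 "sha384": "sha384", "sha512": "sha512"}

def make_hash_list(files: dict[str, bytes], alg: str, forced: bool,
                   binary: bool = False) -> list[str]:
    """Build hash-list lines for a {filename: contents} mapping."""
    lines = []
    if forced:
        # As with -a: record the explicitly chosen algorithm.
        lines.append("#hash " + alg)
    flag = "*" if binary else " "        # star means binary mode
    for name, data in files.items():
        digest = hashlib.new(HASHLIB_NAMES[alg], data).hexdigest()
        # hash, a space, the flag, then the filename
        lines.append(digest + " " + flag + name)
    return lines
```
.fi
.PP
Without a forced algorithm, no directive is written. The output then
consists of plain file lines, which
.BR md5sum (1)
can also read when the algorithm is
.BR md5 .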
.SH "SEE ALSO"
.BR md5sum (1).
.SH "AUTHOR"
Mark Wooding, <mdw@nsict.org>