.\" -*-nroff-*-
.de hP
.IP
.ft B
\h'-\w'\\$1\ 'u'\\$1\ \c
.ft P
..
.ie t .ds o \(bu
.el .ds o o
.TH hashsum 1 "29 July 2000" "Straylight/Edgeware" "Catacomb cryptographic library"
.SH NAME
hashsum \- compute and verify cryptographic checksums of files
.SH SYNOPSIS
.B hashsum
.RB [ \-f0ecbpv ]
.RB [ \-a
.IR algorithm ]
.RB [ \-E
.IR encoding ]
.IR files ...
.SH DESCRIPTION
The
.B hashsum
program generates and verifies cryptographic checksums (hashes) of
files. A number of hashing algorithms are available.
.PP
The
.B hashsum
program's options and output are designed to be upwardly compatible with
the GNU
.BR md5sum (1)
program.
.PP
Usually,
.B hashsum
generates checksums of a collection of files named either on the command
line or read from standard input, and writes their hashes to standard
output using a simple file format. However, given the
.B \-c
option, it reads files in its usual output format and verifies that
the named files have the reported hashes.
.SS "Options"
The
.B hashsum
program understands the following options:
.TP
.B "\-h, \-\-help"
Prints a help message to standard output and exits successfully.
.TP
.B "\-V, \-\-version"
Prints the program's version number to standard output and exits
successfully.
.TP
.B "\-u, \-\-usage"
Prints a brief usage summary to standard output and exits successfully.
.TP
.BR "\-l, \-\-list " "[\fIitem\fR ...]"
Shows lists of the supported hash functions and encodings.
.TP
.BI "\-a, \-\-algorithm=" alg
Use the hash algorithm
.IR alg .
If this option is not given, a default hashing algorithm is selected:
see
.B "Hashing algorithms"
below.
.TP
.BI "\-E, \-\-encoding=" encoding
Use the given
.I encoding
to represent hashes in the output. Non-default encodings are not
interoperable with other programs, but they are handy, e.g., for
building SHA-1 URNs. The encodings
recognized are
.B hex
(the default),
.B base64
and
.BR base32 .
Type
.B hashsum \-\-list enc
for a list of supported encodings.
.TP
.B "\-f, \-\-files"
Each input file is considered to be a list of filenames which should be
read and hashed. By default, the filenames are considered to be
whitespace-separated, although control characters can be escaped (see
.B "Escaping control characters"
below).
.TP
.B "\-0, \-\-null"
In conjunction with the
.B \-f
option above, reads null-terminated filenames, as emitted by GNU
.BR find (1)'s
.B \-print0
option, rather than whitespace-delimited filenames. If the
.B \-c
option is also given, each file named in the list is a list of
filename/hash pairs to be checked.
.TP
.B "\-e, \-\-escape"
Escape control characters (see
.B "Escaping control characters"
below) in filenames when generating output. Escaped
output is not compatible with
.BR md5sum (1),
but copes better with files containing newlines and other strange
control characters.
.TP
.B "\-c, \-\-check"
Check hashes. Each input file is assumed to be in
.BR hashsum 's
output format. It is read, and
.B hashsum
will verify that each named file has the correct hash. Assuming that
the hash list is authentic (e.g., it has been digitally signed, or
obtained via some secure medium), this provides strong assurance that
the files listed have not been tampered with.
.TP
.B "\-b, \-\-binary"
Assume that the files to be hashed are binary files. This doesn't make
any difference on Unix systems, although it might on other platforms
which draw a distinction.
.TP
.B "\-p, \-\-progress"
Display a progress indicator while hashing large files. The progress
indicator is written to standard error.
.TP
.B "\-v, \-\-verbose"
In conjunction with the
.B \-c
option above, be verbose when checking files.
.PP
If no filenames are given on the command line, standard input is read.
Standard input does not have a filename.
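.PP
As an illustration of the
.B base32
encoding selectable with
.BR \-E ,
here is a minimal C sketch. It assumes the standard RFC\ 4648 base32
alphabet and `=' padding, which this page does not itself specify; the
function name
.B base32_encode
is hypothetical and is not taken from the
.B hashsum
source.
.nf
.eo
```c
#include <stddef.h>

/* Hypothetical helper: encode len bytes from in as RFC 4648 base32.
 * out needs room for 8 * ((len + 4) / 5) + 1 characters; the result is
 * padded with '=' to a multiple of 8 characters and NUL-terminated. */
static void base32_encode(const unsigned char *in, size_t len, char *out)
{
  static const char alphabet[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";
  unsigned long acc = 0;        /* bit buffer; only the low bits matter */
  int nbits = 0;                /* number of unconsumed bits in acc */
  size_t n = 0;                 /* characters written so far */

  for (size_t i = 0; i < len; i++) {
    acc = (acc << 8) | in[i];
    nbits += 8;
    while (nbits >= 5) {        /* emit every complete 5-bit group */
      nbits -= 5;
      out[n++] = alphabet[(acc >> nbits) & 31];
    }
  }
  if (nbits > 0)                /* final partial group, zero-filled */
    out[n++] = alphabet[(acc << (5 - nbits)) & 31];
  while (n % 8 != 0)            /* pad to a whole 40-bit block */
    out[n++] = '=';
  out[n] = '\0';
}
```
.ec
.fi
.PP
Because 160 bits is a multiple of 40, a 20-byte SHA-1 hash encodes to
exactly 32 characters with no padding.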
.SS "Output format"
There are three types of line in
.BR hashsum 's
output format:
.IR directives ,
.IR "file lines" ,
and
.IR rubbish .
.PP
A
.I directive
begins with a hash
.RB (` # ')
character. Three directives are currently understood:
.TP
.BI "#hash " alg
Subsequent hashes in this file were generated using the algorithm
.IR alg .
.TP
.BI "#encoding " encoding
Subsequent hashes in this file are represented using the named
.IR encoding .
.TP
.BI "#escape"
Filenames in subsequent lines are written using the `escaped' format,
described below.
.PP
A
.I "file line"
consists of a hash, in the requested encoding, followed by a space, a
.IR flag ,
and the filename. The
.I flag
is either a star
.RB (` * ')
to indicate that the file should be read in binary mode, or a space.
The rest of the line contains the filename.
.PP
A
.I rubbish
line is one which doesn't look like a directive or a file line. Rubbish
lines are ignored. Hence, you can apply PGP clear-signing to a
.B hashsum
file without preventing it from being read.
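.PP
The three kinds of line can be told apart roughly as follows. This is a
minimal C sketch rather than
.BR hashsum 's
own parser: it assumes hex-encoded hashes, recognizes directives without
interpreting them, and the names
.B classify_line
and
.B file_line
are hypothetical.
.nf
.eo
```c
#include <ctype.h>
#include <stddef.h>

enum line_kind { LINE_DIRECTIVE, LINE_FILE, LINE_RUBBISH };

/* Pieces of a file line; the pointers refer into the caller's buffer. */
struct file_line {
  const char *hash;             /* start of the encoded hash */
  size_t hashlen;               /* number of hash characters */
  int binary;                   /* nonzero if the flag was '*' */
  const char *name;             /* the rest of the line: the filename */
};

/* Hypothetical classifier for one line with its newline removed.
 * Assumes hex-encoded hashes; directives are recognized but not parsed. */
static enum line_kind classify_line(const char *p, struct file_line *fl)
{
  size_t n = 0;

  if (p[0] == '#')
    return LINE_DIRECTIVE;
  while (isxdigit((unsigned char)p[n]))
    n++;
  /* Expect: hash, one space, a flag ('*' or ' '), then a non-empty name. */
  if (n == 0 || p[n] != ' ' || (p[n + 1] != '*' && p[n + 1] != ' ') ||
      p[n + 2] == '\0')
    return LINE_RUBBISH;
  fl->hash = p;
  fl->hashlen = n;
  fl->binary = (p[n + 1] == '*');
  fl->name = p + n + 2;
  return LINE_FILE;
}
```
.ec
.fi
.PP
Blank lines and the armour lines added by PGP clear-signing do not begin
with a hash followed by a space and a flag, so the sketch classifies them
as rubbish.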
.SS "Escaping control characters"
When reading filenames to hash from a list of files or an escaped hash
list, the following rules are obeyed:
.hP \*o
An escaped string cannot contain unescaped, unquoted whitespace
characters. If such a character is found, the string is considered to
have ended.
.hP \*o
A backslash
.RB (` \e ')
escapes the following character. If the character is one of
.RB ` a ',
.RB ` b ',
.RB ` f ',
.RB ` n ',
.RB ` r ',
.RB ` t ',
or
.RB ` v ',
it is replaced by the control character for an audible alert, backspace,
form-feed, newline, carriage return, horizontal tab or vertical tab
respectively; other escaped characters are unchanged, although they lose
any special meaning they might have had.
.hP \*o
A section of text may be quoted by surrounding it with
.BR ' ... ' ,
.BR """" ... """" ,
or
.BR ` ... '
pairs. Within a quoted section, whitespace characters may appear
unescaped. The backslash may be used to quote control characters or the
quoting characters as usual.
.hP \*o
A word beginning with a hash
.RB (` # ')
character is considered to begin a
.I comment
which extends to the end of the current line. The hash character may be
escaped as usual.
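.PP
The rules above can be sketched in C as a routine that extracts one word
from a line of text. This is a minimal sketch under stated assumptions,
not the parser that
.B hashsum
actually uses: the names
.B read_word
and
.B unescape_char
are hypothetical, the input is assumed to be a single line, and a
trailing backslash is simply kept as a literal character.
.nf
.eo
```c
#include <ctype.h>
#include <string.h>

/* Map the character after a backslash to the character it stands for. */
static int unescape_char(int c)
{
  switch (c) {
    case 'a': return '\a';
    case 'b': return '\b';
    case 'f': return '\f';
    case 'n': return '\n';
    case 'r': return '\r';
    case 't': return '\t';
    case 'v': return '\v';
    default:  return c;         /* anything else just loses its special meaning */
  }
}

/* Hypothetical helper: read one word from the single line at *pp into out,
 * which must have room for strlen(*pp) + 1 bytes.  Advances *pp past the
 * word.  Returns 1 if a word was read, or 0 at end of line or a comment. */
static int read_word(const char **pp, char *out)
{
  const char *p = *pp;
  int quote = 0;                /* expected closing quote, or 0 if unquoted */
  size_t n = 0;

  while (isspace((unsigned char)*p))
    p++;
  if (*p == '\0' || *p == '#') {        /* end of line, or a comment */
    *pp = p + strlen(p);
    return 0;
  }
  for (; *p; p++) {
    if (*p == '\\' && p[1])             /* backslash escape */
      out[n++] = (char)unescape_char((unsigned char)*++p);
    else if (quote) {                   /* inside a quoted section */
      if (*p == quote)
        quote = 0;
      else
        out[n++] = *p;
    } else if (*p == '\'' || *p == '"')
      quote = *p;
    else if (*p == '`')                 /* `...' pairs close with ' */
      quote = '\'';
    else if (isspace((unsigned char)*p))
      break;                            /* unquoted whitespace ends the word */
    else
      out[n++] = *p;
  }
  out[n] = '\0';
  *pp = p;
  return 1;
}
```
.ec
.fi
.PP
Calling the routine repeatedly on the same line yields successive
filenames until it reports the end of the line or a comment.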
.SS "Hashing algorithms"
The
.B hashsum
program understands several hashing algorithms:
.TP
.BR md2
Designed by Ron Rivest in 1989 and described in
RFC 1319, MD2 is a really old and slow hash function. Its security is
suspect too: only its checksum stands between it and collision-finding
attacks. Use of MD2 is not recommended, though it's still used in
various standards.
.TP
.BR md4 " and " md5
Designed by Ron Rivest in 1990 and 1992 respectively and described in
RFCs 1186, 1320 and 1321, these two early hash functions are efficient
but cryptographically suspect: the MD4 algorithm has been shown not to
be collision-resistant and there are `pseudo-collisions' in MD5.
Despite this,
.B md5
has been used heavily since its introduction and is still popular. MD4
is still useful when a fast non-cryptographic hash is wanted.
.TP
.B sha
Designed by the US National Security Agency as part of the Digital
Signature Standard, SHA-1 provides a longer output than
.B md4
and
.BR md5 ,
and is seen as being more secure.
.TP
.BR rmd128 ", " rmd160 ", " rmd256 " and " rmd320
Designed by Antoon Bosselaers, Hans Dobbertin and Bart Preneel in 1996
as a replacement for the earlier RIPEMD algorithm, RIPEMD160 provides
the same length output as SHA-1, but has been designed in the open by
experts. RIPEMD128 is a shortened version of RIPEMD160 designed as a
drop-in replacement for MD4, MD5 and the old RIPEMD. The 256- and
320-bit versions are efficient double-width extensions of the 128- and
160-bit hashes, although they may not offer any additional security.
.TP
.B tiger
Designed by Ross Anderson and Eli Biham to take advantage of 64-bit
processors, Tiger seems to be an efficient and strong hash function.
It's a relatively new algorithm, however, and should probably be
approached with an open-minded caution.
.TP
.BR sha256 ", " sha384 " and " sha512
Designed by the US National Security Agency to provide security
commensurate with the Advanced Encryption Standard, these hash functions
provide long outputs. SHA-256 is fairly quick, though the longer
variants are slower on 32-bit hardware since they require 64-bit
arithmetic. They're all very new at the moment, and should be
approached with an open-minded caution.
.PP
The default hashing algorithm is determined by the name under which the
program was invoked, as passed to it in
.BR argv[0] :
if it has the form
.RI ` alg \c
.BR sum '
where
.I alg
is the name of a hash function, that hash becomes the default. (Hence,
.B hashsum
can be used as a drop-in replacement for
.BR md5sum (1).)
If the program name doesn't match an algorithm, then
.B md5
is selected for compatibility with files generated by
.BR md5sum (1).
.PP
Note that the same default algorithm is used for both generating new
output files and checking existing ones. If the algorithm is forced by
the
.B \-a
option,
.B hashsum
will emit a
.RB ` #hash '
directive in its output.
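.PP
The selection rules above can be summarized by the following C sketch.
The algorithm names are those listed in this section; the function names
.B choose_algorithm
and
.B find_algorithm
are hypothetical, and stripping any leading directories from
.B argv[0]
before matching is an assumption rather than documented behaviour.
.nf
.eo
```c
#include <stddef.h>
#include <string.h>

/* Algorithm names as listed in the "Hashing algorithms" section. */
static const char *const algorithms[] = {
  "md2", "md4", "md5", "sha", "rmd128", "rmd160", "rmd256", "rmd320",
  "tiger", "sha256", "sha384", "sha512", NULL
};

/* Look up the first len characters of name; NULL if unknown. */
static const char *find_algorithm(const char *name, size_t len)
{
  for (size_t i = 0; algorithms[i]; i++)
    if (strlen(algorithms[i]) == len && strncmp(algorithms[i], name, len) == 0)
      return algorithms[i];
  return NULL;
}

/* Hypothetical selection logic: an explicit -a argument (opt_a) wins;
 * otherwise a program name of the form ALGsum selects ALG; otherwise md5.
 * Returns NULL if opt_a names an unknown algorithm. */
static const char *choose_algorithm(const char *argv0, const char *opt_a)
{
  const char *base = strrchr(argv0, '/');
  size_t len;

  if (opt_a)
    return find_algorithm(opt_a, strlen(opt_a));
  base = base ? base + 1 : argv0;       /* assumed: ignore leading directories */
  len = strlen(base);
  if (len > 3 && strcmp(base + len - 3, "sum") == 0) {
    const char *alg = find_algorithm(base, len - 3);
    if (alg)
      return alg;
  }
  return "md5";                         /* compatibility with md5sum(1) */
}
```
.ec
.fi
.PP
Under this sketch, a link named
.B rmd160sum
would make
.B rmd160
the default, while the name
.B hashsum
itself falls back to
.BR md5 .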
.SH "SEE ALSO"
.BR md5sum (1),
.BR dsig (1),
.BR catsign (1),
.BR catcrypt (1).
.SH "AUTHOR"
Mark Wooding, <mdw@distorted.org.uk>