Release 1.1.1.
.ie t .ds o \(bu
.el .ds o o
.de hP
.IP
\h'-\w'\fB\\$1\ \fP'u'\fB\\$1\ \fP\c
..
.TH fshash 1 "8 October 2012" rsync-backup
.SH SYNOPSIS
.B fshash
.RB [ \-a ]
.RB [ \-c
.IR cache ]
.RB [ \-f
.IR format ]
.RB [ \-H
.IR hash ]
.RI [ file
\&...]
.br
.B fshash
.RB \-u
.B \-c
.I cache
.RB [ \-H
.IR hash ]
.RI [ dir ]
.SH DESCRIPTION
The
.B fshash
program generates digests of filesystems.  It's similar in concept to
(but somewhat different from) Ian Jackson's
.BR summer (1)
tool.
.PP
The idea is to capture everything interesting about a filesystem in a
file with the following properties:
.TP
.I Completeness
The digest file describes everything `interesting' about the filesystem,
such that two filesystems which are interestingly different will have
distinct digests.
.TP
.I Canonicalness
If two filesystems aren't different in any interesting way, then their
digests should be identical.
.TP
.I Readability
Given two subtly different filesystems, it's easy for a human equipped
with digests for them and
.BR diff (1)
to work out what the differences actually are.
.SS Command-line processing
The following command-line arguments are accepted.
.TP
.B \-h, \-\-help
Show a summary of the command-line syntax, and exit successfully.
.TP
.B \-\-version
Show the program's version number, and exit successfully.
.TP
.B \-a, \-\-all
Clear the cache of information about all files except those processed in
this run.
.TP
.B \-c, \-\-cache=\fIfile
Keep a cache of file hashes in
.IR file .
The cache is keyed by inode and modification time: if a file already has
an entry in the cache then it won't be hashed again, which can
provide a valuable performance improvement on large filesystems.  If
.I file
doesn't exist, then it will be created.
.TP
.B \-f, \-\-files=\fIformat
Read a list of filenames on standard input in the given
.I format
and write digest lines for them.  The
.I format
may be:
.B find0
for simple null-terminated names, as produced by
.BR "find \-print0" ;
or
.B rsync
for file data as produced by
.BR rsync (1).
The latter is useful, since
.B rsync
has powerful file inclusion and exclusion capabilities \(en and a common
use case is generating a digest for a collection of files copied using
.BR rsync .
(The
.B find0
format doesn't work well: see
.B BUGS
below.)
.TP
.B \-H, \-\-hash=\fIhash
Use the
.I hash
function, which can be any hash function supported by Python's
.BR hashlib .
This option may be omitted: if it is, then the hash is read from the
cache file; if there is no cache file either, then an error is reported.
.TP
.B \-u, \-\-udiff
Rather than produce a manifest, read a unified
.BR diff (1)
from standard input, and clear from the cache all files mentioned as
being different.  Filenames in the diff are considered relative to
.IR dir ,
defaulting to the current working directory.
.PP
Positional arguments are interpreted as files and directories to be
processed, in order.  A directory name which ends in
.RB ` / '
is treated specially:
.B fshash
writes filenames relative to the given directory.
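.PP
The interaction between the hash option and the cache described above
can be pictured with a minimal Python sketch.  It is not taken from
.BR fshash 's
source: the function names, the use of a plain dictionary as the cache,
and the chunk size are all assumptions made for illustration.  Only
.B hashlib.new
and
.B os.stat
are real library calls.

```python
import hashlib
import os


def hash_file(path, hash_name="sha256", chunk_size=1 << 16):
    """Return the hex digest of a file's content.

    hash_name may be any algorithm name accepted by hashlib.new().
    """
    h = hashlib.new(hash_name)
    with open(path, "rb") as f:
        # Read in chunks so that large files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def cached_hash(path, cache, hash_name="sha256"):
    """Hash a file only if (inode, mtime) is not already in the cache.

    `cache` is a plain dict here (an assumption); the real cache is a file.
    """
    st = os.stat(path)
    key = (st.st_ino, int(st.st_mtime))
    if key not in cache:
        cache[key] = hash_file(path, hash_name)
    return cache[key]
```

A second call for an unchanged file returns the stored digest without
reading the file again, which is where the performance benefit comes from.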
.SS Output format
Information about each filesystem object is written on a separate line.
These lines can be quite long, and consist of a number of fields:
.hP 1.
For regular files, a cryptographic hash of the file's content, in
hexadecimal.  For other kinds of filesystem object, a description of the
object type and any special information about it, in square brackets,
and padded with spaces so as to take the same width as a hash; see
below for details.
.hP 2.
A `virtual inode identifier': a string which will be the same in two
lines if and only if they represent hard links to the same underlying
inode.  Some care is taken so that files are assigned the same
identifier even if other parts of the filesystem are different, so as to
avoid spurious differences.
.hP 3.
The object's permissions and mode bits, in octal.
.hP 4.
The file's owner and group, in decimal, separated by a colon.
.hP 5.
The file's last-modified time, in UTC, in ISO 8601 format, i.e.,
.IB yyyy \- mm \- dd T hh : mm : ss Z \fR.
.hP 6.
The file's size in bytes, in decimal.
.hP 7.
The file's name (relative to some appropriate parent directory).
Characters which
would cause ambiguity are escaped: tab, linefeed and carriage return are
printed as
.RB ` \et ',
.RB ` \en ',
and
.RB ` \er ',
respectively;
.RB ` ' '
is printed as
.RB ` \e' ';
.RB ` \e '
is printed as
.RB ` \e\e ';
and other codes outside the range 32\(en127 are printed as hex escapes,
in the form
.RB ` \ex\fIxx '.
Finally, the sequence
.RB ` \~\->\~ '
is printed as
.RB ` \~\e\->\~ '
so that symlink targets are presented unambiguously (see below).
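.PP
The filename escaping rules for the last field can be sketched in Python
as follows.  This is an illustration written from the description above,
not the program's own code: the function name
.BR escape_name ,
the choice to work on bytes, and the use of lowercase hex digits are
assumptions.

```python
import re

# Single-character escapes named in the text: tab, linefeed,
# carriage return, single quote and backslash.
_SIMPLE = {0x09: "\\t", 0x0A: "\\n", 0x0D: "\\r", 0x27: "\\'", 0x5C: "\\\\"}


def escape_name(name: bytes) -> str:
    """Escape a filename following the rules described for field 7."""
    out = []
    for b in name:
        if b in _SIMPLE:
            out.append(_SIMPLE[b])
        elif 32 <= b <= 127:
            # Codes in the range 32-127 pass through unchanged
            # (range taken literally from the text).
            out.append(chr(b))
        else:
            # Lowercase hex digits are an assumption.
            out.append("\\x%02x" % b)
    text = "".join(out)
    # Break up ' -> ' so it cannot be mistaken for the symlink separator.
    # Lookarounds let adjacent occurrences share a space.
    return re.sub(r"(?<= )->(?= )", r"\\->", text)
```

Doing the arrow replacement last is safe because none of the earlier
escapes produce or alter a space, a hyphen or a greater-than sign.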
.PP
For non-regular file objects, the first field is an information field
enclosed in square brackets, and some of the other fields provide other
information or are suppressed, as follows.
.TP
.I Errors
If there was an error reading the object's metadata then the information
field shows
.BI E nn
.IR message ,
and the other fields, except the name, are printed as
.B error
rather than having any useful information.
.TP
.I Sockets
The information field shows
.BR socket .
.TP
.I Named pipes
The information field shows
.BR fifo .
.TP
.I Symbolic links
The information field shows
.BR symbolic-link .
The name is followed by
.RB ` \~\->\~ '
and the link target (or
.BI <E nn \~ message >
if there was an error reading the link destination).
.TP
.I Directories
The information field shows
.BR directory ,
and the size field shows
.B dir
(since directory sizes are not consistent across filesystem
implementations).  The name is followed by
.RB ` / '.
.TP
.I Block and character devices
The information field shows
.B block-device
or
.BR character-device ,
as appropriate, followed by the major and minor device numbers in
decimal, and separated by a colon.
.PP
.SH BUGS
No attempt is made to sort filenames read in
.B find0
format, so they're not very likely to match digests produced any other
way.  Indeed, they're not very likely to match digests produced by
.B find0
on other machines either.
.SH SEE ALSO
.BR find (1),
.BR rsync (1),
.BR sha256sum (1)
etc.
.SH AUTHOR
Mark Wooding, <mdw@distorted.org.uk>