.ie t .ds o \(bu
.el .ds o o
.de hP
.IP
\h'-\w'\fB\\$1\ \fP'u'\fB\\$1\ \fP\c
..
.TH fshash 1 "8 October 2012" rsync-backup
.SH SYNOPSIS
.B fshash
.RB [ \-a ]
.RB [ \-c
.IR cache ]
.RB [ \-f
.IR format ]
.RB [ \-H
.IR hash ]
.RI [ file
\&...]
.SH DESCRIPTION
The
.B fshash
program generates digests of filesystems. It's similar in concept to
(but somewhat different from) Ian Jackson's
.BR summer (1)
tool.
.PP
The idea is to capture everything interesting about a filesystem in a
file with the following properties:
.TP
.I Completeness
The digest file describes everything `interesting' about the filesystem,
such that two filesystems which are interestingly different will have
distinct digests.
.TP
.I Canonicalness
If two filesystems aren't different in any interesting way, then their
digests should be identical.
.TP
.I Readability
Given two subtly different filesystems, it's easy for a human equipped
with digests for them and
.BR diff (1)
to work out what the differences actually are.
.SS Command-line processing
The following command-line arguments are accepted.
.TP
.B \-h, \-\-help
Show a summary of the command-line syntax, and exit successfully.
.TP
.B \-\-version
Show the program's version number, and exit successfully.
.TP
.B \-a, \-\-all
Clear the cache of information about all files except those processed in
this run.
.TP
.B \-c, \-\-cache=\fIfile
Keep a cache of file hashes in
.IR file .
The cache is keyed by inode and modification time: if a file has an
entry in the cache already then it won't be hashed again, which can
provide a valuable performance improvement on large filesystems. If
.I file
doesn't exist, then it will be created.
.TP
.B \-f, \-\-files=\fIformat
Read a list of filenames on standard input in the given
.I format
and write digest lines for them. The
.I format
may be:
.B find0
for simple null-terminated names, as produced by
.BR "find \-print0" ;
or
.B rsync
for file data as produced by
.BR rsync (1).
The latter is useful, since
.B rsync
has powerful file inclusion and exclusion capabilities, and a common
use case is generating a digest for a collection of files copied using
.BR rsync .
(The
.B find0
format doesn't work well: see
.B BUGS
below.)
.TP
.B \-H, \-\-hash=\fIhash
Use the
.I hash
function, which can be any hash function supported by Python's
.BR hashlib .
If this option is omitted then the hash function is read from the cache
file; if there is no cache file either, then an error is reported.
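.PP
As a rough illustration of the caching idea described under
.BR \-c ,
the following Python sketch hashes a file only when no entry exists for
its inode and modification time. It is not taken from
.BR fshash :
the function name
.BR cached_hash ,
the in-memory dictionary standing in for the cache file, and the default
.B sha256
algorithm are all assumptions made for the example.
.nf
```python
import hashlib
import os


def cached_hash(path, cache, algorithm="sha256"):
    """Return the hex digest of path, reusing a cached value if present.

    cache is a plain dict standing in for the on-disk cache file; this
    layout is an assumption for illustration, not the real cache format.
    """
    st = os.stat(path)
    # Key on inode and modification time, as the option text describes.
    key = (st.st_ino, st.st_mtime_ns)
    if key not in cache:
        h = hashlib.new(algorithm)
        with open(path, "rb") as f:
            # Read in blocks so that large files need not fit in memory.
            for block in iter(lambda: f.read(65536), b""):
                h.update(block)
        cache[key] = h.hexdigest()
    return cache[key]
```
.fi
.PP
A second call for an unchanged file returns the stored digest without
reading the file again; changing the file's modification time produces a
new key and so forces a fresh hash.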
.PP
Positional arguments are interpreted as files and directories to be
processed, in order. A directory name which ends in
.RB ` / '
is treated specially:
.B fshash
writes filenames relative to the given directory.
.SS Output format
Information about each filesystem object is written on a separate line.
These lines can be quite long, and consist of a number of fields:
.hP 1.
For regular files, a cryptographic hash of the file's content, in
hexadecimal. For other kinds of filesystem object, a description of the
object type and any special information about it, in square brackets,
and padded with spaces so as to take the same width as a hash; see
below for details.
.hP 2.
A `virtual inode identifier': a string which will be the same in two
lines if and only if they represent hard links to the same underlying
inode. Some care is taken so that files are assigned the same
identifier even if other parts of the filesystem are different, so as to
avoid spurious differences.
.hP 3.
The object's permissions and mode bits, in octal.
.hP 4.
The file's owner and group, in decimal, separated by a colon.
.hP 5.
The file's last-modified time, in UTC, in ISO 8601 format, i.e.,
.IB yyyy \- mm \- dd T hh : mm : ss Z \fR.
.hP 6.
The file's size in bytes, in decimal.
.hP 7.
The file's name (relative to some appropriate parent directory).
Characters which
would cause ambiguity are escaped: tab, linefeed and carriage return are
printed as
.RB ` \et ',
.RB ` \en ',
and
.RB ` \er ',
respectively;
.RB ` ' '
is printed as
.RB ` \e' ';
.RB ` \e '
is printed as
.RB ` \e\e ';
and other codes outside the range 32\(en127 are printed as hex escapes,
in the form
.RB ` \ex\fIxx '.
Finally, the sequence
.RB ` \~\->\~ '
is printed as
.RB ` \~\e\->\~ '
so that symlink targets are presented unambiguously (see below).
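.PP
The escaping rules for the name field can be sketched in Python as
follows. This is an illustration under stated assumptions rather than
the code
.B fshash
uses: the function name
.B escape_name
is hypothetical, names are treated as byte strings, and hex escapes are
written with lowercase digits. The backslash is spelled
.B chr(92)
so that the listing contains no characters needing escapes of its own.
.nf
```python
BS = chr(92)  # the backslash character

# Characters with a dedicated escape, following the list for field 7.
NAMED = {
    chr(9): BS + "t",    # tab
    chr(10): BS + "n",   # linefeed
    chr(13): BS + "r",   # carriage return
    "'": BS + "'",       # single quote
    BS: BS + BS,         # backslash itself
}


def escape_name(name):
    """Escape a filename, given as bytes, for use in a digest line."""
    out = []
    for byte in name:
        ch = chr(byte)
        if ch in NAMED:
            out.append(NAMED[ch])
        elif 32 <= byte <= 127:
            # Codes inside the documented range pass through unchanged.
            out.append(ch)
        else:
            # Everything else becomes a two-digit hex escape.
            out.append("%sx%02x" % (BS, byte))
    # Protect the separator used between a symlink name and its target.
    return "".join(out).replace(" -> ", " " + BS + "-> ")
```
.fi
.PP
Because a literal backslash is always doubled, a backslash followed by
.RB ` \-> '
in the output can only have come from the final replacement, which is
what keeps symlink lines unambiguous.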
.PP
For objects other than regular files, the first field is an information
field enclosed in square brackets, and some of the other fields provide
other information or are suppressed, as follows.
.TP
.I Errors
If there was an error reading the object's metadata then the information
field shows
.BI E nn
.IR message ,
and the other fields, except the name, are printed as
.B error
rather than having any useful information.
.TP
.I Sockets
The information field shows
.BR socket .
.TP
.I Named pipes
The information field shows
.BR fifo .
.TP
.I Symbolic links
The information field shows
.BR symbolic-link .
The name is followed by
.RB ` \~\->\~ '
and the link target (or by
.BI <E nn \~ message >
if there was an error reading the link destination).
.TP
.I Directories
The information field shows
.BR directory ,
and the size field shows
.B dir
(since directory sizes are not consistent across filesystem
implementations). The name is followed by
.RB ` / '.
.TP
.I Block and character devices
The information field shows
.B block-device
or
.BR character-device ,
as appropriate, followed by the major and minor device numbers in
decimal, and separated by a colon.
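.PP
The information field for the object types listed above might be built
along the following lines. This Python sketch makes several assumptions
that the text does not settle: the function name
.B info_field
is hypothetical, the device numbers are separated from the type name by
a single space, the padding is placed after the closing bracket, and the
default width of 64 matches a SHA-256 digest in hexadecimal. Error
entries are left out.
.nf
```python
import os
import stat


def info_field(st, width=64):
    """Describe a non-regular object in brackets, padded to width."""
    mode = st.st_mode
    if stat.S_ISSOCK(mode):
        text = "socket"
    elif stat.S_ISFIFO(mode):
        text = "fifo"
    elif stat.S_ISLNK(mode):
        text = "symbolic-link"
    elif stat.S_ISDIR(mode):
        text = "directory"
    elif stat.S_ISBLK(mode) or stat.S_ISCHR(mode):
        kind = "block-device" if stat.S_ISBLK(mode) else "character-device"
        # Major and minor numbers in decimal, separated by a colon.
        text = "%s %d:%d" % (kind, os.major(st.st_rdev), os.minor(st.st_rdev))
    else:
        raise ValueError("not a special filesystem object")
    # Pad with spaces so the field lines up with hashes of regular files.
    return ("[%s]" % text).ljust(width)
```
.fi
.PP
The argument is expected to behave like the result of
.BR os.lstat ,
so that symbolic links are described rather than followed.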
.PP
.SH BUGS
No attempt is made to sort filenames read in
.B find0
format, so they're not very likely to match digests produced any other
way. Indeed, they're not very likely to match digests produced by
.B find0
on other machines either.
.SH SEE ALSO
.BR find (1),
.BR rsync (1),
.BR sha256sum (1)
etc.
.SH AUTHOR
Mark Wooding, <mdw@distorted.org.uk>