From: Mark Wooding
Date: Sun, 7 Oct 2012 21:20:47 +0000 (+0100)
Subject: First cut at documentation.
X-Git-Tag: 0.99.1
X-Git-Url: https://git.distorted.org.uk/~mdw/rsync-backup/commitdiff_plain/e268723f8b2c538bcc05a68e050ee64b64e1b5bd

First cut at documentation.
---

diff --git a/Makefile.am b/Makefile.am
index e2fabd9..36d5d92 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -64,6 +64,7 @@ rfreezefs_SOURCES = rfreezefs.c
 rfreezefs_LDADD = $(mLib_LIBS)
 
 sbin_SCRIPTS += rsync-backup
+dist_man_MANS += rsync-backup.8
 CLEANFILES += rsync-backup
 EXTRA_DIST += rsync-backup.in
 rsync-backup: rsync-backup.in Makefile
@@ -73,6 +74,7 @@ rsync-backup: rsync-backup.in Makefile
 	mv rsync-backup.new rsync-backup
 
 bin_SCRIPTS += fshash
+dist_man_MANS += fshash.1
 CLEANFILES += fshash
 EXTRA_DIST += fshash.in
 fshash: fshash.in Makefile
diff --git a/fshash.1 b/fshash.1
new file mode 100644
index 0000000..9f8a0cc
--- /dev/null
+++ b/fshash.1
@@ -0,0 +1,214 @@
+.ie t .ds o \(bu
+.el .ds o o
+.de hP
+.IP
+\h'-\w'\fB\\$1\ \fP'u'\fB\\$1\ \fP\c
+..
+.TH fshash 1 "8 October 2012" rsync-backup
+.SH SYNOPSIS
+.B fshash
+.RB [ \-a ]
+.RB [ \-c
+.IR cache ]
+.RB [ \-f
+.IR format ]
+.RB [ \-H
+.IR hash ]
+.RI [ file
+\&...]
+.SH DESCRIPTION
+The
+.B fshash
+program generates digests of filesystems. It's similar in concept to (but
+somewhat different from) Ian Jackson's
+.BR summer (1)
+tool.
+.PP
+The idea is to capture everything interesting about a filesystem in a
+file with the following properties:
+.TP
+.I Completeness
+The digest file describes everything `interesting' about the filesystem,
+such that two filesystems which are interestingly different will have
+distinct digests.
+.TP
+.I Canonicalness
+If two filesystems aren't different in any interesting way, then their
+digests should be identical.
+.TP
+.I Readability
+Given two subtly different filesystems, it's easy for a human equipped
+with digests for them and
+.BR diff (1)
+to work out what the differences actually are.
+.SS Command-line processing
+The following command-line arguments are accepted.
+.TP
+.B \-h, \-\-help
+Show a summary of the command-line syntax, and exit successfully.
+.TP
+.B \-\-version
+Show the program's version number, and exit successfully.
+.TP
+.B \-a, \-\-all
+Clear the cache of information about all files except those processed in
+this run.
+.TP
+.B \-c, \-\-cache=\fIfile
+Keep a cache of file hashes in the
+.IR file .
+The cache is keyed by inode and modification time: if a file has an
+entry in the cache already then it won't be hashed again, which can
+provide a valuable performance improvement on large filesystems. If the
+.I file
+doesn't exist, then it will be created.
+.TP
+.B \-f, \-\-files=\fIformat
+Read a list of filenames on standard input in the given
+.I format
+and write digest lines for them. The
+.I format
+may be:
+.B find0
+for simple null-terminated names, as produced by
+.BR "find \-print0" ;
+or
+.B rsync
+for file data as produced by
+.BR rsync (1).
+The latter is useful, since
+.B rsync
+has powerful file inclusion and exclusion capabilities \(en and a common
+use case is generating a digest for a collection of files copied using
+.BR rsync .
+(The
+.B find0
+format doesn't work well: see
+.B BUGS
+below.)
+.TP
+.B \-H, \-\-hash=\fIhash
+Use the
+.I hash
+function, which can be any hash function supported by Python's
+.BR hashlib .
+If this option is omitted then the hash is read from the cache file;
+if there is no cache file either, then an error is reported.
+.PP
+Positional arguments are interpreted as files and directories to be
+processed, in order. A directory name which ends in
+.RB ` / '
+is treated specially:
+.B fshash
+writes filenames relative to the given directory.
+.SS Output format
+Information about each filesystem object is written on a separate line.
+These lines can be quite long, and consist of a number of fields:
+.hP 1.
+For regular files, a cryptographic hash of the file's content, in
+hexadecimal. For other kinds of filesystem object, a description of the
+object type and any special information about it, in square brackets,
+and padded with spaces so as to take the same width as a hash; see
+the description of non-regular objects
+below for details.
+.hP 2.
+A `virtual inode identifier': a string which will be the same in two
+lines if and only if they represent hard links to the same underlying
+inode. Some care is taken so that files are assigned the same
+identifier even if other parts of the filesystem are different, so as to
+avoid spurious differences.
+.hP 3.
+The object's permissions and mode bits, in octal.
+.hP 4.
+The file's owner and group, in decimal, separated by a colon.
+.hP 5.
+The file's last-modified time, in UTC, in ISO8601 format, i.e.,
+.IB yyyy \(en mm \(en dd T hh : mm : ss Z \fR.
+.hP 6.
+The file's size in bytes, in decimal.
+.hP 7.
+The file's name (relative to some appropriate parent directory).
+Characters which
+would cause ambiguity are escaped: tab, linefeed and carriage return are
+printed as
+.RB ` \et ',
+.RB ` \en ',
+and
+.RB ` \er ',
+respectively;
+.RB ` ' '
+is printed as
+.RB ` \e' ';
+.RB ` \e '
+is printed as
+.RB ` \e\e ';
+and other codes outside the range 32\(en127 are printed as hex escapes,
+in the form
+.RB ` \ex\fIxx '.
+Finally, the sequence
+.RB ` \~\->\~ '
+is printed as
+.RB ` \~\e\->\~ '
+so that symlink targets are presented unambiguously (see below).
+.PP
+For non-regular file objects, the first field is an information field
+enclosed in square brackets, and some of the other fields provide other
+information or are suppressed, as follows.
+.TP
+.I Errors
+If there was an error reading the object's metadata then the information
+field shows
+.BI Enn
+.IR message ,
+and the other fields, except the name, are printed as
+.B error
+rather than having any useful information.
+.TP
+.I Sockets
+The information field shows
+.BR socket .
+.TP
+.I Named pipes
+The information field shows
+.BR fifo .
+.TP
+.I Symbolic links
+The information field shows
+.BR symbolic-link .
+The name is followed by
+.RB ` \~\->\~ '
+and the link target (or by
+.BI
+if there was an error reading the link destination).
+.TP
+.I Directories
+The information field shows
+.BR directory ,
+and the size field shows
+.B dir
+(since directory sizes are not consistent across filesystem
+implementations). The name is followed by
+.RB ` / '.
+.TP
+.I Block and character devices
+The information field shows
+.B block-device
+or
+.BR character-device ,
+as appropriate, followed by the major and minor device numbers in
+decimal, and separated by a colon.
+.PP
+.SH BUGS
+No attempt is made to sort filenames read in
+.B find0
+format, so they're not very likely to match digests produced any other
+way. Indeed, they're not very likely to match digests produced by
+.B find0
+on other machines either.
+.SH SEE ALSO
+.BR find (1),
+.BR rsync (1),
+.BR sha256sum (1)
+etc.
+.SH AUTHOR
+Mark Wooding,
diff --git a/rsync-backup.8 b/rsync-backup.8
index 3d2cffc..a32c656 100644
--- a/rsync-backup.8
+++ b/rsync-backup.8
@@ -1,3 +1,9 @@
+.ie t .ds o \(bu
+.el .ds o o
+.de hP
+.IP
+\h'-\w'\fB\\$1\ \fP'u'\fB\\$1\ \fP\c
+..
 .TH rsync-backup 8 "7 October 2012" rsync-backup
 .SH SYNOPSIS
 .B rsync-backup
@@ -40,10 +46,91 @@ security disaster. Remember that the backup server is, in the end,
 responsible for the integrity of the backup data. A dishonest backup
 server can easily compromise a client which is being restored from
 corrupt backup data.)
+.SS Command-line options
+Most of the behaviour of
+.B rsync-backup
+is controlled by a configuration file, described starting with the
+section named
+.B Configuration commands
+below.
+But a few features are controlled by command-line options.
+.TP
+.B \-h
+Show a brief help message for the program, and exit successfully.
+.TP
+.B \-V
+Show
+.BR rsync-backup 's
+version number and some choice pieces of build-time configuration, and
+exit successfully.
+.TP
+.BI "\-c " conf
+Read
+.I conf
+instead of the default configuration file (shown as
+.B conf
+in the
+.B \-V
+output).
+.TP
+.B \-v
+Produce verbose progress information on standard output while the backup
+is running. This keeps one amused while running a backup
+interactively. In any event,
+.B rsync-backup
+will report failures to standard error, and otherwise run silently, so
+it doesn't annoy unnecessarily if run by
+.BR cron (8).
+.SS Backup process
+Backing up a filesystem works as follows.
+.hP \*o
+Make a snapshot of the filesystem on the client, and ensure that the
+snapshot is mounted. There are some `trivial' snapshot types which use
+the existing mounted filesystem, and either prevent processes writing to
+it during the backup, or just hope for the best. Other snapshot types
+require the snapshot to be mounted somewhere distinct from the main
+filesystem, so that the latter can continue being used.
+.hP \*o
+Run
+.B rsync
+to copy the snapshot to the backup volume \(en specifically, to
+.IB host / fs / new \fR.
+If this directory already exists, then it's presumed to be debris from a
+previous attempt to dump this filesystem:
+.B rsync
+will update it appropriately, by adding, deleting or modifying the
+files. This means that retrying a failed dump \(en after fixing whatever
+caused it to go wrong, obviously! \(en is usually fairly quick.
+.hP \*o
+Run
+.B fshash
+on the client to generate a `digest' describing the contents of the
+filesystem, and send this to the server as
+.IB host / fs / new .fshash \fR.
+.hP \*o
+Release the snapshot: we don't need it any more.
+.hP \*o
+Run
+.B fshash
+over the new backup; specifically, to
+.BI tmp/fshash. host . fs . date \fR.
+This gives us a digest for what the backup volume actually stored.
+.hP \*o
+Compare the two
+.B fshash
+digests. If they differ then dump the differences to the log file and
+report a backup failure. (Backups aren't any good if they don't
+actually back up the right thing. And you stand a better chance of
+fixing them if you know that they're going wrong.)
+.hP \*o
+Commit the backup, by renaming the dump directory to
+.IB host / fs / date
+and the
+.B fshash
+digest file to
+.IB host / fs / date .fshash \fR.
 .PP
-The
-
-
+The backup is now complete.
 .SS Configuration commands
 The configuration file is simply a Bash shell fragment: configuration
 commands are shell functions.
@@ -58,7 +145,12 @@ Future
 .B backup
 commands will back up filesystems on the named
 .IR host .
-This clears the
+To back up filesystems on the backup server itself, use its hostname:
+.B rsync-backup
+will avoid inefficient and pointless messing about with
+.BR ssh (1)
+in this case.
+This command clears the
 .B like
 list.
 .TP
@@ -87,7 +179,7 @@ can be
 .BR weekly ,
 .BR monthly ,
 or
-.B annually
+.B annually
 (or
 .BR yearly ,
 which means the same); the
@@ -120,9 +212,15 @@ value is 14.
 Command-line options to pass to
 .BR rsync (1)
 in addition to the basic set:
-.B \-\-archive \-\-hard-links \-\-numeric-ids \-\-del \-\-sparse
-.B \-\-compress \-\-one-file-system \-\-partial
-.B \-\-filter="dir-merge .rsync-backup"
+.B \-\-archive
+.B \-\-hard-links
+.B \-\-numeric-ids
+.B \-\-del
+.B \-\-sparse
+.B \-\-compress
+.B \-\-one-file-system
+.B \-\-partial
+.BR "\-\-filter=""dir-merge .rsync-backup""" .
 The default is
 .BR \-\-verbose .
 .TP
@@ -261,3 +359,42 @@ type handlers: please see the script for details. Please send the
 author interesting snapshot handlers for inclusion in the main
 distribution.
 .SS Archive structure
+Backup trees are stored in a fairly straightforward directory tree.
+.PP
+At the top level is one directory for each client host. There are also
+some special entries:
+.TP
+.B fshash.cache
+The cache database used for improving performance of local file
+hashing. There may be other
+.B fshash.cache-*
+files used by SQLite for its own purposes.
+.TP
+.B lost+found
+Part of the filesystem used on the backup volume. You don't want to
+mess with this.
+.TP
+.B tmp
+Used to store temporary files during the backup process. (Some of them
+want to be on the same filesystem as the rest of the backup.) When
+things go wrong, files are left behind in the hope that they might help
+someone debug the mess. It's always safe to delete the files in here
+when no backup is running.
+.PP
+So don't use those names for your hosts.
+.PP
+The next layer down contains a directory for each filesystem on the given host.
+.PP
+The bottom layer contains a directory for each dump of that filesystem,
+named with the date at which the dump was started (in ISO8601
+.IB yyyy \(en mm \(en dd
+format), together with associated files named
+.IB date .* \fR.
+.SH SEE ALSO
+.BR fshash (1),
+.BR lvm (8),
+.BR rfreezefs (8),
+.BR rsync (1),
+.BR ssh (1).
+.SH AUTHOR
+Mark Wooding,
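
Reviewer's note, not part of the patch: the filename escaping that the new fshash.1 page describes for field 7 can be illustrated with a short Python sketch. This is not taken from fshash itself. The function name escape_name, the use of bytes for names, the lowercase hex digits, and the reading of the range 32 to 127 as inclusive at both ends are all assumptions made for illustration.

```python
import re


def escape_name(name: bytes) -> str:
    """Escape a filename roughly as the fshash(1) page describes.

    Hypothetical helper: a sketch of the documented rules, not fshash's code.
    """
    simple = {
        0x09: "\\t",   # tab
        0x0A: "\\n",   # linefeed
        0x0D: "\\r",   # carriage return
        0x27: "\\'",   # single quote
        0x5C: "\\\\",  # backslash
    }
    out = []
    for byte in name:
        if byte in simple:
            out.append(simple[byte])
        elif 32 <= byte <= 127:
            # Assumption: both ends of the documented range print as-is.
            out.append(chr(byte))
        else:
            # Anything else becomes a two-digit hex escape.
            out.append("\\x%02x" % byte)
    escaped = "".join(out)
    # Put a backslash before every "-> " that follows a space, so that a
    # literal " -> " in a name cannot be mistaken for the separator that
    # precedes a symlink target. The lookahead also covers overlapping runs.
    return re.sub(r" (?=-> )", lambda m: " \\", escaped)


if __name__ == "__main__":
    print(escape_name(b"it's a -> b\n"))
```

Escaping single characters first and the arrow sequence second is safe here because none of the single-character escapes can introduce a space, so the second pass only ever matches arrows that were present in the original name.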