.ie t .ds o \(bu
.el .ds o o
.de hP
.IP
\h'-\w'\fB\\$1\ \fP'u'\fB\\$1\ \fP\c
..
.TH rsync-backup 8 "7 October 2012" rsync-backup
.SH NAME
rsync-backup \- back up files using rsync
.SH SYNOPSIS
.B rsync-backup
.RB [ \-nv ]
.RB [ \-c
.IR config-file ]
.SH DESCRIPTION
The
.B rsync-backup
script is a backup program of the currently popular
.RB ` rsync (1)
.BR \-\-link-dest '
variety.
It uses
.BR rsync 's
ability to create hardlinks from (apparently) similar existing local trees to make incremental dumps efficient, even from remote sources.
Restoring files is easy because the backups created are just directories full of files, exactly as they were on the source \(en and this is verified using the
.BR fshash (1)
program.
.PP
The script does more than merely run
.BR rsync .
It is also responsible for creating and removing snapshots of volumes to be backed up, and for expiring old dumps according to a user-specified retention policy.
.SS Installation
The idea is that the
.B rsync-backup
script should be installed and run on a central backup server with local access to the backup volumes.
.PP
The script should be run with full (root) privileges, so that it can correctly record file ownership information.
The server should also be able to connect via
.BR ssh (1)
to the client machines, and run processes there as root.
(This is not a security disaster.
Remember that the backup server is, in the end, responsible for the integrity of the backup data.
A dishonest backup server can easily compromise a client which is being restored from corrupt backup data.)
.SS Command-line options
Most of the behaviour of
.B rsync-backup
is controlled by a configuration file, described starting with the section named
.B Configuration commands
below.
But a few features are controlled by command-line options.
.TP
.B \-h
Show a brief help message for the program, and exit successfully.
.TP
.B \-V
Show
.BR rsync-backup 's
version number and some choice pieces of build-time configuration, and exit successfully.
.TP
.BI "\-c " conf
Read
.I conf
instead of the default configuration file (shown as
.B conf
in the
.B \-V
output).
.TP
.B \-n
Don't actually take a backup, or write proper logs: instead, write a description of what would be done to standard error.
.TP
.B \-v
Produce verbose progress information on standard output while the backup is running.
This keeps one amused while running a backup interactively.
Whether or not this option is given,
.B rsync-backup
reports failures to standard error; without it, the program otherwise runs silently, so it doesn't annoy unnecessarily if run by
.BR cron (8).
.SS Backup process
Backing up a filesystem works as follows.
.hP \*o
Make a snapshot of the filesystem on the client, and ensure that the snapshot is mounted.
There are some `trivial' snapshot types which use the existing mounted filesystem, and either prevent processes writing to it during the backup, or just hope for the best.
Other snapshot types require the snapshot to be mounted somewhere distinct from the main filesystem, so that the latter can continue being used.
.hP \*o
Run
.B rsync
to copy the snapshot to the backup volume \(en specifically, to
.IB host / fs / new \fR.
If this directory already exists, then it's presumed to be debris from a previous attempt to dump this filesystem:
.B rsync
will update it appropriately, by adding, deleting or modifying the files.
This means that retrying a failed dump \(en after fixing whatever caused it to go wrong, obviously! \(en is usually fairly quick.
.hP \*o
Run
.B fshash
on the client to generate a `digest' describing the contents of the filesystem, and send this to the server as
.IB host / fs / new .fshash \fR.
.hP \*o
Release the snapshot: we don't need it any more.
.hP \*o
Run
.B fshash
over the new backup, writing its digest to
.BI tmp/fshash. host . fs . date \fR.
This gives us a digest for what the backup volume actually stored.
.hP \*o
Compare the two
.B fshash
digests.
If they differ then dump the differences to the log file and report a backup failure.
(Backups aren't any good if they don't actually back up the right thing.
And you stand a better chance of fixing them if you know that they're going wrong.)
.hP \*o
Commit the backup by renaming the dump directory to
.IB host / fs / date
and the
.B fshash
digest file to
.IB host / fs / date .fshash \fR.
.PP
The backup is now complete.
.SS Configuration commands
The configuration file is simply a Bash shell fragment: configuration commands are shell functions.
.TP
.BI "addhook " hook " " command
Arrange that the named
.I hook
runs the given
.IR command .
See
.B runhook
for more details.
.TP
.BI "backup " "fs\fR[:\fIfsarg\fR] ..."
Back up the named filesystems.
The corresponding
.IR fsarg s
may be required by the snapshot type.
.TP
.BI "defhook " hook
Define a new hook named
.IR hook .
See
.B addhook
and
.B runhook
for more information.
.TP
.BI "host " host
Future
.B backup
commands will back up filesystems on the named
.IR host .
To back up filesystems on the backup server itself, use its own hostname:
.B rsync-backup
will avoid inefficient and pointless messing about with
.BR ssh (1)
in this case.
This command clears the
.B like
list and the remote
.B user
name, and resets the retention policy to its default (i.e., the policy defined before the first
.B host
command).
.TP
.BI "like " "host\fR ..."
Declare that subsequent filesystems are `similar' to like-named filesystems on the named
.IR host s,
and that
.B rsync
should use those trees as potential sources of hardlinkable files.
Be careful when using this command without
.BR rsync 's
.B \-\-checksum
option (which can be added to
.BR RSYNCOPTS ):
without it,
.B rsync
judges whether files are identical by comparing their sizes and modification times rather than their contents, and an erroneous hardlink will cause the backup to fail.
(The backup won't be left silently incorrect.)
.TP
.BI "retain " frequency " " duration
Define part of a backup retention policy: backup trees made at the given
.I frequency
should be kept for the given
.IR duration .
The
.I frequency
can be
.BR daily ,
.BR weekly ,
.BR monthly ,
or
.B annually
(or
.BR yearly ,
which means the same); the
.I duration
may be any of
.BR week ,
.BR month ,
.BR year ,
or
.BR forever .
Expiry considers each existing dump against the policy lines in order: the last applicable line determines the dump's fate \(en so you should probably write the lines in decreasing order of duration.
.RS
.PP
Groups of
.B retain
commands between
.B host
and/or
.B backup
commands collectively define a retention policy.
Once a policy is defined, subsequent
.B backup
operations use it.
The first
.B retain
command after a
.B host
or
.B backup
command clears the policy and starts defining a new one.
The policy defined before the first
.B host
command is the
.I default
policy: at the start of each
.B host
stanza, the policy is reset to the default.
.RE
.TP
.BI "retry " count
The
.B live
snapshot type (see below) doesn't prevent a filesystem from being modified while it's being backed up.
If this happens, the
.B fshash
pass will detect the difference and fail.
If the filesystem in question is relatively quiescent, then retrying the backup may well produce a successful, consistent copy.
Following this command, a backup which results in an
.B fshash
mismatch will be retried up to
.I count
times before being declared a failure.
The default is to retry once, clearing mismatching files from the
.BR fshash (1)
caches before the second attempt.
.TP
.BI "runhook " hook " " args\fR...
Invoke the named
.IR hook .
The individual commands on the hook are run, in order, as
.RS
.IP
.I command
.IR args ...
.PP
If any command fails (returns nonzero) then the remaining commands on the hook are not run, and
.B runhook
fails with the same exit code.
.RE
.TP
.BI "snap " type " " \fR[\fIargs\fR...]
Use the snapshot
.I type
for subsequent backups.
Some snapshot types require additional arguments, which may be supplied here.
This command clears the
.B retry
counter.
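.IP
Since
.B snap
clears the
.B retry
counter, a
.B retry
command should come after the
.B snap
command whose backups it is meant to affect.
The fragment below is only a sketch (the host name
.B fileserver
and the filesystem names
.B root
and
.B home
are made up): it backs up two filesystems with the
.B live
snapshot type, retrying each up to three times if the
.B fshash
verification reports a mismatch.
.RS
.nf
host fileserver
snap live
retry 3
backup root:/ home:/home
.fi
.RE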
.TP
.BI "user " name
Specify the user name on the remote host.
Without this, calls to
.BR ssh (1)
and
.BR rsync (1)
won't specify any user name, so the default (probably from the
.BR ssh_config (5)
file) will apply.
.SS Configuration variables
The following shell variables may be overridden by the configuration file.
.TP
.B HASH
The hash function to use for verifying archive integrity.
This is passed to the
.B \-H
option of
.BR fshash ,
so it must name one of the hash functions supported by your Python's
.B hashlib
module.
The default is
.BR sha256 .
.TP
.B INDEXDB
The name of a SQLite database, initialized by
.BR update-bkp-index (8),
which records which dumps are on which backup volumes.
If the file doesn't exist, then no index is maintained.
The default is
.IB localstatedir /lib/bkp/index.db
where
.I localstatedir
is the state directory configured at build time.
.TP
.B MAXLOG
The number of log files to be kept for each filesystem.
Older log files are deleted so that no more than this many are kept.
The default value is 14.
.TP
.B METADIR
The metadata directory for the currently mounted backup volume.
The default is
.IB mntbkpdir /meta
where
.I mntbkpdir
is the backup mount directory configured at build time.
.TP
.B RSYNCOPTS
Command-line options to pass to
.BR rsync (1)
in addition to the basic set:
.B \-\-archive
.B \-\-hard-links
.B \-\-numeric-ids
.B \-\-del
.B \-\-sparse
.B \-\-compress
.B \-\-one-file-system
.B \-\-partial
.BR "\-\-filter=""dir-merge .rsync-backup""" .
The default is
.BR \-\-verbose .
.TP
.B SNAPDIR
LVM (and
.BR rfreezefs )
snapshots are mounted on subdirectories of
.B SNAPDIR
.IR "on backup clients" .
The default is
.IB mntbkpdir /snap
where
.I mntbkpdir
is the backup mount directory configured at build time.
.TP
.B SNAPSIZE
The volume size option to pass to
.BR lvcreate (8)
when creating a snapshot.
The default is
.BR \-l10%ORIGIN ,
which seems to work fairly well.
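.IP
The default asks
.BR lvcreate (8)
for a snapshot whose size is 10% of the origin volume's size.
As an illustration only, a configuration file could instead give every snapshot a fixed size (4\ GiB here, an arbitrary figure) using
.BR lvcreate 's
.B \-L
option.
This assumes, as the form of the default suggests, that the value is passed to
.B lvcreate
unchanged.
.RS
.nf
SNAPSIZE=\-L4G
.fi
.RE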
.TP
.B STOREDIR
The directory in which the actual backup trees are stored.
See the section on
.B Archive structure
below.
The default is
.IB mntbkpdir /store
where
.I mntbkpdir
is the backup mount directory configured at build time.
.TP
.B VOLUME
The name of the current volume.
If this is left unset, the volume name is read from the file
.IB METADIR /volume
once at the start of the backup run.
.SS Hooks
The configuration file can modify the behaviour of the backup in two main ways: by adding commands to hooks (see the
.B addhook
command), and by redefining shell functions.
.PP
The following hooks are defined.
.TP
.BI "commit " host " " fs " " date
Called during the commit procedure.
The backup tree and manifest have already been renamed into their proper places.
Typically one would use this hook to rename files created by a corresponding
.B precommit
hook command.
.TP
.BI "end " rc
The backup has completed;
.B rsync-backup
will exit with status
.IR rc .
.TP
.BI "precommit " host " " fs " " date
Called after a backup has been verified as complete, just before it is committed.
The backup tree is in
.B new
in the current directory, and the
.B fshash
manifest is in
.BR new.fshash .
A typical action would be to create a digital signature on the manifest.
.TP
.BI "setup " host " " fs " " date
Called when a backup of a particular filesystem is about to start.
A hook command can exit with status 99 to skip this backup.
.TP
.B "start"
Invoked before performing any actual dumps (when the first
.B host
command is run).
.PP
The following shell functions can be redefined by users.
.TP
.BI "backup_commit_hook " host " " fs " " date
Called from the
.B commit
hook for compatibility.
.TP
.BI "backup_precommit_hook " host " " fs " " date
Called from the
.B precommit
hook for compatibility.
.TP
.BR "whine " [ \-n ] " " \fItext\fR...
Called to report `interesting' events when the
.B \-v
option is in force.
The default action is to echo the
.I text
to (what was initially) standard output, followed by a newline unless
.B \-n
is given.
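.SS Example configuration
The following configuration file is a sketch of how the commands, hooks and snapshot types (see
.B Snapshot types
below) described in this manual might fit together.
The host names, the volume group
.BR vg0 ,
the filesystem names, and the hook functions
.B sign_manifest
and
.B move_signature
are all made up.
The
.B retain
commands at the top define the default policy, written in decreasing order of duration; the
.B scratch
stanza replaces it with a shorter policy of its own.
The
.B web2
stanza uses
.B like
so that unchanged files can be hardlinked against the corresponding dumps of
.BR web1 .
The hook functions sign each verified manifest with
.BR gpg (1)
and then rename the signature to match the committed manifest.
They assume that root on the backup server has a GnuPG signing key which can be used without a passphrase prompt, and that the
.B commit
hook runs in the same directory as the
.B precommit
hook.
.RS
.nf
# Default retention policy, in decreasing order of duration.
retain annually forever
retain monthly year
retain weekly month
retain daily week

# Hook commands are called as: command host fs date
sign_manifest () {
  gpg \-\-batch \-\-yes \-\-detach\-sign new.fshash
}
move_signature () {
  mv new.fshash.sig "$3.fshash.sig"
}
addhook precommit sign_manifest
addhook commit move_signature

host fileserver
snap lvm vg0
backup root home

host web1
snap live
backup root:/ srv:/srv

host web2
like web1
snap live
backup root:/ srv:/srv

host scratch
retain daily week
snap live
backup root:/
.fi
.RE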
.SS Snapshot types
The following snapshot types are available.
.TP
.B live
A trivial snapshot type: attempts to back up a live filesystem.
How well this works depends on how active the filesystem is.
If files change while the dump is in progress then the
.B fshash
verification will likely fail.
Backups using this snapshot type must specify the filesystem mount point as the
.IR fsarg .
.TP
.B ro
A slightly less trivial snapshot type: makes the filesystem read-only while the dump is in progress.
Backups using this snapshot type must specify the filesystem mount point as the
.IR fsarg .
.TP
.BI "lvm " vg
Create snapshots using LVM.
The snapshot argument
.I vg
names the relevant volume group.
The filesystem name is interpreted as the origin volume name; the snapshot will be called
.IB fs .bkp
and mounted on
.IB SNAPDIR / fs \fR;
space will be allocated to it according to the
.I SNAPSIZE
variable.
.TP
.BI "rfreezefs " client " " vg
This gets complicated.
Suppose that a server has an LVM volume group, and exports (somehow) a logical volume to a client.
Examples are a host providing a virtual disk to a guest, or a server providing network-attached storage to a client.
The server can create a snapshot of the volume using LVM, but must synchronize with the client to ensure that the filesystem image captured in the snapshot is clean.
The
.BR rfreezefs (8)
program should be installed on the client to perform this rather delicate synchronization.
Declare the server using the
.B host
command as usual; pass the client's name as the
.I client
and the server's volume group name as the
.I vg
snapshot arguments.
Finally, backups using this snapshot type must specify the filesystem mount point (or, actually, any file in the filesystem) on the client as the
.IR fsarg .
.PP
Additional snapshot types can be defined in the configuration file.
A snapshot type requires two shell functions.
.TP
.BI snap_ type " " snapargs " " fs " " fsarg
Create the snapshot, and write the mount point (on the client host) to standard output, in a form suitable as an argument to
.BR rsync .
.TP
.BI unsnap_ type " " snapargs " " fs " " fsarg
Remove the snapshot.
.PP
There are a number of utility functions which can be used by snapshot type handlers: please see the script for details.
Please send the author interesting snapshot handlers for inclusion in the main distribution.
.SS Archive structure
Backup trees are stored in a fairly straightforward directory tree.
.PP
At the top level is one directory for each client host.
There are also some special entries:
.TP
.B \&.rsync-backup-store
This file must be present in order to indicate that a backup volume is present (and not just an empty mount point).
.TP
.B fshash.cache
The cache database used for improving performance of local file hashing.
There may be other
.B fshash.cache-*
files used by SQLite for its own purposes.
.TP
.B lost+found
Part of the backup volume's filesystem.
You don't want to mess with this.
.TP
.B tmp
Used to store temporary files during the backup process.
(Some of them want to be on the same filesystem as the rest of the backup.)
When things go wrong, files are left behind in the hope that they might help someone debug the mess.
It's always safe to delete the files in here when no backup is running.
.PP
So don't use any of these names for your hosts.
.PP
The next layer down contains a directory for each filesystem on the given host.
.PP
The bottom layer contains a directory for each dump of that filesystem, named with the date on which the dump was started (in ISO 8601
.IB yyyy \- mm \- dd
format), together with associated files named
.IB date .* \fR.
There is also a symbolic link
.B last
referring to the most recent backup of the filesystem.
.SH SEE ALSO
.BR check-bkp-status (8),
.BR fshash (1),
.BR lvm (8),
.BR rfreezefs (8),
.BR rsync (1),
.BR ssh (1),
.BR update-bkp-index (8).
.SH AUTHOR
Mark Wooding,