Archvsync
=========

This is the central repository for the Debian mirror scripts. The scripts
in this repository are written to maintain a Debian archive mirror (and,
shortly, a Debian bug mirror), but they should be easily generalizable.


Currently the following scripts are available:

 * ftpsync     - Used to sync an archive using rsync
 * runmirrors  - Used to notify leaf nodes of available updates
 * dircombine  - Internal script to manage the mirror user's $HOME
                 on debian.org machines
 * typicalsync - Generates a typical Debian mirror
 * udh         - We are lazy, just a shorthand to avoid typing the
                 commands, ignore... :)

Usage
=====
For the impatient, here are short usage instructions:

- Create a dedicated user for the whole mirror.
- Create a separate directory for the mirror, writable by the new user.
- Place the ftpsync script in the mirror user's $HOME/bin (or just
  $HOME, in which case adjust the command= path below accordingly).
- Place ftpsync.conf.sample into $HOME/etc as ftpsync.conf and edit it
  to suit your system. At the very least, change the TO= and
  RSYNC_HOST lines.
- Create $HOME/log (or wherever you point $LOGDIR to).
- Set up .ssh/authorized_keys for the mirror user and place the public
  key of your upstream mirror into it. Preface the key with
    no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty,command="~/bin/ftpsync",from="IPADDRESS"
  and replace IPADDRESS with the IP address of your upstream mirror.
- You are done.
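The authorized_keys entry from the last step can be assembled with a small helper. This is a hypothetical sketch (make_authorized_key is not part of the scripts) that assumes the upstream's public key is available as a single line; it only prints the entry so you can inspect it before appending it.

```bash
# Hypothetical helper: print an authorized_keys line that restricts the
# upstream mirror's key to running ftpsync, and only from one address.
make_authorized_key() {
    local ip=$1       # IP address of the upstream mirror
    local pubkey=$2   # upstream public key, e.g. "ssh-rsa AAAA... comment"
    local opts='no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty'
    # Options come first, then a space, then the key itself.
    printf '%s,command="~/bin/ftpsync",from="%s" %s\n' "$opts" "$ip" "$pubkey"
}
```

For example, `make_authorized_key 192.0.2.10 "$(cat upstream_key.pub)" >> ~/.ssh/authorized_keys`, where 192.0.2.10 is a documentation address and upstream_key.pub is a placeholder filename.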

In order to receive different pushes or syncs from different archives,
name the config file ftpsync-$ARCHIVE.conf and call the ftpsync script
with the commandline "sync:archive:$ARCHIVE". Replace $ARCHIVE with a
sensible value. If your upstream mirror pushes you using the runmirrors
script bundled with this sync script, you do not need to add the
"sync:archive" parameter to the commandline; the scripts deal with it
automatically.
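The mapping from a sync:archive argument to a config file can be sketched as follows. config_for_args is a hypothetical helper, not ftpsync's code. It assumes the configs live in $HOME/etc as in the setup steps above, and it rejects archive names containing anything other than letters, digits, '-' and '_', so that a word sent over ssh cannot point outside that directory.

```bash
# Hypothetical helper: print the config file selected by the given arguments.
config_for_args() {
    local archive="" arg
    for arg in "$@"; do
        case $arg in
            sync:archive:*) archive=${arg#sync:archive:} ;;
        esac
    done
    # Refuse names that could escape $HOME/etc, such as "../x".
    case $archive in
        *[!A-Za-z0-9_-]*) echo "invalid archive name: $archive" >&2; return 1 ;;
    esac
    if [ -n "$archive" ]; then
        echo "$HOME/etc/ftpsync-$archive.conf"
    else
        echo "$HOME/etc/ftpsync.conf"    # no archive given: default config
    fi
}
```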



Debian mirror script minimum requirements
=========================================
As always, you may use whatever scripts you want for your Debian mirror,
but we *STRONGLY* recommend that you do not invent your own. However, if
you want to be listed as a mirror, your script *MUST* support the
following minimal functionality:

- Must perform a 2-stage sync
  The archive mirroring must be done in 2 stages. The first rsync run
  must ignore the index files. The correct exclude options for the
  first rsync run are:
    --exclude 'Packages*' --exclude 'Sources*' --exclude 'Release*' --exclude 'ls-lR*'
  The first stage must not delete any files.

  The second stage should then transfer the above excluded files and
  delete files that no longer belong on the mirror.

  Rationale: If archive mirroring is done in a single stage, there will be
  periods of time during which the index files will reference files not
  yet mirrored.

- Must not ignore pushes whil(e|st) running.
  If a push is received during a run of the mirror sync, it MUST NOT
  be ignored. The whole synchronization process must be rerun.

  Rationale: Most implementations of Debian mirror scripts will leave the
  mirror in an inconsistent state in the event of a second push being
  received while the first sync is still running. It is likely that in
  the near future, the frequency of pushes will increase.

- Should understand multi-stage pushes.
  The script should parse the arguments it gets via ssh, and if they
  contain a hint to only sync stage1 or stage2, then ONLY those steps
  SHOULD be performed.

  Rationale: This enables us to coordinate the timing of the first
  and second stage pushes and minimize the time during which the
  archive is desynchronized. This is especially important for mirrors
  that are involved in a round robin or GeoDNS setup.

  The minimum arguments the script has to understand are:
    sync:stage1   Only sync stage1
    sync:stage2   Only sync stage2
    sync:all      Do everything. Default if none of stage1/2 are
                  present.
  There are more possible arguments; for a complete list, see the
  ftpsync script in our git repository.
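The two-stage requirement and the stage arguments listed above can be sketched together in bash. This is a minimal illustration, not ftpsync's code: it assumes RSYNC_HOST, RSYNC_PATH and TO come from the config, uses "-a" in place of ftpsync's fuller rsync option set, and lets RSYNC be overridden so the calls can be inspected.

```bash
# Minimal 2-stage sync sketch; RSYNC_HOST, RSYNC_PATH and TO are assumed
# to be set by the config. "-a" is a stand-in for a fuller option set.
: "${RSYNC:=rsync}"
INDEX_EXCLUDES=(--exclude 'Packages*' --exclude 'Sources*'
                --exclude 'Release*' --exclude 'ls-lR*')

stage1() {
    # Data files only: index files are skipped and nothing is deleted.
    "$RSYNC" -a "${INDEX_EXCLUDES[@]}" "$RSYNC_HOST::$RSYNC_PATH/" "$TO/"
}

stage2() {
    # Everything, including the index files; updated files are moved into
    # place at the end and stale files are deleted after the transfer.
    "$RSYNC" -a --delete --delete-after --delay-updates \
        "$RSYNC_HOST::$RSYNC_PATH/" "$TO/"
}

# Run the stages requested by the sync:* words; a full sync is the default.
run_requested_stages() {
    local want1=false want2=false word
    for word in "$@"; do
        case $word in
            sync:stage1) want1=true ;;
            sync:stage2) want2=true ;;
            sync:all)    want1=true; want2=true ;;
        esac
    done
    if [ "$want1" = false ] && [ "$want2" = false ]; then
        want1=true; want2=true
    fi
    if [ "$want1" = true ]; then stage1 || return 1; fi
    if [ "$want2" = true ]; then stage2 || return 1; fi
}
```

A forced command could then call `run_requested_stages $SSH_ORIGINAL_COMMAND`, leaving the variable unquoted on purpose so it splits into words. Running stage2 only after stage1 succeeds keeps the index files from pointing at files that are not yet on the mirror.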



ftpsync
=======

This script is based on the old anonftpsync script. It has been rewritten
to add flexibility and fix a number of outstanding issues.

Some of the advantages of the new version are:
- Nearly every aspect is configurable
- Correct support for multiple pushes
- Support for multi-stage archive synchronisations
- Support for hook scripts at various points
- Support for multiple archives, even if they are pushed using one ssh key
- Support for multi-hop, multi-stage archive synchronisations

Correct support for multiple pushes
-----------------------------------
When the script receives a second push while it is running and syncing
the archive, it won't ignore it. Instead, it will rerun the
synchronisation to ensure the archive is correctly synchronised.

Scripts that fail to do that risk ending up with an inconsistent archive.
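The rerun behaviour can be sketched with a lock and a "push pending" flag. This is a minimal bash illustration, not ftpsync's implementation: LOCKDIR, FLAG and do_sync are hypothetical names, and mkdir is used as the lock because it is atomic and portable (flock(1) would be an alternative on Linux).

```bash
# Hypothetical paths; ftpsync's own lock and flag handling is not shown here.
LOCKDIR=${LOCKDIR:-$HOME/ftpsync.lock}
FLAG=${FLAG:-$HOME/ftpsync.update-required}

# Called once per incoming push; do_sync is assumed to run stage1 and stage2.
handle_push() {
    touch "$FLAG"                     # remember that a push arrived
    while [ -e "$FLAG" ]; do
        # If mkdir fails, a running instance holds the lock and will see
        # FLAG before it exits, so this push is not lost.
        mkdir "$LOCKDIR" 2>/dev/null || return 0
        while [ -e "$FLAG" ]; do
            rm -f "$FLAG"             # consume the request, then sync;
            do_sync                   # a push during do_sync re-creates FLAG
        done
        rmdir "$LOCKDIR"
        # Loop again: a push may have arrived between the last test and rmdir.
    done
}
```

A real script also has to cope with a stale lock left behind by a crashed run; this sketch ignores that case.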


Can do multi-stage archive synchronisations
-------------------------------------------
The script can be told to only perform the first or second stage of the
archive synchronisation.

This enables us to send all the binary packages and sources to a
number of mirrors, and then tell all of them to sync the
Packages/Release files at once. This keeps the timeframe in which
the mirrors are out of sync very small and greatly helps setups such as
DNS round-robin entries or the planned GeoDNS setup.


Multi-hop, multi-stage archive synchronisations
-----------------------------------------------
The script can be told to perform a multi-hop multi-stage archive
synchronisation.

This is basically the same as the multi-stage synchronisation
explained above, but enables the downstream mirror to push its own
staged/multi-hop downstreams before returning. This has the same
advantage as the multi-stage synchronisation but allows us to do
this over multiple levels of mirrors. (Imagine one push going from
Europe to Australia, where 3 other local mirrors are then updated
before stage2 is sent out. This avoids transferring the data from
Europe to Australia 4 times just to get all of them updated nearly
simultaneously.)
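The order of events in the Europe/Australia example can be illustrated with a toy model. Everything here is hypothetical (the mirror names, the downstreams table and push_tree), nothing is transferred, and each echo only stands for one mirror running that stage from its own upstream. Because each mirror finishes its downstreams before returning, stage1 reaches the whole tree before any stage2 starts.

```bash
# Toy model of a multi-hop staged push; mirror names are hypothetical.
downstreams() {
    case $1 in
        europe)    echo australia ;;
        australia) echo syd1 syd2 syd3 ;;   # these sync from australia
    esac
}

# Run a stage on one mirror, then on all of its downstreams, then return.
push_tree() {
    local stage=$1 mirror=$2 next
    echo "$mirror: $stage"
    for next in $(downstreams "$mirror"); do
        push_tree "$stage" "$next"
    done
}

push_tree stage1 europe   # the data crosses Europe -> Australia once
push_tree stage2 europe   # then stage2 runs everywhere, close together
```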


Can run hook scripts
--------------------
ftpsync currently allows 5 hook scripts to run at various points of the
mirror sync run.

Hook1: After lock is acquired, before first rsync
Hook2: After first rsync, if successful
Hook3: After second rsync, if successful
Hook4: Right before leaf mirror triggering
Hook5: After leaf mirror trigger (only if we have slave mirrors; HUB=true)

Note that Hook3 and Hook4 are likely to be called directly after each other.
The difference is that Hook3 is called *every* time the second rsync
succeeds, even if the mirroring needs to re-run due to a second push.
Hook4 is only executed once mirroring is complete.
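Where the hooks fire can be outlined in bash. This is a hypothetical sketch, not ftpsync's code. It assumes each hook is configured as a shell command in a variable HOOK1 to HOOK5 (empty meaning no hook); stage1, stage2, push_pending (whether another push arrived) and trigger_leaves stand in for the real steps. In this sketch Hook2 and Hook3 run on every pass, while Hook1, Hook4 and Hook5 run once.

```bash
# run_hook N: run the command stored in HOOK<N>, if any.
run_hook() {
    local var="HOOK$1"
    local cmd=${!var}            # indirect expansion: value of HOOK1..HOOK5
    [ -n "$cmd" ] || return 0
    eval "$cmd"
}

mirror_run() {
    run_hook 1                   # lock acquired, before first rsync
    while :; do
        stage1 || return 1
        run_hook 2               # first rsync succeeded
        stage2 || return 1
        run_hook 3               # second rsync succeeded, on every pass
        push_pending || break    # another push arrived: rerun both stages
    done
    run_hook 4                   # mirroring complete, before leaf triggering
    if [ "${HUB:-false}" = true ]; then
        trigger_leaves
        run_hook 5               # after leaf trigger, hub mirrors only
    fi
}
```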


Support for multiple archives, even if they are pushed using one ssh key
------------------------------------------------------------------------
If you get multiple archives from your upstream mirror (say Debian,
Debian-Backports and Volatile), previously you had to use 3 different ssh
keys to be able to automagically synchronize them. This script can do it
all with just one key, if your upstream mirror tells you which archive
it is pushing. See "Commandline/SSH options" below for further details.


For details of all available options, please see the extensive documentation
in the sample configuration file.


Commandline/SSH options
=======================
Script options may be set either on the local command line or passed by
specifying an ssh "command". Local commandline options always take
precedence over the SSH_ORIGINAL_COMMAND ones.

Currently this script understands the options listed below. To make them
take effect they MUST be prefixed with "sync:".

 Option       Behaviour
 stage1       Only do stage1 sync
 stage2       Only do stage2 sync
 all          Do a complete sync (default)
 mhop         Do a multi-hop sync
 archive:foo  Sync archive foo (if the file $HOME/etc/ftpsync-foo.conf
              exists and is configured)
 callback     Call back when done (needs proper ssh setup for this to
              work). The callback always uses the ssh "command"
              callback:$HOSTNAME, where $HOSTNAME is the one defined in
              the config, and happens before slave mirrors are triggered.

So, to get the script to sync all of the archive behind bpo and call back when
it is complete, use an upstream trigger of
  ssh $USER@$HOST sync:all sync:archive:bpo sync:callback
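The precedence rule can be sketched by reading the ssh words first and the local arguments last, so a later local value overrides an earlier one. parse_sync_options and the variables SYNC_MODE, ARCHIVE, MHOP and CALLBACK are hypothetical names for illustration, not necessarily what ftpsync uses internally.

```bash
# Hypothetical option parser: ssh words first, local arguments last,
# so a value given on the local command line wins.
parse_sync_options() {
    SYNC_MODE=all ARCHIVE="" MHOP=false CALLBACK=false
    local -a ssh_words
    local word
    # Split the forced-command string on whitespace without glob expansion.
    read -r -a ssh_words <<< "${SSH_ORIGINAL_COMMAND:-}"
    for word in "${ssh_words[@]}" "$@"; do
        case $word in
            sync:stage1|sync:stage2|sync:all) SYNC_MODE=${word#sync:} ;;
            sync:mhop)      MHOP=true ;;
            sync:archive:*) ARCHIVE=${word#sync:archive:} ;;
            sync:callback)  CALLBACK=true ;;
            *) ;;           # words without the sync: prefix are ignored
        esac
    done
}
```

With the trigger shown above, SSH_ORIGINAL_COMMAND is "sync:all sync:archive:bpo sync:callback", which sets SYNC_MODE=all, ARCHIVE=bpo and CALLBACK=true.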


Mirror trace files
==================
Every mirror needs to have a 'trace' file under project/trace.
The file format is as follows:

The filename has to be the full hostname (e.g. the output of hostname -f)
or, for a mirror participating in RR DNS (where users will never use the
hostname), the name of the DNS RR entry, e.g. security.debian.org for the
security rotation.

The content is (without the leading spaces):
  Sat Nov 8 13:20:22 UTC 2008
  Used ftpsync version: 42
  Running on host: steffani.debian.org

First line: Output of date -u
Second line: Freeform text containing the program name and version
Third line: Text "Running on host: " followed by hostname -f

The third line MUST NOT be the DNS RR name, even if the mirror is part
of it. It MUST BE the host's own name. This is in contrast to the filename,
which SHOULD be the DNS RR name.
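Writing a trace file in this format can be sketched in bash. write_trace and VERSION are hypothetical names; the mirror root and the trace filename (the DNS RR name where applicable) are passed in, and the optional third argument only exists so the host name can be fixed for testing. The file is written to a temporary name and renamed into place, so a downstream mirror never fetches a half-written trace.

```bash
# write_trace ROOT NAME [HOST]: write ROOT/project/trace/NAME.
write_trace() {
    local root=$1 name=$2 host=${3:-$(hostname -f)}
    local dir="$root/project/trace" tmp
    mkdir -p "$dir" || return 1
    tmp=$(mktemp "$dir/.$name.XXXXXX") || return 1
    {
        date -u
        echo "Used ftpsync version: ${VERSION:-unknown}"
        echo "Running on host: $host"
    } > "$tmp"
    chmod 644 "$tmp"             # mktemp creates the file with mode 0600
    mv -f "$tmp" "$dir/$name"    # rename within one directory is atomic
}
```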


runmirrors
==========
This script is used to tell leaf mirrors that it is time to synchronize
their copy of the archive. This is done by parsing a mirror list and
using ssh to "push" the leaf nodes. You can read much more about the
principle behind the push at [1]; essentially, it tells the receiving
end to run a pre-defined script. As the whole setup is extremely limited
and the ssh key cannot be used for anything other than the pre-defined
script, this is the most secure method for such an action.

This script supports two types of pushes: the normal single-stage push,
as well as the newer multi-stage push.

The normal push, as described above, will simply push the leaf node and
then go on with the other nodes.

The multi-staged push first pushes a mirror and tells it to only do a
stage1 sync run. Then it waits for the mirror (and all others being pushed
in the same run) to finish that run, before it tells all of the staged
mirrors to do the stage2 sync.

This way you can do a nearly simultaneous update of multiple hosts.
This is useful in situations where periods of desynchronization should
be kept as small as possible. Examples of scenarios where this might be
useful include multiple hosts in a DNS Round Robin entry.
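The multi-staged push can be sketched in bash. This is a simplified illustration, not runmirrors itself: the mirror list is replaced by plain host arguments, push_all and staged_push are hypothetical names, and SSH can point at another command for testing. Each leaf's forced command receives the requested words in SSH_ORIGINAL_COMMAND.

```bash
: "${SSH:=ssh}"

# Push one stage to every host in parallel, then wait for all of them.
push_all() {
    local stage=$1 host pid rc=0
    local pids=()
    shift
    for host in "$@"; do
        # -n: do not read stdin, since the pushes run in the background.
        "$SSH" -n "$host" "sync:$stage" &
        pids+=("$!")
    done
    for pid in "${pids[@]}"; do
        wait "$pid" || rc=1
    done
    return "$rc"
}

# stage2 is only sent after every stage1 push has returned.
staged_push() {
    local rc=0
    push_all stage1 "$@" || rc=1
    push_all stage2 "$@" || rc=1
    return "$rc"
}
```

This sketch still sends stage2 to every host when a stage1 push fails and only reports the failure through its exit status; whether to skip stage2 instead is a policy choice.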

For details on the mirror list please see the documented
runmirrors.mirror.sample file.


[1] http://blog.ganneff.de/blog/2007/12/29/ssh-triggers.html