| 1 | * Design of new, multi-subnet secnet protocol |
| 2 | |
| 3 | Like the first (1995/6) version, we're tunnelling IP packets inside |
| 4 | UDP packets. To defeat various restrictions which may be imposed on us |
| 5 | by network providers (like the prohibition of incoming TCP |
| 6 | connections) we're sticking with UDP for everything this time, |
| 7 | including key setup. This means we have to handle retries, etc. |
| 8 | |
| 9 | Other new features include being able to deal with subnets hidden |
| 10 | behind changing 'real' IP addresses, and the ability to choose |
| 11 | algorithms and keys per pair of communicating sites. |
| 12 | |
| 13 | ** Configuration and structure |
| 14 | |
| 15 | [The original plan] |
| 16 | |
| 17 | The network is made up from a number of 'sites'. These are collections |
| 18 | of machines with private IP addresses. The new secnet code runs on |
| 19 | machines which have interfaces on the private site network and some |
| 20 | way of accessing the 'real' internet. |
| 21 | |
| 22 | Each end of a tunnel is identified by a name. Often it will be |
| 23 | convenient for every gateway machine to use the same name for each |
| 24 | tunnel endpoint, but this is not vital. Individual tunnels are |
| 25 | identified by their two endpoint names. |
| 26 | |
| 27 | [The new plan] |
| 28 | |
| 29 | It appears that people want to be able to use secnet on mobile |
| 30 | machines like laptops as well as to interconnect sites. In particular, |
| 31 | they want to be able to use their laptop in three situations: |
| 32 | |
| 33 | 1) connected to their internal LAN by a cable; no tunnel involved |
| 34 | 2) connected via wireless, using a tunnel to protect traffic |
| 35 | 3) connected to some other network, using a tunnel to access the |
| 36 | internal LAN. |
| 37 | |
| 38 | They want the laptop to keep the same IP address all the time. |
| 39 | |
| 40 | Case (1) is simple. |
| 41 | |
| 42 | Case (2) requires that the laptop run a copy of secnet, and have a |
| 43 | tunnel configured between it and the main internal LAN default |
| 44 | gateway. secnet must support the concept of a 'soft' tunnel where it |
| 45 | adds a route and causes the gateway to do proxy-ARP when the tunnel is |
| 46 | up, and removes the route again when the tunnel is down. |
| 47 | |
| 48 | The usual prohibition of packets coming in from one tunnel and going |
| 49 | out another must be relaxed in this case (in particular, the |
| 50 | destination address of packets from these 'mobile station' tunnels may |
| 51 | be another tunnel as well as the host). |
| 52 | |
| 53 | (Quick sanity check: if chiark's secnet address was in |
| 54 | 192.168.73.0/24, would this work properly? Yes, because there will be |
| 55 | an explicit route to it, and proxy ARP will be done for it. Do we want |
| 56 | packets from the chiark tunnel to be able to go out along other |
| 57 | routes? No. So, spotting a 'local' address in a remote site's list of |
| 58 | networks isn't sufficient to switch on routing for a site. We need an |
| 59 | explicit option. NB packets may be routed if the source OR the |
| 60 | destination is marked as allowing routing [otherwise packets couldn't |
| 61 | get back from eg. chiark to a laptop at greenend]). |
| 62 | |
| 63 | [the even newer plan] |
| 64 | |
| 65 | secnet sites are configured to grant access to particular IP address |
| 66 | ranges to the holder of a particular public key. The key can certify |
| 67 | other keys, which will then be permitted to use a subrange of the IP |
| 68 | address range of the certifying key. |
| 69 | |
| 70 | This means that secnet won't know in advance (i.e. at configuration |
| 71 | time) how many tunnels it might be required to support, so we have to |
| 72 | be able to create them (and routes, and so on) on the fly. |
| 73 | |
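The subrange rule above can be sketched as follows. This is a minimal illustration, not secnet's implementation: the function name `may_certify` is hypothetical, and treating an identical range as an acceptable "subrange" is an assumption.

```python
import ipaddress

def may_certify(parent_ranges, child_range):
    """Return True if child_range lies within one of parent_ranges.

    parent_ranges: ranges granted to the certifying key (CIDR strings).
    child_range:   range claimed by the certified key (CIDR string).
    """
    child = ipaddress.ip_network(child_range)
    for parent in map(ipaddress.ip_network, parent_ranges):
        # subnet_of() raises TypeError on mixed IPv4/IPv6, so check first.
        if child.version == parent.version and child.subnet_of(parent):
            return True
    return False
```

For example, a key granted 192.168.0.0/16 could certify another key for 192.168.73.0/24, but not for 10.0.0.0/8.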
| 74 | ** VPN-level configuration |
| 75 | |
| 76 | At a high level we just want to be able to indicate which groups of |
| 77 | users can claim ownership of which ranges of IP addresses. Assuming |
| 78 | these users (or their representatives) all have accounts on a single |
| 79 | machine, we can automate the submission of keys and other information |
| 80 | to make up a 'sites' file for the entire VPN. |
| 81 | |
| 82 | The distributed 'sites' file should be in a more restricted format |
| 83 | than the secnet configuration file, to prevent attackers who manage to |
| 84 | distribute bogus sites files from taking over their victims' machines. |
| 85 | |
| 86 | The distributed 'sites' file is read one line at a time. Each line |
| 87 | consists of a keyword followed by other information. It defines a |
| 88 | number of VPNs; within each VPN it defines a number of locations; |
| 89 | within each location it defines a number of sites. These VPNs, |
| 90 | locations and sites are turned into a secnet.conf file fragment using |
| 91 | a script. |
| 92 | |
| 93 | Some keywords are valid at any 'level' of the distributed 'sites' |
| 94 | file, indicating defaults. |
| 95 | |
| 96 | The keywords are: |
| 97 | |
| 98 | vpn n: we are now declaring information to do with VPN 'n'. Must come first. |
| 99 | |
| 100 | location n: we are now declaring information for location 'n'. |
| 101 | |
| 102 | site n: we are now declaring information for site 'n'. |
| 103 | endsite: we're finished declaring information for the current site |
| 104 | |
| 105 | restrict-nets a b c ...: restrict the allowable 'networks' for the current |
| 106 | level to those in this list. |
| 107 | end-definitions: prevent definition of further vpns and locations, and |
| 108 | modification of defaults at VPN level |
| 109 | |
| 110 | dh x y: the current VPN uses the specified group; x=modulus, y=generator |
| 111 | |
| 112 | hash x: which hash function to use. Valid options are 'md5' and 'sha1'. |
| 113 | |
| 114 | admin n: administrator email address for current level |
| 115 | |
| 116 | key-lifetime n |
| 117 | setup-retries n |
| 118 | setup-timeout n |
| 119 | wait-time n |
| 120 | renegotiate-time n |
| 121 | |
| 122 | address a b: a=dnsname, b=port |
| 123 | networks a b c ... |
| 124 | pubkey x y z: x=keylen, y=encryption key, z=modulus |
| 125 | mobile: declare this to be a 'mobile' site |
| 126 | |
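The line-at-a-time format described above might be read along these lines. This is only a sketch: the function `parse_sites`, the level tracking, and the sample names, address and port are all hypothetical, and the real conversion script may validate far more (argument counts, `restrict-nets`, `end-definitions`, and so on).

```python
KEYWORDS = {
    "vpn", "location", "site", "endsite", "restrict-nets",
    "end-definitions", "dh", "hash", "admin", "key-lifetime",
    "setup-retries", "setup-timeout", "wait-time", "renegotiate-time",
    "address", "networks", "pubkey", "mobile",
}

def parse_sites(text):
    """Return a list of (level, keyword, args) tuples, one per non-blank line."""
    entries = []
    level = None  # becomes "vpn", "location" or "site"
    for lineno, line in enumerate(text.splitlines(), start=1):
        words = line.split()
        if not words:
            continue
        keyword, args = words[0], words[1:]
        if keyword not in KEYWORDS:
            raise ValueError(f"line {lineno}: unknown keyword {keyword!r}")
        if level is None and keyword != "vpn":
            raise ValueError(f"line {lineno}: 'vpn' must come first")
        if keyword == "endsite" and level != "site":
            raise ValueError(f"line {lineno}: 'endsite' outside a site")
        if keyword in ("vpn", "location", "site"):
            level = keyword
        entries.append((level, keyword, args))
        if keyword == "endsite":
            level = "location"  # back to the enclosing location
    return entries

# Hypothetical input; none of these values come from a real deployment.
SAMPLE = """\
vpn example
location somewhere
site gateway
networks 192.168.73.0/24
address gw.example.org 410
endsite
"""
```

Calling `parse_sites(SAMPLE)` yields one tuple per line, each tagged with the level it was declared at; an unknown keyword or a file that does not start with `vpn` raises `ValueError`.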
| 127 | ** Logging etc. |
| 128 | |
| 129 | There are several possible ways of running secnet: |
| 130 | |
| 131 | 'reporting' only: --version, --help, etc. command line options and the |
| 132 | --just-check-config mode. |
| 133 | |
| 134 | 'normal' run: perform setup in the foreground, and then background. |
| 135 | |
| 136 | 'failed' run: setup in the foreground, and terminate with an error |
| 137 | before going to background. |
| 138 | |
| 139 | 'reporting' modes should never output anything except to stdout/stderr. |
| 140 | 'normal' and 'failed' runs output to stdout/stderr before |
| 141 | backgrounding, and thereafter output only to log destinations. |
| 142 | |
| 143 | ** Protocols |
| 144 | |
| 145 | *** Protocol environment: |
| 146 | |
| 147 | Each gateway machine serves a particular, well-known set of private IP |
| 148 | addresses (i.e. the agreement over which addresses it serves is |
| 149 | outside the scope of this discussion). Each gateway machine has an IP |
| 150 | address on the interconnecting network (usually the Internet), which |
| 151 | may be dynamically allocated and may change at any point. |
| 152 | |
| 153 | Each gateway knows the RSA public keys of the other gateways with |
| 154 | which it wishes to communicate. The mechanism by which this happens is |
| 155 | outside the scope of this discussion. There exists a means by which |
| 156 | each gateway can look up the probable IP address of any other. |
| 157 | |
| 158 | *** Protocol goals: |
| 159 | |
| 160 | The ultimate goal of the protocol is for the originating gateway |
| 161 | machine to be able to forward packets from its section of the private |
| 162 | network to the appropriate gateway machine for the destination |
| 163 | machine, in such a way that it can be sure that the packets are being |
| 164 | sent to the correct destination machine, the destination machine can |
| 165 | be sure that the source of the packets is the originating gateway |
| 166 | machine, and the contents of the packets cannot be understood other |
| 167 | than by the two communicating gateways. |
| 168 | |
| 169 | XXX not sure about the address-change stuff; leave it out of the first |
| 170 | version of the protocol. From experience, IP addresses seem to be |
| 171 | quite stable so the feature doesn't gain us much. |
| 172 | |
| 173 | **** Protocol sub-goal 1: establish a shared key |
| 174 | |
| 175 | Definitions: |
| 176 | |
| 177 | A is the originating gateway machine name |
| 178 | B is the destination gateway machine name |
| 179 | A+ and B+ are the names with optional additional data, see below |
| 180 | PK_A is the public RSA key of A |
| 181 | PK_B is the public RSA key of B |
| 182 | PK_A^-1 is the private RSA key of A |
| 183 | PK_B^-1 is the private RSA key of B |
| 184 | x is the fresh private DH key of A |
| 185 | y is the fresh private DH key of B |
| 186 | k is g^(xy) mod m |
| 187 | g and m are generator and modulus for Diffie-Hellman |
| 188 | nA is a nonce generated by A |
| 189 | nB is a nonce generated by B |
| 190 | iA is an index generated by A, to be used in packets sent from B to A |
| 191 | iB is an index generated by B, to be used in packets sent from A to B |
| 192 | i? is appropriate index for receiver |
| 193 | |
| 194 | Note that 'i' may be re-used from one session to the next, whereas 'n' |
| 195 | is always fresh. |
| 196 | |
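As a worked illustration of the definitions of x, y and k above, the following sketch shows both ends deriving the same k from each other's public value. The helper names are hypothetical, and the toy parameters (m=23, g=5) are assumptions chosen for readability; they are far too small for real use.

```python
import secrets

def dh_public(g, private, m):
    """Public value sent on the wire: g^private mod m."""
    return pow(g, private, m)

def dh_shared(peer_public, private, m):
    """Shared key k: (peer's public value)^private mod m."""
    return pow(peer_public, private, m)

# Toy group, for illustration only.
M, G = 23, 5

x = secrets.randbelow(M - 2) + 1   # A's fresh private key, in [1, M-2]
y = secrets.randbelow(M - 2) + 1   # B's fresh private key, in [1, M-2]

k_at_a = dh_shared(dh_public(G, y, M), x, M)  # A combines g^y with x
k_at_b = dh_shared(dh_public(G, x, M), y, M)  # B combines g^x with y
assert k_at_a == k_at_b  # both equal g^(xy) mod m
```

Only g^x mod m and g^y mod m cross the network; x and y never leave their respective gateways.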
| 197 | The optional additional data after the sender's name consists of some |
| 198 | initial subset of the following list of items: |
| 199 | * A 32-bit integer with a set of capability flags, representing the |
| 200 | abilities of the sender. |
| 201 | * In MSG3/MSG4: a 16-bit integer being the sender's MTU, or zero. |
| 202 | (In other messages: nothing.) See below. |
| 203 | * More data which is yet to be defined and which must be ignored |
| 204 | by receivers. |
| 205 | The optional additional data after the receiver's name is not |
| 206 | currently used. If any is seen, it must be ignored. |
| 207 | |
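A receiver might decode the sender's additional data roughly as follows. This is a sketch under assumptions the text above does not state: integers are taken to be big-endian (network byte order), a truncated trailing field is treated as absent, and the function name is hypothetical.

```python
import struct

def parse_additional_data(data, in_msg3_or_msg4):
    """Return (capability_flags, mtu) from the sender's additional data.

    Missing items default to 0. Any bytes after the known items are
    ignored, as required for data which is yet to be defined.
    """
    caps = 0
    mtu = 0
    offset = 0
    if len(data) >= offset + 4:
        (caps,) = struct.unpack_from("!I", data, offset)  # 32-bit flags
        offset += 4
        # The MTU item is only present in MSG3/MSG4.
        if in_msg3_or_msg4 and len(data) >= offset + 2:
            (mtu,) = struct.unpack_from("!H", data, offset)  # 16-bit MTU
            offset += 2
    return caps, mtu
```

Because the items form an initial subset of the list, the MTU can only be present if the capability flags are too, which is why the second read is nested inside the first.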
| 208 | Capability flag bits must be in one of the following two categories: |
| 209 | |
| 210 | 1. Early capability flags must be advertised in MSG1 or MSG2, as |
| 211 | applicable. If MSG3 or MSG4 advertise any "early" capability bits, |
| 212 | MSG1 or MSG2 (as applicable) must have advertised them too. Sadly, |
| 213 | advertising an early capability flag will produce MSG1s which are |
| 214 | not understood by versions of secnet which predate the capability |
| 215 | mechanism. |
| 216 | |
| 217 | 2. Late capability flags are advertised in MSG2 or MSG3, as |
| 218 | applicable. They may also appear in MSG1, but this is not |
| 219 | guaranteed. MSG4 must advertise the same set as MSG2. |
| 220 | |
| 221 | No capability flags are currently defined. Unknown capability flags |
| 222 | should be treated as late ones. |
| 223 | |
| 224 | |
| 225 | MTU handling |
| 226 | |
| 227 | In older versions of secnet, secnet was not capable of fragmentation |
| 228 | or sending ICMP Frag Needed. Administrators were expected to configure |
| 229 | consistent MTUs across the network. |
| 230 | |
| 231 | It is still the case in the current version that the MTUs need to be |
| 232 | configured reasonably coherently across the network: the allocated |
| 233 | buffer sizes must be sufficient to cope with packets from all other |
| 234 | peers. |
| 235 | |
| 236 | However, provided the buffers are sufficient, all packets will be |
| 237 | processed properly: a secnet receiving a packet larger than the |
| 238 | applicable MTU for its delivery will either fragment it, or reject it |
| 239 | with ICMP Frag Needed. |
| 240 | |
| 241 | The MTU additional data field allows secnet to advertise an MTU to the |
| 242 | peer. This allows the sending end to handle overlarge packets, before |
| 243 | they are transmitted across the underlying public network. This can |
| 244 | therefore be used to work around underlying network braindamage |
| 245 | affecting large packets. |
| 246 | |
| 247 | If the MTU additional data field is zero or not present, then the peer |
| 248 | should use locally-configured MTU information (normally, its local |
| 249 | netlink MTU) instead. |
| 250 | |
| 251 | If it is nonzero, the peer may send packets up to the advertised size |
| 252 | (and if that size is bigger than the peer's administratively |
| 253 | configured size, the advertiser promises that its buffers can handle |
| 254 | such a large packet). |
| 255 | |
| 256 | A secnet instance should not assume that just because it has |
| 257 | advertised an MTU which is lower than usual for the VPN, the peer will |
| 258 | honour it, unless the administrator knows that the peers are |
| 259 | sufficiently modern to understand the MTU advertisement option. So |
| 260 | secnet will still accept packets which exceed the link MTU (whether |
| 261 | negotiated or assumed). |
| 262 | |
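The sender-side choice of MTU described above amounts to the following. The function name and the convention of passing None for "field not present" are assumptions for illustration.

```python
def outgoing_mtu(advertised_by_peer, local_mtu):
    """Largest packet we may send to a peer.

    advertised_by_peer: MTU from the peer's additional data,
                        or 0/None if zero or not present.
    local_mtu:          locally configured MTU (normally the netlink MTU).
    """
    if advertised_by_peer:
        # Nonzero advertisement: the peer promises its buffers can cope,
        # even if this exceeds our administratively configured size.
        return advertised_by_peer
    return local_mtu
```

With a local MTU of 1500, a peer advertising 1400 limits outgoing packets to 1400, a peer advertising 9000 permits packets up to 9000, and a peer advertising nothing leaves the limit at 1500. The receiving side must still accept larger packets, as noted above.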
| 263 | |
| 264 | Messages: |
| 265 | |
| 266 | 1) A->B: *,iA,msg1,A+,B+,nA |
| 267 | |
| 268 | The '*' index must be encoded as 0. (However, it is permitted for a site to use |
| 269 | zero as its "index" for another site.) |
| 270 | |
| 271 | 2) B->A: iA,iB,msg2,B+,A+,nB,nA |
| 272 | |
| 273 | (The order of B and A reverses in alternate messages so that the same |
| 274 | code can be used to construct them...) |
| 275 | |
| 276 | 3) A->B: {iB,iA,msg3,A+,B+,[chosen-transform],nA,nB,g^x mod m}_PK_A^-1 |
| 277 | |
| 278 | If message 1 was a replay then A will not generate message 3, because |
| 279 | it doesn't recognise nA. |
| 280 | |
| 281 | If message 2 was from an attacker then B will not generate message 4, |
| 282 | because it doesn't recognise nB. |
| 283 | |
| 284 | 4) B->A: {iA,iB,msg4,B+,A+,nB,nA,g^y mod m}_PK_B^-1 |
| 285 | |
| 286 | At this point, A and B share a key, k. B must keep retransmitting |
| 287 | message 4 until it receives a packet encrypted using key k. |
| 288 | |
| 289 | 5) A->B: iB,iA,msg5,(ping/msg5)_k |
| 290 | |
| 291 | 6) B->A: iA,iB,msg6,(pong/msg6)_k |
| 292 | |
| 293 | (Note that these are encrypted using the same transform that's used |
| 294 | for normal traffic, so they include sequence number, MAC, etc.) |
| 295 | |
| 296 | The ping and pong messages can be used by either end of the tunnel at |
| 297 | any time, but using msg0 as the unencrypted message type indicator. |
| 298 | |
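The nonce check that stops A from answering a replayed or forged message 2 can be sketched like this. The class, its method names and the 8-byte nonce length are assumptions for illustration; the text above only requires that nA is fresh and that A recognises it when it is echoed back.

```python
import hmac
import secrets

NONCE_LEN = 8  # assumed length, not specified in the text above

class Initiator:
    """Tracks the nonce A sent in message 1."""

    def __init__(self):
        self.nonce = None  # no key exchange in progress

    def make_msg1_nonce(self):
        # 'n' is always fresh: generate a new one for every exchange.
        self.nonce = secrets.token_bytes(NONCE_LEN)
        return self.nonce

    def accept_msg2(self, echoed_nonce):
        """True only if message 2 echoes the nonce we actually sent."""
        if self.nonce is None:
            return False
        return hmac.compare_digest(self.nonce, echoed_nonce)
```

If `accept_msg2` returns False, A simply does not generate message 3. B applies the same idea to nB before generating message 4.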
| 299 | **** Protocol sub-goal 2: end the use of a shared key |
| 300 | |
| 301 | 7) i?,i?,msg0,(end-session/msg7,A,B)_k |
| 302 | |
| 303 | This message can be sent by either party. Once sent, k can be |
| 304 | forgotten. Once received and checked, k can be forgotten. No need to |
| 305 | retransmit or confirm reception. It is suggested that this message be |
| 306 | sent when a key times out, or the tunnel is forcibly terminated for |
| 307 | some reason. |
| 308 | |
| 309 | **** Protocol sub-goal 3: send a packet |
| 310 | |
| 311 | 8) i?,i?,msg0,(send-packet/msg9,packet)_k |
| 312 | |
| 313 | **** Other messages |
| 314 | |
| 315 | 9) i?,i?,NAK (NAK is encoded as zero) |
| 316 | |
| 317 | If the link-layer can't work out what to do with a packet (session has |
| 318 | gone away, etc.) it can transmit a NAK back to the sender. |
| 319 | |
| 320 | This can alert the sender to the situation where the sender has a key |
| 321 | but the receiver doesn't (eg because it has been restarted). The |
| 322 | sender, on receiving the NAK, will try to initiate a key exchange. |
| 323 | |
| 324 | Forged (or overly delayed) NAKs can cause wasted resources due to |
| 325 | spurious key exchange initiation, but there is a limit on this because |
| 326 | of the key exchange retry timeout. |
| 327 | |
| 328 | 10) i?,i?,msg8,A,B,nA,nB,msg? |
| 329 | |
| 330 | This is an obsolete form of NAK packet which is not sent by any even |
| 331 | vaguely recent version of secnet. (In fact, there is no evidence in |
| 332 | the git history of it ever being sent.) |
| 333 | |
| 334 | This message number is reserved. |
| 335 | |
| 336 | 11) *,*,PROD,A,B |
| 337 | |
| 338 | Sent in response to a NAK from B to A. Requests that B initiates a |
| 339 | key exchange with A, if B is willing and lacks a transport key for A. |
| 340 | (If B doesn't have A's address configured, implicitly supplies A's |
| 341 | public address.) |
| 342 | |
| 343 | This is necessary because if one end of a link (B) is restarted while |
| 344 | a key exchange is in progress, the following bad state can persist: |
| 345 | the non-restarted end (A) thinks that the key is still valid and keeps |
| 346 | sending packets, but B either doesn't realise that a key exchange with |
| 347 | A is necessary or (if A is a mobile site) doesn't know A's public IP |
| 348 | address. |
| 349 | |
| 350 | Normally in these circumstances B would send NAKs to A, causing A to |
| 351 | initiate a key exchange. However if A and B were already in the |
| 352 | middle of a key exchange then A will not want to try another one until |
| 353 | the first one has timed out ("setup-timeout" x "setup-retries") and then |
| 354 | the key exchange retry timeout ("wait-time") has elapsed. |
| 355 | |
| 356 | However if B's setup has timed out, B would be willing to participate |
| 357 | in a key exchange initiated by A, if A could be induced to do so. |
| 358 | This is the purpose of the PROD packet. |
| 359 | |
| 360 | We send no more PRODs than we would want to send data packets, to |
| 361 | avoid a traffic amplification attack. We also send them only in state |
| 362 | WAIT, as in other states we wouldn't respond favourably. And we only |
| 363 | honour them if we don't already have a key. |
| 364 | |
| 365 | With PROD, the period of broken communication due to a key exchange |
| 366 | interrupted by a restart is limited to the key exchange total |
| 367 | retransmission timeout, rather than also including the key exchange |
| 368 | retry timeout. |
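
The effect of PROD on the outage window can be put as simple arithmetic. This is a rough upper-bound sketch following the description above; the function name is hypothetical and the figures in the example are made-up values, not secnet defaults.

```python
def worst_case_outage(setup_timeout, setup_retries, wait_time, with_prod):
    """Upper bound on broken communication after an interrupted key exchange.

    All times are in the same unit (for example milliseconds).
    """
    # Key exchange total retransmission timeout.
    total_setup = setup_timeout * setup_retries
    if with_prod:
        return total_setup
    # Without PROD, the key exchange retry timeout is added on top.
    return total_setup + wait_time
```

With a hypothetical setup-timeout of 2000, setup-retries of 5 and wait-time of 20000, the bound is 10000 with PROD and 30000 without it.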