mdw@git.distorted.org.uk Git - qmail/blob - RFCLOOPS

   1 Tools in the war on mail loops
   2 D. J. Bernstein, djb@pobox.com
   3 19970201
   4
   5
   6 1. Introduction
   7
   8    An automailer means any program that receives a mail message and
   9    automatically sends one or more mail messages. This term is meant to
  10    include not only a mail-based server, such as a mailing list exploder
  11    or a vacation program, but also an SMTP server, which receives a
  12    message from the network and relays it to a local or remote user.
  13
  14    In a network full of automailers, any mistake can cause a mail loop.
  15    Since some automailers generate several outputs in response to a
  16    single input, a loop can produce an exponential explosion of mail.
  17
  18    All the automailers in the qmail package follow a general philosophy
  19    designed to prevent mail loops and limit the damage from any loops
  20    that do occur. These automailers have been repeatedly observed to
  21    fail safe: they stop loops in the face of typical failures by other
  22    hosts. This document explains the philosophy and describes the
  23    automailers.
  24
  25    To some extent the philosophy here simply repeats and amplifies
  26    standard practice as codified in RFC 974 and RFC 1123. Unfortunately,
  27    the standards do not adequately control bounce loops, since they do
  28    not recognize that postmasters want to see double bounces; they do
  29    not adequately control relaying loops; and they do not prevent
  30    cross-host forwarding loops.
  31
  32    Terminology: The mail message received by an automailer is called
  33    input. The mail messages sent by an automailer are called outputs.
  34    For simplicity, this document focuses on the case that the input has
  35    just one envelope recipient.
  36
  37    REMINDER: This document describes the automailers in the qmail
  38    package. Other packages include automailers that do not fit the
  39    descriptions given here.
  40
  41    Beware that the war on mail loops can never be won: any method of
  42    preventing mail loops can be subverted by other hosts. I welcome
  43    further development of techniques that work well in practice.
  44
  45
  46 2. Basics
  47
  48    The output from an automailer is always further down the following
  49    list than the input.
  50
  51       0 hops, <sender> is neither <> nor <#@[]>  normal messages
  52       1 hop, <sender> is neither <> nor <#@[]>
  53       2 hops, <sender> is neither <> nor <#@[]>
  54       etc.
  55       0 hops, <sender> is <>                     bounces
  56       1 hop, <sender> is <>
  57       2 hops, <sender> is <>
  58       etc.
  59       0 hops, <sender> is <#@[]>                 double bounces
  60       1 hop, <sender> is <#@[]>
  61       2 hops, <sender> is <#@[]>
  62       etc.
  63
  64    Here sender means the envelope sender address. Hops means the number
  65    of Received and Delivered-To fields in the header. See sections 3.3
  66    and 3.4 for an explanation of <> and <#@[]>.
  67
  68    Consequently, no automailer ever generates an entirely new normal
  69    message in response to a normal message. If the output is a normal
  70    message, it always has more hops than the input.
  71
  72    When input and output are both normal messages, both bounces, or both
  73    double bounces, the output header is essentially the same as the
  74    input header. However, when an automailer moves from a normal message
  75    to a bounce, or from a bounce to a double bounce, it generates an
  76    entirely new header.
  77
  78    An automailer may refuse to operate if the input has too many hops.
  79    The definition of too many hops depends on the automailer. This
  80    practice is called hop counting. Note that some existing messages
  81    legitimately take as many as 20 hops. One automailer uses a limit of
  82    100 hops; this will be adequate for all messages in the foreseeable
  83    future.
  84
  85    Hop counting is a weapon of last resort. It will, if correctly
  86    implemented, prevent all infinite loops; however, even a finite loop
  87    can do practically infinite damage, as illustrated in section 4.3.
  88
  89
  90 3. Pre-delivery automailers
  91
  92    Conceptually: The input is a message that has not yet reached its
  93    envelope recipient address. It is fed to a relay, which attempts to
  94    deliver the message directly to, or at least closer to, that address;
  95    if the relay fails permanently, the message is fed to a bouncer or a
  96    double-bouncer. Relays, bouncers, and double-bouncers are examples of
  97    pre-delivery automailers.
  98
  99    A pre-delivery automailer produces at most one output.
 100
 101    The basic weapon against pre-delivery mail loops is gravity. A normal
 102    message always moves closer to its envelope recipient, according to a
 103    notion of distance defined in section 3.1. If it bounces before
 104    reaching the recipient, it turns into a bounce message, which always
 105    moves closer to the original envelope sender. If that in turn
 106    bounces, it turns into a double bounce, which always moves closer to
 107    a local postmaster. (Triple bounces do not exist.)
 108
 109
 110 3.1. Distance
 111
 112    The distance from a DNS domain D to a recipient U@R is defined as
 113    follows, when R has an MX list: the minimum preference of D in the
 114    MX list, or 100000 if D does not appear in the list.
 115
 116    When R has no MX records, the distance from R to U@R is defined as 0,
 117    and the distance from any other domain to U@R is defined as 100000.
 118
 119    Exception: If R is an alias, i.e., if R has a CNAME record, the
 120    distance from any domain to U@R is defined as 500000.
 121
 122    The distance from a host H to U@R is defined as the minimum distance
 123    to U@R from any domain that touches H. (``D touches H'' means ``D has
 124    an A record listing one of H's IP addresses.'')
 125
 126    Exception: If H does not accept mail from the network, its distance
 127    to any recipient is defined as 999999.
 128
 129
 130 3.2. Relays
 131
 132    A relay is a pre-delivery automailer that sends the output towards
 133    the envelope recipient. What this means for intra-host relays is not
 134    discussed here. What this means for cross-host relays is the
 135    following: if the relay is at host H, and it sends its output to host
 136    T, then the distance from T to the output envelope recipient is
 137    always smaller than the distance from H to the input envelope
 138    recipient.
 139
 140    The following facts guarantee that certain cross-host relay behavior
 141    is safe. For proofs of these facts, see Appendix A.
 142
 143       Fact 1: If R is an alias for X, X is not an alias, D touches T,
 144       and T accepts mail from the network, then the distance from T to
 145       U@X is smaller than the distance from H to U@R.
 146
 147       Fact 2: If R is not an alias, R has no MX records, H is not
 148       touched by R, T is touched by R, and T accepts mail from the
 149       network, then T is closer to U@R than H is.
 150
 151       Fact 3: If R is not an alias, R has an MX record with domain X and
 152       preference p, H is not touched by any of the domains in the MX
 153       list for R with preference <= p, T is touched by X, and T accepts
 154       mail from the network, then T is closer to U@R than H is.
 155
 156    Also, a host that does not accept mail from the network can relay
 157    messages to a nearby hub.
 158
 159    A relay adds a new Received header field to the top of the output.
 160    Other than this, the output header, body, and envelope are exactly
 161    the same as the input header, body, and envelope. Exception: If the
 162    input envelope recipient is U@R, R is an alias for X, and X is not
 163    an alias, the output envelope recipient is U@X.
 164
 165
 166 3.3. Bouncers
 167
 168    A bouncer is a pre-delivery automailer that lets the envelope sender
 169    know what happened to a message. Most bouncers send failure notices.
 170    Some bouncers, such as vacation servers and echo servers, send
 171    success notices.
 172
 173    In a bouncer's output, the envelope sender is <>, and the envelope
 174    recipient is the input envelope sender. A bouncer refuses to operate
 175    if the input envelope sender is <> or <#@[]>.
 176
 177    Some mailers on the Internet do not understand the <> convention. In
 178    fact, some mailers will rewrite <> as <@host>. So any message with an
 179    envelope recipient of <> or <@host> is discarded upon local delivery.
 180
 181    Unlike a relay, a bouncer produces output with a new header, not
 182    simply a copy of the input header. For example:
 183
 184       (envelope) from <> to <djb@silverton.berkeley.edu>
 185       Date: 2 Jan 1996 03:38:25 GMT
 186       From: DELIVERY NOTICE SYSTEM <MAILER-DAEMON@heaven.af.mil>
 187       To: djb@silverton.berkeley.edu
 188       Subject: failure notice
 189
 190    However, the body of the bounce indicates the relevant input envelope
 191    recipient, as well as the Message-ID of the input, if the input had a
 192    Message-ID. The body of a failure notice includes a copy of the
 193    entire input message.
 194
 195
 196 3.4. Double-bouncers
 197
 198    A double-bouncer is a pre-delivery automailer that informs a local
 199    postmaster of permanent failures to deliver bounce messages. Such
 200    failures are generally caused by poorly configured hosts that produce
 201    normal messages with faulty envelope sender addresses.
 202
 203    A double-bouncer refuses to operate unless the input envelope sender
 204    is <>. The output envelope sender from a double-bouncer is <#@[]>;
 205    note that <#@[]> cannot be used as an SMTP envelope sender under
 206    RFC 821. The output envelope recipient is predetermined.
 207
 208    Note that double bounces are not suggested by RFC 1123. However,
 209    faulty envelope sender addresses are usually configuration errors
 210    that can and should be fixed. Some postmasters, faced with mail
 211    software that throws away double bounces, resort to keeping copies of
 212    all bounces; but single bounces are rarely the postmaster's problem.
 213
 214
 215 4. Post-delivery automailers
 216
 217    Conceptually: The input is a message that has reached its envelope
 218    recipient address. It is fed to a post-delivery automailer at that
 219    address.
 220
 221    The basic weapon against post-delivery loops is a new header field,
 222    Delivered-To, tracing all the forwarders and mailing lists that a
 223    message has been through. This field has the side benefit of making
 224    it much easier for a user (or for a postmaster seeing a bounce) to
 225    figure out the path that the message took. Delivered-To is similar to
 226    RFC 1327's DL-Expansion-History, but (1) it omits the time stamp,
 227    removing any need for parsing, and (2) it has a much better name.
 228
 229
 230 4.1. Exploders and repliers
 231
 232    There are two basic types of post-delivery automailers: exploders,
 233    where the output envelope recipients are predetermined; and repliers,
 234    where there is just one output, with envelope recipient determined
 235    from the input.
 236
 237    Repliers normally determine the output envelope recipient as either
 238    the input Reply-To header field, if it exists; or else the input
 239    From header field, if it exists; or else the envelope sender. A
 240    replier never produces an output to <> or <#@[]>.
 241
 242    Exploders are classified into mailing lists, where the output
 243    envelope senders are predetermined, and forwarders, where every
 244    output has envelope sender equal to the original envelope sender.
 245
 246    Exception: if the input envelope sender is <> or <#@[]>, then the
 247    output envelope senders are equal to the input envelope sender, even
 248    for a mailing list.
 249
 250    Note that, if the envelope sender of a mailing list with M bad
 251    addresses is another exploder with E bad addresses, the local
 252    postmaster will receive EM double bounces for each message to the
 253    mailing list.
 254
 255
 256 4.2. Delivered-To
 257
 258    Every post-delivery automailer adds a new Delivered-To header field
 259    to the top of each output.
 260
 261    The contents of the Delivered-To field are typically the address of
 262    the automailer, i.e., the input envelope recipient, conventionally
 263    without any quoting. The contents of the Delivered-To field are in
 264    any case entirely predetermined. The automailer checks if exactly the
 265    same Delivered-To field already appears in the header; if so, it
 266    refuses to operate.
 267
 268    A post-delivery automailer preserves existing Delivered-To and
 269    Received fields. In fact, a post-delivery automailer generally
 270    preserves all header fields. The exceptions are limited to known
 271    fields that are not used for loop detection and that must be removed
 272    for correct operation. For example, a replier generally changes the
 273    body of a message and thus should not preserve the SVR4
 274    Content-Length field.
 275
 276
 277 4.3. An example
 278
 279    Aliases and mailing lists are highly dangerous, because they can
 280    generate several outputs for each input.
 281
 282    Here is an extreme example. A user has three accounts, and wants any
 283    message to any of the accounts to be delivered to all three. So he
 284    forwards luser@host1 to luser@host2 and luser@host3, forwards
 285    luser@host2 to luser@host1 and luser@host3, and forwards luser@host3
 286    to luser@host1 and luser@host2.
 287
 288    Without Delivered-To, someone who sends a message to luser@host1 will
 289    receive a practically infinite series of bounces. For example, with a
 290    hop count limit of 50, the sender will receive 1125899906842624
 291    bounces.
 292
 293    If all the hosts, or two out of the three, support Delivered-To, the
 294    message will bounce just a few times. If just one of the hosts
 295    supports Delivered-To, it will be the unfortunate victim of a loop
 296    between the other two hosts---although the total number of bounces
 297    will drop from practically infinite down to a few hundred, with
 298    typical hop count limits.
 299
 300
 301 Appendix A. Proofs of correctness for MX handling
 302
 303    Section 3.2 states three facts about the notion of distance defined
 304    in section 3.1. Here are mathematical proofs of those facts.
 305
 306    Symbols: D, E, R, and X are domains; H and T are hosts; p and q are
 307    nonnegative integers. {} is the empty set.
 308
 309    Hypotheses: M(R), the ``MX list for R,'' is a set of pairs (p,D)
 310    where p <= 65535. There is a set A of domains, called ``aliases.''
 311    There is a relation D->H, called ``D touches H.'' There is a set N of
 312    hosts, called ``hosts that accept mail from the network.''
 313
 314    Definitions: m(D,R) = min { p: p = 100000 or (p,D) in M(R) } when
 315    M(R) is nonempty. When M(R) is empty, m(D,R) is 0 if D = R, 100000
 316    otherwise. f(D,R) is defined as 500000 if R is in A, m(D,R)
 317    otherwise; this is the ``distance from D to U@R,'' for any U. g(H,R)
 318    is defined as min { f(D,R): D->H } if H is in N, 999999 otherwise;
 319    this is the ``distance from H to U@R,'' for any U.
 320
 321    Fact 1 (generalized): If R is in A, X is not in A, D->T, and T is in
 322    N, then g(T,X) < g(H,R). Proof: R is in A, so f(E,R) = 500000 for any
 323    E; thus g(H,R) >= 500000. X is not in A, so f(D,X) = m(D,X) <=
 324    100000; hence g(T,X) <= f(D,X) <= 100000 < g(H,R).
 325
 326    Fact 2: If R is not in A, M(R) = {}, R->T, T is in N, and not R->H,
 327    then g(T,R) < g(H,R). Proof: f(R,R) = m(R,R) = 0 since R is not in A
 328    and M(R) = {}. T is in N so g(T,R) <= f(R,R) = 0 so g(T,R) = 0.
 329    Suppose that g(H,R) <= g(T,R). Then g(H,R) = 0, so f(D,R) = 0 for
 330    some D with D->H, so m(D,R) = 0. But then D = R by definition of m,
 331    so R->H. Contradiction. Thus g(T,R) < g(H,R).
 332
 333    Fact 3: If R is not in A, (p,X) is in M(R), X->T, T is in N, and
 334    (q,D) is not in M(R) whenever D->H and q <= p, then g(T,R) < g(H,R).
 335    Proof: First m(X,R) <= p. R is not in A, so f(X,R) = m(X,R). T is in
 336    N, so g(T,R) <= f(X,R). Thus g(T,R) <= p. Suppose that g(H,R) <= p.
 337    Then f(D,R) <= p for some D with D->H, so m(D,R) <= p. But then
 338    (m(D,R),D) is in M(R). Contradiction. Thus g(T,R) <= p < g(H,R).