+++ /dev/null
-Tools in the war on mail loops
-D. J. Bernstein, djb@pobox.com
-19970201
-
-
-1. Introduction
-
- An automailer means any program that receives a mail message and
- automatically sends one or more mail messages. This term is meant to
- include not only a mail-based server, such as a mailing list exploder
- or a vacation program, but also an SMTP server, which receives a
- message from the network and relays it to a local or remote user.
-
- In a network full of automailers, any mistake can cause a mail loop.
- Since some automailers generate several outputs in response to a
- single input, a loop can produce an exponential explosion of mail.
-
- All the automailers in the qmail package follow a general philosophy
- designed to prevent mail loops and limit the damage from any loops
- that do occur. These automailers have been repeatedly observed to
- fail safe: they stop loops in the face of typical failures by other
- hosts. This document explains the philosophy and describes the
- automailers.
-
- To some extent the philosophy here simply repeats and amplifies
- standard practice as codified in RFC 974 and RFC 1123. Unfortunately,
- the standards do not adequately control bounce loops, since they do
- not recognize that postmasters want to see double bounces; they do
- not adequately control relaying loops; and they do not prevent
- cross-host forwarding loops.
-
- Terminology: The mail message received by an automailer is called
- input. The mail messages sent by an automailer are called outputs.
- For simplicity, this document focuses on the case that the input has
- just one envelope recipient.
-
- REMINDER: This document describes the automailers in the qmail
- package. Other packages include automailers that do not fit the
- descriptions given here.
-
- Beware that the war on mail loops can never be won: any method of
- preventing mail loops can be subverted by other hosts. I welcome
- further development of techniques that work well in practice.
-
-
-2. Basics
-
- The output from an automailer is always further down the following
- list than the input.
-
- 0 hops, <sender> is neither <> nor <#@[]> normal messages
- 1 hop, <sender> is neither <> nor <#@[]>
- 2 hops, <sender> is neither <> nor <#@[]>
- etc.
- 0 hops, <sender> is <> bounces
- 1 hop, <sender> is <>
- 2 hops, <sender> is <>
- etc.
- 0 hops, <sender> is <#@[]> double bounces
- 1 hop, <sender> is <#@[]>
- 2 hops, <sender> is <#@[]>
- etc.
-
- Here sender means the envelope sender address. Hops means the number
- of Received and Delivered-To fields in the header. See sections 3.3
- and 3.4 for an explanation of <> and <#@[]>.
-
- Consequently, no automailer ever generates an entirely new normal
- message in response to a normal message. If the output is a normal
- message, it always has more hops than the input.
-
- When input and output are both normal messages, both bounces, or both
- double bounces, the output header is essentially the same as the
- input header. However, when an automailer moves from a normal message
- to a bounce, or from a bounce to a double bounce, it generates an
- entirely new header.
-
- An automailer may refuse to operate if the input has too many hops.
- The definition of too many hops depends on the automailer. This
- practice is called hop counting. Note that some existing messages
- legitimately take as many as 20 hops. One automailer uses a limit of
- 100 hops; this will be adequate for all messages in the foreseeable
- future.
-
- Hop counting is a weapon of last resort. It will, if correctly
- implemented, prevent all infinite loops; however, even a finite loop
- can do practically infinite damage, as illustrated in section 4.3.
-
-
-3. Pre-delivery automailers
-
- Conceptually: The input is a message that has not yet reached its
- envelope recipient address. It is fed to a relay, which attempts to
- deliver the message directly to, or at least closer to, that address;
- if the relay fails permanently, the message is fed to a bouncer or a
- double-bouncer. Relays, bouncers, and double-bouncers are examples of
- pre-delivery automailers.
-
- A pre-delivery automailer produces at most one output.
-
- The basic weapon against pre-delivery mail loops is gravity. A normal
- message always moves closer to its envelope recipient, according to a
- notion of distance defined in section 3.1. If it bounces before
- reaching the recipient, it turns into a bounce message, which always
- moves closer to the original envelope sender. If that in turn
- bounces, it turns into a double bounce, which always moves closer to
- a local postmaster. (Triple bounces do not exist.)
-
-
-3.1. Distance
-
- The distance from a DNS domain D to a recipient U@R is defined as
- follows, when R has an MX list: the minimum preference of D in the
- MX list, or 100000 if D does not appear in the list.
-
- When R has no MX records, the distance from R to U@R is defined as 0,
- and the distance from any other domain to U@R is defined as 100000.
-
- Exception: If R is an alias, i.e., if R has a CNAME record, the
- distance from any domain to U@R is defined as 500000.
-
- The distance from a host H to U@R is defined as the minimum distance
- to U@R from any domain that touches H. (``D touches H'' means ``D has
- an A record listing one of H's IP addresses.'')
-
- Exception: If H does not accept mail from the network, its distance
- to any recipient is defined as 999999.
-
-
-3.2. Relays
-
- A relay is a pre-delivery automailer that sends the output towards
- the envelope recipient. What this means for intra-host relays is not
- discussed here. What this means for cross-host relays is the
- following: if the relay is at host H, and it sends its output to host
- T, then the distance from T to the output envelope recipient is
- always smaller than the distance from H to the input envelope
- recipient.
-
- The following facts guarantee that certain cross-host relay behavior
- is safe. For proofs of these facts, see Appendix A.
-
- Fact 1: If R is an alias for X, X is not an alias, D touches T,
- and T accepts mail from the network, then the distance from T to
- U@X is smaller than the distance from H to U@R.
-
- Fact 2: If R is not an alias, R has no MX records, H is not
- touched by R, T is touched by R, and T accepts mail from the
- network, then T is closer to U@R than H is.
-
- Fact 3: If R is not an alias, R has an MX record with domain X and
- preference p, H is not touched by any of the domains in the MX
- list for R with preference <= p, T is touched by X, and T accepts
- mail from the network, then T is closer to U@R than H is.
-
- Also, a host that does not accept mail from the network can relay
- messages to a nearby hub.
-
- A relay adds a new Received header field to the top of the output.
- Other than this, the output header, body, and envelope are exactly
- the same as the input header, body, and envelope. Exception: If the
- input envelope recipient is U@R, R is an alias for X, and X is not
- an alias, the output envelope recipient is U@X.
-
-
-3.3. Bouncers
-
- A bouncer is a pre-delivery automailer that lets the envelope sender
- know what happened to a message. Most bouncers send failure notices.
- Some bouncers, such as vacation servers and echo servers, send
- success notices.
-
- In a bouncer's output, the envelope sender is <>, and the envelope
- recipient is the input envelope sender. A bouncer refuses to operate
- if the input envelope sender is <> or <#@[]>.
-
- Some mailers on the Internet do not understand the <> convention. In
- fact, some mailers will rewrite <> as <@host>. So any message with an
- envelope recipient of <> or <@host> is discarded upon local delivery.
-
- Unlike a relay, a bouncer produces output with a new header, not
- simply a copy of the input header. For example:
-
- (envelope) from <> to <djb@silverton.berkeley.edu>
- Date: 2 Jan 1996 03:38:25 GMT
- From: DELIVERY NOTICE SYSTEM <MAILER-DAEMON@heaven.af.mil>
- To: djb@silverton.berkeley.edu
- Subject: failure notice
-
- However, the body of the bounce indicates the relevant input envelope
- recipient, as well as the Message-ID of the input, if the input had a
- Message-ID. The body of a failure notice includes a copy of the
- entire input message.
-
-
-3.4. Double-bouncers
-
- A double-bouncer is a pre-delivery automailer that informs a local
- postmaster of permanent failures to deliver bounce messages. Such
- failures are generally caused by poorly configured hosts that produce
- normal messages with faulty envelope sender addresses.
-
- A double-bouncer refuses to operate unless the input envelope sender
- is <>. The output envelope sender from a double-bouncer is <#@[]>;
- note that <#@[]> cannot be used as an SMTP envelope sender under
- RFC 821. The output envelope recipient is predetermined.
-
- Note that double bounces are not suggested by RFC 1123. However,
- faulty envelope sender addresses are usually configuration errors
- that can and should be fixed. Some postmasters, faced with mail
- software that throws away double bounces, resort to keeping copies of
- all bounces; but single bounces are rarely the postmaster's problem.
-
-
-4. Post-delivery automailers
-
- Conceptually: The input is a message that has reached its envelope
- recipient address. It is fed to a post-delivery automailer at that
- address.
-
- The basic weapon against post-delivery loops is a new header field,
- Delivered-To, tracing all the forwarders and mailing lists that a
- message has been through. This field has the side benefit of making
- it much easier for a user (or for a postmaster seeing a bounce) to
- figure out the path that the message took. Delivered-To is similar to
- RFC 1327's DL-Expansion-History, but (1) it omits the time stamp,
- removing any need for parsing, and (2) it has a much better name.
-
-
-4.1. Exploders and repliers
-
- There are two basic types of post-delivery automailers: exploders,
- where the output envelope recipients are predetermined; and repliers,
- where there is just one output, with envelope recipient determined
- from the input.
-
- Repliers normally determine the output envelope recipient as either
- the input Reply-To header field, if it exists; or else the input
- From header field, if it exists; or else the envelope sender. A
- replier never produces an output to <> or <#@[]>.
-
- Exploders are classified into mailing lists, where the output
- envelope senders are predetermined, and forwarders, where every
- output has envelope sender equal to the original envelope sender.
-
- Exception: if the input envelope sender is <> or <#@[]>, then the
- output envelope senders are equal to the input envelope sender, even
- for a mailing list.
-
- Note that, if the envelope sender of a mailing list with M bad
- addresses is another exploder with E bad addresses, the local
- postmaster will receive EM double bounces for each message to the
- mailing list.
-
-
-4.2. Delivered-To
-
- Every post-delivery automailer adds a new Delivered-To header field
- to the top of each output.
-
- The contents of the Delivered-To field are typically the address of
- the automailer, i.e., the input envelope recipient, conventionally
- without any quoting. The contents of the Delivered-To field are in
- any case entirely predetermined. The automailer checks if exactly the
- same Delivered-To field already appears in the header; if so, it
- refuses to operate.
-
- A post-delivery automailer preserves existing Delivered-To and
- Received fields. In fact, a post-delivery automailer generally
- preserves all header fields. The exceptions are limited to known
- fields that are not used for loop detection and that must be removed
- for correct operation. For example, a replier generally changes the
- body of a message and thus should not preserve the SVR4
- Content-Length field.
-
-
-4.3. An example
-
- Aliases and mailing lists are highly dangerous, because they can
- generate several outputs for each input.
-
- Here is an extreme example. A user has three accounts, and wants any
- message to any of the accounts to be delivered to all three. So he
- forwards luser@host1 to luser@host2 and luser@host3, forwards
- luser@host2 to luser@host1 and luser@host3, and forwards luser@host3
- to luser@host1 and luser@host2.
-
- Without Delivered-To, someone who sends a message to luser@host1 will
- receive a practically infinite series of bounces. For example, with a
- hop count limit of 50, the sender will receive 1125899906842624
- bounces.
-
- If all the hosts, or two out of the three, support Delivered-To, the
- message will bounce just a few times. If just one of the hosts
- supports Delivered-To, it will be the unfortunate victim of a loop
- between the other two hosts---although the total number of bounces
- will drop from practically infinite down to a few hundred, with
- typical hop count limits.
-
-
-Appendix A. Proofs of correctness for MX handling
-
- Section 3.2 states three facts about the notion of distance defined
- in section 3.1. Here are mathematical proofs of those facts.
-
- Symbols: D, E, R, and X are domains; H and T are hosts; p and q are
- nonnegative integers. {} is the empty set.
-
- Hypotheses: M(R), the ``MX list for R,'' is a set of pairs (p,D)
- where p <= 65535. There is a set A of domains, called ``aliases.''
- There is a relation D->H, called ``D touches H.'' There is a set N of
- hosts, called ``hosts that accept mail from the network.''
-
- Definitions: m(D,R) = min { p: p = 100000 or (p,D) in M(R) } when
- M(R) is nonempty. When M(R) is empty, m(D,R) is 0 if D = R, 100000
- otherwise. f(D,R) is defined as 500000 if R is in A, m(D,R)
- otherwise; this is the ``distance from D to U@R,'' for any U. g(H,R)
- is defined as min { f(D,R): D->H } if H is in N, 999999 otherwise;
- this is the ``distance from H to U@R,'' for any U.
-
- Fact 1 (generalized): If R is in A, X is not in A, D->T, and T is in
- N, then g(T,X) < g(H,R). Proof: R is in A, so f(E,R) = 500000 for any
- E; thus g(H,R) >= 500000. X is not in A, so f(D,X) = m(D,X) <=
- 100000; hence g(T,X) <= f(D,X) <= 100000 < g(H,R).
-
- Fact 2: If R is not in A, M(R) = {}, R->T, T is in N, and not R->H,
- then g(T,R) < g(H,R). Proof: f(R,R) = m(R,R) = 0 since R is not in A
- and M(R) = {}. T is in N so g(T,R) <= f(R,R) = 0 so g(T,R) = 0.
- Suppose that g(H,R) <= g(T,R). Then g(H,R) = 0, so f(D,R) = 0 for
- some D with D->H, so m(D,R) = 0. But then D = R by definition of m,
- so R->H. Contradiction. Thus g(T,R) < g(H,R).
-
- Fact 3: If R is not in A, (p,X) is in M(R), X->T, T is in N, and
- (q,D) is not in M(R) whenever D->H and q <= p, then g(T,R) < g(H,R).
- Proof: First m(X,R) <= p. R is not in A, so f(X,R) = m(X,R). T is in
- N, so g(T,R) <= f(X,R). Thus g(T,R) <= p. Suppose that g(H,R) <= p.
- Then f(D,R) <= p for some D with D->H, so m(D,R) <= p. But then
- (m(D,R),D) is in M(R). Contradiction. Thus g(T,R) <= p < g(H,R).