Commit | Line | Data |
---|---|---|
2117e02e MW |
1 | Tools in the war on mail loops |
2 | D. J. Bernstein, djb@pobox.com | |
3 | 19970201 | |
4 | ||
5 | ||
6 | 1. Introduction | |
7 | ||
8 | An automailer means any program that receives a mail message and | |
9 | automatically sends one or more mail messages. This term is meant to | |
10 | include not only a mail-based server, such as a mailing list exploder | |
11 | or a vacation program, but also an SMTP server, which receives a | |
12 | message from the network and relays it to a local or remote user. | |
13 | ||
14 | In a network full of automailers, any mistake can cause a mail loop. | |
15 | Since some automailers generate several outputs in response to a | |
16 | single input, a loop can produce an exponential explosion of mail. | |
17 | ||
18 | All the automailers in the qmail package follow a general philosophy | |
19 | designed to prevent mail loops and limit the damage from any loops | |
20 | that do occur. These automailers have been repeatedly observed to | |
21 | fail safe: they stop loops in the face of typical failures by other | |
22 | hosts. This document explains the philosophy and describes the | |
23 | automailers. | |
24 | ||
25 | To some extent the philosophy here simply repeats and amplifies | |
26 | standard practice as codified in RFC 974 and RFC 1123. Unfortunately, | |
27 | the standards do not adequately control bounce loops, since they do | |
28 | not recognize that postmasters want to see double bounces; they do | |
29 | not adequately control relaying loops; and they do not prevent | |
30 | cross-host forwarding loops. | |
31 | ||
32 | Terminology: The mail message received by an automailer is called | |
33 | input. The mail messages sent by an automailer are called outputs. | |
34 | For simplicity, this document focuses on the case that the input has | |
35 | just one envelope recipient. | |
36 | ||
37 | REMINDER: This document describes the automailers in the qmail | |
38 | package. Other packages include automailers that do not fit the | |
39 | descriptions given here. | |
40 | ||
41 | Beware that the war on mail loops can never be won: any method of | |
42 | preventing mail loops can be subverted by other hosts. I welcome | |
43 | further development of techniques that work well in practice. | |
44 | ||
45 | ||
46 | 2. Basics | |
47 | ||
48 | The output from an automailer is always further down the following | |
49 | list than the input. | |
50 | ||
51 | 0 hops, <sender> is neither <> nor <#@[]> normal messages | |
52 | 1 hop, <sender> is neither <> nor <#@[]> | |
53 | 2 hops, <sender> is neither <> nor <#@[]> | |
54 | etc. | |
55 | 0 hops, <sender> is <> bounces | |
56 | 1 hop, <sender> is <> | |
57 | 2 hops, <sender> is <> | |
58 | etc. | |
59 | 0 hops, <sender> is <#@[]> double bounces | |
60 | 1 hop, <sender> is <#@[]> | |
61 | 2 hops, <sender> is <#@[]> | |
62 | etc. | |
63 | ||
64 | Here sender means the envelope sender address. Hops means the number | |
65 | of Received and Delivered-To fields in the header. See sections 3.3 | |
66 | and 3.4 for an explanation of <> and <#@[]>. | |
67 | ||
68 | Consequently, no automailer ever generates an entirely new normal | |
69 | message in response to a normal message. If the output is a normal | |
70 | message, it always has more hops than the input. | |
71 | ||
72 | When input and output are both normal messages, both bounces, or both | |
73 | double bounces, the output header is essentially the same as the | |
74 | input header. However, when an automailer moves from a normal message | |
75 | to a bounce, or from a bounce to a double bounce, it generates an | |
76 | entirely new header. | |
77 | ||
78 | An automailer may refuse to operate if the input has too many hops. | |
79 | The definition of too many hops depends on the automailer. This | |
80 | practice is called hop counting. Note that some existing messages | |
81 | legitimately take as many as 20 hops. One automailer uses a limit of | |
82 | 100 hops; this will be adequate for all messages in the foreseeable | |
83 | future. | |
84 | ||
85 | Hop counting is a weapon of last resort. It will, if correctly | |
86 | implemented, prevent all infinite loops; however, even a finite loop | |
87 | can do practically infinite damage, as illustrated in section 4.3. | |
88 | ||
89 | ||
90 | 3. Pre-delivery automailers | |
91 | ||
92 | Conceptually: The input is a message that has not yet reached its | |
93 | envelope recipient address. It is fed to a relay, which attempts to | |
94 | deliver the message directly to, or at least closer to, that address; | |
95 | if the relay fails permanently, the message is fed to a bouncer or a | |
96 | double-bouncer. Relays, bouncers, and double-bouncers are examples of | |
97 | pre-delivery automailers. | |
98 | ||
99 | A pre-delivery automailer produces at most one output. | |
100 | ||
101 | The basic weapon against pre-delivery mail loops is gravity. A normal | |
102 | message always moves closer to its envelope recipient, according to a | |
103 | notion of distance defined in section 3.1. If it bounces before | |
104 | reaching the recipient, it turns into a bounce message, which always | |
105 | moves closer to the original envelope sender. If that in turn | |
106 | bounces, it turns into a double bounce, which always moves closer to | |
107 | a local postmaster. (Triple bounces do not exist.) | |
108 | ||
109 | ||
110 | 3.1. Distance | |
111 | ||
112 | The distance from a DNS domain D to a recipient U@R is defined as | |
113 | follows, when R has an MX list: the minimum preference of D in the | |
114 | MX list, or 100000 if D does not appear in the list. | |
115 | ||
116 | When R has no MX records, the distance from R to U@R is defined as 0, | |
117 | and the distance from any other domain to U@R is defined as 100000. | |
118 | ||
119 | Exception: If R is an alias, i.e., if R has a CNAME record, the | |
120 | distance from any domain to U@R is defined as 500000. | |
121 | ||
122 | The distance from a host H to U@R is defined as the minimum distance | |
123 | to U@R from any domain that touches H. (``D touches H'' means ``D has | |
124 | an A record listing one of H's IP addresses.'') | |
125 | ||
126 | Exception: If H does not accept mail from the network, its distance | |
127 | to any recipient is defined as 999999. | |
128 | ||
129 | ||
130 | 3.2. Relays | |
131 | ||
132 | A relay is a pre-delivery automailer that sends the output towards | |
133 | the envelope recipient. What this means for intra-host relays is not | |
134 | discussed here. What this means for cross-host relays is the | |
135 | following: if the relay is at host H, and it sends its output to host | |
136 | T, then the distance from T to the output envelope recipient is | |
137 | always smaller than the distance from H to the input envelope | |
138 | recipient. | |
139 | ||
140 | The following facts guarantee that certain cross-host relay behavior | |
141 | is safe. For proofs of these facts, see Appendix A. | |
142 | ||
143 | Fact 1: If R is an alias for X, X is not an alias, D touches T, | |
144 | and T accepts mail from the network, then the distance from T to | |
145 | U@X is smaller than the distance from H to U@R. | |
146 | ||
147 | Fact 2: If R is not an alias, R has no MX records, H is not | |
148 | touched by R, T is touched by R, and T accepts mail from the | |
149 | network, then T is closer to U@R than H is. | |
150 | ||
151 | Fact 3: If R is not an alias, R has an MX record with domain X and | |
152 | preference p, H is not touched by any of the domains in the MX | |
153 | list for R with preference <= p, T is touched by X, and T accepts | |
154 | mail from the network, then T is closer to U@R than H is. | |
155 | ||
156 | Also, a host that does not accept mail from the network can relay | |
157 | messages to a nearby hub. | |
158 | ||
159 | A relay adds a new Received header field to the top of the output. | |
160 | Other than this, the output header, body, and envelope are exactly | |
161 | the same as the input header, body, and envelope. Exception: If the | |
162 | input envelope recipient is U@R, R is an alias for X, and X is not | |
163 | an alias, the output envelope recipient is U@X. | |
164 | ||
165 | ||
166 | 3.3. Bouncers | |
167 | ||
168 | A bouncer is a pre-delivery automailer that lets the envelope sender | |
169 | know what happened to a message. Most bouncers send failure notices. | |
170 | Some bouncers, such as vacation servers and echo servers, send | |
171 | success notices. | |
172 | ||
173 | In a bouncer's output, the envelope sender is <>, and the envelope | |
174 | recipient is the input envelope sender. A bouncer refuses to operate | |
175 | if the input envelope sender is <> or <#@[]>. | |
176 | ||
177 | Some mailers on the Internet do not understand the <> convention. In | |
178 | fact, some mailers will rewrite <> as <@host>. So any message with an | |
179 | envelope recipient of <> or <@host> is discarded upon local delivery. | |
180 | ||
181 | Unlike a relay, a bouncer produces output with a new header, not | |
182 | simply a copy of the input header. For example: | |
183 | ||
184 | (envelope) from <> to <djb@silverton.berkeley.edu> | |
185 | Date: 2 Jan 1996 03:38:25 GMT | |
186 | From: DELIVERY NOTICE SYSTEM <MAILER-DAEMON@heaven.af.mil> | |
187 | To: djb@silverton.berkeley.edu | |
188 | Subject: failure notice | |
189 | ||
190 | However, the body of the bounce indicates the relevant input envelope | |
191 | recipient, as well as the Message-ID of the input, if the input had a | |
192 | Message-ID. The body of a failure notice includes a copy of the | |
193 | entire input message. | |
194 | ||
195 | ||
196 | 3.4. Double-bouncers | |
197 | ||
198 | A double-bouncer is a pre-delivery automailer that informs a local | |
199 | postmaster of permanent failures to deliver bounce messages. Such | |
200 | failures are generally caused by poorly configured hosts that produce | |
201 | normal messages with faulty envelope sender addresses. | |
202 | ||
203 | A double-bouncer refuses to operate unless the input envelope sender | |
204 | is <>. The output envelope sender from a double-bouncer is <#@[]>; | |
205 | note that <#@[]> cannot be used as an SMTP envelope sender under | |
206 | RFC 821. The output envelope recipient is predetermined. | |
207 | ||
208 | Note that double bounces are not suggested by RFC 1123. However, | |
209 | faulty envelope sender addresses are usually configuration errors | |
210 | that can and should be fixed. Some postmasters, faced with mail | |
211 | software that throws away double bounces, resort to keeping copies of | |
212 | all bounces; but single bounces are rarely the postmaster's problem. | |
213 | ||
214 | ||
215 | 4. Post-delivery automailers | |
216 | ||
217 | Conceptually: The input is a message that has reached its envelope | |
218 | recipient address. It is fed to a post-delivery automailer at that | |
219 | address. | |
220 | ||
221 | The basic weapon against post-delivery loops is a new header field, | |
222 | Delivered-To, tracing all the forwarders and mailing lists that a | |
223 | message has been through. This field has the side benefit of making | |
224 | it much easier for a user (or for a postmaster seeing a bounce) to | |
225 | figure out the path that the message took. Delivered-To is similar to | |
226 | RFC 1327's DL-Expansion-History, but (1) it omits the time stamp, | |
227 | removing any need for parsing, and (2) it has a much better name. | |
228 | ||
229 | ||
230 | 4.1. Exploders and repliers | |
231 | ||
232 | There are two basic types of post-delivery automailers: exploders, | |
233 | where the output envelope recipients are predetermined; and repliers, | |
234 | where there is just one output, with envelope recipient determined | |
235 | from the input. | |
236 | ||
237 | Repliers normally determine the output envelope recipient as either | |
238 | the input Reply-To header field, if it exists; or else the input | |
239 | From header field, if it exists; or else the envelope sender. A | |
240 | replier never produces an output to <> or <#@[]>. | |
241 | ||
242 | Exploders are classified into mailing lists, where the output | |
243 | envelope senders are predetermined, and forwarders, where every | |
244 | output has envelope sender equal to the original envelope sender. | |
245 | ||
246 | Exception: if the input envelope sender is <> or <#@[]>, then the | |
247 | output envelope senders are equal to the input envelope sender, even | |
248 | for a mailing list. | |
249 | ||
250 | Note that, if the envelope sender of a mailing list with M bad | |
251 | addresses is another exploder with E bad addresses, the local | |
252 | postmaster will receive EM double bounces for each message to the | |
253 | mailing list. | |
254 | ||
255 | ||
256 | 4.2. Delivered-To | |
257 | ||
258 | Every post-delivery automailer adds a new Delivered-To header field | |
259 | to the top of each output. | |
260 | ||
261 | The contents of the Delivered-To field are typically the address of | |
262 | the automailer, i.e., the input envelope recipient, conventionally | |
263 | without any quoting. The contents of the Delivered-To field are in | |
264 | any case entirely predetermined. The automailer checks if exactly the | |
265 | same Delivered-To field already appears in the header; if so, it | |
266 | refuses to operate. | |
267 | ||
268 | A post-delivery automailer preserves existing Delivered-To and | |
269 | Received fields. In fact, a post-delivery automailer generally | |
270 | preserves all header fields. The exceptions are limited to known | |
271 | fields that are not used for loop detection and that must be removed | |
272 | for correct operation. For example, a replier generally changes the | |
273 | body of a message and thus should not preserve the SVR4 | |
274 | Content-Length field. | |
275 | ||
276 | ||
277 | 4.3. An example | |
278 | ||
279 | Aliases and mailing lists are highly dangerous, because they can | |
280 | generate several outputs for each input. | |
281 | ||
282 | Here is an extreme example. A user has three accounts, and wants any | |
283 | message to any of the accounts to be delivered to all three. So he | |
284 | forwards luser@host1 to luser@host2 and luser@host3, forwards | |
285 | luser@host2 to luser@host1 and luser@host3, and forwards luser@host3 | |
286 | to luser@host1 and luser@host2. | |
287 | ||
288 | Without Delivered-To, someone who sends a message to luser@host1 will | |
289 | receive a practically infinite series of bounces. For example, with a | |
290 | hop count limit of 50, the sender will receive 1125899906842624 | |
291 | bounces. | |
292 | ||
293 | If all the hosts, or two out of the three, support Delivered-To, the | |
294 | message will bounce just a few times. If just one of the hosts | |
295 | supports Delivered-To, it will be the unfortunate victim of a loop | |
296 | between the other two hosts---although the total number of bounces | |
297 | will drop from practically infinite down to a few hundred, with | |
298 | typical hop count limits. | |
299 | ||
300 | ||
301 | Appendix A. Proofs of correctness for MX handling | |
302 | ||
303 | Section 3.2 states three facts about the notion of distance defined | |
304 | in section 3.1. Here are mathematical proofs of those facts. | |
305 | ||
306 | Symbols: D, E, R, and X are domains; H and T are hosts; p and q are | |
307 | nonnegative integers. {} is the empty set. | |
308 | ||
309 | Hypotheses: M(R), the ``MX list for R,'' is a set of pairs (p,D) | |
310 | where p <= 65535. There is a set A of domains, called ``aliases.'' | |
311 | There is a relation D->H, called ``D touches H.'' There is a set N of | |
312 | hosts, called ``hosts that accept mail from the network.'' | |
313 | ||
314 | Definitions: m(D,R) = min { p: p = 100000 or (p,D) in M(R) } when | |
315 | M(R) is nonempty. When M(R) is empty, m(D,R) is 0 if D = R, 100000 | |
316 | otherwise. f(D,R) is defined as 500000 if R is in A, m(D,R) | |
317 | otherwise; this is the ``distance from D to U@R,'' for any U. g(H,R) | |
318 | is defined as min { f(D,R): D->H } if H is in N, 999999 otherwise; | |
319 | this is the ``distance from H to U@R,'' for any U. | |
320 | ||
321 | Fact 1 (generalized): If R is in A, X is not in A, D->T, and T is in | |
322 | N, then g(T,X) < g(H,R). Proof: R is in A, so f(E,R) = 500000 for any | |
323 | E; thus g(H,R) >= 500000. X is not in A, so f(D,X) = m(D,X) <= | |
324 | 100000; hence g(T,X) <= f(D,X) <= 100000 < g(H,R). | |
325 | ||
326 | Fact 2: If R is not in A, M(R) = {}, R->T, T is in N, and not R->H, | |
327 | then g(T,R) < g(H,R). Proof: f(R,R) = m(R,R) = 0 since R is not in A | |
328 | and M(R) = {}. T is in N so g(T,R) <= f(R,R) = 0 so g(T,R) = 0. | |
329 | Suppose that g(H,R) <= g(T,R). Then g(H,R) = 0, so f(D,R) = 0 for | |
330 | some D with D->H, so m(D,R) = 0. But then D = R by definition of m, | |
331 | so R->H. Contradiction. Thus g(T,R) < g(H,R). | |
332 | ||
333 | Fact 3: If R is not in A, (p,X) is in M(R), X->T, T is in N, and | |
334 | (q,D) is not in M(R) whenever D->H and q <= p, then g(T,R) < g(H,R). | |
335 | Proof: First m(X,R) <= p. R is not in A, so f(X,R) = m(X,R). T is in | |
336 | N, so g(T,R) <= f(X,R). Thus g(T,R) <= p. Suppose that g(H,R) <= p. | |
337 | Then f(D,R) <= p for some D with D->H, so m(D,R) <= p. But then | |
338 | (m(D,R),D) is in M(R). Contradiction. Thus g(T,R) <= p < g(H,R). |