| 1 | 1. Overview |
| 2 | |
| 3 | Here's the data flow in the qmail suite: |
| 4 | |
| 5 |  qmail-smtpd --- qmail-queue --- qmail-send --- qmail-rspawn --- qmail-remote |
| 6 |                /                     |      \ |
| 7 | qmail-inject _/                 qmail-clean  \_ qmail-lspawn --- qmail-local |
| 8 | |
| 9 | Every message is added to a central queue directory by qmail-queue. |
| 10 | qmail-queue is invoked as needed, usually by qmail-inject for locally |
| 11 | generated messages, qmail-smtpd for messages received through SMTP, |
| 12 | qmail-local for forwarded messages, or qmail-send for bounce messages. |
| 13 | |
| 14 | Every message is then delivered by qmail-send, in cooperation with |
| 15 | qmail-lspawn and qmail-rspawn, and cleaned up by qmail-clean. These four |
| 16 | programs are long-running daemons. |
| 17 | |
| 18 | The queue is designed to be crashproof, provided that the underlying |
| 19 | filesystem is crashproof. All cleanups are handled by qmail-send and |
| 20 | qmail-clean without human intervention. See section 6 for more details. |
| 21 | |
| 22 | |
| 23 | 2. Queue structure |
| 24 | |
| 25 | Each message in the queue is identified by a unique number, let's say |
| 26 | 457. The queue is organized into several directories, each of which may |
| 27 | contain files related to message 457: |
| 28 | |
| 29 | mess/457: the message |
| 30 | todo/457: the envelope: where the message came from, where it's going |
| 31 | intd/457: the envelope, under construction by qmail-queue |
| 32 | info/457: the envelope sender address, after preprocessing |
| 33 | local/457: local envelope recipient addresses, after preprocessing |
| 34 | remote/457: remote envelope recipient addresses, after preprocessing |
| 35 | bounce/457: permanent delivery errors |
| 36 | |
| 37 | Here are all possible states for a message. + means a file exists; - |
| 38 | means it does not exist; ? means it may or may not exist. |
| 39 | |
| 40 | S1. -mess -intd -todo -info -local -remote -bounce |
| 41 | S2. +mess -intd -todo -info -local -remote -bounce |
| 42 | S3. +mess +intd -todo -info -local -remote -bounce |
| 43 | S4. +mess ?intd +todo ?info ?local ?remote -bounce (queued) |
| 44 | S5. +mess -intd -todo +info ?local ?remote ?bounce (preprocessed) |
| 45 | |
| 46 | Guarantee: If mess/457 exists, it has inode number 457. |
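
The state table above can be read as a decision procedure. Here is a minimal
sketch in Python, assuming a flat layout in which each directory holds a file
named after the message number. The names DIRS, present and classify are
illustrative only; they are not part of qmail.

```python
import os

DIRS = ("mess", "intd", "todo", "info", "local", "remote", "bounce")

def present(queue, msg_id):
    """Return the set of queue directories that hold a file for msg_id."""
    return {d for d in DIRS
            if os.path.exists(os.path.join(queue, d, str(msg_id)))}

def classify(queue, msg_id):
    """Map the files that exist for msg_id onto S1..S5.

    Returns None for any combination the state table does not list.
    """
    have = present(queue, msg_id)
    if "mess" not in have:
        return "S1" if not have else None
    if "todo" in have:                      # S4: ?intd ?info ?local ?remote -bounce
        return None if "bounce" in have else "S4"
    if "info" in have:                      # S5: -intd ?local ?remote ?bounce
        return None if "intd" in have else "S5"
    if have == {"mess"}:
        return "S2"
    if have == {"mess", "intd"}:
        return "S3"
    return None
```

For example, classify(queue, 457) returns "S3" when only mess/457 and
intd/457 exist.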
| 47 | |
| 48 | |
| 49 | 3. How messages enter the queue |
| 50 | |
| 51 | To add a message to the queue, qmail-queue first creates a file in a |
| 52 | separate directory, pid/, with a unique name. The filesystem assigns |
| 53 | that file a unique inode number. qmail-queue looks at that number, say |
| 54 | 457. By the guarantee above, message 457 must be in state S1. |
| 55 | |
| 56 | qmail-queue renames pid/whatever as mess/457, moving to S2. It writes |
| 57 | the message to mess/457. It then creates intd/457, moving to S3, and |
| 58 | writes the envelope information to intd/457. |
| 59 | |
| 60 | Finally qmail-queue creates a new link, todo/457, for intd/457, moving |
| 61 | to S4. At that instant the message has been successfully queued, and |
| 62 | qmail-queue leaves it for further handling by qmail-send. |
| 63 | |
| 64 | qmail-queue starts a 24-hour timer before touching any files, and |
| 65 | kills itself if the timer expires. |
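
The sequence S1 -> S2 -> S3 -> S4 can be sketched as follows. This is an
illustration in Python, not qmail-queue's actual code: the function name
enqueue, the pid/ file naming scheme and the flat directory layout are
assumptions, and the 24-hour timer is left out.

```python
import os
import time

def enqueue(queue, message, envelope):
    """Queue one message (bytes) with its envelope (bytes).

    Returns the message number, which is the inode number of mess/N.
    """
    # Create a uniquely named file in pid/. The naming scheme is hypothetical.
    pid_path = os.path.join(
        queue, "pid", "%d.%d" % (os.getpid(), time.monotonic_ns()))
    fd = os.open(pid_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    with os.fdopen(fd, "wb") as f:
        # The filesystem has picked the inode number; it becomes the id.
        msg_id = os.fstat(f.fileno()).st_ino
        name = str(msg_id)
        os.rename(pid_path, os.path.join(queue, "mess", name))  # S1 -> S2
        f.write(message)          # the open file is now mess/N

    intd = os.path.join(queue, "intd", name)
    with open(intd, "wb") as f:                                 # S2 -> S3
        f.write(envelope)

    # A second link to the same file; once it exists the message is queued.
    os.link(intd, os.path.join(queue, "todo", name))            # S3 -> S4
    return msg_id
```

The rename and the link each take effect in a single step, so a crash at any
point leaves the message in one of the listed states. The rename only works
if pid/ and mess/ are on the same filesystem.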
| 66 | |
| 67 | |
| 68 | 4. How queued messages are preprocessed |
| 69 | |
| 70 | Once a message has been queued, qmail-send must decide which recipients |
| 71 | are local and which recipients are remote. It may also rewrite some |
| 72 | recipient addresses. |
| 73 | |
| 74 | When qmail-send notices todo/457, it knows that message 457 is in S4. It |
| 75 | removes info/457, local/457, and remote/457 if they exist. Then it reads |
| 76 | through todo/457. It creates info/457, possibly local/457, and possibly |
| 77 | remote/457. When it is done, it removes intd/457. The message is still |
| 78 | in S4 at this point. Finally qmail-send removes todo/457, moving to S5. |
| 79 | At that instant the message has been successfully preprocessed. |
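
The order of operations above can be sketched as follows. The file formats
are made up for the illustration and are not qmail's: todo/N is assumed to
hold the sender on its first line and one recipient per following line, and
local/N and remote/N get one line per address, prefixed with T for "to do".
The predicate is_local is a hypothetical caller-supplied function.

```python
import os

def _remove_if_exists(path):
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass

def preprocess(queue, msg_id, is_local):
    """Move message msg_id from S4 (queued) to S5 (preprocessed)."""
    def p(d):
        return os.path.join(queue, d, str(msg_id))

    # Discard leftovers from a run that was interrupted by a crash.
    for d in ("info", "local", "remote"):
        _remove_if_exists(p(d))

    with open(p("todo")) as f:
        sender, *recipients = f.read().splitlines()

    with open(p("info"), "w") as f:
        f.write(sender + "\n")
    for d, want_local in (("local", True), ("remote", False)):
        addrs = [r for r in recipients if bool(is_local(r)) == want_local]
        if addrs:                       # local/N and remote/N are optional
            with open(p(d), "w") as f:
                f.write("".join("T" + a + "\n" for a in addrs))

    _remove_if_exists(p("intd"))        # still S4: todo/N exists
    os.unlink(p("todo"))                # S4 -> S5
```

Because todo/N is removed last, a crash anywhere before that point leaves the
message in S4, and the next run starts over from the top.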
| 80 | |
| 81 | |
| 82 | 5. How preprocessed messages are delivered |
| 83 | |
| 84 | Messages at S5 are handled as follows. Each address in local/457 and |
| 85 | remote/457 is marked either NOT DONE or DONE. |
| 86 | |
| 87 | DONE: The message was successfully delivered, or the last delivery |
| 88 | attempt met with permanent failure. Either way, qmail-send |
| 89 | should not attempt further delivery to this address. |
| 90 | |
| 91 | NOT DONE: If there have been any delivery attempts, they have all |
| 92 | met with temporary failure. Either way, qmail-send should |
| 93 | try delivery in the future. |
| 94 | |
| 95 | qmail-send may at its leisure try to deliver a message to a NOT DONE |
| 96 | address. If the message is successfully delivered, qmail-send marks the |
| 97 | address as DONE. If the delivery attempt meets with permanent failure, |
| 98 | qmail-send first appends a note to bounce/457, creating bounce/457 if |
| 99 | necessary; then it marks the address as DONE. Note that bounce/457 is |
| 100 | not crashproof. |
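
One way to record these outcomes is to keep a one-byte marker in front of each
address and overwrite it in place. The sketch below assumes a made-up record
format (one line per address, prefixed with T for NOT DONE or D for DONE) and
made-up function names; the text above does not specify how addresses are
marked.

```python
import os

def pending(path):
    """Yield (offset, address) for each NOT DONE record in local/N or remote/N."""
    offset = 0
    with open(path, "rb") as f:
        for line in f:
            if line[:1] == b"T":
                yield offset, line[1:].rstrip(b"\n").decode()
            offset += len(line)

def mark_done(path, offset):
    """Flip one record to DONE by overwriting its marker byte."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(b"D")

def finish_attempt(queue, msg_id, kind, offset, addr, outcome, reason=""):
    """Record one delivery attempt. kind is "local" or "remote";
    outcome is "success", "permanent" or "temporary"."""
    name = str(msg_id)
    if outcome == "temporary":
        return                          # stays NOT DONE, retried later
    if outcome == "permanent":
        # Append the note to bounce/N first, then mark the address DONE.
        with open(os.path.join(queue, "bounce", name), "a") as f:
            f.write("%s: %s\n" % (addr, reason))
    mark_done(os.path.join(queue, kind, name), offset)
```

Writing the bounce note before marking the address means a crash between the
two steps repeats the attempt rather than losing the error report.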
| 101 | |
| 102 | qmail-send may handle bounce/457 at any time, as follows: it (1) injects |
| 103 | a new bounce message, created from bounce/457 and mess/457; (2) deletes |
| 104 | bounce/457. |
| 105 | |
| 106 | When all addresses in local/457 are DONE, qmail-send deletes local/457. |
| 107 | Same for remote/457. |
| 108 | |
| 109 | When local/457 and remote/457 are gone, qmail-send eliminates the |
| 110 | message, as follows. First, if bounce/457 exists, qmail-send handles it |
| 111 | as described above. Once bounce/457 is definitely gone, qmail-send |
| 112 | deletes info/457, moving to S2, and finally mess/457, moving to S1. |
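
The elimination order can be sketched as follows, assuming a flat directory
layout. The callback send_bounce is hypothetical: it stands in for injecting
a new bounce message built from bounce/N and mess/N.

```python
import os

def eliminate(queue, msg_id, send_bounce):
    """Remove message msg_id once local/N and remote/N are both gone.

    Returns False, touching nothing, if recipients remain.
    """
    def p(d):
        return os.path.join(queue, d, str(msg_id))

    if os.path.exists(p("local")) or os.path.exists(p("remote")):
        return False

    if os.path.exists(p("bounce")):
        send_bounce(p("bounce"), p("mess"))   # (1) inject the bounce message
        os.unlink(p("bounce"))                # (2) delete bounce/N

    os.unlink(p("info"))                      # S5 -> S2
    os.unlink(p("mess"))                      # S2 -> S1
    return True
```

Deleting mess/N last means the message number stays in use until every other
file for it is gone.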
| 113 | |
| 114 | |
| 115 | 6. Cleanups |
| 116 | |
| 117 | If the computer crashes while qmail-queue is trying to queue a message, |
| 118 | or while qmail-send is eliminating a message, the message may be left in |
| 119 | state S2 or S3. |
| 120 | |
| 121 | When qmail-send sees a message in state S2 or S3---other than one it is |
| 122 | currently eliminating!---where mess/457 is more than 36 hours old, it |
| 123 | deletes intd/457 if that exists, then deletes mess/457. Any qmail-queue |
| 124 | handling the message must be dead: its 24-hour timer expired long ago. |
| 125 | |
| 126 | Similarly, when qmail-send sees a file in the pid/ directory that is |
| 127 | more than 36 hours old, it deletes it. |
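
A cleanup pass along these lines might look like the sketch below. It assumes
a flat directory layout and measures age by modification time; the text above
does not say which timestamp is used. The names cleanup, STALE and
eliminating are illustrative.

```python
import os
import time

STALE = 36 * 60 * 60        # 36 hours, in seconds

def _remove_if_exists(path):
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass

def cleanup(queue, now=None, eliminating=()):
    """Delete stale S2/S3 messages and stale pid/ files.

    eliminating holds the names (strings) of messages that the caller is
    in the middle of eliminating. Returns the names of deleted messages.
    """
    now = time.time() if now is None else now
    removed = []
    for name in os.listdir(os.path.join(queue, "mess")):
        if name in eliminating:
            continue
        # S2 and S3 have no todo, info, local, remote or bounce file.
        others = ("todo", "info", "local", "remote", "bounce")
        if any(os.path.exists(os.path.join(queue, d, name)) for d in others):
            continue
        mess = os.path.join(queue, "mess", name)
        if now - os.stat(mess).st_mtime <= STALE:
            continue
        _remove_if_exists(os.path.join(queue, "intd", name))
        _remove_if_exists(mess)
        removed.append(name)

    pid_dir = os.path.join(queue, "pid")
    for name in os.listdir(pid_dir):
        path = os.path.join(pid_dir, name)
        if now - os.stat(path).st_mtime > STALE:
            _remove_if_exists(path)
    return removed
```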
| 128 | |
| 129 | Cleanups are not necessary if the computer crashes while qmail-send is |
| 130 | delivering a message. At worst a message may be delivered twice. (There |
| 131 | is no way for a distributed mail system to eliminate the possibility of |
| 132 | duplication. What if an SMTP connection is broken just before the server |
| 133 | acknowledges successful receipt of the message? The client must assume |
| 134 | the worst and send the message again. Similarly, if the computer crashes |
| 135 | just before qmail-send marks a message as DONE, the new qmail-send must |
| 136 | assume the worst and send the message again. The usual solutions in the |
| 137 | database literature---e.g., keeping log files---amount to saying that |
| 138 | it's the recipient's computer's job to discard duplicate messages.) |
| 139 | |
| 140 | |
| 141 | 7. Further notes |
| 142 | |
| 143 | Currently info/457 serves two purposes: first, it records the envelope |
| 144 | sender; second, its modification time is used to decide when a message |
| 145 | has been in the queue too long. In the future info/457 may store more |
| 146 | information. Any non-backwards-compatible changes will be identified by |
| 147 | version numbers. |
| 148 | |
| 149 | When qmail-queue has successfully placed a message into the queue, it |
| 150 | pulls a trigger offered by qmail-send. Here is the current triggering |
| 151 | mechanism: lock/trigger is a named pipe. Before scanning todo/, |
| 152 | qmail-send opens lock/trigger for reading with O_NDELAY. It then selects |
| 153 | for readability on lock/trigger. qmail-queue pulls the trigger by writing |
| 154 | a byte to lock/trigger, also opened with O_NDELAY. This makes lock/trigger |
| 155 | readable and wakes up qmail-send. Before scanning todo/ again, qmail-send |
| 156 | closes and reopens lock/trigger. |
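
The triggering mechanism can be tried out with a few lines of Python. This is
a sketch with made-up function names, not qmail's code. It uses O_NONBLOCK,
the POSIX counterpart of O_NDELAY, and assumes the named pipe already exists
(for example, created with os.mkfifo).

```python
import os
import select

def open_trigger(path):
    """qmail-send side: open the named pipe for reading without blocking."""
    return os.open(path, os.O_RDONLY | os.O_NONBLOCK)

def wait_for_trigger(fd, timeout):
    """Return True if the pipe became readable within timeout seconds."""
    readable, _, _ = select.select([fd], [], [], timeout)
    return bool(readable)

def pull_trigger(path):
    """qmail-queue side: write one byte without blocking.

    Returns False if the write could not be done; the message is queued
    either way, so the caller can ignore the result.
    """
    try:
        fd = os.open(path, os.O_WRONLY | os.O_NONBLOCK)
    except OSError:
        return False        # e.g. ENXIO: no reader has the pipe open
    try:
        os.write(fd, b"\0")
        return True
    except OSError:
        return False        # e.g. EAGAIN: the pipe is full
    finally:
        os.close(fd)
```

Closing and reopening the pipe, rather than reading from it, discards any
bytes that piled up while qmail-send was busy. The reader reopens it before
scanning todo/ so that a trigger pulled during the scan is not lost.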