| 1 | .TH ezmlm-archive 1 |
| 2 | .SH NAME |
| 3 | ezmlm-archive \- create thread and author index for a mailing list archive |
| 4 | .SH SYNOPSIS |
| 5 | .B ezmlm-archive |
| 6 | [ |
| 7 | .B \-cCFsSTvV |
| 8 | ][ |
| 9 | .B \-f\fI msg1 |
| 10 | ] |
| 11 | [ |
| 12 | .B \-t\fI msg2 |
| 13 | ] |
| 14 | .I dir |
| 15 | .SH DESCRIPTION |
| 16 | .B ezmlm-archive |
| 17 | reads the index files from a message archive, and creates a subject index, a |
| 18 | collection of subject files, and a collection of author files. These |
| 19 | files are suitable as an index for WWW access to, and navigation through, |
| 20 | a mailing list archive by |
| 21 | .BR ezmlm-cgi(1) . |
| 22 | |
| 23 | The index files read are created by |
| 24 | .B ezmlm-idx(1) |
| 25 | on a per-list basis and by |
| 26 | .B ezmlm-send(1) |
| 27 | on a per-message basis for an indexed list. |
| 28 | |
| 29 | The output files created are: |
| 30 | .TP |
| 31 | .I dir\fB/archive/threads/yyyymm |
| 32 | The thread index. It contains one line per subject, starting with the |
| 33 | number of the first message with that subject within the set |
| 34 | investigated, ``:'', a 20-character |
| 35 | subject hash, blank, ``[n]'' where ``n'' is the number of messages in the |
| 36 | thread, blank, and the subject. |
| 37 | The file ``yyyymm'' contains |
| 38 | entries for all threads that have messages in the month ``yyyymm'' |
| 39 | or that have messages both before and after that month. |
| 40 | The subject hash is a key to the subject files; the message number is |
| 41 | a key to the index file. |
| 42 | The lines are in ascending order by message number when the index is |
| 43 | created |
| 44 | .I de novo |
| 45 | on an existing archive. When the messages are added one-by-one as in normal |
| 46 | archive operation, ``n'' is the number of messages in the thread |
| 47 | .I for the particular month |
| 48 | and threads are ordered by their latest message, i.e. the most recently extended thread |
| 49 | is shown last. The message number accompanying a thread is |
| 50 | always a message within the thread. It is the first in |
| 51 | archives created |
| 52 | on existing lists, and the last message in incrementally created archives. |
| 53 | Use the corresponding subject index file to get a list of all |
| 54 | messages in the thread in ascending order. |
| 55 | .TP |
| 56 | .I dir\fB/archive/subjects/xx/yyyyyyyyyyyyyyyyyy |
| 57 | A subject file. The first line is the subject hash, a space, and the subject. |
| 58 | This is followed by one line per message with this subject, in the format |
| 59 | message number, ``:'', date (yyyymm), ``:'', |
| 60 | author hash, blank, author from line. The lines are |
| 61 | sorted by message number. The author hash is a key to the author files; |
| 62 | the message number is a key to the index file. The file in the example |
| 63 | would be for the subject hash ``xxyyyyyyyyyyyyyyyyyy''. |
| 64 | .TP |
| 65 | .I dir\fB/archive/authors/xx/yyyyyyyyyyyyyyyyyy |
| 66 | An author file. The first line is the author hash, a space, and the author |
| 67 | from line. |
| 68 | This is followed by one line per message with this author, in the format |
| 69 | message number, ``:'', date (yyyymm), ``:'', |
| 70 | subject hash, blank, subject. The lines are |
| 71 | sorted by message number. The subject hash is a key to the subject files; |
| 72 | the message number is a key to the index file. The file in the example |
| 73 | would be for the author hash ``xxyyyyyyyyyyyyyyyyyy''. |
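The line formats described above can be read back with a few lines of code. The following is a minimal sketch in Python, not part of ezmlm itself. The function names, the sample hash, and the assumption that a hash is exactly 20 non-blank characters are illustrative. The path split follows the ``xx/yyyyyyyyyyyyyyyyyy'' layout given above.

```python
import re
from typing import NamedTuple


class ThreadEntry(NamedTuple):
    msgnum: int        # key to the index file
    subject_hash: str  # key to the subject files
    count: int         # the "n" in "[n]"
    subject: str


class MessageEntry(NamedTuple):
    msgnum: int
    yyyymm: str
    key_hash: str      # author hash in subject files, subject hash in author files
    text: str          # author from line, or subject


# Assumption: hashes are 20 non-blank characters, as the description implies.
_THREAD_RE = re.compile(r"^(\d+):(\S{20}) \[(\d+)\] (.*)$")
_MESSAGE_RE = re.compile(r"^(\d+):(\d{6}):(\S{20}) (.*)$")


def parse_thread_line(line: str) -> ThreadEntry:
    """Parse one line of dir/archive/threads/yyyymm."""
    m = _THREAD_RE.match(line.rstrip("\n"))
    if m is None:
        raise ValueError(f"not a thread index line: {line!r}")
    return ThreadEntry(int(m.group(1)), m.group(2), int(m.group(3)), m.group(4))


def parse_message_line(line: str) -> MessageEntry:
    """Parse one per-message line of a subject or author file."""
    m = _MESSAGE_RE.match(line.rstrip("\n"))
    if m is None:
        raise ValueError(f"not a message line: {line!r}")
    return MessageEntry(int(m.group(1)), m.group(2), m.group(3), m.group(4))


def hashed_path(list_dir: str, kind: str, key_hash: str) -> str:
    """Map a 20-character hash to its file; kind is 'subjects' or 'authors'."""
    return f"{list_dir}/archive/{kind}/{key_hash[:2]}/{key_hash[2:]}"
```

A caller would typically parse a thread line, then open the file returned by hashed_path() for its subject hash to list every message in the thread in ascending order.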
| 74 | |
| 75 | .I dir\fB/archnum |
| 76 | keeps track of the last message processed. Normally, |
| 77 | .B ezmlm-archive |
| 78 | will process entries for messages from one above the contents of this file |
| 79 | up to and including the message number in |
| 80 | .IR dir\fB/num . |
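The default message range can be pictured as follows. This is a small Python sketch; the function name is hypothetical, and the two counters are passed in as integers rather than read from disk, since the exact on-disk format of the two files is not described here.

```python
def default_range(archnum: int, num: int) -> range:
    """Messages processed in normal mode: archnum + 1 through num, inclusive.

    archnum: last message already indexed (contents of dir/archnum)
    num:     last message in the archive (message number in dir/num)
    """
    return range(archnum + 1, num + 1)
```

When both counters are equal, the range is empty and there is nothing to index.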
| 81 | .SH OPTIONS |
| 82 | .B ezmlm-archive |
| 83 | writes its output files in a crash-proof manner when run in normal mode. When overriding |
| 84 | the normal message range with any of the options listed, the normal |
| 85 | .B sync(3) |
| 86 | of the output files is suppressed for efficiency. Should the computer crash |
| 87 | during this time the state of the indices is not defined. Use the |
| 88 | .B \-s |
| 89 | option in the (extremely rare) cases where this would be a problem. |
| 90 | .TP |
| 91 | .B \-c |
| 92 | Create a new index. This overrides |
| 93 | .I dir\fB/archnum |
| 94 | causing |
| 95 | .B ezmlm-archive |
| 96 | to start with the first message in the archive. Synonym for |
| 97 | .BR \-f\fI0 . |
| 98 | .B NOTE: |
| 99 | .B ezmlm-archive |
| 100 | does not remove files in the index. While it will overwrite/update old files |
| 101 | it will not remove files that are obsolete for other reasons. |
| 102 | .TP |
| 103 | .B \-C |
| 104 | (Default.) |
| 105 | Process entries starting with the message after the message listed in |
| 106 | .IR dir\fB/archnum . |
| 107 | .TP |
| 108 | .B \-f\fI msg1 |
| 109 | Process messages from the archive section (set of 100 messages) |
| 110 | containing message |
| 111 | .IR msg1 . |
| 112 | This is useful if you have removed part of the archive, as it will shorten |
| 113 | processing time and decrease memory use. |
| 114 | .B NOTE: |
| 115 | .B ezmlm-archive |
| 116 | does not remove files in the index. While it will overwrite/update old files |
| 117 | it will not remove files that are obsolete for other reasons. The number of |
| 118 | messages per thread will be incorrect when use of the |
| 119 | .B \-f |
| 120 | and |
| 121 | .B \-t |
| 122 | switches leads to partial re-indexing of already indexed messages. |
| 123 | .TP |
| 124 | .B \-F |
| 125 | (Default.) |
| 126 | Do not change the starting message from the default |
| 127 | (see |
| 128 | .BR \-C ). |
| 129 | .TP |
| 130 | .B \-s |
| 131 | Always sync files. |
| 132 | .TP |
| 133 | .B \-S |
| 134 | (Default.) |
| 135 | Sync files, except when one of the message range modifying options is |
| 136 | used. |
| 137 | .TP |
| 138 | .B \-t\fI msg2 |
| 139 | Process messages to message |
| 140 | .I msg2 |
| 141 | instead of the last message in the archive. Again, files written are |
| 142 | corrected, but other files are not explicitly removed. |
| 143 | .TP |
| 144 | .B \-T |
| 145 | (Default.) |
| 146 | Process entries for messages up to the last message in the archive. |
| 147 | .TP |
| 148 | .B \-v |
| 149 | Display |
| 150 | .B ezmlm-archive |
| 151 | version info. |
| 152 | .TP |
| 153 | .B \-V |
| 154 | Display |
| 155 | .B ezmlm-archive |
| 156 | version info. |
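How the range options combine can be sketched as below. This is an illustration in Python, not the program's actual logic. The function name is hypothetical, and it assumes that archive sections are aligned on multiples of 100, so that the section containing message 250 starts at message 200.

```python
from typing import Optional

SECTION_SIZE = 100  # an archive section is a set of 100 messages


def effective_range(archnum: int, num: int,
                    f: Optional[int] = None,
                    t: Optional[int] = None) -> range:
    """Return the message numbers to process.

    f: value given with -f (None means the default, -F); -c acts like f=0
    t: value given with -t (None means the default, -T)
    """
    if f is None:
        start = archnum + 1
    else:
        # Assumption: start at the first message of the section holding f.
        start = (f // SECTION_SIZE) * SECTION_SIZE
    end = num if t is None else t
    return range(start, end + 1)
```

For example, effective_range(500, 1000, f=250, t=399) covers messages 200 through 399 under the alignment assumption stated above.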
| 157 | .SH "MEMORY USAGE" |
| 158 | .B ezmlm-archive |
| 159 | stores its linked lists in memory. On a 32-bit architecture, it uses |
| 160 | 12 bytes per message, 28 bytes per thread (plus one copy of the subject), |
| 161 | and 20 bytes per author (plus one copy of the author from line). |
| 162 | |
| 163 | In normal list use, it processes at most a few messages at a time, |
| 164 | but for initial processing of a large archive, considerable amounts of |
| 165 | memory may be used. Assuming |
| 166 | 40 bytes for subject/from line, 5 messages per thread, 100,000 messages, |
| 167 | and 1000 authors, this is 2.5 MB. For 1,000,000 messages this is about 25 MB. |
| 168 | |
| 169 | Thus, for large archives, it may be useful to use the |
| 170 | .B \-t |
| 171 | switch to process the archive in multiple subsets, starting with e.g. the first |
| 172 | 100,000, then the next, and so on. |
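The estimate above can be reproduced with a short calculation. The Python sketch below uses the per-item sizes quoted for a 32-bit architecture. The function name and parameter names are illustrative, and the result is only as good as the stated assumptions about subject length and thread size.

```python
def estimate_index_memory(messages: int, authors: int,
                          msgs_per_thread: int = 5,
                          text_bytes: int = 40) -> int:
    """Rough in-memory size in bytes of the lists built during indexing.

    12 bytes per message, 28 bytes per thread plus one copy of the subject,
    20 bytes per author plus one copy of the author from line.
    """
    threads = messages // msgs_per_thread
    return (12 * messages
            + (28 + text_bytes) * threads
            + (20 + text_bytes) * authors)


if __name__ == "__main__":
    size = estimate_index_memory(100_000, 1_000)
    print(f"{size} bytes, about {size / 2**20:.1f} MiB")
```

The per-message and per-thread terms dominate, so the total grows roughly in proportion to the number of messages, which is why splitting a large archive into subsets bounds memory use.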
| 173 | .SH "SEE ALSO" |
| 174 | ezmlm-cgi(1), |
| 175 | ezmlm-idx(1), |
| 176 | ezmlm-send(1), |
| 177 | ezmlm(5) |
| 178 | |