3 ezmlm-archive \- create thread and author index for a mailing list archive
17 reads the index files from a message archive, and creates a subject index, a
18 collection of subject files, and a collection of author files. These
19 files are suitable as an index for WWW access to, and navigation through
20 a mailing list archive by
23 The index files read are created by
25 on a per-list basis and by
27 on a per-message basis for an indexed list.
29 The output files created are:
31 .I dir\fB/archive/threads/yyyymm
32 The thread index. It contains one line per subject, starting with the
33 number of the first message with that subject within the set
34 investigated, ``:'', a 20 character
35 subject hash, blank, ``[n]'' where ``n'' is the number of messages in the
36 thread, blank, and the subject.
37 The file ``yyyymm'' contains
38 entries for all threads that have messages in the month ``yyyymm''
39 or that have messages both before and after that month.
40 The subject hash is a key to the subject files; the message number is
41 a key to the index file.
42 The lines are in ascending order by message number when the index is
45 on an existing archive. When the messages are added one-by-one as in normal
46 archive operation, ``n'' is the number of messages in the thread
47 .I for the particular month
48 and threads are ordered by their latest message, i.e. the most recently extended thread
49 is shown last. The message number accompanying a thread is
50 always a message within the thread. It is the first in
52 on existing lists, and the last message in incrementally created archives.
53 Use the corresponding subject index file to get a list of all
54 messages in the thread in ascending order.
56 .I dir\fB/archive/subjects/xx/yyyyyyyyyyyyyyyyyy
57 A subject file. The first line is the subject hash, a space, and the subject.
58 This is followed by one line per message with this subject, in the format
59 message number, ``:'', date (yyyymm), ``:'',
60 author hash, blank, author from line. The lines are
61 sorted by message number. The author hash is a key to the author files;
62 the message number is a key to the index file. The file in the example
63 would be for the subject hash ``xxyyyyyyyyyyyyyyyyyy''.
65 .I dir\fB/archive/authors/xx/yyyyyyyyyyyyyyyyyy
66 An author file. The first line is the author hash, a space, and the author
68 This is followed by one line per message with this author, in the format
69 message number, ``:'', date (yyyymm), ``:'',
70 subject hash, blank, subject. The lines are
71 sorted by message number. The subject hash is a key to the subject files;
72 the message number is a key to the index file. The file in the example
73 would be for the author hash ``xxyyyyyyyyyyyyyyyyyy''.
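
The three file formats described above are simple enough to read with a few lines of code. The following is a minimal Python sketch, not part of ezmlm itself: the function and type names are hypothetical, the sample hashes in the usage note are made up, and the exact separators (a single colon, a single blank) are assumed from the description above.

```python
import re
from pathlib import Path
from typing import NamedTuple, Tuple

HASH_LEN = 20  # hash length as stated in the text


class ThreadEntry(NamedTuple):
    msgnum: int        # key to the index file
    subject_hash: str  # key to the subject files
    count: int         # the "n" in "[n]"
    subject: str


class MessageEntry(NamedTuple):
    msgnum: int
    yyyymm: str
    key_hash: str  # author hash in subject files, subject hash in author files
    text: str      # author from line, or subject


# msgnum ":" hash blank "[n]" blank subject
THREAD_RE = re.compile(r"^(\d+):(\S{20}) \[(\d+)\] (.*)$")
# msgnum ":" yyyymm ":" hash blank text
MESSAGE_RE = re.compile(r"^(\d+):(\d{6}):(\S{20}) (.*)$")


def parse_thread_line(line: str) -> ThreadEntry:
    """Parse one line of a threads/yyyymm file."""
    m = THREAD_RE.match(line.rstrip("\n"))
    if m is None:
        raise ValueError(f"not a thread index line: {line!r}")
    return ThreadEntry(int(m.group(1)), m.group(2), int(m.group(3)), m.group(4))


def parse_header_line(line: str) -> Tuple[str, str]:
    """Parse the first line of a subject or author file: hash, space, text."""
    key, sep, text = line.rstrip("\n").partition(" ")
    if len(key) != HASH_LEN or not sep:
        raise ValueError(f"not a header line: {line!r}")
    return key, text


def parse_message_line(line: str) -> MessageEntry:
    """Parse a per-message line of a subject or author file."""
    m = MESSAGE_RE.match(line.rstrip("\n"))
    if m is None:
        raise ValueError(f"not a message line: {line!r}")
    return MessageEntry(int(m.group(1)), m.group(2), m.group(3), m.group(4))


def hash_to_path(list_dir: str, kind: str, key_hash: str) -> Path:
    """Map a 20-character hash to dir/archive/<kind>/xx/yyyyyyyyyyyyyyyyyy.

    kind is "subjects" or "authors".
    """
    if len(key_hash) != HASH_LEN:
        raise ValueError("hash must be 20 characters")
    return Path(list_dir) / "archive" / kind / key_hash[:2] / key_hash[2:]
```

For example, a thread line's subject hash can be passed to hash_to_path(dir, "subjects", ...) to locate the subject file, whose per-message lines in turn carry author hashes that resolve the same way under "authors".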
76 keeps track of the last message processed. Normally,
78 will process entries for messages from one above the contents of this file
79 up to and including the message number in
83 writes messages in a crash-proof manner when run in normal mode. When overriding
84 the normal message range with any of the options listed, the normal
86 of the output files is suppressed for efficiency. Should the computer crash
87 during this time the state of the indices is not defined. Use the
89 option in the (extremely rare) cases where this would be a problem.
92 Create a new index. This overrides
96 to start with the first message in the archive. Synonym for
100 does not remove files in the index. While it will overwrite/update old files
101 it will not remove files that are obsolete for other reasons.
105 Process entries starting with the message after the message listed in
109 Process messages from the archive section (set of 100 messages)
112 This is useful if you have removed part of the archive, as it will shorten
113 processing time and decrease memory use.
116 does not remove files in the index. While it will overwrite/update old files
117 it will not remove files that are obsolete for other reasons. The number of
118 messages per thread will be incorrect when use of the
122 switches leads to partial re-indexing of already indexed messages.
126 Do not change the starting message from the default
135 Sync files, except when one of the message range modifying options is
139 Process messages to message
141 instead of the last message in the archive. Again, files written are
142 corrected, but other files are not explicitly removed.
146 Process entries for messages up to the last message in the archive.
159 stores its linked lists in memory. On a 32-bit architecture, it uses
160 12 bytes per message, 28 bytes per thread (plus one copy of the subject),
161 and 20 bytes per author (plus one copy of the author from line).
163 In normal list use, it processes at most a few messages at a time,
164 but for initial processing of a large archive, considerable amounts of
165 memory may be used. Assuming
166 40 bytes for subject/from line, 5 messages per thread, 100,000 messages,
167 and 1000 authors, this is about 2.5 MB. For 1,000,000 messages it is about 25 MB.
169 Thus, for large archives, it may be useful to use the
171 switch to process the archive in multiple subsets, starting with e.g. the first
172 100,000, then the next, and so on.
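
The memory estimate and the batching advice above can be reproduced with a short calculation. The Python sketch below is illustrative only. It simply sums the per-message, per-thread and per-author costs quoted for a 32-bit architecture, so its results may differ somewhat from rounded figures. The function names are hypothetical, and it is an assumption that message numbers start at 1. How a range maps onto the actual command-line switches is not shown here.

```python
from typing import Iterator, Tuple


def estimate_bytes(messages: int,
                   msgs_per_thread: int = 5,
                   authors: int = 1000,
                   text_bytes: int = 40) -> int:
    """Rough memory use: 12 bytes per message, 28 per thread plus one copy
    of the subject, 20 per author plus one copy of the author from line."""
    threads = messages // msgs_per_thread
    per_message = 12 * messages
    per_thread = (28 + text_bytes) * threads
    per_author = (20 + text_bytes) * authors
    return per_message + per_thread + per_author


def batches(last: int, size: int = 100_000, first: int = 1) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (start, end) message ranges covering first..last."""
    start = first
    while start <= last:
        end = min(start + size - 1, last)
        yield start, end
        start = end + 1


if __name__ == "__main__":
    for n in (100_000, 1_000_000):
        print(n, "messages:", round(estimate_bytes(n) / 1e6, 2), "MB")
    print(list(batches(250_000)))
```

Processing in batches bounds the per-run memory by the batch size rather than the archive size, at the cost of running the indexer several times.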