Commit | Line | Data |
---|---|---|
f8beb284 MW |
1 | .TH ezmlm-archive 1 |
2 | .SH NAME | |
3 | ezmlm-archive \- create thread and author index for a mailing list archive | |
4 | .SH SYNOPSIS | |
5 | .B ezmlm-archive | |
6 | [ | |
7 | .B \-cCFsSTvV | |
8 | ][ | |
9 | .B \-f\fI msg1 | |
10 | ] | |
11 | [ | |
12 | .B \-t\fI msg2 | |
13 | ] | |
14 | .I dir | |
15 | .SH DESCRIPTION | |
16 | .B ezmlm-archive | |
17 | reads the index files from a message archive, and creates a subject index, a | |
18 | collection of subject files, and a collection of author files. These | |
19 | files are suitable as an index for WWW access to, and navigation through, | |
20 | a mailing list archive by | |
21 | .BR ezmlm-cgi(1) . | |
22 | ||
23 | The index files read are created by | |
24 | .B ezmlm-idx(1) | |
25 | on a per-list basis and by | |
26 | .B ezmlm-send(1) | |
27 | on a per-message basis for an indexed list. | |
28 | ||
29 | The output files created are: | |
30 | .TP | |
31 | .I dir\fB/archive/threads/yyyymm | |
32 | The thread index. It contains one line per subject, starting with the | |
33 | number of the first message with that subject within the set | |
34 | investigated, ``:'', a 20-character | |
25a55efe | 35 | subject hash, blank, ``[n]'' where ``n'' is the number of messages in the |
f8beb284 MW |
36 | thread, blank, and the subject. |
37 | The file ``yyyymm'' contains | |
38 | entries for all threads that have messages in the month ``yyyymm'' | |
39 | or that have messages both before and after that month. | |
40 | The subject hash is a key to the subject files; the message number is | |
41 | a key to the index file. | |
42 | The lines are in ascending order by message number when the index is | |
43 | created | |
44 | .I de novo | |
45 | on an existing archive. When the messages are added one-by-one as in normal | |
46 | archive operation, ``n'' is the number of messages in the thread | |
47 | .I for the particular month | |
48 | and the lines are ordered by latest message, i.e. the most recently extended thread | |
49 | is shown last. The message number accompanying a thread is | |
50 | always a message within the thread. It is the first in | |
51 | archives created | |
52 | on existing lists, and the last message in incrementally created archives. | |
53 | Use the corresponding subject index file to get a list of all | |
54 | messages in the thread in ascending order. | |
55 | .TP | |
56 | .I dir\fB/archive/subjects/xx/yyyyyyyyyyyyyyyyyy | |
57 | A subject file. The first line is the subject hash, a space, and the subject. | |
58 | This is followed by one line per message with this subject, in the format | |
59 | message number, ``:'', date (yyyymm), ``:'', | |
60 | author hash, blank, author from line. The lines are | |
61 | sorted by message number. The author hash is a key to the author files; | |
62 | the message number is a key to the index file. The file in the example | |
63 | would be for the subject hash ``xxyyyyyyyyyyyyyyyyyy''. | |
64 | .TP | |
65 | .I dir\fB/archive/authors/xx/yyyyyyyyyyyyyyyyyy | |
66 | An author file. The first line is the author hash, a space, and the author | |
67 | from line. | |
68 | This is followed by one line per message with this author, in the format | |
69 | message number, ``:'', date (yyyymm), ``:'', | |
70 | subject hash, blank, subject. The lines are | |
71 | sorted by message number. The subject hash is a key to the subject files; | |
72 | the message number is a key to the index file. The file in the example | |
73 | would be for the author hash ``xxyyyyyyyyyyyyyyyyyy''. | |
74 | ||
75 | .I dir\fB/archnum | |
76 | keeps track of the last message processed. Normally, | |
77 | .B ezmlm-archive | |
78 | will process entries for messages from one above the contents of this file | |
79 | up to and including the message number in | |
80 | .IR dir\fB/num . | |
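The line formats described above can be read back with a few lines of code. The following is a minimal sketch in Python, not part of ezmlm-idx: the names `parse_thread_line`, `parse_message_line`, and `hash_to_path` are hypothetical, the sample lines in the usage note are invented, and it assumes that each "blank" is a single space and that the hash is exactly 20 characters.

```python
from typing import NamedTuple

HASH_LEN = 20  # length of the subject/author hash, as described above


class ThreadEntry(NamedTuple):
    msgnum: int        # a message within the thread
    subject_hash: str  # key to the subject file
    count: int         # the "n" from "[n]"
    subject: str


class MessageEntry(NamedTuple):
    msgnum: int
    yyyymm: str
    key_hash: str  # author hash in a subject file, subject hash in an author file
    text: str      # author from line, or subject


def parse_thread_line(line: str) -> ThreadEntry:
    """Parse one line of archive/threads/yyyymm (assumed single-space blanks)."""
    line = line.rstrip("\n")
    num, rest = line.split(":", 1)
    subject_hash = rest[:HASH_LEN]
    # After the hash: blank, "[n]", blank, subject.
    bracket, _, subject = rest[HASH_LEN + 1:].partition(" ")
    if len(subject_hash) != HASH_LEN or not (
        bracket.startswith("[") and bracket.endswith("]")
    ):
        raise ValueError(f"unexpected thread index line: {line!r}")
    return ThreadEntry(int(num), subject_hash, int(bracket[1:-1]), subject)


def parse_message_line(line: str) -> MessageEntry:
    """Parse a per-message line (not the first line) of a subject or author file."""
    line = line.rstrip("\n")
    num, yyyymm, rest = line.split(":", 2)
    return MessageEntry(int(num), yyyymm, rest[:HASH_LEN], rest[HASH_LEN + 1:])


def hash_to_path(kind: str, key_hash: str) -> str:
    """Map a hash to its file below dir; kind is 'subjects' or 'authors'."""
    return f"archive/{kind}/{key_hash[:2]}/{key_hash[2:]}"
```

For example, with an invented line, `parse_thread_line("123:abcdefghijklmnopqrst [5] Re: hello world")` yields message number 123, a count of 5, and the subject "Re: hello world"; `hash_to_path("subjects", "abcdefghijklmnopqrst")` gives `archive/subjects/ab/cdefghijklmnopqrst`, relative to the list directory.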
81 | .SH OPTIONS | |
82 | .B ezmlm-archive | |
83 | writes messages in a crash-proof manner when run in normal mode. When overriding | |
84 | the normal message range with any of the options listed, the normal | |
85 | .B sync(3) | |
86 | of the output files is suppressed for efficiency. Should the computer crash | |
87 | during this time the state of the indices is not defined. Use the | |
88 | .B \-s | |
89 | option in the (extremely rare) cases where this would be a problem. | |
90 | .TP | |
91 | .B \-c | |
92 | Create a new index. This overrides | |
93 | .I dir\fB/archnum | |
94 | causing | |
95 | .B ezmlm-archive | |
96 | to start with the first message in the archive. Synonym for | |
97 | .BR \-f\fI0 . | |
98 | .B NOTE: | |
99 | .B ezmlm-archive | |
100 | does not remove files in the index. While it will overwrite/update old files | |
101 | it will not remove files that are obsolete for other reasons. | |
102 | .TP | |
103 | .B \-C | |
104 | (Default.) | |
105 | Process entries starting with the message after the message listed in | |
106 | .IR dir\fB/archnum . | |
107 | .TP | |
108 | .B \-f\fI msg1 | |
109 | Process messages from the archive section (set of 100 messages) | |
110 | containing message | |
111 | .IR msg1 . | |
112 | This is useful if you have removed part of the archive, as it will shorten | |
113 | processing time and decrease memory use. | |
114 | .B NOTE: | |
115 | .B ezmlm-archive | |
116 | does not remove files in the index. While it will overwrite/update old files | |
117 | it will not remove files that are obsolete for other reasons. The number of | |
118 | messages per thread will be incorrect when use of the | |
119 | .B \-f | |
120 | and | |
121 | .B \-t | |
122 | switches leads to partial re-indexing of already indexed messages. | |
123 | .TP | |
124 | .B \-F | |
125 | (Default.) | |
126 | Do not change the starting message from the default | |
127 | (see | |
128 | .BR \-C ). | |
129 | .TP | |
130 | .B \-s | |
131 | Always sync files. | |
132 | .TP | |
133 | .B \-S | |
134 | (Default.) | |
135 | Sync files, except when one of the message range modifying options is | |
136 | used. | |
137 | .TP | |
138 | .B \-t\fI msg2 | |
139 | Process messages to message | |
140 | .I msg2 | |
141 | instead of the last message in the archive. Again, files written are | |
142 | corrected, but other files are not explicitly removed. | |
143 | .TP | |
144 | .B \-T | |
145 | (Default.) | |
146 | Process entries for messages up to the last message in the archive. | |
147 | .TP | |
148 | .B \-v | |
149 | Display | |
150 | .B ezmlm-archive | |
151 | version info. | |
152 | .TP | |
153 | .B \-V | |
154 | Display | |
155 | .B ezmlm-archive | |
156 | version info. | |
157 | .SH "MEMORY USAGE" | |
158 | .B ezmlm-archive | |
159 | stores its linked lists in memory. On a 32-bit architecture, it uses | |
160 | 12 bytes per message, 28 bytes per thread (plus one copy of the subject), | |
161 | and 20 bytes per author (plus one copy of the author from line). | |
162 | ||
163 | In normal list use, it processes at most a few messages at a time, | |
164 | but for initial processing of a large archive, considerable amounts of | |
165 | memory may be used. Assuming | |
166 | 40 bytes for subject/from line, 5 messages per thread, 100,000 messages, | |
167 | and 1000 authors, this is 2.5 MB. For 1,000,000 messages this is about 25 MB. | |
168 | ||
169 | Thus, for large archives, it may be useful to use the | |
170 | .I \-t | |
171 | switch to process the archive in multiple subsets, starting with e.g. the first | |
172 | 100,000, then the next, and so on. | |
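The estimate for the 100,000-message case can be reproduced from the per-item sizes given in this section. The following Python sketch is illustrative only: the function name `estimate_bytes` and its parameters are hypothetical, it counts only the structures listed here for a 32-bit architecture, and it ignores allocator overhead.

```python
def estimate_bytes(messages: int,
                   msgs_per_thread: int = 5,
                   authors: int = 1000,
                   text_len: int = 40) -> int:
    """Rough in-memory size of the lists, using the figures from this section."""
    threads = messages // msgs_per_thread
    per_message = 12 * messages               # 12 bytes per message
    per_thread = (28 + text_len) * threads    # 28 bytes plus one subject copy
    per_author = (20 + text_len) * authors    # 20 bytes plus one from-line copy
    return per_message + per_thread + per_author


if __name__ == "__main__":
    total = estimate_bytes(100_000)
    print(total, "bytes, about", round(total / 2**20, 1), "MiB")
```

With the assumptions stated above (40 bytes per subject or from line, 5 messages per thread, 1000 authors), 100,000 messages come to 2,620,000 bytes, which is the 2.5 MB quoted. Changing the `messages` argument gives a quick way to choose a subset size for the `-t` switch.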
173 | .SH "SEE ALSO" | |
174 | ezmlm-cgi(1), | |
175 | ezmlm-idx(1), | |
176 | ezmlm-send(1), | |
177 | ezmlm(5) | |
178 |