.TH ezmlm-archive 1
.SH NAME
ezmlm-archive \- create thread and author index for a mailing list archive
.SH SYNOPSIS
.B ezmlm-archive
[
.B \-cCFsSTvV
][
.B \-f\fI msg1
][
.B \-t\fI msg2
]
.I dir
.SH DESCRIPTION
.B ezmlm-archive
reads the index files from a message archive, and creates a subject index, a
collection of subject files, and a collection of author files. These
files are suitable as an index for WWW access to, and navigation through,
a mailing list archive by
.BR ezmlm-cgi(1) .
.PP
The index files read are created by
.B ezmlm-idx(1)
on a per-list basis and by
.B ezmlm-send(1)
on a per-message basis for an indexed list.
.PP
The output files created are:
.TP
.I dir\fB/archive/threads/yyyymm
The thread index. It contains one line per subject, starting with the
number of the first message with that subject within the set
investigated, ``:'', a 20-character
subject hash, blank, ``[n]'' where ``n'' is the number of messages in the
thread, blank, and the subject.
The file ``yyyymm'' contains
entries for all threads that have messages in the month ``yyyymm''
or that have messages both before and after that month.
The subject hash is a key to the subject files; the message number is
a key to the index file.
The lines are in ascending order by message number when the index is
created
.I de novo
on an existing archive. When the messages are added one-by-one as in normal
archive operation, ``n'' is the number of messages in the thread
.I for the particular month
and the lines are ordered by each thread's latest message, i.e. the most
recently extended thread is shown last. The message number accompanying a thread is
always a message within the thread. It is the first message in
archives created
on existing lists, and the last message in incrementally created archives.
Use the corresponding subject index file to get a list of all
messages in the thread in ascending order.
.TP
.I dir\fB/archive/subjects/xx/yyyyyyyyyyyyyyyyyy
A subject file. The first line is the subject hash, a space, and the subject.
This is followed by one line per message with this subject, in the format
message number, ``:'', date (yyyymm), ``:'',
author hash, blank, author from line. The lines are
sorted by message number. The author hash is a key to the author files;
the message number is a key to the index file. The file in the example
would be for the subject hash ``xxyyyyyyyyyyyyyyyyyy''.
.TP
.I dir\fB/archive/authors/xx/yyyyyyyyyyyyyyyyyy
An author file. The first line is the author hash, a space, and the author
from line.
This is followed by one line per message with this author, in the format
message number, ``:'', date (yyyymm), ``:'',
subject hash, blank, subject. The lines are
sorted by message number. The subject hash is a key to the subject files;
the message number is a key to the index file. The file in the example
would be for the author hash ``xxyyyyyyyyyyyyyyyyyy''.
.PP
.I dir\fB/archnum
keeps track of the last message processed. Normally,
.B ezmlm-archive
will process entries for messages from one above the contents of this file
up to and including the message number in
.IR dir\fB/num .
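.PP
As an illustration of the line formats described above, the following Python
sketch splits one thread index line into its fields and derives the path of the
matching subject file. The names
.IR ThreadEntry ,
.IR parse_thread_line ,
and
.I subject_file
are hypothetical helpers, not part of ezmlm, and the sample hash in the note
below is made up. The caller is assumed to pass the line without its trailing
newline.
.PP
.nf
```python
from typing import NamedTuple


class ThreadEntry(NamedTuple):
    msgnum: int        # message number, a key to the index file
    subject_hash: str  # 20-character key to the subject files
    count: int         # the "n" from "[n]"
    subject: str


def parse_thread_line(line: str) -> ThreadEntry:
    """Split one line of archive/threads/yyyymm into its fields."""
    # Layout per the description: msgnum ":" hash " " "[n]" " " subject
    num, rest = line.split(":", 1)
    subject_hash, count, subject = rest.split(" ", 2)
    if len(subject_hash) != 20:
        raise ValueError("expected a 20-character subject hash")
    if not (count.startswith("[") and count.endswith("]")):
        raise ValueError("expected a bracketed message count")
    return ThreadEntry(int(num), subject_hash, int(count[1:-1]), subject)


def subject_file(list_dir: str, subject_hash: str) -> str:
    """Path of the subject file: first two hash characters name the directory."""
    return f"{list_dir}/archive/subjects/{subject_hash[:2]}/{subject_hash[2:]}"
```
.fi
.PP
With these helpers, the line ``123:abcdefghijklmnopqrst [4] Re: hello'' would
give message number 123, a count of 4, and the subject file
.IR dir/archive/subjects/ab/cdefghijklmnopqrst .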
.SH OPTIONS
.B ezmlm-archive
writes messages in a crash-proof manner when run in normal mode. When overriding
the normal message range with any of the options listed, the normal
.B sync(3)
of the output files is suppressed for efficiency. Should the computer crash
during this time the state of the indices is not defined. Use the
.B \-s
option in the (extremely rare) cases where this would be a problem.
.TP
.B \-c
Create a new index. This overrides
.I dir\fB/archnum
causing
.B ezmlm-archive
to start with the first message in the archive. Synonym for
.BR \-f\fI0 .
.B NOTE:
.B ezmlm-archive
does not remove files in the index. While it will overwrite/update old files
it will not remove files that are obsolete for other reasons.
.TP
.B \-C
(Default.)
Process entries starting with the message after the message listed in
.IR dir\fB/archnum .
.TP
.B \-f\fI msg1
Process messages from the archive section (set of 100 messages)
containing message
.IR msg1 .
This is useful if you have removed part of the archive, as it will shorten
processing time and decrease memory use.
.B NOTE:
.B ezmlm-archive
does not remove files in the index. While it will overwrite/update old files
it will not remove files that are obsolete for other reasons. The number of
messages per thread will be incorrect when use of the
.B \-f
and
.B \-t
switches leads to partial re-indexing of already indexed messages.
.TP
.B \-F
(Default.)
Do not change the starting message from the default
(see
.BR \-C ).
.TP
.B \-s
Always sync files.
.TP
.B \-S
(Default.)
Sync files, except when one of the message range modifying options is
used.
.TP
.B \-t\fI msg2
Process messages to message
.I msg2
instead of the last message in the archive. Again, files written are
corrected, but other files are not explicitly removed.
.TP
.B \-T
(Default.)
Process entries for messages up to the last message in the archive.
.TP
.B \-v
Display
.B ezmlm-archive
version info.
.TP
.B \-V
Display
.B ezmlm-archive
version info.
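.PP
To illustrate how
.B \-f
rounds its argument down to a section boundary, here is a minimal Python
sketch. It assumes that sections are aligned on multiples of 100, so that
section k holds messages 100*k through 100*k+99; the name
.I section_start
is hypothetical and the exact boundary handling inside
.B ezmlm-archive
is not confirmed by this page.
.PP
.nf
```python
SECTION_SIZE = 100  # messages per archive section, per the -f description


def section_start(msg1: int) -> int:
    """First message number of the section assumed to contain msg1."""
    if msg1 < 0:
        raise ValueError("message numbers are non-negative")
    # Integer division drops the position within the section.
    return (msg1 // SECTION_SIZE) * SECTION_SIZE
```
.fi
.PP
Under this assumption, a start message of 1234 would begin processing at
message 1200, and 0 maps to 0, consistent with
.B \-c
being a synonym for
.BR \-f\fI0 .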
.SH "MEMORY USAGE"
.B ezmlm-archive
stores its linked lists in memory. On a 32-bit architecture, it uses
12 bytes per message, 28 bytes per thread (plus one copy of the subject),
and 20 bytes per author (plus one copy of the author from line).
.PP
In normal list use, it processes at most a few messages at a time,
but for initial processing of a large archive, considerable amounts of
memory may be used. Assuming
40 bytes for subject/from line, 5 messages per thread, 100,000 messages,
and 1000 authors, this is 2.5 MB. For 1,000,000 messages this is about 20 MB.
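.PP
The arithmetic behind the first figure can be sketched in Python as follows.
The function name
.I estimate_bytes
and its parameter names are hypothetical; the per-item sizes are the 32-bit
figures quoted above, and the model is a simple linear estimate that ignores
allocator overhead, so it is only an order-of-magnitude guide.
.PP
.nf
```python
def estimate_bytes(messages: int, authors: int,
                   msgs_per_thread: int = 5, text_len: int = 40) -> int:
    """Rough in-memory size of the lists, using the 32-bit figures above."""
    threads = messages // msgs_per_thread
    per_message = 12            # bytes per message
    per_thread = 28 + text_len  # thread record plus one copy of the subject
    per_author = 20 + text_len  # author record plus one copy of the from line
    return per_message * messages + per_thread * threads + per_author * authors
```
.fi
.PP
For 100,000 messages and 1000 authors this gives 2,620,000 bytes, which is
about 2.5 MB when a megabyte is taken as 1,048,576 bytes.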
.PP
Thus, for large archives, it may be useful to use the
.B \-t
switch to process the archive in multiple subsets, starting with e.g. the first
100,000, then the next, and so on.
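.PP
One way to plan such a batched run is sketched below in Python. The function
.I batch_commands
is hypothetical and only builds the argument lists; it does not run anything.
The sketch assumes that each run leaves
.I dir\fB/archnum
at the last message it processed, so that the next run continues from there as
described under DESCRIPTION.
.PP
.nf
```python
from typing import List


def batch_commands(list_dir: str, last_msg: int,
                   batch: int = 100_000) -> List[List[str]]:
    """Build ezmlm-archive invocations that index an archive in subsets."""
    commands = []
    upper = batch
    while upper < last_msg:
        # Stop this run at message number "upper" using the -t switch.
        commands.append(["ezmlm-archive", "-t", str(upper), list_dir])
        upper += batch
    # Final run without -t: the default processes up to the last message.
    commands.append(["ezmlm-archive", list_dir])
    return commands
```
.fi
.PP
For an archive of 250,000 messages this yields three invocations: one with
.BR "\-t 100000" ,
one with
.BR "\-t 200000" ,
and a final one with no upper limit.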
.SH "SEE ALSO"
ezmlm-cgi(1),
ezmlm-idx(1),
ezmlm-send(1),
ezmlm(5)