Import ezmlm-idx 0.40
[ezmlm] / sub_mysql / README
1 $Id: README,v 1.3 1999/02/20 20:05:19 lindberg Exp $
2 $Name: ezmlm-idx-040 $
3 INFORMATION ON BUILDING/USING EZMLM WITH MYSQL SUPPORT
4
5 (c) 1999, Frederik Lindberg,
6 lindberg@id.wustl.edu
7 You may use under GPL.
8
9 For information on MySQL, see http://www.tcx.se.
10
11 Most of this information is available in FAQ.idx.
12
13 If you are interested in contributing/testing a subscriber db interface
14 for another SQL server, please see sub_std/README and the routines here,
15 and contact lindberg@id.wustl.edu (it may already be in process). See end
16 of this file for other ways to contribute.
17
18 conf-mysql must be edited to reflect your system. On many systems, you
19 also need to include ``-lsocket'', as well as change the paths to the
20 /usr/local equivalents. For the i386.rpm-based systems, you need at
21 least MySQL-devel to build the files. Look at your mysql docs for more info.
22
23 TABLES USED FOR (My)SQL SUPPORT
24
25 The basic philosophy is that the database can be on any host (if you use
26 SENDER restrictions, connectivity to the main host is more important than
27 to the sublists), and you choose the database and "table root" names. The
28 default database is ``ezmlm'' and the default table root is ``list''. Each
29 list has a separate table root. Any number of lists can share a database.
30
31 The main list address table is named with the table root only, others have
32 that name with various suffixes. In the following ``list'' is used as the
33 table root.
34
35 ADDRESS TABLES
36 list subscriber addresses
37 list_slog subscriber address log
38 list_allow subscriber aliases for posts on SENDER checked lists.
39 list_allow_slog subscriber log for list_allow
40 list_deny blacklisted addresses for posts on SENDER checked lists.
41 list_deny_slog log for list_deny.
42 list_mod moderator addresses
43 list_mod_slog log for list_mod
44 list_digest subscriber log for digest list.
45 list_digest_slog log for list_digest
46
47 MESSAGE LOGGING TABLES
48 list_cookie message cookie table for main list
49 list_mlog message logging table for main list
50 list_digest_cookie message cookie table for digest list
51 list_digest_mlog message logging table for digest list
52
53 SUBLIST SPLIT TABLES
54 list_name sublist split table for main list
55 list_digest_name sublist split table for digest list.
56
57
58 ezmlm-mktab(1) is a script that outputs the table definintions. Look at
59 the output for a detailed field description.
60
61 The address tables contain (address,domain,hash,h,num). For normal
62 lists only the address field is used. For main->sublist clusters, the other
63 fields are used for load splitting. The domain is the first up to 3 characters
64 of the last part of the domain name. The hash is a address hash [0-52] differnt
65 from the one used by ezmlm for splitting within DIR/subscribers. When using
66 the address field as a primary key, the size of the index was unreasonable.
67 Therefore, ``num'' is used as a dummy primary key, and ``h'' (a 32 bit hash
68 of the address) is used as an index. This markedly speeds up (un)sub with
69 large (>30,000 rows) subscriber tables.
70
71 The *_slog tables contain the same info as DIR/Log, i.e. address, timestamp,
72 entry-type, entry-direction, and fromline. The entry-type is the first letter
73 of the type of entry (probe, manual, `` '' for normal), entry-direction is
74 ``+'' for addition, ``-'' for removal. Fromline is the From: header contents
75 taken from the subscribe confirm message or from ezmlm-sub (if used with -n).
76 It is blank for all address removals, and may be blank also for additions. It
77 is used by the list-log.xx command. It is trivial to JOIN this table with the
78 address table to get e.g. subsciber names, subscription dates, etc. These
79 tables also have the 32-bit hash ``h'' as an index. Joins should be done on
80 ``h'' as well as ``address'' for better performance.
81
82 The *_cookie tables contain message number, timestamp, and cookie. For each
83 message a pseudo-random cookie is generated that is ``impossible'' to guess
84 beforehand. For lists with sublists, this is used as basic authentication,
85 i.e. the sublist will refuse to process a message that doesn't contain the
86 correct cookie or that the sublist has already successfully processed.
87
88 The *_mlog tables contain log entries from main and sublists. These are
89 timestamp, listno, done. Listno is the lowest listnumber for an active list
90 entry with the name of this sublist as looked up in the *_name table. Done
91 is -1 for bounce, 0 for arrived, 1 for finished processing, and 2 for receipt
92 received. The routines are set up so that only the first attempt for each
93 combination (listno,code) is logged.
94
95 The *_name tables contain listno,name,domain,hash_lo,hash_hi,msgnum_lo,
96 msgnum_hi,notuse. Listno is auto_increment and unique. Name is the name of the
97 sublist. domain is the last up to 3 characters of the top domain name for
98 addresses served by this list (default = ''). It is is '', the list servers
99 all_domains_that_are_not_served_by_another list (in addition to domain '').
100 Of the addresses that match the domain criterion, the list serves the subset
101 with hash between hash_lo and hash_hi (defaults 0, 52). Any entry is ingnored if
102 notuse != 0 OR the current message number is not between msgnum_lo and
103 msgnum_hi.
104
105 For normal lists that are not distributed (i.e. they are a single list),
106 entries in the *_name tables are not needed and logging is not very
107 relevant.
108
109 For most lists, the only addresses that are stored in the SQL database are
110 the subscribers of list and digest, and the ``allow'' aliases. It is NOT
111 normally advisable to store moderator addresses there, since they are
112 needed only at the main list and secrecy is more important. ``Deny'' addresses
113 are few and again only needed at the main list. ``Allow'' are put in the
114 SQL database when using the default ezmlmrc file only to make all relevant
115 addresses manipulatable via the SQL server. The other tables are created, in
116 case they are wanted (the cost for having them as empty table is zero). The
117 basedir/sql file is the decision point. If it exists, an SQL table is used;
118 if not a local ezmlm db is used.
119
120 CONTRIBUTIONS REQESTED
121
122 I would be very grateful if there are users out there willing to do any of
123 the following and contribute it to this package. Please check with me first
124 (lindberg@id.wustl.edu), as the project may already be in progress/done.
125
126 1. Interfaces for other SQL servers. Oracle, SyBase, ...
127
128 2. A GUI admin utility to add/remove/manipulate the sublist split, essentially
129 by modifying list_[digest_]name in a safe way. Ideally WWW if it can be
130 done securely. If you use some standard interface (JDBC/DBD) it would be
131 useful also with other SQL severs. This could even be an Access program
132 using ODBC, although writing it for a platform running qmail/ezmlm makes
133 most sense.
134
135 3. a WWW GUI that allows users to subscribe/unsubscribe in a safe way. A random
136 password would be created the first time and stored in a new address->pw
137 table and mailed to the subscriber address. With that password, the user
138 would be able to [un]subscribe to lists, edit the name (for compatibility
139 implemented by adding a subscribe line to list_[digest]slog). Add/remove
140 aliases. Ideally, it should also allow searching by subscriber name. This
141 would search *_slog.fromline. If less that 'x' alternatives are found, the
142 user would be presented with names (not addresses), allowing the user to
143 cause the subscription name and password to be sent to the respective
144 subscription address. With that info, the subscriber can then unsubscribe,
145 even if s/he has forgotten the subscription address. It is complicated
146 slightly by the fact that ``fromline'' is the crude line and needs to be
147 rfc822 parsed. Again, use of a standard interface is encouraged to make it
148 compatible also with other SQL servers.
149
150 The aim of all this is to make it easy to use ezmlm to run very large lists,
151 easy to set up sites that handle subscriber interaction, archive access, etc,
152 and hopefully easier to integrate many ezmlm as done by some WWW sites today.
153