Import ezmlm-idx 0.40
[ezmlm] / sub_pgsql / README
1 $Id: README,v 1.1 1999/08/21 02:00:37 lindberg Exp $
2 $Name: ezmlm-idx-040 $
3 INFORMATION ON BUILDING/USING EZMLM WITH POSTGRESQL SUPPORT
4
5 Original source:
6 (c) 1999, Frederik Lindberg,
7 lindberg@id.wustl.edu
8 You may use under GPL.
9
10 and for PostgreSQL modifications:
11 (c) 1999, Magnus Stålåker
12 stalaker@umc.se
13
14 For information on PostgreSQL, see http://www.postgresql.org/
15
16 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
17 NOTICE! This is a untested beta! USE WITH CAUTION!
18 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
19
20 This version of the PostgresSQL supports the basic subscriber address
21 database and subscription logging. The log that is searched by the -log.
22 command is the local listdir/Log file, NOT the SQL database. Message
23 logging is not supported, and central admin of list clusters is not
24 supported. Still, the current support covers the functionality used in
25 99% of lists.
26
27 Most of this information is available in FAQ.idx.
28
29 If you are interested in contributing/testing a subscriber db interface
30 for another SQL server, please see sub_std/README and the routines here,
31 and contact lindberg@id.wustl.edu (it may already be in process). See end
32 of this file for other ways to contribute.
33
34 conf-mysql must be edited to reflect your system. On many systems, you
35 also need to include ``-lsocket'', as well as change the paths to the
36 /usr/local equivalents. For the i386.rpm-based systems, you need at
37 least MySQL-devel to build the files. Look at your mysql docs for more info.
38
39 TABLES USED FOR (My)SQL SUPPORT
40
41 The basic philosophy is that the database can be on any host (if you use
42 SENDER restrictions, connectivity to the main host is more important than
43 to the sublists), and you choose the database and "table root" names. The
44 default database is ``ezmlm'' and the default table root is ``list''. Each
45 list has a separate table root. Any number of lists can share a database.
46
47 The main list address table is named with the table root only, others have
48 that name with various suffixes. In the following ``list'' is used as the
49 table root.
50
51 ADDRESS TABLES
52 list subscriber addresses
53 list_slog subscriber address log
54 list_allow subscriber aliases for posts on SENDER checked lists.
55 list_allow_slog subscriber log for list_allow
56 list_deny blacklisted addresses for posts on SENDER checked lists.
57 list_deny_slog log for list_deny.
58 list_mod moderator addresses
59 list_mod_slog log for list_mod
60 list_digest subscriber log for digest list.
61 list_digest_slog log for list_digest
62
63 MESSAGE LOGGING TABLES (not supported by this version of the interface)
64 list_cookie message cookie table for main list
65 list_mlog message logging table for main list
66 list_digest_cookie message cookie table for digest list
67 list_digest_mlog message logging table for digest list
68
69 SUBLIST SPLIT TABLES (not supported by this version of the interface)
70 list_name sublist split table for main list
71 list_digest_name sublist split table for digest list.
72
73
74 ezmlm-mktab(1) is a script that outputs the table definintions. Look at
75 the output for a detailed field description.
76
77 The address tables contain (address,domain,hash,h,num). For normal
78 lists only the address field is used. For main->sublist clusters, the other
79 fields are used for load splitting. The domain is the first up to 3 characters
80 of the last part of the domain name. The hash is a address hash [0-52] differnt
81 from the one used by ezmlm for splitting within DIR/subscribers. When using
82 the address field as a primary key, the size of the index was unreasonable.
83 Therefore, ``num'' is used as a dummy primary key, and ``h'' (a 32 bit hash
84 of the address) is used as an index. This markedly speeds up (un)sub with
85 large (>30,000 rows) subscriber tables.
86
87 The *_slog tables contain the same info as DIR/Log, i.e. address, timestamp,
88 entry-type, entry-direction, and fromline. The entry-type is the first letter
89 of the type of entry (probe, manual, `` '' for normal), entry-direction is
90 ``+'' for addition, ``-'' for removal. Fromline is the From: header contents
91 taken from the subscribe confirm message or from ezmlm-sub (if used with -n).
92 It is blank for all address removals, and may be blank also for additions. It
93 is used by the list-log.xx command. It is trivial to JOIN this table with the
94 address table to get e.g. subsciber names, subscription dates, etc. These
95 tables also have the 32-bit hash ``h'' as an index. Joins should be done on
96 ``h'' as well as ``address'' for better performance.
97
98 The *_cookie tables contain message number, timestamp, and cookie. For each
99 message a pseudo-random cookie is generated that is ``impossible'' to guess
100 beforehand. For lists with sublists, this is used as basic authentication,
101 i.e. the sublist will refuse to process a message that doesn't contain the
102 correct cookie or that the sublist has already successfully processed.
103
104 The *_mlog tables contain log entries from main and sublists. These are
105 timestamp, listno, done. Listno is the lowest listnumber for an active list
106 entry with the name of this sublist as looked up in the *_name table. Done
107 is -1 for bounce, 0 for arrived, 1 for finished processing, and 2 for receipt
108 received. The routines are set up so that only the first attempt for each
109 combination (listno,code) is logged.
110
111 The *_name tables contain listno,name,domain,hash_lo,hash_hi,msgnum_lo,
112 msgnum_hi,notuse. Listno is auto_increment and unique. Name is the name of the
113 sublist. domain is the last up to 3 characters of the top domain name for
114 addresses served by this list (default = ''). It is is '', the list servers
115 all_domains_that_are_not_served_by_another list (in addition to domain '').
116 Of the addresses that match the domain criterion, the list serves the subset
117 with hash between hash_lo and hash_hi (defaults 0, 52). Any entry is ingnored if
118 notuse != 0 OR the current message number is not between msgnum_lo and
119 msgnum_hi.
120
121 For normal lists that are not distributed (i.e. they are a single list),
122 entries in the *_name tables are not needed and logging is not very
123 relevant.
124
125 For most lists, the only addresses that are stored in the SQL database are
126 the subscribers of list and digest, and the ``allow'' aliases. It is NOT
127 normally advisable to store moderator addresses there, since they are
128 needed only at the main list and secrecy is more important. ``Deny'' addresses
129 are few and again only needed at the main list. ``Allow'' are put in the
130 SQL database when using the default ezmlmrc file only to make all relevant
131 addresses manipulatable via the SQL server. The other tables are created, in
132 case they are wanted (the cost for having them as empty table is zero). The
133 basedir/sql file is the decision point. If it exists, an SQL table is used;
134 if not a local ezmlm db is used.
135
136 CONTRIBUTIONS REQESTED
137
138 I would be very grateful if there are users out there willing to do any of
139 the following and contribute it to this package. Please check with me first
140 (lindberg@id.wustl.edu), as the project may already be in progress/done.
141
142 1. Interfaces for other SQL servers. Oracle, SyBase, ...
143
144 2. A GUI admin utility to add/remove/manipulate the sublist split, essentially
145 by modifying list_[digest_]name in a safe way. Ideally WWW if it can be
146 done securely. If you use some standard interface (JDBC/DBD) it would be
147 useful also with other SQL severs. This could even be an Access program
148 using ODBC, although writing it for a platform running qmail/ezmlm makes
149 most sense.
150
151 3. a WWW GUI that allows users to subscribe/unsubscribe in a safe way. A random
152 password would be created the first time and stored in a new address->pw
153 table and mailed to the subscriber address. With that password, the user
154 would be able to [un]subscribe to lists, edit the name (for compatibility
155 implemented by adding a subscribe line to list_[digest]slog). Add/remove
156 aliases. Ideally, it should also allow searching by subscriber name. This
157 would search *_slog.fromline. If less that 'x' alternatives are found, the
158 user would be presented with names (not addresses), allowing the user to
159 cause the subscription name and password to be sent to the respective
160 subscription address. With that info, the subscriber can then unsubscribe,
161 even if s/he has forgotten the subscription address. It is complicated
162 slightly by the fact that ``fromline'' is the crude line and needs to be
163 rfc822 parsed. Again, use of a standard interface is encouraged to make it
164 compatible also with other SQL servers.
165
166 The aim of all this is to make it easy to use ezmlm to run very large lists,
167 easy to set up sites that handle subscriber interaction, archive access, etc,
168 and hopefully easier to integrate many ezmlm as done by some WWW sites today.
169