| 1 | $Id: README,v 1.3 1999/02/20 20:05:19 lindberg Exp $ |
| 2 | $Name: ezmlm-idx-040 $ |
| 3 | INFORMATION ON BUILDING/USING EZMLM WITH MYSQL SUPPORT |
| 4 | |
| 5 | (c) 1999, Frederik Lindberg, |
| 6 | lindberg@id.wustl.edu |
| 7 | You may use under GPL. |
| 8 | |
| 9 | For information on MySQL, see http://www.tcx.se. |
| 10 | |
| 11 | Most of this information is available in FAQ.idx. |
| 12 | |
| 13 | If you are interested in contributing/testing a subscriber db interface |
| 14 | for another SQL server, please see sub_std/README and the routines here, |
| 15 | and contact lindberg@id.wustl.edu (it may already be in process). See end |
| 16 | of this file for other ways to contribute. |
| 17 | |
| 18 | conf-mysql must be edited to reflect your system. On many systems, you |
| 19 | also need to include ``-lsocket'', as well as change the paths to the |
| 20 | /usr/local equivalents. For the i386.rpm-based systems, you need at |
| 21 | least MySQL-devel to build the files. Look at your mysql docs for more info. |
| 22 | |
| 23 | TABLES USED FOR (My)SQL SUPPORT |
| 24 | |
| 25 | The basic philosophy is that the database can be on any host (if you use |
| 26 | SENDER restrictions, connectivity to the main host is more important than |
| 27 | to the sublists), and you choose the database and "table root" names. The |
| 28 | default database is ``ezmlm'' and the default table root is ``list''. Each |
| 29 | list has a separate table root. Any number of lists can share a database. |
| 30 | |
| 31 | The main list address table is named with the table root only, others have |
| 32 | that name with various suffixes. In the following ``list'' is used as the |
| 33 | table root. |
| 34 | |
| 35 | ADDRESS TABLES |
| 36 | list subscriber addresses |
| 37 | list_slog subscriber address log |
| 38 | list_allow subscriber aliases for posts on SENDER checked lists. |
| 39 | list_allow_slog subscriber log for list_allow |
| 40 | list_deny blacklisted addresses for posts on SENDER checked lists. |
| 41 | list_deny_slog log for list_deny. |
| 42 | list_mod moderator addresses |
| 43 | list_mod_slog log for list_mod |
| 44 | list_digest subscriber log for digest list. |
| 45 | list_digest_slog log for list_digest |
| 46 | |
| 47 | MESSAGE LOGGING TABLES |
| 48 | list_cookie message cookie table for main list |
| 49 | list_mlog message logging table for main list |
| 50 | list_digest_cookie message cookie table for digest list |
| 51 | list_digest_mlog message logging table for digest list |
| 52 | |
| 53 | SUBLIST SPLIT TABLES |
| 54 | list_name sublist split table for main list |
| 55 | list_digest_name sublist split table for digest list. |
| 56 | |
| 57 | |
| 58 | ezmlm-mktab(1) is a script that outputs the table definintions. Look at |
| 59 | the output for a detailed field description. |
| 60 | |
| 61 | The address tables contain (address,domain,hash,h,num). For normal |
| 62 | lists only the address field is used. For main->sublist clusters, the other |
| 63 | fields are used for load splitting. The domain is the first up to 3 characters |
| 64 | of the last part of the domain name. The hash is a address hash [0-52] differnt |
| 65 | from the one used by ezmlm for splitting within DIR/subscribers. When using |
| 66 | the address field as a primary key, the size of the index was unreasonable. |
| 67 | Therefore, ``num'' is used as a dummy primary key, and ``h'' (a 32 bit hash |
| 68 | of the address) is used as an index. This markedly speeds up (un)sub with |
| 69 | large (>30,000 rows) subscriber tables. |
| 70 | |
| 71 | The *_slog tables contain the same info as DIR/Log, i.e. address, timestamp, |
| 72 | entry-type, entry-direction, and fromline. The entry-type is the first letter |
| 73 | of the type of entry (probe, manual, `` '' for normal), entry-direction is |
| 74 | ``+'' for addition, ``-'' for removal. Fromline is the From: header contents |
| 75 | taken from the subscribe confirm message or from ezmlm-sub (if used with -n). |
| 76 | It is blank for all address removals, and may be blank also for additions. It |
| 77 | is used by the list-log.xx command. It is trivial to JOIN this table with the |
| 78 | address table to get e.g. subsciber names, subscription dates, etc. These |
| 79 | tables also have the 32-bit hash ``h'' as an index. Joins should be done on |
| 80 | ``h'' as well as ``address'' for better performance. |
| 81 | |
| 82 | The *_cookie tables contain message number, timestamp, and cookie. For each |
| 83 | message a pseudo-random cookie is generated that is ``impossible'' to guess |
| 84 | beforehand. For lists with sublists, this is used as basic authentication, |
| 85 | i.e. the sublist will refuse to process a message that doesn't contain the |
| 86 | correct cookie or that the sublist has already successfully processed. |
| 87 | |
| 88 | The *_mlog tables contain log entries from main and sublists. These are |
| 89 | timestamp, listno, done. Listno is the lowest listnumber for an active list |
| 90 | entry with the name of this sublist as looked up in the *_name table. Done |
| 91 | is -1 for bounce, 0 for arrived, 1 for finished processing, and 2 for receipt |
| 92 | received. The routines are set up so that only the first attempt for each |
| 93 | combination (listno,code) is logged. |
| 94 | |
| 95 | The *_name tables contain listno,name,domain,hash_lo,hash_hi,msgnum_lo, |
| 96 | msgnum_hi,notuse. Listno is auto_increment and unique. Name is the name of the |
| 97 | sublist. domain is the last up to 3 characters of the top domain name for |
| 98 | addresses served by this list (default = ''). It is is '', the list servers |
| 99 | all_domains_that_are_not_served_by_another list (in addition to domain ''). |
| 100 | Of the addresses that match the domain criterion, the list serves the subset |
| 101 | with hash between hash_lo and hash_hi (defaults 0, 52). Any entry is ingnored if |
| 102 | notuse != 0 OR the current message number is not between msgnum_lo and |
| 103 | msgnum_hi. |
| 104 | |
| 105 | For normal lists that are not distributed (i.e. they are a single list), |
| 106 | entries in the *_name tables are not needed and logging is not very |
| 107 | relevant. |
| 108 | |
| 109 | For most lists, the only addresses that are stored in the SQL database are |
| 110 | the subscribers of list and digest, and the ``allow'' aliases. It is NOT |
| 111 | normally advisable to store moderator addresses there, since they are |
| 112 | needed only at the main list and secrecy is more important. ``Deny'' addresses |
| 113 | are few and again only needed at the main list. ``Allow'' are put in the |
| 114 | SQL database when using the default ezmlmrc file only to make all relevant |
| 115 | addresses manipulatable via the SQL server. The other tables are created, in |
| 116 | case they are wanted (the cost for having them as empty table is zero). The |
| 117 | basedir/sql file is the decision point. If it exists, an SQL table is used; |
| 118 | if not a local ezmlm db is used. |
| 119 | |
| 120 | CONTRIBUTIONS REQESTED |
| 121 | |
| 122 | I would be very grateful if there are users out there willing to do any of |
| 123 | the following and contribute it to this package. Please check with me first |
| 124 | (lindberg@id.wustl.edu), as the project may already be in progress/done. |
| 125 | |
| 126 | 1. Interfaces for other SQL servers. Oracle, SyBase, ... |
| 127 | |
| 128 | 2. A GUI admin utility to add/remove/manipulate the sublist split, essentially |
| 129 | by modifying list_[digest_]name in a safe way. Ideally WWW if it can be |
| 130 | done securely. If you use some standard interface (JDBC/DBD) it would be |
| 131 | useful also with other SQL severs. This could even be an Access program |
| 132 | using ODBC, although writing it for a platform running qmail/ezmlm makes |
| 133 | most sense. |
| 134 | |
| 135 | 3. a WWW GUI that allows users to subscribe/unsubscribe in a safe way. A random |
| 136 | password would be created the first time and stored in a new address->pw |
| 137 | table and mailed to the subscriber address. With that password, the user |
| 138 | would be able to [un]subscribe to lists, edit the name (for compatibility |
| 139 | implemented by adding a subscribe line to list_[digest]slog). Add/remove |
| 140 | aliases. Ideally, it should also allow searching by subscriber name. This |
| 141 | would search *_slog.fromline. If less that 'x' alternatives are found, the |
| 142 | user would be presented with names (not addresses), allowing the user to |
| 143 | cause the subscription name and password to be sent to the respective |
| 144 | subscription address. With that info, the subscriber can then unsubscribe, |
| 145 | even if s/he has forgotten the subscription address. It is complicated |
| 146 | slightly by the fact that ``fromline'' is the crude line and needs to be |
| 147 | rfc822 parsed. Again, use of a standard interface is encouraged to make it |
| 148 | compatible also with other SQL servers. |
| 149 | |
| 150 | The aim of all this is to make it easy to use ezmlm to run very large lists, |
| 151 | easy to set up sites that handle subscriber interaction, archive access, etc, |
| 152 | and hopefully easier to integrate many ezmlm as done by some WWW sites today. |
| 153 | |