TODO list for agedu
===================

 - flexibility in the HTML report output mode: expose the internal
   mechanism for configuring the output filenames, and allow the
   user to request individual files with hyperlinks as if the other
   files existed. (In particular, functionality of this kind would
   enable other modes of use like the built-in --cgi mode, without
   me having to anticipate them in detail.)

 - non-ASCII character set support
    + could usefully apply to --title and also to file names
    + how do we determine the input charset? Via locale, presumably.
    + how do we do translation? Importing my charset library is one
      heavyweight option; alternatively, does the native C locale
      mechanism provide enough functionality to do the job by itself?
    + in HTML, we would need to decide on an _output_ character set,
      specify it in a <meta http-equiv> tag, and translate to it from
      the input locale
       - one option is to make the output charset the same as the
         input one, in which case all we need is to identify its name
         for the <meta> tag
       - the other option is to make the output charset UTF-8 always
         and translate to that from everything else
       - in the web server and CGI modes, it would probably be nicer
         to move that <meta> tag into a proper HTTP header
    + even in text mode we would want to parse the filenames in some
      fashion, due to the unhelpful corner case of Shift-JIS Windows
      (in which backslashes in the input string must be classified as
      path separators or the second byte of a two-byte character)
       - that's really painful, since it will impact string processing
         of filenames throughout the code
       - so perhaps a better approach would be to do locale processing
         of filenames at _scan_ time, and normalise to UTF-8 in both
         the index and dump files?
          + involves incrementing the version of the dump-file format
          + then paths given on the command line are translated
            quickly to UTF-8 before comparing them against index paths
          + and now the HTML output side becomes easy, though the text
            output involves translating back again
          + but what if the filenames aren't intended to be
            interpreted in any particular character set (old-style
            Unix semantics) or in a consistent one?
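
To make the Shift-JIS corner case above concrete, here is a rough
sketch in C of telling real path separators apart from trail bytes.
The function names are made up for illustration and are not part of
agedu. The lead-byte ranges (0x81-0x9F and 0xE0-0xFC) are the ones
commonly used for Windows code page 932, and the input is assumed to
be a well-formed NUL-terminated string.

```c
#include <stddef.h>

/* Nonzero if byte c can start a two-byte Shift-JIS (CP932) character. */
static int sjis_is_lead_byte(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/*
 * Hypothetical helper: return the index of the last backslash in
 * `path` that is a genuine path separator, or -1 if there is none.
 * A 0x5C byte that is the second half of a two-byte character is
 * skipped rather than treated as a separator.
 */
long sjis_last_separator(const char *path)
{
    long last = -1;
    size_t i = 0;

    while (path[i] != '\0') {
        unsigned char c = (unsigned char)path[i];

        if (sjis_is_lead_byte(c) && path[i + 1] != '\0') {
            i += 2;            /* skip both bytes; the second may be 0x5C */
            continue;
        }
        if (c == '\\')
            last = (long)i;
        i++;
    }
    return last;
}
```

A plain strrchr(path, '\\') would get this wrong for any name ending
in a character such as 0x95 0x5C, which is exactly why every piece of
filename string processing would be affected.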

 - we could still be using more of the information coming from
   autoconf. Our config.h is defining a whole bunch of HAVE_FOOs for
   particular functions (e.g. HAVE_INET_NTOA, HAVE_MEMCHR,
   HAVE_FNMATCH). We could usefully supply alternatives for some of
   these functions (e.g. cannibalise the PuTTY wildcard matcher for
   use in the absence of fnmatch, switch to vanilla truncate() in
   the absence of ftruncate); where we don't have alternative code,
   it would perhaps be polite to throw an error at configure time
   rather than allowing the subsequent build to fail.
    + however, I don't see anything here that looks very
      controversial; IIRC it's all in POSIX, for one thing. So more
      likely this should simply wait until somebody complains.

 - IPv6 support in the HTTP server
    * of course, Linux magic auth can still work in this context; we
      merely have to be prepared to open one of /proc/net/tcp or
      /proc/net/tcp6 as appropriate.
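
As a minimal sketch of "as appropriate", the choice of file could
hang off the address family of the accepted connection. The function
name proc_net_tcp_path is hypothetical; only the two /proc paths
come from the note above.

```c
#include <stddef.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: pick the /proc file listing TCP connections
 * for the given address family, or NULL if the family is not one we
 * know how to look up.
 */
const char *proc_net_tcp_path(int address_family)
{
    switch (address_family) {
      case AF_INET:
        return "/proc/net/tcp";
      case AF_INET6:
        return "/proc/net/tcp6";
      default:
        return NULL;
    }
}
```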

 - run-time configuration in the HTTP server
    * I think this probably works by having a configuration form, or
      a link pointing to one, somewhere on the report page. If you
      want to reconfigure anything, you fill in and submit the form;
      the web server receives HTTP GET with parameters and a
      referer, adjusts its internal configuration, and returns an
      HTTP redirect back to the referring page - which it then
      re-renders in accordance with the change.
    * All the same options should have their starting states
      configurable on the command line too.
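
The last step of that round trip, the redirect back to the referring
page, could be sketched as below. This is an illustration only:
format_redirect is a made-up name, the choice of a 302 status is an
assumption, and applying the submitted parameters to the server's
configuration is left out entirely.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write an HTTP redirect pointing back at
 * `referer` into buf. Returns the response length, or -1 if the
 * referer is unsafe or the buffer is too small.
 */
int format_redirect(char *buf, size_t buflen, const char *referer)
{
    int n;

    /* Refuse CR or LF so a hostile Referer cannot inject headers. */
    if (strpbrk(referer, "\r\n") != NULL)
        return -1;

    n = snprintf(buf, buflen,
                 "HTTP/1.1 302 Found\r\n"
                 "Location: %s\r\n"
                 "Content-Length: 0\r\n"
                 "\r\n", referer);
    if (n < 0 || (size_t)n >= buflen)
        return -1;              /* output was truncated */
    return n;
}
```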

 - curses-ish equivalent of the web output
    + try using xterm 256-colour mode. Can (n)curses handle that? If
      not, try doing it manually.
    + I think my current best idea is to bypass ncurses and go
      straight to terminfo: generate lines of attribute-interleaved
      text and display them, so we only really need the sequences
      "go here and display stuff", "scroll up", "scroll down".
    + Infrastructure work before doing any of this would be to split
      html.c into two: one part to prepare an abstract data
      structure describing an HTML-like report (in particular, all
      the index lookups, percentage calculation, vector arithmetic
      and line sorting), and another part to generate the literal
      HTML. Then the former can be reused to produce very similar
      reports in coloured plain text.
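
For the "doing it manually" option, a piece of attribute-interleaved
text might be produced along these lines. Note this sketch hardcodes
the xterm 256-colour background sequence (ESC [ 48 ; 5 ; N m) rather
than looking anything up in terminfo, so it is the manual approach
and not the terminfo one; format_coloured_cell is a hypothetical
name.

```c
#include <stdio.h>

/*
 * Hypothetical helper: wrap `text` in an xterm 256-colour background
 * attribute followed by a reset. Returns the number of bytes
 * written, or -1 on a bad colour index or a too-small buffer.
 */
int format_coloured_cell(char *buf, size_t buflen,
                         int colour, const char *text)
{
    int n;

    if (colour < 0 || colour > 255)
        return -1;

    n = snprintf(buf, buflen, "\033[48;5;%dm%s\033[0m", colour, text);
    if (n < 0 || (size_t)n >= buflen)
        return -1;              /* output was truncated */
    return n;
}
```

A report line would then be a concatenation of such cells, one per
age band, which is the coloured plain text analogue of the bars in
the HTML report.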

 - abstracting away all the Unix calls so as to enable a full
   Windows port. We can already do the difficult bit on Windows
   (scanning the filesystem and retrieving atime-analogues).
   Everything else is just coding - albeit quite a _lot_ of coding,
   since the Unix assumptions are woven quite tightly into the
   current code.
    + If nothing else, it's unclear what the user interface properly
      ought to be in a Windows port of agedu. A command-line job
      exactly like the Unix version might be useful to some people,
      but would certainly be strange and confusing to others.

 - it might conceivably be useful to support a choice of indexing
   strategies. The current "continuous index" mechanism's tradeoff of
   taking O(N log N) space in order to be able to support any age
   cutoff you like is not going to be ideal for everybody. A second
   more conventional "discrete index" mechanism which allows the
   user to specify a number of fixed cutoffs and just indexes each
   directory on those alone would undoubtedly be a useful thing for
   large-scale users. This will require considerable thought about
   how to make the indexers pluggable at both index-generation time
   and query time.
    * however, now we have the cut-down version of the continuous
      index, the space saving is less compelling.
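
To illustrate what a "discrete index" entry could hold, here is a
small sketch in which each directory keeps one running total per
fixed cutoff. The struct, the function name and the choice of three
cutoffs are all assumptions made for the example; nothing here
reflects the existing index format.

```c
#include <stddef.h>
#include <time.h>

#define NCUTOFFS 3              /* hypothetical number of fixed cutoffs */

/* Per-directory record: total size of files older than each cutoff. */
struct discrete_entry {
    unsigned long long size_older_than[NCUTOFFS];
};

/*
 * Account for one file in a directory's record. A file counts
 * towards every cutoff that its access time precedes.
 */
void discrete_add_file(struct discrete_entry *dir,
                       const time_t cutoffs[NCUTOFFS],
                       time_t atime, unsigned long long size)
{
    size_t i;

    for (i = 0; i < NCUTOFFS; i++) {
        if (atime < cutoffs[i])
            dir->size_older_than[i] += size;
    }
}
```

Space per directory is then fixed by the number of cutoffs, at the
price of only being able to answer queries at those cutoffs.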

 - A user requested what's essentially a VFS layer: given multiple
   index files and a map of how they fit into an overall namespace,
   we should be able to construct the right answers for any query
   about the resulting aggregated hierarchy by doing at most
   O(number of indexes * normal number of queries) work.
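
One ingredient of such a layer would be routing a query path to the
index file that covers it. The sketch below does that by longest
matching prefix. The struct and function names are invented for the
example, and prefixes are assumed to be written without a trailing
slash. Queries above a mount point, which need to combine several
indexes, are not covered here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical map entry: where an index file sits in the namespace. */
struct index_mount {
    const char *prefix;         /* e.g. "/home", no trailing slash */
    const char *index_file;
};

/*
 * Return the mount whose prefix is the longest one matching `path`
 * at a path-component boundary, or NULL if no mount covers it.
 */
const struct index_mount *find_mount(const struct index_mount *mounts,
                                     size_t nmounts, const char *path)
{
    const struct index_mount *best = NULL;
    size_t best_len = 0;
    size_t i;

    for (i = 0; i < nmounts; i++) {
        size_t len = strlen(mounts[i].prefix);

        if (strncmp(path, mounts[i].prefix, len) != 0)
            continue;
        /* "/home" must not match "/homework" */
        if (path[len] != '\0' && path[len] != '/')
            continue;
        if (best == NULL || len > best_len) {
            best = &mounts[i];
            best_len = len;
        }
    }
    return best;
}
```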

 - Support for filtering the scan by ownership and permissions. The
   index data structure can't handle this, so we can't build a
   single index file admitting multiple subset views; but a user
   suggested that the scan phase could record information about
   ownership and permissions in the dump file, and then the indexing
   phase could filter down to a particular sub-view - which would at
   least allow the construction of various subset indices from one
   dump file, without having to redo the full disk scan which is the
   most time-consuming part of all.
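
The filtering step at indexing time could be as simple as a
predicate applied to each dump record. This is a sketch under
assumptions: the dump file does not currently carry these fields, and
dump_record, scan_filter and filter_accepts are hypothetical names.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical per-file data the scan phase would have to record. */
struct dump_record {
    uid_t uid;
    mode_t mode;
    unsigned long long size;
};

/* Hypothetical sub-view selector chosen when building an index. */
struct scan_filter {
    int match_uid;              /* nonzero: only accept files owned by uid */
    uid_t uid;
    mode_t required_mode_bits;  /* all of these bits must be set */
};

/* Nonzero if the record belongs in the sub-view being indexed. */
int filter_accepts(const struct scan_filter *f,
                   const struct dump_record *r)
{
    if (f->match_uid && r->uid != f->uid)
        return 0;
    if ((r->mode & f->required_mode_bits) != f->required_mode_bits)
        return 0;
    return 1;
}
```

For example, a filter with match_uid set, uid 1000 and
required_mode_bits of S_IWOTH would index only world-writable files
owned by that user, all from the same dump file.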