TODO list for agedu
===================
-Before it's non-embarrassingly releasable:
+ - flexibility in the HTML report output mode: expose the internal
+ mechanism for configuring the output filenames, and allow the
+ user to request individual files with hyperlinks as if the other
+ files existed. (In particular, functionality of this kind would
+ enable other modes of use like the built-in --cgi mode, without
+ me having to anticipate them in detail.)
- - render HTTP access control more sane.
- * we should have the configurable option to use HTTP Basic
- authentication or Linux magic /proc/net/tcp
- * a third option, and the default one, should be to _try_ to use
- magic auth, and fall back to HTTP Basic if unavailable
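The try-magic-then-fall-back idea needs a way to map a localhost connection to a uid. A sketch of the /proc/net/tcp half (function name invented, not agedu's code): after getpeername() gives the client's port, scan each line for a matching local_address and read the uid column; if the file can't be opened (or the connection is IPv6, where /proc/net/tcp6 would be the file to try), fall back to HTTP Basic.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Given one line of /proc/net/tcp, return the owning uid if the
 * line's local_address port (hex, after the colon) equals `port`,
 * else -1.  Field order: sl local_address rem_address st
 * tx_queue:rx_queue tr:tm->when retrnsmt uid ... */
long uid_if_local_port(const char *line, unsigned port)
{
    char local[64];
    long uid;
    if (sscanf(line, " %*[0-9]: %63s %*s %*s %*s %*s %*s %ld",
               local, &uid) != 2)
        return -1;                        /* header line, or garbage */
    const char *colon = strrchr(local, ':');
    if (!colon)
        return -1;
    return strtoul(colon + 1, NULL, 16) == port ? uid : -1;
}
```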
+ - non-ASCII character set support
+ + could usefully apply to --title and also to file names
+ + how do we determine the input charset? Via locale, presumably.
+ + how do we do translation? Importing my charset library is one
+ heavyweight option; alternatively, does the native C locale
+ mechanism provide enough functionality to do the job by itself?
+ + in HTML, we would need to decide on an _output_ character set,
+ specify it in a <meta http-equiv> tag, and translate to it from
+ the input locale
+ - one option is to make the output charset the same as the
+ input one, in which case all we need is to identify its name
+ for the <meta> tag
+ - the other option is to make the output charset UTF-8 always
+ and translate to that from everything else
+ - in the web server and CGI modes, it would probably be nicer
+ to move that <meta> tag into a proper HTTP header
+ + even in text mode we would want to parse the filenames in some
+ fashion, due to the unhelpful corner case of Shift-JIS Windows
+ (in which backslashes in the input string must be classified as
+ path separators or the second byte of a two-byte character)
+ - that's really painful, since it will impact string processing
+ of filenames throughout the code
+ - so perhaps a better approach would be to do locale processing
+ of filenames at _scan_ time, and normalise to UTF-8 in both
+ the index and dump files?
+ + involves incrementing the version of the dump-file format
+ + then paths given on the command line are translated
+ quickly to UTF-8 before comparing them against index paths
+ + and now the HTML output side becomes easy, though the text
+ output involves translating back again
+ + but what if the filenames aren't intended to be
+ interpreted in any particular character set (old-style
+ Unix semantics) or in a consistent one?
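On the "does the native C locale mechanism provide enough functionality" question: POSIX iconv() plus nl_langinfo(CODESET) may well be enough, avoiding the heavyweight charset-library import. A hedged sketch (glibc assumed; on some platforms iconv needs -liconv):

```c
#include <assert.h>
#include <iconv.h>
#include <string.h>

/* Translate `in` (NUL-terminated, in `charset`) to UTF-8 in `out`.
 * In real use `charset` would come from nl_langinfo(CODESET) after
 * setlocale(LC_CTYPE, "").  Returns 0 on success, -1 on failure
 * (unknown charset, invalid input, or output too small). */
int to_utf8(const char *charset, const char *in, char *out, size_t outsz)
{
    iconv_t cd = iconv_open("UTF-8", charset);
    if (cd == (iconv_t)-1)
        return -1;
    char *inp = (char *)in, *outp = out;
    size_t inleft = strlen(in), outleft = outsz - 1;
    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;
    *outp = '\0';
    return 0;
}
```

Doing this once at scan time, as suggested above, would also confine the Shift-JIS backslash headache to a single place.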
- - sort out the command line syntax
- * I think there should be a unified --mode / -M for every
- running mode, possibly without the one-letter option for the
- diagnostic sorts of things
- * there should be some configurable options:
- + range limits on the age display
- + server address in httpd mode
+ - we could still be using more of the information coming from
+ autoconf. Our config.h is defining a whole bunch of HAVE_FOOs for
+ particular functions (e.g. HAVE_INET_NTOA, HAVE_MEMCHR,
+ HAVE_FNMATCH). We could usefully supply alternatives for some of
+ these functions (e.g. cannibalise the PuTTY wildcard matcher for
+ use in the absence of fnmatch, switch to vanilla truncate() in
+ the absence of ftruncate); where we don't have alternative code,
+ it would perhaps be polite to throw an error at configure time
+ rather than allowing the subsequent build to fail.
+ + however, I don't see anything here that looks very
+ controversial; IIRC it's all in POSIX, for one thing. So more
+ likely this should simply wait until somebody complains.
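If fnmatch does turn out to be worth substituting, the fallback could be a minimal matcher along these lines (an illustrative stand-in supporting only '*' and '?', not PuTTY's actual wildcard code):

```c
#include <assert.h>

/* Stand-in for fnmatch() where configure finds it missing
 * (!HAVE_FNMATCH): '*' matches any run of characters, '?' any one.
 * Returns 1 on match, 0 otherwise. */
#ifndef HAVE_FNMATCH
int wc_match(const char *pat, const char *str)
{
    while (*pat) {
        if (*pat == '*') {
            pat++;
            do {
                if (wc_match(pat, str))
                    return 1;
            } while (*str++);
            return 0;
        }
        if (!*str || (*pat != '?' && *pat != *str))
            return 0;
        pat++, str++;
    }
    return !*str;
}
#endif
```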
- - do some configurability for the disk scan
- * wildcard-based includes and excludes
- + wildcards can act on the last pathname component or the
- whole lot
- + include and exclude can be interleaved; implicit "include
- *" before any
- * reinstate filesystem crossing, though not doing so should
- remain the default
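The interleaved include/exclude semantics could be as simple as "last matching rule wins, with the implicit include-* first". A sketch using fnmatch (the last-component-vs-whole-path choice becomes a per-rule flag; all names invented):

```c
#include <assert.h>
#include <fnmatch.h>
#include <string.h>

struct rule {
    int include;          /* 1 = include, 0 = exclude */
    const char *pat;
    int whole_path;       /* pattern covers the whole pathname,
                             or just the last component */
};

/* Apply rules in order, with an implicit "include *" before any;
 * the last matching rule wins. */
int scanned(const struct rule *r, int n, const char *path)
{
    int verdict = 1;
    const char *base = strrchr(path, '/');
    base = base ? base + 1 : path;
    for (int i = 0; i < n; i++)
        if (fnmatch(r[i].pat, r[i].whole_path ? path : base, 0) == 0)
            verdict = r[i].include;
    return verdict;
}
```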
-
- - polish up disk-scan progress reporting
- * by default it should be conditional on isatty(2)
- * manual override to enable or disable
- * we should find rather than guessing the terminal width
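All three bullets are a few lines of POSIX: isatty() for the default, a tri-state override, and TIOCGWINSZ to ask for the real width rather than guessing. A sketch (invented names):

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Progress defaults to "on iff fd is a terminal"; override is -1
 * for that default, else a forced 0 or 1 from the command line. */
int progress_enabled(int fd, int override)
{
    return override >= 0 ? override : isatty(fd);
}

/* Ask the terminal for its width; fall back if fd has no window
 * size (not a tty at all, or a very odd one). */
int terminal_width(int fd, int fallback)
{
    struct winsize ws;
    if (ioctl(fd, TIOCGWINSZ, &ws) == 0 && ws.ws_col > 0)
        return ws.ws_col;
    return fallback;
}
```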
-
- - work out what to do about atimes on directories
- * one option is to read them during the scan and reinstate them
- after each recursion pop. Race-condition prone.
- * marking them in a distinctive colour in the reports is the
- other option.
-
- - make a final decision on the name!
-
-Future directions:
+ - IPv6 support in the HTTP server
+ * of course, Linux magic auth can still work in this context; we
+ merely have to be prepared to open one of /proc/net/tcp or
+ /proc/net/tcp6 as appropriate.
- run-time configuration in the HTTP server
* I think this probably works by having a configuration form, or
a link pointing to one, in the report page itself.
* All the same options should have their starting states
configurable on the command line too.
- - polish the plain-text output:
- + do the same formatting as in HTML, by showing files as a
- single unit and also sorting by size? (Probably the other way
- up, due to scrolling.)
- + configurable recursive output depth
-
- curses-ish equivalent of the web output
+ + try using xterm 256-colour mode. Can (n)curses handle that? If
+ not, try doing it manually.
+ + I think my current best idea is to bypass ncurses and go
+ straight to terminfo: generate lines of attribute-interleaved
+ text and display them, so we only really need the sequences
+ "go here and display stuff", "scroll up", "scroll down".
+ + Infrastructure work before doing any of this would be to split
+ html.c into two: one part to prepare an abstract data
+ structure describing an HTML-like report (in particular, all
+ the index lookups, percentage calculation, vector arithmetic
+ and line sorting), and another part to generate the literal
+ HTML. Then the former can be reused to produce very similar
+ reports in coloured plain text.
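The proposed split might look like this in miniature (all names invented): a build step doing the percentage arithmetic and sorting into an abstract structure, then interchangeable render steps, of which only the plain-text one is shown.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

struct report_entry {
    const char *name;
    unsigned long long size;
    double percent;                /* filled in by build_report() */
};

static int by_size_desc(const void *a, const void *b)
{
    const struct report_entry *x = a, *y = b;
    return (y->size > x->size) - (y->size < x->size);
}

/* The "abstract report" half: percentage calculation and sorting,
 * shared by every output mode. */
void build_report(struct report_entry *e, int n)
{
    unsigned long long total = 0;
    for (int i = 0; i < n; i++)
        total += e[i].size;
    for (int i = 0; i < n; i++)
        e[i].percent = total ? 100.0 * e[i].size / total : 0.0;
    qsort(e, n, sizeof *e, by_size_desc);
}

/* One renderer; html.c's literal-HTML half would be another. */
void render_text(const struct report_entry *e, int n, FILE *fp)
{
    for (int i = 0; i < n; i++)
        fprintf(fp, "%6.2f%% %12llu %s\n",
                e[i].percent, e[i].size, e[i].name);
}
```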
+
+ - abstracting away all the Unix calls so as to enable a full
+ Windows port. We can already do the difficult bit on Windows
+ (scanning the filesystem and retrieving atime-analogues).
+ Everything else is just coding - albeit quite a _lot_ of coding,
+ since the Unix assumptions are woven quite tightly into the
+ current code.
+ + If nothing else, it's unclear what the user interface properly
+ ought to be in a Windows port of agedu. A command-line job
+ exactly like the Unix version might be useful to some people,
+ but would certainly be strange and confusing to others.
- - cross-module:
- + figure out what to do about scans starting in the root
- directory!
- * Currently we end up with a double leading slash on the
- pathnames, which is ugly, and we also get a zero-length
- href in between those slashes which means the web interface
- doesn't let you click back up to the top level at all.
- * One big problem here is that a lot of the code assumes that
- you can find the extent of a pathname by searching for
- "foo" and "foo^A", trusting that anything inside the
- directory will begin "foo/". So I'd need to consistently
- fix this everywhere so that a trailing slash is disregarded
- while doing it, but not actually removed.
- * The text output gets it all wrong.
- * The HTML output is fiddly even at the design stage: where
- would I _ideally_ put the link to click on to get back to
- /? It's unclear!
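The "disregarded while comparing, but not actually removed" test might look like this (hypothetical helper, not code from agedu):

```c
#include <assert.h>
#include <string.h>

/* Is `path` equal to, or inside, directory `dir`?  Any trailing
 * slashes on dir (including dir == "/") are ignored for the
 * comparison but never stripped from the string itself. */
int path_inside(const char *dir, const char *path)
{
    size_t n = strlen(dir);
    while (n > 0 && dir[n - 1] == '/')
        n--;
    return strncmp(path, dir, n) == 0 &&
           (path[n] == '\0' || path[n] == '/');
}
```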
+ - it might conceivably be useful to support a choice of indexing
+ strategies. The current "continuous index" mechanism's tradeoff of
+ taking O(N log N) space in order to be able to support any age
+ cutoff you like is not going to be ideal for everybody. A second
+ more conventional "discrete index" mechanism which allows the
+ user to specify a number of fixed cutoffs and just indexes each
+ directory on those alone would undoubtedly be a useful thing for
+ large-scale users. This will require considerable thought about
+ how to make the indexers pluggable at both index-generation time
+ and query time.
+ * however, now we have the cut-down version of the continuous
+ index, the space saving is less compelling.
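Pluggability could amount to a small vtable chosen at index-generation time, recorded in the file, and consulted again at query time. A toy sketch (everything here hypothetical), with a linear scan standing in for a real strategy:

```c
#include <assert.h>
#include <string.h>
#include <time.h>

/* The pluggable interface: one vtable per indexing strategy. */
struct indexer {
    const char *name;
    void (*add)(void *ctx, const char *path,
                unsigned long long size, time_t atime);
    unsigned long long (*query)(void *ctx, const char *path,
                                time_t cutoff);
};

/* Toy strategy: keep every record and scan linearly at query time.
 * A real "discrete index" would precompute totals per cutoff. */
#define MAXREC 64
struct linear_ctx {
    int n;
    struct {
        char path[64];
        unsigned long long size;
        time_t atime;
    } rec[MAXREC];
};

static void linear_add(void *vctx, const char *path,
                       unsigned long long size, time_t atime)
{
    struct linear_ctx *c = vctx;
    if (c->n == MAXREC)
        return;
    strncpy(c->rec[c->n].path, path, sizeof c->rec[c->n].path - 1);
    c->rec[c->n].path[sizeof c->rec[c->n].path - 1] = '\0';
    c->rec[c->n].size = size;
    c->rec[c->n].atime = atime;
    c->n++;
}

static unsigned long long linear_query(void *vctx, const char *path,
                                       time_t cutoff)
{
    struct linear_ctx *c = vctx;
    unsigned long long total = 0;
    size_t len = strlen(path);
    for (int i = 0; i < c->n; i++)
        if (strncmp(c->rec[i].path, path, len) == 0 &&
            c->rec[i].atime < cutoff)
            total += c->rec[i].size;
    return total;
}

const struct indexer linear_indexer = { "linear", linear_add, linear_query };
```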
- - more flexible running modes
- + decouple the disk scan from the index building code, so that
- the former can optionally output in the same format as --dump
- and the latter can optionally work from input on stdin (having
- also fixed the --dump format in the process so it's perfectly
- general). Then we could scan on one machine and transfer the
- results over the net to another machine where they'd be
- indexed; in particular, this way the indexing machine could be
- 64-bit even if the machine owning the filesystems was only 32.
- + ability to build a database _and_ immediately run one of the
- ongoing interactive report modes (httpd, curses) would seem
- handy.
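A "perfectly general" dump format mainly has to survive arbitrary bytes in pathnames. One option (a sketch, not the actual --dump format) is %XX-escaping everything outside printable ASCII, so records stay strictly one per line however strange the filenames:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Escape a pathname for a one-record-per-line dump: every byte
 * outside printable ASCII, plus '%' itself, becomes %XX, so
 * embedded newlines and odd charsets can't corrupt the stream. */
void dump_escape(const char *path, char *out)
{
    for (const unsigned char *p = (const unsigned char *)path; *p; p++) {
        if (*p > 0x20 && *p < 0x7f && *p != '%')
            *out++ = *p;
        else
            out += sprintf(out, "%%%02X", *p);
    }
    *out = '\0';
}

/* Inverse, applied by the indexing side reading from stdin. */
void dump_unescape(const char *in, char *out)
{
    while (*in) {
        if (*in == '%') {
            unsigned v;
            sscanf(in + 1, "%2X", &v);
            *out++ = (char)v;
            in += 3;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
}
```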
+ - A user requested what's essentially a VFS layer: given multiple
+ index files and a map of how they fit into an overall namespace,
+ we should be able to construct the right answers for any query
+ about the resulting aggregated hierarchy by doing at most
+ O(number of indexes * normal number of queries) work.
- - portability
- + between Unices:
- * autoconf?
- * configure use of stat64
- * configure use of /proc/net/tcp
- * what do we do elsewhere about _GNU_SOURCE?
- + further afield: is there in fact any non-Unix OS that supports
- atimes and hence can be used with agedu at all?
- * yes! http://msdn.microsoft.com/en-us/library/ms724290.aspx
+ - Support for filtering the scan by ownership and permissions. The
+ index data structure can't handle this, so we can't build a
+ single index file admitting multiple subset views; but a user
+ suggested that the scan phase could record information about
+ ownership and permissions in the dump file, and then the indexing
+ phase could filter down to a particular sub-view - which would at
+ least allow the construction of various subset indices from one
+ dump file, without having to redo the full disk scan which is the
+ most time-consuming part of all.
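The dump-record extension might carry just the owner uid and permission bits, with the indexing phase applying a predicate per sub-view (fields and names hypothetical):

```c
#include <assert.h>
#include <sys/types.h>

/* A dump record grown two extra fields, recorded at scan time so
 * the expensive disk walk happens only once. */
struct dump_rec {
    unsigned long long size;
    unsigned long long atime;
    uid_t uid;
    unsigned mode;               /* st_mode & 07777 */
};

/* Does this record belong in an index restricted to one owner
 * and/or some permission bits?  (uid_t)-1 means "any owner". */
int in_subview(const struct dump_rec *r, uid_t owner, unsigned modebits)
{
    if (owner != (uid_t)-1 && r->uid != owner)
        return 0;
    return (r->mode & modebits) == modebits;
}
```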