TODO list for agedu
===================
- - stop trying to calculate an upper bound on the index file size.
- Instead, just mmap it at initial size + delta, and periodically
- re-mmap it during index building if it grows too big. If we run
- out of address space, we'll hear about it eventually; and
- computing upper bounds given the new optimised index tends to be
- a factor of five out, which is bad because it'll lead to running
- out of theoretical address space and erroneously reporting
- failure long before we run out of it for real.
+ - flexibility in the HTML report output mode: expose the internal
+ mechanism for configuring the output filenames, and allow the
+ user to request individual files with hyperlinks as if the other
+ files existed. (In particular, functionality of this kind would
+ enable other modes of use like the built-in --cgi mode, without
+ me having to anticipate them in detail.)
+
+ - non-ASCII character set support
+ + could usefully apply to --title and also to file names
+ + how do we determine the input charset? Via locale, presumably.
+ + how do we do translation? Importing my charset library is one
+ heavyweight option; alternatively, does the native C locale
+ mechanism provide enough functionality to do the job by itself?
+ + in HTML, we would need to decide on an _output_ character set,
+ specify it in a <meta http-equiv> tag, and translate to it from
+ the input locale
+ - one option is to make the output charset the same as the
+ input one, in which case all we need is to identify its name
+ for the <meta> tag
+ - the other option is to make the output charset UTF-8 always
+ and translate to that from everything else
+ - in the web server and CGI modes, it would probably be nicer
+ to move that <meta> tag into a proper HTTP header
+ + even in text mode we would want to parse the filenames in some
+ fashion, due to the unhelpful corner case of Shift-JIS Windows
+ (in which backslashes in the input string must be classified as
+ path separators or the second byte of a two-byte character)
+ - that's really painful, since it will impact string processing
+ of filenames throughout the code
+ - so perhaps a better approach would be to do locale processing
+ of filenames at _scan_ time, and normalise to UTF-8 in both
+ the index and dump files?
+ + involves incrementing the version of the dump-file format
+ + then paths given on the command line are translated
+ quickly to UTF-8 before comparing them against index paths
+ + and now the HTML output side becomes easy, though the text
+ output involves translating back again
+ + but what if the filenames aren't intended to be
+ interpreted in any particular character set (old-style
+ Unix semantics) or in a consistent one?
- we could still be using more of the information coming from
autoconf. Our config.h is defining a whole bunch of HAVE_FOOs for
HTML. Then the former can be reused to produce very similar
reports in coloured plain text.
- - http://msdn.microsoft.com/en-us/library/ms724290.aspx suggest
- modern Windowses support atime-equivalents, so a Windows port is
- possible in principle.
- + For a full Windows port, would need to modify the current
- structure a lot, to abstract away (at least) memory-mapping of
- files, details of disk scan procedure, networking for httpd.
- Unclear what the right UI would be on Windows, too;
- command-line exactly as now might be considered just a
- _little_ unfriendly. Or perhaps not.
- + Alternatively, a much easier approach would be to write a
- Windows version of just the --scan-dump mode, which does a
- filesystem scan via the Windows API and generates a valid
- agedu dump file on standard output. Then one would simply feed
- that over the network connection of one's choice to the rest
- of agedu running on Unix as usual.
+ - abstracting away all the Unix calls so as to enable a full
+ Windows port. We can already do the difficult bit on Windows
+ (scanning the filesystem and retrieving atime-analogues).
+ Everything else is just coding - albeit quite a _lot_ of coding,
+ since the Unix assumptions are woven quite tightly into the
+ current code.
+ + If nothing else, it's unclear what the user interface properly
+ ought to be in a Windows port of agedu. A command-line job
+ exactly like the Unix version might be useful to some people,
+ but would certainly be strange and confusing to others.
- it might conceivably be useful to support a choice of indexing
strategies. The current "continuous index" mechanism' tradeoff of
how to make the indexers pluggable at both index-generation time
and query time.
* however, now we have the cut-down version of the continuous
- index, it might be the case that the space gain is no longer
- worthwhile.
+ index, the space saving is less compelling.
+
+ - A user requested what's essentially a VFS layer: given multiple
+ index files and a map of how they fit into an overall namespace,
+ we should be able to construct the right answers for any query
+ about the resulting aggregated hierarchy by doing at most
+ O(number of indexes * normal number of queries) work.
+
+ - Support for filtering the scan by ownership and permissions. The
+ index data structure can't handle this, so we can't build a
+ single index file admitting multiple subset views; but a user
+ suggested that the scan phase could record information about
+ ownership and permissions in the dump file, and then the indexing
+ phase could filter down to a particular sub-view - which would at
+ least allow the construction of various subset indexes from one
+ dump file, without having to redo the full disk scan which is the
+ most time-consuming part of all.
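
A rough sketch of the filtering step described in the last item, assuming
the dump file were extended to carry ownership and permissions per record.
Everything here is hypothetical: agedu's dump format has no such fields
today, and the names dump_record, view_filter, record_in_view and
filter_records are invented for illustration. The sketch also ignores how
directory entries would need to be treated when some of their children are
filtered out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical dump record. The uid, gid and mode fields are assumed
 * additions; they are not part of the existing dump-file format. */
struct dump_record {
    const char *path;
    unsigned long long size;
    long long atime;
    uid_t uid;
    gid_t gid;
    mode_t mode;
};

/* Hypothetical description of one sub-view of the scanned data. */
struct view_filter {
    bool match_uid;        /* if true, keep only records owned by uid */
    uid_t uid;
    bool match_gid;        /* if true, keep only records with group gid */
    gid_t gid;
    mode_t required_mode;  /* all of these bits must be set; 0 = no constraint */
};

/* Return true if the record belongs to the requested sub-view. */
bool record_in_view(const struct dump_record *r, const struct view_filter *f)
{
    if (f->match_uid && r->uid != f->uid)
        return false;
    if (f->match_gid && r->gid != f->gid)
        return false;
    if ((r->mode & f->required_mode) != f->required_mode)
        return false;
    return true;
}

/* Compact the array in place, keeping only records in the sub-view and
 * preserving their order. Returns the number of records kept. */
size_t filter_records(struct dump_record *recs, size_t n,
                      const struct view_filter *f)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (record_in_view(&recs[i], f))
            recs[kept++] = recs[i];
    }
    return kept;
}
```

The point of the design is that the filter runs over records already read
from the dump, so several differently filtered indexes could be built from
one scan without touching the disk again.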