TODO list for agedu
===================

 - flexibility in the HTML report output mode: expose the internal
   mechanism for configuring the output filenames, and allow the user
   to request individual files with hyperlinks as if the other files
   existed. (In particular, functionality of this kind would enable
   other modes of use like the built-in --cgi mode, without me having
   to anticipate them in detail.)

 - non-ASCII character set support
    + could usefully apply to --title and also to file names
    + how do we determine the input charset? Via locale, presumably.
    + how do we do translation? Importing my charset library is one
      heavyweight option; alternatively, does the native C locale
      mechanism provide enough functionality to do the job by itself?
    + in HTML, we would need to decide on an _output_ character set,
      specify it in a <meta> tag, and translate to it from the input
      locale
       - one option is to make the output charset the same as the
         input one, in which case all we need is to identify its name
         for the <meta> tag
       - the other option is to make the output charset UTF-8 always
         and translate to that from everything else
       - in the web server and CGI modes, it would probably be nicer
         to move that <meta> tag into a proper HTTP header
    + even in text mode we would want to parse the filenames in some
      fashion, due to the unhelpful corner case of Shift-JIS Windows
      (in which backslashes in the input string must be classified as
      path separators or the second byte of a two-byte character)
       - that's really painful, since it will impact string processing
         of filenames throughout the code
       - so perhaps a better approach would be to do locale processing
         of filenames at _scan_ time, and normalise to UTF-8 in both
         the index and dump files?
          + involves incrementing the version of the dump-file format
          + then paths given on the command line are translated
            quickly to UTF-8 before comparing them against index paths
          + and now the HTML output side becomes easy, though the text
            output involves translating back again
          + but what if the filenames aren't intended to be
            interpreted in any particular character set (old-style
            Unix semantics) or in a consistent one?

 - we could still be using more of the information coming from
   autoconf. Our config.h is defining a whole bunch of HAVE_FOOs for
   particular functions (e.g. HAVE_INET_NTOA, HAVE_MEMCHR,
   HAVE_FNMATCH). We could usefully supply alternatives for some of
   these functions (e.g. cannibalise the PuTTY wildcard matcher for
   use in the absence of fnmatch, switch to vanilla truncate() in the
   absence of ftruncate); where we don't have alternative code, it
   would perhaps be polite to throw an error at configure time rather
   than allowing the subsequent build to fail.
    + however, I don't see anything here that looks very
      controversial; IIRC it's all in POSIX, for one thing. So more
      likely this should simply wait until somebody complains.

 - run-time configuration in the HTTP server
    * I think this probably works by having a configuration form, or a
      link pointing to one, somewhere on the report page. If you want
      to reconfigure anything, you fill in and submit the form; the
      web server receives an HTTP GET with parameters and a referer,
      adjusts its internal configuration, and returns an HTTP redirect
      back to the referring page - which it then re-renders in
      accordance with the change.
    * All the same options should have their starting states
      configurable on the command line too.

 - curses-ish equivalent of the web output
    + try using xterm 256-colour mode. Can (n)curses handle that? If
      not, try doing it manually.
    + I think my current best idea is to bypass ncurses and go
      straight to terminfo: generate lines of attribute-interleaved
      text and display them, so we only really need the sequences "go
      here and display stuff", "scroll up", "scroll down".
    + Infrastructure work before doing any of this would be to split
      html.c into two: one part to prepare an abstract data structure
      describing an HTML-like report (in particular, all the index
      lookups, percentage calculation, vector arithmetic and line
      sorting), and another part to generate the literal HTML. Then
      the former can be reused to produce very similar reports in
      coloured plain text.

 - abstracting away all the Unix calls so as to enable a full Windows
   port. We can already do the difficult bit on Windows (scanning the
   filesystem and retrieving atime-analogues). Everything else is just
   coding - albeit quite a _lot_ of coding, since the Unix assumptions
   are woven quite tightly into the current code.
    + If nothing else, it's unclear what the user interface properly
      ought to be in a Windows port of agedu. A command-line job
      exactly like the Unix version might be useful to some people,
      but would certainly be strange and confusing to others.

 - it might conceivably be useful to support a choice of indexing
   strategies. The current "continuous index" mechanism's tradeoff of
   taking O(N log N) space in order to be able to support any age
   cutoff you like is not going to be ideal for everybody. A second
   more conventional "discrete index" mechanism which allows the user
   to specify a number of fixed cutoffs and just indexes each
   directory on those alone would undoubtedly be a useful thing for
   large-scale users. This will require considerable thought about how
   to make the indexers pluggable at both index-generation time and
   query time.
    * however, now we have the cut-down version of the continuous
      index, the space saving is less compelling.
 - A user requested what's essentially a VFS layer: given multiple
   index files and a map of how they fit into an overall namespace, we
   should be able to construct the right answers for any query about
   the resulting aggregated hierarchy by doing at most O(number of
   indexes * normal number of queries) work.

 - Support for filtering the scan by ownership and permissions. The
   index data structure can't handle this, so we can't build a single
   index file admitting multiple subset views; but a user suggested
   that the scan phase could record information about ownership and
   permissions in the dump file, and then the indexing phase could
   filter down to a particular sub-view - which would at least allow
   the construction of various subset indices from one dump file,
   without having to redo the full disk scan, which is the most
   time-consuming part of all.