TODO list for agedu
===================

 - flexibility in the HTML report output mode: expose the internal
   mechanism for configuring the output filenames, and allow the
   user to request individual files with hyperlinks as if the other
   files existed. (In particular, functionality of this kind would
   enable other modes of use like the built-in --cgi mode, without
   me having to anticipate them in detail.)

 - non-ASCII character set support
   + could usefully apply to --title and also to file names
   + how do we determine the input charset? Via locale, presumably.
   + how do we do translation? Importing my charset library is one
     heavyweight option; alternatively, does the native C locale
     mechanism provide enough functionality to do the job by itself?
   + in HTML, we would need to decide on an _output_ character set,
     specify it in a <meta http-equiv> tag, and translate to it from
     the input locale
     - one option is to make the output charset the same as the
       input one, in which case all we need is to identify its name
       for the <meta> tag
     - the other option is to make the output charset UTF-8 always
       and translate to that from everything else
     - in the web server and CGI modes, it would probably be nicer
       to move that <meta> tag into a proper HTTP header
   + even in text mode we would want to parse the filenames in some
     fashion, due to the unhelpful corner case of Shift-JIS Windows
     (in which backslashes in the input string must be classified as
     path separators or the second byte of a two-byte character)
     - that's really painful, since it will impact string processing
       of filenames throughout the code
     - so perhaps a better approach would be to do locale processing
       of filenames at _scan_ time, and normalise to UTF-8 in both
       the index and dump files?
       + involves incrementing the version of the dump-file format
       + then paths given on the command line are translated
         quickly to UTF-8 before comparing them against index paths
       + and now the HTML output side becomes easy, though the text
         output involves translating back again
       + but what if the filenames aren't intended to be
         interpreted in any particular character set (old-style
         Unix semantics) or in a consistent one?

 - we could still be using more of the information coming from
   autoconf. Our config.h is defining a whole bunch of HAVE_FOOs for
   particular functions (e.g. HAVE_INET_NTOA, HAVE_MEMCHR,
   HAVE_FNMATCH). We could usefully supply alternatives for some of
   these functions (e.g. cannibalise the PuTTY wildcard matcher for
   use in the absence of fnmatch, switch to vanilla truncate() in
   the absence of ftruncate); where we don't have alternative code,
   it would perhaps be polite to throw an error at configure time
   rather than allowing the subsequent build to fail.
   + however, I don't see anything here that looks very
     controversial; IIRC it's all in POSIX, for one thing. So more
     likely this should simply wait until somebody complains.

 - IPv6 support in the HTTP server
   * of course, Linux magic auth can still work in this context; we
     merely have to be prepared to open one of /proc/net/tcp or
     /proc/net/tcp6 as appropriate.

 - run-time configuration in the HTTP server
   * I think this probably works by having a configuration form, or
     a link pointing to one, somewhere on the report page. If you
     want to reconfigure anything, you fill in and submit the form;
     the web server receives an HTTP GET with parameters and a
     Referer header, adjusts its internal configuration, and returns
     an HTTP redirect back to the referring page - which it then
     re-renders in accordance with the change.
   * All the same options should have their starting states
     configurable on the command line too.

 - curses-ish equivalent of the web output
   + try using xterm 256-colour mode. Can (n)curses handle that? If
     not, try doing it manually.
   + I think my current best idea is to bypass ncurses and go
     straight to terminfo: generate lines of attribute-interleaved
     text and display them, so we only really need the sequences
     "go here and display stuff", "scroll up", "scroll down".
   + Infrastructure work before doing any of this would be to split
     html.c into two: one part to prepare an abstract data
     structure describing an HTML-like report (in particular, all
     the index lookups, percentage calculation, vector arithmetic
     and line sorting), and another part to generate the literal
     HTML. Then the former can be reused to produce very similar
     reports in coloured plain text.

 - abstracting away all the Unix calls so as to enable a full
   Windows port. We can already do the difficult bit on Windows
   (scanning the filesystem and retrieving atime-analogues).
   Everything else is just coding - albeit quite a _lot_ of coding,
   since the Unix assumptions are woven quite tightly into the
   current code.
   + If nothing else, it's unclear what the user interface properly
     ought to be in a Windows port of agedu. A command-line job
     exactly like the Unix version might be useful to some people,
     but would certainly be strange and confusing to others.

 - it might conceivably be useful to support a choice of indexing
   strategies. The current "continuous index" mechanism's tradeoff
   of taking O(N log N) space in order to be able to support any
   age cutoff you like is not going to be ideal for everybody. A
   second, more conventional "discrete index" mechanism, which
   allows the user to specify a number of fixed cutoffs and just
   indexes each directory on those alone, would undoubtedly be a
   useful thing for large-scale users. This will require
   considerable thought about how to make the indexers pluggable at
   both index-generation time and query time.
   * however, now that we have the cut-down version of the
     continuous index, the space saving is less compelling.

 - A user requested what's essentially a VFS layer: given multiple
   index files and a map of how they fit into an overall namespace,
   we should be able to construct the right answers for any query
   about the resulting aggregated hierarchy by doing at most
   O(number of indexes * normal number of queries) work.

 - Support for filtering the scan by ownership and permissions. The
   index data structure can't handle this, so we can't build a
   single index file admitting multiple subset views; but a user
   suggested that the scan phase could record information about
   ownership and permissions in the dump file, and then the indexing
   phase could filter down to a particular sub-view - which would at
   least allow the construction of various subset indices from one
   dump file, without having to redo the full disk scan which is the
   most time-consuming part of all.