TODO list for agedu
===================

 - flexibility in the HTML report output mode: expose the internal
   mechanism for configuring the output filenames, and allow the
   user to request individual files with hyperlinks as if the other
   files existed. (In particular, functionality of this kind would
   enable other modes of use like the built-in --cgi mode, without
   me having to anticipate them in detail.)

 - non-ASCII character set support
    + could usefully apply to --title and also to file names
    + how do we determine the input charset? Via locale, presumably.
    + how do we do translation? Importing my charset library is one
      heavyweight option; alternatively, does the native C locale
      mechanism provide enough functionality to do the job by itself?
    + in HTML, we would need to decide on an _output_ character set,
      specify it in a <meta http-equiv> tag, and translate to it from
      the input locale
       - one option is to make the output charset the same as the
         input one, in which case all we need is to identify its name
         for the <meta> tag
       - the other option is to make the output charset UTF-8 always
         and translate to that from everything else
       - in the web server and CGI modes, it would probably be nicer
         to move that <meta> tag into a proper HTTP header
    + even in text mode we would want to parse the filenames in some
      fashion, due to the unhelpful corner case of Shift-JIS Windows
      (in which backslashes in the input string must be classified as
      path separators or the second byte of a two-byte character)
       - that's really painful, since it will impact string processing
         of filenames throughout the code
       - so perhaps a better approach would be to do locale processing
         of filenames at _scan_ time, and normalise to UTF-8 in both
         the index and dump files?
          + involves incrementing the version of the dump-file format
          + then paths given on the command line are translated
            quickly to UTF-8 before comparing them against index paths
          + and now the HTML output side becomes easy, though the text
            output involves translating back again
          + but what if the filenames aren't intended to be
            interpreted in any particular character set (old-style
            Unix semantics) or in a consistent one?

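   To illustrate the Shift-JIS corner case mentioned above, here is a
   minimal sketch of classifying backslashes in a path. The function
   names are hypothetical (nothing like this exists in agedu yet), and
   the lead-byte ranges assume the Windows CP932 flavour of Shift-JIS.

```c
#include <assert.h>
#include <stddef.h>

/* Nonzero if c can start a two-byte character (CP932 ranges assumed). */
static int sjis_is_lead_byte(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/*
 * Hypothetical helper: return the offset of the last backslash that
 * is a real path separator, ignoring any 0x5C byte that is the
 * second byte of a two-byte character. Returns -1 if there is none.
 */
long sjis_last_separator(const char *path, size_t len)
{
    long last = -1;
    size_t i = 0;

    assert(path != NULL);
    while (i < len) {
        unsigned char c = (unsigned char)path[i];
        if (sjis_is_lead_byte(c) && i + 1 < len) {
            i += 2;             /* skip lead byte and its trail byte */
        } else {
            if (c == '\\')
                last = (long)i; /* a genuine separator */
            i++;
        }
    }
    return last;
}
```

   For the six bytes 'C' ':' '\' 0x95 0x5C 'a', this returns offset 2,
   whereas a plain strrchr() for '\\' would wrongly pick offset 4.
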
 - we could still be using more of the information coming from
   autoconf. Our config.h is defining a whole bunch of HAVE_FOOs for
   particular functions (e.g. HAVE_INET_NTOA, HAVE_MEMCHR,
   HAVE_FNMATCH). We could usefully supply alternatives for some of
   these functions (e.g. cannibalise the PuTTY wildcard matcher for
   use in the absence of fnmatch, switch to vanilla truncate() in
   the absence of ftruncate); where we don't have alternative code,
   it would perhaps be polite to throw an error at configure time
   rather than allowing the subsequent build to fail.
    + however, I don't see anything here that looks very
      controversial; IIRC it's all in POSIX, for one thing. So more
      likely this should simply wait until somebody complains.

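   A rough sketch of what a HAVE_FNMATCH fallback could look like.
   This is not the PuTTY wildcard matcher, just a minimal stand-in
   that understands only '*' and '?'; the function names are invented
   for illustration, and unlike fnmatch() it has no bracket
   expressions or backslash escapes.

```c
#include <assert.h>
#include <stddef.h>

#ifdef HAVE_FNMATCH
#include <fnmatch.h>
#else
/* Fallback matcher: '*' matches any run of characters, '?' exactly one. */
static int simple_wildcard_match(const char *pat, const char *str)
{
    const char *star = NULL;   /* position of the most recent '*' */
    const char *retry = NULL;  /* where in str that '*' began matching */

    while (*str) {
        if (*pat == '*') {
            star = pat++;
            retry = str;
        } else if (*pat == '?' || *pat == *str) {
            pat++;
            str++;
        } else if (star) {
            pat = star + 1;    /* let the last '*' absorb one more char */
            str = ++retry;
        } else {
            return 0;
        }
    }
    while (*pat == '*')
        pat++;
    return *pat == '\0';
}
#endif

/* Returns nonzero if str matches pat, using fnmatch() when available. */
int wildcard_match(const char *pat, const char *str)
{
    assert(pat != NULL && str != NULL);
#ifdef HAVE_FNMATCH
    return fnmatch(pat, str, 0) == 0;
#else
    return simple_wildcard_match(pat, str);
#endif
}
```

   Callers would then use wildcard_match() everywhere and never touch
   fnmatch() directly, so the choice is made in exactly one place.
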
 - IPv6 support in the HTTP server
    * of course, Linux magic auth can still work in this context; we
      merely have to be prepared to open one of /proc/net/tcp or
      /proc/net/tcp6 as appropriate.

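   A sketch of how the two /proc files could share one parser,
   assuming the auth check needs only the ports and the owning uid
   from each row. Both tables use the same column layout and differ
   only in the width of the hex address (8 digits versus 32). The
   function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/socket.h>

/* Pick the kernel connection table for the given address family. */
const char *proc_net_tcp_path(int family)
{
    return family == AF_INET6 ? "/proc/net/tcp6" : "/proc/net/tcp";
}

/*
 * Parse one data row of /proc/net/tcp or /proc/net/tcp6, extracting
 * the local port, remote port and owning uid. Returns nonzero on
 * success, zero for the header row or anything else unparseable.
 */
int parse_proc_net_tcp_line(const char *line, unsigned *local_port,
                            unsigned *remote_port, unsigned *uid)
{
    char local_addr[33], remote_addr[33];   /* up to 32 hex digits */
    int n;

    assert(line != NULL);
    /* Columns: sl local rem st tx:rx tr:when retrnsmt uid ... */
    n = sscanf(line,
               " %*d: %32[0-9A-Fa-f]:%x %32[0-9A-Fa-f]:%x"
               " %*x %*x:%*x %*x:%*x %*x %u",
               local_addr, local_port, remote_addr, remote_port, uid);
    return n == 5;
}
```

   The addresses are captured but not decoded here; comparing them
   against the accepted socket would need the byte order handled
   separately for each family.
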
 - run-time configuration in the HTTP server
    * I think this probably works by having a configuration form, or
      a link pointing to one, somewhere on the report page. If you
      want to reconfigure anything, you fill in and submit the form;
      the web server receives HTTP GET with parameters and a
      referer, adjusts its internal configuration, and returns an
      HTTP redirect back to the referring page - which it then
      re-renders in accordance with the change.
    * All the same options should have their starting states
      configurable on the command line too.

 - curses-ish equivalent of the web output
    + try using xterm 256-colour mode. Can (n)curses handle that? If
      not, try doing it manually.
    + I think my current best idea is to bypass ncurses and go
      straight to terminfo: generate lines of attribute-interleaved
      text and display them, so we only really need the sequences
      "go here and display stuff", "scroll up", "scroll down".
    + Infrastructure work before doing any of this would be to split
      html.c into two: one part to prepare an abstract data
      structure describing an HTML-like report (in particular, all
      the index lookups, percentage calculation, vector arithmetic
      and line sorting), and another part to generate the literal
      HTML. Then the former can be reused to produce very similar
      reports in coloured plain text.

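   For the "doing it manually" option, a small sketch of emitting
   xterm 256-colour sequences directly. It hard-codes the SGR escape
   sequences rather than looking anything up in terminfo, so it
   assumes an xterm-compatible terminal; both function names are made
   up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Map r, g, b (each 0..5) to an index in the xterm 6x6x6 colour cube. */
int xterm_cube_index(int r, int g, int b)
{
    assert(r >= 0 && r < 6 && g >= 0 && g < 6 && b >= 0 && b < 6);
    return 16 + 36 * r + 6 * g + b;
}

/*
 * Format a bar of `width` spaces on the given background colour,
 * followed by an attribute reset. Returns what snprintf() returns:
 * the length the full string needs, excluding the terminating NUL.
 */
int format_colour_bar(char *buf, size_t size, int colour, int width)
{
    assert(buf != NULL && width >= 0);
    return snprintf(buf, size, "\033[48;5;%dm%*s\033[0m",
                    colour, width, "");
}
```

   A report line could then be built by concatenating one such bar per
   age band, with the band's colour chosen from the cube in the same
   way the HTML output chooses its gradient.
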
 - abstracting away all the Unix calls so as to enable a full
   Windows port. We can already do the difficult bit on Windows
   (scanning the filesystem and retrieving atime-analogues).
   Everything else is just coding - albeit quite a _lot_ of coding,
   since the Unix assumptions are woven quite tightly into the
   current code.
    + If nothing else, it's unclear what the user interface properly
      ought to be in a Windows port of agedu. A command-line job
      exactly like the Unix version might be useful to some people,
      but would certainly be strange and confusing to others.

 - it might conceivably be useful to support a choice of indexing
   strategies. The current "continuous index" mechanism's tradeoff of
   taking O(N log N) space in order to be able to support any age
   cutoff you like is not going to be ideal for everybody. A second
   more conventional "discrete index" mechanism which allows the
   user to specify a number of fixed cutoffs and just indexes each
   directory on those alone would undoubtedly be a useful thing for
   large-scale users. This will require considerable thought about
   how to make the indexers pluggable at both index-generation time
   and query time.
    * however, now we have the cut-down version of the continuous
      index, the space saving is less compelling.

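   A sketch of what a per-directory record in a "discrete index"
   might hold, assuming a small fixed set of cutoffs chosen at index
   time. The struct, the NCUTOFFS constant and the function name are
   all hypothetical; the point is only that each directory costs a
   constant amount of space per cutoff.

```c
#include <assert.h>
#include <stddef.h>

#define NCUTOFFS 3  /* hypothetical number of user-specified cutoffs */

struct discrete_entry {
    /* bytes_older[i]: total size of files last accessed before cutoffs[i] */
    unsigned long long bytes_older[NCUTOFFS];
    unsigned long long bytes_total;
};

/* Account for one file under the directory that this entry describes. */
void discrete_add_file(struct discrete_entry *e,
                       const long long cutoffs[NCUTOFFS],
                       long long atime, unsigned long long size)
{
    size_t i;

    assert(e != NULL && cutoffs != NULL);
    e->bytes_total += size;
    for (i = 0; i < NCUTOFFS; i++)
        if (atime < cutoffs[i])
            e->bytes_older[i] += size;
}
```

   A query for one of the fixed cutoffs is then a single array lookup
   per directory, but a query for any other age simply cannot be
   answered, which is the flexibility the continuous index pays for.
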
 - A user requested what's essentially a VFS layer: given multiple
   index files and a map of how they fit into an overall namespace,
   we should be able to construct the right answers for any query
   about the resulting aggregated hierarchy by doing at most
   O(number of indexes * normal number of queries) work.

 - Support for filtering the scan by ownership and permissions. The
   index data structure can't handle this, so we can't build a
   single index file admitting multiple subset views; but a user
   suggested that the scan phase could record information about
   ownership and permissions in the dump file, and then the indexing
   phase could filter down to a particular sub-view - which would at
   least allow the construction of various subset indices from one
   dump file, without having to redo the full disk scan which is the
   most time-consuming part of all.