TODO list for agedu
===================

- flexibility in the HTML report output mode: expose the internal
  mechanism for configuring the output filenames, and allow the
  user to request individual files with hyperlinks as if the other
  files existed. (In particular, functionality of this kind would
  enable other modes of use like the built-in --cgi mode, without
  me having to anticipate them in detail.)
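
As a rough illustration of what exposing the filename mechanism might involve, here is a minimal C sketch. It assumes a hypothetical user-supplied pattern in which %i stands for a directory's numeric index and %% for a literal percent sign; neither that syntax nor the name html_filename() comes from agedu. The pattern is expanded by hand rather than passed to snprintf() as a format string, so a careless pattern cannot trigger a format-string bug.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: expand a user-supplied filename pattern for the
 * report page describing directory number `index`. "%i" becomes the
 * index, "%%" becomes a literal '%', and other characters are copied.
 * Returns 0 on success, -1 on a bad pattern or if `out` is too small. */
static int html_filename(const char *pattern, unsigned long index,
                         char *out, size_t outlen)
{
    size_t pos = 0;

    for (const char *p = pattern; *p; p++) {
        char num[32];
        const char *piece;
        size_t len;

        if (*p == '%') {
            p++;
            if (*p == 'i') {
                snprintf(num, sizeof num, "%lu", index);
                piece = num;
            } else if (*p == '%') {
                piece = "%";
            } else {
                return -1; /* unknown escape, or '%' at end of pattern */
            }
        } else {
            num[0] = *p;
            num[1] = '\0';
            piece = num;
        }
        len = strlen(piece);
        if (pos + len >= outlen)
            return -1; /* no room for this piece plus the terminator */
        memcpy(out + pos, piece, len);
        pos += len;
    }
    if (outlen == 0)
        return -1;
    out[pos] = '\0';
    return 0;
}
```

With the pattern agedu-%i.html, directory 42 would map to agedu-42.html. A request for a single page could then be served by generating just that file, with its hyperlinks computed by the same function as if the other files existed.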

- non-ASCII character set support
  + could usefully apply to --title and also to file names
  + how do we determine the input charset? Via locale, presumably.
  + how do we do translation? Importing my charset library is one
    heavyweight option; alternatively, does the native C locale
    mechanism provide enough functionality to do the job by itself?
  + in HTML, we would need to decide on an _output_ character set,
    specify it in a <meta http-equiv> tag, and translate to it from
    the input locale
     - one option is to make the output charset the same as the
       input one, in which case all we need is to identify its name
       for the <meta> tag
     - the other option is to make the output charset UTF-8 always
       and translate to that from everything else
     - in the web server and CGI modes, it would probably be nicer
       to move that charset declaration out of the <meta> tag and
       into a proper HTTP Content-Type header
  + even in text mode we would want to parse the filenames in some
    fashion, due to the unhelpful corner case of Shift-JIS Windows
    (in which each backslash byte in the input string must be
    classified as either a path separator or the second byte of a
    two-byte character)
     - that's really painful, since it will impact string processing
       of filenames throughout the code
     - so perhaps a better approach would be to do locale processing
       of filenames at _scan_ time, and normalise to UTF-8 in both
       the index and dump files?
        + this involves incrementing the version of the dump-file
          format
        + then paths given on the command line are translated
          quickly to UTF-8 before being compared against index paths
        + and now the HTML output side becomes easy, though the text
          output involves translating back again
        + but what if the filenames aren't intended to be
          interpreted in any particular character set (old-style
          Unix semantics) or in a consistent one?
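
On whether the native C locale mechanism is enough by itself, a minimal sketch suggests it can cover the scan-time normalisation idea. It assumes the program has already called setlocale(LC_CTYPE, "") and that wchar_t holds Unicode code points, which C signals by defining __STDC_ISO_10646__ (true on glibc, not guaranteed elsewhere). The function names are hypothetical. Because mbrtowc() decodes one whole multibyte character at a time, a Shift-JIS trail byte of 0x5C is consumed as part of its character instead of being seen as a backslash, and bytes that are invalid in the locale come back as an error, leaving the caller to decide what to do with filenames in no particular character set.

```c
#include <locale.h>
#include <stdint.h>
#include <string.h>
#include <wchar.h>

/* Encode one Unicode code point as UTF-8. Returns the number of bytes
 * written to `out` (1 to 4), or 0 for surrogates and out-of-range values. */
static size_t utf8_encode(uint32_t cp, char out[4])
{
    if (cp < 0x80) {
        out[0] = (char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;
    if (cp < 0x10000) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp < 0x110000) {
        out[0] = (char)(0xF0 | (cp >> 18));
        out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Convert a filename from the current LC_CTYPE encoding to UTF-8.
 * Assumes wchar_t values are Unicode code points (__STDC_ISO_10646__).
 * Returns 0 on success, or -1 if the name is not valid in the locale's
 * encoding or `out` is too small. */
static int filename_to_utf8(const char *in, char *out, size_t outlen)
{
    mbstate_t state;
    size_t remaining = strlen(in), pos = 0;

    memset(&state, 0, sizeof state);
    if (outlen == 0)
        return -1;
    while (remaining > 0) {
        wchar_t wc;
        char buf[4];
        size_t used = mbrtowc(&wc, in, remaining, &state);
        size_t k;

        /* (size_t)-1: invalid sequence; (size_t)-2: truncated sequence.
         * 0 is checked defensively; it cannot occur, as `remaining`
         * excludes the terminating NUL. */
        if (used == (size_t)-1 || used == (size_t)-2 || used == 0)
            return -1;
        k = utf8_encode((uint32_t)wc, buf);
        if (k == 0 || pos + k >= outlen)
            return -1;
        memcpy(out + pos, buf, k);
        pos += k;
        in += used;
        remaining -= used;
    }
    out[pos] = '\0';
    return 0;
}
```

In a Shift-JIS locale, for example, the bytes 0x83 0x5C (katakana SO, U+30BD) would come out as the three UTF-8 bytes E3 82 BD, with no stray backslash left for the path-splitting code to trip over.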

- we could still be using more of the information coming from
  autoconf. Our config.h is defining a whole bunch of HAVE_FOOs for
  particular functions (e.g. HAVE_INET_NTOA, HAVE_MEMCHR,
  HAVE_FNMATCH). We could usefully supply alternatives for some of
  these functions (e.g. cannibalise the PuTTY wildcard matcher for
  use in the absence of fnmatch, switch to vanilla truncate() in
  the absence of ftruncate); where we don't have alternative code,
  it would perhaps be polite to throw an error at configure time
  rather than allowing the subsequent build to fail.
  + however, I don't see anything here that looks very
    controversial; IIRC it's all in POSIX, for one thing. So more
    likely this should simply wait until somebody complains.

- run-time configuration in the HTTP server
  * I think this probably works by having a configuration form, or
    a link pointing to one, somewhere on the report page. If you
    want to reconfigure anything, you fill in and submit the form;
    the web server receives an HTTP GET request carrying the form
    parameters and a Referer header, adjusts its internal
    configuration, and returns an HTTP redirect back to the
    referring page - which it then re-renders in accordance with
    the change.
  * All the same options should have their starting states
    configurable on the command line too.
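
The GET-then-redirect round trip might look roughly like this minimal C sketch. Everything in it is an assumption rather than agedu's code: the function names, reading one option at a time out of the query string, and the choice of 303 See Other, which HTTP defines for sending the browser to a different resource after a form submission. A Referer containing CR or LF is replaced with "/", since copying it unchecked into a header would allow header injection.

```c
#include <stdio.h>
#include <string.h>

/* Look up `key` in a query string such as "age=30d&depth=2" and copy its
 * value (still percent-encoded) into `out`. Returns 1 if found, 0 if not
 * or if the value does not fit. */
static int query_param(const char *query, const char *key,
                       char *out, size_t outlen)
{
    size_t keylen = strlen(key);
    const char *p = query;

    while (p && *p) {
        const char *end = strchr(p, '&');
        size_t fieldlen = end ? (size_t)(end - p) : strlen(p);

        if (fieldlen > keylen && strncmp(p, key, keylen) == 0 &&
            p[keylen] == '=') {
            size_t vallen = fieldlen - keylen - 1;
            if (vallen >= outlen)
                return 0;
            memcpy(out, p + keylen + 1, vallen);
            out[vallen] = '\0';
            return 1;
        }
        p = end ? end + 1 : NULL;
    }
    return 0;
}

/* Build a response sending the browser back to the page it came from,
 * falling back to "/" if there is no usable Referer.
 * Returns the length written, or -1 if `buf` is too small. */
static int make_redirect(const char *referer, char *buf, size_t buflen)
{
    int n;

    if (!referer || !*referer || strpbrk(referer, "\r\n"))
        referer = "/";
    n = snprintf(buf, buflen,
                 "HTTP/1.1 303 See Other\r\n"
                 "Location: %s\r\n"
                 "Content-Length: 0\r\n"
                 "\r\n", referer);
    return (n < 0 || (size_t)n >= buflen) ? -1 : n;
}
```

Between the two calls, the server would update its configuration from the parsed values, so the browser's follow-up GET of the referring page is rendered with the new settings.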

- curses-ish equivalent of the web output
  + try using xterm 256-colour mode. Can (n)curses handle that? If
    not, try doing it manually.
  + I think my current best idea is to bypass ncurses and go
    straight to terminfo: generate lines of attribute-interleaved
    text and display them, so we only really need the sequences
    "go here and display stuff", "scroll up", "scroll down".
  + Infrastructure work before doing any of this would be to split
    html.c into two: one part to prepare an abstract data
    structure describing an HTML-like report (in particular, all
    the index lookups, percentage calculation, vector arithmetic
    and line sorting), and another part to generate the literal
    HTML. Then the former can be reused to produce very similar
    reports in coloured plain text.
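
The proposed split of html.c might start from something like this minimal C sketch. The structure and function names are hypothetical, not agedu's internals: report_build() does the format-independent work (percentage calculation and line sorting), and report_render_text() is one possible backend that colours each line with xterm's 256-colour background escape ESC[48;5;Nm, emitted by hand rather than through curses or terminfo. Indices 16 to 231 form xterm's 6x6x6 colour cube, so 16 + 36r + 6g + b selects red, green and blue levels from 0 to 5; mapping the fraction of old data onto a green-to-red ramp is an arbitrary illustrative choice.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One line of a format-independent report (hypothetical structure). */
struct report_line {
    const char *name;        /* subdirectory or file name */
    unsigned long long size; /* bytes under this entry */
    double old_fraction;     /* 0.0 to 1.0: share of bytes older than cutoff */
    double percent;          /* filled in by report_build() */
};

/* qsort comparator: largest entries first. */
static int by_size_desc(const void *a, const void *b)
{
    const struct report_line *x = a, *y = b;
    return (x->size < y->size) - (x->size > y->size);
}

/* Format-independent step: percentages of the total, then sort. */
static void report_build(struct report_line *lines, size_t n)
{
    unsigned long long total = 0;

    for (size_t i = 0; i < n; i++)
        total += lines[i].size;
    for (size_t i = 0; i < n; i++)
        lines[i].percent = total ? 100.0 * lines[i].size / total : 0.0;
    qsort(lines, n, sizeof *lines, by_size_desc);
}

/* Index into xterm's 6x6x6 colour cube; r, g and b each run from 0 to 5. */
static int cube_colour(int r, int g, int b)
{
    return 16 + 36 * r + 6 * g + b;
}

/* Text backend: a coloured swatch, the percentage and the name per line.
 * Returns the number of characters written, or -1 if `buf` is too small. */
static int report_render_text(const struct report_line *lines, size_t n,
                              char *buf, size_t buflen)
{
    size_t pos = 0;

    for (size_t i = 0; i < n; i++) {
        /* All-recent data is pure green (level 0), all-old pure red (5). */
        int level = (int)(lines[i].old_fraction * 5.0 + 0.5);
        int colour = cube_colour(level, 5 - level, 0);
        int w = snprintf(buf + pos, buflen - pos,
                         "\x1b[48;5;%dm  \x1b[0m %5.1f%% %s\n",
                         colour, lines[i].percent, lines[i].name);

        if (w < 0 || (size_t)w >= buflen - pos)
            return -1;
        pos += (size_t)w;
    }
    return (int)pos;
}
```

An HTML backend would walk the same array and emit table rows instead. Under this split, the index lookups and arithmetic happen once, in the shared part.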

- abstracting away all the Unix calls so as to enable a full
  Windows port. We can already do the difficult bit on Windows
  (scanning the filesystem and retrieving atime-analogues).
  Everything else is just coding - albeit quite a _lot_ of coding,
  since the Unix assumptions are woven quite tightly into the
  current code.
  + If nothing else, it's unclear what the user interface properly
    ought to be in a Windows port of agedu. A command-line job
    exactly like the Unix version might be useful to some people,
    but would certainly be strange and confusing to others.
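
One conventional way to start unpicking the Unix assumptions is to route filesystem access through a small table of function pointers, with one implementation per platform. This sketch is hypothetical (agedu's own code may be organised quite differently) and shows only a POSIX directory-listing backend built on opendir() and readdir(); a Windows backend would fill the same slot using FindFirstFileW() and FindNextFileW().

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stddef.h>

/* Callback invoked once per directory entry (hypothetical interface). */
typedef void (*dirent_cb)(const char *name, void *ctx);

/* Platform-specific operations the rest of the program would call. */
struct platform_ops {
    /* List `path`, calling cb for each entry; returns 0, or -1 on error. */
    int (*list_dir)(const char *path, dirent_cb cb, void *ctx);
};

static int posix_list_dir(const char *path, dirent_cb cb, void *ctx)
{
    DIR *d = opendir(path);
    struct dirent *de;

    if (!d)
        return -1;
    /* readdir() returns NULL both at the end and on error; a fuller
     * version would clear errno first and check it afterwards. */
    while ((de = readdir(d)) != NULL)
        cb(de->d_name, ctx);
    closedir(d);
    return 0;
}

/* The table the scanner would be handed on Unix-like systems. */
static const struct platform_ops posix_ops = { posix_list_dir };
```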

- it might conceivably be useful to support a choice of indexing
  strategies. The current "continuous index" mechanism's tradeoff of
  taking O(N log N) space in order to be able to support any age
  cutoff you like is not going to be ideal for everybody. A second,
  more conventional, "discrete index" mechanism, which allows the
  user to specify a number of fixed cutoffs and just indexes each
  directory on those alone, would undoubtedly be a useful thing for
  large-scale users. This will require considerable thought about
  how to make the indexers pluggable at both index-generation time
  and query time.
  * however, now that we have the cut-down version of the continuous
    index, the space saving is less compelling.
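
A minimal sketch of the discrete alternative for a single directory: with k cutoffs fixed at index-generation time, each directory stores just k running totals, and a query can only be answered at one of those cutoffs. The type and function names, the MAX_CUTOFFS limit and the use of plain time_t atimes are all assumptions for illustration; summing child directories into their parents and making this pluggable alongside the continuous index are left out.

```c
#include <stddef.h>
#include <time.h>

#define MAX_CUTOFFS 8 /* hypothetical limit on user-chosen cutoffs */

/* Hypothetical per-directory record: one total per fixed cutoff. */
struct discrete_entry {
    size_t ncutoffs;
    time_t cutoff[MAX_CUTOFFS];
    unsigned long long older_bytes[MAX_CUTOFFS]; /* atime < cutoff[i] */
};

struct file_info {
    time_t atime;
    unsigned long long size;
};

/* Index-generation time: add each file to every cutoff it predates. */
static void discrete_build(struct discrete_entry *e,
                           const time_t *cutoffs, size_t ncutoffs,
                           const struct file_info *files, size_t nfiles)
{
    if (ncutoffs > MAX_CUTOFFS)
        ncutoffs = MAX_CUTOFFS;
    e->ncutoffs = ncutoffs;
    for (size_t c = 0; c < ncutoffs; c++) {
        e->cutoff[c] = cutoffs[c];
        e->older_bytes[c] = 0;
    }
    for (size_t f = 0; f < nfiles; f++)
        for (size_t c = 0; c < ncutoffs; c++)
            if (files[f].atime < cutoffs[c])
                e->older_bytes[c] += files[f].size;
}

/* Query time: only the predefined cutoffs can be answered.
 * Returns 0 and stores the total in *out, or -1 for an unindexed cutoff. */
static int discrete_query(const struct discrete_entry *e, time_t cutoff,
                          unsigned long long *out)
{
    for (size_t c = 0; c < e->ncutoffs; c++) {
        if (e->cutoff[c] == cutoff) {
            *out = e->older_bytes[c];
            return 0;
        }
    }
    return -1;
}
```

The -1 case is exactly the flexibility the continuous index buys: it can answer any cutoff, where this scheme answers only the k chosen in advance.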

- A user requested what's essentially a VFS layer: given multiple
  index files and a map of how they fit into an overall namespace,
  we should be able to construct the right answers for any query
  about the resulting aggregated hierarchy by doing at most
  O(number of indexes * normal number of queries) work.
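
The routing half of such a layer could be as simple as a mount table: pick the entry whose mount point is the longest whole-component prefix of the queried path, and hand the remainder to that index. This minimal sketch assumes a hypothetical table of (mount point, index number) pairs and performs only that path translation. A query on a directory that contains other mount points would also need results from each of those indexes combined, which is presumably where the per-index multiplier in the bound above comes in.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mount table entry: paths under `mountpoint` are answered by
 * index file number `index_id`. Mount points have no trailing slash,
 * except the root "/". */
struct mount {
    const char *mountpoint;
    int index_id;
};

/* Is `mp` a prefix of `path` that ends on a path-component boundary? */
static int under_mount(const char *mp, const char *path)
{
    size_t len = strlen(mp);

    if (strcmp(mp, "/") == 0)
        return path[0] == '/';
    return strncmp(mp, path, len) == 0 &&
           (path[len] == '\0' || path[len] == '/');
}

/* Find the most specific mount covering `path`. On success, returns its
 * index id and sets *rest to the path within that index (at least "/").
 * Returns -1 if no mount covers the path. */
static int resolve(const struct mount *mounts, size_t n, const char *path,
                   const char **rest)
{
    int best = -1;
    size_t bestlen = 0;

    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(mounts[i].mountpoint);
        if (under_mount(mounts[i].mountpoint, path) &&
            (best < 0 || len > bestlen)) {
            best = (int)i;
            bestlen = len;
        }
    }
    if (best < 0)
        return -1;
    if (strcmp(mounts[best].mountpoint, "/") == 0)
        *rest = path;
    else
        *rest = path[bestlen] ? path + bestlen : "/";
    return mounts[best].index_id;
}
```

Matching on component boundaries matters: with a mount at /home, the path /homework must still fall through to the root index.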

- Support for filtering the scan by ownership and permissions. The
  index data structure can't handle this, so we can't build a
  single index file admitting multiple subset views; but a user
  suggested that the scan phase could record information about
  ownership and permissions in the dump file, and then the indexing
  phase could filter down to a particular sub-view - which would at
  least allow the construction of various subset indices from one
  dump file, without having to redo the full disk scan which is the
  most time-consuming part of all.