TODO list for agedu
===================

 - flexibility in the HTML report output mode: expose the internal
   mechanism for configuring the output filenames, and allow the
   user to request individual files with hyperlinks as if the other
   files existed. (In particular, functionality of this kind would
   enable other modes of use like the built-in --cgi mode, without
   me having to anticipate them in detail.)

 - non-ASCII character set support
   + could usefully apply to --title and also to file names
   + how do we determine the input charset? Via locale, presumably.
   + how do we do translation? Importing my charset library is one
     heavyweight option; alternatively, does the native C locale
     mechanism provide enough functionality to do the job by itself?
   + in HTML, we would need to decide on an _output_ character set,
     specify it in a <meta http-equiv> tag, and translate to it from
     the input locale
     - one option is to make the output charset the same as the
       input one, in which case all we need is to identify its name
       for the <meta> tag
     - the other option is to make the output charset UTF-8 always
       and translate to that from everything else
     - in the web server and CGI modes, it would probably be nicer
       to move that <meta> tag into a proper HTTP header
   + even in text mode we would want to parse the filenames in some
     fashion, due to the unhelpful corner case of Shift-JIS Windows
     (in which backslashes in the input string must be classified as
     path separators or the second byte of a two-byte character)
     - that's really painful, since it will impact string processing
       of filenames throughout the code
     - so perhaps a better approach would be to do locale processing
       of filenames at _scan_ time, and normalise to UTF-8 in both
       the index and dump files?
       + involves incrementing the version of the dump-file format
       + then paths given on the command line are translated
         quickly to UTF-8 before comparing them against index paths
       + and now the HTML output side becomes easy, though the text
         output involves translating back again
       + but what if the filenames aren't intended to be
         interpreted in any particular character set (old-style
         Unix semantics) or in a consistent one?

 - we could still be using more of the information coming from
   autoconf. Our config.h is defining a whole bunch of HAVE_FOOs for
   particular functions (e.g. HAVE_INET_NTOA, HAVE_MEMCHR,
   HAVE_FNMATCH). We could usefully supply alternatives for some of
   these functions (e.g. cannibalise the PuTTY wildcard matcher for
   use in the absence of fnmatch, switch to vanilla truncate() in
   the absence of ftruncate); where we don't have alternative code,
   it would perhaps be polite to throw an error at configure time
   rather than allowing the subsequent build to fail.
   + however, I don't see anything here that looks very
     controversial; IIRC it's all in POSIX, for one thing. So more
     likely this should simply wait until somebody complains.

 - IPv6 support in the HTTP server
   * of course, Linux magic auth can still work in this context; we
     merely have to be prepared to open one of /proc/net/tcp or
     /proc/net/tcp6 as appropriate.

 - run-time configuration in the HTTP server
   * I think this probably works by having a configuration form, or
     a link pointing to one, somewhere on the report page. If you
     want to reconfigure anything, you fill in and submit the form;
     the web server receives an HTTP GET with parameters and a
     Referer header, adjusts its internal configuration, and returns
     an HTTP redirect back to the referring page - which it then
     re-renders in accordance with the change.
   * All the same options should have their starting states
     configurable on the command line too.

 - curses-ish equivalent of the web output
   + try using xterm 256-colour mode. Can (n)curses handle that? If
     not, try doing it manually.
   + I think my current best idea is to bypass ncurses and go
     straight to terminfo: generate lines of attribute-interleaved
     text and display them, so we only really need the sequences
     "go here and display stuff", "scroll up", "scroll down".
   + Infrastructure work before doing any of this would be to split
     html.c into two: one part to prepare an abstract data
     structure describing an HTML-like report (in particular, all
     the index lookups, percentage calculation, vector arithmetic
     and line sorting), and another part to generate the literal
     HTML. Then the former can be reused to produce very similar
     reports in coloured plain text.

 - abstracting away all the Unix calls so as to enable a full
   Windows port. We can already do the difficult bit on Windows
   (scanning the filesystem and retrieving atime-analogues).
   Everything else is just coding - albeit quite a _lot_ of coding,
   since the Unix assumptions are woven quite tightly into the
   current code.
   + If nothing else, it's unclear what the user interface properly
     ought to be in a Windows port of agedu. A command-line job
     exactly like the Unix version might be useful to some people,
     but would certainly be strange and confusing to others.

 - it might conceivably be useful to support a choice of indexing
   strategies. The current "continuous index" mechanism's tradeoff
   of taking O(N log N) space in order to be able to support any
   age cutoff you like is not going to be ideal for everybody. A
   second, more conventional "discrete index" mechanism, which
   allows the user to specify a number of fixed cutoffs and just
   indexes each directory on those alone, would undoubtedly be a
   useful thing for large-scale users. This will require
   considerable thought about how to make the indexers pluggable at
   both index-generation time and query time.
   * however, now that we have the cut-down version of the
     continuous index, the space saving is less compelling.

 - A user requested what's essentially a VFS layer: given multiple
   index files and a map of how they fit into an overall namespace,
   we should be able to construct the right answers for any query
   about the resulting aggregated hierarchy by doing at most
   O(number of indexes * normal number of queries) work.

 - Support for filtering the scan by ownership and permissions. The
   index data structure can't handle this, so we can't build a
   single index file admitting multiple subset views; but a user
   suggested that the scan phase could record information about
   ownership and permissions in the dump file, and then the indexing
   phase could filter down to a particular sub-view - which would at
   least allow the construction of various subset indices from one
   dump file, without having to redo the full disk scan which is the
   most time-consuming part of all.