\cfg{man-identity}{agedu}{1}{2008-11-02}{Simon Tatham}{Simon Tatham}

\define{dash} \u2013{-}

\title Man page for \cw{agedu}

\U NAME

\cw{agedu} \dash correlate disk usage with last-access times to
identify large and disused data

\U SYNOPSIS

\c agedu [ options ] action [action...]
\e bbbbb   iiiiiii   iiiiii  iiiiii

\U DESCRIPTION

\cw{agedu} scans a directory tree and produces reports about how
much disk space is used in each directory and subdirectory, and also
how that usage of disk space corresponds to files with last-access
times a long time ago.

In other words, \cw{agedu} is a tool you might use to help you free
up disk space. It lets you see which directories are taking up the
most space, as \cw{du} does; but unlike \cw{du}, it also
distinguishes between large collections of data which are still in
use and ones which have not been accessed in months or years \dash
for instance, large archives downloaded, unpacked, used once, and
never cleaned up. Where \cw{du} helps you find what's using your
disk space, \cw{agedu} helps you find what's \e{wasting} your disk
space.

\cw{agedu} has several operating modes. In one mode, it scans your
disk and builds an index file containing a data structure which
allows it to efficiently retrieve any information it might need.
Typically, you would use it in this mode first, and then run it in
one of a number of \q{query} modes to display a report of the disk
space usage of a particular directory and its subdirectories. Those
reports can be produced as plain text (much like \cw{du}) or as
HTML. \cw{agedu} can even run as a miniature web server, presenting
each directory's HTML report with hyperlinks to let you navigate
around the file system to similar reports for other directories.

So you would typically start using \cw{agedu} by telling it to do a
scan of a directory tree and build an index. This is done with a
command such as

\c $ agedu -s /home/fred
\e   bbbbbbbbbbbbbbbbbbb

which will build a large data file called \c{agedu.dat} in your
current directory. (If that current directory is \e{inside}
\cw{/home/fred}, don't worry \dash \cw{agedu} is smart enough to
discount its own index file.)

Having built the index, you would now query it for reports of disk
space usage. If you have a graphical web browser, the simplest and
nicest way to query the index is by running \cw{agedu} in web server
mode:

\c $ agedu -w
\e   bbbbbbbb

which will print (among other messages) a URL on its standard output
along the lines of

\c URL: http://127.0.0.1:48638/

(That URL will always begin with \cq{127.}, meaning that it's in the
\cw{localhost} address space. So only processes running on the same
computer can even try to connect to that web server, and also there
is access control to prevent other users from seeing it \dash see
below for more detail.)

Now paste that URL into your web browser, and you will be shown a
graphical representation of the disk usage in \cw{/home/fred} and
its immediate subdirectories, with varying colours used to show the
difference between disused and recently-accessed data. Click on any
subdirectory to descend into it and see a report for its
subdirectories in turn; click on parts of the pathname at the top of
any page to return to higher-level directories. When you've finished
browsing, you can just press Ctrl-D to send an end-of-file
indication to \cw{agedu}, and it will shut down.

After that, you probably want to delete the data file
\cw{agedu.dat}, since it's pretty large. In fact, the command
\cw{agedu -R} will do this for you; and you can chain \cw{agedu}
commands on the same command line, so that instead of the above you
could have done

\c $ agedu -s /home/fred -w -R
\e   bbbbbbbbbbbbbbbbbbbbbbbbb

for a single self-contained run of \cw{agedu} which builds its
index, serves web pages from it, and cleans it up when finished.

If you don't have a graphical web browser, you can do text-based
queries as well. Having scanned \cw{/home/fred} as above, you might
run

\c $ agedu -t /home/fred
\e   bbbbbbbbbbbbbbbbbbb

which again gives a summary of the disk usage in \cw{/home/fred} and
its immediate subdirectories; but this time \cw{agedu} will print it
on standard output, in much the same format as \cw{du}. If you then
want to find out how much \e{old} data is there, you can add the
\cw{-a} option to show only files last accessed a certain length of
time ago. For example, to show only files which haven't been looked
at in six months or more:

\c $ agedu -t /home/fred -a 6m
\e   bbbbbbbbbbbbbbbbbbbbbbbbb

That's the essence of what \cw{agedu} does. It has other modes of
operation for more complex situations, and the usual array of
configurable options. The following sections contain a complete
reference for all its functionality.

\U OPERATING MODES

This section describes the operating modes supported by \cw{agedu}.
Each of these is in the form of a command-line option, sometimes
with an argument. Multiple operating-mode options may appear on the
command line, in which case \cw{agedu} will perform the specified
actions one after another. For instance, as shown in the previous
section, you might want to perform a disk scan and immediately
launch a web server giving reports from that scan.

\dt \cw{-s} \e{directory} or \cw{--scan} \e{directory}

\dd In this mode, \cw{agedu} scans the file system starting at the
specified directory, and indexes the results of the scan into a
large data file which other operating modes can query.

\lcont{

By default, the scan is restricted to a single file system (since
the expected use of \cw{agedu} is that you would probably use it
because a particular disk partition was running low on space). You
can remove that restriction using the \cw{--cross-fs} option; other
configuration options allow you to include or exclude files or
entire subdirectories from the scan. See the next section for full
details of the configurable options.

The index file is created with restrictive permissions, in case the
file system you are scanning contains confidential information in
its structure.

Index files are dependent on the characteristics of the CPU
architecture you created them on. You should not expect to be able
to move an index file between different types of computer and have
it continue to work. If you need to transfer the results of a disk
scan to a different kind of computer, see the \cw{-D} and \cw{-L}
options below.

}

\dt \cw{-w} or \cw{--web}

\dd In this mode, \cw{agedu} expects to find an index file already
written. It allocates a network port, and starts up a web server on
that port which serves reports generated from the index file. By
default it invents its own URL and prints it out.

\lcont{

The web server runs until \cw{agedu} receives an end-of-file event
on its standard input. (The expected usage is that you run it from
the command line, immediately browse web pages until you're
satisfied, and then press Ctrl-D.)

In case the index file contains any confidential information about
your file system, the web server protects the pages it serves from
access by other people. On Linux, this is done transparently by
using \cw{/proc/net/tcp} to check the owner of each incoming
connection; failing that, the web server will require a password to
view the reports, and \cw{agedu} will print the password it invented
on standard output along with the URL.

Configurable options for this mode let you specify your own address
and port number to listen on, and also specify your own choice of
authentication method (including turning authentication off
completely) and a username and password of your choice.

}

\dt \cw{-t} \e{directory} or \cw{--text} \e{directory}

\dd In this mode, \cw{agedu} generates a textual report on standard
output, listing the disk usage in the specified directory and all
its subdirectories down to a given depth. By default that depth is
1, so that you see a report for \e{directory} itself and all of its
immediate subdirectories. You can configure a different depth (or no
depth limit) using \cw{-d}, described in the next section.

\lcont{

Used on its own, \cw{-t} merely lists the \e{total} disk usage in
each subdirectory; \cw{agedu}'s additional ability to distinguish
unused from recently-used data is not activated. To activate it, use
the \cw{-a} option to specify a minimum age.

The directory structure stored in \cw{agedu}'s index file is treated
as a set of literal strings. This means that you cannot refer to
directories by synonyms. So if you ran \cw{agedu -s .}, then all the
path names you later pass to the \cw{-t} option must be either
\cq{.} or begin with \cq{./}. Similarly, symbolic links within the
directory you scanned will not be followed; you must refer to each
directory by its canonical, symlink-free pathname.

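As a sketch of the rule above (the subdirectory name \cw{src} is
purely illustrative): after scanning the current directory under the
name \cq{.}, a query has to spell the path the same way.

\c $ agedu -s .
\e   bbbbbbbbbb
\c $ agedu -t ./src
\e   bbbbbbbbbbbbbb

By the same rule, asking for \cw{src} or for an absolute path to the
same directory would not be expected to match anything in the index.
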
}

\dt \cw{-R} or \cw{--remove}

\dd In this mode, \cw{agedu} deletes its index file. Running just
\cw{agedu -R} on its own is therefore equivalent to typing \cw{rm
agedu.dat}. However, you can also put \cw{-R} on the end of a
command line to indicate that \cw{agedu} should delete its index
file after it finishes performing other operations.

\dt \cw{-D} or \cw{--dump}

\dd In this mode, \cw{agedu} reads an existing index file and
produces a dump of its contents on standard output. This dump can
later be loaded into a new index file, perhaps on another computer.

\dt \cw{-L} or \cw{--load}

\dd In this mode, \cw{agedu} expects to read a dump produced by the
\cw{-D} option from its standard input. It constructs an index file
from that dump, exactly as it would have if it had read the same
data from a disk scan in \cw{-s} mode.

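\lcont{

A minimal sketch of moving a scan between two machines with \cw{-D}
and \cw{-L}, assuming an index already exists on the first machine;
the file name \cw{scan.dump} is only an example. On the machine where
the scan was made:

\c $ agedu -D > scan.dump
\e   bbbbbbbbbbbbbbbbbbbb

Then, after copying \cw{scan.dump} to the other machine:

\c $ agedu -L < scan.dump
\e   bbbbbbbbbbbbbbbbbbbb

}
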
\dt \cw{-S} \e{directory} or \cw{--scan-dump} \e{directory}

\dd In this mode, \cw{agedu} will scan a directory tree and convert
the results straight into a dump on standard output, without
generating an index file at all. So running \cw{agedu -S /path}
should produce equivalent output to that of \cw{agedu -s /path -D},
except that the latter will produce an index file as a side effect
whereas \cw{-S} will not.

\lcont{

(The output will not be exactly \e{identical}, due to a
difference in treatment of last-access times on directories.
However, it should be effectively equivalent for most purposes. See
the documentation of the \cw{--dir-atime} option in the next section
for further detail.)

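One possible use, sketched here with a hypothetical host name
\cw{bighost} and assuming \cw{ssh} access to a machine that also has
\cw{agedu} installed, is to scan on one machine and build the index
on another without writing any intermediate file:

\c $ agedu -S /home/fred | ssh bighost agedu -L
\e   bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb

In this sketch the index ends up on \cw{bighost}, under the default
name \cw{agedu.dat} in the remote working directory unless \cw{-f}
is also given there.
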
}

\dt \cw{-H} \e{directory} or \cw{--html} \e{directory}

\dd In this mode, \cw{agedu} will generate an HTML report of the
disk usage in the specified directory and its immediate
subdirectories, in the same form that it serves from its web server
in \cw{-w} mode.

\lcont{

By default, a single HTML report will be generated and simply
written to standard output, with no hyperlinks pointing to other
similar pages. If you also specify the \cw{-d} option (see below),
\cw{agedu} will instead write out a collection of HTML files with
hyperlinks between them, and call the top-level file
\cw{index.html}.

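Both forms can be sketched as follows; the output names
\cw{report.html} and \cw{report} are only examples. The first command
writes a single page, and the second uses \cw{-d} and \cw{-o} (both
described in the next section) to write a linked set of pages into a
directory:

\c $ agedu -H /home/fred > report.html
\e   bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
\c $ agedu -H /home/fred -d 1 -o report
\e   bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
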
}

\dt \cw{--cgi}

\dd In this mode, \cw{agedu} will run as the bulk of a CGI script
which provides the same set of web pages as the built-in web server
would. It will read the usual CGI environment variables, and write
CGI-style data to its standard output.

\lcont{

The actual CGI program itself should be a tiny wrapper around
\cw{agedu} which passes it the \cw{--cgi} option, and also
(probably) \cw{-f} to locate the index file. \cw{agedu} will do
everything else.

No access control is performed in this mode: restricting access to
CGI scripts is assumed to be the job of the web server.

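Such a wrapper might look like the following sketch, in which the
index file location is a made-up example and the script assumes that
\cw{agedu} is on the web server's \cw{PATH}:

\c #!/bin/sh
\c # Hand the CGI request over to agedu, pointing it at the index.
\c exec agedu --cgi -f /var/lib/agedu/agedu.dat
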
}

\U OPTIONS

This section describes the various configuration options that affect
\cw{agedu}'s operation in one mode or another.

The following option affects nearly all modes (except \cw{-S}):

\dt \cw{-f} \e{filename} or \cw{--file} \e{filename}

\dd Specifies the location of the index file which \cw{agedu}
creates, reads or removes depending on its operating mode. By
default, this is simply \cq{agedu.dat}, in whatever is the current
working directory when you run \cw{agedu}.

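\lcont{

For example, to keep the index outside the tree being scanned (the
path \cw{/tmp/fred.dat} is only an illustration), give the same
\cw{-f} option to the scan and to each later query:

\c $ agedu -f /tmp/fred.dat -s /home/fred
\e   bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
\c $ agedu -f /tmp/fred.dat -t /home/fred
\e   bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb

}
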
The following options affect the disk-scanning modes, \cw{-s} and
\cw{-S}:

\dt \cw{--cross-fs} and \cw{--no-cross-fs}

\dd These configure whether or not the disk scan is permitted to
cross between different file systems. The default is not to:
\cw{agedu} will normally skip over subdirectories on which a
different file system is mounted. This makes it convenient when you
want to free up space on a particular file system which is running
low. However, in other circumstances you might wish to see general
information about the use of space no matter which file system it's
on (for instance, if your real concern is your backup media running
out of space, and if your backups do not treat different file
systems specially); in that situation, use \cw{--cross-fs}.

\lcont{

(Note that this default is the opposite way round from the
corresponding option in \cw{du}.)

}

\dt \cw{--prune} \e{wildcard} and \cw{--prune-path} \e{wildcard}

\dd These cause particular files or directories to be omitted
entirely from the scan. If \cw{agedu}'s scan encounters a file or
directory whose name matches the wildcard provided to the
\cw{--prune} option, it will not include that file in its index, and
also if it's a directory it will skip over it and not scan its
contents.

\lcont{

Note that in most Unix shells, wildcards will probably need to be
escaped on the command line, to prevent the shell from expanding the
wildcard before \cw{agedu} sees it.

\cw{--prune-path} is similar to \cw{--prune}, except that the
wildcard is matched against the entire pathname instead of just the
filename at the end of it. So whereas \cw{--prune *a*b*} will match
any file whose actual name contains an \cw{a} somewhere before a
\cw{b}, \cw{--prune-path *a*b*} will also match a file whose name
contains \cw{b} and which is inside a directory containing an
\cw{a}, or any file inside a directory of that form, and so on.

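As an illustration of the quoting advice above (the patterns and the
directory name \cw{cache} are only examples), single quotes stop the
shell from expanding the wildcard before \cw{agedu} sees it. The
first command skips anything whose name ends in \cw{.o}; the second
matches against whole pathnames, and would be expected to skip
everything inside \cw{/home/fred/cache}:

\c $ agedu -s /home/fred --prune '*.o'
\e   bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
\c $ agedu -s /home/fred --prune-path '/home/fred/cache/*'
\e   bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
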
}

\dt \cw{--exclude} \e{wildcard} and \cw{--exclude-path} \e{wildcard}

\dd These cause particular files or directories to be omitted from
the index, but not from the scan. If \cw{agedu}'s scan encounters a
file or directory whose name matches the wildcard provided to the
\cw{--exclude} option, it will not include that file in its index
\dash but unlike \cw{--prune}, if the file in question is a
directory it will still scan its contents and index them if they are
not ruled out themselves by \cw{--exclude} options.

\lcont{

As above, \cw{--exclude-path} is similar to \cw{--exclude}, except
that the wildcard is matched against the entire pathname.

}

\dt \cw{--include} \e{wildcard} and \cw{--include-path} \e{wildcard}

\dd These cause particular files or directories to be re-included in
the index and the scan, if they had previously been ruled out by one
of the above exclude or prune options. You can interleave include,
exclude and prune options as you wish on the command line, and if
more than one of them applies to a file then the last one takes
priority.

\lcont{

For example, if you wanted to see only the disk space taken up by
MP3 files, you might run

\c $ agedu -s . --exclude '*' --include '*.mp3'
\e   bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb

which will cause everything to be omitted from the scan, but then
the MP3 files to be put back in. If you then wanted only a subset of
those MP3s, you could then exclude some of them again by adding,
say, \cq{--exclude-path './queen/*'} (or, more efficiently,
\cq{--prune ./queen}) on the end of that command.

As with the previous two options, \cw{--include-path} is similar to
\cw{--include} except that the wildcard is matched against the
entire pathname.

}

\dt \cw{--progress}, \cw{--no-progress} and \cw{--tty-progress}

\dd When \cw{agedu} is scanning a directory tree, it will typically
print a one-line progress report every second showing where it has
reached in the scan, so you can have some idea of how much longer it
will take. (Of course, it can't predict \e{exactly} how long it will
take, since it doesn't know which of the directories it hasn't
scanned yet will turn out to be huge.)

\lcont{

By default, those progress reports are displayed on \cw{agedu}'s
standard error channel, if that channel points to a terminal device.
If you need to manually enable or disable them, you can use the
above three options to do so: \cw{--progress} unconditionally
enables the progress reports, \cw{--no-progress} unconditionally
disables them, and \cw{--tty-progress} reverts to the default
behaviour which is conditional on standard error being a terminal.

}

\dt \cw{--dir-atime} and \cw{--no-dir-atime}

\dd In normal operation, \cw{agedu} ignores the atimes (last access
times) on the \e{directories} it scans: it only pays attention to
the atimes of the \e{files} inside those directories. This is
because directory atimes tend to be reset by a lot of system
administrative tasks, such as \cw{cron} jobs which scan the file
system for one reason or another \dash or even other invocations of
\cw{agedu} itself, though it tries to avoid modifying any atimes if
possible. So the literal atimes on directories are typically not
representative of how long ago the data in question was last
accessed with real intent to use that data in particular.

\lcont{

Instead, \cw{agedu} makes up a fake atime for every directory it
scans, which is equal to the newest atime of any file in or below
that directory (or the directory's last \e{modification} time,
whichever is newer). This is based on the assumption that all
\e{important} accesses to directories are actually accesses to the
files inside those directories, so that when any file is accessed
all the directories on the path leading to it should be considered
to have been accessed as well.

In unusual cases it is possible that a directory itself might embody
important data which is accessed by reading the directory. In that
situation, \cw{agedu}'s atime-faking policy will misreport the
directory as disused. In the unlikely event that such directories
form a significant part of your disk space usage, you might want to
turn off the faking. The \cw{--dir-atime} option does this: it
causes the disk scan to read the original atimes of the directories
it scans.

The faking of atimes on directories also requires a processing pass
over the index file after the main disk scan is complete.
\cw{--dir-atime} also turns this pass off. Hence, this option
affects the \cw{-L} option as well as \cw{-s} and \cw{-S}.

(The previous section mentioned that there might be subtle
differences between the output of \cw{agedu -s /path -D} and
\cw{agedu -S /path}. This is why. Doing a scan with \cw{-s} and then
dumping it with \cw{-D} will dump the fully faked atimes on the
directories, whereas doing a scan-to-dump with \cw{-S} will dump
only \e{partially} faked atimes \dash specifically, each directory's
last modification time \dash since the subsequent processing pass
will not have had a chance to take place. However, loading either of
the resulting dump files with \cw{-L} will perform the atime-faking
processing pass, leading to the same data in the index file in each
case. In normal usage it should be safe to ignore all of this
complexity.)

}

\dt \cw{--mtime}

\dd This option causes \cw{agedu} to index files by their last
modification time instead of their last access time. You might want
to use this if your last access times were completely useless for
some reason: for example, if you had recently searched every file on
your system, the system would have lost all the information about
what files you hadn't recently accessed before then. Using this
option is liable to be less effective at finding genuinely wasted
space than the normal mode (that is, it will be more likely to flag
things as disused when they're not, so you will have more candidates
to go through by hand looking for data you don't need), but may be
better than nothing if your last-access times are unhelpful.

The following option affects all the modes that generate reports:
the web server mode \cw{-w}, the stand-alone HTML generation mode
\cw{-H} and the text report mode \cw{-t}.

\dt \cw{--files}

\dd This option causes \cw{agedu}'s reports to list the individual
files in each directory, instead of just giving a combined report
for everything that's not in a subdirectory.

The following options affect the stand-alone HTML generation mode
\cw{-H} and the text report mode \cw{-t}.

\dt \cw{-d} \e{depth} or \cw{--depth} \e{depth}

\dd This option controls the maximum depth to which \cw{agedu}
recurses when generating a text or HTML report.

\lcont{

In text mode, the default is 1, meaning that the report will include
the directory given on the command line and all of its immediate
subdirectories. A depth of two includes another level below that,
and so on; a depth of zero means \e{only} the directory on the
command line.

In HTML mode, specifying this option switches \cw{agedu} from
writing out a single HTML file to writing out multiple files which
link to each other. A depth of 1 means \cw{agedu} will write out an
HTML file for the given directory and also one for each of its
immediate subdirectories.

If you want \cw{agedu} to recurse as deeply as possible, give the
special word \cq{max} as an argument to \cw{-d}.

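For instance, using an index built from \cw{/home/fred} as in the
earlier examples, the following commands would report two levels of
subdirectories and the entire tree respectively:

\c $ agedu -t /home/fred -d 2
\e   bbbbbbbbbbbbbbbbbbbbbbbb
\c $ agedu -t /home/fred -d max
\e   bbbbbbbbbbbbbbbbbbbbbbbbbb
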
}

\dt \cw{-o} \e{filename} or \cw{--output} \e{filename}

\dd This option is used to specify an output file for \cw{agedu} to
write its report to. In text mode or single-file HTML mode, the
argument is treated as the name of a file. In multiple-file HTML
mode, the argument is treated as the name of a directory: the
directory will be created if it does not already exist, and the
output HTML files will be created inside it.

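\lcont{

As a short sketch with a made-up output name, the following command
writes a text report of data unused for six months or more to the
file \cw{old.txt} instead of standard output:

\c $ agedu -t /home/fred -a 6m -o old.txt
\e   bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb

}
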
The following options affect the web server mode \cw{-w}, and in one
case also the stand-alone HTML generation mode \cw{-H}:

\dt \cw{-r} \e{age range} or \cw{--age-range} \e{age range}

\dd The HTML reports produced by \cw{agedu} use a range of colours
to indicate how long ago data was last accessed, running from red
(representing the most disused data) to green (representing the
newest). By default, the lengths of time represented by the two ends
of that spectrum are chosen by examining the data file to see what
range of ages appears in it. However, you might want to set your own
limits, and you can do this using \cw{-r}.

\lcont{

The argument to \cw{-r} consists of a single age, or two ages
separated by a minus sign. An age is a number, followed by one of
\cq{y} (years), \cq{m} (months), \cq{w} (weeks) or \cq{d} (days).
The first age in the range represents the oldest data, and will be
coloured red in the HTML; the second age represents the newest,
coloured green. If the second age is not specified, it will default
to zero (so that green means data which has been accessed \e{just
now}).

For example, \cw{-r 2y} will mark data in red if it has been unused
for two years or more, and green if it has been accessed just now.
\cw{-r 2y-3m} will similarly mark data red if it has been unused for
two years or more, but will mark it green if it was last accessed
three months ago or more recently.

}

\dt \cw{--address} \e{addr}[\cw{:}\e{port}]

\dd Specifies the network address and port number on which
\cw{agedu} should listen when running its web server. If you want
\cw{agedu} to listen for connections coming in from any source, you
should probably specify the special IP address \cw{0.0.0.0}. If the
port number is omitted, an arbitrary unused port will be chosen for
you and displayed.

\lcont{

If you specify this option, \cw{agedu} will not print its URL on
standard output (since you are expected to know what address you
told it to listen to).

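For example, to serve reports on a fixed local port (the port number
8080 is an arbitrary choice, and must not already be in use):

\c $ agedu -w --address 127.0.0.1:8080
\e   bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb

With this command the reports would be expected at
\cw{http://127.0.0.1:8080/}, following the URL form shown earlier.
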
}

\dt \cw{--auth} \e{auth-type}

\dd Specifies how \cw{agedu} should control access to the web pages
it serves. The options are as follows:

\lcont{

\dt \cw{magic}

\dd This option only works on Linux, and only when the incoming
connection is from the same machine that \cw{agedu} is running on.
On Linux, the special file \cw{/proc/net/tcp} contains a list of
network connections currently known to the operating system kernel,
including which user id created them. So \cw{agedu} will look up
each incoming connection in that file, and allow access if it comes
from the same user id under which \cw{agedu} itself is running.
Therefore, in \cw{agedu}'s normal web server mode, you can safely
run it on a multi-user machine and no other user will be able to
read data out of your index file.

\dt \cw{basic}

\dd In this mode, \cw{agedu} will use HTTP Basic authentication: the
user will have to provide a username and password via their browser.
\cw{agedu} will normally make up a username and password for the
purpose, but you can specify your own; see below.

\dt \cw{none}

\dd In this mode, the web server is unauthenticated: anyone
connecting to it has full access to the reports generated by
\cw{agedu}. Do not do this unless there is nothing confidential at
all in your index file, or unless you are certain that nobody but
you can run processes on your computer.

\dt \cw{default}

\dd This is the default mode if you do not specify one of the above.
In this mode, \cw{agedu} will attempt to use Linux magic
authentication, but if it detects at startup time that
\cw{/proc/net/tcp} is absent or non-functional then it will fall
back to using HTTP Basic authentication and invent a user name and
password.

}

\dt \cw{--auth-file} \e{filename} or \cw{--auth-fd} \e{fd}

\dd When \cw{agedu} is using HTTP Basic authentication, these
options allow you to specify your own user name and password. If you
specify \cw{--auth-file}, these will be read from the specified
file; if you specify \cw{--auth-fd} they will instead be read from a
given file descriptor which you should have arranged to pass to
\cw{agedu}. In either case, the authentication details should
consist of the username, followed by a colon, followed by the
password, followed \e{immediately} by end of file (no trailing
newline, or else it will be considered part of the password).

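\lcont{

One way to create a suitable file without a trailing newline is the
shell's \cw{printf}, sketched below with a made-up user name,
password and file name. The \cw{umask} command keeps the new file
private to its owner, and \cw{--auth basic} (described above) selects
HTTP Basic authentication explicitly.

\c $ umask 077
\e   bbbbbbbbb
\c $ printf '%s' 'fred:secret' > auth.txt
\e   bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
\c $ agedu -w --auth basic --auth-file auth.txt
\e   bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb

The same details can instead be passed on a file descriptor, in this
sketch descriptor 3, opened by the shell:

\c $ agedu -w --auth basic --auth-fd 3 3< auth.txt
\e   bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb

}
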
\U LIMITATIONS

The data file is pretty large. The core of \cw{agedu} is the
tree-based data structure it uses in its index in order to
efficiently perform the queries it needs; this data structure
requires \cw{O(N log N)} storage. This is larger than you might
expect; a scan of my own home directory, containing half a million
files and directories and about 20GB of data, produced an index file
over 60MB in size. Furthermore, since the data file must be
memory-mapped during most processing, it can never grow larger than
available address space, so a \e{really} big filesystem may need to
be indexed on a 64-bit computer. (This is one reason for the
existence of the \cw{-D} and \cw{-L} options: you can do the
scanning on the machine with access to the filesystem, and the
indexing on a machine big enough to handle it.)

The data structure also does not usefully permit access control
within the data file, so it would be difficult \dash even given the
willingness to do additional coding \dash to run a system-wide
\cw{agedu} scan on a \cw{cron} job and serve the right subset of
reports to each user.

In certain circumstances, \cw{agedu} can report false positives
(reporting files as disused which are in fact in use) as well as the
more benign false negatives (reporting files as in use which are
not). This arises when a file is, semantically speaking, \q{read}
without actually being physically \e{read}. Typically this occurs
when a program checks whether the file's mtime has changed and only
bothers re-reading it if it has; programs which do this include
\cw{rsync}(\e{1}) and \cw{make}(\e{1}). Such programs will fail to
update the atime of unmodified files despite depending on their
continued existence; a directory full of such files will be reported
as disused by \cw{agedu} but deleting them will cause trouble.

\U LICENCE

\cw{agedu} is free software, distributed under the MIT licence. Type
\cw{agedu --licence} to see the full licence text.

\versionid $Id$