\cfg{man-identity}{agedu}{1}{2008-11-02}{Simon Tatham}{Simon Tatham}

\define{dash} \u2013{-}

\title Man page for \cw{agedu}

\U NAME

\cw{agedu} \dash correlate disk usage with last-access times to
identify large and disused data

\U SYNOPSIS

\c agedu [ options ] action [action...]
\e bbbbb   iiiiiii   iiiiii  iiiiii

\U DESCRIPTION

\cw{agedu} scans a directory tree and produces reports about how
much disk space is used in each directory and subdirectory, and also
how that usage of disk space corresponds to files with last-access
times a long time ago.

In other words, \cw{agedu} is a tool you might use to help you free
up disk space. It lets you see which directories are taking up the
most space, as \cw{du} does; but unlike \cw{du}, it also
distinguishes between large collections of data which are still in
use and ones which have not been accessed in months or years \dash
for instance, large archives downloaded, unpacked, used once, and
never cleaned up. Where \cw{du} helps you find what's using your
disk space, \cw{agedu} helps you find what's \e{wasting} your disk
space.

\cw{agedu} has several operating modes. In one mode, it scans your
disk and builds an index file containing a data structure which
allows it to efficiently retrieve any information it might need.
Typically, you would use it in this mode first, and then run it in
one of a number of \q{query} modes to display a report of the disk
space usage of a particular directory and its subdirectories. Those
reports can be produced as plain text (much like \cw{du}) or as
HTML. \cw{agedu} can even run as a miniature web server, presenting
each directory's HTML report with hyperlinks to let you navigate
around the file system to similar reports for other directories.

So you would typically start using \cw{agedu} by telling it to do a
scan of a directory tree and build an index. This is done with a
command such as

\c $ agedu -s /home/fred
\e   bbbbbbbbbbbbbbbbbbb

which will build a large data file called \c{agedu.dat} in your
current directory. (If that current directory is \e{inside}
\cw{/home/fred}, don't worry \dash \cw{agedu} is smart enough to
discount its own index file.)

Having built the index, you would now query it for reports of disk
space usage. If you have a graphical web browser, the simplest and
nicest way to query the index is by running \cw{agedu} in web server
mode:

\c $ agedu -w
\e   bbbbbbbb

which will print (among other messages) a URL on its standard output
along the lines of

\c URL: http://127.0.0.1:48638/

(That URL will always begin with \cq{127.}, meaning that it's in the
\cw{localhost} address space. So only processes running on the same
computer can even try to connect to that web server, and also there
is access control to prevent other users from seeing it \dash see
below for more detail.)

Now paste that URL into your web browser, and you will be shown a
graphical representation of the disk usage in \cw{/home/fred} and
its immediate subdirectories, with varying colours used to show the
difference between disused and recently-accessed data. Click on any
subdirectory to descend into it and see a report for its
subdirectories in turn; click on parts of the pathname at the top of
any page to return to higher-level directories. When you've finished
browsing, you can just press Ctrl-D to send an end-of-file
indication to \cw{agedu}, and it will shut down.

After that, you probably want to delete the data file
\cw{agedu.dat}, since it's pretty large. In fact, the command
\cw{agedu -R} will do this for you; and you can chain \cw{agedu}
commands on the same command line, so that instead of the above you
could have done

\c $ agedu -s /home/fred -w -R
\e   bbbbbbbbbbbbbbbbbbbbbbbbb

for a single self-contained run of \cw{agedu} which builds its
index, serves web pages from it, and cleans it up when finished.

If you don't have a graphical web browser, you can do text-based
queries as well. Having scanned \cw{/home/fred} as above, you might
run

\c $ agedu -t /home/fred
\e   bbbbbbbbbbbbbbbbbbb

which again gives a summary of the disk usage in \cw{/home/fred} and
its immediate subdirectories; but this time \cw{agedu} will print it
on standard output, in much the same format as \cw{du}. If you then
want to find out how much \e{old} data is there, you can add the
\cw{-a} option to show only files last accessed a certain length of
time ago. For example, to show only files which haven't been looked
at in six months or more:

\c $ agedu -t /home/fred -a 6m
\e   bbbbbbbbbbbbbbbbbbbbbbbbb

That's the essence of what \cw{agedu} does. It has other modes of
operation for more complex situations, and the usual array of
configurable options. The following sections contain a complete
reference for all its functionality.

\U OPERATING MODES

This section describes the operating modes supported by \cw{agedu}.
Each of these is in the form of a command-line option, sometimes
with an argument. Multiple operating-mode options may appear on the
command line, in which case \cw{agedu} will perform the specified
actions one after another. For instance, as shown in the previous
section, you might want to perform a disk scan and immediately
launch a web server giving reports from that scan.

\dt \cw{-s} \e{directory} or \cw{--scan} \e{directory}

\dd In this mode, \cw{agedu} scans the file system starting at the
specified directory, and indexes the results of the scan into a
large data file which other operating modes can query.

\lcont{

By default, the scan is restricted to a single file system (since
the expected use of \cw{agedu} is that you would probably use it
because a particular disk partition was running low on space). You
can remove that restriction using the \cw{--cross-fs} option; other
configuration options allow you to include or exclude files or
entire subdirectories from the scan. See the next section for full
details of the configurable options.

The index file is created with restrictive permissions, in case the
file system you are scanning contains confidential information in
its structure.

Index files are dependent on the characteristics of the CPU
architecture you created them on. You should not expect to be able
to move an index file between different types of computer and have
it continue to work. If you need to transfer the results of a disk
scan to a different kind of computer, see the \cw{-D} and \cw{-L}
options below.

}

\dt \cw{-w} or \cw{--web}

\dd In this mode, \cw{agedu} expects to find an index file already
written. It allocates a network port, and starts up a web server on
that port which serves reports generated from the index file. By
default it invents its own URL and prints it out.

\lcont{

The web server runs until \cw{agedu} receives an end-of-file event on
its standard input. (The expected usage is that you run it from the
command line, immediately browse web pages until you're satisfied, and
then press Ctrl-D.) To disable the EOF behaviour, use the
\cw{--no-eof} option.

In case the index file contains any confidential information about
your file system, the web server protects the pages it serves from
access by other people. On Linux, this is done transparently by
means of using \cw{/proc/net/tcp} to check the owner of each
incoming connection; failing that, the web server will require a
password to view the reports, and \cw{agedu} will print the password
it invented on standard output along with the URL.

Configurable options for this mode let you specify your own address
and port number to listen on, and also specify your own choice of
authentication method (including turning authentication off
completely) and a username and password of your choice.

}

\dt \cw{-t} \e{directory} or \cw{--text} \e{directory}

\dd In this mode, \cw{agedu} generates a textual report on standard
output, listing the disk usage in the specified directory and all
its subdirectories down to a given depth. By default that depth is
1, so that you see a report for \e{directory} itself and all of its
immediate subdirectories. You can configure a different depth (or no
depth limit) using \cw{-d}, described in the next section.

\lcont{

Used on its own, \cw{-t} merely lists the \e{total} disk usage in
each subdirectory; \cw{agedu}'s additional ability to distinguish
unused from recently-used data is not activated. To activate it, use
the \cw{-a} option to specify a minimum age.

The directory structure stored in \cw{agedu}'s index file is treated
as a set of literal strings. This means that you cannot refer to
directories by synonyms. So if you ran \cw{agedu -s .}, then all the
path names you later pass to the \cw{-t} option must be either
\cq{.} or begin with \cq{./}. Similarly, symbolic links within the
directory you scanned will not be followed; you must refer to each
directory by its canonical, symlink-free pathname.

}

\dt \cw{-R} or \cw{--remove}

\dd In this mode, \cw{agedu} deletes its index file. Running just
\cw{agedu -R} on its own is therefore equivalent to typing \cw{rm
agedu.dat}. However, you can also put \cw{-R} on the end of a
command line to indicate that \cw{agedu} should delete its index
file after it finishes performing other operations.

\dt \cw{-D} or \cw{--dump}

\dd In this mode, \cw{agedu} reads an existing index file and
produces a dump of its contents on standard output. This dump can
later be loaded into a new index file, perhaps on another computer.

\dt \cw{-L} or \cw{--load}

\dd In this mode, \cw{agedu} expects to read a dump produced by the
\cw{-D} option from its standard input. It constructs an index file
from that dump, exactly as it would have if it had read the same
data from a disk scan in \cw{-s} mode.

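\lcont{

As a sketch of how \cw{-D} and \cw{-L} might be combined to move scan
results between machines, you could write the dump to a file on the
machine that holds the index (the name \cw{agedu.dump} is an arbitrary
choice for this illustration, not something \cw{agedu} requires):

\c $ agedu -D > agedu.dump
\e   bbbbbbbbbbbbbbbbbbbbb

and then, after copying that file to the other machine by whatever
means you prefer, rebuild an index from it there:

\c $ agedu -L < agedu.dump
\e   bbbbbbbbbbbbbbbbbbbbb

}
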
\dt \cw{-S} \e{directory} or \cw{--scan-dump} \e{directory}

\dd In this mode, \cw{agedu} will scan a directory tree and convert
the results straight into a dump on standard output, without
generating an index file at all. So running \cw{agedu -S /path}
should produce equivalent output to that of \cw{agedu -s /path -D},
except that the latter will produce an index file as a side effect
whereas \cw{-S} will not.

\lcont{

(The output will not be exactly \e{identical}, due to a
difference in treatment of last-access times on directories.
However, it should be effectively equivalent for most purposes. See
the documentation of the \cw{--dir-atime} option in the next section
for further detail.)

}

\dt \cw{-H} \e{directory} or \cw{--html} \e{directory}

\dd In this mode, \cw{agedu} will generate an HTML report of the
disk usage in the specified directory and its immediate
subdirectories, in the same form that it serves from its web server
in \cw{-w} mode.

\lcont{

By default, a single HTML report will be generated and simply
written to standard output, with no hyperlinks pointing to other
similar pages. If you also specify the \cw{-d} option (see below),
\cw{agedu} will instead write out a collection of HTML files with
hyperlinks between them, and call the top-level file
\cw{index.html}.

}

\dt \cw{--cgi}

\dd In this mode, \cw{agedu} will run as the bulk of a CGI script
which provides the same set of web pages as the built-in web server
would. It will read the usual CGI environment variables, and write
CGI-style data to its standard output.

\lcont{

The actual CGI program itself should be a tiny wrapper around
\cw{agedu} which passes it the \cw{--cgi} option, and also
(probably) \cw{-f} to locate the index file. \cw{agedu} will do
everything else.

No access control is performed in this mode: restricting access to
CGI scripts is assumed to be the job of the web server.

}

\dt \cw{-h} or \cw{--help}

\dd Causes \cw{agedu} to print some help text and terminate
immediately.

\dt \cw{-V} or \cw{--version}

\dd Causes \cw{agedu} to print its version number and terminate
immediately.

\U OPTIONS

This section describes the various configuration options that affect
\cw{agedu}'s operation in one mode or another.

The following option affects nearly all modes (except \cw{-S}):

\dt \cw{-f} \e{filename} or \cw{--file} \e{filename}

\dd Specifies the location of the index file which \cw{agedu}
creates, reads or removes depending on its operating mode. By
default, this is simply \cq{agedu.dat}, in whatever is the current
working directory when you run \cw{agedu}.

The following options affect the disk-scanning modes, \cw{-s} and
\cw{-S}:

\dt \cw{--cross-fs} and \cw{--no-cross-fs}

\dd These configure whether or not the disk scan is permitted to
cross between different file systems. The default is not to:
\cw{agedu} will normally skip over subdirectories on which a
different file system is mounted. This makes it convenient when you
want to free up space on a particular file system which is running
low. However, in other circumstances you might wish to see general
information about the use of space no matter which file system it's
on (for instance, if your real concern is your backup media running
out of space, and if your backups do not treat different file
systems specially); in that situation, use \cw{--cross-fs}.

\lcont{

(Note that this default is the opposite way round from the
corresponding option in \cw{du}.)

}

\dt \cw{--prune} \e{wildcard} and \cw{--prune-path} \e{wildcard}

\dd These cause particular files or directories to be omitted
entirely from the scan. If \cw{agedu}'s scan encounters a file or
directory whose name matches the wildcard provided to the
\cw{--prune} option, it will not include that file in its index, and
also if it's a directory it will skip over it and not scan its
contents.

\lcont{

Note that in most Unix shells, wildcards will probably need to be
escaped on the command line, to prevent the shell from expanding the
wildcard before \cw{agedu} sees it.

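For instance, a minimal sketch of a scan that skips compiled object
files (the \cq{*.o} pattern is only an illustrative choice) quotes the
wildcard so that the shell passes it through unchanged:

\c $ agedu -s /home/fred --prune '*.o'
\e   bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
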
\cw{--prune-path} is similar to \cw{--prune}, except that the
wildcard is matched against the entire pathname instead of just the
filename at the end of it. So whereas \cw{--prune *a*b*} will match
any file whose actual name contains an \cw{a} somewhere before a
\cw{b}, \cw{--prune-path *a*b*} will also match a file whose name
contains \cw{b} and which is inside a directory containing an
\cw{a}, or any file inside a directory of that form, and so on.

}

\dt \cw{--exclude} \e{wildcard} and \cw{--exclude-path} \e{wildcard}

\dd These cause particular files or directories to be omitted from
the index, but not from the scan. If \cw{agedu}'s scan encounters a
file or directory whose name matches the wildcard provided to the
\cw{--exclude} option, it will not include that file in its index
\dash but unlike \cw{--prune}, if the file in question is a
directory it will still scan its contents and index them if they are
not ruled out themselves by \cw{--exclude} options.

\lcont{

As above, \cw{--exclude-path} is similar to \cw{--exclude}, except
that the wildcard is matched against the entire pathname.

}

\dt \cw{--include} \e{wildcard} and \cw{--include-path} \e{wildcard}

\dd These cause particular files or directories to be re-included in
the index and the scan, if they had previously been ruled out by one
of the above exclude or prune options. You can interleave include,
exclude and prune options as you wish on the command line, and if
more than one of them applies to a file then the last one takes
priority.

\lcont{

For example, if you wanted to see only the disk space taken up by
MP3 files, you might run

\c $ agedu -s . --exclude '*' --include '*.mp3'
\e   bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb

which will cause everything to be omitted from the scan, but then
the MP3 files to be put back in. If you then wanted only a subset of
those MP3s, you could then exclude some of them again by adding,
say, \cq{--exclude-path './queen/*'} (or, more efficiently,
\cq{--prune ./queen}) on the end of that command.

As with the previous two options, \cw{--include-path} is similar to
\cw{--include} except that the wildcard is matched against the
entire pathname.

}

\dt \cw{--progress}, \cw{--no-progress} and \cw{--tty-progress}

\dd When \cw{agedu} is scanning a directory tree, it will typically
print a one-line progress report every second showing where it has
reached in the scan, so you can have some idea of how much longer it
will take. (Of course, it can't predict \e{exactly} how long it will
take, since it doesn't know which of the directories it hasn't
scanned yet will turn out to be huge.)

\lcont{

By default, those progress reports are displayed on \cw{agedu}'s
standard error channel, if that channel points to a terminal device.
If you need to manually enable or disable them, you can use the
above three options to do so: \cw{--progress} unconditionally
enables the progress reports, \cw{--no-progress} unconditionally
disables them, and \cw{--tty-progress} reverts to the default
behaviour which is conditional on standard error being a terminal.

}

\dt \cw{--dir-atime} and \cw{--no-dir-atime}

\dd In normal operation, \cw{agedu} ignores the atimes (last access
times) on the \e{directories} it scans: it only pays attention to
the atimes of the \e{files} inside those directories. This is
because directory atimes tend to be reset by a lot of system
administrative tasks, such as \cw{cron} jobs which scan the file
system for one reason or another \dash or even other invocations of
\cw{agedu} itself, though it tries to avoid modifying any atimes if
possible. So the literal atimes on directories are typically not
representative of how long ago the data in question was last
accessed with real intent to use that data in particular.

\lcont{

Instead, \cw{agedu} makes up a fake atime for every directory it
scans, which is equal to the newest atime of any file in or below
that directory (or the directory's last \e{modification} time,
whichever is newer). This is based on the assumption that all
\e{important} accesses to directories are actually accesses to the
files inside those directories, so that when any file is accessed
all the directories on the path leading to it should be considered
to have been accessed as well.

In unusual cases it is possible that a directory itself might embody
important data which is accessed by reading the directory. In that
situation, \cw{agedu}'s atime-faking policy will misreport the
directory as disused. In the unlikely event that such directories
form a significant part of your disk space usage, you might want to
turn off the faking. The \cw{--dir-atime} option does this: it
causes the disk scan to read the original atimes of the directories
it scans.

The faking of atimes on directories also requires a processing pass
over the index file after the main disk scan is complete.
\cw{--dir-atime} also turns this pass off. Hence, this option
affects the \cw{-L} option as well as \cw{-s} and \cw{-S}.

(The previous section mentioned that there might be subtle
differences between the output of \cw{agedu -s /path -D} and
\cw{agedu -S /path}. This is why. Doing a scan with \cw{-s} and then
dumping it with \cw{-D} will dump the fully faked atimes on the
directories, whereas doing a scan-to-dump with \cw{-S} will dump
only \e{partially} faked atimes \dash specifically, each directory's
last modification time \dash since the subsequent processing pass
will not have had a chance to take place. However, loading either of
the resulting dump files with \cw{-L} will perform the atime-faking
processing pass, leading to the same data in the index file in each
case. In normal usage it should be safe to ignore all of this
complexity.)

}

\dt \cw{--mtime}

\dd This option causes \cw{agedu} to index files by their last
modification time instead of their last access time. You might want
to use this if your last access times were completely useless for
some reason: for example, if you had recently searched every file on
your system, the system would have lost all the information about
what files you hadn't recently accessed before then. Using this
option is liable to be less effective at finding genuinely wasted
space than the normal mode (that is, it will be more likely to flag
things as disused when they're not, so you will have more candidates
to go through by hand looking for data you don't need), but may be
better than nothing if your last-access times are unhelpful.

\lcont{

Another use for this mode might be to find \e{recently created}
large data. If your disk has been gradually filling up for years,
the default mode of \cw{agedu} will let you find unused data to
delete; but if you know your disk had plenty of space recently and
now it's suddenly full, and you suspect that some rogue program has
left a large core dump or output file, then \cw{agedu --mtime} might
be a convenient way to locate the culprit.

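As a sketch of that use, here is the chained scan, web server and
clean-up run from the introduction with \cw{--mtime} added:

\c $ agedu -s /home/fred --mtime -w -R
\e   bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb

In the resulting reports, the data coloured as newest is then the
data most recently modified, which is where a freshly written large
file should show up.
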
}

The following option affects all the modes that generate reports:
the web server mode \cw{-w}, the stand-alone HTML generation mode
\cw{-H} and the text report mode \cw{-t}.

\dt \cw{--files}

\dd This option causes \cw{agedu}'s reports to list the individual
files in each directory, instead of just giving a combined report
for everything that's not in a subdirectory.

The following option affects the text report mode \cw{-t}:

\dt \cw{-a} \e{age} or \cw{--age} \e{age}

\dd This option tells \cw{agedu} to report only files of at least the
specified age. An age is specified as a number, followed by one of
\cq{y} (years), \cq{m} (months), \cq{w} (weeks) or \cq{d} (days).
(This syntax is also used by the \cw{-r} option.) For example, \cw{-a
6m} will produce a text report which includes only files at least six
months old.

The following options affect the stand-alone HTML generation mode
\cw{-H} and the text report mode \cw{-t}:

\dt \cw{-d} \e{depth} or \cw{--depth} \e{depth}

\dd This option controls the maximum depth to which \cw{agedu}
recurses when generating a text or HTML report.

\lcont{

In text mode, the default is 1, meaning that the report will include
the directory given on the command line and all of its immediate
subdirectories. A depth of two includes another level below that,
and so on; a depth of zero means \e{only} the directory on the
command line.

In HTML mode, specifying this option switches \cw{agedu} from
writing out a single HTML file to writing out multiple files which
link to each other. A depth of 1 means \cw{agedu} will write out an
HTML file for the given directory and also one for each of its
immediate subdirectories.

If you want \cw{agedu} to recurse as deeply as possible, give the
special word \cq{max} as an argument to \cw{-d}.

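For example, assuming an index built from a scan of \cw{/home/fred}
as in the introductory examples, the following would report two
levels of subdirectories, counting only data not accessed for a year
or more:

\c $ agedu -t /home/fred -d 2 -a 1y
\e   bbbbbbbbbbbbbbbbbbbbbbbbbbbbbb

and this would recurse through the whole scanned tree:

\c $ agedu -t /home/fred -d max
\e   bbbbbbbbbbbbbbbbbbbbbbbbbb
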
}

\dt \cw{-o} \e{filename} or \cw{--output} \e{filename}

\dd This option is used to specify an output file for \cw{agedu} to
write its report to. In text mode or single-file HTML mode, the
argument is treated as the name of a file. In multiple-file HTML
mode, the argument is treated as the name of a directory: the
directory will be created if it does not already exist, and the
output HTML files will be created inside it.

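\lcont{

As an illustrative sketch (the output names \cw{report.html} and
\cw{reports} are arbitrary choices), a single-file HTML report could
be saved with

\c $ agedu -H /home/fred -o report.html
\e   bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb

whereas adding \cw{-d} switches to multiple-file mode, so that the
argument to \cw{-o} names a directory instead:

\c $ agedu -H /home/fred -d 1 -o reports
\e   bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb

}
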
The following options affect the web server mode \cw{-w}, and in some
cases also the stand-alone HTML generation mode \cw{-H}:

\dt \cw{-r} \e{age range} or \cw{--age-range} \e{age range}

\dd The HTML reports produced by \cw{agedu} use a range of colours
to indicate how long ago data was last accessed, running from red
(representing the most disused data) to green (representing the
newest). By default, the lengths of time represented by the two ends
of that spectrum are chosen by examining the data file to see what
range of ages appears in it. However, you might want to set your own
limits, and you can do this using \cw{-r}.

\lcont{

The argument to \cw{-r} consists of a single age, or two ages
separated by a minus sign. An age is a number, followed by one of
\cq{y} (years), \cq{m} (months), \cq{w} (weeks) or \cq{d} (days).
(This syntax is also used by the \cw{-a} option.) The first age in the
range represents the oldest data, and will be coloured red in the
HTML; the second age represents the newest, coloured green. If the
second age is not specified, it will default to zero (so that green
means data which has been accessed \e{just now}).

For example, \cw{-r 2y} will mark data in red if it has been unused
for two years or more, and green if it has been accessed just now.
\cw{-r 2y-3m} will similarly mark data red if it has been unused for
two years or more, but will mark it green if it has been accessed
within the last three months.

}

\dt \cw{--address} \e{addr}[\cw{:}\e{port}]

\dd Specifies the network address and port number on which \cw{agedu}
should listen when running its web server. If you want \cw{agedu} to
listen for connections coming in from any source, specify the address
as the special value \cw{ANY}. If the port number is omitted, an
arbitrary unused port will be chosen for you and displayed.

\lcont{

If you specify this option, \cw{agedu} will not print its URL on
standard output (since you are expected to know what address you
told it to listen to).

}

\dt \cw{--auth} \e{auth-type}

\dd Specifies how \cw{agedu} should control access to the web pages
it serves. The options are as follows:

\lcont{

\dt \cw{magic}

\dd This option only works on Linux, and only when the incoming
connection is from the same machine that \cw{agedu} is running on.
On Linux, the special file \cw{/proc/net/tcp} contains a list of
network connections currently known to the operating system kernel,
including which user id created them. So \cw{agedu} will look up
each incoming connection in that file, and allow access if it comes
from the same user id under which \cw{agedu} itself is running.
Therefore, in \cw{agedu}'s normal web server mode, you can safely
run it on a multi-user machine and no other user will be able to
read data out of your index file.

\dt \cw{basic}

\dd In this mode, \cw{agedu} will use HTTP Basic authentication: the
user will have to provide a username and password via their browser.
\cw{agedu} will normally make up a username and password for the
purpose, but you can specify your own; see below.

\dt \cw{none}

\dd In this mode, the web server is unauthenticated: anyone
connecting to it has full access to the reports generated by
\cw{agedu}. Do not do this unless there is nothing confidential at
all in your index file, or unless you are certain that nobody but
you can run processes on your computer.

\dt \cw{default}

\dd This is the default mode if you do not specify one of the above.
In this mode, \cw{agedu} will attempt to use Linux magic
authentication, but if it detects at startup time that
\cw{/proc/net/tcp} is absent or non-functional then it will fall
back to using HTTP Basic authentication and invent a user name and
password.

}

\dt \cw{--auth-file} \e{filename} or \cw{--auth-fd} \e{fd}

\dd When \cw{agedu} is using HTTP Basic authentication, these
options allow you to specify your own user name and password. If you
specify \cw{--auth-file}, these will be read from the specified
file; if you specify \cw{--auth-fd} they will instead be read from a
given file descriptor which you should have arranged to pass to
\cw{agedu}. In either case, the authentication details should
consist of the username, followed by a colon, followed by the
password, followed \e{immediately} by end of file (no trailing
newline, or else it will be considered part of the password).

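\lcont{

One way to create a suitable file from a shell, sketched here with
the placeholder credentials \cq{fred} and \cq{secret} and the
arbitrary file name \cw{auth.txt}, is to use \cw{printf}, which does
not append a newline unless asked to:

\c $ printf '%s:%s' fred secret > auth.txt
\e   bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb

The web server could then be started with those credentials:

\c $ agedu -w --auth basic --auth-file auth.txt
\e   bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb

}
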
\dt \cw{--title} \e{title}

\dd Specify the string that appears at the start of the \cw{<title>}
section of the output HTML pages. The default is \cq{agedu}. This
title is followed by a colon and then the path you're viewing within
the index file. You might use this option if you were serving
\cw{agedu} reports for several different servers and wanted to make it
clearer which one a user was looking at.

\dt \cw{--no-eof}

\dd Stop \cw{agedu} in web server mode from looking for end-of-file on
standard input and treating it as a signal to terminate.

\U LIMITATIONS

The data file is pretty large. The core of \cw{agedu} is the
tree-based data structure it uses in its index in order to
efficiently perform the queries it needs; this data structure
requires \cw{O(N log N)} storage. This is larger than you might
expect; a scan of my own home directory, containing half a million
files and directories and about 20GB of data, produced an index file
over 60MB in size. Furthermore, since the data file must be
memory-mapped during most processing, it can never grow larger than
available address space, so a \e{really} big filesystem may need to
be indexed on a 64-bit computer. (This is one reason for the
existence of the \cw{-D} and \cw{-L} options: you can do the
scanning on the machine with access to the filesystem, and the
indexing on a machine big enough to handle it.)

The data structure also does not usefully permit access control
within the data file, so it would be difficult \dash even given the
willingness to do additional coding \dash to run a system-wide
\cw{agedu} scan on a \cw{cron} job and serve the right subset of
reports to each user.

In certain circumstances, \cw{agedu} can report false positives
(reporting files as disused which are in fact in use) as well as the
more benign false negatives (reporting files as in use which are
not). This arises when a file is, semantically speaking, \q{read}
without actually being physically \e{read}. Typically this occurs
when a program checks whether the file's mtime has changed and only
bothers re-reading it if it has; programs which do this include
\cw{rsync}(\e{1}) and \cw{make}(\e{1}). Such programs will fail to
update the atime of unmodified files despite depending on their
continued existence; a directory full of such files will be reported
as disused by \cw{agedu} even in situations where deleting them will
cause trouble.

Finally, of course, \cw{agedu}'s normal usage mode depends critically
on the OS providing last-access times which are at least approximately
right. So a file system mounted with Linux's \cq{noatime} option, or
the equivalent on any other OS, will not give useful results!
(However, the Linux mount option \cq{relatime}, which distributions
now tend to use by default, should be fine for all but specialist
purposes: it reduces the accuracy of last-access times so that they
might be wrong by up to 24 hours, but if you're looking for files that
have been unused for months or years, that's not a problem.)

\U LICENCE

\cw{agedu} is free software, distributed under the MIT licence. Type
\cw{agedu --licence} to see the full licence text.

\versionid $Id$