\cfg{man-identity}{agedu}{1}{2008-11-02}{Simon Tatham}{Simon Tatham}

\define{dash} \u2013{-}

\title Man page for \cw{agedu}

\U NAME

\cw{agedu} \dash correlate disk usage with last-access times to
identify large and disused data

\U SYNOPSIS

\c agedu [ options ] action [action...]
\e bbbbb   iiiiiii   iiiiii  iiiiii

\U DESCRIPTION

\cw{agedu} scans a directory tree and produces reports about how
much disk space is used in each directory and subdirectory, and also
how that usage of disk space corresponds to files with last-access
times a long time ago.

In other words, \cw{agedu} is a tool you might use to help you free
up disk space. It lets you see which directories are taking up the
most space, as \cw{du} does; but unlike \cw{du}, it also
distinguishes between large collections of data which are still in
use and ones which have not been accessed in months or years \dash
for instance, large archives downloaded, unpacked, used once, and
never cleaned up. Where \cw{du} helps you find what's using your
disk space, \cw{agedu} helps you find what's \e{wasting} your disk
space.

\cw{agedu} has several operating modes. In one mode, it scans your
disk and builds an index file containing a data structure which
allows it to efficiently retrieve any information it might need.
Typically, you would use it in this mode first, and then run it in
one of a number of \q{query} modes to display a report of the disk
space usage of a particular directory and its subdirectories. Those
reports can be produced as plain text (much like \cw{du}) or as
HTML. \cw{agedu} can even run as a miniature web server, presenting
each directory's HTML report with hyperlinks to let you navigate
around the file system to similar reports for other directories.

So you would typically start using \cw{agedu} by telling it to do a
scan of a directory tree and build an index. This is done with a
command such as

\c $ agedu -s /home/fred
\e   bbbbbbbbbbbbbbbbbbb

which will build a large data file called \c{agedu.dat} in your
current directory. (If that current directory is \e{inside}
\cw{/home/fred}, don't worry \dash \cw{agedu} is smart enough to
discount its own index file.)

Having built the index, you would now query it for reports of disk
space usage. If you have a graphical web browser, the simplest and
nicest way to query the index is by running \cw{agedu} in web server
mode:

\c $ agedu -w
\e   bbbbbbbb

which will print (among other messages) a URL on its standard output
along the lines of

\c URL: http://127.0.0.1:48638/

(That URL will always begin with \cq{127.}, meaning that it's in the
\cw{localhost} address space. So only processes running on the same
computer can even try to connect to that web server, and also there
is access control to prevent other users from seeing it \dash see
below for more detail.)

Now paste that URL into your web browser, and you will be shown a
graphical representation of the disk usage in \cw{/home/fred} and
its immediate subdirectories, with varying colours used to show the
difference between disused and recently-accessed data. Click on any
subdirectory to descend into it and see a report for its
subdirectories in turn; click on parts of the pathname at the top of
any page to return to higher-level directories. When you've finished
browsing, you can just press Ctrl-D to send an end-of-file
indication to \cw{agedu}, and it will shut down.

After that, you probably want to delete the data file
\cw{agedu.dat}, since it's pretty large. In fact, the command
\cw{agedu -R} will do this for you; and you can chain \cw{agedu}
commands on the same command line, so that instead of the above you
could have done

\c $ agedu -s /home/fred -w -R
\e   bbbbbbbbbbbbbbbbbbbbbbbbb

for a single self-contained run of \cw{agedu} which builds its
index, serves web pages from it, and cleans it up when finished.

If you don't have a graphical web browser, you can do text-based
queries as well. Having scanned \cw{/home/fred} as above, you might
run

\c $ agedu -t /home/fred
\e   bbbbbbbbbbbbbbbbbbb

which again gives a summary of the disk usage in \cw{/home/fred} and
its immediate subdirectories; but this time \cw{agedu} will print it
on standard output, in much the same format as \cw{du}. If you then
want to find out how much \e{old} data is there, you can add the
\cw{-a} option to show only files last accessed a certain length of
time ago. For example, to show only files which haven't been looked
at in six months or more:

\c $ agedu -t /home/fred -a 6m
\e   bbbbbbbbbbbbbbbbbbbbbbbbb

That's the essence of what \cw{agedu} does. It has other modes of
operation for more complex situations, and the usual array of
configurable options. The following sections contain a complete
reference for all its functionality.

\U OPERATING MODES

This section describes the operating modes supported by \cw{agedu}.
Each of these is in the form of a command-line option, sometimes
with an argument. Multiple operating-mode options may appear on the
command line, in which case \cw{agedu} will perform the specified
actions one after another. For instance, as shown in the previous
section, you might want to perform a disk scan and immediately
launch a web server giving reports from that scan.

\dt \cw{-s} \e{directory} or \cw{--scan} \e{directory}

\dd In this mode, \cw{agedu} scans the file system starting at the
specified directory, and indexes the results of the scan into a
large data file which other operating modes can query.

\lcont{

By default, the scan is restricted to a single file system (since
the expected use of \cw{agedu} is that you would probably use it
because a particular disk partition was running low on space). You
can remove that restriction using the \cw{--cross-fs} option; other
configuration options allow you to include or exclude files or
entire subdirectories from the scan. See the next section for full
details of the configurable options.

The index file is created with restrictive permissions, in case the
file system you are scanning contains confidential information in
its structure.

Index files are dependent on the characteristics of the CPU
architecture you created them on. You should not expect to be able
to move an index file between different types of computer and have
it continue to work. If you need to transfer the results of a disk
scan to a different kind of computer, see the \cw{-D} and \cw{-L}
options below.

}

\dt \cw{-w} or \cw{--web}

\dd In this mode, \cw{agedu} expects to find an index file already
written. It allocates a network port, and starts up a web server on
that port which serves reports generated from the index file. By
default it invents its own URL and prints it out.

\lcont{

The web server runs until \cw{agedu} receives an end-of-file event
on its standard input. (The expected usage is that you run it from
the command line, immediately browse web pages until you're
satisfied, and then press Ctrl-D.)

In case the index file contains any confidential information about
your file system, the web server protects the pages it serves from
access by other people. On Linux, this is done transparently by
means of using \cw{/proc/net/tcp} to check the owner of each
incoming connection; failing that, the web server will require a
password to view the reports, and \cw{agedu} will print the password
it invented on standard output along with the URL.

Configurable options for this mode let you specify your own address
and port number to listen on, and also specify your own choice of
authentication method (including turning authentication off
completely) and a username and password of your choice.

}

\dt \cw{-t} \e{directory} or \cw{--text} \e{directory}

\dd In this mode, \cw{agedu} generates a textual report on standard
output, listing the disk usage in the specified directory and all
its subdirectories down to a fixed depth. By default that depth is
1, so that you see a report for \e{directory} itself and all of its
immediate subdirectories. You can configure a different depth using
\cw{-d}, described in the next section.

\lcont{

Used on its own, \cw{-t} merely lists the \e{total} disk usage in
each subdirectory; \cw{agedu}'s additional ability to distinguish
unused from recently-used data is not activated. To activate it, use
the \cw{-a} option to specify a minimum age.

The directory structure stored in \cw{agedu}'s index file is treated
as a set of literal strings. This means that you cannot refer to
directories by synonyms. So if you ran \cw{agedu -s .}, then all the
path names you later pass to the \cw{-t} option must be either
\cq{.} or begin with \cq{./}. Similarly, symbolic links within the
directory you scanned will not be followed; you must refer to each
directory by its canonical, symlink-free pathname.
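
As a sketch of what this means in practice, suppose the scan was
started from the current directory. The subdirectory name \cq{src}
below is purely illustrative:

\c $ agedu -s .
\e   bbbbbbbbbb
\c $ agedu -t ./src
\e   bbbbbbbbbbbbbb

The second command names the directory exactly as the scan recorded
it. By the rule above, \cw{agedu -t src} or an absolute path to the
same directory would not be expected to match anything in the index.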

}

\dt \cw{-R} or \cw{--remove}

\dd In this mode, \cw{agedu} deletes its index file. Running just
\cw{agedu -R} on its own is therefore equivalent to typing \cw{rm
agedu.dat}. However, you can also put \cw{-R} on the end of a
command line to indicate that \cw{agedu} should delete its index
file after it finishes performing other operations.

\dt \cw{-D} or \cw{--dump}

\dd In this mode, \cw{agedu} reads an existing index file and
produces a dump of its contents on standard output. This dump can
later be loaded into a new index file, perhaps on another computer.

\dt \cw{-L} or \cw{--load}

\dd In this mode, \cw{agedu} expects to read a dump produced by the
\cw{-D} option from its standard input. It constructs an index file
from that dump, exactly as it would have if it had read the same
data from a disk scan in \cw{-s} mode.

\dt \cw{-S} \e{directory} or \cw{--scan-dump} \e{directory}

\dd In this mode, \cw{agedu} will scan a directory tree and convert
the results straight into a dump on standard output, without
generating an index file at all. So running \cw{agedu -S /path}
should produce equivalent output to that of \cw{agedu -s /path -D},
except that the latter will produce an index file as a side effect
whereas \cw{-S} will not.

\lcont{

(The output will not be exactly \e{identical}, due to a
difference in treatment of last-access times on directories.
However, it should be effectively equivalent for most purposes. See
the documentation of the \cw{--dir-atime} option in the next section
for further detail.)
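
For instance, a scan might be moved between machines along the
following lines. This is only a sketch: the file name
\cq{fred.dump} is invented for illustration, and copying the file
between the two machines is left to whatever tool you prefer.

\c $ agedu -S /home/fred > fred.dump
\e   bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb

and then, on the machine which will hold the index:

\c $ agedu -L < fred.dump
\e   bbbbbbbbbbbbbbbbbbbb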

}

\dt \cw{-H} \e{directory} or \cw{--html} \e{directory}

\dd In this mode, \cw{agedu} will generate an HTML report of the
disk usage in the specified directory and its immediate
subdirectories, in the same form that it serves from its web server
in \cw{-w} mode. However, this time, a single HTML report will be
generated and simply written to standard output, with no hyperlinks
pointing to other similar pages.

\U OPTIONS

This section describes the various configuration options that affect
\cw{agedu}'s operation in one mode or another.

The following option affects nearly all modes (except \cw{-S}):

\dt \cw{-f} \e{filename} or \cw{--file} \e{filename}

\dd Specifies the location of the index file which \cw{agedu}
creates, reads or removes depending on its operating mode. By
default, this is simply \cq{agedu.dat}, in whatever is the current
working directory when you run \cw{agedu}.
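
\lcont{

As an illustration, the index could be kept outside the tree being
scanned and then named again when querying. The path
\cq{/tmp/fred.dat} is an arbitrary choice for this sketch, and
placing \cw{-f} ahead of the operating-mode option is an assumption
rather than a documented requirement:

\c $ agedu -f /tmp/fred.dat -s /home/fred
\e   bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
\c $ agedu -f /tmp/fred.dat -t /home/fred
\e   bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb

}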

The following options affect the disk-scanning modes, \cw{-s} and
\cw{-S}:

\dt \cw{--cross-fs} and \cw{--no-cross-fs}

\dd These configure whether or not the disk scan is permitted to
cross between different file systems. The default is not to:
\cw{agedu} will normally skip over subdirectories on which a
different file system is mounted. This makes it convenient when you
want to free up space on a particular file system which is running
low. However, in other circumstances you might wish to see general
information about the use of space no matter which file system it's
on (for instance, if your real concern is your backup media running
out of space, and if your backups do not treat different file
systems specially); in that situation, use \cw{--cross-fs}.

\lcont{

(Note that this default is the opposite way round from the
corresponding option in \cw{du}.)

}

\dt \cw{--prune} \e{wildcard} and \cw{--prune-path} \e{wildcard}

\dd These cause particular files or directories to be omitted
entirely from the scan. If \cw{agedu}'s scan encounters a file or
directory whose name matches the wildcard provided to the
\cw{--prune} option, it will not include that file in its index, and
also if it's a directory it will skip over it and not scan its
contents.

\lcont{

Note that in most Unix shells, wildcards will probably need to be
escaped on the command line, to prevent the shell from expanding the
wildcard before \cw{agedu} sees it.

\cw{--prune-path} is similar to \cw{--prune}, except that the
wildcard is matched against the entire pathname instead of just the
filename at the end of it. So whereas \cw{--prune *a*b*} will match
any file whose actual name contains an \cw{a} somewhere before a
\cw{b}, \cw{--prune-path *a*b*} will also match a file whose name
contains \cw{b} and which is inside a directory containing an
\cw{a}, or any file inside a directory of that form, and so on.
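
For example, a scan that skips version-control metadata and object
files entirely might look like the following. The two patterns are
illustrative only, and the quotes keep the shell from expanding the
wildcard:

\c $ agedu -s /home/fred --prune .git --prune '*.o'
\e   bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb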

}

\dt \cw{--exclude} \e{wildcard} and \cw{--exclude-path} \e{wildcard}

\dd These cause particular files or directories to be omitted from
the index, but not from the scan. If \cw{agedu}'s scan encounters a
file or directory whose name matches the wildcard provided to the
\cw{--exclude} option, it will not include that file in its index
\dash but unlike \cw{--prune}, if the file in question is a
directory it will still scan its contents and index them if they are
not ruled out themselves by \cw{--exclude} options.

\lcont{

As above, \cw{--exclude-path} is similar to \cw{--exclude}, except
that the wildcard is matched against the entire pathname.

}

\dt \cw{--include} \e{wildcard} and \cw{--include-path} \e{wildcard}

\dd These cause particular files or directories to be re-included in
the index and the scan, if they had previously been ruled out by one
of the above exclude or prune options. You can interleave include,
exclude and prune options as you wish on the command line, and if
more than one of them applies to a file then the last one takes
priority.

\lcont{

For example, if you wanted to see only the disk space taken up by
MP3 files, you might run

\c $ agedu -s . --exclude '*' --include '*.mp3'
\e   bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb

which will cause everything to be omitted from the scan, but then
the MP3 files to be put back in. If you then wanted only a subset of
those MP3s, you could then exclude some of them again by adding,
say, \cq{--exclude-path './queen/*'} (or, more efficiently,
\cq{--prune ./queen}) on the end of that command.

As with the previous two options, \cw{--include-path} is similar to
\cw{--include} except that the wildcard is matched against the
entire pathname.

}

\dt \cw{--progress}, \cw{--no-progress} and \cw{--tty-progress}

\dd When \cw{agedu} is scanning a directory tree, it will typically
print a one-line progress report every second showing where it has
reached in the scan, so you can have some idea of how much longer it
will take. (Of course, it can't predict \e{exactly} how long it will
take, since it doesn't know which of the directories it hasn't
scanned yet will turn out to be huge.)

\lcont{

By default, those progress reports are displayed on \cw{agedu}'s
standard error channel, if that channel points to a terminal device.
If you need to manually enable or disable them, you can use the
above three options to do so: \cw{--progress} unconditionally
enables the progress reports, \cw{--no-progress} unconditionally
disables them, and \cw{--tty-progress} reverts to the default
behaviour which is conditional on standard error being a terminal.

}

\dt \cw{--dir-atime} and \cw{--no-dir-atime}

\dd In normal operation, \cw{agedu} ignores the atimes (last access
times) on the \e{directories} it scans: it only pays attention to
the atimes of the \e{files} inside those directories. This is
because directory atimes tend to be reset by a lot of system
administrative tasks, such as \cw{cron} jobs which scan the file
system for one reason or another \dash or even other invocations of
\cw{agedu} itself, though it tries to avoid modifying any atimes if
possible. So the literal atimes on directories are typically not
representative of how long ago the data in question was last
accessed with real intent to use that data in particular.

\lcont{

Instead, \cw{agedu} makes up a fake atime for every directory it
scans, which is equal to the newest atime of any file in or below
that directory (or the directory's last \e{modification} time,
whichever is newest). This is based on the assumption that all
\e{important} accesses to directories are actually accesses to the
files inside those directories, so that when any file is accessed
all the directories on the path leading to it should be considered
to have been accessed as well.

In unusual cases it is possible that a directory itself might embody
important data which is accessed by reading the directory. In that
situation, \cw{agedu}'s atime-faking policy will misreport the
directory as disused. In the unlikely event that such directories
form a significant part of your disk space usage, you might want to
turn off the faking. The \cw{--dir-atime} option does this: it
causes the disk scan to read the original atimes of the directories
it scans.

The faking of atimes on directories also requires a processing pass
over the index file after the main disk scan is complete.
\cw{--dir-atime} also turns this pass off. Hence, this option
affects the \cw{-L} option as well as \cw{-s} and \cw{-S}.

(The previous section mentioned that there might be subtle
differences between the output of \cw{agedu -s /path -D} and
\cw{agedu -S /path}. This is why. Doing a scan with \cw{-s} and then
dumping it with \cw{-D} will dump the fully faked atimes on the
directories, whereas doing a scan-to-dump with \cw{-S} will dump
only \e{partially} faked atimes \dash specifically, each directory's
last modification time \dash since the subsequent processing pass
will not have had a chance to take place. However, loading either of
the resulting dump files with \cw{-L} will perform the atime-faking
processing pass, leading to the same data in the index file in each
case. In normal usage it should be safe to ignore all of this
complexity.)

}

\dt \cw{--mtime}

\dd This option causes \cw{agedu} to index files by their last
modification time instead of their last access time. You might want
to use this if your last access times were completely useless for
some reason: for example, if you had recently searched every file on
your system, the system would have lost all the information about
what files you hadn't recently accessed before then. Using this
option is liable to be less effective at finding genuinely wasted
space than the normal mode (that is, it will be more likely to flag
things as disused when they're not, so you will have more candidates
to go through by hand looking for data you don't need), but may be
better than nothing if your last-access times are unhelpful.

The following option affects all the modes that generate reports:
the web server mode \cw{-w}, the stand-alone HTML generation mode
\cw{-H} and the text report mode \cw{-t}.

\dt \cw{--files}

\dd This option causes \cw{agedu}'s reports to list the individual
files in each directory, instead of just giving a combined report
for everything that's not in a subdirectory.

The following options affect the web server mode \cw{-w}, and in one
case also the stand-alone HTML generation mode \cw{-H}:

\dt \cw{-r} \e{age range} or \cw{--age-range} \e{age range}

\dd The HTML reports produced by \cw{agedu} use a range of colours
to indicate how long ago data was last accessed, running from red
(representing the most disused data) to green (representing the
newest). By default, the lengths of time represented by the two ends
of that spectrum are chosen by examining the data file to see what
range of ages appears in it. However, you might want to set your own
limits, and you can do this using \cw{-r}.

\lcont{

The argument to \cw{-r} consists of a single age, or two ages
separated by a minus sign. An age is a number, followed by one of
\cq{y} (years), \cq{m} (months), \cq{w} (weeks) or \cq{d} (days).
The first age in the range represents the oldest data, and will be
coloured red in the HTML; the second age represents the newest,
coloured green. If the second age is not specified, it will default
to zero (so that green means data which has been accessed \e{just
now}).

For example, \cw{-r 2y} will mark data in red if it has been unused
for two years or more, and green if it has been accessed just now.
\cw{-r 2y-3m} will similarly mark data red if it has been unused for
two years or more, but will mark it green if it has been accessed
three months ago or later.

}

\dt \cw{--address} \e{addr}[\cw{:}\e{port}]

\dd Specifies the network address and port number on which
\cw{agedu} should listen when running its web server. If you want
\cw{agedu} to listen for connections coming in from any source, you
should probably specify the special IP address \cw{0.0.0.0}. If the
port number is omitted, an arbitrary unused port will be chosen for
you and displayed.

\lcont{

If you specify this option, \cw{agedu} will not print its URL on
standard output (since you are expected to know what address you
told it to listen to).
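
A sketch of a fixed-port setup, in which the port number 8080 is an
arbitrary choice for illustration:

\c $ agedu -w --address 127.0.0.1:8080
\e   bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb

With this command line, the reports would be expected at
\cw{http://127.0.0.1:8080/}, which you must type yourself since
\cw{agedu} will not print it.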

}

\dt \cw{--auth} \e{auth-type}

\dd Specifies how \cw{agedu} should control access to the web pages
it serves. The options are as follows:

\lcont{

\dt \cw{magic}

\dd This option only works on Linux, and only when the incoming
connection is from the same machine that \cw{agedu} is running on.
On Linux, the special file \cw{/proc/net/tcp} contains a list of
network connections currently known to the operating system kernel,
including which user id created them. So \cw{agedu} will look up
each incoming connection in that file, and allow access if it comes
from the same user id under which \cw{agedu} itself is running.
Therefore, in \cw{agedu}'s normal web server mode, you can safely
run it on a multi-user machine and no other user will be able to
read data out of your index file.

\dt \cw{basic}

\dd In this mode, \cw{agedu} will use HTTP Basic authentication: the
user will have to provide a username and password via their browser.
\cw{agedu} will normally make up a username and password for the
purpose, but you can specify your own; see below.

\dt \cw{none}

\dd In this mode, the web server is unauthenticated: anyone
connecting to it has full access to the reports generated by
\cw{agedu}. Do not do this unless there is nothing confidential at
all in your index file, or unless you are certain that nobody but
you can run processes on your computer.

\dt \cw{default}

\dd This is the default mode if you do not specify one of the above.
In this mode, \cw{agedu} will attempt to use Linux magic
authentication, but if it detects at startup time that
\cw{/proc/net/tcp} is absent or non-functional then it will fall
back to using HTTP Basic authentication and invent a user name and
password.

}

\dt \cw{--auth-file} \e{filename} or \cw{--auth-fd} \e{fd}

\dd When \cw{agedu} is using HTTP Basic authentication, these
options allow you to specify your own user name and password. If you
specify \cw{--auth-file}, these will be read from the specified
file; if you specify \cw{--auth-fd} they will instead be read from a
given file descriptor which you should have arranged to pass to
\cw{agedu}. In either case, the authentication details should
consist of the username, followed by a colon, followed by the
password, followed \e{immediately} by end of file (no trailing
newline, or else it will be considered part of the password).
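
\lcont{

One way to produce a file in that format is with \cw{printf}, which
does not append a newline unless asked to. The user name, password
and file name here are placeholders for this sketch:

\c $ printf '%s' 'fred:secret' > auth.txt
\e   bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
\c $ agedu -w --auth basic --auth-file auth.txt
\e   bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb

}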

\U LIMITATIONS

The data file is pretty large. The core of \cw{agedu} is the
tree-based data structure it uses in its index in order to
efficiently perform the queries it needs; this data structure
requires \cw{O(N log N)} storage. This is larger than you might
expect; a scan of my own home directory, containing half a million
files and directories and about 20GB of data, produced an index file
over 60MB in size. Furthermore, since the data file must be
memory-mapped during most processing, it can never grow larger than
available address space, so a \e{really} big filesystem may need to
be indexed on a 64-bit computer. (This is one reason for the
existence of the \cw{-D} and \cw{-L} options: you can do the
scanning on the machine with access to the filesystem, and the
indexing on a machine big enough to handle it.)

The data structure also does not usefully permit access control
within the data file, so it would be difficult \dash even given the
willingness to do additional coding \dash to run a system-wide
\cw{agedu} scan on a \cw{cron} job and serve the right subset of
reports to each user.

\U LICENCE

\cw{agedu} is free software, distributed under the MIT licence. Type
\cw{agedu --licence} to see the full licence text.

\versionid $Id$