X-Git-Url: https://git.distorted.org.uk/~mdw/sgt/agedu/blobdiff_plain/15e738400fc6d322b0caad7a5a7ef1dcaea5065f..26cfb15abef49bc22b35f7133ec8eddd057c4248:/agedu.but diff --git a/agedu.but b/agedu.but index 5bf5d0b..f73f333 100644 --- a/agedu.but +++ b/agedu.but @@ -6,8 +6,8 @@ \U NAME -\cw{agedu} - correlate disk usage with last-access times to identify -large and disused data +\cw{agedu} \dash correlate disk usage with last-access times to +identify large and disused data \U SYNOPSIS @@ -65,7 +65,7 @@ mode: which will print (among other messages) a URL on its standard output along the lines of -\c URL: http://127.164.152.163:48638/ +\c URL: http://127.0.0.1:48638/ (That URL will always begin with \cq{127.}, meaning that it's in the \cw{localhost} address space. So only processes running on the same @@ -166,10 +166,11 @@ default it invents its own URL and prints it out. \lcont{ -The web server runs until \cw{agedu} receives an end-of-file event -on its standard input. (The expected usage is that you run it from -the command line, immediately browse web pages until you're -satisfied, and then press Ctrl-D.) +The web server runs until \cw{agedu} receives an end-of-file event on +its standard input. (The expected usage is that you run it from the +command line, immediately browse web pages until you're satisfied, and +then press Ctrl-D.) To disable the EOF behaviour, use the +\cw{--no-eof} option. In case the index file contains any confidential information about your file system, the web server protects the pages it serves from @@ -190,10 +191,10 @@ completely) and a username and password of your choice. \dd In this mode, \cw{agedu} generates a textual report on standard output, listing the disk usage in the specified directory and all -its subdirectories down to a fixed depth. By default that depth is +its subdirectories down to a given depth. By default that depth is 1, so that you see a report for \e{directory} itself and all of its -immediate subdirectories. 
You can configure a different depth using -\cw{-d}, described in the next section. +immediate subdirectories. You can configure a different depth (or no +depth limit) using \cw{-d}, described in the next section. \lcont{ @@ -257,9 +258,37 @@ for further detail.) \dd In this mode, \cw{agedu} will generate an HTML report of the disk usage in the specified directory and its immediate subdirectories, in the same form that it serves from its web server -in \cw{-w} mode. However, this time, a single HTML report will be -generated and simply written to standard output, with no hyperlinks -pointing to other similar pages. +in \cw{-w} mode. + +\lcont{ + +By default, a single HTML report will be generated and simply +written to standard output, with no hyperlinks pointing to other +similar pages. If you also specify the \cw{-d} option (see below), +\cw{agedu} will instead write out a collection of HTML files with +hyperlinks between them, and call the top-level file +\cw{index.html}. + +} + +\dt \cw{--cgi} + +\dd In this mode, \cw{agedu} will run as the bulk of a CGI script +which provides the same set of web pages as the built-in web server +would. It will read the usual CGI environment variables, and write +CGI-style data to its standard output. + +\lcont{ + +The actual CGI program itself should be a tiny wrapper around +\cw{agedu} which passes it the \cw{--cgi} option, and also +(probably) \cw{-f} to locate the index file. \cw{agedu} will do +everything else. + +No access control is performed in this mode: restricting access to +CGI scripts is assumed to be the job of the web server. + +} \U OPTIONS @@ -443,8 +472,80 @@ complexity.) } -The following options affect the web server mode \cw{-w}, and in one -case also the stand-along HTML generation mode \cw{-H}: +\dt \cw{--mtime} + +\dd This option causes \cw{agedu} to index files by their last +modification time instead of their last access time. 
You might want +to use this if your last access times were completely useless for +some reason: for example, if you had recently searched every file on +your system, the system would have lost all the information about +what files you hadn't recently accessed before then. Using this +option is liable to be less effective at finding genuinely wasted +space than the normal mode (that is, it will be more likely to flag +things as disused when they're not, so you will have more candidates +to go through by hand looking for data you don't need), but may be +better than nothing if your last-access times are unhelpful. + +\lcont{ + +Another use for this mode might be to find \e{recently created} +large data. If your disk has been gradually filling up for years, +the default mode of \cw{agedu} will let you find unused data to +delete; but if you know your disk had plenty of space recently and +now it's suddenly full, and you suspect that some rogue program has +left a large core dump or output file, then \cw{agedu --mtime} might +be a convenient way to locate the culprit. + +} + +The following option affects all the modes that generate reports: +the web server mode \cw{-w}, the stand-alone HTML generation mode +\cw{-H} and the text report mode \cw{-t}. + +\dt \cw{--files} + +\dd This option causes \cw{agedu}'s reports to list the individual +files in each directory, instead of just giving a combined report +for everything that's not in a subdirectory. + +The following options affect the stand-alone HTML generation mode +\cw{-H} and the text report mode \cw{-t}. + +\dt \cw{-d} \e{depth} or \cw{--depth} \e{depth} + +\dd This option controls the maximum depth to which \cw{agedu} +recurses when generating a text or HTML report. + +\lcont{ + +In text mode, the default is 1, meaning that the report will include +the directory given on the command line and all of its immediate +subdirectories. 
A depth of two includes another level below that, +and so on; a depth of zero means \e{only} the directory on the +command line. + +In HTML mode, specifying this option switches \cw{agedu} from +writing out a single HTML file to writing out multiple files which +link to each other. A depth of 1 means \cw{agedu} will write out an +HTML file for the given directory and also one for each of its +immediate subdirectories. + +If you want \cw{agedu} to recurse as deeply as possible, give the +special word \cq{max} as an argument to \cw{-d}. + +} + +\dt \cw{-o} \e{filename} or \cw{--output} \e{filename} + +\dd This option is used to specify an output file for \cw{agedu} to +write its report to. In text mode or single-file HTML mode, the +argument is treated as the name of a file. In multiple-file HTML +mode, the argument is treated as the name of a directory: the +directory will be created if it does not already exist, and the +output HTML files will be created inside it. + +The following options affect the web server mode \cw{-w}, and in some +cases also the stand-alone HTML generation mode \cw{-H}: \dt \cw{-r} \e{age range} or \cw{--age-range} \e{age range} @@ -550,6 +651,20 @@ consist of the username, followed by a colon, followed by the password, followed \e{immediately} by end of file (no trailing newline, or else it will be considered part of the password). +\dt \cw{--title} \e{title} + +\dd Specify the string that appears at the start of the \cw{<title>} +section of the output HTML pages. The default is \cq{agedu}. This +title is followed by a colon and then the path you're viewing within +the index file. You might use this option if you were serving +\cw{agedu} reports for several different servers and wanted to make it +clearer which one a user was looking at. + +\dt \cw{--no-eof} + +\dd Stop \cw{agedu} in web server mode from looking for end-of-file on +standard input and treating it as a signal to terminate. + \U LIMITATIONS The data file is pretty large. 
The core of \cw{agedu} is the @@ -558,11 +673,13 @@ efficiently perform the queries it needs; this data structure requires \cw{O(N log N)} storage. This is larger than you might expect; a scan of my own home directory, containing half a million files and directories and about 20Gb of data, produced an index file -nearly a third of a Gb in size. Furthermore, since the data file -must be memory-mapped during most processing, it can never grow -larger than available address space, which means that any use of -\cw{agedu} on a file system more than about ten times the above size -is probably going to have to be done on a 64-bit computer. +over 60Mb in size. Furthermore, since the data file must be +memory-mapped during most processing, it can never grow larger than +available address space, so a \e{really} big filesystem may need to +be indexed on a 64-bit computer. (This is one reason for the +existence of the \cw{-D} and \cw{-L} options: you can do the +scanning on the machine with access to the filesystem, and the +indexing on a machine big enough to handle it.) The data structure also does not usefully permit access control within the data file, so it would be difficult \dash even given the @@ -570,6 +687,18 @@ willingness to do additional coding \dash to run a system-wide \cw{agedu} scan on a \cw{cron} job and serve the right subset of reports to each user. +In certain circumstances, \cw{agedu} can report false positives +(reporting files as disused which are in fact in use) as well as the +more benign false negatives (reporting files as in use which are +not). This arises when a file is, semantically speaking, \q{read} +without actually being physically \e{read}. Typically this occurs +when a program checks whether the file's mtime has changed and only +bothers re-reading it if it has; programs which do this include +\cw{rsync}(\e{1}) and \cw{make}(\e{1}). 
Such programs will fail to +update the atime of unmodified files despite depending on their +continued existence; a directory full of such files will be reported +as disused by \cw{agedu} but deleting them will cause trouble. + \U LICENCE \cw{agedu} is free software, distributed under the MIT licence. Type