\cfg{man-identity}{agedu}{1}{2008-11-02}{Simon Tatham}{Simon Tatham}

\define{dash} \u2013{-}

\title Man page for \cw{agedu}

\U NAME

\cw{agedu} \dash correlate disk usage with last-access times to
identify large and disused data

\U SYNOPSIS

\c agedu [ options ] action [action...]
\e bbbbb   iiiiiii   iiiiii  iiiiii

\U DESCRIPTION

\cw{agedu} scans a directory tree and produces reports about how
much disk space is used in each directory and subdirectory, and also
how much of that usage corresponds to files whose last access was a
long time ago.

In other words, \cw{agedu} is a tool you might use to help you free
up disk space. It lets you see which directories are taking up the
most space, as \cw{du} does; but unlike \cw{du}, it also
distinguishes between large collections of data which are still in
use and ones which have not been accessed in months or years \dash
for instance, large archives downloaded, unpacked, used once, and
never cleaned up. Where \cw{du} helps you find what's using your
disk space, \cw{agedu} helps you find what's \e{wasting} your disk
space.

\cw{agedu} has several operating modes. In one mode, it scans your
disk and builds an index file containing a data structure which
allows it to efficiently retrieve any information it might need.
Typically, you would use it in this mode first, and then run it in
one of a number of \q{query} modes to display a report of the disk
space usage of a particular directory and its subdirectories. Those
reports can be produced as plain text (much like \cw{du}) or as
HTML. \cw{agedu} can even run as a miniature web server, presenting
each directory's HTML report with hyperlinks to let you navigate
around the file system to similar reports for other directories.

So you would typically start using \cw{agedu} by telling it to do a
scan of a directory tree and build an index. This is done with a
command such as

\c $ agedu -s /home/fred
\e   bbbbbbbbbbbbbbbbbbb

which will build a large data file called \cw{agedu.dat} in your
current directory. (If that current directory is \e{inside}
\cw{/home/fred}, don't worry \dash \cw{agedu} is smart enough to
discount its own index file.)

Having built the index, you would now query it for reports of disk
space usage. If you have a graphical web browser, the simplest and
nicest way to query the index is by running \cw{agedu} in web server
mode:

\c $ agedu -w
\e   bbbbbbbb

which will print (among other messages) a URL on its standard output
along the lines of

\c URL: http://127.0.0.1:48638/

(That URL will always begin with \cq{127.}, meaning that it's in the
\cw{localhost} address space. So only processes running on the same
computer can even try to connect to that web server, and also there
is access control to prevent other users from seeing it \dash see
below for more detail.)

Now paste that URL into your web browser, and you will be shown a
graphical representation of the disk usage in \cw{/home/fred} and
its immediate subdirectories, with varying colours used to show the
difference between disused and recently-accessed data. Click on any
subdirectory to descend into it and see a report for its
subdirectories in turn; click on parts of the pathname at the top of
any page to return to higher-level directories. When you've finished
browsing, you can just press Ctrl-D to send an end-of-file
indication to \cw{agedu}, and it will shut down.

After that, you probably want to delete the data file
\cw{agedu.dat}, since it's pretty large. In fact, the command
\cw{agedu -R} will do this for you; and you can chain \cw{agedu}
commands on the same command line, so that instead of the above you
could have done

\c $ agedu -s /home/fred -w -R
\e   bbbbbbbbbbbbbbbbbbbbbbbbb

for a single self-contained run of \cw{agedu} which builds its
index, serves web pages from it, and cleans it up when finished.

If you don't have a graphical web browser, you can do text-based
queries as well. Having scanned \cw{/home/fred} as above, you might
run

\c $ agedu -t /home/fred
\e   bbbbbbbbbbbbbbbbbbb

which again gives a summary of the disk usage in \cw{/home/fred} and
its immediate subdirectories; but this time \cw{agedu} will print it
on standard output, in much the same format as \cw{du}. If you then
want to find out how much \e{old} data is there, you can add the
\cw{-a} option to show only files last accessed a certain length of
time ago. For example, to show only files which haven't been looked
at in six months or more:

\c $ agedu -t /home/fred -a 6m
\e   bbbbbbbbbbbbbbbbbbbbbbbbb
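
As a point of comparison, the following shell sketch asks a similar question using only the standard \cw{find} utility. The function name \cw{old_files} is hypothetical, and the sketch is only a rough approximation. It treats six months as 180 days and lists individual files rather than per-directory totals. It also walks the whole tree every time it runs instead of consulting an index. Like \cw{agedu}'s default behaviour, it stays on a single file system.

```sh
#!/bin/sh
# Hypothetical helper, not part of agedu: print the regular files
# under directory $1 that have not been accessed for more than $2
# days. -xdev keeps the walk on one file system (mirroring agedu's
# default), and -atime +N selects files last accessed more than N
# days ago.
old_files() {
    find "$1" -xdev -type f -atime +"$2" -print
}
```

For instance, \cw{old_files /home/fred 180} lists files under \cw{/home/fred} that have sat unread for roughly six months.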

That's the essence of what \cw{agedu} does. It has other modes of
operation for more complex situations, and the usual array of
configurable options. The following sections contain a complete
reference for all its functionality.

\U OPERATING MODES

This section describes the operating modes supported by \cw{agedu}.
Each of these is in the form of a command-line option, sometimes
with an argument. Multiple operating-mode options may appear on the
command line, in which case \cw{agedu} will perform the specified
actions one after another. For instance, as shown in the previous
section, you might want to perform a disk scan and immediately
launch a web server giving reports from that scan.

\dt \cw{-s} \e{directory} or \cw{--scan} \e{directory}

\dd In this mode, \cw{agedu} scans the file system starting at the
specified directory, and indexes the results of the scan into a
large data file which other operating modes can query.

\lcont{

By default, the scan is restricted to a single file system, since
the most likely reason to run \cw{agedu} is that a particular disk
partition is running low on space. You can remove that restriction
using the \cw{--cross-fs} option; other configuration options allow
you to include or exclude files or entire subdirectories from the
scan. See the next section for full details of the configurable
options.

The index file is created with restrictive permissions, in case the
file system you are scanning contains confidential information in
its structure.

Index files are dependent on the characteristics of the CPU
architecture you created them on. You should not expect to be able
to move an index file between different types of computer and have
it continue to work. If you need to transfer the results of a disk
scan to a different kind of computer, see the \cw{-D} and \cw{-L}
options below.

}

\dt \cw{-w} or \cw{--web}

\dd In this mode, \cw{agedu} expects to find an index file already
written. It allocates a network port, and starts up a web server on
that port which serves reports generated from the index file. By
default it invents its own URL and prints it out.

\lcont{

The web server runs until \cw{agedu} receives an end-of-file event on
its standard input. (The expected usage is that you run it from the
command line, immediately browse web pages until you're satisfied, and
then press Ctrl-D.) To disable the EOF behaviour, use the
\cw{--no-eof} option.

In case the index file contains any confidential information about
your file system, the web server protects the pages it serves from
access by other people. On Linux, this is done transparently by
using \cw{/proc/net/tcp} to check the owner of each incoming
connection; failing that, the web server will require a password to
view the reports, and \cw{agedu} will print the password it invented
on standard output along with the URL.

Configurable options for this mode let you specify your own address
and port number to listen on, and also specify your own choice of
authentication method (including turning authentication off
completely) and a username and password of your choice.
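
The core of the Linux check described above can be sketched in a few lines of shell. This illustrates the idea and is not \cw{agedu}'s own code; the function name \cw{tcp_port_owner} is hypothetical. It relies on the layout of \cw{/proc/net/tcp}, in which each socket's local address appears as hexadecimal \cw{IP:PORT} in the second column and the owning uid in the eighth. IPv6 sockets are listed separately in \cw{/proc/net/tcp6} and are ignored here.

```sh
#!/bin/sh
# Hypothetical sketch of the idea behind agedu's Linux check, not
# agedu's code. $1 is a file in /proc/net/tcp format and $2 a local
# TCP port number; prints the uid owning the socket bound to that
# local port, or nothing if there is none.
# Columns: sl local_address rem_address st tx:rx tr:when retrnsmt uid
# ...; addresses are HEXIP:HEXPORT, with ports as 4 uppercase hex digits.
tcp_port_owner() {
    port_hex=$(printf '%04X' "$2")
    awk -v want="$port_hex" '
        NR > 1 {
            split($2, addr, ":")          # addr[2] = local port in hex
            # Appending "" forces a string comparison, so hex ports
            # such as "1E01" are never compared as numbers.
            if ((addr[2] "") == (want "")) { print $8; exit }
        }' "$1"
}
```

On a live system the first argument would be \cw{/proc/net/tcp} itself. A server could look up the source port of an incoming connection and grant access only if the result equals its own uid, as printed by \cw{id -u}.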

}

\dt \cw{-t} \e{directory} or \cw{--text} \e{directory}

\dd In this mode, \cw{agedu} generates a textual report on standard
output, listing the disk usage in the specified directory and all
its subdirectories down to a given depth. By default that depth is
1, so that you see a report for \e{directory} itself and all of its
immediate subdirectories. You can configure a different depth (or no
depth limit) using \cw{-d}, described in the next section.

\lcont{

Used on its own, \cw{-t} merely lists the \e{total} disk usage in
each subdirectory; \cw{agedu}'s additional ability to distinguish
unused from recently-used data is not activated. To activate it, use
the \cw{-a} option to specify a minimum age.

The directory structure stored in \cw{agedu}'s index file is treated
as a set of literal strings. This means that you cannot refer to
directories by synonyms. So if you ran \cw{agedu -s .}, then all the
path names you later pass to the \cw{-t} option must be either
\cq{.} or begin with \cq{./}. Similarly, symbolic links within the
directory you scanned will not be followed; you must refer to each
directory by its canonical, symlink-free pathname.
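
Because no normalisation takes place, whether a query path can be found in the index comes down to a plain string comparison. The following sketch uses the hypothetical function name \cw{query_matches_scan_root} to express that rule in shell. It assumes the scanned directory was given without a trailing slash.

```sh
#!/bin/sh
# Hypothetical helper, not part of agedu. $1 is the directory given
# to "agedu -s" (assumed to have no trailing slash) and $2 a path you
# intend to pass to "agedu -t". Because the index holds literal
# strings, the path must be $1 itself or start with "$1/"; nothing is
# normalised and no symbolic links are resolved.
query_matches_scan_root() {
    case $2 in
        "$1" | "$1"/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

So after \cw{agedu -s .}, the query \cw{./src} passes this check but \cw{src} does not. After \cw{agedu -s /home/fred}, the path \cw{/home/freda} is rejected even though it shares a prefix.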

}

\dt \cw{-R} or \cw{--remove}

\dd In this mode, \cw{agedu} deletes its index file. Running just
\cw{agedu -R} on its own is therefore equivalent to typing \cw{rm
agedu.dat}. However, you can also put \cw{-R} on the end of a
command line to indicate that \cw{agedu} should delete its index
file after it finishes performing other operations.

\dt \cw{-D} or \cw{--dump}

\dd In this mode, \cw{agedu} reads an existing index file and
produces a dump of its contents on standard output. This dump can
later be loaded into a new index file, perhaps on another computer.

\dt \cw{-L} or \cw{--load}

\dd In this mode, \cw{agedu} expects to read a dump produced by the
\cw{-D} option from its standard input. It constructs an index file
from that dump, exactly as it would have if it had read the same
data from a disk scan in \cw{-s} mode.

\dt \cw{-S} \e{directory} or \cw{--scan-dump} \e{directory}

\dd In this mode, \cw{agedu} will scan a directory tree and convert
the results straight into a dump on standard output, without
generating an index file at all. So running \cw{agedu -S /path}
should produce equivalent output to that of \cw{agedu -s /path -D},
except that the latter will produce an index file as a side effect
whereas \cw{-S} will not.

\lcont{

(The output will not be exactly \e{identical}, due to a
difference in treatment of last-access times on directories.
However, it should be effectively equivalent for most purposes. See
the documentation of the \cw{--dir-atime} option in the next section
for further detail.)

}

\dt \cw{-H} \e{directory} or \cw{--html} \e{directory}

\dd In this mode, \cw{agedu} will generate an HTML report of the
disk usage in the specified directory and its immediate
subdirectories, in the same form that it serves from its web server
in \cw{-w} mode.

\lcont{

By default, a single HTML report will be generated and simply
written to standard output, with no hyperlinks pointing to other
similar pages. If you also specify the \cw{-d} option (see below),
\cw{agedu} will instead write out a collection of HTML files with
hyperlinks between them, and call the top-level file
\cw{index.html}.

}

\dt \cw{--cgi}

\dd In this mode, \cw{agedu} will run as the bulk of a CGI script
which provides the same set of web pages as the built-in web server
would. It will read the usual CGI environment variables, and write
CGI-style data to its standard output.

\lcont{

The actual CGI program itself should be a tiny wrapper around
\cw{agedu} which passes it the \cw{--cgi} option, and also
(probably) \cw{-f} to locate the index file. \cw{agedu} will do
everything else.
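
Such a wrapper can be as small as a two-line shell script. The sketch below writes one out. The script name \cw{agedu.cgi} and both paths inside it are hypothetical placeholders to adjust for your installation. Using \cw{exec} replaces the shell with \cw{agedu}, so the CGI environment variables and standard output pass straight through to it.

```sh
#!/bin/sh
# Write a minimal CGI wrapper for agedu into ./agedu.cgi. The agedu
# binary path and the index file path are hypothetical placeholders.
cat > agedu.cgi <<'EOF'
#!/bin/sh
exec /usr/local/bin/agedu --cgi -f /var/lib/agedu/agedu.dat
EOF
chmod 755 agedu.cgi
```

Whichever user the web server runs CGI scripts as must be able to read the index file named after \cw{-f}.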

No access control is performed in this mode: restricting access to
CGI scripts is assumed to be the job of the web server.

}

\U OPTIONS

This section describes the various configuration options that affect
\cw{agedu}'s operation in one mode or another.

The following option affects nearly all modes (except \cw{-S}):

\dt \cw{-f} \e{filename} or \cw{--file} \e{filename}

\dd Specifies the location of the index file which \cw{agedu}
creates, reads or removes depending on its operating mode. By
default, this is simply \cq{agedu.dat}, in whatever is the current
working directory when you run \cw{agedu}.

The following options affect the disk-scanning modes, \cw{-s} and
\cw{-S}:

\dt \cw{--cross-fs} and \cw{--no-cross-fs}

\dd These configure whether or not the disk scan is permitted to
cross between different file systems. The default is not to:
\cw{agedu} will normally skip over subdirectories on which a
different file system is mounted. This makes it convenient when you
want to free up space on a particular file system which is running
low. However, in other circumstances you might wish to see general
information about the use of space no matter which file system it's
on (for instance, if your real concern is your backup media running
out of space, and if your backups do not treat different file
systems specially); in that situation, use \cw{--cross-fs}.

\lcont{

(Note that this default is the opposite way round from the
corresponding option in \cw{du}.)

}

\dt \cw{--prune} \e{wildcard} and \cw{--prune-path} \e{wildcard}

\dd These cause particular files or directories to be omitted
entirely from the scan. If \cw{agedu}'s scan encounters a file or
directory whose name matches the wildcard provided to the
\cw{--prune} option, it will not include that file in its index,
and, if it is a directory, it will skip over it without scanning its
contents.

\lcont{

Note that in most Unix shells, wildcards will probably need to be
quoted or escaped on the command line, to prevent the shell from
expanding the wildcard before \cw{agedu} sees it.

\cw{--prune-path} is similar to \cw{--prune}, except that the
wildcard is matched against the entire pathname instead of just the
filename at the end of it. So whereas \cw{--prune *a*b*} will match
any file whose actual name contains an \cw{a} somewhere before a
\cw{b}, \cw{--prune-path *a*b*} will also match a file whose name
contains \cw{b} and which is inside a directory containing an
\cw{a}, or any file inside a directory of that form, and so on.
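
The difference between the two options can be illustrated with shell \cw{case} patterns, which use the same \cw{*} notation. This is not \cw{agedu}'s own wildcard matcher, and the function names \cw{matches_name} and \cw{matches_path} are hypothetical. In a \cw{case} pattern, \cw{*} can match \cw{/} characters. The \cw{--prune-path} example above requires the same of \cw{agedu}'s wildcards.

```sh
#!/bin/sh
# Illustration only, not agedu's matcher. $1 is a wildcard and $2 a
# full pathname such as ./data/b.txt.
# Like --prune: match the wildcard against the final name only.
matches_name() {
    name=${2##*/}            # strip everything up to the last "/"
    case $name in
        $1) return 0 ;;      # unquoted $1 is treated as a pattern
        *) return 1 ;;
    esac
}
# Like --prune-path: match the wildcard against the whole pathname.
matches_path() {
    case $2 in
        $1) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, \cw{matches_name '*a*b*' ./data/b.txt} fails because the name \cw{b.txt} contains no \cw{a}. By contrast, \cw{matches_path '*a*b*' ./data/b.txt} succeeds thanks to the \cw{a} in \cw{data}. The wildcards are quoted for the reason given above.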

}

\dt \cw{--exclude} \e{wildcard} and \cw{--exclude-path} \e{wildcard}

\dd These cause particular files or directories to be omitted from
the index, but not from the scan. If \cw{agedu}'s scan encounters a
file or directory whose name matches the wildcard provided to the
\cw{--exclude} option, it will not include that file in its index
\dash but unlike \cw{--prune}, if the file in question is a
directory it will still scan its contents and index them if they are
not ruled out themselves by \cw{--exclude} options.

\lcont{

As above, \cw{--exclude-path} is similar to \cw{--exclude}, except
that the wildcard is matched against the entire pathname.

}

\dt \cw{--include} \e{wildcard} and \cw{--include-path} \e{wildcard}

\dd These cause particular files or directories to be re-included in
the index and the scan, if they had previously been ruled out by one
of the above exclude or prune options. You can interleave include,
exclude and prune options as you wish on the command line, and if
more than one of them applies to a file then the last one takes
priority.

\lcont{

For example, if you wanted to see only the disk space taken up by
MP3 files, you might run

\c $ agedu -s . --exclude '*' --include '*.mp3'
\e   bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb

which will cause everything to be omitted from the index, but then
the MP3 files to be put back in. If you wanted only a subset of
those MP3s, you could exclude some of them again by adding,
say, \cq{--exclude-path './queen/*'} (or, more efficiently,
\cq{--prune-path ./queen}) on the end of that command.

As with the previous two options, \cw{--include-path} is similar to
\cw{--include} except that the wildcard is matched against the
entire pathname.
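
The \q{last applicable option wins} rule can be sketched as a short shell function. Both the function \cw{classify} and its \cw{ACTION:KIND:WILDCARD} rule encoding are hypothetical and exist only for this illustration. It classifies a single path, so unlike a real scan it does not model the way a pruned directory's contents are never visited at all.

```sh
#!/bin/sh
# Sketch of the "last matching option wins" rule, not agedu's code.
# $1 is a pathname; each further argument is a rule written as
# ACTION:KIND:WILDCARD (a hypothetical encoding for this example),
# where ACTION is include, exclude or prune, and KIND is "name"
# (match the final component, like --exclude) or "path" (match the
# whole pathname, like --exclude-path). Prints the ACTION of the last
# matching rule, or "include" if no rule matches.
classify() {
    path=$1
    shift
    result=include
    for rule in "$@"; do
        action=${rule%%:*}
        rest=${rule#*:}
        kind=${rest%%:*}
        pattern=${rest#*:}
        if [ "$kind" = name ]; then
            subject=${path##*/}
        else
            subject=$path
        fi
        case $subject in
            $pattern) result=$action ;;
        esac
    done
    echo "$result"
}
```

With the rules from the MP3 example, \cw{classify ./queen/song.mp3 'exclude:name:*' 'include:name:*.mp3' 'exclude:path:./queen/*'} prints \cw{exclude}, because the final path-based rule is the last one that matches.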

}

\dt \cw{--progress}, \cw{--no-progress} and \cw{--tty-progress}

\dd When \cw{agedu} is scanning a directory tree, it will typically
print a one-line progress report every second showing where it has
reached in the scan, so you can have some idea of how much longer it
will take. (Of course, it can't predict \e{exactly} how long it will
take, since it doesn't know which of the directories it hasn't
scanned yet will turn out to be huge.)

\lcont{

By default, those progress reports are displayed on \cw{agedu}'s
standard error channel, if that channel points to a terminal device.
If you need to manually enable or disable them, you can use the
above three options to do so: \cw{--progress} unconditionally
enables the progress reports, \cw{--no-progress} unconditionally
disables them, and \cw{--tty-progress} reverts to the default
behaviour which is conditional on standard error being a terminal.

}

\dt \cw{--dir-atime} and \cw{--no-dir-atime}

\dd In normal operation, \cw{agedu} ignores the atimes (last access
times) on the \e{directories} it scans: it only pays attention to
the atimes of the \e{files} inside those directories. This is
because directory atimes tend to be reset by a lot of system
administrative tasks, such as \cw{cron} jobs which scan the file
system for one reason or another \dash or even other invocations of
\cw{agedu} itself, though it tries to avoid modifying any atimes if
possible. So the literal atimes on directories are typically not
representative of how long ago the data in question was last
accessed with real intent to use that data in particular.

\lcont{

Instead, \cw{agedu} makes up a fake atime for every directory it
scans, which is equal to the newest atime of any file in or below
that directory (or the directory's last \e{modification} time,
whichever is newer). This is based on the assumption that all
\e{important} accesses to directories are actually accesses to the
files inside those directories, so that when any file is accessed
all the directories on the path leading to it should be considered
to have been accessed as well.
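
Expressed as arithmetic, the faked value is simply a maximum. The shell function below, whose name \cw{fake_dir_atime} is hypothetical, illustrates the rule rather than reproducing \cw{agedu}'s code. It takes a directory's mtime followed by the atimes of every file in or below it, all as seconds since the epoch, and prints the newest.

```sh
#!/bin/sh
# Illustration of the directory atime-faking rule, not agedu's code.
# $1 = the directory's own mtime; remaining arguments = atimes of all
# files in or below it (seconds since the epoch). Prints the largest,
# i.e. newest, value. The directory's own atime is never consulted.
fake_dir_atime() {
    newest=$1
    shift
    for t in "$@"; do
        if [ "$t" -gt "$newest" ]; then
            newest=$t
        fi
    done
    echo "$newest"
}
```

For instance, \cw{fake_dir_atime 1000 500 1500 1200} prints \cw{1500}. By contrast, \cw{fake_dir_atime 2000 500 1500} prints \cw{2000}, because the directory was modified more recently than any file below it was accessed.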

In unusual cases it is possible that a directory itself might embody
important data which is accessed by reading the directory. In that
situation, \cw{agedu}'s atime-faking policy will misreport the
directory as disused. In the unlikely event that such directories
form a significant part of your disk space usage, you might want to
turn off the faking. The \cw{--dir-atime} option does this: it
causes the disk scan to read the original atimes of the directories
it scans.

The faking of atimes on directories also requires a processing pass
over the index file after the main disk scan is complete.
\cw{--dir-atime} also turns this pass off. Hence, this option
affects the \cw{-L} option as well as \cw{-s} and \cw{-S}.

(The previous section mentioned that there might be subtle
differences between the output of \cw{agedu -s /path -D} and
\cw{agedu -S /path}. This is why. Doing a scan with \cw{-s} and then
dumping it with \cw{-D} will dump the fully faked atimes on the
directories, whereas doing a scan-to-dump with \cw{-S} will dump
only \e{partially} faked atimes \dash specifically, each directory's
last modification time \dash since the subsequent processing pass
will not have had a chance to take place. However, loading either of
the resulting dump files with \cw{-L} will perform the atime-faking
processing pass, leading to the same data in the index file in each
case. In normal usage it should be safe to ignore all of this
complexity.)

}

\dt \cw{--mtime}

\dd This option causes \cw{agedu} to index files by their last
modification time instead of their last access time. You might want
to use this if your last access times were completely useless for
some reason: for example, if you had recently searched every file on
your system, the system would have lost all record of which files
you had not accessed before that search. Using this option is liable
to be less effective at finding genuinely wasted space than the
normal mode (that is, it will be more likely to flag things as
disused when they're not, so you will have more candidates to go
through by hand looking for data you don't need), but may be better
than nothing if your last-access times are unhelpful.

\lcont{

Another use for this mode might be to find \e{recently created}
large data. If your disk has been gradually filling up for years,
the default mode of \cw{agedu} will let you find unused data to
delete; but if you know your disk had plenty of space recently and
now it's suddenly full, and you suspect that some rogue program has
left a large core dump or output file, then \cw{agedu --mtime} might
be a convenient way to locate the culprit.
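
For comparison, a similar one-off search can be made directly with \cw{find}, without building an index. In this sketch the function name \cw{recent_large_files} is hypothetical. It lists regular files on a single file system that were modified within a given number of days and exceed a given size in bytes. Unlike \cw{agedu}, it must rescan the tree every time it runs.

```sh
#!/bin/sh
# Hypothetical helper, not part of agedu. Lists regular files under
# $1, staying on one file system, that were modified less than $2
# days ago and are larger than $3 bytes. "-size +Nc" counts in bytes
# and "-mtime -N" means "modified less than N days ago".
recent_large_files() {
    find "$1" -xdev -type f -mtime -"$2" -size +"$3"c -print
}
```

For example, \cw{recent_large_files /home/fred 7 100000000} would list files over 100MB that were written during the last week.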

}

The following option affects all the modes that generate reports:
the web server mode \cw{-w}, the stand-alone HTML generation mode
\cw{-H} and the text report mode \cw{-t}.

\dt \cw{--files}

\dd This option causes \cw{agedu}'s reports to list the individual
files in each directory, instead of just giving a combined report
for everything that's not in a subdirectory.

The following options affect the stand-alone HTML generation mode
\cw{-H} and the text report mode \cw{-t}.

\dt \cw{-d} \e{depth} or \cw{--depth} \e{depth}

\dd This option controls the maximum depth to which \cw{agedu}
recurses when generating a text or HTML report.

\lcont{

In text mode, the default is 1, meaning that the report will include
the directory given on the command line and all of its immediate
subdirectories. A depth of 2 includes another level below that, and
so on; a depth of 0 means \e{only} the directory on the command
line.

In HTML mode, specifying this option switches \cw{agedu} from
writing out a single HTML file to writing out multiple files which
link to each other. A depth of 1 means \cw{agedu} will write out an
HTML file for the given directory and also one for each of its
immediate subdirectories.

If you want \cw{agedu} to recurse as deeply as possible, give the
special word \cq{max} as an argument to \cw{-d}.
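
The depth rule amounts to counting path components below the starting directory. The hypothetical shell function \cw{within_depth} below expresses it. It assumes that the path lies at or below the starting directory and that neither has a trailing slash.

```sh
#!/bin/sh
# Hypothetical helper illustrating the -d rule, not agedu's code.
# $1 = starting directory, $2 = a path at or below it (neither with a
# trailing slash), $3 = maximum depth or the word "max". Succeeds if
# $2 would be included in a report of that depth.
within_depth() {
    [ "$3" = max ] && return 0
    rel=${2#"$1"}                      # e.g. "/music/queen"
    # Each "/" left in rel is one level below the starting directory;
    # $(( )) also trims any padding that wc adds on some systems.
    depth=$(( $(printf '%s' "$rel" | tr -cd '/' | wc -c) ))
    [ "$depth" -le "$3" ]
}
```

So with a starting directory of \cw{/home/fred}, \cw{/home/fred/music} is within depth 1 and \cw{/home/fred/music/queen} needs depth 2. The word \cw{max} admits everything.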

}

\dt \cw{-o} \e{filename} or \cw{--output} \e{filename}

\dd This option is used to specify an output file for \cw{agedu} to
write its report to. In text mode or single-file HTML mode, the
argument is treated as the name of a file. In multiple-file HTML
mode, the argument is treated as the name of a directory: the
directory will be created if it does not already exist, and the
output HTML files will be created inside it.

The following options affect the web server mode \cw{-w}, and in some
cases also the stand-alone HTML generation mode \cw{-H}:

\dt \cw{-r} \e{age range} or \cw{--age-range} \e{age range}

\dd The HTML reports produced by \cw{agedu} use a range of colours
to indicate how long ago data was last accessed, running from red
(representing the most disused data) to green (representing the
newest). By default, the lengths of time represented by the two ends
of that spectrum are chosen by examining the data file to see what
range of ages appears in it. However, you might want to set your own
limits, and you can do this using \cw{-r}.

\lcont{

The argument to \cw{-r} consists of a single age, or two ages
separated by a minus sign. An age is a number, followed by one of
\cq{y} (years), \cq{m} (months), \cq{w} (weeks) or \cq{d} (days).
The first age in the range represents the oldest data, and will be
coloured red in the HTML; the second age represents the newest,
coloured green. If the second age is not specified, it will default
to zero (so that green means data which has been accessed \e{just
now}).

For example, \cw{-r 2y} will mark data in red if it has been unused
for two years or more, and green if it has been accessed just now.
\cw{-r 2y-3m} will similarly mark data red if it has been unused for
two years or more, but will mark it green if it was last accessed
three months ago or more recently.
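
The argument syntax can be sketched as a pair of shell functions, \cw{age_to_seconds} and \cw{age_range}, both hypothetical. They assume a month of 30 days and a year of 365 days. \cw{agedu} itself may measure months and years differently, for instance by the calendar, so the numbers are illustrative only. The same number-plus-unit form also appears in the \cw{-a 6m} example earlier in this page.

```sh
#!/bin/sh
# Hypothetical sketch of the -r argument syntax, not agedu's code.
# Assumes a month is 30 days and a year 365 days.
age_to_seconds() {
    n=${1%?}                 # everything except the final unit letter
    unit=${1#"$n"}           # the unit letter itself
    case $n in
        '' | *[!0-9]*) echo "bad age: $1" >&2; return 1 ;;
    esac
    n=${n#"${n%%[!0]*}"}     # drop leading zeros so "08" is not octal
    n=${n:-0}
    case $unit in
        d) echo $((n * 86400)) ;;
        w) echo $((n * 7 * 86400)) ;;
        m) echo $((n * 30 * 86400)) ;;
        y) echo $((n * 365 * 86400)) ;;
        *) echo "bad age: $1" >&2; return 1 ;;
    esac
}

# Print the oldest (red) and newest (green) ends of a range such as
# "2y-3m" in seconds; a missing second age defaults to zero.
age_range() {
    case $1 in
        *-*) new=$(age_to_seconds "${1#*-}") || return 1 ;;
        *) new=0 ;;
    esac
    old=$(age_to_seconds "${1%%-*}") || return 1
    echo "$old $new"
}
```

Under these assumptions, \cw{age_range 2y-3m} prints \cw{63072000 7776000} and \cw{age_range 2y} prints \cw{63072000 0}, reflecting the default of zero for a missing second age.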

}

\dt \cw{--address} \e{addr}[\cw{:}\e{port}]

\dd Specifies the network address and port number on which \cw{agedu}
should listen when running its web server. If you want \cw{agedu} to
listen for connections coming in from any source, specify the address
as the special value \cw{ANY}. If the port number is omitted, an
arbitrary unused port will be chosen for you and displayed.

\lcont{

If you specify this option, \cw{agedu} will not print its URL on
standard output (since you are expected to know what address you
told it to listen on).

}

\dt \cw{--auth} \e{auth-type}

\dd Specifies how \cw{agedu} should control access to the web pages
it serves. The options are as follows:

\lcont{

\dt \cw{magic}

\dd This option only works on Linux, and only when the incoming
connection is from the same machine that \cw{agedu} is running on.
On Linux, the special file \cw{/proc/net/tcp} contains a list of
network connections currently known to the operating system kernel,
including which user id created them. So \cw{agedu} will look up
each incoming connection in that file, and allow access if it comes
from the same user id under which \cw{agedu} itself is running.
Therefore, in \cw{agedu}'s normal web server mode, you can safely
run it on a multi-user machine and no other user will be able to
read data out of your index file.

\dt \cw{basic}

\dd In this mode, \cw{agedu} will use HTTP Basic authentication: the
user will have to provide a username and password via their browser.
\cw{agedu} will normally make up a username and password for the
purpose, but you can specify your own; see below.

\dt \cw{none}

\dd In this mode, the web server is unauthenticated: anyone
connecting to it has full access to the reports generated by
\cw{agedu}. Do not do this unless there is nothing confidential at
all in your index file, or unless you are certain that nobody but
you can run processes on your computer.

\dt \cw{default}

\dd This is the default mode if you do not specify one of the above.
In this mode, \cw{agedu} will attempt to use Linux magic
authentication, but if it detects at startup time that
\cw{/proc/net/tcp} is absent or non-functional then it will fall
back to using HTTP Basic authentication and invent a user name and
password.

}

\dt \cw{--auth-file} \e{filename} or \cw{--auth-fd} \e{fd}

\dd When \cw{agedu} is using HTTP Basic authentication, these
options allow you to specify your own user name and password. If you
specify \cw{--auth-file}, these will be read from the specified
file; if you specify \cw{--auth-fd} they will instead be read from a
given file descriptor which you should have arranged to pass to
\cw{agedu}. In either case, the authentication details should
consist of the username, followed by a colon, followed by the
password, followed \e{immediately} by end of file (no trailing
newline, or else it will be considered part of the password).

\dt \cw{--title} \e{title}

\dd Specify the string that appears at the start of the \cw{<title>}
section of the output HTML pages. The default is \cq{agedu}. This
title is followed by a colon and then the path you're viewing within
the index file. You might use this option if you were serving
\cw{agedu} reports for several different servers and wanted to make it
clearer which one a user was looking at.

\dt \cw{--no-eof}

\dd Stop \cw{agedu} in web server mode from looking for end-of-file on
standard input and treating it as a signal to terminate.
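
Credentials for \cw{--auth-file} or \cw{--auth-fd} must be formatted exactly as described above. Creating the file with a plain \cw{echo} would append a newline, which would then become part of the password, whereas \cw{printf} with a \cw{%s} format adds nothing. In this sketch the file name \cw{agedu-auth.txt} and the credentials are hypothetical examples.

```sh
#!/bin/sh
# Create a credentials file in the format agedu expects: username,
# colon, password, then end of file with no trailing newline. The
# file name and credentials are hypothetical examples.
umask 077                                  # keep the file private
printf '%s:%s' fred 'correct horse' > agedu-auth.txt
```

The file could then be used with \cw{agedu -w --auth basic --auth-file agedu-auth.txt}, or supplied on an open descriptor with \cw{agedu -w --auth basic --auth-fd 3 3<agedu-auth.txt}.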

\U LIMITATIONS

The data file is pretty large. The core of \cw{agedu} is the
tree-based data structure it uses in its index in order to
efficiently perform the queries it needs; this data structure
requires \cw{O(N log N)} storage. This is larger than you might
expect; a scan of my own home directory, containing half a million
files and directories and about 20GB of data, produced an index file
over 60MB in size. Furthermore, since the data file must be
memory-mapped during most processing, it can never grow larger than
available address space, so a \e{really} big filesystem may need to
be indexed on a 64-bit computer. (This is one reason for the
existence of the \cw{-D} and \cw{-L} options: you can do the
scanning on the machine with access to the filesystem, and the
indexing on a machine big enough to handle it.)

The data structure also does not usefully permit access control
within the data file, so it would be difficult \dash even given the
willingness to do additional coding \dash to run a system-wide
\cw{agedu} scan on a \cw{cron} job and serve the right subset of
reports to each user.

In certain circumstances, \cw{agedu} can report false positives
(reporting files as disused which are in fact in use) as well as the
more benign false negatives (reporting files as in use which are
not). This arises when a file is, semantically speaking, \q{read}
without actually being physically \e{read}. Typically this occurs
when a program checks whether the file's mtime has changed and only
bothers re-reading it if it has; programs which do this include
\cw{rsync}(\e{1}) and \cw{make}(\e{1}). Such programs will fail to
update the atime of unmodified files despite depending on their
continued existence; a directory full of such files will be reported
as disused by \cw{agedu} but deleting them will cause trouble.

\U LICENCE

\cw{agedu} is free software, distributed under the MIT licence. Type
\cw{agedu --licence} to see the full licence text.

\versionid $Id$