X-Git-Url: https://git.distorted.org.uk/~mdw/sgt/utils/blobdiff_plain/9acadc2b1377453e1c10614920bd390c52227e8a..22bf13b350b5cceb9bbfa0e17029e4f6c1cf1273:/cvt-utf8/cvt-utf8.but

diff --git a/cvt-utf8/cvt-utf8.but b/cvt-utf8/cvt-utf8.but
index 427c097..bf8294a 100644
--- a/cvt-utf8/cvt-utf8.but
+++ b/cvt-utf8/cvt-utf8.but
@@ -1,18 +1,17 @@
 \cfg{man-identity}{cvt-utf8}{1}{2004-03-24}{Simon Tatham}{Simon Tatham}
-\cfg{man-mindepth}{1}
 
-\C{cvt-utf8-manpage} Man page for \cw{cvt-utf8}
+\title Man page for \cw{cvt-utf8}
 
-\H{cvt-utf8-manpage-name} NAME
+\U NAME
 
 \cw{cvt-utf8} - convert between UTF-8 and Unicode, and analyse Unicode
 
-\H{cvt-utf8-manpage-synopsis} SYNOPSIS
+\U SYNOPSIS
 
 \c cvt-utf8 [flags] [hex UTF-8 bytes and/or U+codepoints]
 \e bbbbbbbb  iiiii   iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
 
-\H{cvt-utf8-manpage-description} DESCRIPTION
+\U DESCRIPTION
 
 \cw{cvt-utf8} is a tool for manipulating and analysing UTF-8 and
 Unicode data. Its functions include:
@@ -35,7 +34,7 @@ print out a verbose analysis of the input data. If you need it to
 read UTF-8 from standard input or to write pure UTF-8 to standard
 output, you can do so using command-line options.
 
-\H{cvt-utf8-manpage-options} OPTIONS
+\U OPTIONS
 
 \dt \cw{-i}
 
@@ -52,7 +51,7 @@ long analysis of the input data.
 \dd Look up each code point in the Unihan database as well as the
 main Unicode character database.
 
-\H{cvt-utf8-manpage-examples} EXAMPLES
+\U EXAMPLES
 
 In \cw{cvt-utf8}'s native mode, it simply analyses input Unicode or
 UTF-8 data. For example, you can give a list of Unicode code
@@ -127,7 +126,67 @@ Chinese text meaning \q{Traditional Chinese}:
 \c midst of; hit (target); attain
 \c U-00006587 E6 96 87 literature, culture, writing
 
-\H{cvt-utf8-manpage-bugs} BUGS
+\U ADMINISTRATION
 
-Command-line option processing is very basic. In particular, \cw{-h}
-must come before \cw{-i} or it will not be recognised.
+In order to print the \cw{unicode.org} official name of each
+character, \cw{cvt-utf8} requires a file mapping code points to
+names. This file is in DBM database format, for rapid lookup.
+
+This database file is accessed using the Python \cw{anydbm} module,
+so its precise file name will vary depending on what flavours of DBM
+you have installed. The name Python knows it by is \cq{unicode}; it
+may actually be called \cq{unicode.db} or something similar.
+
+\cw{cvt-utf8} generates this DBM file itself starting from the
+Unicode Character Database, in the form of the file
+\cw{UnicodeData.txt} supplied by \cw{unicode.org}. It supports two
+administrative options for this purpose:
+
+\c cvt-utf8 --build /path/to/UnicodeData.txt /path/to/unicode
+
+Given a copy of \cw{UnicodeData.txt} on disk, this mode will create
+the DBM file and store it in a place of your choice.
+
+\c cvt-utf8 --fetch-build /path/to/unicode
+
+If you have a direct Internet connection, this will automatically
+download the text file from \cw{unicode.org} and process it straight
+into the DBM file.
+
+There is a second DBM file, known to Python as \cw{unihan}, which is
+required to support the \cw{-h} option. This one is built from the
+Unihan Database, distributed by \cw{unicode.org} as a zip file
+containing a text file \cw{Unihan.txt}.
+
+If you already have \cw{Unihan.txt} on your system, you can build
+\cw{cvt-utf8}'s \cw{unihan} DBM file like this:
+
+\c cvt-utf8 --build-unihan /path/to/Unihan.txt /path/to/unihan
+
+Or, again, \cw{cvt-utf8} can automatically download it from
+\cw{unicode.org}, unpack the zip file on the fly, and write the DBM
+straight out:
+
+\c cvt-utf8 --fetch-build-unihan /path/to/unihan
+
+\cw{cvt-utf8} expects to find these database files in one of the
+following locations:
+
+\c /usr/share/unicode
+\c /usr/lib/unicode
+\c /usr/local/share/unicode
+\c /usr/local/lib/unicode
+\c $HOME/share/unicode
+\e iiiii
+\c $HOME/lib/unicode
+\e iiiii
+
+If either of these files is not found, \cw{cvt-utf8} will still
+perform the rest of its functions.
+
+\U LICENCE
+
+\cw{cvt-utf8} is free software, distributed under the MIT licence.
+Type \cw{cvt-utf8 --licence} to see the full licence text.
+
+\versionid $Id$
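
Note on the ADMINISTRATION text added by this patch: it describes turning UnicodeData.txt into a DBM file that maps code points to names. The following is only a rough sketch of how such a build step might look. It is not taken from cvt-utf8 itself. It uses the Python 3 dbm module (the successor to anydbm), and the function names and the eight-hex-digit key layout are assumptions made purely for illustration.

```python
import dbm


def parse_unicode_data(lines):
    """Yield (code point, name) pairs from UnicodeData.txt-style lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(";")
        # Field 0 is the code point in hex; field 1 is the character name.
        yield int(fields[0], 16), fields[1]


def make_key(codepoint):
    # Hypothetical key layout: eight upper-case hex digits, as bytes.
    return ("%08X" % codepoint).encode("ascii")


def build_name_db(lines, db_path):
    """Write a code point -> name mapping to a new DBM file at db_path."""
    # Flag "n" always creates a fresh, empty database. The file name on
    # disk may gain a suffix depending on which DBM flavour is in use.
    db = dbm.open(db_path, "n")
    try:
        for codepoint, name in parse_unicode_data(lines):
            db[make_key(codepoint)] = name.encode("utf-8")
    finally:
        db.close()


def lookup_name(db_path, codepoint):
    """Return the stored name for codepoint, or None if it is absent."""
    db = dbm.open(db_path, "r")
    try:
        key = make_key(codepoint)
        return db[key].decode("utf-8") if key in db else None
    finally:
        db.close()
```

Passing an open file object for UnicodeData.txt as the lines argument would work, since the parser only iterates over lines. A missing entry yields None from the lookup, in the same spirit as the patch's statement that the tool keeps working when a database file is not found.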