X-Git-Url: https://git.distorted.org.uk/~mdw/sgt/utils/blobdiff_plain/8a48d402ca6948144107b6e3bc857d90155bf4cb..HEAD:/cvt-utf8/cvt-utf8.but

diff --git a/cvt-utf8/cvt-utf8.but b/cvt-utf8/cvt-utf8.but
index 3ee4832..a5cc683 100644
--- a/cvt-utf8/cvt-utf8.but
+++ b/cvt-utf8/cvt-utf8.but
@@ -1,16 +1,17 @@
 \cfg{man-identity}{cvt-utf8}{1}{2004-03-24}{Simon Tatham}{Simon Tatham}
-\cfg{html-chapter-numeric}{yes}
+
+\define{dash} \u2013{-}
 
 \title Man page for \cw{cvt-utf8}
 
 \U NAME
 
-\cw{cvt-utf8} - convert between UTF-8 and Unicode, and analyse Unicode
+\cw{cvt-utf8} \dash convert between UTF-8 and Unicode, and analyse Unicode
 
 \U SYNOPSIS
 
-\c cvt-utf8 [flags] [hex UTF-8 bytes and/or U+codepoints]
-\e bbbbbbbb  iiiii   iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
+\c cvt-utf8 [flags] [hex UTF-8 bytes, U+codepoints, SGML entities]
+\e bbbbbbbb  iiiii   iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
 
 \U DESCRIPTION
 
@@ -29,11 +30,12 @@ code point in the Unicode character database and identify it.
 \b Look up Unified Han characters in the \q{Unihan} database and
 provide their translation text.
 
-By default, \cw{cvt-utf8} expects to receive hex numbers (either
-UTF-8 bytes or Unicode code points) on the command line, and it will
-print out a verbose analysis of the input data. If you need it to
-read UTF-8 from standard input or to write pure UTF-8 to standard
-output, you can do so using command-line options.
+By default, \cw{cvt-utf8} expects to receive character data on the
+command line (as a mixture of UTF-8 bytes, Unicode code points and
+SGML numeric character entities), and it will print out a verbose
+analysis of the input data. If you need it to read UTF-8 from
+standard input or to write pure UTF-8 to standard output, you can do
+so using command-line options.
 
 \U OPTIONS
 
@@ -67,6 +69,15 @@ points...
 ... and \cw{cvt-utf8} gives you the UTF-8 encodings plus the
 character definitions.
 
+If it's more convenient, you can specify those characters as SGML
+numeric entity references (for example if you're cutting and pasting
+out of a web page):
+
+\c $ cvt-utf8 '&#8364;' '&#x2013;'
+\e   bbbbbbbbbbbbbbbbbbbbbbbbbbbbb
+\c U-000020AC E2 82 AC EURO SIGN
+\c U-00002013 E2 80 93 EN DASH
+
 Alternatively, you can supply a list of UTF-8 bytes...
 
 \c $ cvt-utf8 D0 A0 D1 83 D1 81 D1 81 D0 BA D0 B8 D0 B9
@@ -130,8 +141,8 @@ Chinese text meaning \q{Traditional Chinese}:
 \U ADMINISTRATION
 
 In order to print the \cw{unicode.org} official name of each
-character, \cw{cvt-utf8} requires file mapping code points to names.
-This file is in DBM database format, for rapid lookup.
+character, \cw{cvt-utf8} requires a file mapping code points to
+names. This file is in DBM database format, for rapid lookup.
 
 This database file is accessed using the Python \cw{anydbm} module,
 so its precise file name will vary depending on what flavours of DBM
@@ -189,3 +200,5 @@ perform the rest of its functions.
 
 \cw{cvt-utf8} is free software, distributed under the MIT licence.
 Type \cw{cvt-utf8 --licence} to see the full licence text.
+
+\versionid $Id$