This subdirectory contains a general character-set conversion
library, used in Timber, and available for use in other software if
it should happen to be useful.

I intend to use this same library in other programs at some future
date. (A cut-down version of it is already in use in some ports of
PuTTY.) It is therefore a _strong_ design goal that this library
should remain perfectly general, and not tied to particulars of
Timber. It must not reference any code outside its own subdirectory;
it should not have Timber-specific helper routines added to it
unless they can be documented in a general manner which might make
them useful in other circumstances as well.

There are some multibyte character encodings which this library does
not currently support. Those that I know of are:

 - Johab. There is no reason why we _shouldn't_ support this, but it
   wasn't immediately necessary at the time I did the initial
   coding. If anyone needs it, it shouldn't be too hard. The Unicode
   mapping table for the encoding is available at
   http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/KSC/JOHAB.TXT

 - ISO-2022-JP-1 (RFC 2237), and ISO-2022-JP-2 (RFC 1554). These
   should be even easier if required - we already have the ISO 2022
   machinery in place, and support all the underlying character
   sets.

 - ISO-2022-CN and ISO-2022-CN-EXT (RFC 1922). These are a little tricky
   as they allow use of both GB2312 (simplified Chinese) and CNS 11643
   (traditional Chinese), so we may need some way to specify which to
   prefer.

 - The Hong Kong (HKSCS) extension to Big5. Again, mapping tables
   are available in the Unihan database.

 - Other Big Five extensions, which I don't have mapping tables for
   at all.
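
To illustrate the ISO-2022-CN point above: an encoder has to decide
which character set to designate when a character is available in both
GB2312 and CNS 11643 plane 1. The following is only a sketch of how
such a preference might be expressed. None of these names
(cn_preference, cn_charset, choose_cn_charset, cn_designation) exist
in this library; they are invented for the example. The escape
sequences are the SO designations listed in RFC 1922.

```c
#include <stddef.h>

/* Hypothetical caller-supplied preference; not part of the library's API. */
enum cn_preference { PREFER_GB2312, PREFER_CNS11643 };

/* Hypothetical identifiers for two sets ISO-2022-CN can designate to SO. */
enum cn_charset { CS_NONE, CS_GB2312, CS_CNS11643_P1 };

/*
 * Pick a character set for one character, given whether that character
 * is representable in each set. The preference only matters when the
 * character is representable in both.
 */
enum cn_charset choose_cn_charset(int in_gb2312, int in_cns_p1,
                                  enum cn_preference pref)
{
    if (in_gb2312 && in_cns_p1)
        return pref == PREFER_GB2312 ? CS_GB2312 : CS_CNS11643_P1;
    if (in_gb2312)
        return CS_GB2312;
    if (in_cns_p1)
        return CS_CNS11643_P1;
    return CS_NONE;            /* not encodable in either set */
}

/* SO designation escape sequence for a set (RFC 1922), or NULL. */
const char *cn_designation(enum cn_charset cs)
{
    switch (cs) {
      case CS_GB2312:      return "\033$)A";   /* ESC $ ) A */
      case CS_CNS11643_P1: return "\033$)G";   /* ESC $ ) G */
      default:             return NULL;
    }
}
```

A real implementation would also have to deal with CNS 11643 plane 2
(designated via SS2) and the extra sets in ISO-2022-CN-EXT; the sketch
leaves those out.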