This subdirectory contains a general character-set conversion
library, used in Timber, and available for use in other software if
it should happen to be useful.

I intend to use this same library in other programs at some future
date. (A cut-down version of it is already in use in some ports of
PuTTY.) It is therefore a _strong_ design goal that this library
should remain perfectly general, and not tied to particulars of
Timber. It must not reference any code outside its own subdirectory;
it should not have Timber-specific helper routines added to it
unless they can be documented in a general manner which might make
them useful in other circumstances as well.

There are some multibyte character encodings which this library does
not currently support. Those that I know of are:

 - Johab. There is no reason why we _shouldn't_ support this, but it
   wasn't immediately necessary at the time I did the initial
   coding. If anyone needs it, it shouldn't be too hard. The Unicode
   mapping table for the encoding is available at
   http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/KSC/JOHAB.TXT

 - ISO-2022-JP-1 (RFC 2237), and ISO-2022-JP-2 (RFC 1554). These
   should be even easier if required - we already have the ISO 2022
   machinery in place, and support all the underlying character
   sets.

 - ISO-2022-CN and ISO-2022-CN-EXT (RFC 1922). These are a little tricky
   as they allow use of both GB2312 (simplified Chinese) and CNS 11643
   (traditional Chinese), so we may need some way to specify which to
   prefer.

 - The Hong Kong (HKSCS) extension to Big5. Again, mapping tables
   are available in the Unihan database.

 - Other Big Five extensions, which I don't have mapping tables for
   at all.