This subdirectory contains a general character-set conversion
library, used in Timber, and available for use in other software if
it should happen to be useful.

I intend to use this same library in other programs at some future
date. (A cut-down version of it is already in use in some ports of
PuTTY.) It is therefore a _strong_ design goal that this library
should remain perfectly general, and not tied to particulars of
Timber. It must not reference any code outside its own subdirectory;
it should not have Timber-specific helper routines added to it
unless they can be documented in a general manner which might make
them useful in other circumstances as well.

There are some multibyte character encodings which this library does
not currently support. Those that I know of are:

 - Johab. There is no reason why we _shouldn't_ support this, but it
   wasn't immediately necessary at the time I did the initial
   coding. If anyone needs it, it shouldn't be too hard. The Unicode
   mapping table for the encoding is available at
   http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/KSC/JOHAB.TXT

 - ISO-2022-JP-1 (RFC 2237), and ISO-2022-JP-2 (RFC 1554). These
   should be even easier if required - we already have the ISO 2022
   machinery in place, and support all the underlying character
   sets.

 - ISO-2022-CN and ISO-2022-CN-EXT (RFC 1922), and EUC-TW. These
   encodings depend on the CNS 11643-1992 character set. Mapping
   table data for this set is available from unicode.org, but only
   in the Unihan database
   ftp://ftp.unicode.org/Public/UNIDATA/Unihan.zip

 - The Hong Kong (HKSCS) extension to Big5. Again, mapping tables
   are available in the Unihan database.

 - Other Big Five extensions, which I don't have mapping tables for
   at all.
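
For anyone picking up one of the missing encodings above, a first step
is usually reading a mapping table. The following is only a minimal
sketch, not part of this library: it assumes the common unicode.org
mapping-file layout (a hex code in the encoding, whitespace, a hex
Unicode value, then an optional '#' comment), and the function name
parse_mapping_line is hypothetical. The Unihan database uses a
different layout and would need its own reader.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: parse one line of a unicode.org-style mapping
 * table such as "0x41<TAB>0x0041<TAB># comment".
 *
 * Returns 1 and stores both values on success. Returns 0 for blank
 * lines, comment lines, and lines with no Unicode column (e.g. codes
 * marked as undefined), leaving *enc and *uni untouched.
 */
int parse_mapping_line(const char *line, unsigned long *enc,
                       unsigned long *uni)
{
    const char *p = line;
    char *end;
    unsigned long e, u;

    p += strspn(p, " \t");            /* skip leading whitespace */
    if (*p == '#' || *p == '\0' || *p == '\n' || *p == '\r')
        return 0;                     /* comment or blank line */

    e = strtoul(p, &end, 16);         /* base 16 accepts a "0x" prefix */
    if (end == p)
        return 0;                     /* first column is not a number */

    p = end + strspn(end, " \t");     /* skip the column separator */
    u = strtoul(p, &end, 16);
    if (end == p)
        return 0;                     /* no Unicode column present */

    *enc = e;
    *uni = u;
    return 1;
}
```

A caller would typically feed this each line returned by fgets() and
collect the successful pairs into whatever table structure the new
encoding needs.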