sgt/charset
simon [Thu, 19 Jul 2012 17:03:15 +0000 (17:03 +0000)]
Silly of me to overlook it: another obvious way you might like to
specify characters to 'confuse' is to just put them on the command
line in the system multibyte encoding! In a UTF-8 terminal environment
this may very well be the easiest thing.

git-svn-id: svn://svn.tartarus.org/sgt/charset@9584 cda61777-01e9-0310-a592-d414129be87e

simon [Wed, 18 Jul 2012 22:52:00 +0000 (22:52 +0000)]
A slightly silly new utility: 'confuse'. You provide it with some
Unicode values (typically two of them), and it finds cases in which
the provided characters are all encoded as the same thing in different
charsets and prints those charsets. So if you encounter, for example,
some piece of text which has U+0153 LATIN SMALL LIGATURE OE where you
might have expected U+00A3 POUND SIGN, simply run 'confuse 153 a3' and
it'll tell you which character sets the sender and receiver of the
text might have got confused between.

git-svn-id: svn://svn.tartarus.org/sgt/charset@9581 cda61777-01e9-0310-a592-d414129be87e
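
The idea behind 'confuse' in the commit above can be sketched in a few lines of C. This is only an illustration, not the utility's code: the table below is a hand-written, hypothetical fragment covering a single byte (0x9C) in three charsets, and the names `tiny_charsets`, `encode_in` and `confusable` are made up for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical hand-written fragment: only byte 0x9C is covered. */
struct tiny_charset {
    const char *name;
    unsigned char byte;     /* the one byte this fragment describes */
    unsigned long unicode;  /* what that byte decodes to */
};

const struct tiny_charset tiny_charsets[] = {
    { "Windows-1252", 0x9C, 0x0153 },  /* LATIN SMALL LIGATURE OE */
    { "CP437",        0x9C, 0x00A3 },  /* POUND SIGN */
    { "CP850",        0x9C, 0x00A3 },  /* POUND SIGN */
};

/* Return the byte encoding `u` in charset `name`, or -1 if not covered. */
int encode_in(const char *name, unsigned long u)
{
    size_t i;
    for (i = 0; i < sizeof tiny_charsets / sizeof *tiny_charsets; i++)
        if (!strcmp(tiny_charsets[i].name, name) &&
            tiny_charsets[i].unicode == u)
            return tiny_charsets[i].byte;
    return -1;
}

/* Two characters are confusable between two charsets if each charset
 * encodes its character as the same byte. */
int confusable(const char *cs_a, unsigned long ua,
               const char *cs_b, unsigned long ub)
{
    int a = encode_in(cs_a, ua), b = encode_in(cs_b, ub);
    return a >= 0 && a == b;
}
```

With this fragment, `confusable("Windows-1252", 0x153, "CP437", 0xA3)` is true, which matches the OE-for-pound-sign example in the commit message.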

simon [Wed, 18 Jul 2012 22:49:07 +0000 (22:49 +0000)]
Mechanism for iterating over all supported charsets.

git-svn-id: svn://svn.tartarus.org/sgt/charset@9580 cda61777-01e9-0310-a592-d414129be87e

simon [Thu, 3 May 2012 18:20:39 +0000 (18:20 +0000)]
Fix an integer-type mismatch between %04x in a printf format string
and a long int. Spotted by Ubuntu 12.04's gcc, and probably would have
caused trouble on 64-bit machines.

git-svn-id: svn://svn.tartarus.org/sgt/charset@9489 cda61777-01e9-0310-a592-d414129be87e
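
A minimal sketch of the kind of mismatch described above, and one conventional way to fix it. The function name `format_codepoint` is hypothetical; the commit does not say which of the usual fixes (changing the format or changing the argument type) was applied.

```c
#include <stdio.h>

/* Format a code point held in a long as at least four hex digits.
 * "%04x" expects an unsigned int, so passing a long is a mismatch;
 * here the `l` length modifier is used with an unsigned long argument. */
int format_codepoint(char *buf, size_t len, long cp)
{
    return snprintf(buf, len, "%04lx", (unsigned long)cp);
}
```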

simon [Wed, 9 Nov 2011 21:38:39 +0000 (21:38 +0000)]
Correct a comment.

I had wrongly believed my TYPECHECK macro double-evaluated one of its
arguments and hence would cause side effects to happen twice. But in
fact I've just realised that although it double-_expands_ the
argument, it doesn't double-_evaluate_ it: the two expansions occur in
mutually exclusive branches of a ?:, and hence cannot both be
executed.

So I've removed the comment that says my macro is rubbish. My macro is
in fact great :-)

git-svn-id: svn://svn.tartarus.org/sgt/charset@9328 cda61777-01e9-0310-a592-d414129be87e
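
The distinction between double expansion and double evaluation can be shown with a much simpler macro than TYPECHECK. The macro `PICK_ONCE` and the counter below are hypothetical and exist only for this sketch; the real macro is not reproduced here.

```c
/* Hypothetical macro: `expr` appears (is expanded) twice, but the two
 * copies sit in mutually exclusive branches of ?:, so only one of them
 * is ever evaluated. */
#define PICK_ONCE(cond, expr) ((cond) ? (expr) : (expr))

int calls = 0;

/* A function with a visible side effect, to count evaluations. */
int next_value(void)
{
    return ++calls;
}
```

Evaluating `PICK_ONCE(1, next_value())` increments `calls` exactly once, whichever branch is taken.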

simon [Fri, 14 Oct 2011 07:04:04 +0000 (07:04 +0000)]
Merge PuTTY r9326, adding CP852 support.

git-svn-id: svn://svn.tartarus.org/sgt/charset@9327 cda61777-01e9-0310-a592-d414129be87e

simon [Fri, 17 Apr 2009 17:35:57 +0000 (17:35 +0000)]
I've just seen the MIME charset name 'x-sjis' in the wild. Add it to
the list.

git-svn-id: svn://svn.tartarus.org/sgt/charset@8498 cda61777-01e9-0310-a592-d414129be87e

ben [Sun, 11 Jan 2009 14:17:57 +0000 (14:17 +0000)]
ctype functions require their argument to be EOF or representable as an
unsigned char.  On platforms where char is signed, passing plain char won't
cut it.  Make sure we cast chars to unsigned char before passing them to
tolower().

git-svn-id: svn://svn.tartarus.org/sgt/charset@8404 cda61777-01e9-0310-a592-d414129be87e
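
A short sketch of the cast described above, applied to a case-insensitive name comparison. The function `ci_equal` is a made-up example, not a function from the library.

```c
#include <ctype.h>

/* Case-insensitive string equality. Each char is cast to unsigned char
 * before reaching tolower(), so bytes >= 0x80 are never passed as
 * negative values on platforms where plain char is signed. */
int ci_equal(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;  /* equal only if both strings ended together */
}
```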

simon [Fri, 21 Nov 2008 19:20:33 +0000 (19:20 +0000)]
Oh, all right. Put in the implicit zero elements at the ends of the
initialisers, so that gcc stops whining.

git-svn-id: svn://svn.tartarus.org/sgt/charset@8311 cda61777-01e9-0310-a592-d414129be87e

simon [Thu, 21 Aug 2008 18:39:52 +0000 (18:39 +0000)]
I've just had some spam in Windows-874, a Thai SBCS. Add libcharset
support for it.

git-svn-id: svn://svn.tartarus.org/sgt/charset@8151 cda61777-01e9-0310-a592-d414129be87e

simon [Wed, 9 Jul 2008 17:06:57 +0000 (17:06 +0000)]
Just in case sbcsgen.pl is fed an sbcs.dat with the wrong line
endings, remove \r from input lines.

git-svn-id: svn://svn.tartarus.org/sgt/charset@8113 cda61777-01e9-0310-a592-d414129be87e

simon [Sun, 5 Aug 2007 12:50:57 +0000 (12:50 +0000)]
Add the ability to pass a NULL output buffer and/or an unlimited
output length to charset_{to,from}_unicode, permitting convenient
dry-running of conversions to determine the required output length
and/or test for the presence of difficult characters.

git-svn-id: svn://svn.tartarus.org/sgt/charset@7677 cda61777-01e9-0310-a592-d414129be87e
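
The dry-run idiom described above can be illustrated with a toy converter. This is not the charset_{to,from}_unicode interface, whose signature is not shown in this log; `latin1_to_utf8` is a hypothetical stand-in that encodes ISO 8859-1 bytes as UTF-8 and accepts a NULL output buffer.

```c
#include <stddef.h>

/* Hypothetical toy converter: ISO 8859-1 in, UTF-8 out. If `out` is
 * NULL nothing is written, but the return value still counts the bytes
 * that a real conversion would produce. */
size_t latin1_to_utf8(const unsigned char *in, size_t inlen, char *out)
{
    size_t n = 0, i;
    for (i = 0; i < inlen; i++) {
        if (in[i] < 0x80) {
            if (out)
                out[n] = (char)in[i];
            n += 1;
        } else {
            if (out) {
                out[n]     = (char)(0xC0 | (in[i] >> 6));
                out[n + 1] = (char)(0x80 | (in[i] & 0x3F));
            }
            n += 2;
        }
    }
    return n;
}
```

A caller would typically call once with NULL to learn the length, allocate that many bytes, and call again with the real buffer.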

ben [Mon, 30 Apr 2007 20:24:39 +0000 (20:24 +0000)]
Add rule to compile emacsenc.c.  Noticed by David Leonard.

git-svn-id: svn://svn.tartarus.org/sgt/charset@7495 cda61777-01e9-0310-a592-d414129be87e

ben [Mon, 9 Apr 2007 11:29:22 +0000 (11:29 +0000)]
Add a mechanism for translating to and from the coding system symbols
used by GNU Emacs.  This is likely to be useful for generating or
interpreting "coding:" entries in file local variables.

git-svn-id: svn://svn.tartarus.org/sgt/charset@7455 cda61777-01e9-0310-a592-d414129be87e

simon [Tue, 13 Jun 2006 09:06:28 +0000 (09:06 +0000)]
I've apparently had this lying around for months but forgotten to
commit it. Add `-i' option to cstable, which causes charset names to
be output as CS_* constants where meaningful. (Doesn't apply to MBCS
base charsets, because CS_* constants identify _encodings_.)

git-svn-id: svn://svn.tartarus.org/sgt/charset@6728 cda61777-01e9-0310-a592-d414129be87e

simon [Thu, 18 May 2006 12:32:18 +0000 (12:32 +0000)]
Remove an outright lie I've just noticed in the comment at the top
of this file!

git-svn-id: svn://svn.tartarus.org/sgt/charset@6705 cda61777-01e9-0310-a592-d414129be87e

jacob [Wed, 26 Apr 2006 23:01:06 +0000 (23:01 +0000)]
sbcsgen.pl was giving different results on different machines in the case
where two SBCS code points mapped to a single Unicode point.
Changed so that by default it favours the lower SBCS code point.
On ixion, this highlighted ambiguities in CS_MAC_THAI, CS_MAC_SYMBOL, and
CS_VISCII. Guessed at a preference for the first two and added "sortpriority"
directives. (No idea about VISCII.)

git-svn-id: svn://svn.tartarus.org/sgt/charset@6641 cda61777-01e9-0310-a592-d414129be87e
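
The tie-breaking rule above (favour the lower SBCS code point) can be sketched in C as an ascending scan. sbcsgen.pl itself is a Perl script and is not reproduced here; the function `sbcs_encode` and the 256-entry table layout are assumptions made for the example.

```c
/* Find the byte that encodes Unicode value `u` in a 256-entry
 * byte-to-Unicode table. Scanning in ascending order makes the result
 * deterministic: when two bytes map to the same Unicode point, the
 * lower byte always wins. Returns -1 if `u` is not in the table. */
int sbcs_encode(const unsigned long table[256], unsigned long u)
{
    int b;
    for (b = 0; b < 256; b++)
        if (table[b] == u)
            return b;
    return -1;
}
```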

jacob [Sun, 18 Dec 2005 16:57:00 +0000 (16:57 +0000)]
CP866 is popular and small. Add it to both the general and PuTTY
implementations of libcharset, since we've had at least one request for
it in PuTTY.

git-svn-id: svn://svn.tartarus.org/sgt/charset@6499 cda61777-01e9-0310-a592-d414129be87e

simon [Tue, 15 Nov 2005 21:07:45 +0000 (21:07 +0000)]
Reinstate the DEPLANARISE macros, this time in what I believe is a
genuinely portable form. (Thanks to IWJ for ideas.) While I'm here,
add a couple of explicit `unsigned' casts and U suffixes to prevent
more pedantic compilers from warning.

git-svn-id: svn://svn.tartarus.org/sgt/charset@6463 cda61777-01e9-0310-a592-d414129be87e

simon [Sun, 13 Nov 2005 12:33:41 +0000 (12:33 +0000)]
Fix various compiler warnings and errors. In particular, my cunning
auto-type-checking DEPLANARISE and REPLANARISE macros have turned
out to only work in gcc, which is a shame.

git-svn-id: svn://svn.tartarus.org/sgt/charset@6455 cda61777-01e9-0310-a592-d414129be87e

jacob [Sun, 23 Oct 2005 10:14:01 +0000 (10:14 +0000)]
write_utf8() is used in iso2022.c as of r6378; declare it.
(Fixes a warning in iso2022.c. There are lots more.)

git-svn-id: svn://svn.tartarus.org/sgt/charset@6424 cda61777-01e9-0310-a592-d414129be87e

simon [Fri, 7 Oct 2005 17:13:24 +0000 (17:13 +0000)]
Working ISO 2022 output function. Outputs full ISO 2022 (not sure
what that's useful for but it seemed a pity not to do it) and
compound text.

I've completely removed the compound text implementation from
iso2022s.c in favour of using the more flexible iso2022.c, meaning
we can cope with nastiness such as DOCS.

This is largely untested: I've checked it on small examples as I
went along, but it lacks anything resembling a proper test suite.

git-svn-id: svn://svn.tartarus.org/sgt/charset@6378 cda61777-01e9-0310-a592-d414129be87e

simon [Thu, 6 Oct 2005 10:05:34 +0000 (10:05 +0000)]
PostScript StandardEncoding might occasionally come in handy. While
I'm here, I've updated the URL to the Adobe Glyph List.

git-svn-id: svn://svn.tartarus.org/sgt/charset@6376 cda61777-01e9-0310-a592-d414129be87e

ben [Mon, 26 Sep 2005 10:54:34 +0000 (10:54 +0000)]
Correct the URL of the X Registry to the one given in the Registry, which
works (unlike our old one).

git-svn-id: svn://svn.tartarus.org/sgt/charset@6358 cda61777-01e9-0310-a592-d414129be87e

simon [Mon, 26 Sep 2005 08:51:43 +0000 (08:51 +0000)]
Never loop up to _and including_ lenof(array).

git-svn-id: svn://svn.tartarus.org/sgt/charset@6357 cda61777-01e9-0310-a592-d414129be87e
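
A small sketch of the off-by-one that the commit above warns against. The definition of `lenof` here is the conventional one and is assumed rather than copied from the source; the array `values` is made up.

```c
#include <stddef.h>

/* Assumed definition: number of elements in a true array. */
#define lenof(a) (sizeof(a) / sizeof(*(a)))

const int values[] = { 3, 1, 4, 1, 5 };

int sum_values(void)
{
    int total = 0;
    size_t i;
    /* `i < lenof(values)`, never `i <= lenof(values)`: the last valid
     * index is lenof(values) - 1, and reading one past it is undefined. */
    for (i = 0; i < lenof(values); i++)
        total += values[i];
    return total;
}
```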

simon [Sat, 24 Sep 2005 18:24:25 +0000 (18:24 +0000)]
Correct copy and paste error.

git-svn-id: svn://svn.tartarus.org/sgt/charset@6354 cda61777-01e9-0310-a592-d414129be87e

ben [Sat, 24 Sep 2005 17:50:36 +0000 (17:50 +0000)]
EUC-TW implementation, plus an explanation of why ISO-2022-CN is difficult.

git-svn-id: svn://svn.tartarus.org/sgt/charset@6353 cda61777-01e9-0310-a592-d414129be87e

simon [Sat, 24 Sep 2005 17:39:44 +0000 (17:39 +0000)]
How on earth have I managed not to have libcharset.a itself in the
svn:ignore property for so long?!

git-svn-id: svn://svn.tartarus.org/sgt/charset@6352 cda61777-01e9-0310-a592-d414129be87e

simon [Sat, 24 Sep 2005 17:15:33 +0000 (17:15 +0000)]
Space-saving restructure of the CNS 11643 data tables. Reduces the
RO data size in cns11643.o from 400k to 240k. Relies on there being
at most seven planes (7*94*94 <= 64k) and on the character set not
encoding any Unicode code point above U+40000; if either of these
becomes untrue later on we can always fall back to the previous
approach, or to somewhere between that and here.

The new version passes all the same tests as the old one did, and
generates the same output under the new `cstable -v'. I'm confident
that I haven't broken it.

git-svn-id: svn://svn.tartarus.org/sgt/charset@6351 cda61777-01e9-0310-a592-d414129be87e
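
The bound 7*94*94 <= 64k quoted above means that a (plane, row, column) triple fits in a 16-bit index. The sketch below only checks that arithmetic; how cns11643.c actually lays out its tables is not shown in this log, and `cns_index` and the constants are hypothetical names.

```c
#include <assert.h>

#define CNS_PLANES 7
#define CNS_ROWS   94
#define CNS_COLS   94

/* Pack zero-based (plane, row, col) into one 16-bit index.
 * The largest result is 7*94*94 - 1 = 61851, which is below 65536. */
unsigned short cns_index(int plane, int row, int col)
{
    assert(plane >= 0 && plane < CNS_PLANES);
    assert(row >= 0 && row < CNS_ROWS);
    assert(col >= 0 && col < CNS_COLS);
    return (unsigned short)((plane * CNS_ROWS + row) * CNS_COLS + col);
}
```

An eighth plane would push the maximum index to 8*94*94 - 1 = 70687, which no longer fits in 16 bits; that is why the commit message names the seven-plane limit as an assumption.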

ben [Sat, 24 Sep 2005 17:09:32 +0000 (17:09 +0000)]
Fix a couple of warnings.

git-svn-id: svn://svn.tartarus.org/sgt/charset@6350 cda61777-01e9-0310-a592-d414129be87e

simon [Sat, 24 Sep 2005 17:08:43 +0000 (17:08 +0000)]
Introduce the -v flag which outputs the actual index of each code
point in every charset.

git-svn-id: svn://svn.tartarus.org/sgt/charset@6349 cda61777-01e9-0310-a592-d414129be87e

ben [Sat, 24 Sep 2005 17:08:41 +0000 (17:08 +0000)]
Add support for CNS 11643.

git-svn-id: svn://svn.tartarus.org/sgt/charset@6348 cda61777-01e9-0310-a592-d414129be87e

simon [Sat, 24 Sep 2005 17:05:19 +0000 (17:05 +0000)]
Include CNS 11643 in the cstable diagnostic utility.

git-svn-id: svn://svn.tartarus.org/sgt/charset@6347 cda61777-01e9-0310-a592-d414129be87e

ben [Sat, 24 Sep 2005 16:26:55 +0000 (16:26 +0000)]
IRG source T3 includes not only plane 3 of CNS 11643, but also "some additional
characters".  We now filter out the latter from our mapping table.

git-svn-id: svn://svn.tartarus.org/sgt/charset@6345 cda61777-01e9-0310-a592-d414129be87e

simon [Sat, 24 Sep 2005 16:02:49 +0000 (16:02 +0000)]
CNS 11643 goes above the BMP, so the test code should take that into
account when checking the reverse mapping for every potentially
relevant Unicode character.

git-svn-id: svn://svn.tartarus.org/sgt/charset@6343 cda61777-01e9-0310-a592-d414129be87e

ben [Sat, 24 Sep 2005 15:50:33 +0000 (15:50 +0000)]
Add a mapping table for CNS 11643-1992.  It's a bit big, and nothing
uses it yet.

git-svn-id: svn://svn.tartarus.org/sgt/charset@6342 cda61777-01e9-0310-a592-d414129be87e

ben [Wed, 21 Sep 2005 12:45:56 +0000 (12:45 +0000)]
Support for the ESC $ ( 0 and ESC $ ( 1 sets that Emacs uses to embed
Big5 in COMPOUND_TEXT.  Emacs does lots of other rude things to
COMPOUND_TEXT, but this one is supported by XLib as well.

git-svn-id: svn://svn.tartarus.org/sgt/charset@6336 cda61777-01e9-0310-a592-d414129be87e

ben [Wed, 21 Sep 2005 10:21:20 +0000 (10:21 +0000)]
Add support for COMPOUND_TEXT extended segments encoding ISO 8859-14,
ISO 8859-15, and BIG5.

git-svn-id: svn://svn.tartarus.org/sgt/charset@6335 cda61777-01e9-0310-a592-d414129be87e

ben [Sun, 18 Sep 2005 14:53:19 +0000 (14:53 +0000)]
Add two new SBCSes: BS 4730 (alias UK-ASCII) and DEC graphics (alias VT100
line-drawing).  I think this means that libcharset supports all the character
sets that PuTTY supports, which is nice.

git-svn-id: svn://svn.tartarus.org/sgt/charset@6330 cda61777-01e9-0310-a592-d414129be87e

ben [Sun, 18 Sep 2005 13:34:37 +0000 (13:34 +0000)]
When documenting s0 and s1, get them the right way around.

git-svn-id: svn://svn.tartarus.org/sgt/charset@6329 cda61777-01e9-0310-a592-d414129be87e

ben [Sun, 18 Sep 2005 13:31:00 +0000 (13:31 +0000)]
1: Better documentation of how read_iso2022() stores its state.
2: Minimal write_iso2022(): it can't encode anything, but promises not to
   segfault.

git-svn-id: svn://svn.tartarus.org/sgt/charset@6328 cda61777-01e9-0310-a592-d414129be87e

simon [Sun, 18 Sep 2005 13:01:42 +0000 (13:01 +0000)]
Ben points out that ESC ( J in ISO-2022-JP should encode the
_bottom_ half of JIS X 0201 (the one that's almost identical to
ASCII, equivalent to the bottom half of Shift-JIS), not the top
half.

git-svn-id: svn://svn.tartarus.org/sgt/charset@6327 cda61777-01e9-0310-a592-d414129be87e

ben [Sun, 18 Sep 2005 12:49:44 +0000 (12:49 +0000)]
Make read_utf8(), like read_sbcs(), accessible to the rest of the library,
so it can be used directly in iso2022.c.

git-svn-id: svn://svn.tartarus.org/sgt/charset@6326 cda61777-01e9-0310-a592-d414129be87e

ben [Sun, 18 Sep 2005 12:48:50 +0000 (12:48 +0000)]
Undo another change that leaked through with the ISO-2022 commit.

git-svn-id: svn://svn.tartarus.org/sgt/charset@6325 cda61777-01e9-0310-a592-d414129be87e

ben [Sun, 18 Sep 2005 12:39:18 +0000 (12:39 +0000)]
Update comment to reflect state of DOCS support.

git-svn-id: svn://svn.tartarus.org/sgt/charset@6324 cda61777-01e9-0310-a592-d414129be87e

ben [Sun, 18 Sep 2005 12:24:10 +0000 (12:24 +0000)]
Undo accidental change in previous commit.

git-svn-id: svn://svn.tartarus.org/sgt/charset@6323 cda61777-01e9-0310-a592-d414129be87e

ben [Sat, 17 Sep 2005 18:47:35 +0000 (18:47 +0000)]
Support for using DOCS to switch to and from UTF-8 mode.

git-svn-id: svn://svn.tartarus.org/sgt/charset@6321 cda61777-01e9-0310-a592-d414129be87e

ben [Sat, 17 Sep 2005 15:34:58 +0000 (15:34 +0000)]
Reasonably complete ISO 2022 support.  Huge and hairy, but it seems to
largely work.  It might even be useful for something.

git-svn-id: svn://svn.tartarus.org/sgt/charset@6320 cda61777-01e9-0310-a592-d414129be87e

ben [Sat, 17 Sep 2005 11:22:54 +0000 (11:22 +0000)]
Use standard "WILD" markers for unregistered Big 5 aliases.

git-svn-id: svn://svn.tartarus.org/sgt/charset@6319 cda61777-01e9-0310-a592-d414129be87e

ben [Sat, 17 Sep 2005 11:21:32 +0000 (11:21 +0000)]
Fix stupid typo.

git-svn-id: svn://svn.tartarus.org/sgt/charset@6318 cda61777-01e9-0310-a592-d414129be87e

ben [Sat, 17 Sep 2005 11:21:05 +0000 (11:21 +0000)]
Names for ASCII and JIS X 0201 that appear both in the X registry and in
the usual X fonts.

git-svn-id: svn://svn.tartarus.org/sgt/charset@6317 cda61777-01e9-0310-a592-d414129be87e

ben [Sat, 17 Sep 2005 10:49:51 +0000 (10:49 +0000)]
TIS-620 is equivalent to ISO 8859-11, so map the MIME name for the former to
the latter.

git-svn-id: svn://svn.tartarus.org/sgt/charset@6316 cda61777-01e9-0310-a592-d414129be87e

ben [Fri, 16 Sep 2005 18:42:45 +0000 (18:42 +0000)]
Substantial overhaul of the UTF-8 decoder.  It now uses 26 bits of state
rather than 32, which might make it possible to use it inside another
decoder.  All the tests still pass.

git-svn-id: svn://svn.tartarus.org/sgt/charset@6315 cda61777-01e9-0310-a592-d414129be87e

simon [Fri, 16 Sep 2005 16:52:26 +0000 (16:52 +0000)]
Bring utf8.c's internal tests up to date in the (somewhat belated)
wake of r3713.

git-svn-id: svn://svn.tartarus.org/sgt/charset@6314 cda61777-01e9-0310-a592-d414129be87e

simon [Tue, 13 Sep 2005 22:09:02 +0000 (22:09 +0000)]
I've apparently had this todo-list comment sitting on stormhawk for
nearly a year and not checked it in.

git-svn-id: svn://svn.tartarus.org/sgt/charset@6309 cda61777-01e9-0310-a592-d414129be87e

simon [Thu, 10 Mar 2005 10:20:36 +0000 (10:20 +0000)]
Explicitly constify a bunch of static data declarations which were
conceptually const but not declared as such. Halibut is now back to
the practically-speaking-pointless but rather satisfying status of
having no global writable data whatsoever :-)

git-svn-id: svn://svn.tartarus.org/sgt/charset@5476 cda61777-01e9-0310-a592-d414129be87e

jacob [Fri, 18 Feb 2005 13:17:28 +0000 (13:17 +0000)]
Add a `--list-charsets' option to Halibut to enumerate canonical names of known
character sets.

(Also make libcharset `return_in_enum' values saner.)

git-svn-id: svn://svn.tartarus.org/sgt/charset@5341 cda61777-01e9-0310-a592-d414129be87e

simon [Thu, 18 Nov 2004 11:30:39 +0000 (11:30 +0000)]
Move MODULE files out of individual project directories into a
MODULES top-level directory, which is where the Tartarus website
scripts will (hopefully) start reading them from.

git-svn-id: svn://svn.tartarus.org/sgt/charset@4813 cda61777-01e9-0310-a592-d414129be87e

simon [Tue, 16 Nov 2004 15:29:14 +0000 (15:29 +0000)]
Remove .cvsignore files on all active branches.

git-svn-id: svn://svn.tartarus.org/sgt/charset@4788 cda61777-01e9-0310-a592-d414129be87e

simon [Tue, 16 Nov 2004 15:27:00 +0000 (15:27 +0000)]
CVS revision numbers, stored as `cvs2svn:cvs-rev' properties, are a
useful piece of history in this repository but we don't want to
preserve their latest values on future commits. Accordingly, I'm
deleting them from all active development (though not from past
release branches).

git-svn-id: svn://svn.tartarus.org/sgt/charset@4787 cda61777-01e9-0310-a592-d414129be87e

simon [Tue, 26 Oct 2004 19:39:42 +0000 (19:39 +0000)]
Couple of fiddly fixes in libcharset.

git-svn-id: svn://svn.tartarus.org/sgt/charset@4701 cda61777-01e9-0310-a592-d414129be87e

simon [Sat, 25 Sep 2004 18:38:11 +0000 (18:38 +0000)]
Cross-testing the libcharset compound text implementation against
Xutf8TextListToTextProperty reveals that the latter supports JIS X
0212 via the escape sequences ESC $ ( D and ESC $ ) D, although this
is not listed in my copy of ctext.ps. It's easy enough to support
it, though, so now we do.

git-svn-id: svn://svn.tartarus.org/sgt/charset@4581 cda61777-01e9-0310-a592-d414129be87e

simon [Sat, 25 Sep 2004 14:45:32 +0000 (14:45 +0000)]
Fix first two bugs in compound text support: escape sequences were
mis-ordered, and initial charset state failed to specify 8859-1 in GR.

git-svn-id: svn://svn.tartarus.org/sgt/charset@4579 cda61777-01e9-0310-a592-d414129be87e

simon [Sat, 25 Sep 2004 13:24:27 +0000 (13:24 +0000)]
The COMPOUND_TEXT encoding used by some X applications to transfer
internationalised text in selections is a subset of ISO 2022
containing no base character sets which libcharset doesn't already
support. As such, it isn't too hard to add direct compound text
support into libcharset, so here it is. With any luck I should
eventually be able to integrate this into Unix PuTTY, to deal with
the fact that the useful Xutf8 functions we currently use are
specific to XFree86.

git-svn-id: svn://svn.tartarus.org/sgt/charset@4578 cda61777-01e9-0310-a592-d414129be87e

simon [Thu, 3 Jun 2004 09:28:04 +0000 (09:28 +0000)]
Preferred MIME name for ASCII is "US-ASCII", not "ANSI_X3.4-1968". Oops.

git-svn-id: svn://svn.tartarus.org/sgt/charset@4269 cda61777-01e9-0310-a592-d414129be87e

simon [Thu, 22 Apr 2004 17:27:57 +0000 (17:27 +0000)]
Add charset_from_locale(), a best-effort attempt to return the
libcharset CS_* identifier for the character set indicated by the
active locale. Uses code from Markus Kuhn's website.

git-svn-id: svn://svn.tartarus.org/sgt/charset@4115 cda61777-01e9-0310-a592-d414129be87e

simon [Tue, 20 Apr 2004 21:24:40 +0000 (21:24 +0000)]
I typed `Win1252' today and libcharset didn't recognise it. Fixed.

git-svn-id: svn://svn.tartarus.org/sgt/charset@4107 cda61777-01e9-0310-a592-d414129be87e

simon [Sun, 18 Apr 2004 07:46:37 +0000 (07:46 +0000)]
Fix an oddity in PDFDocEncoding.

git-svn-id: svn://svn.tartarus.org/sgt/charset@4093 cda61777-01e9-0310-a592-d414129be87e

simon [Sat, 17 Apr 2004 18:20:58 +0000 (18:20 +0000)]
Now that I've renamed the `test' program to `convcs', fix .cvsignore.

git-svn-id: svn://svn.tartarus.org/sgt/charset@4091 cda61777-01e9-0310-a592-d414129be87e

simon [Sat, 17 Apr 2004 18:12:58 +0000 (18:12 +0000)]
`gcc -Wall' points out some signed/unsigned comparisons. Fixed.

git-svn-id: svn://svn.tartarus.org/sgt/charset@4090 cda61777-01e9-0310-a592-d414129be87e

simon [Sat, 17 Apr 2004 18:09:27 +0000 (18:09 +0000)]
Now this is a top-level CVS module, it should have LICENCE and
MODULE files of its own.

git-svn-id: svn://svn.tartarus.org/sgt/charset@4089 cda61777-01e9-0310-a592-d414129be87e

simon [Sat, 17 Apr 2004 11:44:49 +0000 (11:44 +0000)]
Link libcharset into Halibut. (This involved faffing with
CVSROOT/modules, so anyone with a checked-out copy of Halibut will
unfortunately need to do `cvs co' again.)

git-svn-id: svn://svn.tartarus.org/sgt/charset@4088 cda61777-01e9-0310-a592-d414129be87e

simon [Sat, 17 Apr 2004 08:12:07 +0000 (08:12 +0000)]
In preparation for using libcharset in Halibut, I've added
PDFDocEncoding to the SBCS list; this is a custom superset of
ISO-8859-1 used in PDF files to store user-visible text that isn't
printed on a page (such as metadata and the document outline).

git-svn-id: svn://svn.tartarus.org/sgt/charset@4087 cda61777-01e9-0310-a592-d414129be87e

simon [Sat, 17 Apr 2004 08:04:45 +0000 (08:04 +0000)]
Include libcharset into both the Timber and Halibut checkouts.
Unfortunately this means people will have to do `cvs co' again to
get this update, but that appears to be the price I pay for being
able to conveniently share a single source base in this way.

git-svn-id: svn://svn.tartarus.org/sgt/charset@4086 cda61777-01e9-0310-a592-d414129be87e