sgt/charset
ben [Sat, 17 Sep 2005 18:47:35 +0000 (18:47 +0000)]
Support for using DOCS to switch to and from UTF-8 mode.

git-svn-id: svn://svn.tartarus.org/sgt/charset@6321 cda61777-01e9-0310-a592-d414129be87e
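
For readers unfamiliar with DOCS ("designate other coding system" in ISO 2022), the sketch below recognises the two standard sequences involved: `ESC % G` enters UTF-8 and `ESC % @` returns to ISO 2022. The enum and the function name `classify_docs` are hypothetical and are not taken from libcharset; this only illustrates the byte sequences, not how the commit implements the switch.

```c
#include <stddef.h>

/* Hypothetical result codes, for illustration only. */
enum docs_action { DOCS_NONE, DOCS_ENTER_UTF8, DOCS_RETURN_ISO2022 };

/* Classify a three-byte DOCS sequence at the start of p. */
enum docs_action classify_docs(const unsigned char *p, size_t len)
{
    if (len < 3 || p[0] != 0x1B || p[1] != '%')
        return DOCS_NONE;
    if (p[2] == 'G')
        return DOCS_ENTER_UTF8;       /* ESC % G: switch to UTF-8 */
    if (p[2] == '@')
        return DOCS_RETURN_ISO2022;   /* ESC % @: return to ISO 2022 */
    return DOCS_NONE;                 /* other coding systems not handled */
}
```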

ben [Sat, 17 Sep 2005 15:34:58 +0000 (15:34 +0000)]
Reasonably complete ISO 2022 support.  Huge and hairy, but it seems to
largely work.  It might even be useful for something.

git-svn-id: svn://svn.tartarus.org/sgt/charset@6320 cda61777-01e9-0310-a592-d414129be87e

ben [Sat, 17 Sep 2005 11:22:54 +0000 (11:22 +0000)]
Use standard "WILD" markers for unregistered Big 5 aliases.

git-svn-id: svn://svn.tartarus.org/sgt/charset@6319 cda61777-01e9-0310-a592-d414129be87e

ben [Sat, 17 Sep 2005 11:21:32 +0000 (11:21 +0000)]
Fix stupid typo.

git-svn-id: svn://svn.tartarus.org/sgt/charset@6318 cda61777-01e9-0310-a592-d414129be87e

ben [Sat, 17 Sep 2005 11:21:05 +0000 (11:21 +0000)]
Names for ASCII and JIS X 0201 that appear both in the X registry and in
the usual X fonts.

git-svn-id: svn://svn.tartarus.org/sgt/charset@6317 cda61777-01e9-0310-a592-d414129be87e

ben [Sat, 17 Sep 2005 10:49:51 +0000 (10:49 +0000)]
TIS-620 is equivalent to ISO 8859-11, so map the MIME name for the former to
the latter.

git-svn-id: svn://svn.tartarus.org/sgt/charset@6316 cda61777-01e9-0310-a592-d414129be87e
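
A name-to-charset mapping of this kind can be pictured as a small alias table searched case-insensitively. The sketch below is an assumption about the general shape only: the table contents, `struct alias`, and `canonical_charset_name` are hypothetical and do not reflect libcharset's actual data structures or identifiers.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical alias table: each name maps to one canonical name. */
struct alias { const char *name; const char *canonical; };

static const struct alias aliases[] = {
    { "ISO-8859-11", "ISO-8859-11" },
    { "TIS-620",     "ISO-8859-11" },  /* MIME name mapped to the same set */
};

/* Compare two names, ignoring ASCII case. */
static int names_equal(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;   /* equal only if both strings ended together */
}

/* Return the canonical name for an alias, or NULL if it is unknown. */
const char *canonical_charset_name(const char *name)
{
    size_t i;
    for (i = 0; i < sizeof aliases / sizeof aliases[0]; i++)
        if (names_equal(name, aliases[i].name))
            return aliases[i].canonical;
    return NULL;
}
```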

ben [Fri, 16 Sep 2005 18:42:45 +0000 (18:42 +0000)]
Substantial overhaul of the UTF-8 decoder.  It now uses 26 bits of state
rather than 32, which might make it possible to use it inside another
decoder.  All the tests still pass.

git-svn-id: svn://svn.tartarus.org/sgt/charset@6315 cda61777-01e9-0310-a592-d414129be87e
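
To illustrate how a streaming UTF-8 decoder can keep all of its state in a few bits of one integer, here is a minimal sketch. It is not the layout used in utf8.c, which the log does not describe: the packing (21 bits of partial code point plus a 2-bit count of expected continuation bytes), the names `utf8_state`, `utf8_feed`, `UTF8_NONE`, and `UTF8_ERROR` are all assumptions. For brevity it does not reject overlong forms or surrogates, and a byte that breaks a sequence is dropped rather than reprocessed.

```c
#include <stddef.h>

/* Hypothetical packed state: bits 0-20 hold the partial code point,
 * bits 21-22 hold the number of continuation bytes still expected. */
typedef unsigned long utf8_state;

#define UTF8_ERROR 0xFFFDUL      /* U+FFFD REPLACEMENT CHARACTER */
#define UTF8_NONE  0xFFFFFFFFUL  /* no character completed by this byte */

/* Feed one byte; returns a code point, UTF8_NONE, or UTF8_ERROR. */
unsigned long utf8_feed(utf8_state *st, unsigned char byte)
{
    unsigned long partial = *st & 0x1FFFFFUL;
    unsigned remaining = (unsigned)((*st >> 21) & 3);

    if (remaining > 0) {
        if ((byte & 0xC0) != 0x80) {
            *st = 0;
            return UTF8_ERROR;           /* expected a continuation byte */
        }
        partial = (partial << 6) | (byte & 0x3F);
        remaining--;
        if (remaining == 0) {
            *st = 0;
            return partial;              /* sequence complete */
        }
        *st = ((unsigned long)remaining << 21) | partial;
        return UTF8_NONE;
    }

    if (byte < 0x80)
        return byte;                     /* ASCII passes straight through */
    if ((byte & 0xE0) == 0xC0) {
        *st = (1UL << 21) | (byte & 0x1F);
        return UTF8_NONE;
    }
    if ((byte & 0xF0) == 0xE0) {
        *st = (2UL << 21) | (byte & 0x0F);
        return UTF8_NONE;
    }
    if ((byte & 0xF8) == 0xF0) {
        *st = (3UL << 21) | (byte & 0x07);
        return UTF8_NONE;
    }
    return UTF8_ERROR;                   /* stray continuation or invalid lead */
}
```

Because the whole state is a single small integer, a decoder of this shape could in principle be embedded in the state of an enclosing decoder, which is the kind of reuse the commit message hints at.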

simon [Fri, 16 Sep 2005 16:52:26 +0000 (16:52 +0000)]
Bring utf8.c's internal tests up to date in the (somewhat belated)
wake of r3713.

git-svn-id: svn://svn.tartarus.org/sgt/charset@6314 cda61777-01e9-0310-a592-d414129be87e

simon [Tue, 13 Sep 2005 22:09:02 +0000 (22:09 +0000)]
I've apparently had this todo-list comment sitting on stormhawk for
nearly a year and not checked it in.

git-svn-id: svn://svn.tartarus.org/sgt/charset@6309 cda61777-01e9-0310-a592-d414129be87e

simon [Thu, 10 Mar 2005 10:20:36 +0000 (10:20 +0000)]
Explicitly constify a bunch of static data declarations which were
conceptually const but not declared as such. Halibut is now back to
the practically-speaking-pointless but rather satisfying status of
having no global writable data whatsoever :-)

git-svn-id: svn://svn.tartarus.org/sgt/charset@5476 cda61777-01e9-0310-a592-d414129be87e
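
As a reminder of what "constifying" a static table means in C, the sketch below declares both the pointers and the strings they point to as `const`, so the whole array can be placed in read-only storage. The table contents and the accessor names are hypothetical and are not drawn from the Halibut or libcharset sources.

```c
#include <stddef.h>

/* `const char *const`: neither the strings nor the pointers in the
 * array may be modified, so no writable global data is needed. */
static const char *const charset_names[] = {
    "US-ASCII", "UTF-8", "ISO-8859-1"
};

/* Number of entries in the table. */
size_t charset_name_count(void)
{
    return sizeof charset_names / sizeof charset_names[0];
}

/* Return entry i, or NULL if i is out of range. */
const char *charset_name_at(size_t i)
{
    return i < charset_name_count() ? charset_names[i] : NULL;
}
```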

jacob [Fri, 18 Feb 2005 13:17:28 +0000 (13:17 +0000)]
Add a `--list-charsets' option to Halibut to enumerate canonical names of known
character sets.

(Also make libcharset `return_in_enum' values saner.)

git-svn-id: svn://svn.tartarus.org/sgt/charset@5341 cda61777-01e9-0310-a592-d414129be87e

simon [Thu, 18 Nov 2004 11:30:39 +0000 (11:30 +0000)]
Move MODULE files out of individual project directories into a
MODULES top-level directory, which is where the Tartarus website
scripts will (hopefully) start reading them from.

git-svn-id: svn://svn.tartarus.org/sgt/charset@4813 cda61777-01e9-0310-a592-d414129be87e

simon [Tue, 16 Nov 2004 15:29:14 +0000 (15:29 +0000)]
Remove .cvsignore files on all active branches.

git-svn-id: svn://svn.tartarus.org/sgt/charset@4788 cda61777-01e9-0310-a592-d414129be87e

simon [Tue, 16 Nov 2004 15:27:00 +0000 (15:27 +0000)]
CVS revision numbers, stored as `cvs2svn:cvs-rev' properties, are a
useful piece of history in this repository but we don't want to
preserve their latest values on future commits. Accordingly, I'm
deleting them from all active development (though not from past
release branches).

git-svn-id: svn://svn.tartarus.org/sgt/charset@4787 cda61777-01e9-0310-a592-d414129be87e

simon [Tue, 26 Oct 2004 19:39:42 +0000 (19:39 +0000)]
Couple of fiddly fixes in libcharset.

git-svn-id: svn://svn.tartarus.org/sgt/charset@4701 cda61777-01e9-0310-a592-d414129be87e

simon [Sat, 25 Sep 2004 18:38:11 +0000 (18:38 +0000)]
Cross-testing the libcharset compound text implementation against
Xutf8TextListToTextProperty reveals that the latter supports JIS X
0212 via the escape sequences ESC $ ( D and ESC $ ) D, although this
is not listed in my copy of ctext.ps. It's easy enough to support
it, though, so now we do.

git-svn-id: svn://svn.tartarus.org/sgt/charset@4581 cda61777-01e9-0310-a592-d414129be87e
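
The two escape sequences mentioned above follow the general ISO 2022 pattern for designating a multi-byte character set: `ESC $ (` followed by a final byte designates the set into G0, and `ESC $ )` designates it into G1, with `D` being the final byte for JIS X 0212. A minimal sketch of recognising that pattern follows; the enum values and the function `parse_multibyte_designation` are hypothetical and not part of libcharset.

```c
#include <stddef.h>

/* Hypothetical result codes, for illustration only. */
enum { DESIG_NONE = -1, DESIG_G0 = 0, DESIG_G1 = 1 };

/* Recognise a four-byte "ESC $ ( F" or "ESC $ ) F" sequence.
 * On success, stores the final byte F and returns DESIG_G0 or DESIG_G1. */
int parse_multibyte_designation(const unsigned char *p, size_t len,
                                char *final_byte)
{
    if (len < 4 || p[0] != 0x1B || p[1] != '$')
        return DESIG_NONE;
    if (p[2] != '(' && p[2] != ')')
        return DESIG_NONE;
    if (p[3] < 0x30 || p[3] > 0x7E)
        return DESIG_NONE;            /* not a valid final byte */
    *final_byte = (char)p[3];
    return p[2] == '(' ? DESIG_G0 : DESIG_G1;
}
```

A caller would then map the final byte `D` to JIS X 0212 in whichever of G0 or G1 was designated.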

simon [Sat, 25 Sep 2004 14:45:32 +0000 (14:45 +0000)]
Fix first two bugs in compound text support: escape sequences were
mis-ordered, and initial charset state failed to specify 8859-1 in GR.

git-svn-id: svn://svn.tartarus.org/sgt/charset@4579 cda61777-01e9-0310-a592-d414129be87e

simon [Sat, 25 Sep 2004 13:24:27 +0000 (13:24 +0000)]
The COMPOUND_TEXT encoding used by some X applications to transfer
internationalised text in selections is a subset of ISO 2022
containing no base character sets which libcharset doesn't already
support. As such, it isn't too hard to add direct compound text
support into libcharset, so here it is. With any luck I should
eventually be able to integrate this into Unix PuTTY, to deal with
the fact that the useful Xutf8 functions we currently use are
specific to XFree86.

git-svn-id: svn://svn.tartarus.org/sgt/charset@4578 cda61777-01e9-0310-a592-d414129be87e

simon [Thu, 3 Jun 2004 09:28:04 +0000 (09:28 +0000)]
Preferred MIME name for ASCII is "US-ASCII", not "ANSI_X3.4-1968". Oops.

git-svn-id: svn://svn.tartarus.org/sgt/charset@4269 cda61777-01e9-0310-a592-d414129be87e

simon [Thu, 22 Apr 2004 17:27:57 +0000 (17:27 +0000)]
Add charset_from_locale(), a best-effort attempt to return the
libcharset CS_* identifier for the character set indicated by the
active locale. Uses code from Markus Kuhn's website.

git-svn-id: svn://svn.tartarus.org/sgt/charset@4115 cda61777-01e9-0310-a592-d414129be87e
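
On POSIX systems, one common way to learn the character set of the active locale is `nl_langinfo(CODESET)`. The sketch below shows only that lookup; it is not the body of `charset_from_locale()`, whose implementation the log does not show, and it returns the raw codeset name rather than a libcharset `CS_*` identifier. The function name `locale_codeset_name` is hypothetical.

```c
#include <langinfo.h>
#include <stddef.h>

/* Return the codeset name of the current LC_CTYPE locale, or NULL if
 * the system reports none. The caller is assumed to have called
 * setlocale(LC_CTYPE, "") already if the user's locale is wanted. */
const char *locale_codeset_name(void)
{
    const char *name = nl_langinfo(CODESET);
    return (name != NULL && name[0] != '\0') ? name : NULL;
}
```

A best-effort function in the spirit of the commit would then map the returned name to a charset identifier, falling back to some default when the name is missing or unrecognised.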

simon [Tue, 20 Apr 2004 21:24:40 +0000 (21:24 +0000)]
I typed `Win1252' today and libcharset didn't recognise it. Fixed.

git-svn-id: svn://svn.tartarus.org/sgt/charset@4107 cda61777-01e9-0310-a592-d414129be87e

simon [Sun, 18 Apr 2004 07:46:37 +0000 (07:46 +0000)]
Fix an oddity in PDFDocEncoding.

git-svn-id: svn://svn.tartarus.org/sgt/charset@4093 cda61777-01e9-0310-a592-d414129be87e

simon [Sat, 17 Apr 2004 18:20:58 +0000 (18:20 +0000)]
Now that I've renamed the `test' program to `convcs', fix .cvsignore.

git-svn-id: svn://svn.tartarus.org/sgt/charset@4091 cda61777-01e9-0310-a592-d414129be87e

simon [Sat, 17 Apr 2004 18:12:58 +0000 (18:12 +0000)]
`gcc -Wall' points out some signed/unsigned comparisons. Fixed.

git-svn-id: svn://svn.tartarus.org/sgt/charset@4090 cda61777-01e9-0310-a592-d414129be87e

simon [Sat, 17 Apr 2004 18:09:27 +0000 (18:09 +0000)]
Now this is a top-level CVS module, it should have LICENCE and
MODULE files of its own.

git-svn-id: svn://svn.tartarus.org/sgt/charset@4089 cda61777-01e9-0310-a592-d414129be87e

simon [Sat, 17 Apr 2004 11:44:49 +0000 (11:44 +0000)]
Link libcharset into Halibut. (This involved faffing with
CVSROOT/modules, so anyone with a checked-out copy of Halibut will
unfortunately need to do `cvs co' again.)

git-svn-id: svn://svn.tartarus.org/sgt/charset@4088 cda61777-01e9-0310-a592-d414129be87e

simon [Sat, 17 Apr 2004 08:12:07 +0000 (08:12 +0000)]
In preparation for using libcharset in Halibut, I've added
PDFDocEncoding to the SBCS list; this is a custom superset of
ISO-8859-1 used in PDF files to store user-visible text that isn't
printed on a page (such as metadata and the document outline).

git-svn-id: svn://svn.tartarus.org/sgt/charset@4087 cda61777-01e9-0310-a592-d414129be87e

simon [Sat, 17 Apr 2004 08:04:45 +0000 (08:04 +0000)]
Include libcharset into both the Timber and Halibut checkouts.
Unfortunately this means people will have to do `cvs co' again to
get this update, but that appears to be the price I pay for being
able to conveniently share a single source base in this way.

git-svn-id: svn://svn.tartarus.org/sgt/charset@4086 cda61777-01e9-0310-a592-d414129be87e