X-Git-Url: https://git.distorted.org.uk/~mdw/sgt/halibut/blobdiff_plain/b80802bae4dfeae315236b136d6181198e84d4d7..c770011aba58fb66b9a654c5089476315998ab27:/doc/input.but
diff --git a/doc/input.but b/doc/input.but
index daf8092..d99f5ce 100644
--- a/doc/input.but
+++ b/doc/input.but
@@ -52,6 +52,10 @@ and Halibut would generate the text
 This \\ is a backslash, and these are \{braces\}.
 }
 
+If you want to write your input file in a character set other than
+ASCII, you can do so by using the \c{\\cfg\{input-charset\}}
+command. See \k{input-config} for details of this.
+
 \H{input-inline} Simple \i{inline formatting commands}
 
 Halibut formatting commands all begin with a backslash, followed by
@@ -188,19 +192,23 @@ Here is some \q{text in quotes}.
 }
 
 and in every output format Halibut generates, it will choose the
-best quote characters available to it in that format.
-
-You can still use ordinary ASCII \i{double quotes} if you prefer; or
-you could even use the \c{\\u} command (see \k{input-unicode}) to
-generate \i{Unicode matched quotes} (single or double) and fall back
-to the normal ASCII one if they aren't available. But I recommend
-using the built-in \c{\\q} command in most cases, because it's
-simple and does the best it can everywhere.
-
-(Note that if you're using the \c{\\c} or \c{\\cw} commands to
-display literal computer code, you probably \e{will} want to use
-literal \i{ASCII quote characters}, because it is likely to matter
-precisely which quote character you use.)
+best quote characters available to it in that format. (The quote
+characters to use can be configured with the \c{\\cfg} command.)
+
+You can still use the ordinary quote characters of your choice if
+you prefer; or you could even use the \c{\\u} command (see
+\k{input-unicode}) to generate \i{Unicode matched quotes} (single or
+double) in a way which will automatically fall back to the normal
+ASCII one if they aren't available. But I recommend using the
+built-in \c{\\q} command in most cases, because it's simple and does
+the best it can everywhere.
+
+If you're using the \c{\\c} or \c{\\cw} commands to display literal
+computer code, you will probably want to use literal \i{ASCII quote
+characters}, because it is likely to matter precisely which quote
+character you use. In fact, Halibut actually \e{disallows} the use
+of \c{\\q} within either of \c{\\c} and \c{\\cw}, since this
+simplifies some of the output formats.
 
 \S{input-nonbreaking} \c{\\-} and \c{\\_}: \ii{Non-breaking hyphens} and \I{non-breaking spaces}spaces
 
@@ -308,12 +316,12 @@ The \c{\\W} command supports a piece of extra syntax to make this
 convenient for you. You can specify \c{\\c} or \c{\\cw} \e{between}
 the first and second pairs of braces. For example, you might write
 
-\c Google is located at \W{http://www.google.com/}\cw{www.google.com}.
+\c Google is at \W{http://www.google.com/}\cw{www.google.com}.
 
 and Halibut would produce
 
 \quote{
-Google is located at \W{http://www.google.com/}\cw{www.google.com}.
+Google is at \W{http://www.google.com/}\cw{www.google.com}.
 }
 
 If you want the link text to be an index term as well, you can also
@@ -324,23 +332,18 @@ indexing.)
 
 \S{input-unicode} \c{\\u}: Specifying arbitrary \i{Unicode} characters
 
-When Halibut is finished, it should have full Unicode support. You
-should be able to specify any (reasonably well known) \i{character
-set} for your input document, and Halibut should convert it all to
-Unicode as it reads it in. Similarly, you should be able to specify
-the character set you want for each output format and have all the
-conversion done automatically.
-
-Currently, none of this is actually supported. Input text files are
-assumed to be in \i{ISO 8859-1}, and each output format has its own
-non-configurable character set (although the HTML output can use the
-\c{&#1234;} mechanism to output any Unicode character it likes).
+Halibut has extensive support for Unicode and character set
+conversion. You can specify any (reasonably well known) \i{character
+set} for your input document, and Halibut will convert it all to
+Unicode as it reads it in. See \k{input-config} for more details of
+this.
 
 If you need to specify a Unicode character in your input document
-which is not supported by the input character set, you can use the
-\i\c{\\u} command to do this. \c{\\u} expects to be followed by a
-sequence of hex digits; so that \c{\\u0041}, for example, denotes
-the Unicode character \cw{0x0041}, which is the capital letter A.
+which is not supported by the input character set you have chosen,
+you can use the \i\c{\\u} command to do this. \c{\\u} expects to be
+followed by a sequence of hex digits; so that \c{\\u0041}, for
+example, denotes the Unicode character \cw{0x0041}, which is the
+capital letter A.
 
 If a Unicode character specified in this way is not supported in a
 particular \e{output} format, you probably don't just want it to be
@@ -360,8 +363,8 @@ This is likely to cost \u20AC{EUR\_}2500 at least.
 
 If you read it in other formats, you may see different results.
 
-\S{input-xref} \i\c{\\k} and \i\c{\\K}: \ii{Cross-references} to
-other sections
+\S{input-xref} \i\c{\\k} and \I{\\K-upper}\c{\\K}:
+\ii{Cross-references} to other sections
 
 \K{intro-features} mentions that Halibut \I{section numbers}numbers
 the sections of your document automatically, and can generate
@@ -842,8 +845,8 @@ So now you know.
 
 }
 
-\S{input-sections} \i\c{\\C}, \i\c{\\H}, \i\c{\\S}, \i\c{\\A},
-\i\c{\\U}: Chapter and \i{section headings}
+\S{input-sections} \I{\\C-upper}\c{\\C}, \i\c{\\H}, \i\c{\\S},
+\i\c{\\A}, \I{\\U-upper}\c{\\U}: Chapter and \i{section headings}
 
 \K{intro-features} mentions that Halibut \I{section numbering}numbers
 the sections of your document automatically, and
@@ -1018,10 +1021,10 @@ If you need your document to refer to other documents (research
 papers, books, websites, whatever), you might find a bibliography
 feature useful.
 
-You can define a bibliography entry using the \i\c{\\B} command. This
-looks very like the \c{\\C} command and friends: it expects a
-keyword in braces, followed by some text describing the document
-being referred to. For example:
+You can define a bibliography entry using the \I{\\B-upper}\c{\\B}
+command. This looks very like the \c{\\C} command and friends: it
+expects a keyword in braces, followed by some text describing the
+document being referred to. For example:
 
 \c \B{freds-book} \q{The Taming Of The Mongoose}, by Fred Bloggs.
 \c Published by Paperjam & Notoner, 1993.
@@ -1124,8 +1127,8 @@ appear emphasised, you must say so explicitly using \c{\\IM}; see
 Sometimes you might want to index a term which is not explicitly
 mentioned, but which is highly relevant to the text and you think
 that somebody looking up that term in the index might find it useful
-to be directed here. To do this you can use the \i\c{\\I} command,
-to create an \i{\e{invisible} index tag}:
+to be directed here. To do this you can use the \I{\\I-upper}\c{\\I}
+command, to create an \i{\e{invisible} index tag}:
 
 \c If your printer runs out of toner, \I{replacing toner
 \c cartridge}here is what to do:
@@ -1230,6 +1233,39 @@ default one (typically \c{\\IM\{foo\}\_foo}, although it might be
 Halibut discards its default implicit one, and you must then
 specify that one explicitly as well if you wanted to keep it.
 
+\S{input-index-case} Indexing terms that differ only in case
+
+The \e{tags} you use to define an index term (that is, the text in
+the braces after \c{\\i}, \c{\\I} and \c{\\IM}) are treated
+case-insensitively by Halibut. So if, as in this manual itself, you
+need two index terms that differ only in case, doing this will not
+work:
+
+\c The \i\c{\\c} command defines computer code.
+\c
+\c The \i\c{\\C} command defines a chapter.
+
+Halibut will treat these terms as the same, and will fold the two
+sets of references into one combined list (although it will warn you
+that it is doing this). The idea is to ensure that people who forget
+to use \c{\\ii} find out about it rather than Halibut silently
+generating a bad index; checking an index for errors is very hard
+work, so Halibut tries to avoid errors in the first place as much as
+it can.
+
+If you do come across this situation, you will need to define two
+distinguishable index terms. What I did in this manual was something
+like this:
+
+\c The \i\c{\\c} command defines computer code.
+\c
+\c The \I{\\C-upper}\c{\\C} command defines a chapter.
+\c
+\c \IM{\\C-upper} \c{\\C}
+
+The effect of this will be two separate index entries, one reading
+\c{\\c} and the other reading \c{\\C}, pointing to the right places.
+
 \H{input-config} \ii{Configuring} Halibut
 
 Halibut uses the \i\c{\\cfg} command to allow you to configure various
@@ -1273,6 +1309,72 @@ subsections of a chapter.
 \dd Exactly like \c{chapter}, but changes the name given to
 appendices.
 
+\dt \I\cw{\\cfg\{input-charset\}}\cw{\\cfg\{input-charset\}\{}\e{character set name}\cw{\}}
+
+\dd This tells Halibut what \i{character set} you are writing your
+input file in. By default, it is assumed to be US-ASCII (meaning
+\e{only} plain \i{ASCII}, with no accented characters at all).
+
+\lcont{
+
+You can specify any well-known name for any supported character set.
+For example, \c{iso-8859-1}, \c{iso8859-1} and \c{iso_8859-1} are
+all recognised, \c{GB2312} and \c{EUC-CN} both work, and so on.
+
+This directive takes effect immediately after the \c{\\cfg} command.
+All text after that in the file is expected to be in the new
+character set. You can even change character set several times
+within a file if you really want to.
+
+When Halibut reads the input file, everything you type will be
+converted into \i{Unicode} from the character set you specify here,
+will be processed as Unicode by Halibut internally, and will be
+written to the various output formats in whatever character sets
+they deem appropriate.
+
+}
+
+\dt \I\cw{\\cfg\{quotes\}}\cw{\\cfg\{quotes\}\{}\e{open-quote}\cw{\}\{}\e{close-quote}\cw{\}}[\cw{\{}\e{open-quote}\cw{\}\{}\e{close-quote}...\cw{\}}]
+
+\dd This specifies the quote characters which should be used. You
+should separately specify the open and close quote marks; each
+quote mark can be one character (\cw{\\cfg\{quotes\}\{`\}\{'\}}), or
+more than one (\cw{\\cfg\{quotes\}\{<<\}\{>>\}}).
+
+\lcont{
+
+\cw{\\cfg\{quotes\}} can be overridden by configuration directives for
+each individual backend (see \k{output}); it is a convenient way of
+setting quote characters for all backends at once.
+
+All backends use these characters in response to the \c{\\q} command
+(see \k{input-quotes}). Some (such as the text backend) use them for
+other purposes too.
+
+You can specify multiple fallback options in this command (a pair of
+open and close quotes, each in their own braces, then another pair,
+then another if you like), and Halibut will choose the first pair
+which the output character set supports (Halibut will always use a
+matching pair). (This is to allow you to configure quote characters
+once, generate output in several different character sets, and have
+Halibut constantly adapt to make the best use of the current
+encoding.) For example, you might write
+
+\c \cfg{quotes}{\u201c}{\u201d}{"}{"}
+
+and Halibut would use the Unicode matched double quote characters if
+possible, and fall back to ASCII double quotes otherwise. If the
+output character set were to contain U+201C but not U+201D, then
+Halibut would fall back to using the ASCII double quote character as
+\e{both} open and close quotes. (No known character set is that
+silly; I mention it only as an example.)
+
+\cw{\\cfg\{quotes\}} (and the backend-specific versions) apply to the
+\e{entire} output; it's not possible to change quote characters
+partway through the output.
+
+}
+
 In addition to these configuration commands, there are also
 configuration commands provided by each individual output format.
 These configuration commands are discussed along with each output
@@ -1283,6 +1385,10 @@ The \i{default settings} for the above options are:
 \c \cfg{chapter}{Chapter}
 \c \cfg{section}{Section}
 \c \cfg{appendix}{Appendix}
+\c \cfg{input-charset}{ASCII}
+
+(The default settings for \cw{\\cfg\{quotes\}} are backend-specific;
+see \k{output}.)
 
 \H{input-macro} Defining \i{macros}
 
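The patch documents a selection rule for \cfg{quotes}: take the first open/close pair that the output character set can represent, and only ever use a matching pair. A minimal Python sketch of that rule follows. It is an illustrative model under stated assumptions, not Halibut's own C code. The helper names pairs_from_args, charset_supports and choose_quotes are hypothetical. "Supported" is assumed to mean that every character of a quote mark can be encoded in the target charset, checked with Python's str.encode. Returning None when no pair fits is also an assumption, because the documentation does not say what happens in that case.

```python
# Sketch of the \cfg{quotes} fallback rule, NOT Halibut's implementation.
# Assumption: a quote mark is "supported" if all of its characters can be
# encoded in the output charset (modelled with Python's str.encode).
from typing import Callable, List, Optional, Sequence, Tuple

QuotePair = Tuple[str, str]
Supports = Callable[[str], bool]


def pairs_from_args(args: Sequence[str]) -> List[QuotePair]:
    """Group {open}{close}{open}{close}... arguments into (open, close) pairs."""
    if len(args) < 2 or len(args) % 2 != 0:
        raise ValueError("expected one or more open/close quote pairs")
    return [(args[i], args[i + 1]) for i in range(0, len(args), 2)]


def charset_supports(charset: str) -> Supports:
    """Build a predicate: can every character of text be encoded in charset?"""
    def supports(text: str) -> bool:
        try:
            text.encode(charset)
        except UnicodeEncodeError:
            return False
        return True
    return supports


def choose_quotes(pairs: Sequence[QuotePair], supports: Supports) -> Optional[QuotePair]:
    """Return the first pair whose open AND close marks are both supported.

    Testing both halves before accepting a pair guarantees a matching pair.
    Returning None when nothing fits is an assumption of this sketch.
    """
    for open_q, close_q in pairs:
        if supports(open_q) and supports(close_q):
            return open_q, close_q
    return None


if __name__ == "__main__":
    # Models \cfg{quotes}{\u201c}{\u201d}{"}{"}
    pairs = pairs_from_args(["\u201c", "\u201d", '"', '"'])
    for cs in ("utf-8", "cp1252", "iso-8859-1", "ascii"):
        # ascii() keeps the output printable on any terminal encoding.
        print(cs, ascii(choose_quotes(pairs, charset_supports(cs))))
    # A made-up charset with U+201C but not U+201D: falls back to ASCII.
    print("silly", ascii(choose_quotes(pairs, lambda s: "\u201d" not in s)))
```

Because the sketch tests both halves of a pair before using either, a charset that contains U+201C but not U+201D falls through to the ASCII pair, as the patch text describes. No real codec behaves that way, so the demo simulates one with a lambda. Multi-character marks such as << and >> are handled naturally, because the whole string must encode.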