X-Git-Url: https://git.distorted.org.uk/~mdw/sgt/halibut/blobdiff_plain/b80802bae4dfeae315236b136d6181198e84d4d7..1d1da251ac6656094a13788ff273d9922d58f927:/doc/input.but diff --git a/doc/input.but b/doc/input.but index daf8092..8cf4b8a 100644 --- a/doc/input.but +++ b/doc/input.but @@ -1,3 +1,5 @@ +\versionid $Id$ + \C{input} Halibut input format This chapter describes the format in which you should write @@ -52,6 +54,10 @@ and Halibut would generate the text This \\ is a backslash, and these are \{braces\}. } +If you want to write your input file in a character set other than +ASCII, you can do so by using the \c{\\cfg\{input-charset\}} +command. See \k{input-config} for details of this. + \H{input-inline} Simple \i{inline formatting commands} Halibut formatting commands all begin with a backslash, followed by @@ -136,12 +142,12 @@ I recommend using weak code for any application where it is example, if the text is capitalised, that's usually good enough. If I talk about the Pentium's \cw{EAX} and \cw{EDX} registers, for example, you don't need quotes to notice that those are special; so -I would write that in Halibut as \q{\c{the Pentium's \\cw\{EAX\} and -\\cw\{EDX\} registers}}. But if I'm talking about the Unix command +I would write that in Halibut as \cq{the Pentium's \\cw\{EAX\} and +\\cw\{EDX\} registers}. But if I'm talking about the Unix command \c{man}, which is an ordinary English word in its own right, a reader might be slightly confused if it appeared in the middle of a -sentence undecorated; so I would write that as \q{\c{the Unix command -\\c\{man\}}}. +sentence undecorated; so I would write that as \cq{the Unix command +\\c\{man\}}. In summary: @@ -155,12 +161,18 @@ fixed-width font if possible, but it's not essential}. In really extreme cases, you might want Halibut to use \i{quotation marks} even in output formats which can change font. In \k{input-date}, for example, I mention the special formatting -command \q{\cw{\\.}}. 
If that appeared at the end of a sentence +command \cq{\\.}. If that appeared at the end of a sentence \e{without} the quotes, then the two adjacent full stops would look -pretty strange even if they were obviously in different fonts. So I -used the \c{\\q} command to provide my own set of quotes, and then -used \c{\\cw} rather than \c{\\c} to ensure that none of Halibut's -output formats would add another set of quotes: +pretty strange even if they were obviously in different fonts. + +For this, Halibut supports the \i\c{\\cq} command, which is exactly +equivalent to using \c{\\q} to provide quotes and then using +\c{\\cw} inside the quotes. So in the paragraph above, for example, +I wrote + +\c the special formatting command \cq{\\.}. + +and I could equivalently have written \c the special formatting command \q{\cw{\\.}}. @@ -188,19 +200,23 @@ Here is some \q{text in quotes}. } and in every output format Halibut generates, it will choose the -best quote characters available to it in that format. - -You can still use ordinary ASCII \i{double quotes} if you prefer; or -you could even use the \c{\\u} command (see \k{input-unicode}) to -generate \i{Unicode matched quotes} (single or double) and fall back -to the normal ASCII one if they aren't available. But I recommend -using the built-in \c{\\q} command in most cases, because it's -simple and does the best it can everywhere. - -(Note that if you're using the \c{\\c} or \c{\\cw} commands to -display literal computer code, you probably \e{will} want to use -literal \i{ASCII quote characters}, because it is likely to matter -precisely which quote character you use.) +best quote characters available to it in that format. (The quote +characters to use can be configured with the \c{\\cfg} command.) 
+ +You can still use the ordinary quote characters of your choice if +you prefer; or you could even use the \c{\\u} command (see +\k{input-unicode}) to generate \i{Unicode matched quotes} (single or +double) in a way which will automatically fall back to the normal +ASCII one if they aren't available. But I recommend using the +built-in \c{\\q} command in most cases, because it's simple and does +the best it can everywhere. + +If you're using the \c{\\c} or \c{\\cw} commands to display literal +computer code, you will probably want to use literal \i{ASCII quote +characters}, because it is likely to matter precisely which quote +character you use. In fact, Halibut actually \e{disallows} the use +of \c{\\q} within either of \c{\\c} and \c{\\cw}, since this +simplifies some of the output formats. \S{input-nonbreaking} \c{\\-} and \c{\\_}: \ii{Non-breaking hyphens} and \I{non-breaking spaces}spaces @@ -248,9 +264,9 @@ but if you try to follow it with an alphabetic or numeric character (such as writing \c{\\dateZ}) then Halibut will assume you are trying to invoke the name of a macro command you have defined yourself, and will complain if no such command exists. To get round -this you can use the special \q{\cw{\\.}} do-nothing command. See +this you can use the special \cq{\\.} do-nothing command. See \k{input-macro} for more about general Halibut command syntax and -\q{\cw{\\.}}. +\cq{\\.}. If you would prefer the date to be generated in a specific format, you can follow the \c{\\date} command with a format specification in @@ -308,12 +324,12 @@ The \c{\\W} command supports a piece of extra syntax to make this convenient for you. You can specify \c{\\c} or \c{\\cw} \e{between} the first and second pairs of braces. For example, you might write -\c Google is located at \W{http://www.google.com/}\cw{www.google.com}. +\c Google is at \W{http://www.google.com/}\cw{www.google.com}. 
and Halibut would produce \quote{ -Google is located at \W{http://www.google.com/}\cw{www.google.com}. +Google is at \W{http://www.google.com/}\cw{www.google.com}. } If you want the link text to be an index term as well, you can also @@ -324,23 +340,18 @@ indexing.) \S{input-unicode} \c{\\u}: Specifying arbitrary \i{Unicode} characters -When Halibut is finished, it should have full Unicode support. You -should be able to specify any (reasonably well known) \i{character -set} for your input document, and Halibut should convert it all to -Unicode as it reads it in. Similarly, you should be able to specify -the character set you want for each output format and have all the -conversion done automatically. - -Currently, none of this is actually supported. Input text files are -assumed to be in \i{ISO 8859-1}, and each output format has its own -non-configurable character set (although the HTML output can use the -\c{&#1234;} mechanism to output any Unicode character it likes). +Halibut has extensive support for Unicode and character set +conversion. You can specify any (reasonably well known) \i{character +set} for your input document, and Halibut will convert it all to +Unicode as it reads it in. See \k{input-config} for more details of +this. If you need to specify a Unicode character in your input document -which is not supported by the input character set, you can use the -\i\c{\\u} command to do this. \c{\\u} expects to be followed by a -sequence of hex digits; so that \c{\\u0041}, for example, denotes -the Unicode character \cw{0x0041}, which is the capital letter A. +which is not supported by the input character set you have chosen, +you can use the \i\c{\\u} command to do this. \c{\\u} expects to be +followed by a sequence of hex digits; so that \c{\\u0041}, for +example, denotes the Unicode character \cw{0x0041}, which is the +capital letter A. 
If a Unicode character specified in this way is not supported in a particular \e{output} format, you probably don't just want it to be @@ -360,8 +371,8 @@ This is likely to cost \u20AC{EUR\_}2500 at least. If you read it in other formats, you may see different results. -\S{input-xref} \i\c{\\k} and \i\c{\\K}: \ii{Cross-references} to -other sections +\S{input-xref} \i\c{\\k} and \I{\\K-upper}\c{\\K}: +\ii{Cross-references} to other sections \K{intro-features} mentions that Halibut \I{section numbers}numbers the sections of your document automatically, and can generate @@ -478,7 +489,7 @@ Note that the above paragraph makes use of a backslash and a pair of braces, and does \e{not} need to escape them in the way described in \k{input-basics}. This is because code paragraphs formatted in this way are a special case; the intention is that you can just copy and -paste a lump of code out of your program, put \q{\cw{\\c }} at the +paste a lump of code out of your program, put \cq{\\c } at the start of every line, and simply \e{not have to worry} about the details - you don't have to go through the whole block looking for characters to escape. @@ -663,6 +674,16 @@ This produces the following output: } +If you really want to, you are allowed to use \c{\\dt} and \c{\\dd} +without strictly interleaving them (multiple consecutive \c{\\dt}s +or consecutive \c{\\dd}s, or a description list starting with +\c{\\dd} or ending with \c{\\dt}). This is probably most useful if +you are listing a sequence of things with \c{\\dt}, but only some of +them actually need \c{\\dd} descriptions. You should \e{not} use +multiple consecutive \c{\\dd}s to provide a multi-paragraph +definition of something; that's what \c{\\lcont} is for, as +explained in \k{input-list-continuation}. + \S2{input-list-continuation} \ii{Continuing list items} into further paragraphs @@ -842,8 +863,8 @@ So now you know. 
} -\S{input-sections} \i\c{\\C}, \i\c{\\H}, \i\c{\\S}, \i\c{\\A}, -\i\c{\\U}: Chapter and \i{section headings} +\S{input-sections} \I{\\C-upper}\c{\\C}, \i\c{\\H}, \i\c{\\S}, +\i\c{\\A}, \I{\\U-upper}\c{\\U}: Chapter and \i{section headings} \K{intro-features} mentions that Halibut \I{section numbering}numbers the sections of your document automatically, and @@ -955,7 +976,7 @@ The three special paragraph types are: \dd This defines the overall title of the entire document. This title is treated specially in some output formats (for example, it's -used in a \cw{