X-Git-Url: https://git.distorted.org.uk/~mdw/sgt/halibut/blobdiff_plain/8856f1501a7f27f4ca4f8033136794e1fdfa1702..ed5c261fdcdc64f131052308649bc97dddb2c660:/doc/input.but diff --git a/doc/input.but b/doc/input.but index ca66df5..1c20acc 100644 --- a/doc/input.but +++ b/doc/input.but @@ -52,6 +52,10 @@ and Halibut would generate the text This \\ is a backslash, and these are \{braces\}. } +If you want to write your input file in a character set other than +ASCII, you can do so by using the \c{\\cfg\{input-charset\}} +command. See \k{input-config} for details of this. + \H{input-inline} Simple \i{inline formatting commands} Halibut formatting commands all begin with a backslash, followed by @@ -79,14 +83,14 @@ be treated identically: \S{input-emph} \c{\\e}: Emphasising text -Possibly the most obvious piece of formatting you might want to use -in a document is \i\e{emphasis}. To emphasise text, you use the -\i\c{\\e} command, and follow it up with the text to be emphasised -in braces. For example, the first sentence in this paragraph was -generated using the Halibut input +Possibly the most obvious piece of formatting you might want +to use in a document is \i\e{emphasis}. +To emphasise text, you use the \i\c{\\e} command, and follow it up +with the text to be emphasised in braces. For example, the first +sentence in this paragraph was generated using the Halibut input -\c Possibly the most obvious piece of formatting you might want to use -\c in a document is \e{emphasis}. +\c Possibly the most obvious piece of formatting you might want +\c to use in a document is \e{emphasis}. \S{input-code} \c{\\c} and \c{\\cw}: Displaying \i{computer code} inline @@ -126,10 +130,10 @@ knows where the literal computer text starts and stops; and in plain text, this cannot be done by changing font, so there needs to be an alternative way. 
-So in the plain text back end, things marked as code (\c{\\c}) will -be surrounded by quote marks, so that it's obvious where they start -and finish. Things marked as weak code (\c{\\cw}) will not look any -different from normal text. +So in the plain text output format, things marked as code (\c{\\c}) +will be surrounded by quote marks, so that it's obvious where they +start and finish. Things marked as weak code (\c{\\cw}) will not +look any different from normal text. I recommend using weak code for any application where it is \e{obvious} that the text is literal computer input or output. For @@ -152,6 +156,18 @@ changing the font if possible, or by using quotes if not. \b \c{\\cw} means \q{it would be nice to display this text in a fixed-width font if possible, but it's not essential}. +In really extreme cases, you might want Halibut to use \i{quotation +marks} even in output formats which can change font. In +\k{input-date}, for example, I mention the special formatting +command \q{\cw{\\.}}. If that appeared at the end of a sentence +\e{without} the quotes, then the two adjacent full stops would look +pretty strange even if they were obviously in different fonts. So I +used the \c{\\q} command to provide my own set of quotes, and then +used \c{\\cw} rather than \c{\\c} to ensure that none of Halibut's +output formats would add another set of quotes: + +\c the special formatting command \q{\cw{\\.}}. + There is a separate mechanism for displaying computer code in an entire paragraph; see \k{input-codepara} for that one. @@ -178,17 +194,20 @@ Here is some \q{text in quotes}. and in every output format Halibut generates, it will choose the best quote characters available to it in that format. -You can still use ordinary ASCII \i{double quotes} if you prefer; or -you could even use the \c{\\u} command (see \k{input-unicode}) to -generate \i{Unicode matched quotes} (single or double) and fall back -to the normal ASCII one if they aren't available. 
But I recommend -using the built-in \c{\\q} command in most cases, because it's -simple and does the best it can everywhere. - -(Note that if you're using the \c{\\c} or \c{\\cw} commands to -display literal computer code, you probably \e{will} want to use -literal \i{ASCII quote characters}, because it is likely to matter -precisely which quote character you use.) +You can still use the ordinary quote characters of your choice if +you prefer; or you could even use the \c{\\u} command (see +\k{input-unicode}) to generate \i{Unicode matched quotes} (single or +double) in a way which will automatically fall back to the normal +ASCII one if they aren't available. But I recommend using the +built-in \c{\\q} command in most cases, because it's simple and does +the best it can everywhere. + +If you're using the \c{\\c} or \c{\\cw} commands to display literal +computer code, you will probably want to use literal \i{ASCII quote +characters}, because it is likely to matter precisely which quote +character you use. In fact, Halibut actually \e{disallows} the use +of \c{\\q} within either of \c{\\c} and \c{\\cw}, since this +simplifies some of the output formats. \S{input-nonbreaking} \c{\\-} and \c{\\_}: \ii{Non-breaking hyphens} and \I{non-breaking spaces}spaces @@ -230,6 +249,16 @@ and Halibut generates something like This document was generated on \date. } +You can follow the \c{\\date} command directly with punctuation (as +in this example, where it is immediately followed by a full stop), +but if you try to follow it with an alphabetic or numeric character +(such as writing \c{\\dateZ}) then Halibut will assume you are +trying to invoke the name of a macro command you have defined +yourself, and will complain if no such command exists. To get round +this you can use the special \q{\cw{\\.}} do-nothing command. See +\k{input-macro} for more about general Halibut command syntax and +\q{\cw{\\.}}. 
+ If you would prefer the date to be generated in a specific format, you can follow the \c{\\date} command with a format specification in braces. The format specification will be run through the standard C @@ -286,34 +315,34 @@ The \c{\\W} command supports a piece of extra syntax to make this convenient for you. You can specify \c{\\c} or \c{\\cw} \e{between} the first and second pairs of braces. For example, you might write -\c Google is located at \W{http://www.google.com/}\cw{www.google.com}. +\c Google is at \W{http://www.google.com/}\cw{www.google.com}. and Halibut would produce \quote{ -Google is located at \W{http://www.google.com/}\cw{www.google.com}. +Google is at \W{http://www.google.com/}\cw{www.google.com}. } +If you want the link text to be an index term as well, you can also +specify \c{\\i} or \c{\\ii}; this has to come before \c{\\c} or +\c{\\cw} if both are present. (See \k{input-index} for more about +indexing.) + \S{input-unicode} \c{\\u}: Specifying arbitrary \i{Unicode} characters -When Halibut is finished, it should have full Unicode support. You -should be able to specify any (reasonably well known) \i{character -set} for your input document, and Halibut should convert it all to -Unicode as it reads it in. Similarly, you should be able to specify -the character set you want for each output format and have all the -conversion done automatically. - -Currently, none of this is actually supported. Input text files are -assumed to be in \i{ISO 8859-1}, and each output format has its own -non-configurable character set (although the HTML output can use the -\c{&#1234;} mechanism to output any Unicode character it likes). +Halibut has extensive support for Unicode and character set +conversion. You can specify any (reasonably well known) \i{character +set} for your input document, and Halibut will convert it all to +Unicode as it reads it in. See \k{input-config} for more details of +this. 
If you need to specify a Unicode character in your input document -which is not supported by the input character set, you can use the -\i\c{\\u} command to do this. \c{\\u} expects to be followed by a -sequence of hex digits; so that \c{\\u0041}, for example, denotes -the Unicode character \cw{0x0041}, which is the capital letter A. +which is not supported by the input character set you have chosen, +you can use the \i\c{\\u} command to do this. \c{\\u} expects to be +followed by a sequence of hex digits; so that \c{\\u0041}, for +example, denotes the Unicode character \cw{0x0041}, which is the +capital letter A. If a Unicode character specified in this way is not supported in a particular \e{output} format, you probably don't just want it to be @@ -385,15 +414,25 @@ braces, that text will be ignored by Halibut. For example, you might write -\c The typical behaviour of an antelope \#{do I mean gazelle?} is... +\c The typical behaviour of an antelope \#{do I mean +\c gazelle?} is... and Halibut will simply leave out the aside about gazelles, and will generate nothing but \quote{ -The typical behaviour of an antelope \#{do I mean gazelle?} is... +The typical behaviour of an antelope \#{do I mean +gazelle?} is... } +This command will respect nested braces, so you can use it to +comment out sections of Halibut markup: + +\c This function is \#{very, \e{very}} important. + +In this example, the comment lasts until the final closing brace (so +that the whole \q{very, \e{very}} section is commented out). + The \c{\\#} command can also be used to produce a whole-paragraph comment; see \k{input-commentpara} for details of that. @@ -441,7 +480,7 @@ Note that the above paragraph makes use of a backslash and a pair of braces, and does \e{not} need to escape them in the way described in \k{input-basics}. 
This is because code paragraphs formatted in this way are a special case; the intention is that you can just copy and -paste a lump of code out of another program, put \q{\cw{\\c }} at the +paste a lump of code out of your program, put \q{\cw{\\c }} at the start of every line, and simply \e{not have to worry} about the details - you don't have to go through the whole block looking for characters to escape. @@ -560,10 +599,12 @@ Here's a list: The disadvantage of having Halibut sort out the list numbering for you is that if you need to refer to a list item by its number, you -can't reliably do so. To get round this, Halibut allows an optional -keyword in braces after the \c{\\n} command. This keyword can then -be referenced using the \c{\\k} or \c{\\K} command (see -\k{input-xref}) to provide the number of the list item. For example: +can't reliably know the number in advance (because if you later add +another item at the start of the list, the numbers will all change). +To get round this, Halibut allows an optional keyword in braces +after the \c{\\n} command. This keyword can then be referenced using +the \c{\\k} or \c{\\K} command (see \k{input-xref}) to provide the +number of the list item. For example: \c Here's a list: \c @@ -589,6 +630,12 @@ Here's a list: \n Now go back to step \k{this-one}. } +The keyword you supply after \c{\\n} is allowed to contain escaped +special characters (\c{\\\\}, \c{\\\{} and \c{\\\}}), but should not +contain any other Halibut markup. It is intended to be a word or two +of ordinary text. (This also applies to keywords used in other +commands, such as \c{\\B} and \c{\\C}). 
+ \S2{input-list-description} \i\c{\\dt} and \i\c{\\dd}: \ii{Description lists} @@ -698,7 +745,7 @@ Here's a list: } -This syntax seems a little bit inconvenient, and perhaps +This syntax might seem a little bit inconvenient, and perhaps counter-intuitive: you might expect the enclosing braces to have to go around the \e{whole} list item, rather than everything except the first paragraph. @@ -758,8 +805,8 @@ your quoted section, and \c{\}} at the end, and the paragraphs in between will be formatted to indicate that they are a quotation. (This very manual, in fact, uses this feature a lot: all of the -examples of Halibut source followed by Halibut output have the -output quoted using \c{\\quote}.) +examples of Halibut input followed by Halibut output have the output +quoted using \c{\\quote}.) Here's some example Halibut input: @@ -770,8 +817,8 @@ Here's some example Halibut input: \c \q{The question is,} said Alice, \q{whether you \e{can} make \c words mean so many different things.} \c -\c \q{The question is,} said Humpty Dumpty, \q{who is to be master - -\c that's all.} +\c \q{The question is,} said Humpty Dumpty, \q{who is to be +\c master - that's all.} \c \c } \c @@ -788,8 +835,8 @@ In \q{Through the Looking Glass}, Lewis Carroll wrote: \q{The question is,} said Alice, \q{whether you \e{can} make words mean so many different things.} -\q{The question is,} said Humpty Dumpty, \q{who is to be master - -that's all.} +\q{The question is,} said Humpty Dumpty, \q{who is to be +master - that's all.} } @@ -821,6 +868,12 @@ written as and this allows me to use the command \c{\\k\{input\}} to generate a cross-reference to that chapter somewhere else. +The \I{keyword syntax}keyword you supply after one of these commands +is allowed to contain escaped special characters (\c{\\\\}, \c{\\\{} +and \c{\\\}}), but should not contain any other Halibut markup. It +is intended to be a word or two of ordinary text. 
(This also applies +to keywords used in other commands, such as \c{\\B} and \c{\\n}). + The next level down from \c{\\C} is \c{\\H}, for \q{heading}. This is used in exactly the same way as \c{\\C}, but section headings defined with \c{\\H} are considered to be part of a containing @@ -911,11 +964,11 @@ special paragraph type to point it out. \dd This command indicates that the paragraph attached to it contains a \i{copyright statement} for the document. This text is -usually displayed inline, just before the first chapter title but -after any preamble text before that; but in some output formats it -is given additional special treatment. For example, Windows Help -files have a standard slot in which to store a copyright notice, so -that other software can display it prominently. +displayed inline where it appears, exactly like a normal paragraph; +but in some output formats it is given additional special treatment. +For example, Windows Help files have a standard slot in which to +store a copyright notice, so that other software can display it +prominently. \dt \i\cw{\\versionid} @@ -940,9 +993,9 @@ a comment, like this: \c Here's a (fairly short) paragraph which will be displayed. \c \c \# Here's a comment paragraph which will not be displayed, no -\c matter how long it goes on. All I needed to indicate this was the -\c single \# at the start of the paragraph; I don't need one on -\c every line or anything like that. +\c matter how long it goes on. All I needed to indicate this was +\c the single \# at the start of the paragraph; I don't need one +\c on every line or anything like that. \c \c Here's another displayed paragraph. @@ -953,9 +1006,9 @@ When run through Halibut, this produces the following output: Here's a (fairly short) paragraph which will be displayed. \# Here's a comment paragraph which will not be displayed, no -matter how long it goes on. 
All I needed to indicate this was the -single \# at the start of the paragraph; I don't need one on -every line or anything like that. +matter how long it goes on. All I needed to indicate this was +the single \# at the start of the paragraph; I don't need one +on every line or anything like that. Here's another displayed paragraph. @@ -1013,6 +1066,12 @@ format for a particular book: \c \BR{freds-book} [Fred1993] +The keyword you supply after \c{\\B} is allowed to contain escaped +special characters (\c{\\\\}, \c{\\\{} and \c{\\\}}), but should not +contain any other Halibut markup. It is intended to be a word or two +of ordinary text. (This also applies to keywords used in other +commands, such as \c{\\n} and \c{\\C}). + \H{input-index} Creating an \i{index} Halibut contains a comprehensive indexing mechanism, which attempts @@ -1216,6 +1275,31 @@ subsections of a chapter. \dd Exactly like \c{chapter}, but changes the name given to appendices. +\dt \I\cw{\\cfg\{input-charset\}}\cw{\\cfg\{input-charset\}\{}\e{character set name}\cw{\}} + +\dd This tells Halibut what \i{character set} you are writing your +input file in. By default, it is assumed to be US-ASCII (meaning +\e{only} plain \i{ASCII}, with no accented characters at all). + +\lcont{ + +You can specify any well-known name for any supported character set. +For example, \c{iso-8859-1}, \c{iso8859-1} and \c{iso_8859-1} are +all recognised, \c{GB2312} and \c{EUC-CN} both work, and so on. + +This directive takes effect immediately after the \c{\\cfg} command. +All text after that in the file is expected to be in the new +character set. You can even change character set several times +within a file if you really want to. + +When Halibut reads the input file, everything you type will be +converted into \i{Unicode} from the character set you specify here, +will be processed as Unicode by Halibut internally, and will be +written to the various output formats in whatever character sets +they deem appropriate. 
+ +} + In addition to these configuration commands, there are also configuration commands provided by each individual output format. These configuration commands are discussed along with each output @@ -1226,6 +1310,7 @@ The \i{default settings} for the above options are: \c \cfg{chapter}{Chapter} \c \cfg{section}{Section} \c \cfg{appendix}{Appendix} +\c \cfg{input-charset}{ASCII} \H{input-macro} Defining \i{macros} @@ -1245,14 +1330,23 @@ macro, using the \i\c{\\define} command: \c \define{eur} \u20AC{EUR\_} +Your macro names may include Roman alphabetic characters +(\c{a}-\c{z}, \c{A}-\c{Z}) and ordinary Arabic numerals +(\c{0}-\c{9}), but nothing else. (This is general \I{command +syntax}syntax for all of Halibut's commands, except for a few +special ones such as \c{\\_} and \c{\\-} which consist of a single +punctuation character only.) + Then you can just write ... \c This is likely to cost \eur 2500 at least. ... except that that's not terribly good, because you end up with a -space between the Euro sign and the number. In this case, it's -helpful to use the special \i\c{\\.} command, which is defined to -\I{NOP}\I{doing nothing}do nothing at all! But it acts as a +space between the Euro sign and the number. (If you had written +\c{\\eur2500}, Halibut would have tried to interpret it as a macro +command called \c{eur2500}, which you didn't define.) In this case, +it's helpful to use the special \i\c{\\.} command, which is defined +to \I{NOP}\I{doing nothing}do nothing at all! But it acts as a separator between your macro and the next character: \c This is likely to cost \eur\.2500 at least.