Don't forget to mention the default setting for \cfg{input-charset}.

diff --git a/doc/input.but b/doc/input.but
index e3849e3..b29cb88 100644
--- a/doc/input.but
+++ b/doc/input.but
@@ -52,6 +52,10 @@ and Halibut would generate the text
  This \\ is a backslash, and these are \{braces\}.
  }
  
+If you want to write your input file in a character set other than
+ASCII, you can do so by using the \c{\\cfg\{input-charset\}}
+command. See \k{input-config} for details of this.
+
  \H{input-inline} Simple \i{inline formatting commands}
  
  Halibut formatting commands all begin with a backslash, followed by
@@ -79,14 +83,14 @@ be treated identically:
  
  \S{input-emph} \c{\\e}: Emphasising text
  
-Possibly the most obvious piece of formatting you might want to use
-in a document is \i\e{emphasis}. To emphasise text, you use the
-\i\c{\\e} command, and follow it up with the text to be emphasised
-in braces. For example, the first sentence in this paragraph was
-generated using the Halibut input
+Possibly the most obvious piece of formatting you might want
+to use in a document is \i\e{emphasis}.
+To emphasise text, you use the \i\c{\\e} command, and follow it up
+with the text to be emphasised in braces. For example, the first
+sentence in this paragraph was generated using the Halibut input
  
-\c Possibly the most obvious piece of formatting you might want to use
-\c in a document is \e{emphasis}.
+\c Possibly the most obvious piece of formatting you might want
+\c to use in a document is \e{emphasis}.
  
  \S{input-code} \c{\\c} and \c{\\cw}: Displaying \i{computer code} inline
  
@@ -126,10 +130,10 @@ knows where the literal computer text starts and stops; and in plain
  text, this cannot be done by changing font, so there needs to be an
  alternative way.
  
-So in the plain text back end, things marked as code (\c{\\c}) will
-be surrounded by quote marks, so that it's obvious where they start
-and finish. Things marked as weak code (\c{\\cw}) will not look any
-different from normal text.
+So in the plain text output format, things marked as code (\c{\\c})
+will be surrounded by quote marks, so that it's obvious where they
+start and finish. Things marked as weak code (\c{\\cw}) will not
+look any different from normal text.
  
  I recommend using weak code for any application where it is
  \e{obvious} that the text is literal computer input or output. For
@@ -152,6 +156,18 @@ changing the font if possible, or by using quotes if not.
  \b \c{\\cw} means \q{it would be nice to display this text in a
  fixed-width font if possible, but it's not essential}.
  
+In really extreme cases, you might want Halibut to use \i{quotation
+marks} even in output formats which can change font. In
+\k{input-date}, for example, I mention the special formatting
+command \q{\cw{\\.}}. If that appeared at the end of a sentence
+\e{without} the quotes, then the two adjacent full stops would look
+pretty strange even if they were obviously in different fonts. So I
+used the \c{\\q} command to provide my own set of quotes, and then
+used \c{\\cw} rather than \c{\\c} to ensure that none of Halibut's
+output formats would add another set of quotes:
+
+\c the special formatting command \q{\cw{\\.}}.
+
  There is a separate mechanism for displaying computer code in an
  entire paragraph; see \k{input-codepara} for that one.
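
The following Python sketch illustrates the plain-text rule above. It models how an output format that cannot change font might render \c, \cw and \q. It is a toy model, not Halibut's code. The (kind, payload) node representation is an assumption, and so is the use of a straight double quote as the quote mark; Halibut's own plain-text quote characters may differ. The sketch shows why \q{\cw{\\.}} produces exactly one pair of quotes, while \q{\c{\\.}} would produce two.

```python
# Toy model (not Halibut's implementation): inline content is a list of
# (kind, payload) tuples, where kind is "text", "c", "cw" or "q".
# "q" carries a nested list; the other kinds carry a plain string.
# The quote mark used here is an assumption for illustration.
QUOTE = '"'


def render_plain(nodes: list) -> str:
    """Render inline nodes the way a font-less (plain text) format might."""
    out = []
    for kind, payload in nodes:
        if kind == "text":
            out.append(payload)
        elif kind == "c":
            # Code: no font change is possible, so delimit it with quotes.
            out.append(QUOTE + payload + QUOTE)
        elif kind == "cw":
            # Weak code: rendered exactly like ordinary text.
            out.append(payload)
        elif kind == "q":
            # \q{...}: wrap the (recursively rendered) contents in quotes.
            out.append(QUOTE + render_plain(payload) + QUOTE)
        else:
            raise ValueError(f"unknown node kind {kind!r}")
    return "".join(out)


# \q{\cw{\\.}} followed by a full stop: one pair of quotes.
print(render_plain([("q", [("cw", "\\.")]), ("text", ".")]))   # "\.".
# \q{\c{\\.}} would add a second pair.
print(render_plain([("q", [("c", "\\.")]), ("text", ".")]))    # ""\."".
```

A format that can change font would render \c and \cw identically (in a fixed-width font). So the extra pair of quotes only becomes a problem in formats like plain text.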
  
@@ -178,12 +194,13 @@ Here is some \q{text in quotes}.
  and in every output format Halibut generates, it will choose the
  best quote characters available to it in that format.
  
-You can still use ordinary ASCII \i{double quotes} if you prefer; or
-you could even use the \c{\\u} command (see \k{input-unicode}) to
-generate \i{Unicode matched quotes} (single or double) and fall back
-to the normal ASCII one if they aren't available. But I recommend
-using the built-in \c{\\q} command in most cases, because it's
-simple and does the best it can everywhere.
+You can still use the ordinary quote characters of your choice if
+you prefer; or you could even use the \c{\\u} command (see
+\k{input-unicode}) to generate \i{Unicode matched quotes} (single or
+double) in a way which will automatically fall back to the normal
+ASCII one if they aren't available. But I recommend using the
+built-in \c{\\q} command in most cases, because it's simple and does
+the best it can everywhere.
  
  (Note that if you're using the \c{\\c} or \c{\\cw} commands to
  display literal computer code, you probably \e{will} want to use
@@ -230,6 +247,16 @@ and Halibut generates something like
  This document was generated on \date.
  }
  
+You can follow the \c{\\date} command directly with punctuation (as
+in this example, where it is immediately followed by a full stop),
+but if you try to follow it with an alphabetic or numeric character
+(such as writing \c{\\dateZ}) then Halibut will assume you are
+trying to invoke the name of a macro command you have defined
+yourself, and will complain if no such command exists. To get round
+this you can use the special \q{\cw{\\.}} do-nothing command. See
+\k{input-macro} for more about general Halibut command syntax and
+\q{\cw{\\.}}.
+
  If you would prefer the date to be generated in a specific format,
  you can follow the \c{\\date} command with a format specification in
  braces. The format specification will be run through the standard C
@@ -294,26 +321,26 @@ and Halibut would produce
  Google is located at \W{http://www.google.com/}\cw{www.google.com}.
  }
  
+If you want the link text to be an index term as well, you can also
+specify \c{\\i} or \c{\\ii}; this has to come before \c{\\c} or
+\c{\\cw} if both are present. (See \k{input-index} for more about
+indexing.)
+
  \S{input-unicode} \c{\\u}: Specifying arbitrary \i{Unicode}
  characters
  
-When Halibut is finished, it should have full Unicode support. You
-should be able to specify any (reasonably well known) \i{character
-set} for your input document, and Halibut should convert it all to
-Unicode as it reads it in. Similarly, you should be able to specify
-the character set you want for each output format and have all the
-conversion done automatically.
-
-Currently, none of this is actually supported. Input text files are
-assumed to be in \i{ISO 8859-1}, and each output format has its own
-non-configurable character set (although the HTML output can use the
-\c{&#1234;} mechanism to output any Unicode character it likes).
+Halibut has extensive support for Unicode and character set
+conversion. You can specify any (reasonably well known) \i{character
+set} for your input document, and Halibut will convert it all to
+Unicode as it reads it in. See \k{input-config} for more details of
+this.
  
  If you need to specify a Unicode character in your input document
-which is not supported by the input character set, you can use the
-\i\c{\\u} command to do this. \c{\\u} expects to be followed by a
-sequence of hex digits; so that \c{\\u0041}, for example, denotes
-the Unicode character \cw{0x0041}, which is the capital letter A.
+which is not supported by the input character set you have chosen,
+you can use the \i\c{\\u} command to do this. \c{\\u} expects to be
+followed by a sequence of hex digits; so that \c{\\u0041}, for
+example, denotes the Unicode character \cw{0x0041}, which is the
+capital letter A.
  
  If a Unicode character specified in this way is not supported in a
  particular \e{output} format, you probably don't just want it to be
@@ -385,15 +412,25 @@ braces, that text will be ignored by Halibut.
  
  For example, you might write
  
-\c The typical behaviour of an antelope \#{do I mean gazelle?} is...
+\c The typical behaviour of an antelope \#{do I mean
+\c gazelle?} is...
  
  and Halibut will simply leave out the aside about gazelles, and will
  generate nothing but
  
  \quote{
-The typical behaviour of an antelope \#{do I mean gazelle?} is...
+The typical behaviour of an antelope \#{do I mean
+gazelle?} is...
  }
  
+This command will respect nested braces, so you can use it to
+comment out sections of Halibut markup:
+
+\c This function is \#{very, \e{very}} important.
+
+In this example, the comment lasts until the final closing brace (so
+that the whole \q{very, \e{very}} section is commented out).
+
  The \c{\\#} command can also be used to produce a whole-paragraph
  comment; see \k{input-commentpara} for details of that.
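
The nested-brace behaviour of \#{...} can be pictured as a brace-depth counter. This Python function is a simplified sketch, not Halibut's parser. It assumes that a backslash always consumes the single character after it. That covers the \\, \{ and \} escapes and the first letter of any command. It also leaves the whitespace on either side of a removed comment untouched, rather than modelling what Halibut does with it.

```python
def strip_inline_comments(src: str) -> str:
    """Remove \\#{...} asides from Halibut-style source.

    Braces nest, and backslash escapes (\\\\, \\{, \\}) or commands inside
    the comment are skipped so that escaped braces do not affect nesting.
    Simplified sketch; not Halibut's actual parser.
    """
    out = []
    i, n = 0, len(src)
    while i < n:
        if src.startswith("\\#{", i):
            depth = 1
            i += 3                      # skip past the opening \#{
            while i < n and depth:
                ch = src[i]
                if ch == "\\":
                    i += 2              # skip the backslash and the character after it
                    continue
                if ch == "{":
                    depth += 1
                elif ch == "}":
                    depth -= 1
                i += 1
            if depth:
                raise ValueError("unterminated \\#{...} comment")
            continue
        if src[i] == "\\":
            # Copy a backslash and the character after it verbatim, so that
            # an escaped backslash (\\) is never mistaken for the start of \#{.
            out.append(src[i:i + 2])
            i += 2
            continue
        out.append(src[i])
        i += 1
    return "".join(out)


print(strip_inline_comments(r"This function is \#{very, \e{very}} important."))
# This function is  important.
```

The inner \e{very} raises the depth to 2 and then drops it back to 1. So the comment ends only at the final closing brace, as the diff describes.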
  
@@ -441,7 +478,7 @@ Note that the above paragraph makes use of a backslash and a pair of
  braces, and does \e{not} need to escape them in the way described in
  \k{input-basics}. This is because code paragraphs formatted in this
  way are a special case; the intention is that you can just copy and
-paste a lump of code out of another program, put \q{\cw{\\c }} at the
+paste a lump of code out of your program, put \q{\cw{\\c }} at the
  start of every line, and simply \e{not have to worry} about the
  details - you don't have to go through the whole block looking for
  characters to escape.
@@ -560,10 +597,12 @@ Here's a list:
  
  The disadvantage of having Halibut sort out the list numbering for
  you is that if you need to refer to a list item by its number, you
-can't reliably do so. To get round this, Halibut allows an optional
-keyword in braces after the \c{\\n} command. This keyword can then
-be referenced using the \c{\\k} or \c{\\K} command (see
-\k{input-xref}) to provide the number of the list item. For example:
+can't reliably know the number in advance (because if you later add
+another item at the start of the list, the numbers will all change).
+To get round this, Halibut allows an optional keyword in braces
+after the \c{\\n} command. This keyword can then be referenced using
+the \c{\\k} or \c{\\K} command (see \k{input-xref}) to provide the
+number of the list item. For example:
  
  \c Here's a list:
  \c
@@ -589,6 +628,12 @@ Here's a list:
  \n Now go back to step \k{this-one}.
  }
  
+The keyword you supply after \c{\\n} is allowed to contain escaped
+special characters (\c{\\\\}, \c{\\\{} and \c{\\\}}), but should not
+contain any other Halibut markup. It is intended to be a word or two
+of ordinary text. (This also applies to keywords used in other
+commands, such as \c{\\B} and \c{\\C}.)
+
  \S2{input-list-description} \i\c{\\dt} and \i\c{\\dd}:
  \ii{Description lists}
  
@@ -698,7 +743,7 @@ Here's a list:
  
  }
  
-This syntax seems a little bit inconvenient, and perhaps
+This syntax might seem a little bit inconvenient, and perhaps
  counter-intuitive: you might expect the enclosing braces to have to
  go around the \e{whole} list item, rather than everything except the
  first paragraph.
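
A small Python model shows two things from the \n{keyword} changes in this diff. First, item numbers shift whenever an item is added earlier in the list. Second, a keyword may contain only plain text plus the \\, \{ and \} escapes. The following are all hypothetical simplifications, not Halibut's implementation:

- the (keyword, text) item representation;
- the function names;
- the treatment of \k{...} as a plain textual substitution.

```python
import re

# A \k{...} reference whose keyword is plain text plus \\, \{ or \} escapes.
K_REF = re.compile(r"\\k\{((?:[^{}\\]|\\[\\{}])*)\}")


def unescape_keyword(raw: str) -> str:
    """Check that a keyword is plain text with only \\\\, \\{ and \\}
    escapes, and return it with those escapes removed."""
    out, i = [], 0
    while i < len(raw):
        ch = raw[i]
        if ch == "\\":
            if i + 1 < len(raw) and raw[i + 1] in "\\{}":
                out.append(raw[i + 1])
                i += 2
                continue
            raise ValueError(f"markup is not allowed in a keyword: {raw!r}")
        if ch in "{}":
            raise ValueError(f"unescaped brace in keyword: {raw!r}")
        out.append(ch)
        i += 1
    return "".join(out)


def number_list(items: list) -> list:
    """items: (keyword or None, text) pairs, one per \\n paragraph.

    Numbers the items in order, records which number each keyword got,
    then replaces every \\k{keyword} in the item texts with that number.
    An undefined keyword raises KeyError in this sketch."""
    numbers = {}
    for n, (keyword, _text) in enumerate(items, start=1):
        if keyword is not None:
            key = unescape_keyword(keyword)
            if key in numbers:
                raise ValueError(f"duplicate keyword {key!r}")
            numbers[key] = n

    def resolve(m: re.Match) -> str:
        return str(numbers[unescape_keyword(m.group(1))])

    return [f"{n}. {K_REF.sub(resolve, text)}"
            for n, (_keyword, text) in enumerate(items, start=1)]


# Hypothetical example list.
steps = [(None, "Do this."),
         ("this-one", "Now do this."),
         (None, r"Now go back to step \k{this-one}.")]
print(number_list(steps)[-1])            # 3. Now go back to step 2.
print(number_list([(None, "First, prepare.")] + steps)[-1])
# 4. Now go back to step 3.
```

Adding one item at the start changes the number the reference resolves to without any edit to the reference itself. That is why the keyword, rather than a hard-coded number, is the thing you write in the source.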
@@ -758,8 +803,8 @@ your quoted section, and \c{\}} at the end, and the paragraphs in
  between will be formatted to indicate that they are a quotation.
  
  (This very manual, in fact, uses this feature a lot: all of the
-examples of Halibut source followed by Halibut output have the
-output quoted using \c{\\quote}.)
+examples of Halibut input followed by Halibut output have the output
+quoted using \c{\\quote}.)
  
  Here's some example Halibut input:
  
@@ -770,8 +815,8 @@ Here's some example Halibut input:
  \c \q{The question is,} said Alice, \q{whether you \e{can} make
  \c words mean so many different things.}
  \c
-\c \q{The question is,} said Humpty Dumpty, \q{who is to be master -
-\c that's all.}
+\c \q{The question is,} said Humpty Dumpty, \q{who is to be
+\c master - that's all.}
  \c
  \c }
  \c
@@ -788,8 +833,8 @@ In \q{Through the Looking Glass}, Lewis Carroll wrote:
  \q{The question is,} said Alice, \q{whether you \e{can} make
  words mean so many different things.}
  
-\q{The question is,} said Humpty Dumpty, \q{who is to be master -
-that's all.}
+\q{The question is,} said Humpty Dumpty, \q{who is to be
+master - that's all.}
  
  }
  
@@ -821,6 +866,12 @@ written as
  and this allows me to use the command \c{\\k\{input\}} to generate a
  cross-reference to that chapter somewhere else.
  
+The \I{keyword syntax}keyword you supply after one of these commands
+is allowed to contain escaped special characters (\c{\\\\}, \c{\\\{}
+and \c{\\\}}), but should not contain any other Halibut markup. It
+is intended to be a word or two of ordinary text. (This also applies
+to keywords used in other commands, such as \c{\\B} and \c{\\n}.)
+
  The next level down from \c{\\C} is \c{\\H}, for \q{heading}. This
  is used in exactly the same way as \c{\\C}, but section headings
  defined with \c{\\H} are considered to be part of a containing
@@ -940,9 +991,9 @@ a comment, like this:
  \c Here's a (fairly short) paragraph which will be displayed.
  \c
  \c \# Here's a comment paragraph which will not be displayed, no
-\c matter how long it goes on. All I needed to indicate this was the
-\c single \# at the start of the paragraph; I don't need one on
-\c every line or anything like that.
+\c matter how long it goes on. All I needed to indicate this was
+\c the single \# at the start of the paragraph; I don't need one
+\c on every line or anything like that.
  \c
  \c Here's another displayed paragraph.
  
@@ -953,9 +1004,9 @@ When run through Halibut, this produces the following output:
  Here's a (fairly short) paragraph which will be displayed.
  
  \# Here's a comment paragraph which will not be displayed, no
-matter how long it goes on. All I needed to indicate this was the
-single \# at the start of the paragraph; I don't need one on
-every line or anything like that.
+matter how long it goes on. All I needed to indicate this was
+the single \# at the start of the paragraph; I don't need one
+on every line or anything like that.
  
  Here's another displayed paragraph.
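
The comment-paragraph rule needs only a check at the start of each paragraph. That is why one \# is enough, however many lines follow. Below is a minimal Python sketch, assuming paragraphs are separated by blank lines; the function name is hypothetical. The model does not try to distinguish a paragraph that merely opens with an inline \#{...} aside.

```python
import re


def visible_paragraphs(src: str) -> list:
    """Split Halibut-style source at blank lines and drop comment
    paragraphs, i.e. paragraphs whose text begins with \\#. Only the start
    of the paragraph is checked, so continuation lines need no marker."""
    paragraphs = [p.strip() for p in re.split(r"\n[ \t]*\n", src)]
    return [p for p in paragraphs if p and not p.startswith("\\#")]


source = """Here's a (fairly short) paragraph which will be displayed.

\\# Here's a comment paragraph which will not be displayed, no
matter how long it goes on.

Here's another displayed paragraph.
"""
for paragraph in visible_paragraphs(source):
    print(paragraph)
```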
  
@@ -1013,6 +1064,12 @@ format for a particular book:
  
  \c \BR{freds-book} [Fred1993]
  
+The keyword you supply after \c{\\B} is allowed to contain escaped
+special characters (\c{\\\\}, \c{\\\{} and \c{\\\}}), but should not
+contain any other Halibut markup. It is intended to be a word or two
+of ordinary text. (This also applies to keywords used in other
+commands, such as \c{\\n} and \c{\\C}.)
+
  \H{input-index} Creating an \i{index}
  
  Halibut contains a comprehensive indexing mechanism, which attempts
@@ -1216,6 +1273,31 @@ subsections of a chapter.
  \dd Exactly like \c{chapter}, but changes the name given to
  appendices.
  
+\dt \I\cw{\\cfg\{input-charset\}}\cw{\\cfg\{input-charset\}\{}\e{character set name}\cw{\}}
+
+\dd This tells Halibut what \i{character set} you are writing your
+input file in. By default, it is assumed to be US-ASCII (meaning
+\e{only} plain \i{ASCII}, with no accented characters at all).
+
+\lcont{
+
+You can specify any well-known name for any supported character set.
+For example, \c{iso-8859-1}, \c{iso8859-1} and \c{iso_8859-1} are
+all recognised, \c{GB2312} and \c{EUC-CN} both work, and so on.
+
+This directive takes effect immediately after the \c{\\cfg} command.
+All text after that in the file is expected to be in the new
+character set. You can even change character set several times
+within a file if you really want to.
+
+When Halibut reads the input file, everything you type will be
+converted into \i{Unicode} from the character set you specify here,
+will be processed as Unicode by Halibut internally, and will be
+written to the various output formats in whatever character sets
+they deem appropriate.
+
+}
+
  In addition to these configuration commands, there are also
  configuration commands provided by each individual output format.
  These configuration commands are discussed along with each output
@@ -1226,6 +1308,7 @@ The \i{default settings} for the above options are:
  \c \cfg{chapter}{Chapter}
  \c \cfg{section}{Section}
  \c \cfg{appendix}{Appendix}
+\c \cfg{input-charset}{ASCII}
  
  \H{input-macro} Defining \i{macros}
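
The Python sketch below illustrates the \cfg{input-charset} behaviour added in this diff:

- the default is US-ASCII;
- a new setting takes effect immediately after the directive;
- the setting can be changed several times within one file.

This is an illustration under stated assumptions, not Halibut's code. It finds the directive with a regular expression on the raw bytes, so it only works for ASCII-compatible character sets, and it does not skip code paragraphs. It resolves names through Python's own codec registry. That registry happens to accept spellings such as iso-8859-1, iso8859-1 and iso_8859-1, or GB2312 and EUC-CN, but its list of names is not necessarily the same as Halibut's.

```python
import codecs
import re

# Simplified pattern for the directive. Assumptions: the charsets involved
# are ASCII-compatible (so the directive can be found in the raw bytes),
# and code paragraphs are not skipped, so a literal example of the
# directive in a \c paragraph would also be obeyed.
CHARSET_DIRECTIVE = re.compile(rb"\\cfg\{input-charset\}\{([^}]*)\}")


def decode_source(raw: bytes, default: str = "US-ASCII") -> str:
    """Decode source bytes to Unicode, switching character set
    immediately after each \\cfg{input-charset}{...} directive."""
    charset = codecs.lookup(default).name   # e.g. "US-ASCII" -> "ascii"
    pieces, pos = [], 0
    for m in CHARSET_DIRECTIVE.finditer(raw):
        # Everything up to and including the directive uses the old charset.
        pieces.append(raw[pos:m.end()].decode(charset))
        # The new charset applies to everything after the directive.
        # codecs.lookup accepts many spellings of one name, e.g.
        # iso-8859-1 / iso8859-1 / iso_8859-1, or GB2312 / EUC-CN.
        charset = codecs.lookup(m.group(1).decode("ascii")).name
        pos = m.end()
    pieces.append(raw[pos:].decode(charset))
    return "".join(pieces)


text = decode_source(b"\\cfg{input-charset}{iso8859-1}\ncaf\xe9")
print(text[-4:])   # café
```

Under the default, the same bytes without the directive (b"caf\xe9") raise UnicodeDecodeError, because 0xE9 is not an ASCII byte.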
  
@@ -1245,14 +1328,23 @@ macro, using the \i\c{\\define} command:
  
  \c \define{eur} \u20AC{EUR\_}
  
+Your macro names may include Roman alphabetic characters
+(\c{a}-\c{z}, \c{A}-\c{Z}) and ordinary Arabic numerals
+(\c{0}-\c{9}), but nothing else. (This is general \I{command
+syntax}syntax for all of Halibut's commands, except for a few
+special ones such as \c{\\_} and \c{\\-} which consist of a single
+punctuation character only.)
+
  Then you can just write ...
  
  \c This is likely to cost \eur 2500 at least.
  
  ... except that that's not terribly good, because you end up with a
-space between the Euro sign and the number. In this case, it's
-helpful to use the special \i\c{\\.} command, which is defined to
-\I{NOP}\I{doing nothing}do nothing at all! But it acts as a
+space between the Euro sign and the number. (If you had written
+\c{\\eur2500}, Halibut would have tried to interpret it as a macro
+command called \c{eur2500}, which you didn't define.) In this case,
+it's helpful to use the special \i\c{\\.} command, which is defined
+to \I{NOP}\I{doing nothing}do nothing at all! But it acts as a
  separator between your macro and the next character:
  
  \c This is likely to cost \eur\.2500 at least.
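
The paragraphs added in this diff about \dateZ and \eur2500 both follow from one rule. After a backslash, the command name is the longest run of ASCII letters and digits, or else a single punctuation character. The following Python sketch models that rule together with the do-nothing \. command. It is a toy, not Halibut's lexer. It knows no built-in commands and ignores the special handling of \u followed by hex digits. The macro body is simplified to the Euro sign itself rather than \u20AC{EUR\_}.

```python
import re

# Toy model of the command-name rule (not Halibut's actual lexer): after a
# backslash, the command name is either the longest run of ASCII letters
# and digits, or a single character that is not a letter or digit.
COMMAND = re.compile(r"\\([A-Za-z0-9]+|[^A-Za-z0-9])")
ALNUM = re.compile(r"[A-Za-z0-9]+")


def command_names(text: str) -> list:
    """List the command names this model finds in a piece of input."""
    return [m.group(1) for m in COMMAND.finditer(text)]


def expand_macros(text: str, macros: dict) -> str:
    """Expand user-defined macros. This toy knows no built-in commands, so
    an undefined alphanumeric name raises KeyError; \\. expands to nothing,
    and other punctuation commands are copied through unchanged."""
    def replace(m: re.Match) -> str:
        name = m.group(1)
        if name == ".":
            return ""                   # the do-nothing separator
        if not ALNUM.fullmatch(name):
            return m.group(0)           # e.g. an escaped backslash or brace
        if name in macros:
            return macros[name]
        raise KeyError(f"undefined command \\{name}")
    return COMMAND.sub(replace, text)


print(command_names(r"cost \eur2500"))       # ['eur2500']
print(command_names(r"cost \eur\.2500"))     # ['eur', '.']
print(expand_macros(r"cost \eur\.2500", {"eur": "\u20ac"}))   # cost €2500
```

Writing \eur 2500 keeps the names apart but leaves a space in the output. Writing \eur2500 produces the undefined name eur2500. Only \eur\.2500 both ends the macro name and adds nothing to the output.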