X-Git-Url: https://git.distorted.org.uk/~mdw/sod/blobdiff_plain/7e55d0998646666923cc31c5c86dd13be95fba87..5fcc98348b560f1e40261044d59048d2c4ff016f:/STYLE diff --git a/STYLE b/STYLE index c1eb705..e85be96 100644 --- a/STYLE +++ b/STYLE @@ -1,19 +1,51 @@ -Notes on Lisp style +Notes on coding style -* Language subset and extensions +* General + +** Layout + +Lines are 77 characters at most, except for strange special effects. +Don't ask. This is not negotiable, though. Don't try to tell me that +your monitor is very wide so you can read longer lines. My monitor is +likely at least as wide. On the other hand, most lines are easily short +enough to fit in my narrow columns, so the right hand side of a wide +window would be mostly blank. This seems wasteful to me, when I could +fill that space with more code. + +Horizontal whitespace for layout purposes -- i.e., indentation and +alignment, rather than just separating words -- consists of as many tabs +as possible, followed by as many spaces as necessary to reach the target +column. Tab stops occur at every eight columns. You can tell this +because when you cat a file to your terminal, that's how the tabs +appear. Editors which disagree about this are simply wrong. + +My indentation quantum is usually two columns. It seems that some +modern editors are deeply confused, and think that tab width and +indentation quantum are the same thing, but they aren't. Such broken +editors will make a hopeless mess of my code. If you have the +misfortune to use such an editor, maybe you could contribute patches to +fix it. + + +* Lisp style + +** Language subset and extensions None of ANSI Common Lisp is off-limits. I think my Lisp style is rather more imperative in flavour than most modern Lisp programmers. It's probably closer to historical Lisp -practice in that regard, even though I wasn't writing Lisp back then. +practice in that regard, even though I wasn't writing Lisp back then. 
A +lot of this is because I don't assume that the Lisp implementation +handles tail calls properly: Common Lisp is not Scheme. I make extensive use of CLOS, and macros. On a couple of occasions I've made macros which use CLOS generic function dispatch to compute their expansions. The parser language is probably the best example of this in the codebase. -I like hairy ~format~ strings. +I like hairy ~format~ strings. I've intentionally opted to leave them +as challenges to the reader rather than explain them. I've avoided hairy ~loop~ for the most part, not because I dislike it strongly but because others do and I don't find that it wins big enough @@ -50,8 +82,7 @@ suite, which makes additional use of ~xlunit~. Running the test suite isn't essential to getting the translator built, so this isn't as much of a problem. - -* Layout +** Layout I pretty much let Emacs indent my code for me, based on information collected by SLIME. Some exceptions: @@ -64,13 +95,28 @@ collected by SLIME. Some exceptions: fixing. Since I don't use hairy ~loop~ much, this isn't a major problem. -Lines are 77 characters at most, except for strange special effects. -Don't ask. This is not negotiable, though. Don't try to tell me that -your monitor is very wide so you can read longer lines. My monitor is -likely at least as wide. On the other hand, most lines are easily short -enough to fit in my narrow columns, so the right hand side of a wide -window would be mostly blank. This seems wasteful to me, when I could -fill that space with more code. + + Emacs indents lambda lists really badly. I often prefer to put the + entire lambda list on its own line than to split it. If I have to + split a simple lambda list, without lambda-list keywords, I just + align the start of each subsequent line with the start of the first + argument. 
I break hairy lambda lists before lambda-list keywords, + and the start of a subsequent line aligns with the first argument + name following the lambda-list keyword which begins the group, so + that the lambda-list keyword stands out. + + : (defun many-arguments (first second third + : fourth fifth) + : ...) + + : (defun hairy-arguments (first second third + : &optional fourth fifth + : sixth + : &rest others) + : ...) + + I don't know what I'd do if I had a hairy lambda list with so many + mandatory positional arguments that I had to split them. So far, + this situation hasn't come up. Lisp code does have a tendency to march across to the right quite rapidly given a chance. I have a number of strategies for dealing with @@ -86,8 +132,7 @@ this. + Shrug my shoulders and let code dribble down the right hand side for a bit. - -* Packages and exporting +** Packages and exporting A package collects symbols which are given meanings in one or more source files. If a package's code is all in one file, then the package @@ -119,8 +164,7 @@ See ~doc/list-symbols.lisp~ for more sophisticated reporting. (In particular, this identifies what kind of thing(s) each external symbol names.) - -* Comments and file structuring +** Comments and file structuring A file starts with a big ~;;;~ comment bearing the Emacs ~-*-lisp-*-~ marker, a quick description, and copyright and licensing boilerplate. I @@ -129,7 +173,8 @@ special effects. Then there's package stuff. There may be a ~cl:defpackage~ form (with explicit package qualifier) if the relevant package doesn't have its own -package definition file. +package definition file. I use gensyms to name packages: strings don't +seem right, and symbols would leak into some unrelated package. Then there's ~cl:in-package~. Like ~defpackage~, I use a gensym to name the package. 
I can't think offhand of a good reason to have a file with @@ -171,8 +216,7 @@ Files end, as a result of long tradition, with a comment : ;;;----- That's all, folks -------------------------------------------------- - -* Macro style +** Macro style I don't mind complicated macros if they're doing something worthwhile. They need to have good documentation strings, though. @@ -185,8 +229,7 @@ It's extremely bad taste for a macro to evaluate its evaluable parameters in any order other than strictly left to right, or to evaluate them more than once. - -* Data structures +** Data structures I've tended to be happy with plain lists for homogeneous-ish collections. Strongly heterogeneous collections (other than input @@ -242,8 +285,7 @@ making a new structure type. I tend to tidy up a few rough edges. to roll up your sleeves and mess with ~&aux~ parameters to pull it off. - -* Naming +** Naming I'm a traditionalist in some ways, and one of the reasons I like Lisp is the richness of its history and tradition. @@ -257,16 +299,15 @@ I've also tended to go for fairly prosaic names, taking my inspiration from the CLOS MOP. While I mourn the loss of whimsical names like ~haulong~ and ~haipart~, I've tried to avoid inventing more of them. -There's a convention, which I think comes from ML, of using ~_~ in a -where a binding occurrence of a variable name is expected, to signify -that that the corresponding value is to be discarded. Common Lisp, -alas, doesn't have such a convention. Instead, there's a sequence of -silly names used with the same intention, and the bindings are then -explicitly ignored with a declaration. The names begin ~hunoz~, -~hukairz~, and (I think) ~huaskt~. +There's a convention, which I think comes from ML, of using ~_~ where a +binding occurrence of a variable name is expected, to signify that +the corresponding value is to be discarded. Common Lisp, alas, doesn't +have such a convention. 
Instead, there's a sequence of silly names used +with the same intention, and the bindings are then explicitly ignored +with a declaration. The names begin ~hunoz~, ~hukairz~, and (I think) +~huaskt~. - -* Declarations +** Declarations The code is light on declarations, other than ~ignore~ and similar used to muffle warnings. The macros try to do sensible things with @@ -276,16 +317,279 @@ actual correctness, declarations provided by the caller need to be split up into a number of different parts of the expansion, which in turn requires figuring out what the declarations mean and which bindings they're referring to. That's not completely impossible, assuming that -there aren't implementation-specific declarations which crazy syntax +there aren't implementation-specific declarations with crazy syntax mixed in there, but it's more work than seems worthwhile. +* C style + +** Language subset and extensions + +I'm trying to support C89 still. There are few really worthwhile +features in C99 and later, though there are some. For now, I want Sod +to continue working if built with a C89 compiler, even if some things -- +e.g., most notably the macro sugar for varargs messages -- are +unavailable. + +Similarly, I'll use compiler-specific features if they don't adversely +affect portability. For example, I'll use GCC attributes to improve +compiler diagnostics, but they're wrapped up in preprocessor hacking so +that they won't be noticed by compilers which don't understand them. +I'm generally happy to accept contributions which make similar +improvements for other compilers. + +Sod is supposed to have minimal dependencies. It should be able to work +in what the ISO C standard names a `freestanding environment', without +most of the standard C library. 
The keyword-argument library is +carefully split into a piece which is fully portable and a piece which +depends on features which are only available in hosted environments, +like being able to print stuff to ~stderr~, so that users targeting +embedded systems have an easy porting job. + +** Naming + +I usually give local variables, arguments, and structure members very +short names, just one or two characters long. I find that longer names +are harder to distinguish, and take up horizontal space. Besides, +mathematicians have been using single-letter variable names quite +successfully for hundreds of years. + +I usually choose variable names to match their types in an informal way. +Loop counters are often called ~i~, ~j~, ~k~; generic pointers, and +pointers to bytes or characters, are usually ~p~ or ~q~; a character is +often ~ch~; a ~FILE~ pointer is ~fp~ following long tradition; sizes of +things, in bytes, are ~sz~, while lengths of vectors, in elements, are +~n~. I often name values of, or pointers to, structures or custom types +with the first letter of the type. If I have two things of the same +kind, I'll often double the name of one of them; e.g., if I have two +pointers to ~whatsit~ structures, I might call them ~w~ and ~ww~. + +I don't (any more) give ~typedef~ names to structures or unions. This +makes it possible to have a variable with the same name as the structure +tag without serious trouble. + +In variable names, I tend to just squash pieces of words together; in +longer names, sometimes I'll put in underscores to split things up a +bit. Camel case is bletcherous. + +File-scope names with /internal/ linkage -- i.e., things marked ~static~ +-- generally deserve somewhat longer names. I don't give them any other +kind of marking; e.g., I'd probably name the pointer to the head of a +list of ~foo~ things something like ~foohead~. + +Names with /external/ linkage want more care because they're playing in +a shared global namespace. 
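As a rough illustration only (this is not taken from the Sod sources), the naming habits described above might come out something like the sketch below. The structure ~foo~ and the variable ~foohead~ echo the example in the text; the functions ~pushfoo~ and ~countfoo~ are made-up names, assumed purely for the sake of the example.

```c
#include <stddef.h>

/* No typedef name: the tag lives in its own namespace, so a variable
 * may share the name `foo' without trouble.
 */
struct foo {
  int v;                                /* payload */
  struct foo *next;                     /* next item, or null */
};

/* File-scope with internal linkage: a somewhat longer name. */
static struct foo *foohead = 0;

/* Push f onto the front of the list (hypothetical helper). */
static void pushfoo(struct foo *f) { f->next = foohead; foohead = f; }

/* Count the items on the list; n is a length in elements. */
static size_t countfoo(void)
{
  const struct foo *f;
  size_t n = 0;

  for (f = foohead; f; f = f->next) n++;
  return (n);
}
```

A caller with two such structures could follow the doubling convention and call them ~f~ and ~ff~, or even declare ~struct foo foo;~, since the tag and the variable don't collide.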
+ +** Layout + +The C indent quantum is two columns. + +Declarations go at the top of functions. I don't put declarations in +inner blocks, and I certainly don't scatter declarations throughout a +block. I find that having the declarations all in one place makes it +easier for me to keep track of what things the function is going to be +thinking about. + +If I can't set a variable to its proper value immediately, I'll leave it +uninitialized until I can. That way, the compiler will warn me if I +forget. + +Most of my style is an attempt to get as much interesting code on the +screen at a time, and still be able to read it. The short variable +names keep things distinct while keeping statements short; short +statements don't need to be split across multiple lines. And keeping +the overall line length limit low means I can fit more /columns/ of code +on my screen. + +If there are several related variables with the same declaration +specifiers, I'll usually write a single declaration for all of them -- +even if they have different actual types. For example, + +: struct foo f, *fp = &f; + +Note that a ~*~ declarator operator has a space to its left, but never +to its right. (Stroustrup's style horribly misrepresents the underlying +syntax.) + +I will often write multiple statements on a single line, usually to +indicate that these things are part of the same thought, and they +shouldn't be separated. For example, if I'm working through an array of +things, I might have a pointer ~p~ to the element I'm hacking on, and a +count ~n~ of things left to hack, I'll have a loop + +: while (n) { +: /* hack on *p */ +: p++; n--; +: } + +so that the two updates don't get separated. + +I don't wrap braces around individual statements that fit on a single +line. For example, I'll write + +: while (*p == ' ') p++; + +On the other hand, if a single substatement is going to take more than +one line then it gets wrapped in braces. 
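The layout rules so far can be put together in a small sketch. This is an assumed example rather than code from Sod: ~sumpos~ and ~skipws~ are hypothetical functions, written with declarations at the top, a two-column indent, paired updates kept on one line, and no braces around a substatement that fits on a single line.

```c
#include <stddef.h>

/* Sum the nonnegative elements among the first n items of the vector p. */
static long sumpos(const int *p, size_t n)
{
  long t = 0;

  while (n) {
    if (*p >= 0) t += *p;               /* fits on one line: no braces */
    p++; n--;                           /* one thought, one line */
  }
  return (t);
}

/* Skip leading spaces in the string p; return the first other position. */
static const char *skipws(const char *p)
{
  while (*p == ' ') p++;
  return (p);
}
```

The loop body in ~sumpos~ needs more than one line, so it gets braces; the loop in ~skipws~ does not.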
+ +I don't write blocks which aren't part of larger compound statements, +e.g., ~if~ or ~while~. I'll write a compound statement on a single line +if I can; but I'll split ~if~ with an ~else~ over two lines. For +example, + +: if (a == 1) x = 0; +: else if (b == 3) { y = 2; z = 1; } +: else w = 15; + +On the other hand, if I can't write all of the branches of an +~if~\relax/\relax ~else if~ ladder like this, then /all/ of the +substatements get their own lines. (I write ~do~\relax/\relax ~while~ +loops in the same way, but this comes up much less frequently.) + +If I can't write a block on the same line, then the opening brace goes +on the same line as the statement head, and the closing brace gets its +own line. A trailing ~else~ or ~while~ goes on the same line as the +previous closing brace, if there is one. + +I don't write spaces inside parentheses or square brackets, or between +unary operators and their operands. I always write ~sizeof~ as if it +were a function, even though I know it isn't. I write a single space +either side of non-multiplicative binary operators -- i.e., other than +~*~, ~/~, ~%~, and ~&~; I don't write spaces around multiplicative +operators any more. The comma operator is special, and gets a space +after, but not before. + +If I'm breaking a long line at a binary operator, the break comes +/after/ the operator, not before. + +** Common conventions + +A /predicate/ is a function which answers a yes/no question -- and has +no side-effects. I don't use ~bool~ or similar; predicates return +~int~, such that zero is false and nonzero is true. Predicates usually +have names ending in ~p~ or ~_p~. (Note that function names +beginning ~is...~ are reserved for future ~<ctype.h>~ macros.) + +On the other hand, an /operation/ is a function whose main purpose is to +have an effect -- maybe create a thing, or update some state. 
In the +absence of better ideas, operations also return ~int~, but zero +indicates success, and nonzero -- usually $-1$ -- indicates failure. + +** Error handling and resource management + +I've tried many techniques. I think the following is the best approach +so far. + +I try to arrange that every type which represents some resource which +might need releasing has an easily recognizable `inert' value which +indicates that the resource has not been acquired. At the top of a +function, I initialize all of the variables which might hold onto +resources to their inert values. At the end of the function, I place a +label, ~end~ or ~fail~. An ~end~ label is for common cleanup; a ~fail~ +label is for cleanup that's only needed on unsuccessful completion. + + +** Miscellaneous style issues + +I write ~0~, not ~NULL~. Doing this prevents a common error in +null-terminated variable-length argument lists, e.g., ~execlp~, where +~NULL~ is actually an integer ~0~ in disguise and ends up being an ~int~ +where a pointer was wanted. + +I don't usually write redundant comparisons against ~0~, or ~NULL~, or +well-known return codes indicating success. Again, this helps with +compression. I'll write + +: rc = do_something(foo, bar); if (rc) goto end; + +(yes, one line) rather than comparing ~rc~ against some ~STATUS_SUCCESS~ +code or similar. Exception: I still haven't decided whether I prefer +leaving the explicit relational in ~strcmp~ and similar tests. + +I always write parentheses around the expression in a ~return~ +statement. + +In declarations, storage classes come first (e.g., ~static~, ~extern~, +~typedef~), followed by qualifiers (~const~, ~volatile~; I never use +~restrict~), and then the type specifiers, signedness indicators first +where they aren't redundant (so maybe ~signed char~ for special effects, +but never ~signed int~), then length indicators, then the base type. 
I +omit ~int~ if there are other type specifiers, so ~unsigned~ or ~long~, +rather than ~unsigned int~ or ~long int~. + +The full declarator syntax for function pointers is pretty ugly. I often +simplify it by defining a ~typedef~ for the /function/ type, not the +function pointer type. For example, + +: typedef int callbackfn(struct thing */*t*/, void */*p*/); + +I'd then use variables (structure members, arguments, etc.) of type +~callbackfn *~. + +In header files, I comment out argument names to prevent problems with +macros defined by client translation units. Also, I explicitly mark +function declarations as being ~extern~. + +** Comments and file structuring + +I never use C++-style ~//~ comments except for temporary special +effects. + +If a comment fits on one line, then its closing ~*/~ is on the same +line; otherwise, the ending ~*/~ is on a line by itself, and there's a +spine of ~*~ characters in a column on the left. + +A file starts with a big comment bearing the Emacs ~-*-c-*-~ marker, a +quick description, and copyright and licensing boilerplate. + +Header files are wrapped up with multiple-inclusion and C++ guards, with + +: #ifndef HEADER_H +: #define HEADER_H +: +: #ifdef __cplusplus +: extern "C" { +: #endif + +at the top. + +The rest of the file consists of C code. I don't use page boundaries +~^L~ to split files up. Instead, I use big banner comments for this: + +: /*----- Section title -----------------------------------------------------*/ + +Following long tradition, functions and macros are documented in a +preceding comment which looks like this. + +: /* --- @name@ --- * +: * +: * Arguments: @type fmm@ = a five-minute argument +: * @type fhh@ = the full half-hour +: * +: * Returns: A return value. +: * +: * Use: It does a thing. Otherwise I wouldn't have bothered. +: */ + +Sometimes (rarely) the description of the return value explains +sufficiently what the thing does. If so, the `Use' part can be omitted. 
+Fragments of C code in this comment are surrounded by ~@~ characters. +There can also be \LaTeX\ maths in here, in ~%$~...\relax ~$%~. + +Files end, as a result of long tradition, with a comment + +: /*----- That's all, folks -------------------------------------------------*/ + +The closing ~#endif~ of a header file comes after this final comment. + + * COMMENT Emacs cruft #+LATEX_CLASS: strayman ## LocalWords: CLOS ish destructure destructured accessor specializers -## LocalWords: accessors DSLs gensym +## LocalWords: accessors DSLs gensym gensyms bletcherous Stroustrup +## LocalWords: Stroustrup's signedness ## Local variables: ## mode: org