1 Notes on coding style
2
3 * General
4
5 ** Layout
6
7 Lines are 77 characters at most, except for strange special effects.
8 Don't ask. This is not negotiable, though. Don't try to tell me that
9 your monitor is very wide so you can read longer lines. My monitor is
10 likely at least as wide. On the other hand, most lines are easily short
11 enough to fit in my narrow columns, so the right hand side of a wide
12 window would be mostly blank. This seems wasteful to me, when I could
13 fill that space with more code.
14
15 Horizontal whitespace for layout purposes -- i.e., indentation and
16 alignment, rather than just separating words -- consists of as many tabs
17 as possible, followed by as many spaces as necessary to reach the target
18 column. Tab stops occur at every eight columns. You can tell this
19 because when you cat a file to your terminal, that's how the tabs
20 appear. Editors which disagree about this are simply wrong.
21
22 My indentation quantum is usually two columns. It seems that some
23 modern editors are deeply confused, and think that tab width and
24 indentation quantum are the same thing, but they aren't. Such broken
25 editors will make a hopeless mess of my code. If you have the
26 misfortune to use such an editor, maybe you could contribute patches to
27 fix it.
28
29
30 * Lisp style
31
32 ** Language subset and extensions
33
34 None of ANSI Common Lisp is off-limits.
35
36 I think my Lisp style is rather more imperative in flavour than that of
37 most modern Lisp programmers. It's probably closer to historical Lisp
38 practice in that regard, even though I wasn't writing Lisp back then. A
39 lot of this is because I don't assume that the Lisp implementation
40 handles tail calls properly: Common Lisp is not Scheme.
41
42 I make extensive use of CLOS, and macros. On a couple of occasions I've
43 made macros which use CLOS generic function dispatch to compute their
44 expansions. The parser language is probably the best example of this in
45 the codebase.
46
47 I like hairy ~format~ strings. I've intentionally opted to leave them
48 as challenges to the reader rather than explain them.
49
50 I've avoided hairy ~loop~ for the most part, not because I dislike it
51 strongly but because others do and I don't find that it wins big enough
52 for the fight to be worthwhile.
53
54 I only use ~&aux~ lambda-list parameters in ~defstruct~ BOA
55 constructors, for special effects.
56
57 I use ~car~, not ~first~, and ~cdr~, not ~rest~. Similarly, I use
58 ~cadr~, not ~second~, and I'm not afraid to use ~cddr~ or ~cadar~.
59
60 Similarly, I've not used ~elt~, preferring to know what kind of sequence
61 I'm dealing with, or using the built-in sequence functions.
62
63 I'm happy to use ~1+~, and I like the brevity of ~1-~ enough to use it
64 despite its terrible name.
65
66 There are no reader syntax extensions in the code. This is because I
67 couldn't think of any way they'd be especially helpful, and not because
68 I'm in any way opposed to them.
69
70 The main translator, in the ~SOD~ package, tries to assume very little
71 beyond ANSI Common Lisp and what's included in just about every serious
72 implementation: specifically, MOP introspection, and Gray streams.
73 There's intentionally no MOP intercession.
74
75 The frontend additionally makes use of ~cl-launch~, but the dependency
76 is actually quite weak, and it could be replaced with a different, maybe
77 implementation-specific, mechanism fairly easily. I'm keen to take
78 patches which improve frontend portability.
79
80 I'm more tolerant of extensions and external dependencies in the test
81 suite, which makes additional use of ~xlunit~. Running the test suite
82 isn't essential to getting the translator built, so this isn't as much
83 of a problem.
84
85 ** Layout
86
87 I pretty much let Emacs indent my code for me, based on information
88 collected by SLIME. Some exceptions:
89
90 + DSLs (e.g., the parser language) have their own space of macros
91 which Emacs doesn't understand and for the most part I haven't
92 bothered to teach it.
93
94 + Emacs sometimes does a bad job with hairy ~loop~ and requires manual
95 fixing. Since I don't use hairy ~loop~ much, this isn't a major
96 problem.
97
98 + Emacs indents lambda lists really badly. I often prefer to put the
99 entire lambda list on its own line than to split it. If I have to
100 split a simple lambda list, without lambda-list keywords, I just
101 align the start of each subsequent line with the start of the first
102 argument. I break hairy lambda lists before lambda-list keywords,
103 and the start of a subsequent line aligns with the first argument
104 name following the lambda-list keyword which begins the group, so
105 that the lambda-list keyword stands out.
106
107 : (defun many-arguments (first second third
108 : fourth fifth)
109 : ...)
110
111 : (defun hairy-arguments (first second third
112 : &optional fourth fifth
113 : sixth
114 : &rest others)
115 : ...)
116
117 I don't know what I'd do if I had a hairy lambda list with so many
118 mandatory positional arguments that I had to split them. So far,
119 this situation hasn't come up.
120
121 Lisp code does have a tendency to march across to the right quite
122 rapidly given a chance. I have a number of strategies for dealing with
123 this.
124
125 + Break a long nested calculation into pieces, giving names to the
126 intermediate results, in a ~let*~ form.
127
128 + Hoist deeply nested complex computations out into ~flet~ or
129 ~labels~, and then invoke them from inside whatever complicated
130 conditional mess was needed to decide what to do.
131
132 + Shrug my shoulders and let code dribble down the right hand side for
133 a bit.
134
135 ** Packages and exporting
136
137 A package collects symbols which are given meanings in one or more
138 source files. If a package's code is all in one file, then the package
139 definition can be put in that file too; otherwise I put it in its own
140 file.
141
142 I don't put ~:export~ in package definitions. Instead, I scatter calls
143 to the ~export~ function throughout the code, right next to where the
144 relevant symbol is defined. This has three important advantages.
145
146 + You can tell, when you're reading the code which defines ~foo~,
147 whether ~foo~ is exported and therefore a defined part of the
148 package interface.
149
150 + When you know that you're writing a thing which will form part of
151 the package interface, you don't have to go off and edit some other
152 file to export it.
153
154 + A master list of exported symbols becomes a merge hazard: if two
155 different branches add symbols to nearby pieces of the master list
156 then you get a merge conflict for no especially good reason.
157
158 There's an apparent disadvantage: there's no immediately visible master
159 list of exported symbols. But that's not a big problem:
160
161 : (loop for s being the external-symbols of pkg collect s)
162
163 See ~doc/list-symbols.lisp~ for more sophisticated reporting. (In
164 particular, this identifies what kind of thing(s) each external symbol
165 names.)
166
167 ** Comments and file structuring
168
169 A file starts with a big ~;;;~ comment bearing the Emacs ~-*-lisp-*-~
170 marker, a quick description, and copyright and licensing boilerplate. I
171 don't use four-semicolon comments, and I only use ~#|~ ... ~|#~ for
172 special effects.
173
174 Then there's package stuff. There may be a ~cl:defpackage~ form (with
175 explicit package qualifier) if the relevant package doesn't have its own
176 package definition file. I use gensyms to name packages: strings don't
177 seem right, and symbols would leak into some unrelated package.
178
179 Then there's ~cl:in-package~. As with ~defpackage~, I use a gensym to name
180 the package. I can't think offhand of a good reason to have a file with
181 sections `in' more than one package. So, the ~in-package~ form goes at
182 the top of the file, before the first section header. If sections are
183 going to end up in separate packages, I think I'd put a ~cl:in-package~
184 at the top of each section in case I wanted to reorder them.
185
186 The rest of the file consists of Lisp code. I don't use page boundaries
187 ~^L~ to split files up. Instead, I use big banner comments for this:
188
189 : ;;;--------------------------------------------------------------------------
190 : ;;; Section title.
191
192 Sections don't usually have internal comments, but if they did they'd
193 also be ~;;;~ comments.
194
195 Almost all definitions get documentation strings. I've tried to be
196 consistent about formatting.
197
198 + Docstring lines are 77 characters or less.
199
200 + The first line gives a summary of what the thing does. The summary,
201 together with the SLIME-generated synopsis, is likely enough to
202 remind you what the thing does.
203
204 + The rest of the lines are indented by three spaces, and explain
205 carefully what the thing does and what all the parameters mean.
206
207 Smallish functions and macros don't usually need any further
208 commentary. Big functions often need to be split into bitesize pieces
209 with their own internal ~;;~ comments. The idea is that these comments
210 should explain the code's overall strategy to the reader, and help them
211 figure out how a piece fits into that strategy.
212
213 Winged, single ~;~ comments are very rare.
214
215 Files end, as a result of long tradition, with a comment
216
217 : ;;;----- That's all, folks --------------------------------------------------
218
219 ** Macro style
220
221 I don't mind complicated macros if they're doing something worthwhile.
222 They need to have good documentation strings, though.
223
224 That said, where possible I've tried to factor macros into an actual
225 macro providing the syntactic sugar, and a function which receives the
226 parameters and $\eta$-expanded forms, and does the actual work.
227
228 It's extremely bad taste for a macro to evaluate its evaluable
229 parameters in any order other than strictly left to right, or to
230 evaluate them more than once.
231
232 ** Data structures
233
234 I've tended to be happy with plain lists for homogeneous-ish
235 collections. Strongly heterogeneous collections (other than input
236 syntax, destructured using ~defmacro~ or ~destructuring-bind~) I've
237 tended to make a proper data type for.
238
239 My first instinct when defining a new structure is to use ~defclass~.
240 While it's annoyingly verbose, it has the immense benefit over
241 ~defstruct~ that it's safe to redefine CLOS classes in a running image
242 without the world breaking, and I usually find it necessary to add or
243 change slots while I'm working on new code. Once a piece of code has
244 settled down and I have a good feel for what my structure is actually
245 doing, I might switch the ~defclass~ for a ~defstruct~. Several
246 questions influence my decision.
247
248 + Do slot accesses need to be really fast? My usual Lisp
249 implementations aggressively optimize ~defstruct~ accessor
250 functions.
251
252 + Have I subclassed my class? While I can move over a
253 single-inheritance tree using ~:include~, it seems wrong to do this
254 most of the time. Also, I'd be precluding subclasses from multiple
255 inheritance, and I'd either have to prohibit subclassing by
256 extensions or have to commit to ~defstruct~ in the documentation.
257 In general, I'm much happier committing to ~defclass~.
258
259 + Are there methods specialized on my class? Again, structure classes
260 make fine method specializers, but it doesn't seem right.
261
262 Apart from being hard to redefine, ~defstruct~ does a pretty good job of
263 making a new structure type. I tend to tidy up a few rough edges.
264
265 + The default predicate always has ~-p~ appended. If the class name
266 is a single word, then I'll explicitly name the predicate with a
267 simple ~p~ suffix. For example, ~ship~ would have the predicate
268 ~shipp~, rather than ~ship-p~.
269
270 + If there are slots I can't default then I'll usually provide a BOA
271 constructor which sets them from required parameters; other slots
272 I'll set from optional or keyword parameters according to my taste
273 and judgement.
274
275 + Slots mustn't be given names which are external in any package.
276 Unfortunately, slot names are used in constructing accessor names,
277 and sometimes the right accessor name involves a prohibited symbol.
278 I've mostly addressed this by naming the slot ~%foo~, and then
279 providing inline reader and writer functions. (CLOS class
280 definitions don't have this problem because you get to set the
281 accessor function names independently of the slot names.)
282
283 + BOA constructors are strange. You can set the initial slots based
284 on an arbitrary computation on the provided parameters, but you have
285 to roll up your sleeves and mess with ~&aux~ parameters to pull it
286 off.
287
288 ** Naming
289
290 I'm a traditionalist in some ways, and one of the reasons I like Lisp is
291 the richness of its history and tradition.
292
293 In other languages, I tend to use single- or two-letter names for
294 variables and structure slots; not so much in Lisp. Other languages
295 express more using punctuation, so the names stand out easily; I find
296 that short names can be lost more easily in Lisp.
297
298 I've also tended to go for fairly prosaic names, taking my inspiration
299 from the CLOS MOP. While I mourn the loss of whimsical names like
300 ~haulong~ and ~haipart~, I've tried to avoid inventing more of them.
301
302 There's a convention, which I think comes from ML, of using ~_~ where a
303 binding occurrence of a variable name is expected, to signify that
304 the corresponding value is to be discarded. Common Lisp, alas, doesn't
305 have such a convention. Instead, there's a sequence of silly names used
306 with the same intention, and the bindings are then explicitly ignored
307 with a declaration. The names begin ~hunoz~, ~hukairz~, and (I think)
308 ~huaskt~.
309
310 ** Declarations
311
312 The code is light on declarations, other than ~ignore~ and similar used
313 to muffle warnings. The macros try to do sensible things with
314 declarations, and I think they succeed fairly well, but there might be
315 bugs and rough edges. I know that some are just broken because, for
316 actual correctness, declarations provided by the caller need to be split
317 up into a number of different parts of the expansion, which in turn
318 requires figuring out what the declarations mean and which bindings
319 they're referring to. That's not completely impossible, assuming that
320 there aren't implementation-specific declarations with crazy syntax
321 mixed in there, but it's more work than seems worthwhile.
322
323
324 * C style
325
326 ** Language subset and extensions
327
328 I'm trying to support C89 still. There are few really worthwhile
329 features in C99 and later, though there are some. For now, I want Sod
330 to continue working if built with a C89 compiler, even if some things --
331 most notably the macro sugar for varargs messages -- are
332 unavailable.
333
334 Similarly, I'll use compiler-specific features if they don't adversely
335 affect portability. For example, I'll use GCC attributes to improve
336 compiler diagnostics, but they're wrapped up in preprocessor hacking so
337 that they won't be noticed by compilers which don't understand them.
338 I'm generally happy to accept contributions which make similar
339 improvements for other compilers.
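
A minimal sketch of that kind of preprocessor wrapping follows. The macro
~PRINTF_LIKE~ and the function ~mkmsg~ are hypothetical names chosen for
illustration, not necessarily what Sod itself uses; the sketch also assumes
~vsnprintf~, which is C99 rather than C89.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical wrapper: expands to a GCC attribute where that's
 * understood, and to nothing elsewhere, so other compilers never see it.
 */
#if defined(__GNUC__)
#  define PRINTF_LIKE(fix, aix) __attribute__((format(printf, fix, aix)))
#else
#  define PRINTF_LIKE(fix, aix)
#endif

/* With the attribute, GCC checks the arguments against the format. */
extern int mkmsg(char */*buf*/, size_t /*sz*/, const char */*fmt*/, ...)
  PRINTF_LIKE(3, 4);

/* Format a message into @buf@, writing at most @sz@ bytes. */
int mkmsg(char *buf, size_t sz, const char *fmt, ...)
{
  va_list ap;
  int n;

  va_start(ap, fmt);
  n = vsnprintf(buf, sz, fmt, ap);	/* assumption: C99 library */
  va_end(ap);
  return (n);
}
```

A compiler which doesn't define ~__GNUC__~ sees an ordinary declaration and
simply loses the extra diagnostics.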
340
341 Sod is supposed to have minimal dependencies. It should be able to work
342 in what the ISO C standard names a `freestanding environment', without
343 most of the standard C library. The keyword-argument library is
344 carefully split into a piece which is fully portable and a piece which
345 depends on features which are only available in hosted environments,
346 like being able to print stuff to ~stderr~, so that users targeting
347 embedded systems have an easy porting job.
348
349 ** Naming
350
351 I usually give local variables, arguments, and structure members very
352 short names, just one or two characters long. I find that longer names
353 are harder to distinguish, and take up horizontal space. Besides,
354 mathematicians have been using single-letter variable names quite
355 successfully for hundreds of years.
356
357 I usually choose variable names to match their types in an informal way.
358 Loop counters are often called ~i~, ~j~, ~k~; generic pointers, and
359 pointers to bytes or characters, are usually ~p~ or ~q~; a character is
360 often ~ch~; a ~FILE~ pointer is ~fp~ following long tradition; sizes of
361 things, in bytes, are ~sz~, while lengths of vectors, in elements, are
362 ~n~. I often name values of, or pointers to, structures or custom types
363 with the first letter of the type. If I have two things of the same
364 kind, I'll often double the name of one of them; e.g., if I have two
365 pointers to ~whatsit~ structures, I might call them ~w~ and ~ww~.
366
367 I don't (any more) give ~typedef~ names to structures or unions. This
368 makes it possible to have a variable with the same name as the structure
369 tag without serious trouble.
370
371 In variable names, I tend to just squash pieces of words together; in
372 longer names, sometimes I'll put in underscores to split things up a
373 bit. Camel case is bletcherous.
374
375 File-scope names with /internal/ linkage -- i.e., things marked ~static~
376 -- generally deserve somewhat longer names. I don't give them any
377 other kind of marking; e.g., I'd probably name the pointer to the head of a
378 list of ~foo~ things something like ~foohead~.
379
380 Names with /external/ linkage want more care because they're playing in
381 a shared global namespace.
382
383 ** Layout
384
385 The C indent quantum is two columns.
386
387 Declarations go at the top of functions. I don't put declarations in
388 inner blocks, and I certainly don't scatter declarations throughout a
389 block. I find that having the declarations all in one place makes it
390 easier for me to keep track of what things the function is going to be
391 thinking about.
392
393 If I can't set a variable to its proper value immediately, I'll leave it
394 uninitialized until I can. That way, the compiler will warn me if I
395 forget.
396
397 Most of my style is an attempt to get as much interesting code on the
398 screen at a time, and still be able to read it. The short variable
399 names keep things distinct while keeping statements short; short
400 statements don't need to be split across multiple lines. And keeping
401 the overall line length limit low means I can fit more /columns/ of code
402 on my screen.
403
404 If there are several related variables with the same declaration
405 specifiers, I'll usually write a single declaration for all of them --
406 even if they have different actual types. For example,
407
408 : struct foo f, *fp = &f;
409
410 Note that a ~*~ declarator operator has a space to its left, but never
411 to its right. (Stroustrup's style horribly misrepresents the underlying
412 syntax.)
413
414 I will often write multiple statements on a single line, usually to
415 indicate that these things are part of the same thought, and they
416 shouldn't be separated. For example, if I'm working through an array of
417 things, with a pointer ~p~ to the element I'm hacking on and a count ~n~
418 of things left to hack, I'll have a loop
419
420 : while (n) {
421 : /* hack on *p */
422 : p++; n--;
423 : }
424
425 so that the two updates don't get separated.
426
427 I don't wrap braces around individual statements that fit on a single
428 line. For example, I'll write
429
430 : while (*p == ' ') p++;
431
432 On the other hand, if a single substatement is going to take more than
433 one line then it gets wrapped in braces.
434
435 I don't write blocks which aren't part of larger compound statements,
436 e.g., ~if~ or ~while~. I'll write a compound statement on a single line
437 if I can; but I'll split ~if~ with an ~else~ over two lines. For
438 example,
439
440 : if (a == 1) x = 0;
441 : else if (b == 3) { y = 2; z = 1; }
442 : else w = 15;
443
444 On the other hand, if I can't write all of the branches of an
445 ~if~\relax/\relax ~else if~ ladder like this, then /all/ of the
446 substatements get their own lines. (I write ~do~\relax/\relax ~while~
447 loops in the same way, but this comes up much less frequently.)
448
449 If I can't write a block on the same line, then the opening brace goes
450 on the same line as the statement head, and the closing brace gets its
451 own line. A trailing ~else~ or ~while~ goes on the same line as the
452 previous closing brace, if there is one.
453
454 I don't write spaces inside parentheses or square brackets, or between
455 unary operators and their operands. I always write ~sizeof~ as if it
456 were a function, even though I know it isn't. I write a single space
457 either side of non-multiplicative binary operators -- i.e., other than
458 ~*~, ~/~, ~%~, and ~&~; I don't write spaces around multiplicative
459 operators any more. The comma operator is special, and gets a space
460 after, but not before.
461
462 If I'm breaking a long line at a binary operator, the break comes
463 /after/ the operator, not before.
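
The layout rules above can be seen together in a small sketch. The function
~vfind~ is a hypothetical example, not taken from Sod: it shows declarations
at the top, a multi-line block with its opening brace on the statement head's
line, a one-line-per-branch ~if~ ladder, and the operator spacing described
above.

```c
/* Hypothetical example: return the index of @x@ in the sorted vector @v@
 * of @n@ elements, or -1 if it isn't there.
 */
long vfind(const int *v, long n, int x)
{
  long lo = 0, hi = n, mid;

  while (lo < hi) {
    /* Spaces around additive operators, none around multiplicative. */
    mid = lo + (hi - lo)/2;
    if (v[mid] < x) lo = mid + 1;
    else if (v[mid] > x) hi = mid;
    else return (mid);
  }
  return (-1);
}
```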
464
465 ** Common conventions
466
467 A /predicate/ is a function which answers a yes/no question -- and has
468 no side-effects. I don't use ~bool~ or similar; predicates return
469 ~int~, such that zero is false and nonzero is true. Predicates usually
470 have names ending in ~p~ or ~_p~. (Note that function names
471 beginning ~is...~ are reserved for future ~<ctype.h>~ macros.)
472
473 On the other hand, an /operation/ is a function whose main purpose is to
474 have an effect -- maybe create a thing, or update some state. In the
475 absence of better ideas, operations also return ~int~, but zero
476 indicates success, and nonzero -- usually $-1$ -- indicates failure.
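
As a sketch of the two conventions side by side, assuming a hypothetical
fixed-size stack (~struct stack~, ~fullp~ and ~push~ are illustrative names
only):

```c
#include <stddef.h>

#define STACKSZ 4

struct stack { int v[STACKSZ]; size_t n; };

/* Predicate: nonzero if @s@ has no room left.  No side effects. */
int fullp(const struct stack *s)
{
  return (s->n >= STACKSZ);
}

/* Operation: push @x@ onto @s@.  Returns zero on success, -1 on failure. */
int push(struct stack *s, int x)
{
  if (fullp(s)) return (-1);
  s->v[s->n++] = x;
  return (0);
}
```

A caller tests the predicate directly, as in ~if (fullp(&s))~, and checks an
operation with ~if (push(&s, x)) goto fail;~ or similar.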
477
478 ** Error handling and resource management
479
480 I've tried many techniques. I think the following is the best approach
481 so far.
482
483 I try to arrange that every type which represents some resource which
484 might need releasing has an easily recognizable `inert' value which
485 indicates that the resource has not been acquired. At the top of a
486 function, I initialize all of the variables which might hold onto
487 resources to their inert values. At the end of the function, I place a
488 label, ~end~ or ~fail~. An ~end~ label is for common cleanup; a ~fail~
489 label is for cleanup that's only needed on unsuccessful completion.
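
A minimal sketch of this pattern, assuming that a null pointer serves as the
inert value for allocated memory. The names ~struct pair~ and ~mkpair~ are
hypothetical; here the cleanup is only wanted on failure, so the label is
~fail~.

```c
#include <stdlib.h>
#include <string.h>

struct pair { char *a, *b; };

/* Hypothetical example: make a pair holding copies of the strings @a@
 * and @b@.  Returns the new pair, or null on failure.
 */
struct pair *mkpair(const char *a, const char *b)
{
  struct pair *p = 0;			/* inert: nothing allocated */
  char *aa = 0, *bb = 0;		/* likewise */
  size_t an = strlen(a) + 1, bn = strlen(b) + 1;

  p = malloc(sizeof(*p)); if (!p) goto fail;
  aa = malloc(an); if (!aa) goto fail;
  bb = malloc(bn); if (!bb) goto fail;
  memcpy(aa, a, an); memcpy(bb, b, bn);
  p->a = aa; p->b = bb;
  return (p);

fail:
  /* Releasing an inert (null) pointer is harmless, so there's no need to
   * keep track of how far we got.
   */
  free(bb); free(aa); free(p);
  return (0);
}
```

Because every resource variable starts out inert, any ~goto fail~ can share
the same cleanup code, however many resources have been acquired so far.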
490
491
492 ** Miscellaneous style issues
493
494 I write ~0~, not ~NULL~. Doing this prevents a common error in
495 null-terminated variable-length argument lists, e.g., ~execlp~, where
496 ~NULL~ is actually an integer ~0~ in disguise and ends up being an ~int~
497 where a pointer was wanted.
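
A sketch of the hazard, assuming a hypothetical variadic function ~totlen~
which takes a null-terminated list of strings. A bare ~0~ is visibly an
~int~, which is a reminder that the terminator needs an explicit cast to the
pointer type the callee will fetch.

```c
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical example: total length of a null-terminated list of
 * strings.
 */
size_t totlen(const char *p, ...)
{
  va_list ap;
  size_t n = 0;

  va_start(ap, p);
  while (p) { n += strlen(p); p = va_arg(ap, const char *); }
  va_end(ap);
  return (n);
}

/* The terminator must really be a pointer: an uncast @0@, or a @NULL@
 * which is an integer in disguise, would be passed as an @int@.
 */
size_t demo(void)
{
  return (totlen("foo", "quux", (const char *)0));
}
```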
498
499 I don't usually write redundant comparisons against ~0~, or ~NULL~, or
500 well-known return codes indicating success. Again, this helps with
501 compression. I'll write
502
503 : rc = do_something(foo, bar); if (rc) goto end;
504
505 (yes, one line) rather than comparing ~rc~ against some ~STATUS_SUCCESS~
506 code or similar. Exception: I still haven't decided whether I prefer
507 leaving the explicit relational in ~strcmp~ and similar tests.
508
509 I always write parentheses around the expression in a ~return~
510 statement.
511
512 In declarations, storage classes come first (e.g., ~static~, ~extern~,
513 ~typedef~), followed by qualifiers (~const~, ~volatile~; I never use
514 ~restrict~), and then the type specifiers, signedness indicators first
515 where they aren't redundant (so maybe ~signed char~ for special effects,
516 but never ~signed int~), then length indicators, then the base type. I
517 omit ~int~ if there are other type specifiers, so ~unsigned~ or ~long~,
518 rather than ~unsigned int~ or ~long int~.
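
For instance, under these rules a lookup table might be declared as in the
following sketch (~pow10tab~ and ~ndigits~ are hypothetical names):

```c
/* Storage class first, then qualifier, then signedness and length; @int@
 * is omitted because other type specifiers are present.
 */
static const unsigned long pow10tab[] = { 1, 10, 100, 1000 };

/* Hypothetical example: the number of decimal digits in @x@, saturating
 * at four because the table is so small.
 */
unsigned ndigits(unsigned long x)
{
  unsigned i, n = 1;

  for (i = 1; i < sizeof(pow10tab)/sizeof(pow10tab[0]); i++)
    if (x >= pow10tab[i]) n = i + 1;
  return (n);
}
```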
519
520 The full declarator syntax for function pointers is pretty ugly. I often
521 simplify it by defining a ~typedef~ for the /function/ type, not the
522 function pointer type. For example
523
524 : typedef int callbackfn(struct thing */*t*/, void */*p*/);
525
526 I'd then use variables (structure members, arguments, etc.) of type
527 ~callbackfn *~.
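
A sketch of how such a function type might be used, assuming a trivial
~struct thing~ with one member; ~foreach~ and ~addup~ are hypothetical names.
A side benefit of naming the function type is that it can also declare
functions of that type directly.

```c
struct thing { int n; };

typedef int callbackfn(struct thing */*t*/, void */*p*/);

/* The function type can declare a matching function directly. */
extern callbackfn addup;

/* Hypothetical example: call @cb@ on each of the @n@ things in @v@,
 * stopping at the first nonzero return, which is passed back.
 */
int foreach(struct thing *v, unsigned n, callbackfn *cb, void *p)
{
  int rc = 0;

  while (n) {
    rc = cb(v, p); if (rc) break;
    v++; n--;
  }
  return (rc);
}

/* A callback: add the thing's value to the total which @p@ points to. */
int addup(struct thing *t, void *p)
{
  int *sum = p;

  *sum += t->n;
  return (0);
}
```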
528
529 In header files, I comment out argument names to prevent problems with
530 macros defined by client translation units. Also, I explicitly mark
531 function declarations as being ~extern~.
532
533 ** Comments and file structuring
534
535 I never use C++-style ~//~ comments except for temporary special
536 effects.
537
538 If a comment fits on one line, then its closing ~*/~ is on the same
539 line; otherwise, the ending ~*/~ is on a line by itself, and there's a
540 spine of ~*~ characters in a column on the left.
541
542 A file starts with a big comment bearing the Emacs ~-*-c-*-~ marker, a
543 quick description, and copyright and licensing boilerplate.
544
545 Header files are wrapped up with multiple-inclusion and C++ guards, with
546
547 : #ifndef HEADER_H
548 : #define HEADER_H
549 :
550 : #ifdef __cplusplus
551 : extern "C" {
552 : #endif
553
554 at the top.
555
556 The rest of the file consists of C code. I don't use page boundaries
557 ~^L~ to split files up. Instead, I use big banner comments for this:
558
559 : /*----- Section title -----------------------------------------------------*/
560
561 Following long tradition, functions and macros are documented in a
562 preceding comment which looks like this.
563
564 : /* --- @name@ --- *
565 : *
566 : * Arguments: @type fmm@ = a five-minute argument
567 : * @type fhh@ = the full half-hour
568 : *
569 : * Returns: A return value.
570 : *
571 : * Use: It does a thing. Otherwise I wouldn't have bothered.
572 : */
573
574 Sometimes (rarely) the description of the return value explains
575 sufficiently what the thing does. If so, the `Use' part can be omitted.
576 Fragments of C code in this comment are surrounded by ~@~ characters.
577 There can also be \LaTeX\ maths in here, in ~%$~...\relax ~$%~.
578
579 Files end, as a result of long tradition, with a comment
580
581 : /*----- That's all, folks -------------------------------------------------*/
582
583 The closing ~#endif~ of a header file comes after this final comment.
584
585
586 * COMMENT Emacs cruft
587
588 #+LATEX_CLASS: strayman
589
590 ## LocalWords: CLOS ish destructure destructured accessor specializers
591 ## LocalWords: accessors DSLs gensym gensyms bletcherous Stroustrup
592 ## LocalWords: Stroustrup's signedness
593
594 ## Local variables:
595 ## mode: org
596 ## End: