Notes on Lisp style

* Language subset and extensions

None of ANSI Common Lisp is off-limits.

I think my Lisp style is rather more imperative in flavour than that of
most modern Lisp programmers.  It's probably closer to historical Lisp
practice in that regard, even though I wasn't writing Lisp back then.

I make extensive use of CLOS, and macros.  On a couple of occasions I've
made macros which use CLOS generic function dispatch to compute their
expansions.  The parser language is probably the best example of this in
the codebase.
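
As a minimal sketch of the idea (the names ~area~ and ~expand-area~ are
invented for illustration and don't come from the codebase), a macro can
hand its arguments to a generic function and let ~eql~-specialized
methods compute the expansion.  The ~eval-when~ is there so that the
methods are available when the macro is expanded during file
compilation.

: (eval-when (:compile-toplevel :load-toplevel :execute)
:   (defgeneric expand-area (kind args)
:     (:documentation "Return a form computing the area of a KIND shape."))
:
:   ;; Each shape contributes a method; adding a shape needs no change
:   ;; to the macro itself.
:   (defmethod expand-area ((kind (eql :square)) args)
:     (destructuring-bind (side) args
:       (let ((s (gensym "SIDE")))
:         `(let ((,s ,side)) (* ,s ,s)))))
:
:   (defmethod expand-area ((kind (eql :rectangle)) args)
:     (destructuring-bind (width height) args
:       `(* ,width ,height))))
:
: (defmacro area (kind &rest args)
:   "Expand into a form computing the area of a KIND shape.
:
:    KIND is a literal keyword, used to select an expansion method."
:   (expand-area kind args))
:
: ;; (area :rectangle 2 5) expands to (* 2 5).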

I like hairy ~format~ strings.
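
For a taste of what `hairy' can mean, here is a well-known idiom for
writing a list in English (it appears in Seibel's /Practical Common
Lisp/).  The function name ~english-list~ is made up for this example,
which is only an illustration and isn't taken from the codebase.

: (defun english-list (items)
:   "Return a string listing ITEMS in English, with commas and `and'."
:   ;; The outer ~{...~} consumes ITEMS; ~#[...~] chooses a clause by the
:   ;; number of arguments remaining: none, one, two, or (after ~:;) more.
:   (format nil
:           "~{~#[~;~A~;~A and ~A~:;~@{~A~#[~;, and ~:;, ~]~}~]~}"
:           items))
:
: ;; (english-list '(1 2 3)) returns "1, 2, and 3".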

I've avoided hairy ~loop~ for the most part, not because I dislike it
strongly but because others do and I don't find that it wins big enough
for the fight to be worthwhile.

I only use ~&aux~ lambda-list parameters in ~defstruct~ BOA
constructors, for special effects.

I use ~car~, not ~first~, and ~cdr~, not ~rest~.  Similarly, I use
~cadr~, not ~second~, and I'm not afraid to use ~cddr~ or ~cadar~.

Similarly, I've not used ~elt~, preferring to know what kind of sequence
I'm dealing with, or to use the built-in sequence functions.

I'm happy to use ~1+~, and I like the brevity of ~1-~ enough to use it
despite its terrible name.

There are no reader syntax extensions in the code.  This is because I
couldn't think of any way they'd be especially helpful, and not because
I'm in any way opposed to them.

The main translator, in the ~SOD~ package, tries to assume very little
beyond ANSI Common Lisp and what's included in just about every serious
implementation: specifically, MOP introspection, and Gray streams.
There's intentionally no MOP intercession.

The frontend additionally makes use of ~cl-launch~, but the dependency
is actually quite weak, and it could be replaced with a different, maybe
implementation-specific, mechanism fairly easily.  I'm keen to take
patches which improve frontend portability.

I'm more tolerant of extensions and external dependencies in the test
suite, which makes additional use of ~xlunit~.  Running the test suite
isn't essential to getting the translator built, so this isn't as much
of a problem.


* Layout

I pretty much let Emacs indent my code for me, based on information
collected by SLIME.  Some exceptions:

  + DSLs (e.g., the parser language) have their own space of macros
    which Emacs doesn't understand and for the most part I haven't
    bothered to teach it.

  + Emacs sometimes does a bad job with hairy ~loop~ and requires manual
    fixing.  Since I don't use hairy ~loop~ much, this isn't a major
    problem.

  + Emacs indents lambda lists really badly.  I often prefer putting the
    entire lambda list on its own line to splitting it.  If I have to
    split a simple lambda list, without lambda-list keywords, I just
    align the start of each subsequent line with the start of the first
    argument.  I break hairy lambda lists before lambda-list keywords,
    and the start of a subsequent line aligns with the first argument
    name following the lambda-list keyword which begins the group, so
    that the lambda-list keyword stands out.

    : (defun many-arguments (first second third
    :                        fourth fifth)
    :   ...)

    : (defun hairy-arguments (first second third
    :                         &optional fourth fifth
    :                                   sixth
    :                         &rest others)
    :   ...)

    I don't know what I'd do if I had a hairy lambda list with so many
    mandatory positional arguments that I had to split them.  So far,
    this situation hasn't come up.

Lines are 77 characters at most, except for strange special effects.
Don't ask.  This is not negotiable, though.  Don't try to tell me that
your monitor is very wide so you can read longer lines.  My monitor is
likely at least as wide.  On the other hand, most lines are easily short
enough to fit in my narrow columns, so the right-hand side of a wide
window would be mostly blank.  This seems wasteful to me, when I could
fill that space with more code.

Lisp code does have a tendency to march across to the right quite
rapidly given a chance.  I have a number of strategies for dealing with
this.

  + Break a long nested calculation into pieces, giving names to the
    intermediate results, in a ~let*~ form.

  + Hoist deeply nested complex computations out into ~flet~ or
    ~labels~, and then invoke them from inside whatever complicated
    conditional mess was needed to decide what to do.

  + Shrug my shoulders and let code dribble down the right-hand side for
    a bit.
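
The first two strategies might look something like the following
sketch.  The function ~summarize~ and its local helper ~describe-range~
are hypothetical, chosen only to show the shape: the formatting work is
hoisted into ~flet~, the intermediate results get names in ~let*~, and
the conditional at the bottom stays short.

: (defun summarize (values)
:   "Return a short string summarizing the list of numbers VALUES."
:   (flet ((describe-range (lo hi count)
:            ;; Hoisted out of the conditional below.
:            (format nil "~D value~:P between ~D and ~D" count lo hi)))
:     (let* ((count (length values))
:            (sorted (sort (copy-list values) #'<))
:            (lo (car sorted))
:            (hi (car (last sorted))))
:       (if (zerop count)
:           "no values"
:           (describe-range lo hi count)))))
:
: ;; (summarize '(3 1 2)) returns "3 values between 1 and 3".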


* Packages and exporting

A package collects symbols which are given meanings in one or more
source files.  If a package's code is all in one file, then the package
definition can be put in that file too; otherwise I put it in its own
file.

I don't put ~:export~ in package definitions.  Instead, I scatter calls
to the ~export~ function throughout the code, right next to where the
relevant symbol is defined.  This has three important advantages.

  + You can tell, when you're reading the code which defines ~foo~,
    whether ~foo~ is exported and therefore a defined part of the
    package interface.

  + When you know that you're writing a thing which will form part of
    the package interface, you don't have to go off and edit some other
    file to export it.

  + A master list of exported symbols becomes a merge hazard: if two
    different branches add symbols to nearby pieces of the master list
    then you get a merge conflict for no especially good reason.

There's an apparent disadvantage: there's no immediately visible master
list of exported symbols.  But that's not a big problem:

: (loop for s being the external-symbols of pkg collect s)

See ~doc/list-symbols.lisp~ for more sophisticated reporting.  (In
particular, this identifies what kind of thing(s) each external symbol
names.)
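
A small sketch of the convention, using a hypothetical package
~style-demo~ and invented function names: the ~export~ call sits
directly above the definition it publishes, and the list of exports can
still be recovered from the package afterwards.

: (cl:defpackage #:style-demo
:   (:use #:common-lisp))
: (cl:in-package #:style-demo)
:
: (export 'frobnicate)
: (defun frobnicate (x)
:   "Return X, frobnicated."
:   (1+ x))
:
: (defun helper (x)
:   "Return X, unfrobnicated.  Internal: there is no `export' above."
:   (1- x))
:
: (defun external-symbol-names (pkg)
:   "Return a sorted list of the names of PKG's external symbols."
:   (sort (loop for s being the external-symbols of pkg
:               collect (symbol-name s))
:         #'string<))
:
: ;; (external-symbol-names '#:style-demo) returns ("FROBNICATE").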


* Comments and file structuring

A file starts with a big ~;;;~ comment bearing the Emacs ~-*-lisp-*-~
marker, a quick description, and copyright and licensing boilerplate.  I
don't use four-semicolon comments, and I only use ~#|~ ... ~|#~ for
special effects.

Then there's package stuff.  There may be a ~cl:defpackage~ form (with
explicit package qualifier) if the relevant package doesn't have its own
package definition file.

Then there's ~cl:in-package~.  As with ~defpackage~, I use a gensym
(that is, an uninterned ~#:name~ symbol) to name the package.  I can't
think offhand of a good reason to have a file with sections `in' more
than one package.  So, the ~in-package~ form goes at the top of the
file, before the first section header.  If sections were going to end up
in separate packages, I think I'd put a ~cl:in-package~ at the top of
each section in case I wanted to reorder them.

The rest of the file consists of Lisp code.  I don't use page boundaries
~^L~ to split files up.  Instead, I use big banner comments for this:

: ;;;--------------------------------------------------------------------------
: ;;; Section title.

Sections don't usually have internal comments, but if they did they'd
also be ~;;;~ comments.

Almost all definitions get documentation strings.  I've tried to be
consistent about formatting.

  + Docstring lines are 77 characters or fewer.

  + The first line gives a summary of what the thing does.  The summary,
    together with the SLIME-generated synopsis, is likely enough to
    remind you what the thing does.

  + The rest of the lines are indented by three spaces, and explain
    carefully what the thing does and what all the parameters mean.
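
A docstring laid out this way might look like the following.  The
function ~clamp~ is a made-up example and isn't part of the codebase.

: (defun clamp (value lo hi)
:   "Return VALUE, limited to the closed interval [LO, HI].
:
:    If VALUE is less than LO, return LO; if it is greater than HI,
:    return HI; otherwise return VALUE unchanged.  The bounds must
:    satisfy LO <= HI."
:   (max lo (min value hi)))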

Smallish functions and macros don't usually need any further
commentary.  Big functions often need to be split into bitesize pieces
with their own internal ~;;~ comments.  The idea is that these comments
should explain the code's overall strategy to the reader, and help them
figure out how a piece fits into that strategy.

Winged, single ~;~ comments are very rare.

Files end, as a result of long tradition, with a comment

: ;;;----- That's all, folks --------------------------------------------------
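
Put together, a complete small file might be laid out like this sketch.
The package ~example~, the function ~double~ and the header text are
placeholders, and this file has no separate package definition file, so
the ~cl:defpackage~ form lives here.

: ;;; -*-lisp-*-
: ;;;
: ;;; A placeholder description of what this file is for
: ;;;
: ;;; (Copyright and licensing boilerplate goes here.)
:
: (cl:defpackage #:example
:   (:use #:common-lisp))
:
: (cl:in-package #:example)
:
: ;;;--------------------------------------------------------------------------
: ;;; Utilities.
:
: (export 'double)
: (defun double (x)
:   "Return twice X."
:   (* 2 x))
:
: ;;;----- That's all, folks --------------------------------------------------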


* Macro style

I don't mind complicated macros if they're doing something worthwhile.
They need to have good documentation strings, though.

That said, where possible I've tried to factor macros into an actual
macro providing the syntactic sugar, and a function which receives the
parameters and $\eta$-expanded forms, and does the actual work.
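
A minimal sketch of this factoring, with invented names (~*depth*~,
~call-with-nesting~ and ~with-nesting~ aren't from the codebase): the
macro only wraps its body in a ~lambda~, and the function does the work.

: (defvar *depth* 0
:   "The current nesting depth, as maintained by `with-nesting'.")
:
: (defun call-with-nesting (thunk amount)
:   "Call THUNK with `*depth*' increased by AMOUNT; return its values."
:   (let ((*depth* (+ *depth* amount)))
:     (funcall thunk)))
:
: (defmacro with-nesting ((&optional (amount 1)) &body body)
:   "Evaluate BODY with `*depth*' increased by AMOUNT.
:
:    AMOUNT is evaluated once, before BODY.  Return BODY's values."
:   `(call-with-nesting (lambda () ,@body) ,amount))

One benefit of the split is that the function can be redefined or traced
without recompiling every use of the macro.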

It's extremely bad taste for a macro to evaluate its evaluable
parameters in any order other than strictly left to right, or to
evaluate them more than once.
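
To illustrate with a hypothetical macro ~clamping~ (not part of the
codebase): binding each argument to a fresh temporary, in the order the
arguments are written, gives left-to-right, exactly-once evaluation even
when the expansion uses the values in a different order.

: ;; A naive expansion, `(max ,lo (min ,value ,hi)), would evaluate LO
: ;; before VALUE.  Binding temporaries in argument order avoids this.
: (defmacro clamping (value lo hi)
:   "Return VALUE, limited to the closed interval [LO, HI].
:
:    The arguments are evaluated once each, from left to right."
:   (let ((v (gensym "VALUE"))
:         (l (gensym "LO"))
:         (h (gensym "HI")))
:     `(let ((,v ,value) (,l ,lo) (,h ,hi))
:        (max ,l (min ,v ,h)))))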


* Data structures

I've tended to be happy with plain lists for homogeneous-ish
collections.  Strongly heterogeneous collections (other than input
syntax, destructured using ~defmacro~ or ~destructuring-bind~) I've
tended to make a proper data type for.

My first instinct when defining a new structure is to use ~defclass~.
While it's annoyingly verbose, it has the immense benefit over
~defstruct~ that it's safe to redefine CLOS classes in a running image
without the world breaking, and I usually find it necessary to add or
change slots while I'm working on new code.  Once a piece of code has
settled down and I have a good feel for what my structure is actually
doing, I might switch the ~defclass~ for a ~defstruct~.  Several
questions influence my decision.

  + Do slot accesses need to be really fast?  My usual Lisp
    implementations aggressively optimize ~defstruct~ accessor
    functions.

  + Have I subclassed my class?  While I can move over a
    single-inheritance tree using ~:include~, it seems wrong to do this
    most of the time.  Also, I'd be precluding subclasses from multiple
    inheritance, and I'd either have to prohibit subclassing by
    extensions or have to commit to ~defstruct~ in the documentation.
    In general, I'm much happier committing to ~defclass~.

  + Are there methods specialized on my class?  Again, structure classes
    make fine method specializers, but it doesn't seem right.

Apart from being hard to redefine, ~defstruct~ does a pretty good job of
making a new structure type.  I tend to tidy up a few rough edges.

  + The default predicate always has ~-p~ appended.  If the class name
    is a single word, then I'll explicitly name the predicate with a
    simple ~p~ suffix.  For example, ~ship~ would have the predicate
    ~shipp~, rather than ~ship-p~.

  + If there are slots I can't default then I'll usually provide a BOA
    constructor which sets them from required parameters; other slots
    I'll set from optional or keyword parameters according to my taste
    and judgement.

  + Slots mustn't be given names which are external in any package.
    Unfortunately, slot names are used in constructing accessor names,
    and sometimes the right accessor name involves a prohibited symbol.
    I've mostly addressed this by naming the slot ~%foo~, and then
    providing inline reader and writer functions.  (CLOS class
    definitions don't have this problem because you get to set the
    accessor function names independently of the slot names.)

  + BOA constructors are strange.  You can set the initial slots based
    on an arbitrary computation on the provided parameters, but you have
    to roll up your sleeves and mess with ~&aux~ parameters to pull it
    off.
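
The following sketch pulls these points together for the ~ship~ example.
The slots and the ~deck-area~ computation are invented for illustration.
The predicate is named ~shipp~; the BOA constructor takes the slots
which can't be defaulted as required parameters and computes another
with ~&aux~; and because ~length~ is external in the ~COMMON-LISP~
package, that slot is called ~%length~ and gets hand-written inline
accessors.

: (defstruct (ship
:             (:predicate shipp)
:             (:constructor make-ship
:                 (name %length beam
:                  &aux (deck-area (* %length beam)))))
:   "A ship, with a (very roughly) precomputed deck area."
:   (name "" :type string)
:   (%length 0 :type real)
:   (beam 0 :type real)
:   (deck-area 0 :type real))
:
: (declaim (inline ship-length (setf ship-length)))
: (defun ship-length (ship)
:   "Return the length of SHIP."
:   (ship-%length ship))
: (defun (setf ship-length) (value ship)
:   "Set the length of SHIP to VALUE, and return VALUE."
:   (setf (ship-%length ship) value))
:
: ;; (ship-deck-area (make-ship "Beagle" 27 7)) returns 189.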


* Naming

I'm a traditionalist in some ways, and one of the reasons I like Lisp is
the richness of its history and tradition.

In other languages, I tend to use single- or two-letter names for
variables and structure slots; not so much in Lisp.  Other languages
express more using punctuation, so the names stand out easily; I find
that short names can be lost more easily in Lisp.

I've also tended to go for fairly prosaic names, taking my inspiration
from the CLOS MOP.  While I mourn the loss of whimsical names like
~haulong~ and ~haipart~, I've tried to avoid inventing more of them.

There's a convention, which I think comes from ML, of using ~_~ where a
binding occurrence of a variable name is expected, to signify that
the corresponding value is to be discarded.  Common Lisp, alas, doesn't
have such a convention.  Instead, there's a sequence of silly names used
with the same intention, and the bindings are then explicitly ignored
with a declaration.  The names begin ~hunoz~, ~hukairz~, and (I think)
~huaskt~.
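
For example, a function which wants only the middle one of three values
might be written as follows (~middle-of-three~ is a made-up function,
shown only to illustrate the naming).

: (defun middle-of-three (list)
:   "Return the second element of the three-element LIST."
:   (destructuring-bind (hunoz middle hukairz) list
:     ;; The silly names mark values which are deliberately discarded.
:     (declare (ignore hunoz hukairz))
:     middle))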


* Declarations

The code is light on declarations, other than ~ignore~ and similar used
to muffle warnings.  The macros try to do sensible things with
declarations, and I think they succeed fairly well, but there might be
bugs and rough edges.  I know that some are just broken because, for
actual correctness, declarations provided by the caller need to be split
up into a number of different parts of the expansion, which in turn
requires figuring out what the declarations mean and which bindings
they're referring to.  That's not completely impossible, assuming that
there aren't implementation-specific declarations with crazy syntax
mixed in there, but it's more work than seems worthwhile.


* COMMENT Emacs cruft

#+LATEX_CLASS: strayman

## LocalWords:  CLOS ish destructure destructured accessor specializers
## LocalWords:  accessors DSLs gensym

## Local variables:
## mode: org
## End: