# -*-org-*-
#+TITLE: ~runlisp~ -- run scripts written in Common Lisp
#+AUTHOR: Mark Wooding
#+LaTeX_CLASS: strayman
#+LaTeX_HEADER: \usepackage{tikz, gnuplot-lua-tikz}

~runlisp~ is a small C program intended to be run from a script ~#!~
line.  It selects and invokes a Common Lisp implementation, so as to run
the script.  In this sense, ~runlisp~ is a partial replacement for
~cl-launch~.

Currently, the following Lisp implementations are supported:

  + Armed Bear Common Lisp (~abcl~),
  + Clozure Common Lisp (~ccl~),
  + GNU CLisp (~clisp~),
  + Carnegie--Mellon University Common Lisp (~cmucl~),
  + Embeddable Common Lisp (~ecl~), and
  + Steel Bank Common Lisp (~sbcl~).

I'm happy to take patches to support additional free Lisp
implementations.  I'm not interested in supporting non-free Lisp
systems.


* Writing scripts in Lisp

** Basic use

The obvious way to use ~runlisp~ is in a shebang (~#!~) line at the top
of a script.  For example:

: #! /usr/local/bin/runlisp
: (format t "Hello from Lisp!~%")

Script interpreters must be named with absolute pathnames in shebang
lines; if your ~runlisp~ is installed somewhere other than
~/usr/local/bin/~ then you'll need to write something different.
Alternatively, a common hack involves abusing the ~env~ program as a
script interpreter, because it will do a path search for the program
it's supposed to run:

: #! /usr/bin/env runlisp
: (format t "Hello from Lisp!~%")

** Specific Lisps

Lisp implementations are not created equal -- for good reason.  If your
script depends on the features of some particular Lisp implementation,
then you can tell ~runlisp~ that it must use that implementation to run
your script using the ~-L~ option; for example:

: #! /usr/local/bin/runlisp -Lsbcl
: (format t "Hello from Steel Bank Common Lisp!~%")

If your script supports several Lisps, but not all, then list them all
in the ~-L~ option, separated by commas:

: #! /usr/local/bin/runlisp -Lsbcl,ccl
: (format t #.(concatenate 'string
:                          "Hello from "
:                          #+sbcl "Steel Bank"
:                          #+ccl "Clozure"
:                          #-(or sbcl ccl) "an unexpected"
:                          " Common Lisp!~%"))

** Embedded options

If your script requires features of particular Lisp implementations
/and/ you don't want to hardcode an absolute path to ~runlisp~, then you
have a problem.  Most Unix-like operating systems will parse a shebang
line into the initial ~#!~, the pathname to the interpreter program,
and a /single/ optional argument.  Any further spaces don't separate
further arguments; they're simply included in that one argument, all the
way up to the end of the line.  So

: #! /usr/bin/env runlisp -Lsbcl
: (format t "Hello from Steel Bank Common Lisp!~%")

won't work: it'll just try to run a program named ~runlisp -Lsbcl~, with
a space in the middle of its name, and that's quite unlikely to exist.

To help with this situation, ~runlisp~ reads /embedded options/ from
your script.  Specifically, if the script's second line contains the
token ~@RUNLISP:~ then ~runlisp~ will parse additional options from this
line.  So the following will work properly.

: #! /usr/bin/env runlisp
: ;;; @RUNLISP: -Lsbcl
: (format t "Hello from Steel Bank Common Lisp!~%")

Embedded options are split at spaces properly.  Spaces can be escaped or
quoted in (an approximation to) the usual shell manner, should that be
necessary.  See the manpage for the gory details.

** Common environment

~runlisp~ puts some effort into making sure that Lisp scripts get the
same view of the world regardless of which implementation is running
them.

For example:

  + The ~asdf~ and ~uiop~ systems are loaded and ready for use.

  + The script's command-line arguments are available in
    ~uiop:*command-line-arguments*~.  Its name can be found by calling
    ~(uiop:argv0)~ -- though it's probably also in ~*load-pathname*~.

  + The prevailing Unix standard input, output, and error files are
    available through the Lisp ~*standard-input*~, ~*standard-output*~,
    and ~*error-output*~ streams, respectively.  (This is, alas, not a
    foregone conclusion.)

  + The keyword ~:runlisp-script~ is added to the ~*features*~ list.
    This means that your script can tell whether it's being run from the
    command line, and should therefore do its thing and then quit; or
    merely being loaded into a Lisp system, e.g., for debugging or
    development, and should sit still and not do anything until it's
    asked.

See the manual for the complete list of guarantees.


* Invoking Lisp implementations

** Basic use

A secondary use of ~runlisp~ is in build scripts for Lisp programs.  If
the entire project is just a Lisp library, then it's possibly acceptable
to just provide an ASDF system definition and expect users to type
~(asdf:load-system "mumble")~ to use it.  If it's a program, or there
are things other than Lisp which ASDF can't or shouldn't handle --
significant pieces in other languages, or a Lisp executable image to
make and install -- then it seems sensible to make the project's main
build system be something language-agnostic, say Unix ~make~, and
arrange for that to invoke ASDF at the appropriate time.

But how should that be arranged?  It's relatively easy for a project's
Lisp code to support multiple Lisp implementations; but each
implementation wants different runes for evaluating Lisp forms from the
command line, and some of them don't provide an ideal environment for
integrating into a build system.  So ~runlisp~ provides a simple common
command-line interface for evaluating Lisp forms.  For example:

: $ runlisp -e '(format t "~A~%" (+ 1 2))'
: 3

If your build script needs to get information out of Lisp, then wrapping
~format~, or even ~prin1~, around forms is annoying; so ~runlisp~ has a
~-p~ option which prints the values of the forms it evaluates.

: $ runlisp -p '(+ 1 2)'
: 3

If a form produces multiple values, then ~-p~ will print all of them
separated by spaces, on a single line:

: $ runlisp -p '(floor 5 2)'
: 2 1

In addition to evaluating forms with ~-e~, and printing their values
with ~-p~, you can also load a file of Lisp code using ~-l~.

When ~runlisp~ is acting on ~-e~, ~-p~, and/or ~-l~ options, it's said
to be running in /eval/ mode, rather than its usual /script/ mode.  In
eval mode, it /doesn't/ set ~:runlisp-script~ in ~*features*~.

You can still insist that ~runlisp~ use a particular Lisp
implementation, or one of a subset of implementations, using the ~-L~
option mentioned above.

: $ runlisp -Lsbcl -p "(lisp-implementation-type)"
: "SBCL"

** Command-line processing

When scripting a Lisp -- as opposed to running a Lisp script -- it's not
necessarily the case that your script knows in advance exactly what it
needs to ask Lisp to do.  For example, it might need to tell Lisp to
install a program in a particular directory, determined by Autoconf.
While it's certainly /possible/ to quote such data and splice them into
Lisp forms, it's more convenient to pass them in separately.  So
~runlisp~ ensures that the command-line arguments are available to Lisp
forms via ~uiop:*command-line-arguments*~, as they are to a Lisp script.

: $ runlisp -p "uiop:*command-line-arguments*" one two three
: ("one" "two" "three")

When running Lisp forms like this, ~(uiop:argv0)~ isn't very
meaningful.  (Currently, it reveals the name of the script which
~runlisp~ uses to implement this feature.)


* Configuring =runlisp=

** Where =runlisp= looks for configuration

You can influence which Lisp implementations ~runlisp~ chooses by
writing a configuration file, and/or setting an environment variable.

~runlisp~ looks for configuration in ~~/.runlisprc~, and in
~~/.config/runlisprc~.  You could put configuration in both, but that
doesn't seem like a great idea.  A configuration file just contains
blank lines, comments, and command-line options, just as you'd write
them to the shell.  Simple quoting and escaping is provided: see the
manual page for the full details.  Each line is processed independently,
so it doesn't work to write an option on one line and then its argument
on the next.

The environment variable ~RUNLISP_OPTIONS~ is processed /after/ reading
the configuration file(s), if any.  Again, it should contain
command-line options, as you'd write them to the shell.

** Deciding which Lisp implementation to use

The most useful option to use here is ~-P~, which builds up a
/preference list/, in order.  The argument to ~-P~ is a comma-separated
list of Lisp implementation names, just like you'd give to ~-L~.

If you provide multiple ~-P~ options (e.g., on different lines of your
configuration file, or separately in the configuration file and
environment variable), then the lists are concatenated.  Since the
environment variable is processed after the configuration file, this
means that implementations named in ~RUNLISP_OPTIONS~ are preferred only
after those named in the configuration file.

When deciding which Lisp implementation to use, ~runlisp~ works as
follows.  It builds a list of /acceptable/ Lisp implementations from the
~-L~ options, and a list of /preferred/ Lisp implementations from the
~-P~ options.  If there aren't any ~-L~ options, then it assumes that
/all/ Lisp implementations are acceptable; but if there are no ~-P~
options then it assumes that /no/ Lisp implementations are preferred.
It then works through the preferred list in order: if it finds an
implementation which is installed and acceptable, then it uses that one.
If none of those is suitable, then it works through the acceptable
implementations that it hasn't tried yet, in order, and if it finds one
of those that's installed, then it runs that one.  Otherwise it reports
an error and gives up.

** Clearing the preferred list

Since the environment variable is processed after the configuration
files, it can only append more Lisp implementations to the end of the
preferred list, which may well not be so helpful.  There's an additional
option ~-C~, which completely clears the preferred list.  The idea is
that you can write ~-C~ at the start of your ~RUNLISP_OPTIONS~
environment variable to temporarily override your usual configuration
for some special effect.


* What's wrong with =cl-launch=?

The short version is that ~cl-launch~ is slow and inconvenient.
~cl-launch~ is a big, complicated Common Lisp/Bourne shell polyglot
which tries to do everything but doesn't quite succeed.

** It's slow.

I took a trivial Lisp script:

: (format t "Hello from ~A!~%~
:            Script = `~A'~%~
:            Arguments = (~{`~A'~^, ~})~%"
:         (lisp-implementation-type)
:         (uiop:argv0)
:         uiop:*command-line-arguments*)

I timed how long it took to run on all of ~runlisp~'s supported Lisp
implementations, and compared them to how long ~cl-launch~ took: the
results are shown in table [[tab:runlisp-vanilla]].  ~runlisp~ is /at least/
two and a half times faster at running this script than ~cl-launch~ on all
implementations except Clozure CL[fn:slow-ccl], and approaching four and
a half times faster on SBCL.

#+CAPTION: ~cl-launch~ vs ~runlisp~ (with vanilla images)
#+NAME: tab:runlisp-vanilla
#+ATTR_LATEX: :float t :placement [tbp]
|------------------+-------------------+-----------------+----------------------|
| *Implementation* | *~cl-launch~ (s)* | *~runlisp~ (s)* | *~runlisp~ (factor)* |
|------------------+-------------------+-----------------+----------------------|
| ABCL             |            7.3036 |          2.6027 |                2.806 |
| Clozure CL       |            1.2769 |          0.9678 |                1.319 |
| GNU CLisp        |            1.2498 |          0.2659 |                4.700 |
| CMU CL           |            0.9665 |          0.3065 |                3.153 |
| ECL              |            0.8025 |          0.3173 |                2.529 |
| SBCL             |            0.3266 |          0.0739 |                4.419 |
|------------------+-------------------+-----------------+----------------------|
#+TBLFM: $4=$2/$3;%.3f

But this is using the `vanilla' Lisp images installed with the
implementations.  ~runlisp~ by default builds custom images for most
Lisp implementations, which improves startup performance significantly;
see table [[tab:runlisp-custom]].  (I don't currently know how to build a
useful custom image for ABCL.  ~runlisp~ does build a custom image for
ECL, but it doesn't help significantly.)  These results are summarized
in figure [[fig:lisp-graph]].

#+CAPTION: ~cl-launch~ vs ~runlisp~ (with custom images)
#+NAME: tab:runlisp-custom
#+ATTR_LATEX: :float t :placement [tbp]
|------------------+-------------------+-----------------+----------------------|
| *Implementation* | *~cl-launch~ (s)* | *~runlisp~ (s)* | *~runlisp~ (factor)* |
|------------------+-------------------+-----------------+----------------------|
| ABCL             |            7.3036 |          2.5873 |                2.823 |
| Clozure CL       |            1.2769 |          0.0088 |              145.102 |
| GNU CLisp        |            1.2498 |          0.0146 |               85.603 |
| CMU CL           |            0.9665 |          0.0063 |              153.413 |
| ECL              |            0.8025 |          0.3185 |                2.520 |
| SBCL             |            0.3266 |          0.0077 |               42.416 |
|------------------+-------------------+-----------------+----------------------|
#+TBLFM: $4=$2/$3;%.3f

#+CAPTION: Comparison of ~runlisp~ and ~cl-launch~ times
#+NAME: fig:lisp-graph
#+ATTR_LATEX: :float t :placement [tbp]
[[file:doc/lisp-graph.tikz]]

With some Lisp implementations at least, and unlike ~cl-launch~,
~runlisp~'s startup time is usefully comparable to that of other popular
scripting language implementations.  I wrote similarly trivial scripts in a number
of other languages, and timed them; the results are tabulated in table
[[tab:runlisp-interp]] and graphed in figure [[fig:interp-graph]].

#+CAPTION: ~runlisp~ vs other interpreters
#+NAME: tab:runlisp-interp
#+ATTR_LATEX: :float t :placement [tbp]
|------------------------------+-------------|
| *Implementation*             | *Time (ms)* |
|------------------------------+-------------|
| Clozure CL                   |         8.8 |
| GNU CLisp                    |        14.6 |
| CMU CL                       |         6.3 |
| SBCL                         |         7.7 |
|------------------------------+-------------|
| Perl                         |         1.2 |
| Python                       |        10.3 |
|------------------------------+-------------|
| Debian Almquist shell (dash) |         1.4 |
| GNU Bash                     |         2.0 |
| Z Shell                      |         4.1 |
|------------------------------+-------------|
| Tiny C (compile & run)       |         1.2 |
| GCC (precompiled)            |         0.5 |
|------------------------------+-------------|

#+CAPTION: Comparison of ~runlisp~ and other script interpreters
#+NAME: fig:interp-graph
#+ATTR_LATEX: :float t :placement [tbp]
[[file:doc/interp-graph.tikz]]

(All the timings in this section were performed on the same 2020 Dell
XPS13 laptop running Debian `buster'.  The tools used to make the
measurements are included in the source distribution, in the ~bench/~
subdirectory.)

[fn:slow-ccl] I don't know why Clozure CL shows such a small difference
here.

** It's inconvenient

~cl-launch~ has this elaborate machinery which reads shell script
fragments from various places and sets variables like ~$LISPS~, but it
doesn't quite work.

Unlike other scripting languages such as Perl or Python, Common Lisp has
lots of implementations, and they all have various unique features (and
bugs) which a script might rely on (or need to avoid).  Also, a user
might have preferences about which Lisps to use.  ~cl-launch~'s approach
to this problem is a ~system_preferred_lisps~ shell function which can
be used in ~~/.cl-launchrc~ to select a Lisp system for a particular
`software system' (though this notion doesn't appear to be well-defined);
but all of this works by editing a single ~$LISPS~ shell variable.  By
contrast, ~runlisp~ has a ~-L~ option with which scripts can specify the
Lisp systems they support (in a preference order), and a ~-P~ option
with which users can express their own preferences (e.g., in the
environment or a configuration file): ~runlisp~ will never choose a Lisp
system which the script can't deal with, but it will respect the user's
relative preferences.

** It doesn't establish a (useful) common environment

A number of Lisp systems are annoyingly deficient in their handling of
scripts.

For example, when GNU CLisp's ~-x~ option is used, it rebinds
~*standard-input*~ to an internal string stream holding the expression
passed in on the command line, leaving the process's actual stdin nearly
impossible to access.

: $ date | cl-launch -l sbcl -i "(princ (read-line nil nil))" # expected
: Sun  9 Aug 14:39:10 BST 2020
: $ date | cl-launch -l clisp -i "(princ (read-line nil nil))" # bug!
: NIL

As another example, Armed Bear Common Lisp doesn't seem to believe in
the stderr stream: when it starts up, ~*error-output*~ is bound to the
standard output, just like ~*standard-output*~.  Also, ~cl-launch~
loading ASDF causes a huge number of ~style-warning~ messages to be
written to stdout, making ABCL pretty much useless for writing filter
scripts.

: $ cl-launch -l sbcl -i '(progn
:                           (format *standard-output* "output~%")
:                           (format *error-output* "error~%"))' \
:   > >(sed 's/^/stdout: /') 2> >(sed 's/^/stderr: /')
: stdout: output
: stderr: error
: $ cl-launch -l abcl -i '(progn
:                           (format *standard-output* "output~%")
:                           (format *error-output* "error~%"))' \
:   > >(sed 's/^/stdout: /') 2> >(sed 's/^/stderr: /')
: [1813 lines of compiler warnings tagged `stdout:']
: stdout: output
: stdout: error

~runlisp~ takes care of all of this, providing a basic but useful common
level of shell integration for all its supported Lisp implementations.
In particular:

  + It ensures that the standard Unix `stdin', `stdout', and `stderr'
    file descriptors are hooked up to the Lisp ~*standard-input*~,
    ~*standard-output*~, and ~*error-output*~ streams.

  + It ensures that starting a script doesn't write a deluge of
    diagnostic drivel.

The complete details are given in ~runlisp~'s manpage.

** Why might one prefer =cl-launch= anyway?

On the other hand, ~cl-launch~ is well established and full-featured.

~cl-launch~ compiles scripts before trying to run them, so they'll run
faster on Lisps which use an interpreter by default.  It has a caching
feature so running a script a second time doesn't need to recompile it.
If your scripts are compute-intensive and benefit from ahead-of-time
compilation then maybe ~cl-launch~ is preferable.

~cl-launch~ supports more Lisp systems.  I only have six installed on my
development machine at the moment, so those are the ones that ~runlisp~
supports.  If you want your scripts to be able to run on other Lisps,
then ~cl-launch~ is the way to do that.  Of course, I welcome patches to
help ~runlisp~ support other free Lisp implementations.  ~cl-launch~
also supports proprietary Lisps: I have very little interest in these,
so if you want to run scripts using Allegro or LispWorks then
~cl-launch~ is your only choice.