Joe MarshallCLRHack: Multiple return values

· 10 hours ago

Multiple Return Value Implementation in CLRHack

The CLRHack compiler implements Multiple Return Values (MRV) by extending the single-value limitation of the .NET Common Intermediate Language (CIL) stack through a thread-local side-channel. This allows Lisp forms to communicate multiple values (up to 64) across function boundaries.

1. The Side-Channel Storage

Because a CIL method can only return a single object on the stack, CLRHack utilizes a static class [LispBase]Lisp.Values. This class contains [ThreadStatic] fields that act as a secondary communication channel:

  • Primary Value: Always resides on the CIL evaluation stack.
  • ReturnCount: An int32 field indicating the total number of values returned (including the primary one).
  • Value1 through Value63: Object fields that store the second through sixty-fourth return values.

2. Producing Multiple Values (The Staging Logic)

To prevent corruption during evaluation, the values form uses a Stage-and-Commit strategy. This is necessary because the side-channel is global to the thread; if a sub-expression inside a values form itself returns multiple values, it would overwrite the global fields before the outer values form is finished.

The compilation process for (values form1 form2 ... formN) follows these steps:

  1. Evaluation: Each form is evaluated in order.
  2. Local Staging: The result of form1 is kept on the stack. The results of form2 through formN are immediately stored into method-local variables (temporaries). This ensures that if form3 calls a function that returns multiple values, the result of form2 is safely tucked away in a local variable and cannot be overwritten.
  3. Commitment: After all forms are evaluated, the compiler generates code to move the values from the local temporaries into the global Value1...ValueN fields.
  4. Finalization: The ReturnCount is set to N.

3. Preservation across Control Flow

Certain Lisp constructs must evaluate sub-forms without allowing those sub-forms to interfere with the return values of the primary form. This is handled by a Save-Restore pattern.

Multiple-Value-Prog1

The multiple-value-prog1 form evaluates its first form, then saves the entire side-channel state (the primary value, the ReturnCount, and all ValueN fields) into local variables. It then evaluates the subsequent forms. After they finish, it restores the side-channel state from its locals, ensuring the values of the first form are what the caller receives.

Unwind-Protect

In unwind-protect, the protected form is evaluated and its primary result is stored in a local variable. Crucially, the finally block (cleanup) must not destroy the side-channel state produced by the protected form. The compiler generates code at the start of the finally block to save ReturnCount and Value1...63 into locals. Once the cleanup forms complete, the state is restored from these locals before the method returns.

4. Nested Multiple Values (The Re-entrancy Problem)

The fundamental problem with a global side-channel is re-entrancy. If the compiler were to store form2 directly into the global Value1 field, and then form3 involved a function call like (some-func), that function might execute its own (values ...) logic. This would overwrite the global Value1 that was just set for the outer form.

By enforcing the use of method-local temporaries during the production of values, CLRHack ensures that the global side-channel is only updated at the last possible moment ("atomically" relative to the Lisp expression), effectively shielding the return values from being corrupted by nested evaluations.

Tim BradshawMeasuring slot access cost in Common Lisp

· 27 hours ago

I’ve been interested in how slow CLOS slot access is in Common Lisp. Here’s how I measured it.

I wanted to compare the cost of access to fields of various objects in Common Lisp. In particular I wanted to get a feel for the difference between a slot in a class defined with defclass, so an instance of a subclass of standard-object, and a field in a class defined with defstruct, so an instance of a subclass of structure-object.

A naïve model of the access cost

I measured forms like

(let ((s 0))
  (declare (type fixnum s))
  (dotimes (i n s)
    (incf s (the fixnum (slot-value a 'a)))
    (incf s (the fixnum (slot-value a 'b)))
    ...))

For \(N\) iterations you might think the time \(T\) should be

\[T = N(c_l + n(c_i + c_s)) + c_c\]

where \(c_l\) is the per-step loop overhead, \(n\) is the number of slots, \(c_i\) is the time to increment a variable, \(c_s\) is the slot read cost (the thing we want), and \(c_c\) is the overhead of calling the function to do all this. \(N\gtrapprox 10^9\), so it’s safe to treat \(c_c\) as zero. This is linear in \(n\), and \(T\) is a thing we can measure, so we can differentiate and get an expression for \(c_s\) which is what we want.

\[c_s = \frac{1}{N}\frac{dT}{dn} - c_i\]

In fact, everything works in terms of the per-step time \(t\doteq T/N\) as \(N\) varies for different classes and numbers of slots to keep the runtimes reasonable, and then \(c_s = dt/dn - c_i\).

A better model

Well, this turns out to be wrong. In particular if you estimate \(c_i\) (see below on how I did this) and use it in the above expression you will end up calculating values of \(c_s\) for structures which are either absurdly tiny (\(\sim 10^{–11}\)s for a machine with a cycle time \(\sim 10^{-9}\)s) or even negative. The reason is pretty obviously that the increment and the access are largely overlapped.

So in what follows I simply treat \(c_i\) as zero. This may overestimate \(c_s\) somewhat. But a result of that overestimation is that the factor by which slot access is slower than structure field access will be underestimated, which will make CLOS seem faster than it is, since if \(a \gt b \gt c \gt 0\) then \((a - c)/(b - c) \gt a/b\). That’s good, because what I’m trying to demonstrate is that it’s really slow, so an underestimation is safe.

The new model expression for \(c_s\) is then just \(c_s = dt/dn\).

What I did

I measured slot access time in the same way for a class with 10 slots, measuring 2, 4, 6, 8 and 10 slots, and did the same thing for a structure with 10 fields.

Because the access times and numbers of accesses per step vary widely I adjusted the number of iterations to keep the run-times sane: more than 10 seconds per test but ideally less than 60.

Each measurement was repeated 4 times.

I then fitted a linear function to the data for each class (least-squares fit), and used its gradient and the estimated variable-increment cost to estimate \(c_s\) for each type.

All the measurements were done on an M1 MacBook Air, using caffeinate to prevent it sleeping. I measured LispWorks 8.1.2 and SBCL 2.6.4. Total run times were somewhat over an hour for each implementation.

Results

SBCL slot access data and best fit

SBCL slot access data and best fit

LispWorks slot access data and best fit

LispWorks slot access data and best fit

SBCL structure field access data and best fit

SBCL structure field access data and best fit

LispWorks structure field access data and best fit

LispWorks structure field access data and best fit

From these you can see that the results are consistent between runs and the best fit is pretty good.

The per-slot cost is then the slope of the best fit curve, or perhaps slightly less.

Structure field access cost estimate

  • SBCL: \(c_s \approx 3.2\times 10^{-10}\)s.
  • LispWorks: \(c_s \approx 3.1\times 10^{-10}\)s.

Note that these are both almost certainly a single cycle up to rounding.

standard-instance subclass slot access cost estimate

  • SBCL: \(c_s \approx 1.2\times 10^{-8}\)s.
  • LispWorks: \(c_s \approx 1.0\times 10^{-8}\)s.

Ratios

The ratios between these two values for each implementation are then about 38 for SBCL and about 32 for LispWorks: this is how much slower CLOS slot access is than structure field access. In fact it is probably an underestimate of how much slower it is.

What is obvious1

CLOS slot access is really slow.

What is less obvious but almost certainly true2

This is not because multiple inheritance is inherently slow: it’s because the design of CLOS, especially if you want to take the AMOP MOP seriously, implies crappy performance.

Can this be fixed? Yes, I think so, with well-defined tradeoffs. Will it be? Up to implementors. So, probably not, sadly.


Notes

To get an estimate of the time to increment a variable, \(c_i\), first measure a large number of iterations of an empty loop and then a loop which increments a variable 100 times for each step. Both of the implementations I measured do not optimize empty loops away, intentionally I think. This estimate is now not used (see above), but if it’s not about a clock cycle (about \(3.3\times 10^{-10}\)s on M1) then probably something is wrong.


Code

This is the CL code I used.

;;;; Some slot-value benchmarks
;;;
;;; None of this code is general-purpose.
;;;

(in-package :cl-user)

(define-condition too-short (simple-error)
  ((seconds :initform 0 :initarg :seconds :reader too-short-seconds)))

(defmacro noting-too-short (&body forms)
  `(handler-bind ((too-short (lambda (e)
                              (format *debug-io* "~&Too short: ~,2Fs when minimum was ~Ds~%"
                                      (too-short-seconds e)
                                      *minimum-seconds*)
                              (continue e))))
     ,@forms))

(defvar *minimum-seconds* 10)         ;how long it must run for

(defmacro ticks (&body forms)
  `(let ((start (get-internal-real-time))
         (end (progn
                ,@forms
                (get-internal-real-time))))
     (let* ((elapsed-ticks (- end start))
            (elapsed-seconds (/ elapsed-ticks internal-time-units-per-second)))
       (when (< elapsed-seconds *minimum-seconds*)
         (cerror "just return ~D (~,2F seconds)"
                 (make-condition 
                  'too-short
                  :format-control "~D ticks (~,2F seconds) is not long enough"
                  :format-arguments (list elapsed-ticks (float elapsed-seconds))
                  :seconds (float elapsed-seconds))
                 elapsed-ticks (float elapsed-seconds)))
       elapsed-ticks)))

(defun seconds (ticks &optional (divider 1))
  (/ ticks internal-time-units-per-second divider))

(defun note (control &rest args)
  (format *debug-io* "~&[~?]~%" control args)
  (force-output *debug-io*))

(defmacro noting ((&rest notes) &body forms)
  ;; Single value only, but this is all we need
  `(progn
     (format *debug-io* "~&[~@{~A~^ ~}" ,@notes)
     (force-output *debug-io*)
     (let ((r (progn ,@forms)))
       (format *debug-io* " -> ~A]~%" r)
       (force-output *debug-io*)
       r)))

(defun inc-n (n incs)
  (declare (type fixnum n incs)
           (optimize speed (safety 0)))
  (case incs
    (0
     (dotimes (i n 0)))
    (100
     (let ((s 0))
       (declare (type fixnum s))
       (dotimes (i n s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s)
         (incf s))))
    (otherwise
     (error "what even is this"))))

(defun estimate-increment-time (&key
                                (exponent 11)
                                &aux
                                  (n (round (expt 10 exponent)))
                                  (n/100 (round (expt 10 (- exponent 2)))))
  (declare (type fixnum n n/100))
  (/ (- (seconds (noting (100 n/100) (ticks (inc-n n/100 100))) n/100)
        (seconds (noting (0 n) (ticks (inc-n n 0))) n))
     100))

(defclass a ()
  ((a :initform 0 :reader a-a)
   (b :initform 0 :reader a-b)
   (c :initform 0 :reader a-c)
   (d :initform 0 :reader a-d)
   (e :initform 0 :reader a-e)
   (f :initform 0 :reader a-f)
   (g :initform 0 :reader a-g)
   (h :initform 0 :reader a-h)
   (i :initform 0 :reader a-i)
   (j :initform 0 :reader a-j)))

(defstruct b
  (a 0)
  (b 0)
  (c 0)
  (d 0)
  (e 0)
  (f 0)
  (g 0)
  (h 0)
  (i 0)
  (j 0))

(defgeneric svn (o n count &key)
  (declare (optimize speed)))

(defmethod svn ((a a) n count &key (reader nil))
  (declare (type fixnum n count)
           (optimize speed (safety 0)))
  (if reader
      (case count
        (2
         (let ((s 0))
           (declare (type fixnum s))
           (dotimes (i n s)
             (incf s (the fixnum (a-a a)))
             (incf s (the fixnum (a-b a))))))
        (4
         (let ((s 0))
           (declare (type fixnum s))
           (dotimes (i n s)
             (incf s (the fixnum (a-a a)))
             (incf s (the fixnum (a-b a)))
             (incf s (the fixnum (a-c a)))
             (incf s (the fixnum (a-d a))))))
        (6
         (let ((s 0))
           (declare (type fixnum s))
           (dotimes (i n s)
             (incf s (the fixnum (a-a a)))
             (incf s (the fixnum (a-b a)))
             (incf s (the fixnum (a-c a)))
             (incf s (the fixnum (a-d a)))
             (incf s (the fixnum (a-e a)))
             (incf s (the fixnum (a-f a))))))
        (8
         (let ((s 0))
           (declare (type fixnum s))
           (dotimes (i n s)
             (incf s (the fixnum (a-a a)))
             (incf s (the fixnum (a-b a)))
             (incf s (the fixnum (a-c a)))
             (incf s (the fixnum (a-d a)))
             (incf s (the fixnum (a-e a)))
             (incf s (the fixnum (a-f a)))
             (incf s (the fixnum (a-g a)))
             (incf s (the fixnum (a-h a))))))
        (10
         (let ((s 0))
           (declare (type fixnum s))
           (dotimes (i n s)
             (incf s (the fixnum (a-a a)))
             (incf s (the fixnum (a-b a)))
             (incf s (the fixnum (a-c a)))
             (incf s (the fixnum (a-d a)))
             (incf s (the fixnum (a-e a)))
             (incf s (the fixnum (a-f a)))
             (incf s (the fixnum (a-g a)))
             (incf s (the fixnum (a-h a)))
             (incf s (the fixnum (a-i a)))
             (incf s (the fixnum (a-j a))))))
        (otherwise
         (error "what even is this")))
      (case count
        (2
         (let ((s 0))
           (declare (type fixnum s))
           (dotimes (i n s)
             (incf s (the fixnum (slot-value a 'a)))
             (incf s (the fixnum (slot-value a 'b))))))
        (4
         (let ((s 0))
           (declare (type fixnum s))
           (dotimes (i n s)
             (incf s (the fixnum (slot-value a 'a)))
             (incf s (the fixnum (slot-value a 'b)))
             (incf s (the fixnum (slot-value a 'c)))
             (incf s (the fixnum (slot-value a 'd))))))
        (6
         (let ((s 0))
           (declare (type fixnum s))
           (dotimes (i n s)
             (incf s (the fixnum (slot-value a 'a)))
             (incf s (the fixnum (slot-value a 'b)))
             (incf s (the fixnum (slot-value a 'c)))
             (incf s (the fixnum (slot-value a 'd)))
             (incf s (the fixnum (slot-value a 'e)))
             (incf s (the fixnum (slot-value a 'f))))))
        (8
         (let ((s 0))
           (declare (type fixnum s))
           (dotimes (i n s)
             (incf s (the fixnum (slot-value a 'a)))
             (incf s (the fixnum (slot-value a 'b)))
             (incf s (the fixnum (slot-value a 'c)))
             (incf s (the fixnum (slot-value a 'd)))
             (incf s (the fixnum (slot-value a 'e)))
             (incf s (the fixnum (slot-value a 'f)))
             (incf s (the fixnum (slot-value a 'g)))
             (incf s (the fixnum (slot-value a 'h))))))
        (10
         (let ((s 0))
           (declare (type fixnum s))
           (dotimes (i n s)
             (incf s (the fixnum (slot-value a 'a)))
             (incf s (the fixnum (slot-value a 'b)))
             (incf s (the fixnum (slot-value a 'c)))
             (incf s (the fixnum (slot-value a 'd)))
             (incf s (the fixnum (slot-value a 'e)))
             (incf s (the fixnum (slot-value a 'f)))
             (incf s (the fixnum (slot-value a 'g)))
             (incf s (the fixnum (slot-value a 'h)))
             (incf s (the fixnum (slot-value a 'i)))
             (incf s (the fixnum (slot-value a 'j))))))
        (otherwise
         (error "what even is this")))))

(defmethod svn ((b b) n count &key)
  (declare (type fixnum n)
           (optimize speed (safety 0)))
  (case count
    (2
     (let ((s 0))
       (declare (type fixnum s))
       (dotimes (i n s)
         (incf s (the fixnum (b-a b)))
         (incf s (the fixnum (b-b b))))))
    (4
     (let ((s 0))
       (declare (type fixnum s))
       (dotimes (i n s)
         (incf s (the fixnum (b-a b)))
         (incf s (the fixnum (b-b b)))
         (incf s (the fixnum (b-c b)))
         (incf s (the fixnum (b-d b))))))
    (6
     (let ((s 0))
       (declare (type fixnum s))
       (dotimes (i n s)
         (incf s (the fixnum (b-a b)))
         (incf s (the fixnum (b-b b)))
         (incf s (the fixnum (b-c b)))
         (incf s (the fixnum (b-d b)))
         (incf s (the fixnum (b-e b)))
         (incf s (the fixnum (b-f b))))))
    (8
     (let ((s 0))
       (declare (type fixnum s))
       (dotimes (i n s)
         (incf s (the fixnum (b-a b)))
         (incf s (the fixnum (b-b b)))
         (incf s (the fixnum (b-c b)))
         (incf s (the fixnum (b-d b)))
         (incf s (the fixnum (b-e b)))
         (incf s (the fixnum (b-f b)))
         (incf s (the fixnum (b-g b)))
         (incf s (the fixnum (b-h b))))))
    (10
     (let ((s 0))
       (declare (type fixnum s))
       (dotimes (i n s)
         (incf s (the fixnum (b-a b)))
         (incf s (the fixnum (b-b b)))
         (incf s (the fixnum (b-c b)))
         (incf s (the fixnum (b-d b)))
         (incf s (the fixnum (b-e b)))
         (incf s (the fixnum (b-f b)))
         (incf s (the fixnum (b-g b)))
         (incf s (the fixnum (b-h b)))
         (incf s (the fixnum (b-i b)))
         (incf s (the fixnum (b-j b))))))
    (otherwise
     (error "what even is this"))))

(defun measure-thing (thing &key
                              (exponent 11)
                              (specs
                               '((2 -0.2)
                                 (4 -0.4)
                                 (6 -0.6)
                                 (8 -0.8)
                                 (10 -1)))
                              (sleep 0)
                      &aux (cn (class-name (class-of thing))))
  (mapcar (lambda (spec)
            (destructuring-bind (count delta &rest kws &key) spec
              (let ((iterations (round (expt 10 (+ exponent delta)))))
                (let ((per-step (float
                                 (seconds (noting (cn count iterations)
                                                  (ticks (apply #'svn thing
                                                                iterations count kws)))
                                          iterations))))
                  (note "~S ~D elapsed ~Ds per-step ~Ds"
                        cn count (* per-step iterations) per-step)
                  (when (> sleep 0)
                    (noting ("sleep" sleep)
                      (sleep sleep)))
                  (list cn count per-step)))))
          specs))

(defun measure-things (&key
                       (things-and-exponents `((,(make-b) 11)
                                               (,(make-instance 'a) 10)))
                       (log-file "thing-times.ldat")
                       (tries 4)
                       (sleep 5))
  ;; Dump measurements to a log file
  (with-standard-io-syntax
    (with-open-file (log log-file :direction :output
                                  :if-exists :supersede)
      (noting-too-short
       (let ((increment-time (float (estimate-increment-time))))
         (note "increment time ~Ds" increment-time)
         (pprint increment-time log)
         (force-output log))
        (dolist (thing-and-exponent things-and-exponents)
          (destructuring-bind (thing exponent) thing-and-exponent
            (note "~S exponent ~D"
                  (class-name (class-of thing))
                  exponent)
            (dotimes (try tries)
              (pprint
               (measure-thing thing
                              :exponent exponent
                              :sleep sleep)
               log)
              (force-output log)))))))
  log-file)

This is the Racket code which plotted the data and computed the fit & cost.

#lang racket

;;;; Fit data from tsv
;;;

(require simple-polynomial
         plot)

(define (snarf from)
  ;; Stolen from warranted (wcs.rkt): just read all the forms, safely
  (call-with-default-reading-parameterization
   (thunk
    (parameterize ([read-accept-lang #f]
                   [read-accept-reader #f])
      (call-with-input-file from
        (λ (p)
          (for/list ([form (in-port read p)])
            form)))))))

(define (classify file-data)
  ;; The data is an increment time, followed by lists of (class-name
  ;; slot-count seconds) Return a hash table mapping from class names
  ;; and the imcrememt time
  (match-let ([(cons increment-time records) file-data])
    (define cmap (make-hasheqv))
    (for* ([record (in-list records)]
           [single (in-list record)])
      (match-let ([(list class-name count seconds) single])
        (hash-update! cmap class-name
                      (λ (c)
                        (cons (list count seconds) c))
                      '())))
    (values cmap increment-time)))

(define (linear-fit class-name cmap)
  (points->best-fit-polynomial (hash-ref cmap class-name) 1))

(define (slot-cost class-name cmap (increment-time 0.0))
  (- (first (polynomial-terms (linear-fit class-name cmap)))
     increment-time))

(define (file-slot-cost class-name file
                        #:use-increment-time (use-increment-time #f))
  (let-values ([(cmap increment-time)
                (classify (snarf file))])
    (slot-cost class-name cmap (if use-increment-time
                                   increment-time
                                   0.0))))

(define (file-A/B-ratio file #:use-increment-time (use-increment-time #f))
  (/ (file-slot-cost 'A file #:use-increment-time use-increment-time)
     (file-slot-cost 'B file #:use-increment-time use-increment-time)))

(define (plot-linear-fit class-name cmap
                         #:to-file (to-file #f)
                         #:title (title #f))
  (parameterize ([plot-font-family 'modern]
                 [plot-width 560]
                 [plot-x-far-axis? #f]
                 [plot-y-far-axis? #f]
                 [plot-x-ticks (linear-ticks #:number 5)])
    (define pts (hash-ref cmap class-name))
    ((if to-file
         (curryr plot-file to-file)
         plot)
     (list
      (points pts #:sym 'plus #:label (format "~A data" class-name))
      (function (points->best-fit-polynomial pts 1) #:label "linear fit"))
     #:x-min 0
     #:x-max 10.5
     #:x-label "count"
     #:y-label "seconds/step"
     #:title title)))

(define (file-plot-linear-fit class-name file
                              #:to-file (to-file #f)
                              #:title (title #f))
  (let-values ([(cmap _) (classify (snarf file))])
    (plot-linear-fit class-name cmap
                     #:to-file to-file
                     #:title title)))

  1. aka Conclusions 

  2. aka The conclusion anybody paying attention would draw. 

Joe MarshallCLRHack: Tail Recursion

· 34 hours ago

Tail-Call Handling in CLRHack

I decided to make proper tail recursion a fundamental requirement in CLRHack. This prevents stack overflow errors during standard recursive patterns and ensures the runtime remains stable regardless of recursion depth. Technically, Common Lisp isn't required to be tail recursive, but I want mine to be.

1. Tail Position Identification

The compiler performs a structural analysis of the Abstract Syntax Tree (AST) to identify "tail positions." An expression is in a tail position if its value is the final result of the function, meaning no further work remains to be done in the current frame after the call returns. The generate-step2 walker propagates a tail-p flag through the following logic:

  • Functions/Lambdas: The final expression in the body is in the tail position.
  • Conditionals (IF): Both the "then" and "else" branches are in the tail position.
  • Sequences (PROGN/LET): Only the very last form in the sequence is in the tail position.
  • Blocks: The last form of a BLOCK is in the tail position, provided the block is not the target of a RETURN-FROM.

2. CIL Instruction Emission

To implement proper tail-call semantics, the compiler utilizes the native tail. prefix in the Common Intermediate Language (CIL). When a function call is detected in a tail position, the compiler applies the following mandatory transformation:

  1. The Prefix: It prepends the tail. opcode to the call or callvirt instruction.
  2. The Return: It immediately follows the call with a ret (return) instruction.

The tail. prefix instructs the .NET Just-In-Time (JIT) compiler to discard the current method's stack frame before jumping to the target function. This ensures that the call consumes zero additional stack space, turning the recursive call into a semantic jump.

3. Safety and Context Constraints

The implementation of tail-calls is subject to specific safety rules imposed by the Common Language Runtime (CLR) to maintain execution integrity:

  • Protected Regions: The CLR prohibits tail. calls inside try, catch, or finally blocks. Because Lisp constructs such as unwind-protect and handler-case rely on these CIL features, tail-call elimination is suspended within these specific scopes to ensure cleanup handlers and error recovery mechanisms function correctly.
  • Frame Cleanup: The compiler ensures that all local resources are in a valid state before the tail. prefix is issued, allowing the CLR to safely deallocate the current frame.

Example CIL Output

Consider a recursive counter that must be able to run indefinitely:

  (defun count-down (n)
    (if (= n 0)
        "Done"
        (count-down (- n 1))))
  

The compiled CIL for the recursive branch is transformed to ensure stack neutrality:

      ; ... code to calculate (- n 1) ...
      tail.
      call object Program::'COUNT-DOWN'(object)
      ret
  

By strictly enforcing this pattern, CLRHack guarantees that recursive programs can execute with constant stack space, fulfilling my core requirement of tail recursion.

Marco AntoniottiGetting HE&Lambda;P, Finally!

· 2 days ago

As I wrote in my last blog entry, I went back hacking on HEΛP.

HEΛP is the Common Lisp code documentation tool I started writing many years ago.

Apart from a little necessary Javascript and CSS, HEΛP is a full Common Lisp program, geared towards producing static documentation sites for CL code. I finally got around to modernize it and it is now ready for testing.

Evolution from (X)HTML to HTML5

The original HEΛP release was producing only (X)HTML output, moreover based on FRAMESETs.

Alas, when the first HEΛP release was made, FRAMESETs were falling out of fashion, and they were eventually deprecated with the advent of HTML5. An "upgrade" to HTML5 became then a necessity.

After a very long process, I finally finished the HTML5 port, plus some bells and whistles. All in all, the implementation uses <div ... > sections plus CSS to lay out the display, as I understand it is the proper coding fashion nowadays. The port uses the W3.CSS styles, which facilitated a number of choices. The result is rather pleasing, as far as I am concerned.

Example: Producing the HEΛP documentation

The HEΛP documentation (a form of it) is produced with the following command (hlp is one of the package nicknames):

(hlp:document #P"./" ; Just a "top directory"...
              :documentation-title "HE&Lambda;P"
              :format :html5
              :exclude-directories
              (list "doc/"
                    "js/"
                    "css/"
                    ".git/"
                    "tests/"
                    "tmp/"
                    "tools/"
                    )

              :exclude-files ; I run this from LW.
              (list "impl-dependent/ccl.lisp"
                    "impl-dependent/sbcl.lisp"
                    "utilities/document-helambdap.lisp"
                    "utilities/lambda-list-parsing.lisp"
                    )
              :only-documented t
              :only-exported t
              )

After much printing, the resulting static web pages are deposited in docs/html5/, unless overridden. The system also relies on some defaults which are handled by CLAD library.

Viewing the result.
For the time being, you can find the main page of HEΛP here. Navigating the bar on the left will allow you to see different bits and pieces fo the documentation. You will notice that you have different views of the documentation: a system view and a package view. The system view gives you also a view of the files and folders (modules) it contains.

Documentation strings are mostly left alone.
Unike for Emacs Lisp, there is no real agreement in the CL community about how to format documentation strings (if there is, I do not agree with it by definition - obviously). HEΛP wants to be able to document code that does not adopt any documentation string convention, therefore it treats documentation strings pretty much as they are, only adding some text in the guise of the Hyperspec entries.

Screenshot

Here are a couple of screenshots. Apologies for the bad resolution.

The Main View

The HEΛP Main View

The DIctionary View

The HEΛP Dictionary View

Missing Pieces

There is one thing that is missing from HEΛP: the generation of proper crossreferencing. To do it correctly it will be necessary to somehow make some educated guesses about the content of documentation strings or agreeing on some markup to tag linkable items. Apart from that, at the time of this writing the doc strings are handled as enties in an has table, and that could be improved, as more indexes may be needed.

Of course, a major rewrite may also help, but time is a tyrant.

Any suggestion is welcome.

Try it!

Again, HEΛP is ready for you to try. You can clone the repository from Sourceforge.

... and remember: no Python or Ruby or Shell dependencies: pure CL (plus some Javascript, which is a functional language after all, whose first implementation was done in CL).

Enjoy!


'(cheers)

Joe MarshallCLRHack Lexical Variables

· 2 days ago

Lexical Closures in CLRHack

CLRHack implements lexical closures by transforming dynamic Lisp environments into static CIL class structures. Since the .NET Common Language Runtime (CLR) does not have a native concept of "nesting" functions within the lexical scope of another function's local variables, the compiler employs Lambda Lifting and Explicit Closure Conversion.

1. Lambda Lifting

Every lambda expression (including those generated by flet and labels) is extracted from its nesting site. The compiler generates a unique, standalone CIL class for each lambda. These classes inherit from the base [LispBase]Lisp.Closure class.

2. The Closure Class Structure

The generated class acts as a container for both the code (the lambda body) and the environment (the captured variables). It consists of:

  • Environment Fields: For every "free variable" (a variable referenced in the lambda but defined in an outer scope), the compiler adds a public field to the class.
  • The Constructor: A constructor is generated that accepts the values (or references) of these free variables and stores them in the class fields.
  • The Invoke Methods: The class overrides the virtual Invoke methods of the base Closure class. The body of the Lisp lambda is compiled into these methods.

3. Environment Capture

At the point in the code where the lambda is defined, the compiler emits a newobj instruction. It passes the current values of the required local variables into the closure's constructor. This "closes" over the variables, creating a persistent instance of the environment that lives on the heap.

  ; Lisp Source
  (let ((factor 10))
    (lambda (x) (* x factor)))

  ; Conceptual CIL Transformation
  .class private Lambda_1 extends [LispBase]Lisp.Closure {
      .field public object factor_captured

      .method public hidebysig specialname rtspecialname void .ctor(object f) {
          ldarg.0
          ldarg.1
          stfld object Lambda_1::factor_captured
          ret
      }

      .method public virtual object Invoke(object x) {
          ldarg.0
          ldfld object Lambda_1::factor_captured
          ldarg.1
          ; ... multiplication logic ...
      }
  }
  

4. Shared Mutability via ValueCells

Common Lisp requires that if an outer variable is mutated (via setq), all closures capturing that variable must see the change. To support this, CLRHack uses Indirection Cells:

  • If the compiler detects that a captured variable is mutated, it "boxes" that variable.
  • Instead of storing a raw value (like an integer) on the stack, it creates a [LispBase]Lisp.ValueCell object.
  • The closure captures the reference to this ValueCell.
  • Both the parent function and the closure access the variable by reading from or writing to the Value field of the cell. This ensures that all parties share the same mutable state.

5. Invocation

When a closure is invoked (e.g., via funcall), the Invoke method is called. Inside this method, the this pointer (ldarg.0 in CIL) provides the code with access to the captured environment fields. This allows the lifted function to behave as if it were still sitting inside its original lexical scope.

Joe MarshallCLRHack argument passing

· 3 days ago

Common Lisp Argument Passing in CLRHack

The CLRHack engine translates the dynamic, flexible argument-passing semantics of Common Lisp into the static, strongly-typed environment of the .NET Common Language Runtime (CLR). It achieves this through a combination of CIL method overloading, sentinel-based defaulting, and runtime list construction.

1. The Overloading Architecture

Since CIL methods have a fixed arity (number of arguments), but Common Lisp functions support variable arguments, the compiler generates multiple entry points for every defined function.

  • Public Overloads: For every function, the compiler generates a set of public static methods (typically from 0 to 8 arguments). These serve as the "API" for both direct calls and closure invocation.
  • The Body Method: The actual logic of the Lisp function is compiled into a single private static method named [FunctionName]_Body. All valid public overloads normalize their arguments and delegate to this method.
  • Arity Enforcement: Overloads representing invalid argument counts (e.g., calling a 2-arg function with 5 args) are generated to throw a Lisp.WrongNumberOfArgumentsException.

2. Parameter Type Implementation

Required Parameters

Required parameters are the simplest. They map directly to the leading object arguments in both the public overloads and the internal body method.

Optional Parameters (&optional)

Handling &optional involves a "Sentinel Pattern":

  • Sentinels: If a caller uses an overload that provides fewer than the maximum number of optional arguments, the compiler passes a special global constant: Lisp.Undefined::Value.
  • Late Defaulting: Inside the _Body method, the engine generates CIL code to check if the argument is EQ to the Undefined sentinel. If the check passes, the code evaluates the Lisp default expression and stores the result back into the parameter using starg.
  • Supplied-p: If the Lisp code defines a "supplied-p" variable, an extra boolean parameter is added to the _Body method, which is set to true or false based on the presence of the argument in the specific overload.

Rest Parameters (&rest)

Rest parameters are handled by runtime list construction:

  • In the public overloads, any arguments provided beyond the required/optional count are bundled into a linked list using Lisp.List/ListCell.
  • The _Body method receives this list as a single object argument.

Keyword Parameters (&key)

Keyword parameters are implemented as a transformation over the &rest mechanism:

  1. Trailing arguments are gathered into a "rest" list.
  2. The _Body method defines local variables for every keyword parameter, initialized to their Lisp default values.
  3. The Keyword Loop: At the start of the function, the engine emits a loop that scans the rest list in pairs. It uses Object.Equals to compare each key against the pre-interned keyword symbols (e.g., :TEST). When a match is found, the corresponding local variable is updated with the value following the key.

3. The Calling Mechanism

Direct Calls

When the compiler identifies a call to a known defun in the same assembly, it emits a direct call to the public overload that exactly matches the number of provided arguments. This provides near-native performance for fixed-arity calls.

Indirect Calls (Closures & Funcall)

All Lisp functions are instances of the Lisp.Closure class. This class provides virtual Invoke methods. When a closure is created, it captures its environment and provides overrides for these Invoke methods that jump into the appropriate Program static overloads.

4. Multiple Return Values

Note: Argument passing is only half the story; return values use a side-channel.

Because .NET methods return only one value, CLRHack uses a Thread-Static Side-Channel in the Lisp.Values class:

  • The first value is returned normally as the method's return value.
  • Additional values are stored in [ThreadStatic] fields (Value1, Value2, ... up to Value63).
  • A ReturnCount field is updated to tell the caller how many values are waiting in the buffer.

Example CIL Generation

; Lisp: (defun add-optional (x &optional (y 5)) (+ x y))

; Overload for 1 arg
.method public static object 'ADD-OPTIONAL'(object x) {
    ldarg 0
    ldsfld object [LispBase]Lisp.Undefined::Value
    tail. call object Program::'ADD-OPTIONAL_Body'(object, object)
    ret
}

; The Body Method
.method private static object 'ADD-OPTIONAL_Body'(object x, object y) {
    ; Defaulting logic for Y
    ldarg 1
    ldsfld object [LispBase]Lisp.Undefined::Value
    bne.un SKIP_DEFAULT
    ldc.i4 5
    box int32
    starg 1
SKIP_DEFAULT:
    ; ... rest of function ...
}
    

Joe MarshallCLRHack: FibBenchmark

· 7 days ago

The first thing to look at is the Fibonacci benchmark. The source code is here:

(in-package "CLRHACK")

(progn
  (defun fib (n)
    (if (< n 2)
        n
        (+ (fib (- n 1))
           (fib (- n 2)))))
  (defun main ()
    (print "Fibonacci of 10:")
    (print (fib 10)))
  (main))

And it compiles to this IL code: (commentary after the code)

.assembly extern mscorlib {}
.assembly extern LispBase {}

.assembly 'FibBenchmark' {}
.module 'FibBenchmark.exe'

.class public auto ansi beforefieldinit Program
       extends [mscorlib]System.Object
{
    .field public static class [LispBase]Lisp.Symbol 'SYM_G545'

.method public static hidebysig specialname rtspecialname void '.cctor'() cil managed
{
    .maxstack 8
    ldsfld class [LispBase]Lisp.Package [LispBase]Lisp.Package::CommonLisp
    ldstr "T"
    callvirt instance class [LispBase]Lisp.Symbol [LispBase]Lisp.Package::'Intern'(string)
    stsfld class [LispBase]Lisp.Symbol Program::'SYM_G545'
    ret
}

.method public static hidebysig object 'FIB'(object) cil managed
{
    .maxstack 8
    .locals (object TEMP_B)
    ldarg 0
    ldc.i4 2
    box int32
    stloc TEMP_B
    unbox.any int32
    ldloc TEMP_B
    unbox.any int32
    clt
    brtrue TRUE543
    ldnull
    br END544
TRUE543:
    nop
    ldsfld class [LispBase]Lisp.Symbol Program::'SYM_G545'
END544:
    nop
    ldnull
    ceq
    brtrue ELSE546
    ldarg 0
    ret
ELSE546:
    nop
    ldarg 0
    ldc.i4 1
    box int32
    stloc TEMP_B
    unbox.any int32
    ldloc TEMP_B
    unbox.any int32
    sub
    box int32
    call object Program::'FIB'(object)
    ldarg 0
    ldc.i4 2
    box int32
    stloc TEMP_B
    unbox.any int32
    ldloc TEMP_B
    unbox.any int32
    sub
    box int32
    call object Program::'FIB'(object)
    stloc TEMP_B
    unbox.any int32
    ldloc TEMP_B
    unbox.any int32
    add
    box int32
    ret
BLOCK_END_FIB_532:
    nop
    ret
}

.method public static hidebysig object 'MAIN'() cil managed
{
    .maxstack 8
    ldstr "Fibonacci of 10:"
    call void [mscorlib]System.Console::'WriteLine'(object)
    ldnull
    pop
    ldc.i4 10
    box int32
    call object Program::'FIB'(object)
    call void [mscorlib]System.Console::'WriteLine'(object)
    ldnull
    ret
BLOCK_END_MAIN_536:
    nop
    ret
}

.method public static hidebysig void 'Main'() cil managed
{
    .entrypoint
    .maxstack 8
    call object Program::'MAIN'()
    pop
    ret
}

} // end of class Program

The first thing the fib program does is compare argument x to the literal number 2. The compiler pushes argument 0 on to the stack, and then the compiler pushes a integer 2 on to the stack and boxes it.

Next, the compiler has to perform the compare. In order to do this it must unbox both arguments. One argument is on top of the stack, so it is put into a local TEMP_B so we can get to the other argument. We unbox it. We then restore TEMP_B to the top of stack and unbox it. Finally we compare the two unboxed values for less than.

This pattern of unboxing a pair of elements from the top of stack by way of a temporary local is repeated several places in the compiled code as FIB rather inefficiently subtracts 1 or 2 from the argument and makes the recursive call.

This example shows that the compiler basically treats everything as a .NET object. It unboxes numbers at the last moment and boxes the results as soon as they are generated. It is not efficient code.

Joe MarshallI Wrote a Compiler

· 9 days ago

I was bored so I wrote a compiler. I'm lazy so I vibe coded it. It compiles Lisp to .NET IL (the byte code that the .NET runtime executes). The IL is then JIT compiled to machine code and executed. You can use the dotnet runtime from Microsoft or the open source mono runtime as the runtime for the compiled code.

The basic idea of the compiler is to map lambda expressions to .NET classes. The lexical variables are stored as fields in the class. The body of the lambda is compiled to a method in the class. We use lambda lifting to flatten any nested lambdas. We use cell conversion to handle mutable variables and we simply copy the values of immutable variables into the lifted lambdas when they are closed over.

Although I `vibe coded` the compiler, I leveraged my experience with writing compilers to break down the problem into passes that were simple enough that `vibe coding` was possible. For instance, in order to implement lambda lifting, I first wrote a pass that determined the free variables of each lambda. That's a pretty simple operation that I could easily `vibe code`. In order to emit the correct IL, I first wrote a pass that segregated the variables into arguments, lexicals, and globals. Again, that's a simple operation that I could easily `vibe code`.

The trickiest part was the code generator. I had decided to implement tail recursion by using the `tail.` prefix in the IL. This is a hint to the JIT compiler that the call is a tail call and that it can optimize it by reusing the current stack frame. However, the JIT compiler is a bit picky about when it will actually perform the tail calls, and the other parts of the code generator kept moving the tail calls around so that they were no longer in tail position. I eventually had to add a pre-pass to the code generator that tracked the continuations in order to ensure that there was enough information later on to enforce tail position on the tail calls.

It... works? It compiles a number of the Gabriel Benchmarks, and some test programs that demonstrate lexical scoping, mutable variables, and tail recursion. It is most definitely a Lisp compiler, but if you look under the hood, well, be forewarned. It isn't pretty.

The compiler itself was vibe coded. The only restriction on the output code was that it had to implement what the input code specified. It did not have to conform to any particular notion of how to implement lisp features on the .NET runtime beyond the requirement that the output was correct. Choices that are typically made by a Lisp architect, such as how to deal with integers, the implementation of the standard library, etc., were all left up to the vibe coding process. I provided a couple of runtime libraries: a cell library for implementing mutable variables, and a List library for implementing singly linked lists. These were written in C#. The vibe coding process was allowed to modify the C# code in these libraries as well and it did so in a couple of places.

I started with one a simple benchmark and got it to compile and run. From there, I added more benchmarks and each time told the compiler to fix any errors that came up. I also added some test programs that were not part of the benchmarks in order to test specific features of the compiler. As I added more and more test programs, the `vibe coding process` added more and more features to the compiler. This ended up producing more and more complex compiler output code.

I'm going to devote a few blog posts to this compiler, so if it isn't up your alley, skip ahead a few posts.

ECL NewsECL 26.5.5 release

· 21 days ago

We are announcing a bugfix ECL release that addresses a few issues that has slipped through testing of the recent one.

Addressed issues:

  • bugfix: MAKE-PACKAGE destructively modified defining form's cons cells of the package local nicknames, breaking package literals in bytecmp (#839)

  • bugfix: the first environment is now always page-aligned by using the same allocation mechanism as all subsequent envs (#828)

  • bugfix: allow loading concatenated fasc files (#842)

  • bugfix: defclass does not redefine existing classes at compile time with forward-referenced classes in the bytecodes compiler (#843)

This release is available for download in a form of a source code archive (we do not ship prebuilt binaries):

Happy Hacking,
The ECL Developers

Gábor MelisDRef Leaves Home

· 21 days ago

Version 0.5 of DRef, the definition reifier, is now available. It has moved to its own repository, completing its separation from PAX, where it was originally developed.

This was a long time coming. Twelve years ago today, PAX was born. From the start, PAX used the concept of locatives to refer to definitions without first-class objects. For example, to generate documentation for the *MY-VAR* variable, one could use the VARIABLE locative as in (*MY-VAR* VARIABLE). PAX needed to be able to tell whether such a definition exists, as well as access its docstring and source location.

Over time, this mechanism evolved into a portable, extensible introspection library independent of PAX. I began separating the two projects two years ago and named the new library, though they continued to share a repository. I have now removed the remaining dependencies so that DRef can live on its own.

Joe MarshallEchoes of the Lisp Listener

· 24 days ago

The Lisp Machine Listener had an electric close parenthesis. When the user typed a close parenthesis, and this was the close parenthesis that finished the complete form at top level, the form would be sent to the REPL right away with no need to press enter. Here's how to get this behavior with SLY:

(defun my-sly-mrepl-electric-close-paren ()
  "Insert ')' and auto-send ONLY if we are closing a top-level Lisp form."
  (interactive)
  (let ((state (syntax-ppss)))
    (insert ")")
    ;; Safety checks:
    ;; 1. We were at depth 1 (so we are now at depth 0)
    ;; 2. We aren't in a string or comment
    ;; 3. The input actually starts with a paren (it's a form, not a sentence)
    (when (and (= (car state) 1)
               (not (nth 3 state))
               (not (nth 4 state))
               (string-match-p "^\\s-*(" 
                               (buffer-substring-no-properties (sly-mrepl--mark) (point))))
      (sly-mrepl-return))))

Another cool hack is to get the REPL to do double duty as a command line to the LLM chatbot. When you type RET in the REPL, it will check if the input is a complete lisp form. If so, it will send the form to the REPL as normal. If not, it will send the input to the chatbot. Here's how to do this:

(defun my-sly-mrepl-electric-return ()
  "Send to Lisp if it's a form/symbol, or wrap in (chat ...) if it's a sentence."
  (interactive)
  (let* ((beg (marker-position (sly-mrepl--mark)))
         (end (point-max))
         (input (buffer-substring-no-properties beg end))
         (trimmed (string-trim input)))
    (cond
     ;; If it's empty, just do a normal return
     ((string-blank-p trimmed)
      (sly-mrepl-return))
     
     ;; If it starts with a paren, quote, or hash, it's definitely a Lisp form
     ((string-match-p "^\\s-*[(#'\"]" trimmed)
      (sly-mrepl-return))
     
     ;; If it's a single word (no spaces), treat it as a symbol/form (e.g., *package*)
     ((not (string-match-p "\\s-" trimmed))
      (sly-mrepl-return))
     
     ;; Otherwise, it's a sentence. Wrap it and fire.
     (t
      (delete-region beg end)
      (insert (format "(chat %S)" trimmed))
      (sly-mrepl-return)))))

Install as follows:

;; Apply to SLY MREPL with a safety check for the mode map
(with-eval-after-load 'sly-mrepl
  (define-key sly-mrepl-mode-map (kbd "RET") 'my-sly-mrepl-electric-return)
  (define-key sly-mrepl-mode-map (kbd ")") 'my-sly-mrepl-electric-close-paren))

Tim BradshawMaking CLOS slot access less slow

· 25 days ago

Access to slots in CLOS instances is often very slow. It’s probably not possible for it ever to be really fast, but the AMOP MOP does provide a way of making it, at least, less slow.

How slow is it?

Here are some benchmarks for accessing fields in objects of various kinds, using SBCL. All of these tests do something equivalent to

(defclass a ()
  ((i :initform 0 :type fixnum)))

(defclass a/no-fixnum ()
  ((i :initform 0)))

(defmethod svn ((a a) n)
  (declare (type fixnum n)
           (optimize speed (safety 0)))
  (dotimes (i n)
    (incf (the fixnum (slot-value a 'i)))))

(defmethod svn ((a a/no-fixnum) n)
  (declare (type fixnum n)
           (optimize speed (safety 0)))
  (dotimes (i n)
    (incf (the fixnum (slot-value a 'i)))))

They then call svn (or equivalent) with a large value of \(n\), do that a number of times \(m\) and then divide by \(2 \times n \times m\) to get an average time per access (incf accesses the slot twice).

For SBCL 2.6.3.178-a190d9710 on ARM64 Apple M1, seconds per access:

  • raw fixnum increment \(1.58\times 10^{-10}\), ratio \(1.0\);
  • slot access with slot-value (slot type fixnum) \(1.20\times 10^{-8}\), ratio \(76\);
  • slot access with slot-value (no slot type) \(1.22\times 10^{-8}\), ratio \(77\);
  • slot access with slot-value (single slot-value-using-class method) \(1.69\times 10^{-8}\), ratio \(107\);
  • slot access using standard-instance-access \(1.00\times 10^{-9}\), ratio \(6.4\);
  • slot access, struct (slot type fixnum) \(1.57\times 10^{-10}\), ratio \(1.0\);
  • slot access, struct (no type) \(1.58\times 10^{-10}\), ratio \(1.0\);
  • slot access, cons (car) \(1.59\times 10^{-10}\), ratio \(1.0\).

These numbers vary slightly, but this gives a good picture of what is going on. In particular you can see that slot-value within a method specialised on the class is more than 70 times slower than access for a structure slot, but if you can use standard-instance-access it is only about 6 times slower: standard-instance-access speeds things up by a factor of about 10, which changes CLOS slot access performance from laughably slow to merely pretty slow.

A macro

I’ve written a macro, called with-sia-slots which is like with-slots but uses standard-instance-access. It therefore has all the constraints imposed by that, but it is significantly faster than with-slots or slot-value. It has some overhead, as it has to dynamically compute the slot locations: this is better done outside any inner loop. This means that, for instance, you probably want to write code that looks like

(with-sia-slots (x) o
  (dotimes (i many)
    (setf x (... x ...))))

which will mean you only pay the overhead once.

The above tests don’t use with-sia-slots, as I wrote them partly to see if something like this was worth writing. However on a current (at the time of writing) SBCL with-sia-slots is asymptotically about 10 times faster than with-slots as demonstrated by these tests.

Up to package names it should be portable to any CL with an AMOP-compatible MOP. It can be found in my implementation-specific hacks, linked from here.

Yukari HafnerOn Lisp, LLMs, and Community

· 26 days ago
https://studio.tymoon.eu/api/studio/file?id=3892

In 2015 in London I attended my first European Lisp Symposium. I was 21 at the time, and while this wasn't my first time abroad on my own, it was still a pretty stressful affair. I remember it still pretty clearly to that day: meeting Robert Strandh, Zach Beane, Didier Verna, Daniel Kochmański and many other people I'd previously admired from afar through many discussions on IRC. It was an important event for me, and was the first time I'd felt like I was in a group of people I could talk with about my interests and ambitions.

Last year in 2025 I was the local chair for ELS in Zürich. It was a stressful time and I don't remember much of it other than how the stage looked, the food, and me rushing all over to get supplies and take care of other emergencies. I barely talked to anyone because I was either rushing about, stressed, or too tired.

In that time, my life has changed significantly. Over the years I took on more and more organisational roles for ELS itself: remaking the website, handling the transition to a hybrid online conference, handling the live streaming on-site, and last year being local chair.

But for other parts of the broader Lisp community I gradually changed in the opposite direction: I stopped religiously reading the #lisp/#commonlisp IRC channels. I left the Lisp Discord. I stopped replying on and ultimately altogether reading the /r/lisp subreddit. I stopped blogging about what was going on both in other places and with my own projects.

All of these changes happened over time as I found myself with less tolerance for things that annoyed me and wasted my time and energy. The endless debates about why there wasn't a new standard, the constant humm-hawwing about what """the community""" should do, why Lisp wasn't more widely used if it's so great, someone starting yet another project that was already done instead of contributing to an existing implementation, and so on and so forth.

And then I found myself thinking today: "gee, I'm not very excited to go to ELS'26, huh? Whatever happened?" I've already booked my flight and hotel, and I'll be there anyway, partly because I have to for organisational reasons. But now that I'm thinking about how I feel, I can't say for certain if I will be back next year, too. Both for financial and emotional reasons.

In recent years I've found myself more and more disconnected with male-dominated spaces in general. I don't feel at home in them. I'm already not a very social person and struggle with any kind of gathering that has more than 6 people, but a lot more so still if it's mostly men. Not necessarily because I feel like I'm in any kind of danger, but simply because I don't feel like I belong. And... you know, that's sad. Obviously me leaving won't make the situation better for the other women that do attend, but that's the dilemma with all of these situations: unless the organisation creates intentional pressure to correct the situation, it will inevitably only reinforce itself.[1]

And then there's what I can, in the nicest way, only describe as "The LLM Situation," though I will be increasingly un-nice going forward. As of early this year SBCL has happily accepted patches that are authored by or with the use of LLMs, and the maintainers have rebuffed complaints about this practise. The mailing list has also gotten its fair share of useless blather by apologists and pointless drivel dreamed up by LLMs that only wastes everyone's time, to the point where I had to just stop reading it altogether. A few maintainers of other significant projects seem to also have embraced the capitalist wasteland mass exploitation machine that disguises itself as "technology."

On the other side of things as the lead developer of Shirakumo I've decided to put out a blanket ban on all of this garbage. I do not care if LLMs work at all, or if they will ever work, or whatever. The usability of LLMs is completely irrelevant. By using them you are happily handing over the single last remaining shred of your human spirit to the capitalists to help them burn everyone else and the world with it to the ground.

I think back to the impromptu "LLM roundtable" discussion that took place at the end of ELS last year, along with the usual apologist bullshit that was spread in the ELS Signal group at the time, and some of the lightning talks that were shown. And as I think about this, I am filled with trepidation about the coming conference.

Obviously I have no idea what it will be like yet, and I have no idea what the programme will be, nor what people will be there, or what the general vibe is going to be. But nevertheless, I really hope I won't have to "crash out" as the kids say. I already lost my mind last year, seemingly being the only one that wanted to hold a firm stance against this wave of shit at the time.

So what does this all mean going forward? Well, for just now, nothing. I'll continue to be in the places I have dug out myself: mastodon, the shirakumo lichat/libera channel, my patreon, and other small, purpose-driven communities. But it's very possible I'll be leaving ELS behind me permanently after this year, cutting off even the last part of the community that used to be most of my world.

Regardless, I will still be working on my Lisp projects. If nothing else, one of the nice things about the looming tower of software I've built over all these years is that I am in control of the vast majority of it, and replacing any particular part I didn't write should it get enshittified is not that big of an endeavour.

Make no mistake though: I will continue to be increasingly outspoken and annoying about political matters that I consider important and relevant, and this will also be visible in the software I write, be that in licensing, ecosystem integration, or documentation.

I hope that more people will speak out publicly about their stance. It's important to show what you stand for, even if you're just a small part. What is considered "normal" and acceptable is only ever a matter of what people get to see, regardless of how prevalent that stance is among the population. Currently people are getting to see a lot of folks proudly and loudly making trash and littering it all over the place. This normalisation is dangerous, because it makes the average joes think it's OK for them to do it too, or even that they should be doing it.

Just the same way as any other social movement, you 🫵 play a role in it, and your voice matters. Whether you use your voice for the betterment of humans, for the betterment of the ghouls feeding off of us, or silently let the ghouls feed off of us.

See you at ELS'26 in Kraków!


  1. A very dramatic but clear demonstration of this principle is found in the "Nazi Bar" anecdote. People that don't feel comfortable will just leave, even when there is no explicit and obvious push for them to do so.

Tim BradshawStructures of arrays

· 40 days ago

Or, second system.

A while ago, I decided that I’d like to test my intuition that Lisp (specifically implementations of Common Lisp) was not, in fact, bad at floating-point code and that the ease of designing languages in Lisp could make traditional Fortran-style array-bashing numerical code pretty pleasant to write.

I used an intentionally naïve numerical solution to a gravitating many-body system as a benchmark, so I could easily compare Lisp & C versions. The brief result is that the Lisp code is a little slower than C, but not much: Lisp is not, in fact, slow. Who knew?

The point here though, is that I wanted to dress up the array-bashing code so it looked a lot more structured. To do this I wrote a macro which hid what was in fact an array of (for instance) double floats behind a bunch of syntax which made it look like an array of structures. That macro took a couple of hours.

This was fine and pretty simple, but it only dealt with a single type for each conceptual array of objects, there was no inheritance and it was restricted in various other ways. In particular it really was syntactic sugar on a vector: there was no distinct implementational type at all. So I thought well, I could make it more general and nicer.

Big mistake.

The second system

Here is an example of what I wanted to be able to do (this is in fact the current syntax):

(define-soa-class example ()
  ((x :array t :type double-float)
   (y :array t :type double-float)
   (p :array t :type double-float :group pq)
   (q :array t :type double-float :group pq)
   (r :array t :type fixnum)
   (s)))

This defines a class, instances of which have five array slots and one scalar slot. Of the array slots:

  • x and y share an array and will be neighbouring elements;
  • p and q share a different array, because the group option says they must not share with x and y;
  • r will be in its own array, unless the upgraded element type of fixnum is the same as that of double-float;
  • s is just a slot.

The implementation will tell you this:

> (describe (make-instance 'example :dimensions '(2 2)))
#<example 8010059EEB> is an example
[...]
dimensions      (2 2)
total-size      4
rank            2
tick            1
its class example has a valid layout
it has 3 arrays:
 index 0, element type double-float, 2 slots
 index 1, element type (signed-byte 64), 1 slot
 index 2, element type double-float, 2 slots
it has 5 array slots:
 name x, index 0 offset 0
 name y, index 0 offset 1
 name r, index 1 offset 0
 name p, index 2 offset 0
 name q, index 2 offset 1

This is already too complicated: the ability to control sharing via groups is almost certainly never going to be useful: it’s only even there because I thought of it quite early on and never removed it.

The class definition macro then needs to arrange life so that enough information is available so that a macro can be written which turns indexed slot access into indexed array access of the underlying arrays which are secretly stored in instances, inserting declarations to make this as fast as possible: anything slower than explicit array access is not acceptable. This might (and does) look like this, for example:

(with-array-slots (x y) (thing example)
  (for* ((i ...) (j ...))
    (setf (x i j) (- (y i j) (y j i)))))

As you can see from this, the resulting objects should be allowed to have rank other than 1. Inheritance should also work, including for array slots. Redefinition should be supported and obsolete macro expansions and instances at least detected.

In other words there are exactly two things I should have aimed at achieving: the ability to define fields of various types and have them grouped into (generally fewer) underlying arrays, and an implementational type to hold these things. Everything else was just unnecessary baggage which made the implementation much more complicated than it needed to be.

I had not finished making mistakes. The system needs to store some metadata about how slots map onto the underlying arrays, element types and so on, so the macro can use this to compile efficient code. There are two obvious ways to do this: use the property list of the class name, or subclass standard-class and store the metadata in the class. The first approach is simple, portable, has clear semantics, but it’s ‘hacky’; the second is more complicated, not portable, has unclear semantics1, but it’s The Right Thing2. Another wrong decision I made without even trying.

The only thing that saved me was that the nature of software is that you can only make a finite number of bad decisions in a finite time.

More bad decisions

I was not done. Early on, I thought that, well, I could make this whole thing be a shim around defstruct: single inheritance was more than enough, and obviously I could store metadata on the property list of the type name as described above. And there’s no nausea with multiple accessors or any of that nonsense.

But, somehow, I found writing a thing which would process the (structure-name ...) case of defstruct too painful, so I decided to go for the shim-around-defclass version instead. I even have a partly-complete version of the defstructy code which I abandoned. Another mistake.

I also decided that The Right Thing was to have the system support objects of rank 0. That constrains the underlying array representation (it needs to use rank \(n+1\) arrays for an object of rank \(n\)) in a way which I thought for a long time might limit performance.

Things I already knew

At any point during the implementation of this I could have told you that it was too general and the implementation was going to be too complicated for no real gain. I don’t know why I made so many bad choices.

The whole process took weeks and I nearly just gave up several times.

The light at the end of the tunnel

Or: all-up testing.

Eventually, I had a thing I thought might work. The macro syntax was a bit ugly (that macro still exists, with a different name) but it seemed to work. But since the whole purpose of the thing was performance, that needed to be checked. I wasn’t optimistic.

What I did was to write a version of my naïve gravitational many-body system using the new code, based closely on the previous one. The function that updates the state of the particles looks like this:

(defun/quickly step-pvs (source destination from below dt G &aux
                                (n (particle-vector-length source)))
  ;; Step a source particle vector into a destination one.
  ;;
  ;; Operation count:
  ;;  3
  ;;  + (below - from) * (n - 1) * (3 + 8 + 9)
  ;;  + (below - from) * (12 + 6)
  ;;  = (below - from) * (20 * (n - 1) + 18) + 3
  (declare (type particle-vector source destination)
           (type vector-index from)
           (type vector-dimension below)
           (type fpv dt G)
           (type vector-dimension n))
  (when (eq source destination)
    (error "botch"))
  (let*/fpv ((Gdt (* G dt))
             (Gdt^2/2 (/ (* Gdt dt) (fpv 2.0))))
    (binding-array-slots (((source particle-vector :check nil :rank 1 :suffix _s)
                           m x y z vx vy vz)
                          ((destination particle-vector :check nil :rank 1 :suffix _d)
                           m x y z vx vy vz))
      (for ((i1 (in-naturals :initially from :bound below :fixnum t)))
        (let/fpv ((ax/G zero.fpv)
                  (ay/G zero.fpv)
                  (az/G zero.fpv)
                  (x1 (x_s i1))
                  (y1 (y_s i1))
                  (z1 (z_s i1))
                  (vx1 (vx_s i1))
                  (vy1 (vy_s i1))
                  (vz1 (vz_s i1)))
          (for ((i2 (in-naturals n t)))
            (when (= i1 i2) (next))
            (let/fpv ((m2 (m_s i2))
                      (x2 (x_s i2))
                      (y2 (y_s i2))
                      (z2 (z_s i2)))
              (let/fpv ((rx (- x2 x1))
                        (ry (- y2 y1))
                        (rz (- z2 z1)))
                (let/fpv ((r^3 (let* ((r^2 (+ (* rx rx) (* ry ry) (* rz rz)))
                                      (r (sqrt r^2)))
                                 (declare (type nonnegative-fpv r^2 r))
                                 (* r r r))))
                  (incf ax/G (/ (* rx m2) r^3))
                  (incf ay/G (/ (* ry m2) r^3))
                  (incf az/G (/ (* rz m2) r^3))))))
          (setf (x_d i1) (+ x1 (* vx1 dt) (* ax/G Gdt^2/2))
                (y_d i1) (+ y1 (* vy1 dt) (* ay/G Gdt^2/2))
                (z_d i1) (+ z1 (* vz1 dt) (* az/G Gdt^2/2)))
          (setf (vx_d i1) (+ vx1 (* ax/G Gdt))
                (vy_d i1) (+ vy1 (* ay/G Gdt))
                (vz_d i1) (+ vz1 (* az/G Gdt)))))))
  destination)

And it not only worked, the performance was very close to the previous version, straight out of the gate. The syntax is not as nice as that of the initial, quick-and-dirty version, but it is much more general, so I think that’s worth it on the whole.

There have been problems since then: in particular the dependency on when classes get defined. It will never be as portable as I’d like because of the unnecessary MOP dependencies3, but it is usable and quick4.

Was it worth it? May be, but it should have been simpler.


  1. When exactly do classes get defined? Right. 

  2. Nothing that uses the AMOP MOP is ever The Right Thing, because the whole thing was designed by people who were extremely smart, but still not as smart as they needed to be and thought they were. It’s unclear if any MOP for CLOS can ever be satisfactory, in part because CLOS itself suffers from the same smart-but-not-smart-enough problem to a large extent not helped by bring dropped wholesale into CL at the last minute: by the time CL was standardised people had written large systems in it, but almost nobody had written anything significant using CLOS, let alone the AMOP MOP. 

  3. A mistake I somehow managed to avoid was using the whole slot-definition mechanism the MOP wants you to use. 

  4. I will make it available at some point. 

Robert SmithNot all elementary functions can be expressed with exp-minus-log

· 42 days ago

By Robert Smith

All Elementary Functions from a Single Operator is a paper by Andrzej Odrzywołek that has been making rounds on the internet lately, being called everything from a “breakthrough” to “groundbreaking”. Some are going as far as to suggest that the entire foundations of computer engineering and machine learning should be re-built as a result of this. The paper says that the function

$$ E(x,y) := \exp x - \log y $$

together with variables and the constant $1$, which we will call EML terms, are sufficient to express all elementary functions, and proceeds to give constructions for many constants and functions, from addition to $\pi$ to hyperbolic trigonometry.

I think the result is neat and thought-provoking. Odrzywołek is explicit about his definition of “elementary function”. His Table 1 fixes “elementary” as 36 specific symbols, and under that definition his theorem is correct and clever, so long as we accept some of his modifications to the conventional $\log$ function and do arithmetic with infinities.

My concern is that the word “elementary” in the title carries a much broader meaning in standard mathematical usage. Odrzywołek recognizes this, saying little more than “[t]hat generality is not needed here” and that his work takes “the ordinary scientific-calculator point of view”. He does not offer further commentary.

What is this more general setting, and does his claim still hold? In modern pure mathematics, dating back to the 19th century, the definition of “elementary function” has been well established. We’ll get to a definition shortly, but to cut to the chase, the titular result does not hold in this setting. As such, in layman’s terms, I do not consider the “Exp-Minus-Log” function to be the continuous analog of the Boolean NAND gate or the universal quantum CCNOT/CSWAP gates.

The rough TL;DR is this: Elementary functions typically include arbitrary polynomial root functions, and EML terms cannot express them. Below, I’ll give a relatively technical argument that EML terms are not sufficient to express what I consider standard elementary functions.

To avoid any confusion, the purpose of this blog post is manifold:

  1. To elucidate what many mathematicians consider to be an “elementary function”, which is the foundation for a variety of rich and interesting math (especially if you like computer science).
  2. To prove a result about EML terms using topological Galois theory.
  3. To demonstrate how this result may be used to show an elementary function not expressible by EML terms.

This blog post is not a refutation of Odrzywołek’s work, though the title might be considered just as clickbait (and accurate) as his, depending on where you sit in the hall of mathematics and computation.

Disclaimer: I audited graduate-level mathematics courses almost 20 years ago, and I am not a professional mathematician. Please email me if my statements are clumsy or incorrect.

The 19th century is where all modern understanding of elementary functions was developed, Liouville being one of the big names with countless theorems of analysis and algebra named after him. One such result is about integration: do the outputs of integrals look the same as their inputs? Well, what does “input” and “look the same” mean? Liouville defined a class of functions called elementary functions, and said that the integral of an elementary function will sometimes be elementary, and when it is, it will always resemble the input in a specific way, plus potential extra logarithmic factors.

Since then, elementary functions have been defined by starting with rational functions and closing under arithmetic operations, composition, exponentiation, logarithms, and polynomial roots. While EML terms are quite expressive, they are unable to capture the “polynomial roots” in full generality. We will show this by using Khovanskii’s topological Galois theory: the monodromy group of a function built from rational functions by composition with $\exp$ and $\log$ is solvable. For anybody that has studied Galois theory in an algebra course, this will be familiar, as the destination here is effectively the same, but with more powerful intermediate tooling to wrangle exponentials and logarithms.

First, let’s be more precise by what we mean by an EML term and by a standard elementary function.

Definition (EML Term): An EML term in the variables $x_1,\dots,x_n$ is any expression obtained recursively, starting from $\{1, x_1,\dots,x_n\}$, by the rule $$ T,S \mapsto \exp T-\log S. $$ Each such term, evaluated at a point where all the $\log$ arguments are nonzero, determines an analytic germ; we take $\mathcal T_n$ to be the class of germs representable this way, together with their maximal analytic continuations.

Definition (Standard Elementary Function): The standard elementary functions $\mathcal{E}_n$ are the smallest class of multivalued analytic functions on domains in $\mathbb{C}^n$ containing the rational functions and closed under

  • arithmetic operations and composition,
  • exponentiation and logarithms,
  • algebraic adjunctions: if $P(Y)\in K[Y]$ is a polynomial whose coefficients lie in a previously constructed class $K$, then any local branch of a solution of $P(Y)=0$ is admitted.

What we will show is that the class of elementary functions defined this way is strictly larger than the class induced by EML terms.

Lemma: Every EML term has solvable monodromy group. In particular, if $f\in\mathcal T_n$ is algebraic over $\mathbb C(x_1,\dots,x_n)$, then its monodromy group is a finite solvable group.

Proof: We prove by induction on EML term construction. Constants and coordinate functions have trivial monodromy.

For the inductive step, suppose $f = \exp A-\log B$ with $A,B\in\mathcal T_n$, and assume that $\mathrm{Mon}(A)$ and $\mathrm{Mon}(B)$ are solvable. We argue in three steps.

Step 1: $\mathrm{Mon}(\exp A)$ is solvable. The germs of $\exp A$ are images under $\exp$ of the germs of $A$, with germs of $A$ differing by $2\pi i\mathbb Z$ collapsing to the same value. So there is a surjection $\mathrm{Mon}(A)\twoheadrightarrow\mathrm{Mon}(\exp A)$, and a quotient of a solvable group is solvable.

Step 2: $\mathrm{Mon}(\log B)$ is solvable. At a generic point $p$, germs of $\log B$ are parameterized by pairs $(b,k)$ where $b$ is a germ of $B$ at $p$ and $k\in\mathbb Z$ selects the branch of $\log$. A loop $\gamma$ acts by $$ (b,k)\mapsto\bigl(\rho_B(\gamma)(b), k+n(\gamma,b)\bigr), $$ where $\rho_B(\gamma)$ is the monodromy action of $\gamma$ on germs of $B$, and $n(\gamma,b)\in\mathbb Z$ is the winding number around $0$ of the analytic continuation of $b$ along $\gamma$. The projection $\mathrm{Mon}(\log B)\to\mathrm{Mon}(B)$ onto the first component is a surjective homomorphism. Its kernel consists of the elements of $\mathrm{Mon}(\log B)$ induced by loops $\gamma$ with $\rho_B(\gamma)=\mathrm{id}$, which then act only by integer shifts on the $k$-coordinate. Let $S_B$ be the set of germs of $B$ at $p$. For each $b\in S_B$, such a loop determines an integer shift $n(\gamma,b)$, so the kernel embeds in the direct product $\mathbb Z^{S_B}$. In particular, the kernel is abelian. Hence $\mathrm{Mon}(\log B)$ is an extension of $\mathrm{Mon}(B)$ by an abelian group, and extensions of solvable groups by abelian groups are solvable.

Step 3: $\mathrm{Mon}(f)$ is solvable. At a generic point, a germ of $f=\exp A-\log B$ is obtained by subtraction from a pair (germ of $\exp A$, germ of $\log B$), and analytic continuation acts componentwise on such pairs. This gives a surjection of $\pi_1$ onto some subgroup $$ H \le \mathrm{Mon}(\exp A)\times\mathrm{Mon}(\log B), $$ and, since $f$ is obtained from the pair by subtraction, this descends to a surjection $H\twoheadrightarrow\mathrm{Mon}(f)$. So $\mathrm{Mon}(f)$ is a quotient of a subgroup of a direct product of solvable groups, hence solvable.

The second statement of the lemma follows: an algebraic function has finitely many branches, so its monodromy group is finite; a solvable group that is finite is, well, finite and solvable. ∎

Remark. This is the core of Khovanskii’s topological Galois theory; see Topological Galois Theory: Solvability and Unsolvability of Equations in Finite Terms.

Theorem: $\mathcal T_n \subsetneq \mathcal E_n$.

Proof: $\mathcal E_n$ is closed under algebraic adjunction, so any local branch of an algebraic function is elementary. In particular, a branch of a root of the generic quintic $$ f^5+a_1f^4+a_2f^3+a_3f^2+a_4f+a_5=0 $$ is elementary.

Suppose for contradiction that at some point $p$ a germ of a branch of this root agrees with a germ of an EML term $T$. By uniqueness of analytic continuation, the Riemann surfaces obtained by maximally continuing these two germs coincide, so in particular their monodromy groups coincide. The monodromy group of the generic quintic is $S_5$, which is not solvable. But by the lemma, the monodromy group of any EML term is solvable. Contradiction.

Hence $\mathcal T_n$ is a strict subset of $\mathcal E_n$. ∎

Edit (15 April 2026): This article used to have an example proving that the real and complex absolute value cannot be expressed over their entire domain as EML terms under the conventional definition of $\log$. I wrote it to emphasize that Odrzywołek’s approach required mathematical “patching” in order to work as intended. However, it ended up more distracting than illuminating, and was tangential to the point about the definition of “elementary”, so it has been removed.

Scott L. BursonFSet v2.4.2: CHAMP Bags, and v1.0 of my FSet book!

· 43 days ago

A couple of weeks ago I released FSet 2.4.0, which brought a CHAMP implementation of bags, filling out the suite of CHAMP types.  🚀  FSet users should have a look at the release page, as it also contained a number of bug fixes and minor changes.

I've since released v2.4.1 and v2.4.2, with some more bug fixes.

But the big news is the book!   It brings together all the introductory material I have written, plus a lot more, along with a complete API Reference chapter.

FSet is now in the state I decided last summer I wanted to get it into: faster, better tested and debugged, more feature-complete, and much better documented than it has ever been in its nearly two decades of existence.  I am, of course, very much hoping that these months of work have made the library more interesting and accessible to CL programmers who haven't tried it yet.  I am even hoping that its existence helps attract newcomers to the CL community.  Time will tell!

 

Wimpie NortjeDependency hell revisited, updating my Qlot workflow.

· 44 days ago

I wrote on this topic before but the landscape has changed a lot since then.

Skip to the new Qlot workflow.

When you work on projects that become even slightly complex it is a matter of time until you run into problems where the specific version of a particular library becomes important. This happens in most, if not all, programming languages.

In the Common Lisp environment Quicklisp has become the de facto standard for loading libraries, including fetching and loading their dependencies. Quicklisp distributes libraries in "distributions" which are point-in-time snapshots of all the known and working libraries at the time of distribution creation.

An advantage of this approach is that you are guaranteed that all libraries available in the distribution can be loaded with any of the others. Some disadvantages are that 1) if a library was included in an older distribution but no longer loads cleanly, it gets removed from the distribution, and 2) libraries are only added or updated when a new distribution is cut.

Though libraries can be put in ~/quicklisp/local-projects/ in order to supplement or override those in the distribution, Quicklisp does not provide any mechanism for pinning the state of ~/quicklisp/local-projects/.

Some Quicklisp attributes:

  1. All libraries in a distribution can be loaded with any of the others. Those that no longer do are removed from the distribution.
  2. Libraries are only added or updated at distribution cutting time.
  3. You specify a single distribution version, not individual library versions. The distribution is used globally, across all projects and across all lisp implementations.
  4. Libraries can be loaded from ~/quicklisp/local-projects/. Anything there will override the version in the distribution.
  5. Nothing about the ~/quicklisp/local-projects/ content can be specified using Quicklisp. It needs to be managed outside of Quicklisp.
  6. Libraries are loaded from the current Quicklisp distribution. There is no way to specify which distribution version a particular project must use.
  7. Quicklisp can create a bundle of all the loaded systems that can be committed with the project code and used from there rather than the distribution, i.e. vendoring.
  8. The source code for libraries included in a distribution are stored in a dedicated location for Quicklisp, they are not read from the original source repo.

Depending on your situation these attributes may be positive, negative or irrelevant.

There are projects like Ultralisp that have different philosophies regarding the distribution content but they still depend on Quicklisp for all other aspects. Thus they share most of the above attributes.

Since my previous post on this topic much has changed in the library version arena. There are now many projects that address different aspects of the above list; the topic of vendoring has gained momentum; and Qlot has changed a lot, to the extent that some code samples in older posts no longer work.

Vendoring is the idea that all the libraries your project depend on are actually part of your project and as such should be committed as part of the project code. Both Quicklisp and Qlot support this with QL:bundle and QLOT:bundle, and the Vend tool is entirely focused on vendoring.

The significant changes in Qlot broke my development workflow. Since I now had to spend time fixing this it was a good opportunity to evaluate some of the other library versioning tools. The issues that made me hesitant to adapt to the new Qlot without considering other options are:

  • Roswell: Qlot pushes hard for using Roswell. That ads Roswell as another intermediary and dependency in an already complex process.
  • The Qlot documentation is heavily biased towards using it as an external tool rather than through the REPL. Previously I used Qlot from the REPL and wanted to continue doing so if possible.
  • Qlot is developed on SBCL which means that any issues in CCL take longer to be discovered and fixed. As I mainly use CCL it is something to keep in mind.
  • The documentation proposes that lisp be started as a subprocess of Qlot, e.g. $ qlot exec ros emacs or $ qlot exec ccl. This is to ensure that the lisp is properly configured to use the project local Quicklisp. This is a brittle solution because the standard lisp REPL is then no longer sufficient. When you forget to start lisp this way Quicklisp will load the wrong libraries without any indication, potentially causing subtle bugs which are normally absent.
  • Using QLOT:quickload rather than QL:quickload always bugged me for similar reasons. It is a non-standard way to do a standard thing. Forgetting to do it is easy and then you have the wrong libraries loaded.

I evaluated many of the other version management tools and came to the conclusion that Qlot is the closest to what I wanted and I set off trying to find a workflow that will adhere to my requirements listed here:

  • Does not involve Roswell
  • Works with CCL
  • Works inside the REPL of a lisp loaded without any funny requirements
  • Gives me 100% certainty that the correct versions of all libraries are loaded, without having to interrogate (asdf:system-relative-pathname) for each.

After some fiddling with Qlot I learned that:

  1. Qlot installs a complete and independent Quicklisp inside your project directory. This Quicklisp has no dependence, relation or knowledge about the global Quicklisp.
  2. $ qlot exec ccl mostly arranges things such that Quicklisp is loaded from the project local installation. If you can arrange for that to happen without using Qlot then you can use start your lisp normally.
  3. When the project local Quicklisp is loaded it doesn't know about the global or any other project local Quicklisps. This gives the 100% certainty of where libraries were loaded from.
  4. When the project local Quicklisp is loaded you use it exactly like you would the global one. That is, QL:quickload and not QLOT:quickload, nor do you use any other Qlot wrappers.
  5. Qlot does not need to be present in your lisp image at all.
  6. Loading Qlot inside you current REPL doesn't work well because:

    1. The Qlot REPL API is still in development.
    2. Doing this loads Qlot from the global Quicklisp instead of the project local one. Distribution version mismatches between the two Quicklisps could trigger issues with Qlot itself. It also voids the certainty of library location.
  7. Having Qlot as a standalone executable outside of your lisp image puts it on the same plane as other tools used for development such as make or git. It then doesn't matter which implementation it prefers because it doesn't affect your choices.
  8. Qlot has gained the ability to bundle libraries in case you want to go the vendoring route.
  9. Roswell is not needed for any of the above.

New workflow

Combining my requirements and my new understanding of Qlot, I modified my workflow for pinning library versions to be:

  1. Qlot executable must be available in your path.
  2. Specify the distribution and library versions in qlfile.
  3. qlot install at the CLI.
  4. Load lisp without executing init scripts, e.g. ccl -n.
  5. Load Quicklisp from the project local installation.

    (load "PROJECT-PATH/.qlot/setup.lisp")

  6. Use Quicklisp as before.

    (ql:quickload :alexandria)

If you would like to vendor your libraries then:

  1. Specify the distribution and library versions in qlfile.
  2. qlot install
  3. qlot bundle
  4. git commit
  5. ccl
  6. (load PROJECT-PATH/.bundle-libs/setup.lisp)

Qlot tasks

All Qlot related tasks such as initialising a project, installing libraries, upgrading libraries, etc must be performed in the CLI using the Qlot executable. These happen relatively infrequently and inside the REPL Qlot does not feature.

Tim BradshawRules for Lisp programs

· 48 days ago

Some very serious rules. Very serious.

The essential rule. If you are not building languages in Lisp why are you even here?

The lesser rules.

  1. If you write a program which uses defclass you are probably making a mistake.
  2. If you write a program which uses the CLOS MOP you are making a mistake.
  3. If you write a program which uses LOOP for any purpose other than creating a better iteration construct you are making a mistake.
  4. If you write a program which uses LOOP only to create a better iteration construct you are probably making a mistake.
  5. If you write a program which uses explicit package-qualified names more than very infrequently you will be cast into the outer darkness along with your program.

I will not be taking questions.

Patrick SteinNomic Coding Game

· 50 days ago

About 30 years ago, I had an idea for a coding game inspired by Nomic. It occurred to me last month that all of the tools I need are readily available now.

Pen-and-paper Nomic

The pen-and-paper game of Nomic (by Peter Suber) has an initial ruleset which describes how one proposes changes to the rules, how one gets those changes ratified, a way to award points when someone’s rule change is ratified, and a rule declaring that the winner is the first player to amass 100 points. Some of the rules are mutable and some are immutable and there are rules about turning mutable rules into immutable ones and vice-versa.

The game was meant to show some of the paradoxes of self-amendment. It was meant to lead people into situations where it was clear that certain actions were both legal (or even mandatory) and illegal.

A drastically simplified starting set of rules might look like:

  • There are these players: Alice, Bob, Carol, David, and Mel.
  • Any of the players can propose a change to these rules at any time when there is not already an outstanding proposal.
  • When a player makes a proposal, all players (including the player making the proposal) must immediately vote: Yay or Nay.
  • If a proposal garners more Yay than Nay votes, it takes effect immediately. Otherwise, the proposal is rejected.
  • The winner is the first person to score 100 points.

Nomic in Code

So, 30 years ago, I had the idea that it would be fabulous to write some code to referee a Nomic game. However, because interpretation of the rules is so horrendously human, it felt impossible. Today, in 2026, it seems one could maybe get Claude, Gemini, or some other LLM to referee. But, this doesn’t much interest me, either, really. I cannot get any of them to keep track of something that I made them write down. I cannot imagine that I would be happy with their interpretation of whether my move is legal given the current state of the rules nor to amend the rules appropriately if my move is legal.

What felt slightly more attainable 30 years ago would be to make it a battle in code:

  • The players propose deltas to the current code.
  • The players vote on which deltas to approve.
  • If the resulting code declares you the winner, you win.

This was nice and all, but it was also too static. The rules about who can vote and how votes are tallied and such wouldn’t be subject to change.

Nomic in Code in 2026

Fast-forward to last month. Last month, I realized that with the GitHub API interface, I could implement a very Nomic-ish pull request battle game. I can:

  • Gather information about all of the open pull requests on a repository,
  • Checkout a copy of the current main branch of that same repository,
  • Run the code on the main branch of that repository and give it the information that I collected about the open pull requests, and
  • Have the code on the main branch tell me which open pull requests (if any) to accept or reject.

To be truly in Nomic’s full spirit, it would be nice to allow the code in the repository to interact with the GitHub API on its own. Alas, that would immediately let the players vote in changes that expose my GitHub tokens, so it would be a gaping security hole—not only because it would let users impersonate me but because it would let them end-around the actual code in the repository to make changes to the main branch in the repository.

So, as it is, I have a supervisor written in Common Lisp which handles all of the interaction with GitHub and various game repositories (one to play in Common Lisp, one to play in JavaScript, and one to play in Python). The supervisor:

  • fetches all of the open pull requests;
  • annotates each pull request with:
    • all of the reviews on the pull request,
    • all of the comments on the pull request, and
    • all of the commits on the pull request;
  • clones the main branch of the game repository;
  • runs the game code from that main branch giving it the annotated list of open pull requests encoded as JSON on standard input;
  • reads the JSON-encoded output from the game code; and
  • acts accordingly.

The game code, given a list of open pull requests can reply with one of the following messages:

{
  "decision": "winner",
  "name": name-of-winner,
  "message": optional-reason-for-decision
}
{
  "decision": "accept",
  "id": id-number-of-pull-request-to-accept,
  "message": optional-reason-for-decision
}
{
  "decision": "reject",
  "id": id-number-of-pull-request-to-reject,
  "message": optional-reason-for-decision
}
{
  "decision": "defer"
}

The "defer" decision means that there is not enough information at the moment. Maybe, in the future, with other pull requests or other comments or reviews we will be able to make some move.

If the game code replies with anything that isn’t one of the four types of replies shown above, the supervisor assumes the latest merge broke the code and reverts the change.

The Ask

I haven’t been able to drum up enough players for a game in any of my regular haunts. So, I am looking for tolerant players who will help me give it a test run or two to work out the kinks in the supervisor. Some areas where I forsee potential issues:

  • There may be scenarios that cause the game to reach an impasse.
  • There are probably some GitHub responses that the supervisor doesn’t do the right thing with (in fact, I think I just thought of a situation that a malicious player could do if they are a collaborator rather than doing this through forked repos).
  • There might be special issues related to pull requests coming in from forks rather than within the repo which I cannot test without making myself a second GitHub account.
  • Who can say what the optimal number of players is, at this point?

So, if you’re tolerant of some bumps in the process, have a GitHub account (or will make one), and are interested in a Common Lisp battle of pull requests, let me know so we can get a game going.

The post Nomic Coding Game first appeared on nklein software.

Marco AntoniottiAn Update on MK-DEFSYSTEM

· 53 days ago

There are still a few of us (at least two) who are using MK:DEFSYSTEM. The venerable system construction tool has accumulated a lot of ancient cruft, some of which quite convoluted.

Recently I went back to MK:DEFSYSTEM and "cleaned up" some of the code, especially regarding the pathname construction for each component.  I also used some simpler hierarchical tricks using defstruct only.

The result should be more solid and clearer in the steps that comprise some "macro tasks". Of course, a rewrite using CLOS would change the coding style, but the choice has been made to keep the MK:DEFSYSTEM code base quite... retro (and somewhat simple).

Why did I went back to MK:DEFSYSTEM? As usual, it is because of a rabbit-hole I fell into: I will blog about it later on (hint: HEΛP).

MK-DEFSYSTEM quick history as of March 2026

MK-DEFSYSTEM (or MK:DEFSYSTEM, or MAKE:DEFSYSTEM) was originally written by Mark Kantrowitz as part of the original "CMU Lisp Utilities" collection; an early "public" set of Common Lisp code and utilities that, in the writer's opinion form one of the basis of most Common Lisp writing to date.

As stated (by M. Kantrowitz himself) in this file header, the original version of MK-DEFSYSTEM was inspired by the Symbolics DEFSYSTEM (or DEFSYS) tool. Yet, MK-DEFSYSTEM differs significantly from it.

In its original form, MK-DEFSYSTEM was built in the CLtL1 era, accommodated a lot of variance among filesystems and CL implementations and it still bears those idiosycrasies. CLtL2 (1992) first and ANSI (1994) next, started reshaping the code base then.

MK-DEFSYSTEM was originally distributed under a license agreement that made redistribution tricky. In 1999, the writer - that'd be me, Marco Antoniotti - contacted Mark Kantrowitz offering to become a maintainer while reworking the distribution license to hammer some FOSS into it. Mark Kantrowitz graciously agreed and, after that, the writer got literally and physically hugged by a few Common Lisp developers because they could use MK-DEFSYSTEM more freely.

Of course, ASDF came along and it solved the same problems that Symbolics (and Kent Pitman's) DEFSYS and MK-DEFSYSTEM solve, plus much more.

Yet, MK-DEFSYSTEM has some nice features (in the eye of the beholder).

MK-DEFSYSTEM still ships in one file - defsystem.lisp - that you can LOAD in your Common Lisp init file. Of course, a big chunk of its current code base is "backward compatibility" and new ok-we-miss-UIOP-and-or-at-least-CL-FAD functionality, plus an ever growing ongoing commentary like this one.

Given this background, the writer has been maintaining MK-DEFSYSTEM for a long time, and more recently, Madhu has made significant changes (and maintains himself a fork with some bells and whistles of his own) since 2008.

Of course, many other contributors helped over the years, and are acknowledged in the early Change Log and in comments in the code.

In early 2026, the writer cleaned up the code and reworked some of the logic, by factoring out some code from main functions. In particular, the CREATE-COMPONENT-PATHNAMES, GENERATE-COMPONENT-PATHNAMES, COMPONENT-FULL-PATHNAME, COMPONENT-FULL-NAMESTRING interplay is better organized; plus new structures, leveraging DEFSTRUCT :INCLUDE feature have been introduced, rendering the code TYPECASE-able.

MK-DEFSYSTEM is old, but it works. It is quirky but it works (at least for the two or three known users - which, in 2026, is already a big chunk of the Common Lisp users' community). Moreover, it does have, at least in the eye of the beholder, some more user friendly user API, for most use case, especially for plain Common Lisp code.

The current MK-DEFSYSTEM repository is at https://gitlab.common-lisp.net/mantoniotti/mk-defsystem

(*) It is assumed that the reader knows about all the acronyms, tools and systems referred to in the text.


'(cheers)

Robert SmithIdiomatic Lisp and the nbody benchmark

· 54 days ago

When talking to Lisp programmers, you often hear something like, “adapt Lisp to your problem, not your problem to Lisp.” The basic idea is this: if Lisp doesn’t let you easily write a solution to your problem because it lacks some fundamental constructs that make expressing solutions easy, then add them to Lisp first, then write your solution.

That sounds all good and well in the abstract, and maybe we could even come up with some toy examples—say, defining HTTP request routing logic in a nice DSL. But where’s a real example of this that’s not artificial or overengineered?

Recently, on Twitter, I butted into the middle of an exchange between @Ngnghm (a famous Lisp programmer) and @korulang (an account dedicated to a new language called Koru) about Lisp. I’m oversimplifying, but it went something like this:

  • Lisp is slow.
  • No it’s not!
  • Yes it is!
  • No it’s not!
  • Then prove it!

Now, there’s plenty of evidence online that Common Lisp has reasonably good compilers that produce reasonably good machine code, and so the question became more nuanced: Can Lisp be realistically competitive with C without ending up being a mess of unidiomatic code?

Our interlocutor @korulang proposed a benchmark, the “nbody” benchmark from the Computer Language Benchmarks Game. This was of particular interest to them, because they used it as an object of study for their Koru language. To quote their blog post:

We wanted Koru kernels to land in the same ballpark as idiomatic C, Rust, and Zig.

The result was stronger than that.

Our fused n-body kernel, written in straightforward Koru kernel style, came in faster than the plain reference implementations. Every implementation here is "naive" — the obvious, idiomatic version a competent programmer would write in each language. No tricks, no hand-tuning, no -ffast-math: […]

and they proceeded to show Koru being 14% faster than C and 106% faster than Lisp.

Now, putting aside that some of the code and blog post were written with LLMs, there are many questions that are left unanswered here, since computer architecture and operating system matter a lot (where did these benchmarks run?). Moreover, the author buries the lede a little bit and proceeds to show how we might write “unidiomatic” C to match the performance of Koru.

I’m not concerned about nitpicking their approach or rigorously evaluating their claims, but I would like to dwell on this common refrain: “idiomatic”. What is that supposed to mean?

“Idiomatic code” in the context of programming means something like “representative of a fluent computer programmer” and “aligned with the peculiar characteristics of the language”. In some sense, idiomatic code in a particular language shouldn’t stand out amongst other code in that language, and idiomatic code should, in some sense, portray the identity of the language itself.

Idiomatic C is the C that uses terse names, simple loops, and unsafe arithmetic.

Idiomatic Haskell is the Haskell that uses short functions, higher-order abstractions, immutable data structures, and safe constructs.

What about idiomatic Lisp? Well, here’s the rub. A fluent programmer at Lisp doesn’t reach for one paradigmatic toolbox; they weave in and out of imperative, functional, object-oriented, etc. styles without much of a second thought. There’s a sort of “meta” characteristic to Lisp programming: you’re programming the language almost as much as you’re programming the program.

Yes, Lisp has loops, but “loopy code” isn’t intrinsically “Lispy code”. Yes, Lisp has objects, but “OOPy code” isn’t intrinsically “Lispy code”. In my opinion, what makes code “Lispy” is whether or not the programmer used Lisp’s metaprogramming and/or built-in multi-paradigm facilities to a reasonable degree to make the solution to their problem efficient and easy to understand in some global sense. For some problems, that may be “loopy” or “OOPy” or something else. It’s finding a Pareto-efficient syntactic and semantic combination offered by the language, or perhaps one of the programmer’s own creation.

So we get back to the @korulang benchmark challenge. Looking at their repository:

  • nbody.c looks like idiomatic C;
  • nbody.hs looks like wildly unidiomatic Haskell, but the problem is, the idiomatic version would probably be slower;
  • nbody.lisp looks reasonable, though it could easily be improved, but loopy; and
  • The Koru solution kernel_fused.kz looks idiomatic, as far as I can tell for not knowing anything about Koru.

I hesitate to say nbody.lisp is idiomatic. It’s reasonable, it’s straightforward to any imperative-minded programmer, but it’s not Lispy. That doesn’t make it good or bad, but it does lead to the grand question:

Can we use Common Lisp to express a solution to the nbody benchmark in a way that reads more naturally than a direct-from-C port?

I would say that, at face value, Koru’s solution is along the lines of what is more natural relative to the problem itself. Here are the essential bits.

~std.kernel:shape(Body) {
x: f64, y: f64, z: f64,
vx: f64, vy: f64, vz: f64,
mass: f64,
}
~std.kernel:init(Body) {
{ x: 0, y: 0, z: 0, vx: 0, vy: 0, vz: 0, mass: SOLAR_MASS },
{ x: 4.84143144246472090e+00, y: -1.16032004402742839e+00, z: -1.03622044471123109e-01, vx: 1.66007664274403694e-03 * DAYS_PER_YEAR, vy: 7.69901118419740425e-03 * DAYS_PER_YEAR, vz: -6.90460016972063023e-05 * DAYS_PER_YEAR, mass: 9.54791938424326609e-04 * SOLAR_MASS },
{ x: 8.34336671824457987e+00, y: 4.12479856412430479e+00, z: -4.03523417114321381e-01, vx: -2.76742510726862411e-03 * DAYS_PER_YEAR, vy: 4.99852801234917238e-03 * DAYS_PER_YEAR, vz: 2.30417297573763929e-05 * DAYS_PER_YEAR, mass: 2.85885980666130812e-04 * SOLAR_MASS },
{ x: 1.28943695621391310e+01, y: -1.51111514016986312e+01, z: -2.23307578892655734e-01, vx: 2.96460137564761618e-03 * DAYS_PER_YEAR, vy: 2.37847173959480950e-03 * DAYS_PER_YEAR, vz: -2.96589568540237556e-05 * DAYS_PER_YEAR, mass: 4.36624404335156298e-05 * SOLAR_MASS },
{ x: 1.53796971148509165e+01, y: -2.59193146099879641e+01, z: 1.79258772950371181e-01, vx: 2.68067772490389322e-03 * DAYS_PER_YEAR, vy: 1.62824170038242295e-03 * DAYS_PER_YEAR, vz: -9.51592254519715870e-05 * DAYS_PER_YEAR, mass: 5.15138902046611451e-05 * SOLAR_MASS },
}
| kernel k |>
std.kernel:step(0..iterations)
|> std.kernel:pairwise {
const dx = k.x - k.other.x;
const dy = k.y - k.other.y;
const dz = k.z - k.other.z;
const dsq = dx*dx + dy*dy + dz*dz;
const mag = DT / (dsq * @sqrt(dsq));
k.vx -= dx * k.other.mass * mag;
k.vy -= dy * k.other.mass * mag;
k.vz -= dz * k.other.mass * mag;
k.other.vx += dx * k.mass * mag;
k.other.vy += dy * k.mass * mag;
k.other.vz += dz * k.mass * mag;
}
|> std.kernel:self {
k.x += DT * k.vx;
k.y += DT * k.vy;
k.z += DT * k.vz;
}
| computed c |>
capture({ energy: @as(f64, 0) })
| as acc |>
for(0..5)
| each i |>
captured { energy: acc.energy + 0.5*c[i].mass*(c[i].vx*c[i].vx+c[i].vy*c[i].vy+c[i].vz*c[i].vz) }
|> for(i+1..5)
| each j |>
captured { energy: acc.energy - c[i].mass*c[j].mass / @sqrt((c[i].x-c[j].x)*(c[i].x-c[j].x)+(c[i].y-c[j].y)*(c[i].y-c[j].y)+(c[i].z-c[j].z)*(c[i].z-c[j].z)) }
| captured final |>
std.io:print.blk {
{{ final.energy:d:.9 }}
}

Can we achieve something similar in Lisp?

First, let’s make a baseline. I’m running Ubuntu Noble with a “AMD RYZEN AI MAX+ PRO 395” with a clock speed that varies between 0.6-5 GHz. I am also using SBCL 2.6.3 and gcc 13.3. Using nbody.lisp as a starting point, I modified it for a few easy wins. I’ll call this version nbody-lisp-conventional. A quick benchmark reveals that the loopy Lisp code is only about 20% slower than the C code compiled with gcc -O3 -ffast-math -march=native.

$ ./nbody-lisp-conventional 50000000
-0.169286396
timing: 2000 ms
$ ./nbody-c 50000000
-0.169286396
timing: 1662 ms

As a Lisp programmer, it’s not surprising that it’s a little slower. The number of person-years that have gone into C compilers to optimize idiomatic C code makes the development effort behind SBCL, the most popular open-source Lisp compiler, look like a rounding error.

Now that we have a baseline, our goal is to come up with a nicer Lisp program that also improves the timing.

Our approach will be simple. We will create a library.lisp that contains new language constructs of a similar ilk to Koru, and we will use them to implement the nbody benchmark in impl.lisp. Some rules:

  • No compile-time precomputation or caching. I can’t just compute the answer at compile time, or cache a sub-computation that makes the full one trivial.
  • No fundamental algorithm changes. I can’t use a different integrator, for example.
  • Using assembly is allowed, but it must only make use of the facilities offered by the Lisp compiler (i.e., no external tools), and the implementation of nbody itself must be understandable without knowing assembly. In other words, it should be sufficiently hidden, and in principle easily substitutable with portable code.
  • Library code must be in principle useful for other similar tasks. It should not be hyper-specialized to this specific problem instance, but instead be useful for this general class of problems.

The third rule is more rigorous than it looks. It means we can’t just have a solve-nbody problem which dispatches to assembly.

To accomplish the above, we define a kernel DSL. The DSL allows us to express how elements of a composite transform, maintaining just enough invariants to allow them to be handled efficiently. These kernels are then compiled into efficient code, more efficient than ordinary loopy Lisp allows for.

Our attention will be focused on a proof-of-concept library of functionality for writing particle simulators. The operators we define are:

  • define-kernel-shape: Define the data to be transformed by each kernel. This would be the data to characterize the static and dynamic properties of a particle in motion, as well as the number of particles under consideration.
  • define-kernel-step: Define a kernel as a sequence of existing ones.
  • define-self-kernel: Define a read-write kernel that operates on each element independently, without access to other elements (i.e., a map operation).
  • define-pairwise-kernel: Define a read-write kernel that operates on all pairs of elements, reduced by symmetry (i.e., (i,j) and (j,i) are considered only once).
  • define-reduction-kernel: Define a read-only kernel that does reduction of a sequence into a single value (i.e., a reduce operation).

This collection of five operators forms a miniature, re-usable language. These broadly recapitulate those of Koru, and allow us to write something that looks like this:

(defconstant +solar-mass+ (* 4d0 pi pi))
(defconstant +days-per-year+ 365.24d0)
(defconstant +dt+ 0.01d0)
(define-kernel-shape body 5
x y z vx vy vz mass)
(defparameter *system*
(make-body-system
(list :x 0d0 :y 0d0 :z 0d0
:vx 0d0 :vy 0d0 :vz 0d0
:mass +solar-mass+)
...))
(define-pairwise-kernel advance-forces (s body dt)
(let* ((dx (- i.x j.x))
(dy (- i.y j.y))
(dz (- i.z j.z))
(dsq (+ (+ (* dx dx) (* dy dy)) (* dz dz)))
(mag (/ dt (* dsq (sqrt dsq)))))
(let ((dm-j (* mag j.mass))
(dm-i (* mag i.mass)))
(decf i.vx (* dx dm-j))
(decf i.vy (* dy dm-j))
(decf i.vz (* dz dm-j))
(incf j.vx (* dx dm-i))
(incf j.vy (* dy dm-i))
(incf j.vz (* dz dm-i)))))
(define-self-kernel advance-positions (s body dt)
(incf self.x (* dt self.vx))
(incf self.y (* dt self.vy))
(incf self.z (* dt self.vz)))
(define-reduction-kernel (energy e 0d0) (s body)
(:self
(+ e (* (* 0.5d0 self.mass)
(+ (+ (* self.vx self.vx) (* self.vy self.vy))
(* self.vz self.vz)))))
(:pair
(let* ((dx (- i.x j.x))
(dy (- i.y j.y))
(dz (- i.z j.z)))
(- e (/ (* i.mass j.mass)
(sqrt (+ (+ (* dx dx) (* dy dy))
(* dz dz))))))))
(define-kernel-step run-simulation (system body n :params ((dt double-float)))
(advance-forces dt)
(advance-positions dt))

Well, in fact, this isn’t an ideal approximation, it’s almost exactly how it turned out. Given this is a proof of concept, we sometimes have to write some Lisp things a little funny. For example, you’ll notice we write:

(+ (+ (* dx dx) (* dy dy)) (* dz dz))

instead of the far more readable

(+ (* dx dx) (* dy dy) (* dz dz))

Both are completely valid and both can be used. So why the former? It is a result of a limitation of a little feature I built in: auto-vectorization. The vectorizer walks the mathematical expressions and replaces them with fast SIMD variants instead. Here’s a little fragment showing this rewrite rule:

...
(case (car expr)
;; (+ a (* b c)) -> fmadd(a,b,c)
((+)
(let ((args (cdr expr)))
(cond
((and (= (length args) 2) (mul-p (second args)))
`(%%fmadd-pd ,(xf (first args))
,(xf (second (second args)))
,(xf (third (second args)))))
...

The implementation of these kernel macros in library.lisp weighs in at just under 700 lines, and includes optional x64 SIMD auto-vectorization.

Well, for the nail biting moment, how does it compare? I made a Makefile that compares the idiomatic C against the loopy Lisp against our kernel DSL Lisp. It does a median-of-3. Running this on my computer gives:

$ make bench
=== C (gcc -O3 -ffast-math) ===
-0.169286396
runs: 1657 1664 1653 ms
median: 1657 ms
=== Lisp (SBCL, conventional loops) ===
-0.169286396
runs: 1991 2009 2005 ms
median: 2005 ms
=== Lisp (SBCL, kernel syntax) ===
-0.169286396
runs: 1651 1651 1652 ms
median: 1651 ms

So, in fact, we have matched the performance of C almost exactly. Furthermore, the generated code is still not as lean as it could be. Not to put too fine a point on it, but, <100 lines of Lisp, supported by

  • 700 lines of library code and about 4 hours of my time; and
  • 500k lines of its host compiler sbcl

has performance parity and greater readability/reusability than <100 lines of C, supported by

  • ~5,000k lines of just the C part of its host compiler gcc.

None of this is to make an argument that Lisp is “better”, or that there isn’t merit to avoiding custom DSLs in certain circumstances, or that the world doesn’t have room for more custom home-grown compilers and parsers, but I think this is the clearest possible, quasi-realistic demonstration that idiomatic Lisp can be as fast as idiomatic C without tremendous work, whilst netting additional benefits unique to Lisp.

All code is available here.

ECL NewsECL 26.3.27 release

· 60 days ago

We are announcing a new stable ECL release. This release highlights:

  • bytecodes closures are now faster and avoid capturing unused parts of the lexical environment
  • improvements to the native compiler, including better separation between compiler frontend and backend, reduced function call overhead, more aggressive dead code elimination and many internal improvements and bug fixes
  • hash table implementation improvements and bug fixes for collisions
  • streams: extensions EXT:PEEK-BYTE, EXT:UNREAD-BYTE, GRAY:STREAM-PEEK-BYTE and GRAY:STREAM-UNREAD-BYTE, bugfixes and implementation refactor
  • the codebase has been updated to conform to the C23 standard
  • simplified procedure for cross-compiling ECL itself
  • support for cross-compilation of Common Lisp code to different targets using a new :TARGET option for COMPILE-FILE
  • some fixes for the emscripten target

The release also incorporates many other bug fixes and performance improvements as well as an updated manual. We'd like to thank all people who contributed to ECL with code, testing, issue reports and otherwise.

People listed here contributed code in this iteration: Daniel Kochmański, Marius Gerbershagen, Tarn W. Burton, Kirill A. Korinsky, Dmitry Solomennikov, Kevin Zheng, Mark Shroyer and Sebastien Marie.

People listed here did extensive release candidate testing on various platforms: Marius Gerbershagen, Daniel Kochmański, Dima Pasechnik, Matthias Köppe, Jeremy List, Mark Damon Hughes and Paul Ruetz.

This release is available for download in a form of a source code archive (we do not ship prebuilt binaries):

Finally, a note on the release schedule: ECL releases often take some time to come out, partially because we do extensive testing against supported platforms and existing libraries to find regressions. In the meantime all improvements are incrementally incorporated in the branch develop. It is considered stable and it is tested and reviewed with necessary dilligence. If release cycle is too slow for your needs, then we suggest following the branch develop for the most recent changes.

Happy Hacking,
The ECL Developers

Christoph BreitkopfFunctional Valhalla?

· 75 days ago

Pointer-rich data layouts lead to suboptimal performance on modern hardware. For an excellent introduction to this, see the article The Road to Valhalla. While it is specifically about Java, many parts of the article also apply to other languages. To summarize some of the key points of the article:

  • In 1990, a main memory fetch was about as expensive as an arithmetic operation. Now, it might be a hundred times slower.
  • A pointer-rich data layout involving indirections between data at different locations is not ideal for today's hardware.
  • A language should make flat (cache-efficient) and dense (memory-efficient) memory layouts possible without compromising abstraction or type safety.

Consider a vector of records (or tuples, structures, product types - I'll stay with "record" in this article). A pointer-rich layout has each record allocated separately in the heap, with a vector containing pointers to the records. For example, given a "Point" record of two numbers:

pikchr diagram

The flat and dense layout has the records directly in the array:

pikchr diagram

(Note that there is another flat layout, namely, using one vector per field of the record. This is better suited to instruction-level parallelism or specialized hardware (e.g., GPUs), especially when the record fields have different sizes. But it is less suited for general-purpose computing, as reading a single vector element requires one memory access per field, whereas the "vector of records" layout above requires only one access per record. Such a layout can be easily implemented in any language that has arrays of native types, whether in the language itself or in a library (e.g., OCaml's Owl library). Thus, in this article, I will only consider the "array of records" layout above.)

Functional language considerations

Things should be much easier in functional languages than in Java: we have purity, referential transparency, and everything is a value. So it should be simple enough to store these values in memory in their native representation. But there are reasons that that is often not the case in practice:

  • Lazyness: a value can be a computation that produces a value only when needed.
  • Layout polymorphism: unless we replicate the code for every type (as, for example, Rust does), we need to be able to store every possible value in the same kind of slot.
  • Dynamically typed languages require type information at runtime.
  • Functional languages often have automatic memory management, which may require runtime type information.
  • Many of our languages are not purely functional, but contain impure features.
  • Pure languages often lack traditional vectors or arrays, since making them perform well in immutable code is not easy.
  • Historical reasons: Graph reduction was a common implementation technique for lazy languages, and graphs involve pointers.
  • Implementation restrictions: not being mainstream, fewer resources are devoted to implementation and optimization.

Many implementations can not even lay out native types flat in records, so a Point record of IEEE 754 double-precision numbers may actually look like this in memory:

pikchr diagram

The (very short) List

So, given a record type, which functional languages allow a collection of values of that type to have a flat, linear memory layout? The number of programming languages that claim to be "functional" is huge, so the ones listed here are just a selection based on my preferences - mainly languages that allow that layout, and some I have some experience with and can speculate on how easy or hard it would be to add that as a library or extension.

Since the Point record can be misleading in its simplicity when it comes to the question of whether the functionality could be implemented as a library, I'll point out that there are records where the layout is a bit more interesting:

  • Records containing different types with different storage sizes, for example, one 64-bit float and one 32-bit integer. On most architectures, this will require 4 bytes of padding between elements.
  • Records containing native values along with something that has to be represented as a pointer, for example, a reference-type or a lazy value. In a flat layout, this means that every nth element will be a pointer, requiring special support from the memory management system, either by providing layout information or by using a conservative GC that treats everything as a potential pointer.

Pure languages:

Clean

Yes: Clean has unboxed arrays of records in the base language.

Caveat: it does not have integer types of specific sizes and only one floating-point type, making it harder to reduce memory usage by using the smallest type just large enough to support the required value range. It seems possible to implement such types in a library (the mTask system does that).

Futhark

No. Futhark does not intend to be a general-purpose language, so this is not surprising.

I mention it here because it does have arrays of records, but, since it targets GPUs and related hardware, it uses the "record of arrays" layout mentioned above.

Haskell

Yes. Not in the base language, but there is library support via Data.Vector.Unboxed. Types that implement the Unbox type class can be used in these vectors. Many basic types and tuples have an Unbox instance. However, when you care about efficiency, you probably do not want to use tuples but rather a data type with strict fields, i.e., not:

type Point = (Double, Double)

but:

data Point = Point !Double !Double

Writing an Unbox instance for such a type is not trivial. The vector-th-unbox library makes it easier, but requires Template Haskell. Unboxed vectors are implemented by marshalling the values to byte arrays, so records with pointer fields are not supported.

Impure Languages

F#

Yes, even records with pointer fields. Records have structural equality, and you can use structs or the [<Struct>] attribute to get a flat layout.


And that's all I could find. Unless I follow Wikipedia's list of functional programming languages, which contains languages such as C++, C#, Rust, or Swift, that allow the flat layout, but don't really fit my idea of a functional language. But SML, OCaml, Erlang (Elixir, Gleam), Scala? Not that I could see (but please correct me if I'm wrong).

Rolling your own

Since there is a library implementation for Haskell, maybe that's a possibility for other languages?

You should be able to implement flat layouts in any language that supports byte vectors. More interesting is how well such a library fits into the language, and whether a user of the library has to write code or annotations for user-defined record types, or whether the library can handle part or all of that automagically.

I'll only mention my beloved Lisp/Scheme here. Lisp's uniform syntax and macro system are a bonus here, but the lack of static typing makes things harder.

In Scheme, R6RS (and R7RS with the help of some SRFIs) has byte-vectors and marshalling to/from them in the standard library. But Scheme does not have type annotations, so you either need to offer a macro to define records with typed fields or to define how to marshal the fields of a regular (sealed) record. Since you can shadow standard procedures in a library, you can write code that looks like regular Scheme code, but, perhaps surprisingly, loses identity when storing/retrieving values from records:

(let ((vec (make-typed-vector 'point 1000))
      (pt (make-point x y)))
  (vector-set! vec 0 pt)
  (eq? (vector-ref vec 0) pt))
#f

(But then, you probably shouldn't be using eq? when doing functional programming in Scheme).

The same approach is possible in Common Lisp. In contrast to Scheme, it does have optional type annotations, and, together with a helper library for accessing the innards of floats and either the meta-object protocol to get type information or (probably better) a macro to define typed records, an implementation should be reasonably straightforward. Making it play nice with inheritance and the dynamic nature of Common Lisp (e.g., adding slots to classes or even changing an object's class at runtime) would be a much harder undertaking.

Conclusion

Of the functional languages I looked at, only F# fully supports flat and dense memory layouts. Among the pure languages, Haskell and Clean come close.

The question is how important this really is. There's a good argument to be made for turning to more specialized languages like Futhark if you mainly care about performance. On the other hand, having a uniform codebase in one language also has advantages.

Then, the performance story has changed, too. While the points Project Valhalla raises remain true in principle, processor designers are aware of this as well. They are doing their best to hide memory latency with techniques such as out-of-order execution or humongous caches. Thus, on a modern CPU, the effects of a pointer-rich layout are often only observable with large working set sizes.

Still, given the plethora of imperative language that can get you to Valhalla, support for this in the functional landscape seems lacking. In the future, I hope to see more languages or libraries that will make this possible.

Scott L. BursonFSet v2.3.0: Transients!

· 80 days ago

FSet v2.3.0 added transients!  These make it faster to populate new collections with data, especially as the collections get large.  I shamelessly stole the idea from Clojure.

They are currently implemented only for the CHAMP types ch-set, ch-map, ch-2-relation, ch-replay-set, and ch-replay-map.

The term "transient" contrasts with "persistent".  I'm using the term "persistent" in its functional-data-structure sense, as Clojure does: a data structure is persistent if multiple states of it can coexist in memory efficiently.  (The probably more familiar use of the term is in the database sense, where it refers to nonvolatile storage of data.)  FSet collections have, up to now, all been persistent in this sense; a point modification to one, such as by with or less, takes only O(log n) space and time to return a new state of the collection, without disturbing the previous state.

A transient encapsulates the internal tree of a collection so as to guarantee that it holds the only pointer to the tree; this allows modifications to tree nodes to be made in-place, so long as the node has sufficient allocated space.  Once the collection is built, the tree is in the same format that existing FSet code expects, and can be accessed and functionally updated as usual.

Some quick micro-benchmarking suggests that speedups, for constructing a set from scratch, range from 1.6x at size 64 to as much as 2.4x at size 4096. 

You don't necessarily even have to use transients explicitly in order to benefit from them.  Some FSet builtins such as filter and image use them now.  The GMap result types ch-set etc. also use them.

For details, see the GitLab MR.


Robert SmithBeating Bellard's formula

· 83 days ago

By Robert Smith

Fabrice Bellard came up with a computationally efficient formula for calculating the nth hexadecimal digit of $\pi$ without calculating any of the previous n−1. It’s called Bellard’s formula. It wasn’t the first of its kind, but in terms of computational efficiency, it was a substantial improvement over the original, elegant Bailey-Borwein-Plouffe formula. Due to the trio’s discovery, these formulas are often called BBP-type formulas.

Over the years, numerous BBP-type formulas have been discovered. In fact, Bailey gives us a recipe to search for them using integer-relation algorithms. In simple terms, we can just guess formulas, and run a computation to see if it likely equals $\pi$ with high confidence. If we do find one, then we can use it as a conjecture to prove formally.

Like Bellard and many others, I ran a variant of Bailey’s recipe, effectively doing a brute-force search, highly optimized and in parallel. The search yielded another formula that is computationally more efficient than Bellard’s formula. The identity is as follows:

$$ \pi = \sum_{k=0}^{\infty} \frac{1}{4096^k} \left( \frac{1}{6k+1} - \frac{2^{-5}}{6k+3} + \frac{2^{-8}}{6k+5} + \frac{2}{8k+1} - \frac{2^{-5}}{8k+5} + \frac{2^{-1}}{12k+3} - \frac{2^{-4}}{12k+7} - \frac{2^{-8}}{12k+11} \right). $$

It converges at a rate of 12 bits per term. We will prove convergence, and then prove the identity itself (with a little computer assistance). As it turns out, an equivalent form of this formula was already discovered, which we will discuss as well. Finally, we’ll show a very simple implementation in Common Lisp.

Proof of convergence

Write the series as $S := \sum_{k=0}^{\infty} 4096^{-k}R(k)$. Since $R(k)\in O(1/k)$, convergence is dominated by the geometric term $4096^{-k}$:

$$ \lim_{k \to \infty} \left\vert \frac{R(k+1)}{4096^{k+1}} \middle/ \frac{R(k)}{4096^{k}} \right\vert = \frac{1}{4096}. $$

By the ratio test, the series converges absolutely. Since $4096 = 2^{12}$, each additional term contributes exactly 12 bits of precision.

Bellard’s formula converges at 10 bits per term and requires the evaluation of 7 fractions. The above converges at 12 bits per term, and requires the evaluation of 8 fractions. So while we require 20% fewer terms, each term requires about 14% more arithmetic. So, net-net, this formula is approximately 5-6% more efficient.

Proof of identity via a definite integral

Consider $1/(nk+j) = \int_{0}^{1} x^{nk+j-1} dx$. For positive integers $n$ and $b$, we get

$$ \sum_{k=0}^{\infty} \frac{1}{b^k}\cdot\frac{1}{nk+j} = \sum_{k=0}^{\infty} \int_{0}^{1} \left(\frac{x^n}{b}\right)^k x^{j-1} dx. $$

We can swap the sum and integral via the Lebesgue dominated convergence theorem, since the power series $\sum (x^n/b)^k$ converges uniformly for $x \in [0, 1]$ and $b > 1$. Using this and summing the geometric series gives:

$$ \int_{0}^{1} x^{j-1} \sum_{k=0}^{\infty} \left(\frac{x^n}{b}\right)^k dx = \int_{0}^{1} \frac{x^{j-1}}{1 - x^n/b} dx. $$

We now apply this to $S$ termwise with $b=4096=2^{12}$:

$$ S = \int_0^1 \left( \frac{x^{0}}{1 - \frac{x^6}{2^{12}}} - 2^{-5} \frac{x^{2}}{1 - \frac{x^6}{2^{12}}} + 2^{-8} \frac{x^{4}}{1 - \frac{x^6}{2^{12}}} + 2 \frac{x^{0}}{1 - \frac{x^8}{2^{12}}} - 2^{-5} \frac{x^{4}}{1 - \frac{x^8}{2^{12}}} + 2^{-1} \frac{x^{2}}{1 - \frac{x^{12}}{2^{12}}} - 2^{-4} \frac{x^{6}}{1 - \frac{x^{12}}{2^{12}}} - 2^{-8} \frac{x^{10}}{1 - \frac{x^{12}}{2^{12}}} \right) dx. $$

At this point, you could try to algebra your way through, expanding, using the substitution $x=2u$, etc. ultimately yielding a nice denominator $(u^2\pm 2u+2)(u^6-64)(u^{12}-1)$. Maybe compute some residues. Or, just CAS your way through.

% fricas
FriCAS Computer Algebra System
Version: FriCAS 2025.12.23git built with sbcl 2.5.2.1852-1f3beec71
Timestamp: Wed Mar 4 12:41:38 EST 2026
-----------------------------------------------------------------------------
Issue )copyright to view copyright notices.
Issue )summary for a summary of useful system commands.
Issue )quit to leave FriCAS and return to shell.
-----------------------------------------------------------------------------
(1) -> f := (1/(1 - x^6/4096))
- (1/32)*x^2/(1 - x^6/4096)
+ (1/256)*x^4/(1 - x^6/4096)
+ 2*1/(1 - x^8/4096)
- (1/32)*x^4/(1 - x^8/4096)
+ (1/2)*x^2/(1 - x^12/4096)
- (1/16)*x^6/(1 - x^12/4096)
- (1/256)*x^10/(1 - x^12/4096);
Type: Fraction(Polynomial(Fraction(Integer)))
(2) -> normalize(integrate(f, x = 0..1))
3 1 11 19 1
(2) 2 atan(-) - 2 atan(-) + 2 atan(--) + 2 atan(--) + 2 atan(-)
2 2 24 48 4
Type: Expression(Fraction(Integer))

So now we just need to show the arctans all collapse to $\pi$. Recall the identity

$$ \tan^{-1} a \pm \tan^{-1} b = \tan^{-1}\left(\frac{a\pm b}{1\mp ab}\right). $$

The sum of the first four terms can be calculated easily in Common Lisp:

% sbcl --no-inform
* (defun combine (a b) (/ (+ a b) (- 1 (* a b))))
COMBINE
* (reduce #'combine '(3/2 -1/2 11/24 19/48))
4

So we have $2\big(\tan^{-1}4 + \tan^{-1}(1/4)\big)$, and with our final elementary trig identity $\tan^{-1} (a/b) = \pi/2 - \tan^{-1} (b/a)$, we find $S = \pi$.

A new discovery?

Of course, I was excited to find this formula, but after some internet spelunking, it turns out it had already been discovered by Géry Huvent and Boris Gourévitch, perhaps independently. Gourévitch doesn’t credit Huvent as he does with other formulas, but he does say “[…] furthermore, we can obtain BBP formula […] by using what Gery Huvent calls the denomination tables […].” Daisuke Takahashi cites Huvent’s website in this 2019 paper published in The Ramanujan Journal. In all cases, they write the formula in the following way:

$$ \frac{1}{128} \sum _{k=0}^{\infty} \frac{1}{2^{12k}}\left( \frac{768}{24 k+3}+\frac{512}{24k+4}+\frac{128}{24 k+6}-\frac{16}{24 k+12}-\frac{16}{24 k+14}-\frac{12}{24 k+15}+\frac{2}{24 k+20}-\frac{1}{24 k+22}\right), $$

which is structurally equivalent to $S$.

Despite having been known already, this formula doesn’t appear to be well known. As such, I hope this blog post brings more attention to it.

Simple implementation

Here is a simple implementation of digit extraction using BBP-type formulas in Common Lisp:

(defun %pow2-mod (exponent modulus)
(cond
((= modulus 1) 0)
((zerop exponent) 1)
(t
(let ((result 1)
(base (mod 2 modulus))
(e exponent))
(loop :while (plusp e) :do
(when (oddp e)
(setf result (mod (* result base) modulus)))
(setf base (mod (* base base) modulus)
e (ash e -1)))
result))))
(defun %scaled-frac-of-power-two (exponent denom)
(cond
((>= exponent 0)
(let ((residue (%pow2-mod exponent denom)))
(floor (ash residue *precision-bits*) denom)))
(t
(let ((effective-bits (+ *precision-bits* exponent)))
(if (minusp effective-bits)
0
(floor (ash 1 effective-bits) denom))))))
(defun %series-scaled-frac (bit-index bbp-series k-step global-shift alternating-p)
;; A series is a list of series terms. A series term is a quadruple
;; (SIGN SHIFT DENOM-MULTIPLIER DENOM-OFFSET) representing the summand
;; SIGN * 2^SHIFT / (DENOM_MULTIPLIER * k + DENOM_OFFSET).
(let* ((modulus (ash 1 *precision-bits*))
(max-shift (loop :for term :in bbp-series :maximize (second term)))
(k-max (max 0 (ceiling (+ bit-index ; conservative bound
global-shift
max-shift
*precision-bits*
*guard-bits*)
k-step))))
(loop :with acc := 0
:for k :from 0 :to k-max :do
(let ((k-sign (if (and alternating-p (oddp k)) -1 1))
(k-factor (* k-step k)))
(dolist (term bbp-series)
(destructuring-bind (term-sign shift den-mul den-add) term
(let* ((denom (+ den-add (* den-mul k)))
(exponent (+ bit-index global-shift shift (- k-factor)))
(piece (%scaled-frac-of-power-two exponent denom))
(signed (* k-sign term-sign)))
(when (plusp piece)
(setf acc (mod (+ acc (* signed piece)) modulus)))))))
:finally (return acc))))
(defun %nth-hex-from-series (n terms k-step global-shift alternating-p)
(let* ((bit-index (* 4 n)))
(ldb (byte 4 (- *precision-bits* 4))
(%series-scaled-frac bit-index
terms
k-step
global-shift
alternating-p))))

This implementation uses Lisp’s arbitrary precision integer arithmetic. A “real” implementation would use more efficient arithmetic, but this will suffice for some basic testing. Now we can write functions to use the Bellard formula and the new formula:

(defparameter +bellard-terms+
'((-1 5 4 1)
(-1 0 4 3)
(+1 8 10 1)
(-1 6 10 3)
(-1 2 10 5)
(-1 2 10 7)
(+1 0 10 9)))
(defun bellard-nth-hex (n)
(%nth-hex-from-series (* 4 n) +bellard-terms+ 10 -6 t))
(defparameter +new-terms+
'((+1 0 6 1)
(-1 -5 6 3)
(+1 -8 6 5)
(+1 1 8 1)
(-1 -5 8 5)
(+1 -1 12 3)
(-1 -4 12 7)
(-1 -8 12 11)))
(defun new-nth-hex (n)
(%nth-hex-from-series (* 4 n) +new-terms+ 12 0 nil))

Let’s make sure they agree for the first 1000 hex digits:

CL-USER> (loop :for i :below 1000
:always (= (bellard-nth-hex i) (new-nth-hex i)))
T

And now let’s look at timing comparisons. Here’s a little driver:

(defun compare-timings (n)
(flet ((time-it (f n)
(sb-ext:gc :full t)
(let ((start (get-internal-real-time)))
(funcall f n)
(- (get-internal-real-time) start))))
(loop :repeat n
:for index := 1 :then (* 10 index)
:for bellard := (time-it #'bellard-nth-hex index)
:for new := (time-it #'new-nth-hex index)
:do (format t "~v,' D: new is ~A% faster than bellard~%" n index
(round (* 100 (- bellard new)) bellard)))))

And the results if the timing up to the one millionth hexadecimal digit:

CL-USER> (compare-timings 7)
1 : new is 81% faster than bellard
10 : new is 7% faster than bellard
100 : new is 6% faster than bellard
1000 : new is 5% faster than bellard
10000 : new is 4% faster than bellard
100000 : new is 3% faster than bellard
1000000: new is 4% faster than bellard

As predicted, though imperfect a test, it’s consistently faster across a few orders of magnitude.

Neil MunroNingle Tutorial 15: Pagination, Part 2

· 87 days ago

Contents

Introduction

Welcome back! We will be revisiting the pagination from last time, however we are going to try and make this easier on ourselves, I built a package for pagination mito-pager, the idea is that much of what we looked at in the last lesson was very boiler plate and repetitive so we should look at removing this.

I will say, my mito-pager can do a little more than just what I show here, it has two modes, you can use paginate-dao (named this way so that it is familiar to mito) to paginate over simple models, however, if you need to perform complex queries there is a macro with-pager that you can use to paginate. It is this second form we will use in this tutorial.

There is one thing to bear in mind, when using mito-pager, you must implement your data retrieval functions in such a way to return a values object, as mito-pager relies on this to work.

I encourge you to try the library out in other use-cases and, of course, if you have ideas, please let me know.

Changes

Most of our changes are quite limited in scope, really it's just our controllers and models that need most of the edits.

ningle-tutorial-project.asd

We need to add the mito-pager package to our project asd file.

- :ningle-auth)
+ :ningle-auth
+ :mito-pager)

src/controllers.lisp

Here is the real payoff! I almost dreaded writing the sheer volume of the change but then realised it's so simple, we only need to change our index function, and it may be better to delete it all and write our new simplified version.

(defun index (params)
  (let* ((user (gethash :user ningle:*session*))
         (req-page (or (parse-integer (or (ingle:get-param "page" params) "1") :junk-allowed t) 1))
         (req-limit (or (parse-integer (or (ingle:get-param "limit" params) "50") :junk-allowed t) 50)))
    (flet ((get-posts (limit offset) (ningle-tutorial-project/models:posts user :offset offset :limit limit)))
      (mito-pager:with-pager ((posts pager #'get-posts :page req-page :limit req-limit))
        (djula:render-template* "main/index.html" nil :title "Home" :user user :posts posts :pager pager)))))

This is much nicer, and in my opinion, the controller should be this simple.

src/main.lisp

We need to ensure we include the templates from mito-pager, this is a simple one line change.

 (defun start (&key (server :woo) (address "127.0.0.1") (port 8000))
    (djula:add-template-directory (asdf:system-relative-pathname :ningle-tutorial-project "src/templates/"))
+   (djula:add-template-directory (asdf:system-relative-pathname :mito-pager "src/templates/"))

src/models.lisp

As mentioned at the top of this tutorial, we have to implement our data retrieval functions in a certain way. While there are some changes here, we ultimately end up with less code.

We can start by removing the count parameter, we wont be needing it in this implementation, and since we don't need the count parameter anymore, the :around method can go too!

- (defgeneric posts (user &key offset limit count)
+ (defgeneric posts (user &key offset limit)
-
- (defmethod posts :around (user &key (offset 0) (limit 50) &allow-other-keys)
-   (let ((count (mito:count-dao 'post))
-         (offset (max 0 offset))
-         (limit (max 1 limit)))
-     (if (and (> count 0) (>= offset count))
-       (let* ((page-count (max 1 (ceiling count limit)))
-              (corrected-offset (* (1- page-count) limit)))
-         (posts user :offset corrected-offset :limit limit))
-       (call-next-method user :offset offset :limit limit :count count))))

There's two methods to look at, the first is when the type of user is user:

-
- (defmethod posts ((user user) &key offset limit count)
+ (defmethod posts ((user user) &key offset limit)
...
      (values
-         (mito:retrieve-by-sql sql :binds params)
-         count
-         offset)))
+         (mito:retrieve-by-sql sql :binds params)
+         (mito:count-dao 'post))))

The second is when the type of user is null:

-
- (defmethod posts ((user null) &key offset limit count)
+ (defmethod posts ((user null) &key offset limit)
...
    (values
-       (mito:retrieve-by-sql sql)
-       count
-       offset)))
+       (mito:retrieve-by-sql sql)
+       (mito:count-dao 'post))))

As you can see, all we are really doing is relying on mito to do the lions share of the work, right down to the count.

src/templates/main/index.html

The change here is quite simple, all we need to do is to change the path to the partial, we need to simply point to the partial provided by mito-pager.


- {% include "partials/pager.html" with url="/" title="Posts" %}
+ {% include "mito-pager/partials/pager.html" with url="/" title="Posts" %}

src/templates/partials/pagination.html

This one is easy, we can delete it! mito-pager provides its own template, and while you can override it (if you so wish), in this tutorial we do not need it anymore.

Conclusion

I hope you will agree that this time, using a prebuilt package takes a lot of the pain out of pagination. I don't like to dictate what developers should, or shouldn't use, so that's why last time you were given the same information I had, so if you wish to build your own library, you can, or if you want to focus on getting things done, you are more than welcome to use mine, and of course, if you find issues please do let me know!

Learning Outcomes

Level Learning Outcome
Understand Understand how third-party pagination libraries like mito-pager abstract boilerplate pagination logic, and how with-pager expects a fetch function returning (values items count) to handle page clamping, offset calculation, and boundary correction automatically.
Apply Apply flet to define a local adapter function that bridges the project's posts generic function with mito-pager's expected (lambda (limit offset) ...) interface, and use with-pager to reduce controller complexity to its essential logic.
Analyse Analyse what responsibilities were transferred from the manual pagination implementation to mito-pager — count caching, boundary checking, offset calculation, page correction, and range generation — contrasting the complexity of both approaches.
Create Refactor a manual pagination implementation to use mito-pager by simplifying model methods to return (values items count), replacing complex multi-step controller calculations with with-pager, and delegating the pagination template partial to the library.

Github

  • The link for the custom pagination part of the tutorials code is available here.

Common Lisp HyperSpec

Symbol Type Why it appears in this lesson CLHS
defpackage Macro Define project packages like ningle-tutorial-project/models, /forms, /controllers. http://www.lispworks.com/documentation/HyperSpec/Body/m_defpac.htm
in-package Macro Enter each package before defining models, controllers, and functions. http://www.lispworks.com/documentation/HyperSpec/Body/m_in_pkg.htm
defgeneric Macro Define the simplified generic posts function signature with keyword parameters offset and limit (the count parameter is removed). http://www.lispworks.com/documentation/HyperSpec/Body/m_defgen.htm
defmethod Macro Implement the simplified posts methods for user and null types (the :around validation method is removed). http://www.lispworks.com/documentation/HyperSpec/Body/m_defmet.htm
flet Special Operator Define the local get-posts adapter function that wraps posts to match mito-pager's expected (lambda (limit offset) ...) interface. http://www.lispworks.com/documentation/HyperSpec/Body/s_flet_.htm
let* Special Operator Sequentially bind user, req-page, and req-limit in the controller where each value is used in subsequent bindings. http://www.lispworks.com/documentation/HyperSpec/Body/s_let_l.htm
or Macro Provide fallback values when parsing page and limit parameters, defaulting to 1 and 50 respectively. http://www.lispworks.com/documentation/HyperSpec/Body/m_or.htm
multiple-value-bind Macro Capture the SQL string and bind parameters returned by sxql:yield in the model methods. http://www.lispworks.com/documentation/HyperSpec/Body/m_multip.htm
values Function Return two values from posts methods — the list of results and the total count — as required by mito-pager:with-pager. http://www.lispworks.com/documentation/HyperSpec/Body/a_values.htm
parse-integer Function Convert string query parameters ("1", "50") to integers, with :junk-allowed t for safe parsing. http://www.lispworks.com/documentation/HyperSpec/Body/f_parse_.htm

Joe Marshallbinary-compose-left and binary-compose-right

· 98 days ago

If you have a unary function F, you can compose it with function G, H = F ∘ G, which means H(x) = F(G(x)). Instead of running x through F directly, you run it through G first and then run the output of G through F.

If F is a binary function, then you either compose it with a unary function G on the left input: H = F ∘left G, which means H(x, y) = F(G(x), y) or you compose it with a unary function G on the right input: H = F ∘right G, which means H(x, y) = F(x, G(y)).

(binary-compose-left f g)  = (λ (x y) (f (g x) y))
(binary-compose-right f g) = (λ (x y) (f x (g y)))

We could extend this to trinary functions and beyond, but it is less common to want to compose functions with more than two inputs.

binary-compose-right comes in handy when combined with fold-left. This identity holds

 (fold-left (binary-compose-right f g) acc lst) <=>
   (fold-left f acc (map g lst))

but the right-hand side is less efficient because it requires an extra pass through the list to map g over it before folding. The left-hand side is more efficient because it composes g with f on the fly as it folds, so it only requires one pass through the list.

vindarel&#128396;&#65039; Lisp screenshots: today's Common Lisp applications in action

· 103 days ago

I released a hopefully inspiring gallery:

lisp-screenshots.org

We divide the showcase under the categories Music, Games, Graphics and CAD, Science and industry, Web applications, Editors and Utilities.

Of course:

“Please don’t assume Lisp is only useful for...

thank you ;)

For more example of companies using CL in production, see this list (contributions welcome, of course).


Don’t hesitate to share a screenshot of your app! It can be closed source and yourself as the sole user, as long as it as some sort of a GUI, and you use it. Historical success stories are for another collection.

The criteria are:

  • built in Common Lisp
  • with some sort of a graphical interface
  • targeted at end users
  • a clear reference, anywhere on the web, by email to me or simply as a comment here, that it is built in CL.

Details:

  • it can be web applications whose server side is CL, even if the JS/HTML is classical web tech.
  • no CLI interfaces. A readline app is OK but hey, we can do better.
  • it can be closed-source or open-source, commercial, research or a personal software
  • regarding “end users”: I don’t see how to include a tool like CEPL, but I did include a screen of LispWorks.
  • bonus point if it is developed in a company (we want it on https://github.com/azzamsa/awesome-lisp-companies/), be it a commercial product or an internal tool.

You can reach us on GitHub discussions, by email at (reverse "gro.zliam@stohsneercs+leradniv") and in the comments.

Best,

Joe MarshallVibe Coded Scheme Interpreter

· 108 days ago

Mark Friedman just released his Scheme-JS interpreter which is a Scheme with transparent JavaScript interoperability. See his blog post at furious ideas.

This interpreter apparently uses the techniques of lightweight stack inspection — Mark consulted me a bit about that hack works. I'm looking forward to seeing the vibe coded architecture.

Gábor MelisUntangling Literate Programming

· 113 days ago

Classical literate programming

A literate program intersperses narrative and code chunks. From this, source code to be fed to the compiler is generated by a process called tangling, and documentation by weaving. The specifics of tangling vary, but by allowing complete reordering and textual combination of chunks, it lets the human narrative drive the exposition at the cost of introducing an additional step into the write-compile-run cycle.

The general idea

It is easy to mistake this classical implementation of literate programming for the more general idea that we want to

  1. present code to human readers in pedagogical order with narrative added, and

  2. make changing code and its documentation together easy.

The advantages of literate programming follow from these desiderata.

Untangled LP

In many languages today, code order is far more flexible than in the era of early literate programming, so the narrative order can be approximated to some degree using docstrings and comments. Code and its documentation are side by side, so changing them together should also be easy. Since the normal source code now acts as the LP source, there is no more tangling in the programming loop. This is explored in more detail here.

Pros and cons

Having no tangling is a great benefit, as we get to keep our usual programming environment and tooling. On the other hand, bare-bones untangled LP faces the following potential problems.

  1. Order mismatch: Things like inline functions and global variables may need to be defined before use. So, code order tends to deviate from narrative order to some degree.

  2. Reduced locality: Our main tool to sync code and narrative is factoring out small, meaningful functions, which is just good programming style anyway. However, this may be undesirable for reasons of performance or readability. In such a case, we might end up with a larger function. Now, if we have only a single docstring for it, then it can be non-obvious which part of the code a sentence in the docstring refers to because of their distance and the presence of other parts.

  3. No source code only view: Sometimes we want to see only the code. In classical LP, we can look at the tangled file. In untangled LP, editor support for hiding the narrative is the obvious solution.

  4. No generated documentation: There is no more tangling nor weaving, but we still need another tool to generate documentation. Crucially, generating documentation is not in the main programming loop.

In general, whether classical or untangled LP is better depends on the severity of the above issues in the particular programming environment.

The Lisp and PAX view

MGL-PAX, a Common Lisp untangled LP solution, aims to minimize the above problems and fill in the gaps left by dropping tangling.

  1. Order

    • Common Lisp is quite relaxed about the order of function definitions, but not so much about DEFMACRO, DEFVAR, DEFPARAMETER, DEFCONSTANT, DEFTYPE , DEFCLASS, DEFSTRUCT, DEFINE-COMPILER-MACRO, SET-MACRO-CHARACTER, SET-DISPATCH-MACRO-CHARACTER, DEFPACKAGE. However, code order can for the most part follow narrative order. In practice, we end up with some DEFVARs far from their parent DEFSECTIONs (but DECLAIM SPECIAL helps).

    • DEFSECTION controls documentation order. The references to Lisp definitions in DEFSECTION determine narrative order independently from the code order. This allows the few ordering problems to be patched over in the generated documentation.

    • Furthermore, because DEFSECTION can handle the exporting of symbols, we can declare the public interface piecemeal, right next to the relevant definitions, rather than in a monolithic DEFPACKAGE

  2. Locality

    • Lisp macros replace chunks in the rare, complex cases where a chunk is not a straightforward text substitution but takes parameters. Unlike text-based LP chunks, macros must operate on valid syntax trees (S-expressions), so they cannot be used to inject arbitrary text fragments (e.g. an unclosed parenthesis).

      This constraint forces us to organize code into meaningful, syntactic units rather than arbitrary textual fragments, which results in more robust code. Within these units, macros allow us to reshape the syntax tree directly, handling scoping properly where text interpolation would fail.

    • PAX's NOTE is an extractable, named comment. NOTE can interleave with code within e.g. functions to minimize the distance between the logic and its documentation.

    • Also, PAX hooks into the development to provide easy navigation in the documentation tree.

  3. Source code only view: PAX supports hiding verbose documentation (sections, docstrings, comments) in the editor.

  4. Generating documentation

    • PAX extracts docstrings, NOTEs and combines them with narrative glue in DEFSECTIONs.

    • Documentation can be generated as static HTML/PDF files for offline reading or browsed live (in an Emacs buffer or via an in-built web server) during development.

    • LaTeX math is supported in both PDF and HTML (via MathJax, whether live or offline).

In summary, PAX accepts a minimal deviation in code/narrative order but retains the original, interactive Lisp environment (e.g. SLIME/Sly), through which it offers optional convenience features like extended navigation, live browsing, and hiding documentation in code. In return, we give up easy fine-grained control over typesetting the documentation - a price well worth paying in Common Lisp.


For older items, see the Planet Lisp Archives.


Last updated: 2026-05-26 07:00