Planet Lisp

Eugene Zaikonnikov: Some documents on AM and EURISKO

· 11 days ago

Sharing here a small collection of documents by Douglas B. Lenat related to the design of AM and EURISKO that I have assembled over the years. These are among the most famous programs of the symbolic AI era. They represent so-called 'discovery systems'. Unlike expert systems, they run a loosely constrained heuristic search in a complex problem domain.

AM was Lenat's doctoral thesis and the first attempt of its kind. Unfortunately, it's all described in rather informal pseudocode, a decision that led to a number of misunderstandings in follow-up criticism. Lenat responded to that in one of his better-known publications, Why AM and EURISKO appear to work.

AM was built around a concept-formation process utilizing a set of pre-defined heuristics. EURISKO takes it a step further, adding a mechanism for running the discovery search on its own heuristics. Both are specimens of what we could call 'Lisp-complete' programs: designs that require Lisp or a hypothetical, similarly metacircular equivalent to function. Their style was idiomatic to the INTERLISP of the 1970s, making heavy use of FEXPRs and self-modification of code.

There's quite a lot of thorough analysis available in the three-part The Nature of Heuristics: part one, part two. The third part contains the most insights into the workings of EURISKO. A remarkable quote from when EURISKO discovered Lisp atoms, reflecting that it was written before the two-decade pause in the threat of nuclear annihilation:

Next, EURISKO analyzed the differences between EQ and EQUAL. Specifically, it defined the set of structures which can be EQUAL but not EQ, and then defined the complement of that set. This turned out to be the concept we refer to as LISP atoms. In analogy to humankind, once EURISKO discovered atoms it was able to destroy its environment (by clobbering CDR of atoms), and once that capability existed it was hard to prevent it from happening.
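
For readers who want the distinction the quote hinges on: EQ compares object identity, while EQUAL compares structure. A minimal illustration in present-day Common Lisp (EURISKO itself was INTERLISP code, so this is an analogy rather than its actual representation):

(equal (list 1 2) (list 1 2)) ; => T   -- same structure
(eq    (list 1 2) (list 1 2)) ; => NIL -- distinct objects

;; Symbols -- Lisp atoms -- fall in the complement set:
;; for them, EQUAL cannot hold without EQ also holding.
(equal 'foo 'foo)             ; => T
(eq    'foo 'foo)             ; => T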

Lenat's eventual conclusion from all this was that "common sense" is necessary to drive autonomous heuristic search, and that a critical mass of knowledge is necessary. That's where his current CYC project started off in the early 1990s.

Bonus material: The Elements of Artificial Intelligence Using Common Lisp by Steven L. Tanimoto describes a basic AM clone, Pythagoras.

CL Test Grid: quicklisp 2018-10-18

· 17 days ago

Quicklisp news: October 2018 Quicklisp dist update now available

· 25 days ago
New projects:
  • arrows — Implements -> and ->> from Clojure, as well as several expansions on the idea. — CC0
  • authenticated-encryption — Authenticated-Encryption functions — MIT
  • base64 — Base64 encoding and decoding for Common Lisp. — Apache 2.0
  • black-tie — Noise library for Common Lisp. — BSD
  • cl-clblas — clBLAS binding — Apache License, Version 2.0
  • cl-dotenv — Utility library for loading .env files — MIT
  • cl-fuzz — A Fuzz Testing Framework — BSD-2
  • cl-las — Library to manipulate LAS files — ISC
  • cl-proj — CL-PROJ provides Proj.4 library bindings — BSD
  • cl-prolog2 — Common Interface to the ISO prolog implementations from Common Lisp — MIT
  • cover — Code coverage utility for Common Lisp — MIT
  • destructuring-bind-star — DESTRUCTURING-BIND with proper error signaling — MIT
  • everblocking-stream — A stream that always blocks and never has data available. — Public domain
  • heap — Binary Heap for Common Lisp. — Apache 2.0
  • huffman — Huffman encoding and decoding for Common Lisp. — Apache 2.0
  • lazy — Lazy forms for Common Lisp. — Apache 2.0
  • lorem-ipsum — Lorem ipsum generator in portable Common Lisp — MIT
  • parse — Parsing package for Common Lisp. — Apache 2.0
  • print-html — Simple html generator. — MIT License
  • protest — Common Lisp PROtocol and TESTcase Manager — LLGPL
  • re — Lua-style string pattern matching. — Apache 2.0
  • regular-type-expression — This project contains several Common Lisp packages — MIT
  • safe-read — A variant of READ secure against internbombing, excessive input and macro characters. — BSD 2-clause
  • safety-params — Filter parameters — BSD 2-Clause
  • sc-extensions — additional library collection for cl-collider — Public Domain / 0-clause MIT
  • sha1 — SHA1 Digest and HMAC for LispWorks. — Apache 2.0
  • sycamore — A fast, purely functional data structure library — BSD-3
  • targa — Targa Image Loading for Common Lisp. — Apache 2.0
  • trivial-cltl2 — Compatibility package exporting CLtL2 functionality — LLGPL
Updated projects: array-utils, asdf-viz, assoc-utils, binary-io, bit-smasher, cari3s, cepl, cl+ssl, cl-ana, cl-cffi-gtk, cl-collider, cl-colors2, cl-i18n, cl-kanren, cl-ledger, cl-liballegro, cl-mecab, cl-mixed, cl-neovim, cl-notebook, cl-patterns, cl-plumbing, cl-portmanteau, cl-postgres-plus-uuid, cl-progress-bar, cl-pslib, cl-pslib-barcode, cl-python, cl-rabbit, cl-sdl2, cl-sdl2-image, cl-sdl2-mixer, cl-sdl2-ttf, clack, closer-mop, closure-common, clunit2, clx, codex, colleen, commonqt, croatoan, cxml, cxml-stp, datafly, definitions, dexador, djula, dml, do-urlencode, dufy, dynamic-mixins, easy-audio, eclector, femlisp, function-cache, fxml, gamebox-math, geowkt, golden-utils, harmony, iclendar, inquisitor, integral, ironclad, lack, lass, lichat-tcp-server, log4cl, maiden, mcclim, mito, mito-attachment, myway, nineveh, ningle, overlord, pango-markup, parachute, parser.ini, perlre, petalisp, place-utils, plexippus-xpath, plump-sexp, postmodern, prepl, print-licenses, qlot, qtools, quri, read-csv, rove, s-dot2, scalpl, sel, serapeum, shadow, shuffletron, sly, split-sequence, st-json, staple, stmx, stumpwm, sxql, time-interval, tooter, trace-db, track-best, trivia, trivial-benchmark, trivial-garbage, trivial-gray-streams, trivial-indent, trivial-utilities, ubiquitous, utm, varjo, vernacular, woo, wookie.

Removed projects: clot, clpmr, cobstor, html-sugar, ie3fp, manardb, metafs, mime4cl, net4cl, npg, ods4cl, plain-odbc, quid-pro-quo, sanitized-params, sclf, smtp4cl, tiff4cl.

The removed projects no longer work on SBCL.

To get this update, use (ql:update-dist "quicklisp"). Enjoy!

Vsevolod Dyomkin: ANN: flight-recorder - a robust REPL logging facility

· 44 days ago

Interactivity is a principal requirement for a usable programming environment. Interactivity means that there should be a shell/console/REPL or other similar text-based command environment. And a principal requirement for such an environment is keeping history. And not just keeping it, but doing it robustly:

  • recording history from concurrently running sessions
  • keeping unlimited history
  • identifying the time of the record and its context

This allows you to freely experiment and reproduce the results of successful experiments, to go back to an arbitrary point in time and take your work in another direction, and to keep DRY while performing common repetitive tasks in the REPL (e.g. initialization of an environment or context).

flight-recorder or frlog (when you need a distinct name) is a small tool that intends to support all these requirements. It grew out of frustration with how history is kept in SLIME, so it was primarily built to support that environment, but it can easily be utilized for other shells that don't have a good enough history facility. This is possible thanks to its reliance on the most common and accessible data-interchange facility: text-based HTTP.

frlog is a backend service that supports any client that is able to send an HTTP request.

The backend is a Common Lisp script that can be run in the following manner (probably, the best way to do it is inside screen):


sbcl --noprint --load hunch.lisp -- -port 7654 -script flight-recorder.lisp

It will print a bunch of messages that should end with the following line (modulo timestamp):


[2018-09-29 16:00:53 [INFO]] Started hunch acceptor at port: 7654.

The service appends each incoming request to the text file in markdown format: ~/.frlog.md.

The API is just a single endpoint - /frlog that accepts GET and POST requests. The parameters are:

  • text - the content of the record (url-encoded, naturally); it can alternatively be sent in the POST request's body (more robust)

Optional query parameters are:

  • title - used to specify that this is a new record: for console-based interactions there's usually a command and zero or more results - the command starts the record (and thus should be accompanied by the title; for SLIME interactions it's the current Lisp package and a serial number). A titled record is added in the following manner:

    ### cl-user (10) 2018-09-29_15:49:17

    (uiop:pathname-directory-pathname )
    If there's no title, the text is added like this:

    ;;; 2018-09-29_15:49:29

    #<program-error @ #x100074bfd72>
  • tag - if provided, it signals that the record should be written not to the standard .frlog.md file, but to .frlog-<tag>.md. This makes it easy to log a specific group of interactions separately.

If the response code is 200, everything's fine. A sample request is sketched below.
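
For illustration, a request might look like the following sketch, which uses the Drakma HTTP client rather than the bundled frlog.lisp client; the host, port, and parameter values are assumptions based on the description above.

(ql:quickload :drakma)

;; Log a titled record to a service started as shown earlier.
(drakma:http-request "http://localhost:7654/frlog"
                     :method :post
                     :parameters '(("title" . "cl-user (10)")
                                   ("text"  . "(uiop:pathname-directory-pathname )")))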

Currently, two clients are available:

  • a SLIME client flight-recorder.el that monkey-patches a couple of basic SLIME functions (just load it from Emacs if you have SLIME initialized)
  • and a tiny Lisp client frlog.lisp

P.S. To sum up: more and more, I've grown to appreciate simple (sometimes even primitive - the more primitive the better :) tools. flight-recorder seems to me to be just such a tool: it was very easy to hack together, but it solves an important problem for me and, I guess, for many others. And it follows the modern "Unix way": small independent daemons, text-based formats, and HTTP instead of pipes...

P.P.S. frlog uses another tiny tool of mine - hunch that I've already utilized in a number of projects but haven't described yet - it's a topic for another post. In short, it is a script to streamline running hunchentoot that does all the basic setup and reduces the developer's work to just defining the HTTP endpoints.

P.P.P.S. I know, the name is, probably, taken and it's a rather obvious one. But I think it just doesn't matter in this case... :)

Quicklisp news: August 2018 Quicklisp dist update now available

· 74 days ago
New projects:
  • cari3s — A generator for the i3 status bar. — Artistic
  • cl-all — A script to evaluate expressions in multiple lisp implementations. — Artistic
  • cl-collider — A SuperCollider client for CommonLisp — Public Domain
  • cl-environments — Implements the CLTL2 environment access functionality for implementations which do not provide the functionality to the programmer. — MIT
  • cl-ledger — Double-entry accounting system. — BSD-3
  • cl-mecab — Interface of MeCab that is a morpheme analyzer — LLGPL
  • cl-swagger-codegen — lisp code generator for swagger — 2-clause BSD
  • cmake-parser — A cmake script parser. — MIT
  • dartscluuid — Provides support for UUIDs as proper values — MIT
  • database-migrations — System to version the database in roughly the same way rails migrations work. Differences are that only one database is really supported (but hacking around that is trivial) and that migrations are not needed to be stored in separate files. — MIT
  • float-features — A portability library for IEEE float features not covered by the CL standard. — Artistic
  • iclendar — An iCalendar format library. — Artistic
  • illusion — Customize and manage Lisp parens reader — MIT
  • lisp-binary — Declare binary formats as structs and then read and write them. — GPLv3
  • mmap — Portable mmap (file memory mapping) utility library. — Artistic
  • pango-markup — A small library to generate pango-style text markup. — Artistic
  • petalisp — Elegant High Performance Computing — AGPLv3
  • pludeck — A friendly interface for creating a Plump DOM. — MIT
  • print-licenses — Print the licenses used by the given project and its dependencies. — MIT
  • sly — Sylvester the Cat's Common Lisp IDE — Public Domain
  • sprint-stars — Display the stars of a GitHub User — GPL 3
  • studio-client — A client library for the Studio image hosting service — Artistic
  • test-utils — Convenience functions and macros for testing Common Lisp applications via Prove and Quickcheck — MIT Expat
  • trivial-utilities — A collection of useful functions and macros. — MIT
  • vernacular — Module system for language embeddings. — MIT
Updated projects: 3d-matrices, 3d-vectors, a-cl-logger, acclimation, ahungry-fleece, alexa, algebraic-data-library, april, arc-compat, array-utils, bodge-sndfile, caveman, cepl, cepl.drm-gbm, chirp, cl+ssl, cl-ana, cl-bnf, cl-bootstrap, cl-colors2, cl-conllu, cl-csv, cl-dbi, cl-feedparser, cl-flac, cl-flow, cl-fond, cl-gamepad, cl-gendoc, cl-generator, cl-gpio, cl-grace, cl-i18n, cl-k8055, cl-kanren, cl-libuv, cl-mixed, cl-monitors, cl-mpg123, cl-oclapi, cl-opengl, cl-out123, cl-patterns, cl-ppcre, cl-progress-bar, cl-project, cl-pslib, cl-pslib-barcode, cl-sdl2-ttf, cl-soil, cl-soloud, cl-spidev, cl-str, cl-virtualbox, cl-wayland, cl-yesql, clack, claw, clip, clml, closer-mop, clss, clunit2, colleen, configuration.options, conium, croatoan, crypto-shortcuts, curry-compose-reader-macros, dartsclhashtree, deeds, deferred, definitions, delta-debug, deploy, dexador, dissect, dml, documentation-utils, dufy, eazy-gnuplot, eazy-project, eclector, fare-scripts, fast-http, flare, flow, flute, for, form-fiddle, fxml, glsl-spec, glsl-toolkit, golden-utils, gsll, halftone, harmony, humbler, ironclad, jsonrpc, kenzo, lack, lambda-fiddle, lass, legit, lichat-protocol, lichat-serverlib, lichat-tcp-client, lichat-tcp-server, lichat-ws-server, lionchat, lisp-executable, lispbuilder, listopia, lquery, maiden, mcclim, mito, modularize, modularize-hooks, modularize-interfaces, more-conditions, multiposter, neo4cl, nibbles, nineveh, north, oclcl, opticl, overlord, oxenfurt, parachute, pathname-utils, perlre, piping, plump, plump-bundle, plump-sexp, plump-tex, postmodern, qlot, qt-libs, qtools, qtools-ui, racer, random-state, ratify, redirect-stream, rfc2388, rutils, sanitized-params, sel, serapeum, shadow, simple-inferiors, simple-tasks, slime, snooze, softdrink, south, spinneret, staple, stumpwm, sxql, terminfo, thnappy, tooter, trace-db, trivial-arguments, trivial-battery, trivial-benchmark, trivial-clipboard, trivial-gray-streams, trivial-indent, trivial-main-thread, trivial-mimes, trivial-thumbnail, umbra, usocket, uuid, varjo, verbose, vgplot, whofields, with-c-syntax, woo, wookie, xml.location.

Removed projects: cl-clblas, cl-proj.

There are no direct problems with cl-clblas and cl-proj. They are victims of a hard drive crash on my end, and an incomplete recovery. I have not been able to set up the foreign libraries required to build those projects in time for this month's release.

If you want to continue using cl-clblas and cl-proj, there are a few options:

  • Don't upgrade to the latest Quicklisp dist until they are back in
  • If you already upgraded, downgrade
  • Clone their repos into ~/quicklisp/local-projects
Sorry for any inconvenience this may cause. I hope to have it fully resolved in the September 2018 update.

To get this update, use (ql:update-dist "quicklisp").

Enjoy!

Zach Beane: A Road to Common Lisp - Steve Losh

· 78 days ago

Paul Khuong: Restartable Sequences With the Polysemic Null Segment Selector

· 80 days ago

Implementing non-blocking algorithms is one of the few things that are easier to code in a kernel than in userspace (the only other one I can think of is physically wrecking a machine). In the kernel, we only have to worry about designing a protocol that achieves forward progress even if some threads stop participating. We must guarantee the absence of conceptual locks (we can’t have operations that must be paired, or barriers that everyone must reach), but are free to implement the protocol by naïvely spinlocking around individual steps: kernel code can temporarily disable preemption and interrupts to bound a critical section’s execution time.

Getting these non-blocking protocols right is still challenging, but the challenge is fundamental to reliable systems. The same problems, solutions, and space/functionality trade-offs appear in all distributed systems. Some would even argue that the kinds of interfaces that guarantee lock- or wait-freedom are closer to object-oriented ideals.

Of course, there is still a place for clever instruction sequences that avoid internal locks, for code that may be paused anywhere without freezing the whole system: interrupts can’t always be disabled, read operations should avoid writing to shared memory if they can, and a single atomic read-modify-write operation may be faster than locking. The key point for me is that this complexity is opt-in: we can choose to tackle it incrementally, as a performance problem rather than as a prerequisite for correctness.

We don’t have the same luxury in userspace. We can’t start by focusing on the fundamentals of a non-blocking algorithm, and only implement interruptable sequences where it makes sense. Userspace can’t disable preemption, so we must think about the minutiae of interruptable code sequences from the start; non-blocking algorithms in userspace are always in hard mode, where every step of the protocol might be paused at any instruction.

Specifically, the problem with non-blocking code in user space isn’t that threads or processes can be preempted at any point, but rather that the preemption can be observed. It’s a PCLSRing issue! Even Unices guarantee programmers won’t observe a thread in the middle of a syscall: when a thread (process) must be interrupted, any pending syscall either runs to completion, or returns with an error [1]. What we need is a similar guarantee for steps of our own non-blocking protocols [2].

Hardware transactional memory kind of solves the problem (preemption aborts any pending transaction) but is a bit slow [3], and needs a fallback mechanism. Other emulation schemes for PCLSRing userspace code divide the problem in two:

  1. Determining that another thread was preempted in the middle of a critical section.
  2. Guaranteeing that that other thread will not blindly resume executing its critical section (i.e., knowing that the thread knows that we know it’s been interrupted [4]).

The first part is relatively easy. For per-CPU data, it suffices to observe that we are running on a given CPU (e.g., core #4), and that another thread claims to own the same CPU’s (core #4’s) data. For global locks, we can instead spin for a while before entering a slow path that determines whether the holder has been preempted, by reading scheduling information in /proc.

The second part is harder. I have played with schemes that relied on signals, but was never satisfied: I found that Linux perf will rarely, but not never, drop interrupts when I used it to “profile” context switches, and signaling when we determine that the holder has been pre-empted has memory visibility issues for per-CPU data [5].

Until earlier this month, the best known solution on mainline Linux involved cross-modifying code! When a CPU executes a memory write instruction, that write is affected by the registers, the virtual memory mappings, and the instruction’s bytes. Contemporary operating systems rarely let us halt and tweak another thread’s general purpose registers (Linux won’t let us self-ptrace, nor pause an individual thread). Virtual memory mappings are per-process, and can’t be modified from the outside. The only remaining angle is modifying the preemptee’s machine code.

That’s what Facebook’s experimental library Rseq (restartable sequences) actually does.

I’m not happy with that solution either: while it “works,” it requires per-thread clones of each critical section, and makes us deal with cross-modifying code. I’m not comfortable with leaving code pages writable, and we also have to guarantee the pre-emptee’s writes are visible. For me, the only defensible implementation is to modify the code by mmap-ing pages in place, which incurs an IPI per modification. The total system overhead thus scales superlinearly with the number of CPUs.

With Mathieu Desnoyers’s, Paul Turner’s, and Andrew Hunter’s patch to add an rseq syscall to Linux 4.18, we finally have a decent answer. Rather than triggering special code when a thread detects that another thread has been pre-empted in the middle of a critical section, userspace can associate recovery code with the address range for each restartable critical section’s instructions. Whenever the kernel preempts a thread, it detects whether the interruptee is in such a restartable sequence, and, if so, redirects the instruction pointer to the associated recovery code. This essentially means that critical sections must be read-only except for the last instruction in the section, but that’s not too hard to satisfy. It also means that we incur recovery even when no one would have noticed, but the overhead should be marginal (there’s at most one recovery per timeslice), and we get a simpler programming model in return.

Earlier this year, I found another way to prevent critical sections from resuming normal execution after being preempted. It’s a total hack that exercises a state saving defect in Linux/x86-64, but I’m comfortable sharing it now that Rseq is in mainline: if anyone needs the functionality, they can update to 4.18, or backport the feature.

Here’s a riddle!

riddle.c
static const char values[] = { 'X', 'Y' };

static char
read_value(void)
{
        /*
         * New feature in GCC 6; inline asm would also work.
         * https://gcc.gnu.org/onlinedocs/gcc-6.1.0/gcc/Named-Address-Spaces.html#Named-Address-Spaces
         */
        return *(const __seg_gs char *)(uintptr_t)values;
}

int
main(int argc, char **argv)
{
        /* ... */
        char before_switch = read_value();  /* Returns 'X'. */
        usleep(1000 * 1000);  /* Or any other wait for preemption. */
        char after_switch = read_value();  /* Returns 'Y'. */
        /* ... */
}

With an appropriate setup, the read_value function above will return a different value once the executing thread is switched out. No, the kernel isn’t overwriting read-only data while we’re switched out. When I listed the set of inputs that affect a memory store or load instruction (general purpose registers, virtual memory mappings, and the instruction bytes), I left out one last x86 thing: segment registers.

Effective addresses on x86oids are about as feature rich as it gets: they sum a base address, a shifted index, a constant offset, and, optionally, a segment base. Today, we simply use segment bases to implement thread-local storage (each thread’s FS or GS offset points to its thread-local block), but that usage repurposes memory segmentation, an old 8086 feature... and x86-64 still maintains some backward compatibility with its 16-bit ancestor. There’s a lot of unused complexity there, so it’s plausible that we’ll find information leaks or otherwise flawed architectural state switching by poking around segment registers.

How to set that up

After learning about this trick to observe interrupts from userland, I decided to do a close reading of Linux’s task switching code on x86-64 and eventually found this interesting comment [6].

Observing a value of 0 in the FS or GS registers can mean two things:

  1. Userspace explicitly wrote the null segment selector in there, and reset the segment base to 0.
  2. The kernel wrote a 0 in there before setting up the segment base directly, with WR{FS,GS}BASE or by writing to a model-specific register (MSR).

Hardware has to efficiently keep track of which is actually in effect. If userspace wrote a 0 in FS or GS, prefixing an instruction with that segment has no impact; if the MSR write is still active (and is non-zero), using that segment must impact effective address computation.

There’s no easy way to do the same in software. Even in ring 0, the only sure-fire way to distinguish between the two cases is to actually read the current segment base value, and that’s slow. Linux instead fast-paths the common case, where the segment register is 0 because the kernel is handling segment bases. It prioritises that use case so much that the code knowingly sacrifices correctness when userspace writes 0 in a segment register after asking the kernel to setup its segment base directly.

This incorrectness is acceptable because it only affects the thread that overwrites its segment register, and no one should go through that sequence of operations. Legacy code can still manipulate segment descriptor tables and address them in segment registers. However, being legacy code, it won’t use the modern syscall that directly manipulates the segment base. Modern code can let the kernel set the segment base without playing with descriptor tables, and has no reason to look at segment registers.

The only way to observe the buggy state saving is to go looking for it, with something like the code below (which uses GS because FS is already taken by glibc to implement thread-local storage).

h4x.c
#define RUN_ME /*
gcc-6 -std=gnu99 $0 -o h4x && ./h4x; exit $?;

Should output
Reads: XXYX
Re-reads: XYX
*/
#define _GNU_SOURCE
#include <asm/prctl.h>
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

static const char values[] = { 'X', 'Y' };

/* Directly set GS's base with a syscall. */
static void
set_gs_base(unsigned long base)
{
        int ret = syscall(__NR_arch_prctl, ARCH_SET_GS, base);
        assert(ret == 0);
}

/* Write a 0 in GS. */
static void
set_gs(unsigned short value)
{
        asm volatile("movw %0, %%gs" :: "r"(value) : "memory");
}

/* Read gs:values[0]. */
static char
read_value(void)
{
        /*
         * New feature in GCC 6; inline asm would also work.
         * https://gcc.gnu.org/onlinedocs/gcc-6.1.0/gcc/Named-Address-Spaces.html#Named-Address-Spaces
         */
        return *(const __seg_gs char *)(uintptr_t)values;
}

int
main(int argc, char **argv)
{
        char reads[4];
        char re_reads[3];

        reads[0] = read_value();
        reads[1] = (set_gs(0), read_value());
        reads[2] = (set_gs_base(1), read_value());
        reads[3] = (set_gs(0), read_value());

        printf("Reads: %.4s\n", reads);
        fflush(NULL);

        re_reads[0] = read_value();
        re_reads[1] = (usleep(1000 * 1000), read_value());
        re_reads[2] = (set_gs(0), read_value());

        printf("Re-reads: %.3s\n", re_reads);
        return 0;
}

Running the above on my Linux 4.14/x86-64 machine yields

$ gcc-6 -std=gnu99 h4x.c && ./a.out
Reads: XXYX
Re-reads: XYX

The first set of reads shows that:

  1. our program starts with no offset in GS (reads[0] == values[0])
  2. explicitly setting GS to 0 does not change that (reads[1] == values[0])
  3. changing the GS base to 1 with arch_prctl does work (reads[2] == values[1])
  4. resetting the GS selector to 0 resets the base (reads[3] == values[0]).

The second set of reads shows that:

  1. the reset base survives short syscalls (re_reads[0] == values[0])
  2. an actual context switch reverts the GS base to the arch_prctl value (re_reads[1] == values[1])
  3. writing a 0 in GS resets the base again (re_reads[2] == values[0]).

Cute hack, why is it useful?

The property demonstrated in the hack above is that, after our call to arch_prctl, we can write a 0 in GS with a regular instruction to temporarily reset the GS base to 0, and know it will revert to the arch_prctl offset again when the thread resumes execution, after being suspended.

We now have to ensure our restartable sequences are no-ops when the GS base is reset to the arch_prctl offset, and that the no-op is detected as such. For example, we could set the arch_prctl offset to something small, like 4 or 8 bytes, and make sure that any address we wish to mutate in a critical section is followed by 4 or 8 bytes of padding that can be detected as such. If a thread is switched out in the middle of a critical section, its GS base will be reset to 4 or 8 when the thread resumes execution; we must guarantee that this offset will make the critical section’s writes fail.

If a write is a compare-and-swap, we only have to make sure the padding’s value is unambiguously different from real data: reading the padding instead of the data will make compare-and-swap fail, and the old value will tell us that it failed because we read padding, which should only happen after the section is pre-empted. We can play similar tricks with fetch-and-add (e.g., real data is always even, while the padding is odd), or atomic bitwise operations (steal the sign bit).
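
The discrimination step can be sketched apart from the segment machinery. Here is a hedged Common Lisp illustration using SBCL's compare-and-swap; the names and the single-cell setup are invented for this sketch, and no GS base is actually being reset, so it shows only the even/odd sentinel logic:

;; Convention from the text: live values are always even, the padding
;; word is odd, so a failed CAS can tell contention and preemption apart.
(defun try-update (cell old new)
  "CAS (car CELL) from OLD to NEW; classify a failure by the value seen."
  (let ((seen (sb-ext:compare-and-swap (car cell) old new)))
    (cond ((eql seen old) :success)      ; our write went through
          ((oddp seen)    :preempted)    ; we read the odd padding word
          (t              :lost-race)))) ; another thread won the race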

If we’re willing to eat a signal after a context switch, we can set the arch_prctl offset to something very large, and take a segmentation fault after being re-scheduled. Another option is to set the arch_prctl offset to 1, and use a double-wide compare-and-swap (CMPXCHG16B), or turn on the AC (alignment check) bit in EFLAGS. After a context switch, our destination address will be misaligned, which will trigger a SIGBUS that we can handle.

The last two options aren’t great, but, if we make sure to regularly write a 0 in GS, signals should be triggered rarely, only when pre-emption happens between the last write to GS and a critical section. They also have the advantages of avoiding the need for padding, and making it trivial to detect when a restartable section was interrupted. Detection is crucial because it often isn’t safe to assume an operation failed when it succeeded (e.g., unwittingly succeeding at popping from a memory allocator’s freelist would leak memory). When a GS-prefixed instruction fails, we must be able to tell from the instruction’s result, and nothing else. We can’t just check if the segment base is still what we expect, after the fact: our thread could have been preempted right after the special GS-prefixed instruction, before our check.

Once we have restartable sections, we can use them to implement per-CPU data structures (instead of per-thread), or to let threads acquire locks and hold them until they are preempted: with restartable sections that only write if there was no preemption between the lock acquisition and the final store instruction, we can create a revocable lock abstraction and implement wait-free coöperation or flat-combining.

Unfortunately, our restartable sections will always be hard to debug: observing a thread’s state in a regular debugger like GDB will reset the GS base and abort the section. That’s not unique to the segment hack approach. Hardware transactional memory will abort critical sections when debugged, and there’s similar behaviour with the official rseq syscall. It’s hard enough to PCLSR userspace code; it would be even harder to PCLSR-except-when-the-interruption-is-for-debugging.

Who’s to blame?

The null GS hack sounds like it only works because of a pile of questionable design decisions. However, if we look at the historical context, I’d say everything made sense.

Intel came up with segmentation back when 16 bit pointers were big, but 64KB of RAM not quite capacious enough. They didn’t have 32 bit (never mind 64 bit) addresses in mind, nor threads; they only wanted to address 1 MB of RAM with their puny registers. When thread libraries abused segments to implement thread-local storage, the only other options were to over-align the stack and hide information there, or to steal a register. Neither sounds great, especially with x86’s six-and-a-half general purpose registers. Finally, when AMD decided to rip out segmentation, but keep FS and GS, they needed to make porting x86 code as easy as possible, since that was the whole value proposition for AMD64 over Itanium.

I guess that’s what systems programming is about. We take our tools, get comfortable with their imperfections, and use that knowledge to build new tools by breaking the ones we already have (#Mickens).

Thanks to Andrew for a fun conversation that showed the segment hack might be of interest to someone else, and to Gabe for snarkily reminding us that Rseq is another Linux/Silicon Valley re-invention.


  1. That’s not as nice as rewinding the PC to just before the syscall, with a fixed-up state that will resume the operation, but it is simpler to implement, and usually good enough. Classic worse is better (Unix semantics are also safer with concurrency, but that could have been opt-in...).

  2. That’s not a new observation, and SUN heads like to point to prior art like Dice’s and Garthwaite’s Mostly Lock-Free Malloc, Garthwaite’s, Dice’s, and White’s work on Preemption notification for per-CPU buffers, or Harris’s and Fraser’s Revocable locks. Linux sometimes has to reinvent everything with its special flavour.

  3. For instance, SuperMalloc optimistically uses TSX to access per-CPU caches, but TSX is slow enough that SuperMalloc first tries to use a per-thread cache. Dice and Harris explored the use of hardware transactional lock elision solely to abort on context switches; they maintained high system throughput under contention by trying the transaction once before falling back to a regular lock.

  4. I did not expect systems programming to get near multi-agent epistemic logic ;)

  5. Which is fixable with LOCKed instructions, but that defeats the purpose of per-CPU data.

  6. I actually found the logic bug before the Spectre/Meltdown fire drill and was worried the hole would be plugged. This one survived the purge. fingers crossed

Nicolas Hafner: Geometry Clipmaps Revisited - Gamedev

· 80 days ago

A while back I wrote a bit about my adventures with a game engine technique called Geometry Clipmaps and my difficulties in implementing them. After spending what was supposed to be a week of making a game on fixing and completing an implementation of the clipmaps instead, I think I'm now ready to talk about it. I'll also try to answer the questions I posed in the previous entry.

Base Geometry

The clipmaps have a base resolution that defines the number of vertices used along an axis per level. We then construct a uniform grid with a quarter of that resolution along each side. For instance, with a base resolution of 1024, we construct a 256x256 grid. This grid is then used to form a 1024x1024 ring.

base ring

The level-0 ring uses four more instances of this grid to fill out the middle, but every other ring uses the exact same geometry, just scaled up by a power of two.

nested rings

This means we only need a single 256x256 vertex array to render everything. We can even reduce the number of draw calls to one by creating an additional vertex buffer that contains all the per-level grid offsets and level indices, and using instanced rendering. This additional buffer only requires ((12*l)+4)*3 elements, so with a typical 5 levels that would be a negligible 192 floats. This could be further reduced, but as it is it should be good enough for now.
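
As a quick sanity check of that size, a hypothetical helper (not Trial's actual code) computes the float count for a given number of levels:

;; 12 ring grids per level plus 4 grids filling the centre of level 0;
;; each instance carries an x/z offset and a level index, i.e. 3 floats.
(defun clipmap-offset-floats (levels)
  (* (+ (* 12 levels) 4) 3))

(clipmap-offset-floats 5) ; => 192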

Map Representation

Since each level is represented by 1024x1024 vertices, we need l 1024x1024 textures to represent the map data. We can make use of 2D array textures for this. You'll likely want multiple such texture arrays to represent all of your data (one for elevation, one for material blends, etc.).

texture arrays

In the vertex shader of the clipmap we then look up the value of the height texture corresponding to the current vertex position.

layout (location = 0) in vec3 position;
layout (location = 1) in float level;
layout (location = 2) in vec2 offset;

uniform mat4 view_matrix;
uniform mat4 projection_matrix;
uniform sampler2DArray height_map;

void main(){
  float level_scale = pow(2.0, level);
  float n = textureSize(height_map, 0).x;
  vec2 map_pos = (position.xz + offset)/4;
  
  vec2 tex_off = (map_pos+0.5-1/(n+1))*n;
  float y = texelFetch(height_map, ivec3(tex_off, level), 0).r;
  
  vec2 world_2d = map_pos * level_scale;
  vec4 world = vec4(world_2d.x, y, world_2d.y, 1);
  gl_Position =  projection_matrix * view_matrix * world;
}

Here, position is the position of the vertex in the grid, level is the current level index, and offset is the offset of the grid from the ring's centre. For simplicity's sake we assume the grid vertices span between -0.5 and +0.5, and thus the offsets lie between -1.5 and +1.5.

Since the lookup of the texture is precise, we can use the pixel-aligned texelFetch function. You're also going to want a scaling factor in there somewhere in order to make sure you get the height you want, since it has been normalised into unit floats.

You can use the same fetch procedure for any additional maps you might have to retrieve their values and pass them on to the fragment shader, or whatever you like.

Preparation Step

Once you export the full map data from your scene creator, you need to preprocess it into appropriate chunks before it can be used by the clipmaps. You need a directory of chunk files for each level of the clipmaps, with each chunk being of the clipmap resolution. So you start out with level 0, dice the map up and save the chunks to file. Then you subsequently scale the map down linearly by a factor of two and chunk it for the next level.
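
To make the scaling step concrete, here is a minimal Common Lisp sketch of the downscaling half of that loop, averaging 2x2 blocks of a square single-float elevation array. The name and data representation are illustrative assumptions, not Trial's actual preprocessing code:

(defun downscale (map)
  "Return MAP halved along both axes by averaging 2x2 blocks.
Assumes MAP is a square SINGLE-FLOAT array with even dimensions."
  (let* ((n (floor (array-dimension map 0) 2))
         (out (make-array (list n n) :element-type 'single-float)))
    (dotimes (i n out)
      (dotimes (j n)
        (setf (aref out i j)
              (/ (+ (aref map (* 2 i) (* 2 j))
                    (aref map (1+ (* 2 i)) (* 2 j))
                    (aref map (* 2 i) (1+ (* 2 j)))
                    (aref map (1+ (* 2 i)) (1+ (* 2 j))))
                 4))))))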

I recommend using a raw data format for the chunks that only contains the uncompressed pixel data without any header, so that loading it is efficient. Without compression or header you can simply mmap the file and pass the pointer to OpenGL to upload to textures.

Update Step

Finally you need to implement an update step that takes care of uploading new map data to the texture arrays. This is the most complicated part of the whole technique and it has gone through several iterations as I've worked on it. The current version is almost perfect. There's a further optimisation that could be done, but I haven't felt the need to implement it yet.

In any case, since the clipmap chunks and the clipmap region coincide in size, any particular clipmap region can be covered by at most four chunks. So, in order to display a particular region, we upload the sub-regions of the four chunks that make up the region to each level of the clipmap array texture. We can do this thanks to gl:pixel-store's unpack-row-length, which makes it possible to upload sub-regions of a source image.

Calculating the proper offsets isn't terribly exciting, but I've laid out the basic idea in the following image just for completeness' sake.

update regions

And with this finally implemented, the clipmaps are complete!

Or so one would think. Unfortunately the devil is in the details, and there's a lot of those.

Inter-Level Blending

Since each level above loses precision, the height values at the border between two levels are not going to match up exactly. In order to accommodate this, you have to blend smoothly between the two levels, gradually picking values from the higher level as you get closer to the edge.

At this point an attentive reader might notice that, since the vertex resolution is also not matching between two levels, the border is going to have a single edge of an outer level represented by two edges with a vertex in the middle of the inner level. Fortunately for us this is not an issue as long as we ensure that our clipmap textures use linear magnification interpolation. In that case, since the lines between vertices are linearly interpolated themselves, the lookup between two texels of the higher level is going to be interpolated in the same way, giving us a value that lands exactly on the middle point of the edge, just as we need it to be.

The blending factor itself can be calculated from the map_pos using some basic scaling and arithmetic.

vec2 alpha = (abs(map_pos)*2-0.5)*2;
alpha = clamp((alpha+BORDER_OFFSET-(1-BORDER_WIDTH))/BORDER_WIDTH, 0, 1);
float a = max(alpha.x, alpha.y);

The constants BORDER_OFFSET and BORDER_WIDTH allow you to easily designate how far from the border the blend should begin and how much you want to blend. I found 0.1 and 0.25 to be good values, though you might need to experiment depending on the resolution.

blending regions

At this point we simply perform another texture lookup in the outer level and mix the two together.

vec2 tex_off_o = (map_pos/2+0.5)-0.5/n;
float y_o = texture(height_map, vec3(tex_off_o, level+1)).r;
y = mix(y, y_o, a);

In this case we need to make use of the standard texture lookup, since we do want interpolation between texels. This also requires a slightly different texture coordinate calculation of course. You have to repeat this mixing procedure for every other map in order to smoothly transition between levels.

There's one more gotcha here, which is what happens at the outermost level. Since there's no higher level to blend with, we need to catch this. For this purpose we need a uniform that instructs us of the number of levels, and a check to see if we should blend or not.

layout (location = 0) in vec3 position;
layout (location = 1) in float level;
layout (location = 2) in vec2 offset;

uniform mat4 view_matrix;
uniform mat4 projection_matrix;
uniform sampler2DArray height_map;
uniform int levels;

void main(){
  float level_scale = pow(2.0, level);
  float n = textureSize(height_map, 0).x;
  vec2 map_pos = (position.xz + offset)/4;
  
  vec2 tex_off = (map_pos+0.5-1/(n+1))*n;
  float y = texelFetch(height_map, ivec3(tex_off, level), 0).r;
  
  if(level+1 < levels){
    // Blending factor as derived in the previous section; BORDER_OFFSET
    // and BORDER_WIDTH are the constants introduced there.
    vec2 alpha = (abs(map_pos)*2-0.5)*2;
    alpha = clamp((alpha+BORDER_OFFSET-(1-BORDER_WIDTH))/BORDER_WIDTH, 0, 1);
    float a = max(alpha.x, alpha.y);
    vec2 tex_off_o = (map_pos/2+0.5)-0.5/n;
    float y_o = texture(height_map, vec3(tex_off_o, level+1)).r;
    y = mix(y, y_o, a);
  }
  
  vec2 world_2d = map_pos * level_scale;
  vec4 world = vec4(world_2d.x, y, world_2d.y, 1);
  gl_Position =  projection_matrix * view_matrix * world;
}

With this in place we should now have a smooth display of the entire clipmap.

As long as we don't move, that is.

Moving Updates

When the camera or player moves, the clipmaps need to be updated. We already have the update of the actual underlying elevation maps covered, but that's not good enough. Since the elevation maps are pixel-aligned, they are going to pop into place once the threshold has been crossed to load the next row of pixels in. Naturally this popping effect is very jarring and not acceptable.

Interestingly enough I don't recall any of the papers talking about this problem. I haven't re-read them to check, but I'm quite sure about it since I think I would have remembered.

In order to deal with this popping problem we have to physically move vertices by the remainder of the grid motion. As in, if our current level clipmap grid is unit-sized, then we need to shift the vertices in xz direction by mod(pos, 1.0) to compensate for the movement.

vec2 mov_off = mod(world_pos.xz, level_scale)/n;
vec2 world_2d = (map_pos * level_scale) - mov_off;

Naturally, we need to blend this between regions as well.

vec2 mov_off_o = mod(world_pos.xz, level_scale*2)/n;
mov_off = mix(mov_off, mov_off_o, a);

This will make the height transitions look buttery smooth, even in the face of continuous movement. It's almost perfect.

Adjusting Inter-Level Map Lookup for Moving Updates

Since we shift the physical vertices for the elevation, we don't need to change how the values are looked up in the texture. This is not the same for any other map, which might be used for material lookup or other purposes in the fragment shader. The texture coordinates for those maps need to be adjusted.

vec3 tex_i = vec3(tex_off/n, level);
vec3 tex_o = vec3(tex_off_o-a/(2*n), level+1);

This is almost perfect, but unfortunately there's still a small bit of popping going on, namely when two adjacent levels are off-by-one in their displayed maps. This effect is only really visible on low-resolution clipmaps, but I'd still like to figure out how to fix it at some point. At the time of writing this I had played around trying to fix it for around an hour and then put it off for later.

Surface Normals

If you're going to use any kind of lighting engine, you're likely going to need surface normals for the clipmap. You could add another array texture that has this information baked in beforehand, but that would increase the amount of data that's being streamed quite a bit. It's easier and probably more efficient to just calculate the normals in the vertex shader.

float yu = texelFetch(height_map, ivec3(min(n-1, tex_off.x+1), tex_off.y, level), 0).r;
float yv = texelFetch(height_map, ivec3(tex_off.x, min(n-1, tex_off.y+1), level), 0).r;
vec3 normal = normalize(vec3(y-yu, 0.5, y-yv));

And of course we need to blend this as well.

float yu_o = texture(height_map, vec3(tex_off_o+vec2(1/n,0), level+1)).r;
float yv_o = texture(height_map, vec3(tex_off_o+vec2(0,1/n), level+1)).r;
yu = mix(yu, yu_o, a);
yv = mix(yv, yv_o, a);

Not too bad!

Concave Geometry

Since elevation maps can only represent convex geometry, concave geometry like caves, houses, and so forth need to be handled separately. In order to allow this, I attempted to use a simple geometry shader that punches holes into the terrain where the height value is zero.

layout(triangles) in;
layout(triangle_strip, max_vertices = 3) out;
in vec3 world[];

void main(){
  if(world[0].y*world[1].y*world[2].y != 0){
    int i;
    for(i = 0;i < gl_in.length();i++){
      gl_Position = gl_in[i].gl_Position;
      EmitVertex();
    }
    EndPrimitive();
  }
}

With the holes punched this would allow rendering in parts of the map that would be concave in place of the hole. Unfortunately it turns out that geometry shaders are super slow and this tears my frame rate in half. There's the other option of using discard; in the fragment shader, but that has its issues too. At this point I'm not sure yet how I'll handle this case.

Besides, I don't have a system for general mesh streaming yet, so I'll probably think about this once I get to it. If you have ideas though, please do let me know.

Putting it All Together

Implementing this technique has been a long, arduous process. It has cost me multiple attempts and many hours of fiddling around and trying things out. Still, I'm quite happy to be able to say that I've got a technique implemented and working in Trial that is typically employed in big, commercial, triple-A titles.

You can find the current implementation of the clipmaps in the Trial repository. It still takes some shortcuts to let me test things, but I'm sure I'll get it to a usable, modular point soon enough. I hope that this entry has been helpful for people trying to understand this technique, and if not that, then I at least hope it has given people a bit of an appreciation for the work that has gone into it.

For now, here's a short in-engine rendering of a basic terrain region I generated using WorldCreator. It's rendered live using five levels of clipmaps with a base resolution of 1024x1024. Rendering is done using a very, very primitive phong renderer with solid colour mixing. No textures or anything fancy is applied, so it looks rather basic. The map is also not scaled quite right, so things look a bit squashed. My apologies for that.

Finally I'd like to apologise for the lack of Lisp in this entry! After all, all the code snippets are GLSL code. I hope that you'll at least trust me when I say that everything except for the shaders was indeed implemented in Lisp.

Hans Hübner: Using the VMS PATCH utility to fix a VAX LISP issue

· 87 days ago
In my previous post about VAX LISP, I complained about a misfeature of the editor that prevented binding editor functions to the Ctrl-S and Ctrl-Q keys.  The manual states that these keys cannot be bound due to "system restrictions". It appears, though, that it is a measure to prevent users from shooting themselves in the foot, because Ctrl-S and Ctrl-Q could be sent by terminals to implement software flow control.  Software flow control is optional, though; it can be switched on and off for each terminal line, and it is not needed for virtual terminal connections (Telnet, DECNET), as flow control is implemented by the network layer and networked terminal sessions usually have enough buffer space.  Thus, reserving Ctrl-S and Ctrl-Q really only makes sense for asynchronous terminal lines that require software flow control, and there is no reason to reserve these characters for all terminal sessions, regardless of technology.
I am used to having Ctrl-S and Ctrl-Q be bound to certain functions from GNU Emacs, and I want to use the same bindings with VAX LISP as they've become part of my muscle memory.  Thus, I wanted to remove the explicit checks for these keys in the EDITOR::BIND-COMMAND and BIND-KEYBOARD-FUNCTION functions.  Lacking the source code to VAX LISP, I could not just remove the checks and recompile, so I needed a different way.
This is where the VMS PATCH utility comes into play.  It is a standard tool that is used to modify program executables, usually to remove bugs without requiring a full reinstallation of the software package that needs to be fixed.

Finding what to patch

In order to remove the checks for Ctrl-S/Q in EDITOR::BIND-COMMAND and BIND-KEYBOARD-FUNCTION, I disassembled the two functions using the standard VAX LISP function DISASSEMBLE.  VAX assembler code is easy to read, so it was rather straightforward to find out how the checks work.  Here is the relevant extract of the disassembly output from the VAX LISP BIND-KEYBOARD-FUNCTION function:
000FC  MOVQ   BIND, -(STACK)
000FF  MOVL   -10(FRAME), -(STACK)
00103  MOVZBL #89, -(STACK)
00107  MOVZBL #99, -(STACK)
0010B  MOVAB  18(STACK), FRAME
0010F  MOVL   'CHAR/=, LEX
00113  MOVL   4(LEX), FUNC
00117  JMP    @4(FUNC)
0011A  CMPL   RES, SV
0011D  BEQL   121
0011F  BRB    145
00121  MOVL   FRAME, -(STACK)
00124  MOVL   #A28, -(STACK)
0012B  MOVQ   BIND, -(STACK)
0012E  MOVL   '"The character argument must be a control character (other than~%~
            CTRL/S or CTRL/Q): ~S", -(STACK)
Apparently, #89 and #99 refer to Ctrl-S and Ctrl-Q, and the CHAR/= function is called to check whether the argument is one of those characters.
The easiest way to get rid of the check seemed to me to replace the BEQL 121 instruction, which jumps to printing the error message and returning from the function, with BRB 145, which jumps to the rest of the function that performs the actual binding.

Locating the code

To make the actual modification using the PATCH utility, I now needed to know where in the executable file these instructions can be found - the addresses in the DISASSEMBLE output are only relative to the start of the function, and the LISP.EXE executable file does not include any symbols.  I thus picked out a few instructions that seemed characteristic and assembled them to machine code using the VMS macroassembler.  Then, I searched for this instruction sequence in LISP.EXE using a trivial perl program, which gave me the offset of the function that I needed to update.
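To give the flavour of that search, here is an equivalent of the perl program in Common Lisp; the byte values in the usage comment are placeholders, not the actual VAX opcode encoding:

(defun find-bytes (path bytes)
  "Return the file offset of the first occurrence of BYTES in PATH, or NIL."
  (with-open-file (in path :element-type '(unsigned-byte 8))
    (let ((buffer (make-array (file-length in)
                              :element-type '(unsigned-byte 8))))
      (read-sequence buffer in)
      (search bytes buffer))))

;; (find-bytes "LISP.EXE" #(#x9A #x89)) ; => file offset or NIL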
Knowing the offsets of the locations in the binary file that I needed to change, I could now go about and use the PATCH utility in interactive mode to devise the actual patch.  The PATCH manual suggests that the VMS standard debugger be used for patch development, but as I did not have the symbol table for LISP.EXE, using PATCH itself for development was more appropriate, as it can work in /ABSOLUTE mode, which uses file offsets for locations.
First, I needed to find and disassemble the code using the locations I had devised.  As I only had the offset of the MOVZBL #89, -(STACK) instruction and VAX instructions have variable lengths, I needed to probe a bunch of addresses before I could come up with this:
PATCH>e/i 00403E68:00403E9A
00403E68:  MOVQ    R10,-(R7)
00403E6B:  MOVL    B^0F0(R8),-(R7)
00403E6F:  MOVZBL  #089,-(R7)
00403E73:  MOVZBL  #099,-(R7)
00403E77:  MOVAB   B^18(R7),R8
00403E7B:  MOVL    B^30(R11),R9
00403E7F:  MOVL    B^04(R9),R11
00403E83:  JMP     @B^04(R11)
00403E86:  CMPL    R6,AP
00403E89:  BEQL    00403E8D
00403E8B:  BRB     00403EB1
00403E8D:  MOVL    R8,-(R7)
00403E90:  MOVL    #00000A28,-(R7)
00403E97:  MOVQ    R10,-(R7)
00403E9A:  MOVL    B^38(R11),-(R7)
This is exactly the same instruction sequence that I disassembled using VAX LISP above. As you can see, it is significantly harder to read the disassembly produced by PATCH as it does not show symbolic register names, and it also does not translate data pointers like in the last instruction - VAX LISP showed the error message where PATCH prints B^38(R11).  The addresses in the disassembly, however, are absolute offsets into the LISP.EXE file, and that was what I needed.

Creating the actual patch

Equipped with this information, I could create the actual patch to change VAX LISP so that it no longer disallows binding of Ctrl-S and Ctrl-Q, which I reproduce below:
$ PATCH/ABSOLUTE LISP$SYSTEM:LISP.EXE/OUTPUT=LISP$SYSTEM:
REPLACE /INSTRUCTION
^X00403E5A
'BLEQ ^X00403E5E'
EXIT
'BLEQ ^X00403EB1'
EXIT
REPLACE /INSTRUCTION
^X002F3246
'BEQL ^X002F324A'
EXIT
'BRB ^X002F3265'
EXIT
UPDATE
EXIT
I used the REPLACE/INSTRUCTION command which requires the specification of an address, the current instruction(s) at the location and the new instruction(s) to put at the location.  PATCH would only perform the replacement if the instructions currently in the file match what is specified, otherwise it would abort with an error.  This is useful to prevent the patch from patching the wrong file or the wrong version of a file.

Better patches

As I did not have the source code or the symbol table for LISP.EXE when I developed this patch, I needed to resort to patching in absolute mode, which is kind of brittle.  PATCH can also work in symbolic mode if a symbol table is available for the executable file, which allows patches to work even if the executable has been linked differently than the executable used to develop the patch (i.e. you could write a patch for a library function that would work for any executable linked statically against the library).  Also, PATCH allows patches to insert more instructions than were originally present in the executable.  It does this by adding additional disk space to the executable to hold the new, longer code, and patching jump instructions into the original location that transfer control to the patch area.

Closing thoughts

It is quite nice to be able to fix a bug in or remove a feature from a commercial program without having the source code, just by working with a bit of disassembly and on-board utilities. Back in the day, it was also useful to be able to distribute fixes to customers in the form of tiny patches. These could even be sent as a printed letter and typed in by the customer on a terminal, with reasonable effort and good chances of success.

Hans Hübner: Evaluating VAX LISP 3.1

· 90 days ago

The back story

Back in the day, when there still existed a number of different computer architectures and operating systems, I was a huge fan of the VAX/VMS operating system. VMS was designed together with the VAX architecture, and it used specific mechanisms offered by the processor to implement its I/O, security and other services. As was still common at the end of the 1970s, portability was not a design objective, and computer companies often had both hardware and software teams working together to implement computing solutions, from hardware to operating systems, development environments and applications.
I became a VMS fan mostly for three reasons: VMS machines were popular in the early packet switched networks that I could access, their default installation often left accounts with default passwords open, and VMS has an excellent online help system which helped me to learn and program VMS without having access to printed manuals.
I also liked the VAX for its orthogonal architecture, which made it somewhat similar to my favorite 8 bit microprocessor at the time, the Zilog Z80.  Back then, being able to program a machine in assembly language was still somewhat desirable, and an orthogonal instruction set with many general purpose registers and fancy addressing modes allowing complex programs to be written in assembly language appealed to me, too.
Frankly, though, I never wrote a lot of assembly language code. Instead, C quickly became my language of choice even on VMS, and when g++ 1.42 became available on VMS, I mostly became a full  time C++ programmer.  This was when object oriented programming just became the fad, with magazines like the C/C++ Users Journal and the Journal of Object Oriented Programming being the prime sources of inspiration. Around that time, I read Grady Booch's excellent book Object-Oriented Analysis and Design with Applications, which explained OO programming using examples in a variety of then-popular OO languages like Smalltalk, C++ and Object Pascal. It also had a chapter on using CLOS as the example language. To me, CLOS was completely out of reach and locked in the ivory tower, requiring utterly expensive machines and software licenses that I'd never hope to have access to, so I read the chapter more as science fiction than as something of practical relevance.
Thus, I stuck with C++ and later Perl on VMS, which was amusing enough, also for being a rather exotic blend.

Around 2001, I was introduced to Lisp by the way of VMS. At the time, my programming gigs were all on Unix or Windows and my affection to VMS became part of my hobby as a collector of vintage computers.
At some point, I brought a MicroVAX with me to a vintage computer convention, and a friend had his MacIvory Symbolics Lisp Machine on display. As the two systems were from the same era, we thought that we should try to bring up some network between them.  My VAX ran a fairly recent version of VMS, though, and when we tried to establish a DECNET connection, the Lisp machine would signal an error and put us into a debugger. What got me hooked on Lisp was that my friend could display the source code of the failing DECNET device driver, figure out that the problem was the too-new DECNET version number that the VAX had reported in the connection setup packet, fix the Lisp code in the DECNET driver of the Lisp Machine and then continue the file transfer operation that had signaled the problem.  I certainly understood that this would theoretically be possible back then, but I had never seen a system that would have that level of integration of operating system and programming language. I was intrigued.
Fortunately, machines had become much cheaper and faster in the decade since I had first read about CLOS, and the open source movement had made Unix and Common Lisp affordable. I went for CMUCL and eventually became a full time Common Lisp developer for several years.

The holy combination

I had been looking for VAX LISP for some time, as LISP and VMS sounded like a blend of very likeable technologies, so I was very happy when someone found the installation tapes and made them available through the comp.os.vms newsgroup in 2017.  I played around with it a little, but unfortunately the manuals that were publicly available at the time were only for VAX LISP V2.1, and V3.1 was supposedly different enough to make me want the updated manual set.  It took a bit of research and emailing until I got in touch with the former architect of VAX LISP at Digital Equipment, and he was kind enough to lend me the complete manual set for version 3.0, which I scanned and OCRed so that it is now available on bitsavers.  Thus, I felt equipped to give VAX LISP a test drive as a vacation project.

About VAX LISP

VAX LISP is an implementation of Common Lisp as described in the first version of "Common Lisp the Language" (CLtL1). This means that it feels much like the Common Lisp that was standardized nine years later, but with several important features missing. Among these missing features are:
  • LOOP, the general purpose, well-hated macro for iteration.  You get simple LOOP, DOLIST, DOTIMES and of course the dreadful DO and DO* couple.
  • The Common Lisp Object System (CLOS), although there is a version of PCL coming with VAX LISP in the examples directory.  There also is a version of the Flavors object system included.
  • The condition system. VAX LISP allows you to bind the *UNIVERSAL-ERROR-HANDLER* special variable to a function that is called when ERROR, CERROR or WARN is invoked, and that can take a non-local jump using THROW.
  • DESTRUCTURING-BIND and lambda lists with &KEY, &REST etc.  Lambda lists as we know them today are only available in DEFUN.
Furthermore, the package system is not yet where it is today, there are no generalized sequences, and there are probably several other bits and pieces that are missing.

Notable features of VAX LISP are:
  • Integrated, user-extensible editor with both EDT (VMS's standard editor) and Emacs emulation modes, both with a terminal and a DECwindows backend. There is extensive documentation on programming the editor in LISP and I've added several features myself, which was a joy.
  • Interface to the standard DECwindows widget set. This required the implementation of a multithreading system so that DECwindows code, written in C, could be called from LISP and DECwindows handlers could be written in LISP and called from C.
  • A foreign function interface implementing the VMS procedure calling standard, making it possible to call VMS system services and library routines as well as code written in other VMS languages.
Additionally, it comes with a good bunch of example code that is quite helpful when trying to figure out how to use the language, although the quality of the samples varies.

A Project to Try It Out

I wanted to get a feel of how programming in VAX LISP works, so I decided to write a simple http server in it.  This would give me the opportunity to try out the system interface as I would have to call into the TCP/IP subsystem.  As I wanted to evaluate VAX LISP during a vacation trip, I used SIMH on my MacBook instead of a real VAX, running VMS V5.5-2.

The Development Experience

Initially, I had hoped that I could use the graphical version of VAX LISP for development, but I did not manage to set up a fully functional DECwindows workstation environment and fell back to the terminal version.  The downside of this is that I had to alternate between working in the editor and in the REPL, as the editor has no built-in way to evaluate expressions and display the results.  It would probably not be very hard to add that, but I wanted to spend most of my time working on the web server.

The REPL

The terminal REPL of VAX LISP does not have a lot of convenience features.  LISP expressions are entered using the standard VMS line editor, which means that no completion is available and all you get is a way to recall the previously entered line.  Annoyingly, if a line is longer than whatever your current terminal width is, there is no way to edit text before the line break.  Obviously, no work has been put into making the terminal REPL nice.

The only notable convenience is that one can bind keys (ASCII codes, really) to call LISP functions, and one use case suggested in the manual is to bind, say, Ctrl-E to call the 'ED function.  That way, switching to the editor from the REPL is very quick and I use that feature all the time.  Beyond that, the REPL experience is rather bare-bones and only slightly better than, say, a bare SBCL or Clozure CL REPL because of VMS's standard line-editing facility, which gives you a one-line recall buffer.
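
For instance, a binding like the following does the trick (a sketch from memory; I believe the manual calls the facility BIND-KEYBOARD-FUNCTION, but treat the exact name and calling convention as assumptions):

;; Hedged sketch: jump from the REPL into the editor with one keystroke.
;; BIND-KEYBOARD-FUNCTION and its arguments are recalled from the manual,
;; not verified against a running VAX LISP.
(bind-keyboard-function (code-char 5) #'ed)  ; ASCII 5 = Ctrl-E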

The Editor

VAX LISP comes with an editor that is written in LISP itself and that is extensively documented, so that users can extend and adapt it as desired.  By default, it tries to emulate EDT, which is the original VMS editor, but EDT is rather quirky and pretty much unusable without a DEC keyboard, as it relies on certain function keys that are not present on modern keyboard layouts.  The VAX LISP editor also supports an Emacs compatibility mode, because even back in the late 1980s, most LISP programmers would have been used to Emacs and unwilling to use another editor.  In Emacs emulation mode, I found it rather easy to write and edit code, and I even did some substantial refactoring of my http server code base with it without too much grief.  I spent a bit of time adding a few features that I am used to from GNU Emacs.  Writing editor extensions in the same language and environment as the actual project was rather joyful.
The editor documentation is extensive, but it leaves a number of questions unanswered.  I also find its organization a bit odd: a whole chapter on editor concepts comes as a separate "Part II" of the manual, after all the nitty-gritty details of editor customization.  I'd expect the description of the concepts to come first, before their application is explained.  Also, the explanation of how editor major and minor modes work is rather fuzzy.
One annoying misfeature of the editor is that it explicitly disallows binding Ctrl-S and Ctrl-Q, and the manual claims that this is due to "system restrictions".  While it is true that Ctrl-S and Ctrl-Q were used to implement software flow control on terminal connections that lacked hardware flow control, software flow control is an optional feature of the VMS terminal driver and can easily be switched off if out-of-band flow control is available for a given terminal connection.  This would be easy to fix if one had the VAX LISP source code, but that seems to be lost, unfortunately.
Maybe not so unexpectedly for an editor written in the 1980s, the VAX LISP editor does not have an "undo" facility.  This dampened my enthusiasm when I started writing code, but in practice, it has not hurt me too much.  When I made larger changes, I saved my buffers beforehand, and as the VMS file system automatically keeps old versions of files until they are purged, I could go back to the previous state if I wanted to roll back.

VMS and the Internet Protocols

My romanticized memories of VMS date back to a time when there was no Internet, and the Internet Protocols (IP) were just one set of protocols used in global networking.  DEC had its own protocol suite, DECNET, and was committed to making DECNET OSI-compatible, as they believed that OSI would become the dominant wide-area networking protocol suite of the future.  DECNET was integrated into VMS quite nicely, and it was easy to implement peer-to-peer connections from any language, even from command line scripts (which enabled the creation of early networked malware like WANK).  Implementations of other networking protocols like RSCS (for BITNET/EARN) and IP (for Arpanet) were available for VMS, but DEC just treated them as secondary networking systems that did not deserve a lot of attention.
To implement my http server in VAX LISP, I needed an implementation of IP, so I installed "DEC TCP/IP Services for VMS" (traditionally referred to as UCX, "VMS/Ultrix Connection") as an add-on product to my base VMS installation.  UCX implements a standard set of IP clients and servers like FTP, Telnet, NFS etc. and supports two programming interfaces, one similar to the BSD socket calls meant to be used from programs written in C, and another one based on VMS system services, meant for programs written in other languages.  Boldly, I decided that I wanted to use the System Services interface, also in the hope that it would be as nice and convenient as the DECNET programming interface.  I was quite wrong.
Apparently, the System Services interface to UCX is a layer on top of the socket interface, so you get the "best" of both worlds:  BSD sockets with their quirky sequence of required calls to socket(), listen() and accept(), asking you to specify the length of the "listen queue" and to set the "reuse address" option in your server code because, well, that is how the chicken needs to be waved to make things work.  And then there is the beautiful VMS $QIO interface with its opaque "function code" and its parameters named P1 to P6, which are used differently depending on the function code supplied.
DEC, in its lack of enthusiasm for supporting IP on VMS, really made sure that the combination of $QIO and sockets would be as ugly as possible.

So, to listen for incoming TCP connections and then accept them, the program has to:
  • Assign a channel to the UCX network device
  • Set the channel to be a TCP/IP socket
  • Enable the REUSEADDR option, which is already inconvenient in C with the setsockopt interface, but with $QIO you have to create an additional descriptor to point to the socket option
  • Set the local address of the socket, again using an additional descriptor to point to the local address
  • Set the socket to listen, using another $QIO invocation, now passing the maximum queue length
These steps directly correspond to what you'd have to do when using the BSD socket interface, only that with the System Services interface, you need to call $QIO all the time.  What is actually done underneath is then determined by which of the P1 to P6 parameters you set and how, so if you call $QIO with the IO$_SETMODE function code and specify a non-zero P5 parameter, you are calling setsockopt() and if you specify a non-zero P3 parameter, you're calling listen() and so on.
Of course, there are few things that can't be syntactically improved with a bunch of LISP macros, and in my http server the whole sequence does not look so bad.  But seriously:  this is one of the worst APIs that I have seen in my life, and the experience put quite a damper on my romanticized view of VMS and DEC in general.
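
To give an idea, here is a hypothetical sketch of such macrology; ASSIGN-CHANNEL, QIO-SETMODE and DEASSIGN-CHANNEL are illustrative stand-ins for FFI wrappers around $ASSIGN, $QIO and $DASSGN, not rasselbock's actual code or any real VAX LISP or UCX API:

;; A hypothetical sketch, not rasselbock's actual code: the helper
;; functions named here are assumed wrappers, defined elsewhere.
(defmacro with-listening-socket ((channel port &key (backlog 5)) &body body)
  `(let ((,channel (assign-channel "UCX$DEVICE")))
     (unwind-protect
          (progn
            (qio-setmode ,channel :protocol :tcp)      ; make it a TCP socket
            (qio-setmode ,channel :reuseaddr t)        ; setsockopt(SO_REUSEADDR)
            (qio-setmode ,channel :local-port ,port)   ; bind the local address
            (qio-setmode ,channel :backlog ,backlog)   ; listen()
            ,@body)
       (deassign-channel ,channel))))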

Serving Files

An http server commonly serves content from files, so I wanted mine to do so as well, so that it could deliver an HTML page and some additional resources to the client.  On VMS, files are not the simple byte streams that we nowadays think of when talking about files.  Rather, the Record Management System (RMS) maintains the file structure for applications and supports various file organizations, like line-sequential and indexed, with fixed and varying record lengths etc.  It is still possible to read the raw data stored in a file, but that will include the bookkeeping information of RMS.  This means that the http server also needs to know something about the structure of the file being served, and possibly perform some conversion, instead of just forwarding the file data to the client directly.

Binary Files

Serving binary files like images only requires the http server to read the files in a block-wise fashion, but as VAX LISP does not support block-wise reading, this needs to be done with direct calls to RMS.  This gave me an opportunity to learn to work with RMS, as its API is quite different from, say, the POSIX file API with its simple open()/read()/write()/close() interface.
As RMS supports different file organizations, there are many parameters that an application program needs to specify when it wants to operate on a file.  Instead of spreading all these parameters across many functions, RMS uses in-memory structures (blocks) that describe the operations and their parameters.  These blocks can be defined using the facilities of the application programming language (i.e. struct definitions and numeric constant macros in C), or they can be described using the File Definition Language (FDL).  FDL is a domain-specific language that describes file and record structures.  An application can invoke the FDL interpreter to turn a textual file structure description into binary RMS blocks that it can then use for accessing files.
As a result, the RMS API entry points only require pointers to the blocks that describe the operation, which makes the call sites look rather clean.  As the entry points themselves clearly describe the operation, the LISP code to serve a binary file using RMS calls turns out rather clean-looking.
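
The resulting shape is roughly the following.  This is an illustrative sketch only: FAB (File Access Block) and RAB (Record Access Block) are the real RMS parameter blocks, and $OPEN/$CONNECT/$READ/$CLOSE are the real entry points, but the MAKE-FAB, MAKE-RAB and RMS-* wrappers are hypothetical:

;; Illustrative sketch, not rasselbock's actual code: the wrappers
;; around RMS $OPEN/$CONNECT/$READ/$CLOSE are assumed to exist.
(defun serve-binary-file (filename write-block)
  (let* ((fab (make-fab :file-name filename :block-io t))
         (rab (make-rab :fab fab)))
    (rms-open fab)
    (rms-connect rab)
    (unwind-protect
         (loop for block = (rms-read rab)   ; one bufferful of raw blocks per call
               while block
               do (funcall write-block block))
      (rms-close fab))))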

Text Files

Serving text files like HTML or CSS files is rather more challenging in VAX LISP, considering that the on-disk data of RMS text files cannot simply be copied to the HTTP client.  This is because, depending on the file's RMS record format, text files are stored with the length of each line preceding the actual characters of the line, in binary form.  For that reason, I chose to use the text file I/O mechanism of VAX LISP for reading, and provided some explicit buffering to avoid writing each line to the client in a separate $QIO call.  I am not sure whether that strategy really provides performance benefits or whether UCX's own buffering would be just as good.  The resulting code does not look terribly complex, though.
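
The buffering strategy looks roughly like the following portable sketch (not rasselbock's actual code; WRITE-TO-CLIENT is a stand-in for the $QIO-backed network write):

;; A minimal sketch of the buffering idea: accumulate lines and flush
;; with one network write per bufferful instead of one per line.
(defun serve-text-file (pathname write-to-client &key (buffer-size 4096))
  (with-open-file (in pathname :direction :input)
    (let ((buffer (make-string-output-stream))
          (buffered 0))
      (flet ((flush ()
               (when (plusp buffered)
                 (funcall write-to-client (get-output-stream-string buffer))
                 (setf buffered 0))))
        (loop for line = (read-line in nil)
              while line
              do (write-string line buffer)
                 (write-char #\Return buffer)   ; HTTP wants CRLF line endings
                 (write-char #\Linefeed buffer)
                 (incf buffered (+ (length line) 2))
                 (when (>= buffered buffer-size)
                   (flush))
              finally (flush))))))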

Roundup

In the end, I've now implemented a basic http server in VAX LISP, called rasselbock.  The development experience was quite good, and I found it to be fun to write some code in a Lisp that felt like the Common Lisp I've been using professionally for several years, but that lacks several of the features that I took for granted.  I think that in the end, VAX LISP provides enough tooling for serious programming work.  It was also an interesting experience to write some code in a language for which the Internet offers no support at all.  All I had was the manuals, some sample code and DESCRIBE, and that was sufficient to create something which (kind of) works.

My vacation comes to an end, so I'm going to stop at this point for now.  In the future, I'm going to put rasselbock onto a real VAX, and I also want to spend some time working with the DECwindows version of VAX LISP, as that will probably be a much nicer development experience.  Furthermore, I would like to try converting rasselbock to run asynchronously, as one of the cool things about VMS is that all its system services can be used asynchronously.

I need to thank Walter van Roggen, who was kind enough to lend me the VAX LISP V3.0 manual set for scanning, and Michael Kukat, who parted with his remaining VAX hardware so that I have a real machine to run VAX LISP and rasselbock on.  I'd also like to thank Richard Wells, who pointed me at the vectorized version of the digital logo, which led me to become a producer of t-shirts.

If you made it this far, you might also be interested in my follow-on post, in which I describe how I patched the editor so that I could bind Ctrl-S and Ctrl-Q.

Christophe Rhodesa trace of riscv success

· 92 days ago

(Don't get too excited by the "success" in the title of this post.)

A brief SBCL RISC-V update this time, as there have been no train journeys since the last blog post. Most of what has been achieved is trivial definitions and rearrangements. We've defined some more of the MOVE Virtual Operations that the compiler needs to be able to move quantities between different storage classes (from immediate to descriptor-reg, or between tagged fixnums in descriptor-reg to untagged fixnums in signed-reg, for example).

We've defined a debugging printer function, which wouldn't be needed at this stage except that we're asking the compiler to print out substantial parts of its internal data structures. We've reorganized, and not in a desperately good way, the definitions of the registers and their storage bases and classes; this area of code is "smelly", in that various arguments aren't evaluated when maybe they should be, which means that some other structures need to be defined at compile-time so that variables can be defined at compile-time so that values (in the next form) can be evaluated at read-time; it's all a bit horrible, and as Paul said, not friendly to interactive development. And then a few more fairly trivial definitions later, we are at the point where we have a first trace file from compiling our very simple scratch.lisp file. Hooray! Here's the assembly code of our foo function:

L10:

VOP XEP-ALLOCATE-FRAME {#<SB!ASSEM:LABEL 1>} 
L1:

VOP EMIT-LABEL {#<SB!ASSEM:LABEL 2>} 
L2:

VOP EMIT-LABEL {#<SB!ASSEM:LABEL 3>} 
L3:

VOP NOTE-ENVIRONMENT-START {#<SB!ASSEM:LABEL 4>} 
L4:

L11:
L5:

VOP TYPE-CHECK-ERROR/C #:G0!4[A0] :DEBUG-ENVIRONMENT
    {SB!KERNEL:OBJECT-NOT-SIMPLE-ARRAY-FIXNUM-ERROR X}

L12:
    byte    50
    byte    40
    .align  2

L6:

VOP EMIT-LABEL {#<SB!ASSEM:LABEL 7>} 
L7:

VOP EMIT-LABEL {#<SB!ASSEM:LABEL 8>} 
L8:

VOP NOTE-ENVIRONMENT-START {#<SB!ASSEM:LABEL 9>} 
L9:

L13:
L14:
L15:

I hope that everyone can agree that this is a flawless gem of a function.

The next step is probably to start actually defining and using RISC-V instructions, so... stay tuned!

Christophe Rhodesfirst riscy steps

· 98 days ago

In our first post, I summarized the work of a few evenings of heat and a few hours on a train, being basically the boilerplate (parenthesis-infested boilerplate, but boilerplate nonetheless) necessary to get an SBCL cross-compiler built. At that point, it was possible to start running make-host-2.sh, attempting to use the cross-compiler to build the Lisp sources to go into the initial Lisp image for the new, target SBCL. Of course, given that all I had done was create boilerplate, running the cross-compiler was unlikely to end well: and in fact it barely started well, giving an immediate error of functions not being defined.

Well, functions not being defined is a fairly normal state of affairs (fans of static type systems look away now!), and easily fixed: we simply define them until they're not undefined any more, ideally with sensible rather than nonsensical contents. That additional constraint, though, gives us our first opportunity to indulge in some actual thought about our target platform.

The routines we have to define include things like functions to return Temporary Names (registers or stack locations) for such things as the lisp return address, the old frame pointer, and the number of arguments to a function: these are things that will be a part of the lisp calling convention, which may or may not bear much relation to the "native" C calling convention. On most SBCL platforms, it doesn't have much in common, with Lisp execution effectively being one large complicated subroutine when viewed from the prism of the C world; the Lisp execution uses a separate stack, which allows us to be sure about what on the stack is a Lisp object and what might not be (making garbage collectors precise on those platforms).

In the nascent RISC-V port, in common with other 32-register ports, these values of interest will be stored in registers, so we need to think about mapping these concepts to the platform registers, documentation of which can be found at the RISC-V site. In particular, let's have a look at logical page 109 (physical page 121) of the user-level RISC-V ISA specification v2.2:

Register  ABI Name  Description                        Saver
x0        zero      Hard-wired zero                    -
x1        ra        Return address                     Caller
x2        sp        Stack pointer                      Callee
x3        gp        Global pointer                     -
x4        tp        Thread pointer                     -
x5        t0        Temporary/Alternate link register  Caller
x6-7      t1-2      Temporaries                        Caller
x8        s0/fp     Saved register/Frame pointer       Callee
x9        s1        Saved register                     Callee
x10-11    a0-1      Function arguments/return values   Caller
x12-17    a2-7      Function arguments                 Caller
x18-27    s2-11     Saved registers                    Callee
x28-31    t3-6      Temporaries                        Caller

The table (and the rest of the document) tells us that x0 is a wired zero: no matter what is written to it, reading from it will always return 0. That's a useful thing to know; we can now define a storage class for zero specifically, and define the zero register at offset 0.

The platform ABI uses x1 for a link register (storing the return address for a subroutine). We may or may not want to use this for our own return address; I'm not yet sure if we will be able to. For now, though, we'll keep it reserved for the platform return address, defining it as register lr in SBCL nomenclature, and use x5 for our own lisp return address register. Why x5? The ISA definition for the jalr instruction (Jump And Link Register, logical page 16) informs us that return address prediction hardware should manipulate the prediction stack when either x1 or x5 is used as the link register with jalr, and not when any other register is used; that's what "Alternate link register" means in the above table. We'll see if this choice for our lra register, for Spectre or for worse, gives us any kind of speed boost - or even if it is tenable at all.

We can define the Number Stack Pointer and Number Frame Pointer registers, even though we don't need them yet: in SBCL parlance, the "number" stack is the C stack, and our register table says that those should be x2 and x8 respectively. Next, for our own purposes we will need: Lisp stack, frame and old-frame pointer registers; some function argument registers; and some non-descriptor temporaries. We'll put those semi-arbitrarily in the temporaries and function arguments region of the registers; the only bit of forethought here is to alternate our Lisp arguments and non-descriptor temporaries with one eye on the "C" RISC-V extension for compressed instructions: if an instruction uses registers x8-x15 and the "C" extension is implemented on a particular RISC-V board, the instruction might be encodable in two bytes instead of four, hopefully reducing instruction loading and decoding time and instruction cache pressure. (This might also be overthinking things at this stage). In the spirit of not overthinking, let's just put nargs on x31 for now.
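
Written down in the style of other backends' vm.lisp files, the partition so far might look roughly like this (a hedged sketch: DEFREG is the backend-local helper macro those files use, and the actual names in the RISC-V port may differ):

;; Sketch only, following other ports' conventions; not the port's code.
(defreg zero 0)   ; x0, hard-wired zero
(defreg lr 1)     ; x1, platform link register, reserved for C
(defreg nsp 2)    ; x2, number (C) stack pointer
(defreg lra 5)    ; x5, lisp return address (the alternate link register)
(defreg nfp 8)    ; x8, number frame pointer
(defreg nargs 31) ; x31, argument count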

With all these decisions implemented, we hit our first "real" error! Remembering that we are attempting to compile a very simple file, whose only actual Lisp code involves returning an element from a specialized array, it's great to hit this error because it seems to signify that we are almost ready. The cross-compiler is built knowing that some functions (usually as a result of code transformations) should never be compiled as calls, but should instead be /translated/ using one of our Virtual OPerations. data-vector-ref, which the aref in our scratch.lisp file translates to because of the type declarations, is one such function, so we need to define it: and also define it in such a way that it will be used no matter what the current compiler policy, hence the :policy :fast-safe addition to the reffer macro. (That, and many other such :policy declarations, are dotted around everywhere in the other backends; I may come to regret having taken them out in the initial boilerplate exercise.)
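
For illustration, a VOP marked this way looks roughly like the following; this is a sketch in the style of other backends, not the actual output of the reffer macro, and the storage-class and primitive-type names are assumptions borrowed from those backends:

;; Sketch: :POLICY :FAST-SAFE makes the template eligible under any
;; compiler policy; the generator body is deferred until we have
;; RISC-V instructions to emit.
(define-vop (data-vector-ref/simple-array-fixnum)
  (:translate data-vector-ref)
  (:policy :fast-safe)
  (:args (object :scs (descriptor-reg))
         (index :scs (any-reg)))
  (:arg-types simple-array-fixnum positive-fixnum)
  (:results (value :scs (any-reg)))
  (:result-types tagged-num)
  (:generator 5))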

And then, finally, because there's always an excuse for a good cleanup: those float array VOPs, admittedly without :generator bodies, look too similar: let's have a define-float-vector-frobs macro to tidy things up. (They may not remain that tidy when we come to actually implement them). At this point, we have successfully compiled one Virtual OPeration! I know this, because the next error from running the build with all this is about call-named, not data-vector-set. Onward!

Didier VernaQuickref open-sourced

· 99 days ago

8 months after the original announcement, we finally open-sourced Quickref. The complete source code is available here.

I gave a lightning talk at ELS 2018, in which I claimed I would release the code in "something like two weeks". That was 4 months ago :-). I apologize for the delay, especially to Zach who expressed interest in how we did the Docker thing at the time. I wanted to clean up the code before the first public release (I'm a maniac like that), which I did, but it took some time.

Compared to what I announced at ELS, Quickref also has a couple of new features, most importantly the ability to generate documentation for only what's already installed, rather than for the whole Quicklisp world. This can be very convenient if you just want some local documentation for the things you use daily. Finally, there is also very rudimentary support for multithreading, which currently doesn't bring much. But the code has been prepared for going further in that direction; this will be the topic of another internship, which will start next September.

Browse no less than 1588 reference manuals right now at https://quickref.common-lisp.net/, and recreate your own local version with this Docker 2-liner:

docker run --name quickref quickref/quickref
docker cp quickref:/home/quickref/quickref .

Christophe Rhodesbeginning an sbcl port

· 102 days ago

I'm on a train! (Disclaimer: I'm not on a train any more.)

What better thing to do, during the hot, hot July days, than something semi-mechanical involving very little brain power? With student-facing work briefly calming down (we've just about weathered the storm of appeals and complaints following the publication of exam results), and the all-too-brief Summer holidays approaching, I caught myself thinking that it wasn't fair that Doug and his team should have all the fun of porting SBCL to a new architecture; I want a go, too! (The fact that I have no time notwithstanding.)

But, to what? 64-bit powerpc is definitely on the move; I've missed that boat. SBCL already supports most current architectures, and "supports" a fair number of obsolete ones too. One thought that did catch my brain, though, was prompted by the recent publication by ARM of a set of claims purporting to demonstrate the dangers of the RISC-V architecture, in comparison to ARM's own. Well, since SBCL already runs on 32-bit and 64-bit ARM platforms, how hard could porting to RISC-V be?

I don't know the answer to that yet! This first post merely covers the straightforward - some might even say "tedious" - architecture-neutral changes required to get SBCL to the point that it could start considering compiling code for a new backend.

The SBCL build has roughly seven phases (depending a bit on how you count):

  1. build configuration;
  2. build the cross-compiler using the host Lisp, generating target-specific header files;
  3. build the runtime and garbage collector using a C compiler and platform assembler, generating an executable;
  4. build the target compiler using the cross-compiler, generating target-specific fasl files;
  5. build the initial ("cold") image, using the genesis program to simulate the effect of loading fasl files, generating a target Lisp memory image;
  6. run all the code that stage 5 delayed because it couldn't simulate the effects of loading fasls (e.g. many side-effectful top-level forms);
  7. build everything else, primarily PCL (a full implementation of CLOS), and save the resulting image.

1. Define a keyword for a new backend architecture

Probably the most straightforward thing to do, and something that allows me to (pretty much) say that we're one seventh of the way there: defining a new keyword for a new backend architecture is almost as simple as adding a line or two to make-config.sh. Almost. We run some devious dastardly consistency checks on our target *features*, and those need to be updated too. This gets make-config.sh running, and allows make-host-1.sh to run its consistency checks.

2. Construct minimal backend-specific files

Phase 2, building the cross-compiler, sounds like something straightforward: after all, if we don't have the constraint that the cross-compiler will do anything useful, not even run, then surely just producing minimal files where the build system expects to find them will do, and Lisp's usual late-binding nature will allow us to get away with the rest. No?

Well, that could have been the case, but SBCL itself does impose some constraints, and the minimal files are a lot less minimal than you might think. The translation of IR2 (second intermediate representation) to machine code involves calling a number of VOPs (virtual operations) by name: and those calls by name perform compile-time checking that the virtual operation template name already exists. So we have to define stub VOPs for all those virtual operations called by name, including those in optimizing vector initialisation, for all of our specialised vector types. These minimal definitions do nothing other than pacify the safety checks in the cross-compiler code.
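
A pacifier stub might look like the following; the exact clauses are an assumption on my part, the point being only that a template with the right name exists for the call-by-name checks to find:

;; Hedged illustration of a do-nothing stub VOP.
(define-vop (data-vector-set)
  (:translate data-vector-set)
  (:policy :fast-safe)
  (:generator 1))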

Phase 2 also generates a number of header files used in the compilation of the C-based runtime, propagating constant and object layout definitions to the garbage collector and other support routines. We won't worry about that for now; we just have to ignore an error about this at first-genesis time for a while. And with this, make-host-1.sh runs to completion.

4. Build the target compiler using the cross-compiler

At this point, we have a compiler backend that compiles. It almost certainly doesn't run, and even when it does run, we will want to ask it to compile simple forms of our choosing. For now, we just add a new file to the start of the build order, and put in a very simple piece of code, to see how we get on. (Spoiler alert: not very well). We've added the :trace-file flag so that the cross-compiler will print information about its compilation, which will be useful as and when we get as far as actually compiling things.

The next steps, moving beyond the start of step 4, will involve making decisions about how we are going to support sbcl on RISC-V. As yet, we have not made any decisions about register usage (some of those decisions, such as the C stack and frame pointer, will be dictated by the platform ABI, but many won't, such as exactly how we will partition the register set), or about how the stack works (again, the C stack will be largely determined, but Lisp will have its own stack, kept separate for ease of implementing garbage collection).

(Observant readers will note that phase 3 is completely unstarted. It's not necessary yet, though it does need to be completed for phase 4 to run to completion, as some of the cross-compilation depends on information about the runtime memory layout. Some of the details in phase 3 depend on the decisions made in phase 4, though, so that's why I'm doing things in this order, and not, say, because I don't have a RISC-V machine to run anything on -- it was pleasantly straightforward to bring up Fedora under QEMU which should be fine as a place to get started.)

And this concludes the initial SBCL porting work that can be done with very little brain. I do not know where, if anywhere, this effort will go; I am notionally on holiday, and it is also hot: two factors which suggest that work on this, if any, will be accomplished slowly. I'd be very happy to collaborate, if anyone else is interested, or else plod along in my own slow way, writing things up as I go.

François-René RideauMaking ASDF more magic by making it less magic

· 106 days ago

This short essay describes some challenges I leave to the next maintainers of ASDF, the Common Lisp build system, related to the "magic" involved in bootstrapping ASDF. If you dare to read further, grab a chair and your favorite headache-fighting brewage.

ASDF is a build system for Common Lisp. In the spirit of the original Lisp DEFSYSTEM, it compiles and loads software (mostly but not exclusively Common Lisp code) in the same image as it's running. Indeed, old time Lisp machines and Lisp systems did everything in the same world, in the same (memory) image, the same address space, the same (Unix-style) process — whatever you name it. This notably distinguishes it from traditional Unix build, which happens in multiple processes each with its own address space. The upside of the Lisp way is extremely low overhead, which allows for greater speed on olden single-processor machines, but also richer communication between build phases (especially for error reporting and handling), interactive debugging, and more. The upside of the Unix way is greater parallelizability, which allows for greater speed on newer multi-processor machines, but also fewer interactions between build phases, which makes determinism and distributed computation easier. The upside of the Lisp way is still unduly under-appreciated by those ignorant of Lisp and other image-based systems (such as Smalltalk). The Lisp way feels old because it is old; it could be updated to integrate the benefits of the Unix way, possibly using language-based purity and effect control in addition to low-level protection; but that will probably happen with Racket rather than Common Lisp.

One notable way that ASDF is magic is in its support for building itself — i.e. compiling and/or loading a new version of itself, in the very same Lisp image that is driving the build, replacing itself in the process, while it is running. This "hot upgrade" isn't an idle exercise, but an essential feature without which ASDF 1 was doomed. For the explanation why, see my original post on ASDF, Software Irresponsibility, or the broader paper on ASDF 2, Evolving ASDF: More Cooperation, Less Coordination.

Now, despite the heroic efforts of ASDF 2 described in the paper above, self-build could not be made to reliably happen in the middle of the build: subtle incompatibilities between old and new version, previously loaded extensions being clobbered by redefinitions yet expected to work later, interactions with existing control stack frames or with inlining or caching of methods, etc., may not only cause the build to fail, but also badly corrupt the entire system. Thus, self-build, if it happens, must happen at the very beginning of the build. However, the way ASDF works, it is not predictable whether some part of the build will later depend on ASDF. Therefore, to ensure that self-build happens in the beginning if it happens at all, ASDF 3 makes sure it always happens, as the very first thing that ASDF does, no matter what. This also makes ASDF automatically upgrade itself if you just install a new source repository somewhere in your source-registry, e.g. under ~/common-lisp/asdf/ (recommended location for hackers) or /usr/share/common-lisp/source/cl-asdf/ (where Debian installs it). This happens as a call to upgrade-asdf in defmethod operate :around in operate.lisp (including as now called by load-asd), so it is only triggered in side-effectful operations of ASDF, not pure ones (but since find-system can call load-asd, such side-effects can happen just to look at a not-previously-seen, new or updated system definition file). Special care is taken in the record-dependency method in find-system.lisp so this autoload doesn't cause circular dependencies.

Second issue since ASDF 3: ASDF was and is still traditionally delivered as a single file asdf.lisp that you can load in any Common Lisp implementation (literally any, from Genera to Clasp), and it just works. This is not the primary way that ASDF is seen by most end-users anymore: nowadays, every maintained implementation provides ASDF as a module, so users can (require "asdf") about anywhere to get a relatively recent ASDF 3.1 or later. But distributing a single file asdf.lisp is still useful to initially bootstrap ASDF on each of these implementations. Now, by release 2.26, ASDF had grown from its initial 500-line code hack to a 4500-line mess, with roughly working but incomplete efforts to address robustness, portability and upgradability, and with a deep design bug (see the ASDF 3 paper). To allow for further growth, robustification and non-trivial refactoring, ASDF 3 was split into two sets of files, one for the portability library, called UIOP (now 15 files, 7400 lines) and one for ASDF itself (now 25 files, 6000 lines as of 3.3.2.2), the latter set also specifically called asdf/defsystem in this context. The code is much more maintainable for having been organized in these much more modular smaller bites. However, to still be able to deliver as a single file, ASDF implemented a mechanism to concatenate all the files in the correct order into the desired artifact. It would be nice to convince each and every implementation vendor to provide UIOP and ASDF as separate modules, like SBCL and MKCL do, but that's a different challenge of its own.

There is another important reason to concatenate all source files into a single deliverable file: upgrading ASDF may cause some functions to be undefined or partially redefined between source files, while being used in the building ASDF's call stack, which may cause ASDF to fail to properly compile and load the next ASDF. Compiling and loading ASDF in a single file both makes these changes atomic and minimizes the use of functions being redefined while called. Note that in this regard, UIOP could conceivably be loaded the normal way, because it follows stricter backward compatibility restrictions than ASDF, and can afford them because it has a simpler, more stable, less extensible API that doesn't maintain as much dynamic state, and its functions are less likely to be adversely modified while in the call stack. Still, distributing UIOP and ASDF in separate files introduces opportunities for things to go wrong, and since we need a single-file output for ASDF, it's much safer to also include UIOP in it, and simpler not to have to deal with two different ways to build ASDF, with or without a transcluded UIOP.

As a more recent issue, ASDF 3.3 tracks what operations happen while loading a .asd file (e.g. loading triggered by defsystem-depends-on), and uses that information to dynamically track dependencies between multiple phases of the build: there is a new phase each time ASDF compiles and loads extensions to ASDF into the current image as a prerequisite to process further build specifications. ASDF 3.3 is then capable of using this information to properly detect what to rebuild or not rebuild when doing an incremental compilation involving multiple build phases, whereas previous versions could fail in both ways. But, in light of the first issue, that means that trying to define ASDF or UIOP is special, since everything depends on them. And UIOP is even more special because ASDF depends on it. The "solution" I used in ASDF 3.3 was quite ugly — to prevent circular dependency between a define-op asdf and define-op uiop, I made asdf.asd magically read content from uiop.asd without loading uiop.asd, so as to transclude its sources in the concatenated file asdf.lisp. This is a gross hack that ought to be replaced by something better — probably adding more special cases to record-dependency for uiop as well as asdf along the way.

Finally, there is a way that ASDF could be modified, that would displace the magic of bootstrap such that no special case is needed for ASDF or UIOP — by making that magic available to all systems, potentially also solving issues with cross-compilation. (UIOP remains slightly special: it must be built using the standard CL primitives rather than the UIOP API.) But that would require yet another massive refactoring of ASDF. Moreover, it would either be incompatible with existing ASDF extensions or require non-trivial efforts to maintain a backward-compatible path. The problem is that the plan made by ASDF is executed by repeatedly calling the perform method, itself calling other methods on objects of various classes comprising the ASDF model, while this model is being redefined. The solution is that, from the plan for one phase of execution, ASDF would instead extract a non-extensible, more functional representation of the plan that is impervious to upgrade. Each action would thus define a method on perform-forms instead of perform, which would (primarily) return a list of forms to evaluate at runtime. These forms can then be evaluated without the context of ASDF's object model, actually with a minimal context that even allows them to be run in a different Lisp image on a different Lisp implementation, allowing for cross-compilation, which itself opens the way for parallelized or distributed deterministic backends in the style of XCVB or Bazelisp. Such changes might justify bumping the ASDF version to 4.0.
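
To make that concrete, a hypothetical perform-forms method might look like the following; no such generic function exists in ASDF today, so this is purely a sketch of the proposed protocol (the asdf: functions and classes it calls are real):

;; Hypothetical sketch of the proposed protocol: the action returns
;; plain forms that can be evaluated without ASDF's object model,
;; even in a different Lisp image.
(defmethod perform-forms ((o asdf:compile-op) (c asdf:cl-source-file))
  `((compile-file ,(asdf:component-pathname c)
                  :output-file ,(first (asdf:output-files o c)))))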

As usual, writing this essay, which was prompted by a question from Eric Timmons, forced me to think about the alternatives and why I had rejected some of them, to look back at the code and experiment with it, which ultimately led to my finding and fixing various small bugs in the code. As for solving the deeper issues, they're up to you, next ASDF maintainers!

Quicklisp newsJuly 2018 Quicklisp dist update now available

· 125 days ago
Hi everyone! There's a new Quicklisp update for July, and regular updates should resume on a monthly schedule.

I'm using a new release system that involves Docker for easier build server setup and management. It took a while to get going but it should (eventually) make it easier for others to run things in an environment similar to mine. For example, it has all the required foreign libraries needed to compile and load everything in Quicklisp.

Here's the info for the new update:

New projects:
  • april — April is a subset of the APL programming language that compiles to Common Lisp. — Apache-2.0
  • aws-foundation — Amazon AWS low-level utilities — BSD
  • binary-io — Library for reading and writing binary data. — BSD
  • cl-bip39 — A Common Lisp implementation of BIP-0039 — MIT
  • cl-bnf — A simple BNF parser. — MIT
  • cl-generator — cl-generator, a generator implementation for common lisp — MIT
  • cl-patterns — Pattern library for algorithmic music composition and performance in Common Lisp. — GNU General Public License v3.0
  • cl-progress-bar — Display progress bars directly in REPL. — MIT
  • clad — The CLAD System. — BSD
  • concrete-syntax-tree — Library for parsing Common Lisp code into a concrete syntax tree. — FreeBSD
  • definitions — General definitions reflection library. — Artistic
  • eclector — A Common Lisp reader that can adapt to different implementations, and that can return Concrete Syntax Trees — BSD
  • flute — A beautiful, easilly composable HTML5 generation library — MIT
  • froute — An Http routing class that takes advantage of the MOP — MIT
  • language-codes — A small library mapping language codes to language names. — Artistic
  • lichat-ldap — LDAP backend for the Lichat server profiles. — Artistic
  • multilang-documentation — A drop-in replacement for CL:DOCUMENTATION providing multi-language docstrings — Artistic
  • multiposter — A small application to post to multiple services at once. — Artistic
  • sandalphon.lambda-list — Lambda list parsing and usage — WTFPL
  • sel — Programmatic modification and evaluation of software — GPL 3
  • system-locale — System locale and language discovery — Artistic
  • taglib — Pure Lisp implementation to read (and write, perhaps, one day) tags — UNLICENSE 
  • terrable — Terragen TER file format reader — Artistic
  • tooter — A client library for Mastodon instances. — Artistic
  • trace-db — Writing, reading, storing, and searching of program traces — GPL 3
  • umbra — A library of reusable GPU shader functions. — MIT
Updated projects: 3d-matrices, 3d-vectors, agnostic-lizard, alexa, aws-sign4, base-blobs, bodge-blobs-support, bodge-chipmunk, bodge-nanovg, bodge-nuklear, bordeaux-threads, bt-semaphore, caveman, cepl, cepl.drm-gbm, cerberus, chirp, cl-algebraic-data-type, cl-async, cl-autorepo, cl-cognito, cl-conllu, cl-darksky, cl-flow, cl-forms, cl-gamepad, cl-geos, cl-gobject-introspection, cl-gopher, cl-hamcrest, cl-interpol, cl-liballegro, cl-libsvm-format, cl-mechanize, cl-mpi, cl-muth, cl-online-learning, cl-pslib, cl-python, cl-random-forest, cl-readline, cl-redis, cl-rules, cl-sdl2, cl-str, cl-toml, cl-yesql, clack, claw, clip, closer-mop, closure-html, clss, clx, codex, coleslaw, common-lisp-actors, configuration.options, croatoan, curry-compose-reader-macros, delta-debug, deploy, dexador, djula, dml, documentation-utils, documentation-utils-extensions, doubly-linked-list, dufy, dynamic-mixins, elf, fare-scripts, femlisp, fft, flare, flexi-streams, for, freebsd-sysctl, fxml, gamebox-frame-manager, gamebox-math, glsl-spec, glsl-toolkit, glyphs, golden-utils, graph, gsll, harmony, helambdap, hunchensocket, ironclad, jose, lichat-protocol, lichat-serverlib, lichat-tcp-server, lisp-chat, listopia, maiden, mcclim, media-types, mito, nineveh, omer-count, opticl, overlord, oxenfurt, parachute, parseq, parser.ini, pathname-utils, pgloader, physical-quantities, plump, postmodern, ppath, pythonic-string-reader, qbase64, qlot, qt-libs, qtools, random-sample, rove, rtg-math, s-dot2, serapeum, shadow, simple-flow-dispatcher, slime, spinneret, staple, string-case, stumpwm, sxql, the-cost-of-nothing, trivia, trivial-ldap, trivial-mmap, ubiquitous, uiop, unit-formula, varjo, websocket-driver, whofields, xhtmlambda, xlsx, xml-emitter.

Removed projects: binge, black-tie, cl-ctrnn, cl-directed-graph, cl-ledger, cl-scan, readable, spartns.

The projects removed either didn't build (cl-directed-graph) or are no longer available for download that I could find (everything else).

To get this update, use (ql:update-dist "quicklisp").

Enjoy!

Paul KhuongThe Confidence Sequence Method: A Computer-age Test for Statistical SLOs

· 130 days ago

This post goes over some code that I pushed to github today. All the snippets below should be in the repo, which also includes C and Python code with the same structure.

I recently resumed thinking about balls and bins for hash tables. This time, I’m looking at large bins (on the order of one 2MB huge page). There are many hashing methods with solid worst-case guarantees that unfortunately query multiple uncorrelated locations; I feel like we could automatically adapt them to modern hierarchical storage (or address translation) to make them more efficient, for a small loss in density.

In theory, large enough bins can be allocated statically with a minimal waste of space. I wanted some actual non-asymptotic numbers, so I ran numerical experiments and got the following distribution of global utilisation (fill rate) when the first bin fills up.

It looks like, even with one thousand bins of thirty thousand values, we can expect almost 98% space utilisation until the first bin saturates. I want something more formal.

Could I establish something like a service level objective, “When distributing balls randomly between one thousand bins with individual capacity of thirty thousand balls, we can utilise at least 98% of the total space before a bin fills up, x% of the time?”

The natural way to compute the “x%” that makes the proposition true is to first fit a distribution on the observed data, then find out the probability mass for that distribution that lies above 98% fill rate. Fitting distributions takes a lot of judgment, and I’m not sure I trust myself that much.

Alternatively, we can observe independent identically distributed fill rates, check if they achieve 98% space utilisation, and bound the success rate for this Bernoulli process.

There are some non-trivial questions associated with this approach.

  1. How do we know when to stop generating more observations... without fooling ourselves with \(p\)-hacking?
  2. How can we generate something like a confidence interval for the success rate?

Thankfully, I have been sitting on a software package to compute satisfaction rates for exactly this kind of SLO-type property, properties of the form “this indicator satisfies $PREDICATE x% of the time,” with arbitrarily bounded false positive rates.

The code takes care of adaptive stopping, generates a credible interval, and spits out a report like this: we see the threshold (0.98), the empirical success rate estimate (0.993 ≫ 0.98), a credible interval for the success rate, and the shape of the probability mass for success rates.

This post shows how to compute credible intervals for the Bernoulli’s success rate, how to implement a dynamic stopping criterion, and how to combine the two while compensating for multiple hypothesis testing. It also gives two examples of converting more general questions to SLO form, and answers them with the same code.

Credible intervals for the Binomial

If we run the same experiment \(n\) times, and observe \(a\) successes (\(b = n - a\) failures), it’s natural to ask for an estimate of the success rate \(p\) for the underlying Bernoulli process, assuming the observations are independent and identically distributed.

Intuitively, that estimate should be close to \(a / n\), the empirical success rate, but that’s not enough. I also want something that reflects the uncertainty associated with small \(n\), much like in the following ridge line plot, where different phrases are assigned not only a different average probability, but also a different spread.

I’m looking for an interval of plausible success rates \(p\) that responds to both the empirical success rate \(a / n\) and the sample size \(n\); that interval should be centered around \(a / n\), be wide when \(n\) is small, and become gradually tighter as \(n\) increases.

The Bayesian approach is straightforward, if we’re willing to shut up and calculate. Once we fix the underlying success rate \(p = \hat{p}\), the conditional probability of observing \(a\) successes and \(b\) failures is

\[P((a, b) | p = \hat{p}) \sim \hat{p}\sp{a} \cdot (1 - \hat{p})\sp{b},\]

where the right-hand side is a proportion[1], rather than a probability.

We can now apply Bayes’s theorem to invert the condition and the event. The inversion will give us the conditional probability that \(p = \hat{p}\), given that we observed \(a\) successes and \(b\) failures. We only need to impose a prior distribution on the underlying rate \(p\). For simplicity, I’ll go with the uniform \(U[0, 1]\), i.e., every success rate is equally plausible, at first. We find

\[P(p = \hat{p} | (a, b)) = \frac{P((a, b) | p = \hat{p}) P(p = \hat{p})}{P(a, b)}.\]

We already picked the uniform prior, \(P(p = \hat{p}) = 1\,\forall \hat{p}\in [0,1],\) and the denominator is a constant with respect to \(\hat{p}\). The expression simplifies to

\[P(p = \hat{p} | (a, b)) \sim \hat{p}\sp{a} \cdot (1 - \hat{p})\sp{b},\]

or, if we normalise to obtain a probability,

\[P(p = \hat{p} | (a, b)) = \frac{\hat{p}\sp{a} \cdot (1 - \hat{p})\sp{b}}{\int\sb{0}\sp{1} \hat{p}\sp{a} \cdot (1 - \hat{p})\sp{b}\, d\hat{p}} = \textrm{Beta}(a+1, b+1).\]

A bit of calculation, and we find that our credibility estimate for the underlying success rate follows a Beta distribution. If one is really into statistics, they can observe that the uniform prior distribution is just the \(\textrm{Beta}(1, 1)\) distribution, and rederive that the Beta is the conjugate distribution for the Binomial distribution.

For me, it suffices to observe that the distribution \(\textrm{Beta}(a+1, b+1)\) is unimodal, does peak around \(a / (a + b)\), and becomes tighter as the number of observations grows. In the following image, I plotted three Beta distributions, all with empirical success rate 0.9; red corresponds to \(n = 10\) (\(a = 9\), \(b = 1\), \(\textrm{Beta}(10, 2)\)), black to \(n = 100\) (\(\textrm{Beta}(91, 11)\)), and blue to \(n = 1000\) (\(\textrm{Beta}(901, 101)\)).

We calculated, and we got something that matches my intuition. Before trying to understand what it means, let’s take a detour to simply plot points from that un-normalised proportion function \(\hat{p}\sp{a} \cdot (1 - \hat{p})\sp{b}\), on an arbitrary \(y\) axis.

Let \(\hat{p} = 0.4\), \(a = 901\), \(b = 101\). Naïvely entering the expression at the REPL yields nothing useful.

CL-USER> (* (expt 0.4d0 901) (expt (- 1 0.4d0) 101))
0.0d0

The issue here is that the un-normalised proportion is so small that it underflows double floats and becomes a round zero. We can guess that the normalisation factor \(\frac{1}{\mathrm{Beta}(\cdot,\cdot)}\) quickly grows very large, which will bring its own set of issues when we do care about the normalised probability.

How can we renormalise a set of points without underflow? The usual trick to handle extremely small or large magnitudes is to work in the log domain. Rather than computing \(\hat{p}\sp{a} \cdot (1 - \hat{p})\sp{b}\), we shall compute

\[\log\left[\hat{p}\sp{a} \cdot (1 - \hat{p})\sp{b}\right] = a \log\hat{p} + b \log (1 - \hat{p}).\]

CL-USER> (+ (* 901 (log 0.4d0)) (* 101 (log (- 1 0.4d0))))
-877.1713374189787d0
CL-USER> (exp *)
0.0d0

That’s somewhat better: the log-domain value is not \(-\infty\), but converting it back to a regular value still gives us 0.

The \(\log\) function is monotonic, so we can find the maximum proportion value for a set of points, and divide everything by that maximum value to get plottable points. There’s one last thing that should change: when \(x\) is small, \(1 - x\) will round most of \(x\) away. Instead of (log (- 1 x)), we should use (log1p (- x)) to compute \(\log (1 + -x) = \log (1 - x)\). Common Lisp did not standardise log1p, but SBCL does have it in internals, as a wrapper around libm. We’ll just abuse that for now.

CL-USER> (defun proportion (x) (+ (* 901 (log x)) (* 101 (sb-kernel:%log1p (- x)))))
PROPORTION
CL-USER> (defparameter *points* (loop for i from 1 upto 19 collect (/ i 20d0)))
*POINTS*
CL-USER> (reduce #'max *points* :key #'proportion)
-327.4909190001001d0

We have to normalise in the log domain, which is simply a subtraction: \(\log(x / y) = \log x - \log y\). In the case above, we will subtract \(-327.49\ldots\), or add a massive \(327.49\ldots\) to each log proportion (i.e., multiply by \(10\sp{142}\)). The resulting values should have a reasonably non-zero range.

CL-USER> (mapcar (lambda (x) (cons x (exp (- (proportion x) *)))) *points*)
((0.05d0 . 0.0d0)
 (0.1d0 . 0.0d0)
 [...]
 (0.35d0 . 3.443943164733533d-288)
 [...]
 (0.8d0 . 2.0682681158181894d-16) 
 (0.85d0 . 2.6252352579425913d-5)
 (0.9d0 . 1.0d0)
 (0.95d0 . 5.65506756824607d-10))

There’s finally some signal in there. This is still just an un-normalised proportion function, not a probability density function, but that’s already useful to show the general shape of the density function, something like the following, for \(\mathrm{Beta}(901, 101)\).

Finally, we have a probability density function for the Bayesian update of our belief about the success rate after \(n\) observations of a Bernoulli process, and we know how to compute its proportion function. Until now, I’ve carefully avoided the question of what all these computations even mean. No more (:

The Bayesian view assumes that the underlying success rate (the value we’re trying to estimate) is unknown, but sampled from some distribution. In our case, we assumed a uniform distribution, i.e., that every success rate is a priori equally likely. We then observe \(n\) outcomes (successes or failures), and assign an updated probability to each success rate. It’s like a many-world interpretation in which we assume we live in one of a set of worlds, each with a success rate sampled from the uniform distribution; after observing 900 successes and 100 failures, we’re more likely to be in a world where the success rate is 0.9 than in one where it’s 0.2. With Bayes’s theorem to formalise the update, we assign posterior probabilities to each potential success rate value.

We can compute an equal-tailed credible interval from that \(\mathrm{Beta}(a+1,b+1)\) posterior distribution by excluding the left-most values, \([0, l)\), such that the Beta CDF (cumulative distribution function) at \(l\) is \(\varepsilon / 2\), and doing the same with the right-most values to cut away \(\varepsilon / 2\) of the probability density. The CDF for \(\mathrm{Beta}(a+1,b+1)\) at \(x\) is the incomplete beta function, \(I\sb{x}(a+1,b+1)\). That function is really hard to compute (this technical report detailing Algorithm 708 deploys five different evaluation strategies), so I’ll address that later.

The more orthodox “frequentist” approach to confidence intervals treats the whole experiment, from data collection to analysis (to publication, independent of the observations 😉) as an Atlantic City algorithm: if we allow a false positive rate of \(\varepsilon\) (e.g., \(\varepsilon=5\%\)), the experiment must return a confidence interval that includes the actual success rate (population statistic or parameter, in general) with probability \(1 - \varepsilon\), for any actual success rate (or underlying population statistic / parameter). When the procedure fails, with probability at most \(\varepsilon\), it is allowed to fail in an arbitrary manner.

The same Atlantic City logic applies to \(p\)-values. An experiment (data collection and analysis) that accepts when the \(p\)-value is at most \(0.05\) is an Atlantic City algorithm that returns a correct result (including “don’t know”) with probability at least \(0.95\), and is otherwise allowed to yield any result with probability at most \(0.05\). The \(p\)-value associated with a conclusion, e.g., “success rate is more than 0.8” (the confidence level associated with an interval) means something like “I’m pretty sure that the success rate is more than 0.8, because the odds of observing our data if that were false are small (less than 0.05).” If we set that threshold (of 0.05, in the example) ahead of time, we get an Atlantic City algorithm to determine if “the success rate is more than 0.8” with failure probability 0.05. (In practice, reporting is censored in all sorts of ways, so...)

There are ways to recover a classical confidence interval, given \(n\) observations from a Bernoulli. However, they’re pretty convoluted, and, as Jaynes argues in his note on confidence intervals, the classical approach gives values that are roughly the same2 as the Bayesian approach... so I’ll just use the Bayesian credibility interval instead.

See this stackexchange post for a lot more details.

Dynamic stopping for Binomial testing

The way statistics are usually deployed is that someone collects a data set, as rich as is practical, and squeezes that static data set dry for significant results. That’s exactly the setting for the credible interval computation I sketched in the previous section.

When studying the properties of computer programs or systems, we can usually generate additional data on demand, given more time. The problem is knowing when it’s ok to stop wasting computer time, because we have enough data... and how to determine that without running into multiple hypothesis testing issues (ask anyone who’s run A/B tests).

Here’s an example of an intuitive but completely broken dynamic stopping criterion. Let’s say we’re trying to find out if the success rate is less than or greater than 90%, and are willing to be wrong 5% of the time. We could get \(k\) data points, run a statistical test on those data points, and stop if the data let us conclude with 95% confidence that the underlying success rate differs from 90%. Otherwise, collect \(2k\) fresh points, run the same test; collect \(4k, \ldots, 2\sp{i}k\) points. Eventually, we’ll have enough data.

The issue is that each time we execute the statistical test that determines if we should stop, we run a 5% risk of being totally wrong. For an extreme example, if the success rate is exactly 90%, we will eventually stop, with probability 1. When we do stop, we’ll inevitably conclude that the success rate differs from 90%, and we will be wrong. The worst-case (over all underlying success rates) false positive rate is 100%, not 5%!

In my experience, programmers tend to sidestep the question by wasting CPU time with a large, fixed, number of iterations... people are then less likely to run our statistical tests, since they’re so slow, and everyone loses (the other popular option is to impose a reasonable CPU budget, with error thresholds so lax we end up with a smoke test).

Robbins, in Statistical Methods Related to the Law of the Iterated Logarithm, introduces a criterion that, given a threshold success rate \(p\) and a sequence of (infinitely many!) observations from the same Bernoulli with unknown success rate parameter, will be satisfied infinitely often when \(p\) differs from the Bernoulli’s success rate. Crucially, Robbins also bounds the false positive rate, the probability that the criterion be satisfied even once in the infinite sequence of observations if the Bernoulli’s unknown success rate is exactly equal to \(p\). That criterion is

\[{n \choose a} p\sp{a} (1-p)\sp{n-a} \leq \frac{\varepsilon}{n+1},\]

where \(n\) is the number of observations, \(a\) the number of successes, \(p\) the threshold success rate, and \(\varepsilon\) the error (false positive) rate. As the number of observations grows, the criterion becomes more and more stringent to maintain a bounded false positive rate over the whole infinite sequence of observations.

There are similar “Confidence Sequence” results for other distributions (see, for example, this paper of Lai), but we only care about the Binomial here.

More recently, Ding, Gandy, and Hahn showed that Robbins’s criterion also guarantees that, when it is satisfied, the empirical success rate (\(a/n\)) lies on the correct side of the threshold \(p\) (same side as the actual unknown success rate) with probability \(1-\varepsilon\). This result leads them to propose the use of Robbins’s criterion to stop Monte Carlo statistical tests, which they refer to as the Confidence Sequence Method (CSM).

(defun csm-stop-p (successes failures threshold eps)
  "Pseudocode, this will not work on a real machine."
  ;; CHOOSE is the binomial coefficient (n choose successes); the
  ;; product below under/overflows floats long before n gets
  ;; interesting, hence the log-domain rewrite that follows.
  (let ((n (+ successes failures)))
    (<= (* (choose n successes)
           (expt threshold successes)
           (expt (- 1 threshold) failures))
        (/ eps (1+ n)))))

We may call this predicate at any time with more independent and identically distributed results, and stop as soon as it returns true.
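
As a sketch, a sampling loop around that predicate might look like the following; trial-fn is an assumed thunk that returns T on success, and csm-stop-p is still the pseudocode above:

(defun run-until-csm-stop (trial-fn threshold eps &key (max-count 1000000))
  "Run TRIAL-FN until CSM-STOP-P fires or MAX-COUNT trials have been
   performed. Return the empirical success rate, the number of trials,
   and whether the criterion was satisfied."
  (let ((successes 0)
        (failures 0)
        (stopped nil))
    (loop repeat max-count
          do (if (funcall trial-fn)
                 (incf successes)
                 (incf failures))
          until (setf stopped
                      (csm-stop-p successes failures threshold eps)))
    (let ((n (+ successes failures)))
      (values (/ successes n) n stopped))))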

The CSM is simple (it’s all in Robbins’s criterion), but still provides good guarantees. The downside is that it is conservative when we have a limit on the number of observations: the method “hedges” against the possibility of having a false positive in the infinite number of observations after the limit, observations we will never make. For computer-generated data sets, I think having a principled limit is pretty good; it’s not ideal to ask for more data than strictly necessary, but not a blocker either.

In practice, there are still real obstacles to implementing the CSM on computers with finite precision (floating point) arithmetic, especially since I want to preserve the method’s theoretical guarantees (i.e., make sure rounding is one-sided to overestimate the left-hand side of the inequality).

If we implement the expression well, the effect of rounding on correctness should be less than marginal. However, I don’t want to be stuck wondering if my bad results are due to known approximation errors in the method, rather than errors in the code. Moreover, if we do have a tight expression with little rounding error, adjusting it to make the errors one-sided should have almost no impact. That seems like a good trade-off to me, especially if I’m going to use the CSM semi-automatically, in continuous integration scripts, for example.

One look at csm-stop-p shows we’ll have the same problem we had with the proportion function for the Beta distribution: we’re multiplying very small and very large values. We’ll apply the same fix: work in the log domain and exploit \(\log\)’s monotonicity.

\[{n \choose a} p\sp{a} (1-p)\sp{n-a} \leq \frac{\varepsilon}{n+1}\]

becomes

\[\log {n \choose a} + a \log p + (n-a)\log (1-p) \leq \log\varepsilon -\log(n+1),\]

or, after some more expansions, and with \(b = n - a\),

\[\log n! - \log a! - \log b! + a \log p + b \log(1 - p) + \log(n+1) \leq \log\varepsilon.\]

The new obstacle is computing the factorial \(x!\), or the log-factorial \(\log x!\). We shouldn’t compute the factorial iteratively: otherwise, we could spend more time in the stopping criterion than in the data generation subroutine. Robbins has another useful result for us:

\[\sqrt{2\pi} n\sp{n + ½} \exp(-n) \exp\left(\frac{1}{12n+1}\right) < n! < \sqrt{2\pi} n\sp{n + ½} \exp(-n) \exp\left(\frac{1}{12n}\right),\]

or, in the log domain,

\[\log\sqrt{2\pi} + \left(n + \frac{1}{2}\right)\log n -n + \frac{1}{12n+1} < \log n! < \log\sqrt{2\pi} + \left(n + \frac{1}{2}\right)\log n -n +\frac{1}{12n}.\]

This double inequality gives us a way to over-approximate \(\log {n \choose a} = \log \frac{n!}{a! b!} = \log n! - \log a! - \log b!,\) where \(b = n - a\):

\[\log {n \choose a} < -\log\sqrt{2\pi} + \left(n + \frac{1}{2}\right)\log n -n +\frac{1}{12n} - \left(a + \frac{1}{2}\right)\log a +a - \frac{1}{12a+1} - \left(b + \frac{1}{2}\right)\log b +b - \frac{1}{12b+1},\]

where the right-most expression in Robbins’s double inequality replaces \(\log n!\), which must be over-approximated, and the left-most \(\log a!\) and \(\log b!\), which must be under-approximated.

Robbins’s approximation works well for us because it is one-sided, and guarantees that the (relative) error in \(n!\), \(\frac{\exp\left(\frac{1}{12n}\right) - \exp\left(\frac{1}{12n+1}\right)}{n!},\) is small, even for small values like \(n = 5\) (error \(< 0.0023\%\)), and decreases with \(n\): as we perform more trials, the approximation is increasingly accurate, thus less likely to spuriously prevent us from stopping.
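
As a quick sanity check of the bounds themselves, nothing beyond standard CL is needed:

(let* ((n 5d0)
       (log-fact (log 120d0)) ; log 5! = log 120
       (base (+ (log (sqrt (* 2 (coerce pi 'double-float))))
                (* (+ n 0.5d0) (log n))
                (- n))))
  ;; lower bound, exact value, upper bound: the three should straddle.
  (list (+ base (/ (1+ (* 12 n))))
        log-fact
        (+ base (/ (* 12 n)))))
;; => roughly (4.78724 4.78749 4.78751)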

Now that we have a conservative approximation of Robbins’s criterion that only needs the four arithmetic operations and logarithms (and log1p), we can implement it on a real computer. The only challenge left is regular floating point arithmetic stuff: if rounding must occur, we must make sure it is in a safe (conservative) direction for our predicate.

Hardware usually lets us manipulate the rounding mode to force floating point arithmetic operations to round up or down, instead of the usual round to even. However, that tends to be slow, so most language (implementations) don’t support changing the rounding mode, or do so badly... which leaves us in a multi-decade hardware/software co-evolution Catch-22.

I could think hard and derive tight bounds on the round-off error, but I’d rather apply a bit of brute force. IEEE-754 compliant implementations must round the four basic operations correctly. This means that \(z = x \oplus y\) is at most half a ULP away from \(x + y,\) and thus either \(z = x \oplus y \geq x + y,\) or the next floating point value after \(z,\) \(z^\prime \geq x + y\). We can find this “next value” portably in Common Lisp, with decode-float/scale-float, and some hand-waving for denormals.

(defun next (x &optional (delta 1))
  "Increment x by delta ULPs. Very conservative for
   small (0/denormalised) values."
  (declare (type double-float x)
           (type unsigned-byte delta))
  (let* ((exponent (nth-value 1 (decode-float x)))
         (ulp (max (scale-float double-float-epsilon exponent)
                   least-positive-normalized-double-float)))
    (+ x (* delta ulp))))

I prefer to manipulate IEEE-754 bits directly. That’s theoretically not portable, but the platforms I care about make sure we can treat floats as sign-magnitude integers.

#+sbcl
(progn
  (declaim (inline %float-bits %bits-float next prev))
  (defun %float-bits (x)
    "Convert a double float x to sign-extended sign/magnitude, and
     then to 2's complement."
    (declare (type double-float x))
    (let* ((hi (sb-kernel:double-float-high-bits x))
           (lo (sb-kernel:double-float-low-bits x))
           (word (+ (ash (ldb (byte 31 0) hi) 32) lo)))
      ;; hi is the high half of the 64 bit sign-magnitude
      ;; representation... in two's complement. Extract the significand,
      ;; and then apply the sign bit. We want to preserve signed zeros,
      ;; so return -1 - word instead of -word.
      ;;
      ;; (- -1 word) = (lognot word) = (logxor word -1).
      (logxor word (ash hi -32))))

  (defun %bits-float (bits)
    "Convert 2's complement to sign-extended sign/magnitude, then
     double float."
    (declare (type (signed-byte 64) bits))
    ;; convert back to sign-magnitude: if bits is negative, all but the
    ;; sign bit must be flipped again.
    (let ((bits (logxor bits
                        (ldb (byte 63 0) (ash bits -64)))))
      (sb-kernel:make-double-float (ash bits -32)
                                   (ldb (byte 32 0) bits))))

  (defun next (x &optional (delta 1))
    "Increment x by delta ULPs."
    (declare (type double-float x)
             (type unsigned-byte delta))
    (%bits-float (+ (%float-bits x) delta)))

  (defun prev (x &optional (delta 1))
    "Decrement x by delta ULPs."
    (declare (type double-float x)
             (type unsigned-byte delta))
    (%bits-float (- (%float-bits x) delta))))
CL-USER> (%float-bits pi)
4614256656552045848
CL-USER> (%float-bits (- pi))
-4614256656552045849

The two’s complement value for (- pi) is one less than (- (%float-bits pi)) because two’s complement does not support signed zeros.

CL-USER> (eql 0 (- 0))
T
CL-USER> (eql 0d0 (- 0d0))
NIL
CL-USER> (%float-bits 0d0)
0
CL-USER> (%float-bits -0d0)
-1

We can quickly check that the round trip from float to integer and back is an identity.

CL-USER> (eql pi (%bits-float (%float-bits pi)))
T
CL-USER> (eql (- pi) (%bits-float (%float-bits (- pi))))
T
CL-USER> (eql 0d0 (%bits-float (%float-bits 0d0)))
T
CL-USER> (eql -0d0 (%bits-float (%float-bits -0d0)))
T

We can also check that incrementing or decrementing the integer representation does increase or decrease the floating point value.

CL-USER> (< (%bits-float (1- (%float-bits pi))) pi)
T
CL-USER> (< (%bits-float (1- (%float-bits (- pi)))) (- pi))
T
CL-USER> (%bits-float (1- (%float-bits 0d0)))
-0.0d0
CL-USER> (%bits-float (1+ (%float-bits -0d0)))
0.0d0
CL-USER> (%bits-float (1+ (%float-bits 0d0)))
4.9406564584124654d-324
CL-USER> (%bits-float (1- (%float-bits -0d0)))
-4.9406564584124654d-324

The code doesn’t handle special values like infinities or NaNs, but that’s out of scope for the CSM criterion anyway. That’s all we need to nudge the result of the four operations to guarantee an over- or under-approximation of the real value. We can also look at the documentation for our libm (e.g., for GNU libm) to find error bounds on functions like log; GNU claims their log is never off by more than 3 ULP. We can round up to the fourth next floating point value to obtain a conservative upper bound on \(\log x\).

(declaim (type (unsigned-byte 31) *libm-error-limit*))
(defvar *libm-error-limit* 4
  "Assume libm is off by less than 4 ULPs.")

(declaim (inline log-up log-down))
(defun log-up (x)
  "Conservative upper bound on log(x)."
  (declare (type double-float x))
  (next (log x) *libm-error-limit*))

(defun log-down (x)
  "Conservative lower bound on log(x)."
  (declare (type double-float x))
  (prev (log x) *libm-error-limit*))

#+sbcl
(progn
  (declaim (inline log1p-up log1p-down))
  (defun log1p-up (x)
    "Convervative upper bound on log(1 + x)."
    (declare (type double-float x))
    (next (sb-kernel:%log1p x) *libm-error-limit*))

  (defun log1p-down (x)
    "Conservative lower bound on log(1 + x)"
    (declare (type double-float x))
    (prev (sb-kernel:%log1p x) *libm-error-limit*)))

I could go ahead and use the building blocks above (ULP nudging for directed rounding) to directly implement Robbins’s criterion,

\[\log {n \choose a} + a \log p + b\log (1-p) + \log(n+1) \leq \log\varepsilon,\]

with Robbins’s factorial approximation,

\[\log {n \choose a} < -\log\sqrt{2\pi} + \left(n + \frac{1}{2}\right)\log n -n +\frac{1}{12n} - \left(a + \frac{1}{2}\right)\log a +a - \frac{1}{12a+1} - \left(b + \frac{1}{2}\right)\log b +b - \frac{1}{12b+1}.\]

However, even in the log domain, there’s a lot of cancellation: we’re taking the difference of relatively large numbers to find a small result. It’s possible to avoid that by re-associating some of the terms above, e.g., for \(a\):

\[-\left(a + \frac{1}{2}\right) \log a + a - a \log p = -\frac{\log a}{2} + a (-\log a + 1 - \log p).\]

Instead, I’ll just brute force things (again) with Kahan summation. Shewchuk’s presentation in Adaptive Precision Floating-Point Arithmetic and Fast Robust Geometric Predicates highlights how the only step where we may lose precision to rounding is when we add the current compensation term to the new summand. We can implement Kahan summation with directed rounding in only that one place: all the other operations are exact!

;;; Kahan-style summation.
;;;
;;; Represent the accumulator as an evaluated sum of two doubles. As
;;; long as the compensation term is initially 0, the result is a safe
;;; upper bound on the real value, and the two terms are
;;; "non-overlapping."  For more details, see "Adaptive Precision
;;; Floating-Point Arithmetic and Fast Robust Geometric Predicates",
;;; Shewchuk, 1997; Technical report CMU-CS-96-140R / Discrete & Comp
;;; Geom 18(3), October 1997. Theorem 6 in particular.

(declaim (inline sum-update-up sum-update-finish))
(defun sum-update-up (accumulator compensation term &optional ordered)
  "Given an evaluated sum
     (accumulator + compensation),
   return a new unevaluated sum for an upper bound on
     (accumulator + compensation + term).

   If ordered, assume
     term < accumulator,
   or
     accumulator = compensation = 0."
  (declare (type double-float accumulator compensation
                 term))
  (when (and (not ordered)
             (< (abs accumulator) (abs term)))
    (rotatef accumulator term))
  (let* ((rest-1 (next (+ compensation term))) ; safe upper bound on c + t
         (rest (if (<= compensation 0d0)       ; tighter, still safe.
                   (min term rest-1)
                   rest-1))
         ;; Perform a Dekker sum of accumulator + rest. The result is
         ;; exact, so no need for next/prev here.
         ;;
         ;; Precondition: |accumulator| >= |rest| (or accumulator = 0).
         (a accumulator)
         (b rest)
         (x (+ a b))
         (b-virtual (- x a))     ; b-virtual = value really added to a
         (y (- b b-virtual)))
    (values x y)))

(defun sum-update-finish (accumulator compensation)
  "Return a conservative upper bound for accumulator + compensation.

   In theory, (+ accumulator compensation) is equal to accumulator.
   In practice, it doesn't hurt to do this right. The second return
   value is the new compensation term (should never be positive)."
  (declare (type double-float accumulator compensation))
  (let* ((raw-sum (next (+ accumulator compensation)))
         (sum (if (> compensation 0d0)
                  raw-sum
                  ;; if compensation <= 0, acc is already an upper
                  ;; bound.
                  (min accumulator raw-sum)))
         (delta (- sum accumulator)))
    (assert (>= delta compensation))
    (values sum (- compensation delta))))

(declaim (ftype (function (&rest double-float)
                          (values double-float double-float &optional))
                sum-up))
(defun sum-up (&rest values)
  "Conservative upper bound for the sum of values, with a Kahan
   summation loop."
  (let ((acc 0d0)
        (err 0d0))
    (dolist (value values (sum-update-finish acc err))
      (setf (values acc err)
            (sum-update-up acc err value)))))

We need one last thing to implement \(\log {n \choose a}\), and then Robbins’s confidence sequence: a safely rounded floating-point approximation of \(-\log \sqrt{2 \pi}\). I precomputed one with computable-reals:

CL-USER> (computable-reals:-r
          (computable-reals:log-r
           (computable-reals:sqrt-r computable-reals:+2pi-r+)))
-0.91893853320467274178...
CL-USER> (computable-reals:ceiling-r 
          (computable-reals:*r *
                               (ash 1 53)))
-8277062471433908
-0.65067431749790398594...
CL-USER> (* -8277062471433908 (expt 2d0 -53))
-0.9189385332046727d0
CL-USER> (computable-reals:-r (rational *)
                              ***)
+0.00000000000000007224...

We can safely replace \(-\log\sqrt{2\pi}\) with -0.9189385332046727d0, or, equivalently, (scale-float -8277062471433908.0d0 -53), for an upper bound. If we wanted a lower bound, we could decrement the integer significand by one.
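
Concretely:

;; upper bound: the integer fits in 53 bits, so the scale-float is exact.
(scale-float -8277062471433908d0 -53) ; => -0.9189385332046727d0
;; decrementing the integer significand gives a lower bound instead.
(scale-float -8277062471433909d0 -53)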

;;; Upper bound for log c(n, s).

(declaim (type double-float *minus-log-sqrt-2pi*))
(defvar *minus-log-sqrt-2pi* -0.9189385332046727d0
  "Smallest double precision value > -log sqrt(2pi).")

(declaim (ftype (function ((unsigned-byte 49) (unsigned-byte 49))
                          (values double-float double-float &optional))
                robbins-log-choose))
(defun robbins-log-choose (n s)
  "Compute a conservative upper bound on log c(n, s) based on
   Robbins's bounds for k!."
  (check-type n (unsigned-byte 49)) ;; ensure 53 bit arith is exact.
  (check-type s (unsigned-byte 49))
  (assert (<= 0 s n))
  ;; Handle easy cases, where c(n, s) is 1 or n.
  (when (or (= n s)
            (zerop s))
    (return-from robbins-log-choose (values 0d0 0d0)))
  (when (or (= s 1)
            (= s (1- n)))
    (return-from robbins-log-choose (values (log-up (float n 1d0))
                                            0d0)))
  (let* ((n (float n 1d0))
         (s (float s 1d0))
         (n-s (float (- n s) 1d0))
         ;; The -n, +s, and +(n-s) terms in Robbins's bounds cancel
         ;; exactly, so they are omitted below.
         (l1 (next (* (+ n .5d0) (log-up n)))) ; (+ n .5d0) is exact.
         (l2 (next (- (* (+ s .5d0) (log-down s)))))
         (l3 (next (- (* (+ n-s .5d0) (log-down n-s)))))
         (r1 (next (/ (* 12d0 n))))          ; (* 12d0 n) is exact.
         (r2 (next (- (/ (1+ (* 12d0 s)))))) ; also exact.
         (r3 (next (- (/ (1+ (* 12d0 n-s)))))))
    (sum-up *minus-log-sqrt-2pi*
            l1 l2 l3
            r1 r2 r3)))

We can quickly check against an exact implementation with computable-reals and a brute force factorial.

CL-USER> (defun cr-log-choose (n s)
           (computable-reals:-r
            (computable-reals:log-r (alexandria:factorial n))
            (computable-reals:log-r (alexandria:factorial s))
            (computable-reals:log-r (alexandria:factorial (- n s)))))
CR-LOG-CHOOSE
CL-USER> (computable-reals:-r (rational (robbins-log-choose 10 5))
                              (cr-log-choose 10 5))
+0.00050526703375914436...
CL-USER> (computable-reals:-r (rational (robbins-log-choose 1000 500))
                              (cr-log-choose 1000 500))
+0.00000005551513197557...
CL-USER> (computable-reals:-r (rational (robbins-log-choose 1000 5))
                              (cr-log-choose 1000 5))
+0.00025125559085509706...

That’s not obviously broken: the error is pretty small, and always positive.

Given a function to over-approximate log-choose, the Confidence Sequence Method’s stopping criterion is straightforward.

(declaim (ftype (function ((unsigned-byte 49)
                           (real (0) (1))
                           (unsigned-byte 49)
                           real)
                          (values boolean double-float &optional))
                csm))
(defun csm (n alpha s log-eps)
  "Given n trials and s sucesses, are we reasonably sure that the
  success rate is *not* alpha (with a false positive rate < exp(log-eps))?

  Answer that question with Ding, Gandy, and Hahn's confidence
  sequence method (CSM). The second return value is an estimate of the
  false positive target rate we would need to stop here. This value
  should only be used for reporting; the target rate eps should always
  be fixed before starting the experiment."
  (check-type n (unsigned-byte 49))
  (check-type alpha (real (0) (1)))
  (check-type s (unsigned-byte 49))
  (check-type log-eps real)
  (assert (<= 0 s n))
  (let* ((log-choose (robbins-log-choose n s))
         (n (float n 1d0))
         (alpha (float alpha 1d0))
         (s (float s 1d0))
         (log-eps (float log-eps 1d0))
         (log-level (sum-up (log-up (1+ n))
                            log-choose
                            (next (* s (log-up alpha)))
                            (next (* (- n s) (log1p-up (- alpha)))))))
    (values (< log-level log-eps) log-level)))

The other, much harder, part is computing credible (Bayesian) intervals for the Beta distribution. I won’t go over the code, but the basic strategy is to invert the CDF, a monotonic function, by bisection3, and to assume we’re looking for improbable (\(\mathrm{cdf} < 0.5\)) thresholds. This assumption lets us pick a simple hypergeometric series that is normally useless, but converges well for \(x\) that correspond to such small cumulative probabilities; when the series converges too slowly, it’s always conservative to assume that \(x\) is too central (not extreme enough).
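
A minimal sketch of that bisection loop (not the actual inverse-CDF code; too-central-p stands in for a predicate that answers T when its argument’s cumulative probability exceeds the target, and also answers T when the series converges too slowly):

(defun invert-by-bisection (too-central-p lo hi &key (iterations 64))
  "Shrink [LO, HI] around the target quantile; each iteration adds one
   bit of precision. Returning LO keeps the result conservative for a
   left tail."
  (loop repeat iterations
        for mid = (+ lo (/ (- hi lo) 2d0))
        do (if (funcall too-central-p mid)
               (setf hi mid)
               (setf lo mid)))
  lo)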

That’s all we need to demo the code. Looking at the distribution of fill rates for the 1000 bins @ 30K ball/bin facet, it looks like we almost always hit at least 97.5% global density, let’s say with probability at least 98%. We can ask the CSM to tell us when we have enough data to confirm or disprove that hypothesis, with a 0.1% false positive rate.

Instead of generating more data on demand, I’ll keep things simple and prepopulate a list with new independently observed fill rates.

CL-USER> (defparameter *observations* '(0.978518900
                                        0.984687300
                                        0.983160833
                                        [...]))
CL-USER> (defun test (n)
           (let ((count (count-if (lambda (x) (>= x 0.975))
                                  *observations*
                                  :end n)))
             (csm:csm n 0.98d0 count (log 0.001d0))))
CL-USER> (test 10)
NIL
2.1958681996231784d0
CL-USER> (test 100)
NIL
2.5948497850893184d0
CL-USER> (test 1000)
NIL
-3.0115331544604658d0
CL-USER> (test 2000)
NIL
-4.190687115879456d0
CL-USER> (test 4000)
T
-17.238559826956475d0

We can also use the inverse Beta CDF to get a 99.9% credible interval. After 4000 trials, we found 3972 successes.

CL-USER> (count-if (lambda (x) (>= x 0.975))
                   *observations*
                   :end 4000)
3972

These values give us the following lower and upper bounds on the 99.9% CI.

CL-USER> (csm:beta-icdf 3972 (- 4000 3972) 0.001d0)
0.9882119750976562d0
1.515197753898523d-5
CL-USER> (csm:beta-icdf 3972 (- 4000 3972) 0.001d0 t)
0.9963832682169742d0
2.0372679238045424d-13

And we can even re-use and extend the Beta proportion code from earlier to generate an embeddable SVG report.

There’s one small problem with the sample usage above: if we compute the stopping criterion with a false positive rate of 0.1%, and do the same for each end of the credible interval, our total false positive (error) rate might actually be 0.3%! The next section will address that, and the equally important problem of estimating power.

Monte Carlo power estimation

It’s not always practical to generate data forever. For example, we might want to bound the number of iterations we’re willing to waste in an automated testing script. When there is a bound on the sample size, the CSM is still correct, just conservative.

We would then like to know the probability that the CSM will stop successfully when the underlying success rate differs from the threshold rate \(p\) (alpha in the code). The problem here is that, for any bounded number of iterations, we can come up with an underlying success rate so close to \(p\) (but still different) that the CSM can’t reliably distinguish between the two.

If we want to be able to guarantee any termination rate, we need two thresholds: the CSM will stop whenever it’s likely that the underlying success rate differs from either of them. The hardest probability to distinguish from both thresholds is close to the midpoint between them.

With two thresholds and the credible interval, we’re running three tests in parallel. I’ll apply a Bonferroni correction, and use \(\varepsilon / 3\) for each of the two CSM tests, and \(\varepsilon / 6\) for each end of the CI.
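
In code form, the split is trivial (a sketch):

;; Split a total error budget eps across the three parallel tests:
;; the two CSM threshold tests get eps/3 each, and the two ends of
;; the credible interval get eps/6 each, for a total of eps.
(defun bonferroni-split (eps)
  (values (/ eps 3) (/ eps 6)))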

That logic is encapsulated in csm-driver. We only have to pass a success value generator function to the driver. In our case, the generator is itself a call to csm-driver, with fixed thresholds (e.g., 96% and 98%), and a Bernoulli sampler (e.g., return T with probability 97%). We can see if the driver returns successfully and correctly at each invocation of the generator function, with the parameters we would use in production, and recursively compute an estimate for that procedure’s success rate with CSM. The following expression simulates a CSM procedure with thresholds at 96% and 98%, the (usually unknown) underlying success rate in the middle, at 97%, a false positive rate of at most 0.1%, and an iteration limit of ten thousand trials. We pass that simulation’s result to csm-driver, and ask whether the simulation’s success rate differs from 99%, while allowing one in a million false positives.

CL-USER> (labels ((bernoulli (i &aux (p 0.97d0))
                    (declare (ignore i))
                    (< (random 1d0) p))
                  (generator (i &aux (p 0.97d0)
                                     (alpha 0.96d0) (alpha-hi 0.98d0)
                                     (eps 1d-3) (max-count 10000))
                    (declare (ignore i))
                    (multiple-value-bind (success success-hi estimate)
                        (csm:csm-driver #'bernoulli alpha eps
                                        :alpha-hi alpha-hi
                                        :max-count max-count)
                      ;; check that the CSM succeeds, and that it does so
                      ;; with correct estimates.
                      (let ((correct-alpha (if (< p alpha)
                                               (< estimate alpha)
                                               (> estimate alpha)))
                            (correct-hi (if (< p alpha-hi)
                                            (< estimate alpha-hi)
                                            (> estimate alpha-hi))))
                        (cond ((and success success-hi)
                               (and correct-alpha correct-hi))
                              (success
                               correct-alpha)
                              (success-hi
                               correct-hi)
                              (t
                               nil))))))
           (csm:csm-driver #'generator 0.99d0 1d-6))
T
T
1.0d0
2210
2210
0.993145939238895d0
0.9999999998869291d0

We find that yes, we can expect the 96%/98%/0.1% false positive/10K iterations setup to succeed more than 99% of the time. The code above is available as csm-power, with a tighter outer false positive rate of 1e-9. If we only allow 1000 iterations, csm-power quickly tells us that, with one CSM success in 100 attempts, we can expect the CSM success rate to be less than 99%.

CL-USER> (csm:csm-power 0.97d0 0.96d0 1000 :alpha-hi 0.98d0 :eps 1d-3 :stream *standard-output*)
         1 0.000e+0 1.250e-10 10.000e-1 1.699e+0
        10 0.000e+0 0.000e+0 8.660e-1 1.896e+1
        20 0.000e+0 0.000e+0 6.511e-1 3.868e+1
        30 0.000e+0 0.000e+0 5.099e-1 5.851e+1
        40 2.500e-2 5.518e-7 4.659e-1 7.479e+1
        50 2.000e-2 4.425e-7 3.952e-1 9.460e+1
        60 1.667e-2 3.694e-7 3.427e-1 1.144e+2
        70 1.429e-2 3.170e-7 3.024e-1 1.343e+2
        80 1.250e-2 2.776e-7 2.705e-1 1.542e+2
        90 1.111e-2 2.469e-7 2.446e-1 1.741e+2
       100 1.000e-2 2.223e-7 2.232e-1 1.940e+2
100 iterations, 1 successes (false positive rate < 1.000000e-9)
success rate p ~ 1.000000e-2
confidence interval [2.223495e-7, 0.223213    ]
p < 0.990000    
max inner iteration count: 816

T
T
0.01d0
100
1
2.2234953205868331d-7
0.22321314110840665d0

SLO-ify all the things with this Exact test

Until now, I’ve only used the Confidence Sequence Method (CSM) for Monte Carlo simulation of phenomena that are naturally seen as boolean success / failures processes. We can apply the same CSM to implement an exact test for null hypothesis testing, with a bit of resampling magic.

Looking back at the balls and bins grid, the average fill rate seems to be slightly worse for 100 bins @ 60K ball/bin than for 1000 bins @ 128K ball/bin. How can we test that with the CSM?

First, we should get a fresh dataset for the two setups we wish to compare.

CL-USER> (defparameter *100-60k* #(0.988110167
                                   0.990352500
                                   0.989940667
                                   0.991670667
                                   [...]))
CL-USER> (defparameter *1000-128k* #(0.991456281
                                     0.991559578
                                     0.990970109
                                     0.990425805
                                     [...]))
CL-USER> (alexandria:mean *100-60k*)
0.9897938
CL-USER> (alexandria:mean *1000-128k*)
0.9909645
CL-USER> (- * **)
0.0011706948

The mean for 1000 bins @ 128K ball/bin is slightly higher than that for 100 bins @ 60k ball/bin. We will now simulate the null hypothesis (in our case, that the distributions for the two setups are identical), and determine how rarely we observe a difference of 0.00117 in means. I only use a null hypothesis where the distributions are identical for simplicity; we could use the same resampling procedure to simulate distributions that, e.g., have identical shapes, but one is shifted right of the other.

In order to simulate our null hypothesis, we want to be as close to the test we performed as possible, with the only difference being that we generate data by reshuffling from our observations.

CL-USER> (defparameter *resampling-data* (concatenate 'simple-vector *100-60k* *1000-128k*))
*RESAMPLING-DATA*
CL-USER> (length *100-60k*)
10000
CL-USER> (length *1000-128k*)
10000

The two observation vectors have the same size, 10000 values; in general, that’s not always the case, and we must make sure to replicate the sample sizes in the simulation. We’ll generate our simulated observations by shuffling the *resampling-data* vector, and splitting it in two subvectors of ten thousand elements.

CL-USER> (let* ((shuffled (alexandria:shuffle *resampling-data*))
                (60k (subseq shuffled 0 10000))
                (128k (subseq shuffled 10000)))
           (- (alexandria:mean 128k) (alexandria:mean 60k)))
6.2584877e-6

We’ll convert that to a truth value by comparing the difference of simulated means with the difference we observed in our real data, \(0.00117\ldots\), and declare success when the simulated difference is at least as large as the actual one. This approach gives us a one-sided test; a two-sided test would compare the absolute values of the differences.
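
For reference, a two-sided success generator would look something like this sketch (same data and split sizes as the one-sided call below):

(lambda (_)
  (declare (ignore _))
  (let* ((shuffled (alexandria:shuffle *resampling-data*))
         (60k (subseq shuffled 0 10000))
         (128k (subseq shuffled 10000)))
    ;; compare magnitudes instead of signed differences.
    (>= (abs (- (alexandria:mean 128k) (alexandria:mean 60k)))
        0.0011706948)))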

CL-USER> (csm:csm-driver 
          (lambda (_)
            (declare (ignore _))
            (let* ((shuffled (alexandria:shuffle *resampling-data*))
                   (60k (subseq shuffled 0 10000))
                   (128k (subseq shuffled 10000)))
              (>= (- (alexandria:mean 128k) (alexandria:mean 60k))
                  0.0011706948)))
          0.005 1d-9 :alpha-hi 0.01 :stream *standard-output*)
         1 0.000e+0 7.761e-11 10.000e-1 -2.967e-1
        10 0.000e+0 0.000e+0 8.709e-1 -9.977e-1
        20 0.000e+0 0.000e+0 6.577e-1 -1.235e+0
        30 0.000e+0 0.000e+0 5.163e-1 -1.360e+0
        40 0.000e+0 0.000e+0 4.226e-1 -1.438e+0
        50 0.000e+0 0.000e+0 3.569e-1 -1.489e+0
        60 0.000e+0 0.000e+0 3.086e-1 -1.523e+0
        70 0.000e+0 0.000e+0 2.718e-1 -1.546e+0
        80 0.000e+0 0.000e+0 2.427e-1 -1.559e+0
        90 0.000e+0 0.000e+0 2.192e-1 -1.566e+0
       100 0.000e+0 0.000e+0 1.998e-1 -1.568e+0
       200 0.000e+0 0.000e+0 1.060e-1 -1.430e+0
       300 0.000e+0 0.000e+0 7.207e-2 -1.169e+0
       400 0.000e+0 0.000e+0 5.460e-2 -8.572e-1
       500 0.000e+0 0.000e+0 4.395e-2 -5.174e-1
       600 0.000e+0 0.000e+0 3.677e-2 -1.600e-1
       700 0.000e+0 0.000e+0 3.161e-2 2.096e-1
       800 0.000e+0 0.000e+0 2.772e-2 5.882e-1
       900 0.000e+0 0.000e+0 2.468e-2 9.736e-1
      1000 0.000e+0 0.000e+0 2.224e-2 1.364e+0
      2000 0.000e+0 0.000e+0 1.119e-2 5.428e+0

NIL
T
0.0d0
2967
0
0.0d0
0.007557510165262294d0

We tried to replicate the difference 2967 times, and did not succeed even once. The CSM stopped us there, and we find a CI for the probability of observing our difference, under the null hypothesis, of [0, 0.007557] (i.e., \(p < 0.01\)). We can also test for a lower \(p\)-value by changing the thresholds and running the simulation more times (around thirty thousand iterations for \(p < 0.001\)).

This experiment lets us conclude that the difference in mean fill rate between 100 bins @ 60K ball/bin and 1000 @ 128K is probably not due to chance: it’s unlikely that we would observe a difference this extreme between data sampled from the same distribution. In other words, “I’m confident that the fill rate for 1000 bins @ 128K ball/bin is greater than for 100 bins @ 60K ball/bin, because it would be highly unlikely to observe a difference in means that extreme if they had the same distribution (\(p < 0.01\))”.

In general, we can use this exact test when we have two sets of observations, \(X\sb{0}\) and \(Y\sb{0}\), and a statistic \(f\sb{0} = f(X\sb{0}, Y\sb{0})\), where \(f\) is a pure function (the extension to three or more sets of observations is straightforward).

The test lets us determine the likelihood of observing \(f(X, Y) \geq f\sb{0}\) (we could also test for \(f(X, Y) \leq f\sb{0}\)), if \(X\) and \(Y\) were taken from similar distributions, modulo simple transformations (e.g., \(X\)’s mean is shifted compared to \(Y\)’s, or the latter’s variance is double the former’s).

We answer that question by repeatedly sampling without replacement from \(X\sb{0} \cup Y\sb{0}\) to generate \(X\sb{i}\) and \(Y\sb{i}\), such that \(|X\sb{i}| = |X\sb{0}|\) and \(|Y\sb{i}| = |Y\sb{0}|\) (e.g., by shuffling a vector and splitting it in two). We can apply any simple transformation here (e.g., increment every value in \(Y\sb{i}\) by \(\Delta\) to shift its mean by \(\Delta\)). Finally, we check if \(f(X\sb{i}, Y\sb{i}) \geq f\sb{0} = f(X\sb{0}, Y\sb{0})\); if so, we return success for this iteration, otherwise failure.
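
In code, one iteration of that resampling loop might look like this sketch (f, x0, y0, and f0 = (funcall f x0 y0) are supplied by the caller, with no transformation applied):

(defun exact-test-iteration (f x0 y0 f0)
  "One resample under the null hypothesis: shuffle the pooled
   observations, split them with the original sample sizes, and check
   whether the statistic is at least as extreme as F0."
  (let* ((pooled (alexandria:shuffle (concatenate 'simple-vector x0 y0)))
         (x (subseq pooled 0 (length x0)))
         (y (subseq pooled (length x0))))
    (>= (funcall f x y) f0)))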

The loop above is a Bernoulli process that generates independent, identically distributed (assuming the random sampling is correct) truth values, and its success rate is equal to the probability of observing a value for \(f\) “as extreme” as \(f\sb{0}\) under the null hypothesis. We use the CSM with false positive rate \(\varepsilon\) to know when to stop generating more values and compute a credible interval for the probability under the null hypothesis. If that probability is low (less than some predetermined threshold, like \(\alpha = 0.001\)), we infer that the null hypothesis does not hold, and declare that the difference in our sample data points at a real difference in distributions. If we do everything correctly (cough), we will have implemented an Atlantic City procedure that fails with probability \(\alpha + \varepsilon\).

Personally, I often just set the threshold and the false positive rate unreasonably low and handwave some Bayes.

That’s all!

I pushed the code above, and much more, to github, in Common Lisp, C, and Python (probably Py3, although 2.7 might work). Hopefully anyone can run with the code and use it to test, not only SLO-type properties, but also answer more general questions, with an exact test. I’d love to have ideas or contributions on the usability front. I have some throw-away code in attic/, which I used to generate the SVG in this post, but it’s not great. I also feel like I can do something to make it easier to stick the logic in shell scripts and continuous testing pipelines.

When I passed around a first draft for this post, many readers that could have used the CSM got stuck on the process of moving from mathematical expressions to computer code; not just how to do it, but, more fundamentally, why we can’t just transliterate Greek to C or CL. I hope this revised post is clearer. Also, I hope it’s clear that the reason I care so much about not introducing false positives via rounding isn’t that I believe they’re likely to make a difference, but simply that I want peace of mind with respect to numerical issues; I really don’t want to be debugging some issue in my tests and have to wonder if it’s all just caused by numerical errors.

The reason I care so much about making sure users can understand what the CSM code does (and why it does what it does) is that I strongly believe we should minimise dependencies whose inner workings we’re unable to (legally) explore. Every abstraction leaks, and leakage is particularly frequent in failure situations. We may not need to understand magic if everything works fine, but everything breaks eventually, and that’s when expertise is most useful. When shit’s on fire, we must be able to break the abstraction and understand how the magic works, and how it fails.

This post only tests ideal SLO-type properties (and regular null hypothesis tests translated to SLO properties), properties of the form “I claim that this indicator satisfies $PREDICATE x% of the time, with false positive rate y%” where the indicator’s values are independent and identically distributed.

The last assumption is rarely truly satisfied in practice. I’ve seen an interesting choice, where the service level objective is defined in terms of a sample of production requests, which can be replayed, shuffled, etc. to ensure i.i.d.-ness. If the nature of the traffic changes abruptly, the SLO may not be representative of behaviour in production; but, then again, how could the service provider have guessed the change was about to happen? I like this approach because it is amenable to predictive statistical analysis, and incentivises communication between service users and providers, rather than users assuming the service will gracefully handle radically new crap being thrown at it.

Even if we have a representative sample of production, it’s not true that the service level indicators for individual requests are distributed identically. There’s an easy fix for the CSM and our credible intervals: generate i.i.d. sets of requests by resampling (e.g., shuffle the requests sample) and count successes and failures for individual requests, but only test for CSM termination after each resampled set.
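
A sketch of that fix, assuming hypothetical resample-request-set and slo-satisfied-p helpers, and the (pseudocode) csm-stop-p predicate from earlier:

(defun csm-over-resampled-sets (resample-request-set slo-satisfied-p
                                threshold eps)
  (let ((successes 0)
        (failures 0))
    (loop ;; count individual requests, but only consult the CSM once
          ;; per resampled (i.i.d.) set.
          do (map nil (lambda (request)
                        (if (funcall slo-satisfied-p request)
                            (incf successes)
                            (incf failures)))
                  (funcall resample-request-set))
          until (csm-stop-p successes failures threshold eps))
    (values successes failures)))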

On a more general note, I see the Binomial and Exact tests as instances of a general pattern that avoids intuitive functional decompositions that create subproblems that are harder to solve than the original problem. For example, instead of trying to directly determine how frequently the SLI satisfies some threshold, it’s natural to first fit a distribution on the SLI, and then compute percentiles on that distribution. Automatically fitting an arbitrary distribution is hard, especially with the weird outliers computer systems spit out. Reducing to a Bernoulli process before applying statistics is much simpler. Similarly, rather than coming up with analytical distributions in the Exact test, we brute-force the problem by resampling from the empirical data. I have more examples from online control systems... I guess the moral is to be wary of decompositions where internal subcomponents generate intermediate values that are richer in information than the final output.

Thank you Jacob, Ruchir, Barkley, and Joonas for all the editing and restructuring comments.


  1. Proportions are unscaled probabilities that don’t have to sum or integrate to 1. Using proportions instead of probabilities tends to make calculations simpler, and we can always get a probability back by rescaling a proportion by the inverse of its integral.

  2. Instead of a \(\mathrm{Beta}(a+1, b+1)\), they tend to bound with a \(\mathrm{Beta}(a, b)\). The difference is marginal for double-digit \(n\).

  3. I used the bisection method instead of more sophisticated ones with better convergence, like Newton’s method or the derivative-free Secant method, because bisection already adds one bit of precision per iteration, only needs a predicate that returns “too high” or “too low,” and is easily tweaked to be conservative when the predicate declines to return an answer.

Eugene ZaikonnikovA tiny Lisp bytecode interpreter in Z-80 assembly

· 134 days ago

It all started with a raid on a long abandoned hosting service. Seen a mention of it in the news, leading to a vague recollection of using it for something. Email address associated with the account was long defunct, and the service itself changed ownership a few times in the past two decades. But incredibly, I could recall login credentials and they worked still.

Amazingly, in a pile of abandoned HTML templates, obsolete software archives and Under Construction GIFs there was a source file for a project I long considered lost. It's a minimal Lisp bytecode interpreter written in assembly for ZX Spectrum along the lines of MIT AIM-514. Save for address locations and maybe a couple ROM calls for error reporting it's generic Z-80 code.

It was part of a bigger project that should have included a primitive REPL, but no trace of that was found. Also, am quite sure there is a heinous bug lurking in the mark&sweep GC. Should really find time to finally debug that!

Paul KhuongAn Old Conjecture on Stream Transducers

· 142 days ago

I’ve been thinking about stream processing again, and came back to an old pick-two-of-three conjecture of mine: for stream processing without dynamic allocation, “arbitrary outputs per input, multiple consumers, multiple producers: choose two.”

The question is interesting because stream processing in constant space is a subset of L (or FL), and thus probably not P-complete, let alone Turing complete. Having easily characterisable subsets of stream processing that can be implemented in constant space would be a boon for the usability of stream DSLs.

I think I find this academic trope as suspicious as @DRMavIver does, so I have mixed feelings about the fact that this one still feels true seven years later.

The main reason I believe in this conjecture is the following example, F(S(X), X), where S is the function that takes a stream and outputs every other value. Or, more formally, \(F\sb{i} = f(X\sb{2i}, X\sb{i})\).

Let’s say X is some stream of values that can’t be easily re-computed (e.g., each output value is the result of a slow computation). How do we then compute F(S(X), X) without either recomputing the stream X, or buffering an unbounded amount of past values from that stream? I don’t see a way to do so, not just in any stream processing DSL (domain specific language), but also in any general purpose language.
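
To make the problem concrete, here’s a sketch of the buffering consumer (next-x is an assumed thunk yielding successive values of X): by the time it emits \(F\sb{i}\), it has had to keep on the order of \(i\) past values of X live.

(defun make-f-stream (next-x f)
  (let ((buffer (make-array 0 :adjustable t :fill-pointer t))
        (i 0))
    (lambda ()
      ;; F_i needs X_2i and X_i, so ensure X_0 .. X_2i are buffered;
      ;; even if we dropped entries below index i, ~i values remain.
      (loop while (<= (length buffer) (* 2 i))
            do (vector-push-extend (funcall next-x) buffer))
      (prog1 (funcall f (aref buffer (* 2 i)) (aref buffer i))
        (incf i)))))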

For me, the essence of the problem is that the two inputs to F are out of sync with respect to the same source of values, X: one consumes two values of X per invocation of F, and the other only one. This issue could also occur if we forced stream transducers (processing nodes) to output a fixed number of values at each invocation: let S repeat each value of X twice, i.e., interleave X with X (\(F\sb{i} = f(X\sb{\lfloor i / 2\rfloor}, X\sb{i})\)).

Forcing each invocation of a transducer to always produce exactly one value is one way to rule out this class of stream processing network. Two other common options are to forbid either forks (everything is single-use or subtrees copied and recomputed for each reuse) or joins (only single-input stream processing nodes).

I don’t think this turtle-and-hare desynchronisation problem is a weakness in stream DSLs; it’s simply a reasonable task that can’t be performed in constant space. Given the existence of such tasks, I’d like to see stream processing DSLs be explicit about the tradeoffs they make to balance performance guarantees, expressiveness, and usability, especially when it comes to the performance model.

Christophe Rhodessbcl method tracing

· 151 days ago

Since approximately forever, sbcl has advertised the possibility of tracing individual methods of a generic function by passing :methods t as an argument to trace. Until recently, tracing methods was only supported using the :encapsulate nil style of tracing, modifying the compiled code for function objects directly.

For a variety of reasons, the alternative :encapsulate t implementation of tracing, effectively wrapping the function with some code to run around it, is more robust. One problem with :encapsulate nil tracing is that if the object being traced is a closure, the modification of the function's code will affect all of the closures, not just any single one - closures are distinct objects with distinct closed-over environments, but they share the same executable code, so modifying one of them modifies all of them. However, the implementation of method tracing I wrote in 2005 - essentially, finding and tracing the method functions and the method fast-functions (on which more later) - was fundamentally incompatible with encapsulation; the method functions are essentially never called by name by CLOS, but by more esoteric means.

What are those esoteric means, I hear you ask?! I'm glad I can hear you. The Metaobject Protocol defines a method calling convention, such that method functions receive two arguments: firstly, the entire argument list, as the method body would expect to handle it; and secondly, the list of sorted applicable next methods, such that the first element is the method which should be invoked if the method uses call-next-method. So a method function conforming to this protocol has to:

  1. destructure its first argument to bind the method parameters to the arguments given;
  2. if call-next-method is used, reconstruct an argument list (in general, because the arguments to the next method need not be the same as the arguments to the existing method) before calling the next method's method-function with the reconstructed argument list and the rest of the next methods.

But! For a given set of actual arguments, for that call, the set of applicable methods is known; the precedence order is known; and, with a bit of bookkeeping in the implementation of defmethod, whether any individual method actually calls call-next-method is known. So it is possible, at the point of calling a generic-function with a set of arguments, to know not only the first applicable method, but in fact all the applicable methods, their ordering, and the combination of those methods that will actually get called (which is determined by whether methods invoke call-next-method and also by the generic function's method combination).

Therefore, a sophisticated (and by "sophisticated" here I mean "written by the wizards at Xerox PARC") implementation of CLOS can compile an effective method for a given call, resolve all the next-method calls, perform some extra optimizations on slot-value and slot accessors, improve the calling convention (we no longer need the list of next methods, but only a single next effective-method, so we can spread the argument list once more), and cache the resulting function for future use. So the one-time cost for each set of applicable methods generates an optimized effective method, making use of fast-method-functions with the improved calling convention.

Here's the trick, then: this effective method is compiled into a chain of method-call and fast-method-call objects, which call their embedded functions. This, then, is ripe for encapsulation; to allow method tracing, all we need to do is arrange at compute-effective-method time that those embedded functions are wrapped in code that performs the tracing, and that any attempt to untrace the generic function (or to modify the tracing parameters) reinitializes the generic function instance, which clears all the effective method caches. And then Hey Bob, Your Uncle's Presto! and everything works.
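
The shape of that encapsulation is simple; a sketch (not SBCL's actual tracing code) might look like:

(defun encapsulate-for-trace (function name)
  "Wrap FUNCTION in tracing code without touching its compiled code;
   closures sharing code with FUNCTION are unaffected."
  (lambda (&rest args)
    (format *trace-output* "~&enter ~S: ~S~%" name args)
    (let ((values (multiple-value-list (apply function args))))
      (format *trace-output* "~&exit ~S: ~S~%" name values)
      (values-list values))))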

(defgeneric foo (x)
  (:method (x) 3))
(defmethod foo :around ((x fixnum))
  (1+ (call-next-method)))
(defmethod foo ((x integer))
  (* 2 (call-next-method)))
(defmethod foo ((x float))
  (* 3 (call-next-method)))
(defmethod foo :before ((x single-float))
  'single)
(defmethod foo :after ((x double-float))
 'double)

Here's a generic function foo with moderately complex methods. How can we work out what is going on? Call the method tracer!

CL-USER> (foo 2.0d0)
  0: (FOO 2.0d0)
    1: ((SB-PCL::COMBINED-METHOD FOO) 2.0d0)
      2: ((METHOD FOO (FLOAT)) 2.0d0)
        3: ((METHOD FOO (T)) 2.0d0)
        3: (METHOD FOO (T)) returned 3
      2: (METHOD FOO (FLOAT)) returned 9
      2: ((METHOD FOO :AFTER (DOUBLE-FLOAT)) 2.0d0)
      2: (METHOD FOO :AFTER (DOUBLE-FLOAT)) returned DOUBLE
    1: (SB-PCL::COMBINED-METHOD FOO) returned 9
  0: FOO returned 9
9

This mostly works. It doesn't quite handle all cases, specifically when the CLOS user adds a method and implements call-next-method for themselves:

(add-method #'foo
            (make-instance 'standard-method
             :qualifiers '()
             :specializers (list (find-class 'fixnum))
             :lambda-list '(x)
             :function (lambda (args nms) (+ 2 (funcall (sb-mop:method-function (first nms)) args (rest nms))))))
CL-USER> (foo 3)
  0: (FOO 3)
    1: ((METHOD FOO :AROUND (FIXNUM)) 3)
      2: ((METHOD FOO (FIXNUM)) 3)
      2: (METHOD FOO (FIXNUM)) returned 8
    1: (METHOD FOO :AROUND (FIXNUM)) returned 9
  0: FOO returned 9
9

In this trace, we have lost the method trace from the direct call to the method-function, and calls that that function makes; this is the cost of performing the trace in the effective method, though a mitigating factor is that we have visibility of method combination (through the (sb-pcl::combined-method foo) line in the trace above). It would probably be possible to do the encapsulation in the method object itself, by modifying the function and the fast-function, but this requires rather more book-keeping and (at least theoretically) breaks the object identity: we do not have licence to modify the function stored in a method object. So, for now, sbcl has this imperfect solution for users to try (expected to be in sbcl-1.4.9, probably released towards the end of June).

(I can't really believe it's taken me twelve years to do this. Other implementations have had this working for years. Sorry!)

Quicklisp newsNo May 2018 Quicklisp dist update

· 165 days ago
The computer on which I make Quicklisp builds stopped working a little while ago, and I haven't had time to dive in and work on it. As soon as it's fixed, I'll prepare and release a new dist. Sorry about the inconvenience!

Christophe Rhodessbcl method-combination fixes

· 172 days ago

At the 2018 European Lisp Symposium, the most obviously actionable feedback for SBCL from a presentation was from Didier's remorseless deconstruction of SBCL's support for method combinations (along with the lack of explicitness about behavioural details in the ANSI CL specification and the Art of the Metaobject Protocol). I don't think that Didier meant to imply that SBCL was particularly bad at method combinations, compared with other available implementations - merely that SBCL was a convenient target. And, to be fair, there was a bug report from a discussion with Bruno Haible back in SBCL's history - May/June 2004, according to my search - which had languished largely unfixed for fourteen years.

I said that I found the Symposium energising. And what better use to put that energy than addressing user feedback? So, I spent a bit of time earlier this month thinking, fixing and attempting to work out what behaviours might actually be useful. To be clear, SBCL's support for define-method-combination was (probably) standards-compliant in the usual case, but if you follow the links from above, or listen to Didier's talk, you will be aware that that's not saying all that much, in that almost nothing is specified about behaviours under redefinition of method combinations.

So, to start with, I solved the cache invalidation (one of the hardest problems in Computer Science), making sure that discriminating functions and effective methods are reset and invalidated for all affected generic functions. This was slightly complicated by the strategy that SBCL has of distinguishing short and long method-combinations with distinct classes (and distinct implementation strategies for compute-effective-method); but this just needed to be methodical and careful. Famous last words: I think that all method-combination behaviour in SBCL is now coherent and should meet user expectations.

More interesting, I think, was coming up with test cases for desired behaviours. Method combinations are not, I think, widely used in practice; whether that is because of lack of support, lack of understanding or lack of need of what they provide, I don't know. (In fact, in conversations at ELS we discussed another possibility, which is that everyone is more comfortable customising compute-effective-method instead - both that and define-method-combination provide ways of inserting arbitrary code for the effect of a generic function call with particular arguments.) But what this means is that there isn't, as far as I know at least, a large corpus of interesting method combinations to play with.

One interesting one which came up: Bike on #lisp designed an implementation of finite state machines using method combinations, which I adapted to add to SBCL's test suite. My version looks like:

;; Each primary method's qualifier names a state; a method's return value
;; names the next state to transfer to, or is the machine's final result.
(define-method-combination fsm (default-start)
    ((primary *))
    (:arguments &key start)
  `(let ((state (or ,start ',default-start)))
     (restart-bind
         (,@(mapcar (lambda (m) `(,(first (method-qualifiers m))
                                  (lambda ()
                                    (setq state (call-method ,m))
                                    (if (and (typep state '(and symbol (not null)))
                                             (find-restart state))
                                        (invoke-restart state)
                                        state))))
                    primary))
       (invoke-restart state))))

and there will be more on this use of restart-bind in a later post, I hope. Staying on the topic of method combinations, how might one use this fsm method combination? A simple example might be to recognize strings with an even number of #\a characters:

;;; first, define something to help with all string parsing
(defclass info ()
  ((string :initarg :string)
   (index :initform 0)))
;;; then the state machine itself
(defgeneric even-as (info &key &allow-other-keys)
  (:method-combination fsm :yes))
(defmethod even-as :yes (info &key)
  (with-slots ((s string) (i index)) info
    (cond ((= i (length s)) t)
          ((char= (char s i) #\a) (incf i) :no)
          (t (incf i) :yes))))
(defmethod even-as :no (info &key)
  (with-slots ((s string) (i index)) info
    (cond ((= i (length s)) nil)
          ((char= (char s i) #\a) (incf i) :yes)
          (t (incf i) :no))))
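
With those definitions in place, a couple of calls show the machine at work; "abab" contains an even number of #\a characters, "abc" an odd number, and the :start keyword (routed in via the :arguments option) overrides the default initial state:

(even-as (make-instance 'info :string "abab"))            ; => T
(even-as (make-instance 'info :string "abc"))             ; => NIL
(even-as (make-instance 'info :string "abc") :start :no)  ; => T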

(Exercise for the reader: adapt this to implement a Turing Machine)

Another example of (I think) an interesting method combination was one which I came up with in the context of generalized specializers, for an ELS a while ago: the HTTP Request method-combination to be used with HTTP Accept specializers. I'm interested in more! A github search found some examples before I ran out of patience; do you have any examples?

And I have one further question. The method combination takes arguments at generic-function definition time (the :yes in (:method-combination fsm :yes)). Normally, arguments to things are evaluated at the appropriate time. At the moment, SBCL (and indeed every other implementation I tested, but that's not strong evidence given the shared heritage) does not evaluate the arguments to :method-combination - treating it more like a macro call than a function call. I'm not sure that is the most helpful behaviour, but I'm struggling to come up with an example where the other is definitely better. Maybe something like

(let ((lock (make-lock)))
  (defgeneric foo (x)
    (:method-combination locked lock)
    (:method (x) ...)))

which would allow automatic locking around the effective method of FOO through the method combination? I need to think some more here.
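
For concreteness, a sketch of what such a locked method combination might look like, using bordeaux-threads and assuming, for the sake of argument, that the lock argument arrived evaluated:

;; A sketch of a LOCKED method combination, assuming bordeaux-threads and
;; evaluated combination arguments. Under today's semantics LOCK is the
;; unevaluated form from the DEFGENERIC, so the LET-bound lock would not
;; be accessible here - which is exactly the question at issue.
(define-method-combination locked (lock)
  ((primary () :required t))
  `(bt:with-lock-held (,lock)
     (call-method ,(first primary) ,(rest primary))))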

In any case: the method-combination fixes are in the current SBCL master branch, shortly to be released as sbcl-1.4.8. And there is still time (though not very much!) to apply for the many jobs advertised at Goldsmiths Computing - what better things to do on a Bank Holiday weekend?

Marco AntoniottiSome updates: bugs fixing and CLAD.

· 180 days ago
Hello there,

it has been a very long time since I posted here. Most recently, thanks to a couple of pesky bug reports on HEΛP from Mirko Vukovic, and because I had a couple of days relatively free, I was able to get back to some programming, fix (some of) the bugs, and post here.

Here is the story.  There were two bugs which I had to deal with (there are more, of course).
  1. A bug triggered by CCL and its implementation of the Common Lisp Reader algorithm.
  2. A buglet due to missing supporting data (.css files in this case) in the deployment of the HEΛP generated documentation.
The first bug was quite difficult to track down, and it boiled down to CCL bailing out on READ in an unexpected way (that is, with respect to other implementations).  As an aside, this is a problem of the standard not having a predefined condition for "error caused by the reader because it does not find a package"; LW has CONDITIONS:PACKAGE-NOT-FOUND-READER, but that is not standard, and some implementations just signal READER-ERROR or PACKAGE-ERROR.  The error was easy to "fix" once diagnosed: just don't process files that you know will be problematic, and HEΛP can already "exclude" such files.
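
The portable part of such a defensive read looks something like this sketch (try-read is a hypothetical name; the precise condition signalled for a missing package is implementation-dependent):

;; Hypothetical sketch: without a standard "package not found" condition,
;; a portable reader loop can only guard against the standard supertypes
;; READER-ERROR and PACKAGE-ERROR and skip the offending form.
(defun try-read (stream)
  (handler-case (read stream nil :eof)
    ((or reader-error package-error) ()
      :skipped-unreadable-form)))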

The second bug was easier to diagnose, but the fix was more complicated (especially due to the NIH syndrome I suffer from).  The problem is that ASDF moves the compiled code around, but not auxiliary data - in my case, .css files.  I could have followed what ASDF does, but I decided to go another way and came up with a small library I called Common Lisp Application Data (CLAD, because you need to "dress" your code).

CLAD

By now, at least on Windows 10 and Mac OS X (and Linux), there is a notion of an Application Data Folder. The user version of this folder (as opposed to the system one) is ~/Library/ on Mac OS X and %USERPROFILE%\AppData\Roaming\ on Windows 10 (this is the "roaming" profile in W10 parlance).  For Linux there are several semi-standards, one of them being the XDG base directory specification; in this case a common solution is to use the ~/.config/ folder.

CLAD assumes these "fixed" locations to create per-application or per-library subfolders of a "Common Lisp" one.  That is, CLAD ensures the presence of the following folders in your account.
  • Mac OS X: ~/Library/Common Lisp/
  • Windows 10: %USERPROFILE%\AppData\Roaming\Common Lisp\
  • Linux/Unix: ~/.config/common-lisp/ (in this case, I am following ASDF's lead)
The library exports three simple functions,
  1. user-cl-data-folder, which returns the pathnames of the folders above.
  2. ensure-app-or-library-data-folder, which, as the name implies, ensures that a subfolder exists in the proper location.
  3. app-or-library-data-folder, which returns the pathname associated to a library or app.
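
A sketch of how user-cl-data-folder might be implemented (hypothetical; CLAD's actual code may well differ, and feature names such as darwin and win32 vary across implementations):

;; Hypothetical sketch of USER-CL-DATA-FOLDER; CLAD's actual code may
;; differ. Picks the per-platform data directory under the user's home.
(defun user-cl-data-folder ()
  (merge-pathnames #+darwin "Library/Common Lisp/"
                   #+win32 "AppData/Roaming/Common Lisp/"
                   #-(or darwin win32) ".config/common-lisp/"
                   (user-homedir-pathname)))
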
A library or app can now set itself up by doing something like

(defparameter *helambdap-data-folder*
  (clad:ensure-app-or-library-data-folder "HELambdaP")
  "The user HELambdaP data folder.")

On Mac OS X, this results in the folder ~/Library/Common Lisp/HELambdaP; a library or an application can now rely on a clear space in which to store "common" data files.  For HEΛP, it solved the problem of finding the .css files in a reliable place.

Trivial? Yes.
NIH syndrome? Of course.
Complete? No.
Useful? You be the judge of that.


(cheers)

LispjobsLisp Developer, 3E, Brussels, Belgium

· 182 days ago

See: http://3eeu.talentfinder.be/en/vacature/30101/lisp-developer

You join a team of developers, scientists, engineers and business developers that develop, operate and commercialize SynaptiQ worldwide.

You work in a Linux-based Java, Clojure and Common Lisp environment. Your focus is on the development, maintenance, design and unit testing of SynaptiQ's real-time aggregation and alerting engine that processes time-series and events. This data engine is Common Lisp based.

The objective is to own the entire lifecycle of the platform, that is, from the architecture and development of new features to the deployment and operation of the platform in a production environment. The position is open to candidates with no knowledge of LISP if they have a good affinity for, and experience with, functional languages.

Christophe Rhodesalgorithms and data structures term2

· 189 days ago

I presented some of the work on teaching algorithms and data structures at the 2018 European Lisp Symposium.

Given that I wanted to go to the symposium (and I'm glad I did!), the most economical method for going was if I presented research work - because then there was a reasonable chance that my employer would fund the expenses (spoiler: they did; thank you!). It might perhaps be surprising to hear that they don't usually directly fund attending events where one is not presenting; on the other hand, it's perhaps reasonable on the basis that part of an academic's job as a scholar and researcher is to be creating and disseminating new knowledge, and of course universities, like any organizations, need to prioritise spending money on things which bring value or further the organization's mission.

In any case, I found that I wanted to write about the teaching work that I have been doing, and in particular I chose to write about a small, Lisp-related aspect. Specifically, it is now fairly normal in technical subjects to perform a lot of automated testing of students; it relieves staff of the burden of assessing things which can be mechanically assessed, and of delivering to individual students the feedback that can be generated automatically; this frees up staff time to perform targeted interventions, give better feedback on more qualitative aspects of the curriculum, or work fewer weekends of the year. A large part of my teaching work for the last 18 months has been developing material for these automated tests, and working on the infrastructure underlying them, for my and colleagues' teaching.

So, the more that we can test automatically and meaningfully, the more time we have to spend on other things. The main novelty here, and the lisp-related hook for the paper I submitted to ELS, was being able to give meaningful feedback on numerical answer questions which probed whether students were developing a good mental model of the meaning of pseudocode. That's a bit vague; let's be specific and consider the break and continue keywords:

x ← 0
for 0 ≤ i < 9
  x ← x + i
  if x > 17
    continue
  end if
  x ← x + 1
end for
return x

The above pseudocode is typical of what a student might see; the question would be "what does the above block of pseudocode return?", which is mildly arithmetically challenging, particularly under time pressure, but the conceptual aspect being tested here was whether the student understood the effect of continue. Therefore, it is important to give the student specific feedback; the more specific, the better. So if a student answered 20 to this question (as if the continue acted as a break), they would receive a specific feedback message reminding them about the difference between the two operators; if they answered 45, they would receive a message reminding them that continue has a particular meaning in loops; and any other answer received generic feedback.

Having just one of these questions does no good, though. Students will go to almost any lengths to avoid learning things, and it is easy to communicate answers to multiple-choice and short-answer questions among a cohort. So, I needed hundreds of these questions: at least one per student, but in fact by design the students could take these multiple-choice quizzes multiple times, as they are primarily an aid for the students themselves, to help them discover what they know.

Now of course I could treat the above pseudocode fragment as a template, parameterise it (initial value, loop bounds, increment) and compute the values needing the specific feedback in terms of the values of the parameters. But this generalizes badly: what happens when I decide that I want to vary the operators (say to introduce multiplication) or modify the structure somewhat (e.g. by swapping the two increments before and after the continue)? The parametrization gets more and more complicated, the chances of (my) error increase, and perhaps most importantly it's not any fun.

Instead, what did I do? With some sense of grim inevitability, I evolved (or maybe accreted) an interpreter (in emacs lisp) for a sexp-based representation of this pseudocode. At the start of the year, it's pretty simple; towards the end it has developed into an almost reasonable mini-language. Writing the interpreter is straightforward, though the way it evolved into one gigantic case statement for supported operators rather than having reasonable semantics is a bit of a shame; as a bonus, implementing a pretty-printer for the sexp-based pseudocode, with correct indentation and keyword highlighting, is equally straightforward. Then, armed with the pseudocode I will ask the students to interpret, I can mutate it in the ways I anticipate students might misread it (replacing continue with break or progn) and interpret that form to see which wrong answer should generate what feedback.
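
The flavour is captured by the toy evaluator below - a minimal sketch in Common Lisp rather than the emacs lisp of the original, with a hypothetical pseudo-run entry point and only the handful of operators the example above needs:

;; A minimal sketch, not the original emacs lisp: an evaluator for a
;; sexp-based pseudocode supporting SET, FOR, IF, CONTINUE and RETURN.
(defun pseudo-run (program)
  (let ((env (make-hash-table)))
    (labels ((ev (form)
               (cond ((numberp form) form)
                     ((symbolp form) (gethash form env))
                     (t (ecase (first form)
                          (set (setf (gethash (second form) env)
                                     (ev (third form))))
                          (+ (reduce #'+ (mapcar #'ev (rest form))))
                          (> (> (ev (second form)) (ev (third form))))
                          (if (when (ev (second form))
                                (mapc #'ev (cddr form))))
                          (continue (throw 'continue nil))
                          (for (destructuring-bind (var lo hi &rest body)
                                   (rest form)
                                 (loop for i from (ev lo) below (ev hi)
                                       do (setf (gethash var env) i)
                                          (catch 'continue (mapc #'ev body)))))
                          (return (throw 'return (ev (second form)))))))))
      (catch 'return (mapc #'ev program) nil))))

;; The fragment from the text as a sexp; mutating CONTINUE into a break or
;; a no-op yields 20 and 45, the wrong answers deserving specific feedback.
(pseudo-run '((set x 0)
              (for i 0 9
                (set x (+ x i))
                (if (> x 17) (continue))
                (set x (+ x 1)))
              (return x)))
;; => 41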

Anyway, that was the hook. There's some evidence in the paper that the general approach of repeated micro-assessment, and also the consideration of likely student mistakes and giving specific feedback, actually works. And now that the (provisional) results are in, how does this term compare with last term? We can look at the relationship between this term's marks and last term's. What should we be looking for? Generally, I would expect marks in the second term's coursework to be broadly similar to the marks in the first term - all else being equal, students who put in a lot of effort and are confident with the material in term 1 are likely to have an easier time integrating the slightly more advanced material in term 2. That's not a deterministic rule, though; some students will have been given a wake-up call by the term 1 marks, and equally some students might decide to coast.

[Figure: plot of term 2 marks against term 1 marks, with fitted regression line: a = 0.82, R² = 0.67]

I've asked R to draw the regression line in the above picture; a straight line fit seems reasonable based on the plot. What are the statistics of that line?

R> summary(lm(Term2~Term1, data=d))

Call:
lm(formula = Term2 ~ Term1, data = d)

Residuals:
        Min      1Q  Median      3Q     Max 
    -41.752  -6.032   1.138   6.107  31.155 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  3.18414    4.09773   0.777    0.439
Term1        0.82056    0.05485  14.961   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 10.46 on 107 degrees of freedom
  (32 observations deleted due to missingness)
Multiple R-squared:  0.6766,    Adjusted R-squared:  0.6736 
F-statistic: 223.8 on 1 and 107 DF,  p-value: < 2.2e-16

Looking at the summary above, we have a strong positive relationship between term 1 and term 2 marks. The intercept is approximately zero (if you got no marks in term 1, you should expect no marks in term 2), and the slope is less than one: on average, each mark a student got in term 1 tended to convert to 0.8 marks in term 2 - this is plausibly explained by the material being slightly harder in term 2, and by the fact that some of the assessments were more explicitly designed to allow finer discrimination at the top end - marks in the 90s. (A note for international readers: in the UK system, the pass mark is 40%, excellent work is typically awarded a mark in the 70% range - marks of 90% should be reserved for exceptional work). The average case is, however, only that: there was significant variation from that average line, and indeed (looking at the quartiles) over 50% of the cohort was more than half a degree class (5 percentage points) away from their term 2 mark as "predicted" from their mark for term 1.

All of this seems reasonable, and it was a privilege to work with this cohort of students, and to present the sum of their interactions on this course to the audience I had. I got the largest round of applause, I think, for revealing that as part of the peer-assessment I had required that students run each others' code. I also had to present some of the context for the work; not only because this was an international gathering, with people in various university systems and from industry, but also because of the large-scale disruption caused by industrial action over the Universities Superannuation Scheme (the collective, defined-benefit pension fund for academics at about 68 Universities and ~300 other bodies associated with Higher Education). Perhaps most gratifyingly, judging by their performance on the various assessments so far, students were able to continue learning despite being deprived of their tuition for three consecutive weeks.

And now? The students will sit an exam, after which I and colleagues will look in detail at those results and the relationship with the students' coursework marks (as I did last year). I will continue developing this material (my board for this module currently lists 33 todo items), and adapt it for next year and for new cohorts. And maybe you will join me? The Computing department at Goldsmiths is hiring lecturers and senior lecturers to come and participate in research, scholarship and teaching in computing: a lecturer in creative computing, a lecturer in computer games, a lecturer in data science, a lecturer in physical and creative computing, a lecturer in computer science and a senior lecturer in computer science. Anyone reading this is welcome to contact me to find out more!

Victor Anyakin“A new way of blogging about Common Lisp” by Yehonathan Sharvit

· 190 days ago

Seen this post mentioned several times on Twitter, but not on Planet Lisp yet. So, here it is:

A new way of blogging about Common Lisp by Yehonathan Sharvit (@viebel).

Christophe Rhodesels2018 reflections

· 193 days ago

A few weeks ago, I attended the 2018 European Lisp Symposium.

If you were at the 2017 iteration, you might (as I do) have been left with an abiding image of greyness. That was not at all to fault the location (Brussels) or the organization (co-located with <Programming>) of the symposium; however, the weather was unkind, and that didn't help with my own personal low-energy mood at the time.

This year, the event was organized by Ravenpack in Marbella. And the first thing that I noticed on landing was the warmth. (Perhaps the only "perk" of low-cost airlines is that it is most likely that one will disembark on to terra firma rather than a passenger boarding bridge.) And after quite a miserable winter in the UK, the warmth and the sunshine at the bus stop, while waiting for the bus from Málaga airport to Marbella, was very welcome indeed. The sense of community was strong too; while waiting for the bus, I hopped on to #lisp IRC and Nic Hafner volunteered to meet me at the Marbella bus station and walk with me to my hotel; we ended up going for drinks at a beachside bar that evening with Dimitri Fontaine, Christian Schafmeister, Mikhail Raskin and others - and while we were sitting there, enjoying the setting, I saw other faces I recognised walking past along the beach promenade.

The setting for the conference itself, the Centro Cultural Cortijo de Miraflores, was charming. We had the use of a room, just about big enough for the 90ish delegates; the centre is the seat of the Marbella Historical Archives, and our room was next to the olive oil exhibition.

[Photo: 2018 European Lisp Symposium opening]

We also had an inside courtyard for the breaks; sun, shade, water and coffee were available in equal measure, and there was space for good conversations - I spoke with many people, and it was great to catch up and reminisce with old friends, and to discuss the finer points of language implementation and release management with new ones. (I continue to maintain that the SBCL time-boxed monthly release cadence of master, initiated by Bill Newman way back in the SBCL 0.7 days in 2002 [!], has two distinct advantages, for our situation, compared with other possible choices, and I said so more than once over coffee.)

The formal programme, curated by Dave Cooper, was great too. Zach's written about his highlights; I enjoyed hearing Jim Newton's latest on being smarter about typecase, and Robert Smith's presentation about quantum computing, including a walkthrough of a quantum computing simulator. As well as quantum computing, application areas mentioned in talks this year included computational chemistry, data parallel processing, pedagogy, fluid dynamics, architecture, sentiment analysis and enterprise resource planning - and there were some implementation talks in there too, not least from R. "it's part of the brand" Matthew Emerson, who gave a paean to Clozure Common Lisp. Highly non-scientific SBCL user feedback from ELS presenters: most of the speakers with CL-related talks or applications at least mentioned SBCL; most of those mentions by speaker were complimentary, but it's possible that most by raw count referred to bugs - solely due to Didier Verna's presentation about method combinations (more on this in a future blog post).

Overall, I found this year's event energising. I couldn't be there any earlier than the Sunday evening, nor could I stay beyond Wednesday morning, so I missed at least the organized social event, but the two days I had were full and stress-free (even when chairing sessions and for my own talk). The local organization was excellent; Andrew Lawson, Nick Levine and our hosts at the foundation did us proud; we had a great conference dinner, and the general location was marvellous.

Next year's event will once again be co-located with <Programming>, this time in Genova (sunny!) on 1st-2nd April 2019. Put it in your calendar now!

Didier VernaLisp, Jazz, Aikido, 10 years later

· 196 days ago

10 years ago, I published a short blog post entitled "Lisp, Jazz, Aikido", barely scratching the surface of what I found to be commonalities between the three disciplines. At the time, I had the intuition that those ideas were the tip of a potentially big iceberg, and I ended the post with the following sentence: "I'd like to write a proper essay about these things when I find the time... someday."

Well, 10 years later, I did. The essay, which is 50 pages long, has been published in the Art, Science, and Engineering of Programming Journal, and actually received the Reviewers' Choice Award 2018. I'm not the bragging type, far from it, but I had to mention this because this essay is so personal, and I invested so much in its preparation (more than 300 hours) that I am as deeply touched by the award as I would have been hurt, had it been negatively received...

The live presentation has unfortunately not been recorded, but I took the time to make a screencast afterwards, which is now available on YouTube. Just like the essay, this presentation is not in the typical setting that you'd expect at a scientific conference...

If you've got an artistic fiber, if you're sensitive to the aesthetic dimension in what you do, you may enjoy this work...

Zach BeaneThoughts on ELS 2018

· 197 days ago

Matt Emerson opened the conference with the keynote talk on Clozure CL. He also touched on using and celebrating CL. It provided a jolt of enthusiasm and positivity to kick off a fun conference. And it made me want to use Clozure CL more often, and to learn how to effectively hack on and with it.

Nicolas Hafner's talk on shaders was interesting, but partway through the talk he revealed that the slideshow system itself was an interactive Common Lisp program. Along with standard slideshow behavior, it displayed and animated the classic teapot, with interactive recompilation and redisplay of the shaders.

Ethan Schwartz's talk on Google's tweaks to SBCL's GC was a reminder that you don't have to settle for the system-provided customization knobs. When source is available and you're willing to customize very specifically for your application's workload, you can get nice improvements.

(Also his passing mention of Docker inspired me to work on a Docker image with all the Quicklisp-required foreign libraries baked into it. I think it will be helpful to me for easier testing, and maybe distributed builds. It should be helpful to others for testing everything in Quicklisp without the pain of gathering all the build requirements for its many projects.)

You should check out Douglas Katzman's lightning talk, "Self-modifying code (for fun and profit)". He's added a neat thing in SBCL where the system can be disassembled completely and reassembled into a new binary with various bits of optimization or instrumentation. 

There were many other great talks and discussions!

It's rejuvenating to hear about all the cool things people are working on. There's stuff people are making that you might make use of in your own projects, or inspiration to hack your own things to share with others.

Also: great weather, great city, fun excursion to Ronda, seeing old friends from past conferences, meeting new friends, putting online names to real-life faces, hearing directly from people who use Quicklisp. And Dimitri Fontaine told me about tic80, which my kid and I used to make a retro game in just a few days.

Next ELS conference is in Genova, Italy on April 1, 2019. Start planning today, and I'll see you there!

Quicklisp newsQuicklisp dist update for April, 2018 now available

· 197 days ago
New projects:
  • bst — Binary search tree — GPL-3
  • cl-colors2 — Simple color library for Common Lisp — Boost Software License - Version 1.0
  • cl-env — Easily parse .env files. That's it! — MIT
  • cl-gopher — Gopher protocol library — BSD 2-Clause
  • clunit2 — CLUnit is a Common Lisp unit testing framework. — BSD
  • dml — Diagram Make Language for common lisp — MIT License
  • external-symbol-not-found — Portability library for detecting reader ~ errors coming from reading non-existing or non-external symbols in packages — Unlicense
  • golden-utils — Auxiliary utilities (AU). — MIT
  • linedit — Readline-style library. — MIT
  • lisp-interface-library — Long name alias for lil — MIT
  • ppath — A Common Lisp path handling library based on Python's os.path module — BSD
  • practical-cl — All code from Practical Common Lisp. — BSD
  • quux-hunchentoot — Thread pooling for hunchentoot — MIT
  • s-dot2 — Render Graphviz graphs from within Lisp — BSD-style
Updated projects: 3d-matrices, 3d-vectors, agnostic-lizard, architecture.builder-protocol, cepl, cl-ana, cl-bootstrap, cl-conllu, cl-cpus, cl-dbi, cl-factoring, cl-fad, cl-feedparser, cl-fixtures, cl-flow, cl-fluent-logger, cl-fond, cl-gbm, cl-glfw3, cl-graylog, cl-mlep, cl-notebook, cl-pack, cl-python, cl-random-forest, cl-readline, cl-sdl2, cl-str, cl-yesql, claw, clem, closer-mop, clweb, clx, croatoan, defenum, djula, documentation-utils-extensions, dufy, dyna, esrap, fare-memoization, femlisp, gamebox-dgen, gamebox-frame-manager, gamebox-math, glsl-spec, glsl-toolkit, house, hu.dwim.delico, inkwell, ironclad, jonathan, jose, lack, lisp-chat, listopia, local-time-duration, maiden, matlisp, mcclim, mito, nibbles, nineveh, oclcl, overlord, oxenfurt, parser.common-rules, parsley, perlre, pngload, postmodern, qlot, rove, rtg-math, serapeum, shadow, staple, stumpwm, universal-config, unix-opts, utm, varjo, xmls.

Removed projects: cl-openid, cl-sendmail, xmls-tools.

The removed projects depend on XMLS, which was recently updated with a slightly different API, and they were never updated to match.

To get this update, use (ql:update-dist "quicklisp").

Enjoy!


For older items, see the Planet Lisp Archives.


Last updated: 2018-11-02 15:00