Planet Lisp

Patrick SteinAlternatives (v0.2.20141117) Released

· 6 days ago

I have uploaded a new version of my Alternatives library. In addition to the ALTERNATIVES macro, there is an ALTERNATIVES* macro which allows one to specify a name for the set of choices. Then, one can check the DOCUMENTATION to see which alternative was last macroexpanded.

(defun random-letter ()
  (alt:alternatives* random-letter-algorithm
     "Always pick the letter A."

      "Choose any letter with equal probability"
      (random:random-elt "ABCDEFGHIJKLMNOPQRSTUVWXYZ"))))

(documentation 'random-letter-algorithm 'alt:alternatives)
     Always pick the letter A."

Nicolas HafnerRetrospective 731520 Minutes - Confession 39

· 7 days ago

header It's apparently been just 508 days since I first joined github. In that time I've written a lot of Common Lisp code and apparently made around 4000-5000 commits. I now want to make a retrospective and go over all the projects I've started. I'll omit some of the smaller, uninteresting ones though.

The projects are very roughly in the order I remember creating them. I can't recall exactly what it was, so things might be all over the place, but it matters not. An approximate order like this is sufficient.


This was my first big CL project that I started as I was investigating tools for radiance. Radiance already began conceptually before this, but I didn't write significant enough code for it to count. lQuery tries to bring the very convenient jQuery syntax for manipulating the DOM to CL. I did this because I knew jQuery and I did not find the alternatives very appealing. Initially it was supposed to help with templating for the most part, but it turned out to be more useful for other tasks in the end.

The first version of lQuery was written in a hotel room in Japan during my one-week holiday there. Time well spent! Don't worry though, I got out often enough as well.


lQuery was also my first library to be published and distributed via Quicklisp, so I needed it to have easy to read documentation. Docstrings are great, but I wanted that information to be on the documentation page as well, so I looked for libraries that allowed me to bundle that somehow. Given that I couldn't find anything I liked, I quickly wrote up my own thing that used lQuery to generate the page. It was a matter of some hours and made me very pleased at the time.


Radiance is sort of the reason I really got into CL to begin with. The previous versions of the TyNET framework that my websites run on were written in PHP and I got really sick of the sources, so I never really went to fix bugs or make necessary improvements. Things worked, but they didn't work great.

As I picked up CL I had to look for a project to really get used to the language and rewriting my framework seemed like the first, obvious step. I wanted Radiance to become a mature, stable and usable system that other people could profit from as well. So, unlike in previous attempts I tried to take good care to do things right, even if my understanding of the language at that point was questionable at best.

One and a half years and almost a complete re-write (again) later, I still don't regret choosing this as my major project, as I'm now fairly confident that it will become something that people can use in the future. It's not quite there yet, but well on its way.


I dislike breakpoints and love good logging, so the next step that Radiance demanded was a good logging solution. I first tried my hands on log4cl, but didn't quite like it, mostly for a lack of being able to figure out how to make it work the way I wanted. So, rolling my own it was. I wanted something very flexible, so I though up a pipeline system for log message processing and distribution.

That was this library; a very small thing that allowed you to create (albeit in a cumbersome fashion) pipelines that could be used to process and distribute arbitrary messages.


From there on out I went to write the actual logger mechanisms, including threading support. Verbose was the result, and I still use and like it today.


For a while then I was occupied with the task of writing a bot for the Encyclopedia Dramatica wiki that should handle new registrations and bannings by adding templates to the user pages. In order to make this possible I checked out a few IRC libraries and wrote a crude thing that would sit on a channel and accept simple commands.

In order for it to actually do its thing though, I had to interact with the mediawiki API, so I wrote a tiny wrapper library around some of the calls that I needed. I never put this on Quicklisp because it was never fleshed-out enough to be there and it still isn't. Maybe some day I'll revise this to be a properly usable thing.


After I finished the bot I wanted to extend it to be able to interact with the forums of ED, which ran on XenForo. Unfortunately that forum offered absolutely zero APIs to access. There was a plugin, but I couldn't get the admins to install it as the forum was apparently so twisted that doing anything could make it crash and burn. Oh well.

So, I set out the classic way of parsing webpage content. Thanks to lQuery this was not that huge of a pain in the butt, but it still took a lot of fiddling to get things to run. This library too is not on QL as it is a big hack and far from complete as well.


At this point I'm really unsure about the order of the projects. Either way, the little bot project I made for ED was a mess and I wanted a proper bot framework to replace my previous bot, Kizai. As I wasn't impressed by the available IRC libraries either, I wrote Colleen from scratch.

Colleen is still being worked on every now and again today, but (with some bigger and smaller rewrites along the way) it has proven to be a very good framework that I am very glad I took the time to write.


In order to test out Radiance and because I was sick of pastebin as a paste service, I set out to write my own. This, too, has proven to be a good investment of my time as I still use plaster as my primary pasting service today. There's a few things I'd like to improve about it whenever I do get the time to, but for the most part it just works.


At some point I noticed that I'd like to have twitter interaction for some of my web-services, so I looked around for API interfaces for that. However there wasn't anything that really worked well. So, once more I went to write something that fit my needs.

This was my first really frustrating project to get going, mostly because figuring out how oAuth is supposed to work is a huge pain. Web-APIs are some of the worst things to interact with, as often enough there is absolutely no way to figure out what exactly went wrong, so you're left stumbling in the dark until you find something that works.

Even though I haven't really used Chirp much myself, it seems to have been of use to a couple of people at least, if Github stars are anything to go by.


Since oAuth is a repeating pattern on the web and it was sufficiently painful to figure out for Chirp, I segregated that part out into its own library. I'm not sure if anyone aside from myself has used South for anything though.


During one of my rewriting iterations of Colleen I noticed that a very common pattern was to save and load some kind of storage. Moving that pattern out into the framework and thus automating configuration and storage seemed like a good idea. However, since Colleen was also an end-user application, I needed to make sure that the configuration could be saved in a format that the user wanted, rather than simple sexprs.

And that's what Universal-Config is supposed to do: Generalise the access of configuration as well as the storage. It works really well on the code side; accessing parts and changing the config is very simple and convenient. It only works so-so on the configuration storage side of things though, as I needed to strike some gross compromises in the serialisation of the objects to ensure compatibility between formats.

Maybe some day I'll figure out a smarter solution to the problems UC has.


Deferred was an attempt at providing mechanisms for optional features of your code. Meaning that your could would work depending on what kind of libraries are loaded at the time. Therefore I could for example provide a local server based authentication with South without explicitly requiring Hunchentoot or some other webserver. Deferred is more a proof-of-concept than anything though, as I haven't actually utilised it in any of my projects.

However, the problem is an interesting one and whenever I do return to it, I want to try to tackle it from a different angle (extending ASDF to allow something like optional dependencies and conditional components).


The first version of lQuery used Closure-HTML, CXML, and css-selectors to do most of the work. However, CHTML and CXML suffered from big problems: CXML would not parse regular HTML (of course) and CHTML would not parse HTML5 as it required a strict DTD to conform to. Also, css-selectors' performance wasn't the greatest either.

So, in order to clean up all these issues I set out to write my own HT/X/ML parser that should both be fast and lenient towards butchered documents. Well, fast it is, and lenient it is as well. Plump is probably so far my best project in my opinion, as its code is straight-forward, extensible, and just does its job very well.


The next step was to build a CSS-selectors DOM search engine on top of Plump. This turned out to be quite simple, as I could re-use the tools from Plump to parse the selectors and searching the DOM efficiently was not that big of a deal either.

After these two were done, the last job was to re-write lQuery to work with the new systems Plump and CLSS provided. The re-write was a very good idea, as it made lQuery a lot more extensible and easier to read and test. It was quite funny to read such old code, after having worked with CL for about a year by then.


The templating engine I used in Radiance so far had been a combination of lQuery and “uibox”, which provided some very crude tools to fill in fields of nodes on the DOM. I didn't like this approach very much as there was too much lQuery clutter in the code that should've been in the template.

Clip now provides a templating system that hasn't been done in CL before and I don't think has really been done ever. All the code that manipulates your template is in the template itself, but the template is a valid HTML5 document at all times. The trick is to take advantage what HTML already allows you to do: custom tags and attributes. Clip picks those up, parses them and then modifies the DOM according to their instructions. All you have to do in your CL code is to pass in the data the page needs.


lQuery-Doc left a lot to wish for, so another rewrite was in order. This time I took advantage of Clip's capabilities to provide a very straight-forward, no-bullshit tool to generate documentation.

The only drawback it has currently is that its default template doesn't have the greatest stylesheet in the world, but that hardly bothers me. Maybe I'll get to writing a fancy one some day.


I always wanted to write my own painting application, mostly because MyPaint and others were never completely to my liking. I even took attempts at this before in Java. At some point, out of curiosity, I looked into how I would go about grabbing tablet input. Investigating the jPen library brought me absolutely nothing but confusion, so I looked for other ways. Luckily enough it turned out that Qt already provided a built-in way to grab events from tablets and from previous experience with a minor project I knew that CommonQt allowed me to use Qt rather easily from CL out.

So, what started out as a quick test to see whether it would even be possible to make a painting application quickly turned into a big thing that had a lot of potential. You can read more about it here.


A lot of time had passed since I last worked on Radiance. I took time off as I noticed that the framework had turned into something uncanny and I needed to fix that. And the way to fix it was to write a lot of design drafts and work out all the issues that came to mind on paper.

My conclusion after all this was: Radiance needed a complete, from scratch, rewrite. Oh boy. The first part that needed to be done is a proper library to provide the encapsulation into modules. Modules are Radiance's primary abstraction that allow you to neatly separate parts, but also unify the access and interaction between them.

Modularize was the solution for this and it works pretty well. In fact, it works so well that I don't even think about it anymore nowadays, it just does its job as I expect it to. Aside from Modularize itself I wrote two extensions that tucker on support for triggers and the much-needed interfaces and implementations mechanism that is vital to Radiance. I won't explain what these do exactly right now, that'll be for when I write the comprehensive guide to Radiance.


After a long time of rewriting Radiance's core and contribs, it was time to rewrite another component from the old version of TyNET: my blog. This time I tried to focus on simplicity and getting it done well. Simple it is indeed, it's barely 200 lines of code. And as you can probably see as you read this, it works quite nicely.


Writing CSS is a pain in the butt, as it involves a lot of duplication and other annoyances. At some point I had the idea of writing a Lisp to CSS compiler. Taking inspiration from Sass this idea grew into LASS in a matter of.. a day or two, I think?

I now use LASS for all of my style sheet writing concerns as it just works very well and with some minor emacs fiddling I don't even have to worry about compiling it to CSS myself.


Sometimes Xach would talk on IRC about wanting to interact with Tumblr through CL. As Tumblr is a service I use too and the biggest hurdle (oAuth) was already handled by South I took the challenge of writing yet another web-API client.

Humbler turned out a lot nicer than Chirp did in terms of end-user experience, I would say. However, I cannot at all say the same about my own experience while writing it. Tumblr's API “documentation” is quite bad, to be blunt. A lot of the returned fields are not noted on the page, some things are plain wrong (probably out of date) and in general there's just not enough things actually being documented. The worst part about it all was the audacity that the staff had to proclaim in a blog post that they wanted to encourage experimentation!, as if having to figure out the API by yourself was actually a great thing.

Anyway, I haven't actually used Humbler for anything myself, but some people seem to be using it and that's good enough to me.


Returning back to Radiance problems, one of the recurring issues was validating user input. There didn't seem to be a library that did this in any way. And so the same old game of ‘write it yourself’ began. Ratify's development mostly included reading RFCs and trying to translate them into tests and finally bundling it all together in some easy to use macros.


On twitter I encountered a really nice screenshot of an error page on some Clojure project. I couldn't find the tweet again later, so I don't know what features it had exactly, but suffice to say it was better than anything I'd seen up to that point.

That lead me to wonder how I could actually get the stack trace myself if an error occurred. There was already a library that provided rudimentary support for that, trivial-backtrace. Taking a look at its source code filled me with everything else but esteem though, so I headed out to write something that would allow people to inspect the stack, restarts, and accompanied source code easily.


A quick question by eudoxia on twitter inspired me to write a very quick toolkit to extract and infuse CSS from/into HTML. The main use case for the former would be to turn HTML into HTML+CSS and the latter to reverse the process (for, say, emails). Using lQuery and LASS this turned out to be a super easy thing to do and I had it done in no time.

Hooray for great code re-use!


Aside from the blog, the only really actively used component of TyNET was the imageboard, Stevenchan. Stevenchan ran on my own software called Purplish. In order to be able to dump everything of the old-code base I was driven to re-write Purplish for Radiance.

However, Purplish now takes a much different approach. A lot of traditional imageboard features are missing and a couple of unconventional features were added. Plus, having it written in CL has the advantage of being much easier to maintain, so if anything ever does crop up I'll tend much more towards wanting to fix it than I did before with PHP.


I like language a lot. I also like to try and reduce things to their minimum. So, the idea came to me of a site that allowed people to review things with only a single keyword. The idea behind that was to, with sufficient data, see what kind of patterns emerge and find out what people think the essence of an experience is.

Certainly it wouldn't be useful for an actual ‘review’ of anything, but it's nevertheless an interesting experiment. I don't know if I'll ever get enough data to find patterns in this or anything that could lead to scientifically significant correlations, but it's a fun enough thing on its own.


Having completed pretty much everything that I wanted to work on and stalling on some major issues with Radiance I was on the lookout for things to do. Parasol was still on hold and nothing else really picked my interest. In an attempt to start out right and not dive head over heels into it again, I first considered ways in which to make the C++-ness of Qt more lispy.

Born out of this was Qtools, a collection of tools to aid development with CommonQt and make it all feel a bit more homely. Of course, some major issues still remain; you still need to pay attention to garbage and there's still C++ calls lingering about, but all in all the heavy hackery of Qtools does make it more pleasing to the eye.

Qtools forced me to go deeper into the guts of CLOS and MOP than I've ever gone before and I had to spend a lot of time in implementation code to figure out how to make the things I needed work. I wouldn't advise Qtools as a good use of MOP, but it could be considered an impressive work in exercising the flexibility Lisp offers.

So, that's it then for now. I'd like to amend here that during the most part of all these projects I should've been studying for university. I'm not sure if working on these projects was the right choice, but I have learned a huge bunch and I hope that my produce of my efforts has been of use to other people. If not, then it certainly was not the right choice to indulge myself this much in programming.

Before I go into another long rant about my questionable situation in university I'll cap this here. Until another time!

Post scriptum: If you have ideas for features or new projects for me to work on, please let me know! More ideas is always better.


Patrick SteinAlternatives (v0.1.20141115) Released

· 8 days ago

I have now released the code that I mentioned in my previous post Code That Tells You Why which lets one keep multiple implementations around in code and switch between them manually without much trouble.

A link to the source code is here:

Patrick SteinCode That Tells You Why

· 10 days ago

A 2006 article by Jeff Atwood titled Code Tells You How, Comments Tell You Why showed up on reddit/r/programming today.

It makes a good point. However, it got me thinking that for cases like the binary-search example in the article, it might be nice to see all of the alternatives in the code and easily be able to switch between them.

One way to accomplish this in Lisp is to abuse the #+ and #- reader macros:

(defun sum-i^2 (n)
  (loop :for i :to n :summing (* i i))

  (do ((i 0 (1+ i))
       (sum 0 (+ sum (* i i))))
      ((> i n) sum))

  "Some people find a do-loop to hard to read
    (and 'too' too hard to spell, apparently)."

  (/ (* n (1+ n) (1+ (+ n n)) 6))

This is less than ideal for a number of reasons, including: one needs to make sure to pick “feature” names that won’t actually ever get turned on, the sense of + and - seem backwards here, and switching to a different alternative requires editing two places.

Another Lisp alternative is to abuse the case form:

(defun sum-i^2 (n)
  (case :now-i-know-better-and-can-do-this
     (loop :for i :to n :summing (* i i)))

     (do ((i 0 (1+ i))
          (sum 0 (+ sum (* i i))))
         ((> i n) sum)))

     "Some people find a do-loop to hard to read
    (and 'too' too hard to spell, apparently)."

     (/ (* n (1+ n) (1+ (+ n n)) 6)))))

This is better. No one can doubt which alternative is in use. It is only one edit to switch which alternative is used. It still feels pretty hackish to me though.

One can clean it up a bit with some macrology.

(defmacro alternatives (&body clauses)
  (flet ((symbol-is-***-p (sym)
           (and (symbolp sym)
                (string= (symbol-name sym) "***")))
         (final-clause-p (clause)
           (when (listp clause)
             (destructuring-bind (tag &body body) clause
               (when (and (symbolp tag)
                          (member (symbol-name tag)
                                  '("***" "FINAL" "BLESSED")
                                  :test #'string=))
      ((member-if #'symbol-is-***-p clauses)
       (let ((clause (first (rest anaphora:it))))
            ,@(rest clause))))

      ((find-if #'final-clause-p clauses)
          ,@(rest anaphora:it)))

      ((last clauses)
          ,@(rest (first anaphora:it)))))))

With this macro, one can now rewrite the sum-i^2 function quite readably:

(defun sum-i^2 (n)
     (loop :for i :to n :summing (* i i)))

     (do ((i 0 (1+ i))
          (sum 0 (+ sum (* i i))))
         ((> i n) sum)))

     "Some people find a do-loop to hard to read
    (and 'too' too hard to spell, apparently)."

     (/ (* n (1+ n) (1+ (+ n n)) 6)))))

If I wanted to try the my-first-attempt-was-something-like-this clause, I could stick a *** before that clause or change its name to *** or final or blessed, or I could move that clause into the last spot.

There is still an onus on the developer to chose useful alternative names. In most production code, one wants to clean out all of the dead code. On the other hand, during development or for more interactive code bodies, one might prefer to be able to see the exact “How” that goes with the “Why” and easily be able to swap between them.

(Above macro coming in well-documented library form, hopefully this weekend.)

Christophe Rhodesreproducible builds - a month ahead of schedule

· 15 days ago

I think this might be my last blog entry on the subject of building SBCL for a while.

One of the premises behind SBCL as a separate entity from CMUCL, its parent, was to make the result of its build be independent of the compiler used to build it. To a world where separate compilation is the norm, the very idea that building some software should persistently modify the state of the compiler probably seems bizarre, but the Lisp world evolved in that way and Lisp environments (at least those written in themselves) developed build recipes where the steps to construct a new Lisp system from an old one and the source code would depend critically on internal details of both the old and the new one: substantial amounts of introspection on the build host were used to bootstrap the target, so if the details revealed by introspection were no longer valid for the new system, there would need to be some patching in the middle of the build process. (How would you know whether that was necessary? Typically, because the build would fail with a more-or-less - usually more - cryptic error.)

Enter SBCL, whose strategy is essentially to use the source files first to build an SBCL!Compiler running in a host Common Lisp implementation, and then to use that SBCL!Compiler to compile the source files again to produce the target system. This requires some contortions in the source files: we must write enough of the system in portable Common Lisp so that an arbitrary host can execute SBCL!Compiler to compile SBCL-flavoured sources (including the standard headache-inducing (defun car (list) (car list)) and similar, which works because SBCL!Compiler knows how to compile calls to car).

How much is "enough" of the system? Well, one answer might be when the build output actually works, at least to the point of running and executing some Lisp code. We got there about twelve years ago, when OpenMCL (as it was then called) compiled SBCL. And yet... how do we know there aren't odd differences that depend on the host compiler lurking, which will not obviously affect normal operation but will cause hard-to-debug trouble later? (In fact there were plenty of those, popping up at inopportune moments).

I've been working intermittently on dealing with this, by attempting to make the Common Lisp code that SBCL!Compiler is written in sufficiently portable that executing it on different implementations generates bitwise-identical output. Because then, and only then, can we be confident that we are not depending in some unforseen way on a particular implementation-specific detail; if output files are different, it might be a harmless divergence, for example a difference in ordering of steps where neither depends on the other, or it might in fact indicate a leak from the host environment into the target. Before this latest attack at the problem, I last worked on it seriously in 2009, getting most of the way there but with some problems remaining, as measured by the number of output files (out of some 330 or so) whose contents differed depending on which host Common Lisp implementation SBCL!Compiler was running on.

Over the last month, then, I have been slowly solving these problems, one by one. This has involved refining what is probably my second most useless skill, working out what SBCL fasl files are doing by looking at their contents in a text editor, and from that intuiting the differences in the implementations that give rise to the differences in the output files. The final pieces of the puzzle fell into place earlier this week, and the triumphant commit announces that as of Wednesday all 335 target source files get compiled identically by SBCL!Compiler, whether that is running under Clozure Common Lisp (32- or 64-bit versions), CLISP, or a different version of SBCL itself.

Oh but wait. There is another component to the build: as well as SBCL!Compiler, we have SBCL!Loader, which is responsible for taking those 335 output files and constructing from them a Lisp image file which the platform executable can use to start a Lisp session. (SBCL!Loader is maybe better known as "genesis"; but it is to load what SBCL!Compiler is to compile-file). And it was slightly disheartening to find that despite having 335 identical output files, the resulting cold-sbcl.core file differed between builds on different host compilers, even after I had remembered to discount the build fingerprint constructed to be different for every build.

Fortunately, the actual problem that needed fixing was relatively small: a call to maphash, which (understandably) makes no guarantees about ordering, was used to affect the Lisp image data directly. I then spent a certain amount of time being thoroughly confused, having managed to construct for myself a Lisp image where the following forms executed with ... odd results:

(loop for x being the external-symbols of "CL" count 1)
; => 1032
(length (delete-duplicates (loop for x being the external-symbols of "CL" collect x)))
; => 978

(shades of times gone by). Eventually I realised that

(unless (member (package-name package) '("COMMON-LISP" "KEYWORD" :test #'string=))

was not the same as

(unless (member (package-name package) '("COMMON-LISP" "KEYWORD") :test #'string=)

and all was well again, and as of this commit the cold-sbcl.core output file is identical no matter the build host.

It might be interesting to survey the various implementation-specific behaviours that we have run into during the process of making this build completely repeatable. The following is a probably non-exhaustive list - it has been twelve years, after all - but maybe is some food for thought, or (if you have a particularly demonic turn of mind) an ingredients list for a maximally-irritating CL implementation...

There were probably other, more minor differences between implementations, but the above list gives a flavour of the things that needed doing in order to get to this point, where we have some assurance that our code is behaving as intended. And all of this is a month ahead of my self-imposed deadline of SBCL's 15th birthday: SBCL was announced to the world on December 14th, 1999. (I'm hoping to be able to put on an sbcl15 workshop in conjunction with the European Lisp Symposium around April 20th/21st/22nd - if that sounds interesting, please pencil the dates in the diary and let me know...)

Quicklisp newsNovember 2014 Quicklisp dist update now available

· 17 days ago
New projects:
Updated projects: asteroids, buildapp, chirp, cl-algebraic-data-type, cl-ana, cl-arrows, cl-async, cl-autowrap, cl-bert, cl-case-control, cl-conspack, cl-dbi, cl-erlang-term, cl-mustache, cl-read-macro-tokens, cl-store, cl-virtualbox, clack, cleric, clfswm, clip, closer-mop, clsql-helper, clss, clx, coleslaw, colleen, crane, crypto-shortcuts, deferred, defmacro-enhance, dissect, drakma-async, esrap-liquid, event-glue, gbbopen, glyphs, graph, hdf5-cffi, helambdap, hh-web, humbler, ironclad, lass, lfarm, lisp-executable, lisp-gflags, lparallel, lquery, lredis, mcclim, method-combination-utilities, mgl-pax, modularize, modularize-hooks, modularize-interfaces, okra, optima, petit.string-utils, pgloader, plump, plump-sexp, plump-tex, postmodern, prove, ratify, rcl, readable, rutils, serapeum, shelly, slime, smug, softdrink, south, staple, stumpwm, trivial-benchmark, trivial-download, trivial-indent, trivial-mimes, trivial-signal, trivial-thumbnail, uiop, universal-config, vecto, verbose, weblocks, weblocks-stores, weblocks-tree-widget, yason.

To get this update, use (ql:update-dist "quicklisp").

LispjobsClojure Software Engineer, Staples Sparx, San Mateo, CA

· 18 days ago

SparX is a small engineering team focused on applying online machine learning and predictive modeling to eCommerce (impacting a 24 billion dollar business).

Our stack is 100% Clojure, service oriented, targeting 50 million users with 1ms SLAs. We apply engineering and data science to tough problems such as dynamic pricing, shipping estimations, personalized emails, and multi-variate testing.

We are always looking for talent in data-science, engineering and devops. Bonus points if you can bridge 2 of these together. We love people with strong fundamentals who can dive deep.


Timofei ShatrovIt's alive! The path from library to web-app.

· 18 days ago

In case you’d rather just play with the website instead of reading this boring post, the url is

In my previous posts (part 1, part 2, part 3) I described the development process of a romanization algorithm for texts in Japanese language. However the ultimate goal was always to make a simple one-purpose web application that makes use of this algorithm. It took quite a while, but it’s finally here. In this post I will describe the technical details behind the development of this website.

I decided to build it with bare Hunchentoot; while there are some nice Lisp web frameworks developed lately like Restas or Caveman, my app would be too simple to need them. There would be a single handler that takes a query and various options as GET parameters, and returns a nicely formatted result.

Now I needed something to produce HTML. I used CL-WHO before, but this time I wanted a templating library where I can just copy-paste plain HTML into. I settled on closure-templates, which is based on Google Closure Templates but the syntax is slightly different. Now, I don’t know if I should recommend this library because its documentation in English is rather sparse and it has a dead link in its Github description. It has a detailed manual written in Russian, so I was able to read that. As to why I chose it, this library has a major advantage over its competitors. The same template can be compiled into a Common Lisp function and into Javascript! Why is this useful? Well, for example, I have these cute cards that explain the definition of a word:


These are mostly generated statically by the Common Lisp backend. But if I want to make such card on the fly client-side, I can call the corresponding Javascript function and it will produce the exact same HTML content. Makes dynamic HTML generation really easy.

For the front-end framework I chose Foundation. Bootstrap was my first choice, but it really doesn’t look all that great and it’s difficult to customize. So I decided to try something else. Foundation was pretty nice, it was easy to make my website responsive and look decent on mobile screen. The only problem, like I later discovered, was its sheer size. The 183kb javascript file (minified!) was refused to be cached by my browser for some reason, so each page load took quite a while. Fortunately that was solved by loading this file from cloudflare CDN.

One thing I didn’t concern myself about when writing the backend Ichiran algorithm was thread safety. However, as Hunchentoot uses threads to process requests, this matter becomes very important. Fortunately writing thread-safe code in Lisp is not that hard. Mostly you should just avoid modifying global special variables (binding them with let is okay) and be careful with writing persistent data. Since my app is pretty much read-only, there was only one such issue. I am storing a cache of word suffixes in a special variable. Generating this cache takes several seconds, but is only done once per session. As you can guess, this creates problems with thread safety, so I put a lock around this procedure and called it when the server is launched. Each server launch would therefore take several seconds, which is suboptimal. Later I would make the lock non-blocking and display a warning if the init-suffixes procedure is in progress.

Like I said before, I wanted my data to be compatible between Common Lisp and Javascript, so I added some functions to Ichiran to produce JSON objects containing various data. There are many JSON libraries for Common Lisp. I was using jsown before, so I decided to stick with it. jsown objects are lists with :obj as the first element and alist of properties as its cdr. The problem was that closure-templates only supports plists and alists as its context parameter, and jsown object is neither. The solution was to extend the methods fetch-property and fetch-keys. Since they are already defined for lists, I added :around methods to check for jsown objects specifically and call-next-method on cdr in that case.

(defmethod closure-template:fetch-property :around ((map list) key)
  "Support for jsown dict objects"
  (if (and (not (integerp key))
           (eql (car map) :obj)
           (every #'listp (cdr map)))
      (call-next-method (cdr map) key)

(defmethod closure-template:fetch-keys :around ((map list))
  (if (and (eql (car map) :obj)
           (every #'listp (cdr map)))
      (call-next-method (cdr map))

Theoretically this would fail if passed a valid plist like ‘(:obj (1 2)), but this cannot possibly happen in my application.

Now, at some point I had to actually put my app online. I needed a server and a domain name and I needed them cheap (because I’m currently unemployed (pls hire me)). For the server I chose Linode VPS, and I bought domain from I still think these new TLDs are a pretty stupid idea, but at least it gives us all an opportunity to buy a short and memorable domain name. I spent the rest of the day configuring my Linode server, which I never did before. Thankfully the documentation they provide is really good.

Because I wanted to get the most juice out of my cheap-ass server, the plan was to put hunchentoot server behind Nginx and to cache everything. There are existing guides on how to do this setup, which were very helpful. In my setup everything is served by Nginx except for URLs that start with /cl/, which are passed to Hunchentoot. The static pages (including error pages) are also generated by closure-template (so that the design is consistent), but they are just dumped into .html files served by Nginx. Nginx also caches dynamic content, which might help if some high-traffic site links to a certain query. This, and the fact that Linodes are hosted on SSD made the site run pretty smooth.

Now let’s talk about my infrastructure. As described in the guides above, I have a special hunchentoot user in addition to the main user. The main user’s quicklisp directory is symlinked to hunchentoot’s so the server can load the code but cannot write there. The code is stored in 2 repositories. One is the open-source core of the project (ichiran) and the other one is a private bitbucket repository ichiran-web which holds web-related code. However a simple git pull doesn’t update the code running on the server. If I’m lazy, I do “sudo service hunchentoot restart”, which restarts everything and reloads the code. This might of course create service interruptions for the users. Another option is hot swapping all the changes. For this purpose my hunchentoot server also starts a swank server like this:

(defun start-app (&optional (port 8080))
  (handler-case (swank:create-server :dont-close t)
    (error ()))
  (let ((acceptor (make-instance 'easy-acceptor :port port
                                 :access-log-destination *access-log*
                                 :message-log-destination *message-log*
    (setf *ichiran-web-server* (start acceptor))))

Swank is, of course, the server-side component of SLIME. It runs on a port that is not accessible remotely and can only be connected to locally or via SSH tunnel. I use the latter to connect SLIME on my PC to Swank running on my server, which allows me to apply various fixes without restarting, either from the REPL or by using C-c C-c to recompile some function.

Anyway, I’m pretty happy with the way things turned out, and I got some positive feedback already. The biggest thing left is tightening up the web design, which is my least favorite part of web development. The other thing is attracting enough traffic so that I can analyze the performance (I’m only getting a few people a day right now, which barely makes a blip on my server’s CPU graph).

In retrospect, getting this website up and running was pretty easy. I spent much more time trying to tweak ichiran library to split the sentences in a correct way (and I’m still working on it). It’s not much harder than, say, building a Django-based site. The tools are all there, the documentation is out there (kind of). VPSes are cheap. And it spreads awareness of Common Lisp. No reason not to try!

Nicolas HafnerParanormal Parasols - Confession 36

· 19 days ago

header It's been too long since my last entry. I just haven't had much that I felt safe talking about. But, now that I'm mostly done with everything that occupied me for a while (Radiance and Purplish), I have more time available for other things. One of these things happens to be Parasol.

As you may or may not know, Parasol is a Common Lisp painting application that was born out of curiosity and wasn't really meant to be anything big, but then quickly exploded in size. But as it is with these things, at some point I hit a big roadblock: It couldn't process events fast enough and was thus not getting enough feedback from the tablet pen. This happened because the drawing operations took too long and clogged up the event loop.

To solve this problem, we need to put the operations that take a long time into a separate thread and queue up events while it runs. I tried to hack this in, but the results were less than fantastic. It could go either one of two ways; either the entire CL environment had a segmentation fault at a random time, or the painting would be littered with strange drawing artefacts.

The only thing that I could guess from this was that Qt wasn't happy with me using CL threads. But that wasn't the only issue. I really didn't like what I had built over the weeks of working on Parasol, as it reeked of patchwork design, without much of an ulterior architecture. So I put the project to rest for the time being, hoping to return to it some day and rewrite it anew.

In preparation for this day I recently wrote Qtools, a collection of utilities that should make working with Qt easier. Writing that library alone caused me quite some amounts of grief, so I'm not very enthusiastic about diving deeper into the sea of tears that is C++ interaction.

Regardless, I couldn't keep my mind off of it and used some lecture time yesterday to write together a crude design document that should lay out a basic plan of action and architecture for the new Parasol version. I have started to set this plan into motion today.

First in line is building a skeleton main window with a pane for “gizmos” and a main area that can hold a range of documents or other forms of tabs. With that I'll have a minimal environment to test things out on. After that there's a large chunk of internal structure that needs to be completed: the document representation.

As I've laid it out so far, a document is composed of a set of layer-type objects, a history stack, and metadata such as the size and file associated with it. Layer objects themselves are composed of a position, size, drawables list, and an image buffer. A drawable itself is not specified to be anything fixed. Unlike before, it does not have to be a stroke, but simply an object that has the general methods a drawable has. This allows me to later add things that behave differently, such as images and 3D objects.

This change away from Parasol's original model necessitates two further, drastic alterations: First we need to have tools aside from a general brush that allow manipulating other kinds of objects, so we need to have a way to define ‘tools’ and their effects when used, in a uniform fashion. Secondly, we need a proper, generalised history that allows us to un/re-do any kind of operation on the document. Both of those things offer significant design challenges and I really hope I can pull it off.

Fortunately however, integrating these is a long while off, as after implementing a basic document I first need to add a way to publish these –so far purely virtual– documents to the UI. To tie these two worlds together, we need a view object that defines where to and how we're currently looking at the document. The view's job will also be to relay input events, such as tablet or mouse movement, to the actual document (and preserve the correct coordinate translation while doing so).

At this point I'd like to quickly talk about how I'm intending on solving the issue that initially brought Parasol to halt. After thinking things over I came to the realisation that my previous attempt of adding complex splines to the strokes to make smooth lines possible was an artefact of never getting enough tablet input to begin with. With enough events, a simple linear interpolation is a feasible approache. Knowing this it becomes apparent that we do not need to precalculate the spline's effective points (as linear interpolation is dirt cheap). From this it follows that I do not need to put the actual stroke updates into a separate thread, but can simply add points to the stroke in the event loop. The only thing that does have to be outsourced is the drawing of the buffers in the document.

In order to solve the drawing artefacts problem that arose in my previous attempt, I thought I'd take a look at a solution of threading from Qt out. After all, it might just be that Qt isn't happy with being called from threads it doesn't know about. After looking around I found out that the library CommonQt uses to bind to Qt (smokeqt) does not include these classes by default, even though they could be parsed.

Adding these classes to the library was easy enough, a simple XML config change and a recompilation made it available. But whether it actually worked or not was a different question. At first it seemed that it would not work at all. A callback from a QThread back into lisp would always cause SBCL to segfault, which was a rather devastating sign. Fortunately enough it seems that CCL instead works just fine with foreign threads. Halleluyah!

I haven't tested whether everything works just fine and dandy with this idea yet, but hopefully I'll get to that soon enough. However, threading always comes at the price of an immense complexity increase. Object access needs to be carefully synchronised with mutexes or other techniques. Thanks to the fact that interaction between the two threads is minimal, this doesn't pose too big of an issue. Once the drawing thread is done it only needs a single atomic call to set the buffer of the object to its finished value and otherwise it only needs to read things from vectors, where it won't make a big difference if it misses an element or two ahead.

Or at least so I hope. I'm sure that I'll get plenty of headaches with threading once I get to it. For now I'll be content with imagining that it'll all work out perfectly fine.

So, when this is all well and done I can move on to writing the architecture for the tools and history operations and with that the general brushes tool, which I hope I can mostly copy over from the previous version.

With image operations, threaded drawing, history and document presentation all sorted out the final step will then be to make a front-end for it all: Layer management, colour picking, brush configuration toolbar, a pluggable gizmos panel and so on and so forth.

If all goes according to my dreams I'll end up with a painting application that is extensible in every aspect imaginable, exhibits the most powerful user-configurable brush-engine I've seen, offers support for integrating and rendering 3D models as well as plenty of other constructs, and allows rudimentary image manipulation.

All of this is left up to your and my imagination for the time being, but I certainly hope to make it a reality at some point.

I might talk more about my progress as I advance, but maybe I'll keep on thinking I have nothing to talk about, irregardless of how much truth there is to that in actuality.

Until the sun shines again.

PixelI'm Giving A Talk Today!

· 23 days ago

Apologies for the short notice--I've been rushed just trying to get the darn thing ready--but I will be giving a talk at Iowa Code Camp on Saturday, November 1st.

If you can't attend, or want to review my talk before/after I give it, you can find it here. Feel free to offer any feedback. It's much too late to make major changes at this point, but feedback is welcome regardless.

(Technically the talk is about PHP, but tagged Lisp because I blame Common Lisp for inspiring a bunch of it.)

Ze "official" blog URL has changed, this is merely a cross-post for your reading convenience. You'll have to click through to read comment count unavailable comments or leave your own.

LispjobsClojure DevOps Engineer, Diligence Engine, Toronto or Remote

· 26 days ago

DiligenceEngine is a Toronto-based startup using machine learning to automate legal work. We're looking for a DevOps engineer to help us manage and automate our technology stack. Our team is small, pragmatic, and inquisitive; we love learning new technologies and balance adoption with good analysis. We prefer to hire in the Toronto area, but also welcome remote work in a time zone within North America.

Full job listing at their blog: We’re hiring a Clojure engineer!

LispjobsSenior Software Engineer: D-Wave, Burnaby, British Columbia, Canada

· 32 days ago

D-Wave is looking for exceptionally motivated people who love to see the impact of their work on a daily basis, who will do whatever it takes to ensure success of the company, and who want to be a part of something special.

D-Wave is working to radically change what it is possible to do with computers. Our mission is to integrate new discoveries in physics and computer science into new breakthrough approaches to computation. We are committed to commercializing quantum computers. The company’s flagship product, the D-Wave Two, is built around a novel type of superconducting quantum processor. D-Wave Two systems are currently in use by on-line customers and by customers in the field such as NASA & Google


D-Wave is seeking an experienced Software Developer to join the Processor Development group. The successful candidate will work closely with physicists to develop and optimize measurement routines used to calibrate D-Wave’s quantum processor. You will be self-driven, but comfortable working closely with others. You will share responsibility for designing, implementing, testing and maintaining the suite of software necessary to support the testing and operation of D-Wave's quantum computing hardware. The software is implemented in Common Lisp (SBCL) and is an integral part of the quantum computing system. It is used for a variety of purposes including calibration, operation, testing and benchmarking.


  • Work closely with physicists and other software engineers to develop any and all aspects of quantum processor calibration, operation infrastructure, performance optimization and profiling
  • Analyze and optimize existing software and newly developed routines for performance and reliability. Develop and support software related to architecture, usage of libraries and functions, the best ways of solving a problem or implementing a new feature, and layer efficiency and performance optimization
  • Software development, support, and troubleshooting systems hardware including fridge control and processor electronics
  • Full life-cycle support of software products from development, test and validation, production deployment, through to decommissioning
  • Configuring and upgrading quantum processor control servers and software development servers


  • Masters Degree in Computer Science with 4+ years relevant experience, or Bachelor’s degree and 8+ years experience.
  • Experience developing and optimizing software in compiled languages; ability to consider both algorithm choice and how your code is compiled when tuning performance.
  • At least 4 years of professional software development experience including software design, code and test, and maintenance
  • Familiarity with Common Lisp is a definite asset
  • Comfortable working alongside one or two other scientists or software engineers, such as in a pair programming

    We thank all applicants for their interest, however, only those who are selected for interviews will be contacted. It is D-Wave Systems Inc policy to provide equal employment opportunity (EEO) to all persons regardless of race, color, religion, sex, national origin, age, sexual orientation, genetic information, physical or mental disability, protected veteran status, or any other characteristic protected by federal, state/provincial, local law.


Lindsay Andrea <>

Talent Acquisition Specialist, Human Resources

D-Wave Systems Inc.

604.630.1428  Ext. 119

Paul KhuongPerformance Tuning ~ Writing an Essay

· 35 days ago

Skip to the meaty bits.

My work at AppNexus mostly involves performance optimisation, at any level from microarchitecture-driven improvements to data layout and assembly code to improving the responsiveness of our distributed system under load. Technically, this is similar to what I was doing as a lone developer on research-grade programs. However, the scale of our (constantly changing) code base and collaboration with a dozen other coders mean that I approach the task differently: e.g., rather than single-mindedly improving throughput now, I aim to pick an evolution path that improves throughput today without imposing too much of a burden on future development or fossilising ourselves in a design dead-end. So, although numbers still don’t lie (hah), my current approach also calls for something like judgment and taste, as well as a fair bit of empathy for others. Rare are the obviously correct choices, and, in that regard, determining what changes to make and which to discard as over-the-top ricing feels like I’m drafting a literary essay.

This view is probably tainted by the fact that, between English and French classes, I spent something like half of my time in High School critiquing essays, writing essays, or preparing to write one. Initially, there was a striking difference between the two languages: English teachers had us begin with the five paragraph format where one presents multiple arguments for the same thesis, while French teachers imposed a thesis/antithesis/synthesis triad (and never really let it go until CÉGEP, but that’s another topic). When I write that performance optimisation feels like drafting essays, I’m referring to the latter “Hegelian” process, where one exposes arguments and counterarguments alike in order to finally make a stronger case.

I’ll stretch the analogy further. Reading between the lines gives us access to more arguments, but it’s also easy to get the context wrong and come up with hilariously far-fetched interpretations. When I try to understand a system’s performance, the most robust metrics treat the system as a black box: it’s hard to get throughput under production data wrong. However, I need finer grained information (e.g., performance counters, instruction-level profiling, or application-specific metrics) to guide my work, and, the more useful that information can be – like domain specific metrics that highlight what we could do differently rather than how to do the same thing more efficiently – the easier it is to measure incorrectly. That’s not a cause for despair, but rather a fruitful line of skepticism that helps me find more opportunities.

Just two weeks ago, questioning our application-specific metrics lead to an easy 10% improvement in throughput for our biggest consumer of CPU cycles. The consumer is an application that determines whether internet advertising campaigns are eligible to bid on an ad slot, and if so, which creative (ad) to show and at what bid price. For the longest time, the most time-consuming part of that process was the first step, testing for campaign eligibility. Consequently, we tracked the execution of that step precisely and worked hard to minimise the time spent on ineligible campaigns, without paying much attention to the rest of the pipeline. However, we were clearly hitting diminishing returns in that area, so I asked myself how an adversary could use our statistics to mislead us. The easiest way I could think of was to have campaigns that are eligible to bid, but without any creative compatible with the ad slot (e.g., because it’s the wrong size or because the website forbids Flash ads): although the campaigns are technically eligible, they are unable to bid on the ad slot. We added code to track these cases and found that almost half of our “eligible” campaigns simply had no creative in the right size. Filtering these campaigns early proved to be a low-hanging fruit with an ideal code complexity:performance improvement ratio.

Trust no one, not even performance counters

I recently learned that we also had to second-guess instruction level profiles. Contemporary x86oids are out of order, superscalar, and speculative machines, so profiles are always messy: “blame” is scattered around the real culprit, and some instructions (pipeline hazards like conditional jumps and uncached memory accesses, mostly) seem to account for more than their actual share. What I never realised is that, in effect, some instructions systematically mislead and push their cycles to others.

Some of our internal spinlocks use mfence. I expected that to be suboptimal, since it’s common knowledge that locked instruction are more efficient barriers: serialising instructions like mfence have to affect streaming stores and other weakly ordered memory accesses, and that’s a lot more work than just preventing store/load reordering. However, our profiles showed that we spent very little time on locking so I never gave it much thought... until eliminating a set of locks had a much better impact on performance than I would have expected from the profile. Faced with this puzzle, I had to take a closer look at the way mfence and locked instructions affect hardware-assisted instruction profiles on our production Xeon E5s.

I came up with a simple synthetic microbenchmark to simulate locking on my E5-4617: the loop body is an adjustable set of memory accesses (reads and writes of out-of-TLB or uncached locations) or computations (divisions) bracketed by pairs of normal stores, mfence, or lock inc/dec to cached memory (I would replace the fences with an increment/decrement pair and it looks like all read-modify-write instructions are implemented similarly on Intel). Comparing runtimes for normal stores with the other instructions helps us gauge their overhead. I can then execute each version under perf and estimate the overhead from the instruction-level profile. If mfence is indeed extra misleading, there should be a greater discrepancy between the empirical impact of the mfence pair and my estimate from the profile.

You can find the super crufty code here, along with a rdtscp version of cycle.h.

With locked instructions and random reads that miss the L3 cache, the (cycle) profile for the microbenchmark loop is:

$ perf annotate -s cache_misses
    0.06 :        4006b0:       and    %rdx,%r10
    0.00 :        4006b3:       add    $0x1,%r9
    ;; random (out of last level cache) read
    0.00 :        4006b7:       mov    (%rsi,%r10,8),%rbp
   30.37 :        4006bb:       mov    %rcx,%r10
    ;; foo is cached, to simulate our internal lock
    0.12 :        4006be:       mov    %r9,0x200fbb(%rip)        # 601680 <foo>
    0.00 :        4006c5:       shl    $0x17,%r10
    [... Skipping arithmetic with < 1% weight in the profile]
    ;; locked increment of an in-cache "lock" byte
    1.00 :        4006e7:       lock incb 0x200d92(%rip)        # 601480 <private+0x200>
   21.57 :        4006ee:       add    $0x1,%rax
    ;; random out of cache read
    0.00 :        400704:       xor    (%rsi,%r10,8),%rbp
   21.99 :        400708:       xor    %r9,%r8
    ;; locked in-cache decrement
    0.00 :        400729:       lock decb 0x200d50(%rip)        # 601480 <private+0x200>
   18.61 :        400730:       add    $0x1,%rax
    0.92 :        400755:       jne    4006b0 <cache_misses+0x30>

Looking at that profile, I’d estimate that the two random reads account for ~50% of runtime, and the pair of lock inc/dec for ~40%.

The picture is completely different for mfence.

$ perf annotate -s cache_misses
    0.00 :        4006b0:       and    %rdx,%r10
    0.00 :        4006b3:       add    $0x1,%r9
    ;; random read
    0.00 :        4006b7:       mov    (%rsi,%r10,8),%rbp
   42.04 :        4006bb:       mov    %rcx,%r10
    ;; store to cached memory (lock word)
    0.00 :        4006be:       mov    %r9,0x200fbb(%rip)        # 601680 <foo>
    0.20 :        4006e7:       mfence 
    5.26 :        4006ea:       add    $0x1,%rax
    ;; random read
    0.19 :        400700:       xor    (%rsi,%r10,8),%rbp
   43.13 :        400704:       xor    %r9,%r8
    0.00 :        400725:       mfence 
    4.96 :        400728:       add    $0x1,%rax
    0.92 :        40072c:       add    $0x1,%rax
    0.36 :        40074d:       jne    4006b0 <cache_misses+0x30>

It looks like the loads from uncached memory represent ~85% of the runtime, while the mfence pair might account for at most ~15%, if I include all the noise from surrounding instructions.

If I trusted the profile, I would worry about eliminating locked instructions, but not so much for mfence. However, runtimes (in cycles), which is what I’m ultimately interested in, tell a different story. The same loop of LLC load misses takes 2.81e9 cycles for 32M iterations without any atomic or fence, versus 3.66e9 for lock inc/dec and 19.60e9 cycles for mfence. So, while the profile for the mfence loop would let me believe that only ~15% of the time is spent on synchronisation, the mfence pair really represents 86% \(((19.6 - 2.81) / 19.6)\) of the runtime for that loop! Inversely, the profile for the locked pair would make me guess that we spend about 40% of the time there, but, according to the timings, the real figure is around 23%.

The other tests all point to the same conclusion: the overhead of mfence is strongly underestimated by instruction level profiling, and that of locked instructions exaggerated, especially when adjacent instructions write to memory.

  setup     cycles   (est. overhead)  ~actual overhead

div [ALU] (100 Mi iterations)
 atomic: 20153782848   (20%)          ~ 3.8%
 mfence: 28202315112   (25%)          ~31.3%
vanilla: 19385020088


TLB misses (64Mi iterations)
 atomic:  3776164048   (80%)          ~39.3%
 mfence: 12108883816   (50%)          ~81.1%
vanilla:  2293219400 

LLC misses (32Mi iterations)
 atomic:  3661686632   (40%)          ~23.3%
 mfence: 19596840824   (15%)          ~85.7%
vanilla:  2807258536


TLB (64Mi iterations)
 atomic:  3864497496   (80%)          ~10.4%
 mfence: 13860666388   (50%)          ~75.0%
vanilla:  3461354848

LLC (32Mi iterations)
 atomic:  4023626584   (60%)          ~16.9%
 mfence: 21425039912   (20%)          ~84.4%
 vanilla: 3345564432

I can guess why we observe this effect; it’s not like Intel is intentionally messing with us. mfence is a full pipeline flush: it slows code down because it waits for all in-flight instructions to complete their execution. Thus, while it’s flushing that slows us down, the profiling machinery will assign these cycles to any of the instructions that are being flushed. Locked instructions instead affect stores that are still queued. By forcing such stores to retire, locked instructions become responsible for the extra cycles and end up “paying” for writes that would have taken up time anyway.

Losing faith in hardware profiling being remotely representative of reality makes me a sad panda; I now have to double check perf profiles when hunting for misleading metrics. At least I can tell myself that knowing about this phenomenon helps us make better informed – if less definite – decisions and ferret out more easy wins.

P.S., if you find this stuff interesting, feel free to send an email (pkhuong at $ My team is hiring both experienced developers and recent graduates (:

Gábor MelisTranscripts

· 35 days ago

I've just committed a major feature to MGL-PAX: the ability to include code examples in docstrings. Printed output and return values are marked up with ".." and "=>", respectively.

 (values (princ :hello) (list 1 2))
 => :HELLO
 => (1 2)

The extras are:

  • parsing back and updating a transcript
  • auto-checking of up-to-dateness at documentation generation time
  • readable return values can be commented, hand-indented without breaking consistency checks and updates will not destroy those changes
  • Emacs integration: transcribing the last expression and updating a transcript in a region.
  • TRANSCRIBE works without the rest of MGL-PAX so it can be used to format bug reports or as a poor man's expect script.

The documentation provides a tutorialish treatment. I hope you'll find it useful.

Christophe Rhodesstill working on reproducible builds

· 41 days ago

It's been nearly fifteen years, and SBCL still can't be reliably built by other Lisp compilers.

Of course, other peoples' definition of "reliably" might differ. We did achieve successful building under unrelated Lisp compilers twelve years ago; there were a couple of nasty bugs along the way, found both before and after that triumphant announcement, but at least with a set of compilers whose interpretation of the standard was sufficiently similar to SBCL's own, and with certain non-mandated but expected features (such as the type (array (unsigned-byte 8) (*)) being distinct from simple-vector, and single-float being distinct from double-float), SBCL achieved its aim of being buildable on a system without an SBCL binary installed (indeed, using CLISP or XCL as a build host, SBCL could in theory be bootstrapped starting with only gcc).

For true "reliability", though, we should not be depending on any particular implementation-defined features other than ones we actually require - or if we are, then the presence or absence of them should not cause a visible difference in the resulting SBCL. The most common kind of leak from the host lisp to the SBCL binary was the host's value of most-positive-fixnum influencing the target, causing problems from documentation errors all the way up to type errors in the assembler. Those leaks were mostly plugged a while ago, though they do recur every so often; there are other problems, and over the last week I spent some time tracking down three of them.

The first: if you've ever done (apropos "PRINT") or something similar at the SBCL prompt, you might wonder at the existence of functions named something like SB-VM::|CACHED-FUN--PINSRB[(EXT-2BYTE-XMM-REG/MEM ((PREFIX (QUOTE (102))) (OP1 (QUOTE (58))) (OP2 (QUOTE (32))) (IMM NIL TYPE (QUOTE IMM-BYTE))) (QUOTE (NAME TAB REG , REG/MEM ...)))]-EXT-2BYTE-XMM-REG/MEM-PRINTER|.

What is going on there? Well, these functions are a part of the disassembler machinery; they are responsible for taking a certain amount of the machine code stream and generating a printed representation of the corresponding assembly: in this case, for the PINSRB instruction. Ah, but (in most instruction sets) related instructions share a fair amount of structure, and decoding and printing a PINSRD instruction is basically the same as for PINSRB, with just one #x20 changed to a #x22 - in both cases we want the name of the instruction, then a tab, then the destination register, a comma, the source, another comma, and the offset in the destination register. So SBCL arranges to reuse the PINSRB instruction printer for PINSRD; it maintains a cache of printer functions, looked up by printer specification, and reuses them when appropriate. So far, so normal; the ugly name above is the generated name for such a function, constructed by interning a printed, string representation of some useful information.

Hm, but wait. See those (QUOTE (58)) fragments inside the name? They result from printing the list (quote (58)). Is there a consensus on how to print that list? Note that *print-pretty* is bound to nil for this printing; prior experience has shown that there are strong divergences between implementations, as well as long-standing individual bugs, in pretty-printer support. So, what happens if I do (write-to-string '(quote foo) :pretty nil)?

So, if SBCL was compiled using CLISP, the name of the same function in the final image would be SB-VM::|CACHED-FUN--PINSRB[(EXT-2BYTE-XMM-REG/MEM ((PREFIX '(102)) (OP1 '(58)) (OP2 '(32)) (IMM NIL TYPE 'IMM-BYTE)) '(NAME TAB REG , REG/MEM ...))]-EXT-2BYTE-XMM-REG/MEM-PRINTER|. Which is shorter, and maybe marginally easier to read, but importantly for my purposes is not bitwise-identical.

Thus, here we have a difference between host Common Lisp compilers which leaks over into the final image, and it must be eliminated. Fortunately, this was fairly straightforward to eliminate; those names are never in fact used to find the function object, so generating a unique name for functions based on a counter makes the generated object file bitwise identical, no matter how the implementation prints two-element lists beginning with quote.

The second host leak is also related to quote, and to our old friend backquote - though not related in any way to the new implementation. Consider this apparently innocuous fragment, which is a simplified version of some code to implement the :type option to defstruct:

(macrolet ((def (name type n)
                (declaim (inline ,name (setf ,name)))
                (defun ,name (thing)
                  (declare (type simple-vector thing))
                  (the ,type (elt thing ,n)))
                (defun (setf ,name) (value thing)
                  (declare (type simple-vector thing))
                  (declare (type ,type value))
                  (setf (elt thing ,n) value)))))
  (def foo fixnum 0)
  (def bar string 1))

What's the problem here? Well, the functions are declaimed to be inline, so SBCL records their source code. Their source code is generated by a macroexpander, and so is made up of conses that are generated programmatically (as opposed to freshly consed by the reader). That source code is then stored as a literal object in an object file, which means in practice that instructions for reconstructing a similar object are dumped, to be executed when the object file is processed by load.

Backquote is a reader macro that expands into code that, when evaluated, generates list structure with appropriate evaluation and splicing of unquoted fragments. What does this mean in practice? Well, one reasonable implementation of reading `(type ,type value) might be:

(cons 'type (cons type '(value)))

and indeed you might (no guarantees) see something like that if you do

(macroexpand '`(type ,type value))

in the implementation of your choice. Similarly, reading `(setf (elt thing ,n) value) will eventually generate code like

(cons 'setf (cons (cons 'elt (list 'thing n)) '(value)))

Now, what is "similar"? In this context, it has a technical definition: it relates two objects in possibly-unrelated Lisp images, such that they can be considered to be equivalent despite the fact that they can't be compared:

similar adj. (of two objects) defined to be equivalent under the similarity relationship.

similarity n. a two-place conceptual equivalence predicate, which is independent of the Lisp image so that two objects in different Lisp images can be understood to be equivalent under this predicate. See Section 3.2.4 (Literal Objects in Compiled Files).

Following that link, we discover that similarity for conses is defined in the obvious way:

Two conses, S and C, are similar if the car of S is similar to the car of C, and the cdr of S is similar to the cdr of C.

and also that implementations have some obligations:

Objects containing circular references can be externalizable objects. The file compiler is required to preserve eqlness of substructures within a file.

and some freedom:

With the exception of symbols and packages, any two literal objects in code being processed by the file compiler may be coalesced if and only if they are similar [...]

Put this all together, and what do we have? That def macro above generates code with similar literal objects: there are two instances of '(value) in it. A host compiler may, or may not, choose to coalesce those two literal '(value)s into a single literal object; if it does, the inline expansion of foo (and bar) will have a circular reference, which must be preserved, showing up as a difference in the object files produced during the SBCL build. The fix? It's ugly, but portable: since we can't stop an aggressive compiler from coalescing constants which are similar but not identical, we must make sure that any similar substructure is in fact identical:

(macrolet ((def (name type n)
             (let ((value '(value)))
                  (declaim (inline ,name (setf ,name)))
                  (defun ,name (thing)
                    (declare (type simple-vector thing))
                    (the ,type (elt thing ,n)))
                  (defun (setf ,name) (value thing)
                    (declare (type simple-vector thing))
                    (declare (type ,type . ,value))
                    (setf (elt thing ,n) . ,value)))))
  (def foo fixnum 0)
  (def bar string 1))

Having dealt with a problem with quote, and a problem with backquote, what might the Universe serve up for my third problem? Naturally, it would be a problem with a code walker. This code walker is somewhat naďve, assuming as it does that its body is made up of forms or tags; it is the assemble macro, which is used implicitly in the definition of VOPs (reusable assembly units); for example, like

(assemble ()
  (move ptr object)
  (zeroize count)
  (inst cmp ptr nil-value)
  (inst jmp :e DONE)
  (loadw ptr ptr cons-cdr-slot list-pointer-lowtag)
  (inst add count (fixnumize 1))
  (inst cmp ptr nil-value)
  (inst jmp :e DONE)
  (%test-lowtag ptr LOOP nil list-pointer-lowtag)
  (error-call vop 'object-not-list-error ptr)

which generates code to compute the length of a list. The expander for assemble scans its body for any atoms, and generates binding forms for those atoms to labels:

(let ((new-labels (append labels
                          (set-difference visible-labels inherited-labels))))
  `(let (,@(mapcar (lambda (name) `(,name (gen-label))) new-labels))

The problem with this, from a reproducibility point of view, is that set-difference (and the other set-related functions: union, intersection, set-exclusive-or and their n-destructive variants) do not return the sets with a specified order - which is fine when the objects are truly treated as sets, but in this case the LOOP and DONE label objects ended up in different stack locations depending on the order of their binding. Consequently the machine code for the function emitting code for computing a list's length - though not the machine code emitted by that function - would vary depending on the host's implementation of set-difference. The fix here was to sort the result of the set operations, knowing that all the labels would be symbols and that they could be treated as string designators.

And after all this is? We're still not quite there: there are three to four files (out of 330 or so) which are not bitwise-identical for differing host compilers. I hope to be able to rectify this situation in time for SBCL's 15th birthday...

Timofei ShatrovWords made out of words

· 45 days ago

Since my last post I’ve done a lot of work on my Japanese sentence-segmenting algorithm, so it’s time for an update.

First of all, I added conjugations. Here’s how JMdict does conjugations. That’s for a single verb. There’s a note saying “this table has been automatically generated”; indeed, in JMdict conjugations are generated on a basis of a rather large .csv file and are not stored in the database. Obviously for my purposes it is more efficient to have these in my database, so I ported a (rather simple) algorithm to Common Lisp and wrote a (really complex) procedure to load them. It takes quite a while to INSERT those one by one, which made me wish postmodern had some sort of bulk inserting mechanism. Some time later I discovered that some of these conjugations are themselves verbs that can be (and often are) conjugated. So I added “second level” conjugations that point both to first level conjugation and to the original verb. Hopefully “third level” conjugations are rarely used.

Meanwhile I’ve been trying to improve the segmentation algorithm. The first major change was calculating n best segmentations instead of just one. That would allow me to have a better picture of what the algorithm prefers. I came up with the structure that I call top-array, which is basically an array of n scores sorted from the biggest to smallest and when a new score is added, we go from the end and push everything smaller than the new score to the right. I thought it was pretty elegant and probably the fastest way to do this for small n (obviously some sort of tree would work better for large n).

(defstruct (top-array-item (:conc-name tai-)) score payload)

(defclass top-array ()
  ((array :reader top-array)
   (count :reader item-count :initform 0)

(defmethod initialize-instance :after ((obj top-array) &key (limit 5))
  (setf (slot-value obj 'array) (make-array limit :initial-element nil)))

(defgeneric register-item (collection score payload)
  (:method ((obj top-array) score payload)
    (with-slots (array count) obj
      (let ((item (make-top-array-item :score score :payload payload))
            (len (length array)))
        (loop for idx from (min count len) downto 0
           for prev-item = (when (> idx 0) (aref array (1- idx)))
           for done = (or (not prev-item) (>= (tai-score prev-item) score))
           when (< idx len) do (setf (aref array idx) (if done item prev-item))
           until done)
        (incf count)))))

(defgeneric get-array (collection)
  (:method ((obj top-array))
    (with-slots (array count) obj
      (if (>= count (length array)) array (subseq array 0 count)))))

An instance of top-array is created for every segment (found word in a sentence), as well as one for the entire sentence, from which the best path (a sequence of words) is taken in the end. Then the basic algorithm is similar to the one described in my previous post, but gains an extra inner loop.

(loop for (seg1 . rest) on segments
      for score1 = (get-segment-score seg1)
      do (register-item (segment-top seg1) score1 (list seg1))
         (register-item top score1 (list seg1))
      (loop for seg2 in rest
            for score2 = (get-segment-score seg2)
            when (>= (segment-start seg2) (segment-end seg1)) do
            (loop for tai across (get-array (segment-top seg1))
                 for path = (cons seg2 (tai-payload tai))
                 for score = (+ score2 (tai-score tai))
                 do (register-item (segment-top seg2) score path)
                    (register-item top score path))))

Then (get-array top) would return n best paths.

After this I started thinking on how to make my algorithm more context-sensitive. The way in which every segment is scored is completely independent of the other segments, which might cause best scored path to be a sequence of words that make no sense when put next to each other! The above algorithm is easy to modify to add some sort of bonus to two subsequent segments, so my first attempt was to encourage words that like to be next to each other in natural language with some extra score (I called that “synergy”). So, for example, there are “no-adjectives”, which are basically nouns, but when followed by particle “no” they become adjectives. I added a synergy that adds 15 points if such word is followed by particle “no”. In the end this way to do things has proven itself limited. Words can have wildly different scores and when things go wrong, extra 15 points might not be enough to make them right. On the other hand, if I increase this bonus too much, this might erroneously break up words that just so happen to have “no” in them.

Later I came up with the concept of compound words, which are “words” that don’t exist in the database, but rather consist of several words that do exist in the database. Right now, it’s mostly a primary word + one or several suffixes, but potentially there could be prefixes too. For the purposes of segmentation a compound word acts like one single word. One example of a common suffix would be “たい” (-tai) , which follows a verb (“to X”) conjugated in a certain way and the resultant meaning is “to want to X”. Most of these suffixes themselves have many conjugations. To check if a word can be understood as a compound word, I need to check if it ends with one of many suffixes, and then check if the part before the suffix has correct part of speech or conjugation. All possible suffixes and their meanings are put into a hashtable and then we can check if a word ends with some of them by checking all its endings versus the hashtable.

(defun get-suffixes (word)
  (loop for start from (1- (length word)) downto 1
       for substr = (subseq word start)
       for val = (gethash substr *suffix-cache*)
       when val
       collect (cons substr val)))

The concept of suffixes has fared much better as now I am able to calculate scores of compound words in a more versatile way.

I would still sometimes encounter phrases that are split badly by my algorithm, but a human would segment easily. For example if the words “AB” and “ABC” both exist in database, but “AB” happens to score higher (e.g. because it’s a really common word, while ABC is not so much), then “ABC” would never be segmented as one word “ABC”, it would be “AB”+”C”, even if “C” is a completely worthless word, or even not a word at all (a gap). An example of a “worthless” word is a hiragana spelling of one-syllable word that would normally be spelled with a kanji. I didn’t care about those much, because they had really low scores and thus only appeared when something went awry. However getting rid of these low-scoring words would allow me to place a large penalty on gaps and thus “ABC” will be able to score higher than “AB”+gap. In the path-finding algorithm above the same score is put into top and segment-top top-arrays. But if we want to penalize gaps, the score put into top should also include a penalty for the gap to the right of the last segment, if it exists. Penalties for gaps to the left of the leftmost segment and in-between segments should be added to both.

Anyway, I’m pretty happy with how this thing is progressing, and I’m going to switch my efforts to building a web-interface. Here’s how it currently works in REPL:

(click here for full-res image)


Kinda messy, isn’t it? The challenge would be to display all this information in reasonable manner. I already have some ideas, but it would still probably take some effort to decipher. But then again, translating the sentence was never the goal, just romanizing it, which ichiran does pretty well right now.

drmeisterDebugging with the Clasp debugger

· 46 days ago

Clasp provides two ways of debugging code. In interactive sessions Clasp invokes a built in Common Lisp debugger when errors or other exceptional situations arise. The Clasp compiler also generates DWARF debugging information that can be used by the GDB debugger (and hopefully soon the LLDB debugger) to display Clasp Common Lisp source information interleaved with C++ source information.

To see this start up clasp and type what follows the > prompt:

Top level.
> (defun c () (break "In c"))

> (defun b () (c))

> (defun a () (b))

> (a)

Condition of type: SIMPLE-CONDITION
In c

Available restarts:
(use :r1 to invoke restart 1)

1. (CONTINUE) Return from BREAK.
2. (RESTART-TOPLEVEL) Go back to Top-Level REPL.

Broken at frame[14] CORE::REP.
 File: #<CORE:SOURCE-FILE-INFO #P"/Users/meister/Development/clasp/src/lisp/kernel/lsp/top.lsp"> (Position #573)

The double prompt >> indicates that we are now in the Clasp Common Lisp debugger. This debugger is inherited from ECL (because Clasp uses the excellent Common Lisp source code from ECL). To get a list of commands that are available in the debugger type:

>> :h

Top level commands:
:cf		Compile file.
:exit or ^D	Exit Lisp.
:ld		Load file.
:step		Single step form.
:tr(ace)	Trace function.
:untr(ace)	Untrace function.
:pwd	Print the current value of *default-pathname-defaults*.
:cd	Change the current value of *default-pathname-defaults*.

Help commands:
:apropos	Apropos.
:doc(ument)	Document.
:h(elp) or ?	Help.  Type ":help help" for more information.

Break commands:
:q(uit)		Return to some previous break level.
:pop		Pop to previous break level.
:c(ontinue)	Continue execution.
:b(acktrace)	Print backtrace.
:f(unction)	Show current function.
:p(revious)	Go to previous function.
:d(own)         Alias to :previous.
:n(ext)		Go to next function.
:u(p)           Alias to :next.
:g(o)		Go to next function.
:fs             Search forward for function.
:bs             Search backward for function.
:disassemble	Disassemble current function.
:l(ambda-)e(expression)	Show lisp code for current function.
:v(ariables)	Show local variables, functions, blocks, and tags.
:hide		Hide function.
:unhide		Unhide function.
:hp		Hide package.
:unhp		Unhide package.
:unhide-all     Unhide all variables and packages.
:bds            Show binding stack.
:frs            Show frame stack.
:m(essage)      Show error message.
:hs		Help stack.
:i(nspect)      Inspect value of local variable.

Restart commands:
:r1             Return from BREAK. (CONTINUE).
:r2             Go back to Top-Level REPL. (RESTART-TOPLEVEL).

Clasp/ECL use Common Lisp keywords to activate debugger functionality.

To generate a backtrace type:

>> :b

--------STACK TRACE--------
   frame#  0toplevel         epilogueForm     0/0   REPL
   frame#  1/c              top.lsp   419/2   CORE::TOP-LEVEL
   frame#  2/c              top.lsp   615/21  CORE::TPL
   frame#  3/c              top.lsp   605/32  CORE::REP
   frame#  4/b  2353/0   CORE:TOP-LEVEL-EVAL-WITH-ENV
   frame#  5/b  2351/0   CORE:COMPILE-FORM-AND-EVAL-WITH-ENV
   frame#  6/c            -no-file-     1/0   nil
   frame#  7/c            -no-file-     1/0   A
   frame#  8/c            -no-file-     1/0   B
   frame#  9/c            -no-file-     1/0   C
   frame# 10/c       conditions.lsp   457/8   COMMON-LISP:BREAK
   frame# 11/c              top.lsp  1507/9   COMMON-LISP:INVOKE-DEBUGGER
   frame# 12/c              top.lsp  1489/5   CORE::DEFAULT-DEBUGGER
   frame# 13/c              top.lsp   618/7   CORE::TPL
-->frame# 14/c              top.lsp   605/32  CORE::REP
   frame# 15/b  2353/0   CORE:TOP-LEVEL-EVAL-WITH-ENV
   frame# 16/b  2351/0   CORE:COMPILE-FORM-AND-EVAL-WITH-ENV
   frame# 17/c            -no-file-     0/0   nil
   frame# 18/c              top.lsp  1088/3   CORE::TPL-BACKTRACE
   frame# 19/b     712/0   CORE:IHS-BACKTRACE


The —-> indicates the current frame that the debugger has stopped on. Since the error handling code and the debugger functions are all written in Common Lisp, those functions also appear on the backtrace. The functions we entered are in frames 7, 8, and 9.

At this point we could go to a specific frame using :g and view the environment of that frame using :v or we can print variables by just typing their names.

For now we will just leave the debugger and return to the top level REPL by invoking a restart.

>> :r2


Now we are back in the top level REPL and can continue working.

Next I’ll show you how to use the DWARF generated debugging information embedded in compiled Common Lisp code to debug Clasp using GDB or LLDB.

Quicklisp newsOctober 2014 Quicklisp dist update now available

· 48 days ago
New projects:
Updated projects: asteroids, avatar-api, babel, basic-binary-ipc, caveman, cffi, checkl, cl-ana, cl-async, cl-autowrap, cl-base58, cl-charms, cl-cli, cl-cli-parser, cl-conspack, cl-dbi, cl-dot, cl-gdata, cl-gss, cl-locatives, cl-mediawiki, cl-opengl, cl-project, clack, clip, closer-mop, clss, coleslaw, colleen, com.informatimago, cqlcl, datafly, dbus, djula, docbrowser, drakma, dynamic-mixins, fast-io, floating-point, gbbopen, gendl, graph, hdf5-cffi, lisp-executable, lisp-interface-library, lisp-unit2, mel-base, metabang-bind, mgl-pax, micmac, modularize-hooks, modularize-interfaces, nibbles, osicat, pg, plump, postmodern, quickproject, ratify, restas, rucksack, rutils, s-xml, scriptl, serapeum, shelly, smug, spinneret, staple, stumpwm, trivial-download, trivial-mimes, trivial-signal, universal-config, utils-kt, yason.

Removed projects: cl-test-more, phemlock.

cl-test-more hasn't really been removed. It's been renamed to prove.

I removed phemlock by request; it represents an old, dead branch of development, hosted on CVS. You can still get hemlock through Quicklisp by loading one of hemlock.tty, hemlock.qt, or hemlock.clx, all provided by the hemlock project on gitorious.

To get this update, use (ql:update-dist "quicklisp").

Christophe Rhodesinteresting pretty-printer bug

· 48 days ago

One of SBCL's Google Summer of Code students, Krzysztof Drewniak (no relation) just got to merge in his development efforts, giving SBCL a far more complete set of Unicode operations.

Given that this was the merge of three months' out-of-tree work, it's not entirely surprising that there were some hiccups, and indeed we spent some time diagnosing and fixing a 1000-fold slowdown in char-downcase. Touch wood, all seems mostly well, except that Jan Moringen reported that, when building without the :sb-unicode feature (and hence having a Lisp with 8-bit characters) one of the printer consistency tests was resulting in an error.

Tracking this down was fun; it in fact had nothing in particular to do with the commit that first showed the symptom, but had been lying latent for a while and had simply never shown up in automated testing. I've expressed my admiration for the Common Lisp standard before, and I'll do it again: both as a user of the language and as an implementor, I think the Common Lisp standard is a well-executed document. But that doesn't stop it from having problems, and this is a neat one:

When a line break is inserted by any type of conditional newline, any blanks that immediately precede the conditional newline are omitted from the output and indentation is introduced at the beginning of the next line.

(from pprint-newline)

For the graphic standard characters, the character itself is always used for printing in #\ notation---even if the character also has a name[5].

(from CLHS

Space is defined to be graphic.

(from CLHS glossary entry for 'graphic')

What do these three requirements together imply? Imagine printing the list (#\a #\b #\c #\Space #\d #\e #\f) with a right-margin of 17:

(write-to-string '(#\a #\b #\c #\Space #\d #\e #\f) :pretty t :right-margin 17)
; => "(#\\a #\\b #\\c #\\
; #\\d #\\e #\\f)"

The #\Space character is defined to be graphic; therefore, it must print as #\ rather than #\Space; if it happens to be printed just before a conditional newline (such as, for example, generated by using pprint-fill to print a list), the pretty-printer will helpfully remove the space character that has just been printed before inserting the newline. This means that a #\Space character, printed at or near the right margin, will be read back as a #\Newline character.

It's interesting to see what other implementations do. CLISP 2.49 in its default mode always prints #\Space; in -ansi mode it prints #\ but preserves the space even before a conditional newline. CCL 1.10 similarly preserves the space; there's an explicit check in output-line-and-setup-for-next for an "escaped" space (and a comment that acknowledges that this is a heuristic that can be wrong in the other direction). I'm not sure what the best fix for this is; it's fairly clear that the requirements on the printer aren't totally consistent. For SBCL, I have merged a one-line change that makes the printer print using character names even for graphic characters, if the *print-readably* printer control variable is true; it may not be ideal that print/read round-tripping was broken in the normal case, but in the case where it's explicitly been asked for it is clearly wrong.

ABCL DevShort tip :: Booting quicklisp via SLIME

· 51 days ago
For the truly lazy, the following forms will quickly load Quicklisp:

   (require :abcl-contrib) (asdf:load-system :quicklisp-abcl)

drmeisterThings you can do with Clasp #1

· 54 days ago

Now that people are starting to build Clasp I thought I would say a few things about what you can do with it.

Clasp is an implementation of Common Lisp (80-90% complete) and I plan to make it an ANSI compliant Common Lisp as soon as possible. I also plan to add a faster compiler and expose some powerful C++ libraries to extend its capabilities.

But here are a few things you can do right away to get a sense of what Clasp does and to test some of its capabilities. Code that you type will (look-like-this). Don’t worry about how what you see isn’t colored or highlighted as shown here, I’m exploring different ways of rendering program output in this blog.

(defun add (x y) (+ x y))


(add 3 5)


(disassemble 'add)


define internal void @ADD({ {}*, i32 }* %result-ptr, { {}* }* %closed-af-ptr, i32 %num-varargs, {}* %farg0, {}* %farg1, {}* %farg2, i8* %va-list) {
  %lambda-args-42- = alloca { {}* }
  call void @newAFsp({ {}* }* %lambda-args-42-)
  %temp = alloca { {}* }
  call void @newTsp({ {}* }* %temp)
  %0 = alloca { {}* }
  call void @newTsp({ {}* }* %0)
  call void @makeValueFrame({ {}* }* %lambda-args-42-, i32 2, i32 2000058)
  call void @setParentOfActivationFrame({ {}* }* %lambda-args-42-, { {}* }* %closed-af-ptr)
  %correct-num-args = icmp eq i32 %num-varargs, 2
  br i1 %correct-num-args, label %\"(TRY-0).continue3\", label %\"(TRY-0).error\"

\"(TRY-0).error\":                                  ; preds = %\"(TRY-0).entry\"
  %enough-args = icmp slt i32 %num-varargs, 2
  br i1 %enough-args, label %\"(TRY-0).error1\", label %\"(TRY-0).continue\"

That gobbledy-gook is pure LLVM-IR code generated by the Clasp Common Lisp compiler and using the fantastic LLVM library. It is exactly the same intermediate language that the Clang compiler converts C++/C/Objective-C into before it lowers it to machine code. Clasp shares the same LLVM backend library with the Clang compiler and once Clasp generates LLVM-IR, that is as efficient as that generated by Clang, then Clasp will generate code that runs as fast as C++ and C!

For some more information on Common Lisp you can check out the following links or use Google. I haven’t tested the tutorial against Clasp but almost everything should work.

For a tutorial:

For beginners:

Land of Lisp: Learn to Program in Lisp, One Game at a Time!

Land of Lisp: Learn to Program in Lisp, One Game at a Time!

Buy from Amazon

I’ll post more later.

Hans Hübner

· 55 days ago

Berlin Lispers Meetup: Tuesday September 30th, 2014, 8.00pm

You are kindly invited to the next "Berlin Lispers Meetup", an informal gathering for anyone interested in Lisp, beer or coffee:

Berlin Lispers Meetup
Tuesday, September 30th, 2014
8 pm onwards

St Oberholz, Rosenthaler Straße 72, 10119 Berlin
U-Bahn Rosenthaler Platz

We will try to occupy a large table on the first floor, but in case you don't see us,
please contact Christian: 0157 87 05 16 14.

Please join for another evening of parentheses!

drmeisterHow Clasp compiles itself

· 58 days ago

This is the process that Clasp uses to compile itself from the repository. I wrote it stream-of-consciousness in response to a question on IRC and I thought I should document this for later.

Clasp starts up running with a slow S-expression walking interpreter.
It loads clasp/src/lisp/kernel/init.lsp. Within init.lsp is a list of modules that it loads to build everything. The list is in the variable *init-files* which contains a list of symbols that are translated to file pathnames by the system. The list also includes keywords like :base and :cmp – these denote way-points in the build process. Clasp then loads files from :start to :cmp into the interpreter. That loads in the Common Lisp code for the Clasp compiler. Then Clasp COMPILE-FILE(s) each source file from :start to :cmp and after compiling each file it loads the new bitcode file that is generated.
It’s slow to start but once the compiler starts compiling the files between :pre-cmp and :cmp it gets faster and faster as interpreted functions are replaced by compiled functions. Then it loads cmp/cmprepl (these files are all in src/lisp/kernel/lsp/* and src/lisp/kernel/cmp/*)
cmp/cmprepl installs a function that automatically compiles every form passed to EVAL before it evaluates it. Then it loads the files from :cmp to :min and then COMPILE-FILE(s) them. Then Clasp links all the bitcode files together with a bitcode file generated from src/llvmo/ So C++ code gets inlined into the compiled Lisp code. Then it generates min-boehm:image.bundle this is a minimal common lisp without CLOS. Then Clasp starts compiling the full-boehm version. It boots with the min-boehm:image.bundle and LOADs everything from :base to :all. Then it COMPILE-FILE(s) everything from :base to :all.
Then it links all of that together with and some epilogue code to start up the REPL.

Whew – that’s the build system.

drmeisterBuilding Clasp and externals-clasp

· 59 days ago

Hey folks,

— Update 18:00 EST Sept 29, 2014 —

Things are starting to stabilize and multiple people are getting Clasp built on OS X and Linux. The precise systems that Clasp has been built on are listed in the file on github.

We have a chatroom on IRC on freenode in #clasp. Drop in if you have some time.

On linux you need either (a) gcc 4.8 -or- (b) gcc 4.9 and clang 3.5. In the (a) case, the clang 3.6 compiler that comes with externals-clasp can be used. (Sorry, it’s a bit complicated).

These linux requirements are so specific because of changes that have been made to gcc 4.9 and llvm/clang that are outside of my control.

Thank you for being so brave and installing a new package like clasp in the first days after it was released.

You need to install externals-clasp first before installing clasp.

People are starting to post notifications about bugs in Clasp and that’s fantastic! Thank you. I’ll get to them as quickly as possible.

I love to hear peoples comments, feedback and suggestions regarding how to improve clasp.



Christophe Rhodescode walking for pipe sequencing

· 59 days ago

Since it seems still topical to talk about Lisp and code-transformation macros, here's another worked example - this time inspired by the enthusiasm for the R magrittr package.

The basic idea behind the magrittr package is, as Hadley said at EARL2014, to convert from a form of code where arguments to the same function are far apart to one where they're essentially close together; the example he presented was converting

      filter(babynames, name == "Hadley"),
    total = sum(n)


b0 <- babynames
b1 <- filter(b0, name == "Hadley")
b2 <- group_by(b1, year)
b3 <- summarise(b2, total = sum(n))
b4 <- arrange(b3, desc(year))

only without the danger of mistyping one of the variable names along the way and failing to perform the computation that was intended.

R, as I have said before, is a Lisp-1 with weird syntax and wacky evaluation semantics. One of the things that ordinary user code can do is inspect the syntactic form of its arguments, before evaluating them. This means that when looking at a fragment of code such as

foo(bar(2,3), 4)

where a call-by-value language would first evaluate bar(2,3), then call foo with two arguments (the value resulting from the evaluation, and 4), R instead uses a form of call-by-need evaluation, and also provides operators for inspecting the promise directly. This means R users can do such horrible things as

foo <- function(x) {
    tmp <- substitute(x)
    sgn <- 1
    while(class(tmp) == "(") {
        tmp <- tmp[[2]]
        sgn <- sgn * -1
    sgn * eval.parent(tmp)
foo(3) # 3
foo((3)) # -3
foo(((3))) # 3
foo((((3)))) # -3 (isn't this awesome?  I did say "wacky")

In the case of magrittr, the package authors have taken advantage of this to invent some new syntax; the pipe operator %>% is charged with inserting its first argument (its left-hand side, in normal operation) as the first argument to the call of its second argument (right-hand side). Hadley's example is

babynames %>%
  filter(name == "Hadley") %>%
  group_by(year) %>%
  summarise(total = sum(n)) %>%

and this is effective because the data flow in this case really is a pipeline: there's a dataset, which needs filtering, then grouping, then summarization, then sorting, and each operation works on the result of the previous. This already needs to inspect the syntactic form of the argument; an additional feature is recognizing the presence of .s in the call, and placing the left-hand side value in that argument position instead of as the first argument if it is present.

In Common Lisp, there are some piping or chaining operators out there (e.g. one two three (search for ablock) four and probably many others), and they do well enough. However! They mostly suffer from similar problems that we've seen before: doing code transformations with not quite enough understanding of the semantics of the code that they're transforming; again, that's fine for normal use, but for didactic purposes let's pretend that we really care about this.

The -> macro from is basically the same as the magrittr %>% operator: it converts symbols in the pipeline to function calls, and places the result of the previous evaluation as the first argument of the current operator, except if a $ is present in the arguments, in which case it replaces that. (This version doesn't support more than one $ in the argument list; it would be a little bit of a pain to support that, needing a temporary name, but it's straightforward in principle).

Since the -> macro does its job, a code-walker implementation isn't strictly necessary: pure syntactic manipulation is good enough, and if it's used with just the code it expects, it will do it well. It is of course possible to express what it does using a code-walker; we'll fix the multiple-$ 'bug' along the way, by explicitly introducing bindings rather than replacements of symbols:

(defmacro -> (form &body body)
  (labels ((find-$ (form env)
             (sb-walker:walk-form form env
              (lambda (f c e)
                  ((eql f '$) (return-from find-$ t))
                  ((eql f form) f)
                  (t (values f t)))))
           (walker (form context env)
               ((symbolp form) (list form))
               ((atom form) form)
               (t (if (find-$ form env)
                      (values `(setq $ ,form) t)
                      (values `(setq $ ,(list* (car form) '$ (cdr form))) t))))))
    `(let (($ ,form))
       ,@(mapcar (lambda (f) (sb-walker:walk-form f nil #'walker)) body))))

How to understand this implementation? Well, clearly, we need to understand what sb-walker:walk does. Broadly, it calls the walker function (its third argument) on successive evaluated subforms of the original form (and on variable names set by setq); the primary return value is used as the interim result of the walk, subject to further walking (macroexpansion and walking of its subforms) except if the second return value from the walker function is t.

Now, let's start with the find-$ local function: its job is to walk a form, and returns t if it finds a $ variable to be evaluated at toplevel and nil otherwise. It does that by returning t if the form it's given is $; otherwise, if the form it's given is the original form, we need to walk its subforms, so return f; otherwise, return its form argument f with a secondary value of t to inhibit further walking. This operation is slightly at odds with the use of a code walker: we are explicitly not taking advantage of the fact that it understands the semantics of the code it's walking. This might explain why the find-$ function itself looks a bit weird.

The walker local function is responsible for most of the code transformation. It binds $ to the value of the first form, then repeatedly sets $ to the value of successive forms, rewritten to interpolate a $ in the first argument position if there isn't one in the form already (as reported by find-$). If any of the forms is a symbol, it gets listified and subsequently re-walked. Thus

(macroexpand-1 '(-> "THREE" string-downcase (char 0)))
; => (LET (($ "THREE"))
;      (SETQ $ (CHAR $ 0))),
;    T

So far, so good. Now, what could we do with a code-walker that we can't without? Well, the above implementation of -> supports chaining simple function calls, so one answer is "chaining things that aren't just function calls". Another refinement is to support eliding the insertion of $ when there are any uses of $ in the form, not just as a bare argument. Looking at the second one first, since it's less controversial:

(defmacro -> (form &body body)
  (labels ((find-$ (form env)
             (sb-walker:walk-form form env
              (lambda (f c e)
                  ((and (eql f '$) (eql c :eval))
                   (return-from find-$ t))
                  (t f))))
           (walker (form context env)
               ((symbolp form) (list form))
               ((atom form) form)
               (t (if (find-$ form env)
                      (values `(setq $ ,form) t)
                      (values `(setq $ ,(list* (car form) '$ (cdr form))) t))))))
    `(let (($ ,form))
       ,@(mapcar (lambda (f) (sb-walker:walk-form f nil #'walker)) body))))

The only thing that's changed here is the definition of find-$, and in fact it's a little simpler: the task is now to walk the entire form and find uses of $ in an evaluated position, no matter how deep in the evaluation. Because this is a code-walker, this will correctly handle macros, backquotes, quoted symbols, and so on, and this allows code of the form

(macroexpand-1 '(-> "THREE" string-downcase (char 0) char-code (complex (1+ $) (1- $))))
; => (LET (($ "THREE"))
;      (SETQ $ (CHAR-CODE $))
;      (SETQ $ (COMPLEX (1+ $) (1- $)))),
;    T

which, as far as I can tell, is not supported in magrittr: doing 3 %>% complex(.+1,.-1) is met with the error that "object '.' not found". Supporting this might, of course, not be a good idea, but at least the code walker shows that it's possible.

What if we wanted to augment -> to handle binding forms, or special forms in general? This is probably beyond the call of duty, but let's just briefly imagine that we wanted to be able to support binding special variables around the individual calls in the chain; for example, we want

(-> 3 (let ((*random-state* (make-random-state))) rnorm) mean)

to expand to

(let (($ 3))
  (setq $ (let ((*random-state* (make-random-state))) (rnorm $)))
  (setq $ (mean $)))

and let us also say, to make it interesting, that uses of $ in the bindings clauses of the let should not count against inhibiting the insertion of $ in the first argument position of the first form in the body of the let, so

(-> 3 (let ((y (1+ $))) (atan y)))

should expand to

(let (($ 3)) (setq $ (let ((y (1+ $))) (atan $ y))))

So our code walker needs to walk the bindings of the let, merely collecting information into the walker's lexical environment, then walk the body performing the same rewrite as before. CHALLENGE ACCEPTED:

(defmacro -> (&body forms)
  (let ((rewrite t))
    (declare (special rewrite))
    (labels ((find-$ (form env)
               (sb-walker:walk-form form env
                (lambda (f c e)
                    ((and (eql f '$) (eql c :eval))
                     (return-from find-$ t))
                    (t f))))
             (walker (form context env)
               (declare (ignore context))
               (typecase form
                 (symbol (if rewrite (list form) form))
                 (atom form)
                 ((cons (member with-rewriting without-rewriting))
                  (let ((rewrite (eql (car form) 'with-rewriting)))
                    (declare (special rewrite))
                    (values (sb-walker:walk-form (cadr form) env #'walker) t)))
                 ((cons (member let let*))
                  (unless rewrite
                    (return-from walker form))
                  (let* ((body (member 'declare (cddr form)
                                       :key (lambda (x) (when (consp x) (car x))) :test-not #'eql))
                         (declares (ldiff (cddr form) body))
                         (rewritten (sb-walker:walk-form
                                          (,(car form) ,(cadr form)
                                     env #'walker)))
                    (values rewritten t)))
                  (unless rewrite
                    (return-from walker form))
                  (if (find-$ form env)
                      (values `(setq $ ,form) t)
                      (values `(setq $ ,(list* (car form) '$ (cdr form))) t))))))
      `(let (($ ,(car forms)))
         ,@(mapcar (lambda (f) (sb-walker:walk-form f nil #'walker)) (cdr forms))))))

Here, find-$ is unchanged from the previous version; all the new functionality is in walker. How does it work? The default branch of the walker function is also unchanged; what has changed is handling of let and let* forms. The main trick is to communicate information between successive calls to the walker function, and turn the rewriting on and off appropriately: we wrap parts of the form in new pseudo-special operators with-rewriting and without-rewriting, which is basically a tacky and restricted implementation of compiler-let - if we needed to, we could do a proper one with macrolet. Within the scope of a without-rewriting, walker doesn't do anything special, but merely return the form it was given, except if the form it's given is a with-rewriting form. This is a nice illustration, incidentally, of the idea that lexical scope in the code translates nicely to dynamic scope in the compiler; I can't remember where I read that first (but it's certainly not a new idea).

And now

(macroexpand '(-> 3 (let ((*random-state* (make-random-state))) rnorm) mean))
; => (LET (($ 3))
;        (SETQ $ (RNORM $)))
;      (SETQ $ (MEAN $))),
;    T
(macroexpand '(-> 3 (let ((y (1+ $))) (atan y))))
; => (LET (($ 3))
;      (LET ((Y (1+ $)))
;        (SETQ $ (ATAN $ Y)))),
;    T

Just to be clear: this post isn't advocating a smarter pipe operator; I don't have a clear enough view, but I doubt that the benefits of the smartness outweigh the complexity. It is demonstrating what can be done, in a reasonably controlled way, using a code-walker: ascribing semantics to fragments of Common Lisp code, and combining those fragments in a particular way, and of course it's another example of sb-walker:walk in use.

Finally, if something like this does in fact get used, people sometimes get tripped up by the package system: the special bits of syntax are symbols, and importing or package-qualifying -> without doing the corresponding thing to $ would lead to cryptic errors, wrong results and/or confusion. One possibility to handle that is to invent a bit more reader syntax:

(set-macro-character #\¦
 (defun pipe-reader (stream char)
   (let ((*readtable* (copy-readtable)))
     (set-macro-character #\·
      (lambda (stream char)
        (declare (ignore stream char))
        '$) t)
   (cons '-> (read-delimited-list char stream t)))) nil)
¦"THREE" string-downcase (find-if #'alpha-char-p ·) char-code¦

If this is the exported syntax, it has the advantage that the interface can only be misused intentionally: the actual macro and its anaphoric symbol are both hidden from the programmer; and the syntax is reasonably easy to type - on my keyboard ¦ is AltGr+| and · is AltGr+. - and moderately mnemonic from shell pipes and function notation respectively. It also has all the usual disadvantages of reader-based interfaces, such as composability, somewhat mitigated if pipe-reader is part of the macro's exported interface.

Gábor MelisMigration to github

· 60 days ago

Due to the bash security hole that keeps giving, I had to disable gitweb at and move all non-obsolete code over to github. This affects Six the Hex AI, the Planet Wars bot, MiCMaC, FSVD, Lassie and cl-libsvm.

drmeisterBinding C++ to Clasp

· 61 days ago

This document borrows heavily from the luabind documentation.

Clbind is a C++ template library that helps you create bindings between C++ and Clasp Common Lisp. It is very different from a typical FFI (foreign function interfaces) libraries. It allows you to expose functions, classes, methods, enums and other C++ features to Clasp. Clbind also provides the functionality to create classes in Common Lisp that subclass C++ classes and add CLOS slots to those derived classes. Common Lisp functions can override C++ methods.

Clbind is implemented using C++ template meta programming. It is heavily inspired by boost::python and luabind, two other C++ template libraries that provide interoperation between C++ and Python and Lua respectively. Clbind doesn’t require you to run extra preprocessing passes to compile your project (all of the work is done by the C++ compiler). Clbind doesn’t require you to know the exact signature of each function that you register because the C++ template library will determine this information at compile time.

Clbind closely follows the API used by luabind. I will more fully describe the Clbind API in the coming weeks.

For examples of how Clbind is used see:  clasp/src/asttooling/  or clasp/src/asttooling/

in the Clasp source code.

Joe MarshallA couple more homographic function tricks

· 62 days ago
A generalized continued fraction is an expression of the form:
You can partly apply a homographic function to a generalized continued fraction if you have a stream of the ai and bi
(define (partly-apply-general hf nums dens)
  (let ((a (first  hf))
        (b (second hf))
        (c (third  hf))
        (d (fourth hf)))
    (if (empty-stream? nums)
        (values (list a a
                      c c)
        (let ((num (head nums))
              (den (head dens)))
          (values (list (+ (* a den) (* b num)) a
                        (+ (* c den) (* d num)) c)
                  (tail nums)
                  (tail dens))))))

(define (print-hf-general hf nums dens)
  (call-with-values (lambda () (partly-evaluate hf))
    (lambda (term hf*)
      (if (not term)
          (call-with-values (lambda () (partly-apply-general hf nums dens))
            (display term) 
            ;; take reciprocal and multiply by 10
            (let ((a (first hf*))
                  (b (second hf*))
                  (c (third hf*))
                  (d (fourth hf*)))
              (print-hf-general (list (* c 10) (* d 10)
                                      a        b)
                           nums dens)))))))
For example, we can compute pi from this generalized continued fraction:
(print-hf-general (list 0 4 1 0)
              ;; [1 1 4 9 16 25 ...]
       (cons-stream 1
      (let squares ((i 1))
    (cons-stream (* i i) (squares (1+ i)))))
              ;; [1 3 5 7 9 11 ...]
       (let odds ((j 1)) 
     (cons-stream j (odds (+ j 2)))))

; Quit!
A bi-homographic function is a function of the form:
(define (bi-homographic a b c d e f g h)
  (lambda (x y)
    (/ (+ (* a x y) (* b x) (* c y) d)
       (+ (* e x y) (* f x) (* g y) h))))
Like a homographic function, you can partly evaluate a bi-homographic function and generate a continued fraction. You can also partly apply a bi-homographic function to a pair of continued fractions. When you do this, you have a choice of which continued fraction to be the object of the partial application. There's about twice as much nasty math involved, but the gist is that a bi-homographic function takes two continued fractions as arguments and produces one continued fraction as a result.

It turns out that addition, subtraction, multiplication and division are bi-homographic functions, so one can incrementally compute sums and products of continued fractions.

Gábor MelisHiggs Boson Machine Learning Challenge Bits and Pieces

· 62 days ago

The Higgs Boson contest on Kaggle has ended. Sticking to my word at ELS 2014, I released some code that came about during these long four months.

MGL-GPR is no longer a Genetic Programming only library because it got another Evolutionary Algorithm implementation: Differential Evolution. My original plan for this contest was to breed input features that the physicists in their insistence on comprehensibility overlooked, but it didn't work as well as I had hoped for reasons specific to this contest and also because evolutionary algorithms just do not scale to larger problem sizes.

In other news, MGL got cross-validation, bagging and stratification support in the brand new MGL-RESAMPLE package documented with MGL-PAX which all of you will most definitely want to use. My winning submission used bagged cross-validated dropout neural networks with stratified splits so this is where it's coming from.

MGL itself and MGL-MAT were updated to work with the latest CL-CUDA. The neural network code also saw some additions such as ->MAX-CHANNEL activation (which originated as LWTA) and also gaussian multiplicative noise. The next steps here are further cleanups to MGL, writing documentation and moving it to github. Also, there is some hope that one day CL-CUDA can be included in quicklisp allowing my stuff there to be updated to their latest versions.

The code for this contest is available at higgsml which from now on doubles as my skeleton for lisp projects that need to be delivered as source and as binary. It sucks in all dependencies from quicklisp available at a certain date, clones the necessary repositories not available in quicklisp, builds an executable, and has a simple 'make dist' rule as well.

There is also a fairly generic ensembling algorithm that I will factor out of the code later. newsServer migration completed

· 62 days ago

Thanks to the hard work of everyone in the team we have successfully (it seems) migrated to a new machine.

While a lot of care went into ensuring that all services could be preserved, it is quite unlikely that we didn’t miss something. Please let us know if you find something that used to work but doesn’t anymore by writing to clo-devel(at)

For older items, see the Planet Lisp Archives.

Last updated: 2014-11-18 02:44