Planet Lisp

McCLIM: McCLIM 0.9.7 "Imbolc" release

· 2 days ago

After 10 years we have decided that it is time to make a new release - the first one since 2008, when McCLIM 0.9.6 "St. George's Day" was released. Imbolc is a Gaelic traditional festival marking the beginning of spring, held between the winter solstice and the spring equinox.

Given the long stretch of time, the number of changes is too big to list in full detail, so we will note only the major changes made during the last eleven iterations (though many important changes were made before that). For more information please check out the previous iteration reports on the McCLIM blog, the git log and the issue tracker. We'd like to thank all present and past contributors for their time, support and testing.

  • Bug fix: tab-layout fixes.
  • Bug fix: formatting-table fixes.
  • Bug fix: scrolling and viewport fixes and refactor.
  • Feature: raster image draw backend extension.
  • Feature: bezier curves extension.
  • Feature: new tests and demos in clim-examples.
  • Feature: truetype rendering is now default on clx.
  • Feature: additions to region, clipping rectangles and drawing.
  • Feature: clim-debugger and clim-listener improvements.
  • Feature: mop is now done with CLOSER-MOP.
  • Feature: threading is now done with BORDEAUX-THREADS.
  • Feature: clx-fb backend (poc of framebuffer-based backend).
  • Feature: assumption that all panes must be mirrored has been removed.
  • Cleanup: many files cleaned up from style warnings and such.
  • Cleanup: removal of PIXIE.
  • Cleanup: removal of CLIM-FFI package.
  • Cleanup: changes to directory structure and asd definitions.
  • Cleanup: numerous manual additions and corrections.
  • Cleanup: broken backends have been removed.
  • Cleanup: goatee has been removed in favour of Drei.
  • Cleanup: all methods now have corresponding generic function declarations.

We also have a bounty program financed with money from the fundraiser. We are grateful for financial contributions which allow us to attract new developers and reward old ones with bounties. Currently active bounties (worth $2650) are available here.

As Imbolc marks the beginning of spring, we hope this release will be one of many to come.

Nicolas Hafner: Shader Pipelines and Effects - Gamedev

· 10 days ago

In the last entry I talked about Trial's way of handling shaders and how they are integrated into the object system to allow inheritance and composition. This time we'll take a look at how that system is extended to allow entire pipelines of effects and rendering systems.

Back in the olden days you could get away with a single pass to render everything. In fact, the fixed-function pipeline of GL 2.1 works that way, as do many simple rendering systems like primitive forward rendering, or most 2D games. However, more complex systems and effects require multiple passes. For instance, a deferred rendering system requires at least two passes, one that emits a "gbuffer", and another that combines the information from it to produce the final image. Another example would be a "god-rays" effect, which requires rendering the scene with all objects black, applying a radial blur to that, and then finally blending it with a rendering of the scene as normal.

These kinds of pipelines can become quite complex, especially when you consider that each pipeline might require multiple kinds of textures to use as buffers, inputs, and outputs. The management and allocation of these resources is a pain to do manually, so Trial includes a system to reduce the process to a graph definition. This allows you to write complicated pipelines without having to worry about any of the underlying buffers. The pipeline part is only one half of the equation, but we'll get to the rest later.

To do the rendering, a stage will require a set of textures to output to, and a set of textures as inputs to the stage. A stage might offer additional parameters aside from that to tweak its behaviour, but those do not require allocation, so we can disregard them for now.

To give a bit of context I'll explain how these textures are attached to the shaders in OpenGL. I've glossed over that in the previous entry as it wasn't directly relevant. Let's consider a simple fragment shader that simply copies the colour from an input texture to an output texture:

uniform sampler2D previous; // "input" is a reserved word in GLSL
out vec4 color;

void main(){
  // texelFetch expects integer texel coordinates and a mipmap level
  color = texelFetch(previous, ivec2(int(gl_FragCoord.x), int(gl_FragCoord.y)), 0);
}

Every GLSL file begins with a set of variable declarations. Each of these global variables has a storage qualifier that tells OpenGL where it comes from. uniforms are parameters that can be set from the CPU before the rendering is performed and are the same for all of the stages in the rendering. outs are variables that are sent to the next stage. The last important qualifier that isn't mentioned above is in, which takes variables from a previous stage. Every file must also contain a main function that acts as the entry point for the computation. In this case we perform a texture read at the current fragment position and store it into the output color - effectively copying it.

The way in which outputs and inputs differ should become apparent here. Inputs are declared as uniforms, which need to be set by the application before each rendering call. The outputs on the other hand are part of the framebuffer target, which is much more implicit. A framebuffer is a special kind of object that points to a set of textures to which the rendering is output. The default framebuffer will draw to your window so that things can actually be seen. This discrepancy between inputs and outputs is a bit of a pain, but it is simply a consequence of how OpenGL evolved. If I had to design a graphics API like this today, I would definitely treat input and output buffers in a much more similar manner.
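
To make the framebuffer side more concrete, here is roughly what creating a render target looks like through cl-opengl. This is a generic sketch, not Trial's code; make-render-target is a name invented for illustration.

(defun make-render-target (width height)
  "Create a framebuffer with one colour texture attached as its output."
  (let ((texture (first (gl:gen-textures 1)))
        (framebuffer (first (gl:gen-framebuffers 1))))
    ;; Allocate storage for the texture we will render into.
    (gl:bind-texture :texture-2d texture)
    (gl:tex-image-2d :texture-2d 0 :rgba width height 0 :rgba :unsigned-byte
                     (cffi:null-pointer))
    (gl:tex-parameter :texture-2d :texture-min-filter :linear)
    (gl:tex-parameter :texture-2d :texture-mag-filter :linear)
    ;; Point the framebuffer's first colour attachment at the texture.
    (gl:bind-framebuffer :framebuffer framebuffer)
    (gl:framebuffer-texture-2d :framebuffer :color-attachment0
                               :texture-2d texture 0)
    ;; Rebind the default framebuffer so drawing goes to the window again.
    (gl:bind-framebuffer :framebuffer 0)
    (values framebuffer texture)))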

Whatever the case may be, we have the ability to fix this in Trial by giving the user a uniform view over both inputs and outputs to a shader stage. For instance, it would be nice if we could simply say something like the following:

(define-shader-pass renderer ()
  ((:output color)))

(define-shader-pass blur ()
  ((:input previous)
   (:output color)))

Then when we need to define an actual shader pipeline we could just do something like this:

(let ((pipeline (make-pipeline))
      (renderer (make-instance 'renderer))
      (blur (make-instance 'blur)))
  (connect (output color renderer) (input previous blur) pipeline)
  (pack pipeline))

What you see above is almost exactly what you can do in Trial. The syntax is a bit different, but the concept is much the same. The pack step analyses the graph formed by the pipeline and automatically allocates the minimal number of texture buffers needed, re-using them where possible. This is especially useful if you have a long chain of post-effects.
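
To illustrate the idea behind pack, here is a minimal sketch of such an allocator (hypothetical names, much simpler than Trial's actual implementation): walk the passes in dependency order, hand each output the lowest free buffer, and return a buffer to the free pool after its last reader has run.

(defun allocate-buffers (passes)
  "PASSES is a list of (name inputs outputs) in topological order, where
each input names an earlier pass's output as (pass-name . output-name).
Returns a hash table mapping (pass-name . output-name) to a buffer index."
  (let ((allocation (make-hash-table :test #'equal))
        (last-use (make-hash-table :test #'equal))
        (free '())
        (next 0))
    ;; Record, for every output, the last pass that reads it.
    (loop for (name inputs) in passes
          do (dolist (in inputs) (setf (gethash in last-use) name)))
    (loop for (name inputs outputs) in passes
          do ;; Give every output of this pass the lowest free buffer.
             (dolist (out outputs)
               (setf (gethash (cons name out) allocation)
                     (or (pop free) (prog1 next (incf next)))))
             ;; Buffers whose last reader was this pass become free again.
             (dolist (in inputs)
               (when (eq (gethash in last-use) name)
                 (push (gethash in allocation) free))))
    allocation))

For the renderer/blur pipeline above this allocates two buffers in total, and a longer chain of blur-like passes would keep cycling those same two.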

As the texture allocation system is implemented right now, it's unfortunately pretty suboptimal. There are some severe limitations that I need to fix, but I have a good idea of how to do that, so I'll probably get around to it soon. Still, for most of my work so far the current approach has sufficed, and the general idea is clear.

With the pipeline and allocation system in place, one half of the equation is solved. The other half requires a bit more ingenuity. So far the passes we've looked at did one of two things: either simply render the scene to a buffer, or perform some kind of buffer to buffer transformation like blurring. However, most actually interesting shaders will want to do more complex things than that - they'll want to not only render objects, but also influence how they are rendered.

But, hold on, don't the objects themselves decide how they are rendered? And now the pass should suddenly decide what's going on? Well, yes and yes. Some passes will want to completely take over how every object is drawn, some will only want to influence certain aspects of how objects are drawn, and some will not care how the objects are drawn at all. Allowing all of this is the second part of the shader pipeline protocol. As you know, modelling protocols with CLOS is my favourite thing to do, so strap in.

(define-shader-entity shader-pass () ())

(defgeneric register-object-for-pass (pass object))
(defgeneric shader-program-for-pass (pass object))
(defgeneric make-pass-shader-program (pass class))
(defgeneric coerce-pass-shader (pass class type spec))

The above class definition is not quite accurate, but for simplicity's sake I've shortened it to something that will be clear enough for our purposes here; see the previous entry for the background. Each shader pass is a shader entity, and as such allows you to attach shader code to the class definition.

The first function, register-object-for-pass, is used to notify a shader pass about an object that would like to draw in the current scene. This is necessary for any shader pass that would like to influence the drawing of an object, or leave it up to the object entirely. In both of those cases, the pass will compute a new effective shader for the object and compile that into a shader program to use. This program can then be retrieved for the object with shader-program-for-pass. Thus, whenever an object is painted onto a stage, it can use this function to retrieve the shader program and set the necessary uniforms with it. Passes that want to supply their own rendering method entirely can simply ignore these two functions.
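
A plausible default implementation of these two functions looks something like this (a sketch, not Trial's actual code; pass-programs is a hypothetical accessor for a hash table stored on the pass):

(defmethod register-object-for-pass ((pass shader-pass) object)
  ;; Cache one program per effective class, so that many instances of the
  ;; same class share a single compiled program.
  (let ((class (determine-effective-shader-class object)))
    (unless (gethash class (pass-programs pass))
      (setf (gethash class (pass-programs pass))
            (make-pass-shader-program pass class)))))

(defmethod shader-program-for-pass ((pass shader-pass) object)
  ;; Look up whatever register-object-for-pass computed earlier.
  (gethash (determine-effective-shader-class object) (pass-programs pass)))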

The make-pass-shader-program function is responsible for combining the shader programs from the given class and pass, and for producing a useful shader program resource object as a result. By default this function simply does some bookkeeping and calls coerce-pass-shader to do the actual combination. coerce-pass-shader in turn merely retrieves the corresponding shader code from the pass and once again uses glsl-toolkit's merge-shader-sources to combine it with the existing shader.
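
Conceptually the default behaviour boils down to something like the following sketch, where shader-source-for-type is a hypothetical reader for the pass's stored shader code of a given stage:

(defmethod coerce-pass-shader ((pass shader-pass) class type spec)
  ;; SPEC is the object's shader source for stage TYPE; append the pass's
  ;; own source for that stage and merge the two into one valid shader.
  (let ((pass-source (shader-source-for-type pass type)))
    (if pass-source
        (glsl-toolkit:merge-shader-sources (list spec pass-source))
        spec)))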

That might be a bit much to put together in your head, so let's look at an example. We'll have an object that simply draws itself textured, and a shader pass that should apply a sort of fade out effect where things farther away become more transparent.

(define-shader-entity cube (vertex-entity textured-entity) ())

(define-shader-pass fade (render-pass) ())

(define-class-shader (fade :fragment-shader)
  "out vec4 color;
  
void main(){
  color.a *= (1 - gl_FragCoord.z);
}")

(defmethod setup-scene ((main main) scene)
  (enter (make-instance 'fade) scene)
  (enter (make-instance 'cube) scene))

Similar to how we did it in the previous entry, we define a new entity that is composed of vertices and has textured surfaces. I omitted some details like what vertex data and textures to use to make it simpler. Then we define a new shader pass, which is based on the existing render-pass that is a per-object-pass with a default color and depth output. Next we have to add the shader logic to our pass. Doing what we want is actually quite simple - we simply multiply the colour's alpha value with the depth value. The depth needs to be inverted, as by default 1 is far away and 0 is close by. Thus, with this shader, the farther away the fragment is, the more transparent it will now become.

Finally we define a method that hooks into the rest of Trial and establishes a scene to be drawn. We first enter a new instance of our fade pass into it, which will automatically get added to the underlying shader pipeline. We then add an instance of the cube to it, at which point several things will happen:

  1. The cube is added to the scene graph.
  2. register-object-for-pass is called on the pass and cube instances.
  3. This will call determine-effective-shader-class on the cube. This function is used as a caching strategy to avoid duplicating shaders for classes that don't change anything from their superclass.
  4. If the determined class is not yet registered, make-pass-shader-program is called.
  5. make-pass-shader-program computes the effective shader definitions for each shader type.
  6. coerce-pass-shader is called for each type of shader definition that is used, and combines the object's shader source with that of the pass.
  7. The resulting shader program resource object is returned and stored in the pass.

The fragment shader produced by coerce-pass-shader will look something like this:

in vec2 texcoord;
out vec4 color;
uniform sampler2D texture_image;

void _GLSLTK_main_1(){
  color *= texture(texture_image, texcoord);
}

void _GLSLTK_main_2(){
  color.a *= (1 - gl_FragCoord.z);
}

void main(){
  _GLSLTK_main_1();
  _GLSLTK_main_2();
}

Again, this neatly combines the two effects into one source file that can be transparently used in the system. Once it comes to actually drawing the scene, something like the following course of events is going to take place:

  1. paint is called on the scene with the pipeline.
  2. paint-with is called on the fade pass with the scene.
  3. paint is called on the cube with the fade pass.
  4. shader-program-for-pass is called on the fade pass and cube. The previously cached program is retrieved and returned.
  5. The textured-entity part of the cube uses the shader program to set the texture uniform.
  6. The vertex-entity part of the cube uses the shader program to set the transformation matrices.
  7. The vertex-entity part of the cube calls the OpenGL vertex drawing function to finally run the rendering process.
  8. The color output from the fade pass is blitted onto the window for the user to marvel at.

And that is pretty much the entire process. Obviously some steps will be repeated as needed if you add more objects and passes. The pipeline system ensures that the passes are run in the correct order to satisfy dependencies. In case you're wondering about the paint and paint-with inversion, the latter function is useful so that the shader pass gets to have absolute control over what is rendered and how if it so desires. For instance, it could completely ignore the scene and draw something else entirely.

That about concludes the explanation of Trial's shader system as it stands today. As I've mentioned over these two entries there are a few minor places where improvements have to be made, but overall the design seems quite solid. At least I hope I won't suddenly realise a severe limitation somewhere down the line, forcing me to rethink and rewrite everything again.

Next I think I will take a bit of a break from talking about engine details, and instead talk about geometry clipmaps and the adventure I had trying to implement them. Until then~

Quicklisp news: Quicklisp implementation stats for 2017+

· 10 days ago
Here are some raw HTTP request stats for each CL implementation supported by Quicklisp from 2017-01-01 onward. Each request corresponds roughly to downloading a single project.
  • SBCL: 4,518,774 requests
  • Clozure CL: 488,057 
  • CLISP: 34,767
  • LispWorks: 25,700
  • ABCL: 23,913 
  • Allegro: 19,501
  • ECL: 19,027
  • CLASP: 3,335
  • CMUCL: 965
  • MKCL: 79
  • Scieneer: 0
I gathered this info to check Scieneer activity levels to plan future support. Turns out nobody with Scieneer has used Quicklisp since 2013. Bummer!

TurtleWare: cl-charms crash course

· 14 days ago

Writing a new CLIM backend

This work is meant as a showcase of how to write a new McCLIM backend. To make it more interesting to me, I'm writing it using the cl-charms library, a Common Lisp interface to ncurses, a console manipulation library for UNIX systems. During development I'm planning to make notes about the necessary steps. If possible I'll also write a test suite for backends which will test the functionality from the most basic parts (like creating windows) to more sophisticated ones (transformations and drawing). That should make it easier to verify whether a new backend works correctly and to what degree it is complete. We start with a crash course for the cl-charms library.

cl-charms crash course

Ensure you have the ncurses development package installed on your system. Start a real terminal (Emacs doesn't start *inferior-lisp* in something ncurses can work with) and launch your implementation. I use CCL because software is usually developed with SBCL and I want to catch as many problems with cl-charms as possible. After that, start a swank server and connect from your Emacs session.

~ ccl
? (defvar *console-io* *terminal-io*) ; we will need it later
*CONSOLE-IO*
? (ql:quickload 'swank :silent t)
(SWANK)
? (swank:create-server :port 4005 :dont-close t)
;; Swank started at port: 4005.
4005
? (loop (sleep 1))

We loop over sleep because we don't want the console prompt to steal lines meant for the application. If you don't do that you may see somewhat confusing behavior.

In Emacs: M-x slime-connect *Host:* localhost *Port:* 4005. Now we are working in the *slime-repl ccl* buffer in Emacs, and we have ncurses output in the terminal we launched the server from. Try some demos bundled with the library:

CL-USER> (ql:quickload '(cl-charms bordeaux-threads alexandria))
(CL-CHARMS BORDEAUX-THREADS ALEXANDRIA)
CL-USER> (ql:quickload '(cl-charms-paint cl-charms-timer) :silent t)
(CL-CHARMS-PAINT CL-CHARMS-TIMER)
CL-USER> (charms-timer:main) ; quit with Q, start/stop/reset with [SPACE]
CL-USER> (charms-paint:main) ; quit with Q, move with WSAD and paint with [SPACE]

Now we will go through various charms (and ncurses) capabilities. Our final goal is to have a window with four buttons and a text input box. Navigation should be possible with [TAB] / [SHIFT]+[TAB] and by selecting gadgets with the mouse pointer. Behold, time for the first application.

First application

Let's dissect this simple program which prints "Hello world!" on the screen:

(defun hello-world ()
  (charms:with-curses ()
    (charms:disable-echoing)
    (charms:enable-raw-input)
    (loop named hello-world
       with window = (charms:make-window 50 15 10 10)
       do (progn
            (charms:clear-window window)
            (charms:write-string-at-point window "Hello world!" 0 0)
            (charms:refresh-window window)

            ;; Process input
            (when (eql (charms:get-char window) #\q)
              (return-from hello-world))
            (sleep 0.1)))))

The program must be wrapped in the charms:with-curses macro, which ensures proper initialization and finalization. In this operator's context, the charms functions which configure the library are available. We use charms:disable-echoing to prevent unnecessary obfuscation of the window (we interpret characters ourselves) and (charms:enable-raw-input) to turn off line buffering. charms:*standard-window* is a window covering the whole terminal screen.

We create a window for output (its size is 50x15 and its offset is 10x10) and then in a loop we print "Hello world!" (at its top-left corner) until the user presses the character q.

Extending cl-charms API

All functions used until now come from the higher-level interface. charms also has a low-level interface which maps to libncurses via CFFI. This interface is defined in the package named charms/ll. I highly recommend skimming through http://www.tldp.org/HOWTO/NCURSES-Programming-HOWTO which is a great overview of ncurses functionality.

We want borders around the window. The CFFI interface is a bit ugly (i.e. we would have to extract a window pointer to call wborder on it). We are going to abstract this with a function which plays nicely with the lispy abstraction.

(defun draw-window-border (window
                           &optional
                             (ls #\|) (rs #\|) (ts #\-) (bs #\-)
                             (tl #\+) (tr #\+) (bl #\+) (br #\+))
  (apply #'charms/ll:wborder (charms::window-pointer window)
         (mapcar #'char-code (list ls rs ts bs tl tr bl br))))

(defun draw-window-box (window &optional (verch #\|) (horch #\-))
  (charms/ll:box (charms::window-pointer window) (char-code verch) (char-code horch)))

Now we can freely use draw-window-border. Put (draw-window-box window) after (charms:clear-window window) in the hello-world program and see the result. It is ugly, but what did you expect from a window rendered in the terminal?

It is worth mentioning that the border is drawn inside the window, so when we start writing a string at point [0,0] it overlaps with the border. If we want to paint content inside the border we should start at least at [1,1] and stop at [48,13].
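
For instance, a boxed variant of our program (the name is ours) keeps the text inside the border:

(defun boxed-hello-world ()
  (charms:with-curses ()
    (charms:disable-echoing)
    (charms:enable-raw-input)
    (loop named hello-world
       with window = (charms:make-window 50 15 10 10)
       do (progn
            (charms:clear-window window)
            (draw-window-box window)
            ;; [1,1] is the first cell inside the border.
            (charms:write-string-at-point window "Hello world!" 1 1)
            (charms:refresh-window window)
            (when (eql (charms:get-char window) #\q)
              (return-from hello-world))
            (sleep 0.1)))))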

A somewhat more appealing result may be achieved by giving the window a distinct background instead of drawing a border with characters. To do that we need to dive into the low-level interface once more and define a colors API.

(defun start-color ()
  (when (eql (charms/ll:has-colors) charms/ll:FALSE)
    (error "Your terminal does not support color."))
  (let ((ret-code (charms/ll:start-color)))
    (if (= ret-code 0)
        T
        (error "start-color error ~s." ret-code))))

(eval-when (:load-toplevel :compile-toplevel :execute)
  (defconstant +black+   charms/ll:COLOR_BLACK)
  (defconstant +red+     charms/ll:COLOR_RED)
  (defconstant +green+   charms/ll:COLOR_GREEN)
  (defconstant +yellow+  charms/ll:COLOR_YELLOW)
  (defconstant +blue+    charms/ll:COLOR_BLUE)
  (defconstant +magenta+ charms/ll:COLOR_MAGENTA)
  (defconstant +cyan+    charms/ll:COLOR_CYAN)
  (defconstant +white+   charms/ll:COLOR_WHITE))

(defmacro define-color-pair ((name pair) foreground background)
  `(progn
     (start-color)
     (defparameter ,name (progn (charms/ll:init-pair ,pair ,foreground ,background)
                                (charms/ll:color-pair ,pair)))))

(define-color-pair (+white/blue+ 1) +white+ +blue+)
(define-color-pair (+black/red+ 2) +black+ +red+)

(defun draw-window-background (window color-pair)
  (charms/ll:wbkgd (charms::window-pointer window) color-pair))

(defmacro with-colors ((window color-pair) &body body)
  (let ((winptr (gensym)))
    (alexandria:once-only (color-pair)
      `(let ((,winptr (charms::window-pointer ,window)))
         (charms/ll:wattron ,winptr ,color-pair)
         ,@body
         (charms/ll:wattroff ,winptr ,color-pair)))))

start-color must be called when we configure the library. We map charms/ll constants to lisp constants and create the define-color-pair macro. This abstraction could be improved, so that we are not forced to supply pair numbers, by maintaining a proper association between names and integers - we skip that step for brevity (though see the sketch after the next example). We define two color pairs, a function for filling a window background, and a with-colors macro for drawing with a specified palette. Finally, let's use the new abstraction in a pretty-hello-world function:

(defun pretty-hello-world ()
  (charms:with-curses ()
    (charms:disable-echoing)
    (charms:enable-raw-input)
    (start-color)
    (loop named hello-world
       with window = (charms:make-window 50 15 10 10)
       do (progn
            (charms:clear-window window)
            (draw-window-background window +white/blue+)
            (with-colors (window +white/blue+)
              (charms:write-string-at-point window "Hello world!" 0 0))
            (with-colors (window +black/red+)
              (charms:write-string-at-point window "Hello world!" 0 1))
            (charms:refresh-window window)

            ;; Process input
            (when (eql (charms:get-char window :ignore-error t) #\q)
              (return-from hello-world))
            (sleep 0.1)))))

The result looks, as promised by the function name, very pretty ;-)
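
As an aside, the promised improvement to define-color-pair could be as simple as allocating pair numbers from a counter - a minimal sketch (the names are ours):

(defvar *next-color-pair* 10
  "Next free ncurses pair number, kept clear of the hand-numbered pairs.")

(defmacro define-auto-color-pair (name foreground background)
  "Like define-color-pair, but allocates the pair number automatically."
  `(progn
     (start-color)
     (defparameter ,name
       (let ((pair *next-color-pair*))
         (incf *next-color-pair*)
         (charms/ll:init-pair pair ,foreground ,background)
         (charms/ll:color-pair pair)))))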

Asynchronous input

Printing Hello world! doesn't satisfy our needs - we want to interact with the brilliant software we've just made while it's running. Even more, we want to do it without blocking the computations going on in the system (which are truly amazing, believe me). First let's visualise these computations to know that they are really happening. Modify the program loop to draw Hello world! in a different color on each iteration.

(defun amazing-hello-world ()
  (charms:with-curses ()
    (charms:disable-echoing)
    (charms:enable-raw-input)
    (start-color)
    (loop named hello-world
       with window = (charms:make-window 50 15 10 10)
       for flip-flop = (not flip-flop)
       do (progn
            (charms:clear-window window)
            (draw-window-background window +white/blue+)
            (with-colors (window (if flip-flop
                                     +white/blue+
                                     +black/red+))
              (charms:write-string-at-point window "Hello world!" 0 0))
            (charms:refresh-window window)
            ;; Process input
            (when (eql (charms:get-char window :ignore-error t) #\q)
              (return-from hello-world))
            (sleep 1)))))

Something is not right. When we run amazing-hello-world to see it flipping - it doesn't. Our program is flawed: it waits for each character to verify that the user hasn't requested application exit. You must press a key (e.g. space) for it to proceed to the next iteration. Now we must think of how to obtain input from the user without halting the application.

To do that we can enable non-blocking mode for our window.

with window = (let ((win (charms:make-window 50 15 10 10)))
                (charms:enable-non-blocking-mode win)
                win)

Unfortunately this solution is not complete. It works reasonably well, but we have to wait up to a second before a character is handled (because the "computation" is performed every second - we call sleep after each get-char). It gets even worse when we notice that pressing the character b five times and then q will delay processing by six seconds (characters are processed one after another in separate iterations, with a one-second sleep between them). We need something better.

I hear your internal scream: use threads! It is important to keep in mind that if you can get by without threads, you probably should (the same applies to caches and many other clever techniques which introduce even cleverer bugs). Keep in mind also that ncurses is not thread-safe. We are going to listen for events from all inputs like select does, and generate a "recompute" event each second. On implementations which support timers we could use them, but we'll use... a thread to generate "ticks". Note that we use the thread as an asynchronous input rather than for asynchronous charms access.

;;; asynchronous input hack (should be a mailbox!)
(defparameter *recompute-flag* nil "ugly and unsafe hack for communication")
(defvar *recompute-thread* nil)

(defun start-recompute-thread ()
  (when *recompute-thread*
    (bt:destroy-thread *recompute-thread*))
  (setf *recompute-thread*
        (bt:make-thread
         #'(lambda ()
             (loop
                (sleep 1)
                (setf *recompute-flag* t))))))

(defun stop-recompute-thread ()
  (when *recompute-thread*
    (bt:destroy-thread *recompute-thread*)
    (setf *recompute-thread* nil)))

In this snippet we create an interface to start a thread which sets a global flag. A general solution would be a mailbox (or a thread-safe stream) which the asynchronous thread writes to and the event loop reads from - a sketch follows below. We will settle for this hack though (it is a crash course, not a book, after all). Start the recompute thread in the background before you start the new application. Note that this code is not thread-safe - we concurrently read and write a global variable. We are also very drastic with bt:destroy-thread, something not recommended in any code which is not a demonstration like this one.
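
For reference, a minimal mailbox along those lines could be built from a lock and a list (a sketch using bordeaux-threads; a production version would add a condition variable instead of making the reader poll):

(defclass mailbox ()
  ((lock :initform (bt:make-lock) :reader mailbox-lock)
   (messages :initform '() :accessor mailbox-messages)))

(defun mailbox-post (mailbox message)
  "Called from the producer thread (e.g. our ticker)."
  (bt:with-lock-held ((mailbox-lock mailbox))
    (push message (mailbox-messages mailbox))))

(defun mailbox-receive (mailbox)
  "Called from the event loop. Returns the oldest message, or NIL."
  (bt:with-lock-held ((mailbox-lock mailbox))
    (let ((oldest (first (last (mailbox-messages mailbox)))))
      (setf (mailbox-messages mailbox)
            (butlast (mailbox-messages mailbox)))
      oldest)))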

Time to refactor input and output functions: display-amazing-hello-world and get-amazing-hello-world-input.

(defun display-amazing-hello-world (window flip-flop)
  (charms:clear-window window)
  (draw-window-background window +white/blue+)
  (with-colors (window (if flip-flop
                           +white/blue+
                           +black/red+))
    (charms:write-string-at-point window "Hello world!" 0 0))
  (charms:refresh-window window))

(defun get-amazing-hello-world-input (window)
  (when *recompute-flag*
    (setf *recompute-flag* nil)
    (return-from get-amazing-hello-world-input :compute))
  (charms:get-char window :ignore-error t))

And finally improved application which takes asynchronous input without blocking.

(defun improved-amazing-hello-world ()
  (charms:with-curses ()
    (charms:disable-echoing)
    (charms:enable-raw-input)
    (start-color)
    (let ((window (charms:make-window 50 15 10 10))
          (flip-flop nil))
      (charms:enable-non-blocking-mode window)
      (display-amazing-hello-world window flip-flop)
      (loop named hello-world
         do (case (get-amazing-hello-world-input window)
              ((#\q #\Q) (return-from hello-world))
              (:compute (setf flip-flop (not flip-flop))
                        (display-amazing-hello-world window flip-flop))
              ;; don't be a pig to a processor
              (otherwise (sleep 1/60)))))))

When you are done with the demo you may call stop-recompute-thread to spare your image the unnecessary flipping of a global variable.

Gadgets and input handling

So we have created an amazing piece of software which does its computation and reacts (instantaneously!) to our input. Greed is an amazing phenomenon - we want more... We want an interactive application with buttons and an input box (allowing us to influence the amazing computation at run time).

First we define an abstract class gadget.

(defparameter *active-gadget* nil)

;;; gadget should be type of `window' or `panel' - we are simplistic
(defclass gadget ()
  ((position :initarg :position :accessor gadget-position)))

(defgeneric display-gadget (window gadget &key &allow-other-keys)
  (:method ((window charms:window) (gadget gadget) &key)
    (declare (ignore window gadget))))

(defgeneric handle-input (gadget input &key &allow-other-keys)
  (:method (gadget input &key)
    (declare (ignore gadget input))))

In our model each gadget has at least a position, a display function and a method for handling input. Both methods are gadget-specific, with defaults that do nothing. We also define a parameter *active-gadget* which holds the gadget currently receiving input.

handle-input returns T only if the input causes the gadget to need redisplaying; otherwise it should return NIL (this is a small optimization which we will use later in the main application loop).

Let's define something we will use in our application.

(define-color-pair (+black/white+ 3) +black+ +white+) ; color for text-input (inactive)
(define-color-pair (+black/cyan+ 4) +black+ +cyan+)   ; color for text-input (active)
(defparameter *computation-name* "Hello world!")

(defclass text-input-gadget (gadget)
  ((buffer :initarg :buffer :accessor gadget-buffer)
   (width :initarg :width :reader gadget-width)))

(defun make-text-input-gadget (width x y)
  (make-instance 'text-input-gadget
                 :width width
                 :position (cons x y)
                 :buffer (make-array width
                                     :element-type 'character
                                     :initial-element #\space
                                     :fill-pointer 0)))

(defmethod display-gadget ((window charms:window) (gadget text-input-gadget) &key)
  (with-colors (window (if (eql gadget *active-gadget*)
                           +black/cyan+
                           +black/white+))
    (let ((background (make-string (gadget-width gadget) :initial-element #\space)))
      (destructuring-bind (x . y) (gadget-position gadget)
        (charms:write-string-at-point window background x y)
        (charms:write-string-at-point window (gadget-buffer gadget) x y)))))

(defmethod handle-input ((gadget text-input-gadget) input &key)
  (let ((buffer (gadget-buffer gadget)))
    (case input
      ((#\Backspace #\Rubout)
       (unless (zerop (fill-pointer buffer))
         (vector-pop buffer)))
      ((#\Return #\Newline)
       (unless (zerop (fill-pointer buffer))
         (setf *computation-name* (copy-seq buffer)
               (fill-pointer buffer) 0)))
      (#\ESC
       (setf (fill-pointer buffer) 0))
      (otherwise
       (when (ignore-errors (graphic-char-p input))
         (vector-push input buffer))))))

The first gadget we define is text-input-gadget. The internal state we need is a buffer which holds the text typed into the box. We also care about the string's maximal width.

Moreover, we define colors for it to use in display-gadget (depending on the global parameter *active-gadget* is in very poor taste). In the display function we create a "background" (that wouldn't be necessary if the gadget were a panel, an abstraction defined in a library accompanying ncurses) and then, at the gadget position, we draw the background and the buffer contents (the text which was already typed into the text-input-gadget).

The second function, handle-input, interprets the characters it receives and acts accordingly. If the character is backspace (or rubout, as on my keyboard with these terminal settings), we remove one element. If it is return (or newline), we change the computation name and empty the input. Escape clears the box, and finally, if it is a character we can print, we add it to the text input (vector-push won't extend the vector length, so extra characters are ignored).

(defparameter *gadgets*
  (list (make-text-input-gadget 26 2 13)))

(defun display-greedy-hello-world (window flip-flop)
  (charms:clear-window window)
  (draw-window-background window +white/blue+)
  (with-colors (window (if flip-flop
                           +white/blue+
                           +black/red+))
    (charms:write-string-at-point window *computation-name* 2 1))
  (dolist (g *gadgets*)
    (if (eql g *active-gadget*)
        (display-gadget window g)
        (charms:with-restored-cursor window
          (display-gadget window g))))
  (charms:refresh-window window))

We maintain a list of gadgets which are displayed one-by-one in the window (after signalling that the computation is being performed). Previously we had hard-coded the "Hello world!" name, but now we depend on the variable *computation-name*, which may be modified from the input box.

Each gadget is displayed, but only *active-gadget* is allowed to modify the cursor position. All other rendering is wrapped in the charms:with-restored-cursor macro, which does what the name suggests. That means the cursor will be positioned wherever *active-gadget* puts it (or, if it doesn't modify the cursor position, at the end of the computation string).

(defun get-greedy-hello-world-input (window)
  (when *recompute-flag*
    (setf *recompute-flag* nil)
    (return-from get-greedy-hello-world-input :compute))
  (charms:get-char window :ignore-error t))

(defun greedy-hello-world ()
  (charms:with-curses ()
    (charms:disable-echoing)
    (charms:enable-raw-input)
    (start-color)
    (let ((window (charms:make-window 50 15 10 10))
          (flip-flop nil))
      (charms:enable-non-blocking-mode window)
      (display-greedy-hello-world window flip-flop)
      (catch :exit
        (loop
           do (let ((input (get-greedy-hello-world-input window)))
                (case input
                  (#\Dc1 ;; this is C-q
                   (throw :exit :c-q))
                  (#\Dc2 ;; this is C-r
                   (charms:clear-window charms:*standard-window*)
                   (charms:refresh-window charms:*standard-window*)
                   (display-greedy-hello-world window flip-flop))
                  (#\tab
                   (alexandria:if-let ((remaining (cdr (member *active-gadget* *gadgets*))))
                     (setf *active-gadget* (car remaining))
                     (setf *active-gadget* (car *gadgets*)))
                   (display-greedy-hello-world window flip-flop))
                  (:compute
                   (setf flip-flop (not flip-flop))
                   (display-greedy-hello-world window flip-flop))
                  (otherwise
                   (if (handle-input *active-gadget* input)
                       ;; redisplay only if handle-input returns non-NIL
                       (display-greedy-hello-world window flip-flop)
                       ;; don't be a pig to a processor
                       (sleep 1/60))))))))))

get-greedy-hello-world-input is much the same as get-amazing-hello-world-input for now. greedy-hello-world, on the other hand, handles input differently. For quitting we reserve C-q instead of q, because we want to be able to type that character in the input box. We also add a C-r sequence to refresh the whole screen (just in case the terminal is resized and some glitches remain). :compute is handled the same way as previously. Finally, if the input is something else we feed it to the *active-gadget*. If handle-input returns something other than NIL, we redisplay the application.

Note that if the application is not redisplayed (i.e. the input was not handled, either because *active-gadget* was NIL or because it didn't handle the input) we are good citizens and, instead of hogging the processor, wait 1/60 of a second. On the other hand, if events come one after another we are legitimately busy, so we skip this rest time and continue the event loop.

Until now we had no means of changing which gadget is active. For that we handle a new key in the main event loop of the application: #\tab. Lastly, we wrap the whole loop body in (catch :exit) to allow a dynamic exit (instead of a lexical one with a block/return-from pair).

Start the application and make text-input-gadget active by pressing [tab]. Notice that its background changes. Now we have working input box where we can type new string and when we confirm it with [enter] string at the top will change.

We want more gadgets. For now we will settle with buttons.

(defclass button-gadget (gadget)
  ((label :initarg :label :reader gadget-label)
   (action :initarg :action :reader gadget-action)))

(defun make-button-gadget (text callback x y)
  (make-instance 'button-gadget :label text :action callback :position (cons x y)))

;; These two color pairs were not defined earlier - without them the
;; method below would not compile.
(define-color-pair (+yellow/black+ 5) +yellow+ +black+)
(define-color-pair (+red/black+ 6) +red+ +black+)

(defmethod display-gadget ((window charms:window) (gadget button-gadget) &key)
  (with-colors (window (if (eql gadget *active-gadget*)
                           +red/black+
                           +yellow/black+))
    (destructuring-bind (x . y) (gadget-position gadget)
      (charms:write-string-at-point window (gadget-label gadget) x y))))

(defmethod handle-input ((gadget button-gadget) input &key)
  (when (member input '(#\return #\newline))
    (funcall (gadget-action gadget))))

Each button is a gadget which (in addition to the inherited position) has a label and an associated action. The button label is drawn with red ink when the button is active and with yellow ink otherwise. Input handling reacts only to pressing [enter], upon which the gadget's action is funcall-ed (without arguments).

Modify the *gadgets* parameter to contain our four buttons and run the application (try pressing [tab] a few times to see that the active widget changes).

(defparameter *gadgets*
  (list (make-text-input-gadget 26 2 13)
        (make-button-gadget " Toggle " 'toggle-recompute-thread 30 11)
        (make-button-gadget "  Exit  " 'exit-application 40 11)
        (make-button-gadget " Accept " 'accept-input-box 30 13)
        (make-button-gadget " Cancel " 'cancel-input-box 40 13)))

(defun toggle-recompute-thread ()
  (if *recompute-thread*
      (stop-recompute-thread)
      (start-recompute-thread)))

(defun accept-input-box ()
  (handle-input (car *gadgets*) #\return))

(defun cancel-input-box ()
  (handle-input (car *gadgets*) #\esc))

(defun exit-application ()
  (throw :exit :exit-button))

We have created four buttons: Toggle starts/stops the "computation" thread which feeds us input, Exit quits the application (that's why we changed block to catch in the greedy-hello-world main loop), and Accept / Cancel pass return and escape input to the text-input-gadget. Once again we show a lack of good taste by aiming at the first element of the *gadgets* parameter, assuming that it is the text input field.

The result looks a little like a user interface, doesn't it? Try activating the various buttons and check whether text input works as desired. When you select Toggle and press [enter], the computation label at the top will start / stop blinking at one-second intervals.

Mouse integration

Our software has reached the phase where it is production ready. Investors are crawling at our doorbell because they feel it is something which will boost their income. We are sophisticated and have proposed a neat idea which will have a tremendous impact on the market. We have even created buttons and text-input gadgets which are the key to success. Something is still missing though. After a brainstorming meeting with the most brilliant minds of our decade we've come to a conclusion - what we are missing is mouse integration. That's crazy, I know. A mouse on a terminal? It is heresy! Yet we believe we have to take the risk if we want to outpace the competition. Roll up your sleeves - we are going to change the face of modern UI design! ;-)

First we will start by abstracting things to make it easy to distribute events among gadgets. To do that we need to know where a pointer event happened. Let's assume that a mouse event is composed of three parameters: button, x coordinate and y coordinate. Each gadget occupies some space on the screen (let's call it a region) which may be characterized by its bounding rectangle with [min-x, min-y] and [max-x, max-y] points.

(defgeneric bounding-rectangle (gadget)
  (:method ((gadget text-input-gadget))
    (destructuring-bind (x . y) (gadget-position gadget)
      (values x
              y
              (+ x -1 (gadget-width gadget))
              y)))
  (:method ((gadget button-gadget))
    (destructuring-bind (x . y) (gadget-position gadget)
      (values x
              y
              (+ x -1 (length (gadget-label gadget)))
              y))))

We could refactor button-gadget to have a gadget-width method (that would simplify the implementation - see the aside after the next snippet), but let's use what we have now. Having such a representation of the bounding rectangle, we can now easily determine whether a cursor event occurred "over" the gadget or somewhere else.

(defun region-contains-position-p (gadget x y)
  (multiple-value-bind (x-min y-min x-max y-max)
      (bounding-rectangle gadget)
    (and (<= x-min x x-max)
         (<= y-min y y-max))))
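
As an aside, the gadget-width refactoring mentioned above would collapse the two bounding-rectangle methods into one (text-input-gadget already gets gadget-width as the reader of its width slot):

;; The width slot reader already provides gadget-width for text-input-gadget.
(defmethod gadget-width ((gadget button-gadget))
  (length (gadget-label gadget)))

(defgeneric bounding-rectangle (gadget)
  (:method ((gadget gadget))
    (destructuring-bind (x . y) (gadget-position gadget)
      (values x y (+ x -1 (gadget-width gadget)) y))))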

Now distributing a mouse event is as easy as iterating over all gadgets and verifying whether the event applies to the gadget. If it does, we make that gadget active. Moreover, if the left mouse button was clicked, we simulate a [return] key press to be handled by the previously defined handle-input method.

(defun distribute-mouse-event (bstate x y)
  (dolist (g *gadgets*)
    (when (region-contains-position-p g x y)
      (setf *active-gadget* g)
      (when (eql bstate charms/ll:button1_clicked)
        (handle-input g #\return))
      (return))))

Notice that we have used the low-level charms interface (comparing bstate with charms/ll:button1_clicked). Other events are also defined in the charms/ll package, so you may expand the mouse handling interface as needed.

Time to define the display function for our enterprise application. The ncurses cursor should follow mouse movement, so displaying gadgets can't be allowed to affect the cursor position. To achieve that we wrap the whole display function in the charms:with-restored-cursor macro.

(defun display-enterprise-hello-world (window flip-flop)
  (charms:with-restored-cursor window
    (charms:clear-window window)
    (draw-window-background window +white/blue+)
    (if flip-flop
        (with-colors (window +white/blue+)
          (charms:write-string-at-point window *computation-name* 2 1))
        (with-colors (window +black/red+)
          (charms:write-string-at-point window *computation-name* 2 1)))
    (dolist (g *gadgets*) (display-gadget window g))
    (charms:refresh-window window)))

Usually terminal emulators don't report mouse movement (only clicks). To enable such reports, print the following to the terminal output (note that slime doesn't bind *terminal-io* to the right thing, so we use *console-io* which we defined at the beginning):

(format *console-io* "~c[?1003h" #\esc)

This escape sequence sets the xterm mode for reporting any mouse event, including mouse movement (see http://invisible-island.net/xterm/ctlseqs/ctlseqs.html). It is usually honored by other terminal emulators (not only xterm). Without it we wouldn't be able to track mouse movement (only pointer press, release, scroll etc.). This is important to us because we want to activate gadgets when the mouse moves over them.

To start the mouse and configure its events, let's create an initialization function. We configure ncurses to report all mouse events, including the mouse position. After that we tell the terminal emulator to report mouse position events.

(defun start-mouse ()
  (charms/ll:mousemask
   (logior charms/ll:all_mouse_events charms/ll:report_mouse_position))
  (format *console-io* "~c[?1003h~%" #\esc))

We need to handle a new event type, :key-mouse (the already handled types are C-q, C-r, TAB, :compute and "other" characters). Since event handling gets more complicated, we factor it into a separate function enterprise-process-event. Note the handler-case - when the mouse event queue is empty (for instance because the mouse event is masked), the function will signal an error. We also take into account that mouse events are reported in absolute terminal coordinates, while our window starts at point [10,10]. Additionally we implement the shift+tab sequence, which on my system is reported as #\Latin_Small_Letter_S_With_Caron.

(defun get-enterprise-hello-world-input (window)
  (when *recompute-flag*
    (setf *recompute-flag* nil)
    (return-from get-enterprise-hello-world-input :compute))
  (let ((c (charms/ll:wgetch (charms::window-pointer window))))
    (when (not (eql c charms/ll:ERR))
      (alexandria:switch (c)
        (charms/ll:KEY_BACKSPACE #\Backspace)
        (charms/ll:KEY_MOUSE :KEY-MOUSE)
        (otherwise (charms::c-char-to-character c))))))

(defun enterprise-process-event (window flip-flop)
  (loop
     (let ((input (get-enterprise-hello-world-input window)))
       (case input
         (#\Dc1 ;; this is C-q
          (throw :exit :c-q))
         (#\Dc2 ;; this is C-r
          (display-enterprise-hello-world window flip-flop))
         (#\tab
          (alexandria:if-let ((remaining (cdr (member *active-gadget* *gadgets*))))
            (setf *active-gadget* (car remaining))
            (setf *active-gadget* (car *gadgets*)))
          (display-enterprise-hello-world window flip-flop))
         (#\Latin_Small_Letter_S_With_Caron ;; this is S-[tab]
          (if (eql *active-gadget* (car *gadgets*))
              (setf *active-gadget* (alexandria:lastcar *gadgets*))
              (do ((g *gadgets* (cdr g)))
                  ((eql *active-gadget* (cadr g))
                   (setf *active-gadget* (car g)))))
          (display-enterprise-hello-world window flip-flop))
         (:key-mouse
          (handler-case (multiple-value-bind (bstate x y z id)
                            (charms/ll:getmouse)
                          (declare (ignore z id))
                          ;; window starts at 10,10
                          (decf x 10)
                          (decf y 10)
                          (charms:move-cursor window x y)
                          (distribute-mouse-event bstate x y)
                          (display-enterprise-hello-world window flip-flop))
            (error () nil)))
         (:compute
          (setf flip-flop (not flip-flop))
          (display-enterprise-hello-world window flip-flop))
         (otherwise
          (if (handle-input *active-gadget* input)
              ;; redisplay only if handle-input returns non-NIL
              (display-enterprise-hello-world window flip-flop)
              ;; don't be a pig to a processor
              (sleep 1/60)))))))

To make the terminal report mouse events in an intelligible way we need to call the function charms:enable-extra-keys (thanks to that we don't have to deal with raw character sequences). We also call start-mouse.

(defun enterprise-hello-world ()
  (charms:with-curses ()
    (charms:disable-echoing)
    (charms:enable-raw-input)
    (start-color)
    (let ((window (charms:make-window 50 15 10 10))
          (flip-flop nil))
      ;; full enterprise ay?
      (charms:enable-non-blocking-mode window)
      (charms:enable-extra-keys window)
      (start-mouse)
      (display-enterprise-hello-world window flip-flop)
      (catch :exit
        (loop (funcall 'enterprise-process-event window flip-flop))))))

Our final result is splendid! We got rich :-). The source of the following asciinema video is located here.

[asciinema recording]

To finish our lisp session, type (quit) in the slime REPL.

Conclusions

In this crash course we have explored some parts of the cl-charms interface (windows, colors, mouse integration...) and defined an ad-hoc toolkit for the user interface. There is much more to learn and the library may be expanded (especially with regard to the high-level interface, but also to include supplementary ncurses libraries like panel, menu and forms). The ability to run a charms application in a new terminal started with run-program would be beneficial too.

Some programmers may have noticed that we have defined an ad-hoc, informally-specified, bug-ridden, slow implementation of 1/10th of CLIM. No wonder - CLIM defines a very good abstraction for building UIs in a consistent manner, and it is a standard (with a full specification).

Instead of reinventing the wheel, we want to plug into the CLIM abstractions and use cl-charms as a CLIM backend. This option will be explored in the near future - it will help to document the process of writing new backends. Of course such a backend will be limited in many ways - that will be an opportunity to explore bugs which go unnoticed when the smallest distance is a pixel (in contrast to a reasonably big terminal character). Stay tuned :-)

Zach Beane: Planet Lisp is back on twitter

· 14 days ago

Years ago, Hans Hübner created the @planet_lisp twitter account and made some Lisp software to keep it updated with recent posts from Planet Lisp.

But it stopped working last summer, and he offered to let someone else run it, so I wrote new software and it should be working again.

This post is half announcement, half a test to make sure the new gateway works.

Enjoy!

Nicolas Hafner: Integrating Shaders and Objects - Gamedev

· 15 days ago

In this entry I'll describe the way in which shaders are integrated into the object system in Trial. It serves as a predecessor to the next entry, which will be about shader pipelines.

In case you're not familiar with modern graphics programming, a shader is a piece of code that runs on the GPU during a particular stage in the rendering pipeline. The customisable part of the pipeline looks as follows:

Vertex -> (Tessellation Control) -> (Tessellation Evaluation) -> (Geometry) -> Fragment

In order to run anything at all, you need to provide shaders for the vertex and the fragment stages. The vertex stage emits vertices that form triangles. The tessellation stages can then subdivide these triangles further to add detail. The geometry stage can add entirely new triangles, for instance to produce grass. The triangles are then rasterised, and for each fragment (pixel) that is drawn, the fragment shader is evaluated to produce the resulting colour.

Whenever you perform any drawing command, this pipeline is run. Which particular shaders are run depends on the shader program that is bound at the time of drawing. The shader program ties together shaders for the different stages to form the complete pipeline.
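
To make that concrete outside of Trial, building such a two-stage program through cl-opengl looks roughly like this (a sketch; make-shader-program is our name, and error handling is minimal):

(defun make-shader-program (vertex-source fragment-source)
  "Compile and link a minimal two-stage shader program."
  (flet ((compile-stage (type source)
           (let ((shader (gl:create-shader type)))
             (gl:shader-source shader source)
             (gl:compile-shader shader)
             (unless (gl:get-shader shader :compile-status)
               (error "~a failed to compile:~%~a"
                      type (gl:get-shader-info-log shader)))
             shader)))
    (let ((program (gl:create-program))
          (vertex (compile-stage :vertex-shader vertex-source))
          (fragment (compile-stage :fragment-shader fragment-source)))
      (gl:attach-shader program vertex)
      (gl:attach-shader program fragment)
      (gl:link-program program)
      ;; The shader objects may be deleted once the program is linked.
      (gl:delete-shader vertex)
      (gl:delete-shader fragment)
      program)))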

Different objects in the scene you're drawing will need different shaders to produce individual effects and change drawing behaviour. However, often times a large part of the shaders will be similar or the same. As such, it seems sensible to introduce some kind of system to allow sharing bits and pieces of shader logic between objects. I'd like to do some in-depth research into how other engines approach this problem and publish a paper on it at some point. For now, here's a brief summary of the strategies I've seen so far:

  • Special #include directives that are run through a pre-processor to splice other files in.
  • A "mega-shader" that includes everything, but sections are conditionalised through #ifdefs.
  • No strategy at all. People just copy paste things where they need them.

The drawback of the include strategy is that the included code won't have any idea about where it's included. It limits you to only being able to include function definitions that are fully encapsulated. The drawback of the mega-shader should be obvious: you're creating a huge mess of a file. Simply copy pasting is also obviously a bad idea, since it doesn't really share anything at all.

The other issue that all of these approaches have in common is that they do not integrate with the rest of the engine's entity system at all. In almost all cases there will be some kind of inheritance system to allow reusing and combining behaviour between common components. However, all of these combination approaches won't include shaders; they still have to be handled separately.

Trial solves this in two steps. The first step is the combination of shader parts, which is solved by glsl-toolkit. This library allows you to parse GLSL code into an AST, run static analysis on it, and use that to combine different code parts logically. Namely, it will recognise shared global variables and main functions, and will automatically rewrite the code to fit together properly. Naturally, there are limits to how well it can do this, but overall it has worked pretty well so far. There's other really interesting things that could be done with glsl-toolkit, like early error checking, inlining, cleanup, and so forth.
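
As a small illustration of the merging (a hypothetical REPL exchange; the exact output depends on the glsl-toolkit version):

(glsl-toolkit:merge-shader-sources
 (list "out vec4 color; void main(){ color = vec4(1.0); }"
       "out vec4 color; void main(){ color.a = 0.5; }"))
;; => one source in which the shared declaration of "color" appears once,
;;    the two main functions are renamed (e.g. _GLSLTK_main_1 and
;;    _GLSLTK_main_2), and a fresh main() calls them in order.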

The second step is to use the MOP to tie shader code to classes. That way multiple inheritance can be used to combine shader parts as needed, alongside the other bits of logic that are naturally combined through methods and slots. When an instance of a class is finally created, all the shader parts can then be merged together to form a complete shader program.

In Trial this is implemented through the shader-entity class. Aside from a bunch of convenience method definitions and some MOP boilerplate, this is actually not that much code. Let's take a look at how the shaders are computed, which is the most substantial part of the file.

(defmethod compute-effective-shaders ((class shader-entity-class))
  (let ((effective-shaders ())
        (inhibited (inhibited-shaders class))
        (superclasses (remove 'shader-entity-class
                              (c2mop:compute-class-precedence-list class)
                              :test-not (lambda (type class) (typep class type)))))
    ;; Check whether inhibits are effective
    (loop for (name type) in inhibited
          for super = (find name superclasses :key #'class-name)
          do (cond ((not super)
                    (warn "No superclass ~s in hierarchy of ~s. Cannot inhibit its shader ~s." name (class-of super) (class-name class))
                    (setf (inhibited-shaders class) (remove (list name type) inhibited :test #'equal)))
                   ((not (getf (direct-shaders super) type))
                    (warn "No shader of type ~s is defined on ~s. Cannot inhibit it for ~s." type name (class-name class))
                    (setf (inhibited-shaders class) (remove (list name type) inhibited :test #'equal)))))
    ;; Compute effective inhibited list
    (loop for super in superclasses
          do (setf inhibited (append inhibited (inhibited-shaders super))))
    ;; Make all direct shaders effective
    (loop for (type shader) on (direct-shaders class) by #'cddr
          do (setf (getf effective-shaders type)
                   (list shader)))
    ;; Go through all superclasses in order
    (loop for super in superclasses
          do (loop for (type shader) on (direct-shaders super) by #'cddr
                   unless (find (list (class-name super) type) inhibited :test #'equal)
                   do (pushnew shader (getf effective-shaders type))))
    ;; Compute effective single shader sources
    (loop for (type shaders) on effective-shaders by #'cddr
          do (setf (getf effective-shaders type)
                   (glsl-toolkit:merge-shader-sources
                    (loop for (priority shader) in (stable-sort shaders #'> :key #'first)
                          collect (etypecase shader
                                    (string shader)
                                    (list (destructuring-bind (pool path) shader
                                            (pool-path pool path))))))))
    (setf (effective-shaders class) effective-shaders)))

It might look a bit intimidating at first, but it's not too complicated. First we compute a list of superclasses that are also shader-entity-classes. Then we go through the list of inhibition specs, which allow you to prevent certain shaders from being inherited. This can be useful if you want to slightly alter the way the shader is implemented without having to also rewrite the other parts of functionality that come with inheritance. When going through the inhibitions, we check whether they are even applicable at all, to give the user a bit of error checking. We then combine the inhibition specs with those of all the superclasses. This is done because we always compute a fresh effective shader from all direct shaders on the superclasses, so we need to preserve inhibition from superclasses as well.

Next we push all the shaders that were defined directly on the class to the effective shaders. Then we go through all the superclasses in order and, unless their shader was inhibited, push it onto the effective shaders. Finally, for each shader stage we combine all the shader parts using glsl-toolkit's merge-shader-sources. The merging is done in the order of priority of the shaders. Adding a priority number to shaders allows you to bypass the natural order by inheritance, and instead make sure your code gets included at a particular point. This can be important because merge semantics change depending on the order.

Let's look at an example of how this can be useful. I will omit code that isn't directly related to the shaders. If you want to have a closer look at how it all works, check the helpers.lisp file.

(define-shader-entity vertex-entity () ())

(define-class-shader (vertex-entity :vertex-shader)
  "layout (location = 0) in vec3 position;

uniform mat4 model_matrix;
uniform mat4 view_matrix;
uniform mat4 projection_matrix;

void main(){
  gl_Position = projection_matrix * view_matrix * model_matrix * vec4(position, 1.0f);
}")

The vertex-entity is a base class for almost all entities that have a vertex array that should be drawn to the screen. It provides a single vertex shader that does the basic vertex translation from model space to clip space. In case you're not familiar with these terms, model space means that vertex positions are relative to the model's origin. Clip space means vertex positions are relative to the OpenGL viewport (window). If we ever have an object that draws a model in some way, this class is now a good candidate for inclusion as a superclass.

(define-shader-entity colored-entity () ())

(define-class-shader (colored-entity :fragment-shader)
  "uniform vec4 objectcolor;
out vec4 color;

void main(){
  color *= objectcolor;
}")

This class is a bit smaller and simply gives you the ability to colour the resulting fragments uniformly. Note however that we don't simply set the output colour to our object's colour, but instead multiply it. This means that this class, too, can be combined with others.

(define-shader-entity textured-entity () ())

(define-class-shader (textured-entity :vertex-shader)
  "layout (location = 1) in vec2 in_texcoord;
out vec2 texcoord;

void main(){
  texcoord = in_texcoord;
}")

(define-class-shader (textured-entity :fragment-shader)
  "in vec2 texcoord;
out vec4 out_color;
uniform sampler2D texture_image;

void main(){
  out_color *= texture(texture_image, texcoord);
}")

Finally, the textured-entity class provides you with the ability to stick textures onto your vertices. To do so it needs another attribute that describes the texture coordinates for each vertex and, using the vertex shader, passes these on down the line to the fragment shader. There it does a lookup into the texture and, as with the previous entity, multiplies the output colour with that of the texture.

Now for the interesting bit: if we want to create an object that is drawn using a vertex array, is uniformly textured, and should be tinted with a particular colour, we can simply do this:

(define-shader-entity funky-object (vertex-entity colored-entity textured-entity) ())

When we finalise the inheritance on this class, all the shaders it inherits are combined to produce single shaders as we would expect:

TRIAL> (effective-shaders 'funky-object)
(:VERTEX-SHADER "#version 330 core
layout(location = 1) in vec2 in_texcoord;
out vec2 texcoord;

void _GLSLTK_main_1(){
  texcoord = in_texcoord;
}
layout(location = 0) in vec3 position;
uniform mat4 model_matrix;
uniform mat4 view_matrix;
uniform mat4 projection_matrix;

void _GLSLTK_main_2(){
  gl_Position = (projection_matrix * (view_matrix * (model_matrix * vec4(position, 1.0))));
}

void main(){
  _GLSLTK_main_1();
  _GLSLTK_main_2();
}"
:FRAGMENT-SHADER "#version 330 core
out vec4 color;

void _GLSLTK_main_1(){
  color = vec4(1.0, 1.0, 1.0, 1.0);
}
in vec2 texcoord;
uniform sampler2D texture_image;

void _GLSLTK_main_2(){
  color *= texture(texture_image, texcoord);
}
uniform vec4 objectcolor;

void _GLSLTK_main_3(){
  color *= objectcolor;
}

void main(){
  _GLSLTK_main_1();
  _GLSLTK_main_2();
  _GLSLTK_main_3();
}")

As you can see, the individual main functions got renamed and are now called in sequence from a new main function, and declarations of in and out variables that denoted the same thing got merged together. The code perhaps isn't structured as nicely as one would like, but there's always room for improvement.

This inheritance and combination of shader parts is pretty powerful. While it will probably not allow you to produce very efficient shaders, this should not be a big concern for an overwhelmingly large class of projects, especially during development. Instead, this "Lego blocks" approach to shaders and graphics functionality allows you to be very productive in the early phases, just to hammer things out.

With that covered, you should be ready for the next part, where I will go into how rendering is actually performed, and how more complicated effects and functions are composed. Hopefully I'll have that out in a couple of days.

Quicklisp newsDownload stats for January, 2018

· 16 days ago
Here are the raw Quicklisp download stats for January 2018.
20452 alexandria
17822 closer-mop
16236 cl-ppcre
15928 split-sequence
15762 babel
15131 iterate
14929 anaphora
14859 trivial-features
14580 let-plus
14136 trivial-gray-streams
13668 bordeaux-threads
12524 cffi
12215 trivial-garbage
11815 flexi-streams
11383 puri
11201 more-conditions
11063 nibbles
10266 usocket
9545 utilities.print-items
9499 trivial-backtrace
9312 esrap
9170 cl+ssl
8940 chunga
8812 cl-base64
8446 chipz
8417 cl-fad
8107 drakma
7212 cl-yacc
6937 named-readtables
6929 asdf-flv
6647 fiveam
6429 local-time
6179 ironclad
6103 parse-number
5850 closure-common
5844 cxml
5743 log4cl
5229 architecture.hooks
5037 cl-json
5023 parser.common-rules
4773 plexippus-xpath
4530 optima
4323 lift
4028 lparallel
4002 cl-clon
3882 cl-dot
3594 cxml-stp
3469 cl-interpol
3421 xml.location
3399 cl-unicode
3363 utilities.print-tree
3350 cl-store
3311 fare-utils
3305 fare-quasiquote
3135 slime
3109 inferior-shell
3098 fare-mop
2810 metabang-bind
2557 trivial-utf-8
2556 trivial-types
2544 cl-utilities
2288 uuid
2244 cl-annot
2234 cl-syntax
2232 quri
2128 trivial-indent
2118 documentation-utils
2035 cl-slice
2027 plump
2023 array-utils
2013 gettext
1998 static-vectors
1983 symbol-munger
1966 collectors
1933 arnesi
1932 md5
1920 access
1915 djula
1898 cl-locale
1893 cl-parser-combinators
1833 fast-io
1822 simple-date-time
1804 hunchentoot
1567 ieee-floats
1537 yason
1364 prove
1312 asdf-system-connections
1307 metatilities-base
1304 cl-containers
1192 osicat
1138 monkeylib-binary-data
1115 rfc2388
1041 trivial-shell
993 diff
989 postmodern
961 cl-custom-hash-table
929 parse-float
912 salza2
882 cl-sqlite
872 trivial-mimes

Nicolas HafnerAssets, Resources, and Loaders - Gamedev

· 17 days ago

header
Welcome to a new article series! I've decided to try to record my thoughts on various matters about game development. This will include observations, discoveries, explanations of existing systems, and proposals for improvements in related projects. Most of this will be directly related to Shirakumo's game engine Trial and the Treehouse gamedev streams. This first entry is a proposal for a new resource management system in Trial.

Any game engine needs to deal with the management of assets and resources. To eliminate any possibility for confusion, I'll define what I mean by those two terms here:

  • A resource is a representation of some kind of allocated object. Typically this object resides on the GPU, is costly to allocate, and has a variety of associated properties that determine its state.
  • An asset is a representation of an external, static object. Typically this object is a file of a particular format, needs to be decoded to be useful, can be very large, and can be very costly to decode.

Right now Trial mushes both of these concepts into one. For a while this seemed like a sane thing to do - after all, both assets and resources encapsulate some kind of loading logic, represent some kind of thing that you would like to load/allocate into foreign memory, and that sticks around until you deallocate it again.

However, as the engine grows more complex and the demands for capabilities rise, this is no longer appropriate. For instance, not every texture you need will be loaded from an image. Frame buffers, complex shader pipelines, and other systems require textures without an external image attached. This has resulted in a hybrid approach, where a texture could take a "spec" as an input which would construct a texture without loading it.

Unfortunately, textures have a lot of properties: width, height, internal format, wrapping, min-filter, mag-filter, attachment, depth, mipmap levels, structure, and probably others I'm forgetting about. The spec thus grew into a very weird format that not only didn't suit all needs, but was generally a pain to deal with. Textures aren't the only area where this combination of concerns hurts either. A similar problem arises for meshes and vertex buffers and arrays. There are probably others still that I haven't come across yet.

So, now that we have established that this separation of concerns is indeed a Good Idea™, we need to move on to the actual proposal. I will formulate this using my favourite modelling language, CLOS. Here goes:

(defclass resource ()
  ())

(defgeneric allocate (resource))
(defgeneric deallocate (resource))
(defgeneric allocated-p (resource))

So far so good. The resource is a relatively opaque type, which merely gives you operations to allocate, deallocate, or test for allocation. It is so opaque on purpose, to allow it to be as generic as possible. Double allocation or double deallocation should be automatically prevented by the protocol.
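That prevention falls out naturally from auxiliary methods. A minimal sketch, assuming nothing about Trial's actual implementation:

(defmethod allocate :around ((resource resource))
  ;; Allocate only once; repeated calls become no-ops.
  (unless (allocated-p resource)
    (call-next-method)))

(defmethod deallocate :around ((resource resource))
  ;; Deallocating something that isn't allocated is a no-op too.
  (when (allocated-p resource)
    (call-next-method)))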

(defclass foreign-resource (resource)
  ((data-pointer :initform NIL
                 :initarg :data-pointer
                 :reader data-pointer)))

Most actual resources will represent a foreign data block of some kind, for which this foreign-resource subtype is useful. For GL resources for instance, the data-pointer would return the integer naming the specific resource.

(defclass asset (resource)
  ((pool :initarg :pool :reader pool)
   (name :initarg :name :reader name)
   (input :initarg :input :accessor input)))

(defgeneric load (asset))
(defgeneric reload (asset))
(defgeneric offload (asset))
(defgeneric loaded-p (asset))

Onward to the asset, where things get a bit more complicated. Trial employs something called "asset pools" that group assets under a common name. Each pool is associated with a base pathname, from which every asset within it is derived. This allows writing libraries for Trial that ship their own assets, without having to require the user to take care to properly deploy the assets into their own game when it's time to ship. Each asset also has a unique name within the pool, and an input of some kind. The operations we can perform on an asset should be fairly self-explanatory.
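For illustration, defining a pool and an asset in it might look roughly like this; define-pool, define-asset, and their argument conventions are assumptions for the sake of the sketch:

(define-pool my-library
  :base #p"/path/to/my-library/data/")

(define-asset (my-library logo) image
  #p"logo.png")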

Note that asset is a subtype of resource. The idea here is that, while resources are useful on their own, assets are not. Every asset is going to translate to loading into some kind of resource. Thus it makes perfect sense to retain the capabilities of the underlying resource with the associated asset type. To illustrate this a bit better, let's create an image asset.

(defclass texture (foreign-resource)
  (wrapping min-filter mag-filter anisotropy ...))

(defmethod allocate ((texture texture))
  #| Use glCreateTextures and so forth to allocate. |#)

(defclass image (texture asset)
  ())

(defmethod load ((image image))
  (allocate image)
  #| Load image data and use glTexSubImage* to transfer. |#)

In order to back an image we need textures. Textures come with all the wonderful aforementioned properties. We can encode the creation of the texture and the setting of these properties in the allocation of the texture resource. Then, we simply create a subclass of both texture and asset, inheriting all the texture properties, and gaining the ability to transfer the data into the texture on loading.

This may seem like we're back at the beginning with the mushing of the two concepts, since assets are also resources now. However, remember that the primary ache with that approach was that resources were assets. This is no longer the case. You can now create resources completely independently, and interact with their various properties and requirements in a much more sensible manner. This is in addition to the major advantage that now anything dealing with resources does not have to care whether it is passed an asset or not - regardless, it will work as expected.

By having the assets be direct instances of resources as well, we also eliminate another layer of indirection we would have to follow otherwise. If the resource had been separate, we would need a slot on the asset to track it, requiring two dereferences to reach the data-pointer contents. This might not seem like a big deal, but in tight render loops this sort of thing can add up. Ideally you'd be able to inline the data-pointer, but doing so is quite tricky. I hope to talk about my ideas for that kind of optimisation in a future entry.

I could go into a lot more detail here about the specific background details of how this asset/resource system should be implemented, particularly in the way of invariants, but I think the crux of it should be clear. Once again the fact that multiple inheritance is so easily supported by CLOS makes a very convenient, sensible, and efficient design possible.

I'll update this entry with a link to the relevant code once I have implemented it in Trial.

I have some other ideas for entries in this series ready to go, so hopefully it won't be long before another article rolls out.

Quicklisp newsJanuary 2018 Quicklisp dist update now available

· 17 days ago
New projects: 
  • bodge-blobs-support — Common utilities for loading/distributing foreign libraries — The Unlicense
  • bodge-chipmunk — Wrapper over chipmunk 2d physics library — MIT
  • chipmunk-blob — Chipmunk physics foreign library collection — MIT
  • cl-dct — Discrete cosine transform. — Apache-2.0
  • cl-grnm — Grid Restrained Nelder-Mead, a multivariate rootfinder. — MIT (See LICENSE.txt)
  • cl-skkserv — skkserv with Common Lisp — GPLv3
  • claw — Import c2ffi specs and generate CFFI wrappers — BSD-2-Clause
  • clinenoise — A trivial line-input library for VT-like terminals — BSD-2
  • clweb — CLWEB is a literate programming system for Common Lisp — Unspecified
  • dbd-oracle — ORACLE database driver for CL-DBI. — Lessor Lisp General Public License
  • dufy — Color library for Common Lisp — MIT
  • glsl-packing — calculate std140/std430 layout for a glsl UBO/SSBO — MIT
  • shadow — A lightweight system to help with defining and managing OpenGL shader programs. — MIT
  • westbrook — An RSS feed generator. — BSD
Updated projects: 3bgl-shader, 3d-matrices, 3d-vectors, array-utils, asd-generator, asdf-viz, aws-sign4, beast, ccldoc, cells, cepl, cepl.sdl2, cepl.sdl2-ttf, chancery, chirp, chunga, cl-ana, cl-ansi-term, cl-cognito, cl-conllu, cl-csv, cl-cut, cl-dbi, cl-digraph, cl-diskspace, cl-dot, cl-editdistance, cl-flac, cl-fond, cl-gamepad, cl-gobject-introspection, cl-gpio, cl-hamcrest, cl-ledger, cl-lzma, cl-mixed, cl-monitors, cl-mpg123, cl-oclapi, cl-online-learning, cl-opengl, cl-out123, cl-pcg, cl-random-forest, cl-readline, cl-riff, cl-sandbox, cl-smtp, cl-spidev, cl-str, cl-string-match, cl-strings, cl-svg, cl-unicode, cl-wav, clip, closer-mop, clss, clx-cursor, codex, commonqt, croatoan, crypto-shortcuts, dartsclhashtree, deeds, deferred, defpackage-plus, deploy, dissect, djula, documentation-utils, eazy-project, esrap, fare-scripts, fast-http, flare, flow, for, form-fiddle, gamebox-math, generic-comparability, glsl-spec, glsl-toolkit, graph, halftone, harmony, hu.dwim.perec, humbler, hunchensocket, inquisitor, iterate, jose, js, kenzo, lack, lambda-fiddle, lass, legit, lichat-protocol, lisp-chat, lisp-unit2, local-time, lquery, maiden, mcclim, mgl-pax, mk-string-metrics, modularize, modularize-hooks, modularize-interfaces, named-readtables, nineveh, north, overlord, papyrus, parachute, parser.common-rules, pathname-utils, piping, plump, portable-threads, postmodern, qlot, qmynd, qtools, qtools-ui, random-state, ratify, redirect-stream, remote-js, rutils, scalpl, serapeum, simple-inferiors, simple-rgb, simple-tasks, skippy-renderer, snooze, softdrink, spinneret, staple, stumpwm, the-cost-of-nothing, trees, trivial-arguments, trivial-benchmark, trivial-file-size, trivial-indent, trivial-main-thread, trivial-mimes, trivial-thumbnail, trivial-timeout, trivial-update, trivial-ws, ubiquitous, unix-opts, varjo, verbose, websocket-driver, with-c-syntax, wuwei, xml.location, yaclml.

Removed projects: blackthorn-engine, gamebox-grids, linedit, software-evolution, squirl, tbnl.

The latest removed projects are due to author request or author neglect - if a project you need has been removed, maybe a new maintenance arrangement can be worked out, get in touch!

To get this update, use (ql:update-dist "quicklisp").

Enjoy!

Vsevolod DyomkinMinimal Perfect Hash-Tables in Common Lisp

· 17 days ago

Recently, my twitter pal @ifesdjeen wrote a line that resonated with me: "Looks like it's easier for people to read 40 blog posts than a single whitepaper." And although he used it in a negative context, I recognized it as a very precise (and, actually, positive) description of what a research engineer does: read a whitepaper (or a dozen, for what it's worth) and transform it into working code and - as a possible byproduct - into a blog post that other engineers will understand and be able to reproduce. I've been in the business of reproducing papers for about 7 years now, and, I believe, everyone with the same experience will confirm that it's not a skill every engineer can easily master. Not all papers can even be reproduced (because the experiment was just not set up correctly), and of those which, in principle, can be, only some are presented in a form that I can grasp well enough to be able to program them. And, after all, working code is the ultimate judge of your understanding.

But I digressed... :) This post is meant to explain (in simple "engineering" terms) the concept of minimal perfect hash-tables and how I've recently implemented one of their varieties to improve the memory efficiency of my language identification library WILD. Uncovering, in the process, a more abstract concept behind it.

The Justification of Perfect Hash-Tables

Minimal perfect hash-tables are persistent data structures that solve 2 basic deficiencies of normal hash-tables: they guarantee constant (not only amortized) O(1) time for collision-free key-based access, and they require no reserved space (which for hash-tables may be in the range of 20%-50% of their size). This comes at the cost of two restrictions: the table should be filled with a keyset known ahead of time, and the process of building one takes longer than for a usual hash-table (although, for many methods, it can still be bounded by amortized O(1) time per key).

From my point of view, the main advantage of perfect HTs is the possibility to substantially reduce memory usage, which is important in such use cases as storing big dictionaries (relevant to many NLP tasks). Moreover, the space requirements can be lowered even more if the whole keyset is stored in the table, i.e. there can be no misses. Under such constraints, unlike with normal hash-tables, which still require storing the keys alongside the values due to the need to compare them in cases of hash collisions, in a perfect HT the keys can be simply omitted. Unfortunately, such a situation is rare, but if some level of false positives is permitted, with the help of additional simple programmatic tricks, the memory usage for keys can also be reduced by orders of magnitude.

Hash-tables on their own are the trickiest among the basic data structures. But, in terms of complexity, they pale in comparison to minimal perfect hash-tables, which belong to the advanced algorithms family. One reason for that is that perfect hash-tables require more "moving parts", but the main one is, probably, that there's no common well-known algorithm to distribute keys in a perfect manner. And reading many perfect hash-table papers will scare away most programmers, at least it did me. However, after some research and trial-and-error, I've managed to find the one that presents a simple and scalable approach. So, I'm going to explain it in this post.

Hash-tables in SBCL

If you take a look at some particular hash-table implementation, your mental model of what a hash-table is may be quite seriously shaken. A straightforward open addressing table will assume a vector of key-value pairs as an underlying concrete structure, filled as the table is modified. On a 64-bit system, it will require: (8 × 1.5 + 8 × 2) × entries-count + total size of all keys + total size of all values + constant size of metadata. The 8 × 1.5 bytes in the formula is the storage needed to hold a pointer to a hash-table entry including an average 33% extra overhead; the additional 8 × 2 bytes per entry will be spent on a cons cell (if we use this efficient although antiquated way to represent pairs in Lisp). It should be noted that depending on the size of keys and values the first term may be either a significant (up to half of the total size) or a negligible part of the table's memory profile.
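To put rough numbers on that: under this textbook scheme, a million entries would cost about (8 × 1.5 + 8 × 2) × 10⁶ = 28 MB before counting the keys and values themselves (a back-of-the-envelope estimate, not a measurement).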

However, this is not how SBCL implements hash-tables. See for yourself. Let's create a random hash-table with approximately 1000 keys:


> (defparameter *ht*
    (pairs->ht (loop :repeat 1000
                     :collect (pair (fmt "~{~C~}"
                                         (loop :repeat (random 10)
                                               :collect (code-char (+ 65 (random 50)))))
                                    (random 1000)))
               :test 'equal))

And inspect its contents:


> (inspect *ht*)

The object is a STRUCTURE-OBJECT of type HASH-TABLE.
0. TEST: EQUAL
1. TEST-FUN: #
2. HASH-FUN: #
3. REHASH-SIZE: 1.5
4. REHASH-THRESHOLD: 1.0
5. REHASH-TRIGGER: 1024
6. NUMBER-ENTRIES: 844
7. TABLE: #(#{EQUAL
"plCjU" 985
"VVYZqKm[" 861
"N\\" 870
"fqfBdZP\\" 364
"cHNjIM]Y" 804
"p_oHUWc^" 630
"CqGRaMH" 408
"NecO" 636
"QDBq" 380
"M" 838
...
}
0 "plCjU" 985 "VVYZqKm[" 861 "N\\" 870 "fqfBdZP\\" 364 ...)
8. NEXT-WEAK-HASH-TABLE: NIL
9. %WEAKNESS: 0
10. NEXT-FREE-KV: 845
11. CACHE: 1688
12. INDEX-VECTOR: #(0 617 332 393 35 0 194 512 0 420 ...)
13. NEXT-VECTOR: #(0 72 0 0 253 499 211 0 349 488 ...)
14. HASH-VECTOR: #(9223372036854775808 1830284672361826498 3086478891113066655
24243962159539 2602570438820086662 2431530612434713043
4568274338569089845 3085527599069570347 2374133389538826432
3322613783177911862 ...)
15. LOCK: #
16. SYNCHRONIZED-P: NIL

As the fog of initial surprise clears, we can see that instead of a single vector it uses 4! The keys and values are placed in the one called TABLE (that starts with a reference to the hash-table object itself). Note that storing the entries as pairs is redundant both in terms of memory and access (additional dereferencing). That's why an obvious optimization is to put them directly in the backing array: so the keys and values here are interleaved. One more indirection that may be removed, at least in Lisp, is "unboxing" the numbers, i.e. storing them immediately in the array avoiding the extra pointer. This may be an additional specialized optimization for number-keyed hash-tables (to which we'll return later), but it is hardly possible with the interleaved scheme.

Perhaps the biggest surprise is that the entries of the TABLE array are stored sequentially, i.e. there are no gaps. If they were to be added randomly, we'd expect the uniform distribution of 845×2 unique keys and corresponding values over the 2048-element array. Instead, this randomness is transferred to the 1024-element integer INDEX-VECTOR: another level of indirection. Why? For the same reason yet another 1024-element array is used — a HASH-VECTOR, which stores the values of hashes for all the table keys: efficient resizing. Although it may seem that the biggest overhead of a hash-table is incurred due to reserved space for new keys, in fact, as we have now learned, resizing is a much heavier factor. This especially relates to the HASH-VECTOR: if the hashes of all keys were not stored, each time the table got resized they would have to be recalculated anew: an intolerable slowdown for larger tables. Having a separate INDEX-VECTOR is not so critical, but it provides two additional nice capabilities. Resizing the table is performed by adjusting the vectors' lengths (an optimized operation) without the need to redistribute the entries. Besides, unlike in theory, we can iterate this table in a deterministic order: the order in which the keys were added to the table. That comes in quite handy, although SBCL developers don't want to advertise it as a public API since it may restrict potential future modifications to the data structure. There's also a NEXT-VECTOR that is used to optimize insertion.

Overall, we can see that a production-optimized hash-table turns out to be much heavier than a textbook variant. And, indeed, these optimizations are relevant, as SBCL's hash-tables are quite efficient. In my experiments several years ago, they turned out to be 2-3 times faster than, for instance, the CCL ones. Yet, it's clear that the speed optimizations, as usual, come at a cost of storage deoptimization. To restore storage sanity, we could fall back to the basic hash-table variant, and it's a possible solution for some cases, although a mediocre one, in general. Neither will it be the fastest, nor fully optimized.

Building an MPHT

Most of the space in SBCL's hash-table implementation is spent on metadata and keys, not on values. Yet, if we introduce a restriction that the table cannot be altered — no new values added after initial construction and no resizing possible — most of those optimizations and the storage connected with them become redundant. An ideally space-efficient data structure is an array, but it doesn't allow key-based access. In theory, minimal perfect hash-tables are arrays with a minimal amount of metadata overhead. Yet, the precise amount is dependent on the algorithm, and there's still new research going on to improve them. Overviewing all the existing approaches (most of which I don't fully understand) is beyond the scope of this post.

Besides, MPHTs require additional calculations at access time compared to normal hash-tables. So, if a hash-table is well-optimized, it will usually be faster than an MPHT.

And still, MPHTs require a non-negligible amount of additional space for metadata: for the algorithm that will be discussed, it's around 2 bytes per key. The best algorithms in the literature claim to reduce this amount more than 4 times, to less than 4 bits per key. However, for now, I have picked the simpler approach, since this difference is not critical considering that every value in the table, on a 64-bit machine, occupies at least 16 bytes (a pointer plus the data), so an overhead of 2 bytes versus 0.5 bytes (which will probably be rounded up to 1 byte) is already negligible.

Now, let's think of how we can distribute the keys in a hash-table so that there are no collisions and the backing array has the same number of elements as the number of hash-table keys. Theoretically, as the set of keys is known before the creation of the table, we can find a hash function that produces such distribution. Unfortunately, due to the Birthday paradox, it may be a long search. The algorithms for MPHTs suggest ways of structuring it. A good algorithm should have at most O(n) complexity as, otherwise, it will be infeasible for large keysets, which are the main use case for perfect hash-tables.

The algorithm I will briefly describe now was suggested by Botelho and Ziviani. It is a 2-step process:

  • at the first stage, using a normal hash-function (in particular, the Jenkins hash), all keys are nearly uniformly distributed into buckets, so that the number of keys in each bucket doesn't exceed 256. This can be done by setting the hash divisor to (ceiling (length keyset) 200 #| or slightly less |#);
  • next, for each bucket, a perfect hash function is constructed via a simple algorithm: for each key, two hash codes are calculated by one call to the Jenkins hash (this function outputs 3 hashes at once), which are treated as vertices of a graph. If the graph happens to be non-circular (which can be ensured with high probability with a suitable hash divisor), it is possible to construct the desired function as a sum of 2 hashes. Otherwise, we change the Jenkins hash seed and try constructing a new graph until a non-circular one is obtained. In practice, this requires just a couple of tries;
  • the construction of the final hash function is described very clearly by Czech, Havas and Majewski who have proposed this method: it lies in performing a depth-first search on the graph, labelling the edges with unique numbers and deducing the corresponding number for each possible Jenkins hash value.

Here you can see one of the possible labelings (each edge corresponds to a key and its unique index, each vertex — to the value for each of the possible Jenkins hashes):

Now, the per-bucket hash-function can be reconstructed from an array of numbers (in the range 0-255) associated with each possible hash. The divisor of the hash is twice the number of keys in the bucket, though it can be any number above the number of keys: the greater the divisor, the more space is used for storing the numbers (the divisor is the length of an array) and the less time it takes to find an acyclic graph.
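For a rough idea of what that labelling step looks like, here is a sketch in plain Common Lisp. The graph representation and all the names here are hypothetical illustrations, not taken from the actual cstab code:

;; ADJACENCY maps each vertex to a list of (neighbour . edge-index) conses;
;; G is the vector of labels, initially all zeroes. After labelling, the
;; index of the key whose edge connects vertices U and V is recovered as
;; (mod (+ (aref g u) (aref g v)) divisor).
(defun label-acyclic-graph (adjacency g divisor)
  (let ((visited (make-array (length g) :element-type 'bit :initial-element 0)))
    (labels ((visit (v)
               (loop for (u . edge) in (aref adjacency v)
                     when (zerop (aref visited u))
                     do (setf (aref visited u) 1
                              ;; choose G[U] so that G[V] + G[U] = EDGE mod DIVISOR
                              (aref g u) (mod (- edge (aref g v)) divisor))
                        (visit u))))
      (dotimes (v (length g) g)
        (when (zerop (aref visited v))
          (setf (aref visited v) 1)
          (visit v))))))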

The algorithm works quite fast: on my laptop, it takes 8 seconds for the table of 725 359 character trigrams from a mid-size language identification model and 13 seconds for 1 236 452 words from the same model.

To sum up, this is how to find the index of an element (bytes argument) in our perfect hash-table:


(defun csindex (bytes cstab)
  (with ((mod (/ (1- (length @cstab.meta)) 2))   ; divisor of the top-level hash
         (hash (* (mod (jenkins-hash bytes) mod) 2))  ; determine the bucket
         (off (? @cstab.meta hash))
         (seed (? @cstab.meta (1+ hash)))  ; each bucket has its own Jenkins seed
         (mod2 (- (? @cstab.meta (+ 2 hash)) off))
         (b c (jenkins-hash2 bytes seed (* mod2 2)))
         (goff (* 2 off)))
    ;; bucket offset + in-bucket hash
    (+ off (mod (+ (? @cstab.gs (+ goff b))
                   (? @cstab.gs (+ goff c)))
                mod2))))

Note that, in the code for this article, I refer to perfect hash tables as cstabs, for the reasons described at the end.

Efficiency In Practice

So, let's now examine the memory efficiency of this method. Thankfully, just recently the SBCL developers started working on a missing piece of the SBCL API that is critical for every algorithmic developer: a function to determine the size in memory occupied by an arbitrary object. As we know from the famous koan, "LISP programmers know the value of everything and the cost of nothing". Indeed, from a certain point of view, this applies to SBCL. Although we now have a rough tool at our disposal that patches this hole... ;)

Using this unofficial function, we can roughly calculate the space occupied by the character trigrams hash-table mentioned above:


> (let ((ht (? wild:*lang-detector* '3gs)))
    (+ (object-size ht)
       (object-size @ht.table)
       (object-size @ht.index-vector)
       (object-size @ht.hash-vector)
       ;; the keys of the table are strings
       ;; and values -- alists of a symbol and a float
       (reduce '+ (map 'list
                       ^(if (listp %)
                            (sum 'object-size %)
                            (object-size %))
                       @ht.table))))
102856432

100 MB!


> (let ((ct (build-const-table (? wild:*lang-detector* '3gs))))
    (+ (object-size ct)
       (object-size @ct.gs)
       (object-size @ct.meta)
       (reduce '+ (map 'list ^(sum 'object-size %)
                       @ct.data))))
53372880

I.e., we have reduced the memory usage almost by half: all the metadata now occupies just 1479808 bytes, or 1.5 MB.

One critical decision that allows for such drastic memory-use improvement is omitting the keys from the table. It should be noted that adding them back is trivial: (defstruct (keyed-pht (:include const-table)) (keys nil :type simple-vector) (test 'eql)), for which getting the value will work like this:


(defun csget (key cstab)
  (let ((idx (csindex (bytes key) cstab)))
    (when (call @cstab.test key (svref @cstab.keys idx))
      (values (svref @cstab.data idx)
              t))))

However, this will, basically, return us to the same ballpark as a textbook hash-table implementation in terms of memory usage, while losing in terms of speed. Yet, if we allow for some controlled level of false positives, there are a few algorithmic tricks that can be used to once again make the keys almost negligible.

The first one is really simple and straightforward: replace the vector of keys with a vector of their hashes. In particular, if we take a single byte of the hash, such an array will be 10-100 times smaller than the generic keys array and produce an FP-rate of 1/256.
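A sketch of that trick, reusing csindex from above; the hash-bytes vector and the choice of which hash byte to compare are my assumptions, not part of the library:

(defun csget-checked (key cstab hash-bytes)
  ;; HASH-BYTES holds one byte of each stored key's hash; accept the
  ;; candidate slot only when its byte matches the probe key's byte.
  (let ((idx (csindex (bytes key) cstab)))
    (when (= (ldb (byte 8 0) (jenkins-hash (bytes key)))
             (aref hash-bytes idx))
      (values (svref @cstab.data idx) t))))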

Another trick will be to use a Bloom filter. For instance, a filter with 0.1 FP-rate for all the trigrams from our language identification model will require just around 0.4 MB of storage compared to 0.7 MB needed to store the 1-byte hashes and 30 MB needed to store just the keys. The disadvantage of a Bloom filter, however, is slower processing time: for the mentioned 10% FP-rate we'll need to perform 4 hash calculations, and if we'd like to reach the same 0.3% rate as the 1-byte hash array we'd have to increase the memory usage to 1MB and perform 8 hash calculations.
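For comparison, a toy version of the Bloom-filter check might look as follows, with jenkins-hash2's seed parameter standing in for a family of hash functions (each call yields two probe positions, so k calls probe 2k bits); this is a sketch, not WILD's actual filter:

(defun bloom-member-p (bytes bits k)
  ;; BITS is a simple-bit-vector; report "maybe present" only if every
  ;; probed bit is set, otherwise the key is definitely absent.
  (loop for seed below k
        always (multiple-value-bind (b c)
                   (jenkins-hash2 bytes seed (length bits))
                 (and (plusp (sbit bits b))
                      (plusp (sbit bits c))))))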

Finally, it should be noted that the main motivator for this work was reducing the memory footprint of my language identification library, for, to be widely adopted, such a project should be very frugal in its memory usage: 10-30 MB is its maximum polite footprint. By switching to a perfect hash-table, we haven't reached that goal (at least, for this particular model), but there's also plenty of room for optimizing the memory usage of the values, which I will return to later.

Const-tables

The other motivator for this work, in general, was my interest in the topic of efficient "static" data structures. In this context, I feel that the notion of a perfect hash table doesn't fully capture the essential features of the class of data structures we're dealing with. First of all, the main distinguishing factor is static vs dynamic usage. A hash-table is thought of as a dynamic entity while our data structure is primarily a static one. Next, hashing is, for sure, involved in constructing all these structures, but it's only part of a bigger picture. The way of structuring the metadata, in particular the index arrays, may differ greatly not only between perfect and usual hash-tables, but also within the ordinary hash-table class — between different implementations.

So, I decided to come up with a different name for this data structure — a const-table (or cstab). It defines a broad class of persistent data structures that allow constant-time key-based access and, ideally, efficiently store the keys. The implementation, described here, is released as the library with the same name. It is still in its infancy, so major changes are possible — and suggestions on its improvement are also welcome.

McCLIMProgress report #11

· 22 days ago

Dear Community,

After three months of work, we are happy to present this progress report.

Some highlights for this iteration:

  • numerous bug fixes (and some regressions introduced)
  • graphics state has been factored into its own class hierarchy
  • clipping region may be an arbitrary region (not only a rectangle)
  • draw-design works correctly now for ellipses and patterns
  • major region code improvements, including better coverage of region-intersection and region-contains-position-p, and support for previously unsupported pairs (i.e. standard-rectangle vs bounding-rectangle, standard-ellipse vs standard-rectangle, and many others)
  • complete implementation of ellipses with limited angles and rotations (region code and rendering)
  • numerous UX improvements for text-field, describe, drag-and-drop, scrolling, compose-space, resize-sheet, clim-debugger and listener
  • :pane option in define-application-frame is more useful now
  • major refactor of the clim-formatting-table code. One of the results is that borders may wrap table cells, columns and rows appropriately
  • drawing tests cleanups, improvements, and additions.

Some work has been done to improve the CLX library. We have added a test suite and fixed its behaviour on CCL (which had some problems with it). Thanks to that, McCLIM works way faster on CCL than before (around 3x - subjective). We were also able to make CLIM-TOS (Franz's CLIM 2 fork) start simple applications on CCL and SBCL. I'm mentioning this because some people are not fond of McCLIM's license, and the fact that we have two FOSS implementations of the same standard may be yet another good reason to start writing applications in CLIM.

Bounties:

All McCLIM bounties (both active and already solved) may be found here. Default bounty expiration date is 6 months after publishing it (a bounty may be reissued after that time period).

Bounties solved this iteration:

  • [$100] drag-test demo: dragging square to the empty position invokes the debugger
  • [$100] Text field pane height is too small, clipping the bottom of characters

Active bounties ($1600):

  • [$300] Listener: repl blocks when long process runs
  • [$500] Windows Backend
  • [$400] Fix Beagle backend
  • [$150] When flowing text in a FORMATTING-TABLE, the pane size is used instead of the column size
  • [$150] clx: input: english layout
  • [$100] Add PDF file generation (PDF backend)

Our current financial status is $2014 for bounties and $288 recurring monthly contributions from the supporters (thank you!).

Suggestions as to which other issues should have a bounty on them are appreciated and welcome. Please note that Bountysource has a functionality "Suggest an Issue" which may be found on the bounties page. If you would like to work on an issue that is not covered by the existing bounties, feel free to suggest a new bounty.

During this quarter we have noticed more interest in learning CLIM and developing applications with it among fellow Common Lisp programmers.

If you have any questions, doubts or suggestions - please contact me either by email (daniel@turtleware.eu) or on IRC (my nick is jackdaniel). McCLIM developers and users hang out on #clim IRC channel on Freenode.

Sincerely yours,
Daniel Kochmański

Here are some screenshots made in the meantime which are related to the changes: Ellipsis, Ellipsis 3, Ellipsis 2, Scroll notes, Slope lines, Slope lines 2, Bordered table, Bordered table 2, Address Book demo

Quicklisp newsBuild failure RSS feeds

· 22 days ago
Every day, I build each project that Quicklisp tracks. If a project fails, I try to track down the author and let them know. If a project chronically fails and there's no fix in sight, I drop the project from Quicklisp.

In an effort to help authors follow build status, I've made a system of RSS feeds that are updated with daily build failures. The feeds have the following patterns:

Each feed entry has info about where Quicklisp gets the project, what systems are failing, and an excerpt from the build log and backtrace. There are links to the full build log. 

I hope you find these reports useful. If you have any suggestions or questions, feel free to get in touch.


Quicklisp newsThe Quicklisp local-projects mechanism

· 32 days ago
Quicklisp provides a lot of software, but there's also a simple way to load things that Quicklisp doesn't provide. That same mechanism can be used to override libraries Quicklisp does provide.

The local projects mechanism sets up a special directory that is automatically scanned for software to load. Here are a few quick examples.

Trying a library not in Quicklisp

First, imagine that you just heard about a great new library and want to try it. However, it's not available through Quicklisp yet, only through a git repository on https://example.com/fun-project.git. One easy way to try it:
$ cd ~/quicklisp/local-projects
$ git clone https://example.com/fun-project.git
After the git command completes, and there is a fun-project subdirectory with a fun-project/fun-project.asd file present, the system is visible to ASDF and can be loaded either with ql:quickload or asdf:find-system. When loaded through ql:quickload, Quicklisp will automatically fetch and load any prerequisites if needed.

Overriding a library in Quicklisp

Second, imagine that you want to hack on a library that Quicklisp already provides. You don't want to load and hack on the version from Quicklisp - that software is not under version control, and just represents a snapshot of the project at a particular point in time.

Once again, the procedure is to put the software in the ~/quicklisp/local-projects/ directory:
$ cd ~/quicklisp/local-projects/
$ git clone https://github.com/xach/vecto.git
After the git command completes, (ql:quickload "vecto") will load the library from local-projects rather than from the standard Quicklisp release.

How it works

The local-projects mechanism is relatively automatic. Here's how it works underneath, and how to fix problems that might crop up.

ASDF has an extensible mechanism (the asdf:*system-definition-search-functions* variable) for searching for system files. Quicklisp extends this mechanism with a function that does the following, all in the context of the local-projects directory.
  1. If there is no file named system-index.txt, it is created by scanning the directory tree for system files (matching "*.asd"). Each pathname is added to the file.
  2. If the system-index.txt file exists, but its timestamp is older than its containing directory, the directory is rescanned and the index recreated.
  3. The system-index.txt is searched for any entry with a pathname-name that matches the desired system name. If there's a match, the matching pathname is probed. If it still exists, it is returned. If it has disappeared, the system-index.txt is recreated as in step 1 and the search is retried.
  4. Otherwise the system search is deferred to the remaining ASDF system search functions.
When there are multiple system files with the same name in the directory tree, the one with the shortest full pathname is returned. In the case of a pathname length tie, the one that comes first under #'string< is returned.

Timestamp problems can sometimes crop up with step 2 above. For example, if you have a directory local-projects/my-project/ and you create local-projects/my-project/supporting-system.asd, the timestamp of local-projects/ is not updated and supporting-system.asd won't be automatically added to the system index file.

There are a couple of ways to force an update of the system index file. Within Lisp, you can use (ql:register-local-projects) to immediately regenerate system-index.txt. Outside of Lisp, you can use the touch command (or an equivalent) to update the timestamp of the local-projects directory, which will trigger a rebuild of the index on the next attempt to find systems.
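For example, from a Unix shell:
$ touch ~/quicklisp/local-projects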

Because of how the system index file is created (and recreated as needed), Quicklisp must have write access to the local-projects directory to make use of it.

Configuration

The local-projects mechanism is configured through a special variable ql:*local-project-directories*. By default, it includes only the local-projects subdirectory in the Quicklisp install directory, but you can add or remove directories at any time to have more places scanned for systems.
To disable the local-projects mechanism entirely, set ql:*local-project-directories* to NIL.
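For example, to have an extra directory scanned alongside the default one (the path here is just an illustration):
(push #p"/home/me/lisp/hacks/" ql:*local-project-directories*)
(ql:register-local-projects)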

Quicklisp newsBuild failures with ASDF 3.3.1

· 32 days ago
SBCL 1.4.3 ships with ASDF 3.3.1, and a number of Quicklisp projects have build problems as a result. Linedit, mgl, micmac, cl-string-match, and others are affected.

Here is a build failure report for yesterday. (You can ignore the gendl failures - it's a special case.) If anyone has ways to fix these projects, please do so as soon as you can - otherwise they will be removed from the January Quicklisp dist update in a few weeks.

Victor AnyakinReading a file line-by-line revisited

· 33 days ago

One of the frequent questions is how do you read a file line by line using Common Lisp?

A canonical answer, as formulated by the Practical Common Lisp, section 14. Files and File I/O is essentially the same as the one provided by the Common Lisp Cookbook (Reading a File one Line at a Time):

(let ((in (open "/some/file/name.txt" :if-does-not-exist nil)))
  (when in
    (loop for line = (read-line in nil)
        while line do (format t "~a~%" line))
    (close in)))

And basically it does the job.
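The same loop is often written with with-open-file instead, which also guarantees that the stream is closed on a non-local exit:

(with-open-file (in "/some/file/name.txt" :if-does-not-exist nil)
  (when in
    (loop for line = (read-line in nil)
        while line do (format t "~a~%" line))))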

But what happens if you deal with a log that has captured random bytes from a crashing application? Let's simulate this scenario by reading from /dev/urandom. SBCL will give us the following result:

debugger invoked on a SB-INT:STREAM-DECODING-ERROR in thread
#:  :UTF-8 stream decoding error on
#:   the octet sequence #(199 231) cannot be decoded.

Type HELP for debugger help, or (SB-EXT:EXIT) to exit from SBCL.

restarts (invokable by number or by possibly-abbreviated name):
  0: [ATTEMPT-RESYNC   ] Attempt to resync the stream at a character boundary
                         and continue.
  1: [FORCE-END-OF-FILE] Force an end of file.
  2: [INPUT-REPLACEMENT] Use string as replacement input, attempt to resync at
                         a character boundary and continue.
  3: [ABORT            ] Exit debugger, returning to top level.

(SB-IMPL::STREAM-DECODING-ERROR-AND-HANDLE # 2)

The same will be reported on other Lisp implementations. However, dealing with this problem is not really portable, and requires platform-specific switches and boilerplate code.

For example, on SBCL it is possible to specify a replacement character in the external-format specification:

(with-open-file (in "/dev/urandom"
                      :if-does-not-exist nil
                      :external-format '(:utf-8 :replacement "?"))
  ;; read lines
)

Other Lisps require a different and incompatible external format specification.

But there are actually other ways to read a file line-by line. cl-faster-input looks into some of them. Namely:

  • A standard read-line.
  • read-line-into-sequence suggested by Pascal Bourguignon in a cll discussion. Unlike the standard read-line this function reads lines into a pre-allocated buffer, reducing workload on the garbage collector.
  • read-ascii-line that is the part of the COM.INFORMATIMAGO.COMMON-LISP.CESARUM library.
  • ub-read-line-string from the ASCII-STRINGS package that is a part of the CL-STRING-MATCH library

Please check the src/benchmark-read-line.lisp in the sources repository.

Benchmarks show that the ub-read-line-string outperforms the standard read-line approach, does not require platform-specific switches, and allows trivial character substitution on the fly (like up/down casing the text, replacing control characters etc.)

Sample usage (from the sources):

(with-open-file (is +fname+ :direction :input :element-type 'ascii:ub-char)
    (loop with reader = (ascii:make-ub-line-reader :stream is)
       for line = (ascii:ub-read-line-string reader)
       while line
       count line))

On the developer's desktop it takes 1.71 seconds to complete the benchmark with the standard read-line, and 1.076 seconds with the ub-read-line-string benchmark. Memory consumption is on the same level as the standard read-line, though significantly higher than with read-line-into-sequence.

On Clozure CL 1.9 the read-ascii-line benchmark fails. The ub-read-line-string falls into an infinite loop.

On Embeddable CL 16.0 all functions work, but the ub-read-line-string takes almost 10 times more time to complete than any of the alternatives.

Conclusion: It might be reasonable to look at different approaches for reading files line-by-line if you plan to deal with large volumes of text data with a possibility of presence of malformed characters. Check the sources of cl-faster-input for different ideas, tweak and run the benchmarks as it suits your tasks.

P.S. this post has been written in September of 2015 but never published. As it appeared to be pretty complete I decided to post it now, in the January of 2018. Stay tuned…

Quicklisp newsDownload stats for December, 2017

· 43 days ago
Here are the raw Quicklisp download stats for December, 2017:

27247  alexandria
23604 closer-mop
21195 anaphora
20818 cl-ppcre
20690 split-sequence
20360 let-plus
20153 iterate
20032 babel
19888 trivial-features
18779 trivial-gray-streams
18077 bordeaux-threads
17215 cffi
16969 more-conditions
16966 trivial-garbage
16258 puri
16049 flexi-streams
15447 nibbles
14567 utilities.print-items
14366 usocket
13449 esrap
13366 chunga
13149 cl+ssl
12853 cl-base64
12701 chipz
12408 trivial-backtrace
12365 drakma
9502 cl-fad
9335 asdf-flv
9270 cl-yacc
8593 fiveam
8050 parse-number
7899 closure-common
7893 cxml
7878 log4cl
7798 local-time
7646 ironclad
7621 architecture.hooks
7347 named-readtables
7343 parser.common-rules
6783 plexippus-xpath
6767 cl-json
6708 lift
6050 optima
5425 lparallel
5234 cl-clon
5107 cxml-stp
5031 xml.location
4858 utilities.print-tree
4855 cl-dot
4430 cl-store
4055 fare-quasiquote
3963 fare-utils
3816 inferior-shell
3815 fare-mop
3707 cl-unicode
3432 cl-interpol
3321 slime
2919 trivial-utf-8
2848 cl-utilities
2830 metabang-bind
2744 quri
2628 uuid
2415 trivial-types
2375 cl-annot
2372 cl-syntax
2299 cl-slice
2255 md5
2247 trivial-indent
2234 array-utils
2229 plump
2227 documentation-utils
2226 static-vectors
2219 gettext
2107 symbol-munger
2101 arnesi
2092 collectors
2087 access
2086 fast-io
2065 djula
2056 cl-locale
2051 cl-parser-combinators
2014 hunchentoot
1910 simple-date-time
1844 ieee-floats
1625 yason
1352 rfc2388
1293 monkeylib-binary-data
1171 osicat
1163 salza2
1153 utilities.binary-dump
1135 postmodern
1044 trivial-shell
1015 prove
980 diff
949 cl-who
942 asdf-system-connections
936 command-line-arguments
933 cl-containers
931 cl-custom-hash-table
925 metatilities-base

Christophe Rhodesalgorithms and data structures term1

· 43 days ago

Another term, another cycle of Algorithms & Data Structures.

This year I am teaching the entire two-term course by myself, compared with last academic year when I co-taught with another member of staff. The change brings with it different challenges; notably that I am directly responsible for more content. Or in this case, more multiple-choice quizzes!

This term, the students have been given nine multiple-choice quizzes, of which four are new content and five are revisions of last year's quizzes. The revisions are mostly to add more variety to particular question types - finding axes of generalization that I hadn't exploited last time - and more (and more punishing) distractor answers, so that students cannot game the system too easily.

Some of this is my own adaptation to behaviours I see in the student body; for example, one behaviour I started seeing this year was the habit of selecting all the answers from a multiple-answer multiple-choice question. This was probably itself a reaction to the configuration of the quiz not telling the students the right answer after the attempt, but merely whether they had got it right or wrong; the behaviour of the quiz engine (the Moodle quiz activity) was for each selected answer to indicate the correctness status, and so students were exploiting this to see the right answers. This was not in itself a problem - there were enough questions that in the students' next attempts at the quiz they were unlikely to see the same questions again - but it was being used as a substitute for actually thinking and working at the problem, and so this was a behaviour that I wanted to discourage. The next quiz, therefore, I adapted so that it had many single-answer multiple-choice questions with many more distractor answers than usual: seven or eight, rather than the usual three or so. (I do not know whether the message got through.)

The new quizzes address some weaknesses I saw in the student body of knowledge last year, and indeed have seen in previous years too: a general lack of a robust mental model (or possibly "notional machine") of computation. To try to address this, I taught a specific dialect of pseudocode in the introductory lecture (nothing particularly esoteric; in fact, essentially what is provided by the algorithmicx LaTeX package). I then also wrote a small interpreter in emacs lisp for that pseudocode language (with an s-expression surface syntax, of course) and a pretty-printer from s-expressions to HTML, so that I could randomly generate blocks of pseudocode and ask students questions about them: starting with the basics, with sequences of expressions, and introducing conditionals, loops, nested loops, and loops with break.

The results of this quiz were revealing; at the start of the cycle, many students were unable to answer questions about loops at all - perhaps unsurprising as the complete description of the loop syntax was only given to the students in the second lecture. Even those who had intuited the meaning of the loop form that I was using had difficulty with nested loops, and this difficulty remained evident all the way to the end of the quiz period. (By way of comparison, students were able to deal with quite complicated chains of conditionals with not much more difficulty than straight-line pseudocode.) It will be very interesting to see whether the reforms and extensions we have put in place to our first-year curriculum will make a difference to this particular task next year.

Of course, once I had the beginnings of a pseudocode interpreter and pretty-printer, I then got to use it elsewhere, to the point that it has now grown (almost) enough functionality to warrant a static analyser and compiler. I'm resisting, because fundamentally it's about generating questions for multiple-choice quizzes, not about being a language for doing anything else with - but with an even fuller model for the computation, I could make something akin to Paul F. Dietz' Common Lisp random tester (still active after 15 years) which would probably have helped me spot a mistake I made when generating multiple-choice answers to questions about recursive algorithms, of the form "which of the following expressions replaces X in order to make this function return the sum of its two arguments?".

As well as the quizzes, the students have done six automatically-marked lab exercises and one technical peer-assessment. My direct involvement in assessment after setting all of these exercises has been limited to checking that the results of the peer-assessment are reasonable, by examining cases with outliers at a rate of maybe 6 minutes per student. Indirect involvement includes delivering lectures, answering questions face-to-face and on the forum, system administration, and writing code that writes code that writes summary feedback reports; this is probably a higher-value use of my time for the students than individualized marking; in that time, the students have received, on average: right-or-wrong judgments on 330 quiz questions (many of which have more than one possible right answer); 45 individual judgments on a moderately open algorithmic task; 50 marks and feedback on programming tasks; and a gentle helping of encouragement and sarcasm, in approximately equal measure, from me.

2017-18 A&DS term one coursework marks

The coursework marks are encouraging; there are a cluster of students at the top, and while the lower tail is disappointingly substantial it is artificially enhanced by a number of students who have (for whatever reason) dropped out. Limiting the analysis to those students who missed at most one assessment gives a more pleasing distribution; almost no-one who attempted everything received a failing mark, though I should temper my satisfaction with that by saying that I need to be careful that I'm not simply giving credit for being able to show up. (There are also some marks from this picture in the middle range, 50-70, which are withheld while allegations of plagiarism and/or collusion are resolved.)

And now, as I write, it is the last working day before term starts again, when the students return to find this term's lovingly prepared hastily assembled activities. Onward!

Zach BeaneVectometry is now part of Vecto

· 45 days ago

I wrote vecto to be able to draw stuff to PNGs. It's based on the PostScript/PDF drawing model, all the way down to the level of function arguments. To move to a point, you use (move-to x y). Curves are done with (curve-to x1 y1 x2 y2 x3 y3). Color calls are done with (set-rgb-fill r g b), etc. Each function argument that has multiple components is passed with the components separated.

This is all right, I guess, but it's also pretty inconvenient if you have an object that aggregates X and Y components to break them out all the time. Passing around six things instead of three or three things instead of one is annoying.

So, a long time ago, I made a more objecty frontend for Vecto and called it Vectometry. It has a load of convenience functions for working with points and colors as objects rather than separated components. It predefines some useful points (like *origin*) and colors (like *white* and *black*). It also adds a set of functions for working with rectangles, which are called "boxes" in the interface.

So, for example, the old vecto code might look like this:

(with-canvas (:height 100 :width 100)
  (move-to 0 0)
  (line-to 50 50)
  (line-to 100 0)
  (stroke)
  (save-png "foo.png"))

The new code looks something like this:

(let ((canvas (box 0 0 100 100)))
  (with-box-canvas canvas
    (move-to *origin*)
    (line-to (centerpoint canvas))
    (line-to (bottom-right canvas))
    (stroke)
    (save-png "bar.png")))

Boxes have maxpoint, minpoint, centerpoint, bottom-left, top-left, top-right, bottom-right, height, and width functions. They all return about what you'd expect.

But there's also a combine function that takes two boxes, or a point and a box, or two points, and returns a box big enough to cover the two objects.

And expand takes a box and an amount, and returns a new box that has its corners moved out by the specified amount, in all directions. And the nice thing about with-box-canvas is that if your box doesn't align with the origin, the drawing system still does - that is, the bottom left of your canvas box can be at positive or negative coordinates, but drawing at the origin will still draw at 0, 0.

displace takes a box and a point, and adds the point components to the minpoint of the box to produce a new box at a new location.

For points, there are new add, sub, mul, and div functions that do about what you'd expect. There's also a function angle to get the angle between two points, and a function apoint that produces a point at a specified angle and distance from the origin.

Color objects are easier to make and pass around. rgb-color does what you'd expect, but there's also an hsv-color that provides a much nicer way to get related colors, and even an html-color function so you can easily use "#rgb" or "#rrggbb" strings to set fill or stroke colors.
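To see how these pieces compose, here's a quick sketch (untested; it assumes a point constructor function and a hue/saturation/value argument order for hsv-color, and the file name is made up):

(let* ((a (box 0 0 40 40))
       (b (displace a (point 50 20)))       ; assumed POINT constructor
       (canvas (expand (combine a b) 10)))  ; covers both boxes, plus a margin
  (with-box-canvas canvas
    (set-fill-color (hsv-color 210 0.4 1.0)) ; assumed H/S/V argument order
    (clear-canvas)
    (set-stroke-color *black*)
    (move-to (centerpoint a))
    (line-to (centerpoint b))
    (stroke)
    (save-png "combined.png")))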

Everything else that doesn't deal with component arguments is just passed through verbatim to Vecto, so things like stroke or fill-path don't change.

This object stuff creates new objects all the time instead of mutating old ones. Maybe it's slower and puts more pressure on the GC. But for the stuff I do I haven't noticed and it hasn't mattered.

I've been sitting on this vectometry code, using it for all my drawing needs for years but never publicizing it, because it wasn't documented yet. But I'd rather put it into vecto and make it easily accessible and document it Someday rather than leave it buried.

If you like drawing stuff and 2D APIs and PNGs and stuff, try the latest vecto from Quicklisp and give it a whirl. If you have problems let me know. If you want me to post some example code and output images let me know. Enjoy!

edit Here's some code I posted to twitter, along with the 100-pointed and 5-pointed stars it draws:

(defun star-demo (box points dip-factor)
  (let* ((radius (/ (max (height box) (width box)) 2))
         (step (/ pi points))   ; half the angle from one point to the next
         (angle (/ pi 2))       ; start at the top of the circle
         (center (centerpoint box)))
    (with-box-canvas (expand box 5)
      (set-fill-color *white*)
      (clear-canvas)
      ;; a faint circle touching the star's points
      (set-stroke-color (rgba-color 0 0 0 0.5))
      (centered-circle-path center radius)
      (set-line-width 10)
      (stroke)
      (translate center)
      (move-to (apoint angle radius))
      (dotimes (i points)
        ;; alternate between the inner "dip" radius and the full radius
        (incf angle step)
        (line-to (apoint angle (* radius dip-factor)))
        (incf angle step)
        (line-to (apoint angle radius)))
      (set-fill-color *black*)
      (fill-path)
      (save-png (format nil "star~D.png" points)))))

Christophe Rhodessbcl 1 4 3 released

· 46 days ago

I released sbcl-1.4.3 just before the new year.

Mostly this blog post is to remind me that I need to improve the release script:

The script assumes that I build on x86-64. That is usually true, but this month there's a problem with building on my release build host, which is a squeeze-era VM: the new immobile-space support on x86-64 uses a synchronization builtin (__atomic_load()) that wasn't supported in gcc-4.4, and I didn't have time to investigate and test which of the __sync_ builtins is an acceptable replacement before vanishing off the internet for a couple of days. Probably:

hint = __sync_fetch_and_add(&page_hints[hint_index], 0);

Immobile space isn't (yet) supported on 32-bit x86. This meant that after shuffling some gpg keys around, I could do most of the release process on a 32-bit installation; the hard-coding of the architecture meant that I had to do some bits manually, so be more than usually suspicious when checking gnupg signatures.

Hopefully I'll arrange to fix some of this before the end of this month. (Testing the release script is a little tricky.)

Zach BeaneFASL package pitfall

· 49 days ago

In the past month I’ve seen the same failure affect multiple independent projects. Here’s how it happens:

Project A has code like this:

#+project-b (b:frob "Hello")

Project A’s system definition does not have an explicit dependency on project B.

Instead, the code that relies on project B is only read and compiled if project B happens to be loaded first; otherwise it's ignored.

There’s no problem if project A is compiled on its own. The project B code is ignored and the FASL can be loaded normally.

As soon as project B is present before project A is compiled, however, the project A FASLs contain a reference to B:FROB. In the next session, if project A is loaded from FASLs without loading project B, the system signals an error about a missing package B.

People write code like this to get optional extra functionality without unconditionally pulling in a new dependency.

I understand the desire to have optional dependencies, but I don’t think there’s a clear and direct way to express them in the current CL ecosystem. If you have a project’s symbols in your source code, you need to depend-on that project, or risk FASL-related package failure.
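If you do want the optional behaviour without the hard dependency, one workaround is to resolve the symbol at run time, so that no symbol from project B is ever read into project A's FASLs. Here's a sketch using only standard Common Lisp, reusing the B:FROB example from above (maybe-frob is a made-up name):

(defun maybe-frob (string)
  ;; Look up B:FROB at run time instead of read time, so the FASL
  ;; contains no literal reference to package B.
  (let* ((package (find-package '#:b))
         (frob (and package (find-symbol (string '#:frob) package))))
    (if frob
        (funcall frob string)
        nil))) ; project B absent; fall back to doing nothing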

Quicklisp newsDecember 2017 Quicklisp dist update now available

· 50 days ago
New projects:
  • cl-ascii-table — Common Lisp library to present tabular data in ascii-art table. — MIT
  • cl-editdistance — A Common Lisp implementation of edit distance. — CC-BY-4.0
  • cl-ntriples — CL-NTRIPLES is a simple basic parser for Ntriples data. — BSD
  • cl-proj — CL-PROJ provides Proj.4 library bindings — BSD
  • de.setf.wilbur — a fork of net.sourceforge.wilbur updated for mcl and sbcl — LLGPL
  • vectors — Some utilities for using vectors — LLGPL
Updated projects: a-cl-logger, able, ahungry-fleece, architecture.builder-protocol, architecture.service-provider, aws-sign4, babel, beast, cacle, carrier, cepl, chancery, chanl, chronicity, cl+ssl, cl-conllu, cl-cuda, cl-digraph, cl-diskspace, cl-disque, cl-emoji, cl-enumeration, cl-fad, cl-fond, cl-gamepad, cl-gobject-introspection, cl-gpio, cl-graph, cl-iconv, cl-interpol, cl-liballegro, cl-messagepack-rpc, cl-neovim, cl-ntp-client, cl-pcg, cl-portaudio, cl-portmanteau, cl-ppcre, cl-rabbit, cl-rail, cl-random-forest, cl-spidev, cl-str, cl-string-match, cl-unification, cl-webkit, clazy, closer-mop, clx, commonqt, configuration.options, croatoan, curry-compose-reader-macros, declt, delta-debug, esrap, f2cl, fare-csv, fare-scripts, femlisp, fiasco, flexi-streams, fxml, gamebox-frame-manager, gamebox-grids, gamebox-math, glkit, graph, halftone, harmony, horner, html-template, hunchentoot, legit, lichat-protocol, lichat-serverlib, lichat-tcp-server, lisp-critic, local-time, maiden, mcclim, more-conditions, new-op, oook, osc, overlord, papyrus, parser.common-rules, parser.ini, postmodern, projectured, qbase64, qt-libs, qtools, quickproject, sb-cga, scalpl, serapeum, simple-tasks, staple, stumpwm, tap-unit-test, the-cost-of-nothing, tinaa, trivial-bit-streams, trivial-package-manager, ubiquitous, uiop, url-rewrite, utilities.binary-dump, utilities.print-items, utilities.print-tree, vecto, workout-timer, xhtmlambda, zs3.

To get this update, use (ql:update-dist "quicklisp").

This update was created with an older version of SBCL. The latest SBCL includes ASDF 3.3.1, which breaks a handful of projects in ways that have not been resolved yet. If you use the latest SBCL, see the build failure report to get an idea of what might not work properly.

Enjoy!

Paul KhuongHow to Print Integers Really Fast (With Open Source AppNexus Code!)

· 57 days ago

Back in 2014, AppNexus needed a faster way to convert machine integers to strings. It’s a stupid problem to have–just use a binary format–but it’s surprisingly common. Maybe a project can only grow large enough for serialisation speed to be an issue if it first disregards some trivial concerns. I can finally share our solution: after failing to nicely package our internal support code for years, we recently got the OK to open source the AppNexus Common Framework (ACF) under the Apache License 2.0, despite its incomplete state.

In the words of a friend and former colleague:

If you don’t want to read more about what’s in ACF and why I feel it’s important to open source imperfect repositories, jump to the section on fast itoa.

ACF contains the base data structure and runtime library code we use to build production services, in C that targets Linux/x86-64. Some of it is correctly packaged; most of it is just the raw files from our internal repository. Ironically, after settling on the project's name, we decided not to publish the most "framework-y" bits of code: it's unclear why anyone else would want to use them. The data structures are in C, and tend to be read-optimised, with perhaps some support for non-blocking single-writer/multi-reader concurrency. There are also non-blocking algorithms to support the data structures, and basic HTTP server code that we find useful to run CPU-intensive or mixed CPU/network-intensive services.

Publishing this internal code took a long time because we were trying to open a project that didn’t exist yet, despite being composed of code that we use every day. AppNexus doesn’t sell code or binaries. Like many other companies, AppNexus sells services backed by in-house code. Our code base is full of informal libraries (I would be unable to make sense of the code base if it wasn’t organised that way), but enforcing a clean separation between pseudo-libraries can be a lot of extra work for questionable value.

These fuzzy demarcations are made worse by the way we imported some ideas directly from Operating Systems literature, in order to support efficient concurrent operations. That had a snowball effect: everything, even basic data structures, ends up indirectly depending on runtime system/framework code specialised for our use case. The usual initial offenders are the safe memory reclamation module, and the tracking memory allocator (with a bump pointer mode); both go deep in internals that probably don’t make sense outside AppNexus.

Back in 2015, we looked at our support code (i.e., code that doesn’t directly run the business) and decided we should share it. We were–and still are–sure that other people face similar challenges, and exchanging ideas, if not directly trading code, can only be good for us and for programming in general. We tried to untangle the “Common” (great name) support library from the rest of the code base, and to decouple it from the more opinionated parts of our code, while keeping integration around (we need it), but purely opt-in.

That was hard. Aiming for a separate shared object and a real Debian package made it even harder than it had to be. The strong separation between packaged ACF code and the rest of the repo added a lot of friction, and the majority of the support code remained in-tree.

Maybe we made a mistake when we tried to librarify our internals. We want a library of reusable code; that doesn’t have to mean a literal shared object. I’m reminded of the two definitions of portable code: code sprinkled with platform conditionals, or code that can be made to run on a new machine with minimal effort. Most of the time, I’d rather have the latter. Especially when code mostly runs on a single platform, or is integrated in few programs, I try to reduce overhead for the common case, while making reuse possible and easy enough that others can benefit.

And that’s how we got the ACF effort out of the door: we accepted that the result would not be as polished as our favourite open source libraries, and that most of the code wouldn’t even be packaged or disentangled from internals. That’s far from an ideal state, but it’s closer to our goals than keeping the project private and on the backburner. We got it out by “feature” boxing the amount of work–paring it down to figuring out what would never be useful to others, and tracking down licenses and provenance–before pushing the partial result out to a public repository. Unsurprisingly, once that was done, we completed more tasks on ACF in a few days than we have in the past year.

Now that ACF is out, we still have to figure out the best way to help others co-opt our code, to synchronise the public repository with our internal repository, and, in my dreams, to accept patches for the public repo and have them also work for the internal one. In the end, what’s important is that the code is out there with a clear license, and that someone with similar problems can easily borrow our ideas, if not our code.

The source isn’t always pretty, and is definitely not as well packaged and easily re-usable as we’d like it to be, but it has proved itself in production (years of use on thousands of cores), and builds to our real needs. The code also tries to expose correct and efficient enough code in ways that make correct usage easy, and, ideally, misuse hard. Since we were addressing specific concrete challenges, we were able to tweak contracts and interfaces a bit, even for standard functionality like memory allocation.

The last two things are what I'm really looking for when exploring other people's support code: how did usage and development experience drive interface design, and what kind of non-standard tradeoffs allowed them to find new low-hanging fruit?

If anyone else is in the same situation, please give yourself the permission to open source something that’s not yet fully packaged. As frustrating as that can be, it has to be better than keeping it closed. I’d rather see real, flawed but production-tested, code from which I can take inspiration than nothing at all.

How to print integers faster

The integer to string conversion file (an_itoa) is one instance of code that relaxes the usual [u]itoa contract because it was written for a specific problem (which also gave us real data to optimise for). The relaxation stems from the fact that callers should reserve up to 10 chars to convert 32 bit (unsigned) integers, and 20 chars for 64 bit ones: we let the routines write garbage (0/NUL bytes) after the converted string, as long as it’s in bounds. This allowance, coupled with a smidge of thinking, let us combine a few cute ideas to solve the depressingly common problem of needing to print integers quickly. Switching to an_itoa might be a quick win for someone else, so I cleaned it up and packaged it immediately after making the repository public.

We wrote an_itoa in July 2014. Back then, we had an application with a moderate deployment (a couple racks on three continents) that was approaching capacity. While more machines were in the pipeline, a quick perf run showed it was spending a lot of time converting strings to integers and back. We already had a fast-ish string to integer function. Converting machine integers back to string however, is a bit more work, and took up around 20% of total CPU time.

Of course, the real solution here is to not have this problem. We shouldn’t have been using a human-readable format like JSON in the first place. We had realised the format would be a problem a long time ago, and were actually in the middle of a transition to protobuf, after a first temporary fix (replacing a piece of theoretically reconfigurable JavaScript that was almost never reconfigured with hardcoded C that performed the same JSON manipulation). But, there we were, in the middle of this slow transition involving terabytes of valuable persistent data, and we needed another speed boost until protobuf was ready to go.

When you’re stuck with C code that was manually converted, line by line, from JavaScript, you don’t want to try and make high level changes to the code. The only reasonable quick win was to make the conversion from integer to string faster.

Human readable formats wasting CPU cycles to print integers is a common problem, and we quickly found a few promising approaches and libraries. Our baseline was the radix-10 code in stringencoders. This post about Lwan suggested using radix-10, but generating strings backward instead of reversing like the stringencoders library. Facebook apparently hit a similar problem in 2013, which led to this solution by Andrei Alexandrescu. The Facebook code combines two key ideas: radix-100 encoding, and finding the length of the string with galloping search to write the result backward, directly where it should go.
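To make the backward-generation idea concrete, here's a minimal radix-10 sketch in Lisp (an illustration of the idea, not code from any of the libraries mentioned): fill the buffer from the end, so no reversal pass or up-front length computation is needed.

(defun itoa-backward (n buffer)
  ;; Write the decimal digits of N into the tail of BUFFER and
  ;; return the index where the digits start.
  (let ((i (length buffer)))
    (loop (multiple-value-bind (quotient digit) (floor n 10)
            (setf (aref buffer (decf i)) (code-char (+ #x30 digit))
                  n quotient))
          (when (zerop n) (return i)))))

;; (let ((buf (make-string 20))) (subseq buf (itoa-backward 12345 buf)))
;; => "12345"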

Radix-100 made sense, although I wasn’t a fan of the 200-byte lookup table. I was also dubious of the galloping search; it’s a lot of branches, and not necessarily easy to predict. The kind of memmove we need to fixup after conversion is small and easy to specialise on x86, so we might not need to predict the number of digits at all.

I then looked at the microbenchmarks for Andrei’s code, and they made it look like the code was either tested on integers with a fixed number of digits (e.g., only 4-digit integers), or randomly picked with uniform probability over a large range.

If the number of digits is fixed, the branchiness of galloping search isn’t an issue. When sampling uniformly... it’s also not an issue because most integers are large! If I pick an integer at random in [0, 1e6), 90% of the integers have 6 digits, 99% 5 or 6, etc.

Sometimes, uniform selection is representative of the real workload (e.g., random uids or sequential object ids). Often, not so much. In general, small numbers are more common; for example, small counts can be expected to roughly follow a Poisson distribution.

I was also worried about data cache footprint with the larger lookup table for radix-100 encoding, but then realised we were converting integers in tight loops, so the lookup table should usually be hot. That also meant we could afford a lot of instruction bytes; a multi-KB atoi function wouldn’t be acceptable, but a couple hundred bytes was fine.

Given these known solutions, John and I started doodling for a bit. Clearly, the radix-100 encoding was a good idea. We now had to know if we could do better.

Our first attempt was to find the number of decimal digits more quickly than with the galloping search. It turns out that approximating \(\log\sb{10}\) is hard, and we gave up ;)

We then realised we didn’t need to know the number of decimal digits. If we generated the string in registers, we could find the length after the fact, slide bytes with bitwise shifts, and directly write to memory.

I was still worried about the lookup table: the random accesses in the 200 byte table for radix-100 encoding could hurt when converting short arrays of small integers. I was more comfortable with some form of arithmetic that would trade best-case speed for consistent, if slightly sub-optimal, performance. As it turns out, it’s easy to convert values between 0 and 100 to unpacked BCD with a reciprocal multiplication by \( 1/10 \) and some in-register bit twiddling. Once we have a string of BCD bytes buffered in a general purpose register, we can vertically add '0' to every byte in the register to convert to ASCII characters. We can even do the whole conversion on a pair of such values at once, with SIMD within a register.
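The scalar core of that trick looks something like this (a sketch of the arithmetic in Lisp, not the actual an_itoa code, which does this on bytes packed in a general-purpose register; 103/1024 is one workable choice of reciprocal constant):

(defun two-digit-ascii (n)
  ;; For n in [0, 100), (n * 103) >> 10 equals floor(n / 10), so we
  ;; get the tens digit without a division instruction.
  (let* ((tens (ash (* n 103) -10))
         (ones (- n (* tens 10))))
    ;; (tens ones) is the unpacked BCD; adding #x30 (the code of #\0)
    ;; to each byte converts it to ASCII.
    (coerce (list (code-char (+ #x30 tens))
                  (code-char (+ #x30 ones)))
            'string)))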

The radix-100 approach is nice because it chops up the input two digits at a time; the makespan for a given integer is roughly half as long, since modern CPUs have plenty of execution units for the body.

The dependency graph for radix-10 encoding of 12345678 looks like the following, with 7 serial steps.

Going for radix-100 halves the number of steps, to 4. The steps are still serial, except for the conversion of integers in [0, 100) to strings.

Could we expose even more ILP than the radix-100 loop?

The trick is to divide and conquer: divide by 10000 (1e4) before splitting each group of four digits with a radix-100 conversion.

Recursive encoding gives us fewer steps, and 2 of the 3 steps can execute in parallel. However, that might not always be worth the trouble for small integers, and we know that small numbers are common. Even if we have a good divide-and-conquer approach for larger integers, we must also implement a fast path for small integers.

The fast path for small integers (or the most significant limb of larger integers) converts a 2- or 4- digit integer to unpacked BCD, bitscans for the number of leading zeros, converts the BCD to ASCII by adding '0' (0x30) to each byte, and shifts out any leading zero; we assume that trailing noise is acceptable, and it’s all NUL bytes anyway.

For 32-bit integers an_itoa (really an_uitoa) looks like:

if number < 100:
    execute specialised 2-digit function
if number < 10000:
    execute specialised 4-digit function

partition number into its low 4 digits, the next 4 digits, and the remainder (digits above 1e8).

convert the first 2 groups of 4 digits to strings.
if the number is < 1e8:  # remainder is 0!
    shift out leading zeros, print string.
else:
    print remainder  # less than 100, since 2^32 < 1e10
    print the strings for the first 2 groups of 4 digits.

The 64-bit version, an_ltoa (really an_ultoa), is more of the same, with differences when the input number exceeds 1e8.

Is it actually faster?

I’ve already concluded that cache footprint was mostly not an issue, but we should still made sure we didn’t get anything too big.

  • an_itoa: 400 bytes.
  • an_ltoa: 880 bytes.
  • fb_itoa: 426 bytes + 200 byte LUT
  • fb_constant_itoa (without the galloping search): 172 bytes + 200 byte LUT
  • lwan_itoa (radix-10, backward generation): 60 bytes.
  • modp_uitoa10: 91 bytes.

The galloping search in Facebook’s converter takes a lot of space (there are a ton of conditional branches, and large numbers must be encoded somewhere). Even if we disregard the lookup table, an_itoa is smaller than fb_itoa, and an_ltoa (which adds code for > 32 bit integers) is only 254 bytes larger than fb_itoa (+ LUT). Now, Facebook’s galloping search attempts to make small integers go faster by checking for them first; if we convert small numbers, we don’t expect to use all ~250 bytes in the galloping search. However, an_itoa and an_ltoa are similar: the code is set up such that larger numbers jump forward over specialised subroutines for small integers. Small integers thus fall through to only execute code at the beginning of the functions. 400 or 880 bytes are sizable footprints compared to the 60 or 90 bytes of the radix-10 functions, but acceptable when called in tight loops.

Now that we feel like the code and lookup table sizes are reasonable (something that microbenchmarks rarely highlight), we can look at speed.

I first ran the conversion with random integers in each digit count class, from 1 digit (i.e., numbers in [0, 10)) to 19 digits (numbers in [1e18, 1e19)). The instruction cache was hot, but the routines were not warmed on that size class of numbers (more realistic that way).

The results are cycle counts (with the minimum overhead for a no-op conversion subtracted from the raw count), on an unloaded 2.4 GHz Xeon E5-2630L, a machine that’s similar to our older production hardware.

We have data for:

  • an_itoa, our 32 bit conversion routine;
  • an_ltoa, our 64 bit conversion routine;
  • fb_constant_itoa, Facebook’s code, with the galloping search stubbed out;
  • fb_itoa, Facebook’s radix-100 code;
  • itoa, GNU libc conversion (via sprintf);
  • lw_itoa, Lwan’s backward radix-10 converter;
  • modp, stringencoder’s radix-10 / strreverse converter.

I included fb_constant_itoa to serve as a lower bound on the radix-100 approach: the conversion loop stops as soon as it hits 0 (same as fb_itoa), but the data is written at a fixed offset, like lw_itoa does. In both fb_constant_itoa’s and lw_itoa’s cases, we’d need another copy to slide the part of the output buffer that was populated with characters over the unused padding (that’s why fb_itoa has a galloping search).

When I chose these functions back in 2014, they were all I could find that was reasonable. Since then, I’ve seen one other divide and conquer implementation, although it uses a lookup table instead of arithmetic to convert radix-100 limbs to characters, and an SSE2 implementation that only pays off for larger integers (32 bits or more).

Some functions only go up to UINT32_MAX, in which case we have no data after 9 digits. The raw data is here; I used this R script to generate the plot.

The solid line is the average time per conversion (in cycles), over 10K data points, while the shaded region covers the 10th percentile to the 90th percentile.

(GNU) libc’s conversion is just way out there. The straightforward modp (stringencoders) code overlaps with Facebook’s itoa; it’s slightly slower, but so much smaller.

We then have two incomplete string encoders: neither fb_constant_itoa nor lw_itoa generates its output where it should go. They fill a buffer from the end, and something else (not benchmarked) is responsible for copying the valid bytes where they belong. If an incomplete implementation suffices, Lwan’s radix-10 approach is already competitive with, arguably faster than, the Facebook code. The same backward loop, but in radix-100, is definitely faster than Facebook’s full galloping search/radix-100 converter.

Finally, we have an_itoa and an_ltoa, which are neck and neck with one another, faster than both modp and fb_itoa on small and large integers, and even comparable with or faster than the incomplete converters. Their runtime is also more reliable (less variance) than modp’s and fb_itoa’s: modp pays for the second variable-length loop in strreverse, and fb_itoa for the galloping search. There are more code paths in an_itoa and an_ltoa, but no loop, so the number of (unpredictable) conditional branches is lower.

What have we learned from this experiment?

  1. It’s easy to improve on (g)libc’s sprintf. That makes sense, since that code is so generic. However, in practice, we only convert to decimal, some hex, even less octal, and the rest is noise. Maybe we can afford to special case these bases.
  2. The double traverse in modp_uitoa10 hurts. It does make sense to avoid that by generating backward, ideally in the right spot from the start.
  3. Radix-100 encoding is a win over radix-10 (fb_constant_itoa is faster than lwan_itoa).
  4. Using registers as buffers while generating digits is a win (an_itoa and an_ltoa are faster for small values).
  5. Divide and conquer is also a win (an_ltoa is flatter for large integers).

With results that made sense for an easily understood microbenchmark, I decided to try a bunch of distributions. Again, the code was hot, the predictors lukewarm, and we gathered 10K cycle counts per distribution/function. The raw data is here, and I used this R script to generate the plot.

The independent variables are all categorical here, so I use one facet per distribution, and, in each facet, a boxplot per conversion function, as well as a jittered scatter plot to show the distribution of cycle counts.

Clearly, we can disregard glibc’s sprintf (itoa).

The first facet generated integers by choosing uniformly between \(100, 1000, 10\sp{4}, \ldots, 10\sp{8}\). That’s a semi-realistic variation on the earlier dataset, which generated a bunch of numbers in each size class, and serves as an easily understood worst-case for branch prediction. Both an_itoa and an_ltoa are faster than the other implementations, and branchier implementations (fb_itoa and modp) show their variance. Facebook’s fb_itoa isn’t even faster than modp’s radix-10/strreverse encoder. The galloping search really hurts: fb_constant_itoa, without that component, is slightly faster than the radix-10 lw_itoa.

The second facet is an even harder case for branch predictors: random values skewed with an exponential (pow(2, 64.0 * random() / RAND_MAX)), to simulate real-world counts. Both an_itoa and an_ltoa are faster than the other implementations, although an_ltoa less so: an_itoa only handles 32 bit integers, so it deals with less entropy. Between the 32-bit implementations, an_itoa is markedly faster and more consistent than lw_itoa (which is incomplete) and modp. Full 64-bit converters generally exhibit more variance in runtime (their input is more randomised), but an_ltoa is still visibly faster than fb_itoa, and even than the incomplete fb_constant_itoa. We also notice that fb_itoa’s runtimes are more spread out than fb_constant_itoa’s: the galloping search adds overhead in time, but also a lot of variance. That makes me think that the Facebook code is more sensitive than others to differences in data distribution between microbenchmarks and production.

The third facet should be representative of printing internal sequential object ids: uniform integers in [0, 256K). As expected, every approach is tighter than with the skewed “counts” distribution (most integers are large). The an_itoa/an_ltoa options are faster than the rest, and it’s far from clear that fb_itoa is preferable to even modp. The range was also chosen because it’s somewhat of a worst case for an_itoa: the code does extra work for values between \(10\sp{4}\) and \(10\sp{8}\) to have more to do before the conditional branch for x < 1e8. That never pays off in the range tested here. However, even with this weakness, an_itoa still seems preferable to fb_itoa, and even to the simpler modp_uitoa10.

The fourth facet (first of the second row) shows what happens when we choose random integers in [0, 20). That test case is interesting because it’s small, thus semi-representative of some of our counts, and because it needs 1 or 2 digits with equal probability. Everything does pretty well, and runtime distributions are overall tight; branch predictors can do a decent job when there are only two options. I’m not sure why there’s such a difference between an_itoa’s and an_ltoa’s distributions. Although the code for any value less than 100 is identical at the C level, there are small differences in code generation... but I can’t pinpoint where the difference might come from.

The fifth facet, for random integers in [100, 200) is similar, with a bit more variance.

The sixth facet generates unix timestamps around a date in 2014, with uniform selection plus or minus one million seconds. It’s meant to be representative of printing timestamps. Again, an_itoa and an_ltoa are faster than the rest, with an_itoa being slightly faster and more consistent. Radix-100 (fb_constant_itoa) is faster and more consistent than radix-10 (lw_itoa), but it’s not clear if fb_itoa is preferable to modp. The variance for modp is larger than for the other implementations, even fb_itoa: that’s the cost of a radix-10 loop and of the additional strreverse.

This set of results shows that conditional branches are an issue when converting integers to strings, and that the impact of branches strongly depends on the distribution. The Facebook approach, with a galloping search for the number of digits, seems particularly sensitive to the distribution. Running something like fb_itoa because it does well in microbenchmark is thus only a good idea if we know that the microbenchmark is representative of production.

Bigger numbers take more time to convert, but the divide and conquer approach of an_itoa and an_ltoa is consistently faster at the high end, while their unrolled SIMD-within-a-register fast path does well for small numbers.

So, if you ever find yourself bottlenecked on s[n]printf

The correct solution to the “integer printing is too slow” problem is simple: don’t do that. After all, remember the first rule of high performance string processing: “DON’T.” When there’s no special requirement, I find Protobuf does very well as a better JSON.

However, once you find yourself in this bad spot, it’s trivial to do better than generic libc conversion code. This makes it a dangerously fun problem in a way... especially given that the data distribution can matter so much. No benchmark is perfect, but various implementations are affected differently by flaws in microbenchmarks. It’s thus essential not to overfit on the benchmark data, probably even more important than improving performance by another factor of 10% or 20% (doing 4-5x better than libc code is already a given). That’s why I prefer integer conversion code with more consistent cycle counts: there’s less room for differences due to the distribution of data.

Finally, if, like 2014-AppNexus, you find yourself converting a lot of integers to strings in tight loops (on x86-64 machines), try an_itoa or an_ltoa! The whole repository is Apache 2.0, and it should be easy to copy and paste all the dependencies to pare it down to two files. If you do snatch our code, note that the functions use their destination array (up to 10 bytes for an_itoa, and 20 for an_ltoa) as scratch space, even for small integers.

Thank you for reviewing drafts, John, Ruchir, Shreyas, and Andrew.

Luís OliveiraA Lisp REPL in your pocket

· 60 days ago

Thanks to Polos Ruetz, you can now play with Common Lisp directly on your Android phone. All you need to do is install CL REPL from the Google Play Store. CL REPL is one of the example applications that are part of EQL5-Android, which is built on top of EQL5, a project that marries ECL with Qt.



CL REPL

A Common Lisp REPL with command line and history, plus a simple editor with syntax highlighting, simple visual paren-matching, a file dialog for opening/saving files, and a simple debug dialog. It uses the ECL implementation for the Lisp side, and Qt5/QML for the UI. This is an open source project (see EQL5-Android).


(via @dk_jackdaniel)

ECL NewsECL license

· 65 days ago

From time to time a little misconception emerges regarding the ECL license - namely some people seem to have the impression that it is GPL-2.0, while in fact it is LGPL-2.1+. In this post I want to talk a little about the history of ECL and to describe what the practical implications of such licensing are.

The heritage of Embeddable Common Lisp is rich, as you can read here. The software has had a few maintainers throughout its history who hold copyrights to various parts of the code. ECL was originally licensed under GPL-2.0; after ECLS (or ECL-Spain) and ECoLisp became one project again, Prof. Juanjo García-Ripoll changed the license to LGPL-2.1+ with the agreement of Prof. Giuseppe Attardi. That's the point from which I took over in 2015, with the blessing of the previous maintainer. I do not own all copyrights to the software and I can't change its license to anything that is incompatible with working in LGPL-2.1+. Note that parts of the codebase are licensed more liberally (like the programs in the examples directory, which may be used for any purpose and are licensed under the terms of BSD-2-Clause).

That said, I feel very comfortable with the current licensing. It preserves a reasonable balance between producer and consumer rights and fits the project's goals perfectly. The meaning of our current license is, in short, the following: you can use ECL for any purpose in any setting (including commercial applications), but if you commit changes to ECL itself you are obliged to share these changes (and only them).

The core library is a shared object, libecl.so, and it is dynamically linked with the software. The binary called ecl is just a program which is a client of this library. Moreover, ECL compilation artifacts are usually shared objects themselves (usually disguised under the fas extension):

➜  ~ ldd `which ecl`
    linux-vdso.so.1 =>  (0x00007ffff80c3000)
    libecl.so.16.1 => /home/jack/Pulpit/lisps/ecl-16.1.3/lib/libecl.so.16.1 (0x00007fc7c4665000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc7c427a000)
    libgmp.so.10 => /usr/lib/x86_64-linux-gnu/libgmp.so.10 (0x00007fc7c3ffa000)
    libffi.so.6 => /usr/lib/x86_64-linux-gnu/libffi.so.6 (0x00007fc7c3df2000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fc7c3bd4000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fc7c39ce000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fc7c36c5000)
    /lib64/ld-linux-x86-64.so.2 (0x00005592f1735000)
➜  ~ cd .cache/common-lisp/ecl-16.1.3-21f0b92f-linux-x64/home/jack/Pulpit/repo/mcclim/Apps/Listener
➜  Listener ls listener*
listener.fas listener.o
➜  Listener ldd listener.fas 
    linux-vdso.so.1 =>  (0x00007fffb43f5000)
    libecl.so.16.1 => /home/jack/Pulpit/lisps/ecl-develop/lib/libecl.so.16.1 (0x00007fa2bbfc1000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa2bbbd6000)
    libgmp.so.10 => /usr/lib/x86_64-linux-gnu/libgmp.so.10 (0x00007fa2bb956000)
    libffi.so.6 => /usr/lib/x86_64-linux-gnu/libffi.so.6 (0x00007fa2bb74e000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fa2bb530000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fa2bb32a000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fa2bb021000)
    /lib64/ld-linux-x86-64.so.2 (0x0000563db6716000)
➜  Listener file listener.fas 
listener.fas: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=ed1cece88028eb3a388ab0a589a9ee12415532e9, not stripped

There are a few implications of this. First, I will explain (informally) what the LLGPL clarification to the LGPL is. Many Common Lisp implementations don't work with the notion of linking, so "static linking" and "dynamic linking" don't make much sense in them. Libraries are simply added to the image. That may raise the question of whether a binary with an LGPL library in it should be considered a derivative work or not. My strong belief lies on the side that it is not a derivative work: if the author of a Lisp library gave it the LGPL license, they meant it. Under the other interpretation, the license would be no different from the GPL, which would make it nonsense. I think that what is tested in court is intention, and there is no rational interpretation of giving a Lisp library the LGPL license except the one where the library does not influence the licensing of the other image components. The LLGPL additionally clarifies some wording to make it less ambiguous and to clear up unnecessary doubts.

All that said, ECL is safe from concerns like that and such clarification is not necessary because it works with the very same notion LGPL talks about (static and dynamic linking). There is no program image, only objects which a binary is composed of and linked with (exactly like in C programs).

Another interesting license (with which ECL is compatible thanks to the "or later version" clause) is LGPL-3.0. This license is used, for example, by ECL's fork MKCL, and by GMP since version 6. In short, this license adds an important restriction to LGPL-2.1+ called the anti-tivoization provision. This provision adds a new freedom for software users: they must have the ability to replace the library on a device running software under this license with their own version of it. This effectively means that the device can't be a complete black box.

This leads us to another topic. ECL is licensed under the LGPL-2.1+ license and it is linked with GMP. As noted in the paragraph above, the newest version of GMP is licensed under LGPL-3.0. In practice this means that if you use ECL and GMP in your application and either of them is LGPL-3.0, you can't put such a bundle on a camera device which doesn't allow changing its software. To prevent such a situation, ECL bundles its own version of GMP: a slightly modified GMP version 4, which was licensed under the LGPL-2.1+ terms. By default the build tries to link against the libgmp.so on your system, but given appropriate configure flags it will use the bundled GMP version and link it statically into libecl.so (ECL doesn't export symbols from libeclgmp.a, to avoid symbol name conflicts with the original libgmp.so).

I think that summarises it. Now I will provide a made-up FAQ to illustrate the licensing implications in shorter form:

Q: Is ECL GPL-2.0?
A: No, ECL is LGPL-2.1+.

Q: Can you provide a commercial license for us?
A: No, I don't own all the copyrights.

Q: Can we use ECL in our commercial product with proprietary components?
A: Yes, but you have to keep ECL linked dynamically with them (as a shared object).

Q: Can you incorporate proprietary components in ECL?
A: God no (and I wouldn't do that even if I could).

Q: Can we use ECL in our LGPL/GPL/AGPL product?
A: Yes, you can even link ECL statically for that purpose. Your license will be intact.

Q: Can you incorporate LGPL/GPL/AGPL components in ECL?
A: If you incorporate LGPL-2.1+, then ECL remains in LGPL-2.1+ and it can be integrated in the upstream; but if you incorporate LGPL-3.0, GPL or AGPL, then your fork of ECL will become LGPL-3.0, GPL or AGPL and it won't be integrated upstream.

Q: Can we use ECL in our BSD/MIT/Apache-2.0 product?
A: Yes. If it is dynamically linked there are no further implications. If you link it statically, the overall license is LGPL-2.1+.

Q: Can you incorporate BSD/MIT/Apache-2.0 components in ECL?
A: Yes, sometimes I do that.

Q: If I compile my software with ECL is it LGPL-2.1+?
A: No, products of compilation are not influenced by the compiler's license. You may compile any kind of free or proprietary software with ECL without any implications for the compilation artifacts or the code it compiles. I would appreciate not using it for something unethical, though.

Q: Can I put ECL in an embedded device to which the consumer doesn't have access?
A: Yes. You may need to build ECL with bundled GMP library to avoid LGPL-3.0 implications.

Q: If I use two libraries in my application - one being LGPL and the other MIT - what is my program license?
A: That depends. If you link them statically (monolithic programs), then resulting work will be covered at least with LGPL (you may add more restrictions if you want). If you link them dynamically (default), then you may use any license you want.

Q: If I use GPL libraries in my application - what is my program license?
A: Its license is at least GPL.

Q: Are you a lawyer?
A: Nope. You may want to consult one. But hey, you should also consult a lawyer regarding the terms of service you probably agree to while surfing the web, and the EULAs bundled with your favourite software and hardware (e.g. your phone).

Q: Did you cover all LGPL-2.1+ implications?
A: No, I recommend reading the license. I have talked about things which I find relevant to this post.

Q: Can I buy something from you?
A: Yes, you may buy developer time to work on ECL or to help integrate ECL with your application. I'll probably do it anyway if you drop by on the IRC channel.

If you have more questions you may ask on the mailing list and IRC (channel #ecl on freenode).

Thank you for reading,
Daniel Kochmański

--

  • I want to thank Elias Mårtenson, Pascal Bourguignon and Tomek Kurcz for proofreading this text and for providing valuable remarks before publishing it on the blog.

  • In this post I've used SPDX-License-Identifier format where appropriate.

Didier VernaAnnouncing Quickref: a global documentation project for Common Lisp

· 67 days ago

Today, we deployed the first version of Quickref, a new global documentation project for Common Lisp.

The purpose of Quickref is to provide a centralized collection of reference manuals for the whole Quicklisp world. This means around 1500 libraries, for a total of around 3000 ASDF systems. The reference manuals are generated by Declt, which is probably the most complete documentation system for Common Lisp currently available, and delivered in HTML (PDF versions could easily be made available as well).

A lot of things can still be improved, but I'm pretty satisfied with the result so far. 3000 ASDF systems is a hell of a test suite for Declt, and I'm happy to report that it passes on practically all of them. Only a couple of issues remain, not even due to Declt itself, and fewer than a dozen libraries still pose problems (mostly technical difficulties due to foreign dependencies).

Quickref was made by Antoine Martin, as part of an internship with me. Many thanks to him! We still have some cleanup and packaging to do, but we expect to open-source the infrastructure soon. I also want to thank Mark Evenson, Erik Huelsmann and the Common Lisp Foundation for hosting the project on common-lisp.net (it was only natural)!

Finally, let me restate this again (and again): reference manuals are not user manuals. They are... reference manuals. Although automatically generated, there are some things you can do, as a library author, to improve the output (this is an area of Declt which I intend to work on in the future). Please refer to the Declt user manual (notably section 3.2 Coding Style) for more information.

TurtleWareMcCLIM demo - St Nicolas Day present

· 74 days ago

One of the projects I work on is McCLIM, a GUI toolkit for Common Lisp. For a few weeks now I have been thinking about recording a demo session that shows some random usage of this software. If you are interested in taking the time to watch it, it runs around 30 minutes.

This is my first tutorial video recorded at home, so I would appreciate any feedback on what I did well and what I did wrong. Thank you, and enjoy the video!

Quicklisp newsNovember 2017 Quicklisp download stats

· 75 days ago

Here are the raw download counts for the top 100 projects in Quicklisp for November:

16654  alexandria
14200 closer-mop
13130 split-sequence
12669 cl-ppcre
12667 anaphora
12397 babel
12381 trivial-features
11989 iterate
11818 trivial-gray-streams
11520 bordeaux-threads
10907 let-plus
10821 cffi
10380 trivial-garbage
9687 flexi-streams
9603 puri
9396 nibbles
8847 usocket
8712 more-conditions
8008 cl+ssl
7807 trivial-backtrace
7741 cl-base64
7707 chunga
7505 utilities.print-items
7409 chipz
7269 esrap
7098 drakma
6996 cl-fad
5862 cl-yacc
5862 ironclad
5524 named-readtables
5337 local-time
5239 parse-number
4966 cxml
4966 closure-common
4878 fiveam
4712 asdf-flv
4565 cl-json
4515 log4cl
4375 bt-semaphore
4258 architecture.hooks
3904 plexippus-xpath
3886 lparallel
3745 parser.common-rules
3576 lift
3486 optima
3238 cl-dot
3166 slime
3109 cl-unicode
3101 cl-interpol
3061 cxml-stp
3036 cl-store
3013 cl-clon
2910 xml.location
2890 trivial-utf-8
2730 utilities.print-tree
2660 uuid
2606 fare-utils
2595 md5
2521 fare-quasiquote
2489 metabang-bind
2488 static-vectors
2401 fare-mop
2400 cl-utilities
2398 inferior-shell
2246 ieee-floats
2227 quri
2174 fast-io
1961 hunchentoot
1956 trivial-types
1930 cl-annot
1921 cl-syntax
1711 symbol-munger
1687 trivial-indent
1679 collectors
1671 arnesi
1671 access
1661 rfc2388
1642 cl-slice
1631 documentation-utils
1626 array-utils
1623 yason
1623 plump
1619 cl-parser-combinators
1614 gettext
1609 cl-locale
1606 djula
1602 cl-who
1496 simple-date-time
1415 osicat
1366 parenscript
1358 monkeylib-binary-data
1305 postmodern
1245 lisp-unit
1239 trivial-shell
1233 command-line-arguments
1227 asdf-system-connections
1223 cl-containers
1221 metatilities-base
1198 salza2
1167 parse-float

Quicklisp newsNovember 2017 Quicklisp dist update now available

· 75 days ago
New projects:
  • cacle — Extensible cache services for Common Lisp — MIT
  • ccl-compat — Clozure CL compatibility module — LLGPL
  • ccldoc — create lisp documentation using s-expressions — Apache License 2.0
  • chancery — A library for procedurally generating text, inspired by Tracery. — MIT/X11
  • cl-flac — Bindings to libflac, a simple FLAC decoding library — Artistic
  • cl-portmanteau — cl-portmanteau — OSI approved 3-clause 'New BSD License'
  • clack-pretend — A testing and debugging tool for Clack — Apache License, version 2.0
  • clath — Clath is single sign-on middleware for Clack. It allows basic login with OAuth1.0a, OAuth2 and OpenID Connect. — Apache License, version 2.0
  • lisp-chat — An experimental chat irc-like — MIT
  • mockingbird — A small stubbing and mocking library for Common Lisp — MIT
  • qbase64 — Fast and flexible base64 encoder and decoder — BSD-3-Clause
  • specialization-store — The specialization store system provides a new kind of function, called a store function, whose behavior depends on the types of objects passed to the function. — Simplified BSD License variant
  • template-function — A system for generating functions from a template. — Simplified BSD License variant
  • trivial-package-manager — Functions for installing packages from distro-specific package manager. — LLGPL
Updated projects: ahungry-fleece, also-alsa, antik, array-operations, asdf-viz, aws-sign4, cells, cepl, cepl.sdl2, ceramic, chirp, chunga, cl-ana, cl-ansi-term, cl-async, cl-bplustree, cl-charms, cl-conllu, cl-cuda, cl-custom-hash-table, cl-data-frame, cl-fad, cl-forms, cl-gd, cl-gdata, cl-graph, cl-kyoto-cabinet, cl-liballegro, cl-messagepack, cl-messagepack-rpc, cl-mixed, cl-mpg123, cl-neovim, cl-online-learning, cl-ppcre, cl-python, cl-random-forest, cl-readline, cl-rules, cl-sat, cl-sat.glucose, cl-sat.minisat, cl-sdl2, cl-slice, cl-tesseract, cl-tiled, cl-unicode, cl-virtualbox, cl-who, clache, closer-mop, clss, clx, coleslaw, configuration.options, croatoan, deeds, dexador, eazy-project, eventfd, fare-quasiquote, femlisp, flexi-streams, fs-watcher, fxml, gamebox-dgen, gamebox-frame-manager, gettext, glsl-spec, harmony, hu.dwim.partial-eval, hu.dwim.quasi-quote, inquisitor, ironclad, jsonrpc, jsown, lass, legion, legit, let-plus, lichat-protocol, lichat-serverlib, lichat-tcp-client, lisp-namespace, local-time, log4cl, lquery, maiden, maxpc, mcclim, metabang-bind, mk-string-metrics, modularize, moira, mtif, nibbles, oclcl, oook, overlord, parser.common-rules, plump, postmodern, prove, qlot, qmynd, qt-libs, remote-js, rtg-math, scalpl, sdl2kit, serapeum, simple-currency, sketch, stumpwm, terminfo, trivia, trivial-battery, trivial-benchmark, trivial-clipboard, trivial-update, trivial-ws, unix-opts, utilities.print-items, varjo, verbose, zenekindarl.

Removed projects: de.setf.wilbur, odd-streams.

To get this update, use (ql:update-dist "quicklisp"). Enjoy!

LispjobsLisp Engineer, MIND.AI, Seoul, Korea & L.A, USA

· 86 days ago

MIND.AI (http://www.mind.ai) is working on a brand new paradigm in artificial intelligence. We are looking to innovate REAL AI. Not just natural language processing but Natural Language Reasoning. We need versatile talents who are interested in changing the world. This is not deep learning but something totally new. And we need people with open and creative minds.

Artificial intelligence in a brand new paradigm: NOT deep learning/neural networks, though knowledge of the same is a plus. Symbolic/logical AI experience will be put to use. You will develop core logic based on a new theory of information. Duties will include optimizing existing code and developing new features of the AI. We require significant experience in development activities on large projects, and in advanced software architecture and development in a large Common Lisp codebase. You will build new components and extend existing tooling to meet project needs, and implement high-quality library components and products.

Specifically, the Lisp engineer will:

  • Integrate with natural language libraries and packages such as the google parser to form structured data based on our proprietary model (similar to semantic networks)
  • Build data structures and algorithms to acquire new information and relate it to existing ontologies (it learns how to learn)
  • Work with computational linguist to build up knowledge bases and further define knowledge representation
  • Develop persistence model (using database and object serialization)
  • Implementation of logical reasoning based on abductive, inductive and deductive reasoning models
  • Design and develop interactive learning interface
  • Optimizing existing code base

BASIC QUALIFICATIONS

  • BS in Computer Science or equivalent.
  • 3+ years experience in Lisp programming (5+ years programming), working in structured systems and/or software development teams (Common Lisp, LispWorks preferred)
  • Thorough understanding of Lisp paradigms (e.g., functional and object-oriented)
  • Familiarity with common software architectures, design patterns, and software development life cycles

PREFERRED QUALIFICATIONS

  • Experience with persistence (database & object serialization)
  • Experience with GOFAI.
  • Familiarity with formal logic.
  • Working with ontologies and taxonomies
  • Static and/or Dynamic Analysis
  • Dynamic analysis, program instrumentation, and profiling

Please send your CV to reeyan@mind.ai

Nicolas HafnerHarmony - Confession 77

· 94 days ago

header
This is a blog entry about Shirakumo's sound system Harmony. While Harmony was released some months back, I spent the last few weeks rewriting large parts of it from the ground up, and I think it's worth it to write a short article describing how it's built and what you can do with it. So, if you're interested in doing sound processing and playback in Lisp, this is for you.

The need for Harmony arose out of me not finding any suitably powerful sound solution in Lisp. I tried doing a pure Lisp solution at first, but was not able to figure out how to make things go fast without sacrificing design. So, in the interest of performance, I first set out to write a C library that does the essential sound computations for me. This library is called libmixed.

I wanted to keep libmixed as simple and straight-forward as possible. As such, it does not do any sound file reading or writing, synthesising, or sound output to devices. Instead, it only concerns itself with the processing of sound samples. Another important factor for its design was that it should be easy to extend it with further capabilities, and to allow combining processing steps. This led to the segment pipeline model.

In libmixed, you assemble a set of segments - audio producers, consumers, or transforms - that perform the computations you need. You connect them together through buffers so that they can transparently exchange audio data. This produces a directed, acyclic graph where each vertex is a segment, and each edge is a buffer. This graph can then be serialised into a simple sequence that dictates the order in which the segments should be run.
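That serialisation is essentially a depth-first topological sort of the segment graph. As a minimal sketch of the idea (in Lisp, not libmixed's actual C code; inputs-of is a hypothetical function returning the segments that feed a given segment):

(defun processing-order (segments inputs-of)
  ;; Depth-first walk: a segment is emitted only after every segment
  ;; that produces one of its input buffers has been emitted.
  (let ((order '())
        (visited (make-hash-table)))
    (labels ((visit (segment)
               (unless (gethash segment visited)
                 (setf (gethash segment visited) t)
                 (mapc #'visit (funcall inputs-of segment))
                 (push segment order))))
      (mapc #'visit segments)
      (nreverse order))))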

Unfortunately, audio data comes in a variety of formats. Samples are frequently encoded as signed or unsigned integers of 8, 16, 24, or 32 bits, or in floats or doubles. The data might have multiple audio channels, and the samples can be either interleaved (LRLRLR..) or sequential (LLL..RRR..). The sample rate might be different as well. All of this can make it quite difficult to deal with the data between different audio components. Libmixed's solution to this problem is to force all the buffers to use the same sample rate, to encode samples in floats, and to only represent a single channel. Almost all segments present in a pipeline thus don't have to worry about any of these discrepancies anymore, reducing complexity tremendously. In order to allow interacting with foreign components easily, it does also include an unpacker and a packer segment.

The packer takes a set of buffers and information about the sample representation, and packs the buffer's data into a single C array, ensuring proper sample format, layout, and sample rate. The unpacker does the opposite. Thus, if for example you have a library that decodes an audio file, you most likely need to add an unpacker segment to the pipeline that decodes the audio data from the library into the proper internal format.
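To illustrate the kind of work the unpacker does (a sketch of the idea only, not libmixed's code, and with sample-rate conversion left out), turning interleaved signed 16-bit stereo samples into the internal representation of two single-channel float buffers looks roughly like this:

(defun unpack-stereo-s16 (packed left right)
  ;; PACKED holds interleaved 16-bit samples (LRLR..); LEFT and RIGHT
  ;; are single-channel float buffers, libmixed's internal format.
  (dotimes (frame (length left))
    (flet ((to-float (sample) (/ sample 32768.0)))
      (setf (aref left frame) (to-float (aref packed (* 2 frame)))
            (aref right frame) (to-float (aref packed (1+ (* 2 frame))))))))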

So, for a very simple example of taking two audio files, mixing them together, applying a reverb effect, and then playing them back, the pipeline would need to look something like this:

simple pipeline

We can get away with assigning the same two buffers for both of the unpackers here by using a special property of the basic-mixer segment. Instead of manually processing the two unpackers in our pipeline, we can set the segment property on the basic-mixer's inputs, which tells the basic-mixer to invoke the processing of that segment on its own. This way, the mixer can process the segment that produces its input as it mixes it together, reducing the need to allocate individual buffers for each input to the mixer. This is one of the design decisions that still bother me a bit, but I found it necessary after discovering that I would otherwise need to allocate a huge number of buffers if I wanted to allow playback of a lot of sources simultaneously.

As it currently stands, libmixed includes segments to mix audio by either just adding samples, or through 3D spatial positioning of the source. It also includes segments to change volume and pan, to fade in and out, to generate simple sawtooth, square, triangle, or sine waves, and to include LADSPA plugins in the pipeline. I'd like to add a bunch more effects segments to it to make it more useful for real-time sound processing, but I haven't felt the motivation to get into that yet. If you're interested in sound processing and would be willing to do this, let me know!

Basically the idea of libmixed boils down to this: there are segments that have properties, inputs, and outputs. You can write and read properties, and connect buffers to the inputs and outputs. You can then tell a segment to process a number of samples, and it will read its input buffers and write to its output buffers. This all works over a struct that contains a bunch of function pointers to perform these actions. It is thus very easy to add further segments to libmixed, even as an outside library: simply produce a struct that holds the appropriate function pointers to the functions that do what you want. This is also how cl-mixed allows you to write segments from Lisp.
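
In Lisp terms you can picture this protocol as a struct of closures. The sketch below is only a conceptual model; the names are invented and match neither the C structs nor cl-mixed's actual interface.

 ;; Sketch: a "segment" as a struct whose MIX slot holds the processing
 ;; function, mirroring the C struct of function pointers.
 (defstruct segment
   inputs   ; vector of input buffers (single-float vectors)
   outputs  ; vector of output buffers
   mix)     ; function of (segment samples) that does the work

 (defun make-gain-segment (gain)
   ;; Copies its single input to its single output, scaled by GAIN.
   ;; Wire up the INPUTS and OUTPUTS slots before processing.
   (make-segment
    :mix (lambda (segment samples)
           (let ((in  (aref (segment-inputs segment) 0))
                 (out (aref (segment-outputs segment) 0)))
             (dotimes (i samples)
               (setf (aref out i) (* gain (aref in i))))))))

 (defun process (segment samples)
   (funcall (segment-mix segment) segment samples))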

Ascending from the C world to the C+L world then leads us to cl-mixed, the bindings and wrapper library for libmixed. It takes care of all the hairy low-level stuff of interacting with the C library, tracking and allocating foreign memory, and so forth. As mentioned, it also gives you a simple interface for writing your own segments from Lisp. This can be really nice for prototyping an effect.

While libmixed is a neat framework to base your sound processing around, it doesn't exactly make most of the common tasks very convenient. Usually you have some audio files that you would like to play back, and maybe apply some effects to them. This is where Harmony comes in.

Harmony takes libmixed's generic view of segments and extends it to include sources, drains, and mixers. Sources are segments with no inputs, drains are segments without outputs, and mixers take a run-time variable number of inputs. It also greatly simplifies pipeline construction by handling the buffer allocation for you. It does this with the help of a graph library called Flow. More on that later. Harmony also gives you a sound server object that handles the mixing in the background, allowing you to focus on just adding, removing, and changing sources in your program. Finally, Harmony includes a number of pre-made sources and drains that either connect to other libraries or provide native implementations. Currently it supports playing back MP3, WAV, FLAC, and raw buffers, and outputting to out123, OpenAL, WASAPI, CoreAudio, ALSA, PulseAudio, and raw buffers.

The easiest way to get started is to use the harmony-simple system, which assembles a default pipeline for you, allowing you to just directly play some stuff.

 ;; Load the convenience system, which assembles a default pipeline.
 (ql:quickload :harmony-simple)
 (harmony-simple:initialize)
 ;; Play a music file and a sound effect through that pipeline.
 (harmony-simple:play #p"my-cool-music.mp3" :music)
 (harmony-simple:play #p"kablammo.wav" :sfx)

Assembling your own pipeline isn't very difficult, though. It comes down to just telling Harmony how to connect the inputs and outputs between segments. An optimal buffer layout is automatically computed based on the segments' properties and the graph that you describe through the connections. To do this, Harmony uses the Flow library to describe segments in a graph. Unlike most graph libraries, Flow assigns distinct input and output ports to each vertex. These ports have properties like their arity, direction, and so on. For instance, the basic-mixer from the previously illustrated pipeline would be a vertex with an input port of arbitrary arity, and two output ports of single arity. Flow then employs a simple algorithm to assign colours to the edges in such a way that no two edges on the same port share a colour, and no input and output edge on a vertex share a colour unless the vertex is marked as in-place. This kind of allocation computation has cropped up in a couple of places, so I've been able to use Flow for it in other projects as well. I don't think it's important to know how the algorithm works, but in case you're interested, the source is commented and pretty short.
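
For illustration, here is a toy version of such an allocation: a greedy assignment that ignores ports and the in-place case entirely and simply guarantees that edges touching the same vertex get distinct buffer indices. Flow's real algorithm is more refined than this.

 ;; Sketch: greedily assign a buffer index (colour) to every edge such
 ;; that no two edges incident to the same vertex share an index.
 ;; Edges are (FROM . TO) conses; the same cons objects serve as keys.
 (defun allocate-buffers (edges)
   (let ((colour (make-hash-table :test #'eq)))
     (dolist (edge edges colour)
       (let ((forbidden '()))
         ;; Forbid indices already used by edges sharing a vertex.
         (dolist (other edges)
           (let ((c (gethash other colour)))
             (when (and c (or (member (car edge) (list (car other) (cdr other)))
                              (member (cdr edge) (list (car other) (cdr other)))))
               (push c forbidden))))
         ;; Pick the smallest free index.
         (setf (gethash edge colour)
               (loop for i from 0 unless (member i forbidden) return i))))))

 ;; (allocate-buffers '((a . mixer) (b . mixer) (mixer . reverb) (reverb . out)))
 ;; assigns buffers 0, 1, 2, 0 respectively, so four edges share three buffers.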

How to write new sources and segments, or how to assemble your own pipeline is already illustrated pretty succinctly in the documentation, so I suggest you check it out if you're interested in working on that kind of thing. Harmony is primarily geared towards use in games, where simple pipelines and immediate playback of a variety of audio sources is necessary. However, I can also see it being used in some kind of digital audio workstation, where a graphical user interface could allow you to put together segments and configure them, mapping to a libmixed pipeline underneath.

I feel like I've been rambling off on tangents for a bit here, but I suppose the reason is that Harmony doesn't really do all that much by itself. At best it smooths over the last remaining corners that come from the libmixed C heritage and adds some useful segments to the library. All the concepts and all the sound technology behind it lie in libmixed's hands, though, and I think I've explained all of that already.

So, to close off: if you're thinking of doing some kind of digital audio processing in Lisp, keep Harmony and cl-mixed in mind. I'm also more than open to feedback and suggestions, so if you have any ideas or things you'd like to add to the projects, head on over to GitHub's issues, or talk to us on Freenode/#shirakumo.

