#### Quicklisp news — June 2017 Quicklisp download stats

· 9 hours ago
Here are the raw download stats for the top 100 projects in Quicklisp for June:

 9081  alexandria 7797  closer-mop 7437  split-sequence 6863  cl-ppcre 6790  babel 6498  trivial-features 6303  iterate 6222  bordeaux-threads 6173  anaphora 6099  trivial-gray-streams 5522  trivial-garbage 5367  cffi 5056  flexi-streams 4911  nibbles 4729  let-plus 4702  usocket 4592  puri 4582  cl-base64 4286  trivial-backtrace 4181  chipz 4145  cl+ssl 4021  cl-fad 3959  chunga 3381  drakma 3292  named-readtables 3281  ironclad 3221  more-conditions 3153  esrap 3144  local-time 2928  utilities.print-items 2587  parse-number 2439  cl-yacc 2149  metabang-bind 2142  cl-unicode 2131  cl-interpol 2101  trivial-utf-8 2084  md5 2083  fiveam 2056  asdf-flv 1930  optima 1918  lparallel 1897  log4cl 1879  slime 1869  lift 1854  trivial-indent 1822  closure-common 1808  cxml 1795  array-utils 1746  plump 1743  uuid 1612  bt-semaphore 1561  trivial-types 1541  simple-date-time 1513  cl-clon 1472  cl-json 1429  cl-utilities 1392  architecture.hooks 1390  quri 1342  cl-containers 1340  metatilities-base 1330  cl-annot 1319  cl-syntax 1317  asdf-system-connections 1291  ieee-floats 1253  plexippus-xpath 1113  salza2 1079  trivial-mimes 1070  postmodern 1067  arnesi 1052  cl-slice 1050  fare-utils 1047  fast-io 1040  static-vectors 1027  fare-quasiquote 1015  symbol-munger 1009  djula 1007  collectors 1003  access  996  gettext  982  cl-parser-combinators  980  cl-locale  925  hunchentoot  904  cl-sqlite  896  inferior-shell  894  fare-mop  887  prove  885  rfc2388  868  cl-log  865  command-line-arguments  859  trivia  858  lisp-namespace  851  cl-colors  824  py-configparser  821  cl-markdown  821  cl-ansi-text  821  asdf-finalizers  820  dynamic-classes  819  cl-mssql  818  garbage-pools  805  cl-abnf

#### Quicklisp news — July 2017 Quicklisp dist update now available

· 11 hours ago
New projects:
• 3bgl-shader — CL-hosted CL-like DSL for generating GLSL — MIT
• cl-forms — A web forms handling library — MIT
• cl-ksuid — K-sortable unique identifiers — GPLv3
• cl-pixman — Low-level pixel manipulation. — LLGPL
• cl-yesql — Common Lisp library for using SQL. — MIT
• easy-routes — Yet another routes handling utility on top of Hunchentoot — MIT
• laap — A Common Lisp multi-threaded event loop. — MIT
• matplotlib-cl — A 2D Plotting library for Common Lisp using Matplotlib. — MIT
• oook — Some magic on the shoulders of CLSQL — MIT
• overlord — Experimental build/module system. — MIT
• semantic-spinneret — A set of Semantic UI components for use with Spinneret — MIT
• with-setf — Macros for setting a place for the duration of a scope — Unlicense
• xlsx — Basic reader for Excel files. — MIT

Removed projects: gtfl, s-dot.

gtfl and s-dot are related projects. The website hosting them has disappeared, and the author has not responded to email queries. So they are not in Quicklisp any more.

To get this update, use (ql:update-dist "quicklisp").

Enjoy!

#### ECL News — Lisp (ECL) and QML (Qt5) on Android?

· 11 days ago
(please note that I'm assuming a Linux/64 bit platform or VirtualBox image)

Preamble: about a month ago, I was completely void of any android experience.
This is to say: using both QML (which is easy to learn) and Common Lisp (which I assume you already know) to develop android apps is not a difficult task at all, as you'll see.

So, if you are like me just a month ago, there are no excuses not to write your own, first android app using this new "EQL5-Android" project!

We will build a small game (Sokoban), which uses Lisp for the program logic, and QML for the UI, and build an APK for the android platform.

Being the author of that very first attempt of integrating Lisp and Qt4 (see lisp-cffi-qt4), what I would like to accomplish is providing you with a ca. 3 MB download, which can be tried out instantly.

10 years ago, the lisp-cffi-qt4.zip (a runnable win32 binary version), was a 3 MB download, including both ECL and Qt4 (UPX compressed, but still).
10 years later, this time on android, what download size is to be expected?
We will see...

Since all the documentation needed for preparing the development environment is already covered in the "EQL5-Android" project itself, I will give only the link here:

So, I'm assuming that you already have everything installed and set up (Qt 5.7.1, Android NDK 10e, Android SDK, Java JDK, and obviously the EQL5-Android sources from gitlab), in order to build android packages (APK files).

(EQL5 itself, the desktop version, is not strictly needed to follow this example; but for developing your own apps, you will obviously need it; even here it's very helpful for testing and debugging, if something doesn't work as expected.)

If you already know the process of building EQL5 apps for the desktop, you will find that building (cross-compiling) to android is very similar, with only a few more steps involved.

Since we use an example of EQL5-Android itself, everything has already been set up. But I want to remember the steps that are not obvious, if you are not familiar with Qt and EQL:

• you need to add all your external resources, like QML files, PNG files etc. to a Qt resource file (ending .qrc); this will integrate them (compressed) directly into the executable
• you need to add all your Lisp files, in exact order of loading, to make.lisp (in a future version of EQL5, I will try to integrate this step with ASDF)

And that's it, basically (except the app name, which needs to be adapted to the *.qrc file name, to your *.pro file name and contents (see TARGET and RESOURCES), and to the contents of the third script 3-build-apk.sh (see *.json file name).

Everything else will stay the same for every project.

Now I want to call your attention on the huge advantage of using Qt for your android apps: you can first build a desktop exe, with the exact same sources, and try/debug it. If the desktop version works, the android app will generally work too (the only things that may need adaption are e.g. font sizes and similar).

So, let's get practical! In the EQL5-Android sources, switch to 'examples/sokoban/'.

Building a desktop exe would be this 3 liner:


$eql5 make-desktop.lisp$ qmake sokoban_desktop.pro
$make  (To test if all resources have really been included in the sokoban_desktop executable, you need to move it to a different directory, and launch it from there.) Here's a screenshot of our app running on the desktop: But now let's do the cross-compile dance! First let's copy the needed shared libraries to the 'android-build/' directory. Just run script number 1: $ ./1-copy-libs.sh


This step needs only be done once for every new project. It will copy the cross-compiled ECL and EQL5 libs into our android build directory.

The next steps are very similar to a desktop build:


$ecl-android -shell make.lisp$ qmake-android sokoban.pro
$make  Since cross-compiling requires a special "host ECL", which needs to match the target platform (that is, 32 bit, no double floats), we would be in trouble with cross-compiling EQL5 code: we certainly don't want a seperate EQL5 32 bit version, only for cross-compiling... But there is a solution to this (see 'utils/EQL5-symbols.lisp' in sources): for cross-compiling -- which is the job of our "host ECL" -- we pretend to be the eql5 executable, by defining all packages and symbols, defining all EQL5 macros (otherwise we can't compile), and simply defining dummy-functions for all EQL5 functions, so the compiler will not complain. So, loading 'utils/EQL5-symbols.lisp' in our host ECL will be sufficient for cross-compiling EQL5 code. If you are interested in how the cross-compile environment is set up, please see: utils/cross-compile.lisp (thanks to Sylvain Ageneau, who wrote the original version; this is a simplified version not depending on ASDF; the latter will be integrated in a future version) So, the above 3 liner will build the shared library of our app, ready to be included in the android build. To copy it where the android build expects it, use script number 2: $ ./2-install-lib.sh


The last step, which will build our APK file, is a verbose one (for eventual debugging), and a little time consuming: it will create the whole package structure, and compile the whole thing into an APK file, ready to be installed on an android device.

There is this great tool (courtesy Qt) called androiddeployqt, which automates the whole and complex process of creating an APK file, with all the needed options already set in our 3rd script:


$./3-build-apk.sh  Here the contents of the above script, where you can see all the selected options: $ ~/Qt5.7.1/5.7/android_armv7/bin/androiddeployqt \
--input android-libsokoban.so-deployment-settings.json \
--output android-build \
--deployment ministro \
--verbose


If it will tell you BUILD SUCCESSFUL, then you'll find the APK file (ready for deployment) in 'android-build/build/outputs/apk/'.

The last step is copying over the APK file to your android device, and install / run it. Normally you're not allowed to do this, because it requires special developer settings (please search the web for instructions how to enable them, as they are different for every android version).

After connecting via USB and copying the APK file over to your device, just tap it from there. This will ask for installing, and then for opening the app.

Here's a screenshot of the sokoban app running on a tablet:

Above you see the splash screen, because startup will take a few seconds.

Below the game:

After seeing the result, I'd like to consider a few QML and Lisp snippets.

QML is easy to learn. You just need to declare what you want (and it will do the how behind the scenes).
Let's see this snippet for defining and placing our arrow buttons:


// container for arrow buttons
Item {
id: arrows
width: up.width * 3
height: up.height * 3
anchors.margins: 10
anchors.right: parent.right
anchors.verticalCenter: parent.verticalCenter

Ext.Button {
id: up
objectName: "up"
source: "img/up.png"
anchors.horizontalCenter: parent.horizontalCenter
}

Ext.Button {
objectName: "left"
source: "img/left.png"
anchors.verticalCenter: parent.verticalCenter
}

Ext.Button {
objectName: "right"
source: "img/right.png"
anchors.verticalCenter: parent.verticalCenter
anchors.right: parent.right
}

Ext.Button {
objectName: "down"
source: "img/down.png"
anchors.horizontalCenter: parent.horizontalCenter
anchors.bottom: parent.bottom
}
}


So, we define an Item as container for the buttons, giving it the width (up.width * 3) and height (up.height * 3), according to the button sizes: this may be any calculation or function call, and may refer to any item of the file, referred to by its id.

For placing the container itself, and the single arrow buttons, we just need to define anchors, which can be relative to other items (here: the parent item).

The Ext.Button is a user defined item class, which can be found in 'qml/ext/Button.qml. That is, the whole directory 'ext/' is imported as Ext:


import "ext/" as Ext


This means that every file in 'ext/' is now a new QML class, which can be referred to via Ext (like a namespace).

The definition of our image button class (see 'qml/ext/Button.qml') is:


import QtQuick 2.0

Image {
signal pressed()

MouseArea {
anchors.fill: parent
onPressed: { parent.pressed() }
}
}


So, we don't need a real button, but only a clickable image: adding a mouse area will allow us to capture mouse events; this event is then passed to the parent (that is, the Image class), where a Qt signal will be emitted: this will allow us to connect to it from Lisp:


(defun connect ()
(macrolet ((pressed (item function)
(qconnect (find-quick-item ,item) "pressed()"
(lambda () ,function))))
(pressed "up"       (sokoban:move :north *maze*))
(pressed "down"     (sokoban:move :south *maze*))
(pressed "left"     (sokoban:move :west *maze*))
(pressed "right"    (sokoban:move :east *maze*))
(pressed "previous" (change-level :previous))
(pressed "next"     (change-level :next))
(pressed "undo"     (undo))
(pressed "restart"  (reset-maze))
(pressed "solve"    (solve))))


If you already played the game finishing a level, you will have noticed that there are 2 animations (rotation of the player, wiggling of all boxes) which run queued.
This is a little tricky to do, but with the help of a Lisp macro, we only need these lines in Lisp (being queued a macro):


(defun final-animation ()
(queued (qml-set "rotate_player" "running" t)
(qml-set-all "wiggle_box" "running" t)))


Please see the sources for all the details. And this would not be possible without a Lisp function called from QML for notifying us whenever an animation state changes, see 'qml/ext/RotationAnimation.qml':


import QtQuick 2.0
import EQL5 1.0

RotationAnimation {
onRunningChanged: Lisp.call("qsoko:animation-change", running)
}


What I left out to explain is the dynamic (at run time) creation of QML items (see 'qml/items/*' and 'lisp/sokoban.lisp'); let's just say that this is left as an exercise to the reader, as all sources will patiently stay there to be read...

Well. But I still didn't answer the initial question: how big of a download is to be expected, 10 years later?

Since our APK file uses the ministro service for automatically installing the needed Qt libraries at the first launch of the app, it will only need to include both the ECL and EQL5 libraries (you can therefore use different ECL and EQL5 versions for every app you deploy).

So, to finally answer the question: our download will be ca. 3.5 MB (instead of 3 MB, 10 years ago, although we obviously compare apples and oranges here).

Seems acceptable.

And since I promised you to test it instantly (if you own a device with ARM processor), here you are:

sokoban.apk

Enjoy!

#### Quicklisp news — June 2017 Quicklisp dist update now available

· 24 days ago
New projects:
• cepl.spaces — Adds abstractions over vector spaces to CEPL — BSD 2 Clause
• cl-cpus — Get number of CPUs — ISC
• cl-diskspace — List disks, get disk total/free/usable space information. — ISC
• cl-fixtures — A simple library to create and use parameterized fixtures — MIT
• cl-fluent-logger — A structured logger for Fluentd — BSD 3-Clause
• cl-mixed — Bindings to libmixed, a sound mixing and processing library. — Artistic
• cl-rail — Unspecified — Unspecified
• cl-random-forest — Random Forest and Global Refinement for Common Lisp — MIT Licence
• cl-soloud — Bindings to SoLoud, a multi-platform, multi-backend, minimal dependencies sound mixing and output library — Artistic
• cl-ssdb — SSDB client for Common Lisp. — MIT
• cl-threadpool — Implementation of a thread pool — MIT
• deploy — Tools to aid in the deployment of a fully standalone application. — Artistic
• doplus — DO+ (doplus) is a high-level, extensible iteration construct for Common Lisp with a reasonably simple implementation, which in particular does not use a code walker. — GPLv3
• flow — A flowchart and generalised graph library. — Artistic
• gtk-tagged-streams — Text I/O using streams for GTK text buffers, including tags for styling. — BSD Simplified (2-clause)
• harmony — A common lisp sound server and sound processing library. — Artistic
• hu.dwim.zlib — Common Lisp FFI wrapper for zlib, aka http://zlib.net/ — BSD or Bugroff
• modest-config — A modest config file loader library — MIT
• nineveh — A library of common gpu functions — BSD 2 Clause
• papyrus — A Literate Programming Tool — MIT
• parseq — A parser for sequences such as strings, lists, vectors as well as trees. — GPLv2
• physical-quantities — Use lisp numbers for physical quantities with unit and error. — GPLv2
• roan — A library to support change ringing applications, including methods library support — MIT
• sanitized-params — Sanitizer for parameters — BSD 2-Clause
• sdl2-game-controller-db — Lets you easily load the lovely sdl2 gamecontroller db into cl-sdl2 — BSD 3 Clause
• trivial-battery — Getting the battery information — BSD 2-Clause
• trivial-swank — swank server communications — BSD simplified
• trivial-wish — Create 'wishes' which are requests to compute something later — BSD 2-clause

To get this update, use (ql:update-dist "quicklisp"). Enjoy!

#### Paul Khuong — Chubanov's Projection Methods for 0/1 Programming

· 38 days ago

I’ve long felt that compilers (and symbolic processing in general) would benefit from embedding integer programming solvers. However, I was never comfortable with actually doing so for a production system that others would have to run: industrial strength integer linear programming solvers are large systems with complex runtime behaviour, and that’s not the kind of black box you want to impose on people who just want to build their project. (That’s also true of SAT solvers, though, so maybe embedding complicated black boxes is the new normal?)

However, if we had something simple enough to implement natively in the compiler, we could hope for the maintainers to understand what the ILP solver is doing. This seems realistic to me mostly because the generic complexity tends to lie in the continuous optimisation part. Branching, bound propagation, etc. is basic, sometimes domain specific, combinatorial logic; cut generation is probably the most prominent exception, and even that tends to be fairly combinatorial. (Maybe that’s why we seem to be growing comfortable with SAT solvers: no scary analysis.) So, for the past couple years, I’ve been looking for simple enough specialised solvers I could use in branch-and-bound for large 0/1 ILP.

Some stuff with augmented lagrangians and specialised methods for box-constrained QP almost panned out, but nested optimisation sucks when the inner solver is approximate: you never know if you should be more precise in the lower level or if you should aim for more outer iterations.

A subroutine in Chubanov’s polynomial-time linear programming algorithm [PDF] (related journal version) seems promising, especially since it doesn’t suffer from the numerical issues inherent to log barriers.

## Chubanov’s subroutine in branch-and-bound

Chubanov’s “Basic Subroutine” accepts a problem of the form $$Ax = 0$$, $$x > 0$$, and either:

1. returns a solution;
2. returns a non-empty subset of variables that must be 0 in any feasible solution;
3. returns a non-empty subset of variables $$x\sb{i}$$ that always satisfy $$x\sb{i} \leq u$$ in feasible solutions with $$x\sp{\star} \in [0, 1]$$, for some constant $$u < 1$$ (Chubanov sets $$u = \frac{1}{2}$$).

The class of homogeneous problems seems useless (never mind the nondeterministic return value), but we can convert “regular” 0/1 problems to that form with a bit of algebra.

Let’s start with $$Ax = b$$, $$0 \leq x \leq 1$$, we can reformulate that in the homogeneous form:

$Ax - by = 0,$ $x + s - \mathbf{1}y = 0,$ $x, s, y \geq 0.$

Any solution to the original problem in $$[0, 1]$$ may be translated to the homogeneous form (let $$y = 1$$ and $$s = 1 - x$$). Crucially, any 0/1 (binary) solution to the original problem is still 0/1 in the homogeneous form. In the other direction, any solution with $$y > 0$$ may be converted to the box-constrained problem by dividing everything by $$y$$.

If we try to solve the homogenous form with Chubanov’s subroutine, we may get:

1. a strictly positive (for all elements) solution. In that case, $$y > 0$$ and we can recover a solution to the box-constrained problem.
2. a subset of variables that must be 0 in any feasible solution. If that subset includes $$y$$, the box-constrained problem is infeasible. Otherwise, we can take out the variables and try again.
3. a subset of variables that are always strictly less than 1 in feasible solutions. We exploit the fact that we only really care about 0/1 solutions (to the original problem or to the homogenous reformulation) to also fix these variables to 0; if the subset includes $$y$$, the 0/1 problem is infeasible.

As soon as we invoke the third case to recursively solve a smaller problem, we end up solving an interesting ill-specified relaxation of the initial 0/1 linear program: it’s still a valid relaxation of the binary problem, but is stricter than the usual box linear relaxation.

That’s more than enough to drive a branch-and-bound process. In practice, branch-and-bound is much more about proving the (near-) optimality of an existing solution than coming up with strong feasible solutions. That’s why the fact that the subroutine “only” solves feasibility isn’t a blocker. We only need to prove the absence of 0/1 solutions (much) better than the incumbent solution, and that’s a constraint on the objective value. If we get such a proof, we can prune away that whole search subtree; if we don’t, the subroutine might have fixed some variables 0 or 1 (always useful), and we definitely have a fractional solution. That solution to the relaxation could be useful for primal heuristics, and will definitely be used for branching (solving the natural LP relaxation of constraint satisfaction problems ends up performing basic propagation for us, so we get some domain propagation for free by only branching on variables with fractional values).

At the root, if we don’t have any primal solution yet, we should probably run some binary search on the objective value at the root node and feed the resulting fractional solutions to rounding heuristics. However, we can’t use the variables fixed by the subroutine: until we have a feasible binary solution with objective value $$Z\sp{\star}$$, we can’t assume that we’re only interested in binary solutions with object value $$Z < Z\sp{\star}$$, so the subroutine might fix some variables simply because there is no 0/1 solution that satisfy $$Z < k$$ (case 3 is vacuously valid if there is no 0/1 solution to the homogeneous problem).

That suffices to convince me of correctness. I still have to understand Chubanov’s “Basic Subroutine.”

## Understanding the basic subroutine

This note by Cornelis/Kees Roos helped me understand what makes the subroutine tick.

The basic procedure updates a dual vector $$y$$ (not the same $$y$$ as the one I had in the reformulation... sorry) such that $$y \geq 0$$ and $$|y|_1 = 1$$, and constantly derives from the dual vector a tentative solution $$z = P\sb{A}y$$, where $$P\sb{A}$$ projects (orthogonally) in the null space of the homogeneous constraint matrix $$A$$ (the tentative solution is $$x$$ in Chubanov’s paper).

At any time, if $$z > 0$$, we have a solution to the homogenous system.

If $$z = P\sb{A}y = 0$$, we can exploit the fact that, for any feasible solution $$x$$, $$x = P\sb{A}x$$: any feasible solution is alrady in the null space of $$A$$. We have

$x\sp{\top}y = x\sp{\top}P\sb{A}y = x\sp{\top}\mathbf{0} = 0$

(the projection matrix is symmetric). The solution $$x$$ is strictly positive and $$y$$ is non-negative, so this must mean that, for every component of $$y\sb{k} > 0$$, $$x\sb{k} = 0$$. There is at least one such component since $$|y|_1 = 1$$.

The last condition is how we bound the number of iterations. For any feasible solution $$x$$ and any component $$j$$,

$y\sb{j}x\sb{j} \leq y\sp{\top}x = y\sp{\top}P\sb{A}x \leq |x| |P\sb{A}y| \leq \sqrt{n} |z|.$

Let’s say the max element of $$y$$, $$y\sb{j} \geq 2 \sqrt{n}|z|$$. In that case, we have $x\sb{j} \leq \frac{\sqrt{n}|z|}{y\sb{j}} \leq \frac{1}{2}.$

Chubanov uses this criterion, along with a potential argument on $$|z|$$, to bound the number of iterations. However, we can apply the result at any iteration where we find that $$x\sp{\top}z < y\sb{j}$$: any such $$x\sb{j} = 0$$ in binary solutions. In general, we may upper bound the left-hand side with $$x\sp{\top}z \leq |x||z| \leq \sqrt{n}|z|$$, but we can always exploit the structure of the problem to have a tighter bound (e.g., by encoding clique constraints $$x\sb{1} + x\sb{2} + … = 1$$ directly in the homogeneous reformulation).

The rest is mostly applying lines 9-12 of the basic procedure in Kees’s note. Find the set $$K$$ of all indices such that $$\forall k\in K,\ z\sb{k} \leq 0$$ (Kees’s criterion is more relaxed, but that’s what he uses in experiments), project the vector $$\frac{1}{|K|} \sum\sb{k\in K}e\sb{k}$$ in the null space of $$A$$ to obtain $$p\sb{K}$$, and update $$y$$ and $$z$$.

The potential argument here is that after updating $$z$$, $$\frac{1}{|z|\sp{2}}$$ has increased by at least $$|K| > 1$$. We also know that $$\max y \geq \frac{1}{n}$$, so we can fix a variable to 0 as soon as $$\sqrt{n} |z| < \frac{1}{n}$$, or, equivalently, $$\frac{1}{|z|} > n\sp{3/2}$$. We need to increment $$\frac{1}{|z|\sp{2}}$$ to at most $$n\sp{3}$$, so we will go through at most $$1 + n\sp{3})$$ iterations of the basic procedure before it terminates; if the set $$K$$ includes more than one coordinate, we should need fewer iterations to reach the same limit.

Chubanov shows how to embed the basic procedure in a basic iterative method to solve binary LPs. The interesting bit is that we reuse the dual vector $$y$$ as much as we can in order to bound the total number of iterations in the basic procedure. We fix at least one variable to $$0$$ after a call to the basic procedure that does not yield a fractional solution; there are thus at most $$n$$ such calls.

## Next step

In contrast to regular numerical algorithms, the number of iterations and calls so far have all had exact (non asymptotic) bounds. The asymptotics hide in the projection step, where we average elementary unit vectors and project them in the null space of $$A$$. We know there will be few (at most $$n$$) calls to the basic procedure, so we can expend a lot of time on matrix factorisation. In fact, Chubanov outright computes the projection matrix in $$\mathcal{O}(n\sp{3})$$ time to get his complexity bound of $$\mathcal{O}(n\sp{4})$$. In practice, this approach is likely to fill a lot of zeros in, and thus run out of RAM.

I’d start with the sparse projection code in SuiteSparse. The direct sparse solver spends less time on precomputation than fully building the projection matrix (good if we don’t expect to always hit the worst case iteration bound), and should preserve sparsity (good for memory usage). In return, computing projections is slower, which brings the worst-case complexity to something like $$\mathcal{O}(n\sp{5})$$, but that can be parallelised, should be more proportional to the number of non-zeros in the constraint matrix ($$\mathcal{O}(n)$$ in practice), and may even exploit sparsity in the right-hand side. Moreover, we can hope that the $$n\sp{3}$$ iteration bound is pessimistic; that certainly seems to be the case for most experiments with random matrices.

The worst-case complexity, between $$\mathcal{O}(n\sp{4})$$ and $$\mathcal{O}(n\sp{5})$$, doesn’t compare that well to interior point methods ($$\mathcal{O}(\sqrt{n})$$ sparse linear solutions). However, that’s all worst-case (even for IPMs). We also have different goals when embedding linear programming solvers in branch-and-bound methods. Warm starts and the ability to find solution close to their bounds are key to efficient branch-and-bound; that’s why we still use simplex methods in such methods. Chubanov’s projection routine seems like it might come close to the simplex’s good fit in branch-and-bound, while improving efficiency and parallelisability on large LPs.

#### McCLIM — Progress report #8

· 44 days ago

Dear Community,

During this iteration we had many valuable contributions. It's a joy to see how McCLIM gains more mindshare and people are willing to put their time and wallet in fixing issues and writing applications in McCLIM.

Some highlights for this iteration:

• many Listener fixes,
• major tab layout extension refactor,
• new extension for Bezier curves (based on older internal implementation),
• interactor improvements,
• layout improvements,
• fixes for redisplay and transformations,
• documentation cleanups,
• cleanup of the issues (closed the obsolete and fixed ones).

All McCLIM bounties (both active and already solved) may be found here.

Bounties solved this iteration:

• [$200] Interactor CLI prompt print problem Fixed by Gabriel Laddel. Waiting for a pull request and a bounty claim. • [$200] Problem with coordinate swizzling (probably).

Fixed by Alessandro Serra and merged. Waiting for a bounty claim.

• [$100] menu for input-prompt in lisp-listener does not disappear after use. Fixed by Alessandro Serra and merged. Waiting for a bounty claim. Active bounties: • [$150] When flowing text in a FORMATTING-TABLE, the pane size is used instead of the column size.

• [$150] clx: input: english layout. (someone already works on it). • [$100] Caps lock affects non-alphabetic keys. (new)

• [$100] Add PDF file generation (PDF backend). (new) • [$100] Keystroke accelerators may shadow keys even if inactive. (new)

Our current financial status is $1,429 for bounties and$267 recurring monthly contributions from the supporters (thanks!).

Suggestions as to which other issues should have a bounty on them are appreciated and welcome. Please note that Bountysource has a functionality "Suggest an Issue" which may be found on the bounties page. If you feel that you may solve some problem, but there is no bounty on it, feel free to suggest it too.

If you have any questions, doubts or suggestions - please contact me either by email (daniel@turtleware.eu) or on IRC (my nick is jackdaniel).

Sincerely yours,
Daniel Kochmański

#### ABCL Dev — ABCL 1.5.0

· 44 days ago
We are pleased to announce that we have released the Sixth Edition of the Armed Bear Common Lisp implementation as ABCL 1.5.0.

Due to the lack of a publicly available Java 5 implementation, with this release we drop support for that platform, and henceforth support running on Java 6, Java 7, and Java 8.

In addition to consolidating eight months of bug fixes, the following notable features are now also present in the implementation.

The compiler now records more complete debugging information on the SYS:SOURCE symbol property.

ABCL-INTROSPECT offers improved inspection of backtraces to the point that local variables may be inspected in Lisp debug frames.  Patches to SLIME to use this feature are in the process of being merged into the upstream repository. The OBJECTWEB system allows the user to disassemble JVM bytecode via dependencies managed by Maven.

JSS now contains a syntax for accessing Java static and member fields.

For declaring dependencies on Java artifacts ABCL-ASDF,  we have added an experimental syntax to address JRE/JDK artifacts via the ASDF:JDK-JAR class, as well as the ability to more finely control Maven dependencies with the  ASDF:MVN-MODULE class.

A complete list of changes may be viewed in the source repository.

Binaries for this release may either be downloaded directly from http://abcl.org/releases/1.5.0retrieved from the distributed Maven POM graph, or run from Docker via
docker run -it easye/abcl:1.5.0
Many thanks to all who have contributed to nurturing the Bear's execution of conforming ANSI Common Lisp on the Java Virtual Machine.

#### Nicolas Hafner — Trial "Study Session" Next Saturday, 17th of June

· 45 days ago

Next Saturday, the 17th of June, there is going to be a live "study session" about Shirakumo's game engine Trial. The intention of this event is to get people acquainted with the internal structure and features of Trial, so that they may work on it by themselves, and thus help improve it in the future.

The study session is going to be held on my regular stream from 10:00-16:00 UTC. That's 12:00-18:00 CEST, 6:00-12:00 EST, 19:00-3:00 JST. We might stop earlier if people run out of steam or there isn't as much to cover as I anticipated.

Participants that want to actively ask questions and follow along are encouraged to download and set up Mumble for voice communication, and to download the source code of Trial and its dependencies ahead of time, so that they may follow along on their own screens, rather than being subject to the stream delay.

You are expected to have a decent understanding of Common Lisp, as I won't have time to teach you that. While an understanding of modern OpenGL is advantageous, it won't be required. I'll try to explain OpenGL as much as possible or necessary as we go along. Likely there will be another stream at a later point, where modern OpenGL is explained from the ground up.

Hope to see you there!

#### Lispjobs — Lisp programmer, Keepit.com, Lviv, Ukraine

· 46 days ago

Keepit.com is expanding and we are looking for candidates to join our strong cloud software development team.

We use Lisp systems to implement business logic operations such as resource accounting, data mining, billing, automated operations (AI), full system test suites and more. We wish to extend our team with another skilled colleague, to work with us in this area.

We expect a strong technical mind coupled with a visionary outlook and ability to work closely together with the entire team, from architecture through development, QA and ultimately production.

Keepit can offer a passionate working environment with bright minded colleagues bringing out the next generations of data consolidation, data security and data sharing solutions coupled with the ability to bring-out real value of the data of our customers.

Required skills:

Good understanding of Common Lisp;

Good algorithmic knowledge

Will be a plus:

Good understanding of HTTP, REST and XML;

Shell scripting and ability to read/write Makefiles;

Experience with Emacs, slime/swank and SBCL;

Some knowledge of C++

Desired communication skills:

Team player, but with high degree of ability to work independently;

Upper-intermediary level of English for daily written and video calls communication;

Person who is gladly willing to share knowledge within the team;

Willingness to take initiative and suggest alternative ways of solving problems;

Quick learner, self starter.

Contact information:

e-mail: mhn@keepit.com, Skype: maryna.hnatyk

#### ABCL Dev — ABCL 1.5.0-rc-0 draft of upcoming User Manual

· 48 days ago
An unsigned ABCL 1.5.0-rc-0 release now available to test the distributions mechanisms for the upcoming ABCL-1.5.0 release.

http://abcl.org/trac/milestone/1.5.0

http://abcl.org/releases/1.5.0/

Draft of upcoming User Manual to which corrections are solicited.

http://abcl.org/releases/1.5.0/abcl-1.5.0-rc-0.pdf

#### Paul Khuong — Relaxed Revocable Locks: Mutual Exclusion Modulo Preemption

· 50 days ago

Update: there’s a way to detect “running” status even across cores. It’s not pretty. Search for /proc/sched_debug.

The hard part about locking tends not to be the locking itself, but preemption. For example, if you structure a memory allocator like jemalloc, you want as few arenas as possible; one per CPU would be ideal, while one per thread would affect fragmentation and make some operations scale linearly with the number of threads. However, you don’t want to get stuck when a thread is preempted while it owns an arena. The usual fix is two-pronged:

1. have a few arenas per CPU (e.g., jemalloc defaults to 4x the number of CPUs);
2. hold exclusive ownership for short critical sections.

The first tweak isn’t that bad; scaling the number of arenas, stats regions, etc. with the number of CPUs is better than scaling with the number of threads. The second one really hurts performance: each allocation must acquire a lock with an interlocked write. Even if the arena is (mostly) CPU-local, the atomic wrecks your pipeline.

It would be nice to have locks that a thread can acquire once per scheduling quantum, and benefit from ownership until the thread is scheduled out. We could then have a few arenas per CPU (if only to handle migration), but amortise lock acquisition over the timeslice.

That’s not a new idea. Dice and Garthwaite described this exact application in 2002 (PDF) and refer to older work for uniprocessors. However, I think the best exposition of the idea is Harris and Fraser’s Revocable locks for non-blocking programming, published in 2005 (PDF). Harris and Fraser want revocable locks for non-blocking multiwriter code; our problem is easier, but only marginally so. Although the history of revocable locks is pretty Solaris-centric, Linux is catching up. Google, Facebook, and EfficiOS (LTTng) have been pushing for restartable sequences, which is essentially OS support for sections that are revoked on context switches. Facebook even has a pure userspace implementation with Rseq; they report good results for jemalloc.

Facebook’s Rseq implements almost exactly what I described above, for the exact same reason (speeding up a memory allocator or replacing miscellaneous per-thread structs with ~per-CPU data). However, they’re trying to port a kernel idiom directly to userspace: restartable sequences implement strict per-CPU data. With kernel supports, that makes sense. Without such support though, strict per-CPU data incurs a lot of extra complexity when a thread migrates to a new CPU: Rseq needs an asymmetric fence to ensure that the evicted thread observes its eviction and publishes any write it performed before being evicted.

I’m not sure that’s the best fit for userspace. We can avoid a lot of complexity by instead dynamically allocating a few arenas (exclusive data) per CPU and assuming only a few threads at a time will be migrated while owning arenas.

Here’s the relaxed revocable locks interface I propose:

1. Each thread has a thread state struct. That state struct has:

• a generation counter;
• a canceled counter (generation - 1 or equal to generation);
• a signaled counter (generation - 1 or equal to generation);
• an acknowledged cancel counter (generation - 1 or equal to generation);
• an “in critical section” flag (pointer to a revocable lock).
2. Locks are owned by a pair of thread state struct and generation counter (ideally packed in one word, but two words are doable). Threads acquire locks with normal compare-and-swap, but may bulk revoke every lock they own by advancing their generation counter.

3. Threads may execute any number of conditional stores per lock acquisition. Lock acquisition returns an ownership descriptor (pair of thread state struct and generation counter), and rlock_store_64(descriptor, lock, dst, value) stores value in dst if the descriptor still owns the lock and the ownership has not been cancelled.

4. Threads do not have to release lock ownership to let others make progress: any thread may attempt to cancel another thread’s ownership of a lock. After rlock_owner_cancel(descriptor, lock) returns successfully, the victim will not execute a conditional store under the notion that it still owns lock with descriptor.

The only difference from Rseq is that rlock_owner_cancel may fail. In practice, it will only fail if a thread on CPU A attempts to cancel ownership for a thread that’s currently running on another CPU B. That could happen after migration, but also when an administrative task iterates through every (pseudo-)per-CPU struct without changing its CPU mask. Being able to iterate through all available pseudo-per-CPU data without migrating to the CPU is big win for slow paths; another advantage of not assuming strict per-CPU affinity.

Rather than failing on migration, Rseq issues an asymmetric fence to ensure both its writes and the victim’s writes are visible. At best, that’s implemented with inter-processor interrupts (IPIs) that scale linearly with the number of CPUs... for a point-to-point signal. I oversubscribed a server with 2-4x more threads than CPUs, and thread migrations happened at a constant frequency per CPU. Incurring O(#CPU) IPIs for every migration makes the per-CPU overhead of Rseq linear with the number of CPUs (cores) in the system. I’m also wary of the high rate of code self/cross -modification in Rseq: mprotect incurs IPIs when downgrading permissions, so Rseq must leave some code page with writes enabled. These downsides (potential for IPI storm and lack of W^X) aren’t unique to Rseq. I think they’re inherent to emulating unpreempted per-CPU data in userspace without explicit OS support.

When rlock_owner_cancel fails, I expect callers to iterate down the list of pseudo-per-CPU structs associated with the CPU and eventually append a new struct to that list. In theory, we could end up with as many structs in that list as the peak number of thread on that CPU; in practice, it should be a small constant since rlock_owner_cancel only fails after thread migration.

## Code for Rlock (Linux/x86-64 only)

I dumped my code as a gist, but it is definitely hard to follow, so I’ll try to explain it here.

Bitpacked ownership records must include the address of the owner struct and a sequence counter. Ideally, we’d preallocate some address space and only need 20-30 bits to encode the address. For now, I’m sticking to 64 byte aligned allocations and rely on x86-64’s 48 bits of address space. With 64 bit owner/sequence records, an rlock is a 64 bit spinlock.

“rlock.c”
 1 2 3 4 5 6 7 8 9 10 11  typedef union rlock_owner_seq { uint64_t bits; struct { uint64_t sequence:22; uint64_t address:42; }; } rlock_owner_seq_t; struct rlock { rlock_owner_seq_t owner; }; 

In the easy case, acquiring an rlock means:

1. reading the owner field (with a 64 bit load);
2. confirming that the owner has advanced its sequence;
3. CASing in our own rlock_owner_seq_t.

But first, we must make canonicalise our own owner struct.

“rlock.c”
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16  struct rlock_owner { /* SPMC. */ rlock_owner_seq_t seq; /* MPMC: Asked to cancel up to here (inclusively). */ uint32_t cancel_sequence; /* MPMC: Signaled to cancel up to here (inclusively). */ uint32_t signal_sequence; /* SPMC: Acked cancel ask up to here (inclusively). */ uint32_t acked_sequence; /* Private: forcibly release lock after too many ops. */ uint32_t op_count; /* SPMC */ pid_t tid; /* SPMC; "in critical section" flag. */ struct rlock *critical_section; } __attribute__((__aligned__(64))); 

Rlock lazily allocates an rlock_owner per thread and stores it in TLS; we can’t free that memory without some safe memory reclamation scheme (and I’d like to use Rlock to implement SMR), but it is possible to use a type-stable freelist.

Regardless of the allocation/reuse strategy, canonicalising an rlock means making sure we observe any cancellation request.

“rlock.c”
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33  static inline bool update_self(struct rlock_owner *self) { rlock_owner_seq_t snapshot = { .bits = self->seq.bits }; uint32_t cancel_sequence = ck_pr_load_32(&self->cancel_sequence); /* We've been asked to cancel if cancel_sequence == seq.sequence. */ if (LIKELY(self->seq.sequence != cancel_sequence)) { return false; } ck_pr_fas_32(&self->cancel_sequence, snapshot.sequence); ck_pr_fas_32(&self->signal_sequence, snapshot.sequence); ck_pr_fas_32(&self->acked_sequence, snapshot.sequence); snapshot.sequence++; ck_pr_fas_64(&self->seq.bits, snapshot.bits); return true; } static inline struct rlock_owner * get_self(void) { struct rlock_owner *self; self = rlock_self; if (UNLIKELY(self == NULL)) { self = allocate_self(); } update_self(self); return self; } 

To acquire a lock we observe the current owner, attempt to cancel its ownership, and (if we did cancel ownership) CAS in our own owner/sequence descriptor.

“rlock.c”
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39  rlock_owner_seq_t rlock_lock(struct rlock *lock) { struct rlock_owner *self = get_self(); rlock_owner_seq_t seq, snapshot; /* Load the current owner. */ snapshot.bits = ck_pr_load_64(&lock->owner.bits); /* Easy case: we already own the lock. */ if (snapshot.bits == self->seq.bits) { return self->seq; } for (;;) { seq.bits = self->seq.bits; /* Make sure the current owner isn't anymore. */ if (!rlock_owner_cancel(snapshot, lock)) { /* Couldn't; return 0. */ seq.bits = 0; return seq; } /* Replace the old owner with ourself. */ if (ck_pr_cas_64_value(&lock->owner.bits, snapshot.bits, seq.bits, &snapshot.bits)) { /* Success! */ break; } /* CAS failed. snapshot.bits has the new owner. */ /* Eagerly observe any cancellation. */ update_self(self); /* CAS failed. Spin a bit. */ ck_pr_stall(); } return seq; } 

Most of the trickiness hides in rlock_owner_cancel.

“rlock.c”
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80  bool rlock_owner_cancel(union rlock_owner_seq owner, struct rlock *evict) { struct rlock_owner *victim = (void *)((uintptr_t)owner.address * 64); rlock_owner_seq_t snapshot; uint32_t acked; uint32_t sequence = owner.sequence; assert(evict != NULL); /* Easy case: no owner. */ if (victim == NULL) { return true; } snapshot.bits = ck_pr_load_64(&victim->seq.bits); if (snapshot.bits != owner.bits) { /* The victim has already moved on to a new sequence value. */ return true; } acked = ck_pr_load_32(&victim->acked_sequence); if (mod_lte(sequence, acked)) { /* We already have acked cancellation >= sequence. */ return true; } /* Advance the victim's cancel counter to sequence. */ if (!ensure_cancel_sequence(victim, sequence)) { /* Already advanced; nothing to do! */ return true; } if (victim_running(victim)) { /* The victim isn't obviously scheduled out; /* See if we must ensure visibility of our cancel. */ snapshot.bits = ck_pr_load_64(&victim->seq.bits); if (snapshot.bits == owner.bits) { ensure_signal_sequence(victim, sequence); } return false; } if (ck_pr_load_ptr(&victim->critical_section) != evict) { /* * Easy case: victim isn't in a critical section with * our lock. The victim has either been scheduled out * since we called ensure_cancel_sequence, our went * through a context switch at least once. In either * case, it has already observed the cancellation or * will before the next critical section. */ return true; } /* * The victim might be in the middle of a critical section. * Send a signal that'll skip the critical section if * necessary. */ ensure_signal_sequence(victim, sequence); /* * If the victim is definitely not running, it either has * already executed the signal handler or will before resuming * normal execution. If the victim might be running, * we can only hope we got lucky. */ if (!victim_running(victim)) { return true; } /* * We know the vitim was scheduled out before we signaled for * cancellation. We can see if the victim has released our * critical section at least once since then. */ return (ck_pr_load_ptr(&victim->critical_section) != evict); } 

The fancy stuff begins around ensure_cancel_sequence(victim, sequence);. Our code maintains the invariant that the MPMC sequences (cancel_sequence, signal_sequence) are either the SPMC sequence - 1 (normal state), or exactly the SPMC sequence (cancellation request).

ensure_cancel_sequence CASes the cancel_sequence field from its expected value of owner.sequence - 1 to owner.sequence. If the actual value is neither of them, the owner has already advanced to a new sequence value, and we’re done.

Otherwise, we have to hope the victim isn’t running.

Now comes the really tricky stuff. Our CAS is immediately visible globally. The issue is that the victim might already be in the middle of a critical section. When writers executes a critical sections, they:

1. Set the critical section flag (with a normal write);
2. Check that the lock hasn’t been revoked;
3. Perform the write;
4. Clear the critical section flag.

It’s really hard to guarantee that the write in step 1 is visible (without killing performance in the common case), and if it is, that the victim isn’t about to execute step 3.

We get that guarantee by determining that the victim hasn’t been continuously executing since the time we attempted to CAS the cancel_sequence forward. That’s (hopefully) enough of a barrier to order the CAS, step 1, and our read of the critical section flag.

That’s not information that Linux exposes directly. However, we can borrow a trick from Rseq and read /proc/self/task/[tid]/stat. The contents of that file include whether the task is (R)unnable (or (S)leeping, waiting for (D)isk, etc.), and the CPU on which the task last executed.

If the task isn’t runnable, it definitely hasn’t been running continuously since the CAS. If the task is runnable but last ran on the CPU the current thread is itself running on (and the current thread wasn’t migrated in the middle of reading the stat file), it’s not running now.

If the task is runnable on another CPU, we can try to look at /proc/sched_debug: each CPU has a .curr->pid line that tells us the PID of the task that’s currently running (0 for none). That file has a lot of extra information so reading it is really slow, but we only need to do that after migrations.

Finally, the victim might really be running. Other proposals would fire an IPI; we instead ask the caller to allocate a few more pseudo-per-CPU structs.

Assuming we did get a barrier out of the scheduler, we hopefully observe that the victim’s critical section flag is clear. If that happens, we had:

1. CAS the cancellation sequence;
2. Barrier in the victim from being scheduled out;
3. Critical section flag was empty after the CAS.

This guarantees that the victim hasn’t been in the same critical section since the CAS in step 1. Either it’s not in a critical section, or if it is, it’s a fresh one that will observe the CAS. It’s safe to assume the victim has been successfully evicted.

The less happy path happens when we observe that the victim’s critical section flag is set. We must assume that it was scheduled out in the middle of a critical section. We’ll send a (POSIX) signal to the victim: the handler will skip over the critical section if the victim is still in one. Once that signal is sent, we know that the first thing Linux will do is execute the handler when the victim resume execution. If the victim is still not running after tgkill returned, we’re good to go: if the victim is still in the critical section, the handler will fire when it resumes execution.

Otherwise, the victim might have been scheduled in between the CAS and the signal; we still have the implicit barrier given by the context switch between CAS and signal, but we can’t rely on signal execution. We can only hope to observe that the victim has noticed the cancellation request and advanced its sequence, or that it cleared its critical section flag.

The rest is straightforward. The rlock_store_64 must observe any cancellation, ensure that it still holds the lock, and enter the critical section:

1. set the critical section flag (overwrite with the lock’s address);
2. check again that we still hold the lock and have not been asked to cancel;
3. flip the result flag to “success”;
4. store.

Once it leaves the critical section, rlock_store_64 clears the critical section flags, looks for any cancellation request, and returns success/failure. The critical section is in inline assembly for the signal handler: executing the store in step 4 implicitly marks the end of the critical section.

“rlock.c”
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69  bool rlock_store_64(rlock_owner_seq_t snapshot, struct rlock *lock, uint64_t *dst, uint64_t value) { struct rlock_owner *self = (void *)((uintptr_t)snapshot.address * 64); rlock_owner_seq_t seq; uint32_t op_count; int status; seq.bits = self->seq.bits; op_count = ++self->op_count; /* We cancelled this lock. */ if (UNLIKELY(seq.bits != snapshot.bits)) { return false; } /* The handler will reset RAX to 1 on skip. */ status = 1; asm volatile( /* Move the lock's address in the critical section flag. */ "0: movq %[lock], %[critical_section]\n\t" /* Do we still own the lock? */ "cmpq %[owner], %[snapshot]\n\t" "jne 1f\n\t" /* Were we asked to cancel? */ "cmpl %[cancelled], %[seq]\n\t" "je 1f\n\t" /* Success path! Set status to 0. */ "xorl %[status], %[status]\n\t" /* Store the value in *dst. */ "movq %[value], %[dst]\n\t" /* End of critical section. */ "1:\n\t" /* * Make sure the signal handler knows where the * critical section code begins & ends. */ ".pushsection rlock_store_list, \"a\", @progbits\n\t" ".quad 0b, 1b\n\t" ".popsection\n\t" : [status] "+a"(status), [critical_section] "+m"(self->critical_section), [dst] "=m"(*dst) : [lock] "r"(lock), [snapshot] "r"(snapshot.bits), [owner] "m"(lock->owner.bits), [seq] "r"((uint32_t)seq.sequence), [cancelled] "m"(self->cancel_sequence), [value] "r"(value) : "memory", "cc"); /* Clear the flag. */ ck_pr_store_ptr(&self->critical_section, NULL); /* Acknowledge any cancellation request. */ if (UNLIKELY(status != 0)) { update_self(self); return false; } /* Force lock reacquisition after a couple thousand writes. */ if (UNLIKELY(op_count >= OP_LIMIT)) { self->op_count = 0; rlock_owner_release(); } return true; } 

Finally, the signal handler for rlock cancellation requests iterates through the rlock_store_list section until it finds a record that strictly includes the instruction pointer. If there is such a record, the thread is in a critical section, and we can skip it by overwriting RIP (to the end of the critical section) and setting RAX to 1.

“rlock.c”
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38  void rlock_signal_handler(int signal, siginfo_t *info, void *arg) { ucontext_t *ctx = arg; mcontext_t *mctx = &ctx->uc_mcontext; struct rlock_owner *self = rlock_self; uintptr_t rip; size_t nloc = __stop_rlock_store_list - __start_rlock_store_list; (void)signal; (void)info; rip = (uintptr_t)mctx->gregs[REG_RIP]; for (size_t i = 0; i < nloc; i++) { struct rlock_store record; record = __start_rlock_store_list[i]; if (rip < record.begin || rip >= record.end) { continue; } assert(self != NULL); /* skip the critical instruction. */ mctx->gregs[REG_RIP] = record.end; /* set the interrupted flag. */ mctx->gregs[REG_RAX] = 1; return; } /* Might as well publish that we observed any cancellation request. */ if (self != NULL) { ck_pr_fas_32(&self->acked_sequence, ck_pr_load_32(&self->cancel_sequence)); } return; } 

## Silly benchmarks

On my 2.9 GHz Sandy Bridge, a baseline loop to increment a counter a billion times takes 6.9 cycles per increment, which makes sense given that I use inline assembly loads and stores to prevent any compiler cleverness.

The same loop with an interlocked store (xchg) takes 36 cycles per increment.

Interestingly, an xchg-based spinlock around normal increments only takes 31.7 cycles per increment (0.44 IPC). If we wish to back our spinlocks with futexes, we must unlock with an interlocked write; releasing the lock with a compare-and-swap brings us to 53.6 cycles per increment (0.30 IPC)! Atomics really mess with pipelining: unless they’re separated by dozens or even hundreds of instructions, their barrier semantics (that we usually need) practically forces an in-order, barely pipelined, execution.

FWIW, 50ish cycles per transaction is close to what I see in microbenchmarks for Intel’s RTM/HLE. So, while the overhead of TSX is non-negligible for very short critical sections, it seems more than reasonable for adaptive locks (and TSX definitely helps when preemption happens, as shown by Dice and Harris in Lock Holder Preemption Avoidance via Transactional Lock Elision).

Finally, the figure that really matters: when incrementing with rlock_store_64, we need 13 cycles per increment. That loop hits 2.99 IPC, so I think the bottleneck is just the number of instructions in rlock_store_64. The performance even seems independent of the number of worker threads, as long as they’re all on the same CPU.

In tabular form:

| Method               | Cycle / increment | IPC  |
|----------------------|-------------------|------|
| Vanilla              |             6.961 | 1.15 |
| xchg                 |            36.054 | 0.22 |
| FAS spinlock         |            31.710 | 0.44 |
| FAS-CAS lock         |            53.656 | 0.30 |
| Rlock, 1 thd         |            13.044 | 2.99 |
| Rlock, 4 thd / 1 CPU |            13.099 | 2.98 |
| Rlock, 256 / 1       |            13.952 | 2.96 |
| Rlock, 2 / 2         |            13.047 | 2.99 |


Six more cycles per write versus thread-private storage really isn’t that bad (accessing TLS in a shared library might add as much overhead)... especially compared to 25-50 cycles (in addition to indirect slowdowns from the barrier semantics) with locked instructions.

I also have a statistics-gathering mode that lets me vary the fraction of cycles spent in critical sections. On my server, the frequency of context switches between CPU-intensive threads scheduled on the same CPU increases in steps until seven or eight threads; at that point, the frequency tops out at one switch per jiffy (250 Hz). Apart from this scheduling detail, evictions act as expected (same logic as for sampled profiles). The number of evictions is almost equal to the number of context switches, which is proportional to the runtime. However, the number of hard evictions (with the victim in a critical section) is always proportional to the number of critical section executed: roughly one in five million critical section is preempted. That’s even less than the one in two million we’d expect from the ~six cycle per critical section: that kind of makes sense with out of order execution, given that the critical section should easily flow through the pipeline and slip past timer interrupts.

The main trade-off is that rlocks do not attempt to handle thread migrations: when a thread migrates to another CPU, we let it assume (temporary) exclusive ownership of its pseudo-per-CPU struct instead of issuing IPIs. That’s good for simplicity, and also – arguably – for scaling. The scaling argument is weak, given how efficient IPIs seem to be. However, IPIs feel like one of these operations for which most of the cost is indirect and hard to measure. The overhead isn’t only (or even mostly) incurred by the thread that triggers the IPIs: each CPU must stop what it’s currently doing, flush the pipeline, switch to the kernel to handle the interrupt, and resume execution. A scheme that relies on IPIs to handle events like thread migrations (rare, but happens at a non-negligible base rate) will scale badly to really large CPU counts, and, more importantly, may make it hard to identify when the IPIs hurt overall system performance.

The other important design decision is that rlocks uses signals instead of cross-modifying code. I’m not opposed to cross-modifying code, but I cringe at the idea of leaving writable and executable pages lying around just for performance. Again, we could mprotect around cross-modification, but mprotect triggers IPIs, and that’s we’re trying to avoid. Also, if we’re going to mprotect in the common case, we might as well just mmap in different machine code; that’s likely a bit faster than two mprotect and definitely safer (I would use this mmap approach for revocable multi-CPU locks à la Harris and Fraser).

The downside of using signals is that they’re more invasive than cross-modifying code. If user code expects any (async) signal, its handlers must either mask the rlock signal away and not use rlocks, or call the rlock signal handler... not transparent, but not exacting either.

Rlocks really aren’t that much code (560 LOC), and that code is fairly reasonable (no mprotect or self-modification trick, just signals). After more testing and validation, I would consider merging them in Concurrency Kit for production use.

Next step: either mmap-based strict revocable locks for non-blocking concurrent code, or a full implementation of pseudo-per-CPU data based on relaxed rlocks.

#### Patrick Stein — Fog of Light - Starting to Add Star-Fields

· 58 days ago

I have finally written my first OpenGL code using GLSL. Whew. That took way too long to get all working correctly. I promise, soon, I will upload some sample code so that others may not have to stumble as long as I did.

For the star-field, I generate a few thousand 2-D points. Each point has its own radius, its own opacity, and its own color.

I put these all into an OpenGL array buffer. Then, the vertex shader copies data out of my struct to set the color and the point size. Then, the fragment shader turns the color into two, overlapping radial gradients (one that is half the radius of the other) by modulating the color’s opacity.

Next up will be nebulae, then planets/asteroids in the local system.

· 59 days ago

#### Quicklisp news — May 2017 Quicklisp dist update now available

· 66 days ago
New projects:
• cepl.glop — glop host for cepl — BSD 2 Clause
• cepl.sdl2-image — Some helper methods for using sdl2-image to load images to CEPL types — BSD 2 Clause
• cepl.sdl2-ttf — A few additional helpers for making working with sdl2-ttf even easier from CEPL — BSD 2 Clause
• cl-clblas — clBLAS binding — Apache License, Version 2.0
• cl-emoji — cl-emoji provides the Unicode emoji characters — MIT
• cl-fond — Bindings to libfond, a simple text rendering engine for OpenGL — Artistic
• cl-hamcrest — This library makes your CL unittests more readable. — New BSD
• cl-ntp-client — A simple NTP (Network Time Protocol) client in Common Lisp — BSD
• cl-pcg — A bare-bones Permuted Congruential Generator implementation in pure Common Lisp. — MIT
• cl-sdl2-mixer — Bindings for SDL2_mixer — MIT
• cl-trie — Common Lisp implementation of Trie data structure. — MIT
• cl-why — (X)HTML generation macros — BSD
• doubly-linked-list — An implementation of the doubly linked list data structure. — MIT
• flac-parser — A parser for FLAC audio files. — MIT
• fs-utils — Utilities for working with files and file paths. — MIT
• gamebox-dgen — A procedural dungeon generator. — MIT
• gamebox-ecs — An implementation of the Entity-Component System (ECS) pattern, popular with game development. — MIT
• gamebox-frame-manager — A manager for frames within a game loop. — MIT
• gamebox-grids — Create and manipulate tiles in a two-dimensional grid layout. — MIT
• gamebox-math — A high performance math library useful for making games. — MIT
• genie — A simple wrapper to generate portably seedable pseudo-random numbers. — MIT
• markdown.cl — A markdown parser for Common Lisp — MIT
• narrowed-types — Type definitions narrowed with predicates — BSD
• simple-logger — A simple message logging system. — MIT
• simple-routes — Facility for straightforward http routing on top of Hunchentoot. — 2 clause BSD
• stealth-mixin — Library for creating stealth mixin classes. — FreeBSD, see file LICENSE.text
• the-cost-of-nothing — Determine the cost of things in Common Lisp. — GPLv3
• trivial-clipboard — trivial-clipboard let access system clipboard. — MIT

Removed projects: cl-geo, cl-wkb, cl4l, clim-pkg-doc, gsharp, lifoo, lisp-binary.

To get this update, use (ql:update-dist "quicklisp"). Enjoy!

#### Patrick Stein — Fog of Light - Getting Underway

· 71 days ago

Dauntless (The Lost Fleet, Book 1) was the first science-fiction book I read that tried to deal with space combat with the real-world constraint that light only travels so fast. It takes light eight minutes to get from the Sun to Earth. It takes light more than a second to get from the Earth to the Moon. Depending on where they are in their orbits, it takes between three minutes and twenty-two minutes to get light from Mars to Earth.

Imagine that you’re a star-ship. You and your companions have just warped into a new star system. You see a flotilla of enemy ships about 45 light-minutes away. That means, you’ve got 45 minutes before that flotilla can possibly even know that you’re in their star system. How much can you get done in that time? Once they can see you, how much can you mislead them on your target if they’re going to be operating on data about where you were heading more than half an hour ago?

For years, I have been batting around this concept, hammering it into a game. I have finally gotten started on it.

Armed with some functions like these, I am constructing values which change at points in space-time and querying the value visible from other points in space-time.

(defgeneric get-nearest-value (space-time-value space-time-point)
...
(:documentation "Find the observable value of a quantity
SPACE-TIME-VALUE when observed from a given location
SPACE-TIME-POINT. This method finds the most-recent
value V0 (at location P0) for this data when viewed from
the given location. This method returns (VALUES V0 P0).
This method makes no effort to interpolate the results."
))

Hopefully, there will be plenty more coming in the near future.

#### Lispjobs — Linux systems engineer with Common Lisp experience, m-creations, Mainz, Germany

· 84 days ago

Full time position for German speaking Linux admin with Docker and
Common Lisp experience near Frankfurt, Germany

We are a small German software shop based in Mainz, Germany, founded in
2000. We create custom software solutions for mid-size to big companies
in finance/payment, health care, and media research.

For some of our customers, we also cover operational aspects of the
software lifecycle by creating and running Docker containers in
development, test, and production environments on clusters of servers
running Apache Mesos.

Missing pieces of infrastructure are written in Common Lisp (CL) and
interact with existing software components of the cluster (DNS, load
balancer etc.). Docker images are based on the embedded Linux
distribution OpenWrt.

We are looking for new colleagues who ideally

– have 3+ years of Linux experience (e.g. are fluent in shell scripting
and have a good overview of the GNU/Linux tools)

– have a working knowledge of Docker, its interaction with the host, and
the role of the container image

– have experience in Common Lisp (not necessarily professional)

– want to use CL to solve systems engineering problems (e.g. dynamic
load balancing, DNS re-configuration)

– are interested in mastering the OpenWrt build system (buildroot +
make/cmake) to create a secure in-container distribution

Experience in the mentioned areas is not as important as curiosity,
intelligence and open-mindedness. You will get the necessary time to
learn the missing skills. We are interested in a long-term relationship
rather than just staffing a project with ‘resources’.

Due to our size as a small company, we do care about each one of our
colleagues and react flexibly to the (sometimes changing) necessities of
their life. Together we try to develop a plan for your personal career,
depending on your own goals.

He’ll be happy to give you more information and answer all your
questions.

m-creations gmbh
Acker 2
55116 Mainz

#### Zach Beane — Common Lisp Standard Draft

· 85 days ago
Common Lisp Standard Draft:

This is a nice PDF version of the CL spec built from the final draft TeX sources. There's also a gitlab repo that can be used to reproduce the PDF locally. (Thanks to Rainer Joswig for sharing this on twitter.)

#### François-René Rideau — Design at the confluence of programming languages and build systems

· 86 days ago

This short article discusses upcoming changes and future challenges for ASDF, the Common Lisp build system. It also draws lessons for a hypothetical successor to ASDF, for build systems in general, languages in which to write them, and languages that would have an internal build system that could rival with modern build systems.

ASDF, "Another System Definition Facility", is the de facto standard build system for Common Lisp (CL). It is relatively lightweight (13 kloc, over half of which for the portability layer UIOP, the "Utilities for Implementation- and OS- Portability"), quite portable (17 supported implementations), configurable (though importantly it "just works" by default), well-featured (it can create standalone executables), extensible (e.g. with support for linking C code, or for compiling FORTRAN through Lisp, etc.). But it lacks many features of modern build systems like e.g. Bazel: it does not support determinism and reproducibility, distribution and caching, cross-compilation to other platforms, building software written in languages other than CL, integration with non-CL build systems, management of multiple versions of the same software, or scaling to millions of files, etc. Historically, these limitations are due to ASDF being at heart an in-image build system in direct line of the original Lisp Machine DEFSYSTEM: it is designed to build and load software into the current Lisp image. But the challenges in possibly transforming ASDF into a modern build system touch limitations of Common Lisp itself and tell us something about language design in general.

I have essentially two development branches more or less ready for merge in the upcoming ASDF 3.3: the "plan" branch that provides proper phase separation (briefly discussed in my ELS 2017 demo), and the "syntax-control" branch that binding for syntax variables around ASDF evaluation (briefly discussed in my ELS 2014 extended article, section 3.5 "Safety before Ubiquity").

### Phase Separation

The first branch solves the problem of phase separation. The branch is called "plan" because I started with the belief that most of the changes would be centered around how ASDF computes its plan. But the changes run deeper than that: 970 lines were added or modified all over the source code, not counting hundreds more were moved around as the code got reorganized. That's double the number of lines of the original ASDF, and it took me several months (part time, off hours) to get just right. Still, it is up-to-date, passes all tests, and works fine for me.

To understand what this is about, consider that a basic design point in ASDF 1.0 to 3.2 is that it first plans your entire build, then it performs the plan. The plan is a list of actions (pair of OPERATION and COMPONENT), obtained by walking the action dependency graph implicitly defined by the COMPONENT-DEPENDS-ON methods. Performing the plan is achieved by calling the PERFORM generic function on each action, which in turn will call INPUT-FILES and OUTPUT-FILES to locate its inputs and outputs.

This plan-then-perform strategy works perfectly fine as long as you don't need ASDF extensions (such as, e.g. cffi-grovel, or f2l). However, if you need extensions, there is a problem: how do you load it? Well, it's written in Lisp, so you could use a Lisp build system to load it, for instance, ASDF! And so people either use load-system (or an older equivalent) from their .asd files, or more declaratively use :defsystem-depends-on in their (defsystem ...) form, which in practice is about the same. Now, since ASDF up until 3.2 has no notion of multiple loading phases, what happens is that a brand new separate plan is computed then performed every time you use this feature. This works well enough in simple cases: some actions may be planned then performed in multiple phases, but performing should be idempotent (or else you deserve to lose), therefore ASDF wastes some time rebuilding a few actions that were planned before an extension was loaded that also depended on them. However, the real problems arise when something causes an extension to be invalidated: then the behavior of the extension may change (even subtly) due to its modified dependency, and the extension and all the systems that directly or indirectly depend on should be invalidated and recomputed. But ASDF up until 3.2 fail to do so, and the resulting build can thus be incorrect.

The bug is quite subtle: to experience it, you must be attempting an incremental build, while meaningful changes were made that affect the behavior of an ASDF extension. This kind of situation is rare enough in the small. And it is easily remedied by manually building from scratch. In the small, you can afford to always build from scratch the few systems that you modify, anyway. But when programming in the large, the bug may become very serious. What is more, it is a hurdle on the road to making a future ASDF a robust system with deterministic builds.

Addressing the issue was not a simple fix, but required deep and subtle changes that introduce notions neglected in the previous simpler build models: having a session that spans multiple plan-then-perform phases and caches the proper information not too little not too much; having a notion that loading a .asd file is itself an action that must be taken into account in the plan; having a notion of dynamically detecting the dependencies of loading a .asd file; being able to check cross-phase dependencies before to keep or invalidate a previously loaded version of a .asd file without causing anything to be loaded in the doing; expanding the state space associated to actions as they are traversed potentially many times while building the now multi-phase dependency graph. And all these things interfere with each other and have to be gotten just right.

Now, while my implemented solution is obviously very specific to ASDF, the issue of properly staging build extensions is a common user need; and addressing the issue would require the introduction of similar notions in any build system. Yet, most build systems, like ASDF up until 3.2, fail to offer proper dependency tracking when extensions change: e.g. with GNU Make you can include the result of a target into the Makefile, but there is no attempt to invalidate targets if recipes have changed or the Makefile or some included file was modified. Those build systems that do implement proper phase separation to track these dependencies are usually language-specific build systems (like ASDF); but most of them (unlike ASDF) only deal with staging macros or extensions inside the language (e.g. Racket), not with building arbitrary code outside the language. An interesting case is Bazel, which does maintain a strict plan-then-perform model yet allows user-provided extensions (e.g. to support Lisp). However, its extensions, written in a safe restricted DSL (that runs into plan phase only, with two subphases, "load" and "analysis") are not themselves subject to extension using the build system (yet the DSL being a universal language, you could implement extensibility the hard way).

Fixing the build model in ASDF 3.3 led to subtle backward-incompatible changes. Libraries available on Quicklisp were inspected, and their authors contacted if they depended on modified functionality or abandoned internals. Those libraries that are still maintained were fixed. Still, I'd just like to see how compatible it is with next month's Quicklisp before I can recommend releasing these changes upon the masses.

### Syntax Control

The current ASDF has no notion of syntax, and uses whatever *readtable*, *print-pprint-dispatch*, *read-default-float-format* or many other syntax variables are ambient at the time ASDF is called. This means that if you ever side-effect those variables and/or the tables that underlie the first two, (e.g. to enable fare-quasiquote for the sake of matching with optima or trivia), then call ASDF, the code will be compiled with those modified tables, which will make fasl that are unloadable unless the same side-effects are present. If systems are modified and compiled that do not have explicit dependencies on those side-effects, or worse, that those side-effects depend on (e.g. fare-utils, that fare-quasiquote depends on), then your fasl cache will be polluted and the only way out will be to rm -rf the contaminated parts of the fasl cache and/or to build with :force :all until all parts are overwritten. Which is surprising and painful. In practice, this means that using ASDF is not compatible with making non-additive modifications to the syntax.

Back in the 3.1 days, I wrote a branch whereby each system has its own bindings for the syntax variables, whereas the default tables be read-only (if possible, which it is in many implementations). With that branch, the convention is each system can do modify the syntax in whatever way it wants, and that will only affect that system; however, changes to syntax tables must be done after explicitly creating new tables, and any attempt to side-effect the default global tables will result in an error.

This was the cleanest solution, but alas it is not compatible with a few legacy systems that explicitly depend on modifying the syntax tables (and/or variables?) for the next system to use, as ugly as that is. My initial opinion was that this should be forbidden, and that these legacy systems should be fixed; however, these were legacy systems at a notable Lisp company, with no one willing to fix them; also, I had resigned from maintainership and the new maintainer is more conservative than I am, so in the end the branch was delayed until after said Lisp company would investigate, which never happened, and the branch was never merged.

A simpler and more backward-compatible change to ASDF would have been to have global settings for the variables that are bound around any ASDF session. Then, the convention would be that you are not allowed to use ASDF again to load regular CL systems after you modify these variables in a non-additive way; and the only additive changes you can make are to add new entries to the shared *readtable* and *print-pprint-dispatch* tables that do not conflict with any default entry or earlier entry (and that includes default entries on any implementation that you may want to support, so e.g. no getting #_ or #/ if you want to support CCL). Even additive changes, if made, must somehow not clash with each other, or they become non-additive; but there is no way to automatically check that this is the case and issue a warning. After you make non-additive changes (if you do), then ASDF can't be used anymore to build normal systems that may conflict with those changes, and if they are modified and you call ASDF on a system that depends on them, you lose (or you must first make all those systems immutable).

Note that because ASDF would already break in those cases, most of these constraints de facto exist, are enforced, and are respected by all ASDF users. There remains the question of binding the variables around the build, which allows normal systems to be built even if a user changes the variables, or to not bind them, which puts the onus on most users of keeping these variables bound to reasonable values around calls to ASDF for the benefit of a few users would want their own breaking changes to persist after the build. I believe the first option (bind the variables) is cleaner, though the second (basically, do nothing) is more backward-compatible.

In all cases, you can always make non-additive changes to a readtable (such as enabling fare-quasiquote) by locally binding *readtable* to a different value, e.g. using named-readtables:in-readtable. A local binding won't adversely affect the ASDF build; but unless ASDF is changed to enforce its own bindings, you'll have to make sure to manually undo your local bindings before you call ASDF again.

The problem with not adding any syntax-control to ASDF is that it forces Lispers to always be conservative about modifying the readtable and calling ASDF (or having it called indirectly by any function whatsoever that they call, which they can't always predict). In practice this makes hacking CL code hostile to interactive development with non-additive syntax modification; which defeats in social conventions a technical feature of the language often touted as cool by its zealots. If syntax-control is added to ASDF, then you can freely do your syntax modifications and be confident that building code won't be adversely affected.

The current branch implements the simpler option of binding variables around ASDF sessions, and using a mutable shared readtable that should only be modified additively. It has probably bitrotten, and should be updated or rewritten. The current maintainer, Robert Goldman, should probably opine on which change to adopt with what schedule (3.3.0? 3.2.2? 3.3.1? 3.4.0?) and sign off the API.

### Vanquishing Language Limitations

These two modifications are ((now)low)-hanging fruits in making ASDF a more robust build tool, one that supports working with non-trivial extension to the build system or the Lisp syntax. And in both cases, the limit reached by ASDF is ultimately that CL is a hippie language that allows unrestricted global side-effects and disallows disallowing. Therefore extensions necessarily introduce potential conflict with each other that have to be solved in wetware via convention, whereby all users are to be trusted not go wild with side-effects. The system cannot even detect violations and warn users of a potential mistake; users will have to experience subtle or catastrophic failure and figure out what went wrong.

A better language for a build system should be purer: inasmuch as it has "global" side-effects, it should allow to "fork" the "global" state in an efficient incremental way. Or even better, it should make it easy to catch side-effects and write this forking support in userland. At the very least, it would make it possible to detect violations and warn the user. Bazel is an example build system with an extension language that has local side-effects, but globally has pure forked environments. A successor to ASDF could similarly provide a suitably pure dialect of Lisp for extensions.

Happily, adding better syntax control to ASDF suggests an obvious solution: ASDF extensions could be written in an enforceable subset of a suitable extension of Common Lisp. Thus, ASDF extensions, if not random Common Lisp programs, can be made to follow a discipline compatible with a deterministic, reproducible build.

What would be an ideal language in which to write a extensible build system? Well, I tackled that question in another article, the Chapter 9: "Build Systems" of my blog "Ngnghm". That's probably too far from CL to be in the future of ASDF as such, though: the CL extension would be too large to fit ASDF's requirement of minimalism. On the other hand, if such a language and build system is ever written, interest for CL and ASDF might wane in favor of said latter build system.

In any case, in addition to not being a blub language, features that will make for a great programming language for an integrated build system include the following: making it possible to directly express functional reactive programming, determinism as well as system I/O, laziness as well as strictness, reflection to map variables to filesystem and/or version control as well as to stage computations in general including dynamic build plans, hygiene in syntax extension and file reference, modularity in the large as well as in the small, programmable namespace management, the ability to virtualize computations at all sizes and levels of abstractions, to instrument code, etc.

### Towards cross-compilation

Now, before we get reproducible builds, we also need to enable cross-compilation for ASDF systems, so the necessarily unrestricted side-effects of compiling Common Lisp code cannot interfere with the rest of the build. Cross-compilation also allows building on a different platform, which would be important to properly support MOCL, but would probably also mesh well with support for building software in arbitrary other languages.

Importantly, instead of the (perform operation component) protocol that specifies how to build software in the current image, a (perform-form target operation component) protocol (or maybe one where the target information has been made part of the operation object) would return forms specifying how to build software, which could then happen in separate Lisp or non-Lisp process, on the same machine or on another worker of a distributed build farm.

Note however, that one essential constraint of ASDF is that it should keep working in-image in the small and not depend on external processes or additional libraries. Any serious effort towards a "deterministic build" should therefore remain an extension indeed (though one users would load early).

Still, if this extension is to remain compatible with ASDF and its .asd files, providing a backward-compatible path forward, then modifications and cleanups may have to be done to ASDF itself so it behaves well. Even keeping that hypothetical deterministic build separate, I expect non-trivial changes to the ASDF API to enable it, such as the perform-form protocol mentioned above. But backward-compatibility and smooth transition paths have always been the name of the game for ASDF; they are what make possible an ecosystem with thousands of packages.

There is a precedent to an ASDF extension leading to (most positive) changes in ASDF: POIU, the "Parallel Operators on Independent Units", Andreas Fuchs' extension to compile files in forks (but still load them in-image). Making sure that POIU can be expressed as an extension of ASDF without redefining or breaking the provided abstractions, was instrumental in the evolution of ASDF: it led to many cleanups in ASDF 2, it inspired several of the breakthroughs that informed what became ASDF 3, and it kept influencing ASDF 3.3.

Thus, even though ASDF will stay forever an in-image build system, and even though a deterministic build extension (let's call it FDSA, the Federated Deterministic System Assembler) may ultimately remain as little used as POIU (i.e. because it lacks sufficient benefits to justify the transition costs), I expect the design of the base ASDF to be deeply influenced by the development of such a tool (if it happens).

### Looking for new developers

Robert Goldman and I are not getting younger, not getting more interested in ASDF, and we're not getting paid to hack on it. We are looking for young Common Lisp hackers to join us as developers, and maybe some day become maintainers, while we're still there to guide them through the code base. Even without the ambition (and resources) to actually work towards a hypothetical FDSA, our TODO file is full of items of all sizes and difficulties that could use some love. So, whatever your level of proficiency, if you feel like hacking on a build system both quite practical and full of potentiality, there are plenty of opportunities for you to work on ASDF (or a successor?) and do great, impactful work.

#### McCLIM — Progress report #7

· 91 days ago

Dear Community,

During this iteration I have worked on the Debugger (system clim-debugger) to bring it closer to sldb:

More work on the module is planned with a final goal to integrate it with the Listener and to have it as a default debugger for McCLIM applications. Suggestions on how to improve the interface, testing and help with coding are appreciated. Preliminary documentation has been written and put in the McCLIM manual draft.

I've started working on a library called Slim[1] the goal of which is to provide some of the CLIM interfaces in easy to learn and write manner (i.e more context dependent, less verbose names, unified abstractions etc.). For now it is only a skeleton having barely four macros, but I'd love to hear suggestions, what should it contain, in what form etc. A sketch may be found in the source code Libraries/Slim/. If you think it is a bad idea to have such library shipped with McCLIM - let me know about that too!

The documentation was extended in some places. Also building the info document works now (long standing issue). An updated version of manual draft may be found on the McCLIM website. The Drei documentation has been put in a separate document due to its size and independent scope from the rest of McCLIM.

Nisar Ahmad has solved one of the bounties related to menus and command tables. He also submitted documentation for the menu functionality, thereby earning $150. Congratulations! Speaking of finances - all money is now accumulated solely for bounties and development tasks, none is withdrawn by me since the beginning of year 2017. Currently we have$1226 at our disposal and active bounties worth $700. Declared monthly donations at the moment equal$297. Additionally one-time contributions come every month. That means that we can post two or three bounties a month without draining the current resources, or spawn a bunch of worthwhile tasks and keep going as money comes. This is fantastic news. Thank you all for your support to the project!

At the moment we have five active bounties worth 700 which may be found here: https://www.bountysource.com/teams/mcclim/bounties New bounties have a time limit assigned to them (six months) - thanks to that we are able to reclaim money from unresolved issues and propose it somewhere else (or repost the same bounty). To improve the visibility of the issues which have bounties on them I've added a label to GitHub issue tracker: bounty. Suggestions as to which other issues should have a bounty on them are appreciated and welcome. Please note that Bountysource has a functionality "Suggest an Issue" which may be found on the bounties page. If you feel that you may solve some problem, but there is no bounty on it, feel free to suggest it too. If you have any questions, doubts or suggestions - please contact me either by email (daniel@turtleware.eu) or on IRC (my nick is jackdaniel). We are very happy that the number of McCLIM users grows, which may be infered from number of questions on the IRC channel, bug reports and pull requests. Sincerely yours, Daniel Kochmański [1] Simplified Lisp Interface Manager. #### Vsevolod Dyomkin — Pretty-Printing Trees · 99 days ago (or The Ugliest Code I've Ever Written) In the last couple of days, I was ill and had to stay in bed, so I've used this time also to tidy up the work that accumulated over the past year in cl-nlp. That was especially timely, considering the interest that was expressed in using it by some people who I've met at the recent Lisp-related events. I've even assembled a rough checklist of the things that need to be finished to get it to v.1.0 and beyond. Besides, after finishing the basic cleaning, I've returned to one of the programming tasks that has racked my head for long: tree pretty-printing. In NLP, we constantly have to deal with various versions of parse trees, like the constituency or dependency ones, but the problem is that they are not easily visualized. And good visualization plays, at least for me, a critical role in effective debugging, ideation and programming. It's an essential part of a solid interactive experience that is one of the fundamental traits of Lisp development. For instance, a constituency tree is usually presented as a Lisp list. Here's an infamous example from the Penn Treebank:  (S (NP-SBJ (NP (NNP Pierre) (NNP Vinken) ) (, ,) (ADJP (NP (CD 61) (NNS years) ) (JJ old) ) (, ,) ) (VP (MD will) (VP (VB join) (NP (DT the) (NN board) ) (PP-CLR (IN as) (NP (DT a) (JJ nonexecutive) (NN director) )) (NP-TMP (NNP Nov.) (CD 29) ))) (. .) ) A dependency tree has several representations, all of which are not really intuitive to grasp. This is the Stanford format:  amod(ideas-2, Colorless-0) amod(ideas-2, green-1) nsubj(sleep-3, ideas-2) root(sleep-3, sleep-3) advmod(sleep-3, furiously-4) punct(sleep-3, .-5) And here's the CoNLL one: 0 Colorless _ _ ADJ 21 green _ _ ADJ 22 ideas _ _ NOUN 33 sleep _ _ NOUN 34 furiously _ _ ADV 35 . _ _ PUNCT 3 Also, Google's Parsey McParseface offers another - presumably, more visual - representation (using the asciitree lib). Still, it is not good enough, as it messes with the order of words in a sentence. Input: Bob brought the pizza to Alice .Parse:brought VBD ROOT +-- Bob NNP nsubj +-- pizza NN dobj | +-- the DT det +-- to IN prep | +-- Alice NNP pobj +-- . . punct As you see, dependency trees are not trivial to visualize (or pretty-print) in ASCII. The authors of Spacy creatively approached solving this problem by using CSS in their displaCy tool: However, it seems like an overkill to bring a browser to with you for such a small task. And it's also not very scalable: I, in fact, was always interested in creative ways of text-based visualization. So, I thought of ways to represent parse trees in ASCII. With constituency ones, it's rather trivial: > (pprint-tree '(TOP (S (NP (NN )) (VP (VBZ ) (NP (DT ) (JJ ) (NN ))) (|.| <.:5 22..23>))) TOP : S .-----------:---------. : VP : : .---------. : NP : NP : : : .----:-----. : NN VBZ DT JJ NN . : : : : : : This is a simple test .  The dependencies are trickier, but I managed to find a way to show them without compromising the sentence word order: > (pprint-deps '(<This:0 0..4> <is:1 5..7> <a:2 8..9> <simple:3 10..16> <test:4 17..21> <.:5 22..23>) '(root(_ROOT_-0, is-1) nsubj(is-1, This-0) dobj(is-1, test-4) det(test-4, a-2) amod(test-4, simple-3) punct(is-1, .-5)))Colorless green ideas sleep furiously . ^ ^ .^ .^. ^ ^ : . amod .´: ::: : : ..... amod .....´: ::: : : . nsubj .´:: : : :. advmod .´ : :.... punct .....´ root And it looks pretty neat even for longer sentences: We hold these truths to be self - evident , that all men are created equal , that they are endowed by their Creator with certain unalienable Rights , that among these are Life , Liberty and the pursuit of Happiness . ^ .^. ^ .^ ^ .^. ^ ^ .^ ^ ^ ^ .^ ^ .^. ^ ^ ^ ^ ^ .^. ^. ^ .^. ^. ^ ^ .^ ^ ^ ^. ^ .^. ^. ^ ^. ^ ^ .^. ^. ^ ^ . nsubj .´:: . det .´: . aux .´:: : . punct .´: : : . det .´: . auxpass .´:: : : : : . auxpass .´:: :: . poss .´:: :: : . amod .´: : : :. pobj .´ ::: :. punct .´ :. cc .´ . det .´:: :. pobj .´ : :... dobj ...´ :: . npadvmod .´: : : : ::. advcl .´ : : : ::: :: :: :: ...... amod ......´: : : : ::: :: :: :. prep .´ : :: :..... acomp .....´ : : .. nsubjpass ..´:: : : : ::: :: :: :......... pobj .........´ : : : ::: :: :...... conj .......´ : :......... advcl ..........´ : : ::... punct ...´ : : ::: :: :. prep .´ : : : ::: :.... conj ....´ : :..................... punct ......................´ ........... mark ...........´:: : : ::: :... pobj ....´ : : : ::. attr .´ : :: :: : : ::. agent .´ : : ... prep ....´: : :: :: : .. nsubjpass ..´:: : ...... mark ......´: : :: :: ....... mark .......´:: : : : :: :: :............................ punct .............................´ : : :: :: :........................................ advcl .........................................´ : :: :................ advcl ................´ : :...................................... ccomp .......................................´ : :............................................................................................................................................ punct .............................................................................................................................................´ root However, writing the visualization code was one of the most intimidating programming tasks I've ever encountered. One explanation is that trees are most naturally processed in depth-first order top-down, while the visualization requires bottom-up BFS approach. The other may be that pixel-perfect (or, in this case, character-perfect display is always tedious). As far as I'm concerned, this is not a sufficient explanation, but I couldn't find any other. The ugliest part of this machinery is deps->levels function that prints the dependency relations in a layered fashion. The problem is to properly calculate minimal space necessary to accommodate both tokens and dependency labels and to account for different cases when the token has outgoing dependency arcs or doesn't. In theory sounds pretty easy, but in practice, it turned out a nightmare. And all of this assumes projective trees (non-intersecting arcs), as well as doesn't know how to show on one level two arcs going from one token in two directions. Finally, I still couldn't align the two trees (constituency and dependency) above and under the sentence. Here's the target:  TOP : S .----------------:--------------. : VP : : .---------. : NP : NP : : : .----:---------. : NN VBZ DT JJ NN . This is a simple test . ^ .^. ^ ^ .^ ^ . nsubj .´:: : . amod .´: : :: .... det ....´: : :..... dobj .....´ : :...... punct ......´ root and this is how it prints for now (one more challenge was to transfer additional offsets from dependencies into the constituency tree):  TOP : S .-----------:---------. : VP : : .---------. : NP : NP : : : .----:-----. : NN VBZ DT JJ NN . This is a simple test . ^ .^. ^ ^ .^ ^ . nsubj .´:: : . amod .´: : :: .... det ....´: : :..... dobj .....´ : :...... punct ......´ root Well, the good news is that it is usable, but it still needs more work to be feature complete. I wonder what was I doing wrong: maybe, someone can come up with a clean and simple implementation of this functionality (in any language)? I consider it a great coding challenge, although it may require a week of your free time and a bunch of dead neurons to accomplish. But if you're willing to take it, I'd be glad to see the results... :D #### Eugene Zaikonnikov — About Time · 102 days ago This week I put together a small NTP client. To keep dependencies at minimum and to avoid forcing a permanently running process onto users, it does not attempt to adjust system RTC clock, compensate jitter or evaluate time server quality. As I see it, much of that behaviour is easy enough to add via mixins with the defined NTP class. NTP timestamp is two 32-bit values: seconds and fraction of a second. NTP conveniently counts seconds from Jan 1 1900, just like universal time in Common Lisp. There is however no portable Common Lisp representation for fractions of a second. Thus the client sticks to using NTP formatted fraction for that. It is way more precision than any existing CL implementation has in INTERNAL-TIME-UNITS-PER-SECOND, but this makes the value comparable across implemenations. The new GET-ADJUSTED-UNIVERSAL-TIME method then returns a pair of values: universal time and NTP fraction. The fraction can be converted to the implementation's internal time scale with FRACTION-TO-INTERNAL. Internally we define no special arithmetic on NTP timestamps but provide two conversion macros for single integer space. BIG-TIME converts NTP stamp into a large integer. We then do all calculations in that domain, and convert back to NTP timestamp using SMALL-TIME when it's time to send it over the wire. An NTP instance stores adjusted time as an offset from internal real time. The offset is roughly intialized with universal time and then adjusted after each server request. #### Nicolas Hafner — Radiance Release - Confession 73 · 109 days ago Right now I'm at Brussels Airport, waiting for my departing flight back to Zürich. The 10th European Lisp Symposium is over, and I got to have my first "real" talk at it. It was, as you might guess, about Radiance and some of the core concepts behind the project. With that, I think it is finally time to announce Radiance's proper release into the wild! It's been a long time coming- starting back when I made my first steps in Lisp in June of 2013. Radiance's full story started much earlier though, back when I was still dabbling in PHP and Java for most of my projects. The changes that this project has undergone since then are massive, to the point where hardly a single aspect of it now has any connection to its initial beginnings. One thing has always stood the same, though: the intention to make Radiance a framework that eases deployment and housing of multiple, different web services within the same instance. Circumventing a long talk about the history of how everything got together though, I'll instead try to say a bit about what Radiance's goals right now are, so that you may judge whether it might be a good fit for your next web project. First, it is important to mention that Radiance is not like Weblocks and similar projects that try to present new and interesting ways to develop web applications. Its strengths lie elsewhere. On the surface, it is very classic in approach: you write a program that has "handlers" to which the framework dispatches for each request. The handler then returns the data that should be sent back to the user. And that's it. There's no extra support for JavaScript/AJAX interaction, no continuations, no widgets, no presentations, not even a template system. All of those other choices are up to you to decide and settle on, depending on your needs. So what is Radiance good for then? Why not just use Hunchentoot? Well, depending on your project size and intentions, Hunchentoot may well be a viable alternative. What Radiance does do over Hunchentoot however, is that it offers you a layer around the webserver allowing you to exchange it later, similar to Clack. It also offers many more layers around various other features that are useful to develop web applications, however. Radiance is intended to be an adaptable intermediate layer between an application and the features that it depends on. It provides these features in such a way that it is still possible for the administrator of an installation of your application to decide what the implementations of those features are, and leaves them a choice to select their priorities. Now, this probably sounds rather abstract and confusing, so let me try and illustrate what I mean a bit more clearly. Central to this aspect of Radiance is the standard-interfaces.lisp file and section 2 of the documentation. A quick look at them should make a few things clear: rather than implementing all sorts of features like a database layer, sessions, user accounts, authentication, and so forth, Radiance provides them through interface definitions. These definitions outline the signatures of functions, macros, and so forth that the interface provides. It does not, however, actually implement the features. Your application can make use of these features by depending on the interfaces it needs, without having to specify a particular underlying implementation. In the end, the administrator decides which implementing system to use for each interface, and Radiance takes care of loading the appropriate one whenever your application is loaded. I won't go into a discrete example here, as I've already described how to use interfaces and what they can do for you in increasing levels of detail in the conference paper, the documentation, and the tutorial. If you're still with me and do intend on jumping in or having a more in-depth look, I really recommend starting with the tutorial. It's lengthy and touches on pretty much every aspect involved in writing a fully-fledged web application from the ground up. It doesn't touch on every single piece Radiance gives to you, but it will show you where to look and how to proceed should you still need more. Outside of the interfaces and pluggable features, Radiance also offers a powerful and flexible routing system. Unlike other frameworks that associate pages with tags or directly hard-code the URL into the source code, Radiance uses an "internal URL representation" and an "external URL representation". The former is what your application and templates speak in, and the latter is what the user and web server deal in. The translation between the two is handled by regular functions, called routes, which rewrite and transform URLs in order to achieve the URL namespace setup that is desired on a particular installation. This allows the administrator quick and easy control over the setup of an application. Finally, Radiance has a built in configuration and file management system that is responsible for keeping all the run-time data of an installation in one place that is easy to track. It offers you easy access to application parameters that are configurable by the administrator, and bundles everything together in such a way that multiple configuration sets can be kept on the same machine easily, thus allowing you to switch between different setups quickly. For example, you might have a "development" and "production" setup on your machine that pick different settings and implementations for the interfaces. Aside from these three major features of interfaces, routing, and configuration, Radiance offers a variety of tools and related functionality to help you with your development. In the end it is all constructed and written with the intention of making your specific web application work in such a way that it can be deployed on systems other than your own without much further work, and that it can be deployed alongside other web applications within the same Radiance instance. This allows the applications to share data like users, database, sessions, and so forth between each other, without tripping over each others' toes. While it is of course possible to use Radiance for an application that is just for you and you alone, this is not where its strengths lie. It's intended for people that want to write web applications that can be redistributed and used by other people, and focuses on allowing someone to gather together the services they want and run them all together in a common environment, leaving them as much control over the system as possible without having to touch the applications' code. Now, mind you, this does have a price associated with it. You will need to give in to certain conventions that Radiance follows and give up certain amounts of control and freedom in order to really make use of the features. That's how things go for everything. However, I dare say that the price is, in most cases, not very high. Most applications can be written with the tools the interfaces provide to you. And even if they do not, Radiance in no way forces you to use the interfaces. You can always break out of the layers and directly make use of whatever library you might need, at the cost of making your application share less with others in the system, or constraining the administrator further. Because almost everything in Radiance is optional, it becomes rather hard to advertise it fully. I'm aware of the drawbacks and the strengths, so I can't in good conscience just praise it for all of its aspects. The only thing I can say with certainty is that it's a system I've worked with for many years now, and a system I've already written a bunch of applications with. I've also been running these applications on public servers for a good many years, so it isn't without testing either. You're actually reading this on one of my services right now. In the end, it's unfortunately still your choice which framework you're going to use for your next project. I can't make that choice for you. In the very least, though, I can now recommend Radiance without having to list a bunch of "but"s. Radiance is documented, it works, and it is now, finally, officially released! I'd like to thank everyone who helped me along the way, by reading through my documentation and tutorial, testing things out, and giving me advice all around the project. I'd particularly like to thank Janne Pakarinen, Joram Schrijver, and Till Ehrengruber for their invaluable input. #### Marco Antoniotti — CLAST: a Common Lisp AST and Code Walking library · 112 days ago I guess this is a good time to publicize the CLAST library I have been working on with Matteo Crespi. CLAST is a Common Lisp AST and Code Walking library that stands apart because it is geared at producing a source-level Abstract Syntax Tree (AST) of Common Lisp code. Of course the usual issues with MACROEXPAND are all there, but I believe the choices made to handle it are quite sensible. The library is still "in fiery", but most of the heavy lifting is done. The development branch is the most up-to-date one Cheers #### Marco Antoniotti — ELS 2017: thank you! · 112 days ago Dear all just got back for ELS 2017 in Brussels, which went very well; thanks especially to Didier Verna, Irene Durand and Alberto Riva. It was a particularly good edition of the event. Cheers #### Quicklisp news — April 2017 Quicklisp dist update now available · 113 days ago New projects: • cl-cudd — A two-layered binding to the CUDD binary decision diagram library. See README.md for more details. — BSD Style (see LICENSE) • cl-marklogic — Common Lisp library for accessing MarkLogic Server. — LGPL3 • cl-sandbox — Utility package for creating safe experimental environment. — MIT • esrap-peg — A wrapper around Esrap to allow generating Esrap grammars from PEG definitions — MIT • glsl-toolkit — A library to parse and modify OpenGL Shader Language (GLSL) source code — Artistic • horner — Inline polynomial evaluation using Horner's rule. — MIT • http-get-cache — Common Lisp library for caching HTTP GET responses — MIT • random-sample — Random sample of a sequence with uniform distribution. — MIT Updated projects3d-matrices3d-vectorsagnostic-lizardarchitecture.builder-protocolarchitecture.service-providerarnesiasdf-dependency-grovelasdf-finalizersassoc-utilsbeastbuildnodecamblcaveman2-widgetscaveman2-widgets-bootstrapcl+sslcl-anacl-arxiv-apicl-ascii-artcl-association-rulescl-autorepocl-autowrapcl-bsoncl-conspackcl-containerscl-csvcl-cudacl-custom-hash-tablecl-digraphcl-feedparsercl-freeimagecl-html5-parsercl-influxdbcl-jpegcl-llvmcl-online-learningcl-opsresearchcl-pangocl-protobufscl-pythoncl-scriptingcl-sdl2cl-secure-readcl-strcl-tcodcl-videocl4lclackclinchclipclmlcloser-mopcoleslawcroatoancserial-portdaemondecltdefmacro-enhancedexadoreasy-audioexit-hooksexscribef2clfare-scriptsfemlispfocusfolio2fxmlglisphhermetichu.dwim.asdfhu.dwim.perechu.dwim.presentationhu.dwim.rdbmshu.dwim.reiteratehu.dwim.stefilhu.dwim.utilhu.dwim.web-serverinlined-generic-functionjsonrpckenzolegitlifoolispbuilderlmdblol-remaidenmcclimmedia-typesmetacopymetatilities-basemitomodularize-interfacesmonkeylib-utilitiesmoptilitiesnibblesomer-countopticlpostmodernprovepzmqqlotretrospectiffrtg-mathrutilsscriptlserapeumsketchspinneretstaplestumpwmtriviatrivial-argumentstrivial-ldaptrivial-main-threadwebsocket-driverworkout-timerxhtmlambdazenekindarlzlib. To get this update, use (ql:update-dist "quicklisp"). Enjoy! #### Paul Khuong — Three-universal Hashing in Four Instructions · 114 days ago ... with one caveat: the hash functions only generate one bit. “hash.c”  1 2 3 4 5 6  bool bit_hash(uint64_t x, uint64_t table, uint64_t bit) { /* table is a random uniform uint64_t, bit is a random bit. */ return __builtin_parityll((x & table) ^ bit); }  With hardware popcount, this compiles to something like the following. “hash.s”  1 2 3 4 5   andq %rsi, %rdi # x & table xorl %eax, %eax # work around a hardware perf bug in popcnt xorq %rdi, %rdx # () ^ bit popcntq %rdx, %rax # get the popcount andl1, %eax # isolate parity 

This should raise a few questions:

1. Why?
2. Why does it work?
3. Is it useful?

Someone with a passing familiarity with x86 would also ask why we use popcnt instead of checking the parity flag after xor. Unfortunately, the parity flag only considers the least significant byte of the result (:

## One-bit hash functions: but why?

When implementing something like the hashing trick or count sketches (PDF), you need two sets of provably strong hash functions: one to pick the destination bucket, and another to decide whether to increment or decrement by the sketched value.

One-bit hash functions are ideal for the latter use case.

## How does that even work?

The bitwise operations in bit_hash implement a degenerate form of tabulation hashing. It considers the 64 bit input value x as a vector of 64 bits, and associates a two intermediate output values with each index. The naïve implementation would be something like the following.

“hash_slow.c”
 1 2 3 4 5 6 7 8 9 10 11  bool bit_hash_slow(uint64_t x, bool random_table[64][2]) { int acc = 0 for (size_t i = 0; i < 64; i++, x >>= 1) { acc ^= random_table[i][x & 1]; } return acc; } 

Of course, the representation of random_table is inefficient, and we should hand-roll a bitmap. However, the loop itself is a problem.

The trick is to notice that we can normalise the table so that the value for random_table[i][0] is always 0: in order to do so, we have to fix the initial value for acc to a random bit. That initial value is the hash value for 0, and the values in random_table[i][1] now encode whether a non-zero bit i in x flips the hash value or leaves it as is.

The table argument for bit_hash is simply the 64 bits in random_table[i][1], and bit is the hash value for 0. If bit i in table is 0, bit i is irrelevant to the hash. If bit i in table is 1, the hash flips when bit i in x is 1. Finally, the parity counts how many times the hash was flipped.

## Is it even useful?

I don’t think so. Whenever we need a hash bit, we also want a hash bucket; we might as well steal one bit from the latter wider hash. Worse, we usually want a few such bucket/bit pairs, so we could also compute a wider hash and carve out individual bits.

I only thought about this trick because I’ve been reading a few empirical evaluation of sketching techniques, and a few authors find it normal that computing a hash bit doubles the CPU time spent on hashing. It seems to me the right way to do this is to map columns/features to not-too-small integers (e.g., universal hashing to [0, n^2) if we have n features), and apply strong hashing to these integers. Hashing machine integers is fast, and we can always split strong hashes in multiple values.

In the end, this family of one-bit hash functions seems like a good solution to a problem no one should ever have. But it’s still a cute trick!

#### Christophe Rhodes — karatsuba multiplication in sbcl

· 114 days ago

Possible alternative title: I'm on a train!

In particular, I'm on the train heading to the European Lisp Symposium, and for the first time since December I don't have a criticially urgent piece of teaching to construct. (For the last term, I've been under the cosh of attempting to teach Algorithms & Data Structures to a class, having never learnt Algorithms & Data Structures formally, let along properly, myself).

I have been giving the students some structure to help them in their learning by constructing multiple-choice quizzes. "But multiple-choice quizzes are easy!", I hear you cry! Well, they might be in general, but these quizzes were designed to probe some understanding, and to help students recognize what they did not know; of the ten quizzes I ran this term, several had a period where the modal mark in the quiz was zero. (The students were allowed take the quizzes more than once; the idea behind that being that they can learn from their mistakes and improve their score; the score is correlated to a mark in some minor way to act as a tiny carrot-bite of motivation; this means I have to write lots of questions so that multiple attempts aren't merely an exercise in memory or screenshot navigation).

The last time I was on a train, a few weeks ago, I was travelling to and from Warwick to sing Haydn's Nelson Mass ("Missa in angustiis"; troubled times, indeed), and had to write a quiz on numbers. I'd already decided that I would show the students the clever Karatsuba trick for big integer multiplication, and I wanted to write some questions to see if they'd understood it, or at least could apply some of the results of it.

Standard multiplication as learnt in school is, fairly clearly, an Ω(d2) algorithm. My children learn to multiply using the "grid method", where: each digit value of the number is written out along the edges of a table; the cells of the table are the products of the digit values; and the result is found by adding the cells together. Something like:

       400     20      7
300 120000   6000   2100
90  36000   1800    630
3   1200     60     21
`

for 427×393 = 167811. Similar diagrammatic ways of multiplying (like [link]) duplicate this table structure, and traditional long multiplication or even the online multiplication trick where you can basically do everything in your head all multiply each digit of one of the multiplicands with each digit of the other.

But wait! This is an Algorithms & Data Structures class, so there must be some recursive way of decomposing the problem into smaller problems; divide-and-conquer is classic fodder for Computer Scientists. So, write a×b as (ahi×2k+alo)×(bhi×2k+blo), multiply out the brackets, and hi and lo and behold we have ahi×bhi×22k+(ahi×blo+alo×bhi)×2k+alo×blo, and we've turned our big multiplication into four multiplications of half the size, with some additional simpler work to combine the results, and big-O dear! that's still quadratic in the number of digits to multiply. Surely there is a better way?

Yes there is. Karatsuba multiplication is a better (asymptotically at least) divide-and-conquer algorithm. It gets its efficiency from a clever observation[1]: that middle term in the expansion is expensive, and in fact we can compute it more cheaply. We have to calculate chi=ahi×bhi and clo=alo×blo, there's no getting around that, but to get the cross term we can compute (ahi+alo)×(bhi+blo) and subtract off chi and clo: and that's then one multiply for the result of two. With that trick, Karatsuba multiplication lets us turn our big multiplication into three multiplications of half the size, and that eventaully boils down to an algorithm with complexity Θ(d1.58) or thereabouts. Hooray!

Some of the questions I was writing for the quiz were for the students to compute the complexity of variants of Karatsuba's trick: generalize the trick to cross-terms when the numbers are divided into thirds rather than halves, or quarters, and so on. You can multiply numbers by doing six multiplies of one-third the size, or ten multiplies of one-quarter the size, or... wait a minute! Those generalizations of Karatsuba's trick are worse, not better! That was completely counter to my intuition that a generalization of Karatsuba's trick should be asymptotically better, and that there was probably some sense in which the limit of doing divide-bigly-and-conquer-muchly would turn into an equivalent of FFT-based multiplication with Θ(d×log(d)) complexity. But this generalization was pulling me back towards Θ(d2)! What was I doing wrong?

Well what I was doing wrong was applying the wrong generalization. I don't feel too much shame; it appears that Karatsuba did the same. If you're Toom or Cook, you probably see straight away that the right generalization is not to be clever about how to calculate cross terms, but to be clever about how to multiply polynomials: treat the divided numbers as polynomials in 2k, and use the fact that you need one more value than the polynomial's degree to determine all its coefficients. This gets you a product in five multiplications of one-third the size, or seven multiplications of one-quarter, and this is much better and fit with my intuition as to what the result should be. (I decided that the students could do without being explicitly taught about all this).

Meanwhile, here I am on my train journey of relative freedom, and I thought it might be interesting to see whether and where there was any benefit to implement Karatsuba multiplication in SBCL. (This isn't a pedagogy blog post, or an Algorithms & Data Structures blog post, after all; I'm on my way to a Lisp conference!). I had a go, and have a half-baked implementation: enough to give an idea. It only works on positive bignums, and bails if the numbers aren't of similar enough sizes; on the other hand, it is substantially consier than it needs to be, and there's probably still some room for micro-optimization. The results?

The slopes on the built-in and Karatsuba mulitply (according to the linear model fit) are 1.85±0.04 and 1.52±0.1 respectively, so even million-(binary)-digit bignums aren't clearly in the asymptotic régimes for the two multiplication methods: but at that size (and even at substantially smaller sizes, though not quite yet at Juho's 1000 bits) my simple Karatsuba implementation is clearly faster than the built-in multiply. I should mention that at least part of the reason that I have even heard of Karatsuba multiplication is Raymond Toy's implementation of it in CMUCL.

Does anyone out there use SBCL for million-digit multiplications?

#### Christophe Rhodes — going to els2017

· 114 days ago

I'm going to the European Lisp Symposium this year! In fact, I have to catch a train in two and a half hours, so I should start thinking about packing.

I don't have a paper to present, or indeed any agenda beyond catching up with old and new friends and having a little bit of space to think about what might be fun to try. If you're there and you want to make suggestions, or just tell me what you're up to: I'd love to hear. (And if you're not there: bad luck, and see you another time!)

#### Eugene Zaikonnikov — Sifting through ImageNet dataset in Lisp

· 117 days ago

..there is no shortage of rainy evenings in the rain capital of the world, so I used a few of them to put together this small application that I called (perhaps overly ambitiously) cl-imagenet. It uses a bunch of Lisp libraries: opticl and cl-jpeg for image processing, cxml for extracting bounding boxes from the annotations, cl-fad for filesystem operations, and trivial-channels in combination with clx for streaming to display.

The code tries to detect how many cores the host machine has, then creates the corresponding number of worker units. The workset ImageNet subunits list is built up, which are then assigned to the workunits. Each workunit fetches annotation file, extracts the bounding boxes and image file reference, decodes the corresponding JPEG file, handles processing with OptiCL and sends the result via shared channel to display thread. It is impressive how compact the code can be when leveraging random bits of the ecosystem available through Quicklisp.

In this setup only the luminance component of JPEG is extracted and then thresholded from medium gray. The video is filmed on an old quad i5-2500. On my 8-core i7-6700 box with visualisation off, it averages some 200K processed images per hour.

Tested lightly with SBCL on Linux. One known problem place is the message channel gradually eating up memory with visualization on, but as it's only used in tests it hasn't been a pressing need to fix yet.

#### Zach Beane — Planet Lisp Archives - March, 2007

· 119 days ago
Planet Lisp Archives - March, 2007:

Ten years ago, headlines from Planet Lisp. (Many links are broken, but fewer than I expected. And I still have the article text saved in my Planet Lisp database...)

For older items, see the Planet Lisp Archives.

Last updated: 2017-07-25 19:13