Planet Lisp

Zach BeaneECLM 2013 - two days left to register

· 5 days ago

Important news:

Subject: ECLM 2013 - two days left to register
Date: Mon, 13 May 2013 19:17:26 +0200
From: Edi Weitz

Ladies and Gentlemen,

This is our last call - there are two days left to register for this year's ECLM in case you haven't done so already:

   http://weitz.de/eclm2013/

We have more than 50 registrations so far, but we wouldn't mind seeing another dozen or two...

Best regards,
Arthur & Edi.

I greatly enjoyed my last visit to ECLM and I can't recommend it highly enough if you want to meet people with interesting Common Lisp projects and ideas.

Dimitri Fontainefrom Parsing to Compiling

· 5 days ago

Last week came with two bank holidays in a row, and I took the opportunity to design a command language for pgloader. While doing that, I unexpectedly stumbled accross a very nice AHAH! moment, and I now want to share it with you, dear reader.

AHAH, you'll see!

The general approach I'm following code wise with that command language is to first get a code API to expose the capabilities of the system, then somehow plug the command language into that API thanks to a parser. It turns out that doing so in Common Lisp is really easy, and that you can get a compiler for free too, while at it. Let's see about that.

A very simple toy example

In this newsgroup article What is symbolic compoutation?, Pascal Bourguignon did propose a very simple piece of code:

(defparameter *additive-color-graph*
  '((red   (red white)   (green yellow) (blue magenta))
    (green (red yellow)  (green white)  (blue cyan))
    (blue  (red magenta) (green cyan)   (blue white))))

(defun symbolic-color-add (a b)
  (cadr (assoc a (cdr (assoc b *additive-color-graph*)))))

This is an example of symbolic computation, and we're going to build a little language to express the data and the code. Not that we would need to build one, mind you, more in order to have a really simple example leading us to the ahah moment you're now waiting for.

Before we dive into the main topic, you have to realize that the previous code example actually works: it's defining some data, using an implicit data structure composed by nesting lists together, and defines a function that knows how to sort out the data in that anonymous data structure so as to compound 2 colors together.

TOY-PARSER> (symbolic-color-add 'red 'green)
YELLOW

A command language and parser

I decided to go with the following language:

color red   +red white    +green yellow  +blue magenta
color green +red yellow   +green white   +blue cyan
color blue  +red magenta  +green cyan    +blue white

mix red and green

And here's how some of the parser looks like, using the esrap packrat lib:

(defrule color-name (and whitespaces (+ (alpha-char-p character)))
  (:destructure (ws name)
    (declare (ignore ws))               ; ignore whitespaces
    ;; CL symbols default to upper case.
    (intern (string-upcase (coerce name 'string)) :toy-parser)))

;;; parse string "+ red white"
(defrule color-mix (and whitespaces "+" color-name color-name)
  (:destructure (ws plus color-added color-obtained)
    (declare (ignore ws plus))          ; ignore whitespaces and keywords
    (list color-added color-obtained)))

;;; mix red and green
(defrule mix-two-colors (and kw-mix color-name kw-and color-name)
  (:destructure (mix c1 and c2)
    (declare (ignore mix and))          ; ignore keywords
    (list c1 c2)))

Those rules are not the whole parser, go have a look at the project on github if you want to see the whole code, it's called toy-parser over there. The main idea here is to show that when we parse a line from our little language, we produce the simplest possible structured data: in lisp that's symbols and lists.

The reason why it makes sense doing that is the next rule:

The one grammar rule to bind them all

(defrule program (and colors mix-two-colors)
  (:destructure (graph (c1 c2))
    `(lambda ()
       (let ((*additive-color-graph* ',graph))
         (symbolic-color-add ',c1 ',c2)))))

This rule is the complex one to bind them all. It's using a quasiquote, a basic lisp syntax element allowing the programmer to very easily produce data that looks exactly like code. Let's see how it goes with a very simple example:

TOY-PARSER> (pprint (parse 'program
                           "color red +green yellow mix green and red"))

(LAMBDA NIL
  (LET ((*ADDITIVE-COLOR-GRAPH* '((RED (GREEN YELLOW)))))
    (SYMBOLIC-COLOR-ADD 'RED 'GREEN)))

The parser is producing structure (nested) data that really looks like lisp code, right? So maybe we can just run that code...

What about a compiler now?

Here is my AHAH moment!

Let's see about actually running the code:

TOY-PARSER> (let* ((code "color red +green yellow mix green and red")
                   (program (parse 'program code)))
              (compile nil program))
#<Anonymous Function #x3020027CF0EF>
NIL
NIL
TOY-PARSER> (let* ((code "color red +green yellow mix green and red")
                   (program (parse 'program code)))
              (funcall (compile nil program)))
YELLOW

So we have a string reprensing code in our very little language, and a parser that knows how to produce a nested list of atoms that looks like lisp code. And as we have lisp, we can actually compile that code at run-time with the same compiler that we used to produce our parser, and we can then funcall that function we just built.

Oh and the function is actually compiled down to native code, of course:

TOY-PARSER> (let* ((code "color red +green yellow mix red and green")
                   (program (parse 'program code))
                   (func    (compile nil program)))
              (time (loop repeat 1000 do (funcall func))))

(LOOP REPEAT 1000 DO (FUNCALL FUNC))
took 108 microseconds (0.000108 seconds) to run.
During that period, and with 4 available CPU cores,
     105 microseconds (0.000105 seconds) were spent in user mode
      13 microseconds (0.000013 seconds) were spent in system mode
NIL

Yeah, it took the whole of 108 microseconds to actually run the code generated by our own parser a thousand times, on my laptop. I can believe it's been compiled to native code, that seems like the right ballpark.

Conclusion

The toy-parser code is there on GitHub and you can actually load it using Quicklisp: clone the repository in ~/quicklisp/local-projects/ then (ql:quickload "toy-parser"), and play with it in (in-package :toy-parser).

The only thing I still want to say here is this: can your programming language of choice make it that easy?

Brit ButlerColeslaw 0.9.2 and other lispiness

· 7 days ago

Coleslaw 0.9.2

It still amuses me that my most successful project to date is a blog engine. Not that I'm complaining about having contributors. When I last mentioned it, version 0.8 had just been released. Since then there have been 2 new contributors and a bunch of new features. I think the code has mostly improved in cleanliness.

The biggest changes are new shiny docs, a new tags implementation, cleanups to theming, and plugins for Google Analytics, Github Pages, and Sitemap Generation. For the full details, see the changelog.

My plans for 1.0 are primarily to take advantage of the extensible content types added in 0.8 and add some sort of tumblr-like auto-embedding support. But I probably won't get around to working on that for a spell. Why?

Other Lispiness

Because my lisp emulation experiment/art project is ongoing. Nyef was kind enough to share some code he'd hacked up for NES emulation years ago and it helped give me the motivation to rewrite famiclom's PPU (Graphics Card). The former code was largely cribbed from Patrick Walton's sprocketnes and I didn't understand it very well. I've hit the nesdev wiki again and am getting more comfortable with the PPU's actual workings. The code is on github in the new-ppu branch and I'm hoping to spend more time on it this coming week.

I also spent the last week porting cl-6502 to clojurescript for giggles. Heresy, I know. ;)
cljs-6502 is in a basic working state but there are bugs aplenty and I haven't implemented the assembler or disassembler. The must frustrating part was dealing with A) differences in macro hygiene and B) poor macro debugging facilities.

The browser is a fun target though. I'll have to try parenscript or ... jscl! JSCL is a full CL->JS compiler in early development, which I contributed a tiny patch to for fboundp. It's a great project and if you have any interest in helping implement a lisp, I'd encourage you to get involved. The maintainers are very approachable and there's plenty of fun hacking to be had.

All for now. It's time to play around trying static analysis of Nintendo ROMs with Lisp. I'm bound to learn something...hopefully.

Patrick SteinTrack-Best Library Released

· 9 days ago

In doing a problem set from the Internet a few weeks ago, I found myself writing awkward constructions with REDUCE and LOOP to try to find the best one (or two or three) things in a big bag of things based on various criteria: similarity to English, hamming distance from their neighbor, etc.

I wrote a macro that encapsulated the pattern. I’ve reworked that macro into a library for public use.

Acquiring

Using

There are a variety of examples in the README and the tests directory.

Here is one example to pique your interest. Suppose you have some data about the elevations of various cities in various states and you (being a Moxy Fruvous fan) want to know What Is the Lowest Highest Point? Here’s how you might tackle that with the TRACK-BEST library:

(let ((data '(("Alabama"  ("Birmingham" 664)
                          ("Mobile" 218)
                          ("Montegomery" 221))
              ("Alaska"   ("Anchorage" 144)
                          ("Fairbanks" 531))
              ("Arizona"  ("Grand Canyon" 6606)
                          ("Phoenix" 1132)
                          ("Tuscon"  2641)))))
  (with-track-best (:order-by-fn #'<)
    (dolist (state-info data)
      (multiple-value-bind (city altitude)
          (with-track-best ()
            (dolist (city-info (rest state-info))
              (track (first city-info) (second city-info))))
        (track (list (first state-info) city) altitude)))))

With this limited dataset, the end result would be (VALUES '("Alaska" "Fairbanks") 531). The inner WITH-TRACK-BEST finds the highest city in each state. The outer WITH-TRACK-BEST finds the lowest of these.

Zach BeaneLisp gatherings

· 10 days ago

I'm very excited to be attending both ECLM and ELS in Madrid this year. This is the last week to register for ECLM, so if you want to go, you should register ASAP.

Liam Healycl-plplot for simple x-y plots from Lisp

· 13 days ago
I had need recently to create quick x-y plots on the screen. I have been generating TikZ and PGFplots from Lisp for journal and conference papers and presentations. They look very nice but are not quick, either in writing or displaying. There are a number of CL packages listed for plotting, and I gave cl-plplot a try. It is in quicklisp. Though there is no active work on it, I was able to generate plots that I wanted quickly.

Here is an example, plotting the square function. First load needed libraries
sudo aptitude install libplplot-dev plplot11-driver-cairo plplot11-driver-xwin

(in Ubuntu 12.04LTS), then in Lisp
(ql:quickload :cl-plplot)
(defun x-y-plot ()
       (let* ((x (iter (for x from 0.0 to 10.0 by 0.25) (collect x result-type vector)))
              (y (map 'vector (alexandria:rcurry 'expt 2) x))
              (p (cl-plplot:new-x-y-plot x y))
              (w (cl-plplot:basic-window :x-label "x" :y-label "x squared" :title "The square function")))
         (cl-plplot:add-plot-to-window w p)
         (cl-plplot:render w "xwin")))
(x-y-plot)

Click on window and Enter to close; clicking on the close box for the window causes Lisp to crash, at least from Slime.

Patrick SteinBuilding the language up to you

· 13 days ago

One of the things that’s often said about Lisp is that it allows you to build up the language to you. Here are two examples that really drive home that point, for me.

The Problem Domain

I read an article about Treaps on reddit yesterday. The article used pretty direct pylisp to present Treaps.

I thought it would be fun to go through the exercise in Common Lisp instead. Pylisp made a number of things awkward that would melt away in Common Lisp.

I began by defining a node structure to encapsulate the information at a given node:

(defstruct node
  (priority (random 1.0d0) :type real :read-only t)
  (key 0 :read-only t)
  (value 0 :read-only t)
  (left nil :type (or null node) :read-only t)
  (right nil :type (or null node) :read-only t))

Then, I defined a top-level structure to hold the root node, track the function used to sort the tree, and track the function used to create a key from a value. I used the convention that two keys are equivalent if neither is less than the other.

(defstruct treap
  (root nil :type (or null node) :read-only t)
  (less-than #'< :read-only t)
  (key #'identity :read-only t))

The First Build Up

The article was using a functional style. As you may have guessed by the liberal use of :read-only t in those DEFSTRUCT clauses, I also used a functional style.

When working with functional data structures, one often needs to copy a whole structure with only slight modifications. Here, Lisp’s keyword arguments made everything simple and clean. I made this functions:

(defun copy-node (node &key (priority (node-priority node))
                            (key (node-key node))
                            (value (node-value node))
                            (left (node-left node))
                            (right (node-right node)))
  (make-node :priority priority
             :key key
             :value value
             :left left
             :right right))

Now, for any node, I could copy it and only have to specify the fields that I wanted to change. Here is a left-rotation using this:

(defun left-rotate (node)
  (let ((left (node-left node)))
    (copy-node left
               :right (copy-node node
                                 :left (node-right left)))))

What other languages let you do anything like that so simply? You could pull it off in Perl if you were willing to have your node be a hash (which, admittedly, you probably are if you’re writing Perl). What other language lets you do anything like that? Even other languages with named arguments don’t let you have the defaults based on other arguments.

The Second Build Up

As with any binary tree, you find yourself having to deal with four specific cases over and over again with Treaps.

I wrote myself a TREAP-CASES macro that streamlines all of these checks. It also validates to make sure you don’t have duplicate cases or cases other than these four. It makes sure that no matter which order you write the cases (or even if you leave some out), they end up organized in the order listed above. The four cases are mutually exclusive so the order doesn’t matter except that you want to make sure you haven’t run out of tree before doing comparisons and that once you’re beyond the less-than and greater-than cases you already know you’re in the equivalence case.

With this macro, my TREAP-FIND function looks like this:

(defun treap-find (key treap)
  (check-type treap treap)
  (labels ((treap-node-find (root)
             (treap-cases (key root treap)
               (null (values nil nil))
               (< (treap-node-find (node-left root)))
               (> (treap-node-find (node-right root)))
               (= (values (node-value root) t)))))
    (treap-node-find (treap-root treap))))

The little DSL makes writing and reading the code so much easier. All of the mess of comparing the key to the node’s key is hidden away.

With years of practice and days of debugging, you might be able to pull off some quasi-readable control construct like this using C++ templates. With enough therapy, you could convince yourself you can get a close-enough effect with C-preprocessor macros. In Lisp, it’s the work of minutes (without lying to yourself).

Here is the TREAP-CASES macro for reference/completeness.

(defmacro treap-cases ((key root treap) &rest clauses)
  (validate-treap-case-clauses clauses)
  (let ((k (gensym "KEY-"))
        (tr (gensym "TREAP-"))
        (r (gensym "ROOT-"))
        (t< (gensym "<-")))
    `(let* ((,k ,key)
            (,tr ,treap)
            (,r ,root)
            (,t< (treap-less-than ,tr)))
       (cond
        ((null ,r) ,@(rest (assoc 'null clauses)))
        ((funcall ,t< ,k (node-key ,r)) ,@(rest (assoc '< clauses)))
        ((funcall ,t< (node-key ,r) ,k) ,@(rest (assoc '> clauses)))
        (t ,@(rest (assoc '= clauses)))))))

I suppose I should also include #'VALIDATE-TREAP-CASE-CLAUSES, too, but it’s what you’d expect:

(defun validate-treap-case-clauses (clauses)
  (let ((all-choices '(null < > =)))
    (flet ((assert-all-choices-valid ()
             (dolist (c clauses)
               (unless (member (first c) all-choices)
                 (error "Unrecognized clause type: ~S" (first c)))))
           (assert-no-duplicates ()
             (dolist (c all-choices)
               (unless (<= (count c clauses :key #'first) 1)
                 (error "Duplicate ~S clause not allowed." c)))))
      (assert-all-choices-valid)
      (assert-no-duplicates))))

Nick LevineRavenbrook has recovered and open-sourced MLWorks

· 15 days ago
MLWorks is an "industrial strength" Standard ML compiler and integrated development environment, developed by Harlequin in the 1990s.

Ravenbrook Limited (whose directors were members of the original MLWorks team) acquired the rights to MLWorks on 2013-04-26 and have open sourced the project. Source code is under the BSD license on GitHub.

David VázquezThe surprising JSCL rising

· 17 days ago

I am just back from a few days of holidays and have seen how active JSCL is! It is really exciting.

This rising is due to abeaumont probably. I stopped working, but he came up with an idea I liked. It is, why don't integrate JSCL into Conkeror? We would end up with a Lisp programmable browser… that's a practical reason! We did some tests locally and it seemed to work, but now it is time for improving JSCL itself.

It is not the only platform where JSCL could run on. I plan to add support for Node JS. Indeed, it will let us use the REPL and tests in the terminal in addition to the browser, which is a much more convenient way to program.

In the meantime, I would like to improve the compiler a little bit.

Suggestion and patches are very welcome if you want to join us.

https://github.com/davazp/jscl

Vsevolod DyomkinErrors in your Platform

· 19 days ago

One of the always trending topics in our pop-culture-like computer industry is dealing with errors. There are whole movements addressing it. Yet, in my humble opinion, these movements often see errors in a rather narrow, I would even say, simplistic manner: like the proverbial example of passing a string to a function which expects integers. But, in fact, the scope of errors that can occur in a program is indefinite. In a senses, the language of errors is Turing complete, and creating a program can be viewed from a certain point as devising a way of working around all relevant errors.

Today I'd like to talk about a particularly nasty type of errors - those that occur in the platform on which we're building the program. And the reality is such, that we base our programs on at least several layers of platforms: a hardware platform, an OS and its standard library, a language's VM and the language's standard library, some user-level framework(s) - to name a few. There are several aspects of this picture - some positive for us and some negative.

There's a revealing rant by node.js creator Ryan Dahl on the topic of such platforms - now removed, but available in the all-remembering Internet - "I hate almost all software".

The platform errors are so nasty, because of two things:

And the thing is, there are severe bugs in all of the platforms we use. The Pentium FDIV bug is one acute example, but let's talk Java here, because there's so many people bragging how rock solid the JVM is. One of the problems of the JVM is jar hell - the issue that haunts almost every platform under different names (like DLL hell). We had an encounter with it recently, when after an update our server started hanging for no apparent reason. After an investigation we've found that part of our code was compiled with a more recent version of Gson library for parsing JSON, while the main server used the older version. The interesting thing is that there was no error condition signalled. I wonder, how many projects do the same mistake and don't even know what's the source of their problems. We had a similar but more visible issue with different versions of the servlet-api, but that time they were caught at compile-time because of the minor API inconsistency.

The other problem we'd fought fruitlessly and had to eventually work around with external tools was the JVM not properly closing network sockets. Yes, it leaks sockets in some scenarios and we couldn't find a way to prevent it. This is an example of another nasty platform error - the one which happens at layer boundaries. Also, many java libraries, like the ubiquitous jetty webserver leak memory in their default setting. Actually, I wonder, what percentage of Java libraries doesn't leak memory by default?.. This is no way to say, that JVM is exceptional in this regard - other platforms suffer from the same problems. At the same time to JVM's credit it has one of the tools to fight platform errors that many other platforms lack.

So, what are the ways to deal with those errors? Unfortunately, I don't have a lot of answers here, but I hope to discover some with this article. Here are some of the suggestions:

First of all, don't trust the platform.
Try to check everything and cross-validate your assumptions with external tools. This is much harder to do than say, and there's a tradeoff to make somewhere between slackness and paranoia. But if you think of it beforehand, you can see ways to set-up the system so that it would be easier to deal with the problem when it's necessary.
Have the platform's source code at hand.
This is one dimension of traceability - the static one. It won't help you easily find all the problems, but without it you're just at the mercy of the platform's creator. And I'd not trust those guys (see p.1)
Do proper error-handling.
There's a plethora of information on that, and I'd not repeat it here. See the error handling manifesto for one of the good examples.

But the biggest upside is not in fighting with the existing system, but in removing or diminishing the source of the problems - i.e. creating more programmer-friendly platforms. So the most important recommendations I see are for platform authors (and, in fact, most of us become them to some extent when we release a utility library or a framework).

Build platforms with traceability in mind.
One example of such traceability is JVM's VisualVM which is a great tool to get visibility into what's happening in the platform. Another example is Lisp images the state of which you can inspect interactively with Lisp commands. For instance, Lisp has built-in operators TRACE that allow to see the invocations of any interesting function, and INSPECT which is an interactive object browser. These things are really hard to add after-the-fact: look, for example, at the trouble of bringing DTrace into the Linux world.
Create homogenous platforms.
Why? Because in homogenous platforms the boundaries are not so sharp, and so there tend to be fewer bugs on the boundaries. And it's much easier to build in traceability: for example you don't need to deal with multiple stacks, multiple object representation etc. Once again, Lisp comes up here, because with its flexibility it allows to build systems of different levels of abstraction in the same language. And that was exemplified by the Lisp Machines - see the Kalman Reti's talk describing their inner workings.
Support component versioning at core.
This is, actually, one of the topics not addressed well by any mainstream platform (except, maybe, .Net), as far as I know: how to allow several versions of the same component to live together in one program image.

How can static typing help with these types of errors? I don't know. What about the Erlang's failure isolation. Once again, it's pretty orthogonal to the mentioned issues: if your Erlang library leaks memory, the whole machine will suffer - i.e. you can't isolate errors which touch the underlying layers of your platform which you don't fully control...

Please, share your approaches and stories about dealing with platform errors.

Zach Beanejscl rising

· 20 days ago
I am impressed with the web-based jscl repl. Most Lisp web repls are for toy Lisps. I was so charmed that the first form I tried in JSCL, (make-package "FOO"), worked that I didn't mind that a lot of other stuff hasn't been implemented yet. I want to help hack on it to make it better!

A bunch of people have forked it on github to add more and more functionality, and I think jscl will just keep getting cooler. Good job, David Vázquez and contributors!

Zach BeaneQuick and dirty progress tracking

· 24 days ago

Today I was running some analysis on about 9,000 files, basically mapping a function over each to warm up a cache. Something like this:

* (map nil 'analyze-file *9000-files*)
time passes

I had no idea how well it was progressing, and whether I'd need to take a snack break or let it run overnight. So I interrupted it and wrote a quick and dirty REPL utility function:

(defun :dot-every (count fun)
  (let ((i 0))
    (lambda (&rest args)
      (when (< count (incf i))
        (setf i 0)
        (write-char #\. *trace-output*)
        (force-output *trace-output*))
      (apply fun args))))

It prints out one dot per COUNT invocations of the function it returns, giving some indication of progress. Sensible values for COUNT depend on the volume of function calls.

For this problem, I called it with a COUNT of 100:

* (map nil (:dot-every 100 'analyze-file) *9000-files*)
............etc

The cached analyses printed out a ton of dots quickly, and the uncached analyses started printing dots at a slow but steady pace, and I could tell that it would be done in a few minutes instead of a few hours.

So now I'm going to use this to wrap up any function I have to call a ton of times and I want to get a sense of how it's progressing.

Christophe Rhodes24 Apr 2013

· 24 days ago
The student application period for Google's Summer of Code is now open. If you are a student interested in applying for an SBCL-related project, great! There's an application template at the SBCL organization page: at the very least any application must include all the requested information.

When applying, it's worth realizing that both SBCL mentors and Google SoC administrators have a say in whether students are accepted onto the programme. I can't really say anything about what SoC administrators are looking for, but SBCL mentors are likely to be positively disposed towards applications which demonstrate:


There is still time for any prospective applicants to work to provide the basis for these demonstrations: applications to the SoC programme close on Friday 3rd May.

Members of some of the other, longer-participating projects in Summer of Code have also posted on the subject of student applications; they are perhaps worth reading for any prospective applicants. I've seen a blog post from a Debian mentor and the Clojure guidelines; there are probably others, saying similar things: communicate, demonstrate, self-motivate.

Hans HübnerBerlin Lispers Meetup: Tuesday April 30th 2013, 8pm

· 25 days ago

You are kindly invited to the next "Berlin Lispers Meetup", an informal gathering for anyone interested in Lisp, beer or coffee:

Berlin Lispers Meetup
Tuesday, April 30th, 2013
8 pm onwards
St Oberholz, Rosenthaler Straße 72, 10119 Berlin
U-Bahn Rosenthaler Platz
We will try to occupy a nice table on the first floor, but in case you don't see us please contact Hans: 0177 512 1024.

Please join for another evening of parentheses!

Patrick SteinTwo snippets that I rewrite every three or four months...

· 25 days ago

A friend of mine is taking a cryptography course this semester. One of his current homework problems involves creating n, e, and d values for RSA Encryption. Always one to play the home game when math is involved, I found myself rewriting two snippets of code that I always end up writing any time I do any number-theory related coding.

One of them is simple EXPONENT-MODULO that takes a base b, an exponent e, and a modulus m and returns b^e \pmod m. There are more efficient ways to do this, but this one is much faster than the naïve method and quite easy to rewrite every time that I need it. The idea is that when e is odd, you recursively calculate b^{e-1} \pmod m and return b \cdot b^{e-1} \pmod m. When e is even, you recursively calculate a = b^{e/2} \pmod m and return a^2 \pmod m.

(defun expt-mod (b e m)
  (cond
    ((zerop e) 1)
    ((evenp e) (let ((a (expt-mod b (/ e 2) m)))
                 (mod (* a a) m)))
    (t (mod (* b (expt-mod b (1- e) m)) m))))

The other snippet of code is a good deal trickier. Every time I need it, I spend way more time than I want to futzing with examples, getting lost in subscripts, etc. So, while I could rewrite the above in a moment, what’s below, I hope I have the search-fu to find this post next time I need it.

If you have a number n and some number a which is relatively prime to (shares no factors except for one with) n, then there exists some number b such that a \cdot b \equiv 1 \pmod n. And, in fact, all such numbers b for a given a differ by a multiple of n, so there is a unique such b \in [1,n).

Another way of saying a and n are relatively prime is to say that their greatest common divisor \gcd(a,n) is one. Besides being the biggest number that divides into both n and a, the \gcd(a,n) is also the smallest positive number expressible as x \cdot a + y \cdot n for integers x and y. If you keep track of the steps you use when calculating the greatest common divisor (which in this case is 1) while using the Euclidean algorithm you can (with enough care) backtrack through the steps and determine x and y for your given a and n.

For example, suppose you had n = 40 and a = 31. You want to know what b you’d need for 31b \equiv 1 \pmod{40}.
The steps in the Euclidean algorithm show:

\begin{array}{rcl}40 & = & 31 \cdot 1 + 9 \\31 & = & 9 \cdot 3 + 4 \\9 & = & 4 \cdot 2 + 1\end{array}

Working backwards, you can express 1 as a combination of x_0 \cdot 4 + y_0 \cdot 9. Then, you can express 4 as a combination of x_1 \cdot 9 + y_1 \cdot 31 and so on:

\begin{array}{rcl}1 & = & (-2)\cdot4 + 9\cdot{1} \\  & = & (-2)\cdot\left((-3)\cdot9 + 31\right) + 9\cdot{1} \\  & = & (-2)\cdot31 + 7\cdot9 \\  & = & (-2)\cdot31 + 7\cdot(40 - 31\cdot1) \\  & = & (-9)\cdot31 + 7\cdot(40) \equiv 1 \pmod{40}\end{array}

Wheee… in my example chosen to be easy to backtrack, the answer comes out to be -9 which is equal to 31 modulo 40. So, 31^2 \equiv 1 \pmod{40}. I picked a lucky number that is its own inverse modulo 40. It’s not usually like that. For example, 3 \cdot 27 \equiv 7 \cdot 23 \equiv 1 \pmod{40}.

So, anyhow, here’s a function that takes a and n and returns two values x and y so that x\cdot{}a + y\cdot{}n \equiv \gcd(a,n) \pmod n.

(defun gcd-as-linear-combination (a n)
  (multiple-value-bind (q r) (truncate n a)
    (if (zerop r)
        (values 1 0)
        (multiple-value-bind (x y) (gcd-as-linear-combination r a)
          (values (- y (* q x)) x)))))

Assuming that a and n are relatively prime with < n" />< n" style="vertical-align:-20%;" class="tex" alt="a < n" />, the first value returned is a‘s inverse modulo n.

Didier VernaLisp, GSoC and CDRs

· 26 days ago

Here's a bright idea: why not take the GSoC opportunity to have someone implement all current CDRs in every major Common Lisp implementation? Not that I'm volunteering for anything. I'm justing throwing the though out in the open...

Quicklisp newsWhat's new in Quicklisp this month?

· 26 days ago
Lisp hero Ben Hyde has done me a big favor and summarized what the new projects in Quicklisp this month are all about.

Didier VernaELS 2013 registration now open

· 27 days ago

Hello,

just a quick note to let you know that we have finally opened the registration process for ELS 2013. See the bottom the page for more information. It is possible to register via PayPal or via direct bank transfer.

Looking forward to see you all !

Quicklisp newsApril 2013 Quicklisp dist update now available

· 28 days ago

New projects: cl-arff-parser, cl-bayesnet, cl-libpuzzle, cl-one-time-passwords, cl-rrt, cl-secure-read, function-cache, gendl, sha3, trivial-raw-io, yaclanapht.

Updated projects: access, asdf-finalizers, binary-types, binomial-heap, bk-tree, bknr-datastore, buildnode, caramel, cffi, chunga, city-hash, cl+ssl, cl-6502, cl-async, cl-async-future, cl-base32, cl-cairo2, cl-closure-template, cl-csv, cl-decimals, cl-erlang-term, cl-factoring, cl-general-accumulator, cl-gss, cl-i18n, cl-inotify, cl-l10n, cl-libevent2, cl-libxml2, cl-murmurhash, cl-mustache, cl-oauth, cl-pdf, cl-ppcre, cl-protobufs, cl-random, cl-sanitize, cl-seek-project, cl-slice, cl-store, cl-typesetting, clack, cleric, clfswm, closer-mop, clpython, clsql, clsql-helper, codata-recommended-values, coleslaw, collectors, com.google.base, common-lisp-stat, commonqt, css-selectors, doplus, drakma, elf, esrap, f2cl, gbbopen, green-threads, gtk-cffi, ht-simple-ajax, hu.dwim.computed-class, hunchentoot, hyperobject, inotify, kmrcl, lfarm, lift, lisp-interface-library, lisp-unit, lla, log4cl, lparallel, mgl, more-conditions, optima, patron, periods, pileup, plokami, portableaserve, postmodern, rcl, repl-utilities, restas, sb-fastcgi, single-threaded-ccl, sip-hash, slime, temporary-file, trivial-gray-streams, trivial-ldap, twfy, uiop, umlisp, weblocks, xml.location.

Removed projects: genworks-gdl (renamed to "gendl"), printf (depends on smug), pubmed (repo gone), smug (withdrawn per maintainer request).

Stelian IonescuA legal analysis of the LLGPL

· 31 days ago

Eli Greenbaum's analysis here is the first legal commentary on the LLGPL I've encountered.

⚓ Permalink

Paul KhuongStarting to hack on SBCL

· 35 days ago

SBCL was accepted as a mentoring organisation for Google’s Summer of Code 2013 (our list of project suggestion is here). This will be our first time, so that’s really great news. I’m also extremely surprised by the number of people who’ve expressed interest in working with us. I was going to reply to a bunch of emails individually, but I figure I should also centralise some of the stuff here.

EDIT: There’s a section with a bunch of general references.

EDIT 2: Added a note on genesis when playing with core formats.

Getting started

Setting up the basic tools

The first step is probably to install git, and clone our repo (the github mirror works well, and lets you fork to your own github account for easy publication). Then, building from source and installing SBCL (a local installation to $HOME works fine) is obviously useful experience, and will be useful to explore the source. Reading INSTALL should be enough to get started on Linux or OS X and x86[-64]. Other platforms may need more work, and might not be the best choice if you’re not interested in improving the port itself (although I’m told FreeBSD and Solaris work very well on x86[-64]). To build SBCL from source, you’ll need an SBCL binary (bootstrapping from other CLs should work, but support regularly bitrots away), and the usual C development tools (e.g. build-essential on debian). A fancy build (./make.sh --fancy) is probably the best choice for development.

You’ll also want to run the test suite often; better try it out now (cd tests; sh run-tests.sh) to make sure you can get it working. The test suite will barf if there’s non-ASCII characters in the environment. Oh my Zsh’s git customisation systematically trips me up, for example (I currently kludge it and start a bash from ~ and then run the tests).

Once SBCL HEAD is working and installed, it’s probably best to install emacs and SLIME. Quicklisp’s quicklisp-slime-helper can take care of installing SLIME. It is possible to work on SBCL without SLIME. However, SLIME has a lot of useful extensions, if only to explore the code base. If you’re not (and don’t wish to become) comfortable with emacs, it’s probably best to nevertheless use emacs and SLIME for the REPL, debugger, inspector, etc. and write code in your favourite editor. Later on, it’ll be useful to figure out how to make SLIME connect to a freshly-built SBCL.

Exploring the source

I often see newcomers try to read the source like a book, and, once they realise there’s a lot of code, try to figure out a good order to read the source. I don’t think that’s the best approach. SBCL is pretty huge, and I doubt anyone ever simultaneously holds the complete system in their head. RAM’s “The Python Compiler for CMU Common Lisp” is still useful as an overview, and SBCL’s internals manual is a good supplement. Once you get close to bootstrapping logic, Christophe Rhodes’s “SBCL: a Sanely-Bootstrappable Common Lisp” helps understand the exclamation marks. Past that, I believe it’s preferrable to start out small, learn just enough to get the current task done, and accept that some things just work, without asking how (for now).

In that spirit, I’d say M-. (Alt period, Command period on some OS X emacsen) is the best way to explore most of SBCL’s source. SBCL’s build process preserves a lot of source location information, and M-. queries that information to jump to the definitions for any given symbol (M-, will pop back up to the previous location). For example, if you type “(truncate” at the REPL and hit M-. (with the point on or just after “truncate”), you’ll find the out of line definition for truncate in (mostly) regular Common Lisp, optimisation rules regarding truncate, and VOPs, assembly language templates, for truncate called with a few sets of argument and return types. The out of line definition isn’t that interesting. The transforms, however, are. (VOPs aren’t useful if one isn’t comfortable with the platform’s assembly language, and mostly self-explanatory otherwise.)

The one to “convert integer division to multiplication” is a very good example. One could M-. on deftransform, and go down a very long chain of definitions. Instead, I think it’s only essential to see that the form defines a new rule, like a compiler macro, such that compile-time values (lvars) that represent its two arguments are bound to x and y, and the rule only triggers if its first argument is known to be an unsigned word, and its second a constant unsigned word. If that’s satisfied, the transformation still only triggers if the speed optimisation quality is higher than both compilation speed and space (code size).

Then, the constant value for y is extracted and bound to y, and a conservative bound on the maximum value that x can take at runtime is computed. If truncate by y should be handled elsewhere, the transform gives up. Otherwise, it returns a form that will be wrapped in (lambda (x y) ...) and spliced in the call, instead of truncate.

To extend SBCL’s support for division by constants, it’s not necessary to understand more of SBCL’s compiler than the above. There’s no need to try and understand how deftransform works, only that it defines a rule to simplify calls to truncate. Similarly for lvar-value and lvar-type: the former extracts the value for constant lvars, and the latter the static type derived for that lvar (value at a program point). With time, knowledge will slowly accrete. However it’s possible, if not best, to start hacking without understanding the whole system. This approach will lead to a bit of cargo culting, but mentors and people on IRC will help make sure it doesn’t do any harm, and can explain more stuff if it’s interesting or à propos.

Finding where the compiler lives

Working on the compiler itself is a bit more work. I think the best approach is to go in src/compiler/main.lisp and look for compile-component. ir1-phases loops on a component and performs high-level optimisations until fixpoint (or we get tired of waiting), while %compile-component handles the conversion to IR2 and then to machine code. The compilation pipeline hasn’t really changed since the Python paper was written, and the subphases each have their own function (and file). M-. on stuff that sounds interesting is probably the best approach at the IR2 level.

Runtime stuff

The C and assembly runtime lives in src/runtime/. There’s a lot of stuff that’s symlinked or generated during the build, so it’s probably best to look at it after a successful build. Sadly, we don’t track source locations there, but {c,e,whatever}tags works; so does grep.

GC stuff is in the obvious suspects (gc-common, gencgc, gc, etc.), but may end up affecting core loading/saving (core, coreparse, save). Depending on what in the core loading code is affected, code in genesis (the initial bootstrap that reads fasl files from the cross compiler and builds the initial core file) might also have to be modified (mostly in src/compiler/generic/genesis.lisp). That’s... more work. Like the project suggestions list says, when we change things in the runtime, it sometimes ends up affecting a lot of other components.

GDB tends to be less than useful, because of the many tricks SBCL plays on itself. It’s usually hard to beat pen, paper, and printf. At least, rebuilding the C runtime is quick: if the feature :sb-after-xc-core is enabled (which already happens for --fancy builds), slam.sh should be able to rebuild only the C runtime, and then continue the bootstrap with the rest of SBCL from the previous build. That mostly leaves PCL to build, so the whole thing should takes less than a minute on a decent machine.

Some references

I was replying to an email when I realised that some general compiler references would be useful, in addition to project- and SBCL- specific tips.

Christian Queinnec’s Lisp in Small Pieces gives a good overview of issues regarding compiling Lisp-like languages. Andrew Appel’s Modern Compiler Implementation in ML is more, well, modern (I hear the versions in C and Java have the same text, but the code isn’t as nice... and ML is a very nice language for writing compilers). I also remember liking Appel’s Compiling with Continuations, but I don’t know if it’s particularly useful for CL or the projects we suggest.

For more complicated stuff, I believe Stephen Muchnick’s Advanced Compiler Design and Implementation would have been really nice to have, instead of slogging through code and dozens of papers. Allen & Kennedy’s Optimizing Compilers for Modern Architectures: A Dependence-based Approach is another really good read, but I’m not sure how useful it would be when working on SBCL: we still have a lot of work to do before reaching for the really sophisticated stuff (and what sophistication there is is fairly non-standard).

I believe the Rabbit and Orbit compilers have influenced the design of CMUCL and SBCL. The Lambda papers provide some historical perspective, and the RABBIT and ORBIT theses are linked here.

What little magic remains in SBCL and CMUCL is the type derivation (constraint propagation) pass, and how it’s used to exploit a repository of source-to-source transformations (deftransforms). The rest is bog-standard tech from the 70s or 80s. When trying to understand SBCL’s type derivation pass at a very high level, I remember finding Henry Baker’s The Nimble Type Inferencer for Common Lisp-84 very useful, even though it describes a scheme that doesn’t quite work for Common Lisp (it’s very hard to propagate information backward while respecting the final standard). Kaplan and Ullman’s A Scheme for the Automatic Inference of Variable Types was also helpful.

Getting help

Over the years, I’ve seen a couple of people come in with great ambition, and give up after some time, seemingly without having made any progress. I believe a large part of the problem is that they tried to understand all of SBCL instead of just learning the bare minimum to get hacking, and that their goal was too big. I already wrote that SBCL is probably best approached bit by bit, with some guidance from people who’ve been there before, and I hope the projects we suggest can all lead to visible progress quickly, after a couple days or two weeks of work at most.

Still, before investing my time, I like to see the other person also give some of theirs to SBCL. This is why, as I wrote on the mailing list last week, I’m much more inclined to help someone who’s already built SBCL on their own and has submitted a patch that’s been committed or is being improved on the mailing list. I absolutely do not care what the patch is; it can be new code, a bugfix for a highly unlikely corner case, better documentation, or spelling and grammar corrections in comments. The bugs tagged as easy in our bugtracker may provide some inspiration. However trivial a patch might seem, it’s still a sign that someone is willing to put the work in to concretely make SBCL better, and I like that... it’s also a minimal test to make sure the person is able to work with our toolchain. (This isn’t SBCL policy for GSoC. It’s simply how I feel about these things.)

Again, I’m amazed by the number of people who wish to hack on SBCL this summer (as part of Google’s Summer of Code or otherwise). Because of that, I think it’s important to note that this is our first year, and so we’ll likely not have more than two or three spots. However, I always like seeing more contributors, and I hope anyone who’d like to contribute will always be guided, GSoC or not.

Finally, I’ll note that Google’s Summer of Code program was only a good excuse to write up our list of projects: they’re simply suggestions to incite programmers to see what they can do that is useful for SBCL and, most importantly, is interesting for them. Anyone should feel welcome to work on any of these projects, even if they’re not eligible or chosen for GSoC. They’re also only suggestions; if someone has their own idea, we can likely help them out just the same.

Christophe Rhodes12 Apr 2013

· 36 days ago
In case a single cryptic twitter post isn't clear enough: SBCL got accepted as a mentoring organization to Google Summer of Code for 2013. Our hastily-constructed organization profile page is available, and anyone who is interested in mentoring students on SBCL-related projects (and has a few hours to spare a week over the summer months) is welcome to sign up to the site and request mentor status.

We also have an indicative list of potential projects, mostly constructed with extreme speed just before the application deadline by Paul Khuong. I say "indicative" because I think the most important factor in successful student projects is intrinsic motivation, which must (of course) come from the student themselves. So I'd suggest that any potential student applicant should use those ideas as a guide, rather than a finished application: feel free to mix and match goals.

I think it's also worth emphasizing that it's unlikely that SBCL will be allocated more than a handful of student slots: the guidelines do say that organizations new to the programme are likely to be given relatively few to see how they cope; although I've mentored for the program before it's not clear whether that experience, with LispNYC, will carry forward into this new organizational setup. So students keen to work on SBCL would be well advised to engage early, and demonstrate that they can at least begin to work with the codebase (a trivial typo-fix patch, with correctly-formatted commit message, would be a great start, for example). It's also worth saying that Google, not the SBCL organization, has the last word on which students are accepted.

(Thanks to the Common Lisp Foundation for agreeing at very short notice to handle the financial-related side of SBCL's participation.)

LispjobsSoftware Engineer - Maps for Asha Smartphones (Scheme), Nokia, Berlin, Germany

· 37 days ago

From: Software Engineer – Maps for Asha Smartphones (Scheme)

To quote:

The Nokia Asha range of smartphones is taking the world by storm. Sales of these feature-packed yet affordable devices have been steadily increasing in the world's major markets - so much so that we need to expand our HERE development team to provide all the location-based functionality and features our customers are demanding.

We're looking for an experienced, energetic and attentive software engineer to help us create leading-edge mapping and orientation experiences for the latest Asha phones. If you believe in our philosophy of 'doing good,' join us and help people around the world find their way and enjoy their lives even more using our maps and apps.

Your main role will be to specify, implement and ensure automation, stability and quality of software tests running on mobile devices on various platforms. And you'll be a valuable member of a crack team of front- and back-end developers, graphic designers and interaction designers from all over the world.

Nicely enough you'll be doing all of this work in one of the liveliest and most creative cities in the world, Berlin - and in a shiny new office building with cafeteria, gym, sauna, game room and lots of talented, friendly and creative people in it.


Vsevolod DyomkinNLTK 2.3 - Working with Wordnet

· 37 days ago

I'm a little bit behind my schedule of implementing NLTK examples in Lisp with no posts on topic in March. It doesn't mean that work on CL-NLP has stopped - I've just had an unexpected vacation and also worked on parts, related to writing programs for the excellent Natural Language Processing by Michael Collins Coursera course.

Today we'll start looking at Chapter 2, but we'll do it from the end, first exploring the topic of Wordnet.

Wordnet and Lisp

Wordnet is a database of word meanings (senses) and word semantic relations (synonymy, hyponymy and various other ymy's). It is a really important freely available resource, so having easy access to it is very valuable. At the same time, getting Wordnet to work is somewhat involved technically. That's why I've decided to implement it's support in cl-nlp-contrib system, so that the basic cl-nlpfunctionality can be loaded without the additional hassle of having to ensure Wordnet (and other similar stuff) is loading.

The implementation of Wordnet interface in cl-nlp is incomplete in the sense, that it doesn't provide matching entities for all Wordnet tables and so doesn't support all the possible interactions with it out-of-the-box. Yet, it is sufficient to run all NLTK's examples and it is trivial to implement the rest as the need arises (I plan to do it later this year).

There are several other Wordnet interfaces for CL developed in the previous years, starting from as long ago as early nineties:

Non of them use my desired storage format (sqlite) and their overall support ranges from non-existing to little if any. Besides, one of the principles behind cl-nlp is to prefer uniform interface and ease of extensibility over performance and efficiency with the premise that a specialized efficient version can be easily implemented by restricting the more general one, while the opposite is often much harder. And adapting the existing packages to the desired interface didn't seem like a straightforward task, so I decided to implement Wordnet support from scratch.

Wordnet installation

Wordnet isn't a corpus, although NLTK categorizes it as such placing it in nltk.corpus module. It is a word database distributed in a custom format, as well as in binary format of several databases. I have picked sqlite3-based distribution as the easiest to work with — it's a single file of roughly 64 MB in size which can be downloaded from WNSQL project page. The default place where cl-nlp will search for Wordnet is the data/directory. If you call (download :wordnet), it will fetch a copy and place it there.

There are several options to work with sqlite databases in Lisp, and I have chosen CLSQL as the most mature and comprehensive system — it's a goto library for doing simple SQL stuff in CL. Besides the low-level interface it provides the basic ORM functionality and some other goodies.

Also, to work with sqlite from Lisp a C client library is needed. It can be obtained in various ways depending on your OS and distribution (on most Linuxes with apt-get or similar package managers). Yet there's another issue of making Lisp find the installed library. So for Linux I decided to ship the appropriate binaries in the project's lib/ dir. If you are not on Linux and CLSQL can't find sqlite native library, you need to manually call the following command before you load cl-nlp-contrib with quicklisp or ASDF:

(clsql:push-library-path "path to sqlite3 or libsqlite3 directory")

Supporting Wordnet interaction

Using CLSQL we can define classes for Wordnet entities, such as: Synset, Sense or Lexlink.

(clsql:def-view-class synset ()
((synsetid :initarg :synsetid :reader synset-id
:type integer :db-kind :key)
(pos :initarg :pos :reader synset-pos
:type char)
(lexdomainid :initarg :lexdomain
:type smallint :db-constraints (:not-null))
(definition :initarg :definition :reader synset-def
:type string)
(word :initarg :word :db-kind :virtual)
(sensenum :initarg :sensenum :db-kind :virtual))
(:base-table "synsets"))

Notice the presence of virtual slots word and sensenum which are used to cache data from other tables in synset to display it like it is done in NLTK. They are populated lazily with slot-unbound method just like we've seen in Chapter 1.

All the DB entities are cached as well in the global cache, so that same requests don't produce different objects and Wordnet objects could be compared with eql.

There are some inconsistencies in Wordnet lexicon which have also migrated to NLTK interface (and then NLTK introduced some more). I've done a cleanup on that so some classes and slot accessors don't fully mimic the names of their DB counterparts or the NLTK's interface. For instance, in our dictionary word refers to a raw string and lemma — to its representation as a DB entity.

CLSQL also has special syntactic support for writing SQL queries in Lisp:

(clsql:select [item_id] [as [count [*]] 'num]
:from [table]
:where [is [null [sale_date]]]
:group-by [item_id]
:order-by '((num :desc))
:limit 1)

Yet I've chosen to use another very simple custom solution for that, which I've kept in the back of my mind for a long time since I'd started working with CLSQL literal SQL syntax. Here's how we get all lemma object for a specific synset — they are connected through senses:

(query (select 'lemma
`(:where wordid
:in ,(select '(wordid :from senses)
`(:where synsetid := ,(synset-id synset))))))

For the sake of efficiency we don't open new DB connection on every such request, but use a combination of the following:

Running NLTK's examples

Now, let's run the examples from the book to see how they work in our interface.

First, connect to Wordnet:

WORDNET> (connect-wordnet <wordnet>)
#<CLSQL-SQLITE3:SQLITE3-DATABASE /cl-nlp/data/wordnet30.sqlite OPEN {1011D1FAF3}>

Now we can run the functions with the existing connection passed implicitly through the special variable. wn is a special convenience package which implements the logic of implicitly relying on the default <wordnet>. There are functions with the same names in wordnet package that take a wordnet instance as first argument, similar to how other parts of cl-nlp are organized.

WORDNET> (wn:synsets "motorcar")
(#<SYNSET auto.n.1 102958343 {10116B0003}>)
WORDNET> (synsets <wordnet> "motorcar")
(#<SYNSET auto.n.1 102958343 {10116B0003}>)

The printing of synsets and other Wordnet objects is performed with custom print-object methods. Here's a print-object for synsetthat uses the built-in print-unreadable-object macro:

(defmethod print-object ((sample sample) stream)
(print-unreadable-object (sample stream :type t :identity t)
(with-slots (sample sampleid) sample
(format stream "~A ~A" sample sampleid))))

Let's proceed further.

WORDNET> (wn:words (wn:synset "car.n.1"))
("auto" "automobile" "car" "machine" "motorcar")

This snippet is equivalent to the following NLTK code:

wn.synset('car.n.01').lemma_names

Definitions and examples:

WORDNET> (synset-def (wn:synset "car.n.1"))
"a motor vehicle with four wheels; usually propelled by an internal combustion engine"
WORDNET> (wn:examples (wn:synset "car.n.1"))
("he needs a car to get to work")

Next are lemmas.

WORDNET> (wn:lemmas (wn:synset "car.n.1"))
(#<LEMMA auto 9953 {100386BCC3}> #<LEMMA automobile 10063 {100386BCF3}>
#<LEMMA car 20722 {100386BC03}> #<LEMMA machine 80312 {100386BD23}>
#<LEMMA motorcar 86898 {1003250543}>)

As I've written earlier, there's some confusion in Wordnet between words and lemmas, and, IMHO, it is propagated in NLTK. The term lemma only appears once in Wordnet DB as a column in words table. And synsets are not directly related to words — they are linked through senses table. NLTK calls a synset to word pairing a lemma, which only adds to the confusion. I decided to call entities of words table lemmas. Now, how do you implement the equivalent of

wn.lemma('car.n.01.automobile')

We can do it like this:

WORDNET> (remove-if-not #`(string= "automobile" (lemma-word %))
(wn:lemmas (wn:synset "car.n.1")))
(#<LEMMA automobile 10063 {100386BCF3}>)

But, actually, we need not a raw lemma object, but a sense object, because sense is that mapping of word to its meaning (defined by a synset), and it's a proper Wordnet entity for this:

WORDNET> (wn:sense "car~automobile.n.1")
#<SENSE car~auto.n.1 28261 {1011F226C3}>

You can also get at the raw lemma by its DB id:

WORDNET> (wn:synset (wn:lemma 10063))
#<SYNSET auto.n.1 102958343 {100386BB33}>

Notice, that synset here is named auto. This is different from NLTK's 'car', and the reason for this is that it's unclear from the Wordnet DB, what is the "primary" lemma for a synset, so I just use the word which appears first in the DB. Probably, NLTK also uses it, but has a different ordering — compare the next output with the book's one:

WORDNET> (dolist (synset (wn:synsets "car"))
(print (wn:words synset)))
("cable car" "car")
("auto" "automobile" "car" "machine" "motorcar")
("car" "railcar" "railroad car" "railway car")
("car" "elevator car")
("car" "gondola")

Also I've chosen a different naming scheme for senses — word~synset — as it better reflects the semantic meaning of this concept.

Wordnet relations

There're two types of Wordnet relations: semantic ones between synsets and lexical ones between senses. All of the relations or links can be found in *link-types* parameter. There are 28 of them in Wordnet 3.0.

To get at any relation there's a generic function related:

WORDNET> (defvar *motorcar* (wn:synset "car.n.1"))
WORDNET> (defvar *types-of-motorcar* (wn:related *motorcar* :hyponym))
WORDNET> (nth 0 *types-of-motorcar*)
#<SYNSET ambulance.n.1 102701002 {1011E35063}>

In our case "ambulance" is the first motorcar type.

WORDNET> (sort (mapcan #'wn:words *types-of-motorcar*) 'string<)
("ambulance" "beach waggon" "beach wagon" "bus" "cab" "compact" "compact car"
"convertible" "coupe" "cruiser" "electric" "electric automobile"
"electric car" "estate car" "gas guzzler" "hack" "hardtop" "hatchback" "heap"
"horseless carriage" "hot rod" "hot-rod" "jalopy" "jeep" "landrover" "limo"
"limousine" "loaner" "minicar" "minivan" "model t" "pace car" "patrol car"
"phaeton" "police car" "police cruiser" "prowl car" "race car" "racer"
"racing car" "roadster" "runabout" "s.u.v." "saloon" "secondhand car" "sedan"
"sport car" "sport utility" "sport utility vehicle" "sports car" "squad car"
"stanley steamer" "station waggon" "station wagon" "stock car" "subcompact"
"subcompact car" "suv" "taxi" "taxicab" "tourer" "touring car" "two-seater"
"used-car" "waggon" "wagon")

Hypernyms are more general entities in synset hierarchy:

WORDNET> (wn:related *motorcar* :hypernym)
(#<SYNSET automotive vehicle.n.1 103791235 {10125702C3}>)

Let's trace them up to root entity:

WORDNET> (defvar *paths* (wn:hypernym-paths *motorcar*))
WORDNET> (length *paths*)
2
WORDNET> (mapcar #'synset-name (nth 0 *paths*))
("entity.n.1" "physical entity.n.1" "object.n.1" "unit.n.6" "artefact.n.1"
"instrumentality.n.3" "conveyance.n.3" "vehicle.n.1" "wheeled vehicle.n.1"
"self-propelled vehicle.n.1" "automotive vehicle.n.1" "auto.n.1")
WORDNET> (mapcar #'synset-name (nth 1 *paths*))
("entity.n.1" "physical entity.n.1" "object.n.1" "unit.n.6" "artefact.n.1"
"instrumentality.n.3" "container.n.1" "wheeled vehicle.n.1"
"self-propelled vehicle.n.1" "automotive vehicle.n.1" "auto.n.1")

And here are just the root hypernyms:

WORDNET> (remove-duplicates (mapcar #'car *paths*))
(#<SYNSET entity.n.1 100001740 {101D016453}>)

Now, if we look at part-meronym, substance-meronym and member-holonymrelations and try to get them with related, we'll get an empty set.

WORDNET> (wn:related (wn:synset "tree.n.1") :substance-meronym)
NIL

That's because the relation is actually one-way: a burl is part of a tree, but not vice versa. For this case there's a :reverse key to related:

WORDNET> (wn:related (wn:synset "tree.n.1") :part-meronym :reverse t)
(#<SYNSET stump.n.1 113111504 {10114220B3}>
#<SYNSET crown.n.7 113128003 {1011423B73}>
#<SYNSET limb.n.2 113163803 {1011425633}>
#<SYNSET bole.n.2 113165815 {10114270F3}>
#<SYNSET burl.n.2 113166044 {1011428BE3}>)
WORDNET> (wn:related (wn:synset "tree.n.1") :substance-meronym :reverse t)
(#<SYNSET sapwood.n.1 113097536 {10113E8BE3}>
#<SYNSET duramen.n.1 113097752 {10113EA6A3}>)
WORDNET> (wn:related (wn:synset "tree.n.1") :member-holonym :reverse t)
(#<SYNSET forest.n.1 108438533 {10115A81C3}>)

While the tree is member-meronym of forest (i.e. NLTK has it slightly the opposite way):

WORDNET> (wn:related (wn:synset "tree.n.1") :member-meronym)
(#<SYNSET forest.n.1 108438533 {10115A81C3}>)

And here's the mint example:

WORDNET> (dolist (s (wn:synsets "mint" :pos #\n))
(format t "~A: ~A~%" (synset-name s) (synset-def s)))
mint.n.6: a plant where money is coined by authority of the government
mint.n.5: a candy that is flavored with a mint oil
mint.n.4: the leaves of a mint plant used fresh or candied
mint.n.3: any member of the mint family of plants
mint.n.2: any north temperate plant of the genus Mentha with aromatic leaves and small mauve flowers
batch.n.2: (often followed by `of') a large number or amount or extent
WORDNET> (wn:related (wn:synset "mint.n.4") :part-holonym :reverse t)
(#<SYNSET mint.n.2 112855042 {1011CF3CF3}>)
WORDNET> (wn:related (wn:synset "mint.n.4") :substance-holonym :reverse t)
(#<SYNSET mint.n.5 107606278 {1011CEECA3}>)

Verbs:

WORDNET> (wn:related (wn:synset "walk.v.1") :entail)
(#<SYNSET step.v.1 201928838 {1004139FF3}>)
WORDNET> (wn:related (wn:synset "eat.v.1") :entail)
(#<SYNSET chew.v.1 201201089 {10041BDC33}>
#<SYNSET get down.v.4 201201856 {10041BF6F3}>)
WORDNET> (wn:related (wn:synset "tease.v.3") :entail)
(#<SYNSET arouse.v.7 201762283 {10042495E3}>
#<SYNSET disappoint.v.1 201798936 {100424B0A3}>)

And, finally, here's antonomy, which is a lexical, not semantic, relationship, and it takes place between senses:

WORDNET> (wn:related (wn:sense "supply~supply.n.2") :antonym)
(#<SENSE demand~demand.n.2 48880 {1003637C03}>)
WORDNET> (wn:related (wn:sense "rush~rush.v.1") :antonym)
(#<SENSE linger~dawdle.v.4 107873 {1010780DE3}>)
WORDNET> (wn:related (wn:sense "horizontal~horizontal.a.1") :antonym)
(#<SENSE inclined~inclined.a.2 94496 {1010A5E653}>)
WORDNET> (wn:related (wn:sense "staccato~staccato.r.1") :antonym)
(#<SENSE legato~legato.r.1 105844 {1010C18AF3}>)

Similarity measures

Let's now use the lowest-common-hypernyms function abbreviated to lch to see the semantic grouping of marine animals:

WORDNET> (defvar *right* (wn:synset "right whale.n.1"))
WORDNET> (defvar *orca* (wn:synset "orca.n.1"))
WORDNET> (defvar *minke* (wn:synset "minke whale.n.1"))
WORDNET> (defvar *tortoise* (wn:synset "tortoise.n.1"))
WORDNET> (defvar *novel* (wn:synset "novel.n.1"))
WORDNET> (wn:lch *right* *minke*)
(#<SYNSET baleen whale.n.1 102063224 {102A2E0003}>)
2
1
WORDNET> (wn:lch *right* *orca*)
(#<SYNSET whale.n.2 102062744 {102A373323}>)
3
2
WORDNET> (wn:lch *right* *tortoise*)
(#<SYNSET craniate.n.1 101471682 {102A3734A3}>)
7
5
WORDNET> (wn:lch *right* *novel*)
(#<SYNSET entity.n.1 100001740 {102A373653}>)
15
7

The second and third return values of lch come handy here, as they show the depth of the paths to the common ancestor and give some immediate data for estimating semantic relatedness, that we're going to explore more now.

WORDNET> (wn:min-depth (wn:synset "baleen whale.n.1"))
14
WORDNET> (wn:min-depth (wn:synset "whale.n.2"))
13
WORDNET> (wn:min-depth (wn:synset "vertebrate.n.1"))
8
WORDNET> (wn:min-depth (wn:synset "entity.n.1"))
0

By the way, there's another whale, that is much closer to the root:

WORDNET> (wn:min-depth (wn:synset "whale.n.1"))
5

Guess what it is?

WORDNET> (synset-def (wn:synset "whale.n.1"))
"a very large person; impressive in size or qualities"

Now, let's calculate different similarity measures:

WORDNET> (wn:path-similarity *right* *minke*)
1/4
WORDNET> (wn:path-similarity *right* *orca*)
1/6
WORDNET> (wn:path-similarity *right* *tortoise*)
1/13
WORDNET> (wn:path-similarity *right* *novel*)
1/23

The algorithm here is very simple — it just reuses the secondary return values of lch:

(defmethod path-similarity ((wordnet sql-wordnet3)
(synset1 synset) (synset2 synset))
(mv-bind (_ l1 l2) (lowest-common-hypernyms wordnet synset1 synset2)
(declare (ignore _))
(when l1
(/ 1 (+ l1 l2 1)))))

There are many more Wordnet similarity measures in NLTK, and they are also implemented in cl-nlp. For example, lch-similarity the name of which luckily coincides with lch function:

WORDNET> (wn:lch-similarity (wn:synset "dog.n.1") (wn:synset "cat.n.1"))
Calculating taxonomy depth for #<SYNSET entity.n.1 100001740 {102A9F7A83}>
2.0281482

This measure depends on performing expensive computation of taxonomy depth which is done in a lazy manner in the max-depth function:

(defmethod max-depth ((wordnet sql-wordnet3) (synset synset))
(reduce #'max (mapcar #`(or (get# % *taxonomy-depths*)
(progn
(format *debug-io*
"Calculating taxonomy depth for ~A~%" %)
(set# % *taxonomy-depths*
(calc-max-depth wordnet %))))
(mapcar #'car (hypernym-paths wordnet synset)))))

Other similarity measures include wup-similarity, res-similarity, lin-similarity and others. You can read about them in the NLTK Wordnet manual. Most of them depend on the words' information content database which is calculated for different corpora (e.g. BNC and Penn Treebank) and is not part of Wordnet. There's a WordNet::Similarity projectthat distributes pre-calculated databases for some popular corpora.

Lin similarity proposes a similar formula to LCH similarity for calculating the score, but only uses information contents instead of taxonomy depths as the arguments to it:

WORDNET> (wn:lin-similarity (wn:synset "dog.n.1") (wn:synset "cat.n.1"))
0.87035906

To calculate it we first need to fetch the information content corpus with (download :wordnet-ic) and load the data for SemCor:

WORDNET> (setf *lc* (load-ic :filename-pattern "semcor"))
#<HASH-TABLE :TEST EQUAL :COUNT 213482 {100D7B80D3}>

The default :filename-pattern will load the combined information content scores for the following 5 corpora: BNC, Brown, SemCor and its raw version, all of Shakespeare's works and Penn Treebank. We'll talk more about various corpora in the next part of the series...

Well, this is all as far as Wordnet concerned for now. There's another useful Wordnet functionality — word morpher, but we'll return to it when we'll be talking about word forming, stemming and lemmatization. Meanwhile, here's a good schema showing the full potential of Wordnet:

Wordnet 3.0 UML diagram

LispjobsArtificial Intelligence Intern, Clojure, SAIC, Arlington, VA

· 37 days ago

From this posting: Artificial Intelligence Intern Job

SAIC's Sensors and Phenomenology Operation has an internship position at its Autonomy and Analytics Division in Arlington, Virginia for the summer of 2013.

Under the general supervision of a staff researcher, the Intern will perform a wide variety of duties applicable to their field of study supporting technical projects often in a team environment. Interns may be asked to participate in activities supporting multiple projects or work on a small task that contributes to a large project of great importance. Interns will have the opportunity throughout the summer to participate in additional activities to include multiple social functions, SAIC Interns' Outing to a Nats game, and numerous coaching sessions on life after college.

The goal for this internship is to explore artificial intelligence (AI) techniques for diagram and chart analysis in real-world domains. This research may also look at related areas, such as diagrammatic and visual reasoning, document retrieval from large corpora or from the web, and diagram generation.

Systems will be built using Clojure and/or Scheme, so experience with one of these languages or with Lisp is essential. Although systems will not be built with Java, familiarity with existing Java libraries is also useful.


Michael ForsterCompiling lparallel under SBCL 1.0.50

· 40 days ago
Compiling lparallel under SBCL 1.0.50:

Previously I wrote that lparallel 2.3.2 failed to compile for me under SBCL 1.0.50. James was kind enough to contact me by email to say that he has fixed that. I'm guessing we'll see that in a future Quicklisp update.

In the near future, I plan to write about my grokking and use of lparallel in a commercial project.

Christophe Rhodes8 Apr 2013

· 40 days ago
Before I forget: I was at what called itself the very first Emacs conference last weekend. The first thing to say is that I had a lot of fun!

I experimented with semi-live-tweeting the thing -- indeed, it was noticed by an external viewer that I was the only live-tweeter on #emacsconf using an emacs-based twitter client. I'm not sure about whether this is a good thing to do or not; certainly, in lectures that I'm giving, it's irritating (and distracting) if students are glued to their ludicrously expensive phones rather than to my perfectly-crafted narrative, but I accept that it happens – and if any of my undergraduate lecturers is reading this: I apologize profusely for reading the newspaper and trying to do the Inquisitor crossword during the Saturday lectures (but I wasn't the only one...). In this instance, I aimed not to create any disturbance, and mostly managed to tweet during breaks rather than during the actual talks.

The actual talks? This had something of the feel of a European Common Lisp Meeting (or its precursors) from about a decade ago: several times I heard the expression of surprise along the lines of "Who would have thought there were so many of us?" Particularly so many who were willing to turn up on the Saturday of an extended weekend to be cooped away from the sunshine all da... no, wait, being in the warm was a bonus. (It snowed on me at lunchtime). The actual talks were a good part of the draw: I had admired Sacha Chua's posts about using org-mode for, well, everything, back when I was picking it up for GTD (Gah, I am a long way off the wagon!) and John Wiegley is a name that has popped up in many a place that I have investigated (emacs obviously, but also Common Lisp and personal accounting), so the fact that they were doing a double-act for the keynote was a great draw (and it made a great start to the day).

The other talks were all interesting, though some were more relevant to me as an emacs outsider than others: highlights for me were the insanity of embedding a gtk-emacs inside another emacs (memories of McCLIM craziness) using Joakim Verona's oddly-named XWidgets; a call to arms on EmacsWiki from Nick Ferrier; Sam Aaron's emacs-based music/live coding system; and John Wiegley's rapid tour through emacs and emacs-lisp productivity enhancers (made me feel like a complete newbie). There was a lot about packaging systems for emacs lisp libraries and applications, about which I expressed a certain amount of skepticism, and Luke Gorrie gave a talk about SLIME, from (almost) the opposite perspective of the talk he gave at ECLM in 2005. (I have got this far without mentioning them, but I can't leave the subject without linking to Sacha Chua's talk sketchnotes; the keynote was excellent, and these notes are icing.)

The conversations between talks were good too; I ducked out of some lunch ones to visit Camden Lock market, but there was plenty of time to socialize. Possibly the weirdest moment for me came when Reuben Thomas showed up between Easter weekend services; I had last seen Reuben in a crazy performance of Handel's Giulio Cesare many, many years ago, and the context-switch that I think both of us had to go through to place the other was lengthy and extended. Small world.

I feel very lucky to have been able to participate. Thanks from me to the organizers, Aleksandar Simic for doing what seemed to be a lot of the heavy lifting and Forward for hosting. And I'll idle a bit more strongly on the emacs IRC channel and attempt to discover more of the London-based hacking community, time permitting...

Quicklisp newsUpdates coming soon

· 42 days ago
I hope to make a new dist update tomorrow, the 7th of April. In general, I like to make new releases on the first weekend of every month.

I haven't been announcing the past few updates because I wanted to take advantage of a new web service to do so. But development on that service has stalled, so I'll announce tomorrow's update as I usually do.

On an unrelated topic, if you're making a new library that's a CFFI wrapper around a C library, please explain in your README or description what the original C library is for. I've seen a lot of libraries lately that look like this: "This is cl-fribble, a CL wrapper around libfribble." But there's no indication of what libfribble does or why someone would want to use a CL wrapper for it.

Michael ForsterUpgrading SBCL and loading lparallel

· 44 days ago

lparallel 2.3.2 failed to compile with my SBCL 1.0.50 installation:

CL-USER&gt; (ql:quickload "lparallel")
To load "lparallel":
  Load 1 ASDF system:
    lparallel
; Loading "lparallel"
[package alexandria.0.dev]........................
[package bordeaux-threads]........................
[package lparallel.util]..........................
[package lparallel.thread-util]...................
[package lparallel.raw-queue].....................
[package lparallel.cons-queue]....................
[package lparallel.vector-queue]..................
[package lparallel.queue].........................
[package lparallel.biased-queue]..................
[package lparallel.counter].......................
[package lparallel.spin-queue]....................
[package lparallel.kernel]........................
[package lparallel.kernel-util]...................
[package lparallel.ptree].........................
[package lparallel.promise].......................
[package lparallel.defpun]........................
[package lparallel.cognate].......................
[package lparallel].........

; file: /home/mjf/quicklisp/dists/quicklisp/software/lparallel-20130312-git/src/spin-queue/cas-spin-queue.lisp
; in: DEFUN PUSH-SPIN-QUEUE
;     (LPARALLEL.SPIN-QUEUE::CAS
;      (LPARALLEL.SPIN-QUEUE::NODE-CDR
;       (LPARALLEL.SPIN-QUEUE::SPIN-QUEUE-TAIL LPARALLEL.SPIN-QUEUE::QUEUE))
;      NIL LPARALLEL.SPIN-QUEUE::NEW)
; --&gt; EQ 
; ==&gt;
;   (COMPARE-AND-SWAP
;    (LPARALLEL.SPIN-QUEUE::NODE-CDR
;     (LPARALLEL.SPIN-QUEUE::SPIN-QUEUE-TAIL LPARALLEL.SPIN-QUEUE::QUEUE))
;    NIL LPARALLEL.SPIN-QUEUE::NEW)
; 
; caught ERROR:
;   during macroexpansion of (SB-EXT:COMPARE-AND-SWAP (NODE-CDR #) NIL ...). Use
;   *BREAK-ON-SIGNALS* to intercept:
;   
;    Invalid first argument to COMPARE-AND-SWAP: (NODE-CDR (SPIN-QUEUE-TAIL QUEUE)).
; 
; compilation unit aborted
;   caught 1 fatal ERROR condition
;   caught 1 ERROR condition
; Evaluation aborted on #.

We deploy applications on more recent versions of SBCL, so there was no reason to not upgrade. However, warned by Christophe Rhodes’s post, I chose to avoid version 1.1.6. I downloaded 1.1.5 and tweaked my copy of the Slackbuilds SBCL script to build it.

Following the upgrade, I was able to load lparallel and run its tests successfully.

Christophe Rhodes3 Apr 2013

· 45 days ago
I released SBCL 1.1.6 before the weekend, after the customary one-week period of code freeze and request for testing. Unfortunately, it turns out that the release contains a bug which affects a substantial number of quicklisp libraries: the compiler's transformation of svref was sped up, but unfortunately without sufficient generality, so code doing svref on a symbol-macro fails to compile.

This bug has been fixed in the master branch; this post is somewhat in the vein of a public service announcement: don't use SBCL 1.1.6 if your code or anything it depends on does svref on a symbol-macro. (If it does, a quick workaround is to replace the svrefs with arefs). As well as the public service announcement, though, I have a question for the wider community: how can we encourage testing the code and clearly communicating the results just before the release happens rather than just after? It's somewhat frustrating to have a week-long code freeze, then bug reports of serious issues a few hours after the release is made... and unfortunately answers of the form "just test everything yourself during the freeze period" aren't currently practical. Maybe someday.


For older items, see the Planet Lisp Archives.


Last updated: 2013-05-13 17:33