Important news:
Subject: ECLM 2013 - two days left to register
Date: Mon, 13 May 2013 19:17:26 +0200
From: Edi WeitzLadies and Gentlemen,
This is our last call - there are two days left to register for this year's ECLM in case you haven't done so already:
We have more than 50 registrations so far, but we wouldn't mind seeing another dozen or two...
Best regards,
Arthur & Edi.
I greatly enjoyed my last visit to ECLM and I can't recommend it highly enough if you want to meet people with interesting Common Lisp projects and ideas.
Last week came with two bank holidays in a row, and I took the opportunity to design a command language for pgloader. While doing that, I unexpectedly stumbled accross a very nice AHAH! moment, and I now want to share it with you, dear reader.

AHAH, you'll see!
The general approach I'm following code wise with that command language is to first get a code API to expose the capabilities of the system, then somehow plug the command language into that API thanks to a parser. It turns out that doing so in Common Lisp is really easy, and that you can get a compiler for free too, while at it. Let's see about that.
In this newsgroup article What is symbolic compoutation?, Pascal Bourguignon did propose a very simple piece of code:
(defparameter *additive-color-graph* '((red (red white) (green yellow) (blue magenta)) (green (red yellow) (green white) (blue cyan)) (blue (red magenta) (green cyan) (blue white)))) (defun symbolic-color-add (a b) (cadr (assoc a (cdr (assoc b *additive-color-graph*)))))
This is an example of symbolic computation, and we're going to build a little language to express the data and the code. Not that we would need to build one, mind you, more in order to have a really simple example leading us to the ahah moment you're now waiting for.
Before we dive into the main topic, you have to realize that the previous code example actually works: it's defining some data, using an implicit data structure composed by nesting lists together, and defines a function that knows how to sort out the data in that anonymous data structure so as to compound 2 colors together.
TOY-PARSER> (symbolic-color-add 'red 'green) YELLOW
I decided to go with the following language:
color red +red white +green yellow +blue magenta color green +red yellow +green white +blue cyan color blue +red magenta +green cyan +blue white mix red and green
And here's how some of the parser looks like, using the esrap packrat lib:
(defrule color-name (and whitespaces (+ (alpha-char-p character))) (:destructure (ws name) (declare (ignore ws)) ; ignore whitespaces ;; CL symbols default to upper case. (intern (string-upcase (coerce name 'string)) :toy-parser))) ;;; parse string "+ red white" (defrule color-mix (and whitespaces "+" color-name color-name) (:destructure (ws plus color-added color-obtained) (declare (ignore ws plus)) ; ignore whitespaces and keywords (list color-added color-obtained))) ;;; mix red and green (defrule mix-two-colors (and kw-mix color-name kw-and color-name) (:destructure (mix c1 and c2) (declare (ignore mix and)) ; ignore keywords (list c1 c2)))
Those rules are not the whole parser, go have a look at the project on github if you want to see the whole code, it's called toy-parser over there. The main idea here is to show that when we parse a line from our little language, we produce the simplest possible structured data: in lisp that's symbols and lists.
The reason why it makes sense doing that is the next rule:

The one grammar rule to bind them all
(defrule program (and colors mix-two-colors) (:destructure (graph (c1 c2)) `(lambda () (let ((*additive-color-graph* ',graph)) (symbolic-color-add ',c1 ',c2)))))
This rule is the complex one to bind them all. It's using a quasiquote, a basic lisp syntax element allowing the programmer to very easily produce data that looks exactly like code. Let's see how it goes with a very simple example:
TOY-PARSER> (pprint (parse 'program
"color red +green yellow mix green and red"))
(LAMBDA NIL
(LET ((*ADDITIVE-COLOR-GRAPH* '((RED (GREEN YELLOW)))))
(SYMBOLIC-COLOR-ADD 'RED 'GREEN)))
The parser is producing structure (nested) data that really looks like lisp code, right? So maybe we can just run that code...

Here is my AHAH moment!
Let's see about actually running the code:
TOY-PARSER> (let* ((code "color red +green yellow mix green and red") (program (parse 'program code))) (compile nil program)) #<Anonymous Function #x3020027CF0EF> NIL NIL TOY-PARSER> (let* ((code "color red +green yellow mix green and red") (program (parse 'program code))) (funcall (compile nil program))) YELLOW
So we have a string reprensing code in our very little language, and a
parser that knows how to produce a nested list of atoms that looks like lisp
code. And as we have lisp, we can actually compile that code at run-time
with the same compiler that we used to produce our parser, and we can then
funcall that function we just built.
Oh and the function is actually compiled down to native code, of course:
TOY-PARSER> (let* ((code "color red +green yellow mix red and green") (program (parse 'program code)) (func (compile nil program))) (time (loop repeat 1000 do (funcall func)))) (LOOP REPEAT 1000 DO (FUNCALL FUNC)) took 108 microseconds (0.000108 seconds) to run. During that period, and with 4 available CPU cores, 105 microseconds (0.000105 seconds) were spent in user mode 13 microseconds (0.000013 seconds) were spent in system mode NIL
Yeah, it took the whole of 108 microseconds to actually run the code
generated by our own parser a thousand times, on my laptop. I can believe
it's been compiled to native code, that seems like the right ballpark.
The toy-parser code is there on GitHub and you can actually load it using
Quicklisp: clone the repository in ~/quicklisp/local-projects/ then
(ql:quickload "toy-parser"), and play with it in (in-package :toy-parser).
The only thing I still want to say here is this: can your programming language of choice make it that easy?
It still amuses me that my most successful project to date is a blog engine. Not that I'm complaining about having contributors. When I last mentioned it, version 0.8 had just been released. Since then there have been 2 new contributors and a bunch of new features. I think the code has mostly improved in cleanliness.
The biggest changes are new shiny docs, a new tags implementation, cleanups to theming, and plugins for Google Analytics, Github Pages, and Sitemap Generation. For the full details, see the changelog.
My plans for 1.0 are primarily to take advantage of the extensible content types added in 0.8 and add some sort of tumblr-like auto-embedding support. But I probably won't get around to working on that for a spell. Why?
Because my lisp emulation experiment/art project is ongoing. Nyef was kind enough to share some code he'd hacked up for NES emulation years ago and it helped give me the motivation to rewrite famiclom's PPU (Graphics Card). The former code was largely cribbed from Patrick Walton's sprocketnes and I didn't understand it very well. I've hit the nesdev wiki again and am getting more comfortable with the PPU's actual workings. The code is on github in the new-ppu branch and I'm hoping to spend more time on it this coming week.
I also spent the last week porting cl-6502 to clojurescript for giggles. Heresy, I know. ;)
cljs-6502 is in a basic working state but there are bugs aplenty and I haven't implemented the assembler or disassembler. The must frustrating part was dealing with A) differences in macro hygiene and B) poor macro debugging facilities.
The browser is a fun target though. I'll have to try parenscript or ... jscl! JSCL is a full CL->JS compiler in early development, which I contributed a tiny patch to for fboundp. It's a great project and if you have any interest in helping implement a lisp, I'd encourage you to get involved. The maintainers are very approachable and there's plenty of fun hacking to be had.
All for now. It's time to play around trying static analysis of Nintendo ROMs with Lisp. I'm bound to learn something...hopefully.
In doing a problem set from the Internet a few weeks ago, I found myself writing awkward constructions with REDUCE and LOOP to try to find the best one (or two or three) things in a big bag of things based on various criteria: similarity to English, hamming distance from their neighbor, etc.
I wrote a macro that encapsulated the pattern. I’ve reworked that macro into a library for public use.
There are a variety of examples in the README and the tests directory.
Here is one example to pique your interest. Suppose you have some data about the elevations of various cities in various states and you (being a Moxy Fruvous fan) want to know What Is the Lowest Highest Point?
Here’s how you might tackle that with the TRACK-BEST library:
(let ((data '(("Alabama" ("Birmingham" 664) ("Mobile" 218) ("Montegomery" 221)) ("Alaska" ("Anchorage" 144) ("Fairbanks" 531)) ("Arizona" ("Grand Canyon" 6606) ("Phoenix" 1132) ("Tuscon" 2641))))) (with-track-best (:order-by-fn #'<) (dolist (state-info data) (multiple-value-bind (city altitude) (with-track-best () (dolist (city-info (rest state-info)) (track (first city-info) (second city-info)))) (track (list (first state-info) city) altitude)))))
With this limited dataset, the end result would be (VALUES '("Alaska" "Fairbanks") 531). The inner WITH-TRACK-BEST finds the highest city in each state. The outer WITH-TRACK-BEST finds the lowest of these.
I'm very excited to be attending both ECLM and ELS in Madrid this year. This is the last week to register for ECLM, so if you want to go, you should register ASAP.
sudo aptitude install libplplot-dev plplot11-driver-cairo plplot11-driver-xwin
(ql:quickload :cl-plplot)
(defun x-y-plot ()
(let* ((x (iter (for x from 0.0 to 10.0 by 0.25) (collect x result-type vector)))
(y (map 'vector (alexandria:rcurry 'expt 2) x))
(p (cl-plplot:new-x-y-plot x y))
(w (cl-plplot:basic-window :x-label "x" :y-label "x squared" :title "The square function")))
(cl-plplot:add-plot-to-window w p)
(cl-plplot:render w "xwin")))
(x-y-plot)
One of the things that’s often said about Lisp is that it allows you to build up the language to you. Here are two examples that really drive home that point, for me.
I read an article about Treaps on reddit yesterday. The article used pretty direct pylisp to present Treaps.
I thought it would be fun to go through the exercise in Common Lisp instead. Pylisp made a number of things awkward that would melt away in Common Lisp.
I began by defining a node structure to encapsulate the information at a given node:
(defstruct node (priority (random 1.0d0) :type real :read-only t) (key 0 :read-only t) (value 0 :read-only t) (left nil :type (or null node) :read-only t) (right nil :type (or null node) :read-only t))
Then, I defined a top-level structure to hold the root node, track the function used to sort the tree, and track the function used to create a key from a value. I used the convention that two keys are equivalent if neither is less than the other.
(defstruct treap (root nil :type (or null node) :read-only t) (less-than #'< :read-only t) (key #'identity :read-only t))
The article was using a functional style. As you may have guessed by the liberal use of :read-only t in those DEFSTRUCT clauses, I also used a functional style.
When working with functional data structures, one often needs to copy a whole structure with only slight modifications. Here, Lisp’s keyword arguments made everything simple and clean. I made this functions:
(defun copy-node (node &key (priority (node-priority node)) (key (node-key node)) (value (node-value node)) (left (node-left node)) (right (node-right node))) (make-node :priority priority :key key :value value :left left :right right))
Now, for any node, I could copy it and only have to specify the fields that I wanted to change. Here is a left-rotation using this:
(defun left-rotate (node) (let ((left (node-left node))) (copy-node left :right (copy-node node :left (node-right left)))))
What other languages let you do anything like that so simply? You could pull it off in Perl if you were willing to have your node be a hash (which, admittedly, you probably are if you’re writing Perl). What other language lets you do anything like that? Even other languages with named arguments don’t let you have the defaults based on other arguments.
As with any binary tree, you find yourself having to deal with four specific cases over and over again with Treaps.
I wrote myself a TREAP-CASES macro that streamlines all of these checks. It also validates to make sure you don’t have duplicate cases or cases other than these four. It makes sure that no matter which order you write the cases (or even if you leave some out), they end up organized in the order listed above. The four cases are mutually exclusive so the order doesn’t matter except that you want to make sure you haven’t run out of tree before doing comparisons and that once you’re beyond the less-than and greater-than cases you already know you’re in the equivalence case.
With this macro, my TREAP-FIND function looks like this:
(defun treap-find (key treap) (check-type treap treap) (labels ((treap-node-find (root) (treap-cases (key root treap) (null (values nil nil)) (< (treap-node-find (node-left root))) (> (treap-node-find (node-right root))) (= (values (node-value root) t))))) (treap-node-find (treap-root treap))))
The little DSL makes writing and reading the code so much easier. All of the mess of comparing the key to the node’s key is hidden away.
With years of practice and days of debugging, you might be able to pull off some quasi-readable control construct like this using C++ templates. With enough therapy, you could convince yourself you can get a close-enough effect with C-preprocessor macros. In Lisp, it’s the work of minutes (without lying to yourself).
Here is the TREAP-CASES macro for reference/completeness.
(defmacro treap-cases ((key root treap) &rest clauses) (validate-treap-case-clauses clauses) (let ((k (gensym "KEY-")) (tr (gensym "TREAP-")) (r (gensym "ROOT-")) (t< (gensym "<-"))) `(let* ((,k ,key) (,tr ,treap) (,r ,root) (,t< (treap-less-than ,tr))) (cond ((null ,r) ,@(rest (assoc 'null clauses))) ((funcall ,t< ,k (node-key ,r)) ,@(rest (assoc '< clauses))) ((funcall ,t< (node-key ,r) ,k) ,@(rest (assoc '> clauses))) (t ,@(rest (assoc '= clauses)))))))
I suppose I should also include #'VALIDATE-TREAP-CASE-CLAUSES, too, but it’s what you’d expect:
(defun validate-treap-case-clauses (clauses) (let ((all-choices '(null < > =))) (flet ((assert-all-choices-valid () (dolist (c clauses) (unless (member (first c) all-choices) (error "Unrecognized clause type: ~S" (first c))))) (assert-no-duplicates () (dolist (c all-choices) (unless (<= (count c clauses :key #'first) 1) (error "Duplicate ~S clause not allowed." c))))) (assert-all-choices-valid) (assert-no-duplicates))))
I am just back from a few days of holidays and have seen how active JSCL is! It is really exciting.
This rising is due to abeaumont probably. I stopped working, but he came up with an idea I liked. It is, why don't integrate JSCL into Conkeror? We would end up with a Lisp programmable browser… that's a practical reason! We did some tests locally and it seemed to work, but now it is time for improving JSCL itself.
It is not the only platform where JSCL could run on. I plan to add support for Node JS. Indeed, it will let us use the REPL and tests in the terminal in addition to the browser, which is a much more convenient way to program.
In the meantime, I would like to improve the compiler a little bit.
Suggestion and patches are very welcome if you want to join us.

One of the always trending topics in our pop-culture-like computer industry is dealing with errors. There are whole movements addressing it. Yet, in my humble opinion, these movements often see errors in a rather narrow, I would even say, simplistic manner: like the proverbial example of passing a string to a function which expects integers. But, in fact, the scope of errors that can occur in a program is indefinite. In a senses, the language of errors is Turing complete, and creating a program can be viewed from a certain point as devising a way of working around all relevant errors.
Today I'd like to talk about a particularly nasty type of errors - those that occur in the platform on which we're building the program. And the reality is such, that we base our programs on at least several layers of platforms: a hardware platform, an OS and its standard library, a language's VM and the language's standard library, some user-level framework(s) - to name a few. There are several aspects of this picture - some positive for us and some negative.
There's a revealing rant by node.js creator Ryan Dahl on the topic of such platforms - now removed, but available in the all-remembering Internet - "I hate almost all software".
The platform errors are so nasty, because of two things:
And the thing is, there are severe bugs in all of the platforms we use. The Pentium FDIV bug is one acute example, but let's talk Java here, because there's so many people bragging how rock solid the JVM is. One of the problems of the JVM is jar hell - the issue that haunts almost every platform under different names (like DLL hell). We had an encounter with it recently, when after an update our server started hanging for no apparent reason. After an investigation we've found that part of our code was compiled with a more recent version of Gson library for parsing JSON, while the main server used the older version. The interesting thing is that there was no error condition signalled. I wonder, how many projects do the same mistake and don't even know what's the source of their problems. We had a similar but more visible issue with different versions of the servlet-api, but that time they were caught at compile-time because of the minor API inconsistency.
The other problem we'd fought fruitlessly and had to eventually work around with external tools was the JVM not properly closing network sockets. Yes, it leaks sockets in some scenarios and we couldn't find a way to prevent it. This is an example of another nasty platform error - the one which happens at layer boundaries. Also, many java libraries, like the ubiquitous jetty webserver leak memory in their default setting. Actually, I wonder, what percentage of Java libraries doesn't leak memory by default?.. This is no way to say, that JVM is exceptional in this regard - other platforms suffer from the same problems. At the same time to JVM's credit it has one of the tools to fight platform errors that many other platforms lack.
So, what are the ways to deal with those errors? Unfortunately, I don't have a lot of answers here, but I hope to discover some with this article. Here are some of the suggestions:
But the biggest upside is not in fighting with the existing system, but in removing or diminishing the source of the problems - i.e. creating more programmer-friendly platforms. So the most important recommendations I see are for platform authors (and, in fact, most of us become them to some extent when we release a utility library or a framework).
How can static typing help with these types of errors? I don't know. What about the Erlang's failure isolation. Once again, it's pretty orthogonal to the mentioned issues: if your Erlang library leaks memory, the whole machine will suffer - i.e. you can't isolate errors which touch the underlying layers of your platform which you don't fully control...
Please, share your approaches and stories about dealing with platform errors.
Today I was running some analysis on about 9,000 files, basically mapping a function over each to warm up a cache. Something like this:
* (map nil 'analyze-file *9000-files*) time passes
I had no idea how well it was progressing, and whether I'd need to take a snack break or let it run overnight. So I interrupted it and wrote a quick and dirty REPL utility function:
(defun :dot-every (count fun)
(let ((i 0))
(lambda (&rest args)
(when (< count (incf i))
(setf i 0)
(write-char #\. *trace-output*)
(force-output *trace-output*))
(apply fun args))))
It prints out one dot per COUNT invocations of the function it returns, giving some indication of progress. Sensible values for COUNT depend on the volume of function calls.
For this problem, I called it with a COUNT of 100:
* (map nil (:dot-every 100 'analyze-file) *9000-files*) ............etc
The cached analyses printed out a ton of dots quickly, and the uncached analyses started printing dots at a slow but steady pace, and I could tell that it would be done in a few minutes instead of a few hours.
So now I'm going to use this to wrap up any function I have to call a ton of times and I want to get a sense of how it's progressing.
A friend of mine is taking a cryptography course this semester. One of his current homework problems involves creating
, , and
values for RSA Encryption. Always one to play the home game when math is involved, I found myself rewriting two snippets of code that I always end up writing any time I do any number-theory related coding.
One of them is simple EXPONENT-MODULO that takes a base
, an exponent , and a modulus
and returns
. There are more efficient ways to do this, but this one is much faster than the naïve method and quite easy to rewrite every time that I need it. The idea is that when
is odd, you recursively calculate
and return
. When
is even, you recursively calculate
and return
.
(defun expt-mod (b e m) (cond ((zerop e) 1) ((evenp e) (let ((a (expt-mod b (/ e 2) m))) (mod (* a a) m))) (t (mod (* b (expt-mod b (1- e) m)) m))))
The other snippet of code is a good deal trickier. Every time I need it, I spend way more time than I want to futzing with examples, getting lost in subscripts, etc. So, while I could rewrite the above in a moment, what’s below, I hope I have the search-fu to find this post next time I need it.
If you have a number
and some number
which is relatively prime to (shares no factors except for one with)
, then there exists some number
such that . And, in fact, all such numbers
for a given
differ by a multiple of
, so there is a unique such .
Another way of saying
and
are relatively prime is to say that their greatest common divisor is one. Besides being the biggest number that divides into both
and
, the is also the smallest positive number expressible as
for integers
and
. If you keep track of the steps you use when calculating the greatest common divisor (which in this case is
) while using the Euclidean algorithm you can (with enough care) backtrack through the steps and determine
and
for your given
and
.
For example, suppose you had and
. You want to know what
you’d need for .
The steps in the Euclidean algorithm show:
Working backwards, you can express
as a combination of . Then, you can express
as a combination of
and so on:
Wheee… in my example chosen to be easy to backtrack, the answer comes out to be which is equal to
modulo
. So,
. I picked a lucky number that is its own inverse modulo
. It’s not usually like that. For example,
.
So, anyhow, here’s a function that takes
and
and returns two values
and
so that .
(defun gcd-as-linear-combination (a n) (multiple-value-bind (q r) (truncate n a) (if (zerop r) (values 1 0) (multiple-value-bind (x y) (gcd-as-linear-combination r a) (values (- y (* q x)) x)))))
Assuming that
and
are relatively prime with < n" />< n" style="vertical-align:-20%;" class="tex" alt="a < n" />, the first value returned is
‘s inverse modulo
.
Here's a bright idea: why not take the GSoC opportunity to have someone implement all current CDRs in every major Common Lisp implementation? Not that I'm volunteering for anything. I'm justing throwing the though out in the open...
Hello,
just a quick note to let you know that we have finally opened the registration process for ELS 2013. See the bottom the page for more information. It is possible to register via PayPal or via direct bank transfer.
Looking forward to see you all !
Eli Greenbaum's analysis here is the first legal commentary on the LLGPL I've encountered.
SBCL was accepted as a mentoring organisation for Google’s Summer of Code 2013 (our list of project suggestion is here). This will be our first time, so that’s really great news. I’m also extremely surprised by the number of people who’ve expressed interest in working with us. I was going to reply to a bunch of emails individually, but I figure I should also centralise some of the stuff here.
EDIT: There’s a section with a bunch of general references.
EDIT 2: Added a note on genesis when playing with core formats.
The first step is probably to install git, and clone our repo
(the github mirror works well, and
lets you fork to your own github account for easy publication). Then,
building from source and installing SBCL (a local installation to
$HOME works fine) is obviously useful experience, and will be useful
to explore the source. Reading INSTALL should be enough to get
started on Linux or OS X and x86[-64]. Other platforms may need more
work, and might not be the best choice if you’re not interested in
improving the port itself (although I’m told FreeBSD and Solaris work
very well on x86[-64]). To build SBCL from source, you’ll need an
SBCL binary (bootstrapping from other CLs should work, but support
regularly bitrots away), and the usual C development tools
(e.g. build-essential on debian). A fancy build (./make.sh --fancy)
is probably the best choice for development.
You’ll also want to run the test suite often; better try it out now
(cd tests; sh run-tests.sh) to make sure you can get it working.
The test suite will barf if there’s non-ASCII characters in the
environment. Oh my Zsh’s
git customisation systematically trips me up, for example (I currently
kludge it and start a bash from ~ and then run the tests).
Once SBCL HEAD is working and installed, it’s probably best to install emacs and SLIME. Quicklisp’s quicklisp-slime-helper can take care of installing SLIME. It is possible to work on SBCL without SLIME. However, SLIME has a lot of useful extensions, if only to explore the code base. If you’re not (and don’t wish to become) comfortable with emacs, it’s probably best to nevertheless use emacs and SLIME for the REPL, debugger, inspector, etc. and write code in your favourite editor. Later on, it’ll be useful to figure out how to make SLIME connect to a freshly-built SBCL.
I often see newcomers try to read the source like a book, and, once they realise there’s a lot of code, try to figure out a good order to read the source. I don’t think that’s the best approach. SBCL is pretty huge, and I doubt anyone ever simultaneously holds the complete system in their head. RAM’s “The Python Compiler for CMU Common Lisp” is still useful as an overview, and SBCL’s internals manual is a good supplement. Once you get close to bootstrapping logic, Christophe Rhodes’s “SBCL: a Sanely-Bootstrappable Common Lisp” helps understand the exclamation marks. Past that, I believe it’s preferrable to start out small, learn just enough to get the current task done, and accept that some things just work, without asking how (for now).
In that spirit, I’d say M-. (Alt period, Command period on some OS X emacsen) is the best way to explore most of SBCL’s source. SBCL’s build process preserves a lot of source location information, and M-. queries that information to jump to the definitions for any given symbol (M-, will pop back up to the previous location). For example, if you type “(truncate” at the REPL and hit M-. (with the point on or just after “truncate”), you’ll find the out of line definition for truncate in (mostly) regular Common Lisp, optimisation rules regarding truncate, and VOPs, assembly language templates, for truncate called with a few sets of argument and return types. The out of line definition isn’t that interesting. The transforms, however, are. (VOPs aren’t useful if one isn’t comfortable with the platform’s assembly language, and mostly self-explanatory otherwise.)
The one to “convert integer division to multiplication” is a very good
example. One could M-. on deftransform, and go down a very long
chain of definitions. Instead, I think it’s only essential to see
that the form defines a new rule, like a compiler macro, such that
compile-time values (lvars) that represent its two arguments are bound
to x and y, and the rule only triggers if its first argument is known
to be an unsigned word, and its second a constant unsigned word. If
that’s satisfied, the transformation still only triggers if the speed
optimisation quality is higher than both compilation speed and space
(code size).
Then, the constant value for y is extracted and bound to y, and a
conservative bound on the maximum value that x can take at runtime is
computed. If truncate by y should be handled elsewhere, the transform
gives up. Otherwise, it returns a form that will be wrapped in
(lambda (x y) ...) and spliced in the call, instead of truncate.
To extend SBCL’s support for division by constants, it’s not necessary to understand more of SBCL’s compiler than the above. There’s no need to try and understand how deftransform works, only that it defines a rule to simplify calls to truncate. Similarly for lvar-value and lvar-type: the former extracts the value for constant lvars, and the latter the static type derived for that lvar (value at a program point). With time, knowledge will slowly accrete. However it’s possible, if not best, to start hacking without understanding the whole system. This approach will lead to a bit of cargo culting, but mentors and people on IRC will help make sure it doesn’t do any harm, and can explain more stuff if it’s interesting or à propos.
Working on the compiler itself is a bit more work. I think the best
approach is to go in src/compiler/main.lisp and look for
compile-component. ir1-phases loops on a component and performs
high-level optimisations until fixpoint (or we get tired of waiting),
while %compile-component handles the conversion to IR2 and then to
machine code. The compilation pipeline hasn’t really changed since
the Python paper was written, and the subphases each have their own
function (and file). M-. on stuff that sounds interesting is probably
the best approach at the IR2 level.
The C and assembly runtime lives in src/runtime/. There’s a lot of
stuff that’s symlinked or generated during the build, so it’s probably
best to look at it after a successful build. Sadly, we don’t track
source locations there, but {c,e,whatever}tags works; so does grep.
GC stuff is in the obvious suspects (gc-common, gencgc, gc, etc.), but
may end up affecting core loading/saving (core, coreparse, save).
Depending on what in the core loading code is affected, code in
genesis (the initial bootstrap that reads fasl files from the cross
compiler and builds the initial core file) might also have to be
modified (mostly in src/compiler/generic/genesis.lisp). That’s...
more work. Like the project suggestions list says, when we change
things in the runtime, it sometimes ends up affecting a lot of other
components.
GDB tends to be less than useful, because of the many tricks SBCL
plays on itself. It’s usually hard to beat pen, paper, and printf.
At least, rebuilding the C runtime is quick: if the feature
:sb-after-xc-core is enabled (which already happens for --fancy
builds), slam.sh should be able to rebuild only the C runtime, and
then continue the bootstrap with the rest of SBCL from the previous
build. That mostly leaves PCL to build, so the whole thing should
takes less than a minute on a decent machine.
I was replying to an email when I realised that some general compiler references would be useful, in addition to project- and SBCL- specific tips.
Christian Queinnec’s Lisp in Small Pieces gives a good overview of issues regarding compiling Lisp-like languages. Andrew Appel’s Modern Compiler Implementation in ML is more, well, modern (I hear the versions in C and Java have the same text, but the code isn’t as nice... and ML is a very nice language for writing compilers). I also remember liking Appel’s Compiling with Continuations, but I don’t know if it’s particularly useful for CL or the projects we suggest.
For more complicated stuff, I believe Stephen Muchnick’s Advanced Compiler Design and Implementation would have been really nice to have, instead of slogging through code and dozens of papers. Allen & Kennedy’s Optimizing Compilers for Modern Architectures: A Dependence-based Approach is another really good read, but I’m not sure how useful it would be when working on SBCL: we still have a lot of work to do before reaching for the really sophisticated stuff (and what sophistication there is is fairly non-standard).
I believe the Rabbit and Orbit compilers have influenced the design of CMUCL and SBCL. The Lambda papers provide some historical perspective, and the RABBIT and ORBIT theses are linked here.
What little magic remains in SBCL and CMUCL is the type derivation (constraint propagation) pass, and how it’s used to exploit a repository of source-to-source transformations (deftransforms). The rest is bog-standard tech from the 70s or 80s. When trying to understand SBCL’s type derivation pass at a very high level, I remember finding Henry Baker’s The Nimble Type Inferencer for Common Lisp-84 very useful, even though it describes a scheme that doesn’t quite work for Common Lisp (it’s very hard to propagate information backward while respecting the final standard). Kaplan and Ullman’s A Scheme for the Automatic Inference of Variable Types was also helpful.
Over the years, I’ve seen a couple of people come in with great ambition, and give up after some time, seemingly without having made any progress. I believe a large part of the problem is that they tried to understand all of SBCL instead of just learning the bare minimum to get hacking, and that their goal was too big. I already wrote that SBCL is probably best approached bit by bit, with some guidance from people who’ve been there before, and I hope the projects we suggest can all lead to visible progress quickly, after a couple days or two weeks of work at most.
Still, before investing my time, I like to see the other person also give some of theirs to SBCL. This is why, as I wrote on the mailing list last week, I’m much more inclined to help someone who’s already built SBCL on their own and has submitted a patch that’s been committed or is being improved on the mailing list. I absolutely do not care what the patch is; it can be new code, a bugfix for a highly unlikely corner case, better documentation, or spelling and grammar corrections in comments. The bugs tagged as easy in our bugtracker may provide some inspiration. However trivial a patch might seem, it’s still a sign that someone is willing to put the work in to concretely make SBCL better, and I like that... it’s also a minimal test to make sure the person is able to work with our toolchain. (This isn’t SBCL policy for GSoC. It’s simply how I feel about these things.)
Again, I’m amazed by the number of people who wish to hack on SBCL this summer (as part of Google’s Summer of Code or otherwise). Because of that, I think it’s important to note that this is our first year, and so we’ll likely not have more than two or three spots. However, I always like seeing more contributors, and I hope anyone who’d like to contribute will always be guided, GSoC or not.
Finally, I’ll note that Google’s Summer of Code program was only a good excuse to write up our list of projects: they’re simply suggestions to incite programmers to see what they can do that is useful for SBCL and, most importantly, is interesting for them. Anyone should feel welcome to work on any of these projects, even if they’re not eligible or chosen for GSoC. They’re also only suggestions; if someone has their own idea, we can likely help them out just the same.
From: Software Engineer – Maps for Asha Smartphones (Scheme)
To quote:
The Nokia Asha range of smartphones is taking the world by storm. Sales of these feature-packed yet affordable devices have been steadily increasing in the world's major markets - so much so that we need to expand our HERE development team to provide all the location-based functionality and features our customers are demanding.
We're looking for an experienced, energetic and attentive software engineer to help us create leading-edge mapping and orientation experiences for the latest Asha phones. If you believe in our philosophy of 'doing good,' join us and help people around the world find their way and enjoy their lives even more using our maps and apps.
Your main role will be to specify, implement and ensure automation, stability and quality of software tests running on mobile devices on various platforms. And you'll be a valuable member of a crack team of front- and back-end developers, graphic designers and interaction designers from all over the world.
Nicely enough you'll be doing all of this work in one of the liveliest and most creative cities in the world, Berlin - and in a shiny new office building with cafeteria, gym, sauna, game room and lots of talented, friendly and creative people in it.
I'm a little bit behind my schedule of implementing NLTK examples in Lisp with no posts on topic in March. It doesn't mean that work on CL-NLP has stopped - I've just had an unexpected vacation and also worked on parts, related to writing programs for the excellent Natural Language Processing by Michael Collins Coursera course.
Today we'll start looking at Chapter 2, but we'll do it from the end, first exploring the topic of Wordnet.
Wordnet is a database of word meanings (senses) and word semantic relations (synonymy, hyponymy and various other ymy's). It is a really important freely available resource, so having easy access to it is very valuable. At the same time, getting Wordnet to work is somewhat involved technically. That's why I've decided to implement it's support in cl-nlp-contrib system, so that the basic cl-nlpfunctionality can be loaded without the additional hassle of having to ensure Wordnet (and other similar stuff) is loading.
The implementation of Wordnet interface in cl-nlp is incomplete in the sense, that it doesn't provide matching entities for all Wordnet tables and so doesn't support all the possible interactions with it out-of-the-box. Yet, it is sufficient to run all NLTK's examples and it is trivial to implement the rest as the need arises (I plan to do it later this year).
There are several other Wordnet interfaces for CL developed in the previous years, starting from as long ago as early nineties:
Non of them use my desired storage format (sqlite) and their overall support ranges from non-existing to little if any. Besides, one of the principles behind cl-nlp is to prefer uniform interface and ease of extensibility over performance and efficiency with the premise that a specialized efficient version can be easily implemented by restricting the more general one, while the opposite is often much harder. And adapting the existing packages to the desired interface didn't seem like a straightforward task, so I decided to implement Wordnet support from scratch.
Wordnet isn't a corpus, although NLTK categorizes it as such placing it in nltk.corpus module. It is a word database distributed in a custom format, as well as in binary format of several databases. I have picked sqlite3-based distribution as the easiest to work with — it's a single file of roughly 64 MB in size which can be downloaded from WNSQL project page. The default place where cl-nlp will search for Wordnet is the data/directory. If you call (download :wordnet), it will fetch a copy and place it there.
There are several options to work with sqlite databases in Lisp, and I have chosen CLSQL as the most mature and comprehensive system — it's a goto library for doing simple SQL stuff in CL. Besides the low-level interface it provides the basic ORM functionality and some other goodies.
Also, to work with sqlite from Lisp a C client library is needed. It can be obtained in various ways depending on your OS and distribution (on most Linuxes with apt-get or similar package managers). Yet there's another issue of making Lisp find the installed library. So for Linux I decided to ship the appropriate binaries in the project's lib/ dir. If you are not on Linux and CLSQL can't find sqlite native library, you need to manually call the following command before you load cl-nlp-contrib with quicklisp or ASDF:
(clsql:push-library-path "path to sqlite3 or libsqlite3 directory") Using CLSQL we can define classes for Wordnet entities, such as: Synset, Sense or Lexlink.
(clsql:def-view-class synset ()
((synsetid :initarg :synsetid :reader synset-id
:type integer :db-kind :key)
(pos :initarg :pos :reader synset-pos
:type char)
(lexdomainid :initarg :lexdomain
:type smallint :db-constraints (:not-null))
(definition :initarg :definition :reader synset-def
:type string)
(word :initarg :word :db-kind :virtual)
(sensenum :initarg :sensenum :db-kind :virtual))
(:base-table "synsets")) Notice the presence of virtual slots word and sensenum which are used to cache data from other tables in synset to display it like it is done in NLTK. They are populated lazily with slot-unbound method just like we've seen in Chapter 1.
All the DB entities are cached as well in the global cache, so that same requests don't produce different objects and Wordnet objects could be compared with eql.
There are some inconsistencies in Wordnet lexicon which have also migrated to NLTK interface (and then NLTK introduced some more). I've done a cleanup on that so some classes and slot accessors don't fully mimic the names of their DB counterparts or the NLTK's interface. For instance, in our dictionary word refers to a raw string and lemma — to its representation as a DB entity.
CLSQL also has special syntactic support for writing SQL queries in Lisp:
(clsql:select [item_id] [as [count [*]] 'num]
:from [table]
:where [is [null [sale_date]]]
:group-by [item_id]
:order-by '((num :desc))
:limit 1) Yet I've chosen to use another very simple custom solution for that, which I've kept in the back of my mind for a long time since I'd started working with CLSQL literal SQL syntax. Here's how we get all lemma object for a specific synset — they are connected through senses:
(query (select 'lemma
`(:where wordid
:in ,(select '(wordid :from senses)
`(:where synsetid := ,(synset-id synset)))))) For the sake of efficiency we don't open new DB connection on every such request, but use a combination of the following:
*default-database* that points to the current DB connection; it is used by all Wordnet interfacing functionsconnect-wordnet which is given an instance of sql-wordnet3 class (it has the usual default instance <wordnet>, but you can use any other instance which may connect to some othe Wordnet SQL DB like MySQL if needed)ensure-db function checks, if the connection is present and opens it otherwisewith-wordnet macro that implements a standard Lisp resource-management call-with-*pattern for the Wordnet connectionNow, let's run the examples from the book to see how they work in our interface.
First, connect to Wordnet:
WORDNET> (connect-wordnet <wordnet>)
#<CLSQL-SQLITE3:SQLITE3-DATABASE /cl-nlp/data/wordnet30.sqlite OPEN {1011D1FAF3}>
Now we can run the functions with the existing connection passed implicitly through the special variable. wn is a special convenience package which implements the logic of implicitly relying on the default <wordnet>. There are functions with the same names in wordnet package that take a wordnet instance as first argument, similar to how other parts of cl-nlp are organized.
WORDNET> (wn:synsets "motorcar")
(#<SYNSET auto.n.1 102958343 {10116B0003}>)
WORDNET> (synsets <wordnet> "motorcar")
(#<SYNSET auto.n.1 102958343 {10116B0003}>) The printing of synsets and other Wordnet objects is performed with custom print-object methods. Here's a print-object for synsetthat uses the built-in print-unreadable-object macro:
(defmethod print-object ((sample sample) stream)
(print-unreadable-object (sample stream :type t :identity t)
(with-slots (sample sampleid) sample
(format stream "~A ~A" sample sampleid)))) Let's proceed further.
WORDNET> (wn:words (wn:synset "car.n.1"))
("auto" "automobile" "car" "machine" "motorcar") This snippet is equivalent to the following NLTK code:
wn.synset('car.n.01').lemma_names Definitions and examples:
WORDNET> (synset-def (wn:synset "car.n.1"))
"a motor vehicle with four wheels; usually propelled by an internal combustion engine"
WORDNET> (wn:examples (wn:synset "car.n.1"))
("he needs a car to get to work") Next are lemmas.
WORDNET> (wn:lemmas (wn:synset "car.n.1"))
(#<LEMMA auto 9953 {100386BCC3}> #<LEMMA automobile 10063 {100386BCF3}>
#<LEMMA car 20722 {100386BC03}> #<LEMMA machine 80312 {100386BD23}>
#<LEMMA motorcar 86898 {1003250543}>) As I've written earlier, there's some confusion in Wordnet between words and lemmas, and, IMHO, it is propagated in NLTK. The term lemma only appears once in Wordnet DB as a column in words table. And synsets are not directly related to words — they are linked through senses table. NLTK calls a synset to word pairing a lemma, which only adds to the confusion. I decided to call entities of words table lemmas. Now, how do you implement the equivalent of
wn.lemma('car.n.01.automobile') We can do it like this:
WORDNET> (remove-if-not #`(string= "automobile" (lemma-word %))
(wn:lemmas (wn:synset "car.n.1")))
(#<LEMMA automobile 10063 {100386BCF3}>) But, actually, we need not a raw lemma object, but a sense object, because sense is that mapping of word to its meaning (defined by a synset), and it's a proper Wordnet entity for this:
WORDNET> (wn:sense "car~automobile.n.1")
#<SENSE car~auto.n.1 28261 {1011F226C3}> You can also get at the raw lemma by its DB id:
WORDNET> (wn:synset (wn:lemma 10063))
#<SYNSET auto.n.1 102958343 {100386BB33}> Notice, that synset here is named auto. This is different from NLTK's 'car', and the reason for this is that it's unclear from the Wordnet DB, what is the "primary" lemma for a synset, so I just use the word which appears first in the DB. Probably, NLTK also uses it, but has a different ordering — compare the next output with the book's one:
WORDNET> (dolist (synset (wn:synsets "car"))
(print (wn:words synset)))
("cable car" "car")
("auto" "automobile" "car" "machine" "motorcar")
("car" "railcar" "railroad car" "railway car")
("car" "elevator car")
("car" "gondola") Also I've chosen a different naming scheme for senses — word~synset — as it better reflects the semantic meaning of this concept.
There're two types of Wordnet relations: semantic ones between synsets and lexical ones between senses. All of the relations or links can be found in *link-types* parameter. There are 28 of them in Wordnet 3.0.
To get at any relation there's a generic function related:
WORDNET> (defvar *motorcar* (wn:synset "car.n.1"))
WORDNET> (defvar *types-of-motorcar* (wn:related *motorcar* :hyponym))
WORDNET> (nth 0 *types-of-motorcar*)
#<SYNSET ambulance.n.1 102701002 {1011E35063}> In our case "ambulance" is the first motorcar type.
WORDNET> (sort (mapcan #'wn:words *types-of-motorcar*) 'string<)
("ambulance" "beach waggon" "beach wagon" "bus" "cab" "compact" "compact car"
"convertible" "coupe" "cruiser" "electric" "electric automobile"
"electric car" "estate car" "gas guzzler" "hack" "hardtop" "hatchback" "heap"
"horseless carriage" "hot rod" "hot-rod" "jalopy" "jeep" "landrover" "limo"
"limousine" "loaner" "minicar" "minivan" "model t" "pace car" "patrol car"
"phaeton" "police car" "police cruiser" "prowl car" "race car" "racer"
"racing car" "roadster" "runabout" "s.u.v." "saloon" "secondhand car" "sedan"
"sport car" "sport utility" "sport utility vehicle" "sports car" "squad car"
"stanley steamer" "station waggon" "station wagon" "stock car" "subcompact"
"subcompact car" "suv" "taxi" "taxicab" "tourer" "touring car" "two-seater"
"used-car" "waggon" "wagon") Hypernyms are more general entities in synset hierarchy:
WORDNET> (wn:related *motorcar* :hypernym)
(#<SYNSET automotive vehicle.n.1 103791235 {10125702C3}>) Let's trace them up to root entity:
WORDNET> (defvar *paths* (wn:hypernym-paths *motorcar*))
WORDNET> (length *paths*)
2
WORDNET> (mapcar #'synset-name (nth 0 *paths*))
("entity.n.1" "physical entity.n.1" "object.n.1" "unit.n.6" "artefact.n.1"
"instrumentality.n.3" "conveyance.n.3" "vehicle.n.1" "wheeled vehicle.n.1"
"self-propelled vehicle.n.1" "automotive vehicle.n.1" "auto.n.1")
WORDNET> (mapcar #'synset-name (nth 1 *paths*))
("entity.n.1" "physical entity.n.1" "object.n.1" "unit.n.6" "artefact.n.1"
"instrumentality.n.3" "container.n.1" "wheeled vehicle.n.1"
"self-propelled vehicle.n.1" "automotive vehicle.n.1" "auto.n.1") And here are just the root hypernyms:
WORDNET> (remove-duplicates (mapcar #'car *paths*))
(#<SYNSET entity.n.1 100001740 {101D016453}>) Now, if we look at part-meronym, substance-meronym and member-holonymrelations and try to get them with related, we'll get an empty set.
WORDNET> (wn:related (wn:synset "tree.n.1") :substance-meronym)
NIL That's because the relation is actually one-way: a burl is part of a tree, but not vice versa. For this case there's a :reverse key to related:
WORDNET> (wn:related (wn:synset "tree.n.1") :part-meronym :reverse t)
(#<SYNSET stump.n.1 113111504 {10114220B3}>
#<SYNSET crown.n.7 113128003 {1011423B73}>
#<SYNSET limb.n.2 113163803 {1011425633}>
#<SYNSET bole.n.2 113165815 {10114270F3}>
#<SYNSET burl.n.2 113166044 {1011428BE3}>)
WORDNET> (wn:related (wn:synset "tree.n.1") :substance-meronym :reverse t)
(#<SYNSET sapwood.n.1 113097536 {10113E8BE3}>
#<SYNSET duramen.n.1 113097752 {10113EA6A3}>)
WORDNET> (wn:related (wn:synset "tree.n.1") :member-holonym :reverse t)
(#<SYNSET forest.n.1 108438533 {10115A81C3}>) While the tree is member-meronym of forest (i.e. NLTK has it slightly the opposite way):
WORDNET> (wn:related (wn:synset "tree.n.1") :member-meronym)
(#<SYNSET forest.n.1 108438533 {10115A81C3}>) And here's the mint example:
WORDNET> (dolist (s (wn:synsets "mint" :pos #\n))
(format t "~A: ~A~%" (synset-name s) (synset-def s)))
mint.n.6: a plant where money is coined by authority of the government
mint.n.5: a candy that is flavored with a mint oil
mint.n.4: the leaves of a mint plant used fresh or candied
mint.n.3: any member of the mint family of plants
mint.n.2: any north temperate plant of the genus Mentha with aromatic leaves and small mauve flowers
batch.n.2: (often followed by `of') a large number or amount or extent
WORDNET> (wn:related (wn:synset "mint.n.4") :part-holonym :reverse t)
(#<SYNSET mint.n.2 112855042 {1011CF3CF3}>)
WORDNET> (wn:related (wn:synset "mint.n.4") :substance-holonym :reverse t)
(#<SYNSET mint.n.5 107606278 {1011CEECA3}>) Verbs:
WORDNET> (wn:related (wn:synset "walk.v.1") :entail)
(#<SYNSET step.v.1 201928838 {1004139FF3}>)
WORDNET> (wn:related (wn:synset "eat.v.1") :entail)
(#<SYNSET chew.v.1 201201089 {10041BDC33}>
#<SYNSET get down.v.4 201201856 {10041BF6F3}>)
WORDNET> (wn:related (wn:synset "tease.v.3") :entail)
(#<SYNSET arouse.v.7 201762283 {10042495E3}>
#<SYNSET disappoint.v.1 201798936 {100424B0A3}>) And, finally, here's antonomy, which is a lexical, not semantic, relationship, and it takes place between senses:
WORDNET> (wn:related (wn:sense "supply~supply.n.2") :antonym)
(#<SENSE demand~demand.n.2 48880 {1003637C03}>)
WORDNET> (wn:related (wn:sense "rush~rush.v.1") :antonym)
(#<SENSE linger~dawdle.v.4 107873 {1010780DE3}>)
WORDNET> (wn:related (wn:sense "horizontal~horizontal.a.1") :antonym)
(#<SENSE inclined~inclined.a.2 94496 {1010A5E653}>)
WORDNET> (wn:related (wn:sense "staccato~staccato.r.1") :antonym)
(#<SENSE legato~legato.r.1 105844 {1010C18AF3}>) Let's now use the lowest-common-hypernyms function abbreviated to lch to see the semantic grouping of marine animals:
WORDNET> (defvar *right* (wn:synset "right whale.n.1"))
WORDNET> (defvar *orca* (wn:synset "orca.n.1"))
WORDNET> (defvar *minke* (wn:synset "minke whale.n.1"))
WORDNET> (defvar *tortoise* (wn:synset "tortoise.n.1"))
WORDNET> (defvar *novel* (wn:synset "novel.n.1"))
WORDNET> (wn:lch *right* *minke*)
(#<SYNSET baleen whale.n.1 102063224 {102A2E0003}>)
2
1
WORDNET> (wn:lch *right* *orca*)
(#<SYNSET whale.n.2 102062744 {102A373323}>)
3
2
WORDNET> (wn:lch *right* *tortoise*)
(#<SYNSET craniate.n.1 101471682 {102A3734A3}>)
7
5
WORDNET> (wn:lch *right* *novel*)
(#<SYNSET entity.n.1 100001740 {102A373653}>)
15
7 The second and third return values of lch come handy here, as they show the depth of the paths to the common ancestor and give some immediate data for estimating semantic relatedness, that we're going to explore more now.
WORDNET> (wn:min-depth (wn:synset "baleen whale.n.1"))
14
WORDNET> (wn:min-depth (wn:synset "whale.n.2"))
13
WORDNET> (wn:min-depth (wn:synset "vertebrate.n.1"))
8
WORDNET> (wn:min-depth (wn:synset "entity.n.1"))
0 By the way, there's another whale, that is much closer to the root:
WORDNET> (wn:min-depth (wn:synset "whale.n.1"))
5 Guess what it is?
WORDNET> (synset-def (wn:synset "whale.n.1"))
"a very large person; impressive in size or qualities" Now, let's calculate different similarity measures:
WORDNET> (wn:path-similarity *right* *minke*)
1/4
WORDNET> (wn:path-similarity *right* *orca*)
1/6
WORDNET> (wn:path-similarity *right* *tortoise*)
1/13
WORDNET> (wn:path-similarity *right* *novel*)
1/23 The algorithm here is very simple — it just reuses the secondary return values of lch:
(defmethod path-similarity ((wordnet sql-wordnet3)
(synset1 synset) (synset2 synset))
(mv-bind (_ l1 l2) (lowest-common-hypernyms wordnet synset1 synset2)
(declare (ignore _))
(when l1
(/ 1 (+ l1 l2 1))))) There are many more Wordnet similarity measures in NLTK, and they are also implemented in cl-nlp. For example, lch-similarity the name of which luckily coincides with lch function:
WORDNET> (wn:lch-similarity (wn:synset "dog.n.1") (wn:synset "cat.n.1"))
Calculating taxonomy depth for #<SYNSET entity.n.1 100001740 {102A9F7A83}>
2.0281482 This measure depends on performing expensive computation of taxonomy depth which is done in a lazy manner in the max-depth function:
(defmethod max-depth ((wordnet sql-wordnet3) (synset synset))
(reduce #'max (mapcar #`(or (get# % *taxonomy-depths*)
(progn
(format *debug-io*
"Calculating taxonomy depth for ~A~%" %)
(set# % *taxonomy-depths*
(calc-max-depth wordnet %))))
(mapcar #'car (hypernym-paths wordnet synset))))) Other similarity measures include wup-similarity, res-similarity, lin-similarity and others. You can read about them in the NLTK Wordnet manual. Most of them depend on the words' information content database which is calculated for different corpora (e.g. BNC and Penn Treebank) and is not part of Wordnet. There's a WordNet::Similarity projectthat distributes pre-calculated databases for some popular corpora.
Lin similarity proposes a similar formula to LCH similarity for calculating the score, but only uses information contents instead of taxonomy depths as the arguments to it:
WORDNET> (wn:lin-similarity (wn:synset "dog.n.1") (wn:synset "cat.n.1"))
0.87035906 To calculate it we first need to fetch the information content corpus with (download :wordnet-ic) and load the data for SemCor:
WORDNET> (setf *lc* (load-ic :filename-pattern "semcor"))
#<HASH-TABLE :TEST EQUAL :COUNT 213482 {100D7B80D3}> The default :filename-pattern will load the combined information content scores for the following 5 corpora: BNC, Brown, SemCor and its raw version, all of Shakespeare's works and Penn Treebank. We'll talk more about various corpora in the next part of the series...
Well, this is all as far as Wordnet concerned for now. There's another useful Wordnet functionality — word morpher, but we'll return to it when we'll be talking about word forming, stemming and lemmatization. Meanwhile, here's a good schema showing the full potential of Wordnet:

From this posting: Artificial Intelligence Intern Job
SAIC's Sensors and Phenomenology Operation has an internship position at its Autonomy and Analytics Division in Arlington, Virginia for the summer of 2013.
Under the general supervision of a staff researcher, the Intern will perform a wide variety of duties applicable to their field of study supporting technical projects often in a team environment. Interns may be asked to participate in activities supporting multiple projects or work on a small task that contributes to a large project of great importance. Interns will have the opportunity throughout the summer to participate in additional activities to include multiple social functions, SAIC Interns' Outing to a Nats game, and numerous coaching sessions on life after college.
The goal for this internship is to explore artificial intelligence (AI) techniques for diagram and chart analysis in real-world domains. This research may also look at related areas, such as diagrammatic and visual reasoning, document retrieval from large corpora or from the web, and diagram generation.
Systems will be built using Clojure and/or Scheme, so experience with one of these languages or with Lisp is essential. Although systems will not be built with Java, familiarity with existing Java libraries is also useful.
Previously I wrote that lparallel 2.3.2 failed to compile for me under SBCL 1.0.50. James was kind enough to contact me by email to say that he has fixed that. I'm guessing we'll see that in a future Quicklisp update.
In the near future, I plan to write about my grokking and use of lparallel in a commercial project.
lparallel 2.3.2 failed to compile with my SBCL 1.0.50 installation:
CL-USER> (ql:quickload "lparallel")
To load "lparallel":
Load 1 ASDF system:
lparallel
; Loading "lparallel"
[package alexandria.0.dev]........................
[package bordeaux-threads]........................
[package lparallel.util]..........................
[package lparallel.thread-util]...................
[package lparallel.raw-queue].....................
[package lparallel.cons-queue]....................
[package lparallel.vector-queue]..................
[package lparallel.queue].........................
[package lparallel.biased-queue]..................
[package lparallel.counter].......................
[package lparallel.spin-queue]....................
[package lparallel.kernel]........................
[package lparallel.kernel-util]...................
[package lparallel.ptree].........................
[package lparallel.promise].......................
[package lparallel.defpun]........................
[package lparallel.cognate].......................
[package lparallel].........
; file: /home/mjf/quicklisp/dists/quicklisp/software/lparallel-20130312-git/src/spin-queue/cas-spin-queue.lisp
; in: DEFUN PUSH-SPIN-QUEUE
; (LPARALLEL.SPIN-QUEUE::CAS
; (LPARALLEL.SPIN-QUEUE::NODE-CDR
; (LPARALLEL.SPIN-QUEUE::SPIN-QUEUE-TAIL LPARALLEL.SPIN-QUEUE::QUEUE))
; NIL LPARALLEL.SPIN-QUEUE::NEW)
; --> EQ
; ==>
; (COMPARE-AND-SWAP
; (LPARALLEL.SPIN-QUEUE::NODE-CDR
; (LPARALLEL.SPIN-QUEUE::SPIN-QUEUE-TAIL LPARALLEL.SPIN-QUEUE::QUEUE))
; NIL LPARALLEL.SPIN-QUEUE::NEW)
;
; caught ERROR:
; during macroexpansion of (SB-EXT:COMPARE-AND-SWAP (NODE-CDR #) NIL ...). Use
; *BREAK-ON-SIGNALS* to intercept:
;
; Invalid first argument to COMPARE-AND-SWAP: (NODE-CDR (SPIN-QUEUE-TAIL QUEUE)).
;
; compilation unit aborted
; caught 1 fatal ERROR condition
; caught 1 ERROR condition
; Evaluation aborted on #.
We deploy applications on more recent versions of SBCL, so there was no reason to not upgrade. However, warned by Christophe Rhodes’s post, I chose to avoid version 1.1.6. I downloaded 1.1.5 and tweaked my copy of the Slackbuilds SBCL script to build it.
Following the upgrade, I was able to load lparallel and run its tests successfully.
svref was sped up, but unfortunately without sufficient generality, so code doing svref on a symbol-macro fails to compile.svref on a symbol-macro. (If it does, a quick workaround is to replace the svrefs with arefs). As well as the public service announcement, though, I have a question for the wider community: how can we encourage testing the code and clearly communicating the results just before the release happens rather than just after? It's somewhat frustrating to have a week-long code freeze, then bug reports of serious issues a few hours after the release is made... and unfortunately answers of the form "just test everything yourself during the freeze period" aren't currently practical. Maybe someday.
For older items, see the Planet Lisp Archives.
Last updated: 2013-05-13 17:33