Planet Lisp

Lispers.deBerlin Lispers - Talk Announcement - Thursday, 24th October 2019

· 2 days ago

Georg Kinnemann will give a talk about LISP and AI with the title "Die Entwicklung eines Spezial-Computers für Künstliche Intelligenz in der DDR" at Deutsche Technikmuseum Berlin.

5.30pm, 24th October

Deutsches Technikmuseum, Trebbiner Straße 9, Vortragssaal, 4. OG

[1] https://sdtb.de/technikmuseum/veranstaltungen/ein-spezial-computer-fuer-kuenstliche-intelligenz-in-der-ddr/

[2] https://lispers.de/docs/ki-spezialcomputer-dtb-20191024.txt

Quicklisp newsOctober 2019 Quicklisp dist update now available

· 9 days ago
New projects:
  • 3bz — deflate decompressor — MIT
  • bp — Bitcoin Protocol components in Common Lisp — MIT
  • cardiogram — Simple test framework — MIT
  • cesdi — Provides a compute-effective-slot-definition-initargs generic function that allows for more ergonomic initialization of effective slot definition objects. — Unlicense
  • chameleon — Configuration management facilities for Common Lisp with multiple profile support. — MIT
  • ci-utils — A set of tools for using CI platforms — MIT
  • cl-clsparse — Common Lisp bindings for clSPARSE — Apache License, Version 2.0
  • cl-ecma-48 — This package exports a macro for defining ECMA-48 control functions and the 162 functions defined by this. — AGPLv3
  • cl-flat-tree — A flat-tree implementation in Common Lisp. — MIT
  • cl-kraken — A Common Lisp API client for the Kraken exchange — MIT
  • cl-naive-store — This is a naive, persisted, in memory (lazy loading) data store for Common Lisp. — MIT
  • cl-shlex — Lexical analyzer for simple shell-like syntax. — MIT
  • cl-smt-lib — SMT object supporting SMT-LIB communication over input and output streams — BSD-3-Clause
  • cl-wadler-pprint — An implementation of A Prettier Printer in Common Lisp. — Apache-2.0/MIT
  • classowary — An implementation of the Cassowary linear constraint solver toolkit — zlib
  • clsql-local-time — Allows the use of local-time:timestamp objects in CLSQL models and queries — MIT license
  • datamuse — Common Lisp library for accessing the Datamuse word-finding API — MIT
  • date-calc — Package for simple date calculation — GPL or Artistic
  • font-discovery — Find system font files matching a font spec. — zlib
  • horse-html — Parenscript/HTML done better — MIT
  • hunchentoot-multi-acceptor — Multiple domain support under single hunchentoot acceptor — Apache License, Version 2.0
  • lila — a cleaner language based on Common Lisp — MIT
  • linear-programming — A library for solving linear programming problems — MIT
  • lsx — Embeddable HTML templating engine with JSX-like syntax — BSD 2-Clause
  • markup — markup provides a reader-macro to read HTML/XML tags inside of Common Lisp code — Apache License, Version 2.0
  • num-utils — Numerical utilities for Common Lisp — Boost Software License - Version 1.0
  • orizuru-orm — An ORM for Common Lisp and PostgreSQL. — GPLv3
  • paren6 — Paren6 is a set of ES6 macros for Parenscript — Apache License, version 2.0
  • pngload-fast — A reader for the PNG image format. — MIT
  • polisher — Infix notation to S-expression translator — MIT
  • select — DSL for array slices. — Boost
  • simple-parallel-tasks — Evaluate forms in parallel — GPL-3
  • stripe — A client for the Stripe payment API. — MIT
  • trivial-extensible-sequences — Portability library for the extensible sequences protocol. — zlib
  • trivial-package-local-nicknames — Portability library for package-local nicknames — Public domain
  • uax-14 — Implementation of the Unicode Standards Annex #14's line breaking algorithm — zlib
  • uax-9 — Implementation of the Unicode Standards Annex #9's bidirectional text algorithm — zlib
  • with-output-to-stream — Provides a simple way of directing output to a stream according to the concise and intuitive semantics of FORMAT's stream argument. — Public Domain
  • ziz — An ad hoc Quicklisp distribution. — MIT
Updated projects: 3d-matricesalso-alsaanaphoraantikaprilarchitecture.service-providerasdf-encodingsasteroidsatomicsbikebinary-iobinfixbknr-datastoreblack-tiebodge-chipmunkbodge-glfwbodge-nanovgbodge-nuklearbodge-odebodge-openalbodge-sndfilecavemanceplcl+sslcl-algebraic-data-typecl-amqpcl-change-casecl-collidercl-cookiecl-coverallscl-cudacl-dbicl-digikar-utilitiescl-fadcl-fondcl-freetype2cl-geocodecl-hamcrestcl-ipfs-api2cl-kanrencl-ledgercl-lexercl-lzlibcl-mangocl-marklesscl-mssqlcl-openstack-clientcl-patternscl-pdfcl-permutationcl-pythoncl-qrencodecl-rdkafkacl-readlinecl-satcl-sat.glucosecl-sat.minisatcl-sdl2cl-sqlitecl-strcl-tiledcl-yesqlclackclack-errorscloser-mopclxcoleslawcommand-line-argumentscommon-lisp-jupyterconcrete-syntax-treecroatoandata-lensdatum-commentsdefinitionsdeploydexadordrakmadufyeasy-routeseclectorecoenvyeruditeesrapesrap-pegfare-scriptsfast-httpfast-websocketfemlispfiascofloat-featuresflowfolio2fxmlgendlglsl-specglsl-toolkitgolden-utilsgraphhelambdaphermetichttp-bodyironcladjsonrpcjsownkenzolacklastfmlisp-binaryliterate-lisplog4cllucernemagiclmaidenmatlispmcclimmitoninevehninglenodguioriginoverlordparachuteparseparser.common-rulespatchworkpetalisppiggyback-parameterspjlinkpngloadportableaservepostmodernproc-parseprometheus.clpy4clqlotquilcquriqvmrandomratifyrereplicrestasroverpcqrtg-mathrutilssc-extensionsscalplselserapeumshadowshould-testsimplified-typesslyspinneretstaplestumpwmswank-clienttriviatrivial-continuationtrivial-hashtable-serializetrivial-indenttrivial-json-codectrivial-left-padtrivial-monitored-threadtrivial-object-locktrivial-pooled-databasetrivial-timertrivial-utilitiestrivial-variable-bindingstype-iumbrauri-templateutilities.print-itemsvarjoverbosevernacularwoozs3.

To get this update, use (ql:update-dist "quicklisp").

If you get a "badly sized local archive" error during the update, you can also safely use the DELETE-AND-RETRY restart to proceed. This error was introduced by a metadata problem on my end. Sorry about that!

Didier VernaTFM 1.0 "Artificial Uncial" is released.

· 14 days ago

I'm happy to announce the first stable version of TFM, a TeX Font Metrics parser for Common Lisp. TFM is the standard font description format used by TeX. The TFM library parses and decodes TFM files into an abstract data structure, providing easy access to the corresponding font information. Get it here.

Lispers.deLisp-Meetup in Hamburg on Monday, 7th October 2019

· 16 days ago

We meet at Ristorante Opera, Dammtorstraße 7, Hamburg, starting around 19:00 CEST on 7th October 2019.

This is an informal gathering of Lispers of all experience levels.

Paul KhuongA Couple of (Probabilistic) Worst-case Bounds for Robin Hood Linear Probing

· 23 days ago

I like to think of Robin Hood hash tables with linear probing as arrays sorted on uniformly distributed keys, with gaps. That makes it clearer that we can use these tables to implement algorithms based on merging sorted streams in bulk, as well as ones that rely on fast point lookups. A key question with randomised data structures is how badly we can expect them to perform.

A bound on the length of contiguous runs of entries

There’s a lot of work on the expected time complexity of operations on linear probing Robin Hood hash tables. Alfredo Viola, along with a few collaborators, has long been exploring the distribution of displacements (i.e., search times) for random elements. The packed memory array angle has also been around for a while.1

I’m a bit wary of the “random element” aspect of the linear probing bounds: while I’m comfortable with an expectation over the hash function (i.e., over the uniformly distributed hash values), a program could repeatedly ask for the same key, and consistently experience worse-than-expected performance. I’m more interested in bounding the worst-case displacement (the distance between the ideal location for an element, and where it is actually located) across all values in a randomly generated2 Robin Hood table, with high enough probability. That probability doesn’t have to be extremely high: \(p = 0.9\) or even \(p = 0.5\) is good enough, as long as we can either rehash with an independent hash function, or the probability of failure drops exponentially enough as the displacement leeway grows.

The people who study hashing with buckets, or hashing as load balancing, seem more interested in these probable worst-case bounds: as soon as one bucket overflows, it’s game over for that hash table! In that context, we wish to determine how much headroom we must reserve in each bucket, on top of the expected occupancy, in order to make sure failures are rare enough. That’s a balls into bins problem, where the \(m\) balls are entries in the hash table, and the \(n\) bins its hash buckets.

Raab and Steger’s Simple and Tight Analysis of the “balls into bins” problem shows that the case where the average occupancy grows with \((\log m)\sp{k}\) and \(k > 1\) has potential, when it comes to worst-case bounds that shrink quickly enough: we only need headroom that grows with \(\sqrt{\log\sp{k+1} m} = \log\sp{(k+1)/2} m\), slightly more than the square root of the average occupancy.

The only issue is that the balls-into-bins analysis is asymptotic, and, more importantly, doesn’t apply at all to linear probing!

One could propose a form of packed memory array, where the sorted set is subdivided in chunks such that the expected load per chunk is in \(\Theta(\log\sp{k} m)\), and the size of each chunk multiplicatively larger (more than \(\log\sp{(k+1)/2}m\))…

Can we instead derive similar bounds with regular linear probing? It turns out that Raab and Steger’s bounds are indeed simple: they find the probability of overflow for one bucket, and derive a union bound for the probability of overflow for any bucket by multiplying the single-bucket failure probability by the number of buckets. Moreover, the single-bucket case itself is a Binomial confidence interval.

We can use the same approach for linear probing; I don’t expect a tight result, but it might be useful.

Let’s say we want to determine how unlikely it is to observe a clump of \(\log\sp{k}n\) entries, where \(n\) is the capacity of the hash table. We can bound the probability of observing such a clump starting at index 0 in the backing array, and multiply by the size of the array for our union bound (the clump could start anywhere in the array).

Given density \(d = m / n\), where \(m\) is the number of entries and \(n\) is the size of the array that backs the hash table, the probability that any given element falls in a range of size \(\log\sp{k}n\) is \(p = d/n \log\sp{k}n\). The number of entries in such a range follows a Binomial distribution \(B(dn, p)\), with expected value \(d \log\sp{k}n\). We want to determine the maximum density \(d\) such that \(\mathrm{Pr}[B(dn, p) > \log\sp{k}n] < \frac{\alpha}{dn}\), where \(\alpha\) is our overall failure rate. If rehashing is acceptable, we can let \(\alpha = 0.5\), and expect to find a suitably uniform hash function after half a rehash on average.

We know we want \(k > 1\) for the tail to shrink rapidly enough as \(n\) grows, but even \(\log\sp{2}n\) doesn’t shrink very rapidly. After some trial and error, I settled on a chunk size \(s(n) = 5 \log\sb{2}\sp{3/2} n\). That’s not great for small or medium sized tables (e.g., \(s(1024) = 158.1\)), but grows slowly, and reflects extreme worst cases; in practice, we can expect the worst case for any table to be more reasonable.

Assuming we have a quantile function for the Binomial distribution, we can find the occupancy of our chunk, at \(q = 1 - \frac{\alpha}{n}\). The occupancy is a monotonic function of the density, so we can use, e.g., bisection search to find the maximum density such that the probability that we saturate our chunk is \(\frac{\alpha}{n}\), and thus the probability that any continuous run of entries has size at least \(s(n) = 5 \log\sb{2}\sp{3/2} n\) is less than \(\alpha\).

For \(\alpha = 0.5\), the plot of densities looks like the following.

This curve roughly matches the shape of some my older purely numerical experiments with Robin Hood hashing. When the table is small (less than \(\approx 1000\)), \(\log n\) is a large fraction of \(n\), so the probability of finding a run of size \(s(n) = 5 \log\sb{2}\sp{3/2} n\) is low. When the table is much larger, the asymptotic result kicks in, and the probabiliy slowly shrinks. However, even around the worst case \(n \approx 4500\), we can exceed \(77\%\) density and only observe a run of length \(s(n)\) half the time.

If we really don’t want to rehash, we can let \(\alpha = 10\sp{-10}\), which compresses the curve and shifts it down: the minimum value is now slightly above \(50\%\) density, and we can clearly see the growth in permissible density as the size of the table grows.

In practice, we can dynamically compute the worst-case displacement, which is always less than the longest run (i.e., less than \(s(n) = 5 \log\sb{2}\sp{3/2} n\)). However, having non-asymptotic bounds lets us write size-specialised code and know that its assumptions are likely to be satisfied in real life.

Bounding buffer sizes for operations on sorted hashed streams

I mentioned at the beginning of this post that we can also manipulate Robin Hood hash tables as sorted sets, where the sort keys are uniformly distributed hash values.

Let’s say we wished to merge the immutable source table S into the larger destination table D in-place, without copying all of D. For example, from

S = [2, 3, 4]
D = [1, 6, 7, 9, 10];

we want the merged result

D' = [1, 2, 3, 4, 6, 7, 9, 10].

The issue here is that, even with gaps, we might have to overwrite elements of D, and buffer them in some scratch space until we get to their final position. In this case, all three elements of S must be inserted between the first and second elements of D, so we could need to buffer D[1:4].

How large of a merge buffer should we reasonably plan for?

In general, we might have to buffer as many elements as there are in the smaller table of S and D. However, we’re working with hash values, so we can expect them to be distributed uniformly. That should give us some grip on the problem.

We can do even better and only assume that both sorted sets were sampled from the same underlying distribution. The key idea is that the rank of an element in S is equal to the value of S’s empirical distribution function for that element, multiplied by the size of S (similarly for D).

The amount of buffering we might need is simply a measure of the worst-case difference between the two empirical DFs: the more S get ahead of D, the more we need to buffer values of D before overwriting them (if we’re very unlucky, we might need a buffer the same size as S). That’s the two-sample Kolmogorov-Smirnov statistic, and we have simple bounds for that distance.

With probability \(1 - \alpha\), we’ll consume from S and D at the same rate \(\pm \sqrt{-\frac{(|S| + |D|) \ln \alpha}{2 |S| |D|}}\). We can let \(\alpha = 10\sp{-10}\) and pre-allocate a buffer of size

\[|S| \sqrt{-\frac{(|S| + |D|) \ln \alpha}{2 |S| |D|}} < \sqrt{\frac{23.03 |S| (|S| + |D|)}{2 |D|}}.\]

In the worst case, \(|S| = |D|\), and we can preallocate a buffer of size \(\sqrt{23.03 |D|} < 4.8 \sqrt{|D|}\) and only need to grow the buffer every ten billion (\(\alpha\sp{-1}\)) merge.

The same bound applies in a stream processing setting; I assume this is closer to what Frank had in mind when he brought up this question.

Let’s assume a “push” dataflow model, where we still work on sorted sets of uniformly distributed hash values (and the data tuples associated with them), but now in streams that generate values every tick. The buffer size problem now sounds as follows. We wish to implement a sorted merge operator for two input streams that generate one value every tick, and we can’t tell our sources to cease producing values; how much buffer space might we need in order to merge them correctly?

Again, we can go back to the Kolmogorov-Smirnov statistic. In this case however, we could buffer each stream independently, so we’re looking for critical values for the one-sided one-sample Kolmogorov-Smirnov test (how much one stream might get ahead of the hypothetical exactly uniform stream). We have recent (1990) simple and tight bounds for this case as well.

The critical values for the one-sided case are stronger than the two-sided two-sample critical values we used earlier: given an overflow probability of \(1 - \alpha\), we need to buffer at most \(\sqrt{-\frac{n \ln \alpha}{2}},\) elements. For \(\alpha = 10\sp{-20}\) that’s less than \(4.8 \sqrt{n}\).3 This square root scaling is pretty good news in practice: shrinking \(n\) to \(\sqrt{n}\) tends to correspond to going down a rung or two in the storage hierarchy. For example, \(10\sp{15}\) elements is clearly in the range of distributed storage; however, such a humongous stream calls for a buffer of fewer than \(1.5 \cdot 10\sp{8}\) elements, which, at a couple gigabytes, should fit in RAM on one large machine. Similarly, \(10\sp{10}\) elements might fill the RAM on one machine, but the corresponding buffer of less than half a million elements could fit in L3 cache, while one million elements could fill the L3, and 4800 elements fit in L1 or L2.

What I find neat about this (probabilistic) bound on the buffer size is its independence from the size of the other inputs to the merge operator. We can have a shared \(\Theta(\sqrt{n})\)-size buffer in front of each stream, and do all our operations without worrying about getting stuck (unless we’re extremely unlucky, in which case we can grow the buffer a bit and resume or restart the computation).

Probably of more theoretical interest is the fact that these bounds do not assume a uniform distribution, only that all the input streams are identically and independently sampled from the same underlying distribution. That’s the beauty of working in terms of the (inverse) distribution functions.

I don’t think there’s anything deeper

That’s it. Two cute tricks that use well-understood statistical distributions in hashed data structure and algorithm design. I doubt there’s anything to generalise from either bounding approach.

However, I definitely believe they’re useful in practice. I like knowing that I can expect the maximum displacement for a table of \(n\) elements with Robin Hood linear probing to be less than \(5 \log\sb{2}^{3/2} n\), because that lets me select an appropriate option for each table, as a function of that table’s maximum displacement, while knowing the range of displacements I might have to handle. Having a strong bound on how much I might have to buffer for stream join operators feels even more useful: I can pre-allocate a single buffer per stream and not think about efficiently growing the buffer, or signaling that a consumer is falling behind. The probability that I’ll need a larger buffer is so low that I just need to handle it, however inefficiently. In a replicated system, where each node picks an independent hash function, I would even consider crashing when the buffer is too small!


  1. I swear the Waterloo theme isn’t some CanCon thing.

  2. With consistent tie-breaking, the layout of entries in a Robin hood hash table is a function of the set of entries, independently of the sequence of add and remove operations.

  3. Assuming we consume ten streams per nanosecond, we expect to experience underbuffering once every 316 years.

Vsevolod DyomkinProgramming Algorithms: Trees

· 25 days ago


Couldn't resist adding this xkcd

Balancing a binary tree is the infamous interview problem that has all that folklore and debate associated with it. To tell you the truth, like the other 99% of programmers, I never had to perform this task for some work-related project. And not even due to the existence of ready-made libraries, but because self-balancing binary trees are, actually, pretty rarely used. But trees, in general, are ubiquitous even if you may not recognize their presence. The source code we operate with, at some stage of its life, is represented as a tree (a popular term here is Abstract Syntax Tree or AST, but the abstract variant is not the only one the compilers process). The directory structure of the file system is the tree. The object-oriented class hierarchy is likewise. And so on. So, returning to interview questions, trees indeed are a good area as they allow to cover a number of basic points: linked data structures, recursion, complexity. But there's a much better task, which I have encountered a lot in practice and also used quite successfully in the interview process: breadth-first tree traversal. We'll talk about it a bit later.

Similar to how hash-tables can be thought of as more sophisticated arrays (they are sometimes even called "associative arrays"), trees may be considered an expansion of linked lists. Although technically, a few specific trees are implemented not as a linked data structure but are based on arrays, the majority of trees are linked. Like hash-tables, some trees also allow for efficient access to the element by key, representing an alternative key-value implementation option.

Basically, a tree is a recursive data structure that consists of nodes. Each node may have zero or more children. If the node doesn't have a parent, it is called the root of the tree. And the constraint on trees is that the root is always single. Graphs may be considered a generalization of trees that don't impose this constraint, and we'll discuss them in a separate chapter. In graph terms, a tree is an acyclic directed single-component graph. Directed means that there's a one-way parent-child relation. And acyclic means that a child can't have a connection to the parent neither directly, nor through some other nodes (in the opposite case, what will be the parent and what — the child?) The recursive nature of trees manifests in the fact that if we extract an arbitrary node of the tree with all of its descendants, the resulting part will remain a tree. We can call it a subtree. Besides parent-child or, more generally, ancestor-descendant "vertical" relationships that apply to all the nodes in the tree, we can also talk about horizontal siblings — the set of nodes that have the same parent/ancestor.

Another important tree concept is the distinction between terminal (leaf) and nonterminal (branch) nodes. Leaf nodes don't have any children. In some trees, the data is stored only in the leaves with branch nodes serving to structure the tree in a certain manner. In other trees, the data is stored in all nodes without any distinction.

Implementation Variants

As we said, the default tree implementation is a linked structure. A linked list may be considered a degenerate tree with all nodes having a single child. A tree node may have more than one child, and so, in a linked representation, each tree root or subroot is the origin of a number of linked lists (sometimes, they are called "paths").

Tree:  a
/ \
b c
/ \ \
d e f

Lists:
a -> b -> d
a -> b -> e
a -> c -> f
b -> d
b -> e
c -> f

So, a simple linked tree implementation will look a lot like a linked list one:

(defstruct (tree-node (:conc-name nil))
key
children) ; instead of linked list's next

CL-USER> (with ((f (make-tree-node :key "f"))
(e (make-tree-node :key "e"))
(d (make-tree-node :key "d"))
(c (make-tree-node :key "c" :children (list f)))
(b (make-tree-node :key "b" :children (list d e))))
(make-tree-node :key "a"
:children (list b c)))
#S(TREE-NODE
:KEY "a"
:CHILDREN (#S(TREE-NODE
:KEY "b"
:CHILDREN (#S(TREE-NODE :KEY "d" :CHILDREN NIL)
#S(TREE-NODE :KEY "e" :CHILDREN NIL)))
#S(TREE-NODE
:KEY "c"
:CHILDREN (#S(TREE-NODE :KEY "f" :CHILDREN NIL)))))

Similar to lists that had to be constructed from tail to head, we had to populate the tree in reverse order: from leaves to root. With lists, we could, as an alternative, use push and reverse the result, in the end. But, for trees, there's no such operation as reverse.

Obviously, not only lists can be used as a data structure to hold the children. When the number of children is fixed (for example, in a binary tree), they may be defined as separate slots: e.g. left and right. Another option will be to use a key-value, which allows assigning labels to tree edges (as the keys of the kv), but the downside is that the ordering isn't defined (unless we use an ordered kv like a linked hash-table). We may also want to assign weights or other properties to the edges, and, in this case, either an additional collection (say child-weights) or a separate edge struct should be defined to store all those properties. In the latter case, the node structure will contain edges instead of children. In fact, the tree can also be represented as a list of such edge structures, although this approach is quite inefficient, for most of the use cases.

Another tree representation utilizes the available linked list implementation directly instead of reimplementing it. Let's consider the following simple Lisp form:

(defun foo (bar)
"Foo function."
(baz bar))

It is a tree with the root containing the symbol defun and 4 children:

  • the terminal symbol foo
  • the tree containing the function arguments ((bar))
  • the terminal sting (the docstring "Foo function.")
  • and the tree containing the form to evaluate ((baz bar))

By default, in the list-based tree, the first element is the head and the rest are the leaves. This representation is very compact and convenient for humans, so it is used not only for source code. For example, you can see a similar representation for the constituency trees, in linguistics:

(TOP (S (NP (DT This)) (VP (VBZ is) (NP (DT a) (NN test))) (. .)))
;; if we'd like to use the above form as Lisp code,
;; we'd have to shield the symbol "." with ||: (|.| |.|) instead of (. .)

It is equivalent to the following parse tree:

     TOP
/ | \
| VP |
| | \ |
NP | NP |
| | / \ |
DT VBZ DT NN .
This is a test .

Another, more specific, alternative is when we are interested only in the terminal nodes. In that case, there will be no explicit root and each list item will be a subtree. The following trees are equivalent:

(((a b (c d)) e) (f (g h)))

<root>
/ \
/ \ / \
/ | \ e f /\
a b /\ g h
c d

A tree that has all terminals at the same depth and all nonterminal nodes present — a complete tree — with a specified number of children may be stored in a vector. This is a very efficient implementation that we'll have a glance at when we'll talk about heaps.

Finally, a tree may be also represented, although quite inefficiently, with a matrix (only one half is necessary).

Tree Traversal

It should be noted that, unlike with other structures, basic operations, such as tree construction, modification, element search and retrieval, work differently for different tree variants. Thus we'll discuss them further when describing those variants.

Yet, one tree-specific operation is common to all tree representations: traversal. Traversing a tree means iterating over its subtrees or nodes in a certain order. The most direct traversal is called depth-first search or DFS. It is the recursive traversal from parent to child and then to the next child after we return from the recursion. The simplest DFS for our tree-node-based tree may be coded in the following manner:

(defun dfs-node (fn root)
(call fn (key root))
(dolist (child (children root))
(dfs-node fn child)))

CL-USER> (dfs-node 'print *tree*) ; where *tree* is taken from the previous example
"a"
"b"
"d"
"e"
"c"
"f"

In the spirit of Lisp, we could also define a convenience macro:

(defmacro dotree-dfs ((value root) &body body)
(let ((node (gensym))) ; this code is needed to prevent possible symbol collisions for NODE
`(dfs-node (lambda (,node)
(let ((,value (key ,node)))
,@body))
,root)))

And if we'd like to traverse a tree represented as a list, the changes are minor:

(defun dfs-list (fn tree)
;; we need to handle both subtrees (lists) and leaves (atoms)
;; so, we'll just convert everything to a list
(let ((tree (mklist tree)))
(call fn (first tree))
(dolist (child (rest tree))
(dfs-list fn child))))

CL-USER> (dfs-list 'print '(defun foo (bar)
"Foo function."
(baz bar)))
DEFUN
FOO
BAR
"Foo function."
BAZ
BAR

Recursion is very natural in tree traversal: we could even say that trees are recursion realized in a data structure. And the good news here is that, very rarely, there's a chance to hit recursion limits as the majority of trees are not infinite, and also the height of the tree, which conditions the depth of recursion, grows proportionally to the logarithm of the tree size[1], and that's pretty slow.

These simple DFS implementations apply the function before descending down the tree. This style is called preorder traversal. There are alternative styles: inorder and postorder. With postorder, the call is executed after the recursion returns, i.e. on the recursive ascent:

(defun post-dfs (fn node)
(dolist (child (children node))
(post-dfs fn child))
(call fn (key node)))

CL-USER> (post-dfs 'print *tree*)
"d"
"e"
"b"
"f"
"c"
"a"

Inorder traversal is applicable only to binary trees: first traverse the left side, then call fn and then descend into the right side.

An alternative traversal approach is Breadth-first search (BFS). It isn't so natural as DFS as it traverses the tree layer by layer, i.e. it has to, first, accumulate all the nodes that have the same depth and then integrate them. In the general case, it isn't justified, but there's a number of algorithms where exactly such ordering is required.

Here is an implementation of BFS (preorder) for our tree-nodes:

(defun bfs (fn nodes)
(let ((next-level (list)))
(dolist (node (mklist nodes))
(call fn (key node))
(dolist (child (children node))
(push child next-level)))
(when next-level
(bfs fn (reverse next-level)))))

CL-USER> (bfs 'print *tree*)
"a"
"b"
"c"
"d"
"e"
"f"

An advantage of BFS traversal is that it can handle potentially unbounded trees, i.e. it is suitable for processing trees in a streamed manner, layer-by-layer.

In object-orientation, tree traversal is usually accomplished with by the means of the so-called Visitor pattern. Basically, it's the same approach of passing a function to the traversal procedure but in disguise of additional (and excessive) OO-related machinery. Here is a Visitor pattern example in Java:

interface ITreeVisitor {
List<ITreeNode> children;
void visit(ITreeNode node);
}

interface ITreeNode {
void accept(ITreeVisitor visitor);
}

interface IFn {
void call(ITreeNode);
}

class TreeNode implements ITreeNode {
public void accept(ITreeVisitor visitor) {
visitor.visit(this);
}
}

class PreOrderTreeVisitor implements ITreeVisitor {
private IFn fn;

public PreOrderTreeVisitor(IFn fn) {
this.fn = fn;
}

public void visit(ITreeNode node) {
fn.call(node);
for (ITreeeNode child : node.children())
child.visit(this);
}
}

The zest of this example is the implementation of the method visit that calls the function with the current node and iterates over its children by recursively applying the same visitor. You can see that it's exactly the same as our dfs-node.

One of the interesting tree-traversal tasks is tree printing. There are many ways in which trees can be displayed. The simplest one is directory-style (like the one used by the Unix tree utility):

$ tree /etc/acpi
/etc/acpi
├── asus-wireless.sh
├── events
│   ├── asus-keyboard-backlight-down
│   ├── asus-keyboard-backlight-up
│   ├── asus-wireless-off
│   └── asus-wireless-on
└── undock.sh

It may be implemented with DFS and only requires tracking of the current level in the tree:

(defun pprint-tree-dfs (node &optional (level 0) (skip-levels (make-hash-table)))
(when (= 0 level)
(format t "~A~%" (key node)))
(let ((last-index (1- (length (children node)))))
(doindex (i child (children node))
(let ((last-child-p (= i last-index)))
(dotimes (j level)
(format t "~C " (if (? skip-levels j) #\Space #\│)))
(format t "~C── ~A~%"
(if last-child-p #\└ #\├)
(key child))
(:= (? skip-levels level) last-child-p)
(pprint-tree-dfs child
(1+ level)
skip-levels))))))

CL-USER> (pprint-tree-dfs *tree*)
a
├── b
│ ├── d
│ └── e
└── c
└── f

1+ and 1- are standard Lisp shortucts for adding/substracting 1 from a number. The skip-levels argument is used for the last elements to not print the excess .

A more complicated variant is top-to-bottom printing:

;; example from CL-NLP library
CL-USER> (nlp:pprint-tree
'(TOP (S (NP (NN "This"))
(VP (VBZ "is")
(NP (DT "a")
(JJ "simple")
(NN "test")))
(|.| ".")))
TOP
:
S
.-----------:---------.
: VP :
: .---------. :
NP : NP :
: : .----:-----. :
NN VBZ DT JJ NN .
: : : : : :
This is a simple test .

This style, most probably, will need a BFS and a careful calculation of spans of each node to properly align everything. Implementing such a function is left as an exercise to the reader, and a very enlightening one, I should say.

Binary Search Trees

Now, we can return to the topic of basic operations on tree elements. The advantage of trees is that, when built properly, they guarantee O(log n) for all the main operations: search, insertion, modification, and deletion.

This quality is achieved by keeping the leaves sorted and the trees in a balanced state. "Balanced" means that any pair of paths from the root to the leaves have lengths that may differ by at most some predefined quantity: ideally, just 1 (AVL trees), or, as in the case of Red-Black trees, the longest path can be at most twice as long as the shortest. Yet, such situations when all the elements align along a single path, effectively, turning the tree into a list, should be completely ruled out. We have already seen, with Binary search and Quicksort (remember the justification for the 3-medians rule), why this constraint guarantees logarithmic complexity.

The classic example of balanced trees are Binary Search Trees (BSTs), of which AVL and Red-Black trees are the most popular variants. All the properties of BSTs may be trivially extended to n-ary trees, so we'll discuss the topic using the binary trees examples.

Just to reiterate the general intuition for the logarithmic complexity of tree operations, let's examine a complete binary tree: a tree that has all levels completely filled with elements, except maybe for the last one. In it, we have n elements, and each level contains twice as many nodes as the previous. This property means that n is not greater than (+ 1 2 4 ... (/ k 2) k), where k is the capacity of the last level. This formula is nothing but the sum of a geometric progression with the number of items equal to h, which is, by the textbook:

(/ (* 1 (- 1 (expt 2 h)))
(- 1 2))

In turn, thisexpression may be reduced to: (- (expt 2 h) 1). So (+ n 1) equals to (expt 2 h), i.e. the height of the tree (h) equals to (log (+ n 1) 2).

BSTs have the ordering property: if some element is to the right of another in the tree, it should consistently be greater (or smaller — depending on the ordering direction). This constraint means that after the tree is built, just extracting its elements by performing an inorder DFS produces a sorted array. The Treesort algorithm utilizes this approach directly to achieve the same O(n * log n) complexity as other efficient sorting algorithms. This n * log n is the complexity of each insertion (O(log n)) multiplied by the number of times it should be performed (n). So, Treesort operates by taking an array and adding its elements to the BST, then traversing the tree and putting the encountered elements into the resulting array, in a proper order.

Besides, the ordering property also means that, after adding a new element to the tree, in the general case, it should be rebalanced as the newly added element may not be placed in an arbitrary spot, but has just two admissible locations, and choosing any of those may violate the balance constraint. The specific balance invariants and approaches to tree rebalancing are the distinctive properties of each variant of BSTs that we will see below.

Splay Trees

A Splay tree represents a kind of BST that is one of the simplest to understand and to implement. It is also quite useful in practice. It has the least strict constraints and a nice property that recently accessed elements occur near the root. Thus, a Splay tree can naturally act as an LRU-cache. However, there are degraded scenarios that result in O(n) access performance, although, the average complexity of Splay tree operations is O(log n) due to amortization (we'll talk about it in a bit).

The approach to balancing a Splay tree is to move the element we have accessed/inserted into the root position. The movement is performed by a series of operations that are called tree rotations. A certain pattern of rotations forms a step of the algorithm. For all BSTs, there are just two possible tree rotations, and they serve as the basic block, in all balancing algorithms. A rotation may be either a left or a right one. Their purpose is to put the left or the right child into the position of its parent, preserving the order of all the other child elements. The rotations can be illustrated by the following diagrams in which x is the parent node, y is the target child node that will become the new parent, and A,B,C are subtrees. It is said that the rotation is performed around the edge x -> y.

Left rotation:

    x          y
/ \ / \
y C -> A x
/ \ / \
A B B C

Right rotation:

  x              y
/ \ / \
A y -> x C
/ \ / \
B C A B

As you see, the left and right rotations are complementary operations, i.e. performing one after the other will return the tree to the original state. During the rotation, the inner subtree (B) has its parent changed from y to x.

Here's an implementation of rotations:

(defstruct (bst-node (:conc-name nil)
(:print-object (lambda (node out)
(format out "[~a-~@[~a~]-~@[~a~]]"
(key node)
(lt node)
(rt node)))))
key
val ; we won't use this slot in the examples,
; but without it, in real-world use cases,
; the tree doesn't make any sense :)
lt ; left child
rt) ; right child

(defun tree-rotate (node parent grandparent)
(cond
((eql node (lt parent)) (:= (lt parent) (rt node)
(rt node) parent))
((eql node (rt parent)) (:= (rt parent) (lt node)
(lt node) parent))
(t (error "NODE (~A) is not the child of PARENT (~A)"
node parent)))
(cond
((null grandparent) (return-from tree-rotate node))
((eql parent (lt grandparent)) (:= (lt grandparent) node))
((eql parent (rt grandparent)) (:= (rt grandparent) node))
(t (error "PARENT (~A) is not the child of GRANDPARENT (~A)"
parent grandparent))))

You have probably noticed that we need to pass to this function not only the nodes on the edge around which the rotation is executed but also the grandparent node of the target to link the changes to the tree. If grandparent is not supplied, it is assumed that parent is the root and we need to separately reassign the variable holding the reference to the tree to child, after the rotation.

Splay trees combine rotations into three possible actions:

  • The Zig step is used to make the node the new root when it's already the direct child of the root. It is accomplished by a single left/right rotation(depending on whether the target is to the left or to the right of the root) followed by an assignment.
  • The Zig-zig step is a combination of two zig steps that is performed when both the target node and its parent are left/right nodes. The first rotation is around the edge between the target node and its parent, and the second — around the target and its former grandparent that has become its new parent, after the first rotation.
  • The Zig-zag step is performed when the target and its parent are not in the same direction: either one is left while the other is right or vise versa. In this case, correspondingly, first a left rotation around the parent is needed, and then a right one around its former grandparent (that has now become the new parent of the target). Or vice versa.

However, with our implementation of tree rotations, we don't have to distinguish the 3 different steps and the implementation of the operation splay becomes really trivial:

(defun splay (node &rest chain)
(loop :for (parent grandparent) :on chain :do
(tree-rotate node parent grandparent))
node)

The key point here and in the implementation of Splay tree operations is the use of reverse chains of nodes from the child to the root which will allow us to perform chains of splay operations in an end-to-end manner and also custom modifications of the tree structure.

From the code, it is clear that splaying requires at maximum the same number of steps as the height of the tree because each rotation brings the target element 1 level up. Now, let's discuss why all Splay tree operations are O(log n). Element access requires binary search for the element in the tree, which is O(log n) provided the tree is balanced, and then splaying it to root — also O(log n). Deletion requires search, then swapping the element either with the rightmost child of its left subtree or the leftmost child of its right subtree (direct predecessor/successor) — to make it childless, removing it, and, finally, splaying the parent of the removed node. And update is, at worst, deletion followed by insertion.

Here is the implementation of the Splay tree built of bst-nodes and restricted to only arithmetic comparison operations. All of the high-level functions, such as st-search, st-insert or st-delete return the new tree root obtained after that should substitute the previous one in the caller code.

(defun node-chain (item root &optional chain)
"Return as the values the node equal to ITEM or the closest one to it
and the chain of nodes leading to it, in the splay tree based in ROOT."
(if root
(with (((key lt rt) ? root)
(chain (cons root chain)))
(cond ((= item key) (values root
chain))
((< item key) (st-search item lt chain))
((> item key) (st-search item rt chain))))
(values nil
chain)))

(defun st-search (item root)
(with ((node chain (node-chain item root)))
(when node
(apply 'splay chain))))

(defun st-insert (item root)
(assert root nil "Can't insert item into a null tree")
(with ((node chain (st-search item root)))
(unless node
(let ((parent (first chain)))
;; here, we use the property of the := expression
;; that it returns the item being set
(push (:= (? parent (if (> (key parent) item) 'lt 'rt))
(make-bst-node :key item))
chain)))
(apply 'splay chain)))

(defun idir (dir)
(case dir
(lt 'rt)
(rt 'lt)))

(defun closest-child (node)
(dolist (dir '(lt rt))
(let ((parent nil)
(current nil))
(do ((child (call dir node) (call (idir dir) child)))
((null child) (when current
(return-from closest-child
(values dir
current
parent))))
(:= current child
parent current)))))

(defun st-delete (item root)
(with ((node chain (st-search item root))
(parent (second chain)))
(if (null node)
root ; ITEM was not found
(with ((dir child child-parent (closest-child node))
(idir (idir dir)))
(when parent
(:= (? parent (if (eql (lt parent) node) 'lt 'rt))
child))
(when child
(:= (? child idir) (? node idir))
(when child-parent
(:= (? child-parent idir) (? child dir))))
(if parent
(apply 'splay (rest chain))
child)))))

(defun st-update (old new root)
(st-insert new (st-delete old root)))

The deletion is somewhat tricky due to the need to account for different cases: when removing the root, the direct child of the root, or the other node.

Let's test the Splay tree operation in the REPL (coding pprint-bst as a slight modification of pprint-tree-dfs is left as an excercise to the reader):

CL-USER> (defparameter *st* (make-bst-node :key 5))
CL-USER> *st*
[5--]
CL-USER> (pprint-bst (:= *st* (st-insert 1 *st*)))
1
├── .
└── 5
CL-USER> (pprint-bst (:= *st* (st-insert 10 *st*)))
10
├── 1
│ ├── .
│ └── 5
└── .
CL-USER> (pprint-bst (:= *st* (st-insert 3 *st*)))
3
├── 1
└── 10
├── .
└── 5
CL-USER> (pprint-bst (:= *st* (st-insert 7 *st*)))
7
├── 3
│ ├── 1
│ └── 5
└── 10
CL-USER> (pprint-bst (:= *st* (st-insert 8 *st*)))
8
├── 7
│ ├── 3
│ │ ├── 1
│ │ └── 5
│ └── .
└── 10
CL-USER> (pprint-bst (:= *st* (st-insert 2 *st*)))
2
├── 1
└── 8
├── 7
│ ├── 3
│ │ ├── .
│ │ └── 5
│ └── .
└── 10
CL-USER> (pprint-bst (:= *st* (st-insert 4 *st*)))
4
├── 2
│ ├── 1
│ └── 3
└── 8
├── 7
│ ├── 5
│ └── .
└── 10
CL-USER> *st*
[4-[2-[1--]-[3--]]-[8-[7-[5--]-]-[10--]]]

As you can see, the tree gets constantly rearranged at every insertion.

Accessing an element, when it's found in the tree, also triggers tree restructuring:

RTL-USER> (pprint-bst (st-search 5 *st*))
5
├── 4
│ ├── 2
│ │ ├── 1
│ │ └── 3
│ └── .
└── 8
├── 7
└── 10

The insertion and deletion operations, for the Splay tree, also may have an alternative implementation: first, split the tree in two at the place of the element to be added/removed and then combine them. For insertion, the combination is performed by making the new element the root and linking the previously split subtrees to its left and right. As for deletion, splitting the Splay tree requires splaying the target element and then breaking the two subtrees apart (removing the target that has become the root). The combination is also O(log n) and it is performed by splaying the rightmost node of the left subtree (the largest element) so that it doesn't have the right child. Then the right subtree can be linked to this vacant slot.

Although regular access to the Splay tree requires splaying of the element we have touched, tree traversal should be implemented without splaying. Or rather, just the normal DFS/BFS procedures should be used. First of all, this approach will keep the complexity of the operation at O(n) without the unnecessary log n multiplier added by the splaying operations. Besides, accessing all the elements inorder will trigger the edge-case scenario and turn the Splay tree into a list — exactly the situation we want to avoid.

Complexity Analysis

All of those considerations apply under the assumption that all the tree operations are O(log n). But we haven't proven it yet. Turns out that, for Splay trees, it isn't a trivial task and requires amortized analysis. Basically, this approach averages the cost of all operations over all tree elements. Amortized analysis allows us to confidently use many advanced data structures for which it isn't possible to prove the required time bounds for individual operations, but the general performance over the lifetime of the data structure is in those bounds.

The principal tool of the amortized analysis is the potential method. Its idea is to combine, for each operation, not only its direct cost but also the change to the potential cost of other operations that it brings. For Splay trees, we can observe that only zig-zig and zig-zag steps are important, for the analysis, as zig step happens only once for each splay operation and changes the height of the tree by at most 1. Also, both zig-zig and zig-zag have the same potential.

Rigorously calculating the exact potential requires a number of mathematical proofs that we don't have space to show here, so let's just list the main results.

  1. The potential of the whole Splay tree is the sum of the ranks of all nodes, where rank is the logarithm of the number of elements in the subtree rooted at node:

    (defun rank (node)
    (let ((size 0))
    (dotree-dfs (_ node)
    (:+ size))
    (log size 2)))
  2. The change of potential produced by a single zig-zig/zig-zag step can be calculated in the following manner:

    (+ (- (rank grandparent-new) (rank grandparent-old))
    (- (rank parent-new) (rank parent-old))
    (- (rank node-new) (rank node-old)))

    Since (= (rank node-new) (rank grandparent-old)) it can be reduced to:

    (- (+ (rank grandparent-new) (rank parent-new))
    (+ (rank parent-old) (rank node-old)))

    Which is not larger than:

    (- (+ (rank grandparent-new) (rank node-new))
    (* 2 (rank node-old)))

    Which, in turn, due to the concavity of the log function, may be reduced to:

    (- (* 3 (- (rank node-new) (rank node-old))) 2)

    The amortized cost of any step is 2 operations larger than the change in potential as we need to perform 2 tree rotations, so it's not larger than:

    (* 3 (- (rank node-new) (rank node-old)))
  3. When summed over the entire splay operation, this expression "telescopes" to (* 3 (- (rank root) (rank node))) which is O(log n). Telescoping means that when we calculate the sum of the cost of all zig-zag/zig-zig steps, the inner terms cancel each other and only the boundary ones remain. The difference in ranks is, in the worst case, log n as the rank of the root is (log n 2) and the rank of the arbitrary node is between that value and (log 1 2) (0).

  4. Finally, the total cost for m splay operations is O(m log n + n log n), where m log n term represents the total amortized cost of a sequence of m operations and n log n is the change in potential that it brings.

As mentioned, the above exposition is just a cursory look at the application of the potential method that skips some important details. If you want to learn more you can start with this discussion on CS Theory StackExchange.

To conclude, similar to hash-tables, the performance of Splay tree operations for a concrete element depends on the order of the insertion/removal of all the elements of the tree, i.e. it has an unpredictable (random) nature. This property is a disadvantage compared to some other BST variants that provide precise performance guarantees. Another disadvantage, in some situations, is that the tree is constantly restructured, which makes it mostly unfit for usage as a persistent data structure and also may not play well with many storage options. Yet, Splay trees are simple and, in many situations, due to their LRU-property, may be preferable over other BSTs.

Red-Black and AVL Trees

Another BST that also has similar complexity characteristics to Splay trees and, in general, a somewhat similar approach to rebalancing is the Scapegoat tree. Both of these BSTs don't require storing any additional information about the current state of the tree, which results in the random aspect of their operation. And although it is smoothed over all the tree accesses, it may not be acceptable in some usage scenarios.

An alternative approach, if we want to exclude the random factor, is to track the tree state. Tracking may be achieved by adding just 1 bit to each tree node (as with Red-Black trees) or 2 bits, the so-called balance factors (AVL trees)[2]. However, for most of the high-level languages, including Lisp, we'll need to go to great lengths or even perform low-level non-portable hacking to, actually, ensure that exactly 1 or 2 bits is spent for this data, as the standard structure implementation will allocate a whole word even for a bit-sized slot. Moreover, in C likewise, due to cache alignment, the structure will also have the size aligned to memory word boundaries. So, by and large, usually we don't really care whether the data we'll need to track is a single bit flag or a full integer counter.

The balance guarantee of an RB tree is that, for each node, the height of the left and right subtrees may differ by at most a factor of 2. Such boundary condition occurs when the longer path contains alternating red and black nodes, and the shorter — only black nodes. Balancing is ensured by the requirement to satisfy the following invariants:

  1. Each tree node is assigned a label: red or black (basically, a 1-bit flag: 0 or 1).
  2. The root should be black (0).
  3. All the leaves are also black (0). And the leaves don't hold any data. A good implementation strategy to satisfy this property is to have a constant singleton terminal node that all preterminals will link to. ((defparameter *rb-leaf* (make-rb-node))).
  4. If a parent node is red (1) then both its children should be black (0). Due to mock leaves, each node has exactly 2 children.
  5. Every path from a given node to any of its descendant leaf nodes should contain the same number of black nodes.

So, to keep the tree in a balanced state, the insert/update/delete operations should perform rebalancing when the constraints are violated. Robert Sedgewick has proposed the simplest version of the red-black tree called the Left-Leaning Red-Black Tree (LLRB). The LLRB maintains an additional invariant that all red links must lean left except during inserts and deletes, which makes for the simplest implementation of the operations. Below, we can see the outline of the insert operation:

(defstruct (rb-node (:include bst-node) (:conc-name nil))
(red nil :type boolean))

(defun rb-insert (item root &optional parent)
(let ((node (make-rb-node :key item)))
(when (null root)
(return-from rb-insert node))
(when (and (red (lt root))
(red (rt root)))
(:= (red root) (not (red root))
(red (lt root)) nil
(red (rt root)) nil))
(cond ((< (key root) value)
(:= (lt root) (rb-insert node (lt root) root)))
((> (key root) value)
(:= (rt root) (rb-insert node (rt root) root))))
(when (and (red (rt root))
(not (red (lt root))))
(:= (red (lt root)) (red root)
root (tree-rotate (lt root) root parent)
(red root) t))
(when (and (red (lt root))
(not (red (rt root))))
(:= (red (rt root)) (red root)
root (tree-rotate (rt root) root parent)
(red root) t)))
root)

This code is more of an outline. You can easily find the complete implementation of the RB-tree on the internet. The key here is to understand the principle of their operation. Also, we won't discuss AVL trees, in detail. Suffice to say that they are based on the same principles but use a different set of balancing operations.

Both Red-Black and AVL trees may be used when worst-case performance guarantees are required, for example, in real-time systems. Besides, they serve a basis for implementing persistent data-structures that we'll talk about later. The Java TreeMap and similar data structures from the standard libraries of many languages are implemented with one of these BSTs. And the implementations of them both are present in the Linux kernel and are used as data structures for various queues.

OK, now you know how to balance a binary tree :D

B-Trees

B-tree is a generalization of a BST that allows for more than two children. The number of children is not unbounded and should be in a predefined range. For instance, the simplest B-tree — 2-3 tree — allows for 2 or 3 children. Such trees combine the main advantage of self-balanced trees — logarithmic access time — with the benefit of arrays — locality — the property which allows for faster cache access or retrieval from the storage. That's why B-trees are mainly used in data storage systems. Overall, B-tree implementations perform the same trick as we saw in prod-sort: switching to sequential search when the sequence becomes small enough to fit into the cache line of the CPU.

Each internal node of a B-tree contains a number of keys. For a 2-3 tree, the number is either 1 or 2. The keys act as separation values which divide the subtrees. For example, if the keys are x and y, all the values in the leftmost subtree will be less than x, all values in the middle subtree will be between x and y, and all values in the rightmost subtree will be greater than y. Here is an example:

              [ 7 . 18 ]
/ | \
[ 1 . 3 ] [ 10 . 15 ] [ 20 . _ ]

This tree has 4 nodes. Each node has 2 key slots and may have 0 (in the case of the leaf nodes), 2 or 3 children. The node structure for it might look like this:

(defstruct 23-node
key1
key2
val1
val2
lt
md
rt)

Yet, a more general B-tree node would, probably, contain arrays for keys/values and children links:

(defstruct bt-node
(keys (make-array *max-keys*))
(vals (make-array *max-keys*))
(children (make-array (1+ *max-keys*)))

The element search in a B-tree is very similar to that of a BST. Just, there will be up to *max-keys* comparisons instead of 1, in each node. Insertion is more tricky as it may require rearranging the tree items to satisfy its invariants. A B-tree is kept balanced after insertion by the procedure of splitting a would-be overfilled node, of (1+ n) keys, into two (/ n 2)-key siblings and inserting the mid-value key into the parent. That's why, usually, the range of the number of keys in the node, in the B-tree is chosen to be between k and (* 2 k). Also, in practice, k will be pretty large: an order of 10s or even 100. Depth only increases when the root is split, maintaining balance. Similarly, a B-tree is kept balanced after deletion by merging or redistributing keys among siblings to maintain the minimum number of keys for non-root nodes. A merger reduces the number of keys in the parent potentially forcing it to merge or redistribute keys with its siblings, and so on. The depth of the tree will increase slowly as elements are added to it, but an increase in the overall depth is infrequent and results in all leaf nodes being one more node farther away from the root.

A version of B-trees that is particularly developed for storage systems and is used in a number of filesystems, such as NTFS and ext4, and databases, such as Oracle and SQLite, is B+ trees. A B+ tree can be viewed as a B-tree in which each node contains only keys (not key-value pairs), and to which an additional level is added at the bottom with linked leaves. The leaves of the B+ tree are linked to one another in a linked list, making range queries or an (ordered) iteration through the blocks simpler and more efficient. Such a property could not be achieved in a B-tree, since not all keys are present in the leaves: some are stored in the root or intermediate nodes.

However, a newer Linux file-system, developed specifically for use on the SSDs and called btrfs uses plain B-trees instead of B+ trees because the former allows implementing copy-on-write, which is needed for efficient snapshots. The issue with B+ trees is that its leaf nodes are interlinked, so if a leaf were copy-on-write, its siblings and parents would have to be as well, as would their siblings and parents and so on until the entire tree was copied. We can recall the same situation pertaining to the doubly-linked list compared to singly-linked ones. So, a modified B-tree without leaf linkage is used in btrfs, with a refcount associated with each tree node but stored in an ad-hoc free map structure.

Overall, B-trees are a very natural continuation of BSTs, so we won't spend more time with them here. I believe, it should be clear how to deal with them, overall. Surely, there are a lot of B-tree variants that have their nuances, but those should be studied in the context of a particular problem they are considered for.

Heaps

A different variant of a binary tree is a Binary Heap. Heaps are used in many different algorithms, such as path pathfinding, encoding, minimum spanning tree, etc. They even have their own O(log n) sorting algorithm — the elegant Heapsort. In a heap, each element is either the smallest (min-heap) or the largest (max-heap) element of its subtree. It is also a complete tree and the last layer should be filled left-to-right. This invariant makes the heap well suited for keeping track of element priorities. So Priority Queues are, usually, based on heaps. Thus, it's beneficial to be aware of the existence of this peculiar data structure.

The constraints on the heap allow representing it in a compact and efficient manner — as a simple vector. Its first element is the heap root, the second and third are its left and right child (if present) and so on, by recursion. This arrangement permits access to the parent and children of any element using the simple offset-based formulas (in which the element is identified by its index):

(defun hparent (i)
"Calculate the index of the parent of the heap element with index I."
(floor (- i 1) 2))

(defun hrt (i)
"Calculate the index of the right child of the heap element with index I."
(* (+ i 1) 2))

(defun hlt (i)
"Calculate the index of the left child of the heap element with index I."
(- (hrt i) 1))

So, to implement a heap, we don't need to define a custom node structure, and besides, can get to any element in O(1)! Here is the utility to rearrange an arbitrary array in a min-heap formation (in other words, we can consider a binary heap to be a special arrangement of array elements). It works by iteratively placing each element in its proper place by swapping with children until it's larger than both of the children.

(defun heapify (vec)
(let ((mid (floor (length vec) 2)))
(dotimes (i mid)
(heap-down vec (- mid i 1))))
vec)

(defun heap-down (vec beg &optional (end (length vec)))
(let ((l (hlt beg))
(r (hrt beg)))
(when (< l end)
(let ((child (if (or (>= r end)
(> (? vec l)
(? vec r)))
l r)))
(when (> (? vec child)
(? vec beg))
;; rotatef swaps the elements of the sequence
(rotatef (? vec beg)
(? vec child))
(heap-down vec child end)))))
vec)

And here is the reverse operation to pop the item up the heap:

(defun heap-up (vec i)
(when (> (? vec i)
(? vec (hparent i)))
(rotatef (? vec i)
(? vec (hparent i)))
(heap-up vec (hparent i)))
heap)

Also, as with other data structures, it's essential to be able to visualize the content of the heap in a convenient form, as well as to check the invariants. These tasks may be accomplished with the help of the following functions:

(defun draw-heap (vec)
(format t "~%")
(with ((size (length vec))
(h (+ 1 (floor (log size 2)))))
(dotimes (i h)
(let ((spaces (loop :repeat (- (expt 2 (- h i)) 1) :collect #\Space)))
(dotimes (j (expt 2 i))
(let ((k (+ (expt 2 i) j -1)))
(when (= k size) (return))
(format t "~{~C~}~2D~{~C~}" spaces (? vec k) spaces)))
(format t "~%"))))
(format t "~%")
vec)

(defun check-heap (vec)
(dotimes (i (floor (length vec) 2))
(when (= (hlt i) (length vec)) (return))
(assert (not (> (? vec (hlt i)) (? vec i)))
() "Left child (~A) is > parent at position ~A (~A)."
(? vec (hlt i)) i (? vec i))
(when (= (hrt i) (length vec)) (return))
(assert (not (> (? vec (hrt i)) (? vec i)))
() "Right child (~A) is > than parent at position ~A (~A)."
(? vec (hrt i)) i (? vec i)))
vec)

CL-USER> (check-heap #(10 5 8 2 3 7 1 9))
Left child (9) is > parent at position 3 (2).
[Condition of type SIMPLE-ERROR]

CL-USER> (check-heap (draw-heap (heapify #(1 22 10 5 3 7 8 9 7 13))))

22
13 10
9 3 7 8
5 7 1

#(22 13 10 9 3 7 8 5 7 1)
Due to the regular nature of the heap, drawing it with BFS is much simpler than for most other trees. As with ordered trees, heap element insertion and deletion require repositioning of some of the elements.
(defun heap-push (node vec)
(vector-push-extend node vec)
(heap-up vec (1- (length vec))))

(defun heap-pop (vec)
(rotatef (? vec 0) (? vec (- (length vec) 1)))
;; PROG1 is used to return the result of the first form
;; instead of the last, like it happens with PROGN
(prog1 (vector-pop vec)
(heap-down vec 0)))

Now, we can implement Heapsort. The idea is to iteratively arrange the array in heap order element by element. Each arrangement will take log n time as we're pushing the item down a complete binary tree the height of which is log n. And we'll need to perform n such iterations.

(defun heapsort (vec)
(heapify vec)
(dotimes (i (length vec))
(let ((last (- (length vec) i 1)))
(rotatef (? vec 0)
(? vec last))
(heap-down vec 0 last)))
vec)

CL-USER> (heapsort #(1 22 10 5 3 7 8 9 7 13))
#(1 3 5 7 7 8 9 10 13 22)

There are so many sorting algorithms, so why invent another one? That's a totally valid point, but the advantage of heaps is that they keep the maximum/minimum element constantly at the top so you don't have to perform a full sort or even descend into the tree if you need just the top element. This simplification is especially relevant if we constantly need to access such elements as with priority queues.

Actually, a heap should not necessarily be a tree. Besides the Binary Heap, there are also Binomial, Fibonacci and other kinds of heaps that may even not necessary be trees, but even collections of trees (forests). We'll discuss some of them in more detail in the next chapters, in the context of the algorithms for which their use makes a notable difference in performance.

Tries

If I were to answer the question, what's the most underappreciated data structure, I'd probably say, a trie. For me, tries are a gift that keeps on giving, and they have already saved me program performance in a couple of situations that seemed hopeless. Besides, they are very simple to understand and implement.

A trie is also called a prefix tree. It is, usually, used to optimize dictionary storage and lookup when the dictionary has a lot of entries and there is some overlap between them. The most obvious example is a normal English language dictionary. A lot of words have common stems ("work", "word", "worry" all share the same beginning "wor"), there are many wordforms of the same word ("word", "words", "wording", "worded").

Thre are many approaches to trie implementation. Let's discuss with the most straightforward and, to so to say, primitive one. Here is a trie for representing a string dictionary that is character-based and uses an alist to store children pointers:

(defstruct (tr-node (:conc-name nil))
val
(children (list)))

(defun tr-lookup (key root)
(dovec (ch key
;; when iteration terminates normally
;; we have found the node we were looking for
(val root))
(if-it (assoc1 ch (children root))
(:= root it)
(return))))

(defun tr-add (key val root)
(let ((i 0))
(dovec (ch key)
(if-it (assoc1 ch (children root))
(:= root it
i (1+ i))
(return)))
(if (= i (length key))
;; there was already something at key -
;; several actions might be taken:
;; signal an error (continuable), overwrite, abort
(cerror "Assign a new value"
"There was already a value at key: ~A" (val root))
(dovec (ch (slice key i))
(let ((child (make-tr-node)))
(push (cons ch child) (children root))
(:= root child))))
(:= (val root) val)))

CL-USER> (defparameter *trie* (make-tr-node))
*TRIE*
CL-USER> *trie*
#S(TR-NODE :VAL NIL :CHILDREN NIL)

For the sake of brevity, we won't define a special print-function for our trie and will use a default one. In a real setting, though, it is highly advisable.

CL-USER> (tr-lookup "word" *trie*)
NIL
CL-USER> (tr-add "word" 42 *trie*)
42
CL-USER> *trie*
#S(TR-NODE
:VAL NIL
:CHILDREN ((#\w
. #S(TR-NODE
:VAL NIL
:CHILDREN ((#\o
. #S(TR-NODE
:VAL NIL
:CHILDREN ((#\r
. #S(TR-NODE
:VAL NIL
:CHILDREN ((#\d
. #S(TR-NODE
:VAL 42
:CHILDREN NIL)))))))))))))
CL-USER> (tr-lookup "word" *trie*)
42
CL-USER> (tr-add "word" :foo *trie*)

There was already a value at key: 42
[Condition of type SIMPLE-ERROR]

Restarts:
0: [CONTINUE] Assign a new value
1: [RETRY] Retry SLIME REPL evaluation request.
2: [*ABORT] Return to SLIME's top level.
3: [ABORT] abort thread (#<THREAD "repl-thread" RUNNING {100F6297C3}>)

Backtrace:
0: (TR-ADD "word" :FOO #S(TR-NODE :VAL 42 :CHILDREN NIL))
1: (SB-INT:SIMPLE-EVAL-IN-LEXENV (TR-ADD "word" :FOO *TRIE*) #<NULL-LEXENV>)
2: (EVAL (TR-ADD "word" :FOO *TRIE*))
--more--

;;; Take the restart 0

:FOO
Cl-USER> (tr-add "we" :baz *trie*)
:BAZ
CL-USER> *trie*
#S(TR-NODE
:VAL NIL
:CHILDREN ((#\w
. #S(TR-NODE
:VAL NIL
:CHILDREN ((#\e . #S(TR-NODE :VAL :BAZ :CHILDREN NIL))
(#\o
. #S(TR-NODE
:VAL NIL
:CHILDREN ((#\r
. #S(TR-NODE
:VAL NIL
:CHILDREN ((#\k
. #S(TR-NODE
:VAL :BAR
:CHILDREN NIL))
(#\d
. #S(TR-NODE
:VAL :FOO
:CHILDREN NIL)))))))))))))

There are many ways to optimize this trie implementation. First of all, you can see that some space is wasted on intermediate nodes with no values. This is mended by Radix Trees (also known as Patricia Trees) that merge all intermediate nodes. I.e., our trie would change into the following more compact structure:

#S(TR-NODE
:VAL NIL
:CHILDREN ((#\w
. #S(TR-NODE
:VAL NIL
:CHILDREN ((#\e . #S(TR-NODE :VAL :BAZ :CHILDREN NIL))
("or" . #S(TR-NODE
:VAL NIL
:CHILDREN ((#\k
. #S(TR-NODE
:VAL :BAR
:CHILDREN NIL))
(#\d
. #S(TR-NODE
:VAL :FOO
:CHILDREN NIL)))))))))))))

Besides, there are ways to utilize the array to store trie offsets (similar to heaps), instead of using a linked backbone for it. Such variant is called a succinct trie. Also, there are compressed (C-tries), hash-array mapped (HAMTs), and other kinds of tries.

The main advantage of tries is efficient space usage thanks to the elimination of repetition in keys storage. In many scenarios, usage of tries also improves the speed of access. Consider the task of matching against a dictionary of phrases, for example, biological or medical terms, names of companies or works of art, etc. These are usually 2-3 words long phrases, but, occasionally, there may be an outlier of 10 or more words. The straightforward approach would be to put the dictionary into a hash-table, then iterate over the input string trying to find the phrases in the table, starting from each word. The only question is: where do we put an end of the phrase? As we said, the phrase may be from 1 to, say, 10 words in length. With a hash-table, we have to check every variant: a single-word phrase, a two-word one, and so on up to the maximum length. Moreover, if there are phrases with the same beginning, which is often the case, we'd do duplicate work of hashing that beginning, for each variant (unless we use an additive hash, but this isn't adviced for hash-tables). With a trie, all the duplication is not necessary: we can iteratively match each word until we either find the match in the tree or discover that there is no continuation of the current subphrase.

Trees in Action: Efficient Mapping

Finally, the last family of tree data structures I had to mention is trees for representing spatial relations. Overall, mapping and pathfinding is an area that prompted the creation of a wide range of useful algorithms and data structures. There are two fundamental operations for processing spatial data: nearest neighbor search and range queries. Given the points on the plane, how do we determine the closest points to a particular one? How do we retrieve all points inside a rectangle or a circle? A primitive approach is to loop through all the points and collect the relevant information, which results in at least O(n) complexity — prohibitively expensive if the number of points is beyond several tens or hundreds. And such problems, by the way, arise not only in the field of processing geospatial data (they are at the core of such systems as PostGIS, mapping libraries, etc.) but also in Machine Learning (for instance, the k-NN algorithm directly requires such calculations) and other areas.

A more efficient solution has an O(log n) complexity and is, as you might expect, based on indexing the data in a special-purpose tree. The changes to the tree will also have O(log n) complexity, while the initial indexing is O(n log n). However, in most of the applications that use this technic, changes are much less frequent than read operations, so the upfront cost pays off.

There is a number of trees that allow efficient storage of spatial data: segment trees, interval trees, k-d trees, R-trees, etc. The most common spatial data structure is an R-tree (rectangle-tree). It distributes all the points in an n-dimensional space (usually, n will be 2 or 3) among the leaves of the tree by recursively dividing the space into k rectangles holding roughly the same number of points until each tree node has at most k points. Let's say we have started from 1000 points on the plane and chosen k to be 10. In this case, the first level of the tree (i.e. children of the root) will contain 10 nodes, each one having as the value the dimensions of the rectangle that bounds approximately 100 points. Every node like that will have 10 more children, each one having around 10 points. Maybe, some will have more, and, in this case, we'll give those nodes 10 children each with, probably, 1 or 2 points in the rectangles they will command. Now, we can perform a range search with the obtained tree by selecting only the nodes that intersect the query rectangle. For a small query box, this approach will result in the discarding of the majority of the nodes at each level of the tree. So, a range search over an R-tree has O(k log n) where k is the number of intersecting rectangles.

Now, let's consider neighbor search. Obviously, the closest points to a particular one we are examining lie either in the same rectangle as the point or in the closest ones to it. So, we need to, first, find the smallest bounding rectangle, which contains our point, perform the search in it, then, if we haven't got enough points yet, process the siblings of the current tree node in the order of their proximity to it.

There are many other spatial problems that may be efficiently solved with this approach. One thing to note is that the described procedures require the tree to store, in the leaf nodes, references to every point contained in their rectangles.

Take-aways

So, balancing a tree isn't such a unique and interesting task. On the contrary, it's quite simple yet boring due to the number of edge cases you have to account for. Yet, we have just scratched the surface of the general topic of trees. It is vast: the Wikipedia section for tree data structures contains almost 100 of them and it's, definitely, not complete. Moreover, new tree variants will surely be invented in the future. But you will hardly deal with more than just a few variants during the course of your career, spending the majority of time with the simple "unconstrained" trees. And we have seen, in action, the basic principles of tree operation that will be helpful, in the process.

There's a couple of other general observations about programming algorithms we can draw from this chapter:

  1. Trees are very versatile data structures that are a default choice when you need to represent some hierarchy. They are also one of a few data structures for which recursive processing is not only admissible but also natural and efficient.
  2. Visualization is key to efficient debugging of complex data-structures. Unfortunately, it's hard to show that in the book how I have spent several hours on the code for the splay tree, but without an efficient way to display the trees coupled with dynamic tracing, I would probably have spent twice as much. And, both the print-function for individual node and pprint-bst were helpful here.

[1] This statement is stricktly true for balanced trees, but, even for imbalanced trees, such estimation is usually correct.

[2] Although it was shown that this value may also be reduced to a single bit using a clever implementation trick.

Lispers.deBerlin Lispers Meetup, Monday, 30th September 2019

· 26 days ago

We meet again on Monday 8pm, 30th September. Our host this time is James Anderson (www.dydra.com).

Berlin Lispers is about all flavors of Lisp including Common Lisp, Clojure and Scheme.

Max-Gerd Retzlaff will talk about his art installation "Other Dimension" [1] as part of the "Gewebtes Licht" exhibition in 2011 and how he used Embeddable Common Lisp for it.

[1] http://other-dimension.net/ in collaboration with Alex Wenger

We meet in the Taut-Haus at Engeldamm 70 in Berlin-Mitte, the bell is "James Anderson". It is located in 10min walking distance from U Moritzplatz or U Kottbusser Tor or Ostbahnhof. In case of questions call Christian +49 1578 70 51 61 4.

Vsevolod DyomkinProgramming Algorithms: Hash-Tables

· 41 days ago

Now, we can move on to studying advanced data structures which are built on top of the basic ones such as arrays and lists, but may exhibit distinct properties, have different use cases, and special algorithms. Many of them will combine the basic data structures to obtain new properties not accessible to the underlying structures. The first and most important of these advanced structures is, undoubtedly, the hash-table. However vast is the list of candidates to serve as key-values, hash-tables are the default choice for implementing them.

Also, hash-sets, in general, serve as the main representation for medium and large-sized sets as they ensure O(1) membership test, as well as optimal set-theoretic operations complexity. A simple version of a hash-set can be created using a normal hash-table with t for all values.

Implementation

The basic properties of hash-tables are average O(1) access and support for arbitrary keys. These features can be realized by storing the items in an array at indices determined by a specialized function that maps the keys in a pseudo-random way — hashes them. Technically, the keys should pertain to the domain that allows hashing, but, in practice, it is always possible to ensure either directly or by using an intermediate transformation. The choice of variants for the hash-function is rather big, but there are some limitations to keep in mind:

  1. As the backing array has a limited number of cells (n), the function should produce values in the interval [0, n). This limitation can be respected by a 2-step process: first, produce a number in an arbitrary range (for instance, a 32-bit integer) and then take the remainder of its division by n.
  2. Ideally, the distribution of indices should be uniform, but similar keys should map to quite distinct indices. I.e. hashing should turn things which are close, into things which are distant. This way, even very small changes to the input will yield sweeping changes in the value of the hash. This property is called the "avalanche effect".

Dealing with Collisions

Even better would be if there were no collisions — situations when two or more keys are mapped to the same index. Is that, at all, possible? Theoretically, yes, but all the practical implementations that we have found so far are too slow and not feasible for a hash-table that is dynamically updated. However, such approaches may be used if the keyset is static and known beforehand. They will be covered in the discussion of perfect hash-tables.

For dynamic hash-tables, we have to accept that collisions are inevitable. The probability of collisions is governed by an interesting phenomenon called "The Birthday Paradox". Let's say, we have a group of people of some size, for instance, 20. What is the probability that two of them have birthdays on the same date? It may seem quite improbable, considering that there are 365 days in a year and we are talking just about a handful of people. But if you take into account that we need to examine each pair of people to learn about their possible birthday collision that will give us (/ (* 20 19) 2), i.e. 190 pairs. We can calculate the exact probability by taking the complement to the probability that no one has a birthday collision, which is easier to reason about. The probability that two people don't share their birthday is (/ (- 365 1) 365): there's only 1 chance in 365 that they do. For three people, we can use the chain rule and state that the probability that they don't have a birthday collision is a product of the probability that any two of them don't have it and that the third person also doesn't share a birthday with any of them. This results in (* (/ 364 365) (/ (- 365 2) 365)). The value (- 365 2) refers to the third person not having a birthday intersection with neither the first nor the second individually, and those are distinct, as we have already asserted in the first term. Continuing in such fashion, we can count the number for 20 persons:

(defun birthday-collision-prob (n)
(let ((rez 1))
(dotimes (i n)
(:* rez (/ (- 365 i) 365)))
;; don't forget that we want the complement
;; of the probability of no collisions,
;; hence (- 1.0 ...)
(- 1.0 rez)))

CL-USER> (birthday-collision-prob 20)
0.4114384

So, among 20 people, there's already a 40% chance of observing a coinciding birthday. And this number grows quickly: it will become 50% at 23, 70% at 30, and 99.9% at just 70!

But why, on Earth, you could ask, have we started to discusss birthdays? Well, if you substitute keys for persons and the array size for the number of days in a year, you'll get the formula of the probability of at least one collision among the hashed keys in an array, provided the hash function produces perfectly uniform output. (It will be even higher if the distribution is non-uniform).

(defun hash-collision-prob (n size)
(let ((rez 1))
(dotimes (i n)
(:* rez (/ (- size i) size)))
(- 1.0 rez)))

Let's say, we have 10 keys. What should be the array size to be safe against collisions?

CL-USER> (hash-collision-prob 10 10)
0.9996371

99.9%. OK, we don't stand a chance to accidentally get a perfect layout. :( What if we double the array size?

CL-USER> (hash-collision-prob 10 20)
0.9345271

93%. Still, pretty high.

CL-USER> (hash-collision-prob 10 100)
0.37184352
CL-USER> (hash-collision-prob 10 10000)
0.004491329

So, if we were to use a 10k-element array to store 10 items the chance of a collision would fall below 1%. Not practical...

Note that the number depends on both arguments, so (hash-collision-prob 10 100) (0.37) is not the same as (hash-collision-prob 20 200) (0.63).

We did this exercise to completely abandon any hope of avoiding collisions and accept that they are inevitable. Such mind/coding experiments may be an effective smoke-test of our novel algorithmic ideas: before we go full-speed and implement them, it makes sense to perform some back-of-the-envelope feasibility calculations.

Now, let's discuss what difference the presence of these collisions makes to our hash-table idea and how to deal with this issue. The obvious solution is to have a fallback option: when two keys hash to the same index, store both of the items in a list. The retrieval operation, in this case, will require a sequential scan to find the requested key and return the corresponding value. Such an approach is called "chaining" and it is used by some implementations. Yet, it has a number of drawbacks:

  • It complicates the implementation: we now have to deal with both a static array and a dynamic list/array/tree. This change opens a possibility for some hard-to-catch bugs, especially, in the concurrent settings.
  • It requires more memory than the hash-table backing array, so we will be in a situation when some of the slots of the array are empty while others chain several elements.
  • It will have poor performance due to the necessity of dealing with a linked structure and, what's worse, not respecting cache locality: the chain will not fit in the original array so at least one additional RAM round-trip will be required.

One upside of this approach is that it can store more elements than the size of the backing array. And, in the extreme case, it degrades to bucketing: when a small number of buckets point to long chains of randomly shuffled elements.

The more widely-used alternative to chaining is called "open addressing" or "closed hashing". With it, the chains are, basically, stored in the same backing array. The algorithm is simple: when the calculated hash is pointing at an already occupied slot in the array, find the next vacant slot by cycling over the array. If the table isn't full we're guaranteed to find one. If it is full, we need to resize it, first. Now, when the element is retrieved by key, we need to perform the same procedure: calculate the hash, then compare the key of the item at the returned index. if the keys are the same, we've found the desired element, otherwise — we need to cycle over the array comparing keys until we encounter the item we need.

Here's an implementation of the simple open addressing hash-table using eql for keys comparison:

(defstruct ht
array
(count 0))

(defun ht (&rest kvs)
(let ((rez (make-ht :array (make-array 16))))
(loop :for (k v) :in kvs :do
(add-ht k v rez))
rez))

(defun ht-get (key ht)
(with ((size (length @ht.array)))
(start (rem (hash key) size)))
(do ((count 0 (1+ count))
(i start (rem (1+ i) size))
(item (? ht 'array start)
(? ht 'array i))
((or (null item)
(= count size)))
(when (eql key (car item))
(return
(values (cdr item)
;; the second value is an index, at which
;; the item was found (also used to distinguish
;; the value nil from not found, which is also
;; represented by nil but without the second value)
i))))))

(defun ht-add (key val ht)
(with ((array (ht-array ht))
(size (length array)))
;; flet defines a local function that has access
;; to the local variables defined in HT-ADD
(flet ((add-item (k v)
(do ((i (rem (hash k) size)
(rem (1+ i) size))
((null @ht.array#i)
(:= @ht.array#i (cons k v)))
;; this do-loop doesn't have a body
)))
;; TALLY is a generic function for size retrieval, from RUTILS
(when (= (tally ht) size)
;; when the backing array is full
;; expand it to have the length equal to the next power of 2
(:= size (expt 2 (ceiling (log (1+ count) 2)))
@ht.array (make-array size))
;; and re-add its contents
(dovec (item array)
(add-item (car item) (cdr item)))
;; finally, add the new item
(add-item key val)))

(defun ht-rem (key ht)
;; here, we use the index of the item
;; returned as the 2nd value of HT-GET
;; (when-it is a so called anaphoric macro, from RUTILS,
;; that assigns the value of its first argument
;; to an implicitly created variable IT
;; and evaluates the body when IT isn't null)
(when-it (nth-value 2 (ht-get key ht))
(void (? ht 'array it))
;; return the index to indicate that the item was found
it))

To avoid constant resizing of the hash-table, just as with dynamic arrays, the backing array is, usually, allocated to have the size equal to a power of 2: 16 elements, to begin with. When it is filled up to a certain capacity it is resized to the next power of 2: 32, in this case. Usually, around 70-80% is considered peak occupancy as too collisions may happen afterward and the table access performance severely degrades. In practice, this means that normal open-addressing hash-tables also waste from 20 to 50 percent of allocated space. This inefficiency becomes a serious problem with large tables, so other implementation strategies become preferable when the size of data reaches tens and hundreds of megabytes. Note that, in our trivial implementation above, we have, effectively, used the threshold of 100% to simplify the code. Adding a configurable threshold is just a matter of introducing a parameter and initiating resizing not when (= (ht-count ht) size) but upon (= (ht-count ht) (floor size threshold)). As we've seen, resizing the hash-table requires calculating the new indices for all stored elements and adding them anew into the resized array.

Analyzing the complexity of the access function of the hash-table and proving that it is amortized O(1) isn't trivial. It depends on the properties of the hash-function, which should ensure good uniformity. Besides, the resizing threshold also matters: the more elements are in the table, the higher the chance of collisions. Also, you should keep in mind that if the keys possess some strange qualities that prevent them from being hashed uniformly, the theoretical results will not hold.

In short, if we consider a hash-table with 60% occupancy (which should be the average number, for a common table) we end up with the following probabilities:

  • probability that we'll need just 1 operation to access the item (i.e. the initially indexed slot is empty): 0.4
  • probability that we'll need 2 operations (the current slot is occupied, the next one is empty): (* 0.6 0.4) — 0.24
  • probability that we'll need 3 operations: (* (expt 0.6 2) 0.4) — 0.14
  • probability that we'll need 4 operations: (* (expt 0.6 3) 0.4) — 0.09

Actually, these calculations are slightly off and the correct probability of finding an empty slot should be somewhat lower, although the larger the table is, the smaller the deviation in the numbers. Finding out why is left as an exercise for the reader :)

As you see, there's a progression here. With probability around 0.87, we'll need no more than 4 operations. Without continuing with the arithmetic, I think, it should be obvious that we'll need, on average, around 3 operations to access each item and the probability that we'll need twice as many (6) is quite low (below 5%). So, we can say that the number of access operations is constant (i.e. independent of the number of elements in the table) and is determined only by the occupancy percent. So, if we keep the occupancy in the reasonable bounds, named earlier, on average, 1 hash code calculation/lookup and a couple of retrievals and equality comparisons will be needed to access an item in our hash-table.

Hash-Code

So, we can conclude that a hash-table is primarily parametrized by two things: the hash-function and the equality predicate. In Lisp, in particular, there's a choice of just the four standard equality predicates: eq, eql, equal, and equalp. It's somewhat of a legacy that you can't use other comparison functions so some implementations, as an extension, allow th programmer to specify other predicates. However, in practice, the following approach is sufficient for the majority of the hash-table use cases:

  • use the eql predicate if the keys are numbers, characters, or symbols
  • use equal if the keys are strings or lists of the mentioned items
  • use equalp if the keys are vectors, structs, CLOS objects or anything else containing one of those

But I'd recommend trying your best to avoid using the complex keys requiring equalp. Besides the performance penalty of using the heaviest equality predicate that performs deep structural comparison, structs, and vectors, in particular, will most likely hash to the same index. Here is a quote from one of the implementors describing why this happens:

Structs have no extra space to store a unique hash code within them. The decision was made to implement this because automatic inclusion of a hashing slot in all structure objects would have made all structs an average of one word longer. For small structs this is unacceptable. Instead, the user may define a struct with an extra slot, and the constructor for that struct type could store a unique value into that slot (either a random value or a value gotten by incrementing a counter each time the constructor is run). Also, create a hash generating function which accesses this hash-slot to generate its value. If the structs to be hashed are buried inside a list, then this hash function would need to know how to traverse these keys to obtain a unique value. Finally, then, build your hash-table using the :hash-function argument to make-hash-table (still using the equal test argument), to create a hash-table which will be well-distributed. Alternatively, and if you can guarantee that none of the slots in your structures will be changed after they are used as keys in the hash-table, you can use the equalp test function in your make-hash-table call, rather than equal. If you do, however, make sure that these struct objects don't change, because then they may not be found in the hash-table.

But what if you still need to use a struct or a CLOS object as a hash key (for instance, if you want to put them in a set)? There are three possible workarounds:

  • Choose one of their slots as a key (if you can guarantee its uniqueness).
  • Add a special slot to hold a unique value that will serve as a key.
  • Use the literal representation obtained by calling the print-function of the object. Still, you'll need to ensure that it will be unique and constant. Using an item that changes while being the hash key is a source of very nasty bugs, so avoid it at all cost.

These considerations are also applicable to the question of why Java requires defining both equals and hashCode methods for objects that are used as keys in the hash-table or hash-set.

Advanced Hashing Techniques

Beyond the direct implementation of open addressing, called "linear probing" (for it tries to resolve collisions by performing a linear scan for an empty slot), a number of approaches were proposed to improve hash distribution and reduce the collision rate. However, for the general case, their superiority remains questionable, and so the utility of a particular approach has to be tested in the context of the situations when linear probing demonstrates suboptimal behavior. One type of such situations occurs when the hash-codes become clustered near some locations due to deficiencies of either the hash-function or the keyset.

The simplest modification of linear probing is called "quadratic probing". It operates by performing the search for the next vacant slot using the linear probing offsets (or some other sequence of offsets) that are just raised to the power 2. I.e. if, with linear probing, the offset sequence was 1,2,3,etc, with the quadratic one, it is 1,4,9,... "Double hashing" is another simple alternative, which, instead of a linear sequence of offsets, calculates the offsets using another hash-function. This approach makes the sequence specific to each key, so the keys that map to the same location will have different possible variants of collision resolution. "2-choice hashing" also uses 2 hash-functions but selects the particular one for each key based on the distance from the original index it has to be moved for collision resolution.

More elaborate changes to the original idea are proposed in Cuckoo, Hopscotch, and Robin Hood caching, to name some of the popular alternatives. We won't discuss them now, but if the need arises to implement a non-standard hash-table it's worth studying all of those before proceeding with an idea of your own. Although, who knows, someday you might come up with a viable alternative technique, as well...

Hash-Functions

The class of possible hash-functions is very diverse: any function that sufficiently randomizes the key hashes will do. But what good enough means? One of the ways to find out is to look at the the pictures of the distribution of hashes. Yet, there are other factors that may condition the choice: speed, complexity of implementation, collision resistance (important for cryptographic hashes that we won't discuss in this book).

The good news is that, for most practical purposes, there's a single function that is both fast and easy to implement and understand. It is called FNV-1a.

(defparameter *fnv-primes*
'((32 . 16777619)
(64 . 1099511628211)
(128 . 309485009821345068724781371)
(256 . 374144419156711147060143317175368453031918731002211)))

(defparameter *fnv-offsets*
'((32 . 2166136261)
(64 . 14695981039346656037)
(128 . 144066263297769815596495629667062367629)
(256 . 100029257958052580907070968620625704837092796014241193945225284501741471925557)))

(defun fnv-1a (x &key (bits 32))
(assert (member bits '(32 64 128 256)))
(let ((rez (assoc1 bits *fnv-offsets*))
(prime (assoc1 bits *fnv-primes*)))
(dotimes (i (/ bits 8))
(:= rez (ldb (byte bits 0)
(* (logxor rez (ldb (byte 8 (* i 8)) x))
prime))))
rez))

The constants *fnv-primes* and *fnv-offsets* are precalculated up to 1024 bits (here, I used just a portion of the tables).

Note that, in this implementation, we use normal Lisp multiplication (*) that is not limited to fixed-size numbers (32-bit, 64-bit,...) so we need to extract only the first bits with ldb.

Also note that if you were to calculate FNV-1a with some online hash calculator you'd, probably, get a different result. Experimenting with it, I noticed that it is the same if we use only the non-zero bytes from the input number. This observation aligns well with calculating the hash for simple strings when each character is a single byte. For them the hash-function would look like the following:

(defun fnv-1a-str (str)
(let ((rez (assoc1 32 *fnv-offsets*))
(prime (assoc1 32 *fnv-primes*)))
(dovec (char str)
(:= rez (ldb 32 (* (logxor rez (ldb (byte 8 (* i 8))
(char-code char)))
prime))))
rez))

So, even such a simple hash-function has nuances in its implementation and it should be meticulously checked against some reference implementation or a set of expected results.

Alongside FNV-1a, there's also FNV-1, which is a slightly worse variation, but it may be used if we need to apply 2 different hash functions at once (like, in 2-way or double hashing).

What is the source of the hashing property of FNV-1a? Xors and modulos. Combining these simple and efficient operations is enough to create a desired level of randomization. Most of the other hash-functions use the same building blocks as FNV-1a. They all perform arithmetic (usually, addition and multiplication as division is slow) and xor'ing, adding into the mix some prime numbers. For instance, here's what the code for another popular hash-function "djb2" approximately looks like:

(defun djb2-str (str)
(let ((rez 5381) ; a DJB2 prime number
(loop :for char :across str :do
(:= rez (ldb 32 (+ (char-code char)
(ldb 32 (+ (ash rez 5)
rez)))))))
rez))

Operations

The generic key-value operations we have discussed in the previous chapter, obviously also apply to hash-tables. There are also specific low-level ones, defined by the Lisp standard. And it's worth mentioning that, in regards to hash-tables, I find the standard quite lacking so a lot of utilities were added as part of RUTILS. The reason for the deficiency in the stadard is, I believe, that when hash-tables had been added to Lisp they had been still pretty novel technology not widely adopted in the programming languages community. So there had been neither any significant experience using them, nor a good understanding of the important role they would play. Languages such as Python or Clojure as well as the ones that were designed even later, were developed with this knowledge already in mind. Yet, this situation doesn't pose insurmountable difficulty for Lisp users as the language provides advanced extension tools such as macros and reader macros, so the necessary parts can be added and, in fact, exist as 3rd-party extensions. Using them becomes just a question of changing your habits and adapting to more efficient approaches. The situation is different for the users of many other languages, such as Java users, who had to wait for the new major version of the language to get access to such things as literal hash-table initialization. The feature I consider to be crucially important to improving the level of code clarity, in the declarative paradigm.

Initialization

Normally, the hash-table can be created with make-hash-table, which has a number of configuration options, including :test (default: eql). Most of the implementations allow the programmer to make synchronized (thread-safe) hash-tables via another configuration parameter, but the variants of concurrency control will differ.

Yet, it is important to have a way to define hash-tables already pre-initialized with a number of key-value pairs, and make-hash-table can't handle this. Pre-initialized hash tables represent a common necessity for tables serving as dictionaries, and such pre-initialization greatly simplifies many code patterns. Thus RUTILS provides such a syntax (in fact, in 2 flavors) with the help of reader macros:

#{equal "foo" :bar "baz" 42}
#h(equal "foo" :bar "baz" 42)

Both of these expressions will expand into a call to make-hash-table with equal test and a two calls to set operation to populate the table with the kv-pairs "foo" :bar and "baz" 42. For this stuff to work, you need to switch to the appropriate readtable by executing: (named-readtables:in-readtable rutils-readtable).

The reader-macro to parse #h()-style literal readtables isn't very complicated. As all reader-macros, it operates on the character stream of the program text, processing one character at a time. Here is it's implementation:

(defun |#h-reader| (stream char arg)
(read-char stream) ; skip the open paren
;; we can also add a sanity check to ensure that this character
;; is indeed a #\(
(with (;; read-delimited-list is a standard library function
;; that reads items until a delimiter is encountered
;; and then returns them as a list of parsed Lisp objects
(sexp (read-delimited-list #\) stream t))
;; the idea is that the first element may be a hash-table
;; test function; in this case, the number of items in the
;; definition will be odd as each key-value pair should have
;; an even number of elements
(test (when (oddp (length sexp))
(first sexp)))
;; the rest of the values, after the possible test function,
;; are key-value pairs
(kvs (group 2 (if test (rest sexp) sexp)))
(ht (gensym)))
`(let ((,ht (make-hash-table :test ',(or test 'eql))))
;; iterate the tail of the KVS list (:on loop clause)
;; and, for each key-value pair, generate an expression
;; to add the value for the key in the resulting hash-table
,@(mapcar (lambda (kv)
`(:= (? ,ht ,(first kv)) ,(second kv)))
,kvs)
,ht)))

After such a function is defined, it can be plugged into the standard readtable:

(set-dispatch-macro-character #\# #\h '|#h-reader|)

Or it may be used in a named-readtable (you can learn how to do that, from the docs).

print-hash-table is the utility to perform the reverse operation — display hash-tables in the similar manner:

RUTILS> (print-hash-table #h(equal "foo" :bar "baz" 42))
#{EQUAL
"foo" :BAR
"baz" 42
}
#<HASH-TABLE :TEST EQUAL :COUNT 2 {10127C0003}>

The last line of the output is the default Lisp printed representation of the hash-table. As you see, it is opaque and doesn't display the elements of the table. RUTILS also allows switching to printing the literal representation instead of the standard one with the help of toggle-print-hash-table. However, this extension is intended only for debugging purposes as it is not fully standard-conforming.

Access

Accessing the hash-table elements is performed with gethash, which returns two things: the value at key and t when the key was found in the table, or two nils otherwise. By using (:= (gethash key ht) val) (or (:= (? ht key) val)) we can modify the stored value. Notice the reverse order of arguments of gethash compared to the usual order in most accessor functions, when the structure is placed first and the key second. However, gethash differs from generic ? in that it accepts an optional argument that is used as the default value if the requested key is not present in the table. In some languages, like Python, there's a notion of "default hash-tables" that may be initialized with a common default element. In Lisp, a different approach is taken. However, it's possible to easily implement default hash-tables and plug them into the generic-elt mechanism:

(defstruct default-hash-table
table
default-value)

(defun gethash-default (key ht)
(gethash key (? ht 'table) (? ht 'default-value)))

(defmethod generic-elt ((kv default-hash-table) key &rest keys)
(gethash-default key kv))

RUTILS also defines a number of aliases/shorthands for hash-table operations. As the # symbol is etymologically associated with hashes, it is used in the names of all these functions:

  • get# is a shorthand and a more distinctive alias for gethash
  • set# is an alias for (:= (gethash ...
  • getset# is an implementation of the common pattern: this operation either retrieves the value if the key is found in the table or calculates its third argument returns it and also sets it for the given key for future retrieval
  • rem# is an alias for remhash (remove the element from the table)
  • take# both returns the key and removes it (unlike rem# that only removes)
  • in# tests for the presence of the key in the table
  • also, p# is an abbreviated version of print-hash-table

Iteration

Hash-tables are unordered collections, in principle. But, still, there is always a way to iterate over them in some (unspecified) order. The standard utility for that is either maphash, which unlike map doesn't populate the resulting collection and is called just for the side effects, or the special loop syntax. Both are suboptimal, from several points of view, so RUTILS defines a couple of alternative options:

  • dotable functions in the same manner as dolist except that it uses two variables: for the key and the value
  • mapkv, mentioned in the previous chapter, works just like mapcar by creating a new result table with the same configuration as the hash-table it iterates over and assigns the results of invoking the first argument — the function of two elements — with each of the kv-pairs

Despite the absence of a predefined ordering, there are ways in which some order may be introduced. For example, in SBCL, the order in which the elements are added, is preserved by using additional vectors called index-vector and next-vector that store this information. Another option which allows forcing arbitrary ordering is to use the so-called Linked Hash-Table. It is a combination of a hash-table and a linked list: each key-value pair also has the next pointer, which links it to some other item in the table. This way, it is possible to have ordered key-values without resorting to tree-based structures. A poor man's linked hash-table can be created on top of the normal one with the following trick: substitute values by pairs containing a value plus a pointer to the next pair and keep track of the pointer to the first pair in a special slot.

(deftruct linked-hash-table-item
key
val
next)

(defstruct linked-hash-table
table
head
tail)

(defun gethash-linked (key ht)
(? ht 'table key 'val))

(defun sethash-linked (key ht val)
;; The initial order of items is the order of addition.
;; If we'd like to impose a different order,
;; we'll have to perform reordering after each addition
;; or implement a custom sethash function.
(with (((table head tail) ? ht)
(cur (gethash key table)))
(if cur
(:= (? cur 'val) val)
(let ((new (make-linked-hash-table-item
:key key :val val)))
(when (null head)
(:= (? ht 'head) new))
(:= (? ht 'tail)
(if tail
(:= (? ht 'tail 'next) new)
new))))))

(defmethod mapkv (fn (ht linked-hash-table))
(with ((rez (make-linked-hash-table
:table (hash-table :key (hash-table-key (? rez 'table)))))
(prev nil))
(do ((item (? rez 'head) (? item 'next)))
((null item))
(sethash-linked key rez (call fn (? item 'val))))))

The issue with this approach, as you can see from the code, is that we also need to store the key, and it duplicates the data also stored in the backing hash-table itself. So, an efficient linked hash-table has to be implemented from scratch using an array as a base instead of a hash-table.

Perfect Hashing

In the previous exposition, we have concluded that using hash-tables implies a significant level of reserved unused space (up to 30%) and inevitable collisions. Yet, if the keyset is static and known beforehand, we can do better: find a hash-function, which will exclude collisions (simple perfect hashing) and even totally get rid of reserved space (minimal perfect hashing, MPH). Although the last variant will still need extra space to store the additional information about the hash-functions, it may be much smaller: in some methods, down to ~3-4 bits per key, so just 5-10% overhead. Statistically speaking, constructing such a hash-function is possible. But the search for its parameters may require some trial and error.

Implementation

The general idea is simple, but how to find the appropriate hash-function? There are several approaches described in sometimes hard-to-follow scientific papers and a number of cryptic programs in low-level C libraries. At a certain point in time, I needed to implement some variant of an MPH so I read those papers and studied the libraries to some extent. Not the most pleasant process, I should confess. One of my twitter pals once wrote: "Looks like it's easier for people to read 40 blog posts than a single whitepaper." And, although he was putting a negative connotation to it, I recognized the statement as a very precise description of what a research engineer does: read a whitepaper (or a dozen, for what it's worth) and transform it into working code and — as a possible byproduct — into an explanation ("blog post") that other engineers will understand and be able to reproduce. And it's not a skill every software developer should be easily capable of. Not all papers can even be reproduced because the experiment was not set up correctly, some parts of the description are missing, the data is not available, etc. Of those, which, in principle, can be, only some are presented in the form that is clear enough to be reliably programmed.

Here is one of the variants of minimal perfect hashing that possesses such qualities. It works for datasets of any size as a 3-step process:

  1. At the first stage, by the use of a common hash-function (in particular, the Jenkins hash), all keys are near-uniformly distributed into buckets, so that the number of keys in each bucket doesn't exceed 256. It can be achieved with very high probability if the hash divisor is set to (ceiling (length keyset) 200). This allows the algorithm to work for data sets of arbitrary size, thereby reducing the problem to a simpler one that already has a known solution.
  2. Next, for each bucket, the perfect hash function is constructed. This function is a table (and it's an important mathematical fact that each discrete function is equivalent to a table, albeit, potentially, of unlimited length). The table contains byte-sized offsets for each hash code, calculated by another application of the Jenkins hash, which produces two values in one go (actually, three, but one of them is not used). The divisor of the hash-function, this time, equals to double the number of elements in the bucket. And the uniqueness requirement is that the sum of offsets corresponding, in the table, to the two values produced by the Jenkins hash is unique, for each key. To check if the constraint is satisfied, the hashes are treated as vertices of a graph, and if it happens to be acyclic (the probability of this event is quite high if the parameters are chosen properly), the requirement can be satisfied, and it is possible to construct the perfect hash function, by the process described as the next step. Otherwise, we change the seed of the Jenkins hash and try again until the resulting graph is acyclic. In practice, just a couple of tries are needed.
  3. Finally, the hash-function for the current bucket may be constructed from the graph by the CHM92 algorithm (named after the authors and the year of the paper), which is another version of perfect hashing but suitable only for limited keysets. Here, you can see the CHM92 formula implemented in code:
(defstruct mpht
(data nil :type simple-vector)
(offsets nil :type (simple-array octet))
(meta nil :type (simple-array quad))
(div nil))

;; div is the divisor of the top-level hash, which is calculated as:
;; (/ (1- (length meta)) 2)

(defun mpht-index (item mpht)
(with (((offsets meta div) ? mpht)
(bucket-id (* (mod (jenkins-hash item) div) 2))
(bucket-offset (? meta bucket-id))
(bucket-seed (? meta (+ 1 bucket-id)))
;; the number of items in the bucket is calculated
;; by substracting the offset of the next bucket
;; from the offset of the current one
(bucket-count (- (? meta (+ 2 bucket-id))
bucket-offset))
(hash1 hash2 (jenkins-hash2 item bucket-seed bucket-div))
(base (* bucket-offset 2)))
(+ bucket-offset (mod (+ (? offsets (+ base hash1))
(? offsets (+ base hash2)))
bucket-count))))

This algorithm guarantees exactly O(1) hash-table access and uses 2 bytes per key, i.e. it will result in a constant 25% overhead on the table's size (in a 64-bit system): 2 byte-sized offsets for the hashes plus negligible 8 bytes per bucket (each bucket contains ~200 elements) for meta information. Better space-utilization solutions (up to 4 times more efficient) exist, but they are harder to implement and explain.

The Jenkins hash-function was chosen for two reasons:

  • Primarily, because, being a relatively good-quality hash, it has a configurable parameter seed that is used for probabilistic probing (searching for an acyclic graph). On the contrary, FNV-1a doesn't work well with an arbitrary prime hence the usage of a pre-calculated one that isn't subject to change.
  • Also, it produces 3 pseudo-random numbers right away, and we need 2 for the second stage of the algorithm.

The CHM92 Algorithm

The CHM92 algorithm operates by performing a depth-first search (DFS) on the graph, in the process, labeling the edges with unique numbers and calculating the corresponding offset for each of the Jenkins hash values. In the picture, you can see one of the possible labelings: each vertex is the value of one of the two hash-codes returned by jenkins-hash2 for each key, and every edge, connecting them, corresponds to a key that produced the hashes. The unique indices of the edges were obtained during DFS. Now, each hash-code is mapped iteratively to the number that is (- edge-index other-vertex-index). So, some codes will map to the same number, but it is guaranteed that, for each key, the sum of two corresponding numbers will be unique (as the edge indices are unique).

CHM92 is an example of the probabilistic algorithms we will discuss in more detail near the end of the book.

Let's say we have implemented the described scheme like I did in the const-table library. Now, we need to perform the measurements to validate that we have, in fact, achieved the desired improvement over the standard hash-table implementation. In this case, we are interested not only in speed measurements, which we already know how to perform but also in calculating the space occupied.

The latter goal is harder to achieve. Usually, most of the programming languages will provide the analog of a sizeof function that returns the space occupied by an array, a structure or an object. Here, we're interested not in "shallow" sizeof but in a "deep" one that will descend into the structure's slots and add their sizes recursively.

First, let's create functions to populate the tables with a significant number of random string key-value pairs.

(defun random-string (size)
(coerce (loop :repeat size :collect (code-char (+ 32 (random 100))))
'string))

(defun random-hash-table (&key (n 100000))
(let ((rez (make-hash-table :test 'equal)))
(loop :repeat n :do
(:= (? rez (random-string (+ 3 (random 4))))
(random-string (+ 3 (random 4)))))
rez))

(defun random-const-table (&key (n 100000))
(let ((rez (make-const-table :test 'equal)))
(loop :repeat n :do
(:= (? rez (random-string (+ 3 (random 4))))
(random-string (+ 3 (random 4)))))
rez))

A very approximate space measurement may be performed using the standard operator room. But it doesn't provide detailed per-object statistics. Here's a result of the room measurement, in SBCL (the format of the report will be somewhat different, for each implementation):

CL-USER> (room)
Dynamic space usage is: 45,076,224 bytes.
Immobile space usage is: 18,998,832 bytes (64,672 bytes overhead).
Read-only space usage is: 0 bytes.
Static space usage is: 1,264 bytes.
Control stack usage is: 9,048 bytes.
Binding stack usage is: 640 bytes.
Control and binding stack usage is for the current thread only.
Garbage collection is currently enabled.

Breakdown for dynamic space:
11,369,232 bytes for 76,040 simple-vector objects
9,095,952 bytes for 160,669 instance objects
8,289,568 bytes for 518,098 cons objects
3,105,920 bytes for 54,655 simple-array-unsigned-byte-8 objects
2,789,168 bytes for 54,537 simple-base-string objects
2,344,672 bytes for 9,217 simple-character-string objects
6,973,472 bytes for 115,152 other objects

43,967,984 bytes for 988,368 dynamic objects (space total)

Breakdown for immobile space:
16,197,840 bytes for 24,269 code objects
1,286,496 bytes for 26,789 symbol objects
1,041,936 bytes for 27,922 other objects

18,526,272 bytes for 78,980 immobile objects (space total)


CL-USER> (defparameter *ht* (random-hash-table))
*HT*
CL-USER> (room)
...
Breakdown for dynamic space:
13,349,920 bytes for 77,984 simple-vector objects
11,127,008 bytes for 208,576 simple-character-string objects
9,147,824 bytes for 161,469 instance objects
8,419,360 bytes for 526,210 cons objects
3,517,792 bytes for 2,997 simple-array-unsigned-byte-32 objects
3,106,288 bytes for 54,661 simple-array-unsigned-byte-8 objects
7,671,168 bytes for 166,882 other objects

56,339,360 bytes for 1,198,779 dynamic objects (space total)

So, it seems like we added roughly 10 megabytes by creating a hash-table with 100,000 random 5-9 character keys and values. Almost all of that space went into the keys and values themselves — 9 Mb ("11,127,008 bytes for 208,576 simple-character-string objects" versus "2,344,672 bytes for 9,217 simple-character-string objects" — a bit less than 200,000 new strings were added).

Also, if we examine the hash-table, we can see that its occupancy is rather high — around 90%! (The number of keys 99706 instead of 10000 tells us that there was a small portion of duplicate keys among the randomly generated ones).

CL-USER> (describe *ht*)
#<HASH-TABLE :TEST EQUAL :COUNT 99706 {1002162EF3}>
[hash-table]

Occupancy: 0.9
Rehash-threshold: 1.0
Rehash-size: 1.5
Size: 111411

And now, a simple time measurement:

CL-USER> (let ((keys (keys *ht*)))
(time (loop :repeat 100 :do
(dolist (k keys)
(gethash k *ht*)))))
Evaluation took:
0.029 seconds of real time
0.032000 seconds of total run time (0.032000 user, 0.000000 system)
110.34% CPU
72,079,880 processor cycles
0 bytes consed

Now, let's try the const-tables that are the MPHT implementation:

СL-USER> (time (defparameter *ct* (cstab:build-const-table *ht*)))
...................................................................................................
Evaluation took:
0.864 seconds of real time
...
СL-USER> (room)
...
Breakdown for dynamic space:
14,179,584 bytes for 78,624 simple-vector objects
11,128,464 bytes for 208,582 simple-character-string objects
9,169,120 bytes for 161,815 instance objects
8,481,536 bytes for 530,096 cons objects
3,521,808 bytes for 2,998 simple-array-unsigned-byte-32 objects
3,305,984 bytes for 54,668 simple-array-unsigned-byte-8 objects
7,678,064 bytes for 166,992 other objects

57,464,560 bytes for 1,203,775 dynamic objects (space total)

Another megabyte was added for the metadata of the new table, which doesn't seem significantly different from the hash-table version. Surely, often we'd like to be much more precise in space measurements. For this, SBCL recently added an allocation profiler sb-aprof, but we won't go into the details of its usage, in this chapter.

And now, time measurement:

CL-USER> (let ((keys (keys *ht*)))
(time (loop :repeat 100 :do
(dolist (k keys)
(cstab:csget k *ct*)))))
Evaluation took:
3.561 seconds of real time

Oops, a two-orders-of-magnitude slowdown! Probably, it has to do with many factors: the lack of optimization in my implementation compared to the one in SBCL, the need to calculate more hashes and with a slower hash-function, etc. I'm sure that the implementation may be sped up at least an order of magnitude, but, even then, what's the benefit of using it over the default hash-tables? Especially, considering that MPHTs have a lot of moving parts and rely on a number of "low-level" algorithms like graph traversal or efficient membership testing, most of which need a custom efficient implementation...

Still, there's one dimension in which MPHTs may provide an advantage: significantly reduce space usage by not storing the keys. Though, it becomes problematic if we need to distinguish the keys that are in the table from the unknown ones as those will also hash to some index, i.e. overlap with an existing key. So, either the keyspace should be known beforehand and exhaustively covered in the table or some precursory membership test is necessary when we anticipate the possibility of unseen keys. Yet, there are ways to perform the test efficiently (exactly or probabilistically), which require much less storage space than would be needed to store the keys themselves. Some of them we'll see in the following chapters.

If the keys are omitted, the whole table may be reduced to a Jump-table. Jump-tables are a low-level trick possible when all the keys are integers in the interval [0, n). It removes the necessity to perform sequential equality comparisons for every possible branch until one of the conditions matches: instead, the numbers are used directly as an offset. I.e. the table is represented by a vector, each hash-code being the index in that vector.

A jump-table for the MPHT will be simply a data array, but sometimes evaluation of different code is required for different keys. Such more complex behavior may be implemented in Lisp using the lowest-level operators tagbody and go (and a bit of macrology if we need to generate a huge table). This implementation will be a complete analog of the C switch statement. The skeleton for such "executable" table will look like this, where 0, 1,... are goto labels:

(block nil
(tagbody (go key)
0 (return (do-something0))
1 (return (do-something1))
...))

Distributed Hash-Tables

Another active area of hash-table-related research is algorithms for distributing them over the network. This is a natural way to represent a lot of datasets, and thus there are numerous storage systems (both general- and special-purpose) which are built as distributed hash-tables. Among them are, for instance, Amazon DynamoDB or an influential open-source project Kademlia. We will discuss in more detail, in the chapter on Distributed Algorithms, some of the technologies developed for this use case, and here I wanted to mention just one concept.

Consistent Hashing addresses the problem of distributing the hash-codes among k storage nodes under the real-world limitations that some of them may become temporarily unavailable or new peers may be added into the system. The changes result in changes of the value of k. The straightforward approach would just divide the space of all codes into k equal portions and select the node into whose portion the particular key maps. Yet, if k is changed, all the keys need to be rehashed, which we'd like to avoid at all cost as rehashing the whole database and moving the majority of the keys between the nodes, at once, will saturate the network and bring the system to a complete halt.

The idea or rather the tweak behind Consistent Hashing is simple: we also hash the node ids and store the keys on the node that has the next hash-code larger than the hash of the key (modulo n, i.e. wrap around 0). Now, when a new node is added, it is placed on this so-called "hash ring" between two other peers, so only part of the keys from a single node (the next on the ring) require being redistributed to it. Likewise, when the node is removed, only its keys need to be reassigned to the next peer on the ring (it is supposed that the data is stored in multiple copies on different nodes, so when one of the nodes disappears the data doesn't become totally lost).

The only problem with applying this approach directly is the uneven distribution of keys originating from uneven placement of the hash-codes of the nodes on the hash ring. This problem can be solved with another simple tweak: have multiple ids for each node that will be hashed to different locations, effectively emulating a larger number of virtual nodes, each storing a smaller portion of the keys. Due to the randomization property of hashes, not so many virtual nodes will be needed, to obtain a nearly uniform distribution of keys over the nodes.

A more general version of this approach is called Rendezvous Hashing. In it, the key for the item is combined with the node id for each node and then hashed. The largest value of the hash determines the designated node to store the item.

Hashing in Action: Content Addressing

Hash-tables are so ubiquitous that it's, actually, difficult to single out some peculiar use case. Instead, let's talk about hash-functions. They can find numerous uses beyond determining the positions of the items in the hash-table, and one of them is called "content addressing": globally identify a piece of data by its fingerprint instead of using external meta information like name or path. This is one of the suggested building blocks for large-scale distributed storage systems, but it works locally, as well: your git SCM system silently uses it behind the scenes to identify the changesets it operates upon.

The advantages of Content Addressing are:

  • Potential for space economy: if the system has a chance of operating on repeated items (like git does, although it's not the only reason for choosing such naming scheme for blobs: the other being the lack of a better variant), content addressing will make it possible to avoid storing them multiple times.
  • It guarantees that the links will always return the same content, regardless of where it is retrieved from, who added it to the network, how and when. This enables such distributed protocols as BitTorrent that split the original file into multiple pieces, each one identified by its hash. These pieces can be distributed in an untrusted network.
  • As mentioned above, content addressing also results in a conflict-free naming scheme (provided that the hash has enough bits — usually, cryptographic hashes such as SHA-1 are used for this purpose, although, in many cases, such powerful hash-functions are an overkill).

Take-aways

This chapter resented a number of complex approaches that require a lot of attention to detail to be implemented efficiently. On the surface, the hash-table concept may seem rather simple, but, as we have seen, the production-grade implementations are not that straightforward. What general conclusions can we make?

  1. In such mathematically loaded areas as hash-function and hash-table implementation, rigorous testing is critically important. For there is a number of unexpected sources of errors: incorrect implementation, integer overflow, concurrency issues, etc. A good testing strategy is to use an already existing trusted implementation and perform a large-scale comparison testing with a lot of random inputs. We haven't discussed the testing code here but will return to the practical implementation of such testing frameworks in the following chapters.
  2. Besides, a correct implementation doesn't necessarily mean a fast one. Low-level optimization techniques play a crucial role here.
  3. In the implementation of MPHT, we have seen in action another important approach to solving algorithmic and, more generally, mathematic problems: reducing them to a problem that has a known solution.
  4. Space measurement is another important area of algorithms evaluation that is somewhat harder to accomplish than runtime profiling. We'll also see more usage of both of these tools throughout the book.

Nicolas HafnerLibrary Policy Update - Confession 88

· 48 days ago

header
Recently there's been a few changes to the libraries I maintain, as well as to the new libraries I publish. I thought that these changes would be of general interest, even if you might not use my libraries at all.

License Switch from Artistic 2 to Zlib

Previously all of my libraries were published under the Artistic 2 license. However, a problem was brought to my attention in that license, specifically prevailing to Lisp distributions. Namely, §8 in the license text:

8) You are permitted to link Modified and Standard Versions with other works, to embed the Package in a larger work of your own, or to build stand-alone binary or bytecode versions of applications that include the Package, and Distribute the result without restriction, provided the result does not expose a direct interface to the Package.

The problem being this, in specific:

provided the result does not expose a direct interface to the Package.

This seems to prohibit distributing an application that would expose something like a REPL or a scripting interface that allows the user to interface with the library. I'm not sure what "direct" means in this context, nor what "linking" really means for Lisp - the GPL licenses have a similar issue. Either way, the implications of this restriction are severe enough that I was convinced to abandon the license.

I have since changed the license of all of my libraries and projects to Zlib. Everyone is also hereby permitted to use any prior versions of my projects that were licensed under Artistic 2 under the zlib license terms.

Why Zlib and not MIT or BSD? Simply because I like the clauses that prohibit claiming credit for the software. In any case, I hope that alleviates potential concerns people had about using my software due to the license terms.

Fully Qualified Domain Names for Packages

Since the package local nicknames extension to Common Lisp is now supported widely enough for my tastes, I have decided to drop the short nickname I used to include for packages so far. All of my packages have always included a FQDN name, but also included a shorter nickname for convenience. Newly published libraries will not do this anymore.

You are now encouraged to use the packages by adding local nicknames to your own package definitions. For instance, to use the new Classowary library, you would do something like this:

(defpackage #:org.my.stuff.package
  (:use #:cl)
  (:local-nicknames
    (#:cass #:org.shirakumo.classowary)))

If you use an implementation - like Allegro or LispWorks - that does not support package local nicknames yet, please contact support and request the addition of this feature. It should not be a difficult feature to add, and there is a comprehensive test suite available to aid the process.

If you are a user of my packages and have so far been using the short name, please update your own packages to use a nickname. You won't have to change any other code as long as you do that. You should do this because I would like to alleviate the package namespace pollution problem by removing the short, global aliases from my libraries in the future. Thus, consider the short names to be officially deprecated.

Closing Thoughts

I hope that with these changes I can help push the ecosystem in a positive direction. I would also like to remind everyone of the portability effort. I think the variety of implementations on offer is a big asset to Common Lisp, and I would very much like it to be possible for people to write high quality libraries that work on a variety of implementations without having to expel a huge amount of time repeating the porting work. As such, it would be amazing if people could help to improve existing portability libraries and implementations to extend support coverage.

More articles and Lispy things coming soon, I hope!

Lispers.deLisp-Meetup in Hamburg on Monday, 2nd September 2019

· 53 days ago

We meet at Ristorante Opera, Dammtorstraße 7, Hamburg, starting around 19:00 CEST on 2nd September 2019.

This is an informal gathering of Lispers of all experience levels.

Vsevolod DyomkinProgramming Algorithms: Key-Values

· 53 days ago

To conclude the description of essential data structures, we need to discuss key-values (kvs), which are the broadest family of structures one can imagine. Unlike arrays and lists, kvs are not concrete structures. In fact, they span, at least in some capacity, all of the popular concrete ones, as well as some obscure.

The main feature of kvs is efficient access to the values by some kind of keys that they are associated with. In other words, each element of such data structure is a key-value pair that can be easily retrieved if we know the key, and, on the other hand, if we ask for the key that is not in the structure, the null result is also returned efficiently. By "efficiently", we usually mean O(1) or, at least, something sublinear (like O(log n)), although, for some cases, even O(n) retrieval time may be acceptable. See how broad this is! So, a lot of different structures may play the role of key-values.

By the way, there isn't even a single widely-adopted name for such structures. Besides key-values — which isn't such a popular term (I derived it from key-value stores) — in different languages, they are called maps, dictionaries, associative arrays, tables, objects and so on.

In a sense, these are the most basic and essential data structures. They are so essential that some dynamic languages — for example, Lua, explicitly, and JavaScript, without a lot of advertisement — rely on them as the core (sometimes sole) language's data structure. Moreover, key-values are used almost everywhere. Below is a list of some of the most popular scenarios:

  • implementation of the object system in programming languages
  • most of the key-value stores are, for the most part, glorified key-value structures
  • internal tables in the operating system (running process table or file descriptor tables in the Linux kernel), programming language environment or application software
  • all kinds of memoization and caching
  • efficient implementation of sets
  • ad hoc or predefined records for returning aggregated data from function calls
  • representing various dictionaries (in language processing and beyond)

Considering such a wide spread, it may be surprising that, historically, the programming language community only gradually realized the usefulness of key-values. For instance, such languages as C and C++ don't have the built-in support for general kvs (if we don't count structs and arrays, which may be considered significantly limited versions). Lisp, on the contrary, was to some extent pioneering their recognition with the concepts of alists and plists, as well as being one of the first languages to have hash-table support in the standard.

Concrete Key-values

Let's see what concrete structures can be considered key-values and in which cases it makes sense to use them.

Simple Arrays

Simple sequences, especially arrays, may be regarded as a particular variant of kvs that allows only numeric keys with efficient (and fastest) constant-time access. This restriction is serious. However, as we'll see below, it can often be worked around with clever algorithms. As a result, arrays actually play a major role in the key-value space, but not in the most straightforward form. Although, if it is possible to be content with numeric keys and their number is known beforehand, vanilla arrays are the best possible implementation option. Example: OS kernels that have a predefined limit on the number of processes and a "process table" that is indexed by pid (process id) that lies in the range 0..MAX_PID.

So, let's note this curious fact that arrays are also a variant of key-values.

Associative Lists

The main drawback of using simple arrays for kvs is not even the restriction that all keys should somehow be reduced to numbers, but the static nature of arrays, that do not lend themselves well to resizing. As an alternative, we could then use linked lists, which do not have this restriction. If the key-value contains many elements, linked lists are clearly not ideal in terms of efficiency. Many times, the key-value contains very few elements, perhaps only half a dozen or so. In this case, even a linear scan of the whole list may not be such an expensive operation. This is where various forms of associative lists enter the scene. They store pairs of keys and values and don't impose any restrictions, neither on the keys nor on the number of elements. But their performance quickly degrades below acceptable once the number of elements grows above several. Many flavors of associative lists can be invented. Historically, Lisp supports two variants in the standard library:

  • alists (association lists) are lists of cons pairs. A cons pair is the original Lisp data structure, and it consists of two values called the car and the cdr (the names come from two IBM machine instructions). Association lists have dedicated operations to find a pair in the list (assoc) and to add an item to it (pairlis), although, it may be easier to just push the new cons cell onto it. Modification may be performed simply by altering the cdr of the appropriate cons-cell. ((:foo . "bar") (42 . "baz")) is an alist of 2 items with keys :foo and 42, and values "bar" and "baz". As you can see, it's heterogenous in a sense that it allows keys of arbitrary type.
  • plists (property lists) are flat lists of alternating keys and values. They also have dedicated search (getf) and modify operations (setf getf), while insertion may be performed by calling push twice (on the value, and then the key). The plist with the same data as the previous alist will look like this: (:foo "bar" 42 "baz"). Plists are used in Lisp to represent the keyword function arguments as a whole.

Deleting an item from such lists is quite efficient if we already know the place that we want to clear, but tracking this place if we haven't found it yet is a bit cumbersome. In general, the procedure will be to iterate the list by tails until the relevant cons cell is found and then make the previous cell point to this one's tail. A destructive version for alists will look like this:

(defun alist-del (key alist)
(loop :for tail := alist :then (rest tail)
:for prev := alist :then tail
:when (eql key (car (first tail)))
:do (return (if (eql prev alist) ; special case of first item
(rest alist)
(progn (setf (rest prev) (rest tail))
alist)))))

However, the standard provides higher-level delete operations for plists (remf) and alists: (remove key alist :key 'car).

Both of these ad-hoc list-based kvs have some historical baggage associated with them and are not very convenient to use. Nevertheless, they can be utilized for some simple scenarios, as well as for interoperability with the existing language machinery. And, however counter-intuitive it may seem, if the number of items is small, alists may be the most efficient key-value data structure.

Another nonstandard but more convenient and slightly more efficient variant of associatie lists was proposed by Ron Garret and is called dlists (dictionary lists). It is a cons-pair of two lists: the list of keys and the list of values. The dlist for our example will look like this: ((:foo 42) . ("bar" "baz")).

As the interface of different associative lists is a thin wrapper over the standard list API, the general list-processing knowledge can be applied to dealing with them, so we won't spend any more time describing how they work.

Hash-Tables

Hash-tables are, probably, the most common way to do key-values, nowadays. They are dynamic and don't impose restrictions on keys while having an amortized O(1) performance albeit with a rather high constant. The next chapter will be exclusively dedicated to hash-table implementation and usage. Here, it suffices to say that hash-tables come in many different flavors, including the ones that can be efficiently pre-computed if we want to store a set of items that is known ahead-of-time. Hash-tables are, definitely, the most versatile key-value variant and thus the default choice for such a structure. However, they are not so simple and may pose a number of surprises that the programmer should understand in order to use them properly.

Structs

Speaking of structs, they may also be considered a special variant of key-values with a predefined set of keys. In this respect, structs are similar to arrays, which have a fixed set of keys (from 0 to MAX_KEY). As we already know, structs internally map to arrays, so they may be considered a layer of syntactic sugar that provides names for the keys and handy accessors. Usually, the struct is pictured not as a key-value but rather a way to make the code more "semantic" and understandable. Yet, if we consider returning the aggregate value from a function call, as the possible set of keys is known beforehand, it's a good stylistic and implementation choice to define a special-purpose one-off struct for this instead of using an alist or a hash-table. Here is a small example — compare the clarity of the alternatives:

(defun foo-adhoc-list (arg)
(let ((rez (list)))
...
(push "hello" rez)
...
(push arg rez)
...
rez))

CL-USER> (foo-adhoc-list 42)
(42 "hello")

(defun foo-adhoc-hash (arg)
(let ((rez (make-hash-table)))
...
(:= (gethash :baz rez) "hello")
...
(:= (gethash :quux rez) arg))
...
rez))

CL-USER> (foo-adhoc-hash 42)
#<HASH-TABLE :TEST EQL :COUNT 2 {1040DBFE83}>

(defstruct foo-rez
baz quux)

(defun foo-struct (&rest args)
(let ((rez (make-foo-rez)))
...
(:= (foo-baz rez) "hello")
...
(:= (foo-quux rez) 42))
...
rez))

CL-USER> (foo-struct 42)
#S(FOO-REZ :BAZ "hello" :QUUX 42)

Trees

Another versatile option for implementing kvs is by using trees. There are even more tree variants than hash-tables and we'll also have dedicated chapters to study them. Generally, the main advantage of trees, compared to simple hash-tables, is the possibility to impose some ordering on the keys (although, linked hash-tables also allow for that), while the disadvantage is less efficient operation: O(log n). Also, trees don't require hashing. Another major direction that the usage of trees opens is the possibility of persistent key-values implementation. Some languages, like Java, have standard-library support for tree-based kvs (TreeMap), but most languages delegate dealing with such structures to library authors for there is a wide choice of specific trees and neither may serve as the default choice of a key-value structure.

KV Operations

The primary operation for a kv structure is access to its elements by key: to set, change, and remove. As there are so many different variants of concrete kvs there's a number of different low-level access operations, some of which we have already discussed in the previous chapters and the others will see in the next ones.

Yet, most of the algorithms don't necessarily require the efficiency of built-in accessors, while their clarity will seriously benefit from a uniform generic access operation. Such an operation, as we have already mentioned, is defined by RUTILS and is called generic-elt or ?, for short. We have already seen it in action in some of the examples before. And that's not an accident as kv access is among the most frequent o. In the following chapter, we will stick to the rule of using the specific accessors like gethash when we are talking about some structure-specific operations and ? in all other cases — when clarity matters more than low-level considerations. ? is implemented using the CLOS generic function machinery that provides dynamic dispatch to a concrete retrieval operation and allows defining additional variants for new structures as the need arises. Another useful feature of generic-elt is chaining that allows expressing multiple accesses as a single call. This comes in very handy for nested structures. Consider an example of accessing the first element of the field of the struct that is the value in some hash table: (? x :key 0 'field). If we were to use concrete operations it would look like this: (slot-value (nth 0 (gethash :key x)) 'field).

Below is the backbone of the generic-elt function that handles chaining and error reporting:

(defgeneric generic-elt (obj key &rest keys)
(:documentation
"Generic element access in OBJ by KEY.
Supports chaining with KEYS.")
(:method :around (obj key &rest keys)
(reduce #'generic-elt keys :initial-value (call-next-method obj key)))
(:method (obj key &rest keys)
(declare (ignore keys))
(error 'generic-elt-error :obj obj :key key)))

And here are some methods for specific kvs (as well as sequences):

(defmethod generic-elt ((obj hash-table) key &rest keys)
(declare (ignore keys))
(gethash key obj))

(defmethod generic-elt ((obj vector) key &rest keys)
(declare (ignore keys))
;; Python-like handling of negative indices as offsets from the end
(when (minusp key) (setf key (- (length obj) key)))
(aref obj key))

(defmethod generic-elt ((obj (eql nil)) key &rest keys)
(declare (ignore key keys))
(error "Can't access NIL with generic-elt!"))

generic-setf is a complement function that allows defining setter operations for generic-elt. There exists a built-in protocol to make Lisp aware that generic-setf should be called whenever := (or the standard setf) is invoked for the value accessed with ?: (defsetf ? generic-setf).

It is also common to retrieve all keys or values of the kv, which is handled in a generic way by the keys and vals RUTILS functions.

Key-values are not sequences in a sense that they are not necessarily ordered, although some variants are. But even unordered kvs may be traversed in some random order. Iterating over kvs is another common and essential operation. In Lisp, as we already know, there are two complimentary iteration patterns: the functional map- and the imperative do-style. RUTILS provides both of them as mapkv and dokv, although I'd recommend to first consider the macro dotable that is specifically designed to operate on hash-tables.

Finally, another common necessity is the transformation between different kv representations, primarily, between hash-tables and lists of pairs, which is also handled by RUTILS with its ht->pairs/ht->alist and pairs->ht/alist->ht functions.

As you see, the authors of the Lisp standard library hadn't envisioned the generic key-value access protocols, and so it is implemented completely in a 3rd-party addon. Yet, what's most important is that the building blocks for doing that were provided by the language, so this case shows the critical importance that these blocks (primarily, CLOS generic functions) have in future-proofing the language's design.

Memoization

One of the major use cases for key-values is memoization — storing the results of previous computations in a dedicated table (cache) to avoid recalculating them. Memoization is one of the main optimization techniques; I'd even say the default one. Essentially, it trades space for speed. And the main issue is that space is also limited so memoization algorithms are geared towards optimizing its usage to retain the most relevant items, i.e. maximize the probability that the items in the cache will be reused.

Memoization may be performed ad-hoc or explicitly: just set up some key scheme and a table to store the results and add/retrieve/remove the items as needed. It can also be delegated to the compiler in the implicit form. For instance, Java or Python provide the @memoize decorator: once it is used with the function definition, each call to it will pass through the assigned cache using the call arguments as the cache keys. This is how the same feature may be implemented in Lisp, in the simplest fashion:

(defun start-memoizing (fn)
(stop-memoizing fn)
(:= (symbol-function fn)
(let ((table (make-hash-table :test 'equal))
(vanilla-fn (symbol-function fn)))
(:= (get fn :cache) table
(get fn :fn) vanilla-fn)
(lambda (&rest args)
(getset# (format nil "~{~A~^|~}" args)
table
(apply vanilla-fn args))))))

(defun stop-memoizing (fn)
(when (get fn :fn)
(:= (symbol-function fn) (get fn :fn)
(get fn :fn) nil)))

CL-USER> (defun foo (x)
(sleep 5)
x)
CL-USER> (start-memoizing 'foo)
CL-USER> (time (foo 1))
Evaluation took:
5.000 seconds of real time
CL-USER> (time (foo 1))
Evaluation took:
0.000 seconds of real time
CL-USER> (time (foo 2))
Evaluation took:
5.001 seconds of real time

We use a hash-table to store the memoized results. The getset# macro from RUTILS tries to retrieve the item from the table by key and, if it's not present there, performs the calculation given as its last argument returning its result while also storing it in the table at key. Another useful Lisp feature utilized in this facility is called "symbol plist": every symbol has an associated key-value plist. Items in this plist can be retrieved using the get operator.[1]

This approach is rather primitive and has a number of drawbacks. First of all, the hash-table is not limited in capacity. Thus if it is used carelessly, a memory-leak is inevitable. Another possible issue may occur with the keys, which are determined by simply concatenating the string representations of the arguments — possibly, non-unique. Such bug may be very subtle and hard to infer. Overall, memoization is the source of implicit behavior that always poses potential trouble but sometimes is just necessary. A more nuanced solution will allow us to configure both how the keys are calculated and various parameters of the cache, which we'll discuss next. One more possible decision to make might be about what to cache and what not: for example, we could add a time measurement around the call to the original function and only when it exceeds a predefined limit the results will be cached.

Cache Invalidation

The problem of cache invalidation arises when we set some limit on the size of the cache. Once it is full — and a properly setup cache should be full, effectively, all the time — we have to decide which item to remove (evict) when we need to put a new one in the cache. I've already mentioned the saying that (alongside naming things) it is the hardest challenge in computer science. In fact, it's not, it's rather trivial, from the point of view of algorithms. The hard part is defining the notion of relevance. There are two general approximations which are used unless there are some specific considerations: frequency of access or time of last access. Let's see the algorithms built around these. Each approach uses some additional data stored with each key. The purpose of the data is to track one of the properties, i.e., either frequency of access or time of last access.

Second Chance and Clock Algorithms

The simplest approach to cache invalidation except for random choice eviction may be utilized when we are severely limited in the amount of additional space we can use per key. Usually, this situation is typical for hardware caches. The minimal possible amount of information to store is 1 bit. If we have just as much space, the only option we have is to use it as a flag indicating whether the item was accessed again after it was put into the cache. This technique is very fast and very simple. And improves cache performance to some extent. There may be two ways of tracking this bit efficiently:

  1. Just use a bit vector (usually called "bitmap", in such context) of the same length as the cache size. To select the item for eviction, find the first 0 from the left or right. With the help of one of the hardware instructions from the bit scan family (ffs — find first zero, clz — count trailing zeros, etc.), this operation can be blazingly fast. In Lisp, we could use the high-level function position:
    (defun find-candidate-second-chance (bitmap)
    (declare (type bit-vector bitmap))
    (position 0 bitmap))

    The type declaration is necessary for the implementation to emit the appropriate machine instruction. If you're not confident in that, just disassemble the function and look at the generated machine code:

    CL-USER> (disassemble 'find-candidate-second-chance)
    ; disassembly for FIND-CANDIDATE-SECOND-CHANCE
    ; Size: 228 bytes. Origin: #x103A8E42F0
    ...
    ; 340: B878D53620 MOV EAX, #x2036D578 ; #<FDEFN SB-KERNEL:%BIT-POSITION/0>
    ...

    So, SBCL uses sb-kernel:%bit-position/0, nice. If you look inside this function, though, you'll find out that it's also pretty complicated. And, overall, there are lots of other assembler instructions in this piece, so if our goal is squeezing the last bit out of it there's more we can do:

    • Force the implementation to optimize for speed: put (declaim (optimize (speed 3) (debug 0) (safety 1))) at the top of the file with the function definition or use proclaim in the REPL with the same declarations.
    • Use the low-level function sb-kernel:%bit-position/0 directly.
    • Go even deeper and use the machine instruction directly — SBCL allows that as well: (sb-vm::%primitive sb-vm::unsigned-word-find-first-bit x). But this will be truly context-dependent (on the endianness, hardware architecture, and the size of the bit vector itself, which should fit into a machine word for this technique to work).

    However, there's one problem with the function find-candidate-second-chance: if all the bits are set it will return nil. By selecting the first element (or even better, some random element), we can fix this problem. Still, eventually, we'll end up with all elements of the bitmap set to 1, so the method will degrade to simple random choice. It means that we need to periodically reset the bit vector. Either on every eviction — this is a good strategy if we happen to hit the cache more often than miss. Or after some number of iterations. Or after every bit is set to 1.

  2. Another method for selecting a candidate to evict is known as the Clock algorithm. It keeps examining the visited bit of each item, in a cycle: if it's equal to 1 reset it and move to the next item; if it's 0 — select the item for eviction. Basically, it's yet another strategy for dealing with the saturation of the bit vector. Here's how it may be implemented in Lisp with the help of the closure pattern: the function keeps track of its internal state, using a lexical variable that is only accessible from inside the function, and that has a value that persists between calls to the function. The closure is created by the let block and the variable closed over is i, here:
  3. (let ((i 0))
    (defun find-candidate-clock (bitmap)
    (declare (type (vector bit) bitmap))
    (loop :with len := (length bitmap)
    :until (zerop (svref bitmap i))
    :do (:+ i)
    (when (= i len)
    (:= i 0)))
    i))

    Our loop is guaranteed to find the zero bit at least after we cycle over all the elements and return to the first one that we have set to zero ourselves. Obviously, here and in other places where it is not stated explicitly, we're talking about single-threaded execution only.

LFU

So, what if we don't have such a serious restriction on the size of the access counter? In this case, a similar algorithm that uses a counter instead of a flag will be called least frequently used (LFU) item eviction. There is one problem though: the access counter will only grow over time, so some items that were heavily used during some period will never be evicted from the cache, even though they may never be accessed again. To counteract this accumulation property, which is similar to bitmap saturation we've seen in the previous algorithm, a similar measure can be applied. Namely, we'll have to introduce some notion of epochs, which reset or diminish the value of all counters. The most common approach to epochs is to right shift each counter, i.e. divide by 2. This strategy is called aging. An LFU cache with aging may be called LRFU — least frequently and recently used.

As usual, the question arises, how often to apply aging. The answer may be context-dependent and dependent on the size of the access counter. For instance, usually, a 1-byte counter, which can distinguish between 256 access operations, will be good enough, and it rarely makes sense to use a smaller one as most hardware operates in byte-sized units. The common strategies for aging may be:

  • periodically with an arbitrarily chosen interval — which should be enough to accumulate some number of changes in the counters but not to overflow them
  • after a certain number of cache access operations. Such an approach may ensure that the counter doesn't overflow: say, if we use a 1-byte counter and age after each 128 access operations the counter will never exceed 192. Or we could perform the shift after 256 operations and still ensure lack of overflows with high probability

LRU

An alternative approach to LFU is LRU — evict the item that was used the longest time ago. LRU means that we need to store either last-access timestamps or some generation/epoch counters. Another possibility is to utilize access counters, similar to the ones that were used for LFU, except that we initialize them by setting all bits to 1, i.e. to the maximum possible value (255 for 1-byte counter). The counters are decremented, on each cache access, simultaneously for all items except for the item being accessed. The benefit of such an approach is that it doesn't require accessing any external notion of time making the cache fully self-contained, which is necessary for some hardware implementations, for instance. The only thing to remember is not to decrement the counter beyond 0 :)

Unlike LFU, this strategy can't distinguish between a heavily-accessed item and a sparingly-accessed one. So, in the general case, I'd say that LFU with aging (LRFU) should be the default approach, although its implementation is slightly more complex.

Memoization in Action: Transposition Tables

Transposition Tables is a characteristic example of the effective usage of memoization, which comes from classic game AI. But the same approach may be applied in numerous other areas with lots of computation paths that converge and diverge at times. We'll return to similar problems in the last third of this book.

In such games as chess, the same position may be reached in a great variety of moves. All possible sequences are called transpositions, and it is obvious that, regardless of how we reached a certain position, if we have already analyzed that situation previously, we don't need to repeat the analysis when it repeats. So, caching the results allows us to save a lot of redundant computation. However, the number of positions, in chess, that comes up during the analysis is huge so we don't stand a chance of remembering all of them. In this case, a good predictor for the chance of a situation to occur is very likely the number of times it has occurred in the past. For that reason, an appropriate caching technique, in this context, is plain LFU. But there's more. Yet, another measure of the value of a certain position is how early it occurred in the game tree (since the number of possible developments, from it, is larger). So, classic LFU should be mixed with this temporal information yielding a domain-specific caching approach. And the parameters of combining the two measures together are subject to empirical evaluation and research.

There's much more to transposition tables than mentioned in this short introduction. For instance, the keys describing the position may need to include additional information if the history of occurrence in it impacts the further game outcome (castling and repetition rules). Here's, also, a quote from Wikipedia on their additional use in another common chess-playing algorithm:

The transposition table can have other uses than finding transpositions. In alpha-beta pruning, the search is fastest (in fact, optimal) when the child of a node corresponding to the best move is always considered first. Of course, there is no way of knowing the best move beforehand, but when iterative deepening is used, the move that was found to be the best in a shallower search is a good approximation. Therefore this move is tried first. For storing the best child of a node, the entry corresponding to that node in the transposition table is used.

Low-Level Caching

So, memoization is the primary tool for algorithm optimization, and the lower we descend into our computing platform the more this fact becomes apparent. For hardware, it is, basically, the only option. There are many caches in the platform that act behind the scenes, but which have a great impact on the actual performance of your code: the CPU caches, the disk cache, the page cache, and other OS caches. The main issue, here, is the lack of transparency into their operation and sometimes even the lack of awareness of their existence. This topic is, largely, beyond the scope of our book, so if you want to learn more, there's a well-known talk "A Crash Course in Modern Hardware" and an accompanying list of "Latency Numbers Every Programmer Should Know" that you can start with. Here, I can provide only a brief outline.

The most important cache in the system is the CPU cache — or, rather, in most of the modern architectures, a system of 2 or 3 caches. There's an infamous von-Neumann's bottleneck of the conventional computer hardware design: the CPU works roughly 2 orders of magnitude faster than it can fetch data from memory. Last time I checked, the numbers were: execution of one memory transfer took around 250-300 CPU cycles, i.e. around 300 additions or other primitive instructions could be run during that time. And the problem is that CPUs operate only on data that they get from memory, so if the bottleneck didn't exist at all, theoretically, we could have 2 orders of magnitude faster execution. Fortunately, the degradation in performance is not so drastic, thanks to the use of CPU caches: only around an order of magnitude. The cache transfer numbers are the following: from L1 (the fastest and hence smallest) cache — around 5 cycles, from L2 — 20-30 cycles, from L3 — 50-100 cycles (that's why L3 is, not always used as it's almost on par with the main memory). Why do I say that fastest means smallest? Just because fast access memory is more expensive and requires more energy. Otherwise, we could just make all RAM as fast as the L1 cache.

How these caches operate? This is one of the things that every algorithmic programmer should know, at least, in general. Even if some algorithm seems good on paper, a more cache-friendly one with worse theoretical properties may very well outperform it.

The CPU cache temporarily stores contents of the memory cells (memory words) indexed by their addresses. It is called set-associative as it operates not on single cells but on sequential blocks of those (in the so-called cache lines). The L1 cache of size 1MB, usually, will store 64 such blocks each one holding 16 words. This approach is oriented towards the normal sequential layout of executable code, structures, and arrays — the majority of the memory contents. And the corresponding common memory access pattern — sequential. I.e., after reading one memory cell, usually, the processor will move on to the next: either because it's the next instruction to execute or the next item in the array being iterated over. That's why so much importance in program optimization folklore is given to cache alignment, i.e. structuring the program's memory so that the things commonly accessed together will fit into the same cache line. One example of this principle is the padding of structures with zeroes to align their size to be a multiple of 32 or 64. The same applies to code padding with nops. And this is another reason why arrays are a preferred data structure compared to linked lists: when the whole contents fit in the same cache line its processing performance is blazingly fast. The catch, though, is that it's, practically, impossible, for normal programmers, to directly observe how CPU cache interoperates with their programs. There are no tools to make it transparent so what remains is to rely on the general principles, second-guessing, and trial&error.;

Another interesting choice for hardware (and some software) caches is write-through versus write-back behavior. The question is how the cache deals with cached data being modified:

  • either the modifications will be immediately stored to the main storage, effectively, making the whole operation longer
  • or they may, first, be persisted to the cache only; while writing to the backing store (synchronization) will be performed on of all data in the cache at configured intervals

The second option is faster as there's a smaller number of expensive round-trips, but it is less resilient to failure. A good example of the write-back cache in action is the origin of the Windows "Safely remove hardware" option. The underlying assumption is that the data to be written to the flash drive passes through the OS cache, which may be configured in the write-back fashion. In this case, forced sync is required before disconnecting the device to ensure that the latest version of the cached data is saved to it.

Another example of caching drastically impacting performance, which everyone is familiar with, is paging or swapping — an operation performed by the operating system. When the executing programs together require more (virtual) memory than the size of the RAM that is physically available, the OS saves some of the pages of data that these program use to a place on disk known as the swap section.

A few points we can take away from this chapter:

  1. Key-values are very versatile and widely-used data structures. Don't limit your understanding of them to a particular implementation choice made by the designers of the programming language you're currently using.
  2. Trading space for time is, probably, the most wide-spread and impactful algorithmic technique.
  3. Caching, which is a direct manifestation of this technic and one of the main applications of key-value data structures, is one of the principal factors impacting program performance, on a large scale. It may be utilized by the programmer in the form of memoization, and will also inevitably be used by the underlying platform, in hard to control and predict ways. The area of program optimization for efficient hardware utilization represents a distinct set of techniques, requiring skills that are obscure and also not fully systematized.

Footnotes:

[1] Symbol plists represent on of the unpleasant legacy features of the language, in that the most obvious accessor name, namely get, is reserved for working with symbols. Therefore, this name can not be used for accessing other kinds of data. Historically, symbol plists where the first and only variant of key-values available in the language (at that time, the other languages didn't have the slightest idea of such a high-level concept).

Vsevolod DyomkinRUTILS 5.0 and Tutorial

· 55 days ago

RUTILS is my take on the Lisp "modernization" effort that adds the missing syntactic and data structure pieces, which became proven and well-established, in Lisp itself or in other languages. The programming field is constantly developing while the Lisp standard remains fixed so some additions, over time, are not only desirable but inevitable, if we don't want to lag behind. Thankfully, Lisp provides all the necessary means for implementing them and so, with some creativity, there's a way to have access to almost anything you want and need while retaining full backward compatibility (a lack of which is the most critical problem of some alternative solutions).

I, surely, understand that using such an extension remains a matter of taste and not every Lisper will like it. I didn't try to seriously promote it and was quite satisfied with the benefit that it provided to me and my teams' development. However, as I decided to use it for the "Programming Algorithms" book, it received some attention and a number of questions. From the direction of the discussions, I realized that the docs are lacking a critical part — the tutorial explaining how to effectively use the library. This text is intended to bridge that gap. I had to finish it before publishing the next chapter of the book, which I'll do on Friday.

So, today, version 5 of RUTILS is released alongside with the tutorial that aims to better explain its usage.

Leo ZovicError Handling In Context Managers

· 55 days ago

In various Lisps, there's a semi-common pattern of context wrapping with error handling. You have a situation in which you want to do something, and in the process bind some external resource, then free the resource afterwards.

(let* ((resource (open-resource foo))
       (result (progn
		 (bar)
		 (baz)
		 (mumble))))
  (close-resource resource)
  (delete-resource-from-disk resource)
  result)

You can write a relatively simple wrapper macro for this situation.

(defmacro with-resource ((name target) &body body)
  (let ((result (gensym)))
    `(let* ((,name (open-resource ,target))
	    (,result (progn ,@body)))
       (close-resource resource)
       (delete-resource-from-disk resource)
       ,result)))

With that definition, you can instead write.

(with-resource (resource foo)
  (bar)
  (mumble (baz resource)))

Ok, but what if the code you've written throws an error somehow?

(with-resource (resource foo)
  (bar)
  (error "Arbitrary Explosion")
  (mumble (baz resource)))

Your routine doesn't complete, but also, the claimed resource never gets freed afterwards. You could fix this by just always wrapping the stuff you wrap with with-resource in some error-trapping. ignore-errors/handler-case/handler-bind depending on the specific situation.

(with-resource (resource foo)
  (ignore-errors
    (bar)
    (error "Arbitrary Explosion")
    (mumble (baz resource))))

However, it'd still be nice to be more responsible as a macro developer and do the right thing with the bound resource without depending on your user doing the right thing. The solution is unwind-protect 1.

(defmacro with-resource ((name target) &body body)
  (let ((result (gensym)))
    `(let* ((,name (open-resource ,target)))
       (unwind-protect
	    (let ((,result (ignore-errors (progn ,@body))))
	      (resolve-resource resource)
	      ,result)
	 (close-resource resource)
	 (delete-resource-from-disk resource)))))

When you're deaing with deploying micro-services, it sometimes gets a bit trickier. You generally want some central log/diagnostic server to be notified of the error condition. That's not a situation where you want something to happen regardless of error presence; it's a situation where you want something different to happen on error. For a concrete example, imagine needing to do something locally that involves downloading, poking, and then deletin a file from a URL. unwind-protect could still help, but it'd be only part of the story.

(defmacro with-pdf-from-uri ((path uri) &body body)
  (let ((result (gensym)))
    `(handler-case
	 (let ((,result (http-request uri))
	       (,path (with-output-to-temporary-file (s) (write (http-body ,result) :stream s))))
	   (unwind-protect
		(progn ,@body)
	     (delete-file))
	   (write-to-disk))
       (http-error (e)
	 (remote-log "Failed to download PDF")
	 (error e))
       (cannot-create-temporary-file (e)
	 (remote-log "Failed to create tempfile")
	 (error e))
       (error (e)
	 (remote-log "An ancient evil stirs. Your lights flicker. In the distance, sirens." e)
	 (error e)))))

Off the top of my head, I'd write something like this, though there's possibly better ways of abstracting the situation.

And Now, For Something Completely Different

You can do something similar in Python too. For example

@contextmanager
def logged(tag):
    print(f"Starting a procedure <{tag}>...")
	yield
	print(f"Finished the procedure <{tag}>...")
>>> with logged("MY TEST"):
...     print("blah")
...
Starting a procedure <MY TEST>...
blah
Finished the procedure <MY TEST>...
>>>

And now, the point of this entire post, yes, you can wrap that yield in a try/catch and have it do what you're expecting.

@contextmanager
def logged(tag):
    print(f"Starting a procedure <{tag}>...")
    try:
        yield
        print("IT WORKED!")
    except:
        print("OH NO! ETERNAL DARKNESS AWAITS YOU!")
    print(f"Finished the procedure <{tag}>...")

Now you can do

>>> with logged("MY TEST"):
...     print("blah")
...
Starting a procedure <MY TEST>...
blah
IT WORKED!
Finished the procedure <MY TEST>...
>>> with logged("MY TEST"):
...     raise("EXPLOSIONS")
...
Starting a procedure <MY TEST>...
OH NO! ETERNAL DARKNESS AWAITS YOU!
Finished the procedure <MY TEST>...
>>>

Hopefully, you found that reassuring.

  1. Incidentally, thank you to the readers who pointed this out. I left out what turned out to be a key piece of context from the real situation I was dealing with for the sake of expedience (this situation came up in a Python context in real life). I've added the relevant explanation; hopefully that clarifies everything (and I'm not being denser than I thought I was being).

Zach BeaneGeneric vs. plain functions

· 62 days ago

I really like this comment from Stas Boukarev on Methods or Functions, when do I use which on reddit:

[You] don’t make an extensible program by just making each function generic, if each function still makes some assumption on the data. You can’t say “oh, it’s extensible, but you have to define a hundred methods of your own”, an extensible protocol will require defining a handful of pertinent methods and then the hundred of other functions will automatically work on new classes.

Lispers.deBerlin Lispers Meetup, Monday, 26th August 2019

· 63 days ago

We meet again on Monday 8pm, 26th August. Our host this time is James Anderson (www.dydra.com).

Berlin Lispers is about all flavors of Lisp including Clojure, Scheme and Common Lisp.

Willem Broekema will talk about his work-in-progress on "Developments in the AllegroGraph query engine".

We meet in the Taut-Haus at Engeldamm 70 in Berlin-Mitte, the bell is "James Anderson". It is located in 10min walking distance from U Moritzplatz or U Kottbusser Tor or Ostbahnhof. In case of questions call Christian +49 1578 70 51 61 4.

Nicolas HafnerThe End of Daily Gamedev - Confession 87

· 64 days ago

header
It's been two months now since I started to do daily game development streams. I've been trying my best, but it is time for this to come to a close. In this article I'll talk about the various things that happened, why I'm stopping, and the future of the Leaf game. Strap in!

It's actually been slightly longer than two months, but since I missed some days due to being sick, and some others because I didn't feel like streaming - more on that later - I'll just count it as two months. In any case, in this time I've done 56 streams, almost all of them two hours long. That's a lot of hours, and I'm truly impressed that some people stuck around for almost all of them. Thank you very much! A lot happened in that time too, and I think it would be interesting to go over some of the major features and talk about them briefly.

New Features in Leaf

Slopes and Collision

Collision detection was heavily revised from the previous version. The general procedure is to scan the current chunk for hits until there are no more hits to be found. If we have more than ten hits we assume that the player is in a wall somehow and just die. The number ten is obviously arbitrary, but somehow it seems sufficient and I haven't had any accidental deaths yet.

When a hit is detected, it dispatches on the type of tile or entity that was collided with. It does so in two steps, the first is a test whether the collision will happen at all, to allow sub-tile precision, and the second is the actual collision resolution, should a full hit have been detected. The first test can be used to elide collisions with jump-through platforms or slopes if the player moves above the actual slope surface. The actual collision resolution is typically comprised of moving the player to the collision point, updating velocity along the hit normal, and finally zipping out of the ground if necessary to avoid floating point precision issues.

The collision detection of the slopes itself is surprisingly simple and works on the same principle as swept AABB tests: we can enlarge the slope triangle by simply moving the line towards the player by the player's half-size. Once this shift is done we only need to do a ray-line collision test. During resolution there's some slight physics cheating going on to make the player stick to the ground when going down a slope, rather than flying off, but that's it.

Packets and File Formats

Leaf defines a multitude of file formats. These formats are typically all defined around the idea of a packet - a collection of files in a directory hierarchy. The idea of a packet allows me to define these formats as both directly on disk, in-memory as some data structure, or encapsulated within an archive. The packet protocol isn't that complicated and I intend on either at least putting it into Trial, or putting it into its own library altogether. Either way, it allows the transparent implementation of these formats regardless of backing storage.

The actual formats themselves also follow a very similar file structure: a meta.lisp file for a brief metadata header, which identifies the format, the version, and some authoring metadata fields. This file is in typical s-expression form and can be used to create a version object, which controls the loading and writing process of the rest of the format. In the current v0, this usually means an extra data.lisp payload file, and a number of other associated payload files like texture images.

The beauty of using generic functions with methods that specialise both on the version and object at the same time is that it allows me to define new versions in terms of select overrides, so that I can specify new behaviour for select classes, rather than having to redo the entire de/serialisation process, or breaking compatibility altogether.

Dialogue and Quests

The dialogue and quests are implemented as very generic systems that should have the flexibility (I hope) to deal with all the story needs I might have in the future. Dialogue is written in an extended dialect of Markless. For instance, the following is a valid dialogue snippet:

~ Fi
| (:happy) Well isn't this a sight for sore eyes!
| Finally a bit of sunshine!

- I don't like rain
  ~ Player
  | I don't mind the rain, actually.
  | Makes it easier to think.
- Yeah!
  ~ Player
  | Yeah, it's been too long! Hopefully this isn't announcing the coming of a sandstorm.
  ! incf (favour 'fi)
- ...
  ! decf (favour 'fi)

~ Fi
| ? (< 3 (favour 'fi))
| | So, what's our next move?
| |?
| | Alright, good luck out there!

The list is translated into a choice for the player to make, which can impact the dialogue later. The way this is implemented is through a syntax extension in the cl-markless parser, followed by a compiler from the Markless AST to an assembly language, and a virtual machine to execute the assembly. The user of the dialogue system only needs to implement the evaluation of commands, the display of text, and the presentation of choices.

The quest system on the other hand is based on node graphs. Each quest is represented as a directed graph of task nodes, each describing a task the player must fulfil through an invariant and a success condition. On success, one or more successor tasks can be unlocked. Tasks can also spawn dialogue pieces to become available as interactions with NPCs or items. The system is smart enough to allow different, competing branches, as well as parallel branches to complete a quest. I intend on building a graph editor UI for this once Alloy is further along.

Both of these systems are, again, detached enough that I'll either put them into Trial, or put them into a completely separate library altogether. I'm sure I'll need to adjust things once I actually have some written story on hand to use these systems with.

Platforming AI

The platforming AI allows characters to move along the terrain just like the player would. This is extremely useful for story reasons, so that characters can naturally move to select points, or idle around places rather than just standing still. The way this is implemented is through a node graph that describes the possible movement options from one valid position to the next. This graph is built through a number of scanline passes over the tile map that either add new nodes or connect existing nodes together in new ways.

The result is a graph with nodes that can connect through walk, crawl, fall, or jump edges. A character can be moved along this graph by first running A* to find a shortest path to the target node, and then performing a real-time movement through the calculated path. Generally the idea is to always move the player in the direction of the next target node until that node has been reached, in which case it's popped off the path. The jump edges already encode the necessary jump parameters to use, so when reaching a jump node the character just needs to assume the initial velocity and let standard physics do the rest.

The implementation includes a simple visualiser so that you can see how characters would move across the chunk terrain. When the chunk terrain changes, the node graph is currently just recomputed from scratch which isn't fast, but then again during gameplay the chunk isn't going to change anyway so it's only really annoying during editing. I'll think about whether I want to implement incremental updates.

Lighting

Leaf has gone through two lighting systems. The old one worked through signed distance fields that were implicitly computed through a light description. New light types required new shader code to evaluate the SDF, and each light required many operations in the fragment stage, which is costly.

The new system uses two passes, in the first lights are rendered to a separate buffer. The lights are rendered like regular geometry, so we can use discrete polygons to define light areas, and use other fancy tricks like textured lights. In the second pass the fragment shader simply looks up the current fragment position in the light texture and mixes the colours together.

In effect this new system is easier to implement, more expressive, and much faster to run. Overall it's a massive win in almost every way I can imagine. There's further improvements I want to make still, such as shadow casting, dynamic daylights, and light absorption mapping to allow the light to dissipate into the ground gradually.

Alloy

Alloy is a new user interface toolkit that I've been working on as part of Leaf's development. I've been in need for a good UI toolkit that I can use within GL (and otherwise) for a while, and a lot of Leaf's features had to be stalled because I didn't have one yet. However, a lot of Alloy's development is also only very distantly related to game development itself, and hardly at all related to the game itself. Thus I think I'll talk more about Alloy in other articles sometime.

Why I'm Stopping

I initially started this daily stuff to get myself out of a rut. At the time I wasn't doing much at all, and that bothered me a lot, so committing to a daily endeavour seemed like a good way to kick myself out of it. And it was! For a long time it worked really well. I enjoyed the streams and made good progress with the game.

Unfortunately I have the tendency to turn things like this into enormous burdens for myself. The stream turned from something I wanted to do into something I felt I had to do, and then ultimately into something I dreaded doing. This has happened before with all of my projects, especially streaming ones. With streams I quickly feel a lot of pressure because I get the idea that people aren't enjoying the content, that it's just a boring waste of time. Maybe it is, or maybe it isn't, I don't know. Either way, having to worry about the viewers and not just the project I'm working on, especially trying to constrain tasks to interesting little features that can fit into two hours turns into a big constraint that I can't keep up anymore.

There's a lot of interesting work left to be done, sure, but I just can't bear things anymore at the moment. Dreading the stream poisoned a lot of the rest of my days and ultimately started to hurt my productivity and well-being over the past two weeks. Maybe I'll do more streams again at some point in the future, but for now I need a break for an indeterminate amount of time.

The Future of Leaf

Leaf isn't dead, though. I intend to keep working on it on my own, and I really do want to see it finished one day, however far away that day may be. Currently I feel like I need to focus on writing, which is a big challenge for me. I'm a very, very inexperienced writer, especially when it comes to long-form stories and world-building. There I have practically no idea on how to do anything. If you are a writer, or are interested in talking shop about stories, please contact me.

Other than writing I'm probably going to mostly work on Alloy in the immediate future. I hope to have a better idea of the writing once I'm done, and that should give rise to more features to implement in Leaf directly. I'll try to keep posting updates on the blog here as things progress in any case, and there's a few systems I'd like to elaborate on in technical articles as well.

Thanks to everyone who read my summaries, watched the streams or recordings, and chatted live during this time. It means a lot to me to see people genuinely interested in what I do.

Vsevolod DyomkinProgramming Algorithms: Linked Lists

· 65 days ago

Linked data structures are in many ways the opposite of the contiguous ones that we have explored to some extent in the previous chapter using the example of arrays. In terms of complexity, they fail where those ones shine (first of all, at random access) — but prevail at scenarios when a repeated modification is necessary. In general, they are much more flexible and so allow the programmer to represent almost any kind of a data structure, although the ones that require such level of flexibility may not be too frequent. Usually, they are specialized trees or graphs.

The basic linked data structure is a singly-linked list.

Just like arrays, lists in Lisp may be created both with a literal syntax for constants and by calling a function — make-list — that creates a list of a certain size filled with nil elements. Besides, there's a handy list utility that is used to create lists with the specified content (the analog of vec).

CL-USER> '("hello" world 111)
("hello" WORLD 111)
CL-USER> (make-list 3)
(NIL NIL NIL)
CL-USER> (list "hello" 'world 111)
("hello" WORLD 111)

An empty list is represented as () and, interestingly, in Lisp, it is also a synonym of logical falsehood (nil). This property is used very often, and we'll have a chance to see that.

If we were to introduce our own lists, which may be quite a common scenario in case the built-in ones' capabilities do not suit us, we'd need to define the structure "node", and our list would be built as a chain of such nodes. We might have wanted to store the list head and, possibly, tail, as well as other properties like size. All in all, it would look like the following:

(defstruct list-cell
data
next)

(defstruct our-own-list
(head nil :type (or list-cell null))
(tail nil :type (or list-cell null)))

CL-USER> (let ((tail (make-list-cell :data "world")))
(make-our-own-list
:head (make-list-cell
:data "hello"
:next tail)
:tail tail))
#S(OUR-OWN-LIST
:HEAD #S(LIST-CELL
:DATA "hello"
:NEXT #S(LIST-CELL :DATA "world" :NEXT NIL))
:TAIL #S(LIST-CELL :DATA "world" :NEXT NIL))

Lists as Sequences

Alongside arrays, list is the other basic data structure that implements the sequence abstract data type. Let's consider the complexity of basic sequence operations for linked lists:

  • so-called random access, i.e. access by index of a random element, requires O(n) time as we have to traverse all the preceding elements before we can reach the desired one (n/2 operations on average)
  • yet, once we have reached some element, removing it or inserting something after it takes O(1)
  • subsequencing is also O(n)

Getting the list length, in the basic case, is also O(n) i.e. it requires full list traversal. It is possible, though, to store list length as a separate slot, tracking each change on the fly, which means O(1) complexity. Lisp, however, implements the simplest variant of lists without size tracking. This is an example of a small but important decision that real-world programming is full of. Why is such a solution the right thing™, in this case? Adding the size counter to each list would have certainly made this common length operation more effective, but the cost of doing that would've included: increase in occupied storage space for all lists, a need to update size in all list modification operations, and, possibly, a need for a more complex cons cell implementation[1]. These considerations make the situation with lists almost opposite to arrays, for which size tracking is quite reasonable because they change much less often and not tracking the length historically proved to be a terrible security decision. So, what side to choose? A default approach is to prefer the solution which doesn't completely rule out the alternative strategy. If we were to choose a simple cons-cell sans size (what the authors of Lisp did) we'll always be able to add the "smart" list data structure with the size field, on top of it. Yet, stripping the size field from built-in lists won't be possible. Similar reasoning is also applicable to other questions, such as: why aren't lists, in Lisp, doubly-linked. Also, it helps that there's no security implication as lists aren't used as data exchange buffers, for which the problem manifests itself.

For demonstration, let's add the size field to our-own-list (and, meanwhile, consider all the functions that will need to update it...):

(defstruct our-own-list
(head nil :type (or list-cell nil))
(tail nil :type (or list-cell nil))
(size 0 :type (integer 0)))

Given that obtaining the length of a list, in Lisp, is an expensive operation, a common pattern in programs that require multiple requests of the length field is to store its value in some variable at the beginning of the algorithm and then use this cached value, updating it if necessary.

As we see, lists are quite inefficient in random access scenarios. However, many sequences don't require random access and can satisfy all the requirements of a particular use case using just the sequential one. That's one of the reasons why they are called sequences, after all. And if we consider the special case of list operations at index 0 they are, obviously, efficient: both access and addition/removal is O(1). Also, if the algorithm requires a sequential scan, list traversal is rather efficient too, although not as good as array traversal for it still requires jumping over the memory pointers. There are numerous sequence operations that are based on sequential scans. The most common is map, which we analyzed in the previous chapter. It is the functional programming alternative to looping, a more high-level operation, and thus simpler to understand for the common cases, although less versatile.

map is a function that works with different types of built-in sequences. It takes as the first argument the target sequence type (if nil is supplied it won't create the resulting sequence and so will be used just for side-effects). Here is a polymorphic example involving lists and vectors:

CL-USER> (map 'vector '+
'(1 2 3 4 5)
#(1 2 3))
#(2 4 6)

map applies the function provided as its second argument (here, addition) sequentially to every element of the sequences that are supplied as other arguments, until one of them ends, and records the result in the output sequence. map would have been even more intuitive, if it just had used the type of the first argument for the result sequence, i.e. be a "do what I mean" dwim-map, while a separate advanced variant with result-type selection might have been used in the background. Unfortunately, the current standard scheme is not for change, but we can define our own wrapper function:

(defun dwim-map (fn seq &rest seqs)
"A thin wrapper over MAP that uses the first SEQ's type for the result."
(apply 'map (type-of seq) fn seqs))

map in Lisp is, historically, used for lists. So there's also a number of list-specific map variants that predated the generic map, in the earlier versions of the language, and are still in wide use today. These include mapcar, mapc, and mapcan (replaced in RUTILS by a safer flat-map). Now, let's see a couple of examples of using mapping. Suppose that we'd like to extract odd numbers from a list of numbers. Using mapcar as a list-specific map we might try to call it with an anonymous function that tests its argument for oddity and keeps them in such case:

CL-USER> (mapcar (lambda (x) (when (oddp x) x))
(range 1 10))
(1 NIL 3 NIL 5 NIL 7 NIL 9)

However, the problem is that non-odd numbers still have their place reserved in the result list, although it is not filled by them. Keeping only the results that satisfy (or don't) certain criteria and discarding the others is a very common pattern that is known as "filtering". There's a set of Lisp functions for such scenarios: remove, remove-if, and remove-if-not, as well as RUTILS' complements to them keep-if and keep-if-not. We can achieve the desired result adding remove to the picture:

CL-USER> (remove nil (mapcar (lambda (x) (when (oddp x) x))
(range 1 10)))
(1 3 5 7 9)

A more elegant solution will use the remove-if(-not) or keep-if(-not) variants. remove-if-not is the most popular among these functions. It takes a predicate and a sequence and returns the sequence of the same type holding only the elements that satisfy the predicate:

CL-USER> (remove-if-not 'oddp (range 1 10))
(1 3 5 7 9)

Using such high-level mapping functions is very convenient, which is why there's a number of other -if(-not) operations, like find(-if(-not)), member(-if(-not)), position(-if(-not)), etc.

The implementation of mapcar or any other list mapping function, including your own task-specific variants, follows the same pattern of traversing the list accumulating the result into another list and reversing it, in the end:

(defun simple-mapcar (fn list)
(let ((rez ()))
(dolist (item list)
(:= rez (cons (call fn item) rez)))
(reverse rez)))

The function cons is used to add an item to the beginning of the list. It creates a new list head that points to the previous list as its tail.

From the complexity point of view, if we compare such iteration with looping over an array we'll see that it is also a linear traversal that requires twice as many operations as with arrays because we need to traverse the result fully once again, in the end, to reverse it. Its advantage, though, is higher versatility: if we don't know the size of the resulting sequence (for example, in the case of remove-if-not) we don't have to change anything in this scheme and just add a filter line ((when (oddp item) ...), while for arrays we'd either need to use a dynamic array (that will need constant resizing and so have at least the same double number of operations) or pre-allocate the full-sized result sequence and then downsize it to fit the actual accumulated number of elements, which may be problematic when we deal with large arrays.

Lists as Functional Data Structures

The distinction between arrays and linked lists in many ways reflects the distinction between the imperative and functional programming paradigms. Within the imperative or, in this context, procedural approach, the program is built out of low-level blocks (conditionals, loops, and sequentials) that allow for the most fine-tuned and efficient implementation, at the expense of abstraction level and modularization capabilities. It also heavily utilizes in-place modification and manual resource management to keep overhead at a minimum. An array is the most suitable data-structure for such a way of programming. Functional programming, on the contrary, strives to bring the abstraction level higher, which may come at a cost of sacrificing efficiency (only when necessary, and, ideally, only for non-critical parts). Functional programs are built by combining referentially transparent computational procedures (aka "pure functions") that operate on more advanced data structures (either persistent ones or having special access semantics, e.g. transactional) that are also more expensive to manage but provide additional benefits.

Singly-linked lists are a simple example of functional data structures. A functional or persistent data structure is the one that doesn't allow in-place modification. In other words, to alter the contents of the structure a fresh copy with the desired changes should be created. The flexibility of linked data structures makes them suitable for serving as functional ones. We have seen the cons operation that is one of the earliest examples of non-destructive, i.e. functional, modification. This action prepends an element to the head of a list, and as we're dealing with the singly-linked list the original doesn't have to be updated: a new cons cell is added in front of it with its next pointer referencing the original list that becomes the new tail. This way, we can preserve both the pointer to the original head and add a new head. Such an approach is the basis for most of the functional data structures: the functional trees, for example, add a new head and a new route from the head to the newly added element, adding new nodes along the way — according to the same principle.

It is interesting, though, that lists can be used in destructive and non-destructive fashion likewise. There are both low- and high-level functions in Lisp that perform list modification, and their existence is justified by the use cases in many algorithms. Purely functional lists render many of the efficient list algorithms useless. One of the high-level list modification function is nconc. It concatenates two lists together updating in the process the next pointer of the last cons cell of the first list:

CL-USER> (let ((l1 (list 1 2 3))
(l2 (list 4 5 6)))
(nconc l1 l2) ; note no assignment to l1
l1) ; but it is still changed
(1 2 3 4 5 6)

There's a functional variant of this operation, append, and, in general, it is considered distasteful to use nconc for two reasons:

  • the risk of unwarranted modification
  • funny enough, the implementation of nconc, actually, isn't mandated to be more efficient than that of append

So, forget nconc, append all the lists!

Using append we'll need to modify the previous piece of code because otherwise the newly created list will be garbage-collected immediately:

CL-USER> (let ((l1 (list 1 2 3))
(l2 (list 4 5 6)))
(:= l1 (append l1 l2))
l1)
(1 2 3 4 5 6)

The low-level list modification operations are rplaca and rplacd. They can be combined with list-specific accessors nth and nthcdr that provide indexed access to list elements and tails respectively. Here's, for example, how to add an element in the middle of a list:

CL-USER> (let ((l1 (list 1 2 3)))
(rplacd (nthcdr 0 l1)
(cons 4 (nthcdr 1 l1)))
l1)
(1 4 2 3)

Just to re-iterate, although functional list operations are the default choice, for efficient implementation of some algorithms, you'll need to resort to the ugly destructive ones.

Different Kinds of Lists

We have, thus far, seen the most basic linked list variant — a singly-linked one. It has a number of limitations: for instance, it's impossible to traverse it from the end to the beginning. Yet, there are many algorithms that require accessing the list from both sides or do other things with it that are inefficient or even impossible with the singly-linked one, hence other, more advanced, list variants exist.

But first, let's consider an interesting tweak to the regular singly-linked list — a circular list. It can be created from the normal one by making the last cons cell point to the first. It may seem like a problematic data structure to work with, but all the potential issues with infinite looping while traversing it are solved if we keep a pointer to any node and stop iteration when we encounter this node for the second time. What's the use for such structure? Well, not so many, but there's a prominent one: the ring buffer. A ring or circular buffer is a structure that can hold a predefined number of items and each item is added to the next slot of the current item. This way, when the buffer is completely filled it will wrap around to the first element, which will be overwritten at the next modification. By our buffer-filling algorithm, the element to be overwritten is the one that was written the earliest for the current item set. Using a circular linked list is one of the simplest ways to implement such a buffer. Another approach would be to use an array of a certain size moving the pointer to the next item by incrementing an index into the array. Obviously, when the index reaches array size it should be reset to zero.

A more advanced list variant is a doubly-linked one, in which all the elements have both the next and previous pointers. The following definition, using inheritance, extends our original list-cell with a pointer to the previous element. Thanks to the basic object-oriented capabilities of structs, it will work with the current definition of our-own-list as well, and allow it to function as a doubly-linked list.

(defstruct (list-cell2 (:include list-cell))
prev)

Yet, we still haven't shown the implementation of the higher-level operations of adding and removing an element to/from our-own-list. Obviously, they will differ for singly- and doubly-linked lists, and that distinction will require us to differentiate the doubly-linked list types. That, in turn, will demand invocation of a rather heavy OO-machinery, which is beyond the subject of this book. Instead, for now, let's just examine the basic list addition function, for the doubly-linked list:

(defun our-cons2 (data list)
(when (null list) (:= list (make-our-own-list)))
(let ((new-head (make-list-cell2
:data data
:next (when list @list.head))))
(:= @list.head.prev new-head)
(make-our-own-list
:head new-head
:tail @list.tail
:size (1+ @list.size))))

The first thing to note is the use of the @ syntactic sugar, from RUTILS, that implements the mainstream dot notation for slot-value access (i.e. @list.head.prev refers to the prev field of the head field of the provided list structure of the assumed our-own-list type, which in a more classically Lispy, although cumbersome, variants may look like one of the following: (our-cons2-prev (our-own-list-head list)) or (slot-value (slot-value list 'head) 'prev)[2]).

More important here is that, unlike for the singly-linked list, this function requires an in-place modification of the head element of the original list: setting its prev pointer. Immediately making doubly-linked lists non-persistent.

Finally, the first line is the protection against trying to access the null list (that will result in a much-feared, especially in Java-land, null-pointer exception class of error).

At first sight, it may seem that doubly-linked lists are more useful than singly-linked ones. But they also have higher overhead so, in practice, they are used quite sporadically. We may see just a couple of use cases on the pages of this book. One of them is presented in the next part — a double-ended queue.

Besides doubly-linked, there are also association lists that serve as a variant of key-value data structures. At least 3 types may be found in Common Lisp code, and we'll briefly discuss them in the chapter on key-value structures. Finally, a skip list is a probabilistic data structure based on singly-linked lists, that allows for faster search, which we'll also discuss in a separate chapter on probabilistic structures. Other more esoteric list variants, such as self-organized list and XOR-list, may also be found in the literature — but very rarely, in practice.

FIFO & LIFO

The flexibility of lists allows them to serve as a common choice for implementing a number of popular abstract data structures.

Queue

A queue or FIFO has the following interface:

  • enqueue an item at the end
  • dequeue the first element: get it and remove it from the queue

It imposes a first-in-first-out (FIFO) ordering on the elements. A queue can be implemented directly with a singly-linked list like our-own-list. Obviously, it can also be built on top of a dynamic array but will require permanent expansion and contraction of the collection, which, as we already know, isn't the preferred scenario for their usage.

There are numerous uses for the queue structures for processing items in a certain order (some of which we'll see in further chapters of this book).

Stack

A stack or LIFO (last-in-first-out) is even simpler than a queue, and it is used even more widely. Its interface is:

  • push an item on top of the stack making it the first element
  • pop an item from the top: get it and remove it from the stack

A simple Lisp list can serve as a stack, and you can see such uses in almost every file with Lisp code. The most common pattern is result accumulation during iteration: using the stack interface, we can rewrite simple-mapcar in an even simpler way (which is idiomatic Lisp):

(defun simple-mapcar (fn list)
(let ((rez ()))
(dolist (item list)
(push (call fn item) rez))
(reverse rez)))

Stacks hold elements in reverse-chronological order and can thus be used to keep the history of changes to be able to undo them. This feature is used in procedure calling conventions by the compilers: there exists a separate segment of program memory called the Stack segment, and when a function call happens (beginning from the program's entry point called the main function in C) all of its arguments and local variables are put on this stack as well as the return address in the program code segment where the call was initiated. Such an approach allows for the existence of local variables that last only for the duration of the call and are referenced relative to the current stack head and not bound to some absolute position in memory like the global ones. After the procedure call returns, the stack is "unwound" and all the local data is forgotten returning the context to the same state in which it was before the call. Such stack-based history-keeping is a very common and useful pattern that may be utilized in userland code likewise.

Lisp itself also uses this trick to implement global variables with a capability to have context-dependent values through the extent of let blocks: each such variable also has a stack of values associated with it. This is one of the most underappreciated features of the Lisp language used quite often by experienced lispers. Here is a small example with a standard global variable (they are called special in Lisp parlance due to this special property) *standard-output* that stores a reference to the current output stream:

CL-USER> (print 1)
1
1
CL-USER> (let ((*standard-output* (make-broadcast-stream)))
(print 1))
1

In the first call to print, we see both the printed value and the returned one, while in the second — only the return value of the print function, while it's output is sent, effectively, to /dev/null.

Stacks can be also used to implement queues. We'll need two of them to do that: one will be used for enqueuing the items and another — for dequeuing. Here's the implementation:

(defstruct queue
head
tail)

(defun enqueue (item queue)
(push item @queue.head))

(defun dequeue (queue)
;; Here and in the next condition, we use the property that an empty list
;; is also logically false. This is discouraged by many Lisp style-guides,
;; but, in many cases, such code is not only more compact but also more clear.
(unless @queue.tail
(do ()
((null @queue.head)) ; this loop continues until head becomes empty
(push (pop @queue.head) @queue.tail)))
;; By pushing all the items from the head to the tail we reverse
;; their order — this is the second reversing that cancels the reversing
;; performed when we push the items to the head and it restores the original order.
(when @queue.tail
(values (pop @queue.tail)
t))) ; this second value is used to indicate that the queue was not empty

CL-USER> (let ((q (make-queue)))
(print q)
(enqueue 1 q)
(enqueue 2 q)
(enqueue 3 q)
(print q)
(print q)
(dequeue q)
(print q)
(enqueue 4 q)
(print q)
(dequeue q)
(print q)
(dequeue q)
(print q)
(dequeue q)
(print q)
(dequeue q))
#S(QUEUE :HEAD NIL :TAIL NIL)
#S(QUEUE :HEAD (3 2 1) :TAIL NIL)
#S(QUEUE :HEAD (3 2 1) :TAIL NIL)
#S(QUEUE :HEAD NIL :TAIL (2 3))
#S(QUEUE :HEAD (4) :TAIL (2 3))
#S(QUEUE :HEAD (4) :TAIL (3))
#S(QUEUE :HEAD (4) :TAIL NIL)
#S(QUEUE :HEAD NIL :TAIL NIL)
NIL ; no second value indicates that the queue is now empty

Such queue implementation still has O(1) operation times for enqueue/dequeue. Each element will experience exactly 4 operations: 2 pushs and 2 pops (for the head and tail).

Another stack-based structure is the stack with a minimum element, i.e. some structure that not only holds elements in LIFO order but also keeps track of the minimum among them. The challenge is that if we just add the min slot that holds the current minimum, when this minimum is popped out of the stack we'll need to examine all the remaining elements to find the new minimum. We can avoid this additional work by adding another stack — a stack of minimums. Now, each push and pop operation requires us to also check the head of this second stack and, in case the added/removed element is the minimum, push it to the stack of minimums or pop it from there, accordingly.

A well-known algorithm that illustrates stack usage is fully-parenthesized arithmetic expressions evaluation:

(defun arith-eval (expr)
"EXPR is a list of symbols that may include:
square brackets, arithmetic operations, and numbers."
(let ((ops ())
(vals ())
(op nil))
(dolist (item expr)
(case item
([ ) ; do nothing
((+ - * /) (push item ops))
(] (:= op (pop ops)
val (pop vals))
(case op
(+ (:+ val (pop vals)))
(- (:- val (pop vals)))
(* (:* val (pop vals)))
(/ (:/ val (pop vals))))
(push val vals))
(otherwise (push item vals))))
(pop vals)))

CL-USER> (arith-eval '([ 1 + [ [ 2 + 3 ] * [ 4 * 5 ] ] ] ]))
101

Deque

A deque is a short name for a double-ended queue, which can be traversed in both orders: FIFO and LIFO. It has 4 operations: push-front and push-back (also called shift), pop-front and pop-back (unshift). This structure may be implemented with a doubly-linked list or likewise a simple queue with 2 stacks. The difference for the 2-stacks implementation is that now items may be pushed back and forth between head and tail depending on the direction we're popping from, which results in worst-case linear complexity of such operations: when there's constant alteration of front and back directions.

The use case for such structure is the algorithm that utilizes both direct and reverse ordering: a classic example being job-stealing algorithms, when the main worker is processing the queue from the front, while other workers, when idle, may steal the lowest priority items from the back (to minimize the chance of a conflict for the same job).

Stacks in Action: SAX Parsing

Custom XML parsing is a common task for those who deal with different datasets, as many of them come in XML form, for example, Wikipedia and other Wikidata resources. There are two main approaches to XML parsing:

  • DOM parsing reads the whole document and creates its tree representation in memory. This technique is handy for small documents, but, for huge ones, such as the dump of Wikipedia, it will quickly fill all available memory. Also, dealing with the deep tree structure, if you want to extract only some specific pieces from it, is not very convenient.
  • SAX parsing is an alternative variant that uses the stream approach. The parser reads the document and, upon completing the processing of a particular part, invokes the relevant callback: what to do when an open tag is read, when a closed one, and with the contents of the current element. These actions happen for each tag, and we can think of the whole process as traversing the document tree utilizing the so-called "visitor pattern": when visiting each node we have a chance to react after the beginning, in the middle, and in the end.

Once you get used to SAX parsing, due to its simplicity, it becomes a tool of choice for processing XML, as well as JSON and other formats that allow for a similar stream parsing approach. Often the simplest parsing pattern is enough: remember the tag we're looking at, and when it matches a set of interesting tags, process its contents. However, sometimes, we need to make decisions based on the broader context. For example, let's say, we have the text marked up into paragraphs, which are split into sentences, which are, in turn, tokenized. To process such a three-level structure, with SAX parsing, we could use the following outline (utilizing CXML library primitives):

(defclass text-sax (sax:sax-parser-mixin)
((parags :initform nil :accessor sax-parags)
(parag :initform nil :accessor sax-parag)
(sent :initform nil :accessor sax-sent)
(tag :initform nil :accessor sax-tag)))

(defmethod sax:start-element ((sax text-sax)
namespace-uri local-name qname attrs)
(declare (ignore namespace-uri qname attrs))
(:= (sax-tag sax) (mkeyw local-name))

(defmethod sax:end-element ((sax text-sax)
namespace-uri local-name qname)
(declare (ignore namespace-uri qname))
(with-slots (tag parags sent) sax
(case tag
(:paragraph (push (reverse parag) parags)
(:= parag nil))
(:sentence (push (reverse sent) parag)
(:= sent nil)))))

(defmethod sax:characters ((sax text-sax) text)
(when (eql :token (sax-tag sax))
(push text (sax-sent sax)))

(defmethod sax:end-document ((sax text-sax))
(reverse (sax-parags sax)))

This code will return the accumulated structure of paragraphs from the sax:end-document method. And two stacks: the current sentence and the current paragraph are used to accumulate intermediate data while parsing. In a similar fashion, another stack of encountered tags might have been used to exactly track our position in the document tree if there were such necessity. Overall, the more you'll be using SAX parsing, the more you'll realize that stacks are enough to address 99% of the arising challenges.

Lists as Sets

Another very important abstract data structure is a Set. It is a collection that holds each element only once no matter how many times we add it there. This structure may be used in a variety of cases: when we need to track the items we have already seen and processed, when we want to calculate some relations between groups of elements,s and so forth.

Basically, its interface consists of set-theoretic operations:

  • add/remove an item
  • check whether an item is in the set
  • check whether a set is a subset of another set
  • union, intersection, difference, etc.

Sets have an interesting aspect that an efficient implementation of element-wise operations (add/remove/member) and set-wise (union/...) require the use of different concrete data-structures, so a choice should be made depending on the main use case. One way to implement sets is by using linked lists. Lisp has standard library support for this with the following functions:

  • adjoin to add an item to the list if it's not already there
  • member to check for item presence in the set
  • subsetp for subset relationship query
  • union, intersection, set-difference, and set-exclusive-or for set operations

This approach works well for small sets (up to tens of elements), but it is rather inefficient, in general. Adding an item to the set or checking for membership will require O(n) operations, while, in the hash-set (that we'll discuss in the chapter on key-value structures), these are O(1) operations. A naive implementation of union and other set-theoretic operations will require O(n^2) as we'll have to compare each element from one set with each one from the other. However, if our set lists are in sorted order set-theoretic operations can be implemented efficiently in just O(n) where n is the total number of elements in all sets, by performing a single linear scan over each set in parallel. Using a hash-set will also result in the same complexity.

Here is a simplified implementation of union for sets of numbers built on sorted lists:

(defun sorted-union (s1 s2)
(let ((rez ()))
(do ()
((and (null s1) (null s2)))
(let ((i1 (first s1))
(i2 (first s2)))
(cond ((null i1) (dolist (i2 s2)
(push i2 rez))
(return))
((null i2) (dolist (i1 s1)
(push i1 rez))
(return))
((= i1 i2) (push i1 rez)
(:= s1 (rest s1)
s2 (rest s2)))
((< i1 i2) (push i1 rez)
(:= s1 (rest s1)))
;; just T may be used instead
;; of the following condition
((> i1 i2) (push i2 rez)
(:= s2 (rest s2))))))
(reverse rez)))

CL-USER> (sorted-union '(1 2 3)
'(0 1 5 6))
(0 1 2 3 5 6)

This approach may be useful even for unsorted list-based sets as sorting is a merely O(n * log n) operation. Even better though, when the use case requires primarily set-theoretic operations on our sets and the number of changes/membership queries is comparatively low, the most efficient technique may be to keep the lists sorted at all times.

Merge Sort

Speaking about sorting, the algorithms we discussed for array sorting in the previous chapter do not work as efficient for lists for they are based on swap operations, which are O(n), in the list case. Thus, another approach is required, and there exist a number of efficient list sorting algorithms, the most prominent of which is Merge sort. It works by splitting the list into two equal parts until we get to trivial one-element lists and then merging the sorted lists into the bigger sorted ones. The merging procedure for sorted lists is efficient as we've seen in the previous example. A nice feature of such an approach is its stability, i.e. preservation of the original order of the equal elements, given the proper implementation of the merge procedure.

(defun merge-sort (list comp)
(if (null (rest list))
list
(let ((half (floor (length list) 2)))
(merge-lists (merge-sort (subseq seq 0 half) comp)
(merge-sort (subseq seq half) comp)
comp))))

(defun merge-lists (l1 l2 comp)
(let ((rez ())
(do ()
((and (null l1) (null l2)))
(let ((i1 (first l1))
(i2 (first l2)))
(cond ((null i1) (dolist (i l2)
(push i rez))
(return))
((null i2) (dolist (i l1)
(push i rez))
(return))
((call comp i1 i2) (push i1 rez)
(:= l1 (rest l1)))
(t (push i2 rez)
(:= l2 (rest l2))))))
(reverse rez)))

The same complexity analysis as for binary search applies to this algorithm. At each level of the recursion tree, we perform O(n) operations: each element is pushed into the resulting list once, reversed once, and there are at most 4 comparison operations: 3 null checks and 1 call of the comp function. We also need to perform one copy per element in the subseq operation and take the length of the list (although it can be memorized and passed down as the function call argument) on the recursive descent. This totals to not more than 10 operations per element, which is a constant. And the height of the tree is, as we already know, (log n 2). So, the total complexity is O(n * log n).

Let's now measure the real time needed for such sorting, and let's compare it to the time of prod-sort (with optimal array accessors) from the Arrays chapter:

CL-USER> (with ((lst (random-list 10000))
(vec (make-array 10000 :initial-contents lst)))
(print-sort-timings "Prod" 'prod-sort vec)
(print-sort-timings "Merge " 'merge-sort lst))
= Prodsort of random vector =
Evaluation took:
0.048 seconds of real time
= Prodsort of sorted vector =
Evaluation took:
0.032 seconds of real time
= Prodsort of reverse sorted vector =
Evaluation took:
0.044 seconds of real time
= Merge sort of random vector =
Evaluation took:
0.007 seconds of real time
= Merge sort of sorted vector =
Evaluation took:
0.007 seconds of real time
= Merge sort of reverse sorted vector =
Evaluation took:
0.008 seconds of real time

Interestingly enough, Merge sort turned out to be around 5 times faster, although it seems that the number of operations required at each level of recursion is at least 2-3 times bigger than for quicksort. Why we got such result is left as an exercise to the reader: I'd start from profiling the function calls and looking where most of the time is wasted...

It should be apparent that the merge-lists procedure works in a similar way to set-theoretic operations on sorted lists that we've discussed in the previous part. It is, in fact, provided in the Lisp standard library. Using the standard merge, Merge sort may be written in a completely functional and also generic way to support any kind of sequences:

(defun merge-sort (seq comp)
(if (or (null seq) ; avoid expensive length calculation for lists
(<= (length seq) 1))
seq
(let ((half (floor (length seq) 2)))
(merge (type-of seq)
(merge-sort (subseq seq 0 half) comp)
(merge-sort (subseq seq half) comp)
comp))))

There's still one substantial difference of Merge sort from the array sorting functions: it is not in-place. So it also requires the O(n * log n) additional space to hold the half sublists that are produced at each iteration. Sorting and merging them in-place is not possible. There are ways to somewhat reduce this extra space usage but not totally eliminate it.

Parallelization of Merge Sort

The extra-space drawback of Merge sort may, however, turn irrelevant if we consider the problem of parallelizing this procedure. The general idea of parallelized implementation of any algorithm is to split the work in a way that allows reducing the runtime proportional to the number of workers performing those jobs. In the ideal case, if we have m workers and are able to spread the work evenly the running time should be reduced by a factor of m. For the Merge sort, it will mean just O(n/m * log n). Such ideal reduction is not always achievable, though, because often there are bottlenecks in the algorithm that require all or some workers to wait for one of them to complete its job.

Here's a trivial parallel Merge sort implementation that uses the eager-future2 library, which adds high-level data parallelism capabilities based on the Lisp implementation's multithreading facilities:

(defun parallel-merge-sort (seq comp)
(if (or (null seq) (<= (length seq) 1))
seq
(with ((half (floor (length seq) 2))
(thread1 (eager-future2:pexec
(merge-sort (subseq seq 0 half) comp)))
(thread2 (eager-future2:pexec
(merge-sort (subseq seq half) comp))))
(merge (type-of seq)
(eager-future2:yield thread1)
(eager-future2:yield thread2)
comp))))

The eager-future2:pexec procedure submits each merge-sort to the thread pool that manages multiple CPU threads available in the system and continues program execution not waiting for it to return. While eager-future2:yield pauses execution until the thread performing the appropriate merge-sort returns.

When I ran our testing function with both serial and parallel merge sorts on my machine, with 4 CPUs, I got the following result:

CL-USER> (with ((lst1 (random-list 10000))
(lst2 (copy-list lst1)))
(print-sort-timings "Merge " 'merge-sort lst1)
(print-sort-timings "Parallel Merge " 'parallel-merge-sort lst2))
= Merge sort of random vector =
Evaluation took:
0.007 seconds of real time
114.29% CPU
= Merge sort of sorted vector =
Evaluation took:
0.006 seconds of real time
116.67% CPU
= Merge sort of reverse sorted vector =
Evaluation took:
0.007 seconds of real time
114.29% CPU
= Parallel Merge sort of random vector =
Evaluation took:
0.003 seconds of real time
266.67% CPU
= Parallel Merge sort of sorted vector =
Evaluation took:
0.003 seconds of real time
266.67% CPU
= Parallel Merge sort of reverse sorted vector =
Evaluation took:
0.005 seconds of real time
220.00% CPU

A speedup of approximately 2x, which is also reflected by the rise in CPU utilization from around 100% (i.e. 1 CPU) to 250%. These are correct numbers as the merge procedure is still executed serially and remains the bottleneck. There are more sophisticated ways to achieve optimal m times speedup, in Merge sort parallelization, but we won't discuss them here due to their complexity.

Lists and Lisp

Historically, Lisp's name originated as an abbreviation of "List Processing", which points both to the significance that lists played in the language's early development and also to the fact that flexibility (a major feature of lists) was always a cornerstone of its design. Why are lists important to Lisp? Maybe, originally, it was connected with the availability and the good support of this data structure in the language itself. But, quickly, the focus shifted to the fact that, unlike other languages, Lisp code is input in the compiler not in a custom string-based format but in the form of nested lists that directly represent the syntax tree. Coupled with superior support for the list data structure, it opens numerous possibilities for programmatic processing of the code itself, which are manifest in the macro system, code walkers and generators, etc. So, "List Processing" turns out to be not about lists of data, but about lists of code, which perfectly describes the main distinctive feature of this language...


Footnotes:

[1] While, in the Lisp machines, cons cells even had special hardware support, and such change would have made it useless.

[2] Although, for structs, it is implementation-dependent if this will work. In the major implementations, it will.

Quicklisp newsAugust 2019 Quicklisp dist update now available

· 71 days ago
New projects:
  • cardiogram — Simple test framework — MIT
  • cesdi — Provides a compute-effective-slot-definition-initargs generic function that allows for more ergonomic initialization of effective slot definition objects. — Unlicense
  • chameleon — Configuration management facilities for Common Lisp with multiple profile support. — MIT
  • ci-utils — A set of tools for using CI platforms — MIT
  • cl-clsparse — Common Lisp bindings for clSPARSE — Apache License, Version 2.0
  • cl-flat-tree — A flat-tree implementation in Common Lisp. — MIT
  • cl-shlex — Lexical analyzer for simple shell-like syntax. — MIT
  • datamuse — Common Lisp library for accessing the Datamuse word-finding API — MIT
  • date-calc — Package for simple date calculation — GPL or Artistic
  • linear-programming — A library for solving linear programming problems — MIT
  • orizuru-orm — An ORM for Common Lisp and PostgreSQL. — GPLv3
  • stripe — A client for the Stripe payment API. — MIT
  • trivial-extensible-sequences — Portability library for the extensible sequences protocol. — zlib
  • trivial-package-local-nicknames — Portability library for package-local nicknames — Public domain
Updated projectsalso-alsaaprilbikebinary-iobinfixblack-tiecavemancl+sslcl-collidercl-digikar-utilitiescl-fadcl-geocodecl-hamcrestcl-ipfs-api2cl-ledgercl-marklesscl-mssqlcl-patternscl-permutationcl-pythoncl-readlinecl-satcl-sat.glucosecl-sat.minisatcl-sqlitecl-strcl-tiledclackclack-errorscloser-mopclxcoleslawcroatoandatum-commentseasy-routeseclectorecoenvyfast-websocketfemlispfloat-featuresflowgendlgolden-utilsgraphhttp-bodyjsownkenzolucernemagiclmatlispmcclimmitooriginparser.common-rulespetalisppjlinkportableaservepostmodernproc-parsepy4clquilcquriqvmreplicroverpcqrutilssc-extensionsscalplselserapeumsimplified-typesslyspinneretstaplestumpwmtrivial-continuationtrivial-left-padtrivial-monitored-threadtrivial-object-locktrivial-pooled-databaseuri-templateutilities.print-itemswoo.

To get this update, use (ql:update-dist "quicklisp"). Enjoy!

Vsevolod DyomkinProgramming Algorithms: Arrays

· 72 days ago

Arrays are, alongside structs, the most basic data structure and, at the same time, the default choice for implementing algorithms. A one-dimensional array that is also called a "vector" is a contiguous structure consisting of the elements of the same type. One of the ways to create such arrays, in Lisp, is this:

CL-USER> (make-array 3)
#(0 0 0)

The printed result is the literal array representation. It happens that the array is shown to hold 0's, but that's implementation-dependent. Additional specifics can be set during array initialization: for instance, the :element-type, :initial-element, and even full contents:

CL-USER> (make-array 3 :element-type 'list :initial-element nil)
#(NIL NIL NIL)
CL-USER> (make-array 3 :initial-contents '(1.0 2.0 3.0))
#(1.0 2.0 3.0)

If you read back such an array you'll get a new copy with the same contents:

CL-USER> #(1.0 2.0 3.0)
#(1.0 2.0 3.0)

It is worth noting that the element type restriction is, in fact, not a limitation the default type is T[1]. In this case, the array will just hold pointers to its elements that can be of arbitrary type. If we specify a more precise type, however, the compiler might be able to optimize storage and access by putting the elements in memory directly in the array space. This is, mainly, useful for numeric arrays, but it makes multiple orders of magnitude difference for them for several reasons, including the existence of vector CPU instructions that operate on such arrays.

The arrays we have created are mutable, i.e. we can change their contents, although we cannot resize them. The main operator to access array elements is aref. You will see it in those pieces of code, in this chapter, where we care about performance.

CL-USER> (let ((vec (make-array 3 :initial-contents '(1.0 2.0 3.0))))
(print (aref vec 0))
(print (? vec 1))
(:= (aref vec 2) 4.0))
(print (? vec 2))
(aref vec 3))
1.0
2.0
4.0
; Evaluation aborted on #<SIMPLE-TYPE-ERROR expected-type: (MOD 3) datum: 3>

In Lisp, array access beyond its boundary, as expected, causes an error.

It is also possible to create constant arrays using the literal notation #(). These constants can, actually, be changed in some environments, but don't expect anything nice to come out of such abuse — and the compiler will warn you of that:

CL-USER> (let ((vec #(1.0 2.0 3.0)))
(:= (aref vec 2) nil)
(print vec))
; caught WARNING:
; Destructive function (SETF AREF) called on constant data.
; See also:
; The ANSI Standard, Special Operator QUOTE
; The ANSI Standard, Section 3.2.2.3
;
; compilation unit finished
; caught 1 WARNING condition

#(1.0 2.0 NIL)

RUTILS provides more options to easily create arrays with a shorthand notation:

CL-USER> #v(1 2 3)
#(1 2 3)
CL-USER> (vec 1 2 3)
#(1 2 3)

Although the results seem identical they aren't. The first version creates a mutable analog of #(1 2 3), and the second also makes it adjustable (we'll discuss adjustable or dynamic arrays next).

Arrays as Sequences

Vectors are one of the representatives of the abstract sequence container type that has the following basic interface:

  • inquire the length of a sequence — performed in Lisp using the function length
  • access the element by index — the RUTILS ? operator is the most generic variant while the native one for arrays is aref and a more general elt, for all built-in sequences (this also includes lists and, in some implementations, user-defined, so-called, extensible sequences)
  • get the subsequence — the standard provides the function subseq for this purpose

These methods have some specific that you should mind:

  • the length function, for arrays, works in O(1) time as length is tracked in the array structure. There is an alternative (more primitive) way to handle arrays, employed, primarily, in C when the length is not stored, and, instead, there's a special termination "symbol" that indicates the end of an array. For instance, C strings have a '\0' termination character, and arrays representing command-line arguments, in the Unix syscalls API for such functions as exec, are terminated with null-pointers. Such an approach is, first of all, not efficient from the algorithmic point of view as it requires O(n) time to query the array's length. But, what's even more important, it has proven to be a source of a number of catastrophic security vulnerabilities — the venerable "buffer overflow" family of errors
  • the subseq function creates a copy of the part of its argument, which is an expensive operation. This is the functional approach that is a proper default, but many of the algorithms don't involve subarray mutation, and, for them, a more efficient variant would be to use a shared-structure variant that doesn't make a copy but merely returns a pointer into the original array. Such option is provided, in the Lisp standard, via the so-called displaced arrays, but it is somewhat cumbersome to use, that's why a more straightforward version is present in RUTILS which is named slice
CL-USER> (with ((vec (vec 1 2 3))
(part (slice vec 2)))
(print part)
(:= (? part 0) 4)
(print part)
vec)

#(3)
#(4)
#(1 2 4)

Beyond the basic operations, sequences in Lisp are the target of a number of higher-order functions, such as find, position, remove-if etc. We'll get back to discussing their use later in the book.

Dynamic Vectors

Let's examine arrays from the point of view of algorithmic complexity. General-purpose data structures are usually compared by their performance on several common operations and, also, space requirements. These common operations are: access, insertion, deletion, and, sometimes, search.

In the case of ordinary arrays, the space used is the minimum possible: almost no overhead is incurred except, perhaps, for some meta-information about array size. Array element access is performed by index in constant time because it's just an offset from the beginning that is the product of index by the size of a single element. Search for an element requires a linear scan of the whole array or, in the special case of a sorted array, it can be done in O(log n) using binary search.

Insertion (at the end of an array) and deletion with arrays is problematic, though. Basic arrays are static, i.e. they can't be expanded or shrunk at will. The case of expansion requires free space after the end of the array that isn't generally available (because it's already occupied by other data used by the program) so it means that the whole array needs to be relocated to another place in memory with sufficient space. Shrinking is possible, but it still requires relocation of the elements following the deleted one. Hence, both of these operations require O(n) time and may also cause memory fragmentation. This is a major drawback of arrays.

However, arrays definitely should be the default choice for most algorithms. Why? First of all, because of the other excellent properties arrays provide and also because, in many cases, lack of flexibility can be circumvented in a certain manner. One common example is iteration with accumulation of results in a sequence. This is often performed with the help of a stack (as a rule, implemented with a linked list), but, in many cases (especially, when the length of the result is known beforehand), arrays may be used to the same effect. Another approach is using dynamic arrays, which add array resizing capabilities. And only in the case when an algorithm requires contiguous manipulation (insertion and deletion) of a collection of items or other advanced flexibility, linked data structures are preferred.

So, the first approach to working around the static nature of arrays is possible when we know the target number of elements. For instance, the most common pattern of sequence processing is to map a function over it, which produces the new sequence of the same size filled with results of applying the function to each element of the original sequence. With arrays, it can be performed even more efficiently than with a list. We just need to pre-allocate the resulting vector and set its elements one by one as we process the input:

(defun map-vec (fn vec)
"Map function FN over each element of VEC
and return the new vector with the results."
(let ((rez (make-array (length vec))))
(dotimes (i (length vec))
(:= (aref rez i) (call fn (aref vec i))))
rez))

CL-USER> (map-vec '1+ #(1 2 3))
#(2 3 4)

We use a specific accessor aref here instead of generic ? to ensure efficient operation in the so-called "inner loop" — although, there's just one loop here, but it will be the inner loop of many complex algorithms.

However, in some cases we don't know the size of the result beforehand. For instance, another popular sequence processing function is called filter or remove-if(-not) in Lisp. It iterates over the sequence and keeps only elements that satisfy/don't satisfy a certain predicate. It is, generally, unknown how many elements will remain, so we can't predict the size of the resulting array. One solution will be to allocate the full-sized array and fill only so many cells as needed. It is a viable approach although suboptimal. Filling the result array can be performed by tracking the current index in it or, in Lisp, by using an array with a fill-pointer:

(defun clumsy-filter-vec (pred vec)
"Return the vector with only those elements of VEC
for which calling pred returns true."
(let ((rez (make-array (length vec) :fill-pointer t)))
(dotimes (i (length vec))
(when (call pred (aref vec i))
(vector-push (aref vec i) rez)))
rez))

CL-USER> (describe (clumsy-filter-vec 'oddp #(1 2 3)))
#(1 3)
[vector]
Element-type: T
Fill-pointer: 2
Size: 3
Adjustable: yes
Displaced-to: NIL
Displaced-offset: 0
Storage vector: #<(SIMPLE-VECTOR 3) {100E9AF30F}>

Another, more general way, would be to use a "dynamic vector". This is a kind of an array that supports insertion by automatically expanding its size (usually, not one element at a time but proportionally to the current size of the array). Here is how it works:

CL-USER> (let ((vec (make-array 0 :fill-pointer t :adjustable t)))
(dotimes (i 10)
(vector-push-extend i vec)
(describe vec)))
#(0)
[vector]
Element-type: T
Fill-pointer: 1
Size: 1
Adjustable: yes
Displaced-to: NIL
Displaced-offset: 0
Storage vector: #<(SIMPLE-VECTOR 1) {100ED9238F}>

#(0 1)
Fill-pointer: 2
Size: 3

#(0 1 2)
Fill-pointer: 3
Size: 3

#(0 1 2 3)
Element-type: T
Fill-pointer: 4
Size: 7

...

#(0 1 2 3 4 5 6 7)
Fill-pointer: 8
Size: 15

#(0 1 2 3 4 5 6 7 8)
Element-type: T
Fill-pointer: 9
Size: 15

#(0 1 2 3 4 5 6 7 8 9)
Element-type: T
Fill-pointer: 10
Size: 15

For such "smart" arrays the complexity of insertion of an element becomes asymptotically constant: resizing and moving elements happens less and less often the more elements are added. With a large number of elements, this comes at a cost of a lot of wasted space, though. At the same time, when the number of elements is small (below 20), it happens often enough, so that the performance is worse than for a linked list that requires a constant number of 2 operations for each insertion (or 1 if we don't care to preserve the order). So, dynamic vectors are the solution that can be used efficiently only when the number of elements is neither too big nor too small.

Why Are Arrays Indexed from 0

Although most programmers are used to it, not everyone understands clearly why the choice was made, in most programming languages, for 0-based array indexing. Indeed, there are several languages that prefer a 1-based variant (for instance, MATLAB and Lua). This is quite a deep and yet very practical issue that several notable computer scientists, including Dijkstra, have contributed to.

At first glance, it is "natural" to expect the first element of a sequence to be indexed with 1, second — with 2, etc. This means that if we have a subsequence from the first element to the tenth it will have the beginning index 1 and the ending — 10, i.e. be a closed interval also called a segment: [1, 10]. The cons of this approach are the following:

  1. It is more straightforward to work with half-open intervals (i.e. the ones that don't include the ending index): especially, it is much more convenient to split and merge such intervals, and, also, test for membership. With 0-based indexing, our example interval would be half-open: [0, 10).

  2. If we consider multi-dimensional arrays that are most often represented using one-dimensional ones, getting an element of a matrix with indices i and j translates to accessing the element of an underlying vector with an index i*w + j or i + j*h for 0-based arrays, while for 1-based ones, it's more cumbersome: (i-1)*w + j. And if we consider 3-dimensional arrays (tensors), we'll still get the obvious i*w*h + j*h + k formula, for 0-based arrays, and, maybe, (i-1)*w*h + (j-1)*h + k for 1-based ones, although I'm not, actually, sure if it's correct (which shows how such calculations quickly become untractable). Besides, multi-dimensional array operations that are much more complex than mere indexing also often occur in many practical tasks, and they are also more complex and thus error-prone with base 1.

There are other arguments, but I consider them to be much more minor and a matter of taste and convenience. However, the intervals and multi-dimensional arrays issues are quite serious. And here is a good place to quote one of my favorite anecdotes that there are two hard problems in CS: cache invalidation and naming things,.. and off-by-one errors. Arithmetic errors with indexing are a very nasty kind of bug, and although it can't be avoided altogether 0-based indexing turns out to be a much more balanced solution.

Now, using 0-based indexing, let's write down the formula for finding the middle element of an array. Usually, it is chosen to be (floor (length array) 2). This element will divide the array into two parts, left and right, each one having length at least (1- (floor (length array) 2): the left part will always have such size and will not include the middle element. The right side will start from the middle element and will have the same size if the total number of array elements is even or be one element larger if it is odd.

Multi-Dimensional Arrays

So far, we have only discussed one-dimensional arrays. However, more complex data-structures can be represented using simple arrays. The most obvious example of such structures is multi-dimensional arrays. There's a staggering variety of other structures that can be built on top of arrays, such as binary (or, in fact, any n-ary) trees, hash-tables, and graphs, to name a few. If we have a chance to implement the data structure on an array, usually, we should not hesitate to take it as it will result in constant access time, good cache locality contributing to faster processing and, in most cases, efficient space usage.

Multi-dimensional arrays are a contiguous data-structure that stores its elements so that, given the coordinates of an element in all dimensions, it can be retrieved according to a known formula. Such arrays are also called tensors, and in case of 2-dimensional arrays — matrices. We have already seen one matrix example in the discussion of complexity:

#2A((1 2 3)
(4 5 6))

A matrix has rows (first dimension) and columns (second dimension). Accordingly, the elements of a matrix may be stored in the row-major or column-major order. In row-major, the elements are placed row after row — just like on this picture, i.e., the memory will contain the sequence: 1 2 3 4 5 6. In column-major order, they are stored by column (this approach is used in many "mathematical" languages, such as Fortran or MATLAB), so raw memory will look like this: 1 4 2 5 3 6. If row-major order is used the formula to access the element with coordinates i (row) and j (column) is: (+ (* i n) j) where n is the length of the matrix's row, i.e. its width. In the case of column-major order, it is: (+ i (* j m)) where m is the matrix's height. It is necessary to know, which storage style is used in a particular language as in numeric computing it is common to intermix libraries written in many languages — C, Fortran, and others — and, in the process, incompatible representations may clash.[2]

Such matrix representation is the most obvious one, but it's not exclusive. Many languages, including Java, use iliffe vectors to represent multi-dimensional arrays. These are vectors of vectors, i.e. each matrix row is stored in a separate 1-dimensional array, and the matrix is the vector of such vectors. Besides, more specific multi-dimensional arrays, such as sparse or diagonal matrices, may be represented using more efficient storage techniques at the expense of a possible loss in access speed. Higher-order tensors may also be implemented with the described approaches.

One classic example of operations on multi-dimensional arrays is matrix multiplication. The simple straightforward algorithm below has the complexity of O(n^3) where n is the matrix dimension. The condition for successful multiplication is equality of height of the first marix and width of the second one. The cubic complexity is due to 3 loops: by the outer dimensions of each matrix and by the inner identical dimension.

(defun m* (m1 m2)
(let ((n (array-dimension m1 1))
(n1 (array-dimension m1 0))
(n2 (array-dimension m2 1))
(rez (make-array (list n1 n2))))
(assert (= n (array-dimension m2 1)))
(dotimes (i n1)
(dotimes (j n2)
(let ((cur 0))
(dotimes (k n)
;; :+ is the incrementing analog of :=
(:+ cur (* (aref m1 i k)
(aref m2 k j))))
(:= (aref rez i j) cur))))
rez))

There are more efficient albeit much more complex versions using divide-and-conquer approach that can work in only O(n^2.37), but they have significant hidden constants and, that's why, are rarely used in practice, although if you're relying on an established library for matrix operations, such as the Fortran-based BLAS/ATLAS, you will find one of them under-the-hood.

Binary Search

Now, let's talk about some of the important and instructive array algorithms. The most prominent ones are searching and sorting.

A common sequence operation is searching for the element either to determine if it is present, to get its position or to retrieve the object that has a certain property (key-based search). The simple way to search for an element in Lisp is using the function find:

CL-USER> (let ((vec #v((pair :foo :bar) (pair :baz :quux))))
(print (find (pair :foo :bar) vec))
(print (find (pair :foo :bar) vec :test 'equal))
(print (find (pair :bar :baz) vec :test 'equal))
(print (find :foo vec :key 'lt)))
NIL
(:FOO :BAR)
NIL
(:FOO :BAR)

In the first case, the element was not found due to the wrong comparison predicate: the default eql will only consider to structures the same if it's the same object, and, in this case, there will be two separate pairs with the same content. So, the second search is successful as equal performs deep comparison. Then the element is not found as it is just not present. And, in the last case, we did the key-based search looking just at the lt element of all pairs in vec.

Such search is called sequential scan because it is performed in a sequential manner over all elements of the vector starting from the beginning (or end if we specify :from-end t) until either the element is found or we have examined all the elements. The complexity of such search is, obviously, O(n), i.e. we need to access each element of the collection (if the element is present we'll look, on average, at n/2 elements, and if not present — always at all n elements).

However, if we know that our sequence is sorted, we can perform the search much faster. The algorithm used for that is one of the most famous algorithms that every programmer has to know and use, from time to time — binary search. The more general idea behind it is called "divide and conquer": if there's some way, looking at one element, to determine the outcome of our global operation for more than just this element we can discard the part, for which we already know that the outcome is negative. In binary search, when we're looking at an arbitrary element of the sorted vector and compare it with the item we search for:

  • if the element is the same we have found it
  • if it's smaller all the previous elements are also smaller and thus uninteresting to us — we need to look only on the subsequent ones
  • if it's greater all the following elements are not interesting

Thus, each time we can examine the middle element and, after that, can discard half of the elements of the array without checking them. We can repeat such comparisons and halving until the resulting array contains just a single element.

Here's the straightforward binary search implementation using recursion:

(defun bin-search (val vec &optional (pos 0))
(if (> (length vec) 1)
(with ((mid (floor (+ beg end) 2))
(cur (aref vec mid)))
(cond ((< cur val) (bin-search val
(slice vec (1+ mid))
(+ pos mid 1)))
((> cur val) (bin-search val
(slice vec 0 (1+ mid))
pos))
(t (+ pos mid))))
(when (= (aref vec 0) val)
pos)))

If the middle element differs from the one we're looking for it halves the vector until just one element remains. If the element is found its position (which is passed as an optional 3rd argument to the recursive function) is returned. Note that we assume that the array is sorted. Generally, there's no way to quickly check this property unless we examine all array elements (and thus lose all the benefits of binary search). That's why we don't assert the property in any way and just trust the programmer :)

An important observation is that such recursion is very similar to a loop that at each stage changes the boundaries we're looking in-between. Not every recursive function can be matched with a similar loop so easily (for instance, when there are multiple recursive calls in its body an additional memory data structure is needed), but when it is possible it usually makes sense to choose the loop variant. The pros of looping is the avoidance of both the function calls' overhead and the danger of hitting the recursion limit or the stack overflow associated with it. While the pros of recursion are simpler code and better debuggability that comes with the possibility to examine each iteration by tracing using the built-in tools.

Another thing to note is interesting counter-intuitive arithmetic of additional comparisons. In our naive approach, we had 3 cond clauses, i.e. up to 2 comparisons to make at each iteration. In total, we'll look at (log n 2) elements of our array, so we have no more than (/ (1- (log n 2)) n) chance to match the element with the = comparison before we get to inspect the final 1-element array. I.e. with the probability of (- 1 (/ (1- (log n 2)) n)) we'll have to make all the comparisons up to the final one. Even for such small n as 10 this probability is 0.77 and for 100 — 0.94. And this is an optimistic estimate for the case when the element searched for is actually present in the array, which may not always be so. Otherwise, we'll have to make all the comparisons. Effectively, these numbers prove the equality comparison meaningless and just a waste of computation, although from "normal" programmer intuition it might seem like a good idea to implement early exit in this situation...

Finally, there's also one famous non-obvious bug associated with the binary search that was still present in many production implementations, for many years past the algorithm's inception. It's also a good example of the dangers of forfeiting boundary conditions check that is the root of many severe problems plaguing our computer systems by opening them to various exploits. The problem, in our code, may manifest in systems that have limited integer arithmetic with potential overflow. In Lisp, if the result of summing two fixnums is greater than most-positive-fixnum (the maximum number that can be represented directly by the machine word) it will be automatically converted to bignums, which are a slower representation but with unlimited precision:

CL-USER> most-positive-fixnum
4611686018427387903
CL-USER> (type-of most-positive-fixnum)
(INTEGER 0 4611686018427387903)
CL-USER> (+ most-positive-fixnum most-positive-fixnum)
9223372036854775806
CL-USER> (type-of (+ most-positive-fixnum most-positive-fixnum))
(INTEGER 4611686018427387904)

In many other languages, such as C or Java, what will happen is either silent overflow (the worst), in which case we'll get just the remainder of division of the result by the maximum integer, or an overflow error. Both of these situations are not accounted for in the (floor (+ beg end) 2) line. The simple fix to this problem, which makes sense to keep in mind for future similar situations, is to change the computation to the following equivalent form: (+ beg (floor (- end beg) 2)). It will never overflow. Why? Try to figure out on your own ;)

Taking all that into account and allowing for a custom comparator function, here's an "optimized" version of binary search that returns 3 values:

  • the final element of the array
  • its position
  • has it, actually, matched the element we were searching for?
(defun bin-search (val vec &key (less '<) (test '=) (key 'identity))
(when (plusp (length vec))
(let ((beg 0)
(end (length vec)))
(do ()
((= beg end))
(let ((mid (+ beg (floor (- end beg) 2))))
(if (call less (call key (aref vec mid)) val)
(:= beg (1+ mid))
(:= end mid))))
(values (aref vec beg)
beg
(call test (call key (aref vec beg)) val)))))

How many loop iterations do we need to complete the search? If we were to take the final one-element array and expand the array from it by adding the discarded half it would double in size at each step, i.e. we'll be raising 2 to the power of the number of expansion iterations (initially, before expansion — after 0 iterations — we have 1 element, which is 2^0, after 1 iteration, we have 2 elements, after 2 — 4, and so on). The number of iterations needed to expand the full array may be calculated by the inverse of exponentiation — the logarithmic function. I.e. we'll need (log n 2) iterations (where n is the initial array size). Shrinking the array takes the same as expanding, just in the opposite order, so the complexity of binary search is O(log n).

How big is the speedup from linear to logarithmic complexity? Let's do a quick-and-dirty speed comparison between the built-in (and optimized) sequential scan fucntion find and our bin-search:

CL-USER> (with ((size 100000000)
(mid (1+ (/ size 2)))
(vec (make-array size)))
(dotimes (i size)
(:= (? vec i) i))
(time (find mid vec))
(time (bin-search mid vec)))
Evaluation took:
0.591 seconds of real time
0.595787 seconds of total run time (0.595787 user, 0.000000 system)
100.85% CPU
...
Evaluation took:
0.000 seconds of real time
0.000000 seconds of total run time (0.000000 user, 0.000000 system)
100.00% CPU
...

Unfortunately, I don't have enough RAM on my notebook to make bin-search take at least a millisecond of CPU time. We can count nanoseconds to get the exact difference, but a good number to remember is that (log 1000000 2) is approximately 20, so, for the million elements array, the speedup will be 50000x!

The crucial limitation of binary search is that it requires our sequence to be pre-sorted because sorting before each search already requires at least linear time to complete, which kills any performance benefit we might have expected. There are multiple situations when the pre-sort condition may hold without our intervention:

  • all the data is known beforehand and we can sort it just once prior to running the search, which may be repeated multiple times for different values
  • we maintain the sorted order as we add data. Such an approach is feasible only if addition is performed less frequently than search. This is often the case with databases, which store their indices in sorted order

A final note on binary search: obviously, it will only work fast for vectors and not linked sequences.

Binary Search in Action

In one consumer internet company I was working for, a lot of text processing (which was the company's bread-and-butter) relied on access to a huge statistical dataset called "ngrams". Ngrams is a simple Natural Language Processing concept: basically, they are phrases of a certain length. A unigram (1gram) is a single word, a bigram — a pair of words, a fivegram — a list of 5 words. Each ngram has some weight associated with it, which is calculated (estimated) from the huge corpus of texts (we used the crawl of the whole Internet). There are numerous ways to estimate this weight, but the basic one is to just count the frequency of the occurance of a specific ngram phrase in the corpus.

The total number of ngrams may be huge: for our case, the whole dataset, on disk, measured in tens of gigabytes. And the application requires constant random access to it. Using an off-the-shelf database would have incurred us too much overhead as such systems are general-purpose and don't optimize for the particular use cases, like the one we had. So, a special-purpose solution was needed. In fact, now there is readily-available ngrams handling software, such as KenLM. We have built our own, and, initially, it relied on binary search of the in-memory dataset to answer the queries. Considering the size of the data, what do you think was the number of operations required? I don't remember it exactly, but somewhere between 25 and 30. For handling tens of gigabytes or hundreds of millions/billions of ngrams — quite a decent result. And, most important, it didn't exceed our application's latency limits! The key property that enabled such solution was the fact that all the ngrams were known beforehand and hence the dataset could be pre-sorted. Yet, eventually, we moved to an even faster solution based on perfect hash-tables (that we'll discuss later in this book).

One more interesting property of this program was that it took significant time to initialize as all the data had to be loaded into memory from disk. During that time, which measured in several dozens of minutes, the application was not available, which created a serious bottleneck in the whole system and complicated updates as well as put normal operation at additional risk. The solution we utilized to counteract this was also a common one for such cases: lazy loading in memory using the Unix mmap facility.

Sorting

Sorting is another fundamental sequence operation that has many applications. Unlike searching, the sorted sequence, there is no single optimal algorithm for sorting, and different data structures allow different approaches to it. In general, the problem of sorting a sequence is to place all of its elements in a certain order determined by the comparison predicate. There are several aspects that differentiate sorting functions:

  • in-place sorting is a destructive operation, but it is often desired because it may be faster and also it preserves space (especially relevant when sorting big amounts of data at once). The alternative is copying sort
  • stable: whether 2 elements, which are considered the same by the predicate, retain their original order or may be shuffled
  • online: does the function require to see the whole sequence before starting the sorting process or can it work with each element one-by-one always preserving the result of processing the already seen part of the sequence in the sorted order

One more aspect of a particular sorting algorithm is its behavior on several special kinds of input data: already sorted (in direct and reversed order), almost sorted, completely random. An ideal algorithm should show better than average performance (up to O(1)) on the sorted and almost sorted special cases.

Over the history of CS, sorting was and still remains a popular research topic. Not surprisingly, several dozens of different sorting algorithms were developed. But before discussing the prominent ones, let's talk about "Stupid sort" (or "Bogosort"). It is one of the sorting algorithms that has a very simple idea behind, but an outstandingly nasty performance. The idea is that among all permutations of the input sequence there definitely is the completely sorted one. If we were to take it, we don't need to do anything else. It's an example of the so-called "generate and test" paradigm that may be employed when we know next to nothing about the nature of our task: then, put some input into the black box and see the outcome. In the case of bogosort, the number of possible inputs is the number of all permutations that's equal to n!, so considering that we need to also examine each permutation's order the algorithm's complexity is O(n * n!) — quite a bad number, especially, since some specialized sorting algorithms can work as fast as O(n) (for instance, Bucket sort for integer numbers). On the other hand, if generating all permutations is a library function and we don't care about complexity such an algorithm will have a rather simple implementation that looks quite innocent. So you should always inquire about the performance characteristics of 3rd-party functions. And, by the way, your standard library sort function is also a good example of this rule.

(defun bogosort (vec comp)
(dolist (variant (all-permutations vec))
(dotimes (i (1- (length variant)))
;; this is the 3rd optional argument of dotimes header
;; that is evaluated only after the loop finishes normally
;; if it does we have found a completely sorted permutation!
(return-from bogosort variant))
(when (call comp (? variant (1+ i)) (? variant i))
(return))))) ; current variant is not sorted, skip it

O(n^2) Sorting

Although we can imagine an algorithm with even worse complexity factors than this, bogosort gives us a good lower bound on the sorting algorithm's performance and an idea of the potential complexity of this task. However, there are much faster approaches that don't have a particularly complex implementation. There is a number of such simple algorithms that work in quadratic time. A very well-known one, which is considered by many a kind of "Hello world" algorithm, is Bubble sort. Yet, in my opinion, it's quite a bad example to teach (sadly, often it is taught) because it's both not very straightforward and has poor performance characteristics. That's why it's never used in practice. There are two other simple quadratic sorting algorithms that you actually have a chance to encounter in the wild, especially, Insertion sort that is used rather frequently. Their comparison is also quite insightful, so we'll take a look at both, instead of focusing just on the former.

Selection sort is an in-place sorting algorithm that moves left-to-right from the beginning of the vector one element at a time and builds the sorted prefix to the left of the current element. This is done by finding the "largest" (according to the comparator predicate) element in the right part and swapping it with the current element.

(defun selection-sort (vec comp)
(dotimes (i (1- (length vec)))
(let ((best (aref vec i))
(idx i))
(dotimes (j (- (length vec) i 1))
(when (call comp (aref vec (+ i j 1)) best)
(:= best (aref vec (+ i j 1))
idx (+ i j 1)))))
(rotatef (aref vec i) (aref vec idx))) ; this is the lisp's swap operator
vec)

Selection sort requires a constant number of operations regardless of the level of sortedness of the original sequence: (/ (* n (- n 1)) 2) — the sum of the arithmetic progression from 1 to n, because, at each step, it needs to fully examine the remainder of the elements to find the maximum, and the remainder's size varies from n to 1. It handles equally well both contiguous and linked sequences.

Insertion sort is another quadratic-time in-place sorting algorithm that builds the sorted prefix of the sequence. However, it has a few key differences from Selection sort: instead of looking for the global maximum in the right-hand side it looks for a proper place of the current element in the left-hand side. As this part is always sorted it takes linear time to find the place for the new element and insert it there leaving the side in sorted order. Such change has great implications:

  • it is stable
  • it is online: the left part is already sorted, and, in contrast with selection sort, it doesn't have to find the maximum element of the whole sequence in the first step, it can handle encountering it at any step
  • for sorted sequences it works in the fastest possible way — in linear time — as all elements are already inserted into proper places and don't need moving. The same applies to almost sorted sequences, for which it works in almost linear time. However, for reverse sorted sequences, its performance will be the worse. In fact, there is a clear proportion of the algorithm's complexity to the average offset of the elements from their proper positions in the sorted sequence: O(k * n), where k is the average offset of the element. For sorted sequences k=0 and for reverse sorted it's (/ (- n 1) 2).
(defun insertion-sort (vec comp)
(dotimes (i (1- (length vec)))
(do ((j i (1- j)))
((minusp j))
(if (call comp (aref vec (1+ j)) (aref vec j))
(rotatef (aref vec (1+ j)) (aref vec j))
(return))))
vec)

As you see, the implementation is very simple: we look at each element starting from the second, compare it to the previous element, and if it's better we swap them and continue the comparison with the previous element until we reach the array's beginning.

So, where's the catch? Is there anything that makes Selection sort better than Insertion? Well, if we closely examine the number of operations required by each algorithm we'll see that Selection sort needs exactly (/ (* n (- n 1)) 2) comparisons and on average n/2 swaps. For Insertion sort, the number of comparisons varies from n-1 to (/ (* n (- n 1)) 2), so, in the average case, it will be (/ (* n (- n 1)) 4), i.e. half as many as for the other algorithm. In the sorted case, each element is already in its position, and it will take just 1 comparison to discover that, in the reverse sorted case, the average distance of an element from its position is (/ (- n 1) 2), and for the middle variant, it's in the middle, i.e. (/ (- n 1) 4). Times the number of elements (n). But, as we can see from the implementation, Insertion sort requires almost the same number of swaps as comparisons, i.e. (/ (* (- n 1) (- n 2)) 4) in the average case, and it matches the number of swaps of Selection sort only in the close to best case, when each element is on average 1/2 steps away from its proper position. If we sum up all comparisons and swaps for the average case, we'll get the following numbers:

  • Selection sort: (+ (/ (* n (- n 1)) 2) (/ n 2)) = (/ (+ (* n n) n) 2)
  • Insertion sort: (+ (/ (* n (- n 1)) 2) (+ (/ (* (- n 1) (- n 2)) 4) = (/ (+ (* 1.5 n n) (* -2.5 n) 1) 2)

The second number is slightly higher than the first. For small ns it is almost negligible: for instance, when n=10, we get 55 operations for Selection sort and 63 for Insertion. But, asymptotically (for huge ns like millions and billions), Insertion sort will need 1.5 times more operations. Also, it is often the case that swaps are more expensive operations than comparisons (although, the opposite is also possible).

In practice, Insertion sort ends up being used more often, for, in general, quadratic sorts are only used when the input array is small (and so the difference in the number of operations) doesn't matter, while it has other good properties we mentioned. However, one situation when Selection sort's predictable performance is an important factor is in the systems with deadlines.

Quicksort

There is a number of other O(n^2) sorting algorithms similar to Selection and Insertion sorts, but studying them quickly turns boring so we won't. As there's also a number of significantly faster algorithms that work in O(n * log n) time (almost linear). They usually rely on the divide-and-conquer approach when the whole sequence is recursively divided into smaller subsequences that have some property, thanks to which it's easier to sort them, and then these subsequences are combined back into the final sorted sequence. The feasibility of such performance characteristics is justified by the observation that ordering relations are recursive, i.e. if we have compared two elements of an array and then compare one of them to the third element, with a probability of 1/2 we'll also know how it relates to the other element.

Probably, the most famous of such algorithms is Quicksort. Its idea is, at each iteration, to select some element of the array as the "pivot" point and divide the array into two parts: all the elements that are smaller and all those that are larger than the pivot; then recursively sort each subarray. As all left elements are below the pivot and all right — above when we manage to sort the left and right sides the whole array will be sorted. This invariant holds for all iterations and for all subarrays. The word "invariant", literally, means some property that doesn't change over the course of the algorithm's execution when other factors, e.g. bounds of the array we're processing, are changing.

There're several tricks in Quicksort implementation. The first one has to do with pivot selection. The simplest approach is to always use the last element as the pivot. Now, how do we put all the elements greater than the pivot after it if it's already the last element? Let's say that all elements are greater — then the pivot will be at index 0. Now, if moving left to right over the array we encounter an element that is not greater than the pivot we should put it before, i.e. the pivot's index should increment by 1. When we reach the end of the array we know the correct position of the pivot, and in the process, we can swap all the elements that should precede it in front of this position. Now, we have to put the element that is currently occupying the pivot's place somewhere. Where? Anywhere after the pivot, but the most obvious thing is to swap it with the pivot.

(defun quicksort (vec comp)
(when (> (length vec) 1)
(with ((pivot-i 0)
(pivot (aref vec (1- (length vec)))))
(dotimes (i (1- (length vec)))
(when (call comp (aref vec i) pivot)
(rotatef (aref vec i)
(aref vec pivot-i))
(:+ pivot-i)))
;; swap the pivot (last element) in its proper place
(rotatef (aref vec (1- (length vec)))
(aref vec pivot-i))
(quicksort (slice vec 0 pivot-i) comp)
(quicksort (slice vec (1+ pivot-i)) comp)))
vec)

Although recursion is employed here, such implementation is space-efficient as it uses array displacement ("slicing") that doesn't create new copies of the subarrays, so sorting happens in-place. Speaking of recursion, this is one of the cases when it's not so straightforward to turn it into looping (this is left as an exercise to the reader :) ).

What is the complexity of such implementation? Well, if, on every iteration, we divide the array in two equal halves we'll need to perform n comparisons and n/2 swaps and increments, which totals to 2n operations. And we'll need to do that (log n 2) times, which is the height of a complete binary tree with n elements. At every level in the recursion tree, we'll need to perform twice as many sorts with twice as little data, so each level will take the same number of 2n operations. Total complexity: 2n * (log n 2), i.e. O(n * log n). In the ideal case.

However, we can't guarantee that the selected pivot will divide the array into two ideally equal parts. In the worst case, if we were to split it into 2 totally unbalanced subarrays, with n-1 and 0 elements respectively, we'd need to perform sorting n times and had to perform a number of operations that will diminish in the arithmetic progression from 2n to 2. Which sums to (* n (- n 1)). A dreaded O(n^2) complexity. So, the worst-case performance for quicksort is not just worse, but in a different complexity league than the average-case one. Moreover, the conditions for such performance (given our pivot selection scheme) are not so uncommon: sorted and reverse-sorted arrays. And the almost sorted ones will result in the almost worst-case scenario.

It is also interesting to note that if, at each stage, we were to split the array into parts that have a 10:1 ratio of lengths this would have resulted in n * log n complexity! How come? The 10:1 ratio, basically, means that the bigger part each time is shortened at a factor of around 1.1, which still is a power-law recurrence. The base of the algorithm will be different, though: 1.1 instead of 2. Yet, from the complexity theory point of view, the logarithm base is not important because it's still a constant: (log n x) is the same as (/ (log n 2) (log x 2)), and (/ 1 (log x 2)) is a constant for any fixed logarithm base x. In our case, if x is 1.1 the constant factor is 7.27. Which means that quicksort, in the quite bad case of recurring 10:1 splits, will be just a little more than 7 times slower than, in the best case, of recurring equal splits. Significant — yes. But, if we were to compare n * log n (with base 2) vs n^2 performance for n=1000 we'd already get a 100 times slowdown, which will only continue increasing as the input size grows. Compare this to a constant factor of 7...

So, how do we achieve at least 10:1 split, or, at least, 100:1, or similar? One of the simple solutions is called 3-medians approach. The idea is to consider not just a single point as a potential pivot but 3 candidates: first, middle, and last points — and select the one, which has the median value among them. Unless accidentally two or all three points are equal, this guarantees us not taking the extreme value that is the cause of the all-to-nothing split. Also, for a sorted array, this should produce a nice near to equal split. How probable is stumbling at the special case when we'll always get at the extreme value due to equality of the selected points? The calculations here are not so simple, so I'll give just the answer: it's extremely improbable that such condition will hold for all iterations of the algorithm due to the fact that we'll always remove the last element and all the swapping that is going on. More precisely, the only practical variant when it may happen is when the array consists almost or just entirely of the same elements. And this case will be addressed next. One more refinement to the 3-medians approach that will work even better for large arrays is 9-medians that, as is apparent from its name, performs the median selection not among 3 but 9 equidistant points in the array.

Dealing with equal elements is another corner case for quicksort that should be addressed properly. The fix is simple: to divide the array not in 2 but 3 parts, smaller, larger, and equal to the pivot. This will allow for the removal of the equal elements from further consideration and will even speed up sorting instead of slowing it down. The implementation adds another index (this time, from the end of the array) that will tell us where the equal-to-pivot elements will start, and we'll be gradually swapping them into this tail as they are encountered during array traversal.

Production Sort

I was always wondering how it's possible, for Quicksort, to be the default sorting algorithm when it has such bad worst-case performance and there are other algorithms like Merge sort or Heap sort that have guaranteed O(n * log n) ones. With all the mentioned refinements, it's apparent that the worst-case scenario, for Quicksort, can be completely avoided (in the probabilistic sense) while it has a very nice property of sorting in-place with good cache locality, which significantly contributes to better real-world performance. Moreover, production sort implementation will be even smarter by utilizing Quicksort while the array is large and switching to something like Insertion sort when the size of the subarray reaches a certain threshold (10-20 elements). All this, however, is applicable only to arrays. When we consider lists, other factors come into play that make Quicksort much less plausible.

Here's an attempt at such — let's call it "Production sort" — implementation (the function 3-medians is left as an excercise to the reader).

(defun prod-sort (vec comp &optional (eq 'eql))
(cond ((< (length vec) 2)
vec)
((< (length vec) 10)
(insertion-sort vec comp))
(t
(rotatef (aref vec (1- (length vec)))
(aref vec (3-medians vec comp eq)))
(with ((pivot-i 0)
(pivot-count 1)
(last-i (1- (length vec)))
(pivot (aref vec last-i)))
(do ((i 0 (1+ i)))
((> i (- last-i pivot-count)))
(cond ((call comp (aref vec i) pivot)
(rotatef (aref vec i)
(aref vec pivot-i))
(:+ pivot-i))
((call eq (aref vec i) pivot)
(rotatef (aref vec i)
(aref vec (- last-i pivot-count)))
(:+ pivot-count)
(:- i)))) ; decrement i to reprocess newly swapped point
(dotimes (i pivot-count)
(rotatef (aref vec (+ pivot-i i))
(aref vec (- last-i i))))
(prod-sort (slice vec 0 pivot-i) comp eq)
(prod-sort (slice vec (+ pivot-i pivot-count)) comp eq))))
vec)

All in all, the example of Quicksort is very interesting, from the point of view of complexity analysis. It shows the importance of analyzing the worst-case and other corner-case scenarios, and, at the same time, teaches that we shouldn't give up immediately if the worst case is not good enough, for there may be ways to handle such corner cases that reduce or remove their impact.

Performance Benchmark

Finally, let's look at our problem from another angle: simple and stupid. We have developed 3 sorting functions' implementations: Insertion, Quick, and Prod. Let's create a tool to compare their performance on randomly generated datasets of decent sizes. This may be done with the following code and repeated many times to exclude the effects of randomness.

(defun random-vec (size)
(let ((vec (make-array size)))
(dotimes (i size)
(:= (? vec i) (random size)))
vec))

(defun print-sort-timings (sort-name sort-fn vec)
;; we'll use in-place modification of the input vector VEC
;; so we need to copy it to preserve the original for future use
(let ((vec (copy-seq vec))
(len (length vec)))
(format t "= ~Asort of random vector (length=~A) =~%"
sort-name len)
(time (call sort-fn vec '<))
(format t "= ~Asort of sorted vector (length=~A) =~%"
sort-name len)
(time (call sort-fn vec '<))
(format t "= ~Asort of reverse sorted vector (length=~A) =~%"
sort-name len)
(time (call sort-fn vec '>))))

CL-USER> (let ((vec (random-vec 1000)))
(print-sort-timings "Insertion " 'insertion-sort vec)
(print-sort-timings "Quick" 'quicksort vec)
(print-sort-timings "Prod" 'prod-sort vec))
= Insertion sort of random vector (length=1000) =
Evaluation took:
0.128 seconds of real time
...
= Insertion sort of sorted vector (length=1000) =
Evaluation took:
0.001 seconds of real time
...
= Insertion sort of reverse sorted vector (length=1000) =
Evaluation took:
0.257 seconds of real time
...
= Quicksort of random vector (length=1000) =
Evaluation took:
0.005 seconds of real time
...
= Quicksort of sorted vector (length=1000) =
Evaluation took:
5.429 seconds of real time
...
= Quicksort of reverse sorted vector (length=1000) =
Evaluation took:
2.176 seconds of real time
...
= Prodsort of random vector (length=1000) =
Evaluation took:
0.008 seconds of real time
...
= Prodsort of sorted vector (length=1000) =
Evaluation took:
0.004 seconds of real time
...
= Prodsort of reverse sorted vector (length=1000) =
Evaluation took:
0.007 seconds of real time

Overall, this is a really primitive approach that can't serve as conclusive evidence on its own, but it has value as it aligns well with our previous calculations. Moreover, it once again reveals some things that may be omitted in those calculations: for instance, the effects of the hidden constants of the Big-O notation or of the particular programming vehicles used. We can see that, for their worst-case scenarios, where Quicksort and Insertion sort both have O(n^2) complexity and work the longest, Quicksort comes 10 times slower, although it's more than 20 times faster for the average case. This slowdown may be attributed both to the larger number of operations and to using recursion. Also, our Prodsort algorithm demonstrates its expected performance. As you see, such simple testbeds quickly become essential in testing, debugging, and fine-tuning our algorithms' implementations. So it's a worthy investment.

Finally, it is worth noting that array sort is often implemented as in-place sorting, which means that it will modify (spoil) the input vector. We use that in our test function: first, we sort the array and then sort the sorted array in direct and reverse orders. This way, we can omit creating new arrays. Such destructive sort behavior may be both the intended and surprising behavior. The standard Lisp's sort and stable-sort functions also exhibit it, which is, unfortunately, a source of numerous bugs due to the application programmer forgetfulness of the function's side-effects (at least, this is an acute case, for myself). That's why RUTILS provides an additional function safe-sort that is just a thin wrapper over standard sort to free the programmer's mind from worrying or forgetting about this treacherous sort's property.

A few points we can take away from this chapter:

  1. Array is a goto structure for implementing your algorithms. First, try to fit it before moving to other things like lists, trees, and so on.
  2. Complexity estimates should be considered in context: of the particular task's requirements and limitations, of the hardware platform, etc. Performing some real-world benchmarking alongside back-of-the-napkin abstract calculations may be quite insightful.
  3. It's always worth thinking of how to reduce the code to the simplest form: checking of additional conditions, recursion, and many other forms of code complexity, although, rarely are a game changer, often may lead to significant unnecessary slowdowns.

Footnotes:

[1] or void* in C, or some other type that allows any element in your language of choice

[2] Such incompatibility errors are not a cheap thing: for instance, it is reported that the crash of the first Arian V rocket happened due to interoperation of two programs that used the metric and the imperial measurement systems without explicit conversion of the data. There's an elegant solution to such problem: "dimensional numbers", which a custom reader macro to encode the measure alongside the number. Here is a formula expressed with such numbers:

(defun running-distance-for-1kg-weight-loss (mass)
(* 1/4 (/ #M37600kJ (* #M0.98m/s2 mass))))

CL-USER> (running-distance-for-1kg-weight-loss #M80kg)
119897.96
CL-USER> (running-distance-for-1kg-weight-loss #I200lb)
105732.45

The output is, of course, in metric units. Unfortunately, this approach will not be useful for arrays encoded by different languages as they are obtained not by reading the input but by referencing external memory. Instead, a wrapper struct/class is, usually, used to specify the elements order.

Nicolas HafnerAn Extensible Particle System - Gamedev

· 78 days ago

header
This article was originally published on GameDev.NET. In it, I illustrate a new particle system that was developed for my Lisp game engine, Trial. It contains quite a bit of graphics stuff, but also a lot of Lisp. I thought it would be worthwhile to share it here as well. For those unfamiliar, a particle system deals in orchestrating a lot of very similar things (particles). The challenge is to efficiently draw and update these particles.

For the drawing we consider two separate parts - the geometry used for each particle, and the data used to distinguish one particle from another. We pack both of these two parts into a singular vertex array, using instancing for the vertex attributes of the latter part. This allows us to use instanced drawing and draw all of the particles in one draw call. In the particle shader we then need to make sure to add the particle's location offset, and to do whatever is necessary to render the geometry appropriately as usual. This can be done easily enough in any game engine, though it would be much more challenging to create a generic system that can easily work with any particle geometry and any rendering logic. In Trial this is almost free.

There's two parts in Trial that allow me to do this: first, the ability to inherit and combine opaque shader parts along the class hierarchy, and second, the ability to create structures that are backed by an opaque memory region, while retaining the type information. The latter part is not that surprising for languages where you can cast memory and control the memory layout precisely, but nonetheless in Trial you can combine these structures through inheritance, something not typically possible without significant hassle. Trial also allows you to describe the memory layout precisely. For instance, this same system is used to represent uniform buffer objects, as well as what we're using here, which is attributes in a vertex buffer.

If you'll excuse the code dump, we'll now take a look at the actual particle system implementation:

(define-gl-struct (particle (:layout-standard :vertex-buffer))
  (lifetime :vec2 :accessor lifetime))

(define-shader-subject particle-emitter ()
  ((live-particles :initform 0 :accessor live-particles)
   (vertex-array :accessor vertex-array)
   (particle-buffer :initarg :particle-buffer :accessor particle-buffer)))

(defmethod initialize-instance :after ((emitter particle-emitter) &key particle-mesh particle-buffer)
  (setf (vertex-array emitter)
        (add-vertex-bindings
         particle-buffer
         (change-class particle-mesh 'vertex-array))))

(defmethod paint ((emitter particle-emitter) pass)
  (let ((vao (vertex-array emitter)))
    (gl:bind-vertex-array (gl-name vao))
    (%gl:draw-elements-instanced (vertex-form vao) (size vao) :unsigned-int 0 (live-particles emitter))))

(defgeneric initial-particle-state (emitter tick particle))
(defgeneric update-particle-state (emitter tick input output))
(defgeneric new-particle-count (emitter tick)) ; => N

(define-handler (particle-emitter tick) (ev)
  (let ((vbo (particle-buffer particle-emitter))
        (write-offset 0))
    (let ((data (struct-vector vbo)))
      (declare (type simple-vector data))
      (loop for read-offset from 0 below (live-particles particle-emitter)
            for particle = (aref data read-offset)
            do (when (< (vx2 (lifetime particle)) (vy2 (lifetime particle)))
                 (when (update-particle-state particle-emitter ev particle (aref data write-offset))
                   (incf write-offset))))
      (loop repeat (new-particle-count particle-emitter ev)
            while (< write-offset (length data))
            do (initial-particle-state particle-emitter ev (aref data write-offset))
               (incf write-offset))
      (setf (live-particles particle-emitter) write-offset)
      (update-buffer-data vbo T))))

Let's go over this real quick. We first define a base class for all particles. This only mandates the lifetime field, which is a vector composed of the current age and the max age. This is used by the emitter to check liveness. Any other attribute of a particle is specific to the use-case, so we leave that up to the user.

Next we define our main particle-emitter class. It's called a "shader subject" in Trial, which means that it has shader code attached to the class, and can react to events in separate handler functions. Anyway, all we need for this class is to keep track of the number of live particles, the vertex array for all the particles, and the buffer we use to keep the per-particle data. In our constructor we construct the vertex array be combining the vertex attribute bindings of the particle buffer and the particle mesh.

The painting logic is very light, as we just need to bind the vertex array and do an instanced draw call, using the live-particles count for our current number of instances.

The three functions defined afterwards specify the protocol users need to follow to actually create and update the particles throughout their lifetime. The first function fills the initial state into the passed particle instance, the second uses the info from the input particle instance to fill the update into the output particle info, and the final function determines the number of new particles per update. These particle instances are instances of the particle class the user specifies through the particle-buffer, but their fields are backed by a common byte array. This allows us to make manipulation of the particles feel native and remain extensible, without requiring complex and expensive marshalling.

Finally we come to the bulk of the code, which is the tick update handler. This does not do too much in terms of logic, however. We simply iterate over the particle vector, checking the current lifetime. If the particle is still alive, we call the update-particle-state function. If this succeeds, we increase the write-offset into the particle vector. If it does not succeed, or the particle is dead, the write-offset remains the same, and the particle at that position will be overwritten by the next live, successful update. This in effect means that live particles are always at the beginning of the vector, allowing us to cut off the dead ones with the live-particles count. Then, we simply construct as many new particles as we should without overrunning the array, and finally we upload the buffer data from RAM to the GPU by using update-buffer-data, which in effect translates to a glBufferSubData call.

Now that we have this base protocol in place we can define a simple standard emitter, which should provide a much easier interface.

(define-gl-struct (simple-particle (:include particle)
                                   (:layout-standard :vertex-buffer))
  (location :vec3 :accessor location)
  (velocity :vec3 :accessor velocity))

(define-shader-subject simple-particle-emitter (particle-emitter)
  ())

(defmethod initial-particle-state :before ((emitter simple-particle-emitter) tick particle)
  (setf (location particle) (vec 0 0 0)))

(defmethod update-particle-state ((emitter simple-particle-emitter) tick particle output)
  (setf (location output) (v+ (location particle) (velocity particle)))
  (let ((life (lifetime particle)))
    (incf (vx2 life) (dt tick))
    (setf (lifetime output) life)
    (< (vx2 life) (vy2 life))))

(defmethod paint :before ((emitter simple-particle-emitter) (pass shader-pass))
  (let ((program (shader-program-for-pass pass emitter)))
    (setf (uniform program "view_matrix") (view-matrix))
    (setf (uniform program "projection_matrix") (projection-matrix))
    (setf (uniform program "model_matrix") (model-matrix))))

(define-class-shader (simple-particle-emitter :vertex-shader)
  "layout (location = 0) in vec3 vtx_location;
layout (location) in vec3 location;

uniform mat4 model_matrix;
uniform mat4 view_matrix;
uniform mat4 projection_matrix;

void main(){
  vec3 position = vtx_location + location;
  gl_Position = projection_matrix * view_matrix * model_matrix * vec4(position, 1.0f);
}")

Okey! Again we define a new structure, this time including the base particle so that we get the lifetime field as well. We add a location and velocity on to this, which we'll provide for basic movement. Then we define a subclass of our emitter, to provide the additional defaults. Using this subclass we can provide some basic updates that most particle systems based on it will expect: an initial location at the origin, updating the location by the velocity, increasing the lifetime by the delta time of the tick, and returning whether the particle is still live after that.

On the painting side we provide the default handling of the position. To do so, we first pass the three standard transform matrices used in Trial as uniforms, and then define a vertex shader snippet that handles the vertex transformation. You might notice here that the second vertex input, the one for the per-particle location, does not have a location assigned. This is because we cannot know where this binding lies ahead of time. The user might have additional vertex attributes for their per-particle mesh that we don't know about. The user must later provide an additional vertex-shader snippet that does define this.

So, finally, let's look at an actual use-case of this system.

(define-asset (workbench particles) vertex-struct-buffer
    'simple-particle
  :struct-count 1024)

(define-shader-subject fireworks (simple-particle-emitter)
  ()
  (:default-initargs :particle-mesh (change-class (make-sphere 1) 'vertex-array :vertex-attributes '(location))
                     :particle-buffer (asset 'workbench 'particles)))

(defmethod initial-particle-state ((fireworks fireworks) tick particle)
  (let ((dir (polar->cartesian (vec2 (/ (sxhash (fc tick)) (ash 2 60)) (mod (sxhash (fc tick)) 100)))))
    (setf (velocity particle) (vec (vx dir) (+ 2.5 (mod (sxhash (fc tick)) 2)) (vy dir))))
  (setf (lifetime particle) (vec 0 (+ 3.0 (random 1.0)))))

(defmethod update-particle-state :before ((fireworks fireworks) tick particle output)
  (let ((vel (velocity particle)))
    (decf (vy3 vel) 0.005)
    (when (< (abs (- (vx (lifetime particle)) 2.5)) 0.05)
      (let ((dir (polar->cartesian (vec3 (+ 1.5 (random 0.125)) (random (* 2 PI)) (random (* 2 PI))))))
        (vsetf vel (vx dir) (vy dir) (vz dir))))
    (setf (velocity output) vel)))

(defmethod new-particle-count ((fireworks fireworks) tick)
  (if (= 0 (mod (fc tick) (* 10 1)))
      128 0))

(define-class-shader (fireworks :vertex-shader 1)
  "layout (location = 1) in vec2 in_lifetime;
layout (location = 2) in vec3 location;

out vec2 lifetime;

void main(){
  lifetime = in_lifetime;
}")

(define-class-shader (fireworks :fragment-shader)
  "out vec4 color;

in vec2 lifetime;

void main(){
  if(lifetime.x <= 2.5)
    color = vec4(1);
  else{
    float lt = lifetime.y-lifetime.x;
    color = vec4(lt*2, lt, 0, 1);
  }
}")

First we define an asset that holds our per-particle buffer data. To do this we simply pass along the name of the particle class we want to use, as well as the number of such instances to allocate in the buffer. We then use this, as well as a simple sphere mesh, to initialize our own particle emitter. Then come the particle update methods. For the initial state we calculate a random velocity within a cone region, using polar coordinates. This will cause the particles to shoot out at various angles. We use a hash on the current frame counter here to ensure that particles generated in the same frame get bunched together with the same initial values. We also set the lifetime to be between three and four seconds, randomly for each particle.

In the update, we only take care of the velocity change, as the rest of the work is already done for us. For this we apply some weak gravity, and then check the lifetime of the particle. If it is within a certain range, we radically change the velocity of the particle in a random, spherical direction. In effect this will cause the particles, which were bunched together until now, to spread out randomly.

For our generator, we simply create a fixed number of particles every 10 frames or so. In a fixed frame-rate, this should look mean a steady generation of particle batches.

Finally, in the two shader code snippets we provide the aforementioned vertex attribute binding location, and some simple colouring logic to make the particles look more like fireworks. The final result of this exercise is this:

fireworks

Quite nice, I would say. With this we have a system that allows us to create very different particle effects, with relatively little code. For Leaf, I intend on using this to create 2D sprite-based particle effects, such as sparks, dust clouds, and so forth. I'm sure I'll revisit this at a later date to explore these different application possibilities.

Vsevolod DyomkinProgramming Algorithms: Data Structures

· 79 days ago

The next several chapters will be describing the basic data structures that every programming language provides, their usage and the most important algorithms relevant to them. And we'll start with the notion of a data-structure and tuples or structs that are the most primitive and essential one.

Data Structures vs Algorithms

Let's start with a somewhat abstract question: what's more important, algorithms or data structures?

From one point of view, algorithms are the essence of many programs, while data structures may seem secondary. Besides, although a majority of algorithms rely on certain features of particular data structures, not all do. Good examples of the data-structure-relying algorithms are heapsort, search using BSTs, and union-find. And of the second type: the sieve of Erastophenes and consistent hashing.

At the same time, some seasoned developers state that when the right data structure is found, the algorithm will almost write itself. Linus Torvalds, the creator of Linux, is quoted saying:

Bad programmers worry about the code. Good programmers worry about data structures and their relationships.

A somewhat less poignant version of the same idea is formulated in the Art of Unix Programming by Eric S. Raymond as the "Rule of Representation":

Fold knowledge into data so program logic can be stupid and robust.

Even the simplest procedural logic is hard for humans to verify, but quite complex data structures are fairly easy to model and reason about. To see this, compare the expressiveness and explanatory power of a diagram of (say) a fifty-node pointer tree with a flowchart of a fifty-line program. Or, compare an array initializer expressing a conversion table with an equivalent switch statement. The difference in transparency and clarity is dramatic.

Data is more tractable than program logic. It follows that where you see a choice between complexity in data structures and complexity in code, choose the former. More: in evolving a design, you should actively seek ways to shift complexity from code to data.

Data structures are more static than algorithms. Surely, most of them allow change of their contents over time, but there are certain invariants that always hold. This allows reasoning by simple induction: consider only two (or at least a small number of) cases, the base one(s) and the general. In other words, data structures remove, in the main, the notion of time from consideration, and change over time is one of the major causes of program complexity. In other words, data structures are declarative, while most of the algorithms are imperative. The advantage of the declarative approach is that you don't have to imagine (trace) the flow of time through it.

So, this book, like most other books on the subject, is organized around data structures. The majority of the chapters present a particular structure, its properties and interface, and explain the algorithms, associated with it, showing its real-world use cases. Yet, some important algorithms don't require a particular data structure, so there are also several chapters dedicated exclusively to them.

The Data Structure Concept

Among data structures, there are, actually, two distinct kinds: abstract and concrete. The significant difference between them is that an abstract structure is just an interface (a set of operations) and a number of conditions or invariants that have to be met. Their particular implementations, which may differ significantly in efficiency characteristics and inner mechanisms, are provided by the concrete data structures. For instance, an abstract data structure queue has just two operations: enqueue that adds an item to the end of the queue and dequeue that gets an item at the beginning and removes it. There's also a constraint that the items should be dequeued in the same order they are enqueued. Now, a queue may be implemented using a number of different underlying data structures: a linked or a double-linked list, an array or a tree. Each one having different efficiency characteristics and additional properties beyond the queue interface. We'll discuss both kinds in the book, focusing on the concrete structures and explaining their usage to implement a particular abstract interface.

The term data structures has somewhat fallen from grace, in the recent years, being often replaced by conceptually more loaded notions of types, in the context of the functional programming paradigm, or classes, in object-orientated one. Yet, both of those notions imply something more than just algorithmic machinery we're exclusively interested in, for this book. First of all, they also distinguish among primitive values (numbers, characters, etc.) that are all non-distinct, in the context of algorithms. Besides, classes form a hierarchy of inheritance while types are associated with algebraic rules of category theory. So, we'll stick to a neutral data structures term, throughout the book, with occasional mentions of the other variants where appropriate.

Contiguous and Linked Data Structures

The current computer architectures consist of a central processor (CPU), memory and peripheral input-output devices. The data is someway exchanged with the outside world via the IO-devices, stored in memory, and processed by the CPU. And there's a crucial constraint, called the von Neumann's bottleneck: the CPU can only process data that is stored inside of it in a limited number of special basic memory blocks called registers. So it has to constantly move data elements back and forth between the registers and main memory (using intermediate cache to speed up the process). Now, there are things that can fit in a register and those that can't. The first ones are called primitive and mostly unite those items that can be directly represented with integer numbers: integers proper, floats, characters. Everything that requires a custom data structure to be represented can't be put in a register as a whole.

Another item that fits into the processor register is a memory address. In fact, there's an important constant — the number of bits in a general-purpose register, which defines the maximum memory address that a particular CPU may handle and, thus, the maximum amount of memory it can work with. For a 32-bit architecture it's 2^32 (4 GB) and for 64-bit — you've guessed it, 2^64. A memory address is usually called a pointer, and if you put a pointer in a register, there are commands that allow the CPU to retrieve the data in-memory from where it points.

So, there are two ways to place a data structure inside the memory:

  • a contiguous structure occupies a single chunk of memory and its contents are stored in adjacent memory blocks. To access a particular piece we should know the offset of its beginning from the start of the memory range allocated to the structure. (This is usually handled by the compiler). When the processor needs to read or write to this piece it will use the pointer calculated as the sum of the base address of the structure and the offset. The examples of contiguous structures are arrays and structs
  • a linked structure, on the contrary, doesn't occupy a contiguous block of memory, i.e. its contents reside in different places. This means that pointers to a particular piece can't be pre-calculated and should be stored in the structure itself. Such structures are much more flexible at the cost of this additional overhead both in terms of used space and time to access an element (which may require several hops when there's nesting, while in the contiguous structure it is always constant). There exists a multitude of linked data structures like lists, trees, and graphs

Tuples

In most languages, some common data structures, like arrays or lists, are "built-in", but, under the hood, they will mostly work in the same way as any user-defined ones. To implement an arbitrary data structure, these languages provide a special mechanism called records, structs, objects, etc. The proper name for it would be "tuple". It is the data structure that consists of a number of fields each one holding either a primitive value, another tuple or a pointer to another tuple of any type. This way a tuple can represent any structure, including nested and recursive ones. In the context of type theory, such structures are called product types.

A tuple is an abstract data structure and its sole interface is the field accessor function: by name (a named tuple) or index (an anonymous tuple). It can be implemented in various ways, although a contiguous variant with constant-time access is preferred. However, in many languages, especially dynamic, programmers often use lists or dynamic arrays to create throw-away ad-hoc tuples. Python has a dedicated tuple data type, that is often for this purpose, that is a linked data structure under the hood. The following Python function will return a tuple (written in parens) of a decimal and remainder parts of the number x[1]:

def truncate(x):
dec = int(x)
rem = x - dec
return (dec, rem)

This is a simple and not very efficient way that may have its place when the number of fields is small and the lifetime of the structure is short. However, a better approach both from the point of view of efficiency and code clarity is to use a pre-defined structure. In Lisp, a tuple is called "struct" and is defined with defstruct, which uses a contiguous representation by default (although there's an option to use a linked list under-the-hood). Following is the definition of a simple pair data structure that has two fields (called "slots" in Lisp parlance): left and right.

(defstruct pair
left right)

The defstruct macro, in fact, generates several definitions: of the struct type, its constructor that will be called make-pair and have 2 keyword arguments :left and :right, and field accessors pair-left and pair-right. Also, a common print-object method for structs will work for our new structure, as well as a reader-macro to restore it from the printed form. Here's how it all fits together:

CL-USER> (make-pair :left "foo" :right "bar")
#S(PAIR :LEFT "foo" :RIGHT "bar")
CL-USER> (pair-right (read-from-string (prin1-to-string *)))
"bar"

prin1-to-string and read-from-string are complimentary Lisp functions that allow to print the value in a computer-readable form (if an appropriate print-function is provided) and read it back. Good print-representations readable to both humans and, ideally, computers are very important to code transparency and should never be neglected.

There's a way to customize every part of the definition. For instance, if we plan to use pairs frequently we can leave out the pair- prefix by specifying (:conc-name nil) property. Here is an improved pair definition and shorthand constructor for it from RUTILS, which we'll use throughout the book. It uses :type list allocation to integrate with destructuring macros.

(defstruct (pair (:type list) (:conc-name nil))
"A generic pair with left (LT) and right (RT) elements."
lt rt)

(defun pair (x y)
"A shortcut to make a pair of X and Y."
(make-pair :lt x :rt y))

Passing Data Structures in Function Calls

One final remark. There are two ways to use data structures with functions: either pass them directly via copying appropriate memory areas (call-by-value) — an approach, usually, applied to primitive types — or pass a pointer (call-by-reference). In the first case, there's no way to modify the contents of the original structure in the called function, while in the second variant it is possible, so the risk of unwarranted change should be taken into account. The usual way to handle it is by making a copy before invoking any changes, although, sometimes, mutation of the original data structure may be intended so a copy is not needed. Obviously, the call-by-reference approach is more general, because it allows both modification and copying, and more efficient because copying is on-demand. That's why it is the default way to handle structures (and objects) in most programming languages. In a low-level language like C, however, both variants are supported. Moreover, in C++ the pass-by-reference has two kinds: pass the pointer and pass what's actually called a reference, which is syntax sugar over pointers that allows accessing the argument with non-pointer syntax (dot instead of arrow) and adds a couple of restrictions. But the general idea, regardless of the idiosyncrasies of particular languages, remains the same.

Structs in Action: Union-Find

Data structures come in various shapes and flavors. Here, I'd like to mention one peculiar and interesting example that is both a data structure and an algorithm, to some extent. Even the name speaks about certain operations rather than a static form. Well, most of the more advanced data structures all have this feature that they are defined not only by the shape and arrangement but also via the set of operations that are applicable. Union-Find is a family of data-structure-algorithms that can be used for efficient determination of set membership in sets that change over time. They may be used for finding the disjoint parts in networks, detection of cycles in graphs, finding the minimum spanning tree and so forth. One practical example of such problems is automatic image segmentation: separating different parts of an image, a car from the background or a cancer cell from a normal one.

Let's consider the following problem: how to determine if two points of the graph have a path between them? Given that a graph is a set of points (vertices) and edges between some of the pairs of these points. A path in the graph is a sequence of points leading from source to destination with each pair having an edge that connects them. If some path between two points exists they belong to the same component if it doesn't — to two disjoint ones.


A graph with 3 disjoint components

For two arbitrary points, how to determine if they have a connecting path? The naive implementation may take one of them and start building all the possible paths (this may be done in breadth-first or depth-first manner, or even randomly). Anyway, such procedure will, generally, require a number of steps proportional to the number of vertices of the graph. Can we do better? This is a usual question that leads to the creation of more efficient algorithms.

Union-Find approach is based on a simple idea: when adding the items record the id of the component they belong to. But how to determine this id? Use the id associated with some point already in this subset or the current point's id if the point is in a subset of its own. And what if we have the subsets already formed? No problem, we can simulate the addition process by iterating over each vertex and taking the id of an arbitrary point it's connected to as the subset's id. Below is the implementation of this approach (to simplify the code, we'll use the pointers to `point` structs instead of ids, but, conceptually, it's the same idea):

(defstruct point
parent) ; if the parent is null the point is the root

(defun uf-union (point1 point2)
"Join the subsets of POINT1 and POINT2."
(:= (point-parent point1) (or (point-parent point2)
point2)))

(defun uf-find (point)
"Determine the id of the subset that a POINT belongs to."
(let ((parent (point-parent point)))
(if parent
(uf-find parent)
point)))

Just calling (make-point) will add a new subset with a single item in it to our set.

Note that uf-find uses recursion to find the root of the subset, i.e. the point that was added first. So, for each vertex, we store some intermediary data and, to get the subset id, each time, we'll have to perform additional calculations. This way, we managed to reduce the average-case find time, but, still, haven't completely excluded the possibility of it requiring traversal of every element of the set. Such so-called degraded case may manifest when each item is added referencing the previously added one. I.e. there will be a single subset with a chain of its members connected to the next one like this: a -> b -> c -> d. If we call uf-find on a it will have to enumerate all of the set's elements.

Yet, there is a way to improve uf-find behavior: by compressing the tree depth to make all points along the path to the root point to it, i.e squashing each chain into a wide shallow tree of depth 1.

  d
^ ^ ^
| | |
a b c

Unfortunately, we can't do that, at once, for the whole subset, but, during each run of uf-find, we can compress one path, which will also shorten all the paths in the subtree that is rooted in the points on it! Still, this cannot guarantee that there will not be a sequence of enough unions to grow the trees faster than finds can flatten them. But there's another tweak that, combined with path compression, allows to ensure sublinear (actually, almost constant) time of both operations: keep track of the size of all trees and link the smaller tree below the larger one. This will ensure that all trees' heights will stay below (log n). The rigorous proof of that is quite complex, although, intuitively, we can see the tendency by looking at the base case: if we add a 2-element tree and a 1-element one we'll still get the tree of the height 2.

Here is the implementation of the optimized version:

(defstruct point
parent
(size 1))

(defun uf-find (point)
(let ((parent (point-parent point)))
(if parent
;; here, we use the fact that the assignment will also return
;; the value to perform both path compression and find
(:= (point-parent point) (uf-find parent))
point)))

(defun uf-union (point1 point2)
(with ((root1 (uf-find point1))
(root2 (uf-find point2))
(major minor (if (> (point-size root1)
(point-size root2))
(values root1 root2)
(values root2 root1))))
(:+ (point-size major) (point-size minor))
(:= (point-parent minor) major)))

Here, Lisp multiple values come handy, to simplify the code. See the footnote [1] for more details about them.

The suggested approach is quite simple in implementation but complex in complexity analysis. So, I'll have to give just the final result: m union/find operations, with tree weighting and path compression, on a set of n objects will work in O((m + n) log* n) (where log* is iterated logarithm — a very slowly increasing function, that can be considered a constant, for practical purposes).

Finally, this is how to check if none of the points belong to the same subset in almost O(n) where n is the number of points to check[2], so in O(1) for 2 points:

(defun uf-disjoint (points)
"Return true if all of the POINTS belong to different subsets."
(let (roots)
(dolist (point points)
(let ((root (uf-find point)))
(when (member root roots)
(return-from uf-disjoint nil))
(push root roots))))
t)

A couple more observations may be drawn from this simple example:

  1. Not always the clever idea that we, initially, have works flawlessly at once. It is important to check the edge cases for potential problems.
  2. We've seen an example of a data structre that, directly, doesn't exist: pieces of information are distributed over individual data points. Sometimes, there's a choice between storing the information, in a centralized way, in a dedicated structure like a hash-table and distributing it over individual nodes. The latter approach is often more elegant and efficient, although it's not so obvious.

Footnotes:

[1] Moreover, Python has special syntax for destructuring such tuples: dec, rem = truncate(3.14). However, this is not the optimal way to handle returning the primary and one or more secondary values from a function. Lisp provides a more elegant solution called multiple values: all the necessary values are returned via the values form: (values dec rem) and can be retrieved with (multiple-value-bind (dec rem) (truncate 3.14) ...) or (with ((dec rem (truncate 3.14))) ...). It is more elegant because secondary values may be discarded at will by calling the function in a usual way: (+ 1 (truncate 3.14)) => 4 — not possible in Python, because you can't sum a tuple with something.

[2] Actually, the complexity here is O(n^2) due to the use of the function member that performs set membership test in O(n), but it's not essential to the general idea. If uf-disjoint is expected to be called with tens or hundreds of points the roots structure has to be changed to a hash-set that has a O(1) membership operation.

Lispers.deLisp-Meetup in Hamburg on Monday, 5th August 2019

· 79 days ago

We meet at Ristorante Opera, Dammtorstraße 7, Hamburg, starting around 19:00 CEST on 5th August 2019.

This is an informal gathering of Lispers of all experience levels.

Christophe Rhodesholiday hacking swankr

· 80 days ago

I'm on holiday! And holidays, as seems to be my usual, involve trains!

I was on a train yesterday, wondering how I could possibly fill two hours of leisure time (it's more straightforward these days than a few years ago, when my fellow travellers were less able to occupy their leisure time with books), when help came from a thoroughly unexpected quarter: I got a bug report for swankr.

I wrote swankr nine years ago, mostly at ISMIR2010. I was about to start the Teclo phase of my ongoing adventures, and an academic colleague had recommended that I learn R; and the moment where it all fell into place was when I discovered that R supported handlers and restarts: enough that writing support for slime's SLDB was a lower-effort endeavour than learning R's default debugger. I implemented support for the bits of slime that I use, and also for displaying images in the REPL -- so that I can present lattice objects as thumbnails of the rendered graph, and have the canned demo of adding two of them together to get a combined graph: generates "oo" sounds at every outing, guaranteed! (Now that I mostly use ggplot2, I need to find a new demo: incrementally building and styling a plot by adding theme, scale, legend objects and so on is actually useful but lacks a wow factor.)

SLIME works by exchanging messages between the emacs front-end and the backend lisp; code on the backend lisp is responsible for parsing the messages, executing whatever is needed to support the request, and sending a response. The messages are, broadly, sexp-based, and the message parsing and executing in the backends is mostly portable Common Lisp, delegating to implementation-specific code for specific implementation-specific support needed.

"Mostly portable Common Lisp" doesn't mean that it'll run in R. The R code is necessarily completely separate, implementing just enough of a Lisp reader to parse the messages. This works fine, because the messages themselves are simple; when the front-end sends a message asking for evaluation for the listener, say, the message is in the form of a sexp, but the form to evaluate is a string of the user-provided text: so as long as we have enough of a sexp reader/evaluator to deal with the SLIME protocol messages, the rest will be handled by the backend's eval function.

... except in some cases. When slime sends a form to be evaluated which contains an embedded presentation object, the presentation is replaced by #.(swank:lookup-presented-object-or-lose 57.) in the string sent for evaluation. This works fine for Common Lisp backends - at least provided *read-eval* is true - but doesn't work elsewhere. One regexp allows us to preprocess the string to evaluate to rewrite this into something else, but what? Enter cunning plan number 1: (ab)use backquote.

Backquote? Yes, R has backquote bquote. It also has moderately crazy function call semantics, so it's possible to: rewrite the string ob be evaluated to contain unquotations; parse the string into a form; funcall the bquote function on that form (effectively performing the unquotations, simulating the read-time evaluation), and then eval the result. And this would be marvellous except that Philipp Marek discovered that his own bquote-using code didn't work. Some investigation later, and the root cause became apparent:

CL-USER> (progn (set 'a 3) (eval (read-from-string "``,a")))
`,a

compare

R> a <- 3; bquote(bquote(.(a)))
bquote(3)

NO NESTED BACKQUOTES?

So, scratch the do.call("bquote", ...) plan. What else is available to us? It turns out that there's a substitute function, which for substitute(expr, env)

returns the parse tree for the (unevaluated) expression 'expr', substituting any variables bound in 'env'.

Great! Because that means we can use the regular expression to rewrite the presentation lookup function calls to be specific variable references, then substitute these variable references from the id-to-object environment that we have anyway, and Bob is your metaphorical uncle. So I did that.

The situation is not perfect, unfortunately. Here's some more of the documentation for substitute:

'substitute' works on a purely lexical basis. There is no guarantee that the resulting expression makes any sense.

... and here's some more from do.call:

The behavior of some functions, such as 'substitute', will not be the same for functions evaluated using 'do.call' as if they were evaluated from the interpreter. The precise semantics are currently undefined and subject to change.

Well that's not ominous at all.

Vsevolod DyomkinProgramming Algorithms: A Crash Course in Lisp

· 86 days ago

The introductory post for this book, unexpectedly, received quite a lot of attention, which is nice since it prompted some questions, and one of them I planned to address in this chapter.

I expect that there will be two main audiences, for this book:

  • people who'd like to advance in algorithms and writing efficient programs — the major group
  • lispers, either accomplished or aspiring who also happen to be interested in algorithms

This introductory chapter is, primarily, for the first group. After reading it, the rest of the book's Lisp code should become understandable to you. Besides, you'll know the basics to run Lisp and experiment with it if will you so desire.

For the lispers, I have one comment and one remark. You might be interested to read this part just to understand my approach of utilizing the language throughout the book. Also, you'll find my stance regarding the question that was voiced several times in the comments: whether it's justified to use some 3rd-party extensions and to what extent or should the author vigilantly stick to only the tools provided by the standard.

The Core of Lisp

To effortlessly understand Lisp, you'll have to forget, for a moment, any concepts of how programming languages should work that you might have acquired from your prior experience in coding. Lisp is simpler; and when people bring their Java, C or Python approaches to programming with it, first of all, the results are suboptimal in terms of code quality (simplicity, clarity, and beauty), and, what's more important, there's much less satisfaction from the process, not to mention very few insights and little new knowledge gained.

It is much easier to explain Lisp if we begin from a blank slate. In essence, all there is to it is just an evaluation rule: Lisp programs consist of forms that are evaluated by the compiler. There are 3+2 ways how that can happen:

  • self-evaluation: all literal constants (like 1, "hello", etc.) are evaluated to themselves. These literal objects can be either built-in primitive types (1) or data structures ("hello")
  • symbol evaluation: separate symbols are evaluated as names of variables, functions, types or classes depending on the context. The default is variable evaluation, i.e. if we encounter a symbol foo the compiler will substitute in its place the current value associated with this variable (more on this a little bit later)
  • expression evaluation: compound expressions are formed by grouping symbols and literal objects with parenthesis. The following form (oper 1 foo) is considered a "functional" expression: the operator name is situated in the first position (head), and its arguments, if any, in the subsequent positions (rest). There are 3 ways to evaluate a functional expression:
    • there are 25 special operators that are defined in lower-level code and may be considered something like axioms of the language: they are pre-defined, always present, and immutable. Those are the building blocks, on top of which all else is constructed, and they include the sequential block operator, the conditional expression if, and the unconditional jump go, to name a few. If oper is the name of a special operator, the low-level code for this operator that deals with the arguments in its own unique way is executed
    • there's also ordinary function evaluation: if oper is a function name, first, all the arguments are evaluated with the same evaluation rule, and then the function is called with the obtained values
    • finally, there's macro evaluation. Macros provide a way to change the evaluation rule for a particular form. If oper names a macro, its code is substituted instead of our expression and then evaluated. Macros are a major topic in Lisp, and they are used to build a large part of the language, as well as provide an accessible way, for the users, to extend it. However, they are orthogonal to the subject of this book and won't be discussed in further detail here. You can delve deeper into macros in such books as On Lisp or Let Over Lambda

It's important to note that, in Lisp, there's no distinction between statements and expressions, no special keywords, no operator precedence rules, and other similar arbitrary stuff you can stumble upon in other languages. Everything is uniform; everything is an expression in a sense that it will be evaluated and return some value.

A Code Example

To sum up, let's consider an example of the evaluation of a Lisp form. The following one implements the famous binary search algorithm (that we'll discuss in more detail in one of the following chapters):

(when (> (length vec) 0)
(let ((beg 0)
(end (length vec)))
(do ()
((= beg end))
(let ((mid (floor (+ beg end) 2)))
(if (> (? vec mid) val)
(:= beg (1+ mid))
(:= end mid))))
(values beg
(? vec beg)
(= (? vec beg) val))))

It is a compound form. In it, the so-called top-level form is when, which is a macro for a one-clause conditional expression: an if with only the true-branch. First, it evaluates the expression (> (length vec) 0), which is an ordinary function for a logical operator > applied to two args: the result of obtaining the length of the contents of the variable vec and a constant 0. If the evaluation returns true, i.e. the length of vec is greater than 0, the rest of the form is evaluated in the same manner. The result of the evaluation, if nothing exceptional happens, is either false (which is called nil, in Lisp) or 3 values returned from the last form (values ...). ? is the generic access operator, which abstracts over different ways to query data structures by key. In this case, it retrieves the item from vec at the index of the second argument. Below we'll talk about other operators shown here.

But first I need to say a few words abut RUTILS. It is a 3rd-party library that provides a number of extensions to the standard Lisp syntax and its basic operators. The reason for its existence is that Lisp standard is not going to change ever, and, as eveything in this world, it has its flaws. Besides, our understanding of what's elegant and efficient code evolves over time. The great advantage of the Lisp standard, however, which counteracts the issue of its immutability, is that its authors had put into it multiple ways to modify and evolve the language at almost all levels starting from even the basic syntax. And this addresses our ultimate need, after all: we're not so interested in changing the standard as we're in changing the language. So, RUTILS is one of the ways of evolving Lisp and its purpose is to make programming in it more accessible without compromising the principles of the language. So, in this book, I will use some basic extensions from RUTILS and will explain them as needed. Surely, using 3rd-party tools is the question of preference and taste and might not be approved by some of the Lisp old-times, but no worries, in your code, you'll be able to easily swap them for your favorite alternatives.

The REPL

Lisp programs are supposed to be run not only in a one-off fashion of simple scripts, but also as live systems that operate over long periods of time experiencing change not only of their data but also code. This general way of interaction with a program is called Read-Eval-Print-Loop (REPL), which literally means that the Lisp compiler reads a form, evaluates it with the aforementioned rule, prints the results back to the user, and loops over.

REPL is the default way to interact with a Lisp program, and it is very similar to the Unix shell. When you run your Lisp (for example, by entering sbcl at the shell) you'll drop into the REPL. We'll preceede all REPL-based code interactions in the book with a REPL prompt (CL-USER> or similar). Here's an example one:

CL-USER> (print "Hello world")
"Hello world"
"Hello world"

A curious reader may be asking why "Hello world" is printed twice. It's a proof that everything is an expression in Lisp. :) The print "statement", unlike in most other languages, not only prints its argument to the console (or other output stream), but also returns it as is. This comes very handy when debugging, as you can wrap almost any form in a print not changing the flow of the program.

Obviously, if the interaction is not necessary, just the read-eval part may remain. But, what's more important, Lisp provides a way to customize every stage of the process:

  • at the read stage special syntax ("syntax sugar") may be introduced via a mechanism called reader macros
  • ordinary macros are a way to customize the eval stage
  • the print stage is conceptually the simplest one, and there's also a standard way to customize object printing via the Common Lisp Object System's (CLOS) print-object function
  • and the loop stage can be replaced by any desired program logic

Basic Expressions

The structural programming paradigm states that all programs can be expressed in terms of 3 basic constructs: sequential execution, branching, and looping. Let's see how these operators are expressed in Lisp.

Sequential Execution

The simplest program flow is sequential execution. In all imperative languages, it is what is assumed to happen if you put several forms in a row and evaluate the resulting code block. Like this:

CL-USER> (print "hello") (+ 2 2)
"hello"
4

The value returned by the last expression is dimmed the value of the whole sequence.

Here, the REPL-interaction forms an implicit unit of sequential code. However, there are many cases when we need to explicitly delimit such units. This can be done with the block operator:

CL-USER> (block test
(print "hello")
(+ 2 2))
"hello"
4

Such block has a name (in this example: test). This allows to prematurely end its execution by using an operator return-from:

CL-USER> (block test
(return-from test 0)
(print "hello")
(+ 2 2))
0

A shorthand return is used to exit from blocks with a nil name (which are implicit in most of the looping constructs we'll see further):

CL-USER> (block nil
(return 0)
(print "hello")
(+ 2 2))
0

Finally, if we don't even plan to ever prematurely return from a block, we can use the progn operator that doesn't require a name:

CL-USER> (progn
(print "hello")
(+ 2 2))
"hello"
4

Branching

Conditional expressions calculate the value of their first form and, depending on it, execute one of several alternative code paths. The basic conditional expression is if:

CL-USER> (if nil
(print "hello")
(print "world"))
"world"
"world"

As we've seen, nil is used to represent logical falsity, in Lisp. All other values are considered logically true, including the symbol T or t which directly has the meaning of truth.

And when we need to do several things at once, in one of the conditional branches, it's one of the cases when we need to use progn or block:

CL-USER> (if (+ 2 2)
(progn
(print "hello")
4)
(print "world"))
"hello"
4

However, often we don't need both branches of the expressions, i.e. we don't care what will happen if our condition doesn't hold (or holds). This is such a common case that there are special expressions for it in Lisp — when and unless:

CL-USER> (when (+ 2 2)
(print "hello")
4)
"world"
4
CL-USER> (unless (+ 2 2)
(print "hello")
4)
NIL

As you see, it's also handy because you don't have to explicitly wrap the sequential forms in a progn.

One other standard conditional expression is cond, which is used when we want to evaluate several conditions in a row:

CL-USER> (cond
((typep 4 'string)
(print "hello"))
((> 4 2)
(print "world")
nil)
(t
(print "can't get here")))
"world"
NIL

The t case is a catch-all that will trigger if none of the previous conditions worked (as its condition is always true). The above code is equivalent to the following:

(if (typep 4 'string)
(print "hello")
(if (> 4 2)
(progn
(print "world")
nil)
(print "can't get here")))

There are many more conditional expressions in Lisp, and it's very easy to define your own with macros (it's actually, how when, unless, and cond are defined), and when there arises a need to use a special one, we'll discuss its implementation.

Looping

Like with branching, Lisp has a rich set of looping constructs, and it's also easy to define new ones when necessary. This approach is different from the mainstream languages, that usually have a small number of such statements and, sometimes, provide an extension mechanism via polymorphism. And it's even considered to be a virtue justified by the idea that it's less confusing for the beginners. It makes sense to a degree. Still, in Lisp, both generic and custom approaches manage to coexist and complement each other. Yet, the tradition of defining custom control constructs is very strong. Why? One justification for this is the parallel to human languages: indeed, when and unless, as well as dotimes and loop are either directly words from the human language or are derived from natural language expressions. Our mother tongues are not so primitive and dry. The other reason is because you can™. I.e. it's so much easier to define custom syntactic extensions in Lisp than in other languages that sometimes it's just impossible to resist. :) And in many use cases they make the code much more simple and clear.

Anyway, for a complete beginner, actually, you have to know the same number of iteration constructs as in any other language. The simplest one is dotimes that iterates the counter variable a given number of times (from 0 to (- times 1)) and executes the body on each iteration. It is analogous to for (int i = 0; i < times; i++) loops found in C-like languages.

CL-USER> (dotimes (i 3)
(print i))
0
1
2
NIL

The return value is nil by default, although it may be specified in the loop header.

The most versatile (and low-level) looping construct, on the other hand, is do:

CL-USER> (do ((i 0 (1+ i))
(prompt (read-line) (read-line)))
((> i 1) i)
(print (pair i prompt))
(terpri))
foo

(0 "foo")
bar

(1 "bar")

2

do iterates a number of variables (zero or more) that are defined in the first part (here, i and prompt) until the termination condition in the second part is satisfied (here, (> i 1)), and as with dotimes (and other do-style macros) executes its body — rest of the forms (here, print and terpri, which is a shorthand for printing a newline). read-line reads from standard input until newline is encountered and 1+ returns the current value of i increased by 1.

All do-style macros (and there's quite a number of them, both built-in and provided from external libraries: dolist, dotree, do-register-groups, dolines etc.) have an optional return value. In do it follows the termination condition, here — just return the final value of i.

Besides do-style iteration, there's also a substantially different beast in CL ecosystem — the infamous loop macro. It is very versatile, although somewhat unlispy in terms of syntax and with a few surprising behaviors. But elaborating on it is beyond the scope of this book, especially since there's an excellent introduction to loop in Peter Seibel's "LOOP for Black Belts".

Many languages provide a generic looping construct that is able to iterate an arbitrary sequence, a generator and other similar-behaving things — usually, some variant of foreach. We'll return to such constructs after speaking about sequences in more detail.

And there's also an alternative iteration philosophy: the functional one, which is based on higher-order functions (map, reduce and similar) — we'll cover it in more detail in the following chapters, also.

Procedures and Variables

We have covered the 3 pillars of structural programming, but one essential, in fact, the most essential, construct still remains — variables and procedures.

What if I told you that you can perform the same computation many times, but changing some parameters... OK, OK, pathetic joke. So, procedures are the simplest way to reuse computations, and procedures accept arguments, which allows to pass values into their bodies. A procedure, in Lisp, is called lambda. You can define one like this: (lambda (x y) (+ x y)). When used, such procedure — also often called a function, although it's quite different from what we consider a mathematical function — and, in this case, it's called an anonymous function as it doesn't have any name — will produce the sum of its inputs:

CL-USER> ((lambda (x y) (+ x y)) 2 2)
4

It is quite cumbersome to refer to procedures by their full code signature, and an obvious solution is to assign names to them. A common way to do that in Lisp is via the defun macro:

CL-USER> (defun add2 (x y) (+ x y))
ADD2
CL-USER> (add2 2 2)
4

The arguments of a procedure are examples of variables. Variables are used to name memory cells whose contents are used more than once and may be changed in the process. They serve different purposes:

  • to pass data into procedures
  • as temporary placeholders for some varying data in code blocks (like loop counters)
  • as a way to store computation results for further reuse
  • to define program configuration parameters (like the OS environment variables, which can also be thought of as arguments to the main function of our program)
  • to refer to global objects that should be accessible from anywhere in the program (like *standard-output* stream)
  • and more

Can we live without variables? Theoretically, well, maybe. At least, there's the so-called point-free style of programming that strongly discourages the use of variables. But, as they say, don't try this at home (at least, until you know perfectly well what you're doing :) Can we replace variables with constants, or single-assignment variables, i.e. variables that can't change over time? Such approach is promoted by the so called purely functional languages. To a certain degree, yes. But, from the point of view of algorithms development, it makes life a lot harder by complicating many optimizations if not totally outruling them.

So, how to define variables in Lisp? You've already seen some of the variants: procedural arguments and let-bindings. Such variables are called local or lexical, in Lisp parlance. That's because they are only accessible locally throughout the execution of the code block, in which they are defined. let is a general way to introduce such local variables, which is lambda in disguise (a thin layer of syntax sugar over it):

CL-USER> (let ((x 2))
(+ x x))
4
CL-USER> ((lambda (x) (+ x x))
2)
4

While with lambda you can create a procedure in one place, possibly, assign it to a variable (that's what, in essence, defun does), and then apply many times in various places, with let you define a procedure and immediately call it, leaving no way to store it and re-apply again afterwards. That's even more anonymous than an anonymous function! Also, it requires no overhead, from the compiler. But the mechanism is the same.

Creating variables via let is called binding, because they are immediately assigned (bound with) values. It is possible to bind several variables at once:

CL-USER> (let ((x 2)
(y 2))
(+ x y))
4

However, often we want to define a row of variables with next ones using the previous ones' values. It is cumbersome to do with let, because you need nesting (as procedural arguments are assigned independently):

(let ((len (length list)))
(let ((mid (floor len 2)))
(let ((left-part (subseq list 0 mid))
(right-part (subseq list mid)))
...)))

To simplify this use case, there's let*:

(let* ((len (length list))
(mid (floor len 2))
(left-part (subseq list 0 mid))
(right-part (subseq list mid)))
...)

However, there are many other ways to define variables: bind multiple values at once; perform the so called "destructuring" binding when the contents of a data structure (usually, a list) are assigned to several variables, first element to the first variable, second to the second, and so on; access the slots of a certain structure etc. For such use cases, there's with binding from RUTILS, which works like let* with extra powers. Here's a very simple example:

(with ((len (length list))
(mid rem (floor len 2))
;; this group produces a list of 2 sublists
;; that are bound to left-part and right-part
;; and ; character starts a comment in lisp
((left-part right-part) (group mid list)))
...

In the code throughout this book, you'll only see these two binding constructs: let for trivial and parallel bindings and with for all the rest.

As we said, variables may not only be defined, or they'd be called "constants", instead, but also modified. To alter the variable's value we'll use := from RUTILS (it is an abbreviation of the standard psetf macro):

CL-USER> (let ((x 2))
(print (+ x x))
(:= x 4)
(+ x x))
4
8

Modification, generally, is a dangerous construct as it can create unexpected action-at-a-distance effects, when changing the value of a variable in one place of the code effects the execution of a different part that uses the same variable. This, however, can't happen with lexical variables: each let creates its own scope that shields the previous values from modification (just like passing arguments to a procedure call and modifying them within the call doesn't alter those values, in the calling code):

CL-USER> (let ((x 2))
(print (+ x x))
(let ((x 4))
(print (+ x x)))
(print (+ x x)))
4
8
4

Obviously, when you have two lets in different places using the same variable name they don't affect each other and these two variables are, actually, totally distinct.

Yet, sometimes it is useful to modify a variable in one place and see the effect in another. The variables, which have such behavior, are called global or dynamic (and also special, in Lisp jargon). They have several important purposes. One is defining important configuration parameters that need to be accessible anywhere. The other is referencing general-purpose singleton objects like the standard streams or the state of the random number generator. Yet another is pointing to some context that can be altered in certain places subject to the needs of a particular procedure (for instance, the *package* global variable determines in what package we operate — CL-USER in all previous examples). More advanced uses for global variables also exist. The common way to define a global variable is with defparameter, which specifies its initial value:

(defparameter *connection* nil
"A default connection object.") ; this is a docstring describing the variable

Global variables, in Lisp, usually have so-called "earmuffs" around their names to remind the user of what they are dealing with. Due to their action-at-a-distance feature, it is not the safest programming language feature, and even a "global variables considered harmful" mantra exists. Lisp is, however, not one of those squeamish languages, and it finds many uses for special variables. By the way, they are called "special" due to a special feature, which greatly broadens the possibilities for their sane usage: if bound in let they act as lexical variables, i.e. the previous value is preserved and restored upon leaving the body of a let:

CL-USER> (defparameter *temp* 1)
*TEMP*
CL-USER> (print *temp*)
1
CL-USER> (progn
(let ((*temp* 2))
(print *temp*)
(:= *temp* 4)
(print *temp*))
*temp*)
2
4
1

Procedures in Lisp are first-class objects. This means the one you can assign to a variable, as well as inspect and redefine at run-time, and, consequently, do many other useful things with. The RUTILS function call[1] will call a procedure passed to it as an argument:

CL-USER> (call 'add2 2 2)
4
CL-USER> (let ((add2 (lambda (x y) (+ x y))))
(call add2 2 2))
4

In fact, defining a function with defun also creates a global variable, although in the function namespace. Functions, types, classes — all of these objects are usually defined as global. Though, for functions there's a way to define them locally with flet:

CL-USER> (foo 1)
;; ERROR: The function COMMON-LISP-USER::FOO is undefined.
CL-USER> (flet ((foo (x) (1+ x)))
(foo 1))
2
CL-USER> (foo 1)
;; ERROR: The function COMMON-LISP-USER::FOO is undefined.

Comments

Finally, there's one more syntax we need to know: how to put comments in the code. Only losers don't comment their code, and comments will be used extensively, throughout this book, to explain some parts of the code examples, inside of them. Comments, in Lisp, start with a ; character and end at the end of a line. So, the following snippet is a comment: ; this is a comment. There's also a common style of commenting, when short comments that follow the current line of code start with a single ;, longer comments for a certain code block precede it, occupy the whole line or a number of lines and start with ;;, comments for code section that include several Lisp top-level forms (global definitions) start with ;;; and also occupy whole lines. Besides, each global definition can have a special comment-like string, called the "docstring", that is intended to describe its purpose and usage, and that can be queried programmatically. To put it all together, this is how different comments may look like:

;;; Some code section

(defun this ()
"This has a curious docstring."
...)

(defun that ()
...
;; this is an interesting block don't you find?
(block interesting
(print "hello"))) ; it prints hello

Getting Started

I strongly encourage you to play around with the code presented in the following chapters of the book. Try to improve it, find issues with it, and come up with fixes, measure and trace everything. This will not only help you master some Lisp, but also understand much deeper the descriptions of the discussed algorithms and data structures, their pitfalls and corner cases. Doing that is, in fact, quite easy. All you need is install some Lisp (preferrably, SBCL or CCL), add Quicklisp, and, with its help, RUTILS.

As I said above, the usual way to work with Lisp is interacting with its REPL. Running the REPL is fairly straightforward. On my Mint Linux I'd run the following commands:

$ apt-get install sbcl rlwrap
...
$ rlwrap sbcl
...
* (print "hello world")

"hello world"
"hello world"
*

* is the Lisp raw prompt. It's, basically, the same as CL-USER> prompt you'll see in SLIME. You can also run a Lisp script file: sbcl --script hello.lisp. If it contains just a single (print "hello world") line we'll see the "hello world" phrase printed to the console.

This is a working, but not the most convenient setup. A much more advanced environment is SLIME that works inside Emacs (a similar project for vim is called SLIMV). There exists a number of other solutions: some Lisp implementations provide and IDE, some IDEs and editors provide integration.

After getting into the REPL, you'll have to issue the following commands:

* (ql:quickload :rutilsx)
* (use-package :rutilsx)
* (named-readtables:in-readtable rutilsx-readtable)

Well, that's enough Lisp you'll need to know, to start. We'll get acquainted with other Lisp concepts as they will become needed for the next chapters of this book. Yet, you're all set to read and write Lisp programs. They may seem unfamiliar, at first, but as you overcome the initial bump and get used to their paranthesised prefix surface syntax, I promise that you'll be able to recognize and appreciate their clarity and conciseness.

So, as they say in Lisp land, happy hacking!


Footnotes:

[1] call is the RUTILS abbreviation of the standard funcall. It was surely fun to be able to call a function from a variable back in the 60's, but now it has become so much more common that there's no need for the prefix ;)

Lispers.deBerlin Lispers Meetup, Monday, 29th July 2019

· 89 days ago

We meet again on Monday 8pm, 29th July. Our host this time is James Anderson (www.dydra.com).

Berlin Lispers is about all flavors of Lisp including Clojure, Scheme and Common Lisp.

We will have two topics this time.

Christian will talk about his time in Sierra Leone during the Ebola outbreak 2014 and where he used Common Lisp for his work. Another topic will be Lisp jobs.

We meet in the Taut-Haus at Engeldamm 70 in Berlin-Mitte, the bell is "James Anderson". It is located in 10min walking distance from U Moritzplatz or U Kottbusser Tor or Ostbahnhof. In case of questions call Christian +49 1578 70 51 61 4.

Vsevolod Dyomkin"Programming Algorithms" Book

· 93 days ago

Drago — a nice example of a real-world binary tree

I'm writing a book about algorithms and Lisp. It, actually, started several years ago, but as I experience a constant shortage of quality time to devote to such side activities, short periods of writing alternated with long pauses. Now, I'm, finally, at the stage when I can start publishing it. But I intend to do that, first, gradually in this blog and then put the final version — hopefully, improved and polished thanks to the comments of the first readers — on Leanpub. The book will be freely available with a CC BY-NC-ND license.

The book will have 16 chapters grouped into 3 parts: essential data structures, derivative ones, and advanced algorithms. I plan to publish each one approximately once a week. To finish the process by the end of the year.

I hope the book turns out to be an enlightening read for those who start their career in programming or want to level up in it. At least, I tried to accumulate inside all my experience from production algorithmic development, teaching of these topics, and Lisp programming, over the last 10+ years. Below is a short preface and an introductory chapter about Complexity.

Why Algorithms Matter

In our industry, currently, there seems to prevail a certain misunderstanding of the importance of algorithms for the working programmer. There's often a disconnect between the algorithmic questions posed at the job interviews and the everyday essence of the same job. That's why opinions are voiced that you, actually, don't have to know CS to be successful in the software developer's job. That's true, you don't, but you'd better do if you want to be in the notorious top 10% programmers. For several reasons. One is that, actually, you can find room for algorithms almost at every corner of your work — provided you are aware of their existence. To put it simply, the fact that you don't know a more efficient or elegant solution to a particular programming problem doesn't make your code less crappy. The current trend in software development is that, although the hardware becomes more performant, the software becomes slower faster. There are two reasons for that, in my humble opinion:

  1. Most of the application programmers don't know the inner workings of the underlying platforms. And the number of platform layers keeps increasing.
  2. Most of the programmers also don't know enough algorithms and algorithmic development technics to squeeze the most from their code. And often this means a loss of one or more orders of magnitude of performance.

In the book, I'll address, primarily, the second issue but will also try to touch on the first whenever possible.

Besides, learning the art of solving difficult algorithmic problems trains the brain and makes it more apt to solving various other problems, in the course of your day-to-day work.

Finally, you will be speaking the same lingua franca as other advanced programmers — the tongue that transcends the mundane differences of particular programming languages. And you'll gain a more detached view of those differences, freeing your mind from the dictate of a particular set of choices exhibiting in any one of them.

One of the reasons for this gap of understanding of the value of algorithms, probably, originates from how they are usually presented in the computer science curriculum. First, it is often done in a rather theoretical or "mathematical" way with rigorous proofs and lack of connection to the real world™. Second, the audience is usually freshmen or sophomores who don't have a lot of practical programming experience and thus can't appreciate and relate how this knowledge may be applied to their own programming challenges (because they didn't have those yet) — rather, most of them are still at the level of struggling to learn well their first programming language and, in their understanding of computing, are very much tied to its choices and idiosyncrasies.

In this book, the emphasis is made on the demonstration of the use of the described data structures and algorithms in various areas of computer programming. Moreover, I anticipate that the self-selected audience will comprise programmers with some experience in the field. This makes a significant difference in the set of topics that are relevant and how they can be conveyed. Another thing that helps a lot is when the programmer has a good command of more than one programming language, especially, if the languages are from different paradigms: static and dynamic, object-oriented and functional. These factors allow bridging the gap between "theoretical" algorithms and practical coding, making the topic accessible, interesting, and inspiring.

This is one answer to a possible question: why write another book on algorithms? Indeed, there are several good textbooks and online courses on the topic, of which I'd recommend the most Steven Skienna's The Algorithm Design Manual. Yet, as I said, this book is not at all academic in presentation of the material, which is a norm for other textbooks. Except for simple arithmetic, it contains almost no "math" or proofs. And, although proper attention is devoted to algorithm complexity, it doesn't deal with theories of complexity or computation and similar scientific topics. Besides, all the algorithms and data structures come with some example practical use cases. Last, but not least, there's no book on algorithms in Lisp, and, in my opinion, it's a great topic to introduce the language. The next chapter will provide a crash course to grasp the basic ideas, and then we'll discuss various Lisp programming approaches alongside the algorithms they will be used to implement.

This is an introductory book, not a bible of algorithms. It will draw a comprehensive picture and cover all topics necessary for further advancement of your algorithms knowledge. However, it won't go too deep into the advanced topics, such as persistent or probabilistic data structures, advanced tree, graph, and optimization algorithms, as well as algorithms for particular fields, such as Machine Learning, Cryptography or Computational Geometry. All of those fields require (and usually have) separate books of their own.

A Few Words about Lisp

For a long time, I've been contemplating writing an introductory book on Lisp, but something didn't add up, I couldn't see the coherent picture, in my mind. And then I got a chance to teach algorithms with Lisp. From my point of view, it's a perfect fit for demonstrating data structures and algorithms (with a caveat that students should be willing to learn it), while discussing the practical aspects of those algorithms allows to explain the language naturally. At the same time, this topic requires almost no endeavor into the adjacent areas of programming, such as architecture and program design, integration with other systems, user interface, and use of advanced language features, such as types or macros. And that is great because those topics are overkill for an introductory text and they are also addressed nicely and in great detail elsewhere (see Practical Common Lisp and ANSI Common Lisp).

Why Lisp is great for algorithmic programs? One reason is that the language was created with such use case in mind. It has support for all the proper basic data structures, such as arrays, hash-tables, linked lists, strings, and tuples. It also has a numeric tower, which means no overflow errors and, so, a much saner math. Next, it's created for the interactive development style, so the experimentation cycle is very short, there's no compile-wait-run-revise red tape, and there are no unnecessary constraints, like the need for additional annotations (a.k.a. types), prohibition of variable mutation or other stuff like that. You just write a function in the REPL, run it and see the results. In my experience, Lisp programs look almost like pseudocode. Compared to other languages, they may be slightly more verbose at times but are much more clear, simple, and directly compatible with the algorithm's logical representation.

But why not choose a popular programming language? The short answer is that it wouldn't have been optimal. There are 4 potential mainstream languages that could be considered for this book: C++, Java, Python, and JavaScript. (Surely, there's already enough material on algorithms that uses them). The first two are statically-typed, which is, in itself, a big obstacle to using them as teaching languages. Java is also too verbose, while C++ — too low-level. These qualities don't prevent them from being used in the majority of production algorithm code, in the wild, and you'll, probably, end up dealing with such code sooner than later if not already. Besides, their standard libraries provide great examples of practical algorithm implementation. But, I believe that gaining good conceptual understanding will allow to easily adapt to one of these languages if necessary while learning them in parallel with diving into algorithms creates unnecessary complexity. Python and JS are, in many ways, the opposite choices: they are dynamic and provide some level of an interactive experience (albeit inferior compared to Lisp), but those languages are in many ways anti-algorithmic. Trying to be simple and accessible, they hide too much from the programmer and don't give enough control of the concrete data. Teaching algorithms, using their standard libraries, seems like cheating to me as their basic data structures often are not what they claim to be. Lisp is in the middle: it is both highly interactive and gives enough control of the environment, while not being too verbose and demanding. And the price to pay — the unfamiliar syntax — is really small, in my humble opinion.

Yet, there's another language that is rapidly gaining popularity and is considered by many to be a good choice for algorithmic development — Rust. It's also a static language, although not so ceremonial as Java or C++. However, neither am I an expert in Rust, nor intend to become one. Moreover, I think the same general considerations regarding static languages apply to it.

Algorithmic Complexity

Complexity is a point that will be mentioned literally on every page of this book; the discussion of any algorithm or data structure can't avoid this topic. After correctness, it is the second most important quality of every algorithm — moreover, often correctness alone doesn't matter if complexity is neglected, while the opposite is possible: to compromise correctness somewhat in order to get significantly better complexity. By and large, algorithm theory differs from other subjects of CS in that it concerns not about presenting a working (correct) way to solve some problem but about finding an efficient way to do it. Where efficiency is understood as the minimal (or admissible) number of operations performed and occupied memory space.

In principle, the complexity of an algorithm is the dependence of the number of operations that will be performed on the size of the input. It is crucial to the computer system's scalability: it may be easy to solve the programming problem for a particular set of inputs, but how will the solution behave if the input is doubled, increased tenfold or million-fold? This is not a theoretical question, and an analysis of any general-purpose algorithm should have a clear answer to it.

Complexity is a substantial research topic: a whole separate branch of CS — Complexity Theory — exists to study it. Yet, throughout the book, we'll try to utilize the end results of such research without delving deep into rigorous proofs or complex math, especially since, in most of the cases, measuring complexity is a matter of simple counting. Let's look at the following illustrative example:


(defun mat-max (mat)
(let (max)
(dotimes (i (array-dimension mat 0))
(dotimes (j (array-dimension mat 1))
(when (or (null max)
(> (aref mat i j) max))
(:= max (aref mat i j)))))
max))

This function finds the maximum element of a two-dimensional array (matrix):


CL-USER> (mat-max #2A((1 2 3) (4 5 6)))
6

What's its complexity? To answer, we can just count the number of operations performed: at each iteration of the inner loop, there are 2 comparisons involving 1 array access, and, sometimes, if the planets align we perform another access for assignment. The inner loop is executed (array-dimension mat 1) times (let's call it m where m=3), and the outer one — (array-dimension mat 0) (n=2, in the example). If we sum this all up we'll get: n * m * 4 as an upper limit, for the worst case when each sequent array element is larger then the previous. As a rule of thumb, each loop adds multiplication to the formula, and each sequential block adds a plus sign.

In this calculation, there are two variables (array dimensions n and m) and one constant (the number of operations performed for each array element). There exists a special notation — Big-O — used to simplify the representation of end results of such complexity arithmetic. In it, all constants are reduced to 1, and thus m * 1 becomes just m, and also since we don't care about individual array dimension differences we can just put n * n instead of n * m. With such simplification, we can write down the final complexity result for this function: O(n^2). In other words, our algorithm has quadratic complexity (which happens to be a variant of a broader class called "polynomial complexity") in array dimensions. It means that by increasing the dimensions of our matrix ten times, we'll increase the number of operations of the algorithm 100 times. In this case, however, it may be more natural to be concerned with the dependence of the number of operations on the number of elements of the matrix, not its dimensions. We can observe that n^2 is the actual number of elements, so it can also be written as just n — if by `n` we mean the number of elements, and then the complexity is linear in the number of elements (O(n)). As you see, it is crucial to understand what `n` we are talking about!

There are just a few more things to know about Big-O complexity before we can start using it to analyze our algorithms.

1. There are 6 major complexity classes of algorithms:

  • constant-time (O(1))
  • sublinear (usually, logarithmic — O(log n))
  • linear (O(n)) and superlinear (O(n * log n))
  • higher-order polynomial (O(n^c), where c is some constant greater than 1)
  • exponential (O(с^n), where с is usually 2 but, at least, greater than 1)
  • and just plain lunatic complex (O(n!) and so forth) — I call them O(mg), jokingly

Each class is a step-function change in performance, especially, at scale. We'll talk about each one of them as we'll be discussing the particular examples of algorithms falling into it.

2. Worst-case vs. average-case behavior. In this example, we saw that there may be two counts of operations: for the average case, we can assume that approximately half of the iterations will require assignment (which results in 3,5 operations in each inner loop), and, for the worst case, the number will be exactly 4. As Big-O reduces all numbers to 1, for this example, the difference is irrelevant, but there may be others, for which it is much more drastic and can't be discarded. Usually, for such algorithms, both complexities should be mentioned (alongside with ways to avoid worst-case scenarios): a good example is quicksort algorithm described in the subsequent chapter.

3. We have also seen the so-called "constant factors hidden by the Big-O notation". I.e., from the point of view of algorithm complexity, it doesn't matter if we need to perform 3 operations in the inner loop or 30. Yet, it is quite important in practice, and we'll also discuss it below when examining binary search. Even more, some algorithms with better theoretical complexity may be worse in many practical applications due to these hidden factors (for example, until the dataset reaches a certain size).

4. Finally, besides execution time complexity, there's also space complexity, which instead of the number of operations measures the amount of storage space used proportional to the size of the input. In general, similar approaches are applied to its estimation.


Quicklisp newsaserve renaming breaks stuff

· 96 days ago
Recently Portable AllegroServe renamed its main system from "aserve" to "paserve". This breaks a bunch of stuff, including parts of aserve itself. Here are some other things:

Other failures are available in a daily build failure report

Does anyone have any interest in getting these projects building with the latest updates to Portable AllegroServe?

update This has been resolved, for now, by reverting the change.

Nicolas HafnerA Month of Daily Gamedev - Confession 86

· 99 days ago

header
That didn't feel as long as it was. A month ago I promised to do daily streams of game development. So far I've held true to that. I'm sure I won't be able to hold that true forever, but I'll try to keep going for as long as I can. In this one month alone, a lot has changed for the game though!

A new architecture for map organisation was implemented, including a new save file format for that. As part of that work I also completely rewrote the tilemap rendering as well. The game has a lighting system now, too, based on signed distance functions to compute precise light areas. In terms of physics, collision detection was revamped to properly support slopes and moving platforms, giving much more freedom for level design.

All of the art assets that existed previously were also dropped and replaced with new ones. I'm still working on that part, since I'm not quite happy with the current set of tiles. I'll also have to add more animations, and of course repeat the animation and character design work for any NPC I might add to the game. That's a bit of a ways out though, as I'll need to think about world building and story writing first before I can really get into that. Now there's a hard challenge!

Finally, in the last week I designed a new language for writing dialogue with branching, choices, looping, and so forth. To support this I extended the syntax of Markless, and added a compiler to transform the Markless AST into a simple assembly language, which is then executed in a simple, suspendable VM. Added on to that there's now a quest system that should be general enough to allow writing any kind of quest I'll need. Currently though it's lacking a way to conveniently write these quests, so that's a task to work on in the near future.

I intend on writing a simple UI to create and edit these quests. I'm not sure yet what I'll use for that, though. I'm most well-versed with Qt, but maybe I should finally cave in and give CLIM a shot. Or perhaps LTK. We'll have to see.

There's still about two months left in my summer break before university resumes. I hope to keep going with this until then, so expect further daily streams and more progress! As before, the stream still happens every day at 20:00 CEST, on https://stream.shinmera.com, or https://twitch.tv/shinmera. I've tremendously appreciated all the people that stop by in chat to watch, and even talk with me during work. It's made things so much more enjoyable for me. Really, thank you so much!

Quicklisp newsJuly 2019 Quicklisp dist update now available

· 103 days ago
New projects:
  • adopt — Simple, flexible, UNIX-style option parsing. — MIT
  • bike — Common Lisp .Net Core Interop — MIT
  • binpack — Rectangle packer for sprite/texture atlases — MIT
  • cl-ipfs-api2 — Bindings for the IPFS HTTP API. — GPLv3
  • cl-keycloak — Describe cl-keycloak here — GPLv3
  • cl-lzlib — lzip (LZMA) (de)compression using binding to lzlib — GPL-3
  • cl-steamworks — Generator for the low-level steamworks bindings. — zlib
  • csv — Read CSV into lists natively. Convert CSV into lists dangerously. — GNU GPL, version 3
  • datum-comments — datum #;(comments) for common lisp — Public Domain (Unlicense)
  • fiveam-asdf — Library to integrate FiveAM testing with ASDF TEST-OP and TEST-SYSTEM — Lisp LGPL
  • lastfm — Interface for the Last.fm API (https://www.last.fm/api/) — GPLv3
  • lyrics — Song lyrics with local database — GPLv3
  • method-hooks — simple qualifiable hooks defined like methods with the option to modify the dispatch method and how dispatch happens — Mozilla Public License Version 2.0
  • origin — A native Lisp graphics math library with an emphasis on performance and correctness. — MIT
  • patchwork — A spritesheet packer for games. — MIT
  • youtube — Play youtube urls with or without video using mpv — GPLv3
  • zbucium — last.fm music player with lyrics — GPLv3
Updated projects3d-matrices3d-vectorsalexandriaalso-alsaaprilarray-operationsarray-utilsatomicsaws-sign4binfixbstcari3sceplceramiccffichirpcl+sslcl-algebraic-data-typecl-allcl-anacl-cffi-gtkcl-collidercl-db3cl-decimalscl-digikar-utilitiescl-enumerationcl-environmentscl-feedparsercl-flaccl-fondcl-formscl-fusecl-fuse-meta-fscl-gamepadcl-glfw3cl-gpiocl-hamcrestcl-inotifycl-just-getopt-parsercl-k8055cl-ledgercl-mangocl-marklesscl-mixedcl-monitorscl-mpg123cl-mpicl-ntp-clientcl-openglcl-out123cl-patternscl-pngcl-rabbitcl-random-forestcl-rdkafkacl-rulescl-smtpcl-soloudcl-spidevcl-strcl-whocl-yesqlclackcleshclipcloser-mopclsscom.clearly-useful.generic-collection-interfacecommand-line-argumentsconcrete-syntax-treeconfiguration.optionscroatoancrypto-shortcutscxml-rngdata-lensdeedsdeferreddefinitionsdissectdjuladocbrowserdocumentation-utilsdoubly-linked-listdufyeazy-projecteclectorelffind-portflac-metadataflac-parserflarefloat-featuresflowforform-fiddlefxmlgendlgeneric-clglsl-toolkitgolden-utilshalftoneharmonyhelambdaphumblericlendarincf-clinkwellironcladjsownkenzolambda-fiddlelanguage-codeslasslegitlichat-ldaplichat-protocollichat-serverliblichat-tcp-clientlichat-tcp-serverlichat-ws-serverlionchatlistopialocal-timelquerymaidenmcclimmitommapmodularizemodularize-hooksmodularize-interfacesmultilang-documentationmultiposterninevehnodguinorthnumpy-file-formatosicatoverlordoxenfurtpango-markupparachuteparsleypathname-utilspetalisppipingplokamiplumpplump-bundleplump-sexpplump-texpngloadpy4clpzmqqlotqmyndqt-libsqtoolsqtools-uiquickutilquilcqvmracerrandom-stateratifyredirect-streamregular-type-expressionremote-jsreplicrpcqrtg-mathsc-extensionsscreamersealable-metaobjectsselserapeumshadowsimple-actorssimple-inferiorssimple-tasksslimeslysnoozesoftdrinksouthstaplestatic-dispatchstudio-clientstumpwmsystem-localeterrabletootertrace-dbtriviatrivial-argumentstrivial-backtracetrivial-benchmarktrivial-bit-streamstrivial-cltl2trivial-continuationtrivial-featurestrivial-indenttrivial-main-threadtrivial-mimestrivial-monitored-threadtrivial-pooled-databasetrivial-signaltrivial-thumbnailtrivial-utilitiestrivial-variable-bindingsubiquitousumbrausocketverbosevernacularwoo.

To get this update, use (ql:update-dist "quicklisp").

Enjoy!

Lispers.deLisp-Meetup in Hamburg on Monday, 1st July 2019

· 117 days ago

We meet at Ristorante Opera, Dammtorstraße 7, Hamburg, starting around 19:00 CEST on 1st July 2019.

This is an informal gathering of Lispers of all experience levels.


For older items, see the Planet Lisp Archives.


Last updated: 2019-10-21 15:10