Planet Lisp
http://planet.lisp.org/
Lispers.de: Lisp-Meetup in Hamburg on Monday, 6th May 2019
https://www.lispers.de/#2019-05-06-Hamburg
<p class="content">We meet at Ristorante Opera, Dammtorstraße 7, Hamburg, starting around 19:00 CET on 6th May 2019.</p><p class="content">Christian was at ELS and will report, and we will talk about our attempts at a little informal language benchmark.</p><p class="content">This is an informal gathering of Lispers of all experience levels.</p><p class="content">Update: the fine folks from stk-hamburg.de will be there and talk about their Lisp-based work!</p>
Tue, 30 Apr 2019 00:00:00 GMT
Paul Khuong: Fractional Set Covering With Experts
http://www.pvk.ca/Blog/2019/04/23/fractional-set-covering-with-experts/
<p>Last winter break, I played with one of the annual
<a href="https://en.wikipedia.org/wiki/Vehicle_routing_problem">capacitated vehicle routing problem</a>
(CVRP) “Santa Claus” contests. Real world family stuff
took precedence, so, after the
obvious <a href="http://webhotel4.ruc.dk/~keld/research/LKH-3/">LKH</a>
with <a href="http://www.math.uwaterloo.ca/tsp/concorde.html">Concorde</a>
polishing for individual tours, I only had enough time for
one diversification moonshot. I decided to treat the
high level problem of assembling prefabricated routes as
a <a href="https://en.wikipedia.org/wiki/Set_cover_problem">set covering problem</a>:
I would solve the linear programming (LP) relaxation for the
min-cost set cover, and use randomised rounding to feed new starting
points to LKH. Add a lot of luck, and that might
just strike the right balance between solution quality and diversity.</p>
<p>Unsurprisingly, luck failed to show up, but I had ulterior motives:
I’m much more interested in exploring first order methods for
relaxations of combinatorial problems than in solving CVRPs. The
routes I had accumulated after a couple days turned into a
<a href="https://archive.org/details/santa-cvrp-set-cover-instance">set covering LP with 1.1M decision variables, 10K constraints, and 20M nonzeros</a>.
That’s maybe denser than most combinatorial LPs (the aspect ratio
is definitely atypical), but 0.2% non-zeros is in the right ballpark.</p>
<p>As soon as I had that fractional set cover instance, I tried to solve
it with a simplex solver. Like any good Googler, I used <a href="https://developers.google.com/optimization/lp/glop">Glop</a>... and stared at a blank terminal for more than one hour.</p>
<p>Having observed that lack of progress, I implemented the toy I really
wanted to try out: first order online “learning with experts”
(specifically, <a href="https://arxiv.org/abs/1301.0534">AdaHedge</a>) applied to
LP <em>optimisation</em>. I let this <a href="https://gist.github.com/pkhuong/c508849180c6cf612f7335933a88ffa6">not-particularly-optimised serial CL code</a>
run on my 1.6 GHz laptop for 21 hours, at which point the first
order method had found a 4.5% infeasible solution (i.e., all the
constraints were satisfied with \(\ldots \geq 0.955\) instead of
\(\ldots \geq 1\)). I left Glop running long after the contest was
over, and finally stopped it with no solution after more than 40 <em>days</em>
on my 2.9 GHz E5.</p>
<p>Given the shape of the constraint matrix, I would have loved to try an
interior point method, but all my licenses had expired, and I didn’t
want to risk OOMing my workstation. <a href="https://twitter.com/e_d_andersen">Erling Andersen</a>
was later kind enough to test Mosek’s interior point solver on it.
The runtime was much more reasonable:
<a href="https://twitter.com/e_d_andersen/status/1120579664806842368">10 minutes on 1 core, and 4 on 12 cores</a>, with the sublinear speed-up mostly caused by the serial
crossover to a simplex basis.</p>
<p>At 21 hours for a naïve implementation, the “learning with experts”
first order method isn’t practical yet, but also not obviously
uninteresting, so I’ll write it up here.</p>
<p>Using online learning algorithms for the “experts problem” (e.g.,
<a href="https://cseweb.ucsd.edu/~yfreund/papers/adaboost.pdf">Freund and Schapire’s Hedge algorithm</a>)
to solve linear programming <em>feasibility</em> is now a classic result;
<a href="https://jeremykun.com/2017/02/27/the-reasonable-effectiveness-of-the-multiplicative-weights-update-algorithm/">Jeremy Kun has a good explanation on his blog</a>. What’s
new here is:</p>
<ol>
<li>Directly solving the optimisation problem.</li>
<li>Confirming that the parameter-free nature of <a href="https://arxiv.org/abs/1301.0534">AdaHedge</a> helps.</li>
</ol>
<p>The first item is particularly important to me because it’s a simple
modification to the LP feasibility meta-algorithm, and might make the
difference between a tool that’s only suitable for theoretical
analysis and a practical approach.</p>
<p>I’ll start by reviewing the experts problem, and how LP feasibility is
usually reduced to the former problem. After that, I’ll
cast the reduction as a <a href="https://smartech.gatech.edu/bitstream/handle/1853/24230/karwan_mark_h_197612_phd_154133.pdf">surrogate relaxation</a>
method, rather than a <a href="https://en.wikipedia.org/wiki/Lagrangian_relaxation">Lagrangian relaxation</a>;
optimisation should flow naturally from that
point of view. Finally, I’ll guess why I had more success
with <a href="https://arxiv.org/abs/1301.0534">AdaHedge</a> this time than with
<a href="https://www.satyenkale.com/papers/mw-survey.pdf">Multiplicative Weight Update</a>
eight years ago.<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup></p>
<h2>The experts problem and LP feasibility</h2>
<p>I first heard about the experts problem while researching
dynamic sorted set data structures:
<a href="https://dspace.mit.edu/handle/1721.1/10639">Igal Galperin’s PhD dissertation</a>
describes <a href="http://user.it.uu.se/~arnea/abs/partb.html">scapegoat trees</a>, but is really about online learning with
experts.
<a href="https://www.satyenkale.com/papers/mw-survey.pdf">Arora, Hazan, and Kale’s 2012 survey of multiplicative weight update methods</a>
is probably a better introduction to the topic ;)</p>
<p>The experts problem comes in many variations. The simplest form
sounds like the following. Assume you’re playing a binary prediction
game over a predetermined number of turns, and have access to a fixed
finite set of experts at each turn. At the beginning of every turn,
each expert offers their binary prediction (e.g., yes it will rain
today, or it will not rain today). You then have to make a prediction
yourself, with no additional input. The actual result (e.g., it
didn’t rain today) is revealed at the end of the turn. In general,
you can’t expect to be right more often than the best expert at the
end of the game. Is there a strategy that bounds the “regret,” how
many more wrong predictions you’ll make compared to the expert(s) with
the highest number of correct predictions, and in what circumstances?</p>
<p>Amazingly enough, even with an omniscient adversary that has access to
your strategy and determines both the experts’ predictions and the
actual result at the end of each turn, a stream of random bits (hidden
from the adversary) suffices to bound our expected regret in
\(\mathcal{O}(\sqrt{T}\,\lg n)\), where \(T\) is the number of
turns and \(n\) the number of experts.</p>
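To make that bound concrete, here is a small simulation (Python, for illustration only; the function names are mine) of the exponentially weighted forecaster, Hedge, on random losses in \([0, 1]\). With the textbook tuning \(\eta = \sqrt{8 \ln(n) / T}\), the regret is guaranteed to stay under \(\sqrt{T \ln(n) / 2}\) for any loss sequence:

```python
import math
import random

def hedge(losses, eta):
    """Exponentially weighted forecaster: returns (algorithm's total loss,
    best single expert's total loss)."""
    n = len(losses[0])
    cum = [0.0] * n                      # cumulative loss per expert
    alg = 0.0
    for loss in losses:
        m = min(cum)                     # subtract the min for numerical stability
        w = [math.exp(-eta * (c - m)) for c in cum]
        z = sum(w)
        alg += sum(wi / z * li for wi, li in zip(w, loss))
        cum = [c + li for c, li in zip(cum, loss)]
    return alg, min(cum)

random.seed(42)
T, n = 10_000, 10
losses = [[random.random() for _ in range(n)] for _ in range(T)]
eta = math.sqrt(8 * math.log(n) / T)     # tuning behind the sqrt(T ln(n) / 2) bound
alg, best = hedge(losses, eta)
regret = alg - best
bound = math.sqrt(T * math.log(n) / 2)   # ~107 here; holds for any losses in [0, 1]
assert regret <= bound
```

The bound is worst-case; on benign random losses like these, the observed regret is typically well below it.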
<p>I long had trouble with that claim: it just seems too good of a magic
trick to be true. The key realisation for me was that we’re only
comparing against individual experts. If each expert is a move in a
<a href="https://www.encyclopediaofmath.org/index.php/Matrix_game">matrix game</a>,
that’s the same as claiming you’ll never do much worse than any pure
strategy. One example of a pure strategy is always playing rock in
Rock-Paper-Scissors; pure strategies are really bad! The trick is
actually in making that regret bound useful.</p>
<p>We need a more continuous version of the experts problem for LP
feasibility. We’re still playing a turn-based game, but, this time,
instead of outputting a prediction, we get to “play” a mixture of the
experts (with non-negative weights that sum to 1). At the beginning
of each turn, we describe what weight we’d like to give to each
expert (e.g., 60% rock, 40% paper, 0% scissors). The cost
(equivalently, payoff) for each expert is then revealed (e.g.,
\(\mathrm{rock} = -0.5\), \(\mathrm{paper} = 0.5\),
\(\mathrm{scissors} = 0\)), and we incur the weighted average
from our play (e.g., \(60\% \cdot -0.5 + 40\% \cdot 0.5 = -0.1\))
before playing the next round.<sup id="fnref:2"><a href="#fn:2" rel="footnote">2</a></sup> The goal is to minimise
our worst-case regret, the additive difference between the total cost
incurred by our mixtures of experts and that of the a posteriori best single
expert. In this case as well, online learning
algorithms guarantee regret in \(\mathcal{O}(\sqrt{T} \, \lg n)\).</p>
<p>This line of research is interesting because simple algorithms achieve
that bound, with explicit constant factors on the order of 1,<sup id="fnref:3"><a href="#fn:3" rel="footnote">3</a></sup>
and <a href="http://drops.dagstuhl.de/opus/volltexte/2017/7499/pdf/LIPIcs-ICALP-2017-48.pdf">those bounds are known to be non-asymptotically tight for a large class of algorithms</a>.
Like dense linear algebra or fast Fourier transforms, where algorithms
are often compared by counting individual floating point operations,
online learning has matured into such tight bounds that worst-case
regret is routinely presented without Landau notation. Advances improve
constant factors in the worst case, or adapt to easier inputs in order
to achieve “better than worst case” performance.</p>
<p>The <a href="https://jeremykun.com/2017/02/27/the-reasonable-effectiveness-of-the-multiplicative-weights-update-algorithm/">reduction below</a>
lets us take any learning algorithm with an additive regret bound,
and convert it to an algorithm with a corresponding worst-case
iteration complexity bound for \(\varepsilon\)-approximate LP feasibility.
An algorithm that promises low worst-case regret in \(\mathcal{O}(\sqrt{T})\)
gives us an algorithm that needs at most \(\mathcal{O}(1/\varepsilon\sp{2})\)
iterations to return a solution that almost satisfies every constraint in the
linear program, where each constraint is violated by \(\varepsilon\) or less (e.g.,
\(x \leq 1\) is actually \(x \leq 1 + \varepsilon\)).</p>
<p>We first split the linear program in two components, a simple domain
(e.g., the non-negative orthant or the \([0, 1]\sp{d}\) box) and the
actual linear constraints. We then map each of the latter constraints to
an expert, and use an arbitrary algorithm that solves
our continuous version of the experts problem as a black box.
At each turn, the black box will output a set of non-negative weights for
the constraints (experts). We will average the constraints using these
weights, and attempt to find a solution in the intersection of our
simple domain and the weighted average of the linear constraints.</p>
<p>Let’s use Stigler’s <a href="https://neos-guide.org/content/diet-problem">Diet Problem with three foods and two constraints</a>
as a small example, and further simplify it by disregarding the
minimum value for calories, and the maximum value for vitamin A. Our
simple domain here is at least the non-negative orthant: we can’t
ingest negative food. We’ll make things more interesting by also
making sure we don’t eat more than 10 servings of any food per day.</p>
<p>The first constraint says we mustn’t get too many calories</p>
<p>\[72 x\sb{\mathrm{corn}} + 121 x\sb{\mathrm{milk}} + 65 x\sb{\mathrm{bread}} \leq 2250,\]</p>
<p>and the second constraint (tweaked to improve this example) ensures
we get enough vitamin A</p>
<p>\[107 x\sb{\mathrm{corn}} + 400 x\sb{\mathrm{milk}} \geq 5000,\]</p>
<p>or, equivalently,</p>
<p>\[-107 x\sb{\mathrm{corn}} - 400 x\sb{\mathrm{milk}} \leq -5000.\]</p>
<p>Given weights \([¾, ¼]\), the weighted average of the two constraints is</p>
<p>\[27.25 x\sb{\mathrm{corn}} - 9.25 x\sb{\mathrm{milk}} + 48.75 x\sb{\mathrm{bread}} \leq 437.5,\]</p>
<p>where the coefficients for each variable and for the right-hand side
were averaged independently.</p>
<p>The subproblem asks us to find a feasible point in the intersection
of these two constraints:
\[27.25 x\sb{\mathrm{corn}} - 9.25 x\sb{\mathrm{milk}} + 48.75 x\sb{\mathrm{bread}} \leq 437.5,\]
\[0 \leq x\sb{\mathrm{corn}},\, x\sb{\mathrm{milk}},\, x\sb{\mathrm{bread}} \leq 10.\]</p>
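The averaging step is just a per-coefficient convex combination. A quick sketch (Python, with my own naming; the constraint data comes straight from the example above) that reproduces those numbers:

```python
# Each constraint in "<=" form: (coefficients for corn/milk/bread, right-hand side).
calories  = ([72, 121, 65], 2250)      # 72 corn + 121 milk + 65 bread <= 2250
vitamin_a = ([-107, -400, 0], -5000)   # vitamin A requirement, flipped to "<="

def average_constraints(constraints, weights):
    """Weighted average of '<=' constraints: average each coefficient and the
    right-hand side independently."""
    dim = len(constraints[0][0])
    a = [sum(w * coeffs[j] for (coeffs, _), w in zip(constraints, weights))
         for j in range(dim)]
    b = sum(w * rhs for (_, rhs), w in zip(constraints, weights))
    return a, b

a, b = average_constraints([calories, vitamin_a], [0.75, 0.25])
assert a == [27.25, -9.25, 48.75] and b == 437.5
```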
<p>Classically, we claim that this is just Lagrangian relaxation, and
find a solution to</p>
<p>\[\min 27.25 x\sb{\mathrm{corn}} - 9.25 x\sb{\mathrm{milk}} + 48.75 x\sb{\mathrm{bread}}\]
subject to
\[0 \leq x\sb{\mathrm{corn}},\, x\sb{\mathrm{milk}},\, x\sb{\mathrm{bread}} \leq 10.\]</p>
<p>In the next section, I’ll explain why I think this analogy is wrong
and worse than useless. For now, we can easily find the minimum one
variable at a time, and find the solution
\(x\sb{\mathrm{corn}} = 0\), \(x\sb{\mathrm{milk}} = 10\),
\(x\sb{\mathrm{bread}} = 0\), with objective value \(-92.5\) (which
is \(530\) less than \(437.5\)).</p>
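Minimising a linear function over a box really is a one-variable-at-a-time affair: each variable goes to whichever bound its coefficient points at. A sketch (Python, my naming) reproducing the solution above:

```python
def minimise_over_box(coeffs, lo, hi):
    """Minimise a linear function over a box: each variable goes to its lower
    bound if its coefficient is non-negative, to its upper bound otherwise."""
    x = [lo if c >= 0 else hi for c in coeffs]
    return x, sum(c * xi for c, xi in zip(coeffs, x))

x, value = minimise_over_box([27.25, -9.25, 48.75], 0, 10)
assert x == [0, 10, 0] and value == -92.5
assert 437.5 - value == 530      # the averaged constraint is satisfied by 530
```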
<p>In general, three things can happen at this point. We could discover
that the subproblem is infeasible. In that case, the original
non-relaxed linear program itself is infeasible: any solution to the
original LP satisfies all of its constraints, and thus would also
satisfy any weighted average of the same constraints. We could also
be extremely lucky and find that our optimal solution to the relaxation is
(\(\varepsilon\)-)feasible for the original linear program; we can stop
with a solution. More commonly, we have a solution that’s feasible for the
relaxation, but not for the original linear program.</p>
<p>Since that solution satisfies the weighted average constraint, the
black box’s payoff for this turn (and for every other turn) is
non-positive. In the current case, the first constraint (on calories)
is satisfied by \(1040\), while the second (on vitamin A) is
violated by \(1000\). On weighted average, the constraints are satisfied by
\(\frac{1}{4}(3 \cdot 1040 - 1000) = 530.\) Equivalently,
they’re violated by \(-530\) on average.</p>
<p>We’ll add that solution to an accumulator vector that will come in
handy later.</p>
<p>The next step is the key to the reduction: we’ll derive payoffs
(negative costs) for the black box from the solution to the last
relaxation. Each constraint (expert) has a payoff equal to its level
of violation in the relaxation’s solution. If a constraint is
strictly satisfied, the payoff is negative; for example, the constraint
on calories is satisfied by \(1040\), so its payoff this turn is
\(-1040\). The constraint on vitamin A is violated by \(1000\),
so its payoff this turn is \(1000\). Next turn, we expect the
black box to decrease the weight of the constraint on calories,
and to increase the weight of the one on vitamin A.</p>
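The payoffs are just the constraint-wise violations at the subproblem’s solution. A sketch (Python, my naming) reproducing the numbers in the last two paragraphs:

```python
constraints = [([72, 121, 65], 2250),     # calories
               ([-107, -400, 0], -5000)]  # vitamin A, in "<=" form

def payoffs(constraints, x):
    """Payoff of each constraint (expert): its violation a.x - b at the point x;
    negative when the constraint is strictly satisfied."""
    return [sum(c * xi for c, xi in zip(a, x)) - b for (a, b) in constraints]

p = payoffs(constraints, [0, 10, 0])
assert p == [-1040, 1000]                 # calories satisfied by 1040, vitamin A violated by 1000
assert 0.75 * p[0] + 0.25 * p[1] == -530  # the black box's average payoff is non-positive
```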
<p>After \(T\) turns, the total payoff for each constraint is equal to
the sum of violations by all solutions in the accumulator. Once we
divide both sides by \(T\), we find that the divided payoff for each
constraint is equal to its violation by the average of the solutions
in the accumulator. For example, if we have two solutions, one that
violates the calories constraint by \(500\) and another that
satisfies it by \(1000\) (violates it by \(-1000\)), the total
payoff for the calories constraint is \(-500\), and the average
of the two solutions does strictly satisfy the linear constraint by
\(\frac{500}{2} = 250\)!</p>
<p>We also know that we only generated feasible solutions to the relaxed
subproblem (otherwise, we’d have stopped and marked the original LP as
infeasible), so the black box’s total payoff is \(0\) or negative.</p>
<p>Finally, we assumed that the black box algorithm guarantees an additive
regret in \(\mathcal{O}(\sqrt{T}\, \lg n)\), so the black box’s payoff
of (at most) \(0\) means that any constraint’s payoff is at most
\(\mathcal{O}(\sqrt{T}\, \lg n)\). After dividing by \(T\), we obtain
a bound on the violation by the arithmetic mean of all solutions in
the accumulator: for every constraint, that violation is in
\(\mathcal{O}\left(\frac{\lg n}{\sqrt{T}}\right)\). In other words, the number
of iterations \(T\) must scale with
\(\mathcal{O}\left(\frac{\lg n}{\varepsilon\sp{2}}\right)\),
which isn’t bad when \(n\) is in the millions and
\(\varepsilon \approx 0.01\).</p>
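Putting the pieces together, here is a toy version of the whole feasibility reduction on the diet instance (a Python sketch; I use a plain multiplicative-weights rule as the black box, not the AdaHedge the post discusses later). On this tiny instance the very first subproblem solution happens to be feasible for the original LP, the “extremely lucky” case above, so the loop stops immediately:

```python
import math

CONSTRAINTS = [([72, 121, 65], 2250),     # calories, "<=" form
               ([-107, -400, 0], -5000)]  # vitamin A, flipped to "<="
LO, HI = 0.0, 10.0

def violations(x):
    return [sum(c * xi for c, xi in zip(a, x)) - b for (a, b) in CONSTRAINTS]

def feasibility_by_experts(T=100, eta=0.001):
    cum = [0.0] * len(CONSTRAINTS)        # cumulative payoff (violation) per constraint
    total, rounds = [0.0, 0.0, 0.0], 0
    for t in range(1, T + 1):
        # Black box: multiplicative weights over the cumulative payoffs.
        m = max(cum)
        w = [math.exp(eta * (c - m)) for c in cum]
        z = sum(w)
        w = [wi / z for wi in w]
        # Surrogate subproblem: weighted-average constraint, minimised over the box.
        a = [sum(wi * coeffs[j] for (coeffs, _), wi in zip(CONSTRAINTS, w))
             for j in range(3)]
        b = sum(wi * rhs for (_, rhs), wi in zip(CONSTRAINTS, w))
        x = [LO if c >= 0 else HI for c in a]
        if sum(c * xi for c, xi in zip(a, x)) > b:
            return None, t                # surrogate infeasible => original LP infeasible
        v = violations(x)
        if max(v) <= 0:
            return x, t                   # lucky: x already feasible for the original LP
        cum = [c + vi for c, vi in zip(cum, v)]
        total = [s + xi for s, xi in zip(total, x)]
        rounds = t
    return [s / rounds for s in total], rounds  # average is epsilon-feasible

x, t = feasibility_by_experts()
assert x is not None and max(violations(x)) <= 0 and t == 1
```

On harder instances the early returns never fire, and it’s the final averaged solution whose violations shrink like \(\mathcal{O}(\lg n / \sqrt{T})\).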
<p>Theoreticians find this reduction interesting because there are
concrete implementations of the black box, e.g., the
<a href="https://www.satyenkale.com/papers/mw-survey.pdf">multiplicative weight update (MWU) method</a>
with non-asymptotic bounds. For many problems, this makes it
possible to derive the exact number of iterations necessary
to find an \(\varepsilon-\)feasible fractional solution, given
\(\varepsilon\) and the instance’s size (but not the instance
itself).</p>
<p>That’s why algorithms like MWU are theoretically useful tools for
fractional approximations, when we already have subgradient methods
that only need \(\mathcal{O}\left(\frac{1}{\varepsilon}\right)\) iterations:
state-of-the-art algorithms for learning with experts offer explicit
non-asymptotic regret bounds that yield, for many problems, iteration
bounds that only depend on the instance’s size, but not its data.
While the iteration count when solving LP feasibility with MWU scales
with \(\frac{1}{\varepsilon\sp{2}}\), it is merely proportional to
\(\lg n\), the log of the number of linear constraints. That’s
attractive, compared to subgradient methods for which the iteration
count scales with \(\frac{1}{\varepsilon}\), but also scales
linearly with respect to instance-dependent values like the distance
between the initial dual solution and the optimum, or the Lipschitz
constant of the Lagrangian dual function; these values are hard to
bound, and are often proportional to the square root of the number
of constraints. Given the choice between
\(\mathcal{O}\left(\frac{\lg n}{\varepsilon\sp{2}}\right)\)
iterations with explicit constants, and a looser
\(\mathcal{O}\left(\frac{\sqrt{n}}{\varepsilon}\right)\), it’s
obvious why MWU and online learning are powerful additions to
the theory toolbox.</p>
<p>Theoreticians are otherwise not concerned with efficiency, so the
usual answer to someone asking about optimisation is to tell them they
can always reduce linear optimisation to feasibility with a binary
search on the objective value. I once made the mistake of
implementing that binary search strategy. Unsurprisingly, it
wasn’t useful. I also tried another theoretical reduction, where I
looked for a pair of primal and dual \(\varepsilon-\)feasible solutions that happened
to have the same objective value. That also failed, in a more
interesting manner: since the two solutions had to have almost the same
value, the universe spited me by sending back solutions that were
primal and dual infeasible in the worst possible way. In the end, the
second reduction generated fractional solutions that were neither
feasible nor superoptimal, which really isn’t helpful.</p>
<h2>Direct linear optimisation with experts</h2>
<p>The reduction above works for any “simple” domain, as long as it’s
convex and we can solve the subproblems, i.e., find a point in the
intersection of the simple domain and a single linear constraint or
determine that the intersection is empty.</p>
<p>The set of (super)optimal points in some initial simple domain is
still convex, so we could restrict our search to the subset of the
domain that is superoptimal for the linear program we wish to
optimise, and directly reduce optimisation to the feasibility problem
solved in the last section, without binary search.</p>
<p>That sounds silly at first: how can we find solutions that are
superoptimal when we don’t even know the optimal value?</p>
<p>Remember that the subproblems are always relaxations of the original
linear program. We can port the objective function from the original
LP over to the subproblems, and optimise the relaxations. Any
solution that’s optimal for a relaxation must have an optimal or
superoptimal value for the original LP.</p>
<p>Rather than treating the black box online solver as a generator of
<a href="https://en.wikipedia.org/wiki/Duality_(optimization)#The_strong_Lagrangian_principle:_Lagrange_duality">Lagrangian dual</a>
vectors, we’re using its weights as solutions to the
<a href="https://smartech.gatech.edu/bitstream/handle/1853/24230/karwan_mark_h_197612_phd_154133.pdf"><em>surrogate</em> relaxation dual</a>.
The latter interpretation isn’t just more powerful by handling
objective functions. It also makes more sense: the weights generated
by algorithms for the experts problem are probabilities, i.e., they’re
non-negative and sum to \(1\). That’s also what’s expected for surrogate
dual vectors, but definitely not the case for Lagrangian dual vectors,
even when restricted to \(\leq\) constraints.</p>
<p>We can do even better!</p>
<p>Unlike Lagrangian dual solvers, which only converge when fed
(approximate) subgradients and thus make us find (nearly) optimal solutions
to the relaxed subproblems, our reduction to the experts problem only
needs feasible solutions to the subproblems. That’s all we need to
guarantee an \(\varepsilon-\)feasible solution to the initial problem
in a bounded number of iterations. We also know exactly how that
\(\varepsilon-\)feasible solution is generated: it’s the arithmetic
mean of the solutions for relaxed subproblems.</p>
<p>This lets us decouple finding lower bounds from generating feasible
solutions that will, on average, \(\varepsilon-\)satisfy the
original LP. In practice, the search for an \(\varepsilon-\)feasible
solution that is also superoptimal will tend to improve the lower
bound. However, nothing forces us to evaluate lower bounds
synchronously, or to only use the experts problem solver to improve
our bounds.</p>
<p>We can find a new bound from any vector of non-negative constraint
weights: they always yield a valid surrogate relaxation. We can solve
that relaxation, and update our best bound when it’s improved. The
Diet subproblem earlier had</p>
<p>\[27.25 x\sb{\mathrm{corn}} - 9.25 x\sb{\mathrm{milk}} + 48.75 x\sb{\mathrm{bread}} \leq 437.5,\]
\[0 \leq x\sb{\mathrm{corn}},\, x\sb{\mathrm{milk}},\, x\sb{\mathrm{bread}} \leq 10.\]</p>
<p>Adding the original objective function back yields the linear program</p>
<p>\[\min 0.18 x\sb{\mathrm{corn}} + 0.23 x\sb{\mathrm{milk}} + 0.05 x\sb{\mathrm{bread}}\]
subject to
\[27.25 x\sb{\mathrm{corn}} - 9.25 x\sb{\mathrm{milk}} + 48.75 x\sb{\mathrm{bread}} \leq 437.5,\]
\[0 \leq x\sb{\mathrm{corn}},\, x\sb{\mathrm{milk}},\, x\sb{\mathrm{bread}} \leq 10,\]</p>
<p>which has a trivial optimal solution at \([0, 0, 0]\).</p>
<p>When we generate a feasible solution for the same subproblem, we can
use any valid bound on the objective value to find the most feasible
solution that is also assuredly (super)optimal. For example, if some
oracle has given us a lower bound of \(2\) for the original Diet
problem, we can solve for</p>
<p>\[\min 27.25 x\sb{\mathrm{corn}} - 9.25 x\sb{\mathrm{milk}} + 48.75 x\sb{\mathrm{bread}}\]
subject to
\[0.18 x\sb{\mathrm{corn}} + 0.23 x\sb{\mathrm{milk}} + 0.05 x\sb{\mathrm{bread}}\leq 2\]
\[0 \leq x\sb{\mathrm{corn}},\, x\sb{\mathrm{milk}},\, x\sb{\mathrm{bread}} \leq 10.\]</p>
<p>We can relax the objective value constraint further, since we know
that the final \(\varepsilon-\)feasible solution is a simple
arithmetic mean. Given the same best bound of \(2\), and, e.g., a
current average of \(3\) solutions with a value of \(1.9\), a new
solution with an objective value of \(2.3\) (more than our best
bound, so not necessarily optimal!) would yield a new average solution
with a value of \(2\), which is still (super)optimal. This means
we can solve the more relaxed subproblem</p>
<p>\[\min 27.25 x\sb{\mathrm{corn}} - 9.25 x\sb{\mathrm{milk}} + 48.75 x\sb{\mathrm{bread}}\]
subject to
\[0.18 x\sb{\mathrm{corn}} + 0.23 x\sb{\mathrm{milk}} + 0.05 x\sb{\mathrm{bread}}\leq 2.3\]
\[0 \leq x\sb{\mathrm{corn}},\, x\sb{\mathrm{milk}},\, x\sb{\mathrm{bread}} \leq 10.\]</p>
<p>Given a bound on the objective value, we swapped the constraint and
the objective; the goal is to maximise feasibility, while generating a
solution that’s “good enough” to guarantee that the average solution
is still (super)optimal.</p>
<p>For box-constrained linear programs where the box is the convex
domain, subproblems are bounded linear knapsacks, so we can simply
stop the greedy algorithm as soon as the objective value constraint is
satisfied, or when the knapsack constraint becomes active (we found a
better bound).</p>
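Here is what that looks like on the diet example (a Python sketch, with my own naming): first compute the loosest objective value a new solution may take while keeping the running average (super)optimal, then run the greedy fractional knapsack with that threshold as the budget:

```python
# Loosest objective value for the next solution: with best bound B = 2 and a
# current average of k = 3 solutions at value 1.9, the new solution may cost up
# to (k + 1) * B - k * 1.9 = 2.3 while keeping the average (super)optimal.
best_bound, k, avg_value = 2.0, 3, 1.9
threshold = (k + 1) * best_bound - k * avg_value
assert abs(threshold - 2.3) < 1e-9

def greedy_box_knapsack(surrogate, cost, budget, hi=10.0):
    """Minimise surrogate.x subject to cost.x <= budget and 0 <= x <= hi
    (cost >= 0). Greedy on surrogate reduction per unit of budget; stop as
    soon as the budget is spent."""
    x = [0.0] * len(surrogate)
    worth_buying = sorted((i for i in range(len(surrogate)) if surrogate[i] < 0),
                          key=lambda i: surrogate[i] / cost[i] if cost[i] > 0
                                        else float('-inf'))
    for i in worth_buying:
        take = hi if cost[i] == 0 else min(hi, budget / cost[i])
        x[i] = take
        budget -= cost[i] * take
        if budget <= 0:
            break                 # budget active: we also found a better bound
    return x

surrogate = [27.25, -9.25, 48.75]   # averaged "<=" constraint: maximise feasibility
cost = [0.18, 0.23, 0.05]           # original objective, now a budget constraint
x = greedy_box_knapsack(surrogate, cost, threshold)
assert x[0] == 0 and abs(x[1] - 10) < 1e-9 and x[2] == 0
```

Only milk improves the surrogate constraint, and the objective budget of \(2.3\) is exactly enough to buy all ten servings of it.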
<p>This last tweak doesn’t just accelerate convergence to
\(\varepsilon-\)feasible solutions. More importantly for me, it
pretty much guarantees that our \(\varepsilon-\)feasible solution
matches the best known lower bound, even if that bound was provided by
an outside oracle. <a href="http://www.inrialpes.fr/bipop/people/malick/Docs/05-frangioni.pdf">Bundle methods</a>
and the <a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.217.8194&rep=rep1&type=pdf">Volume algorithm</a>
can also mix solutions to relaxed subproblems in order to generate
\(\varepsilon-\)feasible solutions, but the result lacks the last
guarantee: their fractional solutions are even more superoptimal
than the best bound, and that can make bounding and variable fixing
difficult.</p>
<h2>The secret sauce: AdaHedge</h2>
<p>Before last Christmas’s CVRP set covering LP, I had always used the
<a href="https://www.satyenkale.com/papers/mw-survey.pdf">multiplicative weight update (MWU) algorithm</a>
as my black box online learning algorithm: it wasn’t great, but I
couldn’t find anything better. The two main downsides for me
were that I had to know a “width” parameter ahead of time, as well
as the number of iterations I wanted to run.</p>
<p>The width is essentially the range of the payoffs; in our case, the
potential level of violation or satisfaction of each constraint by
any solution to the relaxed subproblems. The dependence isn’t
surprising: folklore in Lagrangian relaxation also says that’s a big
factor there. The problem is that the most extreme violations and
satisfactions are initialisation parameters for the MWU algorithm,
and the iteration count for a given \(\varepsilon\) is quadratic
in the width (\(\mathrm{max\_violation} \cdot \mathrm{max\_satisfaction}\)).</p>
<p>What’s even worse is that the MWU is explicitly tuned for a specific
iteration count. If I estimate that, given my worst-case width estimate,
one million iterations will be necessary to achieve \(\varepsilon-\)feasibility,
MWU tuned for 1M iterations will need 1M iterations, even if the actual
width is narrower.</p>
<p><a href="https://arxiv.org/abs/1301.0534">de Rooij and others published AdaHedge in 2013</a>,
an algorithm that addresses both these issues by smoothly estimating
its parameter over time, without using the doubling trick.<sup id="fnref:4"><a href="#fn:4" rel="footnote">4</a></sup>
AdaHedge’s loss (convergence rate to an \(\varepsilon-\)solution)
still depends on the relaxation’s width. However, it depends on the
maximum width actually observed during the solution process, and not
on any explicit worst-case bound. It’s also not explicitly tuned for a
specific iteration count, and simply keeps improving at a rate that
roughly matches MWU. If the instance happens to be easy, we will find
an \(\varepsilon-\)feasible solution more quickly. In the worst
case, the iteration count is never much worse than that of an
optimally tuned MWU.</p>
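For reference, here is a compact sketch of AdaHedge in Python (the author’s actual implementation, linked below, is in Common Lisp; this is my reading of de Rooij et al.’s recipe, not a transcription of it): the learning rate is recomputed each round from the accumulated mixability gap, so no width estimate or horizon is ever supplied:

```python
import math

class AdaHedge:
    """Sketch of AdaHedge after de Rooij et al. (2013): Hedge whose learning
    rate is derived on the fly from the cumulative mixability gap, so it needs
    no width estimate and no fixed iteration count."""
    def __init__(self, n):
        self.n = n
        self.cum = [0.0] * n          # cumulative loss per expert
        self.gap = 0.0                # cumulative mixability gap (Delta)

    def weights(self):
        if self.gap == 0.0:           # eta = infinity: follow the leader(s)
            m = min(self.cum)
            lead = [1.0 if c == m else 0.0 for c in self.cum]
            z = sum(lead)
            return [l / z for l in lead]
        eta = math.log(self.n) / self.gap
        m = min(self.cum)
        w = [math.exp(-eta * (c - m)) for c in self.cum]
        z = sum(w)
        return [wi / z for wi in w]

    def update(self, losses):
        w = self.weights()
        h = sum(wi * li for wi, li in zip(w, losses))      # our weighted loss
        if self.gap == 0.0:
            mix = min(li for wi, li in zip(w, losses) if wi > 0)
        else:
            eta = math.log(self.n) / self.gap
            mix = -math.log(sum(wi * math.exp(-eta * li)
                                for wi, li in zip(w, losses))) / eta
        self.gap += max(0.0, h - mix)                      # the gap is non-negative
        self.cum = [c + li for c, li in zip(self.cum, losses)]
        return h

ah = AdaHedge(2)
total = sum(ah.update([0.2, 0.8]) for _ in range(100))     # expert 0 is always better
w = ah.weights()
assert abs(sum(w) - 1.0) < 1e-12 and w[0] > 0.9
assert total - 100 * 0.2 < 5.0    # regret stays small with zero tuning
```

On this easy sequence the weights lock onto the better expert within a few rounds, which is exactly the adaptive behaviour the set covering run relied on.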
<p>These <a href="https://gist.github.com/pkhuong/c508849180c6cf612f7335933a88ffa6">400 lines of Common Lisp</a>
implement AdaHedge and use it to optimise the set covering LP. AdaHedge acts
as the online black box solver for the surrogate dual problem, the relaxed
set covering LP is a linear knapsack, and each subproblem attempts
to improve the lower bound before maximising feasibility.</p>
<p>When I ran the code, I had no idea how long it would take to find a
feasible enough solution: covering constraints can never be violated
by more than \(1\), but some points could be covered by hundreds of
tours, so the worst case satisfaction width is high. I had to rely on
the way AdaHedge adapts to the actual hardness of the problem. In the
end, \(34492\) iterations sufficed to find a solution that was \(4.5\%\)
infeasible.<sup id="fnref:5"><a href="#fn:5" rel="footnote">5</a></sup> This corresponds to a worst case with a width
of less than \(2\), which is probably not what happened. It seems
more likely that the surrogate dual isn’t actually an omniscient
adversary, and AdaHedge was able to exploit some of that “easiness.”</p>
<p>The iterations themselves are also reasonable: one sparse matrix /
dense vector multiplication to convert surrogate dual weights to an
average constraint, one solve of the relaxed LP, and another sparse
matrix / dense vector multiplication to compute violations for each
constraint. The relaxed LP is a fractional \([0, 1]\) knapsack, so
the bottleneck is sorting double floats. Each iteration took 1.8
seconds on my old laptop; I’m guessing that could easily be 10-20
times faster with vectorisation and parallelisation.</p>
<p>In another post, I’ll show how using the same surrogate dual optimisation
algorithm to mimic <a href="https://link.springer.com/article/10.1007/BF02592954">Lagrangian decomposition</a>
<a href="https://perso.ensta-paristech.fr/~diam/ro/online/Monique.Guignard-top11201.pdf">instead of Lagrangian relaxation</a>
guarantees an iteration count in \(\mathcal{O}\left(\frac{\lg \#\mathrm{nonzero}}{\varepsilon\sp{2}}\right)\) independently of luck or the specific linear constraints.</p>
<div class="footnotes">
<hr />
<ol>
<li id="fn:1">
<p>Yes, I have been banging my head against that wall for a while.<a href="#fnref:1" rev="footnote">↩</a></p></li>
<li id="fn:2">
<p>This is equivalent to minimising expected loss with random bits, but cleans up the reduction.<a href="#fnref:2" rev="footnote">↩</a></p></li>
<li id="fn:3">
<p>When was the last time you had to worry whether that log was natural or base-2?<a href="#fnref:3" rev="footnote">↩</a></p></li>
<li id="fn:4">
<p>The doubling trick essentially says to start with an estimate for some parameters (e.g., width), then adjust it to at least double the expected iteration count when the parameter’s actual value exceeds the estimate. The sum telescopes and we only pay a constant multiplicative overhead for the dynamic update.<a href="#fnref:4" rev="footnote">↩</a></p></li>
<li id="fn:5">
<p>I think I computed the \(\log\) of the number of decision variables instead of the number of constraints, so maybe this could have gone a bit better.<a href="#fnref:5" rev="footnote">↩</a></p></li>
</ol>
</div>
Tue, 23 Apr 2019 22:05:09 GMT
Lispers.de: Berlin Lispers Meetup, Monday, 15th April 2019
https://www.lispers.de/#2019-04-15-Berlin
<p class="content">We meet again on Monday 8pm, 15th April. Our host this time is
James Anderson (www.dydra.com).</p><p class="content">Berlin Lispers is about all flavors of Lisp including Emacs Lisp, Common Lisp, Clojure, Scheme.</p><p class="content">We will have two talks this time.</p><p class="content">Hans Hübner will tell us about "Reanimating VAX LISP - A CLtL1
implementation for VAX/VMS".</p><p class="content">And Ingo Mohr will continue his talk
"About the Unknown East of the
Ancient LISP World. History and Thoughts. Part II: Eastern Common LISP
and a LISP Machine."</p><p class="content">We meet in the Taut-Haus at Engeldamm 70 in Berlin-Mitte, the bell is "James Anderson".
It is located in 10min walking distance from U Moritzplatz or U Kottbusser Tor or Ostbahnhof.
In case of questions call Christian +49 1578 70 51 61 4.</p>
Tue, 09 Apr 2019 12:30:00 GMT
Didier Verna: Quickref 2.0 "Be Quick or Be Dead" is released
https://www.didierverna.net/blog/index.php?post/2019/04/08/Quickref-2.0-Be-Quick-or-Be-Dead-is-released
<p>Surfing on the energizing wave of ELS 2019, the <a href="https://european-lisp-symposium.org/2019/index.html" hreflang="en" title="12th European Lisp Symposium">12th European Lisp Symposium</a>, I'm happy to announce the release of Quickref 2.0, codename "Be Quick or Be Dead".</p>
<p>The major improvement in this release, justifying an increment of the major version number (and the very appropriate codename), is the introduction of parallel algorithms for building the documentation. I presented this work last week in Genova so I won't go into the gory details here, but for the brave and impatient, let me just say that using the parallel implementation is just a matter of calling the <code>BUILD</code> function with <code>:parallel t :declt-threads x :makeinfo-threads y</code> (adjust x and y as you see fit, depending on your architecture).</p>
<p>The second featured improvement is the introduction of an author index, in addition to the original one. The author index is still a bit shaky, mostly due to technical problems (calling <code>asdf:find-system</code> almost two thousand times simply doesn't work) and also to the very creative use some library authors make of the ASDF <code>author</code> and <code>maintainer</code> slots in the system descriptions. It does, however, do a quite decent job for the majority of the authors and their libraries' reference manuals.</p>
<p>Finally, the repository now has a fully functional continuous integration infrastructure, which means that there shouldn't be any more lag between new Quicklisp (or Quickref) releases and new versions of the <a href="http://quickref.common-lisp.net" hreflang="en" title="documentation website">documentation website</a>.</p>
<p>Thanks to Antoine Hacquard, Antoine Martin, and Erik Huelsmann for their contribution to this release! A lot of new features are already in the pipe. Currently documenting 1720 libraries, and counting...</p>Mon, 08 Apr 2019 00:00:00 GMTLispers.de: Lisp-Meetup in Hamburg on Monday, 1st April 2019https://www.lispers.de/#2019-04-01-Hamburg
https://www.lispers.de/#2019-04-01-Hamburg
<p class="content">We meet at Ristorante Opera, Dammtorstraße 7, Hamburg, starting around 19:00 CET on 1st April 2019.</p><p class="content">This is an informal gathering of Lispers. Svante will talk a bit about the implementation of lispers.de. You are invited to bring your own topics.</p>Thu, 28 Mar 2019 19:17:00 GMTLispers.de: Berlin Lispers Meetup, Monday, 25th March 2019https://www.lispers.de/#2019-03-25-Berlin
https://www.lispers.de/#2019-03-25-Berlin
<p class="content">We meet again on Monday 8pm, 25th March. Our host this time is James Anderson (www.dydra.com).</p><p class="content">Berlin Lispers is about all flavors of Lisp including Common Lisp, Scheme, Dylan, Clojure.</p><p class="content">We will have a talk this time. Ingo Mohr will tell us
"About the Unknown East of the Ancient LISP World. History and Thoughts. Part I: LISP on Punchcards".</p><p class="content">We meet in the Taut-Haus at Engeldamm 70 in Berlin-Mitte, the bell is "James Anderson".
It is located in 10min walking distance from U Moritzplatz or U Kottbusser Tor or Ostbahnhof.
In case of questions call Christian +49 1578 70 51 61 4.</p>Wed, 20 Mar 2019 10:30:00 GMTQuicklisp news: March 2019 Quicklisp dist update now availablehttp://blog.quicklisp.org/2019/03/march-2019-quicklisp-dist-update-now.html
http://blog.quicklisp.org/2019/03/march-2019-quicklisp-dist-update-now.html
<b>New projects:</b><br /><ul><li><a href="https://sjl.bitbucket.io/bobbin/">bobbin</a> — Simple (word) wrapping utilities for strings. — MIT</li><li><a href="https://github.com/cmoore/cl-mango/">cl-mango</a> — A minimalist CouchDB 2.x database client. — BSD3</li><li><a href="https://sjl.bitbucket.io/cl-netpbm/">cl-netpbm</a> — Common Lisp support for reading/writing the netpbm image formats (PPM, PGM, and PBM). — MIT/X11</li><li><a href="https://github.com/asciian/cl-skkserv/">cl-skkserv</a> — skkserv with Common Lisp — GPLv3</li><li><a href="https://gitlab.com/vindarel/cl-torrents/">cl-torrents</a> — This is a little tool for the lisp REPL or the command line (also with a readline interactive prompt) to search for torrents and get magnet links — MIT</li><li><a href="https://github.com/yitzchak/common-lisp-jupyter/">common-lisp-jupyter</a> — A Common Lisp kernel for Jupyter along with a library for building Jupyter kernels. — MIT</li><li><a href="https://github.com/noloop/conf">conf</a> — Simple configuration file manipulator for projects. — GNU General Public License v3.0</li><li><a href="https://github.com/noloop/eventbus">eventbus</a> — An event bus in Common Lisp. — GPLv3</li><li><a href="https://github.com/ralph-schleicher/open-location-code/">open-location-code</a> — Open Location Code library. — Modified BSD License</li><li><a href="https://gitlab.com/ediethelm/piggyback-parameters">piggyback-parameters</a> — This is a configuration system that supports local file and database based parameter storage. — MIT</li><li><a href="https://github.com/rigetti/quilc/">quilc</a> — A CLI front-end for the Quil compiler — Apache License 2.0 (See LICENSE.txt)</li><li><a href="https://github.com/rigetti/qvm/">qvm</a> — An implementation of the Quantum Abstract Machine. — Apache License 2.0 (See LICENSE.txt)</li><li><a href="https://github.com/marcoheisig/restricted-functions/">restricted-functions</a> — Reasoning about functions with restricted argument types. 
— MIT</li><li><a href="https://github.com/noloop/simplet">simplet</a> — Simple test runner in Common Lisp. — GPLv3</li><li><a href="https://github.com/noloop/skeleton-creator">skeleton-creator</a> — Create projects from a skeleton directory. — GPLv3</li><li><a href="https://bitbucket.org/reginleif/solid-engine.git">solid-engine</a> — The Common Lisp stack-based application controller — MIT</li><li><a href="https://github.com/robert-strandh/Spell/">spell</a> — Spellchecking package for Common Lisp — BSD</li><li><a href="https://gitlab.com/ediethelm/trivial-continuation">trivial-continuation</a> — Provides an implementation of function call continuation and combination. — MIT</li><li><a href="https://gitlab.com/ediethelm/trivial-hashtable-serialize">trivial-hashtable-serialize</a> — A simple method to serialize and deserialize hash-tables. — MIT</li><li><a href="https://gitlab.com/ediethelm/trivial-json-codec">trivial-json-codec</a> — A JSON parser able to identify class hierarchies. — MIT</li><li><a href="https://gitlab.com/ediethelm/trivial-monitored-thread">trivial-monitored-thread</a> — Trivial Monitored Thread offers a very simple (aka trivial) way of spawning threads and being informed when one any of them crash and die. — MIT</li><li><a href="https://gitlab.com/ediethelm/trivial-object-lock">trivial-object-lock</a> — A simple method to lock object (and slot) access. — MIT</li><li><a href="https://gitlab.com/ediethelm/trivial-pooled-database">trivial-pooled-database</a> — A DB multi-threaded connection pool. — MIT</li><li><a href="https://gitlab.com/ediethelm/trivial-timer">trivial-timer</a> — Easy scheduling of tasks (functions). — MIT</li><li><a href="https://gitlab.com/ediethelm/trivial-variable-bindings">trivial-variable-bindings</a> — Offers a way to handle associations between a place-holder (aka. variable) and a value. — MIT</li><li><a href="https://github.com/marcoheisig/ucons/">ucons</a> — Unique conses and functions for working on them. 
— MIT</li><li><a href="https://github.com/phoe/wordnet/">wordnet</a> — Common Lisp interface to WordNet — CC-BY 4.0</li></ul><b>Updated projects</b>: <a href="http://quickdocs.org/agnostic-lizard/">agnostic-lizard</a>, <a href="http://quickdocs.org/april/">april</a>, <a href="http://quickdocs.org/big-string/">big-string</a>, <a href="http://quickdocs.org/binfix/">binfix</a>, <a href="http://quickdocs.org/cepl/">cepl</a>, <a href="http://quickdocs.org/chancery/">chancery</a>, <a href="http://quickdocs.org/chirp/">chirp</a>, <a href="http://quickdocs.org/cl+ssl/">cl+ssl</a>, <a href="http://quickdocs.org/cl-abstract-classes/">cl-abstract-classes</a>, <a href="http://quickdocs.org/cl-all/">cl-all</a>, <a href="http://quickdocs.org/cl-async/">cl-async</a>, <a href="http://quickdocs.org/cl-collider/">cl-collider</a>, <a href="http://quickdocs.org/cl-conllu/">cl-conllu</a>, <a href="http://quickdocs.org/cl-cron/">cl-cron</a>, <a href="http://quickdocs.org/cl-digraph/">cl-digraph</a>, <a href="http://quickdocs.org/cl-egl/">cl-egl</a>, <a href="http://quickdocs.org/cl-gap-buffer/">cl-gap-buffer</a>, <a href="http://quickdocs.org/cl-generator/">cl-generator</a>, <a href="http://quickdocs.org/cl-generic-arithmetic/">cl-generic-arithmetic</a>, <a href="http://quickdocs.org/cl-grace/">cl-grace</a>, <a href="http://quickdocs.org/cl-hamcrest/">cl-hamcrest</a>, <a href="http://quickdocs.org/cl-las/">cl-las</a>, <a href="http://quickdocs.org/cl-ledger/">cl-ledger</a>, <a href="http://quickdocs.org/cl-locatives/">cl-locatives</a>, <a href="http://quickdocs.org/cl-markless/">cl-markless</a>, <a href="http://quickdocs.org/cl-messagepack/">cl-messagepack</a>, <a href="http://quickdocs.org/cl-ntriples/">cl-ntriples</a>, <a href="http://quickdocs.org/cl-patterns/">cl-patterns</a>, <a href="http://quickdocs.org/cl-prevalence/">cl-prevalence</a>, <a href="http://quickdocs.org/cl-proj/">cl-proj</a>, <a href="http://quickdocs.org/cl-project/">cl-project</a>, <a 
href="http://quickdocs.org/cl-qrencode/">cl-qrencode</a>, <a href="http://quickdocs.org/cl-random-forest/">cl-random-forest</a>, <a href="http://quickdocs.org/cl-stopwatch/">cl-stopwatch</a>, <a href="http://quickdocs.org/cl-string-complete/">cl-string-complete</a>, <a href="http://quickdocs.org/cl-string-match/">cl-string-match</a>, <a href="http://quickdocs.org/cl-tcod/">cl-tcod</a>, <a href="http://quickdocs.org/cl-wayland/">cl-wayland</a>, <a href="http://quickdocs.org/clad/">clad</a>, <a href="http://quickdocs.org/clem/">clem</a>, <a href="http://quickdocs.org/clod/">clod</a>, <a href="http://quickdocs.org/closer-mop/">closer-mop</a>, <a href="http://quickdocs.org/clx-xembed/">clx-xembed</a>, <a href="http://quickdocs.org/coleslaw/">coleslaw</a>, <a href="http://quickdocs.org/common-lisp-actors/">common-lisp-actors</a>, <a href="http://quickdocs.org/croatoan/">croatoan</a>, <a href="http://quickdocs.org/dartsclhashtree/">dartsclhashtree</a>, <a href="http://quickdocs.org/data-lens/">data-lens</a>, <a href="http://quickdocs.org/defrec/">defrec</a>, <a href="http://quickdocs.org/doplus/">doplus</a>, <a href="http://quickdocs.org/doubly-linked-list/">doubly-linked-list</a>, <a href="http://quickdocs.org/dynamic-collect/">dynamic-collect</a>, <a href="http://quickdocs.org/eclector/">eclector</a>, <a href="http://quickdocs.org/escalator/">escalator</a>, <a href="http://quickdocs.org/external-program/">external-program</a>, <a href="http://quickdocs.org/fiasco/">fiasco</a>, <a href="http://quickdocs.org/flac-parser/">flac-parser</a>, <a href="http://quickdocs.org/game-math/">game-math</a>, <a href="http://quickdocs.org/gamebox-dgen/">gamebox-dgen</a>, <a href="http://quickdocs.org/gamebox-math/">gamebox-math</a>, <a href="http://quickdocs.org/gendl/">gendl</a>, <a href="http://quickdocs.org/generic-cl/">generic-cl</a>, <a href="http://quickdocs.org/genie/">genie</a>, <a href="http://quickdocs.org/golden-utils/">golden-utils</a>, <a 
href="http://quickdocs.org/helambdap/">helambdap</a>, <a href="http://quickdocs.org/interface/">interface</a>, <a href="http://quickdocs.org/ironclad/">ironclad</a>, <a href="http://quickdocs.org/jp-numeral/">jp-numeral</a>, <a href="http://quickdocs.org/json-responses/">json-responses</a>, <a href="http://quickdocs.org/l-math/">l-math</a>, <a href="http://quickdocs.org/letrec/">letrec</a>, <a href="http://quickdocs.org/lisp-chat/">lisp-chat</a>, <a href="http://quickdocs.org/listopia/">listopia</a>, <a href="http://quickdocs.org/literate-lisp/">literate-lisp</a>, <a href="http://quickdocs.org/maiden/">maiden</a>, <a href="http://quickdocs.org/map-set/">map-set</a>, <a href="http://quickdocs.org/mcclim/">mcclim</a>, <a href="http://quickdocs.org/mito/">mito</a>, <a href="http://quickdocs.org/nodgui/">nodgui</a>, <a href="http://quickdocs.org/overlord/">overlord</a>, <a href="http://quickdocs.org/parachute/">parachute</a>, <a href="http://quickdocs.org/parameterized-function/">parameterized-function</a>, <a href="http://quickdocs.org/pathname-utils/">pathname-utils</a>, <a href="http://quickdocs.org/periods/">periods</a>, <a href="http://quickdocs.org/petalisp/">petalisp</a>, <a href="http://quickdocs.org/pjlink/">pjlink</a>, <a href="http://quickdocs.org/plump/">plump</a>, <a href="http://quickdocs.org/policy-cond/">policy-cond</a>, <a href="http://quickdocs.org/portable-threads/">portable-threads</a>, <a href="http://quickdocs.org/postmodern/">postmodern</a>, <a href="http://quickdocs.org/protest/">protest</a>, <a href="http://quickdocs.org/qt-libs/">qt-libs</a>, <a href="http://quickdocs.org/qtools/">qtools</a>, <a href="http://quickdocs.org/qtools-ui/">qtools-ui</a>, <a href="http://quickdocs.org/recur/">recur</a>, <a href="http://quickdocs.org/regular-type-expression/">regular-type-expression</a>, <a href="http://quickdocs.org/rove/">rove</a>, <a href="http://quickdocs.org/serapeum/">serapeum</a>, <a href="http://quickdocs.org/shadow/">shadow</a>, <a 
href="http://quickdocs.org/simplified-types/">simplified-types</a>, <a href="http://quickdocs.org/sly/">sly</a>, <a href="http://quickdocs.org/spinneret/">spinneret</a>, <a href="http://quickdocs.org/staple/">staple</a>, <a href="http://quickdocs.org/stumpwm/">stumpwm</a>, <a href="http://quickdocs.org/sucle/">sucle</a>, <a href="http://quickdocs.org/synonyms/">synonyms</a>, <a href="http://quickdocs.org/tagger/">tagger</a>, <a href="http://quickdocs.org/template/">template</a>, <a href="http://quickdocs.org/trivia/">trivia</a>, <a href="http://quickdocs.org/trivial-battery/">trivial-battery</a>, <a href="http://quickdocs.org/trivial-benchmark/">trivial-benchmark</a>, <a href="http://quickdocs.org/trivial-signal/">trivial-signal</a>, <a href="http://quickdocs.org/trivial-utilities/">trivial-utilities</a>, <a href="http://quickdocs.org/ubiquitous/">ubiquitous</a>, <a href="http://quickdocs.org/umbra/">umbra</a>, <a href="http://quickdocs.org/usocket/">usocket</a>, <a href="http://quickdocs.org/varjo/">varjo</a>, <a href="http://quickdocs.org/vernacular/">vernacular</a>, <a href="http://quickdocs.org/with-c-syntax/">with-c-syntax</a>.<br /><br /><b>Removed projects</b>: mgl, mgl-mat.<br /><br />To get this update, use: (ql:update-dist "quicklisp")<br /><br />Enjoy!Thu, 07 Mar 2019 14:52:00 GMTLispers.de: Lisp-Meetup in Hamburg on Monday, 4th March 2019https://www.lispers.de/#2019-03-04-Hamburg
https://www.lispers.de/#2019-03-04-Hamburg
<p class="content">We meet at Ristorante Opera, Dammtorstraße 7, Hamburg, starting around 19:00 CET on 4th March 2019.</p><p class="content">This is an informal gathering of Lispers. Come as you are, bring lispy topics.</p>Wed, 27 Feb 2019 11:26:00 GMTPaul Khuong: The Unscalable, Deadlock-prone, Thread Poolhttp://www.pvk.ca/Blog/2019/02/25/the-unscalable-thread-pool/
http://www.pvk.ca/Blog/2019/02/25/the-unscalable-thread-pool/
<p><small>Epistemic Status: I’ve seen thread pools fail this way multiple
times, am confident the pool-per-state approach is an improvement, and
have confirmed with others they’ve also successfully used it in anger.
While I’ve thought about this issue several times over ~4 years and
pool-per-state seems like a good fix, I’m not convinced it’s
undominated and hope to hear about better approaches.</small></p>
<p>Thread pools tend to only offer a sparse interface:
<a href="https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Executor.submit">pass a closure or a function and its arguments to the pool</a>,
<a href="https://github.com/silentbicycle/loom/blob/master/loom.h#L46">and that function</a>
<a href="https://github.com/lmj/lparallel/blob/9c11f40018155a472c540b63684049acc9b36e15/src/kernel/core.lisp#L374">will be called, eventually</a>.<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup> Functions can do
anything, so this interface should offer all the expressive power one
could need. Experience tells me otherwise.</p>
<p>The standard pool interface is so impoverished that it is nearly
impossible to use correctly in complex programs, and leads us down
design dead-ends. I would actually argue it’s better to work with raw
threads than to even have <del>generic</del> amorphous thread pools: the former force us
to stop and think about resource requirements (and lets the OS’s real
scheduler help us along), instead of making us pretend we only care
about CPU usage. I claim thread pools aren’t scalable because, with
the exception of CPU time, they actively hinder the development of
programs that achieve high resource utilisation.</p>
<p>This post comes in two parts. First, the story of a simple program
that’s parallelised with a thread pool, then hits a wall as a wider set
of resources becomes scarce. Second, a solution I like for that kind of
program: an explicit state machine, where each state gets a dedicated
queue that is aware of the state’s resource requirements.</p>
<h2>Stages of parallelisation</h2>
<p>We start with a simple program that processes independent work units:
a serial loop that pulls in work (e.g., files in a directory) or waits
for requests on a socket, one work unit at a time.</p>
<p>At some point, there’s enough work to think about parallelisation, and
we choose threads.<sup id="fnref:2"><a href="#fn:2" rel="footnote">2</a></sup> To keep things simple, we spawn
a thread per work unit. Load increases further, and we observe that
we spend more time switching between threads or contending on shared
data than doing actual work. We could use a semaphore to limit the
number of work units we process concurrently, but we might as well
just push work units to a thread pool and recycle threads instead of
wasting resources on a thread-per-request model. We can even start
thinking about queueing disciplines, admission control, backpressure,
etc. Experienced developers will often jump directly to this stage
after the serial loop.</p>
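<p>As a minimal sketch of that jump (in Python, whose <code>concurrent.futures</code> the post links to; the <code>handle</code> function is a made-up stand-in for real per-unit work), thread-per-work-unit becomes a fixed-size pool:</p>

```python
from concurrent.futures import ThreadPoolExecutor

def handle(unit):
    # Stand-in for real per-unit work (parse a file, answer a request, ...).
    return unit * unit

# Recycle a fixed set of threads instead of spawning one per work unit.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(handle, range(8)))
print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

<code>pool.map</code> preserves input order, so <code>results</code> comes back sorted by work unit even though execution interleaves.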
<p>The 80s saw a lot of research on generalising this “flat” parallelism
model to nested parallelism, where work units can spawn additional
requests and wait for the results (e.g., to recursively explore
sub-branches of a search tree). Nested parallelism seems like a good
fit for contemporary network services: we often respond to a request
by sending simpler requests downstream, before merging and munging the
responses and sending the result back to the original requestor. That
may be why futures and promises are so popular these days.</p>
<p>I believe that, for most programs, the futures model is an excellent
answer to the wrong question. The moment we perform I/O (be it
network, disk, or even hardware accelerators) in order to
generate a result, running at scale will have to mean controlling
more resources than just CPU, and both the futures and the generic
thread pool models fall short.</p>
<p>The issue is that futures only work well when a waiter can help along
the value it needs, with task stealing, while thread pools implement a
trivial scheduler (dedicate a thread to a function until that function
returns) that must be oblivious to resource requirements, since it
handles opaque functions.</p>
<p>Once we have futures that might be blocked on I/O, we can’t
guarantee a waiter will achieve anything by lending CPU time to its
children. We could help sibling tasks, but that way stack overflows
lie.</p>
<p>The deficiency of flat generic thread pools is more subtle. Obviously, one
doesn’t want to take a tight thread pool, with one thread per core,
and waste it on synchronous I/O. We’ll simply kick off I/O
asynchronously, and re-enqueue the continuation on the pool upon
completion!</p>
<p>Instead of doing</p>
<pre><code>A, I/O, B
</code></pre>
<p>in one function, we’ll split the work in two functions and a callback</p>
<pre><code>A, initiate asynchronous I/O
On I/O completion: enqueue B in thread pool
B
</code></pre>
<p>The problem here is that it’s easy to create too many asynchronous
requests, and run out of memory, DOS the target, or delay the rest of
the computation for too long. As soon as the I/O request has been
initiated in <code>A</code>, the function returns to the thread pool, which will
just execute more instances of <code>A</code> and initiate even more I/O.</p>
<p>At first, when the program doesn’t heavily utilise any resource in
particular, there’s an easy solution: limit the total number of
in-flight work units with a semaphore. Note that I wrote work units,
not function calls. We want to track logical requests that we started
processing, but for which there is still work to do (e.g., the
response hasn’t been sent back yet).</p>
<p>I’ve seen two ways to cap in-flight work units. One’s buggy, the other
doesn’t generalise.</p>
<p>The buggy implementation acquires a semaphore in the first stage of
request handling (<code>A</code>) and releases it in the last stage (<code>B</code>). The
bug is that, by the time we’re executing <code>A</code>, we’re already using up a
slot in the thread pool, so we might be preventing <code>B</code>s from
executing. We have a lock ordering problem: <code>A</code> acquires a thread
pool slot before acquiring the in-flight semaphore, but <code>B</code> needs to
acquire a slot before releasing the same semaphore. If you’ve seen
code that deadlocks when the thread pool is too small, this was
probably part of the problem.</p>
<p>The correct implementation acquires the semaphore before enqueueing a
new work unit, before shipping a call to <code>A</code> to the thread pool (and
releases it at the end of processing, in <code>B</code>). This only works because
we can assume that the first thing <code>A</code> does is to acquire the
semaphore. As our code becomes more efficient, we’ll want to more
finely track the utilisation of multiple resources, and
pre-acquisition won’t suffice. For example, we might want to limit
network requests going to individual hosts, independently from disk
reads or writes, or from database transactions.</p>
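<p>A sketch of the correct ordering, in Python (the stage names, the two-worker pool, and the immediate hand-off from <code>stage_a</code> to <code>stage_b</code> are illustrative assumptions, not from the post): the in-flight semaphore is acquired before a work unit ever takes a pool slot, and released only when the last stage retires the unit.</p>

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_IN_FLIGHT = 4
in_flight = threading.Semaphore(MAX_IN_FLIGHT)
pool = ThreadPoolExecutor(max_workers=2)
results, lock = [], threading.Lock()

def stage_b(unit):
    # Last stage: "send the response", then retire the work unit.
    try:
        with lock:
            results.append(unit * 2)
    finally:
        in_flight.release()   # released at the end of processing, in B

def stage_a(unit):
    # First stage; a real program would initiate asynchronous I/O here and
    # re-enqueue the continuation B on completion (done immediately for brevity).
    pool.submit(stage_b, unit)

def submit_work(unit):
    in_flight.acquire()       # acquired *before* enqueueing, never inside A
    pool.submit(stage_a, unit)

for i in range(10):
    submit_work(i)
# Once we can re-acquire every permit, all ten units have retired.
for _ in range(MAX_IN_FLIGHT):
    in_flight.acquire()
pool.shutdown(wait=True)
```

Because the acquire happens before any pool slot is taken, a full semaphore blocks the producer, never a pool worker, so the lock-ordering deadlock described above cannot occur.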
<h2>Resource-aware thread pools</h2>
<p>The core issue with thread pools is that the only thing they can do is
run opaque functions in a dedicated thread, so the only way to reserve
resources is to already be running in a dedicated thread. However, the
one resource that every function needs is a thread on which to run, thus
any correct lock order must acquire the thread last.</p>
<p>We care about reserving resources because, as our code becomes more
efficient and scales up, it will start saturating resources that used
to be virtually infinite. Unfortunately, classical thread pools can
only control CPU usage, and actively hinder correct resource
throttling. If we can’t guarantee we won’t overwhelm the supply of a
given resource (e.g., read IOPS), we must accept wasteful
overprovisioning.</p>
<p>Once the problem has been identified, the solution becomes obvious:
make sure the work we push to thread pools describes the resources
to acquire before running the code in a dedicated thread.</p>
<p>My favourite approach assigns one global thread pool (queue) to each
function or processing step. The arguments to the functions will
change, but the code is always the same, so the resource requirements
are also well understood. This does mean that we incur complexity to
decide how many threads or cores each pool is allowed to use. However,
I find that the resulting programs are easier to understand at a high
level: it’s much easier to write code that traverses and describes the
work waiting at different stages when each stage has a dedicated
thread pool queue. They’re also easier to model as queueing systems,
which helps answer “what if?” questions without actually implementing
the hypothesis.</p>
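<p>A toy sketch of that structure (a hypothetical two-stage pipeline in Python; in a real program each stage would get its own worker count and resource checks): each processing step owns a queue, so the work waiting at every stage is directly observable.</p>

```python
import queue
import threading

parse_q, write_q = queue.Queue(), queue.Queue()
out = []

def parse_worker():
    # Stage 1: always runs the same code, so its resource needs are known.
    while (item := parse_q.get()) is not None:
        write_q.put(item.upper())
    write_q.put(None)  # propagate shutdown downstream

def write_worker():
    # Stage 2: a separate queue/worker; stage 1 can never starve it of CPU.
    while (item := write_q.get()) is not None:
        out.append(item)

threads = [threading.Thread(target=parse_worker),
           threading.Thread(target=write_worker)]
for t in threads:
    t.start()
for word in ["ant", "bee", "cat"]:
    parse_q.put(word)
parse_q.put(None)  # sentinel ends the pipeline
for t in threads:
    t.join()
print(out)  # ['ANT', 'BEE', 'CAT']
```

With one worker per stage and FIFO queues, items retire in order; inspecting <code>parse_q.qsize()</code> and <code>write_q.qsize()</code> at runtime shows exactly where work is piling up.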
<p>In increasing order of annoyingness, I’d divide resources to acquire in
four classes.</p>
<ol>
<li>Resources that may be seamlessly<sup id="fnref:3"><a href="#fn:3" rel="footnote">3</a></sup> shared or timesliced, like CPU.</li>
<li>Resources that are acquired for the duration of a single function
call or processing step, like DB connections.</li>
<li>Resources that are acquired in one function call, then released in
another thread pool invocation, like DB transactions, or asynchronous
I/O semaphores.</li>
<li>Resources that may only be released after temporarily using more of
them, or by cancelling work: memory.</li>
</ol>
<p>We don’t really have to think about the first class of resources, at
least when it comes to correctness. However, repeatedly running the
same code on a given core tends to improve performance, compared to
running all sorts of code on all cores.</p>
<p>The second class of resources may be acquired once our code is running
in a thread pool, so one could pretend it doesn’t exist. However, it
is more efficient to batch acquisition, and execute a bunch of calls
that all need a given resource (e.g., a DB connection from a
connection pool) before releasing it, instead of repetitively
acquiring and releasing the same resource in back-to-back function
calls, or blocking multiple workers on the same
bottleneck.<sup id="fnref:4"><a href="#fn:4" rel="footnote">4</a></sup> More importantly, the property of always being
acquired and released in the same function invocation is a global
one: as soon as even one piece of code acquires a given resource and
releases in another thread pool call (e.g., acquires a DB connection,
initiates an asynchronous network call, writes the result of the call
to the DB, and releases the connection), we must always treat that
resource as being in the third, more annoying, class. Having explicit
stages with fixed resource requirements helps us confirm resources
are classified correctly.</p>
<p>The third class of resources <em>must</em> be acquired in a way that
preserves forward progress in the rest of the system. In particular,
we must never have all workers waiting for resources of this third
class. In most cases, it suffices to make sure there are at least as many
workers as there are queues or stages, and to only let each stage run
the initial resource acquisition code in one worker at a time.
However, it can pay off to be smart when different queued items
require different resources, instead of always trying to satisfy
resource requirements in FIFO order.</p>
<p>The fourth class of resources is essentially heap memory. Memory is
special because the only way to release it is often to complete the
computation. However, moving the computation forward will use even
more heap. In general, my only solution is to impose a hard cap on the
total number of in-flight work units, and to make sure it’s easy to
tweak that limit at runtime, in disaster scenarios. If we still run
close to the memory capacity with that limit, the code can either
crash (and perhaps restart with a lower in-flight cap), or try to
cancel work that’s already in progress. Neither option is very
appealing.</p>
<p>There are some easier cases. For example, I find that temporary bumps
in heap usage can be caused by parsing large responses from
idempotent (<code>GET</code>) requests. It would be nice if networking subsystems
tracked memory usage to dynamically throttle requests, or
even cancel and retry idempotent ones.</p>
<p>Once we’ve done the work of explicitly writing out the processing
steps in our program as well as their individual resource
requirements, it makes sense to let that topology drive the structure
of the code.</p>
<p>Over time, we’ll gain more confidence in that topology and bake it in
our program to improve performance. For example, rather than limiting
the number of in-flight requests with a semaphore, we can have a
fixed-size allocation pool of request objects. We can also
selectively use bounded ring buffers once we know we wish to impose a
limit on queue size. Similarly, when a sequence (or subgraph) of
processing steps is fully synchronous or retires in order, we can
control both the queue size and the number of in-flight work units
with a <a href="https://lmax-exchange.github.io/disruptor/">disruptor</a>, which
should also improve locality and throughput under load. These
transformations are easy to apply once we know what the movement of
data and resources looks like. However, they also ossify the structure
of the program, so I only think about such improvements if they
provide a system property I know I need (e.g., a limit on the number
of in-flight requests), or once the code is functional and we have
load-testing data.</p>
<p>Complex programs are often best understood as state machines. These
state machines can be implicit, or explicit. I prefer the latter. I
claim that it’s also preferable to have one thread pool<sup id="fnref:5"><a href="#fn:5" rel="footnote">5</a></sup> per
explicit state than to dump all sorts of state transition logic
in a shared pool. If writing functions that process flat tables is
data-oriented programming, I suppose I’m arguing for data-oriented
state machines.</p>
<div class="footnotes">
<hr />
<ol>
<li id="fn:1">
<p>Convenience wrappers, like parallel map, or “run after this time,” still rely on the flexibility of opaque functions.<a href="#fnref:1" rev="footnote">↩</a></p></li>
<li id="fn:2">
<p>Maybe we decided to use threads because there’s a lot of shared, read-mostly, data on the heap. It doesn’t really matter, process pools have similar problems.<a href="#fnref:2" rev="footnote">↩</a></p></li>
<li id="fn:3">
<p>Up to a point, of course. No model is perfect, etc. etc.<a href="#fnref:3" rev="footnote">↩</a></p></li>
<li id="fn:4">
<p>Explicit resource requirements combined with one queue per stage lets us steal ideas from <a href="https://en.wikipedia.org/wiki/Staged_event-driven_architecture">SEDA</a>.<a href="#fnref:4" rev="footnote">↩</a></p></li>
<li id="fn:5">
<p>One thread pool per state in the sense that no state can fully starve out another of CPU time. The concrete implementation may definitely let a shared set of workers pull from all the queues.<a href="#fnref:5" rev="footnote">↩</a></p></li>
</ol>
</div>Tue, 26 Feb 2019 02:27:06 GMTChristophe Rhodes: sbcl 1 5 0http://christophe.rhodes.io/notes/blog/posts/2019/sbcl_1_5_0/
http://christophe.rhodes.io/notes/blog/posts/2019/sbcl_1_5_0/
<p>Today, I
released <a href="http://www.sbcl.org/all-news.html#1.5.0">sbcl-1.5.0</a> - with
no particular reason for the minor version bump except that when the
patch version (we don't in fact do semantic versioning) gets large
it's hard to remember what I need to type in the release script. In
the 17 versions (and 17 months) since sbcl-1.4.0, there have been over
2900 commits - almost all by other people - fixing user-reported,
developer-identified, and random-tester-lesser-spotted bugs; providing
enhancements; improving support for various platforms; and making
things faster, smaller, or sometimes both.</p>
<p>It's sometimes hard for developers to know what their users think of
all of this furious activity. It's definitely hard for <em>me</em>, in the
case of SBCL: I throw releases over the wall, and sometimes people
tell me I messed up (but usually there is a resounding silence). So I
am running
a
<a href="https://docs.google.com/forms/d/e/1FAIpQLSe6GZTqlQbGZNusJ8W7oIWqpkh_6PuTsNGm4c_P8TfnUjBYxQ/viewform">user survey</a>,
where I invite you to tell me things about your use of SBCL. All
questions are optional: if something is personally or commercially
sensitive, please feel free not to tell me about it! But it's nine
years since
the
<a href="http://random-state.net/sbcl-survey-2010-results.html">last survey</a>
(that I know of), and much has changed since then - I'd be glad to
hear any information SBCL users would be willing to provide. I hope
to collate the information in late March, and report on any insight I
can glean from the answers.</p>Sun, 24 Feb 2019 21:40:13 GMT