Implementing Condition Variables in Racket
Whoops, I got distracted with Petri Nets, Midlands Graduate School, the Idris Developer Meeting, and a whole bunch of projects and fixes, and suddenly it’s 5 months since the precursor to this. Sorry about that…
After patching the sleep
shenanigans in the Idris Racket backend, the
Condition Variable (CV) tests needed to be written. I came up with the following
test scenarios:
- 1 Main, 1 Child thread – Test that the child can be signalled and wakes up.
- 1 Main, N Child threads – Test that 1 child thread gets woken on signal.
- 1 Main, 1 Child thread – Test that the child gets woken when the CV is broadcast.
- 1 Main, N Child threads – Test that all N children get woken when the CV is broadcast.
- 1 Main, 1 Child thread – Test that the child continues after a timeout if the main thread never updates the CV.
- 1 Main, N Child threads – Test that M (where M < N) threads continue after a timeout if the main thread never updates the CV.
These scenarios test all supported operations on condition variables (wait
,
signal
, broadcast
, and waitTimeout
) with both 1 and multiple threads using
the CV (kind of critical to test multiple thread access in a
concurrency/synchronisation primitive). They were implemented in Idris and then
run using the --cg racket
flag. However, since the CV implementation was
broken, it needed to be fixed.
Brief refresher on some terminology:
Condition Variables are synchronisation primitives which allow threads to wait for a condition to be true, e.g. to guarantee that worker threads don’t try to get work before the main thread has signalled that there is work. CVs typically support 3 operations:
wait
– Somewhat self-explanatory: blocks the thread, waiting until the CV indicates that it is okay to continue. This operation requires a locked mutex to guarantee that the thread atomically switches to waiting on the CV.signal
– Wakes 1 thread currently waiting on the CV.broadcast
– Wakes all threads currently waiting on the CV.
In addition to this, most CV implementations support something like
waitTimeout
, which blocks the thread until either the CV is updated or the
specified timeout has elapsed.
Semaphores have existed for millennia, and no-one is really sure where they come from. Well, kind of: Semaphores are technically just a form of signals, with some of the oldest examples being smoke signals. Eventually, trains came along, and soon after a fixed railway sign using “arms” to signal different things to the locomotive drivers was invented, called a “Railway semaphore signal”. A while later, Dijkstra then came along and invented the synchronisation primitive, naming it after the railway signal. Since Dijkstra was Dutch, he used the Dutch terminology for the operations on semaphores: passering “passing”, and vrijgave “set free/release”. These were abbreviated P and V respectively, which turns out to be unfortunate and highly confusing when trying to use something which uses the common, English operation names: P and V are equivalent to “wait” and “post” respectively (yes, “respectively”: the original operation “P” is what most languages now call “wait”, and the original operation “V” is what most languages now call “post” – V increments the semaphore and P tries to decrement it).
Understanding the “Implementing CVs” paper
As with most concurrency primitives, implementing them from scratch is really hard, since there are a lot of race-conditions and edge-cases to take care of, and often some go unnoticed until someone writes a specific multithreaded use-case where everything breaks for seemingly no reason. Fortunately, there is an excellent paper by Andrew D. Birell detailing not only how CVs are meant to work and how to achieve this using semaphores (which is the only simple concurrency primitive we have access to in Racket), but even including the exact code! The paper highlights the difficulty of doing implementing CVs correctly: The bulkiness and potential performance impact of the semaphore-based implementation was serious enough that the team abandoned the implementation and ended up having one of the kernel developers implement them directly using the scheduler and the hardware’s atomic test-and-set instructions. Unfortunately, we have no such luxury in the Racket/Idris world, so I had to understand the code in the paper.
The code uses mostly single-letter variable names which everybody knows is best
practice and makes things easy to understand. The most sensible of these is m
,
which is the mutex used to guarantee the atomic switch to waiting on the CV. In
the paper, this mutex is part of the CV data structure, but as far as I can tell,
this is contrary to most other implementations (like C or Chez-Scheme), which
have the mutex be passed in as an argument to the wait
function call. As such,
I chose to leave this part out and only have the CV data structure contain the
semaphores required for its correct operation and the waiters
counter; keeping
it consistent with the other CV APIs I had seen. The waiters
counter is the
only truly sensibly-named variable: it counts how many threads are currently
waiting on the CV. The implementation uses 3 semaphores which are “helpfully”
named s
, x
, and h
. Reading through the code a couple of times with the
help of a friend, we came to the following conclusions:
s
– is responsible for the actual releasing of the threads.x
– protects thewaiters
counter, guaranteeing that it is atomically incremented and decremented; basically a mutex.h
– ensures that there is a “handshake” when waking threads, i.e. thatwait
+signal
pairs line up. This is necessary to prevent inter-broadcast
mix ups: when callingbroadcast
, it has to wait forwait
to have successfully woken a thread before continuing to broadcast to the remaining threads.
This is helpful 1) since it helps immensely with understanding the code+logic, and 2) to have (hopefully) more helpful names in my Racket implementation.
Implementing CVs in Racket
First step is to define a new data-type. In racket, this is done using the
struct
keyword:
(struct cv (countingSem waitersLock waiters handshakeSem) #:mutable)
This defines a new structure called cv
, which contains the fields
countingSem
, waitersLock
, waiters
, and handshakeSem
. Specifying
#:mutable
as a struct-option (as opposed to a field-option) means that
every field in the struct is mutable and can be modified using the implicitly
generated set-
structName-fieldName
!
functions.
Since the semaphores are internal to the CV, they are created as part of the constructor:
(define (blodwen-make-cv)
(let ([s (make-semaphore 0)]
[x (make-semaphore 1)]
[h (make-semaphore 0)])
(cv s x 0 h)))
I used the original variable names here for brevity, and also to maintain a
slight link between the original paper and my implementation. The field names,
which are actually used for access and modification, still have the (in my
opinion) clearer names. The 0
on the last line is the initial value of the
waiters
counter.
Next, the simplest operation to define in Racket is wait
:
(define (blodwen-cv-wait my-cv m)
; atomically increment waiters
(semaphore-wait (cv-waitersLock my-cv))
(set-cv-waiters! my-cv (+ (cv-waiters my-cv) 1))
(semaphore-post (cv-waitersLock my-cv))
; release the provided mutex
(blodwen-mutex-release m)
; wait for the counting semaphore to let us through
(semaphore-wait (cv-countingSem my-cv))
; signal to broadcast that we have proceeded past the critical point/have
; been woken up successfully
(semaphore-post (cv-handshakeSem my-cv))
; re-acquire the provided mutex
(blodwen-mutex-acquire m)
)
As with most other implementations, wait
takes a CV my-cv
and a locked mutex
m
, and causes the thread to wait until the CV is updated. This is done by
waiting on the countingSem
semaphore. Each of the fields in the my-cv
struct
can be accessed using the implicitly generated structName-fieldName
functions. Note that +
is just a function, so to increment waiters
by 1, we
call it on the value retrieved from the struct: (+ (cv-waiters my-cv) 1)
, just
like we would if we were using any other function.
With wait
implemented, the next thing which is still mostly simple to
implement is signal
:
(define (blodwen-cv-signal my-cv)
; lock access to waiters
(semaphore-wait (cv-waitersLock my-cv))
(let ([waiters (cv-waiters my-cv)])
(if (> waiters 0)
; if we have waiting threads, signal one of them
(begin
(set-cv-waiters! my-cv (- waiters 1))
; increment the counting semaphore to wake up a thread
(semaphore-post (cv-countingSem my-cv))
; wait for the thread to tell us it's okay to proceed
(semaphore-wait (cv-handshakeSem my-cv))
)
; otherwise, do nothing
(void)
)
; unlock access to waiters
(semaphore-post (cv-waitersLock my-cv))
))
Again, nothing crazy is going on. Rather than using (cv-waiters my-cv)
everywhere, we retrieve the value once using a let
binding. And since Racket
is a functional language, we have to deal with no-ops explicitly, so the else
part of the if
-statement is simply (void)
. One new thing, compared to the
previous post on Idris/Racket concurrency, is the begin
statement. And it’s
an important one. It is a very good example of where Lisp-like languages (or
maybe just the Scheme dialects) can be somewhat tricky: Contrary to most other
programming languages, parentheses carry more meaning than ordering things.
Consider the Racket expression:
+ 1 2
You might expect this to evaluate to 3, but that’s not the case. This
expression, without any parentheses, is: the procedure +
, and the numbers 1
and 2. This will likely be confusing to many people coming from outside the
Scheme/Lisp world (it certainly was for me!). The reason for this result is that
parentheses are not arbitrary, they change the fundamental meaning of things.
This is very easy to trip over, especially when trying to use parentheses to
order execution. If you want the numbers 3 and 7 (as a toy example), you might
correctly write:
(+ 1 2) (+ 3 4)
and it wouldn’t be unreasonable, coming from other languages, to then assume that writing:
((+ 1 2) (+ 3 4))
would result in the same numbers, just being more explicit about the ordering of things (perhaps for extra sanity in a large sequence of operations). However, instead of being fine, Racket complains:
; application: not a procedure;
This is because, while the expression partially reduces to (3 7)
, the
parentheses change the meaning of that expression to: “Call 3
with the
argument 7
”. This doesn’t make sense, 3
is a number and not something that
can be called, and so Racket complains. Adding parentheses doesn’t just change
the evaluation order, it changes the meaning of the expressions themselves! To
sum up with a shorter example:foo
is a variable reference; (foo)
is the call to foo
with no arguments;
and ((foo))
is taking the value that calling foo
with no arguments returns,
and calling that value as if it were a function/procedure, with no arguments.
However, sometimes when writing Racket, we end up in situations where it would
be really convenient to express a sequence of operations in the middle of
things, e.g. in an if
branch. This is where begin
is useful. Coming back to
the signal
code:
(if (> waiters 0)
(begin
(set-cv-waiters! my-cv (- waiters 1))
(semaphore-post (cv-countingSem my-cv))
(semaphore-wait (cv-handshakeSem my-cv))
)
(void)
)
Without begin
, the first part of the true
branch of the if
-statement would
be “the call of the return value of set-cv-waiters! [...]
with 2 arguments
(i.e. the return values of the semaphore-*
calls)”, which is not what we want
at all; we want to execute each statement as part of the true
branch. So we
use begin
to evaluate the expressions, ignoring every result but the last one,
which is returned.
While begin
took a bit to explain, it is fine to use once you remember about
it. The main pitfall in implementing signal
is that the paper puts the 3
instructions required on one line:
if (waiters > 0) { waiters--; s.V(); h.P(); }
In combination with the single-letter variable naming, this makes it easy to miss one of them, which I did on my first attempt at implementing this. Fortunately, since the implementation doesn’t work without all 3, it’s at least easy to realise that a mistake has been made. Finding it however, not so much…
Moving on to the broadcast
function, it is a bit trickier to implement: It
should wake all threads waiting on the CV at the moment of the call, which is
done by iteration in the paper. However, as you will know if you’ve done
functional programming before, functional programming doesn’t really do
iteration. Instead, we prefer to use recursion or something from the map
,
traverse
, fold
, etc family of functions. I decided that the simplest
approach would be to define a couple of helper-functions for the iterative bits
of the code. This time, the confusion of single-letter naming gets some
assistance from “single-line loops” (because why make things readable, right?):
x.P(); {
for (int i = 0; i < waiters; i++) s.V();
while (waiters > 0) { waiters--; h.P(); }
} x.V();
The first helper is to implement the for
-loop. It can be broken down into a
base-case and a recursive step:
- If
i
is zero, then we are done. - Otherwise, we need to signal a waiting thread, and recurse with
i
being 1 smaller.
There are 2 things we need to keep track of: The CV which things are waiting on,
and i
. So the definition of our helper becomes:
; for (int i = 0; i < waiters; i++) s.V();
(define (broadcast-for-helper my-cv i)
(if (= i 0)
; if i is zero, we're done
(void)
; otherwise, we signal one waiting thread, decrement i, and keep going
(begin
(semaphore-post (cv-countingSem my-cv))
(broadcast-for-helper my-cv (- i 1))
)))
The single-equals in Racket is numerical equality, since variable assignment is
done either as argument-passing to functions or through let
-bindings.
The second helper is to implement the while
-loop. Like the for
-loop, it can
be broken into very similar steps. One difference is that we keep track of
waiters
rather than i
, but the main difference is that since we are waking
up all the threads, we need to wait for the handshakes to complete, indicating
that the thread has successfully been woken and that there is now one fewer
waiters. With these things in mind, the while
helper becomes:
; while (waiters > 0) { waiters--; h.P(); }
(define (broadcast-while-helper my-cv waiters)
(if (= waiters 0)
; if waiters is 0, we're done
(void)
; otherwise, wait for "waiters" many threads to tell us they're awake
(begin
(semaphore-wait (cv-handshakeSem my-cv))
(broadcast-while-helper my-cv (- waiters 1))
)))
With both the helpers implemented, draw the rest of the owl:
(define (blodwen-cv-broadcast my-cv)
; lock access to waiters
(semaphore-wait (cv-waitersLock my-cv))
(let ([waiters (cv-waiters my-cv)])
; for (int i = 0; i < waiters; i++) s.V(); so start at "waiters - 1"
(broadcast-for-helper my-cv (- waiters 1))
; wait on "waiters" many threads to have been woken
(broadcast-while-helper my-cv waiters)
; unlock access to waiters
(semaphore-post (cv-waitersLock my-cv))
))
Unfortunately, this doesn’t quite work and the keen-eyed reader might have
spotted why: There is something wrong with the for
-loop transcription. Since
it goes from i = 0
to i < waiters
, it is natural to transcribe the latter as
waiters - 1
; we do something as soon as we call the function after all.
However, the condition now ends up being doubly accounted for and missing an
iteration, as the other natural thing is to have 0 as the base case in the
recursive helper:
; for (int i = 0; i < waiters; i++) s.V();
(define (broadcast-for-helper my-cv i)
(if (= i 0)
; if i is zero, we're done
(void)
[...]
This took forever to spot and confused me for ages, since I was confident that I had transcribed the paper correctly. It took an extra pair of eyes to finally spot it, and fortunately it was a trivial fix. With the mistake fixed, the code becomes:
; BROADCAST
(define (blodwen-cv-broadcast my-cv)
; lock access to waiters
(semaphore-wait (cv-waitersLock my-cv))
(let ([waiters (cv-waiters my-cv)])
; signal "waiters" many threads; counting *until* 0 in the helper
; function, hence "waiters" and NOT "waiters - 1"
(broadcast-for-helper my-cv waiters)
; wait on "waiters" many threads to have been woken
(broadcast-while-helper my-cv waiters)
; unlock access to waiters
(semaphore-post (cv-waitersLock my-cv))
))
Including a comment to remind myself, and others, that the lack of
waiters - 1
is intentional.
With this, all the tests passed under the Racket backend, and there was much rejoicing. Almost.
A final note: wait-timeout
Unfortunately, the paper does not describe an implementation of wait-timeout
.
My guess is that they just never needed it in their kernel. I had some attempts
at implementing it myself using Racket’s sync/timeout
functionality, but I
couldn’t get it to work. Unfortunately for the implementation, I didn’t really
need the functionality either, but I did need to get on with more research-y
projects. As such, CVs under the Racket backend do not support this at the time
of writing, but it is possible someone else might come along and implement it if
they need that functionality and have to use the Racket backend.
With that said, I am still fairly happy that I was successfully able to port the implementation from the paper to the Idris Racket backend and that the core functionality of CVs now works. I learned a lot about both CVs and Racket on the way, and it was a solid exercise in implementing concurrency primitives : )
Acknowledgements
- Thanks to Andrew D. Birell for writing the paper which greatly facilitated this implementation.
- Thanks to Tom Harley for helping me understand the paper and decipher the purposes of the internal semaphores.
- Thanks to Edwin Brady for helping with ironing out the final creases in the implementation, like the off-by-one error.