Threads & Condition Variables

Overview

In this lab you will implement three short concurrency primitives in C using pthread mutexes and condition variables:

A cyclic (reusable) barrier.
A bounded buffer (producer/consumer).
A simulated byte-pool allocator that demonstrates the “covering conditions” idea from lecture.

Each part is a single C file. The autograder runs hidden stress tests against your code; any deadlock fails the test by alarm timeout. You can also run ThreadSanitizer locally with make tsan – it’s the easiest way to find data races in your own code.

Each part is graded on two things: correctness (the bulk of the points) and propagating errors from pthread_*_init (4 points per part). pthread_mutex_init and pthread_cond_init can fail with EAGAIN, ENOMEM, or EINVAL when the system is short on resources. Each *_init function in this lab is declared to return 0 on success and non-zero on failure – check the return code of every pthread_*_init call you make and propagate any failure to your caller.

There is no report and no written questions. You submit a tarball; the autograder does the grading.

Setup

Download the starter tarball with the following code:

cd ~/cs224
wget https://cs224.cs.vassar.edu/labs/cv_lab.tar.gz
tar xzf cv_lab.tar.gz
cd cv_lab
make sanity            # builds and runs three smoke tests; should pass
                       # *only* once you start filling in code

In the starter code, you only edit the three .c files. The headers, the Makefile, and the tests are fixed; the autograder uses the exact headers in your tarball.

cv_lab/
  barrier.h    barrier.c
  bbuf.h       bbuf.c
  allocator.h  allocator.c
  tests/sanity_*.c
  Makefile
  README.md

Part 1 – Cyclic barrier (35 pts: 31 correctness, 4 error check)

What is a barrier?

A barrier is a synchronization point that holds every participating thread until all of them have arrived. Imagine a group hike where everyone has agreed to wait at the next trailhead before continuing: nobody starts the next leg until the slowest hiker has reached the meeting point. In a multithreaded program, threads “arrive” at the barrier by calling barrier_wait, and none of them returns from that call until all n participants have called it. The moment the last thread arrives, every thread is released and they all continue on together.

Barriers show up everywhere parallel work is broken into phases that each thread contributes to. Iterative simulations (a fluid solver advancing one timestep at a time, a cellular-automaton update pass, a graph-coloring sweep) need every thread done with phase k before any thread can safely read its neighbors’ results in phase k+1. Numerical libraries like OpenMP and MPI provide barriers as a core primitive for exactly this reason. You can think of a barrier as the multi-thread generalization of pthread_join – “wait until everyone’s done with this step” – except that the threads don’t exit afterwards; they reuse themselves for the next step.

That reuse is what makes a cyclic (also called reusable) barrier important. A single-shot barrier handles one round and then its data structure is exhausted. A cyclic barrier is one you can call from inside a loop:

for (int phase = 0; phase < N_PHASES; phase++) {
    do_my_share_of_phase(phase);
    barrier_wait(&bar);   // wait for everyone to finish this phase
}

Pthreads itself provides pthread_barrier_t (with pthread_barrier_init, pthread_barrier_wait, and pthread_barrier_destroy); in production code you would just use that. We’re implementing our own from scratch in this lab to see how mutexes and condition variables compose into a higher-level synchronization primitive.

Your job in this part is to write a cyclic barrier_wait.

Specification

Implement barrier_init, barrier_wait, and barrier_destroy in barrier.c. The header barrier.h defines the type and prototypes; do not modify it.

You must use only barrier->mutex and barrier->cond. No spinning, no sleep, no atomics, no global state.

Why `barrier_t` has a `phase` field

A first attempt at a cyclic barrier might track only the arrival count: increment on entry, broadcast and reset to zero when the count reaches num_threads, and have everyone else wait while the count isn’t zero. This works for the first round but breaks under repeated use. The trouble is that the count’s “zero” state is transient. As soon as the trip-count is reached, one of the released threads loops around, calls barrier_wait again, and bumps the count back off zero for the next phase – before the previous phase’s slower waiters have all reacquired the mutex and re-checked the predicate. Those slower waiters then see a non-zero count, decide they were spuriously woken, and go back to sleep. Nothing will ever signal them again.

phase exists to give each round of the barrier a stable identifier that the next round doesn’t erase. A waiter snapshots the phase number on entry and waits until the phase changes, not until the count hits some particular value. The phase only ever moves forward, so a slow waiter can always tell whether its round has ended – even if a faster thread has already begun the next one. Use the phase field in your wait predicate.

The autograder runs eight threads through five thousand phases with a deadlock alarm. Between every two consecutive barrier_wait calls it checks an invariant: every thread must be at the same phase number. Any lost wakeup or missed synchronization will fail.

Part 2 – Bounded buffer (45 pts: 41 correctness, 4 error check)

Implement bbuf_init, bbuf_put, bbuf_get, and bbuf_destroy in bbuf.c. bbuf.h defines a buffer of BBUF_CAPACITY (= 2) integer slots with one mutex and two condition variables: not_empty (consumers wait here when the buffer is empty) and not_full (producers wait here when the buffer is full).

Lecture walks through three broken versions before arriving at a correct one. The bugs are worth keeping in mind as you write your implementation:

Using if (count == ...) instead of while – broken because a thread returning from cond_wait is not guaranteed that the condition still holds. The fix is to recheck the predicate in a while loop.
Using a single condition variable with while – still broken with multiple producers and multiple consumers. A consumer’s signal can wake a fellow consumer instead of a waiting producer; that consumer sees an empty buffer and goes back to sleep, and eventually everyone is asleep.
Using two condition variables (not_empty and not_full) – correct. This is what you must implement.

Items must come out of bbuf_get in the same order they went into bbuf_put (FIFO). No item may be lost or returned twice.

The autograder runs:

a single-producer / single-consumer FIFO test with thousands of items, and
a four-producer / four-consumer stress test with thousands of items per producer. Each item is encoded (producer_id << 16) | seq, and the test fails if any (pid, seq) pair is consumed zero or more than one times.

Both run with a deadlock alarm. The single-condition-variable bug reliably deadlocks under this stress test.

Part 3 – Covering conditions (20 pts: 16 correctness, 4 error check)

Implement alloc_init, alloc_acquire, alloc_release, and alloc_destroy in allocator.c. This is a simulated allocator: it tracks bytes_left in a fixed-size pool but does not allocate real memory. alloc_acquire(a, n) blocks until at least n bytes are free, then deducts n; alloc_release(a, n) adds n back to the pool and wakes waiters that may now proceed.

This is the “covering conditions” scenario from lecture. Multiple threads can be blocked at the same time with different requested sizes, and a release that frees X bytes might satisfy some waiters and not others. pthread_cond_signal wakes only one waiter; if the woken waiter can’t proceed and goes back to sleep, other waiters that could proceed are stranded. The fix is the appropriate broadcast call. Pick the right primitive.

You must use only a->mutex and a->cond (one condition variable, exactly as the lecture example).

The autograder runs:

a stress test with many threads and varied request sizes against a small pool, and
a deterministic trap: the pool is held empty, two waiters of different sizes are parked on the cond var, then exactly enough bytes are released to satisfy the smaller waiter only. Under cond_signal the wrong waiter is woken, deadlocks, and alarm fires.

Submitting

From the cv_lab/ directory:

make submission        # creates submission.tar.gz with your three .c files

Upload submission.tar.gz to the Threads and Condition Variables assignment on Gradescope. The autograder prints a per-part score.

You may submit unlimited times. Only your last submission counts.

If a part is incomplete, leave the TODO stubs in place; you’ll get 0 for that part but the autograder will still run the others.

Tips

Run make sanity early and often. The sanity tests are not the autograder, but if they fail or hang, the autograder definitely will.
make tsan rebuilds the sanity tests with ThreadSanitizer. Any “data race” warning here is a real bug – fix it before submitting, even if the sanity test still passes.
If you see a sanity run hang for more than a few seconds, you have a deadlock. Ctrl-C, recheck the slide for the relevant broken version, and look for a missing wait predicate, a missing while, or signaling the wrong condition variable.
Pthread reference: man pthread_mutex_init, man pthread_cond_wait, man pthread_cond_signal, man pthread_cond_broadcast.