Threads & Condition Variables

Overview

In this lab you will implement three short concurrency primitives in C using pthread mutexes and condition variables:

  1. A cyclic (reusable) barrier.
  2. A bounded buffer (producer/consumer).
  3. A simulated byte-pool allocator that demonstrates the “covering conditions” idea from lecture.

Each part is a single C file. The autograder runs hidden stress tests against your code; any deadlock fails the test by alarm timeout. You can also run ThreadSanitizer locally with make tsan – it’s the easiest way to find data races in your own code.

Each part is graded on two things: correctness (the bulk of the points) and propagating errors from pthread_*_init (4 points per part). pthread_mutex_init and pthread_cond_init can fail with EAGAIN, ENOMEM, or EINVAL when the system is short on resources. Each *_init function in this lab is declared to return 0 on success and non-zero on failure – check the return code of every pthread_*_init call you make and propagate any failure to your caller.

There is no report and no written questions. You submit a tarball; the autograder does the grading.


Setup

Download the starter tarball with the following code:

cd ~/cs224
wget https://cs224.cs.vassar.edu/labs/cv_lab.tar.gz
tar xzf cv_lab.tar.gz
cd cv_lab
make sanity            # builds and runs three smoke tests; should pass
                       # *only* once you start filling in code

In the starter code, you only edit the three .c files. The headers, the Makefile, and the tests are fixed; the autograder uses the exact headers in your tarball.

cv_lab/
  barrier.h    barrier.c
  bbuf.h       bbuf.c
  allocator.h  allocator.c
  tests/sanity_*.c
  Makefile
  README.md

Part 1 – Cyclic barrier (35 pts: 31 correctness, 4 error check)

What is a barrier?

A barrier is a synchronization point that holds every participating thread until all of them have arrived. Imagine a group hike where everyone has agreed to wait at the next trailhead before continuing: nobody starts the next leg until the slowest hiker has reached the meeting point. In a multithreaded program, threads “arrive” at the barrier by calling barrier_wait, and none of them returns from that call until all n participants have called it. The moment the last thread arrives, every thread is released and they all continue on together.

Barriers show up everywhere parallel work is broken into phases that each thread contributes to. Iterative simulations (a fluid solver advancing one timestep at a time, a cellular-automaton update pass, a graph-coloring sweep) need every thread done with phase k before any thread can safely read its neighbors’ results in phase k+1. Numerical libraries like OpenMP and MPI provide barriers as a core primitive for exactly this reason. You can think of a barrier as the multi-thread generalization of pthread_join – “wait until everyone’s done with this step” – except that the threads don’t exit afterwards; they reuse themselves for the next step.

That reuse is what makes a cyclic (also called reusable) barrier important. A single-shot barrier handles one round and then its data structure is exhausted. A cyclic barrier is one you can call from inside a loop:

for (int phase = 0; phase < N_PHASES; phase++) {
    do_my_share_of_phase(phase);
    barrier_wait(&bar);   // wait for everyone to finish this phase
}

Pthreads itself provides pthread_barrier_t (with pthread_barrier_init, pthread_barrier_wait, and pthread_barrier_destroy); in production code you would just use that. We’re implementing our own from scratch in this lab to see how mutexes and condition variables compose into a higher-level synchronization primitive.

Your job in this part is to write a cyclic barrier_wait.

Specification

Implement barrier_init, barrier_wait, and barrier_destroy in barrier.c. The header barrier.h defines the type and prototypes; do not modify it.

You must use only barrier->mutex and barrier->cond. No spinning, no sleep, no atomics, no global state.

Why barrier_t has a phase field

A first attempt at a cyclic barrier might track only the arrival count: increment on entry, broadcast and reset to zero when the count reaches num_threads, and have everyone else wait while the count isn’t zero. This works for the first round but breaks under repeated use. The trouble is that the count’s “zero” state is transient. As soon as the trip-count is reached, one of the released threads loops around, calls barrier_wait again, and bumps the count back off zero for the next phase – before the previous phase’s slower waiters have all reacquired the mutex and re-checked the predicate. Those slower waiters then see a non-zero count, decide they were spuriously woken, and go back to sleep. Nothing will ever signal them again.

phase exists to give each round of the barrier a stable identifier that the next round doesn’t erase. A waiter snapshots the phase number on entry and waits until the phase changes, not until the count hits some particular value. The phase only ever moves forward, so a slow waiter can always tell whether its round has ended – even if a faster thread has already begun the next one. Use the phase field in your wait predicate.

The autograder runs eight threads through five thousand phases with a deadlock alarm. Between every two consecutive barrier_wait calls it checks an invariant: every thread must be at the same phase number. Any lost wakeup or missed synchronization will fail.


Part 2 – Bounded buffer (45 pts: 41 correctness, 4 error check)

Implement bbuf_init, bbuf_put, bbuf_get, and bbuf_destroy in bbuf.c. bbuf.h defines a buffer of BBUF_CAPACITY (= 2) integer slots with one mutex and two condition variables: not_empty (consumers wait here when the buffer is empty) and not_full (producers wait here when the buffer is full).

Lecture walks through three broken versions before arriving at a correct one. The bugs are worth keeping in mind as you write your implementation:

Items must come out of bbuf_get in the same order they went into bbuf_put (FIFO). No item may be lost or returned twice.

The autograder runs:

Both run with a deadlock alarm. The single-condition-variable bug reliably deadlocks under this stress test.


Part 3 – Covering conditions (20 pts: 16 correctness, 4 error check)

Implement alloc_init, alloc_acquire, alloc_release, and alloc_destroy in allocator.c. This is a simulated allocator: it tracks bytes_left in a fixed-size pool but does not allocate real memory. alloc_acquire(a, n) blocks until at least n bytes are free, then deducts n; alloc_release(a, n) adds n back to the pool and wakes waiters that may now proceed.

This is the “covering conditions” scenario from lecture. Multiple threads can be blocked at the same time with different requested sizes, and a release that frees X bytes might satisfy some waiters and not others. pthread_cond_signal wakes only one waiter; if the woken waiter can’t proceed and goes back to sleep, other waiters that could proceed are stranded. The fix is the appropriate broadcast call. Pick the right primitive.

You must use only a->mutex and a->cond (one condition variable, exactly as the lecture example).

The autograder runs:


Submitting

From the cv_lab/ directory:

make submission        # creates submission.tar.gz with your three .c files

Upload submission.tar.gz to the Threads and Condition Variables assignment on Gradescope. The autograder prints a per-part score.

You may submit unlimited times. Only your last submission counts.

If a part is incomplete, leave the TODO stubs in place; you’ll get 0 for that part but the autograder will still run the others.


Tips