Final Study Guide
The final is closed book, closed note, no electronic devices allowed.
The final is comprehensive, so you should be familiar with the material in the exam1 study guide and the exam2 study guide, in addition to the material below.
You will be provided with a Reference Card with a list of RISC-V instructions and other useful information for the quiz.
On the course website there are blank copies of both exams and all the quizzes, along with their solutions. Questions on the final will be similar to those that you have seen in previous exams and quizzes. About 30% of the final is material since exam2, and about 70% is from previous material.
Concurrency
POSIX Threads (pthreads)
Creating Threads
pthread_t tid;
pthread_create(&tid, NULL, thread_func, (void*)arg);
Waiting for Threads to Finish
void* result;
pthread_join(tid, &result);
Passing Data to and from Threads
- Pass a pointer to a struct (wrap all needed values)
- Ensure the data remains valid for the thread’s lifetime
typedef struct {
int id;
char* msg;
} thread_data_t;
Locks
Basic Usage
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
// alternative init
pthread_mutex_init(&lock, NULL);
pthread_mutex_lock(&lock);
// critical section
pthread_mutex_unlock(&lock);
Use Cases
- Protect shared variables
- Prevent race conditions
Condition Variables
Usage Pattern
#include <pthread.h>
pthread_cond_t cond;
pthread_mutex_t mutex;
// Initialize with default attributes
pthread_cond_init(&cond, NULL);
pthread_mutex_init(&mutex, NULL);
// ... use the condition variable ...
pthread_mutex_lock(&lock);
while (!condition_met) {
pthread_cond_wait(&cond, &lock);
}
// do work
pthread_mutex_unlock(&lock);
// signal another thread
pthread_cond_signal(&cond);
// Clean up when done
pthread_cond_destroy(&cond);
pthread_mutex_destroy(&mutex);
Use Case
- Thread waits until a condition becomes true (e.g. buffer not empty)
Semaphores
Initialization
sem_t sem;
sem_init(&sem, 0, initial_value);
Wait and Signal
sem_wait(&sem); // decrement semaphore and wait if semaphore is negative
sem_post(&sem); // increment semaphore and wake a single thread waiting on the semaphore
Use Cases
- Resource management (e.g. N-slot buffer)
- Thread synchronization
Producer-Consumer Problem
- Shared bounded buffer
- Synchronization using:
- Mutex (for mutual exclusion)
- Condition variables or semaphores (to wait/signal)
Key Conditions:
- Producer waits if buffer full
- Consumer waits if buffer empty
Readers-Writers Locks
Goals
- Allow multiple concurrent readers
- Writers get exclusive access
Concurrent Data Structures
Examples:
- Approximate Counters
- Concurrent lists
- Michael and Scott Concurrent Queues
- Concurrent hash map (mutex per bucket)
- Circular buffer (for producer-consumer)
Key Concepts to Know
| Concept | Know How To… |
|---|---|
| Race condition | Explain what causes it and how to fix it |
| Deadlock | Identify and prevent |
| Critical section | Define and protect with locks |
| Bounded buffer | Describe and implement |
| Signal vs broadcast | Differentiate in condition variables |
| Mutex vs Semaphore | Compare and choose correctly |
Optimizing Program Performance
Why Optimize?
- Goal: Map programs efficiently to machine resources.
- Importance: Can lead to significant performance improvements (e.g., 10x).
- Factors: Both asymptotic complexity (Big-O) and constant factors matter. Choose the best algorithm first, then optimize implementation details.
Measuring Performance
- Cycles Per Element (CPE): A common metric to express performance for programs operating on sequences of data. Lower CPE is better.
Optimization Blockers
- Memory Aliasing: When two different pointers might refer to the same memory location. Compilers must be conservative and often cannot optimize memory accesses aggressively.
- Procedure Calls: Compilers often treat procedure calls as “black boxes” due to potential side effects, limiting optimizations across function boundaries.
- Inlining: Replacing a function call with the function’s body (e.g., GCC
-O1) can help, but is often limited to calls within the same file.
- Inlining: Replacing a function call with the function’s body (e.g., GCC
How to maximize performance
- Write Compiler-Friendly Code: Understand limitations like aliasing and procedure calls.
- Reduce Memory Accesses: Use temporary variables within loops to accumulate results instead of reading/writing to memory repeatedly.
- Loop Optimizations:
- Code Motion: Move calculations or function calls (like
strlenon a non-changing string) out of loops if their result doesn’t change per iteration. - Loop Unrolling: Perform more work per iteration to reduce loop overhead and expose more instructions for parallel execution.
- Reassociation: Change the order of operations (if mathematically valid) to break sequential dependencies.
- Separate Accumulators: Use multiple temporary variables to accumulate results in parallel, breaking data dependencies.
- Code Motion: Move calculations or function calls (like
Exploiting Modern Hardware (Microarchitecture)
- Instruction Pipelining: Processors break instruction execution into stages, allowing multiple instructions to be in different stages simultaneously.
- Instruction-Level Parallelism (ILP): Modern processors can execute multiple instructions per clock cycle (superscalar).
- Data Dependencies: The primary limiter of ILP. Optimizations often focus on breaking these dependencies.
- SIMD (Single Instruction, Multiple Data): Vector instructions (like AVX2) perform the same operation on multiple data elements simultaneously.
Takeaways
- Optimization requires understanding both the code and the underlying hardware.
- Focus on bottlenecks.
- Compilers are powerful but have limitations (aliasing, procedure calls).
- Programmers can enable better performance by:
- Choosing efficient algorithms.
- Writing compiler-friendly code.
- Applying manual optimizations (loop unrolling, reassociation, etc.).
- Understanding and leveraging hardware features (ILP, SIMD).
- Tune code for the specific target machine for best results.