Exam 2 Study Guide
Exam 2 is closed-book and closed-notes. No electronic devices are allowed. You will be provided with a Reference Card with a list of RISC-V instructions and other useful information for the exam.
This study guide highlights some of the major topics we have covered so far, but it is not an exhaustive list. Material from lectures and labs that is not included here may also appear on the exam.
The focus of the exam will be on the material since the first exam, but you should also be familiar with the material in the Exam 1 study guide.
Logic Design
Digital Signals & Abstraction
- Concept: Computers use voltage thresholds to represent discrete binary values (0 and 1) from continuous analog signals.
- Why Digital?:
- Noise Immunity: Uses “high” and “low” ranges with a guard range in between. Small fluctuations (noise) do not change the bit value.
- Simplicity: Allows for simple, small, and fast circuits.
- Representation:
- High Voltage: Represents logical 1 (True).
- Low Voltage: Represents logical 0 (False).
Logic Gates and Combinational Circuits
Logic gates are the fundamental building blocks that perform Boolean functions. They respond to input changes continuously (with a small propagation delay).
Combinational Circuits are an acyclic (no loops) network of logic gates.
- Properties:
- Memoryless: The output depends solely on the current input values.
- Continuous: Outputs update continuously based on inputs (after a delay).
- Truth Tables: A table listing all possible input combinations ($2^n$ rows for $n$ inputs) and their corresponding outputs.
- Boolean Equations:
- Used to express logic functions as equations.
- Sum of Products (SOP): A standard way to represent any logic truth table.
- Look at an output variable column of the truth table. Ignore the rows where the output is 0.
- Where the output is 1, create Minterms (the “Products”): write a product term (AND operation) combining all input variables. If the input is 1, write the variable name (e.g., $A$); if the input is 0, write the NOT of the variable (e.g., $A'$).
- Combine Them (The “Sum”): Add all the minterms together using OR operators to get the final expression.
- Note: SOP doesn’t always yield the simplest circuit.
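The SOP construction above can be sketched in code. This is an illustrative helper (the function name and table format are not from the course), assuming inputs are enumerated as all $2^n$ rows of a truth table:

```python
# Sketch: build a Sum-of-Products expression from a truth table.
from itertools import product

def sop(n_inputs, output_fn, names):
    """Return the SOP expression for a Boolean function of n_inputs bits."""
    minterms = []
    for bits in product([0, 1], repeat=n_inputs):   # 2^n rows
        if output_fn(*bits) == 1:                   # ignore rows where output is 0
            term = "".join(v if b else v + "'"      # A if input is 1, A' if 0
                           for v, b in zip(names, bits))
            minterms.append(term)
    return " + ".join(minterms)                     # OR the minterms together

# XOR of two inputs: output is 1 on rows (0,1) and (1,0)
print(sop(2, lambda a, b: a ^ b, "AB"))  # A'B + AB'
```

Note how the result for XOR has two minterms even though a single XOR gate would be simpler, matching the note that SOP doesn't always yield the simplest circuit.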
Standard Digital Components
These are common circuits built from logic gates.
- Equality Comparator:
- Bit-level: Checks if two bits are identical.
- Word-level: To check if two words are equal, logically you compare each bit pair and AND all the results together.
- Multiplexer (MUX):
- Function: A “selector” that chooses one of multiple data inputs based on a control signal.
- Usage: Critical for building ALUs (e.g., choosing between the result of an AND operation and an ADD operation).
- Adder:
- We built a 32-bit adder by chaining 32 1-bit full adders together. This is known as a ripple adder. It is simple to build but gives poor performance, because the carry-out bits must propagate from the LSB to the MSB.
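The equality comparator and multiplexer above can both be sketched at the gate level. The function names here are illustrative, not from the guide:

```python
# Sketch: two standard components expressed with bitwise gate operations.
def bit_equal(a, b):
    # Bit-level equality is XNOR: 1 exactly when the two bits match.
    return 1 ^ (a ^ b)

def word_equal(wa, wb):
    # Word-level: compare each bit pair, then AND all the results together.
    result = 1
    for a, b in zip(wa, wb):
        result &= bit_equal(a, b)
    return result

def mux2(d0, d1, sel):
    # 2-to-1 multiplexer: out = (NOT sel AND d0) OR (sel AND d1).
    return ((1 ^ sel) & d0) | (sel & d1)

print(word_equal([1, 0, 1, 1], [1, 0, 1, 1]))  # 1 (words match)
print(word_equal([1, 0, 1, 1], [1, 1, 1, 1]))  # 0 (a bit differs)
print(mux2(d0=1, d1=0, sel=1))                 # 0 (control selects d1)
```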
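A minimal sketch of the ripple adder, with the carry variable making the LSB-to-MSB propagation explicit (the function names are illustrative):

```python
# Sketch: a 32-bit ripple-carry adder from chained 1-bit full adders.
def full_adder(a, b, cin):
    s = a ^ b ^ cin                              # sum bit
    cout = (a & b) | (a & cin) | (b & cin)       # carry-out bit
    return s, cout

def ripple_add(x, y, width=32):
    carry, result = 0, 0
    for i in range(width):                       # carry ripples LSB -> MSB
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result & ((1 << width) - 1), carry

print(ripple_add(7, 9))           # (16, 0)
print(ripple_add(0xFFFFFFFF, 1))  # (0, 1) -- wraps around, carry-out set
```

The sequential loop mirrors the hardware delay: bit $i$ cannot finish until the carry from bit $i-1$ is known, which is exactly why the ripple design is slow.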
Sequential Logic
1. Combinational vs. Sequential Logic
- Combinational Logic (The “Now”):
- Acyclic: No loops or feedback.
- Deterministic: Output depends only on current inputs (e.g., ALU, Adder).
- Timeless: Outputs settle shortly after inputs change (after only a propagation delay); no history is retained.
- Sequential Logic (The “History”):
- Cyclic: Contains feedback loops where output feeds back into input.
- Memory: Output depends on current inputs AND the sequence of past inputs.
- Analogy: A “Black Box” containing combinational logic + a feedback cycle to retain state.
2. The Hierarchy of Storage Elements
The Bistable Element (The Physics)
- Construction: Two inverters (NOT gates) connected in a loop.
- Signal Conditioning: Inverters “strengthen” weak signals, pushing them toward valid 0 or 1 voltages.
- Metastability (“Here be Dragons”):
- The Point: If the voltage sits exactly halfway between the valid logic levels, the circuit doesn’t know which way to settle.
- The Analogy: An Inverted Pendulum. It can balance at the top momentarily, but any noise (thermal, vibration) causes it to fall randomly to Stable 0 or Stable 1.
SR Latch (Set-Reset)
- Inputs: S (Set to 1), R (Reset to 0).
- Mechanism: Uses OR gates to inject control signals into the feedback loop.
- Modes:
- $R=1, S=0$: Force 0.
- $S=1, R=0$: Force 1.
- $R=0, S=0$: Storing Mode (Keeps previous value).
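The three modes can be captured as a tiny state machine, where Q is the stored bit (a behavioral sketch, not a gate-level model; the function name is illustrative):

```python
# Sketch: SR latch behavior -- next Q as a function of current Q, S, and R.
def sr_latch(q, s, r):
    if s and not r:
        return 1          # S=1, R=0: force 1 (Set)
    if r and not s:
        return 0          # R=1, S=0: force 0 (Reset)
    if not s and not r:
        return q          # S=0, R=0: storing mode, keep previous value
    raise ValueError("S=R=1 is not allowed")

q = 0
q = sr_latch(q, s=1, r=0); print(q)  # 1 (set)
q = sr_latch(q, s=0, r=0); print(q)  # 1 (stored across time)
q = sr_latch(q, s=0, r=1); print(q)  # 0 (reset)
```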
D Latch (Level-Sensitive)
- Inputs: D (Data), L (Latch/Enable).
- Behavior:
- Transparent (L=1): “Open for business.” Output follows input immediately. If D changes, Output changes.
- Opaque (L=0): Input is ignored. Output is frozen at the last value.
D Flip-Flop (Edge-Triggered)
- The Problem with Latches: Transparency is dangerous; signals racing through the circuit can cause errors. We want to snapshot data at a precise moment.
- The Solution: The Edge-Triggered Flip-Flop.
- Behavior: Samples D only on the Rising Edge (0 $\to$ 1 transition) of the clock.
- Stability: At all other times, the output is stable, even if D changes. This allows the rest of the cycle for computation.
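Edge-triggered behavior can be simulated by remembering the previous clock value and updating Q only on a 0 → 1 transition (a behavioral sketch; the class name is illustrative):

```python
# Sketch: a D flip-flop samples D only on the rising edge of the clock.
class DFlipFlop:
    def __init__(self):
        self.q = 0
        self.prev_clk = 0

    def tick(self, clk, d):
        if self.prev_clk == 0 and clk == 1:   # rising edge: snapshot D
            self.q = d
        self.prev_clk = clk                   # any other time: Q is stable
        return self.q

ff = DFlipFlop()
print(ff.tick(clk=1, d=1))  # 1 (rising edge captures D)
print(ff.tick(clk=1, d=0))  # 1 (D changed mid-cycle; output unaffected)
print(ff.tick(clk=0, d=0))  # 1 (falling edge: no change)
print(ff.tick(clk=1, d=0))  # 0 (next rising edge samples the new D)
```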
Registers and Clocks
- The Clock: The “Conductor” of the CPU.
- Frequency: e.g., 2.4 GHz.
- Period: The time between ticks (inverse of frequency). Determines how much work (logic depth) can be done in one cycle.
- Register: A collection of D Flip-Flops (e.g., 32 for a 32-bit word) sharing one clock.
- Acts as a barrier. Holds the fort (stable output) until the next rising edge, then loads new input instantly.
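The period/frequency relationship from the bullet above, worked out for the 2.4 GHz example:

```python
# Sketch: clock period is the inverse of clock frequency.
freq_hz = 2.4e9                       # 2.4 GHz clock
period_s = 1 / freq_hz                # time between rising edges
print(round(period_s * 1e12, 1))      # 416.7 -- picoseconds of logic per cycle
```

All combinational work between two registers must fit inside that window, which is why logic depth determines the maximum clock rate.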
The RISC-V Register File
- Structure: An array of 32 registers (x0–x31).
- Reading (Combinational): Asynchronous. You supply an address (e.g., rs1), and the data comes out after a slight delay. Can read two registers simultaneously (Dual Ported).
- Writing (Sequential): Synchronous. You supply an address (rd) and data, but the write only happens on the Rising Edge of the clock.
The RISC-V Pipelined Processor
Pipelining Core Concept
Divide long combinational logic into smaller independent stages, separated by pipeline registers.
The clock rate is limited by the longest (slowest) pipeline stage.
Pipelining divides instruction execution into 5 stages (IF → ID → EX → MEM → WB), overlapping multiple instructions to increase throughput while slightly increasing latency due to pipeline register overhead.
Key metrics: Throughput and Latency. In our pipelined architecture, each instruction has a latency of 5 cycles. Assuming no bubbles are injected, the throughput is one instruction per cycle.
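The throughput/latency tradeoff can be checked with a quick calculation. The instruction count below is an illustrative assumption:

```python
# Sketch: cycle count for a 5-stage pipeline with no bubbles.
stages, n = 5, 1000
cycles = stages + (n - 1)        # fill the pipe, then finish one per cycle
latency = stages                 # each instruction spends 5 cycles in flight
throughput = n / cycles          # approaches 1 instruction/cycle as n grows
print(cycles)                    # 1004
print(round(throughput, 3))      # 0.996
```

Each instruction takes longer than in a single-cycle design (5 cycles plus pipeline-register overhead), but completions overlap, so overall throughput is nearly one instruction per cycle.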
The 5 Stages
| Stage | What happens |
|---|---|
| IF | Fetch instruction from memory using PC; compute PC+4 |
| ID | Decode instruction, read registers, extract immediate |
| EX | ALU operation, branch evaluation, jump target calculation |
| MEM | Read/write data memory (loads and stores) |
| WB | Write result back to register file |
Hazards & Solutions
Data Hazard (RAW): A later instruction reads a register that an earlier instruction hasn’t written yet. → Forwarding bypasses the value from EX, MEM, or WB back to ID (zero penalty). Nearest stage has priority.
Load-Use Hazard: Special type of RAW hazard. A lw produces its value in MEM, but the next instruction needs it in ID — forwarding alone can’t bridge this gap.
→ Stall 1 cycle (IF/ID stall, EX gets bubble), then forward from MEM.
Control Hazard (Branch Misprediction): Branch outcome isn’t known until EX stage. We predict not taken (fetch PC+4). If wrong: → 2-cycle penalty — inject bubbles into ID and EX to remove mispredicted instructions from the pipeline.
Pipeline Register Modes
- Normal: load new value
- Stall: hold current value
- Bubble: inject NOP
Performance
CPI = 1 + B/I, where B is the total number of bubbles injected and I is the number of instructions (i.e., 1 plus the average bubbles per instruction).
Each hazard's contribution = (fraction of that instruction type) × (hazard rate) × (bubble count). Typical CPI ≈ 1.2.
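A worked example of the CPI formula. The instruction mix and hazard rates below are illustrative assumptions, not figures from the course:

```python
# Sketch: CPI = 1 + bubbles per instruction, summed over hazard types.
load_frac, load_use_rate, load_bubbles = 0.25, 0.40, 1       # stall 1 cycle
branch_frac, mispredict_rate, branch_bubbles = 0.20, 0.25, 2 # 2-cycle penalty

bubbles_per_instr = (load_frac * load_use_rate * load_bubbles
                     + branch_frac * mispredict_rate * branch_bubbles)
cpi = 1.0 + bubbles_per_instr
print(round(cpi, 2))  # 1.2
```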
The Memory Subsystem
SRAM vs DRAM
| Feature | SRAM | DRAM |
|---|---|---|
| Speed | Faster | Slower |
| Cost | More expensive | Cheaper |
| Structure | Uses flip-flops (6 transistors) | Uses capacitors (1 transistor + capacitor) |
| Volatility | Volatile | Volatile |
| Refreshing | No need | Requires periodic refreshing |
| Use Case | Cache memory | Main memory |
Principle of Locality
Locality refers to the tendency of programs to access the same set of memory locations repeatedly over a short period of time.
Temporal Locality
- Recently accessed data is likely to be accessed again soon.
- Example: Loop counters, frequently called functions.
Spatial Locality
- Data near recently accessed memory is likely to be accessed soon.
- Example: Array elements accessed in sequence.
Role of Caches
- Caches act as a fast buffer between CPU and main memory.
- Store frequently accessed data to speed up execution.
- Reduce average memory access time.
Types of Cache Misses
| Miss Type | Description |
|---|---|
| Cold (Compulsory) | The first time data is accessed; it’s not yet in the cache. |
| Conflict | Multiple data mapped to the same cache location in set-associative/direct-mapped caches. |
| Capacity | Cache cannot hold all the data being used; some data is evicted and reloaded. |
LRU Replacement Algorithm
- LRU (Least Recently Used) evicts the line that has not been accessed for the longest time.
- Implements temporal locality assumption.
- Commonly used in n-way set-associative caches.
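LRU for one set of an n-way cache can be sketched with an ordered dictionary, where dictionary order tracks recency (the class name and interface are illustrative):

```python
# Sketch: LRU replacement for a single set of a 2-way set-associative cache.
from collections import OrderedDict

class LRUSet:
    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()        # tag -> line; least recent first

    def access(self, tag):
        if tag in self.lines:
            self.lines.move_to_end(tag)   # hit: refresh recency
            return True
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)  # evict least recently used line
        self.lines[tag] = None              # fill on miss
        return False

s = LRUSet(ways=2)
print([s.access(t) for t in [1, 2, 1, 3, 2]])
# [False, False, True, False, False] -- tag 3 evicts 2, so 2 misses again
```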
Performing a Cache Lookup
To perform a lookup:
- Extract the set index from the address.
- Search the set for a line where the valid bit is set and the tag matches.
- If found, it’s a hit. If not, it’s a miss.
- On a miss, fetch from lower memory and update the cache using the replacement policy.
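The lookup steps above can be sketched for a direct-mapped cache. The geometry (16-byte blocks, 4 sets) and the function names are illustrative assumptions:

```python
# Sketch: direct-mapped cache lookup -- split the address, check the set.
BLOCK_BITS, SET_BITS = 4, 2          # 16-byte blocks, 4 sets

def split(addr):
    offset = addr & ((1 << BLOCK_BITS) - 1)
    index = (addr >> BLOCK_BITS) & ((1 << SET_BITS) - 1)   # set index
    tag = addr >> (BLOCK_BITS + SET_BITS)
    return tag, index, offset

cache = [{"valid": False, "tag": None} for _ in range(1 << SET_BITS)]

def lookup(addr):
    tag, index, _ = split(addr)
    line = cache[index]
    if line["valid"] and line["tag"] == tag:
        return "hit"                              # valid bit set, tag matches
    line["valid"], line["tag"] = True, tag        # miss: fetch and fill line
    return "miss"

print(lookup(0x1234))  # miss (cold: line not yet valid)
print(lookup(0x1238))  # hit  (same block, only the offset differs)
```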
Cache Terminology
| Term | Definition |
|---|---|
| Line | Smallest unit of data storage in a cache. |
| Set | A group of one or more cache lines. |
| Block | A chunk of data transferred between memory and cache. |
| Tag | Part of address used to identify memory blocks stored in cache lines. |
| Valid Bit | Indicates whether the data in the cache line is valid. |
Cache Mapping Techniques
| Type | Description |
|---|---|
| Direct-Mapped | Each memory block maps to exactly one cache line. Fast but prone to conflict misses. |
| Fully Associative | Any memory block can go into any line. Low conflict misses but slower and more hardware. |
| n-Way Set Associative | Cache is divided into sets, each with E lines. A compromise between the two extremes. |
Write Policies
| Policy | Description |
|---|---|
| Write-Through | Updates are written to both the cache and main memory immediately. Easy to implement, slower writes. |
| Write-Back | Updates are only written to cache initially. Written back to memory when line is evicted. Faster, but requires modified (dirty) bit tracking. |
Cache Summary
- SRAM is fast and used in caches; DRAM is cheaper and used for main memory.
- Locality (temporal/spatial) motivates cache usage.
- Caches reduce latency and increase speed but have miss types to manage.
- Cache mapping strategies and write policies have performance tradeoffs.