Exam 2 Study Guide
Exam 2 is closed-book and closed-notes. No electronic devices are allowed. You will be provided with a Reference Card with a list of RISC-V instructions and other useful information for the exam.
This study guide highlights some of the major topics we have covered so far, but it is not an exhaustive list. Material from lectures and labs that is not included here may also appear on the exam.
The focus of the exam will be on the material since the first exam, but you should also be familiar with the material in the Exam 1 study guide.
Logic Design
Digital Signals & Abstraction
- Concept: Computers use voltage thresholds to represent discrete binary values (0 and 1) from continuous analog signals.
- Why Digital?:
- Noise Immunity: Uses “high” and “low” ranges with a guard range in between. Small fluctuations (noise) do not change the bit value.
- Simplicity: Allows for simple, small, and fast circuits.
- Representation:
- High Voltage: Represents logical 1 (True).
- Low Voltage: Represents logical 0 (False).
Logic Gates and Combinational Circuits
Logic gates are the fundamental building blocks that perform Boolean functions. They respond to input changes continuously (with a small propagation delay).
Combinational Circuits are an acyclic (no loops) network of logic gates.
- Properties:
- Memoryless: The output depends solely on the current input values.
- Continuous: Outputs update continuously based on inputs (after a delay).
- Truth Tables: A table listing all possible input combinations ($2^n$ rows for $n$ inputs) and their corresponding outputs.
- Boolean Equations:
- Used to express logic functions as equations.
- Sum of Products (SOP): A standard way to represent any logic truth table.
- Look at an output variable column of the truth table. Ignore the rows where the output is 0.
- Where the output is 1, create Minterms (the “Products”): write a product term (AND operation) combining all input variables. If the input is 1, write the variable name (e.g., $A$); if the input is 0, write the NOT of the variable (e.g., $A'$).
- Combine Them (The “Sum”): Add all the minterms together using OR operators to get the final expression.
- Note: SOP doesn’t always yield the simplest circuit.
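The SOP construction above can be sketched in code. This is an illustrative helper (the function name and table format are not from the course), assuming inputs are enumerated as all $2^n$ rows of a truth table:

```python
# Sketch: build a Sum-of-Products expression from a truth table.
from itertools import product

def sop(n_inputs, output_fn, names):
    """Return the SOP expression for a Boolean function of n_inputs bits."""
    minterms = []
    for bits in product([0, 1], repeat=n_inputs):   # 2^n rows
        if output_fn(*bits) == 1:                   # ignore rows where output is 0
            term = "".join(v if b else v + "'"      # A if input is 1, A' if 0
                           for v, b in zip(names, bits))
            minterms.append(term)
    return " + ".join(minterms)                     # OR the minterms together

# XOR of two inputs: output is 1 on rows (0,1) and (1,0)
print(sop(2, lambda a, b: a ^ b, "AB"))  # A'B + AB'
```

Note how the result for XOR has two minterms even though a single XOR gate would be simpler, matching the note that SOP doesn't always yield the simplest circuit.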
Standard Digital Components
These are common circuits built from logic gates.
- Equality Comparator:
- Bit-level: Checks if two bits are identical.
- Word-level: To check if two words are equal, logically you compare each bit pair and AND all the results together.
- Multiplexer (MUX):
- Function: A “selector” that chooses one of multiple data inputs based on a control signal.
- Usage: Critical for building ALUs (e.g., choosing between the result of an AND operation and an ADD operation).
- Adder:
- We built a 32-bit adder by chaining 32 1-bit full adders together. This is known as a ripple adder. It is simple to build but gives poor performance, because the carry-out bits must propagate from the LSB to the MSB.
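The equality comparator and multiplexer above can both be sketched at the gate level. The function names here are illustrative, not from the guide:

```python
# Sketch: two standard components expressed with bitwise gate operations.
def bit_equal(a, b):
    # Bit-level equality is XNOR: 1 exactly when the two bits match.
    return 1 ^ (a ^ b)

def word_equal(wa, wb):
    # Word-level: compare each bit pair, then AND all the results together.
    result = 1
    for a, b in zip(wa, wb):
        result &= bit_equal(a, b)
    return result

def mux2(d0, d1, sel):
    # 2-to-1 multiplexer: out = (NOT sel AND d0) OR (sel AND d1).
    return ((1 ^ sel) & d0) | (sel & d1)

print(word_equal([1, 0, 1, 1], [1, 0, 1, 1]))  # 1 (words match)
print(word_equal([1, 0, 1, 1], [1, 1, 1, 1]))  # 0 (a bit differs)
print(mux2(d0=1, d1=0, sel=1))                 # 0 (control selects d1)
```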
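A minimal sketch of the ripple adder, with the carry variable making the LSB-to-MSB propagation explicit (the function names are illustrative):

```python
# Sketch: a 32-bit ripple-carry adder from chained 1-bit full adders.
def full_adder(a, b, cin):
    s = a ^ b ^ cin                              # sum bit
    cout = (a & b) | (a & cin) | (b & cin)       # carry-out bit
    return s, cout

def ripple_add(x, y, width=32):
    carry, result = 0, 0
    for i in range(width):                       # carry ripples LSB -> MSB
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result & ((1 << width) - 1), carry

print(ripple_add(7, 9))           # (16, 0)
print(ripple_add(0xFFFFFFFF, 1))  # (0, 1) -- wraps around, carry-out set
```

The sequential loop mirrors the hardware delay: bit $i$ cannot finish until the carry from bit $i-1$ is known, which is exactly why the ripple design is slow.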
Sequential Logic
1. Combinational vs. Sequential Logic
- Combinational Logic (The “Now”):
- Acyclic: No loops or feedback.
- Deterministic: Output depends only on current inputs (e.g., ALU, Adder).
- Timeless: Outputs settle shortly after inputs change (after only a propagation delay); no history is retained.
- Sequential Logic (The “History”):
- Cyclic: Contains feedback loops where output feeds back into input.
- Memory: Output depends on current inputs AND the sequence of past inputs.
- Analogy: A “Black Box” containing combinational logic + a feedback cycle to retain state.
2. The Hierarchy of Storage Elements
The Bistable Element (The Physics)
- Construction: Two inverters (NOT gates) connected in a loop.
- Signal Conditioning: Inverters “strengthen” weak signals, pushing them toward valid 0 or 1 voltages.
- Metastability (“Here be Dragons”):
- The Point: If the voltage sits exactly halfway between the valid logic levels, the circuit doesn’t know which way to settle.
- The Analogy: An Inverted Pendulum. It can balance at the top momentarily, but any noise (thermal, vibration) causes it to fall randomly to Stable 0 or Stable 1.
SR Latch (Set-Reset)
- Inputs: S (Set to 1), R (Reset to 0).
- Mechanism: Uses OR gates to inject control signals into the feedback loop.
- Modes:
- $R=1, S=0$: Force 0.
- $S=1, R=0$: Force 1.
- $R=0, S=0$: Storing Mode (Keeps previous value).
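The three modes can be captured as a tiny state machine, where Q is the stored bit (a behavioral sketch, not a gate-level model; the function name is illustrative):

```python
# Sketch: SR latch behavior -- next Q as a function of current Q, S, and R.
def sr_latch(q, s, r):
    if s and not r:
        return 1          # S=1, R=0: force 1 (Set)
    if r and not s:
        return 0          # R=1, S=0: force 0 (Reset)
    if not s and not r:
        return q          # S=0, R=0: storing mode, keep previous value
    raise ValueError("S=R=1 is not allowed")

q = 0
q = sr_latch(q, s=1, r=0); print(q)  # 1 (set)
q = sr_latch(q, s=0, r=0); print(q)  # 1 (stored across time)
q = sr_latch(q, s=0, r=1); print(q)  # 0 (reset)
```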
D Latch (Level-Sensitive)
- Inputs: D (Data), L (Latch/Enable).
- Behavior:
- Transparent (L=1): “Open for business.” Output follows input immediately. If D changes, Output changes.
- Opaque (L=0): Input is ignored. Output is frozen at the last value.
D Flip-Flop (Edge-Triggered)
- The Problem with Latches: Transparency is dangerous; signals racing through the circuit can cause errors. We want to snapshot data at a precise moment.
- The Solution: The Edge-Triggered Flip-Flop.
- Behavior: Samples D only on the Rising Edge (0 $\to$ 1 transition) of the clock.
- Stability: At all other times, the output is stable, even if D changes. This allows the rest of the cycle for computation.
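Edge-triggered behavior can be simulated by remembering the previous clock value and updating Q only on a 0 → 1 transition (a behavioral sketch; the class name is illustrative):

```python
# Sketch: a D flip-flop samples D only on the rising edge of the clock.
class DFlipFlop:
    def __init__(self):
        self.q = 0
        self.prev_clk = 0

    def tick(self, clk, d):
        if self.prev_clk == 0 and clk == 1:   # rising edge: snapshot D
            self.q = d
        self.prev_clk = clk                   # any other time: Q is stable
        return self.q

ff = DFlipFlop()
print(ff.tick(clk=1, d=1))  # 1 (rising edge captures D)
print(ff.tick(clk=1, d=0))  # 1 (D changed mid-cycle; output unaffected)
print(ff.tick(clk=0, d=0))  # 1 (falling edge: no change)
print(ff.tick(clk=1, d=0))  # 0 (next rising edge samples the new D)
```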
Registers and Clocks
- The Clock: The “Conductor” of the CPU.
- Frequency: e.g., 2.4 GHz.
- Period: The time between ticks (inverse of frequency). Determines how much work (logic depth) can be done in one cycle.
- Register: A collection of D Flip-Flops (e.g., 32 for a 32-bit word) sharing one clock.
- Acts as a barrier. Holds the fort (stable output) until the next rising edge, then loads new input instantly.
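The period/frequency relationship from the bullet above, worked out for the 2.4 GHz example:

```python
# Sketch: clock period is the inverse of clock frequency.
freq_hz = 2.4e9                       # 2.4 GHz clock
period_s = 1 / freq_hz                # time between rising edges
print(round(period_s * 1e12, 1))      # 416.7 -- picoseconds of logic per cycle
```

All combinational work between two registers must fit inside that window, which is why logic depth determines the maximum clock rate.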
The RISC-V Register File
- Structure: An array of 32 registers (x0–x31).
- Reading (Combinational): Asynchronous. You supply an address (e.g., rs1), and the data comes out after a slight delay. Can read two registers simultaneously (Dual Ported).
- Writing (Sequential): Synchronous. You supply an address (rd) and data, but the write only happens on the Rising Edge of the clock.
The RISC-V Pipelined Processor
Pipelining Core Concept
Divide long combinational logic into smaller independent stages, separated by pipeline registers.
The clock rate is limited by the longest (slowest) pipeline stage.
Pipelining divides instruction execution into 5 stages (IF → ID → EX → MEM → WB), overlapping multiple instructions to increase throughput while slightly increasing latency due to pipeline register overhead.
Key metrics: Throughput and Latency. In our pipelined architecture, each instruction has a latency of 5 cycles. Assuming no bubbles are injected, the throughput is one instruction per cycle.
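The throughput/latency tradeoff can be checked with a quick calculation. The instruction count below is an illustrative assumption:

```python
# Sketch: cycle count for a 5-stage pipeline with no bubbles.
stages, n = 5, 1000
cycles = stages + (n - 1)        # fill the pipe, then finish one per cycle
latency = stages                 # each instruction spends 5 cycles in flight
throughput = n / cycles          # approaches 1 instruction/cycle as n grows
print(cycles)                    # 1004
print(round(throughput, 3))      # 0.996
```

Each instruction takes longer than in a single-cycle design (5 cycles plus pipeline-register overhead), but completions overlap, so overall throughput is nearly one instruction per cycle.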
The 5 Stages
| Stage | What happens |
|---|---|
| IF | Fetch instruction from memory using PC; compute PC+4 |
| ID | Decode instruction, read registers, extract immediate |
| EX | ALU operation, branch evaluation, jump target calculation |
| MEM | Read/write data memory (loads and stores) |
| WB | Write result back to register file |
Hazards & Solutions
Data Hazard (RAW): A later instruction reads a register that an earlier instruction hasn’t written yet. → Forwarding bypasses the value from EX, MEM, or WB back to ID (zero penalty). Nearest stage has priority.
Load-Use Hazard: Special type of RAW hazard. A lw produces its value in MEM, but the next instruction needs it in ID — forwarding alone can’t bridge this gap.
→ Stall 1 cycle (IF/ID stall, EX gets bubble), then forward from MEM.
Control Hazard (Branch Misprediction): Branch outcome isn’t known until EX stage. We predict not taken (fetch PC+4). If wrong: → 2-cycle penalty — inject bubbles into ID and EX to remove mispredicted instructions from the pipeline.
Pipeline Register Modes
- Normal: load new value
- Stall: hold current value
- Bubble: inject NOP
Performance
CPI = 1 + B/I, where B is the total number of bubbles injected and I is the number of instructions (i.e., 1 plus the average bubbles per instruction).
Each hazard's contribution = (fraction of that instruction type) × (hazard rate) × (bubble count). Typical CPI ≈ 1.2.
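A worked example of the CPI formula. The instruction mix and hazard rates below are illustrative assumptions, not figures from the course:

```python
# Sketch: CPI = 1 + bubbles per instruction, summed over hazard types.
load_frac, load_use_rate, load_bubbles = 0.25, 0.40, 1       # stall 1 cycle
branch_frac, mispredict_rate, branch_bubbles = 0.20, 0.25, 2 # 2-cycle penalty

bubbles_per_instr = (load_frac * load_use_rate * load_bubbles
                     + branch_frac * mispredict_rate * branch_bubbles)
cpi = 1.0 + bubbles_per_instr
print(round(cpi, 2))  # 1.2
```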
The Memory Subsystem
SRAM vs DRAM
| Feature | SRAM | DRAM |
|---|---|---|
| Speed | Faster | Slower |
| Cost | More expensive | Cheaper |
| Structure | Uses flip-flops (6 transistors) | Uses capacitors (1 transistor + capacitor) |
| Volatility | Volatile | Volatile |
| Refreshing | No need | Requires periodic refreshing |
| Use Case | Cache memory | Main memory |
Principle of Locality
Locality refers to the tendency of programs to access the same set of memory locations repeatedly over a short period of time.
Temporal Locality
- Recently accessed data is likely to be accessed again soon.
- Example: Loop counters, frequently called functions.
Spatial Locality
- Data near recently accessed memory is likely to be accessed soon.
- Example: Array elements accessed in sequence.
Role of Caches
- Caches act as a fast buffer between CPU and main memory.
- Store frequently accessed data to speed up execution.
- Reduce average memory access time.
Types of Cache Misses
| Miss Type | Description |
|---|---|
| Cold (Compulsory) | The first time data is accessed; it’s not yet in the cache. |
| Conflict | Multiple data mapped to the same cache location in set-associative/direct-mapped caches. |
| Capacity | Cache cannot hold all the data being used; some data is evicted and reloaded. |
LRU Replacement Algorithm
- LRU (Least Recently Used) evicts the line that has not been accessed for the longest time.
- Implements temporal locality assumption.
- Commonly used in n-way set-associative caches.
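LRU for one set of an n-way cache can be sketched with an ordered dictionary, where dictionary order tracks recency (the class name and interface are illustrative):

```python
# Sketch: LRU replacement for a single set of a 2-way set-associative cache.
from collections import OrderedDict

class LRUSet:
    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()        # tag -> line; least recent first

    def access(self, tag):
        if tag in self.lines:
            self.lines.move_to_end(tag)   # hit: refresh recency
            return True
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)  # evict least recently used line
        self.lines[tag] = None              # fill on miss
        return False

s = LRUSet(ways=2)
print([s.access(t) for t in [1, 2, 1, 3, 2]])
# [False, False, True, False, False] -- tag 3 evicts 2, so 2 misses again
```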
Performing a Cache Lookup
To perform a lookup:
- Extract the set index from the address.
- Search the set for a line where the valid bit is set and the tag matches.
- If found, it’s a hit. If not, it’s a miss.
- On a miss, fetch from lower memory and update the cache using the replacement policy.
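The lookup steps above can be sketched for a direct-mapped cache. The geometry (16-byte blocks, 4 sets) and the function names are illustrative assumptions:

```python
# Sketch: direct-mapped cache lookup -- split the address, check the set.
BLOCK_BITS, SET_BITS = 4, 2          # 16-byte blocks, 4 sets

def split(addr):
    offset = addr & ((1 << BLOCK_BITS) - 1)
    index = (addr >> BLOCK_BITS) & ((1 << SET_BITS) - 1)   # set index
    tag = addr >> (BLOCK_BITS + SET_BITS)
    return tag, index, offset

cache = [{"valid": False, "tag": None} for _ in range(1 << SET_BITS)]

def lookup(addr):
    tag, index, _ = split(addr)
    line = cache[index]
    if line["valid"] and line["tag"] == tag:
        return "hit"                              # valid bit set, tag matches
    line["valid"], line["tag"] = True, tag        # miss: fetch and fill line
    return "miss"

print(lookup(0x1234))  # miss (cold: line not yet valid)
print(lookup(0x1238))  # hit  (same block, only the offset differs)
```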
Cache Terminology
| Term | Definition |
|---|---|
| Line | Smallest unit of data storage in a cache. |
| Set | A group of one or more cache lines. |
| Block | A chunk of data transferred between memory and cache. |
| Tag | Part of address used to identify memory blocks stored in cache lines. |
| Valid Bit | Indicates whether the data in the cache line is valid. |
Cache Mapping Techniques
| Type | Description |
|---|---|
| Direct-Mapped | Each memory block maps to exactly one cache line. Fast but prone to conflict misses. |
| Fully Associative | Any memory block can go into any line. Low conflict misses but slower and more hardware. |
| n-Way Set Associative | Cache is divided into sets, each with E lines. A compromise between the two extremes. |
Write Policies
| Policy | Description |
|---|---|
| Write-Through | Updates are written to both the cache and main memory immediately. Easy to implement, slower writes. |
| Write-Back | Updates are only written to cache initially. Written back to memory when line is evicted. Faster, but requires modified (dirty) bit tracking. |
Cache Summary
- SRAM is fast and used in caches; DRAM is cheaper and used for main memory.
- Locality (temporal/spatial) motivates cache usage.
- Caches reduce latency and increase speed but have miss types to manage.
- Cache mapping strategies and write policies have performance tradeoffs.