Memory Management

Paging, segmentation, virtual memory, page replacement.

Paging

Page table, TLB, fragmentation.

Computer organization — pipelining, cache hierarchy, memory speedups
Notes

Instruction pipelining — overlap execution of multiple instructions to improve throughput.

Classic 5-stage pipeline:

  1. IF (Instruction Fetch)
  2. ID (Instruction Decode) + register read
  3. EX (Execute) in ALU
  4. MEM (Memory access)
  5. WB (Write Back)

Without pipelining: 5 cycles per instruction → 1/5 instruction per cycle.
With pipelining (steady state): 1 instruction per cycle. 5× speedup in ideal case.


Pipeline hazards:

1. Structural hazard: two stages competing for the same resource (e.g., a single memory unit serving both IF and MEM). Fix: duplicate the resource (separate I-cache and D-cache).

2. Data hazard: instruction depends on result of preceding one not yet computed.

  • RAW (Read After Write): most common. ADD r1, r2, r3 followed by SUB r4, r1, r5 — SUB reads r1 before ADD has written it.
  • Mitigations: forwarding (bypass result from EX/MEM to next ALU input), stalling (insert bubbles), compiler reordering.

3. Control hazard: caused by branches.

  • Wait until the branch is resolved (stall ~3 cycles in a 5-stage pipeline): simple but wasteful.
  • Branch prediction: static (always taken / always not taken) or dynamic (history-based, e.g., 2-bit saturating counter plus a branch target buffer; sketch below). Modern CPUs achieve >95% accuracy.
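
The dynamic scheme is easy to sketch. Below is a minimal 2-bit saturating counter in Python (a toy: a real predictor keeps a table of such counters indexed by branch address, and the class name here is invented for illustration):

  # 2-bit saturating counter: states 0-1 predict not-taken, 2-3 predict taken.
  class TwoBitPredictor:
      def __init__(self):
          self.state = 2                      # start at "weakly taken"

      def predict(self):
          return self.state >= 2              # True = predict taken

      def update(self, taken):
          # Saturate at 0 and 3 so one odd outcome can't flip a strong state.
          self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)

  p = TwoBitPredictor()
  correct = 0
  for taken in [True] * 9 + [False]:          # a loop branch: taken 9x, exits once
      correct += (p.predict() == taken)
      p.update(taken)
  print(correct, "of 10 correct")             # 9 of 10: only the loop exit mispredicts

The saturation is the point: a 1-bit scheme would mispredict twice per loop run (at the exit and again on re-entry), while the 2-bit counter mispredicts only once.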

Speedup formula: S = (time without pipelining) / (time with pipelining), where the pipelined time includes fill cycles and any stall bubbles.
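
A quick sanity check of the formula in Python (hypothetical numbers: 1000 instructions, 5 stages, an average of 0.2 stall cycles per instruction):

  N, k, stalls_per_instr = 1000, 5, 0.2

  t_unpipelined = N * k                               # every instruction takes all k cycles
  t_pipelined = k + (N - 1) + N * stalls_per_instr    # fill + one per cycle + bubbles

  print(t_unpipelined / t_pipelined)                  # ~4.15, short of the ideal 5x

Even modest stall rates keep real pipelines below the ideal depth-fold speedup.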


MEMORY HIERARCHY

  Level                Size             Speed        Cost/byte
  Registers (CPU)      ~kBs             < 1 ns       Highest
  L1 cache             tens of KB       1-2 ns       High
  L2 cache             hundreds of KB   3-10 ns      Medium-high
  L3 cache             MBs              10-30 ns     Medium
  Main memory (DRAM)   GBs              50-100 ns    Lower
  SSD / Disk           TBs              0.1-10 ms    Low

Principle of locality justifies caches:

  • Temporal locality: recently-used items used again soon.
  • Spatial locality: items near recently-used items likely to be used.

CACHE BASICS

When the CPU requests data, it checks the cache first.

  • Hit: data found in cache. Fast.
  • Miss: must fetch from lower level. Slow.

Average memory access time (AMAT):
AMAT = Hit time + Miss rate × Miss penalty.

Example: Hit time = 1 ns, hit rate = 0.95, miss penalty = 100 ns.
AMAT = 1 + 0.05 × 100 = 6 ns.

Even a 5% miss rate makes the average access six times the 1 ns hit time. Hence the obsession with high hit rates.
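
The arithmetic is worth scripting once; a one-level sketch in Python:

  def amat(hit_time_ns, miss_rate, miss_penalty_ns):
      # Every access pays the hit time; misses additionally pay the penalty.
      return hit_time_ns + miss_rate * miss_penalty_ns

  print(amat(1, 0.05, 100))   # 6.0 ns, the example above
  print(amat(1, 0.01, 100))   # 2.0 ns: a 5x lower miss rate cuts AMAT 3x

With multiple levels the formula nests: the L1 miss penalty is itself the AMAT of L2, and so on down the hierarchy.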


CACHE MAPPING POLICIES — where does a memory block go in cache?

1. Direct mapped: block goes to one specific cache line. Simple but conflict misses when multiple blocks map to same line.

2. Fully associative: block can go anywhere. No conflict misses, but lookup is expensive (every tag must be compared, in parallel in hardware).

3. Set associative: k-way; block goes to one of k lines in a set. Compromise between simplicity and flexibility. Common: 4-way, 8-way.

Address breakdown:

  • Tag (uniquely identifies block).
  • Index (which set).
  • Offset (byte within block).

REPLACEMENT POLICIES (when cache is full):

  • LRU (Least Recently Used): evict the least recently accessed entry. Good hit rates but expensive to track exactly (see the sketch after this list).
  • FIFO: simple but suffers Belady's anomaly.
  • Random: surprisingly competitive; simple.
  • Pseudo-LRU and NRU: cheap approximations of LRU; MRU (evict the most recent) occasionally wins on scanning patterns.
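
Exact LRU is a few lines in software with an ordered map. A minimal Python sketch (class name invented for illustration); note that every access, hit or miss, must touch the recency bookkeeping, which is exactly what hardware cannot afford per cache access:

  from collections import OrderedDict

  class LRUCache:
      # OrderedDict keeps keys in insertion order; moving a key to the end
      # on every access keeps the least recently used key at the front.
      def __init__(self, capacity):
          self.capacity = capacity
          self.data = OrderedDict()

      def get(self, key):
          if key not in self.data:
              return None                       # miss
          self.data.move_to_end(key)            # mark as most recently used
          return self.data[key]

      def put(self, key, value):
          if key in self.data:
              self.data.move_to_end(key)
          self.data[key] = value
          if len(self.data) > self.capacity:
              self.data.popitem(last=False)     # evict the LRU entry

A k-way hardware set instead keeps a few pseudo-LRU bits per set and accepts slightly worse eviction decisions.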

WRITE POLICIES:

1. Write-through: write to cache AND main memory on every store. Simple, consistent. Slow, since every write pays memory latency (usually hidden with a write buffer).

2. Write-back: write only to cache; mark "dirty"; flush to memory when evicted. Fast but consistency overhead.

Combined with:

  • Write-allocate: on write miss, load block into cache first.
  • Write-no-allocate: on write miss, write directly to memory.

Typical pairings: write-back + write-allocate, OR write-through + no-allocate.
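
A toy model of the write-back + write-allocate pairing (a sketch: fully associative, arbitrary eviction, invented class name), counting how many writes actually reach memory:

  class WriteBackCache:
      def __init__(self, capacity):
          self.capacity = capacity
          self.lines = {}                        # block -> dirty flag
          self.mem_writes = 0

      def write(self, block):
          if block not in self.lines:            # write miss
              if len(self.lines) >= self.capacity:
                  victim, dirty = self.lines.popitem()
                  if dirty:
                      self.mem_writes += 1       # flush the dirty victim
              self.lines[block] = False          # write-allocate: bring block in
          self.lines[block] = True               # write goes to cache; mark dirty

  c = WriteBackCache(2)
  for b in ["A", "A", "A", "B", "A"]:
      c.write(b)
  print(c.mem_writes)                            # 0: no write has reached memory yet

Write-through would have issued five memory writes for the same sequence; write-back defers them until eviction and coalesces repeated writes to the same block.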


Worked example. A direct-mapped cache has 64 lines, each 16 bytes. Memory address is 32 bits. Find tag, index, offset fields.

  • 16 bytes → 4 bits offset.
  • 64 lines → 6 bits index.
  • Tag = 32 − 6 − 4 = 22 bits.

For the 32-bit address 0xABCDEF12:

  • Offset = low 4 bits = 0010 = 2.
  • Index = next 6 bits = 110001 = 49.
  • Tag = upper 22 bits = 0x2AF37B.
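
The same extraction as Python bit twiddling (the masks follow directly from the field widths above):

  addr = 0xABCDEF12

  offset = addr & 0xF           # low 4 bits: byte within the 16-byte block
  index  = (addr >> 4) & 0x3F   # next 6 bits: which of the 64 lines
  tag    = addr >> 10           # remaining 22 bits

  print(offset, index, hex(tag))   # 2 49 0x2af37b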

Page replacement

FIFO, LRU, optimal, second-chance.

Page replacement — FIFO, LRU, Optimal, and Belady's anomaly
Notes

When physical memory is full and a new page is referenced, the OS must evict an existing page. Page replacement algorithms decide which one.

Three classics:

1. FIFO (First-In, First-Out). Evict the page that has been in memory longest.

  • Simple to implement (queue).
  • Suffers from Belady's anomaly: more frames can paradoxically cause MORE faults.

2. LRU (Least Recently Used). Evict the page that hasn't been used for the longest time.

  • Closer to optimal in practice.
  • Implementation: stack/counter (expensive) or approximation via "use bits".
  • Does NOT suffer Belady's anomaly (it's a "stack algorithm").

3. Optimal (OPT, Belady's). Evict the page that won't be needed for the longest time in the future.

  • Provably optimal — fewest faults.
  • Requires future knowledge → not implementable in a real OS, but used as a benchmark.

Worked example. Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5. With 3 frames:

FIFO: 1*, 2*, 3*, 4* (replaces 1), 1* (replaces 2), 2* (replaces 3), 5* (replaces 4), 1, 2, 3* (replaces 1), 4* (replaces 2), 5. 9 faults.

LRU: 1*, 2*, 3*, 4* (LRU=1), 1* (LRU=2), 2* (LRU=3), 5* (LRU=4), 1, 2, 3* (LRU=5), 4* (LRU=1), 5* (LRU=2). 10 faults.

OPT: 1*, 2*, 3*, 4* (replaces 3, next needed furthest away), 1, 2, 5* (replaces 4), 1, 2, 3* (replaces 1, never needed again), 4* (replaces 2, never needed again), 5. 7 faults.
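
All three traces can be checked with a small simulator (a Python sketch; function names are made up, and OPT breaks ties arbitrarily among pages never used again):

  def fifo_faults(refs, frames):
      mem, queue, faults = set(), [], 0
      for p in refs:
          if p not in mem:
              faults += 1
              if len(mem) == frames:
                  mem.discard(queue.pop(0))       # evict the oldest arrival
              mem.add(p)
              queue.append(p)
      return faults

  def lru_faults(refs, frames):
      mem, faults = [], 0                         # list ordered oldest -> newest use
      for p in refs:
          if p in mem:
              mem.remove(p)                       # hit: refresh recency below
          else:
              faults += 1
              if len(mem) == frames:
                  mem.pop(0)                      # evict least recently used
          mem.append(p)
      return faults

  def opt_faults(refs, frames):
      mem, faults = set(), 0
      for i, p in enumerate(refs):
          if p not in mem:
              faults += 1
              if len(mem) == frames:
                  rest = refs[i + 1:]             # evict page needed furthest away
                  key = lambda q: rest.index(q) if q in rest else float("inf")
                  mem.discard(max(mem, key=key))
              mem.add(p)
      return faults

  refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
  print(fifo_faults(refs, 3), lru_faults(refs, 3), opt_faults(refs, 3))  # 9 10 7
  print(fifo_faults(refs, 4))   # 10: more frames, more faults (Belady, demonstrated next)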

Belady's anomaly demo. With FIFO and reference: 1,2,3,4,1,2,5,1,2,3,4,5:

  • 3 frames → 9 page faults
  • 4 frames → 10 page faults (paradox!)

Approximations used in real OSes:

  • Second-chance / Clock: FIFO with a use-bit. If the candidate's use-bit is 1, clear it and give the page a second chance (sketch after this list).
  • NRU: classify pages by (referenced, modified) bits.
  • Aging: per-page counters shifted right periodically, approximating LRU; WSClock combines the clock hand with working-set timestamps.
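
The clock scan itself is a few lines (a sketch; real kernels attach far more state to each frame, and the function name here is invented):

  # Frames sit on a circle; the hand skips frames whose use bit is set,
  # clearing the bit as it passes, and evicts the first frame with a clear bit.
  def clock_evict(use_bit, hand):
      while use_bit[hand]:
          use_bit[hand] = False            # second chance: clear and move on
          hand = (hand + 1) % len(use_bit)
      victim = hand
      return victim, (hand + 1) % len(use_bit)   # hand resumes here next time

  frames  = ["A", "B", "C", "D"]
  use_bit = [True, True, False, True]
  victim, hand = clock_evict(use_bit, 0)
  print(frames[victim])                    # C: A and B were recently used, so spared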

Linux approximates LRU with separate active and inactive page lists; the classic "two-handed clock" variant (one hand clearing use bits, one evicting) appears in other Unix VM systems.