Process Management

Process states, PCB, scheduling algorithms.

Scheduling algorithms

FCFS, SJF, RR, priority, multilevel queue.

Processes — states, PCB, context switch, IPC, threads vs processes

Notes

When you open a browser, stream music, and download a file simultaneously, your laptop's OS is quietly juggling all three — and the machinery that makes this feel seamless is what this lesson demystifies. Understanding processes, the PCB, context switching, and IPC is not just GATE syllabus; it is the foundation of everything that happens when a computer runs software.

Definition: A process is a program in execution — an active entity with its own memory space, register values, program counter, and state. A program is merely the passive set of instructions sitting on disk; it becomes a process only when the OS loads and begins running it.

Definition: The Process Control Block (PCB) is the kernel data structure that stores a complete snapshot of a process — PID, state, program counter, saved registers, memory-management pointers, open file table, and accounting data.

Why a program becomes a process

Think of a recipe (program) versus a cook actually making the dish (process). The recipe can be photocopied a thousand times; a hundred cooks can run the same recipe simultaneously. Each cook has their own workspace, their own state (which step they are at), and their own set of pots. In the same way, a single .exe or binary on disk can be launched any number of times, each launch creating an independent process with a completely separate memory image and execution state.

The five-state process model

The OS tracks every process through five states:

[New] ──→ [Ready] ⇌ [Running] ──→ [Terminated]
             ↑          │
             │          ↓
          [Waiting / Blocked]

New: the process is being created — resources are being allocated, the PCB is initialised.
Ready: the process is loaded into memory and waiting for the CPU. The OS keeps a ready queue of these.
Running: the process owns a CPU core and is executing instructions. On a single-core machine, only one process is ever in this state at a time.
Waiting / Blocked: the process issued an I/O request (read a disk file, wait for keyboard input, open a socket) or is waiting for some other event, and cannot proceed until it completes. It vacates the CPU voluntarily.
Terminated: execution finished (or the process was killed). Resources are being reclaimed.

Key transitions (memorise the direction and trigger):

Transition	Cause
New → Ready	OS finishes loading the process
Ready → Running	Scheduler dispatches the process (gives it CPU)
Running → Ready	Time slice expires, or a higher-priority process arrives (preemption)
Running → Waiting	Process requests I/O or waits for an event
Waiting → Ready	I/O completes or event occurs
Running → Terminated	Process calls `exit()` or is killed
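One way to make the transition table concrete is to encode it as a lookup of legal (from, to) pairs, so that an illegal move such as Waiting → Running is rejected. This is a minimal Python sketch; the `State` enum, `TRANSITIONS` table and `move` helper are illustrative names, not part of any real kernel.

```python
from enum import Enum

class State(Enum):
    NEW = "New"
    READY = "Ready"
    RUNNING = "Running"
    WAITING = "Waiting"
    TERMINATED = "Terminated"

# Each legal (from, to) pair mapped to the event that triggers it.
TRANSITIONS = {
    (State.NEW, State.READY): "OS finishes loading the process",
    (State.READY, State.RUNNING): "scheduler dispatch",
    (State.RUNNING, State.READY): "time slice expires (preemption)",
    (State.RUNNING, State.WAITING): "I/O request or event wait",
    (State.WAITING, State.READY): "I/O completes or event occurs",
    (State.RUNNING, State.TERMINATED): "exit() or killed",
}

def move(current: State, target: State) -> State:
    """Return the new state, or raise if the five-state model forbids the move."""
    if (current, target) not in TRANSITIONS:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```

A process whose I/O has finished must re-enter the ready queue and be dispatched again, so `move(State.WAITING, State.RUNNING)` raises `ValueError`.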

Why it matters: These transitions are drawn in GATE diagrams and appear as MCQs ("after a time-slice expires, a process moves from __ to __"). Getting the direction right is the whole question.

The Process Control Block in depth

The PCB is the kernel's "resume card" for a process. Its fields:

PID (Process ID): unique integer identifier.
Process state: one of the five states above.
Program counter (PC): address of the next instruction to execute.
CPU registers: all general-purpose registers, stack pointer, base pointer — saved exactly so execution can resume as if it never stopped.
CPU scheduling info: priority value, pointer to scheduling queue.
Memory management info: page tables or segment descriptors that describe the process's address space.
I/O status: list of open files and devices, pending I/O requests.
Accounting data: CPU time consumed, wall-clock time, number of context switches.
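The PCB fields can be sketched as a Python dataclass. The class and field names here are hypothetical and chosen to mirror the list; a real kernel structure (Linux's `struct task_struct`, for example) holds far more, and uses C types rather than Python containers.

```python
from dataclasses import dataclass, field

@dataclass
class PCB:
    """Illustrative PCB; field names are hypothetical, not those of a real kernel."""
    pid: int                                                   # unique process ID
    state: str = "New"                                         # one of the five states
    program_counter: int = 0                                   # address of next instruction
    registers: dict[str, int] = field(default_factory=dict)   # saved GPRs, SP, BP
    priority: int = 0                                          # CPU scheduling info
    page_table: dict[int, int] = field(default_factory=dict)  # virtual page -> frame
    open_files: list[str] = field(default_factory=list)       # I/O status
    cpu_time_used: int = 0                                     # accounting data
    context_switches: int = 0                                  # accounting data
```

`field(default_factory=...)` gives every PCB its own register dictionary and file list, just as each process gets its own saved state.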

Real-world example: Open Linux's /proc/<PID>/status or Windows Task Manager's details tab — every row you see (PID, state, virtual memory, open handles) maps directly to a field in the PCB.

Context switching — the cost of multitasking

A context switch is the act of saving one process's state into its PCB and loading another process's state from its PCB so the CPU can run the second process. During the switch itself, no user code runs — it is pure overhead.

Steps in a context switch:

The running process is interrupted (timer interrupt or voluntary yield).
The OS saves the process's PC, registers, and flags into its PCB.
The OS selects the next process from the ready queue (scheduling decision).
The OS loads that process's PC, registers, and flags from its PCB.
On architectures with virtual memory, the OS flushes or selectively invalidates the TLB (Translation Lookaside Buffer), because the new process has different virtual-to-physical address mappings.
Pipeline state and branch-predictor state may reset, adding a few extra cycles.
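The save and restore steps listed above can be simulated with a toy CPU. This sketch assumes a CPU whose only state is a program counter and a register dictionary; the `CPU`, `PCB` and `context_switch` names are hypothetical, the scheduling decision is left to the caller, and TLB and pipeline effects appear only as a comment.

```python
from dataclasses import dataclass, field

@dataclass
class PCB:
    pid: int
    state: str = "Ready"
    pc: int = 0
    registers: dict[str, int] = field(default_factory=dict)

class CPU:
    """Toy CPU whose only state is a program counter and a register file."""
    def __init__(self) -> None:
        self.pc = 0
        self.registers: dict[str, int] = {}

def context_switch(cpu: CPU, old: PCB, new: PCB) -> None:
    # Save the outgoing process's CPU state into its PCB.
    old.pc = cpu.pc
    old.registers = dict(cpu.registers)  # copy, so later CPU writes don't alter the snapshot
    old.state = "Ready"                  # assumes preemption; an I/O request would set "Waiting"
    # Restore the incoming process's CPU state from its PCB.
    cpu.pc = new.pc
    cpu.registers = dict(new.registers)
    new.state = "Running"
    # A real kernel would also switch page tables here and flush or retag the TLB.
```

Switching from P1 to P2 and back restores P1's program counter and registers exactly, which is the "resume as if it never stopped" property the PCB exists to provide.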

Modern OSes do this hundreds to thousands of times per second. Each switch costs roughly 1–10 microseconds, so when switches are too frequent (for example, because the time quantum is very short) the overhead eats noticeably into throughput.

Common misconception: Students say "context switch creates a new process." No — a context switch only swaps which process is on the CPU. No new process is created.

Unix process creation: fork, exec, wait

Unix and Linux create processes with three system calls:

fork() — creates a child process that is an almost-exact copy of the parent:

Returns 0 inside the child.
Returns the child's PID inside the parent.
Returns −1 if it fails (out of memory, process-table full).

Both processes continue from the point where fork() returns, which is why the return value is the only way they can tell each other apart. Internally, modern OSes use copy-on-write (COW): instead of copying memory, the parent's and child's pages are marked read-only and point to the same physical frames; only when one of them writes to a page does the kernel copy that page. This makes fork() cheap even for large processes.

exec() family — replaces the current process's memory image with a new program loaded from a file. It does not create a new process; the same PID continues, but now running entirely new code. The fork() + exec() pattern is "spawn a new program."

wait() / waitpid() — lets the parent block until a child finishes, collecting the child's exit status and letting the kernel clean up the child's PCB.
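In Python, the same three calls are exposed by the standard `os` module (`os.fork`, `os.execvp`, `os.waitpid`), so the spawn-a-program pattern can be sketched as below. It is POSIX-only and assumes Python 3.9+ for `os.waitstatus_to_exitcode`; `run_program` is a hypothetical helper name. One difference from C: on failure `os.fork` raises `OSError` instead of returning −1.

```python
import os
import sys

def run_program(argv: list[str]) -> int:
    """Spawn argv[0] with the fork() + exec() + wait() pattern; return its exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: fork() returned 0. Replace this process image with the new program.
        try:
            os.execvp(argv[0], argv)  # does not return on success; the PID stays the same
        finally:
            os._exit(127)             # only reached if exec failed
    # Parent: fork() returned the child's PID. Block until the child terminates.
    _, status = os.waitpid(pid, 0)    # reaps the child, so no zombie is left behind
    return os.waitstatus_to_exitcode(status)
```

For example, `run_program([sys.executable, "-c", "raise SystemExit(3)"])` returns 3, and a command that cannot be found returns 127, the exit status this sketch chooses for a failed exec.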

Zombie process: a child that has finished (called exit()) but whose parent has not yet called wait(). The process is dead, but its PCB entry remains so the parent can collect the exit status. If many zombies accumulate, the process table fills up.

Orphan process: a child whose parent exits first. The OS reassigns it to init (PID 1, or systemd) as its new parent, and init reaps it with wait() when it eventually terminates.
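On Linux you can watch a zombie appear by forking a child that exits immediately and reading its state from `/proc/<PID>/stat` before the parent calls wait(). This is a Linux-specific sketch (`observe_zombie` is a hypothetical name); the state letter `Z` is how the kernel reports a zombie in that file.

```python
import os
import time

def observe_zombie(timeout: float = 2.0) -> str:
    """Fork a child that exits at once, read its state before reaping it, then reap it.
    Returns the last state letter seen ('Z' = zombie). Linux-only."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)                  # child terminates immediately
    state = "?"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with open(f"/proc/{pid}/stat") as f:
            # Format: "pid (comm) STATE ..."; split after the last ')' in case comm has spaces.
            state = f.read().rsplit(")", 1)[1].split()[0]
        if state == "Z":
            break
        time.sleep(0.01)             # child may not have exited yet; poll again
    os.waitpid(pid, 0)               # parent finally calls wait(): the zombie is reaped
    return state
```

After `os.waitpid` returns, the `/proc/<PID>` entry disappears, because the PCB has been freed.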

Threads versus processes

Definition: A thread is a lightweight unit of execution within a process. All threads in a process share the same code, heap, and global data, but each has its own stack and register set.

:::compare Threads vs Processes

Feature	Process	Thread
Address space	Separate for each process	Shared within the same process
Communication	IPC mechanisms (slower)	Direct shared memory (fast)
Creation cost	High — fork + page-table copy	Low — just a new stack + TCB
Context-switch overhead	Higher — TLB flush needed	Lower — same address space
Crash impact	Isolated — one crash can't corrupt another process	Thread crash can take the whole process down
Use case	Separate applications, strong isolation	Parallelism within one app (e.g., web server threads)
:::
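The address-space row of the table can be demonstrated directly: a write made by a thread is visible to the rest of the process, while a write made by a forked child lands in the child's own (copy-on-write) memory. A POSIX-only Python sketch; both function names are hypothetical.

```python
import os
import threading

def thread_sees_write() -> list[int]:
    data = [0]
    def worker() -> None:
        data[0] = 42                 # writes the SAME list object the caller holds
    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return data                      # [42]: threads share one address space

def child_process_sees_write() -> list[int]:
    data = [0]
    pid = os.fork()
    if pid == 0:
        data[0] = 42                 # modifies the child's private copy of the page
        os._exit(0)
    os.waitpid(pid, 0)
    return data                      # [0]: the parent's memory is untouched
```

This is also why threads can communicate through plain variables while processes need an IPC mechanism.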

User-level threads are managed by a library in user space (e.g., early Java green threads); the kernel is unaware of them, so switching is fast, but a blocking system call blocks all threads of the process. Kernel-level threads are known to the OS: one can block while others run, but creating and switching them costs more. Modern OSes (Linux, Windows) use a 1:1 model: each user thread maps to exactly one kernel thread (on Linux, the standard POSIX pthreads implementation, NPTL, works this way).

Inter-Process Communication (IPC)

Separate processes need to share data or coordinate. IPC mechanisms:

Shared memory: processes map the same physical region into their address spaces. It is the fastest IPC (no kernel copy), but requires explicit synchronisation (semaphores or mutexes) to avoid race conditions.
Pipes: a unidirectional byte stream between related processes (typically a parent and its child, or two children of the same parent). The shell's | operator connects two sibling processes with a pipe.
Named pipes (FIFOs): like pipes, but accessible by any two processes that know the name (a path in the file system).
Message queues: a kernel-maintained list of typed messages; any process with permission can send or receive.
Sockets: work across machines via TCP/IP, not just within one OS.
Signals: asynchronous notifications. SIGKILL (9) forcibly terminates a process and cannot be caught or ignored; SIGTERM (15) asks it to terminate gracefully and can be handled; SIGINT (2) is sent when you press Ctrl+C; SIGSEGV (11) is delivered on an invalid memory access (segmentation fault).
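A pipe between a parent and child can be sketched with `os.pipe` and `os.fork`. The parent closes its unused write end so that `os.read` returns an empty bytes object (end of file) once the child has finished writing. POSIX-only; `pipe_roundtrip` is a hypothetical name.

```python
import os

def pipe_roundtrip(message: bytes) -> bytes:
    """Child writes `message` into a pipe; parent reads it back (POSIX-only)."""
    read_fd, write_fd = os.pipe()        # data flows one way: write_fd -> read_fd
    pid = os.fork()
    if pid == 0:                         # child: the writer
        os.close(read_fd)                # close the end it does not use
        view = memoryview(message)
        while view:                      # os.write may write fewer bytes than asked
            written = os.write(write_fd, view)
            view = view[written:]
        os.close(write_fd)
        os._exit(0)
    os.close(write_fd)                   # parent: the reader; without this, EOF never arrives
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:                    # b"" means every write end is closed (EOF)
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)                   # reap the child
    return b"".join(chunks)
```

Messages larger than the kernel's pipe buffer still arrive intact: the child simply blocks in `os.write` until the parent has read enough to make room.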

Synchronisation and critical sections

When multiple processes or threads access shared mutable data, the result can depend on the exact timing of interleaved instructions — a race condition. The segment of code that reads or modifies the shared resource is the critical section. Mutual exclusion (only one thread in the critical section at a time) is enforced by semaphores, mutexes, and monitors — detailed in the synchronisation lesson.
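A mutex around the critical section can be sketched with Python's `threading.Lock`. Without the lock, `counter += 1` is a separate load, add and store, so two threads can interleave and lose updates; whether that actually shows up in CPython depends on the interpreter version and timing, so the sketch only checks the locked version. `locked_increments` is a hypothetical name.

```python
import threading

def locked_increments(n_threads: int = 4, n_iters: int = 10_000) -> int:
    counter = 0
    lock = threading.Lock()              # the mutex guarding the critical section

    def worker() -> None:
        nonlocal counter
        for _ in range(n_iters):
            with lock:                   # entry: acquire (blocks while another thread holds it)
                counter += 1             # critical section: read-modify-write of shared data
            # exit: the lock is released on leaving the with-block

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

With the lock held around every increment, the result is always exactly `n_threads * n_iters`.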

Key scheduling metrics

Turnaround time = completion time − arrival time
Waiting time = turnaround time − CPU burst time
Response time = time from arrival to first CPU allocation

Worked example:

Question: Two processes arrive at t = 0. P1 has burst 5 ms, P2 has burst 3 ms. FCFS order P1 then P2. Calculate waiting times.

Solution:

Step 1: P1 starts at 0, finishes at 5 ms. Waiting time = 5 − 0 − 5 = 0 ms.
Step 2: P2 starts at 5, finishes at 8 ms. Waiting time = 8 − 0 − 3 = 5 ms.
Conclusion: Average waiting time = (0 + 5) / 2 = 2.5 ms.
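The three formulas translate directly into a helper. This sketch assumes a purely CPU-bound job (no I/O bursts), so waiting time is just turnaround minus burst; `metrics` is a hypothetical name.

```python
def metrics(arrival: int, burst: int, completion: int, first_run: int) -> dict[str, int]:
    """Scheduling metrics for one CPU-bound process (all times in the same unit)."""
    turnaround = completion - arrival
    return {
        "turnaround": turnaround,
        "waiting": turnaround - burst,   # time spent in the ready queue
        "response": first_run - arrival, # delay until the first CPU allocation
    }
```

With the worked example's numbers, `metrics(0, 5, 5, 0)` gives P1 a waiting time of 0 and `metrics(0, 3, 8, 5)` gives P2 a waiting time of 5, for the same 2.5 ms average.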

:::keypoints Key points

A process is a program in execution; the program on disk is passive, the process is active.
Five states: New, Ready, Running, Waiting/Blocked, Terminated — with well-defined transitions.
The PCB stores the complete snapshot needed to pause and resume a process exactly.
Context switch saves/restores PCBs and may flush the TLB; it is pure overhead, not useful work.
fork() returns 0 to the child and the child's PID to the parent; exec() replaces the image.
Threads share address space (faster IPC) but a crash takes down the whole process.
IPC mechanisms: shared memory, pipes, FIFOs, message queues, sockets, signals.
Turnaround = completion − arrival; waiting = turnaround − burst.
:::

:::memory
"New Ready Running Waiting Terminated" → "No Red Rover Will Travel" — five states in order. For fork return values: Child gets 0, Parent gets PID, Error gets -1 → "CPE": Child-Parent-Error.
:::

:::recap

The OS tracks every process through five named states using a PCB per process.
Context switches enable multitasking but carry a cost; minimising unnecessary switches improves throughput.
fork + exec + wait is Unix's three-syscall recipe for spawning a program.
Zombie = dead child, parent hasn't called wait; Orphan = living child, parent is dead, adopted by init.
Threads are cheaper than processes but less isolated; 1:1 mapping dominates modern OSes.
Critical sections require mutual exclusion to prevent race conditions on shared data.
:::

CPU scheduling algorithms — when to use which

Notes

Every modern operating system must decide which process gets to run next — the wrong choice can waste milliseconds that add up to a sluggish system, while the right choice keeps users happy and CPUs busy. CPU scheduling algorithms are the set of rules the OS uses to make that decision, and understanding why each one was invented tells you exactly when to use it.

Definition: CPU scheduling is the mechanism by which the OS selects, from the ready queue, which process will be loaded onto the CPU next.

Definition: Preemptive scheduling means the OS can forcibly remove a running process from the CPU before it finishes; non-preemptive means a process keeps the CPU until it voluntarily releases it (I/O wait or termination).

Definition: Burst time is the amount of CPU time a process needs to complete its current CPU phase.

First-Come-First-Served (FCFS)

FCFS is the simplest possible policy: treat the ready queue like a queue at a bus stop — whoever arrived first boards first. It is strictly non-preemptive.

Why it matters: FCFS is easy to implement and fair in the sense that no process is permanently ignored, but its performance degrades badly when a long job arrives first.

The convoy effect is the defining weakness: imagine a 100-ms process at the head of the queue, followed by five 1-ms processes. Each short process waits at least 100 ms, pushing the average waiting time sharply up.

Worked example — FCFS

Question: Three processes arrive at time 0: P1 (burst 10 ms), P2 (burst 5 ms), P3 (burst 8 ms). Calculate average waiting time under FCFS.

Solution:
Step 1: FCFS order is P1 → P2 → P3.
Step 2: P1 waits 0 ms; P2 waits 10 ms; P3 waits 10 + 5 = 15 ms.
Conclusion: Average waiting time = (0 + 10 + 15) / 3 = 8.33 ms.
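FCFS reduces to sorting by arrival and running each job to completion. A minimal sketch, assuming each process is a `(name, arrival, burst)` tuple and that ties in arrival keep input order; `fcfs` is a hypothetical name.

```python
def fcfs(procs: list[tuple[str, int, int]]) -> dict[str, int]:
    """procs: (name, arrival, burst). Returns each process's waiting time under FCFS."""
    time, waits = 0, {}
    for name, arrival, burst in sorted(procs, key=lambda p: p[1]):  # stable sort
        time = max(time, arrival)        # CPU may sit idle until the next arrival
        waits[name] = time - arrival     # non-preemptive: waiting = start - arrival
        time += burst
    return waits
```

`fcfs([("P1", 0, 10), ("P2", 0, 5), ("P3", 0, 8)])` returns waits of 0, 10 and 15, matching the 8.33 ms average. Putting a 100-ms job ahead of five 1-ms jobs gives waits of 0, 100, 101, 102, 103 and 104, an average of 85 ms: the convoy effect in numbers.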

Shortest Job First (SJF)

SJF picks the process with the smallest burst time from the ready queue. It is non-preemptive by default.

Why it matters: when all jobs are available at the same time, SJF is provably optimal for minimising average waiting time among non-preemptive algorithms; no other ordering of the same burst times does better.

Common misconception: Many students think "shortest" means shortest total program length. It means shortest next CPU burst, not total job length.

Disadvantage — starvation: A continuous stream of short jobs can permanently block a long one. The fix is aging — incrementally raising the effective priority of a process the longer it waits, until it eventually wins the CPU.

Worked example — SJF (compare with FCFS)

Question: Same three processes: P1 (burst 10), P2 (burst 5), P3 (burst 8). All arrive at time 0.

Solution:
Step 1: SJF orders by burst: P2 (5) → P3 (8) → P1 (10).
Step 2: P2 waits 0 ms; P3 waits 5 ms; P1 waits 5 + 8 = 13 ms.
Conclusion: Average = (0 + 5 + 13) / 3 = 6.0 ms — better than FCFS's 8.33 ms.
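Non-preemptive SJF picks, at each decision point, the ready job with the shortest next burst and lets it run to completion. In this sketch the burst lengths are given as input (a real scheduler could only estimate them), ties go to the earlier arrival, and `sjf` is a hypothetical name.

```python
def sjf(procs: list[tuple[str, int, int]]) -> dict[str, int]:
    """Non-preemptive SJF. procs: (name, arrival, burst). Returns waiting time per process."""
    pending = list(procs)
    time, waits = 0, {}
    while pending:
        ready = [p for p in pending if p[1] <= time]
        if not ready:                    # CPU idle: jump to the next arrival
            time = min(p[1] for p in pending)
            continue
        job = min(ready, key=lambda p: (p[2], p[1]))  # shortest burst, then earliest arrival
        name, arrival, burst = job
        pending.remove(job)
        waits[name] = time - arrival
        time += burst                    # runs to completion: no preemption
    return waits
```

On the same three processes, `sjf` returns waits of 0 for P2, 5 for P3 and 13 for P1, reproducing the 6.0 ms average.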

Real-world example: A hospital triage that always treats the fastest-to-treat patients first would maximise throughput but could leave a complex case waiting indefinitely — the classic starvation trade-off.

Shortest Remaining Time First (SRTF)

SRTF is the preemptive version of SJF: whenever a new process arrives in the ready queue, if its burst time is less than the remaining burst of the currently running process, the OS preempts the current one.

SRTF gives the minimum possible average waiting time over all scheduling algorithms, preemptive or not (assuming context switches are free), but it requires knowing each process's remaining burst time in advance. Real systems can only estimate bursts from past behaviour, so SRTF serves mainly as a theoretical benchmark.
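SRTF can be simulated one time unit at a time, re-choosing the job with the least remaining time on every tick so that a newly arrived shorter job preempts the current one. This sketch assumes positive integer arrival and burst times, known bursts, and free context switches; `srtf` is a hypothetical name, and the four-job input below is an invented example.

```python
def srtf(procs: list[tuple[str, int, int]]) -> dict[str, int]:
    """Preemptive SJF (SRTF), simulated one time unit at a time.
    procs: (name, arrival, burst) with burst > 0. Returns each process's waiting time."""
    arrival = {name: arr for name, arr, _ in procs}
    burst = {name: b for name, _, b in procs}
    remaining = dict(burst)
    completion = {}
    time = 0
    while len(completion) < len(procs):
        ready = [n for n in remaining if arrival[n] <= time and remaining[n] > 0]
        if not ready:
            time += 1                    # CPU idle for this tick
            continue
        # Re-decided every tick, so a newly arrived shorter job preempts the running one.
        current = min(ready, key=lambda n: (remaining[n], arrival[n]))
        remaining[current] -= 1
        time += 1
        if remaining[current] == 0:
            completion[current] = time
    return {n: completion[n] - arrival[n] - burst[n] for n in burst}
```

For jobs (P1, 0, 8), (P2, 1, 4), (P3, 2, 9), (P4, 3, 5), P2 preempts P1 at t = 1 and the waits come out as 9, 0, 15 and 2, an average of 6.5.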

Round Robin (RR)

Round Robin assigns each process a fixed time quantum (also called time slice, typically 10–100 ms), then cycles through the ready queue, giving each process one quantum in turn. It is inherently preemptive.

Why it matters: RR is the standard algorithm for time-sharing systems where responsiveness matters more than raw throughput (desktops, web servers).

Quantum size is critical:

Too small → context switches dominate; CPU wastes time saving/restoring registers.
Too large → degrades toward FCFS; processes feel unresponsive.
Rule of thumb: ~80% of CPU bursts should finish within one quantum.

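Real-world example: Linux's CFS (Completely Fair Scheduler, the default from kernel 2.6.23 until EEVDF replaced it in 6.6) is better described as weighted fair sharing than as strict Round Robin. Each runnable task receives CPU time in proportion to a weight derived from its nice value (a lower nice value means a larger weight), and the task with the smallest weighted CPU time so far (its virtual runtime) runs next.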

Priority Scheduling

Each process is assigned a numeric priority. The CPU always goes to the highest-priority ready process. Can be preemptive (new high-priority process can interrupt) or non-preemptive.

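Problem: starvation of low-priority processes. Fix: aging. For every fixed interval a process spends waiting, its effective priority is raised by one level (under a lower-number-is-higher convention, its priority number is decreased by 1). Eventually even the lowest-priority process accumulates enough seniority to run.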

Common misconception: "Higher number = higher priority" is OS-dependent. In many Unix systems, lower numbers mean higher priority (0 = most urgent). Always check the convention in exam questions.
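Aging can be sketched as an effective priority that improves with time spent waiting. This version is non-preemptive, uses the lower-number-is-more-urgent convention, and raises a waiting process by one level for every `aging_step` time units; the `Proc` class, `priority_order` function and `aging_step` parameter are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Proc:
    name: str
    priority: int   # base priority; LOWER number = MORE urgent (Unix-style convention)
    arrival: int
    burst: int

def priority_order(procs: list[Proc], aging_step: Optional[int] = None) -> list[str]:
    """Non-preemptive priority scheduling; returns the order in which processes run.
    With aging_step set, a waiting process gains one level per aging_step time units."""
    pending = list(procs)
    time, order = 0, []
    while pending:
        ready = [p for p in pending if p.arrival <= time]
        if not ready:                    # CPU idle: jump to the next arrival
            time = min(p.arrival for p in pending)
            continue

        def effective(p: Proc) -> tuple[int, int]:
            boost = (time - p.arrival) // aging_step if aging_step else 0
            return (p.priority - boost, p.arrival)   # ties go to the earlier arrival

        job = min(ready, key=effective)
        pending.remove(job)
        order.append(job.name)
        time += job.burst                # non-preemptive: the job runs its whole burst
    return order
```

For example, with `Proc("L", 5, 0, 2)` competing against ten jobs `Proc(f"H{i}", 1, 2 * i, 2)`, L runs last without aging but fifth with `aging_step=2`: its effective priority falls to 1 by t = 8 and it wins the tie on earlier arrival.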

Multilevel Queue Scheduling

The ready queue is split into multiple queues with different scheduling algorithms per queue (e.g., system processes use RR with small quantum; interactive processes use RR with larger quantum; batch jobs use FCFS). Processes are permanently assigned to a queue at birth.

The queues themselves are scheduled by fixed priority — the batch queue does not get CPU time while any interactive process is ready.

Multilevel Feedback Queue (MLFQ)

The most general and most commonly used scheme in real OSes. Like multilevel queue, but processes can move between queues based on their behaviour:

A CPU-intensive process that consistently uses its full quantum gets demoted to a lower-priority queue.
A process that often does I/O (interactive behaviour) gets promoted back up.

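This means the OS automatically identifies and rewards interactive processes without requiring programmers to label them. Windows and macOS schedulers apply this kind of feedback through dynamic priority boosts and decay, and Linux's older O(1) scheduler did likewise; Linux's later schedulers (CFS, then EEVDF) use a fair-share design instead.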
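A stripped-down MLFQ can be sketched with one FIFO queue per level: the scheduler always serves the highest non-empty level, and a job that uses its full quantum without finishing is demoted one level. This sketch assumes all jobs arrive at t = 0, models only demotion (no I/O-driven promotion or periodic priority boost), and uses illustrative quanta of 1, 2 and 4; `mlfq` is a hypothetical name.

```python
from collections import deque

def mlfq(bursts: dict[str, int],
         quanta: tuple[int, ...] = (1, 2, 4)) -> list[tuple[str, int, int, int]]:
    """Simplified MLFQ for CPU-bound jobs that all arrive at t = 0.
    Returns the timeline as (name, start, end, level) slices."""
    queues = [deque() for _ in quanta]   # queues[0] is the highest priority
    remaining = dict(bursts)
    for name in bursts:
        queues[0].append(name)           # every job starts at the top level
    time, timeline = 0, []
    while any(queues):
        level = next(i for i, q in enumerate(queues) if q)  # highest non-empty level
        name = queues[level].popleft()
        run = min(quanta[level], remaining[name])
        timeline.append((name, time, time + run, level))
        time += run
        remaining[name] -= run
        if remaining[name] > 0:
            # Used its whole quantum without finishing: demote (the bottom level round-robins).
            queues[min(level + 1, len(quanta) - 1)].append(name)
    return timeline
```

With bursts `{"A": 1, "B": 5}`, A finishes within its first quantum at level 0, while CPU-bound B sinks through levels 0, 1 and 2 and finishes at t = 6.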

Comparison at a Glance

:::compare CPU Scheduling Algorithms

Algorithm	Preemptive?	Optimises	Starvation Risk?	Best Use Case
FCFS	No	Nothing specific	No	Simple batch systems
SJF	No	Avg waiting time	Yes (long jobs)	Batch with known burst times
SRTF	Yes	Avg waiting time (global optimum)	Yes (long jobs)	Theoretical benchmark
Round Robin	Yes	Response time, fairness	No	Time-sharing / interactive
Priority	Either	Custom criteria	Yes (low priority)	Real-time systems
Multilevel Queue	Either	Different goals per tier	Possible	Multi-class workloads
MLFQ	Yes	Adaptive fairness	Rare (with aging)	General-purpose modern OSes
:::

Gantt Charts and Calculating Metrics

Exam questions often ask for average waiting time or average turnaround time. The method is always:

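Waiting time = start time − arrival time for non-preemptive algorithms. If a process can be preempted and re-queued, use waiting time = turnaround time − burst time instead, which counts every period spent in the ready queue.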
Turnaround time = completion time − arrival time.
Response time = first time the process gets CPU − arrival time.

Draw a Gantt chart (a timeline showing which process runs when) before computing any metric. It prevents arithmetic mistakes.

Worked example — Round Robin with quantum = 2

Question: P1 arrives at 0 (burst 4), P2 arrives at 0 (burst 3), P3 arrives at 0 (burst 5). Quantum = 2.

Solution:
Step 1: Gantt chart: P1(0-2), P2(2-4), P3(4-6), P1(6-8), P2(8-9), P3(9-11), P3(11-12).
Step 2: Because every process arrives at 0, turnaround = completion time, and waiting = turnaround − burst. P1 finishes at 8: waiting = 8 − 4 = 4 ms. P2 finishes at 9: waiting = 9 − 3 = 6 ms. P3 finishes at 12: waiting = 12 − 5 = 7 ms.
Conclusion: Average wait = (4+6+7)/3 = 5.67 ms.
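The Gantt chart above can be reproduced by a small Round Robin simulator. The sketch assumes integer times and adopts one common convention: a process that arrives during a time slice joins the queue ahead of the process that was just preempted. `round_robin` is a hypothetical name.

```python
from collections import deque

def round_robin(procs: list[tuple[str, int, int]], quantum: int):
    """procs: (name, arrival, burst). Returns (gantt, waiting): gantt is a list of
    (name, start, end) slices; waiting maps each name to its total waiting time."""
    procs = sorted(procs, key=lambda p: p[1])
    remaining = {name: burst for name, _, burst in procs}
    queue = deque()
    gantt, completion = [], {}
    time, nxt = 0, 0                     # nxt = index of the next process not yet admitted

    def admit(up_to: int) -> None:
        nonlocal nxt
        while nxt < len(procs) and procs[nxt][1] <= up_to:
            queue.append(procs[nxt][0])
            nxt += 1

    admit(time)
    while queue or nxt < len(procs):
        if not queue:                    # CPU idle: jump to the next arrival
            time = procs[nxt][1]
            admit(time)
            continue
        name = queue.popleft()
        run = min(quantum, remaining[name])
        gantt.append((name, time, time + run))
        time += run
        remaining[name] -= run
        admit(time)                      # assumption: arrivals queue ahead of the preempted job
        if remaining[name] > 0:
            queue.append(name)           # preempted: back to the tail of the ready queue
        else:
            completion[name] = time
    waiting = {name: completion[name] - arrival - burst for name, arrival, burst in procs}
    return gantt, waiting
```

`round_robin([("P1", 0, 4), ("P2", 0, 3), ("P3", 0, 5)], 2)` yields the slices P1(0-2), P2(2-4), P3(4-6), P1(6-8), P2(8-9), P3(9-11), P3(11-12) and waits of 4, 6 and 7 ms. Re-running with a very large quantum is an easy way to see Round Robin degrade into FCFS.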

:::keypoints Key points

FCFS is simple but suffers from the convoy effect; it matches SJF only when jobs happen to arrive shortest-first.
SJF minimises average waiting time, but real systems can only estimate future burst times.
SRTF is the preemptive SJF and is the global theoretical optimum.
Round Robin trades throughput for fairness and responsiveness; quantum size is the key tuning parameter.
Priority scheduling risks starvation; aging is the standard cure.
Multilevel Feedback Queue ideas underpin many real-world schedulers in modern OSes.
Draw a Gantt chart before computing any scheduling metric.
Preemptive algorithms can interrupt; non-preemptive cannot.
:::

:::memory
"Fast Students Score Rapidly; Practice Makes Fluent" → FCFS, SJF, SRTF, Round Robin, Priority, Multilevel queue, Multilevel Feedback queue (the seven algorithms in order).
:::

:::recap

FCFS is FIFO; worst for average waiting time when jobs differ widely in length.
SJF and SRTF are optimal but require knowing burst times.
Round Robin is the go-to for interactive/time-sharing systems.
Priority scheduling needs aging to prevent indefinite starvation.
Many modern OS schedulers use MLFQ-style feedback for automatic adaptation.
Exam technique: always draw a Gantt chart to compute waiting and turnaround times accurately.
:::

Process Concepts and States

Process vs Program and the PCB

Notes

A program is a passive entity (a file on disk); a process is an active entity with a program counter, registers, a stack, and resources.

The Process Control Block (PCB) stores per-process information: PID, process state, program counter, CPU registers, scheduling info (priority), memory-management info (page/segment tables), accounting info, and I/O status (open files). Memory aid: 'PC RAMS-IO' (PID, Counter, Registers, Accounting, Memory, State, IO). The PCB is saved and restored during a context switch.

A process address space has four regions: Text (code), Data (globals), Heap (dynamic, grows up), and Stack (grows down). Remember 'TDHS' in order from low addresses to high.

Context-switch overhead is pure scheduling cost; no useful work is done during it.

Five-State Process Model

Summary

Two coders attack the same Dynamic Programming problem. One writes a recursive function with a cache; the other fills a 2-D array in nested loops. Both finish with the same time complexity — yet under the hood they made very different trade-offs. Knowing which style to choose, and why, is the heart of GATE-level DP.

Definition: Memoization (top-down DP). Ordinary recursion plus a lookup table (memo[state]). The first time a sub-problem is solved its answer is cached; every later call returns the cached value in O(1). It is lazy — only sub-problems actually reached are computed.

Definition: Tabulation (bottom-up DP). An iterative algorithm that fills a DP table in dependency order, smallest sub-problem first, until the final answer sits at the last cell. It is eager — every entry in the table is computed.

Same Complexity, Different Constants

Both styles share the universal DP shortcut:

Time = (number of distinct states) × (work per transition).

For Fibonacci, the state is a single integer i from 0 to n, so there are n+1 states; each transition does O(1) work, giving O(n) overall. The same formula applies whether you write fib(n) recursively with a cache, or set dp[0] = 0, dp[1] = 1 and run for i in 2..n: dp[i] = dp[i-1] + dp[i-2].
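Both styles for Fibonacci, side by side in Python. `functools.lru_cache` supplies the memo table for the top-down version; the bottom-up version fills `dp` from the base cases upward. The function names are illustrative.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib_memo(n: int) -> int:
    """Top-down: plain recursion plus a cache; only states reached from n are computed."""
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)

def fib_tab(n: int) -> int:
    """Bottom-up: fill dp[0..n] in dependency order (n + 1 states, O(1) work each)."""
    if n < 2:
        return n
    dp = [0] * (n + 1)
    dp[1] = 1
    for i in range(2, n + 1):
        dp[i] = dp[i - 1] + dp[i - 2]
    return dp[n]
```

Both return 832040 for n = 30. In CPython, calling `fib_memo(5000)` on an empty cache exceeds the default recursion limit of 1000 and raises `RecursionError`, whereas `fib_tab(5000)` simply loops.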

But constants matter, especially in GATE numerical questions and in interviews.

Top-down wins when many states are unreachable from the answer state. Classic example: a path-counting problem on a sparse DAG, where only a small subset of (i, j) pairs is ever explored.
Bottom-up wins when every state is needed anyway, because it avoids function-call overhead and a deep recursion stack.

Stack Depth, Cache Locality and Constant Factors

Recursive memoization carries a hidden tax: the call stack. For an input that needs recursion depth n, the runtime keeps n stack frames alive at once, and deep inputs can overflow the stack (CPython, for example, raises RecursionError once the depth exceeds its default limit of 1000; in compiled languages the limit depends on the thread's stack size). Tabulation has no such limit. A second hidden tax is cache locality: bottom-up loops walk memory sequentially and benefit from CPU prefetching, while a recursive scheme may jump unpredictably through the memo table.

In return, top-down DP is far easier to derive. You write the natural recurrence, slap on @lru_cache, and you are done. That makes it ideal for problems where the recurrence is non-obvious but the state space is small — many game theory, interval and tree DPs fall in this camp.

Space Optimization — A Bottom-Up Super-Power

Once you compute bottom-up, you can often see that row i only depends on row i-1. So instead of storing the full 2-D table, keep just two rows (or even one) and overwrite. This shrinks space from O(n²) to O(n), sometimes the difference between Accepted and Memory Limit Exceeded. The 0/1 Knapsack, Longest Common Subsequence and Edit Distance all admit this optimization. Top-down DP cannot do this naturally: the recursion, not you, decides the evaluation order, so there is no point at which a whole row of the memo table is known to be safe to discard.
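The two-row trick for Longest Common Subsequence: row i of the table depends only on row i − 1 and on earlier cells of row i, so two lists of length len(b) + 1 suffice. A minimal sketch that returns only the length (recovering the subsequence itself would need more of the table); `lcs_length` is a hypothetical name.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence using two rolling rows: O(len(b)) space."""
    prev = [0] * (len(b) + 1)            # row i-1 of the full (len(a)+1) x (len(b)+1) table
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)        # row i, built only from prev and curr
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr                      # row i-1 is no longer needed
    return prev[-1]
```

`lcs_length("ABCBDAB", "BDCABA")` returns 4, while storing only two rows of seven cells instead of the full 8 × 7 table.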

Worked example — Fibonacci, three ways:

Question: How many states and transitions does fib(n) have, and what does space optimization give you?
Solution:
Step 1: States — one per value of i, so n+1 states.
Step 2: Transition — fib(i) = fib(i-1) + fib(i-2), which is O(1) work.
Step 3: Time = states × work = O(n) × O(1) = O(n).
Step 4: Bottom-up needs an array of size n+1 → O(n) space; with rolling variables a, b, you reduce to O(1) space.
Conclusion: Same O(n) time for every style, but only tabulation gives O(1) space via the two-variable trick.
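The two-variable version from Step 4, as a sketch: `a` and `b` hold fib(i) and fib(i + 1), and each iteration slides that window forward by one state. The O(1) space figure treats each integer as constant size, as the analysis above does; Python's integers actually grow with n. `fib_rolling` is a hypothetical name.

```python
def fib_rolling(n: int) -> int:
    """Bottom-up Fibonacci keeping only the last two values: O(n) time, O(1) extra space."""
    a, b = 0, 1              # a = fib(i), b = fib(i + 1), starting at i = 0
    for _ in range(n):
        a, b = b, a + b      # slide the window forward by one state
    return a
```

The first ten values, `[fib_rolling(i) for i in range(10)]`, are `[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]`.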

Why it matters

GATE Computer Science routinely asks: "What is the time/space complexity of the given DP solution?" The correct answer almost always falls out of counting states × transitions. Memoization-vs-tabulation also appears in the Programming and Data Structures section as MCQs about stack overflow, recursion depth and iterative rewrites — and in interviews at every product company that recruits GATE rankers.

Real-world example

As an illustration, consider a railway fare system such as Indian Railways', which must answer fare lookups across thousands of stations. If virtually every station pair is queried daily, building the fare matrix once at start-up (tabulation) is cheaper than computing fares lazily at every booking request (memoization): the lazy version ends up filling the same table while also paying recursion and cache-lookup overhead. That is a textbook case where bottom-up wins.

Common misconception

"Memoization and tabulation are different time complexities." False. Whenever both visit the same set of states, they have the same big-O time. The differences are in constants, space, and which states are computed. Another myth: "Top-down is always slower because of recursion." Not true when many states are pruned — top-down can be asymptotically the same yet do strictly less work in practice.

:::compare

Feature	Memoization (Top-Down)	Tabulation (Bottom-Up)
Direction	Recurse from answer to base	Iterate from base to answer
Computes	Only reached states (lazy)	Every state (eager)
Recursion stack	Yes — risk of overflow	None
Code style	Natural — mirrors recurrence	Requires ordering of states
Space optimization	Hard	Easy (rolling rows/variables)
Best when	State space is sparse	All states needed; deep n
Constant factor	Higher (function calls)	Lower (tight loops)
:::

:::keypoints

Time = states × transition cost — the universal DP shortcut.
Memoization is recursion + cache; tabulation is iteration + table.
Both share the same asymptotic time when they visit the same states.
Top-down can skip unreachable states; bottom-up touches them all.
Bottom-up avoids stack overflow and enables space optimization (O(n²) → O(n) → O(1)).
Fibonacci: n+1 states, O(1) transition → O(n) time, optimizable to O(1) space.
For sparse state spaces or hard-to-order recurrences, prefer memoization.
For deep n, tight time limits, or memory pressure, prefer tabulation.
:::

:::memory
"TOP is Lazy, BOTTOM is Tidy."
Top-down: Lookup-on-demand, recursion.
Bottom-up: Table fully built, iteration, can be space-squeezed.
For complexity, just chant: States × Transition.
:::

:::recap

Memoization = recursion + memo table; tabulation = iterative table fill.
Both have time = (distinct sub-problems) × (work per transition).
Top-down avoids unreached states; bottom-up avoids the call stack and allows space tricks.
For Fibonacci: n+1 states × O(1) transition = O(n) time; O(1) space with two variables.
:::

Context Switch Mechanics

Worked example

A context switch saves the current process state into its PCB and loads the next process's state. Triggered by: interrupts, system calls causing blocking, or preemption (timer). Mode switch (user to kernel) is cheaper than a full context switch and does not necessarily change the running process. Example: if a context switch costs 2 ms and a time quantum is 8 ms, then for every 8 ms of useful work, 2 ms is overhead, giving CPU efficiency = 8/(8+2) = 80%. Smaller quanta increase responsiveness but raise context-switch overhead; larger quanta reduce overhead but approach FCFS behaviour. Saving/restoring registers is hardware-assisted on many architectures.
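The efficiency formula from this example, q / (q + s) for quantum q and switch cost s, as a one-line helper. It assumes every process uses its whole quantum before being switched out; `cpu_efficiency` is a hypothetical name.

```python
def cpu_efficiency(quantum: float, switch_cost: float) -> float:
    """Fraction of CPU time doing useful work when each quantum is followed by one switch.
    Assumes every process uses its full quantum (no early blocking)."""
    return quantum / (quantum + switch_cost)
```

With s = 2 ms, quanta of 1, 2, 8 and 16 ms give efficiencies of about 33%, 50%, 80% and 89%, which is the responsiveness-versus-overhead trade-off in numbers.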

Process synchronization

Critical section, Peterson, semaphores.

Processes — states, PCB, context switch, IPC, threads vs processes

Notes

When you run a program, the file on disk does nothing by itself — the operating system must breathe life into it as a process, manage its journey through states, and orchestrate dozens of such processes apparently simultaneously on a single CPU. Understanding how processes live, switch, communicate and compare to threads is core OS material and a GATE staple.

Definition: A process is a program in execution — an active entity with its own address space, state, registers and open resources. A program is the passive instruction file sitting on disk; the same program binary can spawn many independent processes simultaneously.

The process in memory

A running process occupies four logical regions:

Region	Contents
Text (code) segment	Machine instructions; typically read-only
Data segment	Global and static variables; two parts: initialised data (.data) and uninitialised data (BSS)
Heap	Dynamically allocated memory (malloc/new); grows upward
Stack	Function call frames, local variables, return addresses; grows downward

Process states — the 5-state model

[New] → [Ready] ⇌ [Running] → [Terminated]
           ↑          │
           │          ↓
        [Waiting / Blocked]

New: being created; resources are being allocated.
Ready: in memory, waiting in the ready queue for CPU time.
Running: the CPU is executing its instructions (on a single-core system, exactly one process is Running at any instant).
Waiting (Blocked): paused, waiting for an external event (I/O completion, signal, resource availability).
Terminated: execution complete; PCB not yet freed (zombie state until parent reads exit status).

State transitions:

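Ready → Running: the short-term (CPU) scheduler dispatches the process. (The long-term scheduler admits New processes into Ready; the medium-term scheduler swaps processes out of and back into memory.)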
Running → Ready: time-slice expires (preemption), or higher-priority process arrives.
Running → Waiting: process requests I/O or waits for a resource.
Waiting → Ready: the awaited event (e.g., I/O completion) occurs.
Running → Terminated: process calls exit() or is killed.

Process Control Block (PCB)

Definition: The PCB (also called Task Control Block) is the kernel data structure that represents a process — the snapshot the OS needs to pause and resume it.

A PCB contains:

Field	Purpose
Process ID (PID)	Unique identifier
Process state	New / Ready / Running / Waiting / Terminated
Program counter (PC)	Address of the next instruction to execute
CPU registers	All general-purpose, index and stack pointer values
CPU scheduling info	Priority, pointers to scheduling queues
Memory management info	Page/segment tables, base and limit registers
I/O status info	List of open files, pending I/O requests
Accounting info	CPU time used, wall-clock time, process number

PCBs are stored in a process table in kernel memory and chained into ready/wait queues.

Context switch — the cost of multitasking

A context switch is the mechanism by which the CPU switches from one process to another: the OS saves the currently running process's CPU state into its PCB, then loads the saved state of the next process from its PCB.

Why it is pure overhead: While the OS is saving and loading PCBs, no user process is making progress. On a modern system running hundreds of processes, context switches happen thousands of times per second; each switch costs:

Saving the current PCB.
Loading the next PCB.
Flushing or selectively invalidating the TLB (Translation Lookaside Buffer) — if the new process has a different address space, cached virtual-to-physical mappings are stale.
Resetting pipeline and branch-predictor state on some architectures.

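Typical duration: a few microseconds. On hardware with tagged TLBs (e.g., ASIDs on ARM, PCIDs on x86-64), entries from different address spaces can coexist, so the full TLB flush can be avoided on a process switch, reducing cost. Switching between threads of the same process never needs a TLB flush, since the address space does not change.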

Process creation in Unix/Linux

fork(): creates a child process that is a copy-on-write (COW) clone of the parent.

Returns 0 in the child, child's PID in the parent, and −1 on error.
The child inherits open file descriptors, signal handlers, environment variables, and the program counter (it continues from the instruction after fork()).
Modern implementations use COW: both processes share the same physical pages until one writes, at which point a private copy is made — so fork is cheap even for large processes.

exec() family (execve, execl, execvp, …): replaces the calling process's memory image with a new program. It does not create a new process — the PID remains the same; only the code, data and stack change.

Typical pattern: fork() to create a child, then exec() in the child to load a new program (e.g., how a shell runs a command).

wait() / waitpid(): the parent calls wait() to block until a child terminates and to reap its exit status from the PCB.
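The fork → exec → wait pattern a shell uses can be sketched with Python's `os` wrappers, assuming a Unix-like system with `sh` and `true` on the PATH. `run_command` is a hypothetical helper; exit status 127 imitates the shell's convention for a command that could not be executed.

```python
import os


def run_command(argv):
    """fork() a child, exec() argv in it, then wait() for it; return its exit status."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the requested program (same PID).
        try:
            os.execvp(argv[0], argv)
        finally:
            # Reached only if execvp failed: 127 mirrors the shell's "command not found".
            os._exit(127)
    # Parent: block until this specific child terminates, then reap it (no zombie left).
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -1   # hypothetical convention for "child was killed by a signal"
```

For example, `run_command(["sh", "-c", "exit 3"])` returns 3: the exit status travels from the child's PCB to the parent through `waitpid`.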

Zombie process: a terminated child whose exit status has not yet been wait()-ed by the parent. Its PCB remains in the process table — consuming a small amount of memory — until reaped.

Orphan process: a child whose parent terminates first. Linux automatically re-parents orphans to PID 1 (init / systemd), or to the nearest living ancestor marked as a child subreaper, and that new parent reaps them with wait() when they exit.

Threads vs processes — why threads exist

Threads ("lightweight processes") were introduced so that related tasks within one application could share memory efficiently without the overhead of full process creation.

:::compare Threads vs Processes

Feature	Process	Thread
Address space	Entirely separate	Shared within the process
Memory	Own heap, stack, code	Own stack only; shared heap and code
Communication	IPC (pipes, sockets, shared memory — slower, complex)	Direct via shared variables (fast, needs synchronisation)
Creation cost	High — fork+exec, new address space	Much lower — new stack + TCB within existing address space
Context switch cost	Slower — TLB flush if different address space	Faster — no address-space change
Crash isolation	Crash of one process does not affect others	A thread crash (e.g., segfault, stack overflow) typically kills the entire process
Scheduling	OS schedules independently	Kernel threads: OS-scheduled; user threads: library-scheduled
:::
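One quick way to see the "Address space" row of the table, sketched in Python on a Unix-like OS (the function name is hypothetical): a write made by a thread is visible to the rest of the process, while the same write made by a forked child lands in the child's private copy.

```python
import os
import threading


def thread_vs_process_write():
    """Increment one heap object from a thread, then from a forked child."""
    data = {"value": 0}            # a heap object owned by this process

    def bump():
        data["value"] += 1

    worker = threading.Thread(target=bump)
    worker.start()
    worker.join()
    after_thread = data["value"]   # 1: the thread wrote into our address space

    pid = os.fork()
    if pid == 0:
        try:
            bump()                 # child increments its own copy-on-write copy
        finally:
            os._exit(0)
    os.waitpid(pid, 0)
    after_process = data["value"]  # still 1: the child's write stayed in the child
    return after_thread, after_process
```

The result is `(1, 1)`: the thread's increment counts, the child's does not.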

User-level threads (ULT): managed by a user-space threading library (e.g., early Java green threads, GNU Portable Threads). Context switch is very fast, but if one ULT makes a blocking system call, the kernel blocks the entire process and all its threads stall.

Kernel-level threads (KLT): the kernel knows about each thread and can schedule them independently. One thread can block on I/O while the others continue. They are slightly more expensive to create and switch. All mainstream modern operating systems (Linux, Windows, macOS) use KLTs.

Threading models:

M:1 (many-to-one): all ULTs mapped to one KLT. Fast switch; blocks whole process.
1:1 (one-to-one): one KLT per ULT. Most common (Linux pthreads, Windows threads). Best concurrency; small overhead per thread.
M:N (many-to-many): M ULTs multiplexed onto N ≤ M KLTs. Flexible but complex to implement (Solaris before version 9 used a two-level variant; the Go runtime schedules goroutines this way).

Inter-Process Communication (IPC)

Separate processes have separate address spaces; to exchange data, they need IPC mechanisms:

1. Shared memory (fastest): the kernel maps a common physical memory region into both processes' virtual address spaces. Data is passed by writing to this region. Requires synchronisation (mutexes, semaphores) to prevent race conditions. shmget() / shmat() in System V; mmap() with MAP_SHARED in POSIX.

2. Message passing (cleaner isolation):

Pipes: unidirectional, anonymous; kernel buffer; typically parent-child.
Named pipes (FIFOs): like pipes but accessible by any processes via a file-system path.
Message queues: messages stored in the kernel; processes send/receive; allows priority ordering.
Sockets: full-duplex; can cross machine boundaries (network sockets) or stay local (Unix domain sockets); used by browsers, databases, etc.

3. Signals: asynchronous notifications sent to a process. Examples: SIGKILL (uncatchable terminate), SIGTERM (polite terminate), SIGINT (Ctrl+C), SIGSEGV (segmentation fault), SIGCHLD (child terminated). A process installs a signal handler or leaves the default action.

4. Files: simplest form of IPC; persistent; used for logging and configuration sharing.
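The first two IPC styles, and the one hard rule about signals, can be sketched with Python's `mmap`, `os` and `signal` modules, which wrap the POSIX calls named above. Assumptions: a Unix-like OS, short messages, a caller running in the main thread (Python only installs signal handlers there), and hypothetical helper names (`shared_memory_ipc`, `pipe_ipc`, `handler_allowed`). `mmap.mmap(-1, ...)` with `MAP_SHARED` plays the role of an anonymous shared region inherited across `fork()`.

```python
import mmap
import os
import signal
import struct


def shared_memory_ipc(value: int) -> int:
    """Child writes an int into a MAP_SHARED region; parent reads it directly."""
    region = mmap.mmap(-1, mmap.PAGESIZE, flags=mmap.MAP_SHARED)  # anonymous shared page
    pid = os.fork()
    if pid == 0:
        try:
            struct.pack_into("i", region, 0, value)  # plain memory write, no system call
        finally:
            os._exit(0)
    os.waitpid(pid, 0)          # waiting for the child doubles as our only synchronisation
    (result,) = struct.unpack_from("i", region, 0)
    region.close()
    return result


def pipe_ipc(message: bytes) -> bytes:
    """Child sends bytes through a pipe; parent receives until end-of-file."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(read_fd)
            os.write(write_fd, message)   # the kernel copies the bytes into its pipe buffer
        finally:
            os._exit(0)
    os.close(write_fd)                    # otherwise our own write end would prevent EOF
    chunks = []
    while chunk := os.read(read_fd, 4096):
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return b"".join(chunks)


def handler_allowed(sig) -> bool:
    """Can a process install its own handler for `sig`?"""
    try:
        previous = signal.signal(sig, lambda signum, frame: None)
    except OSError:                       # the kernel rejects handlers for SIGKILL/SIGSTOP
        return False
    signal.signal(sig, previous)          # restore the original disposition
    return True
```

Note the trade-off the text describes: the shared-memory child writes with an ordinary store, while the pipe child's bytes are copied through the kernel. `handler_allowed(signal.SIGTERM)` is true, `handler_allowed(signal.SIGKILL)` is false, which is what "uncatchable" means in practice.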

Synchronisation issues (overview)

When multiple processes or threads access shared data without coordination:

Race condition: the outcome depends on the exact interleaving of instructions — non-deterministic, hard to debug.
Critical section: the code region accessing shared data; must have mutual exclusion (only one thread at a time), progress (not blocked when no one is in), and bounded waiting (not starved forever).
Solutions: mutexes, semaphores, monitors, condition variables — detailed in the OS Synchronisation lesson.
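A race is easiest to see when `x += 1` is split into the LOAD, ADD and STORE steps a CPU actually performs. The following deterministic Python simulation (hypothetical helper `run_interleaving`, not real threads) replays a chosen interleaving of two threads' micro-steps.

```python
def run_interleaving(schedule):
    """Simulate two threads each doing `x += 1` as LOAD, ADD, STORE micro-steps.

    `schedule` lists which thread (0 or 1) executes its next micro-step.
    """
    x = 0                 # the shared variable
    reg = [0, 0]          # each thread's private CPU register
    step = [0, 0]         # next micro-step per thread: 0 = LOAD, 1 = ADD, 2 = STORE
    for t in schedule:
        if step[t] == 0:
            reg[t] = x            # LOAD  x into the register
        elif step[t] == 1:
            reg[t] += 1           # ADD   1 in the register
        elif step[t] == 2:
            x = reg[t]            # STORE the register back to x
        else:
            raise ValueError(f"thread {t} has already finished")
        step[t] += 1
    return x
```

Running thread 0 to completion first (`[0, 0, 0, 1, 1, 1]`) gives 2; alternating (`[0, 1, 0, 1, 0, 1]`) gives 1. Only the ordering changed, yet one increment was lost, which is exactly the non-determinism a critical section must rule out.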

Key scheduling metrics (GATE numericals)

Turnaround time = completion time − arrival time.
Waiting time = turnaround time − burst (CPU) time.
Response time = first time on CPU − arrival time.
CPU utilisation = useful CPU time / total time × 100%.
Throughput = number of processes completed per unit time.

Worked example

Question: Three processes: P1 (arrival 0, burst 5), P2 (arrival 1, burst 3), P3 (arrival 2, burst 2). FCFS scheduling. Find average waiting time.

Solution:

Step 1: FCFS order of execution: P1 at time 0–5, P2 at 5–8, P3 at 8–10.

Step 2: Waiting times — P1: 0 − 0 = 0; P2: 5 − 1 = 4; P3: 8 − 2 = 6.

Step 3: Average waiting time = (0 + 4 + 6) / 3 = 10/3 ≈ 3.33 time units.

Conclusion: Under FCFS the average wait is about 3.33 time units. Non-preemptive SJF would run P3 (burst 2) before P2 once P1 finishes, giving waits of 0, 6 and 3 and an average of 3, which illustrates why scheduling algorithm choice matters.
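The same numbers can be reproduced with a short simulator. It is a sketch under this example's assumptions (integer times, one CPU burst per process, no context-switch cost); `non_preemptive` is a hypothetical name, and SJF ties are broken by arrival time as a convention.

```python
def non_preemptive(procs, policy):
    """Simulate non-preemptive FCFS or SJF.

    procs: list of (name, arrival, burst). Returns {name: waiting_time}.
    """
    if policy not in ("fcfs", "sjf"):
        raise ValueError("policy must be 'fcfs' or 'sjf'")
    pending = sorted(procs, key=lambda p: p[1])   # stable sort by arrival time
    clock, waits = 0, {}
    while pending:
        arrived = [p for p in pending if p[1] <= clock]
        if not arrived:                  # CPU idles until the next arrival
            clock = pending[0][1]
            continue
        if policy == "fcfs":
            job = arrived[0]             # earliest arrival
        else:
            job = min(arrived, key=lambda p: (p[2], p[1]))  # shortest burst; ties -> FCFS
        name, arrival, burst = job
        waits[name] = clock - arrival    # non-preemptive: WT = start time - arrival time
        clock += burst                   # runs to completion without interruption
        pending.remove(job)
    return waits
```

With P1(0, 5), P2(1, 3), P3(2, 2), FCFS yields waits 0, 4, 6 (average 10/3) and SJF yields 0, 6, 3 for P1, P2, P3 (average 3).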

Why it matters: These concepts explain how a single CPU appears to run dozens of applications simultaneously, why a crashed browser tab can (or cannot) take down others, and how scheduling metrics are computed for GATE numerical questions.

Real-world example: Open several browser tabs. Chrome deliberately runs tabs (more precisely, sites) in separate renderer processes, which share code libraries but have separate heaps and stacks. When a tab's renderer crashes, the damage is isolated: the main browser process notices the dead child, shows an error page in that tab, and the other tabs continue. Within a single renderer process, several threads (for example, the main thread running JavaScript and layout, and a compositor thread) share memory and work simultaneously, which is exactly the threads-within-process model this lesson describes.

Common misconception: Many students believe fork() returns the same value to both parent and child. It does not — fork() returns 0 to the child and the child's PID to the parent (−1 on any error). This asymmetric return value is the standard Unix idiom for a single function that creates two execution paths, and it is a direct GATE question.

:::keypoints Key points

A process is active; a program is the passive file. Processes have code, data, heap and stack regions.
The PCB stores PID, PC, registers, state, memory maps — everything needed to resume a paused process.
A context switch is pure overhead; it may flush the TLB, discarding cached address translations.
fork() returns 0 to child, child PID to parent — memorise this exact return value.
Threads share heap/code (fast communication) but a crash kills the entire process.
1:1 threading model (one KLT per ULT) is used by Linux and Windows.
Turnaround = completion − arrival; Waiting = turnaround − burst; Response = first dispatch − arrival.
:::
:::memory
"fork = zero to child, PID to parent" — write it, say it, never forget it.
:::
:::recap
Processes move New → Ready → Running → Terminated, with detours Running → Waiting → Ready (I/O or event wait) and Running → Ready (preemption); every arrow is testable.
PCB and context switch are the mechanism behind apparent multitasking.
fork/exec/wait is the Unix process lifecycle; zombie and orphan are common trick questions.
Threads trade isolation for speed; 1:1 kernel threading is the modern default.
Scheduling metrics (TAT, WT, RT) are GATE numericals — practise the FCFS/SJF/RR computation pattern.
:::

CPU Scheduling Algorithms

Scheduling Criteria and Formulas

Formulas

Key metrics: Turnaround Time (TAT) = Completion Time - Arrival Time; Waiting Time (WT) = TAT - Burst Time; Response Time = first CPU allocation time - Arrival Time. Average WT and average TAT are computed over all processes. Throughput = processes completed per unit time; CPU Utilization should be maximized. Memory aid: 'TAT = WT + BT' and 'WT = TAT - BT'. For non-preemptive algorithms (one CPU burst per process), response time equals waiting time, because a process that has started never waits again. Goal: minimize WT, TAT, and response time; maximize throughput and utilization. Note arrival times carefully: idle CPU time before the first arrival is not counted as waiting for any process.

FCFS, SJF, SRTF, Priority, Round Robin

Notes

FCFS: non-preemptive, by arrival order; suffers the convoy effect (short jobs wait behind long ones). SJF: non-preemptive, picks the shortest next burst; provably gives the minimum average waiting time among non-preemptive schedules when all jobs are available together. SRTF: preemptive SJF; gives the minimum average WT when preemption is allowed, but can starve long jobs. Priority: highest priority first (preemptive or non-preemptive); starvation is fixed by aging. Round Robin (RR): preemptive FCFS with time quantum q; as q grows very large, RR degenerates to FCFS; as q → 0, context-switch overhead dominates. RR gives good response time and is fair. Memory aid: 'SJF/SRTF minimize average waiting time'. Both SJF and SRTF need burst-length knowledge, which in practice is estimated via exponential averaging.
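Round Robin is where GATE numericals most often go wrong, usually in queue order. The sketch below assumes a common textbook convention: a process arriving during a time slice joins the ready queue before the preempted process rejoins it. It also includes the exponential-averaging predictor used to estimate the next CPU burst for SJF/SRTF, τ(n+1) = α·t(n) + (1 − α)·τ(n). Both function names are hypothetical.

```python
from collections import deque


def round_robin(procs, quantum):
    """procs: list of (name, arrival, burst). Returns {name: (completion, TAT, WT)}."""
    procs = sorted(procs, key=lambda p: p[1])
    arrival = {name: arr for name, arr, _ in procs}
    burst = {name: b for name, _, b in procs}
    remaining = dict(burst)
    ready, clock, nxt, result = deque(), 0, 0, {}

    def admit_arrivals():
        nonlocal nxt
        while nxt < len(procs) and procs[nxt][1] <= clock:
            ready.append(procs[nxt][0])
            nxt += 1

    while len(result) < len(procs):
        admit_arrivals()
        if not ready:                       # CPU idle until the next arrival
            clock = procs[nxt][1]
            continue
        name = ready.popleft()
        run = min(quantum, remaining[name])
        clock += run
        remaining[name] -= run
        admit_arrivals()                    # arrivals during this slice queue first...
        if remaining[name] > 0:
            ready.append(name)              # ...then the preempted process rejoins the tail
        else:
            tat = clock - arrival[name]
            result[name] = (clock, tat, tat - burst[name])
    return result


def predict_next_burst(actual_bursts, tau0, alpha):
    """Exponential averaging: tau(n+1) = alpha * t(n) + (1 - alpha) * tau(n)."""
    predictions = [tau0]
    for t in actual_bursts:
        predictions.append(alpha * t + (1 - alpha) * predictions[-1])
    return predictions
```

For the FCFS worked example's processes (bursts 5, 3, 2) with q = 2, completion times come out as 10, 9 and 6 for P1, P2, P3, giving waiting times 5, 5 and 2; with a huge quantum the waits collapse to the FCFS values 0, 4 and 6. With τ0 = 10 and α = 0.5, actual bursts 6, 4, 6, 4, 13, 13, 13 produce predictions 10, 8, 6, 6, 5, 9, 11, 12.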

SRTF Worked Example

Worked example

Processes (Arrival, Burst): P1(0,8), P2(1,4), P3(2,9), P4(3,5). SRTF (preemptive): At t=0 run P1 (rem 8). At t=1, P2 arrives with burst 4 < P1's remaining 7, so P1 is preempted and P2 runs. At t=2, P3(9) vs P2 rem(3): P2 continues. At t=3, P4(5) vs P2 rem(2): P2 continues and finishes at t=5. Then choose the smallest remaining time among P1(7), P3(9), P4(5): P4 runs to t=10. Then P1(7) runs to t=17, then P3(9) to t=26. Completion: P1=17, P2=5, P3=26, P4=10. TAT = C - A: P1=17, P2=4, P3=24, P4=7. WT = TAT - BT: P1=9, P2=0, P3=15, P4=2. Average WT = (9+0+15+2)/4 = 6.5.
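The trace can be checked with a unit-time simulator, assuming integer arrival and burst times and zero context-switch cost; re-deciding every time unit is what makes it preemptive. `srtf` is a hypothetical helper, and ties go to the earlier arrival.

```python
def srtf(procs):
    """Unit-time simulation of SRTF (preemptive SJF).

    procs: list of (name, arrival, burst) with integer times.
    Returns {name: (completion, TAT, WT)}.
    """
    arrival = {name: a for name, a, _ in procs}
    burst = {name: b for name, _, b in procs}
    remaining = dict(burst)
    clock, result = 0, {}
    while remaining:
        ready = [n for n in remaining if arrival[n] <= clock]
        if not ready:
            clock += 1                   # CPU idle for one time unit
            continue
        # Re-decide every time unit: shortest remaining time wins, ties -> earlier arrival.
        current = min(ready, key=lambda n: (remaining[n], arrival[n]))
        remaining[current] -= 1
        clock += 1
        if remaining[current] == 0:
            del remaining[current]
            tat = clock - arrival[current]
            result[current] = (clock, tat, tat - burst[current])
    return result
```

Feeding it the four processes above reproduces completion times 17, 5, 26, 10 and an average waiting time of 6.5.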

Threads and Concurrency

Threads: Shared vs Private Resources

Notes

A thread is a lightweight unit of execution within a process. Threads of the same process SHARE: code/text, data (globals), heap, open files and signal handlers. Each thread has PRIVATE: program counter, registers, and its own stack. Memory aid: 'Stack and Registers are Selfish; Code, Data, Heap, Files are Shared' (SR-private, CDHF-shared). Thread creation and context switching are cheaper than process creation because the address space is shared (switching between threads of the same process needs no page-table switch and therefore no TLB flush). Benefits: responsiveness, resource sharing, economy, scalability on multiprocessors. Risk: lack of isolation; one thread's bad pointer can corrupt the whole process.
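A small Python sketch of the shared-versus-private split (the function name is hypothetical): every thread appends to one list object, but each keeps its own `local_count`. Each Python thread has its own call stack of frames, so local variables are private to the thread that created them, while objects reachable from several threads are shared and still need a lock.

```python
import threading


def shared_heap_private_locals(num_threads=4, steps=1000):
    shared_log = []                      # one list object, visible to every thread
    lock = threading.Lock()              # shared data still needs synchronisation
    local_totals = [0] * num_threads

    def worker(idx):
        local_count = 0                  # lives in this thread's own call stack
        for _ in range(steps):
            local_count += 1
            with lock:
                shared_log.append(idx)
        local_totals[idx] = local_count  # publish the private result explicitly

    threads = [threading.Thread(target=worker, args=(k,)) for k in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(shared_log), local_totals
```

The shared list ends up with entries from all threads (4000 here), while each private counter only ever reaches its own thread's 1000.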

Multithreading Models and Thread Libraries

Summary

Modern processors give you multiple cores, but only the kernel decides which thread actually runs on which core. So how user-level threads are mapped to kernel-level threads decides whether your program is truly parallel, whether one blocking I/O call freezes everything, and how heavy the per-thread cost is. GATE loves this topic because a single line of code (pthread_create) behaves very differently under each model.

Definition: A user-level thread (ULT) is a thread managed entirely by a library in user space; the kernel is unaware of it.

Definition: A kernel-level thread (KLT) is a thread the kernel itself schedules; it is the unit the OS scheduler sees.

Definition: A multithreading model is the rule that maps many user threads onto some number of kernel threads.

The three classical models

There are exactly three mapping models you must memorise: Many-to-One, One-to-One, and Many-to-Many. Each is a trade-off between speed of thread management and true concurrency / parallelism.

Many-to-One

Many user threads are mapped to one kernel thread. The thread library (in user space) does its own scheduling among the user threads, but to the kernel the whole process looks like a single schedulable entity.

The good: thread creation, context switch, and destruction are all done in user space — extremely cheap, no system call needed.

The bad: if any user thread makes a blocking system call (say, a disk read), the entire process is blocked because the kernel only sees one thread. And on a multi-core CPU, only one user thread can run at any instant — no true parallelism, ever. Classic implementations: early Solaris green threads, GNU Portable Threads.

One-to-One

Every user thread maps to its own kernel thread. Now the kernel sees and schedules every thread independently, so blocking one does not block others, and the OS can place different threads on different cores for genuine parallelism.

The cost: every thread creation requires a system call and a kernel data structure, which is much heavier. There is also usually a system-imposed cap on the number of kernel threads per process.

This is the model used by Linux (NPTL) and Windows. So when you write pthread_create on a typical Linux machine today, you are using a One-to-One implementation.

Many-to-Many

m user threads multiplex onto n kernel threads, with n typically less than or equal to m and tuneable. The library can keep many lightweight user threads while ensuring there are always some kernel threads available so that a blocked thread does not stall the others.

This combines the flexibility of user threads with the parallelism of kernel threads. It is conceptually elegant but harder to implement, which is why most mainstream OSes have moved to One-to-One. A close variant is the two-level model, where most user threads are many-to-many but a few are "bound" to a dedicated kernel thread.

Why thread libraries are not models

Definition: A thread library is the API a programmer uses to create and manage threads.

The three libraries you should know are POSIX Pthreads, Windows threads, and Java threads. A library is a specification; the model it uses depends on the implementation. POSIX Pthreads on Linux is One-to-One; an older Pthreads implementation on a different OS could have been Many-to-One. Java threads on the JVM usually delegate to the underlying OS, so on Linux/Windows they too are One-to-One.

Why it matters: an exam question that says "POSIX Pthreads is which model?" is a trap — the correct answer is that Pthreads is a specification, not a model. The model depends on the implementation.

Real-world example

Consider a typical Indian web service like an IRCTC ticket booking backend running on Linux. When a request arrives, a worker thread (a Pthread, which on Linux is a One-to-One kernel thread) does a database call. The thread blocks on I/O — but because the model is One-to-One, the other worker threads continue serving other passengers on other cores. If IRCTC were running on Many-to-One, every passenger's request would queue behind whichever one was waiting on the database. That is why production servers have used kernel threads for two decades.

Common misconception

A very common error is to say "user-level threads are always faster." They are faster to create and switch, but if any one of them blocks on I/O, the entire process stalls — making throughput much worse. So user-level threads are fast only for CPU-bound, cooperative workloads where blocking is rare. For real workloads with disk and network I/O, kernel-level threads win.

Another misconception: students think Many-to-Many is the most popular model. In practice, the simpler One-to-One model won because modern OS schedulers and stacks are cheap enough that the overhead is acceptable.

Worked example

Question: A program has 4 user-level threads under a Many-to-One model on a quad-core machine. One thread issues a blocking read() system call. What happens to the other three threads?
Solution:
Step 1: Recall that under Many-to-One the kernel sees only one schedulable entity for the entire process.
Step 2: The blocking system call traps to the kernel. The kernel marks the (single) kernel thread as blocked.
Step 3: With the only kernel thread blocked, no user thread of this process can run: the user-level scheduler is itself code running on that same kernel thread, so it cannot regain the CPU to switch to another user thread.
Conclusion: All four user threads are effectively blocked, even though three of them have no reason to wait. This is the canonical "Many-to-One blocks all" failure.
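The worked example can be imitated in Python under two stated assumptions: generator functions stand in for user-level threads driven by a user-space round-robin scheduler (many-to-one, since everything runs on one kernel thread), and `threading.Thread` stands in for one-to-one, because CPython backs each such thread with its own OS thread. `time.sleep` plays the blocking system call; all function names are hypothetical.

```python
import threading
import time


def many_to_one(block_s):
    """Generator 'user threads' scheduled round-robin on ONE kernel thread."""
    start, finished = time.monotonic(), {}

    def user_thread(name, blocks):
        if blocks:
            time.sleep(block_s)       # blocking call: the one kernel thread sleeps
        yield                         # voluntary switch point for the library scheduler
        finished[name] = time.monotonic() - start

    ready = [user_thread("A", True), user_thread("B", False), user_thread("C", False)]
    while ready:                      # the user-space library's round-robin scheduler
        ult = ready.pop(0)
        try:
            next(ult)
            ready.append(ult)
        except StopIteration:
            pass
    return finished


def one_to_one(block_s):
    """CPython threading.Thread objects are each backed by their own kernel thread."""
    start, finished = time.monotonic(), {}

    def kernel_thread(name, blocks):
        if blocks:
            time.sleep(block_s)       # only this kernel thread blocks
        finished[name] = time.monotonic() - start

    threads = [threading.Thread(target=kernel_thread, args=(n, n == "A")) for n in "ABC"]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return finished
```

In the many-to-one run, B and C finish only after A's sleep ends even though they never block; in the one-to-one run they finish almost immediately while A is still asleep.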

:::compare

Model	True parallelism on multicore?	One blocking call blocks all?	Cost of create/switch
Many-to-One	No	Yes	Very low (user space)
One-to-One	Yes	No	High (kernel call per thread)
Many-to-Many	Yes (up to n cores)	No (others remap to spare KLT)	Medium
:::

:::keypoints

ULTs are managed by a library; KLTs are managed by the OS.
Many-to-One → all threads on one KLT → no real parallelism.
One-to-One → one KLT per ULT → used by Linux NPTL and Windows.
Many-to-Many → m ULTs on n KLTs → flexible but complex.
Pthreads is a specification, not a model; behaviour depends on the implementation.
A blocking system call stalls the whole KLT it runs on, not the whole machine.
Modern Linux Pthreads is One-to-One with NPTL.
:::

:::memory
"Many-to-One Blocks All." If you ever see a multithreading question and remember just this five-word sentence, you can rule out wrong answers about blocking behaviour. For parallelism, remember "One-to-One runs On every cOre."
:::

:::recap

Three models: M:1, 1:1, M:N — pick by parallelism need vs cost.
Linux (NPTL Pthreads) and Windows both use One-to-One.
ULTs are fast to manage, KLTs survive blocking calls.
A library is not a model — the implementation chooses.
:::

fork() Behaviour Example

Worked example

fork() creates a child process duplicating the parent's address space (copy-on-write). It returns the child PID to the parent and 0 to the child. Counting trick: n consecutive fork() calls create 2^n - 1 child processes (total 2^n processes including the original). Example: code with three fork() calls in sequence yields 2^3 = 8 total processes, hence 7 children. With branching/conditionals, draw the process tree. After fork(), parent and child have separate copies of variables; changes in one do not affect the other. exec() replaces the process image; wait() lets a parent block until a child terminates, reaping zombies.
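The 2^n rule can be checked directly, assuming a Unix-like OS. In this sketch (the function name is hypothetical) every process, original or descendant, writes one byte to a shared pipe after the loop, and each process reaps its own children so no zombies are left.

```python
import os


def count_processes_after_forks(n):
    """Execute n fork() calls in sequence; return how many processes ran the code after them."""
    read_fd, write_fd = os.pipe()
    original = os.getpid()
    my_children = []
    try:
        for _ in range(n):
            pid = os.fork()
            if pid == 0:
                my_children = []         # a brand-new child has no children of its own yet
            else:
                my_children.append(pid)
        os.write(write_fd, b"x")         # every process reports in exactly once
        for pid in my_children:
            os.waitpid(pid, 0)           # each process reaps its own children
    finally:
        if os.getpid() != original:
            os._exit(0)                  # descendants must never return to the caller
    os.close(write_fd)
    count = 0
    while chunk := os.read(read_fd, 4096):
        count += len(chunk)
    os.close(read_fd)
    return count
```

With n = 3 the pipe collects 8 bytes: the original process plus 7 children, matching 2^3 total.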

Interprocess Communication and Synchronization Basics

IPC Models: Shared Memory vs Message Passing

Notes

Two fundamental IPC models. Shared Memory: processes share a region of memory; fast (after setup) since no kernel involvement per access, but requires explicit synchronization to avoid race conditions. Message Passing: processes exchange messages via send/receive; easier for distributed systems, no shared variables, but slower due to kernel involvement and copying. Message passing can be blocking (synchronous) or non-blocking (asynchronous), with direct or indirect (mailbox/port) addressing. Memory aid: 'Shared memory = fast but you synchronize; Message passing = safe but slow'. Pipes, sockets, and message queues are common implementations; named pipes (FIFOs) persist and work between unrelated processes.
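Blocking versus non-blocking receive can be seen on a pipe, sketched in Python on a Unix-like OS (hypothetical function name): `os.set_blocking(fd, False)` turns a read on an empty pipe from "sleep until a message arrives" into an immediate `BlockingIOError`.

```python
import os


def receive_modes_demo():
    """Contrast a non-blocking receive on an empty pipe with one that finds a message."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(read_fd, False)      # receive() now returns at once instead of waiting
    try:
        try:
            os.read(read_fd, 100)
            empty_case = "got data"
        except BlockingIOError:
            empty_case = "would block"   # a blocking receive would have slept here
        os.write(write_fd, b"msg")       # send()
        return empty_case, os.read(read_fd, 100)
    finally:
        os.close(read_fd)
        os.close(write_fd)
```

The asynchronous receiver learns immediately that nothing is there and can do other work; a synchronous one would simply have slept in `read`.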

Race Condition and Critical Section Requirements

Summary

A race condition occurs when the outcome depends on the non-deterministic ordering of concurrent accesses to shared data. The critical section (CS) is the code segment accessing shared resources. Any correct CS solution must satisfy THREE requirements: (1) Mutual Exclusion - at most one process in the CS at a time; (2) Progress - if no process is in the CS, selection of the next entrant cannot be postponed indefinitely and only contenders decide; (3) Bounded Waiting - a bound exists on how many times others enter before a waiting process is granted entry. Memory aid: 'ME, Progress, Bounded Wait' = 'MPB'. Note: assumptions about relative process speeds or number of CPUs must NOT be made.
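In practice the entry and exit sections are usually a lock. A minimal Python sketch (hypothetical function name) using `threading.Lock`: the lock enforces mutual exclusion, and a waiting thread blocks rather than spinning. Note that `threading.Lock` makes no fairness promise, so bounded waiting is not guaranteed by this primitive alone.

```python
import threading


def locked_increments(num_threads=4, per_thread=10_000):
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(per_thread):
            with lock:           # entry section: acquire (blocks instead of spinning)
                counter += 1     # critical section: read-modify-write of shared data
            # leaving the with-block is the exit section: release

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

Because every increment happens inside the critical section, the total is always `num_threads * per_thread`.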

Peterson's Solution

Worked example

Peterson's algorithm is one of the most elegant ideas in concurrent programming and one of the most-loved testbeds in GATE Computer Science. It solves the critical section problem for two processes using nothing but ordinary memory reads and writes — no special hardware instructions like Test-and-Set or atomic Compare-and-Swap. That minimalism is exactly what makes it teachable and examable.

Definition: The critical section problem asks: given multiple processes that share a resource, how do we ensure that at most one process at a time accesses the shared data, while still guaranteeing progress and fairness?

Definition: A correct solution to the critical section problem must satisfy three properties — mutual exclusion (only one in the critical section), progress (if no one is inside, those wanting to enter must eventually be allowed in), and bounded waiting (no process is starved indefinitely).

Definition: Peterson's solution is a software-only algorithm by Gary Peterson (1981) that uses two shared variables — a boolean array flag[2] and an integer turn — to satisfy all three properties for exactly two processes.

The Algorithm in One Block

shared:
    boolean flag[2] = {false, false};   // intent to enter
    int     turn;                       // whose turn it is

process i (i = 0 or 1; let j = 1 - i):
    // ENTRY SECTION
    flag[i] = true;                 // (1) I want in
    turn    = j;                    // (2) but you go first
    while (flag[j] && turn == j)    // (3) wait only if peer also wants in AND it's their turn
        ;                           //     busy-wait

    // CRITICAL SECTION
    // ... access shared data ...

    // EXIT SECTION
    flag[i] = false;                // (4) I'm done; release

That is the whole algorithm. Three writes and a busy-wait. The cleverness is hidden in the order of lines (1) and (2) and the conjunction in line (3).
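The pseudocode translates almost line for line into Python threads. This sketch rests on an assumption about CPython rather than the language: the global interpreter lock makes each individual list load or store atomic and, in effect, sequentially consistent, which is exactly the textbook model. `time.sleep(0)` replaces the empty busy-wait body so the spinning thread hands over the interpreter instead of burning its whole time slice; all names are hypothetical.

```python
import threading
import time


def peterson_demo(iterations=100):
    """Two Python threads guard a shared counter with Peterson's entry/exit protocol."""
    flag = [False, False]   # flag[i] == True: thread i wants to enter
    turn = [0]              # one-element list so both threads share (and rebind) the value
    shared = {"count": 0, "inside": 0, "max_inside": 0}

    def process(i):
        j = 1 - i
        for _ in range(iterations):
            # ENTRY SECTION
            flag[i] = True                    # (1) I want in
            turn[0] = j                       # (2) but you go first
            while flag[j] and turn[0] == j:   # (3) wait only if peer wants in AND it's their turn
                time.sleep(0)                 #     yield the GIL instead of spinning on it
            # CRITICAL SECTION: a deliberately non-atomic read-modify-write
            shared["inside"] += 1
            shared["max_inside"] = max(shared["max_inside"], shared["inside"])
            value = shared["count"]
            time.sleep(0)                     # invite a thread switch mid-update
            shared["count"] = value + 1
            shared["inside"] -= 1
            # EXIT SECTION
            flag[i] = False                   # (4) I'm done; release

    threads = [threading.Thread(target=process, args=(i,)) for i in (0, 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["count"], shared["max_inside"]
```

If mutual exclusion ever failed, the count would fall short and `max_inside` would reach 2; with the protocol in place the result is `(2 * iterations, 1)`.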

Why the Three Properties Hold

Mutual exclusion. Suppose both P0 and P1 are inside the critical section at the same time, so flag[0] and flag[1] are both true. Look at whichever process wrote turn last; say it was P0, which wrote turn = 1 after P1 wrote turn = 0. P1 raised flag[1] before writing turn, so every time P0 evaluated its while-condition after its own write, it saw flag[1] == true and turn == 1 (nobody wrote turn afterwards). P0 would therefore still be spinning, contradicting the assumption that it is inside. The same argument applies with the roles swapped, so at most one process is inside the critical section at a time.

Progress. Suppose P0 is waiting in its while-loop. That means flag[1] && turn == 1. If P1 does not want to enter, it would have set flag[1] = false, so P0 would proceed. If P1 does want to enter, then after P1 wrote turn = 0 in its own entry section, P0's condition turn == 1 becomes false, so P0 proceeds. Either way, someone makes progress — neither process is blocked unless the other is actively in or trying.

Bounded waiting. Once P0 exits and re-enters, it again sets turn = 1 in line (2). So if P1 was waiting, P1 enters next. P0 cannot starve P1 for more than one critical-section worth of waiting. Bound = 1 turn.

The Polite-Friends Intuition

The mental model that sticks for GATE is two friends at a single doorway.

Each friend first raises a hand to signal intent (line 1).
Then each friend says "You go first" by setting turn to the other person's name (line 2).
The friend who said "you go first" last ends up waiting — because turn now holds the other name.

So whichever friend most recently deferred ends up being the one who waits. The peer enters, finishes, lowers their flag, and the waiting friend then proceeds. The asymmetry created by the last write to turn is what breaks the tie.

Why the Order of Lines (1) and (2) Matters

If we swap lines (1) and (2), i.e., set turn = j before raising our flag, a process can pass its while-check while the peer's flag is still false, and the peer, whose own turn write has since been overwritten, can then pass its check as well. Two processes would enter the critical section together. Mutual exclusion breaks. So the flag-then-turn ordering is structurally essential, not stylistic.

Similarly, the while-condition is a conjunction (&&), not a disjunction (||). Using || would deadlock both processes when both raise their flags. The conjunction lets one of them break free precisely because turn can hold only one value.

Why it matters

Peterson's solution is the cleanest illustration of three GATE-favourite ideas at once:

Software-only synchronisation is possible — you do not always need atomic hardware instructions.
All three correctness properties must be checked; satisfying mutual exclusion alone is not enough.
Memory consistency assumptions matter — and on modern hardware, naive Peterson without memory barriers can fail.

GATE setters love this last point because it bridges OS, computer architecture and compilers.

Real-world example

Imagine two Bangalore-based microservices, each running on a separate core, both writing to a shared in-memory cache entry. A pre-2010 textbook would suggest Peterson-style software locks. In modern practice, the engineers would reach for language or OS synchronisation primitives (std::atomic or std::mutex in C++, synchronized in Java, Mutex in Rust), but understanding Peterson explains why those higher-level primitives must use memory fences internally.

Common misconception

Many students believe Peterson's algorithm works perfectly on any modern multi-core CPU. It does not. Store buffers let a core perform the while-loop's loads of flag[j] and turn before its own earlier stores to flag[i] and turn are visible to the other core; x86 permits exactly this store-then-load reordering, and ARM and POWER permit even more. Each process can then read the other's flag as still false, and both enter the critical section. The fix is a full memory barrier (mfence on x86) after the entry-section writes and before the while-loop reads, or sequentially consistent atomic variables. On the textbook model assumed by GATE (sequentially consistent memory with atomic loads and stores) this concern disappears.

A second misconception: students think Peterson extends trivially to n processes. It does not: a single turn variable and a two-entry flag array are tied to exactly two processes. The n-process algorithm GATE expects is Lamport's Bakery Algorithm. Peterson-style generalisations do exist (the filter lock with n − 1 waiting levels, or a tournament tree of two-process locks), but they are more complex and rarely used in practice.

A third trap: students confuse Peterson with strict alternation (using only turn). Strict alternation guarantees mutual exclusion but violates progress — if one process never wants to enter, the other can never enter either. Likewise, using only flag[] without turn (Dekker's earlier attempt) can deadlock when both flags go up together. Peterson's two-variable design is the minimal fix.

Question: A student writes Peterson's algorithm but swaps the order of the first two entry-section statements (sets turn = j first, then flag[i] = true). Does the algorithm still guarantee mutual exclusion?

Solution:
Step 1: P0 executes turn = 1. Both flags are still false.
Step 2: P1 executes turn = 0 (overwriting P0's write), then flag[1] = true.
Step 3: P1 checks flag[0] && turn == 0. flag[0] is still false, so the condition is false and P1 enters the critical section.
Step 4: P0 now executes flag[0] = true and checks flag[1] && turn == 1. flag[1] is true, but turn is 0, so the condition is false and P0 enters as well.
Conclusion: Both processes are in the critical section at once, so mutual exclusion fails with the swapped order. Order matters.
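The interleaving argument can also be checked mechanically. This sketch (hypothetical helper) treats each process as a small state machine, evaluates the while-condition as two separate reads the way && short-circuits, and explores every interleaving of one entry attempt per process under sequentially consistent memory.

```python
from collections import deque


def both_in_cs_reachable(flag_first=True):
    """Explore every interleaving of one entry attempt per process.

    flag_first=True models the correct order (flag[i] = true; turn = j);
    flag_first=False models the swapped order (turn = j; flag[i] = true).
    Program counters: 0, 1 = the two entry writes; 2 = read flag[j];
    3 = read turn; 4 = in the critical section; 5 = finished.
    """
    order = ("flag", "turn") if flag_first else ("turn", "flag")

    def step(state, i):
        pcs, flags, turn = list(state[0]), list(state[1]), state[2]
        j = 1 - i
        pc = pcs[i]
        if pc in (0, 1):
            if order[pc] == "flag":
                flags[i] = True
            else:
                turn = j
            pcs[i] = pc + 1
        elif pc == 2:                 # left operand of &&: flag[j]
            pcs[i] = 3 if flags[j] else 4
        elif pc == 3:                 # right operand of &&: turn == j
            pcs[i] = 2 if turn == j else 4
        elif pc == 4:                 # exit section
            flags[i] = False
            pcs[i] = 5
        else:
            return None               # finished: no further moves
        return (tuple(pcs), tuple(flags), turn)

    starts = [((0, 0), (False, False), t) for t in (0, 1)]   # turn's initial value is arbitrary
    seen, queue = set(starts), deque(starts)
    while queue:
        state = queue.popleft()
        if state[0] == (4, 4):        # both processes inside at once
            return True
        for i in (0, 1):
            nxt = step(state, i)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

The search finds no bad state for the correct order and finds one (the interleaving traced above) for the swapped order.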

:::compare

Solution	Mutual Exclusion	Progress	Bounded Waiting	# Processes
Strict alternation (turn only)	Yes	No	Yes	2
Flags only (Dekker's first attempt)	Yes	No (both can deadlock)	Yes (if no deadlock)	2
Peterson's solution	Yes	Yes	Yes	2
Lamport's Bakery	Yes	Yes	Yes	n
Hardware Test-and-Set	Yes	Yes	Not guaranteed	n
Hardware Compare-and-Swap	Yes	Yes	Not guaranteed	n
:::

:::keypoints

Peterson's solves the two-process critical section problem in software.
Uses two shared variables — boolean flag[2] and int turn.
Entry sequence: set flag[i] = true, then turn = j, then busy-wait on flag[j] && turn == j.
Exit sequence: set flag[i] = false.
Satisfies mutual exclusion, progress AND bounded waiting.
Order matters — flag before turn — or mutual exclusion can fail.
Works under sequentially consistent memory; modern CPUs need memory barriers.
Does not extend naturally beyond two processes — use Bakery for n processes.
The algorithm uses busy-waiting (spin-loops), which wastes CPU; in practice, OS primitives that block are preferred.
:::

:::memory
"Set flag, give turn away, then wait." — the three-line entry section in order.

Or: "I want in, but you go first." — captures the politeness that breaks the tie.
:::

:::recap

Peterson's algorithm = flag[2] + turn → safe critical section for two processes.
All three correctness properties (ME, progress, bounded waiting) are met.
Works only for two processes and assumes sequentially consistent memory.
A swapped statement order or wrong logical operator silently breaks correctness — read carefully in GATE questions.
:::