Normalization
1NF, 2NF, 3NF, BCNF, functional dependencies.
Functional dependencies
Armstrong's axioms, closure, minimal cover.
Why normalize?
- Remove redundancy.
- Avoid update/insert/delete anomalies.
- Preserve dependencies.
FUNCTIONAL DEPENDENCY (FD)
X → Y means: value of X uniquely determines value of Y.
Armstrong's axioms:
- Reflexivity: if Y ⊆ X, then X → Y.
- Augmentation: if X → Y, then XZ → YZ.
- Transitivity: if X → Y and Y → Z, then X → Z.
Derived rules:
- Union: X→Y, X→Z → X→YZ.
- Decomposition: X→YZ → X→Y, X→Z.
- Pseudo-transitivity: X→Y, WY→Z → WX→Z.
KEYS:
- Super key: any set of attributes that uniquely identifies a tuple.
- Candidate key: minimal super key.
- Primary key: chosen candidate key.
- Prime attribute: part of any candidate key. Non-prime: not in any CK.
NORMAL FORMS:
1NF (First Normal Form):
- All attributes contain atomic (indivisible) values.
- No repeating groups or arrays.
- Bad: {StudentID, Name, Courses (CS101, CS102)}.
- Good: separate row for each course.
2NF:
- 1NF + no partial dependency.
- Every non-prime attribute is fully dependent on the entire primary key (not just part).
- Issue if PK is composite and some attribute depends only on part of it.
Example: PK = (StudentID, CourseID). Attributes: StudentName, CourseFee.
StudentName depends only on StudentID (partial) → split.
3NF:
- 2NF + no transitive dependency.
- Non-prime attribute should not depend on another non-prime.
Example: Employee(EmpID, DeptID, DeptName). EmpID → DeptID → DeptName. DeptName depends on EmpID transitively via DeptID.
Solution: EmpID → DeptID; DeptID → DeptName (separate table).
3NF rule: for every FD X → A, either:
- X is a super key, OR
- A is a prime attribute.
BCNF (Boyce-Codd Normal Form): stricter than 3NF.
- For every FD X → A, X must be a super key. (No exception for prime attribute.)
- All 3NF tables are BCNF if their FDs satisfy this. Most relations in 3NF are also BCNF; rare edge cases differ.
4NF and 5NF (advanced):
- 4NF: no multi-valued dependencies (MVDs).
- 5NF (PJ/NF): no join dependencies that aren't consequences of candidate keys.
EXAMPLE PROBLEM:
Relation R(A, B, C, D, E), FDs: A → BC, CD → E, B → D, E → A.
Find candidate keys.
Compute closures of single attributes:
- A⁺ = ABCDE (via A→BC, B→D, CD→E). So A is a CK.
- B⁺ = BD.
- C⁺ = C.
- D⁺ = D.
- E⁺ = EABCD (via E→A, then A→BC, etc.). So E is a CK.
Check pairs:
- CD⁺ = CDEABC = all → CD is a CK.
- BC⁺ = BCDEA = all → BC is a CK.
Candidate keys: {A, E, CD, BC}.
Prime attributes: A, B, C, D, E (all are prime).
In this case, every attribute is prime → likely 3NF since prime rule satisfied.
EXAM HOOKS:
- A relation is in BCNF if for every non-trivial FD, LHS is a super key.
- 3NF allows X→A where A is prime (even if X is not super key).
- Decomposition into BCNF may not preserve all FDs; 3NF decomposition always does.
- Lossless decomposition: R = R₁ ∪ R₂ with R₁ ∩ R₂ being super key of at least one.
Normal forms
1NF, 2NF, 3NF, BCNF, anomalies.
The normal forms are nested — BCNF ⊂ 3NF ⊂ 2NF ⊂ 1NF. Each removes a class of redundancy.
1NF: every attribute holds atomic values (no arrays, no nested tables). Almost always trivially true in modern relational DBs.
2NF: 1NF + no partial dependency (no non-prime attribute depends on only part of a candidate key). Only relevant if you have composite keys.
3NF: 2NF + no transitive dependency (no non-prime attribute depends on another non-prime attribute). For every FD X → Y in the relation, either X is a superkey OR Y is a prime attribute.
BCNF: 2NF + every non-trivial FD X → Y has X as a superkey. Stricter than 3NF — every determinant must be a superkey.
Quick test:
For every functional dependency X → Y in your schema:
If X is a superkey → fine.
Else if Y is a prime attribute (in some candidate key) → 3NF (but not BCNF).
Else → not 3NF.
Worked example.
Schema: STUDENT(roll, course, instructor, dept).
FDs: roll → dept, course → instructor.
Candidate key: {roll, course}.
- "roll → dept": roll is not a superkey, dept is non-prime → violates 3NF (transitive dependency).
- "course → instructor": course is not a superkey, instructor is non-prime → also violates 3NF.
To get to 3NF: split into STUDENT(roll, dept), ENROLLMENT(roll, course), COURSE(course, instructor).
Why stop at BCNF? Higher forms (4NF, 5NF, DKNF) handle multi-valued dependencies and join dependencies, rarely needed for typical schemas.
Why normalize?
- Remove redundancy.
- Avoid update/insert/delete anomalies.
- Preserve dependencies.
FUNCTIONAL DEPENDENCY (FD)
X → Y means: value of X uniquely determines value of Y.
Armstrong's axioms:
- Reflexivity: if Y ⊆ X, then X → Y.
- Augmentation: if X → Y, then XZ → YZ.
- Transitivity: if X → Y and Y → Z, then X → Z.
Derived rules:
- Union: X→Y, X→Z → X→YZ.
- Decomposition: X→YZ → X→Y, X→Z.
- Pseudo-transitivity: X→Y, WY→Z → WX→Z.
KEYS:
- Super key: any set of attributes that uniquely identifies a tuple.
- Candidate key: minimal super key.
- Primary key: chosen candidate key.
- Prime attribute: part of any candidate key. Non-prime: not in any CK.
NORMAL FORMS:
1NF (First Normal Form):
- All attributes contain atomic (indivisible) values.
- No repeating groups or arrays.
- Bad: {StudentID, Name, Courses (CS101, CS102)}.
- Good: separate row for each course.
2NF:
- 1NF + no partial dependency.
- Every non-prime attribute is fully dependent on the entire primary key (not just part).
- Issue if PK is composite and some attribute depends only on part of it.
Example: PK = (StudentID, CourseID). Attributes: StudentName, CourseFee.
StudentName depends only on StudentID (partial) → split.
3NF:
- 2NF + no transitive dependency.
- Non-prime attribute should not depend on another non-prime.
Example: Employee(EmpID, DeptID, DeptName). EmpID → DeptID → DeptName. DeptName depends on EmpID transitively via DeptID.
Solution: EmpID → DeptID; DeptID → DeptName (separate table).
3NF rule: for every FD X → A, either:
- X is a super key, OR
- A is a prime attribute.
BCNF (Boyce-Codd Normal Form): stricter than 3NF.
- For every FD X → A, X must be a super key. (No exception for prime attribute.)
- All 3NF tables are BCNF if their FDs satisfy this. Most relations in 3NF are also BCNF; rare edge cases differ.
4NF and 5NF (advanced):
- 4NF: no multi-valued dependencies (MVDs).
- 5NF (PJ/NF): no join dependencies that aren't consequences of candidate keys.
EXAMPLE PROBLEM:
Relation R(A, B, C, D, E), FDs: A → BC, CD → E, B → D, E → A.
Find candidate keys.
Compute closures of single attributes:
- A⁺ = ABCDE (via A→BC, B→D, CD→E). So A is a CK.
- B⁺ = BD.
- C⁺ = C.
- D⁺ = D.
- E⁺ = EABCD (via E→A, then A→BC, etc.). So E is a CK.
Check pairs:
- CD⁺ = CDEABC = all → CD is a CK.
- BC⁺ = BCDEA = all → BC is a CK.
Candidate keys: {A, E, CD, BC}.
Prime attributes: A, B, C, D, E (all are prime).
In this case, every attribute is prime → likely 3NF since prime rule satisfied.
EXAM HOOKS:
- A relation is in BCNF if for every non-trivial FD, LHS is a super key.
- 3NF allows X→A where A is prime (even if X is not super key).
- Decomposition into BCNF may not preserve all FDs; 3NF decomposition always does.
- Lossless decomposition: R = R₁ ∪ R₂ with R₁ ∩ R₂ being super key of at least one.