Normalization

1NF, 2NF, 3NF, BCNF, functional dependencies.

Practice — 2 questions Ask AI

Functional dependencies

Armstrong's axioms, closure, minimal cover.

Database normalization — FDs, 1NF to BCNF

Notes

Why normalize?

Remove redundancy.
Avoid update/insert/delete anomalies.
Preserve dependencies.

FUNCTIONAL DEPENDENCY (FD)

X → Y means: value of X uniquely determines value of Y.

Armstrong's axioms:

Reflexivity: if Y ⊆ X, then X → Y.
Augmentation: if X → Y, then XZ → YZ.
Transitivity: if X → Y and Y → Z, then X → Z.

Derived rules:

Union: X→Y, X→Z → X→YZ.
Decomposition: X→YZ → X→Y, X→Z.
Pseudo-transitivity: X→Y, WY→Z → WX→Z.

KEYS:

Super key: any set of attributes that uniquely identifies a tuple.
Candidate key: minimal super key.
Primary key: chosen candidate key.
Prime attribute: part of any candidate key. Non-prime: not in any CK.

NORMAL FORMS:

1NF (First Normal Form):

All attributes contain atomic (indivisible) values.
No repeating groups or arrays.
Bad: {StudentID, Name, Courses (CS101, CS102)}.
Good: separate row for each course.

2NF:

1NF + no partial dependency.
Every non-prime attribute is fully dependent on the entire primary key (not just part).
Issue if PK is composite and some attribute depends only on part of it.

Example: PK = (StudentID, CourseID). Attributes: StudentName, CourseFee.
StudentName depends only on StudentID (partial) → split.

3NF:

2NF + no transitive dependency.
Non-prime attribute should not depend on another non-prime.

Example: Employee(EmpID, DeptID, DeptName). EmpID → DeptID → DeptName. DeptName depends on EmpID transitively via DeptID.

Solution: EmpID → DeptID; DeptID → DeptName (separate table).

3NF rule: for every FD X → A, either:

X is a super key, OR
A is a prime attribute.

BCNF (Boyce-Codd Normal Form): stricter than 3NF.

For every FD X → A, X must be a super key. (No exception for prime attribute.)
All 3NF tables are BCNF if their FDs satisfy this. Most relations in 3NF are also BCNF; rare edge cases differ.

4NF and 5NF (advanced):

4NF: no multi-valued dependencies (MVDs).
5NF (PJ/NF): no join dependencies that aren't consequences of candidate keys.

EXAMPLE PROBLEM:

Relation R(A, B, C, D, E), FDs: A → BC, CD → E, B → D, E → A.

Find candidate keys.

Compute closures of single attributes:
- A⁺ = ABCDE (via A→BC, B→D, CD→E). So A is a CK.
- B⁺ = BD.
- C⁺ = C.
- D⁺ = D.
- E⁺ = EABCD (via E→A, then A→BC, etc.). So E is a CK.
Check pairs:
- CD⁺ = CDEABC = all → CD is a CK.
- BC⁺ = BCDEA = all → BC is a CK.

Candidate keys: {A, E, CD, BC}.

Prime attributes: A, B, C, D, E (all are prime).

In this case, every attribute is prime → likely 3NF since prime rule satisfied.

EXAM HOOKS:

A relation is in BCNF if for every non-trivial FD, LHS is a super key.
3NF allows X→A where A is prime (even if X is not super key).
Decomposition into BCNF may not preserve all FDs; 3NF decomposition always does.
Lossless decomposition: R = R₁ ∪ R₂ with R₁ ∩ R₂ being super key of at least one.

Normal forms

1NF, 2NF, 3NF, BCNF, anomalies.

Normal forms 1NF → BCNF — the practical decision tree

Notes

The normal forms are nested — BCNF ⊂ 3NF ⊂ 2NF ⊂ 1NF. Each removes a class of redundancy.

1NF: every attribute holds atomic values (no arrays, no nested tables). Almost always trivially true in modern relational DBs.

2NF: 1NF + no partial dependency (no non-prime attribute depends on only part of a candidate key). Only relevant if you have composite keys.

3NF: 2NF + no transitive dependency (no non-prime attribute depends on another non-prime attribute). For every FD X → Y in the relation, either X is a superkey OR Y is a prime attribute.

BCNF: 2NF + every non-trivial FD X → Y has X as a superkey. Stricter than 3NF — every determinant must be a superkey.

Quick test:

For every functional dependency X → Y in your schema:
  If X is a superkey → fine.
  Else if Y is a prime attribute (in some candidate key) → 3NF (but not BCNF).
  Else → not 3NF.

Worked example.
Schema: STUDENT(roll, course, instructor, dept).
FDs: roll → dept, course → instructor.
Candidate key: {roll, course}.

"roll → dept": roll is not a superkey, dept is non-prime → violates 3NF (transitive dependency).
"course → instructor": course is not a superkey, instructor is non-prime → also violates 3NF.

To get to 3NF: split into STUDENT(roll, dept), ENROLLMENT(roll, course), COURSE(course, instructor).

Why stop at BCNF? Higher forms (4NF, 5NF, DKNF) handle multi-valued dependencies and join dependencies, rarely needed for typical schemas.

Database normalization — FDs, 1NF to BCNF

Notes

Why normalize?

Remove redundancy.
Avoid update/insert/delete anomalies.
Preserve dependencies.

FUNCTIONAL DEPENDENCY (FD)

X → Y means: value of X uniquely determines value of Y.

Armstrong's axioms:

Reflexivity: if Y ⊆ X, then X → Y.
Augmentation: if X → Y, then XZ → YZ.
Transitivity: if X → Y and Y → Z, then X → Z.

Derived rules:

Union: X→Y, X→Z → X→YZ.
Decomposition: X→YZ → X→Y, X→Z.
Pseudo-transitivity: X→Y, WY→Z → WX→Z.

KEYS:

Super key: any set of attributes that uniquely identifies a tuple.
Candidate key: minimal super key.
Primary key: chosen candidate key.
Prime attribute: part of any candidate key. Non-prime: not in any CK.

NORMAL FORMS:

1NF (First Normal Form):

All attributes contain atomic (indivisible) values.
No repeating groups or arrays.
Bad: {StudentID, Name, Courses (CS101, CS102)}.
Good: separate row for each course.

2NF:

1NF + no partial dependency.
Every non-prime attribute is fully dependent on the entire primary key (not just part).
Issue if PK is composite and some attribute depends only on part of it.

Example: PK = (StudentID, CourseID). Attributes: StudentName, CourseFee.
StudentName depends only on StudentID (partial) → split.

3NF:

2NF + no transitive dependency.
Non-prime attribute should not depend on another non-prime.

Example: Employee(EmpID, DeptID, DeptName). EmpID → DeptID → DeptName. DeptName depends on EmpID transitively via DeptID.

Solution: EmpID → DeptID; DeptID → DeptName (separate table).

3NF rule: for every FD X → A, either:

X is a super key, OR
A is a prime attribute.

BCNF (Boyce-Codd Normal Form): stricter than 3NF.

For every FD X → A, X must be a super key. (No exception for prime attribute.)
All 3NF tables are BCNF if their FDs satisfy this. Most relations in 3NF are also BCNF; rare edge cases differ.

4NF and 5NF (advanced):

4NF: no multi-valued dependencies (MVDs).
5NF (PJ/NF): no join dependencies that aren't consequences of candidate keys.

EXAMPLE PROBLEM:

Relation R(A, B, C, D, E), FDs: A → BC, CD → E, B → D, E → A.

Find candidate keys.

Compute closures of single attributes:
- A⁺ = ABCDE (via A→BC, B→D, CD→E). So A is a CK.
- B⁺ = BD.
- C⁺ = C.
- D⁺ = D.
- E⁺ = EABCD (via E→A, then A→BC, etc.). So E is a CK.
Check pairs:
- CD⁺ = CDEABC = all → CD is a CK.
- BC⁺ = BCDEA = all → BC is a CK.

Candidate keys: {A, E, CD, BC}.

Prime attributes: A, B, C, D, E (all are prime).

In this case, every attribute is prime → likely 3NF since prime rule satisfied.

EXAM HOOKS:

A relation is in BCNF if for every non-trivial FD, LHS is a super key.
3NF allows X→A where A is prime (even if X is not super key).
Decomposition into BCNF may not preserve all FDs; 3NF decomposition always does.
Lossless decomposition: R = R₁ ∪ R₂ with R₁ ∩ R₂ being super key of at least one.