What is Third Normal Form (3NF)?

Imagine working with a massive, unstructured database filled with repeated, redundant information. Every update or deletion becomes a potential disaster, risking errors and inconsistencies. Third normal form (3NF) is a proven database normalization method to avoid this chaos. Implementing 3NF cleans up your data structure, ensuring it’s efficient, organized, and free from unnecessary redundancies.

In this article, we’ll explore how 3NF works, why it’s valuable, and how you can put it into practice. We’ll also compare 3NF to other forms and learn when to use each. Everyone can benefit from learning more about these structures, but this knowledge is particularly valuable if you are a database designer or data scientist, as it can significantly simplify your work and keep your databases reliable. If you are interested in database design as a whole, take a look at our full Database Design course.

Third Normal Form (3NF) Definition

Third normal form is a key concept of database normalization that removes unwanted dependencies. 3NF builds upon first normal form (1NF) and second normal form (2NF), meaning that it inherits their rules: 1NF requires atomic (indivisible) values in each cell, and 2NF removes partial dependencies on a composite primary key. 3NF takes it further by removing transitive dependencies, a situation where non-key attributes depend indirectly on the primary key.

By focusing on this, 3NF ensures that each non-key column in a table is directly tied to the primary key and nothing else. In more practical terms, 3NF helps minimize redundancy and avoid anomalies when inserting, updating, or deleting data.

In the 1970s, Edgar F. Codd introduced 3NF to formalize the conditions for achieving a fully normalized database structure. A reformulation by Carlo Zaniolo a few years later provided a clearer explanation of the difference between the “classic” 3NF and the more restrictive Boyce-Codd normal form (BCNF). Don’t worry too much about BCNF for now, we will come back to it further down.

Understanding the Conditions for Third Normal Form

So, what exactly does it take to achieve 3NF? For a table to qualify, it needs to meet a few conditions:

  • Be in 2NF: This means it’s already atomic, with no repeating groups and no partial dependencies on any composite keys.

1NF, 2NF, and 3NF requirements

3NF is inclusive of 2NF and 1NF. Image by Author

  • No transitive dependencies: This rule is key. In a 3NF table, any non-primary-key column must depend solely on the primary key, not indirectly through another non-key column.

Let’s have a look at what that practically means.

Decomposing Tables to Achieve 3NF

Let’s walk through the process of decomposing tables to reach 3NF. We’ll use some sample data from DataCamp courses to illustrate each step.

Step 1: Identify transitive dependencies

To start, we’ll look for any attributes in a table that indirectly depend on the primary key. As a rule of thumb, if any attribute depends on something other than the primary key, this indicates a transitive dependency. That’s a sign that it might be time to split up your table.

Take a look at the three tables below. Which one has a transitive dependency?

Table 1: Course

Course ID Course Name Difficulty
201 SQL Fundamentals Beginner
202 Introduction to Python Beginner
203 Understanding Data Science Intermediate

Table 2: Instructor

Instructor ID Instructor Name Expertise
1 Sarah Johnson Data Science
2 Tom Williams Machine Learning
3 Emily Brown Python

Table 3: Enrollments

Enrollment ID Student Name Course ID Course Name
1001 Alice Smith 201 SQL Fundamentals
1002 Bob Green 202 Introduction to Python
1003 Charlie Blue 201 SQL Fundamentals

The answer is… Table 3!

In this table, Course Name depends on Course ID, but not directly on Enrollment ID (the primary key). This indirect dependency makes Course Name a transitive dependency.

Step 2: Separate data into new tables

To address the transitive dependency, we’ll split Table 1 into two tables. Each table will focus on directly dependent data.

Revised enrollments table

Enrollment ID Student Name Course ID
1001 Alice Smith 201
1002 Bob Green 202
1003 Charlie Blue 201

Courses table

Course ID Course Name
201 SQL Fundamentals
202 Introduction to Python

Now, each table contains only information that directly depends on its primary key: Course ID is now the primary key for Course Name in the Courses table, and Enrollment ID is the primary key in the Enrollments table.

With this decomposition, the tables now meet 3NF requirements, eliminating redundancy and ensuring each table stores only directly relevant information.

If you want to get hands-on and create your own databases, have a look at our Creating PostgreSQL Databases course. If you are a little more advanced, you could try Introduction to Data Modeling in Snowflake, which covers ideas like entity-relationship and dimensional modeling.

Benefits and Limitations of Using Third Normal Form

So, why go through all this effort to reach 3NF? Here are the main perks:

  • Improved Data Integrity: By eliminating transitive dependencies, 3NF helps ensure that updates and deletions don’t lead to conflicting or outdated data across tables.
  • Reduced Redundancy: Less redundancy means your database is easier to maintain, and storage use is reduced.
  • Simpler Data Maintenance: Keeping similar information in dedicated tables makes it easier to update records without tracking down redundant entries.

That said, while 3NF structures support data accuracy, they can also lead to more segmented data, sometimes making complex queries slower due to additional table joins. In cases where the need for speed trumps the need for normalization, BCNF or 4NF may be more practical options.

Comparison: First, Second, Third, and BC Normal Forms

Let’s have a look at form differences.

Comparison table: first, second, and third normal forms

Here is a comparison table to help you understand the requirements of 1NF, 2NF, and 3NF.

Feature 1NF 2NF 3NF
Atomic data
No partial dependencies
No transitive dependencies

Third normal form vs. Boyce-Codd normal form (BCNF)

BCNF is a “stricter” form of 3NF that further eliminates anomalies that arise with overlapping candidate keys. It can be especially useful in complex cases where 3NF alone doesn’t fully eliminate dependencies. BCNF applies when a non-prime attribute depends on an attribute that is part of a composite candidate key. I know that sounds complex, so let’s break it down with an example.

Current structure (in 3NF)

After decomposition to achieve 3NF, we had these two tables:

Enrollments table

Enrollment ID Student Name Course ID
1001 Alice Smith 201
1002 Bob Green 202
1003 Charlie Blue 201

Courses table

Course ID Course Name
201 SQL Fundamentals
202 Introduction to Python

In this structure, each table is in 3NF with no transitive dependencies, and data is appropriately normalized.

Introducing a new requirement

Now, let’s add a new attribute to Courses: the Classroom in which each course is held. This new attribute could result in a scenario that requires BCNF.

Updated courses table (3NF)

Course ID Course Name Classroom
201 SQL Fundamentals Room 101
202 Introduction to Python Room 102
203 Understanding Data Science Room 101

Here, Course ID is still the primary key, and all other attributes depend directly on it. But let’s assume there’s a new rule that each classroom can hold only one subject at a time. Let’s also suppose the Course NameSQL Fundamentals” could be offered under different Course IDs (like 201, 204, etc.), if they were scheduled at different times. In that case, each offering of “SQL Fundamentals” would still take place in “Room 101,” regardless of the specific Course ID. As a result, Course Name also uniquely determines Classroom.

This means we now have two candidate keys:

  1. Course ID
  2. Course Name

With both candidate keys, we now have an issue that 3NF does not address: Classroom depends on Course Name rather than just Course ID.

Applying BCNF

To eliminate this dependency issue, we’ll need to further decompose the Courses table into two separate tables that better align with BCNF:

  1. A new Courses table, which only includes the Course ID and Course Name.
  2. A CourseDetails table, which stores the Course Name and Classroom association.

Here’s how this looks:

Revised courses table (BCNF)

CourseDetails table (BCNF)

Course Name Classroom
SQL Fundamentals Room 101
Introduction to Python Room 102
Understanding Data Science Room 101

With this new structure, each table satisfies BCNF conditions:

  • In the Courses table, Course ID is the primary key, and all attributes depend solely on it.
  • In the CourseDetails table, Course Name is the primary key, and Classroom depends only on Course Name.

This setup removes any dependency issues caused by overlapping candidate keys, ensuring a strictly normalized structure.

Conclusion

Third normal form is a valuable tool for database designers aiming to keep data clean, consistent, and free from problematic dependencies. With 3NF, data integrity is enhanced, making management smoother and reducing redundancy. Remember, while 3NF works well in most situations, more complex databases might benefit from additional forms like BCNF or 4NF.

If you found this article helpful, consider taking the next step by earning our SQL Associate Certification. It’s a great way to validate your SQL and database management skills and demonstrate your expertise to potential employers!

Source:
https://www.datacamp.com/tutorial/third-normal-form