- No transitive dependencies: This rule is key. In a 3NF table, any non-primary-key column must depend solely on the primary key, not indirectly through another non-key column.
Let’s have a look at what that practically means.
Decomposing Tables to Achieve 3NF
Let’s walk through the process of decomposing tables to reach 3NF. We’ll use some sample data from DataCamp courses to illustrate each step.
Step 1: Identify transitive dependencies
To start, we’ll look for any attributes in a table that indirectly depend on the primary key. As a rule of thumb, if any attribute depends on something other than the primary key, this indicates a transitive dependency. That’s a sign that it might be time to split up your table.
Take a look at the three tables below. Which one has a transitive dependency?
Table 1: Course
Course ID | Course Name | Difficulty |
---|---|---|
201 | SQL Fundamentals | Beginner |
202 | Introduction to Python | Beginner |
203 | Understanding Data Science | Intermediate |
Table 2: Instructor
Instructor ID | Instructor Name | Expertise |
---|---|---|
1 | Sarah Johnson | Data Science |
2 | Tom Williams | Machine Learning |
3 | Emily Brown | Python |
Table 3: Enrollments
Enrollment ID | Student Name | Course ID | Course Name |
---|---|---|---|
1001 | Alice Smith | 201 | SQL Fundamentals |
1002 | Bob Green | 202 | Introduction to Python |
1003 | Charlie Blue | 201 | SQL Fundamentals |
The answer is… Table 3!
In this table, Course Name depends on Course ID, but not directly on Enrollment ID (the primary key). This indirect dependency makes Course Name a transitive dependency.
Step 2: Separate data into new tables
To address the transitive dependency, we’ll split Table 1 into two tables. Each table will focus on directly dependent data.
Revised enrollments table
Enrollment ID | Student Name | Course ID |
---|---|---|
1001 | Alice Smith | 201 |
1002 | Bob Green | 202 |
1003 | Charlie Blue | 201 |
Courses table
Course ID | Course Name |
---|---|
201 | SQL Fundamentals |
202 | Introduction to Python |
Now, each table contains only information that directly depends on its primary key: Course ID is now the primary key for Course Name in the Courses table, and Enrollment ID is the primary key in the Enrollments table.
With this decomposition, the tables now meet 3NF requirements, eliminating redundancy and ensuring each table stores only directly relevant information.
If you want to get hands-on and create your own databases, have a look at our Creating PostgreSQL Databases course. If you are a little more advanced, you could try Introduction to Data Modeling in Snowflake, which covers ideas like entity-relationship and dimensional modeling.
Benefits and Limitations of Using Third Normal Form
So, why go through all this effort to reach 3NF? Here are the main perks:
- Improved Data Integrity: By eliminating transitive dependencies, 3NF helps ensure that updates and deletions don’t lead to conflicting or outdated data across tables.
- Reduced Redundancy: Less redundancy means your database is easier to maintain, and storage use is reduced.
- Simpler Data Maintenance: Keeping similar information in dedicated tables makes it easier to update records without tracking down redundant entries.
That said, while 3NF structures support data accuracy, they can also lead to more segmented data, sometimes making complex queries slower due to additional table joins. In cases where the need for speed trumps the need for normalization, BCNF or 4NF may be more practical options.
Comparison: First, Second, Third, and BC Normal Forms
Let’s have a look at form differences.
Comparison table: first, second, and third normal forms
Here is a comparison table to help you understand the requirements of 1NF, 2NF, and 3NF.
BCNF is a “stricter” form of 3NF that further eliminates anomalies that arise with overlapping candidate keys. It can be especially useful in complex cases where 3NF alone doesn’t fully eliminate dependencies. BCNF applies when a non-prime attribute depends on an attribute that is part of a composite candidate key. I know that sounds complex, so let’s break it down with an example.
Current structure (in 3NF)
After decomposition to achieve 3NF, we had these two tables:
Enrollments table
Courses table
Course ID | Course Name |
---|---|
201 | SQL Fundamentals |
202 | Introduction to Python |
In this structure, each table is in 3NF with no transitive dependencies, and data is appropriately normalized.
Introducing a new requirement
Now, let’s add a new attribute to Courses: the Classroom in which each course is held. This new attribute could result in a scenario that requires BCNF.
Updated courses table (3NF)
Course ID | Course Name | Classroom |
---|---|---|
201 | SQL Fundamentals | Room 101 |
202 | Introduction to Python | Room 102 |
203 | Understanding Data Science | Room 101 |
Here, Course ID is still the primary key, and all other attributes depend directly on it. But let’s assume there’s a new rule that each classroom can hold only one subject at a time. Let’s also suppose the Course Name “SQL Fundamentals” could be offered under different Course IDs (like 201, 204, etc.), if they were scheduled at different times. In that case, each offering of “SQL Fundamentals” would still take place in “Room 101,” regardless of the specific Course ID. As a result, Course Name also uniquely determines Classroom.
This means we now have two candidate keys:
- Course ID
- Course Name
With both candidate keys, we now have an issue that 3NF does not address: Classroom depends on Course Name rather than just Course ID.
Applying BCNF
To eliminate this dependency issue, we’ll need to further decompose the Courses table into two separate tables that better align with BCNF:
- A new Courses table, which only includes the Course ID and Course Name.
- A CourseDetails table, which stores the Course Name and Classroom association.
Here’s how this looks:
Revised courses table (BCNF)
CourseDetails table (BCNF)
Course Name | Classroom |
---|---|
SQL Fundamentals | Room 101 |
Introduction to Python | Room 102 |
Understanding Data Science | Room 101 |
- In the Courses table, Course ID is the primary key, and all attributes depend solely on it.
- In the CourseDetails table, Course Name is the primary key, and Classroom depends only on Course Name.
This setup removes any dependency issues caused by overlapping candidate keys, ensuring a strictly normalized structure.
Conclusion
Third normal form is a valuable tool for database designers aiming to keep data clean, consistent, and free from problematic dependencies. With 3NF, data integrity is enhanced, making management smoother and reducing redundancy. Remember, while 3NF works well in most situations, more complex databases might benefit from additional forms like BCNF or 4NF.
If you found this article helpful, consider taking the next step by earning our SQL Associate Certification. It’s a great way to validate your SQL and database management skills and demonstrate your expertise to potential employers!