What is Second Normal Form (2NF)?

When working with databases, it’s common to encounter issues like redundant data and inconsistent updates. Second normal form is a database normalization step that builds on first normal form (1NF) to create cleaner and more efficient tables.

Understanding 2NF is critical for anyone working in database design or data management, and it lays the foundation for higher normalization forms like third normal form (3NF). In this article, we’ll explore how 2NF works and how to transform tables to meet 2NF requirements, with practical examples. We’ll also talk about the benefits and drawbacks of 2NF, and the use cases it suits best.

Understanding Second Normal Form

Second normal form is a database normalization step focused on eliminating partial dependencies. It was introduced by Edgar F. Codd, the pioneer of relational databases, as part of his work on normalization.

Before a table can be in 2NF, it must satisfy the rules of first normal form:

Atomicity: Each cell must contain a single value (no repeating groups or arrays).
Unique rows: The table must have a clear primary key.

2NF goes one step further with an additional rule: eliminate partial dependencies.

A partial dependency occurs when a non-prime attribute (column that isn’t part of any candidate key) relies on only part of a composite key instead of the whole key. The 2NF rule ensures that all non-prime attributes are dependent on the entire primary key, not just a part of it. Leaving partial dependencies in a table means that redundant data can creep into the database, leading to inefficiency and potential inconsistencies during updates or deletions.

The theory alone can be a little dry, so let’s look at a practical example.

Below is a Course Enrollment table of DataCamp students.

Student ID	Course ID	Course Name	Instructor Name
1001	201	SQL Fundamentals	Ken Smith
1002	202	Introduction to Python	Merlin O’Donnell
1001	202	Introduction to Python	Merlin O’Donnell

Here, the primary key is the composite of Student ID and Course ID. However, the non-prime attributes Course Name and Instructor Name depend only on Course ID, not on the entire key. This violates 2NF.
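To see why this matters, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (course_enrollment, course_name, and so on) are assumptions made for illustration. It loads the rows above and renames the Python course in just one row, which leaves the table contradicting itself.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE course_enrollment (
        student_id      INTEGER,
        course_id       INTEGER,
        course_name     TEXT,
        instructor_name TEXT,
        PRIMARY KEY (student_id, course_id)
    )
""")
conn.executemany(
    "INSERT INTO course_enrollment VALUES (?, ?, ?, ?)",
    [
        (1001, 201, "SQL Fundamentals", "Ken Smith"),
        (1002, 202, "Introduction to Python", "Merlin O’Donnell"),
        (1001, 202, "Introduction to Python", "Merlin O’Donnell"),
    ],
)

# Update anomaly: the course is renamed in one student's row only.
conn.execute("""
    UPDATE course_enrollment
    SET course_name = 'Intro to Python'
    WHERE student_id = 1002 AND course_id = 202
""")

# Course 202 now carries two different names in the same table.
names_for_202 = sorted(
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT course_name FROM course_enrollment WHERE course_id = 202"
    )
)
print(names_for_202)  # ['Intro to Python', 'Introduction to Python']
```

Because the course name is stored once per enrollment rather than once per course, nothing in the schema prevents this kind of inconsistency.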

Steps for Decomposing Tables to Achieve 2NF

To make sure that a table follows the rules of 2NF, you need to:

Identify All Candidate Keys: Determine the minimal sets of attributes that uniquely identify rows in the table. These are your candidate keys.
Determine Functional Dependencies: Identify all functional dependencies in the table. Specifically, look for dependencies where non-prime attributes (those not part of any candidate key) depend only on a part of a composite key.
Eliminate Partial Dependencies: For each partial dependency:

Move the dependent attributes into a new table along with the part of the key they depend on.
Ensure the new table has a unique primary key.

Repeat Until No Partial Dependencies Remain: Confirm that every non-prime attribute in all tables is fully dependent on its respective primary key.
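The dependency check in these steps can be sketched in a few lines of Python. The helper names determines and partial_dependencies are hypothetical, and the approach comes with a caveat: sample rows can only disprove a functional dependency, never prove one, so treat the output as a list of candidates to confirm against your business rules.

```python
from itertools import combinations

def determines(rows, lhs, rhs):
    """Return True if the columns in lhs determine column rhs in these rows."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True

def partial_dependencies(rows, composite_key, non_prime):
    """Map each non-prime column to the proper key subsets that determine it."""
    found = {}
    for attr in non_prime:
        # Only proper subsets: sizes 1 .. len(key) - 1.
        for size in range(1, len(composite_key)):
            for subset in combinations(composite_key, size):
                if determines(rows, subset, attr):
                    found.setdefault(attr, []).append(subset)
    return found

rows = [
    {"student_id": 1001, "course_id": 201,
     "course_name": "SQL Fundamentals", "instructor_name": "Ken Smith"},
    {"student_id": 1002, "course_id": 202,
     "course_name": "Introduction to Python", "instructor_name": "Merlin O’Donnell"},
    {"student_id": 1001, "course_id": 202,
     "course_name": "Introduction to Python", "instructor_name": "Merlin O’Donnell"},
]

result = partial_dependencies(
    rows,
    composite_key=("student_id", "course_id"),
    non_prime=("course_name", "instructor_name"),
)
print(result)
# {'course_name': [('course_id',)], 'instructor_name': [('course_id',)]}
```

Each entry in the result names a column that should move to a new table keyed by the listed part of the composite key.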

Examples of Second Normal Form in Practice

Let’s now look at two examples.

Example 1: Course enrollment table

Earlier, we saw the following course enrollment table:

Student ID	Course ID	Course Name	Instructor Name
1001	201	SQL Fundamentals	Ken Smith
1002	202	Introduction to Python	Merlin O’Donnell
1001	202	Introduction to Python	Merlin O’Donnell

Let’s follow the steps we outlined in the previous section.

1. Identify our candidate key

In this case, the candidate key is a composite key of Student ID and Course ID. This unique combination identifies each row in the table.

2. Determine our functional dependencies

Course Name and Instructor Name depend on Course ID, not the full composite key (Student ID, Course ID). This is a partial dependency because these attributes depend on only part of the composite key.

3. Eliminate partial dependencies

We need to move the attributes that depend on only part of the key (Course Name and Instructor Name) to a new table that is based solely on Course ID.

After decomposition, our new tables look like this:

Course enrollment table

Student ID	Course ID
1001	201
1002	202
1001	202

Course details table

Course ID	Course Name	Instructor Name
201	SQL Fundamentals	Ken Smith
202	Introduction to Python	Merlin O’Donnell
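As a rough sketch, the two tables could be defined like this with Python's built-in sqlite3 module. The table names, column names, and constraints are our own assumptions, not a prescribed schema. A join rebuilds the original view, and renaming a course becomes a single-row update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE course_details (
        course_id       INTEGER PRIMARY KEY,
        course_name     TEXT NOT NULL,
        instructor_name TEXT NOT NULL
    );
    CREATE TABLE course_enrollment (
        student_id INTEGER NOT NULL,
        course_id  INTEGER NOT NULL REFERENCES course_details (course_id),
        PRIMARY KEY (student_id, course_id)
    );
""")
conn.executemany(
    "INSERT INTO course_details VALUES (?, ?, ?)",
    [
        (201, "SQL Fundamentals", "Ken Smith"),
        (202, "Introduction to Python", "Merlin O’Donnell"),
    ],
)
conn.executemany(
    "INSERT INTO course_enrollment VALUES (?, ?)",
    [(1001, 201), (1002, 202), (1001, 202)],
)

# Each course name is stored once, so a rename touches exactly one row.
conn.execute(
    "UPDATE course_details SET course_name = 'Intro to Python' WHERE course_id = 202"
)

# A join reconstructs the original four-column view.
rows = conn.execute("""
    SELECT e.student_id, e.course_id, d.course_name, d.instructor_name
    FROM course_enrollment AS e
    JOIN course_details AS d ON d.course_id = e.course_id
    ORDER BY e.student_id, e.course_id
""").fetchall()
for row in rows:
    print(row)
```

Both enrollments in course 202 pick up the new name automatically, because they reference the course instead of copying its details.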

If you want to get hands-on and create your own databases, have a look at our PostgreSQL course. If you are a little more advanced, you could try this Introduction to Data Modeling in Snowflake, which covers ideas like entity-relationship and dimensional modeling.

Example 2: Orders table

We will start with this Orders table. Try to follow the steps we outlined above and decompose this table yourself!

Order ID	Product ID	Order Date	Product Name	Supplier Name
1	201	2024-11-01	Laptop	TechSupply
1	202	2024-11-01	Mouse	TechSupply
2	201	2024-11-02	Laptop	TechSupply
3	203	2024-11-03	Keyboard	KeyMasters

1. Identify our candidate key

The Order ID and Product ID combination uniquely identifies each row, making (Order ID, Product ID) a composite candidate key. No single column can uniquely identify rows because:

Order ID alone isn’t unique, as multiple products can be part of the same order.
Product ID alone isn’t unique, as the same product can appear in different orders.

This means that (Order ID, Product ID) is also our primary key.

2. Determine our functional dependencies

Order Date depends on Order ID (not on the full composite key). This is a partial dependency.

Product Name and Supplier Name depend on Product ID (not on the full composite key). These are also partial dependencies.

3. Eliminate partial dependencies

We need to split the table into smaller tables, each addressing one logical dependency.

First, we’ll create a table for order information, which contains information specific to Order ID.

Orders Table

Order ID	Order Date
1	2024-11-01
2	2024-11-02
3	2024-11-03

Then, we create a table that contains information specific to Product ID.

Products Table

Product ID	Product Name	Supplier Name
201	Laptop	TechSupply
202	Mouse	TechSupply
203	Keyboard	KeyMasters

The original table is now reduced to just the composite key and the relationships between orders and products.

Order ID	Product ID
1	201
1	202
2	201
3	203

Now, our database is in 2NF because 1) all partial dependencies have been eliminated, and 2) non-prime attributes depend entirely on their respective primary keys.
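If you would like to check a decomposition like this one, the sketch below performs it with Python's built-in sqlite3 module. The table names orders_flat, orders, products, and order_items are assumptions chosen for illustration. It fills the three new tables from the original one and confirms that joining them back returns exactly the original rows, so no information was lost.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The original, unnormalized table (name is an assumption).
    CREATE TABLE orders_flat (
        order_id      INTEGER,
        product_id    INTEGER,
        order_date    TEXT,
        product_name  TEXT,
        supplier_name TEXT,
        PRIMARY KEY (order_id, product_id)
    );
    INSERT INTO orders_flat VALUES
        (1, 201, '2024-11-01', 'Laptop',   'TechSupply'),
        (1, 202, '2024-11-01', 'Mouse',    'TechSupply'),
        (2, 201, '2024-11-02', 'Laptop',   'TechSupply'),
        (3, 203, '2024-11-03', 'Keyboard', 'KeyMasters');

    -- One table per determinant, plus the table linking orders to products.
    CREATE TABLE orders (
        order_id   INTEGER PRIMARY KEY,
        order_date TEXT NOT NULL
    );
    CREATE TABLE products (
        product_id    INTEGER PRIMARY KEY,
        product_name  TEXT NOT NULL,
        supplier_name TEXT NOT NULL
    );
    CREATE TABLE order_items (
        order_id   INTEGER NOT NULL REFERENCES orders (order_id),
        product_id INTEGER NOT NULL REFERENCES products (product_id),
        PRIMARY KEY (order_id, product_id)
    );

    INSERT INTO orders      SELECT DISTINCT order_id, order_date FROM orders_flat;
    INSERT INTO products    SELECT DISTINCT product_id, product_name, supplier_name
                            FROM orders_flat;
    INSERT INTO order_items SELECT order_id, product_id FROM orders_flat;
""")

original = conn.execute("""
    SELECT order_id, product_id, order_date, product_name, supplier_name
    FROM orders_flat
    ORDER BY order_id, product_id
""").fetchall()

rejoined = conn.execute("""
    SELECT i.order_id, i.product_id, o.order_date, p.product_name, p.supplier_name
    FROM order_items AS i
    JOIN orders   AS o ON o.order_id = i.order_id
    JOIN products AS p ON p.product_id = i.product_id
    ORDER BY i.order_id, i.product_id
""").fetchall()

print(rejoined == original)  # True
```

Note that SELECT DISTINCT only collapses rows cleanly when the data really obeys the dependencies. If one order had two different dates in the source table, the insert into orders would fail on the primary key, which is a useful signal that the data needs cleaning first.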

When to Implement Second Normal Form

So, why should you refactor your database to 2NF? Is it sufficient on its own or should you take one step further and aim for 3NF?

Benefits and limitations of second normal form

Second normal form offers several advantages, making it a useful step in the database normalization process:

Enhanced data integrity: By eliminating partial dependencies, 2NF minimizes insertion, update, and deletion anomalies, leading to a more reliable database.
Reduction of redundancy: 2NF decreases data repetition, optimizing storage usage and simplifying data maintenance.
Improved data structure: It lays the groundwork for further normalization, like progression to third normal form, by creating a cleaner and more efficient database design.

But it does come with some limitations:

Increased complexity: Decomposing tables to meet 2NF can make the design process more complex, particularly when dealing with composite keys and dependencies.
Additional joins: Splitting tables may require more joins in queries, potentially impacting performance in systems with large datasets or complex queries – more on that below.
Residual redundancy: While 2NF removes partial dependencies, it does not address transitive dependencies, leaving some redundancy until addressed in 3NF.

Performance considerations with second normal form

Decomposing tables to eliminate partial dependencies can directly impact database performance. On one hand, achieving 2NF reduces data redundancy and improves consistency, leading to fewer anomalies during insert, update, or delete operations. On the other hand, normalization can increase the number of tables, which means additional joins are necessary when retrieving related data. This could impact query performance in large datasets.

To keep your normalized database performant, follow these best practices:

Indexing: Use indexes to speed up joins between decomposed tables.
Query optimization: Optimize queries to minimize the cost of additional joins.
Hybrid approach: Combine normalization with denormalization in areas where performance is important, such as reporting tables.
Regular monitoring: Continuously evaluate your database performance with profiling tools to catch potential issues early.
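As one illustration of the indexing advice, here is a small sketch using Python's built-in sqlite3 module. The schema and the index name idx_order_items_product are assumptions. It adds an index on the join column of a linking table and prints the query plan so you can see how SQLite intends to run the join. The wording of the plan differs between SQLite versions, so read it as a guide rather than a fixed output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (
        product_id    INTEGER PRIMARY KEY,
        product_name  TEXT,
        supplier_name TEXT
    );
    CREATE TABLE order_items (
        order_id   INTEGER,
        product_id INTEGER REFERENCES products (product_id),
        PRIMARY KEY (order_id, product_id)
    );
    -- The composite primary key is indexed with order_id first, so lookups
    -- by product_id alone benefit from a separate index.
    CREATE INDEX idx_order_items_product ON order_items (product_id);
""")

# Ask SQLite how it would execute a join that starts from a product.
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT i.order_id, p.product_name
    FROM products AS p
    JOIN order_items AS i ON i.product_id = p.product_id
    WHERE p.product_name = 'Laptop'
""").fetchall()
for step in plan:
    print(step[-1])  # the last column holds the human-readable plan step

# List the indexes that exist on the linking table.
index_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = 'order_items'"
    )
]
print(index_names)
```

Other database systems offer similar tools, such as EXPLAIN in PostgreSQL and MySQL, so the same habit of checking the plan before and after adding an index carries over.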

Is 2NF just a transitional step to achieve third normal form?

In most cases, database designers strive to achieve third normal form due to its ability to reduce redundancy further and improve overall data integrity. However, achieving 3NF often involves additional work, such as creating more tables and relationships, which can introduce complexity and performance trade-offs in query execution.

There are cases where using second normal form by itself can be sufficient. If simplicity and quick implementation are priorities, such as in small-scale projects, prototyping, or situations where data redundancy is minimal, 2NF can suffice. For example, a table whose only candidate key is a single column cannot contain partial dependencies, so it satisfies 2NF as soon as it satisfies 1NF. If its non-key columns also carry little redundancy, further normalization may not be worth the effort.

Moving beyond second normal form: toward third normal form

If you want to normalize your database further, you can keep refactoring your tables to reach third normal form.

3NF builds on 2NF by addressing transitive dependencies – situations where non-key attributes depend on other non-key attributes rather than the primary key. This progression ensures that each attribute is directly dependent on the primary key and nothing else.

For example, in a table tracking course enrollments:

2NF: Ensures that attributes like the course name and student name depend entirely on their respective primary keys (e.g., Student ID and Course ID). This eliminates partial dependencies, where non-key attributes rely only on part of the composite key.
3NF: Ensures that attributes like instructor details or department information are stored in separate tables, eliminating transitive dependencies.
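To make the idea of a transitive dependency concrete, here is a small Python sketch. The department column, course 203, and the department names are invented for illustration and do not come from the tables earlier in this article. The hypothetical helper determines checks whether one set of columns fixes the value of another in the sample rows.

```python
def determines(rows, lhs, rhs):
    """Return True if the columns in lhs determine column rhs in these rows."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True

# Hypothetical course details table: department and course 203 are invented.
course_details = [
    {"course_id": 201, "instructor_name": "Ken Smith", "department": "Data Engineering"},
    {"course_id": 202, "instructor_name": "Merlin O’Donnell", "department": "Programming"},
    {"course_id": 203, "instructor_name": "Ken Smith", "department": "Data Engineering"},
]

# The key determines the instructor ...
key_to_instructor = determines(course_details, ["course_id"], "instructor_name")
# ... the instructor (a non-key column) determines the department ...
instructor_to_department = determines(course_details, ["instructor_name"], "department")
# ... and the instructor is not itself a key for this table.
instructor_is_key = determines(course_details, ["instructor_name"], "course_id")

is_transitive = key_to_instructor and instructor_to_department and not instructor_is_key
print(is_transitive)  # True
```

This table is in 2NF because its key is a single column, yet the department is still repeated for every course an instructor teaches. Moving instructor and department into their own table is the kind of change 3NF asks for.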

3NF is ideal for more complex systems where data integrity and efficiency are paramount, especially as the volume of data grows. Check out our What is third normal form? article if you want to learn more about 3NF and its more restrictive form, BCNF.

Conclusion

Second normal form is an essential step in database normalization, bridging the gap between 1NF and higher forms like 3NF. By removing partial dependencies, 2NF reduces redundancy and improves the reliability of your data. While it can add some complexity, the benefits of improved data integrity and simplified maintenance make it a critical part of effective database design.

If you’re ready to take your skills further, explore our Database Design course to deepen your understanding of normalization techniques and their practical applications. You can also validate your SQL and database management skills and demonstrate your expertise to potential employers with our SQL Associate Certification!

Lastly, if you are a decision-maker in a business and you know that you have work to do in creating cleaner, more efficient databases, consider requesting a DataCamp for Business demo. We can help transform your team’s capabilities so that you can create scalable database systems that drive business efficiency and innovation. We can even create tailored learning paths and custom tracks.

Source:
https://www.datacamp.com/tutorial/second-normal-form