CTE in SQL: A Complete Guide with Examples

If you’ve been working with SQL for a while but haven’t used CTEs, you’ll likely wonder how you managed without them. I use them pretty much everywhere, including in SELECT, INSERT, UPDATE, and DELETE statements.

In this article, I will go through the basics, including how to create a CTE. I will also go into more advanced things, such as how to differentiate between non-recursive and recursive CTEs, which both serve a purpose. 

If you are a bit unfamiliar with SQL operations, try our very popular Introduction to SQL course to get started. The course is well-designed and comprehensive, and it will teach you everything you need to know to extract data using efficient queries.

What is a SQL CTE?

The idea of CTEs will become clear when I show examples. But for now, we can say that a CTE, or common table expression, is a temporary, named result set in SQL that allows you to simplify complex queries, making them easier to read and maintain.

CTEs are commonly used when working with multiple subqueries. You might recognize them because they are created with the distinctive WITH keyword and, like I mentioned, they can be used in SELECT, INSERT, UPDATE, and DELETE statements.

How to Create a SQL CTE

When creating a CTE, we use the WITH keyword to initiate the CTE definition. The general syntax of a CTE is as follows:

WITH cte_name (column1, column2, ...) AS ( -- Query that defines the CTE SELECT ... FROM ... WHERE ... ) -- Main query SELECT ... FROM cte_name;

Where:

  • WITH: Initiates the CTE definition, indicating that the following name represents a temporary result set.

  • cte_name: The name is assigned to the CTE to reference it in the main query.

  • Optional column list (column1, column2, …): Specifies column names for the CTE’s result set. This is useful when column names need to be adjusted.

  • Query that defines the CTE: The inner query that selects data and shapes the temporary result set.

  • Main query: References the CTE by its name, using it like a table.

Let us look at the following example of creating a CTE using a layered approach. Assume we have a table Employees, and we want to create a CTE that selects employees who earn a salary above $50,000.

Step 1: Write the base query

We start by writing the basic SELECT query:

SELECT EmployeeID, FirstName, LastName, Salary FROM Employees WHERE Salary > 50000;

Step 2: Wrap the query using the WITH keyword to create a CTE

Use the WITH keyword to give the CTE a name.

WITH HighEarningEmployees AS ( SELECT EmployeeID, FirstName, LastName, Salary FROM Employees WHERE Salary > 50000 )

Step 3: Use the CTE in the main query

Finally, refer to the CTE in a SELECT statement by calling the CTE name defined above.

-- Define a Common Table Expression (CTE) WITH HighEarningEmployees AS ( SELECT EmployeeID, FirstName, LastName, Salary FROM Employees WHERE Salary > 50000 ) -- Use the CTE to select high-earning employees SELECT EmployeeID, FirstName, LastName FROM HighEarningEmployees;

To summarize the above steps, we have used the WITH keyword to define the CTE named HighEarningEmployees. The inner query was used to generate the temporary dataset. The main query references the HighEarningEmployees to display the specified columns EmployeeID, FirstName, and LastName.

Why SQL CTEs are Useful

From the above example, you may wonder why we use CTEs when even simple queries yield the same results. The following are the reasons:

Simplify complex queries

CTEs break down complex SQL statements into smaller, more manageable parts, making the code easier to read, write, and maintain. 

Suppose we have three tables: Orders, Customers, and Products. We want to find the total revenue generated by each customer who purchased in 2024. When we write the query without using CTE, it looks cluttered and hard to read and understand.

-- Select customer names and total revenue from their orders SELECT c.CustomerName, SUM(p.Price * o.Quantity) AS TotalRevenue FROM Orders o -- Join to get customer and products table JOIN Customers c ON o.CustomerID = c.CustomerID JOIN Products p ON o.ProductID = p.ProductID WHERE YEAR(o.OrderDate) = 2024 GROUP BY c.CustomerName HAVING SUM(p.Price * o.Quantity) > 1000;

By using a CTE, we can separate the logic into a more readable format:

-- Define the CTE WITH OrderDetails AS ( SELECT o.OrderID, c.CustomerName, p.Price, o.Quantity, o.OrderDate FROM Orders o JOIN Customers c ON o.CustomerID = c.CustomerID JOIN Products p ON o.ProductID = p.ProductID WHERE YEAR(o.OrderDate) = 2024 ) --Main query SELECT CustomerName, SUM(Price * Quantity) AS TotalRevenue FROM OrderDetails GROUP BY CustomerName HAVING SUM(Price * Quantity) > 1000;

Code reusability

CTEs help avoid duplication by allowing the same result set to be reused across different parts of a query. If multiple calculations or operations are based on the same dataset, you can define it once in a CTE and refer to it as needed.

Assume we need to calculate the average and total sales for each product category in an e-commerce database. We can use a CTE to define the calculations once and reuse them in subsequent queries.

-- Define a CTE to calculate total and average sales for each category WITH CategorySales AS ( SELECT Category, SUM(SalesAmount) AS TotalSales, AVG(SalesAmount) AS AverageSales FROM Products GROUP BY Category ) -- Select category, total sales, and average sales from the CTE SELECT Category, TotalSales, AverageSales FROM CategorySales WHERE TotalSales > 5000;

Other applications

In addition to simplifying queries and code reusability, CTEs have other uses as well. I’m not able to cover every possible use of CTEs in detail. Our Data Manipulation in SQL course is a great option if you want to keep practicing. However, I’ll document some of the main other reasons here:

  • Query Organization and Readability: CTEs improve SQL code readability by dividing queries into logical, sequential steps. Each step in the query process can be represented by its own CTE, making the entire query easier to follow.
  • Hierarchical Data Traversal: CTEs can help navigate hierarchical relationships, such as organizational structures, parent-child relationships, or any data model that involves nested levels. Recursive CTEs are useful for querying hierarchical data because they allow you to traverse levels iteratively.
  • Multi-Level Aggregations: CTEs can help perform aggregations at multiple levels, such as calculating sales figures at different granularities (e.g., by month, quarter, and year). Using CTEs to separate these aggregation steps ensures that each level is calculated independently and logically.
  • Combining Data from Multiple Tables: Multiple CTEs can be used to combine data from different tables, making the final combination step more structured. This approach simplifies complex joins and ensures the source data is organized logically for improved readability.

Advanced SQL CTE Techniques

CTEs support advanced SQL techniques, making them versatile and useful for different use cases. The following are some of CTEs’ advanced applications.

Multiple CTEs in a single query

You can define multiple CTEs in a single query, which allows for complex transformations and calculations. This method is useful when a problem requires multiple stages of data processing, where each CTE represents a distinct stage.

Suppose we have sales data in a table called Sales and we want to calculate the total sales for each product, identify products with above-average total sales, and rank these products based on their total sales.

WITH ProductSales AS ( -- Step 1: Calculate total sales for each product SELECT ProductID, SUM(SalesAmount) AS TotalSales FROM Sales GROUP BY ProductID ), AverageSales AS ( -- Step 2: Calculate the average total sales across all products SELECT AVG(TotalSales) AS AverageTotalSales FROM ProductSales ), HighSalesProducts AS ( -- Step 3: Filter products with above-average total sales SELECT ProductID, TotalSales FROM ProductSales WHERE TotalSales > (SELECT AverageTotalSales FROM AverageSales) ) -- Step 4: Rank the high-sales products SELECT ProductID, TotalSales, RANK() OVER (ORDER BY TotalSales DESC) AS SalesRank FROM HighSalesProducts;

In the above example;

  • The first CTE (ProductSales) calculates the total sales per product.

  • The second CTE (AverageSales) computes the average total sales across all products.

  • The third CTE (HighSalesProducts) filters for products whose total sales exceed the average.

  • The final query ranks these products based on their total sales.

CTEs in UPDATE, DELETE, and MERGE Statements

When incorporated into UPDATE, DELETE, and MERGE operations, CTEs can simplify data manipulation tasks, especially when dealing with complex filters or hierarchical data.

Using CTE with an UPDATE statement

Assume we have an Employees table with an EmployeeSalary column. We want to give a 10% raise to all employees who have worked for the company for over 5 years.

-- Define a CTE to find employees hired more than 5 years ago WITH LongTermEmployees AS ( SELECT EmployeeID FROM Employees WHERE DATEDIFF(YEAR, HireDate, GETDATE()) > 5 ) -- Update salaries by 10% for long-term employees identified in the CTE UPDATE Employees SET EmployeeSalary = EmployeeSalary * 1.1 WHERE EmployeeID IN (SELECT EmployeeID FROM LongTermEmployees);

The CTE LongTermEmployees identifies employees who have worked more than five years. The UPDATE statement uses this CTE to selectively increase salaries.

Using CTE with a DELETE statement

Now assume we have a table named Products and want to delete all products that haven’t been sold in the last 2 years. We can use a CTE to filter the products:

-- Define a CTE to identify products not sold in the last 2 years WITH OldProducts AS ( SELECT ProductID FROM Products -- Use DATEADD to find products with a LastSoldDate more than 2 years ago WHERE LastSoldDate < DATEADD(YEAR, -2, GETDATE()) ) -- Delete products identified as old from the main table DELETE FROM Products WHERE ProductID IN (SELECT ProductID FROM OldProducts);

The CTE OldProducts identifies products that haven’t been sold in the last two years, and then the DELETE statement uses this CTE to remove those products.

Using CTE with a MERGE statement

The MERGE statement in SQL allows for conditional updates, inserts, or deletions in a target table based on data in a source table. In the following example, the CTE MergedInventory combines new and existing inventory data. The MERGE statement then updates quantities for existing products or inserts new products based on the CTE data.

-- CTE to merge new and existing inventory data WITH MergedInventory AS ( SELECT ni.ProductID, ni.Quantity AS NewQuantity, i.Quantity AS CurrentQuantity FROM NewInventoryData ni -- Use LEFT JOIN to include all new data, even if not in current inventory LEFT JOIN Inventory i ON ni.ProductID = i.ProductID ) -- Merge the prepared data into the Inventory table MERGE INTO Inventory AS i USING MergedInventory AS mi ON i.ProductID = mi.ProductID -- Update existing products with new quantities WHEN MATCHED THEN UPDATE SET i.Quantity = mi.NewQuantity -- Insert new products if they don't exist in the inventory WHEN NOT MATCHED BY TARGET THEN INSERT (ProductID, Quantity) VALUES (mi.ProductID, mi.NewQuantity);

Recursive Common Table Expressions (CTEs)

Recursive CTEs help perform advanced and repeated operations.

Introduction to recursive CTEs

Recursive CTEs are a special type of CTE that references itself within its definition, allowing the query to perform repeated operations. This makes them ideal for working with hierarchical or tree-structured data, such as organizational charts, directory structures, or product assemblies. The recursive CTE iteratively processes data, returning results step by step until a termination condition is met.

Anchor and recursive members

A recursive CTE consists of two main parts:

  • Anchor Member: The part that defines the base query that starts the recursion.
  • Recursive Member: The part that references the CTE itself, allowing it to perform the “recursive” operations.

Suppose we have an Employees table, where each row contains an EmployeeID, EmployeeName, and ManagerID. If we want to find all direct and indirect reports for a specific manager, we start with the anchor member identifying the top-level manager. The anchor member starts with the employee with EmployeeID = 1.

The recursive member finds employees whose ManagerID matches the EmployeeID from the previous iteration. Each iteration retrieves the next level of the hierarchy.

WITH EmployeeHierarchy AS ( -- Anchor member: select the top-level manager SELECT EmployeeID, EmployeeName, ManagerID, 1 AS Level FROM Employees WHERE EmployeeID = 1 -- Starting with the top-level manager UNION ALL -- Recursive member: find employees who report to the current managers SELECT e.EmployeeID, e.EmployeeName, e.ManagerID, eh.Level + 1 FROM Employees e INNER JOIN EmployeeHierarchy eh ON e.ManagerID = eh.EmployeeID ) SELECT EmployeeID, EmployeeName, Level FROM EmployeeHierarchy;

Potential Issues or Limitations of CTEs in SQL

Understanding the features and limitations of CTEs is important for writing logical and readable queries. Let us look at some limitations and potential issues of using CTEs in different databases.

SQL Server and Azure limitations

There are some environment-specific limitations for SQL CTEs when working with SQL Server or Azure Synapse Analytics. They include the following:

  • SQL Server: The default maximum recursion level for recursive CTEs is 100, which can be modified using the OPTION (MAXRECURSION) hint. If this limit is exceeded without adjustment, an error occurs. CTEs cannot be nested directly within each other or defined inside another CTE.

  • Azure Synapse Analytics: CTEs have limited support for certain SQL operations such as INSERT, UPDATE, DELETE, and MERGE. Also, recursive CTEs are not supported in the Azure Synapse Analytics cloud-based environments restricting the ability to perform certain hierarchical data operations.

If you find yourself working with SQL Server, know that DataCamp has a lot of great resources to help. To start, I recommend taking DataCamp’s Introduction to SQL Server course to master the basics of SQL Server for data analysis. You can try our SQL Server Developer career track, which covers everything from transactions and error handling to time series analysis. Our Hierarchical and Recursive Queries in SQL Server course gets right to the heart of how to write advanced queries in SQL Server, including methods involving CTEs.

Other potential issues

Although CTEs are useful for simplifying complex queries, there are some common pitfalls you should be aware of. They include the following:

  • Infinite Loops in Recursive CTEs: If the termination condition for a recursive CTE is not met, it can result in an infinite loop, causing the query to run indefinitely. To avoid the recursive CTE running infinitely, use the OPTION (MAXRECURSION N) hint to limit the maximum number of recursive iterations, where N is a specified limit.

  • Performance Considerations: Recursive CTEs can become resource-intensive if the recursion depth is high or large datasets are being processed. To optimize the performance, limit the data processed in each iteration and ensure appropriate filtering to avoid excessive recursion levels.

When to Use CTEs vs. Other Techniques

While CTEs are appropriate for simplifying queries involving repeated tasks, derived tables, views, and temp tables also serve similar purposes. The following table highlights the advantages and disadvantages of each method and when to use each.

Technique Advantages Disadvantages Suitable Use Case
CTEs Temporary scope within a single queryNo storage or maintenance requiredImproves readability by modularizing code Limited to the query in which they are defined Organizing complex queries, temporary transformations, and breaking down multi-step operations
Derived Tables Simplifies nested subqueriesNo need for permanent storage Harder to read/maintain for complex queriesIt cannot be reused multiple times within a query Quick, single-use transformations and aggregations within a query
Views Reusable across queriesCan enhance security by restricting data access Requires maintenance and can affect multiple queriesComplex views can impact performance Long-term reusable logic and data access control

Conclusion

Mastering CTEs takes practice, like anything: I recommend trying out DataCamp’s Associate Data Analyst in SQL career track to become a proficient data analyst. The Reporting in SQL course will also help you become proficient in building complex reports and dashboards for effective data presentation. Finally, you should obtain the SQL Associate Certification to showcase your mastery in using SQL to solve business problems and stand out among other professionals.

Source:
https://www.datacamp.com/tutorial/cte-sql