SQLServerCentral Article

Common Table Expression Basics

,

Introduction

Firstly, we need to understand what is a common table expression. Common table expressions, also abbreviated as CTE, are virtual tables which are formed by collecting and formatting data from one or multiple source table(s). These virtual tables are not created permanently, which means they are only available during the query execution. Once the query execution is complete, the tables are automatically removed.

Let us look into the proper way of writing a CTE first:

WITH our_cte AS (
  SELECT col1, col2, col3
  FROM TABLE
)
SELECT col1, col3
FROM our_cte 
WHERE ....

The name of our CTE is our_cte. This CTE collects the data of 3 columns (col1, col2 and col3) from the table TABLE. Finally, we are returning the data of col1 and col3 from our_cte based on some filtering conditions in the where clause.

This code is simple and looks familiar to the traditional SQL queries we write as:

SELECT COL1, COL3
from TABLE
WHERE ...

But this was just an example to understand how CTEs are written syntactically correctly.

Learning CTEs with an Example

Let us first create a table named EMP using the following DDL script:

CREATE TABLE emp(
emp_id INT PRIMARY KEY,
emp_name VARCHAR (50) NOT NULL,
emp_email VARCHAR (50) NOT NULL,
emp_mgr_id int NULL
);

Next, we will insert few sample records as follows:

INSERT INTO EMP VALUES (101, 'SABYASACHI MUKHERJEE', 'sabya@gmail.com', 102);
INSERT INTO EMP VALUES (102, 'DEBSENA SADHUKHAN', 'deb@gmail.com', 103);
INSERT INTO EMP VALUES (103, 'SHRAYASI CHAKRABORTY', 'shrayasi@gmail.com', 104);
INSERT INTO EMP VALUES (104, 'RAHUL PAUL', 'rahul@gmail.com', 105);
INSERT INTO EMP VALUES (105, 'SAURAV KUMAR', 'saurav@gmail.com', 106);
INSERT INTO EMP VALUES (106, 'JAYDIP DUTTA', 'jaydip@gmail.com', 107);
INSERT INTO EMP VALUES (107, 'INDRANIL SOM', 'indranil@gmail.com', 108);
INSERT INTO EMP VALUES (108, 'PARTHA MUKHERJEE', 'partha@gmail.com', NULL);
INSERT INTO EMP VALUES (109, 'SURAJIT BHATTACHARYYA', 'surajit@gmail.com', 110);
INSERT INTO EMP VALUES (110, 'ANUBHAv BASSI', 'anubhav@gmail.com', 113);
INSERT INTO EMP VALUES (111, 'ABHISHEKH UPAMANNYU', 'abhishekh@gmail.com', 110);
INSERT INTO EMP VALUES (113, 'ANUPAM MITTAL', 'anupam@gmail.com', NULL);

Taking a close look at the above table, we see that this employee table has the records of all the employees with their reporting manager id. The manager id in turn is the employee id of the manager, who is also an employee.

Now, suppose we have a requirement to return the entire organization tree starting from Sabyasachi Mukherjee (employee id 101). We need to find out who is the manager of employee id 101 and also who is the manager of that manager and so on until we reach to that employee who has no further reporting manager.

I we run the following query (we will explain this soon), we would get the list of all the employees who are sitting atop the employee with id 101.

WITH EMPLOYEE_CTE(EMP_ID,EMP_NAME,EMP_MGR_ID) AS  
(  
   SELECT EMP_ID, EMP_NAME, EMP_MGR_ID 
   FROM EMP AS EMPLOYEE 
   WHERE EMPLOYEE.EMP_ID=101 
   UNION ALL  
   SELECT MANAGER.EMP_ID, MANAGER.EMP_NAME, MANAGER.EMP_MGR_ID  
   FROM EMP AS MANAGER   
   INNER JOIN EMPLOYEE_CTE MANAGER_ITERATOR 
   ON MANAGER.EMP_ID = MANAGER_ITERATOR.EMP_MGR_ID  
)  
SELECT * FROM EMPLOYEE_CTE;

The result set would look like this:

We can see the entire organization tree of the employee id 101 in ascending order (starting from the junior most to the senior most employee).

Here is an explanation on how the query works. This part of the query helps us to first get the details of the employee having the employee id 101.

SELECT EMP_ID, EMP_NAME, EMP_MGR_ID 
FROM EMP AS EMPLOYEE 
WHERE EMPLOYEE.EMP_ID=101

Once we have the main employee in question selected, we are doing a UNION ALL. Inside the UNION ALL we have this code. The first part retrieves the details of the employee and in the second part, it makes sure that this details are of the manager of the last employee retrieved.

SELECT MANAGER.EMP_ID, MANAGER.EMP_NAME, MANAGER.EMP_MGR_ID  
FROM EMP AS MANAGER   
INNER JOIN EMPLOYEE_CTE MANAGER_ITERATOR 
ON MANAGER.EMP_ID = MANAGER_ITERATOR.EMP_MGR_ID

 

So in a nutshell, the query before UNION ALL clause identifies the employee to start with which is the one having the employee id as 101. And the latter part of the query specifies to identify the manager information of the employees in a loop and keep adding them into the CTE. Finally we can do a select all from the CTE we created.

Now, this was a very simple example of using a CTE. We will now see some more practical scenarios where we will create multiple CTEs at once and use them combined to get the desired output.

Using Multiple CTEs

For this, we will create a new table for a book store where we will have the sales of few books recorded. We can use the following query to create the table:

CREATE TABLE BOOK_STORE(
BRANCH VARCHAR(255) NOT NULL,
DATE DATE NOT NULL,
AUTHOR VARCHAR(255) NOT NULL,
BOOK_NAME VARCHAR(255) NOT NULL,
QTY INT NOT NULL,
UNIT_PRICE INT NOT NULL
);

Next we will insert some records into the table which we will use later to prepare some reports. We can use the following DML script to insert data:

INSERT INTO BOOK_STORE VALUES ('KOLKATA-GARIA', GETDATE()-1000, 'HARSH MITTAL', 'HOW TO SAVE WATER', 10, 250);
INSERT INTO BOOK_STORE VALUES ('LONDON', GETDATE()-100, 'JOHN DOVE', 'MOTIVATIONAL MONDAYS', 8, 320);
INSERT INTO BOOK_STORE VALUES ('SPAIN-BARCELONA', GETDATE()-20, 'JIM CHRISTY', 'TREMENDOUS TUESDAYS', 2, 100);
INSERT INTO BOOK_STORE VALUES ('BAHAMAS', GETDATE()-320, 'KWINTER SHARLOCK', 'WEAKER WEDNESDAYS', 3, 110);
INSERT INTO BOOK_STORE VALUES ('AMERICA', GETDATE()-870, 'GAVIN HURRAY', 'TOUGHER THURSDAYS', 9, 97);
INSERT INTO BOOK_STORE VALUES ('LOS ANGELES', GETDATE()-360, 'ADARSH SMITH', 'FREAKING FRIDAYS', 4, 340);
INSERT INTO BOOK_STORE VALUES ('AFRICA', GETDATE()-420, 'ACROBAT SYMPHONY', 'SATISFYING SATURDAYS', 21, 342);
INSERT INTO BOOK_STORE VALUES ('KOLKATA-HOWRAH', GETDATE()-120, 'HANDRES ANDREW', 'GO GREEN', 15, 20);
INSERT INTO BOOK_STORE VALUES ('SPAIN-MADRID', GETDATE()-20, 'ADAM GILCHRIST', 'GROCERY GAINING', 7, 38);
INSERT INTO BOOK_STORE VALUES ('AMERICA', GETDATE()-670, 'RIDSOM BANSLEY', 'COMPUTER ENGINEERING', 23, 98);

Suppose we want to create a report with the total sales in all Kolkata outlets and we also want the revenue for each specific outlet in Kolkata alongside. So, we create two CTEs and then join them in the main query.

WITH KOLKATA_GARIA_SALES AS (
  SELECT
    SUM(UNIT_PRICE * QTY) AS SALES
  FROM BOOK_STORE
  WHERE BRANCH = 'KOLKATA-GARIA'
),
KOLKATA_HOWRAH_SALES AS (
  SELECT
    SUM(UNIT_PRICE * QTY) AS SALES
  FROM BOOK_STORE
  WHERE BRANCH = 'KOLKATA-HOWRAH'
)
SELECT
  KOLKATA_GARIA_SALES.SALES + KOLKATA_HOWRAH_SALES.SALES AS KOLKATA_SALES,
  KOLKATA_GARIA_SALES.SALES AS KOLKATA_GARIA_BRANCH_SALES,
  KOLKATA_HOWRAH_SALES.SALES AS KOLKATA_HOWRAH_BRANCH_SALES
FROM KOLKATA_GARIA_SALES, KOLKATA_HOWRAH_SALES;

In the query above, we define two CTEs, KOLKATA_GARIA_SALES and KOLKATA_HOWRAH_SALES, to obtain the total sales for each branch based out in Kolkata. Finally, we join both CTEs and calculate the total sales for Kolkata by adding up the sales of the two branches.

The output for the above query is:

We can see such scenarios in real life where we need to get the sales of all the outlets of a city for a specific month and also individual sales of the outlets in the city. In such case, we just need to add one more condition for the month check in the individual CTEs.

Nested CTE

Finally, we will look into some more advanced use of CTEs in our day to day lives. So for this we will have a practical case study again.

We need to generate a report for all the books that are priced over 80 rupees and the quantity of such books sold by AMERICA BARNCH.

WITH OVER_80_BOOKS AS (
  SELECT DISTINCT
    BOOK_NAME,
    UNIT_PRICE
  FROM BOOK_STORE
  WHERE UNIT_PRICE >= 80
),
AMERICA_OVER_80 AS (
  SELECT
    OV80B.BOOK_NAME,
    OV80B.UNIT_PRICE,
    COALESCE(SUM(BS.QTY), 0) AS QUANTITY_SOLD
  FROM OVER_80_BOOKS OV80B
  LEFT JOIN BOOK_STORE BS
  ON OV80B.BOOK_NAME = BS.BOOK_NAME AND BS.BRANCH = 'AMERICA'
  GROUP BY OV80B.BOOK_NAME, OV80B.UNIT_PRICE
)
SELECT BOOK_NAME, UNIT_PRICE, QUANTITY_SOLD
FROM   AMERICA_OVER_80;

The first CTE is OVER_80_BOOKS, which selects all the items priced greater than 80 rupees. The second CTE is AMERICA_OVER_80, which selects the quantity sold by AMERICA outlet for each item included in OVER_80_BOOKS. Note that this query has a nested CTE – the FROM in the second CTE referring to the first. We use LEFT JOIN BOOK_STORE because AMERICA may not have sold every item in OVER_80_BOOKS.

The result of the query is:

Clearly, we can see that all books above the price of 80 rupees are selected but quantity sold only by AMERICA outlet are mentioned while for other books which were never sold by AMERICA outlet is 0.

Conclusion

Common Table Expressions (CTE) are really powerful SQL queries. They can be used to create easily readable queries. They also help us in formatting the resultant data as expected.

Rate

5 (3)

You rated this post out of 5. Change rating

Share

Share

Rate

5 (3)

You rated this post out of 5. Change rating