Stairway to T-SQL DML

Stairway to T-SQL DML Level 8: Using the ROLLUP, CUBE and GROUPING SET operator in a GROUP BY Clause

,

In this article I will be expanding on my discussion of the GROUP BY clause, which I started in level 7 of this stairway, by exploring the ROLLUP, CUBE and GROUPING SETS operators. These additional GROUP BY operators make it is easy to have SQL Server create subtotals, grand totals, a superset of subtotals, as well as multiple aggregate groupings in a single SELECT statement. I will explain the differences of each of these different grouping operators and provide you with examples to demonstrate how they work.

When are the ROLLUP, CUBE, and GROUPING SETS operators useful?

The ROLLUP, CUBE and GROUPING SETS operators produce multiple different aggregate groupings. These operators are useful when you want to aggregate your data multiple different ways. If you need to aggregate your data on several different column groupings then you could use one of these three operators along with the GROUP BY clause to create those additional aggregation groupings.

Suppose you wanted to keep track of where you spent your money, not only at the individual transaction level but also at different levels of summarization, or based on different categories of where you spent your money. With a normal GROUP BY clause you can get the total the amount you spent at Target, and Amazon.com on clothes. But what if you also wanted a single SELECT statement to return the total you spent for clothes regardless of where you bought them, as well as summarize your expenses based on additional categories, like food, utilities, etc, Plus what if you also want that single SELECT statement to return the grand total of all of your expenses. To accomplish these additional aggregation groupings you can use the ROLLUP, CUBE or GROUPING SETS operators.

In the sections below I will explain each of these operators in detail but first let me build some sample data.

Sample Data

In order to demonstrate how to use these different GROUP BY operators I will be creating a table that contains some sample data that I can summarize. To provide a little context that we all might be familiar with I will be creating a table that will contain check registry information. My table will be named CheckRegistry and it will contain the basic columns needed to record the information you might have filled out when writing a check. I will be populating my CheckRegistry with multiple rows of data, each of which represents a chec written to a business over a four month period.

If you want to follow along and run the examples I discussed later in the article, you can create my sample CheckRegistry table by running the code in Listing 1.

SET NOCOUNT ON;
USE tempdb;
GO
CREATE TABLE CheckRegistry (
CheckNumber smallint, 
PayTo varchar(20),
Amount money, 
CheckFor varchar(20),
CheckDate date);
INSERT INTO CheckRegistry VALUES
    (1000,'Seven Eleven',12.57,'Food','2011-07-12'),
    (1001,'Costco',128.57,'Clothes','2011-07-15'),
    (1002,'Costco',21.87,'Clothes','2011-07-18'),
    (1003,'AT&T',69.23,'Utilities','2011-07-25'),
    (1004,'Comcast',45.95,'Utilities','2011-07-25'),
    (1005,'Northwest Power',69.18,'Utilities','2011-07-25'),
    (1006,'StockMarket',59.25,'Food','2011-07-25'),
    (1007,'Safeway',120.21,'Food','2011-07-28'),
    (1008,'Albertsons',9.15,'Food','2011-08-02'),
    (1009,'Amazon',158.34,'Clothes','2011-08-05'),
    (1010,'Target',89.21,'Clothes','2011-08-06'),
    (1011,'AT&T',69.23,'Utilities','2011-08-25'),
    (1012,'Comcast',45.95,'Utilities','2011-08-25'),
    (1013,'Nordstrums',259.12,'Clothes','2011-08-27'),
    (1014,'AT&T',69.23,'Utilities','2011-09-25'),
    (1015,'Comcast',45.95,'Utilities','2011-09-25'),
    (1016,'Northwest Power',71.35,'Utilities','2011-09-25'),
    (1017,'Safeway',123.25,'Food','2011-09-25'),
    (1018,'Albertsons',65.11,'Food','2011-09-29'),
    (1019,'McDonalds',12.57,'Food','2011-09-29'),
    (1020,'AT&T',69.23,'Utilities','2011-10-25'),
    (1021,'Comcast',45.95,'Utilities','2011-10-25'),
    (1022,'Black Angus',159.23,'Food','2011-10-25'),
    (1023,'TicketMasters',59.87,'Entertainment','2011-10-30'),
    (1024,'WalMart',25.11,'Clothes','2011-10-31'),
    (1025,'Albertsons',158.50,'Food','2011-10-31');

Listing 1: Create the CheckRegistry Table

Using the ROLLUP operator to Create Subtotals and a Grand Total

The ROLLUP operator can be used as part of the GROUP BY clause to provide subtotals and a grand total while your data is being aggregated. To understand how to use the ROLLUP operator let me show you a few different examples. Let me first start by showing you how to create a grand total of the amounts in our CheckRegistry table, by using the code in Listing 2.

USE tempdb;
GO
SELECT CheckFor
     , SUM (Amount) As Total
FROM CheckRegistry
GROUP BY ROLLUP (CheckFor);             

Listing 2: Generating a Grand Total Row using ROLLUP on the CheckFor column

When the code in Listing 2 is executed, the output in Report 1 is returned.

CheckForTotal
----------
Clothes682.22
Entertainment59.87
Food719.84
Utilities601.25
NULL2063.18

Report 1: Output produced when running code in Listing 2

If you look at the output in Report 1 you can see that SQL Server summarized values in the Amount column for each of the different CheckFor values. Additionally a final row was displayed that contained the grand total of the Amount values for all the checks written. The grand total row can be identified by the NULL in the CheckFor column.

I’ve never liked the NULL being returned for the grand total value when you use the ROLLUP operator. I would prefer that it said “GRAND TOTAL”. To accomplish this you can modify the code from listing 2 slightly, as shown in Listing 3. In Listing 3, I added a COALESCE function on the CheckFor column. The COALESCE function allowed me to replace the NULL that was displayed in in Report 1 with “GRAND TOTAL”. You can run this code for yourself to verify that “GRAND TOTAL” is replacing the NULL.

USE tempdb;
GO
SELECT COALESCE (CheckFor,'GRAND TOTAL') As CheckFor
     , SUM (Amount) AS Total
FROM CheckRegistry
GROUP BY ROLLUP (CheckFor);

Listing 3: Replacing “NULL” reference with “GRAND TOTAL”

If you include more than one column in the ROLLUP specifications it will create subtotals for each column in the ROLLUP specification along with the grand total. The creation of the subtotal amounts will be done based on aggregating the columns in the ROLLUP specifications from right to left. In other words, if you have a ROLLUP clause that has only two columns, there will be two different sets of subtotals, one for each column in the ROLLUP clause, where the more granular subtotals will be based on the right most column. If you have three columns identified in the ROLLUP specification, then SQL Server will create the first set of subtotal amounts based on the right most column (third column identified in the ROLLUP specification), and then the second set of subtotal will be based on the second column identified in the ROLLUP specification. To better understand the aggregation order from right to left let me go through a couple of examples.

For the first example my goal is to summarize my check registry information by month. I want to know how much I spent every month for clothes, entertainment, food, etc. In addition, I also want a rolled up amount that shows me the total amount I spent each month. I can do this by identifying my two different rollup criteria in the ROLLUP specification. My first criterion will identify what I wrote my checks for, which is specified by using the CheckFor column. The second criterion will identify the month the check was written, which can be calculated using the MONTH function against the CheckDate column.

To get my data rolled up in the correct order, I will need to specify each of my criteria in the correct order. Remember now, data is aggregated using the columns in the ROLLUP specifications in right to left order. Or another way to say that is the column that requires the most granular rollup needs to be specified last in the ROLLUP criteria. In my requirements above I wanted the CheckFor column to be aggregated first. To meet my summarization requirements I can use the code in Listing 4.

USE tempdb;
GO
SELECT MONTH(CheckDate) AS CheckMonth 
     , CheckFor
     , SUM (Amount) AS Total
FROM CheckRegistry
GROUP BY ROLLUP (MONTH(CheckDate),CheckFor);

Listing 4: Generating Subtotals and Grand Total

When I run the code in Listing 4 I get the results in Report 2:

CheckMonthCheckForTotal
---------------
7Clothes150.44
7Food192.03
7Utilities184.36
7NULL526.83
8Clothes506.67
8Food9.15
8Utilities115.18
8NULL631.00
9Food200.93
9Utilities186.53
9NULL387.46
10Clothes25.11
10Entertainment59.87
10Food317.73
10Utilities115.18
10NULL517.89
NULLNULL2063.18

Report 2: Output from running Listing 4

In the output that is displayed in Report 2 you can see that for each CheckMonth there is an aggregated amount displayed for each of the different CheckFor values, which is found in the Total column. These values were created by specifying CheckFor as the right most criterion in the ROLLUP specification. Also included in the results is a row for every CheckMonth value that has a value of NULL in the CheckFor column. This monthly value is the total amount of checks written for that month, and was created by specifying MONTH(CheckDate) as the left most column in the ROLLUP criteria. If you look the last row in the results you can see a row that has NULL in both the CheckMonth and CheckFor columns. This is the row that contains the grand total for the Amount column in my CheckRegistry table.

If you specify more than two columns in the ROLLUP specification SQL Server will create subtotals for each of the additional columns. In Listing 5 is an example that creates two different subtotal amounts by specifying three columns in the ROLLUP specification.

USE tempdb;
GO
SELECT MONTH(CheckDate) AS CheckMonth 
     , CheckFor
     , PayTo
     , SUM (Amount) AS Total
FROM CheckRegistry
GROUP BY ROLLUP (MONTH(CheckDate),CheckFor,PayTo);

Listing 5: Creating Multiple Subtotal rows

I’ll leave it up to you to run this code and review the output. When you do that you should see subtotals for both the Checkfor and CheckMonth columns.

Using the CUBE Operator to create a Superset of Aggregated Values

The name CUBE comes from concept of aggregating data along N dimensions of data, which when aggregating data across three dimensions just happens to be represented by a CUBE. If you would like to find out more about the research Microsoft did and the concepts behind the CUBE operator there is a great research paper you can read that can be found here: http://research.microsoft.com/apps/pubs/default.aspx?id=69578.

The CUBE operator like the ROLLUP operator creates subtotals and grand totals amounts, but it also creates aggregates across all the different columns identified in the CUBE specifications or what I will call a superset of aggregated values. Therefore the number of possible summarized values could be substantial when you consider that the superset would contain summarized values for all the permutations of the columns involved in the CUBE specification. To demonstrate how this works let me go through a couple examples, starting with the code in Listing 6.

USE tempdb;
GO
SELECT CheckFor
     , SUM (Amount) As Total
FROM CheckRegistry
GROUP BY CUBE(CheckFor);             

Listing 6: Creating SubTotals and GrandTotal using Cube Specification

If you run the code in Listing 6 you will see that the results produced will contain aggregated values for the Amount column for each of the CheckFor values. Additionally, the results will contain a grand total amount for all the Amount values. The results are the same as the results produced by Listing 2 using ROLLUP and shown in Report 1. This is because there is only a single value referenced in the CUBE specification, so SQL Server only had to aggregate values based on a single column.

To better demonstrate how the CUBE operator creates aggregates for every permutation of the columns referenced in the CUBE specifications let me run the code in Listing 7.

USE tempdb;
GO
SELECT MONTH(CheckDate) AS CheckMonth 
     , CheckFor
     , SUM (Amount) AS Total
FROM CheckRegistry
GROUP BY CUBE (MONTH(CheckDate),CheckFor);

Listing 7: Using two columns in the CUBE Specification

This code will create aggregates for all the permutations of MONTH(CheckDate) and CheckFor values. That means this code will produce summarized values for the following permutations:

  • CheckFor
  • CheckFor and MONTH(CheckDate)
  • MONTH(CheckDate)

This can be see these multiple aggregations if you review the result set in Report 3, which was produced when I ran the code in Listing 7.

CheckMonthCheckForTotal
---------------
7Clothes150.44
8Clothes506.67
10Clothes25.11
NULLClothes682.22
10Entertainment59.87
NULLEntertainment59.87
7Food192.03
8Food9.15
9Food200.93
10Food317.73
NULLFood719.84
7Utilities184.36
8Utilities115.18
9Utilities186.53
10Utilities115.18
NULLUtilities601.25
NULLNULL2063.18
7NULL526.83
8NULL631.00
9NULL387.46
10NULL517.89

Report 3: Results when using two columns in CUBE specifications

In the results in Report 3, you can see that by using the CUBE specification SQL Server aggregated each CheckFor value by month and then calculated a subtotal for each CheckFor value. Additionally SQL Server calculating the grand total (row with NULL in both the CheckMonth and CheckFor columns). The last few rows in the report show monthly subtotal aggregations for each CheckMonth.

If you start adding additional columns to the CUBE specification, the number of aggregations that SQL Server will created will grow substantially. This is because every combination of the columns in the CUBE specification will be aggregated. Run the code in Listing 8 to see how many summarized values will be produced when using a CUBE specification with three columns.

USE tempdb;
GO
SELECT MONTH(CheckDate) AS CheckMonth 
     , CheckFor
     , PayTo
     , SUM (Amount) AS Total
FROM CheckRegistry
GROUP BY CUBE (MONTH(CheckDate),CheckFor, PayTo);

Listing 8: CUBE specification using three columns

Using the CUBE specification is very useful when you want to summarize data in a Data Warehouse situation. The CUBE specification provides an easy mechanism to create subtotals across many different combinations of dimensions.

Creating Multiple Aggregated Groupings using the GROUPING SETS Operator

Using the GROUPING SETS operator with a GROUP BY clause allows you to create a record set that will aggregate your data multiple different ways. The GROUPING SETS specification allows you an alternative to writing a UNION ALL query where each SELECT statement is grouping data on a different column. The number of different aggregates is based on the number of columns, or sets of columns provided in the GROUPING SETS specification. Let me demonstrate this by first showing you a GROUPING SETS query (Listing 9) and then the equivalent UNION ALL query (Listing 10).

USE tempdb;
GO
SELECT MONTH(CheckDate) AS CheckMonth 
     , CheckFor
     , SUM (Amount) AS Total
FROM CheckRegistry
GROUP BY GROUPING SETS (MONTH(CheckDate),CheckFor);

Listing 9: GROUPING SETS query

When I run my GROUPING SETS query in Listing 9, I get the result set show in Report 4:

CheckMonthCheckForTotal
---------------
NULLClothes682.22
NULLEntertainment59.87
NULLFood719.84
NULLUtilities601.25
7NULL526.83
8NULL631.00
9NULL387.46
10NULL517.89

Report 4: Result set produced with running code in Listing 9.

By looking at Report 4, you can see that the first 4 rows of output have aggregates create by grouping my sample data based on the CheckFor column, and then the last 4 rows are aggregated values based on the calculated CheckMonth column. My GROUPING SETS query in Listing 9 is equivalent to running the UNION ALL query in Listing 10.

USE tempdb;
GO
SELECT NULL AS CheckMonth
     , CheckFor
     , SUM (Amount) AS Total
FROM CheckRegistry
GROUP BY CheckFor 
UNION ALL
SELECT MONTH(CheckDate) AS CheckMonth
     , NULL as CheckFor  
     , SUM (Amount) AS Total
FROM CheckRegistry
GROUP BY MONTH(CheckDate);

Listing 10: UNION ALL query equivalent to Listing 9

The GROUPING SETS operator can also be used to create similar result sets produced by ROLLUP and CUBE operators. To learn more about this refer the books online topic “GROUPING SETS Equivalents”.

You can use the GROUPING SETS operator to aggregate values based on more than two columns. You do this by putting the multiple columns you want to group by inside of parentheses. The code in Listing 11 demonstrates this:

USE tempdb;
GO
SELECT MONTH(CheckDate) AS CheckMonth 
     , CheckFor
     , PayTo
     , SUM (Amount) AS Total
FROM CheckRegistry
GROUP BY GROUPING SETS ((MONTH(CheckDate),CheckFor), PayTo);

Listing 11: Aggregating on multiple columns

I am creating two different aggregated sets in the code in Listing 11. The first aggregated set will be based based on PayTo column, and the second aggregated set will be based on the combination of the month the check was written and the CheckFor column.

You can even create a grand total amount by introducing the empty set “()” representation into the GROUPING SET specifications. If you run the code in Listing 12 you will see that one extra row is created in addition the rows shown in Report 3. This additional grand total row will have a NULL value identified for both the CheckMonth and CheckFor columns.

USE tempdb;
GO
SELECT MONTH(CheckDate) AS CheckMonth 
     , CheckFor
     , SUM (Amount) AS Total
FROM CheckRegistry
GROUP BY GROUPING SETS (MONTH(CheckDate),CheckFor,());

Listing 12: Creating a Grand Total row using the empty set in the GROUPING SETS specification

If you want to produce multiple aggregated groupings with minimal code then the GROUPING SETS operator is a way to accomplish this.

More than One Way to Aggregate Data Using ROLLUP, CUBE and GROUPING SETS

As in the old saying “There is more than one way to skin a cat”, in T-SQL there is also more than one way to aggregate data. In addition to using the simple GROUP BY clause that was introduced in level 7 of this stairway, you can also use the ROLLUP, CUBE and GROUPING SETS specification to create subtotal, grand total, and a superset of aggregated values. These additional GROUP BY operators are very useful in creating different aggregations to meet your application needs.

This article is part of the parent stairway Stairway to T-SQL DML

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating