SQLServerCentral Article

Can You Compute?

,

Introduction

I was reviewing some VB code the other day when I saw that the developer had submitted two queries.

SELECT CT.CustomerTypeDescription 
, Count(*) AS Customers
 FROM Customer AS C
INNER JOIN CustomerType AS CT
ON C.CustomerTypeId = CT.CustomerTypeId
 GROUP BY
CT.CustomerTypeDescription

and

SELECT Count(*) AS Customers
 FROM Customer

Nothing radical there. If the customers table had been large then I would probably have told the developer to write

SELECT MAX(rowcnt) AS Customers
 FROM dbo.sysindexes
 WHERE id = Object_Id( 'dbo.Customers')
 AND indid IN (0,1)

This is just one of the many useful tips I have picked up from www.sqlservercentral.com. However, as I delved deeper into the code I came across more query pairs where the first query produced a subtotal aggregate and the 2nd produced a grand total.

Many of the queries required joins across four or more tables in order to produce their results and therefore the SysIndexes query cannot work. Generally these queries would not be an issue, however if the queries are being run against a database in a high traffic web site you want to save every bit of resources that you can.

A NORTHWIND example

Let us look at a similar situation within the NorthWind database. We want to get data on the number of products bought by country and product category

Northwind relationships for customer country by category sales

The base query might be as follows

SELECT Cust.country ,
       Cat.CategoryName ,
       Count(*) AS ProductsBought
 FROM dbo.Customers AS Cust
   INNER JOIN dbo.Orders AS Ord
      ON Cust.CustomerId = Ord.CustomerId
   INNER JOIN dbo.[Order Details] AS OrdDet
      ON Ord.OrderId = OrdDet.OrderId
   INNER JOIN dbo.Products AS Prod
      ON OrdDet.ProductId = Product.ProductId
   INNER JOIN dbo.Categories AS Cat
      ON Product.CategoryId = Cat.CategoryId
 GROUP BY
   Cust.Country ,
Cat.CategoryName

This has a subtree cost of 0.241. The 2nd query would be as follows

SELECT
 Cust.country ,
 Count(*) AS ProductsBought
 FROM dbo.Customers AS Cust
   INNER JOIN dbo.Orders AS Ord
  ON Cust.CustomerId = Ord.CustomerId
   INNER JOIN dbo.[Order Details] AS OrdDet
     ON Ord.OrderId = OrdDet.OrderId
INNER JOIN dbo.Products AS Prod
  ON OrdDet.ProductId = Product.ProductId
 GROUP BY
   Cust.Country

This has a subtree cost of 0.249. The developer calculated the grand total for themselves in code but they could just as easilty fired off a third query adding another subtree cost of 0.214. So we are looking at a total cost for these queries of 0.490 for the subtotals and 0.704 if we want the grand total as well.

WITH ROLLUP

Fortunately SQL has an option on the GROUP BY clause WITH ROLLUP. If we add this clause to our first Northwind example then the returned recordset shows a number of differences

SELECT
  Cust.country ,
  Cat.CategoryName ,
  Count(*) AS ProductsBought
 FROM dbo.Customers AS Cust
INNER JOIN dbo.Orders AS Ord
ON Cust.CustomerId = Ord.CustomerId
INNER JOIN dbo.[Order Details] AS OrdDet
ON Ord.OrderId = OrdDet.OrderId
INNER JOIN dbo.Products AS Prod
ON OrdDet.ProductId = Product.ProductId
INNER JOIN dbo.Categories AS Cat
ON Product.CategoryId = Cat.CategoryId
 GROUP BY
  Cust.Country ,
  Cat.CategoryName
 WITH ROLLUP

  • Our recordset is now sorted in order of the GROUP BY columns
  • The last record for each country has a NULLcategory with the ProductsBought column holding the total products bought for the country.
  • The final record has a NULL for both Country and CategoryName and the ProductsBought column contains the grand total for all records.

Sample results from the WITH ROLLUP query

And perhaps most important of all the subtree cost for the query is only 0.253.

GROUPING

In our examples so far we have been fortunate in that the data that we are aggregating contains no NULL values. This has meant that all we need to do to get our subtotal values is check for NULL values in the resulting recordset.

Fortunately we have an aggregate function called GROUPING whose purpose is to return a value of one if the row is returned as a result of a ROLLUP or CUBE operation. In all other cases it will return zero. So we alter our example query to the query shown below:

SELECT
  Cust.country ,
  Cat.CategoryName ,
  Count(*) AS ProductsBought,
  GROUPING(Cust.country) AS SubtotalForCountries
  GROUPING(Cat.CategoryName) AS SubtotalForCategories
 FROMdbo.Customers AS Cust
INNER JOIN dbo.Orders AS Ord
ON Cust.CustomerId = Ord.CustomerId
NNER JOIN dbo.[Order Details] AS OrdDet
ON Ord.OrderId = OrdDet.OrderId
INNER JOIN dbo.Products AS Prod
ON OrdDet.ProductId = Product.ProductId
INNER JOIN dbo.Categories AS Cat
ON Product.CategoryId = Cat.CategoryId
 GROUP BY
  Cust.Country ,
  Cat.CategoryName

Our result set will now look like the following.

Sample results from the GROUPING query

WITH CUBE

Changing the WITH ROLLUP to WITH CUBE produces a result set as follows

Sample results from the GROUPING query

As you can see in addition to our subtotals for each country we have also gained subtotals for each category. Again the execution plan for this reveals a subtree cost of 0.288. This is dramatically lower than having to provide the same information with individual queries.

The downsides

There are only two downsides that I have come across.

  1. You cannot use COUNT(DISTINCT columnname) with these GROUP BY options
  2. Programmers may grumble because they have to write code to deal with the totals being included in the result set

With the increasing take up of .NET, however, the programmer has plenty of tools to be able to deal with the 2nd point.

COMPUTE and COMPUTE BY

So far we have seen that we can dramatically cut down the cost of producing multiple aggregates for a GROUP BY query, but what if we want aggregates for a straight SELECT query?

Again SQL contains two options for use with the ORDER BYclause:

  • COMPUTE
  • COMPUTE BY

Using the NorthWind database let us look at a query to provide us with the cost of goods in stock.

SELECT
  Prod.ProductId ,
  Prod.ProductName ,
  Cat.CategoryName ,
  Prod.UnitsInStock * Prod.UnitPrice AS StockValue
 FROM dbo.Products AS Prod
INNER JOIN dbo.Categories AS Cat
ON Product.CategoryId = Cat.CategoryId
 ORDER BY
  Cat.CategoryName
 COMPUTE SUM(Prod.UnitsInStock * Prod.UnitPrice)

This query will actually produce two recordsets. One for our sorted results and a 2nd recordset containing the grand total stock value.

Sample results from the COMPUTE query

If we wanted to produce a subtotal for every category then we alter our query as follows:

SELECT
  Prod.ProductId ,
  Prod.ProductName ,
  Cat.CategoryName ,
  Prod.UnitsInStock * Prod.UnitPrice AS StockValue
 FROMdbo.Products AS Prod
INNER JOIN dbo.Categories AS Cat
ON Product.CategoryId = Cat.CategoryId
 ORDER BY
  Cat.CategoryName
 COMPUTE SUM(Prod.UnitsInStock * Prod.UnitPrice)  
   BY Cat.CategoryName

If you run this query you will find that you will get a pair of result sets for every category, one containing our ordered records and the other containing the subtotal for that category.

If we wanted a final grand total as well then we can add a 2nd COMPUTE to our query

SELECT
  Prod.ProductId ,
  Prod.ProductName ,
  Cat.CategoryName ,
  Prod.UnitsInStock * Prod.UnitPrice AS StockValue
 FROM dbo.Products AS Prod
INNER JOIN dbo.Categories AS Cat
ON Product.CategoryId = Cat.CategoryId
 ORDER BY Cat.CategoryName
 COMPUTE SUM(Prod.UnitsInStock * Prod.UnitPrice) 
   BY Cat.CategoryName
 COMPUTE SUM(Prod.UnitsInStock * Prod.UnitPrice)

It doesn't matter what order we put our COMPUTE statements, the grand total will always appear as the last recordset.

In this simple example the inclussion of the additional COMPUTE statements had no affect on the execution plan.

Conclusion

The use of WITH ROLLUP, WITH CUBE and GROUPING should not pose too much of a challenge for developers. The use of COMPUTE and COMPUTE BY will need to be handled with care. There is a danger that the system will produce multiple large results sets. In addition your developers must be familiar with accessing multiple results sets. Dealing with multiple recordsets isn't really complicated it is just a slight increase in complexity from the developer's perspective. See http://msdn.microsoft.com for examples.

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating