One of the most requested pieces of code on the Internet forums about SQL Server these days, is the code for making a crosstab query. There was no native support for that in version 6.5. Not even version 7.0 or version 2000 had this support.
So when Microsoft announced that SQL 2005 would support crosstab queries we all (well, at least me) cheered and anticipated that this should solve a number of difficulties. Many of us have worked with MS Access since version 2.0, and in this application, pivot tables are breazes.
What happened after the release of SQL 2005? When we all were excited about the new Pivot operator, equally disappointed we got when we realized we had to hardwire the columns, just as we always had done, by using keyword CASE.
-- SQL 2005 syntax USE MyDatabase GO WITH ProductSales(ProductID, OrderYear, OrderTotal) AS ( SELECT det.productID, YEAR(hdr.orderdate), det.linetotal FROM sales.salesorderdetail det JOIN sales.salesorderheader hdr ON det.salesorderid = hdr.salesorderid ) SELECT ProductSalesPivot.productID, Total_Sales_2001 = ISNULL([2001], 0), Total_Sales_2002 = ISNULL([2002], 0), Total_Sales_2003 = ISNULL([2003], 0), Total_Sales_2004 = ISNULL([2004], 0) FROM ProductSales PIVOT ( SUM(OrderTotal) FOR OrderYear IN ([2001], [2002], [2003], [2004]) ) AS ProductSalesPivot ORDER BY ProductSalesPivot.ProductID -- SQL 2000 syntax USE MyDatabase SELECT OfficeName, SUM(CASE WHEN MONTH(OrderDate) = 1 THEN OrderAmount ELSE 0 END) AS 'January', SUM(CASE WHEN MONTH(OrderDate) = 2 THEN OrderAmount ELSE 0 END) AS 'February', SUM(CASE WHEN MONTH(OrderDate) = 3 THEN OrderAmount ELSE 0 END) AS 'March', SUM(CASE WHEN MONTH(OrderDate) = 4 THEN OrderAmount ELSE 0 END) AS 'April', SUM(CASE WHEN MONTH(OrderDate) = 5 THEN OrderAmount ELSE 0 END) AS 'May', SUM(CASE WHEN MONTH(OrderDate) = 6 THEN OrderAmount ELSE 0 END) AS 'June', SUM(CASE WHEN MONTH(OrderDate) = 7 THEN OrderAmount ELSE 0 END) AS 'July', SUM(CASE WHEN MONTH(OrderDate) = 8 THEN OrderAmount ELSE 0 END) AS 'August', SUM(CASE WHEN MONTH(OrderDate) = 9 THEN OrderAmount ELSE 0 END) AS 'September', SUM(CASE WHEN MONTH(OrderDate) = 10 THEN OrderAmount ELSE 0 END) AS 'October', SUM(CASE WHEN MONTH(OrderDate) = 11 THEN OrderAmount ELSE 0 END) AS 'November', SUM(CASE WHEN MONTH(OrderDate) = 12 THEN OrderAmount ELSE 0 END) AS 'December' FROM MyTable WHERE YEAR(OrderDate) = 2006 GROUP BY OfficeName ORDER BY OfficeName
This works fine, as long as the developer knows how many columns there will be, as in the example above. But how to solve a problem when the columns are not known? See following question.
USE Pubs SELECT Stores.stor_name StoreName, Titles.title BookTitle, Titles.type BookCategory, Sales.qty Books FROM Sales INNER JOIN Stores ON Stores.stor_id = Sales.stor_id INNER JOIN Titles ON Titles.title_id = Sales.title_id
The resultset looks like something the following table
StoreName | BookTitle | BookCategory | Books |
Eric the Read Books | The Busy Executive's Database Guide | business | 5 |
Barnum's | Secrets of Silicon Valley | popular_comp | 50 |
News & Brews | Fifty Years in Buckingham Palace Kitchens | trad_cook | 20 |
Fricative Bookshop | Sushi, Anyone? | trad_cook | 20 |
Doc-U-Mat: Quality Laundry and Books | Emotional Security: A New Algorithm | psychology | 25 |
Fricative Bookshop | Straight Talk About Computers | business | 15 |
What will you do when your end users asks you for a report based on the number of sold books for all stores and categories? The report should be in a crosstab format, where the rows represent the stores and the columns represents the categories, such as the following table.
StoreName | business | Mod_cook | popular_comp | psychology | trad_cook |
Barnum's | 0 | 0 | 50 | 75 | 0 |
Bookbeat | 35 | 15 | 30 | 0 | 0 |
Eric the Read Books | 5 | 0 | 0 | 3 | 0 |
Fricative Bookshop | 50 | 10 | 0 | 0 | 0 |
News & Brews | 0 | 0 | 0 | 10 | 80 |
The challenge here is that you don’t know the number of columns.
I will now share my code with you, with an explanation how it works. It will work on all versions on Microsoft SQL Server, starting from version 6.5, 7.0, 2000 and 2005.
First we must fetch the data we want to work with. Create a temporary table #Aggregates where we will store the minimum dataset representing the final pivot table. The sizes of the ColumnText and RowText field should be set to a size matching the fields in the source table.
CREATE TABLE #Aggregates ( RowText VARCHAR(50), ColumnText VARCHAR(50), CellData INT )
These are a few optimization techniques we can use. First, pre-aggregate all data. Second, filter out data where sum is 0. Remember, always keep the dataset as small as possible.
INSERT INTO #Aggregates ( RowText, ColumnText, CellData ) SELECT Table1.A, Table1.B, SUM(Table1.C) FROM Table1 --INNER JOIN Table2 ON Table2.ID = Table1.ID --WHERE Table1.D BETWEEN '10/12/2005' AND '6/14/2006' GROUP BY Table1.A, Table1.B HAVING SUM(Table1.C) <> 0
We can speed things up a little by creating a covering index on the fetched aggregated data. In this case we have COUNT as aggregate function. If you want COUNT, MIN, MAX or another function, it is
ok.
CREATE UNIQUE INDEX IX_Aggregates ON #Aggregates (RowText, ColumnText, CellData)
Now we have stored the data to work with. All possible column, rows and cells are stored in #Aggregates.
Now we want to create the columns for the report. The columns should be sorted alphabetically. The size of the ColumnText field should be set to a size matching the field in the source table.
CREATE TABLE #Columns ( ColumnIndex INT IDENTITY (0, 1), ColumnText VARCHAR(50) )
Note that we don’t have to select the columns from source table again. Just use the pre aggregated
table!
INSERT INTO #Columns ( ColumnText ) SELECT DISTINCT ColumnText FROM #Aggregates (INDEX(IX_Aggregates), NOLOCK) ORDER BY ColumnText
Even here we can speed things up a little by creating an other covering index on the columns.
CREATE UNIQUE INDEX IX_Columns ON #Columns (ColumnIndex, ColumnText)
Now create the rows for the report. There is yet no need to sort the rows. Also here, the size of the RowText field should be set to a size matching the field in the source table
CREATE TABLE #Rows ( RowText VARCHAR(50) )
Again, select the columns from the pre aggregated data, not the source table.
INSERT INTO #Rows ( RowText ) SELECT DISTINCT RowText FROM #Aggregates (INDEX(IX_Aggregates), NOLOCK)
Again, we use a third covering index on the rows.
CREATE UNIQUE INDEX IX_Rows ON #Rows (RowText)
The algorithm for accomplish the pivot task is quite simple. Iterate through all columns, add them to the existing rows and append the data. What could be more simple? Well, we have to resort to dynamic SQL.
We will now work with the #Rows temporary table and dynamically add the columns.
Declare four variables. @ColumnIndex is the counter for the loop, @MaxColumnIndex is the number of columns retrieved, @ColumnText is for the name of each column and @SQL is the dynamically built string to execute for each column.
DECLARE @ColumnIndex INT, @MaxColumnIndex INT, @ColumnText VARCHAR(50), @SQL VARCHAR(1000)
Initialize the variables with appropriate values.
SELECT @ColumnIndex = 0, @MaxColumnIndex = MAX(ColumnIndex) FROM #Columns
So far so good. Now, do a loop through all columns from 0 to MAX, as indicated by the IDENTITY column ColumnIndex in #Columns.
WHILE @ColumnIndex <= @MaxColumnIndex BEGIN
Now we must fetch every column name, one at a time.
SELECT @ColumnText = ColumnText FROM #Columns WHERE ColumnIndex = @ColumnIndex
The task now is to build a DDL query for adding the new column and execute it.
SELECT @SQL = 'ALTER TABLE #Rows ADD ' + QUOTENAME(@ColumnText) + ' INT DEFAULT 0'
EXEC (@SQL)
Now we want to update the newly added column, matching each row from the pre aggregated table. Build a query as follows. What this query does, is that it updated the newly added column in the #Rows table with the matching data from the pre aggregated table #Aggregates. Match RowText with RowText and ColumnText with ColumnText.
Well, here are some matters to consider. What if the column name contains an apostrophe or a bracket? We all know how hard it is to keep track of all apostrophes when building a dynamic string. There must be a workaround, right?
Normally we would use
SELECT @SQL = 'UPDATE #Rows SET ' + QUOTENAME(@ColumnText) + ' = 1 FROM #Aggregates WHERE #Rows.RowText = ''' + #Aggregates.RowText + ''''Even we are not safe if the column name contains a bracket.
Yes, there is a workaround! We know which ColumnIndex we are working with. Use the ColumnIndex instead and you never have to worry about enough apostrophes.
SELECT @SQL = 'UPDATE #Rows SET ' + QUOTENAME(@ColumnText) + ' = #Aggregates.CellData FROM #Aggregates (INDEX(IX_Aggregates), NOLOCK) , #Columns (INDEX(IX_Columns), NOLOCK) WHERE #Rows.RowText = #Aggregates.RowText AND #Columns.ColumnText = #Aggregates.ColumnText AND #Columns.ColumnIndex = ' + CAST(@ColumnIndex AS VARCHAR(12)) EXEC (@SQL)For looping through all columns, we have to increment the column index variable like this.
SELECT @ColumnIndex = @ColumnIndex + 1And finally, close the loop.
ENDDo a clean up. Drop the #Columns table and the #Aggregates table.
DROP TABLE #Columns DROP TABLE #AggregatesReturn the pivot table with following statement.
SELECT #Rows.* FROM #Rows ORDER BY #Rows.RowTextDo another clean up. Drop the #Rows table.
DROP TABLE #Rows------------------------------------------------------------------------------------------
A real world example follows.
------------------------------------------------------------------------------------------
CREATE PROCEDURE uspSalesOfficeDepartmentPivot ( @FromDate DATETIME, @ToDate DATETIME ) AS SET NOCOUNT ON CREATE TABLE #Aggregates ( RowText VARCHAR(50), ColumnText VARCHAR(50), CellData MONEY ) INSERT INTO #Aggregates ( RowText, ColumnText, CellData ) SELECT Sales.Office, Sales.Department, SUM(Invoices.Amount) FROM Sales INNER JOIN Invoices ON Sales.OfficeID = Invoices.OfficeID WHERE Invoices.OrderDate BETWEEN @FromDate AND @ToDate GROUP BY Sales.Office, Sales.Department HAVING SUM(Invoices.Amount) <> 0 CREATE UNIQUE INDEX IX_Aggregates ON #Aggregates (RowText, ColumnText, CellData) CREATE TABLE #Columns ( ColumnIndex INT IDENTITY (0, 1), ColumnText VARCHAR(50) ) INSERT INTO #Columns ( ColumnText ) SELECT DISTINCT ColumnText FROM #Aggregates (INDEX(IX_Aggregates), NOLOCK) ORDER BY ColumnText CREATE UNIQUE INDEX IX_Columns ON #Columns (ColumnIndex, ColumnText) CREATE TABLE #Rows ( RowText VARCHAR(50) ) INSERT INTO #Rows ( RowText ) SELECT DISTINCT RowText FROM #Aggregates (INDEX(IX_Aggregates), NOLOCK) CREATE UNIQUE INDEX IX_Rows ON #Rows (RowText) DECLARE @ColumnIndex INT, @MaxColumnIndex INT, @ColumnText VARCHAR(50), @SQL VARCHAR(1000) SELECT @ColumnIndex = 0, @MaxColumnIndex = MAX(ColumnIndex) FROM #Columns WHILE @ColumnIndex <= @MaxColumnIndex BEGIN SELECT @ColumnText = ColumnText FROM #Columns WHERE ColumnIndex = @ColumnIndex SELECT @SQL = 'ALTER TABLE #Rows ADD ' + QUOTENAME(@ColumnText) + ' INT NULL DEFAULT 0' EXEC (@SQL) SELECT @SQL = 'UPDATE #Rows SET ' + QUOTENAME(@ColumnText) + ' = #Aggregates.CellData FROM #Aggregates (INDEX(IX_Aggregates), NOLOCK) , #Columns (INDEX(IX_Columns), NOLOCK) WHERE #Rows.RowText = #Aggregates.RowText AND #Columns.ColumnText = #Aggregates.ColumnText AND #Columns.ColumnIndex = ' + CAST(@ColumnIndex AS VARCHAR(12)) EXEC (@SQL) SELECT @ColumnIndex = @ColumnIndex + 1 END DROP TABLE #Columns DROP TABLE #Aggregates SELECT #Rows.* FROM #Rows ORDER BY #Rows.RowText DROP TABLE #Rows------------------------------------------------------------------------------------------
The code above is one of the stored procedures I have on my production system. It calculates the sum of amounts for all invoices that are ordered between the two supplied dates. The two dates can be a specific month, or a year or even 10 years!
Peter Larsson
Helsingborg, Sweden