SQLServerCentral Article

Pivot table for Microsoft SQL Server

,

One of the most requested pieces of code on the Internet forums about SQL Server these days, is the code for making a crosstab query. There was no native support for that in version 6.5. Not even version 7.0 or version 2000 had this support.

So when Microsoft announced that SQL 2005 would support crosstab queries we all (well, at least me) cheered and anticipated that this should solve a number of difficulties. Many of us have worked with MS Access since version 2.0, and in this application, pivot tables are breazes.

What happened after the release of SQL 2005? When we all were excited about the new Pivot operator, equally disappointed we got when we realized we had to hardwire the columns, just as we always had done, by using keyword CASE.

-- SQL 2005 syntax
USE MyDatabase
GO
WITH ProductSales(ProductID, OrderYear, OrderTotal)
 AS (
      SELECT
         det.productID,
         YEAR(hdr.orderdate),
det.linetotal
      FROM sales.salesorderdetail det
  JOIN sales.salesorderheader hdr
        ON det.salesorderid = hdr.salesorderid
)
SELECT
    ProductSalesPivot.productID,
 Total_Sales_2001 = ISNULL([2001], 0),
    Total_Sales_2002 = ISNULL([2002], 0),
    Total_Sales_2003 = ISNULL([2003], 0),
    Total_Sales_2004 = ISNULL([2004], 0)
 FROM ProductSales
  PIVOT ( SUM(OrderTotal)
 FOR OrderYear IN ([2001], [2002], [2003], [2004])
) AS ProductSalesPivot
ORDER BY
 ProductSalesPivot.ProductID
-- SQL 2000 syntax
USE MyDatabase
SELECT     OfficeName,
  SUM(CASE WHEN MONTH(OrderDate) = 1 THEN OrderAmount ELSE 0 END) AS 'January',
  SUM(CASE WHEN MONTH(OrderDate) = 2 THEN OrderAmount ELSE 0 END) AS 'February',
  SUM(CASE WHEN MONTH(OrderDate) = 3 THEN OrderAmount ELSE 0 END) AS 'March',
  SUM(CASE WHEN MONTH(OrderDate) = 4 THEN OrderAmount ELSE 0 END) AS 'April',
  SUM(CASE WHEN MONTH(OrderDate) = 5 THEN OrderAmount ELSE 0 END) AS 'May',
  SUM(CASE WHEN MONTH(OrderDate) = 6 THEN OrderAmount ELSE 0 END) AS 'June',
  SUM(CASE WHEN MONTH(OrderDate) = 7 THEN OrderAmount ELSE 0 END) AS 'July',
  SUM(CASE WHEN MONTH(OrderDate) = 8 THEN OrderAmount ELSE 0 END) AS 'August',
  SUM(CASE WHEN MONTH(OrderDate) = 9 THEN OrderAmount ELSE 0 END) AS 'September',
  SUM(CASE WHEN MONTH(OrderDate) = 10 THEN OrderAmount ELSE 0 END) AS 'October',
  SUM(CASE WHEN MONTH(OrderDate) = 11 THEN OrderAmount ELSE 0 END) AS 'November',
  SUM(CASE WHEN MONTH(OrderDate) = 12 THEN OrderAmount ELSE 0 END) AS 'December'
FROM        MyTable
WHERE      YEAR(OrderDate) = 2006
GROUP BY    OfficeName
ORDER BY    OfficeName

This works fine, as long as the developer knows how many columns there will be, as in the example above. But how to solve a problem when the columns are not known? See following question.

USE Pubs
SELECT 
   Stores.stor_name StoreName,
   Titles.title BookTitle,
   Titles.type BookCategory,
   Sales.qty Books
FROM   Sales
INNER JOIN  Stores ON Stores.stor_id = Sales.stor_id
INNER JOIN  Titles ON Titles.title_id = Sales.title_id

The resultset looks like something the following table

StoreName

BookTitle

BookCategory

Books

Eric the Read Books

The Busy Executive's Database Guide

business   

5

Barnum's

Secrets of Silicon Valley

popular_comp

50

News &

Brews

Fifty Years

in Buckingham Palace Kitchens

trad_cook  

20

Fricative

Bookshop

Sushi,

Anyone?

trad_cook  

20

Doc-U-Mat:

Quality Laundry and Books

Emotional

Security: A New Algorithm

psychology 

25

Fricative

Bookshop

Straight

Talk About Computers

business   

15

What will you do when your end users asks you for a report based on the number of sold books for all stores and categories? The report should be in a crosstab format, where the rows represent the stores and the columns represents the categories, such as the following table.

StoreName

business

Mod_cook

popular_comp

psychology

trad_cook

Barnum's

0

0

50

75

0

Bookbeat

35

15

30

0

0

Eric the Read Books

5

0

0

3

0

Fricative Bookshop

50

10

0

0

0

News & Brews

0

0

0

10

80

The challenge here is that you don’t know the number of columns.

I will now share my code with you, with an explanation how it works. It will work on all versions on Microsoft SQL Server, starting from version 6.5, 7.0, 2000 and 2005.

First we must fetch the data we want to work with. Create a temporary table #Aggregates where we will store the minimum dataset representing the final pivot table. The sizes of the ColumnText and RowText field should be set to a size matching the fields in the source table.

CREATE TABLE #Aggregates
( RowText VARCHAR(50),
  ColumnText VARCHAR(50),
  CellData INT
)

These are a few optimization techniques we can use. First, pre-aggregate all data. Second, filter out data where sum is 0. Remember, always keep the dataset as small as possible.

INSERT INTO #Aggregates
(  RowText,
   ColumnText,
   CellData
)
SELECT
   Table1.A,
   Table1.B,
   SUM(Table1.C)
 FROM Table1
  --INNER JOIN Table2 ON Table2.ID = Table1.ID
  --WHERE Table1.D BETWEEN '10/12/2005' AND '6/14/2006'
 GROUP BY  Table1.A,
           Table1.B
 HAVING      SUM(Table1.C) <> 0

We can speed things up a little by creating a covering index on the fetched aggregated data. In this case we have COUNT as aggregate function. If you want COUNT, MIN, MAX or another function, it is

ok.

CREATE UNIQUE INDEX IX_Aggregates ON #Aggregates (RowText, ColumnText, CellData)

Now we have stored the data to work with. All possible column, rows and cells are stored in #Aggregates.

Now we want to create the columns for the report. The columns should be sorted alphabetically. The size of the ColumnText field should be set to a size matching the field in the source table.

CREATE TABLE #Columns
(  ColumnIndex INT IDENTITY (0, 1),
   ColumnText VARCHAR(50)
)

Note that we don’t have to select the columns from source table again. Just use the pre aggregated

table!

INSERT INTO #Columns
    (
     ColumnText
    )
SELECT DISTINCT ColumnText
FROM #Aggregates (INDEX(IX_Aggregates), NOLOCK)
ORDER BY  ColumnText

Even here we can speed things up a little by creating an other covering index on the columns.

CREATE UNIQUE INDEX IX_Columns ON #Columns (ColumnIndex, ColumnText)

Now create the rows for the report. There is yet no need to sort the rows. Also here, the size of the RowText field should be set to a size matching the field in the source table

CREATE TABLE #Rows
 (
  RowText VARCHAR(50)
 )
 

Again, select the columns from the pre aggregated data, not the source table.

INSERT INTO     #Rows
 (    RowText
)
SELECT DISTINCT RowText
FROM  #Aggregates (INDEX(IX_Aggregates), NOLOCK)

Again, we use a third covering index on the rows.

CREATE UNIQUE INDEX IX_Rows ON #Rows (RowText)

The algorithm for accomplish the pivot task is quite simple. Iterate through all columns, add them to the existing rows and append the data. What could be more simple? Well, we have to resort to dynamic SQL.

We will now work with the #Rows temporary table and dynamically add the columns.

Declare four variables. @ColumnIndex is the counter for the loop, @MaxColumnIndex is the number of columns retrieved, @ColumnText is for the name of each column and @SQL is the dynamically built string to execute for each column.

DECLARE    @ColumnIndex INT,
           @MaxColumnIndex INT,
           @ColumnText VARCHAR(50),
           @SQL VARCHAR(1000)

Initialize the variables with appropriate values.

SELECT      @ColumnIndex = 0,
            @MaxColumnIndex = MAX(ColumnIndex)
 FROM        #Columns
 

So far so good. Now, do a loop through all columns from 0 to MAX, as indicated by the IDENTITY column ColumnIndex in #Columns.

WHILE @ColumnIndex <= @MaxColumnIndex
   BEGIN

Now we must fetch every column name, one at a time.

      SELECT     @ColumnText = ColumnText
      FROM       #Columns
      WHERE      ColumnIndex = @ColumnIndex

The task now is to build a DDL query for adding the new column and execute it.

     

SELECT     @SQL = 'ALTER TABLE #Rows ADD ' + QUOTENAME(@ColumnText) + ' INT DEFAULT 0'

EXEC       (@SQL)

Now we want to update the newly added column, matching each row from the pre aggregated table. Build a query as follows. What this query does, is that it updated the newly added column in the #Rows table with the matching data from the pre aggregated table #Aggregates. Match RowText with RowText and ColumnText with ColumnText.

Well, here are some matters to consider. What if the column name contains an apostrophe or a bracket? We all know how hard it is to keep track of all apostrophes when building a dynamic string. There must be a workaround, right?

Normally we would use

SELECT @SQL = 'UPDATE #Rows SET ' + QUOTENAME(@ColumnText) 
   + ' = 1 FROM #Aggregates WHERE #Rows.RowText = ''' + #Aggregates.RowText + ''''

Even we are not safe if the column name contains a bracket.

Yes, there is a workaround! We know which ColumnIndex we are working with. Use the ColumnIndex instead and you never have to worry about enough apostrophes.

      SELECT     @SQL = 'UPDATE #Rows SET ' + QUOTENAME(@ColumnText) 
          + ' = #Aggregates.CellData FROM #Aggregates (INDEX(IX_Aggregates), NOLOCK)
     , #Columns (INDEX(IX_Columns),
 NOLOCK) WHERE #Rows.RowText = #Aggregates.RowText 
 AND #Columns.ColumnText = #Aggregates.ColumnText 
 AND #Columns.ColumnIndex = ' + CAST(@ColumnIndex AS VARCHAR(12))
      EXEC        (@SQL)

For looping through all columns, we have to increment the column index variable like this.

      SELECT  @ColumnIndex = @ColumnIndex + 1

And finally, close the loop.

   END

Do a clean up. Drop the #Columns table and the #Aggregates table.

DROP TABLE  #Columns
DROP TABLE  #Aggregates

Return the pivot table with following statement.

SELECT     #Rows.*
FROM       #Rows
ORDER BY    #Rows.RowText

Do another clean up. Drop the #Rows table.

DROP TABLE  #Rows

------------------------------------------------------------------------------------------

A real world example follows.

------------------------------------------------------------------------------------------

CREATE PROCEDURE uspSalesOfficeDepartmentPivot
(
   @FromDate DATETIME,
   @ToDate DATETIME
)
AS
SET NOCOUNT ON
CREATE TABLE #Aggregates
             (
              RowText VARCHAR(50),
              ColumnText VARCHAR(50),
              CellData MONEY
             )
INSERT INTO #Aggregates
            (
             RowText,
             ColumnText,
             CellData
            )
SELECT      Sales.Office,
            Sales.Department,
            SUM(Invoices.Amount)
FROM        Sales
INNER JOIN  Invoices ON Sales.OfficeID = Invoices.OfficeID
WHERE       Invoices.OrderDate BETWEEN @FromDate AND @ToDate
GROUP BY    Sales.Office,
            Sales.Department
HAVING      SUM(Invoices.Amount) <> 0
CREATE UNIQUE INDEX IX_Aggregates ON #Aggregates (RowText, ColumnText, CellData)
 
CREATE TABLE #Columns
            (
             ColumnIndex INT IDENTITY (0, 1),
             ColumnText VARCHAR(50)
             )
 
INSERT INTO     #Columns
                (
                 ColumnText
                )
SELECT DISTINCT ColumnText
FROM           #Aggregates (INDEX(IX_Aggregates), NOLOCK)
ORDER BY       ColumnText
 
CREATE UNIQUE INDEX IX_Columns ON #Columns (ColumnIndex, ColumnText)
 
CREATE TABLE #Rows 
             (
              RowText VARCHAR(50)
             )
 
INSERT INTO    #Rows
                (
                 RowText
                )
SELECT DISTINCT RowText
FROM            #Aggregates (INDEX(IX_Aggregates), NOLOCK)
 
CREATE UNIQUE INDEX IX_Rows ON #Rows (RowText)
 
DECLARE     @ColumnIndex INT,
            @MaxColumnIndex INT,
            @ColumnText VARCHAR(50),
            @SQL VARCHAR(1000)
 
SELECT      @ColumnIndex = 0,
            @MaxColumnIndex = MAX(ColumnIndex)
FROM        #Columns
 
WHILE @ColumnIndex <= @MaxColumnIndex
   BEGIN
      SELECT     @ColumnText = ColumnText
      FROM       #Columns
      WHERE      ColumnIndex = @ColumnIndex
 
      SELECT     @SQL = 'ALTER TABLE #Rows ADD ' + QUOTENAME(@ColumnText) + ' INT NULL DEFAULT 0'
      EXEC       (@SQL)
 
      SELECT     @SQL = 'UPDATE #Rows SET ' + QUOTENAME(@ColumnText) + ' = #Aggregates.CellData
                         FROM #Aggregates (INDEX(IX_Aggregates), NOLOCK)
    , #Columns (INDEX(IX_Columns), NOLOCK) 
 WHERE #Rows.RowText = #Aggregates.RowText 
 AND #Columns.ColumnText = #Aggregates.ColumnText 
 AND #Columns.ColumnIndex = ' + CAST(@ColumnIndex AS VARCHAR(12))
      EXEC       (@SQL)
 
      SELECT  @ColumnIndex = @ColumnIndex + 1 
   END
DROP TABLE #Columns
DROP TABLE #Aggregates
SELECT     #Rows.*
FROM        #Rows
ORDER BY   #Rows.RowText
DROP TABLE  #Rows

------------------------------------------------------------------------------------------

The code above is one of the stored procedures I have on my production system. It calculates the sum of amounts for all invoices that are ordered between the two supplied dates. The two dates can be a specific month, or a year or even 10 years!

Peter Larsson

Helsingborg, Sweden

Rate

4.68 (22)

You rated this post out of 5. Change rating

Share

Share

Rate

4.68 (22)

You rated this post out of 5. Change rating