Stairway to Advanced T-SQL

Stairway to Advanced T-SQL Level 5: Turning Data On Its Side Using PIVOT Operator

,

The PIVOT operator was added to the Transact-SQL language in SQL Server 2005. It allows you to turn your row data on its side, so it can be presented as column data. This is useful when you want take unique column values and have them displayed as column headings, where the column headings are associated with summarized values displayed below each column heading. In this article I will be exploring how to use the PIVOT operator.

Syntax of PIVOT operator

The PIVOT operator allows you create multiple columns in a result set which are based on the unique column values in a source table or query.

In order to accomplish this a SELECT statement will need to contain the PIVOT operator using the following syntax:

SELECT <non-pivoted column>,
    [first pivoted column] AS <column name>,
    [second pivoted column] AS <column name>,
    ...
    [last pivoted column] AS <column name>
FROM
    (<SELECT query that produces the data>) 
    AS <alias for the source query>
PIVOT
(
    <aggregation function>(<column being aggregated>)
FOR 
[<column that contains the values that will become column headers>] 
    IN ( [first pivoted column], [second pivoted column],
    ... [last pivoted column])
) AS <alias for the pivot table>
<optional ORDER BY clause>;

Where:

  • [first pivoted column], [second pivoted column] and [last pivot column] – Identifies the unique column values that will be pivoted.
  • <column_name> - Identifies the column alias name for the pivoted columns.
  • <SELECT query that produces that data> - Identifies a query that produced the data that will be used by the PIVOT operator.
  • <Alias for the query source> - Identifies the table alias name for the query that produces the data for the PIVOT operator.
  • <aggregation function> - Identifies the function that will be used to aggregate the pivot column.
  • <column being aggregated> - Identifies the column that will be aggregated by the PIVOT operator.
  • <column that contains the values that will become column header> - Identifies the column from the query that that contains the unique column values that will be pivoted.
  • <alias for the pivot table> - identifies the alias name for the table produced by the PIVOT operator.
  • <optional ORDER BY clause> - identifies the columns used to sort the results produced by the PIVOT operator.

By reviewing this syntax you can see the PIVOT operator requires a number of different TSQL statement components to perform the PIVOT operation. To better understand this syntax let's review a few examples.

Sample Data for Examples

For all the examples in the article I will be using the AdventureWorks2012 database. If you want to follow along and you don't have the AdventureWorks2012 database you can download it from this location: http://msftdbprodsamples.codeplex.com/releases/view/93587

Example of Simple Pivot Operation

For my first example I will be creating some code that does a simple PIVOT operation. But before we look at some code that uses the PIVOT operator let's review the code in Listing 1 and the output it produces in Result 1.

USE AdventureWorks2012;
GO
SELECT [TerritoryID]
     , Year([OrderDate]) AS OrderYear
 , COUNT(*)  NumOfOrders
FROM [Sales].[SalesOrderHeader]
WHERE Year([OrderDate]) in (2005,2006)
GROUP BY Year([OrderDate]),[TerritoryID]
ORDER BY Year([OrderDate]),[TerritoryID]

Listing 1: Query that produces Number of Orders by Year and TerritoryID

When I run the code in Listing 1 it produces the output in Result 1.

TerritoryID OrderYear   NumOfOrders
----------- ----------- -----------
1           2005        184
2           2005        40
3           2005        45
4           2005        296
5           2005        62
6           2005        127
7           2005        59
8           2005        76
9           2005        394
10          2005        96
1           2006        489
2           2006        100
3           2006        114
4           2006        748
5           2006        146
6           2006        434
7           2006        268
8           2006        233
9           2006        859
10          2006        301

Result 1: Output when running the code in Listing 1

If you look at the code in Listing 1 you can see it is counting the number of orders (NumOfOrders) for each TerritoryID, for each of the years listed. What I'm trying to show here is that there is one row of output produced for every unique combination of values of TerritoryID and OrderYear. By using the PIVOT operator I will be able to produce output similar toResult 1, but it will be pivoted. By using the PIVOT operator the NumOfOrders value for each TerritoryID will be a separate column value in a row for a given OrderYear. To demonstrate this let's look at the PIVOT query in Listing 2.

USE AdventureWorks2012;
GO
SELECT OrderYear as Num_Of_Orders_Per_Year_By_TerritoryID, 
 [1], [2], [3], [4], [5], [6], [7], [8], [9], [10]
FROM
(SELECT [TerritoryID]
     , Year([OrderDate]) AS OrderYear
 ,1 Num
FROM [Sales].[SalesOrderHeader]
WHERE Year([OrderDate]) in (2005,2006)) AS SourceTable
PIVOT
(
SUM(Num)  
FOR [TerritoryID] IN ([1], [2], [3], [4], [5], [6], [7], [8], [9], [10])
) AS PivotTable;

Listing 2: PIVOT query to place the number of orders value by TerritoryID and OrderYear in separate columns

When I run the code in Listing 2 I get the output in Result 2.

Num_Of_Orders_Per_Year_By_TerritoryID 1    2    3    4    5    6    7    8    9    10
------------------------------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
2005                                  184  40   45   296  62   127  59   76   394  96
2006                                  489  100  114  748  146  434  268  233  859  301

Result 2: Output produced when running the PIVOT query in Listing 2.

By reviewing the output in Result 2 you can see that the summarized number of orders per year are now shown as column values for a specific OrderYear and TerritoryID. To understand how the PIVOT query created this output let's walk through the different parts for the query in Listing 2.

The query in Listing 2 can be broken up into three different parts. The first part of the query is right after the word SELECT. This first part identifies the column headings for the pivoted data. I my example those column headings would be the OrderYear, a non-pivoted column followed by 10 different pivoted values of 1 through 10. These values 1 through 10 represent the different possible TerritoryID values that are represented in SalesOrderHeader rows. I specified all the possible values for TerritoryID. You don't have to specify every value in the table. I could have specified only those column headings I wanted to display. Additionally if you specify a value that doesn't exist (such as “11”) in your data, then the value shown for that column will be NULL. The next part is a SELECT subquery that is identified by a table alias, in this case SourceTable. The SourceTable subquery identifies the source of the data for the PIVOT operator. In this example the SourceTable subquery produces a record set that contains the TerritoryID, the year of the order (OrderDate), and a constant value of 1 for the Num column for every SalesOrderHeader record where the year of the order is in 2005 or 2006. The next part of the query is the PIVOT clause, which is enclosed in parentheses and is given an alias name of PivotTable. The PIVOT clause is used to summarize the Num column to identify the summarized column values that will be pivoted. The PIVOT operator also aggregates and groups the data in the SourceTable. The data is summarized based on the columns in the SourceTable that are not being aggregated, which in this case is OrderYear and Territoryid. The PIVOT operator aggregates (in this case, sums) the Num column for each unique value of OrderYear, and TerritoryID. I used the SUM function to aggregate the Num column. Since the different Num column values are all 1's I could have also used the COUNT aggregation function to produce the same result. The aggregated values will be displayed under the column heading associated with the different TerritoryID's. For each unique values of the OrderYear column, which is the only column in the SourceTable that is not used in the PIVOT clause, there will be a separate row in the output. In my example the OrderYear column is the only column in the SourceTable that is not referenced in the PIVOT clause. Because OrderYear has two unique values there are two rows of aggregated data in Result 2.

Multiple Columns not used in the PIVOT clause

What happens when there are multiple columns in the source table that are not used in the PIVOT clause? When there is more than a single column in the source table that is not used in the PIVOT clause then there will be one row of output for every unique set of values for the non-PIVOT clausecolumns. To demonstrate this I'm going to run the code in Listing 3. In this listing I will be summarizing SalesOrderHeader data where the data is pivoted based on the sales quarter (OrderQtr), and aggregating the Num column based on OrderQtr, TerritoryID and OrderYear.

USE AdventureWorks2012;
GO
SELECT OrderYear, TerritoryID, 
       [1], [2], [3], [4] 
    
FROM
(SELECT Year([OrderDate]) AS OrderYear
        ,TerritoryID 
        , ((Month([OrderDate])-1)/3) + 1 as OrderQtr
   ,1 Num
FROM [Sales].[SalesOrderHeader]
WHERE Year([OrderDate]) in (2005,2006)) AS SourceTable
PIVOT
(
COUNT(Num)  
FOR [OrderQtr] IN ([1], [2], [3], [4])
) AS PivotTable
ORDER BY OrderYear, TerritoryID;

Listing 3: PIVOT query to group data on more than 1 column

When I run the code in Listing 3 I get the output in Result 3.

OrderYear   TerritoryID 1           2           3           4
----------- ----------- ----------- ----------- ----------- -----------
2005        1           0           0           72          112
2005        2           0           0           18          22
2005        3           0           0           20          25
2005        4           0           0           127         169
2005        5           0           0           30          32
2005        6           0           0           61          66
2005        7           0           0           30          29
2005        8           0           0           30          46
2005        9           0           0           182         212
2005        10          0           0           51          45
2006        1           101         117         149         122
2006        2           20          20          30          30
2006        3           20          23          36          35
2006        4           148         208         209         183
2006        5           31          32          44          39
2006        6           108         93          128         105
2006        7           38          44          86          100
2006        8           38          56          61          78
2006        9           190         200         203         266
2006        10          47          32          108         114

Result 3: Output produced when running the PIVOT query in Listing 3.

By reviewing the code in Listing 3 you can see my SourceTable contains the following columns: OrderYear, TerritoryID, OrderQtr, and Num. The code in listing 3 aggregates the Num column, and pivots on the OrderQtr columns. Remember in the last section I said the PIVOT clausegroups the data based on the columns not being aggregated by the PIVOT operator. Therefore the data from the SourceTable is grouped on the OrderQtr, OrderYear and TerritoryID, because these are three columns in the SourceTable that are not aggregated in the PIVOT clause. You can see this by reviewing the output in Result 3, where there is an aggregated value displayed for each unique value of OrderQtr, OrderYear and TerritoryID. Additionally for every unique value pair of the OrderYear and TerritoryID columns, the two columns that are not used in the PIVOT clause, there is a separate row of output in Results 3. This time I used the COUNT function to determine the number of orders placed per OrderQtr, I could have used the SUM function as well since it produces the same results in this case.

Dynamically Determining PIVOT columns

What happens if you don't always know the column values you want to pivot on? Does this mean you can't write your PIVOT query in advance? Not knowing the pivot column values doesn't keep you from pivoting your data. You can use dynamic SQL code to generate your PIVOT query when the data values for the FOR clause of the PIVOT operator are not known. The code in Listing 4 generates dynamic SQL to determine the number of SalesOrderHeader records there are by TerritoryID and OrderYear.

SET NOCOUNT ON;
USE AdventureWorks2012;
GO
DECLARE @Columns nvarchar(1000)='';
-- Identify columns to pivot
SELECT @Columns=stuff((
    SELECT DISTINCT ',' + QUOTENAME(CAST(Year([OrderDate]) AS CHAR(4))) as OrderYear
    FROM [AdventureWorks2012].[Sales].[SalesOrderHeader] p2
    ORDER BY OrderYear
    FOR XML PATH(''), TYPE).value('.', 'varchar(max)')
            ,1,1,'')
--FROM (SELECT DISTINCT CAST(Year([OrderDate]) AS CHAR(4)) AS OrderYear 
--      FROM [Sales].[SalesOrderHeader]) AS Years
DECLARE @CMD nvarchar(1000);
-- Generate Dynamic SQL
SET @CMD = 'SELECT TerritoryID, ' + @Columns +
           ' FROM (SELECT TerritoryID, Year(OrderDate) AS OrderYear ' +
   'FROM [Sales].[SalesOrderHeader])AS SourceTable ' +
           ' PIVOT(COUNT(OrderYear) For [OrderYear] IN (' + @Columns + ')) as PivotTable';
-- Print and execute generated command
PRINT @CMD
EXEC sp_executesql @CMD;

Listing 4: Code to produce dynamic query statement

When I run the code in Listing 4 I get the output in Result 4.

TerritoryID 2005        2006        2007        2008
----------- ----------- ----------- ----------- -----------
1           184         489         1789        2132
2           40          100         135         77
3           45          114         149         77
4           296         748         2391        2789
5           62          146         179         99
6           127         434         1597        1909
7           59          268         1127        1218
8           76          233         1062        1252
9           394         859         2662        2928
10          96          301         1352        1470

Result 4: Output produced when running the dynamic PIVOT query in Listing 4.

The Code in Listing 4 is broken up into three different pieces. The first part of this code identifies the different OrderYear values and sets the @Column variable to the set of unique OrderYear values with commas in between each year value. This code uses the FOR XML PATH clause to concatenate the unique OrderYear values together, separatedwithcommas. The STUFF function was used to remove the comma that was placed before the first OrderYear value. The second part of the code builds a dynamic PIVOT query which uses the @Column value to identify the column heading values for the pivoted column. The last part of the code prints out and executes the dynamic SQL.

The dynamic SQL code generated by Listing 4 can be found in Result 5.

SELECT TerritoryID, [2005],[2006],[2007],[2008] 
FROM (
   SELECT TerritoryID, Year(OrderDate) AS OrderYear 
   FROM [Sales].[SalesOrderHeader])AS SourceTable  
   PIVOT(COUNT(OrderYear) For [OrderYear] IN ([2005],[2006],[2007],[2008])) as PivotTable

Result 5: Dynamic SQL Code generated by running Listing 4 (re-formatted for readability)

By reviewing the code in Results 5 you can see the code in Listing 4 identified that there were 4 different order years. And then uses those order year values to dynamically build the PIVOT query that is displayed in Results 5.

Summary

When you want to turn rows of data on its side to produce columns you should consider the PIVOT operator. With the PIVOT operator you can easily aggregate a single column, based on the unique values of another columns. Next time you need to summarize data based on specific unique column values consider using the PIVOT operator.

Question and Answer

In this section you can review how well you have understood using the PIVOT operation by answering the following questions.

Question 1:

The PIVOT operator supports summarizing many column? (True or False).

  1. True
  2. False

Question 2:

When the PIVOT clause groups data it does it based on which source columns?

  1. The source columns that are used in the PIVOT operator
  2. The source columns that are not aggregated with the PIVOT operator
  3. The source columns used in the FOR clause of the PIVOT operator
  4. All of the source column used in the PIVOT query

Question 3:

Which of the following statements are true (maybe more than one)?

  1. You can use the PIVOT operator even if you don't know the column values for the FOR clause.
  2. You can use the PIVOT operator to group on more than one column.
  3. The PIVOT operator uses the FOR column values as column headings
  4. The PIVOT operator does not required an ORDER BY clause

Answers:

Question 1:

The correct answer is b, false. The PIVOT operator supports only summarizing one column based on the values of other columns.

Question 2:

The correct answer is b. SQL Server groups data in a PIVOT clause by those column in the source table that are not aggregated by the PIVOT operator.

Question 3:

All of the answer are correct. If you don't already know the column values to be used in the FOR clause then you can generate a dynamic SQL PIVOT query. You can group the summarized column of a PIVOT statement based on all the columns that are not aggregated by the PIVOT operator, which could be more than one. When pivoting data the PIVOT operator uses the column values identified in the FOR clause as column headings in output for a PIVOT query. The “ORDER BY” clause is optional, so it doesn't need to be included in every PIVOT query.

This article is part of the parent stairway Stairway to Advanced T-SQL

Rate

4 (4)

You rated this post out of 5. Change rating

Share

Share

Rate

4 (4)

You rated this post out of 5. Change rating