Questions About Pivoting Data in SQL Server You Were Too Shy to Ask

Of all the basic SQL operations, the pivot seems to cause the most problems. We can tell from the way that old articles on Simple-Talk on the topic continue to be read. It turns out that there are several questions that come to mind while learning about pivoting, but which are seldom asked on forums. Once more, Robert Sheldon attempts to answer these unspoken questions.

  1. “How do I use the PIVOT operator to rotate data in SQL Server?”
  2. “Can I specify column names different from the values in the pivoted column?”
  3. “I’m confused by how data is grouped when using the PIVOT operator. Can I control grouping?”
  4. “How do I avoid grouping on a column’s values when pivoting data so I can return a single-row result set that aggregates the entire dataset?”
  5. “SQL Server does not permit the use of column expressions in the PIVOT clause. Is there a way to get around this limitation?”
  6. “Can I group data by more than one column when I use the PIVOT operator? “
  7. “Is there a way to work around SQL Server’s restriction against using COUNT(*) in the PIVOT clause?”
  8. “Is it possible to pivot more than one column when pivoting data in SQL Server?”
  9. “I sometimes run queries against a SQL Server 2000 database, which doesn’t support the PIVOT operator. How do I pivot data without using PIVOT?”
  10. “Does the PIVOT operator let you perform multiple aggregations?”
  11. “How do I pivot data if I don’t know the values in the pivoted column?”
  12. “How do I unpivot data in SQL Server?”

“How do I use the PIVOT operator to rotate data in SQL Server?”

You use the PIVOT operator within your query’s FROM clause to rotate and aggregate the values in a dataset. The data is pivoted based on one of the columns in the dataset. Each unique value in that column becomes its own column, which contains aggregated pivoted data.

To better understand how this all works, let’s start with the basic syntax for a query that uses the PIVOT operator:

For the SELECT clause, you can specify an asterisk (*) or the individual columns, and for the FROM clause, you can specify a table or table expression. If you use a table expression, then you must also define a table alias. You can also include an ORDER BY clause, but that’s optional. You’ll get to see these clauses in action as we progress through the questions. For now, let’s focus on the PIVOT clause. You need to understand how this clause works to make sure your pivots work the way you want.

After you specify the PIVOT keyword, you pass in what are essentially three arguments, enclosed in parentheses. The first is the aggregate function and the name of the column to be aggregated. You can use any aggregate function except the COUNT function, when used with an asterisk, as in COUNT(*).

Next, you define the FOR subclause, which specifies the column on which the pivot will be based. It is this column whose distinct values are turned into their own columns. The FOR subclause also includes the IN operator, which is where you specify the pivot column values that will be turned into columns. The values you specify here must exist in the pivot column or they will be ignored.

This will all make more sense when you see it in action. Let’s start with the following temporary table and data:

The SELECT statement returns the following results, which include sales totals for the two book types over the course of two years:

BookType

SalesYear

BookSales

Fiction

2014

11201.00

Fiction

2014

12939.00

Fiction

2013

10436.00

Fiction

2013

9346.00

Nonfiction

2014

7214.00

Nonfiction

2014

5800.00

Nonfiction

2013

8922.00

Nonfiction

2013

7462.00

Although this is a very simple dataset, it gives us what we need to demonstrate the PIVOT operator. In the following SELECT statement, I use the operator to turn the SalesYear values into columns and to pivot and aggregate the sales totals:

If you refer back to the syntax above, you can see that BookSales is the aggregate column, SalesYear is the pivot column, and the SalesYear values, which are passed into the IN operator, become the new columns. The following table shows the results returned by the SELECT statement:

BookType

2013

2014

Fiction

19782.00

24140.00

Nonfiction

16384.00

13014.00

An important point to note about the results is that the first column is BookType, with a row for fiction sales and a row for nonfiction sales. The BookSales values are aggregated according to year, with the data grouping based on the BookType column. SQL Server automatically grouped the data by the values in this column, even though the column is not explicitly included in the PIVOT clause. (We’ll cover grouping more in depth in a separate question.)

In the preceding example, I used the SUM function to aggregate the data, but we can use a different aggregate function. For example, the following PIVOT clause uses the AVG function:

This time, the SELECT statement returns sales averages, rather than sales totals, as shown in the following table:

BookType

2013

2014

Fiction

9891.00

12070.00

Nonfiction

8192.00

6507.00

In the previous two examples, the IN operator in the PIVOT clause specified all the SalesYear values, but you can specify whatever values you want to return, as long as they’re included in the pivoted column. For example, the IN operator in the following PIVOT clause includes only the 2014 value:

As the following table shows, the results now include only two columns:

BookType

2014

Fiction

24140.00

Nonfiction

13014.00

There is, of course, more to the PIVOT operator than what’s shown here, but this should give you the basics. The key to understanding the operator is in knowing which column to aggregate, which column to pivot, and which one should be implicitly grouped.

“Can I specify column names different from the values in the pivoted column?”

Yes, you can, by using column aliases, but before we get into that, let’s return to the syntax I introduced in the previous question:

Notice that I use the column_list placeholder in the SELECT clause. Before we dig any deeper into the select list, let’s again start with the #BookSales temporary table and populate it:

As you saw in the preceding examples, if we want to go with the default column names produced by the PIVOT operator, we can use just an asterisk in the SELECT clause:

As expected, the query returns the data shown in the following table:

BookType

2013

2014

Fiction

19782.00

24140.00

Nonfiction

16384.00

13014.00

 

Suppose we now want to better control how the columns are returned in the result set. Let’s revise our syntax a bit, with more detail in the SELECT clause:

By default, the first column in the select list is the column on which the aggregated grouping is based. This is the column that you do not specify in the PIVOT clause. The subsequent columns are based on the unique values in the pivoted column. If we were to repeat the previous example, but include the column names in the select list, our query would look something like the following:

All I’ve done here is specify the column names rather than use an asterisk. However, because we’re dealing with integers for some of the column names, the results we get are not what we might expect:

BookType

(No column name)

(No column name)

Fiction

2013

2014

Nonfiction

2013

2014

In this case, the values 2013 and 2014 are treated as literals and returned as column values, rather than column names. We can easily remedy this by delimiting the column names:

Now we get the results we would expect:

BookType

2013

2014

Fiction

19782.00

24140.00

Nonfiction

16384.00

13014.00

Of course, all we’ve achieved here is to return the same results we would have achieved if we had used an asterisk. However, we can use the select list to specify the column names in a different order than how the columns are listed in the IN operator within the FOR subclause:

Now the aggregated data is listed in an order different from the previous examples:

BookType

2014

2013

Fiction

24140.00

19782.00

Nonfiction

13014.00

16384.00

We can in fact, even change the position of the group column, which in this case is BookType:

In most cases, though, you’ll want to keep the group column in the first position, or you end up with results like those shown in the following table:

2013

BookType

2014

19782.00

Fiction

24140.00

16384.00

Nonfiction

13014.00

Clearly, moving the group column around in this way can make the results less readable. Imagine what it would be like if you were dealing with 10 times the number of columns.

Another advantage of being able to specify the columns in the select list is that you can assign aliases to them. In the next example, I’ve assigned an alias to each of the three columns:

As you’d expect, the column aliases are reflected in the query results:

BookCategory

Year2013

Year2014

Fiction

19782.00

24140.00

Nonfiction

16384.00

13014.00

Whenever you find that the values in your pivot column do not make good column names for the pivoted data, you can easily assign aliases to them in this way to make your data more readable.

“I’m confused by how data is grouped when using the PIVOT operator. Can I control grouping?”

The PIVOT operator is a bit odd in the fact that data is grouped implicitly based on the columns you do not specify in the PIVOT operator. Let’s return to our example temporary table:

In this case, the source dataset is a simple three-column table. BookSales serves as the aggregate column, and SalesYear serves as the pivot column. That leaves BookType, which automatically becomes the group column, as shown in the following results:

BookType

2013

2014

Fiction

19782.00

24140.00

Nonfiction

16384.00

13014.00

In the real world, our source datasets are seldom that simple. Suppose, for example, our temp table included an additional column, a simple nullable BIT column that indicates whether the totals have been verified, with 1 meaning YES, 0 meaning NO, and NULL meaning we don’t know one way or the other:

 

Chances are, you might not actually make such a column nullable, preferring instead to default to 0, but this is good enough for now to demonstrate a point, that is, grouping is based on all columns not specified in the PIVOT clause. Let’s try it out. The following SELECT statement is identical to the one in the preceding example:

But now our results are quite different, skewing them in a direction we likely do not want to go:

BookType

Verified

2013

2014

Fiction

NULL

10436.00

NULL

Nonfiction

NULL

8922.00

NULL

Fiction

0

9346.00

NULL

Nonfiction

0

7462.00

7214.00

Fiction

1

NULL

24140.00

Nonfiction

1

NULL

5800.00

The database engine has grouped the data by both the BookType and Verified columns, rather than just BookType. To get around this, we need to specify a table expression in our FROM clause that retrieves only the relevant columns:

Now we get the results we expect, with the data grouped only by the BookType table:

BookType

2013

2014

Fiction

19782.00

24140.00

Nonfiction

16384.00

13014.00

In most cases, you’ll find you’ll need to use a table expression in the FROM clause, rather than specifying only the base table. In fact, you’ll often find that most of the real engineering for pivoting data rests with the table expression.

“How do I avoid grouping on a column’s values when pivoting data so I can return a single-row result set that aggregates the entire dataset?”

The trick to using PIVOT operator to aggregate and rotate data effectively often rests with getting the grouping right. This idea also applies to aggregating the entire dataset. Let’s return again to our basic three-column temporary table and perform a simple pivot:

As expected, our query returns the results in the following table:

BookType

2013

2014

Fiction

19782.00

24140.00

Nonfiction

16384.00

13014.00

But suppose we don’t want to group the data by the BookType column and instead want to return the aggregated values for the entire year. For that, we need the right table expression, one that returns only the BookSales and SalesYear columns:

As before, we’ve specified BookSales table as our aggregate column and SalesYear as our pivot column. However, because these are now the only two columns in our dataset, we no longer have a third column to serve as the group column. As a result, our query returns the following data:

2013

2014

36166.00

37154.00

We now have totals for each year, without the data being grouped by any one column. We can, however, specify a column in the first position that provides context to the other data:

In this case, I’ve created a column named Year and assigned the value Total Sales to that column, as shown in the following results:

Year

2013

2014

Total Sales

36166.00

37154.00

This can be a handy strategy to make it easier to understand the data, but you need to be careful because the column name can be a little deceiving in that it actually points to the other column names, rather than the value in its own column. Be sure to give careful consideration to what you’re doing if you go down this route.

“SQL Server does not permit the use of column expressions in the PIVOT clause. Is there a way to get around this limitation?”

Like many of the PIVOT clause limitations, the way to get around the restriction against column expressions is to use the table expression in the FROM clause. There you can create whatever expressions you need. Let’s start with a different variation of our #BookSales temporary table:

We now have a DATETIME column, rather than a simple INT column for the year, and we’ve added a column for the wholesale cost. Suppose we now want to calculate the net sales and extract only the year for our pivot. We can create a query similar to the following:

Once again, our table expression is doing the work for us by extracting the year and calculating the net sale. We then use the name of the computed columns in our PIVOT clause, giving us the results shown in the following table:

BookCategory

2013

2014

Fiction

2196.00

6519.00

Nonfiction

2616.00

3036.00

Although this is a simple example, it demonstrates the importance of getting the table expression right so your PIVOT operator has exactly the data it needs to do its magic.

“Can I group data by more than one column when I use the PIVOT operator?”

Yes, you can group multiple columns, but you need to be sure you’re grouping only those columns that should be grouped. As mentioned earlier, the PIVOT operator groups data by those columns in the data set that are not specified as the aggregate or pivot columns. To demonstrate multi-column grouping, let’s recast our #BookSales temp table once again:

This time we’re also including the BookSubtype column to provide an additional way to categorize the data. We can then include the column in our table expression, as shown in the following example:

The table expression defines our dataset, which includes the BookSubtype column. However, as in the previous examples, the PIVOT clause specifies only the BookSales and SalesYear columns, which means the data will be grouped by the BookType and BookSubtype columns when it is pivoted, as shown in the following results:

BookType

BookSubtype

2013

2014

Fiction

Adults

20345.00

25410.00

Fiction

Children

18072.00

24428.00

Fiction

YA

8756.00

9854.00

Nonfiction

Adults

16662.00

11613.00

Nonfiction

Children

15729.00

11048.00

Once again, the key to making this all work is to ensure that you get your table expression right so that you’re controlling the columns on which you group the data.

“Is there a way to work around SQL Server’s restriction against using COUNT(*) in the PIVOT clause?”

It’s true that you can’t use the COUNT aggregate function with an asterisk in your PIVOT clause, but you can use the function if you specify a one of the columns in the dataset. To demonstrate this, let’s create our temporary table once more:

Suppose that we now want to return the number of entries, rather than the total number of sales. One way we can do this is to pass the BookSales column in as an argument to the COUNT function:

The statement will now return a count, rather than a sale amount, as shown in the following table:

BookType

2013

2014

Fiction

2

3

Nonfiction

3

2

Although this appears to work, the challenge with this approach is that it assumes the specified column is non-nullable, that is, does not contain NULL values. But that’s not always the case. For example, suppose we insert the following two rows to serve as placeholders during the load process:

If we were to rerun the previous SELECT statement, we would receive the same results, despite the fact that we’ve added two rows. That’s because the COUNT function ignores NULL values. Another approach we might consider is to pass the SalesYear column into the function:

Unfortunately, this confuses matters even more because the SalesYear column is also the pivot column, giving us the following results:

BookType

BookSales

2013

2014

Fiction

NULL

0

1

Nonfiction

NULL

0

1

Nonfiction

4372.00

1

0

Nonfiction

5800.00

0

1

Nonfiction

7214.00

0

1

Nonfiction

7462.00

1

0

Nonfiction

8922.00

1

0

Fiction

9346.00

1

0

Fiction

9856.00

0

1

Fiction

10436.00

1

0

Fiction

11201.00

0

1

Fiction

12939.00

0

1

Because we left the BookSales column out of the PIVOT clause, it’s now treated as a group column. One way around this is to create a table expression in the FROM clause so we return only the BookType and SalesYear columns:

This time we get the results we want:

BookType

2013

2014

Fiction

2

4

Nonfiction

3

3

Unfortunately, with this approach we’re once again assuming that the column will contain no NULL values. A safer approach might be to add a column to our dataset that contains the same constant for every row:

In this case, I’ve created a column name CntCol and set its value to 1. This way, I don’t have to worry about NULL values skewing the results. The following table shows the data returned by SELECT statement:

BookType

2013

2014

Fiction

2

4

Nonfiction

3

3

Adding a column in this way is a handy trick to know when you want to return reliable counts in your pivoted data and you can’t guarantee that the aggregate column will contain no NULL values..

“Is it possible to pivot more than one column when pivoting data in SQL Server?”

Yes, it is possible to base your pivot on more than one column; however, you can’t do this within the PIVOT clause. As with other PIVOT limitations, you must use your table expression to define the logic and return the necessary dataset. Let’s start with yet another variation on our #BookSales temp table:

Suppose we now want to pivot the data based on both the Region and SalesYear columns so that we end up with columns for each year/region combination. The place to start, of course, is with our table expression:

In this case, I’m concatenating the SalesYear and Region columns to produce the SalesRegion computed column. This will give me values such as 2014_east and 2014_west. I then specify SalesRegion as the pivot column and pass in the column’s values to the IN operator. The SELECT statement returns the results shown in the following table:

BookCategory

2013_east

2013_west

2014_east

2014_west

Fiction

13702.00

10436.00

11201.00

12939.00

Nonfiction

8922.00

7462.00

7214.00

14256.00

As you can see, each book category in the result set is broken down by the year/region combination. Again, it’s all about the table expression.

“I sometimes run queries against a SQL Server 2000 database, which doesn’t support the PIVOT operator. How do I pivot data without using PIVOT?”

If you’re working on a SQL Server edition that predates SQL Server 2005 or if you don’t want to use the PIVOT operator for another reason, you can revert to a series of CASE expressions in the SELECT clause that define each pivoted column. To demonstrate this, we’ll once more create the #BookSales temporary table:

As we’ve seen before, if we were to use the PIVOT operator, we would end up with a query that looks similar to the following:

With results like those shown in the following table:

BookType

2013

2014

Fiction

19782.00

24140.00

Nonfiction

16384.00

13014.00

We can achieve the same results if we specifically group the data on the BookType column and then use CASE expressions in the SELECT clause:

The SELECT clause starts with the BookType column and then includes a CASE expression for each column we want to include in the pivoted result set. Each CASE expression checks the value in the SalesYear column and returns the related BookSales value for that year.

For a simple example like this, the original way of pivoting data isn’t so bad. But if you’re dealing with a lot of unique values in your pivot column, it can become fairly tedious creating all those CASE expressions. On the other hand, it’s nice to have an alternative you can turn to when you don’t want to-or you can’t-use the PIVOT operator.

“Does the PIVOT operator let you perform multiple aggregations?”

Yes, you can perform multiple aggregations, but you can’t do it within a single PIVOT clause. To demonstrate, let’s return once again to the #BookSales table:

Suppose we now want to return both sales totals and averages, in which case, we need to use the SUM and AVG aggregate functions. A fairly straightforward approach to doing this is to create two common table expressions (CTEs) that each pivot the data and then join the two CTEs together, as shown in the following example:

I first define a CTE named SalesTotal that uses the SUM function to aggregate the data. I then define a CTE named SalesAvg that uses the AVG function to aggregate the data. Notice that in each case the SELECT statement returns pivot column names that are easily identifiable. After I define the two CTEs, I use an inner join to merge them together, based on the BookType column. The statement returns the results shown in the following table:

BookType

2013_Total

2013_Avg

2014_Total

2014_Avg

Fiction

19782.00

9891.00

24140.00

12070.00

Nonfiction

16384.00

8192.00

13014.00

6507.00

As you can see, for each book category, we now have a total sales amount as well as an average sales amount for each year. Having the capacity to use CTE’s to join multiple pivot operations helps to make the PIVOT operator much more flexible.

“How do I pivot data if I don’t know the values in the pivoted column?”

Up to this point, the examples have assumed that we’ll know the values in our pivoted column at the time we write our query. Often, however, this isn’t the case, which means we have no way of defining the new columns. To overcome this limitation, we need to turn to dynamic SQL to construct our T-SQL statement on the fly. Let’s look at how this works, but first, another incarnation of the #BookSales table:

Although we’re in on the ground floor in this case, in terms of having created and populated the table, let’s assume we have no idea what the values will be in the SalesYear column, although we still want to pivot the data based on that column. In this case, we need to construct our query dynamically, pulling the data out of the pivot column as we need it. To do so, we must declare a couple variables, retrieve the list of values from the pivot column, and assemble our query, as shown in the following example:

In this example, I first declare the @sql and @col variables. The @sql variable will store the final SELECT statement, and the @col variable will store the list of values from the pivot column. Next, I use a SELECT statement to retrieve a list of distinct values from the SalesYear column and assign those values to the @col variable.

Finally, I use a SET statement to construct the final query, passing in the @col variable to provide the column names to the IN operator and select list, all of which I assign to the @sql variable. I then use an EXECUTE statement to run the final query, which returns the results shown in the following table:

BookType

2012

2013

2014

Fiction

16765.00

19782.00

24140.00

Nonfiction

14581.00

16384.00

13014.00

Using dynamic SQL is a great way to get around the need to know the column values in advance, but be sure to take precautions against injection attacks when using dynamic SQL. For example, the QUOTENAME function returns a delimited string to make sure the column name is a valid delimited identifier, helping to avoid injection through object names. Make sure you understand how SQL injection works before using dynamic SQL.

“How do I unpivot data in SQL Server?”

You can unpivot data in SQL Server by using the UNPIVOT operator, which in many ways is similar to the PIVOT operator. As you’ve seen, the PIVOT operator rotates a column’s values, turning them into their own columns. The UNPIVOT operator does the opposite by rotating columns and turning them into column values. Essentially, the PIVOT operator denormalizes data, and the UNPIVOT operator attempts to normalize it. For example, suppose our #BookSales table look more like a denormalized Excel spreadsheet:

As expected, the SELECT statement returns the results shown in the following table:

BookType

2012

2013

2014

Fiction

16765.00

19782.00

24140.00

Nonfiction

14581.00

16384.00

13014.00

Now suppose we want to bring a little more relational-like structure to the data. We can use the UNPIVOT operator to reverse engineer our dataset:

Notice that the FROM clause includes the UNPIVOT subclause, similar to a PIVOT clause. In the case of UNPIVOT, however, we first specify a value column (SalesTotal), which will display the sales amounts in our result set, and then include a FOR subclause, which specifies the new pivot column (SalesYear) and the values that will be added to that column. These values are the original column names. The SELECT statement returns the results shown in the following table:

BookType

SalesYear

SalesTotal

Fiction

2012

16765.00

Fiction

2013

19782.00

Fiction

2014

24140.00

Nonfiction

2012

14581.00

Nonfiction

2013

16384.00

Nonfiction

2014

13014.00

As with the PIVOT operator, you want to be sure that the table expression in your FROM clause returns the right data for the UNPIVOT clause. However, you should not think of UNPIVOT as merely an undoing of a PIVOT operation. For example, suppose that this time around we start with a more normalized version of the #BookSales table:

The SELECT statement returns the results shown in the following table:

BookType

SalesYear

BookSales

Fiction

2014

11201.00

Fiction

2014

12939.00

Fiction

2013

10436.00

Fiction

2013

9346.00

Fiction

2012

8956.00

Fiction

2012

7809.00

Nonfiction

2014

7214.00

Nonfiction

2014

5800.00

Nonfiction

2013

8922.00

Nonfiction

2013

7462.00

Nonfiction

2012

7739.00

Nonfiction

2012

6842.00

Now let’s pivot the data in the same way we’ve been doing all along:

Not surprisingly, the statement returns the following results:

BookType

2012

2013

2014

Fiction

16765.00

19782.00

24140.00

Nonfiction

14581.00

16384.00

13014.00

Instead of just returning the results, however, let’s put them into the #PivotSales temporary table:

Now let’s unpivot our new temporary table:

The SELECT statement returns the results in the following table:

BookType

SalesYear

SalesTotal

Fiction

2012

16765.00

Fiction

2013

19782.00

Fiction

2014

24140.00

Nonfiction

2012

14581.00

Nonfiction

2013

16384.00

Nonfiction

2014

13014.00

If you compare these to the results we received when we first created the #BookSales table, you’ll see that we don’t get the granularity of the original table. The UNPIVOT operator has no insight into the past and can work only with the dataset in its present form. In general, this shouldn’t present a problem; just don’t expect the UNPIVOT operator to be able to undo your PIVOT operations.