Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

DataOnWheels

Steve Hughes is a Principal Consultant at Magenic. His area of expertise is in data and business intelligence architecture on the Microsoft SQL Server platform. He was also the data architect for a SaaS company which delivered a transportation management solution for fleets across the United States. Steve has co-authored two books and delivered more than 30 presentations on SQL Server and data architecture over the past six years. He also provides insights from the field on his blog at http://dataonwheels.wordpress.com.

T-SQL Window Functions – Part 1: The OVER() Clause

This is a reprint with some revisions of a series I originally published on LessThanDot. You can find the links to the original blogs on my Series page.

TSQL WIndow FunctionsSQL Window functions were introduced in SQL Server 2005. At the time, only a small set of functionality was available. Window functions fill a need in the aggregation story for SQL Server. Window functions allow the developer to use row level aggregations without the penalty of using cursors to accomplish a similar task. Window functions allow you to segment a set of rows and then apply a function to that set of rows. In many cases, you may choose an aggregation function. However, other functions are also available including ranking and analytic functions. In this four-part series, I will start by breaking apart the OVER clause which is the key to understanding window functions in SQL Server. The following posts will expand on each group of window functions which use the OVER clause – ranking, aggregate, and analytic.

Up until 2008 R2, SQL Server only supported a subset of window functions which focused on ranking and some aggregation functions. With SQL Server 2012, Microsoft greatly expanded its support for window functions in T-SQL thus making fairly complex operations such as running totals much simpler to accomplish.

The following CTE will be used as the query in all examples throughout the post:

with CTEOrders as
(select cast(1 as int) as OrderID, cast(‘3/1/2012′ as date) as OrderDate, cast(10.00 as money) as OrderAmt, ‘Joe’ as CustomerName
union select 2, ‘3/1/2012′, 11.00, ‘Sam’
union select 3, ‘3/2/2012′, 10.00, ‘Beth’
union select 4, ‘3/2/2012′, 15.00, ‘Joe’
union select 5, ‘3/2/2012′, 17.00, ‘Sam’
union select 6, ‘3/3/2012′, 12.00, ‘Joe’
union select 7, ‘3/4/2012′, 10.00, ‘Beth’
union select 8, ‘3/4/2012′, 18.00, ‘Sam’
union select 9, ‘3/4/2012′, 12.00, ‘Joe’
union select 10, ‘3/4/2012′, 11.00, ‘Beth’
union select 11, ‘3/5/2012′, 14.00, ‘Sam’
union select 12, ‘3/6/2012′, 17.00, ‘Beth’
union select 13, ‘3/6/2012′, 19.00, ‘Joe’
union select 14, ‘3/7/2012′, 13.00, ‘Beth’
union select 15, ‘3/7/2012′, 16.00, ‘Sam’
)
select OrderID
,OrderDate
,OrderAmt
,CustomerName
from CTEOrders;

Defining the Window with the OVER() Clause

The purpose of the OVER clause is to define the window over which the function will be applied. The default functionality is to define the window for the entire table. As shown in the example below, the result is the same as if you had used an aggregation function without a GROUP BY clause. Some of what makes OVER unique includes the fact different windows can be specified in a single SELECT statement as will be shown in later sections.


select CustomerName
, SUM(OrderAmt) Over () as NoParm
from CTEOrders;

image

The OVER clause takes three additional arguments which change the scope of the clause — PARTITION BY, ORDER BY, and ROWS or RANGE.

PARTITION BY

The PARTITION BY clause is used to reduce the scope of the window to which the aggregation applies. This clause partitions the default or entire result set into partitions based on the criteria specified.  The function will be applied to each partition independently. At this point, we begin to see the real power of window functions.

While similar to the GROUP BY clause, the PARTITION BY clause allows you to specify different partitions within the same select statement. The following example builds on the illustration above with no partition specified and a partition on the CustomerName field.

select CustomerName
, SUM(OrderAmt) OVER() as NoParm
, SUM(OrderAmt) OVER (PARTITION BY CustomerName) as PartByName
from CTEOrders;

image

ORDER BY

The ORDER BY clause is used to order how the partitions apply the window function. For instance, when an ORDER BY is present in our current illustration, it will sum all of Beth’s order amounts, but will then add Joe’s to the total as he is the next in order. In the following illustration, no partition is specified so the partition is still the whole table. You can now see how partitions and ordering affect the window function.

select CustomerName
,SUM(OrderAmt) OVER () as NoParm
,SUM(OrderAmt) OVER (PARTITION BY CustomerName) as PartByName
,SUM(OrderAmt) OVER (ORDER BY CustomerName) as OrderByName
from CTEOrders;

image

As you can see, if no partition is specified the order will apply to the entire result set or the default partition. This is particularly important to understand when using ranking functions with the OVER clause.

Before SQL Server 2012, the ORDER BY clause could only be used with ranking functions. With the release of SQL Server 2012, the ORDER BY clause can now be used with ranking, aggregate, and analytic functions.

The following image visually illustrates the how the PARTITION BY and ORDER BY clauses aggregate and rank data.

image

ROWS or RANGE

The final clauses that can be used to define a partition are the ROWS and RANGE functions. This functionality was introduced in SQL Server 2012. These functions will specify a set of rows based on the position of the current row and as a result both functions require the use of the ORDER BY clause. Because each partition is anchored to the current row, the rows or range specifications are based on their proximity to the current row. A number of key words can be used with this clause to define the window frame used by the window functions.

• CURRENT ROW: That’s what it means. It identifies the current row as part of the ROWS or RANGE. This key word is supported by ROWS and RANGE.
• UNBOUNDED PRECEDING: Go to the beginning of the partition. This is supported by both ROWS and RANGE.
• UNBOUNDED FOLLOWING: Go to the end of the partition. This is supported by both ROWS and RANGE.
n PRECEDING: This is used to specify the number of rows before the current row. This is only supported by the ROWS clause.
n FOLLOWING: This is used to specify the number of rows after the current row. This is only supported by the ROWS clause.

select CustomerName
,SUM(OrderAmt) OVER () as NoParm
,SUM(OrderAmt) OVER (PARTITION BY CustomerName) as PartByName
,SUM(OrderAmt) OVER (ORDER BY CustomerName) as OrderByName
,SUM(OrderAmt) OVER (PARTITION BY CustomerName ORDER BY OrderDate, OrderID RANGE BETWEEN UNBOUNDED PRECEDING and CURRENT ROW) as RangeByDate
,SUM(OrderAmt) OVER (PARTITION BY CustomerName ORDER BY OrderDate, OrderID ROWS BETWEEN 1 FOLLOWING and 2 FOLLOWING) as NextTwoAmts
from CTEOrders;

image

A couple of other notes regarding ROWS and RANGE. In cases where the partition returns no rows, the value returned is NULL by default as seen in rows 5 and 10 above. Also, if the partition only returns fewer rows than specified it will only apply to rows within the range. In the example above, row 4 NextTwoAmts column will only return the sum of row 5 as it is the only row included in the partition.

Finally, ROWS and RANGE react differently when only the CURRENT ROW is specified. When CURRENT ROW is used by itself with the RANGE clause it will apply the function based on the partitioning and ordering. However, when used with the ROWS clause, it will only return the current row regardless of the partition specification.

select CustomerName
,OrderDate
,OrderAmt
,SUM(OrderAmt) OVER (PARTITION BY CustomerName ORDER BY OrderDate RANGE CURRENT ROW) as RangeRow
,SUM(OrderAmt) OVER (PARTITION BY CustomerName ORDER BY OrderDate ROWS CURRENT ROW) as RowRow
from CTEOrders;

image

The next three related posts I will discuss ranking functions, aggregate functions and analytic functions.


Comments

Leave a comment on the original post [dataonwheels.wordpress.com, opens in a new window]

Loading comments...