This post is part of the series discussing the new Analytic functions in SQL Server “Denali”, CTP3.
Analytic Functions in SQL Server “Denali”
All of the new Analytic functions require the use of the OVER clause. To see what changes to the OVER clause that “Denali” brings, please view my article The OVER Clause enhancements in SQL Server code named “Denali”, CTP3 on SQLServerCentral.com.
Introducing the FIRST_VALUE / LAST_VALUE functions
The FIRST_VALUE and LAST_VALUE functions return the first or last value in the ordered set of your partition, optionally applying a windowing clause to limit the rows being considered.
The syntax of these functions is:
FIRST_VALUE | LAST_VALUE ( [scalar_expression )
OVER ( [ partition_by_clause ] order_by_clause [ rows_range_clause ] )
Note that the order_by_clause is required, and the partition_by_clause and the row_range_clause are optional.
The scalar_expression is the value to be returned – this will normally be a column, but it can also be a subquery or any other expression that results in a single value.
Using the FIRST_VALUE and LAST_VALUE functions
For this example, I’m going to use the AdventureWorks for Denali database, which can be downloaded here. In this database, there is a Sales.SalesOrderHeader table, which has information about each order. This information includes CustomerID, OrderDate, and TotalDue columns.
Let’s run a query that shows, for each CustomerID, the OrderDate for when they placed their least and most expensive orders. What this entails is using the FIRST_VALUE and LAST_VALUE functions to return the OrderDate column. The OVER clause specifies that the PARTITION BY is the CustomerID, the ORDER BY is the TotalDue, and the ROWS clause specifies all rows for that CustomerID. The resulting code is:
SELECT DISTINCT TOP (20)
CustomerID,
OrderDateLow = FIRST_VALUE(OrderDate)
OVER (PARTITION BY CustomerID
ORDER BY TotalDue
ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING),
OrderDateHigh = LAST_VALUE(OrderDate)
OVER (PARTITION BY CustomerID
ORDER BY TotalDue
ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING)
FROM Sales.SalesOrderHeader
ORDER BY CustomerID;
This code produces the following result set:
CustomerID OrderDateLow OrderDateHigh
----------- ----------------------- -----------------------
11000 2007-07-22 00:00:00.000 2005-07-22 00:00:00.000
11001 2008-06-12 00:00:00.000 2005-07-18 00:00:00.000
11002 2007-07-04 00:00:00.000 2005-07-10 00:00:00.000
11003 2007-07-09 00:00:00.000 2005-07-01 00:00:00.000
11004 2007-07-26 00:00:00.000 2005-07-26 00:00:00.000
11005 2007-07-27 00:00:00.000 2005-07-02 00:00:00.000
11006 2007-07-02 00:00:00.000 2005-07-27 00:00:00.000
11007 2007-07-25 00:00:00.000 2005-07-12 00:00:00.000
11008 2007-07-07 00:00:00.000 2005-07-28 00:00:00.000
11009 2007-07-24 00:00:00.000 2005-07-30 00:00:00.000
11010 2007-07-15 00:00:00.000 2005-07-17 00:00:00.000
11011 2007-07-29 00:00:00.000 2005-07-02 00:00:00.000
11012 2008-04-17 00:00:00.000 2007-09-17 00:00:00.000
11013 2007-10-15 00:00:00.000 2008-07-24 00:00:00.000
11014 2007-09-24 00:00:00.000 2007-11-01 00:00:00.000
11015 2007-07-22 00:00:00.000 2007-07-22 00:00:00.000
11016 2007-08-13 00:00:00.000 2007-08-13 00:00:00.000
11017 2008-04-16 00:00:00.000 2005-07-15 00:00:00.000
11018 2008-04-26 00:00:00.000 2005-07-20 00:00:00.000
11019 2008-02-23 00:00:00.000 2007-11-26 00:00:00.000
By repeating the use of the function, with the same OVER clause definition, we can get other columns from those same rows.
How about another example?
In my OVER clause enhancements article linked to above on SSC, there is an example of performing a running total. To demonstrate the LAST_VALUE function, I’m going to enhance that code so that it not only shows the running total, but to also show the percent of total that each transaction amount is per account. This will require getting the last running total value of the TranAmt column for each account.
The code to create the sample data is (copied from the above article):
DECLARE @SampleData TABLE (
AccountId INTEGER,
TranDate DATETIME,
TranAmt NUMERIC(8,2));
INSERT INTO @SampleData
(AccountId, TranDate, TranAmt)
VALUES (1, '20110101', 500),
(1, '20110115', 50),
(1, '20110122', 250),
(1, '20110124', 75),
(1, '20110126', 125),
(1, '20110128', 175),
(2, '20110101', 500),
(2, '20110115', 50),
(2, '20110122', 25),
(2, '20110123', 125),
(2, '20110126', 200),
(2, '20110129', 250),
(3, '20110101', 500),
(3, '20110115', 50 ),
(3, '20110122', 5000),
(3, '20110125', 550),
(3, '20110127', 95 ),
(3, '20110130', 2500);
and the code to perform the running totals and percent of total is:
WITH cte AS
(
SELECT AccountId ,
TranDate ,
TranAmt,
-- running total of all transactions
RunTotalAmt = SUM(TranAmt)
OVER (PARTITION BY AccountId
ORDER BY TranDate)
FROM @SampleData
)
SELECT AccountId,
TranDate,
TranAmt,
RunTotalAmt,
PercentOfTotal = CONVERT(NUMERIC(5,2), TranAmt * 100 / LAST_VALUE(RunTotalAmt) OVER (PARTITION BY AccountId ORDER BY TranDate ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)) FROM cte
ORDER BY AccountId, TranDate;
The highlighted lines show the LAST_VALUE function. The OVER clause is utilizing all three parts: the PARTITION BY is to restart the calculation for each AccountId, the ORDER BY is on TranDate to ensure that I get the proper row for the last value, and the ROWS clause is utilized in order to get the last value from all of the rows for this account – if it hadn’t had been specified, it would have used the default RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT_ROW, and that would have every row’s percent being 100% – not what we want!
This code produces this result set:
AccountId TranDate TranAmt RunTotalAmt PercentOfTotal
----------- ----------------------- ------- ----------- --------------
1 2011-01-01 00:00:00.000 500.00 500.00 42.55
1 2011-01-15 00:00:00.000 50.00 550.00 4.26
1 2011-01-22 00:00:00.000 250.00 800.00 21.28
1 2011-01-24 00:00:00.000 75.00 875.00 6.38
1 2011-01-26 00:00:00.000 125.00 1000.00 10.64
1 2011-01-28 00:00:00.000 175.00 1175.00 14.89
2 2011-01-01 00:00:00.000 500.00 500.00 43.48
2 2011-01-15 00:00:00.000 50.00 550.00 4.35
2 2011-01-22 00:00:00.000 25.00 575.00 2.17
2 2011-01-23 00:00:00.000 125.00 700.00 10.87
2 2011-01-26 00:00:00.000 200.00 900.00 17.39
2 2011-01-29 00:00:00.000 250.00 1150.00 21.74
3 2011-01-01 00:00:00.000 500.00 500.00 5.75
3 2011-01-15 00:00:00.000 50.00 550.00 0.58
3 2011-01-22 00:00:00.000 5000.00 5550.00 57.50
3 2011-01-25 00:00:00.000 550.00 6100.00 6.33
3 2011-01-27 00:00:00.000 95.00 6195.00 1.09
3 2011-01-30 00:00:00.000 2500.00 8695.00 28.75
As you can see, the PercentOfTotal column calculates the percentage between the current row’s TranAmt and the TranAmt total for that account.
Hopefully, by seeing some examples, you now have some ideas of how you can utilize these new functions.