A couple a weeks ago I blogged about a few of the enhancements in the OVER clause, and now I will show you guys a couple of new windowing functions, that goes along with the OVER clause. These new functions are FIRST_VALUE() and LAST_VALUE().
I’ll use the same setup as I did in the OVER clause post:
CREATE TABLE Employees ( EmployeeId INT IDENTITY PRIMARY KEY, Name VARCHAR(50), HireDate DATE NOT NULL, Salary INT NOT NULL ) GO
INSERT INTO Employees (Name, HireDate, Salary) VALUES ('Alice', '2011-01-01', 20000), ('Brent', '2011-01-15', 19000), ('Carlos', '2011-02-01', 22000), ('Donna', '2011-03-01', 25000), ('Evan', '2011-04-01', 18500) GO
Last time I found the average salary of the entire group, and also controlled the size of the window frame. Now I would like to know the name, salary and hire date of the person with the higest salary. In the good old days we would solve this by using a mess of subqueries, but now we can do this quite elegantly:
SELECT EmployeeId, Name, Salary, HireDate, LAST_VALUE(Name) OVER( ORDER BY Salary RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS HighestSalaryName, LAST_VALUE(Salary) OVER( ORDER BY Salary RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS HighestSalary, LAST_VALUE(HireDate) OVER( ORDER BY Salary RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS HiredateOfHighestSalary FROM Employees ORDER BY EmployeeId GO
This resultset gives us one row per employee, but with the following columns added: HigestSalaryName, HighestSalary and HireDateOfHigestSalary.
The function LAST_VALUE(x) returns the value of column x in the last row of the window frame defined in the OVER clause. I have defined the ORDER BY clause to order by Salary, so the FIRST_VALUE() will look at the row with the lowest salary, and the LAST_VALUE() will look at the row with the higest salary.
The beauty of these functions is, that we can return any of the columns in the dataset given any order by clause. If we think about the MIN(x) and MAX(x) functions, these can only return the value of column x ordered by column x.
Again we can add the new ROWS or RANGE clause to narrow down the window frame, but in the example above I just wanted to look at the entire dataset as one group.
@HenrikSjang



Subscribe to this blog
Briefcase
Print
Posted by Anonymous on 30 August 2011
Pingback from Dew Drop – August 30, 2011 | Alvin Ashcraft's Morning Dew
Posted by Jason Brimhall on 30 August 2011
Thanks for the nice examples.
Posted by Carlo Romagnano on 31 August 2011
Thank you for the news.
Posted by peter.row on 31 August 2011
Syntax seems to be a bit ugly compared to a simple select ... from employess cross join (select max(...) as highest
When you say "The beauty of these functions is, that we can return any of the columns in the dataset given any order by clause. If we think about the MIN(x) and MAX(x) functions, these can only return the value of column x ordered by column x." is that correct? Using MIN or MAX you would not technically have had to order by anything, would you?
What would have been interesting would be a performance comparision between the example you gave and one without the new syntax.
Posted by Robert L Davis on 31 August 2011
Peter, you can't specify a range of data for a MIN or MAX. Plus, you can't simply add a MIN or MAX to a query with other columns unless you are using a GROUP BY or the MIN/MAX is contained within a subquery, both of which is much much worse for performance.
Posted by Charles Kincaid on 31 August 2011
Then look a geo data. In a trip where I'm gathering co-ordinate points I need to know the first Lat/Long of the trip and the last Lat/Long of the trip. So I'd be pulling LAST_VALUE(Coordinate) but the OVEr would be date based. Without this I'd have to use a CTE to get the first co-ordinate and another to get the last.
Wonderful syntax aid. Now how about query plans and performaqnce?