Blog Post

T-SQL Enhancements: OVER clause

,

With SQL Server Denali there has been made some improvements to the OVER clause, which I wasn’t even aware of until today. I was playing around with a few af the new analytical functions, and suddenly struggled to get the results I was expecting. After some investigation, I found the solution in books online.

The basic syntax of the OVER clause from books online:

OVER ([ <PARTITION BY clause> ]   [ <ORDER BY clause> ]   [ <ROW or RANGE clause> ]   )

The ROW or RANGE clause is the new stuff I will be talking about today.

This last bit is what defines the window frame, which gives you the abillity to widen or narrow the frame you are looking at. Let me setup a simple example:

CREATE TABLE Employees (
    EmployeeId INT IDENTITY PRIMARY KEY,
    Name VARCHAR(50),
    HireDate DATE NOT NULL,
    Salary INT NOT NULL
)
GO
INSERT INTO Employees (Name, HireDate, Salary)
VALUES
    ('Alice', '2011-01-01', 20000),
    ('Brent', '2011-01-15', 19000),
    ('Carlos', '2011-02-01', 22000),
    ('Donna', '2011-03-01', 25000),
    ('Evan', '2011-04-01', 18500)
GO

I have now created a simplified version of the Employee table from Adventureworks. I have five employees with different salaries, and let’s just pretend they all work in the same department. I could now ask the question: What are the salary of each employe, and what are the average salary over all? That’s simple:

 

SELECT
    Name,
    Salary,
    AVG(Salary) OVER(ORDER BY HireDate) AS avgSalary
FROM Employees
GO

image

Wait, I was actually expecting the avgSalary column to contain the same value all over, but that is not the case. Why? If you look closer, the avgSalary column actually does give you the average salary, but only in the window frame from the first row until the current row. So when we look at the first row, the window frame only contains that single row, and therefore the avgSalary is also 20000.

When we look at the second row, the window frame expands to contain rows 1-2, and this gives an average of (20000+19000)/2 = 19500.

The solution is to tell SQL Server that I want the window frame to contain all the rows, and this can be done like this:

SELECT
    Name,
    Salary,
    AVG(Salary) OVER(ORDER BY HireDate
                        RANGE
                        BETWEEN UNBOUNDED PRECEDING
                        AND UNBOUNDED FOLLOWING
                    ) AS avgSalary
FROM Employees
GO

Here we specify the window frame to range between “unbounded preceding” and “unbounded following”. This means that the range goes from the first row in the partition all the way to the final row in the partition. This gives us the expected result, where the avgSalary contains the actual average of all employees:

image

If we look in the documentation, we find this:

“If ROWS/RANGE is not specified but ORDER BY is specified, RANGE UNBOUNDED PRECEDING AND CURRENT ROW is used as default for window frame”.

Aha, the default window frame is exactly how we figured out above, and not ROWS BETWEEN  UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING as I would have expected.

So, what else can we do with this feature? If we would like to compare each employees salary with the persons hired around the same time, we could specify that the window frame should only contain the previous row, the current row, and the next row. This gives us a total of three rows in the frame. This could be specified like this:

SELECT
    Name,
    HireDate
    Salary,
    AVG(Salary) OVER(ORDER BY HireDate
                        ROWS
                        BETWEEN 1 PRECEDING
                        AND 1 FOLLOWING
                    ) AS avgSalary
FROM Employees
GO

Notice how we now need to specify ROWS instead of RANGE, and we simply write the number of rows preceding and following instead of the keyword UNBOUNDED.

The result:

image

The five employees have been ordered by their HireDate, so the persons above and below the current row have similar seniority. Because of the sliding window frame of 3 rows in total, the avgSalary columns now gives us the average salary of Donna and the two colleagues with similar seniority.

There are a few other combinations you can play around with, and they are documented here: http://msdn.microsoft.com/en-us/library/ms189461%28v=sql.110%29.aspx

Next week we will look into the FIRST_VALUE() and LAST_VALUE() functions which goes very well together with the possibility of sliding windows frames.

@HenrikSjang

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating