Blog Post

Basic Window Functions–#SQLNewBlogger

,

Another post for me that is simple and hopefully serves as an example for people trying to get blogging as #SQLNewBloggers.

Window functions are a class of functions that dramatically improve the performance of certain types of queries. We often think of these as aggregates, but there are ranking and analytic functions that can be used as well.

This is a basic post looking at the outline of these functions and their structure. I’ll look at a few more details in future posts.

The OVER() Clause

The idea in many of these functions is to create a window on which a function can operate and perform some computation over a row in the window. When I first heard of this, I wasn’t sure this was that powerful, but now I find these functions to be incredibly useful for many aggregates, especially because I can get things like a SUM() without requiring a GROUP BY.

The main way we define the window is with the OVER() clause. This comes after the function and defines a partition and ordering for the rowset, one which the window is then applied. That sounds more complex than it really is, so let’s look at a simple example.

I have a number of baseball statistics in a table. I’ll look at one player, Barry Bonds, and his career across a couple teams. If I look at the data, I see:

2021-07-06 12_58_18-SQLQuery1.sql - ARISTOTLE.BaseballData (ARISTOTLE_Steve (51))_ - Microsoft SQL S

If I want to sum something like home runs, I can do that with the team easily like this:

2021-07-06 13_00_35-SQLQuery1.sql - ARISTOTLE.BaseballData (ARISTOTLE_Steve (51))_ - Microsoft SQL S

I have a sum and a team, and a GROUP BY. What if I wanted to add something in there, like the at bats by year. Maybe I want to add some other text fields, which is easy to do, but every time I add a non-aggregate to the column list, I need to also add it to the GROUP BY. That’s a pain, and it’s cumbersome. Not only that, I’m limited to looking at the data in the same way. I might want an overall average number of at-bats, or maybe an average by team.

With the OVER clause, I can have different views of the rowset as applied to the function. In this query below, I order one rowset, but not the other. I could have partitioned by different values if I wanted, but that doesn’t help here.

2021-07-06 13_07_16-SQLQuery1.sql - ARISTOTLE.BaseballData (ARISTOTLE_Steve (51))_ - Microsoft SQL S

Here the average is calculated on the partition, not on the entire group, which is what I’d have before the OVER() clause. I could even add other detail data, like the league, and I wouldn’t have to alter multiple clauses.

There are two key concepts here for the basic OVER clause that I want to discuss, and more details will come in other posts.

Partition BY

The partition by separates the rowset into groups, much like the GROUP BY does. In this case, we note that we want to break our data into groups based on the value of a column (or columns).

This can be different for each OVER() clause, as you can see below. Now the average is teh same for all rows, as Mr. Bonds only played in one league.

2021-07-06 13_12_26-SQLQuery1.sql - ARISTOTLE.BaseballData (ARISTOTLE_Steve (51))_ - Microsoft SQL S

More details in a future post.

Order By (Inside OVER)

The ORDER BY is the ordering of the rows inside of the partition. In this case, the order doesn’t matter, but in some cases it might. With some functions it matters, so this can change the way your query works.

This works just like the ORDER BY clause at the end of a SELECT query, but when inside the parenthesis, this applies only to the partitions.

Summary

This is a very basic look at Window functions. This is the basic view of what these are, and how they might be used in queries. There are more options, and more to know, but this is a basic look at how you might think about building aggregate queries without a GROUP BY, better performance, and in my mind, easier to read.

SQLNewBlogger

I had to write a query recently and chose an window function because it simplified the code. It also performed very well. This is a start of a few posts based on that work, showing that I have some knowledge on how to use these features in SQL.

This post took about 10 minutes to write, mostly me structuring the example from a dataset I have, and then about 5 minutes to pull some things out of this post to focus on the basic part of building a query. I can expand on this in other posts.

This is a pattern you can easily follow in your own blog to show that you understand some concept in T-SQL, in Azure, or really any topic. Enough of these and they’ll help drive your interview to subjects you know something about.

Original post (opens in new tab)
View comments in original post (opens in new tab)

Rate

4 (2)

You rated this post out of 5. Change rating

Share

Share

Rate

4 (2)

You rated this post out of 5. Change rating