How LAG compares to other techniques

LAG pulls a column from another row without a self-join. In this article, Kathi Kellenberger shows how LAG compares to other techniques

Microsoft introduced the first window (aka, windowing or windowed) functions with SQL Server 2005. These functions were ROW_NUMBER, RANK, DENSE_RANK, NTILE, and the window aggregates. Many folks, including myself, used these functions without realizing they were part of a special group. In 2012, Microsoft added several more: LAG and LEAD, FIRST_VALUE and LAST_VALUE, PERCENT_RANK and CUME_DIST, PERCENTILE_CONT, and PERCENTILE_DISC. They also added the ability to do running totals and moving calculations.

These functions were promoted as improving performance over older techniques, but that isn’t always the case. There were still performance problems with the aggregate functions introduced in 2005 and the four of the functions introduced in 2012. In 2019, Microsoft introduced Batch Mode on Row Store, available on Enterprise and Developer Editions, that can improve the performance of window aggregates and the four statistical functions from 2012.

I started writing this article to compare some window function solutions to traditional solutions. I found that there were so many ways to write a query that includes a column from another row that this article is dedicated to the window functions LAG and LEAD.

Include a column from another row using LAG

Including a column from another row typically means some a self-join. The code is somewhat difficult to write and doesn’t perform well. The LAG function can be used to pull in the previous row without a self-join. As long as there are adequate indexes in place, using LAG solves those problems. LEAD works the same way as LAG, except that it grabs a later row.

Here’s an example using LAG in the AdventureWorks database:

This query returns a list of products and the dates they were ordered. It uses the DATEDIFF function to compare the current order date to the prior order date determined with LAG. The OVER clause uses a PARTITION BY on ProductID to ensure that different products are not compared. The OVER clause for LAG requires an ORDER BY. It is ordered by OrderDate since that is how the rows should be lined up to find the previous date. Figure 1 shows the partial results.

Image showing the partial results

Figure 1. The partial results of the query using LAG

The query ran in under a second, about 300 ms, and required just 365 logical reads, as shown in Figure 2.

Image showing logical reads from LAG

Figure 2. The logical reads of the LAG query

I tried several methods to see if it was possible to write a query that performed well without LAG. Even though the database is small, the queries took some time to run, depending on the technique.

Self-join

The self-join technique is painfully slow. Keeping the first order row of each ProductID in the results required LEFT JOINS, but this was so slow, I killed the query after a few minutes. Instead, the following query omits the NULL row for each product.

The query ran in 20 seconds and had 3,103 logical reads, as shown in Figure 3.

Statistics IO from self-join

Figure 3. Logical reads of self-join query

The query uses the MAX function To find the previous OrderDate and filters to find rows in SOH2 with OrderDate less than the OrderDate in SOD.

Derived table

Is it possible to improve the performance of the self-join with a derived table? Here’s a query to find out:

I was surprised to find that this query ran in 2 seconds even though the logical reads were much higher!

Statistics IO from derived table

Figure 4. Logical reads of derived table query

That’s quite good, though not as good as the query with LAG.

Common table expression

Another way to solve the query is with a common table expression (CTE):

 

This query also ran in 2 seconds with the same logical reads as the derived table. What’s going on here? The CTE is not saving the results to be reused – the tables are accessed twice – but the joins and some other operators are different. The optimizer was able to come up with better plans by using derived tables or a CTE.

OUTER APPLY

The APPLY operator is often used to improve the performance of queries. There are two flavors: CROSS APPLY and OUTER APPLY, similar to JOIN and LEFT JOIN. The APPLY operator can be used to solve many interesting queries, and in this case, it OUTER APPLY replaces the LEFT JOIN.

This query ran in 12 seconds and had a whopping 86,281,577 logical reads!

Statistic IO from OUTER APPLY

Figure 5. Logical reads from OUTER APPLY query

OUTER APPLY is acting like a function call in this scenario calling OUTER APPLY once for each row in the outer query, which would probably not be the case if the less than operator was not involved. 

It’s possible to use TOP(1) instead of MAX, but then the query inside OUTER APPLY must also be ordered, and the results are not any better.

Temp Table

Since whatever method is used needs a distinct list of ProductID, Name, and OrderDate, these rows could be stored in a temp table.

Here’s an example where the temp table was used with OUTER APPLY:

The temp table drastically improved the time, 1.4 seconds. The actual tables were only touched once to create the temp table. Then the temp table was scanned twice and joined with a Nested Loop.

Image showing execution plan for temp table query

Figure 6. Partial execution plan when using a temp table

Adding an index to the temp table might improve the performance even more. This script ran in about 700 ms.

The self-join was also improved with the temp table, returning in just 3 seconds even while using LEFT JOIN, which wasn’t possible before.

 

Scalar user-defined function

The performance of user-defined scalar functions was improved in 2019 with inlining. I had hoped that the performance would not be too bad since SQL Server is running version 2019. Unfortunately, the less than operator (<) kills the performance. Even adding in an index on OrderDate didn’t help much. I killed the query after 25 seconds.

Since I killed the query before it completed, I am not sure how long it would keep running. The execution looks simple enough, but it doesn’t show that the function is called many times.

Execution plan for UDF

Figure 7. The execution plan for the scalar UDF

Table-valued functions

I’ve often heard someone say, “just turn it into a table-valued function” when UDF issues arise. However, it’s still possible to do “bad things” with table-valued functions as well. There are two types of table-valued functions, multi-statement and inline. Multi-statement table-valued functions (MSTVF) can have loops, IF blocks, and table variables, so they do not scale well.

Inline table-valued functions (ITVF) only allow a single query. Of course, the function could contain a poorly written query, but generally, you will see better performance with these. In this case, however, the performance is still not as good as using LAG. Note that the call to the ITVF also uses OUTER APPLY.

The ITVF takes about 12 seconds to run with over 59 million logical reads when using the temp table.

Execution plan for ITVF and temp table

Figure 8. The logical reads for the ITVF

Cursor

I debated whether to include a section on cursors because I don’t want to encourage anyone to start with a cursor solution. However, I remembered other situations when a cursor solution performed better compared to other techniques, so I decided to include it. Cursors are another tool in your T-SQL toolbox. They may be at the bottom of the box and a bit rusty from disuse, but a tool, nonetheless.

One important thing to note about running the cursor solution is to turn off the Actual Execution Plan (or any other method you might be using to capture execution plans) and Statistics. With those turned off, the script took about 2 seconds to run!

How LAG compares to other techniques?

There are probably even more ways to write the query (once someone in a presentation insisted that a view would always outperform LAG), but it’s not likely that any other method runs faster than LAG when you need a column from the previous row. Here are the results for each technique:

Technique

Indexed temp table?

Time

Logical reads

LAG

No

300 ms

365

Self-join

No

20 sec

3,103

Derived table

No

2 sec

127,723

Common table expression

No

2 sec

127,730

OUTER APPLY

No

12 sec

86,281,577

OUTER APPLY

Yes

700 ms

57,358

Self-join

Yes

3 sec

452

Scalar UDF

Yes

Killed the query

Unknown

Inline table-valued function

Yes

12 sec

59,622,091

Cursor

Yes

2 sec

Unknown

As long as OUTER APPLY had the pre-aggregated temp table to work with, it performed almost as well as LAG. Otherwise, the other methods ran in 2 seconds or more.

For this specific problem, LAG performed the best. The lesson to learn here is that there are many ways to write a query, so try other techniques when you experience issues with performance.

If you like this article, you might also like Introduction to T-SQL Window Functions