Using LAG

Question

Post reply

Using LAG

Viewing 11 posts - 16 through 26 (of 26 total)

You must be logged in to reply to this topic. Login to reply

dstrickrott SSCommitted Points: 1951 More actions · Answer 1

dstrickrott

SSCommitted

Points: 1951

April 15, 2015 at 2:37 pm

#1791204

Good information. Thanks for the question Kathi.

Hugo Kornelis SSC Guru Points: 64790 More actions · Answer 2

RLilj33 (4/15/2015)
On that note, I know that the ORDER BY clause is in general a performance killer. Can anyone speak to / point to a good article on / how the analytic functions perform - i.e. is it better to use the analytic functions, or to off-load those tasks to another tool?

The ORDER BY clause in itself is not a performance killer. But it often (not always!) results in sort operators being injected in the query plan, and they do tend to be slow.

For all variations of the OVER() clause, introducing sort operators can be avoided if you have an index on first the columns in the PARTITION BY clause, in any order; and then the columns in the ORDER BY clause, in matching order. The index must also cover the column used in the aggregate or analytic function, and ideally covers the entire query.

Also, if a query already has an ORDER BY clause at the end that lists first all PARTITION BY columns and then all ORDER BY columns, then the execution plan will already need to sort on those columns and introducing the OVER() clause will usually not incur additional cost.

But if you have a query that currently does not order (or at least not in that order) and there is no good supporting index - then yes, the analytics function will require SQL Server to introduce a sort operator in the plan.

Cheers,

Hugo

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

Koen Verbeeck SSC Guru Points: 259207 More actions · Answer 3

Easy one, thanks.

Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP

Koen Verbeeck SSC Guru Points: 259207 More actions · Answer 4

Iwas Bornready (4/15/2015)
Thanks for the question. I didn't realize this even existed.

You are missing out a lot. 😉

Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP

Koen Verbeeck SSC Guru Points: 259207 More actions · Answer 5

Hugo Kornelis (4/15/2015)
RLilj33 (4/15/2015)
On that note, I know that the ORDER BY clause is in general a performance killer. Can anyone speak to / point to a good article on / how the analytic functions perform - i.e. is it better to use the analytic functions, or to off-load those tasks to another tool?
The ORDER BY clause in itself is not a performance killer. But it often (not always!) results in sort operators being injected in the query plan, and they do tend to be slow.
For all variations of the OVER() clause, introducing sort operators can be avoided if you have an index on first the columns in the PARTITION BY clause, in any order; and then the columns in the ORDER BY clause, in matching order. The index must also cover the column used in the aggregate or analytic function, and ideally covers the entire query.
Also, if a query already has an ORDER BY clause at the end that lists first all PARTITION BY columns and then all ORDER BY columns, then the execution plan will already need to sort on those columns and introducing the OVER() clause will usually not incur additional cost.
But if you have a query that currently does not order (or at least not in that order) and there is no good supporting index - then yes, the analytics function will require SQL Server to introduce a sort operator in the plan.

Also, if you use the ORDER BY clause in the OVER() clause, remember to specify a frame extent using ROWS, since the default is RANGE which is usually (almost all the time) slower. LAG and LEAD don't have a frame extent though.

Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP

RLilj33 SSCrazy Points: 2203 More actions · Answer 6

Koen Verbeeck (4/17/2015)
Hugo Kornelis (4/15/2015)
RLilj33 (4/15/2015)
On that note, I know that the ORDER BY clause is in general a performance killer. Can anyone speak to / point to a good article on / how the analytic functions perform - i.e. is it better to use the analytic functions, or to off-load those tasks to another tool?
The ORDER BY clause in itself is not a performance killer. But it often (not always!) results in sort operators being injected in the query plan, and they do tend to be slow.
For all variations of the OVER() clause, introducing sort operators can be avoided if you have an index on first the columns in the PARTITION BY clause, in any order; and then the columns in the ORDER BY clause, in matching order. The index must also cover the column used in the aggregate or analytic function, and ideally covers the entire query.
Also, if a query already has an ORDER BY clause at the end that lists first all PARTITION BY columns and then all ORDER BY columns, then the execution plan will already need to sort on those columns and introducing the OVER() clause will usually not incur additional cost.
But if you have a query that currently does not order (or at least not in that order) and there is no good supporting index - then yes, the analytics function will require SQL Server to introduce a sort operator in the plan.
Also, if you use the ORDER BY clause in the OVER() clause, remember to specify a frame extent using ROWS, since the default is RANGE which is usually (almost all the time) slower. LAG and LEAD don't have a frame extent though.

Thank you Hugo & Koen!

TomThomson SSC Guru Points: 104768 More actions · Answer 7

Hugo Kornelis (4/15/2015)
But note that supplying an offset only produces the expected results if you can be sure that there is exactly one row per month in the sales table. (Or, if you use the PARTITION BY clause, exactly one row per distinct set of values of the partition columns and per month).
LAG simply counts rows, it does not look at the actual values.

I think that the wording of the question makes it quite clear that one is looking at something that delivers the sales for a month - after all, it's made absolutely clear that that's what is being compared. What makes you think that the that it is not sensible to assume (given the wording of the question) that the query refers to a table rather than to some row set delivered by various joins, aggregates, and restrictions which result in monthly data?

Tom

Hugo Kornelis SSC Guru Points: 64790 More actions · Answer 8

TomThomson (4/18/2015)
Hugo Kornelis (4/15/2015)
But note that supplying an offset only produces the expected results if you can be sure that there is exactly one row per month in the sales table. (Or, if you use the PARTITION BY clause, exactly one row per distinct set of values of the partition columns and per month).
LAG simply counts rows, it does not look at the actual values.
I think that the wording of the question makes it quite clear that one is looking at something that delivers the sales for a month - after all, it's made absolutely clear that that's what is being compared. What makes you think that the that it is not sensible to assume (given the wording of the question) that the query refers to a table rather than to some row set delivered by various joins, aggregates, and restrictions which result in monthly data?

Oh, I absolutely am with you on the assumption that the data being queries is already aggregated on a per-month basis. I now, upon rereading, realise that my comment can be interpreted ambiguously.

When writing that comment, I was only thinking of data that, for whatever reason, misses months.

(In some businesses, it is not uncommon for a salesperson to sometimes have to work multiple months on a single sale, so aggregating per salesperson and per month can easily result in getting less than 12 rows per year for a salesperson - if you need to compare to last year, you either have to use a different method, or you have to join to a calendar table to guarantee that every month is present)

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

TomThomson SSC Guru Points: 104768 More actions · Answer 9

Hugo Kornelis (4/19/2015)
Oh, I absolutely am with you on the assumption that the data being queries is already aggregated on a per-month basis. I now, upon rereading, realise that my comment can be interpreted ambiguously.
When writing that comment, I was only thinking of data that, for whatever reason, misses months.

Ah, I see, you were just excluding the zero case, not worrying about the more thatn one case - in that case, I agree with you.

Tom

OceanDrop SSC-Addicted Points: 487 More actions · Answer 10

I don't think you need offset and I am not sure if it is reliable for all the data. Let's say if they don't have any data last year Jan but we are trying to compare with this Jan data then 12 will not be accurate?

I think simple way to do this is,

SELECT

YEAR(OrderDate) AS OrderYear, month(OrderDate) orderMOnth, sum(TotalDue) AS Total

, LAG(sum(TotalDue))

OVER(PARTITION BY month(OrderDate)

ORDER BY month(OrderDate), YEAR(OrderDate)) AS lastyearmonth_Total

FROM Sales.SalesOrderHeader

GROUP BY YEAR(OrderDate),month(OrderDate)

Order by month(OrderDate),YEAR(OrderDate)

Kathi Kellenberger SSChampion Points: 11811 More actions · Answer 11

Nice solution!

Aunt Kathi Data Platform MVP
Author of Expert T-SQL Window Functions
Simple-Talk Editor