Using LAG

  • Stephanie Giovannini (4/15/2015)


    Outer join to a calendar table to ensure that there are no missing months, order by the date, and it should work.

    Good practice when doing reporting. This allows SQL to do the heavy lifting instead of a different rendering tool, whether it be SSRS or something else.

    On that note, I know that the ORDER BY clause is in general a performance killer. Can anyone speak to / point to a good article on / how the analytic functions perform - i.e. is it better to use the analytic functions, or to off-load those tasks to another tool?

  • Good information. Thanks for the question Kathi.

  • RLilj33 (4/15/2015)


    On that note, I know that the ORDER BY clause is in general a performance killer. Can anyone speak to / point to a good article on / how the analytic functions perform - i.e. is it better to use the analytic functions, or to off-load those tasks to another tool?

    The ORDER BY clause in itself is not a performance killer. But it often (not always!) results in sort operators being injected in the query plan, and they do tend to be slow.

    For all variations of the OVER() clause, introducing sort operators can be avoided if you have an index on first the columns in the PARTITION BY clause, in any order; and then the columns in the ORDER BY clause, in matching order. The index must also cover the column used in the aggregate or analytic function, and ideally covers the entire query.

    Also, if a query already has an ORDER BY clause at the end that lists first all PARTITION BY columns and then all ORDER BY columns, then the execution plan will already need to sort on those columns and introducing the OVER() clause will usually not incur additional cost.

    But if you have a query that currently does not order (or at least not in that order) and there is no good supporting index - then yes, the analytics function will require SQL Server to introduce a sort operator in the plan.

    Cheers,

    Hugo


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Easy one, thanks.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Iwas Bornready (4/15/2015)


    Thanks for the question. I didn't realize this even existed.

    You are missing out a lot. 😉

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Hugo Kornelis (4/15/2015)


    RLilj33 (4/15/2015)


    On that note, I know that the ORDER BY clause is in general a performance killer. Can anyone speak to / point to a good article on / how the analytic functions perform - i.e. is it better to use the analytic functions, or to off-load those tasks to another tool?

    The ORDER BY clause in itself is not a performance killer. But it often (not always!) results in sort operators being injected in the query plan, and they do tend to be slow.

    For all variations of the OVER() clause, introducing sort operators can be avoided if you have an index on first the columns in the PARTITION BY clause, in any order; and then the columns in the ORDER BY clause, in matching order. The index must also cover the column used in the aggregate or analytic function, and ideally covers the entire query.

    Also, if a query already has an ORDER BY clause at the end that lists first all PARTITION BY columns and then all ORDER BY columns, then the execution plan will already need to sort on those columns and introducing the OVER() clause will usually not incur additional cost.

    But if you have a query that currently does not order (or at least not in that order) and there is no good supporting index - then yes, the analytics function will require SQL Server to introduce a sort operator in the plan.

    Also, if you use the ORDER BY clause in the OVER() clause, remember to specify a frame extent using ROWS, since the default is RANGE which is usually (almost all the time) slower. LAG and LEAD don't have a frame extent though.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Koen Verbeeck (4/17/2015)


    Hugo Kornelis (4/15/2015)


    RLilj33 (4/15/2015)


    On that note, I know that the ORDER BY clause is in general a performance killer. Can anyone speak to / point to a good article on / how the analytic functions perform - i.e. is it better to use the analytic functions, or to off-load those tasks to another tool?

    The ORDER BY clause in itself is not a performance killer. But it often (not always!) results in sort operators being injected in the query plan, and they do tend to be slow.

    For all variations of the OVER() clause, introducing sort operators can be avoided if you have an index on first the columns in the PARTITION BY clause, in any order; and then the columns in the ORDER BY clause, in matching order. The index must also cover the column used in the aggregate or analytic function, and ideally covers the entire query.

    Also, if a query already has an ORDER BY clause at the end that lists first all PARTITION BY columns and then all ORDER BY columns, then the execution plan will already need to sort on those columns and introducing the OVER() clause will usually not incur additional cost.

    But if you have a query that currently does not order (or at least not in that order) and there is no good supporting index - then yes, the analytics function will require SQL Server to introduce a sort operator in the plan.

    Also, if you use the ORDER BY clause in the OVER() clause, remember to specify a frame extent using ROWS, since the default is RANGE which is usually (almost all the time) slower. LAG and LEAD don't have a frame extent though.

    Thank you Hugo & Koen!

  • Hugo Kornelis (4/15/2015)


    But note that supplying an offset only produces the expected results if you can be sure that there is exactly one row per month in the sales table. (Or, if you use the PARTITION BY clause, exactly one row per distinct set of values of the partition columns and per month).

    LAG simply counts rows, it does not look at the actual values.

    I think that the wording of the question makes it quite clear that one is looking at something that delivers the sales for a month - after all, it's made absolutely clear that that's what is being compared. What makes you think that the that it is not sensible to assume (given the wording of the question) that the query refers to a table rather than to some row set delivered by various joins, aggregates, and restrictions which result in monthly data?

    Tom

  • TomThomson (4/18/2015)


    Hugo Kornelis (4/15/2015)


    But note that supplying an offset only produces the expected results if you can be sure that there is exactly one row per month in the sales table. (Or, if you use the PARTITION BY clause, exactly one row per distinct set of values of the partition columns and per month).

    LAG simply counts rows, it does not look at the actual values.

    I think that the wording of the question makes it quite clear that one is looking at something that delivers the sales for a month - after all, it's made absolutely clear that that's what is being compared. What makes you think that the that it is not sensible to assume (given the wording of the question) that the query refers to a table rather than to some row set delivered by various joins, aggregates, and restrictions which result in monthly data?

    Oh, I absolutely am with you on the assumption that the data being queries is already aggregated on a per-month basis. I now, upon rereading, realise that my comment can be interpreted ambiguously.

    When writing that comment, I was only thinking of data that, for whatever reason, misses months.

    (In some businesses, it is not uncommon for a salesperson to sometimes have to work multiple months on a single sale, so aggregating per salesperson and per month can easily result in getting less than 12 rows per year for a salesperson - if you need to compare to last year, you either have to use a different method, or you have to join to a calendar table to guarantee that every month is present)


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hugo Kornelis (4/19/2015)


    Oh, I absolutely am with you on the assumption that the data being queries is already aggregated on a per-month basis. I now, upon rereading, realise that my comment can be interpreted ambiguously.

    When writing that comment, I was only thinking of data that, for whatever reason, misses months.

    Ah, I see, you were just excluding the zero case, not worrying about the more thatn one case - in that case, I agree with you.

    Tom

  • I don't think you need offset and I am not sure if it is reliable for all the data. Let's say if they don't have any data last year Jan but we are trying to compare with this Jan data then 12 will not be accurate?

    I think simple way to do this is,

    SELECT

    YEAR(OrderDate) AS OrderYear, month(OrderDate) orderMOnth, sum(TotalDue) AS Total

    , LAG(sum(TotalDue))

    OVER(PARTITION BY month(OrderDate)

    ORDER BY month(OrderDate), YEAR(OrderDate)) AS lastyearmonth_Total

    FROM Sales.SalesOrderHeader

    GROUP BY YEAR(OrderDate),month(OrderDate)

    Order by month(OrderDate),YEAR(OrderDate)

  • Nice solution!

    Aunt Kathi Data Platform MVP
    Author of Expert T-SQL Window Functions
    Simple-Talk Editor

Viewing 12 posts - 16 through 26 (of 26 total)

You must be logged in to reply to this topic. Login to reply