Windows Functions: Tell me when that changes.

  • michael.f.morin

    Old Hand

    Points: 386

    Comments posted to this topic are about the item Windows Functions: Tell me when that changes.

  • hartmann 74688

    Old Hand

    Points: 346

    Just to make it clear, they are windowing functions or window functions.

    Windows functions sounds like API of the OS :).

    Good Article, great reminder!

  • Alan Burstein

    SSC Guru

    Points: 61037

    hartmann 74688 (2/17/2016)


    Just to make it clear, they are windowing functions or window functions.

    Windows functions sounds like API of the OS :).

    Good Article, great reminder!

    I noticed that too (and have made the same mistake). I don't know the SSC protocol for correcting articles but I hope that gets sorted out so this excellent article has a better chance of showing up earlier in a Google search.

    -- Alan Burstein


    Helpful links:Best practices for getting help on SQLServerCentral -- Jeff ModenHow to Post Performance Problems -- Gail ShawNasty fast set-based string manipulation functions:For splitting strings try DelimitedSplit8K or DelimitedSplit8K_LEAD (SQL Server 2012+)To split strings based on patterns try PatternSplitCMNeed to clean or transform a string? try NGrams, PatExclude8K, PatReplace8K, DigitsOnlyEE, or Translate8KI cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code.  -- Itzik Ben-Gan 2001

  • michael.f.morin

    Old Hand

    Points: 386

    Thanks for the comments guys. Nomenclature is important and I will see what SSC can do to update the article.

  • AdamE

    SSC Enthusiast

    Points: 131

    Alternative method for those without LEAD() and LAG()

    WITH cteSAMPLEDATA

    AS (--Create Sample data

    SELECT ID, ORDER_ID, CODE, ORDER_DATE FROM

    (VALUES

    (1, 'ORDER1', 'DS', N'20151001'),

    (2, 'ORDER2', 'DS', N'20151001'),

    (5, 'ORDER2', 'DS', N'20151002'),

    (3, 'ORDER3', 'DS', N'20151001'),

    (6, 'ORDER3', 'AG', N'20151002'),

    (8, 'ORDER3', 'AG', N'20151003'),

    (4, 'ORDER4', 'DS', N'20151001'),

    (7, 'ORDER4', 'AG', N'20151002'),

    (9, 'ORDER4', 'DS', N'20151003')

    )

    AS MYTABLE(ID, ORDER_ID, CODE, ORDER_DATE)

    )

    SELECT A.ID

    ,A.ORDER_ID

    ,A.CODE

    ,A.ORDER_DATE AS ORDER_DATE

    ,B.ID AS PREVIOUS_ID

    ,B.CODE AS PREVIOUS_CODE

    ,B.ORDER_DATE AS PREVIOUS_ORDERDATE

    FROM (

    SELECT DISTINCT X.*

    FROM cteSAMPLEDATA C

    CROSS APPLY(

    SELECT TOP 1 *

    FROM cteSAMPLEDATA Z

    WHERE Z.ORDER_ID = C.ORDER_ID

    ORDER BY Z.ORDER_DATE DESC

    ) X

    ) A

    CROSS APPLY(

    SELECT TOP 1 *

    FROM cteSAMPLEDATA Z

    WHERE Z.ORDER_ID = A.ORDER_ID

    AND Z.ORDER_DATE<A.ORDER_DATE

    ORDER BY Z.ORDER_DATE DESC

    ) B

    WHERE A.CODE<>B.CODE

  • dgreen-1126628

    SSC-Addicted

    Points: 443

    Great article. And since you are going to clean it up for "Windows" vs "Windowing", you can clean up the following typos, too 🙂

    1. "Meaning it is the only order whose most recent date (2015-10-03) has a different code (DS) that the one before it (AG) from 2015-10-02"

    ("that" should be "than")

    2. “(note this is for SQL Server 2012+ and Azure):

    “Accesses data from a subsequent row in the same result set without the use of a self-join in SQL Server 2016. ..."

    (confusing you quote from SQL Server 2016 when saying it is ok in 2012+)

    3. "Since our rows as sorted by ORDER_DATE descending, that would be what we want." ("as" should be "are")

    4. "LEAD(H1.CODE,1) --Get the code 1 row after the current row.

    OVER (PARTITION BY ORDER_ID

    ORDER BY ORDER_DATE DESC) AS PREVIOUS_CODE"

    "LEAD(H1.CODE,1) --Get the code 1 row before the current row.

    OVER (PARTITION BY ORDER_ID

    ORDER BY ORDER_DATE DESC) AS PREVIOUS_CODE" (first comment says "after" but second occurance says "before")

  • timwell

    SSCarpal Tunnel

    Points: 4961

    Thanks Michael for doing this article. I thought it was a good explanation and example.

    However I was also distracted by the window/windows difference and some typos / misspellings.

    Here are a couple more I found:

    "They wanted to know when the most resent order’s code was different from the one previous to it." (I think you meant "recent," not that it was re-sent. It seems to be correct everywhere else)

    "In this case we want to partition our recordset by ORDER_ID, essentially grouping all rows with the sam ORDER_ID together."

    (sam/same)

  • Stephen Rybacki

    Hall of Fame

    Points: 3216

    Great content, other than the typos and nomenclature things others have already mentioned. Thanks!

  • j-1064772

    SSCertifiable

    Points: 5316

    Re: Alternative method for those without LEAD() and LAG()

    The original code WITH LEAD() and LAG() seems easier to read for me.

    The alternative method would apply in a version of SSQL which does not support LEAD or LAG. Otherwise, time to look at using the new features.

    Maybe someone who is an old hand at this could compare the actual execution plans to see which method scales better ?

  • davidawest00

    SSC Enthusiast

    Points: 123

    I have encountered this precise scenario a dozen times.

    You can do the same thing with a correlated sub-query, and they usually scale better than any CTE because a CTE is just an in-line view, which SQL Server will generate first.

    In other words, SS will create ALL THE ROWS for a CTE before it processes the main query, whether it will need all those rows or not, and you can't put an index on a CTE.

    Slow and disk intensive. Bleh! Better to create 2 temp tables and index them: 1 for the current order and 1 for the previous order.

    Then join. Last time I did that the stored proc went from 2 hours to 20 minutes.

  • j-1064772

    SSCertifiable

    Points: 5316

    davidawest00 (2/17/2016)


    I have encountered this precise scenario a dozen times.

    You can do the same thing with a correlated sub-query, and they usually scale better than any CTE because a CTE is just an in-line view, which SQL Server will generate first.

    In other words, SS will create ALL THE ROWS for a CTE before it processes the main query, whether it will need all those rows or not, and you can't put an index on a CTE.

    Slow and disk intensive. Bleh! Better to create 2 temp tables and index them: 1 for the current order and 1 for the previous order.

    Then join. Last time I did that the stored proc went from 2 hours to 20 minutes.

    Good explanation.

    Just out of curiousity - how many records took 20 minutes ?

  • davidawest00

    SSC Enthusiast

    Points: 123

    It was to get the latest + previous purchase per customer...about 40 million out of 60-80 million rows...twice.

    Once for current and once for previous.

  • Luis Cazares

    SSC Guru

    Points: 183567

    These articles with real life examples are great to help people get rid of cursors and change the mentalities.

    I just wanted to make 2 comments:

    For sample data, create a table (even a temp table) to prevent repeating the creation of sample data each time to concentrate in the real solution. This will also help define the correct data types.

    CREATE TABLE MYTABLE(

    ID int,

    ORDER_ID char(6),

    CODE char(2),

    ORDER_DATE date

    );

    INSERT INTO MYTABLE

    VALUES

    (1, 'ORDER1', 'DS', '20151001'),

    (2, 'ORDER2', 'DS', '20151001'),

    (5, 'ORDER2', 'DS', '20151002'),

    (3, 'ORDER3', 'DS', '20151001'),

    (6, 'ORDER3', 'AG', '20151002'),

    (8, 'ORDER3', 'AG', '20151003'),

    (4, 'ORDER4', 'DS', '20151001'),

    (7, 'ORDER4', 'AG', '20151002'),

    (9, 'ORDER4', 'DS', '20151003');

    Second, I found that there might be an alternative that can give a simpler plan. I'm not sure if that means better performance, but it usually helps. The secret is to reuse your windows as much as possible.

    WITH cteGET_MAX_DATE_AND_CODE

    AS (

    SELECT

    H1.ID,

    H1.ORDER_ID,

    H1.CODE,

    H1.ORDER_DATE,

    ROW_NUMBER() --Identify the latest row for each ORDER

    OVER (PARTITION BY ORDER_ID ORDER BY ORDER_DATE DESC) AS ROW_NUM,

    LEAD(H1.CODE,1) --Get the code 1 row before the current row.

    OVER (PARTITION BY ORDER_ID

    ORDER BY ORDER_DATE DESC) AS PREVIOUS_CODE

    FROM MYTABLE H1

    )

    SELECT

    ID, ORDER_ID, CODE, PREVIOUS_CODE, ORDER_DATE

    FROM cteGET_MAX_DATE_AND_CODE

    WHERE

    ROW_NUM = 1

    AND CODE != PREVIOUS_CODE

    AND PREVIOUS_CODE IS NOT NULL;

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • AdamE

    SSC Enthusiast

    Points: 131

    Agreed, always look to use new features.

    We have a couple of old 2008R2 servers and are sometimes forced to go old school on a few queries, hence my post.

    I'd be interested in the analysis of the execution plans if someone wants to step up?

  • j-1064772

    SSCertifiable

    Points: 5316

    The execution plans do not tell the whole story. I have seen some cases by Gila Monster where the execution was simplified but the load on the server was not improved (roughly same execution time.

Viewing 15 posts - 1 through 15 (of 31 total)

You must be logged in to reply to this topic. Login to reply