Identifying Start Dates not Aligned with End Dates on a Prior Row

  • Dwain Camps

    SSC Guru

    Points: 86893

    Comments posted to this topic are about the item Identifying Start Dates not Aligned with End Dates on a Prior Row


My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.

  • starunit

    SSCommitted

    Points: 1807

    Nice article, Dwain.

I've been using the CTE approach, and hadn't thought about using CROSS APPLY - but it looks like that's a bit slower anyway. And I'll try to remember the LAG() function when we move to SQL 2012.

    Nevertheless, I'll bookmark your article for later reference.

    Thanks!

    Mark
    Just a cog in the wheel.

  • ken.trock

    SSCertifiable

    Points: 5147

    Interesting article; Dwain requires that, for a given product, the EffectiveStartDT of the next timeline matches the EffectiveEndDT of the previous. This helps to enforce no overlapping dates for a product. What's the price of that product on the transition date? The one on EffectiveStartDT of the later timeline.

    In our employee hierarchy, we (or our predecessors) have EffectiveEndDT always 1 day before the next EffectiveStartDT for that employee. Our users are locked into selecting data and joining back to the employee data with

    WHERE <dataDate> BETWEEN EffectiveStartDT AND EffectiveEndDT.

Here, there's an issue if they're reporting on a period of time including the transition day, where they would double the fact data. OTOH, we don't have all the nice integrity checking that Dwain has.

    Ken
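To make the double-count risk on the transition day concrete, here is a minimal sketch. It uses Python's sqlite3 and ISO date strings purely so it is runnable anywhere (the comparison semantics are the same as in SQL Server); the table and column names follow the thread, and the data is invented. On a boundary date shared by two rows, the inclusive BETWEEN matches both rows, while a half-open test matches exactly one:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE ProductPrices (
    ProductID        INTEGER,
    EffectiveStartDT TEXT,
    EffectiveEndDT   TEXT,
    ProductPrice     REAL
);
-- Adjacent rows share the transition date 2015-02-01
INSERT INTO ProductPrices VALUES
    (1, '2015-01-01', '2015-02-01', 10.0),
    (1, '2015-02-01', '2015-03-01', 12.0);
""")

day = '2015-02-01'  # the transition date

# Inclusive BETWEEN matches both rows on the transition day (double count)
both = con.execute(
    "SELECT COUNT(*) FROM ProductPrices "
    "WHERE ? BETWEEN EffectiveStartDT AND EffectiveEndDT", (day,)
).fetchone()[0]

# A half-open comparison matches exactly one row
one = con.execute(
    "SELECT COUNT(*) FROM ProductPrices "
    "WHERE ? >= EffectiveStartDT AND ? < EffectiveEndDT", (day, day)
).fetchone()[0]

print(both, one)  # 2 1
```

This is the same half-open pattern Peter suggests later in the thread; a fact row dated on the boundary joins to two price rows under BETWEEN but only one under `>= start AND < end`.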

  • Eirikur Eiriksson

    SSC Guru

    Points: 182511

    Good article as always Dwain, thank you very much.

Some thoughts: first of all, the 1M test code only produced 100 rows until I adjusted the TOP clauses. The second thing has to do with the sorting implied by the LEAD; simplifying the work by date-diffing to 0 does improve the performance there - narrowing the "worktable", I guess, makes a difference. I did something similar recently which brought the execution time down by 40%.

    😎

SELECT ProductID, EffectiveStartDT, EffectiveEndDT, ProductPrice
FROM
(
    SELECT ProductID, EffectiveStartDT, EffectiveEndDT, ProductPrice
        ,d = LAG(EffectiveEndDT, 1) OVER
            (
                PARTITION BY ProductID ORDER BY DATEDIFF(DAY, 0, EffectiveStartDT)
            )
    FROM dbo.ProductPrices
) a
WHERE d IS NOT NULL AND d <> EffectiveStartDT;

SELECT ProductID, EffectiveStartDT, EffectiveEndDT, ProductPrice
FROM
(
    SELECT ProductID, EffectiveStartDT, EffectiveEndDT, ProductPrice
        ,d = LAG(EffectiveEndDT, 1) OVER
            (
                PARTITION BY ProductID ORDER BY CONVERT(BINARY(8), EffectiveStartDT, 0)
            )
    FROM dbo.ProductPrices
) a
WHERE d IS NOT NULL AND d <> EffectiveStartDT;

    Stats

    Without Datediff:

    SQL Server parse and compile time:

    CPU time = 3 ms, elapsed time = 3 ms.

    Table 'ProductPrices'. Scan count 5, logical reads 4713, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 4352 ms, elapsed time = 2156 ms.

    With Datediff:

    Table 'ProductPrices'. Scan count 5, logical reads 4713, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 4352 ms, elapsed time = 1878 ms.

  • Dwain Camps

    SSC Guru

    Points: 86893

    starunit (3/24/2015)


    Nice article, Dwain.

I've been using the CTE approach, and hadn't thought about using CROSS APPLY - but it looks like that's a bit slower anyway. And I'll try to remember the LAG() function when we move to SQL 2012.

    Nevertheless, I'll bookmark your article for later reference.

    Thanks!

    I'm not sure quite what you mean by the "CTE approach" because any use of a CTE is simply a derived table in disguise (unless the CTE is used in multiple places in the query). CTEs are just a generic way to construct a temporary row set for use in the current query.

    I would say, you've also got to consider the EXISTS approach. What I said about always "forgetting" to consider how to do a query using EXISTS as a filter is true. That's why I constantly have to kick myself when it turns out to be faster (and that is relatively often).
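For illustration, here is a sketch of what an EXISTS-style filter for the misaligned-start check might look like. This is a reconstruction, not the article's benchmarked code, and it runs against SQLite via Python's sqlite3 with invented sample data; the query shape translates directly to T-SQL:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE ProductPrices (
    ProductID        INTEGER,
    EffectiveStartDT TEXT,
    EffectiveEndDT   TEXT,
    ProductPrice     REAL
);
INSERT INTO ProductPrices VALUES
    (1, '2015-01-01', '2015-02-01', 10.0),   -- first row for the product
    (1, '2015-02-15', '2015-03-01', 12.0);   -- starts after a gap
""")

# Flag rows that have some earlier row for the product (so they are not the
# first price) but no row whose end date lands exactly on their start date.
misaligned = con.execute("""
    SELECT p.ProductID, p.EffectiveStartDT
    FROM ProductPrices p
    WHERE EXISTS (
            SELECT 1 FROM ProductPrices e
            WHERE e.ProductID = p.ProductID
              AND e.EffectiveStartDT < p.EffectiveStartDT)
      AND NOT EXISTS (
            SELECT 1 FROM ProductPrices e
            WHERE e.ProductID = p.ProductID
              AND e.EffectiveEndDT = p.EffectiveStartDT)
""").fetchall()

print(misaligned)  # [(1, '2015-02-15')]
```

The NOT EXISTS is the filter doing the real work; the EXISTS merely excuses the first row of each product's timeline, which has no prior end date to align with.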

    Thanks for your "nice article" comment!



  • Dwain Camps

    SSC Guru

    Points: 86893

    ken.trock (3/24/2015)


    Interesting article; Dwain requires that, for a given product, the EffectiveStartDT of the next timeline matches the EffectiveEndDT of the previous. This helps to enforce no overlapping dates for a product. What's the price of that product on the transition date? The one on EffectiveStartDT of the later timeline.

    In our employee hierarchy, we (or our predecessors) have EffectiveEndDT always 1 day before the next EffectiveStartDT for that employee. Our users are locked into selecting data and joining back to the employee data with

    WHERE <dataDate> BETWEEN EffectiveStartDT AND EffectiveEndDT.

Here, there's an issue if they're reporting on a period of time including the transition day, where they would double the fact data. OTOH, we don't have all the nice integrity checking that Dwain has.

    Ken

    Hi Ken,

The transition date/time you refer to is the exact millisecond of transition to the new price. So at midnight on the effective date (the EffectiveStartDT) the new price is in effect.

    I believe there are a few people on this site (people that are real SQL experts) that would say use of BETWEEN on date/times is not the recommended approach. With a DATE type maybe. Certainly Joe Celko in his contiguous dates article seemed to suggest it was appropriate.

    While I don't happen to agree with it, that doesn't mean you can't get away with it. And you could probably even use the queries I provided with a little modification to handle the gaps/overlaps identification in your scenario.
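As a sketch of that "little modification" for Ken's convention (end date always one day before the next start), the LAG comparison just expects previous end + 1 day instead of strict equality - in T-SQL that would be DATEADD(DAY, 1, PrevEnd). The runnable demo below uses SQLite's date() function via Python (SQLite 3.25+ for window functions); the EmployeeTimeline table name and data are invented for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE EmployeeTimeline (
    EmployeeID       INTEGER,
    EffectiveStartDT TEXT,
    EffectiveEndDT   TEXT
);
INSERT INTO EmployeeTimeline VALUES
    (1, '2015-01-01', '2015-01-31'),  -- ends the day before the next start: OK
    (1, '2015-02-01', '2015-02-28'),
    (1, '2015-03-05', '2015-03-31');  -- prior row ends 02-28, so there's a gap
""")

# Same LAG pattern as the article, but the expectation is now
# "previous end + 1 day = this start" rather than strict equality.
gaps = con.execute("""
    SELECT EmployeeID, EffectiveStartDT
    FROM (
        SELECT EmployeeID, EffectiveStartDT,
               LAG(EffectiveEndDT) OVER (
                   PARTITION BY EmployeeID ORDER BY EffectiveStartDT) AS PrevEnd
        FROM EmployeeTimeline
    )
    WHERE PrevEnd IS NOT NULL
      AND date(PrevEnd, '+1 day') <> EffectiveStartDT
""").fetchall()

print(gaps)  # [(1, '2015-03-05')]
```

Overlaps fall out of the same test: a previous end on or after the current start also fails the "+1 day" equality, so both gaps and overlaps surface in one pass.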



  • Dwain Camps

    SSC Guru

    Points: 86893

    Eirikur Eiriksson (3/24/2015)


    Good article as always Dwain, thank you very much.

Some thoughts: first of all, the 1M test code only produced 100 rows until I adjusted the TOP clauses. The second thing has to do with the sorting implied by the LEAD; simplifying the work by date-diffing to 0 does improve the performance there - narrowing the "worktable", I guess, makes a difference. I did something similar recently which brought the execution time down by 40%.

    😎

SELECT ProductID, EffectiveStartDT, EffectiveEndDT, ProductPrice
FROM
(
    SELECT ProductID, EffectiveStartDT, EffectiveEndDT, ProductPrice
        ,d = LAG(EffectiveEndDT, 1) OVER
            (
                PARTITION BY ProductID ORDER BY DATEDIFF(DAY, 0, EffectiveStartDT)
            )
    FROM dbo.ProductPrices
) a
WHERE d IS NOT NULL AND d <> EffectiveStartDT;

SELECT ProductID, EffectiveStartDT, EffectiveEndDT, ProductPrice
FROM
(
    SELECT ProductID, EffectiveStartDT, EffectiveEndDT, ProductPrice
        ,d = LAG(EffectiveEndDT, 1) OVER
            (
                PARTITION BY ProductID ORDER BY CONVERT(BINARY(8), EffectiveStartDT, 0)
            )
    FROM dbo.ProductPrices
) a
WHERE d IS NOT NULL AND d <> EffectiveStartDT;

    Stats

    Without Datediff:

    SQL Server parse and compile time:

    CPU time = 3 ms, elapsed time = 3 ms.

    Table 'ProductPrices'. Scan count 5, logical reads 4713, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 4352 ms, elapsed time = 2156 ms.

    With Datediff:

    Table 'ProductPrices'. Scan count 5, logical reads 4713, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 4352 ms, elapsed time = 1878 ms.

    Hi Eirikur - Thanks for taking a look!

    I'm a little confused by your examples though. You're using DATEDIFF in one bit of code and then CONVERT in the ORDER BY of the other. Yet your performance results suggest with/without DATEDIFF. Does without DATEDIFF translate into the CONVERT version? Or are you comparing your first example with the SQL 2012 version included in the article?



  • Eirikur Eiriksson

    SSC Guru

    Points: 182511

    dwain.c (3/24/2015)


    Hi Eirikur - Thanks for taking a look!

    I'm a little confused by your examples though. You're using DATEDIFF in one bit of code and then CONVERT in the ORDER BY of the other. Yet your performance results suggest with/without DATEDIFF. Does without DATEDIFF translate into the CONVERT version? Or are you comparing your first example with the SQL 2012 version included in the article?

My blunder, copied the wrong code snips. :blush:

  • Eirikur Eiriksson

    SSC Guru

    Points: 182511

    Eirikur Eiriksson (3/25/2015)


    dwain.c (3/24/2015)


    Hi Eirikur - Thanks for taking a look!

    I'm a little confused by your examples though. You're using DATEDIFF in one bit of code and then CONVERT in the ORDER BY of the other. Yet your performance results suggest with/without DATEDIFF. Does without DATEDIFF translate into the CONVERT version? Or are you comparing your first example with the SQL 2012 version included in the article?

My blunder, copied the wrong code snips. :blush:

Not only did I post the wrong code snips but, even worse, I made an assumption which was wrong. Finally having time to look into it this morning, I realised that the difference in execution time is simply because the server chooses a parallel execution plan for the DATEDIFF; the work behind it is an order of magnitude greater than without it. Again, my bad.

    😎

  • Dwain Camps

    SSC Guru

    Points: 86893

    Hehe. I'm sure all will forgive you Eirikur!

    Better to have stumped you than for you to find my code-craft lacking.



  • Stuart Pearson

    Mr or Mrs. 500

    Points: 527

    Hi Dwain

    Nice article

    The resources link on the article appears to be pointing at the wrong file.

  • Peter Heller

    Right there with Babe

    Points: 790

    Compliments on your article.

If you defined the EffectiveEndDT as DiscontinuedDT with a non-NULL default, your queries would not have to use ISNULL. It would be sargable and make the query optimizer a little happier :-).

Also, I always try to avoid defining tables in dbo; I prefer to create a schema (which may in fact be owned by dbo) for a set of grouped SQL objects.

--
-- Setting a default value for the maximum EffectiveEndDT
--
EffectiveEndDT DATETIME NOT NULL DEFAULT ('2099-12-31')
DiscontinuedDT DATETIME NOT NULL DEFAULT ('2099-12-31')

DECLARE @GetPriceOn DATETIME = '2013-12-31';
--
-- EffectiveEndDT set to the default value '2099-12-31'
--
-- sargable
--
SELECT ProductID, EffectiveStartDT, EffectiveEndDT, ProductPrice
FROM dbo.ProductPrices
WHERE @GetPriceOn >= EffectiveStartDT AND @GetPriceOn < EffectiveEndDT;

    Thank you for sharing this article.

  • Dwain Camps

    SSC Guru

    Points: 86893

    Peter Heller (3/29/2015)


    Compliments on your article.

If you defined the EffectiveEndDT as DiscontinuedDT with a non-NULL default, your queries would not have to use ISNULL. It would be sargable and make the query optimizer a little happier :-).

Also, I always try to avoid defining tables in dbo; I prefer to create a schema (which may in fact be owned by dbo) for a set of grouped SQL objects.

--
-- Setting a default value for the maximum EffectiveEndDT
--
EffectiveEndDT DATETIME NOT NULL DEFAULT ('2099-12-31')
DiscontinuedDT DATETIME NOT NULL DEFAULT ('2099-12-31')

DECLARE @GetPriceOn DATETIME = '2013-12-31';
--
-- EffectiveEndDT set to the default value '2099-12-31'
--
-- sargable
--
SELECT ProductID, EffectiveStartDT, EffectiveEndDT, ProductPrice
FROM dbo.ProductPrices
WHERE @GetPriceOn >= EffectiveStartDT AND @GetPriceOn < EffectiveEndDT;

    Thank you for sharing this article.

Yes, you could. Oftentimes, people don't, though.



  • Peter Heller

    Right there with Babe

    Points: 790

The practical example is when the product price changes: the EffectiveEndDT of the last current price is updated with the current date, and a new row is created with its EffectiveStartDT equal to the previous end date. The new EffectiveEndDT will be NULL in your case and 12/31/2099 in mine.
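One way to script that workflow: close out the open-ended row and insert the new price sharing the boundary date, inside a single transaction so the timeline is never half-updated. This is a hedged sketch in Python/SQLite with an invented change_price helper, not production code; the '2099-12-31' sentinel stands in for the open end date:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE ProductPrices (
    ProductID INTEGER, EffectiveStartDT TEXT,
    EffectiveEndDT TEXT, ProductPrice REAL)""")
con.execute("INSERT INTO ProductPrices VALUES (1, '2015-01-01', '2099-12-31', 10.0)")

def change_price(con, product_id, change_date, new_price, sentinel='2099-12-31'):
    """Close the open-ended row at change_date, then insert the new price
    starting on that same boundary date with the sentinel end date."""
    with con:  # one transaction: both statements commit, or neither does
        con.execute(
            "UPDATE ProductPrices SET EffectiveEndDT = ? "
            "WHERE ProductID = ? AND EffectiveEndDT = ?",
            (change_date, product_id, sentinel))
        con.execute(
            "INSERT INTO ProductPrices VALUES (?, ?, ?, ?)",
            (product_id, change_date, sentinel, new_price))

change_price(con, 1, '2015-06-01', 12.5)
rows = con.execute(
    "SELECT EffectiveStartDT, EffectiveEndDT, ProductPrice "
    "FROM ProductPrices WHERE ProductID = 1 ORDER BY EffectiveStartDT").fetchall()
print(rows)
# [('2015-01-01', '2015-06-01', 10.0), ('2015-06-01', '2099-12-31', 12.5)]
```

Because each new row's start equals the previous row's end, the result satisfies exactly the alignment rule the article's queries check for.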

Viewing 14 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply