Returning a column per date in a range

  • Hmmm. Duly checked out!! That's something of an eye-opener:w00t: Won't be doing that again.

    Thanks for the pointer to a very interesting article.

    Bob

  • drew.allen (1/11/2012)


    Bob Cullen-434885 (1/10/2012)


    I started to look at an alternative approach, using a recursive CTE to generate a continuous list of dates that covered the report range

    Jeff Moden has an article that shows how badly recursive CTEs can perform for certain types of tasks and it sounds like yours may fall into that category. Check out his article Hidden RBAR: Counting with Recursive CTE's[/url]

    It's pretty tough to imagine a list of dates long enough to make any practical difference regarding performance here. Before anyone says it, yes I know that it often makes sense to code for the 10-million row case, but that would take us into the year 4749 here. A mere 4000 rows (150ms from Jeff's article graph) is more than enough to cover a ten-year period.

    I congratulate Bob for finding my error and pursuing an alternative line of enquiry. My concern is that we shouldn't be too keen in promoting The One True Way to generate numbers such that we end up stifling innovation and exploration.

  • SQL Kiwi (1/11/2012)


    drew.allen (1/11/2012)


    Bob Cullen-434885 (1/10/2012)


    I started to look at an alternative approach, using a recursive CTE to generate a continuous list of dates that covered the report range

    Jeff Moden has an article that shows how badly recursive CTEs can perform for certain types of tasks and it sounds like yours may fall into that category. Check out his article Hidden RBAR: Counting with Recursive CTE's[/url]

    It's pretty tough to imagine a list of dates long enough to make any practical difference regarding performance here. Before anyone says it, yes I know that it often makes sense to code for the 10-million row case, but that would take us into the year 4749 here. A mere 4000 rows (150ms from Jeff's article graph) is more than enough to cover a ten-year period.

    I congratulate Bob for finding my error and pursuing an alternative line of enquiry. My concern is that we shouldn't be too keen in promoting The One True Way to generate numbers such that we end up stifling innovation and exploration.

    If you read the article, you'll see that even the mere 4000 row CTE has a significant performance impact. That's how badly recursive CTEs can perform.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen (1/11/2012)


    If you read the article, you'll see that even the mere 4000 row CTE has a significant performance impact. That's how badly recursive CTEs can perform.

    You might have missed that I quoted the performance impact directly from the article "(150ms from Jeff's article graph)", which is one I have read many times (see the article's discussion thread for my comments). 150ms is not 'significant' in the current context, and it is 46ms on my laptop in any case:

    SET STATISTICS TIME ON;

    WITH R AS (SELECT 1 AS N UNION ALL SELECT R.N + 1 FROM R WHERE R.N <= 4000)

    SELECT * INTO #bucket FROM R OPTION (MAXRECURSION 0)

    SET STATISTICS TIME OFF;

  • SQL Kiwi (1/11/2012)


    You might have missed that I quoted the performance impact directly from the article "(150ms from Jeff's article graph)", which is one I have read many times (see the article's discussion thread for my comments). 150ms is not 'significant' in the current context, and it is 46ms on my laptop in any case:

    I disagree. 150ms is THIRTY times worse than the next worse algorithm. I think that's pretty significant. I can see no reason to recommend incorporating this method into new code when we know how badly it performs.

    Granted, if this method was being used in production code you would have to consider the context to determine whether it was worth rewriting the code to improve performance, but that's not the case here.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen (1/11/2012)


    I disagree.

    No worries.

    Granted, if this method was being used in production code you would have to consider the context to determine whether it was worth rewriting the code to improve performance...

    Exactly.

    ...but that's not the case here.

    Why not?

  • Wouldnt a date table work in this situation as well? sort of like what we find in data warehouses?

  • stephen99999 (1/11/2012)


    Wouldnt a date table work in this situation as well? sort of like what we find in data warehouses?

    Yes it would.

  • SQL Kiwi (1/11/2012)


    drew.allen (1/11/2012)


    I disagree.

    No worries.

    Granted, if this method was being used in production code you would have to consider the context to determine whether it was worth rewriting the code to improve performance...

    Exactly.

    ...but that's not the case here.

    Why not?

    I have found that, although the original author of a given piece of code may, indeed, carefully consider the context in which it is being used (ie. limited number of rows), the NEXT person to run across the code may not. Although you'd like to think that someone leveraging an existing piece of code would do a little testing, they usually do not because they 1) wouldn't be looking for the code if they knew more about SQL Server and T-SQL methods and 2) they're desperate for a solution. Such desperation is normally accompanied by a schedule crunch and they won't test it in a larger environment because they don't have the time.

    I've also found that supposed limits on a number of rows is frequently incorrect.

    With all of that in mind, I try to protect folks who may not know and may be in a time crunch or simply won't take the time to do a little verification of performance on their own by reminding them that a counting CTE is frequently worse than a While loop and that, considering how many other methods there are to solve the problem correctly, should be avoided at all cost.

    Besides... why would anyone intentionally write performance challenged code?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • SQL Kiwi (1/11/2012)


    It's pretty tough to imagine a list of dates long enough to make any practical difference regarding performance here. Before anyone says it, yes I know that it often makes sense to code for the 10-million row case, but that would take us into the year 4749 here. A mere 4000 rows (150ms from Jeff's article graph) is more than enough to cover a ten-year period.

    Spot on... what about the next guy that uses the code to generate individual dates according to Start/End date pairs for several thousand rows?

    I don't see the merit in ever using a counting rCTE for such a thing... the code takes longer to write, is physically longer and more complex than the other methods, and is guaranteed to have a performance problem if someone were to ever leverage the code for a larger purpose.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (1/12/2012)


    SQL Kiwi (1/11/2012)


    It's pretty tough to imagine a list of dates long enough to make any practical difference regarding performance here. Before anyone says it, yes I know that it often makes sense to code for the 10-million row case, but that would take us into the year 4749 here. A mere 4000 rows (150ms from Jeff's article graph) is more than enough to cover a ten-year period.

    Spot on... what about the next guy that uses the code to generate individual dates according to Start/End date pairs for several thousand rows?

    I don't see the merit in ever using a counting rCTE for such a thing... the code takes longer to write, is physically longer and more complex than the other methods, and is guaranteed to have a performance problem if someone were to ever leverage the code for a larger purpose.

    Sorry Paul, but I'm inclined to agree with Jeff on this one. Using a rCTE to generate a simple sequence of numbers (or dates) is wasteful of resources when you can get the rows (and maybe the numbers too) elsewhere at a lower cost and code complexity. I'm not saying rCTE's shouldn't be used for anything other than resolving hierarchies - as you know too well - just not for this purpose. Not even a mere 4000 rows. It's up to us to keep folks informed of where to use a particular tool or trick, and where not to - otherwise it's only a matter of time before someone posts up a performance problem with a "counting rCTE" within a loop, or worse. But hey - at least it will be easy to fix 😀


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • Jeff and Chris,

    You've missed the point I was making, and have chosen to assume that I am either (a) not aware of the performance issues involved; or (b) that I was offering general advice that it is ok to use rCTEs to generate numbers tables on the fly. I would have thought you both knew me better than that.

  • SQL Kiwi (1/12/2012)


    Jeff and Chris,

    You've missed the point I was making, and have chosen to assume that I am either (a) not aware of the performance issues involved; or (b) that I was offering general advice that it is ok to use rCTEs to generate numbers tables on the fly. I would have thought you both knew me better than that.

    Paul, I assumed that you were suggesting that it's ok if the number of rows is not only small but is restricted, and I must admit I was more than a little taken aback. On re-reading the whole thread I can see that this isn't the case - but it is sufficiently woolly to be interpreted that way. Apologies, mate.


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • ChrisM@home (1/12/2012)


    Paul, I assumed that you were suggesting that it's ok if the number of rows is not only small but is restricted, and I must admit I was more than a little taken aback. On re-reading the whole thread I can see that this isn't the case - but it is sufficiently woolly to be interpreted that way. Apologies, mate.

    No worries.

  • SQL Kiwi (1/12/2012)


    Jeff and Chris,

    You've missed the point I was making, and have chosen to assume that I am either (a) not aware of the performance issues involved; or (b) that I was offering general advice that it is ok to use rCTEs to generate numbers tables on the fly. I would have thought you both knew me better than that.

    The problem is that your point depends on an assumption that is not only not warranted, but actually contraindicated by the context of the discussion as a whole. Your argument depends on the fact that the rCTE is already in place and the question is whether to rewrite it, when it seems fairly obvious to me that the OP doesn't have ANY code in place and is looking for the best solution. Given the tenacity with which you have held your position despite the fact that it is fairly clearly new development, I don't see how anyone can reach any other conclusion.

    Even if your assumption were correct, the rCTE performs so badly compared to the other methods, I can't imagine a situation where you would choose not to rewrite the code at all. I would always choose to rewrite this code, although it may not have a high priority.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

Viewing 15 posts - 16 through 30 (of 55 total)

You must be logged in to reply to this topic. Login to reply