Returning a column per date in a range

  • Jeff Moden (1/13/2012)


    On one hand, you seem to support avoiding counting rCTEs and then you seem to justify their use on the other.

    Yikes. No, what I objected to (reasonably clearly, as far as I can tell) was the idea that using an rCTE for the specific solution in this thread was not a reasonable thing to even try. No solution based on such an effort has even been posted because the suggestion was immediately pounced on. My concern is that dissuading people from experimentation is not something I like to see encouraged. Not everyone is a know-nothing newbie; so novel uses of things like recursive CTEs are tools that people should be aware of. If nothing else, it is a very creative use of the feature and may help people understand how they work. Having more than one tool in your toolbox is always a good thing, and understanding when and where to use which tool is also important.

    Because of previous misunderstandings, let me say that I realize the 150ms solution was for the generation of 4,000 dates (well beyond the requirements of this thread) and the number of milliseconds to generate the dates for the problem on this thread seem trivial to most folks. I may also have misread what you mean by "significant" so I'll try to clear that up, as well.

    To me, 150ms to gen only 4,000 rows is a "significant" problem. Using that same code to gen only 12 rows is still a "significant" problem. It's absolutely true that the amount of time the code takes to run for 12 rows is "not significant" to most folks and will fit very well within anyone's expections for the code.

    The generation of the numbers required in my posted solution (which used an on-the-fly numbers table, in any case!) is not a significant part of the problem. The process of unpivoting the data into a temporary table and generating the dynamic cross-tab is much more significant. That is what I meant. I used 4,000 rows and 150ms simply as an example of how much it doesn't matter here.

    My problem with the counting rCTE code is that someone may use the code for something else larger. Again, It's not that the code would cause a performance problem on this particular thread because, for most, it won't appear to be. I worry about folks taking something like a counting rCTE from a thread like this and using it somewhere else that requires more scalability.

    There is no counting rCTE code on this thread! There was only a suggestion that one might try to build a solution for this specific problem based on one. If a solution had been posted, we could have discussed it, and people might have learnt something about where one might consider using such a thing.

    Instead, we have ended up the usual debate about whether counting rCTEs should ever be used. Your posts and articles target very new users of SQL Server, whereas I target a slightly different audience. There is room for both. If I had said anywhere on this thread that counting rCTEs are a tool of first choice, I would expect a reaction to that. I did not, and yet we still have people (none of which posted a solution) trotting out the same tired old dogma, as if I had.

  • SQL Kiwi (1/13/2012)


    LutzM (1/13/2012)


    The main reason some of us go beyond a "it works in the context" solution is because there are people out there who -unlike your great attitude- copy a solution off the web and put it in production without any verification.

    Laudable sentiment, but I'm not convinced catering for this audience should be our primary concern :w00t:

    Agreed. I might have given the wrong reason for providing the best solution I know of (that usually is neither the best solution that exist nor always a generic method). Take it one step back: we don't always know how much the scenario has been simplified by the OP to a reasonable size for a forum. Leading to the same effect: a simple solution based on the sample data might even blow the OP's system.

    But I guess we're in agreement here: "this audience" is not our primary concern and the preference of certain generic methods for their broad applicability and general performance characteristics is reasonable. All good from my side (just a not-so-well-thought argument on my side at the beginning). 😀



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • @Paul...

    SQL Kiwi (1/11/2012)


    drew.allen (1/11/2012)


    Bob Cullen-434885 (1/10/2012)


    I started to look at an alternative approach, using a recursive CTE to generate a continuous list of dates that covered the report range

    Jeff Moden has an article that shows how badly recursive CTEs can perform for certain types of tasks and it sounds like yours may fall into that category. Check out his article Hidden RBAR: Counting with Recursive CTE's[/url]

    It's pretty tough to imagine a list of dates long enough to make any practical difference regarding performance here. Before anyone says it, yes I know that it often makes sense to code for the 10-million row case, but that would take us into the year 4749 here. A mere 4000 rows (150ms from Jeff's article graph) is more than enough to cover a ten-year period.

    Were you or were you not saying that it would be ok to use a counting rCTE here (for this particular problem) or not?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (1/15/2012)


    Were you or were you not saying that it would be ok to use a counting rCTE here (for this particular problem) or not?

    I was saying that the choice of method to generate the numbers required was not a significant part of solving this particular problem. Had Bob gone on to explore the option, he might have found a superior recursive solution to the wider problem...who knows. Point is, he was discouraged from even trying.

  • SQL Kiwi (1/15/2012)


    Jeff Moden (1/15/2012)


    Were you or were you not saying that it would be ok to use a counting rCTE here (for this particular problem) or not?

    I was saying that the choice of method to generate the numbers required was not a significant part of solving this particular problem.

    Heh.. hence the need for the same "tired old dogma". Although a counting rCTE might not be a significant part of THIS problem, people need to be warned about it for the "wider" picture.

    Had Bob gone on to explore the option, he might have found a superior recursive solution to the wider problem...who knows. Point is, he was discouraged from even trying.

    I believe the opposite, Paul. He was shown an article that would show him how to test his own convictions about counting through recursion if he was so moved. I don't believe that he was even aware of the potential problems of "standard" recursive counting methods. It was pretty obvious from his reply after reading the article that he was going to use the very same method in the article. Now that he knows of the potential problems, perhaps he'll be driven to find a better way.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (1/16/2012)


    Heh.. hence the need for the same "tired old dogma". Although a counting rCTE might not be a significant part of THIS problem, people need to be warned about it for the "wider" picture.

    I believe the opposite, Paul. He was shown an article that would show him how to test his own convictions about counting through recursion if he was so moved. I don't believe that he was even aware of the potential problems of "standard" recursive counting methods. It was pretty obvious from his reply after reading the article that he was going to use the very same method in the article. Now that he knows of the potential problems, perhaps he'll be driven to find a better way.

    Yes, but how do you know there isn't a superior recursive solution to this problem? Are you assuming there isn't? I have no problem with people linking to your article, but you have to be prepared for people like me to debate you on this.

    As you know, there are cases where recursive CTEs provide the optimal solution. Had I read, and taken literally, your article on counting rCTEs, I might never have discovered the ultra-fast DISTINCT algorithm, for example.

    This is the reason I object to people being quite so heavy-handed in their opposition to 'counting rCTEs'. Seeking to limit discussion and experimentation through arguments like 'yeah but newbie x might copy-and-paste the code into a critical production system' irks me no end, to be honest. It's all good as far as it goes, but let's not get silly about it.

  • SQL Kiwi (1/16/2012)


    As you know, there are cases where recursive CTEs provide the optimal solution. Had I read, and taken literally, your article on counting rCTEs, I might never have discovered the ultra-fast DISTINCT algorithm, for example.

    If I recall correctly, the "ultra-fast DISTINCT" is only fast when there are very few duplicates. The recursive counting CTE causes it to become much slower than a normal DISTINCT in the presence of many duplicates. Again, we have perceived code performance based on a small number of rows (and, yes, I remember you admitting that in the same thread). Knowing that, would you actually put such code into production? I know I would not.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • SQL Kiwi (1/16/2012)


    Yes, but how do you know there isn't a superior recursive solution to this problem? Are you assuming there isn't? I have no problem with people linking to your article, but you have to be prepared for people like me to debate you on this.

    I have no problem with the debate and I don't even mind people like you being as dogmatic it their opposition to the article as I am for it. I agree that someone may pull a rabbit out of their hat for recursive counting CTEs. But posting a recursive counting solution using the current technique will not compel others to find a better way if people don't know a problem exists to begin with.

    Heh... and if you're going to be dogmatic in your opposition to the article, please stop accusing me of posting the same "tired old dogma" about a real performance problem that has been publically tested and documented to exist for a given technique. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (1/16/2012)


    If I recall correctly, the "ultra-fast DISTINCT" is only fast when there are very few duplicates. The recursive counting CTE causes it to become much slower than a normal DISTINCT in the presence of many duplicates.

    Well memory is a funny thing isn't it? You have the relationship between speed and duplicates backwards: its best case performance occurs when there is only one distinct value (i.e. everything is duplicated). Worst case performance is on a column with only unique values. On the million-row test, rCTE DISTINCT is faster than T-SQL DISTINCT if there are 10,000 or fewer distinct values (in other words, an average 100 duplicates per value).

    So if your knowledge of the data means you *know* there will be more than 100 values per item (a very common case) you can be confident of a win with the rCTE. With 1,000 duplicates per item, it's 10x faster...and as your own testing showed on that thread, if there are many duplicates, the rCTE can be a thousand times faster. As you said yourself, "...if you know you have a lot of dupes, this is the berries!"

    Again, we have perceived code performance based on a small number of rows (and, yes, I remember you admitting that in the same thread). Knowing that, would you actually put such code into production? I know I would not.

    So you disregard every algorithm that has awesome best-case performance, and poor worst-case performance? (Check out the characteristics of QuickSort some time, if so). I don't think you really want to get into a debate about putting unreliable code into production, do you? 😉

  • Jeff Moden (1/16/2012)


    I have no problem with the debate and I don't even mind people like you being as dogmatic it their opposition to the article as I am for it. I agree that someone may pull a rabbit out of their hat for recursive counting CTEs. But posting a recursive counting solution using the current technique will not compel others to find a better way if people don't know a problem exists to begin with.

    Certainly, rCTEs are not a good way to produce a large sequence of numbers. No-one is disputing that, or saying your article is rubbish, so relax a bit on that. The danger comes in extrapolating your results, and coming to the conclusion that all rCTE solutions of somewhat similar shape fall into the same basket. If you oppose the idea that people should be allowed to explore recursive solutions in general, please explain your reasoning.

    Heh... and if you're going to be dogmatic in your opposition to the article, please stop accusing me of posting the same "tired old dogma" about a real performance problem that has been publically tested and documented to exist for a given technique.

    I don't oppose the article, just the unsafe extrapolation as I said above. And don't take the 'dogma' reference personally; it is aimed at the argument, not the person, and you were not the first, worst, or only one to have that reaction. I am simply asking that people take the anti-rCTE rhetoric down a notch, that is all.

  • SQL Kiwi (1/16/2012)


    If you oppose the idea that people should be allowed to explore recursive solutions in general, please explain your reasoning.

    Nope... not opposed to rCTEs... just counting rCTEs. I'll be careful to mention than rCTEs aren't a problem themselves in future posts. I may even add a prolog to the article to make sure that people seriously understand the difference but I'm not likely to back down on counting rCTEs being a problem until someone finds a way to make them perform a whole lot better.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 11 posts - 46 through 55 (of 55 total)

You must be logged in to reply to this topic. Login to reply