|
|
|
SSCrazy
      
Group: General Forum Members
Last Login: Today @ 2:19 AM
Points: 2,022,
Visits: 3,594
|
|
aktikt (4/13/2009) Say it ain't so. This is the first I've heard a set based solution losing to RBAR.
I don't have the desire to check right now, but what about using a CTE instead for the tally table?
aktikt
Hi aktikt
Yes it is . .Net is more optimized for things like string operations and formatting in most other cases the set based solutions should be the better way.
Which kind of CTE approach do you mean? A recursive CTE will miss because there are possibly more than 100 lines in the text. I would be glad if you have any other approach!
Greets Flo
The more I learn, the more I know what I do not know How to Post Data/Code to get the best Help"Numbers" or "Tally" Table How to Post Performance Problems
|
|
|
|
|
SSChampion
        
Group: General Forum Members
Last Login: Today @ 9:09 PM
Points: 20,156,
Visits: 13,692
|
|
aktikt (4/13/2009) Say it ain't so. This is the first I've heard a set based solution losing to RBAR.
I don't have the desire to check right now, but what about using a CTE instead for the tally table?
aktikt
Heh... actually, this is the second time. The While Loop beats the Tally table when "STUFF" is used to clean a string.
The CTE method was one of the things I quickly checked... it still looses to the While Loop. The CLR smokes all of the methods. Glad someone finally tested that.
I've got a lot of testing to do between VARCHAR(MAX) and VARCHAR(8000), but I don't know why anyone would write something that couldn't handle the MAX anymore. I'm thinking that the hey-day of the Tally table have finally been replaced by faster methods.
--Jeff Moden "RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".
First step towards the paradigm shift of writing Set Based code: Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."
For better, quicker answers, click on the following... http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following... http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
|
|
|
|
|
SSCrazy
      
Group: General Forum Members
Last Login: Today @ 2:19 AM
Points: 2,022,
Visits: 3,594
|
|
|
|
|
|
Old Hand
      
Group: General Forum Members
Last Login: Yesterday @ 1:10 PM
Points: 388,
Visits: 1,520
|
|
This reminds me a bit of the blog I did called 'Never say never to the 'While' loop.
A Charindex and WHILE loop will always outperform a 'helper table' in certain circumstances. However, these cases are usually fairly esoteric string searches. They don't invalidate the value of the use of helper tables for the common task.
Is there some test data to try this on? (I couldn't see any) I've been experimenting quite a bit with the NET Regex class for this sort of task, and would love to benchmark these techniques along with regex Split. I'm also very aware that every technique can be improved. For example, one can do the Charindex without the WHILE loop (though the code is a bit opaque, and one needs to have an upper limit to the final table)
Best wishes,
Phil Factor Simple Talk
|
|
|
|
|
SSChampion
        
Group: General Forum Members
Last Login: Today @ 9:09 PM
Points: 20,156,
Visits: 13,692
|
|
Florian Reischl (4/13/2009)
Jeff Moden (4/13/2009) I'm thinking that the hey-day of the Tally table have finally been replaced by faster methods.Don't say that. There might be actually two business cases where the tally table looses but there are many more things to do with than only some string manipulations. Tally for ever! 
Thanks for the vote of confidence... Like you said, two business cases aren't necessarily a killer, but they do indicate serious potential can be had... to wit, I've got a whole lot of testing to do across a whole lot of scenarios. As Phil mentioned, that would also include some testing on things not having to do with a While Loop.
You've done a couple of very cool tricks in the While Loop that most have not... like calculating the CHARINDEX only once instead of twice like a good number of folks end up doing. Others have tried methods similar to yours but have failed because of the doubled CHARINDEX and a couple of other nuances that you've very cleverly avoided. It's brilliant code in its simplicity and my hat's off to you.
You should write an article about this, Flo... this is good stuff.
--Jeff Moden "RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".
First step towards the paradigm shift of writing Set Based code: Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."
For better, quicker answers, click on the following... http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following... http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
|
|
|
|
|
SSC Eights!
      
Group: General Forum Members
Last Login: Thursday, March 11, 2010 3:29 PM
Points: 859,
Visits: 845
|
|
|
|
|
|
SSCrazy
      
Group: General Forum Members
Last Login: Today @ 2:19 AM
Points: 2,022,
Visits: 3,594
|
|
Jeff Moden (4/13/2009)
Florian Reischl (4/13/2009)
Jeff Moden (4/13/2009) I'm thinking that the hey-day of the Tally table have finally been replaced by faster methods.Don't say that. There might be actually two business cases where the tally table looses but there are many more things to do with than only some string manipulations. Tally for ever!  Thanks for the vote of confidence... Like you said, two business cases aren't necessarily a killer, but they do indicate serious potential can be had... to wit, I've got a whole lot of testing to do across a whole lot of scenarios. As Phil mentioned, that would also include some testing on things not having to do with a While Loop. You've done a couple of very cool tricks in the While Loop that most have not... like calculating the CHARINDEX only once instead of twice like a good number of folks end up doing. Others have tried methods similar to yours but have failed because of the doubled CHARINDEX and a couple of other nuances that you've very cleverly avoided. It's brilliant code in its simplicity and my hat's off to you. You should write an article about this, Flo... this is good stuff.
My business case was definitely a special case depending on the length of the text. Sure in most other cases the set based solution will still be the faster and definitely guideline conform solution.
I did not think specially on the once called CHARINDEX function within the loop. I think the was because I'm an old C guy and always think about "every method call has its price" .
I will start next days with some more investigations about split functions and maybe some other good reasons for CLR integrating approaches. After this I will write an article about this with all test data.
Thanks a lot for all this praise!!! 
Flo
The more I learn, the more I know what I do not know How to Post Data/Code to get the best Help"Numbers" or "Tally" Table How to Post Performance Problems
|
|
|
|
|
SSCrazy
      
Group: General Forum Members
Last Login: Today @ 2:19 AM
Points: 2,022,
Visits: 3,594
|
|
Hi Bruce
Bruce W Cassidy (4/13/2009)
[font="Verdana"]Flo, I'll add my kudos as well: it's good to see some numbers showing how CLR rates in comparison with T-SQL. And I'll look forward to the article.  Thank you also!!!
BTW, in your while loop: WHILE EXISTS (SELECT TOP(1) 1 FROM @procs) BEGIN SELECT TOP(1) @name = name, @definition = definition FROM @procs
[...] END
Did you try this approach: while (1=1) begin select top (1) @name = name, @definition = definition from @procs (order by?) if @@rowcount = 0 break;
[...] end; -- while
In the times where I resort to a while loop (er, trying to do it less, honest Jeff!) I often find myself using that format, because otherwise I end up writing more code and making SQL Server do more work. So taking out the additional select may make your while loop a little faster. Just curious as to whether it changes the numbers a little. P.S. Phil Factor's blog post makes interesting reading, particularly for this little gem: Jeff Moden
P.S. Heh... Always say "Never" to a RBAR While loop... if the While loop beats the set based method, there's something else wrong.   Thanks also for this performance improvement!
Anyway, it's very confusing to discuss all this optimizations for loops with all of you... 
Greets Flo
The more I learn, the more I know what I do not know How to Post Data/Code to get the best Help"Numbers" or "Tally" Table How to Post Performance Problems
|
|
|
|
|
SSCrazy Eights
        
Group: General Forum Members
Last Login: Yesterday @ 11:21 AM
Points: 9,104,
Visits: 8,528
|
|
Jeff may not remember it but he helped me out with some tally-based line-splitting code over the summer and there were several unusual results to come out of it:
1) For splitting the lines in stored procedure definitions, there is a subtle bug that exists in almost every line-splitting routine, including the ones so far in this thread: there is actually two different kinds of line-breaks incorporated in the system stored procedures: CR-LF, and LF alone. To correctly split them, you must take both into account (which is not trivial).
2) The fastest tally-method for line-splitting turned out to be, neither an in-line join (as is being tested here), nor even an inline table-valued Function, but suprisingly a multi-line TVF.
3) The final observation that I had at that time was that the loop-based routines start to catch up with the tally-based methods if the average distance between each separator gets large enough, and for line-splitting system procedures, they tend to be quite large of course (as opposed to comma-separated strings, for instance). This is because the loop-based methods can skip the ahead of "dead" characters in-between with CHARINDEX(). Because Florian figured out how to call CHARINDEX only once per loop, that lowered where that threshold is.
So anyway, I wanted to test my line splitting functions against these also, so here are the results:
Start tally solution Milliseconds: 6546 | Lines: 27106 Start Tally-function 3 solution (correct line-splitting) Milliseconds: 3830 | Lines: 28519 Start Tally-function 4 solution (has same bug as the rest) Milliseconds: 3653 | Lines: 28119 Start cursor solution Milliseconds: 2376 | Lines: 28119
these results are pretty stable on my system.
I'll post my code shortly...
-- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung Proactive Performance Solutions, Inc. "Performance is our middle name."
|
|
|
|
|
SSCarpal Tunnel
       
Group: General Forum Members
Last Login: Today @ 10:50 AM
Points: 4,567,
Visits: 3,037
|
|
RBarryYoung (4/13/2009) 1) For splitting the lines in stored procedure definitions, there is a subtle bug that exists in almost every line-splitting routine, including the ones so far in this thread: there is actually two different kinds of line-breaks incorporated in the system stored procedures: CR-LF, and LF alone. To correctly split them, you must take both into account (which is not trivial).
It's pretty trivial in the CLR case 
RBarryYoung (4/13/2009)
3) The final observation that I had at that time was that the loop-based routines start to catch up with the tally-based methods if the average distance between each separator gets large enough, and for line-splitting system procedures, they tend to be quite large of course (as opposed to comma-separated strings, for instance). This is because the loop-based methods can skip the ahead of "dead" characters in-between with CHARINDEX(). Because Florian figured out how to call CHARINDEX only once per loop, that lowered where that threshold is.
*cough*
Paul White
The tally or numbers table solution is, to an extent, a brute-force approach. In the case of many long strings where the frequency of the searched-for string is low, a solution based on a loop/CHARINDEX approach should out-perform the tally approach.
RBarryYoung (4/13/2009)
So anyway, I wanted to test my line splitting functions against these also, so here are the results:
Flo
Start clr xml solution Milliseconds: 790 | Lines: 28545 Start clr tvf (PW) solution Milliseconds: 1083 | Lines: 25661
Still slower than both CLR implementations  I'm still wondering why the CLR TVF is so much slower than the XML. Note that the line counts are different simply because the TVF omits empty results:
string[] items = toSplit.Split(new string[] { delimeter }, StringSplitOptions.RemoveEmptyEntries);)
|
|
|
|