SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


«««12345»»»

Performance issue with tally solution Expand / Collapse
Author
Message
Posted Monday, April 13, 2009 7:28 AM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Today @ 2:19 AM
Points: 2,022, Visits: 3,594
aktikt (4/13/2009)
Say it ain't so. This is the first I've heard a set based solution losing to RBAR.

I don't have the desire to check right now, but what about using a CTE instead for the tally table?

aktikt


Hi aktikt

Yes it is . .Net is more optimized for things like string operations and formatting in most other cases the set based solutions should be the better way.

Which kind of CTE approach do you mean? A recursive CTE will miss because there are possibly more than 100 lines in the text. I would be glad if you have any other approach!

Greets
Flo



The more I learn, the more I know what I do not know

How to Post Data/Code to get the best Help"Numbers" or "Tally" Table
How to Post Performance Problems
Post #695776
Posted Monday, April 13, 2009 7:31 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 9:09 PM
Points: 20,156, Visits: 13,692
aktikt (4/13/2009)
Say it ain't so. This is the first I've heard a set based solution losing to RBAR.

I don't have the desire to check right now, but what about using a CTE instead for the tally table?

aktikt


Heh... actually, this is the second time. The While Loop beats the Tally table when "STUFF" is used to clean a string.

The CTE method was one of the things I quickly checked... it still looses to the While Loop. The CLR smokes all of the methods. Glad someone finally tested that.

I've got a lot of testing to do between VARCHAR(MAX) and VARCHAR(8000), but I don't know why anyone would write something that couldn't handle the MAX anymore. I'm thinking that the hey-day of the Tally table have finally been replaced by faster methods.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

For better, quicker answers, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
Post #695780
Posted Monday, April 13, 2009 7:37 AM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Today @ 2:19 AM
Points: 2,022, Visits: 3,594
Jeff Moden (4/13/2009)
I'm thinking that the hey-day of the Tally table have finally been replaced by faster methods.


Don't say that. There might be actually two business cases where the tally table looses but there are many more things to do with than only some string manipulations.

Tally for ever!



The more I learn, the more I know what I do not know

How to Post Data/Code to get the best Help"Numbers" or "Tally" Table
How to Post Performance Problems
Post #695791
Posted Monday, April 13, 2009 10:46 AM


Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Yesterday @ 1:10 PM
Points: 388, Visits: 1,520
This reminds me a bit of the blog I did called 'Never say never to the 'While' loop.

A Charindex and WHILE loop will always outperform a 'helper table' in certain circumstances. However, these cases are usually fairly esoteric string searches. They don't invalidate the value of the use of helper tables for the common task.

Is there some test data to try this on? (I couldn't see any) I've been experimenting quite a bit with the NET Regex class for this sort of task, and would love to benchmark these techniques along with regex Split. I'm also very aware that every technique can be improved. For example, one can do the Charindex without the WHILE loop (though the code is a bit opaque, and one needs to have an upper limit to the final table)





Best wishes,

Phil Factor
Simple Talk
Post #695945
Posted Monday, April 13, 2009 11:09 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 9:09 PM
Points: 20,156, Visits: 13,692
Florian Reischl (4/13/2009)
Jeff Moden (4/13/2009)
I'm thinking that the hey-day of the Tally table have finally been replaced by faster methods.


Don't say that. There might be actually two business cases where the tally table looses but there are many more things to do with than only some string manipulations.

Tally for ever!


Thanks for the vote of confidence... Like you said, two business cases aren't necessarily a killer, but they do indicate serious potential can be had... to wit, I've got a whole lot of testing to do across a whole lot of scenarios. As Phil mentioned, that would also include some testing on things not having to do with a While Loop.

You've done a couple of very cool tricks in the While Loop that most have not... like calculating the CHARINDEX only once instead of twice like a good number of folks end up doing. Others have tried methods similar to yours but have failed because of the doubled CHARINDEX and a couple of other nuances that you've very cleverly avoided. It's brilliant code in its simplicity and my hat's off to you.

You should write an article about this, Flo... this is good stuff.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

For better, quicker answers, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
Post #695963
Posted Monday, April 13, 2009 2:03 PM


SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Thursday, March 11, 2010 3:29 PM
Points: 859, Visits: 845
Flo, I'll add my kudos as well: it's good to see some numbers showing how CLR rates in comparison with T-SQL. And I'll look forward to the article.

BTW, in your while loop:
WHILE EXISTS (SELECT TOP(1) 1 FROM @procs)
BEGIN
SELECT TOP(1) @name = name, @definition = definition FROM @procs

[...]

END

Did you try this approach:
while (1=1) begin
select top (1) @name = name, @definition = definition from @procs (order by?)
if @@rowcount = 0 break;

[...]
end; -- while

In the times where I resort to a while loop (er, trying to do it less, honest Jeff!) I often find myself using that format, because otherwise I end up writing more code and making SQL Server do more work. So taking out the additional select may make your while loop a little faster. Just curious as to whether it changes the numbers a little.

P.S. Phil Factor's blog post makes interesting reading, particularly for this little gem:

Jeff Moden

P.S. Heh... Always say "Never" to a RBAR While loop... if the While loop beats the set based method, there's something else wrong.



Post #696110
Posted Monday, April 13, 2009 2:34 PM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Today @ 2:19 AM
Points: 2,022, Visits: 3,594
Jeff Moden (4/13/2009)
Florian Reischl (4/13/2009)
Jeff Moden (4/13/2009)
I'm thinking that the hey-day of the Tally table have finally been replaced by faster methods.


Don't say that. There might be actually two business cases where the tally table looses but there are many more things to do with than only some string manipulations.

Tally for ever!


Thanks for the vote of confidence... Like you said, two business cases aren't necessarily a killer, but they do indicate serious potential can be had... to wit, I've got a whole lot of testing to do across a whole lot of scenarios. As Phil mentioned, that would also include some testing on things not having to do with a While Loop.

You've done a couple of very cool tricks in the While Loop that most have not... like calculating the CHARINDEX only once instead of twice like a good number of folks end up doing. Others have tried methods similar to yours but have failed because of the doubled CHARINDEX and a couple of other nuances that you've very cleverly avoided. It's brilliant code in its simplicity and my hat's off to you.

You should write an article about this, Flo... this is good stuff.


My business case was definitely a special case depending on the length of the text. Sure in most other cases the set based solution will still be the faster and definitely guideline conform solution.

I did not think specially on the once called CHARINDEX function within the loop. I think the was because I'm an old C guy and always think about "every method call has its price" .

I will start next days with some more investigations about split functions and maybe some other good reasons for CLR integrating approaches. After this I will write an article about this with all test data.

Thanks a lot for all this praise!!!

Flo



The more I learn, the more I know what I do not know

How to Post Data/Code to get the best Help"Numbers" or "Tally" Table
How to Post Performance Problems
Post #696125
Posted Monday, April 13, 2009 2:39 PM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Today @ 2:19 AM
Points: 2,022, Visits: 3,594
Hi Bruce

Bruce W Cassidy (4/13/2009)
[font="Verdana"]Flo, I'll add my kudos as well: it's good to see some numbers showing how CLR rates in comparison with T-SQL. And I'll look forward to the article.

Thank you also!!!


BTW, in your while loop:
WHILE EXISTS (SELECT TOP(1) 1 FROM @procs)
BEGIN
SELECT TOP(1) @name = name, @definition = definition FROM @procs

[...]

END

Did you try this approach:
while (1=1) begin
select top (1) @name = name, @definition = definition from @procs (order by?)
if @@rowcount = 0 break;

[...]
end; -- while

In the times where I resort to a while loop (er, trying to do it less, honest Jeff!) I often find myself using that format, because otherwise I end up writing more code and making SQL Server do more work. So taking out the additional select may make your while loop a little faster. Just curious as to whether it changes the numbers a little.

P.S. Phil Factor's blog post makes interesting reading, particularly for this little gem:

Jeff Moden

P.S. Heh... Always say "Never" to a RBAR While loop... if the While loop beats the set based method, there's something else wrong.



Thanks also for this performance improvement!


Anyway, it's very confusing to discuss all this optimizations for loops with all of you...

Greets
Flo



The more I learn, the more I know what I do not know

How to Post Data/Code to get the best Help"Numbers" or "Tally" Table
How to Post Performance Problems
Post #696130
Posted Monday, April 13, 2009 3:13 PM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Yesterday @ 11:21 AM
Points: 9,104, Visits: 8,528
Jeff may not remember it but he helped me out with some tally-based line-splitting code over the summer and there were several unusual results to come out of it:

1) For splitting the lines in stored procedure definitions, there is a subtle bug that exists in almost every line-splitting routine, including the ones so far in this thread: there is actually two different kinds of line-breaks incorporated in the system stored procedures: CR-LF, and LF alone. To correctly split them, you must take both into account (which is not trivial).

2) The fastest tally-method for line-splitting turned out to be, neither an in-line join (as is being tested here), nor even an inline table-valued Function, but suprisingly a multi-line TVF.

3) The final observation that I had at that time was that the loop-based routines start to catch up with the tally-based methods if the average distance between each separator gets large enough, and for line-splitting system procedures, they tend to be quite large of course (as opposed to comma-separated strings, for instance). This is because the loop-based methods can skip the ahead of "dead" characters in-between with CHARINDEX(). Because Florian figured out how to call CHARINDEX only once per loop, that lowered where that threshold is.

So anyway, I wanted to test my line splitting functions against these also, so here are the results:
Start tally solution
Milliseconds: 6546 | Lines: 27106
Start Tally-function 3 solution (correct line-splitting)
Milliseconds: 3830 | Lines: 28519
Start Tally-function 4 solution (has same bug as the rest)
Milliseconds: 3653 | Lines: 28119
Start cursor solution
Milliseconds: 2376 | Lines: 28119

these results are pretty stable on my system.

I'll post my code shortly...


-- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung
Proactive Performance Solutions, Inc.
"Performance is our middle name."
Post #696150
Posted Monday, April 13, 2009 3:22 PM


SSCarpal Tunnel

SSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal Tunnel

Group: General Forum Members
Last Login: Today @ 10:50 AM
Points: 4,567, Visits: 3,037
RBarryYoung (4/13/2009)
1) For splitting the lines in stored procedure definitions, there is a subtle bug that exists in almost every line-splitting routine, including the ones so far in this thread: there is actually two different kinds of line-breaks incorporated in the system stored procedures: CR-LF, and LF alone. To correctly split them, you must take both into account (which is not trivial).

It's pretty trivial in the CLR case

RBarryYoung (4/13/2009)

3) The final observation that I had at that time was that the loop-based routines start to catch up with the tally-based methods if the average distance between each separator gets large enough, and for line-splitting system procedures, they tend to be quite large of course (as opposed to comma-separated strings, for instance). This is because the loop-based methods can skip the ahead of "dead" characters in-between with CHARINDEX(). Because Florian figured out how to call CHARINDEX only once per loop, that lowered where that threshold is.

*cough*

Paul White

The tally or numbers table solution is, to an extent, a brute-force approach. In the case of many long strings where the frequency of the searched-for string is low, a solution based on a loop/CHARINDEX approach should out-perform the tally approach.


RBarryYoung (4/13/2009)

So anyway, I wanted to test my line splitting functions against these also, so here are the results:


Flo

Start clr xml solution
Milliseconds: 790 | Lines: 28545
Start clr tvf (PW) solution
Milliseconds: 1083 | Lines: 25661

Still slower than both CLR implementations
I'm still wondering why the CLR TVF is so much slower than the XML. Note that the line counts are different simply because the TVF omits empty results:

string[] items = toSplit.Split(new string[] { delimeter }, StringSplitOptions.RemoveEmptyEntries);)
Post #696161
« Prev Topic | Next Topic »

«««12345»»»

Permissions Expand / Collapse