• Hey Paul!

    1. Using table variables (no statistics are available and parallelism cannot be used)

    That was only for the sample, to keep everything self-contained. I have a real tally table in my database. The performance is better with the real table, but still worse than the loop.

    2. Using large object types (MAX)

    3. Using Unicode

    The source data are NVARCHAR(MAX), so both are required...

    4. @crlf is defined as CHAR rather than NCHAR (implicit conversion)

    Absolutely correct, thank you! I've just corrected it.

    Nevertheless, all that is rather beside the point here.

    The tally or numbers table solution is, to an extent, a brute-force approach. In the case of many long strings where the frequency of the searched-for string is low, a solution based on a loop/CHARINDEX approach should outperform the tally approach.

    Searching a 1000-row table containing strings of 500K characters for example:

    The tally method will execute a SUBSTRING 1000 * 500K times, trying to find the search string.

    A method based on CHARINDEX will execute CHARINDEX only Σ(Ni + 1) times, where Ni is the number of occurrences of the search string in row i.
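    To make the Σ(Ni + 1) count concrete, here is a minimal sketch of such a CHARINDEX-based loop. The variable names (@input, @delim, @items) are my own for illustration, not code from the thread; it assumes a single-character delimiter:

    ```sql
    -- Hedged sketch: loop-based split using CHARINDEX.
    -- Each iteration performs one CHARINDEX probe; for a string with
    -- N occurrences of the delimiter, that is N hits plus one final
    -- miss, i.e. N + 1 probes per row.
    DECLARE @input NVARCHAR(MAX) = N'a,b,c';
    DECLARE @delim NCHAR(1)      = N',';

    DECLARE @pos  INT = 1;
    DECLARE @next INT;

    DECLARE @items TABLE (item NVARCHAR(MAX));

    WHILE 1 = 1
    BEGIN
        SET @next = CHARINDEX(@delim, @input, @pos);
        IF @next = 0
        BEGIN
            -- Final probe found nothing: emit the trailing fragment.
            INSERT @items (item)
            VALUES (SUBSTRING(@input, @pos, LEN(@input) - @pos + 1));
            BREAK;
        END;
        INSERT @items (item)
        VALUES (SUBSTRING(@input, @pos, @next - @pos));
        SET @pos = @next + 1;
    END;

    SELECT item FROM @items;
    ```

    The key point is that the work scales with the number of delimiter occurrences, not with the length of the string, which is why it wins on long strings with few matches.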

    That matches my thinking. The tally split function seems really handy and fast for sets of strings which are not too large. For really large strings a simple character cursor seems to be the better solution. I still think a split function is one of the best candidates for a CLR function 😉

    Greets

    Flo