SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Performance issue with tally solution


Performance issue with tally solution

Author
Message
Florian Reischl
Florian Reischl
SSCrazy Eights
SSCrazy Eights (8.8K reputation)SSCrazy Eights (8.8K reputation)SSCrazy Eights (8.8K reputation)SSCrazy Eights (8.8K reputation)SSCrazy Eights (8.8K reputation)SSCrazy Eights (8.8K reputation)SSCrazy Eights (8.8K reputation)SSCrazy Eights (8.8K reputation)

Group: General Forum Members
Points: 8849 Visits: 3934
Bob Hovious (4/12/2009)
Flo: I'm not at my laptop right now, so I can't test your code. I'll do so at the earliest opportunity. Couple of questions though:

Why are you selecting the top (1000)?
WHERE N < len(s.definition)-1 should handle that.


ORDER BY N? Does it make any difference to the output?

Why table variable @tally?


Hi Bob

Thanks for your help!

Why are you selecting the top (1000)?
The TOP(1000) is just to able to use the ORDER BY to ensure that the returned lines will not be swapped.
ORDER BY N? Does it make any difference to the output?
I'm not sure if a sequential result is ensured if not ordered. It is important that no lines will be swapped.
Why table variable @tally?
The variable @tally was only to ensure everything is available for the sample. I also tried with a usual Tally table, same result.

Greets
Flo


The more I learn, the more I know what I do not know
Blog: Things about Software Architecture, .NET development and T-SQL

How to Post Data/Code to get the best Help How to Post Performance Problems
Paul White
Paul White
SSC-Dedicated
SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)

Group: General Forum Members
Points: 35558 Visits: 11361
Hey again Flo,

You might like to also try this with a CLR TVF.
I tried it on your sample and got:


Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#source'. Scan count 1, logical reads 77, physical reads 0, read-ahead reads 0, lob logical reads 3803, lob physical reads 0, lob read-ahead reads 2381.

SQL Server Execution Times:
CPU time = 1422 ms, elapsed time = 1940 ms.



A real C# dev will be able to improve my attempt at the code:

(the code tags have broken a couple of things 'AddAdd' and a cast, but hopefully you will get the gist...)

public partial class UserDefinedFunctions
{
[Microsoft.SqlServer.Server.SqlFunction(
FillRowMethodName = "FillRow",
DataAccess = DataAccessKind.None,
SystemDataAccess = SystemDataAccessKind.None,
Name = "tfn_Split",
TableDefinition = "row_id int, result nvarchar(max)")]
public static IEnumerable tfn_Split
(
[SqlFacet(IsNullable = false, MaxSize = -1, IsFixedLength = false)]
SqlString StringToSplit,
[SqlFacet(IsNullable = false, MaxSize = 4000, IsFixedLength = false)]
SqlString Delimiter
)
{
string toSplit = StringToSplit.Value;
string delimeter = Delimiter.Value;

string[] items = toSplit.Split(new string[] { delimeter }, StringSplitOptions.RemoveEmptyEntries);
Dictionary (items.Length);
for (int i = 0; i < items.Length; i++)
{
results.Add(i + 1, items[i]);
}
return results;
}

public static void FillRow(Object obj, out SqlInt32 row_id, out SqlString result)
{
KeyValuePair )obj;
row_id = new SqlInt32(kvp.Key);
result = new SqlString(kvp.Value);
}
};



The CLR version runs fastest on my laptop, has the simplest plan (obviously) and starts to return results immediately, which is nice.

For completeness, this is the SQL:


SELECT S.row_id, TVF.row_id, TVF.result
FROM #source AS S
CROSS
APPLY dbo.tfn_Split(S.definition, CHAR(13) + CHAR(10)) AS TVF
ORDER BY
S.row_id, TVF.row_id;



Cheers,

Paul



Paul White
SQLPerformance.com
SQLblog.com
@SQL_Kiwi
Florian Reischl
Florian Reischl
SSCrazy Eights
SSCrazy Eights (8.8K reputation)SSCrazy Eights (8.8K reputation)SSCrazy Eights (8.8K reputation)SSCrazy Eights (8.8K reputation)SSCrazy Eights (8.8K reputation)SSCrazy Eights (8.8K reputation)SSCrazy Eights (8.8K reputation)SSCrazy Eights (8.8K reputation)

Group: General Forum Members
Points: 8849 Visits: 3934
Hey Paul!

1. Using table variables (no statistics are available and parallelism cannot be used)

Was only for the sample to make everything available. I have a real tally table in my database. The performance is better with the real table but worse than the loop.

2. Using large object types (MAX)
3. Using Unicode

Source data are NVARCHAR(MAX) and it is required...

4. @crlf is defined as CHAR rather than NCHAR (implicit conversion)

Absolutely correct, thank you! I just corrected.

Nevertheless, all that is rather beside the point here.

The tally or numbers table solution is, to an extent, a brute-force approach. In the case of many long strings where the frequency of the searched-for string is low, a solution based on a loop/CHARINDEX approach should out-perform the tally approach.

Searching a 1000-row table containing strings of 500K characters for example:

The tally method will execute a SUBSTRING 1000 * 500K times, trying to find the search string.
A method based on CHARINDEX will execute Sigma[i=1, 1000] (Ni+1) times, where Ni is the number of occurrences of the search string in row i.

Same as I think. The tally split function seems really handy and fast for sets of strings wich are not too large. For really large strings a simple character cursor seems to be a better solution. I think a split function is still one of the best approaches for a CLR function ;-)

Greets
Flo


The more I learn, the more I know what I do not know
Blog: Things about Software Architecture, .NET development and T-SQL

How to Post Data/Code to get the best Help How to Post Performance Problems
Florian Reischl
Florian Reischl
SSCrazy Eights
SSCrazy Eights (8.8K reputation)SSCrazy Eights (8.8K reputation)SSCrazy Eights (8.8K reputation)SSCrazy Eights (8.8K reputation)SSCrazy Eights (8.8K reputation)SSCrazy Eights (8.8K reputation)SSCrazy Eights (8.8K reputation)SSCrazy Eights (8.8K reputation)

Group: General Forum Members
Points: 8849 Visits: 3934
Hi Jeff!

Wow! Nicely done, Flo... I haven't figured it out, yet. Looks like you may have figured out a way to beat the Tally table.

I'm not sure if "nice" should be the correct word for "The known slow approach seems to be the faster solution" :-D

Thanks also for the bug fix in the procedure!

Greets
Flo


The more I learn, the more I know what I do not know
Blog: Things about Software Architecture, .NET development and T-SQL

How to Post Data/Code to get the best Help How to Post Performance Problems
Florian Reischl
Florian Reischl
SSCrazy Eights
SSCrazy Eights (8.8K reputation)SSCrazy Eights (8.8K reputation)SSCrazy Eights (8.8K reputation)SSCrazy Eights (8.8K reputation)SSCrazy Eights (8.8K reputation)SSCrazy Eights (8.8K reputation)SSCrazy Eights (8.8K reputation)SSCrazy Eights (8.8K reputation)

Group: General Forum Members
Points: 8849 Visits: 3934
I know Jeff will shoot me for this... :-)

I did some other tests and finally found a high performant solution with CLR. Due to the fact that CLR functions do not support table valued functions I had to handle with XML data type.

Since now the function is not really optimized but it beats all other solutions!

CLR function
[code="csharp"]
using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using System.IO;
using System.Xml;

using Microsoft.SqlServer.Server;

public partial class UserDefinedFunctions
{
[Microsoft.SqlServer.Server.SqlFunction]
public static System.Data.SqlTypes.SqlXml ufn_clr_SplitLines(SqlChars textIn)
{
string text = new string(textIn.Value);
string[] lines = text.Split(new string[] { "\r" }, StringSplitOptions.None);

int capacity = text.Length * 2;
MemoryStream strm = new MemoryStream(capacity);
XmlWriter sw = XmlWriter.Create(strm);

sw.WriteStartDocument();
sw.WriteStartElement("Root");

for (int i = 0; i < lines.Length; i++)
{
string line = lines[i];

sw.WriteElementString("Item", line);
}

sw.WriteEndElement();
sw.WriteEndDocument();
sw.Flush();

strm.Position = 0;

SqlXml sqlXml = new SqlXml(strm);

// Put your code here
return sqlXml;
}
};
[/code]


Additional test block in TSQL

DELETE FROM @result -- Clean up

-- //////////////////////////////////////////////////////////
-- -> Clr xml solution
PRINT 'Start clr xml solution'
SELECT @now = GETDATE()

-- Split text into lines
INSERT INTO @result
SELECT l.line
FROM @source s
CROSS APPLY (SELECT
ISNULL(T.C.value('(./text())[1]', 'nvarchar(1000)'), '') line
FROM (SELECT dbo.ufn_clr_SplitLines(s.definition) col1) x
CROSS APPLY x.col1.nodes('/Root/Item') T(C)
) l

-- Results
SELECT @duration = DATEDIFF(MILLISECOND, @now, GETDATE())
SELECT @count = COUNT(*) FROM @result
PRINT 'Milliseconds: ' + CONVERT(VARCHAR(10), @duration) + ' | Lines: ' + CONVERT(VARCHAR(10), @count)
-- <- Clr xml solution
-- //////////////////////////////////////////////////////////



Results
Start clr xml solution
Milliseconds: 783 | Lines: 28545


@Jeff
The day to activate CLR on your servers is close :-D

Greets
Flo


The more I learn, the more I know what I do not know
Blog: Things about Software Architecture, .NET development and T-SQL

How to Post Data/Code to get the best Help How to Post Performance Problems
Florian Reischl
Florian Reischl
SSCrazy Eights
SSCrazy Eights (8.8K reputation)SSCrazy Eights (8.8K reputation)SSCrazy Eights (8.8K reputation)SSCrazy Eights (8.8K reputation)SSCrazy Eights (8.8K reputation)SSCrazy Eights (8.8K reputation)SSCrazy Eights (8.8K reputation)SSCrazy Eights (8.8K reputation)

Group: General Forum Members
Points: 8849 Visits: 3934
Hey Paul again!

Sorry! Didn't see your new post...

I did not know that CLR functions can return IEnumerable. Thanks for that!

I tested your function against mine:
Start clr xml solution
Milliseconds: 790 | Lines: 28545
Start clr tvf (PW) solution
Milliseconds: 1083 | Lines: 25661

I don't really understand why the way over XML is faster than a direct TVF. The only idea I have is that SQL Server calls the FillRowMethod over reflection, what would be really ugly implementation! I usually handle those kinds of dynamic method calls over dynamic classes and delegates. It's a little more source code but the performance improvement is up to factor 1000.

Greets
Flo

PS: Sorry also for the late answer. I had to restart my rooter about twenty times... Currently I'm doin' something like "stock posting" Hehe in a text editor until internet gives me a minute or two...


The more I learn, the more I know what I do not know
Blog: Things about Software Architecture, .NET development and T-SQL

How to Post Data/Code to get the best Help How to Post Performance Problems
Paul White
Paul White
SSC-Dedicated
SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)

Group: General Forum Members
Points: 35558 Visits: 11361
Flo,

More great stuff from you again...!

I hope your internet connection is fixed soon.

Off to bed now, but thanks for the XML CLR solution - neat (though I am naturally XML-sceptic).

I disliked writing the CLR TVF intensely so it's good to see an alternative (and by all measures apparent to me) better solution.

The documentation regarding IEnumerable and CLR TVFs in general is not fantastic in BOL - there must be something better out there on the interweb...


Cheers,

Paul



Paul White
SQLPerformance.com
SQLblog.com
@SQL_Kiwi
Florian Reischl
Florian Reischl
SSCrazy Eights
SSCrazy Eights (8.8K reputation)SSCrazy Eights (8.8K reputation)SSCrazy Eights (8.8K reputation)SSCrazy Eights (8.8K reputation)SSCrazy Eights (8.8K reputation)SSCrazy Eights (8.8K reputation)SSCrazy Eights (8.8K reputation)SSCrazy Eights (8.8K reputation)

Group: General Forum Members
Points: 8849 Visits: 3934
Paul White (4/13/2009)
Flo,

More great stuff from you again...!

I hope your internet connection is fixed soon.

Off to bed now, but thanks for the XML CLR solution - neat (though I am naturally XML-sceptic).

I disliked writing the CLR TVF intensely so it's good to see an alternative (and by all measures apparent to me) better solution.

The documentation regarding IEnumerable and CLR TVFs in general is not fantastic in BOL - there must be something better out there on the interweb...


Cheers,

Paul


Currently I'm online ;-)

I already saw some other solutions working with XML in SQL Server. It seems MS really optimized it for performance! Since now there is no fantastic documentation about CLR in BOL for anything...

Thanks again for all your help!

Good night :-)
Flo


The more I learn, the more I know what I do not know
Blog: Things about Software Architecture, .NET development and T-SQL

How to Post Data/Code to get the best Help How to Post Performance Problems
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (215K reputation)SSC Guru (215K reputation)SSC Guru (215K reputation)SSC Guru (215K reputation)SSC Guru (215K reputation)SSC Guru (215K reputation)SSC Guru (215K reputation)SSC Guru (215K reputation)

Group: General Forum Members
Points: 215008 Visits: 41979
Florian Reischl (4/13/2009)
I know Jeff will shoot me for this... :-)
@Jeff
The day to activate CLR on your servers is close :-D


Heh... you may be right. Might also have to install C# and learn it, too. I've been waiting for someone to do these types of tests. Thanks, Flo.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
aktikt
aktikt
SSC-Addicted
SSC-Addicted (421 reputation)SSC-Addicted (421 reputation)SSC-Addicted (421 reputation)SSC-Addicted (421 reputation)SSC-Addicted (421 reputation)SSC-Addicted (421 reputation)SSC-Addicted (421 reputation)SSC-Addicted (421 reputation)

Group: General Forum Members
Points: 421 Visits: 413
Say it ain't so. This is the first I've heard a set based solution losing to RBAR.

I don't have the desire to check right now, but what about using a CTE instead for the tally table?

aktikt
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search