Performance issue with tally solution

  • Phil,

    I hear ya - but it's difficult to just take someone's word for it when such differing results have already been posted with detailed scripts, including test data.

    I get that all methods are cool and funky with very small strings. Photographing the string and cutting it up with scissors performs well for arbitrarily small strings ;c)

    However it may seem, I'm not a particular CLR fan (as I keep posting, it seems), but it is well suited to certain tasks. I agree that if you have a Pentium Pro at 200MHz with 128MB RAM, CLR might not be the way to go. I also think that it seems to perform well on the sort of data size we are all likely to encounter fairly frequently. The odd requirement to split strings from entire books does come up occasionally, but the frequency is such that (a) a custom solution is needed; (b) optimizations may be possible given knowledge of that data or source; and (c) you have to ask whether it would be better done at the source or by pre-processing rather than on a SQL Server.

    It's amazing how many times people will accept that an app written in .NET would be ideal for a task, and yet still insist that it *can be done* in T-SQL. I have nothing against T-SQL; it's what I use every day, and it's by far my preferred language.

    It will be fascinating to see where this twisty thread turns next...

    Paul

  • Phil Factor (4/21/2009)


    My Set-based 'quirky update' solution is half way down the page on page 6 of this thread. http://www.sqlservercentral.com/Forums/Topic695508-338-6.aspx

    Not for me it isn't! As far as I'm concerned, this thread has only just got to page 4! But then I use 50 posts/page.

    Try quoting the post number.

    Derek

  • http://www.sqlservercentral.com/Forums/FindPost696606.aspx

    Best wishes,
    Phil Factor

  • Paul White (4/21/2009)


    Phil Factor (4/21/2009)


    ...the difference in the results that we are getting is due to the way different servers behave with different resources and configurations. I do my testing on an old dog of a server so that performance problems are highlighted.

    It seems logical that "different machines with different resources and configurations" should produce different results, doesn't it? FYI I use a 2GHz (single core obviously) Pentium IV-m laptop of a fairly ordinary sort. I wouldn't be surprised if your 'old dog' compares quite well with it!

    Despite the different actual numbers, we usually find that the relative numbers track pretty well across different servers.

    However, in my experience there are three factors that are not well accounted for in most of the on-line performance tests/comparisons:

    1) the effect of parallelism

    2) the effect of different memory sizes

    3) the effect of different file-to-disk mappings

    Usually (2) does not come up because our tests, though large, will still fit in memory for a single user. (3) is mostly about the location of the ldf's and tempdb with respect to the base mdf. On most of our desktop/laptop systems these are all on the same physical disk; not so on most servers, which could make a huge difference. However, because most of our tests are SELECT-oriented, it hasn't affected testing too much so far (hard to tell, though).

    (1) is a bit more of a problem though, and entirely possible as a random factor here.

    In order to tell if these things might be affecting the tests, we usually include CPU & Logical IO stats along with the elapsed times. If these do not track well with the elapsed times, we can usually tell that one of these factors is skewing the results. The other thing that we often do is include the query plans, which are very helpful in determining if something odd is going on in the tests.

    The other thing that we usually account for well, but that is getting lost here, is the version of SQL Server being tested. As someone here observed, the CLR entry cost seems to be much improved on SQL 2008, which could definitely be affecting our comparisons and relative numbers.
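    A minimal harness of the sort described above might look like the following sketch; the table name is just a placeholder, not an object from any of the tests in this thread.

    -- Capture per-table logical IO, CPU/elapsed time, and the actual plan for a test query
    SET STATISTICS IO ON        -- scan counts and logical/physical reads per table
    SET STATISTICS TIME ON      -- CPU time and elapsed time
    SET STATISTICS XML ON       -- actual execution plan, returned as XML

    SELECT COUNT(*)             -- placeholder for the query under test
      FROM dbo.SomeTestTable

    SET STATISTICS XML OFF
    SET STATISTICS TIME OFF
    SET STATISTICS IO OFF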

    -- RBarryYoung, (302)375-0451  blog: MovingSQL.com, Twitter: @RBarryYoung
    Proactive Performance Solutions, Inc.
    "Performance is our middle name."

  • Phil Factor (4/21/2009)


    http://www.sqlservercentral.com/Forums/FindPost696606.aspx

    Thanks, Phil.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • RBarryYoung (4/21/2009)


    Despite the different actual numbers, we usually find that the relative numbers track pretty well across different servers.

    However, in my experience there are three factors that are not well accounted for in most of the on-line performance tests/comparisons:

    1) the effect of parallelism

    2) the effect of different memory sizes

    3) the effect of different file-to-disk mappings

    Agreed... Phil and I went through that on his Simple-Talk thread where we both split the Moby Dick novel. His While loop beat the Tally table method on his older machine and the Tally table method beat the While loop on my older machine. That's why I'm thankful to Flo for all the testing he's done... it takes some of the differences between machines out of the picture. Of course, some of those differences are also important to performance, so I'm glad to see others doing their own tests as well.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • What an absolutely amazing thread this is! Well done to all involved. I have learned a good bit, and have some new tools in my bag of tricks! 😎

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on Google's mail service

  • Edward Boyle (4/20/2009)


    My strings usually have less than 20 "tokens"

    Edward.... first, please understand that I'm absolutely NOT trying to be a wise guy here...

    I've found that many folks use the justification of "it will only be used for a limited number of rows" to justify all sorts of code. And, for your application, it may very well be not only limited, but actually guaranteed to be limited in row count.

    The problem occurs when someone sees the code (maybe even in your own company) and uses it in an environment where such limits are not guaranteed. That's where computational debt comes into play, and it can be a real killer. That's why folks are so deeply involved in this thread... to show some extreme but simple methods for avoiding that type of computational debt.

    Just so everyone knows, recursion usually has about the same speed as a well-written cursor or while loop. It would be interesting to see someone include that method in the current testing going on.
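    For illustration only, a recursive split of the kind mentioned above might look like the sketch below; the sample table and data are made up, and this is not code from anyone's test harness in this thread.

    -- A minimal recursive-CTE splitter sketch
    DECLARE @delimiter CHAR(1)
    SET @delimiter = ','

    DECLARE @Sample TABLE (RowNum INT, CSV VARCHAR(8000))
    INSERT INTO @Sample VALUES (1, 'Part-1,Part-2,Part-3')

    ;WITH cte AS
    (
        -- Anchor: find the first delimiter (one is appended so the last item terminates)
        SELECT RowNum,
               1 AS StartPos,
               CHARINDEX(@delimiter, CSV + @delimiter, 1) AS EndPos,
               CSV
          FROM @Sample
         UNION ALL
        -- Recurse: walk to the next delimiter, one element per level
        SELECT RowNum,
               EndPos + 1,
               CHARINDEX(@delimiter, CSV + @delimiter, EndPos + 1),
               CSV
          FROM cte
         WHERE EndPos <= LEN(CSV)
    )
    SELECT RowNum,
           SUBSTRING(CSV, StartPos, EndPos - StartPos) AS Item
      FROM cte
     ORDER BY RowNum, StartPos
    OPTION (MAXRECURSION 0);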

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Phil Factor (4/21/2009)


    http://www.sqlservercentral.com/Forums/FindPost696606.aspx

    Thanks

    Derek

  • Hi

    Sorry for the late answer. Too many meetings...

    So, as Paul already indicated, we have a new Tally solution with pretty strange characteristics... I already did some tests with Jeff. It seems to be up to three (or more) times faster than the traditional tally solution, but only in some cases, and I'm still trying to understand when and why.

    Initially I just tried to find a way to use the tally split without the required leading and trailing delimiters, to avoid some memory swapping. The new version not only handles this little issue but also (sometimes) produces a completely different execution plan containing a Worktable. When that Worktable shows up, performance improves by a large factor.

    Again, I don't really understand why and how it does what it does :crazy: . It would be great if somebody has any suggestion!

    Here is a test framework which is currently not part of my other tests because of its quite strange behavior...

    SET STATISTICS TIME OFF
    SET STATISTICS IO OFF
    GO
    USE tempdb
    GO

    IF (OBJECT_ID('dbo.Tally') IS NULL)
    BEGIN
       CREATE TABLE dbo.Tally (N INT NOT NULL, PRIMARY KEY CLUSTERED (N))
       INSERT INTO dbo.Tally
       SELECT TOP 11000
              ROW_NUMBER() OVER (ORDER BY c1.column_id)
         FROM master.sys.all_columns c1
        CROSS JOIN master.sys.all_columns c2
    END

    --===== Declare and preset a variable to hold the desired length of a CSV
    DECLARE @MaxLength INT
    SET @MaxLength = 8000

    --===== Create a CSV variable of a given length (Each part = 9 characters + comma = 10 characters)
    DECLARE @CSV VARCHAR(8000)
    SELECT @CSV = ISNULL(@CSV+',','')+'Part-'+CAST(t.N AS CHAR(4))
      FROM dbo.Tally t
     WHERE t.N <= @MaxLength/10-1 --(-1) for room for leading / trailing commas

    --===== Add leading and trailing commas
    SELECT @CSV = ','+@CSV+','

    --DROP TABLE dbo.JBMTest
    IF (OBJECT_ID('dbo.JBMTest') IS NULL)
    BEGIN
       --===== Create and populate a 1000 row test table.
       SELECT TOP 1000
              RowNum = IDENTITY(INT,1,1),
              CSV = @CSV
         INTO dbo.JBMTest
         FROM Master.sys.columns t1,
              Master.sys.columns t2 --Lack of join criteria makes this a CROSS-JOIN

       --===== Add a clustered Primary Key like any good table.
       --      Takes about 1 second to execute.
       ALTER TABLE dbo.JBMTest
         ADD PRIMARY KEY CLUSTERED (RowNum)
    END
    GO

    IF (OBJECT_ID('dbo.Result') IS NOT NULL)
       DROP TABLE dbo.Result
    GO

    DECLARE @delimiter CHAR(1)
    SELECT @delimiter = ','

    PRINT '----=========================================================='
    PRINT '---- UNION ALL Tally'
    SET STATISTICS IO ON
    SET STATISTICS TIME ON

    SELECT s.RowNum, l.Item
      INTO dbo.Result
      FROM dbo.JBMTest s
     CROSS APPLY (
          -- Get the first (or even only) item
          SELECT
                 SUBSTRING(s.CSV, 1, ISNULL(NULLIF(CHARINDEX(@delimiter, s.CSV, 1) - 1, -1), LEN(s.CSV))) item,
                 1 Sorting
          UNION ALL
          -- Usual Tally split, extended to get the last item without needing a delimiter at the end
          SELECT TOP 100 PERCENT
                 SUBSTRING(s.CSV, t.N + 1, ISNULL(NULLIF(CHARINDEX(@delimiter, s.CSV, t.N + 1) - t.N - 1, -t.N - 1), LEN(s.CSV) - t.N)) item,
                 2 Sorting
            FROM dbo.Tally t
           WHERE t.N <= LEN(s.CSV)
             AND SUBSTRING(s.CSV, t.N, 1) = @delimiter
           ORDER BY Sorting, t.N
          ) l

    SET STATISTICS TIME OFF
    SET STATISTICS IO OFF
    GO

    IF (OBJECT_ID('dbo.Result') IS NOT NULL)
       DROP TABLE dbo.Result
    GO

    --But this doesn't speed up... must have something to do with the cross apply
    DECLARE @delimiter CHAR(1)
    SELECT @delimiter = ','

    PRINT '----=========================================================='
    PRINT '---- Traditional Tally solution'
    SET STATISTICS IO ON
    SET STATISTICS TIME ON

    SELECT s.RowNum, l.Item
      INTO dbo.Result
      FROM dbo.JBMTest s
     CROSS APPLY (
          --Traditional split
          SELECT TOP 100 PERCENT
                 SUBSTRING(s.CSV, t.N +1, CHARINDEX(@delimiter, s.CSV, t.N +1) -t.N -1) AS Item
            FROM dbo.Tally t
           WHERE t.N < LEN(s.CSV)
             AND SUBSTRING(s.CSV, t.N, 1) = @delimiter
           ORDER BY t.N
          ) l

    SET STATISTICS TIME OFF
    SET STATISTICS IO OFF

    My current IO/TIME results:

    ----==========================================================
    ---- UNION ALL Tally
    Table 'Worktable'. Scan count 1, logical reads 5608, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Tally'. Scan count 1, logical reads 15, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'JBMTest'. Scan count 1, logical reads 1004, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:
    CPU time = 733 ms, elapsed time = 730 ms.
    (801000 row(s) affected)

    ----==========================================================
    ---- Traditional Tally solution
    Table 'Tally'. Scan count 1000, logical reads 15000, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'JBMTest'. Scan count 1, logical reads 1004, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:
    CPU time = 2325 ms, elapsed time = 2356 ms.
    (799000 row(s) affected)

    The different row counts come from the fact that the new routine includes the first and the last "," while the traditional one doesn't. That should be unimportant for now.

    Edited: Corrected PRINT message in script

    Greets

    Flo

  • Bruce W Cassidy (4/19/2009)


    Florian Reischl (4/19/2009)


    The problem is the string builder. It takes too much overhead inside.

    Only if you use code to generate the lookup table. 😀 Go on, list all 256 values!

    Hi Bruce

    Sorry, missed your response...

    Nope, the lookup table is only created once. If you have a look at my code, "_hexLookup256" is marked as "static readonly", and since the constructor I used is static, the lookup table is built only once. 😉

    Greets

    Flo

  • Jeff Moden (4/19/2009)


    Just curious.... lots of folks call what's stored in VarBinary, well... Binary, and it's actually displayed as hex. So, pardon my ignorance of which you speak, but could you display an example input and output of what you guys are talking about for the binary-to-hex conversion function? Thanks.

    Jeff, I think I have to apologize... I underrated the tally table! Yes, the CLR is faster, but the tally solution I just tried (and there may be better ones) is also very fast! Until now I had only tried the Microsoft "master.sys.fn_varbintohexstr", which is quite slow.
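    As a concrete example of the input and output Jeff asked about, using the built-in function mentioned above purely for illustration:

    DECLARE @bin VARBINARY(4)
    SET @bin = 0x0A1B2C3D

    SELECT master.sys.fn_varbintohexstr(@bin)   -- returns the character string '0x0a1b2c3d'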

    SET NOCOUNT ON

    --DROP TABLE #BinData
    IF (OBJECT_ID('tempdb..#BinData') IS NULL)
    BEGIN
       CREATE TABLE #BinData (Id INT NOT NULL IDENTITY, BinData VARBINARY(MAX), HexString VARCHAR(MAX))

       DECLARE @b VARBINARY(256)
       SELECT @b = 0x000102030405060708090A0B0C0D0E0F101112131415161718191A1B1C1D1E1F202122232425262728292A2B2C2D2E2F303132333435363738393A3B3C3D3E3F
       SELECT @b = @b + 0x404142434445464748494A4B4C4D4E4F505152535455565758595A5B5C5D5E5F606162636465666768696A6B6C6D6E6F707172737475767778797A7B7C7D7E7F
       SELECT @b = @b + 0x808182838485868788898A8B8C8D8E8F909192939495969798999A9B9C9D9E9FA0A1A2A3A4A5A6A7A8A9AAABACADAEAFB0B1B2B3B4B5B6B7B8B9BABBBCBDBEBF
       SELECT @b = @b + 0xC0C1C2C3C4C5C6C7C8C9CACBCCCDCECFD0D1D2D3D4D5D6D7D8D9DADBDCDDDEDFE0E1E2E3E4E5E6E7E8E9EAEBECEDEEEFF0F1F2F3F4F5F6F7F8F9FAFBFCFDFEFF

       INSERT INTO #BinData (BinData)
       SELECT TOP(100) @b
         FROM dbo.Tally

       WHILE (1000000 > (SELECT TOP(1) DATALENGTH(BinData) FROM #BinData))
          UPDATE #BinData SET BinData = BinData + BinData
    END

    -- CLR solution
    UPDATE #BinData SET HexString = NULL

    PRINT '--================================================='
    PRINT '-- CLR'
    SET STATISTICS TIME ON
    UPDATE #BinData SET HexString = dbo.ufn_clr_varbintohexstr(BinData)
    SET STATISTICS TIME OFF

    -- Tally solution
    UPDATE #BinData SET HexString = NULL

    -- Lookup table
    DECLARE @HexString VARCHAR(550)
    SELECT @HexString = '000102030405060708090A0B0C0D0E0F101112131415161718191A1B1C1D1E1F202122232425262728292A2B2C2D2E2F303132333435363738393A3B3C3D3E3F' +
                        '404142434445464748494A4B4C4D4E4F505152535455565758595A5B5C5D5E5F606162636465666768696A6B6C6D6E6F707172737475767778797A7B7C7D7E7F' +
                        '808182838485868788898A8B8C8D8E8F909192939495969798999A9B9C9D9E9FA0A1A2A3A4A5A6A7A8A9AAABACADAEAFB0B1B2B3B4B5B6B7B8B9BABBBCBDBEBF' +
                        'C0C1C2C3C4C5C6C7C8C9CACBCCCDCECFD0D1D2D3D4D5D6D7D8D9DADBDCDDDEDFE0E1E2E3E4E5E6E7E8E9EAEBECEDEEEFF0F1F2F3F4F5F6F7F8F9FAFBFCFDFEFF'

    PRINT '--================================================='
    PRINT '-- Tally'
    SET STATISTICS TIME ON
    UPDATE t1 SET
           HexString = '0x' + (
                 SELECT SUBSTRING(@HexString, CONVERT(TINYINT, SUBSTRING(t1.BinData, N, 1)) * 2 + 1, 2)
                   FROM dbo.Tally t2
                  WHERE N <= DATALENGTH(t1.BinData)
                  ORDER BY N
                    FOR XML PATH('')
                 )
      FROM #BinData t1
    SET STATISTICS TIME OFF

    Performance:

    --=================================================
    -- CLR
    SQL Server Execution Times:
    CPU time = 9235 ms, elapsed time = 9980 ms.

    --=================================================
    -- Tally
    SQL Server Execution Times:
    CPU time = 11513 ms, elapsed time = 11662 ms.

    Edited: Be careful with this test on lower-performing systems! It creates a 100 MB table and converts 200 MB!

    Greets

    Flo

  • Morning Flo!

    On first look, the performance improvement is down to the QO choosing a much more sensible plan when a VARCHAR is used, rather than a MAX datatype. This is all too familiar of course - the QO often carps out when LOBs turn up. I think Lynn and I both mentioned that about 4,000 posts back.

    By the way, the attempt at intermediate materialisation (the TOP + ORDER BY) doesn't work - I tried to get the QO to do this in some of my attempts too.

    There is much more to say, but this is just a quick first post.

    The big difference in the plans seems to be this filter:

    {moved to attachment Filter.txt}

    That executes a substring 799,000 times!

    On my machine (SQL2K5 Dev dual-core 9.0.4211 SP3+)

    Good plan:

    {moved to attachment GoodPlan.txt}

    Bad plan:

    {moved to attachment BadPlan.txt}

    Copying and pasting the text into an SSMS window seems to work quite well.

    Cheers,

    Paul

    edit: moved the vast text into attachments (apologies)

  • Paul White (4/21/2009)


    Morning Flo!

    As usual Paul is up - shift changeover - time for bed (12:42 AM)...

    That remains a great pity!

    On first look, the performance improvement is down to the QO choosing a much more sensible plan when a VARCHAR is used, rather than a MAX datatype. This is all too familiar of course - the QO often carps out when LOBs turn up. I think Lynn and I both mentioned that about 4,000 posts back.

    I remember your posts. I played with the VARCHAR sizes. If I change the type of @CSV to VARCHAR(MAX) and change @MaxLength to 8010, the new tally function takes 1.5 seconds on my machine while the traditional one takes 45 seconds...
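    For anyone re-running the framework above, the change described here amounts to roughly the following sketch; dbo.JBMTest has to be dropped first so the SELECT INTO recreates its CSV column with the MAX type.

    -- Variant of the declarations in the earlier framework, switching the CSV data to VARCHAR(MAX)
    DECLARE @MaxLength INT
    SET @MaxLength = 8010            -- was 8000

    DECLARE @CSV VARCHAR(MAX)        -- was VARCHAR(8000)
    SELECT @CSV = ISNULL(@CSV+',','')+'Part-'+CAST(t.N AS CHAR(4))
      FROM dbo.Tally t
     WHERE t.N <= @MaxLength/10-1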

    By the way, the attempt at intermediate materialisation (the TOP + ORDER BY) doesn't work - I tried to get the QO to do this in some of my attempts too.

    I guess this is a relic of my intention to get the results sorted and can be removed. The TOP was only there to be able to specify an ORDER BY.

    The big difference in the plans seems to be this filter:

    {moved to attachment Filter.txt}

    That executes a substring 799,000 times!

    Good plan:

    {moved to attachment GoodPlan.txt}

    Bad plan:

    {moved to attachment BadPlan.txt}

    I just don't understand why this difference happens... :unsure:

    I'm no execution plan pro. Remember, I'm a developer, and if something is too slow the hardware is too small. 😀

    I just attached my execution plan as text and sqlplan (zipped). They look a bit different because I currently only have SQL2K8.

    On my machine (SQL2K5 Dev dual-core 9.0.4211 SP3+)

    I don't know if this helps:

    Quad Core 2.4 GHz / 8 GB RAM / Windows Server 2008 x64

    Microsoft SQL Server 2008 (SP1) - 10.0.2531.0 (X64)

    Mar 29 2009 10:11:52

    Copyright (c) 1988-2008 Microsoft Corporation

    Developer Edition (64-bit) on Windows NT 6.0 (Build 6001: Service Pack 1)

    Thank you very much for all your help!

    Flo

  • Paul White (4/21/2009)


    Morning Flo!

    On first look, the performance improvement is down to the QO choosing a much more sensible plan when a VARCHAR is used, rather than a MAX datatype. This is all too familiar of course - the QO often carps out when LOBs turn up. I think Lynn and I both mentioned that about 4,000 posts back.

    By the way, the attempt at intermediate materialisation (the TOP + ORDER BY) doesn't work - I tried to get the QO to do this in some of my attempts too.

    There is much more to say, but this is just a quick first post.

    The big difference in the plans seems to be this filter:

    {moved to attachment Filter.txt}

    That executes a substring 799,000 times!

    On my machine (SQL2K5 Dev dual-core 9.0.4211 SP3+)

    Good plan:

    {moved to attachment GoodPlan.txt}

    Bad plan:

    {moved to attachment BadPlan.txt}

    Copying and pasting the text into an SSMS window seems to work quite well.

    Cheers,

    Paul

    edit: moved the vast text into attachments (apologies)

    It's not the presence of the TOP or the ORDER BY; it's the presence of the UNION ALL. In the testing I did with Flo, I actually created a situation where the first SELECT in the UNION ALL returned 0 rows, and it knocked an 800-element split on 1000 rows of a VARCHAR(MAX) column down from 44 seconds to only 6. It somehow forces the optimizer to, as Flo says, use a worktable that wouldn't normally be utilized, to keep from building 799,000 rows using the Tally table. Instead, it only reads 800 rows from the Tally table and spawns the rest of the rows...
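    The shape being described, with the first branch of the UNION ALL deliberately returning no rows, is roughly the sketch below; it borrows the dbo.JBMTest/dbo.Tally objects from Flo's framework and is not the exact code used in that testing.

    DECLARE @delimiter CHAR(1)
    SELECT @delimiter = ','

    SELECT s.RowNum, l.Item
      FROM dbo.JBMTest s
     CROSS APPLY (
          -- This branch intentionally returns 0 rows; its only job is to
          -- introduce the UNION ALL that changes the shape of the plan.
          SELECT CAST(NULL AS VARCHAR(8000)) AS Item
           WHERE 1 = 0
          UNION ALL
          -- Ordinary Tally-based split (assumes leading and trailing delimiters in the data)
          SELECT SUBSTRING(s.CSV, t.N + 1, CHARINDEX(@delimiter, s.CSV, t.N + 1) - t.N - 1)
            FROM dbo.Tally t
           WHERE t.N < LEN(s.CSV)
             AND SUBSTRING(s.CSV, t.N, 1) = @delimiter
          ) l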

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
