Tally OH! An Improved SQL 8K “CSV Splitter” Function

  • Peter:

    Here is my tally table generation code. The important elements are likely the DBREINDEX and UPDATE STATISTICS; while you "shouldn't" need them, I've found a very small difference by using them... and this thread is all about finding very small differences.

    Much of the other silliness (temp tally tables just to fill permanent ones) you see is to get it to work on SQL Server 2000 boxes with large values of N, to handle the VARCHAR(MAX) splitters and other larger work.

    IF EXISTS (SELECT * FROM tempdb.dbo.sysobjects WHERE id = object_id(N'[tempdb].[dbo].[#TallyTempBase]'))

    DROP TABLE [dbo].[#TallyTempBase]

    CREATE TABLE #TallyTempBase

    (N bit NOT NULL

    );

    INSERT INTO #TallyTempBase

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

    UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

    UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

    UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

    IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = object_id(N'[dbo].[Tally]') AND OBJECTPROPERTY(id, N'IsUserTable') = 1)

    DROP TABLE [dbo].[Tally]

    CREATE TABLE dbo.[Tally] (

    [N] [smallint] NOT NULL ,

    CONSTRAINT [PK_Tally_N] PRIMARY KEY CLUSTERED

    (

    [N]

    ) WITH FILLFACTOR = 100 ON [PRIMARY]

    ) ON [PRIMARY]

    IF EXISTS (SELECT * FROM tempdb.dbo.sysobjects WHERE id = object_id(N'[tempdb].[dbo].[#Tally]'))

    DROP TABLE [dbo].[#Tally]

    SELECT TOP 32767

    IDENTITY(SMALLINT,1,1) AS N

    INTO #Tally

    FROM #TallyTempBase t1 -- 16

    CROSS JOIN #TallyTempBase t2 -- 256

    CROSS JOIN #TallyTempBase t3 -- 4096

    CROSS JOIN #TallyTempBase t4 -- 65536

    INSERT INTO dbo.tally

    SELECT TOP 32767 N

    FROM #Tally

    ORDER BY N

    DROP TABLE #Tally

    DBCC DBREINDEX('dbo.tally')

    UPDATE STATISTICS tally WITH FULLSCAN, NORECOMPUTE

    IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = object_id(N'[dbo].[Tally0Based]') AND OBJECTPROPERTY(id, N'IsUserTable') = 1)

    DROP TABLE [dbo].[Tally0Based]

    CREATE TABLE dbo.[Tally0Based] (

    [N] [smallint] NOT NULL ,

    CONSTRAINT [PK_Tally0Based_N] PRIMARY KEY CLUSTERED

    (

    [N]

    ) WITH FILLFACTOR = 100 ON [PRIMARY]

    ) ON [PRIMARY]

    IF EXISTS (SELECT * FROM tempdb.dbo.sysobjects WHERE id = object_id(N'[tempdb].[dbo].[#Tally0Based]'))

    DROP TABLE [dbo].[#Tally0Based]

    SELECT TOP 32768

    IDENTITY(SMALLINT,0,1) AS N

    INTO #Tally0Based

    FROM #TallyTempBase t1 -- 16

    CROSS JOIN #TallyTempBase t2 -- 256

    CROSS JOIN #TallyTempBase t3 -- 4096

    CROSS JOIN #TallyTempBase t4 -- 65536

    INSERT INTO dbo.Tally0Based

    SELECT TOP 32768 N

    FROM #Tally0Based

    ORDER BY N

    DROP TABLE #Tally0Based

    DBCC DBREINDEX('dbo.Tally0Based')

    UPDATE STATISTICS Tally0Based WITH FULLSCAN, NORECOMPUTE

    P.S. Don't be surprised by the optimizer; it's entirely likely that some query hints are required to get optimal performance; from 2000 to 2008 R2 I've seen some mind-bogglingly bad choices from the optimizer; real performance tuning is now, and has always been, about benchmarking reality... because the theory doesn't always hold true in practice, even (though rarely) at the assembly code level (Pentium floating point bug, anyone?).

  • Jeff Moden (5/7/2011)


    ...

    Does anyone else have any other suggestions for a speed enhancement before I do that?

    Nadrek and Peter. Thanks a million for taking the time to tweek the code. Wayne, thanks for testing so many people's suggestions and posting the results in graph form. You've just got to love to collaboration that occurred in this thread. Well done, folks! 🙂

    You're quite welcome!

    For the temporary tally table variant, I would also suggest something other than a power of 10 multi-CTE; 20^3 is 8000, and 21^3 is just over 9000. They're not many more UNIONs, slightly smaller overall size, and one fewer join in the nested CTEs. Also try 91^2 (8281), with two fewer CTE's.

    Also, make certain it's using a smallint, not an int; no reason to spend 4 bytes per number when it doesn't get over a value of 32767.

  • Nadrek (5/7/2011)


    Jeff Moden (5/7/2011)


    You're quite welcome!

    For the temporary tally table variant, I would also suggest something other than a power of 10 multi-CTE; 20^3 is 8000, and 21^3 is just over 9000. They're not many more UNIONs, slightly smaller overall size, and one fewer join in the nested CTEs. Also try 91^2 (8281), with two fewer CTE's.

    Also, make certain it's using a smallint, not an int; no reason to spend 4 bytes per number when it doesn't get over a value of 32767.

    I tried that too, but since the tally driving tables (E1 to E4) return a constant it does not seem to matter one bit between returning a bit or an int. None of these values are used and the only one that is used, is in the cteTally, which is a generated number and thus likely optimized to be memory-less as well. There seem to be no data stored in memory, other then some counters. I would need to look inside the memory usage itself to make sure (can be important on a busy system), but would expect a speed difference which i did not see during my previous tests.

    I am certainly going to look a but further into it tho, as I did notice some implicit converts, sometimes even from int to bigint and then the next operation back to int again.

  • Ok... Peter and Nadrek... I need to know your real names for inclusion in the final documentation in the updated sprocs that I'm going to attach to the article. Either tell me here or send me an email. If you prefer for your full names not to be known, let me know that and I'll just include your SSC handles.

    But I need to know sooner than later so I can put this puppy to bed. Thanks folks. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • By the way... I've attached the final results which also contains more charts than you can shake a stick at.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (5/8/2011)


    Ok... Peter and Nadrek... I need to know your real names for inclusion in the final documentation in the updated sprocs that I'm going to attach to the article. Either tell me here or send me an email. If you prefer for your full names not to be known, let me know that and I'll just include your SSC handles.

    But I need to know sooner than later so I can put this puppy to bed. Thanks folks. 🙂

    Well my name is: Peter de Heer

  • Jeff Moden (5/8/2011)


    By the way... I've attached the final results which also contains more charts than you can shake a stick at.

    And wow, that Split method (green line) does some crazy stuff, but is overall clearly the winner.

    Is that the .NET CLR one or a new SQL based one I missed?

  • peter-757102 (5/9/2011)


    Jeff Moden (5/8/2011)


    By the way... I've attached the final results which also contains more charts than you can shake a stick at.

    And wow, that Split method (green line) does some crazy stuff, but is overall clearly the winner.

    Is that the .NET CLR one or a new SQL based one I missed?

    That's the CLR that Paul wrote for me. As you can tell from the first chart, it doesn't like stuff that doesn't need to be split and appears to be the only real spot where it has troubles. As you say, it's clearly the overall winner.

    Thank you for the full name, Peter. You and Nadrek did good things on this thread. The final script I attach will be the documented version of your DelimitedSplit8k_T1 (without the _T1 in the name). This function has been "helped" by a lot of good folks over time and I wanted to be sure to give you both some credit for your extreme and well though out participation as well as the great improvement you made through DelimitedSplit8k_T1.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff

    This article inspired me to write a TallyTable article for Access users/developers.

    And thanks to the other commenters for your patience and enlightening comments.

    Mark

  • About the little wiggly bits at the start of the green lines (the SQL CLR implementation): CLR code is compiled to an intermediate language to make it portable across different processor architectures. This code is further optimized and compiled to native machine code by the just-in time compiler when it runs on the machine for the first time. This conversion to optimal machine code is performed only once, so it survives SQL Server restarts for example.

    This is the reason that tests should run the newly-installed CLR function with a few dummy values before 'hot laps'. As different code paths are compiled to machine code on demand, the CLR implementation reaches its full potential.

    My guess is that Jeff forgot about this and the first runs (on very small numbers of elements and element lengths) were running CLR code that was not yet fully compiled to native code. It really doesn't matter though - the test results are utterly conclusive.

    Also, bear in mind that the CLR code is obvious and basic, and written by an amateur in ten minutes. Compare that to the effort expended to try to get 10% better performance out of the TSQL solutions - a language which just is not well suited to these types of tasks.

    Consider also that the CLR function works with strings up to 2GB (MAX data types), Unicode, and is the only implementation to scale linearly with the number of elements; look at the composite graphs: the CLR lines almost overlap!

    I should probably also mention that CLR routines work exclusively with Unicode, so the CLR routine has the overhead of converting the VARCHAR input strings to Unicode, and the returned results (also Unicode) are twice as big, at two bytes per character. That means the CLR routine is writing twice as much data to disk in Jeff's tests, and converting to or from Unicode for every row of input and output!

    Paul

  • SQLkiwi

    Can people use ngen to compile the assembly and save the JIT overhead?

  • SQLkiwi (5/9/2011)


    About the little wiggly bits at the start of the green lines (the SQL CLR implementation): CLR code is compiled to an intermediate language to make it portable across different processor architectures. This code is further optimized and compiled to native machine code by the just-in time compiler when it runs on the machine for the first time. This conversion to optimal machine code is performed only once, so it survives SQL Server restarts for example.

    This is the reason that tests should run the newly-installed CLR function with a few dummy values before 'hot laps'. As different code paths are compiled to machine code on demand, the CLR implementation reaches its full potential.

    My guess is that Jeff forgot about this and the first runs (on very small numbers of elements and element lengths) were running CLR code that was not yet fully compiled to native code. It really doesn't matter though - the test results are utterly conclusive.

    Also, bear in mind that the CLR code is obvious and basic, and written by an amateur in ten minutes. Compare that to the effort expended to try to get 10% better performance out of the TSQL solutions - a language which just is not well suited to these types of tasks.

    Consider also that the CLR function works with strings up to 2GB (MAX data types), Unicode, and is the only implementation to scale linearly with the number of elements; look at the composite graphs: the CLR lines almost overlap!

    I should probably also mention that CLR routines work exclusively with Unicode, so the CLR routine has the overhead of converting the VARCHAR input strings to Unicode, and the returned results (also Unicode) are twice as big, at two bytes per character. That means the CLR routine is writing twice as much data to disk in Jeff's tests, and converting to or from Unicode for every row of input and output!

    Paul

    Yep... forgot about that. I'll try it a different way. Thanks, Paul.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • mark hutchinson (5/9/2011)


    Jeff

    This article inspired me to write a TallyTable article for Access users/developers.

    And thanks to the other commenters for your patience and enlightening comments.

    Mark

    Very cool. Hopefully, it'll be here on SSC but, if it's not, let us know what the URL is... I'd love to have a read on it. Thanks. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff

    I just submitted it yesterday. It hasn't entered the editor review phase yet. I'd like to know what you think. I might be able to your incorporate suggestions during the editing process.

    http://www.experts-exchange.com/Microsoft/Development/MS_Access/A_5410-Creating-and-Using-a-Tally-Table-in-Access.html

  • Thanks, Mark. I'll have a look.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 211 through 225 (of 990 total)

You must be logged in to reply to this topic. Login to reply