Performance issue with tally solution

I did some testing myself using the test suite Jeff made a few posts back. Thanks, it ran well the first time... and on my machine (a pretty fast one) the timings indicate my I/O-less function is just as fast as the cross-join method Jeff made.

In fact, all methods are pretty close together; what it now comes down to is resource use. We should determine whether the table-less versions cap out on memory at some point. Again, I have plenty of memory, so it's hard for me to test, as I can't fully restrict the amount of memory SQL Server can use; other people are working on it too. And we need to put each method into a set of different solutions to see how this affects query plans and overall efficiency. I think this is where real differences will show up!
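For anyone who wants to try capping memory anyway, the standard 'max server memory' setting is one option (a sketch of mine, not part of the test harness; note it bounds the buffer pool, not every memory consumer, which is why a full restriction is tricky, and the 1024 MB value is illustrative):

-- cap the buffer pool for the duration of the test
exec sp_configure 'show advanced options', 1;
reconfigure;
exec sp_configure 'max server memory (MB)', 1024;
reconfigure;

-- ... run the splitter tests here ...

-- restore the default (effectively unlimited)
exec sp_configure 'max server memory (MB)', 2147483647;
reconfigure;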

Can someone else take a look at it as well?

And I'm also interested to see how this function performs on lesser equipment... calling Jeff here ;)...

Here is an update to my tally function; it now has slightly less overhead, achieved by eliminating two cross joins. I did this by moving from a 4-bit (16-row) constant table to a 6-bit (64-row) one.

The result is a slight speed bump when using large numbers, and less clutter when examining execution plans of code that uses the function!

-- Tally table function (24 bit, large enough for general purpose use)
--
create function dbo.tfnTally24bit( @max int ) returns table
as
return
(
    with nQ( N ) as
    (
        select 0 union all select 0 union all select 0 union all select 0 union all -- 4
        select 0 union all select 0 union all select 0 union all select 0 union all -- 8
        select 0 union all select 0 union all select 0 union all select 0 union all -- 12
        select 0 union all select 0 union all select 0 union all select 0 union all -- 16
        select 0 union all select 0 union all select 0 union all select 0 union all -- 20
        select 0 union all select 0 union all select 0 union all select 0 union all -- 24
        select 0 union all select 0 union all select 0 union all select 0 union all -- 28
        select 0 union all select 0 union all select 0 union all select 0 union all -- 32
        select 0 union all select 0 union all select 0 union all select 0 union all -- 36
        select 0 union all select 0 union all select 0 union all select 0 union all -- 40
        select 0 union all select 0 union all select 0 union all select 0 union all -- 44
        select 0 union all select 0 union all select 0 union all select 0 union all -- 48
        select 0 union all select 0 union all select 0 union all select 0 union all -- 52
        select 0 union all select 0 union all select 0 union all select 0 union all -- 56
        select 0 union all select 0 union all select 0 union all select 0 union all -- 60
        select 0 union all select 0 union all select 0 union all select 0           -- 64
    )
    select top ( isnull( @max, 0 ) )
        row_number() over ( order by anchor.constant ) as n
    from
        ( select 0 ) as anchor( constant )
        cross join nQ as n1 -- 64       ( 6 bit)
        cross join nQ as n2 -- 4096     (12 bit)
        cross join nQ as n3 -- 262144   (18 bit)
        cross join nQ as n4 -- 16777216 (24 bit)
)
;
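A quick usage sketch (my own illustration, assuming the function above has been created):

-- the first million numbers
select t.n
from dbo.tfnTally24bit( 1000000 ) as t;

-- e.g. the first 10 days of 2009
select dateadd( day, t.n - 1, '20090101' ) as TheDate
from dbo.tfnTally24bit( 10 ) as t;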

    Enjoy

  • Me again....

I modified the classic Tally solution for string splitting, as found on the initial pages, and made speed improvements.

This solution is unlikely to suffer from the predictable-data problem that the really fast solutions turned out to fall victim to.

This code is based on Jeff's, which was improved by Barry, and now it's my turn... grin.

I'll explain in a bit, after you've inspected the code for yourself :).

create function dbo.fnSplitClassicTweak
(
    @Parameter    varchar(max)
    , @Separator  varchar(64)
    , @SectorSize int = 8 -- have this size closely match the expected separator-less blocks in the input
)
returns @Items table
(
    ID       int identity(1,1) primary key clustered -- the element number
    , Item   varchar(8000) not null -- the split-out string element
    , Offset int not null -- the original offset (not entirely accurate if LEN(@Separator) > 1, because of the REPLACE())
)
as
begin
    -- our separator character (convenient, doesn't affect performance)
    declare @Sep char(1);
    select @Sep = char( 10 );
    -- NOTE: we make the @Sep character LF so that we will automatically
    --       parse out rogue LF-only line breaks.

    --===== Add start and end separators to the parameter so we can handle
    --      all the elements the same way.
    --      Also change the separator expressions to our separator
    --      character to keep all offsets = 1.
    select @Parameter = @Sep + replace( @Parameter, @Separator, @Sep ) + @Sep;

    insert into @Items( Offset, Item )
    select
        (charpos.N + sector.N) + 1
        , substring( @Parameter, (charpos.N + sector.N) + 1, charindex( @Sep, @Parameter, (charpos.N + sector.N) + 1 ) - (charpos.N + sector.N) - 1 )
    from
        (
            select @SectorSize * (s.n - 1) as N
            from dbo.tfnTally24bit( (datalength( @Parameter ) + @SectorSize - 1) / @SectorSize ) as s
            where
                -- Notice how we identify sectors of interest
                charindex( @Sep, substring( @Parameter, 1 + (@SectorSize * (s.n - 1)), @SectorSize ) ) != 0
        ) as sector
        cross join dbo.tfnTally24bit( @SectorSize ) as charpos
    where
        -- Notice how we find the separator within a sector of interest
        (charpos.N + sector.N) < datalength( @Parameter )
        and substring( @Parameter, (charpos.N + sector.N), 1 ) = @Sep
    order by 1
    ;
    return;
end
go

This is what I call an engineering solution... it addresses some of the drawbacks and adds a few minor tweaks... no radical things.

Improvements

1. The tally function from my previous post (no I/O, and in all tests I did so far it performs best).

2. The result table now has an identity column, which makes it possible to dispose of the ROW_NUMBER() OVER construction, as I can do an ORDER BY 1 on the character index to generate the IDs.

3. Added a clustered primary key to the result table.

4. There is now a configurable mechanism that splits the input string into evenly sized pieces (I call them sectors). You can configure the size of these sectors so it best matches the separator distances typically seen in your data.

Point 4 is by far the most significant improvement, as no longer is every character inspected by a costly SUBSTRING operation. First, all sectors are scanned using CHARINDEX; if a match is found, that sector becomes part of the derived table containing all sectors to be inspected in the classic way. Thus, if your sector size is small enough that plenty of sectors contain no separator, yet still larger than 3 to 4 characters, you should save on string-manipulation costs and thus gain speed. A sample call follows below.
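To make this concrete, here's a hypothetical sample call (my illustration; it assumes both functions above are created, and the literals are made up):

declare @list varchar(max);
select @list = 'alpha,bravo,charlie,delta,echo';

-- sector size 8 roughly matches the 5-6 character elements in this sample
select ID, Item, Offset
from dbo.fnSplitClassicTweak( @list, ',', 8 );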

I hope to have given a new impulse and some new ideas for you guys to incorporate into your own solutions, be it for string splitting or otherwise 🙂

Looking forward to some replies; it's getting lonely here, and it was a hard, sleepless night of work!

Cheers!

Special note to Jeff: I saved your tally....... it beats the cursor-based function 🙂

  • peter (5/4/2009)


    Here is an update to my tally function, it has now slightly less overhead by eliminating two cross joins.

    As you keenly observed before, all of these are now in the area of greased lightning and it comes down to which kind of resources you want to expend. Here's the same million row test on my "slow" box with your latest function...

    [font="Courier New"]====================================================================================================

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    ===== Matt Miller's Method =====

    SQL Server Execution Times:

    CPU time = 906 ms, elapsed time = 979 ms.

    ====================================================================================================

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    ===== Itzek's Method =====

    SQL Server Execution Times:

    CPU time = 844 ms, elapsed time = 853 ms.

    ====================================================================================================

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    ===== Jeff Moden's Method

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 1 ms.

    Table 'syscolrdb'. Scan count 2, logical reads 98, physical reads 0, read-ahead reads 115, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'syscolpars'. Scan count 1, logical reads 7, physical reads 1, read-ahead reads 16, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 719 ms, elapsed time = 832 ms.

    ====================================================================================================

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    ===== RBarryYoung's Method

    Table 'syscolrdb'. Scan count 2, logical reads 22, physical reads 2, read-ahead reads 97, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'syscolpars'. Scan count 2, logical reads 10, physical reads 1, read-ahead reads 47, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 734 ms, elapsed time = 808 ms.

    ====================================================================================================

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    ===== Combined Method

    Table 'syscolrdb'. Scan count 2, logical reads 12, physical reads 0, read-ahead reads 92, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'syscolpars'. Scan count 2, logical reads 14, physical reads 1, read-ahead reads 16, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 703 ms, elapsed time = 865 ms.

    ====================================================================================================

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    [highlight="YELLOW"]===== Peter's Method

    SQL Server Execution Times:

CPU time = 719 ms, elapsed time = 726 ms.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Thanks for the information, Jeff... care to take a quick look at what your string splitter has evolved into as well?

PS.

I still see room for improvement in the classic method... my strategy is to eliminate as many SUBSTRING operations as possible, as those are the main killer!

In fact, I already have a faster version ready, but am still trying to squeeze more out of it 🙂

    First a slight tally update:

-- Tally table function (24 bit, large enough for general purpose use)
--
alter function dbo.tfnTally24bit( @max int ) returns table
as
return
(
    with nQ( N ) as
    (
        select 0 union all select 0 union all select 0 union all select 0 union all -- 4
        select 0 union all select 0 union all select 0 union all select 0 union all -- 8
        select 0 union all select 0 union all select 0 union all select 0 union all -- 12
        select 0 union all select 0 union all select 0 union all select 0 union all -- 16
        select 0 union all select 0 union all select 0 union all select 0 union all -- 20
        select 0 union all select 0 union all select 0 union all select 0 union all -- 24
        select 0 union all select 0 union all select 0 union all select 0 union all -- 28
        select 0 union all select 0 union all select 0 union all select 0 union all -- 32
        select 0 union all select 0 union all select 0 union all select 0 union all -- 36
        select 0 union all select 0 union all select 0 union all select 0 union all -- 40
        select 0 union all select 0 union all select 0 union all select 0 union all -- 44
        select 0 union all select 0 union all select 0 union all select 0 union all -- 48
        select 0 union all select 0 union all select 0 union all select 0 union all -- 52
        select 0 union all select 0 union all select 0 union all select 0 union all -- 56
        select 0 union all select 0 union all select 0 union all select 0 union all -- 60
        select 0 union all select 0 union all select 0 union all select 0           -- 64
    )
    select top ( isnull( @max, 0 ) )
        cast( row_number() over ( order by anchor.constant ) as int )        as n
        , cast( row_number() over ( order by anchor.constant ) as tinyint )  as n8
        , cast( row_number() over ( order by anchor.constant ) as smallint ) as n16
        , row_number() over ( order by anchor.constant )                     as n64
    from
        ( select 0 ) as anchor( constant )
        cross join nQ as n1 -- 64       ( 6 bit)
        cross join nQ as n2 -- 4096     (12 bit)
        cross join nQ as n3 -- 262144   (18 bit)
        cross join nQ as n4 -- 16777216 (24 bit)
)
;

The cast to a 32-bit integer is slightly faster when done in the tally function than relying on numerous implicit casts in the consuming queries!

I also added a few columns cast to different integer sizes; only the ones you select will actually be generated. So far I have detected no overhead from this, and it makes selecting a certain integer type more convenient. A small sketch of the typed columns in use follows below.
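For instance, a minimal sketch (my illustration; note that n8 is a tinyint, so only select it when the requested count fits in 0-255):

-- smallint sequence; no implicit cast needed at the consumer
select t.n16
from dbo.tfnTally24bit( 1000 ) as t;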

So, I'm interested to hear the conclusion of this matter. Peter seems to have picked up on a way to extract more out of the tally table method. Is this enough? And what about the scalability issues that existed with the CLR? Have those been solved?

    I'd like to hear Flo's final analysis on this or someone else who has been working this problem through.

    I've got my popcorn ready.

  • peter (5/5/2009)


    Thanks for the information Jeff...care to take a quick look at what your string splitter evolved into as well?

    Sorry Peter... I'm out of coffee and I'm not sure of what you speak. What is it that you'd like me to take a look at? Since this thread has gotten quite long, would you mind posting the splitter code you want me to look at?

    Or, are you talking about the new Tally generator you just posted?

    --Jeff Moden



  • Sorry, Peter... I got it. You were talking about the code in the previous post.

    --Jeff Moden



  • Peter,

    How do I determine what the optimal sector size in your splitter code should be?
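(Not presuming to answer for Peter, but the comment in the function header says to have the sector size closely match the expected separator-less blocks in the input. One rough way to ballpark that from real data might be the sketch below, using the JBMTest table built later in this thread; the arithmetic is deliberately approximate.)

-- approximate average width of a separator-less block:
-- total length divided by the number of separators (NULLIF guards against zero)
select avg( len(CSV) / nullif( len(CSV) - len(replace(CSV, ',', '')), 0 ) ) as ApproxSectorSize
from dbo.JBMTest;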

    --Jeff Moden



  • Phil Factor (5/1/2009)


    Jeff,

    ...but their developers seem unable to fix things so they work like they used to.

    Ouch.

    They're doing all they can to put in a better solution ASAP. The problem was that it became impossible to improve the browser-side 'prettifier' without a lot of effort (Javascript regex solutions can only do so much) and they are planning on using a vastly better server-side solution. They're working on it urgently at this very moment.

    You can always use my Prettifier in the meantime!

    Any word on this, Phil? Copying code from the code windows makes some really ugly stuff with extra line breaks, no indentation, and all those nasty things I posted on the test thread for the code windows.

    --Jeff Moden



  • Peter,

Sorry, man. I tested against 1000 lines of 800 comma-separated random numbers each in a VARCHAR(8000) environment... your code did the split into a table in a minute and 50 seconds. Many of the other solutions come in at less than 10 seconds for the same thing.

    --Jeff Moden



Heh... oh my... :hehe: I believe I've found the secret as to why CROSS APPLY makes the Tally table solutions run so fast. It forces an unnatural ORDER BY, and the same thing can be accomplished in straight code either by using the correct ORDER BY in ROW_NUMBER or by a real ORDER BY on SQL Server 2000 instances.

Without the correct ORDER BY, either explicit or via CROSS APPLY, it takes 1:37 to split 1000 rows of 800 highly randomized numbers into a result table.

With the correct ORDER BY, explicitly via ROW_NUMBER or a real ORDER BY, or implicitly via CROSS APPLY, the same thing runs in about 10 seconds.

SQL Server makes some really bad execution plan decisions on Tally-table-based splitters. In the example above, left to its own devices, it created some hidden RBAR to the tune of about 7.6 million internal rows. Adding one of the ORDER BY methods, to force ordering by the row number of the row in the source table and THEN by the Tally table number, removed all that hidden RBAR.

    I'll see if I can put a nice little test together so that folks can run it.

    --Jeff Moden



Ok... first of all, the following code assumes that you have a Tally table with a count of at least 8000. I'm currently using a Tally table with 1 million rows in it. For those who want to play along, here's the code for that (everything is built in TempDB, so no harm to any existing code)...

    [font="Courier New"]--===== Do this in a nice safe place

        USE TempDB

    --===== Create and populate the Tally table on the fly

     SELECT TOP 1000000

            IDENTITY(INT,1,1AS N

       INTO dbo.Tally

       FROM Master.dbo.SysColumns sc1,

            Master.dbo.SysColumns sc2

    --===== Add a Primary Key to maximize performance

      ALTER TABLE dbo.Tally

        ADD CONSTRAINT PK_Tally_N 

            PRIMARY KEY CLUSTERED (NWITH FILLFACTOR = 100

    --===== Allow the general public to use it

      GRANT SELECT ON dbo.Tally TO PUBLIC[/font]

Here's the code I use to build the test cases. Read the comments in the header for how to use it. Currently, it's set to create 1000 rows of 800 randomized numbers each.

    [font="Courier New"]--drop table jbmtest

    /**************************************************************************************************

     Purpose:

      Generate highly randomized CSV rows of random numbers for CSV Splitter testing.  It' virtually 

      impossible that this code will create any identical rows with this code. It's not "Moby Dick", 

      but it does produce CSV's that are more likely to be seen in the real world.

      I wrote this so that it's compatible with both SQL Server 2000 and 2005 so virtually anyone

      can play along.  The only thing that needs a manual change to run in 2k is the object of the TOP

      statement in the actual table creation code below.

      Notes:

      1. The code is setup for VARCHAR(8000).  To test for VARCHAR(MAX), do a Search'n'Replace to

         change all VARCHAR(8000) to VARCHAR(MAX).

      2. If you get the following error, you've selected too many elements for the size numbers and 

         have most likely exceeded the VARCHAR(8000).  You simply need to reduce either the range of

         numbers or the number of elements, or both.  Of course, that probably won't happen with

         VARCHAR(MAX) if you've changed to that.

            Msg 8152, Level 16, State 14, Line 32

            String or binary data would be truncated.

            The statement has been terminated.

      3. The settings in the code are currently set to make 800 elements with numbers ranging from 

         1 to 1,000,000,000 (1 to 10 characters) that pretty much fill up VARCHAR(8000).

      4. Look for "Change the values here" in the code below.

    --  Jeff Moden

    **************************************************************************************************/

    --===== Change to a safe place to "play"

        USE TempDB

    GO

    --=================================================================================================

    --      Conditionally drop the special objects that we'll need to create

    --      (just to simplify reruns)

    --=================================================================================================

         IF OBJECT_ID('TempDB.dbo.RandomPositiveInt'IS NOT NULL

            DROP VIEW dbo.RandomPositiveInt

         IF OBJECT_ID('TempDB.dbo.RandomIntCSV'IS NOT NULL

            DROP FUNCTION dbo.RandomIntCSV

    GO

    --=================================================================================================

    --===== Create a view to return random positive integers starting at 0.

         -- This is necessary because you cannot yet use NEWID() in a UDF even in 2k5.

    --================================================================================================='

     CREATE VIEW dbo.RandomPositiveInt AS

     SELECT ABS(CHECKSUM(NEWID())) AS RandomPositiveInt

    GO

    --=================================================================================================

    --===== Create a function to make predictable ranges of CSV'd numbers

    --=================================================================================================

     CREATE FUNCTION dbo.RandomIntCSV(@NumberOfElements INT@Range INT@Offset INT)

    RETURNS VARCHAR(8000)

         AS

      BEGIN

            DECLARE @ReturnCSV VARCHAR(8000)

             SELECT @ReturnCSV ISNULL(@ReturnCSV+',','')

                  + CAST(rpi.RandomPositiveInt%@Range+@Offset AS VARCHAR(10))

               FROM dbo.Tally t

              CROSS JOIN dbo.RandomPositiveInt rpi

              WHERE t.N <= @NumberOfElements

     RETURN @ReturnCSV

        END

    GO

    --=================================================================================================

    --      This section of code creates the test table based on 

    --=================================================================================================

    --===== Declare some control variables that we can change the values of to make the generation

         -- of different size and width tables easy

    DECLARE @NumRows       INT,     --Number of rows to create

            @NumElements   INT,     --Per row

            @ElementRange  INT,     --Number of numbers in the rangge

            @ElementOffSet INT,     --First number to appear in the range

            @AddDelimiters CHAR(1)  --If = 'Y', adds leading/trailing delimiters

    --===== Change the values here

     SELECT @NumRows       1000,

            @NumElements   800,

            @ElementRange  1000000000,

            @ElementOffSet 1,

            @AddDelimiters 'Y'

    --===== Conditionally drop the test table so we can rebuild it on reruns

         IF OBJECT_ID('dbo.JBMTest'IS NOT NULL

            DROP TABLE dbo.JBMTest

    --===== Create and populate the test table on the fly

     SELECT TOP (@NumRows-- You will need to hard code the value here in 2k.

            IDENTITY(INT,1,1AS RowNum,

            dbo.RandomIntCSV(@NumElements,@ElementRange,@ElementOffSetAS CSV

       INTO dbo.JBMTest

       FROM dbo.Tally

    --===== Conditionally add leading and trailing delimiters

         IF @AddDelimiters 'Y'

     UPDATE dbo.JBMTest

        SET CSV ','+CSV+','

    --===== Add a Primary Key

      ALTER TABLE dbo.JBMTest

        ADD CONSTRAINT PK_JBMTest_RowNum

            PRIMARY KEY CLUSTERED (RowNumWITH FILLFACTOR = 100

    --===== Display the stats for the new test table just for convenience in testing

     SELECT COUNT(*)       AS MeasuredRowCount,

            @NumElements   AS DesignedNumElements,

            CASE

                WHEN @AddDelimiters 'Y'

                THEN MAX(LEN(CSV) - LEN(REPLACE(CSV,',','')) - 1)

                ELSE MAX(LEN(CSV) - LEN(REPLACE(CSV,',','')) + 1)

            END AS MeasuredNumElements,

            @ElementOffSet AS DesignedElementOffset,

            @ElementRange  AS DesignedElementRange,

            @AddDelimiters AS DelimitersAdded,

            MIN(LEN(CSV))  AS MinCsvLength,

            AVG(LEN(CSV))  AS AvgCsvLength,

            MAX(LEN(CSV))  AS MaxCsvLength 

       FROM dbo.JBMTest

    GO

    [/font]

And here's the simple bit of experimenting I did with two different forms of explicit ORDER BY (we've already seen the CROSS APPLY run, so it's not included here)...

    [font="Courier New"]--===== This takes a whopping big 1:37 on my box

    DROP TABLE Result

    DECLARE @delimiter CHAR(1

     SELECT @delimiter ',' 

     SELECT RowNum,

            ROW_NUMBER() OVER (ORDER BY t.NAS ID,

            SUBSTRING(s.CSVt.N +1CHARINDEX(@delimiters.CSVt.N +1) -t.N -1AS Item

       INTO dbo.Result

       FROM dbo.Tally t

      CROSS JOIN jbmtest s

      WHERE t.N LEN(s.CSV)

        AND SUBSTRING(s.CSVt.N1@delimiter

    ---------------------------------------------------------------------------------------

    --===== This only takes 8 seconds

    DROP TABLE Result

    DECLARE @delimiter CHAR(1

     SELECT @delimiter ',' 

     SELECT RowNum,

            ROW_NUMBER() OVER (ORDER BY s.RowNum,t.NAS ID,

            SUBSTRING(s.CSVt.N +1CHARINDEX(@delimiters.CSVt.N +1) -t.N -1AS Item

       INTO dbo.Result

       FROM dbo.Tally t

      CROSS JOIN jbmtest s

      WHERE t.N LEN(s.CSV)

        AND SUBSTRING(s.CSVt.N1@delimiter

    ---------------------------------------------------------------------------------------

    --===== This only takes 8 seconds, as well

    DROP TABLE Result

    DECLARE @delimiter CHAR(1

     SELECT @delimiter ',' 

     SELECT RowNum,

            SUBSTRING(s.CSVt.N +1CHARINDEX(@delimiters.CSVt.N +1) -t.N -1AS Item

       INTO dbo.Result

       FROM dbo.Tally t

      CROSS JOIN jbmtest s

      WHERE t.N LEN(s.CSV)

        AND SUBSTRING(s.CSVt.N1@delimiter

      ORDER BY s.RowNum,t.n[/font]

By the way, I've done some testing against this same bit of highly randomized data, and it seems that type of "real life" data really blows some of the other solutions out of the water. I won't post that testing here because I don't want to steal any of the thunder from Flo's article.

    Heh... Long live the Tally table. 😛

    --Jeff Moden



  • Hey Jeff,

    Some initial comments:

    The version that takes forever includes an index spool in the plan. The modifications you made to the other tally approaches remove that operator. The fastest versions run in a little over ten seconds on my laptop.

    The CSV length of ~8000 characters precludes using the fastest CLR implementation (it has a limit of NVARCHAR(4000)).

    Using the NVARCHAR(MAX) version of Adam's memory-friendly routine - which has to convert to and from Unicode as well as taking the performance hit imposed by the MAX datatype - runs in just under seven seconds.

    Long live the CLR solutions! 😀

    Paul

  • I agree... the CLR solutions, especially that one, look pretty good.

    --Jeff Moden


