Tally OH! An Improved SQL 8K “CSV Splitter” Function

  • Jeff, a very nicely done write up! I know it took a lot of time to put together, thanks for continuing to do this kind of work and share with the community.

  • Andy Warren (5/4/2011)


    Jeff, a very nicely done write up! I know it took a lot of time to put together, thanks for continuing to do this kind of work and share with the community.

    who needs sleep??

  • Jeff Moden (5/3/2011)


    mtassin (5/3/2011)


    WayneS (5/3/2011)


    ChrisM@home (5/3/2011)


    Jeff, please please please write a book, without changing your style. Give us all an opportunity to repay you for what you've done for us.

    +10!

    Yes, please do write a book...

    Oddly enough on the first of every month, I search Amazon for Jeff's name hoping to find a book for my kindle 🙂

    You folks are too much. :blush: Thanks for the confidence. I actually started to work on a book, gosh, I guess it was two years ago. I got all balled up at work and also came down with Bronchitis that hung on for about 4 months and a bunch of other things and put writing asided. Tony Davis sent me an email last month asking if I could start again. Guess there's no time like the present. Now, all I have to do is convince Tony Davis the ChrisM@home is right... let me write it in my style... it'll be a whole lot easier for me to be me.:-P

    Jeff if not a complete book, you might consider a chapter or two in the next "SQL Server MVP Deep Dives"

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you -before posting a question please read[/url]
    Before posting a performance problem please read[/url]

  • bitbucket-25253 (5/4/2011)


    Jeff Moden (5/3/2011)


    mtassin (5/3/2011)


    WayneS (5/3/2011)


    ChrisM@home (5/3/2011)


    Jeff, please please please write a book, without changing your style. Give us all an opportunity to repay you for what you've done for us.

    +10!

    Yes, please do write a book...

    Oddly enough on the first of every month, I search Amazon for Jeff's name hoping to find a book for my kindle 🙂

    You folks are too much. :blush: Thanks for the confidence. I actually started to work on a book, gosh, I guess it was two years ago. I got all balled up at work and also came down with Bronchitis that hung on for about 4 months and a bunch of other things and put writing asided. Tony Davis sent me an email last month asking if I could start again. Guess there's no time like the present. Now, all I have to do is convince Tony Davis the ChrisM@home is right... let me write it in my style... it'll be a whole lot easier for me to be me.:-P

    Jeff if not a complete book, you might consider a chapter or two in the next "SQL Server MVP Deep Dives"

    How about a tear-off calendar handling one paragraph a day ? :w00t::hehe:

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • Naomi N (5/3/2011)


    Jeff,

    That was one of the best articles I read recently. Terrific job!

    BTW, is there a way to contact you privately somehow?

    Thanks again.

    Thanks, Naomi. I appreciate the kudo.

    And, yes... there is a way to contact me. If you click on my name at the top left of this post (not the whole thread, this post), a pop-up menu will appear and one of the selections is "Send Email to Member".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Nadrek (5/3/2011)


    SQLRNNR (5/3/2011)


    Please post your code for this multi cte temporary tally table that outperforms the others. I am sure several others would like to test it.

    The multi cte temporary tally table is Jeff's from the article.

    The three permanent tally table versions are already in my posts farther up this thread.

    Those are interesting, as well. Thanks for posting them. I wasn't able to get any testing done on those or other functions offered on this fine thread last night. I opened my laptop where all my code is stored for the first time in a week and I made the mistake of letting it do some updates... they took half the night. :pinch:

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Andy Warren (5/4/2011)


    Jeff, a very nicely done write up! I know it took a lot of time to put together, thanks for continuing to do this kind of work and share with the community.

    Hi Andy. Thanks for stopping by and I really appreciate the great kudo especially coming from a principle in the world of SQL Server. What's better than any article I could ever write is the fine discussions that follow. Lots of good folks have shared some awesome ideas on this thread. It's a fine example of what the SQL Server community does day in and day out and I'm happy I can be a part of it. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • SQLkiwi (5/4/2011)


    This is the CLR code I wrote for Jeff.

    Thanks for stopping by, Paul. Now I don't have to send you an email to ask you to "stop by" with your code. I didn't want to presume to post the code you gave me without asking. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • SQLkiwi (5/4/2011)


    This is the CLR code I wrote for Jeff. It's not flash, but it is simple, fast, and handles strings of any length, Unicode or not.

    There are three implementations below. Splitter 'B' is the one referred to above, and works on all versions of SQL Server from 2005 RTM onward.

    Splitter 'A' features simpler code, but fell victim to a SQL Server bug introduced in 2008 RTM. I have tested Splitter A on 2005 and 2008 SP2 with no problems.

    Splitter 'C' is a .NET 3.5 implementation provided for interest - it requires at least SQL Server 2008.

    All versions are very fast - 2-3 times faster than the best TSQL solutions. I suspect Splitters A & B may be slighty faster than C, but I have never bothered to test. I only write basic C#as a hobby; no doubt better implementations are possible.

    If you need a splitter that handles mutliple-character delimiters, try Adam Machanic's code here:

    http://sqlblog.com/blogs/adam_machanic/archive/2009/04/28/sqlclr-string-splitting-part-2-even-faster-even-more-scalable.aspx

    Thanks for posting this fine code.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Jeff Moden (5/3/2011)


    Roy Ernest (5/3/2011)


    As usual, great post Jeff. Five star article.. 🙂

    Thanks, Roy. I appreciate you stopping by. How are you doing after that last car crunch?

    I am doing good. My car is back in shape. I have been quite on the forums. Just watching. Lots of work. But when I saw your article I had to comment.. 🙂

    -Roy

  • Jeff Moden (5/3/2011)


    What I find really strange is why do people wait until I write an article to bring these wonders forward? 😛

    Maybe they want to hear about you sucking on a beer popsicle in a corner with the dust bunnies all cuddled up in your binky? :-P;-)

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Nadrek (5/3/2011)


    The UNION version is up to about 5% faster on small but reasonably useful numbers of elements and uses the "standard" 1 based tally table, and the 0 based version is up to about 5% faster on medium to large numbers of elements, and uses a special tally table that starts at 0. Both can be slightly slower than your CTE Tally example on very small numbers of elements/string sizes, depending on some combination of SQL version and platform parameters.

    Nadrek, I was thinking about this on my commute, and am wondering if you would test this out (I'm not going to have the time today like I had yesterday)

    (It creates a zero-based tally table from a one-based tally table):

    CREATE FUNCTION [dbo].[YourFunction0Based](@pString [varchar](8000), @pDelimiter [char](1))

    RETURNS TABLE

    RETURN

    WITH

    cteTally(N) AS (--==== This limits the number of rows right up front

    -- for both a performance gain and prevention of accidental "overruns"

    -- changed "+1" to "+2", made N be "N-1" to create a zero-based virtual tally table

    SELECT TOP (DATALENGTH(ISNULL(@pString,1))+2) N-1

    FROM YourDB.dbo.YourTallyTable0Based WITH (nolock)

    WHERE N >= 1

    -- add where clause to NOT get a zero if a zero based tally table.

    ),

    cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)

    SELECT t.N+1

    FROM cteTally t

    WHERE (SUBSTRING(@pString,t.N,1) = @pDelimiter OR t.N = 0)

    )

    --===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.

    SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY s.N1), -- not including StringNumber actually increases duration and CPU, but uses slightly fewer writes

    Item = SUBSTRING(@pString,s.N1,ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000))

    FROM cteStart s WITH (nolock)

    ;

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Roy Ernest (5/4/2011)


    Jeff Moden (5/3/2011)


    Roy Ernest (5/3/2011)


    As usual, great post Jeff. Five star article.. 🙂

    Thanks, Roy. I appreciate you stopping by. How are you doing after that last car crunch?

    I am doing good. My car is back in shape. I have been quite on the forums. Just watching. Lots of work. But when I saw your article I had to comment.. 🙂

    Glad to hear that you're doing fine Roy!

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Maybe I am missing something here - but I don't think you need a 0 based tally table (perm) to make this work. If you change the following:

    cteTally(N) AS (--==== This provides the "zero base" and limits the number of rows right up front

    -- for both a performance gain and prevention of accidental "overruns"

    SELECT 0 UNION ALL

    SELECT TOP (DATALENGTH(ISNULL(@pString,1))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4

    ),

    To this - you get a 0 based result and it works. Not sure if it changes the performance but I cannot see how it would:

    cteTally(N) AS (--==== This provides the "zero base" and limits the number of rows right up front

    -- for both a performance gain and prevention of accidental "overruns"

    SELECT TOP (DATALENGTH(ISNULL(@pString,1))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 FROM E4

    ),

    At least, my tests so far seem to work the same.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Jeff, great article!... I just read through it a few minutes ago, but as I was walking through you code explanation for handling the start and end positions I thought your epiphany was going to be putting a 'delim' at the beginning and the end of the input string.... bypassing having to handle no delims at these positions... it would be a one-time hit modifying the string, but the 'delim/data' pattern would be consistent w/o having to create extra code to account for the lack of delim's... just a thought...

    Thanks for the effort!

    KDM

Viewing 15 posts - 136 through 150 (of 990 total)

You must be logged in to reply to this topic. Login to reply