Tally OH! An Improved SQL 8K “CSV Splitter” Function

  • Usman Butt (8/29/2012)


    hahahaha..but your version too started with varchar(max) as the parameter, so would the explicit conversion to nvarchar(max) would have matter :hehe:

    Not at all. I wasn't criticising you. It would just be typical that the next post in the thread would be from someone with Unicode data.

  • SQL Kiwi (8/29/2012)


    Usman Butt (8/29/2012)


    hahahaha..but your version too started with varchar(max) as the parameter, so would the explicit conversion to nvarchar(max) would have matter :hehe:

    Not at all. I wasn't criticising you. It would just be typical that the next post in the thread would be from someone with Unicode data.

    I never took it as criticism. But, you are one of my favorite mentors so even your criticism is most welcome 😎

  • SQL Kiwi (8/29/2012)


    Yes that is better. Of course now someone will need to split:

    DECLARE @x nvarchar(max) = N'?????,??,???,????,??????,???????????,?????,??,?????,???';

    SELECT * FROM dbo.XmlInlineSplitter(@x, ',', NEWID()) AS xis;

    So eat more of these soft French loaves, and have some tea!

  • mburbea (8/22/2012)


    This will perform better than the tally based splitter for most strings.

    Which Tally Based Splitter???

    I guess I'd need to see your full test harness that proves that statement because the bit of testing I've done with your function shows that's not true if you limit the string to what the DelimitedSplit8K function was designed to do.

    Here's just a 1 row example...

    PRINT '========== DelimitedSplit8k on 7,999 Characters =========='

    DECLARE @CommaList VARCHAR(8000);

    SET @CommaList='ABC'+replicate(cast(',ABC' as varchar(8000)),1999);

    DECLARE @BitBucketV VARCHAR(8000),

    @BitBucketI BIGINT;

    SET STATISTICS TIME ON;

    SELECT @BitBucketV = Item,

    @BitBucketI = ItemNumber

    FROM dbo.DelimitedSplit8K(@CommaList,',');

    SET STATISTICS TIME OFF;

    GO 5

    PRINT '========== mburbea XMLify/XMLSplit on 7,999 Characters =========='

    DECLARE @CommaList VARCHAR(8000);

    SET @CommaList='ABC'+replicate(cast(',ABC' as varchar(8000)),1999);

    DECLARE @BitBucketV VARCHAR(8000),

    @BitBucketI BIGINT;

    SET STATISTICS TIME ON;

    SELECT @BitBucketV = Item

    FROM dbo.XMLSplit(@CommaList,',');

    SET STATISTICS TIME OFF;

    GO 5

    Here's the results from that code...

    Beginning execution loop

    ========== DelimitedSplit8k on 7,999 Characters ==========

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 14 ms.

    ========== DelimitedSplit8k on 7,999 Characters ==========

    SQL Server Execution Times:

    CPU time = 15 ms, elapsed time = 15 ms.

    ========== DelimitedSplit8k on 7,999 Characters ==========

    SQL Server Execution Times:

    CPU time = 16 ms, elapsed time = 14 ms.

    ========== DelimitedSplit8k on 7,999 Characters ==========

    SQL Server Execution Times:

    CPU time = 15 ms, elapsed time = 14 ms.

    ========== DelimitedSplit8k on 7,999 Characters ==========

    SQL Server Execution Times:

    CPU time = 16 ms, elapsed time = 15 ms.

    Batch execution completed 5 times.

    Beginning execution loop

    ========== mburbea XMLify/XMLSplit on 7,999 Characters ==========

    SQL Server Execution Times:

    CPU time = 47 ms, elapsed time = 58 ms.

    ========== mburbea XMLify/XMLSplit on 7,999 Characters ==========

    SQL Server Execution Times:

    CPU time = 62 ms, elapsed time = 58 ms.

    ========== mburbea XMLify/XMLSplit on 7,999 Characters ==========

    SQL Server Execution Times:

    CPU time = 63 ms, elapsed time = 58 ms.

    ========== mburbea XMLify/XMLSplit on 7,999 Characters ==========

    SQL Server Execution Times:

    CPU time = 62 ms, elapsed time = 56 ms.

    ========== mburbea XMLify/XMLSplit on 7,999 Characters ==========

    SQL Server Execution Times:

    CPU time = 47 ms, elapsed time = 57 ms.

    Batch execution completed 5 times.

    From here, it looks like your functions are about 4 times slower. Perhaps I'm missing something.

    So far as testing a 70MB string goes, you would have had to modify the DelimitedSplit8K function to VARCHAR(MAX) and that would instantly make the code run about 6 times as slow. Of course, then it wouldn't be appropriate to say that Tally Based Splitter is slower because that would make it do something that it simply wasn't designed to do.

    As a side bar, the DelimitedSplit8K will withstand strings like "A&B". Yours will not. Please submit a function that does fix the entitization problem so we can test apples-to-apples. I'm all in favor of faster code but let's make sure we have the same functionality available.

    Yours also doesn't return an item number but that, of course, is an easy fix.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Blog post is up: http://bit.ly/ComputeScalar

  • SQL Kiwi (9/4/2012)


    Blog post is up: http://bit.ly/ComputeScalar

    ... Every damn time I start thinking I'm 'good' at this software, you come by and break my brain gently. Thanks for the blog post, that's... damned good information. It's gonna take me a week to fully consume it. 😀


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Evil Kraig F (9/4/2012)


    SQL Kiwi (9/4/2012)


    Blog post is up: http://bit.ly/ComputeScalar

    ... Every damn time I start thinking I'm 'good' at this software, you come by and break my brain gently. Thanks for the blog post, that's... damned good information. It's gonna take me a week to fully consume it. 😀

    If it helps, that happens to me quite frequently too. The more you know... 😀

  • SQL Kiwi (9/4/2012)


    Blog post is up: http://bit.ly/ComputeScalar

    That is a great explanation of the behavior Paul!

    I wish I understood more about query plans, so I could dig into them as you have.

    I'm wondering what other applications of XML may be improved by this approach, even though you do say at the end you shouldn't count on it.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • SQL Kiwi (9/4/2012)


    Blog post is up: http://bit.ly/ComputeScalar

    Absolutely BRILLIANT! Great research and great read. Thanks, Paul.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Paul,

    I just noticed something on your blog. You have...

    © 2012 Paul White

    For some reason, some countries are required to recognize that unless you also include the words All Rights Reserved and the actual word Copyright. Like this...

    © Copyright 2012 Paul White - All Rights Reserved

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (9/4/2012)


    Paul,

    I just noticed something on your blog. You have...

    © 2012 Paul White

    For some reason, some countries are required to recognize that unless you also include the words All Rights Reserved and the actual word Copyright. Like this...

    © Copyright 2012 Paul White - All Rights Reserved

    Something for me to think about. I started adding the © to remind people about copyright; you'd be amazed how small my international litigation budget actually is 🙂

    Thanks for reading and leaving the kind comment by the way.

  • SQL Kiwi (9/4/2012)


    Blog post is up: http://bit.ly/ComputeScalar

    I learned something today, several things in fact, thus today is a very good day for me.

    Thanks to you Paul!

  • SQL Kiwi (9/5/2012)


    Jeff Moden (9/4/2012)


    Paul,

    I just noticed something on your blog. You have...

    © 2012 Paul White

    For some reason, some countries are required to recognize that unless you also include the words All Rights Reserved and the actual word Copyright. Like this...

    © Copyright 2012 Paul White - All Rights Reserved

    Something for me to think about. I started adding the © to remind people about copyright; you'd be amazed how small my international litigation budget actually is 🙂

    Thanks for reading and leaving the kind comment by the way.

    I have the same problem. I even have well known user groups with news letters publishing my stuff without even asking. At least they leave my name in it all.

    It's really a shame that people copy other's work without asking whether they give credit or not. I've even taken to putting a copyright notice on every page of PowerPoint presentations. It doesn't really help if someone is hell bent on illegally copying but at least I have a legal leg to stand on if I ever do get the time and money necessary to break the antlers off of infringers.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • peter-757102 (9/5/2012)


    SQL Kiwi (9/4/2012)


    Blog post is up: http://bit.ly/ComputeScalar

    I learned something today, several things in fact, thus today is a very good day for me.

    Thanks to you Paul!

    Paul,

    I have to agree with Peter, here. If I extrapolate some of the things you said in your fine blog post, I can see possible explanations for why "Divide'n'Conquer" methods, such as using Temp Tables to store interim results instead of "All-in-one" queries, might execute so very much faster. Would you agree in that area?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • SQL Kiwi (9/4/2012)


    Blog post is up: http://bit.ly/ComputeScalar

    "Deferred execution" - the next best thing to "Let off the hook" 😀

    Thanks for an excellent read, Paul.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 15 posts - 406 through 420 (of 990 total)

You must be logged in to reply to this topic. Login to reply