Tally OH! An Improved SQL 8K “CSV Splitter” Function

  • Eirikur Eiriksson (10/28/2015)


    Ed Wagner (10/28/2015)


    allnelsons (10/28/2015)


    So, question about these splitter functions (which are pretty much the most used tools in my arsenal, btw!), are they faster/more efficient being run from the I.B.G. CTE tally table like you're doing or with a physical Numbers table? Anyone done that analysis?

    It depends on the situation. I've seen smaller sets (< 8K rows) run faster with a physical tally table than with the tally function.

    It does also depend on the construct of the inline talky table, the numbers of rows produced and few other factors, best results so far are using memory resident tally table.In some cases, normal tally patterns like 2^3^3 or 2^2^2^2 tend to be up to 50% slower than 465^3 or 15^2^3.

    😎

    Thank you, Eirikur. You've given me something new to investigate. 😉

  • Ed Wagner (10/28/2015)


    Eirikur Eiriksson (10/28/2015)


    Ed Wagner (10/28/2015)


    allnelsons (10/28/2015)


    So, question about these splitter functions (which are pretty much the most used tools in my arsenal, btw!), are they faster/more efficient being run from the I.B.G. CTE tally table like you're doing or with a physical Numbers table? Anyone done that analysis?

    It depends on the situation. I've seen smaller sets (< 8K rows) run faster with a physical tally table than with the tally function.

    It does also depend on the construct of the inline talky table, the numbers of rows produced and few other factors, best results so far are using memory resident tally table.In some cases, normal tally patterns like 2^3^3 or 2^2^2^2 tend to be up to 50% slower than 465^3 or 15^2^3.

    😎

    Thank you, Eirikur. You've given me something new to investigate. 😉

    Been spending quite some time on this subject, any input appreciated, especially on realistic testing!

    😎

  • I think that if we're going to do a study on inline Tally Table creation methods and comparisons against physical Tally Tables that have been cached, we should start another thread to possibly wet the appetite of additional participants.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Or someone should write an article looking at performance comparisons.

  • Steve Jones - SSC Editor (11/2/2015)


    Or someone should write an article looking at performance comparisons.

    Heh... hint taken. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Someone needs to benchmark the STRING_SPLIT() function coming out in SQL Server 2016 and update the main article!

    https://msdn.microsoft.com/en-us/library/mt684588.aspx

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (3/11/2016)


    Someone needs to benchmark the STRING_SPLIT() function coming out in SQL Server 2016 and update the main article!

    https://msdn.microsoft.com/en-us/library/mt684588.aspx

    I heard about this and must say that it's about darn time. I find it interesting that they could add this function but still not add a table of numbers (table/ITVF/something). One step at a time, I guess.

  • TheSQLGuru (3/11/2016)


    Someone needs to benchmark the STRING_SPLIT() function coming out in SQL Server 2016 and update the main article!

    https://msdn.microsoft.com/en-us/library/mt684588.aspx

    Ed Wagner (3/11/2016)


    TheSQLGuru (3/11/2016)


    Someone needs to benchmark the STRING_SPLIT() function coming out in SQL Server 2016 and update the main article!

    https://msdn.microsoft.com/en-us/library/mt684588.aspx

    I heard about this and must say that it's about darn time. I find it interesting that they could add this function but still not add a table of numbers (table/ITVF/something). One step at a time, I guess.

    Wayne Sheffield is actually doing that as we speak.

    There are two major problems with the new function though and they both have connect items on them (links courtesy of Wayne Sheffield).

    Connect items to fix this:

    https://connect.microsoft.com/SQLServer/Feedback/Details/2433171 (add Item #)

    https://connect.microsoft.com/SQLServer/Feedback/Details/2454210 (handle NULL inputs)

    With that, I strongly encourage everyone to visit those links and vote the fixes up!

    @steve-2 Jones. This would be really good for an editorial especially if you included the links and encouraged folks to vote the fixes up. If you want, I could write it up over this weekend and send it to you.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (3/11/2016)


    TheSQLGuru (3/11/2016)


    Someone needs to benchmark the STRING_SPLIT() function coming out in SQL Server 2016 and update the main article!

    https://msdn.microsoft.com/en-us/library/mt684588.aspx

    Ed Wagner (3/11/2016)


    TheSQLGuru (3/11/2016)


    Someone needs to benchmark the STRING_SPLIT() function coming out in SQL Server 2016 and update the main article!

    https://msdn.microsoft.com/en-us/library/mt684588.aspx

    I heard about this and must say that it's about darn time. I find it interesting that they could add this function but still not add a table of numbers (table/ITVF/something). One step at a time, I guess.

    Wayne Sheffield is actually doing that as we speak.

    There are two major problems with the new function though and they both have connect items on them (links courtesy of Wayne Sheffield).

    Connect items to fix this:

    https://connect.microsoft.com/SQLServer/Feedback/Details/2433171 (add Item #)

    https://connect.microsoft.com/SQLServer/Feedback/Details/2454210 (handle NULL inputs)

    With that, I strongly encourage everyone to visit those links and vote the fixes up!

    @steve-2 Jones. This would be really good for an editorial especially if you included the links and encouraged folks to vote the fixes up. If you want, I could write it up over this weekend and send it to you.

    The item number is the main reason that the DelimitedSplit8K is far superior to all the other splitters available. Several other splitters are ok for performance on a smallish number of rows. But the lack of ordinal position within the list just sucks.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Certainly we want to get the numbering added to this splitter in SQL 2016, but we ought to find way of avoiding treating NULL as a value rather than as an indicator that we don't have a value. If the splitter were a TVF this would be easy: NULL return means the function was asked to split something that isn't a value, so it can't return any value and must return null (and empty table returned means no rows if we need that result); or maybe an extra parameter indicating what to do with NULL as someone else suggested.

    Tom

  • TheSQLGuru (3/11/2016)


    Someone needs to benchmark the STRING_SPLIT() function coming out in SQL Server 2016 and update the main article!

    https://msdn.microsoft.com/en-us/library/mt684588.aspx

    Someone needs to point out that using this in a WHERE clause is a BAD thing too (or putting in a table variable). The lack of statistics can lead to HORRIBLE plan choices if any data value skew is in play. Also there are any number of times the size of the thingy coming out will lead to issues too I bet. Oh, and implicit conversions possibly too.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TomThomson (3/11/2016)


    Certainly we want to get the numbering added to this splitter in SQL 2016, but we ought to find way of avoiding treating NULL as a value rather than as an indicator that we don't have a value. If the splitter were a TVF this would be easy: NULL return means the function was asked to split something that isn't a value, so it can't return any value and must return null (and empty table returned means no rows if we need that result); or maybe an extra parameter indicating what to do with NULL as someone else suggested.

    If you're suggesting that no rows should be returned if you pass the function a NULL, I have to disagree for the same reason that I think the following produces the correct result...

    SELECT LOG(NULL);

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • TheSQLGuru (3/11/2016)


    TheSQLGuru (3/11/2016)


    Someone needs to benchmark the STRING_SPLIT() function coming out in SQL Server 2016 and update the main article!

    https://msdn.microsoft.com/en-us/library/mt684588.aspx

    Someone needs to point out that using this in a WHERE clause is a BAD thing too (or putting in a table variable). The lack of statistics can lead to HORRIBLE plan choices if any data value skew is in play. Also there are any number of times the size of the thingy coming out will lead to issues too I bet. Oh, and implicit conversions possibly too.

    Heh... kind of like how MS pointed out that FORMAT is more than 44 times slower than an equivalent CONVERT? It won't happen in MS documentation. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (3/11/2016)


    TomThomson (3/11/2016)


    Certainly we want to get the numbering added to this splitter in SQL 2016, but we ought to find way of avoiding treating NULL as a value rather than as an indicator that we don't have a value. If the splitter were a TVF this would be easy: NULL return means the function was asked to split something that isn't a value, so it can't return any value and must return null (and empty table returned means no rows if we need that result); or maybe an extra parameter indicating what to do with NULL as someone else suggested.

    If you're suggesting that no rows should be returned if you pass the function a NULL, I have to disagree for the same reason that I think the following produces the correct result...

    SELECT LOG(NULL);

    I don't think any splitter would ever return a NULL as one of the results because you can't concatenate a NULL with a string and get anything other than NULL,

    For example, "1,2,,4" would hopefully return 4 rows: 1, 2, an empty string, 4. This is because there really is an empty string in position 3. And as we all know, an empty string is not a NULL.

  • Jeff Moden (3/11/2016)


    TomThomson (3/11/2016)


    Certainly we want to get the numbering added to this splitter in SQL 2016, but we ought to find way of avoiding treating NULL as a value rather than as an indicator that we don't have a value. If the splitter were a TVF this would be easy: NULL return means the function was asked to split something that isn't a value, so it can't return any value and must return null (and empty table returned means no rows if we need that result); or maybe an extra parameter indicating what to do with NULL as someone else suggested.

    If you're suggesting that no rows should be returned if you pass the function a NULL, I have to disagree for the same reason that I think the following produces the correct result...

    SELECT LOG(NULL);

    No, that's not what I'm suggesting. An empty set for NULL is anathema.

    What I'm suggesting is probably pretty radical: since Codd's idea was that we should work in a language that was based on a relational calculus, let's make T-SQL a little bit closer to that and treat relations as first class objects in the language - I know, absolute anathema to the SQL Standards gurus, but in my view disagreeing with them is a probably a good sign.

    So if you pass the function a NULL, it returns NULL, and if you pass it aomething other than null it retums a relation (ie a set of rows) - it's a (built in) TVF. Whether the function ever returns a relation containing no rows depends on how it is specified to behave if the string passed in to be split has length zero. To me it seems natural in that case to return one row (sequence number 1, string value the zero length string) but some people think returning an empty relation for that case would be better.

    Tom

Viewing 15 posts - 781 through 795 (of 990 total)

You must be logged in to reply to this topic. Login to reply