Splitting Strings in SQL Server 2016

  • Comments posted to this topic are about the item Splitting Strings in SQL Server 2016

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • "it needs to be escaped using the forward-slash character "\"."

    Sorry to be picky, but you mean the back-slash character. :hehe:

    The real use of OPENJSON will be for structured JSON messages, not simple strings; it would be interesting to see an example of that as i already have a use-case for it! I expect that STRING_SPLIT and OPENJSON use the same underlying function for splitting.

  • True, while the string test here is interesting, it's real JSON data that will make it (or not) relevant. I'm thinking integration services, message broker etc. that could make a good use for this to parse JSON data efficiently and structure it into a set of relational tables as needed.

  • RobertMcClean (4/12/2016)


    "it needs to be escaped using the forward-slash character "\"."

    Sorry to be picky, but you mean the back-slash character. :hehe:

    The real use of OPENJSON will be for structured JSON messages, not simple strings; it would be interesting to see an example of that as i already have a use-case for it! I expect that STRING_SPLIT and OPENJSON use the same underlying function for splitting.

    Thanks for catching this Robert.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • ducon (4/12/2016)


    True, while the string test here is interesting, it's real JSON data that will make it (or not) relevant. I'm thinking integration services, message broker etc. that could make a good use for this to parse JSON data efficiently and structure it into a set of relational tables as needed.

    Keep in mind that the purpose of this article is about splitting strings, not about working with JSON data. OPENJSON was examined since it can split strings, and it's built-in.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Great comparison and test metrics. It's great to have off-the-shelf utilities like these.

    However, there is something I think it's worth sharing. Sometimes developers abuse split functions like these for simple tasks. In a lot of cases it's used simply to later on join the resulting data set with a table for matching values. We had an issue with a database in production that was using too much tempdb space, and the heavy use of split functions was one of the contributors.

    As an alternative, consider the following:

    declare @value varchar(10) = 'abc',

    @valueSet varchar(50) = 'abc,def',

    @delimiter char(1) = ','

    if concat(@delimiter,@valueSet,@delimiter) like concat('%',@delimiter,@value,@delimiter,'%')

    select 'yes'

    else

    select 'no'

    Put this into a function, call it say FnIsValueInSet, and then use it in your WHERE clause instead of joining to a temp table produced from the split string.

    Happy coding!

  • Interesting read Wayne. Good work sir!

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • LesterF (4/12/2016)


    We had an issue with a database in production that was using too much tempdb space, and the heavy use of split functions was one of the contributors.

    Good to know, thanks.

  • Thanks for the article.

  • Thanks; great post! I did have a question about the "failures" using CROSS APPLY vs OUTER APPLY though. Isn't that the expected behavior of using these functions? It seems like it would be likening it to data in tables and calling an INNER JOIN a failure of a test case when a row isn't returned where there is a NULL or no corresponding record based on the join column in one of the tables. Right?

    Anyway, thanks again for the write-up...now we have even more ways to accomplish this without a clear winner in all scenarios 🙂

  • LesterF (4/12/2016)


    Great comparison and test metrics. It's great to have off-the-shelf utilities like these.

    However, there is something I think it's worth sharing. Sometimes developers abuse split functions like these for simple tasks. In a lot of cases it's used simply to later on join the resulting data set with a table for matching values. We had an issue with a database in production that was using too much tempdb space, and the heavy use of split functions was one of the contributors.

    As an alternative, consider the following:

    declare @value varchar(10) = 'abc',

    @valueSet varchar(50) = 'abc,def',

    @delimiter char(1) = ','

    if concat(@delimiter,@valueSet,@delimiter) like concat('%',@delimiter,@value,@delimiter,'%')

    select 'yes'

    else

    select 'no'

    Put this into a function, call it say FnIsValueInSet, and then use it in your WHERE clause instead of joining to a temp table produced from the split string.

    Happy coding!

    Do you have a specific example of what the over-used split function code looked like? It would be good to see what the problem you were actually experiencing was. It would also be nice to know what the split function code itself was because not all splitters are created equal. It may have been the method being used to do the splits was the real source of the problem rather than how it was being used.

    As a bit of a sidebar, I'd have to look real close at the use-case for your suggested alternative. Concatenation is a relatively expensive operation as are mid string lookups.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Sk1ppy (4/12/2016)


    Thanks; great post! I did have a question about the "failures" using CROSS APPLY vs OUTER APPLY though. Isn't that the expected behavior of using these functions? It seems like it would be likening it to data in tables and calling an INNER JOIN a failure of a test case when a row isn't returned where there is a NULL or no corresponding record based on the join column in one of the tables. Right?

    I can't speak for Wayne but your question is a good one.

    My personal opinion is that, even though it returns a table-like structure, it's still a function and if you pass NULL to a function (except for things like ISNULL), you usually get a NULL back. With that in mind, I'd like to see the new function do the same (not to mention an extra column to identify the left to right position any given element appears in the string). Of course, my personal opinion is just that and many people will disagree.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (4/13/2016)


    Sk1ppy (4/12/2016)


    Thanks; great post! I did have a question about the "failures" using CROSS APPLY vs OUTER APPLY though. Isn't that the expected behavior of using these functions? It seems like it would be likening it to data in tables and calling an INNER JOIN a failure of a test case when a row isn't returned where there is a NULL or no corresponding record based on the join column in one of the tables. Right?

    I can't speak for Wayne but your question is a good one.

    My personal opinion is that, even though it returns a table-like structure, it's still a function and if you pass NULL to a function (except for things like ISNULL), you usually get a NULL back. With that in mind, I'd like to see the new function do the same

    Thanks for replying Jeff.

    And this is exactly what I feel about it also. You should get something back, since something was passed to it.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Fair enough. Thanks for replying!

  • Great article, Wayne. Thanks for putting it together. I especially liked your comparison of the different approaches and what each one does and does not do. I really hope that Microsoft finishes the function by adding an ordinal to the return result set.

    I must say that I fully agree - a function should return something even if it's NULL.

    Now all I have to do is wait until we get SQL 2016 here at work. I figure I have another 4 or 5 years. :w00t:

Viewing 15 posts - 1 through 15 (of 33 total)

You must be logged in to reply to this topic. Login to reply