Introduction to STRING_SPLIT function in SQL

  • Archana

    Old Hand

    Points: 322

    Comments posted to this topic are about the item Introduction to STRING_SPLIT function in SQL

  • Japie Botma

    SSCrazy

    Points: 2940

    This looks like an overkill to me. It is another help to people to understand STRING_SPLIT, but in your scenario I would use INSTR to get the positions of the dots and the LEFT and SUBSTRING concatenated to return the result. Or am I missing something?

    5ilverFox
    Consulting DBA / Developer
    South Africa

  • Robert-378556

    SSCertifiable

    Points: 5542

    The problem with this solution is that the UDF assumes the same order of rows as the substrings in the input string, while the documentation for string_split says the order is not guaranteed.

  • aklt

    SSC Veteran

    Points: 233

    My own horrible solution (which i think is something along the lines of what Japie Botma had in mind)

    SELECT * INTO #TempWork FROM (
    SELECT 'RCX_M01_SMD 1.0.6' AS AppVersion UNION
    SELECT 'RCX_D00_BackOffice 7.2.1.0' AS AppVersion UNION
    SELECT 'RCX_I04_OLAP 2.0.0.0 [2.0.0.5]' AS AppVersion UNION
    SELECT 'RCX_X02_MOB 10.6.2.3' AS AppVersion
    ) temp
    GO
    with cte as
    (
    SELECT *, CHARINDEX('.', AppVersion, CHARINDEX('.', AppVersion, CHARINDEX('.', AppVersion, 0) + 1) + 1) Pos FROM #TempWork
    ), cte2 as
    (
    select *, case when Pos = 0 then Len(AppVersion) else Pos - 1 end Length from cte
    )
    select *, SUBSTRING(AppVersion, 1, Length) from cte2
    GO
  • Japie Botma

    SSCrazy

    Points: 2940

    Not quite. Something much simpler:

    My Test Code:

    SELECT 'RCX_M01_SMD 1.0.6 ' AS AppVersion INTO #Temp

    INSERT INTO #Temp SELECT 'RCX_D00_BackOffice 7.2.1.0'

    INSERT INTO #Temp SELECT 'RCX_I04_OLAP 2.0.0.0 [2.0.0.5]'

    INSERT INTO #Temp SELECT 'RCX_X02_MOB 10.6.2.3'

    SELECT AppVersion, dbo.fncBudleVersion(AppVersion) AS BundleVersion

    FROM #Temp

    DROP TABLE #Temp

    My Function :

    CREATE FUNCTION dbo.fncBudleVersion

    (

    @AppVersion VARCHAR(MAX)

    )

    RETURNS VARCHAR(MAX)

    AS

    BEGIN

    DECLARE @Pos INT

    SET @Pos = CHARINDEX('.', @AppVersion, 1)

    SET @Pos = CHARINDEX('.', @AppVersion, @Pos + 1)

    SET @Pos = CHARINDEX('.', @AppVersion, @Pos + 1)

    IF @Pos = 0 SET @Pos = LEN(@AppVersion) + 1

    RETURN SUBSTRING(@AppVersion, 1, @Pos - 1)

    END

    5ilverFox
    Consulting DBA / Developer
    South Africa

  • Japie Botma

    SSCrazy

    Points: 2940

    The same as yours, just with a function, because that was what he was using.

    5ilverFox
    Consulting DBA / Developer
    South Africa

  • Xedni

    SSCertifiable

    Points: 6947

    Robert-378556 wrote:

    The problem with this solution is that the UDF assumes the same order of rows as the substrings in the input string, while the documentation for string_split says the order is not guaranteed.

    I had the same thought. Imho the lack of straightforward ability to return the split values sorted makes the function useless in a great many scenarios. Shame...

    Executive Junior Cowboy Developer, Esq.[/url]

  • Jeff Moden

    SSC Guru

    Points: 996851

    First of all, I appreciate it when anyone puts the time and effort into writing and publishing an article, so thank you very much for that.  Keep writing.

    As you've seen in the discussion above, so far, there are a couple of problems that people have cited and a couple they have not.  I realize that some of these aren't even close to the primary thrust of your article but let's check out a couple of options...

    Test Data

    First, we need to duplicate the test data you posted for others to test with.  I threw in a couple of wild geese just in case such things might occur.

    --      DROP TABLE #TempWork
    --===== Create the test table and populate it on-the-fly
    SELECT AppVersion
    INTO #TempWork
    FROM (VALUES
    ('RCX_M01_SMD 1.0.6')
    ,('RCX_D00_BackOffice 7.2.1.0')
    ,('RCX_I04_OLAP 2.0.0.0 [2.0.0.5]')
    ,('RCX_X02_MOB 10.6.2.3')
    ,('RCX_J01_NoRbar 100.10') --Added the following rows just to be sure
    ,('RCX_J02_NoRbar 99')
    ,('RCX_J03_NoRbar')
    ) v (AppVersion)
    ;
    GO

    Should have been an "iSF"

    In that latter category, you've built a Scalar UDF, which don't perform well for a lot of reasons we'll not cover in this post.  What you should have done is to build it as an iSF (Inline Scalar Function) and, no... I'm not talking about 2019.  In this case, an iSF is an iTVF (Inline Table Valued Function) that returns a scalar value.  Please see the following article for more about this particular subject.

    https://www.sqlservercentral.com/articles/how-to-make-scalar-udfs-run-faster-sql-spackle

    Here's what the code for that rendition would look like along with an example call.  Please DO observe the warning, which other people have also warned about.  THIS CODE IS FOR DEMONSTRATION PURPOSES ONLY!!!

    /***WARNING***WARNING***WARNING***WARNING***WARNING***WARNING***WARNING***
    DO NOT USE THIS CODE IN PRODUCTION BECAUSE STRING_SPLIT IS NOT GUARANTEED
    TO RETURN DATA ELEMENTS IN THE SAME ORDER AS THEY APPEAR IN THE STRING!!!
    ***WARNING***WARNING***WARNING***WARNING***WARNING***WARNING***WARNING***/
    CREATE FUNCTION dbo.udf_GetBundleVersionUsingStringSplit_JBM01
    (
    @AppVersion VARCHAR(8000)
    )
    RETURNS TABLE WITH SCHEMABINDING AS
    RETURN WITH
    cte_StringSplit AS
    (--==== Split the string keep only the "first" 3 elements of the split
    SELECT TOP 3
    StringSplitValue = split.[value]
    FROM STRING_SPLIT (@AppVersion,'.') split
    )--==== Concatenate the pieces back together as a single string
    SELECT BundleVersion = STRING_AGG(StringSplitValue,'.')
    FROM cte_StringSplit
    ;
    GO
    --===== Demonstrate the usage of the function
    SELECT tw.AppVersion, ca.BundleVersion
    FROM #TempWork tw
    CROSS APPLY dbo.udf_GetBundleVersionUsingStringSplit_JBM01(tw.AppVersion) ca
    ;

    Interesting Code

    I liked the code that @ aklt posted.  He used cCTEs (Cascading CTEs ) to DRY out the code.  It, too, could be used in an iSF for reuse.

    It IS a real shame...

    It IS a real shame that Microsoft stuffed up what would have been a great splitter function by not including the ordinal position of the elements returned in the STRING_SPLIT function.  If they did, the code would be super simple and you wouldn't even need to think about creating a function for the problem you've portrayed.  I'm using the DelimitedSplit8K function to substitute as a better splitter that does return ordinal positions of the split out elements just to show what "could be".  Notice the "sort" being done by the STRING_AGG()...

     SELECT  tw.AppVersion
    ,BundleVersion = STRING_AGG(split.Item,'.') WITHIN GROUP (ORDER BY split.ItemNumber ASC)
    FROM #TempWork tw
    CROSS APPLY dbo.DelimitedSplit8K(AppVersion,'.') split --This takes the place of STRING_SPLIT()
    WHERE split.ItemNumber <=3
    GROUP BY tw.AppVersion
    ;

    A recommendation

    I recommend that you replace your current code with what @ aklt posted because it IS guaranteed to return everything in the correct order where yours is not.  You can convert his code to an iSF for reasons of performance, as well.

    Again, nice job on the article.  Keep writing.

    p.s.

    You can get the DelimitedSplit8K function at the bottom of the following article in resources section.  It works in all versions of SQL Server from 2005 and up.

    https://www.sqlservercentral.com/articles/tally-oh-an-improved-sql-8k-%e2%80%9ccsv-splitter%e2%80%9d-function

    If you have 2012 or up, a good friend of mine (Eirikur Eiriksson) made even that twice as fast by using LEAD, which first appeared in 2012.  His rendition of the function is located in the first section of the following article...

    https://www.sqlservercentral.com/articles/reaping-the-benefits-of-the-window-functions-in-t-sql-2

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "If "pre-optimization" is the root of all evil, then what does the resulting no optimization lead to?"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Archana

    Old Hand

    Points: 322

    Thank you, everyone, for your comments. I'm really inspired by all the discussions that are prevailing.

    @jeff-moden, Thank you for the detailed explanation. It really helped me understand the gap.

  • Jeff Moden

    SSC Guru

    Points: 996851

    You're welcome and thank you for the feedback.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "If "pre-optimization" is the root of all evil, then what does the resulting no optimization lead to?"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 10 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply