Alternative to splitting text

  • Hello,

    I am in need of some assistance. I have a column that have values that look like this:

    111#222#33333#44#555#66

    The length of the values can change. I am going to insert each set of values separated by # into columns.

    For example: 111 into COL1, 222 into COl2, 33333 into COL3.

    I have already wrote the code to do this, but it is a bit tedious and confusing. I am using SUBSTRING/CHARINDEX method. Is there another method to do this?

  • You may want to check this article by Aaron Bertrand. He explains various methods to split values in a string separated by delimiter such as #.

    http://sqlperformance.com/2012/07/t-sql-queries/split-strings

    Also check out this article by Ole Michelsen >> https://ole.michelsen.dk/blog/split-string-to-table-using-transact-sql.html

    Thanks

  • imba (1/13/2016)


    Hello,

    I am in need of some assistance. I have a column that have values that look like this:

    111#222#33333#44#555#66

    The length of the values can change. I am going to insert each set of values separated by # into columns.

    For example: 111 into COL1, 222 into COl2, 33333 into COL3.

    I have already wrote the code to do this, but it is a bit tedious and confusing. I am using SUBSTRING/CHARINDEX method. Is there another method to do this?

    Quick suggestion, use the DelimitedSplit8K[/url] function

    😎

  • sqlquaker (1/13/2016)


    You may want to check this article by Aaron Bertrand. He explains various methods to split values in a string separated by delimiter such as #.

    http://sqlperformance.com/2012/07/t-sql-queries/split-strings

    Also check out this article by Ole Michelsen >> https://ole.michelsen.dk/blog/split-string-to-table-using-transact-sql.html

    Thanks

    The problem with that article is that the test data Aaron used will lead you down the primrose path as to which T-SQL method is the fastest (I've almost finished an article on this type of serious mistake). If you can't use a properly written SQLCLR (and very few are actually written correctly) for the split, then I strongly recommend that you use either DelimitedSplit8K (for up to VARCHAR(8000)) or DelimitedSplitN4K (for up to NVARCHAR(4000)).

    You can get both functions from the following article.

    http://www.sqlservercentral.com/articles/Tally+Table/72993/

    I've not checked Ole's function for performance but, if you decide to, don't make the same mistake with the test data that Aaron made.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • sqlquaker (1/13/2016)


    You may want to check this article by Aaron Bertrand. He explains various methods to split values in a string separated by delimiter such as #.

    http://sqlperformance.com/2012/07/t-sql-queries/split-strings

    Also check out this article by Ole Michelsen >> https://ole.michelsen.dk/blog/split-string-to-table-using-transact-sql.html

    Thanks

    Caution here, Ole Michelsen's splitter is a recursive cte type splitter and unfortunately it's neither the fastest nor the most reliable/robust as it will fail on some extended characters, example below.

    😎

    /*

    Simple splitter test harness which splits all sys.messages.text on space

    */

    DECLARE @timer TABLE (T_TEXT VARCHAR(50) NOT NULL,T_TS DATETIME2(7) NOT NULL DEFAULT (SYSDATETIME()));

    DECLARE @NVARCHAR_BUCKET NVARCHAR(4000) = N'';

    INSERT INTO @timer(T_TEXT) VALUES ('DRY RUN');

    SELECT

    @NVARCHAR_BUCKET = SM.text

    FROM sys.messages SM;

    INSERT INTO @timer(T_TEXT) VALUES ('DRY RUN');

    BEGIN TRY

    INSERT INTO @timer(T_TEXT) VALUES ('OMSplit');

    SELECT

    @NVARCHAR_BUCKET = X.Data

    FROM sys.messages SM

    CROSS APPLY dbo.OMSplit(SM.text,NCHAR(32)) AS X;

    INSERT INTO @timer(T_TEXT) VALUES ('OMSplit');

    END TRY

    BEGIN CATCH

    INSERT INTO @timer(T_TEXT) VALUES ('OMSplit FAILED!');

    END CATCH

    INSERT INTO @timer(T_TEXT) VALUES ('DelimitedSplitN4K');

    SELECT

    @NVARCHAR_BUCKET = X.Item

    FROM sys.messages SM

    CROSS APPLY dbo.DelimitedSplitN4K(SM.text,NCHAR(32)) AS X;

    INSERT INTO @timer(T_TEXT) VALUES ('DelimitedSplitN4K');

    SELECT

    T.T_TEXT

    ,DATEDIFF(MICROSECOND,MIN(T.T_TS),MAX(T.T_TS)) AS DURATION

    FROM @timer T

    GROUP BY T.T_TEXT

    ORDER BY DURATION;

  • Once you get the splitter, you just need to pivot the data.

    CREATE TABLE #SplitTest( String varchar( 8000));

    INSERT INTO #SplitTest VALUES( '111#222#33333#44#555#66');

    SELECT MAX( CASE WHEN ItemNumber = 1 THEN Item END) AS Col1

    ,MAX( CASE WHEN ItemNumber = 2 THEN Item END) AS Col2

    ,MAX( CASE WHEN ItemNumber = 3 THEN Item END) AS Col3

    ,MAX( CASE WHEN ItemNumber = 4 THEN Item END) AS Col4

    ,MAX( CASE WHEN ItemNumber = 5 THEN Item END) AS Col5

    ,MAX( CASE WHEN ItemNumber = 6 THEN Item END) AS Col6

    FROM #SplitTest

    CROSS APPLY TEST.dbo.DelimitedSplit8k( String, '#') s;

    GO

    DROP TABLE #SplitTest;

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply