spliting a varchar(max) into multiples

  • Hi,

    I have a need to split a varchar column into multiple columns based on a delimiter.

    Hi have a variable named @a varchar(max)

    it contains value similar to below

    select @a = '1,a,b,20080101|2,c,d,20080102|3,e,f,20080102'

    like i need to insert it into a table

    with columns( a1 int, a2 char, a3 char, a4 datetime)

    Is it worth writing a function with user defined table valued parameter or any other thoughts?

    I would be happy to receive codes for accomplishing this?

    Kindest Regards,
    Paarthasarathy
    Microsoft Certified Technology Specialist
    http://paarthasarathyk.blogspot.com

  • Easily possible. Try "DelimitSplit8K Jeff Moden" in the site's search, you will find an excellent solution for your problem.

  • Hello,

    Please try the following script

    Within the script, you can find the source of the split function at Split String using XML - How to Convert or Split a Delimited String Values to Rows using T-SQL XML Commands

    Besides split code, CROSS APPLY is also key here in this solution.

    Also grouping and taking MAX is the trick.

    create table splittedtable (

    column1 int,

    column2 varchar(5),

    column3 varchar(5),

    column4 date

    )

    GO;

    declare @a varchar(max) = '1,a,b,20080101|2,c,d,20080102|3,e,f,20080102';

    with cte as (

    select

    s.id,

    column1 = case when t.id = 1 then t.val else null end,

    column2 = case when t.id = 2 then t.val else null end,

    column3 = case when t.id = 3 then t.val else null end,

    column4 = case when t.id = 4 then t.val else null end

    from dbo.SPLIT(@a,'|') s

    cross apply dbo.SPLIT(s.val,',') t

    )

    insert into splittedtable

    select

    column1 = MAX(column1),

    column2 = MAX(column2),

    column3 = MAX(column3),

    column4 = MAX(column4)

    from cte

    group by id

    select * from splittedtable

    Hoping to be helpful,

  • Eralper (7/20/2010)


    ...Split String using XML - How to Convert or Split a Delimited String Values to Rows using T-SQL XML Commands...

    You have the right basic idea there, but using XML is not a good way to split strings in SQL Server.

    There are some performance comparisons here

    The script ColdCoffee mentioned earlier is efficient and appropriate for many situations.

    For overall fastest performance, particularly on long strings and big sets of data, use a SQLCLR solution, such as this one.

  • Hi Paul,

    I remember we have a discussion on splitting function, do you remember too 🙂

    In fact I could not measure how fast my split functions are.

    I have also coded a recursive cte sql split function too. I guess it will not perform very good too. But I enjoy solving a problem using different methods.

    By the way thanks for the link. I must say that I did not surprise to see the CLR is performing very well. Because splitting is not actually a database action. That is programming. And I do not expect sql engine to be very fast at row based actions, etc. But I expect it to be good at table level query commands.

  • Paarthasarathy (7/20/2010)


    Hi,

    I have a need to split a varchar column into multiple columns based on a delimiter.

    Hi have a variable named @a varchar(max)

    it contains value similar to below

    select @a = '1,a,b,20080101|2,c,d,20080102|3,e,f,20080102'

    like i need to insert it into a table

    with columns( a1 int, a2 char, a3 char, a4 datetime)

    Is it worth writing a function with user defined table valued parameter or any other thoughts?

    I would be happy to receive codes for accomplishing this?

    Please see the following "how to" article for a T-SQL only solution...

    http://www.sqlservercentral.com/articles/T-SQL/63003/

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Eralper (7/20/2010)


    I remember we have a discussion on splitting function, do you remember too 🙂

    In fact I could not measure how fast my split functions are.

    I can't say I do remember it specifically (sorry) but it is a question that comes up a lot, so I do lose track.

    The performance differences may be minimal for a small number of rows (depending on the size of the rows, of course!) but the picture becomes very clear as we give the splitting more work to do. The XML method is the slowest of all the techniques I know, and it also bothers me because it is a misuse of the facility, and may fail for some strings due to illegal XML characters (as far as I remember, anyway).

    I have also coded a recursive cte sql split function too. I guess it will not perform very good too. But I enjoy solving a problem using different methods.

    Yes the recursive idea is nicer, at least from a theoretical point of view, but it does not perform well, as you say. This is mostly down to the SQL Server engine's implementation of recursion.

    By the way thanks for the link. I must say that I did not surprise to see the CLR is performing very well. Because splitting is not actually a database action. That is programming. And I do not expect sql engine to be very fast at row based actions, etc. But I expect it to be good at table level query commands.

    I agree with most of that. Set-based solutions will often out-perform even a good SQLCLR solution, but it's a question of picking the right tool for the job at hand.

    Paul

  • Jeff Moden (7/20/2010)


    Please see the following "how to" article for a T-SQL only solution...

    http://www.sqlservercentral.com/articles/T-SQL/63003/

    Yes, that's one of the top articles on the whole site. For some reason, I can never remember the title when I think about searching for it. That's my excuse for not posting it previously, and I am sticking to it 🙂

  • Paul White NZ (7/20/2010)


    Jeff Moden (7/20/2010)


    Please see the following "how to" article for a T-SQL only solution...

    http://www.sqlservercentral.com/articles/T-SQL/63003/

    Yes, that's one of the top articles on the whole site. For some reason, I can never remember the title when I think about searching for it. That's my excuse for not posting it previously, and I am sticking to it 🙂

    😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Since we have a known number of columns, we could technically split the data via a split function and then use PARSENAME to split the other data for us.

    It makes the code clean and is pretty straight forward. Of course this only holds water, if there is a set number of columns and it is less than 4.

    USE [tempdb]

    GO

    IF OBJECT_ID('dbo.fn_split') IS NOT NULL

    DROP FUNCTION dbo.fn_split;

    GO

    CREATE FUNCTION dbo.fn_split(@arr AS NVARCHAR(MAX), @sep AS NCHAR(1))

    RETURNS TABLE

    AS

    RETURN

    WITH

    L0 AS (SELECT 1 AS C UNION ALL SELECT 1) --2 rows

    ,L1 AS (SELECT 1 AS C FROM L0 AS A, L0 AS B) --4 rows (2x2)

    ,L2 AS (SELECT 1 AS C FROM L1 AS A, L1 AS B) --16 rows (4x4)

    ,L3 AS (SELECT 1 AS C FROM L2 AS A, L2 AS B) --256 rows (16x16)

    ,L4 AS (SELECT 1 AS C FROM L3 AS A, L3 AS B) --65536 rows (256x256)

    ,L5 AS (SELECT 1 AS C FROM L4 AS A, L4 AS B) --4,294,967,296 rows (65536x65536)

    ,Numbers AS (SELECT row_number() OVER (ORDER BY (SELECT 0)) AS N FROM L5)

    SELECT

    (n - 1) - LEN(REPLACE(LEFT(@arr, n-1), @sep, N'')) + 1 AS pos,

    LTRIM(SUBSTRING(@arr, n, CHARINDEX(@sep, @arr + @sep, n) - n)) AS element

    FROM Numbers

    WHERE n <= LEN(@arr) + 1

    AND SUBSTRING(@sep + @arr, n, 1) = @sep

    AND Numbers.n <= 1000 -- make sure to change to an appropriate value

    GO

    DECLARE @var VARCHAR(MAX)

    SET @var ='1,a,b,20080101|2,c,d,20080102|3,e,f,20080102'

    SELECT

    PARSENAME(REPLACE(element,',','.'),4) AS Col1,

    PARSENAME(REPLACE(element,',','.'),3) AS Col2,

    PARSENAME(REPLACE(element,',','.'),2) AS Col3,

    PARSENAME(REPLACE(element,',','.'),1) AS Col4

    FROM dbo.fn_split(@var,'|')

  • Adam Haines (7/21/2010)


    Since we have a known number of columns, we could technically split the data via a split function and then use PARSENAME to split the other data for us...

    Yes, PARSENAME is very handy for that sort of thing (many people use it to split IPv4 addresses too).

    The only thing I have against PARSENAME, is that it is non-deterministic (for reasons I don't understand).

    It's not a huge problem, but I do try to avoid non-deterministic functions where a deterministic alternative exists, since the query optimiser has many fewer choices for plan exploration and re-ordering when it encounters a non-deterministic function.

    Still, a neat way to perform the multi-level splitting 🙂

  • In fact this particular example can be addressed by a combination of PARSENAME and string manipulation to resolve the column list

    DECLARE @ReplacePipe VARCHAR(MAX)

    SET @ReplacePipe = REPLACE('1,a,b,20080101|2,c,d,20080102|3,e,f,20080102', '|', '.')

    ;WITH cte (SplitColumn) AS

    (

    SELECT PARSENAME(@ReplacePipe, 3)

    UNION ALL SELECT PARSENAME(@ReplacePipe, 2)

    UNION ALL SELECT PARSENAME(@ReplacePipe, 1)

    )

    SELECT

    LEFT(SplitColumn, CHARINDEX(',', SplitColumn) - 1) AS a1,

    SUBSTRING(SplitColumn, CHARINDEX(',', SplitColumn) + 1, CHARINDEX(',', SplitColumn, CHARINDEX(',', SplitColumn) + 1) - CHARINDEX(',', SplitColumn) - 1) AS a2,

    REVERSE(SUBSTRING(REVERSE(SplitColumn), CHARINDEX(',', REVERSE(SplitColumn)) + 1, CHARINDEX(',', REVERSE(SplitColumn), CHARINDEX(',', REVERSE(SplitColumn)) + 1) - CHARINDEX(',', REVERSE(SplitColumn)) - 1)) AS a3,

    REVERSE(LEFT(REVERSE(SplitColumn), CHARINDEX(',', REVERSE(SplitColumn)) - 1)) AS a4

    FROM cte

    or indeed without using PARSENAME at all

    DECLARE @a VARCHAR(MAX)

    SET @a = '1,a,b,20080101|2,c,d,20080102|3,e,f,20080102'

    ;WITH cte (SplitColumn) AS

    (

    SELECT LEFT(@a, CHARINDEX('|', @a) - 1)

    UNION ALL SELECT SUBSTRING(@a, CHARINDEX('|', @a) + 1, CHARINDEX('|', @a, CHARINDEX('|', @a) + 1) - CHARINDEX('|', @a) - 1)

    UNION ALL SELECT REVERSE(LEFT(REVERSE(@a), CHARINDEX('|', REVERSE(@a)) - 1))

    )

    SELECT

    LEFT(SplitColumn, CHARINDEX(',', SplitColumn) - 1) AS a1,

    SUBSTRING(SplitColumn, CHARINDEX(',', SplitColumn) + 1, CHARINDEX(',', SplitColumn, CHARINDEX(',', SplitColumn) + 1) - CHARINDEX(',', SplitColumn) - 1) AS a2,

    REVERSE(SUBSTRING(REVERSE(SplitColumn), CHARINDEX(',', REVERSE(SplitColumn)) + 1, CHARINDEX(',', REVERSE(SplitColumn), CHARINDEX(',', REVERSE(SplitColumn)) + 1) - CHARINDEX(',', REVERSE(SplitColumn)) - 1)) AS a3,

    REVERSE(LEFT(REVERSE(SplitColumn), CHARINDEX(',', REVERSE(SplitColumn)) - 1)) AS a4

    FROM cte

  • All those REVERSE's are likely to make that a pretty expensive proposition. Have you compared it on a a wide set of parameters?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Not yet Jeff, it's just that I'm so used to doing tally table/cross apply splits that I thought I'd try something different for a change!;-)

  • Seems to work reasonably well even for 2 dimensional splits. Using the general approach presented below, I have experimented with much wider parameters and increased number of rows. For example, a 5000 wide parameter and 1000 rows in #Hold takes about 40 seconds to return. The same query using VARCHAR(8000) instead of VARCHAR(MAX) is 5X quicker.

    If the requirement is merely to split the parameter itself, then simply replace GO 100 with GO in the below script. The split takes less than a second for a 20,000 wide parameter

    IF NOT OBJECT_ID('tempdb.dbo.#Hold', 'U') IS NULL DROP TABLE #Hold

    CREATE TABLE #Hold (BulkColumn VARCHAR(MAX))

    GO

    DECLARE @a VARCHAR(MAX) = '1,a,b,20080101|2,c,d,20080102|3,e,f,20080102|4,g,h,20080101|5,i,j,20080102|6,k,l,20080102|7,m,n,20080101|8,o,p,20080102|9,q,r,20080102|10,s,t,20080101|11,u,v,20080102|12,w,x,20080102'

    INSERT #Hold SELECT @a

    GO 100

    SELECT a1, a2, a3, a4 FROM #Hold

    CROSS APPLY

    (

    SELECT

    LEFT(SplitColumn, CHARINDEX(',', SplitColumn) - 1),

    SUBSTRING(SplitColumn, CHARINDEX(',', SplitColumn) + 1, CHARINDEX(',', SplitColumn, CHARINDEX(',', SplitColumn) + 1) - CHARINDEX(',', SplitColumn) - 1),

    REVERSE(SUBSTRING(REVERSE(SplitColumn), CHARINDEX(',', REVERSE(SplitColumn)) + 1, CHARINDEX(',', REVERSE(SplitColumn), CHARINDEX(',', REVERSE(SplitColumn)) + 1) - CHARINDEX(',', REVERSE(SplitColumn)) - 1)),

    REVERSE(LEFT(REVERSE(SplitColumn), CHARINDEX(',', REVERSE(SplitColumn)) - 1))

    FROM (

    SELECT SUBSTRING(BulkColumn + '|', N, CHARINDEX('|', BulkColumn + '|', N) - N)

    FROM master.dbo.tally

    WHERE N < LEN(BulkColumn) + 2 AND SUBSTRING('|' + BulkColumn + '|', N, 1) = '|'

    ) AS X (SplitColumn)

    ) AS Y (a1, a2, a3, a4)

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply