Exploding delimited string records

  • I'm trying to split a string and relate it to the row from which it came but I'm having some trouble. Given the following:

    [font="Courier New"]CREATE TABLE Source

    (

    Id INT,

    Data NVARCHAR(2000)

    )

    GO

    INSERT INTO Source (Data) VALUES (1, 'APPLE, DOG, CAT')

    INSERT INTO Source (Data) VALUES (1, 'A,B,C')

    GO[/font]

    I would like to see output like this:

    [font="Courier New"]Id Data

    --- --------

    1 APPLE

    1 DOG

    1 CAT

    2 A

    2 B

    2 C[/font]

    There are a number of string split function examples but all of them are called like this:

    [font="Courier New"]SELECT * FROM dbo.someStringSplitFunction('String To Split', 'Delimiter')[/font]

    What I need to do is to be able to feed in the [Source].[Data] column to a function and include the ID in the output to get the above example.

    Ideally, I'd like to be able to issue a single SELECT and get back the ID of the record and a row for each split string token.

    I just don't know how to do this.

  • John, you happen to have come to the right place at the right time...

    There are several methods to do this, but for shear performance, nothing beats the method I'm getting ready to show you. How do I know? I'm about 20 hours into some experiments with all the methods for an article I'm writing. You'll also find a lot of people that have also done similar experiments and the following method is absolutely the fastest and certainly one of the easiest.

    The first thing you need is a "Tally" or "Numbers" table... I prefer the word "Tally" just because it sounds cooler and describes a bit about what the table is actually used for. It consists of a single very well index column of sequential numbers that start at 1 and go to some number. Depending on what I anticipate the customer needs to be, I make one that has 11k, 14k, 19k, or even a million rows long. The table doesn't take much room compared to the utility it provides... it has dozens of uses and doing "whole table splits" like you need to is just one of the things that the Tally table does very well.

    Just because... I'm going to assume that you're only going to use a max of 8000 characters in your CSV column. I have a method that will handle VARCHAR(MAX), but it's probably overkill, here. Using VARCHAR(MAX) also slows things down... sometimes, a lot!

    So, without further digression, here's how to make an 11 k row Tally table...

    --===== Create and populate the Tally table on the fly

    SELECT TOP 11000 --equates to more than 30 years of dates

    IDENTITY(INT,1,1) AS N

    INTO dbo.Tally

    FROM Master.dbo.SysColumns sc1,

    Master.dbo.SysColumns sc2

    --===== Add a Primary Key to maximize performance

    ALTER TABLE dbo.Tally

    ADD CONSTRAINT PK_Tally_N

    PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100

    --===== Allow the general public to use it

    GRANT SELECT ON dbo.Tally TO PUBLIC

    Now... let's use your example data that you were kind enough to supply to do a "whole table split" using the Tally table... The key here is that we have to have a primary key on the table, so I've changed the values in your ID column to be UNIQUE.

    --======================================================================================================

    -- Create and populate a test table. THIS IS NOT PART OF THE SOLUTION!

    --======================================================================================================

    CREATE TABLE Source

    (

    ID INT PRIMARY KEY CLUSTERED,

    Data NVARCHAR(2000)

    )

    GO

    INSERT INTO Source (ID,Data) VALUES (1, 'APPLE, DOG, CAT')

    INSERT INTO Source (ID,Data) VALUES (2, 'A,B,C')

    GO

    --======================================================================================================

    -- Split the whole table's Data column all at once

    -- This IS the solution

    --======================================================================================================

    --===== Declare a variable to hold the delimiter

    DECLARE @Delim CHAR(1)

    SET @Delim = ','

    --===== Do the split with a count

    SELECT ID,

    DataVal = LTRIM(SUBSTRING(@Delim+s.Data, t.N+1, CHARINDEX(@Delim, s.Data+@Delim, t.N+1)-t.N))

    FROM dbo.Tally t

    RIGHT OUTER JOIN --Necessary in case SomeCsv is NULL

    dbo.Source s

    ON SUBSTRING(@Delim+s.Data, t.N, 1) = @Delim

    AND t.N < LEN(@Delim+s.Data)

    GO

    I could explain in detail just how it works... but then there'd be no sense in writting the article. 😉

    By the way, this method will split a million rows with an average of 10 items in the CSV, not including the display time, in just 93 seconds.

    Please let me know if that works for ya...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff,

    I've used a numbers table and a calendar table before so I'm familiar with their uses. I thought of trying to use a numbers table but my mistake was trying to combine it with a UDF.

    I'm glad I didn't need the UDF and I've been able to take your example and successfully apply it to my problem.

    Thank you very much for the response and the unbelievably fast solution. I was able to split a VARCHAR(255) column in 143,615 rows into 179,700 records in 11 seconds.

    - John

  • Very cool... thank you for the stats and the feedback, John.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • if u can send me the result or script of function......

    SELECT * FROM dbo.someStringSplitFunction('String To Split', 'Delimiter')

    then some way can be sugested.........

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply