Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

Exploding delimited string records Expand / Collapse
Author
Message
Posted Thursday, March 6, 2008 3:05 PM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Thursday, June 2, 2011 9:24 AM
Points: 2, Visits: 11
I'm trying to split a string and relate it to the row from which it came but I'm having some trouble. Given the following:

CREATE TABLE Source
(
Id INT,
Data NVARCHAR(2000)
)
GO

INSERT INTO Source (Data) VALUES (1, 'APPLE, DOG, CAT')
INSERT INTO Source (Data) VALUES (1, 'A,B,C')
GO


I would like to see output like this:

Id Data
--- --------
1 APPLE
1 DOG
1 CAT
2 A
2 B
2 C


There are a number of string split function examples but all of them are called like this:

SELECT * FROM dbo.someStringSplitFunction('String To Split', 'Delimiter')

What I need to do is to be able to feed in the [Source].[Data] column to a function and include the ID in the output to get the above example.

Ideally, I'd like to be able to issue a single SELECT and get back the ID of the record and a row for each split string token.

I just don't know how to do this.
Post #465522
Posted Thursday, March 6, 2008 3:57 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Yesterday @ 10:33 PM
Points: 35,259, Visits: 31,740
John, you happen to have come to the right place at the right time...

There are several methods to do this, but for shear performance, nothing beats the method I'm getting ready to show you. How do I know? I'm about 20 hours into some experiments with all the methods for an article I'm writing. You'll also find a lot of people that have also done similar experiments and the following method is absolutely the fastest and certainly one of the easiest.

The first thing you need is a "Tally" or "Numbers" table... I prefer the word "Tally" just because it sounds cooler and describes a bit about what the table is actually used for. It consists of a single very well index column of sequential numbers that start at 1 and go to some number. Depending on what I anticipate the customer needs to be, I make one that has 11k, 14k, 19k, or even a million rows long. The table doesn't take much room compared to the utility it provides... it has dozens of uses and doing "whole table splits" like you need to is just one of the things that the Tally table does very well.

Just because... I'm going to assume that you're only going to use a max of 8000 characters in your CSV column. I have a method that will handle VARCHAR(MAX), but it's probably overkill, here. Using VARCHAR(MAX) also slows things down... sometimes, a lot!

So, without further digression, here's how to make an 11 k row Tally table...

--===== Create and populate the Tally table on the fly
SELECT TOP 11000 --equates to more than 30 years of dates
IDENTITY(INT,1,1) AS N
INTO dbo.Tally
FROM Master.dbo.SysColumns sc1,
Master.dbo.SysColumns sc2

--===== Add a Primary Key to maximize performance
ALTER TABLE dbo.Tally
ADD CONSTRAINT PK_Tally_N
PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100

--===== Allow the general public to use it
GRANT SELECT ON dbo.Tally TO PUBLIC

Now... let's use your example data that you were kind enough to supply to do a "whole table split" using the Tally table... The key here is that we have to have a primary key on the table, so I've changed the values in your ID column to be UNIQUE.

--======================================================================================================
-- Create and populate a test table. THIS IS NOT PART OF THE SOLUTION!
--======================================================================================================
CREATE TABLE Source
(
ID INT PRIMARY KEY CLUSTERED,
Data NVARCHAR(2000)
)
GO
INSERT INTO Source (ID,Data) VALUES (1, 'APPLE, DOG, CAT')
INSERT INTO Source (ID,Data) VALUES (2, 'A,B,C')
GO

--======================================================================================================
-- Split the whole table's Data column all at once
-- This IS the solution
--======================================================================================================
--===== Declare a variable to hold the delimiter
DECLARE @Delim CHAR(1)
SET @Delim = ','

--===== Do the split with a count
SELECT ID,
DataVal = LTRIM(SUBSTRING(@Delim+s.Data, t.N+1, CHARINDEX(@Delim, s.Data+@Delim, t.N+1)-t.N))
FROM dbo.Tally t
RIGHT OUTER JOIN --Necessary in case SomeCsv is NULL
dbo.Source s
ON SUBSTRING(@Delim+s.Data, t.N, 1) = @Delim
AND t.N < LEN(@Delim+s.Data)
GO

I could explain in detail just how it works... but then there'd be no sense in writting the article. ;)

By the way, this method will split a million rows with an average of 10 items in the CSV, not including the display time, in just 93 seconds.

Please let me know if that works for ya...


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #465545
Posted Thursday, March 6, 2008 7:39 PM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Thursday, June 2, 2011 9:24 AM
Points: 2, Visits: 11
Jeff,

I've used a numbers table and a calendar table before so I'm familiar with their uses. I thought of trying to use a numbers table but my mistake was trying to combine it with a UDF.

I'm glad I didn't need the UDF and I've been able to take your example and successfully apply it to my problem.

Thank you very much for the response and the unbelievably fast solution. I was able to split a VARCHAR(255) column in 143,615 rows into 179,700 records in 11 seconds.

- John
Post #465605
Posted Thursday, March 6, 2008 8:48 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Yesterday @ 10:33 PM
Points: 35,259, Visits: 31,740
Very cool... thank you for the stats and the feedback, John.

--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #465618
Posted Thursday, March 6, 2008 8:58 PM
SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Monday, February 4, 2013 7:02 AM
Points: 465, Visits: 94
if u can send me the result or script of function......

SELECT * FROM dbo.someStringSplitFunction('String To Split', 'Delimiter')

then some way can be sugested.........

Post #465622
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse