SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Exploding delimited string records


Exploding delimited string records

Author
Message
John Hurrell
John Hurrell
Forum Newbie
Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)

Group: General Forum Members
Points: 2 Visits: 11
I'm trying to split a string and relate it to the row from which it came but I'm having some trouble. Given the following:

CREATE TABLE Source
(
Id INT,
Data NVARCHAR(2000)
)
GO

INSERT INTO Source (Data) VALUES (1, 'APPLE, DOG, CAT')
INSERT INTO Source (Data) VALUES (1, 'A,B,C')
GO


I would like to see output like this:

Id Data
--- --------
1 APPLE
1 DOG
1 CAT
2 A
2 B
2 C


There are a number of string split function examples but all of them are called like this:

SELECT * FROM dbo.someStringSplitFunction('String To Split', 'Delimiter')

What I need to do is to be able to feed in the [Source].[Data] column to a function and include the ID in the output to get the above example.

Ideally, I'd like to be able to issue a single SELECT and get back the ID of the record and a row for each split string token.

I just don't know how to do this.
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (89K reputation)SSC Guru (89K reputation)SSC Guru (89K reputation)SSC Guru (89K reputation)SSC Guru (89K reputation)SSC Guru (89K reputation)SSC Guru (89K reputation)SSC Guru (89K reputation)

Group: General Forum Members
Points: 89477 Visits: 41144
John, you happen to have come to the right place at the right time...

There are several methods to do this, but for shear performance, nothing beats the method I'm getting ready to show you. How do I know? I'm about 20 hours into some experiments with all the methods for an article I'm writing. You'll also find a lot of people that have also done similar experiments and the following method is absolutely the fastest and certainly one of the easiest.

The first thing you need is a "Tally" or "Numbers" table... I prefer the word "Tally" just because it sounds cooler and describes a bit about what the table is actually used for. It consists of a single very well index column of sequential numbers that start at 1 and go to some number. Depending on what I anticipate the customer needs to be, I make one that has 11k, 14k, 19k, or even a million rows long. The table doesn't take much room compared to the utility it provides... it has dozens of uses and doing "whole table splits" like you need to is just one of the things that the Tally table does very well.

Just because... I'm going to assume that you're only going to use a max of 8000 characters in your CSV column. I have a method that will handle VARCHAR(MAX), but it's probably overkill, here. Using VARCHAR(MAX) also slows things down... sometimes, a lot!

So, without further digression, here's how to make an 11 k row Tally table...

--===== Create and populate the Tally table on the fly
SELECT TOP 11000 --equates to more than 30 years of dates
IDENTITY(INT,1,1) AS N
INTO dbo.Tally
FROM Master.dbo.SysColumns sc1,
Master.dbo.SysColumns sc2

--===== Add a Primary Key to maximize performance
ALTER TABLE dbo.Tally
ADD CONSTRAINT PK_Tally_N
PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100

--===== Allow the general public to use it
GRANT SELECT ON dbo.Tally TO PUBLIC



Now... let's use your example data that you were kind enough to supply to do a "whole table split" using the Tally table... The key here is that we have to have a primary key on the table, so I've changed the values in your ID column to be UNIQUE.

--======================================================================================================
-- Create and populate a test table. THIS IS NOT PART OF THE SOLUTION!
--======================================================================================================
CREATE TABLE Source
(
ID INT PRIMARY KEY CLUSTERED,
Data NVARCHAR(2000)
)
GO
INSERT INTO Source (ID,Data) VALUES (1, 'APPLE, DOG, CAT')
INSERT INTO Source (ID,Data) VALUES (2, 'A,B,C')
GO

--======================================================================================================
-- Split the whole table's Data column all at once
-- This IS the solution
--======================================================================================================
--===== Declare a variable to hold the delimiter
DECLARE @Delim CHAR(1)
SET @Delim = ','

--===== Do the split with a count
SELECT ID,
DataVal = LTRIM(SUBSTRING(@Delim+s.Data, t.N+1, CHARINDEX(@Delim, s.Data+@Delim, t.N+1)-t.N))
FROM dbo.Tally t
RIGHT OUTER JOIN --Necessary in case SomeCsv is NULL
dbo.Source s
ON SUBSTRING(@Delim+s.Data, t.N, 1) = @Delim
AND t.N < LEN(@Delim+s.Data)
GO



I could explain in detail just how it works... but then there'd be no sense in writting the article. Wink

By the way, this method will split a million rows with an average of 10 items in the CSV, not including the display time, in just 93 seconds.

Please let me know if that works for ya...

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
John Hurrell
John Hurrell
Forum Newbie
Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)

Group: General Forum Members
Points: 2 Visits: 11
Jeff,

I've used a numbers table and a calendar table before so I'm familiar with their uses. I thought of trying to use a numbers table but my mistake was trying to combine it with a UDF.

I'm glad I didn't need the UDF and I've been able to take your example and successfully apply it to my problem.

Thank you very much for the response and the unbelievably fast solution. I was able to split a VARCHAR(255) column in 143,615 rows into 179,700 records in 11 seconds.

- John
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (89K reputation)SSC Guru (89K reputation)SSC Guru (89K reputation)SSC Guru (89K reputation)SSC Guru (89K reputation)SSC Guru (89K reputation)SSC Guru (89K reputation)SSC Guru (89K reputation)

Group: General Forum Members
Points: 89477 Visits: 41144
Very cool... thank you for the stats and the feedback, John.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Giri Duddu
Giri Duddu
SSC-Addicted
SSC-Addicted (495 reputation)SSC-Addicted (495 reputation)SSC-Addicted (495 reputation)SSC-Addicted (495 reputation)SSC-Addicted (495 reputation)SSC-Addicted (495 reputation)SSC-Addicted (495 reputation)SSC-Addicted (495 reputation)

Group: General Forum Members
Points: 495 Visits: 94
if u can send me the result or script of function......

SELECT * FROM dbo.someStringSplitFunction('String To Split', 'Delimiter')

then some way can be sugested.........
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search