
Convert CSV values in three columns to rows
Posted Sunday, June 27, 2010 12:36 PM
mishaluba
Hello,

I have a table with three columns containing CSV values. I need to normalize them. The order of the CSV values is meaningful (explained in more detail below). Here is the sample DDL:

CREATE TABLE #t (id int, kpi1 varchar(16), kpi2 varchar(16), kpi3 varchar(16))

insert into #t (id, kpi1, kpi2, kpi3)
values(1, '0.1,0.11,0.111', '0.2,0.22,0.222', '0.3,0.33,0.333')


insert into #t (id, kpi1, kpi2, kpi3)
values(2, '0.4,0.44,0.444', '0.5,0.55,0.555', '0.6,0.66,0.666')

select *
from #t

I do have a function which converts CSV values into rows (I think I got it from this site). Here is the definition:

CREATE FUNCTION dbo.fnETL_Split_Set_CSV_To_Table (
    @id int,
    @String varchar(8000),
    @Delimiter char(1)
)
returns @temptable TABLE (id int, items varchar(128))
as
begin
    declare @idx int
    declare @slice varchar(8000)

    select @idx = 1
    if len(@String) < 1 or @String is null return

    while @idx != 0
    begin
        set @idx = charindex(@Delimiter, @String)
        if @idx != 0
            set @slice = left(@String, @idx - 1)
        else
            set @slice = @String

        if len(@slice) > 0
            insert into @temptable (id, items) values (@id, @slice)

        set @String = right(@String, len(@String) - @idx)
        if len(@String) = 0 break
    end
    return
end

I tried using CROSS APPLY first and it works fine for one column:

select t.id, t.kpi1, t.kpi2, t.kpi3, ca1.items as items1 
from
#t as t
cross apply
[dbo].[fnETL_Split_Set_CSV_To_Table] (t.id, t.kpi1, ',') as ca1

However, as soon as I try to add a second column to the mix, I get a cross-join, which is incorrect:


select t.id, t.kpi1, t.kpi2, t.kpi3, ca1.items as items1 , ca2.items as items2
from
#t as t
cross apply
[dbo].[fnETL_Split_Set_CSV_To_Table] (t.id, t.kpi1, ',') as ca1
cross apply
[dbo].[fnETL_Split_Set_CSV_To_Table] (t.id, t.kpi2, ',') as ca2

My desired result would look something like this:

id kpi1 kpi2 kpi3 items1 items2 items3
----------- ---------------- ---------------- ---------------- ---------- ------ -----
1 0.1,0.11,0.111 0.2,0.22,0.222 0.3,0.33,0.333 0.1 0.2 0.3
1 0.1,0.11,0.111 0.2,0.22,0.222 0.3,0.33,0.333 0.11 0.22 0.33
1 0.1,0.11,0.111 0.2,0.22,0.222 0.3,0.33,0.333 0.111 0.222 0.333
2 0.4,0.44,0.444 0.5,0.55,0.555 0.6,0.66,0.666 0.4 0.5 0.6
2 0.4,0.44,0.444 0.5,0.55,0.555 0.6,0.66,0.666 0.44 0.55 0.66
2 0.4,0.44,0.444 0.5,0.55,0.555 0.6,0.66,0.666 0.444 0.555 0.666


I hope this is clear. Basically, I want to split the CSV values in each of the columns without multiplying the number of records every time I do a split; the split values should simply be appended to the same id. Once again, the order of the CSV values is meaningful for the subsequent manipulations. For example, 0.1, 0.2, 0.3 (the first values from kpi1, kpi2, kpi3) form a meaningful set of values, which is why I want them in the first row. The next row contains the second value from each of the kpis: 0.11, 0.22, 0.33.

Since I do have an id in my source table and the function returns this id, I thought I could do a JOIN instead of CROSS APPLY, but this gives me a syntax error:

select      t.id, t.kpi1, t.kpi2, t.kpi3, ca1.items as items1
from
#t as t
left join
[dbo].[fnETL_Split_Set_CSV_To_Table] (t.id, t.kpi1, ',') as ca1
on
t.id = ca1.id

Apparently this is invalid, because I get these errors back:

Msg 4104, Level 16, State 1, Line 5
The multi-part identifier "t.id" could not be bound.
Msg 4104, Level 16, State 1, Line 5
The multi-part identifier "t.kpi1" could not be bound.
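
(If I understand it correctly, a table-valued function whose arguments reference columns from another table can only be invoked through APPLY, not through a JOIN, so I suspect the left-join behavior I was after would need OUTER APPLY instead, something like the following, though I haven't verified it:)

select t.id, t.kpi1, t.kpi2, t.kpi3, ca1.items as items1
from
#t as t
outer apply
[dbo].[fnETL_Split_Set_CSV_To_Table] (t.id, t.kpi1, ',') as ca1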


I appreciate any pointers in the right direction.

Thank you!



Post #943562
Posted Sunday, June 27, 2010 3:38 PM


lmu92
Here's how I'd do it:
Step 1: UNPIVOT the data to get a more normalized data structure.
Step 2: Apply the string split function (side note: you might want to either search this site for "DelimitedSplit8K" or have a look at the Tally Table article referenced in my signature to get a better-performing split function than the one you currently have...).
Step 3: Get the final result using the CrossTab approach (see the related link in my signature for details).

Edit: Thank you for providing ready-to-use sample data! Made it really easy to work on! Excellent job!

;
WITH cte AS
(
    -- Step 1: UNPIVOT the three kpi columns into rows of (id, col, val)
    SELECT id, col, val
    FROM (SELECT id, kpi1, kpi2, kpi3 FROM #t) p
    UNPIVOT (val FOR col IN (kpi1, kpi2, kpi3)) AS unpvt
), cte2 AS
(
    -- Step 2: split each CSV value, keeping the element position (itemnumber)
    SELECT id, col, val, item, itemnumber
    FROM cte
    CROSS APPLY (SELECT * FROM dbo.DelimitedSplit8K(val, ',')) z
)
-- Step 3: CrossTab back to one row per (id, itemnumber)
SELECT
    id,
    MAX(CASE WHEN col = 'kpi1' THEN val ELSE NULL END) AS kpi1,
    MAX(CASE WHEN col = 'kpi2' THEN val ELSE NULL END) AS kpi2,
    MAX(CASE WHEN col = 'kpi3' THEN val ELSE NULL END) AS kpi3,
    MAX(CASE WHEN col = 'kpi1' THEN item ELSE NULL END) AS items1,
    MAX(CASE WHEN col = 'kpi2' THEN item ELSE NULL END) AS items2,
    MAX(CASE WHEN col = 'kpi3' THEN item ELSE NULL END) AS items3
FROM cte2
GROUP BY id, itemnumber
ORDER BY id, itemnumber
/* result set
id kpi1 kpi2 kpi3 items1 items2 items3
1 0.1,0.11,0.111 0.2,0.22,0.222 0.3,0.33,0.333 0.1 0.2 0.3
1 0.1,0.11,0.111 0.2,0.22,0.222 0.3,0.33,0.333 0.11 0.22 0.33
1 0.1,0.11,0.111 0.2,0.22,0.222 0.3,0.33,0.333 0.111 0.222 0.333
2 0.4,0.44,0.444 0.5,0.55,0.555 0.6,0.66,0.666 0.4 0.5 0.6
2 0.4,0.44,0.444 0.5,0.55,0.555 0.6,0.66,0.666 0.44 0.55 0.66
2 0.4,0.44,0.444 0.5,0.55,0.555 0.6,0.66,0.666 0.444 0.555 0.666
*/





Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question
How to post performance related questions
Links for Tally Table , Cross Tabs and Dynamic Cross Tabs , Delimited Split Function
Post #943574
Posted Sunday, June 27, 2010 7:52 PM
mishaluba
Thank you very much, lmu92! Great solution, and exactly what I need.

As a side note, I did a search on this site for "DelimitedSplit8K", but I can't seem to find it. However, I do understand that the one I listed above is inefficient. Below is an alternative; is it similar to yours?

CREATE FUNCTION [dbo].[fnSplit]
(@list VARCHAR(8000),
@delim CHAR(1) = ','
) RETURNS TABLE AS
RETURN
WITH csvtbl(START, stop) AS (
SELECT START = 1,
stop = CHARINDEX(@delim, @list + @delim)
UNION ALL
SELECT START = stop + 1,
stop = CHARINDEX(@delim,
@list + @delim, stop + 1)
FROM csvtbl
WHERE stop > 0
)
SELECT row_number() over (order by Start) as ItemID,
LTRIM(RTRIM(SUBSTRING(@list, START, CASE WHEN stop > 0 THEN stop - START ELSE 0 END)))
AS ItemValue
FROM csvtbl
WHERE stop > 0
GO





Post #943606
Posted Sunday, June 27, 2010 8:56 PM


Jeff Moden
mishaluba (6/27/2010)
Thank you very much, lmu92 ! Great solution and exactly what I need.

As a side note I did a search on this site for "DelimitedSplit8K", but I can't seem to find it. However I do understand that the one I listed above is inefficient. Below is the alternative, is it similar to yours?

CREATE FUNCTION [dbo].[fnSplit]
(@list VARCHAR(8000),
@delim CHAR(1) = ','
) RETURNS TABLE AS
RETURN
WITH csvtbl(START, stop) AS (
SELECT START = 1,
stop = CHARINDEX(@delim, @list + @delim)
UNION ALL
SELECT START = stop + 1,
stop = CHARINDEX(@delim,
@list + @delim, stop + 1)
FROM csvtbl
WHERE stop > 0
)
SELECT row_number() over (order by Start) as ItemID,
LTRIM(RTRIM(SUBSTRING(@list, START, CASE WHEN stop > 0 THEN stop - START ELSE 0 END)))
AS ItemValue
FROM csvtbl
WHERE stop > 0
GO




That's a recursive CTE, which can be as bad as a WHILE loop for CPU and is usually about 3 times worse on the number of reads. It's just another form of RBAR, and it should usually be avoided.
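
If you want to see the difference for yourself, turn on the I/O and timing statistics and run both splitters against the same data. A rough sketch (assuming both the recursive-CTE dbo.fnSplit above and a tally-based dbo.DelimitedSplit8K are installed; this is just for illustration, not a formal test harness):

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- recursive CTE splitter (the fnSplit posted above)
SELECT t.id, s.ItemID, s.ItemValue
FROM #t AS t
CROSS APPLY dbo.fnSplit(t.kpi1, ',') AS s;

-- tally-based splitter for comparison
SELECT t.id, s.ItemNumber, s.Item
FROM #t AS t
CROSS APPLY dbo.DelimitedSplit8K(t.kpi1, ',') AS s;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;

The "Worktable" reads that show up for the recursive CTE version are where most of the extra cost hides.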

Yes... you are correct... the search capabilities on this forum suck. I'm sure that Lutz will be back soon with his split function.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #943615
Posted Monday, June 28, 2010 12:09 PM


lmu92
Jeff Moden (6/27/2010)
...

Yes... you are correct... the search capabilities on this forum suck. I'm sure that Lutz will be back soon with his split function.


That's something I cannot confirm (at least not in this case...)

Search string: DelimitedSplit8K returned the following thread on the first page:

Thread name: SP error
... FROM #List l CROSS APPLY DelimitedSplit8K(l.List, @Delimiter) f [/code] See how the DelimitedSplit8K function returns a table that can be joined ... From HC_USERS CROSS JOIN DelimitedSplit8K(@rid, ',')[/code] Finally, clean up ...


This thread includes a post by WayneS showing the DelimitedSplit8K function I'm using at the moment.

But I admit, it's always easier to find something once you know where to look and what to look for...

@mishaluba: As a side note, please check the author of that script and see if the name rings a bell.

@Jeff: I'm sure you know that I don't have my own split function. Right?




Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question
How to post performance related questions
Links for Tally Table , Cross Tabs and Dynamic Cross Tabs , Delimited Split Function
Post #944146
Posted Monday, June 28, 2010 1:06 PM
mishaluba
Once again, thank you both, Lutz and Jeff, for your help! There is one more happy developer in the world; he learned something new today.


Post #944182
Posted Monday, June 28, 2010 1:35 PM


lmu92
Glad we could help

But you did the most important part by providing ready-to-use sample data, showing what you've tried so far, and clearly explaining what your expected output needs to look like.

Posts like that are the ones I enjoy working on, since they not only show that you put some effort into your question, but also imply that you'll use the answer to learn from it rather than just copy, paste, and move on.





Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question
How to post performance related questions
Links for Tally Table , Cross Tabs and Dynamic Cross Tabs , Delimited Split Function
Post #944200
Posted Monday, June 28, 2010 3:51 PM


WayneS
lmu92 (6/28/2010)
This thread include a post by WayneS showing the DelimitedSplit8K function I'm using at the moment.


I believe I saw a follow-up post from Jeff where he found that the UNPIVOT method (as demonstrated in the above link) turned out to be slower when he ran it on his work prod server several times. Jeff, please let us know if I'm remembering this correctly.


Wayne
Microsoft Certified Master: SQL Server 2008
If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
Links: For better assistance in answering your questions, How to ask a question, Performance Problems, Common date/time routines,
CROSS-TABS and PIVOT tables Part 1 & Part 2, Using APPLY Part 1 & Part 2, Splitting Delimited Strings
Post #944296
Posted Tuesday, June 29, 2010 6:38 AM


Jeff Moden
WayneS (6/28/2010)
lmu92 (6/28/2010)
This thread include a post by WayneS showing the DelimitedSplit8K function I'm using at the moment.


I believe I saw a follow-up post from Jeff where he found that the UNPIVOT (as demonstrated in the above link) turns out to be slower when he ran it on his work prod server several times. Jeff, please let us know if I remember this correctly.


You did, indeed. I didn't take the time to analyze "why", but on certain machines with multiple processors, the UNPIVOT method sometimes runs substantially slower. I also didn't understand that the function you good folks were talking about was the function that I posted. Here's the latest and greatest, with all the documentation and optimizations, that I currently use for production code... the documentation in the header is quite substantial. {EDIT} Updated the code below to include the latest performance improvement (from yesterday) thanks to Paul White.

 CREATE FUNCTION dbo.DelimitedSplit8K
/***************************************************************************************************
Purpose:
Split a given string at a given delimiter and return a list of the split elements (items).

Returns:
iTVF containing the following:
ItemNumber = Element position of Item as a BIGINT (not converted to INT to eliminate a CAST)
Item = Element value as a VARCHAR(8000)

CROSS APPLY Usage Example:
---------------------------------------------------------------------------------------------------
--===== Conditionally drop the test tables to make reruns easier for testing.
-- (this is NOT a part of the solution)
IF OBJECT_ID('tempdb..#JBMTest') IS NOT NULL
DROP TABLE #JBMTest
;
--===== Create and populate a test table on the fly (this is NOT a part of the solution).
SELECT *
INTO #JBMTest
FROM (
SELECT 1,'1,10,100,1000,10000,100000,1000000' UNION ALL
SELECT 2,'2000000,200000,20000,2000,200,20,2' UNION ALL
SELECT 3, 'This,is,a,test' UNION ALL
SELECT 4, 'and so is this' UNION ALL
SELECT 5, 'This, too (no pun intended)'
) d (SomeID,SomeValue)
;
GO
--===== Split the CSV column for the whole table using CROSS APPLY (this is the solution)
SELECT test.SomeID, split.ItemNumber, split.Item
FROM #JBMTest test
CROSS APPLY
(
SELECT ItemNumber, Item
FROM dbo.DelimitedSplit8k(test.SomeValue,',')
) split
;
---------------------------------------------------------------------------------------------------
Notes:
1. Optimized for VARCHAR(7999) or less. No testing or error reporting for truncation at 7999
characters is done.
2. Optimized for single character delimiter. Multi-character delimiters should be resolved
externally from this function.
3. Optimized for use with CROSS APPLY.
4. Does not "trim" elements just in case leading or trailing blanks are intended.
5. If you don't know how a Tally table can be used to replace loops, please see the following...
http://www.sqlservercentral.com/articles/T-SQL/62867/
6. Changing this function to use VARCHAR(MAX) will cause it to run twice as slow. It's just the
nature of VARCHAR(MAX) whether it fits in-row or not.
7. Multi-machine testing for the method of using UNPIVOT instead of 10 SELECT/UNION ALLs shows
that the UNPIVOT method is quite machine dependent and can slow things down quite a bit.
8. Performance testing shows using "TOP" for the limiting criteria of "N" is actually
slower and slightly more CPU intensive than the traditional WHERE N < LEN(@pString) + 2.
9. Performance testing shows using ORDER BY (SELECT x) where "x" is anything is actually
slower and slightly more CPU intensive than the traditional ORDER BY (SELECT N).

Credits:
This code is the product of many people's efforts including but not limited to the following:
cteTally concept originally by Itzik Ben-Gan and "decimalized" by Lynn Pettis (and others) for a
bit of extra speed and finally redacted by Jeff Moden for a different slant on readability and
compactness. Hats off to Paul White for his simple explanations of CROSS APPLY. Finally,
special thanks to Erland Sommarskog for his tireless efforts to help people understand
what you can actually do with T-SQL. I also thank whoever wrote the first article I ever saw
on "numbers tables" which is located at the following URL ...
http://sqlserver2000.databases.aspfaq.com/why-should-i-consider-using-an-auxiliary-numbers-table.html

Revision History:
Rev 00 - 20 Jan 2010 - Concept: Lynn Pettis and others.
Redaction/Implementation: Jeff Moden
- Base 10 redaction and reduction for CTE. (Total rewrite)

Rev 01 - 13 Mar 2010 - Jeff Moden
- Removed one additional concatenation and one subtraction from the SUBSTRING in the
SELECT List for that tiny bit of extra speed.

Rev 02 - 14 Apr 2010 - Jeff Moden
- No code changes. Added CROSS APPLY usage example to the header, some additional credits,
and extra documentation.

Rev 03 - 18 Apr 2010 - Jeff Moden
- No code changes. Added notes 7, 8, and 9 about certain "optimizations" that don't
actually work for this type of function.

Rev 04 - 29 Jun 2010 - Jeff Moden
- Added WITH SCHEMABINDING thanks to a note by Paul White. This prevents an unnecessary
"Table Spool" when the function is used in an UPDATE statement even though the function
makes no external references.
***************************************************************************************************/
--===== Define I/O parameters
(
@pString VARCHAR(7999),
@pDelimiter CHAR(1)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
--===== "Inline" CTE Driven "Tally Table” produces values up to
-- 10,000... enough to cover VARCHAR(8000)
WITH
E1(N) AS ( --=== Create Ten 1's
SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 --10
),
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --100
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10,000
cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT N)) FROM E4)
--===== Do the split
SELECT ROW_NUMBER() OVER (ORDER BY N) AS ItemNumber,
SUBSTRING(@pString, N, CHARINDEX(@pDelimiter, @pString + @pDelimiter, N) - N) AS Item
FROM cteTally
WHERE N < LEN(@pString) + 2
AND SUBSTRING(@pDelimiter + @pString, N, 1) = @pDelimiter
;
GO




--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #944589
Posted Tuesday, June 29, 2010 7:38 AM


WayneS
Sigh... I sure wish we could put an individual forum post into the briefcase... This one of Jeff's certainly would be in it.

@Jeff -
1. Did you test after adding the WITH SCHEMABINDING? If not, I see a test in my near future.
2. I see you changed the input parameter from varchar(8000) to varchar(7999), but I don't see a reason for that in your notes. Would you elaborate on this please?
Edit:
3. Is it safe to assume that a properly configured (Clustered index, 100% FillFactor) real tally table would perform better than the inline one (the comment Scott made below brings this up)?
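
For reference, the kind of "real" Tally table I have in mind would be built roughly like this (just a sketch of the usual pattern; the table name and the 11,000 row count are arbitrary choices on my part):

-- Build a permanent Tally table with a unique clustered index at 100% fill factor
SELECT TOP 11000
       IDENTITY(INT, 1, 1) AS N
INTO   dbo.Tally
FROM   master.sys.all_columns ac1
CROSS JOIN master.sys.all_columns ac2;

ALTER TABLE dbo.Tally
  ADD CONSTRAINT PK_Tally_N
      PRIMARY KEY CLUSTERED (N) WITH (FILLFACTOR = 100);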


Wayne
Microsoft Certified Master: SQL Server 2008
If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
Links: For better assistance in answering your questions, How to ask a question, Performance Problems, Common date/time routines,
CROSS-TABS and PIVOT tables Part 1 & Part 2, Using APPLY Part 1 & Part 2, Splitting Delimited Strings
Post #944648