Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««4849505152»»»

Tally OH! An Improved SQL 8K “CSV Splitter” Function Expand / Collapse
Author
Message
Posted Monday, February 11, 2013 1:03 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 8:01 AM
Points: 13,309, Visits: 12,170
ahpitre (2/11/2013)
How can I use this function to search for more than 1 character? I have data that is delimited by either a comma or semicolon (,;). Need to split contents based on either character.


Easiest way I can think of would be to simply use replace.

select *
from YourTable
cross apply dbo.DelimitedSplit8k(replace(YourColumn, ';', ','), ',')

This way you are still splitting on commas but it will work for either character.


_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
Post #1418615
Posted Monday, February 11, 2013 3:26 PM
SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Sunday, September 29, 2013 1:24 AM
Points: 429, Visits: 1,721
ahpitre (2/11/2013)
How can I use this function to search for more than 1 character? I have data that is delimited by either a comma or semicolon (,;). Need to split contents based on either character.

Just throw in a replace to change the second character into the desired delimiter. I can't guess what effect this would have on performance of the function.

If you are talking about a two-dimensional array like 'A,B;X,Y;D,E' and you want to split that into two columns, then that's a different problem.


DECLARE @strExample VARCHAR(8000)

SET @strExample = 'A,B;C;D,E,F,G,H'

SELECT
ItemNumber
,Item
FROM
dbo.DelimitedSplit8K(REPLACE(@strExample,';',','),',') AS dsk


Post #1418659
Posted Monday, February 11, 2013 3:28 PM
SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Sunday, September 29, 2013 1:24 AM
Points: 429, Visits: 1,721
Sorry Sean...I didn't see that you had already answered with exactly the same idea.

 
Post #1418661
Posted Monday, February 11, 2013 3:39 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 8:01 AM
Points: 13,309, Visits: 12,170
Steven Willis (2/11/2013)
Sorry Sean...I didn't see that you had already answered with exactly the same idea.

 


No worries. Great minds...


_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
Post #1418667
Posted Monday, February 11, 2013 5:13 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Thursday, August 7, 2014 2:07 AM
Points: 13, Visits: 216
If you have two delimiters you can change the code slightly:

DECLARE @pString varchar(8000) = '55555;4444,333,22,1';
DECLARE @pDelimiter1 char(1) = ',';
DECLARE @pDelimiter2 char(2) = ';';

WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT 1 UNION ALL
SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) in (@pDelimiter1, @pDelimiter2)
),
cteLen(N1,L1) AS(--==== Return start and length (for use in substring)
SELECT s.N1,
Case when ISNULL(NULLIF(CHARINDEX(@pDelimiter1,@pString,s.N1),0)-s.N1,8000)
< ISNULL(NULLIF(CHARINDEX(@pDelimiter2,@pString,s.N1),0)-s.N1,8000)
then ISNULL(NULLIF(CHARINDEX(@pDelimiter1,@pString,s.N1),0)-s.N1,8000)
else ISNULL(NULLIF(CHARINDEX(@pDelimiter2,@pString,s.N1),0)-s.N1,8000)
end
FROM cteStart s
)
select * from cteLen;

Performance of the second charindex might be similar to using replace on the input string. An alternative would be do join cteStart to itself to derive cteLen. For a single delimiter it is slower than charindex but for multiples it might be better. Certainly for more than 2 delimiters the self join would be simpler code to read.
Post #1418690
Posted Monday, February 11, 2013 5:57 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Today @ 4:10 AM
Points: 3,634, Visits: 5,283
Sean Lange (2/11/2013)
ahpitre (2/11/2013)
How can I use this function to search for more than 1 character? I have data that is delimited by either a comma or semicolon (,;). Need to split contents based on either character.


Easiest way I can think of would be to simply use replace.

select *
from YourTable
cross apply dbo.DelimitedSplit8k(replace(YourColumn, ';', ','), ',')

This way you are still splitting on commas but it will work for either character.


Be careful now! [Sorry I just love it when Jeff says that] Try this:

CREATE TABLE #Strings
(strcol VARCHAR(8000))

;WITH Tally (n) AS (
SELECT TOP 1000 1 FROM sys.all_columns a, sys.all_columns b)
INSERT INTO #Strings
SELECT REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26),ABS(CHECKSUM(NEWID()))%20) + ',' +
REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26),ABS(CHECKSUM(NEWID()))%20) + ';' +
REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26),ABS(CHECKSUM(NEWID()))%20) + ',' +
REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26),ABS(CHECKSUM(NEWID()))%20) + ';' +
REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26),ABS(CHECKSUM(NEWID()))%20) + ',' +
REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26),ABS(CHECKSUM(NEWID()))%20) + ';' +
REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26),ABS(CHECKSUM(NEWID()))%20) + ',' +
REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26),ABS(CHECKSUM(NEWID()))%20) + ';' +
REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26),ABS(CHECKSUM(NEWID()))%20) + ',' +
REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26),ABS(CHECKSUM(NEWID()))%20)
FROM Tally

DECLARE @BlackHole VARCHAR(8000)

PRINT 'Sean''s suggestion'
SET STATISTICS TIME ON
select @BlackHole=Item
from #Strings
cross apply dbo.DelimitedSplit8k(replace(strcol, ';', ','), ',')
SET STATISTICS TIME OFF

PRINT 'Dwain''s suggestion'
SET STATISTICS TIME ON
select @BlackHole=Item
from #Strings
CROSS APPLY (SELECT MyString=REPLACE(strcol COLLATE Latin1_General_BIN, ';', ',')) a
cross apply dbo.DelimitedSplit8k(MyString, ',') b
SET STATISTICS TIME OFF

PRINT 'Sean''s suggestion with COLLATE'
SET STATISTICS TIME ON
select @BlackHole=Item
from #Strings
cross apply dbo.DelimitedSplit8k(replace(strcol COLLATE Latin1_General_BIN, ';', ','), ',')
SET STATISTICS TIME OFF

DROP TABLE #Strings


I found that applying a built-in function to the string to be split in the DelimitedSplit8K FUNCTION's call has adverse performance effects. I can't say as to why. But these are the results:

Sean's suggestion

SQL Server Execution Times:
CPU time = 27971 ms, elapsed time = 28411 ms.

Dwain's suggestion

SQL Server Execution Times:
CPU time = 468 ms, elapsed time = 471 ms.

Sean's suggestion with COLLATE

SQL Server Execution Times:
CPU time = 13323 ms, elapsed time = 13567 ms.




My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!


  Post Attachments 
Query Plan.sqlplan (1 view, 194.27 KB)
Post #1418705
Posted Monday, February 11, 2013 9:16 PM
SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Sunday, September 29, 2013 1:24 AM
Points: 429, Visits: 1,721
Steven Willis (2/11/2013)
...I can't guess what effect this would have on performance of the function.

I was wondering about that! Thanks for doing the proof testing.

 
Post #1418734
Posted Monday, February 11, 2013 9:27 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Today @ 4:10 AM
Points: 3,634, Visits: 5,283
Steven Willis (2/11/2013)
Steven Willis (2/11/2013)
...I can't guess what effect this would have on performance of the function.

I was wondering about that! Thanks for doing the proof testing.

 


I would love to hear an explanation for it. It is fortunate that I tried something like this once before so I knew the consequences.



My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
Post #1418737
Posted Monday, February 11, 2013 11:17 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Today @ 4:10 AM
Points: 3,634, Visits: 5,283
Note that I posted the query plan in my previous post in case someone wants to take a shot at that analysis.


My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
Post #1418758
Posted Tuesday, February 12, 2013 8:58 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Friday, February 22, 2013 11:34 AM
Points: 7, Visits: 18
Great. Thanks for the prompt response. Do you think that the code could be modified to be recursive, in other words, if I send 1 delimiter, code within function runs once, if I send 2 delimiters, ir runs twice.

What would be the performance penalty of recursion vs your suggestion of just simply replacing? If recursion takes more time and CPU cycles, then, could the function be modified to accespt the delimiters as part of an array, then, loop thru the array doring the replace portion, before doing the actual split?
Post #1419021
« Prev Topic | Next Topic »

Add to briefcase «««4849505152»»»

Permissions Expand / Collapse