

SSChampion
Group: General Forum Members
Last Login: Today @ 2:49 PM
Points: 12,016,
Visits: 11,046





SSCAddicted
Group: General Forum Members
Last Login: Sunday, September 29, 2013 1:24 AM
Points: 429,
Visits: 1,721


ahpitre (2/11/2013) How can I use this function to search for more than 1 character? I have data that is delimited by either a comma or semicolon (,;). Need to split contents based on either character. Just throw in a replace to change the second character into the desired delimiter. I can't guess what effect this would have on performance of the function.
If you are talking about a twodimensional array like 'A,B;X,Y;D,E' and you want to split that into two columns, then that's a different problem.
DECLARE @strExample VARCHAR(8000)
SET @strExample = 'A,B;C;D,E,F,G,H'
SELECT ItemNumber ,Item FROM dbo.DelimitedSplit8K(REPLACE(@strExample,';',','),',') AS dsk




SSCAddicted
Group: General Forum Members
Last Login: Sunday, September 29, 2013 1:24 AM
Points: 429,
Visits: 1,721


Sorry Sean...I didn't see that you had already answered with exactly the same idea.




SSChampion
Group: General Forum Members
Last Login: Today @ 2:49 PM
Points: 12,016,
Visits: 11,046





Grasshopper
Group: General Forum Members
Last Login: Friday, April 18, 2014 6:54 PM
Points: 13,
Visits: 174


If you have two delimiters you can change the code slightly:
DECLARE @pString varchar(8000) = '55555;4444,333,22,1'; DECLARE @pDelimiter1 char(1) = ','; DECLARE @pDelimiter2 char(2) = ';';
WITH E1(N) AS ( SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 ), 10E+1 or 10 rows E2(N) AS (SELECT 1 FROM E1 a, E1 b), 10E+2 or 100 rows E4(N) AS (SELECT 1 FROM E2 a, E2 b), 10E+4 or 10,000 rows max cteTally(N) AS (==== This provides the "base" CTE and limits the number of rows right up front  for both a performance gain and prevention of accidental "overruns" SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4 ), cteStart(N1) AS (==== This returns N+1 (starting position of each "element" just once for each delimiter) SELECT 1 UNION ALL SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) in (@pDelimiter1, @pDelimiter2) ), cteLen(N1,L1) AS(==== Return start and length (for use in substring) SELECT s.N1, Case when ISNULL(NULLIF(CHARINDEX(@pDelimiter1,@pString,s.N1),0)s.N1,8000) < ISNULL(NULLIF(CHARINDEX(@pDelimiter2,@pString,s.N1),0)s.N1,8000) then ISNULL(NULLIF(CHARINDEX(@pDelimiter1,@pString,s.N1),0)s.N1,8000) else ISNULL(NULLIF(CHARINDEX(@pDelimiter2,@pString,s.N1),0)s.N1,8000) end FROM cteStart s ) select * from cteLen; Performance of the second charindex might be similar to using replace on the input string. An alternative would be do join cteStart to itself to derive cteLen. For a single delimiter it is slower than charindex but for multiples it might be better. Certainly for more than 2 delimiters the self join would be simpler code to read.




Hall of Fame
Group: General Forum Members
Last Login: Today @ 5:42 PM
Points: 3,596,
Visits: 5,112


Sean Lange (2/11/2013)
ahpitre (2/11/2013) How can I use this function to search for more than 1 character? I have data that is delimited by either a comma or semicolon (,;). Need to split contents based on either character.Easiest way I can think of would be to simply use replace. select * from YourTable cross apply dbo.DelimitedSplit8k(replace(YourColumn, ';', ','), ',')
This way you are still splitting on commas but it will work for either character.
Be careful now! [Sorry I just love it when Jeff says that] Try this:
CREATE TABLE #Strings (strcol VARCHAR(8000))
;WITH Tally (n) AS ( SELECT TOP 1000 1 FROM sys.all_columns a, sys.all_columns b) INSERT INTO #Strings SELECT REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26),ABS(CHECKSUM(NEWID()))%20) + ',' + REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26),ABS(CHECKSUM(NEWID()))%20) + ';' + REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26),ABS(CHECKSUM(NEWID()))%20) + ',' + REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26),ABS(CHECKSUM(NEWID()))%20) + ';' + REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26),ABS(CHECKSUM(NEWID()))%20) + ',' + REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26),ABS(CHECKSUM(NEWID()))%20) + ';' + REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26),ABS(CHECKSUM(NEWID()))%20) + ',' + REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26),ABS(CHECKSUM(NEWID()))%20) + ';' + REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26),ABS(CHECKSUM(NEWID()))%20) + ',' + REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26),ABS(CHECKSUM(NEWID()))%20) FROM Tally
DECLARE @BlackHole VARCHAR(8000)
PRINT 'Sean''s suggestion' SET STATISTICS TIME ON select @BlackHole=Item from #Strings cross apply dbo.DelimitedSplit8k(replace(strcol, ';', ','), ',') SET STATISTICS TIME OFF
PRINT 'Dwain''s suggestion' SET STATISTICS TIME ON select @BlackHole=Item from #Strings CROSS APPLY (SELECT MyString=REPLACE(strcol COLLATE Latin1_General_BIN, ';', ',')) a cross apply dbo.DelimitedSplit8k(MyString, ',') b SET STATISTICS TIME OFF
PRINT 'Sean''s suggestion with COLLATE' SET STATISTICS TIME ON select @BlackHole=Item from #Strings cross apply dbo.DelimitedSplit8k(replace(strcol COLLATE Latin1_General_BIN, ';', ','), ',') SET STATISTICS TIME OFF
DROP TABLE #Strings
I found that applying a builtin function to the string to be split in the DelimitedSplit8K FUNCTION's call has adverse performance effects. I can't say as to why. But these are the results:
Sean's suggestion
SQL Server Execution Times: CPU time = 27971 ms, elapsed time = 28411 ms.
Dwain's suggestion
SQL Server Execution Times: CPU time = 468 ms, elapsed time = 471 ms.
Sean's suggestion with COLLATE
SQL Server Execution Times: CPU time = 13323 ms, elapsed time = 13567 ms.
My mantra: No loops! No CURSORs! No RBAR! Hoouh!
My thought question: Have you ever been told that your query runs too fast?
My advice: INDEXing a poorperforming query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it? The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead? Since random numbers are too important to be left to chance, let's generate some! Learn to understand recursive CTEs by example. Splitting strings based on patterns can be fast!




SSCAddicted
Group: General Forum Members
Last Login: Sunday, September 29, 2013 1:24 AM
Points: 429,
Visits: 1,721


Steven Willis (2/11/2013) ...I can't guess what effect this would have on performance of the function. I was wondering about that! Thanks for doing the proof testing.




Hall of Fame
Group: General Forum Members
Last Login: Today @ 5:42 PM
Points: 3,596,
Visits: 5,112





Hall of Fame
Group: General Forum Members
Last Login: Today @ 5:42 PM
Points: 3,596,
Visits: 5,112





Forum Newbie
Group: General Forum Members
Last Login: Friday, February 22, 2013 11:34 AM
Points: 7,
Visits: 18


Great. Thanks for the prompt response. Do you think that the code could be modified to be recursive, in other words, if I send 1 delimiter, code within function runs once, if I send 2 delimiters, ir runs twice.
What would be the performance penalty of recursion vs your suggestion of just simply replacing? If recursion takes more time and CPU cycles, then, could the function be modified to accespt the delimiters as part of an array, then, loop thru the array doring the replace portion, before doing the actual split?



