

SSCoach
Group: General Forum Members
Last Login: Yesterday @ 3:14 PM
Points: 16,067,
Visits: 16,681


ahpitre (2/12/2013)
Great. Thanks for the prompt response. Do you think the code could be modified to be recursive? In other words, if I send 1 delimiter, the code within the function runs once; if I send 2 delimiters, it runs twice. What would be the performance penalty of recursion vs. your suggestion of simply replacing? If recursion takes more time and CPU cycles, could the function be modified to accept the delimiters as an array, then loop through the array during the replace portion, before doing the actual split?
The whole point of this function is to not do any looping. Looping is what causes SQL Server to crawl like a snail.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Moden's splitter.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns | Cross Tabs and Pivots, Part 2 – Dynamic Cross Tabs | Understanding and Using APPLY (Part 1) | Understanding and Using APPLY (Part 2)




Forum Newbie
Group: General Forum Members
Last Login: Wednesday, September 10, 2014 10:29 AM
Points: 7,
Visits: 20


OK. Great advice. I guess I could modify it just so the user can provide multiple delimiters, then use the replace before doing the split. Thanks.
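A minimal sketch of that idea, assuming Jeff Moden's dbo.DelimitedSplit8K(@pString VARCHAR(8000), @pDelimiter CHAR(1)) is already installed and returns ItemNumber and Item columns. The extra delimiters are folded into the main one with nested REPLACE calls on a variable before the split, so there is no loop and the REPLACE runs once rather than inside the splitter call. The @Input value and the ';' and '|' delimiters are purely illustrative.

```sql
-- Assumes dbo.DelimitedSplit8K (Jeff Moden's splitter) is already installed.
DECLARE @Input VARCHAR(8000) = 'alpha,beta;gamma|delta';  -- illustrative input

-- Fold each secondary delimiter into the primary one (',') before splitting:
-- one nested REPLACE per extra delimiter, no loop required.
SET @Input = REPLACE(REPLACE(@Input, ';', ','), '|', ',');

-- A single split on the primary delimiter now handles all three.
SELECT s.ItemNumber, s.Item
FROM dbo.DelimitedSplit8K(@Input, ',') AS s
ORDER BY s.ItemNumber;
```

On SQL Server 2017 and later, TRANSLATE(@Input, ';|', ',,') can fold several single-character delimiters in one call; the nested REPLACE form also works on older versions.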




SSCoach
Group: General Forum Members
Last Login: Yesterday @ 3:14 PM
Points: 16,067,
Visits: 16,681


Very interesting, Dwain. I tried another idea to see how it would hold up. It seems that it is possible to use another temp table to hold the replaced values. I tried it with your sample data, scaled up by factors of 10 to a million rows, and this approach seems to have a slight edge at all of those sizes. If the table had more columns this would degrade as it scales, but it is certainly interesting.
CREATE TABLE #Strings (strcol VARCHAR(8000));

-- 10,000 rows of random letter runs separated alternately by ',' and ';'
WITH Tally (n) AS (SELECT TOP 10000 1 FROM sys.all_columns a, sys.all_columns b)
INSERT INTO #Strings
SELECT REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26), ABS(CHECKSUM(NEWID()))%20)
     + ',' + REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26), ABS(CHECKSUM(NEWID()))%20)
     + ';' + REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26), ABS(CHECKSUM(NEWID()))%20)
     + ',' + REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26), ABS(CHECKSUM(NEWID()))%20)
     + ';' + REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26), ABS(CHECKSUM(NEWID()))%20)
     + ',' + REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26), ABS(CHECKSUM(NEWID()))%20)
     + ';' + REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26), ABS(CHECKSUM(NEWID()))%20)
     + ',' + REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26), ABS(CHECKSUM(NEWID()))%20)
     + ';' + REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26), ABS(CHECKSUM(NEWID()))%20)
     + ',' + REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26), ABS(CHECKSUM(NEWID()))%20)
FROM Tally;

DECLARE @BlackHole VARCHAR(8000);

PRINT 'Sean''s new suggestion';
SET STATISTICS TIME ON;
-- create a new table using the replace logic
SELECT REPLACE(strcol, ';', ',') AS strcol INTO #NewStrings FROM #Strings;
SELECT @BlackHole = Item FROM #NewStrings CROSS APPLY dbo.DelimitedSplit8k(strcol, ',');
SET STATISTICS TIME OFF;

PRINT 'Dwain''s suggestion';
SET STATISTICS TIME ON;
SELECT @BlackHole = Item
FROM #Strings
CROSS APPLY (SELECT MyString = REPLACE(strcol COLLATE Latin1_General_BIN, ';', ',')) a
CROSS APPLY dbo.DelimitedSplit8k(MyString, ',') b;
SET STATISTICS TIME OFF;

DROP TABLE #Strings;
DROP TABLE #NewStrings;




SSCrazy Eights
Group: General Forum Members
Last Login: Sunday, October 9, 2016 5:09 PM
Points: 9,932,
Visits: 11,344


dwain.c (2/11/2013) I found that applying a built-in function to the string to be split in the DelimitedSplit8K function's call has adverse performance effects.
If you look closely at the execution plans, you'll see that the ones that do not perform well end up doing the REPLACE on a big string a *ridiculous* number of times. This is because the optimizer hardly costs scalar functions at all, so it does not care very much how many times they are executed, so long as the result is correct.
Physically separating the replace from the function call using Sean's method is a supported way to work around this limitation, though it does involve writing a copy of the whole input set. There are also *unreliable* tricks like the following, which may cause the replace to be applied only once:
SELECT @BlackHole = dsk.Item
FROM (SELECT strcol = REPLACE(strcol, ';', ',') + LEFT(NEWID(), 0) FROM #Strings) AS s
CROSS APPLY dbo.DelimitedSplit8K(s.strcol, ',') AS dsk;
None of the methods shown so far performs as well (for me) as simply applying the SQLCLR function twice:
SELECT dsk2.Item
FROM #Strings
CROSS APPLY dbo.SplitterB(strcol, ',') AS dsk1
CROSS APPLY dbo.SplitterB(dsk1.Item, ';') AS dsk2;
That returns results so quickly that I didn't even bother coding up a CLR function that would accept an array of delimiters. No doubt that would be even faster. The same idea could be applied to the T-SQL function, I suppose, but the implementation and testing look decidedly non-trivial to me. For anyone who needs the SplitterB code:
CREATE ASSEMBLY [Split] FROM … ;
GO
CREATE FUNCTION [dbo].[SplitterB] (@Input [nvarchar](max), @Delimiter [nchar](1))
RETURNS TABLE ([sequence] [int] NULL, [item] [nvarchar](4000) NULL)
WITH EXECUTE AS CALLER
AS EXTERNAL NAME [Split].[UserDefinedFunctions].[SplitterB];
Paul White SQLPerformance.com SQLblog.com @SQL_Kiwi




Hall of Fame
Group: General Forum Members
Last Login: Wednesday, February 24, 2016 6:28 AM
Points: 3,977,
Visits: 6,431


Sean Lange (2/12/2013)
The whole point of this function is to not do any looping. Looping is what causes SQL Server to crawl like a snail.
Another way to avoid looping (no recursion required) is to cascade CROSS APPLYs, as Paul did in his example where he calls the CLR splitter twice.
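A T-SQL sketch of the cascading CROSS APPLY idea, assuming dbo.DelimitedSplit8K is installed. It mirrors Paul's double SplitterB call but uses the T-SQL splitter, with a couple of illustrative rows standing in for the #Strings test data. The first APPLY splits on ',' and the second splits each resulting piece on ';', so each extra delimiter costs one more APPLY rather than a loop iteration.

```sql
-- Assumes dbo.DelimitedSplit8K is installed; the sample rows are illustrative.
IF OBJECT_ID('tempdb..#Strings') IS NOT NULL DROP TABLE #Strings;
CREATE TABLE #Strings (strcol VARCHAR(8000));
INSERT INTO #Strings (strcol) VALUES ('aa,bb;cc'), ('dd;ee,ff;gg');

-- First APPLY splits on ','; the second splits each of those pieces on ';'.
SELECT s2.Item
FROM #Strings AS st
CROSS APPLY dbo.DelimitedSplit8K(st.strcol, ',') AS s1
CROSS APPLY dbo.DelimitedSplit8K(s1.Item, ';') AS s2;
```

For these two rows the query returns seven items (aa, bb, cc, dd, ee, ff, gg); no item still contains a ',' or ';'.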
My mantra: No loops! No CURSORs! No RBAR! Hoouh!
My thought question: Have you ever been told that your query runs too fast?
My advice: INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it? The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead? Since random numbers are too important to be left to chance, let's generate some! Learn to understand recursive CTEs by example. Splitting strings based on patterns can be fast! My temporal SQL musings: Calendar Tables, an Easter SQL, Time Slots and Self-maintaining, Contiguous Effective Dates in Temporal Tables




Hall of Fame
Group: General Forum Members
Last Login: Wednesday, February 24, 2016 6:28 AM
Points: 3,977,
Visits: 6,431


Sean Lange (2/12/2013)
Very interesting, Dwain. I tried another idea to see how it would hold up. It seems that it is possible to use another temp table to hold the replaced values. I tried it with your sample data, scaled up by factors of 10 to a million rows, and this approach seems to have a slight edge at all of those sizes. If the table had more columns this would degrade as it scales, but it is certainly interesting.
Sean, very nice touch. Your temp table approach beats the cascading CROSS APPLYs and a couple of variants I tried it against, even without adding the COLLATE on the REPLACE.




Hall of Fame
Group: General Forum Members
Last Login: Wednesday, February 24, 2016 6:28 AM
Points: 3,977,
Visits: 6,431


Paul White (2/12/2013)
If you look closely at the execution plans, you'll see that the ones that do not perform well end up doing the REPLACE on a big string a *ridiculous* number of times.
Paul, thanks for the analysis. Sorry for the basic question, but can you show me where exactly in the execution plan you're seeing this? I'm not very good at reading them yet, but I want to improve.
And yes, the CLR approach certainly rules the roost here. I was just trying to keep the suggestion of doing the REPLACE inside the DelimitedSplit8K call from doing something unexpected since, as I said, I'd seen this issue before.




SSCrazy Eights
Group: General Forum Members
Last Login: Sunday, October 9, 2016 5:09 PM
Points: 9,932,
Visits: 11,344


dwain.c (2/12/2013) Sorry for the basic question, but can you show me where exactly in the execution plan you're seeing this?
Taking the following code as an example (which ran for 5m 33s on my SQL Server 2012 machine):
SELECT @BlackHole = Item FROM #Strings CROSS APPLY dbo.DelimitedSplit8k(REPLACE(strcol, ';', ','), ',');
The execution plan is:
[Image: execution plan for the query above]
The Filter operator executes 1,000 times, applying the following predicate to the 1,291,917 rows it receives:
substring(replace([tempdb].[dbo].[#Strings].[strcol],';',','),CONVERT_IMPLICIT(int,[Expr1054],0),(1))=','
So that particular REPLACE executes 1.3M times.
The other references are in the Compute Scalars:
[Image: Compute Scalar operators]
Now there are some added complications regarding exactly when each defined expression gets evaluated and how many times, but that's enough to give you the flavour.









Forum Newbie
Group: General Forum Members
Last Login: Wednesday, September 10, 2014 10:29 AM
Points: 7,
Visits: 20


How do you use this function? Also, how can I pass an additional parameter so that it's always inserted into the new table? I have a column named Part. I want the table with the split results to include Part (repeated for every substring split from the main string). My final output should look something like this:
Input_table
Part   Specs
123    Ddfldkk; P4987843; D48974587
456    Adfldkk; Z4987843

Output_table (created by Split function)
Part   Specs
123    Ddfldkk
123    P4987843
123    D48974587
456    Adfldkk
456    Z4987843
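One way to get that output, sketched under the assumption that dbo.DelimitedSplit8K is installed (returning ItemNumber and Item) and that Part is an integer. CROSS APPLY runs the splitter once per input row, so every item it returns carries that row's Part value with it and no extra parameter is needed. LTRIM/RTRIM removes the space after each ';' in the sample Specs. The table names come from the question, but the temp tables and column types are guesses for illustration.

```sql
-- Assumes dbo.DelimitedSplit8K is installed; column types are guesses.
IF OBJECT_ID('tempdb..#Input_table') IS NOT NULL DROP TABLE #Input_table;
IF OBJECT_ID('tempdb..#Output_table') IS NOT NULL DROP TABLE #Output_table;

CREATE TABLE #Input_table (Part INT, Specs VARCHAR(8000));
INSERT INTO #Input_table (Part, Specs)
VALUES (123, 'Ddfldkk; P4987843; D48974587'),
       (456, 'Adfldkk; Z4987843');

-- The splitter is applied per input row; Part comes from the outer row.
SELECT i.Part,
       s.ItemNumber,                  -- keeps each item's position within Specs
       Specs = LTRIM(RTRIM(s.Item))   -- strip the space that follows each ';'
INTO #Output_table
FROM #Input_table AS i
CROSS APPLY dbo.DelimitedSplit8K(i.Specs, ';') AS s;

SELECT Part, Specs FROM #Output_table ORDER BY Part, ItemNumber;
```

Keeping ItemNumber in the output table lets the final SELECT return the items in their original order within each Part.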



