Set-Based String Splitting table function

Question

Set-Based String Splitting table function

Viewing 11 posts - 16 through 25 (of 25 total)

You must be logged in to reply to this topic. Login to reply

bevanward SSC Eights! Points: 855 More actions · Answer 1

Jeff Moden - Tuesday, March 20, 2018 3:10 PM
bevanward - Tuesday, March 20, 2018 3:04 PM
Jeff Moden - Monday, March 19, 2018 7:06 PM
Cool feedback. Thanks, Bevan.
I think the concern of some of the other poster's is having to do with performance. Generally speaking and with few exceptions, an mTVF (Multi-statement Table Valued Function) is going to be slower than an iTVF (Inline Table Valued Function). Your use of LEAD may help overcome that but, if you can use lead, take a look at a modernized version of DelimitedSplit8K that a good online friend, Eirikur Eiriksson , wrote. I've test it and it's twice as fast as DelimitedSplit8k and approaches CLR speeds very closely. Of course, it requires 2012+.
http://www.sqlservercentral.com/articles/SQL+Server+2012/106903/
Hi Jeff
Have modified Eirikur's version for multiple character delimiters and longer fields. Certainly runs fast from what I've seen so far have to test more.
The only issue I have stumbled into so far is where delimiters overlap and it splits multiple times - for example when splitting with three dots and try and pass through a area with lots of dots. Any thoughts appreciated. Thanks for pointing this all out.
Thanks
Bevan
Post the modifications you made.

This is what it looks like - did not realise can not attach .sql files.

Thanks
Bevan

use tempdb go

drop function if exists DelimitedSplit_Max_Multi_LAG go

create function [dbo].[DelimitedSplit_Max_Multi_LAG] /* Based on http://www.sqlservercentral.com/articles/SQL+Server+2012/106903/ and http://www.sqlservercentral.com/articles/Tally+Table/72993/

*/

--===== Define I/O parameters (@pString NVARCHAR(max), @pDelimiter NVARCHAR(1000)) RETURNS TABLE WITH SCHEMABINDING AS RETURN --===== "Inline" CTE Driven "Tally Tableâ€ produces values from 0 up to 100,000,000... -- enough to cover a big file WITH E1(N) AS ( SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 ), --10E+1 or 10 rows E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows E6(N) AS (SELECT 1 FROM E4 a, E4 b), --10E+6 or 1,000,000 E8(N) AS (SELECT 1 FROM E6 a, E6 b), --10E+8 or 100,000,000 cteTally(N) AS (--==== This provides the "zero base" and limits the number of rows right up front -- for both a performance gain and prevention of accidental "overruns" SELECT 0 UNION ALL SELECT TOP (DATALENGTH(ISNULL(@pString,1))/2) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E8 ), cteStart(N1) AS (--==== Find right edge of delimiter SELECT t.N+case when t.N = 0 then 0 else datalength(@pDelimiter)/2 end FROM cteTally t WHERE (SUBSTRING(@pString,t.N,datalength(@pDelimiter)/2) = @pDelimiter OR t.N = 0) ) --===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found. SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY s.N1), Item = SUBSTRING(@pString,s.N1,ISNULL(NULLIF((LEAD(s.N1,1,1) OVER (ORDER BY s.N1) - case when s.N1=1 or count(*) over (partition by (select null)) = ROW_NUMBER() OVER(ORDER BY s.N1) then 1 else (datalength(@pDelimiter)/2) end),0)-s.N1,datalength(@pString)/2)) FROM cteStart s ; GO

select * from dbo.DelimitedSplit_Max_Multi_LAG('The shortbeard codling can grow to a maximum standard length of about 33 cm (13 in) and takes the form of a somewhat spindle-shaped cylinder. The snout is fairly blunt and the upper jaw is slightly longer than the lower jaw. There is a short barbel on the chin and a flattened spine on the operculum. There are two dorsal fins with no spines and a total of from 69 to 76 soft rays. The first dorsal fin has 6 rays, the second of which is the longest, and the second dorsal fin originates immediately behind the first fin. Each pectoral fin has 17 to 23 soft rays. The pelvic fins are widely separated and each has two elongated soft rays; these fins originate to the front of the pectoral fins. The anal fin has no spines and has 57 to 63 soft rays. The caudal fin is small and rounded. The general colour of this fish is tan, the dorsal and anal fins having dark edges.','. ')

Scott In Sydney Hall of Fame Points: 3331 More actions · Answer 2

bevanward - Tuesday, March 20, 2018 1:55 AM
Hi Scott
Thanks for your message! It is hard to know how long to look for something before starting to write. What Jeff wrote looks perfect.
Had you seen the modernised version that Jeff mentions above? Seems to be worth considering from the looks of the performance.
Thanks again keep well
Bevan

Hi Bevan,

Your article has triggered a dialogue which is a good thing - you've found Jeff's code, and no, I was not aware of the modernised version that Jeff mentioned. So your post has triggered further education on my part 🙂
Since I'm on SS 2012, I'll definitely look at the code from Eirikur Eiriksson.
Not to hijack this post, but since it's related, I thought I'd share my use case.
I've got a source table where multiple values are stored in a pipe-delimited string. So, instead of columns foo1-foo50, there's a single foo column, with data "A|B|C|D|etc". It would be rare (but possible) to have a complete set of 50 tokens.
There are multiple such columns, and not always a one-to-one match of tokens within the different columns (i.e. foo might contain 20 tokens, bar might contain 15 tokens, and blah might contain 3 tokens).
I have to "unwind" this data, converting multiple tokens into multiple rows, essentially normalizing the data. A bit like UNPIVOT, but with a concatenated string rather than multiple columns.
Jeff's code has been invaluable in helping me to code this up. I never would have come up with his approach on my own.
Here is my code. fn_MoH_SplitDelimitedStringToTable is identical to fn_DelimitedSplit8K, just a different name.

CREATE VIEW [phi].[vwPROCEDURE_V] AS WITH cteProcedure_Code AS ( SELECT src.FACILITY_STAY_EPISODE_ID ,CAST(ds.ItemNumber AS SMALLINT) AS ItemNumber ,CAST(ds.Item AS VARCHAR(8)) AS Item FROM trn.PROCEDURE src CROSS APPLY dbo.fn_MoH_SplitDelimitedStringToTable('|', CONCAT(ISNULL(src.procedure_code_p,'-'),'|',ISNULL(src.procedure_code_sec,'-')) ) ds ) , cteProcedure_Date AS ( SELECT src.FACILITY_STAY_EPISODE_ID ,CAST(ds.ItemNumber AS SMALLINT) AS ItemNumber ,CAST(ds.Item AS VARCHAR(9)) AS Item FROM trn.PROCEDURE src CROSS APPLY dbo.fn_MoH_SplitDelimitedStringToTable('|', CONCAT(ISNULL(CONVERT(VARCHAR(8),src.procedure_date_p,112),'-'),'|',ISNULL(src.procedure_date_sec,'-')) ) ds ) , cteProcedure_Location AS ( SELECT src.FACILITY_STAY_EPISODE_ID ,CAST(ds.ItemNumber AS SMALLINT) AS ItemNumber ,CAST(ds.Item AS CHAR(1)) AS Item FROM trn.PROCEDURE src CROSS APPLY dbo.fn_MoH_SplitDelimitedStringToTable('|', CONCAT(ISNULL(src.procedure_location_p,'-'),'|',ISNULL(src.procedure_location_sec,'-')) ) ds ) , cteBlock_Num AS ( SELECT src.FACILITY_STAY_EPISODE_ID ,CAST(ds.ItemNumber AS SMALLINT) AS ItemNumber ,CAST(ds.Item AS VARCHAR(4)) AS Item FROM trn.PROCEDURE src CROSS APPLY dbo.fn_MoH_SplitDelimitedStringToTable('|', CONCAT(ISNULL(src.block_num_p,'-'),'|',ISNULL(src.block_num_sec,'-')) ) ds ) SELECT -- Source a.FACILITY_STAY_EPISODE_ID ,a.FACILITY_STAY_ID ,a.FACILITY_ID ,a.clinical_codeset -- Procedure Code ,b.ItemNumber-1 AS [ROW_NUMBER] ,b.ItemNumber-1 AS procedure_code_number ,CASE WHEN b.Item='-' THEN NULL ELSE b.Item END AS procedure_code -- Procedure Date ,c.ItemNumber-1 AS procedure_date_number ,CASE WHEN c.Item='-' THEN NULL ELSE CAST(c.Item AS DATE) END AS procedure_date -- Procedure Location ,d.ItemNumber-1 AS procedure_location_number ,CASE WHEN d.Item='-' THEN NULL ELSE d.Item END AS procedure_location -- Procedure Type ,CAST( CASE WHEN b.ItemNumber-1 = 0 THEN 'P' ELSE FORMAT(b.ItemNumber-1,'00') END AS CHAR(2) ) AS procedure_type -- Block Number ,e.ItemNumber-1 AS block_num_number ,CASE WHEN e.Item='-' THEN NULL ELSE e.Item END AS block_num

-- Source FROM trn.PROCEDURE a

-- Procedure Code LEFT JOIN cteProcedure_Code b ON a.FACILITY_STAY_EPISODE_ID=b.FACILITY_STAY_EPISODE_ID

-- Procedure Date LEFT JOIN cteProcedure_Date c ON a.FACILITY_STAY_EPISODE_ID=c.FACILITY_STAY_EPISODE_ID AND b.ItemNumber=c.ItemNumber

-- Procedure Location LEFT JOIN cteProcedure_Location d ON a.FACILITY_STAY_EPISODE_ID=d.FACILITY_STAY_EPISODE_ID AND b.ItemNumber=d.ItemNumber

-- Block Num LEFT JOIN cteBlock_Num e ON a.FACILITY_STAY_EPISODE_ID=e.FACILITY_STAY_EPISODE_ID AND b.ItemNumber=e.ItemNumber GO

The source table has about 5M rows; the output has about 17M rows. We don't have the fastest server, and the load takes about 4 hours to run. Any performance improvements (i.e. Eirikur's code) or improvements to my own approach will certainly help.

Regards,
Scott

Scott In Sydney Hall of Fame Points: 3331 More actions · Answer 3

I have previously modified my copy of Jeff's code to only return non-blank items, and return them stripped of leading and trailing spaces.

Here is an excerpt of my changes:

--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found. --===== Strip leading and trailing spaces, only return non-empty results.

SELECT ItemNumber = ROW_NUMBER() OVER (ORDER BY l.N1), Item = LTRIM(RTRIM(SUBSTRING(@pString, l.N1, l.L1))) FROM cteLen l WHERE LEN(SUBSTRING(@pString, l.N1, l.L1)) > 0

I want to do a similar thing for Eirikur's code, but it's a bit trickier due to the LEAD() windowing function.

I include my changes in their entirety, including a longer header with use cases. Some of the header comments are meant to help less TSQL-aware colleagues. The changes are at the bottom of the code.

Compare the results of my use cases with the original code vs. my changes - my desired output should be obvious by comparing the two results.

If there is a better (i.e. more performant) way to implement these changes please let me know.

/*===================================================================== Function Name : fn_MoH_DelimitedSplit8K_LEAD.sql Purpose : User Defined Function to split a string based on a delimiter and return the tokens in the string as a table object. Uses the LEAD windowing function for performance gains over the DelimitedSplit8K function. SQL Server Version : Microsoft SQL Server 2012 (SP3) (KB3072779) - 11.0.6020.0 (X64)

Other Code Called : None

Originally Written by : Scott In Sydney Date : 21MAR2018 Stored Process Version : 1.0

=======================================================================

This code was originally published at: http://www.sqlservercentral.com/articles/SQL+Server+2012/106903/

See also: http://www.sqlservercentral.com/articles/Tally+Table/72993/

Full attribution is given to the original author Eirikur Eiriksson by virtue of the above URL. My assumption is that this code is in the public domain by virtue of that URL.

=======================================================================

Modification History : Original version

=====================================================================*/

/*--------------------------------------------------------------------- Usage:

-- Create a synonym to this function in the desired database USE Dev GO

DROP SYNONYM [dbo].[fn_MoH_SplitDelimitedStringToTable] CREATE SYNONYM [dbo].[fn_MoH_SplitDelimitedStringToTable] FOR [master].[dbo].[fn_MoH_DelimitedSplit8K_LEAD]

SELECT ds.ItemNumber, CAST(ds.Item AS VARCHAR(5)) AS Item FROM fn_MoH_SplitDelimitedStringToTable(' ','foo bar blah') ds

-- default delimiter is a blank space, but a UDF require all parms to be specified, use "DEFAULT" for the default delimiter SELECT ds.ItemNumber, CAST(ds.Item AS VARCHAR(5)) AS Item FROM dbo.fn_MoH_SplitDelimitedStringToTable(DEFAULT,'foo bar blah') ds

Returns:

foo bar blah

as a table, with Column Name "Item".

SELECT ds.ItemNumber, CAST(ds.Item AS VARCHAR(5)) AS Item FROM master.dbo.fn_MoH_DelimitedSplit8K_LEAD(',','fie, fi,,,, fo, fum') ds

Returns:

fie fi fo fum

as a table, with Column Name "Item".

=======================================================================

-- Cross Apply with another table SELECT s.name, ds.Item FROM ( SELECT TOP 3 * FROM sys.databases ) s CROSS APPLY fn_MoH_SplitDelimitedStringToTable(' ','foo bar blah') ds

SELECT s.name, ds.Item FROM ( SELECT TOP 3 * FROM sys.databases ) s CROSS APPLY dbo.fn_MoH_SplitDelimitedStringToTable(' ',' foo bar blah ') ds

SELECT s.name, ds.Item FROM ( SELECT TOP 3 * FROM sys.databases ) s CROSS APPLY master.dbo.fn_MoH_DelimitedSplit8K_LEAD('^','foo^^bar^^^blah') ds

Returns foo/bar/blah cross joined (Cartesian Product) with the first three rows from sys.objects.

=======================================================================

-- Split a delimted string supplied by a table instead of hard coded IF OBJECT_ID('tempdb..#temp') IS NOT NULL DROP TABLE #temp

SELECT * INTO #temp FROM ( SELECT ' foo| bar| blah ' UNION ALL SELECT 'fie|fi|fo|fum' ) d (string)

SELECT * FROM #temp

SELECT t.string, ds.ItemNumber, ds.Item FROM #temp t CROSS APPLY master.dbo.fn_MoH_DelimitedSplit8K_LEAD('|',t.string) ds

Returns the delimited string data split by the supplied delimiter, with the string data supplied by a column in a table.

-- Nested query SELECT s.name, x.ItemNumber, x.Item FROM ( SELECT TOP 3 * FROM sys.databases ) s CROSS APPLY ( SELECT ds.* FROM #temp t CROSS APPLY master.dbo.fn_MoH_DelimitedSplit8K_LEAD('|',t.string) ds ) x

IF OBJECT_ID('tempdb..#temp') IS NOT NULL DROP TABLE #temp

Returns the delimited string data split by the supplied delimiter, with the string data supplied by a column in a table, cross joined (Cartesian Product) with the first three rows from sys.objects.

=======================================================================

----------------------------------------------------------------------- Notes:

The input delimited string is limited to 8000 characters.

Do not mark this user defined function as a system function via sp_MS_marksystemobject. This only works for stored procedures.

Instead, create a synonym in each database requiring this UDF. This will allow a two-level name to be used to call this UDF.

Alternatively, use a three-level name to call this UDF, since it does not require the calling database context.

---------------------------------------------------------------------*/

ALTER FUNCTION [dbo].[fn_MoH_DelimitedSplit8K_LEAD] --===== Define I/O parameters ( @pDelimiter CHAR(1) = ' ' , @pString VARCHAR(8000) ) --WARNING!!! DO NOT USE MAX DATA-TYPES HERE! IT WILL KILL PERFORMANCE! RETURNS TABLE WITH SCHEMABINDING AS RETURN --===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000... -- enough to cover VARCHAR(8000) WITH E1(N) AS ( SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 ), --10E+1 or 10 rows E2(N) AS ( SELECT 1 FROM E1 a, E1 b ), --10E+2 or 100 rows E4(N) AS ( SELECT 1 FROM E2 a, E2 b ), --10E+4 or 10,000 rows max cteTally(N) AS ( --==== This provides the "zero base" and limits the number of rows right up front -- for both a performance gain and prevention of accidental "overruns" SELECT 0 UNION ALL SELECT TOP (DATALENGTH(ISNULL(@pString,1))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4 ), cteStart(N1) AS ( --==== This returns N+1 (starting position of each "element" just once for each delimiter) SELECT t.N+1 FROM cteTally t WHERE (SUBSTRING(@pString,t.N,1) = @pDelimiter OR t.N = 0) ), cteResults AS ( --===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found. SELECT Item = SUBSTRING(@pString,s.N1,ISNULL(NULLIF((LEAD(s.N1,1,1) OVER (ORDER BY s.N1) - 1),0)-s.N1,8000)) FROM cteStart s ) --===== Strip leading and trailing spaces, only return non-empty results. SELECT ItemNumber = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), LTRIM(RTRIM(r.Item)) AS Item FROM cteResults r WHERE LEN(r.Item) > 0 ;

/******* END OF FILE *******/

Scott In Sydney Hall of Fame Points: 3331 More actions · Answer 4

Also, I've done a performance comparison of DelimitedSplit8K vs. DelimitedSplit8K_LEAD, given my use case and minor code changes noted in previous posts above.

DelimitedSplit8K:

SET STATISTICS TIME ON

SELECT src.FACILITY_STAY_EPISODE_ID ,CAST(ds.ItemNumber AS SMALLINT) AS ItemNumber ,CAST(ds.Item AS VARCHAR(8)) AS Item FROM trn.DIAGNOSIS src CROSS APPLY master.dbo.fn_MoH_DelimitedSplit8K('|', CONCAT(src.diagnosis_code_p,'|',src.diagnosis_code_sec) ) ds WHERE ds.Item IS NOT NULL

SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 6 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
(17277002 rows affected)
SQL Server Execution Times:
CPU time = 3927157 ms, elapsed time = 4867563 ms. (Which is 01:27:07)

DelimitedSplit8K_LEAD:

SET STATISTICS TIME ON

SELECT src.FACILITY_STAY_EPISODE_ID ,CAST(ds.ItemNumber AS SMALLINT) AS ItemNumber ,CAST(ds.Item AS VARCHAR(8)) AS Item FROM trn.DIAGNOSIS src CROSS APPLY master.dbo.fn_MoH_DelimitedSplit8K_LEAD('|', CONCAT(src.diagnosis_code_p,'|',src.diagnosis_code_sec) ) ds WHERE ds.Item IS NOT NULL

SQL Server parse and compile time:
CPU time = 11 ms, elapsed time = 11 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
(17277002 rows affected)
SQL Server Execution Times:
CPU time = 906594 ms, elapsed time = 1832653 ms. (Which is 00:30:22)

Hope this helps...

bevanward SSC Eights! Points: 855 More actions · Answer 5

Scott In Sydney - Tuesday, March 20, 2018 5:05 PM
bevanward - Tuesday, March 20, 2018 1:55 AM
Hi Scott
Thanks for your message! It is hard to know how long to look for something before starting to write. What Jeff wrote looks perfect.
Had you seen the modernised version that Jeff mentions above? Seems to be worth considering from the looks of the performance.
Thanks again keep well
Bevan
Hi Bevan,
Your article has triggered a dialogue which is a good thing - you've found Jeff's code, and no, I was not aware of the modernised version that Jeff mentioned. So your post has triggered further education on my part 🙂
Since I'm on SS 2012, I'll definitely look at the code from  Eirikur Eiriksson.
Not to hijack this post, but since it's related, I thought I'd share my use case.
I've got a source table where multiple values are stored in a pipe-delimited string. So, instead of columns foo1-foo50, there's a single foo column, with data "A|B|C|D|etc". It would be rare (but possible) to have a complete set of 50 tokens.
There are multiple such columns, and not always a one-to-one match of tokens within the different columns (i.e. foo might contain 20 tokens, bar might contain 15 tokens, and blah might contain 3 tokens).
I have to "unwind" this data, converting multiple tokens into multiple rows, essentially normalizing the data. A bit like UNPIVOT, but with a concatenated string rather than multiple columns.
Jeff's code has been invaluable in helping me to code this up. I never would have come up with his approach on my own.
Here is my code. fn_MoH_SplitDelimitedStringToTable is identical to fn_DelimitedSplit8K, just a different name.
CREATE VIEW [phi].[vwPROCEDURE_V] AS WITH cteProcedure_Code AS ( SELECT src.FACILITY_STAY_EPISODE_ID     ,CAST(ds.ItemNumber AS SMALLINT) AS ItemNumber     ,CAST(ds.Item AS VARCHAR(8)) AS Item FROM trn.PROCEDURE src CROSS APPLY dbo.fn_MoH_SplitDelimitedStringToTable('|',    CONCAT(ISNULL(src.procedure_code_p,'-'),'|',ISNULL(src.procedure_code_sec,'-')) ) ds ) , cteProcedure_Date AS ( SELECT src.FACILITY_STAY_EPISODE_ID     ,CAST(ds.ItemNumber AS SMALLINT) AS ItemNumber     ,CAST(ds.Item AS VARCHAR(9)) AS Item FROM trn.PROCEDURE src CROSS APPLY dbo.fn_MoH_SplitDelimitedStringToTable('|',    CONCAT(ISNULL(CONVERT(VARCHAR(8),src.procedure_date_p,112),'-'),'|',ISNULL(src.procedure_date_sec,'-')) ) ds ) , cteProcedure_Location AS ( SELECT src.FACILITY_STAY_EPISODE_ID     ,CAST(ds.ItemNumber AS SMALLINT) AS ItemNumber     ,CAST(ds.Item AS CHAR(1))   AS Item FROM trn.PROCEDURE src CROSS APPLY dbo.fn_MoH_SplitDelimitedStringToTable('|',    CONCAT(ISNULL(src.procedure_location_p,'-'),'|',ISNULL(src.procedure_location_sec,'-')) ) ds ) , cteBlock_Num AS ( SELECT src.FACILITY_STAY_EPISODE_ID     ,CAST(ds.ItemNumber AS SMALLINT) AS ItemNumber     ,CAST(ds.Item AS VARCHAR(4)) AS Item FROM trn.PROCEDURE src CROSS APPLY dbo.fn_MoH_SplitDelimitedStringToTable('|',    CONCAT(ISNULL(src.block_num_p,'-'),'|',ISNULL(src.block_num_sec,'-')) ) ds ) SELECT    -- Source    a.FACILITY_STAY_EPISODE_ID   ,a.FACILITY_STAY_ID   ,a.FACILITY_ID   ,a.clinical_codeset      -- Procedure Code   ,b.ItemNumber-1 AS [ROW_NUMBER]   ,b.ItemNumber-1 AS procedure_code_number   ,CASE WHEN b.Item='-' THEN NULL ELSE b.Item END AS procedure_code      -- Procedure Date   ,c.ItemNumber-1 AS procedure_date_number   ,CASE WHEN c.Item='-' THEN NULL ELSE CAST(c.Item AS DATE) END AS procedure_date      -- Procedure Location   ,d.ItemNumber-1 AS procedure_location_number   ,CASE WHEN d.Item='-' THEN NULL ELSE d.Item END AS procedure_location      -- Procedure Type   ,CAST(    CASE     WHEN b.ItemNumber-1 = 0 THEN 'P'     ELSE FORMAT(b.ItemNumber-1,'00')    END AS CHAR(2)    )      AS procedure_type       -- Block Number   ,e.ItemNumber-1 AS block_num_number   ,CASE WHEN e.Item='-' THEN NULL ELSE e.Item END AS block_num
-- Source FROM trn.PROCEDURE a
-- Procedure Code   LEFT JOIN cteProcedure_Code b ON a.FACILITY_STAY_EPISODE_ID=b.FACILITY_STAY_EPISODE_ID
-- Procedure Date LEFT JOIN cteProcedure_Date c ON a.FACILITY_STAY_EPISODE_ID=c.FACILITY_STAY_EPISODE_ID AND b.ItemNumber=c.ItemNumber
-- Procedure Location LEFT JOIN cteProcedure_Location d ON a.FACILITY_STAY_EPISODE_ID=d.FACILITY_STAY_EPISODE_ID AND b.ItemNumber=d.ItemNumber
-- Block Num LEFT JOIN cteBlock_Num e ON a.FACILITY_STAY_EPISODE_ID=e.FACILITY_STAY_EPISODE_ID AND b.ItemNumber=e.ItemNumber GO
The source table has about 5M rows; the output has about 17M rows. We don't have the fastest server, and the load takes about 4 hours to run. Any performance improvements (i.e. Eirikur's code) or improvements to my own approach will certainly help.
Regards,
Scott

Hi Scott

Just an initial thought that might be completely off but thought I should mention first up.

Can you run a bunch of batches in parallel?

If you can monopolize the resources on the server then ...

Run the original SQL2012 lag split for each column concurrently

For example run a bunch of these concurrently:

insert into tmp<field one>
select PrimaryKey, Ordinal, Item from <table> cross apply dbo.DelimitedSplit8K_LEAD(<field one>,'|')
delete tmp<field one> where Item is null

If you monitor your resource consumption on the server as you add more of these and measure overall time that may be the most simple way to get the improvement.

Then once it is complete recomposite the results into the final required shape?

I think with that number of records you should be able to get that processing time down a lot.

Will have more of look tonight when I get off

Thanks
Bevan

Eirikur Eiriksson SSC Guru Points: 182882 More actions · Answer 6

bevanward - Wednesday, March 21, 2018 5:22 AM
Scott In Sydney - Tuesday, March 20, 2018 5:05 PM
bevanward - Tuesday, March 20, 2018 1:55 AM
Hi Scott
Thanks for your message! It is hard to know how long to look for something before starting to write. What Jeff wrote looks perfect.
Had you seen the modernised version that Jeff mentions above? Seems to be worth considering from the looks of the performance.
Thanks again keep well
Bevan
Hi Bevan,
Your article has triggered a dialogue which is a good thing - you've found Jeff's code, and no, I was not aware of the modernised version that Jeff mentioned. So your post has triggered further education on my part 🙂
Since I'm on SS 2012, I'll definitely look at the code from  Eirikur Eiriksson.
Not to hijack this post, but since it's related, I thought I'd share my use case.
I've got a source table where multiple values are stored in a pipe-delimited string. So, instead of columns foo1-foo50, there's a single foo column, with data "A|B|C|D|etc". It would be rare (but possible) to have a complete set of 50 tokens.
There are multiple such columns, and not always a one-to-one match of tokens within the different columns (i.e. foo might contain 20 tokens, bar might contain 15 tokens, and blah might contain 3 tokens).
I have to "unwind" this data, converting multiple tokens into multiple rows, essentially normalizing the data. A bit like UNPIVOT, but with a concatenated string rather than multiple columns.
Jeff's code has been invaluable in helping me to code this up. I never would have come up with his approach on my own.
Here is my code. fn_MoH_SplitDelimitedStringToTable is identical to fn_DelimitedSplit8K, just a different name.
CREATE VIEW [phi].[vwPROCEDURE_V] AS WITH cteProcedure_Code AS ( SELECT src.FACILITY_STAY_EPISODE_ID     ,CAST(ds.ItemNumber AS SMALLINT) AS ItemNumber     ,CAST(ds.Item AS VARCHAR(8)) AS Item FROM trn.PROCEDURE src CROSS APPLY dbo.fn_MoH_SplitDelimitedStringToTable('|',    CONCAT(ISNULL(src.procedure_code_p,'-'),'|',ISNULL(src.procedure_code_sec,'-')) ) ds ) , cteProcedure_Date AS ( SELECT src.FACILITY_STAY_EPISODE_ID     ,CAST(ds.ItemNumber AS SMALLINT) AS ItemNumber     ,CAST(ds.Item AS VARCHAR(9)) AS Item FROM trn.PROCEDURE src CROSS APPLY dbo.fn_MoH_SplitDelimitedStringToTable('|',    CONCAT(ISNULL(CONVERT(VARCHAR(8),src.procedure_date_p,112),'-'),'|',ISNULL(src.procedure_date_sec,'-')) ) ds ) , cteProcedure_Location AS ( SELECT src.FACILITY_STAY_EPISODE_ID     ,CAST(ds.ItemNumber AS SMALLINT) AS ItemNumber     ,CAST(ds.Item AS CHAR(1))   AS Item FROM trn.PROCEDURE src CROSS APPLY dbo.fn_MoH_SplitDelimitedStringToTable('|',    CONCAT(ISNULL(src.procedure_location_p,'-'),'|',ISNULL(src.procedure_location_sec,'-')) ) ds ) , cteBlock_Num AS ( SELECT src.FACILITY_STAY_EPISODE_ID     ,CAST(ds.ItemNumber AS SMALLINT) AS ItemNumber     ,CAST(ds.Item AS VARCHAR(4)) AS Item FROM trn.PROCEDURE src CROSS APPLY dbo.fn_MoH_SplitDelimitedStringToTable('|',    CONCAT(ISNULL(src.block_num_p,'-'),'|',ISNULL(src.block_num_sec,'-')) ) ds ) SELECT    -- Source    a.FACILITY_STAY_EPISODE_ID   ,a.FACILITY_STAY_ID   ,a.FACILITY_ID   ,a.clinical_codeset      -- Procedure Code   ,b.ItemNumber-1 AS [ROW_NUMBER]   ,b.ItemNumber-1 AS procedure_code_number   ,CASE WHEN b.Item='-' THEN NULL ELSE b.Item END AS procedure_code      -- Procedure Date   ,c.ItemNumber-1 AS procedure_date_number   ,CASE WHEN c.Item='-' THEN NULL ELSE CAST(c.Item AS DATE) END AS procedure_date      -- Procedure Location   ,d.ItemNumber-1 AS procedure_location_number   ,CASE WHEN d.Item='-' THEN NULL ELSE d.Item END AS procedure_location      -- Procedure Type   ,CAST(    CASE     WHEN b.ItemNumber-1 = 0 THEN 'P'     ELSE FORMAT(b.ItemNumber-1,'00')    END AS CHAR(2)    )      AS procedure_type       -- Block Number   ,e.ItemNumber-1 AS block_num_number   ,CASE WHEN e.Item='-' THEN NULL ELSE e.Item END AS block_num
-- Source FROM trn.PROCEDURE a
-- Procedure Code   LEFT JOIN cteProcedure_Code b ON a.FACILITY_STAY_EPISODE_ID=b.FACILITY_STAY_EPISODE_ID
-- Procedure Date LEFT JOIN cteProcedure_Date c ON a.FACILITY_STAY_EPISODE_ID=c.FACILITY_STAY_EPISODE_ID AND b.ItemNumber=c.ItemNumber
-- Procedure Location LEFT JOIN cteProcedure_Location d ON a.FACILITY_STAY_EPISODE_ID=d.FACILITY_STAY_EPISODE_ID AND b.ItemNumber=d.ItemNumber
-- Block Num LEFT JOIN cteBlock_Num e ON a.FACILITY_STAY_EPISODE_ID=e.FACILITY_STAY_EPISODE_ID AND b.ItemNumber=e.ItemNumber GO
The source table has about 5M rows; the output has about 17M rows. We don't have the fastest server, and the load takes about 4 hours to run. Any performance improvements (i.e. Eirikur's code) or improvements to my own approach will certainly help.
Regards,
Scott
Hi Scott
Just an initial thought that might be completely off but thought I should mention first up.
Can you run a bunch of batches in parallel?
If you can monopolize the resources on the server then ...
Run the original SQL2012 lag split for each column concurrently
For example run a bunch of these concurrently:
insert into tmp<field one>
select PrimaryKey, Ordinal, Item from <table> cross apply dbo.DelimitedSplit8K_LEAD(<field one>,'|')
delete tmp<field one> where Item is null
If you monitor your resource consumption on the server as you add more of these and measure overall time that may be the most simple way to get the improvement.
Then once it is complete recomposite the results into the final required shape?
I think with that number of records you should be able to get that processing time down a lot.
Will have more of look tonight when I get off
Thanks
Bevan

Hi Scott,
can you post the code of the functions used please?
May have missed them if you've done so earlier, struggling with the connection so cannot see those

😎

bevanward SSC Eights! Points: 855 More actions · Answer 7

Eirikur Eiriksson - Wednesday, March 21, 2018 9:02 AM
bevanward - Wednesday, March 21, 2018 5:22 AM
Scott In Sydney - Tuesday, March 20, 2018 5:05 PM
bevanward - Tuesday, March 20, 2018 1:55 AM
Hi Scott
Thanks for your message! It is hard to know how long to look for something before starting to write. What Jeff wrote looks perfect.
Had you seen the modernised version that Jeff mentions above? Seems to be worth considering from the looks of the performance.
Thanks again keep well
Bevan
Hi Bevan,
Your article has triggered a dialogue which is a good thing - you've found Jeff's code, and no, I was not aware of the modernised version that Jeff mentioned. So your post has triggered further education on my part 🙂
Since I'm on SS 2012, I'll definitely look at the code from  Eirikur Eiriksson.
Not to hijack this post, but since it's related, I thought I'd share my use case.
I've got a source table where multiple values are stored in a pipe-delimited string. So, instead of columns foo1-foo50, there's a single foo column, with data "A|B|C|D|etc". It would be rare (but possible) to have a complete set of 50 tokens.
There are multiple such columns, and not always a one-to-one match of tokens within the different columns (i.e. foo might contain 20 tokens, bar might contain 15 tokens, and blah might contain 3 tokens).
I have to "unwind" this data, converting multiple tokens into multiple rows, essentially normalizing the data. A bit like UNPIVOT, but with a concatenated string rather than multiple columns.
Jeff's code has been invaluable in helping me to code this up. I never would have come up with his approach on my own.
Here is my code. fn_MoH_SplitDelimitedStringToTable is identical to fn_DelimitedSplit8K, just a different name.
CREATE VIEW [phi].[vwPROCEDURE_V] AS WITH cteProcedure_Code AS ( SELECT src.FACILITY_STAY_EPISODE_ID     ,CAST(ds.ItemNumber AS SMALLINT) AS ItemNumber     ,CAST(ds.Item AS VARCHAR(8)) AS Item FROM trn.PROCEDURE src CROSS APPLY dbo.fn_MoH_SplitDelimitedStringToTable('|',    CONCAT(ISNULL(src.procedure_code_p,'-'),'|',ISNULL(src.procedure_code_sec,'-')) ) ds ) , cteProcedure_Date AS ( SELECT src.FACILITY_STAY_EPISODE_ID     ,CAST(ds.ItemNumber AS SMALLINT) AS ItemNumber     ,CAST(ds.Item AS VARCHAR(9)) AS Item FROM trn.PROCEDURE src CROSS APPLY dbo.fn_MoH_SplitDelimitedStringToTable('|',    CONCAT(ISNULL(CONVERT(VARCHAR(8),src.procedure_date_p,112),'-'),'|',ISNULL(src.procedure_date_sec,'-')) ) ds ) , cteProcedure_Location AS ( SELECT src.FACILITY_STAY_EPISODE_ID     ,CAST(ds.ItemNumber AS SMALLINT) AS ItemNumber     ,CAST(ds.Item AS CHAR(1))   AS Item FROM trn.PROCEDURE src CROSS APPLY dbo.fn_MoH_SplitDelimitedStringToTable('|',    CONCAT(ISNULL(src.procedure_location_p,'-'),'|',ISNULL(src.procedure_location_sec,'-')) ) ds ) , cteBlock_Num AS ( SELECT src.FACILITY_STAY_EPISODE_ID     ,CAST(ds.ItemNumber AS SMALLINT) AS ItemNumber     ,CAST(ds.Item AS VARCHAR(4)) AS Item FROM trn.PROCEDURE src CROSS APPLY dbo.fn_MoH_SplitDelimitedStringToTable('|',    CONCAT(ISNULL(src.block_num_p,'-'),'|',ISNULL(src.block_num_sec,'-')) ) ds ) SELECT    -- Source    a.FACILITY_STAY_EPISODE_ID   ,a.FACILITY_STAY_ID   ,a.FACILITY_ID   ,a.clinical_codeset      -- Procedure Code   ,b.ItemNumber-1 AS [ROW_NUMBER]   ,b.ItemNumber-1 AS procedure_code_number   ,CASE WHEN b.Item='-' THEN NULL ELSE b.Item END AS procedure_code      -- Procedure Date   ,c.ItemNumber-1 AS procedure_date_number   ,CASE WHEN c.Item='-' THEN NULL ELSE CAST(c.Item AS DATE) END AS procedure_date      -- Procedure Location   ,d.ItemNumber-1 AS procedure_location_number   ,CASE WHEN d.Item='-' THEN NULL ELSE d.Item END AS procedure_location      -- Procedure Type   ,CAST(    CASE     WHEN b.ItemNumber-1 = 0 THEN 'P'     ELSE FORMAT(b.ItemNumber-1,'00')    END AS CHAR(2)    )      AS procedure_type       -- Block Number   ,e.ItemNumber-1 AS block_num_number   ,CASE WHEN e.Item='-' THEN NULL ELSE e.Item END AS block_num
-- Source FROM trn.PROCEDURE a
-- Procedure Code   LEFT JOIN cteProcedure_Code b ON a.FACILITY_STAY_EPISODE_ID=b.FACILITY_STAY_EPISODE_ID
-- Procedure Date LEFT JOIN cteProcedure_Date c ON a.FACILITY_STAY_EPISODE_ID=c.FACILITY_STAY_EPISODE_ID AND b.ItemNumber=c.ItemNumber
-- Procedure Location LEFT JOIN cteProcedure_Location d ON a.FACILITY_STAY_EPISODE_ID=d.FACILITY_STAY_EPISODE_ID AND b.ItemNumber=d.ItemNumber
-- Block Num LEFT JOIN cteBlock_Num e ON a.FACILITY_STAY_EPISODE_ID=e.FACILITY_STAY_EPISODE_ID AND b.ItemNumber=e.ItemNumber GO
The source table has about 5M rows; the output has about 17M rows. We don't have the fastest server, and the load takes about 4 hours to run. Any performance improvements (i.e. Eirikur's code) or improvements to my own approach will certainly help.
Regards,
Scott
Hi Scott
Just an initial thought that might be completely off but thought I should mention first up.
Can you run a bunch of batches in parallel?
If you can monopolize the resources on the server then ...
Run the original SQL2012 lag split for each column concurrently
For example run a bunch of these concurrently:
insert into tmp<field one>
select PrimaryKey, Ordinal, Item from <table> cross apply dbo.DelimitedSplit8K_LEAD(<field one>,'|')
delete tmp<field one> where Item is null
If you monitor your resource consumption on the server as you add more of these and measure overall time that may be the most simple way to get the improvement.
Then once it is complete recomposite the results into the final required shape?
I think with that number of records you should be able to get that processing time down a lot.
Will have more of look tonight when I get off
Thanks
Bevan
Hi Scott,
can you post the code of the functions used please?
May have missed them if you've done so earlier, struggling with the connection so cannot see those
😎

Hi Eirikur

Firstly thanks for your article Reaping the benefits of the Window functions in T-SQL! Jeff pointed it out to me after I had a go at writing a set-based split function and since have had a great education in taking it all much further.

My main need to adapt what you have written are:
1. Multiple character delimiters
2. Text longer than varchar(8000)

I was wondering if you can have a look at what I have come up with from your experience. It is already significantly faster than what I have had in use however would appreciate any comments if you get a chance.
Many thanks
Bevan

use tempdb go

drop function if exists DelimitedSplit_Max_Multi_LAG go

create function [dbo].[DelimitedSplit_Max_Multi_LAG]

/* This script is based on a modification of an excellent script by Eirikur Eiriksson http://www.sqlservercentral.com/articles/SQL+Server+2012/106903/

And commentary and excellent work from Jeff Moden http://www.sqlservercentral.com/articles/Tally+Table/72993/

Author: Bevan Ward Date: 20-Feb-2018 Purpose: Table function that takes a string of nvarchar(max) value and a delimiter of nvarchar(1000) and splits the String into records on the delimiter boundaries. If there are no instances of the delimiter then string is returned as a single ordinal. Dependency: SQL2012+ Based on: Version: 1

Syntax: select * from dbo.DelimitedSplit_Max_Multi_LAG('<input string>','<delimiter>') Example: select * from dbo.DelimitedSplit_Max_Multi_LAG('Apple,Orange,Pear,Grape',',') select a.name, b.ItemNumber, b.Item from sys.objects a cross apply dbo.DelimitedSplit_Max_Multi_LAG(name,'_') b

Pseudo code: 1. Find character position of delimiters - right edge as position + length of delimiter except for first position 2. Use LEAD to pull up the next sequence position and substring the piece from the string 3. Retun this result with a position number

Known issues: over printing delimiters are not catered for - for example the delimiter of '..' when splitting a string that includes ... is not catered for

Form of output: +----------+----+ |ItemNumber|Item| +----------+----+ | 1 | <> | +----------+----+ | n | <> | +----------+----+

*/

--===== Define I/O parameters (@pString NVARCHAR(max), @pDelimiter NVARCHAR(1000)) RETURNS TABLE WITH SCHEMABINDING AS RETURN --===== "Inline" CTE Driven "Tally Tableâ€ produces values from 0 up to 100,000,000... -- enough to cover a big file WITH E1(N) AS ( SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 ), --10E+1 or 10 rows E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows E6(N) AS (SELECT 1 FROM E4 a, E4 b), --10E+6 or 1,000,000 E8(N) AS (SELECT 1 FROM E6 a, E6 b), --10E+8 or 100,000,000 cteTally(N) AS (--==== This provides the "zero base" and limits the number of rows right up front -- for both a performance gain and prevention of accidental "overruns" SELECT 0 UNION ALL SELECT TOP (DATALENGTH(ISNULL(@pString,1))/2) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E8 ), cteStart(N1) AS (--==== Find right edge of delimiter SELECT t.N+case when t.N = 0 then 0 else datalength(@pDelimiter)/2 end FROM cteTally t WHERE (SUBSTRING(@pString,t.N,datalength(@pDelimiter)/2) = @pDelimiter OR t.N = 0) ) --===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found. SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY s.N1), Item = SUBSTRING(@pString,s.N1,ISNULL(NULLIF((LEAD(s.N1,1,1) OVER (ORDER BY s.N1) - case when s.N1=1 or count(*) over (partition by (select null)) = ROW_NUMBER() OVER(ORDER BY s.N1) then 1 else (datalength(@pDelimiter)/2) end),0)-s.N1,datalength(@pString)/2)) FROM cteStart s ; GO

Scott In Sydney Hall of Fame Points: 3331 More actions · Answer 8

bevanward - Wednesday, March 21, 2018 5:22 AM
Hi Scott
Just an initial thought that might be completely off but thought I should mention first up.
Can you run a bunch of batches in parallel?
If you can monopolize the resources on the server then ...
Run the original SQL2012 lag split for each column concurrently
For example run a bunch of these concurrently:
insert into tmp<field one>
select PrimaryKey, Ordinal, Item from <table> cross apply dbo.DelimitedSplit8K_LEAD(<field one>,'|')
delete tmp<field one> where Item is null
If you monitor your resource consumption on the server as you add more of these and measure overall time that may be the most simple way to get the improvement.
Then once it is complete recomposite the results into the final required shape?
I think with that number of records you should be able to get that processing time down a lot.
Will have more of look tonight when I get off
Thanks
Bevan

Hi Bevan,

Thanks for the idea. I suppose I could embed the separate queries into separate data sources in SSDT vs. using a view, and run those separate queries in parallel on SSDT. For various reasons, I like embedding the logic in views and using those views as a data source in SSDT where possible. But that doesn't mean I'm married to that approach, esp. for performance gains.

I've had limited success using temp tables in SSDT, but I think there are options I can use to make this possible (persistent connections). Of course, I could just create permanent tables in my database and then drop them.

Having said that, I'm quite happy with the performance gains from Eirikur's code.

I've had good success in SSDT using the Balanced Data Distributor in other jobs, but I've never used that with a view, especially with a complex view such as mine. I wonder if I could use the BDD using my current view to achieve parallel processing? One thing to consider is the BDD doesn't guarantee any order in the target table, but that shouldn't be an issue in my processing.

When I ran a single query for performance testing, it ran in 30 mins (see above post). When I ran the data load in my Dev environment overnight, the entire view / data load ran in 32 mins. So I'm wondering if splitting out the queries into parallel processes will gain me much?

Regards,
Scott

Scott In Sydney Hall of Fame Points: 3331 More actions · Answer 9

Eirikur Eiriksson - Wednesday, March 21, 2018 9:02 AM
Hi Scott,
can you post the code of the functions used please?
May have missed them if you've done so earlier, struggling with the connection so cannot see those
😎

Hi Eirikur,

Thanks for your code contributions to the community.

My code changes are here: https://www.sqlservercentral.com/Forums/Dialogs/PostDetails.aspx?PostID=1928604
You may also be interested in: https://www.sqlservercentral.com/Forums/Dialogs/PostDetails.aspx?PostID=1928608

Let me know if you still have problems seeing the code.

The crux of my changes are:

...

cteStart(N1) AS ( --==== This returns N+1 (starting position of each "element" just once for each delimiter) SELECT t.N+1 FROM cteTally t WHERE (SUBSTRING(@pString,t.N,1) = @pDelimiter OR t.N = 0) ), cteResults AS ( --===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found. SELECT Item = SUBSTRING(@pString,s.N1,ISNULL(NULLIF((LEAD(s.N1,1,1) OVER (ORDER BY s.N1) - 1),0)-s.N1,8000)) FROM cteStart s ) --===== Strip leading and trailing spaces, only return non-empty results. SELECT ItemNumber = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), LTRIM(RTRIM(r.Item)) AS Item FROM cteResults r WHERE LEN(r.Item) > 0

In summary, I've added an extra cteResults, moved the ROW_NUMBER() derivation to the final select statement, added a WHERE clause to only return non-blank tokens, and strip leading/trailing spaces from the results.

It appears to work correctly. But, it is critical for my processing that ROW_NUMBER() corresponds to the order of the tokens in the delimited string. This appears to be the case when spot checking source vs. target data.

Eirikur Eiriksson SSC Guru Points: 182882 More actions · Answer 10

Hi Guys,

and thanks for the nice comments, wouldn't want to take too much credit for this as it is a community (SSC or Sql Server Community) effort.

I got some suggestions:

1. The (n)varchar(max) can hold up to (2^31) - 1 or 2147483647 characters, the tally table must be able to produce that number, optimal construct would be using 15^8 or something like
WITH T(N) AS (SELECT X.N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) X(N)) , NUMS(N) AS (SELECT 0 UNION ALL SELECT TOP (DATALENGTH(ISNULL(@pString,1))/2) ROW_NUMBER() OVER (ORDER BY @@VERSION) FROM T T1,T T2,T T3,T T4,T T5,T T6,T T7,T T8)

2. Consider using a single character token as a delimiter and replace the multi character delimiter with the token before passing it to the function. This is straight forward as one can encapsulate the single character token function in another function.

3. One of the reasons for the LEAD/LAG versions performing so well is that they utilize the special worktable, but it has limits. If the number of rows exceed that limit, then the benefits are minimal or even negative.

😎

I've done some extensive work on this and even have an article draft with code and a test harness somewhere. Will try to find it and post back when I have the chance.

bevanward SSC Eights! Points: 855 More actions · Answer 11

Eirikur Eiriksson - Thursday, March 22, 2018 4:14 AM
Hi Guys,
and thanks for the nice comments, wouldn't want to take too much credit for this as it is a community (SSC or Sql Server Community) effort.
I got some suggestions:
1. The (n)varchar(max) can hold up to (2^31) - 1 or 2147483647 characters, the tally table must be able to produce that number, optimal construct would be using 15^8 or something like
WITH T(N) AS (SELECT X.N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) X(N)) , NUMS(N) AS (SELECT 0 UNION ALL SELECT TOP (DATALENGTH(ISNULL(@pString,1))/2) ROW_NUMBER() OVER (ORDER BY @@VERSION) FROM T T1,T T2,T T3,T T4,T T5,T T6,T T7,T T8)
2. Consider using a single character token as a delimiter and replace the multi character delimiter with the token before passing it to the function. This is straight forward as one can encapsulate the single character token function in another function.
3. One of the reasons for the LEAD/LAG versions performing so well is that they utilize the special worktable, but it has limits. If the number of rows exceed that limit, then the benefits are minimal or even negative.
😎
I've done some extensive work on this and even have an article draft with code and a test harness somewhere. Will try to find it and post back when I have the chance.

Hi Eirikur

Thanks for this - most appreciated.

I'll update the tally table to your new method - the order by @@version is something I haven't seen before but am sure is better then order by (select null).

The replacement prior to running a singe delimiter makes a lot of sense as it will stop the need for testing for overlapping delimiters too.

The way I use it means that I'm not sure on the data size so I'll have to start doing some more complete testing once I've made these changes.

Thanks again - the way you have collated, documented and enhanced the community work is excellent and appreciated by many!
Bevan