Tally OH! An Improved SQL 8K “CSV Splitter” Function

  • Alan.B (11/5/2013)


    Forgive me if the code to do this has already been posted but I had a requirement for a greater-than-8k splitter. I made the following changes:

    1) Changed @pString to varchar(max)

    2) Added more rows to the tally table (E8(N))

    3) Changed

    ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1, 8000)

This seems correct and produces the right result. It runs about twice as slow, which is expected and acceptable (it's still much, much faster than what I am replacing). Other than the BIN collation changes that MM mentioned a few pages back, does the code above look good?
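For reference, the expression above computes each element's length: CHARINDEX finds the next delimiter, NULLIF turns the "not found" 0 into NULL, and ISNULL supplies a fallback length for the final element. In a varchar(max) variant, the hard-coded 8000 fallback is typically replaced with the string's own length, along these lines (a sketch of the idea, not necessarily Alan's exact change):

```sql
-- Element length: distance from the element's start (s.N1) to the next
-- delimiter; when CHARINDEX finds no further delimiter (NULLIF maps 0 to
-- NULL), fall back to the full string length instead of a hard-coded 8000.
ISNULL(NULLIF(CHARINDEX(@pDelimiter, @pString, s.N1), 0) - s.N1,
       DATALENGTH(@pString))
```

SUBSTRING clamps an over-long length to the end of the string, so a fallback that overshoots the remaining length is harmless.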

    Second Question:

I have a requirement where I need to split a value in a table that may or may not contain values over 8K characters. If there aren't any rows with >8K characters I want it to use the 8K version; otherwise it requires the varchar(max) version (dbo.DelimitedSplitVcMax). Is this a good approach?

    --DDL

    CREATE TABLE dbo.tbl (id int primary key, val varchar(max) not null);

    GO

    IF NOT EXISTS(SELECT * FROM tbl WHERE DATALENGTH(val)>=8000)

    select s.Item

    FROM tbl v

    CROSS APPLY dbo.DelimitedSplit8K(v.val,',') s

    ELSE

    select s.Item

    FROM tbl v

    CROSS APPLY dbo.DelimitedSplitVcMax(v.val,',') s

Those are mostly the changes I made, and I've been happy with it (like you said, performance wasn't an issue as it's not used in an OLTP environment).

    /*

See this article sometime for improvements to this code: http://www.sqlservercentral.com/articles/Tally+Table/72993/

    Jeff Moden 2011-05-02

    */

    CREATE FUNCTION udf_StrList2Table (

    @List NVARCHAR(MAX) = N'',

    @Delimiter NCHAR(1) = N','

    )

    RETURNS TABLE WITH SCHEMABINDING

    AS

    RETURN

    /*

The following inline delimited splitter was written by Jeff Moden.

    http://www.sqlservercentral.com/articles/Tally+Table/72993/

    */

    WITH E1(N) AS

    ( --=== Create Ten 1's

    SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 --10

    ),

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --100

    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10,000

    E8(N) AS

    (

    --==== Limits the number of rows right up front for both a performance gain and prevention of accidental "overruns"

    SELECT TOP (DATALENGTH(ISNULL(@List,1))) 1 FROM E4 a, E4 b -- 100,000,000

    ),

    cteTally(N) AS

    ( --==== This provides the "zero base"

    SELECT 0 UNION ALL

    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E8

    ),

    cteStart(N1) AS

    ( --==== This returns N+1 (starting position of each "element" just once for each delimiter)

    SELECT t.N+1 FROM cteTally t WHERE (SUBSTRING(@List,t.N,1) = @Delimiter OR t.N = 0)

    )

    --===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.

    SELECT

    ROW_NUMBER() OVER(ORDER BY s.N1) as ListPos,

    SUBSTRING(@List,s.N1,ISNULL(NULLIF(CHARINDEX(@Delimiter,@List,s.N1),0)-s.N1,DATALENGTH(ISNULL(@List,1)))) as Value

    FROM cteStart s

    GO

    GRANT SELECT ON [dbo].[udf_StrList2Table] TO [public]

    --Example: SELECT * FROM udf_StrList2Table('First,Second,Third,Fourth,Fifth,Sixth',',')

    --Example: SELECT * FROM udf_StrList2Table('First' + CHAR(9) + 'Second'+ CHAR(9) + 'Third',CHAR(9))

    With your question about reasonableness, it's up to you. If performance wasn't an issue, why maintain two different functions? If it does matter, you can create a calculated column on DATALENGTH and a filtered index on that column > 8000 so that the EXISTS can be instant.
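A sketch of that calculated-column idea (a computed column in SQL Server terms; the column and index names here are illustrative, not from the thread). One caveat: SQL Server does not allow a filtered index predicate to reference a computed column, so the simplest route is a plain index on the persisted column, which still turns the EXISTS probe into a cheap index seek:

```sql
-- Persist the length so it can be indexed.
ALTER TABLE dbo.tbl ADD val_len AS DATALENGTH(val) PERSISTED;

CREATE INDEX IX_tbl_val_len ON dbo.tbl (val_len);

-- The routing check is now an index seek rather than a table scan.
IF EXISTS (SELECT 1 FROM dbo.tbl WHERE val_len > 8000)
    PRINT 'use dbo.DelimitedSplitVcMax';
ELSE
    PRINT 'use dbo.DelimitedSplit8K';
```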

    /* Anything is possible but is it worth it? */

  • Thanks!

    With your question about reasonableness, it's up to you. If performance wasn't an issue, why maintain two different functions? If it does matter, you can create a calculated column on DATALENGTH and a filtered index on that column > 8000 so that the EXISTS can be instant.

    That last question was just me thinking out loud... I'll actually be implementing the 8K version and will handle the one scenario where I need to handle the longer strings a little differently. Great idea with the calculated column. Thanks!

"I can't stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Alan.B (11/5/2013)


    With your question about reasonableness, it's up to you. If performance wasn't an issue, why maintain two different functions? If it does matter, you can create a calculated column on DATALENGTH and a filtered index on that column > 8000 so that the EXISTS can be instant.

    That last question was just me thinking out loud... I'll actually be implementing the 8K version and will handle the one scenario where I need to handle the longer strings a little differently. Great idea with the calculated column. Thanks!

    Hi there. Well, as long as we are talking about "reasonableness" ;-), why not just use a SQLCLR split function? It has been noted in this forum, as well as in the article, that splitting strings is more efficient and more scalable in .Net / CLR. Using SQLCLR you would not have to worry about <= 8k vs > 8k in terms of either the delimited string or its elements, nor about VARCHAR vs NVARCHAR, nor about adding schema (computed column and index) to determine which of the two functions to call (and would you want to replicate that field and index to other tables in the future if you need to split values in additional places?).

    You can find numerous examples of SQLCLR code for a split function. I believe Jeff attached one from Paul White somewhere in this forum (I don't see it attached to the article), or you can use your preferred search engine to find any one of 1000 blogs on it (I believe one from Adam Machanic shows up towards the top of the list), or you can just download SQL# and have a split function within minutes without doing any coding :-).

    Take care,

    Solomon...

SQL# — https://SQLsharp.com/ (SQLCLR library of over 340 functions and procedures)
Sql Quantum Lift — https://SqlQuantumLift.com/ (company)
Sql Quantum Leap — https://SqlQuantumLeap.com/ (blog)
Info sites: Collations • Module Signing • SQLCLR

  • Solomon Rutzky (11/5/2013)


    Alan.B (11/5/2013)


    With your question about reasonableness, it's up to you. If performance wasn't an issue, why maintain two different functions? If it does matter, you can create a calculated column on DATALENGTH and a filtered index on that column > 8000 so that the EXISTS can be instant.

    That last question was just me thinking out loud... I'll actually be implementing the 8K version and will handle the one scenario where I need to handle the longer strings a little differently. Great idea with the calculated column. Thanks!

    Hi there. Well, as long as we are talking about "reasonableness" ;-), why not just use a SQLCLR split function? It has been noted in this forum, as well as in the article, that splitting strings is more efficient and more scalable in .Net / CLR. Using SQLCLR you would not have to worry about <= 8k vs > 8k in terms of either the delimited string or its elements, nor about VARCHAR vs NVARCHAR, nor about adding schema (computed column and index) to determine which of the two functions to call (and would you want to replicate that field and index to other tables in the future if you need to split values in additional places?).

    You can find numerous examples of SQLCLR code for a split function. I believe Jeff attached one from Paul White somewhere in this forum (I don't see it attached to the article), or you can use your preferred search engine to find any one of 1000 blogs on it (I believe one from Adam Machanic shows up towards the top of the list), or you can just download SQL# and have a split function within minutes without doing any coding :-).

    Take care,

    Solomon...

Thanks, Solomon.

I was actually replacing a CLR :-P. The splitter I was using is the one that comes with Master Data Services (mdq.Split), out of the [Microsoft.MasterDataServices.DataQuality] assembly. Our CLR integration may be getting shut off and I was being proactive. It's not the fastest CLR splitter I've ever used (though it is faster than any T-SQL splitter I have ever used*), but it's the one that Microsoft ships with Master Data Services/Data Quality Services: no new DLL to compile, no new assembly to create, it's been well tested, and known bugs have been worked out...

    * All that said, I never did a comparison test between mdq.split and delimitedsplit8k.

    Using these two variations of mdq.split (& using NVARCHAR because it's a CLR :rolleyes: )...

    CREATE FUNCTION [mdq].[Split](@input [nvarchar](4000), @separators [nvarchar](10), @removeEmpty [bit], @tokenPattern [nvarchar](4000), @mask [tinyint])

    RETURNS TABLE (

    [Sequence] [int] NULL,

    [Token] [nvarchar](4000) NULL,

    [IsValid] [bit] NULL

    ) WITH EXECUTE AS CALLER

    AS

    EXTERNAL NAME [Microsoft.MasterDataServices.DataQuality].[Microsoft.MasterDataServices.DataQuality.SqlClr].[SplitWithCheck]

    GO

    CREATE FUNCTION [mdq].[SplitVCMax](@input [nvarchar](max), @separators [nvarchar](10), @removeEmpty [bit], @tokenPattern [nvarchar](4000), @mask [tinyint])

    RETURNS TABLE (

    [Sequence] [int] NULL,

    [Token] [nvarchar](max) NULL,

    [IsValid] [bit] NULL

    ) WITH EXECUTE AS CALLER

    AS

    EXTERNAL NAME [Microsoft.MasterDataServices.DataQuality].[Microsoft.MasterDataServices.DataQuality.SqlClr].[SplitWithCheck]

    ...and my varchar(max) version of Jeff's splitter (posted earlier)

    I used this code for testing (which is similar to what I am doing):

    IF OBJECT_ID('tempdb..#vals') IS NOT NULL DROP TABLE #vals;

    DECLARE @n int=5000

    SELECT n AS id,

    REPLACE(newid(),'-',',')+

    REPLACE(newid(),'-',',')+

    REPLACE(newid(),'-',',')+

    REPLACE(newid(),'-',',')+

    REPLACE(newid(),'-',',')+

    REPLACE(newid(),'-',',')+

    REPLACE(newid(),'-',',')+

    REPLACE(newid(),'-',',')

    AS val

    INTO #vals

    FROM tally

    WHERE n<=@n

    ALTER TABLE #vals

    ADD CONSTRAINT vals_xxx PRIMARY KEY(id);

    GO

    SET STATISTICS TIME OFF;

    SET NOCOUNT ON;

    SET STATISTICS TIME ON;

    select v.id, s.Item

    FROM #vals v

    CROSS APPLY dbo.DelimitedSplit8K(v.val,',') s

    select v.id, s.Item

    FROM #vals v

    CROSS APPLY dbo.DelimitedSplitVcMax(v.val,',') s

    SELECT v.id, s.Token

    FROM #vals v

    CROSS APPLY mdq.split(v.val,',',0,NULL,0) s;

    SELECT v.id, s.Token

    FROM #vals v

    CROSS APPLY mdq.splitVcMax(v.val,',',0,NULL,0) s;

    SET STATISTICS IO OFF;

    GO

    I got these results:

    --jeff8k

    SQL Server Execution Times:

    CPU time = 1404 ms, elapsed time = 3713 ms.

    --vcmax version

    SQL Server Execution Times:

    CPU time = 6412 ms, elapsed time = 7768 ms.

    --mdq original

    SQL Server Execution Times:

    CPU time = 858 ms, elapsed time = 5777 ms.

    --mdq vcmax

    SQL Server Execution Times:

    CPU time = 6303 ms, elapsed time = 9011 ms.

It's worth noting that these results include the Latin1_General_BIN collation change that Mr. Magoo demonstrated a few pages back. That said, Jeff's splitter is faster than the CLR that ships with Microsoft MDS/DQS.

P.S. Here's a link to information about mdq.Split in case anyone is curious: mdq.Split (Transact-SQL). I included the link because "mdq.split" no longer comes up in Google or Yahoo (or Bing :laugh:) searches...

"I can't stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Gatekeeper (11/5/2013)


    Alan.B (11/5/2013)


Forgive me if the code to do this has already been posted but I had a requirement for a greater-than-8k splitter. ...

With your question about reasonableness, it's up to you. If performance wasn't an issue, why maintain two different functions? If it does matter, you can create a calculated column on DATALENGTH and a filtered index on that column > 8000 so that the EXISTS can be instant.

    Alan - You may want to also consider something like this:

    --DDL

    CREATE TABLE dbo.tbl (id int primary key, val varchar(max) not null);

    GO

    SELECT s.Item

    INTO #Temp

    FROM tbl v

    CROSS APPLY dbo.DelimitedSplit8K(v.val,',') s

    WHERE DATALENGTH(val) <= 8000;

    --IF EXISTS(SELECT * FROM tbl WHERE DATALENGTH(val)>=8000)

    INSERT INTO #Temp

    SELECT s.Item

    FROM tbl v

    CROSS APPLY dbo.DelimitedSplitVcMax(v.val,',') s

    WHERE DATALENGTH(val) > 8000;

    SELECT Item

    FROM #Temp;

This applies the correct splitter to each row. Or use Solomon's suggestion.

    Edit: Using IF EXISTS (as commented out) would be optional.


My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.

  • Thanks Dwain!

"I can't stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • You can go one step further with Dwain's code and remove the table and take the two selects and combine them into a union all.
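Dropping the temp table as suggested, Dwain's two statements collapse into a single query (a sketch based on the code above):

```sql
-- Route each row to the splitter suited to its length, in one statement.
SELECT s.Item
FROM tbl v
CROSS APPLY dbo.DelimitedSplit8K(v.val, ',') s
WHERE DATALENGTH(v.val) <= 8000
UNION ALL
SELECT s.Item
FROM tbl v
CROSS APPLY dbo.DelimitedSplitVcMax(v.val, ',') s
WHERE DATALENGTH(v.val) > 8000;
```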

    /* Anything is possible but is it worth it? */

  • Just to make sure that everyone knows where I stand on this...

There is absolutely no doubt in my mind that the SQLCLR version, like the one Paul White wrote for me for this article, is the best way to go. It doesn't have the performance problem that occurs when you join a Tally structure to a MAX datatype, it handles > 8000 characters as efficiently as it does <= 8000 characters, and it'll take either VARCHAR or NVARCHAR in stride without having to make any special consideration.

There ARE, however, shops that still won't allow SQLCLR and, if you can operate within the restrictions of the DelimitedSplit8K function, it's the fastest T-SQL-only function that you're likely to run across and makes a decent, if single-purposed, second to the SQLCLR function.

    To be sure, DelimitedSplit8K and its various permutations weren't designed to be uber flexible. They each were designed (like many intrinsic functions) to do one and only one thing well... split a string based on a single character delimiter using only TSQL.

    I also want to thank all of the people that have put time and effort into this community-designed function over the years. As has been suggested in this discussion, I'm doing some additional testing (when I have the time) with adding the binary collation as well as a slightly different take on the cCTE for creating the Tally structure to see if we can get a bit more performance.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (11/6/2013)


    I'm doing some additional testing (when I have the time) with adding the binary collation as well as a slightly different take on the cCTE for creating the Tally structure to see if we can get a bit more performance.

    Great to hear! Idle hands are the devil's workshop.

    Can't wait to see the next release of one of my favorite toolbox FUNCTIONS.


My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.

  • Hey Moden,

    Just tried this for the first time. Bravo. Clean and very speedy!

    Thanks

    John.

Thanks for your post. I wonder if you have considered using the LEAD function to improve your split. The LEAD function is significantly faster than CHARINDEX. Below is a function that performs much better, especially when handling a large number of rows. Also, my split handles multi-character delimiters.

CREATE FUNCTION [dbo].[BetterSplit]
(@L NVARCHAR(MAX), @D NVARCHAR(100))
RETURNS TABLE AS RETURN
WITH A(N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1) -- 5 rows
,B(N) AS (SELECT 1 FROM A, A B) -- 25 rows
,C(N) AS (SELECT 1 FROM B A, B) -- 625 rows
,D(N) AS (SELECT 1 FROM C A, C) -- 390,625 rows
,T(N) AS (SELECT TOP (ISNULL(LEN(@L), 0)) ROW_NUMBER() OVER (ORDER BY (SELECT 1)) FROM D)
,S AS (SELECT S = 1 UNION ALL SELECT S = N + LEN(@D) FROM T WHERE SUBSTRING(@L, N, LEN(@D)) = @D)
,L AS (SELECT S, L = LEAD(S, 1, LEN(@L) + LEN(@D) + 1) OVER (ORDER BY S) - S - LEN(@D) FROM S)
SELECT Value = LTRIM(RTRIM(SUBSTRING(@L, S, L))) FROM L

  • Miller (4/13/2014)


    Thanks for your post. I wonder if you considered using the LEAD function to improve your split. The LEAD function is significantly faster than CHARINDEX. Below is a function that performs much better especially when handling large number of rows. Also, my split handles multi-character delimiters.


    Wonderful suggestion and, yes, I have indeed considered it, but someone already beat me to it. Please see the following article.

    http://www.sqlservercentral.com/articles/SQL+Server+2012/106903/

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (4/13/2014)


    Miller (4/13/2014)


    Thanks for your post. I wonder if you considered using the LEAD function to improve your split. The LEAD function is significantly faster than CHARINDEX. Below is a function that performs much better especially when handling large number of rows. Also, my split handles multi-character delimiters.


    Wonderful suggestion and, yes, I have indeed considered it, but someone already beat me to it. Please see the following article.


    Broken or nonexistent link.

  • Thanks Jeff,

    I think you forgot to add the link in your quote.

  • Lynn Pettis (4/13/2014)


    Jeff Moden (4/13/2014)


    Miller (4/13/2014)


    Thanks for your post. I wonder if you considered using the LEAD function to improve your split. The LEAD function is significantly faster than CHARINDEX. Below is a function that performs much better especially when handling large number of rows. Also, my split handles multi-character delimiters.


    Wonderful suggestion and, yes, I have indeed considered it, but someone already beat me to it. Please see the following article.


    Broken or nonexistent link.

    Thanks, Lynn. I've repaired the link.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 556 through 570 (of 990 total)
