Convert a string list to TABLE with single CTE query

  • Comments posted to this topic are about the item Convert a string list to TABLE with single CTE query

  • This is a very nice function. But, to even consider its use, it would be helpful to know how it compares in terms of execution performance to other similar functions that have been written before.

    For example I have 2 versions (version 1, version 2 below) of this split function. Would be great if someone is able to provide some metrics and a nice writeup. I take no credit for writing either of them. Beings with higher SQL mojo deserve the credit. (version 3 is code from the article wrapped in a function).

    NOTE: It also looks like version 2 and 3 are very similar. Version 2 will not work with a recursion level greater than 100. I use it only on short CSV strings. Version 1 is still my goto function.

    simple test run with 110 items.

    SET STATISTICS TIME ON;

    SELECT * FROM dbo.split('0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9', DEFAULT)

    SET STATISTICS TIME OFF;

    SET STATISTICS TIME ON;

    SELECT * FROM dbo.split2('0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9', DEFAULT)

    SET STATISTICS TIME OFF;

    SET STATISTICS TIME ON;

    SELECT * FROM dbo.split3('0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9', DEFAULT)

    SET STATISTICS TIME OFF;

    version 1: <=== this version uses a "Tally Table" named dbo.tbl_Number. You need to create one for this version to work. I've included a stored procedure at the end that will create this for you.

    CREATE FUNCTION [dbo].[split]

    (

    @DelimitedString AS VARCHAR(MAX),

    @Delimiter AS CHAR = N','

    )

    RETURNS TABLE

    AS

    RETURN(

    SELECTSUBSTRING(

    @Delimiter + @DelimitedString + @Delimiter,

    NumberId + 1,

    CHARINDEX(@Delimiter, @Delimiter + @DelimitedString + @Delimiter, NumberId + 1) - NumberId - 1

    ) AS Token

    FROMdbo.tbl_Number WITH (NOLOCK)

    WHERENumberId >= 1

    ANDNumberId < LEN(@Delimiter + @DelimitedString + @Delimiter) - 1

    ANDSUBSTRING(@Delimiter + @DelimitedString + @Delimiter, NumberId, 1) = @Delimiter

    )

    version 2:

    CREATE FUNCTION [dbo].[split2]

    (

    @s-2VARCHAR(512),

    @sepCHAR(1) = N','

    )

    RETURNS TABLE

    AS

    RETURN (WITH Pieces(pn, start, stop) AS (

    SELECT1,

    1,

    CHARINDEX(@sep, @s-2)

    UNION ALL

    SELECTpn + 1,

    stop + 1,

    CHARINDEX(@sep, @s-2, stop + 1)

    FROMPieces

    WHEREstop > 0

    )

    SELECTpn,

    SUBSTRING(@s, start, CASE WHEN stop > 0 THEN stop - start ELSE 512 END) AS s

    FROMPieces

    )

    version 3 <=== this is the same code provided in the article wrapped in a table-function

    CREATE FUNCTION [dbo].[split3]

    (

    @DelimitedString AS VARCHAR(MAX),

    @Delimiter AS CHAR = N','

    )

    RETURNS @Result TABLE(

    TokenVARCHAR(128)

    )

    AS

    BEGIN

    WITH rep(item, delim) AS

    (

    SELECT@DelimitedString AS item,

    @Delimiter AS delim

    UNION ALL

    SELECTLEFT(item, CHARINDEX(delim, item, 1) - 1) AS item,

    delim

    FROMrep

    WHERECHARINDEX(delim, item, 1) > 0

    UNION ALL

    SELECTRIGHT(item, LEN(item) - CHARINDEX(delim, item, 1)) AS item,

    delim

    FROMrep

    WHERECHARINDEX(delim, item, 1) > 0

    )

    INSERT INTO @Result(Token)

    SELECTitem

    FROMrep

    WHERECHARINDEX(delim, item, 1) = 0

    OPTION(MAXRECURSION 0); -- Needed to handle recursion levels greater than 100. By default SQL Server limits recursion levels to 100.

    RETURN;

    END

    Tally Table

    /*EXEC dbo.CreateNumbersTable DEFAULT

    SELECT * FROM dbo.tbl_Number

    \*/

    CREATE PROCEDURE [dbo].[CreateNumbersTable]

    @EndingNumberINT = 65535

    AS

    BEGIN

    SET NOCOUNT ON

    DECLARE @IsDeleteTableBIT

    SELECT @IsDeleteTable = dbo.ufn_IsTableExist('tbl_Number')

    IF @IsDeleteTable = 1

    DROP TABLE dbo.tbl_Number

    CREATE TABLE dbo.tbl_Number

    (

    NumberId INT IDENTITY(1,1) NOT NULL,

    CONSTRAINT PK_tbl_Number PRIMARY KEY CLUSTERED

    (

    NumberId ASC

    ) WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]

    ) ON [PRIMARY]

    WHILE COALESCE(SCOPE_IDENTITY(), 0) < @EndingNumber

    BEGIN

    INSERT dbo.tbl_Number DEFAULT VALUES

    END

    END

    /*SELECT dbo.ufn_IsTableExist('tbl_Number')

    \*/

    ALTER FUNCTION [dbo].[ufn_IsTableExist]

    (

    @TableName SYSNAME

    )

    RETURNS BIT

    AS

    BEGIN

    RETURN(SELECTCASE WHEN EXISTS(SELECT1

    FROMINFORMATION_SCHEMA.TABLES

    WHERETABLE_TYPE = N'BASE TABLE'

    ANDTABLE_NAME = @TableName)

    THEN 1

    ELSE 0

    END);

    END

  • Sandor,

    Have a look at Jeff Moden's splitter testing. You'll find that the inline tally table version is quite fast without needing a helper table and he's done all the testing you need.

    http://www.sqlservercentral.com/articles/Tally+Table/72993/

    Another article: http://www.simple-talk.com/sql/t-sql-programming/clr-performance-testing/

    /* Anything is possible but is it worth it? */

  • Gatekeeper,

    Thank you for the links.

    -sandor

  • Try this - it works faster for me than using a tally table - it converts the string to xml, then selects from the XML

    CREATE FUNCTION [dbo].[Split]

    (

    @delimited nvarchar(max),

    @delimiter nvarchar(100)

    ) RETURNS @t TABLE

    (

    val nvarchar(max)

    )

    AS

    BEGIN

    declare @xml xml

    set @xml = N'<root><r>' + replace(@delimited,@delimiter,'</r><r>') + '</r></root>'

    insert into @t(val)

    select

    r.value('.','nvarchar(max)') as item

    from @xml.nodes('//root/r') as records(r)

    RETURN

    END

  • Just tried the XML version with a short test string.

    SELECT * FROM dbo.Split(N'03680027,08563027,03682027,03677003,03683027,17948027,03702003,17948003,15636003,03620003,03299003,03014003,03679027,08563003,03013003,15636027,00781003,35879049,03682003,03683003,03679003,03680003,30441003,02747049,02748049,02749049,03300003,02981003,02981027,02981004,02982003,02982027,02982004,02983003,02983027,02983004,06921003,02974003,02974027,02974004,02975003,02975027,02975004,02973003,02973027,02973004', N',');

    Looks really good. Seems to perform just as quickly as the one I use with the Tally Table.

    Thank you.

    Sandor

  • I tried it the other day with a very long delimited list of several thousand items - a tally based function took about 3 seconds, the xml one took 0 seconds

  • Thanks for the script.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply