Split string into table, separator can be more than 1 char

  • Comments posted to this topic are about the item Split string into table, separator can be more than 1 char

  • I like your function, because you use a CTE.:-D

    I've been learning how and when to use CTE's and this is just another item that has helped me to better understand them.

    This funciton has been done before, but I haven't seen it using a CTE.

    Thanks!

  • Not a bad solution but using a recursive CTE and a multi-statement TVF this function is not very scalable.

    Here is another solution to this problem:

    CREATE function [dbo].[DelimitedSplit] (

    @pString varchar(max),

    @pDelimiter char(1)

    )

    returns table

    as

    return

    with

    a1 as (select 1 as N union all

    select 1 union all

    select 1 union all

    select 1 union all

    select 1 union all

    select 1 union all

    select 1 union all

    select 1 union all

    select 1 union all

    select 1),

    a2 as (select

    1 as N

    from

    a1 as a

    cross join a1 as b),

    a3 as (select

    1 as N

    from

    a2 as a

    cross join a2 as b),

    a4 as (select

    1 as N

    from

    a3 as a

    cross join a2 as b),

    Tally as (select top (len(@pString))

    row_number() over (order by N) as N

    from

    a4),

    ItemSplit(

    ItemOrder,

    Item

    ) as (

    SELECT

    N,

    SUBSTRING(@pDelimiter + @pString + @pDelimiter,N + 1,CHARINDEX(@pDelimiter,@pDelimiter + @pString + @pDelimiter,N + 1) - N - 1)

    FROM

    Tally

    WHERE

    N < LEN(@pDelimiter + @pString + @pDelimiter)

    AND SUBSTRING(@pDelimiter + @pString + @pDelimiter,N,1) = @pDelimiter --Notice how we find the delimiter

    )

    select

    row_number() over (order by ItemOrder) as ItemID,

    Item

    from

    ItemSplit

  • It's a handy function, and a well written article. But, although I much prefer CTEs to subqueries, recursive CTEs are notoriously slow for this type of problem. Use of a tally table (also known as number table) for parsing delimited strings will do the same work in a fraction of the time, and there are even faster techniques in certain situations.

    If you are not familiar with tally tables, it is really worth your while to become familiar with them. Jeff Moden's excellent article on the subject can be found here[/url].

    For an amazing group discussion about parsing delimited strings, look here.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Hi,

    You're right, using the Tally table is faster:

    using this parameter:

    DECLARE @Parameter VARCHAR(8000)

    SET @Parameter = REPLICATE('Element01,Element02,Element03,Element04,Element05,',159)

    So I got next execution times (running the same script several times):

    - my function: 120 ~ 180 miliseconds

    - Lynn's function: 90 ~ 140 miliseconds

    - Jeff's function (a function created based on Jeff's example, using Tally table (already created on my database) : 90 ~ 140 miliseconds

    Thanks!

    Thanks.

  • This is the function that I am currently using, is it more efficient than the CTE ?

    create FUNCTION [dbo].[func_CreateClientTableOrdered]

    (

    @item_list as varchar(4000)

    )

    RETURNS @Items Table(OrderNo int, Item varchar(60))

    AS

    BEGIN

    declare @separator as char(1)

    ,@position as int

    ,@item as varchar(60)

    ,@OrderNoas int

    set @OrderNo = 0

    SET @separator = ','

    SET @item_list = @item_list + @separator

    SELECT @position = patindex('%'+@separator+'%', @item_list)

    WHILE @position <> 0

    BEGIN

    SELECT @Item = left(@item_list, @position - 1)

    SELECT @item_list = stuff(@item_list, 1, @position, '')

    SELECT @position= patindex('%'+@separator+'%', @item_list)

    INSERT @Items(OrderNo, Item)

    SELECT @OrderNo, @Item

    SELECT @OrderNo= @OrderNo + 1

    END

    RETURN

    END

    Can someone give me a good way of proving which function would perform better?

    Or point me to an article on performance tuning functions. Thanks, Paul

  • phegel (11/20/2009)


    This is the function that I am currently using, is it more efficient than the CTE ?

    create FUNCTION [dbo].[func_CreateClientTableOrdered]

    (

    @item_list as varchar(4000)

    )

    RETURNS @Items Table(OrderNo int, Item varchar(60))

    AS

    BEGIN

    declare @separator as char(1)

    ,@position as int

    ,@item as varchar(60)

    ,@OrderNoas int

    set @OrderNo = 0

    SET @separator = ','

    SET @item_list = @item_list + @separator

    SELECT @position = patindex('%'+@separator+'%', @item_list)

    WHILE @position <> 0

    BEGIN

    SELECT @Item = left(@item_list, @position - 1)

    SELECT @item_list = stuff(@item_list, 1, @position, '')

    SELECT @position= patindex('%'+@separator+'%', @item_list)

    INSERT @Items(OrderNo, Item)

    SELECT @OrderNo, @Item

    SELECT @OrderNo= @OrderNo + 1

    END

    RETURN

    END

    Can someone give me a good way of proving which function would perform better?

    Or point me to an article on performance tuning functions. Thanks, Paul

    First, look up in this thread, I posted a function that should perform better. Also, there is a link to another thread that contains a very good discussion on string parsing.

    I can see two issues with your function. One, it is using a multi-statement TVF. Second, it is using a while loop. Both of these will keep your function from scaling well.

  • halford13 (11/20/2009)


    Hi,

    You're right, using the Tally table is faster:

    using this parameter:

    So I got next execution times (running the same script several times):

    - my function: 120 ~ 180 miliseconds

    - Lynn's function: 90 ~ 140 miliseconds

    - Jeff's function (a function created based on Jeff's example, using Tally table (already created on my database) : 90 ~ 140 miliseconds

    Thanks!

    Thanks.

    I think you'll find that my function uses Jeff's split method, it just incorporates the tally table into the function itself.

  • Hi Lynn,

    Yes, I saw you used Jeff's split method.

  • Erland Sommarskog still knows it best:

    http://www.sommarskog.se/arrays-in-sql-2005.html

    😛

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply