Split strings alternative to XML

  • Hi guys,

    For some performance considerations, we would like to try alternative to the code below.

    Anyone could help on how to get this job done using CTE, temp table or I don not know...

    Actually, I am not a dev guy at all ::-)

    Thanks,

    Mack

    CREATE FUNCTION [dbo].[fn_Split](@data NVARCHAR(MAX), @delimiter NVARCHAR(5))

    RETURNS @t TABLE (Idx int identity (1,1), data NVARCHAR(max))

    AS

    BEGIN

    DECLARE @textXML XML;

    SELECT @textXML = CAST('<d>' + REPLACE(@data, @delimiter, '</d><d>') + '</d>' AS XML);

    INSERT INTO @t(data)

    SELECT T.split.value('.', 'nvarchar(max)') AS data

    FROM @textXML.nodes('/d') T(split)

    RETURN

  • Check out the resources with this article: http://www.sqlservercentral.com/articles/Tally+Table/72993/. You should find a CLR splitter that will meet your needs.

    Be sure to read the article and the discussion with it.

  • Thanks Lynn,

    ...I have gone through the article, too much complicated to me and definitely not my stuff...

    The thing is I know the function above is killing perf, but I do not know how to write something else despite my readings.

    The goal is to submit some results to the dev team and show how crappy then can produce their coding when not being concerned by global performance...

    If anyone can "translate" this little function, ideally not in CLR, that would be helpful 🙂

  • Does your function have to split a string declared as nvarchar(max)?

  • Not sure this is absolutely necessary, maybe varchar(some values) could fit, but as of now, I do not know "where" they could use this function to split "what" amout of data...

    It could some comments in a text box as it is commercial app...

    nvarchar(max) type should be tested at least to compare performance.

  • The two T-SQL functions, DelimitedSplit8K and DelimitedSplitN4K, that are attached to the article I referred you to are optimized to split varchar(8000) and nvarchar(4000) strings.

    I would read the article and discussion again, as use these functions on appropriate sized strings to see how they work.

    For string longer than varchar(8000) and nvarchar(4000) you will probably want to go to a CLR splitter as it will out perform a T-SQL based splitter.

  • I am trying...but I do not succeed in adapting the 8k function.

    Don't know how to "integrate" the <d>, <d/>, etc stuff as they exist in the initial function...

    I really suck

  • I have got it...

    create function Split_fnOK

    (

    @datavarchar(8000),

    @deli_char varchar(3)

    )

    returns @list table

    (

    Idxint,

    datavarchar(8000)

    )

    as

    begin

    declare @from_locint

    declare @to_locint

    if charindex(@deli_char,@data,0) <= 0

    begin

    insert into @list(Idx, data) values (1, @data)

    return

    end

    if charindex(@deli_char,@data,0) > 0

    begin

    select @from_loc= 0

    select @to_loc= charindex(@deli_char,@data,0)

    end

    if charindex(@deli_char,@data,0) <= 0

    begin

    select @to_loc = null

    end

    while @to_loc is not null

    begin

    if substring(@data,@from_loc, @to_loc - @from_loc) <> ''

    begin

    insert into @list(Idx, data)

    select isnull(max(Idx),0) + 1, substring(@data,@from_loc, @to_loc - @from_loc)

    from@list

    end

    select @from_loc = charindex(@deli_char,@data,@from_loc+len(@deli_char)) + len(@deli_char)

    select @to_loc = charindex(@deli_char,@data,@from_loc)

    if @to_loc = 0

    begin

    if substring(@data,@from_loc, (len(@data) - @from_loc) + len(@deli_char)) <> ''

    begin

    insert into @list(Idx, data)

    select isnull(max(Idx),0) + 1, substring(@data,@from_loc, (len(@data) - @from_loc) + len(@deli_char))

    from@list

    end

    select @to_loc = null

    end

    end

    return

    end

    go

    With that set up, I have got much better stats time/cpu and I/O. Good.

    Cheers

  • MackF (3/27/2013)


    I have got it...

    ...

    With that set up, I have got much better stats time/cpu and I/O. Good.

    Cheers

    This is going to be a LOT slower than the methods Lynn suggested. Actually orders of magnitude slower. I know you have been pointed to it before but check out the article in my signature about splitting strings. Then compare your looping function to DelimitedSplit8K and see which one is faster. 😛

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • MackF (3/27/2013)


    I have got it...

    create function Split_fnOK

    (

    @datavarchar(8000),

    @deli_char varchar(3)

    )

    returns @list table

    (

    Idxint,

    datavarchar(8000)

    )

    as

    begin

    declare @from_locint

    declare @to_locint

    if charindex(@deli_char,@data,0) <= 0

    begin

    insert into @list(Idx, data) values (1, @data)

    return

    end

    if charindex(@deli_char,@data,0) > 0

    begin

    select @from_loc= 0

    select @to_loc= charindex(@deli_char,@data,0)

    end

    if charindex(@deli_char,@data,0) <= 0

    begin

    select @to_loc = null

    end

    while @to_loc is not null

    begin

    if substring(@data,@from_loc, @to_loc - @from_loc) <> ''

    begin

    insert into @list(Idx, data)

    select isnull(max(Idx),0) + 1, substring(@data,@from_loc, @to_loc - @from_loc)

    from@list

    end

    select @from_loc = charindex(@deli_char,@data,@from_loc+len(@deli_char)) + len(@deli_char)

    select @to_loc = charindex(@deli_char,@data,@from_loc)

    if @to_loc = 0

    begin

    if substring(@data,@from_loc, (len(@data) - @from_loc) + len(@deli_char)) <> ''

    begin

    insert into @list(Idx, data)

    select isnull(max(Idx),0) + 1, substring(@data,@from_loc, (len(@data) - @from_loc) + len(@deli_char))

    from@list

    end

    select @to_loc = null

    end

    end

    return

    end

    go

    With that set up, I have got much better stats time/cpu and I/O. Good.

    Cheers

    Show us 2 lines of sample data to split so we can show you how to blow the doors off of everything but a CLR and come pretty close to that, to boot.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I am now testing this one:

    CREATE FUNCTION fnSplitString(@str nvarchar(max),@sep nvarchar(max))

    RETURNS TABLE

    AS

    RETURN

    WITH a AS(

    SELECT CAST(0 AS BIGINT) as idx1,CHARINDEX(@sep,@str) idx2

    UNION ALL

    SELECT idx2+1,CHARINDEX(@sep,@str,idx2+1)

    FROM a

    WHERE idx2>0

    )

    SELECT SUBSTRING(@str,idx1,COALESCE(NULLIF(idx2,0),LEN(@str)+1)-idx1) as value

    FROM a

    Execution is more simple, and performs same than above with less CPU time consumed for optimizations then...

    Ma data to parse could be just like that 'toto.toto,toto.toto,toto.'

    Simple 🙂

  • MackF (3/29/2013)


    I am now testing this one:

    CREATE FUNCTION fnSplitString(@str nvarchar(max),@sep nvarchar(max))

    RETURNS TABLE

    AS

    RETURN

    WITH a AS(

    SELECT CAST(0 AS BIGINT) as idx1,CHARINDEX(@sep,@str) idx2

    UNION ALL

    SELECT idx2+1,CHARINDEX(@sep,@str,idx2+1)

    FROM a

    WHERE idx2>0

    )

    SELECT SUBSTRING(@str,idx1,COALESCE(NULLIF(idx2,0),LEN(@str)+1)-idx1) as value

    FROM a

    Execution is more simple, and performs same than above with less CPU time consumed for optimizations then...

    Ma data to parse could be just like that 'toto.toto,toto.toto,toto.'

    Simple 🙂

    From your original post you stated that you wanted to find an alternative to an XML splitter.

    For some performance considerations, we would like to try alternative to the code below.

    I don't quite understand why you keep looking at slower alternatives than the one suggested. This is like going to a car dealer and telling them you want to have the fastest car on the lot. The guy show you the Lamborghini, you smile and nod your head and walk over the Pinto. You test drive it and it is in fact faster than the Pacer you are driving currently. The salesman reminds you that for the same price ($0 in t-sql land) that you could drive the MUCH MUCH MUCH faster Aventador but you say you want to keep testing out the Pinto. It just doesn't make sense. Read the article about the tally table splitter, look at the performance tests. There is some code that is almost identical to the one you posted. It was tested in that article and it was found to be magnitudes slower. Stop driving the Pinto and accept the free keys to your new high performance sports car.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply