Split string using XML

  • Comments posted to this topic are about the item Split string using XML

    --Divya

  • Very useful.

  • what about performance?

  • Good. Different approach.

  • Be aware of the possibility of markup characters within your data.

    Eg, the following split will fail because the data contains an XML markup character:

    Declare @xml as xml, @str as varchar(100), @delimiter as varchar(10)

    SET @str = 'A&4,C,D,E'

    SET @delimiter = ','

    SET @xml = cast(('<X>' + replace(@str, @delimiter, '</X><X>') + '</X>') as xml)
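    One way to guard against this (a sketch, not from the original post) is to entity-encode the markup characters before the cast. Note that '&' must be encoded first, or the other replacements get double-encoded:

    -- Sketch: pre-encode the XML special characters so the cast succeeds
    Declare @xml as xml, @str as varchar(100), @delimiter as varchar(10)
    SET @str = 'A&4,C<D,E'
    SET @delimiter = ','
    SET @str = replace(replace(replace(@str, '&', '&amp;'), '<', '&lt;'), '>', '&gt;')
    SET @xml = cast(('<X>' + replace(@str, @delimiter, '</X><X>') + '</X>') as xml)
    -- the XML parser decodes the entities again on the way out
    SELECT t.c.value('.', 'varchar(100)') AS Item FROM @xml.nodes('/X') t(c)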

  • Hi all,

    It's a nice new way to split a string using XML. I have written a SQL function that splits the string without using XML. Please refer to the code below...

    CREATE function Split_fn
    (
        @split_string varchar(max),
        @deli_char varchar(3)
    )
    returns @list table
    (
        SeqNo int,
        SplitString varchar(max)
    )
    as
    begin
        declare @from_loc int
        declare @to_loc int

        if charindex(@deli_char, @split_string, 0) > 0
        begin
            select @from_loc = 0
            select @to_loc = charindex(@deli_char, @split_string, 0)
        end

        if charindex(@deli_char, @split_string, 0) <= 0
        begin
            select @to_loc = null
        end

        while @to_loc is not null
        begin
            if substring(@split_string, @from_loc, @to_loc - @from_loc) <> ''
            begin
                insert into @list (SeqNo, SplitString)
                select isnull(max(SeqNo), 0) + 1, substring(@split_string, @from_loc, @to_loc - @from_loc)
                from @list
            end

            select @from_loc = charindex(@deli_char, @split_string, @from_loc + len(@deli_char)) + len(@deli_char)
            select @to_loc = charindex(@deli_char, @split_string, @from_loc)

            if @to_loc = 0
            begin
                if substring(@split_string, @from_loc, (len(@split_string) - @from_loc) + len(@deli_char)) <> ''
                begin
                    insert into @list (SeqNo, SplitString)
                    select isnull(max(SeqNo), 0) + 1, substring(@split_string, @from_loc, (len(@split_string) - @from_loc) + len(@deli_char))
                    from @list
                end
                select @to_loc = null
            end
        end

        return
    end
    go

    select * from dbo.Split_fn('raja,ravi,prabhu', ',')
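    A couple of extra calls to exercise the function (not from the original post): the @deli_char parameter allows delimiters up to three characters, and the checks against '' skip empty elements:

    select * from dbo.Split_fn('raja::ravi::prabhu', '::')
    select * from dbo.Split_fn('a,,b,', ',')  -- consecutive delimiters: blanks are skipped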

    Rafidheen.M

  • This technique was documented by Erland Sommarskog, SQL Server MVP in 2004. The XML and other methods can be found in the classic article "Arrays and Lists in SQL Server".

    For SQL Server 2000, see http://www.sommarskog.se/arrays-in-sql-2000.html and for SQL Server 2005, see http://www.sommarskog.se/arrays-in-sql-2005.html

    SQL = Scarcely Qualifies as a Language

  • What about using Tally tables to do the same:

    http://www.sqlservercentral.com/articles/T-SQL/62867/
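    For readers who haven't seen it, the tally-table approach looks roughly like this (a sketch after Jeff Moden's article linked above, assuming a dbo.Tally table of sequential integers 1..N exists and covers at least len(@str) + 2):

    -- Sketch of a tally-table split: find each delimiter position with the
    -- tally table, then SUBSTRING out the element that follows it
    DECLARE @str varchar(8000), @delim char(1)
    SELECT @str = 'raja,ravi,prabhu', @delim = ','

    SELECT SUBSTRING(@delim + @str + @delim, t.N + 1,
                     CHARINDEX(@delim, @delim + @str + @delim, t.N + 1) - t.N - 1) AS Item
    FROM dbo.Tally t
    WHERE t.N <= LEN(@delim + @str)
      AND SUBSTRING(@delim + @str + @delim, t.N, 1) = @delim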

  • We did performance and scaling tests to split comma-separated string values using an XML query and a SQL function.

    The plan was better with the XML split than with the SQL function; however, XML query performance degraded as the number of concurrent users increased. The SQL function did better in the scalability test.

  • Very handy - thanks for the article and examples Divya. And thanks for the performance info Senthilnathan.

  • For splitting delimited lists I really like Jeff Moden's approach.

    http://www.sqlservercentral.com/articles/T-SQL/63003/

    As was previously mentioned, depending on how you use XML, the special XML characters can cause an issue. You would probably want to do an initial select to get the data entity-encoded properly, possibly nesting it inside your code. Here is an example showing the characters getting entity encoded:

    DECLARE @data table(

    someData varchar(255) NOT NULL PRIMARY KEY);

    INSERT INTO @data (someData) VALUES ('SpecialChars, <, & >');

    INSERT INTO @data (someData) VALUES ('just, regular');

    SELECT d.someData

    FROM @data d

    FOR XML PATH(''), TYPE
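    Building on that, one way (a sketch, not from the article) to make the XML split safe is to let FOR XML do the encoding before the cast:

    -- Sketch: entity-encode a raw string via FOR XML, then split it;
    -- the inner SELECT encodes &, < and > so the cast to xml cannot fail
    DECLARE @str varchar(255), @delimiter char(1)
    SELECT @str = 'a<b,c&d,e', @delimiter = ','

    DECLARE @encoded varchar(max)
    SET @encoded = (SELECT @str FOR XML PATH(''))

    DECLARE @xml xml
    SET @xml = cast('<X>' + replace(@encoded, @delimiter, '</X><X>') + '</X>' as xml)

    SELECT t.c.value('.', 'varchar(255)') AS Item
    FROM @xml.nodes('/X') t(c)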

  • Cool concept and a well-written article, but this is not the best way to split a string.

    I've run performance tests on this and on the SQL while loop.

    The XML version seems slick at first, but slows down terribly as the size of the string increases.

    And Jeff's tally method leaves the while loop far behind.

    http://www.sqlservercentral.com/articles/T-SQL/62867/

  • The fastest way to do this is to

    1. use a CLR scalar function to convert the delimited string to a fixed-width string

    2. split it using a tally table and SUBSTRING

    CREATE FUNCTION [dbo].[FnSplitQuick](@str nvarchar(max), @maxwidth int, @delimitter varchar(1))

    RETURNS TABLE

    RETURN

    (

    WITH dt as (

    select

    Data,

    Offset = case when @maxwidth > 0 then 0 else 10 end,

    Width = case when @maxwidth > 0 then @maxwidth else left(data,10) end,

    MaxN = (len(data) - 2 - case when @maxwidth > 0 then 0 else 10 end) / case when @maxwidth > 0 then @maxwidth else left(data,10) end

    from ( select data = dbsystem.dbo.[fnConvertToFixedWidth](isnull(@str,'')+@delimitter+'a', @maxwidth, @delimitter) ) d

    )

    select

    Idx = N,

    Value = rtrim(substring(data,N*Width+1 + Offset,Width))

    from dt

    inner join dbsystem..tally n

    on n.N <= MaxN

    )

    and the clr

    [Microsoft.SqlServer.Server.SqlFunction()]

    public static SqlString ConvertToFixedFn(SqlString str, int padWidth, SqlString delimitter) {

    // split

    string[] arrStr = str.Value.Split(delimitter.Value[0]);

    StringBuilder sb = new StringBuilder();

    bool addWidth = (padWidth == 0);

    if (addWidth) {

    padWidth = 1;

    for (int i = 0; i < arrStr.Length; i++) {

    if (arrStr[i].Length > padWidth) {

    padWidth = arrStr[i].Length;

    }

    }

    }

    if (addWidth) {

    sb.Append(padWidth.ToString().PadRight(10));

    }

    foreach (string item in arrStr) {

    sb.Append(item.PadRight(padWidth));

    }

    return new SqlString(sb.ToString());

    }

  • Very cool concept ....

    For all interested in the performance aspect: XML data types are powerful but expensive. For XML queries, the execution plan means very little. As the XML payload/nodes increase or node iteration increases, LOB parsing becomes extremely heavy. Processing raw XML without indexes or a schema applied to it is similar in concept to a heap, but worse, because XML parsing adds a heavy layer. A schema helps with the read-aheads and IOPS. Adding indexes helps with the lookups. Caveat with the indexes: they need sufficient head room for growth. In benchmarks from a prior project, the index size was roughly 10-12 times the data size for a 500+ node XML. As a sample benchmark, a 200-node XML of roughly 69K in size had an 810K index size. The size of the data matters as well.

    Without taking away the spotlight from Divya's cool technique, if there are numerous XML nodes I would

    1. pin the XML data to physical table and column

    2. create a schema and apply it on the XML column

    3. make the table transient to save on disk space. This means maintenance to defrag

    4. apply primary and at least secondary PATH index

    5. maintain a separate LUN for the table, if using the second example to split numerous rows in the table
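
    As an illustration of steps 1, 2 and 4 above (all object and schema names here are hypothetical):

    -- Sketch: pin XML to a typed column and index it
    CREATE XML SCHEMA COLLECTION dbo.SplitSchema AS
    N'<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
        <xs:element name="X" type="xs:string"/>
      </xs:schema>';
    GO
    CREATE TABLE dbo.SplitStage (
        Id int IDENTITY PRIMARY KEY,            -- XML indexes require a clustered PK
        Payload xml(dbo.SplitSchema) NOT NULL   -- typed XML via the schema collection
    );
    GO
    CREATE PRIMARY XML INDEX IX_SplitStage_Payload
        ON dbo.SplitStage (Payload);
    GO
    CREATE XML INDEX IX_SplitStage_Payload_Path
        ON dbo.SplitStage (Payload)
        USING XML INDEX IX_SplitStage_Payload FOR PATH;  -- secondary PATH index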
