• Jeff,

    I would like to ask your opinion on the version of the split I recently wrote. It is implemented as a stored proc and does not have any limitation on the size of the string to split. Performance tests that I ran show that feeding text consisting of 10,000 sentences (dot delimited) to it (about 500,000 characters in total or 1 MB in datalength) takes about 0.3 seconds to execute, and expanding the size 10-fold to 100,000 sentences (5 mln characters in total or 10 MB in datalength) increases the execution time also about 10-fold to a total of 3 seconds. This is on a 2-year old desktop with Windows XP SP3, 2 GB of RAM duo-core CPU. The idea is to utilize the engine's ability to process xml:

    create proc dbo.usp_DelimitedSplit

    (

    @text nvarchar(max),

    @delimiter char(1),

    @entitize bit = 1

    )

    as

    begin

    declare @xml xml;

    if @entitize = 1 set @text = (select @text for xml path(''));

    set @xml = '<r>' + replace(@text, @delimiter, '</r><r>') + '</r>';

    select

    row_number() over (order by (select null)) item_number,

    item.value('text()[1]', 'varchar(max)') item_value

    from @xml.nodes('//r') R(item);

    end;

    go

    As you can see, the procedure is very simple. The optional @entitize parameter deserves an explanation though. It is there to ensure that if the text contains some "xml unfriendly" characters then these are entitized by default. However, it leaves the option to the calling code to pass 0 as @entitize value should the caller be aware that the text has already undergone entitizing treatment prior to the procedure execution. For example,

    declare @delimiter char(1);

    declare @text nvarchar(max);

    select @delimiter = '.', @text = 'a<x>&z.b.c';

    exec dbo.usp_DelimitedSplit @text, @delimiter;

    automatically handles unfriendly characters returning

    item_number item_value

    -------------------------------

    1 a<x>&z

    2 b

    3 c

    Here is the test I used to start from scratch, pad the stirng with 10,000 sentences and then call the procedure to split it. The number 10000 is hard-coded, but can be changed to whatever arbitrary number:

    declare @delimiter char(1);

    declare @text nvarchar(max);

    set @delimiter = '.';

    -- populate @text like this

    -- "This is a<&>/><x>& part of the text to split into rows_000001."

    -- "This is a<&>/><x>& part of the text to split into rows_000002." etc

    set

    @text = stuff(cast(

    (

    select top 10000

    @delimiter + 'This is a<&>/><x>& part of the text to split into rows_' +

    replace(str(row_number() over(order by a.[object_id]), 6), ' ', '0')

    from sys.objects a cross join sys.objects b

    for xml path('')

    ) as varchar(max)), 1, 1, '');

    -- because the @text happens to be already entitized,

    -- pass 0 as @entitize parameter value

    exec dbo.usp_DelimitedSplit @text, @delimiter, 0;

    The above returns:

    item_number item_value

    ----------------------------------------------------------------------------------

    1 This is a<&>/><x>& part of the text to split into rows_000001

    2 This is a<&>/><x>& part of the text to split into rows_000002

    /* ... abridged ...*/

    10000 This is a<&>/><x>& part of the text to split into rows_010000

    Thanks,

    Oleg