• The tally table method seems to work fine, but the estimated and actual execution plans show a pretty severe penalty. My guess is that the initial file I/O and the memory allocation for the table's index account for those costs. I then devised a little test on about 17,000 tokens using the initial method, the XML method, and the tally table. The tally table won by a long shot when parsing a large array:

    declare @i int = 26, @x varchar(max) = '', @d char(1) = ' ', @j int;

    declare @t table (id tinyint primary key, tokens int, which varchar(32), start datetime, finish datetime);

    -- Note: this is going to take a while, so if you want to run this more than once, store this data somewhere...
    set @j = @i * @i;

    while @j > 0
    begin
        while @i > 0
        begin
            set @x = @x + @d + CHAR(91 - @i);
            set @i = @i - 1;
        end
        set @j = @j - 1;
        set @i = 26;
    end

    declare @C int;

    insert into @t (id, which, start) values (1, 'fnStringSplit_2005_Distinct_CHAR', getdate());
    select @C = COUNT(*) from ..[fnStringSplit_2005_Distinct_CHAR](@x, @d);
    update @t set tokens = @C, finish = getdate() where id = 1;

    insert into @t (id, which, start) values (2, 'udf_StrList2Table', getdate());
    select @C = COUNT(*) from ..[udf_StrList2Table](@x, @d);
    update @t set tokens = @C, finish = getdate() where id = 2;

    insert into @t (id, which, start) values (3, 'fn_Split', getdate());
    select @C = COUNT(*) from ..[fn_Split](@x, @d);
    update @t set tokens = @C, finish = getdate() where id = 3;

    select *, CONVERT(float, finish) - CONVERT(float, start) as runtime from @t;
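Subtracting the float-converted datetimes reports the runtime in fractional days, which is why the numbers below are so small. A minor variation (not part of the original test) uses DATEDIFF to report milliseconds directly:

```sql
-- Same timing table, but runtime expressed in milliseconds rather than
-- fractional days; easier to eyeball.
select *, DATEDIFF(ms, start, finish) as runtime_ms from @t;
```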

The results show that the clear winner is the tally table. Volume appears to be a major factor in choosing a method; arrays of enormous scale should be parsed with the tally table approach:

    id  tokens  which                             start    finish   runtime
    2   17577   udf_StrList2Table                 54:52.0  54:52.0  0.00000057870056480169
    3   17577   fn_Split                          54:52.0  55:04.0  0.00013854166900273400
    1   17577   fnStringSplit_2005_Distinct_CHAR  51:26.9  54:52.0  0.00237364969507325000
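For context, the tally-table technique being timed works roughly like this. This is a minimal sketch, assuming a `dbo.Tally` table of sequential integers starting at 1; the function name is illustrative, not the exact definition benchmarked above:

```sql
-- Tally-table splitter sketch (assumes dbo.Tally(n) holds 1, 2, 3, ... at
-- least as long as the longest input; names here are illustrative).
create function dbo.fn_TallySplitSketch (@s varchar(max), @d char(1))
returns table
as return
    -- Prepending @d lines every token start up with a delimiter position,
    -- so one scan of the tally table finds all token boundaries at once.
    select substring(@s, n, charindex(@d, @s + @d, n) - n) as token
    from dbo.Tally
    where n <= len(@s) + 1
      and substring(@d + @s, n, 1) = @d;
```

Because there is no loop, the cost is dominated by the tally scan rather than the token count, which fits the large-array results above. (Caveat: `len()` ignores trailing spaces, so strings that end in a space delimiter need extra care.)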

I added back my CTE function and reran the test capped at 100 tokens (the recursive-CTE ceiling under SQL Server's default MAXRECURSION setting):

    declare @i int = 26, @x varchar(max) = '', @d char(1) = ' ', @j int;

    declare @t table (id tinyint primary key, tokens int, which varchar(32), start datetime, finish datetime);

    set @j = @i;

    while @j > 0
    begin
        while @i > 0
        begin
            set @x = @x + @d + CHAR(91 - @i);
            set @i = @i - 1;
        end
        set @j = @j - 1;
        set @i = 26;
    end

    set @x = LEFT(@x, 100);

    declare @C int;

    insert into @t (id, which, start) values (0, 'fnStringSplit_CTE', getdate());
    select @C = COUNT(*) from ..[fnStringSplit_CTE](@x, @d);
    update @t set tokens = @C, finish = getdate() where id = 0;

    insert into @t (id, which, start) values (1, 'fnStringSplit_2005_Distinct_CHAR', getdate());
    select @C = COUNT(*) from ..[fnStringSplit_2005_Distinct_CHAR](@x, @d);
    update @t set tokens = @C, finish = getdate() where id = 1;

    insert into @t (id, which, start) values (2, 'udf_StrList2Table', getdate());
    select @C = COUNT(*) from ..[udf_StrList2Table](@x, @d);
    update @t set tokens = @C, finish = getdate() where id = 2;

    insert into @t (id, which, start) values (3, 'fn_Split', getdate());
    select @C = COUNT(*) from ..[fn_Split](@x, @d);
    update @t set tokens = @C, finish = getdate() where id = 3;

    select *, CONVERT(float, finish) - CONVERT(float, start) as runtime from @t;

The results show the CTE as the winner, with the others fairly close behind and the tally table last:

    id  tokens  which                             start    finish   runtime
    0   51      fnStringSplit_CTE                 10:29.7  10:29.7  0.00000003858440322801
    1   51      fnStringSplit_2005_Distinct_CHAR  10:29.7  10:29.7  0.00000027006171876565
    3   51      fn_Split                          10:29.7  10:29.8  0.00000027006171876565
    2   51      udf_StrList2Table                 10:29.7  10:29.7  0.00000038580037653446
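For comparison, a recursive-CTE splitter follows this general shape (a sketch, not the exact `fnStringSplit_CTE` above). Each level of recursion peels off one token, so the default MAXRECURSION setting of 100 caps how many tokens it can return, which is why the test was limited to 100:

```sql
-- Recursive-CTE splitter sketch: each recursion step emits the first token
-- and carries the remainder forward. SQL Server's default MAXRECURSION of
-- 100 limits how many tokens this can produce.
declare @s varchar(max) = 'alpha beta gamma', @d char(1) = ' ';

with split (token, rest) as (
    select left(@s, charindex(@d, @s + @d) - 1),
           stuff(@s + @d, 1, charindex(@d, @s + @d), '')
    union all
    select left(rest, charindex(@d, rest) - 1),
           stuff(rest, 1, charindex(@d, rest), '')
    from split
    where len(rest) > 0
)
select token from split;  -- alpha / beta / gamma
```

The per-token recursion is cheap on short strings but cannot compete with the set-based tally scan once the token count grows, matching the pattern in both result sets above.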

The funny part is that when I ran this a dozen times, only the tally table's result changed dramatically when splitting the small array; it sometimes took much longer, seemingly at random.

So my takeaway is that the CTE is best for small arrays and the tally table is best for large ones. The one test I did not try is running several thousand small splits at volume (maybe 200 sets of 100 would do it). The only way I could think of to mock this up is a highly parallel package call via SSIS (essentially the same function called 2,000 times at once). If I feel inspired, I'll give it a shot. Any suggestions for an easier parallelization test are welcome.