• Dwain,

    I see rather different performance using your test rig on SQL Server 2012 (build 3321) and SQL Server 2008 R2 (build 4266):

    +--------------------------------------------------+

    ¦ SQL Server ¦ PatternSplitCM ¦ SQL#.RegEx_Matches ¦

    ¦------------+----------------+--------------------¦

    ¦ 2012 ¦ 24203 ¦ 7889 ¦

    ¦ 2008 R2 ¦ 23019 ¦ 24303 ¦

    +--------------------------------------------------+

    Taking the 2008 R2 result first the RegEx is slightly slower, but there is a very important reason for this: the test rig does not produce unique strings. In a typical test, the count of distinct strings generated was 303 (out of 1000 rows total). The optimizer notices this and introduces a 'performance spool' into the query plan for the PatternSplitCM execution only. This optimization introduces a sort on the MyString column above the nested loops join, and a lazy table spool below it. The effect is to group identical MyString occurrences together, so only the first of any set of duplicates is processed by the PatternSplitCM logic. For subsequent rows with the same value of MyString, the spool replays the spooled result.

    You can see this in a post-execution plan by inspecting the rewinds and rebinds counter of the table spool. Rewinds occur where the spool replays saved results; a rebind occurs when the correlated parameter (MyString) changes and the PatternSplitCM logic needs to be performed for real. In the test where there were 303 distinct MyString values, the table spool showed 303 rebinds and 697 rewinds. Likewise, all the operators below the table spool show 303 actual executions, not 1000. It is possible to disable the 'performance spool' optimization with trace flag 8690. Doing this forces PatternSplitCM to run on all 1000 input rows, degrading performance from 23019 ms to 40588 ms.

    The optimizer does not apply this same optimization to the RegEx execution, because it has no idea how expensive that function is. By default it assumes a small cost and calculates (wrongly) that the performance spool would cost more (due to the sort and spool to tempdb) than it would save by avoiding cheap-looking function executions. We can introduce the 'performance spool' optimization to the RegEx execution with trace flag 8691. Doing this reduces the number of executions of the RegEx function from 1000 to 303, with a corresponding reduction in execution time from 24303 ms to 9978 ms.

    The situation with SQL Server 2012 is slightly different because returning rows from a CLR streaming table-valued functions has been significantly optimized in this release. Even without the performance spool, processing all 1000 rows through the RegEx takes only 7520 ms. With a performance spool added using TF 8691, the elapsed time is reduced further to 4886 ms.

    A secondary consideration is that I see significant sort spills to tempdb when running the PatternSplitCM solution. This is easy to see with the new warning symbols in SQL Server 2012; in previous releases we need to monitor for sort warnings using Profiler. The underlying reason is the guessed 100-row cardinality from the Top operator, which is used to estimate the sort memory requirement. The strings in this test are quite long, and so require more than 100 numbers from the on-the-fly numbers table.