Firstly, thank you all for looking into this. It was me after all missing something rather obvious, maybe too obvious. The CTE in the code generates a sequence of 10^8 entries which crosses the threshold of noticeable generation time (see table).
+------------+--------+---------+
| Row count | CPU ms | Elapsed |
+------------+--------+---------+
| 100 | 0 | 0 |
| 1000 | 0 | 0 |
| 10000 | 0 | 1 |
| 100000 | 16 | 17 |
| 1000000 | 217 | 155 |
| 10000000 | 1981 | 1566 |
| 100000000 | 20313 | 15955 |
| 1000000000 | 194144 | 153251 |
+------------+--------+---------+
Testing on 2008 to 2014 produced similar performance and almost identical execution plans with only a slight difference in cost distribution. Erland was spot on, all of the plans exhausted the CTE number generation.
I believe that Itzik Ben-Gans smart fn_nums is more geared towards populating number tables than for inline usage such as this. He has another version which includes a TOP/OOF clause. In fact, without a TOP/OOF clause, all tests had the same problem, regardless of the maximum number or the presence of a where / filtering clause outside the CTE.
Fixing this in the code was simply a case of changing the second CTE.
😎
;WITH NX(N) AS (SELECT N FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS X(N))
SELECT
MT.MYTEMP_ID
,NM.N
,SUBSTRING(MT.MYTEMP_STR,NM.N,1) AS STR_PART
FROM #MYTEMP MT
CROSS APPLY
(
/* Tally moved here for using the len() to limit the rowcount */
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS N
FROM NX N1,NX N2,NX N3 ,NX N4,NX N5,NX N6,NX N7,NX N8,NX N9 -- (10^9)
ORDER BY 1 OFFSET 0 ROWS FETCH FIRST (LEN(MT.MYTEMP_STR)) ROWS ONLY
) AS NM;