Nice job, Arthur.
No reflection on your fine code. Just an observation about XML. I'm absolutely amazed at how expensive this type of XML processing actually is. The underlying table-valued function does...
2 Full Cartesian Products (Square Joins)
2 Triangular Joins
2 Double Joins (2x # of rows)
6 Single Joins (1x # of rows)
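Just to put the shape of that cost in rough numbers, here's a quick sketch. The per-join formulas are assumptions read straight off the list above, and n is a hypothetical number of rows per "set":

```python
# Hedged sketch: rough row-count estimate for the join profile listed above.
# The formulas are assumptions; n is a hypothetical rows-per-set figure.
def estimated_rows_touched(n: int) -> int:
    cartesian = 2 * n * n                 # 2 full Cartesian products (square joins): n^2 each
    triangular = 2 * (n * (n + 1) // 2)   # 2 triangular joins: n(n+1)/2 each
    double = 2 * (2 * n)                  # 2 double joins: 2x # of rows each
    single = 6 * n                        # 6 single joins: 1x # of rows each
    return cartesian + triangular + double + single

# The point: the Cartesian terms make this O(n^2), so the shredding
# cost can't possibly stay linear as the input grows.
print(estimated_rows_touched(10))    # → 410
print(estimated_rows_touched(100))   # → 31100 (10x the rows, ~76x the work)
```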
The CPU time it took to process 16 "sets" (1 set per result row) of such data on my admittedly 10-year-old desktop was 32ms. Now, even if that performance were linear (and it won't be), it would take 8.8 hours to import a lousy million rows, whereas something like BCP importing a million-row delimited file containing the same data takes well less than a minute.
Then there's the impact on the pipe and the IO system. A million "rows" of this type of XML data (sans any unnecessary spaces and all end-of-line characters) is about 168 million characters. A delimited file with the same information AND 2-character end-of-line markers is only about 42 million characters, or about one-fourth as much IO.
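To see where that 4:1 overhead comes from, here's a sketch comparing one hypothetical 4-column row in "flat" XML against the same row as a delimited line. The element names and values are made up for illustration; the ratio lands in the same ballpark as the 168M-vs-42M character comparison:

```python
# Hedged sketch: payload size for one hypothetical row as "flat" XML vs delimited.
# Column names and values are invented; only the size ratio is the point.
xml_row = "<row><id>12345</id><name>Widget</name><qty>42</qty><price>9.99</price></row>"
csv_row = "12345,Widget,42,9.99\r\n"   # same data plus the 2-character end-of-line marker

# The tags repeat on every single row, so the overhead never amortizes.
print(len(xml_row), len(csv_row), round(len(xml_row) / len(csv_row), 1))  # → 76 22 3.5
```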
I know I sound like a Luddite, but I'm just not seeing any advantage of using "flat" XML to transmit and import "flat" data.
--Jeff Moden
Change is inevitable... Change for the better is not.