• Mikael Eriksson SE - Monday, April 24, 2017 11:34 PM

    Using the parent axis when extracting values from XML is not good for performance. 
    Use an extra cross apply instead.

    select P.X.value('(PackID/text())[1]','int') as PackID,
       I.X.value('(ItemNumber/text())[1]','int') as ItemNumber,
       I.X.value('(Qty/text())[1]','int') as Qty
    from @packXML.nodes('/Pack') as P(X)
    cross apply P.X.nodes('Item') as I(X);

    For more info on the problem with the parent axis you can have a look at this question on StackOverflow:
    Cross apply xml query performs exponentially worse as xml document grows
    .

    Well done Mikael. I would have thought that the extra APPLY would increase the overhead but you are 100% right - the parent axis is a killer. I looked at the link you posted which includes some good info but also did a quick performance test and was blown away at the difference. 


    IF OBJECT_ID('tempdb..#tmp') IS NOT NULL DROP TABLE #tmp;
    CREATE TABLE #tmp (packkey int identity primary key, packXML XML NOT NULL);

    INSERT #tmp(packXML)
    SELECT TOP (100000)
    '<Pack>
    <PackID>'+ CAST(abs(checksum(newid())%10) AS varchar(2)) +'</PackID>
    <Item>
    <ItemNumber>1</ItemNumber>
    <Qty>'+ CAST(abs(checksum(newid())%20)+1 AS varchar(2)) +'</Qty>
    </Item>
    </Pack>
    <Pack>
    <PackID>2</PackID>
    <Item>
    <ItemNumber>1</ItemNumber>
    <Qty>'+ CAST(abs(checksum(newid())%20)+1 AS varchar(2)) +'</Qty>
    </Item>
    <Item>
    <ItemNumber>2</ItemNumber>
    <Qty>'+ CAST(abs(checksum(newid())%90)+1 AS varchar(2)) +'</Qty>
    </Item>
    </Pack>'
    FROM sys.all_columns a, sys.all_columns b;

    SET NOCOUNT ON;
    IF OBJECT_ID('tempdb..#dump1') IS NOT NULL DROP TABLE #dump1;
    IF OBJECT_ID('tempdb..#dump2') IS NOT NULL DROP TABLE #dump2;
    GO

    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;
    SELECT
    packkey,
    N.value('(../PackID/text())[1]', 'int') as PackID,
    N.value('(ItemNumber/text())[1]','int') as ItemNumber,
    N.value('(Qty/text())[1]','int') as Qty
    INTO #dump1
    FROM #tmp
    CROSS APPLY packXML.nodes('Pack/Item') as T(N);

    SELECT
    packkey,
    P.X.value('(PackID/text())[1]','int') as PackID,
    I.X.value('(ItemNumber/text())[1]','int') as ItemNumber,
    I.X.value('(Qty/text())[1]','int') as Qty
    INTO #dump2
    FROM #tmp
    CROSS APPLY packXML.nodes('/Pack') as P(X)
    CROSS APPLY P.X.nodes('Item') as I(X);
    SET STATISTICS TIME OFF;
    SET STATISTICS IO OFF;
    GO

    DROP TABLE #dump1;
    DROP TABLE #dump2;
    DROP TABLE #tmp;

    Results

    Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#tmp_0000000002B3'. Scan count 3, logical reads 2278, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:
     CPU time = 25422 ms, elapsed time = 13259 ms.
    Table '#tmp_0000000002B3'. Scan count 3, logical reads 2278, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:
     CPU time = 6313 ms, elapsed time = 3205 ms.


    I learned something new today before 9AM.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001