Inefficient T-SQL Query using XQuery

  • A query in our database was flagged as being particularly inefficient. It is executed at an average rate of 13 times per minute and has an average CPU time of 2.2 seconds.

    A real performance benefit has been observed by simply ensuring that only the first occurrence of the root element is used (by adding "[1]" to the .query method path). Side-by-side, this has a relative cost of 3% versus 97% when compared with the original query. The execution plan also shows a greatly reduced estimated row count for the Extended Operator (UDX): 4,307 rows instead of 180,000.

    I suppose our question is... why? Why was this change required to make this query 33 times more efficient, when the XML will only ever have one "MyXML" parent element anyway? 

    A modified version of the code is given below.

    Query Plan:
    https://www.brentozar.com/pastetheplan/?id=HkxyUnMJm

    -- Declare an xml variable and a table variable.
    DECLARE @MyXML xml
    DECLARE @MyTable TABLE (
        RowID int identity(1,1),
        Reference nvarchar(30),
        SomeXML xml
    );

    INSERT INTO @MyTable (Reference, SomeXML)
    VALUES
    (
        '1234567890',
        '<MyXML>
            <Store>The Cake Shop</Store>
            <Location>Llandudno</Location>
            <Products>
                <Product Type="Chocolate Cake" Price="10.00" />
            </Products>
        </MyXML>'
    );

    -- 97% COST:
    SET @MyXML =
    (
        SELECT
            Reference,
            SomeXML.query('/MyXML/*')
        FROM @MyTable
        ORDER BY RowID
        FOR XML PATH('Wales'), ROOT('AllStores')
    );

    -- 3% COST WITH "[1]" ADDED!
    SET @MyXML =
    (
        SELECT
            Reference,
            SomeXML.query('/MyXML[1]/*')
        FROM @MyTable
        ORDER BY RowID
        FOR XML PATH('Wales'), ROOT('AllStores')
    );

  • The key thing is that the estimator doesn't know the same facts about the data that you do. The estimator is just that, an estimator; if you can help it make better estimations then you should do so.

    In this case you stated "when the XML will only ever have one "MyXML" parent element anyway". The estimator doesn't know that, so it assumes otherwise: there could be many MyXML elements, and it estimates accordingly. When you change the query to use SomeXML.query('/MyXML[1]/*') instead, you're telling the query to use only the first node, which also means that (for the query) there will only be one. It can now safely assume there will be one, and so its estimates change accordingly.
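    To make the difference concrete, here is a minimal side-by-side sketch (reusing the @MyTable variable from the first post). The comments describe the estimator's behaviour as explained above; the exact estimated row counts will vary by instance:

    ```sql
    -- Untyped XML gives the estimator no element counts, so it guesses that
    -- /MyXML may match many nodes and sizes the UDX operator accordingly.
    SELECT SomeXML.query('/MyXML/*')    FROM @MyTable;

    -- The [1] predicate makes the path a statically known singleton: at most
    -- one /MyXML node can be returned, so the estimate drops sharply.
    SELECT SomeXML.query('/MyXML[1]/*') FROM @MyTable;
    ```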

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Thom A - Wednesday, May 23, 2018 4:22 AM

    The key thing is that the estimator doesn't know the same facts about the data that you do. The estimator is just that, an estimator; if you can help it make better estimations then you should do so.

    In this case you stated "when the XML will only ever have one "MyXML" parent element anyway". The estimator doesn't know that, so it assumes otherwise: there could be many MyXML elements, and it estimates accordingly. When you change the query to use SomeXML.query('/MyXML[1]/*') instead, you're telling the query to use only the first node, which also means that (for the query) there will only be one. It can now safely assume there will be one, and so its estimates change accordingly.

    Does this mean that, in practice, this is not a performance issue? Or does it become a much less efficient execution plan as a result of estimating a higher workload?

  • zoggling - Wednesday, May 23, 2018 4:44 AM

    Does this mean that, in practice, this is not a performance issue? Or does it become a much less efficient execution plan as a result of estimating a higher workload?

    It's less efficient because it's expecting more data. I'd hazard a guess that, due to the high estimated row count, the query immediately ended up using a worktable in tempdb, whereas with the lower row count it was able to do the work in memory. Writing to and reading from tempdb is far slower than processing in memory (unless you have some crazy fast storage device for your tempdb).
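    One way to check the worktable guess against your own data: SET STATISTICS IO reports logical reads against a "Worktable" object when a plan spools to tempdb. A sketch, reusing the table variable from the first post:

    ```sql
    SET STATISTICS IO ON;

    -- Original query: check the Messages tab for "Worktable" logical reads.
    SELECT Reference, SomeXML.query('/MyXML/*')
    FROM @MyTable
    ORDER BY RowID
    FOR XML PATH('Wales'), ROOT('AllStores');

    -- [1] version: compare the Worktable figures between the two runs.
    SELECT Reference, SomeXML.query('/MyXML[1]/*')
    FROM @MyTable
    ORDER BY RowID
    FOR XML PATH('Wales'), ROOT('AllStores');

    SET STATISTICS IO OFF;
    ```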

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Thom A - Wednesday, May 23, 2018 4:51 AM

    zoggling - Wednesday, May 23, 2018 4:44 AM

    Does this mean that, in practice, this is not a performance issue? Or does it become a much less efficient execution plan as a result of estimating a higher workload?

    It's less efficient because it's expecting more data. I'd hazard a guess that, due to the high estimated row count, the query immediately ended up using a worktable in tempdb, whereas with the lower row count it was able to do the work in memory. Writing to and reading from tempdb is far slower than processing in memory (unless you have some crazy fast storage device for your tempdb).

    Thank you for your thoughts.

    I have also just observed a separate, similar pair of queries whose Actual Execution Plans show a 50:50 cost split on a SQL Server 2012 SP4 instance, compared with 95:5 on a SQL Server 2014 SP2 instance! Why would that be? These are exactly the same queries run on two different instances, yet the 2012 SP4 instance does not show the second query as any cheaper than the first!

    DECLARE @AuthenticationConfig xml =
    '<Blancmange Factory="Mulberry">
    <PuddingType Contents="Milk">
      <Ingredients Key="Eggs" Product="Farms UK" />
      <Ingredients Key="Milk" Product="Farms NI" />
      <Ingredients Key="Flour" Product="Farms IR" />
    </PuddingType>
    </Blancmange>'

    -- SQL Server 2012 SP4 - COST: 50%
    -- SQL Server 2014 SP2 - COST: 95%
    SELECT
        ROW_NUMBER() OVER(ORDER BY (SELECT 1)) as 'Row',
        AuthenticationConfig.v.value('(@Product)[1]','nvarchar(32)') as 'Product'
    FROM @AuthenticationConfig.nodes('/Blancmange/PuddingType[@Contents="Milk"]/Ingredients') AuthenticationConfig(v)

    -- SQL Server 2012 SP4 - COST: 50%
    -- SQL Server 2014 SP2 - COST: 5%
    SELECT
        ROW_NUMBER() OVER(ORDER BY (SELECT 1)) as 'Row',
        AuthenticationConfig.v.value('(@Product)[1]','nvarchar(32)') as 'Product'
    FROM @AuthenticationConfig.nodes('/Blancmange[1]/PuddingType[@Contents="Milk"]/Ingredients') AuthenticationConfig(v)

    Execution Plan - SQL Server 2014 (SP2-CU10-GDR) (KB4052725) - 12.0.5571.0 (X64)
    https://www.brentozar.com/pastetheplan/?id=HJrBATG1X

    Execution Plan - SQL Server 2012 (SP4) (KB4018073) - 11.0.7001.0 (X64)
    https://www.brentozar.com/pastetheplan/?id=HJvD0azJm

  • zoggling - Wednesday, May 23, 2018 5:23 AM

    I have also just observed a separate, similar pair of queries whose Actual Execution Plans show a 50:50 cost split on a SQL Server 2012 SP4 instance, compared with 95:5 on a SQL Server 2014 SP2 instance! Why would that be? These are exactly the same queries run on two different instances, yet the 2012 SP4 instance does not show the second query as any cheaper than the first!

    A new cardinality estimator was introduced with SQL Server 2014, so some queries perform better or worse depending on whether you're running on 2012 and earlier or 2014 and later. There are plenty of articles out there on the subject; search for "New cardinality estimator SQL Server 2014".
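    If you want to confirm that the cardinality estimator is the variable here, you can force the legacy (pre-2014) estimator for a single query on the 2014 instance with trace flag 9481 and see whether the 50:50 costing comes back. A sketch against the second query from the post above (QUERYTRACEON needs sysadmin membership, or QUERYTRACEON permission granted; on 2016 SP1 and later you could use OPTION (USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION')) instead):

    ```sql
    -- Force the legacy cardinality estimator for this statement only.
    SELECT
        ROW_NUMBER() OVER(ORDER BY (SELECT 1)) as 'Row',
        AuthenticationConfig.v.value('(@Product)[1]','nvarchar(32)') as 'Product'
    FROM @AuthenticationConfig.nodes('/Blancmange[1]/PuddingType[@Contents="Milk"]/Ingredients') AuthenticationConfig(v)
    OPTION (QUERYTRACEON 9481);
    ```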

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk
