Inefficient T-SQL Query using XQuery

  • A query in our database was flagged as being particularly inefficient. It is executed at an average rate of 13 times per minute and has an average CPU time of 2.2 seconds.

    A real performance benefit has been observed by simply ensuring that only the first occurrence of the root element is used (by adding "[1]" to the .query method path). Side-by-side, this has a relative cost of 3% versus 97% when compared with the original query. The execution plan also shows a greatly reduced estimated row count for the Extended Operator (UDX): 4,307 rows instead of 180,000.

    I suppose our question is... why? Why was this change required to make this query 33 times more efficient, when the XML will only ever have one "MyXML" parent element anyway? 

    A modified version of the code is given below.

    Query Plan:
    https://www.brentozar.com/pastetheplan/?id=HkxyUnMJm

    -- Declare an xml variable and a table variable.
    DECLARE @MyXML xml
    DECLARE @MyTable TABLE (
        RowID int identity(1,1),
        Reference nvarchar(30),
        SomeXML xml
    );

    INSERT INTO @MyTable (Reference, SomeXML)
    VALUES
    (
        '1234567890',
        '<MyXML>
            <Store>The Cake Shop</Store>
            <Location>Llandudno</Location>
            <Products>
                <Product Type="Chocolate Cake" Price="10.00" />
            </Products>
        </MyXML>'
    );

    -- 97% COST:
    SET @MyXML =
    (
        SELECT
            Reference,
            SomeXML.query('/MyXML/*')
        FROM @MyTable
        ORDER BY RowID
        FOR XML PATH('Wales'), ROOT('AllStores')
    );

    -- 3% COST WITH "[1]" ADDED!
    SET @MyXML =
    (
        SELECT
            Reference,
            SomeXML.query('/MyXML[1]/*')
        FROM @MyTable
        ORDER BY RowID
        FOR XML PATH('Wales'), ROOT('AllStores')
    );

  • The key thing is that the estimator doesn't know the same facts about the data that you do. The estimator is just that, an estimator; if you can help it make better estimations then you should do so.

    In this case you stated "when the XML will only ever have one "MyXML" parent element anyway". The estimator doesn't know that, so it assumes otherwise: there could be many MyXML elements, and it estimates accordingly. When you change the query to use SomeXML.query('/MyXML[1]/*') instead, you're telling the query to use only the first node, which also means that (for the query) there will only be one. It can now safely assume there will be one, and so its estimates change accordingly.
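    To make the difference concrete, here is a minimal side-by-side sketch (reusing the @MyTable variable from the first post). The comments describe the estimator's behaviour as explained above; the exact estimated row counts will vary by instance:

    ```sql
    -- Untyped XML gives the estimator no element counts, so it guesses that
    -- /MyXML may match many nodes and sizes the UDX operator accordingly.
    SELECT SomeXML.query('/MyXML/*')    FROM @MyTable;

    -- The [1] predicate makes the path a statically known singleton: at most
    -- one /MyXML node can be returned, so the estimate drops sharply.
    SELECT SomeXML.query('/MyXML[1]/*') FROM @MyTable;
    ```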

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Thom A - Wednesday, May 23, 2018 4:22 AM

    The key thing is that the estimator doesn't know the same facts about the data that you do. The estimator is just that, an estimator; if you can help it make better estimations then you should do so.

    In this case you stated "when the XML will only ever have one "MyXML" parent element anyway". The estimator doesn't know that, so it assumes otherwise: there could be many MyXML elements, and it estimates accordingly. When you change the query to use SomeXML.query('/MyXML[1]/*') instead, you're telling the query to use only the first node, which also means that (for the query) there will only be one. It can now safely assume there will be one, and so its estimates change accordingly.

    Does this mean that, in practice, this is not a performance issue? Or does it become a much less efficient execution plan as a result of estimating a higher workload?

  • zoggling - Wednesday, May 23, 2018 4:44 AM

    Does this mean that, in practice, this is not a performance issue? Or does it become a much less efficient execution plan as a result of estimating a higher workload?

    It's less efficient because it's expecting more data. I'd hazard a guess that, due to the high estimated row count, the query immediately ended up using a worktable in tempdb, whereas with the lower row count it was able to do the work in memory. Writing to and reading from tempdb is far slower than processing in memory (unless you have some crazy fast storage device for your tempdb).
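    One way to check the worktable guess against your own data: SET STATISTICS IO reports logical reads against a "Worktable" object when a plan spools to tempdb. A sketch, reusing the table variable from the first post:

    ```sql
    SET STATISTICS IO ON;

    -- Original query: check the Messages tab for "Worktable" logical reads.
    SELECT Reference, SomeXML.query('/MyXML/*')
    FROM @MyTable
    ORDER BY RowID
    FOR XML PATH('Wales'), ROOT('AllStores');

    -- [1] version: compare the Worktable figures between the two runs.
    SELECT Reference, SomeXML.query('/MyXML[1]/*')
    FROM @MyTable
    ORDER BY RowID
    FOR XML PATH('Wales'), ROOT('AllStores');

    SET STATISTICS IO OFF;
    ```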

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Thom A - Wednesday, May 23, 2018 4:51 AM

    zoggling - Wednesday, May 23, 2018 4:44 AM

    Does this mean that, in practice, this is not a performance issue? Or does it become a much less efficient execution plan as a result of estimating a higher workload?

    It's less efficient because it's expecting more data. I'd hazard a guess that, due to the high estimated row count, the query immediately ended up using a worktable in tempdb, whereas with the lower row count it was able to do the work in memory. Writing to and reading from tempdb is far slower than processing in memory (unless you have some crazy fast storage device for your tempdb).

    Thank you for your thoughts.

    I have also just observed a separate, similar pair of queries whose Actual Execution Plans show a 50:50 cost split on a SQL Server 2012 SP4 instance, compared with 95:5 on a SQL Server 2014 SP2 instance! Why would that be? These are exactly the same queries run on two different instances, yet the 2012 SP4 instance does not show the second query as any cheaper than the first!

    DECLARE @AuthenticationConfig xml =
    '<Blancmange Factory="Mulberry">
    <PuddingType Contents="Milk">
      <Ingredients Key="Eggs" Product="Farms UK" />
      <Ingredients Key="Milk" Product="Farms NI" />
      <Ingredients Key="Flour" Product="Farms IR" />
    </PuddingType>
    </Blancmange>'

    -- SQL Server 2012 SP4 - COST: 50%
    -- SQL Server 2014 SP2 - COST: 95%
    SELECT
        ROW_NUMBER() OVER(ORDER BY (SELECT 1)) as 'Row',
        AuthenticationConfig.v.value('(@Product)[1]','nvarchar(32)') as 'Product'
    FROM @AuthenticationConfig.nodes('/Blancmange/PuddingType[@Contents="Milk"]/Ingredients') AuthenticationConfig(v)

    -- SQL Server 2012 SP4 - COST: 50%
    -- SQL Server 2014 SP2 - COST: 5%
    SELECT
        ROW_NUMBER() OVER(ORDER BY (SELECT 1)) as 'Row',
        AuthenticationConfig.v.value('(@Product)[1]','nvarchar(32)') as 'Product'
    FROM @AuthenticationConfig.nodes('/Blancmange[1]/PuddingType[@Contents="Milk"]/Ingredients') AuthenticationConfig(v)

    Execution Plan - SQL Server 2014 (SP2-CU10-GDR) (KB4052725) - 12.0.5571.0 (X64)
    https://www.brentozar.com/pastetheplan/?id=HJrBATG1X

    Execution Plan - SQL Server 2012 (SP4) (KB4018073) - 11.0.7001.0 (X64)
    https://www.brentozar.com/pastetheplan/?id=HJvD0azJm

  • zoggling - Wednesday, May 23, 2018 5:23 AM

    I have also just observed a separate, similar pair of queries whose Actual Execution Plans show a 50:50 cost split on a SQL Server 2012 SP4 instance, compared with 95:5 on a SQL Server 2014 SP2 instance! Why would that be? These are exactly the same queries run on two different instances, yet the 2012 SP4 instance does not show the second query as any cheaper than the first!

    A new cardinality estimator was introduced with SQL Server 2014, so some queries perform better or worse depending on whether you're running on 2012 and earlier or 2014 and later. There are plenty of articles out there on the subject; search for "New cardinality estimator SQL Server 2014".
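    If you want to confirm that the cardinality estimator is the variable here, you can force the legacy (pre-2014) estimator for a single query on the 2014 instance with trace flag 9481 and see whether the 50:50 costing comes back. A sketch against the second query from the post above (QUERYTRACEON needs sysadmin membership, or QUERYTRACEON permission granted; on 2016 SP1 and later you could use OPTION (USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION')) instead):

    ```sql
    -- Force the legacy cardinality estimator for this statement only.
    SELECT
        ROW_NUMBER() OVER(ORDER BY (SELECT 1)) as 'Row',
        AuthenticationConfig.v.value('(@Product)[1]','nvarchar(32)') as 'Product'
    FROM @AuthenticationConfig.nodes('/Blancmange[1]/PuddingType[@Contents="Milk"]/Ingredients') AuthenticationConfig(v)
    OPTION (QUERYTRACEON 9481);
    ```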

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk
