Suboptimal execution plan with xml.nodes

  • I've noticed that xml.nodes always has an estimated number of rows of 200, as in the following example:

    declare @xml xml;

    set @xml = '
    <root>
    <a><test>1</test></a>
    </root>';

    select
        x.a.value('(test)[1]', 'int')
    from
        @xml.nodes('/root/a') x(a);

    Now, this may be fine in most cases; in my case it definitely is not. I receive an XML document with more than 10,000 rows which I intend to shred into several tables. To me, this seems like a good candidate for parallelism, or at least for a merge or hash join instead of only a nested loops join. However, because the estimated number of rows is as low as 200, this never happens. Does anyone have a good way of telling SQL Server to estimate, for instance, 15,000 rows for xml.nodes? (See the sketch after this post for one possible workaround.)



    Ole Kristian Velstadbråten Bangås - Virinco - Facebook - Twitter

    Concatenating Row Values in Transact-SQL
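
    One workaround that is sometimes suggested in this situation is to materialize the output of .nodes() into a temporary table first: the temp table carries a real row count, so any subsequent joins are costed against the actual ~15,000 rows rather than the fixed guess. Below is a minimal sketch of the idea; the names #Shredded and TestValue are placeholders, not anything from this thread, and the shredding insert itself still runs under the fixed estimate.

    declare @xml xml;
    set @xml = '<root><a><test>1</test></a></root>';

    -- Shred the XML once into a temp table; later statements that read
    -- #Shredded are costed on its real row count, not on a fixed guess.
    create table #Shredded (TestValue int not null);

    insert into #Shredded (TestValue)
    select x.a.value('(test)[1]', 'int')
    from @xml.nodes('/root/a') x(a);

    -- Joins and further processing now start from #Shredded.
    select s.TestValue
    from #Shredded s;

    drop table #Shredded;

    Whether this pays off depends on how expensive the extra materialization step is; as the later replies in this thread show, the overhead can outweigh the gain.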

  • Did you try to insert the xml data into a table with an indexed xml column?

    AFAIK, SQL Server does not calculate any statistical information on a variable. Therefore, it can't detect that the pre-defined(?) value is off.

    By storing the data in an indexed xml column, you might benefit from the statistic information.

    As a side note: I don't think this will lead to parallelism though, since 15k elements is still not that much.

    Most probably, storing the data in an indexed xml column will already improve performance enough that parallelism is no longer needed.

    If my guess turns out to be wrong, please provide more detailed information (sample data, current query, expected result, current performance information such as the actual execution plan, as sketched below) so we can have a closer look into it.



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question
    How to post performance related questions
    Links for Tally Table, Cross Tabs and Dynamic Cross Tabs, Delimited Split Function
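
    As a quick way to capture that performance information, you can return the actual execution plan together with the query results and compare estimated versus actual row counts for the operator that reads the XML. A minimal sketch, using only the query from the original post:

    -- Returns the actual execution plan as XML alongside the results;
    -- compare the estimated and actual row counts per operator.
    set statistics xml on;

    declare @xml xml;
    set @xml = '<root><a><test>1</test></a></root>';

    select x.a.value('(test)[1]', 'int')
    from @xml.nodes('/root/a') x(a);

    set statistics xml off;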

  • When you work with an XML variable, SQL Server only knows that it is going to work with an XML variable; it has no idea about the variable's size or structure, so it has to make some assumptions when it creates the query plan. If the XML document is a small one and you don't join any tables to the results of the XQuery, most of the time you won't have performance problems, but in some cases you will.

    You could try working with a temporary table that has an XML index on the column. That way SQL Server has some idea about the XML before runtime and can produce a better query plan. This helps performance especially when you join the results of the XQuery to another table; I don't think it will improve performance if you just run the XQuery against one XML document, but you can test it.

    Below is a modification of your code that shows that, with the same XML, SQL Server expects only 1 row instead of 200 once the document has been inserted into the temporary table. Again, you should check whether it helps you in terms of performance, because, as I wrote above, it is mainly helpful when you join the results of the XQuery to another table (a small example of such a join is sketched after this post).

    -- Temporary table with a clustered primary key (required for an XML index)
    create table #tmp (id int not null identity(1,1) primary key, XmlCol xml);

    -- The primary XML index gives the optimizer information about the stored XML
    create primary xml index TmpIndex on #tmp (XmlCol);

    declare @xml xml;
    set @xml = '
    <root>
    <a><test>1</test></a>
    </root>';

    insert into #tmp (XmlCol) values (@xml);

    -- The shredding operator now estimates 1 row instead of 200
    select
        x.a.value('(test)[1]', 'int')
    from
        #tmp cross apply XmlCol.nodes('/root/a') x(a);

    drop table #tmp;

    Adi

    --------------------------------------------------------------
    To know how to ask questions and increase the chances of getting answers:
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
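
    To make the "join to another table" point concrete, here is a small sketch of the kind of query where the cardinality estimate from .nodes() actually matters. The table #Target and its columns are made up for illustration only and are not from this thread:

    -- Hypothetical lookup table the shredded XML is joined against
    create table #Target (TestValue int not null primary key, Descr varchar(50) not null);
    insert into #Target (TestValue, Descr) values (1, 'one');

    declare @xml xml;
    set @xml = '<root><a><test>1</test></a></root>';

    -- Against the variable, this join is costed with the fixed row guess;
    -- with the indexed XML column from the code above, the estimate is much
    -- closer to reality, which can change the join strategy the optimizer picks.
    select t.Descr
    from @xml.nodes('/root/a') x(a)
    join #Target t
        on t.TestValue = x.a.value('(test)[1]', 'int');

    drop table #Target;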

  • Thanks for the responses, folks. You're absolutely right: an indexed XML column greatly improves the statistics and produces a more efficient execution plan, with merge joins instead of nested loops. Unfortunately, building the index is a time- and CPU-consuming operation, so the total operation went from 14 to 19 seconds. So the only way to improve this performance seems to be to have the .NET developers do this in the application and use set-based inserts against the tables (the T-SQL side of such an approach is sketched after this post).



    Ole Kristian Velstadbråten Bangås - Virinco - Facebook - Twitter

    Concatenating Row Values in Transact-SQL
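
    For the set-based inserts from the application, one common pattern is a table-valued parameter: the .NET side fills a DataTable (or an IEnumerable of SqlDataRecord) and passes it in a single call, and the T-SQL side performs one INSERT ... SELECT. The sketch below shows only the T-SQL side, using a hypothetical dbo.TestValues target table, dbo.TestValueList type and dbo.InsertTestValues procedure; none of these names come from the thread, and whether this actually beats the XML approach still needs to be measured.

    -- Hypothetical target table
    create table dbo.TestValues (TestValue int not null);
    go

    -- Table type describing the rows the application sends in one round trip
    create type dbo.TestValueList as table (TestValue int not null);
    go

    -- The application calls this once with the whole batch of rows;
    -- the insert itself is a single set-based statement.
    create procedure dbo.InsertTestValues
        @Rows dbo.TestValueList readonly
    as
    begin
        set nocount on;

        insert into dbo.TestValues (TestValue)
        select TestValue
        from @Rows;
    end;
    go

    On the .NET side this maps to a SqlParameter with SqlDbType.Structured.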
