Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

Xquery .value() method performance problem Expand / Collapse
Author
Message
Posted Thursday, July 11, 2013 5:30 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Tuesday, November 05, 2013 3:00 AM
Points: 3, Visits: 56
We have a SQL 2012 database containing an XML column against which we have defined the Primary and three Secondary indexes permitted, along with a FullText index. The XML column does not have a schema collection (though we have experimented with this and have not been able to identify any benefit).

The XML has a very flat structure, of this nature:-
<Root>
<DocumentType>
<FieldName1 @value=”x” />
<FieldName2 @value=”x” />


</DocumentType>
</Root>

This table contains only 1.6 million rows, with each record containing xml of one of multiple DocumentTypes. There is an application over this table which allows the user to search the xml documents, selecting both filter and display criteria, from which TSQL is generated. A typical query that would get created is shown here:-

with xmlnamespaces ('http://namex/namey/4.0/DSL/ICMS/FormData' as ns)
select [ID],
[ActionDate],
[TypeName],
[Status],
[CreatedByName],
[CreatedDate],
[UpdatedByName],
[UpdatedDate],
[LocationID],
[FormData].value('(/ns:FormData/ns:legacydata/ns:firstname/@value)[1]','nvarchar(MAX)') as 'FirstName',
[FormData].value('(/ns:FormData/ns:legacydata/ns:lastname/@value)[1]','nvarchar(MAX)') as 'LastName',
[FormData].value('(/ns:FormData/ns:legacydata/ns:prepfirstname/@value)[1]','nvarchar(MAX)') as 'Report Preparer FirstName',
[FormData].value('(/ns:FormData/ns:legacydata/ns:preplastname/@value)[1]','nvarchar(MAX)') as 'Report Preparer LastName',
[FormData].value('(/ns:FormData/ns:legacydata/ns:eventtype/@value)[1]','nvarchar(MAX)') as 'Event Type',
[FormData].value('(/ns:FormData/ns:legacydata/ns:totalvalue/@value)[1]','nvarchar(MAX)') as 'Total value',
[FormData].value('(/ns:FormData/ns:legacydata/ns:contactidnumber/@value)[1]','nvarchar(MAX)') as 'Vehicle Plate No',
[FormData].value('(/ns:FormData/ns:legacydata/ns:contactdob/@value)[1]','nvarchar(MAX)') as 'Date of Birth',
[FormData].value('(/ns:FormData/ns:legacydata/ns:contactaddress/@value)[1]','nvarchar(MAX)') as 'Address',
[FormData].value('(/ns:FormData/ns:legacydata/ns:contactcity/@value)[1]','nvarchar(MAX)') as 'City',
[FormData].value('(/ns:FormData/ns:legacydata/ns:contactstate/@value)[1]','nvarchar(MAX)') as 'State'
from [case].[Incident] with (nolock)
where [Type] = 'E7391567-2456-422F-9371-81CBFFC90950'
and CONTAINS([FormData], '"external"')
and [FormData].exist('(/ns:FormData/ns:legacydata/ns:subcategory/@value[.="external"])[1]')=1

This returns approximately 55,000 rows, but takes about 36 seconds to do so – without the xquery .value() statements it returns within a few seconds. From observations, the vast majority of the execution time appears to be taken up with the spooling of results. This appears to indicate that it is expensive to get the XML based data using the .value() method.

Looking at the execution plan, each of the xquery .value() statements generates a Primary XML index seek, rather than a seek on the Secondary Path index, which is what might be expected. This query only shows a small number of display columns, there is no limit on the number that can be selected. The documents are user defined, and can contain hundreds of fields, and as they are defined by the user, there is no way to know how to sensibly shred the documents into relational tables and frankly would negate the whole point of using XML.

We have experimented with the TSQL, for instance joining to a sub-select using only the CONTAINS clause for the FullText index (which returns the 55,000 IDs in milliseconds), using xquery .nodes() and cross apply and other re-factoring, yet have been unable to squeeze out any further performance gains.

Looking at all the documentation and articles on the internet, the Primary Index will always be used, and usage of the secondary XML indexes is based on the cost based optimisation, which as we understand it depends on statistics. Yet it is not possible to create statistics on XML datatype columns. The final predicate using the xquery .exist(), does do a seek on the Secondary VALUE index so it has some means of determining that it is better to do so, but how?

So the question is, are we getting the maximum performance possible from SQL, or are there further changes we can make to improve the performance?
Post #1472512
Posted Thursday, July 11, 2013 8:01 AM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Today @ 1:54 AM
Points: 1,663, Visits: 5,231
Do you really need nvarchar(MAX)?
Does it save any time to reduce those to nvarchar(4000)?
Does it speed things up if you comment out all but one of the ".value" columns?


MM


  • MMGrid Addin
  • MMNose Addin


  • Forum Etiquette: How to post Reporting Services problems
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • How to Post Performance Problems - by Gail Shaw

  • Post #1472591
    Posted Thursday, July 11, 2013 8:23 AM
    Forum Newbie

    Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

    Group: General Forum Members
    Last Login: Tuesday, November 05, 2013 3:00 AM
    Points: 3, Visits: 56
    Hi, thanks for responding.

    Unfortunately the data captured in the XML is defined by the users, and includes free text areas which are not limited in size. The requirement is to return ALL the data.

    Changing NVARCHAR(MAX) to NVARCHAR(4000) has no discernible effect.

    If all the xquery.value() selects are removed, the data returns in about 7 seconds
    If there is only one xquery .value() method in the query, then the data is returned in about 14 seconds - as they are added back in, the time increases
    Post #1472606
    Posted Thursday, July 11, 2013 8:28 AM


    SSCommitted

    SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

    Group: General Forum Members
    Last Login: Today @ 1:54 AM
    Points: 1,663, Visits: 5,231
    you said you tried using the ".nodes" method - did you do that to get down to the ns:formdata/ns:legacydata level, then just take values from e.g. ns:firstname/@value ? and it made no difference?

    MM


  • MMGrid Addin
  • MMNose Addin


  • Forum Etiquette: How to post Reporting Services problems
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • How to Post Performance Problems - by Gail Shaw

  • Post #1472609
    Posted Thursday, July 11, 2013 8:29 AM
    Forum Newbie

    Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

    Group: General Forum Members
    Last Login: Tuesday, November 05, 2013 3:00 AM
    Points: 3, Visits: 56
    Yes, that is what we did
    Post #1472612
    Posted Sunday, July 14, 2013 3:45 PM


    Right there with Babe

    Right there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with Babe

    Group: General Forum Members
    Last Login: 2 days ago @ 7:54 AM
    Points: 758, Visits: 633
    You mention spooling, that should not really happen - but I know that it can happen under some circumstances anyway. Some of these issue were addressed in SQL 2008, but you have to turn on trace flag 4199 to get the benefit. I like to believe that this should not be necessary in SQL 2012, but give it a try.

    What you also could consider is to insert the data into a temp table, only taking query('/ns:FormData/ns:legacydata') in that table, and then shred from there.

    These are shots in the dark - it's quite difficult to performance-tune without access to the database.


    Erland Sommarskog, SQL Server MVP, www.sommarskog.se
    Post #1473404
    « Prev Topic | Next Topic »

    Add to briefcase

    Permissions Expand / Collapse