SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Xquery .value() method performance problem


Xquery .value() method performance problem

Author
Message
milotet
milotet
SSC Rookie
SSC Rookie (31 reputation)SSC Rookie (31 reputation)SSC Rookie (31 reputation)SSC Rookie (31 reputation)SSC Rookie (31 reputation)SSC Rookie (31 reputation)SSC Rookie (31 reputation)SSC Rookie (31 reputation)

Group: General Forum Members
Points: 31 Visits: 77
We have a SQL 2012 database containing an XML column against which we have defined the Primary and three Secondary indexes permitted, along with a FullText index. The XML column does not have a schema collection (though we have experimented with this and have not been able to identify any benefit).

The XML has a very flat structure, of this nature:-
<Root>
<DocumentType>
<FieldName1 @value=”x” />
<FieldName2 @value=”x” />


</DocumentType>
</Root>

This table contains only 1.6 million rows, with each record containing xml of one of multiple DocumentTypes. There is an application over this table which allows the user to search the xml documents, selecting both filter and display criteria, from which TSQL is generated. A typical query that would get created is shown here:-

with xmlnamespaces ('http://namex/namey/4.0/DSL/ICMS/FormData' as ns)
select [ID],
[ActionDate],
[TypeName],
[Status],
[CreatedByName],
[CreatedDate],
[UpdatedByName],
[UpdatedDate],
[LocationID],
[FormData].value('(/ns:FormData/ns:legacydata/ns:firstname/@value)[1]','nvarchar(MAX)') as 'FirstName',
[FormData].value('(/ns:FormData/ns:legacydata/ns:lastname/@value)[1]','nvarchar(MAX)') as 'LastName',
[FormData].value('(/ns:FormData/ns:legacydata/ns:prepfirstname/@value)[1]','nvarchar(MAX)') as 'Report Preparer FirstName',
[FormData].value('(/ns:FormData/ns:legacydata/ns:preplastname/@value)[1]','nvarchar(MAX)') as 'Report Preparer LastName',
[FormData].value('(/ns:FormData/ns:legacydata/ns:eventtype/@value)[1]','nvarchar(MAX)') as 'Event Type',
[FormData].value('(/ns:FormData/ns:legacydata/ns:totalvalue/@value)[1]','nvarchar(MAX)') as 'Total value',
[FormData].value('(/ns:FormData/ns:legacydata/ns:contactidnumber/@value)[1]','nvarchar(MAX)') as 'Vehicle Plate No',
[FormData].value('(/ns:FormData/ns:legacydata/ns:contactdob/@value)[1]','nvarchar(MAX)') as 'Date of Birth',
[FormData].value('(/ns:FormData/ns:legacydata/ns:contactaddress/@value)[1]','nvarchar(MAX)') as 'Address',
[FormData].value('(/ns:FormData/ns:legacydata/ns:contactcity/@value)[1]','nvarchar(MAX)') as 'City',
[FormData].value('(/ns:FormData/ns:legacydata/ns:contactstate/@value)[1]','nvarchar(MAX)') as 'State'
from [case].[Incident] with (nolock)
where [Type] = 'E7391567-2456-422F-9371-81CBFFC90950'
and CONTAINS([FormData], '"external"')
and [FormData].exist('(/ns:FormData/ns:legacydata/ns:subcategory/@value[.="external"])[1]')=1

This returns approximately 55,000 rows, but takes about 36 seconds to do so – without the xquery .value() statements it returns within a few seconds. From observations, the vast majority of the execution time appears to be taken up with the spooling of results. This appears to indicate that it is expensive to get the XML based data using the .value() method.

Looking at the execution plan, each of the xquery .value() statements generates a Primary XML index seek, rather than a seek on the Secondary Path index, which is what might be expected. This query only shows a small number of display columns, there is no limit on the number that can be selected. The documents are user defined, and can contain hundreds of fields, and as they are defined by the user, there is no way to know how to sensibly shred the documents into relational tables and frankly would negate the whole point of using XML.

We have experimented with the TSQL, for instance joining to a sub-select using only the CONTAINS clause for the FullText index (which returns the 55,000 IDs in milliseconds), using xquery .nodes() and cross apply and other re-factoring, yet have been unable to squeeze out any further performance gains.

Looking at all the documentation and articles on the internet, the Primary Index will always be used, and usage of the secondary XML indexes is based on the cost based optimisation, which as we understand it depends on statistics. Yet it is not possible to create statistics on XML datatype columns. The final predicate using the xquery .exist(), does do a seek on the Secondary VALUE index so it has some means of determining that it is better to do so, but how?

So the question is, are we getting the maximum performance possible from SQL, or are there further changes we can make to improve the performance?
mister.magoo
mister.magoo
SSChampion
SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)

Group: General Forum Members
Points: 10355 Visits: 7891
Do you really need nvarchar(MAX)?
Does it save any time to reduce those to nvarchar(4000)?
Does it speed things up if you comment out all but one of the ".value" columns?

MM


select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);




  • Forum Etiquette: How to post Reporting Services problems
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • How to Post Performance Problems - by Gail Shaw

  • milotet
    milotet
    SSC Rookie
    SSC Rookie (31 reputation)SSC Rookie (31 reputation)SSC Rookie (31 reputation)SSC Rookie (31 reputation)SSC Rookie (31 reputation)SSC Rookie (31 reputation)SSC Rookie (31 reputation)SSC Rookie (31 reputation)

    Group: General Forum Members
    Points: 31 Visits: 77
    Hi, thanks for responding.

    Unfortunately the data captured in the XML is defined by the users, and includes free text areas which are not limited in size. The requirement is to return ALL the data.

    Changing NVARCHAR(MAX) to NVARCHAR(4000) has no discernible effect.

    If all the xquery.value() selects are removed, the data returns in about 7 seconds
    If there is only one xquery .value() method in the query, then the data is returned in about 14 seconds - as they are added back in, the time increases
    mister.magoo
    mister.magoo
    SSChampion
    SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)

    Group: General Forum Members
    Points: 10355 Visits: 7891
    you said you tried using the ".nodes" method - did you do that to get down to the ns:formdata/ns:legacydata level, then just take values from e.g. ns:firstname/@value ? and it made no difference?

    MM


    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);




  • Forum Etiquette: How to post Reporting Services problems
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • How to Post Performance Problems - by Gail Shaw

  • milotet
    milotet
    SSC Rookie
    SSC Rookie (31 reputation)SSC Rookie (31 reputation)SSC Rookie (31 reputation)SSC Rookie (31 reputation)SSC Rookie (31 reputation)SSC Rookie (31 reputation)SSC Rookie (31 reputation)SSC Rookie (31 reputation)

    Group: General Forum Members
    Points: 31 Visits: 77
    Yes, that is what we did
    Erland Sommarskog
    Erland Sommarskog
    SSCertifiable
    SSCertifiable (5.1K reputation)SSCertifiable (5.1K reputation)SSCertifiable (5.1K reputation)SSCertifiable (5.1K reputation)SSCertifiable (5.1K reputation)SSCertifiable (5.1K reputation)SSCertifiable (5.1K reputation)SSCertifiable (5.1K reputation)

    Group: General Forum Members
    Points: 5084 Visits: 875
    You mention spooling, that should not really happen - but I know that it can happen under some circumstances anyway. Some of these issue were addressed in SQL 2008, but you have to turn on trace flag 4199 to get the benefit. I like to believe that this should not be necessary in SQL 2012, but give it a try.

    What you also could consider is to insert the data into a temp table, only taking query('/ns:FormData/ns:legacydata') in that table, and then shred from there.

    These are shots in the dark - it's quite difficult to performance-tune without access to the database.

    Erland Sommarskog, SQL Server MVP, www.sommarskog.se
    Go


    Permissions

    You can't post new topics.
    You can't post topic replies.
    You can't post new polls.
    You can't post replies to polls.
    You can't edit your own topics.
    You can't delete your own topics.
    You can't edit other topics.
    You can't delete other topics.
    You can't edit your own posts.
    You can't edit other posts.
    You can't delete your own posts.
    You can't delete other posts.
    You can't post events.
    You can't edit your own events.
    You can't edit other events.
    You can't delete your own events.
    You can't delete other events.
    You can't send private messages.
    You can't send emails.
    You can read topics.
    You can't vote in polls.
    You can't upload attachments.
    You can download attachments.
    You can't post HTML code.
    You can't edit HTML code.
    You can't post IFCode.
    You can't post JavaScript.
    You can post emoticons.
    You can't post or upload images.

    Select a forum

































































































































































    SQLServerCentral


    Search