Xquery .value() method performance problem

  • We have a SQL 2012 database containing an XML column against which we have defined the Primary and three Secondary indexes permitted, along with a FullText index. The XML column does not have a schema collection (though we have experimented with this and have not been able to identify any benefit).

    The XML has a very flat structure, of this nature:-

    <Root>

    <DocumentType>

    <FieldName1 @value=”x” />

    <FieldName2 @value=”x” />

    </DocumentType>

    </Root>

    This table contains only 1.6 million rows, with each record containing xml of one of multiple DocumentTypes. There is an application over this table which allows the user to search the xml documents, selecting both filter and display criteria, from which TSQL is generated. A typical query that would get created is shown here:-

    with xmlnamespaces ('http://namex/namey/4.0/DSL/ICMS/FormData' as ns)

    select [ID],

    [ActionDate],

    [TypeName],

    [Status],

    [CreatedByName],

    [CreatedDate],

    [UpdatedByName],

    [UpdatedDate],

    [LocationID],

    [FormData].value('(/ns:FormData/ns:legacydata/ns:firstname/@value)[1]','nvarchar(MAX)') as 'FirstName',

    [FormData].value('(/ns:FormData/ns:legacydata/ns:lastname/@value)[1]','nvarchar(MAX)') as 'LastName',

    [FormData].value('(/ns:FormData/ns:legacydata/ns:prepfirstname/@value)[1]','nvarchar(MAX)') as 'Report Preparer FirstName',

    [FormData].value('(/ns:FormData/ns:legacydata/ns:preplastname/@value)[1]','nvarchar(MAX)') as 'Report Preparer LastName',

    [FormData].value('(/ns:FormData/ns:legacydata/ns:eventtype/@value)[1]','nvarchar(MAX)') as 'Event Type',

    [FormData].value('(/ns:FormData/ns:legacydata/ns:totalvalue/@value)[1]','nvarchar(MAX)') as 'Total value',

    [FormData].value('(/ns:FormData/ns:legacydata/ns:contactidnumber/@value)[1]','nvarchar(MAX)') as 'Vehicle Plate No',

    [FormData].value('(/ns:FormData/ns:legacydata/ns:contactdob/@value)[1]','nvarchar(MAX)') as 'Date of Birth',

    [FormData].value('(/ns:FormData/ns:legacydata/ns:contactaddress/@value)[1]','nvarchar(MAX)') as 'Address',

    [FormData].value('(/ns:FormData/ns:legacydata/ns:contactcity/@value)[1]','nvarchar(MAX)') as 'City',

    [FormData].value('(/ns:FormData/ns:legacydata/ns:contactstate/@value)[1]','nvarchar(MAX)') as 'State'

    from [case].[Incident] with (nolock)

    where [Type] = 'E7391567-2456-422F-9371-81CBFFC90950'

    and CONTAINS([FormData], '"external"')

    and [FormData].exist('(/ns:FormData/ns:legacydata/ns:subcategory/@value[.="external"])[1]')=1

    This returns approximately 55,000 rows, but takes about 36 seconds to do so – without the xquery .value() statements it returns within a few seconds. From observations, the vast majority of the execution time appears to be taken up with the spooling of results. This appears to indicate that it is expensive to get the XML based data using the .value() method.

    Looking at the execution plan, each of the xquery .value() statements generates a Primary XML index seek, rather than a seek on the Secondary Path index, which is what might be expected. This query only shows a small number of display columns, there is no limit on the number that can be selected. The documents are user defined, and can contain hundreds of fields, and as they are defined by the user, there is no way to know how to sensibly shred the documents into relational tables and frankly would negate the whole point of using XML.

    We have experimented with the TSQL, for instance joining to a sub-select using only the CONTAINS clause for the FullText index (which returns the 55,000 IDs in milliseconds), using xquery .nodes() and cross apply and other re-factoring, yet have been unable to squeeze out any further performance gains.

    Looking at all the documentation and articles on the internet, the Primary Index will always be used, and usage of the secondary XML indexes is based on the cost based optimisation, which as we understand it depends on statistics. Yet it is not possible to create statistics on XML datatype columns. The final predicate using the xquery .exist(), does do a seek on the Secondary VALUE index so it has some means of determining that it is better to do so, but how?

    So the question is, are we getting the maximum performance possible from SQL, or are there further changes we can make to improve the performance?

  • Do you really need nvarchar(MAX)?

    Does it save any time to reduce those to nvarchar(4000)?

    Does it speed things up if you comment out all but one of the ".value" columns?

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • Hi, thanks for responding.

    Unfortunately the data captured in the XML is defined by the users, and includes free text areas which are not limited in size. The requirement is to return ALL the data.

    Changing NVARCHAR(MAX) to NVARCHAR(4000) has no discernible effect.

    If all the xquery.value() selects are removed, the data returns in about 7 seconds

    If there is only one xquery .value() method in the query, then the data is returned in about 14 seconds - as they are added back in, the time increases

  • you said you tried using the ".nodes" method - did you do that to get down to the ns:formdata/ns:legacydata level, then just take values from e.g. ns:firstname/@value ? and it made no difference?

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • Yes, that is what we did

  • You mention spooling, that should not really happen - but I know that it can happen under some circumstances anyway. Some of these issue were addressed in SQL 2008, but you have to turn on trace flag 4199 to get the benefit. I like to believe that this should not be necessary in SQL 2012, but give it a try.

    What you also could consider is to insert the data into a temp table, only taking query('/ns:FormData/ns:legacydata') in that table, and then shred from there.

    These are shots in the dark - it's quite difficult to performance-tune without access to the database.

    [font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]

  • Viewing 6 posts - 1 through 5 (of 5 total)

    You must be logged in to reply to this topic. Login to reply