Performance issue with XML

  • Big surprise right?  Well, this one is to me.

    I'm in the process of changing the datatype on a table from text to XML (it was done in SQL 2000 initialy).  The front end (.NET) sends in 2 BLOB's, one of which gets stored in the database, the other which I am working hard on to make no longer needed.

    To complicate things some of the XML data is copied into seperate tables for efficient searching of the data (such as people).

    The original stored procedure uses sp_xml_preparedocument to get the text field ready to do the inserts.  The new one selects the data right out of the passed in blob (now declared as an XML datatype). using the format of From @XML_var.nodes('//node').

    In performance testing the original method of using sp_xml_preparedocument has turned out to be far superior when it comes to performance.  Which to me makes no sense at all.  To confirm what we saw from the client (we do a nightly refresh of some of the data, time went from 4.5 hours to 10+ hours) I loaded it into a query window to test the save stored procedure.  In one minute the new code could save it 8 times in one minute, the orignal 375!

    Anyone have any idea why this would be so radically different?

  • care to share with us the code? That would make it easier to give you a coherent answer that would be useful to you. Not necessary to compare both SP, just the new one using XML. Plus whatever definition files you are using.

    Even without that, it's possible that there's an external reference that's being invoked in the XML that did not occur in the original method. Other than that... no good ideas.

  • Sorry been busy.  Here is a sample of the code, the two XML data types being passed in the range of 150-350K, depending on the policy involved.  The code presented is the worst offending part of the stored procedure, there are 6 more similar inserts in the proc.  The result is 89 rows inserted for the data tested, but this varies widely.

    Now I have done a lot more testing to pinpoint the problem.  If you take out the insert statement and just run the select, it runs in about 0.2 seconds, 4.8 seconds with the insert... I also took out the part that is varchar(max) just to take that out of the equation and it made no difference.

    Tried select....into a #temp table, then insert into the real table from the #temp table.  The insert into the temp table then took up majority of the time.... Very strange.

    Any ideas would be appreciated.

     

    insert into dbo.Requirement(

    ExtractId,

    ItemKeyValue,

    RequirementId,

    SrcObjPath,

    SrcMember,

    CmpObjPath,

    CreatedOn,

    SatisfiedOn,

    SatisfiedSession,

    FollowupOn,

    FollowupActive,

    Severity,

    Override,

    Note,

    ParentKey,

    Message,

    Resend,

    InitialFlag

    )

    from

    (

    select

    ExtractId,ItemKeyValue,RequirementId,SrcObjPath,SrcMember,CmpObjPath,CreatedOn,

    SatisfiedOn,

    case SatisfiedSession

    when 0 then null

    else SatisfiedSession

    end as SatisfiedSession,

    FollowupOn,FollowupActive,Severity,[Override],Note,

    ParentKey,Message,Resend,InitialFlag

    from (

    select

    @ExtractId as ExtractID,

    xml_req.value(

    './ItemKeyValue[1]','int') as ItemKeyValue,

    xml_req.value(

    './RequirementId[1]', 'int') as RequirementId,

    xml_req.value(

    './SrcObjPath[1]', 'varchar(255)') as SrcObjPath,

    xml_req.value(

    './SrcMember[1]', 'varchar(255)') as SrcMember,

    xml_req.value(

    './CmpObjPath[1]', 'varchar(255)') as CmpObjPath,

    xml_req.value(

    './CreatedOn[1]', 'datetime') as CreatedOn,

    xml_req.value(

    './SatisfiedOn[1]', 'datetime') as SatisfiedOn,

    xml_req.value(

    './SessionId[1]', 'int') as SatisfiedSession,

    xml_req.value(

    './FollowupOn[1]', 'datetime') as FollowupOn,

    xml_req.value(

    './FollowupActive[1]', 'bit') as FollowupActive,

    xml_req.value(

    './Severity[1]', 'varchar(50)') as Severity,

    xml_req.value(

    './Override[1]', 'datetime') as [Override],

    xml_req.value(

    './Note[1]', 'varchar(max)') as Note,

    xml_req.value(

    './ParentKey[1]', 'int') as ParentKey,

    xml_req.value(

    './Message[1]', 'varchar(500)') as Message,

    xml_req.value(

    './Resend[1]', 'bit') as Resend,

    xml_req.value(

    './InitialFlag[1]', 'char(1)') as InitialFlag

    From @XML_data.nodes('//OutstandingRequirement') as reqs(xml_req)) as ColumnReqs

  • Did I mention I do not like XML?

    So on a whim I decided to try something a bit different.  Instead of doing an insert directly from the main XML blob, I decided to select out the parts I needed into a seperate XML variable before each insert, then parse that out for the insert instead.  Sounds way backwards, but sped up the stored procedure by a factor of 7, so now it is "only" 4.3 times as slow as the old original stored procedure....

  • Are you fighting lots of XML indexes?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I might be missing something, but is there a reason to run the XML extract as a subquery?

    You have something like:

    insert into table (...) select ... from (select ... from xml)

    Could you just do insert into table (...) select ... from xml?

    We recently converted a lot of our procs that were using sp_xml_preparedocument over to the xml.nodes() approach (like what you're doing) and didn't see any performance issues with it.

     


    Greg Walker
    DBA, ExpenseWatch.com

  • I'm in the middle of a similar conversion, from OPENXML to xquery, and have similar questions.  When you do a showplan on these, there appears to be lots and lots of processing to evaluate each the .value expressions.  The more .value expressions, the more processing.  Is there some way to streamline that?  xml indexes?  

    How about reading the XML into an XML column in a work table, with xml indexes on the column, then querying the work table's XML to load the temp tables? 

  • I've seen the same issue when looking at query plans. I think this has to do with the 10,000 row assumption thing on working tables.

    If you run profiler against these types of procs, with statement level tracking, you'll see the XML parsing takes up very little. The query analyzer traces always show them as 100% (or close) of the workload, while profiler shows that reads/writes/duration are all very low for these.

    When first digging into it, the difference between the openxml and the xquery stuff was small enough to not be noticible and usually was lost in the noise.

    My bigger concern, and the main reason for the changeover, was the resource leaks with sp_xml_preparedocument if you didn't make the corresponding call to sp_xml_removedocument (we had a dev that never called the remove stuff, which was locking up resources on our server).

    The showplan stuff is annoying, though, as it makes it a lot harder to tune your procs if the xquery takes up the whole thing.


    Greg Walker
    DBA, ExpenseWatch.com

  • "My bigger concern, and the main reason for the changeover, was the resource leaks with sp_xml_preparedocument if you didn't make the corresponding call to sp_xml_removedocument (we had a dev that never called the remove stuff, which was locking up resources on our server)."

    That's been our concern as well, as we think we've seen the leftover xml docs cause havoc on our servers.  However, we can't prove it.  There's no way to explicitly identify or count XML docs that are in memory.

    Don't mean to hijack the thread, but how is memory management different with xquery?  If I have a 1mb xml doc as an input parameter to a sproc, where does SQL Server allocate the memory to process the xml doc? 

    Also, the if you measure the size of the query plan in the sproc cache, an xquery sproc plan it is larger than the equivalent OPENXML query plan, which I attribute to the additional overhead dealing with .value expressions.  Any thoughts on this?

  • I'm not at work, but will try to answer a few questions.  I have one primary XML and one secondary XML index on the column.  I did however remove them to test, and it made no huge difference in performance (about 5-10% faster without them on).

    I believe that sp_xml_preparedocument uses the same limited space as the query cache has on 32-bit systems.

    I did try the above query on a friends 64 bit machine, and it had the same performance difference.

    The subqueries made no difference, but is there because of some of the queries needed to convert some of the data.  Not needed on all of them, but for consistencies sake....  I tried taking it out and selecting straight from the XML into a table, and it was as slow.

     

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply