parse XML file

  • We have a "parse XML to int" function that is used in a couple of stored procedures.

    But I found that this function accounts for 99% of the cost in the execution plan of the stored procedure.

    How can we optimize this function?

    CREATE FUNCTION [dbo].[ParseXmlToInt](@xml XML)

    RETURNS TABLE

    AS RETURN

    (

    select ids.id.value('@id', 'int') as id from @xml.nodes('//r') ids(id)

    )

    GO

    Thanks
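
    For context, here is a hypothetical sketch of how such an inline table-valued function is typically consumed; the table and column names are invented for illustration and are not from the original procedure:

    ```sql
    -- Hypothetical usage sketch (dbo.Employees and its columns are made up).
    DECLARE @EmpIDs XML = N'<ROOT><r id="1" /><r id="2" /></ROOT>';

    SELECT e.EmployeeID, e.EmployeeName
    FROM dbo.Employees AS e
    INNER JOIN dbo.ParseXmlToInt(@EmpIDs) AS p
        ON p.id = e.EmployeeID;
    ```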

  • By specifying '//r' for the XPath, you tell SQL Server it needs to traverse the entire document in search of any elements called 'r', anywhere in the document.

    Do you know anything about the format of the documents you're passing in? For example, suppose you know the document always contains 'r' elements as the root element(s) (the parameter is an XML fragment without a real root node, probably taken from a bigger document), something like <r id="1"/>

    <r id="2"/>. Then your function would only need to read these root elements, so the XQuery search will be much faster. This would be the query in the function (note the XPath):

    select ids.id.value('@id', 'int') as id

    from @xml.nodes('/r') ids(id)

    If the parameters' format has a root element called 'root', the query would be:

    select ids.id.value('@id', 'int') as id

    from @xml.nodes('/root/r') ids(id)

    If there are many elements called 'root', but you only want the r-elements from the first root, the query will be:

    select ids.id.value('@id', 'int') as id

    from @xml.nodes('/root[1]/r') ids(id)

    The point is: think carefully about your XPath. Be as specific as you can, so that SQL Server can work out which parts of the document it needs to read and which parts it can skip. Performance of XPath queries is all about scans and seeks, just like in 'normal' T-SQL.



    Posting Data Etiquette - Jeff Moden
    Posting Performance Based Questions - Gail Shaw
    Hidden RBAR - Jeff Moden
    Cross Tabs and Pivots - Jeff Moden
    Catch-all queries - Gail Shaw


    If you don't have time to do it right, when will you have time to do it over?
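
    A small self-contained sketch of the two XPath forms discussed above, using a lowercase <root> wrapper as in the example. Both queries return the same rows for this document, but the specific path tells SQL Server exactly where to look instead of scanning everything:

    ```sql
    DECLARE @xml XML = N'<root><r id="1" /><r id="2" /><r id="3" /></root>';

    -- '//r': search the entire document for any element named r, at any depth.
    SELECT ids.id.value('@id', 'int') AS id
    FROM @xml.nodes('//r') ids(id);

    -- '/root/r': read only r elements that are direct children of the top-level root.
    SELECT ids.id.value('@id', 'int') AS id
    FROM @xml.nodes('/root/r') ids(id);
    ```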

  • Thank you.

    The XML we have is in the format below: 60,000 records in total.

    SET @EmpIDs='<ROOT>

    <r id="5551189" />

    <r id="5578554" />

    <r id="5587894" />

    <r id="5602958" />

    <r id="5631882" />

    <r id="5643028" />

    <r id="5643069" />

    <r id="5648894" />

    <r id="5648969" />

    <r id="5650411" />

    ......

    <r id="8015231" />

    </ROOT>'

    And we currently use:

    select ids.id.value('@id', 'int') as id from @xml.nodes('//r') ids(id)

    Is that correct?

    It looks like we should instead use the query below from your previous post. Will this improve performance?

    select ids.id.value('@id', 'int') as id

    from @xml.nodes('/root/r') ids(id)

    Thanks

  • sqlfriends (10/6/2011)


    It looks like we should use below from your previous post, will this improve performance

    select ids.id.value('@id', 'int') as id

    from @xml.nodes('/root/r') ids(id)

    XML is case sensitive, so that should be

    select ids.id.value('@id', 'int') as id

    from @xml.nodes('/ROOT/r') ids(id)

    in order to match your sample data.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA
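
    To illustrate the case-sensitivity point with the sample data from the thread:

    ```sql
    DECLARE @EmpIDs XML = N'<ROOT><r id="5551189" /><r id="5578554" /></ROOT>';

    -- Lowercase path: returns 0 rows, because there is no <root> element.
    SELECT ids.id.value('@id', 'int') AS id
    FROM @EmpIDs.nodes('/root/r') ids(id);

    -- Uppercase path: returns 2 rows, matching the <ROOT> element in the data.
    SELECT ids.id.value('@id', 'int') AS id
    FROM @EmpIDs.nodes('/ROOT/r') ids(id);
    ```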

  • I've created an XML document as per your sample above and used copy and paste to extend it to 80,000 entries. On my machine, your function parses it and returns an 80,000-row table in 881ms.

    I would say that's quite reasonable performance, what sort of performance would you find acceptable?

    ...and half of that time is just writing the output to the screen.

    Setting "discard results after execution" results in an execution time of 420ms.

  • paul_ramster (10/6/2011)


    I've created an XML document as per your sample above and used copy and paste to extend it to 80,000 entries. On my machine, your function parses it and returns an 80,000-row table in 881ms.

    I would say that's quite reasonable performance, what sort of performance would you find acceptable?

    Thanks,

    We have a stored procedure that uses the function, and the whole procedure takes 40 seconds to run. In its execution plan, I can see that 99% of the cost is on the function reading the XML into a table.

    That's why I think this function needs to be improved for performance.

  • That means that the optimizer estimated the cost of the function as 99% of the total cost of the query, which is not quite the same thing as it actually being 99% of the cost in practice.

    I think my previous test suggests that the optimizer is not getting the cost estimate right for your query. It's quite likely that something else in the query is taking the time: perhaps outdated statistics, or a skewed data distribution, but we'd need more details to see if that is true.

    A copy of the execution plan XML for starters.
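
    As a rough sketch, one way to check where the time actually goes is to measure per-statement timings directly rather than relying on the plan's estimated cost percentages (the procedure name below is a placeholder):

    ```sql
    -- Report actual CPU and elapsed time, and logical reads, per statement.
    SET STATISTICS TIME ON;
    SET STATISTICS IO ON;

    EXEC dbo.YourStoredProcedure;  -- placeholder: substitute the real procedure

    SET STATISTICS TIME OFF;
    SET STATISTICS IO OFF;
    ```

    The Messages tab then reports timings for each statement, which makes it easier to see whether the XML parsing or some other step actually dominates.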

  • I ran the sproc again and included the actual execution plan in the attachment.

    Thanks

  • Thanks for the attempt, but that is not the execution plan; that is a picture of part of the execution plan. Please have a look at this article http://www.sqlservercentral.com/articles/SQLServerCentral/66909/ on how to get and post your query's execution plan, and try again.



