Using OpenXML

  • Comments posted to this topic are about the content posted at http://www.sqlservercentral.com/columnists/rVasant/usingopenxml.asp

  • Good Article!!..Simple presentation....Keep it up..

  • Thought this was great myself, keep em coming!

  • Nice article.

    You might consider adding information about performance management. For example, maintaining the document handle across multiple queries and/or procedure calls is pretty expensive. Generally, if we run into a situation where the XML is used in more than a single statement, we load the data into a temp table and close the document as soon as possible before doing more processing. It shows quite a performance improvement.

    Also, another performance issue, we found that element centric XML code is pretty fat, causing more networking and memory issues(every column has <dude></dude> as well as the data as well as the rows). As a standard in our organization we use attribute centric (where columns get @dude="", reducing the file size and memory footprint considerably).

    Excellently written.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Good Article Vasant, wrote in a very simple and easy way.

    I have one doubt here about  sp_xml_preparedocument.

    In BOL and also in the mentioned article XML document/content is assigning to a varchar variable and using in sp_xml_preparedocument. Instead of using the variable, can we also use a path where the XML file exists. If so how do we need to mention.

    Can any one help regarding this with a simple syntax?

     

    Thanks,

    Subhash

  • "can we also use a path where the XML file exists."

    I'm pretty sure that you can't reference files directly from within TSQL. If you really, really HAD to do this, you have to open a command shell and/or create objects with sp_oa* procedures. I'd strongly recommend against both. They're basically stepping outside of the realm where TSQL peforms consistently and well.

    You know that you can pass procedures a TEXT data type so that you're not limited by the 8K of varchar (or 4k of nvarchar)? That's how we do it now. If you have to load really huge files, instead of using OPENXML, look into XML Bulk load in BOL.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Could someone give me an example of how I might be able to import an XML Doc that has this structure? I currently use this structure when Exporting / Importing data from any table in my DB. It's simple and understandable. Sometimes, this file can be about 50MB though. How would that be possible using a Stored Proc?

    <DATASOURCE Name="Users">

    <ROW>

    <COLUMN Name="ID"><![CDATA[1]]></COLUMN>

    <COLUMN Name="FirstName"><![CDATA[George]]></COLUMN>

    <COLUMN Name="LastName"><![CDATA[Wilson]]></COLUMN>

    </ROW>

    <ROW>

    <COLUMN Name="ID"><![CDATA[2]]></COLUMN>

    <COLUMN Name="FirstName"><![CDATA[Martha]]></COLUMN>

    <COLUMN Name="LastName"><![CDATA[Wilson]]></COLUMN>

    </ROW>

    </DATASOURCE>

  • A followup to you your comment about


    "Generally, if we run into a situation where the XML is used in more than a single statement, we load the data into a temp table and close the document as soon as possible before doing more processing"


    It is  CRITICALLY important to do this. There is actually a bug in OpenXML that will cause you to get unexpected crashes when you use the OpenXML over and over.  It might not bite you every time, but I've had first hand experience with it.

    We had an application that we used OpenXml with in testing. The stored proceedure had been finished and had been working for 6 months when one day, the stored proccedure starte crashing for NO reason. - The XML was valid, and had worked the day before, and nothing had changed, but it simply started crashing. A consultant who was working with us, who had worked in the SQL group at Microsoft said that he had seen that before, and to do the OpenXML, get the data into table variables, and close the xml as quickly as possible. We ended up implimenting that change, which was fairly trivial and poof - the problem disappeared as quickly as it had appeared. I seem to recall that the root of the problem was a memory leak in the OpenXML code.

     

     

     

  • Yes, it is critically important to call sp_xml_removedocument. The statement in the article "This handle is valid for until the connection is reset, or until the execution of sp_xml_removedocument." is correct but does not mention the bug you ran into. If you do not call sp_xml_removedocument the handle is orphaned and will not be released. Overtime your handle count is going to increase and your server will eventually crash. (You can watch the handle count increase for sqlservr.exe by using the process tab of the task manager and adding the column for handles.)

    Therefore our standard is to read the xml document into a temp table and immediately call sp_xml_removedocument before there is a chance to orphan the handle.

     

  • Just to clarify:

    In our case we were calling sp_xml_removedocument EVERY time. The problem we ran into is not from the orphans, but from repeated selects from the xml. So, I agree that it's vital to call sp_xml_removedocument everytime, but it's also vital to only make a limited number of selects directly from the xml, which BOL doesn't emphasis enough. Also, the problem we ran into wasn't a server crash, just an unexpected failure of stored proccedure.

    Like you, our practice now is to make only one select against the xml - so you can get it into a temp table or table variable, then close it.

    I just wanted to mention this so others don't have to go thru the headache I went thru.

     

     

  • That's good to know. We were doing it for performance reasons, not because of failures in the code, but if we dodged a bullet, so much the better.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Tricky one...

    i m working on it ... wait for some time

    Are you in need of a generic stored proc. (to insert data in any table) ???

    If not, then solution is simple...

  • nice suggestion for using temp tables...

    Its better to use them if the XML files are large...

    If any exceptions occurs, temporary tables are not destroyed from the memory.

    This can be a big problem.

    Instead, one can use variable of TABLE Type.

    thnx.

  • Well, if you can make it for one table, I can probably modify the Proc to handle any table. Also, I can have the Proc auto-magically figure out the PKeys for the Table it's importing, so don't worry about that either.

  • Good job.

    But in practical scenario where we have a huge data, say 2,000,000 records to be inserted into the database, the OpenXML command doesnt seem to be satisfactory.We are facing this problem where we have used this OpenXML which consumes about 15secs for inserting 65,000 records in a single user environment.When it is concurrent users, the performance of the entire system is poor.

    We came across another component 'XML Bulk Load Component' for the same scenario.But wasnt able to proceed much on this.Would like to have comments from anyone who have knowledge on this.

    Thanks & Regards,

    Ramya

Viewing 15 posts - 1 through 15 (of 27 total)

You must be logged in to reply to this topic. Login to reply