XML dump of major dataset

  • The company I work for wants to dump a significant part of our data set to an XML file once a day. Right now I have written a quick and dirty VBScript that is scheduled to run, but it is proving to be seriously process intensive, i.e. it pegs the server the script lives on at a constant 100% CPU for 3 - 4 mins at a time, with the whole script taking about 12 mins to complete. And that situation can only get worse at the rate our data set is expanding.

    I need to figure out a better way of dumping this data to the XML file. Here are the issues I have with it currently:

    1) 3 different databases are involved in generating it

    2) 11 different tables contain the data (5 holding the primary content, the rest only some related information)

    3) Although there is some common basic information between the 5 tables, a significant number of the required columns have nothing in common.

    4) If it weren't for the fact that one of the main requirements is that it is a single XML file, I would simply use something like Web Assistant to build individual XML files for each of the master tables

    5) The master tables have an ever-expanding data set, ranging from 100s of rows to 1000s!

    Now I have been doing some research into what options that leaves me with, and my thoughts fall into 2 approaches:

    1) Use some sort of advanced Web Assistant task to do the job. It's something I know very little about, but from the day or 2 of research I've done it doesn't look too difficult to get to grips with.

    2) Write a Windows service whose resource usage I can then prioritise. This looks like the more complex and time-consuming approach and so probably won't wash with my boss!

    Just trying to reach out and see if there are any other options or suggestions out there.

    Square-eyed-over-too-much-XML

  • I had a similar kind of situation, so I thought I'd share; maybe this gives you some ideas:

    I received a very large XML file from one of my company's business partners. What I needed to do with that XML file was load some data from it. Because I didn't need all the data from this huge file, I ended up finding two approaches:

    1. Used a third-party ODBC driver for XML in SQL DTS to load the required data directly from the XML into my tables. This gave very efficient results with respect to time and resources.
    2. The other option I used for this problem was transforming the huge XML into a new XML file with only the required elements, leaving the useless data behind (a rough sketch follows). Whether this option works depends on the situation; in mine it was the better one.
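
    Purely as an illustration of that second option, here is a rough sketch using MSXML and an XSLT stylesheet (the paths, and the stylesheet itself, are made up and would need to match your feed):

        ' Rough sketch: strip a large XML file down to just the elements
        ' you need via an XSLT stylesheet. All paths here are placeholders.
        Option Explicit

        Dim src, xsl, trimmed
        Set src = CreateObject("MSXML2.DOMDocument")
        Set xsl = CreateObject("MSXML2.DOMDocument")
        Set trimmed = CreateObject("MSXML2.DOMDocument")
        src.async = False
        xsl.async = False

        If src.load("C:\feeds\partner_full.xml") And xsl.load("C:\feeds\keep_required.xslt") Then
            ' The stylesheet copies only the required elements and drops the rest.
            src.transformNodeToObject xsl, trimmed
            trimmed.save "C:\feeds\partner_trimmed.xml"
        End If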

    If you need to know more about this, or about the third-party ODBC driver, you can email me, as I don't know if we can post third-party application names here.

     

  • how does your 'quick and dirty' vbscript perform the operation?

    you could use a 'for xml' query to get xml data out into an ado stream and write it to a file. you could also open the recordset directly and perform a .save with the adpersistxml flag set. or you could use a shaped recordset and save that with .save as well if it's more complex. if i have to save a chunk of xml data i normally use the stream method if i'm using vbscript, but the rs.save method is pretty fast. i just normally end up having to re-style the output for it to be useful for my needs.
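
    for what it's worth, a minimal sketch of the stream method, assuming the sqloledb provider and with made-up server, database, table and column names:

        ' Sketch only: dump one table as XML via FOR XML and an ADO stream.
        ' Connection string, table and column names are placeholders.
        Option Explicit

        Const adExecuteStream = 1024
        Const adSaveCreateOverWrite = 2
        Const adPersistXML = 1

        Dim conn, cmd, stm
        Set conn = CreateObject("ADODB.Connection")
        conn.Open "Provider=SQLOLEDB;Data Source=MyServer;" & _
                  "Initial Catalog=MyDatabase;Integrated Security=SSPI;"

        ' The stream receives the XML text the server produces.
        Set stm = CreateObject("ADODB.Stream")
        stm.Open

        Set cmd = CreateObject("ADODB.Command")
        Set cmd.ActiveConnection = conn
        cmd.CommandText = "SELECT Col1, Col2 FROM dbo.MasterTable1 FOR XML AUTO, ELEMENTS"
        cmd.Properties("Output Stream").Value = stm
        cmd.Properties("XML Root").Value = "Export"   ' wrap the fragments in a single root
        cmd.Execute , , adExecuteStream

        stm.SaveToFile "C:\exports\master1.xml", adSaveCreateOverWrite
        stm.Close

        ' Alternative: rs.Save in ADO's own XML format (note it will not
        ' overwrite an existing file, so delete the old one first).
        ' Dim rs
        ' Set rs = conn.Execute("SELECT Col1, Col2 FROM dbo.MasterTable1")
        ' rs.Save "C:\exports\master1_ado.xml", adPersistXML

        conn.Close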

  • or a possibly faster route may be to create a dts package that performs all your dumps in parallel to separate files, then have a script merge them all together into one file. that would make the dump faster, but you would still have to wait on the merge.
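
    a rough sketch of that merge step, assuming the dts package has already written the per-table files (all names below are placeholders):

        ' Sketch only: merge several per-table XML dumps under one root.
        Option Explicit

        Dim files, merged, part, node, i
        files = Array("C:\exports\master1.xml", "C:\exports\master2.xml", _
                      "C:\exports\master3.xml", "C:\exports\master4.xml", _
                      "C:\exports\master5.xml")

        Set merged = CreateObject("MSXML2.DOMDocument")
        merged.async = False
        merged.loadXML "<CombinedExport/>"

        For i = 0 To UBound(files)
            Set part = CreateObject("MSXML2.DOMDocument")
            part.async = False
            If part.load(files(i)) Then
                ' Copy every child of the partial file's root into the combined root.
                For Each node In part.documentElement.childNodes
                    merged.documentElement.appendChild node.cloneNode(True)
                Next
            End If
        Next

        merged.save "C:\exports\combined.xml"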

  • Well, a couple of questions come to mind.

    1) Is the XML you are generating only new data or is it basically a copy of all the data in the tables?

    2) How often does the data you are retrieving change?

    3) Do you have a substantially large number of rows that need to be retrieved that may not have been changed?

    What you could do is implement a strategy where you only retrieve the rows that have changed since the last time you generated the XML file, then append to and merge with the existing XML file. This would only be helpful, though, if you have a lot of records that are not changing from day to day. To implement it you could just create triggers that insert into some other audit table, and then use that audit table (or tables) as the basis for your export.

    Again - this only helps if you have a substantially large number of records AND a large part of them are not changing AND you need basically the whole table.
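
    Purely as an illustration (the audit table, master table, column names and paths below are all hypothetical), the export side of that idea might look something like this; the resulting delta file would still need merging into the previous full export:

        ' Hypothetical sketch: export only the rows the triggers have logged
        ' as changed in the last day, then save them as a small delta file.
        Option Explicit

        Const adPersistXML = 1

        Dim conn, rs, fso
        Set fso = CreateObject("Scripting.FileSystemObject")
        If fso.FileExists("C:\exports\master1_delta.xml") Then
            fso.DeleteFile "C:\exports\master1_delta.xml"   ' rs.Save will not overwrite
        End If

        Set conn = CreateObject("ADODB.Connection")
        conn.Open "Provider=SQLOLEDB;Data Source=MyServer;" & _
                  "Initial Catalog=MyDatabase;Integrated Security=SSPI;"

        ' Only rows with a recent entry in the (hypothetical) audit table are pulled.
        Set rs = conn.Execute( _
            "SELECT m.* FROM dbo.MasterTable1 m " & _
            "WHERE EXISTS (SELECT 1 FROM dbo.AuditChanges a " & _
            "              WHERE a.RowID = m.ID " & _
            "              AND a.ChangedOn > DATEADD(day, -1, GETDATE()))")

        rs.Save "C:\exports\master1_delta.xml", adPersistXML

        rs.Close
        conn.Close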

    Cheers

    TNS

    SQL noob

  • The quick and dirty VBScript uses MSDataShape to make 5 separate calls to the DB - one for each major table. It then merges the recordsets into one massive VB array, which it works through, building sections of the file and dumping them to disk.

    I did consider doing FOR XML, but as the field names for a lot of the data don't match up, it is very tricky to build the correct XML file that way.

  • 1) Basically a copy of the data in the tables

    2) There are always some changes to some of the rows on a daily basis, be it new rows or just updates to existing rows

    3) Possibly. We do have a last-edited field, but working out how to filter on it when building the file would probably be just as resource intensive as simply redoing the whole file each time. Admittedly it might not have to run as often.

    There are approx. 15,000 long rows of data that get retrieved.

  • How about creating views on the tables you need, including only the fields you need and naming them with the required names - then using a FOR XML query on the views.
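
    For illustration only (the view, table and column names below are made up), the one-off setup might look like this, after which the daily script just runs a plain FOR XML query against the view:

        ' Sketch: a view that renames columns to the element names the feed
        ' expects, so FOR XML produces the right tags without any re-mapping.
        Option Explicit

        Dim conn
        Set conn = CreateObject("ADODB.Connection")
        conn.Open "Provider=SQLOLEDB;Data Source=MyServer;" & _
                  "Initial Catalog=MyDatabase;Integrated Security=SSPI;"

        ' One-time setup (could just as easily be run in Query Analyzer).
        conn.Execute _
            "CREATE VIEW dbo.vw_Master1Export AS " & _
            "SELECT ProductRef   AS ItemCode, " & _
            "       Description1 AS ItemName, " & _
            "       LastEdited   AS LastUpdated " & _
            "FROM dbo.MasterTable1"

        ' The export script can then use, for example:
        '   SELECT * FROM dbo.vw_Master1Export FOR XML AUTO, ELEMENTS
        ' through the same output-stream approach shown earlier in the thread.
        conn.Close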


    -Ken
