October 18, 2006 at 3:00 am
Hi all, my problem is this.
I have a huge xml feed that I get updates on every minute. Since it is complex enough I chose to insert it in SQL server 2005 relational tables before displaying any sort of data on my website.
The data in the XML are stored as different nodes representing different objects and they need proccesing before insert, so I cannot use bulk insert to do the job.
I also found out that using stored procedures is slower  than gathering 20-40 queries in a string and issuing all inserts at once.
 than gathering 20-40 queries in a string and issuing all inserts at once.
Do you thing I'm going the right way? Is there anything else that i can use to make things quicker?
October 18, 2006 at 10:23 pm
I actually built a solution similar to what you're describing using Bulk Load from the SQLXML. I'm actually processing large FIXML (Financial Information eXchange) files with size in the 50+ MB. Here are the steps to follow:
I have oversimplify this process. It is kind of complex but very, very fast. You can find the NXSLT2 utility and SQLXML by googleing these terms. Once you install the SQLXML, you will find plenty samples in the documentation. If you know Spanish, I wrote an article about this process which you can find at http://syoungdesigns.blogdns.com/
[font="Verdana"]Sal Young[/font]
[font="Verdana"]MCITP Database Administrator[/font]
October 19, 2006 at 3:48 am
Unfortunately I don't know spanish but thanks for your reply.
I' ll give it a try.
Thanks again
October 19, 2006 at 1:49 pm
I have found that a much faster way of parsing large amounts xml for insertion into a table is to use XmlReader. XmlReader gives you very fast, forward only read access to an xml blob. As you read through the xml, create a csv file that you can then bulk load into sql server with bcp.exe (or BULK INSERT/DTS/SSIS etc).
Using this technique I parsed 1TB of xml into relational tables in sql in 15hrs (about 1gb/min).
If you use the xslt approach, you suffer from having to bring the entire xml file into memory in order to transform it. It's much more efficient to use XmlReader if the transform is relatively simple (and xml to csv is very simple).
---------------------------------------
elsasoft.org
October 20, 2006 at 2:04 am
jezemine I think I would try your solution, too.
Actually what I do right know is to read the Xml with XmlReader and create batches of SQL insert statements. This used to work pretty good for my needs (100Mb+ xml file with 200000 inserts) in a shared hosting environment but it seems that the server is getting crowded cause I'm noticing worse and worse performance. This is why I'm searching for an other solution.
I' m just wondering though what sort of hardware do you use to achieve 1gb/min. It just seems too good.
Anyway your solution is closer to what I have already done so it is worth a try...
Thank you
October 22, 2006 at 8:38 am
To get the 1gb/min, I ran the parser app on one server (dual proc hyper threaded, 8gig ram) and SQL Server on another (same specs). Each server has 1gbit NICs, so that helped a lot with sending so much data 
The reason your method slow is because you are generating INSERT statements, instead of bulk load files. I think you'll find that the inserts are your bottleneck, not the XmlReader. If you switch to using bulk insert, it will be much faster.
To make bulk inserts fast, drop all indexes on the table you are inserting to, and use the TABLOCK hint. also you should use the -b flag on bcp.exe so that you don't insert the whole file as one big transaction (this could blow your transaction log if there are millions of rows in the file). Generally i use -b100000.
More tips here: http://www.databasejournal.com/features/mssql/article.php/3095511
Finally, in my case I had one process parsing the xml and creating the csv's, and another process bcp'ing the csvs into sql server.
---------------------------------------
elsasoft.org
Viewing 6 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply