Extremely Large Data Load

  • We have an ASP.NET app that generates 12 million rows of data that need to be inserted into a single table. The app does a calculation that takes about 20 minutes to run, but the insert takes the better part of 3 hours. I've tried several approaches, none of which has been suitable.

    First I had the developer pass me the data in XML format, 200,000 rows at a time (larger datasets failed on the web server where the app is running). This took roughly 3.5 hours to load.

    Next I had the developer pass me the data row by row (I wasn't too concerned with contention here; there are only two users for this app). This took roughly 3 hours to load.

    I want to try using BCP now. Is there a way to load via BCP directly into the table from a file in byte format? Can you BCP directly from an SMO object and a byte stream?

    Does anyone have any other suggestions on how to handle this? I thought about horizontally partitioning the data and loading it via multiple threads into several different tables, but that approach causes performance issues on the web server.

    TIA,

    MKulangara


    Mathew J Kulangara
    sqladventures.blogspot.com

  • Avoid XML like the plague.

    If you are using BCP, there is a command-line parameter that lets you specify the batch size for each transaction. I tend to set it to 100,000 records at a time; a sample command line is below.
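    For example, a bcp invocation of that shape (server, database, table, and file names here are placeholders) might look like:

    bcp MyDb.dbo.BigTable in C:\load\bigtable.dat -S MyServer -T -c -b 100000

    -T uses a trusted connection, -c reads a character-format file, and -b commits every 100,000 rows so a failure does not roll back all 12 million.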

    You could use the DTS Wizard to do the import if you are using SQL2000, or SSIS in SQL2005.

    Make sure your database is large enough to take a 12 million row insert without triggering a file expansion.
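    For instance, you could pre-grow the files before the load (the logical file names here are assumptions; check sysfiles / sys.database_files for yours):

    ALTER DATABASE MyDb MODIFY FILE (NAME = MyDb_Data, SIZE = 20480MB);
    ALTER DATABASE MyDb MODIFY FILE (NAME = MyDb_Log, SIZE = 4096MB);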

    Consider changing your database recovery model to BULK_LOGGED.
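    Something like this, assuming you run FULL recovery today and switch back (and take a log backup) once the load finishes:

    ALTER DATABASE MyDb SET RECOVERY BULK_LOGGED;
    -- ... run the bulk load here ...
    ALTER DATABASE MyDb SET RECOVERY FULL;
    BACKUP LOG MyDb TO DISK = 'C:\backup\MyDb_log.trn';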

    Drop any unnecessary indexes and constraints prior to the INSERT; you can always put them back on later.
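    A sketch, with hypothetical index and table names:

    DROP INDEX BigTable.IX_BigTable_Foo;            -- SQL 2000 syntax
    -- DROP INDEX IX_BigTable_Foo ON dbo.BigTable;  -- SQL 2005 syntax
    -- ... bulk load here ...
    CREATE INDEX IX_BigTable_Foo ON dbo.BigTable (Foo);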

    Make sure your input file is on the SQL Server box and not read across the network.

    12 million rows should take minutes, not hours.

  • What type of hardware is the database server on?

    Regardless of the approach, if the hardware cannot keep up it's going to be slow.

    12 million rows will only take minutes if you have the hardware to support it. We have some very high-end hardware here and I could probably do this in about a minute. At my last company this would have taken at least an hour because of disk contention.

    Is this a dedicated SQL Server? How much other activity is going on at the same time?

    How many drives are the database and log files spread across, and do they share the same drives?

    Have you looked at Performance Monitor to see how big the disk write queue is?

    I do agree with the suggestions above, but again it can only go as fast as the hardware, so I would check that out before you spend too much more time trying to tune this process.

  • I would definitely look into BCP (almost exclusively THE way to go when handling big data loads) and the use of partitioning (so you ALWAYS load into an empty 'partition', THEN you SWITCH it in).

    I guess one of the things that slows you down is the indexes defined on the table (with partitioning you could easily smack them on after the data load has happened).

    Without partitioning: if the data to be inserted is small (less than 20-30% of the existing data) you might not gain anything by removing the indexes before the data load and recreating them after.

    BCP:

    ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/sqlcmpt9/html/c0af54f5-ca4a-4995-a3a4-0ce39c30ec38.htm

    SWITCH:

    ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/tsqlref9/html/f1745145-182d-4301-a334-18f799d361d1.htm
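    A rough sketch of the pattern on SQL 2005 (table names, partition number, and delimiters are assumptions; the staging table must live on the same filegroup as the target partition, match the target's schema and indexes, and carry a check constraint matching the partition boundary):

    -- Load into an empty staging table that no one else touches
    BULK INSERT dbo.BigTable_Staging FROM 'C:\load\bigtable.txt'
        WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n', TABLOCK);

    -- Smack the indexes on after the load, to match the target table
    CREATE INDEX IX_BigTable_Foo ON dbo.BigTable_Staging (Foo);

    -- Metadata-only switch of the loaded rows into the partitioned table
    ALTER TABLE dbo.BigTable_Staging SWITCH TO dbo.BigTable PARTITION 1;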

    Regards,

    Hanslindgren

  • Thanks for the responses. Using BULK INSERT, I was able to reduce the insert time to about 4.5 minutes.

    Currently the software architecture is as follows:

    Calculations in the .NET front end: 10 minutes

    Writing the text file for the bulk insert to a network location: 50 minutes

    Bulk insert into the DB: 4.5 minutes (tried BCP as well, which ran about 4.8 minutes; a BULK INSERT of this shape is sketched below)
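    For reference, a BULK INSERT of roughly that shape (table name, file path, and delimiters are assumptions, not the exact statement):

    BULK INSERT dbo.BigTable FROM '\\webserver\share\bigtable.txt'
        WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n',
              BATCHSIZE = 100000, TABLOCK);

    TABLOCK takes a bulk-update lock on the table, which is one of the requirements for the load to be minimally logged under BULK_LOGGED recovery.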

    My question is:

    1. Is it possible to redirect a byte stream to BCP or BULK INSERT in some way using SMO? That way I don't have to have the developer create an actual file.

    2. Can BCP or BULK INSERT accept a file in binary format?

    TIA,

    Matt Kulangara


    Mathew J Kulangara
    sqladventures.blogspot.com

  • 1. I do not know if you can send a stream to either BCP or BULK INSERT, but it sounds probable that you could do it some way. Though saving the file for some time could actually be valuable for backup and for the ability to verify what is going on in the 'pipeline'.

    2. Yes, both can accept binary files, but that depends on what you mean. They can only accept native files as binary (the format BCP/BULK INSERT uses when it outputs native files). I am not sure how easy it would be to find those specifications.
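    A native-format round trip looks like this (server, database, and table names are placeholders):

    bcp MyDb.dbo.BigTable out C:\load\bigtable.dat -S MyServer -T -n
    bcp MyDb.dbo.BigTable in C:\load\bigtable.dat -S MyServer -T -n -b 100000

    The -n switch writes and reads bcp's own binary (native) format, which skips character conversion and is generally faster than character mode (-c).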

    Regards,

    Hanslindgren

    P.S. I would look into having one additional step: compressing the text file while sending it over the network, possibly through a .NET connection which compresses before the send and decompresses as it reaches the server. THEN do your BULK INSERT or BCP...
