Inserting a 800 million records from Staging database to EDW, how to totally disable logging

  • chris.stuart (4/19/2010)


    Jonathan Mallia (4/16/2010)


    if you have any indexes, disable or drop them, and recreate afterwards.

    Also, do the same from any triggers.

    Furthermore, set the database recovery mode to bulk logged and reset to full recovery after data load...

    Hope this helps 🙂

    I've found on 1 of my other tables with also about 900mil records that its actually way faster to disable and rebuild the indexes on the table. I left the query running last week to check on the insert of about 4 mil records without disabling the indexes and it took more than 24 hours, after which I canned the insert. With the indexes disabled, (In the script I disable the indexes and rebuild them when theres more than 50K records), it takes no more than 2 hours to bulk insert the data and to rebuild the 3 indexes on the table, 2 hours to 24hours+? Rather take the 2 hours

    And as for triggers, I stay as far as possible, away from them.

    I've not thought about changing the the recovery mode, but will give this a try aswell, I'm sure it will give a huge improvement as well.

    Thanks

    thx for reminding me about this feature

    going to try your method

  • Hi, I am new to SSIS. I need to pull data from EDW.

    can you please tell me which connection manger (OLEDB or ADO.NET ...)do we need to use to connect to EDW and how can I do that?

    Advanced Thanks ...

  • dawalker-1068762 (4/14/2010)


    I don't know if this is an option for you or not, but you can also use a "select into" and that is not logged. We do that with some of our larger warehouse tables and it is much faster than reloading the existing table. After the new table has been created we just drop the old one, rename the new one and recreate the indexes. You do seem to lose some of the table dependency info when you do it like that, but there is no doubt it is faster and gets around the logging issue. It may be different in the newer versions, but it was thay way in SQL2000. Another benefit to doing it that way is you minimize the amount of down time while the table reloads. Our table happens to be hit quite frequently around the clock so this prevents issues with locking and blocking as well.

    Don't want to burst your bubble but select into is still logged. It may be minimally logged, but it is still logged.

  • ksvr99 (7/25/2011)


    Hi, I am new to SSIS. I need to pull data from EDW.

    can you please tell me which connection manger (OLEDB or ADO.NET ...)do we need to use to connect to EDW and how can I do that?

    Advanced Thanks ...

    This is a 2 year old thread and also one which is not really applicable to what you wish to do. Please start a new thread on the appropriate forum.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • I also heard that SELECT INTO was a non-logged operation...

  • DVaughan 11378 (7/27/2011)


    I also heard that SELECT INTO was a non-logged operation...

    There are plenty of rumors / wrong thing being thoughts, this is one of them.

    It's minimally log, but it's still logged to keep the integrity of the db.

    Plz take this conversation on a new thread if you wish to continue.

  • chris.stuart (4/3/2009)


    I need to insert 800 million records (15 years worth of data) from my staging database to my final EDW database. The transform of the data in the table is straight forward and no problem. The problem starts with the amount of time that this query takes logging and database being offline or network problems.

    The insert query (insert into edw select * from staging) takes about 2 hours to insert a years worth of data, but its' happened before that something goes wrong and a roll back happens. Now a rollback takes even longer!

    I've had the recovery on simple all the while, but even this is limiting my style.

    What I would like to do is

    1) Set logging totally off.

    2) Even if a rollback would have happened; I would like to decide to run a sub-query to insert the missing data, or maybe delete the inserted data myself and run the query again. (I would think that my delete would be quicker than a rollback in any case.)

    3) Also the EDW table is partitioned but Staging is not, if that makes any difference.

    4) I would also like to set the logging db to maybe just a few gigs with the space being reused and records being committed as and when needed, but I keep on running into (The transaction log for database is full. To find out why space in the log cannot be reused, see the log_reuse_wait_desc column in sys.databases) error.

    I know that people will respond that the logging is needed, but the DBA does backups every day and any data that’s not in the EDW, can easily be re-loaded from the staging again. The EDW should just be a container to receive data, not something that also tries to keep track of what data is received.

    Any comments welcome

    BCP out the data to a data file and then BULK INSERT data to target table using option TABLOCK and a reasonable batchsize. (You need to determine the appropriate batch size depending on your environment). i have found this approach to be much faster.

    Amol Naik

  • I know this is an old post, but just want to add my two pence...

    Select into a a good option as it is minimum logged. Also break up your insert to align with target table partition which enable you to switch in the new partition after it is loaded and indexed. Then repeat the process for all other partitions.

Viewing 8 posts - 16 through 22 (of 22 total)

You must be logged in to reply to this topic. Login to reply