Using INSERT INTO…SELECT to Bulk Load Data with Minimal Logging
Posted Wednesday, May 26, 2010 9:17 AM


Please accept my apology if this post is in the wrong forum.

I'm trying to bulk load two tables with minimal logging. I started by setting the recovery model to Bulk-logged. Here's the pseudo-code for each table:

TRUNCATE TABLE table_name

INSERT INTO table_name WITH(TABLOCK)
(Column_Names)
SELECT
(Column_Names)
FROM OPENROWSET (
BULK 'data_file_path',
FORMATFILE = 'format_file_path') alias
WHERE condition
ORDER BY column_with_nonclustered_index
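
For reference, switching the recovery model (mentioned above) looks like this. The database name is a placeholder:

```sql
-- Switch to the BULK_LOGGED recovery model before the load
-- (database name is a placeholder)
ALTER DATABASE [MyDatabase] SET RECOVERY BULK_LOGGED;

-- ...run the bulk load here...

-- Switch back to FULL afterwards and take a log backup
ALTER DATABASE [MyDatabase] SET RECOVERY FULL;
```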

I back up the transaction log and shrink it down to 50 MB. After running the code for the two tables, the transaction log has grown to about 560 MB, which is about the same size as the primary data file (.mdf file).
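For anyone following along, the backup-and-shrink step I'm describing is roughly this sketch. The database, logical file, and path names are placeholders:

```sql
-- Back up the transaction log, then shrink the log file
-- (database, logical file name, and backup path are placeholders)
BACKUP LOG [MyDatabase] TO DISK = N'C:\Backups\MyDatabase_log.trn';
DBCC SHRINKFILE (N'MyDatabase_log', 50);  -- target size in MB
```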

Each table has exactly one index, which is non-clustered. (There are no primary keys, and no unique constraints.) I tried dropping the index before the insert statement, and then recreating the index after the insert statement as follows:

TRUNCATE TABLE table_name
DROP nonclustered_index

INSERT INTO table_name WITH(TABLOCK)
(Column_Names)
SELECT
(Column_Names)
FROM OPENROWSET (
BULK 'data_file_path',
FORMATFILE = 'format_file_path') alias
WHERE condition
ORDER BY column_with_nonclustered_index

CREATE nonclustered_index

Once again, I back up the transaction log and shrink it down to 50 MB. After running the code for the two tables, the transaction log stays at 50 MB.
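A quick way to watch the log size during these tests:

```sql
-- Reports current log file size and percent used for every database
DBCC SQLPERF(LOGSPACE);
```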

In my mind, the second scenario achieved my goal of bulk loading the two tables with minimal logging. But I don't know why I had to drop and recreate the indexes, and I'd prefer not to. This documentation got me headed in the right direction:

http://msdn.microsoft.com/en-us/library/ms190422.aspx

But the question remains: If the tables are truncated before the bulk load, why do I have to drop/create indexes? (I am using SQL 2005, and the documentation is for SQL 2008.)

What am I missing?


Dave Mason
Orange County, FL
Post #928372
Posted Wednesday, May 26, 2010 2:46 PM


Followup...

I'm really reluctant to drop and recreate indexes. Suppose another DBA or developer changes the index (adds or removes columns, changes the FILLFACTOR, etc.). My code would nullify those changes because I hard-coded the steps to create the index. Not good!

A better option is to disable the index and rebuild it later. (Thanks to Andy Warren for his article about Disabling Indexes!)

Here's my revised pseudo-code:

TRUNCATE TABLE table_name
ALTER INDEX nonclustered_index ON table_name DISABLE

INSERT INTO table_name WITH(TABLOCK)
(Column_Names)
SELECT
(Column_Names)
FROM OPENROWSET (
BULK 'data_file_path',
FORMATFILE = 'format_file_path') alias
WHERE condition
ORDER BY column_with_nonclustered_index

ALTER INDEX nonclustered_index ON table_name REBUILD

If I get really motivated, I can programmatically determine the non-clustered indexes and disable/rebuild them.
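Something like this sketch could do it, by generating the DISABLE statements from sys.indexes (the table name is a placeholder):

```sql
-- Build and run a DISABLE statement for every nonclustered index on the table
-- (table name is a placeholder)
DECLARE @sql nvarchar(max);
SET @sql = N'';

SELECT @sql = @sql + N'ALTER INDEX ' + QUOTENAME(i.name) +
              N' ON ' + QUOTENAME(OBJECT_SCHEMA_NAME(i.object_id)) +
              N'.' + QUOTENAME(OBJECT_NAME(i.object_id)) + N' DISABLE;'
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID(N'dbo.table_name')
  AND i.type_desc = N'NONCLUSTERED';

EXEC sp_executesql @sql;

-- ...bulk load here, then run the same loop with REBUILD instead of DISABLE
```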

Still, after reading the documentation on MSDN, I'm confused as to why any of this matters when the tables are truncated first...


Dave Mason
Orange County, FL
Post #928622
Posted Sunday, August 29, 2010 3:46 PM
Dave Mason (5/26/2010)

Still, after reading the documentation on MSDN, I'm confused as to why any of this matters when the tables are truncated first...


It's been a while, but in case you haven't figured it out: the reason the log file stays so much smaller when you either DROP or DISABLE the non-clustered indexes is that if the indexes are active while you are loading data into the table, SQL Server has to log all of the updates to the indexes. As each record is inserted, it has to log the page changes and page splits. When you DROP or DISABLE the index first, and then CREATE or REBUILD it after the data is in, you are essentially creating a brand-new index.

The two main beneficial side effects of doing this are:
* The fragmentation in your indexes will be much lower, making for better performance.
* It is normally faster to insert the data and then index. (It is faster to create the index all at once than to modify it row by row to keep it up to date.)
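
To see the fragmentation difference, a check like this can be run before and after the load (the table name is a placeholder):

```sql
-- Report fragmentation for each index on the table
SELECT i.name, s.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.table_name'), NULL, NULL, 'LIMITED') AS s
JOIN sys.indexes AS i
  ON i.object_id = s.object_id
 AND i.index_id  = s.index_id;
```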

I hope that helps explain things.
Post #977118