I need to insert 800 million records (15 years worth of data) from my staging database to my final EDW database. The transform of the data in the table is straight forward and no problem. The problem starts with the amount of time that this query takes logging and database being offline or network problems.
The insert query (insert into edw select * from staging) takes about 2 hours to insert a years worth of data, but its' happened before that something goes wrong and a roll back happens. Now a rollback takes even longer!
I've had the recovery on simple all the while, but even this is limiting my style.
What I would like to do is
1) Set logging totally off.
You can't. It's inherent to operations within SQL Server. What you could do is to use SSIS or SQLCMD to perform a bulk insert operation. That would be minimally logged, reducing the overhead. But a straight INSERT... SELECT is going to be fully logged, no options.
2) Even if a rollback would have happened; I would like to decide to run a sub-query to insert the missing data, or maybe delete the inserted data myself and run the query again. (I would think that my delete would be quicker than a rollback in any case.)
Again, no real options here. You could break down the inserts so that there is a series of them and each, by itself, will cause less logging. It's a valid technique, but it will extend the time it takes to do this operation, not reduce it.
3) Also the EDW table is partitioned but Staging is not, if that makes any difference.
Not in terms of logging it doesn't. It might make a huge difference in terms of overall performance.
4) I would also like to set the logging db to maybe just a few gigs with the space being reused and records being committed as and when needed, but I keep on running into (The transaction log for database is full. To find out why space in the log cannot be reused, see the log_reuse_wait_desc column in sys.databases) error.
I know that people will respond that the logging is needed, but the DBA does backups every day and any data that’s not in the EDW, can easily be re-loaded from the staging again. The EDW should just be a container to receive data, not something that also tries to keep track of what data is received.
Any comments welcome
There's just no getting away from the log. Look up the various methods of using bulk operations.
The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood...
Theodore RooseveltThe Scary DBA
Author of: SQL Server Query Performance Tuning
and SQL Server Execution Plans
Product Evangelist for Red Gate Software