Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Inserting a 800 million records from Staging database to EDW, how to totally disable logging


Inserting a 800 million records from Staging database to EDW, how to totally disable logging

Author
Message
chris.stuart
chris.stuart
SSC Veteran
SSC Veteran (214 reputation)SSC Veteran (214 reputation)SSC Veteran (214 reputation)SSC Veteran (214 reputation)SSC Veteran (214 reputation)SSC Veteran (214 reputation)SSC Veteran (214 reputation)SSC Veteran (214 reputation)

Group: General Forum Members
Points: 214 Visits: 776
I need to insert 800 million records (15 years worth of data) from my staging database to my final EDW database. The transform of the data in the table is straight forward and no problem. The problem starts with the amount of time that this query takes logging and database being offline or network problems.
The insert query (insert into edw select * from staging) takes about 2 hours to insert a years worth of data, but its' happened before that something goes wrong and a roll back happens. Now a rollback takes even longer!
I've had the recovery on simple all the while, but even this is limiting my style.
What I would like to do is
1) Set logging totally off.
2) Even if a rollback would have happened; I would like to decide to run a sub-query to insert the missing data, or maybe delete the inserted data myself and run the query again. (I would think that my delete would be quicker than a rollback in any case.)
3) Also the EDW table is partitioned but Staging is not, if that makes any difference.
4) I would also like to set the logging db to maybe just a few gigs with the space being reused and records being committed as and when needed, but I keep on running into (The transaction log for database is full. To find out why space in the log cannot be reused, see the log_reuse_wait_desc column in sys.databases) error.
I know that people will respond that the logging is needed, but the DBA does backups every day and any data that’s not in the EDW, can easily be re-loaded from the staging again. The EDW should just be a container to receive data, not something that also tries to keep track of what data is received.

Any comments welcome
Grant Fritchey
Grant Fritchey
SSCoach
SSCoach (17K reputation)SSCoach (17K reputation)SSCoach (17K reputation)SSCoach (17K reputation)SSCoach (17K reputation)SSCoach (17K reputation)SSCoach (17K reputation)SSCoach (17K reputation)

Group: General Forum Members
Points: 17586 Visits: 32265
chris.stuart (4/3/2009)
I need to insert 800 million records (15 years worth of data) from my staging database to my final EDW database. The transform of the data in the table is straight forward and no problem. The problem starts with the amount of time that this query takes logging and database being offline or network problems.
The insert query (insert into edw select * from staging) takes about 2 hours to insert a years worth of data, but its' happened before that something goes wrong and a roll back happens. Now a rollback takes even longer!
I've had the recovery on simple all the while, but even this is limiting my style.
What I would like to do is
1) Set logging totally off.

You can't. It's inherent to operations within SQL Server. What you could do is to use SSIS or SQLCMD to perform a bulk insert operation. That would be minimally logged, reducing the overhead. But a straight INSERT... SELECT is going to be fully logged, no options.

2) Even if a rollback would have happened; I would like to decide to run a sub-query to insert the missing data, or maybe delete the inserted data myself and run the query again. (I would think that my delete would be quicker than a rollback in any case.)

Again, no real options here. You could break down the inserts so that there is a series of them and each, by itself, will cause less logging. It's a valid technique, but it will extend the time it takes to do this operation, not reduce it.

3) Also the EDW table is partitioned but Staging is not, if that makes any difference.

Not in terms of logging it doesn't. It might make a huge difference in terms of overall performance.

4) I would also like to set the logging db to maybe just a few gigs with the space being reused and records being committed as and when needed, but I keep on running into (The transaction log for database is full. To find out why space in the log cannot be reused, see the log_reuse_wait_desc column in sys.databases) error.
I know that people will respond that the logging is needed, but the DBA does backups every day and any data that’s not in the EDW, can easily be re-loaded from the staging again. The EDW should just be a container to receive data, not something that also tries to keep track of what data is received.

Any comments welcome


There's just no getting away from the log. Look up the various methods of using bulk operations.

----------------------------------------------------
The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood...
Theodore Roosevelt

The Scary DBA
Author of: SQL Server Query Performance Tuning and SQL Server Execution Plans
Product Evangelist for Red Gate Software
chris.stuart
chris.stuart
SSC Veteran
SSC Veteran (214 reputation)SSC Veteran (214 reputation)SSC Veteran (214 reputation)SSC Veteran (214 reputation)SSC Veteran (214 reputation)SSC Veteran (214 reputation)SSC Veteran (214 reputation)SSC Veteran (214 reputation)

Group: General Forum Members
Points: 214 Visits: 776
Hi Grant

Thanks for the reply. I think that you hit the sanil on the head with this 1. I knew if I just explain it, someone will have someone will find what I'm missing actually.
I'm actually using SSIS to populate from a flat file to the staging, and going from staging to EDW is also SSIS, with a SQL statement file to do the load into the EDW.
I'll rather then slit up the SQL statement file into smaller pieces or SSIS components, and use the SSIS data flow's OLE Component to control my Rows per batch and Max insert commit size.
that will sort out all of my problems with the added bonus that I can actually see how many records have been inserted (on Data Flow Path).
I was feeling a bit uneasy about the SQL statement file in the SSIS, and now I know why.
And thanks for the other options and suggestions as well.

Regards
Chris
Grant Fritchey
Grant Fritchey
SSCoach
SSCoach (17K reputation)SSCoach (17K reputation)SSCoach (17K reputation)SSCoach (17K reputation)SSCoach (17K reputation)SSCoach (17K reputation)SSCoach (17K reputation)SSCoach (17K reputation)

Group: General Forum Members
Points: 17586 Visits: 32265
Cool. Glad I could help a little.

----------------------------------------------------
The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood...
Theodore Roosevelt

The Scary DBA
Author of: SQL Server Query Performance Tuning and SQL Server Execution Plans
Product Evangelist for Red Gate Software
alen teplitsky
alen teplitsky
SSCommitted
SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)

Group: General Forum Members
Points: 1595 Visits: 4621
have you tried bulk logging? and make sure auto create and update statistics is off during the load. i've done replication snapshots of 300 million row tables and turning off the statistics creation and updates speeds things up
chris.stuart
chris.stuart
SSC Veteran
SSC Veteran (214 reputation)SSC Veteran (214 reputation)SSC Veteran (214 reputation)SSC Veteran (214 reputation)SSC Veteran (214 reputation)SSC Veteran (214 reputation)SSC Veteran (214 reputation)SSC Veteran (214 reputation)

Group: General Forum Members
Points: 214 Visits: 776
Hi I see that my auto stats was on, I've switch it off now and will see how that goes. Thanks
Dean.chetty
Dean.chetty
Forum Newbie
Forum Newbie (1 reputation)Forum Newbie (1 reputation)Forum Newbie (1 reputation)Forum Newbie (1 reputation)Forum Newbie (1 reputation)Forum Newbie (1 reputation)Forum Newbie (1 reputation)Forum Newbie (1 reputation)

Group: General Forum Members
Points: 1 Visits: 13
Hi,
Not too sure if switching off the "auto stats" solved your problem.

One of the techniques that we have found useful is to batch inserts, updates and deletes.

Dean
PaulB-TheOneAndOnly
PaulB-TheOneAndOnly
Hall of Fame
Hall of Fame (3.1K reputation)Hall of Fame (3.1K reputation)Hall of Fame (3.1K reputation)Hall of Fame (3.1K reputation)Hall of Fame (3.1K reputation)Hall of Fame (3.1K reputation)Hall of Fame (3.1K reputation)Hall of Fame (3.1K reputation)

Group: General Forum Members
Points: 3093 Visits: 4639
chris.stuart (4/3/2009)
4) I would also like to set the logging db to maybe just a few gigs with the space being reused and records being committed as and when needed, but I keep on running into (The transaction log for database is full. To find out why space in the log cannot be reused, see the log_reuse_wait_desc column in sys.databases) error.


lots of questions and I suffer from ADD syndrome so... let me answer the last one :-D

Use small transaction size a.k.a. run your load in small batches.

_____________________________________
Pablo (Paul) Berzukov

Author of Understanding Database Administration available at Amazon and other bookstores.

Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
Alvin Ramard
Alvin Ramard
SSCrazy
SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)

Group: General Forum Members
Points: 2676 Visits: 11590
PaulB-TheOneAndOnly (4/14/2010)
chris.stuart (4/3/2009)
4) I would also like to set the logging db to maybe just a few gigs with the space being reused and records being committed as and when needed, but I keep on running into (The transaction log for database is full. To find out why space in the log cannot be reused, see the log_reuse_wait_desc column in sys.databases) error.


lots of questions and I suffer from ADD syndrome so... let me answer the last one :-D

Use small transaction size a.k.a. run your load in small batches.


I certainly would recommend small batches.



Alvin Ramard
Memphis PASS Chapter

All my SSC forum answers come with a money back guarantee. If you didn't like the answer then I'll gladly refund what you paid for it.

For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help
dawalker-1068762
dawalker-1068762
Grasshopper
Grasshopper (15 reputation)Grasshopper (15 reputation)Grasshopper (15 reputation)Grasshopper (15 reputation)Grasshopper (15 reputation)Grasshopper (15 reputation)Grasshopper (15 reputation)Grasshopper (15 reputation)

Group: General Forum Members
Points: 15 Visits: 106
I don't know if this is an option for you or not, but you can also use a "select into" and that is not logged. We do that with some of our larger warehouse tables and it is much faster than reloading the existing table. After the new table has been created we just drop the old one, rename the new one and recreate the indexes. You do seem to lose some of the table dependency info when you do it like that, but there is no doubt it is faster and gets around the logging issue. It may be different in the newer versions, but it was thay way in SQL2000. Another benefit to doing it that way is you minimize the amount of down time while the table reloads. Our table happens to be hit quite frequently around the clock so this prevents issues with locking and blocking as well.
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search