Inserting 800 million records from Staging database to EDW: how to totally disable logging
Posted Friday, April 3, 2009 4:42 AM
chris.stuart (SSC Veteran)
I need to insert 800 million records (15 years' worth of data) from my staging database into my final EDW database. The transform of the data in the table is straightforward and not a problem. The problem is the amount of time this query spends logging, plus the risk of the database going offline or of network problems mid-load.
The insert query (INSERT INTO edw SELECT * FROM staging) takes about 2 hours to insert a year's worth of data, but it's happened before that something goes wrong and a rollback kicks in. A rollback takes even longer!
I've had the recovery model on SIMPLE all the while, but even this is limiting my style.
What I would like to do is:
1) Set logging totally off.
2) Even if a rollback were to happen, I would like to decide whether to run a sub-query to insert the missing data, or to delete the inserted data myself and run the query again. (I would think that my delete would be quicker than a rollback in any case.)
3) Also, the EDW table is partitioned but Staging is not, if that makes any difference.
4) I would also like to cap the transaction log at maybe just a few gigs, with the space being reused and records committed as and when needed, but I keep running into the error "The transaction log for database is full. To find out why space in the log cannot be reused, see the log_reuse_wait_desc column in sys.databases" (see the diagnostic sketch below).
I know that people will respond that the logging is needed, but the DBA does backups every day, and any data that's not in the EDW can easily be re-loaded from staging. The EDW should just be a container to receive data, not something that also tries to keep track of what data is received.
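
For reference, a minimal diagnostic sketch for point 4; the database name EDW is a placeholder:

-- Show why transaction log space cannot be reused (SQL Server 2005 and later)
SELECT name, recovery_model_desc, log_reuse_wait_desc
FROM sys.databases
WHERE name = 'EDW';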

Any comments welcome
Post #689719
Posted Friday, April 3, 2009 6:26 AM
SSCoach
chris.stuart (4/3/2009)
I need to insert 800 million records (15 years' worth of data) from my staging database into my final EDW database. The transform of the data in the table is straightforward and not a problem. The problem is the amount of time this query spends logging, plus the risk of the database going offline or of network problems mid-load.
The insert query (INSERT INTO edw SELECT * FROM staging) takes about 2 hours to insert a year's worth of data, but it's happened before that something goes wrong and a rollback kicks in. A rollback takes even longer!
I've had the recovery model on SIMPLE all the while, but even this is limiting my style.
What I would like to do is:
1) Set logging totally off.

You can't. Logging is inherent to operations within SQL Server. What you could do is use SSIS or SQLCMD to perform a bulk insert operation. That would be minimally logged, reducing the overhead. But a straight INSERT... SELECT is going to be fully logged; no options.
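
For illustration, a minimal sketch of such a bulk load, assuming the staged rows are exported to a flat file first; the table name, file path, and delimiters are placeholders:

-- Bulk load the EDW table; TABLOCK helps the load qualify for minimal logging
-- under the SIMPLE or BULK_LOGGED recovery model
BULK INSERT edw.dbo.fact_table
FROM 'D:\export\fact_table.dat'
WITH (
    TABLOCK,
    BATCHSIZE = 1000000,        -- commit every million rows
    FIELDTERMINATOR = '|',
    ROWTERMINATOR = '\n'
);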

2) Even if a rollback were to happen, I would like to decide whether to run a sub-query to insert the missing data, or to delete the inserted data myself and run the query again. (I would think that my delete would be quicker than a rollback in any case.)

Again, no real options here. You could break the insert down into a series of smaller ones, each of which, by itself, will cause less logging. It's a valid technique, but it will extend the time it takes to do this operation, not reduce it.
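
A sketch of that series, slicing the 15 years into one-year loads; the table and date column names are hypothetical:

-- One year per INSERT keeps each transaction, and its log usage, small
DECLARE @yr int;
SET @yr = 1994;

WHILE @yr <= 2008
BEGIN
    INSERT INTO edw.dbo.fact_table
    SELECT *
    FROM staging.dbo.fact_table
    WHERE YEAR(TransactionDate) = @yr;   -- hypothetical date column

    SET @yr = @yr + 1;
END;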

3) Also, the EDW table is partitioned but Staging is not, if that makes any difference.

Not in terms of logging, it doesn't. It might make a huge difference in terms of overall performance.

4) I would also like to cap the transaction log at maybe just a few gigs, with the space being reused and records committed as and when needed, but I keep running into the error "The transaction log for database is full. To find out why space in the log cannot be reused, see the log_reuse_wait_desc column in sys.databases".
I know that people will respond that the logging is needed, but the DBA does backups every day, and any data that's not in the EDW can easily be re-loaded from staging. The EDW should just be a container to receive data, not something that also tries to keep track of what data is received.

Any comments welcome


There's just no getting away from the log. Look up the various methods of using bulk operations.


----------------------------------------------------
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood..." Theodore Roosevelt
The Scary DBA
Author of: SQL Server 2012 Query Performance Tuning
SQL Server 2008 Query Performance Tuning Distilled
and
SQL Server Execution Plans

Product Evangelist for Red Gate Software
Post #689786
Posted Friday, April 3, 2009 6:36 AM
chris.stuart (SSC Veteran)
Hi Grant

Thanks for the reply. I think you hit the nail on the head with this one. I knew that if I just explained it, someone would find what I'm missing.
I'm actually using SSIS to populate staging from a flat file, and going from staging to EDW is also SSIS, with a SQL statement file doing the load into the EDW.
I'll rather split the SQL statement file up into smaller pieces or SSIS components, and use the SSIS data flow's OLE DB Destination to control my Rows per batch and Maximum insert commit size.
That will sort out all of my problems, with the added bonus that I can actually see how many records have been inserted (on the Data Flow path).
I was feeling a bit uneasy about the SQL statement file in the SSIS package, and now I know why.
And thanks for the other options and suggestions as well.

Regards
Chris
Post #689798
Posted Friday, April 3, 2009 6:55 AM
SSCoach
Cool. Glad I could help a little.

Post #689824
Posted Friday, April 3, 2009 9:30 AM
Ten Centuries
Have you tried bulk-logged recovery? And make sure auto create and auto update statistics are off during the load. I've done replication snapshots of 300-million-row tables, and turning off the statistics creation and updates speeds things up.
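
A minimal sketch of those two settings, assuming the database is named EDW; remember to switch them back after the load:

-- Bulk-logged recovery lets bulk operations be minimally logged
ALTER DATABASE EDW SET RECOVERY BULK_LOGGED;

-- Keep the optimizer from building or refreshing statistics mid-load
ALTER DATABASE EDW SET AUTO_CREATE_STATISTICS OFF;
ALTER DATABASE EDW SET AUTO_UPDATE_STATISTICS OFF;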

https://plus.google.com/100125998302068852885/posts?hl=en
http://twitter.com/alent1234
x-box live gamertag: i am null
http://live.xbox.com/en-US/MyXbox/Profile
Post #690029
Posted Sunday, April 5, 2009 11:23 PM
chris.stuart (SSC Veteran)
Hi, I see that my auto stats were on. I've switched them off now and will see how that goes. Thanks
Post #690728
Posted Wednesday, April 14, 2010 6:38 AM
Forum Newbie
Hi,
Not too sure if switching off the "auto stats" solved your problem.

One of the techniques that we have found useful is to batch inserts, updates and deletes.

Dean
Post #903070
Posted Wednesday, April 14, 2010 7:54 AM
PaulB-TheOneAndOnly (Hall of Fame)
chris.stuart (4/3/2009)
4) I would also like to cap the transaction log at maybe just a few gigs, with the space being reused and records committed as and when needed, but I keep running into the error "The transaction log for database is full. To find out why space in the log cannot be reused, see the log_reuse_wait_desc column in sys.databases".


Lots of questions, and I suffer from ADD, so... let me answer the last one.

Use a small transaction size, a.k.a. run your load in small batches.
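
A sketch of that pattern, copying in fixed-size batches; the key column used to find uncopied rows is hypothetical:

-- Each INSERT commits on its own, so log space can be reused at checkpoints
-- (SIMPLE recovery assumed)
DECLARE @batch int;
SET @batch = 100000;

WHILE 1 = 1
BEGIN
    INSERT INTO edw.dbo.fact_table
    SELECT TOP (@batch) s.*
    FROM staging.dbo.fact_table AS s
    WHERE NOT EXISTS (SELECT 1
                      FROM edw.dbo.fact_table AS e
                      WHERE e.RowId = s.RowId);   -- hypothetical key column

    IF @@ROWCOUNT = 0 BREAK;   -- done when no rows were copied
END;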


_____________________________________
Pablo (Paul) Berzukov

Author of Understanding Database Administration available at Amazon and other bookstores.

Disclaimer: Advice is provided to the best of my knowledge, but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions in a non-production environment, the advisor should not be held liable or responsible for any actions taken based on the given advice.
Post #903145
Posted Wednesday, April 14, 2010 8:07 AM
Hall of Fame
PaulB-TheOneAndOnly (4/14/2010)
chris.stuart (4/3/2009)
4) I would also like to cap the transaction log at maybe just a few gigs, with the space being reused and records committed as and when needed, but I keep running into the error "The transaction log for database is full. To find out why space in the log cannot be reused, see the log_reuse_wait_desc column in sys.databases".


Lots of questions, and I suffer from ADD, so... let me answer the last one.

Use a small transaction size, a.k.a. run your load in small batches.


I certainly would recommend small batches.




Alvin Ramard
Memphis PASS Chapter

All my SSC forum answers come with a money back guarantee. If you didn't like the answer then I'll gladly refund what you paid for it.
Post #903160
Posted Wednesday, April 14, 2010 1:51 PM
Grasshopper
I don't know if this is an option for you or not, but you can also use a SELECT INTO, which is minimally logged. We do that with some of our larger warehouse tables, and it is much faster than reloading the existing table. After the new table has been created, we just drop the old one, rename the new one, and recreate the indexes. You do seem to lose some of the table dependency info when you do it like that, but there is no doubt it is faster and gets around the logging issue. It may be different in the newer versions, but it was that way in SQL 2000.
Another benefit of doing it that way is that you minimize the amount of downtime while the table reloads. Our table happens to be hit quite frequently around the clock, so this prevents issues with locking and blocking as well.
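
A sketch of that swap approach; the table and index names are hypothetical:

-- SELECT INTO builds the new table with minimal logging
SELECT *
INTO dbo.fact_table_new
FROM staging.dbo.fact_table;

-- Swap the old table for the new one, then recreate the indexes
BEGIN TRAN;
DROP TABLE dbo.fact_table;
EXEC sp_rename 'dbo.fact_table_new', 'fact_table';
COMMIT;

CREATE CLUSTERED INDEX IX_fact_table_date
    ON dbo.fact_table (TransactionDate);   -- hypothetical index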
Post #903534