Transaction Log Growing Despite Simple Recovery
Posted Monday, April 19, 2010 10:55 AM
SSC-Addicted


Group: General Forum Members
Last Login: Wednesday, April 09, 2014 6:26 AM
Points: 419, Visits: 722
I have a data warehouse database which is set to simple recovery. The transaction log was originally set to 1G, but has grown to 9G. If I shrink the file, it grows to 9G the next day during the data load process. Once the process is complete, none of the transaction log is being used, but it still takes up a great deal of space on disk.

Through monitoring, I was able to determine the cause of the transaction log bloat: a single stored procedure, which runs at the very end of the data load process.
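
One easy way to watch this (just a generic sketch, not necessarily the only way) is to check log usage between the statements:

-- Check log size and percent used between steps to see which statement grows the log
DBCC SQLPERF(LOGSPACE);

-- or, scoped to the current database only:
SELECT name, size * 8 / 1024 AS size_mb
FROM sys.database_files
WHERE type_desc = 'LOG';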

The stored procedure is fairly simple. It has two INSERT/SELECT statements, which move data from the temporary data load tables to the actual data warehouse fact table. The first INSERT statement moves about 800,000 rows from these tables. The second INSERT statement creates about 200,000 rows in the fact table by querying the fact table itself. All of the transaction log bloat appears to be happening during the first INSERT statement.

The first INSERT statement also performs a couple of lookups from tables in other databases. The data for our warehouse comes from multiple systems. System-dependent lookups and conversions are performed earlier in the process. The data union also happens just before this step. The purpose of this step is to move the data while performing all the lookups/conversions that are system-independent. There are only a couple.

However, for consistency across applications, one of the lookups is implemented as a table-valued UDF, which is OUTER APPLY'd against the data. Since this is the one thing that truly differentiates this query from the others, I suspect it is the cause, but I can't be sure.

The inline UDF is little more than a few joins wrapped into a function (which should theoretically be expanded inline). It returns all of the data from those joins (about 8 columns), all of which are used in the data warehouse. I could theoretically expand these joins inline manually, but that wouldn't meet our business need very well. The business rules behind this lookup change often, and other applications and reports use this same data. I needed a single point where I could change the business rules and have it affect all applications that use them. The inline UDF served this purpose. Also, I don't know that this is the actual problem, nor can I be sure that expanding the function inline will actually solve it. I am hesitant to make such a huge change without gathering more information first.

Has anyone run into something like this before? Does it sound like the OUTER APPLY is causing the transaction log bloat? Or could it possibly be something else?

Specifically (without being too verbose or giving away confidential information), the query is an INSERT/SELECT with a long list of fields. A couple of fields have simple CASE/COALESCE conversions. The FROM clause is as follows:

FROM
    temp.OrderPrep OP
    LEFT JOIN dim.Customer C ON C.CustomerID = OP.CustomerID
    OUTER APPLY Lookup.dbo.fn_SalesRegion(
        CASE
            WHEN C.CustomerType = 'OEM-N' THEN 'NAT'
            WHEN C.Channel = 'EXPORT' THEN 'EXP'
            ELSE ShipToState
        END, ShipToCountry) SR
    OUTER APPLY Lookup.dbo.fn_IRRegion(ShipToState, ShipToZipcode, ShipToCountry) IRR

There is no WHERE clause; it loads the entire table.

I can post the UDFs if need be, but they are just inline UDFs that are fairly simple SELECT/FROM/WHERE queries.
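
For illustration, they look roughly like this (made-up table and column names, not the actual function):

-- Illustrative sketch only; the real function lives in the Lookup database
-- and uses different table/column names.
CREATE FUNCTION dbo.fn_SalesRegion (@StateCode varchar(10), @Country varchar(10))
RETURNS TABLE
AS
RETURN
(
    SELECT R.RegionCode, R.RegionName, R.RegionManager
    FROM dbo.RegionMap RM
    INNER JOIN dbo.Region R ON R.RegionID = RM.RegionID
    WHERE RM.StateCode = @StateCode
      AND RM.Country = @Country
);
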
Post #906171
Posted Monday, April 19, 2010 11:06 AM


SSC-Insane


Group: General Forum Members
Last Login: Today @ 9:07 PM
Points: 22,530, Visits: 30,309
All of this:
The first INSERT statement moves about 800,000 rows from these tables. The second INSERT statement creates about 200,000 rows in the fact table by querying the fact table itself.


Is written to the log file before it is written to the database (not completely true, as some data may get written to the database before it has all been written to the transaction log). The reason for this is so that, in case of a failure such as a server crash, the transaction can be rolled back if it did not complete.

If the transaction log has grown to 9 GB during this process, then that is how large it needs to be. If you are concerned about the size of the transaction log, can this stored procedure be modified to complete the load in smaller batches, say of 100,000 rows each instead of 800,000 and 200,000 respectively?
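
Something along these lines, purely as a rough sketch (it assumes a sequential key on temp.OrderPrep, the target table and column lists are placeholders, and the SELECT is abbreviated):

-- Rough sketch: load in 100,000-row batches so each batch commits and,
-- under simple recovery, the log space can be reused after each checkpoint.
-- Assumes a sequential key column named OrderPrepID on temp.OrderPrep.
DECLARE @BatchSize int, @MinID int, @MaxID int;
SET @BatchSize = 100000;

SELECT @MinID = MIN(OrderPrepID), @MaxID = MAX(OrderPrepID)
FROM temp.OrderPrep;

WHILE @MinID <= @MaxID
BEGIN
    INSERT INTO dw.FactOrder (OrderID, CustomerKey, SalesRegionCode)   -- abbreviated column list
    SELECT OP.OrderID, C.CustomerKey, SR.RegionCode                    -- same SELECT as the original, abbreviated
    FROM temp.OrderPrep OP
    LEFT JOIN dim.Customer C ON C.CustomerID = OP.CustomerID
    OUTER APPLY Lookup.dbo.fn_SalesRegion(OP.ShipToState, OP.ShipToCountry) SR
    WHERE OP.OrderPrepID >= @MinID
      AND OP.OrderPrepID <  @MinID + @BatchSize;

    CHECKPOINT;   -- simple recovery: lets the inactive portion of the log be reused

    SET @MinID = @MinID + @BatchSize;
END;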



Lynn Pettis

For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here or when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here and here
Managing Transaction Logs

SQL Musings from the Desert Fountain Valley SQL (My Mirror Blog)
Post #906181
Posted Monday, April 19, 2010 11:13 AM
Old Hand


Group: General Forum Members
Last Login: Friday, April 11, 2014 9:14 AM
Points: 302, Visits: 256
I had a similar issue in our production environment where our transaction log was growing to unacceptable sizes after bulk data loads. In my case, the transaction log was not needed so I would manually truncate it after the inserts to get it back down to an acceptable size. I added this query at the end of my stored procedure to perform this:

EXEC sp_dboption 'db_name', 'trunc. log on chkpt.', 'TRUE'
USE db_name
CHECKPOINT
GO
DBCC SHRINKDATABASE (db_name, 10)


Danny Sheridan
Comtekh, Inc.
Post #906188
Posted Monday, April 19, 2010 11:52 AM


SSC-Forever


Group: General Forum Members
Last Login: Today @ 2:55 PM
Points: 41,570, Visits: 34,495
Danny Sheridan (4/19/2010)
EXEC sp_dboption 'db_name', 'trunc. log on chkpt.', 'TRUE'


That option was deprecated back before SQL 2000. It should not be used any more, and the replacement is the simple recovery model.
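
If it ever does need to be set explicitly, the supported equivalent is the recovery model itself:

-- Replacement for 'trunc. log on chkpt.' (only if you do not need log backups)
ALTER DATABASE db_name SET RECOVERY SIMPLE;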



Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #906231
Posted Monday, April 19, 2010 12:14 PM


SSChampion


Group: General Forum Members
Last Login: Today @ 2:42 PM
Points: 14,840, Visits: 27,315
It sounds like it's doing everything through a single, logged operation. As long as you do this, you'll need this much space, as Lynn says. You either have to live with it, or change something. You could break the procedure up into a bunch of smaller procedures, each of which commits its data move when it's done, so that the transactions are smaller and more discrete. Another option would be to use a minimally-logged operation such as bcp or BULK INSERT.
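
Depending on your version and the indexes on the target, an INSERT ... SELECT with a TABLOCK hint may also qualify for minimal logging (SQL Server 2008, simple or bulk-logged recovery, heap target with no nonclustered indexes). Very roughly, with placeholder names and an abbreviated column list:

-- Version- and schema-dependent; names are placeholders.
INSERT INTO dw.FactOrder WITH (TABLOCK)
    (OrderID, CustomerKey)
SELECT OP.OrderID, C.CustomerKey
FROM temp.OrderPrep OP
LEFT JOIN dim.Customer C ON C.CustomerID = OP.CustomerID;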

----------------------------------------------------
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood..." Theodore Roosevelt
The Scary DBA
Author of: SQL Server 2012 Query Performance Tuning
SQL Server 2008 Query Performance Tuning Distilled
and
SQL Server Execution Plans

Product Evangelist for Red Gate Software
Post #906255
Posted Monday, April 19, 2010 12:26 PM
SSC-Addicted


Group: General Forum Members
Last Login: Wednesday, April 09, 2014 6:26 AM
Points: 419, Visits: 722
Lynn Pettis (4/19/2010)
All of this:
The first INSERT statement moves about 800,000 rows from these tables. The second INSERT statement creates about 200,000 rows in the fact table by querying the fact table itself.


Is written to the log file before it is written to the database (not completely true, as some data may get written to the database before it has all been written to the transaction log). The reason for this is so that, in case of a failure such as a server crash, the transaction can be rolled back if it did not complete.

If the transaction log has grown to 9 GB during this process, then that is how large it needs to be. If you are concerned about the size of the transaction log, can this stored procedure be modified to complete the load in smaller batches, say of 100,000 rows each instead of 800,000 and 200,000 respectively?


This makes sense. I've had really large INSERT/SELECT statements fail 3/4 of the way through, and the destination was still blank. I figured it was using tempdb, but my tempdb is 5G and typically never uses more than 3G.

I suppose I could build a loop into the process; I would probably have to add an identity key to the intermediate table. Not a huge problem, and it shouldn't add too much time or effort to the creation of the intermediate table.
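
Something like this would do for the key (the column name is just a placeholder):

-- Add a sequential key to the intermediate table so the load can be batched on ranges of it.
ALTER TABLE temp.OrderPrep
    ADD OrderPrepID int IDENTITY(1,1) NOT NULL;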

What confuses me is that the creation of the intermediate table doesn't cause the same bloat. The intermediate table is created using two different procedures, each adding data from one system. My numbers above are a little off: the intermediate table is actually slightly over a million rows, and 800,000 of those come from one of the two systems. Those 800,000 rows are moved in a single INSERT/SELECT (very similar to the procedure I'm having a problem with). I have confirmed that the transaction log was at 2G after this intermediate step ran. It grew from 2G to 9G during the final step of processing, in about 5 minutes' time. That seemed strange, so I didn't think that the data mass alone was causing the problem...

Post #906262
Posted Monday, April 19, 2010 12:29 PM
SSC-Addicted


Group: General Forum Members
Last Login: Wednesday, April 09, 2014 6:26 AM
Points: 419, Visits: 722
Danny Sheridan (4/19/2010)
I had a similar issue in our production environment where our transaction log was growing to unacceptable sizes after bulk data loads. In my case, the transaction log was not needed so I would manually truncate it after the inserts to get it back down to an acceptable size. I added this query at the end of my stored procedure to perform this:


The size of the file after the load is of no consequence. The fact that it grew to 9G in the first place means that I need to have 9G (or more) available on a drive where space is at a premium.

In simple recovery mode, "CHECKPOINT" on its own should be enough to truncate the log. I've actually added it at several points in my code while troubleshooting. This is how I know that a single INSERT statement is causing the problem.
Post #906264
Posted Monday, April 19, 2010 12:32 PM
SSC-Addicted


Group: General Forum Members
Last Login: Wednesday, April 09, 2014 6:26 AM
Points: 419, Visits: 722
Grant Fritchey (4/19/2010)
Another option would be to use a minimally-logged operation such as bcp or BULK INSERT.


I'm confused as to how BCP or BULK INSERT would help, considering the data is already in a table.

Perhaps using SSIS would minimally log the transfer? I've avoided using it in the past because I am unsure about the UDF call.
Post #906268
Posted Monday, April 19, 2010 12:39 PM


SSC-Dedicated


Group: Administrators
Last Login: Today @ 6:29 PM
Points: 32,834, Visits: 14,975
I don't think BCP/Bulk Import would help here.

If space is at a premium, I would look to resolve that issue. Isn't data likely to grow? Meaning more log space needed?

You can break up the insert, with the idea that you allow commits to occur, so a checkpoint can whack the log records and free up space.







Follow me on Twitter: @way0utwest

Forum Etiquette: How to post data/code on a forum to get the best help
Post #906274
Posted Monday, April 19, 2010 12:41 PM


SSChampion


Group: General Forum Members
Last Login: Today @ 2:42 PM
Points: 14,840, Visits: 27,315
jvanderberg (4/19/2010)
Grant Fritchey (4/19/2010)
Another option would be to use a minimally-logged operation such as bcp or BULK INSERT.


I'm confused as to how BCP or BULK INSERT would help, considering the data is already in a table.

Perhaps using SSIS would minimally log the transfer? I've avoided using it in the past because I am unsure about the UDF call.


I suspect they won't help as things currently exist. But if you want to change the behavior of the system, you're going to have to change your process. It's really a question of picking and choosing where you make those changes.


----------------------------------------------------
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood..." Theodore Roosevelt
The Scary DBA
Author of: SQL Server 2012 Query Performance Tuning
SQL Server 2008 Query Performance Tuning Distilled
and
SQL Server Execution Plans

Product Evangelist for Red Gate Software
Post #906275