Transaction Log Growing Despite Simple Recovery
jvanderberg
The easiest way seems to be looping based on an identity key. I've broken up the transaction into groups of 50,000 rows, and a CHECKPOINT is issued manually (just to be sure) after each load. The first part of the transaction seems to take up the most log space, at slightly over 1G. The remaining data groups seem to take only a few hundred megs of log space. It's certainly a great deal better than 9G. Thank you all for your help.

An interesting thing I've learned from this: the dramatic effect of file resizes on query performance. Typically, the stored procedure runs in 6-9 minutes (when there is enough log space to accommodate it). When I was forcing it to resize the log space (1G at a time), it took 16-20 minutes, and that's just 8 resizes. Imagine if my resize were set to 10%!! I could be waiting an hour for that to complete. File resizes truly do kill a query. You will notice the difference.
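
Since log files can't use instant file initialization, every growth is zero-filled while the query waits, which matches this observation. The usual mitigation is to size the log once up front and use a fixed autogrow increment rather than a percentage. A minimal sketch, with assumed database and logical file names:

-- Pre-grow the log in one shot so the load never waits on autogrow
-- (database and file names here are placeholders).
ALTER DATABASE StagingDW
MODIFY FILE (NAME = StagingDW_log, SIZE = 2048MB);

-- A fixed increment is predictable; percentage growth starts too small
-- and grows too large later.
ALTER DATABASE StagingDW
MODIFY FILE (NAME = StagingDW_log, FILEGROWTH = 512MB);
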
Lynn Pettis
Here is a guess at why you see a larger growth from stage to production than from source to stage: indexes on the production table(s). Do you currently drop and recreate the indexes or are they in place during the transfer?

jvanderberg
Lynn Pettis (4/19/2010)
Here is a guess at why you see a larger growth from stage to production than from source to stage: indexes on the production table(s). Do you currently drop and recreate the indexes or are they in place during the transfer?


Ah ha! It all makes sense now. I keep the indexes online. I had considered drop/recreate, but up until now I could see no specific reason to do so. The process itself didn't take that long, so I didn't think I'd really get much of a performance boost from dropping/recreating indexes. There are about twenty indexes on the fact table. Most are INTs, but still the index space for the table is nearly as much as the data space.

On the other hand, there are virtually no indexes on the intermediate table. Only an identity key, added today.
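
A quick way to see that data-versus-index split is sp_spaceused (the table name below is the fact table discussed elsewhere in the thread):

-- Reports rows, reserved, data, index_size, and unused for one table.
EXEC sp_spaceused 'fact.CustomerOrder';
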
Lynn Pettis
I wrote a set of procedures for a previous employer that dropped and recreated the indexes and foreign key references on specified tables, used before and after the load process into their data warehouse. Since it dynamically captured the index information at run time, there was no need to maintain drop and create scripts; if a new index was created on a table, it was automagically picked up during the next load.
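
To illustrate the idea, here is a minimal sketch of capturing index definitions dynamically; it handles only simple nonclustered indexes (no included columns, filters, or sort direction), and the table name is borrowed from elsewhere in the thread:

-- Generate CREATE statements for the nonclustered indexes on a table
-- before dropping them; run the captured script again after the load.
SELECT
    'CREATE ' + CASE WHEN i.is_unique = 1 THEN 'UNIQUE ' ELSE '' END
    + 'NONCLUSTERED INDEX ' + QUOTENAME(i.name)
    + ' ON fact.CustomerOrder ('
    + STUFF((SELECT ', ' + QUOTENAME(c.name)
             FROM sys.index_columns ic
             JOIN sys.columns c
               ON c.object_id = ic.object_id
              AND c.column_id = ic.column_id
             WHERE ic.object_id = i.object_id
               AND ic.index_id = i.index_id
               AND ic.is_included_column = 0
             ORDER BY ic.key_ordinal
             FOR XML PATH('')), 1, 2, '') + ');'
FROM sys.indexes i
WHERE i.object_id = OBJECT_ID('fact.CustomerOrder')
  AND i.type = 2;   -- nonclustered indexes only

On SQL Server 2005 and later, ALTER INDEX ... DISABLE before the load and ALTER INDEX ... REBUILD afterwards avoids scripting entirely, though a disabled clustered index makes the table unreadable, so that one has to stay in place.
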

jvanderberg
Lynn Pettis (4/19/2010)
I wrote a set of procedures for a previous employer that dropped and recreated the indexes and foreign key references on specified tables, used before and after the load process into their data warehouse. Since it dynamically captured the index information at run time, there was no need to maintain drop and create scripts; if a new index was created on a table, it was automagically picked up during the next load.


I'll try it out and see what the results are. This seems like it might actually give me a performance boost, whereas breaking up the query is almost certainly a performance hit.
jvanderberg
I had trouble removing the clustered index on InvoicedDate (it was taking a long time), but I was finally able to do it. I tested the stored procedure with all indexes removed except InvoicedDate, and the log file still jumped to 7G. Once I removed the InvoicedDate index, the log file only reached slightly over 2G. It is interesting to see how much of an impact the clustered (non-unique) index had on the transaction log.

I still have an identity key that I was not able to remove without removing the field itself. The identity key is required because the data provided no guaranteed unique keys. The primary key is non-clustered because its order means nothing; I instead opted to cluster on the InvoicedDate field. The primary key doesn't appear to have a significant impact on the transaction log. Slightly over 2G is consistent with the other steps, which use slightly over 1G of log space. The extra log space can easily be accounted for by the increased number of rows, as well as the increased row size (the addition of several fields via lookups).
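
For anyone repeating this kind of comparison, log consumption can be watched between runs without waiting for the file to grow. A minimal sketch:

-- One row per database: log size (MB) and percentage currently in use.
DBCC SQLPERF(LOGSPACE);

-- Or, scoped to the current database's log file:
SELECT name, size * 8 / 1024 AS size_mb
FROM sys.database_files
WHERE type_desc = 'LOG';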

My preference is to get it under 2G, with enough room to spare that I don't have to worry about it growing. So it looks like I may still have to break the data into smaller chunks, and it becomes a matter of which approach provides the best performance. From what I'm seeing, recreating indexes seems expensive. However, building the data in chunks took over twice as long as building it all at once, so the difference in performance between the two approaches looks pretty nominal. Since building the data in chunks is simpler overall than some hybrid solution, separating the data looks like the way to go.

Thanks again for all your help!
Steve Jones
Thanks for the update, and I am surprised at the difference in the log with the clustered index removal. I would tend to agree that it doesn't necessarily make sense.

I'm wondering, in terms of your timing and chunking, whether you could insert the data differently. Maybe do something to avoid page splitting?
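
One standard way to reduce splits when the clustered index stays in place is to present rows in clustered-key order, so most inserts land at the logical end of each range rather than splitting pages in the middle. A sketch with hypothetical table and column names:

-- FactDemo is assumed to be clustered on OrderDate; ordering the feed
-- by the clustered key turns most splits into simple appends.
INSERT INTO dbo.FactDemo (OrderDate, CustomerID, Amount)
SELECT OrderDate, CustomerID, Amount
FROM dbo.StageDemo
ORDER BY OrderDate;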

Likely this is a challenging problem, and I'd actually encourage you to start another thread and post some code, to see what suggestions people might have for changing the way the data is being loaded in the proc.

jvanderberg
Well, ultimately breaking it into blocks gives me complete control over the transaction log size. In this specific case, disk space is more important than performance. I can trade performance for disk space by reducing the block size. Also, the fixed block size means that the transaction log will not increase in size as a result of more source data. This solution seems to be acceptable.

I ended up with this:

DECLARE @i INT, @max INT, @blocksize INT
DECLARE @debug BIT          -- assumed to be a parameter of the surrounding procedure
SET @debug = 0

SET @blocksize = 50000
SELECT @max = MAX(id) FROM temp.OrderPrep

-- Start at the end of the first block. Looping while @i < @max + @blocksize
-- ensures the final partial block is loaded even when @max is not an exact
-- multiple of @blocksize.
SET @i = @blocksize

WHILE @i < @max + @blocksize
BEGIN

    IF @debug = 1 SELECT @i AS [i], @max AS [max]

    INSERT INTO fact.CustomerOrder
    SELECT
        -- SELECT list removed for brevity and privacy
    FROM
        temp.OrderPrep OP
        LEFT JOIN dim.Customer C ON C.CustomerID = OP.CustomerID
        OUTER APPLY Lookup.dbo.fn_SalesRegion(
            CASE
                WHEN C.CustomerType = 'OEM-N' THEN 'NAT'
                WHEN C.Channel = 'EXPORT' THEN 'EXP'
                ELSE ShipToState
            END, ShipToCountry) SR
        OUTER APPLY Lookup.dbo.fn_IRRegion(ShipToState, ShipToZipcode, ShipToCountry) IRR
    WHERE OP.[id] <= @i AND OP.[id] > @i - @blocksize

    IF @debug = 1 PRINT 'Checkpoint ' + CAST(@i AS VARCHAR(50))

    -- Under SIMPLE recovery, the checkpoint allows the inactive portion
    -- of the log to be reused before the next block is loaded.
    CHECKPOINT

    SET @i = @i + @blocksize

END



Which really isn't much more code than the original script. It's not the most elegant solution in the world, but it works. Frankly, I've already spent too much time on this. File bloat, however, can become serious, so it was imperative that I spend however much time was needed to solve the problem.

Besides SSIS, I can't think of any other ways of moving the data. I've avoided SSIS for a couple of reasons. The first is that I am unsure about the OUTER APPLY capabilities via the UDFs. The other is that this code works, and I want to minimize the possibility of introducing new bugs. That second reason applies to any other method of moving data.
brainy
I agree that doing this in a loop or in chunks will give complete control. I too have experienced the same problem.

I have a question here: does bcp log or not while inserting data into a table?
GilaMonster
There is no such thing as an unlogged operation. Every data modification will log.
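
Worth adding for context (standard documented behavior, not specific to this thread): bulk loads such as bcp or BULK INSERT can be minimally logged, meaning only page allocations are logged rather than every row, when the recovery model is SIMPLE or BULK_LOGGED, the target qualifies (a heap is the simplest case), and a table lock is taken. A sketch with assumed table and file names:

-- Eligible for minimal logging: heap target, TABLOCK, SIMPLE recovery.
BULK INSERT dbo.StageOrders
FROM 'C:\loads\orders.dat'
WITH (TABLOCK);

-- INSERT ... SELECT into a heap with TABLOCK is also minimally logged
-- on SQL Server 2008 and later.
INSERT INTO dbo.StageOrders WITH (TABLOCK)
SELECT OrderID, OrderDate, Amount
FROM dbo.SourceOrders;

Minimally logged is still logged: the operation can always be rolled back, which is consistent with the point above.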

