Breaking up a data load to prevent log file growth



CapnHector
SSC Eights!

Group: General Forum Members
Points: 917 Visits: 1789
I'm coming here to get some options for breaking up a data load. The situation is that I have to expand a set of numbers to cover every number from 000 to 999 (1 becomes 1000-1999). There are 258 thousand input numbers, and if I just do a cross apply to the numbers table, it causes the log to balloon (a problem, because I don't want to have a "one time shrink" every month). This is being loaded into a VARCHAR(16) column, which is why the data types are CHAR().

The two options I came up with are a cursor with a cross apply to a numbers table (slow, of course), and an OUTPUT from an UPDATE (setting an isProcessed flag) with a cross apply to a numbers table holding 000-999 (faster than the cursor, but still a lot of data to crunch).

The sample data only shows what I'm trying to do with the input and has been pared down to only the relevant columns.

IF OBJECT_ID('tempdb..#Block') IS NOT NULL
    DROP TABLE #Block;

IF OBJECT_ID('tempdb..#SampleNumbers') IS NOT NULL
    DROP TABLE #SampleNumbers;

-- Suffix table holding every value from 000 to 999
CREATE TABLE #Block (N CHAR(3) PRIMARY KEY WITH (FILLFACTOR = 100));

-- Input numbers; the isProcessed flag drives the batching loop further down
CREATE TABLE #SampleNumbers (Block CHAR(7) PRIMARY KEY, isProcessed BIT DEFAULT(0));

INSERT INTO #Block (N)
SELECT '000' AS N
UNION ALL
SELECT TOP 999 RIGHT('000' + CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS VARCHAR), 3)
FROM sys.all_columns;

INSERT INTO #SampleNumbers (Block)
SELECT TopLevel
FROM (VALUES ('1234567'), ('2345678'), ('3456789'), ('4567890')) x (TopLevel);

-- Single-pass expansion: every input row becomes 1,000 output rows
--INSERT INTO MyTable
SELECT Block + N
FROM #SampleNumbers
CROSS JOIN #Block;



Currently I am using the following, but I'm wondering if there is a better way:

IF OBJECT_ID('tempdb..#tempSample') IS NOT NULL
    DROP TABLE #tempSample;

CREATE TABLE #tempSample (Block CHAR(7));
CREATE CLUSTERED INDEX uix_tmpSampleIndex ON #tempSample (Block);

DECLARE @Count INT = 1;

WHILE @Count > 0
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;

        -- Claim the next 500 unprocessed input rows and capture them in one statement
        UPDATE TOP (500) a
        SET isProcessed = 1
        OUTPUT inserted.Block INTO #tempSample
        FROM #SampleNumbers a
        WHERE isProcessed = 0;

        SET @Count = @@ROWCOUNT;

        -- Expand just this batch: 500 input rows -> 500,000 output rows
        --INSERT INTO MyTable (MyCol)
        SELECT Block + N
        FROM #tempSample
        CROSS APPLY #Block;

        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0
            ROLLBACK TRANSACTION;
    END CATCH;

    TRUNCATE TABLE #tempSample;
    WAITFOR DELAY '00:00:01';
END;

DROP TABLE #tempSample;



This is a load into our staging database on our production server, and it needs to limit its impact (the reason for the WAITFOR and the transactions). Is there any better way to do this? (I need to work on the error handling so I don't get an infinite loop on an error, where it keeps processing the same batch and getting the exact same error. It's better than nothing: if it has been running way too long, I know there was an error in the expansion.)
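
Something like this is the kind of guard I have in mind for the loop, as a rough sketch (the @ErrorCount / @MaxErrors names are made up):

DECLARE @Count      INT = 1,
        @ErrorCount INT = 0,   -- consecutive-failure counter (made-up name)
        @MaxErrors  INT = 5;   -- give up after this many failures in a row

WHILE @Count > 0 AND @ErrorCount < @MaxErrors
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;

        UPDATE TOP (500) a
        SET isProcessed = 1
        OUTPUT inserted.Block INTO #tempSample
        FROM #SampleNumbers a
        WHERE isProcessed = 0;

        SET @Count = @@ROWCOUNT;

        --INSERT INTO MyTable (MyCol)
        SELECT Block + N
        FROM #tempSample
        CROSS APPLY #Block;

        COMMIT TRANSACTION;
        SET @ErrorCount = 0;                -- any clean batch resets the counter
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0
            ROLLBACK TRANSACTION;
        SET @ErrorCount = @ErrorCount + 1;  -- same rows get retried on the next pass
    END CATCH;

    TRUNCATE TABLE #tempSample;
    WAITFOR DELAY '00:00:01';
END;

IF @ErrorCount >= @MaxErrors
    RAISERROR('Expansion stopped after repeated failures on the same batch.', 16, 1);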


For faster help in answering any problems, please read How to post data/code on a forum to get the best help - Jeff Moden for the best way to ask your question.

For performance issues, see how we like them posted here: How to Post Performance Problems - Gail Shaw

Need to Split some strings? Jeff Moden's DelimitedSplit8K
Jeff Moden's Cross tab and Pivots Part 1
Jeff Moden's Cross tab and Pivots Part 2
Vedran Kesegic
Old Hand

Group: General Forum Members
Points: 382 Visits: 1255
Which database's log has grown? What recovery model does it use?
Did you measure which statement cumulatively takes the most time?

Try to avoid the cross apply, and use a @table variable instead of a #temp table in your clever OUTPUT clause. Use a filtered index containing only the unprocessed rows. Temporarily switch to the bulk-logged recovery model to minimize logging (read more about that first; it's not so simple).
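
A rough sketch of the filtered index, assuming the #SampleNumbers layout from the first post (the index name is made up):

-- Contains only the rows the loop still has to pick up, so finding
-- the next TOP (500) never touches already-processed rows
CREATE NONCLUSTERED INDEX ix_SampleNumbers_Unprocessed
    ON #SampleNumbers (Block)
    WHERE isProcessed = 0;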

_____________________________________________________
Microsoft Certified Master: SQL Server 2008
XDetails Addin - for SQL Developers
blog.sqlxdetails.com - Transaction log myths

CapnHector
SSC Eights!

Group: General Forum Members
Points: 917 Visits: 1789
I should have mentioned that the database is in simple recovery mode, so the t-log growth is simply from the amount of data in a single transaction.


Vedran Kesegic
Old Hand

Group: General Forum Members
Points: 382 Visits: 1255
#tempSample is never cleared, so you throw the same rows into the cross apply over and over. Is that intentional?
You also don't need the uix_tmpSampleIndex index.


CapnHector
SSC Eights!

Group: General Forum Members
Points: 917 Visits: 1789
Vedran Kesegic (12/4/2012)
#tempSample is never cleared, so you throw the same rows into the cross apply over and over. Is that intentional?
You also don't need the uix_tmpSampleIndex index.


#tempSample is truncated right before the WAITFOR. As far as the index on #tempSample goes, I found it sped things up a little in my testing. To answer your second question, which I missed when I answered from my phone: the log that balloons is the log on our staging database, where this is taking place.

The question is not so much about speed; each iteration of the loop takes 3 seconds to produce 500,000 expanded records, and that includes the WAITFOR. Speed is not the primary issue, as this is only the final step of a data import (I have 250,000 records to expand to 250,000,000, so it takes some time by itself, along with the other data). The question I had was about the method, and whether there is anything better to minimize the load on the system. The thing that gets me is the loop, but I think the nature of this beast will require it.

I do thank you for your improvements and will look at implementing them through some testing. I do like the filtered index on isProcessed, which will speed up finding the top 500. I'm also thinking of throwing MAXDOP 1 on the queries to force them not to go parallel, taking advantage of the same thing that makes the quirky update work.
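
Roughly, the hint would land on the expansion query like this (same names as the loop above):

-- A serial plan keeps each batch's footprint predictable on a busy box
--INSERT INTO MyTable (MyCol)
SELECT Block + N
FROM #tempSample
CROSS APPLY #Block
OPTION (MAXDOP 1);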


Vedran Kesegic
Old Hand

Group: General Forum Members
Points: 382 Visits: 1255
I missed your truncate because the indentation misled me :-)
Try a CHECKPOINT command every several batches: the log cannot be truncated if a checkpoint has not occurred (checkpoints also happen automatically, but here we force one).

Also, you can optimize the transaction log by setting a large enough initial size (e.g. 4000 MB) and growth increment (e.g. 512 MB). Prior to that, shrink the log file to as small as possible (e.g. 1 MB). That will optimize the number of VLFs in the transaction log.
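
A rough sketch of both, with made-up names (StagingDB, StagingDB_log) and a hypothetical @BatchNo counter in the loop:

-- Inside the loop: force a checkpoint every 10 batches so the
-- simple-recovery log can mark its VLFs reusable
IF @BatchNo % 10 = 0   -- @BatchNo: hypothetical batch counter
    CHECKPOINT;

-- One-off log resizing, run outside the load:
USE StagingDB;
DBCC SHRINKFILE (StagingDB_log, 1);   -- shrink to ~1 MB first
ALTER DATABASE StagingDB
MODIFY FILE (NAME = StagingDB_log, SIZE = 4000MB, FILEGROWTH = 512MB);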


CapnHector
SSC Eights!

Group: General Forum Members
Points: 917 Visits: 1789
Vedran Kesegic (12/4/2012)
I missed your truncate because the indentation misled me :-)
Try a CHECKPOINT command every several batches: the log cannot be truncated if a checkpoint has not occurred (checkpoints also happen automatically, but here we force one).

Also, you can optimize the transaction log by setting a large enough initial size (e.g. 4000 MB) and growth increment (e.g. 512 MB). Prior to that, shrink the log file to as small as possible (e.g. 1 MB). That will optimize the number of VLFs in the transaction log.


When we created the database, I grew the log to 50 GB in chunks to get optimal VLF sizes for the type of transactions that take place in the staging database. During the single-pass cross apply it ballooned to 100 GB (in 1 GB increments; I may need to look at the autogrowth settings later). In testing the loop, I did not have any issues with log file growth without an explicit checkpoint, as SQL Server was checkpointing the log as needed.
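
For reference, a quick way to check the VLF count is the undocumented DBCC LOGINFO, which returns one row per VLF (StagingDB is a placeholder name):

USE StagingDB;   -- placeholder database name
DBCC LOGINFO;    -- row count = number of VLFs; Status = 2 marks a VLF still in use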

I guess from your responses that there is probably not a better way (one avoiding the loop) to do this. It just makes me cringe to have a loop in production code, but when it's the only choice left, you go with what works.


Vedran Kesegic
Old Hand

Group: General Forum Members
Points: 382 Visits: 1255
Are you sure you have no open transactions behind it (DBCC OPENTRAN)? What is log_reuse_wait_desc in sys.databases? (I'm writing from memory, so check the exact names.)
These are really huge log sizes.
The loop should not bother you, because you are processing data in batches, not row by row. My MVP colleague Dean would ask, "How do you eat an elephant? One meal at a time, until you are finished."
The other thing that comes to mind (besides avoiding log growth with checkpoints or the bulk-logged recovery model) is to create an additional log file big enough to fit this operation, and drop it afterwards.
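
A rough sketch of those checks and the temporary log file; the database name, file name, and path are placeholders:

-- Anything holding the log hostage?
DBCC OPENTRAN;                            -- reports the oldest active transaction, if any
SELECT name, log_reuse_wait_desc
FROM sys.databases
WHERE name = 'StagingDB';                 -- placeholder database name

-- Temporary second log file sized for the load
ALTER DATABASE StagingDB
ADD LOG FILE (NAME = StagingDB_log2,
              FILENAME = 'E:\SQLLogs\StagingDB_log2.ldf',   -- placeholder path
              SIZE = 50000MB);
-- ...run the load...
-- The file can be removed once it holds no active log records
ALTER DATABASE StagingDB REMOVE FILE StagingDB_log2;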


CapnHector
SSC Eights!

Group: General Forum Members
Points: 917 Visits: 1789
Vedran Kesegic (12/4/2012)
Are you sure you have no open transactions behind it (DBCC OPENTRAN)? What is log_reuse_wait_desc in sys.databases? (I'm writing from memory, so check the exact names.)
These are really huge log sizes.
The loop should not bother you, because you are processing data in batches, not row by row. My MVP colleague Dean would ask, "How do you eat an elephant? One meal at a time, until you are finished."
The other thing that comes to mind (besides avoiding log growth with checkpoints or the bulk-logged recovery model) is to create an additional log file big enough to fit this operation, and drop it afterwards.


The db is in simple recovery; once the load is done, we will back it up and it will then sit until our next load. log_reuse_wait_desc is always either NOTHING or, while we are running the loads, CHECKPOINT. At the time the log ballooned, this was the only open transaction.

Thanks for the validation on the loop. I'll try out the filtered index in QA, and if there is no improvement, just leave it as it is, since the log is not growing over the process. I really appreciate the help.


Jeff Moden
SSC-Forever

Group: General Forum Members
Points: 45267 Visits: 39934
capnhector (12/3/2012)
I'm coming here to get some options for breaking up a data load. The situation is that I have to expand a set of numbers to cover every number from 000 to 999 (1 becomes 1000-1999). There are 258 thousand input numbers, and if I just do a cross apply to the numbers table, it causes the log to balloon (a problem, because I don't want to have a "one time shrink" every month). This is being loaded into a VARCHAR(16) column, which is why the data types are CHAR().


I have to ask... why are you creating 1,000 rows of storage for every row of input? What is the business reason for that? I'm asking so I can see if there's an alternative to loading so much data that might not get used.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we don't actually give it to them that way.
Although change is inevitable, change for the better is not.
Just because you can do something in PowerShell, doesn't mean you should. ;-)

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs