Controlling the log size with large updates

  • Hi,

    I've loaded 170 million rows of data from an old Oracle DW into a staging DB (SQL 2008, simple recovery model), and I'm now trying to fetch/update the table with FK keys from the new DW. My problem is that the log grows so much that I have to split the data by years/weeks and release the disk space before the next update interval starts. What would be a good way to do this?

    I would try the following:

    update STG

    Set ItemKEY = D.ItemKEY

    From dbo.STG_OldDW STG JOIN DW.dbo.D_ITEM D ON (STG.ItemCode=D.ItemCode)

    Where YEAR = 2004 and WEEK <= 26
    go
    DBCC SHRINKFILE(N'STG_OldDW_log', ???) -- shrink to a specific size, or let it go to the minimum?
    GO
    update STG
    Set ItemKEY = D.ItemKEY
    From dbo.STG_OldDW STG JOIN DW.dbo.D_ITEM D ON (STG.ItemCode=D.ItemCode)
    Where YEAR = 2004 and WEEK > 26

    go

    DBCC SHRINKFILE(N'STG_OldDW_log', ???)

    GO

    Etc…….

    Any help would be appreciated, Ville

    Your approach of updating in batches is the right method, and following it up with a shrink suggests you have limited disk space.

    When you split by date range, do the batches come out to around 50,000 rows per update or fewer? If so, it is much easier on a 4 GB RAM server - see the sketch below for a rowcount-based way to get there.

    If that assumption is right, I don't see any problem with your approach.
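    Purely as an illustration, a rowcount-batched variant could look like the sketch below. It borrows the STG_OldDW / D_ITEM names from your post and assumes rows that still need the key have ItemKEY set to NULL; adjust that filter to whatever marks an unprocessed row in your table.

    -- Illustrative sketch only: batch by rowcount instead of by year/week.
    -- Assumes not-yet-updated rows have ItemKEY IS NULL; change the filter if that is not the case.
    declare @rows int = 1

    while @rows > 0
    begin
        update top (50000) STG
        set ItemKEY = D.ItemKEY
        from dbo.STG_OldDW STG join DW.dbo.D_ITEM D on (STG.ItemCode = D.ItemCode)
        where STG.ItemKEY is null

        set @rows = @@ROWCOUNT -- stop once no rows are left to update
    end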

    Cheers, Satish 🙂

    There are still about 15 million rows per batch when split like in my example.

    OK, thanks. I'll just continue like I planned, or maybe I'll need to split even more.

    Ville

  • Ville Lucander (4/28/2009)


    Hi,

    I've loaded 170 million rows of data from an old Oracle DW into a staging DB (SQL 2008, simple recovery model), and I'm now trying to fetch/update the table with FK keys from the new DW. My problem is that the log grows so much that I have to split the data by years/weeks and release the disk space before the next update interval starts. What would be a good way to do this?

    I would try the following:

    update STG

    Set ItemKEY = D.ItemKEY

    From dbo.STG_OldDW STG JOIN DW.dbo.D_ITEM D ON (STG.ItemCode=D.ItemCode)

    Where YEAR = 2004 and WEEK <= 26

    go

    DBCC SHRINKFILE(N'STG_OldDW_log', ???)

    GO

    Etc…….

    Any help would be appreciated, Ville

    Is your database recovery model set to full?

    If so, you can run the update within a loop, inside a transaction, so the data is updated one week at a time. Ensure that a transaction log backup runs every 15 minutes or so; that way all inactive log entries are flushed and the transaction log releases space to accommodate new transactions.

    Given below is the code.

    declare @minweek tinyint, @maxweek tinyint, @sqlstring nvarchar(200)

    set @minweek = 1
    set @maxweek = 52

    while @minweek <= @maxweek
    begin
        set @sqlstring = N'update STG set ItemKEY = D.ItemKEY from dbo.STG_OldDW STG join DW.dbo.D_ITEM D on (STG.ItemCode = D.ItemCode) where YEAR = 2004 and WEEK = ''' + convert(char(2), @minweek) + ''''

        print @sqlstring

        Begin Transaction
        -- The actual execution is commented out; to run it, uncomment the next line.
        -- exec sp_executesql @sqlstring
        Commit

        set @minweek = @minweek + 1
    end
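    In case the database really were in full recovery, the 15-minute log backup mentioned above would be something along these lines, normally scheduled as a SQL Agent job (the database name and backup path here are only placeholders):

    -- Illustrative only: a periodic log backup frees inactive log space in FULL recovery.
    -- STG_OldDW and the backup path are placeholder names.
    BACKUP LOG STG_OldDW TO DISK = N'D:\Backup\STG_OldDW_log.trn'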

  • Hi

    No, it's running on the Simple recovery model. But I could still use a loop construct, expand it a little to cover years 2004-2008, and just execute DBCC SHRINKFILE... after each week.

    Thanks, ville

  • You shouldn't need to do a dbcc shrinkfile if you're in simple recovery mode - once each transaction commits, the log space will be available to be re-used for the next transaction. Manually shrinking the log creates much more work for the server than is required.

    Howard is right. You shouldn't need to release the space. Size the log for the largest batch you run, based on the year/week range you're doing; after each commit, the space will be reused.
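    If you want to watch that happening, you can check the log usage between batches (purely for observation, not required):

    -- Reports each database's log size and the percentage currently in use;
    -- in simple recovery the "Log Space Used (%)" drops back once each batch commits and a checkpoint runs.
    DBCC SQLPERF(LOGSPACE)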

  • Hi,

    I'm on vacation, but briefly: yes, what you say makes very much sense... but do I need to add an explicit 'Begin Tran - Commit Tran' when running it in a loop like below?

    declare @STARTweek tinyint, @ENDweek tinyint, @sqlstring nvarchar(200), @Year smallint

    set @Year = 2004

    WHILE @Year <= 2009
    BEGIN

        Set @STARTweek = 1
        Set @ENDweek = 5

        WHILE @ENDweek <= 55 -- five-week batches; covers weeks 1-53
        BEGIN

            Set @sqlstring = N'update STG set TuoteKEY = D.TuoteKEY from dbo.STG_KIINTIO_VK_vDW STG join DW.dbo.D_TUOTE D on (STG.TuoteKoodi = D.TuoteKoodi) where VUOSI = ' + CAST(@Year as varchar(4)) + N' and VIIKKO >= ' + CAST(@STARTweek as varchar(2)) + N' and VIIKKO <= ' + CAST(@ENDweek as varchar(2))

            raiserror (@sqlstring,0,1) with nowait

            execute (@sqlstring)

            raiserror ('DBCC SHRINKFILE(N''STG_VanhaDW_log'')',0,1) with nowait

            DBCC SHRINKFILE(N'STG_VanhaDW_log')

            Set @STARTweek += 5
            Set @ENDweek += 5

        END

        Set @Year += 1

    END

    Ville

  • Hi Ville,

    No, you don't necessarily need the begin tran, commit tran. Take the shrinkfile out though, it's not required.

    By definition, if the file can be shrunk, then the disk space you're recovering is unused anyway. If you shrink the file, the next update has to claim the space from the OS all over again and depending on your log file growth settings, that could be very expensive and also cause fragmentation.
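    For illustration, pre-sizing the log once (with a sensible growth increment) avoids the grow/shrink cycle entirely; the database name and the 20 GB / 1 GB figures below are just placeholders - size it for your largest batch:

    -- Illustrative only: grow the log once up front so the batched updates never trigger autogrowth.
    -- STG_OldDW and the sizes are placeholders.
    ALTER DATABASE STG_OldDW
    MODIFY FILE (NAME = N'STG_OldDW_log', SIZE = 20GB, FILEGROWTH = 1GB)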

    Cheers,

    Howard

  • Hello Howard.

    I'm still wondering when the server gets a chance to clear the log while I'm running the loop over 170 million rows. It's all one batch/transaction, isn't it?

    Steve mentioned earlier that I should make the log big enough to hold the data of the biggest period of rows, but I'd like to run this as a whole (the loop example). (I might have to run this, and similar updates for other dimensions in the history fact, several times before all is done.)

    Is there any way to accomplish this, or do I have to run it in smaller sets?

    Ville

    In Simple recovery mode, the log space is recovered as soon as the transaction ends - this is a very lightweight process that simply marks the log space as available to be overwritten.

    The log needs to be big enough to hold the largest transaction you'll run, so if you run one update statement that updates all 170 million rows, then there's no getting around the fact that your transaction log needs to be big enough to accommodate it.

    If you split it up into multiple updates (even if contained within a loop), then you only need a log big enough to accommodate each individual update statement.

    Howard

  • Hi,

    Sounds reasonable. I'll just start a test loop and see how the log file behaves.

    thanks, ville

  • Hi Howard,

    Yes, it works just like you described:

    update STG

    Set TuoteKEY = D.TuoteKEY

    From dbo.STG_KIINTIO_VK_vDW STG JOIN DW.dbo.D_TUOTE D ON (STG.TuoteKoodi=D.TuoteKoodi)

    Where VUOSI = 2004 and VIIKKO >= 1 and VIIKKO <= 5

    -- and then the same update for the next batch:
    Where VUOSI = 2004 and VIIKKO >= 6 and VIIKKO <= 10

    Etc...

    Thanks again, ville

  • Hello

    I am new to SQL Server 2008 bulk loads (an Oracle DBA entrusted with SQL Server now). Soon I will be dealing with 2 TB of data a year; 17 GB/day assuming 365 days of data loads. I am reading about similar scenarios too. Did you give any thought to changing the database to BULK_LOGGED recovery prior to the data load and then putting it back to SIMPLE? When the db is in SIMPLE mode, the logging should not be taking place, so I am surprised it is happening. Again, I am no expert in SQL Server and am new to this type of bulk data load.

    HTH

  • Hi,

    Changing to bulk-logged would not have helped in this scenario. Updating existing column values is always fully logged; bulk-logged only helps with minimally logged operations such as bcp, SELECT * INTO table FROM table2, etc.

    And all transactions are always logged even when the db is running in the Simple recovery model, but the log space is marked reusable after each transaction finishes.
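    For completeness, a sketch of where bulk-logged does pay off, i.e. with a minimally logged operation rather than an UPDATE (the database and table names below are placeholders):

    -- Illustrative only: BULK_LOGGED benefits minimally logged operations such as SELECT ... INTO, not UPDATEs.
    -- STG_OldDW, dbo.NewCopy and dbo.OldCopy are placeholder names.
    ALTER DATABASE STG_OldDW SET RECOVERY BULK_LOGGED

    SELECT * INTO dbo.NewCopy FROM dbo.OldCopy -- minimally logged

    ALTER DATABASE STG_OldDW SET RECOVERY SIMPLE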

    Ville
