SSIS and Transaction Log Growth

  • IanSinclair (3/5/2014)


    And if the Maximum Insert Commit Size in the OLEDB destination is already set to a reasonable value you.....

    This option might not work - so splitting a source stream into a few hundred streams might be unwieldy but necessary.

    You'll have to explain to me why that won't work; I'm afraid I'm not familiar with a problem that was only hinted at, never explained.

    Stewart "Arturius" Campbell (3/5/2014)


    Alternatively, change the recovery model to bulk logged just before the load and back on completion thereof.

    however, this will require a full backup before and after.

    Bulk logged reduces log space usage by using minimal logging for most bulk operations.

    see Prerequisites for Minimal Logging in Bulk Import for more information in this regard

    good question, thanks Craig

    Bulk and Simple use similar logging techniques for minimally logged transactions, so either would work for that purpose. In my experience, while this will help, it opens a significant window of point-in-time recovery exposure (hours, remember) that would be unacceptable for most systems. Edit: I should have also mentioned that in my explanation below, but that was starting to get REALLY long. You're right, though, it would HELP. I'm not sure it would fix it, however; that depends on where the bulk-logged operation was in the log when backups tried to run. /EDIT
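
    For anyone who wants to try the recovery-model switch Stewart describes, here is a minimal T-SQL sketch; the database name and backup paths are placeholders, and the SSIS load itself runs between the two ALTER statements.

    ```sql
    USE master;

    -- Full backup first, per Stewart's note, so there is a clean restore point.
    BACKUP DATABASE StagingDB TO DISK = N'X:\Backups\StagingDB_pre_load.bak';

    -- Minimal logging for qualifying bulk operations during the load.
    ALTER DATABASE StagingDB SET RECOVERY BULK_LOGGED;

    -- ... run the SSIS load here ...

    -- Back to FULL, then back up again to restore point-in-time coverage.
    ALTER DATABASE StagingDB SET RECOVERY FULL;
    BACKUP DATABASE StagingDB TO DISK = N'X:\Backups\StagingDB_post_load.bak';
    ```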

    Koen Verbeeck (3/5/2014)


    Nevyn (3/5/2014)


    Koen Verbeeck (3/5/2014)


    Interesting question, but I do not agree entirely with the answer.

    Setting the commit size to a lower value will commit batches before the entire load is done. What if something fails and I want to roll everything back?

    Splitting the load up into smaller chunks, in manageable transactions, seems to make more sense.

    Well, the destination in this case IS a staging table. If you want to roll everything back, you can just truncate it again, no?

    Point taken 😀

    Heh, yup, that was part of the reason I made sure to mention this went to a staging structure which was the concern. Koen just needs to have more morning coffee. 🙂

    Thanks for the comments guys, keep 'em coming. I'm human, I could have made a mistake in this somewhere. :hehe:


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • Evil Kraig F (3/5/2014)


    Koen just needs to have more morning coffee. 🙂

    ... as always :rolleyes:

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Stewart "Arturius" Campbell (3/5/2014)


    Alternatively, change the recovery model to bulk logged just before the load and back on completion thereof.

    however, this will require a full backup before and after.

    Bulk logged reduces log space usage by using minimal logging for most bulk operations.

    see Prerequisites for Minimal Logging in Bulk Import for more information in this regard

    good question, thanks Craig

    I'm no SSIS expert, so can we assume this insert will be in the form of a bulk import and not just a T-SQL insert statement?

    ---------------------------------------------------------------------

  • and 3 points worth - nice. 🙂

    ---------------------------------------------------------------------

  • george sibbald (3/5/2014)


    Stewart "Arturius" Campbell (3/5/2014)


    Alternatively, change the recovery model to bulk logged just before the load and back on completion thereof.

    however, this will require a full backup before and after.

    Bulk logged reduces log space usage by using minimal logging for most bulk operations.

    see Prerequisites for Minimal Logging in Bulk Import for more information in this regard

    good question, thanks Craig

    I'm no SSIS expert, so can we assume this insert will be in the form of a bulk import and not just a T-SQL insert statement?

    Yes, if you meet all the requirements in the linked article. Otherwise, it is a logged insert just like a T-SQL insert.
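
    As a quick sanity check on a couple of those prerequisites from T-SQL, here is a hedged sketch; the database and table names are placeholders.

    ```sql
    -- Is the database in a recovery model that allows minimal logging?
    SELECT name, recovery_model_desc
    FROM sys.databases
    WHERE name = N'StagingDB';          -- want SIMPLE or BULK_LOGGED

    -- Is the destination a heap or an empty table? Existing indexes and rows
    -- change which operations can be minimally logged.
    SELECT i.type_desc, p.rows
    FROM sys.indexes AS i
    JOIN sys.partitions AS p
      ON p.object_id = i.object_id AND p.index_id = i.index_id
    WHERE i.object_id = OBJECT_ID(N'dbo.StagingTable');
    ```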

  • Yes, but the question says insert rather than import, so does SSIS run the insert as a bulk import by default, or is that a setting you can choose? ... just trying to expand my depth of knowledge here 🙂

    ---------------------------------------------------------------------

  • george sibbald (3/5/2014)


    Yes, but the question says insert rather than import, so does SSIS run the insert as a bulk import by default, or is that a setting you can choose? ... just trying to expand my depth of knowledge here 🙂

    It's an option, called Fast Load in the interface.
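
    For a rough sense of what that setting does under the covers, here is a hedged T-SQL analogy; fast load is not literally a BULK INSERT, but the Maximum Insert Commit Size behaves much like BATCHSIZE, committing the load in chunks so log space can be reused between log backups. The file path and table name are hypothetical.

    ```sql
    BULK INSERT dbo.StagingTable
    FROM 'X:\Feeds\big_extract.dat'
    WITH (
        BATCHSIZE = 100000,   -- each 100k-row batch commits on its own
        TABLOCK               -- table lock, one of the minimal-logging prerequisites
    );
    ```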


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • Nice question.

    Thanks, Craig.

  • Stewart "Arturius" Campbell (3/5/2014)


    Alternatively, change the recovery model to bulk logged just before the load and back on completion thereof.

    however, this will require a full backup before and after.

    Bulk logged reduces log space usage by using minimal logging for most bulk operations.

    see Prerequisites for Minimal Logging in Bulk Import for more information in this regard

    good question, thanks Craig

    Bulk and Simple use similar logging techniques for minimally logged transactions, so either would work for that purpose. In my experience, while this will help, it opens a significant window of point-in-time recovery exposure (hours, remember) that would be unacceptable for most systems. Edit: I should have also mentioned that in my explanation below, but that was starting to get REALLY long. You're right, though, it would HELP. I'm not sure it would fix it, however; that depends on where the bulk-logged operation was in the log when backups tried to run. /EDIT

    Bulk-logged and simple recovery are not the same: if you put the db in simple mode and any transaction happens, the LSN chain breaks and you need a full db backup immediately or else your t-log backups fail. However, in bulk-logged recovery the LSN chain doesn't break; the only issue is with point-in-time recovery.
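
    To put that difference in rough T-SQL terms, here is a hedged sketch; the database name and backup paths are placeholders.

    ```sql
    -- A trip through SIMPLE breaks the log chain: a full (or differential)
    -- backup is needed before log backups succeed again.
    ALTER DATABASE StagingDB SET RECOVERY SIMPLE;
    -- ... load runs ...
    ALTER DATABASE StagingDB SET RECOVERY FULL;
    BACKUP DATABASE StagingDB TO DISK = N'X:\Backups\StagingDB_reseed.bak';  -- re-seeds the chain
    BACKUP LOG StagingDB TO DISK = N'X:\Backups\StagingDB.trn';              -- now succeeds

    -- BULK_LOGGED keeps the chain intact; log backups keep working, but you
    -- cannot restore to a point in time inside a log backup that contains
    -- minimally logged operations.
    ALTER DATABASE StagingDB SET RECOVERY BULK_LOGGED;
    -- ... load runs ...
    ALTER DATABASE StagingDB SET RECOVERY FULL;
    BACKUP LOG StagingDB TO DISK = N'X:\Backups\StagingDB.trn';
    ```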

    ---------------------------------------------------
    "Thare are only 10 types of people in the world:
    Those who understand binary, and those who don't."

  • free_mascot (3/5/2014)


    Bulk and Simple use similar logging techniques for minimally logged transactions, so either would work for that purpose. In my experience, while this will help, it opens a significant window of point-in-time recovery exposure (hours, remember) that would be unacceptable for most systems. Edit: I should have also mentioned that in my explanation below, but that was starting to get REALLY long. You're right, though, it would HELP. I'm not sure it would fix it, however; that depends on where the bulk-logged operation was in the log when backups tried to run. /EDIT

    Bulk-logged and simple recovery are not the same: if you put the db in simple mode and any transaction happens, the LSN chain breaks and you need a full db backup immediately or else your t-log backups fail. However, in bulk-logged recovery the LSN chain doesn't break; the only issue is with point-in-time recovery.

    You're acting like I said that the log chain wasn't going to be broken in simple mode. I said that the exposure time for point in time recovery was too high for most locations. That they use similar bulk logging techniques is a different piece of what I said.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • Finally, I can say ... easy peasy!

  • Koen Verbeeck (3/5/2014)


    Interesting question, but I do not agree entirely with the answer.

    Setting the commit size to a lower value will commit batches before the entire load is done. What if something fails and I want to roll everything back?

    Splitting the load up into smaller chunks, in manageable transactions, seems to make more sense.

    That was my reaction too on discovering that splitting the stream into small chunks was not the "correct" answer. When I split the thing, I know where it got to before any failure; when I use a magic number under the covers, I have some working out to do to find out how to recover. I believe the latter is more complicated than the former, so the "correct" answer is not by any means the simplest way to achieve the objective, except in that extremely rare case (for most of us it never happens) where coping with failure is not required for some reason.

    Tom

  • Nevyn (3/5/2014)


    Koen Verbeeck (3/5/2014)


    Interesting question, but I do not agree entirely with the answer.

    Setting the commit size to a lower value will commit batches before the entire load is done. What if something fails and I want to roll everything back?

    Splitting the load up into smaller chunks, in manageable transactions, seems to make more sense.

    Well, the destination in this case IS a staging table. If you want to roll everything back, you can just truncate it again, no?

    Yes, but that's potentially a big performance hit if you have to truncate the table and then insert several hundred million rows that had already been inserted correctly, instead of just the chunk that failed and all following chunks.

    Tom

  • TomThomson (3/11/2014)


    Koen Verbeeck (3/5/2014)


    Interesting question, but I do not agree entirely with the answer.

    Setting the commit size to a lower value will commit batches before the entire load is done. What if something fails and I want to roll everything back?

    Splitting the load up into smaller chunks, in manageable transactions, seems to make more sense.

    That was my reaction too on discovering that splitting the stream into small chunks was not the "correct" answer. When I split the thing, I know where it got to before any failure; when I use a magic number under the covers, I have some working out to do to find out how to recover. I believe the latter is more complicated than the former, so the "correct" answer is not by any means the simplest way to achieve the objective, except in that extremely rare case (for most of us it never happens) where coping with failure is not required for some reason.

    Okay, after Tom's explanation here, I realize the problem. I didn't word the incorrect answer as well as I could have.

    What I had in my head was: Use conditional splits to split the stream into multiple streams to deliver via multiple OleDB targets as separate transactions.

    There's the confusion, Koen, that you mentioned a while back. 🙂

    Edit:

    In my experience, however, when a load fails you reload it from scratch, even something this large, the reason being you can't be sure what's missing and what's not. Take the multi-stream example: how do you know which chunks succeeded and which didn't? Do you rely on the order in the origination file? If it's a query, you can't rely on order at all, so how do you control which chunks you will get? Do you turn off certain streams, redeploy the package, and rerun for the missing pieces after hours of painstaking double-checks?

    This is not something that would have been done in most places. If the error repeated a few times, a new design would be put into play, sure. But for a one-off to a staging table? No, you'd rerun the process after fixing the bad data row (or making sure the network wasn't going to burp again, the usual culprit in my life) and keep on trucking.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • Evil Kraig F (3/11/2014)


    In my experience, however, when a load fails you reload it from scratch, even something this large, the reason being you can't be sure what's missing and what's not.

    Maybe a slight bit off topic, but I like to design for retry and restartability within a DATA FLOW TASK as opposed to a control flow task. The SSIS Operational and Tuning Guide has a nice example, but the caveat is that you need a solid data extract query (and range metadata if you decide to split this into multiple streams). In some cases it is not easy to do. A rough sketch of such an extract query follows the link below.

    Read pages 16 onward......

    http://msdn.microsoft.com/en-us/library/jj873729.aspx
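
    As an illustration of the kind of restartable extract query that approach relies on, here is a hedged sketch; the control table, key column, and all object names are hypothetical.

    ```sql
    -- The package records the highest key it has committed; a retry (or the
    -- next run) resumes from there instead of reloading everything.
    DECLARE @LastLoadedKey bigint;

    SELECT @LastLoadedKey = ISNULL(MAX(LastLoadedKey), 0)
    FROM dbo.LoadControl
    WHERE LoadName = N'BigStagingLoad';

    SELECT s.RowKey, s.Col1, s.Col2
    FROM dbo.SourceTable AS s
    WHERE s.RowKey > @LastLoadedKey
    ORDER BY s.RowKey;   -- ordered so committed batches form a contiguous range
    ```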
