Data Loss or Downtime

  • ZZartin (7/10/2015)


    Xavon (7/10/2015)


    It depends. We effectively have two systems: one for managing the business and the other for reporting. For the management system, downtime is more important; for the reporting system, data loss is more important.

    Unless we're switching the meanings of downtime and data loss, I would disagree with that. For your transaction system, avoiding data loss is absolutely critical, even if that means more downtime. Do you think a customer is going to be happy if the $X order they placed and were billed for is lost because you decided that recovering quickly was more important than preserving data?

    And conversely, as long as the source data is intact, the data warehouse can always be rebuilt. Even if that means more downtime, people can wait a while for their reports.

    Our management system is not like that, however.

    And it sounds like you are saying data loss is more important in either case.

  • I assume that any partial restore scenario intended to get the database back online as soon as possible would also involve a follow-up attempt to re-insert previously lost transactional records. For example, access to purchase history is not critically important for customers to place new orders. Accounting will want that purchase history back as soon as possible, but accountants are also practical enough to appreciate any effort by the DBA to get the order entry system back online, so new orders can be taken, even if accounting has to wait a few days for full recovery of purchase history.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • I have a 350GB database in my SQL Server-based telephone system. If that system goes down, we're toast, because it's a critical part of our business to "answer the phone" to take care of our customers' customers.

    Going back to the article: so far as I'm concerned, it's ALWAYS critical to get a system back online with ALL the critical data.

    So, does that mean that I have to wait for a 350GB database to restore here? Not with an RTO of 15 minutes! My personally imposed RTO for that database is 10 minutes.

    So, how do you pull off the "Insert Miracle Here" block in the recovery plan? It's easy, and all it takes is avoiding a mistake that a whole lot of people make: putting all their junk in a single filegroup. That should never be the case for "large" databases (and 350GB isn't really that large, but it's the biggest one I have).

    At the very least, non-critical tables (usually WORM tables, like audit tables and invoice detail tables where every row except, say, the last 30 days' worth is closed) should be in a filegroup other than PRIMARY. In my 350GB telephone database, all but about 4GB is NOT critical to "get back in business". I need to get the system back up and running with only, say, the last 30 days of transactional data available so that people on the floor can get back to work. The other 346GB can be brought back online in reverse temporal order almost at my leisure, because it's just not used that often. It's audit data that we have to keep forever, and most everyone has audit and transactional detail tables that fit that very description.
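    A minimal sketch of that setup (the database name, filegroup name, table, sizes, and paths below are all made up for illustration):

        -- Add a filegroup for the non-critical WORM/audit data,
        -- plus a file to hold it (hypothetical names and paths).
        ALTER DATABASE MyPhoneDB ADD FILEGROUP Audit_FG;

        ALTER DATABASE MyPhoneDB
            ADD FILE ( NAME       = N'MyPhoneDB_Audit'
                     , FILENAME   = N'D:\SQLData\MyPhoneDB_Audit.ndf'
                     , SIZE       = 10GB
                     , FILEGROWTH = 1GB )
            TO FILEGROUP Audit_FG;

        -- Non-critical tables are created ON that filegroup, so a
        -- PRIMARY-only restore never has to wait for them.
        CREATE TABLE dbo.CallAudit
        ( CallAuditID BIGINT IDENTITY(1,1) NOT NULL
        , CallDate    DATETIME2(0)  NOT NULL
        , Detail      VARCHAR(8000) NULL
        , CONSTRAINT PK_CallAudit PRIMARY KEY CLUSTERED (CallAuditID)
        ) ON Audit_FG;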

    I took the largest table (which necessarily contains compressed recordings of all calls) and partitioned it by month. I've set the filegroups for the old months to read-only, so I no longer have to back them up or even consider them in index maintenance. They're permanently cast in stone. That means my recovery plan is to restore the PRIMARY filegroup and the current month of the (big) table, and declare that we're "back in business". Then I can continue to do "piecemeal" restores at my leisure until all the data has been restored.
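    In rough T-SQL terms (again, every name and path here is hypothetical), the monthly freeze and the recovery sequence look something like this:

        -- Once a month closes, freeze its filegroup. Read-only
        -- filegroups need no further backups or index maintenance.
        ALTER DATABASE MyPhoneDB MODIFY FILEGROUP Calls_2015_06 READ_ONLY;

        -- Piecemeal restore: bring PRIMARY and the current month
        -- online first...
        RESTORE DATABASE MyPhoneDB
            FILEGROUP = 'PRIMARY', FILEGROUP = 'Calls_2015_07'
            FROM DISK = N'E:\SQLBackup\MyPhoneDB_Partial.bak'
            WITH PARTIAL, NORECOVERY;
        RESTORE LOG MyPhoneDB
            FROM DISK = N'E:\SQLBackup\MyPhoneDB_Log.trn'
            WITH RECOVERY;  -- the database is now online: "back in business"

        -- ...then restore the older, read-only months at leisure.
        RESTORE DATABASE MyPhoneDB
            FILEGROUP = 'Calls_2015_06'
            FROM DISK = N'E:\SQLBackup\MyPhoneDB_2015_06.bak'
            WITH RECOVERY;

    One caveat: restoring the remaining filegroups while the database stays online (an online restore) is an Enterprise Edition feature; on other editions, the later restores take the database offline while they run.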

    As for the folks who "accept some data loss" for DR, it's not really acceptable. Certainly, losing more than about 15 minutes of data (our RPO on the system) isn't acceptable to me, and my personal RPO is the nearest second just before a crash. I don't know why anyone would tolerate a larger loss than that.
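    That seconds-level RPO rests entirely on the transaction log: frequent routine log backups, plus a tail-log backup after a failure. A minimal sketch, again with made-up names and paths:

        -- Routine log backups every few minutes bound the scheduled RPO.
        BACKUP LOG MyPhoneDB TO DISK = N'E:\SQLBackup\MyPhoneDB_Log.trn';

        -- After a crash, if the log file is still readable, back up the
        -- tail of the log so the restore can roll forward to the last
        -- committed transaction, i.e., to the second just before the crash.
        BACKUP LOG MyPhoneDB TO DISK = N'E:\SQLBackup\MyPhoneDB_Tail.trn'
            WITH NO_TRUNCATE, NORECOVERY;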

    Heh... seriously.... if you've been storing data and data loss is ever acceptable to you or the company you work for, then do yourself a favor... just delete the data now so you don't have to maintain it or worry about it. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I strongly suspect that in the event of a DR scenario the business would find out that certain non-critical systems were actually far more important than they realised.

    The question I would ask the community is this: do you have a prioritised action plan for recovering systems and their data?

  • David.Poole (7/12/2015)


    I strongly suspect that in the event of a DR scenario the business would find out that certain non-critical systems were actually far more important than they realised.

    The question I would ask the community is this: do you have a prioritised action plan for recovering systems and their data?

    Agreed. It's just like being a DBA... no one will miss what it does until it stops doing it. 🙂

    --Jeff Moden



  • As everyone is saying, it depends on which data is missing.

  • I know this was a re-run of an old editorial, but...

    How should people be using data? In general:

    1) Big important strategic decisions should be made using long-term data that is carefully analyzed. Although such decisions can be time-sensitive, the need to make them is (or should be) known in advance. If senior leaders are looking ahead as they should be, there should (usually) be enough lead time to say, "Get the database back up for operational needs first."

    2) The kinds of decisions that require real-time information are mostly the day-to-day operational ones. If people rely on the database to make such decisions, downtime is the killer; incomplete data is no big deal, as long as the immediately essential data is available and accurate.

    3) Assuming we have good historical data, the laws of statistics tell us that a relatively small number of recent data points shouldn't change "the big picture" significantly - and if they do, then yesterday was probably just an outlier and we shouldn't conclude much from it.
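    To put a number on point 3 (a back-of-the-envelope sketch, not from the editorial): folding one new observation into a running mean gives

        $$\bar{x}_{n+1} = \bar{x}_n + \frac{x_{n+1} - \bar{x}_n}{n + 1}$$

    so with a year of daily history ($n = 365$), even a wildly atypical new day moves the average by less than 1/366 of its deviation from the old mean. Losing, or later back-filling, a handful of recent data points barely touches long-term aggregates.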

    Henry Ford reputedly said that if he'd asked his customers what they wanted, they'd have said a faster horse. And unfortunately, in real life, leaders aren't always looking ahead as they should and they don't necessarily understand significance or good decision-making. So in a sense, it becomes a question of whether we're going to try to understand our customers' needs enough to actually help them, or if we're going to be co-dependent "enablers."

  • kelly.bailey (7/13/2015)


    I know this was a re-run of an old editorial, but...

    How should people be using data? In general:

    1) Big important strategic decisions should be made using long-term data that is carefully analyzed. Although such decisions can be time-sensitive, the need to make them is (or should be) known in advance. If senior leaders are looking ahead as they should be, there should (usually) be enough lead time to say, "Get the database back up for operational needs first."

    2) The kinds of decisions that require real-time information are mostly the day-to-day operational ones. If people rely on the database to make such decisions, downtime is the killer; incomplete data is no big deal, as long as the immediately essential data is available and accurate.

    3) Assuming we have good historical data, the laws of statistics tell us that a relatively small number of recent data points shouldn't change "the big picture" significantly - and if they do, then yesterday was probably just an outlier and we shouldn't conclude much from it.

    Henry Ford reputedly said that if he'd asked his customers what they wanted, they'd have said a faster horse. And unfortunately, in real life, leaders aren't always looking ahead as they should and they don't necessarily understand significance or good decision-making. So in a sense, it becomes a question of whether we're going to try to understand our customers' needs enough to actually help them, or if we're going to be co-dependent "enablers."

    Heh... again, no one misses data until it's not available. And, statistically speaking, if someone loses data about my paycheck, taxes, hours worked, stocks, mortgage payments, etc., etc., then there will be a very high probability that I'll pull the Dunkin' Donuts move on them... "See you in court, sonny".

    Mind the pennies and the dollars will take care of themselves. 🙂

    --Jeff Moden



  • The answer to the question should already be defined for every system. It should be in the DR manual, including when data loss is acceptable (possibly with a justification).

    Jeff's stated opinion, that if you can afford to lose data then you can afford to lose the database permanently (my phrasing), does suggest scenarios where that is true. But there are databases where some data loss would be totally acceptable, e.g. the record of who is currently on the premises in a database backing an electronic entry system at a site that isn't heavily secured; there, the loss is acceptable.

    Gaz

    -- Stop your grinnin' and drop your linen...they're everywhere!!!
