Data Loss or Downtime

  • Comments posted to this topic are about the item Data Loss or Downtime

  • Being slightly pedantic I think the question:

    So what is more important to you: downtime or data loss?

    is wrong and not a reflection of the spirit of the article; more correctly, it should be along the lines of:

    So what is more important to you: downtime or partial/non-current data?

    But, in answer to the spirit of the article, the answer is, as always, it depends :). I can think of several systems I have written/support that could function on a partial, non-current data set, and others that are completely dependent upon results from within the preceding tens of seconds. I think the defining requirement is to understand how the customer works with a system and the impact of every scenario, so that some form of SLA is in place which allows the most efficient and cost-effective reinstatement of a fully working system.

  • Hi,

    It depends on the nature of the system. I work in a financial institution with a lot of trading; in a downtime vs. partial data loss situation, the business has to weigh the cost of being down until the data is recovered, the risk of reputational loss, and the increased revenue loss against the smaller reputational risk and lower revenue loss that come with running on partial data.

    In most cases (in this scenario) it would be better to accept a partial data loss until it can be recovered, and get the business up and running to mitigate the additional reputational and financial loss.

  • It depends. Among other things it depends on what data is going to be missing.

    Thinking back to the bank, there were some tables that we could do without during business hours but that were critical for the overnight processes. There were other tables that we could do without for 3 weeks, but they had to be there (and complete) during the last week of the month. There were still other tables where incomplete information was worse than the system being completely offline.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hello!

    To me, the strategy that I have followed is:

    1. Get the database online as soon as possible - non-current data allowed

    2. Attempt to restore the most recent data first - most OLTP systems need transactions from within the last month 80% of the time

    3. Attempt to restore the history data - the remaining data, which is used only 20% of the time
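
    In SQL Server terms, steps 1-3 map quite naturally onto a piecemeal restore. This is a rough sketch only - the filegroup names, backup paths and the Enterprise Edition online-restore capability are all assumptions on my part:

    -- Rough sketch only: filegroup names, backup paths and online piecemeal
    -- restore support (Enterprise Edition) are assumptions.
    -- Steps 1 and 2: bring PRIMARY and the current-data filegroup online first.
    RESTORE DATABASE Sales
        FILEGROUP = 'PRIMARY', FILEGROUP = 'Current'
        FROM DISK = N'D:\Backups\Sales_full.bak'
        WITH PARTIAL, NORECOVERY;

    RESTORE LOG Sales
        FROM DISK = N'D:\Backups\Sales_log.trn'
        WITH RECOVERY;
    -- Users can now work against current data; the History filegroup is still offline.

    -- Step 3: restore the history data later, while the database stays in use.
    RESTORE DATABASE Sales
        FILEGROUP = 'History'
        FROM DISK = N'D:\Backups\Sales_full.bak'
        WITH NORECOVERY;

    RESTORE LOG Sales
        FROM DISK = N'D:\Backups\Sales_log.trn'
        WITH RECOVERY;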

    Going off on a side track here, but I find the above strategy useful in data cleanup projects as well.

    Have a great day!

    Thanks & Regards,
    Nakul Vachhrajani.
    http://nakulvachhrajani.com

    Follow me on
    Twitter: @sqltwins

  • Fortunately, our OLTP database fits on one tape; we also restore to a backup database for immediate rollover in case of failure.

    However, we were recently hacked, and although our DB was untouched, some system files were corrupted, so we could not load the DB.

    We dump all of our major files to CSV files overnight from our OLTP DB, and these were originally used as a pseudo data warehouse until we got our current full-blown MS SQL version. Because these files were available, we knew our customers' addresses, we knew what stock we had, we knew where the stock was, and we knew all the prices.
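
    For what it's worth, the nightly dump itself can be as simple as a scheduled bcp export; this is just a sketch - the table names, paths and the use of xp_cmdshell (which has to be enabled first) are all assumptions, not necessarily how we actually do it:

    -- Sketch of a nightly export step, e.g. run from a SQL Agent job.
    -- Assumes xp_cmdshell is enabled and Windows authentication;
    -- table names, file paths and the server name are hypothetical.
    EXEC master.dbo.xp_cmdshell
        'bcp Sales.dbo.Customers out D:\Exports\customers.csv -c -t, -T -S MYSERVER';
    EXEC master.dbo.xp_cmdshell
        'bcp Sales.dbo.Stock out D:\Exports\stock.csv -c -t, -T -S MYSERVER';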

    Now, it wasn't as smooth and efficient as normal trading, and the system says we didn't process anything that day, but we got out all the orders that were due out. So the impact on the business was minimal, and once the system files were restored, all was well.

    So how well you prepare for downtime is as important as getting the data back.

  • It Depends...

    For example, I used to work on a product that displays prices, and people make their purchases from there.

    So if I bring things online without taking the downtime, customers may see bad prices, which can create a bad impression of my site just because of the bad data. So it's better that I take the downtime, recover all of my data, and then bring things back to normal again.

    Thanks

    Vineet

  • Obviously, I want my cake and I want to eat it too (and I'll have as much of yours as I can)... or at least that's how most businesses would approach it.

    When I'm setting up DR for various systems, I ask the business: how much data loss are you prepared to handle? My assumption is, if stuff goes south, you're going to lose data. So then, it's a question of minimizing downtime. Yeah, I try to get after the data too, but if an app is mission-critical, the question quickly comes up: everyone offline, or one user with incomplete data? I know how most businesses are going to answer that one.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Nakul Vachhrajani (1/7/2011)


    Hello!

    To me, the strategy that I have followed is:

    1. Get the database online as soon as possible - non-current data allowed

    2. Attempt to restore the most recent data first - most OLTP systems need transactions from within the last month 80% of the time

    3. Attempt to restore the history data - the remaining data, which is used only 20% of the time

    Going off on a side track here, but I find the above strategy useful in data cleanup projects as well.

    Have a great day!

    I agree with Nakul here. In most cases I'll go out on a limb and say uptime is more important than retrieving your existing data. I say that because downtime puts future revenue at risk (assuming your system is revenue-generating). At least if the system is up, even in a hobbled state and empty, future transactions can process.

    Also, data loss is defined by your backup strategy. If you take tlog backups every 30 minutes, then that's your potential data loss: up to 30 minutes' worth of data. Everything else is just offline until it can be restored, assuming you have good backups. There's a big difference between the two. I consider data lost only if it's truly lost, meaning there's no way to recover it. Everything else is just temporarily offline.
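
    As a rough sketch of what that looks like in practice (the database name, paths and FULL recovery model are assumptions), the tail of the log can usually still be backed up even after a failure, which shrinks the exposure further:

    -- Sketch: database name and paths are made up; assumes FULL recovery model.
    -- Back up the tail of the log first, so at most the in-flight work is lost.
    BACKUP LOG Sales
        TO DISK = N'D:\Backups\Sales_tail.trn'
        WITH NO_TRUNCATE, NORECOVERY;   -- NO_TRUNCATE works even if the data files are damaged

    -- Then restore the chain: the full backup, the scheduled log backups, and the tail.
    RESTORE DATABASE Sales FROM DISK = N'D:\Backups\Sales_full.bak' WITH NORECOVERY;
    RESTORE LOG Sales FROM DISK = N'D:\Backups\Sales_log_0930.trn' WITH NORECOVERY;
    RESTORE LOG Sales FROM DISK = N'D:\Backups\Sales_tail.trn' WITH RECOVERY;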

  • GilaMonster (1/7/2011)


    It depends. Among other things it depends on what data is going to be missing.

    I think that this is one of the most important points made so far. Also, how easy is the missing data to rebuild? A bad lookup table is one thing, but a table that holds one's personal account records and balance is quite a different animal. 😀

    "Technology is a weird thing. It brings you great gifts with one hand, and it stabs you in the back with the other. ...:-D"

  • For us, any VLDB that is critical has some kind of high-availability option, like mirroring or active-passive clustering, to help stave off outages. However, no solution is perfect.

    What we have started doing is warehousing data that is necessary to keep for processing, but not needed in the "day-to-day" shuffle of data.

    We can combine data from the OLTP and warehouse databases for reporting when needed, and we have processes that pull data out of the warehouse for day-to-day operations when needed.
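
    As a rough sketch of the kind of move involved (all of the database, table and column names here are invented, and both databases are assumed to live on the same instance):

    -- Sketch of the day-to-day vs. warehouse split described above.
    -- Database, table and column names are made up; the 6-month cutoff is arbitrary.
    DECLARE @cutoff datetime = DATEADD(MONTH, -6, GETDATE());

    BEGIN TRANSACTION;

    -- Copy closed-out rows to the warehouse...
    INSERT INTO WarehouseDB.dbo.OrdersArchive (OrderID, CustomerID, OrderDate, Amount)
    SELECT OrderID, CustomerID, OrderDate, Amount
    FROM   OltpDB.dbo.Orders
    WHERE  OrderDate < @cutoff;

    -- ...then remove them from the OLTP database.
    DELETE FROM OltpDB.dbo.Orders
    WHERE  OrderDate < @cutoff;

    COMMIT TRANSACTION;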

    One aside: how does one recover a database when it is too damaged to use? How can you ensure that all of the information your application requires will be intact and have integrity?

    Regards, Irish 

  • People working is the first priority. Downtime in manufacturing is expensive, and it only takes one call from a VP or CEO to understand that ensuring that people continue working is more important than your job. 😉

    I would tend to agree with Nakul. Basic data to continue functioning. As much information for the past x days as possible, as quickly as possible. Continue working to restore historical data more than x days old as opportunity presents itself. In my case, x would probably be 14 days rather than 30, even though some reports would be amiss.

  • SQLArcher (1/7/2011)


    Hi,

    It depends on the nature of the system. I work in a financial institution with a lot of trading; in a downtime vs. partial data loss situation, the business has to weigh the cost of being down until the data is recovered, the risk of reputational loss, and the increased revenue loss against the smaller reputational risk and lower revenue loss that come with running on partial data.

    In most cases (in this scenario) it would be better to accept a partial data loss until it can be recovered, and get the business up and running to mitigate the additional reputational and financial loss.

    I also work in financial services, and I would disagree with this assessment. Having some customers log in and find missing transactions would be worse than keeping everyone offline until all the data is restored - imagine the number of support calls from customers with missing transaction data.

    Also, trying to merge the missing data with a database that has had numerous changes since returning to an online state could be problematic, and could result in duplicate key issues.
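
    As a rough illustration of that merge problem, any reload of recovered rows usually has to skip whatever was re-entered while the system was back online; a minimal sketch with invented table and column names:

    -- Sketch: reload recovered rows into the live table, skipping anything that
    -- was re-keyed while the system was back online.
    -- Table and column names are hypothetical.
    INSERT INTO dbo.Transactions (TransactionID, AccountID, Amount, PostedAt)
    SELECT r.TransactionID, r.AccountID, r.Amount, r.PostedAt
    FROM   dbo.Transactions_Recovered AS r
    WHERE  NOT EXISTS (SELECT 1
                       FROM dbo.Transactions AS t
                       WHERE t.TransactionID = r.TransactionID);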

    Unless it was static historical data that is missing, staying offline until all is recovered would be better.

  • I realized that most of the time, getting the site back up, with lookup and other types of ancillary data (like products, prices, etc.), was the most important thing. Recovering other data, such as older orders, was secondary.

    Historical data should be on separate systems and is far less critical than current transactional data, such as the prices and products that feed the sales systems. Therefore, production transaction systems need to be very highly available, and administrators should be focused on bringing them back online immediately, WITH good data, if such a system goes down. Historical data is what management uses to help analyze where to go; current data is the lifeblood of an organization, and it needs to be running all the time. Management can afford to move more slowly than the front lines of an organization.

    With this in mind, the question isn't downtime vs. data loss; all systems need to be up and running. The question is which data/systems should have priority in going back online.

    If it's affordable, having redundant systems and sites eliminates this question; replicating data to warm sites makes the issue moot.

  • Another great question, Steve!

    I have to agree with those who've already responded with "It depends." That's not a cop-out; it's a valid question, and each business needs to evaluate that question and regularly review the decisions.

    :{>

    Andy Leonard, Chief Data Engineer, Enterprise Data & Analytics

