Probabilities and Disaster Recovery

  • Comments posted to this topic are about the item Probabilities and Disaster Recovery

  • I am certain you are not saying to ignore large-scale disaster possibilities. There are a lot more than shown on the graph you included, and some of them will differ for each location. A water leak is probably a higher risk in a brand new building (been there - seen that) or in an older one than it would be in a building that is 5-30 years old. A company located next to a major railway has a significant risk of chemical leaks. A company next to a river has a higher flooding risk. Obviously there needs to be a review of the risks to find out how likely they are.

    The small things, though, are in my opinion much more difficult. They are why developers are (or should be) highly compensated. A developer must consider a whole lot of stupid human tricks to secure an application against stupidity, and then recognize that the end users are likely to find a million more ways to act like an idiot. It is virtually impossible to prevent all of these issues. You can't prevent everything, because users frequently need to perform updates, even on a large scale, and there are only so many "You are about to delete 10,000 records from the database. Are you sure you aren't being a moron?" buttons you can display.

    So while I agree these need to be considered, when you add up the cost of preventing them, most companies decide not to spend the time, as the payoff is far lower than that of preventing even some of the largest disasters. For example, if fixing or preventing each possible minor stupid human trick costs $500, it doesn't take long before all of them together cost millions. So you prioritize these just as you do the large ones. Well, hopefully you do.

    Overall the concept you appear to suggest - focusing on a large number of possible issues and not just the large disasters - is a good one, and one I hope most companies are following already. But just as we can be blinded to the small issues while focusing on the large ones, the reverse can also be true. It comes down to reviewing all, or as many as possible, of the risks, then deciding what it will cost to prevent them and whether it is worth the investment. Smaller risks may be less costly individually, but the volume of them can end up costing more than a new data center in certain situations.

    This appears to be a fairly unique topic for a SQL Server audience, and I appreciate it, as I anticipate a lot of discussion - discussion that should help point out holes in my thinking.

    Dave

  • The most common disaster that I seem to hear about is the "whoops" disaster: human error, a situation where someone makes a mistake in data entry. The most common "whoops" for DBAs seems to be the UPDATE or DELETE without a WHERE clause, but these days there's also no shortage of issues caused by applications that allow users to manipulate large batches of information.

    Of course there is no good reason for a DBA to be running ad-hoc UPDATE or DELETE operations. However, one form of disaster recovery that can be implemented in data change scripts (or even in applications where the user supplies input parameters for data change operations) is an assertion that rolls back the transaction if the expected maximum number of records modified is exceeded. This is automatic disaster recovery at the transaction level, which is a lot better than manual data recovery on an entire database after the fact.

    declare @product_category int = 7;

    -- Oops! This update has a bug in the WHERE clause...
    update product
    set price = price * 1.10
    where product_category = product_category;

    -- Fortunately this assertion has your back covered:
    if @@rowcount > 100
        raiserror ('Assertion 4: The number of rows updated was %d, but the maximum number expected is 100. This transaction will be rolled back.', 16, 1, @@rowcount);

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell (9/26/2011)


    declare @product_category int = 7;

    -- Oops! This update has a bug in the WHERE clause...
    update product
    set price = price * 1.10
    where product_category = product_category;

    -- Fortunately this assertion has your back covered:
    if @@rowcount > 100
        raiserror ('Assertion 4: The number of rows updated was %d, but the maximum number expected is 100. This transaction will be rolled back.', 16, 1, @@rowcount);

    That alone won't do anything other than raise an error that says something untrue. It's not rolling back the transaction. That transaction committed.

    declare @product_category int = 7;

    begin transaction

    -- Oops! This update has a bug in the WHERE clause...
    update product
    set price = price * 1.10
    where product_category = product_category;

    -- Fortunately this assertion has your back covered:
    if @@rowcount > 100
    begin
        raiserror ('Assertion 4: The number of rows updated was %d, but the maximum number expected is 100. This transaction will be rolled back.', 16, 1, @@rowcount);
        rollback transaction
    end
    else
        commit transaction

    Probably not sufficient; it needs to check for existing transactions, possibly create a savepoint, etc., etc.
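
    As a rough illustration only (same example table, same 100-row limit - not a production template), the savepoint approach could look something like this:

    -- Illustrative sketch: respect any outer transaction by using a savepoint,
    -- so the assertion undoes just this change, not the caller's work.
    declare @had_outer_tran bit = case when @@trancount > 0 then 1 else 0 end;
    declare @rows int;

    if @had_outer_tran = 1
        save transaction price_update;   -- savepoint inside the existing transaction
    else
        begin transaction;

    update product
    set price = price * 1.10
    where product_category = 7;

    set @rows = @@rowcount;

    if @rows > 100
    begin
        raiserror ('%d rows updated, but the maximum expected is 100; undoing this change.', 16, 1, @rows);
        if @had_outer_tran = 1
            rollback transaction price_update;   -- back to the savepoint only
        else
            rollback transaction;
    end
    else if @had_outer_tran = 0
        commit transaction;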

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

  • GilaMonster (9/26/2011)


    Eric M Russell (9/26/2011)


    ...

    That alone won't do anything other than raise an error that says something untrue. It's not rolling back the transaction. That transaction committed.

    ...

    Probably not sufficient, it needs to check for existing transactions, possibly create a savepoint, etc, etc.

    Yes, I know. The template I use for my actual data change script uses try/catch error handling and implements the rollback logic there.
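
    Stripped down to the essentials, that kind of pattern looks something like this (a simplified sketch, not the full template):

    declare @product_category int = 7;
    declare @rows int;

    begin try
        begin transaction;

        update product
        set price = price * 1.10
        where product_category = @product_category;

        set @rows = @@rowcount;

        -- Assertion: fail the batch if more rows were touched than expected.
        if @rows > 100
            raiserror ('%d rows updated, but the maximum expected is 100.', 16, 1, @rows);

        commit transaction;
    end try
    begin catch
        if @@trancount > 0
            rollback transaction;

        -- Re-raise so the caller sees the failure.
        declare @msg nvarchar(2048) = error_message();
        raiserror (@msg, 16, 1);
    end catch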

    The following isn't possible now, but I think it would be handy if we could specify explicit row count assertions using a query hint like the ones below, allowing SQL Server to roll back (or even short-circuit, using cost estimates) a runaway operation and return an error instead.

    -- If this query is going to delete more than 100 records,
    -- then something is wrong, so don't allow it to complete:
    delete from Product
    where product_category = 7
    option (assert_maxrows 100);

    -- If this query is going to return more than 10,000 rows,
    -- then I don't even want the result:
    select a, b, c
    from x ...
    join y ...
    join z ...
    option (assert_maxrows 10000);
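
    Until something like that exists, about the closest I can get is to count what the predicate would touch before running the statement (again just a sketch, and not atomic unless you wrap it in a transaction with appropriate locking):

    -- Sketch: emulate an assert_maxrows-style check for the delete by counting first.
    declare @would_delete int;

    select @would_delete = count(*)
    from Product
    where product_category = 7;

    if @would_delete > 100
        raiserror ('Refusing to delete %d rows; the expected maximum is 100.', 16, 1, @would_delete);
    else
        delete from Product
        where product_category = 7;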

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Using scripts for changes is the best solution - hopefully scripts that someone else has looked at and that include some error handling or checks for incorrect (or too many) changes to data. However, in the real world that doesn't seem to be the case, and barring a wholesale change in the way the vast majority of people make changes, I think a little DR preparedness for these is one of the most important things.

    Scripts and checks, however, still don't catch every human error. I'm building a script against Dev for production and accidentally run it on production. Too easy to do, and I've done that as well. A simple environment check at the top of the script, like the sketch below, can at least catch that one.
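
    Just a sketch - the server and database names here are placeholders for your own:

    -- Sketch: refuse to run the rest of this batch on the wrong box.
    -- 'DEVSQL01' and 'SalesDev' are made-up names; substitute your own.
    declare @server sysname = @@servername, @db sysname = db_name();

    if @server <> 'DEVSQL01' or @db <> 'SalesDev'
    begin
        raiserror ('Wrong server or database (%s / %s) - aborting.', 16, 1, @server, @db);
        return;   -- only exits the current batch; repeat the check after each GO if needed
    end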

  • djackson 22568 (9/26/2011)


    I am certain you are not saying to ignore large scale disaster possibilities. There are a lot more than shown on the graph you included.

    Nope, don't ignore them, but be reasonable. Earthquakes and tornadoes hit Denver, but very, very rarely. It's not worth planning for a major disaster of that kind here, as we are highly geologically stable. Preparing for the once-in-a-century or once-in-a-millennium event isn't prudent.

    Also, I wasn't trying to list all disasters on the graph. Not sure there's enough graph paper to do that. You have to pick the list that applies in your environment, and maybe that's a good poll to run: what's a disaster? Hmmmmm.

    Note that it can change. I worked at a company in a 12-year-old building, and they decided to retrofit water cooling in the data center because of the load growth. They ran 12" diameter pipes beneath the raised floor, which took weeks, while we were still running. The contracts and building people didn't see any issues with that. With power and network lines beneath the floor, I made sure we ran lots and lots of extra backups that month. I had to get a VP to order them to tell us when they were testing filling the pipes with water. No disaster, but it could have been one.

  • Eric M Russell (9/26/2011)


    GilaMonster (9/26/2011)


    Eric M Russell (9/26/2011)


    ...

    That alone won't do anything other than raise an error that says something untrue. It's not rolling back the transaction. That transaction committed.

    ...

    Probably not sufficient, it needs to check for existing transactions, possibly create a savepoint, etc, etc.

    Yes, I know. The template I use for my actual data change script uses try/catch error handling and implements the rollback logic there.

    I suspected as much. I wanted to clarify for anyone reading who isn't that familiar with SQL Server's error handling.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
