Alternatives to SQL Backups


When most of us think about doing a backup, we think of a SQL backup to disk or tape - or for the more adventurous, using the capabilities of a third-party backup solution to do the same thing. Some combination of full, differential, and log backups (you ARE doing log backups, aren't you?) gives us the ability to recover if something bad happens.
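As a quick refresher, that combination might look something like this (the database name and backup paths here are made up for illustration):

```sql
-- Weekly full backup (hypothetical database and paths)
BACKUP DATABASE SalesDB
    TO DISK = 'E:\Backups\SalesDB_Full.bak'
    WITH INIT;

-- Nightly differential: everything changed since the last full
BACKUP DATABASE SalesDB
    TO DISK = 'E:\Backups\SalesDB_Diff.bak'
    WITH DIFFERENTIAL, INIT;

-- Log backup every 15 minutes or so - this is what gives you
-- point-in-time recovery
BACKUP LOG SalesDB
    TO DISK = 'E:\Backups\SalesDB_Log.trn'
    WITH INIT;
```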

Bad usually falls into two categories. The first is hardware: something happens to the server that forces you to restore or maybe even rebuild. The second, and more common, is user error. Good applications usually do a pretty good job of limiting the amount of damage a user can do (one common way is by just marking items as deleted, not actually ever deleting anything), but it can still happen. It can also happen at the developer level - something that makes it through whatever flavor of QA you have in place. Where I work we have an 'in between' scenario. We have a team that does nothing but process data received in a zillion different formats and load it into our system; they also do maintenance on existing data. That maintenance might consist of updating 100k rows because we got wrong values from the client initially. We try to standardize those operations, but obviously they represent another opportunity for error.

Still, a good backup plan covers all of that, right?

OK, you're right, it does. But let's think about what that means and how it works. Let's say one of these power users runs an update that updates the wrong column, sets the wrong value, or leaves off the WHERE clause. Something bad. They call you, but the query has already completed. You can't actually bring the database down because you've got other employees working against it (billable time, you know!), so you need to:

  • Restore your last full backup (how long does it take to restore?) to a new
    database (do you have the space?)
  • Restore all the transaction logs up to 30 seconds before the mistake
    happened (more time)
  • Join the original table to the table in the restored copy and figure out
    what action or actions will fix the problem (you probably have to do this,
    not the person who caused the problem)
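The restore-and-compare steps above might look something like this (database names, file paths, the STOPAT time, and the Projects table are all hypothetical):

```sql
-- Restore the last full backup to a NEW database so production stays up
RESTORE DATABASE SalesDB_Recover
    FROM DISK = 'E:\Backups\SalesDB_Full.bak'
    WITH MOVE 'SalesDB_Data' TO 'E:\Recover\SalesDB_Recover.mdf',
         MOVE 'SalesDB_Log'  TO 'E:\Recover\SalesDB_Recover.ldf',
         NORECOVERY;

-- Roll forward the log backups, stopping just before the mistake happened
RESTORE LOG SalesDB_Recover
    FROM DISK = 'E:\Backups\SalesDB_Log.trn'
    WITH STOPAT = '2005-06-01 14:29:30', RECOVERY;

-- Join the live table to the restored copy to find and fix the damage
UPDATE p
SET    p.Rate = r.Rate
FROM   SalesDB.dbo.Projects AS p
JOIN   SalesDB_Recover.dbo.Projects AS r ON r.ProjectID = p.ProjectID
WHERE  p.Rate <> r.Rate;
```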

If your database is any significant size, this is going to take a while - I'd bet half an hour minimum from the time you learn about the problem to having the data corrected. If the db is large, it could be longer. Meanwhile you're tied up, users potentially can't work or are working with bad/incomplete/incorrect data, and you're burning disk IO to get the restore done.

So....options? You could give the power users restore permissions and let

them do it. Just kidding. It's one thing to look at letting a user initiate a

backup, something else to let them do a restore on a production server.

If you're doing log shipping with a decent amount of latency, you can bring the spare out of standby, set up a linked server to it, and query across servers to get the information you need. Potentially this is faster than doing the restore, but you need to practice it a couple of times - if you don't stop the log shipping before the bad data goes across...well, back to plan A!
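A rough sketch of that sequence, with a hypothetical standby server name - and note the tradeoff baked into the first statement:

```sql
-- On the standby: finish recovery so the database becomes readable.
-- WARNING: this breaks the log shipping chain - you'll have to
-- reinitialize log shipping afterwards.
RESTORE DATABASE SalesDB WITH RECOVERY;

-- On the production server: point a linked server at the standby
EXEC sp_addlinkedserver
    @server     = 'STANDBY1',   -- hypothetical standby server name
    @srvproduct = '',
    @provider   = 'SQLOLEDB',
    @datasrc    = 'STANDBY1';

-- Query across servers to pull back the pre-mistake values
SELECT p.ProjectID, p.Rate AS BadRate, s.Rate AS GoodRate
FROM   SalesDB.dbo.Projects AS p
JOIN   STANDBY1.SalesDB.dbo.Projects AS s ON s.ProjectID = p.ProjectID
WHERE  p.Rate <> s.Rate;
```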

If you're replicating to a reporting server, you might be able to get the data from there as well. Again, it depends on your latency. Transactional replication in a 100Mb switched environment typically runs only a few seconds behind - it may be worth slowing it down deliberately to give yourself this option. If you're using snapshot replication, you may not be able to get 100% of the data you need, depending on the frequency of the snapshot and how long after the snapshot the mistake happened. If you're using transactional, go stop the distribution agent right away!
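The distribution agent runs as a SQL Server Agent job, so one quick way to stop it is to stop the job. The job name below is hypothetical - check the jobs on your distributor for the real one:

```sql
-- Stop the distribution agent job to halt delivery to the subscriber.
-- Job name is hypothetical; look it up in msdb.dbo.sysjobs.
EXEC msdb.dbo.sp_stop_job
    @job_name = 'PRODSRV-SalesDB-SalesPub-REPORTSRV-1';
```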

Logging changes to a history table (auditing) is a very valid technique. You've got the data you need right there; it's just a matter of determining which rows to join to and what to fix. Including the hostname and username in the audit table can make this a lot easier. The tradeoff is the extra disk IO generated by the triggers on each table and the extra disk space needed to store the history.
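A minimal sketch of that kind of audit trigger, using the same hypothetical Projects table - the defaults capture who made the change and from which machine:

```sql
-- History table: one row per old version of a changed row
CREATE TABLE dbo.ProjectsHistory (
    ProjectID   int           NOT NULL,
    Rate        money         NULL,
    ChangedBy   sysname       NOT NULL DEFAULT SUSER_SNAME(), -- username
    ChangedFrom nvarchar(128) NULL     DEFAULT HOST_NAME(),   -- workstation
    ChangedAt   datetime      NOT NULL DEFAULT GETDATE()
);
GO

-- Trigger copies the pre-change values on every update or delete
CREATE TRIGGER trgProjectsAudit ON dbo.Projects
FOR UPDATE, DELETE
AS
    INSERT dbo.ProjectsHistory (ProjectID, Rate)
    SELECT ProjectID, Rate
    FROM   deleted;
GO
```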

Log Explorer from Lumigent lets you back out the change at the transaction

log level - assuming you own the tool and you're not truncating the log!

In most cases ALL of the above methods will require you to get involved to

fix the problem. Depending on your environment you may feel comfortable giving a

few key users access to the auditing tables if you went that route, or access to

the reporting server if you're using replication.

This last idea may surprise you, and it's not workable in all cases. Have the users make a copy of the table (or of the chunk of rows they are about to change, if the table is huge) in MS Access. A 'make table' operation in Access is reasonably quick (comparable to a 'select into') and gives the user the ability to join to a local 'backup' table if they want to verify the changes afterwards or need to back them out. Similar, but not as simple to implement, ideas would be to show them how to create temp tables, or to run a server-side process using DTS/BCP that would make a copy.
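The server-side flavor of that 'make table' safety net is just a SELECT INTO before the risky change (table, column, and filter values here are hypothetical):

```sql
-- Snapshot just the rows about to be changed into a scratch table
SELECT *
INTO   dbo.Projects_Backup_20050601   -- hypothetical scratch table name
FROM   dbo.Projects
WHERE  ClientID = 42;

-- Run the risky update...
UPDATE dbo.Projects SET Rate = 150 WHERE ClientID = 42;

-- ...and if it turns out to be wrong, put the old values back
UPDATE p
SET    p.Rate = b.Rate
FROM   dbo.Projects AS p
JOIN   dbo.Projects_Backup_20050601 AS b ON b.ProjectID = p.ProjectID;
```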

As a DBA, my primary concern is protecting the data and making sure the data is available to the people that need it. Backups do that. Beyond that, it makes sense to implement controls wherever possible to restrict the amount of damage that can be done - requiring that all data access be done via stored procedures, for example. Yet I know...and you should too...that mistakes will happen. If you rely solely on a backup, you're going to work harder and take longer to fix the mistake than if you've thought through one or more of the above options and have them ready to go. You also have to recognize that being the only one who can fix the mistake makes you the bottleneck. As long as you can be responsive, that's OK. On the other hand, having someone that can fix 'minor' mistakes without calling you...is that a bad thing?
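For what it's worth, enforcing that 'stored procedures only' control is mostly a permissions exercise; a sketch with hypothetical object and role names:

```sql
-- A procedure with a mandatory key parameter - no way to run
-- an accidental table-wide update through it
CREATE PROCEDURE dbo.usp_UpdateProjectRate
    @ProjectID int,
    @Rate      money
AS
    UPDATE dbo.Projects SET Rate = @Rate WHERE ProjectID = @ProjectID;
GO

-- Users get no direct table access, only execute rights on the procedure
DENY  SELECT, INSERT, UPDATE, DELETE ON dbo.Projects TO DataEntryRole;
GRANT EXECUTE ON dbo.usp_UpdateProjectRate TO DataEntryRole;
```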

Agree? Disagree? Add your comments below and tell me what I got right...or

wrong!
