Another Disaster (Almost)

  • All I can say is OUCH!

    I'm sitting here thinking that I sure better go back over my disaster recovery plan again and make another dry run to make sure I haven't missed anything! This is the type of wake-up call none of us wants but all of us must be prepared for. Excellent job keeping your users up, Andy!

    Gary Johnson
    Microsoft Natural Language Group
    DBA, Sr. DB Engineer

    This posting is provided "AS IS" with no warranties, and confers no rights. The opinions expressed in this post are my own and may not reflect that of my employer.

  • Thanks. We actually sat down today with our vendor to figure out what clustering will cost... another OUCH! I'll probably write up some info on that as well; it's amazing how quickly the costs add up.


  • Andy,

    Great article. I think this topic is fantastic. Probably one of the best areas on this site, even though there are only three entries. It's a good insight into other DBAs' Disaster Recovery (DR) plans, and it's always interesting to see what extra work you have to do on the fly when the problem falls outside the scope of the DR plan.

    Fortunately (touch wood) we've not experienced anything like that... but who knows what the future holds... especially as we're moving to a new production box...

    I guess you can test and test your DR plan, but computers, as wonderful as they are, are always full of surprises!

    Don't wish to curse all the rest of you out there, but any further disasters (or near disasters) would make interesting reading....

    Clive Strong

  • While the content of the article was useful, I must offer some well-intended criticism. I am not sure if English is your second language, or if you intended the article to read in the style of a personal journal. The article would have been much easier to read if you had used proper punctuation, spelling, or even close to the correct grammar. I would expect higher-quality writing on a site such as this one.

  • Nope, English is my first and only language. Well, I speak geek a little. We're pretty informal here, though we do try to get the spelling correct. We (Steve, Brian, myself) find most technical content hard to read, so we try for a less intense approach. In this case I did write in journal fashion because I wanted to try to present as best I could what happened/how I felt, not really try to clean it up.

    We've got a book project under way to compile info from the site, and the majority of readers did want us to correct typos and minor mistakes, so we'll be reviewing old content and putting more time into proofing new content. In the interim, if you see something wrong, please do point it out and we'll try to fix it.


  • I had some bad memories while reading your article. Things can get ugly in a big way when dealing with SQL Server... Can you believe that SQL Server 7 was supposed to be able to run itself without needing DBAs (right)? SQL 2000 should probably fly the space shuttle or something. I have found that when SQL breaks, it's not a small issue.

    At the United Network for Organ Sharing, we had SQL 2000 running on clustered servers with the databases on a SAN. All of a sudden (I'm not kidding), a very important (1 megabyte) database file disappears off the SAN and now our 240 GB database is suspect. Now nobody in the country can match organs.

    Having the cluster didn't help, since the problem was on the SAN. We pointed the app to our Hotsite, which was kept up to date via Log Shipping. It took several hours the next day to get everything back to normal, but lives were saved, thanks to Log Shipping.

  • Glad I'm not Andy or the last guy.

    I've been pretty lucky with very few disasters, and hopefully that will continue.

    Steve Jones

  • Reads like a story. We know we need to restore or re-snapshot when such a thing happens. What would have added value to this article is "what originally caused the problems," "how to identify the root of the problem," and "how to avoid such hardware failure" (checks one should do on a regular basis).

  • As I mentioned above, it IS written in journal style. As for your questions, you tell me? How could I know that the container would drop? What could I do to prevent it next time? Easy to say cluster everything, keep a warm standby, etc, but sometimes companies truly cannot afford it (or you can't convince them to afford it).

    What I was hoping to share was how something gets handled when, regardless of whether you tried to prevent it or could have prevented it, things go bad and you have to make decisions. If reading my sad story helps you learn something, think of something, or avoid doing something, then I've done some good.


  • Using replication, I had a rollover to our backup site the other day, only to find out that all the text fields were empty. It seems that if text fields are updated with WRITETEXT, then replication doesn't work. Plus, the verify program so graciously doesn't verify text fields. So it was manual restores, field by field.

  • Has anybody tried the SQL-UP program by Incepto?

  • FWIW - I think the journal style is fine and easy to read.

    Clustering / high availability is something we too have considered. Unfortunately the cost of traditional clustering is prohibitive.

    I second danschl's request for any real-life experience with SQL-Up. Even pricing would be helpful. I dislike the policy of some companies that do not even post pricing on their sites.

  • We are looking at testing SQL-UP at SQL Server Central. We met with the vendor and do like the program. There are some limitations and issues (we need another box), but it's a neat product and I was impressed.

    I won't delve into pricing; send your feedback to Incepto on that. But it's much less expensive than clustering from MS, and you don't need shared disk. In fact, the two servers can be different models/brands. You do need some good bandwidth between the servers, so it's not really a WAN solution.

    Steve Jones

  • I was at the demo with Steve and was really going in thinking I would not like it. Pleasantly surprised. Failover is quick! The downside (and keep in mind this comment is based on a 15 min demo/conversation) is that you have to create a shadow db on each server to store/reconcile the changes/manage stuff, so potentially you increase your disk usage quite a bit. Not sure by what factor.

