DR Failovers

  • Comments posted to this topic are about the item DR Failovers

  • Re: "I know a few companies that consider secondary systems to be critical and will actually fail over, and then run the other system for a few months, failing back to then retest the primary system. In this case, there really isn't a primary and secondary system, but rather two systems that can work, being alternately used throughout the year."

    This is about as close to a perfect DR solution as you can get for ensuring business continuity in the event of a failure, whether catastrophic or not.

    Often DR solutions are set up at the database level but never tested, so when a failure does occur, a lot of time is expended getting the DR to work, and sometimes it never does.

    I experienced such a situation recently. The site ran its apps across multiple instances. A single instance failed to restart after a SQL Server upgrade. The database had been replicated via log shipping, so one would think the app could have been reconfigured to point at the standby database. The app was from an external vendor, who denied all knowledge of the installation; it had been done by a previous sysadmin, who liked multiple SQL Server instances but not documentation. Despite technical support from Microsoft, we couldn't repair the failed instance. There were only 2 users and there were higher priorities, so the vendor was called in to handle the problem.

    All the other, more business-critical apps also had log-shipped databases, but similarly failover had never been tested and there was no willingness to devote the time/resources to do so.

    Assuming that systems will fail, and configuring and operating your infrastructure accordingly, is a great attitude and will pay dividends in virtually seamless business continuity, which, when all is said and done, is the name of the game.
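
    For anyone in the same spot with an untested log-shipped standby: the database side of the failover is a handful of T-SQL statements. A minimal sketch, assuming log shipping has kept the standby restoring; all server, database, and path names here are hypothetical:

    ```sql
    -- All names are hypothetical; adjust for your environment.

    -- 1. On the primary (if it is still reachable), back up the tail of the log:
    BACKUP LOG [AppDB] TO DISK = N'\\backupshare\AppDB_tail.trn'
        WITH NORECOVERY;  -- leaves the primary in a restoring state

    -- 2. On the standby, restore any remaining log backups, then the tail:
    RESTORE LOG [AppDB] FROM DISK = N'\\backupshare\AppDB_tail.trn'
        WITH NORECOVERY;

    -- 3. Recover the standby database so applications can connect:
    RESTORE DATABASE [AppDB] WITH RECOVERY;
    ```

    Note that log shipping does not redirect clients automatically; the application connection strings still have to be repointed to the standby server, which is exactly the step that an untested plan tends to miss.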

  • With the release of SQL Server 2012, testing DR is a dream. AlwaysOn availability groups work great, allowing us to switch between data centers in less than a minute. We probably switch between data centers on a biweekly basis. It's really fantastic having DR totally hands off. It took a little bit of extra effort and script writing to have the SQL Agent jobs work seamlessly, but in the end it was worth it.

    Prior to migrating to SQL 2012 from SQL 2005, we were log shipping, and that was a real pain and took more manual intervention than we would have liked. The downtime in that scenario would have been a couple of hours.

    Steve
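
    For readers who haven't used availability groups: a planned AlwaysOn failover really is a one-liner, and one common pattern for the job scripting Steve mentions (an assumption on my part, not necessarily what his scripts do) is to install the same jobs on every replica and have each one check the local role before doing any work. All names here are hypothetical:

    ```sql
    -- All names are hypothetical. On the target (secondary) replica, a planned
    -- failover of an availability group is a single statement:
    ALTER AVAILABILITY GROUP [AG_Production] FAILOVER;

    -- To make SQL Agent jobs failover-aware, each job step can exit early
    -- unless the local replica currently holds the primary role:
    IF EXISTS (
        SELECT 1
        FROM sys.dm_hadr_availability_replica_states
        WHERE is_local = 1
          AND role_desc = 'PRIMARY'
    )
    BEGIN
        EXEC dbo.usp_NightlyMaintenance;  -- hypothetical job procedure
    END
    ```

    With that guard in place, the jobs can stay enabled everywhere and simply become active on whichever data center holds the primary role after a switch.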

  • My current division's DR plans are virtually non-existent. Luckily our customers are being moved to a different SW that will make our DR plans pretty much unneeded.

    It's been that way since before I started, and I don't think I'll ever be able to fix it from my position.



    ----------------
    Jim P.

    A little bit of this and a little byte of that can cause bloatware.

  • We have DR plans for our high priority systems but rarely test them. I recently approached the team behind one of these systems, the one I thought would be most likely to want to test a full failover, and they agreed. We found a couple of things that didn't repoint automatically; we've since remediated those, done a quick off-hours failover to confirm the fixes, and are planning another full failover to complete the testing.

    I'm hoping I can use this to convince the other application owners that their apps should be tested as well so we're not dealing with cleanup in an emergency situation.

  • cfradenburg (7/22/2013)


    We have DR plans for our high priority systems but rarely test them. I recently approached the team of one of these systems that I thought would be most likely to want to test a full failover and they agreed.

    In my last company our DR plans only really got exercised when our regulating agency came in and did an audit. If you have a regulating agency, sometimes it doesn't hurt to put a bug in their ear. But be careful why and how you do it.



    ----------------
    Jim P.

    A little bit of this and a little byte of that can cause bloatware.

  • We run DR tests quarterly, but we don't fail over unless there is a real disaster. We have actually been running production from our DR site since Sandy hit the NY area. We expect to switch back later this year.
