Failover options

  • Here's a situation:

    2 sites, 1 primary and 1 backup.

    1 unreliable, fairly slow connection between them.

    Clustering at the primary site, all cluster nodes at the primary site.

    Can accept up to 4 hrs of data loss.

    Scenario: Primary site for servers burns down, floods out, etc.

    What's a simple, easy-to-administer, reliable method that you would use to make sure the database servers at the secondary site can be used, if need be.

    Auto-failover is a nice-to-have, but not essential.

    Doing it so that no application configuration files, database connection strings, etc. have to be altered is really valuable.

    I'm reading up on mirroring, log shipping, and clusters.

    How would you do it, and why? Good and bad points? Risks?

    Thanks!

  • The options are database mirroring or SAN mirroring.

    Ability to be synchronous is dictated by latency. The primary factor is typically distance between facilities.

    Dan

  • I've been doing more reading and it appears that mirroring does not support filestream data.

    There's a goodly chance we'll be using filestream data in the near future, so it looks like mirroring is out. (Unless someone knows a cool trick...)

  • Do you have a guestimate of the size of the database(s) you're talking about here?

    The Redneck DBA

  • Most of the databases aren't that large and they don't change all that much, either.

    Sharepoint's the biggest, at a bit over 200GB.

    The rest are in the 10 to 20GB size.

  • Log shipping is fairly easy to implement, and you can set it to run at a schedule which suits you (i.e. less frequent if you find it is causing any kind of bandwidth issue at the secondary site).

    You will need to copy full backups of the source databases to the secondary site before you can get it up and running, so the 200GB one may be a slight concern if you can't physically take a copy of the backup file to site, but you should be able to copy it overnight without any issues).

    Bear in mind, that log shipping is fairly "clunky" and can be relatively unreliable, in comparison to other methods, when one missed log-shipping window, may result in you having to re-copy a fresh full backup to the second site again to resynchronise them..

    Also, if the link between the 2 sites is as unreliable as you say, equally bear in mind, that the secondary server is in effect there to act as live in the event of a failure with the primary/source.

    You would probably be looking to route all traffic to the secondary site/server in a failover situation, until you are able to resolve the issues with the primary.

    If the connection is unreliable or cannot take the burden of live traffic, then you may need to consider a better link...

  • Ian,

    Thanks for the comments. I got a test of log shipping up yesterday just to learn how to do it.

    I have a report I wrote that will tell me which log shipping jobs failed across all my servers, so I won't have to check multiple jobs per database per server manually. :w00t: Otherwise I would spend 1/4 my day just verifying all the jobs ran.

    I have to set up a test with filestream data in order to understand the behavior with that kind of data. Maybe it's no different from anything else, but I'm guessing it's not exactly the same.

    You are right about the issues with the connection. All I can do about that is due diligence and point it out, which I had already planned on doing.

    We've already got clustering for key servers. I'm going to implement mirroring to local servers for a selected few instances also.

  • I must admit, I've never had to cater for filestream data so far, and it does pose some otherwise more difficult manageability issues which other methods don't.

    You do have the option of using merge/transactional replication, if there are any users on the secondary site who have to make use of the primary server, as you can then kill 2 birds with one stone, by providing better performance to the "failover" site and going some way to providing a scale of redundancy.

    If there isn't a reasonable user-base in the secondary site (or if they don't really require it anyway), or if the link is not sufficient to support it, then it's probably a non-runner.

    Log shipping is probably the best solution by the sounds of your issue anyway, and I would only worry about the size of the blobs being ported across the "weak" link (excuse the pun 😛 )

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply