January 27, 2012 at 11:53 am
Reasons why I disagree with Replication as a DR solution.
1) Latency. Depending on your replication setup, you could have more latency in it than in your normal database backups. Which means you have missing data that you might not otherwise have. Cluster setups switch over automatically and don't miss a lot. Sure, you might lose a transaction in the middle, but it's rare. Usually the transaction rolls back.
2) Availability. It's not easy to fail over Replication. In fact, you can't fail over Replication. You would either have to back up the replicated database and move it over to your source server or create a new publisher / distributor / subscriber setup to move things over to your new server. And if you wanted to make your replicated server the new "primary," you have to consider things like DNS changes, virtual server name changes, hard-coded connection strings in packages, clients, etc. that would have to be changed before you could recover.
3) A lot of Replication setups involve more than one location or don't involve all the data in an instance. If your primary server goes down, you've lost more than just your OLTP database. You've lost everything in msdb, model, and master. That is important data that you aren't replicating (I can almost guarantee no one replicates system databases) to your Replication destination. And not all servers have the same information in the system databases.
Additionally, businesses with multiple databases on one instance aren't going to set up Replication for all of the dbs individually (I'm talking in general) because it's too much of a hassle. Replication breaks easily. I can't tell you the number of times I've had to recreate or reinitialize subscriptions because network connections went down. Usually the Replicated instance fails much more frequently than the primary instance.
4) Location of the Replicated Instance. Is it on site and on the same server as your current instance? As soon as the server goes down, you've lost Replication too. Is it on site but on a different server? What happens when a tornado hits or the server room catches on fire or the generator goes out? Is the instance off site? How easily can you get to it if the primary site goes down? How much more difficult has your recovery become because a solution not designed for DR is harder to access?
Disaster Recovery isn't recovery if you can't get access to the data in a timely manner. When I plan for DR, I plan for servers or databases that will be up in 2-3 hours at the most. DR should not be about "where's a copy of the data." It should be about "how do we lose the least amount of data in the least amount of time and get back running so the company isn't bleeding millions of dollars for every hour we're down."
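To make the "least data, least time" test concrete, here is a minimal sketch that screens candidate recovery options against two targets. Everything in it is an assumption for illustration: the option names, the per-option figures, and the 15-minute data-loss limit are made up, and only the 3-hour ceiling comes from the paragraph above.

```python
from dataclasses import dataclass


@dataclass
class RecoveryOption:
    name: str
    worst_case_data_loss_min: float  # data you could lose (hypothetical figure)
    time_to_restore_min: float       # time until users are working again (hypothetical figure)


def meets_targets(option, max_data_loss_min, max_downtime_min):
    """True only if the option stays within BOTH the data-loss and downtime limits."""
    return (option.worst_case_data_loss_min <= max_data_loss_min
            and option.time_to_restore_min <= max_downtime_min)


MAX_DOWNTIME_MIN = 180   # the 3-hour ceiling mentioned above
MAX_DATA_LOSS_MIN = 15   # assumed for illustration only

# Placeholder options; real numbers have to come from testing your own environment.
options = [
    RecoveryOption("option_1", worst_case_data_loss_min=5, time_to_restore_min=120),
    RecoveryOption("option_2", worst_case_data_loss_min=5, time_to_restore_min=480),
    RecoveryOption("option_3", worst_case_data_loss_min=60, time_to_restore_min=90),
]

acceptable = [o.name for o in options
              if meets_targets(o, MAX_DATA_LOSS_MIN, MAX_DOWNTIME_MIN)]
print(acceptable)
```

The point of checking both limits together is that an option holding a recent copy of the data still fails if you cannot get to it within the downtime window.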
Now I grant you that my DR solution also accounts for Business Continuity which is a plan for getting the company back up and running as a whole after a disaster. But Replication and Log Shipping in my mind are not DR. DR is RAID and Clustering and backup scenarios with off-site backup storage.
EDIT: I forgot to mention Database Mirroring. I don't think of DM as a "final" DR solution, but if you want a partial DRS, Database Mirroring is (to my mind) better than Replication because you can set up a witness server to use it for failover.
January 27, 2012 at 11:59 am
SQLKnowItAll (1/27/2012)
I just did a quick "click" on replication to create a new publisher and went through a bit of the process to see that peer-to-peer is not an option. I know I'm being lazy, but is this something else not in the wizard or something?
Correct, it's not in the wizard.
January 27, 2012 at 12:02 pm
The global diagram gives my favourite interpretation in conjunction with local backups for recovery.
January 27, 2012 at 12:12 pm
Hi Brandie,
Although I agree with you from a DBA perspective, I would disagree that this is not an acceptable DR solution from an application standpoint. The reasons you cite don't apply to an application; i.e., the application does not care about database failover, it simply needs a site to go to.
1. Latency: My business accepts it and can tolerate the small data loss
2. Failover and Availability: The database is not failing over, the application is. Which makes this highly available. Even more so than a cluster since we are on several different sites and ISPs.
3. Who says that we are not also backing up our databases on an instance along with replication? I wouldn't use replication as the only part of a DRS, of course if backups are the only part of a company's DRS they are in big trouble. Replication is just to cover a subset of scenarios, but these scenarios can be disaster related.
4. I suppose that depends, but it would be silly to rely on it as part of a DRS if it were on the same machine. Clearly I am not suggesting that as part of a DRS.
Any DRS involves many small pieces and they all have to somehow work together. If a site, server, whatever goes down... I know my company's applications have a site to go to. In the meantime, I can work on restoring data from backups or whatever else I need to do on the site that went down. To me, you are approaching this simply from a DBA or database failure standpoint. However DR must include plans for so many other things. So is this a part of a DR plan specifically from the DBA department? No, but to a developer or an operations person it is.
Jared
CE - Microsoft
January 27, 2012 at 12:15 pm
Jared,
Actually, I'm approaching this from the POV of someone who did DR & BC design at a previous employer far far before I ever became a DBA.
It's obvious to me from your OP that you are stuck on this one solution. I don't think it's a good solution. All I can do is advise you not to rely on it. If you choose to ignore that advice, and the reasons in it, that is your decision.
January 27, 2012 at 12:38 pm
I guess what I am trying to make clear is that this is not a complete solution, but one piece of a much larger comprehensive plan. Of course, I would never suggest that replication in and of itself is going to prevent data loss or maintain business continuity.
If one of my applications cannot connect to a db for any reason, I need to have the best way to handle that situation. Depending on the disaster itself, rerouting to a replicated database on a separate instance is much easier, reliable, and more efficient than anything else. On the exact same system, a different disaster will require something completely different like a backup restore or something else.
Having both of these things in your toolkit is beneficial. To exclude 1 because it cannot be used in the more severe cases would be a mistake in my case.
Jared
CE - Microsoft
January 27, 2012 at 12:46 pm
SQLKnowItAll (1/27/2012)
If one of my applications cannot connect to a db for any reason, I need to have the best way to handle that situation. Depending on the disaster itself, rerouting to a replicated database on a separate instance is much easier, reliable, and more efficient than anything else.
In which case, I would recommend Database Mirroring rather than Replication. It seems (to me) to be more reliable in addition to having the capacity to detect database failures. Replication just breaks and can't tell you why it broke.
And you can still report off of DM.
January 27, 2012 at 12:52 pm
Brandie Tarvin (1/27/2012)
SQLKnowItAll (1/27/2012)
If one of my applications cannot connect to a db for any reason, I need to have the best way to handle that situation. Depending on the disaster itself, rerouting to a replicated database on a separate instance is much easier, reliable, and more efficient than anything else.
In which case, I would recommend Database Mirroring rather than Replication. It seems (to me) to be more reliable in addition to having the capacity to detect database failures. Replication just breaks and can't tell you why it broke.
And you can still report off of DM.
Maybe an example will help here. If not, I concede defeat 🙂
Applications are hosted on siteX, siteY, and site Z. Primary database is at siteA and secondary is at siteB. For some reason siteX cannot connect to siteA. Say a malfunction or configuration error in the switch at that site. However, it CAN connect to siteB. No issues between siteA and siteB. In this case, the replication serves as a solution for the problem in the short term. This would be 1 scenario of many built into a complete DR plan.
Is this bad and why? Mirroring would not suit this 1 case.
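A minimal sketch of the application-side rerouting in this scenario, assuming a hypothetical `can_connect` probe supplied by the application (the function name and the reachability map below are illustrative, not part of any real setup). It only decides which site the application talks to; whether siteB can accept writes depends on how replication is configured.

```python
from typing import Callable, Optional, Sequence


def choose_site(sites: Sequence[str],
                can_connect: Callable[[str], bool]) -> Optional[str]:
    """Return the first site, in priority order, that the probe can reach."""
    for site in sites:
        try:
            if can_connect(site):
                return site
        except OSError:
            # Treat a network error raised by the probe as "unreachable".
            continue
    return None  # no site reachable: fall back to the rest of the DR plan


# Hypothetical view from siteX: the switch problem blocks siteA only.
reachable_from_site_x = {"siteA": False, "siteB": True}

chosen = choose_site(["siteA", "siteB"],
                     lambda site: reachable_from_site_x[site])
print(chosen)  # siteB
```

Because each application site runs the check itself, siteY and siteZ would keep using siteA while siteX uses siteB, and nothing on the database side has to fail over.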
Jared
CE - Microsoft
January 30, 2012 at 7:26 am
SQLKnowItAll (1/27/2012)
Brandie Tarvin (1/27/2012)
SQLKnowItAll (1/27/2012)
If one of my applications cannot connect to a db for any reason, I need to have the best way to handle that situation. Depending on the disaster itself, rerouting to a replicated database on a separate instance is much easier, reliable, and more efficient than anything else.
In which case, I would recommend Database Mirroring rather than Replication. It seems (to me) to be more reliable in addition to having the capacity to detect database failures. Replication just breaks and can't tell you why it broke.
And you can still report off of DM.
Maybe an example will help here. If not, I concede defeat 🙂
Applications are hosted on siteX, siteY, and site Z. Primary database is at siteA and secondary is at siteB. For some reason siteX cannot connect to siteA. Say a malfunction or configuration error in the switch at that site. However, it CAN connect to siteB. No issues between siteA and siteB. In this case, the replication serves as a solution for the problem in the short term. This would be 1 scenario of many built into a complete DR plan.
Is this bad and why? Mirroring would not suit this 1 case.
With the caveat that I am not taking into account real life budgeting / hardware issues... I think that you're making this more complicated than it needs to be. If I understand what you're talking about, that is.
You have only 2 sites with data. Site A and Site B. Mirroring works well for a 1 to 1 situation like this. You have Site A as the primary server and Site B as the secondary server. The witness server is set up to switch over if A is not responding.
I'm not quite sure how your network is set up that you have 3 separate sites for application servers (or why), so I don't know what kind of network issues you're running into with the A & B sites. Given how concerned you are about losing access to one site, I'd say your network is probably fragile enough to lose access to both sites at the same time. But again, I don't know your network or why it is set up in that configuration.
January 30, 2012 at 8:23 am
If replication is completely off the table for your plan, what do you suggest instead to remedy the solution described in the previous response to maintain business continuity? The thing is, I can't come up with a better solution, so any ideas would be appreciated.
Maybe I am making this more complicated than I should be. However, I still do not feel that anyone has justified excluding replication as a "part" of a more complete DRS. To completely exclude it from a comprehensive plan is like dismissing band-aids because they cannot fix a severe cut. As I'm sure you know, a good comprehensive disaster recovery plan must include quick and simple fixes for small outages as well as bigger solutions to more severe problems. I don't believe that anyone has made the case that replication cannot be considered part of a disaster recovery plan, because my previous example demonstrates a potential point of failure in a system that would not have been addressed by mirroring; i.e., if replication had not been set up in this case, there would have been a loss of business continuity.
I will concede that it would be foolish to say that replication can be a substitute for backups and that it should be used as the only piece of a DRS, but to completely dismiss it as a possible solution to one point of failure would be a mistake in the example described earlier. Mirroring is great if you are simply setting up a hot site or have the money for resources that are only used in the case of a disaster. However, in the example stated earlier mirroring alone would not have solved the problem.
Jared
CE - Microsoft
January 30, 2012 at 9:40 am
I read the first page where you said mirroring isn't an option but I'm still confused as to why? Can you expand on that?
It sounds like you're looking for more of a HA solution than a DR one. I know those areas kind of blend together though. In your example you mention an application not being able to connect to your database because of a switch malfunction at your primary site. I guess that could be a disaster if the site caught fire and there's damage to the switch, but I'm assuming you were referring to something a lot less severe, possibly a networking issue that will be fixed in the next couple of hours and you just want the application to be available during that time span. For instances like that I'd say replication would probably work, especially for an application that you would like to always be available but it's not mission critical that it need be.
I'm thinking of a couple web sites that I've worked with in the past that would fall under here. I was already replicating the data that the site needed to another server that I could have used for the failover, and if it was available while other applications were down I would have gotten brownie points for that, but no one would have been too upset if it was down for an hour or two.
Just my opinions though. I could see why someone would agree with the above. I'm certainly no expert in the HA/DR realm.
January 30, 2012 at 9:48 am
Thanks for the reply Brendan. Mirroring is not an option "for this case" because the database on site 2 needs to be available for read. Mirroring is still an option to be implemented as another portion of DR, just separate from this. So, in an ideal world... Replication set up with mirroring on each site would be a good part of a comprehensive plan.
I think of the difference between HA and DR as HA being within your control; i.e. planned maintenance such as a server restart for updates or memory/hardware upgrades. I think of DR as unplanned or out of our control. The situation described is not likely, and replication is primarily set up for reporting and for HA. I am just disagreeing that it cannot be a part of a DRS as well. 🙂
Jared
CE - Microsoft
January 30, 2012 at 10:01 am
I see now, that database at site 2 is active all the time. Makes sense then. I would think it would work in this instance.
Just a difference in thinking I guess. When I think of HA I think of applications that are always available regardless of the reason they are unreachable and whether or not that is in your control. DR kind of says it all, you want to recover your data from a disaster to the point in time before the disaster happened. Could be a natural disaster, human disaster (both intentional and unintentional), and corruption would probably fall in here too. Like I said, they blend together. That's just how all the companies I have worked for have defined those two areas.
January 31, 2012 at 9:16 am
DR and HA are NOT the same thing, nor should they be confused as such. Replication is an acceptable solution for High Availability, but it will not completely recover your business in case of a disaster.
Jared, what you are looking for is a High Availability scenario and you are correct that your replication setup is assisting you with that. But what happens when the building catches on fire? How do you recover your server?
Replication doesn't cover everything. It doesn't even come close. People don't replicate system databases (usually), or SSIS packages, or SSAS / SSRS objects. What about your logins? Your linked servers? Your websites? How does Replication protect those?
The difference between HA and DR is this: HA is making sure the data is available at all times when your customers need it. DR is making sure your data and your systems are protected and can, if they disappear, be recovered.
DR solutions should never be "partial". You should have an all inclusive, top-to-bottom disaster recovery solution for your business put into place, written up and rehearsed by all employees who would be involved, so that when something happens nobody has to think about what their role is. DR should not just cover SQL Server or SQL Server user databases.
If you want to know more about DR, try taking a look at this site: http://www.disaster-recovery-guide.com/. It's a little cluttered, but it seems to have some good information in there.
January 31, 2012 at 10:02 am
DR and HA are NOT the same thing, nor should they be confused as such. Replication is an acceptable solution for High Availability, but it will not completely recover your business in case of a disaster.
Jared, what you are looking for is a High Availability scenario and you are correct that your replication setup is assisting you with that. But what happens when the building catches on fire? How do you recover your server?
I believe that HA and DR, although not the same thing, are certainly not mutually exclusive. Almost any article that you find in a Google search for either one of these will speak of them both and the tight-knit relationship between them. Same with any textbook. Of course replication will not completely recover my business in case of a fire. How does an onsite database backup recover the entire business? Many mirrored servers are on-site specifically for hardware failure. How does that help with a fire if both servers are destroyed? How does a tape backup help when all of the tapes are destroyed in the same fire?
Replication doesn't cover everything. It doesn't even come close. People don't replicate system databases (usually), or SSIS packages, or SSAS / SSRS objects. What about your logins? Your linked servers? Your websites? How does Replication protect those?
I never said it covers everything, I said in one case or situation it will cover what we need covered. That is the point of a comprehensive plan. Tell me, does your plan involve only database backups? Of course not, but it still does them as part of a complete DRS.
The difference between HA and DR is this: HA is making sure the data is available at all times when your customers need it. DR is making sure your data and your systems are protected and can, if they disappear, be recovered.
I'm sorry, but I disagree on the distinction. In this case they overlap. I guarantee that the stakeholders in my business would consider it a disaster if our customers could not get access to their data. In fact, that would be a bigger disaster than an entire site going down. In my opinion, restarting a server for Windows updates and having another take on the load is simply HA. Hot swapping memory for an upgrade is simply HA. Restarting a server because of a crash or unresponsiveness and having another take over is HA and DR. Hot swapping memory because of a problem with the DIMM itself is HA and DR.
DR solutions should never be "partial". You should have an all inclusive, top-to-bottom disaster recovery solution for your business put into place, written up and rehearsed by all employees who would be involved, so that when something happens nobody has to think about what their role is. DR should not just cover SQL Server or SQL Server user databases.
Who said anything about this being partial? The DRS covers everything we expect it to. This involves fires, earthquakes, floods, electric issues, hardware failures, data corruption, network issues, etc. If you have 1 "feature" that can solve all of those problems, I would like to know what it is 🙂 Of course, none exists, but the DRS includes things like backups, mirroring, RAID, Hot Sites, Clustering, etc. together. Not one of these things is all-inclusive. In my case, replication is added to that list.
I guess I just don't understand why you say that, since replication cannot restore all aspects of a database and the server, it cannot be part of a comprehensive solution, when other things that you mention cannot completely restore functionality either. It is combining several components together that makes up a DRS. I'm not trying to argue per se, I am just not really getting the logic of saying that this should not be a part of the plan, but mirroring should. They both have their limitations in different situations and neither one is fully comprehensive. Maybe 1 more than another, but as I demonstrated with an unlikely scenario that still needs attention in a DRS, replication was the solution and mirroring would not have even failed over.
Jared
CE - Microsoft