Part of my disaster recovery strategy involves using DM. It's not that I disagree with you, but I've got the following reasons for using it. If you think DM is not a good tool, do you have any other suggestions in mind? I'm always looking for ways to improve my disaster recovery plan.
Our organisation has an SLA (I believe most organisations would too), and the system must be up within xxx time. There is always a chance that the whole site fails indefinitely. Sounds weird, but it happens.
This really depends on the SLA. If DB mirroring is not used, log shipping or some other method would need to be used in its place. Either way, disk space would need to be allocated, and it'll be expensive. Backup/restore is just not viable for a 2-hour RPO/RTO, especially for a VLDB. Another thing to consider: if the whole server goes down, where can you put a VLDB without a warm standby? Clustering is for same-site SQL availability.
SAN fabrics are always redundant, no? When you buy one server plus one SAN, you'll make sure it has 2 HBAs connected to 2 SAN controllers/switches, etc. But I reckon some would configure the SAN for performance instead of redundancy, and therefore they have this single point of failure.
True. Script all jobs across to the DR server but disable them.
True. My preference would be high performance though.
You can use SQL Server Express Edition to host the witness to reduce costs.
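As a sketch, adding an Express-hosted witness to an existing mirroring session is a one-liner on the principal (the server name and port below are hypothetical):

```sql
-- Run on the principal after mirroring is established.
-- The witness can be hosted on a SQL Server 2005 Express instance.
ALTER DATABASE AdventureWorks
    SET WITNESS = 'TCP://witnesshost.example.com:5024';
```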
Transfer logins to the DR server. If there are any user logins mapped to a DB, fail over the mirror to DR, create the login with that DB as its default, and switch back to the primary. Orphaned user IDs can be easily fixed with sp_change_users_login. I believe jobs can also be scheduled to sync logins between both servers. We're using lots of web app + DB combinations, so SQL passwords are not changed frequently.
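A minimal sketch of the orphaned-user fix with sp_change_users_login (database and user names here are hypothetical):

```sql
-- After failing over, list orphaned users in the database
USE AdventureWorks;
EXEC sp_change_users_login 'Report';

-- Re-map an orphaned database user to the server login of the same name
EXEC sp_change_users_login 'Auto_Fix', 'webapp_user';
```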
Agree. Therefore, they would need to be transferred during implementation.
• In SQL Server Standard Edition, the mirror server always uses a single thread to roll forward the database.
• In SQL Server 2005 Enterprise Edition, mirror servers on computers with fewer than five CPUs also use only a single thread.
• With five or more CPUs, a SQL Server 2005 Enterprise Edition mirror server distributes its roll-forward operations among multiple threads during a failover (known as parallel redo), using one thread for every four CPUs. This improves failover time on the mirror server, as the log records can be rolled forward in parallel.
Very true. It's preferable to set high-performance mode if you've got performance issues.
True. The transaction log will grow until the mirror is available again. If the mirror is down indefinitely, I would suggest removing mirroring and fixing the mirror before re-establishing DM.
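As a sketch, assuming a database named AdventureWorks, breaking the session so the log can be cleared looks like this:

```sql
-- On the principal: break the mirroring session
ALTER DATABASE AdventureWorks SET PARTNER OFF;

-- ...fix the mirror server, restore a fresh full backup there
-- WITH NORECOVERY, then re-establish mirroring with
-- ALTER DATABASE AdventureWorks SET PARTNER = 'TCP://...';
```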
Agree. High safety (full) mode should be used when the servers are side by side.
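For reference, both safety modes are set with ALTER DATABASE (the database name here is a placeholder):

```sql
-- High safety (synchronous): suits servers side by side on a fast link
ALTER DATABASE AdventureWorks SET PARTNER SAFETY FULL;

-- High performance (asynchronous): suits a WAN link to a remote DR site
ALTER DATABASE AdventureWorks SET PARTNER SAFETY OFF;
```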
I’ve not had any issues with my smaller databases. I’ve got 2 separate sites with ~20 DBs mirrored across them. All servers are tuned to support the DB load of the primary. Very expensive architecture, but it’s all about the business and SLAs.
Replication is definitely not a good option for DR.
Thank you, Simon, and all others who participated in this thread.
FWIW, I was a DBA manager and inherited paired clustered servers at six sites throughout the US, all doing merge replication. It was a real headache; clustering was down more often than the SQL Servers were.
I do agree that clustering has its benefits. But for me, it's still a single point of failure. Clustering is about availability, whereas DM is about redundancy. I don't doubt the reliability of SANs, but sometimes things just break.
We, too, had a SAN replicating at the disk level (MirrorStore) to another site. But unfortunately, not all DBs are on the SAN, and I'm not keen on the concept of sharing a disk among a few DBs/apps, especially busy OLTP ones. Say SAN replication is enabled: how would you bring the DB up if your building burns down? Would the business wait until a SQL instance is up? You'll still need a SQL instance somewhere to access the replicated DB, and this is what I'm using DM for. Sometimes the business insists that the system should never be down for even a few hours. It costs them too much.
There are even more expensive solutions, such as setting up mirroring + clustering, which quadruples the hardware and costs. But I reckon these are case by case. There is no single best solution. I'm sure you've got your reasons for clustering, and it's working for you.
I would also like to share a bit of my experience with log shipping.
Log shipping was a headache for me too because it requires intermittent attention, although less so now. It doesn't work very well with a VLDB, as the backup/restore alert jobs constantly generate error messages. Fortunately, we've got MOM for monitoring, so we know which alerts are not false alarms, and we make sure alerts are minimised (e.g. increase log backup frequency, set the log restore threshold a bit higher, increase the log shipping monitor alert threshold, etc.). So far, we've reduced a lot of false alarms and manage to monitor all DBs pretty well (though not with zero effort). That's why I'm interested in slowly moving to DB mirroring if it proves superior to log shipping. There's a custom solution combining log shipping with SQL LiteSpeed to compress the transaction logs before sending them over the network. I'm pretty sure there is code out there that can be downloaded and tested. I remember a person named Chris Kempler wrote a similar solution.
The business I'm in requires redundancy, even if at some point it might sacrifice a bit of performance.
Ok, now back to DM. I've not had any of my applications rewritten to be DM-aware. All our DRP plans require manual intervention, which is my preference too. I don't like a system switching itself back and forth on a network glitch rather than in a real DR situation. We created a DNS CNAME for our SQL Server, and applications are configured to connect using this CNAME. If the primary server is down, our network guys can quickly change the CNAME to point at the DR IP. A forced DNS/IP refresh can happen at any time, so from the application point of view no config changes need to be performed. This requires manual intervention, which is part of our policy. The CNAME works in some ways like a clustered virtual configuration, where CNAME = SQL virtual name, CNAME IP = SQL virtual IP, and the CNAME IP can be changed at any time.
An important point about DM: the default failover timeout is 10 seconds. This is way too low! If you've got automatic failover enabled, set it to a higher value, e.g. 90 seconds:
ALTER DATABASE <DB Name> SET PARTNER TIMEOUT 90;
With or without clustering, the server needs to be spec'd to support the DB load it's holding. I don't think clustering helps much in terms of performance. Funnily enough, I found it's harder to support clustering than DM. Microsoft specifically does not recommend clustering across separate geographic locations, as you'll likely need to involve the hardware manufacturer in any issue. Read the part on Node Location.
I would be very interested in how you architect your servers for your business objectives, and the pros and cons.
To clarify, I use MS clustering for local high-availability failover in an active-passive configuration, and in no way implied clustering is a site-failover alternative. For failover to our remote site, which is more than 1,000 miles away, we use SAN-based EMC SRDF/A to identical hardware at our dark site. We too manually swing the DNS entry upon failover to the remote server, and both the DNS swing and the EMC commands are scripted. Our failover time, even with mounting the R2s, is pretty close to the time it takes to swing the DNS entry.

Based on my experience (yours may vary), I can state the following generalisations:

• Hardware solutions such as SRDF/A are generally better than software solutions such as data mirroring.
• Our spend on the hardware (i.e. SAN) solution is much more than the spend on a software solution using DM.
• I would characterise data mirroring as similar to log shipping, i.e. a physical operation of replaying log restores on a remote host.
• Having used log shipping (both the MS-delivered and customised versions) for several years, and given the choice between DM/log shipping and hardware-based disk replication, I would choose disk-based replication.
• Having run disk-based replication over SRDF/A for over two years, I can attest that it works and requires zero DBA hours. I cannot say the same for log shipping.
• SRDF/A is generally better than MirrorView/A (you get what you pay for).

My two biggest complaints with DM are that it requires rewriting your applications to be DM-aware if you want automatic failover, and that it is a physical operation instead of a logical one.
Your point sounds valid enough. We're talking about different situations, so I wouldn't say clustering is bad at all; it's just a different approach for a different purpose.
Just a few more things I'd like to understand about your disk mirroring. If I understand correctly, mirroring at the disk level merely brings the DBs/logs to a different SAN location. Say your primary location has an indefinite power outage (or assume it's one day): how would you make sure SQL can be brought up in, say, the next hour or so? Do you have another SQL box at your DR location, ready and connected to the replicated SAN disks? Do you re-attach the DBs? What about the SAN zoning to the disks? Do you already have fibre laid to the SAN at the DR site?
From what I can see, you can replicate an active DB across to another SAN, but you can't have any SQL instance connect to the replica. Disk mirroring basically just replicates the contents across.
I appreciate you sharing your experience; it helps me think of different approaches. I'm very interested to know how you would bring up your DBs in the layout you've mentioned.
There are many ways to write the connection string, but below is one example. In this example, 'ServerA' is the principal, 'ServerB' is the mirror, and AdventureWorks is the database, using Windows Authentication to connect to the SQL Servers:
"Data Source=ServerA;Failover Partner=ServerB;Initial Catalog=AdventureWorks;Integrated Security=True;"
Below are some more helpful links that might answer most of your questions/concerns.
http://msdn2.microsoft.com/en-us/library/ex6y04yf(vs.80).aspx (What's new in ADO.net 2.0)
http://msdn2.microsoft.com/en-us/library/5h52hef8(VS.80).aspx (Connection String Info)
http://msdn2.microsoft.com/en-us/library/system.data.sqlclient.sqlconnection.connectionstring(VS.80).aspx (SqlConnection.ConnectionString Property )
I attempted to implement DM on a 3-box workgroup using the GUI wizard, but failed, apparently because the wizard's logic does not work for workgroups. So next I tried Michael Merriil's excellent script for non-domain DM. Everything worked fine (endpoints, certificates, etc.) until it came to actually setting up the mirroring; then it failed again.
I am wondering if it has to do with the presence of named instances. For example, the primary is called jaguar\sql05dev. The primary box also has another instance, jaguar\sqlexpress. The other boxes also have named instances, although only one SQL 2005 instance exists on each. Anyway, I am suspicious of
alter database aspnetdb set partner = 'TCP://192.168.1.4:5022', as there is no named-instance specification in it.
How do I specify a named instance with the IP address notation?
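For what it's worth, the partner address never carries an instance name: mirroring endpoints are addressed by TCP port, and each instance on a box must listen on its own distinct port, so the port in the TCP:// address is what identifies the instance. You can check which port each instance's mirroring endpoint uses with a catalog query like this:

```sql
-- Run on each instance to see its mirroring endpoint's port and state
SELECT name, port, state_desc
FROM sys.tcp_endpoints
WHERE type_desc = 'DATABASE_MIRRORING';
```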
That's correct: at the DR site there is an identical server where all volumes are replicated at the disk level from one SAN to another over two OC3 pipes using SRDF/A. The DR server does not have the disks mounted; they are unmounted, and a series of EMC commands is run to mount the volumes. The DR SQL Server is configured as follows:
One-time setup: use a temporary volume to install the SQL Server service using the same instance name and path as the primary site. Set the SQL Server service to manual at the DR site, remove the temporary volume, and set up SRDF/A to replicate all volumes except the OS; this includes the system databases (master, etc.). The replicated volumes remain unmounted.

In the event of a disaster, or during the several annual tests, run the EMC scripts to mount the volumes, start the SQL Server service, and swing the DNS entry. When SQL Server comes online, it's a mirror image of the primary-site SQL Server. The actual time it takes to mount the volumes is almost instantaneous. The disk replication is within one minute of the primary site and is set up to be consistent across a set of disks. The particular setup we are running is nothing special, in that you can have EMC design an identical solution.

Here's one more cool thing about using SAN-based replication: since it's a SAN-based solution, we use the same technique for all of our servers, including Unix, Windows, web, app, and of course SQL Server. On the web and app servers we maintain dark servers and replicate the boot partition, i.e. the C drive. Because of the MS cluster setup at the primary site and the lame practice of storing IP addresses as part of the cluster configuration, we cannot replicate the boot partitions of the SQL Servers. We have replicated boot partitions for non-clustered SQL Servers in the past; however, I no longer have any non-clustered SQL Servers.