VMware combined with MSCS Clustering

  • Any thoughts on whether an MSCS clustering strategy on top of VMware is overkill from an HA perspective? It is the current proposed design, but there has been much discussion as to whether this is the best approach.

    There are conflicting opinions on the value of MSCS with regard to VMware failures. The proposed advantage of this configuration was to provide failover to another VM should the primary VM fail. Likewise, the VM could be patched using the MSCS failover mechanism to minimize patching downtime.

    The VMware support team is questioning this advantage compared to the disadvantages. I am being told that VMware doesn't fail (which I have difficulty believing). Additionally, since hardware failures within the VM are handled by VMware, the MSCS configuration doesn't provide us any additional availability.

    My research indicates that an MSCS failover will be initiated by hardware failures, failure to connect to the disk, IP/network issues, and SQL Server non-response. (My summary - I have the detailed list from the white papers.) What isn't clear to me is which of these failures VMware addresses. Clearly not SQL Server non-response, though in that case a failover amounts to little more than a service recycle.

    I checked quite a few sites on this type of configuration – including Perry Whittle’s article on this site - http://www.sqlservercentral.com/articles/Clustering/66246/ and Brent Ozar's site - http://www.brentozar.com/ - both of which imply this is not recommended for a production environment. This configuration was in production at my last place of employment, and our clusters never failed over (unplanned, that is), but I was not part of the design at the time this approach was implemented.

    Also as a side-bar, there is not much MSCS expertise in-house. This is part of why I'm trying to determine if the hesitation is the configuration, in general, or the use of MSCS.

    The other configuration option on the table is using two physical machines with MSCS. My next point of research is which is more scalable - "SQL on non-MSCS on VM" or "SQL on MSCS with two physical machines".

    Any input is greatly appreciated.

    Thanks,

    Cindy

  • I would have to answer yes and no. It depends on what was purchased when VMware was set up. I am not a VMware guy, but we use it here, and we have software in place that can roll a virtual machine over to another virtual machine in very little time. So let's say the physical server goes down: within a couple of seconds the VM at the backup site would come online. Because our VM environment is already clustered, I would think that doing the same with SQL would definitely be overkill and could even cause problems. I could also see the advantage if you do not cluster your VM environment. So I think the first question to ask is: what is the failover plan for the VM?

    Dan

    If only I could snap my fingers and have all the correct indexes appear and the buffer clean and.... Start daydream here.

  • Hi Cindy

    Thank you for citing my article, it's good to know people do read them! Let's start at the beginning here.

    The first question, disregarding virtualisation for now, is: do you absolutely need clustered SQL Servers and, if so, how many nodes?

    ESX 4 now offers better support for RDMs, but remember that there are caveats when using MSCS under VMware:

    You can't vMotion the clustered VMs, and HA and DRS are excluded as well.

    You are limited to 2 nodes.

    The options you have are:

    A/ Cluster in a box, where both VMs are on the same host

    B/ Cluster across hosts, where each VM is located on a separate host

    C/ Physical to virtual, where one cluster node is physical and the other is a VM

    There are many different things to understand here, not least that if you are going to deploy a VM with huge resources and clustering, there is a very valid argument that the machine should not be a VM. The only valid scenario here, IMHO, is physical to virtual: you have a physical node and need a failover partner but don't want to dedicate more hardware. This is fine for that!

    Discuss, think, evaluate and then do it some more

    Check this link for more info

    MSCS ESX4

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • Perry,

    Thanks for the response (And yes - I do read your articles!)

    The first question is the biggest point of contention. I don't personally feel we need a cluster - VM or physical - which is the main reason for my post. The applications don't fit my definition of requiring high-availability, but I have been requested by "others" to create a clustered solution on VM. If the battle is lost, the clustered solution will be a two node active/passive configuration.

    In my discussions with the VMware team, we feel we could get almost the same availability by installing a stand-alone SQL Server instance on a VM. The VM should provide a high level of availability without the complexity and excess hardware required for the clustered solution.

    Therefore, I'm trying to determine whether there is any loss of availability from not using the cluster, and to quantify it. For example, if our single VM / single SQL Server solution provides 99%, does the cluster solution provide 99.999%, and is the cost justified for what little gain there is?
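    To put those two example figures in perspective, an availability percentage can be converted into expected downtime per year. This is only a minimal sketch: it assumes a 365-day year, and the percentages are the illustrative targets from the paragraph above, not measured numbers for any VMware or MSCS configuration. The function name is made up for the example.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year, 8760 hours


def annual_downtime_hours(availability_pct: float) -> float:
    """Expected unavailable hours per year at the given availability percentage."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


# Illustrative targets only; real figures would come from the VM team.
for pct in (99.0, 99.9, 99.99, 99.999):
    hours = annual_downtime_hours(pct)
    print(f"{pct}% -> {hours:.2f} hours/year ({hours * 60:.1f} minutes)")
```

    Under these assumptions, 99% allows roughly 87.6 hours of downtime a year, while 99.999% allows only about 5.3 minutes.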

    I have another meeting with the VM team tomorrow. I want to get their average availability numbers on virtual and physical machines. I'll also mention the caveats you described. No doubt they will mean more to them than they do to me, and I think one of them mentioned the "vMotion" one before.

    Do virtual machines provide higher availability than physical machines, or is the question not that simple? In my opinion, this project needs "reasonable" availability - recovering in a couple of hours, not recovering in a couple of seconds. It is the availability target that is pushing for the original clustered solution on VM.

    Thanks,

    Cindy

  • Hi Cindy

    If we're talking about a straight VM with standard virtual disks, then in my experience VMware HA is generally quicker and less fuss than MSCS!

    During my first encounters with ESX back in 2005, I performed a test where I had a user log on to a virtual server I had deployed. I then moved the live VM from one host to another without the user realising what I had done.

    Having said that, we had an incident here only yesterday with our SQL Server cluster where the active node's public NIC was lost and a failover occurred in the middle of the day. No one noticed (apart from a couple of services that had to be restarted on an application server).

    The first question is key: do you really need to go to all the trouble and expense of MSCS when VMware HA may do the trick?

    Regards

    Perry


  • Hi

    We have a similar situation in my current project, where we are planning to run SQL Server with MSCS on ESX. The main purpose of MSCS is to handle hardware failures, which can be covered by VMware HA instead. But what about guest OS related issues? If the OS of a standalone or cluster-node VM becomes corrupted unexpectedly and goes into a BSOD, how do we recover that particular VM?

    One alternative I know of is VMware FT for standalone VMs. But I have also heard that VMware FT is slower at recovering VMs than MSCS. Is that true?

    Please suggest whether to go for VMware FT or MSCS for guest failures.

  • phani112 (9/19/2012) wrote:

    Hi

    We have a similar situation in my current project, where we are planning to run SQL Server with MSCS on ESX. The main purpose of MSCS is to handle hardware failures, which can be covered by VMware HA instead. But what about guest OS related issues? If the OS of a standalone or cluster-node VM becomes corrupted unexpectedly and goes into a BSOD, how do we recover that particular VM?

    One alternative I know of is VMware FT for standalone VMs. But I have also heard that VMware FT is slower at recovering VMs than MSCS. Is that true?

    Please suggest whether to go for VMware FT or MSCS for guest failures.

    If your requirement is to mitigate failures at the guest OS level then yes, virtual clusters are ideal and work very well.

    The requirements have changed quite a bit between MSCS (Windows 2003 and earlier) and WSFC (Windows 2008 on). The main change that affects virtual clusters is the requirement for all shared storage to support SCSI-3 persistent reservations.

    RDMs are the best way to expose your SAN-based LUNs directly to the clustered VMs as shared storage.

    With the improvements in Windows 2008 and SQL Server 2012, AlwaysOn Availability Groups make a much more attractive HA solution.

    Check the Microsoft SVVP (Server Virtualization Validation Program) for what is and isn't supported in the way of hypervisors and other software.

    Set yourself up a test system first and use this to validate your requirements.

