SQL Server 2012 AlwaysOn Groups and FCIs Part 1

  • Comments posted to this topic are about the item SQL Server 2012 AlwaysOn Groups and FCIs Part 1

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • Good work.

  • I currently work in a very small shop with a limited HA/DR plan. I have been looking at options to enhance our plan, and AlwaysOn seems really cool. I see the power of multiple nodes and a Listener that serves as the entry point into the group. Here is my question...what about the server hosting my business application? I see how I can make my databases very resilient, which is great. However, what happens when my application server blows up? Am I stuck with downtime until I can bring up a new instance? Do the applications you work with all have built-in HA?

    Thanks!

  • heb1014 (2/10/2014)


    Here is my question...what about the server hosting my business application?

    Without knowing a bit more about the app, it's hard to say. Your first point of contact should be the vendor. A WSFC can host many kinds of clustered resources, generic services among them. Obviously you'd run this on separate cluster nodes from your AO replicas, which will increase the node count required for the Windows cluster, but hey, there's no such thing as a free lunch 😉

    heb1014 (2/10/2014)


    However, what happens when my application server blows up? Am I stuck with downtime until I can bring up a new instance? Do the applications you work with all have built-in HA?

    Thanks!

    It's down to the design of the application and what it will support; it may be possible to set your app up as a clustered resource in your WSFC.

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • Good article.

    IMO Microsoft has "one upped" Oracle with Always On over Oracle's RAC solution.

    I am a multi-platform DBA. I administer both SQL Server with Always On and Oracle RAC. SQL with AO is a far less complex, less expensive, and more resilient solution than Oracle RAC. RAC does not deal at all with the storage being a SPOF: the storage is shared between all nodes of the cluster simultaneously. On numerous occasions I have experienced momentary hiccups of the storage server, and that always causes the entire cluster to go down. So much for being highly available.

    With SQL + AO I have never experienced that because unlike RAC it is shared-nothing. IMO RAC is a scalability solution even though Oracle touts it as the ultimate high availability solution. It is not. SQL + AO is both scalable and highly available.

  • chuck.hamilton (2/10/2014)


    I am a multi-platform DBA.

    As am I

    chuck.hamilton (2/10/2014)


    With SQL + AO I have never experienced that because unlike RAC it is shared-nothing.

    That's not completely true, and it's exactly what this article seeks to discuss. If you introduce a failover clustered instance (FCI) of SQL Server into the AO group, there is still shared storage, and therefore a SPOF, somewhere in the cluster.

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • That's not completely true, and it's exactly what this article seeks to discuss. If you introduce a failover clustered instance (FCI) of SQL Server into the AO group, there is still shared storage, and therefore a SPOF, somewhere in the cluster.

    With SQL and AO, give me a scenario where some shared storage goes offline and it renders your data unavailable.

    If the quorum disk goes offline, who cares? At worst the cluster loses quorum and the default behavior is to leave all resource groups in the cluster running. They simply remain frozen on whatever node they were on until quorum can be reestablished and the cluster service can be restarted on all the nodes. In the meantime, the AO listener switches connections over to a replica in a matter of milliseconds - automatically.

    With RAC, if your shared disk goes offline for even a millisecond, you've got to restart the entire cluster. That is a time-consuming, completely manual operation. I've never had a situation where I could simply restart CRS, or even just reboot the nodes. They need to be completely power cycled or CRS won't start correctly. That process takes 30-40 minutes per node, and they must be power cycled one at a time.

    When this happens at 3am I'd much rather let AO switch connections over to a replica automatically while I sleep, than get woken up and spend the next hour or two bringing the RAC cluster back up.

  • chuck.hamilton (2/10/2014)


    With SQL and AO, give me a scenario where some shared storage goes offline and it renders your data unavailable.

    I never said it did; you may want to review my initial response. I pointed out that bringing an FCI into an AO group still carries a dependency on shared storage and introduces a SPOF. AO does help to protect against this, but it's important to get the cluster quorum configuration correct from the outset.

    chuck.hamilton (2/10/2014)


    If the quorum disk goes offline, who cares?

    If your cluster has no shared storage, how do you manage to have a quorum disk witness?

    Unless your cluster is "same site" and has access to shared storage, you should use an alternative quorum configuration (based on your site config).
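
    As a rough illustration of the point above, the choice of quorum model can be sketched as a small decision function. This is only a rule-of-thumb sketch and not anything WSFC exposes: the function name `suggest_quorum_model` and its parameters are hypothetical, and the logic assumes the common guidance of Node Majority for an odd node count, a disk witness when every node can see shared storage, and a file share witness otherwise.

```python
def suggest_quorum_model(node_count: int, shared_storage_on_all_nodes: bool) -> str:
    """Suggest a quorum model from node count and storage layout (rule of thumb only)."""
    if node_count < 1:
        raise ValueError("a cluster needs at least one node")
    # An odd number of voting nodes can always form a majority without a witness.
    if node_count % 2 == 1:
        return "Node Majority"
    # An even node count needs a tie-breaking witness vote.
    # A disk witness is only possible if every node can reach the shared disk.
    if shared_storage_on_all_nodes:
        return "Node and Disk Majority"
    # No common shared storage (e.g. multi-site): fall back to a file share witness.
    return "Node and File Share Majority"


if __name__ == "__main__":
    # Hypothetical four-node, multi-site cluster with no storage common to all nodes.
    print(suggest_quorum_model(4, shared_storage_on_all_nodes=False))
```

    The actual choice still depends on your site layout, so treat the output as a starting point for discussion rather than a recommendation.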

    chuck.hamilton (2/10/2014)


    At worst the cluster loses quorum and the default behavior is to leave all resource groups in the cluster running. They simply remain frozen on whatever node they were on until quorum can be reestablished and the cluster service can be restarted on all the nodes. In the meantime, the AO listener switches connections over to a replica in a matter of milliseconds - automatically.

    Erm, no. What I think you're describing here is a loss of multiple nodes resulting in the cluster going offline, correct? If this happens, the listener will not be able to fail over, since it's a cluster resource.

    Now, clustering under Windows Server 2012 has a new HA protection feature called "Dynamic Node Weight Configuration". This feature rebalances the quorum votes to account for a graceful shutdown of multiple nodes. The problem is that it will not protect against a disaster or sudden outage. I'll cover more of the cluster requirements in Part 2.
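
    To make the difference between a graceful shutdown and a sudden outage concrete, here is a toy model of majority voting. It is a deliberately simplified sketch, not the actual WSFC algorithm: the function names are hypothetical, every node is assumed to hold one vote, no witness is configured, and vote rebalancing is assumed to happen only when a node leaves in an orderly way.

```python
def has_quorum(online_votes: int, total_votes: int) -> bool:
    """Majority rule: strictly more than half of the configured votes must be online."""
    return online_votes > total_votes // 2


def survives_graceful_shutdowns(nodes: int, shutdowns: int) -> bool:
    """Nodes leave one at a time; the departing node's vote is removed each time."""
    total = nodes
    for _ in range(shutdowns):
        total -= 1  # vote is removed, so the majority threshold shrinks too
        if total == 0 or not has_quorum(total, total):
            return False
    return True


def survives_sudden_outage(nodes: int, failed: int) -> bool:
    """Nodes vanish together; the configured vote count is never rebalanced."""
    return has_quorum(nodes - failed, nodes)


if __name__ == "__main__":
    # Five nodes, three shut down gracefully one after another: quorum is kept.
    print(survives_graceful_shutdowns(5, 3))  # True
    # Five nodes, three lost at the same instant: 2 of 5 votes is not a majority.
    print(survives_sudden_outage(5, 3))       # False
```

    In this model the same three-node loss is survivable when it is orderly and fatal when it is simultaneous, which is the distinction being drawn above.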

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • great article, can't wait for part 2
