AWE in a mixed memory cluster

  • I have an active/passive SQL cluster in which the failover node and one of the active nodes will be upgraded to 8GB of memory - the other two active nodes will remain at 4GB each.

    Are there any issues to be concerned about if one of the 4GB nodes fails over to the 8GB failover node and then back again?

    I'm going to configure AWE on the two 8GB nodes (with the /PAE switch???). The 4GB nodes will only have the /3GB switch set in the boot.ini file.

    If you're wondering why the mixed memory mode... it's because the project is budget-constrained and memory is expensive.

    Thanks!

    Glenn

  • Your problems will arise if your SQL Server is running in a state where it consumes the 8GB of memory on its node and that node then fails! The instance will then go onto one of the other nodes, do what SQL Server does best and consume all the memory of the 4GB node, start paging, and grind your second server to a halt!

    Basically, if your SQL Server is capable of consuming 8GB of memory, then all nodes in the cluster need 8GB... if not, then there's no real point in doing the upgrade! Plus I'm not sure whether asymmetric hardware is supported on an MS cluster... regardless of whether it works or not!?

  • I'm not sure that I understand your comments...

    In an active/passive cluster an active node will fail over to the designated failover node (not to another active node). So if the failover node is configured with 8GB it should be able to accommodate any of the nodes in the cluster.

    Additionally, since the SQL instances on the 4GB nodes will NOT have AWE configured, failing them over to the 8GB passive node should simply mean that each SQL instance will at worst use 1.728GB - if the /3GB switch is not used on the failover node.

    Did I miss something?

    Glenn

  • If I understand this right, you're proposing:

    A 4-node cluster:

    • Node A: active primary, 8GB RAM, AWE enabled, /PAE
    • Node B: passive failover, 8GB RAM, AWE enabled, /PAE
    • Node C: active secondary, 4GB RAM, /3GB
    • Node D: active "tertiary?", 4GB RAM, /3GB

    First, don't use the /3GB switch on either of the 8GB nodes, even for failover, as it isn't friendly with over 4GB of RAM.

    Second, any node of the cluster can be failed over to, unless it's specifically configured to have only the "failover" node (B) as its failover target (via setting preferred owners for the resources, etc.).

    That being said, is the memory being dynamically allocated, or are you setting hard limits? If dynamic, this scenario should be fine. If you're setting hard limits, you may run into issues in a multiple-failure scenario, if more than one node fails over to the "B" node at the same time (and the instances are under heavy load/use).
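
    For what it's worth, a quick way to check which mode an instance is in is to look at its memory settings with sp_configure - a rough sketch; the option names are the standard ones, and a max of 2147483647 is the default, i.e. no hard cap:

        -- Run on each instance; 'show advanced options' is needed to see the memory settings
        EXEC sp_configure 'show advanced options', 1;
        RECONFIGURE;

        EXEC sp_configure 'min server memory (MB)';   -- default 0
        EXEC sp_configure 'max server memory (MB)';   -- default 2147483647 = no hard cap (dynamic)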

  • Your assessment is correct. Thanks for the heads up on the /3gb switch.

    We have configured the cluster so that any active node can fail over only to the designated passive node (I assumed that was the definition of an active/passive cluster, and that active/active cluster configurations were capable of "selective" failover - sorry, I'm still new to the world of SQL clusters...).

    Also, I wasn't aware that AWE allowed for dynamic memory allocation. I thought I had read that AWE requires a hard upper-limit specification and that SQL will consume that amount of memory - which can't be changed unless the SQL instance is restarted after a new AWE configuration???

    It would seem that in any SQL cluster you would want to create a failover node that could accommodate a worst-case scenario (in our case, max the failover node out at 16GB of memory in the event that all active nodes were to fail over). Of course there is an ROI/risk trade-off to be considered...

    Thanks!

    Glenn

  • OK, thanks for the clarification.

    Worst-case scenario: only the failover (B node) is running.

    With hard-configured memory you've got to allocate chunks of memory to each instance running on the machine (if you've got 8GB of RAM, you have to divvy it up so that the sum of the max server memory settings is <= the total memory in the system, remembering that the OS and any other apps need ~2+GB) - see the sp_configure sketch at the end of this post for an illustrative split. This is to keep the instances from "competing" for memory resources. The downside is that when running in a healthy state, you're not using the full capacity of the server hardware.

    It sounds odd, because when everything is healthy the instances should ideally have access to as much memory as possible on their machines.

    This is probably where things will get awkward in your scenario: because you've got disparate source systems, it's hard to configure for a true worst-case event - you want to maximize your prod cluster while allowing the whole environment to "play nice" in the case of a failure.

    A failover of this type may need intervention to set the min/max memory settings to a "fair share" of the available resources during said event. Otherwise you're going to see contention between instances if the memory levels are left at full production capacity. [EDIT] Also, you will probably have to set AWE for the non-AWE systems, if you expect to use the higher memory space during the event. [/EDIT]

    In a multi-instance prod cluster, we've typically either limited the sum total so the instances can coexist on a single system (limiting both to 3-4GB of RAM), or defined one or two instances as non-mission-critical (i.e., in a true failover event, only the critical instances stay up and running; the non-critical ones are either severely resource-limited (very low max server memory, etc.) or just plain disabled until we can bring the failed node back up). The client always needs to identify which systems are which (critical/non-critical), and so far they have been pleased during failure testing that core business has been running at full speed.

    We've also tossed around the idea of a massively sized RAM failover node (the 16GB suggestion above), but so far we've been able to work within budget.
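
    To put rough numbers on the "divvy it up" point above, here's an illustrative split for an 8GB node that ends up hosting two instances during a failover. The values are assumptions, not recommendations - the idea is just that the caps plus the OS overhead fit inside physical RAM:

        -- On the AWE-enabled production instance (run in that instance):
        EXEC sp_configure 'show advanced options', 1;
        RECONFIGURE;
        EXEC sp_configure 'max server memory (MB)', 4096;   -- ~4GB for the primary workload
        RECONFIGURE;

        -- On the failed-over instance from a 4GB node (run in that instance):
        EXEC sp_configure 'show advanced options', 1;
        RECONFIGURE;
        EXEC sp_configure 'max server memory (MB)', 2048;   -- ~2GB, leaving ~2GB for the OS and anything else
        RECONFIGURE;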

  • Thanks for the input! This is a tough question/decision.

    I assume this would still be an issue even if we bit the bullet and brought all of the nodes in the cluster up to 8GB. We would still need to consider what to do in a multiple-failover event - except that with each of the active nodes expecting/working with ~6.5GB under normal running conditions, on failover there would be considerable competition for the ~6.5GB on the failover node.

    So... we want to maximize the normal running state to improve performance and get the biggest bang for the buck, but we need to scale back the normal running-state memory consumption (as specified by the AWE configuration) in order to ensure that, in a multiple-failover scenario, the SQL instances play nicely together until we can get them back to the active nodes.

    There's nothing in AWE and/or SQL cluster configuration that allows us to pre-configure for various failover scenarios?

    Thanks!

    Glenn

    PS - just to make sure that I understand the 8GB configuration...

    • Remove the /3GB switch from boot.ini
    • Add the /PAE switch to boot.ini
    • Using sp_configure, enable AWE on the server
    • Set the max server memory (again using sp_configure)

    Is that about it????
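
    In sp_configure terms I'm picturing something roughly like this on each 8GB node, once the boot.ini change is in and the box has been rebooted (the max memory number is just a placeholder, not a recommendation):

        -- enable advanced options so the AWE/memory settings can be changed
        EXEC sp_configure 'show advanced options', 1;
        RECONFIGURE;

        -- enable AWE (only takes effect after the SQL Server service is restarted)
        EXEC sp_configure 'awe enabled', 1;
        RECONFIGURE;

        -- cap the buffer pool, leaving headroom for the OS (value is illustrative)
        EXEC sp_configure 'max server memory (MB)', 6144;
        RECONFIGURE;

    ...and then restart the instance so the AWE setting kicks in?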

  • You are correct - it's a trade-off however it's handled, unless you have the resources to create a massive failover node (which, by definition, does nothing most of the time).

    So far, I haven't found any pre-configuration options for this. MS seems to assume you have sufficient resources on both sides of the cluster to handle whatever you're throwing at it. Your only option for maximizing the non-failover nodes while still allowing them to play together is a brief hiccup while you reconfigure the memory allocations to share the failover node (see the PS below). Otherwise you're pre-configuring lower-than-normal memory limits to meet your DR needs.

    And your PS steps look right. You may be able to combine removing /3GB and adding /PAE into one step, *if you are willing to risk multiple changes at one time*. I like doing things separately, in case something "breaks".

    Thanks,

    D
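
    PS - to be clear on what that "brief hiccup" looks like in practice, it's basically re-running sp_configure against each instance that ends up sharing the failover node. A rough sketch (the values and instance roles are assumptions, not recommendations):

        -- Assumes 'show advanced options' is already enabled on each instance, as above.

        -- On the instance that normally owns the 8GB node: shrink its cap to make room
        EXEC sp_configure 'max server memory (MB)', 4096;
        RECONFIGURE;

        -- On the failed-over instance, if you want it to address more than ~2GB on this node:
        EXEC sp_configure 'awe enabled', 1;      -- only takes effect after a service restart
        RECONFIGURE;
        EXEC sp_configure 'max server memory (MB)', 2048;
        RECONFIGURE;

    Keep in mind 'max server memory' is normally dynamic (RECONFIGURE is enough), but 'awe enabled' isn't, and an instance that already has AWE memory committed may need a restart before it actually gives memory back - hence the hiccup.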
