Win2003/SQL 2000 Clustering failover

  • We're installing an active/active cluster for the first time.  We've read several posts/articles.  However, there is still confusion regarding failover and resources.

    We have two servers, A and B, each with a named instance of SQL.  The cluster has three logical drives, one for the quorum and one for each SQL instance (Q, D and E).  If Server A fails and the resources are moved to Server B, how does the SQL instance on Server B *know* of the active databases located on ServerA/drive D?

    If I fail the service on Server A, I see the resource (Drive D) move to the active node, Server B.  However, I don't understand how the databases are accessed from Server B once the failover occurs?  My guess is Server B has copies of the databases from Server A which are updated through some sort of log shipping replication.

    Any help is greatly appreciated.  Thanks,

     

    d.

  • First off, let's get the terminology consistent.  You have ServerA and ServerB (physical servers) in a cluster, then you have two virtual servers (instances of SQL Server) that are independent of both each other and either of the physical servers.  Instance1 can run on either ServerA or B and the same goes for Instance2.

    So, if you normally have Instance1 running on ServerA and Instance2 running on ServerB, and ServerB fails, the disks that Instance2 is installed on are moved over to the control of ServerA and Instance2 begins running on ServerA.  Instance1 doesn't know anything about Instance2's databases or anything else about it.  ServerA does, and allocates resources to it, but the two instances are blissfully unaware of each other.
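As a rough illustration of the paragraph above, here is a toy model in Python. It is only a sketch of the idea, not how the cluster service is implemented: the `ClusterGroup` class, the helper functions, the virtual server names `VS1`/`VS2` and the IP addresses are all made up for the example. Each group bundles an instance with its disk, network name and IP, and the whole group changes owner on failover.

```python
from dataclasses import dataclass


@dataclass
class ClusterGroup:
    """One virtual server: the resources that fail over together (toy model)."""
    instance: str        # SQL Server instance name
    network_name: str    # virtual server name clients connect to (hypothetical)
    ip_address: str      # virtual IP address (hypothetical value)
    disk: str            # shared disk holding this instance's databases
    owner: str           # physical node currently running the group


def fail_node(groups, failed, nodes):
    """Move every group owned by the failed node to a surviving node."""
    survivors = [n for n in nodes if n != failed]
    if not survivors:
        raise RuntimeError("no surviving node to take over")
    for group in groups:
        if group.owner == failed:
            group.owner = survivors[0]
    return groups


def instances_on(groups, node):
    """Instances currently running on a physical node."""
    return sorted(g.instance for g in groups if g.owner == node)


def disks_visible_to(groups, instance):
    """An instance only sees the disk in its own group, wherever it runs."""
    return [g.disk for g in groups if g.instance == instance]


nodes = ["ServerA", "ServerB"]
groups = [
    ClusterGroup("Instance1", "VS1", "10.0.0.11", "D:", owner="ServerA"),
    ClusterGroup("Instance2", "VS2", "10.0.0.12", "E:", owner="ServerB"),
]

fail_node(groups, "ServerB", nodes)
print(instances_on(groups, "ServerA"))        # both instances now run on ServerA
print(disks_visible_to(groups, "Instance1"))  # Instance1 still sees only its own disk
```

Note that nothing in Instance1's group changes during the failover; only the owner of Instance2's group does.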

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • Then how does failover occur if Server A is *blissfully* unaware?  From what I'm seeing, even though Server A is running the resources of Server B after the failover, the SQL Server instance on Server A doesn't see the second disk drive, Drive E.  If I create a dependency between Drive E and the SQL Server instance on Server A using Cluster Admin, the instance can see and access the drive.  I'm able, at this point, to attach the databases to the instance.  However, this places the cluster in what appears to be an active/passive mode, rather than active/active.

    How does a single application database on ServerA/InstanceA failover to ServerB/InstanceB when the cluster is configured in an active/active mode? 

  • ServerA isn't unaware... Instance1 is.  That's why it's important to maintain consistency in terminology.  Instance1 will not see the drive that Instance2 uses; it can't, or else the two instances would never be able to run on separate servers.

    You have to realize that it isn't just the disks and the instance of SQL Server that are moved to ServerA; it's also the name and IP address.  So an application that is configured to connect to Instance2 (NOT ServerB) doesn't know or care which physical server is responding to the request; all it knows is that it gets a response from Instance2.

    There is no need to reconfigure dependencies or detach/attach databases.  After the failover, Instance2 should start up and be available with no manual intervention.  Once ServerB is back up and running you can manually move Instance2's cluster group(s) back over to ServerB.
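To make the "connect to the instance, not the server" point concrete, here is a small Python sketch that builds an ODBC connection string. The helper functions and the names `VSQL2`, `Instance2` and `AppDb` are placeholders invented for the example; substitute your own virtual server name, instance name and database. The only thing it is meant to show is that the physical node name never appears in what the client uses.

```python
def sql_server_target(virtual_server: str, instance: str) -> str:
    """Return the 'virtualserver\\instance' name a client should connect to."""
    return f"{virtual_server}\\{instance}"


def odbc_connection_string(virtual_server: str, instance: str, database: str) -> str:
    """Build an ODBC connection string that uses Windows authentication."""
    target = sql_server_target(virtual_server, instance)
    return (
        "DRIVER={SQL Server};"
        f"SERVER={target};"
        f"DATABASE={database};"
        "Trusted_Connection=yes;"
    )


# Placeholder names: the virtual server and instance, never ServerA or ServerB.
conn_str = odbc_connection_string("VSQL2", "Instance2", "AppDb")
print(conn_str)
```

Because the name and IP address move with the group, the same string should keep working whichever node currently owns the instance.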


  • First, I have to say THANK YOU!!! for taking the time to help me through this! 

    O.k.  Let's see if I have it now....

    The configuration is

    Server A > Instance A > Disk Drive D:

    Server B > Instance B > Disk Drive E:

    If Server B fails, Instance B begins executing on Server A.  And since the resources (Disk E, name and IP) have also been moved, everything continues working.  Correct?

    d.

     

  • You got it.  The most common source of confusion in clustering is that people tend to speak of servers and instances as the same thing.  Even in a non-clustered environment they are not the same thing, but that confusion gets really ugly when you try to grasp clustering operations.

    Glad I could help.


  • Woo hoo!!!!!  I got it!!!!! 

    Bummer!  It doesn't work.

    When I take Server B offline, I see the resources move.  However, I'm not able to access the instance.  We've had many, many problems with the install.  Could the problem be with how we named the instances?  We installed a different virtual server and instance name with each.

    Example:

    Cluster1 < Virtual Server

    DBCluster1 < Instance name

    Cluster2 < Virtual Server

    DBCluster2 < Instance name

    d.

  • ALL the cluster resources move?  and does the DBCluster2 instance start?  You should be able to see it in the Services applet. 


  • "ALL the cluster resources move?"  Yes.  I see all the resources move.
     
     
    "and does the DBCluster2 instance start?  You should be able to see it in the Services applet."  No.  The instance fails.  I do not see it in the Services applet.  I don't remember ever seeing a second instance in the Services applet.  Problem with the install?  We followed the instructions in the BOL.
     
    d.
     
     
  • Does anyone have any idea what would cause the following errors?  The documentation I've found states the newest service pack must be installed.  I've done that.  I've rebuilt both nodes and re-installed SQL Server and SP3a. 

    It appears that the services aren't being created on the failover nodes.  I can see the resources move to the other node.  I see where the logical drive, name and IP come back online.  However, the services for SQL Server, the agent and full text search fail.  The error messages below are from the Event Viewer.

    "The description for Event ID ( 17052 ) in Source ( MSSQL$DBCLUSTER2 ) cannot be found. The local computer may not have the necessary registry information or message DLL files to display messages from a remote computer. You may be able to use the /AUXSOURCE= flag to retrieve this description; see Help and Support for details. The following information is part of the event: [sqsrvres] OnlineThread: Error 2 bringing resource online."

    "The description for Event ID ( 17052 ) in Source ( MSSQL$DBCLUSTER2 ) cannot be found. The local computer may not have the necessary registry information or message DLL files to display messages from a remote computer. You may be able to use the /AUXSOURCE= flag to retrieve this description; see Help and Support for details. The following information is part of the event: [sqsrvres] OnlineThread: RegOpenKeyExW failed (status 2)"
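One detail that may help when reading these messages: `RegOpenKeyExW` returns a Win32 system error code, and code 2 is `ERROR_FILE_NOT_FOUND`, which for a registry call means the key it tried to open does not exist. The Python sketch below just pulls the number out of the `[sqsrvres]` text and looks it up in a small hand-written table. The function name, the regular expression and the table are made up for the example, and treating the "Error 2" in the first message as the same kind of code is an assumption.

```python
import re

# Small subset of Win32 system error codes (values from winerror.h).
WIN32_ERRORS = {
    2: "ERROR_FILE_NOT_FOUND",
    3: "ERROR_PATH_NOT_FOUND",
    5: "ERROR_ACCESS_DENIED",
}

# Matches the number after "status" or "Error" in a [sqsrvres] message.
STATUS_RE = re.compile(r"\[sqsrvres\].*?(?:status|Error)\s+(\d+)")


def decode_sqsrvres_status(message):
    """Return (code, symbolic name) for a [sqsrvres] message, or None."""
    match = STATUS_RE.search(message)
    if match is None:
        return None
    code = int(match.group(1))
    return code, WIN32_ERRORS.get(code, "unknown (not in this table)")


messages = [
    "[sqsrvres] OnlineThread: Error 2 bringing resource online.",
    "[sqsrvres] OnlineThread: RegOpenKeyExW failed (status 2)",
]
decoded = [decode_sqsrvres_status(m) for m in messages]
for message, result in zip(messages, decoded):
    print(message, "->", result)
```

If that reading is right, the resource DLL is failing because a registry key it expects is missing on the node that is trying to bring the instance online.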

    Any ideas?

     

    d.

     

  • Are both nodes (servers) in the possible owners for the SQL services?


  • Yes.  Both nodes (DevDBCluster1 and 2) are listed as possible owners of all the resources.

    I found this KB doc: http://support.microsoft.com/default.aspx?scid=kb;en-us;815431&Product=sql2k.  I experienced this when installing Service Pack 3.  I uninstalled and re-installed SQL and the service pack.  I can see in the registry where the keys have been created on ServerA for InstanceB.  However, I do not see an entry in the Services applet for InstanceB.

    d.

  • And yes...  I still get the same errors when I test the failover.

  • Finally!  A solution!

    After much digging, I found the following articles. 

    KB# 258750

    KB# 815431

    Everything is working now!

  • We also receive the following on a SQL Server 2000 active/active installation.  Do you remember what you did to remedy the situation relative to KB article 258750 or 815431?

    "The description for Event ID ( 17052 ) in Source ( MSSQL$DBCLUSTER2 ) cannot be found. The local computer may not have the necessary registry information or message DLL files to display messages from a remote computer. You may be able to use the /AUXSOURCE= flag to retrieve this description; see Help and Support for details. The following information is part of the event: [sqsrvres] OnlineThread: Error 2 bringing resource online."

    "The description for Event ID ( 17052 ) in Source ( MSSQL$DBCLUSTER2 ) cannot be found. The local computer may not have the necessary registry information or message DLL files to display messages from a remote computer. You may be able to use the /AUXSOURCE= flag to retrieve this description; see Help and Support for details. The following information is part of the event: [sqsrvres] OnlineThread: RegOpenKeyExW failed (status 2)"
