SQLServerCentral Article

SQL 2008 R2 Cluster Setup: Easy? Not So Much.

,

The problem

A few days ago I was adding the third instance to a two-node SQL Server 2008 R2 EE on a Windows 2008 R2 EE failover cluster, when near to the end of the setup process of the first node, the setup.exe crashed, leaving the installation in an incomplete state.

I tried to run the repair tool but it would not work and I received the "the installation has never been configured" message.

A similar crash of the setup.exe happened to me once before, but as it was the first instance setup, I preferred restoring the image of the server that had been made just after having finalized the Windows cluster configuration.

This time it was different: I already had two working instances running and I did not want to start from scratch nor cause downtime, etc.

Uninstalling a specific SQL Server Instance is not very easy, even when everything goes the right way, removing a node from a Cluster could be even more challenging. I thought I had to try to manually repair the installation of the first node and then try to add the second node to the cluster.

Evaluating the situation

I started looking at how dramatic the situation could be: what had been installed, what would run and what would not. The SQL Server and SQL Server Agent services where present in the SQL Server Configuration Manager, but they were not started.

While I could find the SQL Server resource in the Cluster Manager, the SQL Agent was nowhere. I tried to bring the Cluster resource online; Network, Disks and SQLServer started gracefully. I felt some relief, the fact that the cluster resource was online gave me some hope!

I went back to the SQL Server Configuration manager and I tried to start the SQL Server Agent service; it fell back, so I went to the Application and System Event Logs and I found the errors logged by the SQL Server Agent service while it was trying to start. It cried a lot, in fact, saying that it could not find the SQL instance it belonged to.

Some of the error messages were:

SQLServer Error: 65535, SQL Network

Interfaces: Error Locating Server/Instance Specified [xFFFFFFFF].

And

Logon to server 'ClusterResource\InstanceName'

failed (DisableAgentXPs)

Strangely, the cluster resource and the instance names were correct, and after starting SQL Server I had successfully connected to it through SSMS.

Making the SQL Server Agent service start

I googled around and found this illuminating post that explained where to find the Registry keys, which I was sure I would have to tweak:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10_50.InstanceName\SQLServerAgent]

I was told to complete the ServerHost key with its value:

"ServerHost"="ClusterResourceName\\InstanceName"

Again I tried to start the Service from SQL Configuration Manager and this time it started gracefully. So first step was done, but now I had to add the service to the Cluster resource.

Adding the SQL Server agent service to the cluster

Now to add a specific service to a Cluster resource. I will not be doing it by using basic Cluster.exe syntax like:

cluster restype "SQL Server Agent" /create /DLL:sqagtres.dll

as I have found in a very useful MSDN forum post. This would be wrong, because I already had a generic "SQL Server Agent" resource in the cluster (remember, I already had two SQL Server instances installed). Plus I needed to add the specific SQL Server Agent that was part of the instance I was installing (i.e. "SQL Server Agent (InstanceName)").

Instead, these are the steps I took:

First, in the Failover Cluster Management I used Add Resource to add the SQL Server Agent resource. I named the resource as “SQL Server Agent (InstanceName)” after the real instance name.

Right-clicking on the SQL Agent resource, I went to Properties and filled in values for the two parameters:

  1. The Virtual Server name of the SQL instance
  2. The Instance name

Now I added the SQL Server Resource as a dependency for the newly created SQL Server Agent resource.

But one more magic step was necessary.

Making the system believe that the features were correctly installed and configured

Without this operation all the above steps will fail. I also thought that if I had found this tweak before, I might have tried to repair the installation with the repair tool from the very start!  

I opened the Registry Editor and browsed to the following location:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft Microsoft SQL Server\MSSQL10.INSTNAME\ConfigurationState

This is a named instance. Here you have to make sure that the values for each of the keys present are 1. My values for all the four features listed were "2", so I changed them all. This will set the configuration state for all SQL features as "installed and configured". A value greater than 1 indicates either an install failure or configuration failure.  

Bringing online the cluster resource

It was time to try to bring the resource online, and it worked perfectly. I checked for eventlog errors but could find none. It was time to install the second node.

Adding a node to a SQL Cluster

Well, believe it or not, I really thought it would have been a straightforward operation, now that I'd patched my setup on the first node. Of course, I was wrong.

During the simplified "add node to existing cluster" SQL setup process, Setup.exe asks less questions because once you have agreed that you are adding a node to the proposed instance, you're just asked for the passwords of the SQL Server and of the SQL Server Agent service accounts (those service accounts being read-only as they were of course already defined during the first node setup).

So what a surprise when setup.exe asked me for the password of an SQL server Agent's EMPTY service account! I sighed, cancelled the setup and went back to the first node. What to do next?

Well, I found out that it was my fault! I had not executed the fourth step that was advised in the MSDN forum: i.e. running a repair of the failed SQL Server installation!

This time the repair went straightforward. I launched the setup of the second node again and everything went perfectly. I then could do the usual post-setup tests and configuration operations and I am now the happy Administrator of my 16th clustered SQL Server instance!

   

Resources:

Let SQL Server Agent service start

http://www.eggheadcafe.com/software/aspnet/32049165/i-surrender-sql-server-agent-fails-to-start-2-node-cluster.aspx

SQL Server Agent not listed as cluster resource (in MSDN SQL Server forums)

http://social.msdn.microsoft.com/Forums/en-US/sqlsetupandupgrade/thread/5049fc44-0f44-4caf-9c5a-282edf30d878

How to: Repair a Failed SQL Server 2008 Installation (in MSDN library)

http://msdn.microsoft.com/en-us/library/cc646006.aspx

(Originally published by myself on my blog at

Rate

4.67 (12)

You rated this post out of 5. Change rating

Share

Share

Rate

4.67 (12)

You rated this post out of 5. Change rating