SQL 2000 SP4 Cluster - Cluster Services overwriting registry parameters

  • quackhandle1975

    SSChampion

    Points: 10963

    The Setup:

    Windows Server 2003 Enterprise SP2 7GB RAM

    SQL Server 2000 SP4 + AWE Hotfix

    2 Node cluster Active/Passive

    10 SQL Instances Running

    The Issue:

    A SQL instance fails to come back online after being taken offline. In Event Viewer, the error states that the ERRORLOG can't be found. Checking the registry shows:

    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\GSQP038A\MSSQLServer\Parameters\

    SQLArg0 -dR:\Program Files\Microsoft SQL Server\MSSQL$GSQQ009A\data\master.mdf

    SQLArg1 -eR:\Program Files\Microsoft SQL Server\MSSQL$GSQQ009A\log\ERRORLOG

    SQLArg2 -lR:\Program Files\Microsoft SQL Server\MSSQL$GSQQ009A\data\mastlog.ldf

    From the above, the details are obviously incorrect: the instance name is different (GSQQ009A resides on the same cluster as GSQP038A), and the drive is wrong; the paths should be on O:\ and not R:\.
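    For anyone wanting to check this across several instances, here is a minimal sketch that flags startup parameters pointing at the wrong drive or at another instance's folder. It works on plain name/value pairs copied out of the Parameters key; `check_startup_args` is a hypothetical helper written for illustration, not part of any SQL Server or cluster tooling, and it assumes the default `MSSQL$<instance>` folder naming shown above.

```python
import re


# Hypothetical helper for illustration; not part of SQL Server or cluster tooling.
def check_startup_args(instance, expected_drive, sql_args):
    """Return a list of problems found in SQLArg-style startup parameters."""
    problems = []
    for name, value in sorted(sql_args.items()):
        flag, path = value[:2], value[2:]
        if flag not in ("-d", "-e", "-l"):
            continue  # only the master data, error log and master log paths
        drive = path[:2].upper()
        if drive != expected_drive.upper():
            problems.append(f"{name}: drive {drive} (expected {expected_drive})")
        # Assumes the default MSSQL$<instance> folder naming seen in the post.
        match = re.search(r"MSSQL\$([^\\]+)", path, re.IGNORECASE)
        if match and match.group(1).upper() != instance.upper():
            problems.append(f"{name}: path belongs to instance {match.group(1)}")
    return problems


# Values as found under the GSQP038A Parameters key.
args = {
    "SQLArg0": r"-dR:\Program Files\Microsoft SQL Server\MSSQL$GSQQ009A\data\master.mdf",
    "SQLArg1": r"-eR:\Program Files\Microsoft SQL Server\MSSQL$GSQQ009A\log\ERRORLOG",
    "SQLArg2": r"-lR:\Program Files\Microsoft SQL Server\MSSQL$GSQQ009A\data\mastlog.ldf",
}
for problem in check_startup_args("GSQP038A", "O:", args):
    print(problem)
```

    With the values above, every parameter is reported twice: once for the R: drive and once for the GSQQ009A folder.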

    When we update the registry with the correct details, the SQL service has to be started from the Services console (services.msc) rather than from Cluster Administrator. The service comes back online, but takes longer than it should.

    We then bring the resource back online via Cluster Administrator and all is fine. However, the registry parameters have by then changed back to the incorrect drive/instance, so the next time the instance is taken offline it will not come back online.

    So the problem lies (we think) with the MS Cluster Service overwriting this instance's registry keys with values from another SQL instance. This occurs on both nodes.

    Any ideas as to why this is happening?

    Thanks In Advance

    Scott

    Who looks outside, dreams; who looks inside, awakes. – Carl Jung.
  • quackhandle1975

    SSChampion

    Points: 10963

    A workaround was found: editing the registry values to the correct path/instance name before the SQL Server service was failed over. This works with no errors; however, why the issue occurred at all is still unknown.

    qh

  • ganci.mark

    Hall of Fame

    Points: 3482

    I am having a registry issue as well.

    Mine is different in that when I install SQL 2000 on new nodes added to the cluster, the key

    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\instancename\MSSQLServer

    is not set up completely and does not match the registry on the other nodes.

    Symptoms include missing keys, UNC paths instead of drive letters, the default port instead of the one we assigned, incorrect named pipes, etc.

    In this 4 node cluster there are 4 instances.

    2 of the 4 instances installed great and work just fine across all 4 nodes.

    The other 2 are giving trouble on the two new nodes.

    After many reinstalls, my last attempt will be to export the registry key from the 2 working nodes, apply it on the new nodes, and see if that resolves the issue when I fail over. I am hopeful, but right now it might be 50/50.

    I just hate that I have to do this reg hack.

    I think it is somehow related to communication with the cluster resources, but I'm not sure. I have posted in two places and nobody has touched this issue yet.

  • quackhandle1975

    SSChampion

    Points: 10963

    Thanks for the reply. I had a similar issue to yours: the named pipes registry settings for the instance causing the problems were incorrect.

    Again, I had to hack the registry to change them; however, I am not sure whether this was related to the original issue. I feel that because this is a SQL 2000 cluster issue, many DBAs won't be aware of these issues or possible solutions, as they are running 2005/2008 clusters.

    MSDN/MS aside, are there any SQL 2000 cluster troubleshooting websites out there?

    Rgds.

    qh

  • ganci.mark

    Hall of Fame

    Points: 3482

    I resolved my issue.

    I exported the reg key from a working node and applied it to the new nodes.

    (the keys between working nodes were virtually identical so I knew this would be safe)
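    If you want to confirm the exports really are close before importing, here is a rough sketch that compares two .reg exports as text. `parse_reg` and `diff_reg` are hypothetical helpers written for this post, the instance name and port numbers in the sample are made up, and the parser is deliberately simple (it assumes value names contain no `=`). Regedit exports in the "Version 5.00" format are normally UTF-16, so read the files with that encoding before passing the text in.

```python
def parse_reg(text):
    """Parse .reg export text into {(key_path, value_name): raw_data}."""
    values = {}
    key = None
    pending = ""
    for raw in text.splitlines():
        line = pending + raw.strip()
        pending = ""
        if line.endswith("\\"):  # long hex values continue on the next line
            pending = line[:-1]
            continue
        if line.startswith("[") and line.endswith("]"):
            key = line[1:-1]
        elif key and "=" in line and line.startswith(('"', "@")):
            name, _, data = line.partition("=")
            values[(key, name.strip('"'))] = data
    return values


def diff_reg(working_text, new_text):
    """List values that are missing, extra or different on the new node."""
    working, new = parse_reg(working_text), parse_reg(new_text)
    report = []
    for item in sorted(set(working) | set(new)):
        if item not in new:
            report.append(("missing on new node", item))
        elif item not in working:
            report.append(("extra on new node", item))
        elif working[item] != new[item]:
            report.append(("differs", item))
    return report


# Made-up sample exports; INSTANCENAME and the port numbers are placeholders.
WORKING = r'''Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\INSTANCENAME\MSSQLServer\SuperSocketNetLib\Tcp]
"TcpPort"="1500"
"TcpDynamicPorts"=""
'''
NEW = r'''Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\INSTANCENAME\MSSQLServer\SuperSocketNetLib\Tcp]
"TcpPort"="1433"
'''

for kind, (key_path, value_name) in diff_reg(WORKING, NEW):
    print(f"{kind}: {value_name} under {key_path}")
```

    Comparing two working nodes first gives a baseline: if that report is nearly empty, the differences reported for a new node are the ones worth looking at.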

    Works great now.

    I am not sure why this occurred, even after repeated add-node install attempts.

    Also 2 other instances out of 4 installed just fine.

    Thanks

    Mark G.

  • Random Visitor

    SSC Veteran

    Points: 274

    Hi,

    I've got very similar problems to the first post: 2 SQL Server 2000 instances, one on each node of a 2-node cluster. I changed the startup parameters on one to bring it up in single-user mode, but Cluster Administrator couldn't start it and the instance is now 'failed'. I had been stopping and starting it without problems up until this point. I have tried moving the groups onto one node; they move OK, but the failed instance still doesn't start.

    What I have noticed, though, is that on the node where both instances are now located, only details of the failed instance show in the registry. Details of the running instance, although it is on the same node, don't appear there, but they do appear in the registry of the other node, which has nothing running on it. That node shows details of that single instance only, not both of them.

    The errorlog of the instance that is failing to start says:

    2011-05-25 18:17:24.53 server Copyright (C) 1988-2002 Microsoft Corporation.

    2011-05-25 18:17:24.53 server All rights reserved.

    2011-05-25 18:17:24.53 server Server Process ID is 4268.

    2011-05-25 18:17:24.53 server Logging SQL Server messages in file 'J:\logs\ERRORLOG'.

    2011-05-25 18:17:24.53 server initconfig: Error 3(The system cannot find the path specified.) opening 'I:\data\master.mdf' for configuration information.

    The instance did start - with logs going to that drive - until I tried to startup in single user mode but now seems to be totally messed up. Any ideas? I've checked the KB articles I could find and got a dump of the cluster resource checkpoints and all components seem to be listed. Have run out of ideas now.

  • ganci.mark

    Hall of Fame

    Points: 3482

    Hello. What was the original trouble, or the reason for trying to start in single-user mode?

    Also based on this message in your log:

    2011-05-25 18:17:24.53 server initconfig: Error 3(The system cannot find the path specified.) opening 'I:\data\master.mdf' for configuration information.

    Have you confirmed that the location and file exist and are accessible? ('I:\data\master.mdf')
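    As a quick way to do that check, here is a minimal sketch that pulls the failed path out of the errorlog text and tests whether its folder and file exist. The regular expression is an assumption based only on the log line quoted above, and `failed_opens` is a hypothetical helper. Windows error 3 is "path not found", which usually means a folder in the path is missing rather than just the file (that would be error 2).

```python
import ntpath  # Windows path rules, so the parsing also works off-box
import os
import re

# Pattern is an assumption based on the single log line quoted in this thread.
OPEN_ERROR = re.compile(r"Error (\d+)\((.*?)\) opening '([^']+)'")


def failed_opens(errorlog_text):
    """Return (error_number, message, path) for each failed file open in the log."""
    return [
        (int(m.group(1)), m.group(2), m.group(3))
        for m in OPEN_ERROR.finditer(errorlog_text)
    ]


log = r"""2011-05-25 18:17:24.53 server Logging SQL Server messages in file 'J:\logs\ERRORLOG'.
2011-05-25 18:17:24.53 server initconfig: Error 3(The system cannot find the path specified.) opening 'I:\data\master.mdf' for configuration information."""

for number, message, path in failed_opens(log):
    folder = ntpath.dirname(path)
    # These checks are only meaningful when run on the node that owns the disk.
    print(f"Error {number} on {path}")
    print(f"  folder {folder} exists: {os.path.isdir(folder)}")
    print(f"  file exists: {os.path.isfile(path)}")
```

    Run it on the node that currently owns the disk resource; on any other node the drive will not be visible and both checks will report False.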

  • Random Visitor

    SSC Veteran

    Points: 274

    Hi, the drive exists and is accessible, but the data folder is no longer there, so no mdf! A copy of master.mdf still exists on the original drive. I had moved master to a new location (per the log description) some time before, and that seemed to work OK. So, as SQL Server is offline in the cluster, I simply copied the db files back to that location, and Cluster Administrator can now bring it online! Phew.

    I had tried to put the instance into single-user mode earlier in order to move the model db to a new location per Microsoft's tech article instructions; I wanted to detach and attach it. Initially single-user mode didn't appear to work. I put the parameters into Enterprise Manager and I think maybe I got them wrong to start with. It is in single-user mode now: when I try to connect with Enterprise Manager I can't, and I get the 'instance is in single user mode' message.

    Thanks for making me take proper notice of the error log 🙂

  • ganci.mark

    Hall of Fame

    Points: 3482

    You're welcome. I can't tell you how many times I have wasted hours of stressful troubleshooting and the problem/solution was right there in front of me the whole time.

    Regards,

    Mark Ganci
