SQL Server 2016 cluster issue

  • Hi,

    We have successfully installed SQL Server 2016 Standard (SP1) on a two-node Windows Server 2016 failover cluster.
    By mistake, we restarted the SQL Server engine service from Configuration Manager on the active node. Since then the SQL Server services do not come online on either node: the cluster resource 'SQL Server (DEVSQLCLU16)' goes to 'Pending Online' and then fails with the errors below.

    Error from Event logs: 
    Cluster resource 'SQL Server (DEVSQLCLU16)' of type 'SQL Server' in clustered role 'SQL Server (DEVSQLCLU16)' failed.
    Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.

  • Is there any extra information in the SQL Server error log?

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • ALZDBA - Wednesday, February 21, 2018 3:57 AM

    Is there any extra information in the SQL Server error log?

    Hi Johan,
    Thanks for your response. Unfortunately the SQL Server error logs from the original failure were overwritten, because we rebooted a number of times as a workaround, which didn't work.

    Below are the errors from the most recent SQL Server error log:
    2018-02-20 14:32:51.98 spid5s  Error: 25623, Severity: 16, State: 1.
    2018-02-20 14:32:51.98 spid5s  The event name, "D0234D96-8A83-4636-A717-41459AF88D71.XtpEngine.xtp_physical_db_restarted", is invalid, or the object could not be found
    2018-02-20 14:32:51.98 spid5s  Event session "telemetry_xevents" failed to start. Refer to previous errors in the current session to identify the cause, and correct any associated problems.
    2018-02-20 14:32:51.98 spid5s  Error: 25709, Severity: 16, State: 1.
    2018-02-20 14:32:51.98 spid5s  One or more event sessions failed to start. Refer to previous errors in the current session to identify the cause, and correct any associated problems.

    FYI: we can sometimes access the databases (while the resource state is 'Pending Online'), but after a few minutes the instance goes down.
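
    For triage, the error numbers in log lines like the ones above can be pulled out with a short script. This is only a minimal sketch: the function name `extract_errors` and the regular expression are my own assumptions, written against the line layout shown in this post (timestamp, spid, then "Error: N, Severity: N, State: N."), not against any documented log format.

```python
import re

# Matches error log header lines laid out like the ones quoted in this thread:
# "2018-02-20 14:32:51.98 spid5s  Error: 25623, Severity: 16, State: 1."
# The pattern is an assumption based on those sample lines only.
ERROR_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{2})\s+"
    r"(?P<source>\S+)\s+"
    r"Error: (?P<error>\d+), Severity: (?P<severity>\d+), State: (?P<state>\d+)\."
)


def extract_errors(log_text):
    """Return one dict per 'Error: N, Severity: N, State: N.' header line."""
    errors = []
    for line in log_text.splitlines():
        match = ERROR_LINE.match(line.strip())
        if match:
            errors.append({
                "timestamp": match["timestamp"],
                "source": match["source"],
                "error": int(match["error"]),
                "severity": int(match["severity"]),
                "state": int(match["state"]),
            })
    return errors


# Sample lines copied from the error log excerpt in this post.
SAMPLE = """\
2018-02-20 14:32:51.98 spid5s  Error: 25623, Severity: 16, State: 1.
2018-02-20 14:32:51.98 spid5s  The event name, "D0234D96-8A83-4636-A717-41459AF88D71.XtpEngine.xtp_physical_db_restarted", is invalid, or the object could not be found
2018-02-20 14:32:51.98 spid5s  Error: 25709, Severity: 16, State: 1.
"""

if __name__ == "__main__":
    for entry in extract_errors(SAMPLE):
        print(entry["timestamp"], entry["error"], entry["severity"], entry["state"])
```

    The message lines that follow each header are skipped on purpose; only the numeric header lines are collected, so the result is a quick list of error numbers to search for.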

  • nagarjunaraju123 - Wednesday, February 21, 2018 2:49 AM

    Hi,

    We have successfully installed SQL Server 2016 Standard (SP1) on a two-node Windows Server 2016 failover cluster.
    By mistake, we restarted the SQL Server engine service from Configuration Manager on the active node. Since then the SQL Server services do not come online on either node: the cluster resource 'SQL Server (DEVSQLCLU16)' goes to 'Pending Online' and then fails with the errors below.

    Error from Event logs: 
    Cluster resource 'SQL Server (DEVSQLCLU16)' of type 'SQL Server' in clustered role 'SQL Server (DEVSQLCLU16)' failed.
    Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.

    In a cluster, you should never start, stop or restart the SQL Server services using either the Services applet or SQL Server Configuration Manager.  You should always use Failover Cluster Manager to bring the resources and services online or offline, or to move them to another node.

    The services should be set to Manual so the cluster service can manage when the services are started on that node.

    Why are you restarting SQL Server in the first place?  There should be no reason to do that on a regular basis.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • - One of the things you must configure with SQL Server is the number of error log files to keep across restarts (now you see why).
    - I would have guessed the logs would mention somewhere that the instance is supposed to be started only through Cluster Manager, which would make the solution more obvious.

    Johan

  • Jeffrey Williams 3188 - Wednesday, February 21, 2018 1:04 PM

    In a cluster, you should never start, stop or restart the SQL Server services using either the Services applet or SQL Server Configuration Manager.  You should always use Failover Cluster Manager to bring the resources and services online or offline, or to move them to another node.

    The services should be set to Manual so the cluster service can manage when the services are started on that node.

    Why are you restarting SQL Server in the first place?  There should be no reason to do that on a regular basis.

    Hi

    That's OK.
    But if somebody does start it from Configuration Manager, is there any way to bring the clustered services back online?

  • If the cluster manager has already started SQL Server on that node - then you cannot start it from configuration manager.  If SQL Server is not running - then you need to figure out why it isn't running before attempting to start it back up.

    Cluster Manager will take care of starting SQL Server on the node it should be running on - if that node fails then it will start it up on one of the other nodes in the cluster.

    I don't understand where there would ever be a need to manually start SQL Server in a cluster.

    Jeffrey Williams

  • Connect to one of the nodes and, using Failover Cluster Manager, make sure the role is offline.
    Once the role is offline, start the resources one at a time in the order below, see which one fails, and report back.

    • Start the disk resources
    • Start the network name
    • Start the SQL Server resource
    • Start the SQL Server Agent resource
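
    If you would rather script that order than click through it, something like the following could work. Treat it as a rough sketch under assumptions: only 'SQL Server (DEVSQLCLU16)' comes from this thread, the other resource names are hypothetical placeholders, and the helper assumes `powershell` returns a non-zero exit code when `Start-ClusterResource` fails.

```python
import subprocess

# Start order from the list above. Only 'SQL Server (DEVSQLCLU16)' appears in
# the thread; the other names are hypothetical and must match your cluster.
START_ORDER = [
    "Cluster Disk 1",
    "SQL Network Name (DEVSQLCLU16)",
    "SQL Server (DEVSQLCLU16)",
    "SQL Server Agent (DEVSQLCLU16)",
]


def start_with_powershell(resource_name):
    """Try to bring one resource online; return True on success.

    Assumes the FailoverClusters PowerShell module is available on the node
    and that a failed Start-ClusterResource yields a non-zero exit code.
    """
    result = subprocess.run(
        ["powershell", "-NoProfile", "-Command",
         f"Start-ClusterResource -Name '{resource_name}' -ErrorAction Stop"],
        capture_output=True, text=True,
    )
    return result.returncode == 0


def start_in_order(resources, start=start_with_powershell):
    """Start resources one at a time, stopping at the first failure.

    Returns (started, failed) where failed is the first resource that did
    not come online, or None if every resource started.
    """
    started = []
    for name in resources:
        if not start(name):
            return started, name
        started.append(name)
    return started, None


if __name__ == "__main__":
    started, failed = start_in_order(START_ORDER)
    print("Started:", started)
    print("First failure:", failed)
```

    Stopping at the first failure is deliberate: the point of the exercise is to find out which resource in the chain is the one that will not come online.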

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • Thanks for all your responses.

    Issue: The problem was caused by the "Hide Instance" option being enabled.
    As part of our company security baseline policy, we had set the Hide Instance flag to 'Yes' for the SQL Server instance.
    After changing this flag from its default and restarting the instance from SQL Server Configuration Manager, we were unable to start the instance successfully.

    Analysis:
    The SQL Server service starts, but the cluster service then cannot connect to the instance.

    Errors after attempting to bring the SQL Server resource online:

    • Event ID 1254: "Clustered role 'Cluster Group' has exceeded its failover threshold. It has exhausted the configured number of failover attempts within the failover period of time allotted to it and will be left in a failed state. No additional attempts will be made to bring the role online or fail it over to another node in the cluster. Please check the events associated with the failure. After the issues causing the failure are resolved the role can be brought online manually or the cluster may attempt to bring it online again after the restart delay peri
    • Event ID 1069: "Cluster resource % of type 'SQL Server' in clustered role 'Cluster Group' failed.

    Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet."

    • Event ID 1205: "The Cluster service failed to bring clustered service or application 'Cluster Group' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered service or application."

    Fix:

    Unhide the instance of the SQL Server Database Engine:

    • In SQL Server Configuration Manager, expand SQL Server Network Configuration, right-click Protocols for <server instance>, and then select Properties.
    • On the Flags tab, in the HideInstance box, select No, and then click OK.

    After changing the above value, we were able to bring the SQL Server resource online.

    NOTE: As a cross-check, we restarted the SQL Server service through both SQL Server Configuration Manager and services.msc, and it worked fine.

     
