SQL 2017 server keeps locking up

  • I manage a two-node production SQL cluster running SQL Server 2017 CU31 on Windows Server 2019, which hosts one SQL instance and over 20 databases. As the system administrator, I’m responsible for keeping everything operational, but I lack advanced SQL debugging skills.

    For nearly a week now, I’ve been dealing with a frustrating issue: the SQL service frequently locks up without any apparent reason, causing all databases to drop. I can't connect to SQL Server Management Studio, even locally on the active node. The only workaround I’ve found is to fail over to the passive node, but this has led to a continuous cycle of switching back and forth between the two nodes.

    The logs provide minimal information since logging often stops when the issue occurs. I have one generic application crash report in the event log:

    Faulting application name: sqlservr.exe, version: 2017.140.3460.9, time stamp: 0x63d17a72
    Faulting module name: ntdll.dll, version: 10.0.17763.4720, time stamp: 0xecd88729
    Exception code: 0xc0000374
    Fault offset: 0x00000000000fb819
    Faulting process id: 0x1e70
    Faulting application start time: 0x01d9f4dc1189bdb8
    Faulting application path: C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\Binn\sqlservr.exe
    Faulting module path: C:\Windows\SYSTEM32\ntdll.dll
    Report Id: 3ee7c562-86f6-404b-a902-d74dd9e17822

    That’s the extent of the information I have, and I’m at a loss for what to do next. I’ve rebooted the servers multiple times, run sfc /scannow, executed a DISM restore, and checked the disks with chkdsk. I’ve also verified that the SAN, connected via fiber channel, is healthy. Both SQL Server CU31 and Windows updates are current.

    I could really use some guidance on how to resolve this issue. Any help would be greatly appreciated!
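
The hex fields in the crash report above can be decoded to plain values. The following is a minimal Python sketch, assuming the start time is a Windows FILETIME (100-nanosecond ticks since 1601-01-01 UTC) and using a small, deliberately incomplete lookup table of NTSTATUS codes; the function names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Partial lookup of well-known NTSTATUS values (not exhaustive).
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000374: "STATUS_HEAP_CORRUPTION",
}

# FILETIME values count from this epoch.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def decode_exception(code_hex: str) -> str:
    """Map an 'Exception code' field to a symbolic name, if known."""
    return NTSTATUS_NAMES.get(int(code_hex, 16), "unknown")


def decode_filetime(value_hex: str) -> datetime:
    """Convert a hex FILETIME (assumed format) to a UTC datetime."""
    ticks = int(value_hex, 16)  # 100-nanosecond intervals
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


def decode_pid(pid_hex: str) -> int:
    """Convert the hex 'Faulting process id' to decimal."""
    return int(pid_hex, 16)
```

For the report above, `decode_exception("0xc0000374")` gives `STATUS_HEAP_CORRUPTION`, meaning the process was terminated after heap corruption was detected, and `decode_filetime("0x01d9f4dc1189bdb8")` gives the time the faulting sqlservr.exe process started, which can be matched against the failover history.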

  • Check the SQL Server ERRORLOG file for more information.

    (You can find its path in the SQL Server service startup parameters; it is the value of the -e parameter.)

    Johan
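
Once the ERRORLOG file is located, a quick filter can surface the lines worth reading first. This is a minimal Python sketch, assuming the file is UTF-16 encoded (change `encoding` if yours is not); the keyword list and the function name are hypothetical choices, not an official set of markers.

```python
# Hypothetical keywords; extend to suit what you are hunting for.
DEFAULT_KEYWORDS = ("error:", "severity:", "stack dump", "exception", "terminating")


def find_suspect_lines(path, keywords=DEFAULT_KEYWORDS, encoding="utf-16"):
    """Return (line_number, text) pairs for lines containing any keyword."""
    hits = []
    with open(path, encoding=encoding, errors="replace") as log:
        for number, line in enumerate(log, start=1):
            lowered = line.lower()  # case-insensitive match
            if any(word in lowered for word in keywords):
                hits.append((number, line.rstrip()))
    return hits
```

Run it against the current ERRORLOG and the archived copies (ERRORLOG.1, ERRORLOG.2, and so on), since each service restart or failover starts a new file and the interesting lines are usually at the end of the previous one.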

  • So to clarify a few things:

    • It sounds like the OS on the active node is still responsive?  Are you able to log into Windows when SQL is down?
    • Presuming the answer to the above is "yes," have you checked the SQL Server service state, both in the SQL Configuration Manager and in Services?  (They *should* be the same, but)
    • Does it "gracefully" fail over when you tell it to, or do you have to "force" the failover?  (Apologies, I've only occasionally played with clustering)
    • Who set up the cluster, you or someone else (excluding SQL from that)?
  • "The SQL service frequently locks up without any apparent reason, causing all databases to drop."

    What do you mean by "locks up"? And do you mean the databases drop from the server, or that you just cannot connect to them?

    Is your log file filled to capacity, by any chance?

  • Enable the DAC so you can get in and see what is running when it locks up (sp_whoisactive).

    https://www.brentozar.com/archive/2011/08/dedicated-admin-connection-why-want-when-need-how-tell-whos-using/
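
As a rough illustration of using the DAC from the command line, the Python sketch below only builds a `sqlcmd` invocation and does not run it. It assumes `sqlcmd` is installed on the node, that Windows authentication is used, and that sp_WhoIsActive (a third-party procedure) has already been installed on the instance; the function name is hypothetical.

```python
from typing import List


def dac_sqlcmd_args(server: str, query: str) -> List[str]:
    """Build (but do not run) a sqlcmd command line that uses the DAC."""
    return [
        "sqlcmd",
        "-S", server,  # instance to connect to
        "-A",          # request the dedicated admin connection
        "-E",          # Windows authentication (trusted connection)
        "-Q", query,   # run the query, then exit
    ]


# Example: what is running right now?
args = dac_sqlcmd_args("localhost", "EXEC sp_WhoIsActive;")
```

Pass the list to `subprocess.run(args)` on the active node while the instance is unresponsive. If sp_WhoIsActive is not installed, a query against `sys.dm_exec_requests` is a built-in alternative. Note that the DAC only helps while the sqlservr.exe process is still running; it cannot connect to a process that has already crashed.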
