SQL 2012 sp1 Crash ntdll.dll

  • For the 3rd time in as many weeks, we've had a SQL 2012 sp1 instance go down. 

    A couple of changes were made recently:
    Added a linked server (type: SQLSERVER) and set up an sp_Blitz job that uses the linked server to send data back to our management SQL Server. (This job was not running at the time, so I guess I can't blame Brent.)
    Added the CommVault backup agent and started running CommVault backups; however, the first crash occurred before we implemented this.

    There is no information in the SQL logs. The last event was a transaction log backup; the next log entry was the service starting back up.

    Event Viewer had a single entry for the crash (aside from the one saying the service terminated unexpectedly):

    Faulting application name: sqlservr.exe, version: 2011.110.3000.0, time stamp: 0x5082086a
    Faulting module name: ntdll.dll, version: 6.1.7601.17725, time stamp: 0x4ec4aa8e
    Exception code: 0xc0000374
    Fault offset: 0x00000000000c40f2
    Faulting process id: 0x1178
    Faulting application start time: 0x01d29c04eedd742a
    Faulting application path: C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Binn\sqlservr.exe
    Faulting module path: C:\Windows\SYSTEM32\ntdll.dll
    Report Id: eb0621d9-0cce-11e7-a90d-005056a90e7f
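
The event record above can be decoded a little further. This is a minimal Python sketch, assuming the usual Windows conventions: the exception code is an NTSTATUS value, and the application start time is a hex FILETIME (100-nanosecond ticks since 1601-01-01 UTC). The lookup table is a small illustrative subset, not a complete list.

```python
from datetime import datetime, timedelta, timezone

# A few well-known NTSTATUS codes; illustrative subset only.
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000374: "STATUS_HEAP_CORRUPTION",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
}


def decode_exception(code_hex):
    """Map a hex exception code string to its NTSTATUS name, if known."""
    return NTSTATUS_NAMES.get(int(code_hex, 16), "unknown")


def filetime_to_utc(filetime_hex):
    """Convert a hex FILETIME (100 ns ticks since 1601-01-01) to a UTC datetime."""
    ticks = int(filetime_hex, 16)
    epoch_1601 = datetime(1601, 1, 1, tzinfo=timezone.utc)
    return epoch_1601 + timedelta(microseconds=ticks // 10)


# Values copied from the event log entry above.
exception_name = decode_exception("0xc0000374")
process_id = int("0x1178", 16)
start_time = filetime_to_utc("0x01d29c04eedd742a")

print(exception_name)
print(process_id)
print(start_time.isoformat())
```

The start time is when the crashed sqlservr.exe process was launched, not when it failed, so it can help confirm when the previous restart happened.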

    Looking for any thoughts, suggestions, or direction on how to further troubleshoot this or capture more data if it happens again.

    Thanks

  • Tom Van Harpen - Monday, March 20, 2017 11:48 AM

    For the 3rd time in as many weeks, we've had a SQL 2012 sp1 instance go down. 

    That's unusual to not have anything at all in the SQL Server error logs. Do you have any dump files being generated when this happens? Mini-dumps or full dumps?

    Sue

    I don't see any dump (.mdmp) files.
    For the time in question there are multiple .LOG.1 files (from a full-text indexing operation), a SQLAGENT.1 file (shown below), and a system_health_.... extended events file.
    There is also a trace file (.trc), which ended about 6 minutes before the crash and shows a normal log backup as the last entry.

    SQLAGENT file:

    2017-03-19 13:07:37 - ! [012] The MSSQLSERVER service terminated unexpectedly
    2017-03-19 13:07:37 - + [139] AutoRestart: Attempting to restart the MSSQLSERVER service (attempt #1)...
    2017-03-19 13:07:42 - ! [368] AutoRestart: Unable to restart the MSSQLSERVER service (reason: Access is denied)
    2017-03-19 13:07:47 - + [139] AutoRestart: Attempting to restart the MSSQLSERVER service (attempt #2)...
    2017-03-19 13:07:47 - ! [368] AutoRestart: Unable to restart the MSSQLSERVER service (reason: Access is denied)
    2017-03-19 13:07:52 - + [139] AutoRestart: Attempting to restart the MSSQLSERVER service (attempt #3)...
    2017-03-19 13:07:52 - ! [368] AutoRestart: Unable to restart the MSSQLSERVER service (reason: Access is denied)
    2017-03-19 13:07:52 - ! [140] AutoRestart: The MSSQLSERVER service could not be restarted after 3 attempts
    2017-03-19 13:07:52 - + [360] SQLServerAgent initiating shutdown following MSSQLSERVER shutdown
    2017-03-19 13:07:54 - ! [359] The local host server is not running
    2017-03-19 13:07:54 - ! [240] 1 engine thread(s) failed to stop after 2 seconds of waiting
    2017-03-19 13:07:54 - ! [311] Thread 'JobInvocationEngine' (ID 2552) is still running
    2017-03-19 13:07:54 - ! [359] The local host server is not running
    2017-03-19 13:07:54 - + [098] SQLServerAgent terminated (forcefully)
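
To line the Agent log up against other logs, it can help to turn it into structured entries. The following is a rough Python sketch; the regular expression is an assumption based only on the lines shown above (timestamp, a one-character severity marker, a bracketed message number, then the text), and `parse_agent_log` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Pattern inferred from the sample lines above; not an official format definition.
AGENT_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) - "
    r"(?P<sev>[!+?]) \[(?P<code>\d+)\] (?P<msg>.*)$"
)


def parse_agent_log(text):
    """Return a list of dicts for every line that matches the Agent log pattern."""
    entries = []
    for line in text.splitlines():
        match = AGENT_LINE.match(line.strip())
        if match:
            entries.append({
                "time": datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S"),
                "severity": match["sev"],
                "code": int(match["code"]),
                "message": match["msg"],
            })
    return entries


# Three lines copied from the SQLAGENT file above.
sample = """\
2017-03-19 13:07:37 - ! [012] The MSSQLSERVER service terminated unexpectedly
2017-03-19 13:07:42 - ! [368] AutoRestart: Unable to restart the MSSQLSERVER service (reason: Access is denied)
2017-03-19 13:07:52 - ! [140] AutoRestart: The MSSQLSERVER service could not be restarted after 3 attempts
"""

entries = parse_agent_log(sample)
crash_time = entries[0]["time"]
restart_failures = [e for e in entries if e["code"] == 368]
print(crash_time, len(restart_failures))
```

Separately from the crash itself, the repeated "Access is denied" on AutoRestart may indicate that the Agent service account is not allowed to start the MSSQLSERVER service, which would be worth checking.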

    I'd probably still search for any files with .log, .dmp or .mdmp extensions from around that time. It could be a lot of things, and with only one entry in the event log when the crashes happen, it won't be easy to figure out. I'd also keep track of what you see logged before the crash.
    It looks like heap corruption of some sort (exception code 0xc0000374 is STATUS_HEAP_CORRUPTION), which can sometimes happen with linked servers, usually because of old or faulty drivers though.
    And if the backup agent was installed, that could possibly be the issue even if no backups were running.
    There is a way to monitor heap corruption, but if you aren't getting any dump files, that would be pointless.

    For now, with the crashes being fairly consistent and only those two changes made to the server, about the only thing you can do is try uninstalling the backup agent and getting rid of the linked servers.

    Sue
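
The search for log and dump files around the crash time can be scripted. This is a minimal Python sketch; `files_near` is a hypothetical helper, and the extension list and the example path in the comment are assumptions (the source only gives the Binn path, so the Log directory is a guess at the default layout).

```python
from datetime import datetime, timedelta
from pathlib import Path

# Extensions worth checking; adjust as needed.
DUMP_EXTENSIONS = {".dmp", ".mdmp", ".hdmp", ".log"}


def files_near(root, when, window, extensions=DUMP_EXTENSIONS):
    """List (modified_time, path) for matching files modified within `window` of `when`."""
    hits = []
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix.lower() not in extensions:
            continue
        modified = datetime.fromtimestamp(path.stat().st_mtime)
        if abs(modified - when) <= window:
            hits.append((modified, path))
    return sorted(hits)


# Example call (path is an assumed default location, not taken from the thread):
# files_near(r"C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Log",
#            datetime(2017, 3, 19, 13, 7, 37), timedelta(minutes=30))
```

Modification times are compared in local time, which should match the timestamps in the Agent log if the script runs on the same server.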

  • Thanks Sue,
    I've removed the linked server since I've found some posts regarding issues that reference linked servers.
    It appears so far that it's been the last 3 Sundays, but since the Event Viewer history is cut off at last Monday, I can't confirm that. There are no jobs running during those times.
    I'll dig around some more and see what I can find; I'll update if I find anything useful.

    Thanks,
    Tom

  • Tom Van Harpen - Monday, March 20, 2017 4:42 PM

    Thanks Sue,
    I've removed the linked server since I've found some posts regarding issues that reference linked servers.

    Hi Tom.
    The exception has not been caught by the SQL dumper. I assume that the event log entry was from Windows Error Reporting (WER). You should find the mini dump under the WER directory.
    Your suspicion that this is caused by the linked server provider might be correct. Analyzing the dump would confirm that.
    To avoid the instance crash, you should set the provider to run outside of the SQL Server process (an instance restart is required; in some rare cases an OS reboot is needed for the change to take effect). This way, the next time a failure occurs in the linked server provider, it will result in just an error message and will not affect the SQL Server process.
    P.S. Some providers work only in-process with SQL Server.
    Hth
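
Looking for the WER report could be sketched like this in Python. The two root paths are the usual machine-wide WER locations on recent Windows versions, but treat them as assumptions for this server; `find_wer_reports` is a hypothetical helper that simply matches report folder names containing the executable name.

```python
from pathlib import Path

# Usual machine-wide WER locations; assumed, not confirmed for this server.
WER_ROOTS = [
    r"C:\ProgramData\Microsoft\Windows\WER\ReportArchive",
    r"C:\ProgramData\Microsoft\Windows\WER\ReportQueue",
]


def find_wer_reports(roots, exe_name="sqlservr.exe"):
    """Map each report folder mentioning `exe_name` to the sorted file names inside it."""
    reports = {}
    for root in roots:
        root_path = Path(root)
        if not root_path.is_dir():
            continue  # skip locations that do not exist on this machine
        for folder in root_path.iterdir():
            if folder.is_dir() and exe_name.lower() in folder.name.lower():
                reports[str(folder)] = sorted(f.name for f in folder.iterdir())
    return reports


# Example call on the affected server:
# for folder, files in find_wer_reports(WER_ROOTS).items():
#     print(folder, files)
```

A report folder does not always contain a dump file; if only the text report is present, that alone can still show the faulting module and offset.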

  • Tom Van Harpen - Monday, March 20, 2017 11:48 AM

    For the 3rd time in as many weeks, we've had a SQL 2012 sp1 instance go down. 

    To be blunt, 2012 sucked until SP3 came out. It even had a bug where rebuilding a clustered index online under certain conditions would corrupt the table.

    I strongly recommend that you upgrade your 2012 to the latest SP/CU NOW!  Don't wait, just do it.

    --Jeff Moden
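
To see how far behind the instance is, the file version from the event log can be mapped to a service pack level. This is a minimal Python sketch using the SQL Server 2012 baseline build numbers (11.0.2100 RTM, 3000 SP1, 5058 SP2, 6020 SP3, 7001 SP4); `service_pack_level` is a hypothetical helper and it does not identify individual cumulative updates.

```python
# SQL Server 2012 baseline builds (11.0.x); cumulative updates fall between these.
SQL2012_BASELINES = [
    (2100, "RTM"),
    (3000, "SP1"),
    (5058, "SP2"),
    (6020, "SP3"),
    (7001, "SP4"),
]


def service_pack_level(file_version):
    """Return the highest baseline at or below the build in a version like '2011.110.3000.0'."""
    build = int(file_version.split(".")[2])  # third component is the build number
    level = "pre-RTM"
    for baseline, name in SQL2012_BASELINES:
        if build >= baseline:
            level = name
    return level


# Version string copied from the event log entry in the first post.
level = service_pack_level("2011.110.3000.0")
print(level)
```

A build of exactly 3000 means SP1 with no cumulative updates applied on top of it.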



  • Hey Jeff,
    I feel that is good advice. I'd like to know why it happened, but I guess if I can get to the point where it doesn't happen again, that would be acceptable. Also, I appreciate the bluntness 🙂
    Tom
