CHECKDB - Cannot continue the execution because the session is in the kill state.


jsqldba SSChampion Points: 11257 · August 16, 2016 at 12:37 am · #312500

We started getting this last weekend on a large-ish AlwaysOn database (800 GB).
It failed on all 3 replicas last weekend. On the next run it succeeded on two and failed on one (a secondary).
We're using Ola Hallengren's scripts, full CHECKDB.
Any idea what the issue is? I can't find anything useful online.
Thanks!
Date and time: 2016-08-15 20:00:09
Command: DBCC CHECKDB ([XXX]) WITH NO_INFOMSGS, ALL_ERRORMSGS, PHYSICAL_ONLY
HResult 0x254, Level 21, State 1
Cannot continue the execution because the session is in the kill state.


Jack 95169 SSC-Addicted Points: 410 · Answer 1

I'm having a similar problem.

Database is 225 GB

NOT Always On

SQL Server 2012

Slightly different command: DBCC CHECKDB ([xxx]) WITH NO_INFOMSGS, ALL_ERRORMSGS, DATA_PURITY

Error says: HResult 0x254, Level 21, State 1 Cannot continue the execution because the session is in the kill state. Process Exit Code 1.

jsqldba SSChampion Points: 11257 · Answer 2

For my issue, I think the problem was related to CHECKDB running out of disk space for the snapshot. We increased the disk space around the time of the issue, and the error went away. I forgot all about this question.

Jack 95169 SSC-Addicted Points: 410 · Answer 3

Thank you - that makes sense, even though the error message does not! The server is low on space and I have asked for more.

Sean Perkins Ten Centuries Points: 1377 · Answer 4

I'm receiving this error message and I don't have a space issue.

Are there any other suggestions to check out?


Hans Lindgren SSChampion Points: 10496 · Answer 5

It could be due to an Always On Availability Group 'Failover'...
Whenever I have a running query, connected to an AG Listener (and to one of the AG databases), and there is a failover, I get the error:
.Net SqlClient Data Provider: Msg 596, Level 21, State 1, Line 0
Cannot continue the execution because the session is in the kill state.
.Net SqlClient Data Provider: Msg 0, Level 20, State 0, Line 0
A severe error occurred on the current command. The results, if any, should be discarded.
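
A side note on the two error formats in this thread: the HResult 0x254 reported in the job output of the original post and the Msg 596 shown above look like the same error number, written once in hexadecimal and once in decimal. A minimal Python sketch of that conversion (the helper name hresult_to_msg_number is made up for illustration):

```python
def hresult_to_msg_number(hresult: str) -> int:
    """Convert a hexadecimal HResult string such as '0x254' to a decimal number."""
    return int(hresult, 16)


if __name__ == "__main__":
    # 0x254 = 2*256 + 5*16 + 4
    print(hresult_to_msg_number("0x254"))  # prints 596
```

So searching for "Msg 596" as well as "HResult 0x254" may turn up more results for the same message.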

Alex Gay SSCrazy Points: 2234 · Answer 6

I know that this is an old thread, but it seems to be updated regularly, so I assume that people are still having problems with it.
This error message is caused by your AG (Always On availability group) failing over from one node to the other.
If your AG is in synchronous-commit mode, transactions cannot be committed on the primary node until their log records have been written to the secondary nodes. If the heartbeat between the machines fails and the server attempts to fail over, it has to ensure that both nodes are in a consistent state. The databases change from Synchronized to Resolving, and any transactions on the primary node that have not been committed to the secondary nodes are rolled back. This rollback is what gives you the error message, as the automatic rollback state is the same as if you had issued a KILL command against the SPID.
The cause is usually network related, especially if the nodes are geographically separated, as will be the case with a DR copy at a remote site. If they are not, then check your local NICs and switches.
The recommended resolution from Microsoft is to increase the LeaseTimeout and/or HealthCheckTimeout values.
More information can be found from Microsoft.

Joe Kawalec Newbie Points: 5 · Answer 7

I have a similar issue, but not related to AGs. We're on a solid-state NetApp platform with plenty of space (or so it appears). My DBCC check will run for about 22 hours and then fail with a kill state error. The db is ~13 TB in size, so running long is pretty usual. In the past it has finished successfully, but lately it has not. It throws an error saying the session is in a kill state and that there is some corruption. A more detailed error is below; it seems to jump around the mount points.

I had our hardware dept review and they can't find any corruption using chkdsk etc. Just wondering if there is a timeout ceiling etc.

Thanks for any assistance.

The operating system returned error 1392(The file or directory is corrupted and unreadable.) to SQL Server during a write at offset 0x0000131601c000 in file 'M:\Clarity_Report7\mnt\SQLData\CLRpt7.ndf_MSSQL_DBCC12'. Additional messages in the SQL Server error log and operating system error log may provide more detail. This is a severe system-level error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

Sue_H SSC Guru Points: 90861 · Answer 8

The error is on the internal database snapshot used by DBCC CHECKDB. The snapshot uses sparse files, but those can get large if this is a decent-sized Clarity database, due to the activity on the source database. The snapshot is created in the same directory as the data files for the database. Not sure how much space is available, but if the snapshot runs out of space, it can go corrupt or become unreadable. Space issues would be my first guess. Do you know how much space is available when you run the check? Did you check for additional error messages in the SQL Server log and the Windows logs?

Sue
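
Since the internal snapshot is created alongside the data files, one quick sanity check is to look at free space on each volume or mount point that holds them while the check is running. A rough Python sketch, assuming it runs on the database server itself; the function name and the example directory list are hypothetical, and how much space the snapshot actually needs depends on write activity during the check, which this sketch does not try to estimate:

```python
import shutil


def free_space_report(path: str) -> dict:
    """Return total and free space for the volume that contains `path`."""
    usage = shutil.disk_usage(path)
    gib = 1024 ** 3
    # Guard against a zero-sized volume to avoid division by zero.
    pct_free = 100.0 * usage.free / usage.total if usage.total else 0.0
    return {
        "path": path,
        "total_gib": usage.total / gib,
        "free_gib": usage.free / gib,
        "pct_free": pct_free,
    }


if __name__ == "__main__":
    # Hypothetical list: replace with the directories that hold the data files.
    data_dirs = ["."]
    for d in data_dirs:
        r = free_space_report(d)
        print(
            f"{r['path']}: {r['free_gib']:.1f} GiB free of "
            f"{r['total_gib']:.1f} GiB ({r['pct_free']:.0f}% free)"
        )
```

Running this at intervals during a long check would show whether free space on any one mount point is shrinking toward zero before the failure.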

Joe Kawalec Newbie Points: 5 · Answer 9

Putting this on this thread as well: Thanks Sue, I was just making sure the thread was still being monitored. We have 3.8 TB mount points with 60% free space on them to date. That's just over 2 TB of space (which we assumed to be enough) with which to do the DBCC. Our DB with the issue is about 13 TB. I initially thought of space as well, but discounted that option once free space was reviewed. I may be underestimating the amount of space DBCC requires, so it is still on the table. We've made a copy of the db and moved it over to another server for testing there. This should enable us to run uninterrupted by data loads etc. and should be more stable from that perspective, but I think we'll need more space on the new server, as you recommended. I will keep you posted as we continue down the road with the separate server. Messages in the Event Viewer were either the same as the SQL Server logs or inconclusive. Not very verbose.

Another related question. We're actually using a deduped/snapped copy of the db on that other server. Will that matter and give us accurate results? Or do we need to do a full backup/restore in order to get a good idea of the dbcc check results? Thank you very much for your response btw! - Joe

Joe Kawalec Newbie Points: 5 · Answer 10

Just attempting to wrap this up. We were able to clone/snap a copy of the db over to another environment and run a DBCC check there successfully. However, we had to add significant resources to tempdb, as you can imagine. Third time was the charm, with over 300 GB of space, but it did work without any error messages, which gave us a warm fuzzy if nothing else. We added some additional space to tempdb in production (where the original issue was found), ran the DBCC check below, and it completed with no errors. I'm going to add some additional parameters to see if we can get back to our original DBCC check commands. Many thanks to Sue for confirming our direction.

DBCC CHECKDB('DBNAME') WITH PHYSICAL_ONLY