Database “In Recovery”
Error message is saying the Log file is full – though this isn’t the case – it’s only 6MB, max size is set to 2GB – and there is room on the disk.
If I take the database OFFLINE and then back to ONLINE, I’m able to run short metadata-queries before it goes into the “In Recovery” state again.
DBCC OPENTRAN tells me change tracking is the guilty party (we use CT heavily):
Oldest active transaction:
SPID (server process ID): 17s
UID (user ID) : -1
Name : sys.syscommittab
LSN : (3274:364:2)
Start time : Oct 23 2013 7:36:35:483PM
SID : 0x01
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
Again using the OFFLINE/ONLINE trick:
ROLLBACK TRANSACTION [sys.syscommittab]
Msg 3903, Level 16, State 1, Line 3
The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION.
Each time the DB tries to come back ONLINE the following error gets logged:
• Starting up database 'DatabaseXYZ'.
• 4837 transactions rolled forward in database 'DatabaseXYZ' (17:0). This is an informational message only. No user action is required.
• The transaction log for database 'DatabaseXYZ' is full due to 'CHECKPOINT'.
• During undoing of a logged operation in database 'DatabaseXYZ', an error occurred at log record ID (3282:464:176). Typically, the specific failure is logged previously as an error in the Windows Event Log service. Restore the database or file from a backup, or repair the database.
The original error was ~5days ago - for some reason they have the system logs set to purge after 2 days - so those are lost. I don't know the original error - just that the DB is down.
Dataloss is tolerable here, so I’ll let CHECKDB rebuild the log file right? (Don’t do this at home… yadda)
ALTER DATABASE DatabaseXYZ SET EMERGENCY-- SUCCEEDS
ALTER DATABASE DatabaseXYZ SET SINGLE_USER WITH ROLLBACK IMMEDIATE-- SUCCEEDS
DBCC CHECKDB (DatabaseXYZ,REPAIR_ALLOW_DATA_LOSS) WITH NO_INFOMSGS-- FAILS
Msg 5245, Level 16, State 1, Server SQLSERVER, Line 2
Object ID 2089058478 (object 'sys.syscommittab'): DBCC could not obtain a lock
on this object because the lock request timeout period was exceeded. This objec
t has been skipped and will not be processed.
So DBCC CHECKDB isn’t able to overrule whatever is happening. That’s not great.
Failed CHECKDB puts the DB into “Suspect” stated (again not good). I’m able to get it back “In Recovery” by setting it back to multi-user, and then taking it offline/online again. So I’m back to “In Recovery” I can do this loop over and over again – and have.
Any advice from the crowd?
I have tried unsuccessfully to add additional log files to the DB before it goes “In Recovery” again. It fails.
I wouldn’t care so much except this is a pretty bad state to be in.
With 2012 production servers on the horizon now is the time to figure out the cause: VM / SQL 2012 / Our-code
The development group says this has happened before on at least one other server – they just blow it away and redeploy. But they’ve not mastered the ability to reproduce the issue.
o A VM server (test server)
o Microsoft SQL Server 2012 - 11.0.2218.0 (X64)
o Enterprise Edition (64-bit) on Windows NT 6.2 <X64> (Build 9200: ) (Hypervisor)
• FULL recovery mode
• 500MB MDF file – unlimited growth
• 6MB LDF file – 10% growth, maxsize = 2GB
• “In recovery” state.
• Reporting an error at a certain log record ID.
• Snapshot Isolation
• Change Tracking enabled
• This database is redeployed often for testing, and the test doesn’t generate many TLOG entries. Up till now they haven’t had problems even though they don’t run any backups.