DBCC CHECKDB cannot finish - Msg 8921, Level 16, State 1, Line 8: Check terminated

  • Hello

    We have this issue on one of our Prod databases.

    The database is fully operational but our DBCC checkdb job is failing with the error below:

    Msg 8921, Level 16, State 1, Line 8

    Check terminated. A failure was detected while collecting facts. Possibly tempdb out of space or a system table is inconsistent. Check previous errors.

    Tempdb is 100% empty and hasn't grown at all. All 4 data files are set to autogrow in 64 MB increments, and the log is set to unlimited growth.
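
    For anyone wanting to double-check the same things, a sketch along these lines (not our exact monitoring queries) shows tempdb free space and the growth settings:

    -- Free space currently available in the tempdb data files (8 KB pages converted to MB)
    SELECT SUM(unallocated_extent_page_count) * 8 / 1024 AS free_space_mb
    FROM tempdb.sys.dm_db_file_space_usage;

    -- Size and autogrowth settings for every tempdb file
    -- (growth is a page count when is_percent_growth = 0)
    SELECT name, type_desc,
           size * 8 / 1024   AS size_mb,
           growth * 8 / 1024 AS growth_mb,
           is_percent_growth, max_size
    FROM sys.master_files
    WHERE database_id = DB_ID('tempdb');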

     

    When we run DBCC CHECKDB it always gets to 46.9% and then fails.

    We see a lot of errors in Event Viewer > System log, such as:

    The device, \Device\Harddisk1\DR1, has a bad block.

     

    There was a problematic disk in the vSAN storage but we have replaced it and checked all storage devices for errors.

     

    In msdb we see 3 suspect pages.
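
    (For anyone following along, those entries come straight out of the standard tracking table in msdb - something like this shows them:)

    SELECT database_id, file_id, page_id, event_type, error_count, last_update_date
    FROM msdb.dbo.suspect_pages;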

     

    One of the jobs for this database fails like this:

    Time-out occurred while waiting for buffer latch type 2 for page (1:488894), database ID 11.

     

    Another job fails with this:

    The operating system returned error 23(Data error (cyclic redundancy check).) to SQL Server during a read at offset 0x000000eeb7c000 in file 'D:\MSSQL\MSSQL13.MSSQLSERVER\MSSQL\Data\Somedbnamehere.mdf'

     

    Any suggestions on this?

    Thanks in advance!

  • To me it sounds like the disk had bad sectors, which resulted in data getting corrupted in the system databases and possibly the user databases.

    First, I would put it into single user mode and make sure you are the single user.  You don't want people going in and changing the data while you are trying to fix things.
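
    Something along these lines (just a sketch - swap in the actual database name) will force everyone else out and give you exclusive access:

    -- Roll back other sessions' open transactions and take the database to single user
    ALTER DATABASE [Somedbnamehere] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;

    -- When you are done fixing things, open it back up
    ALTER DATABASE [Somedbnamehere] SET MULTI_USER;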

    Next, I would run checkdb on master and HOPE it comes back clean.  Then repeat for model and msdb. I would run it manually just so you can verify things and begin working on recovery.
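
    Roughly like this - I would run each one separately so you can read the output as you go (sketch only):

    DBCC CHECKDB ('master') WITH NO_INFOMSGS, ALL_ERRORMSGS;
    DBCC CHECKDB ('model')  WITH NO_INFOMSGS, ALL_ERRORMSGS;
    DBCC CHECKDB ('msdb')   WITH NO_INFOMSGS, ALL_ERRORMSGS;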

    You have 3 suspect pages in msdb, so I see a few options here.  You could restore msdb from a backup, you could run CHECKDB with REPAIR_ALLOW_DATA_LOSS, or you could rebuild msdb and then recreate all the jobs that would disappear with the rebuild.  Which option you go with is entirely up to you.  I would not recommend REPAIR_ALLOW_DATA_LOSS (shown in the sketch below only for reference) because you could end up with jobs that have missing steps.  I like restoring from backup OR rebuilding and recreating the jobs from the copies of the scripts that you have in source control... you do have your job scripts in source control (or in scripts somewhere) so you can recreate them, right?
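
    For completeness, that repair option would look roughly like this.  It requires single user mode (so the SQL Server Agent service would need to be stopped first, since it holds connections to msdb), and it can throw data away:

    -- NOT my recommendation for msdb - shown only so you know what the option is
    ALTER DATABASE msdb SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    DBCC CHECKDB ('msdb', REPAIR_ALLOW_DATA_LOSS);
    ALTER DATABASE msdb SET MULTI_USER;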

    The user database is going to be more challenging to fix.  I would start by restoring a known (or suspected) good backup onto a test system and re-running CHECKDB to make sure it is in a good state (a sketch of that step is below).  At this point, you may have lost some data due to data changes throughout the day.  If you can recreate that data, do so on the test system, then restore it to live (after taking a backup of live).  If too much data has changed, OR you don't know or have a way to put the data back in, then I would be looking at other methods.

    Since you are getting a data error rather than an error on a specific page, it may be harder, or impossible, to correct.  I've never had success fixing cyclic redundancy check errors inside or outside of SQL Server; usually I end up just removing the file/folder with the error if I can.  You MAY have success running chkdsk (the Windows tool, not a SQL tool) while the database is offline, but it will take time to complete and MAY corrupt the database further.  So take a backup and cross your fingers you can recover the bad data.  Worst case, there are a few consultants out there who specialize in database data recovery.  They MAY be able to get the database going again with minimal data loss (possibly none), but they do cost a bit of money, and they may not be able to recover everything.

    It MAY be that your bad pages are JUST on nonclustered indexes, in which case an index rebuild may fix it.  BUT if the corruption is on the clustered index/heap and you rebuild your index, you may have just blown away the only good copy of the data.
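
    A bare-bones version of the "restore to a test system and re-check" step might look like this - every name and path below is a placeholder, and the logical file names are assumptions you would verify with RESTORE FILELISTONLY:

    -- Restore the backup under a different name on the test server
    RESTORE DATABASE [Somedbnamehere_Test]
    FROM DISK = N'\\backupserver\share\Somedbnamehere_full.bak'
    WITH MOVE 'Somedbnamehere'     TO N'D:\TestData\Somedbnamehere_Test.mdf',
         MOVE 'Somedbnamehere_log' TO N'D:\TestLogs\Somedbnamehere_Test.ldf',
         STATS = 10;

    -- Confirm the restored copy is actually clean before relying on it
    DBCC CHECKDB ('Somedbnamehere_Test') WITH NO_INFOMSGS, ALL_ERRORMSGS;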

    Database data recovery can be tricky and needs to be done carefully to prevent permanent data loss, so step 1 is to take a backup of the current state so you can get things back to how they are now in the event you make things worse while trying to fix it (see the note below).  And before making any changes, I recommend restoring that backup to a test system and making the changes on that copy, so you can always get back to your current state.  Once you have fixed the problems, you can restore your "fixed" database over the top of the broken one OR take the database offline and swap the data files.  But I would not make any actual changes on the LIVE broken database, just so you don't cause further harm.
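
    One note on that "backup of the current state" step: a normal backup may fail once the database is corrupt, but a sketch like this (the path is made up) will usually still capture it:

    -- CONTINUE_AFTER_ERROR lets the backup finish even if it hits damaged pages;
    -- COPY_ONLY keeps it out of the normal backup chain
    BACKUP DATABASE [Somedbnamehere]
    TO DISK = N'\\backupserver\share\Somedbnamehere_broken_state.bak'
    WITH CONTINUE_AFTER_ERROR, COPY_ONLY, STATS = 10;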

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there are negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • We managed to fix the issue with a restore from the last full backup and around 160 transaction log backups (since full backups had been failing due to the corruption).

    The data file itself we couldn't copy or archive, as it was giving us errors, so we had to delete it... the root cause remained unclear.

    No data loss in this case, but we had a couple of hours of downtime due to the large number of log backups that had to be restored one by one, each taking 40 seconds to 2 minutes...
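
    For reference, the general shape of that kind of restore looks like the sketch below (paths and names are placeholders; scripting out the RESTORE LOG statements from the folder contents saves a lot of clicking):

    -- Restore the last full backup, leaving the database in the restoring state
    RESTORE DATABASE [Somedbnamehere]
    FROM DISK = N'\\backupserver\share\Somedbnamehere_full.bak'
    WITH NORECOVERY, REPLACE, STATS = 10;

    -- Apply each transaction log backup in sequence, still WITH NORECOVERY
    RESTORE LOG [Somedbnamehere]
    FROM DISK = N'\\backupserver\share\Somedbnamehere_log_001.trn' WITH NORECOVERY;
    -- ...repeat for log_002 through log_160...

    -- Finally bring the database online
    RESTORE DATABASE [Somedbnamehere] WITH RECOVERY;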

    Thanks for the suggestions, Mr. Brian Gale! We did indeed restore the DB on a test server first to verify that everything was okay and that we did not have any corruption from the log backups or other issues... 😉

     

     

  • My opinion - every time I've seen an "unable to delete" error, or an unable to copy, move, or archive error, it has been due to disk failure.  I would run chkdsk to make sure the disk is not faulty, and make sure to run the check that looks for bad sectors.  If bad sectors appear, I would replace the disk ASAP!  Even if the disk comes back clean, I would be suspicious of the disk or the storage controller.

    Database corruption is usually a sign of some form of hardware failure.

    EDIT - I was just re-reading the original post and it definitely says there is a bad block.  I would replace that disk ASAP.  One bad block can very quickly turn into more, and once you start getting bad blocks, it isn't long before the disk is full of them and unusable.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there are negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
