Database backup/checkDB problems

  • Hi all,

    Got a very odd problem, and one I'm hoping has a solution. Running the following code:

    BACKUP DATABASE [MyDB] TO DISK = N'MyLocation\MyFileName.bak' WITH NOFORMAT, NOINIT, NAME = N'MyFileName', SKIP, REWIND, NOUNLOAD, STATS = 10

    Results in the following exception:

    Msg 3203, Level 16, State 1, Line 1
    Read on "MyLocation\MyDB.mdf" failed: 1117(The request could not be performed because of an I/O device error.)

    Running the following code:

    DBCC CHECKDB (MyDB)

    Results in the following exception:

    DBCC results for 'MyDB'.
    Msg 8921, Level 16, State 1, Line 1
    Check terminated. A failure was detected while collecting facts. Possibly tempdb out of space or a system table is inconsistent. Check previous errors.

    ...

    CHECKDB found 0 allocation errors and 0 consistency errors in database 'MyDB'.

    Event Viewer contains the following:

    The operating system returned error 1117(The request could not be performed because of an I/O device error.) to SQL Server during a read at offset 0x00000151a5c000 in file 'MyLocation\MyDB.mdf_MSSQL_DBCC6'. Additional messages in the SQL Server error log and system event log may provide more detail. This is a severe system-level error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

    If any more information is necessary please let me know. I have absolutely no idea how to continue. I feel like this is a catch-22 - I can't backup the database because I can't run checkdb, and I can't run checkdb because I can't backup the database.

  • That doesn't sound too well. Prepare to restore the database to another server from a previous backup.

    Does the operating system reports IO problems outside of SQL Server?

    Does your storage / network produce any alarms?

    Any chance to run a diskcheck when SQL Server is offline? When SQL Server is offline, you might be able to copy/transfer the MDF/NDF files to another sql server and run DBCC_check there.

    Is your SQL 2014 up to date with patches? Might help with the DBCC check problems

    Is your operating system up to date with patches / driver updates?

  • Run DBCC checks on your system databases. It sounds like either one of them is corrupt, or you're hitting an issue with tempdb, as was suggested in the error message when you tried to run the first DBCC. Make sure tempdb has plenty of space.

    ----------------------------------------------------
    The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... Theodore Roosevelt
    The Scary DBA
    Author of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd Edition
    Product Evangelist for Red Gate Software

  • Look in the SQL Server error log, there may be additional info there.

    Just in case, add another file(s) to tempdb if you can, to give it additional space.  You never want tempdb to run out space while SQL is running.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who give you the shirt off *someone else's* back.

  • This looks like an issue with the internal snapshot that is created for CHECKDB.  Since  the internal snapshot is created in the same location as the MDF file - and you are also getting errors when attempting to backup, it really looks like a problem with that volume.

    These kinds of errors should be logged in the system event log or application event log.  Review those to see if you are logging any kind of I/O errors.

    Is the database in full recovery model?  Do you have an unbroken chain of transaction log backups from the last known good backup?  If you have those available, then you can restore from the last known good backup and apply all of the transaction logs to get to current (after taking a tail-log backup, of course).

    Before doing anything - you need to validate the volume and fix whatever caused the issues, or this will just occur again.

    One final thought...do you have enough space available on that volume to create the internal snapshot?

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Hi all - thanks for the suggestions. Will try them out this evening and report back.

  • Okay - a few updates:

    1. Running DBCC CheckDB on all the system databases resulted in no errors
    2. No I/O problems are being reported. To note - the database is currently in use, and functioning perfectly normally with the exception of being unable to be backed up
    3. CHKDSK reported no errors on the disk
    4. Physical disk space is more than sufficient on the drive
    5. I tried increasing tempdb significantly - to the point where it was larger than the actual database being backed up. Error still persists
    6. Database is in full recovery mode, however, the last working full backup is from two weeks ago. Restoring that backup and then transferring over data is really a last-resort move ... would definitely like to see if there are other options

    Any other suggestions? :/

  • kramaswamy wrote:

      <li style="list-style-type: none;">

    1. Database is in full recovery mode, however, the last working full backup is from two weeks ago. Restoring that backup and then transferring over data is really a last-resort move ... would definitely like to see if there are other options

    Why would you need to transfer over any data - you should have transaction log backups available from that last known good backup through current.  A final tail-log backup would get you back to current point in time.

    What is the Edition and version of SQL Server?  You posted to the 2014 forum...

    What is the O/S version?  Is this a VM?

    https://support.microsoft.com/en-us/help/2519834/sql-server-reports-operating-system-error-1117-i-o-device-error-on-vmw#:~:text=This%20is%20a%20severe%20system,see%20SQL%20Server%20Books%20Online.

    Here is an older thread with the same issue: https://www.sqlservercentral.com/forums/topic/possible-corruption-due-to-io-device-error

    The error you are getting clearly indicates an issue with drive/volume and that needs to be addressed.  Fixing this will require restoring from backups and applying transaction logs up to current point in time after the storage issues have been resolved.

    You may not run into any issues with the database...until you happen to hit the corrupted portions of that database (if the DB is corrupted - but there is no way to tell).  At this point, if you continue using the database you will get further away from being able to recover the system without extreme data loss...can you afford to lose a month of data or more?

     

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Hah - I hadn't thought of that. My normal backup routine involves clearing out the transaction log backups after the full backup is made, so I had just assumed I wouldn't have a full chain anymore - but of course, since the full backup is not being made, the chain of transaction logs aren't being cleared out either.

    Will try restoring from that chain tonight to see if it works. Thanks for the inputs.

  • Hi again all,

    Jeffrey your idea worked like a charm. Took a bit of finessing since I had about 270 transaction log backups to chain. Found a script online that let me query the sys dbs to get the list, and modified it to output a list of RESTORE LOG statements. Executed the restore on the main DB to a backup DB, then restored the chain.  Then I took the restored DB, and backed it up to a new full backup. Lastly, I took that new full backup, and restored it back to my main DB.

    Now, I'm able to run CHECKDB against the main database again, and full backups are working properly.

    Thanks for the help everyone. Was worried this was going to be a nightmare, but as it stands I'm guessing the client won't even know anything was wrong.

    **EDIT - maybe spoke too soon ... trying to run a backup on the restored backup on my main DB is strangely still failing. Investigating...

    **EDIT - So it seems that setting the database to single_user allowed the backup to work. I wonder whether I could have just tried that from the start ... still, it might mean there's some other problem. TBD...

    • This reply was modified 1 year, 11 months ago by  kramaswamy.
    • This reply was modified 1 year, 11 months ago by  kramaswamy.
  • Are you able to take Differential backups ?

  • Will let you know tonight when they run at their scheduled time. Hopefully yes 🙂

  • Okay - differential is now working properly. I have no idea what the original problem was, but things seem to be resolved.

    I do wonder whether I could have solved all of this by just putting the database in single_user mode before running the CHECKDB and full backup. If this happens again in the future I'll try that out.

  • You still need to review the storage system - or this could happen again.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Jeffrey - not sure exactly what you mean by review the storage system? We have regular maintenance being done on the server, is there something specific I should be telling our tech team to investigate?

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply