824 errors

  • Hi All,

    Today we are seeing 824 errors on one of the database.

    When I ran checkdb I am seeing below errors.

    Note: The database hosted on a 3rd part storage appliance called Actifio.

    Use master

    Go

    DBCC CHECKDB (CDP) WITH ALL_ERRORMSGS, NO_INFOMSGS;

    GO

    Msg 1823, Level 16, State 6, Line 3A database snapshot cannot be created because it failed to start.

    Msg 1823, Level 16, State 7, Line 3A database snapshot cannot be created because it failed to start.

    Msg 1823, Level 16, State 8, Line 3A database snapshot cannot be created because it failed to start.

    Msg 7928, Level 16, State 1, Line 3The database snapshot for online checks could not be created.

    Either the reason is given in a previous error or one of the underlying volumes does not support sparse files or alternate streams. Attempting to get exclusive access to run checks offline.

    Msg 5030, Level 16, State 12, Line 3The database could not be exclusively locked to perform the operation.

    Msg 7926, Level 16, State 1, Line 3Check statement aborted. The database could not be checked as a database snapshot could not be created and the database or table could not be locked.

    See Books Online for details of when this behavior is expected and what workarounds exist. Also see previous errors for more details.

    Msg 824, Level 24, State 2, Line 3SQL Server detected a logical consistency-based I/O error: incorrect pageid (expected 3:193245696; actual 0:0). It occurred during a read of page (3:193245696) in database ID 21 at offset 0x00017096400000 in file 'G:\SQLDUMP\DATA\CDP\Actifio\CDP_dat.mdf_MSSQL_DBCC21'.

    Additional messages in the SQL Server error log or operating system error log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately.

    Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

     

    SQL version is : Microsoft SQL Server 2017 (RTM-CU22) - 14.0.3356.20 (X64

    Questions

    =========

    1. Why SQL was not able to generate the db snapshot? How to make sql to generate snapshot?

    2. Seeing some entries in the msdb..suspect_pages? Now is there a way to know what objects/tables are residing in those list of corrupted pages?

    Regards,

    Sam

  • Step 1 - FULL BACKUP.  Take a full backup so you can get back to your current state.

    Step 2 - fix the snapshot issue.  Quick google:

    https://sql-play.com/2012/02/01/error-message-a-database-snapshot-cannot-be-created-because-it-failed-to-start-when-running-dbcc/

    Which basically says you PROBABLY have a snapshot file already existing that needs to be removed first.  They say restart SQL services should fix this.  If that does, onto the next step.

    Step 3 - Fix the corruption.  First, restore the database onto a test environment (don't want to make LIVE worse).  Next, on the TEST system run

    DBCC CHECKDB('CDP', REPAIR_REBUILD) <-- https://www.infosecurity-magazine.com/blogs/repairing-sql-database-corruption/

    This should correct the issue.  See what it tells you and if it was successful or not.  If it worked, repeat on live and take another full backup after fixing the corruption.

    Now, the above is completely ignoring the the "Actifio" factor.  I have not used that tool and am not sure if the database corruption is a symptom or the primary cause.  One thing I'd want to check is does Actifio have some sort of check disk like utility?  If so, I'd get some downtime and run that to make sure the disk isn't starting to fail or corrupt.  if the underlying disk is failing, your database corruption will only get worse over time.

    Based on my previous experience with using non-standard filesystems, I would strongly encourage you to switch over to a more standardized format like NTFS and abandon Actifio.  We had a file system that was required by our failover software that caused us more headache than it was worth.  Eventually, the failover software abandoned the file system software and a lot of our problems went away.  If Actifio is required, I would be looking at what is needed for proper maintenance and sanity checks.  Does Actifio require regular defragging and check disk using the windows/linux tools, or do they have their own maintenance tools that run automatically?  Are they running frequently enough?  Fixing the database corruption is only useful if the underlying disks are good.  If the disks are starting to fail, fixing the corruption is like putting a band-aid on a severed limb and hoping it won't get worse.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Thanks a lot Brain. Will follow up with the Actifio on this.

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply