Database Integrity Issues

  • To avoid a lot of contention during peak business hours we separated our integrity checks into weekday and a weekend run.  During the week we do a PHYSICAL only and on weekends when activity is lesser, we run a full blown integrity check.  While it appears to be inconsistent, sometimes we get errors for a specific table in our of our databases and the integrity check fails.  We see this:

    Scheduled Checkdb ran Sunday, August 29, 2021 9:06 AM 

    DBCC CHECKDB (MyDatabase) WITH all_errormsgs, no_infomsgs executed by DOMAIN\svc_sql_prod found 1117 errors and repaired 0 errors. Elapsed time: 1 hours 10 minutes 0 seconds.  Internal database snapshot has split point LSN = 00050f08:0001efbd:0002 and first LSN = 00050f08:0001e84f:0001.

    Some errors (snippet):

    Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 0 (type Unknown), page ID (1:260087802) contains an incorrect page ID in its page header. The PageId in the page header = (0:0).

    Table error: Object ID 768057822, index ID 1, partition ID 72057594534821888, alloc unit ID 72057594118471680 (type LOB data). The off-row data node at page (1:260087802), slot 0, text ID 1830944768 is referenced by page (1:260087788), slot 0, but was not seen in the scan.

    Object ID 768057822, index ID 1, partition ID 72057594534821888, alloc unit ID 72057594118471680 (type LOB data): Page (1:260087802) could not be processed.  See other errors for details.

    Object ID 768057822, index ID 1, partition ID 72057594534821888, alloc unit ID 72057594756792320 (type In-row data): Errors found in off-row data with ID 1830944768 owned by data record identified by RID = (1:17895895:3)

    ...

    CHECKDB found 0 allocation errors and 1 consistency errors not associated with any single object.

    CHECKDB found 0 allocation errors and 1116 consistency errors in table 'schema.AggregateChanges' (object ID 768057822).

    CHECKDB found 0 allocation errors and 1117 consistency errors in database 'MyDatabase'.

     

    A little investigation reveals it's a specific table (AggregateChanges), running another integrity check at the table-level reveals the same:

    Checktable was run manually Sunday, August 29, 2021 12:27 PM and again, reported 1,117 errors.

    DBCC CHECKTABLE (MyDatabase.schema.AggregateChanges) WITH all_errormsgs, no_infomsgs, tableresults executed by DOMAIN\myself found 1117 errors and repaired 0 errors. Elapsed time: 0 hours 5 minutes 37 seconds.  Internal database snapshot has split point LSN = 00051369:00003671:00a0 and first LSN = 00051361:00006b3e:0008.

    The next (Monday, a weekday) the integrity process kicked off running PHYSICAL_ONLY and it found no issues (understandable since it is skipping any logical checks, etc):

    Scheduled Checkdb ran 8/30/2021 8:07:10 AM and no errors.

    DBCC CHECKDB (MyDatabase, noindex) WITH all_errormsgs, no_infomsgs, physical_only executed by DOMAIN\svc_sql_prod found 0 errors and repaired 0 errors. Elapsed time: 0 hours 41 minutes 47 seconds.  Internal database snapshot has split point LSN = 00052f74:0001c542:0004 and first LSN = 00052f74:0001c46a:0001.

    Well that's good news, but what about our corrupted table?

    Checktable ran at 8:40am, just to double check, and no errors errors reported

    DBCC CHECKTABLE (MyDatabase.schema.AggregateChanges) WITH all_errormsgs, no_infomsgs, tableresults executed by DOMAIN\myself found 0 errors and repaired 0 errors. Elapsed time: 0 hours 6 minutes 19 seconds.  Internal database snapshot has split point LSN = 00052f74:0001e285:0003 and first LSN = 00052f74:0001df87:0001.

    The table itself is basically a logging table, storing XML of changes being made to records, etc and data is spread across

    Table row data allocation

    AggregateChanges

    How is it possible that it "fixed itself"?  My impression of corruption is that once it's there, it remains unless you manually resolve the issue... Where did the corruption go?

    ______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

  • I wanted to also note that Vormetric encrypts the data at the OS/folder level

    ______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

  • So you have a process which is encrypting the MDF/NDF/LDF on disk outside of the SQL server process?

    TDE is not a performance killer, it will add 1-2% CPU overhead which in this day and age is negligible and is probably about what anything else will be using while it encrypts at rest.

    The encryption is outside of the SQL process so it could well be that the encryption and checkdb are fighting with each other as stuff is happening disk level.

    As for the corruption, take a copy of the database as it is now and move it to a server with no encryption on and run a full check

    DBCC CHECKDB (<MyDatabase>) WITH ALL_ERRORMSGS, NO_INFOMSGS, TABLERESULTS, DATAPURITY

    If the database comes back as clean your OK, but if it continues to happen you need to try and disable Vormetric and see if the issue resurfaces itself or not, and if not you know Vormetric and SQL don't like each other.

  • It might not hurt to reach out to Vormetric as well.  It is a paid tool, so might as well get their support.

    From my experience, once corruption happens, unless I do something to fix it, it remains corrupted.  I would expect that Ant-Green is correct in that it is likely something related to Vormetric.  I would try removing that from the equation as they suggested and see if the problem persists.  That is assuming you have a backup during the time the corruption occurred.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Thanks for everyone's comments thus far, that's the current plan - restore to a different server, different storage, sans Vormetric and see if the corruption is still present.  Just so odd that it's then one day, and not the next...

    ______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

  • Is the data sort of static during your DBCC run?  Do you do mass loads or deletes during that timeframe?  I would suspect your encryption process is causing it.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply