How to check cause of SQL error 5172 (PageAudit property is incorrect)

  • Hi,

    I have found discussions on this forum about SQL error 5172 when attaching a (recovered) database after a SQL server crash, however our problem is a bit different.

    We use DPM 2007 to back up our databases (100 plus) on several SQL 2005 clusters. Recently we found out that some restores of those databases (about 5) could not be attached on our test SQL server(s). We use this procedure to copy the databases to another environment for non-critical purposes.

    All our databases are located on SAN disks. The SAN disks (Fibre Channel) are mounted volumes on our SQL 2005 servers. We have been testing this procedure for some weeks now to isolate the problem, but we cannot see or predict why or when it will occur. The only constant is that we have one specific database that cannot be recovered on another or even the same SQL server. We were able to confirm that DPM is not the problem, because we tested the procedure on a different DPM server with the same and different databases on several SQL servers. We even used vshadow to create a snapshot of the volumes where the database and log files are located, mounted the snapshot and got the very same error.

    So far we have tested (in short) the following:

    1 Use DPM 2007 to create a replica (from scratch). Restore to file location on other SQL server. Attach > error 5172

    2 Move LUN to other SQL server and do test 1 again > Same error

    3 Use SQL to make backup of database. Restore database to the same LUN (different DB name). Run test 1 on new database (which is essentially the same DB) > No errors after attach

    4 Use vshadow to create a snapshot. Mount snapshot to mountpoint. Attach > error 5172

    We suspect the original database to be corrupt or bad in some way, but a DBCC CHECKDB did not reveal any problems. Detaching and attaching the database is also possible without problems. So my question is: how can I check the running database for errors that explain our problem with backup and restore using VSS (with or without DPM)?

    Maybe you can let me know what you think, what I can test and/or how I can check the database.

    thx

    Arjan

  • Hi Arjan,

    It is a tough one, because the error may point at the way DPM replicates the data and transaction log files. A PageAudit error means something in the header of the data file is unexpected and has been changed, possibly by low-level access in the I/O stack. Filter drivers can do that. Many block-based replication systems use such filter drivers to replicate the I/O stream to another location. I assume DPM uses VSS and a software provider to replicate the blocks. The question is: are there any problems in the filter driver DPM uses that could zero-initialize the first page of your copied MDF file? As long as it is not a third-party driver (it should be an MS-signed driver), I doubt it. However, if you have support for DPM, it could be worth asking them for suggestions.
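
    One way to test the "zeroed first page" idea directly on a copied file is a small script like the one below. It is only a sketch: it assumes the copied MDF is not attached (so it can be opened for reading), and it relies on SQL Server data files being organised in 8 KB pages with the file header on page 0. The function name and the example path are made up for illustration.

```python
PAGE_SIZE = 8192  # SQL Server data files are organised in 8 KB pages


def first_page_is_zeroed(path):
    """Return True if the first 8 KB page of the file contains only zero bytes.

    Page 0 of a data file holds the file header, so an all-zero first page
    would be consistent with an invalid-header error when attaching the copy.
    """
    with open(path, "rb") as f:
        page = f.read(PAGE_SIZE)
    if len(page) < PAGE_SIZE:
        raise ValueError("file is shorter than one 8 KB page: %s" % path)
    return page.count(0) == PAGE_SIZE


# Hypothetical usage against a restored copy that is NOT attached:
# print(first_page_is_zeroed(r"D:\restore\YourDatabase.mdf"))
```

    A result of False does not prove the header is valid, it only rules out the simplest failure mode (a blanked first page).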

    Now when you say 'Move LUN to other SQL Server', how do you do that? Is it the same LUN that you detach and reattach to another box, or do you copy the contents of the LUN somewhere else in your SAN?

    If DBCC CHECKDB does not return any error on the source database, and you can run dbcc fileheader('yourdatabasename') against it without problems, then the problem lies somewhere in the copy process.
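
    If the copy process is the suspect, a page-by-page comparison of a known-good file against the copied file can show where they diverge. The sketch below assumes both files are closed (detached or offline) and represent the same point in time, for example a detached MDF and a snapshot taken while it was detached; a copy taken from a live database will legitimately differ in many pages. The function name and the limit parameter are illustrative only.

```python
PAGE_SIZE = 8192  # SQL Server data files are organised in 8 KB pages


def differing_pages(path_a, path_b, limit=20):
    """Return the 0-based page numbers at which two files differ.

    Stops after `limit` differences. A length mismatch shows up as
    differing pages at the tail of the longer file.
    """
    diffs = []
    page_no = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while len(diffs) < limit:
            a = fa.read(PAGE_SIZE)
            b = fb.read(PAGE_SIZE)
            if not a and not b:
                break  # both files exhausted
            if a != b:
                diffs.append(page_no)
            page_no += 1
    return diffs


# Hypothetical usage:
# print(differing_pages(r"E:\data\YourDatabase.mdf", r"D:\restore\YourDatabase.mdf"))
```

    If page 0 is in the result, the file header itself was changed somewhere between the source and the copy.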

    I'll try to ask more experienced people for any advice on this. Keep in touch,

    David B.

  • David, thanks for replying.

    We are already working with Microsoft to resolve the issue. The problem is that we cannot predict or see in advance that a DB cannot be restored. The event logs do not show events that point us in the right direction. DPM does not show any problems when protecting or restoring protected databases. Of all the databases we protect, only a few have restore (attach) problems.

    We used vshadow to create a snapshot of the disks where the database is stored. When we attached that snapshot we got the same error (Error: 5172). Therefore we concluded (for now) that DPM on its own is not the problem, but the VSS process that is used by DPM (and vshadow) is. We use the software providers for VSS that are included with Windows 2003 and SQL 2005. We do not use any providers specific to the SAN hardware.

    We moved the LUNs to another server by disconnecting the LUNs from clusterA (node 1 and 2) and connecting the same LUNs to clusterB (node 1 and 2). Our databases and log files are located on separate LUNs (all Fibre Channel).

    We did copy the MDF and LDF files to a USB disk and moved them to another environment that is almost the same as our production environment (hardware, software, deployment, patch levels, etc. are the same; SAN and network are almost the same). In that environment we could not reproduce this problem (read: we were able to protect and restore the protected database without any problems).

    I will try the suggested dbcc fileheader('yourdatabasename') command.

    Arjan

    I appreciate you asking around for more advice. Thank you.
