A couple of weeks ago I wrote an article at www.sqlcopilot.com/dbcc-checkdb-with-data_purity.html about the DATA_PURITY option of DBCC CHECKDB, and how to identify the affected columns.
One of the ways to find affected columns is to run a SELECT with a WHERE clause to return out-of-range data. But this doesn't always work. It is possible for data to be within a valid range but still fail the data purity check. It is also possible for the column to be totally corrupt, resulting in an arithmetic overflow error when you attempt to SELECT it.
The method I showed in my article for when a SELECT doesn't help was to use DBCC PAGE. But if you have more than a few columns with invalid data, it can be quite time consuming to do. This was the case for me recently when I migrated a database from SQL Server 2000 to 2008 R2 and the CHECKDB found 540,000 data purity errors!
A SELECT for values outside the range for the datatype (decimal(23, 8) in this case) didn't return any data and so DBCC PAGE was the only option - but obviously there was no way I could run it manually 540,000 times!
The script attached to this article was my solution to the problem.
1. It runs DBCC CHECKDB(dbname) WITH DATA_PURITY, NO_INFOMSGS, TABLERESULTS and captures the results in a temporary table
2. It extracts the page, slot, object id, column name and data type for each row returned
3. It loops through the results perfoming a DBCC PAGE for each one to get the primary key values of the rows containing the invalid data.
Within this loop, there is also code to derive the condition for the primary key (allowing for multi-column keys). The key value is then used to query the table to get the current value of the affected column. The results are stored in a table, tmp_final_results. This can be dropped once all investigation is complete.
There is further code, commented out, that I will explain shortly.
How to use the script
This is a 3 stage process.
1. First run the script within the context of the database to check.
It may take several hours to run, especially if you have a few hundred thousand data purity errors.
When it has finished, the table tmp_final_results will contain a row for each out-of-range column.
2. Use the first commented out section of the script to view the results.
Note: the conversion of [Value] to varchar is necessary, as attempting to retrieve some out-of-range data may result in arithmetic overflow errors. By converting them to varchar, these are displayed as -1.#IND instead.
3. Fix the data
Now you have to decide what to set each one to.
If you are lucky you may find that all the values are within a valid range for their datatypes, so a straightforward UPDATE to their existing value will fix the problem. The third section of my script (also commented out) generates an UPDATE statement for each column.
However, you may find out-of-range values, and some may be displayed as -1.#IND. For these you have to decide what they should be set to, and this means speaking with someone who knows the application well and getting them to look at your data.
4. Tidy up
The final commented out section needs to be run to drop the tables generated by the script.
Please remember that this script is a tool to aid in the identification of columns that have failed a data purity check. You should not blindly update the columns - ensure you are 100% confident of the correct values. However, the code in section 3 of the script is available if you decide the suggested values can be used. And of course, backup the database before making any data changes.