SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
Search:  
 
 

SQL Musings

Add to Technorati Favorites Add to Google
Author Bio
Steve Jones Editor at SQLServerCentral.com You can follow Steve on Twitter as way0utwest (www.twitter.com/way0utwest)
Browse by Tag : data corruption (RSS)

The last good date for your database

By Steve Jones in SQL Musings | 09-24-2009 6:42 AM | Categories: Filed under: , , , ,
Rating: (not yet rated) Rate this |  Discuss | 1,769 Reads | 787 Reads in Last 30 Days |4 comment(s)

cc165445.cover(en-us)[1] I was reading the most recent issue of TechNet from Microsoft one morning and flipped through the column on SQL Server. This is one of the few paper magazines I get, and for some reason I see things in there that interest me.

It’s not the topics, but often the delivery and format. I flip through it at my desk, or while cooking, and I’ll see things that I’ve missed on the Internet, or get reminded of things I’ve forgotten. The October issue contained one of those.

Paul Randal writes the SQL Q&A column and he usually has something in there that’s worth knowing. This month he answered a question about some strange messages in the SQL Server error log. Here’s the message from my test instance:

Dbcc_startup

In case it’s hard to read, the message is:

CHECKDB for database 'db1' finished without errors on 2009-09-21 08:33:03.713 (local time). This is an informational message only; no user action is required.

This was on startup of the instance, right after the “starting up database” messages that you see. I have this same message for a couple other databases (db2 and master). It didn’t appear for model, msdb, tempdb, and a couple other databases I have on this instance.

Why not? Or rather, why did it appear for these databases? Is DBCC CHECKDB running on startup?

The answer is no, and Paul has a good explanation in his column. It’s not online yet, but I’ve read this before and Paul gave me a great reminder. The message is the “last known good” time for the database. It shows the last time that DBCC completed successfully, and gives you a reference point that you can use an a DR or corruption scenario to try and narrow down when things went bad. Note this is SQL Server 2005 and later.

It’s also a good reminder that you haven’t run dbcc in some time on a DB. When I first read the column, I checked my error log and didn’t see the message. That was a stark reminder that I didn’t have maintenance set up on this instance. It’s a test instance, only been installed for about 20 days, but still. I ran dbcc on those databases that showed messages (db1, db2, master) only as a test. I’ve now set up maintenance on the instance to grab a backup every night and run weekly maintenance.

If I had production instances, I’d love to be able to easily capture this data from a server, or even compare the messages to the databases on the server and then report back to me if CHECKDB hadn’t been run, or if it had been longer than xxx days.

Sankar Reddy, a fellow MVP, wrote a script that helps here, and it’s worth checking out his blog on the last clean DBCC value.


Human Data Quality

By Steve Jones in SQL Musings | 04-22-2009 1:49 PM | Categories: Filed under: , , ,
Rating: (not yet rated) Rate this |  Discuss | 1,597 Reads | 153 Reads in Last 30 Days |2 comment(s)

Disclaimer: I read and speak one language, having failed pretty well at learning Latin, Spanish, and then Japanese in my schooling. I'm sure there are more than a few people that would actually say I've not doing too well with English, either!

I've got a few examples here of "data quality" issues that I've seen in emails and posts lately. I don't intent to make fun of anyone, and I'm sure I would make much worse mistakes if I were to attempt to post on a non-English site. Instead I thought these highlighted some great challenges in the data world. First my examples:

"Greet" in response to fixing something.

"I'm thinning about the best way to ..." - A post wondering about a T-SQL query.

"sintax error"

That last one might be easily corrected, and I've seen other errors that are worse (and I can't find right now). But how smart does a routine need to be to decode these types of grammatical issues?

You might think a grammar checked can handle things, but I've written a lot of sentences that Word flags as having an issue, but isn't sure what to do with them. And Word is a free-form application. Imagine if you are trying to do some type of parsing or clean-up of data that isn't constrained with look-up tables?

Data quality is becoming a bigger and bigger issue in our world, and I'm not even sure that we realize it. More and more systems exchange data, and greater amounts of it. As companies seek to work together, and partner to develop new applications, they are merging data between them, depending on employees that aren't always DBAs to somehow match up data. Or they depend on automated systems to "guess" what should go where?  I'm not always sure they do a good job matching up data.

And then information is lost.

Not that DBAs do a better job, but I think a human has a better chance of learning from past mistakes and correcting them in the future.

I'm not an ETL expert, but I think there is a tremendous amount of flexibility and power in the SSIS programming model to help you figure out how to best match up data from disparate sources and clean if before it infects your system.


TechEd - Data Corruption

By Steve Jones in SQL Musings | 06-11-2008 8:23 AM | Categories: Filed under: , , ,
Rating: (not yet rated) Rate this |  Discuss | 5,985 Reads | 377 Reads in Last 30 Days

I went to this one, mostly because I don't understand coruption that well and Paul Randal is a great speaker. Worth going to see him if you get the chance. 

Run CheckDB. It's important.

You need to know how long checkdb takes to run. Checkdb is optimized to run for the times where there are no issues. So it runs fast. If there are issues, then it does a deep dive and runs longer.

In general, keep in mind that only the first 200 messages are reported. Run all_msgs to get all data and use no_infomsgs since there are lots of things reported that aren't needed in a DR situation.

Make sure checkdb completes. Story here about a stock firm.

Lots of messages once it's complete, so there are over 100 errors checkdb can report. Many have a variety of states.

The error results aren't too readable, but there are some things you should be aware of.

First determine if checkdb completed itself. Make sure that it has completed. Common errors with the checkdb process itself.

7984-988 - corruption in the critical system tables.

8967 - invalid states within CHECKDB. 

8930 - Metadata corruption, not critical, but Checkdb cannot make sense of the metadata about a table(s)

Many of these are BOL or online documented. 

Demo with pre-corrupted databases. A critical error shown on a system table, 7995, so checkdb cannot run, including repair. This means a restore situation.

Corruption in a system table, checkdb will not run, but you can perhaps run checktable against individual tables to determine which one is corrupt.

If you have corruptions in nonclustered indexes only, then you can run repair_rebuild if it's recommended from the output. Look through the errors and if the errors are all index IDs > 1, then you can manually repair the non-clustered indexes.

The flipside here is that if you must run repair, you must be in single user mode. If the index is large, than you must have 2x space to rebuild as well, which can be a problem. If you repair with checkdb, it might be able to correct some records.

You can also run an online rebuild in 2005 (Enterprise edition), to try and repair. However since this reads the old index, you need to do an offline rebuild.

Unrepairable error

Was one found? A PFS error? (89090, 8938, 8939) - No way to fix these. You can try to extrat data, but the pages where the corruption exists will not extract. Thereforre a backup is needed.

Was it 8970, invalid data in the column. Example, more than 1439 minutes after midnight. If you get this, repair cannot fix this since it does not know what values should be there. You can repair these manually.

8992 - metadata mismatch, checkcatalog error. Cannot be repaired, but depending on what this is, you might be able to hack the system tables.

Demo

DBCC Page is unsupported/undocumented, but it's safe for production use. Only reads data. Does not change anything. It's heavily used by the SQL Server team.

One trick in binding to system tables is that you can do it with the Dedicated Admin Connection. It doesn't work with regular connections.

System tables are cached. If you start in single-user mode, and use the DAC, you can change metadata. Paul had a demo that showed hwo to fix some corruption doing this, but he got an error that the metadata cache wasn't consistent. SQL Server maintains this and if you make the change to system tables, the cache isn't updated. So you need to restart SQL Server.

Look for blog posts on this stuff from Paul in the next few weeks at SQLSkills

Recovery using backups

Not necessarily the best way to deal with corruption.With large databases, it can be quicker to run DBCC to repair things. However most corruptions require ALLOW_DATA_LOSS, which means data gets deleted.

Full backups are a good starting point, necessary for a base.

Log backups give you good recovery to point in time. 

Backups need to be valid, so you should test them. You can restore bad backups with the "Continue after restore" option. This will restore the database, which is corrupt, but you can possibly retrieve data. Even with a corrupt database, you want to still get a backup first in case things get worse. Use CONTINUEAFTERERROR with the backup as well.

Restore or repair?

Do you have a database? If no, you need to restore.

Working backups? If no, repair. Or you can restore a damaged backup.

Log damaged? Restore, if that fails, use emergency mode and extract information.

CheckDB failed? If it's a critical error, restore or extraction.

If you have nonclustered index errors - restore or repair, choose which makes sense.

If you have repairable errors, choose either repair or restore.

If you have a choice, think about SLAs (downtime and data loss). Choose the one that limits the loss.

One thing to always do in a DR situation is ensure you have the tail log backup. This are all the changes since the last log backup. This is a just a current log backup.

Paul showed a single page restore, which is fairly cool.I haven't done that, but it can help with one (or relatively few) page corruptions.

One trick I've seen over and over is that you should ALWAYS restore with NORECOVERY. That's the default, do that, it allows you to continue with restores. When you're sure you're done, do a restore with recovery by itself to brings things online.

REPAIR_ALLOW_DATA_LOSS

Be careful. You're allowing things to be repaired, which often means just deleting data. 

Repair goes the fastest, doing the most provably correct thing. It doesn't try to save data, just get done quickly. Doesn't look at FKs, constraints, replications, or any types of relationships between objects.

CHECKBD repairs aren't replicated, so you need to quiese the system and then re-init the publications.

Damaged Log

If you don't have a backup, use EMERGENCY mode to extract the data. There is an emergency mode repair in 2005 that  you can run. This will rebuild the log (after extracting what it can) and then run checkdb as one operation.

What people do? 

Restart SQL Server - Doesn't really do anything. If there's corruption, it's corrupt and you're wasting time.

Jump to a last resort -  Without determining the error, they often run repair_allow_data_loss.

Detach a suspect database - Doesn't help.

XVI32, freeware hex editor. Paul showed how to corrupt a t-log. Interesting, not a useful skill if you're not a presenter/tester.

One thing about rebuilding the log on the sly with a detach/attach, Service Broker has been given a new GUID, so it's disabled. Watch out for this if you use SOA stuff.

No backup, no database? Worst state. Kimberly says URLT (update resume, leave town).

Summary

Know the signs

Run checkDB, let it complete,

Always take a backup before restore or repair

Try to limit downtime and data loss, but think about it.

 Test this, go through this using demo databases. And using your systems.