SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
Search:  
 
 

SQL Musings

Add to Technorati Favorites Add to Google
Author Bio
Steve Jones Editor at SQLServerCentral.com You can follow Steve on Twitter as way0utwest (www.twitter.com/way0utwest)
More Posts Next page »
Browse by Tag : sql server 2005 (RSS)

How Many Rows Have Changed?

By Steve Jones in SQL Musings | 10-28-2009 6:33 AM | Categories: Filed under: , ,
Rating: (not yet rated) Rate this |  Discuss | 1,154 Reads | 1154 Reads in Last 30 Days |1 comment(s)

One of the things that I've asked DBAs, and I see asked often, is how much does your data change? That affects the transaction log sizes (and backups), the load on your server, possibly the tuning and specs you want to implement, etc. But most DBAs don't have any clue what the change rate on their data is. Most probably can't even tell you quickly what their data growth rate is.

The good ones can, and many of them either track it, or they do what I used to do: they monitor backup sizes and calculate an acceleration and velocity that assists them in proactively managing servers.

That accounts for growth, but what about changes? Very few DBAs know this, and it seems like it's an advanced topic. I've never had a great, homegrown way to track this, and haven't worried about it. Most of my systems could have weekly downtime, and I'd rebuild indexes, and statistics, during that time, so it wasn't an issue.

The other day I was reading Kim Tripp's blog because, well, it's a good idea. She doesn't post a lot, but when she does, it's good stuff to know. In this post she talked about filtered indexes and statistics going stale quicker than expected. I'll let you read it if you're interested (you should be), but the interesting thing to me was her mentioning a way to know how many rows have changed.

In SQL Server 2000, there's a value in sysindexes.rcmodctr that keeps track of how many rows have changed. This is used to note when to refresh statistics (it's when 20% of the rows have changed).

In SQL Server 2005 and 2008, this is tracker per column, which may or may not be better. Read Kim's explanation of that. However the values are in sysrowsetcolumns.rcmodified for SS2K5 and sysrscols.rcmodified for SS2K8.

That's a handy piece of information for me to know. I can now easily track the number of rows that change daily, and then get some averages that might clue me in to the activity of the database. It's not that this average means anything by itself, but if I know it, and then compare the last day's change against the average, I'll have some idea of the load on my server. Might help me diagnose issues, or let me know if the server if coming under more load.


SQL Server Legend – Data Files and Threads

By Steve Jones in SQL Musings | 10-13-2009 8:38 AM | Categories: Filed under: , , ,
Rating: |  Discuss | 2,446 Reads | 978 Reads in Last 30 Days |11 comment(s)

Someone was asking about using multiple data files recently to try and increase performance. I had answered that unless you had separate physical disks that it wouldn’t matter.

However then I remembered hearing something about threads and files for I/O. I tjhought this was a myth, but I wasn’t sure. I searched around, and then pinged Gail Shaw since I know she does a lot of internals type investigation.

She confirmed this is a myth and sent me this reference: SQL Server Urban Legends Discussed. It’s from the Microsoft Customer Service engineers and discusses the origin of the myth and how things work.

The bottom line is that SQL Server uses a thread for each unique disk drive, not files. In SQL 2000, you can fool the system if the disk drives are volumes on the same physical disk. However in SQL 2005 the system checks to see if these are the same physical disk.

There are some caveats, but if you hear this rumor, dispel it. Let people know that multiple file groups (or files) only help if you have different physical I/O paths and drives.


The last good date for your database

By Steve Jones in SQL Musings | 09-24-2009 6:42 AM | Categories: Filed under: , , , ,
Rating: (not yet rated) Rate this |  Discuss | 1,974 Reads | 585 Reads in Last 30 Days |4 comment(s)

cc165445.cover(en-us)[1] I was reading the most recent issue of TechNet from Microsoft one morning and flipped through the column on SQL Server. This is one of the few paper magazines I get, and for some reason I see things in there that interest me.

It’s not the topics, but often the delivery and format. I flip through it at my desk, or while cooking, and I’ll see things that I’ve missed on the Internet, or get reminded of things I’ve forgotten. The October issue contained one of those.

Paul Randal writes the SQL Q&A column and he usually has something in there that’s worth knowing. This month he answered a question about some strange messages in the SQL Server error log. Here’s the message from my test instance:

Dbcc_startup

In case it’s hard to read, the message is:

CHECKDB for database 'db1' finished without errors on 2009-09-21 08:33:03.713 (local time). This is an informational message only; no user action is required.

This was on startup of the instance, right after the “starting up database” messages that you see. I have this same message for a couple other databases (db2 and master). It didn’t appear for model, msdb, tempdb, and a couple other databases I have on this instance.

Why not? Or rather, why did it appear for these databases? Is DBCC CHECKDB running on startup?

The answer is no, and Paul has a good explanation in his column. It’s not online yet, but I’ve read this before and Paul gave me a great reminder. The message is the “last known good” time for the database. It shows the last time that DBCC completed successfully, and gives you a reference point that you can use an a DR or corruption scenario to try and narrow down when things went bad. Note this is SQL Server 2005 and later.

It’s also a good reminder that you haven’t run dbcc in some time on a DB. When I first read the column, I checked my error log and didn’t see the message. That was a stark reminder that I didn’t have maintenance set up on this instance. It’s a test instance, only been installed for about 20 days, but still. I ran dbcc on those databases that showed messages (db1, db2, master) only as a test. I’ve now set up maintenance on the instance to grab a backup every night and run weekly maintenance.

If I had production instances, I’d love to be able to easily capture this data from a server, or even compare the messages to the databases on the server and then report back to me if CHECKDB hadn’t been run, or if it had been longer than xxx days.

Sankar Reddy, a fellow MVP, wrote a script that helps here, and it’s worth checking out his blog on the last clean DBCC value.


What's Your Build

By Steve Jones in SQL Musings | 09-16-2009 7:34 AM | Categories: Filed under: , ,
Rating: (not yet rated) Rate this |  Discuss | 1,513 Reads | 423 Reads in Last 30 Days |no comments

This would probably be a good editorial, but I thought I'd drop it out here as a short note.

You ought to be tracking the build (or version) of all your SQL Server instances and Windows. I'd suggest you automate this, maybe even drop it in a table in every (user) database every day.

With the constant release of patches, the changes you might make to a server, it's easy to lose track of what build a server is at. You might think all your servers are at SP1 or SP2, but you never know when an application might not support the latest version and you're behind.

The more time that passes between the application or deployment of patches, and a disaster, the less likely you'll be sure of what patch level you're on. And if you have dozens of instances, you won't be sure.

In a disaster, the last thing you want to do is have doubt about what you're restoring. Track the builds and you'll be fine. And bookmark my build list pages (2005 and 2008)


The Resource Database

By Steve Jones in SQL Musings | 09-14-2009 7:29 AM | Categories: Filed under: , ,
Rating: (not yet rated) Rate this |  Discuss | 1,579 Reads | 432 Reads in Last 30 Days |no comments

I saw a post recently that asked about backing up the Resource database. The person said they just realized that it wasn't included in their backup scheme and was worried.\

Don't worry, you don't need to back it up.

The page at MSDN describes the Resource database as essentially a large lookup table(s) for the SQL Server itself. It can find the system objects that are used: tables, views, DMVs, DMFs, etc. When the server is upgraded, a new copy of the Resource database is copied over. The objects don't have to be built from scripts with DROPs and CREATEs.

This means, however, that each time you patch your server, you potentially could be copying over a new version here if some system object changes. That means that you don't really want to back up this database.

If you are restoring to a new server, when you install that server, it will have the correct copy of the Resource database for your system. If you were to copy over an incorrect version, the system objects might not match up with the system code in the SQL Server executables and DLLs. That would be bad.

I haven't seen notes about the Resource database becoming corrupted, but it could. In that case, I'd suggest you copy the files from another SQL Server of the same version, or install SQL to another machine, patch it up, and then copy the files to your damaged installation. That could be a problem, but it does mean one other thing.

You ought to be tracking your SQL Server version and build on a regular basis as part of your monitoring.


Scripting in SSIS

By Steve Jones in SQL Musings | 08-06-2009 4:31 AM | Categories: Filed under: , , , ,
Rating: |  Discuss | 4,307 Reads | 467 Reads in Last 30 Days |2 comment(s)

Tim Mitchell presented at SQLSaturday #17 on Scripting in SSIS. I thought it was a really interesting presentation, and I’m glad I went. I haven’t done much SSIS work, and very little with scripting, so I learned a few things right off.

Tim is a great contributor here at SQLServerCentral, and I was happy to support him. He made a nice intermediate presentation on scripting in SSIS and I hope he writes a few articles.

The presentation started with a little background on how the scripting works in SSIS. Tim notes that VB.NET only could be used in 2005, but C# support was in SQL Server 2008. He also said that there were two scripting items in SSIS. The Sscript task in the data flow is where his presentation focused.

If the only tool you have is a hammer, everything looks like a nail. He used that analogy and said if you can do something with another task, use that. Don’t need to use scripting for everything,. Scripting works well when you can’t do something with native tools, or when it is very cumbersome.

This is lightweight programming, but it is programming. There can be performance issues as well since you’re essentially writing software. This also can be good job security if you learn to do it well. He mentioned a demand for SSIS jobs, which I’ve seen as well.

As far as debugging goes, there are no good end user feedback from the tools. This can be challenging if you are not a developer.

The Script task in the control flow typically is used for anything except moving data. It can, but usually used to work with OS, filesystem, external programs, variables, etc..

Script component is for data flow, used for data manipulation. Can be useful for branching, thought the presentation didn’t focus on that.

Tim works a lot with hospitals,and patient data. The script component works well dealing with the various unconventional inputs that he sees like:

  • semi structured
  • nonlinear files
  • multiple lines of text
  • varying numbers of columns
  • dissimilar data types
  • record type formats.

I haven’t done a lot of SSIS work, so it’s good to see the different real world types of issues people deal with. It might come in handy in the future if I run into one of these situations.

Tim had a nice record import example with characters from “The Office”, using characters and situations. Jim with a pencil stuck in his hand, Meredith’s car accident. and Dwight’s overdose of beets. That made me smile seeing it.

Then he used a script to output to three tables, 3 outputs, from one input. A script component allows n outputs. The outputs are set to the name and datatype for each output table or destination.

The edit script can be intimidating, but it shows real programming code. The CreateNewOutputRows() method that reads the file and loops through the stream. Each row is read in the loop and if it matches a particular value, it is added to the buffer for a particular output. This needs to be done before the values are added to the various elements of the output.

There was no error handling in the script, which I hate to see, but I understand trying to keep things simple. I would prefer to see this included and then skip over/mention the fact that it is in there. It’s easy to comment on TRY..CATCH blocks, but show a good use of the error handling structure.

The second example was the reverse of the first. Take data from multiple sources and combine it into one record. The interesting thing here is that the input needs to be un-synchronized from the input. Not all rows are going to be merged. The third example is to get multiple lines per input. For example, a report type item that describes a row in the db across multiple rows in the report. This was an interesting use of the Script component and basic programming to handle this structure. The fourth example reversed the third.

I thought this was a slightly dry session, and I have sent some specific notes to Tim for future work. Not that I’m a great presenter, but I noticed a few things that could be better. However as a basic introduction to the script component, I thought this was fantastic.


SQL Server Encryption - The Service Master Key

Rating: (not yet rated) Rate this |  Discuss | 3,945 Reads | 420 Reads in Last 30 Days |no comments

I've been tech editing a book about SQL Server encryption and this series of blog posts looks at some interesting things I've found when going through the book.

The Service Master Key (SMK) in SQL Server is the root of all encryption operations. It's the key that's used to encrypt all other keys below it, or at least encrypt those keys that are used to encrypt something else.

Surprisingly, there's no CREATE SERVICE MASTER KEY command. There is an ALTER SERVICE MASTER KEY command, which is used to work with this key.

Instead, the Service Master key is automatically created when you enable certain operations. For example, if you create a Database Master Key (DMK), if there is no SMK, it is automatically generated. It is then used to encrypt the DMK on the instance. If you enable Transparent Data Encryption (TDE), then an SMK is created if it does not exist. If you create a linked server, an SMK is needed to encrypt the password.

There are a few other operations as well that create the SMK, but the important thing for you to understand is that this key is at the root of all encryption, and needs to be protected. That means backup of this key, and secure storage. There is a Backup Service Master Key command, and you need to make sure you use this and store a copy of this somewhere that you can get to it in a DR situation. Putting it on the root of the instance host might not be a secure place, and honestly I think you need some secure place that you centralize all of your keys so that they are available along with, but separate from, the backups.

Inside the instance, or on the host, the SMK is protected by the Windows DPAPI mechanism, which should be secure. If it's not, then likely nothing is secure in the computer.

I've been tech editing a book about SQL Server encryption and this series of blog posts looks at some interesting things I've found when going through the book.

Other posts:


Backing Up Asymmetric Keys

Rating: (not yet rated) Rate this |  Discuss | 3,935 Reads | 407 Reads in Last 30 Days |4 comment(s)

I've been tech editing a book about SQL Server encryption and this series of blog posts looks at some interesting things I've found when going through the book.

We're DBAs, right? We're concerned with data, with stability, and perhaps most importantly, backing up and restoring our data. We expect to be able to do this, and at a fairly granular level. I can back up a table or a view (through bcp), a stored procedure (scripting), a file group, or a database.

As I was checking facts in the encryption book I stumbled upon this caveat. You can't back up an asymmetric key!

There's not even a backup asymmetric key command in the T-SQL reference!

I was stunned and looked all around to try and figure this out. After all, I can create an asymmetric key from a file. You use some other program to create the key files, like sn.exe from the Visual Studio resource kit, but you apparently can't get this key back out of SQL Server once it's in there.

It is in a database backup and restore, so that's good, but if I wanted to use this key in another database, or perhaps more importantly, I wanted to put the public key in one database, say an order entry system where I take credit cards in and encrypt them with this key. I don't have the private key, but I move that key to a second system, which the order entry people don't have access to. I then transfer the encrypted data between systems and can decrypt it with the private key on the second system.

If I've created my asymmetric keys in SQL Server, I can't easily do this. Can't do it for DR if I need to get data decrypted, just have to be extra sure I have a good database backup and I can use it to decrypt the data.

There's a backup certificate command, a backup master key command, and even a backup service master key command (though strangely no create service master key command), but no backup asymmetric key command.

I think this is an oversight in SQL Server encryption (as is the lack of CREATE SERVICE MASTER KEY) and ought to be corrected. Hopefully it will in the future. In the meantime, I would not recommend you creating asymmetric keys in SQL Server. Instead, create them in Visual Studio and then protect the files.

I've been tech editing a book about SQL Server encryption and this series of blog posts looks at some interesting things I've found when going through the book.

Other posts:


Until You Know Better…

By Steve Jones in SQL Musings | 03-30-2009 5:48 AM | Categories: Filed under: , , ,
Rating: (not yet rated) Rate this |  Discuss | 3,520 Reads | 385 Reads in Last 30 Days |no comments

Stick with the defaults. In fact, as Andy Warren has told me many times, unless you know better and have a reason to not do so, always go with the defaults. I think that’s good advice, and it’s why I think that defaults in applications, especially SQL Server, should be set for the values that are appropriate 80% of the time.

I saw a post on memory management that got me thinking. Someone was asking if settings have changed in 2008 and I needed to look around a bit to see if that was the case. I hadn’t seen much on this lately, and I need to write a short piece on that.

However in looking around, I stumbled across this article from Technet on How to determine proper SQL Server configuration settings. As I read through it, there was a nice trend:

In actual practice, if you change the affinity mask setting from the default it only rarely helps performance, and will frequently degrade performance.

Based on actual production experience, you do not need to use Fiber mode except in very rare circumstances.

In SQL Server 2000 or in SQL Server 2005, you cannot change the max async IO configuration setting. SQL Server 2000 or SQL Server 2005 automatically tunes this setting.

If you configure a number of worker threads to a value that is greater than the default, it is almost always counterproductive and slows performance because of scheduling and resource overhead.

These quotes from the various sections of the article seem to show that you don’t want to mess with the tuning of these “knobs” very often in SQL Server. Near the beginning of this article has this quote:

If you do make a configuration change, you must perform rigorous methodical performance testing both before and after the change to assess the degree of improvement.

This sums things up nicely. Rigorous and methodical testing should be performed or you’ll likely be making things worse and potentially not realize it at that time. For the majority of people, you should stick with the defaults unless you know better.


Version Restores

Rating: (not yet rated) Rate this |  Discuss | 4,975 Reads | 339 Reads in Last 30 Days |no comments

I saw a question today on restoring a SQL Server 7 backup to SQL Server 2005, and I wasn't sure this was supported. So I went browsing through Books online and sure enough, it's supported. I ended up in the SQL Server 2008 books online first and it's not supported there, so it appears that you get to restore from the 2 previous versions at least for now.

That's good and it means that upgrades are possible through backup and recovery, which is a nice, safe way to move things around as far as I'm concerned. I always prefer having a fallback, in this case the original server instance.

Should MS support more versions? I'm not sure they should. That's a lot of backwards compatibility for them to maintain. I know it would seem simpler since they have the 70 mode in the database, but I think interpreting comments is much different from converting the file format.

However they have the code to convert 70 databases to 80. And from 80 to 90. Why not include a utility on the media that converts old backups to a newer format?


A Custom Trace

By Steve Jones in SQL Musings | 12-08-2008 9:09 AM | Categories: Filed under: , , , ,
Rating: (not yet rated) Rate this |  Discuss | 4,510 Reads | 295 Reads in Last 30 Days |no comments

I have to admit that when I had a performance problem with some code, I would often fire up Profiler and try to trace events that were occurring right then. If you’ve used Profiler, it was clunky in v6.5, slightly worse in 7,2000, and it’s very complicated and lots of options. And if you forgot to enable some of them, you lost the data.

Now you can set up custom trace templates that will include the custom fields and events you need, but those aren’t always on each system, especially if you end up connecting to a server and trying to trace things there. And if you want to customize it for a specific set of circumstances, you either end up with tons of trace templates, or you waste time setting up new traces.

I saw in a recent post that Erland Sommerskag had posted a new sqltrace stored procedure on his site. It was written by a friend and it takes a batch as a parameter and then traces the current connection while executing a batch.

It’s simple, and it fills a quick need that can help you to easily and quickly get performance data about your batches and figure out what is going on. I tested this on my system and got and error because I neglected to closely RTFM that Erland wrote. I don’t have a c:\temp, which is the default location for trace files. You can easily change this in the source code, or even include it as a parameter (which I would recommend).

In any case, I’d urge you to check it out. It’s a quick and relatively-clean solution that can help you get quick performance information on batches of code.


Sunday Morning

By Steve Jones in SQL Musings | 09-28-2008 8:02 AM | Categories: Filed under: ,
Rating: (not yet rated) Rate this |  Discuss | 3,178 Reads | 208 Reads in Last 30 Days

I woke up early, with my schedule a bit messed up from last week and yesterday. I was tempted to lay in bed with my wife, but since there was a kid in our bed and I knew I'd be getting up early with Kyle this week, I decided to get moving.

It was nice and quiet, no kids, and so I answered a few posts, getting back above 400 (my monthly goal) and cleaning up the site. A nice cup of coffee, a bar, and some water keeping me fueled.

For some reason I don't mind these quiet early mornings, first thing I do in the day, getting work done. When Tia's gone, however, I struggle in the afternoons for some reason. I tend to go to lunch with friends and then do something around the house since I can't work. That usually means more late nights, which throw off my productivity in the morning, and I can't be productive like I am today.

On the bright side, I saw some interesting things in the forum.

Someone had an OS error, could not create file (called from xp_subdir) and they reported back that it was a remote disk and the SQL Service was running under a local account. Changed it to a doman account and it worked.

Quite a few people don't understand how logs work, thinking they are like regular log files and get written to as things happen and then grow. Someone, perhaps me, needs to write a good reference article on this. I have a good general one from Gail Shaw, but I need to get another one out there.

Got a good idea for an editoral, but it's a secret :) 

Someone had performance issues from bulk loading to a table every minute. Wow, that's quite a system if it's any size. Partitioning could help, staging, perhaps separate disk arrays, but ultimately I think you just need more horsepower (disk, memory, and CPU) in your server. 


The First One

By Steve Jones in SQL Musings | 07-08-2008 1:30 PM | Categories: Filed under: ,
Rating: (not yet rated) Rate this |  Discuss | 3,206 Reads | 229 Reads in Last 30 Days

It's been a long time since I saw a vulnerability in SQL Server released. And to date, there haven't been any for SQL Server 2005, nearly 3 years after it's release in Nov 2005.

That changed today I saw a note from Secunia just some in: Microsoft SQL Server and MSDE Multiple Vulnerabilities, which lists a few issues with SQL Server 7, 2000, and 2005. That's a major release, and I'm surprised.

I shouldn't be, after all there are security issues with all software and I knew that one would come in SQL Server 2005 a tsome point. I'm just hoping that this is fixed in 2008 and that we don't start to see more and more.


TechEd - Execution Plans

By Steve Jones in SQL Musings | 06-13-2008 6:51 AM | Categories: Filed under: , , ,
Rating: (not yet rated) Rate this |  Discuss | 6,530 Reads | 434 Reads in Last 30 Days

Randy Dyess of Solid Quality Mentors had a session and since I'd just gotten a copy of Grant Fritchey's book with the same name, I wanted to check it out.

There are 3 types of plans: graphical, text, and XML.

Graphical is cool, looks good, but it doesn't have enough information and once you are in a non-trivial situation, you have to scroll around and it's hard to hold all that information in your mind.

Text plans are what you need to learn to read. Especially in grid mode, we deal with this format for data all the time, so this makes lots of sense for many DBAs, so spend some time working on this. Once you get used to reading text plans, likely this will be your preferred method.

XML - Also hard to read, perhaps need to know XQuery, might be worth learning if you want to know more about XML. Randy doesn't like this, but I think it's worth reading about XML a bit and learning how it works since I think we'll have this data around for a long time. However reading XML isn't easy as things lik whitespace and other characters are encoded. Therefore it's not "human readable" and needs to be consumed by something. However XQuery should help here.

All plans can generate estimated or actual plans. Sometims you can't get actuals since it would be hours to execute the query, or you can't run it on the production system and need to know, or have a good idea of what's happening. Estimated plans help here, but they are not guarenteed to be the same as the actual plan.

One problem with estimated plans is temporary tables. You might need to comment one out to get the plan, and it might not be a good representation of the final plan. If you create a permanent work table and remove it, the same thing happens.

Text Plans

In the text plans, follow the parent column in your plan. It contains numbers, and the highest number happens first. The recommendation is to start tuning with the highest numbers, tuning there first since a change there will flow down to lower numbers. An observation from Randy that SQL Server seems to do the hardest or more resource intensive things first.

Options:

  • STATISTICS PROFILE ON
  • SHOWPLAN_ALL 
  • SHOWPLAN_TEXT
  • STATISTICS IO
  • STATISTICS TIME ON

Statistics IO gives you IO statistics from the query. You get more informaiton in 2008 as this has been enhanced. This is really read and scan information that you can use to determine where you might think about tuning the query. A big thing here is to try and reduce physical reads, but you need to run this a few times and be sure that the physical reads aren't the first ones to populate the cache. Once the data is in the cache, it might perform much better. 

In this case, Randy looks to reduce logical reads. If those can be reduced, even if the data is not in cache, physical reads might be reduced.

Execution plans can also be pulled from memory. You can grab the plan from the cache, which is actually what is being used by SQL Server to execute the query. DMVs are available to help here.

How do you tune?

Randy's Goals

  • Optimize duration / CPU usage
  • Optimize IO usage

Tracking duration meaning physical clock time, helps to determine how busy the system is. It's good to know an average time for procedures to execute as a baseline. Know that this will grow over time (potentially) As you add data.

CPU usage - Track this since it's a limited resource. Knowing the top usage queries can allow you to focus your efforts on those queries that are used often and impact the CPU.

Large, complex queries usually have large batches of code, each of which can be tuned as a mini-execution plan.

Top Slowdowns

  • Table and clustered index scans
  • Index scans v index seeks
  • Bookmark lookups
  • Join methods
  • Sorts
  • Compute Scalar

Table and clustered index scans are the leading cause of bad performing queries. Not always, sometimes you want this, but it is often bad. A CI scan is a table scan really, but could have better performance since you don't have as many random IOs as on a heap. Randy has this is the number one thing to tune when he sees this. If he can remove this, it often helps. Partial scans are sometimes ok, such as scanning a set of data all grouped together.

Index scans are beter than table scans, but still costly. This usually means a lot of rows ned to be read for the query. Investigate to see if you can turn these into seeks. CI seeks are the best things to have since they hit the data, but only specific rows.

Bookmark Lookups - A non-clustered index is used, but then you must use the CI key to go read the CI and get the data. In 2005 the INCLUDE columns can help remove these. In 2005/2008, you don't see this operator. You see RID (row identifier) and then a Lookup to get the data. These are hard to resolve, you might include 3-5 columns, but not more. You want to review the columns used by indexes to see if you can cover more queries. Or perhaps see if you really need to return that column and perhaps remove the columns. Don't SELECT *, that can cause this. You might also try creating aditional indexes  to be used for joins or perhaps, if appropriate for that table, change the NCI to a CI. This shouldn't be done lightly, especially if you've spent time considering what the best CI should be.

Be sure that your NCI doesn't duplicate the CI keys. They are already included.

Join Methods - SQL Server uses 3 types. Nested loop, Merge, Hash.

Nested loops are for smaller inputs, and has low memory usage. When you have a small and large tables, this often occurs. You don't want this between two large tables. Nested loops may resort your data.

If you have two large tables, a merge join in better. It's for larger inputs, and middle in terms of memory being used.

The hash join is for large tables, uses lots of memory. A row from the first table is run through a hash algorithm and then stored. The hash buckets are stored in MemToLeave, which is only 256MB on 32 bit machines. Above that, the hash tables moves to tempdb, which is much slower.

Sorts are expensive. Don't add them unnecessarily. DISTINCT, UNION, ORDER BY, GROUP BY, aren't always needed. Don't add them out of habit. UNION ALL does not produce a sort because duplicates aren't removed. If possible, allow indexes to presort things.

Compute Scalar is a function being used by the optimizer. Try to reveiew for implicit functions and watch if you have this in a WHERE clause 

 

 


TechEd - Data Corruption

By Steve Jones in SQL Musings | 06-11-2008 8:23 AM | Categories: Filed under: , , ,
Rating: (not yet rated) Rate this |  Discuss | 6,147 Reads | 394 Reads in Last 30 Days

I went to this one, mostly because I don't understand coruption that well and Paul Randal is a great speaker. Worth going to see him if you get the chance. 

Run CheckDB. It's important.

You need to know how long checkdb takes to run. Checkdb is optimized to run for the times where there are no issues. So it runs fast. If there are issues, then it does a deep dive and runs longer.

In general, keep in mind that only the first 200 messages are reported. Run all_msgs to get all data and use no_infomsgs since there are lots of things reported that aren't needed in a DR situation.

Make sure checkdb completes. Story here about a stock firm.

Lots of messages once it's complete, so there are over 100 errors checkdb can report. Many have a variety of states.

The error results aren't too readable, but there are some things you should be aware of.

First determine if checkdb completed itself. Make sure that it has completed. Common errors with the checkdb process itself.

7984-988 - corruption in the critical system tables.

8967 - invalid states within CHECKDB. 

8930 - Metadata corruption, not critical, but Checkdb cannot make sense of the metadata about a table(s)

Many of these are BOL or online documented. 

Demo with pre-corrupted databases. A critical error shown on a system table, 7995, so checkdb cannot run, including repair. This means a restore situation.

Corruption in a system table, checkdb will not run, but you can perhaps run checktable against individual tables to determine which one is corrupt.

If you have corruptions in nonclustered indexes only, then you can run repair_rebuild if it's recommended from the output. Look through the errors and if the errors are all index IDs > 1, then you can manually repair the non-clustered indexes.

The flipside here is that if you must run repair, you must be in single user mode. If the index is large, than you must have 2x space to rebuild as well, which can be a problem. If you repair with checkdb, it might be able to correct some records.

You can also run an online rebuild in 2005 (Enterprise edition), to try and repair. However since this reads the old index, you need to do an offline rebuild.

Unrepairable error

Was one found? A PFS error? (89090, 8938, 8939) - No way to fix these. You can try to extrat data, but the pages where the corruption exists will not extract. Thereforre a backup is needed.

Was it 8970, invalid data in the column. Example, more than 1439 minutes after midnight. If you get this, repair cannot fix this since it does not know what values should be there. You can repair these manually.

8992 - metadata mismatch, checkcatalog error. Cannot be repaired, but depending on what this is, you might be able to hack the system tables.

Demo

DBCC Page is unsupported/undocumented, but it's safe for production use. Only reads data. Does not change anything. It's heavily used by the SQL Server team.

One trick in binding to system tables is that you can do it with the Dedicated Admin Connection. It doesn't work with regular connections.

System tables are cached. If you start in single-user mode, and use the DAC, you can change metadata. Paul had a demo that showed hwo to fix some corruption doing this, but he got an error that the metadata cache wasn't consistent. SQL Server maintains this and if you make the change to system tables, the cache isn't updated. So you need to restart SQL Server.

Look for blog posts on this stuff from Paul in the next few weeks at SQLSkills

Recovery using backups

Not necessarily the best way to deal with corruption.With large databases, it can be quicker to run DBCC to repair things. However most corruptions require ALLOW_DATA_LOSS, which means data gets deleted.

Full backups are a good starting point, necessary for a base.

Log backups give you good recovery to point in time. 

Backups need to be valid, so you should test them. You can restore bad backups with the "Continue after restore" option. This will restore the database, which is corrupt, but you can possibly retrieve data. Even with a corrupt database, you want to still get a backup first in case things get worse. Use CONTINUEAFTERERROR with the backup as well.

Restore or repair?

Do you have a database? If no, you need to restore.

Working backups? If no, repair. Or you can restore a damaged backup.

Log damaged? Restore, if that fails, use emergency mode and extract information.

CheckDB failed? If it's a critical error, restore or extraction.

If you have nonclustered index errors - restore or repair, choose which makes sense.

If you have repairable errors, choose either repair or restore.

If you have a choice, think about SLAs (downtime and data loss). Choose the one that limits the loss.

One thing to always do in a DR situation is ensure you have the tail log backup. This are all the changes since the last log backup. This is a just a current log backup.

Paul showed a single page restore, which is fairly cool.I haven't done that, but it can help with one (or relatively few) page corruptions.

One trick I've seen over and over is that you should ALWAYS restore with NORECOVERY. That's the default, do that, it allows you to continue with restores. When you're sure you're done, do a restore with recovery by itself to brings things online.

REPAIR_ALLOW_DATA_LOSS

Be careful. You're allowing things to be repaired, which often means just deleting data. 

Repair goes the fastest, doing the most provably correct thing. It doesn't try to save data, just get done quickly. Doesn't look at FKs, constraints, replications, or any types of relationships between objects.

CHECKBD repairs aren't replicated, so you need to quiese the system and then re-init the publications.

Damaged Log

If you don't have a backup, use EMERGENCY mode to extract the data. There is an emergency mode repair in 2005 that  you can run. This will rebuild the log (after extracting what it can) and then run checkdb as one operation.

What people do? 

Restart SQL Server - Doesn't really do anything. If there's corruption, it's corrupt and you're wasting time.

Jump to a last resort -  Without determining the error, they often run repair_allow_data_loss.

Detach a suspect database - Doesn't help.

XVI32, freeware hex editor. Paul showed how to corrupt a t-log. Interesting, not a useful skill if you're not a presenter/tester.

One thing about rebuilding the log on the sly with a detach/attach, Service Broker has been given a new GUID, so it's disabled. Watch out for this if you use SOA stuff.

No backup, no database? Worst state. Kimberly says URLT (update resume, leave town).

Summary

Know the signs

Run checkDB, let it complete,

Always take a backup before restore or repair

Try to limit downtime and data loss, but think about it.

 Test this, go through this using demo databases. And using your systems.

 

More Posts Next page »