RE: Indexing in SQL Server 2005

One Orange Chip

Points: 28027

May 22, 2010 at 3:36 pm

There's lots of ways.

1. Parameter sniffing: SQL Server generates a plan based on parameters passed to a procedure. That plan may work great most of the time but for certain other parameters, it performs poorly. The parameter it used for the plan generation runs quickly for some values and performs way too much work for others.

For example, let's say I have a database with customer names in it. I have a nonclustered index on LastName. I perform a search for customers with the last name Williams and I return other columns with it. In order to return the data, SQL basically has to choose between performing a clustered index scan or using the index and then performing key lookups to find the other values. So SQL has to try to determine which method will be faster.

So let's say that there are 100,000 rows in the table with an average of 100 rows per leaf page (about 1000 leaf pages). If the statistics indicate that there are approximately 100 rows with the LastName of Williams, then using the nonclustered index, it will have to read in 1 or 2 leaf pages of the nonclustered index and 2 non-leaf pages and then about 100 key lookups resulting in about 100 leaf pages read in to get the other pages. And let's say that on average, it has to scan 2 non-leaf pages to get to the leaf pages. It estimates that it will need to read in about 3/4 of the table to complete the clustered index scan. So using the nonclustered index, it will have to read about 303 or 304 leaf pages vs. possibly scanning about 750 leaf pages.

That's a no brainer. Reading 304 pages will be much faster than 750 pages.

Now let's say that we run the query again searching for the last name Daniels. SQL sees that there is a good plan already in cache so it uses it. Only this time, there are actually 400 rows returned. So reads 3 or 4 leaf pages of the nonclustered index and 2 non-leaf pages. Then it has to perform 400 key lookups to get the other columns requiring on average 2 non-leaf page reads for each. So it ends up actually performing 1205 or 1206 reads. If it had performed the table scan, it might have only had to read in 750 pages.

One way that this issue could have been prevented is by creating a covereing index instead. This would have completely avoided the key lookups.

2. Bad statistics: Using the examples from above, if I queried for the name Daniels and the statistics were inaccurate, SQL Server might think that there were only 100 people with the name Daniels causing it to choose the nonclustered index seek + key lookups instead of the table scan. If in fact there were 500 people named Daniels, the query is going to take a long time to run.

3. Poor cluster key choice: I see this a lot. Using large values or non-sequential values for cluster keys can result in lots of performance problems. The bigger your cluster key, the bigger every index is. The cluster key is included in every row of a nonclustered index. The bigger the row of data in the index, the less number of rows that fit on a page the more pages the index has. Having a small cluster key vs. a large cluster key can result in a size difference of many GB's. I've done demos for people demonstrating how choosing, for example, a uniqueidentifier instead of an int can result in a size difference of more than 50 GB for a single large table very quickly.

Another poor cluster key choice is when the data is not sequential and new data coming in may end up on any page in the index. This results in a lot of page splits requiring a lot of additional work and increasing fragmentation.

4. Low usability/high maintenance indexes: Let's say that I have a set of queries that I run once a month. To speed up the queries, I create indexes on several tables. If I have a transaction tracking table has 50,000 inserts, updates, and deletes per month and I only use the index in my query once per month, then the cost to maintain the index is much greater than the usability of the index.

I have seen indexes on tables with 50,000,000+ updates due to inserts, updates, and deletes that have never been used by a query.

These are just a few examples. I could go on and on talking about other cases such as indexes with included LOB columns, duplicated indexes, etc.

My blog: SQL Soldier[/url]
SQL Server Best Practices: SQL Server Best Practices
Twitter: @SQLSoldier
My book: Pro SQL Server 2008 Mirroring[/url]
Microsoft Certified Master: SQL Server, Data Platform MVP
Database Engineer at BlueMountain Capital Management[/url]