Every DBA deals extensively with indexes – in a love-hate relationship. You love what they can do for performance, but simultaneously you hate what they can do to performance. Every time that the database fails to yield instant results for that query of the billion-row table with a LIKE ‘%something%’ clause, a developer or an ad-hoc user thinks that you can make that query perform like lightning with an index.
SQL Server can be both a hero and a villain in this saga. Suggested indexes may be just the trick for the particular query for which they’re suggested, but may be duplicative of other indexes, or prima facie stupid (e.g., an index with an INCLUDE of almost every column of the database. The index INCLUDE clause is not supposed to be a data backup method!
The DBA shares the three virtues of a developer: laziness, impatience and hubris. With SQL Server indexing, the better three virtues are patience, diligence and humility. This article explains what the DBA should examine on a regular basis. It’s intended to be an overall view and to recommend some general resources to start you on your way (hopefully to your new Tesla as management sees benefits from your labor).
Part I – Missing Indexes
Missing indexes are important because SQL Server can, with the correct indexing, improve the performance of a given query by as much as one hundred times. However, one must be diligent.
Finding “Missing Indexes” is Easy
SQL Server includes a neat DMV (Dynamic Management View). It’s named sys.dm_db_missing_index_details. It was introduced in SQL Server 2005, and it is in essence every single index recommendation for every single command run against that database since the last SQL Server restart. Any time that SQL Server wishes that it had an index that is “just so,” it shows up in this DMV.
You can create nifty queries that will join to related views, determine how much improvement would result from that index and the number of times it was called, and created a weighted rank to determine the indexes that would give you the most bang for your buck.
Pinal Dave created this extremely popular script that not only does all of this, but it even generates prototype CREATE INDEX statements. It’s easy as opening the SQL script in Management Studio and executing it. I recommend doing this weekly on every OLTP database and all data warehouses.
Analyzing Missing Indexes Requires Diligence
Great! You think, "Wow! I can just go download Pinal’s script, copy out all of the recommendations and apply! I’m done!" Ha! Your SAN and Server sales reps would love that as your database grows rapidly and requires more power to handle its workload.
Pinal carefully notes that the script is “just for guidance,” and rightly so. Remember that I wrote above that SQL Server can recommend stupid indexes. I’ve seen them. Index on one column with every other column in the table in the INCLUDE. Duplicate an existing index except for different or additional INCLUDE columns. Recommend an index on A, B and C when one with A and B already exists.
This is where diligence comes into play. You need to examine every recommendation against the following criteria:
- Is there another index that differs only in INCLUDE columns? If so, perhaps you should create one covering index that INCLUDEs all columns that are in both indexes’ INCLUDE. Or possibly, just include NO columns. What would influence that decision one way or the other? How about row count? Do you want to have an index that INCLUDEs ten of twenty columns of a billion-row table? Probably not. Better to save disk space and INSERT/UPDATE I/O by letting the index fetch the data from matching rows instead of its own leaf.
- Is there another index that has almost as many columns as the recommendation? Let’s go back to the example above: An index on A and B exists; SQL server wants one on A, B and C. Perhaps you can drop the index on A and B and replace it with one on A, B and C.
- Is the index worth the trouble? Let’s say that SQL Server asked for an index that is projected to improve the query by 23%, and the query was executed 15 times in the last 90 days. On an active or large table, that is an awful lot of INSERT/UPDATE work to help out a query that executes on the average of once every six days.
I’d love to have developed a script that could compare every missing index recommendation against the existing indexes and suggest alternatives. I probably could develop one with about 4,000 lines of SQL and 200 hours of my time. For now, however, Pinal Dave’s script is outstanding and gets the job done for the diligent DBA.
Part II – Unused Indexes
Unused indexes are pure evil. How evil? More evil than Jar-Jar Binks. SQL Server is being put to forced labor in maintaining indexes that nobody uses. Any good DBA will want to whack those indexes; to light-saber them in two vertically just as some Jedi should have done to ol’ Jar-Jar (HINT HINT!!! for Episode 8’s opening scene).
Again, Pinal Dave makes use of a DMV, in this case sys.dm_db_index_usage_stats, to find out which indexes are unused. He has created one awesome script that even includes a drop statement for the offenders that the DBA wishes to send to Valhalla.
Caveats on Unused Indexes
However, I must again note that these metrics are taken since the last SQL Server startup. I strongly recommend creating a job to store these metrics and accumulate these stats, so that you can get a full picture.
Moreover, let’s assume that these tables are replicated using Merge or Peer-to-Peer Transactional Replication. If you would like to keep your schemas constant between the nodes (a very good idea), you will need to accumulate these stats across every node. Then schedule a DROP on a monthly basis.
Part III – Active Heaps are a Heap of Trouble
No table that is actively maintained should be a heap. A heap is the SQL Server equivalent of a needle in a haystack unless you are doing an unordered sequential SELECT.
Every table needs a nice, narrow clustered index that makes sense. My favorite is a int or bigint IDENTITY column. Each key is larger than the one before it, eliminating page splits to insert rows. The clustering column need not be the table’s natural primary key. SQL Server easily permits a nonclustered primary key and a unique clustered index on the same table.
Why are active heaps bad?
No heaps = best practice. Yes, I know that there are exceptions (e.g., a log table that is only INSERTED to and almost never read, or a staging table for loading bulk quantities of rows to another table). On the whole, however, heaps are a heap of trouble.
- They are slow : Heaps are slower on Select, Update and Delete. If the table has a nonclustered index, even inserts will be slower.
- They become badly fragmented : Heaps become fragmented as time passes and new rows are inserted at the end of the heap and not where old ones are deleted. The bad part is that a heap cannot be defragmented like a clustered index can.
Finding the heaps in your database is a simple query:
select OBJECT_NAME(object_id) as TableName , * from sys.indexes where index_id = 0;
Now you have a list of heaps. What to do?
Create a clustered unique index (not necessarily a primary key), preferably on an int or bigint IDENTITY column. Simple.
The modern DBA should consider index management a critical part of his or her job. Done correctly, a lousy performing application can dramatically improve. It’s worth doing it well.
John F. Tamburo is the Chief Database Administrator for Landauer, Inc., the world's leading authority on radiation measurement, physics and education. John can be found at @SQLBlimp on Twitter. John also blogs at www.sqlblimp.com.