
Missing indexes


One of the most common problems I encounter when asked to help with performance issues is wrong or inadequate indexing. Creating the optimal indexes for a system is nowhere near a trivial exercise. You need to consider the read/write ratio as well as how your queries are written. That is beyond the scope of this blog post.

SQL Server has a number of dynamic management views and functions (DMVs) that are very useful for different purposes, at least if you know they are there and how to use them. For missing index information, these views are the important ones:

select * from sys.dm_db_missing_index_group_stats
select * from sys.dm_db_missing_index_details
select * from sys.dm_db_missing_index_groups

 

sys.dm_db_missing_index_group_stats tracks information about indexes that the query optimizer would have found useful. It holds information about how many times SQL Server could have used each index, and how big an impact it would have had on the queries. This DMV does not tell you which table and columns the index should be created on.

That information is available in the sys.dm_db_missing_index_details DMV, which gives you the database_id, the object_id and the columns of the suggested index.

The last DMV, sys.dm_db_missing_index_groups, is only used to join the other two together.

The DMVs contain data at server level, so they show missing index details for all the databases on the instance.
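Because the data is instance-wide, it can be handy to see which databases have suggestions at all before digging into the details. A minimal sketch that groups sys.dm_db_missing_index_details by database_id; keep in mind that the contents of these DMVs are cleared when the SQL Server service restarts, so the counts only reflect the workload since the last restart.

```sql
-- Count the missing index suggestions recorded for each database on the instance.
-- DB_NAME() turns the database_id from the DMV into a readable name.
SELECT
    DB_NAME(d.database_id) AS DatabaseName,
    COUNT(*)               AS MissingIndexCount
FROM sys.dm_db_missing_index_details d
GROUP BY d.database_id
ORDER BY MissingIndexCount DESC;
```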

Here is what a basic query on these DMVs could look like:

SELECT *
FROM
    sys.dm_db_missing_index_groups g
    INNER JOIN sys.dm_db_missing_index_group_stats gs ON gs.group_handle = g.index_group_handle
    INNER JOIN sys.dm_db_missing_index_details d ON g.index_handle = d.index_handle

 

On my local SQL Server, this gives me an empty result set.

So, let me create some test data:

 

CREATE DATABASE IndexDemo
GO
USE IndexDemo
GO
CREATE TABLE DemoData (
    Id INT IDENTITY PRIMARY KEY,
    Val1 INT,
    Val2 INT,
    Val3 CHAR(4000)
)
GO
INSERT INTO DemoData (Val1, Val2, Val3)
VALUES (1,1, '')
GO
INSERT INTO DemoData (Val1, Val2, Val3)
SELECT Val1, Val2, Val3
FROM DemoData
GO 15
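Each run of the INSERT ... SELECT batch copies every existing row, so the table doubles in size, and GO 15 repeats that batch 15 times: one seed row becomes 2^15 = 32,768 rows. A quick sanity check, assuming it runs right after the script above on a freshly created IndexDemo database:

```sql
USE IndexDemo;
GO
-- One seed row, doubled 15 times by GO 15: 2^15 = 32,768 rows.
-- Expected result on a fresh run of the script above: 32768
SELECT COUNT(*) AS RowsInDemoData
FROM DemoData;
```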

 

I have created a table with a clustered index on the Id column and filled it with 32K rows. Now I will write a query that returns the Id and Val1 columns for a specific value of Val2:

SELECT  Id, Val1
FROM    DemoData
WHERE   Val2 = 123

Now let me run the missing index query again, this time returning only a subset of the columns:

SELECT
    user_seeks,
    last_user_seek,
    avg_total_user_cost,
    avg_user_impact,
    database_id,
    object_id,
    equality_columns,
    inequality_columns,
    included_columns
FROM
    sys.dm_db_missing_index_groups g
    INNER JOIN sys.dm_db_missing_index_group_stats gs ON gs.group_handle = g.index_group_handle
    INNER JOIN sys.dm_db_missing_index_details d ON g.index_handle = d.index_handle

 


 

Now one row is returned. Let’s go through the returned columns and see what the values mean:

user_seeks
This tells you how many times SQL Server could have used the missing index to look up data for user queries. The higher this number is, the more reason there is to build the index. What counts as high depends on your system. On high-volume OLTP systems, values below 10,000 are usually not worth worrying about, while on smaller systems a value of 20-100 might be high enough to consider the index.

last_user_seek
This tells you when the missing index was last needed. If the value is last Saturday night, it _could_ mean that you have a nightly job running on Saturdays that might need this index.
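To look for that kind of pattern, you can list the weekday and hour of last_user_seek next to each suggestion. A small sketch; note that DATENAME returns the weekday name in the session language, and that last_user_seek is NULL for suggestions that have only been hit by scans.

```sql
-- When was each suggested index last needed, and on which weekday and hour?
SELECT
    d.statement                          AS TableName,
    gs.user_seeks,
    gs.last_user_seek,
    DATENAME(weekday, gs.last_user_seek) AS LastSeekWeekday,  -- in the session language
    DATEPART(hour, gs.last_user_seek)    AS LastSeekHour
FROM sys.dm_db_missing_index_groups g
    INNER JOIN sys.dm_db_missing_index_group_stats gs ON gs.group_handle = g.index_group_handle
    INNER JOIN sys.dm_db_missing_index_details d ON g.index_handle = d.index_handle
ORDER BY gs.last_user_seek DESC;
```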

avg_total_user_cost
This value has no unit, but it tells you something about the average estimated cost of the queries that needed the index. The higher the value, the more resource-intensive the queries that need the index. If the value is very low (perhaps 0.01), the queries that needed the index were not very resource-intensive. Think of a query that executes in 2 ms compared to one that takes 2000 ms: you will probably gain more by adding an index for the 2000 ms query than for the 2 ms one.

avg_user_impact
This is the expected average improvement of the queries, in percent, if you build the index. The closer the value is to 100, the more the queries will benefit from the index.
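avg_total_user_cost and avg_user_impact are most useful when read together with the usage counts. A heuristic that circulates widely in the SQL Server community multiplies cost, impact and usage into a single "improvement measure" for ranking. This is a sketch of that community heuristic, not an official metric and not part of the workflow described in this post; ImprovementMeasure is just an illustrative column alias.

```sql
-- Community heuristic: cost * impact * (seeks + scans).
-- avg_user_impact is a percentage (0-100), so divide by 100 to get a fraction.
SELECT
    DB_NAME(d.database_id) AS DatabaseName,
    d.statement            AS TableName,
    gs.avg_total_user_cost * (gs.avg_user_impact / 100.0)
        * (gs.user_seeks + gs.user_scans) AS ImprovementMeasure,
    gs.user_seeks,
    gs.avg_total_user_cost,
    gs.avg_user_impact
FROM sys.dm_db_missing_index_groups g
    INNER JOIN sys.dm_db_missing_index_group_stats gs ON gs.group_handle = g.index_group_handle
    INNER JOIN sys.dm_db_missing_index_details d ON g.index_handle = d.index_handle
ORDER BY ImprovementMeasure DESC;
```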

database_id
This is the ID of the database where you need to build the index. DB_NAME(database_id) gives you the database name.

object_id
This is the object_id of the table that could benefit from the index.
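Since database_id and object_id are just numbers, it helps to translate them into names. A minimal sketch: OBJECT_NAME and OBJECT_SCHEMA_NAME both accept an optional database_id as their second argument, so the names resolve correctly no matter which database you are connected to. The statement column already holds the fully qualified table name, which is handy for comparison.

```sql
-- Resolve the IDs in the details DMV into database, schema and table names.
-- Both functions return NULL if you lack permission to see the object.
SELECT
    DB_NAME(d.database_id)                         AS DatabaseName,
    OBJECT_SCHEMA_NAME(d.object_id, d.database_id) AS SchemaName,
    OBJECT_NAME(d.object_id, d.database_id)        AS TableName,
    d.statement                                    AS FullyQualifiedName
FROM sys.dm_db_missing_index_details d;
```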

equality_columns
These are the columns you should build the index on. Equality columns are columns used in an equality predicate like “Val2 = x”, which is exactly how I wrote my demo query.

inequality_columns
You should also add these columns to your index, as key columns after the equality columns. Inequality columns are used in any comparison other than equality, such as the range predicate “Val2 > x”. My missing index query returned NULL in this column, because my demo query had no such predicate.

included_columns
If columns are used in the SELECT list of your query but not in the predicate, you can add them as included columns in the index. That way all the data the query needs is available in the index, and no key lookup into the clustered index is needed. But beware! The more columns you add to an index, the more write I/O and space it requires. My rule of thumb is that the total number of columns (equality_columns + inequality_columns + included_columns) should not exceed 5. A total above 5 does not mean that you should not build the index; it just means that you need to think carefully about what you are doing!
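For the demo query, the three column lists translate into a fairly small index, assuming the DMV suggests Val2 as the equality column and Val1 as an included column, which is what the query's WHERE clause and SELECT list imply. A sketch; IX_DemoData_Val2 is a hypothetical name. If you are following along, note that with this index in place the demo query no longer has anything to ask for.

```sql
USE IndexDemo;
GO
-- Key column: Val2 (the equality predicate in the WHERE clause).
-- Included column: Val1 (only needed in the SELECT list).
-- Id is not listed: as the clustered index key it is stored in
-- every nonclustered index row anyway.
-- IX_DemoData_Val2 is a hypothetical name; use your own naming standard.
CREATE NONCLUSTERED INDEX IX_DemoData_Val2
ON dbo.DemoData (Val2)
INCLUDE (Val1);
GO
-- The demo query should now be able to use a seek on the new index,
-- with no key lookup into the clustered index.
SELECT  Id, Val1
FROM    DemoData
WHERE   Val2 = 123;
```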

Normally I run the missing index query with the rows ordered by user_seeks DESC. Then I look at the top 10-20 rows and check whether any of them have an avg_user_impact in the 90-100 range. Then I look at the number of columns in the index, and decide whether or not the index should be built. This query does NOT give you the answer to everything, but it can definitely help you spot indexes that you should add to your system.
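That routine can be sketched as a single query. The cut-offs (TOP (20), then avg_user_impact of 90 or more) are simply the values from the description above, and TotalColumns is an illustrative name for a helper column that counts columns by counting commas, which assumes no column name itself contains a comma.

```sql
-- Step 1: the 20 suggestions with the most user seeks.
-- Step 2: of those, keep the ones with an expected impact of 90-100 %.
;WITH Top20 AS
(
    SELECT TOP (20)
        d.statement AS TableName,
        gs.user_seeks,
        gs.avg_user_impact,
        d.equality_columns,
        d.inequality_columns,
        d.included_columns,
        -- Columns per list = commas + 1; a NULL list counts as 0.
        -- Totals above 5 deserve a closer look (rule of thumb from the text).
          ISNULL(LEN(d.equality_columns)   - LEN(REPLACE(d.equality_columns,   ',', '')) + 1, 0)
        + ISNULL(LEN(d.inequality_columns) - LEN(REPLACE(d.inequality_columns, ',', '')) + 1, 0)
        + ISNULL(LEN(d.included_columns)   - LEN(REPLACE(d.included_columns,   ',', '')) + 1, 0)
            AS TotalColumns
    FROM sys.dm_db_missing_index_groups g
        INNER JOIN sys.dm_db_missing_index_group_stats gs ON gs.group_handle = g.index_group_handle
        INNER JOIN sys.dm_db_missing_index_details d ON g.index_handle = d.index_handle
    ORDER BY gs.user_seeks DESC
)
SELECT *
FROM Top20
WHERE avg_user_impact >= 90
ORDER BY user_seeks DESC;
```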

Finally, you can modify the query to generate the full CREATE INDEX statement, so you just need to copy it to a query window and execute it. The full query I usually use looks like this:

DECLARE @DBName sysname;
DECLARE @TableName sysname;
-- Uncomment these lines to limit the list to missing indexes
-- for a given database and/or table:
--SET @DBName = 'MyDatabase';
--SET @TableName = 'MyTable';
;WITH CTE
AS
(
SELECT
    DB_NAME(d.database_id) AS DatabaseName,
    OBJECT_NAME(d.object_id, d.database_id) AS TableName,
    gs.user_seeks,
    gs.user_scans,
    gs.avg_total_user_cost,
    gs.avg_user_impact,
    d.equality_columns,
    d.inequality_columns,
    d.included_columns,
    -- d.statement holds the fully qualified [database].[schema].[table] name
    'USE ' + QUOTENAME(DB_NAME(d.database_id)) + '; CREATE NONCLUSTERED INDEX IX_' +
    -- Index name: column names joined by underscores, brackets and spaces removed
    replace(replace(replace(replace(
        isnull(d.equality_columns, '') +
        CASE WHEN d.equality_columns IS NOT NULL AND d.inequality_columns IS NOT NULL
             THEN '_' ELSE '' END +
        isnull(d.inequality_columns, ''),
        ',', '_'), '[', ''), ']', ''), ' ', '') +
    CASE WHEN d.included_columns IS NOT NULL
         THEN '_INC_' + replace(replace(replace(replace(d.included_columns, ',', '_'), '[', ''), ']', ''), ' ', '')
         ELSE '' END +
    ' ON ' + d.statement + ' (' +
    -- Key columns: equality columns first, then inequality columns
    CASE
        WHEN d.equality_columns IS NOT NULL AND d.inequality_columns IS NOT NULL
        THEN d.equality_columns + ', ' + d.inequality_columns
        WHEN d.equality_columns IS NOT NULL
        THEN d.equality_columns
        ELSE d.inequality_columns
    END + ')' +
    -- Keep the brackets in the INCLUDE list, so column names with spaces still work
    CASE WHEN d.included_columns IS NOT NULL
         THEN ' INCLUDE (' + d.included_columns + ')'
         ELSE '' END +
    -- ONLINE = ON needs Enterprise features; EngineEdition 3 covers the
    -- Enterprise, Developer and Evaluation editions
    CASE WHEN SERVERPROPERTY('EngineEdition') = 3 THEN ' WITH (ONLINE = ON)'
         ELSE '' END AS CreateIndex
FROM
    sys.dm_db_missing_index_groups g
    INNER JOIN sys.dm_db_missing_index_group_stats gs ON gs.group_handle = g.index_group_handle
    INNER JOIN sys.dm_db_missing_index_details d ON g.index_handle = d.index_handle
WHERE
    (DB_NAME(d.database_id) = @DBName OR @DBName IS NULL)
    AND (OBJECT_NAME(d.object_id, d.database_id) = @TableName OR @TableName IS NULL)
)
SELECT * FROM CTE
ORDER BY user_seeks DESC;

 

Feel free to modify the query to match your naming standards.

@HenrikSjang
