Embrace The Missing Index DMVs - But Proceed with Caution!

One of the performance tools I use all of the time is the set of Missing Index DMVs:

· sys.dm_db_missing_index_details– Detailed specifics on the missing indexes, including column lists
· sys.dm_db_missing_index_groups– relates individual missing indexes to index groups

· sys.dm_db_missing_index_group_stats– information on potential cost and benefit of the missing indexes

· sys.dm_db_missing_index_columns– (not regularly used but included for completeness) – included information on individual columns in the missing indexes – this information is readily retrieved from sys.dm_db_missing_index_details as groups of columns

As I always tell you, the easiest way to start is to borrow from someone else. The most commonly used query is from Glenn Berry’s (blog/@GlennAlanBerry) Diagnostic Information Queries. As of the February 2017 release it is Query #31:

-- Missing Indexes for all databases by Index Advantage (Query 31) (Missing Indexes All Databases)

SELECT CONVERT(decimal(18,2), user_seeks * avg_total_user_cost * (avg_user_impact * 0.01)) AS index_advantage,

migs.last_user_seek, mid.statement AS Database.Schema.Table,

mid.equality_columns, mid.inequality_columns, mid.included_columns,

migs.unique_compiles, migs.user_seeks, migs.avg_total_user_cost, migs.avg_user_impact

FROM sys.dm_db_missing_index_group_stats AS migs WITH (NOLOCK)

INNER JOIN sys.dm_db_missing_index_groups AS mig WITH (NOLOCK)

ON migs.group_handle = mig.index_group_handle

INNER JOIN sys.dm_db_missing_index_details AS mid WITH (NOLOCK)

ON mig.index_handle = mid.index_handle

ORDER BY index_advantage DESC OPTION (RECOMPILE);

------

-- Getting missing index information for all of the databases on the instance is very useful

-- Look at last user seek time, number of user seeks to help determine source and importance

-- Also look at avg_user_impact and avg_total_user_cost to help determine importance

-- SQL Server is overly eager to add included columns, so beware

-- Do not just blindly add indexes that show up from this query!!!

The results look like this:

Index advantage	last_user seek	Database.Schema. Table	Equality columns	Inequality columns	Included columns	Unique compiles	User seeks	avg_total user_cost	avg_user impact
219108325.6	02/16/2017 22:28:13	database1.schema2.table1	employee_id	NULL	id, as_of_effective_date	5	36411704	6.07278699	99.09
9319171.13	02/17/2017 10:09:54	database4.dbo.table3	cmpcode, year_max, period_max	NULL	rundatetime	581	3482	2679.33185	99.89
7881068.93	02/17/2017 10:05:19	database2.dbo.table99	code	grpcode	cmpcode	341	3451	2285.305576	99.93
7037526.5	02/17/2017 10:09:39	database4.dbo.table3	cmpcode, usercode	NULL	rundatetime	453	2588	2720.924091	99.94
5861313.35	02/17/2017 10:05:19	database2.dbo.table99	NULL	grpcode	cmpcode, code	341	3451	2285.305576	74.32
3440880.84	02/17/2017 10:09:39	database4.dbo.table3	cmpcode, usercode	rundatetime	NULL	227	1294	2661.233188	99.92
3362884.24	02/17/2017 10:05:43	database2.dbo.table12	elmlevel, deldate	NULL	cmpcode, code, name, sname	278	629	5353.357209	99.87
1646616.36	02/17/2017 10:10:39	database7.schema33.table2	NULL	doc_status	cmpcode, doccode, docnum	724	234099	10.3851265	67.73
1592595.34	02/17/2017 10:09:51	database2.dbo.table99	code	NULL	cmpcode, name, sname	140	310	5137.918128	99.99
877818.52	02/16/2017 15:57:09	database2.dbo.table99	elmlevel, grpcode	NULL	cmpcode, code	184	378	3185.120325	72.91

What this tells me is that the potentially (**potentially**) most useful index is on schema2.table1 in database1 on the employee_id column, with the id and as_of_effective_date columns along for the ride as INCLUDEs. Since the MSSQLServer service was last restarted, the index would have been compiled 5 times (low cost) and would have been used a whopping 36 million times (huge benefit)!

…but wait!

At this point we all need to pause and consider the collective wisdom...

https://cdn.meme.am/instances/44649369.jpg

You can see that Glenn warns us in the last comment of his script “Do not just blindly add indexes that show up from this query!!!”

One of the most common reasons people quote for this is the impact such an index can have. It is possible that adding an index can cause other queries to create/choose a query plan that is less favorable than its current plan because of the index of the new index. Maybe Query1 was using Plan1 and running smoothly but now that there is a new index it may start using Plan2 which take milliseconds longer but is at a lower “cost.” (Yes, milliseconds definitely matter!)

This is very uncommon but it can happen. Always test missing indexes in a DEV/TEST environment before you roll them out into production!

Because you all have DEV/TEST environments for every single PROD environment that matches the hardware/software specs of PROD, right?

Right?

RIGHT?

http://24.media.tumblr.com/tumblr_m8lvn0pSSH1qbsjydo1_500.jpg

Well….test if you can – I would never recommend “test in PROD” from an academic sense, but we all know in the real world we often have no choice – which is just another reason to be even more cautious of blindly adding new indexes – “missing” or otherwise.

One of the top reasons I say to be cautious of the Missing Index DMVs that I want to discuss has to do with duplicative suggestions.

Let’s look at the results from above again:

Index advantage	last_user seek	Database.Schema. Table	Equality columns	Inequality columns	Included columns	Unique compiles	User seeks	avg_total user_cost	avg_user impact
219108325.6	02/16/2017 22:28:13	database1.schema2.table1	employee_id	NULL	id, as_of_effective_date	5	36411704	6.07278699	99.09
9319171.13	02/17/2017 10:09:54	database4.dbo.table3	cmpcode, year_max, period_max	NULL	rundatetime	581	3482	2679.33185	99.89
7881068.93	02/17/2017 10:05:19	database2.dbo.table99	code	grpcode	cmpcode	341	3451	2285.305576	99.93
7037526.5	02/17/2017 10:09:39	database4.dbo.table3	cmpcode, usercode	NULL	rundatetime	453	2588	2720.924091	99.94
5861313.35	02/17/2017 10:05:19	database2.dbo.table99	NULL	grpcode	cmpcode, code	341	3451	2285.305576	74.32
3440880.84	02/17/2017 10:09:39	database4.dbo.table3	cmpcode, usercode	rundatetime	NULL	227	1294	2661.233188	99.92
3362884.24	02/17/2017 10:05:43	database2.dbo.table12	elmlevel, deldate	NULL	cmpcode, code, name, sname	278	629	5353.357209	99.87
1646616.36	02/17/2017 10:10:39	database7.schema33.table2	NULL	doc_status	cmpcode, doccode, docnum	724	234099	10.3851265	67.73
1592595.34	02/17/2017 10:09:51	database2.dbo.table99	code	NULL	cmpcode, name, sname	140	310	5137.918128	99.99
877818.52	02/16/2017 15:57:09	database2.dbo.table99	elmlevel, grpcode	NULL	cmpcode, code	184	378	3185.120325	72.91

Highlighted rows 4 and 6 are an example of what I call duplicative recommendations. The CREATE INDEX statement for the two recommendations (generated using Bart Duncan’s Missing Index script) shows this even more clearly:

CREATE INDEX missing_index_2044_2043_table3 ON database4.dbo.table3 (cmpcode, usercode) INCLUDE (rundatetime)

CREATE INDEX missing_index_2046_2045_table3 ON database4.dbo.table3 (cmpcode, usercode,rundatetime)

The first index is only two columns with an INCLUDE of a third column, while the second index is only all three columns. The second index will not only satisfy any situations needing that index, but will also satisfy any situations needing the first index.

Note that the Index Advantage (weighted average of cost and benefit) of the second index, the index we really want, is only half that of the first index. When I report recommendations like this to the client I edit the output to match the highest Index Advantage of the duplicative indexes with the most correct recommendation – in this case I would use the second index definition (the index on all three columns with no INCLUDE) with the first Index Advantage (7037526.5).

Another situation similar to that of the duplicative recommendation is that of the “left-hand-equivalent” recommendation. Consider the two highlighted rows here:

Index advantage	last_user seek	Database.Schema. Table	Equality columns	Inequality columns	Included columns	Unique compiles	User seeks	avg_total user_cost	avg_user impact
219108325.6	02/16/2017 22:28:13	database1.schema2.table1	employee_id	NULL	id, as_of_effective_date	5	36411704	6.07278699	99.09
9319171.13	02/17/2017 10:09:54	database4.dbo.table3	cmpcode, year_max, period_max	NULL	rundatetime	581	3482	2679.33185	99.89
7881068.93	02/17/2017 10:05:19	database2.dbo.table99	code	grpcode	cmpcode	341	3451	2285.305576	99.93
7037526.5	02/17/2017 10:09:39	database4.dbo.table3	cmpcode, usercode	NULL	rundatetime	453	2588	2720.924091	99.94
5861313.35	02/17/2017 10:05:19	database2.dbo.table99	NULL	grpcode	cmpcode, code	341	3451	2285.305576	74.32
3440880.84	02/17/2017 10:09:39	database4.dbo.table3	cmpcode, usercode	rundatetime	NULL	227	1294	2661.233188	99.92
3362884.24	02/17/2017 10:05:43	database2.dbo.table12	elmlevel, deldate	NULL	cmpcode, code, name, sname	278	629	5353.357209	99.87
1646616.36	02/17/2017 10:10:39	database7.schema33.table2	NULL	doc_status	cmpcode, doccode, docnum	724	234099	10.3851265	67.73
1592595.34	02/17/2017 10:09:51	database2.dbo.table99	code	NULL	cmpcode, name, sname	140	310	5137.918128	99.99
877818.52	02/16/2017 15:57:09	database2.dbo.table99	elmlevel, grpcode	NULL	cmpcode, code	184	378	3185.120325	72.91

As above, here are the scripted CREATE INDEX statements for those two rows:

CREATE INDEX missing_index_35_34_table99 ON database2.dbo.table99 (code,grpcode) INCLUDE (cmpcode)

CREATE INDEX missing_index_250_249_table99 ON database2.dbo.table99 (code) INCLUDE (cmpcode, name, sname)

These two indexes are not as obviously related but they are.

http://i.imgur.com/iQYuWno.jpg

They are not only on different fields, but also have different INCLUDE columns. If you look closely though, the actual index columns are what I call “left-hand equivalent” – they both start with code and then the first index adds grpcode, so an index on code, grpcode would cover both situations for the searchable index fields.

The second piece that would truly make an index cover both situations is for it to include the sum of the INCLUDE’d columns – hence:

CREATE INDEX missing_index_250_249_table99 ON database2.dbo.table99 (code,grpcode) INCLUDE (cmpcode, name, sname)

This index on two columns with three included columns covers both situations – instead of choosing one index over the other we need to do a little work and combine them, but the effect can be very beneficial, and once you understand how it works it doesn’t take that much time.

Here is another (completely contrived) situation:

CREATE INDEX missing_index_44_45_table23 ON database3.dbo.table23 (name,address1) INCLUDE (address2)

CREATE INDEX missing_index_32_33_table23 ON database3.dbo.table23 (name) INCLUDE (address1, address2, state)

CREATE INDEX missing_index_55_56_table23 ON database3.dbo.table23 (name, city) INCLUDE (address1)

CREATE INDEX missing_index_48_49_table23 ON database3.dbo.table23 (name, address1,city)

So we need to start at the beginning – are the indexes all on the same database and table? Check! (You may chuckle but especially when looking across an instance you may find you have very similar looking databases/tables!)

Next, let’s look at left-hand equivalence. All four indexes start with name field – so far so good. Index_32_33 ends there, so it is a likely candidate to be consolidated with something else.

This is where it gets a little trickier – both index_44_45 and index_48_49 have address1 as their second column, which means they could be duplicative and could also be related (left-hand-equivalent) to index_32_33 upon further investigation.

Index_55_56 however does not continue with address1 – instead it has city in its second position. This means index_55_56 is *not* duplicative of index_44_45 or index_48_49 although it can still be related to the narrowest index, index _32_33.

This demonstrates again how important the order in the index is – indexes are searched from left-to-right, so "name, city" <> "name, address".

Consider index_55_56 and index_48_49 – even though index_48_49 *does* have the city column in its index list, it is not in the same-left-to-right order (with address1 in the way) so it isn’t left-hand-equivalent and therefore not combinable.

This leaves us with two options, either of which can be optimal:

Combine index_32_33, index_44_45, and index_48_49, and just create index_55_56 as is:

CREATE INDEX missing_index_98_99_table23 ON database3.dbo.table23 (name,address1,city) INCLUDE (address2,state)
CREATE INDEX missing_index_55_56_table23 ON database3.dbo.table23 (name, city) INCLUDE (address1)

Combine index_44_45 with index_48_49 (both containing name,address1) and index_32_33 and index_55_56:

CREATE INDEX missing_index_77_78_table23 ON database3.dbo.table23 (name,address1,city) INCLUDE (address2)
CREATE INDEX missing_index_88_89_table23 ON database3.dbo.table23 (name, city) INCLUDE (address1, address2, state)

As stated above either of these options work – they both cover all four situations. One thing to consider is the size of the fields contained in the indexes – in option 1 we are storing eight fields (name twice, address1 twice, city twice, address2 once, and state once) whereas in option 2 we are storing *nine* fields as we have address2 in the INCLUDE of both indexes. This may make Option 1 at least slightly “better” although depending on the datatype of address2 and the number of rows in table23, that advantage may be negligible.

Missing indexes are an oft-avoided subject but they really can make a difference to performance, and the algorithms inside SQL Server to help determine and weight the recommendations has become much better with each version of SQL Server. One thing these improved algorithms still don’t watch for are the duplicative/related situations we have discussed here, so you still need to watch for them yourselves.

http://www.cindyvallar.com/crowsnest.jpg

Again, do *not* just blindly create new indexes – consider, test if possible, and weigh the advantages against the possible disadvantages such as the amount of space the index will consume.

Hope this helps!

Embrace The Missing Index DMVs - But Proceed with Caution!

Rate

Share

Share

Rate

Embrace The Missing Index DMVs - But Proceed with Caution!

Rate

Share

Share

Rate

Related content

Book Review: Big Red - Voyage of a Trident Submarine

Database Mirroring FAQ: Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup?

Inserting Markup into a String with SQL

Networking - Part 4

Speaking at Community Events - More Thoughts