Blog Post

Embrace The Missing Index DMVs - But Proceed with Caution!

,

One of the performance tools I use all of the time is the set of Missing Index DMVs: 

·         sys.dm_db_missing_index_details– Detailed specifics on the missing indexes, including column lists 

·         sys.dm_db_missing_index_groups– relates individual missing indexes to index groups

·         sys.dm_db_missing_index_group_stats– information on potential cost and benefit of the missing indexes

·         sys.dm_db_missing_index_columns– (not regularly used but included for completeness) – included information on individual columns in the missing indexes – this information is readily retrieved from sys.dm_db_missing_index_details as groups of columns 

As I always tell you, the easiest way to start is to borrow from someone else.  The most commonly used query is from Glenn Berry’s (blog/@GlennAlanBerry) Diagnostic Information Queries.  As of the February 2017 release it is Query #31:

-- Missing Indexes for all databases by Index Advantage  (Query 31) (Missing Indexes All Databases)

SELECT CONVERT(decimal(18,2), user_seeks * avg_total_user_cost * (avg_user_impact * 0.01)) AS index_advantage,

migs.last_user_seek, mid.statement AS Database.Schema.Table,

mid.equality_columns, mid.inequality_columns, mid.included_columns,

migs.unique_compiles, migs.user_seeks, migs.avg_total_user_cost, migs.avg_user_impact

FROM sys.dm_db_missing_index_group_stats AS migs WITH (NOLOCK)

INNER JOIN sys.dm_db_missing_index_groups AS mig WITH (NOLOCK)

ON migs.group_handle = mig.index_group_handle

INNER JOIN sys.dm_db_missing_index_details AS mid WITH (NOLOCK)

ON mig.index_handle = mid.index_handle

ORDER BY index_advantage DESC OPTION (RECOMPILE);

------

-- Getting missing index information for all of the databases on the instance is very useful

-- Look at last user seek time, number of user seeks to help determine source and importance

-- Also look at avg_user_impact and avg_total_user_cost to help determine importance

-- SQL Server is overly eager to add included columns, so beware

-- Do not just blindly add indexes that show up from this query!!!

The results look like this:

Index

advantage

last_user

seek

Database.Schema.

Table

Equality

columns

Inequality

columns

Included

columns

Unique

compiles

User

seeks

avg_total

user_cost

avg_user

impact

219108325.6

02/16/2017 22:28:13

database1.schema2.table1

employee_id

NULL

id, as_of_effective_date

5

36411704

6.07278699

99.09

9319171.13

02/17/2017 10:09:54

database4.dbo.table3

cmpcode, year_max, period_max

NULL

rundatetime

581

3482

2679.33185

99.89

7881068.93

02/17/2017 10:05:19

database2.dbo.table99

code

grpcode

cmpcode

341

3451

2285.305576

99.93

7037526.5

02/17/2017 10:09:39

database4.dbo.table3

cmpcode, usercode

NULL

rundatetime

453

2588

2720.924091

99.94

5861313.35

02/17/2017 10:05:19

database2.dbo.table99

NULL

grpcode

cmpcode, code

341

3451

2285.305576

74.32

3440880.84

02/17/2017 10:09:39

database4.dbo.table3

cmpcode, usercode

rundatetime

NULL

227

1294

2661.233188

99.92

3362884.24

02/17/2017 10:05:43

database2.dbo.table12

elmlevel, deldate

NULL

cmpcode, code, name, sname

278

629

5353.357209

99.87

1646616.36

02/17/2017 10:10:39

database7.schema33.table2

NULL

doc_status

cmpcode, doccode, docnum

724

234099

10.3851265

67.73

1592595.34

02/17/2017 10:09:51

database2.dbo.table99

code

NULL

cmpcode, name, sname

140

310

5137.918128

99.99

877818.52

02/16/2017 15:57:09

database2.dbo.table99

elmlevel, grpcode

NULL

cmpcode, code

184

378

3185.120325

72.91

What this tells me is that the potentially (**potentially**) most useful index is on schema2.table1 in database1 on the employee_id column, with the id and as_of_effective_date columns along for the ride as INCLUDEs.  Since the MSSQLServer service was last restarted, the index would have been compiled 5 times (low cost) and would have been used a whopping 36 million times (huge benefit)!

…but wait!

At this point we all need to pause and consider the collective wisdom...

https://cdn.meme.am/instances/44649369.jpg
You can see that Glenn warns us in the last comment of his script “Do not just blindly add indexes that show up from this query!!!”
One of the most common reasons people quote for this is the impact such an index can have.  It is possible that adding an index can cause other queries to create/choose a query plan that is less favorable than its current plan because of the index of the new index.  Maybe Query1 was using Plan1 and running smoothly but now that there is a new index it may start using Plan2 which take milliseconds longer but is at a lower “cost.”  (Yes, milliseconds definitely matter!)

This is very uncommon but it can happen.  Always test missing indexes in a DEV/TEST environment before you roll them out into production!

Because you all have DEV/TEST environments for every single PROD environment that matches the hardware/software specs of PROD, right?

Right?  

RIGHT?
http://24.media.tumblr.com/tumblr_m8lvn0pSSH1qbsjydo1_500.jpg
Well….test if you can – I would never recommend “test in PROD” from an academic sense, but we all know in the real world we often have no choice – which is just another reason to be even more cautious of blindly adding new indexes – “missing” or otherwise.
--

One of the top reasons I say to be cautious of the Missing Index DMVs that I want to discuss has to do with duplicative suggestions.

Let’s look at the results from above again:

Index

advantage

last_user

seek

Database.Schema.

Table

Equality

columns

Inequality

columns

Included

columns

Unique

compiles

User

seeks

avg_total

user_cost

avg_user

impact

219108325.6

02/16/2017 22:28:13

database1.schema2.table1

employee_id

NULL

id, as_of_effective_date

5

36411704

6.07278699

99.09

9319171.13

02/17/2017 10:09:54

database4.dbo.table3

cmpcode, year_max, period_max

NULL

rundatetime

581

3482

2679.33185

99.89

7881068.93

02/17/2017 10:05:19

database2.dbo.table99

code

grpcode

cmpcode

341

3451

2285.305576

99.93

7037526.5

02/17/2017 10:09:39

database4.dbo.table3

cmpcode, usercode

NULL

rundatetime

453

2588

2720.924091

99.94

5861313.35

02/17/2017 10:05:19

database2.dbo.table99

NULL

grpcode

cmpcode, code

341

3451

2285.305576

74.32

3440880.84

02/17/2017 10:09:39

database4.dbo.table3

cmpcode, usercode

rundatetime

NULL

227

1294

2661.233188

99.92

3362884.24

02/17/2017 10:05:43

database2.dbo.table12

elmlevel, deldate

NULL

cmpcode, code, name, sname

278

629

5353.357209

99.87

1646616.36

02/17/2017 10:10:39

database7.schema33.table2

NULL

doc_status

cmpcode, doccode, docnum

724

234099

10.3851265

67.73

1592595.34

02/17/2017 10:09:51

database2.dbo.table99

code

NULL

cmpcode, name, sname

140

310

5137.918128

99.99

877818.52

02/16/2017 15:57:09

database2.dbo.table99

elmlevel, grpcode

NULL

cmpcode, code

184

378

3185.120325

72.91

Highlighted rows 4 and 6 are an example of what I call duplicative recommendations.  The CREATE INDEX statement for the two recommendations (generated using Bart Duncan’s Missing Index script) shows this even more clearly:

CREATE INDEX missing_index_2044_2043_table3 ON database4.dbo.table3 (cmpcode, usercode) INCLUDE (rundatetime)

CREATE INDEX missing_index_2046_2045_table3 ON database4.dbo.table3 (cmpcode, usercode,rundatetime)

The first index is only two columns with an INCLUDE of a third column, while the second index is only all three columns.  The second index will not only satisfy any situations needing that index, but will also satisfy any situations needing the first index.

Note that the Index Advantage (weighted average of cost and benefit) of the second index, the index we really want, is only half that of the first index.  When I report recommendations like this to the client I edit the output to match the highest Index Advantage of the duplicative indexes with the most correct recommendation – in this case I would use the second index definition (the index on all three columns with no INCLUDE) with the first Index Advantage (7037526.5).

--

Another situation similar to that of the duplicative recommendation is that of the “left-hand-equivalent” recommendation.  Consider the two highlighted rows here:

Index

advantage

last_user

seek

Database.Schema.

Table

Equality

columns

Inequality

columns

Included

columns

Unique

compiles

User

seeks

avg_total

user_cost

avg_user

impact

219108325.6

02/16/2017 22:28:13

database1.schema2.table1

employee_id

NULL

id, as_of_effective_date

5

36411704

6.07278699

99.09

9319171.13

02/17/2017 10:09:54

database4.dbo.table3

cmpcode, year_max, period_max

NULL

rundatetime

581

3482

2679.33185

99.89

7881068.93

02/17/2017 10:05:19

database2.dbo.table99

code

grpcode

cmpcode

341

3451

2285.305576

99.93

7037526.5

02/17/2017 10:09:39

database4.dbo.table3

cmpcode, usercode

NULL

rundatetime

453

2588

2720.924091

99.94

5861313.35

02/17/2017 10:05:19

database2.dbo.table99

NULL

grpcode

cmpcode, code

341

3451

2285.305576

74.32

3440880.84

02/17/2017 10:09:39

database4.dbo.table3

cmpcode, usercode

rundatetime

NULL

227

1294

2661.233188

99.92

3362884.24

02/17/2017 10:05:43

database2.dbo.table12

elmlevel, deldate

NULL

cmpcode, code, name, sname

278

629

5353.357209

99.87

1646616.36

02/17/2017 10:10:39

database7.schema33.table2

NULL

doc_status

cmpcode, doccode, docnum

724

234099

10.3851265

67.73

1592595.34

02/17/2017 10:09:51

database2.dbo.table99

code

NULL

cmpcode, name, sname

140

310

5137.918128

99.99

877818.52

02/16/2017 15:57:09

database2.dbo.table99

elmlevel, grpcode

NULL

cmpcode, code

184

378

3185.120325

72.91

As above, here are the scripted CREATE INDEX statements for those two rows:

CREATE INDEX missing_index_35_34_table99 ON database2.dbo.table99 (code,grpcode) INCLUDE (cmpcode)

CREATE INDEX missing_index_250_249_table99 ON database2.dbo.table99 (code) INCLUDE (cmpcode, name, sname)

These two indexes are not as obviously related but they are.
http://i.imgur.com/iQYuWno.jpg
They are not only on different fields, but also have different INCLUDE columns.  If you look closely though, the actual index columns are what I call “left-hand equivalent” – they both start with code and then the first index adds grpcode, so an index on code, grpcode would cover both situations for the searchable index fields.
The second piece that would truly make an index cover both situations is for it to include the sum of the INCLUDE’d columns – hence:

CREATE INDEX missing_index_250_249_table99 ON database2.dbo.table99 (code,grpcode) INCLUDE (cmpcode, name, sname)

This index on two columns with three included columns covers both situations – instead of choosing one index over the other we need to do a little work and combine them, but the effect can be very beneficial, and once you understand how it works it doesn’t take that much time.

--

Here is another (completely contrived) situation:

CREATE INDEX missing_index_44_45_table23 ON database3.dbo.table23 (name,address1) INCLUDE (address2)

CREATE INDEX missing_index_32_33_table23 ON database3.dbo.table23 (name) INCLUDE (address1, address2, state)

CREATE INDEX missing_index_55_56_table23 ON database3.dbo.table23 (name, city) INCLUDE (address1)

CREATE INDEX missing_index_48_49_table23 ON database3.dbo.table23 (name, address1,city)

So we need to start at the beginning – are the indexes all on the same database and table?  Check!  (You may chuckle but especially when looking across an instance you may find you have very similar looking databases/tables!)

Next, let’s look at left-hand equivalence.  All four indexes start with name field – so far so good.  Index_32_33 ends there, so it is a likely candidate to be consolidated with something else.

This is where it gets a little trickier – both index_44_45 and index_48_49 have address1 as their second column, which means they could be duplicative and could also be related (left-hand-equivalent) to index_32_33 upon further investigation.

Index_55_56 however does not continue with address1 – instead it has city in its second position.  This means index_55_56 is *not* duplicative of index_44_45 or index_48_49 although it can still be related to the narrowest index, index _32_33.

This demonstrates again how important the order in the index is – indexes are searched from left-to-right, so "name, city" <> "name, address".

Consider index_55_56 and index_48_49 – even though index_48_49 *does* have the city column in its index list, it is not in the same-left-to-right order (with address1 in the way) so it isn’t left-hand-equivalent and therefore not combinable.

This leaves us with two options, either of which can be optimal: 

Combine index_32_33, index_44_45, and index_48_49, and just create index_55_56 as is:

 

CREATE INDEX missing_index_98_99_table23 ON database3.dbo.table23 (name,address1,city) INCLUDE (address2,state) 

CREATE INDEX missing_index_55_56_table23 ON database3.dbo.table23 (name, city) INCLUDE (address1)

 

Combine index_44_45 with index_48_49 (both containing name,address1) and index_32_33 and index_55_56:

 

CREATE INDEX missing_index_77_78_table23 ON database3.dbo.table23 (name,address1,city) INCLUDE (address2) 

CREATE INDEX missing_index_88_89_table23 ON database3.dbo.table23 (name, city) INCLUDE (address1, address2, state) 

As stated above either of these options work – they both cover all four situations.  One thing to consider is the size of the fields contained in the indexes – in option 1 we are storing eight fields (name twice, address1 twice, city twice, address2 once, and state once) whereas in option 2 we are storing *nine* fields as we have address2 in the INCLUDE of both indexes.  This may make Option 1 at least slightly “better” although depending on the datatype of address2 and the number of rows in table23, that advantage may be negligible.

--

Missing indexes are an oft-avoided subject but they really can make a difference to performance, and the algorithms inside SQL Server to help determine and weight the recommendations has become much better with each version of SQL Server.  One thing these improved algorithms still don’t watch for are the duplicative/related situations we have discussed here, so you still need to watch for them yourselves.

http://www.cindyvallar.com/crowsnest.jpg
Again, do *not* just blindly create new indexes – consider, test if possible, and weigh the advantages against the possible disadvantages such as the amount of space the index will consume.

--

Hope this helps!

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating