Finding and Eliminating Duplicate or Overlapping Indexes

Question

Finding and Eliminating Duplicate or Overlapping Indexes

Viewing 15 posts - 16 through 30 (of 46 total)

You must be logged in to reply to this topic. Login to reply

Ed Pollack Hall of Fame Points: 3199 More actions · Answer 1

That's an understandable question---and I intentionally left out any automatic scripting from the article. Adding and removing indexes is delicate work and should be done carefully with quite a bit of review along the way.

Going over 300 indexes manually definitely sounds painful, but I consider it worth the day or two of effort to ensure that no mistakes are made along the way.

If you'd like to run the final duplicate detection script and have it auto-script a DROP INDEX statement with each row returned, you can do it like this:

;WITH CTE_INDEX_DATA AS (

SELECT

SCHEMA_DATA.name AS schema_name,

TABLE_DATA.name AS table_name,

INDEX_DATA.name AS index_name,

STUFF((SELECT ', ' + COLUMN_DATA_KEY_COLS.name + ' ' + CASE WHEN INDEX_COLUMN_DATA_KEY_COLS.is_descending_key = 1 THEN 'DESC' ELSE 'ASC' END -- Include column order (ASC / DESC)

FROM sys.tables AS T

INNER JOIN sys.indexes INDEX_DATA_KEY_COLS

ON T.object_id = INDEX_DATA_KEY_COLS.object_id

INNER JOIN sys.index_columns INDEX_COLUMN_DATA_KEY_COLS

ON INDEX_DATA_KEY_COLS.object_id = INDEX_COLUMN_DATA_KEY_COLS.object_id

AND INDEX_DATA_KEY_COLS.index_id = INDEX_COLUMN_DATA_KEY_COLS.index_id

INNER JOIN sys.columns COLUMN_DATA_KEY_COLS

ON T.object_id = COLUMN_DATA_KEY_COLS.object_id

AND INDEX_COLUMN_DATA_KEY_COLS.column_id = COLUMN_DATA_KEY_COLS.column_id

WHERE INDEX_DATA.object_id = INDEX_DATA_KEY_COLS.object_id

AND INDEX_DATA.index_id = INDEX_DATA_KEY_COLS.index_id

AND INDEX_COLUMN_DATA_KEY_COLS.is_included_column = 0

ORDER BY INDEX_COLUMN_DATA_KEY_COLS.key_ordinal

FOR XML PATH('')), 1, 2, '') AS key_column_list ,

STUFF(( SELECT ', ' + COLUMN_DATA_INC_COLS.name

FROM sys.tables AS T

INNER JOIN sys.indexes INDEX_DATA_INC_COLS

ON T.object_id = INDEX_DATA_INC_COLS.object_id

INNER JOIN sys.index_columns INDEX_COLUMN_DATA_INC_COLS

ON INDEX_DATA_INC_COLS.object_id = INDEX_COLUMN_DATA_INC_COLS.object_id

AND INDEX_DATA_INC_COLS.index_id = INDEX_COLUMN_DATA_INC_COLS.index_id

INNER JOIN sys.columns COLUMN_DATA_INC_COLS

ON T.object_id = COLUMN_DATA_INC_COLS.object_id

AND INDEX_COLUMN_DATA_INC_COLS.column_id = COLUMN_DATA_INC_COLS.column_id

WHERE INDEX_DATA.object_id = INDEX_DATA_INC_COLS.object_id

AND INDEX_DATA.index_id = INDEX_DATA_INC_COLS.index_id

AND INDEX_COLUMN_DATA_INC_COLS.is_included_column = 1

ORDER BY INDEX_COLUMN_DATA_INC_COLS.key_ordinal

FOR XML PATH('')), 1, 2, '') AS include_column_list,

INDEX_DATA.is_disabled, -- Check if index is disabled before determining which dupe to drop (if applicable)

'DROP INDEX ' + SCHEMA_DATA.name + '.' + TABLE_DATA.name + '.' + INDEX_DATA.name AS drop_statement

FROM sys.indexes INDEX_DATA

INNER JOIN sys.tables TABLE_DATA

ON TABLE_DATA.object_id = INDEX_DATA.object_id

INNER JOIN sys.schemas SCHEMA_DATA

ON SCHEMA_DATA.schema_id = TABLE_DATA.schema_id

WHERE TABLE_DATA.is_ms_shipped = 0

AND INDEX_DATA.type_desc IN ('NONCLUSTERED', 'CLUSTERED')

)

SELECT

*

FROM CTE_INDEX_DATA DUPE1

WHERE EXISTS

(SELECT * FROM CTE_INDEX_DATA DUPE2

WHERE DUPE1.schema_name = DUPE2.schema_name

AND DUPE1.table_name = DUPE2.table_name

AND (DUPE1.key_column_list LIKE LEFT(DUPE2.key_column_list, LEN(DUPE1.key_column_list)) OR DUPE2.key_column_list LIKE LEFT(DUPE1.key_column_list, LEN(DUPE2.key_column_list)))

AND DUPE1.index_name <> DUPE2.index_name)

Please note that you cannot drop everything returned by this report---the query returns all sets of duplicates. For each set, you must decide which to keep and which to drop. Dropping both will leave you with no index at all, so it's important to review everything returned by the query above and decide what to keep or drop before taking any further action.

Again, please use extreme caution when modifying indexes---I am always overly cautious about this sort of thing and try hard to ensure that sufficient research and care is taken before making any big changes, even in a development environment.

erb2000 SSCommitted Points: 1697 More actions · Answer 2

Ed Pollack (6/16/2014)
That's an understandable question---and I intentionally left out any automatic scripting from the article. Adding and removing indexes is delicate work and should be done carefully with quite a bit of review along the way.
<snip>
Again, please use extreme caution when modifying indexes---I am always overly cautious about this sort of thing and try hard to ensure that sufficient research and care is taken before making any big changes, even in a development environment.

Thank you. You've gone above and beyond expectations here.

alexander.oss Old Hand Points: 302 More actions · Answer 3

I've seen some duplicate-looking indexes created intentionally (perhaps) because the creator wanted to make use of an index that covered common queries. So be careful of deleting duplicates that might be present to allow for covered indexes (or similarly which might contain included fields). But here's a question: would a smaller duplicate be faster to use if the query in question was covered by none of the existing indexes?

Ed Pollack Hall of Fame Points: 3199 More actions · Answer 4

alexander.oss (6/16/2014)
I've seen some duplicate-looking indexes created intentionally (perhaps) because the creator wanted to make use of an index that covered common queries. So be careful of deleting duplicates that might be present to allow for covered indexes (or similarly which might contain included fields). But here's a question: would a smaller duplicate be faster to use if the query in question was covered by none of the existing indexes?

That's an interesting question, and one worth looking at more closely. First off, the duplicate detection here does not take into account included columns, only key columns. Two potential duplicates with different include columns will show up here in these queries---but that is OK as there may be the chance to combine them into a single covering index.

Consider also 2 indexes like these:

Index 1: Column A ASC, Column B ASC

Index 2: Column A ASC

In this case, Index 1 is duplicating the functionality of Index 2. If you run a query on only Column A, then Index 2 would technically be slightly faster as you only need to return data on that single column, and not also on Column B.

That being said, this speed difference is very, very small. The trade off in the other direction is that more indexes mean more writes when they need to be updated.

There's no automated way to handle all of this, and manual review in each of these cases is necessary to ensure that the best decisions are made. Deciding what indexes to keep or drop can be time-consuming given the trade-offs between reads & writes, but these decisions, when done carefully and correctly based on known query data, can greatly improve SQL Server performance.

DennisPost SSCrazy Points: 2691 More actions · Answer 5

Great article!

This helped me find a bunch of overlapping indexes.

Besides the is_Disabled flag, I'd also include the has_filter flag and filter_definition as well.

Most of the duplicates I found had filters.

When looking at included columns I always use a string sorting function to make the comparisons easier.

Ed Pollack Hall of Fame Points: 3199 More actions · Answer 6

DennisPost (6/17/2014)
Great article!
This helped me find a bunch of overlapping indexes.
Besides the is_Disabled flag, I'd also include the has_filter flag and filter_definition as well.
Most of the duplicates I found had filters.
When looking at included columns I always use a string sorting function to make the comparisons easier.

For anyone using filtered indexes, those would be useful additions for sure!

kris7233 Grasshopper Points: 18 More actions · Answer 7

Great article, and very helpful for me especially right now at work.

I'm wondering why you dropped

NCI_Product_Weight_DUPE and

NCI_Product_Weight_OVERLAP

Instead of dropping

NCI_Product_Weight and

NCI_Product_Weight_DUPE

Just leaving you with the index NCI_Product_Weight_OVERLAP.

You said this earlier:

"For queries only on Weight, the new index would be slightly more expensive to use, as ProductModelID also needs to be returned, but the difference is very small compared to the burden of maintaining the other indexes indefinitely."

So I thought you'd keep the index that would be used by more queries, which would be the one that indexed Weight and ProductModelID no?

Ed Pollack Hall of Fame Points: 3199 More actions · Answer 8

kris7233 (6/17/2014)
Great article, and very helpful for me especially right now at work.
I'm wondering why you dropped
NCI_Product_Weight_DUPE and
NCI_Product_Weight_OVERLAP
Instead of dropping
NCI_Product_Weight and
NCI_Product_Weight_DUPE
Just leaving you with the index NCI_Product_Weight_OVERLAP.
You said this earlier:
"For queries only on Weight, the new index would be slightly more expensive to use, as ProductModelID also needs to be returned, but the difference is very small compared to the burden of maintaining the other indexes indefinitely."
So I thought you'd keep the index that would be used by more queries, which would be the one that indexed Weight and ProductModelID no?

No worries here---I was only dropping those 2 new indexes to remove them from Adventureworks so that any future testing or research isn't somehow affected by the presence of these 2 indexes that do not ship by default with the Adventureworks database.

I always try to cleanup after myself when I mess around in a database that will be reused for other purposes, that way I don't cause headaches later when my old changes muck up any future testing

david.gugg SSCertifiable Points: 6166 More actions · Answer 9

Great scripts, thanks a lot! I've added them to my personal library, with you as the source.

Be still, and know that I am God - Psalm 46:10

kris7233 Grasshopper Points: 18 More actions · Answer 10

Ed Pollack (6/17/2014)
No worries here---I was only dropping those 2 new indexes to remove them from Adventureworks so that any future testing or research isn't somehow affected by the presence of these 2 indexes that do not ship by default with the Adventureworks database.
I always try to cleanup after myself when I mess around in a database that will be reused for other purposes, that way I don't cause headaches later when my old changes muck up any future testing

Ah okay I see now. Thanks again for the great article.

smarinova SSC Enthusiast Points: 192 More actions · Answer 11

It is very useful.

It showed statistics as duplicate indexes though.

lauri.pietarinen SSC Veteran Points: 236 More actions · Answer 12

May I suggest an alternative script for the same purpose? It takes into consideration ascending/descending but gives false positives for "mirror image" indexes, e.g.

(c1 , c2 desc, c3) can be reverse scanned as (c1 desc, c2, c3 desc).

Note also, that "subset" includes also identical indexes.

create table t1 (

c1 int,

c2 int,

c3 int,

c4 int,

c5 int);

create index t1x1 on t1(c1);

create index t1x2 on t1(c2);

create index t1x3 on t1(c3);

create index t1x4 on t1(c1,c2);

create index t1x5 on t1(c1,c2,c3);

create index t1x6 on t1(c1,c3);

create index t1x7 on t1(c1 desc, c3);

create index t1x8 on t1(c1 desc, c3) include (c5);

create index t1x9 on t1(c3 , c5) include (c4);

create index t1x10 on t1(c1, c3 desc);

create index t1x11 on t1(c1, c2, c4, c5);

select ix2.name, 'is a subset of', ix1.name

from sys.tables tab,

sys.indexes ix1,

sys.indexes ix2

where tab.name = 't1' and

ix1.object_id = tab.object_id and

ix2.object_id = ix1.object_id and

ix1.index_id <> ix2.index_id and

ix1.type_desc <> 'HEAP' and

ix2.type_desc <> 'HEAP' and

( select count(*) -- count of common key columns in index starting from left

from sys.index_columns ixc1,

sys.index_columns ixc2

where ixc1.object_id = ix1.object_id and

ixc1.index_id = ix1.index_id and

ixc2.object_id = ix2.object_id and

ixc2.index_id = ix2.index_id and

ixc1.key_ordinal = ixc2.key_ordinal and

ixc1.column_id = ixc2.column_id and

ixc1.is_descending_key = ixc2.is_descending_key and

ixc1.is_included_column = 0 and

ixc2.is_included_column = 0

group by ixc1.object_id

having max(ixc1.key_ordinal) = count(*) -- no gaps

)

=

( select count(*) -- count of key columns in ix2

from sys.index_columns ixc2

where ixc2.object_id = ix2.object_id and

ixc2.index_id = ix2.index_id and

ixc2.is_included_column = 0 )

order by ix2.name , ix1.name

;

Ed Pollack Hall of Fame Points: 3199 More actions · Answer 13

lauri.pietarinen (6/26/2014)
May I suggest an alternative script for the same purpose? It takes into consideration ascending/descending but gives false positives for "mirror image" indexes, e.g.
(c1 , c2 desc, c3) can be reverse scanned as (c1 desc, c2, c3 desc).
Note also, that "subset" includes also identical indexes.
create table t1 (
c1 int,
c2 int,
c3 int,
c4 int,
c5 int);
create index t1x1 on t1(c1);
create index t1x2 on t1(c2);
create index t1x3 on t1(c3);
create index t1x4 on t1(c1,c2);
create index t1x5 on t1(c1,c2,c3);
create index t1x6 on t1(c1,c3);
create index t1x7 on t1(c1 desc, c3);
create index t1x8 on t1(c1 desc, c3) include (c5);
create index t1x9 on t1(c3 , c5) include (c4);
create index t1x10 on t1(c1, c3 desc);
create index t1x11 on t1(c1, c2, c4, c5);
select ix2.name, 'is a subset of', ix1.name
from sys.tables tab,
sys.indexes ix1,
sys.indexes ix2
where tab.name = 't1' and
ix1.object_id = tab.object_id and
ix2.object_id = ix1.object_id and
ix1.index_id <> ix2.index_id and
ix1.type_desc <> 'HEAP' and
ix2.type_desc <> 'HEAP' and
( select count(*) -- count of common key columns in index starting from left
from sys.index_columns ixc1,
sys.index_columns ixc2
where ixc1.object_id = ix1.object_id and
ixc1.index_id = ix1.index_id and
ixc2.object_id = ix2.object_id and
ixc2.index_id = ix2.index_id and
ixc1.key_ordinal = ixc2.key_ordinal and
ixc1.column_id = ixc2.column_id and
ixc1.is_descending_key = ixc2.is_descending_key and
ixc1.is_included_column = 0 and
ixc2.is_included_column = 0
group by ixc1.object_id
having max(ixc1.key_ordinal) = count(*) -- no gaps
)
=
( select count(*) -- count of key columns in ix2
from sys.index_columns ixc2
where ixc2.object_id = ix2.object_id and
ixc2.index_id = ix2.index_id and
ixc2.is_included_column = 0 )
order by ix2.name , ix1.name
;

That's a nice approach for the specific problem at hand. I actually submitted a revision (already published) to my original article to add some further functionality to the index searching. The intention of my script was to give the basics of finding overlapping/duplicate indexes and let the reader take it from there.

Currently, the script in the article will return all of your indexes as part of the duplicate-checking. It will not tell us what is a subset of what---that is left to our diligent research.

There are a ton of specific cases where more info is needed. For example, what if there were indexes with different fill factors, paddings, or were set up as hypothetical indexes? We may not want to automate every aspect of this search, but would definitely want to add those elements into the query if they matter to our specific database environment.

I like your query, though, and it's an excellent use of the system views to return a very user-friendly version of the data we are looking for! Thanks for reading, and for the helpful addition!

lauri.pietarinen SSC Veteran Points: 236 More actions · Answer 14

Thank you for your kind words!

Indeed, there can be any number of reasons for overlapping indexes and it certainly would be irresponsible to just go and drop them without further investigation. I would suggest that in practice it is harder to get rid of an index than creating new ones, even in obvious cases!

julian.fletcher SSCrazy Points: 2984 More actions · Answer 15

I've just run this on one of our databases and it returned 64 rows, which was surprising as our existing "duplicate index" checking script returned 0.

The reason is that our script doesn't match indexes if they have different filter expressions. For example, it doesn't match

CREATE UNIQUE NONCLUSTERED INDEX [IX_T_AccountManualAdjustment_UniqueConstraint_701001] ON [dbo].[T_AccountManualAdjustment]

(

[ClientID] ASC,

[ServiceCategoryID] ASC,

[UniqueIfInterestCharge] ASC

)

WHERE ([Deleted]=(0))

WITH (...)

with

CREATE NONCLUSTERED INDEX [IX_T_AccountManualAdjustment_ClientID] ON [dbo].[T_AccountManualAdjustment]

(

[ClientID] ASC

)

WITH (...)

Our reasoning is that, in this example, we couldn't replace IX_T_AccountManualAdjustment_ClientID with IX_T_AccountManualAdjustment_UniqueConstraint_701001.

Any thoughts? Thanks.

UPDATE. Sorry - I missed that this had already been raised in an earlier post.

Finding and Eliminating Duplicate or Overlapping Indexes

Cookies on SQLServerCentral