Finding and Eliminating Duplicate or Overlapping Indexes

  • That's an understandable question---and I intentionally left out any automatic scripting from the article. Adding and removing indexes is delicate work and should be done carefully with quite a bit of review along the way.

    Going over 300 indexes manually definitely sounds painful, but I consider it worth the day or two of effort to ensure that no mistakes are made along the way.

    If you'd like to run the final duplicate detection script and have it auto-script a DROP INDEX statement with each row returned, you can do it like this:

    ;WITH CTE_INDEX_DATA AS (

    SELECT

    SCHEMA_DATA.name AS schema_name,

    TABLE_DATA.name AS table_name,

    INDEX_DATA.name AS index_name,

    STUFF((SELECT ', ' + COLUMN_DATA_KEY_COLS.name + ' ' + CASE WHEN INDEX_COLUMN_DATA_KEY_COLS.is_descending_key = 1 THEN 'DESC' ELSE 'ASC' END -- Include column order (ASC / DESC)

    FROM sys.tables AS T

    INNER JOIN sys.indexes INDEX_DATA_KEY_COLS

    ON T.object_id = INDEX_DATA_KEY_COLS.object_id

    INNER JOIN sys.index_columns INDEX_COLUMN_DATA_KEY_COLS

    ON INDEX_DATA_KEY_COLS.object_id = INDEX_COLUMN_DATA_KEY_COLS.object_id

    AND INDEX_DATA_KEY_COLS.index_id = INDEX_COLUMN_DATA_KEY_COLS.index_id

    INNER JOIN sys.columns COLUMN_DATA_KEY_COLS

    ON T.object_id = COLUMN_DATA_KEY_COLS.object_id

    AND INDEX_COLUMN_DATA_KEY_COLS.column_id = COLUMN_DATA_KEY_COLS.column_id

    WHERE INDEX_DATA.object_id = INDEX_DATA_KEY_COLS.object_id

    AND INDEX_DATA.index_id = INDEX_DATA_KEY_COLS.index_id

    AND INDEX_COLUMN_DATA_KEY_COLS.is_included_column = 0

    ORDER BY INDEX_COLUMN_DATA_KEY_COLS.key_ordinal

    FOR XML PATH('')), 1, 2, '') AS key_column_list ,

    STUFF(( SELECT ', ' + COLUMN_DATA_INC_COLS.name

    FROM sys.tables AS T

    INNER JOIN sys.indexes INDEX_DATA_INC_COLS

    ON T.object_id = INDEX_DATA_INC_COLS.object_id

    INNER JOIN sys.index_columns INDEX_COLUMN_DATA_INC_COLS

    ON INDEX_DATA_INC_COLS.object_id = INDEX_COLUMN_DATA_INC_COLS.object_id

    AND INDEX_DATA_INC_COLS.index_id = INDEX_COLUMN_DATA_INC_COLS.index_id

    INNER JOIN sys.columns COLUMN_DATA_INC_COLS

    ON T.object_id = COLUMN_DATA_INC_COLS.object_id

    AND INDEX_COLUMN_DATA_INC_COLS.column_id = COLUMN_DATA_INC_COLS.column_id

    WHERE INDEX_DATA.object_id = INDEX_DATA_INC_COLS.object_id

    AND INDEX_DATA.index_id = INDEX_DATA_INC_COLS.index_id

    AND INDEX_COLUMN_DATA_INC_COLS.is_included_column = 1

    ORDER BY INDEX_COLUMN_DATA_INC_COLS.key_ordinal

    FOR XML PATH('')), 1, 2, '') AS include_column_list,

    INDEX_DATA.is_disabled, -- Check if index is disabled before determining which dupe to drop (if applicable)

    'DROP INDEX ' + SCHEMA_DATA.name + '.' + TABLE_DATA.name + '.' + INDEX_DATA.name AS drop_statement

    FROM sys.indexes INDEX_DATA

    INNER JOIN sys.tables TABLE_DATA

    ON TABLE_DATA.object_id = INDEX_DATA.object_id

    INNER JOIN sys.schemas SCHEMA_DATA

    ON SCHEMA_DATA.schema_id = TABLE_DATA.schema_id

    WHERE TABLE_DATA.is_ms_shipped = 0

    AND INDEX_DATA.type_desc IN ('NONCLUSTERED', 'CLUSTERED')

    )

    SELECT

    *

    FROM CTE_INDEX_DATA DUPE1

    WHERE EXISTS

    (SELECT * FROM CTE_INDEX_DATA DUPE2

    WHERE DUPE1.schema_name = DUPE2.schema_name

    AND DUPE1.table_name = DUPE2.table_name

    AND (DUPE1.key_column_list LIKE LEFT(DUPE2.key_column_list, LEN(DUPE1.key_column_list)) OR DUPE2.key_column_list LIKE LEFT(DUPE1.key_column_list, LEN(DUPE2.key_column_list)))

    AND DUPE1.index_name <> DUPE2.index_name)

    Please note that you cannot drop everything returned by this report---the query returns all sets of duplicates. For each set, you must decide which to keep and which to drop. Dropping both will leave you with no index at all, so it's important to review everything returned by the query above and decide what to keep or drop before taking any further action.

    Again, please use extreme caution when modifying indexes---I am always overly cautious about this sort of thing and try hard to ensure that sufficient research and care is taken before making any big changes, even in a development environment.

  • Ed Pollack (6/16/2014)


    That's an understandable question---and I intentionally left out any automatic scripting from the article. Adding and removing indexes is delicate work and should be done carefully with quite a bit of review along the way.

    <snip>

    Again, please use extreme caution when modifying indexes---I am always overly cautious about this sort of thing and try hard to ensure that sufficient research and care is taken before making any big changes, even in a development environment.

    Thank you. You've gone above and beyond expectations here.

  • I've seen some duplicate-looking indexes created intentionally (perhaps) because the creator wanted to make use of an index that covered common queries. So be careful of deleting duplicates that might be present to allow for covered indexes (or similarly which might contain included fields). But here's a question: would a smaller duplicate be faster to use if the query in question was covered by none of the existing indexes?

  • alexander.oss (6/16/2014)


    I've seen some duplicate-looking indexes created intentionally (perhaps) because the creator wanted to make use of an index that covered common queries. So be careful of deleting duplicates that might be present to allow for covered indexes (or similarly which might contain included fields). But here's a question: would a smaller duplicate be faster to use if the query in question was covered by none of the existing indexes?

    That's an interesting question, and one worth looking at more closely. First off, the duplicate detection here does not take into account included columns, only key columns. Two potential duplicates with different include columns will show up here in these queries---but that is OK as there may be the chance to combine them into a single covering index.

    Consider also 2 indexes like these:

    Index 1: Column A ASC, Column B ASC

    Index 2: Column A ASC

    In this case, Index 1 is duplicating the functionality of Index 2. If you run a query on only Column A, then Index 2 would technically be slightly faster as you only need to return data on that single column, and not also on Column B.

    That being said, this speed difference is very, very small. The trade off in the other direction is that more indexes mean more writes when they need to be updated.

    There's no automated way to handle all of this, and manual review in each of these cases is necessary to ensure that the best decisions are made. Deciding what indexes to keep or drop can be time-consuming given the trade-offs between reads & writes, but these decisions, when done carefully and correctly based on known query data, can greatly improve SQL Server performance.

  • Great article!

    This helped me find a bunch of overlapping indexes.

    Besides the is_Disabled flag, I'd also include the has_filter flag and filter_definition as well.

    Most of the duplicates I found had filters.

    When looking at included columns I always use a string sorting function to make the comparisons easier.

  • DennisPost (6/17/2014)


    Great article!

    This helped me find a bunch of overlapping indexes.

    Besides the is_Disabled flag, I'd also include the has_filter flag and filter_definition as well.

    Most of the duplicates I found had filters.

    When looking at included columns I always use a string sorting function to make the comparisons easier.

    For anyone using filtered indexes, those would be useful additions for sure!

  • Great article, and very helpful for me especially right now at work.

    I'm wondering why you dropped

    NCI_Product_Weight_DUPE and

    NCI_Product_Weight_OVERLAP

    Instead of dropping

    NCI_Product_Weight and

    NCI_Product_Weight_DUPE

    Just leaving you with the index NCI_Product_Weight_OVERLAP.

    You said this earlier:

    "For queries only on Weight, the new index would be slightly more expensive to use, as ProductModelID also needs to be returned, but the difference is very small compared to the burden of maintaining the other indexes indefinitely."

    So I thought you'd keep the index that would be used by more queries, which would be the one that indexed Weight and ProductModelID no?

  • kris7233 (6/17/2014)


    Great article, and very helpful for me especially right now at work.

    I'm wondering why you dropped

    NCI_Product_Weight_DUPE and

    NCI_Product_Weight_OVERLAP

    Instead of dropping

    NCI_Product_Weight and

    NCI_Product_Weight_DUPE

    Just leaving you with the index NCI_Product_Weight_OVERLAP.

    You said this earlier:

    "For queries only on Weight, the new index would be slightly more expensive to use, as ProductModelID also needs to be returned, but the difference is very small compared to the burden of maintaining the other indexes indefinitely."

    So I thought you'd keep the index that would be used by more queries, which would be the one that indexed Weight and ProductModelID no?

    No worries here---I was only dropping those 2 new indexes to remove them from Adventureworks so that any future testing or research isn't somehow affected by the presence of these 2 indexes that do not ship by default with the Adventureworks database.

    I always try to cleanup after myself when I mess around in a database that will be reused for other purposes, that way I don't cause headaches later when my old changes muck up any future testing 🙂

  • Great scripts, thanks a lot! I've added them to my personal library, with you as the source.


    [font="Tahoma"]Personal blog relating fishing to database administration:[/font]

    [font="Comic Sans MS"]https://davegugg.wordpress.com[/url]/[/font]

  • Ed Pollack (6/17/2014)


    No worries here---I was only dropping those 2 new indexes to remove them from Adventureworks so that any future testing or research isn't somehow affected by the presence of these 2 indexes that do not ship by default with the Adventureworks database.

    I always try to cleanup after myself when I mess around in a database that will be reused for other purposes, that way I don't cause headaches later when my old changes muck up any future testing 🙂

    Ah okay I see now. Thanks again for the great article.

  • It is very useful.

    It showed statistics as duplicate indexes though.

  • May I suggest an alternative script for the same purpose? It takes into consideration ascending/descending but gives false positives for "mirror image" indexes, e.g.

    (c1 , c2 desc, c3) can be reverse scanned as (c1 desc, c2, c3 desc).

    Note also, that "subset" includes also identical indexes.

    create table t1 (

    c1 int,

    c2 int,

    c3 int,

    c4 int,

    c5 int);

    create index t1x1 on t1(c1);

    create index t1x2 on t1(c2);

    create index t1x3 on t1(c3);

    create index t1x4 on t1(c1,c2);

    create index t1x5 on t1(c1,c2,c3);

    create index t1x6 on t1(c1,c3);

    create index t1x7 on t1(c1 desc, c3);

    create index t1x8 on t1(c1 desc, c3) include (c5);

    create index t1x9 on t1(c3 , c5) include (c4);

    create index t1x10 on t1(c1, c3 desc);

    create index t1x11 on t1(c1, c2, c4, c5);

    select ix2.name, 'is a subset of', ix1.name

    from sys.tables tab,

    sys.indexes ix1,

    sys.indexes ix2

    where tab.name = 't1' and

    ix1.object_id = tab.object_id and

    ix2.object_id = ix1.object_id and

    ix1.index_id <> ix2.index_id and

    ix1.type_desc <> 'HEAP' and

    ix2.type_desc <> 'HEAP' and

    ( select count(*) -- count of common key columns in index starting from left

    from sys.index_columns ixc1,

    sys.index_columns ixc2

    where ixc1.object_id = ix1.object_id and

    ixc1.index_id = ix1.index_id and

    ixc2.object_id = ix2.object_id and

    ixc2.index_id = ix2.index_id and

    ixc1.key_ordinal = ixc2.key_ordinal and

    ixc1.column_id = ixc2.column_id and

    ixc1.is_descending_key = ixc2.is_descending_key and

    ixc1.is_included_column = 0 and

    ixc2.is_included_column = 0

    group by ixc1.object_id

    having max(ixc1.key_ordinal) = count(*) -- no gaps

    )

    =

    ( select count(*) -- count of key columns in ix2

    from sys.index_columns ixc2

    where ixc2.object_id = ix2.object_id and

    ixc2.index_id = ix2.index_id and

    ixc2.is_included_column = 0 )

    order by ix2.name , ix1.name

    ;

  • lauri.pietarinen (6/26/2014)


    May I suggest an alternative script for the same purpose? It takes into consideration ascending/descending but gives false positives for "mirror image" indexes, e.g.

    (c1 , c2 desc, c3) can be reverse scanned as (c1 desc, c2, c3 desc).

    Note also, that "subset" includes also identical indexes.

    create table t1 (

    c1 int,

    c2 int,

    c3 int,

    c4 int,

    c5 int);

    create index t1x1 on t1(c1);

    create index t1x2 on t1(c2);

    create index t1x3 on t1(c3);

    create index t1x4 on t1(c1,c2);

    create index t1x5 on t1(c1,c2,c3);

    create index t1x6 on t1(c1,c3);

    create index t1x7 on t1(c1 desc, c3);

    create index t1x8 on t1(c1 desc, c3) include (c5);

    create index t1x9 on t1(c3 , c5) include (c4);

    create index t1x10 on t1(c1, c3 desc);

    create index t1x11 on t1(c1, c2, c4, c5);

    select ix2.name, 'is a subset of', ix1.name

    from sys.tables tab,

    sys.indexes ix1,

    sys.indexes ix2

    where tab.name = 't1' and

    ix1.object_id = tab.object_id and

    ix2.object_id = ix1.object_id and

    ix1.index_id <> ix2.index_id and

    ix1.type_desc <> 'HEAP' and

    ix2.type_desc <> 'HEAP' and

    ( select count(*) -- count of common key columns in index starting from left

    from sys.index_columns ixc1,

    sys.index_columns ixc2

    where ixc1.object_id = ix1.object_id and

    ixc1.index_id = ix1.index_id and

    ixc2.object_id = ix2.object_id and

    ixc2.index_id = ix2.index_id and

    ixc1.key_ordinal = ixc2.key_ordinal and

    ixc1.column_id = ixc2.column_id and

    ixc1.is_descending_key = ixc2.is_descending_key and

    ixc1.is_included_column = 0 and

    ixc2.is_included_column = 0

    group by ixc1.object_id

    having max(ixc1.key_ordinal) = count(*) -- no gaps

    )

    =

    ( select count(*) -- count of key columns in ix2

    from sys.index_columns ixc2

    where ixc2.object_id = ix2.object_id and

    ixc2.index_id = ix2.index_id and

    ixc2.is_included_column = 0 )

    order by ix2.name , ix1.name

    ;

    That's a nice approach for the specific problem at hand. I actually submitted a revision (already published) to my original article to add some further functionality to the index searching. The intention of my script was to give the basics of finding overlapping/duplicate indexes and let the reader take it from there.

    Currently, the script in the article will return all of your indexes as part of the duplicate-checking. It will not tell us what is a subset of what---that is left to our diligent research.

    There are a ton of specific cases where more info is needed. For example, what if there were indexes with different fill factors, paddings, or were set up as hypothetical indexes? We may not want to automate every aspect of this search, but would definitely want to add those elements into the query if they matter to our specific database environment.

    I like your query, though, and it's an excellent use of the system views to return a very user-friendly version of the data we are looking for! Thanks for reading, and for the helpful addition!

  • Thank you for your kind words!

    Indeed, there can be any number of reasons for overlapping indexes and it certainly would be irresponsible to just go and drop them without further investigation. I would suggest that in practice it is harder to get rid of an index than creating new ones, even in obvious cases!

  • I've just run this on one of our databases and it returned 64 rows, which was surprising as our existing "duplicate index" checking script returned 0.

    The reason is that our script doesn't match indexes if they have different filter expressions. For example, it doesn't match

    CREATE UNIQUE NONCLUSTERED INDEX [IX_T_AccountManualAdjustment_UniqueConstraint_701001] ON [dbo].[T_AccountManualAdjustment]

    (

    [ClientID] ASC,

    [ServiceCategoryID] ASC,

    [UniqueIfInterestCharge] ASC

    )

    WHERE ([Deleted]=(0))

    WITH (...)

    with

    CREATE NONCLUSTERED INDEX [IX_T_AccountManualAdjustment_ClientID] ON [dbo].[T_AccountManualAdjustment]

    (

    [ClientID] ASC

    )

    WITH (...)

    Our reasoning is that, in this example, we couldn't replace IX_T_AccountManualAdjustment_ClientID with IX_T_AccountManualAdjustment_UniqueConstraint_701001.

    Any thoughts? Thanks.

    UPDATE. Sorry - I missed that this had already been raised in an earlier post.

Viewing 15 posts - 16 through 30 (of 45 total)

You must be logged in to reply to this topic. Login to reply