Finding and Eliminating Duplicate or Overlapping Indexes

  • Ed,

    This is a terrific article, and should go into the "Troubleshooting Hall of Fame", if there ever is such a place. You clearly put a lot of work into this, and provided thoughtful insight along the way. Thank you for packing so much information into this write-up. I have already shared it with my team.

    We support a lot of vendor-provided databases in health-care, and in the rush to market, database optimization is one of the victims of a shortened development cycle. "If the query works, then we can sell it" seems to be the way of things. So, in addition to missing indexes, and indexes that are not needed, we get duplicate indexes. Your article distills all that is wrong with duplicate indexes into a quick study.

    Thanks

    Jeff Bennett

    St. Louis, MO

  • Thank you for the script.

    A question I ran the query on one of my databases.

    I found one of the table named secPermission has duplicated indexes.

    Both on PermissionID, and one of the indexes is of course Primary Key - clustered index

    the other one is a non-unique, non-clustered on the permissionID.

    I don't know why the developer did that, for me it is apparently wrong.

    Is there a possibility that is correct? Thanks,

    CREATE TABLE [dbo].[secPermission](

    [PermissionID] [INT] IDENTITY(1,1) NOT NULL,

    [PermissionNm] [VARCHAR](40) NOT NULL,

    [PermissionDesc] [VARCHAR](120) NOT NULL,

    [CreateDt] [DATETIME2](7) NOT NULL,

    [CreatedBy] [VARCHAR](128) NOT NULL,

    [ChangeDt] [DATETIME2](7) NULL,

    [ChangedBy] [VARCHAR](128) NULL,

    [DeactivationDt] [DATETIME2](7) NULL,

    CONSTRAINT [PK_secPermission] PRIMARY KEY CLUSTERED

    (

    [PermissionID] ASC

    )

  • This is a common mistake or oversight. Occasionally, there may be a potential use case for creating an additional index on the primary key column(s) with a select few include columns (maybe filter it), but an index identical to the clustered PK is unnecessary, and on top of that, SQL Server still pays the maintenance costs for that extra non-clustered index.

    If they are 100% identical, drop the non-clustered index : )

    -Ed

    sqlfriends (7/8/2016)


    Thank you for the script.

    A question I ran the query on one of my databases.

    I found one of the table named secPermission has duplicated indexes.

    Both on PermissionID, and one of the indexes is of course Primary Key - clustered index

    the other one is a non-unique, non-clustered on the permissionID.

    I don't know why the developer did that, for me it is apparently wrong.

    Is there a possibility that is correct? Thanks,

    CREATE TABLE [dbo].[secPermission](

    [PermissionID] [INT] IDENTITY(1,1) NOT NULL,

    [PermissionNm] [VARCHAR](40) NOT NULL,

    [PermissionDesc] [VARCHAR](120) NOT NULL,

    [CreateDt] [DATETIME2](7) NOT NULL,

    [CreatedBy] [VARCHAR](128) NOT NULL,

    [ChangeDt] [DATETIME2](7) NULL,

    [ChangedBy] [VARCHAR](128) NULL,

    [DeactivationDt] [DATETIME2](7) NULL,

    CONSTRAINT [PK_secPermission] PRIMARY KEY CLUSTERED

    (

    [PermissionID] ASC

    )

  • Thanks! I will drop it.

  • Dropping the index removes the possibility of quickly pulling it back out of your hat, if you do end up needing it. Of course, these are duplicate indexes, so there would not be a need unless it was specifically mentioned in a forced plan. Disabling, and then dropping after a period of time is a safer approach.

    Never disable a clustered index..

    Thanks

    Jeff Bennett

  • Important stuff in your article. Probably 75+% of the new clients I see in my consulting business have duplicate or overlapping indexes - most often created via DTA use.

    I think you should add some huge caution statements to the article about the following:

    1) Search ALL source code for index names before dropping or you can break code.

    2) Dropping unique/pk/microsoft-provided indexes could break things.

    3) A nonclustered index could be a dupe of the clustered index on purpose for execution efficiency reasons. Usage check could reveal the importance of this.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • This response is directed at Kevin

    I am curious how an non-clustered index that matches the clustered index could be more efficient. Can you elaborate on that,please?

    Thanks

    Jeff Bennett

    Saint Louis, MO

  • chudman (7/10/2016)


    This response is directed at Kevin

    I am curious how an non-clustered index that matches the clustered index could be more efficient. Can you elaborate on that,please?

    Thanks

    Jeff Bennett

    Saint Louis, MO

    Sure thing. Say you have a plan that does an index seek for just the one/few columns of the clustered index. Well, the leaf-level of the clustered index is the data - ALL of it (in-row anyway, plus pointers to off-row LOB/overflow data). So touching each row requires accessing potentially a LOT of data compared to a few-column NC index, right? Total page reads (and thus 8K data potentially lifted off disk, using RAM, sent through CPUs, locked/latched/etc) could easily be an order of magnitude or more greater!

    Also, I am not exactly sure of the very low-level physical details but recall that the clustered index key(s) is carried on all NC indexes as a pointer back to every row. So if you create a NC index that is exactly the clustered index columns in the same order it is quite possible that physically there is not even a need to store the columns twice. Hmm, I really should investigate that. Maybe one of the bloggers that reads this will do some testing and post about it. 🙂

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Thanks, Kevin,

    I am not trying to be snarky or nit-picking. I am really curious. I have come across this a couple of times recently in third-party databases, and was perplexed as to whether the additional, identical, non-clustered index was just adding overhead and not contributing to efficiencies.

    Again, thanks

    Jeff Bennett

    Saint Louis, MO

  • chudman (7/10/2016)


    Thanks, Kevin,

    I am not trying to be snarky or nit-picking. I am really curious. I have come across this a couple of times recently in third-party databases, and was perplexed as to whether the additional, identical, non-clustered index was just adding overhead and not contributing to efficiencies.

    Again, thanks

    Jeff Bennett

    Saint Louis, MO

    I took your request to be genuine, which is why I gave you a detailed, straight-forward answer. 😎

    Like I said, take a look at index usage stats after the app has been running for a long time and see if that dupe NC index is getting hit a lot. If it isn't it could well be there inappropriately/unnecessarily.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Hi,

    I have a situation where the above script to identify the duplicate indexes does not capture this duplication:

    CREATE UNIQUE CLUSTERED INDEX [IND_table] ON [dbo].

    (

    [ID] ASC,

    [X] ASC,

    [Y] ASC,

    [Z] ASC

    )

    ALTER TABLE [dbo].

    ADD CONSTRAINT [table1] UNIQUE NONCLUSTERED

    (

    [X] ASC,

    [Y] ASC,

    [Z] ASC

    )

    Could you please suggest why this is not been considered as duplicate through the above script?

    Thanks in advance.

    Deepak.

  • Hi Deepak,

    Those indexes are not considered overlapping/duplicate indexes as they do not share sorting functionality. For the sorting columns, order matters. The first index begins with ID, whereas the second index begins with X. For the sake of sorting & searching, they do not overlap.

    That being said, the uniqueness check is not necessary on both. If the second unique index is valid, and all combinations of X, Y, and Z are unique, then we know with certainty that all combinations of ID, X, Y, and Z are also unique. You could remove the uniqueness check on ID, X, Y, Z safely, knowing that the uniqueness enforced by X, Y, Z will ensure both scenarios are covered. The scripts we worked with here did not delve into uniqueness duplication, but adding onto the script a section that validates the overlap of unique indexes/constraints wouldn't be too tough : )

    -Ed

    Deepak Sahu (7/11/2016)


    Hi,

    I have a situation where the above script to identify the duplicate indexes does not capture this duplication:

    CREATE UNIQUE CLUSTERED INDEX [IND_table] ON [dbo].

    (

    [ID] ASC,

    [X] ASC,

    [Y] ASC,

    [Z] ASC

    )

    ALTER TABLE [dbo].

    ADD CONSTRAINT [table1] UNIQUE NONCLUSTERED

    (

    [X] ASC,

    [Y] ASC,

    [Z] ASC

    )

    Could you please suggest why this is not been considered as duplicate through the above script?

    Thanks in advance.

    Deepak.

  • Ed Pollack (7/11/2016)


    Hi Deepak,

    Those indexes are not considered overlapping/duplicate indexes as they do not share sorting functionality. For the sorting columns, order matters. The first index begins with ID, whereas the second index begins with X. For the sake of sorting & searching, they do not overlap.

    That being said, the uniqueness check is not necessary on both. If the second unique index is valid, and all combinations of X, Y, and Z are unique, then we know with certainty that all combinations of ID, X, Y, and Z are also unique. You could remove the uniqueness check on ID, X, Y, Z safely, knowing that the uniqueness enforced by X, Y, Z will ensure both scenarios are covered. The scripts we worked with here did not delve into uniqueness duplication, but adding onto the script a section that validates the overlap of unique indexes/constraints wouldn't be too tough : )

    -Ed

    Deepak Sahu (7/11/2016)


    Hi,

    I have a situation where the above script to identify the duplicate indexes does not capture this duplication:

    CREATE UNIQUE CLUSTERED INDEX [IND_table] ON [dbo].

    (

    [ID] ASC,

    [X] ASC,

    [Y] ASC,

    [Z] ASC

    )

    ALTER TABLE [dbo].

    ADD CONSTRAINT [table1] UNIQUE NONCLUSTERED

    (

    [X] ASC,

    [Y] ASC,

    [Z] ASC

    )

    Could you please suggest why this is not been considered as duplicate through the above script?

    Thanks in advance.

    Deepak.

    Also I would bet that ID is the ubiquitous identity(1,1) that almost everyone puts on their tables (even when they shouldn't), and if so it is already virtually certain to be unique. I have seen VERY BAD DATA FLAWS when client's added that to some other column(s) they needed to be unique - thus allowing DUPLICATE VALUES in their data.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Thanks Ed 🙂

  • Wow, this is really useful. I would finish this off with

    ORDER BY DUPE1.schema_name, DUPE1.table_name, DUPE1.key_column_list

    which shows how many hits we're getting :crazy:

    I've got a bunch with the same or overlapping columns in the index but with different columns included. I'm not sure if this is worth it. We're a DW so originally decided to err on the side of having the extra index if it was a close call; that we would endure the extra space requirement and processing time at night in order to give our users (slightly) faster queries during the day.

    Ken

Viewing 15 posts - 31 through 45 (of 45 total)

You must be logged in to reply to this topic. Login to reply