Your thoughts on dropping redundant indexes...

  • A coworker and I are looking at cleaning up the indexes on a few of our production databases. An obvious 1st place to start is to find and remove duplicate indexes. So, no issues there.

    What I want to get is the community’s thoughts regarding “redundant” indexes…

    For example:

    Index 1: (Col_1, Col_2) INCLUDE (Col_3, Col_4, Col_5)

    Index 2: (Col_1) INCLUDE (Col_2, Col_5)

    Clearly, Index 2 is redundant as it’s fully covered by Index 1… But… depending on the size of certain columns, there is the potential that Index 1 could be MUCH larger than Index 2 (i.e. it has far more pages).
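    In DDL terms (the table name and index names here are made up purely for illustration), the two indexes would look something like this:

    CREATE NONCLUSTERED INDEX IX_Index_1               -- hypothetical names
        ON dbo.SomeTable (Col_1, Col_2)
        INCLUDE (Col_3, Col_4, Col_5);

    CREATE NONCLUSTERED INDEX IX_Index_2
        ON dbo.SomeTable (Col_1)
        INCLUDE (Col_2, Col_5);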

    So, my questions are:

    1) Do I want to drop Index 2, based solely on the fact that it’s redundant?

    2) Do I need to set thresholds in terms of size differences?

    a. If they are close in size, drop Index 2…

    b. If Index 1 is N times larger than Index 2, keep Index 2 for performance reasons…

    c. If so, what’s the threshold?

    3) Do I need to do a deeper dive into the index usage?

    a. Say I’ve got a proc that executes 20K times a day and it uses Index 2 due to its smaller size; do I want to risk impacting the performance of that proc by forcing it to use the larger Index 1?

    4) What other considerations need to be taken into account?

    5) Aside from using the before & after picture of sys.dm_exec_procedure_stats, what else do I need to do to check for degraded proc performance once an index has been dropped?
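    For question 5, this is roughly the kind of before & after comparison I have in mind (just a sketch; the snapshot table name is made up, and the counters reset whenever a plan is recompiled or evicted):

    -- Snapshot proc stats before dropping the index...
    SELECT
        OBJECT_NAME(ps.object_id, ps.database_id) AS proc_name,
        ps.execution_count,
        ps.total_logical_reads,
        ps.total_worker_time,
        ps.total_elapsed_time,
        SYSDATETIME() AS captured_at
    INTO dbo.ProcStats_Baseline                 -- hypothetical snapshot table
    FROM sys.dm_exec_procedure_stats AS ps
    WHERE ps.database_id = DB_ID();

    -- ...then, after the drop and a representative workload, compare per-execution averages
    SELECT
        OBJECT_NAME(ps.object_id, ps.database_id) AS proc_name,
        (ps.total_logical_reads * 1.0 / NULLIF(ps.execution_count, 0))
            - (b.total_logical_reads * 1.0 / NULLIF(b.execution_count, 0)) AS avg_reads_delta,
        (ps.total_elapsed_time * 1.0 / NULLIF(ps.execution_count, 0))
            - (b.total_elapsed_time * 1.0 / NULLIF(b.execution_count, 0)) AS avg_elapsed_delta_us
    FROM sys.dm_exec_procedure_stats AS ps
    JOIN dbo.ProcStats_Baseline AS b
        ON b.proc_name = OBJECT_NAME(ps.object_id, ps.database_id)
    WHERE ps.database_id = DB_ID()
    ORDER BY avg_elapsed_delta_us DESC;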

    Example 2:

    Index 1: (Col_1, Col_2) INCLUDE (Col_3, Col_4, Col_5)

    Index 2: (Col_1, Col_2) INCLUDE (Col_4, Col_6)

    In this scenario, either index can be made redundant, simply by adding Col_6 to Index 1 OR adding Col_3 & Col_5 to Index 2 (for the sake of clarity, let’s call these “delta columns”)…
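    Mechanically, I’m picturing the coalescing as a rebuild of one index with the delta columns folded in, after which the other becomes a true duplicate and can be dropped. Something along these lines (index and table names are made up):

    -- Fold Index 2's extra included column (Col_6) into Index 1...
    CREATE NONCLUSTERED INDEX IX_Index_1
        ON dbo.SomeTable (Col_1, Col_2)
        INCLUDE (Col_3, Col_4, Col_5, Col_6)
        WITH (DROP_EXISTING = ON, ONLINE = ON);  -- ONLINE requires Enterprise edition

    -- ...then Index 2 is fully covered and can go.
    DROP INDEX IX_Index_2 ON dbo.SomeTable;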

    Questions:

    1) Do I want to do this?

    a. If the delta columns are small and won’t substantially impact the size of the updated index? (I’m thinking YES)

    b. If the delta columns are large and would have a significant impact on the size of the updated index? (I’m thinking probably not)

    c. Assuming that A & B are valid assumptions, what’s a reasonable “cut-off” point?

    Your input would be appreciated as would any links or articles on the topic. Google pulls up plenty of info regarding duplicate indexes but surprisingly little regarding redundant indexes.
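    For anyone reading along later, this is roughly what I mean by a “redundant” index check: one index whose key columns are a leading-edge prefix of another index’s keys on the same table. This is only a sketch; it ignores uniqueness, filters, and included columns, all of which still need a manual comparison before anything gets dropped:

    ;WITH IndexKeys AS
    (
        SELECT
            i.object_id,
            i.index_id,
            i.name AS index_name,
            KeyCols = STUFF((
                SELECT ',' + c.name
                FROM sys.index_columns AS ic
                JOIN sys.columns AS c
                    ON c.object_id = ic.object_id
                   AND c.column_id = ic.column_id
                WHERE ic.object_id = i.object_id
                  AND ic.index_id = i.index_id
                  AND ic.is_included_column = 0
                ORDER BY ic.key_ordinal
                FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)'), 1, 1, '')
        FROM sys.indexes AS i
        WHERE i.type = 2                                   -- nonclustered only
          AND i.is_disabled = 0
          AND OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
    )
    SELECT
        OBJECT_NAME(narrow.object_id) AS table_name,
        narrow.index_name AS narrow_index,
        narrow.KeyCols    AS narrow_keys,
        wide.index_name   AS wide_index,
        wide.KeyCols      AS wide_keys
    FROM IndexKeys AS narrow
    JOIN IndexKeys AS wide
        ON wide.object_id = narrow.object_id
       AND wide.index_id <> narrow.index_id
       -- the narrow index's keys are a strict leading-edge prefix of the wide index's keys
       AND LEFT(wide.KeyCols, LEN(narrow.KeyCols) + 1) = narrow.KeyCols + ','
    ORDER BY table_name, wide_index;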

    Thank you in advance,

    Jason

  • Too many "it depends" here. In general I always look to reduce indexes/indexing when they are direct or near duplicates. Just a LOT of cost in many ways for them. I also look to coalesce a inc b,c and a inc b,d into a inc b,c,d.

    Yes, index usage does factor into this. The most important part of that is: if they are never used, get rid of them when appropriate! 🙂

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (6/17/2016)


    Too many "it depends" here. In general I always look to reduce indexes/indexing when they are direct or near duplicates. Just a LOT of cost in many ways for them. I also look to coalesce a inc b,c and a inc b,d into a inc b,c,d.

    Yes, index usage does factor into this. The most important part of that is: if they are never used, get rid of them when appropriate! 🙂

    Kevin - Thanks for the feedback. You've kind of confirmed my suspicion that there isn't a simple formula for this. As far as coalescing indexes, that's what I was trying to convey with example #2 and its associated questions.

    Thanks,

    Jason

    Yes, you can just add Col_3 to the first index and it will take care of both needs.

    Also, before dropping any index, look at the Index Usage Statistics report.

    Right-click on the DB --> go to Reports --> Standard Reports --> Index Usage Statistics.

    This report will tell you whether the index you want to remove has been used or not.
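    If you prefer a query over the SSMS report, roughly the same information comes from sys.dm_db_index_usage_stats (just a sketch; the counters reset whenever the instance restarts):

    -- Reads vs. writes per index in the current database since the last restart
    SELECT
        OBJECT_NAME(i.object_id) AS table_name,
        i.name AS index_name,
        us.user_seeks + us.user_scans + us.user_lookups AS total_reads,
        us.user_updates AS total_writes,
        us.last_user_seek,
        us.last_user_scan
    FROM sys.indexes AS i
    LEFT JOIN sys.dm_db_index_usage_stats AS us
        ON us.object_id = i.object_id
       AND us.index_id = i.index_id
       AND us.database_id = DB_ID()
    WHERE OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
    ORDER BY total_reads;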

  • CMRWB (6/17/2016)


    Yes, you can just add Col_3 to the first index and it will take care of both needs.

    Also, before dropping any index, look at the Index Usage Statistics report.

    Right-click on the DB --> go to Reports --> Standard Reports --> Index Usage Statistics.

    This report will tell you whether the index you want to remove has been used or not.

    Thanks CMRWB - I know I "can" do that, but it doesn't necessarily mean I should. If Col_3 is, for example, a Comments VARCHAR(8000) column and the other columns in the index are comparatively small (all INT columns, for instance), adding Col_3 would cause the updated index to grow immensely (compared to its original size) and potentially impact queries that currently use that index.

    Index usage stats are only helpful in finding unused indexes; they don't help in identifying redundant indexes. The goal here is to get rid of indexes that are being used but can be eliminated because there is another index that could be used if the "preferred" index is dropped.

    I'm just trying to get a sense of when we should do this and when we shouldn't.

    Using index operational stats, if you see an index where the total writes exceed the total reads, then it's probably useless. But generally speaking, I'm comfortable with non-clustered indexes contributing as much as 30% to a table's total allocated storage, so long as each index is being used. Over time, if you see a need to add an additional index, then maybe drop the least used one to offset the cost of index maintenance.
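    Something along these lines is what I mean (a sketch only; the counters reset on restart and there is one row per partition, so look at a representative window):

    -- Leaf-level write activity vs. read activity per nonclustered index
    SELECT
        OBJECT_NAME(os.object_id) AS table_name,
        i.name AS index_name,
        os.range_scan_count + os.singleton_lookup_count AS leaf_reads,
        os.leaf_insert_count + os.leaf_update_count + os.leaf_delete_count AS leaf_writes
    FROM sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) AS os
    JOIN sys.indexes AS i
        ON i.object_id = os.object_id
       AND i.index_id = os.index_id
    WHERE i.type = 2   -- nonclustered
    ORDER BY (os.leaf_insert_count + os.leaf_update_count + os.leaf_delete_count)
           - (os.range_scan_count + os.singleton_lookup_count) DESC;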

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell (6/17/2016)


    Using index operational stats, if you see an index where the total writes exceed the total reads, then it's probably useless. But generally speaking, I'm comfortable with non-clustered indexes contributing as much as 30% to a table's total allocated storage, so long as each index is being used. Over time, if you see a need to add an additional index, then maybe drop the least used one to offset the cost of index maintenance.

    Thanks Eric - I do like the idea of using the operational stats to find indexes that cost more to maintain than what they're worth.

    That, however, doesn't necessarily help with the redundant index problem. The issue is that when the optimizer has multiple indexes (all of which will satisfy the query requirements), it will choose the smallest one. That, of course, doesn't mean it can't use the next smallest one without suffering a noticeable performance impact...

    I do suppose that if an index is "under used", it would indicate that the query(s) that use it aren't running very often, and I may not care if they are marginally impacted by being forced to use a larger index.

    Thank you sir!

  • Jason A. Long (6/17/2016)


    Eric M Russell (6/17/2016)


    Using index operational stats, if you see an index where the total writes exceed the total reads, then it's probably useless. But generally speaking, I'm comfortable with non-clustered indexes contributing as much as 30% to a table's total allocated storage, so long as each index is being used. Over time, if you see a need to add an additional index, then maybe drop the least used one to offset the cost of index maintenance.

    Thanks Eric - I do like the idea of using the operational stats to find indexes that cost more to maintain than what they're worth.

    That, however, doesn't necessarily help with the redundant index problem. The issue is that when the optimizer has multiple indexes (all of which will satisfy the query requirements), it will choose the smallest one. That, of course, doesn't mean it can't use the next smallest one without suffering a noticeable performance impact...

    I do suppose that if an index is "under used", it would indicate that the query(s) that use it aren't running very often, and I may not care if they are marginally impacted by being forced to use a larger index.

    Thank you sir!

    OK, I understand: you have identified two indexes that are possibly redundant with each other, you want to drop one of them, and you want to know which one to drop. As an experiment, take a query that currently uses one of the redundant indexes, and then leverage the INDEX query hint to create two variations of this query; one variation explicitly references one index, and the other variation references the other index. Using SET STATISTICS IO ON and SET STATISTICS TIME ON, determine which variation optimizes I/O, CPU, and runtime duration.
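    With made-up table, index, and parameter names, the experiment would look something like this:

    DECLARE @SomeValue INT = 42;                       -- placeholder predicate value
    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    -- Variation 1: force the wider index
    SELECT Col_1, Col_2, Col_5
    FROM dbo.SomeTable WITH (INDEX (IX_Index_1))
    WHERE Col_1 = @SomeValue;

    -- Variation 2: force the narrower index
    SELECT Col_1, Col_2, Col_5
    FROM dbo.SomeTable WITH (INDEX (IX_Index_2))
    WHERE Col_1 = @SomeValue;

    SET STATISTICS IO OFF;
    SET STATISTICS TIME OFF;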

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • If you are reviewing the indexes then you may also want to review the decision about which index is clustered. By the time you have coalesced Index 1 and 2, in the original example, you may well have included the majority of the columns in the table and you may have a good candidate for the clustered index for the table.

  • Recombinant (6/21/2016)


    If you are reviewing the indexes then you may also want to review the decision about which index is clustered. By the time you have coalesced Index 1 and 2, in the original example, you may well have included the majority of the columns in the table and you may have a good candidate for the clustered index for the table.

    It is EXCEPTIONALLY RARE that you should pick a bunch of columns as the clustered index. Those columns would get carried on every nonclustered index as the pointer back to each row, (severely) bloating all your NC index structures.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (6/21/2016)


    It is EXCEPTIONALLY RARE that you should pick a bunch of columns as the clustered index. Those columns would get carried on every nonclustered index as the pointer back to each row, (severely) bloating all your NC index structures.

    The critical factor is the physical width of the key rather than the number of columns. A composite key can be narrow or wide (in bytes) depending on its components. It can make more sense to include additional columns in the key rather than have SQL Server add a uniquifier to each row. If there is a set of narrow columns that allows the composite key to be unique, and the constituent fields have been chosen economically, then the optimiser can perform range scans using multiple columns.

    Generally, the most pointless clustered indexes are those that use a single-column surrogate key value. In these cases, the optimiser can never make use of a range scan.
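    To make that concrete, an entirely made-up example of a narrow composite clustered key (nine bytes in total) that is unique without a uniquifier and still supports multi-column range scans:

    CREATE TABLE dbo.CustomerOrder                     -- hypothetical table
    (
        CustomerID  INT      NOT NULL,
        OrderDate   DATE     NOT NULL,
        OrderSeq    SMALLINT NOT NULL,
        -- ...other (wider) columns live off the key...
        CONSTRAINT PK_CustomerOrder
            PRIMARY KEY CLUSTERED (CustomerID, OrderDate, OrderSeq)
    );

    -- Range scans can use the first one or two key columns, e.g.:
    -- WHERE CustomerID = @Cust AND OrderDate >= @From AND OrderDate < @To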

  • Recombinant (6/21/2016)


    TheSQLGuru (6/21/2016)


    It is EXCEPTIONALLY RARE that you should pick a bunch of columns as the clustered index. Those columns would get carried on every nonclustered index as the pointer back to each row, (severely) bloating all your NC index structures.

    The critical factor is the physical width of the key rather than the number of columns. A composite key can be narrow or wide (in bytes) depending on its components. It can make more sense to include additional columns in the key rather than have SQL Server add a uniquifier to each row. If there is a set of narrow columns that allows the composite key to be unique, and the constituent fields have been chosen economically, then the optimiser can perform range scans using multiple columns.

    Generally, the most pointless clustered indexes are those that use a single-column surrogate key value. In these cases, the optimiser can never make use of a range scan.

    Your focus on range scans sounds like lots of data warehouse queries. Outside of that we will need to agree to disagree on some key design philosophies.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

    I have seen cases where, for example, a clustered index was built on a date column and, of course, used in queries. In the case that I'm speaking of, there was supposed to be an initial seek followed by a range scan, and it would do just that. However, the table was very wide and that made the query quite slow and resource intensive. Adding a non-clustered index that had exactly the same key as the clustered index caused the query to run much faster. Apart from the fact that one was clustered and the other was not, the two indexes were identical.

    In this same scenario, a 2-column non-clustered index had previously been chosen by SQL Server as the index of choice instead of the clustered index because it was much narrower... even though the date-column-based query could only use the 2nd key column of that 2-column index, which was the date column.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • TheSQLGuru (6/21/2016)

    Your focus on range scans sounds like lots of data warehouse queries. Outside of that we will need to agree to disagree on some key design philosophies.

    The design philosophy I have is to design the indexing for the access patterns expected in an application. Yes, DWH is different to OLTP but would you classify CRM, with large numbers of historic records, as DWH or OLTP? At the end of the day you have to design, test, compare query plans, with what you hoped would happen, and iterate until you have a good design. Rules of thumb can be useful but they only get you so far. I wouldn't fly in a plane that was just designed by rules of thumb.

  • Find the queries that are using each index to do a search or scan.

    THEN

    Look at the code and see if the entire index is being used.

    Do this for all queries until you have an exact idea what will happen if you drop one or the other index, or even both.

    You could calculate what index is actually needed, and that might be neither of the two.

    THEN

    Drop both indexes in QA, see what the system asks for after a week or so (via the missing index DMVs; there's a sketch of that check after these steps), and see whether the missing index suggestions confirm your calculation.

    THEN

    Apply the change to DEV and QA.
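    The "see what the system asks for" step usually means watching the missing index DMVs in QA, roughly like this (a sketch; the suggestions reset on restart and should be sanity-checked, not applied blindly):

    -- What the optimizer has been asking for in this database since the last restart
    SELECT
        mid.statement AS table_name,
        mid.equality_columns,
        mid.inequality_columns,
        mid.included_columns,
        migs.user_seeks,
        migs.avg_user_impact
    FROM sys.dm_db_missing_index_details AS mid
    JOIN sys.dm_db_missing_index_groups AS mig
        ON mig.index_handle = mid.index_handle
    JOIN sys.dm_db_missing_index_group_stats AS migs
        ON migs.group_handle = mig.index_group_handle
    WHERE mid.database_id = DB_ID()
    ORDER BY migs.user_seeks * migs.avg_user_impact DESC;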

    How do I find the queries and procs which may be hitting this index?

    DECLARE @IndexName AS NVARCHAR(128) = 'IX_my_Index_That_I_Am_thinking_of_dropping';

    -- Make sure the name passed is appropriately quoted
    IF (LEFT(@IndexName, 1) <> '[' AND RIGHT(@IndexName, 1) <> ']') SET @IndexName = QUOTENAME(@IndexName);

    -- Handle the case where the left or right was quoted manually but not the opposite side
    IF LEFT(@IndexName, 1) <> '[' SET @IndexName = '[' + @IndexName;
    IF RIGHT(@IndexName, 1) <> ']' SET @IndexName = @IndexName + ']';

    -- Dig into the plan cache and find all plans using this index
    ;WITH XMLNAMESPACES
        (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
    SELECT
        stmt.value('(@StatementText)[1]', 'varchar(max)') AS SQL_Text,
        obj.value('(@Database)[1]', 'varchar(128)') AS DatabaseName,
        obj.value('(@Schema)[1]', 'varchar(128)') AS SchemaName,
        obj.value('(@Table)[1]', 'varchar(128)') AS TableName,
        obj.value('(@Index)[1]', 'varchar(128)') AS IndexName,
        obj.value('(@IndexKind)[1]', 'varchar(128)') AS IndexKind,
        cp.plan_handle,
        OBJECT_NAME(qp.objectid) AS ObjectName,
        qp.query_plan
    FROM sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
    CROSS APPLY qp.query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple') AS batch(stmt)
    CROSS APPLY stmt.nodes('.//IndexScan/Object[@Index=sql:variable("@IndexName")]') AS idx(obj)
    OPTION (MAXDOP 1, RECOMPILE);

    Catch-all queries done right
    Gail Shaw's Performance Blog
