Indexes with more writes than reads

  • I have inherited a database that is over-indexed, i.e. there are sometimes 10-20 indexes on a table. The performance is at times not great due to blocking from long running queries. I want to clean up the indexes as a starting point.

    Through a query I found some time ago on the SQLCat blog I have discovered a large number of indexes in the database that have a huge disparity between reads and writes. The range of difference is sometimes almost 2 million more writes than reads. Should I just drop the indexes that have say, more than 100,000 more writes than reads and then see what the Missing Index DMVs tell me after a few days of running without those indexes?

In some cases there are a few hundred thousand reads but maybe a million writes on the index. Thus, there are a fair number of reads happening, just not in comparison to the number of writes. In some cases there are almost no reads and a million or more writes. I am obviously dropping those indexes. I just am not sure what to do about the indexes that do have a fair number of reads. I would appreciate suggestions or things I should consider.

[Quote]In some cases there are a few hundred thousand reads but maybe a million writes on the index.[/quote]

    That's not the best way to determine if you should keep an index. There are many factors that determine the number of reads and writes. What you know for sure is that the existence of reads means that the Optimizer is using that particular index.

I think you're on the right track though; 20 indexes on a transactional db/table is probably too many. I would start by looking for indexes that are not being used. If there are 20 indexes, it's likely that some of them are not being used. There are DMV queries you can use to find unused indexes (check the SSC scripts section of this site; I have seen some good ones there.)
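As a rough sketch of what those scripts look for (column choices here are illustrative; the published scripts are more thorough), something like this lists nonclustered indexes with no recorded reads since the counters last reset:

```sql
-- Nonclustered indexes on user tables with zero recorded reads
-- (seeks + scans + lookups) since the last SQL Server restart.
SELECT  OBJECT_NAME(i.object_id) AS table_name,
        i.name                   AS index_name,
        us.user_seeks, us.user_scans, us.user_lookups, us.user_updates
FROM    sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS us
        ON  us.object_id  = i.object_id
        AND us.index_id   = i.index_id
        AND us.database_id = DB_ID()
WHERE   i.type_desc = 'NONCLUSTERED'
  AND   OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
  AND   COALESCE(us.user_seeks + us.user_scans + us.user_lookups, 0) = 0
ORDER BY us.user_updates DESC;
```

Indexes near the top of that list are paying full maintenance cost on every write while (apparently) serving no queries, but see the caveat below about when the counters reset.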

Index tuning is a complex art, but removing unused (or barely ever used) indexes is a good place to start.

    Note: I'm assuming that you are talking about a transactional DB. If you're talking about a data warehouse then 20 indexes may be just fine depending on the environment and what it's used for.

"I can't stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Careful dropping indexes. The DMVs that report usage reset on restarts of SQL Server. Just because an index is not used recently doesn't mean it isn't used on a monthly, quarterly, or yearly report that needs it to run in a timely fashion.
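Before trusting the usage counters, it's worth checking how long they have actually been accumulating. On recent versions of SQL Server, a quick way to see the last service start (which is when sys.dm_db_index_usage_stats resets) is:

```sql
-- Usage stats accumulate only since the last service start;
-- check when that was before judging an index "unused".
SELECT sqlserver_start_time
FROM   sys.dm_os_sys_info;
```

If the instance restarted recently, the counters may simply not have seen the monthly or quarterly jobs yet.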

  • Lynn Pettis (7/17/2015)


    Careful dropping indexes. The DMVs that report usage reset on restarts of SQL Server. Just because an index is not used recently doesn't mean it isn't used on a monthly, quarterly, or yearly report that needs it to run in a timely fashion.

    True. And it might be better overall, if that index is really needed for those processes but not the rest of the time, to just create the index at the start of that process, and to drop it when finished. Of course, "it depends" applies here, and this would need a bit of testing to determine.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Thanks for the comments and help. The number of reads on some of these did give me pause, despite the out of balance ratio to writes. I think the box has not been restarted for about a month so the info on these indexes should be about that old. I will just drop the ones that have very few reads, or no reads at all, and keep checking usage as time goes on.

I see the same issue occurring with a lot of the primary keys as well. Would the consensus be the same there too?

  • The number of reads on some of these did give me pause, despite the out of balance ratio to writes.

    There's no reason (based on the information you have provided thus far in this thread) to think that there is an "out of balance ratio [of reads/writes]". I have never read anywhere that there is an optimal read/write ratio. Looking at the ratio of reads/writes is not where I start when I'm tuning my database. Are my indexes getting used? How often? What kind of performance gains are they providing? How much impact are they having on my Inserts/Updates/Deletes? ... These are the kinds of questions I ask myself when tuning indexes.

    I will just drop the ones that have very few reads, or no reads at all, and keep checking usage as time goes on.

Not so fast. What if they have very few reads because they are being used by queries that would otherwise generate many more reads? Let's say, for example, that I have a table with millions of rows. I have a query that runs often and only needs to retrieve 200 rows, so I create a filtered index that handles that query. With that index in place my query generates virtually no reads; without it my query scans millions of rows and generates tons of reads. I'd say we want to keep that index. One of the benefits of an index is that, when used, it helps your query complete with drastically fewer reads.
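To make that scenario concrete, a filtered index like the sketch below (table and column names are hypothetical, SQL Server 2008+) serves a narrow, frequent query cheaply even though its usage counters stay small:

```sql
-- Illustrative only: covers a frequent query that touches only
-- a small slice of a large table, so it registers few "reads"
-- in the DMV while saving the query a full scan.
CREATE NONCLUSTERED INDEX IX_Orders_Open
    ON dbo.Orders (CustomerId, OrderDate)
    WHERE OrderStatus = 'Open';
```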

    Again, using the number of reads/writes to determine if you need to keep or remove an index is not the way to go.

    I see the same issue occurring with a lot of the primary keys as well. Would the consensus be the same there too?

More or less a different topic and, as a rule, primary keys are good. If it's a clustered primary key (the SQL Server default) then it creates a clustered index, which is generally preferable to a heap (a table that does not have a clustered index).

I want to reiterate my point that index tuning is a complex art. There are many types of indexes: clustered, nonclustered, filtered, columnstore, XML... all with different purposes, benefits, and pitfalls. We have not even talked about index fragmentation. Again, indexing is a complex subject. There are endless good articles and books about it which are worth reading. My advice is to learn, learn, learn, then test, test, test.

"I can't stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Alan,

Thanks for the comments. If an index has 50 reads and 1 million writes, for example, then 99.9% of the activity on that index is spent on maintenance. That is not going to be an index that is very helpful. I have that sort of scenario in about 20 of the indexes I came across and in many of the primary keys. Some are clustered primary keys, but many of them are not.

  • lmarkum (7/17/2015)


    Alan,

Thanks for the comments. If an index has 50 reads and 1 million writes, for example, then 99.9% of the activity on that index is spent on maintenance. That is not going to be an index that is very helpful. I have that sort of scenario in about 20 of the indexes I came across and in many of the primary keys. Some are clustered primary keys, but many of them are not.

    It has occurred to me that we're probably talking about two different things with respect to reads

    If by "reads" you mean user_seeks, user_scans and user_lookups FROM sys.dm_db_index_usage_stats and by "writes" you are talking about user_updates FROM sys.dm_db_index_usage_stats then you are correct: 50 reads and 1 million writes would mean that index is probably doing more harm than good. I thought that by "reads" you were talking about logical reads, physical reads, read-ahead reads, etc; the stuff you would get by examining STATISTICS IO or extended events. :hehe:
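For clarity, the DMV-based definition of "reads" and "writes" being discussed can be sketched as:

```sql
-- "Reads" = seeks + scans + lookups; "writes" = user_updates.
-- Per index, accumulated since the last SQL Server restart.
SELECT  OBJECT_NAME(object_id) AS table_name,
        index_id,
        user_seeks + user_scans + user_lookups AS reads,
        user_updates                           AS writes
FROM    sys.dm_db_index_usage_stats
WHERE   database_id = DB_ID();
```

These counts say how often the optimizer chose the index, not how many pages each use touched, which is why they differ from STATISTICS IO numbers.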

"I can't stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Alan,

We were talking about two different things. I was talking about reads and writes against indexes, which is shown in the usage stats DMV, not in SET STATISTICS IO ON.

  • lmarkum (7/17/2015)


I see the same issue occurring with a lot of the primary keys as well. Would the consensus be the same there too?

No. Primary keys are part of your logical database design. About all you'd do to a PK is, after a huge amount of testing, consider changing a primary key that is enforced by a clustered index to use a nonclustered index instead. But that's a huge amount of work and testing.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • lmarkum (7/17/2015)


    I have inherited a database that is over-indexed, i.e. there are sometimes 10-20 indexes on a table. The performance is at times not great due to blocking from long running queries. I want to clean up the indexes as a starting point.

    My personal feeling is that cleaning up indexes for the "performance is at times not great due to blocking from long running queries" is absolutely the wrong place to start. Identify and repair the code exhibiting the performance problems first. Yes, it may boil down to consolidation or removal of indexes but I don't believe that you've specifically identified that as the main problem for your queries. There's better fruit to pick with much higher ROI.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff, as I identify long running queries I come across tables with too many indexes. I examine the usage of those indexes with the usage_stats DMV and find that a lot of them are write heavy, few reads. If I clean them up then DML performance improves and I can implement indexes that improve my long running queries.

  • lmarkum (7/18/2015)


    Jeff, as I identify long running queries I come across tables with too many indexes. I examine the usage of those indexes with the usage_stats DMV and find that a lot of them are write heavy, few reads. If I clean them up then DML performance improves and I can implement indexes that improve my long running queries.

    That's certainly a thoughtful approach. My concern is that if only indexes are considered as a problem or a fix for INSERTs or SELECTs, then huge opportunities in scale and performance might be missed. Sometimes indexes are the answer. Sometimes they are not. It truly depends and I just want to make sure that people know that sometimes, indexes are not the answer and you have to bear down and actually change the code. As a hugely over-simplified example, if someone is using something like the FORMAT function, no index-therapy in the world will help you overcome the fact that FORMAT is 44 times (4,400% as a number meant to impress) slower than the good ol' fashioned CONVERT function. Another over-simplified but still very pertinent example would be the use of rCTEs (Recursive CTEs) that count (as in produce a sequence of integers) when compared to the use of the much higher performing cCTEs (Cascading CTEs, Itzik Ben-Gan's method) or other CROSS JOIN based pseudo-cursor methods. A not so simple example would be the rewrite of WHERE and JOIN clauses to ensure that they are SARGable and CAN actually make effective use of indexes.
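As a small illustration of the FORMAT vs. CONVERT point (FORMAT requires SQL Server 2012+; the exact speed difference varies by workload):

```sql
-- Both produce the same yyyymmdd string, but FORMAT goes through
-- the CLR per row while CONVERT is a native, far cheaper function.
SELECT FORMAT(SYSDATETIME(), 'yyyyMMdd');      -- slow: per-row CLR call
SELECT CONVERT(char(8), SYSDATETIME(), 112);   -- fast: native conversion
```

No index change can rescue a query whose cost lives in per-row function calls like the first one.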

    Index therapy can and certainly does help but the real performance will be derived from thoughtful analysis and changes to the code, which may also help index therapy, as well. When combined, you can sometimes (many times, actually) achieve performance improvements and reduction in resources that are literally measured as thousands of times improvement.

Don't discount short running code, either. Something that takes only several hundred milliseconds is frequently overlooked because it doesn't show up as "long running" code. Yet, if it is hit upon tens of thousands of times a day, it becomes a major but hidden-from-most performance problem capable of causing substantial blocking, timeouts, and even deadlocks.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff, I completely agree that re-writing code is important and it is good to be reminded of the importance of re-writing code. Much of the code I see in my environment is pumping thousands or hundreds of thousands of rows into table variables, for example. I am urging the dev team to at least test a re-write with temp tables to see how that affects performance. I am also going to be examining how we can inline some frequently used UDF's so the optimizer can actually "see" what is going on and perhaps choose better query plans. The main thrust of this question though was about determining how best to handle indexes that are getting little to no use from current code, especially as compared to the cost of maintaining those indexes through Insert, Update, Delete operations.

    We have a lot of PK's that are non-clustered for some reason. They are almost always an IDENTITY column. We then usually have a clustered index on some combination of INT and VARCHAR. When I look at read/write usage on these and page split information, which my monitoring software is giving me warnings about, I see that the clustered indexes are at the top of the list for page splits, in most cases. For example, we have a table where the clustered index is VARCHAR(50) followed by an INT value. I think the columns are email and a CustomerId. On this same table we have an IDENTITY INT column as the PK. I have already changed one of these so that the PK is the clustered index and the previous clustered index is a non-unique non-clustered index. This allowed for a narrow, ever increasing clustered index.
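A sketch of the swap described above, with hypothetical table and column names (note that dropping the clustered index rebuilds every nonclustered index on the table, so this needs a maintenance window and testing):

```sql
-- Move the clustered index from the wide VARCHAR key to the
-- narrow, ever-increasing IDENTITY primary key.
ALTER TABLE dbo.Customer DROP CONSTRAINT PK_Customer;   -- old nonclustered PK
DROP INDEX CIX_Customer_Email ON dbo.Customer;          -- old wide clustered index

ALTER TABLE dbo.Customer
    ADD CONSTRAINT PK_Customer PRIMARY KEY CLUSTERED (CustomerId);

CREATE NONCLUSTERED INDEX IX_Customer_Email             -- keep the old key searchable
    ON dbo.Customer (Email, CustomerId);
```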
