Unique Indexes Are Code; Non-Unique Indexes Are Data

  • Comments posted to this topic are about the item Unique Indexes Are Code; Non-Unique Indexes Are Data

  • Not a bad idea.

    But how do you protect against an index slowing other queries down on Select specifically and the usual Insert/Delete/Update in general.

  • Thanks for an interesting article.

    From a more theoretical approach, constraints are the things enforcing business logic, not indexes.

    As you have observed, indexes are for performance. Unique vs non-unique indexes serve their own purposes for performance but its the unique constraint that throws an error for duplicates, not the unique index.

    It appears to me that you should be comparing constraints, not indexes in Red Gate to allow you to synchronise constraints but not indexes.

    As I understand it, if a constraint (unique or primary key) is identified as missing between environments, creating the constraint in the target will also create a unique index, but all other indexes can be maintained separately.

  • This article offers some interesting thoughts and perspectives, so thanks for sharing!

    But the content feels like a rant where author is searching for ways to make a point.

    Does it really matter if we think of one type of index as code and another as data? It's kind of a strange analogy. I understand the points why the author saying this but I don't understand the benefit. To me thinking of unique indexes as "code" and non-unique indexes as "data" isn't really that important. What's important are their characteristics and knowing when to use which.

    I work with developers regularly who are interested in looking at execution plans.

  • Daniel - no offense, but when I read these articles I want to actually learn something, not hear a soapbox speech about the software made by the parent company of this forum. Solutions, workarounds, or ways your DBA team have come up to handle this issue - all of that would've been a welcome conclusion to this article.

    With that said...three things:

    1) Have you investigated Redgate's Comparison SDK to see if you can implement the changes you want yourself?

    2) The point where you draw the line between Database Developer and DBA is not a line at all but in fact a huge gray area. I have been on both sides and ultimately the responsibility can fall on both sides. Also IMO any junior developer / senior out of college can write SQL code and get back correct results; same with unique constraints/indexes. The difference between these regular developers and a DATABASE Developer IMHO is that the latter does it in a fashion that meets business logic, performs optimally, and provides minimal maintenance for DBAs. Just as they understand how to query in sets (instead of cursors/loops), they also understand how to use filtered indexes, maxdop, transaction isolation levels, etc. All of this IS code. So bottom line - I don't agree that Non-Unique indexes are just "Data."

    Every place I've worked at, the developers had access to a fairly recent (1-3 months) restored copy (some confidential data may be deleted) of a production database in which they could test index performance. So there's really no excuse why your "Database" developers aren't able to do this other than Politics and/or red tape. This is where you should be putting your Soapbox. Educate your "database" developers to actually be database developers.

    3) Eventually there is a handoff between dev and dba in which case DBA takes over responsibility on performance tweaking. So you as DBAs should ALSO be checking in your code changes into source control as well, especially since your changes. Obviously not if you're doing the fix at 3am, but as some point your production environment should have a current source repository branch that matches the sql object in production. So that next time your developers make changes, they have to merge with all your index changes prior to submitting code for deployment.

    I realize this isn't probably what you want to hear - but either way I hope this helps

  • I think Daniel does raise some interesting points. I regularly track missing indexes and unused indexes to see if any indexes need to be created or dropped. I believe that should be part of any DBA's regular tasks if they are interested in continuously improving performance.

    However, I disagree on two points. I believe there are many opportunities in the course of development for a developer to suggest a non-unique index. A developer should be able to anticipate which tables may grow large and which queries may require suitable indexes to improve the execution plan. And while I like the distinction of code indexes and data indexes, I also think that developers sometimes need to create application domain data (i.e. Non-Unique indexes).

    I also am not in favor of automatically generating indexes. Automatically gathering the data is one thing. But any code change should be carefully analyzed and tested before rolling out to production. So while I may be automatically saving data on missing/unused indexes, I manually review that data and carefully select and test those suggestions prior to full implementation.

    Gordon

    Gordon Pollokoff

    Wile E. is my reality, Bugs Bunny is my goal - Chuck Jones
    Walking on water and developing software from a specification are easy if both are frozen. - E. Berard
    Doing more things faster is no substitute for doing the right things. - S. R. Covey
    Any sufficiently advanced bug is indistinguishable from a feature.- R. Kulawiec

  • Thanks for sharing this article with us Daniel.

    There is one statement you make which I found interesting.

    "It doesn't make sense to maintain one set of indexes for all copies of the database. Each database needs its own."

    However you didn't follow with any content explaining the statement. For instance why is it important that the development environment needs different indexes to production. If I only have 10% of the production data in dev why is it important that I have indexes tuned for that data landscape? I'm reluctant to agree at all with this statement and I was rather hoping you would supply some content that supported this statement and oppose my disagreement.

    You're clearly a very intelligent guy who is striving for high performance databasing and you have my admiration for that. However your article depicts a working environment were there is a distrust and lack of cohesive collaboration between the individuals developing an application and those supporting it. The article promotes a technical approach which further underpins this disjointed business culture.

    Uniting the DBA and development teams will serve you well. Don't expect technology to solve poor communication between teams.

    Altering index structures in production without any control change process or source control is a massive no no for medium and enterprise size solutions. If you're working for a small company then obviously shooting from the hip is much more effective.

    You should take the index analysis code you've developed and focus on making it correctly identify potential efficiencies in production indexes only. You should then promote these changes to the development and test teams. In an ideal world you should be able to test the changes on a copy of production data before deploying them.

  • Secondly I feel that if you had made the main focus of your article about the solution you've developed for monitoring and analysing indexes that you would have found a much more appreciative and complimentary audience.

  • Sorry, but what you propose doing is, in my experience, DISASTROUSLY BAD!!!

    1) No where did I see mention of things like "duplicate", "overlapping", etc. The missing indexes subsystem (MIS) will CRUSH you with INCLUDED-column bloat, duplicate and overlapping indexes, etc. I once spent over 200 man hours at a client unwinding their rampant and un-informed use of MIS and it's nasty upscale cousin the Database Tuning Advisor. Locking/blocking/deadlock issues, horrible concurrency issues, tlog bloat, data modification bloat, slow query compiles, etc.

    2) No where did I see mention of the many CREATE INDEX options, which MUST be used (especially fill factor) if you have any hope of having a good indexing strategy.

    3) No where did I see mentioned that index usage is tracked since SQL Server start time. So if you start dropping indexes after the server was started just an hour or a day or a week ago you are in BIG trouble. You best both be up for a substantial time AND be up long enough to cover potentially important time-period points such as end/start of month/qtr/yr, etc. Without that you could be dropping some critically important indexes.

    4) No where did I see mentioned that developers or admins could use INDEX HINTS in their code. Now you can start breaking code if you don't first search for it in all code sources. And I mean ALL code sources: sprocs/UDFs/views in ALL databases (including system ones), SQL Agent jobs, ALL external repositories including source code for ALL applications, admin scripts, powershell, etc.

    I have seen all of the above at clients, some of it far too frequently. On the one hand it makes me go "KACHIINNNGGG!!" because it is more work. But I still hate seeing it. There is PLENTY of work for me to do just fixing the common design/dev/mx/etc issues everyone does. These things you mention are a case of "a little knowledge is a DANGEROUS thing" and are completely avoidable.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Great solution and interesting article. I see your point of view, but the world is not entirely black and white. I also think automated index maintainance is risky, but hey, I used your suggestions, analyzed them and used them:-). I even got some ideas I overlooked before :cool:.

    Thanks!

  • I'm sorry , but I can't even begin to describe how much I disagree with this article

    my main gripe is the fact that the author has not realised that a unique index is effectively a composite object - a combination of a unique constraint and an index together.

    for example

    The usage of a non-unique index is optional. Just because an index exists doesn't mean that the server will use it.

    how is this different from a unique index - if the query optimiser decides to perform an index seek and a bookmark lookup against the PK then the unique index is also optional... there are so many points in here that I don't think my web browser will let me actually type them all

    It doesn't make sense to maintain one set of indexes for all copies of the database. Each database needs its own

    have you ever tried to do an automated build that will deploy to 200 databases where you need to perform some operation on a table and you are blocked because an index is not in source code..... (change the data type on a column ??)

    you can not expect the database developers to maintain the non-unique indexes. They just can't do it. There is no way to create them in the development environment and know that they'll work in production

    I am kinda spitting feathers at this one... unique indexes enforce data integrity as well as providing a potential performance boost.... if you build a system in dev without your constraints then how can you ensure your code doesn't break when it hits live??

    MVDBA

  • MVDBA (6/26/2014)


    The usage of a non-unique index is optional. Just because an index exists doesn't mean that the server will use it.

    how is this different from a unique index - if the query optimiser decides to perform an index seek and a bookmark lookup against the PK then the unique index is also optional

    I think he means the creation of a non-unique index is optional, not that it's optional for the query optimizer to use it.

    It doesn't make sense to maintain one set of indexes for all copies of the database. Each database needs its own

    have you ever tried to do an automated build that will deploy to 200 databases where you need to perform some operation on a table and you are blocked because an index is not in source code..... (change the data type on a column ??)

    Good point.

    you can not expect the database developers to maintain the non-unique indexes. They just can't do it. There is no way to create them in the development environment and know that they'll work in production

    I am kinda spitting feathers at this one... unique indexes enforce data integrity as well as providing a potential performance boost.... if you build a system in dev without your constraints then how can you ensure your code doesn't break when it hits live??

    I think you misread. He says developers cannot be expected to maintain the non-unique indexes.

    John

  • Thank you all for your feedback. I do apologize for the tone of the article; to some extent, it IS a rant. However, it is also an article about a specific database problem and a possible solution with working code examples. If that is what interests you, then I suggest that you focus on that aspect of the article.

    This response will be about the technical aspects of the automatic index maintenance process.

    About creating an index that could have a negative impact on performance: the process is self-correcting. Any index it creates is always a potential candidate to be dropped. If it creates an index that ends up harming performance, the index will go away. On my servers, the process detects the problem and drops the offending index sooner than I do.

    About duplicate and overlapping indexes: Yes, if I created every missing index that SQL Server told me to create, I would be creating too many indexes. Many would be duplicates. Many would have overlapping columns. The database would use far more disk space for the indexes than it would for the data itself. Conversely, if I dropped every index that SQL Server told me to drop, I would be dropping too many, and I would start having performance problems. The process avoids this by creating and dropping just one index at a time, and it runs on a schedule once every hour, or once every fifteen minutes, or however often is appropriate. After a lot of queries, SQL Server might have ten index suggestions for a table, many of which have overlapping columns. As soon as I create one of them, SQL Server forgets the others and starts over, so that any missing index suggestions it makes after that point are based on queries that were run on the table after I created the first index. Those new suggestions might overlap each other, but they are much less likely to overlap the index that I created.

    Yes, index usage gets reset every time the service restarts, so if you're not careful, you could drop indexes that look like they're not needed just because they haven't been used since (say) midnight last night. However, I did say that the process is highly customizable; my actual code has lots of safety checks that aren't in the article. When looking for an index to drop, the process makes a list of candidates and then selects the worst index out of the list. The criteria for selecting which indexes are droppable is customizable, and the process for ranking the droppable indexes is independently customizable. I could decide that indexes are only droppable if they are nonclustered and non-unique, if they have been written to 1000 times or more, if they have not been used in any seek, scan, or lookup in the last 90 days, and if they belong to tables that use more disk space for indexes than for data. Then, out of those droppable candidates, I might pick the one to drop with the most writes and the fewest reads, or the one with the most disk space, or however I choose. Each organization will have different requirements that determine which indexes are important to keep.

    So, the code I've included in the article is a starting point, not a finished product. I didn't post my current code because my rules for determining which indexes to create or drop are only valid for my organization at the current time; they might be different for you, and they might even be different for us in a few months.

    And yes, this process might not work for you at all if you have index hints in your database code. If you work in that kind of environment, then you just can't create and drop indexes automatically. My current employer is not like that, so this process makes more sense. We only had one index hint in our entire database, and yes, it blew up within a few days of implementing this process, so the point is very valid. My personal experience is that the database server is usually better at generating query execution plans than I am, so I don't try to use index hints -- I let the server decide which indexes to use. It makes sense for us to take the server's suggestions about which indexes to create and drop.

  • This response will be about the rant.

    About unique constraints vs. unique indexes: there are such things as unique indexes that are not constraints. They do enforce uniqueness just as effectively as unique constraints; an attempt to insert duplicate data will fail with a similar error whether the rule is enforced with a unique constraint or a unique index. Furthermore, there are situations where you cannot use a unique constraint; for example, unique constraints cannot be filtered but unique indexes can. If I was able to check in and deploy all constraints and ignore all indexes, that would almost but not quite solve my problem. I really need to be able to treat primary keys, unique constraints, and unique indexes one way and treat non-unique indexes a different way.

    About code vs. data: yes, it seems like a strange way to refer to the difference between unique and non-unique indexes, and it's an unnecessary distinction to make in most situations, but it's the distinction that makes the most sense to the company that creates the database source control and deployment software. I didn't come up with that nomenclature myself; it came up in conversation with a Red Gate employee. I was trying to explain why I want to capture some indexes in source control and not others, and David Atkinson said, "So you want to treat unique indexes like code and non-unique indexes like data." For the purpose of source control and deployment, yes, that's how I want to treat my indexes. Code gets tracked and deployed; data doesn't. Code is kept identical on the test and production environments; data isn't. I don't take all the data that gets entered into the production database and go back and check it into source control; it's not code. That's how I want to handle my non-unique indexes.

    "It doesn't make sense to maintain one set of indexes for all copies of the database. Each database needs its own": The context for that statement came before the statement. Non-unique indexes perform differently between development and production, between test and production, and between one production database and another. It probably wouldn't hurt the development and test databases to have the same indexes as the primary production database, but it could hurt the primary and secondary production databases to make them have the same indexes as each other. (A good example deserves a whole response to itself.)

    About unique indexes and query optimization: Yes, a unique index is also an index. The server can choose to use one for query optimization, and it can choose not to, just as it can choose to use or not to use a non-unique index for query optimization. However, the rule-enforcing behavior of a unique index is not optional -- the server can't arbitrarily choose not to prevent duplicate data from being inserted into a table with a unique index.

    I'm not suggesting that anyone should have an automatic process create and drop unique indexes. Unique indexes are the developer's responsibility to create, and they should be checked into source control, vigorously tested, and deployed to all copies of the database in all environments. Of course the code will break without them. My process, by very intentional design, never touches unique indexes. That's the whole point of the "rant" part of the article -- that when capturing database changes in source control and deploying them, it should be possible to treat unique indexes one way and non-unique indexes another way.

    I'm also not saying that everyone should treat unique and non-unique indexes differently like I do; I'm only saying that it should be possible to do so. (Well, OK, I went back and read the article, and I did say that, but I intended it to be interpreted as hyperbole. My apologies.)

  • First, a comment about the "official" topic of the article.

    I do agree, to some extent - but with a twist. I think that all indexes, both unique and nonunique, should be considered tuning intruments. The part you can "code" is what I would call "business rules", and this should be enforced with constraints. SQL Server will automatically create an index for every primary key or unique constraint, so you get them by default.

    If I need to implement a business rule that enforces uniqueness, I will use a primary key or unique constraint. If I can get additional performance by picking a smart order of the columns in the constraint, I will - but the main purpose of the constraint is to enforce the business rule.

    When I create an additional index for performance, I will declare it as unique when I know that it will contain unique values. Which can only be the case if it contains all columns of a unique constraint or of the primary key. Protecting integrity is not what I create an index for. The only reason I declare it as unique is to give SQL Server's optimizer more information, not to enforce the business rule.

    But that theory does not always fly. SQL Server's non-standard implementation of the UNIQUE constraint means that I sometimes have to use a filtered index instead of a constraint. And there are also edge cases where even with ANSI-compliant UNIQUE iomplementation, I would still need to use a filtered index to enforce a business rule. In those cases, I will have to grit my teeth and accept having to use a (filtered) unique index.

    There are also some other elements in the article I want to comment on. The first is this quote:

    "A non-unique index will not be used if the data is not of sufficient quantity. If there are only a dozen rows in a table, then the server will ignore indexes because it's faster to scan the table than it is to use an index. There must be just enough overhead involved in an index seek or an index scan to make it not worthwhile for small tables. I experimented recently and found that the server only started using indexes when there were tens of thousands of rows in the table."

    Please experiment a bit more. This statement is completely incorrect. Even on tables with very few rows, indexes will be used.

    My final comment is on the automated index maintenance. And that comment is: "please don't".

    I have seen missing index recommendations that make no sense at all (for instance, add an INCLUDE clause for the primary key column), recommendations for indexes that already exist, and I have once had (and unfortunately lost) a script where SQL Server recommends an index, then does not use it for the same query but recommends a duplicate copy of the same index. Add automated index creation to that script, and you're right on track for misery,

    There can also be lots of overlaps between missing index recommendations. I have seen indexes recommended on (Col1, Col2), (Col1, Col2, Col3), (Col1, Col3), and (Col1, Col3) with an include of (Col2). A single recommendation can satisfy all four - not with 100% effectivity but close enough and with much lower overhead. If each of those indexes is used, your script that drops unused indexes will not pick this up. But using a single index instead of all four would be a much better choice!

    Dropping unused indexes is very dangerous. SQL Server tracks how often the index is used, not what effect it had. A single use of an index might save two hours, or a million uses might combined save a minute. And an index that is completely unused might be that single index that makes the difference between an end-of-year reporting job fninshing in an hour or running for two weeks (and holding locks on all tables).

    I have seen your reply and it is welll possible that you know what you are doing. But by publishing this article, you will create lots of problems on other servers. Way too many people will see the article, fail to read the warnings in your code (tl;dr), or the discussion on the boards (tl;dr2) and simply copy/paste your code. You can say that they are responsible themselves for the problems they introduce the stupidity of copy/paste-ing code withhout really understanding what they do, and you would be right - but I think that authors have a responsibility too.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

Viewing 15 posts - 1 through 15 (of 52 total)

You must be logged in to reply to this topic. Login to reply