Unique Indexes Are Code; Non-Unique Indexes Are Data

  • TheSQLGuru (6/26/2014)


    Sorry, but what you propose doing is, in my experience, DISASTROUSLY BAD!!!

    1) No where did I see mention of things like "duplicate", "overlapping", etc. The missing indexes subsystem (MIS) will CRUSH you with INCLUDED-column bloat, duplicate and overlapping indexes, etc. I once spent over 200 man hours at a client unwinding their rampant and un-informed use of MIS and it's nasty upscale cousin the Database Tuning Advisor. Locking/blocking/deadlock issues, horrible concurrency issues, tlog bloat, data modification bloat, slow query compiles, etc.

    2) No where did I see mention of the many CREATE INDEX options, which MUST be used (especially fill factor) if you have any hope of having a good indexing strategy.

    3) No where did I see mentioned that index usage is tracked since SQL Server start time. So if you start dropping indexes after the server was started just an hour or a day or a week ago you are in BIG trouble. You best both be up for a substantial time AND be up long enough to cover potentially important time-period points such as end/start of month/qtr/yr, etc. Without that you could be dropping some critically important indexes.

    4) No where did I see mentioned that developers or admins could use INDEX HINTS in their code. Now you can start breaking code if you don't first search for it in all code sources. And I mean ALL code sources: sprocs/UDFs/views in ALL databases (including system ones), SQL Agent jobs, ALL external repositories including source code for ALL applications, admin scripts, powershell, etc.

    I have seen all of the above at clients, some of it far too frequently. On the one hand it makes me go "KACHIINNNGGG!!" because it is more work. But I still hate seeing it. There is PLENTY of work for me to do just fixing the common design/dev/mx/etc issues everyone does. These things you mention are a case of "a little knowledge is a DANGEROUS thing" and are completely avoidable.

    Agree on all points!

    Igor Micev,My blog: www.igormicev.com

  • The article is an interesting, alternate look at indexing. Your strategy is about indexing for performance, and I guess I recognise that that need can vary with the same database design in terms of functional keys and table columns.

    On the other hand, I like to choose indexes for my database design personally. You'd put me out of a job... although I suppose I wouldn't have to actually tell anyone that the job was being done by SQL Agent now. Hmm.

    Also, there are technicalities about indexes that you didn't consider.

    Here are some of my ideas about Microsoft SQL Server performance, acquired from various sources or made up inside my own head.

    A clustered index physically organises data into key order within a table. It also is the basis of nonclustered indexes, so it should have a very short key. You don't have to have a clustered index at all, but it's likely that you can choose a clustered index that at least theoretically improves the database performance. And it's fun.

    By default, a primary key constraint gets the clustered index. But this might not be the best thing. The primary key represents the natural lookup of individual rows in the table, but that doesn't mean that it'll be good for ranges. And the proper primary key for the data may be long. If basically you deal with added data in time order, for instance, I think you may do better with an int column with identity property as a unique clustered index. It will just act as the interface between the lookup that you do with the primary key or any other useful key, and actually finding your data.

    You put the clustered index on before the nonclustered indexes, which are built using the clustered index. When dropping, you drop the clustered index after the others. Otherwise, since the server doesn't know that that's what you're doing, it will rebuild nonclustered indexes to function without the clustered index.

    Business rules can be formally designed with constraints. There's no direct performance benefit of a constraint index over an index that was just declared unique, but database design tools understand what cosntraints are for. There's also an impact on naming; a constraint must have a unique name in the database (or schema) namespace, and its index has the same name. But other index names can be duplicated between tables. Choosing index names can be part of your approach to managing them. In your system, you might want to have permanent indexes and temporary indexes. So you can write a version of your code so that "permanent" indexes are recognised by name and never touched.

    I think I understand correctly that in fact all the indexes have a unique key in their internal structure. If they aren't declared as unique, the location of a row in a data page is added to the index key internally, making it longer by 4 bytes. Or else the clustered index key is tacked on to the explicit index key. This says to me that if you can make the index key unique by adding a term that either is smaller than that, or is potentially actually useful in the key, then your index costs you less and/or gives you more.

    I'm not quite sure about the existence of the clustered index key inside a nonclustered index. It's present, because the nonclustered index uses the clustered index and its key to find the data, but does that mean that a query that uses the clustered index columns can take them from a nonclustered index that, for instance, covers the query? In other words, that explicitly adding any clustered index column in the nonclustered index key is redundant? Or maybe SQL Server knows that and quietly undoes the duplication of a key column internally?

    Beware of a colleague with management's ear who gets excited about tuning SQL Server. One such has stuck me with a Microsoft white paper that apparently offers a catalogue of things to test and tune, without providing tools to do so. Apparently I'm expected to write those and create a reporting tool. For instance, a rule that a frequently updated table ought not to have more than 3 indexes. At this point, I too mentioned Red Gate tools, but, okay... Alarmingly, I've overheard him talking to a consultant (there is another peril) about identifying those unused indexes and dropping them. But the consultant seems to be a level-headed mature fellow for whom this isn't a new experience.

    I think that a similar outbreak of manageriacal enthusiasm for improving database performance led to all of our servers for years, up to the present, being carefully configured with SSIS disabled, apparently because it's a secret curse invented by Microsoft to interfere with the server. Maybe it is, but I had to write my own parser for bloody comma delimited text because of that.

    Our database designs include hundreds of broadly identical databases - with different data of course - dedicated to individual business units, plus a warehouse of reportable data updated overnight, with 32 regions and, due to my whimsical colleagues, data spread in tables named Costs_2010, Costs_2011, Costs_2012, Costs_2013, WITH DIFFERENT COLUMNS, and then grouped in a view called Costs, which is what reports are run against. For example. These aren't partitioned views, obviously, but I'm pretty sure they ought to be that at least. The excitable colleague prefers partitioned tables, probably because they're newer - although I don't see that they're appropriate for this.

    In the meantime, my indexing includes discovering a setting that either we should have had for all the regions since 2011, or we have suddenly created a need for, that I want to apply across all the copies. So MY idiosyncratic indexing system includes a stored procedure for table Costs_YYYY, for instance, in any of its incarnations, that accepts one fully qualified table name, and primary key name, and performs a series of dynamic SQL fixed-name CREATE INDEX and CREATE STATISTICS on it. There's an intermediate procedure that drops some or all of existing indexes and statistics. On reflection, the current setup would be better if I'd designed the procedure to use database name, schema and table name separately, to facilitate checking whether the target indexable columns actually exist in the table. Anyway, in spare time that I don't have, I find index statements in other people's programs written wherever they first thought of them, cut them out from there, and put them in my special indexing procedure for the table instead. Mine! A wise animal marks his territory.

  • Now I have a process on each database in the development, test, and production servers that creates custom indexes just for that database and gets rid of indexes that aren't being used.

    This is a terrible idea. You are now writing code on a one system, testing it on another system and then releasing it to a third system all with their own indexes.

    Now I can imagine an annual process kicks off at year end. While this is running SQL is recommending a whole bunch of indexes because these queries are "new". Your process kicks off and starts creating indexes for this "new" code, only to find they don't get used because the annual processing has finished. Now these indexes require maintenance and performance is affected negatively until your process decides it's time to remove them.

    There is no way in the world that I would allow this sort of automated maintenance on any of my servers.

    The SQL Guy @ blogspot[/url]

    @SeanPearceSQL

    About Me[/url]

  • Thank you for taking the time to share your experience and scripts.

    I see a fair amount of criticism about what you said, mostly directed at two things: System architects, developers, database designers (developers), and DBAs should all get along and work together to create "optimal" data solutions, and; No script is smart enough to replace a "good" DBA reading through execution plans and knowing everything about the data's universe before making any changes, even to non-unique indices. As always, there is truth on both sides of the issue.

    FI have been blessed with the opportunity to serve in several roles for companies ranging from 2 or 3 people to fortune 100 companies. I would offer these observations based on 35 years of experience with that.

    First, I have yet to ever encounter the panacea where an IT organization works in concert unless the IT department is 1 or 2 people. When I have worked, as I am now, with clients who have thousands of IT professionals around the globe working on an untold number of applications, dashboards, BIT reports, etc, hitting databases all over the company all the time my observation is there is very little, if any, collaboration or cooperation between the database and systems gurus. More likely, there is a turf war of some sort going on between the leaders of those two major IT divisions while both divisions puff their feathers out and tell the other how they don't know enough to have an opinion. The loser in that battle is always the end user, but it rages on as it has since the days of punching RPG cards. That's a social and political issue, not a technical one.

    Second, most large companies today are so squeezed by economic constraints that there are NEVER enough developers and/or DBAs to support existing systems, let alone properly develop new ones. I have often encountered the view that information systems are ancillary to a business, like the systems would be nice to have, but more production workers and a new paint booth would mean more revenue, so the money heads in that direction. I spent over a year once with a fortune 100 company giving presentations to directors and VPs just to convince them that the right IT solutions would actually increase their bottom line and make their own individual jobs easier to accomplish. And this was what we would consider a highly technical company. Yet, their IT division of over 5,000 was completely paralyzed when it came to new applications because they barely had the budget to maintain legacy systems, and an attitude of "get in line" when a business unit manager needed something new. My point is, cooperation among all those involved in the life cycle of an application and associated database is often scarce or non-existent for a lot of reasons that are probably not going to change.

    Third, while it is wonderful to think one just hires a "good" DBA to look a poorly performing database and make it all better, it's highly unlikely for most companies. Very few applications can get the business and budget attention to make them perform better. I'm not saying it doesn't ever happen, just that it's rare. End users, for many of the same reasons discussed above, can often complain about system performance until the cows come home and it will do absolutely no good. The client I am working with right now, one of the largest manufacturing concerns on the planet, has systems that are so slow sometimes it is better to use paper or a homegrown spreadsheet to get the job done. Not better in the true sense of the word, but better from the individual manager's standpoint because he wants to finish his day and go home after 10 hours, or on a Saturday, and the systems are so slow, disjointed, and impossible to query that he gives up and does the best he can. The main problem here is two-fold. First is scale. Something as simple as increasing bandwidth between facilities and regional data centers can cost millions, as can hiring a team of database experts to improve the performance. Everyone outside the IT organization wants it to happen, but no one has budget dollars to give IT, and the IT budget is under sever restraint, as it has been for years, due to global economic conditions.

    All that said, I thought your article was great. While I might not go all the way to automation for creating and destroying non-PK indices, then again I might, at least for some, but only if I thought it would actually improve performance. The simple little scripts you handed out, even if the automation is not deployed, will certainly help gain some quick and dirty insight to what database performance issues might be fixed by simple index changes. For the system I am currently hired to develop, I am the ONLY person - from concept to support, including database design, creation, and maintenance. I would much rather look at the simple output of these scripts than analyze execution plans to see how I can help when a user makes a performance complaint that try to analyze execution plans. The latter may be more "correct", but there is no time or budget for that as I have deadlines to meet and no one else to hand that work off to. It's the good old 80/20 rule, where I have to look for the low hanging fruit when it comes to performance, and your ideas gave me a couple of new thoughts on how to do that.

    Thanks again for sharing.

  • rja.carnegie (6/26/2014)


    I think I understand correctly that in fact all the indexes have a unique key in their internal structure. If they aren't declared as unique, the location of a row in a data page is added to the index key internally, making it longer by 4 bytes. Or else the clustered index key is tacked on to the explicit index key.

    You're confusing a few things here.

    Indexes can be declared as unique or nonunique. For clustered indexes only, SQL Server will add an internal 4-byte value to provide uniqueness. This value, which is for internal purposes only and never exposed to the outside world, is not related at all to the location of the row in a data page - it's simply NULL for the first value, 1 for the first duplicate, 2 for the third, and so on. They will not be renumbered after a DELETE and I don't know how much effort SQL Server puts into reusing values that have come available.

    I'm not quite sure about the existence of the clustered index key inside a nonclustered index. It's present, because the nonclustered index uses the clustered index and its key to find the data, but does that mean that a query that uses the clustered index columns can take them from a nonclustered index that, for instance, covers the query? In other words, that explicitly adding any clustered index column in the nonclustered index key is redundant? Or maybe SQL Server knows that and quietly undoes the duplication of a key column internally?

    Every nonclustered index stores a pointer to the data in its leaf pages. That pointer is the clustered index value (for tables with no clustered index, it will be file#, page#, and slot#). And yes, SQL Server can and will use all values available in the leaf pages to satisfy queries, so an index can cover a query that uses columns in the index plus columns in the nonclustered index. If you explicitly add the clustered index key to the create index statement with the INCLUDE clause, SQL Server will not add a second copy. If you add them as extra indexed columns, SQL Server will add them to the intermediate and root pages (where they would otherwise not be included), but still have only a single copy in the leaf pages.

    For instance, a rule that a frequently updated table ought not to have more than 3 indexes.

    There is no such rule. And even as a guideline, it sucks. Whoever said that should be banned from the SQL Server world for the next nine official matches and forced to work with Oracle for four months.

    I wish I could comment on your management's ideas on SSIS, but I am pretty sure that this would trip this forums profanity filters.

    The excitable colleague prefers partitioned tables, probably because they're newer - although I don't see that they're appropriate for this.

    As a rule of thumb, I' say that in most cases partitioned tables are a better idea then partitioned views. But there may be situations where this does not apply, and I do not know your situation well enough to comment.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • There is no such rule. And even as a guideline, it sucks. Whoever said that should be banned from the SQL Server world for the next nine official matches and forced to work with Oracle for four months.

    I wish I could comment on your management's ideas on SSIS, but I am pretty sure that this would trip this forums profanity filters.

    Oh, VERY nice Hugo!! I almost snorted my coffee out my nose just now! 😀 Looks like we aren't going to cross paths on the SQL Saturday circuit in Europe this summer/fall. Hopefully you are going to make it to the PASS Summit.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Hugo Kornelis (6/28/2014)


    For clustered indexes only, SQL Server will add an internal 4-byte value to provide uniqueness.

    Hate to be a stickler here, but I've heard the 4-byte number thrown around so much I assumed it was true - until I got burned when my sizing estimates were off 🙂

    The "uniqueifer" as its called is a variable length column. So even though you have 4-bytes to store the value you have to tack on another 2 to store the column offset.

    Furthermore if you do not have any variable length columns in your table, you have to add another 2-bytes for tracking the # of Variable length columns in your row.

    So you could essentially be adding 8 bytes. Furthermore, althought I haven't witnessed it, I've read uniqueifers can cause null bitmaps to appear in non-leaf pages for nonclustered indexes. That's even more space

  • Thanks for the views and code to generate the index DDL.

    While we probably won't automate it, this definitely gives us a starting point to look at indexing.

    But for now I'll have a DBA review the code before implementing it to ensure that is doesn't clash with other indexes. That said, I ran it against a smaller system on Friday, and accepted the suggestions, which significantly improved performance over the weekend.

    I'd really like to see an expanded/follow-up article with more of your fine tuning in.

  • Excerpts from this article:

    "It doesn't make sense to maintain one set of indexes for all copies of the database. Each database needs its own."

    I completely disagree with this statement. Development, QA, User Acceptance, and production environments should be as close to each other as possible. By "close" I mean database metadata, including indexes and size of data. This way slow queries can be detected at development stage and not after production deployment when all of a sudden dozen users scream "Bloody hell!!!". The only exception to this is rule is sensitive data protection.

  • Interesting perspective. I'm not sure I'd use the terms 'code' and 'data', but unique and non-unique indexes are very different, with different uses. And I do like the missing index subsystem (MIS), handy if one uses it wisely, dangerous if it is used blindly.

    Just be careful of the negative aspect. we all sometimes get there.

    The more you are prepared, the less you need it.

  • Hi Daniel, inspired by your article, I am planning to build my own index maintenance script. To overcome some of the problems mentioned in the discussion, I am planning to implement reporting on the changes made, allow the script to (optonally) make suggestions only, limit the number/size of indexes and so on...

    Because, despite of all the commentrs given in the discussion, I agree with you that automaticall managing indexes, is better than doing nothing, or only then when things go wrong. Because that is what the accidential DBA usually does.

    Now you mentioned that you have a solution with many of these features. So I was wondering if you are willing to share your solution with me. It will save me for sure some mistakes, which I am bound to make when I developmy own solution and I am sure I can't yet oversee the features which are nescessary to make it work.

    Regard

    Herman

  • Herman van Midden (6/26/2014)


    Great solution and interesting article. I see your point of view, but the world is not entirely black and white. I also think automated index maintenance is risky, but hey, I used your suggestions, analyzed them and used them. 🙂 I even got some ideas I overlooked before. 😎

    Thanks!

    rja.carnegie (6/26/2014)


    On the other hand, I like to choose indexes for my database design personally. You'd put me out of a job... although I suppose I wouldn't have to actually tell anyone that the job was being done by SQL Agent now. Hmm.

    cjgray (6/27/2014)


    The simple little scripts you handed out, even if the automation is not deployed, will certainly help gain some quick and dirty insight to what database performance issues might be fixed by simple index changes... I would much rather look at the simple output of these scripts than analyze execution plans... The latter may be more "correct", but there is no time or budget for that... It's the good old 80/20 rule, where I have to look for the low hanging fruit when it comes to performance, and your ideas gave me a couple of new thoughts on how to do that.

    Alex Gay (6/30/2014)


    Thanks for the views and code to generate the index DDL. While we probably won't automate it, this definitely gives us a starting point to look at indexing.

    But for now I'll have a DBA review the code before implementing it to ensure that is doesn't clash with other indexes. That said, I ran it against a smaller system on Friday, and accepted the suggestions, which significantly improved performance over the weekend.

    I'd really like to see an expanded/follow-up article with more of your fine tuning in.

    Herman van Midden (9/25/2014)


    Hi Daniel, inspired by your article, I am planning to build my own index maintenance script. To overcome some of the problems mentioned in the discussion, I am planning to implement reporting on the changes made, allow the script to (optionally) make suggestions only, limit the number/size of indexes and so on...

    Because, despite of all the comments given in the discussion, I agree with you that automatically managing indexes is better than doing nothing, or only when things go wrong. Because that is what the accidental DBA usually does.

    Now you mentioned that you have a solution with many of these features. So I was wondering if you are willing to share your solution with me. It will save me for sure some mistakes, which I am bound to make when I develop my own solution and I am sure I can't yet oversee the features which are necessary to make it work.

    Thank you all for these encouraging comments. It seems like most of the comments are critical of the process, but that those who actually tried it got some good benefit out of it. I find that enlightening.

    I didn't quote all of the "thank you's", but to all who said "thank you" -- you're welcome.

    I'm not sure I can post a verbatim copy of my actual code, since my current employer arguably has a claim on it. However, I think I can make a list of all the features it has and post that, and if you need help implementing any specific feature, I could probably provide some guidance. (Think of it as a homework assignment -- I can't do it for you, but I'll see if I can help you wherever you get stuck.)

    If you want to develop a process like this, I would recommend that you do what I did -- don't trust it, at least at first. Start by just creating the views that will tell you which indexes are not being used and which indexes the server recommends that you create. Use those for a while but create your indexes manually. If you make a mistake, you want to find out now, before you write any code that creates and drops indexes for you.

    When you get to the point where you are confident that the views are returning useful and reliable information, then write code that reads those views and outputs CREATE INDEX and DROP INDEX statements -- but don't execute them automatically; log them, read them, and execute them yourself. Do that for a while, and refine your code-writing code as needed.

    When you get tired of pasting CREATE INDEX and DROP INDEX statements into a query window to execute them, then add code to just go ahead and execute the statements that it wrote. Keep logging them so you can always see what it has done. Write it so that it creates one index and drops one index at a time. Execute that by hand for a while and monitor what it does.

    Finally, when you are confident that all of the code reliably does what you want, then create a job to do it on a schedule.

  • What a great article! I've been pointing a lot of other people to it now that I've found it, because it summarizes my own experience so well.

    Although I have to wonder if half the people who replied actually read your article, the points raised by SQLGuru are valid. Also, the distinction between constraints and indexes is a good one and may solve your personal issue.

    As for teaching developers to write indexes: a lot of developers can't even get their heads around the create table-statement. Asking them to create indexes willy-nilly is a recipe for disaster, especially combined with the valid objections raised by SQL Guru. I've seen DBA's go on a 3-day course about indexing. That sort of investment is out of the question for most developers, who'd rather spend those days on a development course.

    No, the DBA is the one with both the knowledge *and* the access to the database. He or she should do this, just as much as tuning the server performance in any other way is a DBA-job, not a job for a developer. The developer is about the right algorithm, the correct join and the checking of constraints. The DBA about the best performance given the limitations of the hardware. And while you can combine the functions in one person, in large operations this is just not feasible given the workload.

    Is it the job of the developer to know and remain aware of whether the production server has 4 or 8 cores? 1 or 16 GB of memory? SSD's or SAN? RAID configuration? If the answer is no, then the developer doesn't have the knowledge to index anything and you should leave it to the pro.

    Production databases for testing... sure, could be done. But it puts specialized work with an untrained person, and incurring a heavy workload on the DBA to just copy data to and fro, while also maintaining responsibility for making sure all sensitive data is scrubbed. It has the specialist doing generic filler work, and the generalist doing specialist work. It's putting the horse behind the cart, IMO.

    No, while I think automating index deployment is very risky, and the DBA should automate indexing with some care, the DBA is the right person to determine indexing strategy.

  • What utter flamebait.

    Congratulations.

    As to people who are saying it's the DBA's job to index - Developers can't possibly be expected to grasp the subtleties... hogwash.

  • Interesting article, thanks for taking the time to write it.

    Interesting replies too!

    All good food for thought.

    Question: For your particular use case/situation and with your available resources, have you found your solution to be positive and successful, and in any way measurable as such?

    Martin 🙂

Viewing 15 posts - 16 through 30 (of 52 total)

You must be logged in to reply to this topic. Login to reply