Seeing Double

  • Comments posted to this topic are about the item Seeing Double

  • Just googled this

    http://www.irishdev.com/blogs/merrion/archive/2007/02/20/2962.aspx

    The guy states that duplicate indexes are a drain on server cycles as each update, insert or delete has to be reflected in each index.

    Definately a disadvantage in speed as the transaction and data volumes increase

    --Shaun

    Hiding under a desk from SSIS Implemenation Work :crazy:

  • I can certainly see that having an index on ColA,ColB is completely different from an index on ColB, ColA.

    (It's like having a phonebook sorted by First Name, Last Name and another by Last Name, First Name.)

    However if you've got an index on ColA, ColB, ColC is there ever any point having an index on ColA, ColB? (Let's assume they're all ascending and both are non-clustered.)

    It IS a fascinating subject. Deceptively complex and yet deceptively simple.

  • what if the indices were on seperate storage devices from the main data, and the queries could be answered solely by the index itself (i.e. covering indexes), and in a high throughput system one set of queries was pointed to indexA and one set to indexB.

    would there be a performance boost?

    the principle of duplicates doesn't sit right with me, but the above seems to be a grey area.

  • I think you're right - It will defintely be quicker if queries are hitting alternate covering indexes on different physical drives but that would be a hefty compromise in my opinion.

    I'd rather get the performance boost by adding spindles rather than, effectively, hard-wiring a physical disk arrangement into the application code. I appreciate that, sometimes, high performance demands some hard decisions but I don't think this is the answer.

    Great lateral thinking though 🙂

    As an aside we have always included a check for duplicate indexes and foreign keys in our product as, when we were doing release 1.0 we were operating against some fairly large (five years ago and in the terabyte region) databases we discovered duplicate indexes that were sapping the insert and update performance considerably.

    Malcolm
    DB Ghost - Build, compare and synchronize from source control = Database Change Management for SQL Server
    www.dbghost.com

  • For literally the same index, COLA, COLB and COLA, COLB, no, I can't support that. I'd even say that's unsupportable if the first one was clustered and the second one wasn't. I've realized more performance benefits by getting rid of indexes that all shared the same leading edge than I have by adding indexes with the same leading edge. As was said previously, COLA, COLB and COLB, COLA could be very useful indexes to maintain, but then because the leading edge is different, they're just not the same index.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • 1) ColA,ColB and ColA are not the same, and should be allowed (although this is rare situation)

    a) This only makes sense when ColB impacts the size of the index significantly.

    2) ColB,ColA is not ColA,ColB are the same, and I don't think should be allowed. A better way to handle that would be ColB,ColA and ColA.

    3) I don't think hints should fail if they reference indexes that don't exist. A warning maybe.

  • Perhaps the support for duplicate indexes is a insidious plot by those who sell processors, memory, drives and SAN trays to help us grow our data and performance needs beyond the bounds of our existing infrastructure, requiring the purchase of more infrastructure, allowing the creation of more duplicate indexes...well, I think you can see where this is going.

    ------------
    Buy the ticket, take the ride. -- Hunter S. Thompson

  • Actually there are a few (very specialized) reasons for allowing duplicate indexes to be allowed.

    In one place I worked there was a lot of embedded SQL in the application that used index hints (originally built on SQL 6.5). Since a query would fail (at least in <= 2000) if you had an index hint in the query but the index was not present. I imagine a lot of those indexes had changed over the years as the data changed, and some of those indexes where now in the form of the same index, or a lot of

    Col a, Col B

    Col a, Col B, Col C, Col D

    type indexes.

    Now ideally we could have cleaned up the code and got rid of the index hints, which we did whenever we came accross them, but so that we would not break the systems it was safer to let the indexes stay.

    I personally don't see a reason why it should be illegal to have the same index, let the DBA decide what he wants in the system. Who knows, on some systems there might just be a very good reason why they are there. Of course I would have a very long talk with a deverlop or dba that proposed such an index.

  • I think you may be misinterpreting compatibility. I think it's BACKWARD compatibility. Meaning, they let you create duplicate indexes in previous versions. If they disallow them now, not only will upgrades potentially fail (because you're upgrading from a version that allowed multiple identical indexes to one that would error if it found them), but also apps might break (if someone has an ETL process that inadvertently or misguidedly creates duplicate indexes, their app would break after an upgrade).

    I'm not saying it's a good idea, but that's my interpretation of "compatibility."

  • First, let me say that I prefer avoiding duplicate indexes.

    However, I feel like it might come handy in a very rare situation. Your point of view and critism are always welcome as this is an essai.

    To better show this particular situation let's take index A with fill factor and pad index at 100% and index B with fill factor and pad index at 10% for example and are full defragmented (both) before the daily operations.

    You have a table that get inserted and then selected very often during your daily operations (millions and millions of times) and you can't reindex even using ONLINE.

    You will have an overhead while the insert are made to write both indexes. However in the long run, index A will get very fragmented and it's BTree unbalanced until next reindex. However for index B, it will take a lot longer to achieve the same degradation of Index A and the DB Engine when scanning indexes and statistics for selecting an optimal execution plan come at a point where it will switch from index A to index B for selecting.

    So in conclusion selecting time must be lower than times it took to update both indexes and manage splits and the time it would spent searching into a greater unbalanced BTree (Index A).

    What do you think about that?

  • In small systems... duplicate away... if you have no intention of scaling up and performance is not a potential issue to you. Now on a moral\ethical and straight up 'why would you do that', databases should be designed properly because as the dba\developer you are told one thing. Then when sales gets ahold of this new product that is designed for small customers they go and sell it to the largest customer in the folio. I have had to deal with this numerous times from customers using databases which were meant to temporarily hold banking transactions\images which decide they are a great archive solution and then complain about performance. Your design needs to be robust enough to go a bit above what the product spec sheet says is acceptable use just to aid you in future growth and troubleshooting. Sales should be slapped for doing such things but maybe there is a valid reason.

    Back closer to topic... Even if you have multiple disks with the indexes on each you will still end up with higher reads and with higher memory loads. Your best bet is to definitely get rid of the true dup index and even some of the tertiary indexes based on the data within.

    Example:

    ColA, ColC, ColB

    ColA, ColC

    ColA, ColB

    ColC

    ColA is a 10 digit account number

    ColB is the uique identifier -- in this case Integer -- used to cross to different tables.

    ColC is either 1,2,3, or 4 and there are 15 million items it is not very discrete.

    Based on how your application queries data it woudl be foolish to not get rid of ColA, ColC as it is a direct duplicate and ColC is minimal in length. From my experience i would also get rid of ColA,ColB as ColC is minimal in length and even though SQL Server will still utilize it, there is a performance gain on forcing it to ColA,ColC,ColB. As for the index on ColC alone... This has proven beneficial and issue on some of the systems i support. It truly depends on the system, hardware, scale, userbase, load etc. I have had instances to remove it and to leave it. Due to the fact that you almost always require cross referencing to another table in a join then ColC alone can be removed as you will almost always have ColB and ColC together in your queries atleast for the application.

    WHenever i do performance work it is treated as a special case. I utilize what i have learned previously but no 2 databases (even for the same app) at 2 sites are the same. They have different workloads (even if minimal) which can drastically effect your findings. There are good default indexes to create but once your system gets larger then some customization may be necessary. It is hard to explain to customers that SQL Server has it's own brain and sometimes it doesn't necessarily solve the maze correctly of getting your data back. It will work succesfully but there may be a better path.

  • Keith,

    I think you're right, but that's why we have compatibility mode. If it's < 10 or 11, then allow them, but in new versions don't.

    The idea of one fragmented and one not is interesting. However you'd have to be sure the optimizer had some notion of fragmentation and used the less fragmented one. I guess you could use index hints and constantly rename them at every reindex.

    I still don't really think this is a good idea and while it's easy to say "do what you want", most Big applications are developed as small ones. Developers use small data sets and allowing a team to build duplicated indexes because they don't bother to check if it's already in place seems to be a waste.

  • I was working on a query in which an existing index covered what was needed but wanted to do a group by. I wanted to create a duplicate index in which the group by field was first thinking that the compute scalar would have been faster since it could compute while scanning the index. I didn't see any performance boost from this but it could be because SQL Server is using temp tables while scanning and doesn't see the point of computing the group by until after it moves the index in it's own working tables. Am I correct on this type of assumption? Which is also probably why duplicates aren't necessary in any fashion.

  • I don't understand any reason for a clustered and an non clustered index as it would seem to defeat the purpose of the clustered index. The clustered is reserving pages for additional indexes and fast access by that key because the data is being retrieved in that fashion... i.e. transactions by date.

    Therefore no use for an unclustered index.

    I would defend the stand there is not a need for duplicate indexes due to the overhead on inserts, uopdates and deletes but then I wouldn't even clone myself.

    The only excuse I could think of is if for some reason all the indexes you have are getting bypassed and adding another one would fix that but I can't think of a realistic case where that would occur assuming the tables are designed approprately.

Viewing 15 posts - 1 through 15 (of 30 total)

You must be logged in to reply to this topic. Login to reply