Lack of clustered index generally a good idea?

  • https://use-the-index-luke.com/blog/2014-01/unreasonable-defaults-primary-key-clustering-key
    This is my first exposure to this concept. What is you guys' take on this? It sounds like it MAY be a good practice to not use clustered indexes at all (unless you have a specific need to)? Or am I oversimplifying?


    "If I had been drinking out of that toilet, I might have been killed." -Ace Ventura

  • autoexcrement - Friday, November 30, 2018 12:56 PM

    https://use-the-index-luke.com/blog/2014-01/unreasonable-defaults-primary-key-clustering-key
    This is my first exposure to this concept. What is you guys' take on this? It sounds like it MAY be a good practice to not use clustered indexes at all (unless you have a specific need to)? Or am I oversimplifying?

    Having a clustered index is a usually a best practice.  Choosing the right clustered index can be a difficult choice.

  • My interpretation of the blog post is that it's so hard to choose the *right* clustered index that you're better off erring on the side of omitting one completely. Not to mention the separate issue of perhaps separating your primary key from your clustered index, which I'll admit is also a new concept for me. 🙁


    "If I had been drinking out of that toilet, I might have been killed." -Ace Ventura

  • it is just too darn difficult to choose a good clustering key. As a matter of fact, choosing a good—the “right”—clustering key is almost impossible if there are more than one or two indexes on the table. The result is that most people just stick to the default—which is the primary key. Unfortunately, this is almost always the worst possible choice.

    If someone is complaining about a practice of blindly adding a IDENTITY column to every table and making it the clustered primary key, then I would agree.  You might even argue that letting the primary key automatically default to clustered is a sloppy practice, and I would not be moved to comment.  But arguing that clustered indexes in general are bad is going too far.

    Yes, some thought should go in to choosing the best columns for a clustered index.  They should be narrow, unique, static, and ever increasing.  But I don't find this to be (in general) "too darn difficult" or "almost impossible".

    https://www.red-gate.com/simple-talk/sql/learn-sql-server/effective-clustered-indexes/%5B/url%5D 

    Some of the statements in your link are the kind of thing you hope impressionable youngsters don't hear until they know enough about SQL to recognize how wrong they are.  Not wrong in all cases, but these are very misleading generalizations.  It discusses the "clustered index penalty", but I don't see any mention of the alternative "heap penalties"; record forwarding, not reclaiming space from deleted rows, and other fragmentation issues.

  • Heap tables also have their problems as well.  Updates do not always go back where they started resulting in forward referencing points to get you to where the data actually resides.  With that, SQL Server isn't very good at reusing the space vacated by data moved in a heap table.

  • Great link, thanks! Learning some important basics here at a late date in my career...better than never I suppose.


    "If I had been drinking out of that toilet, I might have been killed." -Ace Ventura

  • If something is difficult, the best possible solution to that difficulty is to just not do it... said no one who accomplished things.

    Yes, choosing the right clustering keys is hard. It's not however impossible. Further, sometimes having a bad clustered index can hurt performance, but frequently, it's just meh, it's not hurting, but it's not helping. In general, when you consider how SQL Server works, I'd err on the side of having a clustered index on every table, even if it's not perfect, with some exceptions. This is as opposed to, all the tables are heaps, because picking the clustered index is hard, and a few exceptions where the clustered key was clear have clustered indexes. I'm positive, based on years of experience and testing, that the clustered approach is going to work better than the heap approach.

    I've read that persons blogs before. They have more than a few controversial approaches. I think it's primarily to gin up hits as opposed to actually teaching useful methodologies.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • One other thing to consider is the table in relation to other tables.

    If the table lives in isolation, then you can make the choice purely based on that table.

    When a table has relations with other tables, the 'design' of the clustering should be done for the group of tables.
    Sometimes even repeating fields (redundant and not normalised) to improve the performance on that specific relation.

    If you have a client, the client has orders and the order has lines, for performance it might be and advantage to cluster everything on :
    (client   (order   (lines ))). So that all three tables have the same ordering querying a combination of the tables has the same processing order in all three tables.
    When you get client/order/lines data for a client, each table has only to be accessed 'once', because all data for the client is 'bundled'. And when processing multiple clients all tables have the same order for the data, so that there is no 'jumping' around in the table.

    Having a clustered key does also help with maintenance updates, because there is 'an order' in the table, you can use that information when processing the complete table.

    Ben

  • By default, I add an identity column to a table and then a clustered index on top of that. I don't recall ever choosing to model a heap table. Having that unique sequential clustered index is about minimizing table and index fragmentation and maintenance, even if the key never gets used by the application directly. I also have a unique non-clustered index on the natural key. Whichever index gets designated primary depends on which gets used in foreign key relationships. In many cases that's the clustered identity column.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Scott Coleman - Friday, November 30, 2018 2:42 PM

    it is just too darn difficult to choose a good clustering key. As a matter of fact, choosing a good—the “rightâ€â€”clustering key is almost impossible if there are more than one or two indexes on the table. The result is that most people just stick to the default—which is the primary key. Unfortunately, this is almost always the worst possible choice.

    If someone is complaining about a practice of blindly adding a IDENTITY column to every table and making it the clustered primary key, then I would agree.  You might even argue that letting the primary key automatically default to clustered is a sloppy practice, and I would not be moved to comment.  But arguing that clustered indexes in general are bad is going too far.

    Yes, some thought should go in to choosing the best columns for a clustered index.  They should be narrow, unique, static, and ever increasing.  But I don't find this to be (in general) "too darn difficult" or "almost impossible".

    https://www.red-gate.com/simple-talk/sql/learn-sql-server/effective-clustered-indexes/%5B/url%5D 

    Some of the statements in your link are the kind of thing you hope impressionable youngsters don't hear until they know enough about SQL to recognize how wrong they are.  Not wrong in all cases, but these are very misleading generalizations.  It discusses the "clustered index penalty", but I don't see any mention of the alternative "heap penalties"; record forwarding, not reclaiming space from deleted rows, and other fragmentation issues.

    By the same token, someone making a blind recommendation to not use a clustered index just because they don't know what to use for a clustered index is just as bad a recommendation, especially since the RID created for heaps is frequently wider than most clustered indexes.  If you can't use it for performance, use it to help control the insert patterns and to keep the NCI's from being unnecessarily wide because they too must eventually reside in memory in order to be used.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply