Difference between clustered and non clustered index..

  • Hi,

    Someone help me in knowing the difference between clustered and non clustered index.. Based on performance Which is better to use???

    Thanks in advance.

  • A very crude answer to your question would be

    Clustered index work better on tables which have a column that can be used to uniquely identify each row.

    Non Clustered indexes are created on additional column that help query performance on column such as taht used on joins. Mainly if your normalizing you should have a column with a primary key which would be clustered index and the other columns which would be required could be non clustered indexes

    Jayanth Kurup[/url]

  • See http://www.sqlservercentral.com/articles/Indexing/68563/ and http://www.sqlservercentral.com/articles/Indexing/68636/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Jayanth_Kurup (7/7/2011)


    A very crude answer to your question would be

    Clustered index work better on tables which have a column that can be used to uniquely identify each row.

    Non Clustered indexes are created on additional column that help query performance on column such as taht used on joins. Mainly if your normalizing you should have a column with a primary key which would be clustered index and the other columns which would be required could be non clustered indexes

    I realize you're posting a "crude" answer but someone may take that as "gospel" and I want to make sure that people understand that it isn't. I've found that there are enough "It Depends" variations of data that there really isn't any type of rule-of-thumb that you can really go by when determining the design of the clustered index except that you really need to be careful what you select for the clustered index columns if the table suffers a lot of inserts and that the clustered index should be as narrow as possible because, as you said, it becomes a part of every non-clustered index automatically except when CI columns are specficially included.

    First, I strongly recommend that anyone reading this take the time to visit and read the articles at the two links presented by Gail and Paul as well as looking up "Clustered Index" in BOL.

    Second, clustered indexes don't always "work better on table which have a column that can be used to uniquely identify each row" if such a value is a UNIQUEIDENTIFIER datatype. Even with the new SEQUENTIAL NEWID(), there will be hell to pay in the form of "page splits" if you put a clustered index on such a column.

    Also, a properly defined nonclustered "covering" index will frequently blow the doors off of a clustered index for performance. Last but not least, clustered indexes work very well for "ranges of data" such as when dates are involved. For that reason, many folks will have a non-clustered index on the PK and a clustered index on some related DATETIME column. Of course, if rows are regularly inserted into the table out of that DATETIME order, you'll end up with massive page fragmentation as surely as if you'd used a GUID.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks Jeff , couldn't have explained it better myself. However I have one deviation from whats mentioned in the above post , I prefer not to use a index on the date time column because of the size it uses , usually i have an internal column such as int (i.e identity column). A key point I failed to mention in my post was the "crude" answer applied to performance on reading data off tables and not the impact of inserts update and page splits.

    Jayanth Kurup[/url]

  • Jayanth_Kurup (7/7/2011)


    However I have one deviation from whats mentioned in the above post , I prefer not to use a index on the date time column because of the size it uses , usually i have an internal column such as int (i.e identity column).

    8 bytes instead of 4? Not a major difference and if many queries are date-based the gain can be massive.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Don't really have many date based queries. I use ids since they have little scope for confusion such as date formats and precisions etc. However I would never use a unique identifier as the primary key for a column.

    Jayanth Kurup[/url]

  • Jayanth_Kurup (7/7/2011)


    Don't really have many date based queries.

    I've worked with a lot of them. When all queries against a table filter on the date (inserted date, transaction date or similar), the date column is a good candidate for the cluster.

    However I would never use a unique identifier as the primary key for a column.

    Nothing wrong with a uniqueidentifier as a primary key, as long as the clustered index is elsewhere.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (7/7/2011)


    I've worked with a lot of them. When all queries against a table filter on the date (inserted date, transaction date or similar), the date column is a good candidate for the cluster.

    The DATETIME column is the only (or leading) key in the clustered index? It is more usual to see a compound cluster with the DATETIME column as the last key. Of course, a compound key means we are no longer talking about 8 bytes (plus 4 bytes for the uniqueifier in rows that require it). Not being able to declare the cluster as unique can have some side-effects too.

    There are some important optimizations in the Query Processor and Storage Engine which work best (or at all) with a single INTEGER column. As always, the factors that influence design and indexing are many and complex, so I mention all this to emphasise that the usual mantra of "it depends" applies here.

    Nothing wrong with a uniqueidentifier as a primary key, as long as the clustered index is elsewhere.

    Indeed, there are also times where a clustered UUID key does makes sense; not just when using NEWSEQUENTIALID - there are other considerations as highlighted in the recent Latch Waits White Paper.

  • Would the above recommendations hold even for partitioned tables ? Keeping in mind the unique role of the column that stores the partition key

    Jayanth Kurup[/url]

  • SQLkiwi (7/7/2011)


    GilaMonster (7/7/2011)


    I've worked with a lot of them. When all queries against a table filter on the date (inserted date, transaction date or similar), the date column is a good candidate for the cluster.

    The DATETIME column is the only (or leading) key in the clustered index? It is more usual to see a compound cluster with the DATETIME column as the last key. Of course, a compound key means we are no longer talking about 8 bytes (plus 4 bytes for the uniqueifier in rows that require it). Not being able to declare the cluster as unique can have some side-effects too.

    Leading column.

    When the vast majority of queries on the table query for a date range (like the last 2 days, last day, last hour), the datetime is ascending and query most of the columns, that datetime is a natural choice. Ascending, non-changing, unique if there's an identity as the second column.

    No, it's not an identity, but if putting the cluster on an identity means either creating a nonclustered index with almost every other column or having most queries tablescan, the minor downsides to it being slightly larger than a single-column integer cluster are just that. Minor.

    It's a bit of a special case, but it's one I've been seeing a lot lately working with transactional data for various clients. For other cases, yes, cluster on an identity is my preference, but not my rule.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Understood, and agreed 🙂

Viewing 13 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply