The Clustered Index is not the Primary Key

  • Steve Jones - SSC Editor (5/24/2016)


    vliet (5/24/2016)


    Well, heaps do have a slight advantage in certain cases. Because SQL Server ...

    I really dislike opening statements like this. You are almost implying that someone reading this should really consider heaps every time, or at least, often.

    Every table should have a clustered index. Start with that. If you find you have a valid, rational, backed up by data reason, then you can use a heap. However, don't consider heaps from the start.

    I am looking at one of my production databases and from the list of 5 tables 4 have all PK and indexes as nonclustered, 1 has a clustered PK. Sorry, I am still a newbie with SQL-Server. I seems to me that Clustered is the default (in the generated SQL there is no option, but in SQL-Studio it shows Clustered).

    From the "shrink or not to shrink a database" discussion I was following there were suggestions that the best way to for example reorganize a heap is to create a clustered index in for example in another database file and then drop it.

    Sorry, I didn't go through all the responses to my question, I will have to indulge myself into what is the purpose of the Clustered index and why someone decided to use non-clustered. I had an impression that when you need a PK it usually should associated with a clustered index.

    I posted an attachment screenshot.

    Thanks to everyone who responded.

    Just learning some basics here.

    Ryszard

  • paul.bowman (5/24/2016)


    Steve Jones - SSC Editor (5/24/2016)


    paul.bowman (5/24/2016)


    Also sometimes what appears to be the natural primary key, from the point of view of the data model and retrieval use cases, turns out to be the wrong primary key from the point of view of the storage model and how the data is generated.

    It seems you are confusing PK and clustered key, which was my point in the piece. The storage, usage/retrieval, have nothing to do with the PK. The PK is about identifying a field(s) that generate and enforce uniqueness.

    The storage, the impact on insert/retrieval, these are cluster/noncluster issues.

    The point is a natural primary key that is a combined key actually doesn't have an order - so, in the abstract, the ordering is a clustering (or indexing) issue, and therefore separate.

    However, given you can't actually instantiate a primary key without a unique index (clustered or non-clustered) it is still possible to instantiate the primary key in the "wrong" order.

    Admittedly if the primary key is non-clustered, you only have to rebuild a non-clustered index 2-3 times a week, instead of the whole cluster.

    My bad for confusing the distinction being made in the OP.

    The problem I usually see is that people think of the Primary Key as an index. If you read Books Online, the Primary Key is a constraint that SQL Server supports using an index. If you don't specify it as a nonclustered index and a clustered index does not yet exist on a table then SQL Server creates this index as a clustered index.

  • Lynn Pettis (5/24/2016)


    paul.bowman (5/24/2016)


    Steve Jones - SSC Editor (5/24/2016)


    paul.bowman (5/24/2016)


    Also sometimes what appears to be the natural primary key, from the point of view of the data model and retrieval use cases, turns out to be the wrong primary key from the point of view of the storage model and how the data is generated.

    It seems you are confusing PK and clustered key, which was my point in the piece. The storage, usage/retrieval, have nothing to do with the PK. The PK is about identifying a field(s) that generate and enforce uniqueness.

    The storage, the impact on insert/retrieval, these are cluster/noncluster issues.

    The point is a natural primary key that is a combined key actually doesn't have an order - so, in the abstract, the ordering is a clustering (or indexing) issue, and therefore separate.

    However, given you can't actually instantiate a primary key without a unique index (clustered or non-clustered) it is still possible to instantiate the primary key in the "wrong" order.

    Admittedly if the primary key is non-clustered, you only have to rebuild a non-clustered index 2-3 times a week, instead of the whole cluster.

    My bad for confusing the distinction being made in the OP.

    The problem I usually see is that people think of the Primary Key as an index. If you read Books Online, the Primary Key is a constraint that SQL Server supports using an index. If you don't specify it as a nonclustered index and a clustered index does not yet exist on a table then SQL Server creates this index as a clustered index.

    Aha, now I am with you. Thank you.

  • The clustered index determines the physical arrangement of rows on disk. Columns of the clustered index are always part of a non-clustered index even if you don't specify them. In queries that have columns that aren't covered by the columns in a non-clustered index (edit: see stephen's example above), you'll see Key Lookups in the plan. Not sure if that's what Kimberley was referring to, but probably not far off.

    Really good discussion of it (with examples) here: https://www.brentozar.com/archive/2015/08/clustered-index-key-columns-in-nonclustered-indexes/

    The notion that clustered indexes are directly connected to physical arrangement in storage is a persistent 'myth', from what I have been told. Here is a link that discusses it in depth:

    http://blog.waynesheffield.com/wayne/archive/2012/10/does-a-clustered-index-really-physically-store-the-rows-in-key-order/

  • "Tables should have primary keys." I wish others believed/thought that way, because I see a lot of tables with no keys of any kind, let alone have a primary key.

    Kindest Regards, Rod Connect with me on LinkedIn.

  • Jacob Wilkins (5/24/2016)


    stephen.lawrenson (5/24/2016)


    Hi Ryszard,

    ...

    A non clustered index, at the leaf level, contains the clustering key (hidden column/s at the end of the defined index).

    ...

    Additionally, the clustering key is included at the non-leaf levels if the nonclustered index is not defined as UNIQUE.

    See the discussion at http://www.sqlservercentral.com/Forums/Topic1750713-2799-1.aspx and Kimberly Tripp's explanation at http://www.sqlskills.com/blogs/kimberly/nonclustered-indexes-lookup-key-btree/

    Cheers!

    Hi Jacob,

    Thanks for the valid expansion on my answer, TBH I didn't want to go too deep for the OP - but if he has got his head around everything else posted this concept will be a doddle for him 🙂

    Cheers

    Steve

  • Hello,

    I've studied some of the MS documentation and I may have some additional questions. My main aim is to understand how to reorganize one of my databases. In order to shrink it correctly I need to understand the concept of heap, primary key, index, etc.

    From this documentation:

    https://msdn.microsoft.com/en-us/library/ms190457.aspx

    there seems to be a clear view what a cluster is.

    Clustered indexes sort and store the data rows in the table or view based on their key values. These are the columns included in the index definition. There can be only one clustered index per table, because the data rows themselves can be sorted in only one order.

    The only time the data rows in a table are stored in sorted order is when the table contains a clustered index. When a table has a clustered index, the table is called a clustered table. If a table has no clustered index, its data rows are stored in an unordered structure called a heap.

    However when I look at some of the syntax people use to create a table with a primary key there are things I'd like to be sure of. For example from one of the examples quoted previosly in this thread:

    https://www.brentozar.com/archive/2015/08/clustered-index-key-columns-in-nonclustered-indexes/

    I see this syntax:

    ALTER TABLE [ClusterKeyColumnsTest] ADD CONSTRAINT [PK_ClusterKeyColumnsTest]

    PRIMARY KEY

    CLUSTERED ([ID]) WITH (FILLFACTOR = 100)

    Would this have been also correct - without the CLUSTERED key word?

    ALTER TABLE [ClusterKeyColumnsTest] ADD CONSTRAINT [PK_ClusterKeyColumnsTest]

    PRIMARY KEY ([ID])

    There are some examples here:

    https://msdn.microsoft.com/en-us/library/ms174979%28v=sql.105%29.aspx

    The following example shows the column definition for a PRIMARY KEY constraint with a clustered index on the BusinessEntityID column of the Employee table (allowing the system to supply the constraint name) in the AdventureWorks2008R2 sample database.

    BusinessEntityID int

    PRIMARY KEY CLUSTERED

    Based on what MS says somewhere else:

    When you create a PRIMARY KEY constraint, a unique clustered index on the column or columns is automatically created if a clustered index on the table does not already exist and you do not specify a unique nonclustered index.

    That would imply we do not need the CLUSTERED clause when we specify the PRIMARY KEY clause, correct?

    This is yet another example from MS SQL documentation:

    CREATE TABLE dbo.Globally_Unique_Data

    (guid uniqueidentifier CONSTRAINT Guid_Default DEFAULT NEWSEQUENTIALID() ROWGUIDCOL,

    Employee_Name varchar(60)

    CONSTRAINT Guid_PK PRIMARY KEY (guid) );

    This will create a Clustered index (implicit CLUSTERED), correct?

  • That would imply we do not need the CLUSTERED clause when we specify the PRIMARY KEY clause, correct?

    That's correct but I like to use it anyway. It prevents me from accidentally creating a primary key based on a non-clustered index because I forget that a clustered index already existed. I tend to assume that I'll make mistakes and then hope to be pleasantly surprised.:-)

  • FunkyDexter (5/25/2016)


    That would imply we do not need the CLUSTERED clause when we specify the PRIMARY KEY clause, correct?

    That's correct but I like to use it anyway. It prevents me from accidentally creating a primary key based on a non-clustered index because I forget that a clustered index already existed. I tend to assume that I'll make mistakes and then hope to be pleasantly surprised.:-)

    Understand, there is nothing wrong with the primary key constraint being supported by a nonclustered index. You need to understand that you need to separate the concept of the primary key and the clustered index. Depending on the data stored and how it is commonly accessed may determine that the clustered index should not be the primary key.

  • You need to understand that you need to separate the concept of the primary key and the clustered index.

    Yep, that's fully understood. My point was that the inclusion of the keyword makes it explicit that that you want a clustered index to support the primary key. If you want a non-clustered index then include the NonClustered keyword.

    I prefer not to rely on the default because you can accidentally end up with the wrong one when, for example, a clustered index already exists on the table.

  • FunkyDexter (5/25/2016)


    You need to understand that you need to separate the concept of the primary key and the clustered index.

    Yep, that's fully understood. My point was that the inclusion of the keyword makes it explicit that that you want a clustered index to support the primary key. If you want a non-clustered index then include the NonClustered keyword.

    I prefer not to rely on the default because you can accidentally end up with the wrong one when, for example, a clustered index already exists on the table.

    Please explain, why is it wrong for a primary key constraint to be supported by a nonclustered index when a table already has a clustered index declared? Again, unless you actually intend on the primary key constraint to be supported by a clustered index, you should ensure that one doesn't already exist prior to creating it.

    I do understand using the optional key words nonclustered or clustered adds clarity to action being completed.

    Personally, I think the default should be a nonclustered index, not a clustered index.

  • why is it wrong for a primary key constraint to be supported by a nonclustered index

    Nothing at all. I don't think I said there was. If I've given that impression it was unintentional.

    I do understand using the optional key words nonclustered or clustered adds clarity to action being completed

    Yep, that was the beginning and end of my point. I like to be explicit and clear (though evidently I sometimes fail in this regard:doze:)

  • Steve Jones - SSC Editor (5/24/2016)


    vliet (5/24/2016)


    Well, heaps do have a slight advantage in certain cases. Because SQL Server ...

    I really dislike opening statements like this. You are almost implying that someone reading this should really consider heaps every time, or at least, often.

    Every table should have a clustered index. Start with that. If you find you have a valid, rational, backed up by data reason, then you can use a heap. However, don't consider heaps from the start.

    I guess the exception to that rule is if the code you write will always do a table scan because the whole table is always everything you want. This is particularly true with import staging tables. When I wrote part 1 of "Hierarchies on Steroids", I actually documented the fact that you didn't want to build a clustered index until after the Nested Sets hierarchy had been built and was ready for use because it would actually slow the build down... by quite a bit, IIRC.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • However, I like Kimberly Tripp's advice. Choose the clustering key separately from the PK, keep it unique, fixed, and narrow..

    I'm not sure I fully understand the "choose the clustering key separately from the PK" part. I thought it was perfectly ok to use the PK as the clustered key as long as it fulfills the other criterias: keep it unique, narrow, static, ever-increasing and fixed width?

  • Sorry for making this to drag on like this.

    Attached is a real example from one of my databases (not my design BTW). It shows a table with 2 first columns. The PK's are KEY_ID and FIXEL_ID, the latter being a VARCHAR.

    I do not see a CLUSTERED index here, a non-clustered one is, this is the extract into SQL:

    CREATE TABLE [dbo].[FIXEL](

    [Key_id] [int] NOT NULL,

    [fixel_id] [varchar](15) NOT NULL,

    [name] [varchar](32) NULL,

    /* ---- etc, etc, */

    CONSTRAINT [PK_FIXEL] PRIMARY KEY NONCLUSTERED

    (

    [Key_id] ASC,

    [fixel_id] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,

    IGNORE_DUP_KEY = OFF,

    ALLOW_ROW_LOCKS = ON,

    ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    Whether a non-clustered index was explicitely requested I don't know. From the screenshot you may see a sample of the data. My question would be, why in this case a non-clustered index was chosen? Is it because a CLUSTERD (default) would not be created due to the types of the PK columns?

    I can provide some more rows to show what's the table made up of.

    Regards,

    Ryszard

Viewing 15 posts - 16 through 30 (of 36 total)

You must be logged in to reply to this topic. Login to reply