Why use a Primary Key constraint (or Foreign for that matter)?

  • I'm trying to understand the nuances in tables and indexes beyond what Management Studio does by default.

    My question is what are the advantages of using a primary key constraint as opposed to just having a unique index on a key (that is a primary key)? I've done searching online and read some parts of books. Everyone seems to agree they should be used, but there is no clear reasoning of what advantage the Primary Key constraint provides over a simple Unique constraint.

    Example:

    create table dbo.Car (

    CarID int not null identity(1,1),

    VIN varchar(25) not null primary key nonclustered

    );

    go

    create unique clustered index CX_Car_CarID on dbo.Car

    (CarID)

    with (fillfactor=90);

    go

    What are the implications of not declaring VIN as a "Primary key", but just a unique key, such as:

    create table dbo.Car (

    CarID int not null identity(1,1),

    VIN varchar(25) not null unique

    );

    go

    create unique clustered index CX_Car_CarID on dbo.Car

    (CarID)

    with (fillfactor=90);

    go

    One reason I'm wondering is that I would like to include some columns in my primary key, but this isn't allowed if the index is for an actually declared "Primary Key". It seems pointless to build a "Primary Key" index and then another, on the same key, with included columns, unless the declaration of "Primary Key" actually provides some performance advantages.

    Incidentally, I'm wondering about this for Foreign Keys as well. Beyond enforcing referential integrity and documentation, are there any advantages to actually declaring these constraints in the schema? It would seem that the enforcement of referential integrity would actually be a bit of a performance hit, no?

    Any thoughts?

    Thanks!

  • By default a primary key will have a clustered index created, assuming there is not already a clustered index on the table. In your first query you create the table and VIN will have a clustered index, then you add the clustered index on CarID and VIN is now a nonclustered index. For your example table I would quite possibly drop CarID and use VIN as the primary key.

    The whole point of foreign keys is for RI. Yes there is a minor performance hit but it is generally not noticeable, and the integrity of the data is worth the hit.

    You might want to take a long look at the stairways articles on indexes. I think when you finish reading those you will have a much deeper understanding. You can find the Stairways over on the left side (4th or 5th from the top).

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • dave-L (1/31/2013)


    Incidentally, I'm wondering about this for Foreign Keys as well. Beyond enforcing referential integrity and documentation, are there any advantages to actually declaring these constraints in the schema? It would seem that the enforcement of referential integrity would actually be a bit of a performance hit, no?

    No. In fact, in many cases there can be a performance improvement. The existence of a trusted foreign key gives the optimiser additional information that can well lead to better performance than when there isn't a foreign key.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks for the info on the foreign keys.

    Aside from the automatic clustered index, any thing special about an actual Primary Key Constraint versus a Unique Constraint?

  • It's not an automatic clustered index. It's by default clustered, that's all. Trivial to have a nonclustered primary key or a clustered unique constraint

    PK disallows nulls.

    This is more a DB design concept than implementation. When you're doing a logical design, you identify the candidate keys (column or sets of columns that are unique). You then chose one as the primary key and the others become alternate keys. When implemented, the primary key is set as the primary key and the alternate keys should get unique constraints or unique indexes.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • In logical terms there is absolutely no difference. A key (i.e. a candidate key) is a set of attributes which is required to be minimally unique and does not permit nulls. A table may have multiple keys. By convention one key per table is designated as "primary" - usually a key that has some special significance for the designer or user of the database, e.g. a key used as a foreign key in other tables. That's only a convention however and is or essentially should be just a label of convenience.

    In implementation terms there is mostly no difference either. A primary key may be clustered or non-clustered and most of the features supported by SQL Server can apply equally well to any unique constraint or unique index. Unfortunately the SQL syntax designated PRIMARY KEY is actually required in a few places or makes a difference to how some features operate. Replication is one example. Also the PRIMARY KEY syntax is relied upon by some database design and development tools to drive certain features. These limitations of SQL Server and other software are unfortunate because they actually work against the original concept of a primary key - no longer can you necessarily choose the "preferred" or "most significant" identifier as a primary key because you also have to consider the possibly unintended consequences for other features or software.

    In practice therefore there is no easier answer to your question. For the most part it makes no difference at all and that's the way it should be. Any actual difference depends on how the key will be used. It also depends on what your motivation is for defining such a key in the first place - what difference does it make to you? If you can't answer that question then maybe you have no justification for defining it as "primary" in the first place.

    Here's a fun example - hypothetical but a perfectly realistic example of multiple keys. Imagine a table of marriages (for simplicity I'm only considering monogamous, male-female marriages). There are two columns, Husband and Wife. Both columns are keys because we don't want to allow polygamy. There are also referential integrity constraints to ensure that only women can appear in the Wife column and only men in the Husband column. Now, which is the "primary" key - Husband or Wife? Does it make any difference?

  • I think SqlVogel has primarily nailed the technical part of the answer. I'd add one more reason for using a primary key over just using unique constraints. Clarity. By defining the PK, everyone going forward knows what it is. When they create a new table and have to establish a relationship, they'll know what to use. When people have to write queries, they'll know what you intended to be the primary key. Just simple clarity.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • One other point is that if you use a unique constraint to function like a primary you MUST have the column defined as NOT NULL.

    create table dbo.Car (

    CarID int not null identity(1,1),

    VIN varchar(25) unique

    );

    go

    create unique clustered index CX_Car_CarID on dbo.Car

    (CarID)

    with (fillfactor=90);

    go

    insert Car

    select null

    select * from car

    Notice I removed the NOT NULL constraint on VIN. You can't make that mistake if you define the column as the primary key. If you don't specify your column will allow NULL.

    create table dbo.Car (

    CarID int not null identity(1,1),

    VIN varchar(25) unique

    );

    go

    create unique clustered index CX_Car_CarID on dbo.Car

    (CarID)

    with (fillfactor=90);

    go

    insert Car

    select null

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply