primary key discussion

  • Recently I got into a discussion w/ a coworker over the primary key of a fact table. He wanted to put the primary key on the identity and I suggested putting the primary key on the unique columns which in this instance there are two and they are ints.

    Here’s his logic.

    Cluster index the surrogate key(PK) and then add non-clustered indices to the table. A smaller primary key leads to smaller non-clustered indices and faster performance.

    To me this doesn’t seem right. Why would u cluster an id unless u were going to use it in your queries?

  • i've never heard of anyone using the identity as the primary key

  • Snargables (8/22/2014)


    i've never heard of anyone using the identity as the primary key

    Really? What do you use for the PK on lookup tables? I am not by any means saying it should be the only primary key on any table but this is incredibly common especially for things like lookup tables.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Snargables (8/22/2014)


    Recently I got into a discussion w/ a coworker over the primary key of a fact table. He wanted to put the primary key on the identity and I suggested putting the primary key on the unique columns which in this instance there are two and they are ints.

    Here’s his logic.

    Cluster index the surrogate key(PK) and then add non-clustered indices to the table. A smaller primary key leads to smaller non-clustered indices and faster performance.

    To me this doesn’t seem right. Why would u cluster an id unless u were going to use it in your queries?

    This debate has been, and will continue, raging for decades. Using natural keys vs surrogate keys.

    There are very solid arguments to both sides of this. Using a surrogate key like an identity makes it somewhat easier for RI. You only have to include a single column in foreign tables and only 1 column on join predicates. Of course it also now an arbitrary value as opposed to the natural key. If you are going to use the surrogate key as your join and where predicates there is likely some validity to making this your clustered index. If however you are joining on the natural key this would not make sense.

    At the end of the day there really is not a right or wrong answer that applies to all situations. They should however be consistent across your entire system. Some people have a much stronger opinion about this topic than I do. I have worked on both of systems and they both have their pros and cons.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Snargables (8/22/2014)


    Recently I got into a discussion w/ a coworker over the primary key of a fact table. He wanted to put the primary key on the identity and I suggested putting the primary key on the unique columns which in this instance there are two and they are ints.

    Here’s his logic.

    Cluster index the surrogate key(PK) and then add non-clustered indices to the table. A smaller primary key leads to smaller non-clustered indices and faster performance.

    To me this doesn’t seem right. Why would u cluster an id unless u were going to use it in your queries?

    Expanding on all the good points Sean has made, consider this: what if you needed six or even ten columns of your table in order to uniquely identify a row? And to make it even more awkward, one or more of those columns are nullable 😛

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • I don't disagree with the other points, but I'm with you, if you're not using that index, it's a waste. You only get one clustered index. Rather than just arbitrarily pick a column because it's smaller and will help the number of rows on a page in the nonclustered indexes, that's not a good reason to choose that index. Your clustered index is generally, but not always, the best path to the data. It should be used as such. If that means a compound key, fine. And yeah, even three, four, six column compound keys are acceptable, assuming they're not nullable and are not of crazy data types (varchar(250), just don't).

    There are lots of reasons to disagree, but at the end of the day you want to make the choices you make on your system based on sound principles and knowledge, not just some blind following of ill-understood best practices.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Even if you use a natural key as the primary key, I would still consider adding a synthetic id to the tables and making it a non nullable unique index on the table. I can't tell you how many times a primary key value (single or multiple column) suddenly was no longer a good candidate as a primary key for a table due to changing requirements. Having an alternate key readily available has helped me on numerous occasions.

  • Lynn Pettis (8/22/2014)


    Even if you use a natural key as the primary key, I would still consider adding a synthetic id to the tables and making it a non nullable unique index on the table. I can't tell you how many times a primary key value (single or multiple column) suddenly was no longer a good candidate as a primary key for a table due to changing requirements. Having an alternate key readily available has helped me on numerous occasions.

    Further on this, it guaranties an ever increasing order of values, hence preventing fragmentation/page splitting on inserts.

    😎

  • Eirikur Eiriksson (8/22/2014)


    Lynn Pettis (8/22/2014)


    Even if you use a natural key as the primary key, I would still consider adding a synthetic id to the tables and making it a non nullable unique index on the table. I can't tell you how many times a primary key value (single or multiple column) suddenly was no longer a good candidate as a primary key for a table due to changing requirements. Having an alternate key readily available has helped me on numerous occasions.

    Further on this, it guaranties an ever increasing order of values, hence preventing fragmentation/page splitting on inserts.

    😎

    If you make the SID the clustered index. Not necessarily the best choice for the clustered index on a table. You really need to analyze the various data access paths before choosing the best index for your clustered index.

  • Lynn Pettis (8/22/2014)


    Eirikur Eiriksson (8/22/2014)


    Lynn Pettis (8/22/2014)


    Even if you use a natural key as the primary key, I would still consider adding a synthetic id to the tables and making it a non nullable unique index on the table. I can't tell you how many times a primary key value (single or multiple column) suddenly was no longer a good candidate as a primary key for a table due to changing requirements. Having an alternate key readily available has helped me on numerous occasions.

    Further on this, it guaranties an ever increasing order of values, hence preventing fragmentation/page splitting on inserts.

    😎

    If you make the SID the clustered index. Not necessarily the best choice for the clustered index on a table. You really need to analyze the various data access paths before choosing the best index for your clustered index.

    Quite true. One of the stronger cases for using SID as a clustered index is when either an ever increasing order of values cannot be guaranteed or achieved.

    😎

  • CELKO (8/22/2014)


    What do you use for the PK on lookup tables?

    I use the encoding that is being used. The IDENTITY property (not a column!) is the count of insertion attempts that was made to that disk on that one machine.

    Example: look up table for two-letter state codes using the two letter code as PK. What would IDENTITY mean? How would it improve anything?

    And sometimes it makes a valid internal identifier for internally derived lookup tables where there aren't any nice ISO standard values.

  • CELKO (8/22/2014)


    What do you use for the PK on lookup tables?

    I use the encoding that is being used. The IDENTITY property (not a column!) is the count of insertion attempts that was made to that disk on that one machine.

    Example: look up table for two-letter state codes using the two letter code as PK. What would IDENTITY mean? How would it improve anything?

    Of course for states I would use the code.

    How about something like Coupon types? What would you use there? I am certainly not going to use the description of the type (PercentageDiscount, FreeShipping, BOGO, etc).

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • CELKO (8/22/2014)


    What do you use for the PK on lookup tables?

    I use the encoding that is being used. The IDENTITY property (not a column!) is the count of insertion attempts that was made to that disk on that one machine.

    Example: look up table for two-letter state codes using the two letter code as PK. What would IDENTITY mean? How would it improve anything?

    If you go international and start getting repeated codes, unless you change your table and PK definition because the previous design would become a failure.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • CELKO (8/22/2014)


    What do you use for the PK on lookup tables?

    I use the encoding that is being used. The IDENTITY property (not a column!) is the count of insertion attempts that was made to that disk on that one machine.

    Colums with the IDENTITY property can be per table, and you can have the same value for columns with the identity property in separate tables, so at best it would be the count of insertion attempts that were made to that TABLE.

  • patrickmcginnis59 10839 (8/29/2014)


    CELKO (8/22/2014)


    What do you use for the PK on lookup tables?

    I use the encoding that is being used. The IDENTITY property (not a column!) is the count of insertion attempts that was made to that disk on that one machine.

    Colums with the IDENTITY property can be per table, and you can have the same value for columns with the identity property in separate tables, so at best it would be the count of insertion attempts that were made to that TABLE.

    If you want to be really technical, it's not even that. Some identity values can be discarded without a corresponding INSERT attempt, so it's not a true "count" of those. It's just an arbitrary identifier ... which is ok, because that's all that's needed!

    Now, it's far too often used as a clustering key when it should not be, but it can still be useful as a pk or other unique identifier of a table row.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

Viewing 15 posts - 1 through 15 (of 25 total)

You must be logged in to reply to this topic. Login to reply