Database Design - Need help in Choosing a primary key

  • Hi All,

    I am kind of confused. I need some help in choosing a primary key for the database which i am designing.

    I have few tables which contains 5 -15 fields out of it 3 - 9 columns combined to form the uniqueness of the row.

    All are un-related tables. Three parent tables connect with 20 child non-related child tables.

    I believe it would not be a wise choice to choose 3 to 9 fields for primary key.

    But if i use an auto increment as a key will there be of any use as it might never be used to fetch rows. Then why do i still have to go with that?

    Or Is it ok to create a primary key of upto 5 attributes?

    Can someone help me on this?

    Thanks.

  • It all depends on your business rules.

    If the rule is that you must identify each row uniquely within a table then you need the default primary key.

    How will you join 3 tables with 20 child tables?

    Alex S
  • Thanks,

    For example, user id is used as a linkage across multiple tables.

    I am concerned about defining a key for the child tables as it requires 3- 9 columns to make the row unique. Will that be optimal?

    Or defining a surrogate key will be of any use?

  • shafirmubarak (4/30/2014)


    Thanks,

    For example, user id is used as a linkage across multiple tables.

    I am concerned about defining a key for the child tables as it requires 3- 9 columns to make the row unique. Will that be optimal?

    Or defining a surrogate key will be of any use?

    If 3-9 columns make the row unique, even if you create a surrogate key, you'll still need a unique constraint to ensure the business model is accurately reflected in your data.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Yes, the unique enforcement would be absolute necessary.

    But i was not sure how surrogate key will be of use?

  • shafirmubarak (4/30/2014)


    Yes, the unique enforcement would be absolute necessary.

    But i was not sure how surrogate key will be of use?

    In some situations it will help performance to have a narrower index. But that's just a general observation. It's hard to be more specific with no details.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey (4/30/2014)


    shafirmubarak (4/30/2014)


    Yes, the unique enforcement would be absolute necessary.

    But i was not sure how surrogate key will be of use?

    In some situations it will help performance to have a narrower index. But that's just a general observation. It's hard to be more specific with no details.

    It can also make sql development a little easier. You would have only 1 parameter for identification of the row instead of 5. Also your where clause is only 1 predicate instead of 5. Now I wouldn't recommend designing your database around ease of coding but that is a difference.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • The number one rule of a primary key is that it must be "meaningless" yet unique. To be meaningless it must not be one of your four columns that are under user control. Your four columns can and should be an Alternate Key (unique constraint/index).

    I've gotten away from Identities as primary keys and I feel liberated. Identities are non-ANSI and prevent batch insertion. Personally, I prefer uniqueidentifiers for the PK.

    All non-clustered indexes automatically include the primary key so if you want skinny indexes use a single column for your primary key.

  • Bill Talada (4/30/2014)


    The number one rule of a primary key is that it must be "meaningless" yet unique. To be meaningless it must not be one of your four columns that are under user control. Your four columns can and should be an Alternate Key (unique constraint/index).

    I've gotten away from Identities as primary keys and I feel liberated. Identities are non-ANSI and prevent batch insertion. Personally, I prefer uniqueidentifiers for the PK.

    All non-clustered indexes automatically include the primary key so if you want skinny indexes use a single column for your primary key.

    Why a guid over an identity? They take up a lot more storage and are a complete PITA to debug. Non-ANSI honestly doesn't really matter because database agnostic is a farce.

    I am not sure how switching from one type of generated value to a different (and more painful) one makes you feel liberated.

    What do you mean that an identity prevents batch insertion? That is simply not true.

    Also, if you are using a super big column like a guid for a primary key you need to make sure your primary key is NOT clustered. You need to find something else to use as your clustered index.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • The number one rule of a primary key is that it must be "meaningless" yet unique.

    This is a matter of opinion. There is no reason a primary key MUST be a magic value. That is what surrogate keys do. By your definition you have completely ruled out any natural keys.

    It should however be a value that is constant. A good example is an automobile. They all have a VIN. It is constant, it is presented to the user AND it makes a perfect primary key for a table that holds information about cars.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Guids are my preference; I never said they were gospel truth. The difference between guids and identities is that identities get generated on insert whereas guids can be generated on clients so clients can generate child table foreign keys already linked.

    Identities can be batch inserted if you use the "SET IDENTITY_INSERT <table> ON" but then you've got to do the locking to prevent others inserting at the same time. If you don't care which identity value is assigned, you're OK but most apps insert child rows after parent rows so you've got to have an alternate key to fetch the rows back and match on to see which identity was assigned to what row, a real PITA.

    I generate helpful debugging views for each table so my developers never suffer from the PITA debugging you speak of for guids.

    I switched 450 tables from int identities to guids and my app ran noticeable faster on a 100 GB database with a 10% increase in size due to the datatype change. And we are set to batch insert rows now for an estimated 10x improvement in the near future from not network round-tripping identity inserts.

    Standardizing development to guid PKs can greatly simplify application development over using different natural keys like VINs for each table and I'm not sure a VIN is any shorter than a guid. You are correct that a PK should definitely be considered read-only and not updated.

  • Bill Talada (4/30/2014)


    Guids are my preference; I never said they were gospel truth. The difference between guids and identities is that identities get generated on insert whereas guids can be generated on clients so clients can generate child table foreign keys already linked.

    Identities can be batch inserted if you use the "SET IDENTITY_INSERT <table> ON" but then you've got to do the locking to prevent others inserting at the same time. If you don't care which identity value is assigned, you're OK but most apps insert child rows after parent rows so you've got to have an alternate key to fetch the rows back and match on to see which identity was assigned to what row, a real PITA.

    I generate helpful debugging views for each table so my developers never suffer from the PITA debugging you speak of for guids.

    I switched 450 tables from int identities to guids and my app ran noticeable faster on a 100 GB database with a 10% increase in size due to the datatype change. And we are set to batch insert rows now for an estimated 10x improvement in the near future from not network round-tripping identity inserts.

    Standardizing development to guid PKs can greatly simplify application development over using different natural keys like VINs for each table and I'm not sure a VIN is any shorter than a guid. You are correct that a PK should definitely be considered read-only and not updated.

    Sounds like you guys do lots of batch inserts. A guid is probably a much better choice in that situation. Just make sure that if you switched your PKs to guids that you either add another index to be the clustered index or make sure you regularly defrag your clustered index on a guid column. The fragmentation of a clustered index on a guid column is staggering. It can and will hit 99% + at as few as 10k rows. If you routinely defrag them then it shouldn't be an issue at all.

    As far as VINs my intention was not to suggest length of key but to suggest that is an excellent choice for a natural key over a surrogate key.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Thanks everyone for your valuable suggestion.

    I have considered all your suggestion. Will go with surrogate key as primary key and have an alternate key to enforce uniqueness and business logic.

  • Bill Talada (4/30/2014)


    The number one rule of a primary key is that it must be "meaningless" yet unique.

    That's your number one rule but definitely not the number one rule. In principle a primary key is any candidate key but by convention it is usually either the "preferred" identifier in the table or the key referenced by foreign keys in other tables.

    Some sensible, commonly applied criteria for choosing a key are: Simplicity, Stability and Familiarity. Lack of meaning is not a requirement.

  • sqlvogel (5/2/2014)


    Bill Talada (4/30/2014)


    The number one rule of a primary key is that it must be "meaningless" yet unique.

    That's your number one rule but definitely not the number one rule. In principle a primary key is any candidate key but by convention it is usually either the "preferred" identifier in the table or the key referenced by foreign keys in other tables.

    Some sensible, commonly applied criteria for choosing a key are: Simplicity, Stability and Familiarity. Lack of meaning is not a requirement.

    I don't disagree with your statement, but the one caveat that I would add is that you don't want the PK value to be changeable. It just adds way too much headache to the entire design. Meaningful or not, it should be immutable (hopefully).

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply