Setting up the Primary Key

  • I have been talking to different programmers about the primary field of my tables. I am being pointed in a 1000 different directions, if thats possible. Anyway I use a lot of spreadsheets at work and I want to develop an application to store my data in a database and be able to manipulate and generate reports from that data. Should I use an integer identity seed or a GUID? I'm trying to get this right the first time, is thats possible.

  • This may be worth a read. Bear in mind that by default, the primary key is the clustered index.

    http://www.sqlservercentral.com/articles/Indexing/68563/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Maybe you are getting confused because you are asking the wrong questions. First determine what keys make sense from a business perspective. If you are thinking purely in terms of IDENTITY or GUIDs then you probably aren't considering the real needs of the data model. Ask yourself what facts you are trying to represent in the database and what identifiers the consumers of the data will expect to see. Then you'll be able to determine what the business keys should be. Adding a surrogate key (if any) should always be a secondary consideration - and it's not totally clear from your question if that's what you are really asking about.

    Also understand that indexes and keys are two entirely different concerns that are best kept separate. Keys are about the logical correctness of the data whereas indexes are tool for performance optimisation.

  • David Portas (6/12/2010)


    Maybe you are getting confused because you are asking the wrong questions. First determine what keys make sense from a business perspective. If you are thinking purely in terms of IDENTITY or GUIDs then you probably aren't considering the real needs of the data model. Ask yourself what facts you are trying to represent in the database and what identifiers the consumers of the data will expect to see. Then you'll be able to determine what the business keys should be. Adding a surrogate key (if any) should always be a secondary consideration - and it's not totally clear from your question if that's what you are really asking about.

    Also understand that indexes and keys are two entirely different concerns that are best kept separate. Keys are about the logical correctness of the data whereas indexes are tool for performance optimisation.

    I have to say that I pretty much disagree with everything you said except for maybe not using GUIDs as a key especially since you use terms like "always".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (6/12/2010)


    David Portas (6/12/2010)


    Maybe you are getting confused because you are asking the wrong questions. First determine what keys make sense from a business perspective. If you are thinking purely in terms of IDENTITY or GUIDs then you probably aren't considering the real needs of the data model. Ask yourself what facts you are trying to represent in the database and what identifiers the consumers of the data will expect to see. Then you'll be able to determine what the business keys should be. Adding a surrogate key (if any) should always be a secondary consideration - and it's not totally clear from your question if that's what you are really asking about.

    Also understand that indexes and keys are two entirely different concerns that are best kept separate. Keys are about the logical correctness of the data whereas indexes are tool for performance optimisation.

    I have to say that I pretty much disagree with everything you said except for maybe not using GUIDs as a key especially since you use terms like "always".

    Cool, so what do you agree with then?

    😉

  • Jeff Moden (6/12/2010)


    I have to say that I pretty much disagree with everything you said except for maybe not using GUIDs as a key especially since you use terms like "always".

    You mean you don't think it's a good idea to bother identifying business keys at all?(!!) Or you think that should be done only after assigning surrogates?(!) It might be helpful to explain what you are talking about.

  • Heh... it's simple. Just reverse almost everything that David said.

    Keys and indexes do need to be considered together... that's part of what DRI is all about.

    Indexes are not just about performance... they're a part of DRI.

    Consumers of the data will not normally see surrogate identifiers unless it's something like "customer number" but it really doesn't matter either way.

    Surrogate identifiers should usually be considered especially if the only reasonable natural keys are private (ie. SSN), are non-unique (names, etc), can change (names, addresses, geographic location, telephone numbers, email addresses, driver's license numbers, credit card/bank account number, etc, etc), have the possibility of exceeding 900 characters (document UNC path, etc), or require a composite of columns.

    Business keys and consumer facing identifiers don't necessarily have a thing to do with each other but its ok if they do.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (6/12/2010)


    Indexes are not just about performance... they're a part of DRI.

    Are you familiar with the SQL standard? It most certainly includes DRI and certainly does not include anything called an index. The two are unrelated, it's just that SQL Server happens to create indexes automatically for uniqueness constraints - not necessarily a useful feature and certainly not an essential part of DRI.

    Consumers of the data will not normally see surrogate identifiers unless it's something like "customer number" but it really doesn't matter either way.

    Exactly so. Which is why the business keys are important to those data consumers as the only means by which to identify rows in the database. If you don't know the business key of your table then you don't really know what the table's rows are supposed to mean. That is why in design terms the business keys should come first and the surrogates after.

  • robblot (6/11/2010)


    I have been talking to different programmers about the primary field of my tables.

    Please do not take it the wrong way but you shouldn't be talking to programmers about this matter, you should be talking to DBA.

    On the other hand I do prefer natural keys over surrogate keys in most environments. A exception would be the use of surrogate keys when doing dimensional modeling for an Oracle based data warehouse environment.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • David Portas (6/12/2010)


    Jeff Moden (6/12/2010)


    Indexes are not just about performance... they're a part of DRI.

    Are you familiar with the SQL standard? It most certainly includes DRI and certainly does not include anything called an index. The two are unrelated, it's just that SQL Server happens to create indexes automatically for uniqueness constraints - not necessarily a useful feature and certainly not an essential part of DRI.

    Agreed... in standard SQL, that's correct. But we're not using "standard SQL"... we're using T-SQL. 😉

    Consumers of the data will not normally see surrogate identifiers unless it's something like "customer number" but it really doesn't matter either way.

    Exactly so. Which is why the business keys are important to those data consumers as the only means by which to identify rows in the database. If you don't know the business key of your table then you don't really know what the table's rows are supposed to mean. That is why in design terms the business keys should come first and the surrogates after.

    Ah... now I know what you mean and I can agree with that provided that people understand that the business key and a surrogate identifier can be and frequently are the same thing.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I think I'm in the middle between Jeff and David. The keys are a logical construct. They ought to be decided upon in terms of how we want our data to be correct from an RI perspective.

    However.

    Keys are implemented as indexes, and that is an important part of performance. You don't want keys and separate indexes on the same fields, since you'll have duplicate indexes. Likewise, you determine the clustered index from a performance perspective.

    I think you need two passes here. One to decide keys, a second to re-evaluate the key choices in terms of indexing and performance.

    And then you need to keep doing this as you work with the system. What you think or decide up front may not be what's best as you start to see people using the system.

  • Jeff Moden (6/13/2010)


    Agreed... in standard SQL, that's correct. But we're not using "standard SQL"... we're using T-SQL. 😉

    I get it. If Microsoft say that the world is flat or that 2 + 2 = 5 then we should all deny science and common sense and believe it is so... at least until Microsoft change their minds in the next version.

    Ah... now I know what you mean and I can agree with that provided that people understand that the business key and a surrogate identifier can be and frequently are the same thing.

    If the business is using it as an identifier then what is the point of calling it a surrogate? If there is to be any purpose in classifying different types of keys at all then surely they should be distinguished by their meaning in the domain of discourse and the way they are used by consumers of the data.

    As soon as you start exposing data to users and business processes then it acquires a potential business meaning - even if it had none to start with. That has consequences. For example you probably can't change it without some business impact whereas a defining feature of a surrogate is that the value is essentially meaningless and may be changed within the database at any time. So I don't see any point in saying that surrogates keys and business keys could be the same thing - the point of those terms is precisely that they describe two different and mutually exclusive uses of a candidate key. In principle all keys are equal. It is only the way they are used that defines them as business keys or surrogates.

  • David Portas (6/13/2010)


    In principle all keys are equal.

    I have to disagree here. Call me old-school but I do differentiate in between natural keys (the ones that have a meaning) and surrogate keys (the ones that represent something that has a meaning).

    Just my two cents.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • David Portas (6/13/2010)


    I get it. If Microsoft say that the world is flat or that 2 + 2 = 5 then we should all deny science and common sense and believe it is so... at least until Microsoft change their minds in the next version.

    Heh... that's precisely why I usually don't try to have a conversation with you. You just don't need to get so bloody sarcastic all the time. No science or common sense has been denied here and I wouldn't mind talking further about it but I'm not going to try to engage in conversation with someone who can't talk without sarcasm.

    Have a nice day, David.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (6/13/2010)


    Heh... that's precisely why I usually don't try to have a conversation with you. You just don't need to get so bloody sarcastic all the time. No science or common sense has been denied here and I wouldn't mind talking further about it but I'm not going to try to engage in conversation with someone who can't talk without sarcasm.

    Have a nice day, David.

    Perhaps my English sense of irony doesn't work so well in an international forum. I didn't intend any personal attack and I apologise if you thought I was rude.

    My point was simply this: Let's please use words and concepts for what they actually mean and are. There is so much rubbish written these days about data management issues. Mountains of it in fact. I think people need to be more responsible and precise to avoid any risk of perpetuating myths and misunderstandings.

    Separation of logical and physical concerns in database design is an immensely important issue. When discussing such things in a forum for "Newbies" I don't think I'm being too picky by pointing that out. I'm just trying to differentiate the essential concepts that the OP might not have fully appreciated and understood.

Viewing 15 posts - 1 through 15 (of 26 total)

You must be logged in to reply to this topic. Login to reply