The Identity Debate

  • I have found even very small tables like 10 row lookup tables can benefit. I have flip-flopped on this one several times but typically I include an identity column as primary key. As others have noted, I have found that the "cast in stone set of codes" will change as soon as the real end users get their hands on the system. I just got tired of writing scripts that update the foreign key values (eg. update the table set colorcode = 'redish' where colorcode= 'ochre').

    It can get a little tiresome including the extra join for each lookup and I miss being able to scan the base table and recognizing the data but I have just about concluded that it is almost always worth the effort. And, you won't know it was wasted effort until you retire the app/database sometime in the future and note that none of the values changed over the life of the app. (not likely).

  • chris.compton (2/12/2008)


    Okay, with the risk of sounding more ignorant than I am, I'll admit that I get a little lost.

    I've been programming for almost two decades, and the terminology I have encountered has varied based on where I worked and whether I was talking to a DBA or not (mostly not).

    Could someone (briefly!!!) explain the difference between

    ·Natural key

    ·Surrogate key

    ·Candidate key

    I though I knew... but reading some sub-discussions in the 68 previous posts leads me to believe that either (1) I don't know as much as I thought or (2) I was right but there are some subtleties I don't appreciate or (3) some other posters were loose or wrong with their examples and/or labeling.

    A simple example (or two) would be sufficient; thanks!

    A "key" is just a set of one or more attributes (columns) that uniquely identify a row in a table. A "natural key" is a key that is meaningful, often in a business context, and occurs naturally in your data. If you have a table of Books, for instance, a natural key might be "ISBN" or "EAN" since it can be used to uniquely identify every row in the table.

    A "surrogate key" is a non-meaningful (in a business context) identifier added to a table to act as a key. As you've probably seen in these posts, it's often useful when the natural key is very wide ("StreetAddress1", "StreetAddress2", "City", "State", "ZIP", for example), or when the natural key changes often.

    Many times you can use multiple sets of attributes to uniquely identify rows in a table. For instance, in the Book table example "ISBN" might be one key, but you may have additional keys which can uniquely identify books. You may be storing the attributes (Publisher, CatalogNumber) or (SKU), or you might even be able to use (Publisher, Title, PublicationDate) to uniquely identify rows. All the combinations of attributes that can be used to uniquely identify rows in a table are "candidate keys".

    Only one candidate key can be "promoted" to the level of "Primary Key", however. This is the cadidate key that you identify to the system as the primary means of uniquely identifying rows. It doesn't have to be the only way, however, and you can always place unique constraints on other candidate key attribtue pairs to enforce their uniqueness.

    Hope that wasn't too verbose.

  • Matt,

    Thank you. I did not detect any disrespect, intended or not. But I could see my original train of thought got a little muddied in the way I stated it.

    ------------
    Buy the ticket, take the ride. -- Hunter S. Thompson

  • What should the default be? Should all PKs be non-clustered? It has to be one way or the other in the system and I'd like to think that whoever build the UI had some reason for doing it this way. At least something better than flipping a coin.

    Steve,

    Personally I think the better idea is to NOT have SQL Server make a choice but put the choice directly in front of the designer. Make the designer choose before you can continue. Will that guarantee the right choice for the situation? Of course not. Might it cause some less experienced designers to discover what it means (in a storehouse of knowledge such as SQLServerCentral)? We can only hope so. And if it has that effect, we are all better designers for it.

    I'm certain the UI designers had their reasons, but I'll bet those are lost in the mists of time and lack of general experience considering how long SQL Server has done this (at least since 6.0, I think). Perhaps those reasons are outdated. After all, how many places were there like this site in those days?

    ------------
    Buy the ticket, take the ride. -- Hunter S. Thompson

  • Ah, but I don't think that works. you still choose a default when you present the choice, meaning either clustered or nonclustered is selected. I guess you could select nothing and force them to pick from a drop down, but I think that would be a pain. Plus you still have scripting, which means when I write "primary key", it's non-clustered by default unless I add the clustered keyword.

    This was tried with the UAC in Vista and what happens is most people hit enter and take the default. They become numb to the dialog and the same thing will happen here. Most people don't know (or care) and just want the table built.

    That's why default behavior is important. 80% of people use it.

  • Steve Jones - Editor (2/13/2008)


    Ah, but I don't think that works. you still choose a default when you present the choice, meaning either clustered or nonclustered is selected. I guess you could select nothing and force them to pick from a drop down, but I think that would be a pain. Plus you still have scripting, which means when I write "primary key", it's non-clustered by default unless I add the clustered keyword.

    This was tried with the UAC in Vista and what happens is most people hit enter and take the default. They become numb to the dialog and the same thing will happen here. Most people don't know (or care) and just want the table built.

    That's why default behavior is important. 80% of people use it.

    I think this problem actually goes much deeper than just the defaults. Part of it is presentation, but part of it is that (in at least one spot), Microsoft is FORCING you to equate one with the other.

    What instance, you say? Glad you asked. Add an XML data type to a heap, then try to build a Primary Index on the XML column, and read the error message you get back. And no - that isn't fluff - they mean EXACTLY what the error message states. The fact that they want to force the issue is rather disturbing to me.

    I personally think that Primary Keys should be enforced by default on "regular" UNIQUE, non-clustered indexes. If it so happens that the PK and the UCI happen to match - that's great, but I want that to be thought out, not dummied in.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • At the Orlando PASS meeting last night, Jeff Garbus did his Indexing for Performance talk and he said something that made sense and relates to this. I probably won't get it right, but I'll try. Primary Keys are a logical contruct while indexes are physical and it is a mistake to equate the 2, which is what the visual designers do.

  • Heh... I know how to solve all of this... put all data into a single Name/Value table... primary key will also be all foreign keys and covering indexes :P:):D:hehe:

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (2/13/2008)


    Heh... I know how to solve all of this... put all data into a single Name/Value table... primary key will also be all foreign keys and covering indexes :P:):D:hehe:

    Stop - you're starting to sound like our "friend" Cimode's position from the infamous epic flame/religious war experience, NULL versus Null?[/url]

    for what it's worth - that thread is now actively referenced in Wikipedia...

    That kind of stuff belongs at best in a datawarehouse. I would just love to see the update queries you'd have to write for an OLTP system running on 6+NF. Actually no - I really would not, because that'd be right around the time where I go back to carving data into stone tablets...

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Heh... ya gotta admit, though... it was funny...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (2/13/2008)


    Heh... ya gotta admit, though... it was funny...

    Oh no denying it (as long as the context isn't lost that that was a tongue-in cheek comment). It's funny...all the way until the CIO reads that and says - "Hey - that's a GRAND idea...let's build ALL databases that way..."

    ...at which point I run screaming and tearing my hair out from his office......:w00t:

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Ya gotta learn to do it right, Matt... when you're running and screaming from his/her office, it's his/her hair that you're supposed to have a handful of... not your own 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Matt Miller (2/13/2008)


    Stop - you're starting to sound like our "friend" Cimode's position from the infamous epic flame/religious war experience, NULL versus Null?[/url]

    for what it's worth - that thread is now actively referenced in Wikipedia...

    Really? Where at? I miss my old friend cimode... Or was it "fabian"? Ahhh whatever he calls himself these days.

  • Mike C (2/13/2008)


    Matt Miller (2/13/2008)


    Stop - you're starting to sound like our "friend" Cimode's position from the infamous epic flame/religious war experience, NULL versus Null?[/url]

    for what it's worth - that thread is now actively referenced in Wikipedia...

    Really? Where at? I miss my old friend cimode... Or was it "fabian"? Ahhh whatever he calls himself these days.

    I figured I might catch your attention with that one...:)

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Matt Miller (2/13/2008)


    I figured I might catch your attention with that one...:)

    That you did! Pascal is the most delightful pseudo-academic with a grudge that I've ever had the pleasure of not meeting in person 🙂

Viewing 15 posts - 76 through 90 (of 129 total)

You must be logged in to reply to this topic. Login to reply