Using Identity Field as my main relation in joins

  • Hi,

    In all my tables my first field is an Identity field which is set as my primary key. Is it a good practice to use this fied to join between tables or this might give me some trouble in the future?

    10x

    Ivan

  • Ideally the primary key should consist of column that have actual data rather than system generated values.

    for example for a table like customer , the emailid would be a good candidate for the primary key clustered index which in turn helps with the joins.

    Identity columns make more sense in table which store transactional data. for example a table storing orders etc.

    In any cases don't rely purely on the identity column.

    Jayanth Kurup[/url]

  • sammuts (8/16/2011)


    Hi,

    In all my tables my first field is an Identity field which is set as my primary key. Is it a good practice to use this fied to join between tables or this might give me some trouble in the future?

    10x

    Ivan

    It's an excellent choice if it is applicable to the business case, and is unlikely to cause some of the problems associated with a natural key (actual data rather than a system-generated value). This is a big subject, natural keys vs surrogate keys, and you may be better off dipping into a couple of textbooks rather than assembling an array of opinions from here. Almost all will be useful, some will be awful.

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • One of the important elements in deciding on a natural key vs a surrogate key is to determine if the value you pick is going to change. In the first example I would argue strongly that email is NOT a valid natural key. Email addresses change. If it is your primary key you can't change it without changing the foreign key reference in every single table that references it. Maybe a better fit in that case would be the DUNS number or some other value that is constant. There is no perfect answer or implementation of this topic. Some are better than others and you have to make sure you understand the nuances of the choice and make decisions that best fit your needs.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Just remember that while an identity value does uniquely identify a row, it doesn't uniquely identify it for the business. So, just as an example, you need to store first and last name and they should be unique (it's an example), simply putting the identity column in place still allows multiple first&last names to be added. You'd have to add a unique constraint to satisfy the business requirement on top of uniquely identifying the row.

    One other consideration, if you do nothing, by default, the PK will be the clustered index. And if you're setting up identity values, they frequently become the cluster too. Just remember, that might not be the most frequently accessed path to the data and you'll want that most frequently accessed path to be the cluster more often than not. Just something else to keep in mind.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • nm

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Jayanth_Kurup (8/16/2011)


    Ideally the primary key should consist of column that have actual data rather than system generated values.

    for example for a table like customer , the emailid would be a good candidate for the primary key clustered index which in turn helps with the joins.

    Identity columns make more sense in table which store transactional data. for example a table storing orders etc.

    In any cases don't rely purely on the identity column.

    Gosh... I just can't agree with that, Jayanth. People have more than one email address and a lot of people change them when they change ISP's. Peoples names can change and, if it were actually legal to use SSN's for indentificaiton, even those can change when a special request is made. Even ISO country names change and when they do, you need some way to track historical rows as well as relating them to new data.

    I know a lot of people will raise hell and prop it up on a stick for me recommending such a thing but even if a table has a strong natural key available, I still have an IDENTITY column on the table to save my bacon when the unexpected occurs. That also means that I may have both a PK and an AK on many tables.

    On highly transational tables, an IDENTITY column as the Clustered Index is an absolute must for me to prevent page rampant page and extent splits and the resulting timeouts they will frequently cause especially when a table has a substantial number of FK's associated with it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Heh... I didn't scroll down before I replied. Maybe there won't be so many people raising hell with me after all. πŸ˜€

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (8/18/2011)


    Heh... I didn't scroll down before I replied. Maybe there won't be so many people raising hell with me after all. πŸ˜€

    I went there too...and later decided an "nm" would be a much better post to leave behind...something about a horse... :Whistling:

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Heh... I wondered what the withdrawal was. Now I know. πŸ˜€

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Just to add my 2 cents. I've found that identity columns for both clustering key and primary key to be very, very fast when doing complex queries, and as Jeff pointed out you don't get rampant page splits when inserting.

    I do, however, usually have another unique index that defines a business uniqueness for each row. In business queries, the top level table will usually be queried on the business unique index because that's what the users consider to be unique. From there on down all the joining is done on the identity columns and their Foreign Keys into child tables.

    Since Identities are very narrow, 4 bytes for an INT, they are very fast in joins and I usually get MERGE type joins (the fastest for largish result sets) to the child tables since they are ordered.

    I can live with the fact that surrogate keys aren't the best business keys when I trade it off against the performance gains.

    Todd Fifield

Viewing 11 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply