Identity col Primary Key Vs Composite Key

  • Greetings. I usually use an Identity column as my primary key and create a unique index on the fields that would have been the primary key had I chosen to use a composite key (on a junction table). I find that this makes Foreign Key references simpler and cuts down on redundancy in the tables. However I found that on the Transcender SQL exams they use the composite primary key approach. Is there a downside to using an incremental column as the primary key and creating a unique index on multiple columns for these types of junction tables?

    Thanks

  • I think its preference. I use idents and/or uniqueidentifiers mostly. Nothing wrong with composite keys, just faster for me to type only having one col to join.

    Andy

    http://www.sqlservercentral.com/columnists/awarren/

  • It does boil down to a matter of preference.

    Ex. going thru now.

    Workforce Management Database

    We have a table of SkillSets to Agents and their ranking.

    I Would do

    Skillset

    Agent

    Rank

    with PK on Agent,Skillset.

    Agent is most unique so I place it first. This is due to the fact stats are only stored for first column in composite key, that is the gotcha you have to keep in mind.

    The other developer wants to do

    Identity

    Skillset

    Agent

    Rank

    PK On Identity, and Unique Index on Agent.

    There is some extra stroage space and in the case of a large table now two index lookups done to get the data. Compare both ways and you will see the difference.

    Also keep in mind once a table grows larger than a page in terms of data size it wil create leaf indexes for the clustred index which means all the non-clustred have to be updated due to this movement.

    In addition the non-clustered index stores the values in order with the related clustered index value. So the above will have a index structure of Agent,Skillset,(=Indentity). And since I have to lookup based on Agent I have to use the non-clustered index to get to the clustered index leaf pages to the datas slot in a page.

    Overall there are a few performance hits (which you should be able to ascertain from the previous) with the Identity column as opposed to a good composite key (doesn't mean always use Composite over non-composite but be carefull in choice).

  • hi tswg2002. There's nothing wrong with identity column. I use it more often than composites because it makes joins easy. My rule is, if the primary key is referenced by another table, I use IDENTITIES otherwise I use candidate columns. I only use composites if the table is a "resolution" table and is not referenced by other tables.


    "The way to get things done is not to mind who gets the credit for doing them." - Benjamin Howett

  • Bani the question thow was aimed at junction tables.

    These are tables are where multiple values from two different tables are matched.

    In my previous post I mention the Juntion Table structure. This is mapping two tables and som support for relation of those.

    Agent Table

    Agent (IdentityField)

    AgentName

    AspectID

    CallCenter

    ...

    And

    Skillsets Table

    Skillset (Identity Column)

    SkillSetName

    The Junction table gives the ability to say and Agent has multiple skillsets in a relatinal format.

    In the situation of a Junction table you would never have need to query an Identity Column added for the purpose of uniqness and The identity value would not be used in a related table.

    However odds are most queries would be against Agent and some against Skillset depending on sections of the Management tool. You would definently want to have a Composite index on the two together to enforce uniqueness as you wouldn't want an Agent to have the same skillset listed twice.

    With that in mind you would be wasitng space and performance by making it a Unique non-clustered index with an Identity Column that is PK and clustered.

    Reason

    Non-Clustered Unique Index lookup will do Agent,Skillset to find all related Idnetities

    Agent,SkillSet = IdentityValue

    Then it will lookup in the PK Clustered Identity Index the related Identites.

    2 indexes lookups. Plus storage space for Non-Clustered Index Root and Leaf pages equal to Agent,Skillset total width at Some pages plus Agent,Skillset,Identity total widths on other pages plus some bytes for header and other page info.

    Now if you make Agent,Skillset the clustered index instead and perform your query then you only have one index lookup required.

    Agent,Skillset which on some root and leaf pages will have a database page numbers (this depends on the table size).

    In that situation no lookups will be made against the Identity PK non-clustered index for searching. Thus again storage space for Non-Clustered Index Root and Leaf pages equal to Identity total width at Some pages plus Agent,Skillset,Identity total widths on other pages plus some bytes for header and other page info.

    On top of all that for either if you have 2 indexes you have increased work for Inserts/Updates/Deletes and increased potential for page splits. And with the identity field the SQL Server has to use the proceseor time at INSERT to lock the table with the current IDENTITY value, get the current IDENTITY value, update the IDENTITY value for the next INSERT to have and finally unlock this table.

    Now with making a PK from the clustered composite index and leaving the IDENTITY column out you get the following bennifits. Uniqueness is enforced, and queries only need 1 index lookup (barring special need for other indexes on this table). you get rid of a lot of watse space depednign on te size of your table. You decrease server workload at INSERT and reduce overall potential for page splits. You get the same index structure you would be using for primary lookups still.

    (Note: Reason for Agent, Skillset structure is Agent is will have the highest number of unique values compared to Skillset in the table. With composite indexes only the stats for the first column are stored and to get maximum bennifit of stats this is suggested.)

    Also, we didn't discuss but there is a performance difference in Unique Constraints and Unique indexes, of which a PK is a Uniqeu Constraint. Fortunately you can see the following thread on that disucssion http://www.sqlservercentral.com/forum/topic.asp?TOPIC_ID=10360&FORUM_ID=9&CAT_ID=1&Topic_Title=clustered+constraint+vs+clustered+index&Forum_Title=General

    But in the case of a junction table a composite index is the better choice as opposed to the sitation of a main table such as my Agent and Skillset tables would be.

  • Oooops. I think I was so agog to post then. I didn't read the original post carefully.

    quote:


    However odds are most queries would be against Agent and some against Skillset depending on sections of the Management tool. You would definently want to have a Composite index on the two together to enforce uniqueness as you wouldn't want an Agent to have the same skillset listed twice.

    With that in mind you would be wasitng space and performance by making it a Unique non-clustered index with an Identity Column that is PK and clustered.


    I never do a clustered index on identities. I have read from some of my database books that there is actually no point of making an identity clustered since it is only used as a pointer.

    quote:


    Now if you make Agent,Skillset the clustered index instead and perform your query then you only have one index lookup required.


    I do this and I am happy to know that I'm on the right track.

    I have another question though. What if the junction table is referenced by another table? Which is better, taking performance/ease in consideration; composite PK or identity?

    Edited by - bani on 03/07/2003 1:00:01 PM


    "The way to get things done is not to mind who gets the credit for doing them." - Benjamin Howett

  • Bani I would have to say that boils down to the reason for references and size considerations.

    For example with size, if I can use two SmallInt columns to represent all the unique possibilities based on other tables then I would go with the two columns as PK. Adding an Identity column of INT is the same storage size and thus both are relatively the same, the only difference is you have to add the extra column, index and IDENTITY requirements which means a bit more overhead from the extra column. But that is not always the case so most importantly you have to consider you specific needs and what poerformance gains and losses you must contend with.

  • I vote for composite keys. Let's say a bug is discovered and the data base key is corrupted. The composite keys (natural keys) will always allow a fix to find the related rows. The identities are really just pointers and remind me of a the old network data bases like idms.

  • The case against identity keys follows along these points:

    1) IDENTITY is proprietary to MSSQL.

    2) Over time, the Surrogate key (IDENTITY) and the natural key WILL diverge. This is the case that PBrazil brings up. The natural key will never diverge from itself. Given proper application of primary key constraint, you'll never get into trouble using the natural key.

    3) Increases in processing power make any minor differences in comparing an int (4 bytes) versus a zip code (9 bytes) negligible. A composite key that uses different data types may make surrogate keys an attractive alternative.

    $.02

    Steve Hendricks

    MCSD, MCDBA

    AFS Consulting Group

    shendricks@afsconsulting.com

    (949) 588-9800 x15


    Steve Hendricks
    MCSD, MCDBA
    Data Matrix

    shendricks@afsconsulting.com
    (949) 588-9800 x15

  • quote:


    1) IDENTITY is proprietary to MSSQL.


    Not exactly correct and not a valid reasoning against IDENTITY columns.

    quote:


    2) Over time, the Surrogate key (IDENTITY) and the natural key WILL diverge. This is the case that PBrazil brings up. The natural key will never diverge from itself. Given proper application of primary key constraint, you'll never get into trouble using the natural key.


    Agree there that the likelyhood of data issues becomes most slimer than a totally new key set.

    quote:


    3) Increases in processing power make any minor differences in comparing an int (4 bytes) versus a zip code (9 bytes) negligible. A composite key that uses different data types may make surrogate keys an attractive alternative.


    Just a note here. Please don't use 9 bytes for a zip code and plus 4. It too can be expressed as an int (4 bytes), you just need to place a constraint to keep the value ranges between 0 and 999999999. I understand the negligible piece but this example is commonly followed in real world but is not the most efficient way to store the data.

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply