Overusing Identities

  • Steve Jones - SSC Editor

    SSC Guru

    Points: 714373

    Comments posted to this topic are about the item Overusing Identities

  • Jeff Moden

    SSC Guru

    Points: 993788

    I use identities for the same reason that people are issued SSN's... It makes life real simple as a unique identifier in the world of large batch programming. As I prepare to install handrails for the upcoming ride :-P, I'll also tell you that if a natural key exists, I'll still usually use an IDENTITY column as at least a Non-null Unique Key and make the natural key the PK unless, maybe, if it's a totally static table.

    I'll usually put the clustered index on an IDENTITY column especially for large tables that are also highly transactional just to keep page splits at the lowest level possible. Sure, a date column would probably suffice but there isn't always a date column and, when there is, data is not always inserted in the same order as those dates. To me, it's worth the 4 bytes per row to keep from having nearly 8k bytes suddenly open up because a single row insert caused a page split. The extra 4 byte column is usually worth it to me in the savings it causes during maintenance.

    I also tend to use IDENTITY columns on everything I import. I always import to staging tables and never to the final table so that if someone does send "dupes", I can easily find them and tell the vendor of their problem. Yes, there are several ways to avoid the dupes but I always want to capture them for one reason or another and the IDENTITY column provides uniqueness when there may otherwise be none. The use of IDENTITY columns also makes it a snap to archive data because I can usually archive rows on one logical end of the table while the users continue to use the more recent other "end" of the table without mutual interference.

    The bottom line is that as with everything else, "It Depends". I can see arguments for using IDENTITIES for all tables, no tables, or just some tables. I happen to be one of those that favors the use of IDENTITY columns.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."
    When you put the right degree of spin on it, the number 3|8 is also a glyph that describes the nature of a DBAs job. 😉

    Helpful Links:
    How to post code problems

  • george sibbald

    SSC Guru

    Points: 104200

    Just to note that using a GUID as a primary key might not be a good idea if the index is created as a clustered index due to the random values given to GUIDs, and their length at 16 bytes.

    NEWSEQUENTIALID() can be used in conjunction with GUIDs if they are to be the clustered index.

    ---------------------------------------------------------------------

  • SQLRNNR

    SSC Guru

    Points: 281210

    Jeff Moden (2/27/2010)


    I use identities for the same reason that people are issued SSN's... It makes life real simple as a unique identifier in the world of large batch programming. ...

    I'll usually put the clustered index on an IDENTITY column especially for large tables that are also highly transactional just to keep page splits at the lowest level possible. ...

    I also tend to use IDENTITY columns on everything I import. ...

    The bottom line is that as with everything else, "It Depends". I can see arguments for using IDENTITIES for all tables, no tables, or just some tables. I happen to be one of those that favors the use of IDENTITY columns.

    I favor the use of identity columns for much the same reasons. I don't like clustered indexes on GUID columns, and prefer to use a better natural key or an identity column. Jeff pretty much summed it up for me though.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Steve Cullen

    SSCertifiable

    Points: 5598

    We create identity columns for just about every table. It tends to make things much easier. One such area this makes things easier in is transactional replication.

    One area we don't do this with is in staging tables for ETL import. I tend to go back and forth on the whole identity idea there because there are occasionally dups to deal with, as Jeff has pointed out. It's never been a major issue though.

    Everywhere else we use identity, whether we have a natural key or not.

    Converting oxygen into carbon dioxide, since 1955.
  • Gail Shaw

    SSC Guru

    Points: 1004424

    I don't have a problem with using identities. What I do have a problem with is using identities as the only unique identifier of the entire table.

    When the primary key is an identity (or a GUID) and there is no other unique column or set of columns in the table, then it's easy to get into the situation where the only thing that differs between two rows is a meaningless artificial key. That easily leads to unamusing data integrity issues at a later date.

    It's important to identify the candidate keys during modelling. Maybe one gets selected as the primary, maybe an artificial key gets used instead, doesn't really matter, but then the candidate keys should get unique constraints defined on them so that the meaningless artificial key is not the sole measure of uniqueness in the table.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Michael Lysons

    SSCertifiable

    Points: 6359

    I've used Identity columns a lot in my data warehouse work, and I love the simplicity they bring. I've also used them extensively when working on transactional systems, and again found that the simplicity made for easier modelling.

    However, more recently I created a new system and decided to go for entirely natural keys. For no other reason than I fancied a change to what it was like, and so I ended up with natural, composite keys that brought a certain elegance to the design.

    I'd say it was more work with the natural keys though. And if I was building the same system again, I'm not sure which way I'd go.

  • bill.sugden

    Old Hand

    Points: 339

    When I was a newbie to DB development (from a programmeer background) I liberally spread IDENTITY columns as PK. Simpler for me I thought and easier for the other developers.

    Now I'm older and wiser. As Gail says, this can bite you if you are not careful. Now I try very hard to identify natural keys, and if they are there, even if composite keys are required, they are at least given a unique index. but the convenience of using IDENTITY for internal work and easier joining is very great, and I would still recommend their usage.

    By the way, I hated Aaron's example of an order detail table. Of course it won't work with a simple Order_ID + Product_ID as PK. I would hope that any order detail table would have a unique ID of Order_ID + LineNo. And this is one case where an identity column IS redundant.

    Bill

  • sqlvogel

    SSCrazy Eights

    Points: 9415

    I'm with Gail on this. What you choose as a primary key is not important. Candidate keys are the important issue. Just be sure you are enforcing the candidate keys required for the integrity of the data. Don't fall into the trap of thinking that "primary key" means a table has only one key or that one key is more important than others.

  • Filipe

    SSC-Addicted

    Points: 449

    Actually I must disagree and say that Identity columns are nice and convinient but can be very bad as well. Why would someone define an identity column on a table that already has another unique key, specially if that column is an int or bigint?

    Recently I had living proof that is not a good practice.

    I have one database that was developed with the maxim that identities are never enough, and this one table had a natural key (unique, int) that was theonly field used in the application, and had the identity, that was the clustered PK.

    I started noticing dead-locks on that table, and the dead-lock was between updates on the PK and the unique key. Since all access to the table was via the UK, when deleting from that table the order of the locks was the reverse of the selects, and it resulted in dead-locks. Getting rid of the PK (and the identity all together) got rid of the problem as well.

  • Jeff Moden

    SSC Guru

    Points: 993788

    Filipe (3/1/2010)


    Actually I must disagree and say that Identity columns are nice and convinient but can be very bad as well. Why would someone define an identity column on a table that already has another unique key, specially if that column is an int or bigint?

    Recently I had living proof that is not a good practice.

    I have one database that was developed with the maxim that identities are never enough, and this one table had a natural key (unique, int) that was theonly field used in the application, and had the identity, that was the clustered PK.

    I started noticing dead-locks on that table, and the dead-lock was between updates on the PK and the unique key. Since all access to the table was via the UK, when deleting from that table the order of the locks was the reverse of the selects, and it resulted in dead-locks. Getting rid of the PK (and the identity all together) got rid of the problem as well.

    Did you move the PK to the remaining column? Same question for the Clustered Index. Also, did this just happen to be a sequence table?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."
    When you put the right degree of spin on it, the number 3|8 is also a glyph that describes the nature of a DBAs job. 😉

    Helpful Links:
    How to post code problems

  • WI-DBA

    SSCrazy

    Points: 2110

    I agree with Gail. I think the overuse of identities is a problem. Too often I come across data integrity issues caused by the theory of "identity on every table" rather than utilizing the keys from the parent table as part of the child table. Even if the parent table PK is an identity, the child table can benefit (and subsequent grandchildren) from carrying the parent PK. Integrity improves and simplifies queries - and we all know developers need all the help they can get.

    Cheers
    http://twitter.com/widba
    http://widba.blogspot.com/

  • Filipe

    SSC-Addicted

    Points: 449

    Yes, moved PK and cluster to natural key column, and no, this is not a sequence table.

  • Grant Fritchey

    SSC Guru

    Points: 395267

    I'm going to go with Gail & David, there's nothing all that wrong with using Identities as your PK as long as you've also defined the natural key through a unique constraint on the table. Otherwise, it's a recipe for data disaster.

    I actually lean towards a design model these days that uses the Identity on all the key tables, and then uses compound keys on all the child and relationship tables. Those tables don't get an identity. I've found, after lots of testing, that most of the data access at our company is pretty hiearchical and building the tables in this way provides for the clustered index being the best access path to the data.

    We've also been growing quite a few systems with code that writes code (ORM flavors, nHibernate, CRM, SharePoint,etc.) which frequently leads to GUIDs. The sequential ones work very well. They're a bit fatter than your basic integer, so you suffer in the number of pages used, but other than that, the sequential GUIDs work very well. The random GUIDs on the other hand... they're a royal pain.

    ----------------------------------------------------
    The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood...
    Theodore Roosevelt

    The Scary DBA
    Author of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd Edition
    Product Evangelist for Red Gate Software

  • jboc

    SSC Rookie

    Points: 46

    I tend to focus on High Performance databases where performance is critical. Aaron Bertrand suggests skipping a PK even while having a cluster in his ActivityLog example. From a performance aspect not having a PK strikes this writer as foolish. I tend to like the "ID INT IDENTITY(1,1) PRIMARY KEY," style since it takes half the index space as making the Datetime a PK. It tends to be a trade off, I like to use more storage space for faster performance, some people will trade performance for a little cost savings.

    Best wishes,

    Barry O’Connell

Viewing 15 posts - 1 through 15 (of 71 total)

You must be logged in to reply to this topic. Login to reply