The Identity Debate

  • Comments posted to this topic are about the item The Identity Debate

  • Can o' Worms (TM) 🙂

    I'd have to say I'm in the middle and happy to be so. However in my latest systems in which I've worked we've made a lot of use of GUIDs for the ease of creating all IDs in the middle-tier and then pumping it to the DB, as well as allowing for easier merging of two databases.... As always, there are definite trade-offs - I don't advocate it for all situations! Guids are a separate debate though.

    No matter where you sit on the fence, if you're maintaining, accessing or designing SQL Server databases then you should at least have an understanding of identity fields, how they work, why they exist, where they (could be / are) useful, etc. Same with the NULLs vs NO NULLs debate - even if you hate NULLs you'll still encounter them and should know their rules and how to deal with them.

  • I'd have to say I'm in the middle as well. It really depends on the system you are building or supporting. In a data warehouse, I feel they are important. Using natural keys may get you in trouble, especially as you bring data in from a variety of sources. What may be a unique index in a specific OLTP system, may run into some data from another.

    In an OLTP system, you may also need a SID instead of natural key. last company I worked in, we had an ISAM database (application in COBOL) where the primary key was the product id. Problem was that sometime the vendor changed this causing issues with historical data. A SID as the primary key would have prevented this.

    I guess it really comes down to this consultants phrase: "It Depends..."

    😎

  • I think a lot of good information gets lost in the debate between natural keys and surrogate keys. Anyone who doesn't have a good understanding of keys (primary keys, candidate keys, natural keys, surrogate keys, business keys, "foreign keys", car keys, etc.) who witnesses one of these "religious wars" is sure to get completely lost.

    At their core, keys are just a means of uniquely identifying rows in a table. I don't personally have anything against identity columns, guids, or other forms of surrogate keys. Surrogate keys offer a lot of advantages over large natural keys - they tend to be compact, they make for simpler WHERE clauses and join criteria, and they can provide performance improvements over wide natural keys.

    Even so, you should still identify candidate keys that uniquely identify each row even if you're using surrogate keys. Surrogate keys should work in support of natural keys, not as a total replacement for them. But that's just my opinion. 🙂

  • I've got to say that I'm rapidly going over to the dark side and sticking an Identity column in every table.

    More often than not I can identify a natural key to the data, which I am certain will not change, but I still do it. Why? So that it REALLY doesn't matter if the natural key does fall apart on me. I can change one reference somewhere and know that my Parent-Child relationships are totally unaffected. I don't even include this identity key in any stored procs or results, and if I could make it a hidden read-only column I would.

    I recently have been to a number of client sites where we are migrating them from a previous data provider to our data, and the question keeps re-occurring. "How do I map their Ids to your Ids? I need to keep history!", to which my answer is alway the same. Create your own internal Id and map everyone elses to that. it is the ONLy way to guarantee consistency and integrity!

    The other place I've found it useful is in Trigger code. Especially where you get a multi-row update happening and you need to have a fast way of ensuring that you can match the inserted or deleted records to the main tables.

    Lastly, the number of times I've been told that the Natural Key values would never change, only to be asked a year or so later if we can change this natural key value please would surprise you. Its enough to drive a man's hair grey.... oh... wait....

  • Fully agreed with all comments above, particularly how Identity fields should support Natural keys and not replace them.

    From a practical point of view, I personally add Identities to all tables as an absolute guarentee that the row will be unique.

    Too often, I have run into cases where tables aren't given accurate primary/unique keys. If the ETL is not careful, this would allow duplicate rows being written into the table (generally happens with fact records in data warehouses)

    To clean up the resulting duplicate mess, it's quite tricky if you have two rows that are identical in every field, as SQL has no way to delete only one of the rows. Much easier to maintain if you can simply say

    DELETE FROM ... WHERE ID = x

  • Every table must have a surrogate key, integer, guid or whatever used for referential integrity. Natural keys must supplement integrity, because:

    1. they're not always available (usually for 10-20% of tables)

    2. they're natural keys on one system, but are of no use on other, for example SSID, every country has its own personal id system.

    3. they're generally larger than surrogate keys, so they take more space in indexes, FKeys, matching plan, etc.

    4. natural key data and scope my change with time, possibly breaking it, if this happens and your system relies on natural key integrity, you have to redesign the system, if it relies on surrogate keys, you just modify natural key validation.

    About NULLs, if you understand the logic, you figure they have a very good use, they exist with reason and that there's no reason to hate them.

  • I'm another who adds a surrogate identity primary key to my tables and indexes on the natural key. I've too often had the problems of natural key (e.g. employee number) changing, generally because it is created by another system such as Payroll and the business buys a new payroll package.

    I've also had the disaster (not in SQL Server thnkfully) of an auto increment field re-initialising and re-loading the table, thus changing every employee's primary key - that took a lot of manual sorting out! It also improved the backup regime with backups held for longer periods before being overwritten.

    It leads me to worry about auto increment keys, as a table only has to be exported and re-imported with someone forgetting to enable identity insert to change all the keys and I have to admit to hedging my bets in places and adding the natural key as a second foreign key field in one or two tables which would be particularly difficult to re-create.

    I know a manual incremental field would be better but to date it hasn't justified the overheads involved, as my systems aren't so mission critical and we can roll back to the last good backup.

  • I've been adding identity columns on every table ever since I discovered them - for all the reasons that everyone has been pointing out. The only issue that I have is when the DBA/Analyst/Developer doesn't analyse the database properly and a) forgets to add a unique index on the natural primary key and b) leaves the identity column as the clustered index when it really shouldn't be. The second item can be addressed post-implementation but the first one can cause real problems.

  • same as all above except small tables (< 100 rows)

    where using identity would be too

    redundant beside a natural key

    or a meaningful key.

  • I'm one who began using Identity keys everywhere and only discovered their down side when I got to a database (modeled by those who came before me) with over 500 tables and every single one had an Identity key -- including, believe it or not, the table called Sex, which contained precisely two rows.

    As I analyzed the views and sprocs in this database, it became increasingly apparent that we were often looking up sids to sids. The Sex column is obvious, but I think so are columns such as StateOrProvince, in which there are a small number of uniquely identified entries. Countries is another example: there are unique 2- and 3-character codes mandated by the ISO, so why bother with sids? In many such cases, the PKs themselves are enough for a human to identify the meaning of the values, so silly joins can be eliminated most of the time. Granted, occasionally you do need to write out the long form of a country name, but how often?

    On the other hand, how many columns from a Persons table do you need to construct an intelligent key? Certainly the concatenated name components are insufficient, so what else do you add before you can guarantee uniqueness? And uniqueness isn't all: suppose that one of said columns changes its value. Then you break the rule about not changing PKs.

    In conclusion, I stand with one foot firmly in each of the two camps.

    Arthur

    Arthur Fuller
    cell: 647-710-1314

    Only two businesses refer to their clients as users: drug-dealing and software development.
    -- Arthur Fuller

  • I am a total advocate of identity columns. Every table I construct has one for the reasons already stated (guaranteed uniquiness being the main one). Business needs are always changing and I have found through bitter experience that using natural keys often comes back to haunt you. Even with countries, what happens when a country changes its name or splits or merges? (like USSR did?) and in these ever more politically correct days what happens when sex (or gender) become more than male and female? (eg transgendered) or the "accepted" terminology changes? (never say never!!!)

    I must admit I do use Male and Female as the natural keys but who knows whem this might turn round and bite me? 🙂

  • I'm reasonably pro identity, and pro uniqueidentifier as well, especially now that we have newsequentialid().

    When we had Celko in town for a visit his chief complaint about identities was that it allows gaps in the sequence. From his perspective that's horrible - "what did missing row #32 contain, was it a failed insert or someone stealing something and trying to hide evidence" - but he achieves the same affect by using a separate numbering table.

  • I am in favour of using one primary key for the row itself and have it compleately seperated from product id's etc since these have a tendency to change sometimes and I can then change them but keep the relations. The system users usually does not see the actual primary key in the systems I build. This might give a column that might not be nessisary, like a primary key column for an employee and an employee number, but it also gives more flexible systems to change and the security of being able to trust your relations between the tables etc.

  • After a decade in corporate IT I have figured out one thing - everything changes. Even working with medical records I have found that I could not rely on social security numbers to uniquely identify patients in all cases simply because the company decided to call the same patient two different patients in some specific situations.

    So, I have really gone in the direction of identity columns everywhere. This has never really gone wrong for me, so I tend to lean in that direction a lot.

    Recently, I have had to use GUID columns (merge replication to mobile devices) and found them to be just fine as well. I admit that I cannot remember any customer GUIDs, but it is something I can live with.

Viewing 15 posts - 1 through 15 (of 129 total)

You must be logged in to reply to this topic. Login to reply