Give Up on Natural Primary Keys

  • DinoRS - Tuesday, October 2, 2018 10:30 AM

    Jeff Moden - Tuesday, October 2, 2018 8:24 AM

    I'd argue that, at the register level on the CPU, that it matters whether or not they're BIG ENDIAN or LITTLE ENDIAN but it's been a very long time since I've looked at individual machine language operation clock cycle charts. 

    The other thing is, if you're actually talking about "Shift Left" or "Shift Right" operations, how does that even apply here?  "Shift Left" is multiplication by 2 and "Shift Right" is division by 2, neither of which is appropriate for the subject at hand.

    Hi Jeff,
    when you do (need) to consider replacing Natural Keys, "Shift  Left / Right" is quite viable. It might get even more viable when you consider a lot of rows in a very short time.
    Intel might agree with me on this subject, considering the following statement:

    In processors based on Intel Core microarchitecture, shift immediate by 1 is handled by special hardware such that it does not experience partial flag stall.

    Ofcourse, this might have changed with newer Architectures again, however I wouldn't expect any other result regarding these operations. You might not consider something like this unless you're heavily bound by CPU Resources or if your deep analysis uncovers these stalls to happen in your systems.

    Another point while being on this is: I never said I'd use uniqueidentifier or int or similar for this, have a look at this in binary:
    01000100
    01000110
    01000101
    01000010
    01000001

    Can you see how this might be much more benefical for Indexing than say a GUID or even NEWSEQUENTIALID() considering add (as CPU Operation to get value + 1) might get to the point of stalling? It might be even less readable than a GUID - which shouldn't be exposed publicly anywhere either within your application - but for extremely INSERT and Index Seek heavy Operations it should be (partially) faster. You can even go ahead and limit the amount of possible key by defining the size of the binary column appropriately.

    When I said "unique identifier", I wasn't referring to the datatype.  I was referring to anything that would serve as a unique key in the table.

    I agree with Patrick... I see no way (and I don't have the time to wade through the Intel document) that "Shift Left" or "Shift Right" would be beneficial for a large number of rows because 64 shifts would be the limit according to what I'm currently thinking about your suggestion.  Obviously, you're thinking of something else and you'll have to elaborate quite a bit as to how you could use either "Shift" to accommodate loading, say, a million rows.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Eric M Russell - Tuesday, October 2, 2018 7:33 AM

    My point is that the primary key isn't just about internal uniqueness, it's about external referential integrity, and that should drive your decision about how to construct it. Natural keys are very important, and there should always be a natural key for internal integrity (uniqueness within the table), but that key need not be the designated primary (facilitating foreign key relationships). If you can't identify a natural key, then there is something wrong with your data or understanding of of the data; go back to the white board or consult with others in the organization.

    I agree with Eric.  If you are using an RDBMS, you should strive to follow some of the tenants of set theory.  Otherwise, JSON can provide an adequate data store.  One of the basic rules of database design is to identify a natural key for every table.  Purists can be happy by doing this in the logical model but the pragmatics can then create the physical model and add surrogate identity columns to use as the PK and use the natural key as an alternate key, especially if one of the natural key columns can be null (such as middle name).  In Oracle and MSSS the alternate key would become a Unique Key, not a unique index.

    To address Steve's original point of GDPR and restoring backups, you can track the dumb surrogate key in a file and when restores are necessary, this list of surrogate keys can be used to delete the records of the forgotten people in a highly efficient manner and the personal, possibly non-unique, data (email) is replaced by the surrogate key.  Of course, you should never re-use a surrogate key that has been deleted or all bets are off on uniqueness.

    As a practical point, I have worked on existing systems that used surrogate keys only and the application would allow users to start creating an account and associated information and if they stopped along the way, left and came back, they started all over with redundant information that could stop at any point along the way to completing registration.  And for history they were keeping all the records.  In this scenario a lot of time is is consumed with fixing data that cannot always be automated requiring human intervention.

  • hmbacon - Wednesday, October 17, 2018 1:43 PM

    If you are using an RDBMS, you should strive to follow some of the tenants of set theory.

    I think you mean "tenets" not "tenants".

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  •  Otherwise, JSON can provide an adequate data store.

    so if we want relational we go with CSV right?

  • drew.allen - Wednesday, October 17, 2018 2:17 PM

    hmbacon - Wednesday, October 17, 2018 1:43 PM

    If you are using an RDBMS, you should strive to follow some of the tenants of set theory.

    I think you mean "tenets" not "tenants".

    Drew

    The statement compiles true either way, but yes we should follow the tenets of set theory.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell wrote:

    Natural keys are very important, and there should always be a natural key for internal integrity (uniqueness within the table)

    Certainly there should be and it's explicit and implicit throughout GDPR - in relation to accuracy, rectification and portability for example - that PII requires such identifiers. GDPR understandably does not distinguish between identifiers that are created externally and those that are created "artificially" within a system (e.g. by key generators like GUIDs or IDENTITY in SQL Server). The regulation is quite explicit that unique identifiers on servers such IP addresses, cookies, login credentials and "digital identification" are also subject to the rules and that PII includes taking into account "all the means reasonably likely to be used ... either by the controller or by another person to identify the natural person directly or indirectly".

    The idea that we should give up on "natural" identification is not justifiable by GDPR. Businesses have a practical and a legal obligation to identify customers. Natural identification surely includes all the things we use to identify people and all those things seem to be equally subject to GDPR.

  • x wrote:

    Otherwise, JSON can provide an adequate data store.

    so if we want relational we go with CSV right?

     

    BWAAAA-HAAAA-HAAAA!!!!  I blew coffee out of my nose on that one!  +1 Billion!!! 😀 😀 😀

    To be sure, though, I believe the JSON comment was rendered in sarcasm.

    p.s. ASCII characters 28 thru 31 work much better than CSV.   😉

    MakeAsciiGreatAgain

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I have a question for all you folks claiming that every table must have a natural key.  What would you use for a natural key for either an Employee table or a Customer table?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    I have a question for all you folks claiming that every table must have a natural key.  What would you use for a natural key for either an Employee table or a Customer table?

    For a customer table, probably a login name or an account number. Obviously the answer depends on the nature of the business domain (which is what a "business key" AKA natural key is all about).

    Ultimately all identifiers in a database are symbols invented by people or machines. The important distinction between a natural key and a surrogate is that the surrogate key does not exist in the "real world" - the business process outside the database. If it does take on such a role then it is de facto a natural ("naturalized"?) key and is likely to be subject to GDPR. I am not a lawyer but GDPR is pretty explicit that the regulations apply to identifiers assigned by machines as well as ones assigned by humans. GDPR is not concerned with how identifiers are created, only with how they are used or could be used.

  • Under your interpretation doesn't that mean that all FKs fall under the GDPR? Meaning, ultimately, that relational databases are forbidden?

    Or, if not forbidden, unusable by the vast majority of a company's employees?

  • nova wrote:

    For a customer table, probably a login name or an account number. Obviously the answer depends on the nature of the business domain (which is what a "business key" AKA natural key is all about).

    I'm not really a fan of usernames as a "natural" key. People can want to change their username. If in a business you have a username based on someone's real life name, and they get married, what do you do? Leave their username as it is as you can't change the key? Then you have places where changing is part of the functionality; such as Twitter.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • I'm a big proponent of using natural keys where they exist.  But generally for name tables there is no choice but to use some form of identity.  For GDPR purposes you'll need another way to identify a user beyond the primary key.  Even when there is something like an employee number, it's usually randomly generated from somewhere.

  • Jeff Moden wrote:

    I have a question for all you folks claiming that every table must have a natural key.  What would you use for a natural key for either an Employee table or a Customer table?

    Really for both of those I would probably use a key I've created as the new natural key.  For an employee that might be employee ID.

    The argument about natural keys vs surrogate keys only really becomes a debate when you are taking in data that has a logical primary key itself and you have to decide whether you want to maintain that or create your own.

  • I'm surprised Joe Celko hasn't piled into this thread as he always seems to bring up the advantages of natural primary keys and the disadvantages of using numbers as PKs in a lot of the other threads he replies to.

  • Jonathan AC Roberts wrote:

    I'm surprised Joe Celko hasn't piled into this thread as he always seems to bring up the advantages of natural primary keys and the disadvantages of using numbers as PKs in a lot of the other threads he replies to.

    Clearly no one in this topic has said his name into the mirror three times (yet). 😉

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

Viewing 15 posts - 46 through 60 (of 108 total)

You must be logged in to reply to this topic. Login to reply