Using a Surrogate vs Natural Key

  • The savings that you're getting from creating your own surrogates are because of the architecture, not anything inherent in SQL. We inherited a UNIX model of files based on magnetic tapes. This is why we have an IDENTITY table property, that the tables in a single schema are disjoint and have to be indexed separately, and so forth. This is not the only model for SQL, and certainly not the best one for performance or storage.

    I'm probably one of the people who still remember WATCOM. It was a spinoff from the University of Waterloo in Canada. They produced a C compiler that out-performed Microsoft and generated some of the smallest, tightest code possible. They also produced an SQL compiler. The University produces some of the best systems programmers I've ever worked with, but they could not do a good human interface.

    Their SQL product knew the difference between a referenced and referencing table. The referenced columns in the key were materialized (one way, one place, one time) then the referencing tables built pointer chains back to that occurrence. Basically, they took a lesson from the old network databases. This meant that no matter how big the key was, the references to it used a simple pointer. It also meant doing joins on primary and foreign keys is fast and cheap (we got really good with pointer chains back in the old network days!). DRI actions to cascade the updates or deletes were also insanely fast; you simply change the reference and left the pointers alone.

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • Codd never says anything about the physical implementation. This abstraction makes soooo many different implementations. ๐Ÿ™‚

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • jcelko212 32090 wrote:

    Codd never says anything about the physical implementation. This abstraction makes soooo many different implementations. ๐Ÿ™‚

    That's what I was getting at.ย  You said that "What he had in mind would be more like a hashing algorithm than an IDENTITY table property (please note a table property is not a column) or a GUID that the user has to manipulate." and that was my bone of contention.ย  He didn't even come close to recommending something "more like a hashing algorithm".

    --Jeff Moden

    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • The original article is absurdly naive. Natural keys are all about data integrity. You can't sensibly ignore the integrity aspects and then assert that two different designs are possible alternatives and that one is "superior" on grounds of performance. If you don't enforce natural keys then your table is probably just an arbitrary dumping ground for duplicate data. And the example execution plan is for an unrestricted scan of the entire table! Did the author even think about what a more realistic sargable query might be?

    To build effective database designs you will need natural keys and sometimes surrogates as well. "It depends" is the practical approach exercised by good database and software developers everywhere.

    Incidentally, this article is the kind of nonsense that you occasionally used to see decades ago. You might expect an article like this to be rejected by the editors - but perhaps I'm expecting too much of SSC.

  • I think there is some confusion going on here, as I couldn't agree more รกnd less.

    I couldn't agree more that checksum and binary checksum functions have a big chance on collissions. They are not reliable if the intend is to generate unique values.

    But want I meant is to use the hashbytes function, and then you can make it as collision-proof as you wish, because you can use hash algorithms like MD5 (chance on collissions already astronomicly small), SHA1, SHA2-128 or SHA2-256. Google these algorithms for their chance on collisions, you might be surprised. So then IMHO you can say the chance that the earth is destroyed by a nucleair disaster is bigger than the chance on a hash collision. Of course their size is bigger than an integer, but as suggested before I believe, you can still use surrogate INT or BIGINT keys for primary and foreign key columns, and use the business key or a hash value of it as a unique index in the table. Then it is just to guarantee integrity/uniqueness of your business keys.


Viewing 5 posts - 31 through 35 (of 35 total)

You must be logged in to reply to this topic. Login to reply