Do You Really Need Surrogate Keys? Part 1 – Concepts and Methods

  • Comments posted to this topic are about the item Do You Really Need Surrogate Keys? Part 1 – Concepts and Methods

  • "UPDATE dbo.tblProduct*

    SET Color = 'Black'

    WHERE Name = 'ML Mountain Frame-W - Silver, 42';"

    I might be missing something but how can you expect a surrogate key to improve performance if you are not referencing it?

    Pete

  • Anytime this gets brought up (and it does, time and time again), before we argue further, I /really/ recommend reading;

    http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=6136&whichpage=2

    and quazibubble's posts. Yes he's argumentative, but puts up an impressive reasoning.

    Worked with a few people over the years who've tried todo weird and wacky stuff, get them spending 30mins going through that thread before arguing further and it's saved many a future project. (course, contractors who won't be around when there's later changes to the project love this stuff, but anyone having to maintain/expand systems are /really/ urged to read that link, and keep reading until it sinks in).

  • Robert Mahon-475361 (10/24/2010)


    Anytime this gets brought up (and it does, time and time again), before we argue further, I /really/ recommend reading;

    http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=6136&whichpage=2

    and quazibubble's posts. Yes he's argumentative, but puts up an impressive reasoning.

    I agree. I've just finished couple of weeks long job of repairing dozens of tables and relationships just because some creative admin before me had an idea that multicolumn non-surrogate PKs are cool. And I was repairing just one table.

    I mean - there is something coled business logic!? and it does change over time. So every time it changes you will go throw very painful process of changing table PKs and all FK's and indexes related to it, and all ad-hoc queries and all database object querying it?? And than you will justify that with 2% storage benefits and 1,5% query performance (if so!)? good luck with that! for me, "ID int IDENTITY(1,1)" rules!

    If you don't like how things are, change it! You're not a tree.

  • I wrote a blog article a while back on the theory on surrogate keys based around Codd and Date.

    The link: http://sqlblogcasts.com/blogs/tonyrogerson/archive/2010/03/13/database-design-surrogate-keys-part-1-of-many-rules-for-surrogate-keys-e-f-codd-and-c-j-date-research-and-problems-they-solve.aspx

    Reminds me - I need to go back and finish the other parts I said I would.

    Surrogates are a good thing and save us from all sorts of problems.

    Tony.

  • Thanks for your post... If the Table has already has INTEGER type Natural Key, would you still recommend the Surrogate Key ...

  • kasisriharsharao (10/25/2010)


    Thanks for your post... If the Table has already has INTEGER type Natural Key, would you still recommend the Surrogate Key ...

    Read;

    http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=6136&whichpage=2

    All of quazibubble's posts, including his later posts from another account when the first one gets banned and you'll have your answer.

    Out of curiosity though, what integer natural key are you thinking?

  • Personally I think most tables should have a surrogate key, there are exceptions, but the rule is have one imho.

    Why? As I detail in my blog article - this is about concurrency, its about maintainability etc.

    So yes, if there is a candidate key available on the table that is integer then yes I would still recomemnd a surrogate to abstract the data from the relational plumbing and get some stability from possible changes to the natural key(s) that would cause havoc for concurrency.

    Tony.

  • An interesting article, but there seems to be one aspect that has not been covered.

    The article suggests that there is a slight overhead using a surrogate key (as an extra column is created ), however actually this can lead to a massive saving where foreign keys are concerned.

    if your primary key is nvarchar(25) (50 bytes ish) then your foreign key in another table is also nvarchar(50) - this child table may have many times more data than your parent table, so using a surrogate key of INT will drop your storage and retreival costs significantly. (down to 4 Bytes)

    so in a scenario where we only consider 1 table with 100,000 rows then yes we have an extra 400,000 bytes data, but in a scenario with 2 tables we actually save 4,600,000 bytes

    ok my maths might be slightly off, and yes it's VAR so we don't always store the entire 50 bytes of data, so my saving is slightly exagerated, but it does demonstrate the point that a key should be as small as possible...

    I'd love to see the article expanded to test the same scenario but using a join

    MVDBA

  • Robert Mahon-475361 (10/25/2010)


    kasisriharsharao (10/25/2010)


    Thanks for your post... If the Table has already has INTEGER type Natural Key, would you still recommend the Surrogate Key ...

    Read;

    http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=6136&whichpage=2

    All of quazibubble's posts, including his later posts from another account when the first one gets banned and you'll have your answer.

    Out of curiosity though, what integer natural key are you thinking?

    He certainly makes some good points.

    However, if you want to see the best point made, it is the one by JCamburn at 05/23/2003 : 23:19:21

    See http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=6136&whichpage=6

    He actually gets it.

    Random Technical Stuff[/url]

  • When I saw the subject of today's featured article in the daily newsletter, I was enthusiastic. But after reading the article, I am quite disappointed.

    I was enthusiastic because there are many misconceptions and bad practices around surrogate keys, and I hoped this article would help eradicate them. But instead, the article itself falls victim to one of the most common misconceptions.

    A surrogate key should never replace the "business" key. It should be provided as an additional key, side by side with the original one. If ProductNumber identifies a product in the first table (without the surrogate key), it is also a candidate key in the second table (with surrogate). To prevent duplicate data entry, it should be declared as such, using a UNIQUE constraint. The table design screenprints posted don't carry this information, but the posted details from the execution plan of the UPDATE statement clearly show that such a constraint is not in place - otherwise, the execution plan would have been much more efficient.

    This renders all the comparisons in the article completely useless. A properly implemented surrogate key always goes hand in hand with a UNIQUE constraint on the "business" key, so such an implementation should be the basis of all comparisons.

    (And that being said, I also question the usefullness of a comparison that is based on one single query and one single update. Plus, the use of the estimated plan instead of the actual execution stats also make the exercise less meaningful, as the estimated plan can sometimes be completely off).

    But I'll remain positive, and hope that the next part(s) of this article will be better!


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • An interesting start, but the discussion doesn't go far enough. There needs to be a distinction: what is the target? Is the target OLTP systems? or is the target a data warehouse or an OLAP cube? The function will determine part of (not all of) the need

    There is not enough data here to make formal statements. The optimizer will change it's path (as will the results change) for the following points:

    The optimizer will change...

    * at 500,000 rows, 1M, 5M, 10M, 100M rows.

    * what it's doing when joins are introduced across two tables each at 100M rows.

    * based on the type of indexes or compound indexes used on the table.

    * based on statistics being up-to-date (also the type and depth of statistics run).

    * based on the DOP (degree of parallelism)

    * based on the amount of RAM

    * based on 64 bit or 32 bit core

    * based on the version of SQLServer

    * based on the size of the Database Block

    * Size of the surrogate (not all surrogates can be integer driven)

    * Width of the row in the table

    * based on the logging

    * based on the settings of the database

    * based on the partitioning across I/O channels

    I have seen too many tests run at "low volume" (76k rows, or 245 rows) and stand up and say: Eureka! I have performance.

    Then when the queries make it to production (even a year later), they die a miserable death because of the lack of foresight to test at high volume levels.

    I think a test like this should be done, but it should be done to scale (80M rows in 1 table, then 2 tables joined, then 4 tables, then 8 tables joined).

    Then it should be done again with different settings (as indicated above).

    Great start, keep up the good work, just carry it further.

    Cheers,

    Dan Linstedt

    http://danLinstedt.com

  • Robert Mahon-475361 (10/25/2010)


    kasisriharsharao (10/25/2010)


    Thanks for your post... If the Table has already has INTEGER type Natural Key, would you still recommend the Surrogate Key ...

    Read;

    http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=6136&whichpage=2

    All of quazibubble's posts, including his later posts from another account when the first one gets banned and you'll have your answer.

    Out of curiosity though, what integer natural key are you thinking?

    You seriously embrace his inane ramblings? :hehe: Good luck with that!

  • michael vessey (10/25/2010)


    An interesting article, but there seems to be one aspect that has not been covered.

    The article suggests that there is a slight overhead using a surrogate key (as an extra column is created ), however actually this can lead to a massive saving where foreign keys are concerned.

    Precisely. Until you actually use the PK as an FK 4 levels down you cannot begin to appreciate how much crap is carved out of the great grandchild table by the use of a surrogate key. All of that crap has to be used in joins if you use a natural key. So with a natural key there is a storage overhead as well as performance overhead. And the overhead of the indexes (storage and maintenance) required on all that crap in each succeeding generation.

    The PK is (or is intended to be) nothing more or less than a pointer from parent to child. It is not what defines uniqueness. The column or columns which define uniqueness can change on a moment's notice. The PK/FK should never change. With a surrogate that is in fact the case, it will never change. With a natural key Murphy says it absolutely will change, often and at the worst possible time.;-)

    Surrogate keys were introduced long ago for a very good reason. They allowed us as designers to have a tool which the DB engine uses to maintain referential integrity in the most efficient manner possible, while simultaneously removing the herculean (and impossible) task of selecting a candidate key that would never change. We get to let the db engine do its job while we do ours, defining correctly normalized entities.

    There is nothing about using a surrogate key that removes the responsibility of proper normalization, or defining candidate keys, or of creating the constraints used to enforce uniqueness. That is absolutely part of our job, and if we don't do all of that stuff every single time we are not doing our job. Telling the db engine how to do its job is not my job however.

    Surrogate PK / FK. Set it and forget it.

    jwcolby54

  • kasisriharsharao (10/25/2010)


    If the Table has already has INTEGER type Natural Key, would you still recommend the Surrogate Key ...

    What if the natural key changes? (It can happen and it DOES happen.) This is the primary reason why I always use surrogate keys. Performance is a nice bonus but I'd be willing to take a small performance hit for consistency and concurrency. Why is it that so many people think that a key needs to add value to the data? Why can't its value just be that it's a key?

    "Beliefs" get in the way of learning.

Viewing 15 posts - 1 through 15 (of 178 total)

You must be logged in to reply to this topic. Login to reply