The Identity Debate

  • Hi Jeff, Thanks for your input. I want to however say to you that I also do not delete data from a permanent table. This Journals table is a working table if I can call it like that. The users start creating there Journal and edit and delete as much as they like. Once they are finished they post the journal to the live or permanent table. You know, when we finished the accounting system we created in 2000 we asked the auditors to audit it for us. The first thing they said was that we had to take all the DELETE buttons off where a live table is involved. Like you say, the user have to add a counter transaction to reverse the effect of the previous one.

    I would also like to comment and/or ask advice to Lynn Pettis' comment about the merger. A while ago one of my clients bought over another company (their opposition) and I had to merge their data with my client's data. I did not want to create duplicates so I created new numbers (e.g. ContractNumber). What would you guys do?

    :-PManie Verster
    Developer
    Johannesburg
    South Africa

    I can do all things through Christ who strengthens me. - Holy Bible
    I am a man of fixed and unbending principles, the first of which is to be flexible at all times. - Everett Mckinley Dirkson (Well, I am trying. - Manie Verster)

  • I just create an updatable view with a "Company" column that is the union of the two tables. It's a little more complicated than that, but not much because I always plan on it happening 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "Dear Lord... I'm a DBA so please give me patience because, if you give me strength, I'm going to need bail money too!"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Manie,

    If you create new numbers for the company being merged, do you keep a cross-reference table so you can map the old numbers to new? You still need to be able to account for the order/invoices that were issued by the merging company.

    I would go with how Jeff does it.

    😎

  • Hi Manie,

    It sounds like you have two separate tables for what are effectively the same entity. Why not have one table and use a flag to indicate whether it is work in progress? You could have a status column that could indicate whether it is "Work in progress", "live" or "deleted".

  • Ok, I'll try to answer all the questions/comments at once.

    1. Lynn, I would create the merge numbers in a way that I would not have to have another table. I would, depending on the size of the invoicenr etc. for instance take the merge invoicenr and plus it with 1000000 and when I need to pull them out again deduct the 1000000. Is this the best way to go? I don't know but it worked for me in my specific case. This wil be an "it depends" case like you said previously.

    2. The two tables? I inherited it so have to stick with it for now but your idea is a very good one, Andrew. I will definitely do that when the need arise again.

    Thanks guys,

    Manie

    :-PManie Verster
    Developer
    Johannesburg
    South Africa

    I can do all things through Christ who strengthens me. - Holy Bible
    I am a man of fixed and unbending principles, the first of which is to be flexible at all times. - Everett Mckinley Dirkson (Well, I am trying. - Manie Verster)

  • Mike C - Wednesday, February 27, 2008 2:22 PM

    michael.rosquist (2/27/2008)


    Identity is evil.Using identity as PK is bad.Having a table like this:
    CREATE TABLE Countries (CountryID INT IDENTITY PRIMARY KEY, CountryName VARCHAR(50) NOT NULL)
    If I were Joe Celko I would ask you how do you prevent your table from containing CountryID CountryName 1 Canada 2 Canada 3 Sweden 4 Sweden etc If you want an incremental counter use:SELECT @nextid = coalesce(MAX(id), 0) + FROM tbl (UPDLOCK)This must be done in a transaction to prevent two processes gettingthe same ID./m

    If I were anyone but Joe Celko I would say you can easily solve this problem with:CREATE TABLE Countries (CountryID INT IDENTITY PRIMARY KEY, CountryName VARCHAR(50) NOT NULL UNIQUE);Oh wait a minute, I'm not Joe Celko!

    I don't like that because it claims the surrogate is the primary key, when it quite clearly isn't - at least assuming this table is a simple lookup for converting long primary key to short surrogate.  And I like it even less because it uses a 31 bit surrogate (31 not 32 because the defaults for the IDENTITY keyword are crazy) where 15 bits is more than enough (if one restricts "countries" to countries which are either UN members or recognised by the UN as no-member observer states 8 bits is plenty, so that for example "Isle of Man" is excluded; but if one includes countries like IoM, and also countries like England, Wales, and Scotland, and so on throughout the world, 8 bits is maybe not enough).  So I would would write it
    CREATE TABLE Countries (
        CountryID SMALLINT IDENTITY UNIQUE,
        CountryName VARCHAR(50) PRIMARY KEY
    );

    In the unlikely event that I would often want to transalate a consecutive bunch of surrogates to real primaries I would change that to
    CREATE TABLE Countries (
        CountryID SMALLINT IDENTITY UNIQUE CLUSTERED,
        CountryName VARCHAR(50) PRIMARY KEY NONCLUSTERED
    );

    Tom

  • Jack Corbett - Thursday, February 28, 2008 6:02 AM

    Manie,You need to supply a lock hint when selecting from the invoice number:select @nextinvoice = [next invoice number] from company with(updlock)

    That works, but I've discovered that people find it confusing.  So I prefer using
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
        select @nextinvoice = [next invoice number] from company ;
        update company set [next invoice number] = isnull([next invoice number],0)+1 ;
    SET TRANSACTION ISOLATION LEVEL READ COMMITTED
    (assuming that the code uses the usual default isolation level).

    Actually this question made me think of the many times I've come across where problems were caused by using the default isolation level, READ COMMITTED, which doesn't guarantee transactional consistency (and neither does REPEATABLE READ) and people tried to fix it by using table hints all over the place so that it was impossible to see what was going on.  Most of the time the default isolation level is OK because your transactions don't do things that break it, and most of the rest of the time REPEATABLR READ is adequate, but there are plenty of caseslike this where you need to change the isolation level now and again.

    Tom

  • TomThomson - Monday, June 5, 2017 6:27 PM

    Mike C - Wednesday, February 27, 2008 2:22 PM

    michael.rosquist (2/27/2008)


    Identity is evil.Using identity as PK is bad.Having a table like this:
    CREATE TABLE Countries (CountryID INT IDENTITY PRIMARY KEY, CountryName VARCHAR(50) NOT NULL)
    If I were Joe Celko I would ask you how do you prevent your table from containing CountryID CountryName 1 Canada 2 Canada 3 Sweden 4 Sweden etc If you want an incremental counter use:SELECT @nextid = coalesce(MAX(id), 0) + FROM tbl (UPDLOCK)This must be done in a transaction to prevent two processes gettingthe same ID./m

    If I were anyone but Joe Celko I would say you can easily solve this problem with:CREATE TABLE Countries (CountryID INT IDENTITY PRIMARY KEY, CountryName VARCHAR(50) NOT NULL UNIQUE);Oh wait a minute, I'm not Joe Celko!

    I don't like that because it claims the surrogate is the primary key, when it quite clearly isn't - at least assuming this table is a simple lookup for converting long primary key to short surrogate.  And I like it even less because it uses a 31 bit surrogate (31 not 32 because the defaults for the IDENTITY keyword are crazy) where 15 bits is more than enough (if one restricts "countries" to countries which are either UN members or recognised by the UN as no-member observer states 8 bits is plenty, so that for example "Isle of Man" is excluded; but if one includes countries like IoM, and also countries like England, Wales, and Scotland, and so on throughout the world, 8 bits is maybe not enough).  So I would would write it
    CREATE TABLE Countries (
        CountryID SMALLINT IDENTITY UNIQUE,
        CountryName VARCHAR(50) PRIMARY KEY
    );

    In the unlikely event that I would often want to transalate a consecutive bunch of surrogates to real primaries I would change that to
    CREATE TABLE Countries (
        CountryID SMALLINT IDENTITY UNIQUE CLUSTERED,
        CountryName VARCHAR(50) PRIMARY KEY NONCLUSTERED
    );

    This is a blast from the past!

    So let's look at this logically. WTH is a "primary key"? Quite simply it is a key (also known as a "candidate key") which is a column, or set of columns (note that they are not technically a "set", since order *does* matter at creation time, though you can use them out of order in join conditions and projections) that uniquely identifies rows in your table.

    So what's the difference between a "primary key" and any other "candidate key" on your table? Here it is in a nutshell:

    1) A primary key is always non-nullable; any other candidate key can be declared as nullable or non-nullable
    2) A primary key is declared using the PRIMARY KEY clause; any other candidate key is declared using the UNIQUE constraint clause.
    3) A table can have only one PRIMARY KEY; tables can have multiple candidate keys
    4) A primary key, if declared on a table with no clustered index, will default to be your clustered index; if the table is already a clustered index the primary key will default to nonclustered. A unique constraint always defaults to nonclustered unless you specify the CLUSTERED keyword, and the table has no clustered index.

    BTW, I realize that #4 is a LOT to remember, so your best bet is to ALWAYS specify the CLUSTERED or NONCLUSTERED keyword every time you create a primary key or unique constraint -- and you never have to worry about all those crazy rules. Another thing I would recommend is always naming your primary key and unique constraints. Give them short but meaningful names and it helps to follow a pattern. For primary keys, I usually go with PK_TableName, for instance.

    Now, whether you define a key as a primary key or a non-nullable unique constraint just doesn't matter. Other than the items above, special handling by a few external third-party tools, and some people's odd misperceptions, it just doesn't matter. Here's what you need to know about the code you wrote above:

    1) If you use an IDENTITY column, it is a "surrogate key". What this means is it is for internal use only. You should NEVER expose a surrogate key to an end user, nor to an external system. I have no doubt that there will be some people who read this and will want to argue the point, but I will make it really simple: the values in a surrogate key are subject to change on a whim. If your users depend on the identity value "1" to be "Canada" and you hardwire the value "2" for "Zimbabwe" into your client code, as soon as you rebuild your database later you will potentially throw all your dependent systems out of whack and piss off your users.

    2) If you use an IDENTITY column do not rely on monotonically increasing numbers with no gaps. You can, and eventually will, have gaps in the numbers. IDENTITY columns do not guarantee there will be no gaps between 1 and 100. What they do guarantee is that every row will get a unique value that increases on each INSERT.

    3) If you use a surrogate key, such as an IDENTITY column, you should always, always, always, have a candidate key declared with a unique constraint that defines the "business key". The business key is the set of columns containing business data (i.e., data that is relevant to your business users, systems and processes) that uniquely identifies each row. In your example above, the IDENTITY column you created is an internal surrogate key and the CountryName is the business key in this scenario.

    So why do you even want a surrogate key? Simplicity and efficiency. The database engine can compare two integer values much faster than it can compare two strings (also I would suggest NVARCHAR instead of VARCHAR in your example). Multiply that by 1,000,000 comparisons in a join and you have a significant performance difference. It also tends to make queries much simpler to write when you join on a single column, as opposed to 4, 5 or 6+ columns.

    Now, all that said, Countries are a HORRIBLE example for any of this. There is a freely available ISO country code list which provides full country names, unique two character and three character country codes *and* 3 digit numeric country codes. Every country has their own set of internationally recognized codes, including a numeric code, so a contrived IDENTITY column is completely unnecessary.

    So at the end of the day, whether you want to declare a surrogate key as your primary key or just throw a unique constraint on it is a matter of 6 in one hand and half a dozen in the other.

    A far more interesting and consequential conversation is where to put your clustered index.

  • Mike C - Monday, June 5, 2017 11:01 PM

    TomThomson - Monday, June 5, 2017 6:27 PM

    Mike C - Wednesday, February 27, 2008 2:22 PM

    michael.rosquist (2/27/2008)


    Identity is evil.Using identity as PK is bad.Having a table like this:
    CREATE TABLE Countries (CountryID INT IDENTITY PRIMARY KEY, CountryName VARCHAR(50) NOT NULL)
    If I were Joe Celko I would ask you how do you prevent your table from containing CountryID CountryName 1 Canada 2 Canada 3 Sweden 4 Sweden etc If you want an incremental counter use:SELECT @nextid = coalesce(MAX(id), 0) + FROM tbl (UPDLOCK)This must be done in a transaction to prevent two processes gettingthe same ID./m

    If I were anyone but Joe Celko I would say you can easily solve this problem with:CREATE TABLE Countries (CountryID INT IDENTITY PRIMARY KEY, CountryName VARCHAR(50) NOT NULL UNIQUE);Oh wait a minute, I'm not Joe Celko!

    I don't like that because it claims the surrogate is the primary key, when it quite clearly isn't - at least assuming this table is a simple lookup for converting long primary key to short surrogate.  And I like it even less because it uses a 31 bit surrogate (31 not 32 because the defaults for the IDENTITY keyword are crazy) where 15 bits is more than enough (if one restricts "countries" to countries which are either UN members or recognised by the UN as no-member observer states 8 bits is plenty, so that for example "Isle of Man" is excluded; but if one includes countries like IoM, and also countries like England, Wales, and Scotland, and so on throughout the world, 8 bits is maybe not enough).  So I would would write it
    CREATE TABLE Countries (
        CountryID SMALLINT IDENTITY UNIQUE,
        CountryName VARCHAR(50) PRIMARY KEY
    );

    In the unlikely event that I would often want to transalate a consecutive bunch of surrogates to real primaries I would change that to
    CREATE TABLE Countries (
        CountryID SMALLINT IDENTITY UNIQUE CLUSTERED,
        CountryName VARCHAR(50) PRIMARY KEY NONCLUSTERED
    );

    This is a blast from the past!

    So let's look at this logically. WTH is a "primary key"? Quite simply it is a key (also known as a "candidate key") which is a column, or set of columns (note that they are not technically a "set", since order *does* matter at creation time, though you can use them out of order in join conditions and projections) that uniquely identifies rows in your table.

    So what's the difference between a "primary key" and any other "candidate key" on your table? Here it is in a nutshell:

    1) A primary key is always non-nullable; any other candidate key can be declared as nullable or non-nullable
    2) A primary key is declared using the PRIMARY KEY clause; any other candidate key is declared using the UNIQUE constraint clause.
    3) A table can have only one PRIMARY KEY; tables can have multiple candidate keys
    4) A primary key, if declared on a table with no clustered index, will default to be your clustered index; if the table is already a clustered index the primary key will default to nonclustered. A unique constraint always defaults to nonclustered unless you specify the CLUSTERED keyword, and the table has no clustered index.

    BTW, I realize that #4 is a LOT to remember, so your best bet is to ALWAYS specify the CLUSTERED or NONCLUSTERED keyword every time you create a primary key or unique constraint -- and you never have to worry about all those crazy rules. Another thing I would recommend is always naming your primary key and unique constraints. Give them short but meaningful names and it helps to follow a pattern. For primary keys, I usually go with PK_TableName, for instance.

    Now, whether you define a key as a primary key or a non-nullable unique constraint just doesn't matter. Other than the items above, special handling by a few external third-party tools, and some people's odd misperceptions, it just doesn't matter. Here's what you need to know about the code you wrote above:

    1) If you use an IDENTITY column, it is a "surrogate key". What this means is it is for internal use only. You should NEVER expose a surrogate key to an end user, nor to an external system. I have no doubt that there will be some people who read this and will want to argue the point, but I will make it really simple: the values in a surrogate key are subject to change on a whim. If your users depend on the identity value "1" to be "Canada" and you hardwire the value "2" for "Zimbabwe" into your client code, as soon as you rebuild your database later you will potentially throw all your dependent systems out of whack and piss off your users.

    2) If you use an IDENTITY column do not rely on monotonically increasing numbers with no gaps. You can, and eventually will, have gaps in the numbers. IDENTITY columns do not guarantee there will be no gaps between 1 and 100. What they do guarantee is that every row will get a unique value that increases on each INSERT.

    3) If you use a surrogate key, such as an IDENTITY column, you should always, always, always, have a candidate key declared with a unique constraint that defines the "business key". The business key is the set of columns containing business data (i.e., data that is relevant to your business users, systems and processes) that uniquely identifies each row. In your example above, the IDENTITY column you created is an internal surrogate key and the CountryName is the business key in this scenario.

    So why do you even want a surrogate key? Simplicity and efficiency. The database engine can compare two integer values much faster than it can compare two strings (also I would suggest NVARCHAR instead of VARCHAR in your example). Multiply that by 1,000,000 comparisons in a join and you have a significant performance difference. It also tends to make queries much simpler to write when you join on a single column, as opposed to 4, 5 or 6+ columns.

    Now, all that said, Countries are a HORRIBLE example for any of this. There is a freely available ISO country code list which provides full country names, unique two character and three character country codes *and* 3 digit numeric country codes. Every country has their own set of internationally recognized codes, including a numeric code, so a contrived IDENTITY column is completely unnecessary.

    So at the end of the day, whether you want to declare a surrogate key as your primary key or just throw a unique constraint on it is a matter of 6 in one hand and half a dozen in the other.

    A far more interesting and consequential conversation is where to put your clustered index.

    Your point 2 is just plain wrong.    An identity column doesn't guarantee unique values. Neither does it guarantee increasing values (indeed one of the things one can specify for an identity column is decreasing values).  Someone messing around with with identity insert or dbcc checkident can easily create duplicates and/or make the order go all over the place.  I've been known to put a UNIQUE constraint on identity columns to make it a bit more difficult for anyone to screw things up like that.  If you've never had to clear up the mess after someone with DBA privileges did something stupid you can perhaps be excused for thinking identity has the magic properties you suggested, and you can certainly be recognised as very lucky.

    When responding to a comment which explicitly used VARCHAR(50) as the type for the primary key I chose not to change the type because that would just muddy the waters.  Since most of my database experience has been with projects involving multiple languages, I rarely choose VARCHAR myself. 

    ISO 3166-1 does not contain codes for all countries.  It recognises only those countries which are one (or more) or UN members, members of at least one UN Special Agency, or direct parties to the statue of the international court of justice, so some countries are omitted.  Also it is far from being a stable standard - during the 16 years after its first release after its first release there were 28 newsletters describing changes, 1 technical corrigendum, and 2 new editions; since then the changes (continuing at about the same rate) are no longer released, just published in the ISO on-line catalog. I don't think data that changes about twice per year is a good choice for a primary key and it's a terrible choice for a surrogate key. 
    Adding ISO-3166-2 possibly does include all countries, but the format of the ISO 3166-2 codes is different for each country and the codes may be alphabetic, numeric, or alphanumeric, and they may also be of constant or variable length.  So only a lunatic would use that for surrogate keys, and of course the instability of the codes for the countries included in ISO 3166-1 still has to be handled (as those countries don't have ISO-3166-1 codes) so it's still unsuitable for a primary key.  There's an issue with historical data too - probably one would have to include codes from ISO 3166-3 if one has sales records for customers who were in countries that no longer exist in ISO 3166-1 or ISO-3166-2.
    So rather than countries being a horrible example of looking for surrogate keys because the ISO Standard exists, the ISO country codes standard is a horrible example of what not to use as a  primary key or as a surrogate key.

    When I have a table with two unique keys, one of which is available to people (as opposed to the system internals) to use as the key denoting the row they want and the other isn't, I will insist on calling the externally available key the primary key.   That way there is perhaps less risk of a surrogate being made externally visible by someone who doesn't realise it is a surrogate (or thinks that surrogates can be external).  So I don't regard the distinction between primary and surrogate as irrelevant, as you appear to.  It's probably irrelevant in a lot of tables, though, since many tables won't be visible externally (ie they won't be entry points to the data).

    Tom

  • TomThomson - Tuesday, June 6, 2017 4:47 AM

    Mike C - Monday, June 5, 2017 11:01 PM

    TomThomson - Monday, June 5, 2017 6:27 PM

    Mike C - Wednesday, February 27, 2008 2:22 PM

    michael.rosquist (2/27/2008)


    Identity is evil.Using identity as PK is bad.Having a table like this:
    CREATE TABLE Countries (CountryID INT IDENTITY PRIMARY KEY, CountryName VARCHAR(50) NOT NULL)
    If I were Joe Celko I would ask you how do you prevent your table from containing CountryID CountryName 1 Canada 2 Canada 3 Sweden 4 Sweden etc If you want an incremental counter use:SELECT @nextid = coalesce(MAX(id), 0) + FROM tbl (UPDLOCK)This must be done in a transaction to prevent two processes gettingthe same ID./m

    If I were anyone but Joe Celko I would say you can easily solve this problem with:CREATE TABLE Countries (CountryID INT IDENTITY PRIMARY KEY, CountryName VARCHAR(50) NOT NULL UNIQUE);Oh wait a minute, I'm not Joe Celko!

    I don't like that because it claims the surrogate is the primary key, when it quite clearly isn't - at least assuming this table is a simple lookup for converting long primary key to short surrogate.  And I like it even less because it uses a 31 bit surrogate (31 not 32 because the defaults for the IDENTITY keyword are crazy) where 15 bits is more than enough (if one restricts "countries" to countries which are either UN members or recognised by the UN as no-member observer states 8 bits is plenty, so that for example "Isle of Man" is excluded; but if one includes countries like IoM, and also countries like England, Wales, and Scotland, and so on throughout the world, 8 bits is maybe not enough).  So I would would write it
    CREATE TABLE Countries (
        CountryID SMALLINT IDENTITY UNIQUE,
        CountryName VARCHAR(50) PRIMARY KEY
    );

    In the unlikely event that I would often want to transalate a consecutive bunch of surrogates to real primaries I would change that to
    CREATE TABLE Countries (
        CountryID SMALLINT IDENTITY UNIQUE CLUSTERED,
        CountryName VARCHAR(50) PRIMARY KEY NONCLUSTERED
    );

    This is a blast from the past!

    So let's look at this logically. WTH is a "primary key"? Quite simply it is a key (also known as a "candidate key") which is a column, or set of columns (note that they are not technically a "set", since order *does* matter at creation time, though you can use them out of order in join conditions and projections) that uniquely identifies rows in your table.

    So what's the difference between a "primary key" and any other "candidate key" on your table? Here it is in a nutshell:

    1) A primary key is always non-nullable; any other candidate key can be declared as nullable or non-nullable
    2) A primary key is declared using the PRIMARY KEY clause; any other candidate key is declared using the UNIQUE constraint clause.
    3) A table can have only one PRIMARY KEY; tables can have multiple candidate keys
    4) A primary key, if declared on a table with no clustered index, will default to be your clustered index; if the table is already a clustered index the primary key will default to nonclustered. A unique constraint always defaults to nonclustered unless you specify the CLUSTERED keyword, and the table has no clustered index.

    BTW, I realize that #4 is a LOT to remember, so your best bet is to ALWAYS specify the CLUSTERED or NONCLUSTERED keyword every time you create a primary key or unique constraint -- and you never have to worry about all those crazy rules. Another thing I would recommend is always naming your primary key and unique constraints. Give them short but meaningful names and it helps to follow a pattern. For primary keys, I usually go with PK_TableName, for instance.

    Now, whether you define a key as a primary key or a non-nullable unique constraint just doesn't matter. Other than the items above, special handling by a few external third-party tools, and some people's odd misperceptions, it just doesn't matter. Here's what you need to know about the code you wrote above:

    1) If you use an IDENTITY column, it is a "surrogate key". What this means is it is for internal use only. You should NEVER expose a surrogate key to an end user, nor to an external system. I have no doubt that there will be some people who read this and will want to argue the point, but I will make it really simple: the values in a surrogate key are subject to change on a whim. If your users depend on the identity value "1" to be "Canada" and you hardwire the value "2" for "Zimbabwe" into your client code, as soon as you rebuild your database later you will potentially throw all your dependent systems out of whack and piss off your users.

    2) If you use an IDENTITY column do not rely on monotonically increasing numbers with no gaps. You can, and eventually will, have gaps in the numbers. IDENTITY columns do not guarantee there will be no gaps between 1 and 100. What they do guarantee is that every row will get a unique value that increases on each INSERT.

    3) If you use a surrogate key, such as an IDENTITY column, you should always, always, always, have a candidate key declared with a unique constraint that defines the "business key". The business key is the set of columns containing business data (i.e., data that is relevant to your business users, systems and processes) that uniquely identifies each row. In your example above, the IDENTITY column you created is an internal surrogate key and the CountryName is the business key in this scenario.

    So why do you even want a surrogate key? Simplicity and efficiency. The database engine can compare two integer values much faster than it can compare two strings (also I would suggest NVARCHAR instead of VARCHAR in your example). Multiply that by 1,000,000 comparisons in a join and you have a significant performance difference. It also tends to make queries much simpler to write when you join on a single column, as opposed to 4, 5 or 6+ columns.

    Now, all that said, Countries are a HORRIBLE example for any of this. There is a freely available ISO country code list which provides full country names, unique two character and three character country codes *and* 3 digit numeric country codes. Every country has their own set of internationally recognized codes, including a numeric code, so a contrived IDENTITY column is completely unnecessary.

    So at the end of the day, whether you want to declare a surrogate key as your primary key or just throw a unique constraint on it is a matter of 6 in one hand and half a dozen in the other.

    A far more interesting and consequential conversation is where to put your clustered index.

    Your point 2 is just plain wrong.    An identity column doesn't guarantee unique values. Neither does it guarantee increasing values (indeed one of the things one can specify for an identity column is decreasing values).  Someone messing around with with identity insert or dbcc checkident can easily create duplicates and/or make the order go all over the place.  I've been known to put a UNIQUE constraint on identity columns to make it a bit more difficult for anyone to screw things up like that.  If you've never had to clear up the mess after someone with DBA privileges did something stupid you can perhaps be excused for thinking identity has the magic properties you suggested, and you can certainly be recognised as very lucky.

    When responding to a comment which explicitly used VARCHAR(50) as the type for the primary key I chose not to change the type because that would just muddy the waters.  Since most of my database experience has been with projects involving multiple languages, I rarely choose VARCHAR myself. 

    ISO 3166-1 does not contain codes for all countries.  It recognises only those countries which are one (or more) or UN members, members of at least one UN Special Agency, or direct parties to the statue of the international court of justice, so some countries are omitted.  Also it is far from being a stable standard - during the 16 years after its first release after its first release there were 28 newsletters describing changes, 1 technical corrigendum, and 2 new editions; since then the changes (continuing at about the same rate) are no longer released, just published in the ISO on-line catalog. I don't think data that changes about twice per year is a good choice for a primary key and it's a terrible choice for a surrogate key. 
    Adding ISO-3166-2 possibly does include all countries, but the format of the ISO 3166-2 codes is different for each country and the codes may be alphabetic, numeric, or alphanumeric, and they may also be of constant or variable length.  So only a lunatic would use that for surrogate keys, and of course the instability of the codes for the countries included in ISO 3166-1 still has to be handled (as those countries don't have ISO-3166-1 codes) so it's still unsuitable for a primary key.  There's an issue with historical data too - probably one would have to include codes from ISO 3166-3 if one has sales records for customers who were in countries that no longer exist in ISO 3166-1 or ISO-3166-2.
    So rather than countries being a horrible example of looking for surrogate keys because the ISO Standard exists, the ISO country codes standard is a horrible example of what not to use as a  primary key or as a surrogate key.

    When I have a table with two unique keys, one of which is available to people (as opposed to the system internals) to use as the key denoting the row they want and the other isn't, I will insist on calling the externally available key the primary key.   That way there is perhaps less risk of a surrogate being made externally visible by someone who doesn't realise it is a surrogate (or thinks that surrogates can be external).  So I don't regard the distinction between primary and surrogate as irrelevant, as you appear to.  It's probably irrelevant in a lot of tables, though, since many tables won't be visible externally (ie they won't be entry points to the data).

    You're right, I should have explicitly stated that "as long as you don't forcibly override the IDENTITY column and explicitly shove duplicate values in the IDENTITY column -- that is, if you use it as it was intended -- it gives you unique values." And yes, you can specify increasing or decreasing values. And yes, you can and will eventually get GAPS in between the entries.

    And yes as East and West Germany combine to form the single country of Germany you get changes to standards. You can probably expect change in ANY standard in which the underlying attributes are SUBJECT TO CHANGE.

    If you want to declare a dogmatic distinction between a PRIMARY KEY and a NON-NULL UNIQUE CONSTRAINT, that's your First Amendment Right.

    However, it's a DISTINCTION WITHOUT A DIFFERENCE, and quite honestly, a waste of time.

Viewing 10 posts - 121 through 130 (of 130 total)

You must be logged in to reply to this topic. Login to reply