﻿<?xml version='1.0' encoding='UTF-8'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel><title>SQLServerCentral / Editorials / SQLServerCentral.com  / Overusing Identities / Latest Posts</title><generator>InstantForum.NET v2.9.0</generator><description>SQLServerCentral</description><link>http://www.sqlservercentral.com/Forums/</link><webMaster>notifications@sqlservercentral.com</webMaster><lastBuildDate>Sun, 19 May 2013 11:15:59 GMT</lastBuildDate><ttl>20</ttl><item><title>RE: Overusing Identities</title><link>http://www.sqlservercentral.com/Forums/Topic874032-263-1.aspx</link><description>So, in debugging a report to find a needle in a haystack last week, I was reminded of this topic/thread. One of the LOB apps our PUD uses tracks Utility Poles, and handles Billing to companies who attach their items to the Poles (fiber, tv cable, etc). Unfortunately, the app was written using all surrogate keys, and especially troublesome is that all their attribute (lookup) tables also use surrogate keys (they all start with 'LU' below). So, to trace all the variable dependencies, it required queries to join against all the surrogates for the attribute tables such as...[code="sql"]SELECT   ...FROM  (((((((((Attachment 	INNER JOIN AttachmentSpace ON Attachment.AttachmentSpaceID=AttachmentSpace.AttachmentSpaceID) 	INNER JOIN LUAttachmentType ON Attachment.LUAttachmentTypeID=LUAttachmentType.LUAttachmentTypeID) 	INNER JOIN Permit ON Attachment.PermitID=Permit.PermitID) 	INNER JOIN Pole ON AttachmentSpace.PoleID=Pole.PoleID) 	INNER JOIN LUAttachmentSpaceSupportEquipType ON AttachmentSpace.LUAttachmentSpaceSupportEquipTypeID    =LUAttachmentSpaceSupportEquipType.LUAttachmentSpaceSupportEquipTypeID) 	INNER JOIN LUUnitOfMeasure ON AttachmentSpace.LUUnitOfMeasureID=LUUnitOfMeasure.LUUnitOfMeasureID) 	INNER JOIN ResourceLocation ON Pole.ResourceLocationID=ResourceLocation.ResourceLocationID) 	INNER JOIN LURegion ON ResourceLocation.LURegionID=LURegion.LURegionID) 	INNER JOIN LUSubRegion ON ResourceLocation.LUSubRegionID=LUSubRegion.LUSubRegionID) 	INNER JOIN Agreement ON Permit.AgreementID=Agreement.AgreementIDWHERE   ...[/code]...which isn't terribly inconvenient to trace/join/expose for a single view or query, but we were looking for a needle in a haystack (lets just say that this app is short on constraints at the dbms level, supposedly enforced at the UI level, but not so much :crazy: )  For every query we built, we had to create otherwise redundant joins to get at the variable data.Because of the numerous ways we were going to have to slice/dice the data to sort through the issue, it was far more efficient to at least arrest the surrogates from the attribute tables, and refactor the entity tables to keep us from going blind. On a separate copy and a bit of quick work recasting the data/modifying the schema, the above was reduced to...[code]SELECT ...FROM  ((((Attachment 	INNER JOIN AttachmentSpace ON Attachment.AttachmentSpaceID=AttachmentSpace.AttachmentSpaceID) 	INNER JOIN Permit ON Attachment.PermitID=Permit.PermitID) 	INNER JOIN Pole ON AttachmentSpace.PoleID=Pole.PoleID) 	INNER JOIN ResourceLocation ON Pole.ResourceLocationID=ResourceLocation.ResourceLocationID) 	INNER JOIN Agreement ON Permit.AgreementID=Agreement.AgreementIDWHERE  ...[/code]...which only hints at the amount of time it saved and the simplicity it added... far more than just reducing it from 10 JOINS to 5.As to the argument for the possibility of 'changing attribute values'... it's moot. Cascade updates on attribute tables such as these have a depth of one, hence are not recursive, nor overly taxing on the system when they are modified, which is rare.Simplicity is king.AAR, JM2c...  :cool: </description><pubDate>Thu, 01 Apr 2010 11:32:31 GMT</pubDate><dc:creator>Les Cardwell</dc:creator></item><item><title>RE: Overusing Identities</title><link>http://www.sqlservercentral.com/Forums/Topic874032-263-1.aspx</link><description>As usual, it doesn't matter what guidelines and best practices are it is the informed decision that is required to see where and when the application is suitable and, more crucially, when applying best practice is unsuitable.A case of man over machine (or rather man over man's current ability to automate machines).</description><pubDate>Fri, 12 Mar 2010 03:37:06 GMT</pubDate><dc:creator>Gary Varga</dc:creator></item><item><title>RE: Overusing Identities</title><link>http://www.sqlservercentral.com/Forums/Topic874032-263-1.aspx</link><description>[quote][b]Jeff Moden (3/10/2010)[/b][hr]I agree with that... reporting tables are typically denormalized especially when rebuilt on a regular basis.[/quote]I don't think David was suggesting denormalization. He was describing the benefits of using a business key as foreign key instead of a surrogate. What you use for foreign keys has nothing to do with normalization / denormalization.I would expect that if the table's contents has to be rebuilt on a regular basis then it is better [i]not[/i] to denormalize because that will normally be more expensive for data loads (because you have to generate the redundant data each time).</description><pubDate>Wed, 10 Mar 2010 16:08:17 GMT</pubDate><dc:creator>sqlvogel</dc:creator></item><item><title>RE: Overusing Identities</title><link>http://www.sqlservercentral.com/Forums/Topic874032-263-1.aspx</link><description>I agree with that... reporting tables are typically denormalized especially when rebuilt on a regular basis.So far as the 80 joins go, did you need all 80 joins for every report?</description><pubDate>Wed, 10 Mar 2010 15:50:11 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Overusing Identities</title><link>http://www.sqlservercentral.com/Forums/Topic874032-263-1.aspx</link><description>I inherited a system where the database designer believed that identity columns should be used as the primary key because joins were faster that way.This particular database design was a poster child for a good idea totally mis-applied.The database was an operational data store for reporting.  Lots of data.  Lots of tables.  A given table might have a hundred columns in it, eighty of which might be codes.  That's what the source system was doing, and that's what was required by the customer in this database.So, each of those 80 codes in the table was turned into a numeric fk column pointing to the corresponding code table pk column.  Each code table had an alphanumeric code column too.Of course, being a reporting system, the users didn't want to see 80 columns of meaningless id columns.  They wanted to see the code values.So, 500,000 data rows in the data table.  80 joins per row.  That's 500,000 * 80 = 40,000,000 joins when querying that data table, all to show rows with the 80 alpha codes in it that we started with as part of the initial data load.  The users didn't need to see the code descriptions, they were long-time military or retired military enlisted staff that could recite the codes and their meanings in their sleep.  Each of those 40,000,000 joins was individually faster than it would be with an alphanumeric code as the fk; but none of those 40,000,000 joins needed to be done at all with the alphanumeric fks! Had the original reporting table just used the 80 alpha codes as fk columns, the system would have been awesomely simpler to code and run way faster.  I know, because that's what I changed it to do.</description><pubDate>Wed, 10 Mar 2010 11:52:52 GMT</pubDate><dc:creator>david_wendelken</dc:creator></item><item><title>RE: Overusing Identities</title><link>http://www.sqlservercentral.com/Forums/Topic874032-263-1.aspx</link><description>[quote][b]roger.plowman (3/3/2010)[/b][hr]From an implementation standpoint, using SQL Server, their rigourous approach will produce sub-optimal results. Page fragmentation being the most glaring, along with the performance hit of having to constantly shuffle a clustered index. The whole point of clustering is *speed*, and it's always going to be faster to append a record than insert it.[/quote]I'm not sure what "rigorous" approach you are referring to but I said nothing at all about clustered indexes. I have been talking about candidate keys but I would hope you know that keys and clustered indexes are different and unrelated things. I don't believe I have said anything that will produce sub-optimal results. Quite the reverse.</description><pubDate>Wed, 03 Mar 2010 12:47:37 GMT</pubDate><dc:creator>sqlvogel</dc:creator></item><item><title>RE: Overusing Identities</title><link>http://www.sqlservercentral.com/Forums/Topic874032-263-1.aspx</link><description>[quote][b]roger.plowman (3/3/2010)[/b][hr]David and Len ([i]Les actually[/i] :cool:) seem to be from a mathematics background. From that perspective their points about the relational model are entirely valid.[/quote]Well, initially music on my part, but switched to CS (BIT, MIT, and working on DCS-DSS at present...only took me 30yrs in the trenches to decide what I wanted to be). Still, music and mathematics are so intertwined as to be transparent, and set theory, well... kissing cousins with RA.[quote]However. :)From an implementation standpoint, using SQL Server, their rigourous approach will produce sub-optimal results. Page fragmentation being the most glaring, along with the performance hit of having to constantly shuffle a clustered index. The whole point of clustering is *speed*, and it's always going to be faster to append a record than insert it. [/quote]Oh contraire... great book I often refer to... "Database Tuning" by Dennis Shasha... a must read for any aspiring DBA. [quote]In the same vein using a 4 byte key (assuming you can tolerate the 2GB limit of int32 (assuming positive only keys) is the clear winner in both space-consumption and processing speed over some multi-field of strings.[/quote]Clearly, long strings rarely make good candidate keys, but not necessarily because of length (hashing does wonders), but usually because they simply are not good candidates and are seldom immutable. In many cases, UID's are the only realistic option, short of stamping a class of such objects with a VIN, or inserting/implanting an RFID  :w00t:[quote]Theory is one thing, but reality always applies additional constraints. :)[/quote]Without a doubt...</description><pubDate>Wed, 03 Mar 2010 09:32:40 GMT</pubDate><dc:creator>Les Cardwell</dc:creator></item><item><title>RE: Overusing Identities</title><link>http://www.sqlservercentral.com/Forums/Topic874032-263-1.aspx</link><description>[quote][b]David Portas (3/2/2010)[/b][hr][quote][b]RalphWilson (3/2/2010)[/b][hr][quote]If you just arbitrarily do things like using IDENTITY_INSERT or reseed the Identity column, then, IMHO, you [u]deserve[/u] to have problems.  If you don't screw around with the Identity column, then it won't have those problems. ;-)[/quote]You mean you've never done those things? And you've never written buggy code either? In Murphy's world, if you have an IDENTITY column without a unique constraint then you can expect duplicate values sooner or later. IDENTITY is merely a means of generating a default value, not a way to guarantee uniqueness.[/quote]Dunno about Ralph, but *I* never ever do those things. Buggy code won't change identity column seeds, and the only time to use Identity_Insert is when you're porting legacy data to a new system--and it's dead simple to compare the old columns against the new ones to make sure no shenanigans occurred. Oh, and also? Unique constraints on an identity column go without saying. There would be no purpose to an identity column that wasn't used as the PK.David and Len seem to be from a mathematics background. From that perspective their points about the relational model are entirely valid.However. :)From an implementation standpoint, using SQL Server, their rigourous approach will produce sub-optimal results. Page fragmentation being the most glaring, along with the performance hit of having to constantly shuffle a clustered index. The whole point of clustering is *speed*, and it's always going to be faster to append a record than insert it. In the same vein using a 4 byte key (assuming you can tolerate the 2GB limit of int32 (assuming positive only keys) is the clear winner in both space-consumption and processing speed over some multi-field of strings.Likewise, from an *implementation* perspective the PK is (or should be) application transparent, unchanging, and unique. It's only purpose is to link rows in different tables. As a *link* you want the above characteristics. Now, for human retrieval purposes obviously you want some kind of natural key, preferably unique. Any given table can of course have multiple alternate keys that uniquely identify the record, but these keys will usually be mutable and random (meaning you don't want them clustered), large (meaning you don't want them as FK's) and not necessarily unique (meaning they can't be uniquely constrained).Theory is one thing, but reality always applies additional constraints. :)</description><pubDate>Wed, 03 Mar 2010 07:21:21 GMT</pubDate><dc:creator>roger.plowman</dc:creator></item><item><title>RE: Overusing Identities</title><link>http://www.sqlservercentral.com/Forums/Topic874032-263-1.aspx</link><description>As for the email address one, I (and a lot of other people) got dumped by an ex-free provider without any warning a few years ago. Instantly no access to the email account. Suddenly, I am registered at sites and identified by an email address that must change.This caused me a load of headaches. Some for those maintaining the websites. Some places were more helpful than others and some systems made it easier for me or their administrators than others.Either way, I think using a natural key as a PK when it is reasonable to expect it's value to change is not a good implementation. Suddenly, you either have to duplicate the natural key or lose that value in time where it might be appropriate to keep for auditing purposes - I guess it may only need to be duplicated where the historical value is required.PS Thanks again Steve. (Yep, this was one of the helpful places!!!)</description><pubDate>Tue, 02 Mar 2010 23:15:50 GMT</pubDate><dc:creator>Gary Varga</dc:creator></item><item><title>RE: Overusing Identities</title><link>http://www.sqlservercentral.com/Forums/Topic874032-263-1.aspx</link><description>[quote][b]RalphWilson (3/2/2010)[/b][hr][quote]If you just arbitrarily do things like using IDENTITY_INSERT or reseed the Identity column, then, IMHO, you [u]deserve[/u] to have problems.  If you don't screw around with the Identity column, then it won't have those problems. ;-)[/quote]You mean you've never done those things? And you've never written buggy code either? In Murphy's world, if you have an IDENTITY column without a unique constraint then you can expect duplicate values sooner or later. IDENTITY is merely a means of generating a default value, not a way to guarantee uniqueness.</description><pubDate>Tue, 02 Mar 2010 17:21:39 GMT</pubDate><dc:creator>sqlvogel</dc:creator></item><item><title>RE: Overusing Identities</title><link>http://www.sqlservercentral.com/Forums/Topic874032-263-1.aspx</link><description>[quote]Not if you reseed the column or use IDENTITY_INSERT to insert values. If an IDENTITY column doesn't have a unique constraint or index then it may not be unique. [/quote]As my granddaddy used to say,[quote]You can make things fool-proof but you can't make 'em dang-fool proof . . . some dangeed fool will work at finding a way to break anything you make.[/quote]If you just arbitrarily do things like using IDENTITY_INSERT or reseed the Identity column, then, IMHO, you [u]deserve[/u] to have problems.  If you don't screw around with the Identity column, then it won't have those problems. ;-)</description><pubDate>Tue, 02 Mar 2010 17:09:22 GMT</pubDate><dc:creator>RalphWilson</dc:creator></item><item><title>RE: Overusing Identities</title><link>http://www.sqlservercentral.com/Forums/Topic874032-263-1.aspx</link><description>[quote]However, from a design perspective, using natural keys, superkeys, and abstract keys derived from natural keys can produce powerful design benefits.[/quote]This is true, from a [u]design[/u] perspective.  However, the question arises, is this still true from an [u]implementation[/u] perspective.  For instance, I encountered a very well designed database that had made extensive use of PK-FK relationships to maintain data integrity.  It had also made very extensive use of GUIDs for the PK's in manyy instances and, in others, the designer had chosen to use "Natural Keys" such as the 3 line address, city, state/province (fully spelled out), 9 characters for the Zip/Postal code, and the GUID for the country . . . resulting in a total of something like 275 characters for the PK.  In the data model and database diagrams it looked great and, in the test environment, it worked slicker than goose grease on a plastic floor.  However, there were some, uh, _issues_ when it went into production.Years ago, I learned the hard way that "smart keys" (as we [u]used[/u] to call "natural keys" ;-) tend to look great in theory but become less "smart" in practice.  However, I also learned that those who favor Natural Keys will neither sway to their way of thinking nor be swayed away from their way of thinking by those who do not . . . and vice versa. ;-)  However, based on the fact that I have at least 9 different email addresses and at least 4 different accounts at each of EBay, Amazon, and a couple of other sites based on those email addresses.  I would contend that your theory that an email address is a good Natural Key candidate for a PK on enay of those sites may have a flaw in it.  If you want to find out my current address, which of my accounts would you bring up?  If you want to track my purchases, which of my email addresses would you use?  (Whic, by the way, is part of why those accounts exist, although another part is that at various times one or another of the accounts got locked out because of site issues. ;-)I end to favor the Identity columns for PK's and other columns or composites for Unique keys/indexes.  This has proven useful in my life.  Some favor Natural Keys as PKs and abhor Identity columns unless absolutely necessary and that has usually proven useful in [u]their[/u] lives.  As a UK friend of mine puts it, "Horses for courses." ;-)Anyone want discuss the allowance of NULLs? ;-)or, to quote from "War Games":"How about a nice game of chess?"</description><pubDate>Tue, 02 Mar 2010 16:26:35 GMT</pubDate><dc:creator>RalphWilson</dc:creator></item><item><title>RE: Overusing Identities</title><link>http://www.sqlservercentral.com/Forums/Topic874032-263-1.aspx</link><description>[quote][b]Les Cardwell (3/2/2010)[/b][hr]In context, it was in reponse to Steve's posit...[quote]...In the paper world, when someone generates a duplicate PO number it may or may not get corrected, and may or may not cause issues. In a database, it's a bigger issue as we try to build rules for how the computer manages data..."[/quote]Tersely speaking, the relational model does not distinguish between primary keys and other candidate keys as expressed through relational calculus/relational algebra, except that one candidate key is chosen as the Primary Key. However, from a design perspective, using natural keys, superkeys, and abstract keys derived from natural keys can produce powerful design benefits. If so inclined, read the 'Case Study'... [url=http://sites.google.com/a/whiteboxinc.com/www/anf.pdf]http://sites.google.com/a/whiteboxinc.com/www/anf.pdf [/url] ...which was an actual implementation of an ERP solution which allowed for stripping thousands of lines of transactional code through simplification and abstraction of PK/FK constructs. At least a good cure for insomnia :doze:[/quote]Again, as long as you can guarantee that whatever you choose to enforce relationships doesn't keep changing at the drop of a hat, sure - you can get some benefit from natural keys.  That said -continually having to maintain foreign keys because your natural key keeps changing faster that a ceiling fan can spin means you have a design problem.Depending on how much data you have, having your primary keys change could become a HUGE nightmare  (at a previous gig we spent most of a year ripping out a government-issued physician identifier touted to "never change", spread over 17Million patient records, 20 or so years of care, etc...).  If you're a fan of Murphy - the data guaranteed to NEVER change...will.Just be careful with the ideal data model, lest reality come crashing your house down.</description><pubDate>Tue, 02 Mar 2010 15:55:06 GMT</pubDate><dc:creator>Matt Miller (#4)</dc:creator></item><item><title>RE: Overusing Identities</title><link>http://www.sqlservercentral.com/Forums/Topic874032-263-1.aspx</link><description>[quote][b]Steve Cullen (3/2/2010)[/b][hr]An email address is a mutable attribute.  As such it would make a terrible primary key for a database.Just because you use a email address to login to Amazon or anywhere else, doesn't mean it's a primary key.[/quote]Les said it before I could. There is no difference between a primary key and a candidate key. They mean the same thing. The concept of designating a primary key at all is an entirely arbitrary distinction of no logical or practical significance.As I said in my first post, what really matters is that you have whatever candidate keys are required for the integrity of your data. On that basis an email address makes a fine candidate key if your business requires it to be unique (even though Amazon apparently doesn't!).</description><pubDate>Tue, 02 Mar 2010 13:30:51 GMT</pubDate><dc:creator>sqlvogel</dc:creator></item><item><title>RE: Overusing Identities</title><link>http://www.sqlservercentral.com/Forums/Topic874032-263-1.aspx</link><description>[quote][b]RalphWilson (3/2/2010)[/b][hr]Actually, A PK (or Primary Key) is the primary means [u]within the database[/u] for identifying the row in question.  Its main use is in PK-FK linkages or in the distinction of a given row from another given row which may be especially important in situations where there is a need to delete or update a given row.  PK's do [u]not[/u] "create business depndencies throughout the domain" but, rather, are [u]used[/u] to [u]enforce[/u] [b]data[/b] dependencies and database intergrity throughout the [u]database[/u].  The persistent concept of the PK [u]having[/u] to have meaning to the business side of the house is, IMHO, misplaced.  The PK needs to have meanng to the [u]database[/u] and, if it has meaning to the business side of the house, then that's "gravy".  [/quote]In context, it was in reponse to Steve's posit...[quote]...In the paper world, when someone generates a duplicate PO number it may or may not get corrected, and may or may not cause issues. In a database, it's a bigger issue as we try to build rules for how the computer manages data..."[/quote]Tersely speaking, the relational model does not distinguish between primary keys and other candidate keys as expressed through relational calculus/relational algebra, except that one candidate key is chosen as the Primary Key. However, from a design perspective, using natural keys, superkeys, and abstract keys derived from natural keys can produce powerful design benefits. If so inclined, read the 'Case Study'... [url=http://sites.google.com/a/whiteboxinc.com/www/anf.pdf]http://sites.google.com/a/whiteboxinc.com/www/anf.pdf [/url] ...which was an actual implementation of an ERP solution which allowed for stripping thousands of lines of transactional code through simplification and abstraction of PK/FK constructs. At least a good cure for insomnia :doze:</description><pubDate>Tue, 02 Mar 2010 12:48:29 GMT</pubDate><dc:creator>Les Cardwell</dc:creator></item><item><title>RE: Overusing Identities</title><link>http://www.sqlservercentral.com/Forums/Topic874032-263-1.aspx</link><description>I'm not sure I like email as a PK either. I might use it as a unique column, and display it, but since it is something that can easily change, it's not a change I want to propagate through my db when it changes. Or worse, when it's reused by the company.Email is a great attribute that you have no control over in your business process. Someone else assigns it, so it's not something I'd pick as the PK.Besides, in which circumstance does having email as a unique column and an identity as a PK cause a problem in your db?</description><pubDate>Tue, 02 Mar 2010 12:38:41 GMT</pubDate><dc:creator>Steve Jones - SSC Editor</dc:creator></item><item><title>RE: Overusing Identities</title><link>http://www.sqlservercentral.com/Forums/Topic874032-263-1.aspx</link><description>An email address is a mutable attribute.  As such it would make a terrible primary key for a database.Just because you use a email address to login to Amazon or anywhere else, doesn't mean it's a primary key.</description><pubDate>Tue, 02 Mar 2010 12:24:21 GMT</pubDate><dc:creator>Steve Cullen</dc:creator></item><item><title>RE: Overusing Identities</title><link>http://www.sqlservercentral.com/Forums/Topic874032-263-1.aspx</link><description>[quote][b]RalphWilson (3/2/2010)[/b][hr]Excuse me but what DBMS allows you to create a P that is NO unique and why are you using it?In most DBS's that I have used, a PK, [i]by definition[/i], has to be unique although it may not be used as a [u]clustered[/u] index.Also, [i]by definition[/i], an IDentity column is unique, whether it is used as a PK, a Unique Key, a Unique Index, or just as a column.Now, if you set up an index on your PoleID and create an index on it but do not enforce the uniqueness with the index, that is jsut really sloppy design and implementation.  Even if there are identical PoleID's out there in the field, there has [i]got[/i] to be some other way (such as a county designator along with the PoleID) to get a unique result.  If so, then the index is simply not complete enough to be made unique.[/quote]LOL... ya think?   Most of the db's I come across are... well... crap  :)In that case, they defined the PoleID as a Surrogate/PK/UID, and PoleNumber as just a varchar... not a PK, even though it is the 'natural' PK...and no constraints to make it so. Part of the point is that just because every table has a Surrogate/UID/PK, and joins are instantiated within the dbms (vs ad-hoc), does not make the db 'normal', and in fact, creates far more work than using 'natural' PK's from the db architecture through to the UI...adding significant amounts of complexity that are otherwise not needed to achieve 'normal'.</description><pubDate>Tue, 02 Mar 2010 12:14:17 GMT</pubDate><dc:creator>Les Cardwell</dc:creator></item><item><title>RE: Overusing Identities</title><link>http://www.sqlservercentral.com/Forums/Topic874032-263-1.aspx</link><description>[quote][b]Les Cardwell (3/1/2010)[/b][hr][quote][b]Steve Jones - Editor (3/1/2010)[/b][hr]...A PK has 'meaning', and creates business dependencies throughout the domain...Codd is sputtering... :pinch:[/quote]Actually, A PK (or Primary Key) is the primary means [u]within the database[/u] for identifying the row in question.  Its main use is in PK-FK linkages or in the distinction of a given row from another given row which may be especially important in situations where there is a need to delete or update a given row.  PK's do [u]not[/u] "create business depndencies throughout the domain" but, rather, are [u]used[/u] to [u]enforce[/u] [b]data[/b] dependencies and database intergrity throughout the [u]database[/u].  The persistent concept of the PK [u]having[/u] to have meaning to the business side of the house is, IMHO, misplaced.  The PK needs to have meanng to the [u]database[/u] and, if it has meaning to the business side of the house, then that's "gravy".  Now, don't get me wrong, there is a need for some sort of means for the business side of the house to identify specific rows.  However, [u]becuase[/u] the business side of the house is composed of [u]humans[/u], what the busines side considers to be "unique" (e.g. an Invoice Number or a Purchase Order Number or even a VIN) may or may not be unique but is very possible not immutable.  In other words, even if it does happen to be unique it may wind up changing in a given row because of a correction due to a data entry error.That is why I tend to assign Identity columns as the PK and then make other columns either a composite index or an index and, if I need them to be unique, I may append the Identity column if I have any reason to suspect that a column may not remain unique.As for clustering by PK, which is often the default in SQL databases, using an Identity column will ensure that you do not have new rows being inserted between existing rows (unless you are forced to use GUIDs which are, essentially, random numbers).  That means that you won't be having your table periodically resorting or else remaining fragmented.The definition of a clustered index, i.e. it is a [u]physical[/u] sort of the table, means that anything other than a uniformly ascending Key is going to result in either the fragmentation or the restorting of the table.</description><pubDate>Tue, 02 Mar 2010 11:24:57 GMT</pubDate><dc:creator>RalphWilson</dc:creator></item><item><title>RE: Overusing Identities</title><link>http://www.sqlservercentral.com/Forums/Topic874032-263-1.aspx</link><description>[quote][b]David Portas (3/2/2010)[/b][hr][quote][b]RalphWilson (3/2/2010)[/b][hr][i]An email address is a perfectly reasonable unique identifier provided you have the ability to change it.[/i]David,You have [u][b]got[/b][/u] to be kindding.  Not [u]everyone[/u] has an email address and some people share an email address . . . either of which breaks the "unique identifier" aspect of an email address.[/quote]Amazon successfully uses my email address as a unique identifier. So does MS Passport and so do other web sites I use. The email address uniquely identifies my account because the site won't allow two accounts with the same email address. Whether the address is shared or not is irrelevant. As far as Amazon's business process is concerned the email address still identifies exactly ONE account, which is all that the business process requires.You are correct that an email address does not uniquely identify a [i]person[/i] but that isn't the issue because the business process does not require a person to be identified - it only needs to identify an [i]account[/i]. If it did need to identify people then other attributes would be used as well - such as a name for example.[quote]Also, [i]by definition[/i], an IDentity column is unique, whether it is used as a PK, a Unique Key, a Unique Index, or just as a column.[/quote]Not if you reseed the column or use IDENTITY_INSERT to insert values. If an IDENTITY column doesn't have a unique constraint or index then it may not be unique.[/quote]Just an aside, but Amazon has mucked up my account and I actually have two accounts with the same email address, just different passwords. Evidently, they don't have a unique constraint on their database.</description><pubDate>Tue, 02 Mar 2010 11:21:26 GMT</pubDate><dc:creator>Grant Fritchey</dc:creator></item><item><title>RE: Overusing Identities</title><link>http://www.sqlservercentral.com/Forums/Topic874032-263-1.aspx</link><description>[quote][b]RalphWilson (3/2/2010)[/b][hr][i]An email address is a perfectly reasonable unique identifier provided you have the ability to change it.[/i]David,You have [u][b]got[/b][/u] to be kindding.  Not [u]everyone[/u] has an email address and some people share an email address . . . either of which breaks the "unique identifier" aspect of an email address.[/quote]Amazon successfully uses my email address as a unique identifier. So does MS Passport and so do other web sites I use. The email address uniquely identifies my account because the site won't allow two accounts with the same email address. Whether the address is shared or not is irrelevant. As far as Amazon's business process is concerned the email address still identifies exactly ONE account, which is all that the business process requires.You are correct that an email address does not uniquely identify a [i]person[/i] but that isn't the issue because the business process does not require a person to be identified - it only needs to identify an [i]account[/i]. If it did need to identify people then other attributes would be used as well - such as a name for example.[quote]Also, [i]by definition[/i], an IDentity column is unique, whether it is used as a PK, a Unique Key, a Unique Index, or just as a column.[/quote]Not if you reseed the column or use IDENTITY_INSERT to insert values. If an IDENTITY column doesn't have a unique constraint or index then it may not be unique.</description><pubDate>Tue, 02 Mar 2010 11:13:38 GMT</pubDate><dc:creator>sqlvogel</dc:creator></item><item><title>RE: Overusing Identities</title><link>http://www.sqlservercentral.com/Forums/Topic874032-263-1.aspx</link><description>Les,[i]The real atrocity is when one uses an identity, but fails to make the PK unique. [/i]Excuse me but what DBMS allows you to create a P that is NO unique and why are you using it?In most DBS's that I have used, a PK, [i]by definition[/i], has to be unique although it may not be used as a [u]clustered[/u] index.Also, [i]by definition[/i], an IDentity column is unique, whether it is used as a PK, a Unique Key, a Unique Index, or just as a column.Now, if you set up an index on your PoleID and create an index on it but do not enforce the uniqueness with the index, that is jsut really sloppy design and implementation.  Even if there are identical PoleID's out there in the field, there has [i]got[/i] to be some other way (such as a county designator along with the PoleID) to get a unique result.  If so, then the index is simply not complete enough to be made unique.</description><pubDate>Tue, 02 Mar 2010 10:50:28 GMT</pubDate><dc:creator>RalphWilson</dc:creator></item><item><title>RE: Overusing Identities</title><link>http://www.sqlservercentral.com/Forums/Topic874032-263-1.aspx</link><description>[i]An email address is a perfectly reasonable unique identifier provided you have the ability to change it.[/i]David,You have [u][b]got[/b][/u] to be kindding.  Not [u]everyone[/u] has an email address and some people share an email address . . . either of which breaks the "unique identifier" aspect of an email address.  You might as well say that a First, Middle, &amp; Last Name or a telephone number (even including the country code) form a "reasonable unique identifier"!</description><pubDate>Tue, 02 Mar 2010 10:41:48 GMT</pubDate><dc:creator>RalphWilson</dc:creator></item><item><title>RE: Overusing Identities</title><link>http://www.sqlservercentral.com/Forums/Topic874032-263-1.aspx</link><description>[quote][b]Steve Jones - Editor (3/1/2010)[/b][hr]I think Roger has put things well. I'm not saying you don't have a PK. I'm saying identities work well in that role.[/quote]I agree that they can, just that it's not optimal, especially when one starts abstracting relations. The real atrocity is when one uses an identity, but fails to make the PK unique. One here is that they used identities throughout, and for example, a 'Poles' table has a Pole_Nbr attribute, but it's not unique. We now have duplicate Poles throughout the district :(  Yet another bit of magic to be levied in its resolution... :cool:</description><pubDate>Tue, 02 Mar 2010 10:34:51 GMT</pubDate><dc:creator>Les Cardwell</dc:creator></item><item><title>RE: Overusing Identities</title><link>http://www.sqlservercentral.com/Forums/Topic874032-263-1.aspx</link><description>[quote][b]David Portas (3/2/2010)[/b][hr][quote][b]Gary Varga (3/1/2010)[/b][hr]A great example of this is when systems use email addresses as the ID. Unless you maintain your own domain, all it takes is a change in company (even a rebranding) or ISP and you might no longer have access to that email address.Sure we can blame the users, Business Analysts, application developers, database developers, user acceptance testers and such like but that doesn't resolve this type of issue. Change is very expensive in systems.[/quote]That kind of change isn't necessarily expensive if it is foreseen by the designers. An email address is a perfectly reasonable unique identifier provided you have the ability to change it. Changing an email address key is ultimately a tiny change if the system is designed for it.[/quote]I don't disagree with David's point, however, I guess part of what I was trying to say, and didn't explicitly, is that sometimes requirements are not fully captured and information is used as keys in the technical implementation when the consequences of that implementation on the operational use of the data is not fully understood.</description><pubDate>Tue, 02 Mar 2010 01:20:18 GMT</pubDate><dc:creator>Gary Varga</dc:creator></item><item><title>RE: Overusing Identities</title><link>http://www.sqlservercentral.com/Forums/Topic874032-263-1.aspx</link><description>[quote][b]Gary Varga (3/1/2010)[/b][hr]A great example of this is when systems use email addresses as the ID. Unless you maintain your own domain, all it takes is a change in company (even a rebranding) or ISP and you might no longer have access to that email address.Sure we can blame the users, Business Analysts, application developers, database developers, user acceptance testers and such like but that doesn't resolve this type of issue. Change is very expensive in systems.[/quote]That kind of change isn't necessarily expensive if it is foreseen by the designers. An email address is a perfectly reasonable unique identifier provided you have the ability to change it. Changing an email address key is ultimately a tiny change if the system is designed for it.</description><pubDate>Tue, 02 Mar 2010 00:28:48 GMT</pubDate><dc:creator>sqlvogel</dc:creator></item><item><title>RE: Overusing Identities</title><link>http://www.sqlservercentral.com/Forums/Topic874032-263-1.aspx</link><description>I am amazed how many times that users have clearly identified unique identifiers which, during their day to day operation, do not always get set, change or perhaps do not get created from the outset in all conditions. Often this is a rare circumstance but it only needs to occur once to stuff both the DB and the application.A great example of this is when systems use email addresses as the ID. Unless you maintain your own domain, all it takes is a change in company (even a rebranding) or ISP and you might no longer have access to that email address.Sure we can blame the users, Business Analysts, application developers, database developers, user acceptance testers and such like but that doesn't resolve this type of issue. Change is very expensive in systems.</description><pubDate>Mon, 01 Mar 2010 23:21:47 GMT</pubDate><dc:creator>Gary Varga</dc:creator></item><item><title>RE: Overusing Identities</title><link>http://www.sqlservercentral.com/Forums/Topic874032-263-1.aspx</link><description>Question?I followed an advice when formatting hard disks for SQL servers, I should choose the allocation unit size to match the SQL extent size is 64KB (another reference: Disk Partition Alignment Best Practices for SQL Server, "...whether the cluster size is the NTFS default of 4,096 bytes or has been explicitly defined at 64 KB, which is a best practice for SQL Server.", http://msdn.microsoft.com/en-us/library/dd758814.aspx).Am I right in thinking that identity, GUID; primary and indexed columns, they all would be better in performance if their data type size matches multiples of 8 or so?  Not that the size of identity and GUID could be changed.In general, unless the business requirements say otherwise, data type size should be in multiples of 8?  Or this is another thread for a day?TIA.</description><pubDate>Mon, 01 Mar 2010 20:09:04 GMT</pubDate><dc:creator>Open Minded</dc:creator></item><item><title>RE: Overusing Identities</title><link>http://www.sqlservercentral.com/Forums/Topic874032-263-1.aspx</link><description>[quote][b]TheSQLGuru (3/1/2010)[/b]...that [b]Celko[/b] character...[/quote]Oh, no!  His GOOGLE search bot is sure to find this thread now!</description><pubDate>Mon, 01 Mar 2010 19:00:10 GMT</pubDate><dc:creator>Michael Valentine Jones</dc:creator></item><item><title>RE: Overusing Identities</title><link>http://www.sqlservercentral.com/Forums/Topic874032-263-1.aspx</link><description>[quote]Codd is sputtering...[/quote]Who is this Codd character??  Probably someone related to that Celko character! :w00t:</description><pubDate>Mon, 01 Mar 2010 16:20:39 GMT</pubDate><dc:creator>TheSQLGuru</dc:creator></item><item><title>RE: Overusing Identities</title><link>http://www.sqlservercentral.com/Forums/Topic874032-263-1.aspx</link><description>{edit}... never mind.  Bad post.</description><pubDate>Mon, 01 Mar 2010 16:02:19 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Overusing Identities</title><link>http://www.sqlservercentral.com/Forums/Topic874032-263-1.aspx</link><description>I think Roger has put things well. I'm not saying you don't have a PK. I'm saying identities work well in that role.</description><pubDate>Mon, 01 Mar 2010 15:12:45 GMT</pubDate><dc:creator>Steve Jones - SSC Editor</dc:creator></item><item><title>RE: Overusing Identities</title><link>http://www.sqlservercentral.com/Forums/Topic874032-263-1.aspx</link><description>[quote][b]Les Cardwell (3/1/2010)[/b][hr]Of course it creates issues... I'm dealing with a wagon load of such right now due to the fact that the legacy system did not use a WO# as a PK, and in converting to a new system (JDE1 - ugg), we're left with resolving. Not to mention the times that an engineer was using one WO#, logging costs, only to find that it was a dup... who's transactions had then posted to all other related systems. Can one say 'job security'?A PK has 'meaning', and creates business dependencies throughout the domain...Codd is sputtering... :pinch:[/quote]I think people are getting caught up in semantics. For me the PK is *purely* about linking records between tables and has no "meaning" beyond that.  For performance reasons and others you want that PK as small as possible, guaranteed unique, and never repeated. An identity meets those needs admirably. It also prevents page fragmentation because records are never inserted, only appended (assuming the PK is clustered, which it probably should be).A unique natural key should be *marked* unique. That way you can't get duplicate work orders or whatever. That, after all, is the *point* of a unique constraint. But it does NOT have to be the PK to do that.As for the arbitrary quality of natural keys, yes, and what's your point? :) That's a *good* reason to make them a non-clustered index, that way if you have to change them (like an ISBN) you won't be as constrained. Set up properly and your code can handle the change without modification.Always assuming the key doesn't change from integer to string or some such...</description><pubDate>Mon, 01 Mar 2010 15:09:55 GMT</pubDate><dc:creator>roger.plowman</dc:creator></item><item><title>RE: Overusing Identities</title><link>http://www.sqlservercentral.com/Forums/Topic874032-263-1.aspx</link><description>[quote][b]Steve Jones - Editor (3/1/2010)[/b][hr]It's not a question of the origin of the number, but the fact that it's arbitrary-generated, and as such isn't necessarily a PK. In the paper world, when someone generates a duplicate PO number it may or may not get corrected, and may or may not cause issues. In a database, it's a bigger issue as we try to build rules for how the computer manages data.If the key is only used in the db, and used to manage data and maintain links and integrity, is it an issue?[/quote]Of course it creates issues... I'm dealing with a wagon load of such right now due to the fact that the legacy system did not use a WO# as a PK, and in converting to a new system (JDE1 - ugg), we're left with resolving. Not to mention the times that an engineer was using one WO#, logging costs, only to find that it was a dup... who's transactions had then posted to all other related systems. Can one say 'job security'?A PK has 'meaning', and creates business dependencies throughout the domain...Codd is sputtering... :pinch:</description><pubDate>Mon, 01 Mar 2010 14:43:45 GMT</pubDate><dc:creator>Les Cardwell</dc:creator></item><item><title>RE: Overusing Identities</title><link>http://www.sqlservercentral.com/Forums/Topic874032-263-1.aspx</link><description>[quote][b]Steve Jones - Editor (3/1/2010)[/b][hr]If the key is only used in the db, and used to manage data and maintain links and integrity, is it an issue?[/quote]It is a most important issue for the business user, yes. The business user needs to be able to identify in the real world the things which the database is supposed to record for her. A surrogate key does not do that. A business key does. Without a business key the data in the database is probably useless - or at least ambiguous.</description><pubDate>Mon, 01 Mar 2010 14:41:01 GMT</pubDate><dc:creator>sqlvogel</dc:creator></item><item><title>RE: Overusing Identities</title><link>http://www.sqlservercentral.com/Forums/Topic874032-263-1.aspx</link><description>the basic rule is that if bad data is possible to exist by database constraints (even if it shouldn't by business rules) somehow it will get into your databaseif you have a natural key that should be unique, then make it uniquebut for a primary key I tend to always use a surogate key, as this will always be a single column Int for smaller indexes and fast joining, and to allow for changes to business logic that might affect the uniqueness of the natural keys</description><pubDate>Mon, 01 Mar 2010 14:40:59 GMT</pubDate><dc:creator>Daniel Hallam</dc:creator></item><item><title>RE: Overusing Identities</title><link>http://www.sqlservercentral.com/Forums/Topic874032-263-1.aspx</link><description>It's not a question of the origin of the number, but the fact that it's arbitrary-generated, and as such isn't necessarily a PK. In the paper world, when someone generates a duplicate PO number it may or may not get corrected, and may or may not cause issues. In a database, it's a bigger issue as we try to build rules for how the computer manages data.If the key is only used in the db, and used to manage data and maintain links and integrity, is it an issue?</description><pubDate>Mon, 01 Mar 2010 14:34:56 GMT</pubDate><dc:creator>Steve Jones - SSC Editor</dc:creator></item><item><title>RE: Overusing Identities</title><link>http://www.sqlservercentral.com/Forums/Topic874032-263-1.aspx</link><description>[quote]So many business  numbers we use are made up. Granted there are items like an ISBN that are regulated, and should provide a strong natural key, but even those change over time. ISBNs have changed (Grown) in the time I've been in publishing.I am not arguing that natural keys don't exist or that they shouldn't be used, but there are many time in business when you find something that appears to be a natural key, but isn't. Especially across time.[/quote]What do you mean by "natural key" in this case? I much prefer the term business key but natural key means the same. It means an identifier used by the business / end user that has some external meaning outside the database. Whether it is made up or not is pretty irrelevant and arbitrary it seems to me. Ultimately all names and numbers are symbols invented by humans. I don't see why it is useful to make any distinction about where keys originate from. If it is intended to be used by the business to identify something then de facto it is a business key (= "natural" if you prefer).By contrast a surrogate is a key which has no external meaning at all. It is never used outside the database.</description><pubDate>Mon, 01 Mar 2010 14:27:21 GMT</pubDate><dc:creator>sqlvogel</dc:creator></item><item><title>RE: Overusing Identities</title><link>http://www.sqlservercentral.com/Forums/Topic874032-263-1.aspx</link><description>[quote][b]Steve Jones - Editor (3/1/2010)[/b][hr]Shocked? What's a natural key? A PO number? Ever talked to someone about those? they're made up, usually by someone that keeps a spreadsheet, or even a piece of paper on a desk. Employee numbers? made up completely. SSNs, some logic, but essentially made up, and not guaranteed to be unique.  SKUs? sequential numbers applied for, same for ISBNs.So many business  numbers we use are made up. Granted there are items like an ISBN that are regulated, and should provide a strong natural key, but even those change over time. ISBNs have changed (Grown) in the time I've been in publishing.[/quote]LOL - well, Invoice#'s, PO#'s, and the like [i]should[/i] be the PK (the PK describes the attributes within a given tuple - or rather, the attributes describe the PK, and only the PK - e.g. all attributes in an invoice record describe that Invoice#). Employee numbers - same, whether system generated or manually derived. SSN's suffer from data integrity on a larger scale (and now have PC issues as well), so are relegated to attributes of a person, especially since all persons do not have SSN's. Rather, persons usually exist in a class context (Customers, Vendors, etc) and therefore are most appropriately relegated to surrogate PK's, especially given the data-correctness and uniqueness aspects of their name attributes. Then there are those PK's that [i]might[/i] change, and if it's decided that those that change [i]can[/i] be mutable (e.g. ISBN), it's the reason we have Cascade RI options. And 'Attribute tables' (UDC's, etc) pose signficant tertiary join problems when saddled with surrogate keys, adding significant amounts of transactional code to what is otherwise a simple 'lookup', especially when multiple dependencies exist in creating an assembly item record with attribute dependencies (e.g. -'Pears', 'Anjou', 'Box', '55ct', 'Size4', 'GradeA' - where each subsequent attribute is dependent on the one prior) As to identity keys being ordered and clustered... kinda nutz when one realizes that unless clustered on another key, the tuples alread exist as clustered unless the identity key is renumbered somewhere along the line.AAR, JM2c :)</description><pubDate>Mon, 01 Mar 2010 14:19:13 GMT</pubDate><dc:creator>Les Cardwell</dc:creator></item><item><title>RE: Overusing Identities</title><link>http://www.sqlservercentral.com/Forums/Topic874032-263-1.aspx</link><description>[quote][b]Les Cardwell (3/1/2010)[/b][hr]Steve,I'm shocked! :w00t:  The only purist rationalization for a surrogate PK is when no 'natural key' exists. Otherwise, it's just a throwback to mainframe days when files used data keys to reference (point) records, and at the least, a corruption/minimalization of a relational model. So much for unordered relations :crazy:[/quote]Shocked? What's a natural key? A PO number? Ever talked to someone about those? they're made up, usually by someone that keeps a spreadsheet, or even a piece of paper on a desk. Employee numbers? made up completely. SSNs, some logic, but essentially made up, and not guaranteed to be unique.  SKUs? sequential numbers applied for, same for ISBNs.So many business  numbers we use are made up. Granted there are items like an ISBN that are regulated, and should provide a strong natural key, but even those change over time. ISBNs have changed (Grown) in the time I've been in publishing.I am not arguing that natural keys don't exist or that they shouldn't be used, but there are many time in business when you find something that appears to be a natural key, but isn't. Especially across time.An identity is different than a record number or locator. It has nothing to do with physical location, or even insertion order, and you must be aware of that.</description><pubDate>Mon, 01 Mar 2010 13:25:00 GMT</pubDate><dc:creator>Steve Jones - SSC Editor</dc:creator></item></channel></rss>