Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase ««12345»»»

Overusing Identities Expand / Collapse
Author
Message
Posted Monday, March 1, 2010 6:27 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 2:33 PM
Points: 37,075, Visits: 31,633
Filipe (3/1/2010)
Actually I must disagree and say that Identity columns are nice and convinient but can be very bad as well. Why would someone define an identity column on a table that already has another unique key, specially if that column is an int or bigint?
Recently I had living proof that is not a good practice.
I have one database that was developed with the maxim that identities are never enough, and this one table had a natural key (unique, int) that was theonly field used in the application, and had the identity, that was the clustered PK.
I started noticing dead-locks on that table, and the dead-lock was between updates on the PK and the unique key. Since all access to the table was via the UK, when deleting from that table the order of the locks was the reverse of the selects, and it resulted in dead-locks. Getting rid of the PK (and the identity all together) got rid of the problem as well.


Did you move the PK to the remaining column? Same question for the Clustered Index. Also, did this just happen to be a sequence table?


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #874381
Posted Monday, March 1, 2010 6:30 AM
SSC Journeyman

SSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC Journeyman

Group: General Forum Members
Last Login: Wednesday, September 10, 2014 2:16 PM
Points: 94, Visits: 578
I agree with Gail. I think the overuse of identities is a problem. Too often I come across data integrity issues caused by the theory of "identity on every table" rather than utilizing the keys from the parent table as part of the child table. Even if the parent table PK is an identity, the child table can benefit (and subsequent grandchildren) from carrying the parent PK. Integrity improves and simplifies queries - and we all know developers need all the help they can get.

Cheers
http://twitter.com/widba
http://widba.blogspot.com/
Post #874382
Posted Monday, March 1, 2010 6:35 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Yesterday @ 10:10 AM
Points: 176, Visits: 294
Yes, moved PK and cluster to natural key column, and no, this is not a sequence table.
Post #874385
Posted Monday, March 1, 2010 6:43 AM


SSCoach

SSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoach

Group: General Forum Members
Last Login: Today @ 1:03 PM
Points: 15,729, Visits: 28,132
I'm going to go with Gail & David, there's nothing all that wrong with using Identities as your PK as long as you've also defined the natural key through a unique constraint on the table. Otherwise, it's a recipe for data disaster.

I actually lean towards a design model these days that uses the Identity on all the key tables, and then uses compound keys on all the child and relationship tables. Those tables don't get an identity. I've found, after lots of testing, that most of the data access at our company is pretty hiearchical and building the tables in this way provides for the clustered index being the best access path to the data.

We've also been growing quite a few systems with code that writes code (ORM flavors, nHibernate, CRM, SharePoint,etc.) which frequently leads to GUIDs. The sequential ones work very well. They're a bit fatter than your basic integer, so you suffer in the number of pages used, but other than that, the sequential GUIDs work very well. The random GUIDs on the other hand... they're a royal pain.


----------------------------------------------------
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood..." Theodore Roosevelt
The Scary DBA
Author of: SQL Server 2012 Query Performance Tuning
SQL Server 2008 Query Performance Tuning Distilled
and
SQL Server Execution Plans

Product Evangelist for Red Gate Software
Post #874389
Posted Monday, March 1, 2010 6:55 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Friday, September 5, 2014 11:04 AM
Points: 4, Visits: 60
I tend to focus on High Performance databases where performance is critical. Aaron Bertrand suggests skipping a PK even while having a cluster in his ActivityLog example. From a performance aspect not having a PK strikes this writer as foolish. I tend to like the "ID INT IDENTITY(1,1) PRIMARY KEY," style since it takes half the index space as making the Datetime a PK. It tends to be a trade off, I like to use more storage space for faster performance, some people will trade performance for a little cost savings.
Best wishes,
Barry O’Connell
Post #874397
Posted Monday, March 1, 2010 7:51 AM


SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Monday, September 9, 2013 8:32 AM
Points: 108, Visits: 166
Steve said:
But is overusing of identity a big problem? I tend to see most people building databases as beginning to intermediate data modelers. I'd probably include myself in that group as well, and rather than have them get confused about what a good natural key is, or building a complicated complex key, I think the default practice of just sticking a surrogate identity column on the table as the PK is a good idea. It's simple, and in the absence of knowing better, or having a reason not to do it, I think it's a good practice.


Are we talking about a profession or a hobby?

Steve, I'm a bit surprised at this statement from you, since you're not just another procedural developer lost in a declarative world. You've been writing this excellent newsletter for years, why are your modeling skills "beginning to intermediate"?

Are we talking about a profession or a hobby? For some reason, DDL and DML are the two places that hobby-level skills are tolerated (even encouraged) in information systems profession.

If a person is confused about what the natural key is (or about when there is no useful natural key, such as with individuals [assuming we can't sequence their DNA and rule out identical twins], why are they using a relational database in the first place? Put it in a file for pity's sake.

Have you ever maintained a "database" full of meaningless pointers? The most terrifying words are "it's just a simple little database". That means "I'm ignoring the true nature of the data and trying to build a network of pointers than has no inherent meaning". Easy to write, impossible to use long term.

Now, I write lousy Java code. I know it, and that's why I don't do a lot of it, and when I do I ask a good coder to critique it. Java folks are pretty serious fundamentalists - if you don't do it right, they rake you over the coals.

We don't have to be nasty about it. But yes, we can insist that people learn thier profession and understand how to determine the natural keys (or determine there is no "useful" natural key or another reason a surrogate makes senses) before letting them out of the sandbox.

Roger Reid






Roger L Reid
Post #874424
Posted Monday, March 1, 2010 8:20 AM


Valued Member

Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

Group: General Forum Members
Last Login: Friday, November 11, 2011 1:21 PM
Points: 50, Visits: 162
Steve,

Your comment about "when in doubt . . . " struck a nerve on me. I will agree with the "when in doubt, add the INT Identity colummn"; however, if you add a GUID as your identity column and make it a PK, then you have, essentially, made a random number your PK. Think that one through for a minute. A PK is, by default in the SQL Server world, a clustered index. Do you really want to use a random number to sort your table? Doesn't that sort of guarantee that the next entry that is made to the table is going to start causing fragmentation and/or resorting of the table?

IMHO, if you must use a GUID for your Identity column (and I will concede that there are times when that is the proper choice ) then you should make it a Unique Key/Index instead of a PK.


Ralph D. Wilson II
Development DBA

"Give me 6 hours to chop down a tree and I will spend the first 4 sharpening the ax."
A. Lincoln
Post #874441
Posted Monday, March 1, 2010 8:22 AM
Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Monday, December 2, 2013 6:30 AM
Points: 346, Visits: 691
I favor using identities in *almost* every table, even those that have one or more natural unique keys. For one thing it makes logging much simpler, a single quad of log tables (one for each CRUD type) with a table ID, a RID (my term for a 4 byte identity), a date/time and some other fluff, and you're good to go. A single quad of triggers to write too. Gotta love the simplicity...

They also have the advantage of compactness, serialism, and thus speed. As clustered primary keys they're pretty hard to beat.

At the same time you should always index your unique natural keys, you're going to sort by them if nothing else. RIDs can't be the end-all of indexing by themselves.

The only time I don't put a RID in a table is if the table is a many-to-many table consisting of a pair of RIDs. Although even here if logging is critical for auditing purposes this table too will get its own RID PK.

I don't think it's a matter of using *too many* identities, it's a matter of not using them *properly*. Unless your table is over 2 billion rows (or likely to exceed that limit over time) identities are the clear winners from any number of perspectives.
Post #874443
Posted Monday, March 1, 2010 8:28 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Monday, November 22, 2010 10:32 AM
Points: 2, Visits: 15
GilaMonster (3/1/2010)
I don't have a problem with using identities. What I do have a problem with is using identities as the only unique identifier of the entire table.

When the primary key is an identity (or a GUID) and there is no other unique column or set of columns in the table, then it's easy to get into the situation where the only thing that differs between two rows is a meaningless artificial key. That easily leads to unamusing data integrity issues at a later date.

It's important to identify the candidate keys during modelling. Maybe one gets selected as the primary, maybe an artificial key gets used instead, doesn't really matter, but then the candidate keys should get unique constraints defined on them so that the meaningless artificial key is not the sole measure of uniqueness in the table.


Quite right. I, on the other hand, have a serious problem with using a composite key and not using identities. Case in point (battle I lost), a customer database that uses a composite key of Customer ID and Customer Set ID (to match our enterprise database). Fine, I said - UK this composite, but put an identity on it. Nope. So - 20 bytes on each and every table that has a link into the master customer table (just about everything). And 200+ developer hours wasted dealing with the composite everywhere, in UDFs, stored procedures, UIs, and most recently in an XML based app. More coding work, more chance for developer error, more debugging, more of everything that wouldn't have been necessary by adding about 4Mb to the database (~0.7%).
Post #874447
Posted Monday, March 1, 2010 8:34 AM


SSC-Forever

SSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-Forever

Group: General Forum Members
Last Login: Today @ 3:25 PM
Points: 43,008, Visits: 36,164
RalphWilson (3/1/2010)
IMHO, if you must use a GUID for your Identity column (and I will concede that there are times when that is the proper choice ) then you should make it a Unique Key/Index instead of a PK.


There's nothing wrong with a GUID primary key, as long as it is PRIMARY KEY NONCLUSTERED



Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #874450
« Prev Topic | Next Topic »

Add to briefcase ««12345»»»

Permissions Expand / Collapse