Using Identity Field as my main relation in joins

Question

Using Identity Field as my main relation in joins

sammuts

SSC Veteran

Points: 282
More actions
August 16, 2011 at 3:45 am

#104086

Hi,
In all my tables my first field is an Identity field which is set as my primary key. Is it a good practice to use this fied to join between tables or this might give me some trouble in the future?
10x
Ivan

Viewing 11 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

Jayanth_Kurup SSC-Insane Points: 22967 More actions · Answer 1

Ideally the primary key should consist of column that have actual data rather than system generated values.

for example for a table like customer , the emailid would be a good candidate for the primary key clustered index which in turn helps with the joins.

Identity columns make more sense in table which store transactional data. for example a table storing orders etc.

In any cases don't rely purely on the identity column.

Jayanth Kurup[/url]

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 2

sammuts (8/16/2011)
Hi,
In all my tables my first field is an Identity field which is set as my primary key. Is it a good practice to use this fied to join between tables or this might give me some trouble in the future?
10x
Ivan

It's an excellent choice if it is applicable to the business case, and is unlikely to cause some of the problems associated with a natural key (actual data rather than a system-generated value). This is a big subject, natural keys vs surrogate keys, and you may be better off dipping into a couple of textbooks rather than assembling an array of opinions from here. Almost all will be useful, some will be awful.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Sean Lange SSC Guru Points: 286609 More actions · Answer 3

One of the important elements in deciding on a natural key vs a surrogate key is to determine if the value you pick is going to change. In the first example I would argue strongly that email is NOT a valid natural key. Email addresses change. If it is your primary key you can't change it without changing the foreign key reference in every single table that references it. Maybe a better fit in that case would be the DUNS number or some other value that is constant. There is no perfect answer or implementation of this topic. Some are better than others and you have to make sure you understand the nuances of the choice and make decisions that best fit your needs.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Grant Fritchey SSC Guru Points: 399330 More actions · Answer 4

Just remember that while an identity value does uniquely identify a row, it doesn't uniquely identify it for the business. So, just as an example, you need to store first and last name and they should be unique (it's an example), simply putting the identity column in place still allows multiple first&last names to be added. You'd have to add a unique constraint to satisfy the business requirement on top of uniquely identifying the row.

One other consideration, if you do nothing, by default, the PK will be the clustered index. And if you're setting up identity values, they frequently become the cluster too. Just remember, that might not be the most frequently accessed path to the data and you'll want that most frequently accessed path to be the cluster more often than not. Just something else to keep in mind.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Orlando Colamatteo SSC Guru Points: 182293 More actions · Answer 5

nm

There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato

Jeff Moden SSC Guru Points: 1004659 More actions · Answer 6

Jayanth_Kurup (8/16/2011)
Ideally the primary key should consist of column that have actual data rather than system generated values.
for example for a table like customer , the emailid would be a good candidate for the primary key clustered index which in turn helps with the joins.
Identity columns make more sense in table which store transactional data. for example a table storing orders etc.
In any cases don't rely purely on the identity column.

Gosh... I just can't agree with that, Jayanth. People have more than one email address and a lot of people change them when they change ISP's. Peoples names can change and, if it were actually legal to use SSN's for indentificaiton, even those can change when a special request is made. Even ISO country names change and when they do, you need some way to track historical rows as well as relating them to new data.

I know a lot of people will raise hell and prop it up on a stick for me recommending such a thing but even if a table has a strong natural key available, I still have an IDENTITY column on the table to save my bacon when the unexpected occurs. That also means that I may have both a PK and an AK on many tables.

On highly transational tables, an IDENTITY column as the Clustered Index is an absolute must for me to prevent page rampant page and extent splits and the resulting timeouts they will frequently cause especially when a table has a substantial number of FK's associated with it.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004659 More actions · Answer 7

Heh... I didn't scroll down before I replied. Maybe there won't be so many people raising hell with me after all. 😀

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Orlando Colamatteo SSC Guru Points: 182293 More actions · Answer 8

Jeff Moden (8/18/2011)
Heh... I didn't scroll down before I replied. Maybe there won't be so many people raising hell with me after all. 😀

I went there too...and later decided an "nm" would be a much better post to leave behind...something about a horse... :Whistling:

There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato

Jeff Moden SSC Guru Points: 1004659 More actions · Answer 9

Heh... I wondered what the withdrawal was. Now I know. 😀

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

tfifield SSCrazy Eights Points: 9655 More actions · Answer 10

Just to add my 2 cents. I've found that identity columns for both clustering key and primary key to be very, very fast when doing complex queries, and as Jeff pointed out you don't get rampant page splits when inserting.

I do, however, usually have another unique index that defines a business uniqueness for each row. In business queries, the top level table will usually be queried on the business unique index because that's what the users consider to be unique. From there on down all the joining is done on the identity columns and their Foreign Keys into child tables.

Since Identities are very narrow, 4 bytes for an INT, they are very fast in joins and I usually get MERGE type joins (the fastest for largish result sets) to the child tables since they are ordered.

I can live with the fact that surrogate keys aren't the best business keys when I trade it off against the performance gains.

Todd Fifield