Foreign Keys

Question

Foreign Keys

HappyGeek

SSCoach

Points: 18726
More actions
August 1, 2018 at 1:55 am

#366895

Is there any merit in having a foreign key which references a non-clustered index in another table, over one which references a clustered index.
have always thought that for best performance each table should have a clustered index and that the clustered index should be posted as the foreign key.
Your thoughts, experience and contributions are most welcome and thank you for your time.
...

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply

andrew gothard SSChampion Points: 12301 More actions · Answer 1

Heaps, in my experience, have very limited applications in a competently designed system. Staging tables in ETL, but even then for loading only, once you start working with them you index appropriately before you start - well, same for Transformation tables I suppose.
As well as being slower initially, in any active table performance will degrade, probably catastrophically in tables over a few tens of million records, over time as your table fills with forwarded records, deletions occur and fragmentation occurs.
In a heap, your average search requirement is half the records in the table, this is significantly reduced in a binary search tree of any size.
I've seen badly designed systems with large heaps used with only a few million records (20-30) effectively grind to a halt (processing jobs against the data taking 3-4 hours + reduced to under 3 seconds by adding an appropriate clustered index

I'm a DBA.
I'm not paid to solve problems. I'm paid to prevent them.

Grant Fritchey SSC Guru Points: 398690 More actions · Answer 2

So a little confusion here. Foreign keys should be bound to unique constraints/unique indexes. Now, whether that unique index is a clustered index or a nonclustered index, that's a different discussion. You get one clustered index per table. It defines the storage of that data in that table. In most circumstances, every table should have a clustered index. Choosing where to put the clustered index is the hard part. My general rule of thumb on this is simple to explain and understand. The clustered index should be on the most common path to the data. Basically, what will most queries be filtering on? That is where the clustered index should go. Probably, for most tables, that means the primary key. Since the primary key is the unique identifier on the table, it's going to make a great candidate for foreign key. That said, most tables are probably going to have an artificial primary key. This means that you should probably also have a second, logical or business-defined, unique constraint, that would, in the old days, have been a natural primary key. Depending on the circumstances, that could also be used as a foreign key. I wouldn't always have the primary key as the driver in relations to other tables, although, again, generally this will be true.

TLDR: Clustered index? Not always.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 3

Grant Fritchey - Thursday, August 2, 2018 5:02 AM
So a little confusion here. Foreign keys should be bound to unique constraints/unique indexes. Now, whether that unique index is a clustered index or a nonclustered index, that's a different discussion. You get one clustered index per table. It defines the storage of that data in that table. In most circumstances, every table should have a clustered index. Choosing where to put the clustered index is the hard part. My general rule of thumb on this is simple to explain and understand. The clustered index should be on the most common path to the data. Basically, what will most queries be filtering on? That is where the clustered index should go. Probably, for most tables, that means the primary key. Since the primary key is the unique identifier on the table, it's going to make a great candidate for foreign key. That said, most tables are probably going to have an artificial primary key. This means that you should probably also have a second, logical or business-defined, unique constraint, that would, in the old days, have been a natural primary key. Depending on the circumstances, that could also be used as a foreign key. I wouldn't always have the primary key as the driver in relations to other tables, although, again, generally this will be true.
TLDR: Clustered index? Not always.

Not directly related to the question asked by the OP... as with all else, I've found that "It Depends".:D

A lot of people do recommend that the clustered index should support as many queries as possible and, while that works, it doesn't always work well because of the width of the leaf level of the clustered index. For example, the system can only read whole pages even if a single row is to be queried. If the predominant method of data access is one row at a time and the table is wide with many columns that aren't actually used by most of the queries, then you're wasting a relatively large amount of memory on stuff that simply won't be used. The same is even more pronounced for large batch jobs and reporting queries.

There are other considerations, as well. Brent Ozar proposed that fragmentation doesn't matter because, once in memory, performance will be the same for well fragmented and totally unfragmented data and he is absolutely correct. I've proven that with test code. But, he was talking about "Logical Fragmentation" and he didn't emphasize the effect on "Read Aheads" if the data isn't already in memory and he didn't emphasize that the only way you can have logical fragmentation is if "Physical Fragmentation" in the form of page splits have occurred, which are devastating for non append-only inserts and expAnsive updates for duration, CPU, I/O, and the number of entries on the log file not to mention that it can easily waste a ton of memory by the "Natural Fill Factor" that results from such page splits which, in turn, can cause other queries to take longer because their data is driven out of memory.

https://www.brentozar.com/archive/2012/08/sql-server-index-fragmentation/
https://www.brentozar.com/archive/2013/09/why-index-fragmentation-doesnt-matter-video/
https://www.sqlskills.com/blogs/paul/how-expensive-are-page-splits-in-terms-of-transaction-log/
https://www.sqlskills.com/blogs/paul/performance-issues-from-wasted-buffer-pool-memory/

To be sure, both Brent and Paul are correct (IMHO) but, as with all else in SQL Server, "It Depends".

My point is that folks can't just take the stance that your Clustered Index should be based on what satisfies the most queries because there's so much more that needs to be considered especially if the table is wide and long.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Grant Fritchey SSC Guru Points: 398690 More actions · Answer 4

Jeff Moden - Thursday, August 2, 2018 8:54 AM
TLDR: It depends

Fixed that for you.

Ha!

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Chris Harshman SSC-Forever Points: 42192 More actions · Answer 5

My take on it, similar to what Grant and Jeff have said, is that the clustered index isn't necessarily the primary key, so the foreign key would reference that instead of the clustered index. I've found though that the primary key is not always the most common path to reading the data. Also in a situation where a join to this table only references a few columns, it may make more sense to have a covering nonclustered index that handles all of these little read requests and have your clustered index designed to handle the cases where many more columns are needed to resolve the WHERE clause conditions or SELECTed result set.

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 6

Grant Fritchey - Thursday, August 2, 2018 9:02 AM
Jeff Moden - Thursday, August 2, 2018 8:54 AM
TLDR: It depends
Fixed that for you.
Ha!

Oh my! Took me five minutes to stop laughing out loud, Grant! Perfect fix. 😀

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

HappyGeek SSCoach Points: 18726 More actions · Answer 7

Many thanks to everyone for your valued contribution, the general opinion is aligned to what I think. However here is my situation; I have inherited an environment in which about a third of all tables have a primary key and no clustered index; the remainder have a primary key and clustered index. Of that third that do not; they have a primary key on an identity column, that identity column has a non clustered index on it which is then used as a foreign key, performance is suffering badly. It would seem sensible, without having access to any other data, to convert the unique non clustered index to a clustered index and yes I understand that this is not an ideal situation given the comments above, any thoughts on this scenario? Again many thanks for your time so far;

...

Grant Fritchey SSC Guru Points: 398690 More actions · Answer 8

HappyGeek - Thursday, August 2, 2018 10:18 PM
Many thanks to everyone for your valued contribution, the general opinion is aligned to what I think. However here is my situation; I have inherited an environment in which about a third of all tables have a primary key and no clustered index; the remainder have a primary key and clustered index. Of that third that do not; they have a primary key on an identity column, that identity column has a non clustered index on it which is then used as a foreign key, performance is suffering badly. It would seem sensible, without having access to any other data, to convert the unique non clustered index to a clustered index and yes I understand that this is not an ideal situation given the comments above, any thoughts on this scenario? Again many thanks for your time so far;

I lean very heavily towards having all the tables have a clustered index and yeah, probably on the primary key. However, a blanket recommendation of just scripting a mass conversion on those lines would be dangerous in my opinion. We joke, but "it depends" really is a very common answer. I'd want to have more data to ensure that I was making the right decision. Have you looked at wait stats, execution plans, most frequently called queries, longest running queries, queries using the most resources. Ensure that you know where the slow performance is coming from absolutely, not just see a nominally questionable practice and aim to undo it. That's my advice. Measure twice, cut once.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

jcelko212 32090 SSCrazy Eights Points: 9294 More actions · Answer 9

HappyGeek - Wednesday, August 1, 2018 1:55 AM
Is there any merit in having a foreign key which references a non-clustered index in another table, over one which references a clustered index.
have always thought that for best performance each table should have a clustered index and that the clustered index should be posted as the foreign key.
Your thoughts, experience and contributions are most welcome and thank you for your time.

The important thing to remember is that a foreign key is supposed to first provide data integrity, and not performance. Performance is a good thing ,of course. After that, the decision between clustered, non-clustered and no indexes is going to be one of those "well, it all depends" things. Is the referenced data heavily clustered by its nature? Is the volume so small that an index isn't worth the overhead? Etc.?

Please post DDL and follow ANSI/ISO standards when asking for help.

HappyGeek SSCoach Points: 18726 More actions · Answer 10

Grant Fritchey - Friday, August 3, 2018 5:05 AM
HappyGeek - Thursday, August 2, 2018 10:18 PM
Many thanks to everyone for your valued contribution, the general opinion is aligned to what I think. However here is my situation; I have inherited an environment in which about a third of all tables have a primary key and no clustered index; the remainder have a primary key and clustered index. Of that third that do not; they have a primary key on an identity column, that identity column has a non clustered index on it which is then used as a foreign key, performance is suffering badly. It would seem sensible, without having access to any other data, to convert the unique non clustered index to a clustered index and yes I understand that this is not an ideal situation given the comments above, any thoughts on this scenario? Again many thanks for your time so far;
I lean very heavily towards having all the tables have a clustered index and yeah, probably on the primary key. However, a blanket recommendation of just scripting a mass conversion on those lines would be dangerous in my opinion. We joke, but "it depends" really is a very common answer. I'd want to have more data to ensure that I was making the right decision. Have you looked at wait stats, execution plans, most frequently called queries, longest running queries, queries using the most resources. Ensure that you know where the slow performance is coming from absolutely, not just see a nominally questionable practice and aim to undo it. That's my advice. Measure twice, cut once.

Many thanks Grant but when I say I have no other data, I mean exactly that, I have no access to the server, no access to the SQL Instance, but can wander into the kitchen unaccompanied, if I could look at stats and other metrics it would be great!!

...

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 11

Grant Fritchey - Friday, August 3, 2018 5:05 AM
HappyGeek - Thursday, August 2, 2018 10:18 PM
Many thanks to everyone for your valued contribution, the general opinion is aligned to what I think. However here is my situation; I have inherited an environment in which about a third of all tables have a primary key and no clustered index; the remainder have a primary key and clustered index. Of that third that do not; they have a primary key on an identity column, that identity column has a non clustered index on it which is then used as a foreign key, performance is suffering badly. It would seem sensible, without having access to any other data, to convert the unique non clustered index to a clustered index and yes I understand that this is not an ideal situation given the comments above, any thoughts on this scenario? Again many thanks for your time so far;
I lean very heavily towards having all the tables have a clustered index and yeah, probably on the primary key. However, a blanket recommendation of just scripting a mass conversion on those lines would be dangerous in my opinion. We joke, but "it depends" really is a very common answer. I'd want to have more data to ensure that I was making the right decision. Have you looked at wait stats, execution plans, most frequently called queries, longest running queries, queries using the most resources. Ensure that you know where the slow performance is coming from absolutely, not just see a nominally questionable practice and aim to undo it. That's my advice. Measure twice, cut once.

I strongly concur with what Grant has stated above.

An example of doom is arbitrarily converting UNIQUE Non-Clustered Indexes to Clustered Indexes. That single act could very well cause huge performance problems because a Clustered Index is almost always wider than a non-clustered index. That means that the same queries would require more memory, more logical reads, and, perhaps, more physical reads because the increase in memory requirements may have just driven a whole bunch of other things out of memory and they'll need to be read from disk the next time they're used.

You MUST do as Grant says in this case... you MUST understand the effects that you may have with such simple, seemingly benign changes that are actually fully cocked and powerful bear traps waiting for you to step into them. Along with "It Depends", another bit of sage advise is "Must Look Eye"! 😀 Applying a cure without proof of a problem may kill the patient especially if you don't know what allergies the patient may have.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 12

HappyGeek - Friday, August 3, 2018 8:35 AM
Grant Fritchey - Friday, August 3, 2018 5:05 AM
HappyGeek - Thursday, August 2, 2018 10:18 PM
Many thanks to everyone for your valued contribution, the general opinion is aligned to what I think. However here is my situation; I have inherited an environment in which about a third of all tables have a primary key and no clustered index; the remainder have a primary key and clustered index. Of that third that do not; they have a primary key on an identity column, that identity column has a non clustered index on it which is then used as a foreign key, performance is suffering badly. It would seem sensible, without having access to any other data, to convert the unique non clustered index to a clustered index and yes I understand that this is not an ideal situation given the comments above, any thoughts on this scenario? Again many thanks for your time so far;
I lean very heavily towards having all the tables have a clustered index and yeah, probably on the primary key. However, a blanket recommendation of just scripting a mass conversion on those lines would be dangerous in my opinion. We joke, but "it depends" really is a very common answer. I'd want to have more data to ensure that I was making the right decision. Have you looked at wait stats, execution plans, most frequently called queries, longest running queries, queries using the most resources. Ensure that you know where the slow performance is coming from absolutely, not just see a nominally questionable practice and aim to undo it. That's my advice. Measure twice, cut once.
Many thanks Grant but when I say I have no other data, I mean exactly that, I have no access to the server, no access to the SQL Instance, but can wander into the kitchen unaccompanied, if I could look at stats and other metrics it would be great!!

OK... I'm seriously confused... If you have "no access to the server", how were you intending to make changes to the indexes?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

HappyGeek SSCoach Points: 18726 More actions · Answer 13

Jeff Moden - Friday, August 3, 2018 8:41 AM
HappyGeek - Friday, August 3, 2018 8:35 AM
Grant Fritchey - Friday, August 3, 2018 5:05 AM
HappyGeek - Thursday, August 2, 2018 10:18 PM
Many thanks to everyone for your valued contribution, the general opinion is aligned to what I think. However here is my situation; I have inherited an environment in which about a third of all tables have a primary key and no clustered index; the remainder have a primary key and clustered index. Of that third that do not; they have a primary key on an identity column, that identity column has a non clustered index on it which is then used as a foreign key, performance is suffering badly. It would seem sensible, without having access to any other data, to convert the unique non clustered index to a clustered index and yes I understand that this is not an ideal situation given the comments above, any thoughts on this scenario? Again many thanks for your time so far;
I lean very heavily towards having all the tables have a clustered index and yeah, probably on the primary key. However, a blanket recommendation of just scripting a mass conversion on those lines would be dangerous in my opinion. We joke, but "it depends" really is a very common answer. I'd want to have more data to ensure that I was making the right decision. Have you looked at wait stats, execution plans, most frequently called queries, longest running queries, queries using the most resources. Ensure that you know where the slow performance is coming from absolutely, not just see a nominally questionable practice and aim to undo it. That's my advice. Measure twice, cut once.
Many thanks Grant but when I say I have no other data, I mean exactly that, I have no access to the server, no access to the SQL Instance, but can wander into the kitchen unaccompanied, if I could look at stats and other metrics it would be great!!
OK... I'm seriously confused... If you have "no access to the server", how were you intending to make changes to the indexes?

Lol, not as confused as I am, told to sort the server and all I have to work with is an outdated copy of a dev version of a database, I can only advise on what may be pressure points from the database design point of view!! Strange job this :hehe::blink::blush:

...

Grant Fritchey SSC Guru Points: 398690 More actions · Answer 14

HappyGeek - Friday, August 3, 2018 8:51 AM
Lol, not as confused as I am, told to sort the server and all I have to work with is an outdated copy of a dev version of a database, I can only advise on what may be pressure points from the database design point of view!! Strange job this :hehe::blink::blush:

Wow.

Can you at least get which queries are being called or anything along those lines? Otherwise this is just about stab in the dark.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning