Good SQL databases need good data with a certain level of performance. The referential integrity provided by foreign keys help provide the good data. Performance, on the other hand, can be hurt if foreign keys aren't indexed properly, and because FK indexing is in the background, you may not even realize there are issues. Today we'll talk about keeping the integrity of data while typically coming out ahead on performance.
Like many things in life, I have a script for that. However, what the script is looking for, the meaning of the results, and what to do with the results could use a proper explanation for most people. I'll start from the beginning, but won't hold it against you if you skipped ahead to the script below. Note, my simple version of a script for foreign keys doesn't get into the details or options like cascading.
What are Foreign Keys?
A foreign key ensures a child record such as an Order has a parent record such as a Customer. When you insert or update an Order the FK needs to verify that you associated it with a valid Customer, so it implicitly adds a join to the Customer table. When you update or delete a Customer then it has to verify that no Orders will be orphaned, so it implicitly adds a join to the Order table. These joins are where the performance complaints come from, and this post is how to handle those performance complaints the right way.
I'm easily persuaded on my writing to allow scope creep, but I'm going to give you some links instead this time. Ravi Lobo wrote an article for SQL Server Central almost a decade ago titled Foreign Keys, and the fundamentals are the same for SQL 2016 as they were for SQL 2000. Mike Byrd also wrote a nice piece in his two part post SQL Server Foreign Keys: Some of the Mystery Explained!. If you need to understand FKs better than my single paragraph, read those articles.
I've heard the advice that you have FKs in development to make sure your code works properly and drop the FKs in production because they'll just slow you down. The argument went that a properly developed system will enforce integrity by only running good code, and you tested your code to be good in development. Here are a couple holes in that argument that don't involve physical threats from a data architect:
- If Foreign Keys are one of your biggest performance complaints then you either indexed them wrong or you are a tuning god or goddess.
- The code today maintains the integrity, but who's coding tomorrow?
- The code in source control supposedly maintains the integrity, but what about ad-hoc updates?
- The code was tested to maintain integrity, but users... oh, what can I say about those users, especially that one "power" user... I have nothing nice to say, so I'll keep my mouth shut.
Even if I felt performance was a valid argument, performance can slow you down while data integrity can cause unscheduled updates to your résumé.
I mentioned earlier that there were implicit joins to maintain foreign keys, such as between an Orders and Customer table. The implicit joins used to maintain the FK relationship are just like any other join in any other query with a few key differences.
- The join isn't explicitly declared in the query.
- The index used on the referenced table permanently bound to the FK.
The first point is expected, the second point will kill us from time to time. Instead of creating execution plans on the fly, the indexes being used are written to a table that you can't change. Just like a normal query it will do what it needs to do and never complain. Unlike a normal query, it won't pick up a new index due to statistics being updated, a new index being created, or even a server being rebooted. The only way I'm aware of to have a FK bind to a different index is to drop and recreate the FK, letting it automatically select the index with no options to control it manually. I hope someone can prove me wrong on dropping and recreating the FK. Not because I don't like the step (I don't), but because it can take a while to make a FK trusted again. Chris Bell (b|t) gets into the details of establishing trust and what it means to you in his post Untrusted Foreign Keys.
Are My FKs Using Good Indexes?
Here's the script I use to determine if my FKs are using good indexes. What it classifies as "Good" is the first key column(s) of an index are the same as the column(s) of the foreign key. It also checks to make sure the bound index is a good index.
Note, I used the column names that are used in the system tables to make everything as transparent as possible. The words "referenced" and "parent" are right according to the system tables, but reversed from how I learned them.
WITH FK_ColumnCount AS ( SELECT kc.constraint_object_id , ColumnCount = max(kc.constraint_column_id) FROM sys.foreign_key_columns kc GROUP BY kc.constraint_object_id ) , ParentIndexGood AS ( SELECT kc.constraint_object_id , FK_CC = cc.ColumnCount , ic.index_id , I_CC = COUNT(1) FROM sys.foreign_key_columns kc INNER JOIN FK_ColumnCount cc ON kc.constraint_object_id = cc.constraint_object_id INNER JOIN sys.index_columns ic ON ic.key_ordinal <= cc.ColumnCount AND ic.object_id = kc.parent_object_id AND ic.column_id = kc.parent_column_id GROUP BY kc.constraint_object_id , cc.ColumnCount , ic.index_id HAVING cc.ColumnCount = COUNT(1) ) , ReferencedIndexGood AS ( SELECT kc.constraint_object_id , FK_CC = cc.ColumnCount , ic.index_id , I_CC = COUNT(1) FROM sys.foreign_key_columns kc INNER JOIN FK_ColumnCount cc ON kc.constraint_object_id = cc.constraint_object_id INNER JOIN sys.index_columns ic ON ic.key_ordinal <= cc.ColumnCount AND ic.object_id = kc.referenced_object_id AND ic.column_id = kc.referenced_column_id GROUP BY kc.constraint_object_id , cc.ColumnCount , ic.index_id HAVING cc.ColumnCount = COUNT(1) ) , ReferencedBoundIndexGood AS ( SELECT kc.constraint_object_id , FK_CC = cc.ColumnCount , ic.index_id , I_CC = COUNT(1) FROM sys.foreign_keys k INNER JOIN sys.foreign_key_columns kc ON k.object_id = kc.constraint_object_id INNER JOIN FK_ColumnCount cc ON kc.constraint_object_id = cc.constraint_object_id INNER JOIN sys.index_columns ic ON ic.key_ordinal <= cc.ColumnCount AND ic.object_id = kc.referenced_object_id AND ic.column_id = kc.referenced_column_id AND ic.index_id = k.key_index_id GROUP BY kc.constraint_object_id , cc.ColumnCount , ic.index_id HAVING cc.ColumnCount = COUNT(1) ) SELECT FK_Name = k.name , k.is_disabled , k.is_not_trusted , k.delete_referential_action_desc , k.update_referential_action_desc , ParentTable = ps.name + '.' + pt.name , ParentColumns = substring((SELECT (', ' + c.name) FROM sys.foreign_key_columns kc INNER JOIN sys.columns c ON kc.parent_object_id = c.object_id AND kc.parent_column_id = c.column_id WHERE kc.constraint_object_id = k.object_id ORDER BY kc.constraint_column_id FOR XML PATH ('') ), 3, 4000) , ReferencedTable = rs.name + '.' + rt.name , ReferencedColumns = substring((SELECT (', ' + c.name) FROM sys.foreign_key_columns kc INNER JOIN sys.columns c ON kc.referenced_object_id = c.object_id AND kc.referenced_column_id = c.column_id WHERE kc.constraint_object_id = k.object_id ORDER BY kc.constraint_column_id FOR XML PATH ('') ), 3, 4000) , ReferenceBoundIndex = ri.name , IsParentIndexedForFK = CASE WHEN EXISTS (SELECT * FROM ParentIndexGood WHERE ParentIndexGood.constraint_object_id = k.object_id) THEN 'Yes' ELSE 'No' END , IsReferenceIndexedForFK = CASE WHEN EXISTS (SELECT * FROM ReferencedIndexGood WHERE ReferencedIndexGood.constraint_object_id = k.object_id) THEN 'Yes' ELSE 'No' END , IsReferenceBoundToGoodIndex = CASE WHEN EXISTS (SELECT * FROM ReferencedBoundIndexGood WHERE ReferencedBoundIndexGood.constraint_object_id = k.object_id) THEN 'Yes' ELSE 'No' END FROM sys.foreign_keys k INNER JOIN sys.tables pt ON k.parent_object_id = pt.object_id INNER JOIN sys.schemas ps ON pt.schema_id = ps.schema_id INNER JOIN sys.tables rt ON k.referenced_object_id = rt.object_id INNER JOIN sys.schemas rs ON rt.schema_id = rs.schema_id LEFT JOIN sys.indexes ri ON k.referenced_object_id = ri.object_id AND k.key_index_id = ri.index_id ORDER BY 1
Any time you receive a "No" to IsParentIndexedForFK or IsReferenceBoundToGoodIndex, then you're doing at least a partial scan every time the integrity needs to be checked in that direction. By "at least a partial scan" I mean you may have a two column FK with an index that has one column as its first key column, but no indexes that have those two columns as the first two key columns. In the execution plan that will show up as a seek, but the details on that operator will show one SEEK PREDICATE, and one PREDICATE. Most FK's only have a single column, and this typically means full scan.
As for the bound indexes I talked about during the last section, it's possible that you're bound to a bad index while you have a good index it could be using. You'll see that when IsReferenceIndexedForFK = "Yes", but IsReferenceBoundToGoodIndex = "No". As I stated earlier, you need to drop and recreate the FK to resolve this, and it SHOULD automatically pick up the good index. Afterwards, you'll need to establish trust again if appropriate.
Consider Creating a New Index
If you don't have a good index for your FK, you'll want to consider creating one with the FK's column(s) as the first key column(s). Notice I said "consider" and "first key column(s)" as opposed to "do this" and "only key column(s)". Remember that this is part of your Indexing Strategy, and you're doing what's best for your database instead of what's best for your query. By "consider" I mean that depending on how often the FK data is updated, when those updates occur, if updates are by users or system processes, what's typically in cache, and what other queries would benefit from the index, you may be better off without it. Data that's modified once a year in an off-hours job wouldn't warrant an index that no one else would use.
While nothing will be that cut-and-dry for you, keep in mind that you're only considering adding an index and not being forced to do so. I also mentioned the "first" key column, hinting that there may be more to it than that. A single key column with no included columns is probably perfect for your FK, but not always for your database. If other queries would take advantage of this index but would benefit from another key column with an included column or two then that may be best for your database. The FK can still use the index, it would just go from lightning fast to...well...it would still be lighting fast. Full disclosure, slightly slower lighting, but I don't live in Florida and don't know the differences between types of lightning enough to elaborate.
Wrap It Up
A foreign key is basically just another join that's implicitly created during data modifications to the key columns on either side that would cause a query to error out before it would let referential integrity be compromised. Like any other join, there are no hard or fast rules on how fast it needs to run or if indexes are required, that is up to you (or your users, or your boss, but not me). These joins are handled differently, but mostly just because they're bound to an index. You can't delete an index bound to a FK, and a FK can't change what index it's bound to on the referenced table. These changes require you to drop the FK, make any changes that you need to make, recreate the FK, then preferably check the FK to make it trusted again.