A few months ago I read an article from SQLServerCentral.com about some Foreign Key gotchas. Since that article, I have seen several people asking about whether a foreign key should be indexed or not. I have seen the benefits of having FKs indexed as well as the agony of not indexing them. The article discusses some of the benefits. In summary, it will help to avoid deadlocks, reduce locks, and improve performance. My intent is not to go into depth on the article, and to focus on a couple of scripts to help identify which tables need indexes on the foreign keys. This article is a part of my Key documentation series.
If you browse the internet, you will likely be able to find several scripts to accomplish the same goal. There is one script that I have seen that will generate the script for you to create the index on the foreign key. Many scripts are similar – some perform slightly different things. I thought about evaluating some of the scripts and comparing them. I have found some to be less accurate than others. I decided after some thought that I would just share my scripts. I will show the evolution of the script that I now use.
This was just to hammer out a solution that would get me the results I sought. This version uses some objects that are to be deprecated. Thus should only really be used on Servers that are SQL 2000.
SELECT o.name AS TableName,c.name AS FKColumn FROM sys.sysreferences r INNER Join sys.syscolumns c ON r.fkeyid = c.id and r.fkey1 = c.colid INNER Join sys.sysobjects o ON c.id = o.id LEFT Outer Join sys.sysindexkeys k ON c.id = k.id And c.colid = k.colid WHERE k.id IS null ORDER BY o.name
This works fine and with limitations. Obviously there was the limitation of using objects that are scheduled to be deprecated. The second limitation is that it is not 100% accurate. This script does not pull all of the Foreign Keys that are missing Indexes. This is the same sort of issue that I found with several scripts on the internet. That said, this query is rather efficient. When compared to future versions, it was 8% less costly and about 150ms faster. However, it does consume more disk IO and more CPU time.
With these limitations in mind, I decided to create a second query.
SELECT OBJECT_NAME(FK.parent_object_id) AS FKTable ,OBJECT_NAME(FK.constraint_object_id) AS FKName ,C.name FKColumn FROM sys.foreign_key_columns FK INNER JOIN sys.columns C ON FK.parent_object_id = C.OBJECT_ID AND FK.Parent_column_id = C.column_id INNER Join sys.objects O ON FK.parent_object_id = o.OBJECT_ID LEFT OUTER JOIN sys.index_columns ic ON FK.parent_object_id = ic.OBJECT_ID AND FK.Parent_column_id = ic.column_id WHERE ic.OBJECT_ID IS null
This query is less limited than the first. Here, I have improved the performance substantially over the first query and the execution plan cost is lower. I have implemented the use of SQL 2005 objects, decreased Disk IO, and decreased run time. CPU time remains about the same as the previous example. Still one limitation that is pretty glaring. I am not returning all Foreign Keys that are missing an index.
Moving onto attempt number three.
As I was testing version three of this query, I noticed there was another item that I needed. I ran the tests in a database with multiple schemas and had difficulty locating the tables that were in the list. When looking at the expanded view in SSMS, the tables are grouped by schema and listed alphabetically within that schema. When I realized where the table was, it dawned on me the need to include the schema in the query results. Adding this little piece of information will save a few seconds when trying to verify the information in the report.
SELECT schema_name(o.schema_id) + '.' + OBJECT_NAME(FK.parent_object_id) AS FKtable ,OBJECT_NAME(FK.constraint_object_id) AS FKName ,C.name FKColumn FROM sys.foreign_key_columns FK INNER JOIN sys.columns C ON FK.parent_object_id = C.OBJECT_ID AND FK.Parent_column_id = C.column_id INNER Join sys.objects O ON FK.parent_object_id = o.OBJECT_ID --Inner Join sys.schemas s -- On s.schema_id = o.schema_id LEFT OUTER JOIN sys.index_columns ic ON FK.parent_object_id = ic.OBJECT_ID AND FK.Parent_column_id = ic.column_id And ic.index_column_id = FK.constraint_column_id WHERE ic.OBJECT_ID IS null ORDER BY FKtable
In the code, note that I have opted to use the schema_name() function rather than use the Join on the sys.schemas view. By using the function, the script performs a little better. There is another addition I made to the script.
LEFT OUTER JOIN sys.index_columns ic ON FK.parent_object_id = ic.OBJECT_ID AND FK.Parent_column_id = ic.column_id And ic.index_column_id = FK.constraint_column_id
An additional condition was added to the Join. With the addition of this condition, I am now able to pull back the additional foreign keys that are missing indexes. In addition to this additional condition, I have added an Order By, which will slow it down a bit. The offset to the decrease there is that the data is more manageable for verification.
There are many methods to arrive at this data. The difference in performance between the methods I wrote and evolved was negligible. The most important concept in the end is having a solution that is accurate and timely to provide this data. All of these solutions finish in under a second for me. None of them place a significant impact on the server, and thus I can choose for accuracy over performance in this instance.
This exercise is a part of getting to know your database. Finding and documenting is very helpful when it comes time to troubleshoot. Once this information is garnered, one needs to evaluate whether or not to place an index on that foreign key.