Find Missing FK Constraints and Fixing them

  • I am looking for a technique to try and find missing FK constraints in a database. I know you cannot necessarily detect these, but using something like this: http://blogs.lessthandot.com/index.php/DataMgmt/DataDesign/missing-foreign-key-constraints/

    How would you determine if the FK should exist and then create it. Show 2-3 examples of where you may, or may not, create an FK.

  • I'll give this one a go Steve. It dovetails nicely with some work I'm going to be doing on RedShift and Teradata.

  • My initial thought is using information_schema views or sys tables to enumerate what table/column combinations constitute primary keys, join to any other tables that contains the same column combination, while excluding any tables which already contain a foreign key on those same columns. That would be more comprehensive than the referenced example which seems to assume that primary keys have "ID" in the name and contain only that one column.

    I'm in a new job, so I don't have bandwidth to complete the actual code, but here is some starter code for anyone who wants to mix and bake it:

    SELECT ku.TABLE_SCHEMA,ku.TABLE_NAME,ku.COLUMN_NAME

    FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS AS tc

    INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE AS ku

    ON tc.CONSTRAINT_TYPE = 'PRIMARY KEY'

    AND tc.CONSTRAINT_NAME = ku.CONSTRAINT_NAME

    ORDER BY ku.TABLE_SCHEMA,ku.TABLE_NAME,ku.COLUMN_NAME;

    SELECT ku.TABLE_SCHEMA,ku.TABLE_NAME,ku.COLUMN_NAME

    FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS AS tc

    INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE AS ku

    ON tc.CONSTRAINT_TYPE = 'FOREIGN KEY'

    AND tc.CONSTRAINT_NAME = ku.CONSTRAINT_NAME

    ORDER BY ku.TABLE_SCHEMA,ku.TABLE_NAME,ku.COLUMN_NAME;

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Take it, David.

  • If (as I think you do?) you re-use column names in other tables then if you have CustomerID in your Customer Table, as the PKey or as a Unique constraint/index, then if there are columns on other tables also called CustomerID they would be potential candidates for a FKey.

    My naming, which we've debated and you are not a fan ;-), would be:

    EDIT: Sorry, it was Sean I debated this with, not you, apologies.

    cus_id on the customer table

    and

    inv_cus_id on the invoice table.

    We always use the full column name of the column from "the other table", so I do reliably use this to detect any column usage within another table and report where FKeys are missing.

    But I don't think my cus_id / inv_cus_id naming is any different to your CustomerID naming (in this regard) - provided that you use it reliably, and "everywhere"?

    But maybe I've misunderstood the question and you want to go beyond this?

    It did cross my mind whether it was worth checking SQL statements (e.g. from Profiler or some DMV) and parsing the JOIN conditions out of them for anything that didn't match an FKey. I suppose the query planner only considers indexes (and not FKeys?) and therefore you cannot get a "suggestion" for an FKey out of the DMVs as a side effect?

  • Using the contents of the query plan cache, you can also parse the JOIN conditions of SELECT statement, to infer foreign key relationships. That would compensate for situations where foreign key columns are given different names. I don't think I've ever seen that attempted before, but it's theoretically sound. It would require at least a full uninterrupted day to code up.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • I've been playing with this sort of thing for a while.

    1. Identify missing primary keys and unique constraints
    2. Identify the number of fields in the key
    3. Identify naming conventions for primary key fields and the equivalent in other tables
    4. [Identify candidate FK where a table contains the equivalent fields as those participating in aa PK[/li]

    5. Discount those where there is already an FK relationship
    6. Test the data to precheck violations
    7. Generate PK & FK scripts through automated means

    Text parsing SYS.comments for joins in views should also help Identify candidates but is significantly more involved.

    I Have a working box of bits for Vertica,, Teradata, MySql, SQL server and I'm looking to experiment with RedShift in the near future. I think working out how to do this for Postgres would be handy.

  • We have a filter index including WHERE MyKeyColumn IS NOT NULL on some of our FKey-supporting indexes - e.g. where we have an FKey on a User Defined Column which may not be used / may not be used much.

    Might be an idea to consider if the FKey candidate column(s) is Nullable or not?

  • I've got a script i call missing and implied foreign keys that i built over the years.

    it's basically a peel the onion kind of attack, which kind of assumes some naming conventions are in place(the FK column exactly equals or contains the PK column name, maybe with a prefix/suffix.

    find all PK's and UQ's, and then any columns that equal or contain, but are not featured in the current FK's

    --current PK/UQ columns

    ;WITH CurrFKS

    AS (SELECT Object_name(constid) AS constraintname,

    OBJECT_SCHEMA_NAME(rkeyid) AS refschema,

    Object_name(rkeyid) AS reftable,

    COL_NAME(rkeyid, rkey) AS refcolumn,

    OBJECT_SCHEMA_NAME(rkeyid) AS fkschema,

    Object_name(fkeyid) AS fktable,

    COL_NAME(fkeyid, fkey) AS fkcolumn,

    ' ALTER TABLE '

    + Object_name(fkeyid)

    + ' ADD CONSTRAINT [PLACEHOLDER]'

    + ' FOREIGN KEY ('

    + COL_NAME(fkeyid, fkey)

    + ') REFERENCES '

    + Object_name(rkeyid) + '('

    + COL_NAME(rkeyid, rkey) + ')' AS fksql

    FROM sysforeignkeys

    ),

    MyCTE

    AS (SELECT Object_schema_name(colz.object_id) AS FKSchemaName,

    Object_name(colz.object_id) AS FKTableName,

    colz.NAME AS FKColumnName,

    colz.column_id AS FKcolumn_id,

    CurrentReferenceCandidates.*

    FROM sys.columns colz

    INNER JOIN (SELECT

    idxz.object_id,

    object_schema_name(idxz.object_id) As PKSchemaName,

    object_name(idxz.object_id) As PKTableName,

    pcolz.name As PKColumnName,

    pcolz.column_id As PKColumnID,

    idxz.is_primary_key ,

    idxz.is_unique

    from sys.indexes idxz

    inner join sys.index_columns icolz

    on idxz.object_id = icolz.object_id

    inner join sys.columns pcolz

    ON icolz.object_id = pcolz.object_id

    AND icolz.column_id = pcolz.column_id

    WHERE (idxz.is_primary_key = 1 OR idxz.is_unique = 1)

    and object_schema_name(idxz.object_id) <> 'sys') CurrentReferenceCandidates

    ON colz.NAME = CurrentReferenceCandidates.PKColumnName

    ----ON colz.name like '%' + CurrentReferenceCandidates.PKColumnName + '%'

    AND colz.object_id != CurrentReferenceCandidates.object_id

    WHERE Object_Schema_name(colz.object_id) != 'sys'

    AND colz.object_id != CurrentReferenceCandidates.object_id

    AND Object_schema_name(colz.object_id) = pkschemaname

    AND colz.name = CurrentReferenceCandidates.PKColumnName --toggle below for second version

    --AND colz.name like '%' + CurrentReferenceCandidates.PKColumnName + '%'

    ) SELECT ' ALTER TABLE ' + FKSchemaName + '.'

    + FKTableName + ' ADD CONSTRAINT [PLACEHOLDER]'

    -- + FKTableName + '_' + FKColumnName + ']'

    + ' FOREIGN KEY (' + FKColumnName

    + ') REFERENCES ' + pkschemaname + '.'

    + PKTableName + '(' + PKColumnName + ')' AS fksql

    FROM MYCTE

    WHERE FKTableName NOT IN(SELECT NAME

    FROM sys.views)

    EXCEPT

    SELECT FKSQL

    FROM CurrFKS;

    then i have variations of it, which assume someone didn't create primary keys in the first place.

    two versions, one which assumes any identity columns should be a PK but the PK is missing, and the last version where i assume the first column in a table is the potential PK, and compare other tables based on the same naming convention assumption.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Nearly finished my solution to this. Turns out to be far more involved and have loads of gotchas than a casual approach would reveal.

  • Thanks, looking forward to it.

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply