NULL Valued Foreign Key Alternative?

  • I am doing a lot of cleanup on a database that desperately needs it! There are very few primary / foreign key relations, so adding those is obviously a big part of what I'm currently working on to be able to produce accurate DB diagrams.

    The database has a few tables that are self-referencing to produce tree structures. From what I've known in the past, it is common place to have the parent ID of the "root" level records be NULL. I know that's ok to do even with a foreign key, but a lot of people seem to think it's not necesarily the best approach. I've read a few places where people set the parent ID of "root" level records to the primary key of that record. I don't really like the idea of that, and I'm not even sure how it would work if your primary key is an identity field that auto-increments.

    What I think I have come up with for a solution is to insert a "dummy" record with a primary key value of -1 and a parent ID / foreign key value of -1 as well. That way any "root" level records can be inserted with a parent ID / foreign key of -1.

    Technically, this could be a solution for primary / foreign key relationships with 2 tables as well. I have seen arguments with one side saying that a foreign key field that references a lookup table should have NULL if it's not a required field and the user doesn't select a value. Then, the other side of the argument is that there should be a value in the lookup table of "Unknown" to be able to select. My solution would be similar to having "Unknown" as a lookup value, but you wouldn't have to display that in a dropdown (or whatever your input method is).

    This all makes sense in my head, but a lot of things that make sense in my head aren't the best solutions! So, I was curious if there would be any major drawbacks to using this as a way to be able to enforce key constraints.

  • lucidspoon (4/12/2011)


    I am doing a lot of cleanup on a database that desperately needs it! There are very few primary / foreign key relations, so adding those is obviously a big part of what I'm currently working on to be able to produce accurate DB diagrams.

    The database has a few tables that are self-referencing to produce tree structures. From what I've known in the past, it is common place to have the parent ID of the "root" level records be NULL. I know that's ok to do even with a foreign key, but a lot of people seem to think it's not necesarily the best approach. I've read a few places where people set the parent ID of "root" level records to the primary key of that record. I don't really like the idea of that, and I'm not even sure how it would work if your primary key is an identity field that auto-increments.

    What I think I have come up with for a solution is to insert a "dummy" record with a primary key value of -1 and a parent ID / foreign key value of -1 as well. That way any "root" level records can be inserted with a parent ID / foreign key of -1.

    Technically, this could be a solution for primary / foreign key relationships with 2 tables as well. I have seen arguments with one side saying that a foreign key field that references a lookup table should have NULL if it's not a required field and the user doesn't select a value. Then, the other side of the argument is that there should be a value in the lookup table of "Unknown" to be able to select. My solution would be similar to having "Unknown" as a lookup value, but you wouldn't have to display that in a dropdown (or whatever your input method is).

    This all makes sense in my head, but a lot of things that make sense in my head aren't the best solutions! So, I was curious if there would be any major drawbacks to using this as a way to be able to enforce key constraints.

    Picking a value of -1 will work if you can guarantee that (a) -1 will never be a real key and (b) your keys are integers. I often find that if I can guarantee (b) I can't guarantee (a) in which case it won't work. The other problem is that if you have a lot of cases where you need to distinguish leaves from non-leaves and the keys in these various cases you have to pick a lot of distinct root markers - one for each key type, and that can be a pain because you now can't use the same code to detect leaves.

    Having a row in the "lookup table" that represents unknown is possible, but I would hate it: I would probably be programming around its presence all the time, because what you calll the look-up table is generally the most important table, the one which gives the high level view of something.

    So the best way to deal with this is to use NULL in the source table to indicate that there is no matching record in the target table.

    Another way to handle this (which I don't much like) is to introduce a new table to hold the relationship between the original two tables (exactly as you would for a many to many relationship) and this auxiliary table only holds real pointers (its primary key is the same column as the original source table's primary key, but it only contains rows for which the source really points to something from the target); this means you don't have any nulls in the foreign key source column. The auxiliary table has foreign key relations to both the original tables, and these can (and probably should) both have ON DELETE CASCADE; the foreign key relation pointing to the original target table can (probably should) have ON UPDATE CASCADE; so can the other one perhaps, depending on the requirements of what you are doing. This is an approach advocated by some anti-null purists; but it gets them nowhere because when you come to outer join the three tables to get the full picture (which you almost certainly will do) all the nulls pop up again. So there's not much gained in return for having three tables instead of two, two foreign key constraints instead of one, and doing three way joins instead of two-way joins.

    Tom

  • Tom.Thomson (4/12/2011)


    Picking a value of -1 will work if you can guarantee that (a) -1 will never be a real key and (b) your keys are integers.

    All of our primary keys are integers (or smallint), and every one of them is an indentity field to auto-increment to start at 1, so I don't see how -1 would ever be a key any other way.

    Tom.Thomson (4/12/2011)


    The other problem is that if you have a lot of cases where you need to distinguish leaves from non-leaves and the keys in these various cases you have to pick a lot of distinct root markers - one for each key type, and that can be a pain because you now can't use the same code to detect leaves.

    I'm not sure if I understand what you're saying. I hope that means we aren't doing anything like that, but I'm curious what you mean.

    Tom.Thomson (4/12/2011)


    Having a row in the "lookup table" that represents unknown is possible, but I would hate it: I would probably be programming around its presence all the time, because what you calll the look-up table is generally the most important table, the one which gives the high level view of something.

    Currently pretty much all of the lookup tables have a "Active" field, so there is already code when referencing those tables to make sure that the values should be displayed. Going forward, everything is being date tracked, so I was planning on having the -1 value in each table be end dated so that it would be treated the same as a value that has been "deactivated".

  • lucidspoon (4/12/2011)


    Tom.Thomson (4/12/2011)


    Picking a value of -1 will work if you can guarantee that (a) -1 will never be a real key and (b) your keys are integers.

    All of our primary keys are integers (or smallint), and every one of them is an indentity field to auto-increment to start at 1, so I don't see how -1 would ever be a key any other way.

    Tom.Thomson (4/12/2011)


    The other problem is that if you have a lot of cases where you need to distinguish leaves from non-leaves and the keys in these various cases you have to pick a lot of distinct root markers - one for each key type, and that can be a pain because you now can't use the same code to detect leaves.

    I'm not sure if I understand what you're saying. I hope that means we aren't doing anything like that, but I'm curious what you mean.

    If some of your things have integer keys, some have varchar(30) keys, some have nchar(6) keys, some have numerid(14,5) keys, some have date keys, some have datetime keys, and some have datetime2 keys, and some have binary(16) keys you can't use -1 as the "unreal" value for all of them; so rather than copying and pasting code that says something like "if @linked_thing = -1 return" everywhere you need to check for that case you have to pick the right version of "unreal" for the thing you are looking at. But as you say all your keys are integers that's not going to be a problem.

    Tom.Thomson (4/12/2011)


    Having a row in the "lookup table" that represents unknown is possible, but I would hate it: I would probably be programming around its presence all the time, because what you calll the look-up table is generally the most important table, the one which gives the high level view of something.

    Currently pretty much all of the lookup tables have a "Active" field, so there is already code when referencing those tables to make sure that the values should be displayed. Going forward, everything is being date tracked, so I was planning on having the -1 value in each table be end dated so that it would be treated the same as a value that has been "deactivated".

    If you already have inactive markers of end dates then you can use them to avoid most of the problem with introducing a row to represent "unknown"; but there may still be a problem if you ever do any reports on inactive or expired entries, or do tidy ups that clear out expired/inactive stuff. It's probably not going to be a big problem though.

    Tom

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply