• lucidspoon (4/12/2011)


    I am doing a lot of cleanup on a database that desperately needs it! There are very few primary / foreign key relations, so adding those is obviously a big part of what I'm currently working on to be able to produce accurate DB diagrams.

    The database has a few tables that are self-referencing to produce tree structures. From what I've known in the past, it is common place to have the parent ID of the "root" level records be NULL. I know that's ok to do even with a foreign key, but a lot of people seem to think it's not necesarily the best approach. I've read a few places where people set the parent ID of "root" level records to the primary key of that record. I don't really like the idea of that, and I'm not even sure how it would work if your primary key is an identity field that auto-increments.

    What I think I have come up with for a solution is to insert a "dummy" record with a primary key value of -1 and a parent ID / foreign key value of -1 as well. That way any "root" level records can be inserted with a parent ID / foreign key of -1.

    Technically, this could be a solution for primary / foreign key relationships with 2 tables as well. I have seen arguments with one side saying that a foreign key field that references a lookup table should have NULL if it's not a required field and the user doesn't select a value. Then, the other side of the argument is that there should be a value in the lookup table of "Unknown" to be able to select. My solution would be similar to having "Unknown" as a lookup value, but you wouldn't have to display that in a dropdown (or whatever your input method is).

    This all makes sense in my head, but a lot of things that make sense in my head aren't the best solutions! So, I was curious if there would be any major drawbacks to using this as a way to be able to enforce key constraints.

    Picking a value of -1 will work if you can guarantee that (a) -1 will never be a real key and (b) your keys are integers. I often find that if I can guarantee (b) I can't guarantee (a) in which case it won't work. The other problem is that if you have a lot of cases where you need to distinguish leaves from non-leaves and the keys in these various cases you have to pick a lot of distinct root markers - one for each key type, and that can be a pain because you now can't use the same code to detect leaves.

    Having a row in the "lookup table" that represents unknown is possible, but I would hate it: I would probably be programming around its presence all the time, because what you calll the look-up table is generally the most important table, the one which gives the high level view of something.

    So the best way to deal with this is to use NULL in the source table to indicate that there is no matching record in the target table.

    Another way to handle this (which I don't much like) is to introduce a new table to hold the relationship between the original two tables (exactly as you would for a many to many relationship) and this auxiliary table only holds real pointers (its primary key is the same column as the original source table's primary key, but it only contains rows for which the source really points to something from the target); this means you don't have any nulls in the foreign key source column. The auxiliary table has foreign key relations to both the original tables, and these can (and probably should) both have ON DELETE CASCADE; the foreign key relation pointing to the original target table can (probably should) have ON UPDATE CASCADE; so can the other one perhaps, depending on the requirements of what you are doing. This is an approach advocated by some anti-null purists; but it gets them nowhere because when you come to outer join the three tables to get the full picture (which you almost certainly will do) all the nulls pop up again. So there's not much gained in return for having three tables instead of two, two foreign key constraints instead of one, and doing three way joins instead of two-way joins.

    Tom