RE: Category, Sub Category and Article Relation

SSC Guru

Points: 104773

March 12, 2012 at 6:59 pm

Method 1 works badly if some articles point to category and others to subcategory; either you give up on database-enforced referential transparency, or you do it with a trigger instead of with a foreign key a constraint, or you use two pointers and two nullable reference constraints plus a check constraint to ensure that one or the other pointer is null (accepting to have two nullable columns in the article table). So I wouldn't use method 1.

Method 2 copes slightly better with the basic referential transparency constraint (at most one of the columns which is source of a reference constraint has to be nullable); but if you want the database to ensure valid data, you have to ensure if two articles point to the same subcategory they also point to the same category. That can only be done with a trigger, which is required even if all articles point to a subcategory (so that neither reference needs to be nullable). Ideally of course the category to subcategory relationship is expressed between those two tables, which would have an effect on that trigger (probably make it easier to understand version, in fact). But again, I don't like it.

Method 3 could be varied a little: the article table points to an entry in the category_and_subcategory table, so there's only one pointer and it can never be null; each subcategory entry in the latter table points to the entry for its parent category, while each top level category points to itself (this pointer is the only extra column in the category table, a separate marker to indicate whether something is a subcategory or not is unneccessary; and it doesn't need to be nullable - indeed it must not be nullable). I don't understand what the pointer from category id to article id that you mention is. No trigger is needed for this method. This method, unlike the other two, can support a hierarchy with more than 2 levels of category, which may or may not be useful - probably not, judging by what you say about joins. There needs to be a clean relationship between primary keys of top level categories and primary keys of subcategories. If you have only two levels (top level category and subcategory) you could insist that top level categories have negative keys (if you use numeric keys) or have one of a small set of first characters (if they are strings) whereas subcategories have positive keys (using integers) or begin with none of those characters (using strings); this allows a simple check constraint to be written that ensures there are only two levels. If you have a deeper hierarchy you might want to insist that a subcategory's primary key compares greater than its parent's key (whether they are numeric or string - with of course all the mess of comparing composite keys if the keys are not simple, but it looks from your query as if they are simple single column ids); this ensures that a very simple check constraint can be used to ensure that there is no loop in the category hierarchy except where a top level category points to itself to indicate that it is top level. Anyway, uou end up with no nullable columns, which is good news, and no need for any triggers, which is also good news. So I think I would choose that method.

Tom