RE: To Parent/Child or Not To Parent/Child

SSC Guru

Points: 260824

December 15, 2009 at 11:41 am

Plenty of ways to manage that in a child table.

One option (pretty much the default) would be to delete the ones that were unchecked, and leave the rest of the rows alone.

Another option, have an "Active" column in the child table, and the checkmarks simply update those directly, instead of deleting.

Delete and replace (like you mentioned) is valid, but junks up the transaction log. Works, and probably won't create any problems, but it's "less elegant" than some of the other solutions. But that's just my prejudice showing.

But that's just part of it.

What happens when you want to delete a customer? In a parent-child multi-join (what you're talking about), you delete the rows for that customer ID from the child table, and you're done. In a "keep them in one column in the main table" solution, you first have to query every row in that table to find if that customer ID is in there, then you have to run an update on each of those rows. Can be done with a Replace command, but you had better be sure that deleting customer ID 3 doesn't accidentally change customer ID 13 to a second copy of customer ID 1 (removing the 3), and so on. One is simple (delete rows from a table based on a foreign key), the other is more complex.

One version (child table) enforces (with a unique index, quite possibly the clustered index) that any given customer ID can only be associated with any given bulletin ID once. Strings don't have that kind of enforcement, and it would be easy to end up sending multiple copies of the same bulletin to the same person.

And so on.

It's not that the list-in-one-field version can't be done, or is some sort of world-destryoing cataclysm, it's that decades of study and testing and live implementations have found that violating First Normal Form almost universally causes more problems than it solves.

If you want to, try it for a while. See for yourself. The kind of data you're dealing with, it probably won't cause enough problems to be a disaster. It'll end up being really annoying is all. But the experiment might be worth it just for your own edification.

My first database was a mess. Lots of places where I had solutions very similar to what you're talking about. The way I learned what works and what doesn't was by trying things. I've violated conventional DBA wisdom many, many times. Sometimes it's worked out well, other times it's ended being something I had to pay for, but learned from.

That's the difference between knowledge and information. Information is something you've studied, knowledge is something you've lived. Knowledge is better.

So, I say it's not that big a deal in this case, try whichever you like, and if you find that it works, use it. If you find that it's problematic, switch to the other design. It's not that hard to normalize a single field into a sub-table. Requires rewriting some code, etc., but from what you're saying, I don't think that'll be a make/break issue in this case.

I'd still go with the normalized version, and just play around with the denormalized one in a proof-of-concept environment. That's usually the better way to turn information into knowledge. But it's your data in your database, so it's your decision.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon