Last week I wrote Ground Zero Database Design, thinking that was a good enough bare-bones intro (with recommended reading!) that I could leave the subject alone for a while. Then I went looking through my blog ideas folder, and found this:
DB design note:A column must mean what it means always, not just in steps 1-3. The IsADog flag must always refer to the dogness of that row.
I experienced this syndrome firsthand at a previous engagement…a nightly import and rollup process used one particular field to mean one thing at step A of the process, and to mean something else entirely at step B of the process. Why, dear children, is this WRONG WRONG WRONG?
This is something SO basic – like explaining to little kids why they shouldn’t eat poo – that’s it’s difficult for me to outline the many reasons this is bad. I asked the Twitter hive mind, and (skipping for a moment past the concerns expressed about the developer’s mental health) we have some excellent thoughts that dovetail nicely with what I’m trying to say. In my own words, then…
Databases are complex, period. The job of a data architect – and anyone who adds a table or a column has to be included in that definition – is to make the database as simple as it can be, for supportability, understandability, speed, and ease of app development.
What’s even worse is to have a field change meaning depending on the value of ANOTHER field. To quote @ThomasRushton directly:
eg if TypeID = 3 then Field3 is a phone#, else if TypeID = 4 then Field3 is a sort code, or… I’ve worked on these systems. Grr.
This is ridiculous to code for, and displays a level of artificial obfustication entirely unnecessary to a system that’s bound to be naturally complex in the first place. Why not give the table a PhoneNum column, a SortCode column, etc. and skip the TypeID? And again, how do you support that? What if the process changes – how much extra code will require modifications? It’s just too much.
Bottom line: Stick to design principles. A table is a thing, a row is an instance of that thing, a column is an attribute of that instance. Once you start giving attributes variable meanings (based on the step of a process, or time of day, or method of retrieval, or anything else), you’ve blown simplicity and supportability.
Thanks to @PaulRandal, @ThomasRushton, @Tarwn, @SeanGallardy, @JoeWebb, @DMMaxwell, and @Wendy_Dance for your input!