A (Classic?) Table Design Issue

  • I think we’re veering off the highway and heading for a swamp, but I can’t explain how or why. Help!

     

    There’s this “UserPreference” table, along the line of:

     

    TABLE User (UserID, UserName)

     

    TABLE UserPrefernce (UserID, Col1, Col2, Col3, …Col121)

     

    …where the various “Cols” have clear and precise names used in one or another corner of the application—stuff like website layout, reporting options, client URLs, welcome messages, client operational defaults. As they are columns in the table, these values get set for every user, though some users might never use them (maybe they don’t have access to that web page or that report).

     

    Over time, more and more columns have been added to this table, making it more and more unwieldy. This happens because, as new applications and functionality is developed, new user-settable default and/or configuration values are generated, and they have to be stored somewhere… so why not add a few more rows to UserPreferences? (The table could maybe have been split into multiple tables, such as WebPreferences, ReportPreferences, etc., but this has not been done.)

     

    To manage this, the following table design has been proposed:

     

    TABLE Preference (PreferenceID, Name, Description, DataType)

     

    TABLE UserPreference_new (UserID, PreferenceID, Value)

     

    …so you’d add preference “GreetingMessage” to Preference, and for each user who might require it you enter the appropriate UserPreference for that user in UserPreference_new. The “Value” is a varchar, and “DataType” really just a reference to what ought to be in there (since “dynamic data type domain integrity” cannot be enforced). This would change us from a “wide shallow” table to a “thin deep” one.

     

    The thing of it is, we did this at the last job I was at. It started out cool and useful, and ended up one a tremendous pain in the a** … because people (not just the developers, but modelers and designers and even requirements writers who’d heard about it) would toss most any old thing in there, and some things (state values, key values, even address information) just doesn’t belong in such a table.

     

    My problem is that I can’t explain this clearly, concisely, and (most importantly) convincingly. I know that in many situations this is  a really bad idea, but I can’t explain why well enough to convince people. It also seems like it is a valid situation in some situations for some data, but I cannot clearly define to my own satisfaction what those circumstances would be.

     

    Most importantly, this has to be a wheel—a classic database design issue that’s come up time and time again… and I don’t know the working name or title for this, and so can’t readily look anything up on the subject.

     

    Any links to prior discussions or references to erudite database theory would be greatly appreciated!

     

       Philip

     

  • This was removed by the editor as SPAM

  • If your developers, designers, and requirements weenies are allowed to "toss any old thing" into your database, you don't have technical problems, you have people problems.  Why do they care in which table it gets stashed, as long as they can read or update it efficiently when necessary?  I've done the name-value pair technique as well, right up until SQL Server decided that 300 connections hitting this one table simultaneously for every web page refresh was a lot of work and slowed the whole web site down in protest.  This generated lots of "donated" overtime for moi, and lots of stomach acid.  If I were you, I would explain to them how indexes work, how indexing the PreferenceID column will only get you so far, how by putting all the values in a generic "Value" column will lose the value of strong datatyping, and how things seem to be performing fine right now, but you can't guarantee how long that will continue and when it does go south, it will be a very dark day filled with much gnashing of teeth and rending of garments and the cries of the afflicted will fill the valley.  Or words to that effect.  Throw in mention of "violation of Third Normal Form" (don't forget to capitalize) and referential disambiguation, and they should be putty in your hands. 

    There is no "i" in team, but idiot has two.
  • Sound advice, and I'll give it the try. The uphill curve is that I'm new at this company and thus the "other" DBA (of two), and my opinion doesn't carry all that much weight yet. I've hit the "name-value pair" [good title!] conundrum before and know the pain it can induce, but that'e experience that I'm having trouble clearly communicating.

    I believe that they're reluctant to add further rows to the table because of the implementation--Java, using Hibernate as the data access layer. (I'm still a bit vague on how this tool works, but to-date I'm not overly impressed.)

    And it's not so much that we have people problems as it is lack of (DBA) control over database design. If we open a door in the database like this, I just know that eventually that cat will drag in something dreadful.

       Philip

  • Sound advice, and I'll give it the try. The uphill curve is that I'm new at this company and thus the "other" DBA (of two), and my opinion doesn't carry all that much weight yet. I've hit the "name-value pair" [good title!] conundrum before and know the pain it can induce, but that'e experience that I'm having trouble clearly communicating.

    I believe that they're reluctant to add further rows to the table because of the implementation--Java, using Hibernate as the data access layer. (I'm still a bit vague on how this tool works, but to-date I'm not overly impressed.)

    And it's not so much that we have people problems as it is lack of (DBA) control over database design. If we open a door in the database like this, I just know that eventually that cat will drag in something dreadful.

       Philip

  • Grr. Rural satellite wireless connection = double post.

     

  • Don't do it!  This is a train wreck waiting to happen.  This is essentially the same thing that I refered to as a MUCK table in this article:

    http://www.sqlservercentral.com/columnists/dpeterson/lookuptablemadness.asp

    Also see the discussion of this article.  Ignore the article itself, look at the discussion...

    http://www.sqlservercentral.com/columnists/rnarasimhan/databasedesignandreferencetables.asp

    This is a bad idea in EVERY situation.

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • Phillip:  I feel your pain.  I, too, am the 2nd banana in a two DBA shop.  And I really believe that you have a people problem if they won't let you manage the database as you see fit.  Unfortunately, there is probably not much you can do about it.  You just have to "stay on message" as they say in the White House until they either see the value of having a pro look after their bytes, or learn to trust you.  Good luck.

    There is no "i" in team, but idiot has two.
  • Good stuff here!  Thanks to everyone that has posted!  I have a question along these lines:

     

    What about a company that sells lots of different products such as shoes, food, cars, dresses.  Should there be a product table for each product type or should there be a wide product table to accommodate all the different properties these products will have.  I can see now that the “name-value pair” design would be bad – based on the posts here.  It was a compelling way to go.  How about a central product table that feeds out to custom product tables?

     

    Here is another problem with the same issues:  An aircraft manufacture wants to put all of the parts that go into the airplane into a Parts table, but the properties of the parts very widely.  Should each part-group get its own table?  This is very similar to the first example.

     

    I am guessing there is a “best practices” here and I would love to here it!  Thanks for your time!

  • A model that should work in these cases is that of  type-subtype.  A key component is that there is a set of properties shared by all the items you are tracking, but the items fall into groups of subtypes with their own distinct properties. You have a "main type" table, and one "sub type" table for each type that has properties unique enough to not belong in the main table.

    The textbook example is frequently vehicles.  The main table ("cars") contains properties pertinent to all vehicles (engine type, mileage, fuel type, etc.), and you add subtypes for each type you're tracking (cars have numbe of doors, motorcyles don't, trucks could have maximum cargo space/weight, and so on.)

    Design-wise, the main table has a primary key, and each subtype table uses that key as both its own primary key and a foreign key to the main table. No primary key value appears in more than one sub-type table.  This, alas, cannot be forced at the relational integrity level, and so must be enforced either with code (triggers, stored procedures, application logic) or wishful thinking.

    The above is a very brief intro based on theory, not experience--I've never implemented a proper type/subtype model. I have, however, dreamed of them, as implementing such logical constructs as "independant" tables rather than type/subtypes can lead to very painful code down the road. Definitely a subject to read up on in books on database design. A suggestion: start a new thread here on this subject and see what people with practical experience advise.

       Philip

  • Philip

     

    Thanks for the quick reply.  I have seen your solution in OO programming before.  I don’t know why I didn’t think to apply that to databases.  I am sure that this method is the way to go now that I have thought about it.

     

    Thanks for your time

  • I'm almost done with a system that does exactly what you are talking about. So far it all works great. I am having only one problem that I have not been able to solve which revolves around how to work with parts like a Tire that can be placed front or rear.

    How to deal with this when a tire is just that a Tire. A tire has no front or rear attribute to it since you can generally place a tire on the front or rear of a vehicle. You will only run into this problem when you want to keep track of where a customer wants to have the tire placed on their vehicle. Say someone wants to place one brand of tire (or size/type/etc.) on the front and another on the rear.

    So when you have a Tire table that has the tires attributes and you then have something like an Assembly table that stores the parts selected how do you know which of two tires in the Assembly table go where on the build vehicle?

    Anyone have any ideas/thoughts on this?

  • I would have two fields:

    Front_Tire

    Rear_Tire

     

    Both bit type.  Both in your Tire table and in your Assembly table.  If the tire is rear only then it gets a 1 in the Rear_Tire field and a 0 in the Front_Tire field.  Reverse for a front only tire.  If it goes in both it gets a 1 in both.  Same for the Assembly table.

  • I'll tell you a little more about what I did. Instead of adding part type specific fields into the assembly table which is a generic part store, meaning it holds all types of parts together; I added a single field called FrontOrRear. This FrontOrRear field can have the following values ('F', 'R', 'S', 'B'). The values are fairly straight forward in that they stand for (Front, Rear, Set, Both). So when parts are inserted into the Assembly table, I called it a BuildItem table because Assembly conflicts with the .net framework namespace/classes, they get inserted with the value appropriate for where the part is to be used in the physical object.

    This solved the first problem that I ran into. The second part of this problem is how to deal with some of the different part types that I have. Say for example I have a Brake, well a brake is a part that has in my case identical attributes front and rear, which is why I didn’t create FrontBrake and RearBrake tables, but its shape and size are very different between the front and rear. In my system I have a Brake and the attribute FrontOrRear with values that are always either (‘F’, ‘R’, ‘S’). There aren’t any brakes that would have a value (‘B’) because there isn’t a case where a single physical brake could be used in either the front or the rear. This part is easy to deal with because whenever the brake gets inserted into the BuildItem table the correct value for BuildItem.FrontOrRear can always come directly from Brake.FrontOrRear.

    The other part Tire is a little more complicated as a tire more often has the FrontOrRear values of (‘B’, ‘F’, ‘R’), (‘B’) being the most common. So when the value is most always going to be (‘B’) we can’t use Tire.FrontOrRear for BuildItem.FrontOrRear because we won’t know if the inserted tire is meant for the front or rear of the finished physical object. What we have to do is somehow know from which DropDownList the inserted tire came from and pass the value of (‘F’) or (‘R’) into the BuildItem.FrontOrRear column. This is where I am now with the problem.

    I feel like I’m ranting a bit and I’ll stop here and let anyone else add their input.

  • It sounds like you need to differentiate between your parts table which contains the attributes of the parts and a work table which indicates which parts are to be installed (and where) for a given work order etc...

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply