Lookup Table Madness

  • I appologize for the confusion.  I believe a careful reading of the article will make this very clear, but to summarize:

    The three seperate tables, each with a single type of code is the proper method.  The combined table causes numerous problems and its use cannot be justified.

    How's that for being non-wordy?

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • The perceived virtue of entity-attribute-value tables is that they will assist in creating a simplified schema (less tables = simplicity in some peoples' minds). The other perceived benefit is that there will be a performance gain due to less JOINs.

    Neither of these are real. These tables are not typesafe and all-but-impossible to constrain, thereby destroying any simplicity come maintenance time. Furthermore, if and when the database grows beyond its test size, the single-table solution will almost always result in performance issues due to table scans.

    Projects I've seen that utilized these tables shared some common traits: The databases were designed by junior team members, the projects were under budget, and by the time I got there the tables had corroded into massive problem areas that no one wanted to touch. One particular project had a table of this sort with a "datatype" flag. One of the flags was 'bool', which according to the data dictionary (which had not been updated for three years), should be '0' or '1'. Of course, with no constraints in place, this wasn't maintained. When I showed up, I discovered: '0', '1', 'Y', 'N', 'T', 'F', 'Yes', 'No', and NULL. Various applications that had been coded against the database all relied on their particular token value, so fixing the issue took months.

    --
    Adam Machanic
    whoisactive

  •  

    I've used this table structure for lookups, codes, two-field type situation for years.  Its been extremely effective and held under almost every situation.

    CREATE TABLE [dbo].[LookUp] (

     [GroupName] [varchar] (25)  NOT NULL ,

     [EntryId] [int] IDENTITY (1, 1) NOT NULL ,

     [ListEntry] [varchar] (30)  NULL ,

     [SortOrder] [int] NULL ,

     [Description] [varchar] (150) NULL ,

     [DateActivate] [datetime] NULL ,

     [DateDeactivate] [datetime] NULL ,

     [Active] [bit] NULL ,

    )

    Its got the ListEntry (or Code) and the Description to spell it out, but the nice extras were SortOrder and the DateActivate/DateDeactivate.  That helped avoid the dreaded deletion of inactive codes.  The Active column was a little extra that I maintain through a nightly job.  But with the GroupName column, I get to combine everything together.  Also, there's the requirement that GroupName+ListEntry be unique.

    On another note, despite the apology in the article, the sarcasm was much too heavy. 

  • First and foremost a database is designed in order to establish order out of chaos.  It contains data to facilitate information sharing; however, it contains rules by virtue of relational priciples that provide meaning to that information without which no understanding is passed on to the end-user.

    Who cares whether or not a particular database solution would or would not work for an application?  Who cares about performance?  Performance is a secondary requirement, not the first.  Performance can be improved by hardware, by physical database mechanisms, and, yes, by properly normalized database design.

    The database should be, must be, designed for the Business Model it is trying to mimic, nothing more, nothing less.

    Applications are designed to provide specific actions in the context of the GIVEN, not created, database model.

    The whole context of whether or not MUCK tables would be good here or there or in this or that application is pointless.  They are a bad database design for all applications...period.

    If you want a MUCK and you want the application to manage those rules, feel free.  Create a VIEW that UNIONs all the entities together for you, create a class, embed and "encapsulate" all of the code muck you'll have to create to maintain it, and present it to whatever application desires its usage.  Although, I will presume it will have performance "issues."

    But for the database, follow the rules...the relational rules.

     

  • I too thought the article was wordy.  It was quite refreshing, getting a plateful of detail and depth on a subject I've burned a lot of thought on, instead of the talking head sound bytes so common in many industry magazines (but not that common on SQL Central!)  After going back and forth on the "what's best" subject several times, I too have finally settled on the one table per lookup solution, for reasons covered by both this article and a similar one I found by Joe Celko--run, don't walk, to learn from Joe Celko--at http://www.dbazine.com/celko22.shtml.

    The arguments I had for "MUCKing" lookup tables was I didn't want to have to support dozens and dozens of two-column tables with three rows each (and maybe that was just our application).  I also wanted a single, simple code-lookup table, where I--and anyone else--could quickly and easily find all the codes used by our disturbingly complex system.  But I also wanted DRI, the assurance that all the data was indeed good and valid, and you just can't get that with a MUCK, not without triggers (urg) or application-enforced data integrity (yeah, right).

    These days, if I want a one-stop source for code looukp, I'd make a view "concatenating" the various source tables, something like (warning, psuedo code follows):

    CREATE VIEW CODE_LOOKUP

      (

        Code_Type

       ,Code_Value

       ,Code_Title

       ,Code_Description

      )

     as

      select 'A', Code_Value, Code_Title, Code_Description

       from Lookup_Table_A

      union select 'B', Code_Value, Code_Title, Code_Description

       from Lookup_Table_B

      union select 'C', Code_Value, Code_Title, Code_Description

       from Lookup_Table_C

    This code is not checked, and details and mileage may vary by application and implementation, but you should get the idea.  I don't think this would be suitable for application use, but for quick reference (particularly for developers and system trouble-shooters) it can be invaluable.

    As for "zealotry", Mr. Peterson pales in zealotry when compared with anything by Fabian Pascal (http://www.dbdebunk.com), whose writings support this article strongly.  Of course, it isn't "zealotry", but rather "conviction and the will to stand up for it"; it only becomes zealotry when--rightly or wrongly--you strongly disagree with what they say.

       Philip Kelley

     

  • Bravo!    I’ve had this very same discussion of numerous different projects.

     

    BAD

  • I had an application where "best practice" described in this article would have resulted in around 300 lookup tables averaging 7 entries per table.

    A single table with and id, type, description structure worked fine with no headaches.

    It required 4 stored procedures to manage the whole thing.

    I'm not sure where the table scan argument comes into it.

    SELECT Id , Description

    FROM dbo.MUCKTABLE

    WHERE Type=@lType

    does not produce a table scan unless you forget to index the table.

    Yes, having separate tables for each id, description type leaves less room for errors but building 300 tables + all the maintenance tables to manager 2,100 records seems excessive.

    The Id column is a primary key so you cannot cock up the referential integrity unless you are so terminally brain dead as to allow updates to primary keys that are used in referential integrity!

    A search for a specific entry in the lookup table will produce an index seek

  • Don,

    <<I apologize for the confusion. I believe a careful reading of the article will make this very clear, but to summarize: The three separate tables, each with a single type of code is the proper method. The combined table causes numerous problems and its use cannot be justified. How's that for being non-wordy?>>

    That's perfect. Thanks for taking the time to explain that. I did try re-reading the article but was still unable to verify for %100 that what you have stated above is what the article was conveying. My comment about the Wordy part was probably over kill. I too tend to have rather lengthy comments especially when trying to make a point. I am the firm believer that too much information is far better then too little.

    BTW... I also read your piece 'Is XML the Answer?'. Great article. I thought I was the only person on the planet who did not see XML as the second coming. XML is being used in massive excess and as you pointed out will in the end turn out to be less then it has been hyped and many will be left looking for a way to correct their over use of XML. I see XML as another tool in the development toolkit, not the do-it-all tool that will make most everyone thing else obsolete. I have been toying around with the Beta for SQL Server 2005 and I am not keen on the idea of storing XML within a single column in a table. My company provides a Client/Server product that comes in a web based version and their currently is some research into Web services. I fear that our development may get sucked into the XML madness.

    Thanks again for taking the time to explain when you didn't have to.

    Ed

    Kindest Regards,

    Just say No to Facebook!
  • David,

    When developers attempt to use these tables to get rid of JOINs, they'll convert this:

    SELECT AuxTbl1.Value AS Aux1, AuxTbl2.Value AS Aux2

    FROM SomeTable

    JOIN AuxTbl1 ON SomeTable.AuxID1 = AuxTbl1.AuxID1

    JOIN AuxTbl2 ON SomeTable.AuxID2 = AuxTbl2.AuxID2

    ...

    Into this (which appears to be a cross-tabulation but which is really just re-normalization):

    SELECT

    MAX(CASE WHEN SomeTable.AuxID1 = MUCK.AuxID1 THEN MUCK.Value ELSE NULL END) AS Aux1,

    MAX(CASE WHEN SomeTable.AuxID2 = MUCK.AuxID2 THEN MUCK.Value ELSE NULL END) AS Aux2

    FROM SomeTable

    JOIN MUCK ON SomeTable.AuxID1 = MUCK.AuxID1 OR SomeTable.AuxID2 = MUCK.AuxID2

    The thought process behind this change is that less joins = greater performance. In reality, the query optimizer will rarely be able to utilize an index for this kind of query.

    There is, of course, also the type safety issue. How will you constrain to make sure that dates in the right format, '1' means true (rather than 'T' or 'Y'), that the products your company sells include 'Widgets' and 'Thingers' but not '123', that a valid price for one of your products can be $39.25, but not 'ABC'.... the list goes on and on. And I have seen some form of every one of these examples (and a lot more) in real-world projects on which I've worked.

    --
    Adam Machanic
    whoisactive

  • Having suffered at the hands of "let's buy a reference table package because we don't have to build the code ourselves (i.e., do it right with normalized tables)", may I say "Thanks" for the sense and enjoyable sarcasm, Don.  MUCK TABLES SUCK!  In 20 years of relational design I've yet to see a good reason for polluting a design with MUCK.

    If you don't have the time to develop correctly, make sure you're prepared to accept all the consequences a MUCK table will bring you.  If you really must MUCK with your DB, make plans to un-MUCK down the line as part of later development and maintenance projects, taking one 'real' table at a time out of the MUCK heap. As pointed out in Don's article, its quite easy to clone functionality, especially maintenance windows/forms/pages and classes, so once you've undone your first piece of MUCK, the second and successive cleanups will be relatively cheap and will improve your DB.

    One plus-point for MUCK is that, according to an appropriate Yorkshire saying, "Where there's muck there's brass" ("brass" = money). So, for those in the community who enjoy being paid to wade around in MUCK, at least you have the consolation of many years of enjoyment and remuneration yet to come.

  • Everything depends on situation. It is may be right when you talking about 5-10 lookup tables. It may create madness for the databases with 100-200 such tables or on a server with 200-300+ databases each one using the same lookup tables. Your scenario will required 300 * 20 (number of lookup tables) = 6000 redundant tables. My point - design must be carefully chosen. In many cases mixed design is working very well. Mixed in term of 3-4 tables for generic standard codes (one main table) and standard (one table per code) design per specific codes. Don’t lock your mind.

  • meandi1999,

    Now you've introduced even greater data integrity problems. Why are these 200-300+ databases using redundant tables? Why aren't they sharing the data? What happens if (when!) one or more of them get out of sync with each other?

    --
    Adam Machanic
    whoisactive

  • My mind if firmly locked when it comes to logical correctness and data integrity. (really these two things are just different sides of the same coin).  Besides openmindedness is overrated.  If your mind is too open, what's to stop it from being filled with garbage?

    I don't care if there are 10,000 tables, if one has gone through the proper steps to design the database correctly then 10,000 is what it takes and combining some of them does not simplify things, but as I and others have shown, it does quite the opposite.

    P.S. there is no such thing as a "generic standard" code in a relational database.  That term is meaningless and the data stored in such a table will be largely meaningless too.  At best it will be overly complicated to make sense out of it.

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • Joe,

    I recall reading that article and it formed at least a part of my thinking on the matter, so thank you.  Actually, if I was to be completely honest about things I have to admit that I have had very few original thoughts on this subject (or any others for that matter.)  The thoughts expressed in my article are really not much more than a synthisis of the thoughts expressed by many other writers such as yourself and too many others to list.  I just hope that my writings are not so close a synthisis as to be plagarisim.

    I still think that any of the various terms are not precise enough to be truly useful.  Whether it is a code table, calendar, sequence or whatever, it still represents a unique entity and that representation should be treated no differently than a "more important" entity such as Customer etc...

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • I very much enjoyed the article. I sent it on to our team because we've had this debate internally several times.

    That said, a few bones to pick...

    It is a bit of a polemic. Tone down the rhetoric and concentrate on the message to get the information across in a more clear fashion.

    It glosses over one of the bigger problems that I've had with the single table/code table solution and that's multiple foriegn key constraints to the code table. Maintaining three or four different constraints between only two tables requires quite a bit of work when updating structures to prevent the loss of a constraint which would then cause the introduction of bad data. That coupled with the fact that you would need multiple triggers to maintain the appropriate data, it argues very strongly for the "complicated" approach of having a unique code table for each type of lookup instead of a single MUCK table.

    Again, good article. Nice work. Tone down the emotion just a tad.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

Viewing 15 posts - 16 through 30 (of 202 total)

You must be logged in to reply to this topic. Login to reply