Lookup Table Design

  • Comments posted to this topic are about the content posted at http://www.sqlservercentral.com/columnists/lPeysakh

  • Joe Celko gives the argument against design 2, the "One True Lookup Table" (a.k.a "monster of all lookups") at http://www.dbazine.com/celko22.shtml

  • I don't really have much experience with DB design but the sql query in design 3 with the inner joins seems to put more work on the DB server, doesn't it?

  • It's not completely necessary to use a textual method for identifying the group of related codes.

    Similarily, having a status based on a textual value is also not ideal.

    eg.

    Use one table to contain a list of all the possible Code Groups :-

    CREATE TABLE [dbo].[tbl_Code_Group] (

     [Code_Group_Id] [udt_Code_Group_Id] IDENTITY (0, 1) ,

     [Code_Group_Descr] [udt_Descr] NOT NULL ,

     [Code_Group_Type] [udt_Code_Id] NOT NULL ,

     [Code_Group_Status] [udt_Code_Id] NOT NULL ,

     [Code_Group_Purpose] [udt_Text] NULL ,

     [Create_Date] [udt_Date_Time] NOT NULL ,

     [Create_User] [udt_User_Id] NOT NULL ,

     [Update_Date] [udt_Date_Time] NOT NULL ,

     [Update_User] [udt_User_Id] NOT NULL

    ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

    Another is used to contain all the possible codes :-

    CREATE TABLE [dbo].[tbl_Code] (

     [Code_Group_Id] [udt_Code_Group_Id] NOT NULL ,

     [Code_Id] [udt_Code_Id] NOT NULL ,

     [Language_Id] [udt_Language_Id] NOT NULL ,

     [Code_Descr] [udt_Descr] NOT NULL ,

     [Code_Descr_Short] [udt_Descr_Short] NULL ,

     [Code_Mnemonic] [udt_Code_Mnemonic] NULL ,

     [Code_Text] [udt_Text] NULL ,

     [Code_Type] [udt_Code_Id] NOT NULL ,

     [Code_Status] [udt_Code_Id] NOT NULL ,

     [Code_External_Ref] [udt_Code_External_Ref] NULL ,

     [Code_Display_Status] [udt_Code_Id] NOT NULL ,

     [Create_Date] [udt_Date_Time] NOT NULL ,

     [Create_User] [udt_User_Id] NOT NULL ,

     [Update_Date] [udt_Date_Time] NOT NULL ,

     [Update_User] [udt_User_Id] NOT NULL

    ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

    NB: All columns with "_Id" are of type Integer.

    Provided you know the numeric Code_Group_Id you can always retrieve the appropriate Codes for the Code Group. Having additional fields for external refernces / Mnemonics where they are needed implies you can lump it all into this one table.

    If you then write an object class to deal with the Codes it is very simple to encorporate all the appropriate functionality of the tables into your application using fairly straightforward programming constructs.

    The Use of a Language_Id in the second table facilitates support for multiple language applications.

    Note, The Code_Status is itself another Code_Group_Id as is the Code_Type and Code_Display_Status.

    Basically, I never use any non-numeric keys for any form of codes. They are always translated through the Code table. Keeps life really simple.

    Lots of ways to skin the cat I suppose.

  • I definitely agree with having the Code_ID field an Integer (and/or AutoNumber), especially when it becomes a foreign key.  It gets kind of ugly dealing with user-entered values as the foreign key (and gets a little uglier with Oracle).  Not to mention, it gives you much more flexibility with your design.  We typically include a CODE_ID column with the explicit purpose of allowing the user to enter a code if they prefer or if we are synching/importing from another database. 
     
    As for the dilemma Joe Celko references with the excessive paging, this can always be prevented/aided with a clustered index on the Group_ID field.
     
    And lastly, you will probably want some type of Order column and maybe even a Status column.  At some point, the users will want a particular order or grouping of the lookup values.  And eventually, they will want to "mothball" the values so they do not appear in their dropdown lists.  We combine both into one column by allowing them to order the values 1..whatever, and even duplicate numbers they want grouped together.  To prevent the value from showing in the dropdowns, we have them set the value to 0 or a neg. number.  And yes, I know, you usually do not want a column to serve as a "two-headed beast", but, so far, this has been well received by the users.  The neg. values only hide the values from new data entry values, but the reports do not limit on this column.
     
  • Ok, I am sorry - not really - but this is design madness. 

    Seperate tables for each lookup table.  Surrogate int keys when appropriate (almost always).  This keeps referential integrity possible - hello thats the point of a relational database.

    ISO codes - give them a column with a unique constraint.

    Never, never, never design anything that uses triggers for referential integrity unless it is as a last resort.

    I want my lookup tables to be easy to use

    Select F.Comments, S.StatusCode

    From Form F

    Join Status S

    On F.StatusID = S.StatusID

    Where F.FormID = 34567

    There, I have comment and the text version of the status information, something the user can read.

    Also, if I want to cache Status Code information somewhere the table is small and light so it is easy to cache as say XML


    Michael R. Schmidt
    Developer

  • Where could one get a list of these standard codes?

    Signature is NULL

  • I think that the article is on target.  I am no newbie and I have grown to love appreciate the combined code table (scenario 2).  I would say that this solution is most useful for codes that are defined within an application and hidden for most external use.

    In my experience it is not too confusing to use a self-reference for code type values within the table, but I have yet to see referential integrity enforced in the database, as it is just too blind and disallows a great deal of flexibility and value.  That's another topic I suppose.

    Celko's points are valid when the codes are externally defined and strongly typed/formatted.  I would never put country codes in the same table with diagnosis codes.   In some of these cases the number of codes is very large and, in any case, deserve separate maintenance.

    Also, I've found that a simple function call is SOOOOO much better than a join to do a code lookup!

  • Usually when an abritrary ID is used in place of actual codes it is necessary to provide alternative keys to assure uniqueness of the values. The patterns here do not show any alternative keys so I can only assume there are none. Most designers forget that is you have (code, description) The code is the primary and the description must be an alternate key to prevent duplicate descriptions. Now add in a new value such as (ID, code, Description) Now the primary key is probably ID + Code - but remember you must also have ID + Description to prevent the duplication. With each additional component to the pattern the issue of maintinaing unique codes and descriptions become more difficult. A real world example. The city was concerned that not enough tax revenue was created by a certain property type. Upon a one-minute investigation it was found that 5 codes had the same description. So, if you did not ask for all 5 codes you would not get the right answers. SOlution - assign one code + description then update all the bad records to that one code-description and what do you know - revenue. Without assurance that this holds true in this design I hold a reserved opinion. (No duplicates of any columns in lookup tables)

    Pat

  • I would prefer ID, Code, Description where ID is identity int and Code is unique.  Not sure I care if description is unique although I can see that as a possible advantage.  One issue with all data is to apply the 18 inch rule.  Whoever is 18 inches in front of the monitor has to be responsible for the data that is input.  Lookup tables should be as static as possible and therefore any additions to a lookup table need to be made very carefully.


    Michael R. Schmidt
    Developer

  • I've have worked with lots of systems that utilize the 1 table approach to lookups. And my opinion is that they should never, ever be used. Imagine making a datawarehouse that relies on loopups for values. Analysis services will not allow complex joins with where statements, so you will need to make views that combine all the data. Now take the main table (say a list of employees) and say you need to return 18 different lookup values. With 1 lookup table structure your where statement is huge and unless well indexed, the joins will perform terribly.

    Anyone that has ever done analysis on data from a system with 1 lookup table will tell you that it is a bad design and difficult to work with.

    Now think of table size for your 1 lookup table to work you need an extra field to store what field the value is used for. Suppose you have 500 lookup values, you could create another table for values and maybe only store an int for the field value, but then you need to do another join to figure your values out. Anyway having 1 lookup table will take up more physical space regardless of the design. 

  • In the case of data warehousing, I agree.  Each dimension should have its own table for performance and maintenance reasons.  More importantly, dimensions are often more than just code lookups and may have many other attributes for each value, specific to the dimension's type or meaning.  Generally speaking, the number of dimensions in a fact table is low so the number of tables is not great, at least for a group of similar fact tables.

    In the case of an operational database, I am quite happy with the one table approach.  Our table is properly indexed and performs well.  These are codes that simply replace repeating text and don't have distinct attributes for each type.  We would require more than 100 code lookup tables to track all the different code types in this database if we separated them out.  That would be a development and maintenance nightmare. 

    In any case, you can and should avoid the ugly WHERE clauses and poor performance with one or more user defined functions for code lookup and proper indexes on the code table.

  • As you can see in article, real life scenario 2 is more complicated than the one table. It consists of 4 tables (in my case). I don't like codes usage and linkage to be done in the same table (recursive tables with unlimited number of levels). Based on 4 tables you DO NOT HAVE MORE THAN 1 LEVEL of dependency. And codes are separated by groups. Plus, I store ONLY GENERIC codes in the structure. The codes that are not company specific. Company specific codes are in their own tables based on scenario 1. This allows reuse generic codes and separates them from user specific ones. If you have one database or one big data warehouse it may be irrelevant. But if server has 200+ databases it will be much less overhead to keep generic code tables into on separated database. In my case, 2-3 new databases created in server weekly. All of them have the same generic codes and it will be complete disaster try to synchronize them when changes necessary.

  • As you can see in article, real life scenario 2 is more complicated than the one table. It consists of 4 tables (in my case). I don't like codes usage and linkage to be done in the same table (recursive tables with unlimited number of levels). Based on 4 tables you DO NOT HAVE MORE THAN 1 LEVEL of dependency. And codes are separated by groups. Plus, I store ONLY GENERIC codes in the structure. The codes that are not company specific. Company specific codes are in their own tables based on scenario 1. This allows reuse generic codes and separates them from user specific ones. If you have one database or one big data warehouse it may be irrelevant. But if server has 200+ databases it will be much less overhead to keep generic code tables into on separated database. In my case, 2-3 new databases created in server weekly. All of them have the same generic codes and it will be complete disaster try to synchronize them when changes necessary.

  • As you can see from real live example (scenario 2) code table have id as integer. Group and type tables are very small (50-100 records each). And in a lot of cases if group code is COUN then description COUNTRY is not really necessary. It makes description unnecessary for many cases. Plus with such small number of rows performance is not suffering at all.

     

Viewing 15 posts - 1 through 15 (of 26 total)

You must be logged in to reply to this topic. Login to reply