Is having a lookup table that is for multiple entities a common practice?

  • I've been programming and designing databases of various sorts for several years. And I've been on teams who have designed databases. Whenever we needed to have a lookup table for something, we would always create a lookup table for whatever. For example, let's say we wanted to have a lookup table for the states of the United States. We'd create a lookup table for that, with maybe a CHAR(2) for the key and then a VARCHAR(50) for the full name of the table. That's the idea I'm trying to convey.

    At my current job my boss doesn't like doing that at all. Instead what he does is creates one, huge and wide (as in many columns) lookup table. Then he puts everything into it. For me, this is difficult because when I want to find something I've got to try and remember whatever special value in one or two columns it is that lets me know what other column has the value for the key (not the primarily key, but the designation that you'd use like 2 characters for a state abbreviation) and where the other column is for the full name, etc.

    However, I'm wondering, perhaps its just me. In the past I've worked for smaller IT shops, with at most 5 people. Now I work for a large agency with over 200 IT people. Is this the way that larger organizations design their databases?

    Rod

  • This is a controversial subject - the term for the structure is "One-True-Lookup-Table" (OTLT). It's a valid model, but I'm confident it's not the enterprise-grade best-practice it might look like.

    This post from spaghettidba gives some good comments on the idea. I choose not to lock-in to this model (primarily due to it being non-standard, and difficult to add constraints to) and the majority of blog posts seem to weigh-in against the idea, but many people claim to have used it successfully. I can see benefits, mainly avoiding the schema growth that comes through many small lookup tables. I wouldn't mind it a new employer requested I implement a structure like this.

    Andrew

  • IMO, it's an extremely bad practice.

    It reduces database integrity, because foreign keys can't have filters on them, and so there's no way for the DB to prevent a country's lookup value been used in a StatusID or a Status's lookup code being used in the GenderID column (assuming there are foreign keys).
    It encourages mistakes, like 'Active' being added to the lookup table with the ST (state) code, rather than the SS (Status). 'Active' in a lookup of states and provinces would be easy to spot, in a OTLT it's much harder to spot.

    And on the design theory side, a table should have one thing in it. A table should have transactions or customers or vehicles, etc. When there's a table that contains cars, planes and watermelons it suggests something is wrong.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Andrew P - Wednesday, February 8, 2017 11:59 PM

    This is a controversial subject - the term for the structure is "One-True-Lookup-Table" (OTLT). It's a valid model, but I'm confident it's not the enterprise-grade best-practice it might look like.

    I'd be inclined to describe it as an enterprise-grade worst practice, it's throwing away improved relational integrity provided by the relational model by introducing a table which does not conform to that model.

    Tom

  • I worked in a place that had one of these monstrosities. It was named MasterXRef. By the time I arrived it was as you describe, somewhat wide and way too length for a lookup table. It was on the order of 40-50 columns and thousands and thousands of rows. This was a fairly large system so the table had to be expanded multiple times. The column for the "LookupGroup" was a varchar and identity used as the primary clustered index. It was truly awful. We had a lot of orphaned data in there for systems that no longer existed, we had invalid group values because there was no integrity. And worst of all it was getting slower and slower to do something that developers need to do repeatedly (fill comboboxes with lookup values). It wasn't horribly noticeable for 1 or 2 comboboxes but each query would be in the 3-4 second range. Now consider a webpage with 15 comboboxes being loaded by 10 people at the same time. Suddenly a query that wasn't so bad is crippled. Suffice it say I would agree with the others here that this is a horrible design.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Doctor Who 2 - Wednesday, February 8, 2017 9:57 PM

    I've been programming and designing databases of various sorts for several years. And I've been on teams who have designed databases. Whenever we needed to have a lookup table for something, we would always create a lookup table for whatever. For example, let's say we wanted to have a lookup table for the states of the United States. We'd create a lookup table for that, with maybe a CHAR(2) for the key and then a VARCHAR(50) for the full name of the table. That's the idea I'm trying to convey.

    At my current job my boss doesn't like doing that at all. Instead what he does is creates one, huge and wide (as in many columns) lookup table. Then he puts everything into it. For me, this is difficult because when I want to find something I've got to try and remember whatever special value in one or two columns it is that lets me know what other column has the value for the key (not the primarily key, but the designation that you'd use like 2 characters for a state abbreviation) and where the other column is for the full name, etc.

    However, I'm wondering, perhaps its just me. In the past I've worked for smaller IT shops, with at most 5 people. Now I work for a large agency with over 200 IT people. Is this the way that larger organizations design their databases?

    I agree with the others.  It's a really bad idea that most consider to be an absolute worst practice.  The problem is that it's your boss doing it and he doesn't appear to be in the listen'n'learn mode on this one especially since it's his baby.  The best you could do is to setup a test that proves that it's a huge mistake.  Without such a test, he's not going to listen.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "Dear Lord... I'm a DBA so please give me patience because, if you give me strength, I'm going to need bail money too!"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Thursday, February 9, 2017 9:41 PM

    Doctor Who 2 - Wednesday, February 8, 2017 9:57 PM

    I've been programming and designing databases of various sorts for several years. And I've been on teams who have designed databases. Whenever we needed to have a lookup table for something, we would always create a lookup table for whatever. For example, let's say we wanted to have a lookup table for the states of the United States. We'd create a lookup table for that, with maybe a CHAR(2) for the key and then a VARCHAR(50) for the full name of the table. That's the idea I'm trying to convey.

    At my current job my boss doesn't like doing that at all. Instead what he does is creates one, huge and wide (as in many columns) lookup table. Then he puts everything into it. For me, this is difficult because when I want to find something I've got to try and remember whatever special value in one or two columns it is that lets me know what other column has the value for the key (not the primarily key, but the designation that you'd use like 2 characters for a state abbreviation) and where the other column is for the full name, etc.

    However, I'm wondering, perhaps its just me. In the past I've worked for smaller IT shops, with at most 5 people. Now I work for a large agency with over 200 IT people. Is this the way that larger organizations design their databases?

    I agree with the others.  It's a really bad idea that most consider to be an absolute worst practice.  The problem is that it's your boss doing it and he doesn't appear to be in the listen'n'learn mode on this one especially since it's his baby.  The best you could do is to setup a test that proves that it's a huge mistake.  Without such a test, he's not going to listen.

    Jeff, this is a very interesting idea. And I'm beginning to think one I'd better to something about, soon. I've been working with this data design architecture for over a year, but it's taken me a very long time to completely comprehend what my boss wants this general purpose table to do. Just this week I've realized that it is not only the One Master Lookup Table (or One True Lookup Table as spaghettidba puts it), but my boss also wants it to be used for saving user preferences for where windows are to be placed, sort order preferences, etc. Everything related to how the user interacts with the application we're writing. Also all reports information is to be stored there, such as the customized reports a user wants and the definition of those reports. I wouldn't be surprised at all if I haven't yet comprehended everything this table is meant to store. But let me ask you a question about what you've suggested. Right now we're been working on some applications (3 currently) which are not yet in production. So up to this point there's been no contention for reading or writing to the table. How can I simulate load onto this table, such as might happen in production?

    Rod

  • Doctor Who 2 - Saturday, February 11, 2017 6:54 PM

    Jeff Moden - Thursday, February 9, 2017 9:41 PM

    Doctor Who 2 - Wednesday, February 8, 2017 9:57 PM

    I've been programming and designing databases of various sorts for several years. And I've been on teams who have designed databases. Whenever we needed to have a lookup table for something, we would always create a lookup table for whatever. For example, let's say we wanted to have a lookup table for the states of the United States. We'd create a lookup table for that, with maybe a CHAR(2) for the key and then a VARCHAR(50) for the full name of the table. That's the idea I'm trying to convey.

    At my current job my boss doesn't like doing that at all. Instead what he does is creates one, huge and wide (as in many columns) lookup table. Then he puts everything into it. For me, this is difficult because when I want to find something I've got to try and remember whatever special value in one or two columns it is that lets me know what other column has the value for the key (not the primarily key, but the designation that you'd use like 2 characters for a state abbreviation) and where the other column is for the full name, etc.

    However, I'm wondering, perhaps its just me. In the past I've worked for smaller IT shops, with at most 5 people. Now I work for a large agency with over 200 IT people. Is this the way that larger organizations design their databases?

    I agree with the others.  It's a really bad idea that most consider to be an absolute worst practice.  The problem is that it's your boss doing it and he doesn't appear to be in the listen'n'learn mode on this one especially since it's his baby.  The best you could do is to setup a test that proves that it's a huge mistake.  Without such a test, he's not going to listen.

    Jeff, this is a very interesting idea. And I'm beginning to think one I'd better to something about, soon. I've been working with this data design architecture for over a year, but it's taken me a very long time to completely comprehend what my boss wants this general purpose table to do. Just this week I've realized that it is not only the One Master Lookup Table (or One True Lookup Table as spaghettidba puts it), but my boss also wants it to be used for saving user preferences for where windows are to be placed, sort order preferences, etc. Everything related to how the user interacts with the application we're writing. Also all reports information is to be stored there, such as the customized reports a user wants and the definition of those reports. I wouldn't be surprised at all if I haven't yet comprehended everything this table is meant to store. But let me ask you a question about what you've suggested. Right now we're been working on some applications (3 currently) which are not yet in production. So up to this point there's been no contention for reading or writing to the table. How can I simulate load onto this table, such as might happen in production?

    First off, let me agree with the others.  Storing multiple lists of unrelated data in different columns of a single table is just asking for trouble.  It's a disaster waiting to happen.  Like Gail pointed out, if you want to inactivate one status, does that mean the state, region, district and division on the same row are all inactive because the Active bit was set to zero?

    I can't assume to know exactly what Jeff is thinking, but I had an idea that might be along the same lines.  If I'm "seeing the table" you describe properly, this might drive the point home.  Let's say you have 30 different lists with 1 code and 1 description each for a total of 60 columns in your table.  You also have 10 user-setting columns in there, storing the saved values.  From the sounds of it, you'll likely have an XML or large varchar column or two.

    Your application has to display the contents of the 14th list in a dropdown, so write the query to return that list.  Take a look at the number of reads it took to build that list.  Now that that's done, it should always work, right?  Well, populate 1 or 2 of the user columns and make sure to run the table out to 1 million rows.  The size of the 14th list hasn't changed, so the query to return the data should still be the same, so run it again.  How have the reads changed?  Now run the table out to 5 million rows, repeat the query for the 14th list and take a look at the reads.  If you had multiple users trying to query the 14th list to display in a dropdown at the same time, what do you think the consequences of the design would be?

    Granted, the problems you encounter could be addressed with nonclustered indexes, but the problem with the physical design is still there.  Make no mistake - the physical design is important and it's difficult to change once it's in production.  I don't know how many users are going to be using your applications or how many rows your tables are going to contain, but you should really design your application to scale up front or you'll face a significant amount of pain when you have to fix it later in life.

    Like I said earlier, I'm not 100% sure if I'm "seeing the table" you describe accurately.  The DDL for the table would help.

  • Doctor Who 2 - Saturday, February 11, 2017 6:54 PM

    Jeff Moden - Thursday, February 9, 2017 9:41 PM

    Doctor Who 2 - Wednesday, February 8, 2017 9:57 PM

    I've been programming and designing databases of various sorts for several years. And I've been on teams who have designed databases. Whenever we needed to have a lookup table for something, we would always create a lookup table for whatever. For example, let's say we wanted to have a lookup table for the states of the United States. We'd create a lookup table for that, with maybe a CHAR(2) for the key and then a VARCHAR(50) for the full name of the table. That's the idea I'm trying to convey.

    At my current job my boss doesn't like doing that at all. Instead what he does is creates one, huge and wide (as in many columns) lookup table. Then he puts everything into it. For me, this is difficult because when I want to find something I've got to try and remember whatever special value in one or two columns it is that lets me know what other column has the value for the key (not the primarily key, but the designation that you'd use like 2 characters for a state abbreviation) and where the other column is for the full name, etc.

    However, I'm wondering, perhaps its just me. In the past I've worked for smaller IT shops, with at most 5 people. Now I work for a large agency with over 200 IT people. Is this the way that larger organizations design their databases?

    I agree with the others.  It's a really bad idea that most consider to be an absolute worst practice.  The problem is that it's your boss doing it and he doesn't appear to be in the listen'n'learn mode on this one especially since it's his baby.  The best you could do is to setup a test that proves that it's a huge mistake.  Without such a test, he's not going to listen.

    Jeff, this is a very interesting idea. And I'm beginning to think one I'd better to something about, soon. I've been working with this data design architecture for over a year, but it's taken me a very long time to completely comprehend what my boss wants this general purpose table to do. Just this week I've realized that it is not only the One Master Lookup Table (or One True Lookup Table as spaghettidba puts it), but my boss also wants it to be used for saving user preferences for where windows are to be placed, sort order preferences, etc. Everything related to how the user interacts with the application we're writing. Also all reports information is to be stored there, such as the customized reports a user wants and the definition of those reports. I wouldn't be surprised at all if I haven't yet comprehended everything this table is meant to store. But let me ask you a question about what you've suggested. Right now we're been working on some applications (3 currently) which are not yet in production. So up to this point there's been no contention for reading or writing to the table. How can I simulate load onto this table, such as might happen in production?

    The trouble with "load testing" is the you must not only load test his solution, you must also load test YOUR solution under the same simulated load.  Have you designed a system to replace his?  If not, the battle will be lost.  Remember that most bosses don't want to be presented with problems... they quite rightly want to be presented with solutions.  It's a rare thing to run into a boss that won't listen if you present at least a Proof-of-Principle that clearly demonstrates a reasonable advantage in the areas of ease of use/maintainability, extensibility, performance, and reduction in resource usage.

    If you can't do something like that, then you'll have to learn to enjoy the ride your boss has created and, perhaps, help him improve the current form of the system..  Heh... at least it's not some bloody form of XML or JSON! 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "Dear Lord... I'm a DBA so please give me patience because, if you give me strength, I'm going to need bail money too!"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Doctor Who 2 - Saturday, February 11, 2017 6:54 PM

     How can I simulate load onto this table, such as might happen in production?

    Gut feel, load will not be the first problem you run into. First will probably be data integrity problems caused by bugs in the apps or people changing stuff in the DB (or things like Excel connected directly to the DB), and that's not really something you can simulate

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Jeff Moden - Sunday, February 12, 2017 9:48 AM

    Doctor Who 2 - Saturday, February 11, 2017 6:54 PM

    Jeff Moden - Thursday, February 9, 2017 9:41 PM

    Doctor Who 2 - Wednesday, February 8, 2017 9:57 PM

    I've been programming and designing databases of various sorts for several years. And I've been on teams who have designed databases. Whenever we needed to have a lookup table for something, we would always create a lookup table for whatever. For example, let's say we wanted to have a lookup table for the states of the United States. We'd create a lookup table for that, with maybe a CHAR(2) for the key and then a VARCHAR(50) for the full name of the table. That's the idea I'm trying to convey.

    At my current job my boss doesn't like doing that at all. Instead what he does is creates one, huge and wide (as in many columns) lookup table. Then he puts everything into it. For me, this is difficult because when I want to find something I've got to try and remember whatever special value in one or two columns it is that lets me know what other column has the value for the key (not the primarily key, but the designation that you'd use like 2 characters for a state abbreviation) and where the other column is for the full name, etc.

    However, I'm wondering, perhaps its just me. In the past I've worked for smaller IT shops, with at most 5 people. Now I work for a large agency with over 200 IT people. Is this the way that larger organizations design their databases?

    I agree with the others.  It's a really bad idea that most consider to be an absolute worst practice.  The problem is that it's your boss doing it and he doesn't appear to be in the listen'n'learn mode on this one especially since it's his baby.  The best you could do is to setup a test that proves that it's a huge mistake.  Without such a test, he's not going to listen.

    Jeff, this is a very interesting idea. And I'm beginning to think one I'd better to something about, soon. I've been working with this data design architecture for over a year, but it's taken me a very long time to completely comprehend what my boss wants this general purpose table to do. Just this week I've realized that it is not only the One Master Lookup Table (or One True Lookup Table as spaghettidba puts it), but my boss also wants it to be used for saving user preferences for where windows are to be placed, sort order preferences, etc. Everything related to how the user interacts with the application we're writing. Also all reports information is to be stored there, such as the customized reports a user wants and the definition of those reports. I wouldn't be surprised at all if I haven't yet comprehended everything this table is meant to store. But let me ask you a question about what you've suggested. Right now we're been working on some applications (3 currently) which are not yet in production. So up to this point there's been no contention for reading or writing to the table. How can I simulate load onto this table, such as might happen in production?

    The trouble with "load testing" is the you must not only load test his solution, you must also load test YOUR solution under the same simulated load.  Have you designed a system to replace his?  If not, the battle will be lost.  Remember that most bosses don't want to be presented with problems... they quite rightly want to be presented with solutions.  It's a rare thing to run into a boss that won't listen if you present at least a Proof-of-Principle that clearly demonstrates a reasonable advantage in the areas of ease of use/maintainability, extensibility, performance, and reduction in resource usage.

    If you can't do something like that, then you'll have to learn to enjoy the ride your boss has created and, perhaps, help him improve the current form of the system..  Heh... at least it's not some bloody form of XML or JSON! 😉

    Jeff, you've made an excellent point. I need to focus upon solutions rather than whatever problems I foresee. I need to think of the positive things of having some more lookup tables. Perhaps what I should do is mention some of the negative things having one massive table that has everything in it plus the proverbial kitchen sink (I'll not use that phrase), but cover it very quickly, then move on to positive aspects of using a few more lookup tables. Thanks!

    Rod

  • Also consider and test for the data integrity problems that Gail spoke of.  She's absolutely correct there.  From what your boss sounds like, though, he'd shrug that off and insist that it will be properly handled in the app.  It's difficult to prove that it won't be.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "Dear Lord... I'm a DBA so please give me patience because, if you give me strength, I'm going to need bail money too!"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Adding my voice to the chorus.  This is a VERY bad idea.  I work with a database that has these on a small scale, with each major entity has it's own master lookup table.  It's impossible to do data integrity correctly and we routinely run into issues.  I can only imagine how much worse it would be if each of these were combined.  Yet I couldn't say the implementation is not "successful."  The application does work.  But there are continual small issues.  Even if this design your boss proposes is 99% accurate, it would still have some problems.  In my view, there is no excuse for loss of data or data integrity due to faulty design.

  • I think using a single table for everything you need to look up is bad. However, like anything else, it is a tool that has some legitimate uses. We use a similar table for a generic interface process that is configurable. We can add a new interface without doing any coding by using the lookup table to map the various bits to the appropriate place in the file handlers as well as the archive folders, etc. We do have some additional mapping in the table that I argued against and lost. Our lead developer said "I am not creating a separate table for 6 items when we have a perfectly good table that does the same thing". We also have some "miscellaneous stuff" tables that contain various feature toggles to turn features on and off, some random one off configuration items, and items like current fiscal year.

  • Oooohhh, boy, did this one ever bite me in the butt...

    Ok, the OTLT is a bad idea in general, because of what everyone else has said. There are a number of guises this succubus of a db design feature will don to seduce the developer/dba.

    First, is the "common schema". That is, you have a bunch of different tables, but they all have exactly the same structure. And I do mean exactly the same. For example, a pair of "cash pile" tables that's partially denomalized for performance reasons. Say, having a cash header table for summary (cash total, coin total, etc.). Then you have the cash denomination table that basically has a cash header field, denom field, and amount.

    Let's also say you have maybe 15 different kinds of records that use a cash pile. So, the OTLT philosophy would say "put a type field in the header record and voila!, problem solved". One pair of tables that hold a "cash pile" entity.

    Except, you've created a hotspot. Anything that needs info on cash amounts is going to be hitting those tables. Worse, those two tables are likely to be *huge*, millions of records in even a small company with a couple of years of data.

    Second, unless you're VERY careful it would be extremely easy for error conditions to change the type field of records read/written and then you have a real mess on your hands.

    The second guise is the one you've run into, a lot of different data going into a master lookup table. That's perhaps the easiest one to swat because there's no upside to it. At least in the cash pile example you can reuse the same code to read any type of cash pile. With the table you describe you lose that one "advantage".

    Having said that, if the OTLT succubus does get its claws into the design, there's actually one guise that's fairly harmless, but it requires a huge amount of discipline to make it work. (read, extra work on the developer's part).

    Create a pair of tables. The first (header) has 3 fields maximum. These are a primary key (probably an identity), the "lookup table name", ie the name of this "virtual table", and (optionally) a varchar(max) field for human readable notes that describe what the "virtual table" is for.

    The second table contains 3 (possibly 4) fields as well, an identity to be the primary key (good for audit logging of changes, etc), the human name for combo boxes and lastly a varchar field (possibly max, but probably varchar(10) since this is a code which is supposed to short!) and (optionally) a varchar(max) field for human readable notes describing the entry. Optionally you can also include a byte field for numeric codes, but that's starting down a bad road...

    Then, follow these rules about what type of "virtual tables" you're creating:

    1) Data must be static, i.e. it doesn't change much and (hopefully) never deletes entries. States (in the US) are a good example of this.

    2) Data should be no more than 10 entries (or so) in a given "table" (ie queries against the combobox entries should never be expensive). 

    3) Ideally, these "virtual tables" should be looked up once when the application starts running, so the cost is incurred when the user is more relaxed. This lessens server load while the application executes, as well.

    Given the restrictions above I won't say OTLT (at least in the most benign guise) is a worthless idea, but it's certainly not the beguiling creature it makes itself out to be. :laugh:

Viewing 15 posts - 1 through 15 (of 33 total)

You must be logged in to reply to this topic. Login to reply