RE: Is having a lookup table that is for multiple entities a common practice?

SSC Guru

Points: 287024

February 12, 2017 at 8:23 am

#1927991

Doctor Who 2 - Saturday, February 11, 2017 6:54 PM
Jeff Moden - Thursday, February 9, 2017 9:41 PM
Doctor Who 2 - Wednesday, February 8, 2017 9:57 PM
I've been programming and designing databases of various sorts for several years. And I've been on teams who have designed databases. Whenever we needed to have a lookup table for something, we would always create a lookup table for whatever. For example, let's say we wanted to have a lookup table for the states of the United States. We'd create a lookup table for that, with maybe a CHAR(2) for the key and then a VARCHAR(50) for the full name of the table. That's the idea I'm trying to convey.
At my current job my boss doesn't like doing that at all. Instead what he does is creates one, huge and wide (as in many columns) lookup table. Then he puts everything into it. For me, this is difficult because when I want to find something I've got to try and remember whatever special value in one or two columns it is that lets me know what other column has the value for the key (not the primarily key, but the designation that you'd use like 2 characters for a state abbreviation) and where the other column is for the full name, etc.
However, I'm wondering, perhaps its just me. In the past I've worked for smaller IT shops, with at most 5 people. Now I work for a large agency with over 200 IT people. Is this the way that larger organizations design their databases?
I agree with the others. It's a really bad idea that most consider to be an absolute worst practice. The problem is that it's your boss doing it and he doesn't appear to be in the listen'n'learn mode on this one especially since it's his baby. The best you could do is to setup a test that proves that it's a huge mistake. Without such a test, he's not going to listen.
Jeff, this is a very interesting idea. And I'm beginning to think one I'd better to something about, soon. I've been working with this data design architecture for over a year, but it's taken me a very long time to completely comprehend what my boss wants this general purpose table to do. Just this week I've realized that it is not only the One Master Lookup Table (or One True Lookup Table as spaghettidba puts it), but my boss also wants it to be used for saving user preferences for where windows are to be placed, sort order preferences, etc. Everything related to how the user interacts with the application we're writing. Also all reports information is to be stored there, such as the customized reports a user wants and the definition of those reports. I wouldn't be surprised at all if I haven't yet comprehended everything this table is meant to store. But let me ask you a question about what you've suggested. Right now we're been working on some applications (3 currently) which are not yet in production. So up to this point there's been no contention for reading or writing to the table. How can I simulate load onto this table, such as might happen in production?

First off, let me agree with the others. Storing multiple lists of unrelated data in different columns of a single table is just asking for trouble. It's a disaster waiting to happen. Like Gail pointed out, if you want to inactivate one status, does that mean the state, region, district and division on the same row are all inactive because the Active bit was set to zero?

I can't assume to know exactly what Jeff is thinking, but I had an idea that might be along the same lines. If I'm "seeing the table" you describe properly, this might drive the point home. Let's say you have 30 different lists with 1 code and 1 description each for a total of 60 columns in your table. You also have 10 user-setting columns in there, storing the saved values. From the sounds of it, you'll likely have an XML or large varchar column or two.

Your application has to display the contents of the 14th list in a dropdown, so write the query to return that list. Take a look at the number of reads it took to build that list. Now that that's done, it should always work, right? Well, populate 1 or 2 of the user columns and make sure to run the table out to 1 million rows. The size of the 14th list hasn't changed, so the query to return the data should still be the same, so run it again. How have the reads changed? Now run the table out to 5 million rows, repeat the query for the 14th list and take a look at the reads. If you had multiple users trying to query the 14th list to display in a dropdown at the same time, what do you think the consequences of the design would be?

Granted, the problems you encounter could be addressed with nonclustered indexes, but the problem with the physical design is still there. Make no mistake - the physical design is important and it's difficult to change once it's in production. I don't know how many users are going to be using your applications or how many rows your tables are going to contain, but you should really design your application to scale up front or you'll face a significant amount of pain when you have to fix it later in life.

Like I said earlier, I'm not 100% sure if I'm "seeing the table" you describe accurately. The DDL for the table would help.

Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions