SQLServerCentral is supported by Red Gate Software Ltd.
 

Store Key - Storing Misc Data
Posted Friday, November 28, 2003 12:00 AM
Forum Newbie


Group: General Forum Members
Last Login: Thursday, March 25, 2004 5:05 AM
Points: 5, Visits: 1
Comments posted to this topic are about the content posted at http://www.sqlservercentral.com/columnists/nraghavendra/storekeystoringmiscdata.asp


Post #18651
Posted Thursday, December 04, 2003 8:28 AM


Ten Centuries


Group: General Forum Members
Last Login: Friday, March 21, 2014 7:45 AM
Points: 1,138, Visits: 698
I don't agree with the content of this article at all. It's a cute idea, and will probably reduce the size of your database schema -- at the expense of performance and data maintainability. The database will be forced to perform needless scans of the "store key" table every time any type of operation is done.

The author hasn't discussed what happens when the table grows to millions or billions of rows. How will it be maintained? Will you delete items from it as they age? How will items in the global store key table be searched? What happens when you're storing redundant names, redundant addresses, redundant names of departments, redundant product codes, and twenty other redundant types in the table, and suddenly your new business requirement is to write a stored procedure to search for customers by address? Do you want your database scanning all of those extra rows every time? Or needlessly joining the entire customers table to the "store key" table first to reduce the text scanning workload a bit?

And what happens when you have two very large entity types stored in the table and you need to boost performance of searching them both at the same time? Good luck partitioning the data onto different physical discs when you have no identifier of entity type on the table.

The author discusses "normalization" but fails to realize that the idea of a global "store key table" violates the very important concept of separation of entity types within the database.



--
Adam Machanic
SQL Server MVP
SQLblog.com: THE SQL Server Blog Spot on the Web
Post #89067
Posted Thursday, December 04, 2003 9:18 AM
Forum Newbie


Group: General Forum Members
Last Login: Wednesday, June 23, 2004 5:36 PM
Points: 7, Visits: 1
Hi Everyone,

I have been working with this sort of database design for several years now. It is very powerful and can be very useful, but I ***STRONGLY*** recommend AGAINST this concept.

The problem isn't inherent to the model itself. The problem is that programmers are human beings and timelines are limited. Writing queries and maintaining data is MUCH more expensive AND much more error prone because of the increased complexity.

Every team has someone whose SQL skills aren't very good. This person is usually invaluable for their other skills. A team working with this sort of database design will:
A) Quickly steal all of the SQL experts from all the other teams.
B) Still make more SQL errors than other projects
C) Lack all of the other skills that the team needs as getting the DB to work becomes the only priority.

I know that I'm painting the doomsday scenario here, but let me assure you that this little bit of cleverness turns a profitable 1000 hour project into a 7000 hour project that almost kills the project team.

JUST SAY NO

Steven




Post #89068
Posted Thursday, December 04, 2003 1:46 PM
Grasshopper


Group: General Forum Members
Last Login: Tuesday, December 09, 2003 12:00 AM
Points: 21, Visits: 1
I read the article thinking that I might find a technique that could make my database designs easier, faster, or better in SOME way, but I don't believe this technique accomplishes any of this.

I wrote an article a while back that had hints of this concept, but had a more structured base and had a definitive use. http://www.sqlservercentral.com/columnists/dsumlin/lookupstrategy.asp

The main problem I see with this concept is that there is no control over which values exist for a particular domain. (I'm sure there's a complicated technical term for this, so I'll explain).

The technique that he presents appears to be a good place to store lookup type values, OR values that may be used in other places so as to eliminate redundancy. The problem with this is that there is NO way to determine boundaries of that data.

For example: Let's say you have a Customer table with a CustomerType field in it. The allowable values are (Wholesale, Retail, Distributor). You then have another table called Sales with a ReferredBy field. The allowable values are (Internet, Friend, Distributor, etc.).

How do you (from an application standpoint) limit the values that can be put into a particular field, and how do you display lists of allowable values (e.g. a drop-down list of Customer Types)? Do you have to do a DISTINCT/GROUP BY on the Customer table each and every time? And how would you inactivate a selection for one domain while allowing it to remain valid for another? (e.g. Distributor might no longer be a valid value for CustomerType, but still be valid for ReferredBy.)
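To make the domain-boundary problem concrete, here is a minimal sketch (using sqlite3 purely for illustration; all table and column names are my own, not from the article) of why a single untyped "store key" table cannot enforce which values are legal for which field -- the foreign key happily accepts a ReferredBy value as a CustomerType:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE StoreKey (
        KeyID   INTEGER PRIMARY KEY,
        KeyText TEXT NOT NULL UNIQUE   -- nothing identifies the domain
    );
    CREATE TABLE Customer (
        CustomerID     INTEGER PRIMARY KEY,
        CustomerTypeID INTEGER REFERENCES StoreKey(KeyID)
    );
""")
# Values from two unrelated domains land in the same table.
for text in ("Wholesale", "Retail", "Distributor", "Internet", "Friend"):
    conn.execute("INSERT INTO StoreKey (KeyText) VALUES (?)", (text,))

# 'Internet' belongs to the ReferredBy domain, yet the foreign key
# accepts it as a customer type -- the constraint cannot tell domains apart.
internet_id = conn.execute(
    "SELECT KeyID FROM StoreKey WHERE KeyText = 'Internet'").fetchone()[0]
conn.execute("INSERT INTO Customer (CustomerID, CustomerTypeID) VALUES (1, ?)",
             (internet_id,))
bad = conn.execute("""
    SELECT k.KeyText FROM Customer c
    JOIN StoreKey k ON k.KeyID = c.CustomerTypeID
""").fetchone()[0]
print(bad)  # 'Internet' is stored as a customer type with no error raised
```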

Besides being confusing to most developers, making the database actually more complicated, and making changes to data more difficult, the only thing I see this concept doing is saving some data space. And the space saved isn't worth it, in my opinion.

David




Post #89069
Posted Thursday, December 04, 2003 2:06 PM


Ten Centuries


Group: General Forum Members
Last Login: Friday, March 21, 2014 7:45 AM
Points: 1,138, Visits: 698
sumlin,

Doesn't your technique still violate 1NF? You are representing different entity types in a single column. How is this any better or less complex than maintaining individual lookup tables? Again, your schema may look simplified, but your life down the road will only be more complex.



--
Adam Machanic
SQL Server MVP
SQLblog.com: THE SQL Server Blog Spot on the Web
Post #89070
Posted Thursday, December 04, 2003 2:31 PM
Grasshopper


Group: General Forum Members
Last Login: Tuesday, December 09, 2003 12:00 AM
Points: 21, Visits: 1
amachanic,

Yes, I realize that it's not 1NF. In fact, I mention that in the article:

"Now, I know that there are probably a number of you who immediately see that this design breaks the 1st form of normalization. I contend that there are always exceptions to the rule based upon applicability of the situation."

I may be storing multiple types of entities (e.g. CustomerType, OrderStatus, etc) in the same table, but I'm logically separating them via the LookTypeID column. Therefore I CAN determine the domains of each value. And I'm not taking a single value and allowing it to have multiple domains such as nraghavendra suggests doing. (e.g. I can have Distributor in my Look table multiple times, but it'll be identified by a different LookGID, LookTypeFID, and LookConst value for each domain it may be in)
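The logical separation described above can be sketched as follows (again in sqlite3 for illustration; the column names LookGID and LookTypeFID come from the post, the rest of the schema is assumed). The per-domain unique constraint is what lets 'Distributor' exist once per domain, and a simple filter on the type column enumerates one domain:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE LookType (
        LookTypeID   INTEGER PRIMARY KEY,
        LookTypeName TEXT NOT NULL UNIQUE
    );
    CREATE TABLE Look (
        LookGID     INTEGER PRIMARY KEY,
        LookTypeFID INTEGER NOT NULL REFERENCES LookType(LookTypeID),
        LookText    TEXT NOT NULL,
        UNIQUE (LookTypeFID, LookText)   -- unique within a domain only
    );
""")
conn.execute(
    "INSERT INTO LookType VALUES (1, 'CustomerType'), (2, 'ReferredBy')")
rows = [(1, 'Wholesale'), (1, 'Retail'), (1, 'Distributor'),
        (2, 'Internet'), (2, 'Friend'), (2, 'Distributor')]
conn.executemany(
    "INSERT INTO Look (LookTypeFID, LookText) VALUES (?, ?)", rows)

# One domain is a simple filter -- the basis for a per-domain view
# such as a hypothetical v_CustomerType.
customer_types = [r[0] for r in conn.execute(
    "SELECT LookText FROM Look WHERE LookTypeFID = 1 ORDER BY LookText")]
print(customer_types)  # ['Distributor', 'Retail', 'Wholesale']
```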

As far as being complicated down the road, I have used this technique in literally dozens of applications, ranging from e-commerce shopping carts to financial data warehouses and everything in between. This technique has saved me so much time & effort I can't even imagine building databases without it anymore. I'm not saying there aren't some limitations & caveats to it, but in my development experience, they pale next to the time I've saved.

David




Post #89071
Posted Thursday, December 04, 2003 2:58 PM


Ten Centuries


Group: General Forum Members
Last Login: Friday, March 21, 2014 7:45 AM
Points: 1,138, Visits: 698
sumlin,

I said it would cause problems down the road, perhaps not initially. Unless you've developed these dozens of applications for one gigantic company, I can only assume you weren't around two years later, when your bloated combined lookup tables started causing performance problems and confusion amongst developers due to a not-so-well-maintained data dictionary. But perhaps I should thank you for persisting this technique: I have been called in on more than one occasion to undo the damage it ultimately causes. Perhaps sometime soon I'll be earning some additional income thanks to your article!

Regardless of all of this, I'd still like to know why you feel this technique saves you time. Is the modelling of small lookup tables really all that time-consuming?



--
Adam Machanic
SQL Server MVP
SQLblog.com: THE SQL Server Blog Spot on the Web
Post #89072
Posted Thursday, December 04, 2003 4:54 PM
Grasshopper


Group: General Forum Members
Last Login: Tuesday, December 09, 2003 12:00 AM
Points: 21, Visits: 1
amachanic,

1) This technique is still being used many years after its creation, and by many other developers who may at first balk at the concept for purity's sake, but realize that for clarity and ease of use it is a good idea. In fact, I contend the value of the technique gets better the longer the application is around, especially with applications that are continually changing.

2) Why do you assume that there is no well-maintained data dictionary? If a business requirement calls for seeing a separate entity per domain, I'll just give them a view (e.g. v_CustomerType).

3) And what makes you think that there are performance problems? I once had this debate in an architectural design meeting when building a table that had about 25 lookup fields. It was assumed that joining to the Look table 25 times would be slower than joining to 25 different tables. That assumption proved wrong: it was actually faster (not by much) to use the Look concept. That was a project almost 5 years ago, so I no longer have the performance data.

4) You mention the damage it eventually causes? I'm curious what kind of damage this technique causes that you have to be called in to repair.

5) The time is not necessarily saved on table creation; that's obviously an easy task. Nor is it necessarily saved on the creation of INSERT/UPDATE/DELETE procedures for each of those tables, or on the creation of views for them, although the addition of these objects does make the list of objects in your database much longer. The time saved is on the application side. Instead of having to create a class for each table, or functions for each data manipulation process, you only do it once, no matter how many fields, domains, or values you add to the database. Yes, you could try to create a generic class that took an object name and dynamically determined which procs to call, but that becomes a very fragile piece of code. (I know, I've tried it that way too.)
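The application-side saving being argued for here can be sketched like this (sqlite3 for illustration; the class and column names are hypothetical, not from the thread): with one Look table, a single generic accessor serves every domain, instead of one class or proc set per lookup table.

```python
import sqlite3

class LookupStore:
    """One generic accessor for all lookup domains (illustrative sketch)."""

    def __init__(self, conn):
        self.conn = conn
        conn.executescript("""
            CREATE TABLE IF NOT EXISTS Look (
                LookGID  INTEGER PRIMARY KEY,
                LookType TEXT NOT NULL,
                LookText TEXT NOT NULL,
                UNIQUE (LookType, LookText)
            );
        """)

    def add(self, domain, text):
        # Same code path regardless of domain -- no per-table class needed.
        self.conn.execute(
            "INSERT INTO Look (LookType, LookText) VALUES (?, ?)",
            (domain, text))

    def values(self, domain):
        return [r[0] for r in self.conn.execute(
            "SELECT LookText FROM Look WHERE LookType = ? ORDER BY LookText",
            (domain,))]

store = LookupStore(sqlite3.connect(":memory:"))
store.add("CustomerType", "Retail")
store.add("OrderStatus", "Shipped")
print(store.values("CustomerType"))  # ['Retail']
```

Adding a new domain costs one `add` call rather than a new table, view, procs, and class -- which is the trade being debated in this thread.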

For example, I am currently interfacing with a DB2 database developed entirely by another team, although I have been tasked with the reporting functionality. They developed the database with separate lookup tables... all 200 of them. There have actually been problems with their lookup tables (see a question I just posed under a different thread: http://www.sqlservercentral.com/Forum/topic.asp?TOPIC_ID=18853&FORUM_ID=23&CAT_ID=2&Topic_Title=How+would+you+do+this%3F&Forum_Title=General) that would have been avoided using my technique. This database has so many lookup tables that just scrolling the table list becomes a pain in the rear. Realize that they have a view for each table and 3 procs for each table. That's 600 procs!!! What did they gain? IMHO, not very much, if anything. Did they add complexity and bloat (your term) to the database structure and application architecture? I think so.

I think that this technique holds more appeal if you have to do both database architecture and application programming. My .02

David




Post #89073
Posted Friday, December 05, 2003 7:45 AM


Ten Centuries


Group: General Forum Members
Last Login: Friday, March 21, 2014 7:45 AM
Points: 1,138, Visits: 698
sumlin,

Perhaps you have implemented these things in the field in different ways than what I've seen. The performance-draining bloat and data dictionary issues I described are the results that I personally have witnessed.

In the extreme case, the application was a very large securities data warehouse which used a centralized lookup table. The warehouse was five years old and countless hands had been involved in adding various types of entities to the table. No one had bothered to delete deprecated entities (and there were HUNDREDS), developers had named entities based on a variety of schemes both real and imagined, and no one was left from the time of the original creation of the table. As a result, confusion ruled the day. Do we use the 'CTY' tag to find a city, or the 'CITY' tag? Both have cities! Join performance was horrible. The nightly build also included parallel processes that both needed to write different types of data to the table, causing disc IO issues.

I solved all the issues by spending a lot of time sorting through the data, figuring out what belonged where, and breaking everything into individual lookup tables with properly defined constraints. Join performance shot way up (no more needless sifting through unrelated data), I was able to separate the tables onto different discs to optimize the nightly build, and I was able to eliminate a lot of confusion. Yes, had they bothered to maintain documentation, the latter wouldn't have happened -- but honestly, how many shops have you been to that had even slightly up-to-date documentation (again, years AFTER the initial build phase)?
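The refactoring described above can be sketched as follows (sqlite3 for illustration; all names are hypothetical). With a dedicated table per entity type, the foreign key itself enforces the domain, so a key that belongs to some other domain in the old combined table is simply rejected -- there is no CTY-vs-CITY tag ambiguity left to sort out:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable FK enforcement in sqlite
conn.executescript("""
    CREATE TABLE City (
        CityID   INTEGER PRIMARY KEY,
        CityName TEXT NOT NULL UNIQUE
    );
    CREATE TABLE CustomerType (
        CustomerTypeID INTEGER PRIMARY KEY,
        TypeName       TEXT NOT NULL UNIQUE
    );
    CREATE TABLE Customer (
        CustomerID     INTEGER PRIMARY KEY,
        CityID         INTEGER NOT NULL REFERENCES City(CityID),
        CustomerTypeID INTEGER NOT NULL
            REFERENCES CustomerType(CustomerTypeID)
    );
""")
conn.execute("INSERT INTO City (CityName) VALUES ('Boston')")
conn.execute("INSERT INTO CustomerType (TypeName) VALUES ('Retail')")

# A key that doesn't exist in the City table (e.g. one that would have
# lived in another domain of a combined lookup table) is now rejected
# by the constraint itself:
try:
    conn.execute("INSERT INTO Customer VALUES (1, 999, 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True -- the wrong-domain key never gets in
```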

As for your situation with the view on each table, that seems a bit unwarranted. What are these views doing, other than selecting straight from the lookup tables?



--
Adam Machanic
SQL Server MVP
SQLblog.com: THE SQL Server Blog Spot on the Web
Post #89074
Posted Friday, December 05, 2003 7:52 AM


Ten Centuries


Group: General Forum Members
Last Login: Friday, March 21, 2014 7:45 AM
Points: 1,138, Visits: 698
By the way, just looked at the thread you posted and I feel that there should NOT be two lookup tables in that situation; statuses of court cases are the same logical types of entity, are they not? "OPEN PENDING TRIAL", "CLOSED CONVICTED", etc (assuming these are actual possible statuses), could all go into a single table, thereby eliminating your problem, and you would still not need a different lookup tag to sort through the data in the table. I don't know anything about legal data, so perhaps I'm missing something.



--
Adam Machanic
SQL Server MVP
SQLblog.com: THE SQL Server Blog Spot on the Web
Post #89075