
Store Key - Storing Misc Data
Posted Saturday, December 6, 2003 6:43 AM
Good to have discussion, but let's keep this good-natured, ok? I know this technique isn't for everyone, but using it to some degree isn't going to end the world either. I use a variation of it for lookup tables and it works well. Could it be used badly? Sure.

I'd encourage one or both of you to really make your case in an article. PROVE to me (and everyone else) that your way is better. We're all here to learn!

Andy
http://www.sqlservercentral.com/columnists/awarren/




Post #89076
Posted Saturday, December 6, 2003 6:41 PM


I agree with Andy. Performance issues are not entirely a consequence of the design; administration and implementation play a part as well. In the case mentioned, the lack of cleanup and standardization can be an issue.

I've used this technique, more like what sumlin has described, in places where we had sparse storage needs or different amounts of data per row. Perhaps there were other issues, but for modifying designs in real time to deal with changing business requirements it worked well.

That being said, we controlled what was put into the tables. True lookup tables were separated into their own tables. Sparse lookups are things like a credit limit for a customer: not everyone has one (not a zero limit, but one that doesn't apply), so this type of "attribute" would be stored in a separate table, linked back with an FK to the parent, and only for those parents it applied to. We examined each attribute of this type: if we thought it was something that would persist, we'd normalize it properly; if it seemed like some half-baked idea (lots of those from the marketing folks and execs), we'd use it in this type of structure. We'd also consider whether it was something that was queried often; we found lots of data that was very rarely queried, and that was also a good candidate for sparse storage.
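A minimal sketch of that sparse-attribute shape, using the credit-limit example (table and column names are just for illustration):

CREATE TABLE Customer (
    CustomerID int NOT NULL PRIMARY KEY,
    CustomerName varchar(100) NOT NULL
)

-- Only customers that actually have a credit limit get a row here;
-- everyone else simply has no row, rather than a NULL or a magic zero.
CREATE TABLE CustomerCreditLimit (
    CustomerID int NOT NULL PRIMARY KEY
        REFERENCES Customer (CustomerID),
    CreditLimit money NOT NULL
)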

You've both brought up good arguments. It would be nice to see an article outlining the good and bad points; maybe work together and build two opposing articles.

Steve Jones
sjones@sqlservercentral.com
http://www.sqlservercentral.com/columnists/sjones
The Best of SQL Server Central.com 2002 - http://www.sqlservercentral.com/bestof/
www.dkranch.net







Post #89077
Posted Sunday, December 7, 2003 6:45 AM
quote:
I think that this technique holds more appeal if you have to do both database architecture and application programming. My .02



I've done both.

Perhaps more importantly, I've managed application programming, where I've been responsible for having many programmers maintain applications over time, fixing the (real or perceived) sins of their predecessors.

Whenever someone gets "clever" at the expense of architectural purity, there ends up being a cost. It may be simple confusion (leading to bugs or rework), or it may be that certain situations are not handled thoroughly, usually because the "tricks" end up hiding what is going on underneath.

That "cost" may sometimes be appropriate to pay. The case being made here is that the overhead of separate tables for each concept is a higher cost than the potential cost of cheating on purity of design. Each case has to be weighed individually.

But to me the biggest problem with creating this architecture is the order of the decision making. Let me compare it to denormalization in general.

I have programmers who START database designs saying "I need to denormalize these fields and these tables so this will run fast enough". They say it before they have even figured out cardinality, much less done any experimentation. Most times they are simply wrong -- it isn't significantly faster. By starting off in that direction instead of coming to it only when necessary, they make a mess.

If you create this infrastructure, then people will tend to go to it first, without (enough) thought of consequence. You lower the "cost" of this step -- or at least the perceived cost -- and it will be used more widely than intended. What this does is hide the architectural (as opposed to implementation) cost of it.

I've done it both ways, in file systems before RDBMSs and in RDBMSs, and it has always come back to bite us: loss of referential integrity, difficulty implementing after-the-fact triggers and functionality specific to individual domains, etc.

It has its place, but my suggestion is to treat it like surgery - try it only after everything else has failed.





Post #89078
Posted Monday, December 8, 2003 7:29 PM
I've used this technique with success on various large databases. It has helped with lookup performance and the physical size of tables. On the downside, however, you do increase the complexity of queries and the time taken to write them. I would not suggest using this method across an entire database, but rather for splitting tables which have a lot of duplicated varchar or char fields mixed in with int, float, etc. I refer to this method as partitioning tables rather than using a "store key", but from what I read in the article the concept is the same.
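As a rough sketch of the kind of split I mean (made-up names): the wide table keeps a small integer key, and the repeated text is stored only once:

CREATE TABLE JobTitle (
    JobTitleID int IDENTITY(1,1) PRIMARY KEY,
    JobTitle varchar(100) NOT NULL UNIQUE
)

-- Before the split, every Employee row repeated the full job title text;
-- now each row carries only a 4-byte key.
CREATE TABLE Employee (
    EmployeeID int NOT NULL PRIMARY KEY,
    JobTitleID int NOT NULL REFERENCES JobTitle (JobTitleID),
    Salary money NOT NULL
)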

Overall, I think that if you have a lot of repeated data due to a lack of thoughtful original database design (and it happens), this is a method that may be worth trying. Of course, test everything before implementing it in a production system.

Cheers,

Angela



Post #89079
Posted Tuesday, December 9, 2003 8:56 AM
Let me start by saying that I agree with most of the concerns expressed in this forum. I think this is a technique that should be used rarely, and only in places where truly sparse data exists, not in the usage envisioned by the author.

One problem I see with this method is that the values in the lookup table are "shared by code" rather than by design. In other words, the user or application enters a text string for "address2", and the INSTEAD OF INSERT trigger checks whether the value already exists. If it does, the existing value is reused.
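A minimal sketch of that kind of trigger, with a schema I've invented purely for illustration (the article's actual tables may differ):

CREATE TABLE LookupString (
    StringID int IDENTITY(1,1) PRIMARY KEY,
    StringValue varchar(200) NOT NULL UNIQUE
)

CREATE TABLE CustomerAddress (
    CustomerID int NOT NULL PRIMARY KEY,
    Address2StringID int NOT NULL REFERENCES LookupString (StringID)
)
GO

-- The application inserts plain text through this view...
CREATE VIEW CustomerAddressEntry
AS
SELECT ca.CustomerID, ls.StringValue AS Address2
FROM CustomerAddress ca
JOIN LookupString ls ON ls.StringID = ca.Address2StringID
GO

-- ...and the trigger silently de-duplicates the strings.
CREATE TRIGGER trg_CustomerAddressEntry_ins
ON CustomerAddressEntry
INSTEAD OF INSERT
AS
BEGIN
    -- Add only the strings we have not seen before.
    INSERT INTO LookupString (StringValue)
    SELECT DISTINCT i.Address2
    FROM inserted i
    WHERE NOT EXISTS (SELECT 1 FROM LookupString ls
                      WHERE ls.StringValue = i.Address2)

    -- Point each new row at the (possibly shared) string.
    INSERT INTO CustomerAddress (CustomerID, Address2StringID)
    SELECT i.CustomerID, ls.StringID
    FROM inserted i
    JOIN LookupString ls ON ls.StringValue = i.Address2
END
GO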

Here's the problem:
Let's say that Joe lives on 4th Street in Seattle. He gets a record with a link to a string for the street address, rather than just storing the address. Now Mary's record is entered. She lives on 4th Street in Portland, so she gets a link to the same text string as Joe.

Now Joe calls up and says that his address has to be modified; he actually lives on 4th Street Northeast. The application updates the text string.

Now Mary's address has changed, possibly to an address that does not exist in her city.
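Given a shared-string schema like the trigger sketch above, the failure is a single UPDATE (hypothetical values):

-- Joe (Seattle) and Mary (Portland) end up sharing one string row.
INSERT INTO CustomerAddressEntry (CustomerID, Address2) VALUES (1, '4th Street')
INSERT INTO CustomerAddressEntry (CustomerID, Address2) VALUES (2, '4th Street')

-- "Fixing" Joe's address rewrites the shared value...
UPDATE LookupString
SET StringValue = '4th Street NE'
WHERE StringValue = '4th Street'

-- ...and Mary's address silently changes with it.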

I used a concrete example to keep this discussion tangible, but my concern is that this problem would occur across a wide array of values.

I've worked with databases where the developers thought that they were being clever by looking EVERYTHING up, instead of just repeating information that should have been repeated. Maintaining the data was a NIGHTMARE.

I fear that wide use of this technique would lead to more databases with this problem.

As a result of this experience, I do not believe that this kind of "automatic lookup" is a valid exercise, and I certainly do not believe that it reduces complexity in the long run.




Post #89080
Posted Tuesday, December 9, 2003 10:12 AM
I don't know that I would ever use a global table to look up so many attributes for a wide array of objects, but I have used a similar system as an attribute table for a set of objects.

I think the system works best and is most useful when you can't predict, during the main design phase of a project, what attributes a given set of entities is going to have. In one case, however, I wound up using the system and then the list of attributes didn't change... I quite frankly wish I could go back and replace it with a less flexible, more straightforward design now. I don't think that invalidates the design concept, but it does point towards the argument some have been making that it is best used when you're dealing with sparse attributes. It beats creating a bunch of different tables with a 1-to-1 relationship to your entity tables that each wind up with five rows you have to join every time you run a query. It also keeps you from having to add a bunch of extraneous fields to your tables that may never really get used.
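For what it's worth, the bare-bones shape of the attribute table I mean is roughly this (illustrative names, not my actual schema):

CREATE TABLE AttributeType (
    AttributeTypeID int IDENTITY(1,1) PRIMARY KEY,
    AttributeName varchar(50) NOT NULL UNIQUE   -- e.g. 'ItemColor', 'ItemCost'
)

-- One row per entity per attribute it actually has; entities without
-- a given attribute simply have no row for it.
CREATE TABLE ItemAttribute (
    ItemID int NOT NULL,            -- references the entity table
    AttributeTypeID int NOT NULL REFERENCES AttributeType (AttributeTypeID),
    AttributeValue varchar(200) NOT NULL,
    PRIMARY KEY (ItemID, AttributeTypeID)
)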

One gotcha I have found with my implementation is that I can't really use views to single out the attributes by entity in large joins without taking a significant performance hit. I haven't tried something like indexed views to make it faster; instead, I take the code from the views and integrate it into the query I'm writing. I'll use the views until I find a performance issue, then go do the work to rewrite as needed.

Part of the problem, however, is probably that I don't have just a simple identifier in the attribute table to say "this is an item color" or "this is an item cost"; instead, other tables have to join to the attribute table to determine what kind of attribute it is. I have this requirement partly because I expected the list of attributes valid for an entity to change over time while staying the same within a given period; this was a way to enforce which attributes were valid when, and to allow them to be categorized, since the application presents them to the user based on the type of attribute. I now think it could be done better, since it has caused a fair amount of coding effort to support. On the other hand, I directly attribute the ease of a few changes to how flexible the system is.
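Roughly the kind of view I mean, building on the sketch above (again, made-up names). Each such view hides another join to the attribute tables, which is where the cost shows up once a query stacks several of them:

CREATE VIEW ItemColor
AS
SELECT ia.ItemID, ia.AttributeValue AS Color
FROM ItemAttribute ia
JOIN AttributeType t ON t.AttributeTypeID = ia.AttributeTypeID
WHERE t.AttributeName = 'ItemColor'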

Either way, I thank the author for sharing the article. IMHO, anything that causes design debate in the DBA community is a Good Thing.


Matthew Galbraith



Post #89081