Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Antipathy for Entity Attribute Value data models


Antipathy for Entity Attribute Value data models

Author
Message
BillyJack
BillyJack
SSC Rookie
SSC Rookie (45 reputation)SSC Rookie (45 reputation)SSC Rookie (45 reputation)SSC Rookie (45 reputation)SSC Rookie (45 reputation)SSC Rookie (45 reputation)SSC Rookie (45 reputation)SSC Rookie (45 reputation)

Group: General Forum Members
Points: 45 Visits: 65
I think Eric's example demonstrates the strength of an EAV design..you can add new attributes via data and not table structure.

David, I see from the responses that your article on this topic has initiated some lively debate! Well done!
Sergiy
Sergiy
SSCertifiable
SSCertifiable (5.8K reputation)SSCertifiable (5.8K reputation)SSCertifiable (5.8K reputation)SSCertifiable (5.8K reputation)SSCertifiable (5.8K reputation)SSCertifiable (5.8K reputation)SSCertifiable (5.8K reputation)SSCertifiable (5.8K reputation)

Group: General Forum Members
Points: 5825 Visits: 11396
Eric M Russell (1/21/2013)

I still don't see the practical benefit of seperating phone / fax numbers from emails and twitter accounts.

There were no twitter accouns several years ago.
Database designed using traditional model back then needs to be redesigned to accomodate the new type of contact reference.
When in EAV model it's a matter of adding a row in the list of contact types.

People usually supply a single way of contacts (or 2 at most), and you have to keep a row with all possible contact option populated with NULLs.

P.S. just in case - I'm not arguing with Eric, actually my post is in his support. :-)
Peter Schott
Peter Schott
Ten Centuries
Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)

Group: General Forum Members
Points: 1031 Visits: 1906
Not sure what a traditional model would be when it comes to contacts. Seems to me that a table linking the contact info to a person/entity, a contact type, and the value (with maybe a listed/unlisted value as well) was a better design than one column per contact type or multiple tables for phone, email, fax, etc. The worst designs I've seen were those that had just a handful of columns in the model for storing these values - they just don't extend well. I never really considered this an EAV model because it still only stores communication-type values. We're not mixing in addresses, favorite colors, and such in the same table. I guess I'm just missing the point in this particular instance, though. Smile



Eric M Russell
Eric M Russell
SSCarpal Tunnel
SSCarpal Tunnel (4.6K reputation)SSCarpal Tunnel (4.6K reputation)SSCarpal Tunnel (4.6K reputation)SSCarpal Tunnel (4.6K reputation)SSCarpal Tunnel (4.6K reputation)SSCarpal Tunnel (4.6K reputation)SSCarpal Tunnel (4.6K reputation)SSCarpal Tunnel (4.6K reputation)

Group: General Forum Members
Points: 4586 Visits: 9518
BillyJack (1/21/2013)
I think Eric's example demonstrates the strength of an EAV design..you can add new attributes via data and not table structure.

David, I see from the responses that your article on this topic has initiated some lively debate! Well done!

Depending on how we look at it, the Customer_Contact table I presented above is containing only one attribute (contacts). It's just flexible enough to accomodate multiple types of contacts.

"Write a query that returns customer's name, primary method of contact, and last date/time of contact for all customers who have a late payment past 30 days."


Given the above requirement, I don't want phone numbers, emails, and faxes all stored in seperate tables. Also, if dealing with seperate tables, I especially don't want a requirement from end users to start supporting text messaging.

For the majority of contact types, a single table need contain only a type code and a single alpha-numeric value. The application or user can then decide what to do with the information. They just need to know the contact type they're dealing with for context.

Of course mailing and home address should almost certainly be in a seperate table, because it is composed of multiple attributes (city, zip, geo location, etc.) that are commonly queried seperately.


"The universe is complicated and for the most part beyond your control, but your life is only as complicated as you choose it to be."
David.Poole
David.Poole
Hall of Fame
Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)

Group: General Forum Members
Points: 3674 Visits: 3114
Sergiy (1/21/2013)


And if you do not know how to retrieve items with all 3 attributes having requested values it indicates weakness of your SQL skills, not EAV model.


It isn't a case of a DBAs skills, its the case of people who have a basic knowledge but whose primary tool is not SQL. Users of SAS Analytics for example. I'm interested in designs that maximize the audience that can gain benefit from accessing the data.

There is little point in coming up with a design that requires a data priesthood in order to get business value from data.

As most of my work over the past 12 months has really emphasised the problems caused by poor data quality. Anything that opens the door to poor data quality has to be resisted at all costs in my environment. As data has to be shared with external organisations as part of what my organisation serves its customers poor data quality has a direct affect on revenue.

I'm not sure that your locking statement is true. I could be wrong but I don't think SQL Server locks fields, it locks records and anything that requires more than 5,000 locks results in lock escalation.

LinkedIn Profile

Newbie on www.simple-talk.com
Tom Powell-334692
Tom Powell-334692
SSC Rookie
SSC Rookie (26 reputation)SSC Rookie (26 reputation)SSC Rookie (26 reputation)SSC Rookie (26 reputation)SSC Rookie (26 reputation)SSC Rookie (26 reputation)SSC Rookie (26 reputation)SSC Rookie (26 reputation)

Group: General Forum Members
Points: 26 Visits: 109
Good article, thanks. I'll pass it on to my team. We're stuck in the Third Circle of Hell implementing a massive EAV in a third party tool. Helps to know what you are up against.

Tom Powell
http://philergia.wordpress.com/

carlos.iglesias
carlos.iglesias
SSC Rookie
SSC Rookie (45 reputation)SSC Rookie (45 reputation)SSC Rookie (45 reputation)SSC Rookie (45 reputation)SSC Rookie (45 reputation)SSC Rookie (45 reputation)SSC Rookie (45 reputation)SSC Rookie (45 reputation)

Group: General Forum Members
Points: 45 Visits: 91
Turning into Magento EAV model (http://www.magentocommerce.com/wiki/_media/doc/magento---sample_database_diagram.png?cache=cache), EntityAttributeValue table is splitted in 5 different tables: EntityAttributeValue_datatime, EntityAttributeValue_int, EntityAttributeValue_decimal, EntityAttributeValue_text, EntityAttributeValue_varchar. This solution improve storage but it increase the model complexity.
There are a bit of EAV model in Microsoft CRM, is a clear example of business requirement flexibility where entities are pre-defined but attributes can be manage by the end users.
In my opinion EAV is a good & nice model with a huge weakeness: performance and complexity (as a colleague said above it's too far away from the logical model)
A nice discussion in this thread.
Eric M Russell
Eric M Russell
SSCarpal Tunnel
SSCarpal Tunnel (4.6K reputation)SSCarpal Tunnel (4.6K reputation)SSCarpal Tunnel (4.6K reputation)SSCarpal Tunnel (4.6K reputation)SSCarpal Tunnel (4.6K reputation)SSCarpal Tunnel (4.6K reputation)SSCarpal Tunnel (4.6K reputation)SSCarpal Tunnel (4.6K reputation)

Group: General Forum Members
Points: 4586 Visits: 9518
LOL! All of you guys who added a [Customer_Tweet] table to your database, are you going to add tables for supporting Hoots and Chirps too?


"The universe is complicated and for the most part beyond your control, but your life is only as complicated as you choose it to be."
lnardozi 61862
lnardozi 61862
SSC-Enthusiastic
SSC-Enthusiastic (136 reputation)SSC-Enthusiastic (136 reputation)SSC-Enthusiastic (136 reputation)SSC-Enthusiastic (136 reputation)SSC-Enthusiastic (136 reputation)SSC-Enthusiastic (136 reputation)SSC-Enthusiastic (136 reputation)SSC-Enthusiastic (136 reputation)

Group: General Forum Members
Points: 136 Visits: 615
I made an entire subsystem for EAV data, in order to hold configuration information to be used by the entire enterprise.
There are stored procs, DTS packages, jobs, C# apps, web services and VB apps that access this data, which has been placed in lists. Those lists can be lists, or they can be lists that vary according to a second definable parameter, or tables that vary according to a definable parameter. Those 900 configuration tables we can can be abstracted down to 3 or 4 and the data normalized, since there are only keys, values (varchar(max)) and descriptions (also varchar(max)). We don't have a need to store binary information, and we use the TVFs to populate table variables that are then used in our queries (so casting is minimized). Even taking the hit where nested case statements would normally be used, it is actually faster than the nested case statements themselves and much cleaner and more maintainable. My goal is to have all our IFs and CASEs eventually use the configuration data so the entire enterprise is table driven. I guess the thing I'm most proud of is the index on varchar(max). Heh. Compute the SHA, persist it, close enough when you say 'the hash' = '@the hash' AND value = @value. Make sure there's indices on all the foreign keys and it's nothing but index seek. It conforms to my 'all your shit in one sock' algorithm, which places the configuration data where you know it'll be.
peter-757102
peter-757102
SSC-Addicted
SSC-Addicted (449 reputation)SSC-Addicted (449 reputation)SSC-Addicted (449 reputation)SSC-Addicted (449 reputation)SSC-Addicted (449 reputation)SSC-Addicted (449 reputation)SSC-Addicted (449 reputation)SSC-Addicted (449 reputation)

Group: General Forum Members
Points: 449 Visits: 2549
lnardozi 61862 (1/21/2013)
I made an entire subsystem for EAV data, in order to hold configuration information to be used by the entire enterprise.
There are stored procs, DTS packages, jobs, C# apps, web services and VB apps that access this data, which has been placed in lists. Those lists can be lists, or they can be lists that vary according to a second definable parameter, or tables that vary according to a definable parameter. Those 900 configuration tables we can can be abstracted down to 3 or 4 and the data normalized, since there are only keys, values (varchar(max)) and descriptions (also varchar(max)). We don't have a need to store binary information, and we use the TVFs to populate table variables that are then used in our queries (so casting is minimized). Even taking the hit where nested case statements would normally be used, it is actually faster than the nested case statements themselves and much cleaner and more maintainable. My goal is to have all our IFs and CASEs eventually use the configuration data so the entire enterprise is table driven. I guess the thing I'm most proud of is the index on varchar(max). Heh. Compute the SHA, persist it, close enough when you say 'the hash' = '@the hash' AND value = @value. Make sure there's indices on all the foreign keys and it's nothing but index seek. It conforms to my 'all your shit in one sock' algorithm, which places the configuration data where you know it'll be.


Hasing can accelerate searching for equality/inequality, but after an equality match you still got to compare the base data as collisions will occur. More frequently so as the size of your tables increase! In the end all it takes is exactly one collisions to render a system useless.

I made an 'object model' in SQL Server in the past and it had tricks that SQL Server could not do at the time (on delete set null for example) and was superior in cascading delete flexibilities. But as the amount of data put into the system grew, problems arose, many problems. Some of them I could today alleviate somewhat with better code, but the core of the problems remain. Inefficient IO, locking and lots of trigger/procedural code to enforce things. I have to say I did wend extreme and put everything into the model, the whole database, including all things I knew at design time. This is certainly not something I will ever repeat!

I am all for metamodels for handling design time unknowns and/or to physically extend a set model with extra attributes when it benefits. Physical simply because you can access the data where it otherwise would be modeled too and is open to native indexes and constraints. This is more efficiently in all aspects concerned (speed/correctness/locking/code execution).

There might be one case I would consider EAV over physical table and that is with very wide tables that exhibit access patterns over a fraction of the attributes at a time. But even then I would seem to structure it more as it is very likely that some attributes are always accessed in groups and thus a form partitioning is likely more effective. In a sense EAV is a form of partitioning that is overdone in most instances!
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search