Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««12345»»»

Antipathy for Entity Attribute Value data models Expand / Collapse
Author
Message
Posted Monday, January 21, 2013 12:38 PM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Tuesday, March 19, 2013 7:40 AM
Points: 37, Visits: 65
I think Eric's example demonstrates the strength of an EAV design..you can add new attributes via data and not table structure.

David, I see from the responses that your article on this topic has initiated some lively debate! Well done!
Post #1409680
Posted Monday, January 21, 2013 12:41 PM
SSCarpal Tunnel

SSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal Tunnel

Group: General Forum Members
Last Login: Tuesday, July 29, 2014 4:39 PM
Points: 4,576, Visits: 8,342
Eric M Russell (1/21/2013)

I still don't see the practical benefit of seperating phone / fax numbers from emails and twitter accounts.

There were no twitter accouns several years ago.
Database designed using traditional model back then needs to be redesigned to accomodate the new type of contact reference.
When in EAV model it's a matter of adding a row in the list of contact types.

People usually supply a single way of contacts (or 2 at most), and you have to keep a row with all possible contact option populated with NULLs.

P.S. just in case - I'm not arguing with Eric, actually my post is in his support.
Post #1409681
Posted Monday, January 21, 2013 1:01 PM
SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Tuesday, July 22, 2014 3:47 PM
Points: 837, Visits: 1,259
Not sure what a traditional model would be when it comes to contacts. Seems to me that a table linking the contact info to a person/entity, a contact type, and the value (with maybe a listed/unlisted value as well) was a better design than one column per contact type or multiple tables for phone, email, fax, etc. The worst designs I've seen were those that had just a handful of columns in the model for storing these values - they just don't extend well. I never really considered this an EAV model because it still only stores communication-type values. We're not mixing in addresses, favorite colors, and such in the same table. I guess I'm just missing the point in this particular instance, though. :)


Post #1409685
Posted Monday, January 21, 2013 1:59 PM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Today @ 10:40 AM
Points: 1,606, Visits: 4,599
BillyJack (1/21/2013)
I think Eric's example demonstrates the strength of an EAV design..you can add new attributes via data and not table structure.

David, I see from the responses that your article on this topic has initiated some lively debate! Well done!

Depending on how we look at it, the Customer_Contact table I presented above is containing only one attribute (contacts). It's just flexible enough to accomodate multiple types of contacts.

"Write a query that returns customer's name, primary method of contact, and last date/time of contact for all customers who have a late payment past 30 days."


Given the above requirement, I don't want phone numbers, emails, and faxes all stored in seperate tables. Also, if dealing with seperate tables, I especially don't want a requirement from end users to start supporting text messaging.

For the majority of contact types, a single table need contain only a type code and a single alpha-numeric value. The application or user can then decide what to do with the information. They just need to know the contact type they're dealing with for context.

Of course mailing and home address should almost certainly be in a seperate table, because it is composed of multiple attributes (city, zip, geo location, etc.) that are commonly queried seperately.
Post #1409708
Posted Monday, January 21, 2013 2:30 PM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Today @ 3:46 PM
Points: 2,892, Visits: 1,786
Sergiy (1/21/2013)


And if you do not know how to retrieve items with all 3 attributes having requested values it indicates weakness of your SQL skills, not EAV model.


It isn't a case of a DBAs skills, its the case of people who have a basic knowledge but whose primary tool is not SQL. Users of SAS Analytics for example. I'm interested in designs that maximize the audience that can gain benefit from accessing the data.

There is little point in coming up with a design that requires a data priesthood in order to get business value from data.

As most of my work over the past 12 months has really emphasised the problems caused by poor data quality. Anything that opens the door to poor data quality has to be resisted at all costs in my environment. As data has to be shared with external organisations as part of what my organisation serves its customers poor data quality has a direct affect on revenue.

I'm not sure that your locking statement is true. I could be wrong but I don't think SQL Server locks fields, it locks records and anything that requires more than 5,000 locks results in lock escalation.


LinkedIn Profile
Newbie on www.simple-talk.com
Post #1409715
Posted Monday, January 21, 2013 3:18 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Thursday, June 5, 2014 9:05 AM
Points: 23, Visits: 96
Good article, thanks. I'll pass it on to my team. We're stuck in the Third Circle of Hell implementing a massive EAV in a third party tool. Helps to know what you are up against.

Tom Powell
http://philergia.wordpress.com/
Post #1409730
Posted Monday, January 21, 2013 3:27 PM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Thursday, September 26, 2013 6:55 AM
Points: 43, Visits: 91
Turning into Magento EAV model (http://www.magentocommerce.com/wiki/_media/doc/magento---sample_database_diagram.png?cache=cache), EntityAttributeValue table is splitted in 5 different tables: EntityAttributeValue_datatime, EntityAttributeValue_int, EntityAttributeValue_decimal, EntityAttributeValue_text, EntityAttributeValue_varchar. This solution improve storage but it increase the model complexity.
There are a bit of EAV model in Microsoft CRM, is a clear example of business requirement flexibility where entities are pre-defined but attributes can be manage by the end users.
In my opinion EAV is a good & nice model with a huge weakeness: performance and complexity (as a colleague said above it's too far away from the logical model)
A nice discussion in this thread.
Post #1409731
Posted Monday, January 21, 2013 3:37 PM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Today @ 10:40 AM
Points: 1,606, Visits: 4,599
LOL! All of you guys who added a [Customer_Tweet] table to your database, are you going to add tables for supporting Hoots and Chirps too?
Post #1409735
Posted Monday, January 21, 2013 9:25 PM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Tuesday, July 22, 2014 1:16 PM
Points: 107, Visits: 511
I made an entire subsystem for EAV data, in order to hold configuration information to be used by the entire enterprise.
There are stored procs, DTS packages, jobs, C# apps, web services and VB apps that access this data, which has been placed in lists. Those lists can be lists, or they can be lists that vary according to a second definable parameter, or tables that vary according to a definable parameter. Those 900 configuration tables we can can be abstracted down to 3 or 4 and the data normalized, since there are only keys, values (varchar(max)) and descriptions (also varchar(max)). We don't have a need to store binary information, and we use the TVFs to populate table variables that are then used in our queries (so casting is minimized). Even taking the hit where nested case statements would normally be used, it is actually faster than the nested case statements themselves and much cleaner and more maintainable. My goal is to have all our IFs and CASEs eventually use the configuration data so the entire enterprise is table driven. I guess the thing I'm most proud of is the index on varchar(max). Heh. Compute the SHA, persist it, close enough when you say 'the hash' = '@the hash' AND value = @value. Make sure there's indices on all the foreign keys and it's nothing but index seek. It conforms to my 'all your shit in one sock' algorithm, which places the configuration data where you know it'll be.
Post #1409772
Posted Tuesday, January 22, 2013 4:18 AM
Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Friday, July 25, 2014 3:18 AM
Points: 323, Visits: 2,200
lnardozi 61862 (1/21/2013)
I made an entire subsystem for EAV data, in order to hold configuration information to be used by the entire enterprise.
There are stored procs, DTS packages, jobs, C# apps, web services and VB apps that access this data, which has been placed in lists. Those lists can be lists, or they can be lists that vary according to a second definable parameter, or tables that vary according to a definable parameter. Those 900 configuration tables we can can be abstracted down to 3 or 4 and the data normalized, since there are only keys, values (varchar(max)) and descriptions (also varchar(max)). We don't have a need to store binary information, and we use the TVFs to populate table variables that are then used in our queries (so casting is minimized). Even taking the hit where nested case statements would normally be used, it is actually faster than the nested case statements themselves and much cleaner and more maintainable. My goal is to have all our IFs and CASEs eventually use the configuration data so the entire enterprise is table driven. I guess the thing I'm most proud of is the index on varchar(max). Heh. Compute the SHA, persist it, close enough when you say 'the hash' = '@the hash' AND value = @value. Make sure there's indices on all the foreign keys and it's nothing but index seek. It conforms to my 'all your shit in one sock' algorithm, which places the configuration data where you know it'll be.


Hasing can accelerate searching for equality/inequality, but after an equality match you still got to compare the base data as collisions will occur. More frequently so as the size of your tables increase! In the end all it takes is exactly one collisions to render a system useless.

I made an 'object model' in SQL Server in the past and it had tricks that SQL Server could not do at the time (on delete set null for example) and was superior in cascading delete flexibilities. But as the amount of data put into the system grew, problems arose, many problems. Some of them I could today alleviate somewhat with better code, but the core of the problems remain. Inefficient IO, locking and lots of trigger/procedural code to enforce things. I have to say I did wend extreme and put everything into the model, the whole database, including all things I knew at design time. This is certainly not something I will ever repeat!

I am all for metamodels for handling design time unknowns and/or to physically extend a set model with extra attributes when it benefits. Physical simply because you can access the data where it otherwise would be modeled too and is open to native indexes and constraints. This is more efficiently in all aspects concerned (speed/correctness/locking/code execution).

There might be one case I would consider EAV over physical table and that is with very wide tables that exhibit access patterns over a fraction of the attributes at a time. But even then I would seem to structure it more as it is very likely that some attributes are always accessed in groups and thus a form partitioning is likely more effective. In a sense EAV is a form of partitioning that is overdone in most instances!
Post #1409918
« Prev Topic | Next Topic »

Add to briefcase «««12345»»»

Permissions Expand / Collapse