Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««34567»»

Antipathy for Entity Attribute Value data models Expand / Collapse
Author
Message
Posted Thursday, January 31, 2013 3:48 PM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Yesterday @ 10:08 AM
Points: 2,921, Visits: 1,870
@Jeff What sort of size, scale and traffic are you putting through your EAV models?

LinkedIn Profile
Newbie on www.simple-talk.com
Post #1414379
Posted Thursday, January 31, 2013 5:27 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 8:12 AM
Points: 35,770, Visits: 32,434
David.Poole (1/31/2013)
@Jeff What sort of size, scale and traffic are you putting through your EAV models?


Some of the tables got quite large (several million rows). The post just a couple up from here is a reasonable example of what one app used for contact info. We had one for address information, as well. Proper indexing on the "entity" and a bit of dynamic SQL as a dynamic crosstab worked well in most of the instances.

It's definitely not the path I'd have liked to travel down but we made it work and, especailly for what it did, it worked quite fast (apologies... it was several years ago and I don't have the metrics available). A side advantage was it did greatly simplify certain types of searches.

I have to admit that one of my favorite types of EAV is for data collection by automation. Something with a device number, a date, and some value. Makes life super easy for creating reports by just about any time frame you can imagine. In those areas, many millions of rows and hundreds of read and thousands of writes per minute. On the IVR system I wrote, I could do reporting by hour of the day across 9 different catagories for every day of a month including category grand and cumulative totals on 12 million rows of data in about 5 seconds. It would have been a huge pain without the use of an EAV.

Again, I hate to call these things "models" because that implies a much broader scope than what we used. We had some EAV "tables".


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1414407
Posted Friday, February 1, 2013 7:50 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Yesterday @ 10:08 AM
Points: 2,921, Visits: 1,870
@Jeff, it sounds as if you should write a counterpoint article for EAV.

Streaming stuff into an EAV has never been a problem, its the use cases for getting it out that are problematical.

I can see it apply to telematics data to an extent in that I have an event type, date/time, value, GPS co-ordinate. Basically anything that is entered by machine reduces the data quality issues.

One comment I didn't understand in the feedback in this thread was about SQL_Variant making rubbish of the points in my article. I know it wasn't you who said it but can you shine any light on this?

Simple aggregations have worked fine, retrieval of all attributes for an entity have been fine. Pivoting I've seen eat a box.

The particular example I gave where complex combinations of criteria caused no end of problems simply because the complexity of the SQL grew geometrically.

One particular implementation was knocking on the door of 600GB so you can imagine the number of records in EAV form. It was a 9 figure number of records per day. We had to switch off mirroring due to the excessive IO involved.


LinkedIn Profile
Newbie on www.simple-talk.com
Post #1414687
Posted Friday, February 1, 2013 9:03 AM
Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Monday, December 15, 2014 3:27 AM
Points: 337, Visits: 2,282
The cases in recent postings are much bigger than what I implemented so many years ago. I ran into problems pretty fast and in retrospect, my usage of the concept was extreme (and wrong). I suspect some of the bad reputation comes from similar cases! Even so, there are certainly issues and limitations with the model one needs to be aware off (referential integrity comes to mind).

My case used an EAV to store attributes known at design time. It could be done directly in tables with the consequence of adding entity specific triggers to implement cascading deletes for example (with properly working on delete set null etc.). Just as an explanation, cascading deletes still do not work well in SQL Server! We now have "on delete set null", but it is half baked implemented and supports just one such cascading relation case between two of the same entities. A simple thing such as keeping track of who did the creation of a record and who did the last update is already handicapped out of the box, even today!

There was also no uniformity in attribute value type and each type had its own table to store native values in (no use of sql_variant). During performance measurements back then, the most expensive operation was the number of joins required to build a complete record. And as you might guess, updating required transactions for even the simplest of updates and that across multiple tables....not exactly ideal. I went for it at the time because of initial development speed and the desired cascading behavior and unique ID of every record in the database....after that I never touched the idea again...obviously ;).

One downside of EAVs could be solved (and many more other things) by having a where clause on relations. Just as we eventually got filtered indexes and filtered statistics, we really need filtered foreign keys too (even basic checks on originating and target table would do). It is a much recurring wish by me and i suppose by anyone that is consistent in applying foreign key constraints.
Post #1414724
Posted Friday, February 1, 2013 6:42 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 8:12 AM
Points: 35,770, Visits: 32,434
David.Poole (2/1/2013)
@Jeff, it sounds as if you should write a counterpoint article for EAV.


I kind of did already.
http://www.sqlservercentral.com/articles/T-SQL/63681/
http://www.sqlservercentral.com/articles/Crosstab/65048/

Admittedly, they're at the 100 level but they provide a decent start to the concept.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1414889
Posted Saturday, February 2, 2013 2:46 AM
SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Sunday, September 29, 2013 1:24 AM
Points: 429, Visits: 1,721
Hey, Jeff I only read the first and last pages so I hope this hasn't been mentioned already. My company is a major developer for users of the DotNetNuke content management system which was originally developed as an open source project a dozen years ago.

Probably due to lack of foresight as to how big the DotNetNuke platform would become, some of the major components of the DNN schema implemented the EAV model. Now for the "regular" folks who just want to add "Shirt Size" or "Dog Breed" attributes to their system's user profile data this is really cool. They can add whatever categories upon categories with whatever attributes they want in any language. They can even create master lists (like states in OZ vs no states in NZ) or lists of bird calls by continent or whatever they can think of.

But almost all of my interface with DNN is through SSMS and customization of the database. This is my daily job. And there's tons to like about the DNN platform, but having to deal with these throwback EAV tables often makes me want to throw a shoe at somebody.

 
Post #1414928
Posted Saturday, February 2, 2013 10:03 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 8:12 AM
Points: 35,770, Visits: 32,434
Heh... at least you know what you'll be doing every day, Steven. And, you can buy shoes by the pallet at Walmart with just a little notice!

I agree that EAVs that are used like you say are one of the primary reasons for the hatred of them. Like any other tool, they have their uses and their abuses.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1414978
Posted Saturday, February 2, 2013 10:46 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 8:12 AM
Points: 35,770, Visits: 32,434
David.Poole (2/1/2013)
One comment I didn't understand in the feedback in this thread was about SQL_Variant making rubbish of the points in my article. I know it wasn't you who said it but can you shine any light on this?


My apologies. I missed this comment/question and I don't know if it will be "light" that I can shed on the subject but I can make a comment here.

In your good article, you stated the following...

If the Entity-Attribute-Value model is to be truly generic then that means that the "value" field has to be generic and untyped which usually means it has to be some form of string or variant.


... and that's probably what a lot of folks took exception to because they probably stopped reading before they got to the word "variant". I know I missed it on the very first read through.

It doesn't shred what you said about EAVs in the quote above or in other places in the article but there's a great advantage to using the SQL_Variant datatype in EAVs. You don't have to do an explicit conversion for source datatypes that don't have an implicit conversion to, say, VARCHAR(8000). You can also figure out what datatype something was stored from by using the SQL_VARIANT_PROPERTY function. Of course (and I strongly agree with you here), if you can more accurately type the value column as you suggested, that would frequently be better even if you might have to use more than one EAV. The most successful EAVs that I've used have had some form of numeric typed value column. Of course, for "element" based audit tables stored as EAVs, that option is usually out of the question because it's probably not as effective to have a table for every datatype nor performant for the audit triggers to make so many decisions. In those cases, SQL_Variant may work better for performance by the associated audit triggers. It does suck for indexing though.

So, to answer your question, "It Depends". I would't say that the use of SQL_Variant "makes rubbish" of the points in your article except at what people think of at face value. There's a lot more than just face value at stake here, though, and I think that anyone that ignores your points in the article just because they used SQL_Variant are probably overlooking some potential problems (like the indexing problem, for example).

[Edit]
I think another problem that some folks may have taken up with certain parts of the article could be based on some misunderstanding. For example, you stated in the article that....
As the ParentID attribute will be a string and the record to which is refers will be some form of integer then a lot of casting will have to take place in order to resolve the hierarchy.

I know a lot of good folks looked at that and, having worked with Adjacency Lists where both the child and parent IDs are numeric, might wonder why you even brought this up as an EAV. I have to admit, even I don't understand why you brought this up as an EAV.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1414981
Posted Monday, February 4, 2013 4:50 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Yesterday @ 10:08 AM
Points: 2,921, Visits: 1,870
I know a lot of good folks looked at that and, having worked with Adjacency Lists where both the child and parent IDs are numeric, might wonder why you even brought this up as an EAV. I have to admit, even I don't understand why you brought this up as an EAV.


It was because I've seen people try and do everthing in the EAV design and not take a step back and think "hang on, perhaps hierachal concerns are best handled outside of the EAV".

My personal experience is that when someone comes up with these designs they become evangelical about them to the point of insanity. Their treatment of disenters would have made Torquemada a bit queasy. There needs to be a sense of perspective and an understanding that it is a tool not the tool.


LinkedIn Profile
Newbie on www.simple-talk.com
Post #1415232
Posted Monday, February 4, 2013 7:02 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 8:12 AM
Points: 35,770, Visits: 32,434
David.Poole (2/4/2013)
I know a lot of good folks looked at that and, having worked with Adjacency Lists where both the child and parent IDs are numeric, might wonder why you even brought this up as an EAV. I have to admit, even I don't understand why you brought this up as an EAV.


It was because I've seen people try and do everthing in the EAV design and not take a step back and think "hang on, perhaps hierachal concerns are best handled outside of the EAV".

My personal experience is that when someone comes up with these designs they become evangelical about them to the point of insanity. Their treatment of disenters would have made Torquemada a bit queasy. There needs to be a sense of perspective and an understanding that it is a tool not the tool.


Ah! Now I understand. And agreed. But, the key is also that it still IS a tool, albeit a tool that seems to suffer the same level of abuse that Cursors, While loops, and recursive CTE's that count do. Something more closely centered between the religious and deadly fervor of a Torquemada and the mistaken visceral fear that Chicken Little exuded from every pore is the way to go on subjects like this. I agree there, too. It's unfortunate that both forms of fear are as virulent as any plague and frequently take the place of common sense, especially in our line of business.

To wit and to summarize the use and abuse of EAVs, "It Depends" and this has been an awesome discussion on the subject. Thanks for writing the article that made it possible, David.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1415551
« Prev Topic | Next Topic »

Add to briefcase «««34567»»

Permissions Expand / Collapse