Antipathy for Entity Attribute Value data models

  • BillyJack (1/29/2013)


    Jeff Moden (1/28/2013)


    Heh... wait a minute now. Are you saying that you work in an environment where developers can make physical table changes directly in the production environment without testing/etc and without going through a DBA? Or is that what you're trying to avoid with the use of EAVs?

    The point I am trying to make is that with an EAV model, you don't need to do any table maintenance in order to add new attributes. They are simply new rows of data added to the EAV tables

    Yep. I agree. I've been there. It does make such a thing wicked easy. And it's simple enough to pivot the data into a table-like result set when needed by using a bit of dynamic SQL.

    It is a bit wicked, though. How are you handling similar but different attribute names for the same thing? For example, in address related data, you might have an attribute for "Street Number". Not including the potential for phat phingering, I can think of at least 15 different ways that might be entered as an attribute name. Short of a huge list of possible attribute names and a "simto" search, I've not found a way to overcome that problem.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • lnardozi 61862 (1/29/2013)


    Now I'm really curious. What does changing a table structure have to do with SQL programming skills?

    --Jeff Moden

    I'm sure you've seen this hundreds of times before - you're one of the brighter lights on this board. You need some configuration data stored somewhere, so a table is created. Then another and another and before you know it you've got a few hundred. Those few hundred tables each have their own queries, their own table design, they end up getting used in joins - sometimes very badly. When the tables get created, do they have the correct indices? What is involved enterprise wide when it's time to change one of those tables - particularly one used in several hundred queries? Assuming your DBA has the time to look at all those queries (mine doesn't), what are the odds he'll hit every single one right on the head? My system is heavily optimized for retrieval and supports everything except binary - which means the TVFs can be used in stored procs, views, web services, wcf services, restful services, Json services and Sharepoint. Anything in our enterprise that might have had hard coded information in instead table driven with no duplication of data and a well defined map that explains the dependencies enterprise wide. Sniff if you want, but it's the best thing I've ever done in almost 30 years of programming.

    I try not to sniff at anything except unnecessary RBAR or the improper use of CLR. 😀

    Unless I'm mistaken, you talking about a bunch of similar small reference tables that may have an ID, a code, and a description in them? If that's correct, I don't see those tables ever having a new column being added to them. Could you explain why you might want to add a column to such a reference table?

    If you have hundreds of them, I can see where adding new data to them could be a bit of a problem because you'd have to find the right one first. That would be easy if DRI were properly instantiated at table creation time.

    Of course, I might be misunderstanding what you mean. Are you, in fact, saying that you consolidated all such reference tables into a single EAV? If so, how are you handling the implicit conversion problems that destroy SARGability during joined lookups? Did you use SQL_Variant to fix that particular problem or some other method?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (1/30/2013)


    It is a bit wicked, though. How are you handling similar but different attribute names for the same thing? For example, in address related data, you might have an attribute for "Street Number". Not including the potential for phat phingering, I can think of at least 15 different ways that might be entered as an attribute name. Short of a huge list of possible attribute names and a "simto" search, I've not found a way to overcome that problem.

    One of the requirements I've always had when an EAV model is to be implemented is that the business owner MUST have someone (a data czar for lack of a better name) responsible for managing new attribute setup (and their valid values) in the system. If not, you will inevitably wind up with multiple attribute names representing the same thing as you indicated, and you will be writing a "merge" proc to consolidate all of the attributes and their values under the proper attribute name. So, with built in business flexibility comes added business responsibility...

  • BillyJack (1/31/2013)


    Jeff Moden (1/30/2013)


    It is a bit wicked, though. How are you handling similar but different attribute names for the same thing? For example, in address related data, you might have an attribute for "Street Number". Not including the potential for phat phingering, I can think of at least 15 different ways that might be entered as an attribute name. Short of a huge list of possible attribute names and a "simto" search, I've not found a way to overcome that problem.

    One of the requirements I've always had when an EAV model is to be implemented is that the business owner MUST have someone (a data czar for lack of a better name) responsible for managing new attribute setup (and their valid values) in the system. If not, you will inevitably wind up with multiple attribute names representing the same thing as you indicated, and you will be writing a "merge" proc to consolidate all of the attributes and their values under the proper attribute name. So, with built in business flexibility comes added business responsibility...

    Concur and that's where I was going to on this one. In every case except for one internal app, we made it so that only those folks with such responsibility could actually add attributes. Is it idiot proof? Not by a long shot but the customers sure were/are happy about it.

    We did go a bit further, though. We made it so that the unique attributes could be listed for any given entity (there weren't THAT many because the rest of the system was properly normalized) and then through a simple system of check boxes, made it so that one of the data czar's you spoke of could combine attributes for that given entity. As you can well imagine, the stored procedure for such an update was incredibly simple and didn't even require any dynamic SQL.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • What about containing customer contacts?

    I still that one Customer_Contact table that contains something like Customer_ID, Contact_Type, Contact_Code columns (Entity-Attribute-Value ???) is better than seperate Customer_Phone, Customer_Fax, Customer_Email, Customer_Tweet, ad nauseam tables.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • @jeff What sort of size, scale and traffic are you putting through your EAV models?

  • David.Poole (1/31/2013)


    @Jeff What sort of size, scale and traffic are you putting through your EAV models?

    Some of the tables got quite large (several million rows). The post just a couple up from here is a reasonable example of what one app used for contact info. We had one for address information, as well. Proper indexing on the "entity" and a bit of dynamic SQL as a dynamic crosstab worked well in most of the instances.

    It's definitely not the path I'd have liked to travel down but we made it work and, especailly for what it did, it worked quite fast (apologies... it was several years ago and I don't have the metrics available). A side advantage was it did greatly simplify certain types of searches.

    I have to admit that one of my favorite types of EAV is for data collection by automation. Something with a device number, a date, and some value. Makes life super easy for creating reports by just about any time frame you can imagine. In those areas, many millions of rows and hundreds of read and thousands of writes per minute. On the IVR system I wrote, I could do reporting by hour of the day across 9 different catagories for every day of a month including category grand and cumulative totals on 12 million rows of data in about 5 seconds. It would have been a huge pain without the use of an EAV.

    Again, I hate to call these things "models" because that implies a much broader scope than what we used. We had some EAV "tables".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • @jeff, it sounds as if you should write a counterpoint article for EAV.

    Streaming stuff into an EAV has never been a problem, its the use cases for getting it out that are problematical.

    I can see it apply to telematics data to an extent in that I have an event type, date/time, value, GPS co-ordinate. Basically anything that is entered by machine reduces the data quality issues.

    One comment I didn't understand in the feedback in this thread was about SQL_Variant making rubbish of the points in my article. I know it wasn't you who said it but can you shine any light on this?

    Simple aggregations have worked fine, retrieval of all attributes for an entity have been fine. Pivoting I've seen eat a box.

    The particular example I gave where complex combinations of criteria caused no end of problems simply because the complexity of the SQL grew geometrically.

    One particular implementation was knocking on the door of 600GB so you can imagine the number of records in EAV form. It was a 9 figure number of records per day. We had to switch off mirroring due to the excessive IO involved.

  • The cases in recent postings are much bigger than what I implemented so many years ago. I ran into problems pretty fast and in retrospect, my usage of the concept was extreme (and wrong). I suspect some of the bad reputation comes from similar cases! Even so, there are certainly issues and limitations with the model one needs to be aware off (referential integrity comes to mind).

    My case used an EAV to store attributes known at design time. It could be done directly in tables with the consequence of adding entity specific triggers to implement cascading deletes for example (with properly working on delete set null etc.). Just as an explanation, cascading deletes still do not work well in SQL Server! We now have "on delete set null", but it is half baked implemented and supports just one such cascading relation case between two of the same entities. A simple thing such as keeping track of who did the creation of a record and who did the last update is already handicapped out of the box, even today!

    There was also no uniformity in attribute value type and each type had its own table to store native values in (no use of sql_variant). During performance measurements back then, the most expensive operation was the number of joins required to build a complete record. And as you might guess, updating required transactions for even the simplest of updates and that across multiple tables....not exactly ideal. I went for it at the time because of initial development speed and the desired cascading behavior and unique ID of every record in the database....after that I never touched the idea again...obviously ;).

    One downside of EAVs could be solved (and many more other things) by having a where clause on relations. Just as we eventually got filtered indexes and filtered statistics, we really need filtered foreign keys too (even basic checks on originating and target table would do). It is a much recurring wish by me and i suppose by anyone that is consistent in applying foreign key constraints.

  • David.Poole (2/1/2013)


    @Jeff, it sounds as if you should write a counterpoint article for EAV.

    I kind of did already.

    http://www.sqlservercentral.com/articles/T-SQL/63681/

    http://www.sqlservercentral.com/articles/Crosstab/65048/

    Admittedly, they're at the 100 level but they provide a decent start to the concept.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hey, Jeff I only read the first and last pages so I hope this hasn't been mentioned already. My company is a major developer for users of the DotNetNuke content management system which was originally developed as an open source project a dozen years ago.

    Probably due to lack of foresight as to how big the DotNetNuke platform would become, some of the major components of the DNN schema implemented the EAV model. Now for the "regular" folks who just want to add "Shirt Size" or "Dog Breed" attributes to their system's user profile data this is really cool. They can add whatever categories upon categories with whatever attributes they want in any language. They can even create master lists (like states in OZ vs no states in NZ) or lists of bird calls by continent or whatever they can think of.

    But almost all of my interface with DNN is through SSMS and customization of the database. This is my daily job. And there's tons to like about the DNN platform, but having to deal with these throwback EAV tables often makes me want to throw a shoe at somebody. 😛

     

  • Heh... at least you know what you'll be doing every day, Steven. 😉 And, you can buy shoes by the pallet at Walmart with just a little notice! 😛

    I agree that EAVs that are used like you say are one of the primary reasons for the hatred of them. Like any other tool, they have their uses and their abuses.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • David.Poole (2/1/2013)


    One comment I didn't understand in the feedback in this thread was about SQL_Variant making rubbish of the points in my article. I know it wasn't you who said it but can you shine any light on this?

    My apologies. I missed this comment/question and I don't know if it will be "light" that I can shed on the subject but I can make a comment here. 😀

    In your good article, you stated the following...

    If the Entity-Attribute-Value model is to be truly generic then that means that the "value" field has to be generic and untyped which usually means it has to be some form of string or variant.

    ... and that's probably what a lot of folks took exception to because they probably stopped reading before they got to the word "variant". I know I missed it on the very first read through. :blush:

    It doesn't shred what you said about EAVs in the quote above or in other places in the article but there's a great advantage to using the SQL_Variant datatype in EAVs. You don't have to do an explicit conversion for source datatypes that don't have an implicit conversion to, say, VARCHAR(8000). You can also figure out what datatype something was stored from by using the SQL_VARIANT_PROPERTY function. Of course (and I strongly agree with you here), if you can more accurately type the value column as you suggested, that would frequently be better even if you might have to use more than one EAV. The most successful EAVs that I've used have had some form of numeric typed value column. Of course, for "element" based audit tables stored as EAVs, that option is usually out of the question because it's probably not as effective to have a table for every datatype nor performant for the audit triggers to make so many decisions. In those cases, SQL_Variant may work better for performance by the associated audit triggers. It does suck for indexing though.

    So, to answer your question, "It Depends". I would't say that the use of SQL_Variant "makes rubbish" of the points in your article except at what people think of at face value. There's a lot more than just face value at stake here, though, and I think that anyone that ignores your points in the article just because they used SQL_Variant are probably overlooking some potential problems (like the indexing problem, for example).

    [Edit]

    I think another problem that some folks may have taken up with certain parts of the article could be based on some misunderstanding. For example, you stated in the article that....

    As the ParentID attribute will be a string and the record to which is refers will be some form of integer then a lot of casting will have to take place in order to resolve the hierarchy.

    I know a lot of good folks looked at that and, having worked with Adjacency Lists where both the child and parent IDs are numeric, might wonder why you even brought this up as an EAV. I have to admit, even I don't understand why you brought this up as an EAV.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I know a lot of good folks looked at that and, having worked with Adjacency Lists where both the child and parent IDs are numeric, might wonder why you even brought this up as an EAV. I have to admit, even I don't understand why you brought this up as an EAV.

    It was because I've seen people try and do everthing in the EAV design and not take a step back and think "hang on, perhaps hierachal concerns are best handled outside of the EAV".

    My personal experience is that when someone comes up with these designs they become evangelical about them to the point of insanity. Their treatment of disenters would have made Torquemada a bit queasy. There needs to be a sense of perspective and an understanding that it is a tool not the tool.

  • David.Poole (2/4/2013)


    I know a lot of good folks looked at that and, having worked with Adjacency Lists where both the child and parent IDs are numeric, might wonder why you even brought this up as an EAV. I have to admit, even I don't understand why you brought this up as an EAV.

    It was because I've seen people try and do everthing in the EAV design and not take a step back and think "hang on, perhaps hierachal concerns are best handled outside of the EAV".

    My personal experience is that when someone comes up with these designs they become evangelical about them to the point of insanity. Their treatment of disenters would have made Torquemada a bit queasy. There needs to be a sense of perspective and an understanding that it is a tool not the tool.

    Ah! Now I understand. And agreed. But, the key is also that it still IS a tool, albeit a tool that seems to suffer the same level of abuse that Cursors, While loops, and recursive CTE's that count do. Something more closely centered between the religious and deadly fervor of a Torquemada and the mistaken visceral fear that Chicken Little exuded from every pore is the way to go on subjects like this. I agree there, too. It's unfortunate that both forms of fear are as virulent as any plague and frequently take the place of common sense, especially in our line of business.

    To wit and to summarize the use and abuse of EAVs, "It Depends" and this has been an awesome discussion on the subject. Thanks for writing the article that made it possible, David.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 46 through 60 (of 62 total)

You must be logged in to reply to this topic. Login to reply