Entity-Attribute-Value is evil ? how to avoid...

  • Hi all
    consider the following scenario:

    • I have a FILE table with known attributes (size, creation date, ...)
    • The user needs to add his own attributes for classification purposes
    • Nedd to apply permissions on attributes, since some user may edit those attributes, while some others may only read
    • The user wants some of those attributes to be localized (for example an abstract or a description) in different languages and each user has a different language

    During a discussion on a previous post someone just pointed out that EAV design is not the best we can do...
    that's true, but I don't see any other viable option to do somehing like this.

    Here EAV table design I've made ...
    Anyone can share some alternative design idea ?

    create table [UserLanguage] (
    [lang_id] int,
    [iso639code] char(3),
    )

    create table [UserDefinedAttribute] (
    [attr_id] int identity(1, 1) primary key,
    [name] nvarchar(max),
    [type] nvarchar(max),
    )

    create table [UserDefinedAttributeLocalization] (
    [attr_id] int,
    [lang_id] int,
    [name] nvarchar(max)
    )

    create table [UserDefinedAttributePermission] (
    [attr_id] int identity(1, 1) primary key,
    [group_id] int,
    [permission] nvarchar(max), -- read, read+write
    )

    create table [File] (
    [file_id] int primary key identity(1, 1),
    [path] nvarchar(255),
    bigint,
    [owner_name] nvarchar(max),
      [creation_date] date
    )

    create table [FileUserAttributes] (
    [file_id] int primary key identity(1, 1),
    [attr_id] int,
    [value] nvarchar(max), -- or maybe sql_variant
    )

    create table [FileUserAttributeLocalization] (
    [attr_id] int,
    [lang_id] int,
    [value] nvarchar(max), -- or maybe sql_variant
    )

  • I don't see EAV as evil. They don't scale well and they can be problematic for busy workloads, but for an ad hoc item that you don't want to put in every row, they make sense.

    For your design,  why have the localization item separate? I might combine these instead as:

    create table [UserDefinedAttribute] (
    [attr_id] int identity(1, 1) primary key,
    [name] nvarchar(max),
    [type] nvarchar(max),
    [lang_id] int,
    )

    create table [FileUserAttributes] (
    [file_id] int primary key identity(1, 1),
    [attr_id] int,
    [lang_id] int,
    [value] nvarchar(max), -- or maybe sql_variant
    )

    I assume if you have localization that the native, default language is listed. You can default these to English, or whatever, and keep them there with the appropriate language indicator.

    I'd be wary of MAX datatypes everywhere, as these can be slowed for performance, but if this stuff is accessed a lot, I'd also be nervous. For permissions, this seems fine.

  • I'd say this is one of the few cases where EAV is actually a reasonable normalised design. You aren't using it for the things the system itself is likely to need to know about - size, owner, creation date etc. Normally where EAV goes bad is when you try to reduce what would be every single column of data into a separate attribute row, because then querying the data becomes painfully complex.

    For user-defined key-value pairs, however, it's pretty much the only sane way to do it - except possibly using JSON/XML blobs with a dynamic structure and that isn't actually going to be any better from a querying perspective should you ever need to do it.

  • Those don't appear to be EAV tables.

    Your overuse of NVARCHAR(MAX) is going to end up being a problem, though... especially since I don't see anything that forces them out of row.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jason A. Long - Wednesday, November 28, 2018 6:33 PM

    They are a tool.  They may be of limited value depending on what you are trying to accomplish but I wouldn't write them off completely.

  • Lynn Pettis - Wednesday, November 28, 2018 6:41 PM

    They are a tool.  They may be of limited value depending on what you are trying to accomplish but I wouldn't write them off completely.

    Totally agreed.  Like anything else in T-SQL and SQL Server, "It Depends" and there are some incredible uses for both EAVs and NVPs.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Wednesday, November 28, 2018 8:59 PM

    Totally agreed.  Like anything else in T-SQL and SQL Server, "It Depends" and there are some incredible uses for both EAVs and NVPs.

    I totally disagree... They violate pretty much every principal of the relational model. The fact that you will, by design, give up data type constraints, having BITs, INTs (of all flavors), DATEs, MONEY and everything else, all being stored as NVARCHAR(MAX)... Because you have to be able to store anything and everything...  No FK or CHECK constraints either... So it's inevitable that you'll end up with "Amount Due = 'Bird'" or "Date of Birth = $42.99"...

    As far as calling them tools... maybe... but only in the sense that a pointy rock or broken beer bottle can be considered tools... The primary difference, of course, being that I can actually envision circumstances where I'd be grateful to have a pointy rock or broken beer bottle.

    That said, I get totally understand the aversion to arbitrarily restrictive rules that limit the tools at our disposal... So... I'll keep an open mind... What is a valid use case where using an EAV is the preferred option? And when was the last time that either of you actually chose to implement an EAV, rather than standard lookup tables ?

  • Steve Jones - SSC Editor - Wednesday, November 28, 2018 11:11 AM

    I don't see EAV as evil. They don't scale well and they can be problematic for busy workloads, but for an ad hoc item that you don't want to put in every row, they make sense.

    For your design,  why have the localization item separate? I might combine these instead as:

    create table [UserDefinedAttribute] (
    [attr_id] int identity(1, 1) primary key,
    [name] nvarchar(max),
    [type] nvarchar(max),
    [lang_id] int,
    )

    create table [FileUserAttributes] (
    [file_id] int primary key identity(1, 1),
    [attr_id] int,
    [lang_id] int,
    [value] nvarchar(max), -- or maybe sql_variant
    )

    I assume if you have localization that the native, default language is listed. You can default these to English, or whatever, and keep them there with the appropriate language indicator.

    I'd be wary of MAX datatypes everywhere, as these can be slowed for performance, but if this stuff is accessed a lot, I'd also be nervous. For permissions, this seems fine.

    Hi Steve. Using your way there is a different attribute for each language, while my solution proposes a translations for the same attribute... they are both valid approaches but requires a different handling from the client application perspective... I will think about it. Thanks

  • Jason A. Long - Wednesday, November 28, 2018 11:40 PM

    I totally disagree... They violate pretty much every principal of the relational model. The fact that you will, by design, give up data type constraints, having BITs, INTs (of all flavors), DATEs, MONEY and everything else, all being stored as NVARCHAR(MAX)... Because you have to be able to store anything and everything...  No FK or CHECK constraints either... So it's inevitable that you'll end up with "Amount Due = 'Bird'" or "Date of Birth = $42.99"...

    As far as calling them tools... maybe... but only in the sense that a pointy rock or broken beer bottle can be considered tools... The primary difference, of course, being that I can actually envision circumstances where I'd be grateful to have a pointy rock or broken beer bottle.

    That said, I get totally understand the aversion to arbitrarily restrictive rules that limit the tools at our disposal... So... I'll keep an open mind... What is a valid use case where using an EAV is the preferred option? And when was the last time that either of you actually chose to implement an EAV, rather than standard lookup tables ?

    What you are describing is absolutely the worst-case implementation for an EAV approach. When you start trying to use them to hold structured data, where you have known data types and relational requirements on the data. It is possible to force the design of almost anything into an EAV data model, but then you will suffer all the pain you can possibly imagine trying to maintain or query it.

    However, not all situations are quite that extreme, for a case where there is a need for user defined metadata fields on a record the design that naturally falls out of a relational model is basically EAV with the user able to create their own "field names" and assign some arbitrary data to it on a record-by-record basis. This can be an effective design, with the caveat that such metadata is typically less queryable than otherwise.

    Even if you need differing data types (lets assume a user might want to add dates or yes/no metadata), you aren't necessarily restricted to storing everything in one varchar column - you can have different tables for each of the metadata types and allow front end systems to handle the process differently.

    Ultimately EAV is a tool - it's one that is very easy to abuse and one that people who aren't database design experts sometimes gravitate towards because it looks like something that means you don't ever have to think about databases. Of course it's precisely because of those that factors that it can quickly become an anti-pattern. Certainly if you come up with a system designed such that the entire thing is EAV, then it's almost certainly fundamentally flawed.

  • andycadley - Wednesday, November 28, 2018 3:22 PM

    For user-defined key-value pairs, however, it's pretty much the only sane way to do it - except possibly using JSON/XML blobs with a dynamic structure and that isn't actually going to be any better from a querying perspective should you ever need to do it.

    Totally agree... over the years I've used XML columns in many scenarios and found that they are not a pleasure to work with if you want to query, insert, replace and so on... and if you put them in a where filter they don't scale well and you'll end up with performance bottlenecks.

  • Jason A. Long - Wednesday, November 28, 2018 6:33 PM

    I see your point since I don't like EAV too.
    Just saying "I'll call them evil" is easy... do you have a pragmatic solution to share ?

  • Jason A. Long - Wednesday, November 28, 2018 11:40 PM

    I totally disagree... They violate pretty much every principal of the relational model. The fact that you will, by design, give up data type constraints, having BITs, INTs (of all flavors), DATEs, MONEY and everything else, all being stored as NVARCHAR(MAX)... Because you have to be able to store anything and everything...  No FK or CHECK constraints either... So it's inevitable that you'll end up with "Amount Due = 'Bird'" or "Date of Birth = $42.99"...

    As far as calling them tools... maybe... but only in the sense that a pointy rock or broken beer bottle can be considered tools... The primary difference, of course, being that I can actually envision circumstances where I'd be grateful to have a pointy rock or broken beer bottle.

    That said, I get totally understand the aversion to arbitrarily restrictive rules that limit the tools at our disposal... So... I'll keep an open mind... What is a valid use case where using an EAV is the preferred option? And when was the last time that either of you actually chose to implement an EAV, rather than standard lookup tables ?

    I see them quite a bit with a particular use case, and it would be interesting to hear of an alternative implementation, pretty much if you have a database that lists things and attributes for these things. Normally you'd have attributes that specify these things in columns right? Well how would you enter a brand new attribute for this thing WITHOUT adding a new column? EAV tables just let you add a new row with the attribute and value for any particular thing. 

    Without EAV, you'd have to add a new column each time you want to have a new attribute to assist in describing this thing, but with EAV, you just add the new attribute and the value for that attribute in a new row. I suspect relational purists who hate EAV will make you alter the table to add the new attribute as a column or columns, but this means the end user cannot independently add new attributes unless they can programmatically alter the table to add a new column.

  • @Carlo P - I think my handling tends to be more consistent where if you have translations, you're a) putting one language ahead of others and b) means I need to handle translations differently than one language. I think both of those incur some technical debt, but at the expense of some (perhaps) more initial development.

    Mileage may vary. I think I'd be interested in what you choose, and really be interested in an article on how/why you did this if you're up for it.

  • @jason

    I do think EAV has a place, especially in the real world. We don't live in a perfect world, especially with regard to requirements/specifications and time. We do need to implement designs that are effective and efficient, but also timely.

    EAVs are dangerous, and should be used judiciously where they fit a problem. In this case, I have data that isn't well normalized in structure. I've had this as well in the past where we had attributes about products. Let's say I have widgets and knicknacks. They both have a price, a cost, a size, etc.

    If widgets have a UPC, I certainly could add that to the products table, but if it's not a part of knicknacks, does that make sense? Maybe. It's a design decision. If knicknacks have a capacity, but widgets don't, do I add that as well? Sparse columns were added for this reason, but to me, we're still building a strange structure that isn't normalized. Better would be to add tables for UPC and capacity,but then I get into a complex query pattern based on product.

    I have to decide on whether the normalization makes sense and sometimes it doesn't. If we're querying these attributes very rarely, and I want to keep my index small and narrow on the main table, I could do a vertical partition of the "sparse" attributes, but I could also use EAV. I've chosen both, but when I do, I also monitor the percentage of attributes v the number of entities. If lots start to need an attribute, I often think it's better to stick it back in the main table.

    That's fine. Software evolves, the requirements change. I can make the change and it usually is not a big  problem for app dev.

    What I often find is that these strange attributes are items that  are  speculative requirements on the part of the client. Many of them die away, and so the EAV is fine. Didn't really matter.

Viewing 15 posts - 1 through 15 (of 47 total)

You must be logged in to reply to this topic. Login to reply