EAVs and NVPs are bad. So... What do YOU use as an alternative?

  • I've just read a dozen articles/posts where people absolutely condemn the use of EAV (Entity, Attribute, Value) and NVP (Name, Value Pair) tables and lambasted anyone who stood up and even alluded to a possible good use.

    So... let me ask, with the promise that I'm not going to lambaste anyone for their opinion... when you have a front-end app that allows users to build their own custom "record" format, how do YOU store that information in a database without using EAVs or NVPs? Or is THAT the exception?
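
    Just so we're all picturing the same thing, here's a bare-bones sketch of the kind of EAV/NVP table I'm asking about (the table and column names are purely illustrative):

    CREATE TABLE dbo.EntityAttribute
    (
        EntityID   INT           NOT NULL, -- which "record" the value belongs to
        Attribute  VARCHAR(128)  NOT NULL, -- the user-defined "column" name
        Value      VARCHAR(8000)     NULL, -- everything stored as a string
        CONSTRAINT PK_EntityAttribute PRIMARY KEY CLUSTERED (EntityID, Attribute)
    );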

    Thanks ahead of time for your thoughts, folks.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (5/1/2011)


    I've just read a dozen articles/posts where people absolutely condemn the use of EAV (Entity, Attribute, Value) and NVP (Name, Value Pair) tables and lambasted anyone who stood up and even alluded to a possible good use.

    So... let me ask, with the promise that I'm not going to lambaste anyone for their opinion... when you have a front-end app that allows users to build their own custom "record" format, how do YOU store that information in a database without using EAVs or NVPs? Or is THAT the exception?

    Thanks ahead of time for your thoughts, folks.

    I'd have to say that IS the exception. You've basically left the realm of design at that point and put it in the hands of the end users. When your users are going to design their own database... well... you do what you can.

    In general I would avoid them, but there's another exception for me, and that's the proper use of an EAV table that uses multiple columns depending on the type of value put in, and then only during the first year of a newly constructed DB. The idea here is that you're not quite sure what the users are going to need at first, so you design incredibly generically. After a year or two, you revisit the design wholesale, once you've learned what the end users are really doing, rather than what they thought they wanted.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • Thanks for the feedback, Craig. I'm going to hold up on telling folks whether or not I use them because I don't want to bias anyone. I might even have a surprise for folks that think they're not using them.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I've seen them used very successfully in applications where the end users have choices on configuration. An accounting system I used to support allowed users to create ledgers with 2 to 10 coding elements. Each element type had a letter as a prefix, and the master data was stored in a single table for all elements. Using a single table made coding somewhat easier, and gave the users maximum flexibility.
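
    The shared master table idea is roughly this (illustrative names only, not the actual system's):

    CREATE TABLE dbo.LedgerElement
    (
        ElementType CHAR(1)      NOT NULL, -- letter prefix identifying the element type
        ElementCode VARCHAR(20)  NOT NULL,
        Description VARCHAR(100) NOT NULL,
        CONSTRAINT PK_LedgerElement PRIMARY KEY (ElementType, ElementCode)
    );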

    I've used EAV tables to store configuration data in cases where it made sense. I don't like creating tables to hold a single record, so a "misc" table has been a part of most of my designs. I include a comment field to document the values inserted.
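
    Something along these lines, for example (a sketch; the names are made up for illustration):

    CREATE TABLE dbo.MiscConfig
    (
        SettingName  VARCHAR(128)  NOT NULL PRIMARY KEY,
        SettingValue VARCHAR(4000)     NULL,
        Comment      VARCHAR(500)      NULL -- documents what the value is for
    );

    INSERT dbo.MiscConfig (SettingName, SettingValue, Comment)
    VALUES ('ArchiveRetentionDays', '90', 'How long the purge job keeps history rows');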

    Like any other concept, there are appropriate places for EAV/NVP constructs, and it's up to us to make sure they are used correctly.

  • I've used them. Those and XML columns both work for "user-defined data structures". Which one I use depends on the degree of flexibility needed and how it's going to be used.

    EAV is often easier, but XML has the advantage of the data all being in one place. If you delete a row, but not a set of rows, out of an EAV table, you've potentially corrupted the data. Not so with an XML column. XML also works nicely with .NET front ends, since the .NET languages have all kinds of XML shredding tools. Other application development environments probably handle XML just as well, but I don't know that one way or the other.
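
    For example, pulling one user-defined value back out of an XML column can look like this in T-SQL (a sketch; the table, column, and attribute names are invented):

    -- Hypothetical table keeping all the user-defined data in one XML column
    CREATE TABLE dbo.CustomRecord
    (
        RecordID INT IDENTITY(1,1) PRIMARY KEY,
        Attrs    XML NULL
    );

    -- Shred one attribute back out with the xml data type's value() method
    SELECT RecordID,
           Attrs.value('(/attrs/attr[@name="Color"]/@value)[1]', 'varchar(50)') AS Color
    FROM   dbo.CustomRecord;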

    On the other hand, EAV doesn't require jumping through any hoops to contain markup characters. And, if you include a column for each basic data type, and a column that indicates which type a particular row has in it, you can use that, plus a simple check constraint, to enforce pretty specific data integrity rules. Reconstituting the data is a performance killer every time I've seen these used, but it can be mostly overcome by some relatively simple tricks. It will never be fast, and doesn't index well, but it can be less slow. It can also take a lot less disk space than XML, in some cases, and thus also less I/O work, less RAM, and so on.
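
    That typed-column approach can be as simple as something like this (again, just a sketch with made-up names):

    CREATE TABLE dbo.EntityAttributeTyped
    (
        EntityID     INT           NOT NULL,
        Attribute    VARCHAR(128)  NOT NULL,
        ValueType    CHAR(1)       NOT NULL, -- S = string, N = numeric, D = date
        StringValue  VARCHAR(4000)     NULL,
        NumericValue DECIMAL(18,4)     NULL,
        DateValue    DATETIME          NULL,
        CONSTRAINT PK_EntityAttributeTyped PRIMARY KEY CLUSTERED (EntityID, Attribute),
        -- Exactly one value column may be populated, and it must match ValueType
        CONSTRAINT CK_EntityAttributeTyped_OneValue CHECK
        (
            (ValueType = 'S' AND StringValue IS NOT NULL AND NumericValue IS NULL AND DateValue IS NULL)
         OR (ValueType = 'N' AND NumericValue IS NOT NULL AND StringValue IS NULL AND DateValue IS NULL)
         OR (ValueType = 'D' AND DateValue IS NOT NULL AND StringValue IS NULL AND NumericValue IS NULL)
        )
    );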

    I generally prefer XML columns for this kind of thing these days, but EAV has its place and its uses. It'd be more useful if T-SQL's Pivot and Unpivot operators weren't essentially junk.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Until recently I was biased against them, having had extremely bad experiences with them. I have since seen an implementation where it seems to work rather well. Thus, I would say it really depends on the implementation and the knowledge level of the people implementing it.

    If you're ever curious, I have it on good authority that match.com uses an EAV.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw
    Learn Extended Events

  • Thanks for the replies and my apologies... I lost track of this thread for a while because of all the activity on an article discussion.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ross McMicken (5/3/2011)


    I've seen them used very successfully in applications where the end users have choices on configuration.

    ...

    I've used EAV tables to store configuration data in cases where it made sense.

    ...

    Like any other concept, there are appropriate places for EAV/NVP constructs, and it's up to us to make sure they are used correctly.

    My thoughts, exactly. It all boils down to "It Depends". It's a bit like saying that all Cursors and While loops are bad. While they should generally be avoided, there are times when they're not only acceptable but actually the correct solution.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • GSquared (5/3/2011)


    I've used them. Those and XML columns both work for "user-defined data structures". Which one I use depends on the degree of flexibility needed and how it's going to be used.

    EAV is often easier, but XML has the advantage of the data all being in one place. If you delete a row, but not a set of rows, out of an EAV table, you've potentially corrupted the data. Not so with an XML column. XML also works nicely with .NET front ends, since the .NET languages have all kinds of XML shredding tools. Other application development environments probably handle XML just as well, but I don't know that one way or the other.

    On the other hand, EAV doesn't require jumping through any hoops to contain markup characters. And, if you include a column for each basic data type, and a column that indicates which type a particular row has in it, you can use that, plus a simple check constraint, to enforce pretty specific data integrity rules. Reconstituting the data is a performance killer every time I've seen these used, but it can be mostly overcome by some relatively simple tricks. It will never be fast, and doesn't index well, but it can be less slow. It can also take a lot less disk space than XML, in some cases, and thus also less I/O work, less RAM, and so on.

    I generally prefer XML columns for this kind of thing these days, but EAV has its place and its uses. It'd be more useful if T-SQL's Pivot and Unpivot operators weren't essentially junk.

    Well said. Thanks for the feedback, Gus.

    I've not tried the XML route yet. I should at least look into its performance and handling requirements. I guess the biggest deterrent to trying it, so far, is that I've had such good luck using Cross Tabs instead of Pivot. And, I agree... what were they thinking when they introduced PIVOT/UNPIVOT to T-SQL? The Pivot functionality in Access absolutely blows the T-SQL Pivot out of the water. You'd have thought they would have used that as a model.
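
    For anyone who hasn't seen the Cross Tab pattern, it's just conditional aggregation; flattening an EAV table back into columns looks something like this (table, column, and attribute names are illustrative):

    SELECT   EntityID,
             MAX(CASE WHEN Attribute = 'FirstName' THEN Value END) AS FirstName,
             MAX(CASE WHEN Attribute = 'LastName'  THEN Value END) AS LastName,
             MAX(CASE WHEN Attribute = 'City'      THEN Value END) AS City
    FROM     dbo.EntityAttribute
    GROUP BY EntityID;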

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • SQLRNNR (5/3/2011)


    Until recently I was biased against them, having had extremely bad experiences with them. I have since seen an implementation where it seems to work rather well. Thus, I would say it really depends on the implementation and the knowledge level of the people implementing it.

    Exactly. "It Depends". I am curious though... what were the "extremely bad experiences" you had with them?

    The bad experience that I had with them was that someone decided that every table in our system (at a previous company) needed to have an NVP table as an audit table. Although it made the data collection and audit triggers stupid-simple to write, they were a nightmare when it came to determining what a row looked like at a given point in time. It wasn't such a bad idea for very wide tables because it did save on storage, but those savings were pretty much negated by the extra storage required for the narrower tables, which made up most of the system.
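
    To give a feel for the nightmare: reconstructing what one row looked like as of a point in time from that kind of NVP audit table ends up as something like this (a sketch; the audit table and column names are invented, not the actual system's):

    DECLARE @CustomerID INT      = 42,
            @AsOfDate   DATETIME = '2011-05-03';

    -- Latest audited value of each column for one row, as of @AsOfDate
    SELECT  ColumnName, ColumnValue
    FROM   (SELECT ColumnName, ColumnValue,
                   ROW_NUMBER() OVER (PARTITION BY ColumnName
                                      ORDER BY AuditDate DESC) AS rn
            FROM   dbo.AuditNVP
            WHERE  TableName = 'Customer'
              AND  RowKey    = @CustomerID
              AND  AuditDate <= @AsOfDate) x
    WHERE   rn = 1;
    -- ...and you still have to cross tab that result to get it back into row shape.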

    If you're ever curious, I have it on good authority that match.com uses an EAV.

    Ok, now you DID get my curiosity up. What are they using it for? Their main "people" table or ????

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I've seen and used EAV in several cases and, like you said, it's all in the context of the data that is being stored.

    In short, if the data points are known beforehand, tend to stay static, and are well defined, then simply use a relational model with proper normalization.

    On the other hand, sometimes even the customer cannot know in advance all of the possible future data points they may need to track (that is why the NIH/CDC uses an EAV model for tracking diseases - you can't know all of the variables in advance). In this case, an EAV structure allows new data points to be added at will without invalidating older data (since the older, related data will simply not have the newer attributes).

    There is naturally a concern with those terrible End Users adding attributes to the EAV store willy-nilly... that's why you restrict the attribute definitions to an administrative screen with limited access.
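
    One way to back up that administrative screen at the database level is an attribute-definition table that the EAV rows have to reference (again, a sketch with invented names):

    CREATE TABLE dbo.AttributeDefinition
    (
        AttributeID   INT          NOT NULL PRIMARY KEY,
        AttributeName VARCHAR(128) NOT NULL UNIQUE,
        DataType      VARCHAR(20)  NOT NULL -- e.g. 'string', 'numeric', 'date'
    );

    CREATE TABLE dbo.EntityAttributeValue
    (
        EntityID    INT NOT NULL,
        AttributeID INT NOT NULL
            CONSTRAINT FK_EAV_AttributeDefinition
            REFERENCES dbo.AttributeDefinition (AttributeID),
        Value       VARCHAR(4000) NULL,
        CONSTRAINT PK_EntityAttributeValue PRIMARY KEY (EntityID, AttributeID)
    );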

    I can't tell you how many databases I've run across with hundreds if not thousands of columns for a single entity which were mostly filled with NULLs and required expensive program rewrites every time a new data point (column) needed to be tracked. EAV fixes this and makes searching a snap. Reporting can be a bear, as others have mentioned, because of the need to 'flatten' the data, and yes, there is a performance hit as well, but when scalability outweighs the performance issues, EAV/NVP may be the way to go.

    Indexes are a fantastic way of speeding up queries, but adding 100 of them to a single table is probably a bad idea and makes things worse instead of better. Everything in moderation; use the right tool for the job at hand.

  • Hello,

    My project consists of several subject areas, and one part of it should also use the EAV modelling technique, but I still have many open questions: http://www.sqlservercentral.com/Forums/Topic1480416-373-1.aspx.

    How do I prepare the database so that validation of the form fields in the application is generated dynamically when a new attribute is defined? The same question applies to displaying attribute values in reports (value masks).

    Another question is how to deal with an attribute whose values are referenced from some simple entity (table), say one with a primary key consisting of 1, 2, or 3 columns, so that the value of the attribute is effectively a vector.

    I also need to set some limits on the project so that I don't model something that would make a mess of the application code and be hard to maintain.
  • I inherited a large EAV (TRANSLATE_CODE_VALUE) that contained descriptions matched to a series of single-character codes across dozens of columns in several primary tables. The schema (and primary key) was ColumnName, Code, Description. One table had to join to the descriptions table nearly 20 times and performance obviously suffered. The surprisingly satisfactory solution was to write a procedure to convert all of those codes into a set of inline table-valued functions, each consisting of a single CASE expression (see below).

    It only takes a second or so to regenerate the entire set when the descriptions table is changed, and the query plan benefited greatly from replacing joins to the EAV with CROSS APPLYs to the newly generated functions. Essentially, it replaces a lot of I/O with simple Compute Scalar operations.

    This solution also preserves the only rational use I've ever had to concede for an EAV/NVP table, which is keeping a common repository of all definitions in one place for maintenance.

    Below is a sample function that was created automatically by the translate procedure. The comment block is also automatically generated, to warn future generations against making changes directly to the function rather than continuing to maintain the TRANSLATE_CODE_VALUE table.

    ALTER FUNCTION [dbo].[tfn_TRANSLATE_APPR_ASSIGNMENT_TYPE] (@input varchar(10))
    ----------------------------------------------------------------------------------------------------
    -- This function was automatically generated by dbo.stGenerateTranslateFunctions on
    -- Apr 16 2014 11:37:20:807AM (Pacific). All codes and their translated long values
    -- come from [dbo].[TRANSLATE_CODE_VALUE].
    ----------------------------------------------------------------------------------------------------
    RETURNS TABLE
    AS
    RETURN
    (SELECT CASE ltrim(rtrim(@input))
                WHEN 'O' THEN 'Other'
                WHEN 'P' THEN 'Purchase'
                WHEN 'R' THEN 'Rental'
                ELSE '??'
            END as output_text_long);
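
    Calls from the primary tables then end up looking something like this (the table and column names in this example are hypothetical, not from the actual system):

    SELECT  a.AppraisalID,
            t.output_text_long AS AssignmentTypeDescription
    FROM    dbo.APPRAISAL a
    CROSS APPLY dbo.tfn_TRANSLATE_APPR_ASSIGNMENT_TYPE(a.APPR_ASSIGNMENT_TYPE) t;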

    Admittedly, this function could be improved upon for cases where dozens of codes exist for a single column. Nested CASE expressions could emulate a binary search to minimize the execution time of functions with a truly large number of codes. However, the performance has been more than adequate so far without that modification.
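
    A rough sketch of that nested-CASE, binary-search idea (the codes and descriptions here are invented for illustration):

    DECLARE @input varchar(10) = 'P';

    -- Split the sorted code range in half before testing individual codes
    SELECT CASE WHEN ltrim(rtrim(@input)) <= 'M'
                THEN CASE ltrim(rtrim(@input))
                         WHEN 'A' THEN 'Approved'
                         WHEN 'C' THEN 'Cancelled'
                         WHEN 'H' THEN 'Hold'
                         ELSE '??'
                     END
                ELSE CASE ltrim(rtrim(@input))
                         WHEN 'O' THEN 'Other'
                         WHEN 'P' THEN 'Purchase'
                         WHEN 'R' THEN 'Rental'
                         ELSE '??'
                     END
           END AS output_text_long;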

    Edited to add that yes I know this thread was almost five years old when I added this.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
