Antipathy for Entity Attribute Value data models

  • Comments posted to this topic are about the item Antipathy for Entity Attribute Value data models

  • I could not agree more.

    However, isn't this the same concept being used for "Big Data"?

    Every presentation I see about big data (aside from Hadoop) seems to look just like the EAV models we so hate. I fear we are returning to those days.

  • Hi

    A good example of the EAV model is e-commerce apps like Magento. They use an EAV model and duplicate writes to maintain some denormalized tables: EAV for writes and a relational model for reads. You can also find EAV models in clinical systems, due to the huge range of attributes to analyze. In my opinion, as you said in your article, business requirement standardization is the main reason for using it.

    Regards

    Carlos

  • Not EAV, but I'd be interested in people's views on Anchor Modeling (www.anchormodeling.com), which basically normalises out every column except the primary key, but provides decent generated code for accessing it since it is still in 3NF.

    For an application we have that needs every change to any column to be recorded, it looks attractive from a storage-space perspective and for maintainability of schema changes (also a requirement in this case), but I'm a bit concerned about performance for a live transactional system (rather than a data warehouse).

  • I think you miss the most critical points of EAV data models:

    a. "models" in general, not just data models, are abstractions that help humans understand a process or a system. A conceptual or logical data model describes the underlying business as a relational database, derived by a data architect working with the Subject Matter Expert to rapidly build the correct database structures that "model" the business or process.

    b. The physical databases implemented from a logical data model will automatically enforce database integrity: both entity integrity and relational (foreign key) integrity.

    c. The ANSI/SPARC three-schema architecture, supported by high-quality data modeling tools like Embarcadero ER/Studio and ERwin, handles the coupling of conceptual, logical and physical models and keeps them in sync. Tremendous value.

    d. The physical database is, or may be, just one of several possible implementations of a logical data model.

  • In many ways SQL Server itself is an EAV system. Tables, columns, constraints, they're all records in tables.

    However, having said that, why reinvent the wheel? SQL Server actually abstracts EAV so we don't have to worry about all the (myriad) implementation details for a complex EAV system. I can live with that!

    The one use case I have no trouble allowing in a model is that of "minor" tables, such as a 3-4 item list that would otherwise be hardcoded into the application and merely stored with a check constraint in a byte field.

    One example would be a status field that basically says: Active, Inactive, Parted Out, or Destroyed. If you have a dozen such small status fields, an EAV approach of key, value, and (perhaps) a user explanation field isn't burdensome, along the lines of the sketch below.
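
    A minimal sketch of the kind of thing I mean (table and column names are just illustrative):

    CREATE TABLE dbo.CodeList
    (
        ListName    varchar(50)  NOT NULL,  -- which "minor" list the row belongs to, e.g. 'UnitStatus'
        CodeValue   tinyint      NOT NULL,  -- the value actually stored in the byte field on the main table
        CodeText    varchar(50)  NOT NULL,  -- Active, Inactive, Parted Out, Destroyed, ...
        UserComment varchar(200) NULL,      -- optional user explanation
        CONSTRAINT PK_CodeList PRIMARY KEY CLUSTERED (ListName, CodeValue)
    );

    INSERT dbo.CodeList (ListName, CodeValue, CodeText)
    VALUES ('UnitStatus', 1, 'Active'),
           ('UnitStatus', 2, 'Inactive'),
           ('UnitStatus', 3, 'Parted Out'),
           ('UnitStatus', 4, 'Destroyed');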

    Beyond that--well, too much work. 😀

  • Wow, I never knew someone would use EAV so extensively! Usually the only time we employ such a scenario is when we have a need for "Customer-Defined Fields" (CDFs, as opposed to UDFs). These are not fields we would know about ahead of time, nor could we predict what a Customer wants. Obviously we attempt to guide them to use any existing fields "correctly" because (as we all know) every Customer uses different terminology for the same things. Lastly, once we notice multiple Customers using the same Customer-Defined Fields, we "promote" those to real fields and standardize everyone at the same time.

    The idea (I believe) of EAVs is to allow flexibility in the short term while learning what the audience of the system needs, and then promoting those needs to real models as trends are noticed; a rough sketch of such a table follows.
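
    For what it's worth, a sketch of how such Customer-Defined Fields might be stored (names are purely illustrative, not an actual schema):

    CREATE TABLE dbo.CustomerDefinedField
    (
        CustomerID int           NOT NULL,  -- which customer defined the field
        EntityID   int           NOT NULL,  -- the row the custom value belongs to
        FieldName  varchar(100)  NOT NULL,  -- named by the customer, not by us
        FieldValue varchar(4000) NULL,
        CONSTRAINT PK_CustomerDefinedField PRIMARY KEY CLUSTERED (CustomerID, EntityID, FieldName)
    );
    -- Once several customers converge on the same FieldName, it gets "promoted"
    -- to a real column on the base table and these rows are migrated out.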

  • Datomic (http://www.datomic.com/) by Rich Hickey addresses many of your criticisms. It adds Time to EAV (EAVT), so the state of the database is available as a value at any point in time. It uses the Datalog query language to handle queries. In addition, it does have a schema that includes data types and whether one entity is a component of another. Certainly worth taking a look at.

  • I agree with tymberwyld above. I normally limit EAVs to those things that cannot be known at design time.

    I also think that the ability to datatype such a thing has been grossly underestimated. SQL_Variant does a dandy job there, although I'll admit it provides no datatype enforcement by itself. While I also agree that constraints and FKs are all but useless here, that's the whole point behind EAVs... they allow the end user to create "columns" on the fly. A trigger that "learns" could be applied to the table for a form of DRI and datatype enforcement. Yep... users can make a mess of such a table, but such messes are sometimes best left to the customer.
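
    Something along these lines, as a sketch only (names are illustrative, and the "learning" trigger is left as an exercise):

    CREATE TABLE dbo.EntityAttribute
    (
        EntityID       int          NOT NULL,
        AttributeName  varchar(100) NOT NULL,  -- the "column" the end user created on the fly
        AttributeValue sql_variant  NULL,      -- keeps the base type of whatever was stored
        CONSTRAINT PK_EntityAttribute PRIMARY KEY CLUSTERED (EntityID, AttributeName)
    );

    -- BaseType shows what was actually stored; a "learning" trigger could compare it
    -- against the first value seen for each AttributeName and reject mismatches.
    SELECT EntityID,
           AttributeName,
           AttributeValue,
           SQL_VARIANT_PROPERTY(AttributeValue, 'BaseType') AS BaseType
    FROM dbo.EntityAttribute;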

    Shifting gears, there is actually quite a bit of data that is better served in the form of an EAV. A simple example would be monthly plans vs. actuals, where you store the month (or day, or other temporal data) in one column, along with the entity, the attribute (plan, actual, etc.), and, of course, the value. Some would even call this particular type of EAV a "normalized fact table". Once you learn how to quickly aggregate and pivot the data, these tables are extremely useful and easy to manipulate. Of course, the cool part about this form of EAV is that you can actually have DRI by entity, attribute, and date/time.
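
    For example (again, just a sketch with illustrative names):

    CREATE TABLE dbo.MonthlyFact
    (
        EntityID      int           NOT NULL,  -- cost center, product, whatever the entity is
        FactDate      date          NOT NULL,  -- e.g. the first day of the month
        AttributeName varchar(20)   NOT NULL,  -- 'Plan', 'Actual', ...
        FactValue     decimal(19,4) NOT NULL,
        CONSTRAINT PK_MonthlyFact PRIMARY KEY CLUSTERED (EntityID, FactDate, AttributeName)
    );

    -- Cross-tab the attributes back into columns for reporting.
    SELECT EntityID,
           FactDate,
           SUM(CASE WHEN AttributeName = 'Plan'   THEN FactValue ELSE 0 END) AS PlanAmount,
           SUM(CASE WHEN AttributeName = 'Actual' THEN FactValue ELSE 0 END) AS ActualAmount
    FROM dbo.MonthlyFact
    GROUP BY EntityID, FactDate;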

    Another very common use for EAVs is auditing very wide tables. Yep... it can be a pain the first time to figure out what a given row looked like at a given point in time, but once you've done it, you won't think it so bad. Well, unless you're an ISO/ANSI purist. 😛
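
    The audit flavor usually looks something like this (sketch only; names are illustrative):

    CREATE TABLE dbo.WideTableAudit
    (
        AuditID    bigint       IDENTITY(1,1) NOT NULL PRIMARY KEY,
        RowID      int          NOT NULL,  -- key of the audited row
        ColumnName sysname      NOT NULL,  -- only columns that actually changed get a row
        OldValue   sql_variant  NULL,
        NewValue   sql_variant  NULL,
        ChangedAt  datetime2(3) NOT NULL DEFAULT SYSDATETIME(),
        ChangedBy  sysname      NOT NULL DEFAULT SUSER_SNAME()
    );
    -- Rebuilding what a row looked like at a point in time means walking these
    -- changes backward from the current row, which is the fiddly part mentioned above.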

    Also, in real life, I try to avoid storing XML anywhere in the database. It just seems as stupid as storing CSVs, and it certainly has more bloat both in parsing and in storage. I'd much rather see the XML shredded and propagated to real, normalized tables.

    As far as polyglots go, I try to NOT go there. One app for the front end and one app for the backend is usually just right. Anything more is like a very complicated trip to work where you drive a car to a train station, walk from the train station to the curb where you pick up a bus that's not going directly to where you want it to go, get dropped off near work, cross several streets, take an elevator, follow a road map to your cube, and spend the next half hour reading emails that shouldn't have been sent to you to begin with before you can do any of your own work.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.



  • Having used EAV designs for well over a decade, I still haven't seen a good alternative for data-driven systems that provides the flexibility of an EAV in dealing with adding new attributes and incorporating them into an organization's business rules. Yes, data typing of the attribute value is an issue, but so is the ongoing maintenance of adding columns to tables, updating SQL to incorporate those new columns, and managing application versioning/rollout. I would love to have an alternative to EAV designs for cases where new attributes are regularly added to an entity's definition.

  • As I understand it, that's exactly what Anchor Modeling (see earlier post) is trying to achieve, since new attributes can be added without detriment to the existing schema, and auto-generated code eases working with the added complexity required to allow this. That's one reason it interests me, though I do have my doubts.

  • Customer contact information is an example of where an entity-attribute-value model makes sense. The design is simple, efficient, and can be easily extended to support any type of contact, whether it be G.O.T.-style messenger ravens or yet-to-be-defined holograms.

    create table Customer_Contact
    (
        Customer_ID int not null,
        Contact_Type char(2) not null,
        Contact_Code varchar(8000) not null,
        primary key clustered ( Customer_ID, Contact_Type )
    );

    Contact_Type codes:

    PH = Home Phone
    PC = Cell Phone
    PW = Work Phone
    PA = After Hours Phone
    PP = Primary Phone
    PS = Secondary Phone
    PE = Emergency Phone
    EC = Personal Email
    EW = Work Email
    EP = Primary Email

    Sample rows:

    34592  PH  555-381-0922
    34592  PC  555-381-8111
    34592  PW  555-576-0542
    34592  PP  555-576-0542
    34592  EC  customer34592@gmail.com
    34592  EP  customer34592@gmail.com

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell (1/21/2013)


    Customer contact information is an example of where an entity-attribute-value model makes sense. The design is simple, efficient, and can be easily extended to support any type of contact, whether it be G.O.T.-style messenger ravens or yet-to-be-defined holograms.

    ...

    Personally, I would have a PhoneContactPoint table and an EmailContactPoint table:

    CREATE TABLE PhoneContactPointType
    (
        PhoneContactPointTypeID SMALLINT NOT NULL
            CONSTRAINT PK_PhoneContactPointType PRIMARY KEY CLUSTERED,
        PhoneContactPointType VARCHAR(50) NOT NULL
            CONSTRAINT UQ_PhoneContactPointType UNIQUE,
        CONSTRAINT CK_PhoneContactPointType CHECK (LEN(PhoneContactPointType) > 0)
    );

    CREATE TABLE PhoneContactPoint
    (
        CustomerID INT NOT NULL
            CONSTRAINT FK_PhoneContactPoint_Customer FOREIGN KEY REFERENCES Customer (CustomerID),
        PhoneContactPointTypeID SMALLINT NOT NULL
            CONSTRAINT FK_PhoneContactPoint_PhoneContactPointType FOREIGN KEY REFERENCES PhoneContactPointType (PhoneContactPointTypeID),
        TelephoneNumber VARCHAR(25) NOT NULL,
        CONSTRAINT PK_PhoneContactPoint PRIMARY KEY (TelephoneNumber, PhoneContactPointTypeID)
    );

    Ditto EmailContactPoint.

    You could even generalise EmailContactPoint to InternetContactPoint if you wanted blog sites and Twitter details

  • For Customer Contacts, why wouldn't I make a "Contact Type" table and just normalize this? Sure, it can store just about anything, but since all contacts are going to be a string of some sort, I don't see the value in not going the one extra step and normalizing. With the existing model, I still have to either know what the two-character code represents or maintain a lookup table somewhere. I could also potentially run out of combinations, or be forced to make new entries that don't make sense just so they're unique ("XX" - what does this mean again?). (And yes, this is highly unlikely, but it could potentially happen.) Something along the lines of the sketch below is what I have in mind.
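
    Roughly (names are only illustrative):

    CREATE TABLE dbo.Contact_Type
    (
        Contact_Type_ID   int         NOT NULL CONSTRAINT PK_Contact_Type PRIMARY KEY,
        Contact_Type_Name varchar(50) NOT NULL CONSTRAINT UQ_Contact_Type_Name UNIQUE  -- 'Home Phone', 'Work Email', ...
    );

    CREATE TABLE dbo.Customer_Contact
    (
        Customer_ID     int           NOT NULL,
        Contact_Type_ID int           NOT NULL
            CONSTRAINT FK_Customer_Contact_Contact_Type REFERENCES dbo.Contact_Type (Contact_Type_ID),
        Contact_Code    varchar(8000) NOT NULL,
        CONSTRAINT PK_Customer_Contact PRIMARY KEY CLUSTERED (Customer_ID, Contact_Type_ID)
    );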

    Regarding the article, I appreciated the examples and the practical cases for them. Too many times we get someone modeling out an EAV config just because it seems expedient, when it will more than likely cause issues later down the line. I've also heard "we'll prevent bad data through the code" too many times. That's never worked well in my experience, though I hope others have had better luck.

  • David.Poole (1/21/2013)


    ...

    ...

    Personally I would have a PhoneContactPoint table and an EmailContactPoint table

    ...

    Ditto EmailContactPoint.

    You could even generalise EmailContactPoint to InternetContactPoint if you wanted blog sites and Twitter details

    Yeah, but to what end? There is no practical reason to partition internet-based contacts from telecom contacts in the data model. Most phone or fax calls today are made using an internet-enabled device, whether it be an iPhone, Skype, or magicJack. I still have the same 999-999-9999 number I had ten years ago; it's just since been ported over from an analog provider to magicJack.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
