Antipathy for Entity Attribute Value data models

  • Hashing can accelerate searching for equality/inequality, but after an equality match you still have to compare the base data, because collisions will occur - and more frequently as your tables grow. In the end it takes only one collision to render a system useless.

    True, but the hash can narrow the result set enormously before that final comparison takes place. I store my hashes as persisted computed columns and use a TVF that computes the hash being searched for only once. The advantage of this kind of system is that I don't need to know in advance the number or types of things I'll be configuring. The system is also self-documenting, since it requires a description for each key/value pair entered. There is a full-text index on the description field, the keys and the values, so the metadata can be searched by people who only see the leg of the elephant - they can find everything that affects their process, or that changes to their process might affect.
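    A minimal sketch of the persisted-hash approach described above (the table, column and function names are hypothetical, not the poster's actual schema): the hash index narrows the search, and a final predicate re-compares the base value to guard against collisions.

    ```sql
    CREATE TABLE dbo.Config
    (
        ConfigId    int IDENTITY PRIMARY KEY,
        AttrKey     nvarchar(200)  NOT NULL,
        AttrValue   nvarchar(4000) NOT NULL,
        Description nvarchar(1000) NOT NULL,  -- required; full-text indexed
        -- Persisted hash of the value; collisions are possible, so queries
        -- must still compare AttrValue after matching on the hash.
        ValueHash AS CAST(HASHBYTES('SHA2_256', AttrValue) AS binary(32)) PERSISTED
    );

    CREATE INDEX IX_Config_ValueHash ON dbo.Config (ValueHash);
    GO

    -- Inline TVF: the search hash is computed once, then re-checked.
    CREATE FUNCTION dbo.FindByValue (@Value nvarchar(4000))
    RETURNS TABLE
    AS RETURN
        SELECT c.ConfigId, c.AttrKey, c.AttrValue
        FROM   dbo.Config AS c
        WHERE  c.ValueHash = CAST(HASHBYTES('SHA2_256', @Value) AS binary(32))
        AND    c.AttrValue = @Value;  -- guard against hash collisions
    ```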

  • I use EAV design for surveys that we conduct. The number of responses is generally not expected to be high (we work off a defined list of contacts) and we prefer to keep it open ("early 2012" may be an acceptable date range), with analysts reviewing results for reports. I pivot the data for SSRS reports. I realise it would not be a good solution for larger amounts of data, but we're small and it works great for our purposes.
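    Pivoting EAV survey rows into one row per respondent for a report can be sketched like this (hypothetical table and question codes):

    ```sql
    -- One row per (respondent, question) in the EAV table;
    -- PIVOT rotates question codes into report columns.
    SELECT  RespondentId,
            [Q1] AS StartDate, [Q2] AS Satisfaction, [Q3] AS Comments
    FROM (
            SELECT RespondentId, QuestionCode, Answer
            FROM   dbo.SurveyResponse
         ) AS src
    PIVOT ( MAX(Answer) FOR QuestionCode IN ([Q1], [Q2], [Q3]) ) AS p;
    ```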

  • What percentage of a typical organisation's attributes are not known at initial design time? Typically less than 5% - in which case, why compromise a physical design for the spurious reason that requirements or business processes may change in future?

    The 'alternative' is to model all known attributes correctly in the first place using declarative referential integrity and constraints. Then accept the overhead of updating the data model if and when new attributes are needed. The impact of altering tables to accommodate new attributes is inconsequential compared to changing business logic in front end applications, operational reports, data extracts and analytical/BI applications. The object of the physical design exercise is not to make the life of data architects or DBA's easier, but to support the enterprise by:

    1. facilitating getting data in and out

    2. making the entity relationships as transparent as possible

    In some cases new untyped attributes do need to be frequently added or amended (e.g. product catalogues or survey questionnaires). Only in these cases should you implement a 'data driven' hybrid design which supports flexible key/value pairs. For these enterprise attributes a data governance programme should devolve ownership and maintenance of this metadata to the business.

  • long_shanks64 (1/23/2013)


    What percentage of a typical organisation's attributes are not known at initial design time? Typically less than 5% - in which case, why compromise a physical design for the spurious reason that requirements or business processes may change in future?

    I completely disagree with the "typically less than 5%" statement. If you are in a dynamic business environment, entity attributes are added frequently in response to new business requirements which cannot wait for IT to perform the necessary maintenance/testing/etc. associated with physical table changes.

  • BillyJack

    Q. Who implements the restrictions as to which business functions and 3rd parties have access and update permissions to this new attribute? And who decides how this new attribute will be initialised?

    Answer: A. IT. B. It doesn't matter. C. We don't have time.

    Q. Presumably this new attribute is of a certain data type, with constraints as to permissible values (possibly restricted to other existing entities and attributes). Who enforces this rule?

    Answer: A. IT. B. It doesn't matter. C. We don't have time.

    Q. Who makes changes to the data capture applications? Or to the ETL that extracts data to 3rd party systems? Or the existing operational reports? Or the existing analytical reports?

    Answer: A. IT (and their BI counterparts). B. It doesn't matter. C. We don't have time.

    Q. Who updates the enterprise data model and the users' metadata dictionary?

    Answer: A. IT (and their BI counterparts). B. It doesn't matter. C. We don't have time.

    Q. Who enforces data quality checks on the new attribute?

    Answer: A. IT. B. It doesn't matter. C. We don't have time.

    If you answered A to these questions - IT are responsible for implementing these rules and controls. So they need to accommodate the new requirement AS PART OF THE OVERALL SOLUTION in a modification to the existing data model.

    If you answered B or C to these questions - :w00t: please see me after class.

  • long_shanks64

    IT should not be the center of the universe. Most of your questions should be answered by "Not IT" with a proper EAV model in place. Access and editing can all be handled via the data used to define the attribute.

    And spare me the arrogance of your "see me after class" comment. The topic is EAV models, not how smart you think you are.

  • I'll have a crack at answering these, since I've designed an EAV system for storing configuration data using lists and lists of lists.

    Q. Who implements the restrictions as to which business functions and 3rd parties have access and update permissions to this new attribute? And who decides how this new attribute will be initialised?

    Answer: A. IT. B. It doesn't matter. C. We don't have time.

    The restrictions are stored in another list, of course - and the web front end for the configurator is the only process allowed to update any of the tables.

    Q. Presumably this new attribute is of a certain data type, with constraints as to permissible values (possibly restricted to other existing entities and attributes). Who enforces this rule?

    Answer: A. IT. B. It doesn't matter. C. We don't have time.

    The types are stored in (wait for it) another list - and the web front end enforces those types.

    Q. Who makes changes to the data capture applications? Or to the ETL that extracts data to 3rd party systems? Or the existing operational reports? Or the existing analytical reports?

    Answer: A. IT (and their BI counterparts). B. It doesn't matter. C. We don't have time.

    IT does - then they assign the attributes to the business entity responsible for that process. Since everything is table driven, changing processes and adding new business requirements is a breeze, because a general process can be tailored to each individual customer without recoding. God forbid I should ever have to open another stored procedure and change a list of valid clients or status codes or conditions.

    Q. Who updates the enterprise data model and the users' metadata dictionary?

    Answer: A. IT (and their BI counterparts). B. It doesn't matter. C. We don't have time.

    The web front end does - and the stakeholders can only change the attributes in their purview, but can see anything in the whole system that might affect their process. It enforces detailed comments, which have a full-text index the search page uses.

    Q. Who enforces data quality checks on the new attribute?

    Answer: A. IT. B. It doesn't matter. C. We don't have time.

    D. The MOST important person in the process - far more important than any programmer. The Subject Matter Expert assures data quality, because he has the power to validate (and change) the configuration data. Better him than me; I want to spend my time growing the business.

  • OK, to summarise all this.

    • Ultimately IT provides a service to the commercial interests of an organisation. Business requirements are king.
    • There ARE valid use cases for EAV models as was clearly stated as a non-exhaustive list in the article. I was aware of the medical use case but didn't mention it or MUMPS because Mr Google always brings back the medical analogy.
    • The article is aimed at EAV within RDBMS systems and NOT at EAV or its relations via NOSQL solutions
    • Anchor modelling may offer an interesting alternative approach
    • There are risks and issues in confusing logical and physical models
    • There has to be joined-up thinking within the organisation to ensure that all data concerns are addressed (flexibility vs discipline, agility vs sustainability, data quality, security, etc.)
    • Data concerns may be implemented by IT but responsibility for those concerns is a business wide issue
    • For multi-faceted queries a certain level of skill is required (yes, I do know how to query an EAV for multiple conditions).
    • There are scalability issues with EAV models, but not everyone is at the scale where this becomes apparent.
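    The multi-condition case mentioned above can be handled with relational division - a sketch against a hypothetical EntityAttribute table, assuming at most one row per (EntityId, AttrKey):

    ```sql
    -- Find entities satisfying ALL of the listed attribute conditions:
    -- OR together the individual predicates, then require that every
    -- condition matched by counting hits per entity.
    SELECT  ea.EntityId
    FROM    dbo.EntityAttribute AS ea
    WHERE ( ea.AttrKey = 'Colour' AND ea.AttrValue = 'Red' )
       OR ( ea.AttrKey = 'Size'   AND ea.AttrValue = 'XL'  )
    GROUP BY ea.EntityId
    HAVING COUNT(*) = 2;  -- number of conditions in the WHERE clause
    ```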
  • BillyJack (1/23/2013)


    long_shanks64 (1/23/2013)


    What percentage of a typical organisation's attributes are not known at initial design time? Typically less than 5% - in which case, why compromise a physical design for the spurious reason that requirements or business processes may change in future?

    I completely disagree with the "typically less than 5%" statement. If you are in a dynamic business environment, entity attributes are added frequently in response to new business requirements which cannot wait for IT to perform the necessary maintenance/testing/etc. associated with physical table changes.

    Heh... wait a minute now. Are you saying that you work in an environment where developers can make physical table changes directly in the production environment without testing/etc and without going through a DBA? Or is that what you're trying to avoid with the use of EAVs?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I wouldn't want to change the table structure every time a requirement changes.

    The DBA has better things to do, and many DBAs are not noted for their SQL programming skills.

    Above all, I want my design to be flexible and maintainable by someone other than me.

  • lnardozi 61862 (1/28/2013)


    I wouldn't want to change the table structure every time a requirement changes.

    The DBA has better things to do, and many DBAs are not noted for their SQL programming skills.

    Above all, I want my design to be flexible and maintainable by someone other than me.

    Now I'm really curious. What does changing a table structure have to do with SQL programming skills?

    --Jeff Moden



  • long_shanks64 (1/23/2013)


    BillyJack

    Q. Who implements the restrictions as to which business functions and 3rd parties have access and update permissions to this new attribute? And who decides how this new attribute will be initialised?

    Answer: A. IT. B. It doesn't matter. C. We don't have time.

    Q. Presumably this new attribute is of a certain data type, with constraints as to permissible values (possibly restricted to other existing entities and attributes). Who enforces this rule?

    Answer: A. IT. B. It doesn't matter. C. We don't have time.

    Q. Who makes changes to the data capture applications? Or to the ETL that extracts data to 3rd party systems? Or the existing operational reports? Or the existing analytical reports?

    Answer: A. IT (and their BI counterparts). B. It doesn't matter. C. We don't have time.

    Q. Who updates the enterprise data model and the users' metadata dictionary?

    Answer: A. IT (and their BI counterparts). B. It doesn't matter. C. We don't have time.

    Q. Who enforces data quality checks on the new attribute?

    Answer: A. IT. B. It doesn't matter. C. We don't have time.

    If you answered A to these questions - IT are responsible for implementing these rules and controls. So they need to accommodate the new requirement AS PART OF THE OVERALL SOLUTION in a modification to the existing data model.

    If you answered B or C to these questions - :w00t: please see me after class.

    Just a little chippy for someone who doesn't have the correct answer listed for any of the questions. 😉

    --Jeff Moden



  • Jeff Moden (1/28/2013)


    Heh... wait a minute now. Are you saying that you work in an environment where developers can make physical table changes directly in the production environment without testing/etc and without going through a DBA? Or is that what you're trying to avoid with the use of EAVs?

    The point I am trying to make is that with an EAV model, you don't need to do any table maintenance in order to add new attributes. They are simply new rows of data added to the EAV tables.
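    A minimal sketch of what "new attributes are just rows" looks like (illustrative names, not any particular poster's schema):

    ```sql
    -- The list of attributes is itself data, not schema.
    CREATE TABLE dbo.Attribute
    (
        AttributeId int IDENTITY PRIMARY KEY,
        AttrName    nvarchar(200) NOT NULL UNIQUE
    );

    CREATE TABLE dbo.EntityAttributeValue
    (
        EntityId    int NOT NULL,
        AttributeId int NOT NULL REFERENCES dbo.Attribute (AttributeId),
        AttrValue   nvarchar(4000) NULL,
        PRIMARY KEY (EntityId, AttributeId)
    );

    -- Adding a brand-new attribute is a plain INSERT, not an ALTER TABLE:
    INSERT dbo.Attribute (AttrName) VALUES (N'PreferredContactMethod');
    ```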

  • long_shanks64 (1/23/2013)


    The 'alternative' is to model all known attributes correctly in the first place using declarative referential integrity and constraints.

    Sometimes it's just not possible.

    Classic example is recording addresses.

    Here in NZ we frequently deal with suppliers/distributors that do not separate the Australian and NZ markets.

    Therefore, if I want to register on such a site, I need to enter my address into fields matching Australian address definitions.

    In Australia it's mandatory to fill in the "State" field, but in NZ we have neither states nor provinces, only the separation into North Island and South Island, which Australians do not understand. So I have to play a guessing game every time I fill in such a form: what should I put into the State field marked with a red star (mandatory) to match their expectations?

    In a proper design the field should be non-nullable with a limited set of allowed values for Australian addresses, and NULL for all NZ addresses.

    Obviously, you cannot implement this with a "normal" design, as the number of fields is dynamic and the constraints differ from case to case.

    You can choose, of course, to go with separate tables for each country, but then you need to run search requests against multiple tables.

    And when you pull an address for a customer you need to use either dynamic SQL or include two or more tables in your SQL requests.

    To me that looks much uglier than any "in code" check constraint implemented in the "save address" procedures.
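    For what it's worth, the two-market rule described above can be written declaratively when the set of countries is fixed (illustrative table and an assumed Australian state list); the objection still stands that the constraint must be altered for every new country added:

    ```sql
    CREATE TABLE dbo.Address
    (
        AddressId int IDENTITY PRIMARY KEY,
        Country   char(2)       NOT NULL,   -- 'AU' or 'NZ'
        City      nvarchar(100) NOT NULL,
        StateCode char(3)       NULL,
        -- State is mandatory and constrained for AU, forbidden for NZ.
        CONSTRAINT CK_Address_State CHECK
        (
            (Country = 'AU' AND StateCode IN ('NSW','VIC','QLD','WA','SA','TAS','ACT','NT'))
         OR (Country = 'NZ' AND StateCode IS NULL)
        )
    );
    ```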

    _____________
    Code for TallyGenerator

  • Now I'm really curious. What does changing a table structure have to do with SQL programming skills?

    --Jeff Moden

    I'm sure you've seen this hundreds of times before - you're one of the brighter lights on this board. You need some configuration data stored somewhere, so a table is created. Then another and another, and before you know it you've got a few hundred. Those few hundred tables each have their own queries and their own table design, and they end up getting used in joins - sometimes very badly. When the tables get created, do they have the correct indexes? What is involved enterprise-wide when it's time to change one of those tables - particularly one used in several hundred queries? Assuming your DBA has the time to look at all those queries (mine doesn't), what are the odds he'll hit every single one right on the head?

    My system is heavily optimized for retrieval and supports everything except binary - which means the TVFs can be used in stored procs, views, web services, WCF services, RESTful services, JSON services and SharePoint. Anything in our enterprise that might have had hard-coded information is instead table driven, with no duplication of data and a well-defined map that explains the dependencies enterprise-wide. Sniff if you want, but it's the best thing I've ever done in almost 30 years of programming.
