Ensuring Each Client has a Full Set of Key-Value Pairs

  • Comments posted to this topic are about the item Ensuring Each Client has a Full Set of Key-Value Pairs

  • Don't you "love" it when someone rates an article as a "2" (the rating I encountered when I first saw the posting) and doesn't take the time to explain why they rated the article that low?

    Haven't read the article yet but thanks for posting it.  EAVs are always a good subject to write about and discuss.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • :), opinions do vary.

    I really dislike EAVs, though they do have a very limited place at times. Perhaps someone feels differently and really likes them. Or they hate them and think no one can mention them.

  • I don't think you are the only one to dislike them, I do too. Detest would be a better description.

    It's like someone couldn't be bothered to learn normalisation and just "had a go".

  • It could be, but I also have used these in places where we weren't sure of the domain model for an entity. The business is a little flaky, and they want a couple random attributes that vary for customers. If it's low volume, like settings for a client, I've done this before. We've had a case where this was loaded once by each client in a session, and there were a few settings they wanted for some, but not for others. Since it was a rapidly changing set of requirements, making constant table alterations and having a wide table of settings didn't make sense. Simpler and easier to EAV these.

    Of course, we had to convince the developers to "reload" these settings in the app if the client changed things. For some reason they thought forcing a client to log out to get a new setting that changed was a good idea.

  • Not a fan of EAVs. Whenever the EAV topic comes up, I bring up non-relational/hybrid solutions. You lose most of the relational database engine goodness, such as type enforcement and proper indexing; gathering the proper sets of data takes query acrobatics; and business rules enforcement gets messy. The trade-offs are not worth it, imho. But, for a small, quick private/internal app it may work. It's all about the time/cost.

  • Jeff Moden - Tuesday, July 10, 2018 7:37 AM

    Don't you "love" it when someone rates an article as a "2" (the rating I encountered when I first saw the posting) and doesn't take the time to explain why they rated the article that low?

    Haven't read the article yet but thanks for posting it.  EAVs are always a good subject to write about and discuss.

    Obviously someone in the business of selling extremely expensive MDM software

  • Steve Jones - SSC Editor - Tuesday, July 10, 2018 7:58 AM

    If it's low volume..

    That's the key phrase. If it's low volume, then as long as it works, I wouldn't be bothered but if it's any kind of volume, poor design will only come back to bite you.

  • A well-written article about a knotty problem and a naughty design for an RDBMS. I am afraid the article's "let the DBA fix it" comments will earn this article a low rating - I will not contribute to that as a rating, but I will explain the low rating (from a DBA's viewpoint): 

    I have almost 1000 databases (one per customer, across multiple environments) using a similar "Item" and "ItemSetting" EAV design. As a twist (of the EAV knife), "Item" includes PrimaryID and SecondaryID relationships as self-references. Apart from one or two Item Settings that will be present per active system user, most are singular to absent (depending upon the features a customer wants to use). I typically see approximately 100 different query hashes used in approximately 1000 different execution plans, per database. That's about a million execution plans to ponder. Parameter sniffing issues are common, but the number of plan guides I can create is finite, and many plan guides become obsolete in the next release. Now that we have implemented "microservices", the issues are compounded ("Who's on First? What's on Second? IDunno's on Third!"), ASYNC_NETWORK_IO waits have spiked, and we have doubled the sizes of app and web servers (as a partial mitigation for that wait). All because some thought the database's %_Settings tables were "too many and too confusing". There was, and still is, no dictionary for the developers, and two different ORMs are in use.

    Meanwhile, a coworker DBA (or me or another, when he is out of office) spends half their day manually running scripts designed to "customize" ItemSettings. When ItemSettings are "missing", the DBAs say "that's an application problem". But what the DBAs say falls on the deaf ears of Agile development and a need to keep production up. We only have 100 customers in production, and our customer base has grown 10-fold in the past 3 years. We plan on hiring a junior DBA. Given our growth rate, that junior DBA will do nothing but "fix" ItemSettings, full time, within the next 2 years.

    This doesn't scale. There are rumblings about somewhat trashing the Item/ItemSettings EAV design. I and my DBA coworkers already know our skills are squandered on the trash heap of an EAV design.  I expect our next move will be towards a hierarchical DBMS (away from SQL Server). That for me is not a pleasant thought, with the saving grace being that our business may survive.
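The "missing ItemSettings" chore described in this post is the problem in the thread title, and it can in principle be handled with one set-based statement instead of manual scripts. The following is only a minimal sketch using Python's built-in sqlite3 module so that it runs anywhere; the `Item`, `ItemSetting`, and `SettingDefault` tables, their columns, and the sample data are all hypothetical and are not the poster's actual schema. The idea: cross join every item with every expected key, then keep the pairs that have no matching row.

```python
import sqlite3

# Hypothetical schema for illustration only (not the poster's real tables).
DEMO_SCHEMA = """
CREATE TABLE Item (item_id INTEGER PRIMARY KEY);
CREATE TABLE SettingDefault (
    setting_key   TEXT PRIMARY KEY,
    default_value TEXT NOT NULL
);
CREATE TABLE ItemSetting (
    item_id       INTEGER NOT NULL REFERENCES Item (item_id),
    setting_key   TEXT    NOT NULL REFERENCES SettingDefault (setting_key),
    setting_value TEXT    NOT NULL,
    PRIMARY KEY (item_id, setting_key)
);
INSERT INTO Item (item_id) VALUES (1), (2);
INSERT INTO SettingDefault VALUES ('page_size', '25'), ('theme', 'light');
INSERT INTO ItemSetting VALUES (1, 'theme', 'dark');
"""


def build_demo():
    """Create an in-memory database holding the hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DEMO_SCHEMA)
    return conn


def missing_settings(conn):
    """Return (item_id, setting_key) pairs that have no ItemSetting row."""
    return conn.execute(
        """
        SELECT i.item_id, d.setting_key
        FROM Item AS i
        CROSS JOIN SettingDefault AS d
        LEFT JOIN ItemSetting AS s
               ON s.item_id = i.item_id
              AND s.setting_key = d.setting_key
        WHERE s.item_id IS NULL
        ORDER BY i.item_id, d.setting_key
        """
    ).fetchall()


def backfill_defaults(conn):
    """Insert the default for every missing pair; return how many rows were added."""
    before = conn.total_changes
    with conn:  # commits on success, rolls back on error
        conn.execute(
            """
            INSERT INTO ItemSetting (item_id, setting_key, setting_value)
            SELECT i.item_id, d.setting_key, d.default_value
            FROM Item AS i
            CROSS JOIN SettingDefault AS d
            WHERE NOT EXISTS (
                SELECT 1
                FROM ItemSetting AS s
                WHERE s.item_id = i.item_id
                  AND s.setting_key = d.setting_key
            )
            """
        )
    return conn.total_changes - before
```

Existing values are left alone (item 1 keeps its 'dark' theme); only absent pairs receive the default. The same CROSS JOIN plus NOT EXISTS pattern should carry over to T-SQL, though that is an assumption and has not been run against the poster's environment.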

  • Dibs-129480 - Tuesday, July 10, 2018 8:01 AM

    Steve Jones - SSC Editor - Tuesday, July 10, 2018 7:58 AM

    If it's low volume..

    That's the key phrase. If it's low volume, then as long as it works, I wouldn't be bothered but if it's any kind of volume, poor design will only come back to bite you.

    That's true, but I found that every few months we need to re-evaluate this and think about moving EAV items to proper columns. Easy to have this get out of control.

  • alen teplitsky - Tuesday, July 10, 2018 8:00 AM

    Jeff Moden - Tuesday, July 10, 2018 7:37 AM

    Don't you "love" it when someone rates an article as a "2" (the rating I encountered when I first saw the posting) and doesn't take the time to explain why they rated the article that low?

    Haven't read the article yet but thanks for posting it.  EAVs are always a good subject to write about and discuss.

    Obviously someone in the business of selling extremely expensive MDM software

    Who?  Me in the MDM software business?  Not on your life! 😀  I just hate to see an article get a low rating without an explanation.  It's also a bit of a cowardly act and offers nothing to other folks or the author (everyone can learn, especially from the discussions that follow an article like this).

    And, just to be clear, I've not read the article yet.  It could be that I'll give it a low rating but, if I do, I'll explain why I did.

    --Jeff Moden



  • SoHelpMeCodd - Tuesday, July 10, 2018 8:16 AM

    I typically see approximately 100 different query hashes used in approximately 1000 different execution plans, per database. That's about a million execution plans to ponder. Parameter sniffing issues are common, but the number of plan guides I can create is finite, and many plan guides become obsolete in the next release.

    I'm curious, why so many plans? Is it because these settings are queried in real time for various functions?

    I get that devs think this is a DBA issue, but a load/reload of settings for the app and kept in a cache is something many devs often are happy to implement. Heck, this could easily be a client side cookie that gets loaded for a new session (or reloaded on demand). No reason to query this stuff in real time constantly if it's not changing.
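The load-once, reload-on-demand idea in this post might look roughly like the sketch below. It is an assumption-laden illustration, not anyone's actual implementation: the `ClientSetting` table, its columns, the `SettingsCache` class, and the sample values are all hypothetical, and sqlite3 stands in for whatever database the app really uses.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory database with a hypothetical ClientSetting table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE ClientSetting (
            client_id     INTEGER NOT NULL,
            setting_key   TEXT    NOT NULL,
            setting_value TEXT    NOT NULL,
            PRIMARY KEY (client_id, setting_key)
        );
        INSERT INTO ClientSetting VALUES (42, 'theme', 'dark');
        """
    )
    return conn


class SettingsCache:
    """Hold one client's settings in memory; hit the database only on (re)load."""

    def __init__(self, conn, client_id):
        self._conn = conn
        self._client_id = client_id
        self._settings = None  # nothing loaded yet
        self.db_reads = 0      # counts round trips, for demonstration only

    def reload(self):
        """Fetch every setting for the client in a single query."""
        rows = self._conn.execute(
            "SELECT setting_key, setting_value "
            "FROM ClientSetting WHERE client_id = ?",
            (self._client_id,),
        ).fetchall()
        self._settings = dict(rows)
        self.db_reads += 1

    def get(self, key, default=None):
        """Return a cached value, loading from the database on first use."""
        if self._settings is None:
            self.reload()
        return self._settings.get(key, default)
```

Repeated `get` calls cost one query in total, and a changed setting becomes visible as soon as something calls `reload()`, so the client does not have to log out to pick it up.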

  • EAV tables are a scourge upon the data landscape. We have several vendor applications which seem to be using EAV as an extreme form of obscuring what they deem to be valuable proprietary information about their product. I have joked with one of them, why did they create more than one table: surely, more effort could have been applied to get the whole database to run off a single table!

    In my opinion, the only argument for them is as an application-specific settings table. The settings should only be relevant to the maintenance of the application data, not to any specific user. For example, we have an in-house application that monitors a billing system. There is a nightly full load for complete reconciliation. There is also a near-real-time process that runs throughout the day. We use a flag to indicate in which mode the database is operating, as a signal to the SSIS packages to apply the dates so that they collect only deltas.

    The attribute values should be enforced as unique. There should be a couple of extra columns such as create date, modified date and a description. These are a courtesy to any future DBA.
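One way to read the advice above is a single settings table with a uniqueness constraint and a few audit columns. The sketch below takes "enforced as unique" to mean that each setting name may appear only once, which is an interpretation rather than something the post spells out. The `AppSetting` table, its columns, and the `LoadMode` flag are hypothetical names, and sqlite3 is used only so the example is runnable.

```python
import sqlite3

# Hypothetical table: one row per application-level setting.
SCHEMA = """
CREATE TABLE AppSetting (
    setting_name  TEXT NOT NULL,
    setting_value TEXT NOT NULL,
    description   TEXT NOT NULL,
    created_at    TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    modified_at   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    CONSTRAINT uq_AppSetting_name UNIQUE (setting_name)
);
"""


def add_setting(conn, name, value, description):
    """Insert a new setting; a duplicate name raises sqlite3.IntegrityError."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO AppSetting (setting_name, setting_value, description) "
            "VALUES (?, ?, ?)",
            (name, value, description),
        )


def change_setting(conn, name, value):
    """Update an existing setting and stamp modified_at; return rows changed."""
    with conn:
        cur = conn.execute(
            "UPDATE AppSetting "
            "SET setting_value = ?, modified_at = CURRENT_TIMESTAMP "
            "WHERE setting_name = ?",
            (value, name),
        )
    return cur.rowcount
```

With this in place, a second insert of `LoadMode` fails at the constraint instead of quietly creating a conflicting row, and the description column tells a future DBA what the flag is for.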

  • EAV design has a place.  It is a tool that can be used for good or evil.  Microsoft uses it in their compliance system for all the products they offer.  I interviewed for, and did not get, a position with this group several years ago because I wasn't familiar with the EAV model.

  • Sorry for the slow reply Steve.

    Both tables are JOINed to a variety of other tables (~500 tables in the db). Their plans usually come with a high compile cost, poor cardinality estimates, large resultsets, and hopefully are not executed frequently. The 2 tables' total size is up to about 500 MB (in pages) per db. Some of the legacy code makes ADO.Net calls inline, but the 2 ORMs handle most calls. Caching is done when execution counts are high. The cache is aged out after about two hours, and in more recent builds it can be busted by some of the microservices (less so by legacy code). At least one db currently has one of the tables' simpler plans (with 2 self-referencing joins and no other tables) being executed 100 times/second, which is anomalous by 2 or 3 orders of magnitude. I know the T-SQL is from legacy in-line code (not one of the ORMs) with a suspicious implicit conversion due to a data type mismatch, likely to be a regression, and is a clear candidate for caching. Another one recently caught was a parameter sniffing issue, due to a required but empty uniqueidentifier parameter being passed at compile time, once or twice a week. I (narrowly, in retrospect) requested that the passed uniqueidentifier be a valid, existing uniqueidentifier (next Sprint, assuming the team chooses it). While its query perf issue is not relevant to an EAV design, you reminded me that seeing its plan being reused raises the question of why the code wasn't harnessing the cache - thanks!
