• lnardozi 61862 (1/21/2013)


    I made an entire subsystem for EAV data, in order to hold configuration information to be used by the entire enterprise.

    There are stored procs, DTS packages, jobs, C# apps, web services and VB apps that access this data, which has been placed in lists. Those lists can be simple lists, lists that vary according to a second definable parameter, or tables that vary according to a definable parameter. Those 900 configuration tables can be abstracted down to 3 or 4 and the data normalized, since there are only keys, values (varchar(max)) and descriptions (also varchar(max)). We don't have a need to store binary information, and we use the TVFs to populate table variables that are then used in our queries (so casting is minimized). Even taking that hit in places where nested CASE statements would normally be used, it is actually faster than the nested CASE statements themselves, and much cleaner and more maintainable. My goal is to have all our IFs and CASEs eventually use the configuration data so the entire enterprise is table driven.

    I guess the thing I'm most proud of is the index on varchar(max). Heh. Compute the SHA, persist it, and it's close enough when you say hash = @hash AND value = @value. Make sure there are indexes on all the foreign keys and it's nothing but index seeks. It conforms to my 'all your shit in one sock' algorithm, which places the configuration data where you know it'll be.
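
    A minimal sketch of that hash-on-varchar(max) pattern, with made-up table and column names, assuming SQL Server 2012+ for SHA2_256 (before SQL Server 2016, HASHBYTES only accepts 8000 bytes of input, so longer values would need the hash computed application-side):

        CREATE TABLE dbo.ConfigList
        (
            ListId   int IDENTITY(1,1) NOT NULL PRIMARY KEY,
            ListName varchar(200) NOT NULL UNIQUE
        );

        CREATE TABLE dbo.ConfigEntry
        (
            EntryId     int IDENTITY(1,1) NOT NULL PRIMARY KEY,
            ListId      int NOT NULL REFERENCES dbo.ConfigList (ListId),
            EntryKey    varchar(200) NOT NULL,
            EntryValue  varchar(max) NOT NULL,
            Description varchar(max) NULL,
            -- HASHBYTES is deterministic, so the hash can be persisted and indexed
            -- even though the varchar(max) column itself cannot be an index key.
            ValueHash AS CAST(HASHBYTES('SHA2_256', EntryValue) AS binary(32)) PERSISTED
        );

        -- the index on the hash stands in for the impossible index on varchar(max);
        -- index the foreign key as well so lookups stay as index seeks
        CREATE INDEX IX_ConfigEntry_ValueHash ON dbo.ConfigEntry (ValueHash);
        CREATE INDEX IX_ConfigEntry_ListId    ON dbo.ConfigEntry (ListId);
        GO

        -- seek on the hash, then compare the full value to weed out collisions
        DECLARE @SearchValue varchar(max) = 'some configuration value';

        SELECT e.EntryId, e.ListId, e.EntryKey, e.EntryValue
        FROM dbo.ConfigEntry AS e
        WHERE e.ValueHash  = CAST(HASHBYTES('SHA2_256', @SearchValue) AS binary(32))
          AND e.EntryValue = @SearchValue;

    The final equality check on the value itself is what makes "close enough" safe; the hash only narrows the search.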

    Hashing can accelerate searching for equality/inequality, but after an equality match you still have to compare the base data, as collisions will occur, and more frequently so as the size of your tables increases! In the end it takes exactly one collision to render a system useless.

    I made an 'object model' in SQL Server in the past, and it had tricks that SQL Server could not do at the time (ON DELETE SET NULL, for example) and was superior in cascading-delete flexibility. But as the amount of data put into the system grew, problems arose, many problems. Some of them I could alleviate somewhat today with better code, but the core problems remain: inefficient IO, locking, and lots of trigger/procedural code to enforce things. I have to say I went to extremes and put everything into the model, the whole database, including all the things I knew at design time. This is certainly not something I will ever repeat!
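
    (For reference, that particular trick has been declarative since SQL Server 2005; a minimal sketch with made-up table names:)

        CREATE TABLE dbo.Parent
        (
            ParentId int NOT NULL PRIMARY KEY
        );

        CREATE TABLE dbo.Child
        (
            ChildId  int NOT NULL PRIMARY KEY,
            ParentId int NULL
                REFERENCES dbo.Parent (ParentId)
                ON DELETE SET NULL  -- deleting the parent keeps the child row, clears the link
        );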

    I am all for metamodels for handling design-time unknowns and/or for physically extending a fixed model with extra attributes when it is beneficial. Physically, simply because the data then lives where it would otherwise have been modeled anyway, and is open to native indexes and constraints. This is more efficient in every aspect concerned (speed/correctness/locking/code execution).
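
    As a rough sketch of what I mean by physically extending the model (the table and attribute are made up, and filtered indexes need SQL Server 2008+), the new attribute sits on the row itself and gets native typing, constraints and indexing for free:

        CREATE TABLE dbo.Product
        (
            ProductId int NOT NULL PRIMARY KEY,
            Name      varchar(200) NOT NULL
        );
        GO

        -- the extension is just another column: typed, constrained, indexable natively
        ALTER TABLE dbo.Product
            ADD WarrantyMonths int NULL
                CONSTRAINT CK_Product_WarrantyMonths CHECK (WarrantyMonths BETWEEN 0 AND 120);
        GO

        -- a filtered index keeps the cost low when only a fraction of rows carry the attribute
        CREATE INDEX IX_Product_WarrantyMonths
            ON dbo.Product (WarrantyMonths)
            WHERE WarrantyMonths IS NOT NULL;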

    There might be one case where I would consider EAV over a physical table, and that is very wide tables that exhibit access patterns over a fraction of the attributes at a time. But even then I would be inclined to structure it more, as it is very likely that some attributes are always accessed in groups, and thus a form of partitioning is likely more effective. In a sense EAV is a form of partitioning that is overdone in most instances!
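
    A rough sketch of that "structure it more" idea, with made-up tables: the attributes that are read together move into a 1:1 companion table (a vertical partition) instead of being exploded into EAV rows:

        -- hot columns that are read together stay in the base table
        CREATE TABLE dbo.Customer
        (
            CustomerId int NOT NULL PRIMARY KEY,
            Name       varchar(200) NOT NULL,
            Email      varchar(320) NOT NULL
        );

        -- a rarely-touched group of attributes becomes a 1:1 companion table
        CREATE TABLE dbo.CustomerMarketingProfile
        (
            CustomerId      int NOT NULL PRIMARY KEY
                REFERENCES dbo.Customer (CustomerId),
            NewsletterOptIn bit NOT NULL DEFAULT (0),
            ReferralSource  varchar(100) NULL,
            Notes           varchar(max) NULL
        );

    Queries that need only the marketing attributes join one narrow table, and queries that don't never touch it, which is the same benefit EAV promises without giving up types, constraints, and ordinary indexes.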