Problems displaying this newsletter? View online.
SQL Server Central
Featured Contents
Question of the Day
Redgate SQL Change Automation
The Voice of the DBA
 

Relationally Divided over EAV

This editorial was originally published on Aug 26, 2015. It is being re-run as Steve is on holiday.

The EAV (Entity-Attribute-Value) data model gets bad press in the world of the relational database, and with some justification. I'd read a lot of the evidence against it and even published on Simple-talk a tale of EAV gone wrong that is not for the faint-hearted. I regarded EAV designs largely as an "anti-pattern" arising from misguided attempts to transpose the world of object-orientation and loosely-typed languages directly into the relational model.

If EAV design has a "avoid at all costs" reputation among many, the fact remains that sometimes you just can't know in advance all the required attributes, and in such cases there are few alternatives. Peter Larsson, in his EAV session at the recent SQL Rally event in Amsterdam, described one such case, a database application to return medical insurance documentation, where there was simply no way to predict, over time, exactly what sort of data may need to be stored.

When Peter arrived on the project, the database contained 20 million rows, and one of the most important document search algorithms took 1 minute to return its data. When the table reached 1 billion rows, which would happen quite quickly at current growth rates, they estimated, with a high degree of confidence, that the same document search would take 134 days.

Peter then explained – and proved – that after fixing some flaws in their database design, and in their inefficient search algorithm, the same document search query, on a billion rows, returned its data in milliseconds. I nearly fell off my chair. The remainder of the session was going to explain how he did it, and it's safe to say he had his audience's attention.

I wondered briefly if by "tweaking the database design" he really meant "replace it with a proper relational model", but no. He had applied some sensible normalization but the model was a hybrid, with an EAV table containing unique attribute-value pairs alongside the normalized tables. In the EAV table, each row comprises three columns that describe an entity, an attribute, or characteristic, of that entity, and a value for that attribute.

He explained that the key to efficient querying of such a model was a technique called relational division. Let's say you live in a dorm with a number of other students who all own random number of socks of various colors. Your socks are red, green and blue and you want to know which other students have a matching set of socks. In relational division, you query the table (the dorm) for each entity (student) and their attributes (socks) that match your values (colors). The dorm is the dividend and your socks are the divisor. If student A owns yellow, black and green socks, the quotient is zero (not fulfilled). However, if Student B has yellow, red, black, blue, purple and green socks then the quotient is 1, since you and student B own the same subset of red, green and blue socks.

With the right algorithm, Peter proved that relational division can be highly efficient. Coupled with a clever approach to indexing, to facilitate "ordered index scans", and sensible statistics management, he was able to achieve formidable results – at least, I think we can call 134 days to a few milliseconds, for a billion rows, formidable.

Clearly EAV models are "difficult". They can and frequently do cause bad performance and maintenance problems, as the database grows. However, I also appreciated the lesson in why you should rarely form closed opinions on 'good' or 'bad' practices in database and query design. Sometimes there is no alternative to an EAV design, and the techniques do exist to make them work in a relational world.

Cheers,

Tony.

Further reading: Check out Peter's The E, the A and the V PDF

Tony Davis

Join the debate, and respond to today's editorial on the forums

 
 Featured Contents

How Often Do You Give Your Server a Physical?

pamelamooney1966 from SQLServerCentral

The start of a new series that looks at examining your servers and keeping track of their health.

SQL Server Logon Trigger Examples

Additional Articles from MSSQLTips.com

In this tip we look at how to create SQL Server Logon Triggers and show different examples of how this can be used.

Making the switch to SQL Monitor

Additional Articles from Redgate

Managed IT Services provider Claranet turned to Redgate after having issues with their old monitoring solution and haven’t looked back since. With SQL Monitor named most popular third-party monitoring tool in this years State of SQL Server Monitoring report, see how it helps Claranet deliver more with less.

From the SQL Server Central Blogs - Running Containers In a Virtual Machine

Grant Fritchey from The Scary DBA

The more you work with containers, the more you just want to work with containers. However, there are still reasons to have a virtual machine for some types of...

From the SQL Server Central Blogs - David's book: Putting Your Head in the Clouds

GRE (Gethyn Ellis) from The SQL DBA in the UK

I have had a few people contact me over the last few weeks asking if David’s ebook “Putting Your Head in the Clouds” is still available. The short answer...

 

 Question of the Day

Today's question (by Steve Jones - SSC Editor):

 

Dropping the External Table

I have created this table in one of my Azure SQL Databases:
CREATE EXTERNAL TABLE MyTable
( SomeKey int
, SomeValue varchar(100)
)
WITH (DATA_SOURCE = OtherDB);
Now I want to remove it. How do I do this?

Think you know the answer? Click here, and find out if you are right.

 

 

 Yesterday's Question of the Day (by Steve Jones - SSC Editor)

Combining Lists

I have two lists in Python:

>>> a = ["SQL Prompt", "SQL Compare", "SQL Data Compare"]
>>> b = ["SQL Source Control", "SQL Change Automation", "DLM Dashboard"]

I now want to combine these into 1 list, under the variable a, which is a single dimension. Which of these will work?

Answer: a.extend(b)

Explanation: The extend() method will add two lists together in a single dimension. The append method will add the parameter to the end of the list, but this becomes a list of 4 elements, one of which is another list. Combine doesn't work. Ref: Data Structures - https://docs.python.org/2/tutorial/datastructures.html

Discuss this question and answer on the forums

 

 

 

Database Pros Who Need Your Help

Here's a few of the new posts today on the forums. To see more, visit the forums.


SQL Server 2017 - Administration
Can't add user to db_owner role - I have a simple SQL Server 2017 database in which I am unable to add one of the database users to the db_owner role.  I have confirmed the user is actually setup in the database.  When I try to add it to the db_owner role from my own machine, using SSMS or t-sql, I get […]
SQL Server 2017 - Development
In-Memory OLTP - N00b Question - I am new to the In-Memory OLTP world (yes, I am a little behind the times, mock me if you must :)) My question is the following: If I have a time tracking which is written most often but has occasional updates applied. Data is not deleted. Would these tables be good candidates for In-Memory […]
SQL Server 2016 - Administration
Getting counts before deleting data - Hi, I'm working on data archiving to Archive database from source database. I have to insert data from source database table to archive database and then delete from source. But I want to perform a check by getting no.of rows inserted  is equal to no.of rows that are selected to delete. Table1 is parent table […]
Access and rights - Vendor app is requesting for db_owner role to do some development, when I asked for the reason for db_owner then they requested the following access for development. Rights to execute following actions. Table creation, Stored procedure, trigger creation, Views creation and rights to drop, delete and truncate tables. Creating reference And may be some activities […]
How to identify if data is skewed in a table - Hi, I'm seeing key lookup which is taking 95% of time for a query. I have verified and made sure that there is no index fragmentation and stats are up to date. Still the query is taking almost 8 seconds to complete. I want to check if the tables are skewed with data ? How […]
Login error with AD authentication remotely with SSMS - Hello all, I am getting this error on one of our servers and only when logging in remotely with a Windows AD account . Local SQL auth works fine. All ports are open but something is blocking it somewhere on the server please see capture. "The target principal name is incorrect. Cannot generate SSPI." […]
misaligned log IOs - While restoring a log on a secondary replica to prepare a database for joining an availability group I suddenly received a flurry of 'There have been n misaligned log IOs which required falling back to synchronous IO' messages. In 20 years working with SQL Server this is a completely new message to me. A Google […]
Administration - SQL Server 2014
Migration Project - Hi This is the first time I've had to migrate a mission critical 30 vCPU SQL VM from 6 year old VMWare hosts ( Dell R920s I think ) to new Nutanix kit ( the Nutanix spec as yet unknown - but I will post back as I get it ). The SQL app is […]
SQL 2012 - General
Build Failed for SSIS Solution in VS2017 - We have our SSIS solution up and running in VS2012, and we want to migrate to VS2017. But when we open the solution and do a build on VS2017 (Version - 15.9.14), without changing the target server or any settings, the build failed with the below error message. Current version running and built successfully in […]
Performance Improvement Tips - Hi, When you are handed over an environment and have to do performance improvement. What would you do? I know you may be thinking first why do  you need or thinking about improvement. I mean if is just to check if we can make it little more faster. Couple of things I am doing CPU […]
SQL Server 2012 - T-SQL
Dual Duplicates redress. - I have created a stored procedure that pulls personal information from several tables. The output is formatted into XML. I need to manipulate two of the elements to be. I have written a sub-query to return the duplicates within two data fields.I read the base query into INSERT INTO #temp_dups SELECT [RowNumber], [Party], [EID], [BID], […]
Reporting Services
Help issue with iif and basic mathematical functions - Now this seems simple but its driving me up the wall for something that seems so simple. I am attempting to preform an mathematical action on using the value from a text box in a table within an iif statement, for example if the value in textbox1 is numeric then display the value of textbox1 […]
SSRS 2017 Fresh Install - First I know this is the wrong place but SSRS 2017 does not appear as yet, so........ Fresh install on SSRS, build 14.0.600.1274, all went well until trying to trying to connect and then get the error (from the logs\) "Could not find stored procedure 'GetAllFavoriteItems'" and beyond that stuck, any ideas?
Analysis Services
Model with unrelated tables - Today I came across a modeling that I did not understand. I opened a data warehouse project with the following tables: Dimensions: - Dim_Store - Dim_Date - Dim_Customer - Dim_Location Fact: - FactSales The fact  FactSales is relating to the above dimensions, so far ok. But when I open the project in SSAS there are four […]
Integration Services
VisualStudio/SSIS - Exporting from MySQL to MSSQL using ADO.Net connector -Fail - Hi, I'm in the process of creating a package that will export X data from a table in MySQL and import into an almost identical table in MSSQL. I have successfully configured the connection to MySQL using a '.Net providers\ MySQL Data Provider' and can connect to the DB fine. I have also successfully configured […]
 

 

RSS FeedTwitter

This email has been sent to {email}. To be removed from this list, please click here. If you have any problems leaving the list, please contact the webmaster@sqlservercentral.com. This newsletter was sent to you because you signed up at SQLServerCentral.com.
©2019 Redgate Software Ltd, Newnham House, Cambridge Business Park, Cambridge, CB4 0WZ, United Kingdom. All rights reserved.
webmaster@sqlservercentral.com

 

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -