Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««123

Redesign help Expand / Collapse
Author
Message
Posted Tuesday, October 23, 2012 2:57 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 4:52 PM
Points: 7,079, Visits: 12,569
Steven Willis (10/23/2012)
opc.three (10/22/2012)
This is just a pared-down example of a method used by one of the largest open-source development platforms in existence.

And that platform would be...care to share?

DotNetNuke

DotNetNuke Users List

...and no, I don't work for DNN. But it is our primary development platform.

 

Thanks for sharing. That is not surprising. By nature DNN is built to handle structured, semi-structured and unstructured data so using an EAV may make sense. I would place that in the niche I mentioned. Most of what they store is likely unstructured, loosely typed metadata or document/site artifacts, not necessarily data domain elements. In my opinion when you have command over your data domain and need to store strongly typed data elements you should do everything in your power to avoid employing an EAV model (a non-relational concept) in a relational database. What is thought to be saved initially by implementing an EAV is paid for dearly when it comes time to work with that data in a tabular way, the definition of an anti-pattern. Just my two cents.

edit: spelling


__________________________________________________________________________________________________
There are no special teachers of virtue, because virtue is taught by the whole community. --Plato
Post #1376252
Posted Tuesday, October 23, 2012 3:01 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 4:52 PM
Points: 7,079, Visits: 12,569
Eric M Russell (10/23/2012)
Instead of separate denormalized columns, I'd rather contain user defined attributes in an XML column, preferably in a seperate extended type table. In the example below, each extended type (regular, contractor, temp, etc.) would have a seperate extype_id, and an employee could be of multiple types. Users can create their own types and dump their junk in it without cluttering up and fragmenting an otherwise normalized primary table. You can even create a view (or views) that join the two tables together, providing the users with a denormalized flat resultset that's easy for them to query and visualize like an Excel sheet.

create table employee
(
employee_id smallint not null primary key,
ssn char(12) not null,
hire_date datetime not null,
reportsto_id smallint null
);

create table employee_extype
(
primary key (employee_id, extype_id ),
employee_id smallint not null,
extype_id smallint not null,
exttype_xml XML null
);


I have not tried nor studied this technique in any great detail, but have heard from people I trust that the technique does not scale particularly well.


__________________________________________________________________________________________________
There are no special teachers of virtue, because virtue is taught by the whole community. --Plato
Post #1376255
Posted Tuesday, October 23, 2012 3:26 PM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Yesterday @ 3:53 PM
Points: 1,593, Visits: 4,576
opc.three (10/23/2012)
Eric M Russell (10/23/2012)
Instead of separate denormalized columns, I'd rather contain user defined attributes in an XML column, preferably in a seperate extended type table. In the example below, each extended type (regular, contractor, temp, etc.) would have a seperate extype_id, and an employee could be of multiple types. Users can create their own types and dump their junk in it without cluttering up and fragmenting an otherwise normalized primary table. You can even create a view (or views) that join the two tables together, providing the users with a denormalized flat resultset that's easy for them to query and visualize like an Excel sheet.

create table employee
(
employee_id smallint not null primary key,
ssn char(12) not null,
hire_date datetime not null,
reportsto_id smallint null
);

create table employee_extype
(
primary key (employee_id, extype_id ),
employee_id smallint not null,
extype_id smallint not null,
exttype_xml XML null
);


I have not tried nor studied this technique in any great detail, but have heard from people I trust that the technique does not scale particularly well.

Of course the ideal solution is to have all tables normalized, and I've never actually designed a table like this in production.

What's described above would be one form of a "property bag" technique. I have been tasked with building data extract and reporting from a large ISV database (PegaSystems) on Oracle that used this technique. From what I've been told, it's efficient from the application's perspective, because they can pull up a single patient's chart with only a few page reads. However, for aggregate reporting, the performance was bad, like full table scans. The developers actually had to replicate normalized versions of the tables just to facilitate the reporting queries.
In the later versions of both Oracle and SQL Server, it's possible to index XML columns, which would make it more scalable as far as querying, but I still prefer fully normalized table design. Fortunately I don't get many requests for tables that allow the users to create their own attributes or "fields" on the fly.
Post #1376264
Posted Tuesday, October 23, 2012 3:39 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 4:52 PM
Points: 7,079, Visits: 12,569
Eric M Russell (10/23/2012)
What's described above would be one form of a "property bag" technique. I have been tasked with building data extract and reporting from a large ISV database (PegaSystems) on Oracle that used this technique. From what I've been told, it's efficient from the application's perspective, because they can pull up a single patient's chart with only a few page reads. However, for aggregate reporting, the performance was bad, like full table scans. The developers actually had to replicate normalized versions of the tables just to facilitate the reporting queries.

Yep, totally agree. That's exactly where with EAVs and these type of property bag models fall down. They seem like a great idea at first because the developer making the choice is building an OLTP app..."it's so elegant, it can handle any type of property anyone could ever imagine, and it performs well, what could be better than this?" Until the poor soul that actually has to access more than one row at a time (reporting, batch processing, searching, etc.). I have been that poor soul, hence the soapbox On the other hand I have been paid well to undo messes like this

In the later versions of both Oracle and SQL Server, it's possible to index XML columns, which would make it more scalable as far as querying, but I still prefer fully normalized table design. Fortunately I don't get many requests for tables that allow the users to create their own attributes or "fields" on the fly.

I have not worked with an XML property bag implementation as described but have worked with XML indexes a bit and they have their own set of baggage to consider.


__________________________________________________________________________________________________
There are no special teachers of virtue, because virtue is taught by the whole community. --Plato
Post #1376270
Posted Tuesday, October 23, 2012 3:53 PM
SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Sunday, September 29, 2013 1:24 AM
Points: 429, Visits: 1,721
opc.three (10/23/2012)
Steven Willis (10/23/2012)
opc.three (10/22/2012)
This is just a pared-down example of a method used by one of the largest open-source development platforms in existence.

And that platform would be...care to share?

DotNetNuke

DotNetNuke Users List

...and no, I don't work for DNN. But it is our primary development platform.

 

Thanks for sharing. That is not surprising. By nature DNN is built to handle structured, semi-structured and unstructured data so using an EAV may make sense. I would place that in the niche I mentioned. Most of what they store is likely unstructured, loosely typed metadata or document/site artifacts, not necessarily data domain elements. In my opinion when you have command over your data domain and need to store strongly typed data elements you should do everything in your power to avoid employing an EAV model (a non-relational concept) in a relational database. What is thought to be saved initially by implementing an EAV is paid for dearly when it comes time to work with that data in a tabular way, the definition of an anti-pattern. Just my two cents.

edit: spelling

OH, YES! I strongly agree! That little bit of "flexibility" established in DNN 1.0 from the start has caused me many headaches all the way up the the current v6.xx. When doing custom development within the DNN framework I avoid the ProfilePropertyDefinition table if at all possible and just create my own custom table(s) for the particular requirement.

My example above actually shows one of the major problems with the approach whereas I often have to use a PIVOT operator to get the necessary results. But very often I get handed a pre-existing site in which the previous admin has used the native definition tables and it would be more work to change it than to just apply the procedures I've collected over the years like my pivot query above. A pivot seems simple once you see a good example, but TOO MANY TIMES TO COUNT I've seen something like this:

SELECT
*
FROM
(
SELECT up.Value
FROM User u
INNER JOIN UserProfile up ON u.UserID = up.UserID
INNER JOIN ProfilePropertyDefinition ppd ON up.PropertyID = ppd.PropertyID
WHERE u.UserID = 12345 AND ppd.PropertyName = 'FName'
) AS FName
,(
SELECT up.Value FROM User u INNER JOIN ..... etc
) AS LName
,(
SELECT up.Value FROM User u INNER JOIN ..... etc
) AS City
,(
SELECT up.Value FROM User u INNER JOIN ..... etc
) AS State
/*** repeated dozens of times for all the profile items...(!) ***/
) AS Result
etc


Then, to make things even worse code like this will sometimes be wrapped by a cursor looping through all of the UserIDs!

Ultimately, I don't think such a construction is necessarily always bad (an EAV that is, not the bad query above)--sometimes they can be very useful. But it's a different kind of data relationship that takes some thought to build efficient queries.

Another construct I'd put in this category is something like a hierarchical Product table with multiple parent-child levels. That has its pros and cons as well--but writing EFFICIENT queries to build proper hierarchies can be a real PIA.

 
Post #1376272
Posted Wednesday, October 24, 2012 11:10 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Thursday, July 10, 2014 7:06 AM
Points: 20, Visits: 107
opc.three (10/23/2012)
Eric M Russell (10/23/2012)
What's described above would be one form of a "property bag" technique. I have been tasked with building data extract and reporting from a large ISV database (PegaSystems) on Oracle that used this technique. From what I've been told, it's efficient from the application's perspective, because they can pull up a single patient's chart with only a few page reads. However, for aggregate reporting, the performance was bad, like full table scans. The developers actually had to replicate normalized versions of the tables just to facilitate the reporting queries.

Yep, totally agree. That's exactly where with EAVs and these type of property bag models fall down. They seem like a great idea at first because the developer making the choice is building an OLTP app..."it's so elegant, it can handle any type of property anyone could ever imagine, and it performs well, what could be better than this?" Until the poor soul that actually has to access more than one row at a time (reporting, batch processing, searching, etc.). I have been that poor soul, hence the soapbox On the other hand I have been paid well to undo messes like this

In the later versions of both Oracle and SQL Server, it's possible to index XML columns, which would make it more scalable as far as querying, but I still prefer fully normalized table design. Fortunately I don't get many requests for tables that allow the users to create their own attributes or "fields" on the fly.

I have not worked with an XML property bag implementation as described but have worked with XML indexes a bit and they have their own set of baggage to consider.


Exactly! I had to support a system like this for patient case records. It worked fantastic when a nurse only had to pull data for one patient. As soon as they wanted reports then you start seeing the issues. I always had to tell folks to try and keep the requested fields from those particular tables to a minimum due to the table scans. Unfortunately it wasn't our app so I couldn't make any changes to schema. Otherwise I could have allieviated some of the issues with some well placed indexes but it still wouldn't be ideal.
Post #1376601
« Prev Topic | Next Topic »

Add to briefcase «««123

Permissions Expand / Collapse