Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««12345»»»

Name value pair (EAV) model Expand / Collapse
Author
Message
Posted Tuesday, March 25, 2008 10:21 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Friday, April 18, 2014 2:06 PM
Points: 37, Visits: 176
Just like to thank everyone for an excellent discussion. Very informative.

Jeff Roughgarden, Ph.D., MCDBA, MCSD



Post #474216
Posted Tuesday, March 25, 2008 12:14 PM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Tuesday, October 21, 2014 12:04 PM
Points: 266, Visits: 2,601
Thanks for the article. I'm interested in understanding different design schemes. My question may have already been answered, but I'm not sure.

For the first time, I'm considering an EAV setup, and I would be very interested to hear opinions. My staff need to do a "risk assessment", but you can also think of it as a "survey" problem. I think there is a very good chance that the questions in the assessment/survey will vary over time as we refine our assessment process. Here's a subset of some of the questions in the risk assessment to give you a sense of what I am talking about:

> Risk of a fall
> Risk to client or caregivers from pets, livestock, etc.
> home has odors (gas, rotting food, etc.)
> client has skin condition
> client has access to adequate medical supplies
> client can recall recent events


The answer to "every" (as least this is what is claimed is true not only now, but in the future) question is "none, minor, moderate, Severe, or n/a". Also note that the questions/assessment categories are grouped together. For example, there are questions under the categories of: physical condition of the client, availability of shelter, hazards, sanitation of the home, etc.

Each answer has a point/percentage implication for the final "risk assessment" for that client. I'm not sure yet of the formulas, but I think the application may have to do things like: "for every question in this category of questions (say for all the "hazards" questions), give 5 for every "minor" answer, 10 for every "moderate" answer, 15 for every "sever" answer. Then add it all up.

Taking all that in, I was thinking that the following type of schema might make sense. But I'm nervous about it in light of this article. Note that this is bare-bones. I'm not trying to define the schema completely. I'm just exploring the idea of the basic set-up.

Table: QuestionList
qID - number for each question
qStartDt - when question becomes valid
qStopDt - when question is no longer valid
CategoryCd - link to another "lookup" table that lets us group questions according to a category
qDescription - the actual question (examples are given above)

Table: Assessments (the main table, with EAV)
ClientID - link to the client getting the assessment. I would expect a set of qID records for every client getting assessed.
qID - link to above table
AnswerCd - link to another "lookup" table that gives us the answers listed above (minor, moderate, etc). That lookup table would also include info for calculations.

This is a simplified version of what I would do, but the basic idea is there. It seems to me that this would be easy to query, do formulas, and not be such a bad performer. Also since the answers all follow the same format, I can do good data validation/restriction. But I do not have much experience with this type of design, especially with the amount of usage the design could potentially get.

Which brings us to this note. I'm not sure of the usage for this data. I would think that worse-case would put us at about 4 million records after 10 years. But that is only if the program ends up getting much wider usage than my little agency.

Thoughts?
Post #474304
Posted Tuesday, March 25, 2008 6:59 PM
SSC Journeyman

SSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC Journeyman

Group: General Forum Members
Last Login: Monday, July 14, 2014 10:33 PM
Points: 85, Visits: 260
Aren't you just normalizing the question type out of the client-question-response ?

consider finding a client that has


have no Risk of a fall
have low Risk to client or caregivers from pets, livestock, etc.
have high home odors (gas, rotting food, etc.)
have no skin condition
have low access to adequate medical supplies
has no recall recent of events


Would you look for clients by their responses across survey's ?
If they didn't answer that question should that mean that they are still eligable to be returned if your search was only looking for clients with no risk of fall ?

I would not be scared of this type of normalisation with client,question and client-question-response covered in appropriate indexes and only a couple of million responses.

If you were proposing to have client as an normalised type out of say person I would still probably not.

But if Question,Person and survey were normalized types out of some entity table, and person-has-a-survey, survey-has-a-question and Person-takes-a-survey-question where normalized types out of some entity-relationship table then I would start to squirm.

If you know that the client takes a survey not a supervisor or an operator and that the survey is the only thing that has questions that you need to search for clients by. Then denormalising into the database foreign key relationship the survey-question-response increases the scalability possibilities, specifically the case of clients that have no risk of fall type search. And has low, if not no risk to the flexibility of the application.

Anyone else ?
Post #474458
Posted Wednesday, March 26, 2008 2:12 PM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Tuesday, October 21, 2014 12:04 PM
Points: 266, Visits: 2,601
David: Thanks for your reply!

To answer one of your questions, "Would you look for clients by their responses across survey's?" - At the moment, I don't see that happening for particular questions, but you never know. At the moment, the main purpose of the survey would be to define a total risk level for the client. If the client's risk level is say "high", then the investigator/user would know to do certain actions to help the client. I also imagine the users wanting annual summary stats such as: How many clients last year had a risk level of low? med? high?

You do not have to explain, but I thought I would let you know that I didn't understand this part of your reply: "But if Question,Person and survey were normalized types out of some entity table, and person-has-a-survey, survey-has-a-question and Person-takes-a-survey-question where normalized types out of some entity-relationship table then I would start to squirm."

As for this part: "If you know that the client takes a survey not a supervisor or an operator...", I'm not sure I understood your full point, but I can comment on this one part. The clients themselves will never actually be filling out the "survey". Instead, the operator (our investigators/staff) will decide if a survey/assessment is required. If so, then a completely survey will likely be set up for that client, by which I mean a stored proc will likely be called to generate a record for every active question and attach to that client. The default answer for each question would be null or "not applicable", and the user would just change the answers for the questions that apply. For now, I think that set-up would allow me to create the most user-friendly interface.

Thanks again for your interest.
Post #475076
Posted Wednesday, March 26, 2008 2:47 PM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Tuesday, October 21, 2014 12:04 PM
Points: 266, Visits: 2,601
David: I've been thinking more about your reply. Particularly, I've been caught on your first sentence: "Aren't you just normalizing the question type out of the client-question-response?"

That's such a good point. So, is my proposal good normalization or EAV? At what point is something EAV vs good normalization? I understand that the answer relates to the content/subject of the data. But I think it is worth thinking about.
Post #475091
Posted Wednesday, March 26, 2008 7:21 PM
SSC Journeyman

SSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC Journeyman

Group: General Forum Members
Last Login: Monday, July 14, 2014 10:33 PM
Points: 85, Visits: 260
I would suggest that the EAV is where the types are taken out of the database entirely and only the application context would be able to decipher the significance of what entity type id x means.

Which is fine when the application doesn't want to 'view' the data sorted by something other than the order that IS in the data model.(eg clientid order). Getting the answers for a customer; doesn't need the question types in the schema and has the advantage that if the customer has another set of 'typed' data that needs to be stored that there is no schema level change. (or change to SQL to extract for this customer survey)

For example entity response type id 1-32 could be the original question responses that the application stored about the client. You could just as easily create response entity ids 33-66 that store a physicians responses to a different set of questions that had a contextually different meaning determined by the application (but about the same client). This would be to say that the database schema did not describe the applications data model or rather the bit of it where 1-32 means client responses and 33-66 means physicians and the application / stored procedure could then extract the two contextually different data sets internally knowing by id what the responses meant.

Great so we have just allowed for the application to have the flexibility to change its data model without any change to db schema. This is at the cost of needing to recreate the part of the data model that is not in the schema each time we present it logically to the user. For simple parts like a flat list there is little effort or even no schema to be reconstructed in producing a flat list for a specific range of a larger flat list. For more abstracted types the reconstruction becomes non trivial.

As you add more and more abstracted types (and layers of abstraction) you end up reducing the selectivity of the index (and increasing the number of seeks by the level of abstraction). For example the selectivity of the and client-question-response index by question id. At some number of layers of abstraction and amount of data being indexed the work reconstructing the actual types becomes measurable. I doubt a single abstracted type is going to do anything measurable.

An EAV becomes a target for a change to a indexed denormalized relational structure with schema defined types (tables) similar to your proposal, if the application ends up with time critical grouping demands on the E's for a large number of V's in a grouping order very different to the order indexed (or initial grouping). Ie the ability to view clients by their id is as important as clients by their response to question 1 a survey done in 2005.

Alternatively if you are happy that the relationships between the client type the question type and the survey type are unlikely to change then putting it into the db schema only adds the risk of needing to change the schema (as well as the application change which would be mandated anyway) if these relationships change in their actual as use.


Post #475173
Posted Thursday, March 27, 2008 9:27 AM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Tuesday, October 21, 2014 12:04 PM
Points: 266, Visits: 2,601
David: Thanks again for your reply. You did a good job of laying out a decision framework / set of criteria to use for making these kinds of decisions. Nice.
Post #475517
Posted Sunday, March 30, 2008 4:10 PM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Saturday, February 2, 2013 8:21 AM
Points: 283, Visits: 268
This is exactly what I was talking about in my previous post. You might be able to represent this type of Client-Question-Response using an EAV, but you'll find it's better to have a good relational model with "limited" EAV-ness. The only EAV portion I can see in your model would be the Client "Answers" or "Responses" table.

Basically there are two pieces:

1.) The storage model for the Surveys, Questions, and Answers:

Surveys (table)
Questions(table) - this table stores the Question, type of Question (multiple-choice, true / false, or written). When the question type = "multiple choice" then the Answers table will be used. You could also provide a "Score" column here that would represent the overall points associated with this question if it had more "weight" than others (in the situation where this is a survey type of "Test / Exam"??).
Answers (table) - this table stores the answers, their sequence (so they can be re-ordered) and any other info about them. This table is probably only used for multiple-choice questions. You could also provide a "Score" column in this table if you're trying to derive a total "risK assessment" type of survey.

2.) The storage model for the Surveys, Clients, and their Responses / Answers. In this side of the design, you are sort of using an EAV on the "ClientAnswers" because you're not sure that the "Value" column is storing an answer to a True/False or a Multiple-Choice question.

Surveys (table) - same as above
ClientSurveys (table) - stores which surveys the client has taken. Maybe a total score if you're not always summing the score from the ClientAnswers.
ClientAnswers (table) - stores the answer the client chose for the particular question in the survey. This is a child of ClientSurveys. In this table, the "Answer" (or "Value") column should probably be a Sql_Variant. This way you can either store the Value from one of the multiple-choice answers (an Int usually) or the "T" or "F" as a Bit data type.

Actually, I'd have two valid "Value" columns (Value Sql_Variant NULL, and ValueText (Text or VarChar(MAX) NULL). This way if one of the question is a "written answer" then the storage or the answer will be more efficient because only for a limited number of records will the Text or Varchar(MAX) column be used.

I hope I haven't confused you too much...



Post #476738
Posted Monday, March 31, 2008 9:12 AM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Tuesday, October 21, 2014 12:04 PM
Points: 266, Visits: 2,601
tymberwyld: Thank you for taking the time to reply. So much to think about....
Post #477051
Posted Wednesday, April 30, 2008 8:32 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Thursday, August 7, 2008 6:46 AM
Points: 1, Visits: 11
Interesting discussion. I certainly see where EAV design is effective for front-end systems which expect to receive large numbers of unknown values from multiple sources or when the query objectives are for display of a single response set (to display a risk score on the screen, for example). The time and resource savings for redesign of data models are evident. Great for "getting data in". However, what this practice really does is delay the integration decisions to a later step in the process.

We data warehouse developers struggle with name value pair design in our front end (source) systems. One of the primary purposes of a good data model is communication of the business relationships among the logical entities about which the business has an interest. The EAV model cannot be used as such a communication tool. On the data warehousing side and the downstream reporting side the intent is to make the data available to end report users and decision makers in an unambiguous easily queryable form that is tuned to return millions of rows in aggregated form. We are tasked with analyzing and dissecting the value pairs and creating a logical business model that is representative of the integrated content. Its really a case of pay now or pay later.

We took an approach in our datamart (star schema) where each fact record represented an assessment and had a FK to our question table for each question. This of course reduced the number of storage rows by raising the grain from a single response to a set of responses, but had the negative and limiting effect of creating multiple joins, which at 24 currently is slowing performance not to mention confusing our OLAP Reporting tool developers.
We will now create a view that performs the the joins for them, but I am still searching for an approach for the survey problem that combines reporting performance and business clarity with the lowest possible storage requirements. Look forward to T.Asfaw's second article and hope you will address the impact of EAV design on the downstream Data Warehousing and Reporting arenas.
Post #492983
« Prev Topic | Next Topic »

Add to briefcase «««12345»»»

Permissions Expand / Collapse