Lookup Table Madness
Posted Sunday, July 23, 2006 10:17 PM


Old Hand

Group: General Forum Members
Last Login: Tuesday, February 11, 2014 7:59 AM
Points: 354, Visits: 67

You are absolutely correct: the Business Model rules, because it is the model that contains all of the business requirements.

The Data Model describes the data that underpins the Business Model, and it is from the Data Model that the database is derived.

The Application Model, however, describes neither the business nor the data, merely the business requirements as to how the data is processed.  But how do you model the process until you have all of the inputs: the business requirements and the data structures?

Too often, the Data Model (if there is one at all) is constructed or derived only after the application process has been built.  It is this that I contend is backwards.

As for your clinical trials, you have clearly provided insufficient information from which to derive any meaningful discussion.  However, as far as what you optimize: optimization comes last.  You model the business; you model the data for integrity, because without integrity the storage is meaningless, even if it is, I assume, optimal for speed.

The whole point of normalization is the reduction of data anomalies, which is how the data maintains its consistency.  Without that, the rest is fruitless.  A side effect of normalization is performance optimization.  This is often argued against because of the JOINs that must take place.  However, if you consider all of the performance issues, not just the JOINs but also the time necessary to maintain the integrity of the system, you will see that the normalized database is more efficient.
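
A minimal sketch of the anomaly argument, with invented table names: in the denormalized shape, a description edit must touch many rows; in the normalized shape, it changes one row and a foreign key does the enforcement.

    -- Denormalized: the status description is repeated on every order row, so a
    -- single rename means updating many rows and risks inconsistent copies.
    CREATE TABLE OrdersDenormalized (
        OrderID    INT PRIMARY KEY,
        StatusCode CHAR(3),
        StatusDesc VARCHAR(50)
    );

    -- Normalized: the description lives in exactly one row, and the foreign key
    -- keeps every order consistent automatically.
    CREATE TABLE OrderStatus (
        StatusCode CHAR(3) PRIMARY KEY,
        StatusDesc VARCHAR(50) NOT NULL
    );
    CREATE TABLE Orders (
        OrderID    INT PRIMARY KEY,
        StatusCode CHAR(3) NOT NULL REFERENCES OrderStatus (StatusCode)
    );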

I really do not believe either one of us will agree with the other on this point, as you have brought up your clinical trial data before.  However, you have yet to provide the structures of this data, either as you have designed it, or as we have suggested.

Sincerely,

Anthony Thomas

 




Post #296615
Posted Monday, July 24, 2006 6:47 AM


SSCoach

Group: General Forum Members
Last Login: Yesterday @ 11:43 AM
Points: 15,446, Visits: 27,837

May as well flog this horse skeleton into dust....

"Good" & "Correct" in terms of data integrity within a relational database system are absolutely not in the eyes of the beholder. It's a Yoda moment when you get down to data integrity, do or do not.

That's one of the main reasons that MUCK tables can, do, and will fail so miserably. While they're "flexible" in terms of development, they will render the data in the system completely meaningless in very short order. That's experience speaking, from two standpoints.

First, I built my own MUCK system about seven years ago. It was so cool when it went out the door. Then the maintenance headaches started. Second, I've inherited a MUCK system where I now work that is almost completely disabled. It's so "flexible" that it can't be touched for fear it will break the code. The data that was supposed to be infinitely flexible has become a maintenance nightmare. The problem is that when anyone can put anything they want into the system, in any way that they term "right," you can't maintain data integrity. So again, it's do or do not; there is no MUCK.
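
To make the integrity point concrete, here is a minimal sketch of the kind of MUCK table being described; the table and column names are invented for illustration.

    -- A hypothetical MUCK table: every code list crammed into one structure.
    CREATE TABLE LookupMUCK (
        ListName    VARCHAR(50),     -- which "virtual" code list the row belongs to
        Code        VARCHAR(20),
        Description VARCHAR(100),
        PRIMARY KEY (ListName, Code)
    );

    -- The integrity problem: a foreign key must reference the full key, so every
    -- referencing table would need a redundant, constant ListName column just to
    -- pin its rows to the right list. In practice the FK is usually omitted...
    CREATE TABLE OrderHeader (
        OrderID    INT PRIMARY KEY,
        StatusCode VARCHAR(20)       -- ...and nothing stops a 'Color' code landing here.
    );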



----------------------------------------------------
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood..." Theodore Roosevelt
The Scary DBA
Author of: SQL Server 2012 Query Performance Tuning
SQL Server 2008 Query Performance Tuning Distilled
and
SQL Server Execution Plans

Product Evangelist for Red Gate Software
Post #296670
Posted Monday, July 24, 2006 8:15 AM
SSC-Enthusiastic

Group: General Forum Members
Last Login: Tuesday, May 27, 2014 10:33 AM
Points: 132, Visits: 111

Oh, I absolutely agree: the data model comes first. It is the heart of any system. The "application model" sits on top of that. No arguments there.

The reason I couldn't show you the data models of the clinical trials is the sheer size of the metadata: 250 trials with, say, 35 distinct table structures each--roughly 8,750 structures in all. Do you want to go through that?

Within each individual trial, the design is generally fine. The trouble comes when you want to create reports and the like that cut across trial lines. Maybe I wasn't clear on that point. The maintenance nightmare, along with incredibly onerous and stupid validation requirements, really calls for those otherwise despised MUCK tables.

I bring up the clinical trial environment because it is an outlier. It's like asking which way a clock turns--it depends on your perspective!

Post #296733
Posted Monday, July 24, 2006 10:56 AM


Old Hand

Group: General Forum Members
Last Login: Tuesday, February 11, 2014 7:59 AM
Points: 354, Visits: 67

It sounds like you've got a pretty good handle on the operational components, but it's the drudgery of the reporting that you're finding the MUCK solution for.

I'm not sure how long you retain this information (I suspect for some time), but what you are describing certainly sounds like a warehouse scenario.

Do you do distributed queries against all of your operational databases, or do you go through some sort of ETL to a centralized, long-term repository?

Even in this environment, I have found that the RM (relational model) is the better methodology.  Part of the headache for a warehouse, however, is the "T" in ETL: trying to define the common data elements.

Also, you never report off of your warehouse; you create Data Marts to tailor the analysis.
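
A minimal sketch of that "T" step, with invented names (CodeMap, StagingLabResult, WarehouseLabResult): each source's local codes are translated to one conformed code list on the way into the repository.

    -- Hypothetical mapping table: one row per (source system, local code).
    CREATE TABLE CodeMap (
        SourceSystem  VARCHAR(30),
        LocalCode     VARCHAR(20),
        ConformedCode VARCHAR(20) NOT NULL,   -- the agreed common data element
        PRIMARY KEY (SourceSystem, LocalCode)
    );

    -- The "T" of ETL: translate each source's local codes during the load.
    INSERT INTO WarehouseLabResult (TrialID, SubjectID, TestCode, ResultValue)
    SELECT s.TrialID, s.SubjectID, m.ConformedCode, s.ResultValue
    FROM StagingLabResult AS s
    JOIN CodeMap AS m
      ON m.SourceSystem = s.TrialID
     AND m.LocalCode    = s.TestCode;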

Sincerely,

Anthony Thomas

 




Post #296826
Posted Monday, July 24, 2006 11:41 AM
SSC-Enthusiastic

Group: General Forum Members
Last Login: Tuesday, May 27, 2014 10:33 AM
Points: 132, Visits: 111

"I'm not sure how long you retain this information (I suspect for some time), but what you are describing certainly sounds like a warehouse scenario."

You need to retain clinical data for 20 years, by law. 

It was indeed for a clinical data warehouse--I believe the very first one ever attempted.

"Do you do distributed queries against all of your operational databases, or do you go through some sort of ETL to a centralized, long-term repository?"

My buddy and I wrote a dynamic ETL tool to move the data to a centralized repository. We read the metadata repository and a meta-metadata configuration table to generate gigantic SELECT statements, which we saved as Oracle views.

The SELECT statement: the ultimate ETL tool.
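
For flavor, a hypothetical miniature of that approach (all names invented): the metadata supplies one branch per trial, and the UNION ALL of the generated SELECTs is saved as an ordinary view.

    -- Metadata rows like these drive the generator...
    --   TrialID  SourceTable   SourceColumn  ConformedName
    --   T001     VITALS        SBP           SYSTOLIC_BP
    --   T002     VITAL_SIGNS   SYS_BP        SYSTOLIC_BP
    -- ...which emits and saves a plain view:
    CREATE VIEW ALL_SYSTOLIC_BP AS
    SELECT 'T001' AS TRIAL_ID, SUBJECT_ID, SBP    AS SYSTOLIC_BP FROM T001.VITALS
    UNION ALL
    SELECT 'T002' AS TRIAL_ID, SUBJECT_ID, SYS_BP AS SYSTOLIC_BP FROM T002.VITAL_SIGNS;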

 

Post #296844
Posted Monday, July 24, 2006 12:01 PM


Old Hand

Group: General Forum Members
Last Login: Tuesday, February 11, 2014 7:59 AM
Points: 354, Visits: 67

Well, that could certainly spawn a whole other argument: relational versus dimensional warehouse structures.

E-gad.

 

Anthony Thomas

 




Post #296849
Posted Monday, July 24, 2006 12:45 PM


Mr or Mrs. 500

Group: General Forum Members
Last Login: Tuesday, June 21, 2011 10:03 AM
Points: 577, Visits: 102
"That said, "good", "correct" is in the eye of the beholder. If you read back a ways in this thread, you'd read where yes, development time is much more important optimization than sheer speed in some application spaces. In the clinical trial world, if a routine takes 10 or 20 or 30 minutes, it usually doesn't matter a bit. However, if the developer wastes a day or two optimizing those 20 minutes away, that's not good."
 
E-gad is right.  You know, the scary thing is that we're talking about software that could conceivably end up KILLING PEOPLE, and it's more important that development happens quickly instead of accurately?  'Cause referential integrity is what we're really discussing, not performance.  No personal attacks meant here, but you're freakin' me out, man.
 
"OK, lemme ask you. There are 250 clinical trials, each heterogensously structured, each with about 30-40 different table structures. The analysis software is run maybe 3 or 4 times in production. The analysis can easily be run in the background as the user does other things."

Yeah, this really isn't enough information, but as far as I'm concerned a database can have thousands of tables or ten; either way, it's a pile of crap if referential integrity is not maintained.  If you do want to pursue this design with the community, maybe another thread is in order?
 
 
 
 


Signature is NULL
Post #296864
Posted Monday, July 24, 2006 12:59 PM


Old Hand

Group: General Forum Members
Last Login: Tuesday, February 11, 2014 7:59 AM
Points: 354, Visits: 67

Once, we had a round-table database design discussion with the senior IT leadership on this exact topic, data integrity and RI versus the development effort surrounding a MUCK design--yes, the "discussion" escalated that high.

In that meeting, the Enterprise Architect actually stated, "Speed of delivery is more important than accuracy."  In the same room, at the same table, as the Directors of Data Center Operations and Information Services.

At that point, the discussion ended--what was the point of continuing a "rational" discussion in the face of such absurd stupidity?

Later, he recanted, of course.  But this goes to show you just how far the "prevailing" wisdom travels.

Sincerely,

Anthony Thomas

 




Post #296871
Posted Monday, July 24, 2006 1:12 PM
SSC-Enthusiastic

Group: General Forum Members
Last Login: Tuesday, May 27, 2014 10:33 AM
Points: 132, Visits: 111

"E-gad is right.  You know, the scary thing is that we're talking about software that could concievably end up KILLING PEOPLE, and it's more important that development happens quickly instead of accurately?  'Cause referential integrity is what we're really discussing, not performance.  No personal attacks meant here, but you're freakin' me out, man."

Let me be clear about something. Within each clinical trial, referential integrity is extremely important. Most of the code written to support a clinical trial, apart from the statistical analysis, is devoted to data validation. A great deal of referential integrity is built into the databases as they are created, with many more batch validation applications on top of that. Lots and lots of resources were spent on referential integrity within a single trial. I'm sorry if I conveyed another impression.

The problem I was dealing with arose when the users wanted to run reports over several different trials. Every researcher wanted to think outside the box, and they all used different structures to hold exactly the same data: different table and column names, datatypes and sizes, different code lists, and so on. Bringing that all together to run cross-trial, let alone cross-compound, reports was a monumental task.
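
As a hypothetical miniature of that heterogeneity (schemas invented): the same severity fact stored two different ways has to be renamed, cast, and recoded before any cross-trial query makes sense.

    -- Hypothetical shapes of the same fact in two trials:
    --   Trial A: ADVERSE_EVENTS(SUBJ VARCHAR(10), SEVERITY CHAR(1))  -- 'M','S','L'
    --   Trial B: AE(SUBJECT_NO INT, AE_SEV SMALLINT)                 -- 1, 2, 3
    CREATE VIEW CROSS_TRIAL_AE AS
    SELECT 'A' AS TRIAL_ID,
           SUBJ AS SUBJECT_ID,
           CASE SEVERITY WHEN 'M' THEN 'MILD'
                         WHEN 'S' THEN 'SEVERE'
                         WHEN 'L' THEN 'LIFE_THREATENING' END AS SEVERITY
    FROM TRIAL_A.ADVERSE_EVENTS
    UNION ALL
    SELECT 'B',
           CAST(SUBJECT_NO AS VARCHAR(10)),
           CASE AE_SEV WHEN 1 THEN 'MILD'
                       WHEN 2 THEN 'SEVERE'
                       WHEN 3 THEN 'LIFE_THREATENING' END
    FROM TRIAL_B.AE;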

Post #296876
Posted Monday, July 24, 2006 1:47 PM


Mr or Mrs. 500

Group: General Forum Members
Last Login: Tuesday, June 21, 2011 10:03 AM
Points: 577, Visits: 102
"Within each clinical trial, referential integrity is extremely important."

OK, good; that wasn't clear from your previous posts. If you go back over the posts in this topic, you'll see the entire discussion was about referential integrity. The assumption was that you were continuing that discussion.

I guess that's incorrect, though; what you are talking about is data warehousing. It has its own set of challenges, which have nothing to do with referential integrity and really don't belong in this discussion about MUCK tables.

Standardizing dimensional data from multiple sources can produce structures SIMILAR to MUCK, but they are a whole 'nother ball game, related to OLAP instead of OLTP. That is, your OLAP schema may be completely different from your OLTP schema, because their functions are completely different.
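
A hypothetical illustration of that split (names invented): a star-schema dimension flattens several normalized OLTP lookup tables into one wide table, which can look MUCK-ish but is loaded only by controlled ETL and queried for aggregation rather than updated ad hoc.

    CREATE TABLE DimProduct (
        ProductKey   INT PRIMARY KEY,   -- surrogate key assigned during the load
        ProductCode  VARCHAR(20),
        ProductName  VARCHAR(100),
        CategoryName VARCHAR(50),       -- flattened from a Category lookup table
        SupplierName VARCHAR(100)       -- flattened from a Supplier lookup table
    );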

But you probably already know this...

cl


Signature is NULL
Post #296879