SQLServerCentral is supported by Red Gate Software Ltd.

Lookup Table Madness
Posted Monday, July 24, 2006 6:47 AM


SSCoach


May as well flog this horse skeleton into dust....

"Good" & "Correct" in terms of data integrity within a relational database system are absolutely not in the eyes of the beholder. It's a Yoda moment when you get down to data integrity, do or do not.

That's one of the main reasons that MUCK tables can, do, and will fail so miserably. While they're "flexible" in terms of development, they will make the data in the system completely meaningless in very short order. That's experience speaking, from two standpoints. First, I built my own MUCK system about seven years ago. It was so cool when it went out the door. Then the maintenance headaches started. Second, I've inherited a MUCK system where I now work that is almost completely disabled. It's so flexible that it can't be touched for fear it will break the code. The data that was supposed to be infinitely flexible has become a maintenance nightmare. The problem is, when anyone can put anything they want into the system in any way that they term "right", you can't maintain data integrity. So again, it's do or do not; there is no MUCK.
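A minimal sketch of the integrity problem, using SQLite with invented table names (the catch-all MUCK table here is called Code): with one table holding every code type, a foreign key can only say "some row in Code", so nothing stops a reference into the wrong code set; with a dedicated lookup table, the engine rejects the bad value outright.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs

# MUCK style: every code from every domain lives in one table.
con.execute("""CREATE TABLE Code (
    CodeId INTEGER PRIMARY KEY,
    CodeType TEXT,
    Description TEXT)""")
con.executemany("INSERT INTO Code VALUES (?, ?, ?)",
                [(1, "OrderStatus", "Shipped"), (2, "Color", "Red")])

# The FK can only say "some row in Code" -- nothing stops an order
# from pointing at a Color row instead of an OrderStatus row.
con.execute("""CREATE TABLE Orders (
    OrderId INTEGER PRIMARY KEY,
    StatusId INTEGER REFERENCES Code (CodeId))""")
con.execute("INSERT INTO Orders VALUES (1, 2)")  # accepted, but it's 'Red'

# A dedicated lookup table lets the engine enforce the right domain.
con.execute("""CREATE TABLE OrderStatus (
    StatusId INTEGER PRIMARY KEY,
    Description TEXT)""")
con.execute("INSERT INTO OrderStatus VALUES (1, 'Shipped')")
con.execute("""CREATE TABLE Orders2 (
    OrderId INTEGER PRIMARY KEY,
    StatusId INTEGER REFERENCES OrderStatus (StatusId))""")
try:
    con.execute("INSERT INTO Orders2 VALUES (1, 99)")  # no such status
    muck_free_rejected = False
except sqlite3.IntegrityError:
    muck_free_rejected = True  # the engine caught it declaratively
```

The same point holds in SQL Server; SQLite is used here only to keep the sketch self-contained.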



----------------------------------------------------
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood..." Theodore Roosevelt
The Scary DBA
Author of:
SQL Server Query Performance Tuning
and
SQL Server Execution Plans

Product Evangelist for Red Gate Software
Post #296670
Posted Monday, July 24, 2006 8:15 AM
SSC-Enthusiastic


Oh, I absolutely agree, the data model comes first. It is the heart of any system. The "application model" comes on top of that. No arguments there.

The reason I couldn't show you the data models of the clinical trials is the sheer size of the metadata. Roughly 250 trials times 35 distinct table structures each--do you want to go through all of that?

Within each individual trial, the design is generally fine. The trouble comes when you want to create reports, etc., that cut across trial lines. Maybe I wasn't clear on that point. The maintenance nightmare, along with incredibly onerous and stupid validation requirements, really calls for those otherwise despised MUCK tables.

I bring up the clinical trial environment because it is an outlier. It's like asking which way a clock turns--it depends on your perspective!
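A hedged sketch of the cross-trial reporting problem described above (SQLite; every trial, table, and column name here is invented): two trials hold the same measurement under different names, and a UNION ALL view maps both onto one common shape for reporting without touching either trial's design.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Two trials, same measurement, different structures (invented names).
con.execute("CREATE TABLE Trial01_Vitals (SubjId TEXT, SysBP INTEGER)")
con.execute("CREATE TABLE Trial02_BP (PatientCode TEXT, Systolic INTEGER)")
con.execute("INSERT INTO Trial01_Vitals VALUES ('A1', 120)")
con.execute("INSERT INTO Trial02_BP VALUES ('B7', 135)")

# A view imposes one common shape for cross-trial reporting.
con.execute("""
CREATE VIEW AllTrials_SystolicBP AS
SELECT '01' AS TrialId, SubjId AS SubjectId, SysBP AS SystolicBP
  FROM Trial01_Vitals
UNION ALL
SELECT '02', PatientCode, Systolic
  FROM Trial02_BP""")

rows = con.execute(
    "SELECT TrialId, SubjectId, SystolicBP FROM AllTrials_SystolicBP "
    "ORDER BY TrialId").fetchall()
```

With 250 trials the view gets enormous, which is exactly the maintenance burden the post is pointing at; the sketch only shows the shape of the alternative to a MUCK design.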

Post #296733
Posted Monday, July 24, 2006 10:56 AM


Old Hand


It sounds like you've got a pretty good handle on the operational components, but it's the drudgery of the reporting that you're finding the MUCK solution for.

I'm not sure how long you retain this information (I suspect for some time), but what you are describing certainly sounds like a warehouse scenario.

Do you do distributed queries against all of your operational databases, or do you go through some sort of ETL to a centralized, long-term repository?

Even in this environment, I have found that the relational model (RM) is a better methodology.  Part of the headache for a warehouse, however, is the "T" in ETL: trying to define the common data elements.

Also, you never report off of your warehouse; you create Data Marts to tailor the analysis.

Sincerely,

Anthony Thomas

 




Post #296826
Posted Monday, July 24, 2006 11:41 AM
SSC-Enthusiastic


"I'm not sure how long you retain this information (I suspect for some time), but what you are describing certainly sounds like a warehouse scenario."

You need to retain clinical data for 20 years, by law. 

It was indeed for a clinical data warehouse; I believe it was the very first one ever attempted.

"Do you do distributed queries against all of your operational databases, or do you go through some sort of ETL to a centralized, long-term repository?"

My buddy and I wrote a dynamic ETL tool to move the data to a centralized repository. We read the metadata repository and a metameta configuration table to create gigantic SELECT statements, which we saved as Oracle views.

The SELECT statement: the ultimate ETL tool.
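A toy version of that metadata-driven approach (SQLite rather than Oracle, and every name invented for the sketch): read a mapping table that says which physical columns feed each standard column, generate one big UNION ALL SELECT, and save it as a view.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Heterogeneous source tables (invented names and columns).
con.execute("CREATE TABLE Trial01_Labs (SubjId TEXT, Hgb REAL)")
con.execute("CREATE TABLE Trial02_Labs (Pat TEXT, Hemoglobin REAL)")
con.execute("INSERT INTO Trial01_Labs VALUES ('A1', 13.5)")
con.execute("INSERT INTO Trial02_Labs VALUES ('B7', 12.1)")

# Metadata: (source table, subject column, value column) per trial.
# In the real system this would live in the metadata repository.
column_map = [
    ("Trial01_Labs", "SubjId", "Hgb"),
    ("Trial02_Labs", "Pat", "Hemoglobin"),
]

# Generate the gigantic SELECT from the metadata...
selects = [
    f"SELECT '{t}' AS SourceTable, {s} AS SubjectId, {v} AS Hemoglobin"
    f" FROM {t}"
    for t, s, v in column_map
]
# ...and save it as a view: the SELECT statement as ETL tool.
con.execute("CREATE VIEW StdLabs AS " + " UNION ALL ".join(selects))

rows = con.execute(
    "SELECT SubjectId, Hemoglobin FROM StdLabs ORDER BY SubjectId"
).fetchall()
```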

 

Post #296844
Posted Monday, July 24, 2006 12:01 PM


Old Hand


Well, that could certainly spawn off into a whole other argument: using relational or dimensional warehouse structures.

E-gad.

 

Anthony Thomas

 




Post #296849
Posted Monday, July 24, 2006 12:45 PM


Mr or Mrs. 500

"That said, "good", "correct" is in the eye of the beholder. If you read back a ways in this thread, you'd read where yes, development time is much more important optimization than sheer speed in some application spaces. In the clinical trial world, if a routine takes 10 or 20 or 30 minutes, it usually doesn't matter a bit. However, if the developer wastes a day or two optimizing those 20 minutes away, that's not good."
 
E-gad is right.  You know, the scary thing is that we're talking about software that could conceivably end up KILLING PEOPLE, and it's more important that development happens quickly than accurately?  'Cause referential integrity is what we're really discussing, not performance.  No personal attacks meant here, but you're freakin' me out, man.
 
"OK, lemme ask you. There are 250 clinical trials, each heterogensously structured, each with about 30-40 different table structures. The analysis software is run maybe 3 or 4 times in production. The analysis can easily be run in the background as the user does other things."

Yeah, this really isn't enough information, but as far as I'm concerned a database can have thousands of tables or ten, but it's a pile of crap if referential integrity is not maintained.  If you do want to pursue this design with the community, maybe another thread is in order?
 
 
 
 


Signature is NULL
Post #296864
Posted Monday, July 24, 2006 12:59 PM


Old Hand


Once, we had a round-table database design discussion with the senior IT leadership on this exact topic, data integrity and RI versus the development effort surrounding a MUCK design—yes, the "discussion" escalated that high.

In that meeting, the Enterprise Architect actually stated, "Speed of delivery is more important than accuracy."  In the same room, at the same table as the Directors of Data Center Operations and Information Services.

At that point, the discussion ended—what was the point of continuing a "rational" discussion in the face of such absurd stupidity?

Later, he recanted, of course.  But, this goes to show you just how far the "prevailing" wisdom travels.

Sincerely,

Anthony Thomas

 




Post #296871
Posted Monday, July 24, 2006 1:12 PM
SSC-Enthusiastic


"E-gad is right.  You know, the scary thing is that we're talking about software that could concievably end up KILLING PEOPLE, and it's more important that development happens quickly instead of accurately?  'Cause referential integrity is what we're really discussing, not performance.  No personal attacks meant here, but you're freakin' me out, man."

Let me be clear about something. Within each clinical trial, referential integrity is extremely important. Most of the code written to support a clinical trial, apart from the statistical analysis, is devoted to data validation. Lots of referential integrity is built into the databases, and lots more batch validation applications sit on top of that. Lots and lots and lots of resources were spent on referential integrity within a single trial. I'm sorry if I conveyed another impression.

The problem I was dealing with was when the users wanted to run reports over several different trials. Every researcher wanted to think outside the box, and they all used different structures to hold the exact same data: different table and column names, datatypes, and sizes, different code lists, etc. Bringing all that together to run cross-trial, let alone cross-compound, reports was a monumental task.

Post #296876
Posted Monday, July 24, 2006 1:47 PM


Mr or Mrs. 500

"Within each clinical trial, referential integrity is extremely important."

OK, good; that wasn't clear from your previous posts. If you go back over the forum posts for this topic, you'll see the entire discussion was around referential integrity. The assumption was that you were continuing this discussion.

I guess that's incorrect, though; what you are talking about is data warehousing. It has its own set of challenges, which have nothing to do with referential integrity and really don't belong in this discussion about MUCK tables.

Standardizing dimensional data from multiple sources can produce structures SIMILAR to MUCK, but they are a whole 'nother ball game, related to OLAP instead of OLTP. I.e., your OLAP schema may be completely different from your OLTP schema, as their functions are completely different.
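The OLTP/OLAP split described above can be sketched roughly as follows (SQLite, all names invented): the OLTP side keeps a normalized lookup behind an enforced FK, while the OLAP side copies the looked-up value straight into a denormalized dimension row at load time, which can superficially resemble a MUCK-ish flattening but serves a different job.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")

# OLTP side: normalized lookup, FK-enforced at write time.
con.execute("""CREATE TABLE OrderStatus (
    StatusId INTEGER PRIMARY KEY, Name TEXT)""")
con.execute("""CREATE TABLE Orders (
    OrderId INTEGER PRIMARY KEY,
    StatusId INTEGER REFERENCES OrderStatus (StatusId))""")
con.execute("INSERT INTO OrderStatus VALUES (1, 'Shipped')")
con.execute("INSERT INTO Orders VALUES (100, 1)")

# OLAP side: the status name is denormalized into the dimension
# during the load, so analysis queries never join to the lookup.
con.execute("""CREATE TABLE DimOrder (
    OrderKey INTEGER PRIMARY KEY, StatusName TEXT)""")
con.execute("""INSERT INTO DimOrder
               SELECT o.OrderId, s.Name
                 FROM Orders o
                 JOIN OrderStatus s ON s.StatusId = o.StatusId""")

dim_row = con.execute(
    "SELECT OrderKey, StatusName FROM DimOrder").fetchone()
```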

But you probably already know this...

cl


Signature is NULL
Post #296879
Posted Monday, March 26, 2007 8:19 AM
SSC Rookie


I expect articles to be somewhat subjective and enjoyed the author's opinions. I prefer not to mix lookup codes strictly from the FK's perspective.

The author mentioned a dislike of the term "lookup" and other similar terms. I am curious what the author's opinion, and other readers' opinions, are of the technically correct term, as well as examples of good "lookup" table names.

For example, if I have a Cars table and the ColorCode column looks up to another table with the code and description, what would you name that table? I typically go with the much-disparaged table qualifier, e.g. ddColor, where dd means "DataDictionary", so that all my lookup tables are grouped together.
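The Cars/ColorCode example from the post, made concrete (SQLite; the name ddColor follows the poster's own "DataDictionary" prefix convention, and whether to prefix at all is exactly the question being asked):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")

# The lookup table, named with the poster's dd prefix convention.
con.execute("""CREATE TABLE ddColor (
    ColorCode TEXT PRIMARY KEY, Description TEXT)""")
con.execute("INSERT INTO ddColor VALUES ('RED', 'Red')")

# The referencing table: ColorCode looks up to ddColor.
con.execute("""CREATE TABLE Cars (
    CarId INTEGER PRIMARY KEY,
    ColorCode TEXT REFERENCES ddColor (ColorCode))""")
con.execute("INSERT INTO Cars VALUES (1, 'RED')")
try:
    con.execute("INSERT INTO Cars VALUES (2, 'MAUVE')")  # no such code
except sqlite3.IntegrityError:
    pass  # rejected: the code list is enforced by the FK
```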

I always experience inner torment when coming up with good table names and was just wondering what others do ...

Post #353850