The Daily Lookup

  • Over at the Daily WTF, I saw the database diagram above and it caught my eye, as it probably caught yours.

    Apparently in this entry the guy looking over the database found that there was a lookup table for dates. And every table used a DateID to "look up" the date rather than storing the date. I've heard of over normalizing, but I think this takes the cake.

    I caught a link to this from a review of a SQL Book. It wasn't a great review and the comments on the review were what got me thinking. Why do so many bad designs seem to exist out there?

    I'm not formally trained. I didn't apprentice, though I bothered more than my share of experts with questions. I didn't get a degree in databases. I read a lot, asked people why they made decisions, and basically learned what worked well and what didn't on the job. I've built schemas that I later thought required too many joins and made decisions that required work to undo.

    But mostly those were inconvenient designs. Not aggregeous errors in judgement that most DBAs would question immediately.

    Most of you who have read this for awhile know I like to poke fun at developers who dabble in SQL. It's all in fun and I know that there are lots of developers out there who can write very good queries and would never design anything like this.

    But there are many more who can't. And that's part of the problem. Just as baseball has seen pitching get diluted through expansion, so have the quality developers been diluted by everyone who built a web page or Hello World seeking to make a good living in software.

    Hacking together code is fun. We all do it at one time or another. But bad SQL and schema will tend to outlast bad code any day. Do us all a favor and mentor someone. Be polite, courteous, and respectful, but help those inexperienced folks learn a thing or two about why we make our schema decisions.

    Don't give a man a query, teach a man how to query 🙂

    Steve Jones

  • Amen.

    I've done application development and I've done DB design / SQL development.  I've had projects where I've done it all and projects where I was responsible for exclusively one or the other.  There is a very definite difference and I'm seeing more and more shops recognizing this.

    Currently I'm consulting for a large company that used to give it's programmers free reign to develop SQL queries.  I'm here to clean up their years of bad/misinformed stored procedures and to establish SQL coding practices.  It is not pretty at all.

    Still and always learning...

  • It is an intriguing database design, but I wouldn't want to dismiss it out of hand as wrong without knowing a lot more about the application. I'm interested in the other columns in the table. The name of the data and the date description in particular.  What business process, one wonders, led to this sort of design? Even if there is some esoteric reason for applying unique date descriptions to reservations, I suspect that the table is misleadingly named.

    On the general point of programmers being given free rein to design databases and write SQL, I'd like to encourage the practice wherever possible as I derive a good living from being called in to put things right when the inevitable problems eventually surface. You can't assume that someone trained to think in procedural terms has the skills to operate in an environment which is intrinsically set-based and parallel. The skills don't necessarily transfer. Those who can write both procedural code and SQL generally are well aware of the great difference.

  • I think the comments on the WTF are missing the point mostly. There are very good reasons to look up dates in a table. I use it all the time in Datawarehouse design. MS SQL's datetime field leaves a lot to be desired.

    If you don't need the full date and time (mostly I'm building things concerned only with the date), then I convert the datetime to an an integer in my main fact table and look that up in a Date Dimension table. (actually, you have to convert it to decimal, and take the integer part using floor())

    The Date Dimension table has lots of fields prepopulated related to that the date, such as the financial year, weekday or weekend, bank holiday (public holiday in the US), Year and Month etc, so that I don't have to do intensive conversions on the datetime field for millions of records in my reports. This is fairly standard in a data warehousing environment where a) you have a LOT of records and b) it's pretty much a snapshot, i.e. not constantly changing.

    The mistake the builder of this design made was simply relying on a manual process to update the Dimension table. A good developer would have ensured that it was populated from the main fact table.

     

  • I'm tend to agree with Chris Smith.

    Such design bad for OLTP, but in most cases good for OLAP.

    Not for OLAP exactly, but for database containing fact tables for OLAP DB.

  • I may be missing something, but the argument this design could be about warehousing data strikes me as improbable. In a data warehouse, that table's rows would have to be static to avoid historically affecting the data. For example, the date (day-month) may not be a holiday now, but what if it becomes a holiday? I do see the argument, but my point is that rather than store a multi-dimensional date like that, wouldn't it be wiser to perform a logical process when the data was committed, and store the hints for the date in the actual data tables where that knowledge would be directly accessible to queries without a join? It would also mean that once committed, the tables would easier to view without a link.

    The design also fails for another basic reason, though this is just my humble opinion. While business logic may be stretched to explain the need for some tables, others that are referencing it certainly don't need to know holiday status, etc. At what stage in reality would it be relevant to know the account was activated on a holiday? Or cancelled on one? Excessive joins just to "normalize" are still excessive.

    My first thought on seeing this was that the design necessitated such hints for one particular field, and the designer allowed a vague understanding of normal rules to warp their use of the table.

    As for the datetime fields being overkill for some things...very true. It's why a character representation sometimes works just fine, I suppose.

    I would love to know what the database was supposed to do, and what it looks like in its entirety just for chuckles.

  • Going back to Steve's original point: That, generally speaking, many developers simply don't know how to design good databases; I present the following real-world example of bad design. Or, what I consider bad design. I'm sure many of you would tell me why this isn't.

    I like to call this: Taking the R out of RDBMS

    I was on a conference call for a new application my team is going to support. The designer/developer was talking and we were going over his documentation. While he was talking, I got an email from the project manager, a former SQL developer himself, who asked me to "look at the ERD".

    There were six tables in the application; it's a small app. There were no foreign key relationships even though all 6 of the tables were related. The developer said he would handle all the DI programatically.

    4 of the tables had a column with the exact same name but different data types. It was a char(4) in one, a char(5) in another, a char(6) in another, and a char(7) in the last. You are right to assume that the smaller ones are substrings of the longer ones. For example, the "master" value may be "A123B12". The first related sub-value would be "A123B1", the second is "A123B" and so on. Yet, the names of the fields didn't reflect that. I would expect columns of the same exact name in related tables to have a FK relationship.

    I sent back my questions to the PM.

    His response: "Good, I thought it was just me."

     

    -- J.T.

    "I may not always know what I'm talking about, and you may not either."

  • This is one of the comments I agreed with in the WTF thread... I design/support large Data Warehoues and follow Ralph Kimball's Dimensional Model. I use date tables and it's worked great for us. I'm not a regular at WTF, maybe I missed the point as to what they were poking fun at.

    QUOTE STARTS HERE

    This has been a great thread ... Let's try to classify the (non-sarcastic, non-joking) responses so far:

    * Database "newbie": "Date tables are stupid. Who would create a table full of dates for any purpose?"

    * Experienced database admin/developer: "Date tables can be very useful, but creating integer DateID's to represent all dates throughout your system is not a good design and the column names are either silly (DateDate) or vague (what does the 'Name' or 'Description' of a date represent?). Plus the fact that this date table is not widely known and the UI does not provide a way to easily update or view this table apparently."

    * Data Warehousing "expert": "Data warehouses use ID's to represent entities, so this looks good to me. Anyone who knows anything about databases knows that *everything* must be represented by an 'ID'."

    * WTF artist: "What they should have done instead is ...." (Suggestions that are convoluted and even worse .... triggers ???? user defined functions ????)

    * Ignorant Script Kiddie: "This bad design is the fault of Micro$oft / normalization / VB / Access / Republicans / Aliens / etc ....

    QUOTE ENDS HERE

  • The previous post is spot on .. this bad design is definitely the fault of the Republicans ...

  • I've actually implemented a structure similar to this, and it is surprisingly powerful for Warehouse querying.

    Imagine you get this query (like I did) "I need to know the average number of tickets we get on days that the company is on Holiday" (therefore Helpdesk has a skeleton staff, who is getting double time and comp days).

    If all your tickets just have a date field, you have to go through some very complicated logic to determine when the holidays were, or have some "Holiday" table storing the dates of every holiday, then you have to do an (expensive) join to that table (if it's even possible...if you've got a dateTIME stamp, then the ticket that came in at 11:30 won't line up with the date holding a 00:00 stamp by default.)  Your query will take a long time, and you won't be certain of the results.

    Then add in the complication of a multiple platform environment, with multiple different concepts of what a "Date" field really is...and it's chaos.

    Instead, you have a table that has information about any given day (what day of the week was it?, was it a holiday?  Was the office closed because of natural disaster? etc, etc, etc).  When building the "ticket" table, a simple lookup creates the key, and now a query is as simple as (pseudocode) 

    SELECT date, count(tickets) FROM TICKETTABLE INNER JOIN DATETABLE ON TICKETTABLE.DATEKEY = DATETABLE.DATEKEY WHERE ISHOLIDAY = 1

    SQL Server processes it quickly and effeciently, and spits your answer out instantly.  The analyst is impressed, you get a raise, and you win!

    (and yes, all of that happened to me).

    So no, this structure isn't all that dumb.  You just didn't spend enough time asking the question "What will they be asking this information to do for them?"  Normalization isn't a bad thing.

  • While I feel the ID-based joins are a little over-the-top, I have used date tables in two locations for years.  One, on the OLAP side, to reduce processing time in favour of quick retrieval of information such as day-of-week, month name, etc.  Two, in any system with a shifting fiscal week which must be linked to the calendar date at any time.

    My company uses a fiscal calendar that must contain an extra week every 5 years or so to be in sync with our corporate head office.  This means that the fiscal date entries cannot be simply "calculated" in a function on the fly.  Maintaining a date table in this case greatly increases speed and accuracy for us.

    Regards,
    Michael Lato

  • In reply to Grasshopper "I'm not a regular at WTF, maybe I missed the point as to what they were poking fun at." I am a regular and sometimes the WTFs are clearly immensely idiotic and the site does serve (in a roundabout way) to improve standards.

    The problem seems to me to be that there are several things wrong with the design, but a lot of people seem to have got hold of the wrong end of the stick. The lookup from DateID seems to have been the thing that many people where poking fun at as being 'stoopid' or taking relational data a bit too far, whereas clearly that's not NECESSARILY bad and for an OLAP dimension table can be very good - as long as it's handled correctly.

    The original developer didn't ensure that there would always be a corresponding value for his DateIDs - this is what caused the original error. Also his naming conventions are clearly pretty poor, but beyond that it's easy to criticise without knowing all the facts.

     

  • quote...I like to poke fun at developers who dabble in SQL...

    Hey, I resemble that remark

    A developer that had to write sql out of necessity and then having to design databases (without training, or a safty net  , I might add)

    Laugh, they think I'm a DBA

    Far away is close at hand in the images of elsewhere.
    Anon.

  • While I do understand and agree that of course an OLAP db will have a date table, we are missing the point here. This is clearly a transactional system and is clearly using the date table as if they were code values. It's a bad design. It's OK to poke a bit of fun at it.

    With the exception of the (pretty interesting) need for an adjustable fiscal week or something similar, who in their right mind would build a database with dates as a lookup value?

    This does run into a problem that I've seen a lot of. Developers who are otherwise incredibly intelligent and well informed can make horrible decisions when it comes to databases. I've had to deal with developer designed systems that had no FK constraints at all, UDF's that call UDF's that call UDF's (5 levels deep and everyone is scratching their heads wondering why the database is so slow, must be those dba's fault), the one true lookup table, multiple generic columns with data types determined by values set in other columns in the table, but that can all be edited independently with no constraints on the data in the database... insert your own horror story here. It's not that developers are stupid or inherently incapable of doing this type of work. The issue is, this type of work is a specialty which requires some time & experience to develop the requisite knowledge. That's not even counting the mistakes and poor choices that I get to toss into the mix.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • My humble opinion is that relational database design is a lot harder than people think. It is even harder than a lot of developers think. Perhaps this is due to the ease of use that Microsoft Access has provided over the years to desktop users. I think that the genuine ease with which one can create queries and reports in Access has misled many developers -- and even many end users -- into thinking that designing the databases is just as easy.

    But designing a database is not about using an easy GUI but about understanding the relationships between the various kinds of data used to accomplish a task. And even once someone understands that part of it, the design has to be properly expressed in the "language" of relational database design -- normal forms, etc. Moreover, there is also a lot of judgment required to correctly decide how much security and data validation to put at the database level versus the application level. Generally I bet most readers here would err on the side of the database level, but those are the kinds of decisions made incorrectly by people who think database design is just about linking boxes and crow's feet.

    I certainly have a way to go myself in these areas, but, to paraphrase the wisdom of old, I know what I don't know. I can't presume to comment on the ERD shown in the picture, but given that dates are not a typical candidate for lookup tables, it is worth a closer look at the design, and more importantly a look at the rationale for the design.

    Finally, I agree with the maxim "Don't give a man a query, teach a man how to query." But I also think that a query is only as good as the database design it is querying.

    -------------------
    A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
    Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html

Viewing 15 posts - 1 through 15 (of 35 total)

You must be logged in to reply to this topic. Login to reply