Which table design is better ?

  • Hi,

    I have a scenario where a worker can do multiple training courses each year (max 4).

    Is it better to create a table this way:

     

    WorkerID |Year |Courses
    Worker01 |2021 |CourseCode01
    Worker01 |2022 |CourseCode01
    Worker01 |2022 |CourseCode02
    Worker01 |2022 |CourseCode03

    Or this way:

     

    WorkerID |Year|course1     |Course2        |Course3        |Course4
    Worker01 |2021|CourseCode01| | |
    Worker01 |2022|CourseCode01| CourseCode02 | CourseCode03 |

    Thank you

    • This topic was modified 1 month, 1 week ago by  Johnson330.
  • Depends on what you will do with it. Do you quickly need to answer what is the second course Worker01 took in Year Y?

    Usually the first form WorkerId Year CourseCode as it is more flexible. If you need what is the second course worker x took in year Y, you'll need some counter-field

    WorkId Year CourseCode Counter

     

  • As Jo mentioned, it depends on your requirements

    I would most likely use the following

    tb_Worker

    WorkerID

    FName

    LName

    -- Other Worker Attributes

    tb_Course

    CourseID

    CourseDescription

    -- Other Course Attributes

    tb_WorkerCourse

    WorkerID

    CourseID

    DateCompleted

  • I'd personally recommend using the normalised approach, with 1 Course column. If you needed the data in the denormalised format (too), then you could use a VIEW with conditional aggregation to put the values into (up to) 4 columns.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • You should only ever use the first method, period, no matter what your current requirements are.

    The second approach would be a maintenance and query nightmare.

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

  • First option is the correct relational way to design it.

  • Consider, for example, with design #1, a unique constraint will insure that the same course is not entered twice for the same year.

    But for design #2, you'd have to insure yourself that Course1, Course2, etc., don't duplicate any other CourseCode.  Just try writing the CHECK constraint(s?) that would accurately enforce that basic restriction!

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

  • ScottPletcher wrote:

    Consider, for example, with design #1, a unique constraint will insure that the same course is not entered twice for the same year.

    But for design #2, you'd have to insure yourself that Course1, Course2, etc., don't duplicate any other CourseCode.  Just try writing the CHECK constraint(s?) that would accurately enforce that basic restriction!

    And - as happens in almost every case - when you need to track 5 courses instead of 4 you have to add a new column and modify code everywhere to account for that new column.  Or - when you now want to only track 2 but have 4 columns you now have the issue of which columns are actually used and the end users will use columns 3 and 4 instead of 1 and 2...because they can.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • (Notes from the field: I worked in a place where they didn't know how to normalize... As a result, I was there for 8 months instead of maybe a week. It was Access, but you would run into the same querying problem. About the only thing you can do after the fact is use CROSS APPLY to create a view that returns a normalized result and query that, but why even put yourself through that?)

    Spend the time to normalize properly. Your future self will thank you, I promise.

    Of course, if you want to learn by "escarmiento" (the hard way), that's good too. Make a dummy database and create both tables. Then write down a bunch of questions you want to answer from the data and try writing the SQL for each. Proof is in the pudding.

    I was doing simple data summaries...

    1. "what's the severest grade for each symptom for each enrolled patient?"

    2. how many are there of each symptom/grade? (Count based on query 1)

    If you create the "Reported Symptom" table like this:

    CREATE TABLE PatientSymptom (PatientID INT NOT NULL, SymptomID INT NOT NULL, Grade TINYINT NOT NULL...);

    and then have a table of Symptoms, like this:

    CREATE TABLE SymptomDictionary (SymptomID INT NOT NULL PRIMARY KEY,

    Category VARCHAR(255) NOT NULL,

    Term VARCHAR(255) NOT NULL,

    Grade TINYINT NOT NULL);

    then this is a stupid simple query.

    Join the two tables, group by PatientID, Category, Term... get MAX(Grade)

    then count (wrap that query in another, and just do COUNT(*).

    With effectively repeating fields, it gets ugly fast. Requires CROSS APPLY to stack the repeating groups... BLECH.

    But since experience is a much better teacher than a lecture, try it yourself.  (Seriously, it's absolutely the best way to learn. turn on the timer stuff, (I think it's SET TIME ON), and see how fast your queries are. But remember to clear the cache between tests. (Kevin Kline has some really funny stories about that!).

    if you get stuck, post back. But take my word (or better, TRY IT!, learn from objective tests, not someone's opinion!)

    Happy learning!

    • This reply was modified 1 month, 1 week ago by  pietlinden.

Viewing 9 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply