Which table design is better ?

Question

Post reply

Which table design is better ?

Johnson330

Old Hand

Points: 345

July 5, 2022 at 8:33 am

#4058978

Hi,

I have a scenario where a worker can do multiple training courses each year (max 4).

Is it better to create a table this way:

WorkerID |Year |Courses
Worker01 |2021 |CourseCode01
Worker01 |2022 |CourseCode01
Worker01 |2022 |CourseCode02
Worker01 |2022 |CourseCode03

Or this way:

WorkerID |Year|course1     |Course2        |Course3        |Course4
Worker01 |2021|CourseCode01|               |               |
Worker01 |2022|CourseCode01| CourseCode02  | CourseCode03  |

Thank you

This topic was modified 4 years ago by Johnson330.

Viewing 9 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

Jo Pattyn SSC-Dedicated Points: 32681 More actions · Answer 1

Depends on what you will do with it. Do you quickly need to answer what is the second course Worker01 took in Year Y?

Usually the first form WorkerId Year CourseCode as it is more flexible. If you need what is the second course worker x took in year Y, you'll need some counter-field

WorkId Year CourseCode Counter

DesNorton SSC-Insane Points: 24763 More actions · Answer 2

As Jo mentioned, it depends on your requirements

I would most likely use the following

tb_Worker

WorkerID

FName

LName

-- Other Worker Attributes

tb_Course

CourseID

CourseDescription

-- Other Course Attributes

tb_WorkerCourse

WorkerID

CourseID

DateCompleted

How to post data/code on a forum to get the best help.

Thom A SSC Guru Points: 99141 More actions · Answer 3

I'd personally recommend using the normalised approach, with 1 Course column. If you needed the data in the denormalised format (too), then you could use a VIEW with conditional aggregation to put the values into (up to) 4 columns.

Thom~

Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk

ScottPletcher SSC Guru Points: 101244 More actions · Answer 4

You should only ever use the first method, period, no matter what your current requirements are.

The second approach would be a maintenance and query nightmare.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Jonathan AC Roberts SSCoach Points: 18405 More actions · Answer 5

First option is the correct relational way to design it.

ScottPletcher SSC Guru Points: 101244 More actions · Answer 6

Consider, for example, with design #1, a unique constraint will insure that the same course is not entered twice for the same year.

But for design #2, you'd have to insure yourself that Course1, Course2, etc., don't duplicate any other CourseCode. Just try writing the CHECK constraint(s?) that would accurately enforce that basic restriction!

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Jeffrey Williams SSC Guru Points: 90347 More actions · Answer 7

ScottPletcher wrote:

Consider, for example, with design #1, a unique constraint will insure that the same course is not entered twice for the same year.
But for design #2, you'd have to insure yourself that Course1, Course2, etc., don't duplicate any other CourseCode. Just try writing the CHECK constraint(s?) that would accurately enforce that basic restriction!

And - as happens in almost every case - when you need to track 5 courses instead of 4 you have to add a new column and modify code everywhere to account for that new column. Or - when you now want to only track 2 but have 4 columns you now have the issue of which columns are actually used and the end users will use columns 3 and 4 instead of 1 and 2...because they can.

Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

― Charles R. Swindoll

How to post questions to get better answers faster
Managing Transaction Logs

pietlinden SSC Guru Points: 63686 More actions · Answer 8

(Notes from the field: I worked in a place where they didn't know how to normalize... As a result, I was there for 8 months instead of maybe a week. It was Access, but you would run into the same querying problem. About the only thing you can do after the fact is use CROSS APPLY to create a view that returns a normalized result and query that, but why even put yourself through that?)

Spend the time to normalize properly. Your future self will thank you, I promise.

Of course, if you want to learn by "escarmiento" (the hard way), that's good too. Make a dummy database and create both tables. Then write down a bunch of questions you want to answer from the data and try writing the SQL for each. Proof is in the pudding.

I was doing simple data summaries...

1. "what's the severest grade for each symptom for each enrolled patient?"

2. how many are there of each symptom/grade? (Count based on query 1)

If you create the "Reported Symptom" table like this:

CREATE TABLE PatientSymptom (PatientID INT NOT NULL, SymptomID INT NOT NULL, Grade TINYINT NOT NULL...);

and then have a table of Symptoms, like this:

CREATE TABLE SymptomDictionary (SymptomID INT NOT NULL PRIMARY KEY,

Category VARCHAR(255) NOT NULL,

Term VARCHAR(255) NOT NULL,

Grade TINYINT NOT NULL);

then this is a stupid simple query.

Join the two tables, group by PatientID, Category, Term... get MAX(Grade)

then count (wrap that query in another, and just do COUNT(*).

With effectively repeating fields, it gets ugly fast. Requires CROSS APPLY to stack the repeating groups... BLECH.

But since experience is a much better teacher than a lecture, try it yourself. (Seriously, it's absolutely the best way to learn. turn on the timer stuff, (I think it's SET TIME ON), and see how fast your queries are. But remember to clear the cache between tests. (Kevin Kline has some really funny stories about that!).

if you get stuck, post back. But take my word (or better, TRY IT!, learn from objective tests, not someone's opinion!)

Happy learning!

This reply was modified 4 years ago by pietlinden.