November 17, 2011 at 12:33 pm
I understand normalization, but where I get into trouble is the logical breakdown of what an entity ends and another begins. Many are clear cut, but others I can just go based on feel. Then when someone tries to violate this and I call them on it, I can't vocalize and justify where that line is.
Example: A student entity
Student
Name
dob
age
ethnicity
income
address
SSN
Drivers License number
grade
Status
schoolyear
enrollmentdate
Teacher (in a single teacher school)
school
home language
primary language
special ed
pregnant
remedial
ever suspended
and the list can go on and on.
I can strongly justify any breakout of a 1:many relationship. People still argue that a course a student takes is an attribute of the student, but nobody will argue that it belongs in the student entity because a student takes multiple courses.
To me this is like saying that everything about a car is an attribute of the car, so the car table can be a flat 800 fields. I know it's not true, but what is the distinction where you draw the lines?
IMO, a student entity is basically the most immediate information about what a student is. Maybe just their name, SSN, birthdate. A lot more can be broken out into StudentDemographic, Studentgrade, StudentAddress, studentethnicity.
The issue that comes along with this is that this table with 90 fields on it, might only have 20 required upon enrollment and only 3 required to be entered into the system at all before enrollment (name, ssn).
I can argue the normalization of having all of the null fields, but the non dbas would rather have this than what they assume would be one entity broken out amongst 80 tables to accomidate the systems nullability rules.
It's not any one thing in specific, I'm just looking for pointers where I can just know and argue where these divisions lie.
November 17, 2011 at 1:38 pm
I pretty much break it down on what has a 1:1 relationship to the object, persists, and helps to identify the specific object.
So, with a student, I wouldn't include address, because that's subject to a time dimension with arbitrary changes, and can be subject to multiple values at a given point in time (mailing vs residence vs billing, etc.). Same for phone number. SSN and name and birthday are good, but in some databases, even name is subject to changes where you might need to track history. That's where "Account Number" or "Student ID" come in handy.
Honestly, one of the best editorials I've read on the subject is: http://www.simple-talk.com/opinion/opinion-pieces/the-joy-of-nad/
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
November 18, 2011 at 12:44 am
To me this is like saying that everything about a car is an attribute of the car, so the car table can be a flat 800 fields. I know it's not true, but what is the distinction where you draw the lines?
Few examples:
When you talk about seat / seat cover, its just seat's attributes not car's.
When you say Car has 4 wheels, it's car's attribute. Wheel manufactured by XYZ Company, its wheel’s attribute.
For More: Database Normalization
November 18, 2011 at 11:18 am
ShawnTherrien (11/17/2011)
IMO, a student entity is basically the most immediate information about what a student is. Maybe just their name, SSN, birthdate. A lot more can be broken out into StudentDemographic, Studentgrade, StudentAddress, student ethnicity.
Please avoid at all cost 1-to-1 relationships e.g. the ethnicity of a studend cannot change over time and in most systems you can have just one ethnicity per person like White, Latino, White/Latino, Asian, etc. etc.
I strongly suggest to build a consistent ER Model, them implement at least a 3NF one.
_____________________________________
Pablo (Paul) Berzukov
Author of Understanding Database Administration available at Amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.November 18, 2011 at 11:43 am
Dev (11/18/2011)
To me this is like saying that everything about a car is an attribute of the car, so the car table can be a flat 800 fields. I know it's not true, but what is the distinction where you draw the lines?
Few examples:
When you talk about seat / seat cover, its just seat's attributes not car's.
When you say Car has 4 wheels, it's car's attribute. Wheel manufactured by XYZ Company, its wheel’s attribute.
Thank you. This is about what I can argue for drawing the line.
But with this logic, a student has 100+ attributes and 99% of them are optional. That's in no way normalized.
Like take a course a student would take. Opening/closing, work, grade.
When yo the two tables. I know this is true, but when people argue the point I can't argue the fact that grade/credits/close date are an attribute of a course. There is no real object CoursesOpened/CoursesClosed, but if you look at it this way rather than just Courses, it's a breakdown that makes more sense to me and reporting and normalization... See, I know it, but I can't argue it.
I think part of it is when you get into talks of normalizing and a student having 100+ optional attributes, that could easily turn into a design with 100+ tables. I think some people start looking over normalization and freak out at the extreme end at just now complex the database could become... 100 tables vs 800 tables. Better performance and more consistent data, but a lot more complexity for someone to learn and use to get to everything.u open the course, you have a start date indicator. You don't have work, you don't have a grade, you don't have credits. Not normalized, so it should be broken out into StudentCourseOpened, StudentCourseClosed tables, because while closing date, credits and grade are attributes of a course, normalization would break this out into the less concrete object of CoursesClosed.
Are CoursesClosed Courses? Yes. Is a course the same as a closed course? No.
I think people are afraid of getting to far into normalization and abstraction of data... that student table with 100 optional null-able fields could easily become 80+ tables. A DBA might argue better overall performance and more consistent data... but someone else would look at that 100 table erd and see normalization as a 1000 table erd making it far more complex.
I think the balance is somewhere in the middle, but it's hard to make the arguments just on the logic of where/how to break out objects data structure.
A car is far easier to encapsulate because it's a concrete example that we know we can take the transmission out and see what belongs to just the transmission. We can see that the parts and abilities of it are a part of that transmission.
I'm not disagreeing with you. I use the car example. It's just hard to push that over to people and everything that you say is part of a human.
thinking things through...
Student is a person. A person is not a student. So Person would have all of the data that applies to the person itself, not to the student. The student is going to have information that applies to being a student.
Special education information is part of a student, but not all students are special education students...
Person (information that pertains to the person)
Student (pertains to students and not to people in general)
StudentSpecialEducation (pertains to special education students that isn't universal for all students)
To me, it's just another way of saying that Special Ed fields are nullable and shouldn't be just tacked onto the student table and saying that we can't represent a person who is not a student because a student requires information that people in general don't have.
November 18, 2011 at 11:57 am
PaulB-TheOneAndOnly (11/18/2011)
ShawnTherrien (11/17/2011)
IMO, a student entity is basically the most immediate information about what a student is. Maybe just their name, SSN, birthdate. A lot more can be broken out into StudentDemographic, Studentgrade, StudentAddress, student ethnicity.Please avoid at all cost 1-to-1 relationships e.g. the ethnicity of a studend cannot change over time and in most systems you can have just one ethnicity per person like White, Latino, White/Latino, Asian, etc. etc.
True, but many of these are not 1:1, they're 1:0/1. They're optional or not required at all steps of a students career.
And just ironically, I've worked over the last year to convert our system from a single ethnicity to multiple. States don't want to know your single eithnicity, but all of them. If you're half black, half chinese, they want both listed.
Ethnicity is also required at a certain step of enrollment, but not before. This means it can also not exist at some point, so if it's part of Student, it's got to be nullable. It's also required at a certain point. Since it can be nullable in the student object you would have to rely on the application to enforce that it's required at a certain point and just begs to end up with missing data in some students who should have it required.
PaulB-TheOneAndOnly (11/18/2011)I strongly suggest to build a consistent ER Model, them implement at least a 3NF one.
3NF... I'm trying to make the arguments to get there.
What would you define a consistent ER Model? If you came to my DB as a consultant, what you would look for to say it's consistent or not?
I can tell you that it's a chore to keep the data consistant in a 100+ field table 🙂
November 18, 2011 at 12:21 pm
ShawnTherrien (11/18/2011)
Ethnicity is also required at a certain step of enrollment, but not before. This means it can also not exist at some point, so if it's part of Student, it's got to be nullable. It's also required at a certain point. Since it can be nullable in the student object you would have to rely on the application to enforce that it's required at a certain point and just begs to end up with missing data in some students who should have it required.
When modeling data it doesn't matter "when" a particular attribute will be populated - it just matters to which *entiry* the attribute belongs.
In this particular case I agree the attribute will initially be set to Null OR to a value you decide like "Not Needed Yet". I also agree that the application has to share responsibility in enforcing the quality of the data.
ShawnTherrien (11/18/2011)
What would you define a consistent ER Model? If you came to my DB as a consultant, what you would look for to say it's consistent or not?I can tell you that it's a chore to keep the data consistant in a 100+ field table 🙂
A "consisten ER Model" comes way before any tables are defined. a "consistent ER Model" is a model that reflects reality in terms of how the *entities* interact with each other thru *relationships*. If the ER Model does not reflects how the particular system works in the real world then it is not a consistent model.
_____________________________________
Pablo (Paul) Berzukov
Author of Understanding Database Administration available at Amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.November 19, 2011 at 5:56 am
I'm not disagreeing with you. I use the car example. It's just hard to push that over to people and everything that you say is part of a human.
It's not matter of pushing it on someone. It should be something you should be convinced upon & you should have justifications for that. As long as your justifications are valid you are good to go.
All parts belong to human body but Mucus is Nose's property. (Sorry for dirty example but it’s human's imagination / visualisation capabilities that help him in remembering the facts for long time)
Based on what you said, all we will land up to a BIG FAT table. Even DBs are not required because Employee has Salaries and Employee works for Department(s) and assigned to multiple Projects. So place all this data in one table with NULLable columns. A BIG FAT Employee table.
I can help you in understanding / implementing DB Design in depth but it would be paid service. 😉
November 20, 2011 at 12:19 am
ShawnTherrien (11/18/2011)
Thank you. This is about what I can argue for drawing the line.But with this logic, a student has 100+ attributes and 99% of them are optional. That's in no way normalized.
Like take a course a student would take. Opening/closing, work, grade.
When yo the two tables. I know this is true, but when people argue the point I can't argue the fact that grade/credits/close date are an attribute of a course. There is no real object CoursesOpened/CoursesClosed, but if you look at it this way rather than just Courses, it's a breakdown that makes more sense to me and reporting and normalization... See, I know it, but I can't argue it.
You did a pretty good job of arguing it, so far. "Optional" columns are usually a pretty good indication that they need to be in their own table for proper normalization. But skipping that nuance, the courses a student takes aren't 1:1 or even 0:1... they're many:1 and that justifies putting them in a separate table.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 9 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply