Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

my favorite normalization case study Expand / Collapse
Author
Message
Posted Friday, January 13, 2012 11:29 AM
SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Wednesday, January 16, 2013 4:23 PM
Points: 415, Visits: 2,333
the problem is an attendance tracker for an educational institution.

The foreign keys would have student number, course number, and meeting date. The data (attendance) could be as small as a bit.

Long story short, I used an array of characters and a nice scrolly matrix gadget with dates across the top and students down the side, the whole matrix being a course section, and I allowed a few extra attendance types. Maximum size of the group per student was greater than the max number of meetings I anticipated.

On the surface, it looks like repeating groups, but according to database theory as I stubbornly choose to interpret it (http://www.dbdebunk.com/page/page/622318.htm "...Maximizing performance is a physical optimization issue..."), as long as the one table with repeating groups was presented to the querying user as two or more joined tables, I'm all set, as the repeating group is simply a physical storage issue which is outside the scope of the topic of normalization.

Its worth noting that since there was no sql server involved, I was essentially the storage engineer, and for reporting, I fed a stream of attendance rows which were normalized with the foreign keys to our reporting system.

Whadya think? Anybody else ever see some funny normalization cases?
Post #1235860
Posted Friday, January 13, 2012 11:31 AM


SSCoach

SSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoach

Group: General Forum Members
Last Login: Today @ 10:04 AM
Points: 15,442, Visits: 9,590
I'm more used to "funny" denormalization cases. Busy fixing a bunch of performance problems caused by an EAV structure right now.

In your case, it sounds like your bitmapping the data instead of storing it "relationally". In that case, normalization doesn't even apply, so don't worry about it.


- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #1235861
Posted Saturday, January 14, 2012 4:42 AM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Today @ 3:34 PM
Points: 8,295, Visits: 8,748
I don't see this as repeating groups in the first place; "repeating groups" is a topic that's much misunderstood, with some people classifying far more things as repeating groups than others. When Ted Codd (in his paper "A Relational Model of Data for Large Shared Data Banks" published in CACM vol 13 no 6, June 1970) defined what is now called 1st normal form a repeating group was an attribute in a non-simple domain, that is in a domain where an individual attribute actually represented multiple values in underlying domains. A string of distinct attributes like "attendance at 1st meeting", "attendance at second meeting" and so on in a single row was not a repeating group, and would not be expected to be normalised out. I know it is currently regarded by a lot of people as heresy to rely on Codd's definition of first normal form, but I find it quite satisfactory since I can't think of any update anomalies, insertion amomalies, or deletion anomalies that arise from using that definition rather than one that would forbid me to store someone's home landline number and his mobile number in the same row because "two telephone numbers is a repeating group". There is of course the restriction that the number and domains of attributes in each row of a relation is defined by the relation, so is the same for every row in the relation, and that often makes it necessary to split things out to avoid simulating a variable number of columns by use of nulls, but I don't see how that applies here; and as a practical matter it doesn't make much sense to have a relation with so many attributes that the sheer number of them adds to the programming difficulty, but I don't see that as applying here either.

edit: incidentally, "repeating group" is pre-relational terminology, it's a pity that term isn't shunned and replaced for relational theory by Codd's "non-simple domain".


Tom
Post #1236121
Posted Saturday, January 14, 2012 12:46 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Tuesday, January 28, 2014 8:15 AM
Points: 3,068, Visits: 4,639
patrickmcginnis59 (1/13/2012)
Long story short, I used an array of characters and a nice scrolly matrix gadget with dates across the top and students down the side, the whole matrix being a course section, and I allowed a few extra attendance types.


If the above is the description of your "course" table I can tell this is not normalized at all.

It would be easier for us to understand your scenario if a list of involved tables is provided including a summary of the data each one contains alongside PKs and FK relationships.


_____________________________________
Pablo (Paul) Berzukov

Author of Understanding Database Administration available at Amazon and other bookstores.

Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
Post #1236197
Posted Monday, January 16, 2012 11:52 AM
SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Wednesday, January 16, 2013 4:23 PM
Points: 415, Visits: 2,333
PaulB-TheOneAndOnly (1/14/2012)
patrickmcginnis59 (1/13/2012)
Long story short, I used an array of characters and a nice scrolly matrix gadget with dates across the top and students down the side, the whole matrix being a course section, and I allowed a few extra attendance types.


If the above is the description of your "course" table I can tell this is not normalized at all.

It would be easier for us to understand your scenario if a list of involved tables is provided including a summary of the data each one contains alongside PKs and FK relationships.


Well if it had been normalized, and I wanted to track each attendence day, I would have

COURSE_ID VARCHAR(20),
STUDENT_ID VARCHAR(10),
MEETING_ID DATETIME,
ATTENDANCE VARCHAR(1)

With everything but ATTENDANCE being a foreign key. As I actually did it, I had:

COURSE_ID VARCHAR(20),
STUDENT_ID VARCHAR(20),
ATTENDANCE VARCHAR(1) OCCURS MANY TIMES

and

COURSE_ID VARCHAR(20),
MEETING_ID, DATETIME OCCURS MANY TIMES.

and more importantly, the attendance and meeting_id occupy corresponding positions in their respective rows. I did it that way because everything worked nicer when it was "prejoined" so to speak.

Now this is according to my rather informal understanding of normalization. I could elaborate further if I took the time and read up, and of course I'm the new RDBMS person here, feel free to poke holes in anything I post. Its also worth noting that a valid optimization is to only store the absences or other abnormal stuff we want track and otherwise assume the student was present for that meeting.
Post #1236777
Posted Monday, January 16, 2012 1:09 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Tuesday, January 28, 2014 8:15 AM
Points: 3,068, Visits: 4,639
Would you mind in writting a query to answer the question... "Did Peter Bishop attend Particle Physics 501 on January 9th, 2012?"

Now... write a query to answer the same question on the 3NF structure.


_____________________________________
Pablo (Paul) Berzukov

Author of Understanding Database Administration available at Amazon and other bookstores.

Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
Post #1236817
Posted Monday, January 16, 2012 1:31 PM
SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Wednesday, January 16, 2013 4:23 PM
Points: 415, Visits: 2,333
PaulB-TheOneAndOnly (1/16/2012)
Would you mind in writting a query to answer the question... "Did Peter Bishop attend Particle Physics 501 on January 9th, 2012?"

Now... write a query to answer the same question on the 3NF structure.



Its worth noting that since there was no sql server involved, I was essentially the storage engineer, and for reporting, I fed a stream of attendance rows which were normalized with the foreign keys to our reporting system.


The obvious question is then, why did I not do all of it as an RDBMS? Because I preferred that data entry be easy, and I always knew that reporting would be possible, if perhaps requiring more effort.

I don't feel too bad about the design, indeed theres even some parallels to todays trends, that collection of data is often distanced a bit from reporting of that same data.
Post #1236832
Posted Monday, January 16, 2012 2:26 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Tuesday, January 28, 2014 8:15 AM
Points: 3,068, Visits: 4,639
Please do not take it the wrong way but, out of curiosity, why is the subject of this thread "my favorite normalization case study" if you never intended to normalize the data and you don't even care about the basics of data modeling?

I'm not judging, you can do it anyway you want


_____________________________________
Pablo (Paul) Berzukov

Author of Understanding Database Administration available at Amazon and other bookstores.

Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
Post #1236859
Posted Monday, January 16, 2012 2:53 PM
SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Wednesday, January 16, 2013 4:23 PM
Points: 415, Visits: 2,333
PaulB-TheOneAndOnly (1/16/2012)
Please do not take it the wrong way but, out of curiosity, why is the subject of this thread "my favorite normalization case study" if you never intended to normalize the data and you don't even care about the basics of data modeling?

I'm not judging, you can do it anyway you want


It wasn't actually meant to be all that detailed a thread, but I'm interested that you felt it important to mention that "you don't even care about the basics of data modeling" when I clearly posted both the normalized and denormalized views, and actually shipped the normalized view to our reporting system. I took advantage of the denormalized view to help me ship. Its not like I didn't consider the normalized view, like I said, I materialized it for reporting.

Why did you feel the need to say "you don't even care about the basics of data modeling"? I thought it was an interesting case, but now your reaction is the interesting item in this thread

So tell me about why you believe I don't care or didn't care when I produced this system?
Post #1236876
Posted Monday, January 16, 2012 8:10 PM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Today @ 3:34 PM
Points: 8,295, Visits: 8,748
patrickmcginnis59 (1/16/2012)
PaulB-TheOneAndOnly (1/16/2012)
Please do not take it the wrong way but, out of curiosity, why is the subject of this thread "my favorite normalization case study" if you never intended to normalize the data and you don't even care about the basics of data modeling?

I'm not judging, you can do it anyway you want


It wasn't actually meant to be all that detailed a thread, but I'm interested that you felt it important to mention that "you don't even care about the basics of data modeling" when I clearly posted both the normalized and denormalized views, and actually shipped the normalized view to our reporting system. I took advantage of the denormalized view to help me ship. Its not like I didn't consider the normalized view, like I said, I materialized it for reporting.

Why did you feel the need to say "you don't even care about the basics of data modeling"? I thought it was an interesting case, but now your reaction is the interesting item in this thread

So tell me about why you believe I don't care or didn't care when I produced this system?

Don't get too upset Patrick. Paul is (I think - I hope he'll correct me if I'm wrong) taking things from the point of view of someone who uses an RDBMS which sorts out all those storage issues for him, whereas you are taking the point of view of someone who has to design the optimal storage and then present a relational view on top of it. Obviously you will differ sometimes (probably almost always, given the two points of view), and this is one of those times. It doesn't help, of course, that none of our currently available RDBMSs can be told what sort of queries they will get with what frequency and attempt to optimise the storage mechanism to fit, but have instead a fixed set (usually two members to that set) of storage organisations within which they permit the user to attempt to optimise by choosing indices and clustering factors. It's not clear to me that the relational model prefers one of those two views over the other, but for most people it seems to be absolutely clear that the relational model implies Paul's view, not yours (needless to say, I disagree with them; that doesn't mean that I think your storage structure is relational, of course, it just means that I think it's fairly clear that it can be used to support a relational view of the data; I don't for a moment think that Microsoft's view storage view of pages, files, and filegroups is the slightest bit relational either, but it certainly can supprt something close to a relational view (maybe exactly a relational view) of the data. But I do think it's unreasonable to ask for an explanation of storage engine primitives in terms of relational primitives - that's confusing the cart and the horse.


Tom
Post #1236958
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse