SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


my favorite normalization case study


my favorite normalization case study

Author
Message
patrickmcginnis59
patrickmcginnis59
SSChasing Mays
SSChasing Mays (645 reputation)SSChasing Mays (645 reputation)SSChasing Mays (645 reputation)SSChasing Mays (645 reputation)SSChasing Mays (645 reputation)SSChasing Mays (645 reputation)SSChasing Mays (645 reputation)SSChasing Mays (645 reputation)

Group: General Forum Members
Points: 645 Visits: 2333
the problem is an attendance tracker for an educational institution.

The foreign keys would have student number, course number, and meeting date. The data (attendance) could be as small as a bit.

Long story short, I used an array of characters and a nice scrolly matrix gadget with dates across the top and students down the side, the whole matrix being a course section, and I allowed a few extra attendance types. Maximum size of the group per student was greater than the max number of meetings I anticipated.

On the surface, it looks like repeating groups, but according to database theory as I stubbornly choose to interpret it (http://www.dbdebunk.com/page/page/622318.htm "...Maximizing performance is a physical optimization issue..."), as long as the one table with repeating groups was presented to the querying user as two or more joined tables, I'm all set, as the repeating group is simply a physical storage issue which is outside the scope of the topic of normalization.

Its worth noting that since there was no sql server involved, I was essentially the storage engineer, and for reporting, I fed a stream of attendance rows which were normalized with the foreign keys to our reporting system.

Whadya think? Anybody else ever see some funny normalization cases?
GSquared
GSquared
SSC-Insane
SSC-Insane (23K reputation)SSC-Insane (23K reputation)SSC-Insane (23K reputation)SSC-Insane (23K reputation)SSC-Insane (23K reputation)SSC-Insane (23K reputation)SSC-Insane (23K reputation)SSC-Insane (23K reputation)

Group: General Forum Members
Points: 23283 Visits: 9730
I'm more used to "funny" denormalization cases. Busy fixing a bunch of performance problems caused by an EAV structure right now.

In your case, it sounds like your bitmapping the data instead of storing it "relationally". In that case, normalization doesn't even apply, so don't worry about it.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
TomThomson
TomThomson
SSChampion
SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)

Group: General Forum Members
Points: 14192 Visits: 12197
I don't see this as repeating groups in the first place; "repeating groups" is a topic that's much misunderstood, with some people classifying far more things as repeating groups than others. When Ted Codd (in his paper "A Relational Model of Data for Large Shared Data Banks" published in CACM vol 13 no 6, June 1970) defined what is now called 1st normal form a repeating group was an attribute in a non-simple domain, that is in a domain where an individual attribute actually represented multiple values in underlying domains. A string of distinct attributes like "attendance at 1st meeting", "attendance at second meeting" and so on in a single row was not a repeating group, and would not be expected to be normalised out. I know it is currently regarded by a lot of people as heresy to rely on Codd's definition of first normal form, but I find it quite satisfactory since I can't think of any update anomalies, insertion amomalies, or deletion anomalies that arise from using that definition rather than one that would forbid me to store someone's home landline number and his mobile number in the same row because "two telephone numbers is a repeating group". There is of course the restriction that the number and domains of attributes in each row of a relation is defined by the relation, so is the same for every row in the relation, and that often makes it necessary to split things out to avoid simulating a variable number of columns by use of nulls, but I don't see how that applies here; and as a practical matter it doesn't make much sense to have a relation with so many attributes that the sheer number of them adds to the programming difficulty, but I don't see that as applying here either.

edit: incidentally, "repeating group" is pre-relational terminology, it's a pity that term isn't shunned and replaced for relational theory by Codd's "non-simple domain".

Tom

PaulB-TheOneAndOnly
PaulB-TheOneAndOnly
SSCertifiable
SSCertifiable (5.3K reputation)SSCertifiable (5.3K reputation)SSCertifiable (5.3K reputation)SSCertifiable (5.3K reputation)SSCertifiable (5.3K reputation)SSCertifiable (5.3K reputation)SSCertifiable (5.3K reputation)SSCertifiable (5.3K reputation)

Group: General Forum Members
Points: 5329 Visits: 4639
patrickmcginnis59 (1/13/2012)
Long story short, I used an array of characters and a nice scrolly matrix gadget with dates across the top and students down the side, the whole matrix being a course section, and I allowed a few extra attendance types.


If the above is the description of your "course" table I can tell this is not normalized at all.

It would be easier for us to understand your scenario if a list of involved tables is provided including a summary of the data each one contains alongside PKs and FK relationships.

_____________________________________
Pablo (Paul) Berzukov

Author of Understanding Database Administration available at Amazon and other bookstores.

Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
patrickmcginnis59
patrickmcginnis59
SSChasing Mays
SSChasing Mays (645 reputation)SSChasing Mays (645 reputation)SSChasing Mays (645 reputation)SSChasing Mays (645 reputation)SSChasing Mays (645 reputation)SSChasing Mays (645 reputation)SSChasing Mays (645 reputation)SSChasing Mays (645 reputation)

Group: General Forum Members
Points: 645 Visits: 2333
PaulB-TheOneAndOnly (1/14/2012)
patrickmcginnis59 (1/13/2012)
Long story short, I used an array of characters and a nice scrolly matrix gadget with dates across the top and students down the side, the whole matrix being a course section, and I allowed a few extra attendance types.


If the above is the description of your "course" table I can tell this is not normalized at all.

It would be easier for us to understand your scenario if a list of involved tables is provided including a summary of the data each one contains alongside PKs and FK relationships.


Well if it had been normalized, and I wanted to track each attendence day, I would have

COURSE_ID VARCHAR(20),
STUDENT_ID VARCHAR(10),
MEETING_ID DATETIME,
ATTENDANCE VARCHAR(1)

With everything but ATTENDANCE being a foreign key. As I actually did it, I had:

COURSE_ID VARCHAR(20),
STUDENT_ID VARCHAR(20),
ATTENDANCE VARCHAR(1) OCCURS MANY TIMES

and

COURSE_ID VARCHAR(20),
MEETING_ID, DATETIME OCCURS MANY TIMES.

and more importantly, the attendance and meeting_id occupy corresponding positions in their respective rows. I did it that way because everything worked nicer when it was "prejoined" so to speak.

Now this is according to my rather informal understanding of normalization. I could elaborate further if I took the time and read up, and of course I'm the new RDBMS person here, feel free to poke holes in anything I post. Its also worth noting that a valid optimization is to only store the absences or other abnormal stuff we want track and otherwise assume the student was present for that meeting.
PaulB-TheOneAndOnly
PaulB-TheOneAndOnly
SSCertifiable
SSCertifiable (5.3K reputation)SSCertifiable (5.3K reputation)SSCertifiable (5.3K reputation)SSCertifiable (5.3K reputation)SSCertifiable (5.3K reputation)SSCertifiable (5.3K reputation)SSCertifiable (5.3K reputation)SSCertifiable (5.3K reputation)

Group: General Forum Members
Points: 5329 Visits: 4639
Would you mind in writting a query to answer the question... "Did Peter Bishop attend Particle Physics 501 on January 9th, 2012?"

Now... write a query to answer the same question on the 3NF structure.

_____________________________________
Pablo (Paul) Berzukov

Author of Understanding Database Administration available at Amazon and other bookstores.

Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
patrickmcginnis59
patrickmcginnis59
SSChasing Mays
SSChasing Mays (645 reputation)SSChasing Mays (645 reputation)SSChasing Mays (645 reputation)SSChasing Mays (645 reputation)SSChasing Mays (645 reputation)SSChasing Mays (645 reputation)SSChasing Mays (645 reputation)SSChasing Mays (645 reputation)

Group: General Forum Members
Points: 645 Visits: 2333
PaulB-TheOneAndOnly (1/16/2012)
Would you mind in writting a query to answer the question... "Did Peter Bishop attend Particle Physics 501 on January 9th, 2012?"

Now... write a query to answer the same question on the 3NF structure.



Its worth noting that since there was no sql server involved, I was essentially the storage engineer, and for reporting, I fed a stream of attendance rows which were normalized with the foreign keys to our reporting system.


The obvious question is then, why did I not do all of it as an RDBMS? Because I preferred that data entry be easy, and I always knew that reporting would be possible, if perhaps requiring more effort.

I don't feel too bad about the design, indeed theres even some parallels to todays trends, that collection of data is often distanced a bit from reporting of that same data.
PaulB-TheOneAndOnly
PaulB-TheOneAndOnly
SSCertifiable
SSCertifiable (5.3K reputation)SSCertifiable (5.3K reputation)SSCertifiable (5.3K reputation)SSCertifiable (5.3K reputation)SSCertifiable (5.3K reputation)SSCertifiable (5.3K reputation)SSCertifiable (5.3K reputation)SSCertifiable (5.3K reputation)

Group: General Forum Members
Points: 5329 Visits: 4639
Please do not take it the wrong way but, out of curiosity, why is the subject of this thread "my favorite normalization case study" if you never intended to normalize the data and you don't even care about the basics of data modeling?

I'm not judging, you can do it anyway you want :-)

_____________________________________
Pablo (Paul) Berzukov

Author of Understanding Database Administration available at Amazon and other bookstores.

Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
patrickmcginnis59
patrickmcginnis59
SSChasing Mays
SSChasing Mays (645 reputation)SSChasing Mays (645 reputation)SSChasing Mays (645 reputation)SSChasing Mays (645 reputation)SSChasing Mays (645 reputation)SSChasing Mays (645 reputation)SSChasing Mays (645 reputation)SSChasing Mays (645 reputation)

Group: General Forum Members
Points: 645 Visits: 2333
PaulB-TheOneAndOnly (1/16/2012)
Please do not take it the wrong way but, out of curiosity, why is the subject of this thread "my favorite normalization case study" if you never intended to normalize the data and you don't even care about the basics of data modeling?

I'm not judging, you can do it anyway you want :-)


It wasn't actually meant to be all that detailed a thread, but I'm interested that you felt it important to mention that "you don't even care about the basics of data modeling" when I clearly posted both the normalized and denormalized views, and actually shipped the normalized view to our reporting system. I took advantage of the denormalized view to help me ship. Its not like I didn't consider the normalized view, like I said, I materialized it for reporting.

Why did you feel the need to say "you don't even care about the basics of data modeling"? I thought it was an interesting case, but now your reaction is the interesting item in this thread ;-)

So tell me about why you believe I don't care or didn't care when I produced this system?
TomThomson
TomThomson
SSChampion
SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)

Group: General Forum Members
Points: 14192 Visits: 12197
patrickmcginnis59 (1/16/2012)
PaulB-TheOneAndOnly (1/16/2012)
Please do not take it the wrong way but, out of curiosity, why is the subject of this thread "my favorite normalization case study" if you never intended to normalize the data and you don't even care about the basics of data modeling?

I'm not judging, you can do it anyway you want :-)


It wasn't actually meant to be all that detailed a thread, but I'm interested that you felt it important to mention that "you don't even care about the basics of data modeling" when I clearly posted both the normalized and denormalized views, and actually shipped the normalized view to our reporting system. I took advantage of the denormalized view to help me ship. Its not like I didn't consider the normalized view, like I said, I materialized it for reporting.

Why did you feel the need to say "you don't even care about the basics of data modeling"? I thought it was an interesting case, but now your reaction is the interesting item in this thread ;-)

So tell me about why you believe I don't care or didn't care when I produced this system?

Don't get too upset Patrick. Paul is (I think - I hope he'll correct me if I'm wrong) taking things from the point of view of someone who uses an RDBMS which sorts out all those storage issues for him, whereas you are taking the point of view of someone who has to design the optimal storage and then present a relational view on top of it. Obviously you will differ sometimes (probably almost always, given the two points of view), and this is one of those times. It doesn't help, of course, that none of our currently available RDBMSs can be told what sort of queries they will get with what frequency and attempt to optimise the storage mechanism to fit, but have instead a fixed set (usually two members to that set) of storage organisations within which they permit the user to attempt to optimise by choosing indices and clustering factors. It's not clear to me that the relational model prefers one of those two views over the other, but for most people it seems to be absolutely clear that the relational model implies Paul's view, not yours (needless to say, I disagree with them; that doesn't mean that I think your storage structure is relational, of course, it just means that I think it's fairly clear that it can be used to support a relational view of the data; I don't for a moment think that Microsoft's view storage view of pages, files, and filegroups is the slightest bit relational either, but it certainly can supprt something close to a relational view (maybe exactly a relational view) of the data. But I do think it's unreasonable to ask for an explanation of storage engine primitives in terms of relational primitives - that's confusing the cart and the horse.

Tom

Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search