Second Normal Form

  • Jan Van der Eecken (7/14/2011)


    Hi Tom,

    Again a nice article, and I'm looking forward to 3NF. Are you contemplating going any further than that?

    I'm planning/hoping to do three articles after the 3NF one: one on normal forms intermediate between 3NF and 4NF and how the representation principle may get violated somewhere in that space (which is perhaps the most important article in the series, because understanding the representation principle in its various forms is the core concept of normalisation), one on 4NF and one on 5NF which will probably include a definition of 6NF but certainly no examples of 6NF (that will be Date's 6NF, not Fagin's 6NF - Fagin's 6NF aka DKNF may get a mention in the representation principle article, or not, depending on how I work the article out).

    Just one question, further above in the discussion you mention Bernstein's Algorithm. Never heard of that, and a (rather quick) search on Google didn't yield any useful results. Could you elaborate a little bit on that?

    The article describing the algorithm was published in ACM ToDS 1/4 (December 1976), beginning at page 277; the author was P. A. Bernstein and it was titled "Synthesizing Third Normal Form relations from functional dependencies". I don't know of any way of getting it without paying the ACM (they deserved payment for the original publication, but given that they are a learned society and that ToDS is supposed to be a genuine technical journal I think that this long afterwards http://www.gnu.org/philosophy/right-to-read.html is relevant). A related article is "A New Normal Form for the Design of Relational Database Schemata" by Carlo Zaniolo, in ACM ToDS 7/3 (Sept 1982), which is available without charge somewhere on the web (I'll not say where, in case the US copyright mafia sue me - stating facts is now a serious act entailing punitive damages in American law, and since it is also classified in the USA as a criminal offence, and we have a crazy extradition treaty with the US, the lunatics in charge of the US Asylum can extradite Brits from Britain for offences committed in Europe against US law which are perfectly legal acts in Europe).

    Tom

  • Great article and discussion about historical data; here I am going to say something and need your help.

    I would also add effectiveStartDate and effectiveEndDate fields to the PayGrade_Details table so that you could see that Charlie Fixit spent 3 years at paygrade 12 and then got bumped up to paygrade 13, and when it happened. Then I would just grab the paygrade record whose start date/end date falls within the proper date range that I need... I'm not sure if that is the correct way to do it, or what NF that is, but that's the way I've been doing it.

    I am also using this fromdate/todate technique to store historical data, but I am not sure that it is a good way to do it... In this approach we need to keep a NULL value, or some predefined default value, in the 'todate' column for as long as the row is treated as the current one... please advise on any other way to manage the historical data...

    I could also use a historical or archive paygrade table and move the old records into that table when they are changed...but I'm not sure which way is the recognized "proper" way to do it.

    I haven't tried it until now; I think it could be better than the first one, because there is no NULL value stored here...

    Thanks,

    [font="Verdana"]Fazal Khan[/font]
    C.T.O
    [font="Verdana"]TARECO SOFTONICS[/font]

  • fazalkhansb (7/15/2011)


    Great article and discussion about historical data; here I am going to say something and need your help.

    I would also add effectiveStartDate and effectiveEndDate fields to the PayGrade_Details table so that you could see that Charlie Fixit spent 3 years at paygrade 12 and then got bumped up to paygrade 13, and when it happened. Then I would just grab the paygrade record whose start date/end date falls within the proper date range that I need... I'm not sure if that is the correct way to do it, or what NF that is, but that's the way I've been doing it.

    I am also using this fromdate/todate technique to store historical data, but I am not sure that it is a good way to do it... In this approach we need to keep a NULL value, or some predefined default value, in the 'todate' column for as long as the row is treated as the current one... please advise on any other way to manage the historical data...

    I could also use a historical or archive paygrade table and move the old records into that table when they are changed...but I'm not sure which way is the recognized "proper" way to do it.

    I haven't tried it until now; I think it could be better than the first one, because there is no NULL value stored here...

    Thanks

    As I said in an earlier message, this is an area where my experience is limited. You can choose to distinguish historical and current data in several ways: by putting the historical data in a separate table (which complicates the introduction of a new value - the old one has to be moved), by not recording the end date at all because it is redundant - the previous value's reign ends when the next value begins (but then you have to be wary of performance - watch out for "triangular joins"), or by using a special value (or NULL, but that's usually not a good idea) for the end date to distinguish current data (that may be the best option - but don't use a date anywhere in the feasible lifetime of your database as that special value).

    Look at Esteban Zimányi's 2006 paper "Temporal Aggregates and Temporal Universal Quantification in Standard SQL" (available on pages 16 to 25 of this pdf doc), or get hold of the book "Temporal Data and the Relational Model" by Date, Darwen and Lorentzos if you are heavily into data warehousing with history requirements. For a brief discussion of the normal form level of some table types that may be useful there, see this - but be warned that it's aimed at showing that their chosen table forms are all in 6NF rather than at showing how to do anything useful with them; if you like it, or the tables it describes, you may want to read up on anchor modelling more generally.

    Tom
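
    To make the "special end-date value" option described above concrete, here is a minimal T-SQL sketch. The table and column names (PayGrade_History, EffectiveFrom, EffectiveTo) and the '9999-12-31' sentinel are illustrative assumptions, not anything taken from the article itself:

        -- One row per pay-grade "reign"; the sentinel end date marks the current row.
        CREATE TABLE PayGrade_History
        (
            EmployeeID    int  NOT NULL,
            PayGrade      int  NOT NULL,
            EffectiveFrom date NOT NULL,
            EffectiveTo   date NOT NULL DEFAULT '9999-12-31',  -- sentinel: "still current"
            CONSTRAINT PK_PayGrade_History PRIMARY KEY (EmployeeID, EffectiveFrom)
        );

        -- Current pay grade for each employee.
        SELECT EmployeeID, PayGrade
        FROM   PayGrade_History
        WHERE  EffectiveTo = '9999-12-31';

        -- Pay grade in force on a given day (half-open interval avoids overlaps).
        SELECT EmployeeID, PayGrade
        FROM   PayGrade_History
        WHERE  '2010-06-30' >= EffectiveFrom
          AND  '2010-06-30' <  EffectiveTo;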

  • Look at Esteban Zimányi's 2006 paper "Temporal Aggregates and Temporal Universal Quantification in Standard SQL" (available on pages 16 to 25 of this pdf doc), ...

    Wow that's great...thanks for the link Tom.

    George

  • Good post ... although I must say that the first paragraph has no learning value, but it does establish your 'geek credentials' 😉

    One informal analogy I use when thinking about 2NF is 'being on subject', or 'topicality':

    Tables store data about things, and that thing, or 'topic' or 'subject', should be as narrowly defined as possible. Ideally, the table name should tersely convey this topic.

    You can then ask the question 'is this field topical to the table?' or 'is it on subject?'. In your example, when viewing 'Current_Assignment' as a topic name, it becomes clear that fields 'PayGrade' and 'YearsInGrade' aren't topical: they belong to the employee, and don't add anything to the subject of 'projects employees are assigned to'.

    A candidate key then becomes a minimal set of facts required to say something truly unique about the topic. If your candidate key is 'Employee and Project', then again it becomes clear that saying something about just the employee adds nothing new to the topic of project assignments.
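
    As a sketch of the decomposition described above (the column types and exact definitions are assumptions for illustration; only the names Current_Assignment, Employee, Project, PayGrade and YearsInGrade come from the discussion):

        -- Before: PayGrade and YearsInGrade depend on Employee alone, not on the
        -- whole candidate key (Employee, Project), so the table is not in 2NF.
        CREATE TABLE Current_Assignment
        (
            Employee     varchar(50) NOT NULL,
            Project      varchar(50) NOT NULL,
            PayGrade     int         NOT NULL,
            YearsInGrade int         NOT NULL,
            CONSTRAINT PK_Current_Assignment PRIMARY KEY (Employee, Project)
        );

        -- After: the off-topic attributes move to a table whose topic is the employee,
        -- and the assignment table says something only about project assignments.
        CREATE TABLE Employee_PayGrade
        (
            Employee     varchar(50) NOT NULL PRIMARY KEY,
            PayGrade     int         NOT NULL,
            YearsInGrade int         NOT NULL
        );

        CREATE TABLE Project_Assignment
        (
            Employee varchar(50) NOT NULL REFERENCES Employee_PayGrade (Employee),
            Project  varchar(50) NOT NULL,
            CONSTRAINT PK_Project_Assignment PRIMARY KEY (Employee, Project)
        );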

  • Look at Esteban Zimányi's 2006 paper "Temporal Aggregates and Temporal Universal Quantification in Standard SQL" (available on pages 16 to 25 of this pdf doc), or get hold of the book "Temporal Data and the Relational Model" by Date, Darwen and Lorentzos if you are heavily into data warehousing with history requirements. For a brief discussion of the normal form level of some table types that may be useful there, see this - but be warned that it's aimed at showing that their chosen table forms are all in 6NF rather than at showing how to do anything useful with them; if you like it, or the tables it describes, you may want to read up on anchor modelling more generally.

    Thanks a lot Mr. Thomson...

    I have learned some new things about database design, like temporal data and 6th Normal Form. I have downloaded the papers you suggested and have also managed to get Date's book "Temporal Data and the Relational Model". I have started reading all about temporal data and I am very excited to learn this new terminology (at least, it is new for me). Thanks a lot for adding some new things to my knowledge horizon 🙂

    [font="Verdana"]Fazal Khan[/font]
    C.T.O
    [font="Verdana"]TARECO SOFTONICS[/font]

  • Tom.Thomson (7/14/2011)


    That's sort of true. But Occam's razor is a useful tool when trying to present things in a simple manner, and what a functional dependency is all about is that some values are fixed by some other values; at least I hope that's what you mean by a functional dependency (it's what every academic paper I've read that contained the term used it for).

    I think it is important to be clear on definitions given what you said in your previous article about 1NF permitting nulls. If a reader mistakenly thinks that 2NF and other normal forms apply equally well to a table with nulls as to a table without them then they may want to know what a functional dependency means in a table with nulls. The concept of functional dependency was developed without considering the impact of nulls and I think that pretty much every other text I've read on the topic excludes the possibility of nulls being part of any FD. You have opened the door to nulls in your first article but now you seem to be ignoring the consequences.

    It's important to remember that Codd and Heath were discussing NULL (not two different NULLs, just one) way back when they were preparing their early papers on normalisation, so that it is extremely unlikely that NULLs weren't considered at all.

    I will take Codd's word for it when he says "The concepts and rules of functional dependence, multi-valued dependence, and join dependence were all developed without considering missing db-values. All the normal forms based on these dependencies were also developed without considering missing db-values."[1]. Whether or not nulls were actually in his mind at the time, he chose to leave them out of what he published. Codd goes on to say:

    "the normalization concepts do NOT apply, and should NOT be applied, globally to those combinations of attributes and tuples containing [null] marks. Instead, they should be applied [...] to a conceptual version of the database in which tuples containing missing-but-applicable information in the pertinent attributes have been removed"

    [1] Missing Information (Applicable and Inapplicable) in Relational Databases, SIGMOD RECORD, Vol. 15, No. 4, December 1986

    So despite what you say, in Codd's view 2NF does exclude nulls. You haven't said in your article that 2NF applies only to a version of a database without nulls but you haven't suggested any alternative way of evaluating 2NF for a database with nulls either.

    I don't mean to imply that I agree with Codd's view. I think it's a terrible suggestion. It completely ignores the very real possibility that the kind of redundancy and resulting anomalies which the NFs were designed to eliminate can also occur when some rows include nulls. What bothers me is that you didn't mention either this or any other strategy for dealing with the nulls that you say you want to permit. Your readers are left in the dark about it.
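
    For readers wondering what Codd's suggestion would look like in practice, here is a rough sketch (assuming, purely for illustration, a variant of the Current_Assignment table in which PayGrade is nullable): test the suspect dependency only against the rows where the pertinent attribute is present, i.e. against the "conceptual version" of the table.

        -- Does Employee -> PayGrade hold once rows with a missing PayGrade are set aside?
        SELECT Employee
        FROM   Current_Assignment
        WHERE  PayGrade IS NOT NULL            -- the null-free "conceptual version"
        GROUP  BY Employee
        HAVING COUNT(DISTINCT PayGrade) > 1;   -- any row returned violates the dependency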

    Just a fortnight ago one of you anti-NULL purists was claiming that there was no such term as "primary key" in relational theory, only in SQL! (Complete nonsense, as you will agree.)

    I think most people will understand that I used "the primary key" in the sense used in the paper which introduced the relational model to the world - it is a primary key that has been chosen to be "the" primary key, as opposed to just "a" primary key - and it is a perfectly sensible description of a candidate key to say that it is something that can be so chosen, both because that's why the word "candidate" is there (a candidate is something that can be chosen, whether by fate, by a schema designer, or otherwise) and because the term "candidate key" was in fact introduced, rather later, as a replacement for "primary key". The concept of superkey is of course completely superfluous.

    Sure, but the science and the language have moved on since Codd used the term "primary key". Almost no-one today uses that phrase to refer to any or all keys of a relation and using it to mean just one key is completely redundant in a discussion about normalization.

    If the person you mentioned was confused about primary keys then I'm not surprised and he's not the only one. In the previous 1NF article discussion you said "SQL has a restriction that there can be only one primary key". Taken literally that statement is obviously untrue. By expecting us to guess each time whether you mean "the primary key" or "a primary key" you are committing the same linguistic folly that Codd did. Please let's leave "primary" keys out of a discussion about normal forms.

  • David Portas (7/17/2011)


    I think it is important to be clear on definitions given what you said in your previous article about 1NF permitting nulls. If a reader mistakenly thinks that 2NF and other normal forms apply equally well to a table with nulls as to a table without them then they may want to know what a functional dependency means in a table with nulls.

    Actually I think it is far more important to explain the fundamental underlying concept - that the business rules so constrain values that knowing the values for some attributes guarantees that you also know the values for some others - than to mess about with the mathematical abstraction "functional dependency", which has no meaning at all to almost everyone involved in database development. That underlying concept is what matters in the real world of database design. You may be one of those people who think better in terms of mathematical abstractions than in terms of the real-world phenomena they model - if so, you are not part of the audience I was writing for. I don't want to tell my audience that they have to learn a lot of pure mathematics to understand normalisation (which is NOT part of relational theory, but part of real-world database design). Nor do I want to arrive at a situation where my audience ends up, by the end of the article series, foolishly believing that maximum decomposition is more important than accurate representation (so you'll really hate at least one of the later articles in the series). Nor do I want my audience to believe that normalisation can't be applied to their real-world databases, where missing values are inevitable so they have NULLs, because that would deprive them of a valuable tool-kit for improving their schema designs. Perhaps you will understand my position better if I use an analogy: "look, we want to keep the baby - so be careful when you discard the bathwater"?

    While "permit NULL only when it is absolutely necessary (and be extremely sure that it really is absolutely necessary)" is a good rule that every schema designer should follow, "forbid NULL everywhere" is nonsense perfectionism (at least until we have better ideas for NULL elimination - a research topic which anti-NULL purists have ignored ever since Date's bizarre "default values will always work in place of NULL" idea) of the same order as "make all your schemas conform to DKNF" (which is, as Date pointed out here, starting about 20 lines down, provably impossible in many practical cases; in fact we don't even approach understanding where it is possible).

    I will take Codd's word for it when he says "The concepts and rules of functional dependence, multi-valued dependence, and join dependence were all developed without considering missing db-values. All the normal forms based on these dependencies were also developed without considering missing db-values."[1]. Whether or not nulls were actually in his mind at the time, he chose to leave them out of what he published. Codd goes on to say:

    "the normalization concepts do NOT apply, and should NOT be applied, globally to those combinations of attributes and tuples containing [null] marks. Instead, they should be applied [...] to a conceptual version of the database in which tuples containing missing-but-applicable information in the pertinent attributes have been removed"

    I don't mean to imply that I agree with Codd's view. I think it's a terrible suggestion. It completely ignores the very real possibility that the kind of redundancy and resulting anomalies which the NFs were designed to eliminate can also occur when some rows include nulls. What bothers me is that you didn't mention either this or any other strategy for dealing with the nulls that you say you want to permit. Your readers are left in the dark about it.

    Then what do you mean to imply? That I should provide about 16,000 words of abstruse mathematics to demonstrate the several ways in which the formal theory can be adapted (one of which is the way Codd mentioned and you apparently failed to notice in the quotation you used - you also apparently failed to notice that the informal approach in the article exactly followed that way, except for an insistence on banning NULL in all prime attributes, which I think Codd probably intended although he didn't actually mention it in that sentence), instead of sticking to the actual design issues that the formal theory is intended to deal with and model? When I was very young I might have taken that approach - I was a pure mathematician - but since then I have become an applied mathematician/computer scientist/engineer and gained experience by applying, and teaching people to apply, mathematics in industry; this has taught me that going back to the real-world things that are being modelled and talking about them is where I need to be if I want to teach effectively. It has also taught me that I understand the mathematics much better, and have far more insight into the abstract theory, if I approach it in that practical way.

    Anyway, I don't think it's useful to try to hijack this discussion (as you have successfully hijacked others) into a debate on whether NULL is a bad thing. We know it's a bad thing. That it's also a necessary thing is unfortunate, and it's even more unfortunate that some people can't accept that we must live with that necessity.

    Tom
