Normal Forms and Data Integrity

  • Jason Wolfkill

    SSCrazy Eights

    Points: 9772

    Ow . . . ow . . . ow . . . ow

    ^^^ My brain working on this question.

    Jason Wolfkill

  • SQLRNNR

    SSC Guru

    Points: 281243

    So when will they add TTNF to the fray?

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • logitestus

    SSCrazy

    Points: 2877

    Even reading up on Normalisation (or is it Normalization?), I still ended up guessing but it did get me to re-assess what I think of 1NF to 4NF.

    Thanks!

  • Thomas Abraham

    SSChampion

    Points: 10761

    As soon as I read "When can normalising ..." I knew it was a Tom question. Too bad I second guessed myself and took the fourth option over the second option.

    Thanks Tom for being a normalization pedant. (Sorry, but the spell checker insists on using a "Z" instead of an "S". Damn Yanks.)

    [font="Verdana"]Please don't go. The drones need you. They look up to you.[/font]
    Connect to me on LinkedIn

  • Hugo Kornelis

    SSC Guru

    Points: 64685

    First: I am very glad to see a question about normalization. There should be more of those. Much more. So thank you, Tom!

    And yet, my pedantic me just has to comment.

    The question itself is unclear and uses false premises. It suggests that normalizing to a higher normal form can introduce anomalies. That is not the case. Normalising to a higher normal form will always reduce anomalies.

    If in a database schema "all data integrity anomalies arising from functonal dependencies were eliminated by normalisation" then, no matter what normal form you consciously normalized to, it is in BCNF. That follows from the definition of BCNF. To quote Wikipedia: "If a relational scheme is in BCNF then all redundancy based on functional dependency has been removed, although other types of redundancy may still exist". Or another way to put it: in BCNF, every non-trivial full functional dependency of an attribute has to be on a candidate key. (Or another way to put it, every non-trivial functional dependency has to be on a superkey).

    In the last part of the question, I think Tom has a type: "acquire functional dependency-related dependencies" - I think you meant to write "acquire functional dependency-related anomalies"? But as already said, it's still impossible to acquire more anomalies when normalizing to a higher normal form.

    Then the explanation really misrepresents the situation. "EKNF is the highest normal form in which it is possible to ensure that all functional dependencies are enforced by the schema and its key constraints". No, that is BCNF. But in some cases, that form is impossible to achieve. In those (rare) cases, EKNF is the best you can do. (It has been proven that EKNF is always possible, and there are documented cases where nobody has yet been able to find a BCNF schema, and the general consensus is that this is impossible").

    So here's the actual relationship between the various Normal Forms.

    3NF (probably the best known normal form) requires that every non-trivial full functional dependency of a non-key attribute has to be on a candidate key. Note that nothing is said about key attributes. For single-column keys, that is not relevant - if a single-column key attribute depends on something, then whatever it depends on is a candidate key by definition. But for composite candidate keys, the exception is relevant. A column that is included in a composite key can depend on a non-key attribute, or on a subset of another candidate key, without violating 3NF.

    BCNF (also fairly well known) removes with this exception: every non-trivial full functional dependency of any attribute has to be on a candidate key. As already mentioned, this implies that there can be no functional dependency-related anomalies. Unfortunately, this is not always achievable.

    EKNF sits halfway between 3NF and BCNF. The exception that is present for key attributes in 3NF and removed in BCNF is partly removed in EKNF: every non-trivial full functional dependency of an attribute that is not an elementary key has to be on a candidate key. So the rule still applies to non-key attributes, and it also applies to attributes that are part of a key that is not elementary - but for attributes in a key that IS elementary, the exception still applies.

    (The distinction between elementary and non-elementary keys is very subtle and hard to explain - please take a rain check for that, I am too tired for it at this time).

    Bottom line - if it is possible to create a schema that eliminates all anomalies arising from functional dependencies, then that schema is in BCNF (and hence also in all lower normal forms). In the rare cases where this is not possible, the best available alternative is EKNF, and you'll have to live with the anomalies that could not be eliminated.

    Oh, and since you also mention 4NF and 5NF in your question (and I'm a pedantic perfectionist):

    4NF implies that every non-trivial multi-valued dependency is a dependency on a superkey. Since a functional dependency is just a special case of a multi-valued dependency, this automatically implies BCNF.

    5NF means that every non-trivial join dependency is a dependency on a superkey. And hey, look here - a multi-valued dependency then turns out to be a special value of a join dependency, so 5NF automatically implies 4NF, and all "lower" normal forms.

    Whew! Now I really need to stop writing and get some sleep!


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • TomThomson

    SSC Guru

    Points: 104773

    Hugo Kornelis (8/23/2013)


    First: I am very glad to see a question about normalization. There should be more of those. Much more. So thank you, Tom!

    And yet, my pedantic me just has to comment.

    I disagree with most of your comments, but I think the reasoning will have to wait until I have had some rest; or maybe even wait until I get round to finishing the fourth article in my normalization series.

    One thing I do agree with: the last word in the question is indeed an error: it should, as you suggest, be "anomalies" rather than "dependencies". Thanks for spotting that.

    Tom

  • PChiragS

    SSCarpal Tunnel

    Points: 4965

    very good question..

    Thanks Tom..

  • Kavita Tiwari

    SSC Journeyman

    Points: 80

    Although a tough one, but i got it correct.

    Thanks Tom

  • TomThomson

    SSC Guru

    Points: 104773

    L' Eomot Inversé (8/23/2013)


    Hugo Kornelis (8/23/2013)


    First: I am very glad to see a question about normalization. There should be more of those. Much more. So thank you, Tom!

    And yet, my pedantic me just has to comment.

    I disagree with most of your comments, but I think the reasoning will have to wait until I have had some rest; or maybe even wait until I get round to finishing the fourth article in my normalization series.

    One thing I do agree with: the last word in the question is indeed an error: it should, as you suggest, be "anomalies" rather than "dependencies". Thanks for spotting that.

    OK, now I've had my 7 hours sleep and taken my morning medication, so I'll try to respond to your comments.

    There seem to me to be three issues:

    3) Is a schema where no anomalies are derived from non-trivial functional dependencies necessarily in BCNF?

    2) What is the definition of BCNF

    1) Can normalizing a schema from EKNF to BCNF introduce functional-dependency derived anomalies?

    And, although it's not anything we disagree about, we probably ought to have a definition of EKNF too, as probably most people don't know what it is, so let's have

    4) what is the definition of EKNF.

    There's also the question of what "schema" means!

    I'll tackle these in reverse order.

    4) This definition is in two parts: first a definition for a table, and second a definition for a schema.

    A table is in EKNF if whenever S -> x is a nontrivial functional dependency in that table either x is an elementary key attribute of the table or S is a superkey of the table; the dependency is trivial if S contains x; x is an elementary key attribute if x is an element of an elementary key. A candidate key K is an elementary key if there is at least one nontrivial elementary functional dependency K -> y; and a functional dependency T -> y is elementary if no functional dependency T1 -> y exists for which T1 is a proper subset of T. That's quite a long chain of definitions, but it's pretty straightforward.

    A schema is in EKNF is every table in the schema is in EKNF and every functional dependency between attributes in the schema is enforced by the schema (ie no. assignment of attribute values that is consistent with the constraints in the schema fails to satisfy all the functional dependencies.

    This part of the definition was a break away from normalisation tradition; earlier definitions or normal forms dealing with functional dependencies (2NF, 3NF, BCNF) had just defined the normal form for single tables. They didn't claim to enforce all functional dependencies: obviously 2NF and 3NF couldn't include that in the definition; the definition of BCNF didn't include such a clause either, because normal form definitions just didn't include any such statement (perhaps because people felt that such a statement should not be included in definitions unless there was a proof that it could be achieved).

    3) No, there are schemas with no functional dependency related anomalies which are in EKNF but not in BCNF (and EKNF without anomalies is always achievable). There is a classical example for this, and here it is:

    Suppose that we have a schema with three attributes x, y and z, all integers, in our schema and that the set of non-trivial funtional dependencies between the attributes CONSISTS OF {x,y} -> z and {z} -> y and everything that is implied by those two (and nothing else). We can represent this schema in SQL as follows:

    create table T0 (

    z int not null primary key,

    y int not null,

    unique(z,y) -- needed only because SQL doesn't allow proper superkeys as targets of foreign key constraints; another SQL failure to follow the model.

    );

    create table T1(

    x int not null,

    y int not null,

    z int not null,

    primary key (x,y),

    constraint F foreign key (z,y) references T0(z,y)

    );

    Table T0 is in 3NF and EKNF and BCNF, since the only nontrivial functional dependency is enforced by the primary key constraint. Table T1 is in 3NF because there are no non-prime attributes: x and y are attributes in the primary key so they are prime attributes, and {z,x} -> y is a consequence of {z} -> y, so {z,x} is a candidate key, so z is a prime attribute. T0 is also in EKNF and in BCNF (as are all tables with only two attributes). T1 is in EKNF (because z is an elementary prime attribute) but not in BCNF because the table has the attributes z and y and the (elementary) FD {z} -> y but {z} is not a superkey.

    There are no possible anomalies in this schema: {z} -> y can't be unsatisfied in T0 because of the primary key constraint, and it can't be unsatisfied in T1 because the foreign key ensures that the {z,y} pairs in this table are taken from T0, which means they satisfy the constraint. The other FD, {x,y} -> z is satisfied in the only table containing all three attributes, because {x,y} is the primary key of that table, so all the FDs implied by those two FDs are satisfied in the schema.

    So this schema is a counter-example to the claim that every schema in which all FR-related anomalies are eliminated is in BCNF.

    Achievability follows trivially from Zaniolo's proof that Bernstein's algorithm generates schemas in EKNF, as follows: given a schema in which every table is in EKNF, the only FD-related anomalies possible are those caused by nontrivial elementary functional dependencies in some tables. Suppose E -> x is an elementary dependency in table T creating such an anomaly; create a new table with primary key E and non-prime attributes y such that E -> y is an elementary

    FD in T; add a uniqueness constraint over all columns to the new table, and a foreign key in T pointing to the new tables from the corresponding columns in the new table; this eliminates n of the anomalies, where n is the number of non-prime attributes in the new table, and all the tables are in EKNF. Repeat until there are no anomalies left; now the whole schema is in EKNF as defined above - no table has nontrivial FDs not implied by the candidate keys with attributes other than elementary key attributes as target, and the schema enforces all the FDs so there are no anomalies. If I remember correctly this anomaly elimination step is actually redundant because Bernstein's algorithm generates a set of tables that enforce all the FDs fed into it rather than just generating something where every table is in EKNF as an individual table (so achievability of EKNF which is anomaly-free would be a direct consequence of the fact that Bernstein's algorithm generates it, without any extra reasoning needed) but I'm not 100% certain of that, only 99%, because it's too many years since I last looked at it and I don't currently possess a copy so I can't check.

    Note that the anomaly elimination process described in that achievability proof does increase the data size. That's a pity, but the assurance that bad code can't corrupt the data so that it doesn't conform to the FDs implied by the business rules has that store cost as its price.

    2 (and 1). It's rather obvious that 2 depends on 1. If the definition of BCNF is "a relation R is in BCNF if and only if for every nontrivial FD S -> x that R satisfies X is a superkey of R" or some equivalent definition (ie something compatible with the original definition, and the definition most commonly stated today) it is inevitable the some idiot who thinks normalisation is about eliminating FDs within a table rather than enforcing FDs within a schema will inevitably take the table T1 in the example above, discard the z column and the Foreign key constraint, discard the unique constraint on table T0, and pat himself on the back for a job well done: each table is in BCNF and some space has been saved; he won't realise that he's opened up the path for some quite horrible anomalies by throwing away the FD {x,y} -> z, because he's doing normalization not anomaly elimination. The definition on the wikipedia page you quote looks like that that definition, since "schema" there is clearly being used as a synonym for "relation" or "table" (which is how it was first used long ago, but seems a bit archaic these days) since it refers to a superkey in the schema, which would imply a set of attributes on which all the attributes in all tables depend if schema meant all the tables. However, the sentence immediately proceeding it does suggest another, radically different definition with two parts to it, just like the definition of EKNF (one concerning the form of each table, and the other concerning enforcement of all FDs that are needed rather than allowing some to be ignored because only FDs involving only one table matter). I would be quite happy if that rather more useful definition were generally accepted, but it isn't. The generally accepted definition is the one that refers just to one table, so the broken two table schema which fails to enforce {x,y} -> z is BCNF because there's no table in which all three attributes occur. So BCNF is always achievable, although acceptable BCNF is not.

    EDIT:

    After writing all that I was discovered I was sufficiently too bugged by the question whether anomaly elimination work needed to be done on the output from Bernstein's algorithm to leave it unresolved. In hunting through some very old stuff I came across some work I had completely forgotten about, that I started a couple of decades ago and never found time to finish, on mending 4NF so that it preserved the FD representation principle, and it was very clear there that Bernstein's algorithm will indeed generate all the auxiliary tables needed to ensure that all FDs are enforced in the nasty cases where BCNF can't achieve that, so that the anomaly elimination step I described above for finishing the job in case Bernstein's algorithm didn't is completely unnecessary - I should have a better memory. I might pick up that stuff and finish it now, it seems trivially easy to do.

    Tom

  • Carlo Romagnano

    SSC-Insane

    Points: 21969

    Tom, re-word the question, maybe I get it right!

    Unreadable!

  • Hany Helmy

    SSChampion

    Points: 13488

    Complicated question! :w00t:

  • Hugo Kornelis

    SSC Guru

    Points: 64685

    Hi Tom,

    I just wasted two hours typing a long reply to your message, then forgetting to do a copy/paste before posting. I really hate this site at times like that. How long has this bug been known? Will it ever be fixed? Does anybody at RedGate actually really care?

    Anyway, I don't have tome to do this all again. Maybe later. In the mean time, there is one question I do want to repeat.

    You mention that EKNF is not only defined at the level of individual relations, but also includes a schema-level requirement. I have studied Zaniolo's paper on EKNF (http://www.cs.ucla.edu/~zaniolo/papers/tods82b.pdf) extensively, but I didn't find any mention of this. Can you point me to where I might find that?

    I'll try to reconstruct the rest of my reply later. When I have enough time. And the motivation to repeat previous work.

    Cheers,

    Hugo


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • TomThomson

    SSC Guru

    Points: 104773

    Hugo Kornelis (8/26/2013)


    Hi Tom,

    I just wasted two hours typing a long reply to your message, then forgetting to do a copy/paste before posting. I really hate this site at times like that. How long has this bug been known? Will it ever be fixed? Does anybody at RedGate actually really care?

    It's been there ever since I first used the site, which was nearly 5 years ago. The bug is an out and out pain. RedGate appears to be more interested in tarting up the appearance the daily and weekly newsletters to make it better for tablets and phones (while making it a lot less pleasant for people using netbooks, laptops, or desktops, of even for people who use a docked mobile with a decent monitor) that in fixing appalling bugs like this one which make the site extremely frustrating to use.

    Anyway, I don't have tome to do this all again. Maybe later. In the mean time, there is one question I do want to repeat.

    You mention that EKNF is not only defined at the level of individual relations, but also includes a schema-level requirement. I have studied Zaniolo's paper on EKNF (http://www.cs.ucla.edu/~zaniolo/papers/tods82b.pdf) extensively, but I didn't find any mention of this.

    Can you point me to where I might find that?

    I 'don't know. It's something I remember from roughly quarter of a century ago, when I was first researching how to build an RDBMS on a declarative system. My problem was that a database inevitably has to break declarativeness, as of course do many other things; back then there was no Haskell, in fact no declarative functional language that was capable of being used to build an industrial-strength operating system, so it was not easy going, and I was very happy when someone told me about this new normal form which guaranteed no FD-related anomalies, because it meant that I wouldn't need to worry about how to ensure that we could have a relational calculus whose structure forced absence of FD-related anomalies, we could leave that to the schema; I think it was someone in the USC group - not Ginsburg himself (I was never lucky enough to meet him) but probably one of his team I ran into when visiting one of the British Universities that my outfit was connect with in one way or another in those days (it would be Oxford, Manchester, Edinburgh, St Andrews, UCL or QMC - those were the Universities I visited often in the course of work in the late 80s). I suppose it may have been someone who visited us at ICL WG, we used to get senior academics from various Universities up to the office to give seminars now and again, but I don't think it was. Zaniolo's paper makes it clear that if you normalise to EKNF using Bernstein's algorithm you end up with a schema which prevents all FD-related anomalies, so when I read it (a few years later - things were harder to get over the internet before we had the web) I didn't notice that Zaniolo didn't explicitly include that in his definition - I didn't notice in fact until you asked just now and I looked at the paper to find the bit you must have missed and discovered that I couldn't find any such bit. So I may have had no published basis for including that statement as part of the definition (I don't know whether it's in another one of the hordes of papers from TODS and PODS and so on that I read, although since the whole point of EKNF is that it fully satisfies the representation principle it seems a necessary part of the definition and its inclusion seems to me to be clearly implied, although not stated in so many words, by Zaniolo's unambiguous statement that that principle is satisfied and that that is the reason why it is worth identifying it as a specific normal form.

    I'll try to reconstruct the rest of my reply later. When I have enough time. And the motivation to repeat previous work.

    You have my sympathy. Guess how many times I typed the thing to which you were replying (twice ;-)) and why (than damnable bug).

    This time I remembered to copy it before hitting "post quoted reply". 🙂

    Tom

  • sqlnaive

    SSCoach

    Points: 17435

    Well a tough question. Somehow was able to answer but not satisfied. Need to further dive down.

  • jfgoude

    SSCrazy

    Points: 2586

    did not ever understand the question

    Am I so dummy !!?

Viewing 15 posts - 16 through 30 (of 30 total)

You must be logged in to reply to this topic. Login to reply