Normalization

Question

Normalization

sjsubscribe

SSCrazy

Points: 2080
More actions
January 9, 2010 at 3:39 pm

#218107

http://lambda-the-ultimate.org/node/3762
[p] Some of long-standing issues being revisited, but it is interesting to see the debate from functional programming angles. [/p]

Viewing 15 posts - 1 through 15 (of 144 total)

You must be logged in to reply to this topic. Login to reply

PaulB-TheOneAndOnly SSC Guru Points: 66772 More actions · Answer 1

oh yes! very interesting indeed.

I found the article as interesting as the theory that Grand Cannyon was an alien mining operation is 😀 which everybody knows is true, init?

Let me quote "...practitioners have rejected normalization..."

:blink: Really?

Are you kidding? 😀

_____________________________________
Pablo (Paul) Berzukov

Author of Understanding Database Administration available at Amazon and other bookstores.

Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.

sjsubscribe SSCrazy Points: 2080 More actions · Answer 2

Certainly the current practice is mixed at best or else no DBMS would allow null values as valid entries in a field. That's what the relational theory purists wants. Rejection refers to that specifically. Hyperbole aside, that's what I got out of the debate anyway.

PaulB-TheOneAndOnly SSC Guru Points: 66772 More actions · Answer 3

sjsubscribe (1/20/2010)
Certainly the current practice is mixed at best or else no DBMS would allow null values as valid entries in a field. That's what the relational theory purists wants.

I can see no conflict between data normalization and accepting null values in non-key columns - as all rdbms out there allow and enforce.

_____________________________________
Pablo (Paul) Berzukov

Author of Understanding Database Administration available at Amazon and other bookstores.

Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.

sqlvogel SSCrazy Eights Points: 9526 More actions · Answer 4

PaulB-TheOneAndOnly (1/21/2010)

I can see no conflict between data normalization and accepting null values in non-key columns - as all rdbms out there allow and enforce.

All the conventional Normal Forms apply exclusively to relations with values, never nulls. In fact normalization predates the invention of SQL and nulls. So you can't say that any table with a null represents a relation in Normal Form.

If you allow an attribute to be null under normalization, you either have to ignore dependencies involving that attribute or you must make an informal (and incorrect) assumption that null can always be treated like a proper value. Formally speaking nulls just can't work with normalization. In particular, there is no clear answer as to what nonloss decomposition or join dependency means for tables that have nulls in them. So no table with a null is truly normalized. Since it is always possible to decompose further and eliminate nulls they are not formally necessary either - you can always create a truly normalized design without them.

PaulB-TheOneAndOnly SSC Guru Points: 66772 More actions · Answer 5

David Portas (1/22/2010)...you can always create a truly normalized design without them.

Yes! I'm agreement... but the truth is you can do with or without them :Whistling:

I know there are people out there investing a lot of time in missinforming good gals and pals but the truth will always prevail - let's go back to the sacred scriptures, in my case - when in doubt - I always go to the old book of Codd.

Firstly... it is not true that Null got invented alongside SQL, Codd described Null. By the way, he got furious when SQL coded a single representation for Null when everybody knows there are two kinds of Null.

Secondly... Codd said "relations have attributes which contain values within a domain" so, lets see what we got here.

a) Null is not a value but the absence of value, domains are made up of values then Null cannot be part of a domain because Null is not a value. As you can see Null gets a free pass.

b) Some attributes in a tuple are not meant to be part of any relationship so people are safe to populate such relationshipless attributes with a Null when needed. Just an example, how about the famous MailAddressSecondLine attribute describing the optional second line of a mail address? you can populate it if needed or you can leave it with no value a.k.a. Null if you have nothing to put in there.

Bottom line is ... Codd invented everything at the same time - including Null - and the model was good but as it happens in all domains of life some people attempts to read what is not there and to stretch the words to extremes beyond what the author intended.

Cheers!

_____________________________________
Pablo (Paul) Berzukov

Author of Understanding Database Administration available at Amazon and other bookstores.

Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.

sqlvogel SSCrazy Eights Points: 9526 More actions · Answer 6

PaulB-TheOneAndOnly (1/22/2010)
Bottom line is ... Codd invented everything at the same time - including Null - and the model was good but as it happens in all domains of life some people attempts to read what is not there and to stretch the words to extremes beyond what the author intended.

Codd invented the relational model in 1969. His earliest discussion of nulls was in the "RM/T" paper "Extending the database relational model to capture more meaning" which was 10 years later. Even then he didn't properly attempt to introduce nulls to the relational model until much later, in his RM Version 2 book (1990). If you disagree then please tell me the precise work authored by Codd that you are referring to. You will find no mention of nulls at all in his original papers of 1969 and 1970 so it's not true that nulls originated simultaneously with the relational model itself.

Irrespective of that, the Normal Forms, upto and including 6NF don't leave any room for the possibility of nulls. Codd himself says so: "The concepts and rules of functional dependence, multi-valued dependence, and join dependence were all developed without considering missing db-values" ("Missing Informatlon in Relational Databases", 1986). Consider for example a join dependency JD *({A,B},{A,B,C}). What does this mean if A or B contain nulls? In SQL "nulls don't join" so the join of any projections including null will have fewer rows than you started with. How then can any SQL table with nulls be equal to the join of the projections on its superkeys (in other words 5NF)? It cannot be and therefore no table with nulls can accurately be described as being in 5NF.

PaulB-TheOneAndOnly SSC Guru Points: 66772 More actions · Answer 7

wow wow WOW! :w00t: No need to declare Jihad on me... we both read from the same book! remember? :laugh:

hey ... you missed an opportunity to explain how the old-n-good MailAddressSecondLine issue works for you. Are we in agreement there are attributes that do not call for any relationship and nobody cares if they carry or not a value?

_____________________________________
Pablo (Paul) Berzukov

Author of Understanding Database Administration available at Amazon and other bookstores.

Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.

sqlvogel SSCrazy Eights Points: 9526 More actions · Answer 8

Nothing personal. I just wanted to set the record straight, keeping in mind your comments about reading what is not there and stretching the meaning of words beyond their author's intent.

hey ... you missed an opportunity to explain how the old-n-good MailAddressSecondLine issue works for you. Are we in agreement there are attributes that do not call for any relationship and nobody cares if they carry or not a value?

If nobody cares then by the YAGNI principle the attribute should be left out of the database altogether.

I think the problem you are referring to is that it can be a mistake to assume certain properties about addresses which do not hold true in reality. That doesn't have very much to do with a discussion about normalization. It just means that as part of the design process you should determine what dependencies your data is actually going to satisfy.

Jeff Moden SSC Guru Points: 1003894 More actions · Answer 9

So, when YOU create an address table, do you have a "second address" line or do you normalize it? Enquiring minds want to know. 🙂

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

RBarryYoung SSC Guru Points: 143329 More actions · Answer 10

David Portas (1/22/2010)

PaulB-TheOneAndOnly (1/21/2010)
I can see no conflict between data normalization and accepting null values in non-key columns - as all rdbms out there allow and enforce.
All the conventional Normal Forms apply exclusively to relations with values, never nulls. In fact normalization predates the invention of SQL and nulls. So you can't say that any table with a null represents a relation in Normal Form.

This just is not true David. E. F. Codd himself championed the use of Nulls in Relational DBMS's and if he saw no problem with it, then I sure do not.

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

sqlvogel SSCrazy Eights Points: 9526 More actions · Answer 11

RBarryYoung (1/23/2010)
David Portas (1/22/2010)
All the conventional Normal Forms apply exclusively to relations with values, never nulls. In fact normalization predates the invention of SQL and nulls. So you can't say that any table with a null represents a relation in Normal Form.
This just is not true David. E. F. Codd himself championed the use of Nulls in Relational DBMS's and if he saw no problem with it, then I sure do not.

I already mentioned that in later years Codd did champion the use of nulls (more precisely he championed the use of I-marks and A-marks and was less happy with SQL-style nulls). See the references in my previous reply. That doesn't alter the fact that the normalization and normal forms predate the invention of nulls. NFs 2,3,BCNF,4,5,6 therefore deal only with relations with values, never with nulls.

I'm not sure what it is you are disagreeing with. I'm always happy to learn where I'm wrong but I think the chronology is pretty indisputable. The fact that the dependency theory on which normalization is based excludes the possibility of nulls is also pretty clear and uncontroversial as far as I know (see my previous quotation for Codd's own view). What am I wrong about?

P.S. Codd did in fact acknowledge and write about some problems with his later attempts to retro-fit null marks and other features into the RM (his "RM version 2"). Arguably he created more problems than he solved. That may partly explain why as far as I know Codd's RM V2 model has never been implemented and it has been neglected by both the scientific and industry communities.

Jeff Moden SSC Guru Points: 1003894 More actions · Answer 12

Jeff Moden (1/23/2010)
So, when YOU create an address table, do you have a "second address" line or do you normalize it? Enquiring minds want to know. 🙂

I'd still like an answer to the question above, David.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

sqlvogel SSCrazy Eights Points: 9526 More actions · Answer 13

Jeff Moden (1/23/2010)
Jeff Moden (1/23/2010)
So, when YOU create an address table, do you have a "second address" line or do you normalize it? Enquiring minds want to know. 🙂
I'd still like an answer to the question above, David.

You are asking how I would implement optional parts of an address within the limitations of a fixed set of numbered address line attributes? If an address has fewer lines than there are attributes then I'd populate the extra lines with zero-length strings. Null wouldn't be appropriate because all the parts of the address are known and present and using null would give undesirable results. For example if you compare all the parts of an address in a join or subquery you ought to expect two identical addresses to be returned as equivalent, but that wouldn't work if you start adding nulls into addresses.

I'm not much of a fan of the "AddressLine" design pattern. There are alternatives. For example the BS7666 address standard we have in the UK. Even a single attribute for the whole address (except postal code) can make more sense than having numbered lines in an address table.

Jeff Moden SSC Guru Points: 1003894 More actions · Answer 14

I was hoping that you'd say that. I agree. All parts of the address are known is it is known that the 2nd address line has a value of nothing rather than being unknown which is what a NULL would indicate.

In an even more practical sense, the use of NULL's for "end dates" can also be even more bothersome and I'll use the proverbial "end of time" date (9999-12-31) instead of a NULL.

The problem with both of those is, or course, duplication of "non data" similar to the "conventional" use of NULL. Some folks could imply that such duplication in those columns I speak of is justification for additional normalization and that may even be true. It just wouldn't be practical in my humble opinion. That's where I thought you guys were going with all of this and I'm relieved that's not the case... Heh... I didn't want to be a renegade for yet another thing in SQL. 🙂

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)