Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««45678»»

First Normal Form Expand / Collapse
Author
Message
Posted Tuesday, July 5, 2011 5:57 AM
SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Today @ 12:34 AM
Points: 446, Visits: 3,327
Tom,

I'm well aware that the relational model and TTM require relations that consist of atomic attributes. What I see as a problem is that your article uses another criterion for 1NF:

Is this or isn't it in 1NF? That depends on what the database is expected to do with it

You define 1NF based on intended usage. This doesn't make any sense to me. Suppose I create a relation variable using my RDBMS and call it T. T happens to have a string attribute that encodes multiple telephone numbers. That's OK because a string is by definition an atomic value. According to your definition T is in 1NF as long as "the database doesn't know anything about that encoding". I don't have any problem with that so far because I don't ever intend to split out the phone numbers in the database. T is in 1NF.

Later on however, I change my mind. I want to split out the phone numbers using some code in my database. As you put it: "There will have to be code in queries to decode that complex string". Now you say that T is not in 1NF. But it's the same relation variable as before! The atomicity of the string and the structure of the relation has not changed. Just because I intend to use it differently T cannot suddenly stop being a relation. So your suggested criteria is an additional requirement for 1NF over and above the requirement for T to be a relation - a relation being a thing which is defined by structure alone and never by intended usage. According to you, the relation variable T can change from moment to moment from being in 1NF to not being in 1NF depending on what code exists in my database.

The relations which according to your definition aren't in 1NF could also still satisfy higher normal forms like BCNF or 5NF even though you say they violate 1NF. If I've misunderstood your criterion for deciding 1NF then I think you need to clarify this point. To me it is fundamental that 1NF means nothing more than the definition of a relation, which is simply a data structure in a database. 1NF does not depend in any way on the intended usage of the data but purely on its structure.

Tom.Thomson (7/1/2011)
If NULL is a type of value (which is certainly the domain-theoretic interpretation of it)

However Codd says that Null marks are not values and Date and pretty much everyone else agrees. Codd never used the the domain theory definition of a domain in any work of his that I'm aware of. In fact he specifically requires that null marks be independent of domain - by which I understand him to mean they are orthogonal to the concept of a domain. If you are putting forward a different definition of nulls and null behaviour (as you have done in another SSC thread) then you ought to make it clear that your 1NF definition isn't generally applicable to other systems using different things called "nulls".


David
Post #1136400
Posted Tuesday, July 5, 2011 7:23 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Yesterday @ 10:26 AM
Points: 11,194, Visits: 11,136
David Portas (7/5/2011)
...But it's the same relation variable as before! The atomicity of the string and the structure of the relation has not changed. Just because I intend to use it differently T cannot suddenly stop being a relation...

Is it just me, or does this sound like a quantum theory of normal forms?




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #1136454
Posted Tuesday, July 5, 2011 9:03 AM
Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Thursday, January 31, 2013 8:01 AM
Points: 1,232, Visits: 1,046
SQLkiwi (7/5/2011)
David Portas (7/5/2011)
...But it's the same relation variable as before! The atomicity of the string and the structure of the relation has not changed. Just because I intend to use it differently T cannot suddenly stop being a relation...

Is it just me, or does this sound like a quantum theory of normal forms?


To me It reads like two or more people are re-writing/debating the original RDBMS theory material. Trying to fit things Chris Date or Ted Codd wrote about to there own personal thoughts or ideas. I am not a big fan. People come here to learn. They should not be given the personal thoughts abouts someone elses ideas as a fact. That is misinformation.

An actual Quote from Chris Date: "There was a general feeling at the time that research belonged to everyone. It was public. People at IBM didn’t like it, but all the papers were available. Ellison felt it was obvious that Codd was onto something, so he built a clone of what Ted outlined"

If you want to keep writing about what you "think" Codd & Date meant please read some of the things they wrote yourself before posting. Try to stay away from what Oracle or IBM re-wrote later. Post some live links to that factual information along with your thoughts.
Since we have several examples in this thread of how not to do that. Here are some examples of how to do it:

For anyone that want's to read what Ted Codd wrote about NULL (SQL):
http://en.wikipedia.org/wiki/Null_(SQL) I know it is wikipedia, but so what if it is correct.

For anyone that want's to read what Codd & Date wrote about 1NF:
http://www2.yk.psu.edu/~lxn/IST_210/normal_form_definitions.html
FYI: The main definition for 1NF in the link above has the word DOMAIN in it. This is how Ted and later Chris both saw it.

FYI: Chris Date did not work with Ted Codd on the original definition of NULL. Chris came into the picture later.

It is the RDBMS Ternary Value theory Codd introduced that most people seem to confuse the most. Codd never said NULL = Unknown.
The Unknown truth value is generated whenever Null is compared with any data value, or with another Null.


Codd saw the need for differant levels of NULL in a RDMS to store and return the lack of a value properly. In 1990 Codd wrote in The Relational Model for Database Management version 2
"The single Null mandated by the SQL standard is inadequate, and should be replaced by two separate Null-type markers to indicate the reason why data is missing. These two Null-type markers are commonly referred to as 'A-Values' and 'I-Values', representing 'Missing But Applicable' and 'Missing But Inapplicable', respectively."


NULL, just like the mostly empty void we call the universe, is a large subject.
Post #1136548
Posted Tuesday, July 5, 2011 4:23 PM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Wednesday, August 13, 2014 4:58 AM
Points: 2,467, Visits: 6,437
SQLkiwi (7/4/2011)
James Goodwin (7/1/2011)
It would be nice if there was a more clean SQL syntax than (X is NULL OR X <> 'Value'), but that is implementation, not theory.

There is an alternative to that, which I use quite a lot:
http://sqlblog.com/blogs/paul_white/archive/2011/06/22/undocumented-query-plans-equality-comparisons.aspx


Nice blog post, Paul. Never thought of trying that route. One more trick in my bag, thanks!


--------------------------------------------------------------------------
A little knowledge is a dangerous thing (Alexander Pope)

In order for us to help you as efficiently as possible, please read this before posting (courtesy of Jeff Moden)
Post #1136898
Posted Tuesday, July 5, 2011 5:44 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Yesterday @ 6:46 PM
Points: 36,944, Visits: 31,446
SanDroid (7/5/2011)
To me It reads like two or more people are re-writing/debating the original RDBMS theory material. Trying to fit things Chris Date or Ted Codd wrote about to there own personal thoughts or ideas. I am not a big fan. People come here to learn. They should not be given the personal thoughts abouts someone elses ideas as a fact. That is misinformation.


I agree! SanDroid for President!


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1136943
Posted Thursday, July 7, 2011 6:07 AM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Today @ 4:38 AM
Points: 8,682, Visits: 9,207
SanDroid (7/5/2011)
For anyone that want's to read what Ted Codd wrote about NULL (SQL):
http://en.wikipedia.org/wiki/Null_(SQL) I know it is wikipedia, but so what if it is correct.

It's a pretty good reference. Some trivial errors (for example the two distinct null values were introduced more than a decade before the RM V2 book, not in it.). Most importantly it points out that NULL indicates the absence of a value for the datum, not the truth value "unknown".
For anyone that want's to read what Codd & Date wrote about 1NF:
http://www2.yk.psu.edu/~lxn/IST_210/normal_form_definitions.html

It's certainly not what Codd wrote about normal forms. It may be something like some of what CJD write (I haven't read all his writings), but referring everything to THE primary key instead of to the set of primary keys (usually now called candidate keys) is not at all Codd's style.

It is the RDBMS Ternary Value theory Codd introduced that most people seem to confuse the most. Codd never said NULL = Unknown.
The Unknown truth value is generated whenever Null is compared with any data value, or with another Null.

That's another of those errors - sometimes comparing NULL with NULL delivers True (Paul's clever use of INTERSECT mentioned above explicitly depends on this), and TC stated that it should deliver True in some cases (cases listed in his SIGMOD 1979 paper, and the Dec 79 ToDS version). Wikipedia even mentions these particular comparisons further down the page.

If you read the Dec 79 ToDs paper (it's available somewhere online without requiring an ACM library subscription, but I'm not telling you where because that would be encouraging copyright violation - as well as encouraging TC's view that the research belongs to everyone ) you will see that TC had introduced the two NULLs and 4VL system before then, but thought it reasonable to merge the NULLs and stick to 3VL for many purposes. If you read the 1990 book, in particlular the rules for replacing NULLs with actual values (you can download thbe book from the ACM library for $15, if I recall correctly) you'll see that the i and a NULLs aren't actually "inapplicable" and "applicable" since that would have precluded convertions between them.


Tom
Post #1138028
Posted Thursday, July 14, 2011 3:54 AM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Today @ 4:38 AM
Points: 8,682, Visits: 9,207
David Portas (7/5/2011)
Tom,

I'm well aware that the relational model and TTM require relations that consist of atomic attributes. What I see as a problem is that your article uses another criterion for 1NF:

Is this or isn't it in 1NF? That depends on what the database is expected to do with it

You define 1NF based on intended usage. This doesn't make any sense to me. Suppose I create a relation variable using my RDBMS and call it T. T happens to have a string attribute that encodes multiple telephone numbers. That's OK because a string is by definition an atomic value. According to your definition T is in 1NF as long as "the database doesn't know anything about that encoding". I don't have any problem with that so far because I don't ever intend to split out the phone numbers in the database. T is in 1NF.

Later on however, I change my mind. I want to split out the phone numbers using some code in my database. As you put it: "There will have to be code in queries to decode that complex string". Now you say that T is not in 1NF. But it's the same relation variable as before! The atomicity of the string and the structure of the relation has not changed. Just because I intend to use it differently T cannot suddenly stop being a relation. So your suggested criteria is an additional requirement for 1NF over and above the requirement for T to be a relation - a relation being a thing which is defined by structure alone and never by intended usage. According to you, the relation variable T can change from moment to moment from being in 1NF to not being in 1NF depending on what code exists in my database.

I think you are missing the point -it's not about precluding the database from holding certain objects, it's about ensuring that the objects it holds have so far as the database is concerned only types which are directly supported by the relational calculus or algebra language made available by the database system. Not an issue of abstruse theory, a really simple and straightforwards practical issue.

It's not the decision to have the object in the database that breaks 1NF - it's the decision to have the object in the database and require it to be manipulated by the database language (relational algebra or calculus with the interfaces of supported types) in a manner that requires understanding of its structure without including support for its type in the database language. And don't try to tell me that the type of a list of telephone numbers is string - because correct typing precludes nonsense like allowing 'XvMZQ54(w@^%' as list of telephone numbers.

If you modify the database system to recognise a type "phone number list" and add the operations that implement this type (append number to list, delete number from list, is number in list, is list empty, create empty list, merge two lists, and so on) then you are still in 1NF. If instead to decide to write those longhand in the database language using the primitives for operating on lists of characters (strings) provided by your database then you are not in 1NF because the object is not atomic in the database.


Tom
Post #1141586
Posted Friday, July 15, 2011 12:37 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Thursday, December 12, 2013 10:21 AM
Points: 16, Visits: 78
Tom,

Firstly, great article. I actually stumbled onto it after reading your 2NF article via the link back here, and wanted to start from the beginning. Glad I did

After reading, I was like "Wow... Good times..."

Then, after reading the comments, I was like "Aww... Bad Times..."

I will echo others in saying: Good job defending you stance. The scrutiny has been, at times, excessive (I thought), but I think you really made some great points both in your article and your replies. Honestly, coming from an App Dev background into this realm of DBAs, the article definitely helped put the concept of normal forms under a more focused lens for me (all theoretical nuance squabbling aside).

I also liked how you addressed the concept of context as it relates to "atomicity". So many data elements that we take for granted as being "simple" in nature rarely ever are (one nit that went unpicked: Full Name... atomic? yes/no/may-be-so), and are in fact themselves hierarchical arrays of data carried over from "real life" attempts at bringing a measure of order to The Chaos. I definitely think it was an important point to raise. I think using terms like "atomic" and "natural" in all this lingo tends to get us thinking to abstract and we forget that we're not actually talking about the indestructible, elemental entities of the cosmos. These data elements are, more often than not, just a bunch of non-sense concocted by a band of numb-skulls (and I do not exclude myself from being a subset of that domain)

I think you touched on a great point, which really drives it home for me:

Tom.Thomson (6/30/2011)

...a structure that is not future-proof is not a good structure. When considering what the business rules are when determining what is 2NF, 3NF, EKNF, BCNG or 4NF one should consider what changes in business rules may happen next year, or the year after, or even later. For 1NF one should consider what someone might produce as a new requirement on the database in the future, as a new requirement may make a list-valued attribute into a violation of 1NF.


And as most users here probably have come to realize, being cogs in the corporate machine, this one principle definitely holds true in the business world: "The only constant is change". Ultimately, this is a good thing overall, I think. Humans are inherently less than perfect, and therefore our attempts to define and organize our world will be equally so. Our propensity for constant change gives us the ability to adapt to the unforeseen eventualities that will inevitably come due to our insufficient understanding of the world we seek to explain.

This means that anything we build must facilitate (or at least account for), and not restrict this element of change. A system that is too rigid will not hold up over time.

All this puts me in a mind to do what I love to do most in these situations (as some may have come to realize from my previous posts)... Over-simplify a vast and complex issue by distilling it down to a quaint movie reference!

I call it "The Matrix Paradox" (I probably read it somewhere, so I'm sure I'm not the only one who calls it that... ) ...

Premise: Humans are imprecise, inaccurate, and to some degree unpredictable. For that reason, they are incapable of existing within a "perfect" system, because it would be too rigid, leaving no tolerance for error (which humans generate with great proficiency).

Dilemma:
- For the system to function properly it should be perfect
> (i.e. - in perfect adherence with "the rules". Note: Rules created by who? Exactly.)
- If the system is perfect it doesn't work

Summary:
As The Architect put it, "The problem ... is choice." Or to put it a different way (and I think more accurately): The problem is human error. Because in the world of 1 and 0, true and false, right and wrong... Choice is an illusion, and in the end you are either right or wrong. In adherence or in error.



Train of Thought... derailed!
Post #1142714
Posted Friday, July 15, 2011 7:06 PM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Today @ 4:38 AM
Points: 8,682, Visits: 9,207
pjcrout (7/15/2011)
Tom,

Firstly, great article. I actually stumbled onto it after reading your 2NF article via the link back here, and wanted to start from the beginning. Glad I did

After reading, I was like "Wow... Good times..."

Thanks for that - every comment like that encourages me to keep on writing.

Then, after reading the comments, I was like "Aww... Bad Times..."

I will echo others in saying: Good job defending you stance. The scrutiny has been, at times, excessive (I thought), but I think you really made some great points both in your article and your replies. Honestly, coming from an App Dev background into this realm of DBAs, the article definitely helped put the concept of normal forms under a more focused lens for me (all theoretical nuance squabbling aside).

I don't think anything was actually excessive - it's all fair comment and debate. I'm glad though to hear that you thought I did a good job in defending my position. I came from an mathematician and app developer background into database research, and after a lot of other things (largely not DB related) into doing the DBA thing as well as much else (a couple of "you name it, I do it" jobs), which I think is very like coming from a developer background into the DBA realm so we probably have a lot of ideas in common. Anyway, I am trying to write for practical DBAs who got into that role accidentally (as I did) and not for the theorists. I do find it an advantage to know some of the theoretical nuances too, though - it save a lot of trouble when designing a schema.

I also liked how you addressed the concept of context as it relates to "atomicity". So many data elements that we take for granted as being "simple" in nature rarely ever are (one nit that went unpicked: Full Name... atomic? yes/no/may-be-so), and are in fact themselves hierarchical arrays of data carried over from "real life" attempts at bringing a measure of order to The Chaos. I definitely think it was an important point to raise. I think using terms like "atomic" and "natural" in all this lingo tends to get us thinking to abstract and we forget that we're not actually talking about the indestructible, elemental entities of the cosmos.

Ted Codd, who introduced the atomicity idea into relational database theory (which he invented, of course) didn't use the term "atomic" in his early writingss. Instead he said "simple". I like that term (indeed I prefer it to atomic, because atomic doesn't properly reflect his insistence - which I absolutely believe was correct - on sticking to a first order relational calculus) but I don't use it because sticking to "atomic" usually gets me into fewer silly arguments (I'm very lazy, so avoiding pointless arguments is one of my prime objectives).

These data elements are, more often than not, just a bunch of non-sense concocted by a band of numb-skulls (and I do not exclude myself from being a subset of that domain)
Join the club - I'm definitely a member of that domain

Premise: Humans are imprecise, inaccurate, and to some degree unpredictable. For that reason, they are incapable of existing within a "perfect" system, because it would be too rigid, leaving no tolerance for error (which humans generate with great proficiency).

Dilemma:
- For the system to function properly it should be perfect
> (i.e. - in perfect adherence with "the rules". Note: Rules created by who? Exactly.)
- If the system is perfect it doesn't work

Summary:
As The Architect put it, "The problem ... is choice." Or to put it a different way (and I think more accurately): The problem is human error. Because in the world of 1 and 0, true and false, right and wrong... Choice is an illusion, and in the end you are either right or wrong. In adherence or in error.

That only works if logic is restricted to black and white (true and false, no other possibilities). Constructive mathematicians work in a world where there are infinitely many shades of grey. Even non-constructivists are happy to use all sorts of MVLs instead of classical two valued logic (the difference between them and constructivists is that the constructivists refuse to use the 2VL black-and-white logic, because they (or we - I'm not admitting to be on either side of that debate) believe it just doesn't work). Relational Database theoreticians have a similar division - anti-MVL and pro-MVL; it's kind of sad that in the Relational Database world they don't respect each other's positions, unlike the situation in mathematics.


Tom
Post #1142881
Posted Saturday, July 16, 2011 2:00 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Thursday, December 12, 2013 10:21 AM
Points: 16, Visits: 78
Tom.Thomson (7/15/2011)

That only works if logic is restricted to black and white (true and false, no other possibilities). Constructive mathematicians work in a world where there are infinitely many shades of grey. Even non-constructivists are happy to use all sorts of MVLs instead of classical two valued logic (the difference between them and constructivists is that the constructivists refuse to use the 2VL black-and-white logic, because they (or we - I'm not admitting to be on either side of that debate) believe it just doesn't work). Relational Database theoreticians have a similar division - anti-MVL and pro-MVL; it's kind of sad that in the Relational Database world they don't respect each other's positions, unlike the situation in mathematics.


Fair enough I can appreciate the "shades of grey" stance. For practical purposes, a tolerance for "shades of grey" is generally the more flexible approach, and more in line with "common sense" and real life practices, etc... as it seems we can rarely ever make a clear distinction about real world affairs that we might then attempt to model. I guess the question there though would be: Is it because there is no distinction, or is it because we just can't see it?

For the sake of debate (which I love) let me play devil's advocate and posit these notions and see how far down the rabbit hole we can go:
- Anything outside and/or BOTTOM/TOP of the domain of "true or false" is either inapplicable to true or false or unknown/unknowable as true or false and therefore cannot be evaluated as such. In this example, the other values outside true and false, or the grey, is akin to the mind filling in the blanks of an incomplete picture, and not actually an implication that a BOTTOM or TOP actually exists or that there is actually anything outside that domain. The grey here is only an approximation on the unknown, unknowable, or the seemingly inapplicable. Once known, knowable, or applicable though... it becomes either true or false. That is a limitation on "the knower", not true/false.
- As for any known "value" that is seemingly "grey", seemingly not quite true or false - as opposed to chalking this up to a multi-valued system of logic, one could instead attribute this to a layer of complexity (contrived or otherwise) on top of a fundamental true/false framework, but still essentially a subset of that framework. That's not to say there's anything wrong with going for complexity. The world can be a pretty complex place, so having a complex system of logic can help with understanding interactions at higher orders of magnitude, without the need to evaluate each element individually (i.e. - When counting on your fingers, the finger is only ever up or down, either there is/true/up or there is not/false/down, but since we only have a limited number of fingers to work with... decimal system).
- Any known value perceived to be somewhere either in between true and false or other than, and not one or the other, is actually not one value (simple/atomic), but only represented as one value as a result of higher levels of complexity on top of the fundamental. At it's core though, it is a collection of true and false values, which if decomposed would yield only the two values, regardless of how infinitely or infinitesimally true or false they might be.

I'm sure I have violated one or more mathematical principles somewhere along the way, so feel free to school me I'm more of a philosopher than a mathematician any way

This is what happens when I can't sleep...
Post #1142911
« Prev Topic | Next Topic »

Add to briefcase «««45678»»

Permissions Expand / Collapse