Normalization

  • Nice simple question, and reasonable answer.

    One thing I didn't like in the explanation though was "though a primary key is part of the definition according to Chris Date", as if Date had originated that concept. In fact the introduction of the primary key requirement to the relational model was in Ted Codd's "A Relational Model of Data for Large Shared Data Banks" in CACM 13.6 (June 1970) (and is of course repeated in his RM 2 book). (Of course "primary key" is a concept that's a good deal older than the relational model, but point here is that the idea that a relation has to have a primary key has been a fundamental part of the relational model ever since that paper introduced the relational model to the world).

    Tom

  • calvo (4/27/2011)


    Great thought. It's funny how even a definition of something so basic, like 1NF, can still be fuzzy. If you're going by Chris Date's definition, his fourth rule states "Every row-and-column intersection contains exactly one value from the applicable domain (and nothing else)." Now, are all categories of telephone numbers (mobile, home, work etc.) considered of the same "domain"? Not to mention, adding a second telephone number field would mean one of them could possibly be null. Null fields, to some, is also a violation of 1NF.

    ... I think it's a trick question and I want extra points!

    I love it! :-D:-D:-)

    Tom

  • Its a basic question, but very important question.

    explanation is very good.

    Thanks
    Vinay Kumar
    -----------------------------------------------------------------
    Keep Learning - Keep Growing !!!

  • Steve Jones - SSC Editor (4/27/2011)


    I may be wrong, and I'll admit that. However the "use" of the data in terms of when/how you might call someone is debatable. The data domain itself is a contact method for phones, and in that definition they are definitely repeated elements even if home/cell.

    Well, I would say you are possibly right if we are using Codd's definition of 1NF and definitely wrong if we are using Date's, since Date threw away the "repeating groups" idea and replaced it by the atomicity principle which was originally introduced by Codd as part of the definition of a relation quite separate from the concept of 1NF; in the Date definition you can have as many telephone number columns as you like, you just can't put two telephone numbers in one telephone number column (but you can have a column whose domain is "list of telephone numbers") and you can avoid nulls by using "default values". It was this ditching of the original 1NF and replacing it by something almost unrelated that allowed Date to claim that a table not in 1NF couldn't represent a relation and also led to his claim that his atomicity principle was different from Codd's. (The two models are certainly different in respect of which class of domains attributes can be in: Date effectively allows a higher order relational algebra, where an attribute may have a value which is itself relation; and Codd forbids this - but he forbids it in a rule quite separate from his atomicity principle, so it's not a difference in the atomicity principles that causes this difference in the models.)

    Tom

  • That's why I feel that normalization rules are really "mostly guidelines"

    "Normalization" is a matter of differering orthodoxies, really. I've seen arguments about whether NULLs should be allowed in "normalized data" that actually included personal insults against anyone disagreeing with a certain point of view. I guess my problem is that I don't have strong feelings about normalization one way or the other. I go with what's most practical for the problem I'm trying to solve. And have been, essentially, told I was a heretic for doing so.

    I always become a little concerned when I see comments like this among database professionals (although I woudn't use the word heretic). Too many of us have seen the consequences of the "guidelines" not being followed. Usually these problems can't be fixed except at great cost, so they simply become someone's pain for as long as the database exists. I agree that people like Chris Date who otherwise make great contributions to the field take it too far with the idea that there can't be NULLs, but that doesn't invalidate the rest of it. There is a common sense factor. Technically, for example, Street Address 1 and Street Address 2 should be in a separate table, but practically that's a non-starter, even if for most Street Address 2 will be Null, or more correctly, blank.

  • GSquared (4/27/2011)


    Peter Trast (4/27/2011)


    GSquared (4/27/2011)


    Richard Warr (4/27/2011)


    Nice question and an interesting example. Would renaming the columns "Telephone 1" and "Telephone 2" to "Home Phone" and "Mobile Phone" mean that the table was then in 1NF?

    Only if a person only ever has one home number and one mobile number. I had to carry two cell phones for a while, since I had to have a company one for "off-hours support calls", and wasn't allowed to use it for any personal calls. So I had mine, and theirs. Which would you list in that column?

    Easy. Home, PersonalCell, WorkCell 😀

    And each has a different purpose.

    It is the same argument as the name normalization discussion.

    OK, here is the fix for hardcore 1NF fans. One column, long text datatype (pick your fav) like concatenating an entire name if there is no use for breaking a name down into it's parts.

    H7605551212P7605551313W7605551212D7605551515 (H for home, P for personal cell, W for work cell, D for desk, etc.)

    Just concatenate them all into one column with letters to identify each portion of the string and an area code of 000 would let you know they do not have such a phone

    H7605551212P0000000000W7605551212D7605551515

    Just make sure the string is long enough to include every possible phone...

    But the business requirement almost ALWAYS has a need for breaking this data down. That is why we talk about normalization, right?

    And let me know if any of you adopt this new column....lol... I would love to see your dialing rules to handle that string of data... :w00t:

    Woot! I love it! 😀

    Actually, because of multiple phone possibilities, I usually break it down into a Phones table, with a type column. I've even gone so far as to set up a many-to-many join to an Entities table (similar to a People table, but allows for companies and other legal "entities" that aren't individual human beings), since in some/many/most cases, one phone can reach multiple people. Other than the many-to-many piece, a vertical table is pretty usual. If it includes a country code, it can internationalize pretty easily too.

    But I really like yours because it just hits my sense of humor the right way, and, sometimes, that's the most important aspect to writing code! :w00t:

    Unfortunatly, I am seeing this practice more and more, except it's XML. Create a table, give it a name and an identity as an artficial key, and create an field that contains XML.

    The developers can put anything they want in it!

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • Peter Trast (4/27/2011)


    OK, here is the fix for hardcore 1NF fans. One column, long text datatype (pick your fav) like concatenating an entire name if there is no use for breaking a name down into it's parts.

    H7605551212P7605551313W7605551212D7605551515 (H for home, P for personal cell, W for work cell, D for desk, etc.)

    Just concatenate them all into one column with letters to identify each portion of the string and an area code of 000 would let you know they do not have such a phone

    H7605551212P0000000000W7605551212D7605551515

    Just make sure the string is long enough to include every possible phone...

    Oh no! you've introduced an unneccessary NULL (spelt "000...."). If they don't have a personal cell, leave out the P and its nonexistent number - why have a symbol for absent data when you can easily just leave it out?

    And you do realise that if this string were VARCHAR(MAX) you would be restricting each person to 246 telephones? Is that enough? Surely we should use NVARCHAR instead and allow up to 65526 phones per person? Or are those numbers 244 and 65524 - can we include # and * in phone numbers?

    :w00t::w00t::crazy::hehe:

    Tom

  • GSquared (4/27/2011)


    I guess my problem is that I don't have strong feelings about normalization one way or the other. I go with what's most practical for the problem I'm trying to solve. And have been, essentially, told I was a heretic for doing so.

    I know the feeling. I guess you're my kind of heretic (an engineer).

    Tom

  • Michael L John (4/28/2011)


    Unfortunatly, I am seeing this practice more and more, except it's XML. Create a table, give it a name and an identity as an artficial key, and create an field that contains XML.

    The developers can put anything they want in it!

    Yes, with XML attributes and lots of nioce cursors it has become quite easy to write C++ programs in T-SQL and that's what many developers want to do. Of course they probably have fewer bugs than C++ programs written in C++ :hehe:

    Tom

  • Oh how I love (fast and easy) denormalized data; just write your code well enough to keep the tables in sync.

    That is asking a lot. In a large project, something will slip through no matter how careful you try to be.

  • RonKyle (4/29/2011)


    Oh how I love (fast and easy) denormalized data; just write your code well enough to keep the tables in sync.

    That is asking a lot. In a large project, something will slip through no matter how careful you try to be.

    Even in a small project, you are likely to write some bugs if you have more that a very small amount of stuff that's not in 3NF. And since devising tests to detect errors that arise from failing to normalise is usually more difficult than doing the normalisation would have been, it's unlikely that you will actuually detect these bugs until they get up and bite you.

    Denormalising below 3NF is a dangerous step that has to be very carefully evaluated in every case to make sure you are not buying yourself more problems than you are solving.

    Tom

  • Tom.Thomson (4/28/2011)


    Peter Trast (4/27/2011)


    OK, here is the fix for hardcore 1NF fans. One column, long text datatype (pick your fav) like concatenating an entire name if there is no use for breaking a name down into it's parts.

    H7605551212P7605551313W7605551212D7605551515 (H for home, P for personal cell, W for work cell, D for desk, etc.)

    Just concatenate them all into one column with letters to identify each portion of the string and an area code of 000 would let you know they do not have such a phone

    H7605551212P0000000000W7605551212D7605551515

    Just make sure the string is long enough to include every possible phone...

    Oh no! you've introduced an unneccessary NULL (spelt "000...."). If they don't have a personal cell, leave out the P and its nonexistent number - why have a symbol for absent data when you can easily just leave it out?

    And you do realise that if this string were VARCHAR(MAX) you would be restricting each person to 246 telephones? Is that enough? Surely we should use NVARCHAR instead and allow up to 65526 phones per person? Or are those numbers 244 and 65524 - can we include # and * in phone numbers?

    :w00t::w00t::crazy::hehe:

    LOL... I can't believe you are carrying my example to it's inevitable conclusion and working out the details... Makes me wonder if something like this is already being used or MIGHT be used. I almost feel bad for suggesting it now 🙂

    Peter Trast
    Microsoft Certified ...(insert many literal strings here)
    Microsoft Design Architect with Alexander Open Systems

  • Tom.Thomson (4/28/2011)


    Peter Trast (4/27/2011)


    OK, here is the fix for hardcore 1NF fans. One column, long text datatype (pick your fav) like concatenating an entire name if there is no use for breaking a name down into it's parts.

    H7605551212P7605551313W7605551212D7605551515 (H for home, P for personal cell, W for work cell, D for desk, etc.)

    Just concatenate them all into one column with letters to identify each portion of the string and an area code of 000 would let you know they do not have such a phone

    H7605551212P0000000000W7605551212D7605551515

    Just make sure the string is long enough to include every possible phone...

    And you do realise that if this string were VARCHAR(MAX) you would be restricting each person to 246 telephones? Is that enough? Surely we should use NVARCHAR instead and allow up to 65526 phones per person? Or are those numbers 244 and 65524 - can we include # and * in phone numbers?

    Um, how do you figure that you can only fit 246 phone numbers in a 2GB VARCHAR(MAX) column? If anything you can hold exactly half as much in a NVARCHAR(MAX) column. (It is still limited to 2GB but each character takes two bytes to store.)

    In any case you should be able to fit 195,225,786 phone numbers in a VARCHAR(MAX) column using that format, or 97,612,893 in a NVARCHAR(MAX) column.

  • UMG Developer (4/29/2011)


    Tom.Thomson (4/28/2011)


    Peter Trast (4/27/2011)


    OK, here is the fix for hardcore 1NF fans. One column, long text datatype (pick your fav) like concatenating an entire name if there is no use for breaking a name down into it's parts.

    H7605551212P7605551313W7605551212D7605551515 (H for home, P for personal cell, W for work cell, D for desk, etc.)

    Just concatenate them all into one column with letters to identify each portion of the string and an area code of 000 would let you know they do not have such a phone

    H7605551212P0000000000W7605551212D7605551515

    Just make sure the string is long enough to include every possible phone...

    And you do realise that if this string were VARCHAR(MAX) you would be restricting each person to 246 telephones? Is that enough? Surely we should use NVARCHAR instead and allow up to 65526 phones per person? Or are those numbers 244 and 65524 - can we include # and * in phone numbers?

    Um, how do you figure that you can only fit 246 phone numbers in a 2GB VARCHAR(MAX) column? If anything you can hold exactly half as much in a NVARCHAR(MAX) column. (It is still limited to 2GB but each character takes two bytes to store.)

    In any case you should be able to fit 195,225,786 phone numbers in a VARCHAR(MAX) column using that format, or 97,612,893 in a NVARCHAR(MAX) column.

    Well, I don't see any smileys so I guess I'd better answer that assuming you were being serious. 😉

    Each phone number is introduced by a 1 character code, and all these codes have to be different or it will be impossible to extract the single required phone number when you come to parse the string. (If you think I can get 195000000 plus a few single character codes using 8 bit characters you must think two to the power eight is a lot bigger than I think it is :hehe:.) Then from the 256 that gives me I have to subtract 10 because the characters 0,1,2,...9 can not be used as codes because I can't safely assume all numbers are the same length (if in a particular application I can guarantee that all numbers are the same length, and I know what that length is, I get 256 instead of 246; but so what - it' still a lot less than 195000000). I am surprised that you didn't notice (a) that the numbers dropped by 2 if we allowed # and * to be dialable characters (two more characters that couldn't be used to as code to indicate which of the person's phones this number addresses) and (b) that nvarchar allowed more numbers than varchar not fewer - not half as many but about 266 times as many, because both of those should make it blindingly obvious that the numbers are not concerned with the total length of the string, and how big varchar(max) can be is irrelevant.

    Tom

  • Tom.Thomson (4/29/2011)


    Well, I don't see any smileys so I guess I'd better answer that assuming you were being serious. 😉

    Each phone number is introduced by a 1 character code, and all these codes have to be different or it will be impossible to extract the single required phone number when you come to parse the string. (If you think I can get 195000000 plus a few single character codes using 8 bit characters you must think two to the power eight is a lot bigger than I think it is :hehe:.) Then from the 256 that gives me I have to subtract 10 because the characters 0,1,2,...9 can not be used as codes because I can't safely assume all numbers are the same length (if in a particular application I can guarantee that all numbers are the same length, and I know what that length is, I get 256 instead of 246; but so what - it' still a lot less than 195000000). I am surprised that you didn't notice (a) that the numbers dropped by 2 if we allowed # and * to be dialable characters (two more characters that couldn't be used to as code to indicate which of the person's phones this number addresses) and (b) that nvarchar allowed more numbers than varchar not fewer - not half as many but about 266 times as many, because both of those should make it blindingly obvious that the numbers are not concerned with the total length of the string, and how big varchar(max) can be is irrelevant.

    Duh, I was missing the obvious issue by paying more attention to the MAX portion. You are right, there would be no point in declaring it as more than VARCHAR(2816) if you only want to store one phone number of each type. But there is no reason I couldn't have multiple home, cell, fax, etc. numbers listed as long as they don't need to be identified separately. (The position in the string could identify the order to call them.)

Viewing 15 posts - 31 through 45 (of 48 total)

You must be logged in to reply to this topic. Login to reply