Normalization

  • (attached image: a book table with columns ISBN, Title, Author_LastName, Author_FirstName, Publisher, Date, Edition, and Media)

    1. Does the above table have a partial dependency between Title and the author name columns?
    2. If yes, would there also be a transitive dependency between Title and Publisher? Please help.
  • Yes.  Author and Publisher do not belong in the table shown.  I strongly doubt Edition or Media belongs there either.

    I'm not sure about Date.  I'm not sure how ISBN is handled for subsequent editions of the same title.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Then what should I do?

    Fully Functional Dependency:
    ISBN --> Title, Publisher, Date, Edition, Media

    Partial Dependency:
    Title --> Author_LastName, Author_FirstName

    Transitive Dependency:
    Title --> Publisher

    Is that correct?
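
    For illustration only, here is a minimal T-SQL sketch of what removing the Publisher dependency from the book table might look like. The table and column names (Publisher, PublisherID, Book, PubYear) are hypothetical, not a definitive design:

    -- Publisher moved to its own table; Book keeps only attributes that depend on ISBN alone.
    CREATE TABLE Publisher (
        PublisherID   int          NOT NULL PRIMARY KEY,
        PublisherName varchar(100) NOT NULL UNIQUE
    );

    CREATE TABLE Book (
        ISBN        varchar(13)  NOT NULL PRIMARY KEY,
        Title       varchar(200) NOT NULL,
        PublisherID int          NOT NULL REFERENCES Publisher (PublisherID),
        PubYear     smallint     NULL,  -- the original Date column holds only a year
        Edition     varchar(20)  NULL,
        Media       varchar(20)  NULL
    );

    With this split, a change to a publisher's details is a single-row update rather than one update per book.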

  • ScottPletcher - Monday, April 23, 2018 4:34 PM

    ISBN is unique for each edition of the same book.

  • Please tell me the partial and transitive dependencies in it.
    Can we write Title --> Publisher?
    What about an Author entity?

  • Lynn Pettis - Monday, April 23, 2018 8:11 PM

    Although, there are two different ISBN values for each edition. Probably not applicable here, but weird little fact.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • DBA Live - Monday, April 23, 2018 4:30 PM

    I gave up trying to be so formal about the nature of data. ISBN numbers are unique, so while a fully normalized table would contain pointers to Author and Publisher tables, calling something a "partial dependency" or a "transitive dependency" just confuses most people. Every ISBN has an Author and a Publisher, even if those entities turn out to be one and the same, so I fail to see anything "partial" here, from my point of view. However, my point of view doesn't always agree with the alleged science. Practicality and performance are just too important to ignore for the sake of normalization. Things that can complicate matters:

    1. the potential for multiple authors for a title (e.g. for royalty purposes);
    2. the need to link multiple editions together for search purposes;
    3. Date, as shown, is only a year, and usually there are both a year and a month somewhere within the first couple of physical pages of most books; also, the same ISBN can have multiple printings or re-printings, which speaks potentially to Edition, but not necessarily. Thus one might need a column designating which print run it was, which could take you back to expanding the primary key beyond ISBN to include the printing (sketched below);
    4. how about much older books that, for whatever reason, break this model in some way...

    Plenty to think about there, so what may be far more important is how such a table will be used, how often, and by how many users. Such answers are far more useful than some simple but mostly meaningless phrase that speaks to a type of dependency for certain of the data. Data modeling is often too much science and not enough practical experience. Over-normalization is usually the result, which often leads to poor performance and is often wasteful of resources, especially disk space. Such models are also often considerably less flexible, and changes can be hard to make. Consider your options carefully...
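
    To make the print-run point concrete, a hedged sketch under the same assumptions as the earlier Book sketch (all names hypothetical): if the same ISBN can go through several printings, one option is a child table keyed on ISBN plus a print-run number, rather than widening the Book key itself.

    -- One row per printing of a given ISBN; the composite key identifies the print run.
    CREATE TABLE BookPrinting (
        ISBN      varchar(13) NOT NULL REFERENCES Book (ISBN),
        PrintRun  smallint    NOT NULL,   -- 1 = first printing, 2 = second, ...
        PrintDate date        NULL,
        PRIMARY KEY (ISBN, PrintRun)
    );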

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • sgmunson - Tuesday, April 24, 2018 6:15 AM

    You're conflating logical design and physical design.  Normalization is a logical process.  Performance doesn't exist at this stage. 
    Rather, the most important thing is to model the data accurately and completely.  Formality is needed for a truly good design.

    Author can't go in the same table because of 1NF -- a book can have many authors, and repeating values are not allowed in any normalized form.

    During the physical design stage, denormalization is done if/as necessary.  [Btw, just slapping an identity column on every table is not really a "design" --  the single most destructive myth in table design is that "every table should by default be clustered on identity" (yes, that's my pet peeve, because it is so damaging.)]
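
    As a rough illustration of that 1NF point (hypothetical names, and assuming a Book table keyed on ISBN as sketched earlier in the thread), the many-to-many relationship between books and authors would typically go through a linking table rather than author-name columns on the book row:

    CREATE TABLE Author (
        AuthorID         int         NOT NULL PRIMARY KEY,
        Author_LastName  varchar(50) NOT NULL,
        Author_FirstName varchar(50) NOT NULL
    );

    -- One row per (book, author) pairing; the composite key prevents duplicates.
    CREATE TABLE BookAuthor (
        ISBN     varchar(13) NOT NULL REFERENCES Book (ISBN),
        AuthorID int         NOT NULL REFERENCES Author (AuthorID),
        PRIMARY KEY (ISBN, AuthorID)
    );

    -- All authors of a given book (example ISBN value):
    SELECT b.Title, a.Author_LastName, a.Author_FirstName
    FROM Book b
    JOIN BookAuthor ba ON ba.ISBN = b.ISBN
    JOIN Author a ON a.AuthorID = ba.AuthorID
    WHERE b.ISBN = '9781430267430';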

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher - Tuesday, April 24, 2018 7:53 AM

    Yep. And I'll continue to do so until someone can demonstrate a clear benefit to changing my perspective. I haven't had a database design I've created need changing at any time in the last 25 years, so I'll keep doing what works. Formalizing and separating logical and physical design is, for me, a waste of my valuable time. I have to live in the real world, and given most of the designs I keep coming across that seem to focus a tad too intently on normalization, despite there having been ample evidence to support a slight deviation, I continue to be amazed at just how silly they can get, as I end up having to either seriously change the design or jump through query hoops left, right, and sideways. Both are usually costly and time-consuming. And yes, I do have an attitude. It didn't take all that long to develop, either, and in the last quarter century I have yet to see ANY evidence supporting a better alternative. I'll continue to focus solely on the solution.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • sgmunson - Tuesday, April 24, 2018 8:14 AM

    And that's why developers should not "design" databases. Btw, it is impossible in reality to maintain a best-performing db while claiming "I haven't had a database design I've created need changing at any time in the last 25 years", since changes in data types and new performance options alone make that claim unattractive.

    I'm sure your leaving the Author with the ISBN and Title would require a database change (or egregious amounts of data anomalies). But not to worry, that's just "fancy" talk, no need to worry about that; it's just millions more updates while trying to keep overly denormalized data in sync. Well, at least you create new "work" for yourself, albeit all wasted.
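
    As a rough illustration of the sync work being described (table and column names hypothetical): if the publisher name rides along on every book row, renaming a publisher means updating every one of those rows, whereas in a normalized layout it is a single-row update.

    -- Denormalized layout: publisher name repeated on every book row.
    UPDATE dbo.BookFlat                               -- hypothetical single-table design
    SET    Publisher = 'New Publisher Name'
    WHERE  Publisher = 'Old Publisher Name';          -- touches every matching book row

    -- Normalized layout: the name is stored once.
    UPDATE dbo.Publisher
    SET    PublisherName = 'New Publisher Name'
    WHERE  PublisherName = 'Old Publisher Name';      -- touches one row

    Miss any rows in the first case and the data anomalies mentioned above appear.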

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher - Tuesday, April 24, 2018 8:23 AM

    At no point did I either suggest leaving the table as is or advocate any specific changes. Usually that restraint is a good idea when someone wants to know what's transitive and what's partial, because there's too much opportunity for unclear meaning; I'll ask questions instead. In this case, normalization for both author and publisher seems likely to be necessary, but without knowledge of the specific application, it's impractical to guess. I'm not against normalization, and my designs are usually normalized to some degree. What degree is determined by the need and/or applicable usage, including, but not limited to, the volume thereof and response time expectations.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Grant Fritchey - Tuesday, April 24, 2018 5:15 AM

    IIRC, the same edition in different formats will also get different ISBNs. I am going off memory from when I worked at a previous employer 13 years ago and actually had to deal with ISBNs.
    Also, I'm pretty sure those ISBN values in the original post aren't valid ISBNs.

  • Lynn Pettis - Tuesday, April 24, 2018 9:02 AM

    Yep. These are from my book:
    ISBN-13:  978-1430267430  
    ISBN-10:  1430267437

    So, yeah, I'd radically redesign the tables we've been looking at here.
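
    For what it's worth, a hedged sketch (hypothetical names) of one way to carry both forms per edition, given that each format/edition gets its own ISBN:

    CREATE TABLE Edition (
        ISBN13   char(13)     NOT NULL PRIMARY KEY,
        ISBN10   char(10)     NULL UNIQUE,   -- older 10-digit form, where one exists
        Title    varchar(200) NOT NULL,
        Edition  varchar(20)  NULL,
        Media    varchar(20)  NULL           -- same edition in another format gets another ISBN
    );

    -- The two values quoted above, stored without hyphens (title is a placeholder):
    INSERT INTO Edition (ISBN13, ISBN10, Title)
    VALUES ('9781430267430', '1430267437', 'Example Title');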

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • sgmunson - Tuesday, April 24, 2018 8:14 AM

    Yes, Dr. Codd and CJ Date and all the experts who've thought about this for the last 55 years or so -- and those who follow them -- are completely stupid and wrong, and you, and you alone, are the only one who's right. They've just (deliberately?) wasted our time by making us do a logical design.

    After all, as long as the db performs OK when it is first implemented, then it must be a good "design", right? Just slap the data quickly into tables and it will work out just fine. In the real world, that caused massive data issues and rework, which is why the practice changed to thinking a little before leaping. But, obviously, that's not for everyone.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Where's the popcorn? The flames will cook it fast.
