Second Normal Form
Posted Thursday, July 14, 2011 7:55 AM
SSC Rookie


Group: General Forum Members
Last Login: Wednesday, September 17, 2014 3:05 AM
Points: 40, Visits: 271
Great article; I didn't read any previous articles before this (assuming you started at 1). It should be noted that with any production DB system, even one that seems simple, changing it to conform to DB normalization rules will create a lot of other work: queries against the DB will need fixing, reports will break, space will increase, etc.

Post #1141781
Posted Thursday, July 14, 2011 9:40 AM


SSCertifiable


Group: General Forum Members
Last Login: Today @ 9:37 AM
Points: 7,105, Visits: 15,456
Smendle (7/14/2011)
Great article; I didn't read any previous articles before this (assuming you started at 1). It should be noted that with any production DB system, even one that seems simple, changing it to conform to DB normalization rules will create a lot of other work: queries against the DB will need fixing, reports will break, space will increase, etc.



True. However, to play devil's advocate:
- the report is already "broken" since it (potentially) returns bad data.
- space usage would likely DECREASE, since you're not duplicating all of the employee-dependent data.

It's definitely a non-trivial task to do this after the fact, but there is a gain to be found in there.
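A back-of-envelope illustration with invented numbers (mine, not the article's): if 500 employees average 20 assignment rows each, and every row repeats 100 bytes of employee-dependent data, the unnormalised design stores 500 × 20 × 100 = 1,000,000 bytes of that data; the normalised design stores it once per employee (500 × 100 = 50,000 bytes) plus a small key per assignment row (say 500 × 20 × 4 = 40,000 bytes).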


----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user... All right - what was my emergency again?
Post #1141870
Posted Thursday, July 14, 2011 10:34 AM


SSCertifiable


Group: General Forum Members
Last Login: Today @ 9:45 AM
Points: 7,804, Visits: 9,556
Smendle (7/14/2011)
Great article; I didn't read any previous articles before this (assuming you started at 1). It should be noted that with any production DB system, even one that seems simple, changing it to conform to DB normalization rules will create a lot of other work: queries against the DB will need fixing, reports will break, space will increase, etc.


That's why you should try to do all the normalisation while designing the schema, before writing code for queries and reports.

Personally I recommend that when you have a rough schema design you list all the business rules that you would like the schema to enforce for you without requiring code to do it, express those rules as functional dependencies where you can do so, and then use Bernstein's algorithm to derive a schema which conforms to 3NF (actually it will conform to EKNF, which is better than 3NF, even though Bernstein was only targeting 3NF when he designed the algorithm).

Check and make sure that all business rules are now covered by key constraints. This will involve adding NOT NULL constraints, UNIQUE constraints, and foreign key constraints as needed; it may also involve adding some auxiliary tables if any of the weird cases for which BCNF was intended exist. That is usually a better bet than actually going to BCNF, since in those cases BCNF can't enforce all the functional dependencies, while EKNF can if you add the auxiliary tables.

Then look at any business rules which couldn't be expressed as functional dependencies. If you didn't have to add auxiliary tables to enforce FDs without going to BCNF, you can normalise to 4NF to get a schema which enforces any business rules expressed as multi-valued dependencies as well as those expressed as functional dependencies. Or, if you are lucky, you may find that the multi-valued dependencies have no impact on the bits of the schema that needed the auxiliary tables, so you can enforce them by taking the other parts of the schema to 4NF. (There is no easy way like Bernstein's algorithm to get to 4NF - or even to BCNF - so this is a bit of a slog.) At worst you will end up with an EKNF schema with some bits in 4NF and some bits not, and there the only advice I can offer is to examine each unenforced multi-valued dependency very carefully and decide whether to enforce it (taking another part of the schema to 4NF but losing some enforcement of simple FDs) or not. Either way you will need code to enforce the business rules that the key constraints can't enforce: either a good stored procedure-based interface, with no direct access from the application to tables and views, only to SPs, or a set of triggers. (Obviously check constraints can help sometimes, but they can't do the whole job.)

Then look and see if any of your tables have any non-trivial join dependencies, and if so whether they are implied by the candidate keys. If there are any not implied by the candidate keys, you may want to split those tables (i.e. go to 5NF) - but as with the MVDs, there can be a trade-off here, since the split may lose enforcement of FDs (and you may find that the number of tables involved in joins is getting too big, potentially making the code hard to understand).

If you want to have a lot of time-dependent information - lots of things have values associated with a period in time, and you want to retain all the history - you may want to eliminate all non-trivial dependencies (go to 6NF), but I can't offer any advice there; it's not within my field of competence.
Anyway, once you've done all that hard work of schema design, always being sure to allow as much flexibility as possible since the business rules may change next week (or next year, or whenever), you will find it's enormously easier to write simple and bug-free code than if you hadn't done it. If you just throw something together without any thought for normalisation you will find that you get some very complex code, probably find it hard to maintain and hard to enhance, and discover that you need to normalise to survive - but it is now much harder than it was up front.
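To make the "business rules as key constraints" step concrete, here is a minimal T-SQL sketch; the tables, columns and rules are invented for illustration, not taken from the article:

-- Rule: every pay grade used must be a defined grade (FK below),
-- and each grade is defined exactly once (PK).
CREATE TABLE PayGrade (
    PayGrade  int   NOT NULL PRIMARY KEY,
    MinSalary money NOT NULL,
    MaxSalary money NOT NULL
);

-- Rule: EmployeeID -> NationalID, PayGrade; NationalID is a second
-- candidate key, hence the UNIQUE constraint.
CREATE TABLE Employee (
    EmployeeID int         NOT NULL PRIMARY KEY,
    NationalID varchar(20) NOT NULL UNIQUE,
    PayGrade   int         NOT NULL REFERENCES PayGrade (PayGrade)
);

-- Rule: an employee appears on a project at most once; the composite key
-- enforces the FD (EmployeeID, ProjectID) -> HoursPerWeek.
CREATE TABLE Assignment (
    EmployeeID   int NOT NULL REFERENCES Employee (EmployeeID),
    ProjectID    int NOT NULL,
    HoursPerWeek int NOT NULL CHECK (HoursPerWeek BETWEEN 1 AND 40),
    PRIMARY KEY (EmployeeID, ProjectID)
);

As noted above, the CHECK constraint helps but cannot carry rules that span rows or tables; those need stored procedures or triggers.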


Tom
Post #1141929
Posted Thursday, July 14, 2011 12:26 PM
Ten Centuries


Group: General Forum Members
Last Login: Thursday, January 31, 2013 8:01 AM
Points: 1,228, Visits: 1,046
Tom.Thomson (7/14/2011)
Lempster (7/14/2011)
I must admit that I got a bit lost by the description in your first paragraph, but the example that you gave made it clear and confirmed to me that 2NF was what I thought it was - phew!

Thanks very much, looking forward to 3NF and beyond...


Thanks for that.

I should have used a better description (the form of words I used was pretty poor). Something like "If X is a set of prime attributes and A is a non-prime attribute and the value of A is determined by the values of X then X contains the whole of at least one candidate key". Your comment made me read that first paragraph again and I found it confusing this time too, and I'm not sure what I wrote is actually right (it looks a bit too much like a definition of 3NF).

EDIT: I provided Steve with new text for the definition and it is now incorporated in the article. Thanks, Lempster, for helping make this a better article.


TOM: I have never seen 2NF explained without a Venn diagram or a set diagram before. Not bad. Mere mortals might have grasped this faster with a set diagram, but the examples and descriptions taught the reader what this is in data relation theory, not set math. NICE!
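For reference, the usual textbook formulation that this discussion is circling (my wording, not the article's): a relation R is in 2NF if and only if no non-prime attribute is functionally dependent on a proper subset of a candidate key; equivalently,

R \in \mathrm{2NF} \iff \nexists\, (X \to A)\ \text{with}\ A\ \text{non-prime and}\ X \subsetneq K\ \text{for some candidate key}\ K.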
Post #1142008
Posted Thursday, July 14, 2011 3:02 PM


SSCrazy


Group: General Forum Members
Last Login: Friday, September 26, 2014 3:23 PM
Points: 2,116, Visits: 6,439
Hi Tom,

Again a nice article, and looking forward to 3NF. Are you contemplating going any further than that?

Just one question, further above in the discussion you mention Bernstein's Algorithm. Never heard of that, and a (rather quick) search on Google didn't yield any useful results. Could you elaborate a little bit on that?

Regards,
Jan


--------------------------------------------------------------------------
A little knowledge is a dangerous thing (Alexander Pope)

In order for us to help you as efficiently as possible, please read this before posting (courtesy of Jeff Moden)
Post #1142109
Posted Thursday, July 14, 2011 6:47 PM


SSCertifiable


Group: General Forum Members
Last Login: Today @ 9:45 AM
Points: 7,804, Visits: 9,556
Jan Van der Eecken (7/14/2011)
Hi Tom,

Again a nice article, and looking forward to 3NF. Are you contemplating going any further than that?

I'm planning/hoping to do three articles after the 3NF one: one on normal forms intermediate between 3NF and 4NF and how the representation principle may get violated somewhere in that space (which is perhaps the most important article in the series, because understanding the representation principle in its various forms is the core concept of normalisation); one on 4NF; and one on 5NF, which will probably include a definition of 6NF but certainly no examples of 6NF (that will be Date's 6NF, not Fagin's 6NF - Fagin's 6NF, aka DKNF, may get a mention in the representation principle article, or not, depending on how I work the article out).

Just one question, further above in the discussion you mention Bernstein's Algorithm. Never heard of that, and a (rather quick) search on Google didn't yield any useful results. Could you elaborate a little bit on that?

The article describing the algorithm was published in ACM ToDS 1/4 (December 1976), beginning at page 277; the author was P. A. Bernstein and it was titled "Synthesizing Third Normal Form relations from functional dependencies". I don't know of any way of getting it without paying the ACM (they deserved payment for the original publication, but given that they are a learned society and that ToDS is supposed to be a genuine technical journal, I think that this long afterwards http://www.gnu.org/philosophy/right-to-read.html is relevant). A related article is "A New Normal Form for the Design of Relational Database Schemata" by Carlo Zaniolo, in ACM ToDS 7/3 (Sept 1982), which is available without charge somewhere on the web (I'll not say where, in case the US copyright mafia sue me - stating facts is now a serious act entailing punitive damages in American law, and since it is also classified in the USA as a criminal offence and we have a crazy extradition treaty with the US, the lunatics in charge of the US asylum can extradite Brits from Britain for offences committed in Europe against US law which are perfectly legal acts in Europe).
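To give a flavour of what the algorithm does, here is a toy example of my own (not from Bernstein's paper). Starting from a minimal cover of the FDs, group them by left-hand side and emit one relation per group, with that left-hand side as its key:

-- Invented minimal cover:
--   OrderID           -> CustomerID, OrderDate
--   CustomerID        -> CustomerName, Region
--   (OrderID, ItemID) -> Quantity
-- One table per left-hand side:

CREATE TABLE Customers (
    CustomerID   int         NOT NULL PRIMARY KEY,  -- LHS of the second FD
    CustomerName varchar(50) NOT NULL,
    Region       varchar(30) NOT NULL
);

CREATE TABLE Orders (
    OrderID    int  NOT NULL PRIMARY KEY,           -- LHS of the first FD
    CustomerID int  NOT NULL REFERENCES Customers (CustomerID),
    OrderDate  date NOT NULL
);

CREATE TABLE OrderLines (
    OrderID  int NOT NULL REFERENCES Orders (OrderID),
    ItemID   int NOT NULL,
    Quantity int NOT NULL,
    PRIMARY KEY (OrderID, ItemID)                   -- LHS of the third FD
);

The full algorithm also merges groups whose left-hand sides are equivalent, drops relations contained in others, and adds a relation for a candidate key of the whole schema if no synthesized relation contains one.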


Tom
Post #1142144
Posted Friday, July 15, 2011 1:42 AM


Grasshopper


Group: General Forum Members
Last Login: Thursday, April 4, 2013 5:00 AM
Points: 11, Visits: 28
Great article, and a great discussion about historical data. Here I am going to say something and need your help.


I would also add effectiveStartDate and effectiveEndDate fields to the PayGrade_Details table so that you could see that Charlie Fixit spent 3 years at paygrade 12 and then got bumped up to paygrade 13 and when it happened. Then I would just grab the paygrade record whose start date/end date falls within the proper date range that I need...I'm not sure if that is the correct way to do it or what NF that is but that's the way I've been doing it


I am also using this fromdate/todate technique to store historical data, but I am not sure that it is a good way to do this... With this approach we need to keep a NULL value, or some predefined default value, in the 'todate' column while the row is treated as current... Please advise on any other way to manage the historical data...


I could also use a historical or archive paygrade table and move the old records into that table when they are changed...but I'm not sure which way is the recognized "proper" way to do it.


I haven't tried it so far, but I think it could be better than the first approach, because there is no NULL value stored here...

Thanks,


Fazal Khan
C.T.O
TARECO SOFTONICS
Post #1142261
Posted Friday, July 15, 2011 3:29 AM


SSCertifiable


Group: General Forum Members
Last Login: Today @ 9:45 AM
Points: 7,804, Visits: 9,556
fazalkhansb (7/15/2011)
Great article, and a great discussion about historical data. Here I am going to say something and need your help.


I would also add effectiveStartDate and effectiveEndDate fields to the PayGrade_Details table so that you could see that Charlie Fixit spent 3 years at paygrade 12 and then got bumped up to paygrade 13 and when it happened. Then I would just grab the paygrade record whose start date/end date falls within the proper date range that I need...I'm not sure if that is the correct way to do it or what NF that is but that's the way I've been doing it


I am also using this fromdate/todate technique to store historical data, but I am not sure that it is a good way to do this... With this approach we need to keep a NULL value, or some predefined default value, in the 'todate' column while the row is treated as current... Please advise on any other way to manage the historical data...


I could also use a historical or archive paygrade table and move the old records into that table when they are changed...but I'm not sure which way is the recognized "proper" way to do it.


I haven't tried it so far, but I think it could be better than the first approach, because there is no NULL value stored here...

Thanks

As I said in an earlier message, this is an area where my experience is limited. You can choose to distinguish historical and current data in several ways: by keeping the historical data in a separate table (which complicates the introduction of a new value - the old one has to be moved); by not recording the end date at all, because it is redundant - the previous value's reign ends when the next value begins (but then you have to be wary of performance - watch out for "triangular joins"); or by using a special value (or NULL, but that's usually not a good idea) for the end date to distinguish current data. That last one may be the best option - but don't use a date anywhere in the feasible lifetime of your database as that special value. Look at Esteban Zimányi's 2006 paper "Temporal Aggregates and Temporal Universal Quantification in Standard SQL" (available on pages 16 to 25 of this pdf doc), or get hold of the book "Temporal Data and the Relational Model" by Date, Darwen and Lorentzos if you are heavily into data warehousing with history requirements. For a brief discussion of the normal form level of some table types that may be useful there is this - but be warned it's aimed at showing that their chosen table forms are all in 6NF, rather than showing how to do anything useful with them; if you like it, or the tables it describes, you may want to read up on anchor modelling more generally.
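A minimal T-SQL sketch of the "special value" option (the table and column names are invented for illustration, not from the article); 9999-12-31 is a common sentinel, being far outside any feasible database lifetime:

CREATE TABLE PayGradeHistory (
    EmployeeID    int  NOT NULL,
    PayGrade      int  NOT NULL,
    EffectiveFrom date NOT NULL,
    EffectiveTo   date NOT NULL DEFAULT '99991231',  -- sentinel meaning "current"
    PRIMARY KEY (EmployeeID, EffectiveFrom),
    CHECK (EffectiveFrom < EffectiveTo)
);

-- "As of" lookup: the grade in force on a given date.
SELECT PayGrade
FROM PayGradeHistory
WHERE EmployeeID = 42
  AND EffectiveFrom <= '20110715'
  AND EffectiveTo   >  '20110715';

Closing a period is then an UPDATE of the sentinel row rather than a move to a separate archive table.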


Tom
Post #1142316
Posted Friday, July 15, 2011 5:18 AM
Valued Member


Group: General Forum Members
Last Login: Saturday, May 4, 2013 5:21 AM
Points: 52, Visits: 117
Look at Esteban Zimányi's 2006 paper "Temporal Aggregates and Temporal Universal Quantification in Standard SQL" (available on pages 16 to 25 of this pdf doc, ...


Wow that's great...thanks for the link Tom.

George
Post #1142378
Posted Friday, July 15, 2011 9:38 AM
Valued Member


Group: General Forum Members
Last Login: Wednesday, April 4, 2012 8:55 AM
Points: 68, Visits: 94
Good post ... although I must say that the first paragraph has no learning value, but it does establish your 'geek credentials'.

One informal analogy I use when thinking about 2NF is 'being on subject', or 'topicality':

Tables store data about things, and that thing, or 'topic' or 'subject', should be as narrowly defined as possible. Ideally, the table name should tersely convey this topic.

You can then ask the question 'is this field topical to the table?' or 'is it on subject?'. In your example, when viewing 'Current_Assignment' as a topic name, it becomes clear that fields 'PayGrade' and 'YearsInGrade' aren't topical: they belong to the employee, and don't add anything to the subject of 'projects employees are assigned to'.

A candidate key then becomes a minimal set of facts required to say something truly unique about the topic. If your candidate key is 'Employee and Project', then again it becomes clear that saying something about just the employee adds nothing new to the topic of project assignments.
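A sketch of the decomposition this analogy implies (the column types are my guesses, not the article's): the employee-topical columns move out of the assignment table, whose composite key then carries only assignment facts.

CREATE TABLE Employee (
    EmployeeID   int NOT NULL PRIMARY KEY,
    PayGrade     int NOT NULL,      -- determined by the employee alone
    YearsInGrade int NOT NULL       -- likewise topical to the employee
);

CREATE TABLE Current_Assignment (
    EmployeeID int NOT NULL REFERENCES Employee (EmployeeID),
    ProjectID  int NOT NULL,
    PRIMARY KEY (EmployeeID, ProjectID)  -- the 'Employee and Project' candidate key
);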
Post #1142577