Would you store foreign key from grandparent tables? Why or why not?

  • I'm curious about potential ramifications of storing a foreign key that is derivable.

    For example, consider that we have traditional 3 tables; Customers, Orders and Order Details with appropriate foreign key constraints in place. We could want to filter all Order Details rows using Customer primary key. Customer key can be derived via the intervening Orders table so there is no need to store the Customer key in the Order Details table.

    In most scenarios, we usually would be joining all grandparent, parent and the child tables (e.g. we usually want the customer's name, order's date and then the line items) so this is usually a moot point. However, suppose there is a legitimate need to filter the child table based on grandparent's keys without joining the parent table in. A possible example is when you have a Customer -> Account -> Order and for large part of the application, you just need to correlate Customer to Order and not be concerned about Account details. In theory, storing & indexing Customer's keys in the Order table would save us from joining in the intervening Account table and should be a bit faster.

    Has this worked out in practice? If you've done it before, how did it work out? If not, why not?

    Sample T-SQL illustrating possible schema for enforcing grandparent keys in the child table is provided.

    EDIT: Updated the script to include the ON UPDATE CASCADE option on child's foreign key.

    SET QUOTED_IDENTIFIER ON;

    SET ANSI_NULLS ON;

    GO

    CREATE DATABASE demo;

    GO

    USE demo;

    GO

    CREATE TABLE dbo.GrandParent(

    ID INT CONSTRAINT PK_GrandParent PRIMARY KEY

    );

    CREATE TABLE dbo.Parent(

    ID INT CONSTRAINT PK_Parent PRIMARY KEY,

    GrandParentID INT NOT NULL CONSTRAINT FK_Parent_GrandParentID FOREIGN KEY REFERENCES dbo.GrandParent(ID),

    CONSTRAINT UQ_Parent_ID_GrandParentID UNIQUE (ID, GrandParentID)

    );

    CREATE TABLE dbo.Child(

    ID INT CONSTRAINT PK_Child PRIMARY KEY,

    GrandParentID INT NOT NULL,

    ParentID INT NOT NULL,

    CONSTRAINT FK_Child_Parent_GrandParent FOREIGN KEY (ParentID, GrandParentID) REFERENCES dbo.Parent(ID, GrandParentID) ON UPDATE CASCADE

    );

    GO

    INSERT INTO dbo.GrandParent(ID) VALUES

    (1),

    (2),

    (3)

    ;

    INSERT INTO dbo.Parent(ID, GrandParentID) VALUES

    (1, 1),

    (2, 1),

    (3, 1),

    (4, 2),

    (5, 2),

    (6, 2),

    (7, 3),

    (8, 3),

    (9, 3)

    ;

    INSERT INTO dbo.Child(ID, GrandParentID, ParentID) VALUES

    (1, 1, 1),

    (2, 1, 2),

    (3, 1, 3),

    (4, 2, 4),

    (5, 2, 5),

    (6, 2, 6),

    (7, 3, 7),

    (8, 3, 8),

    (9, 3, 9),

    (10, 1, 1),

    (11, 1, 2),

    (12, 1, 3),

    (13, 2, 4),

    (14, 2, 5),

    (15, 2, 6),

    (16, 3, 7),

    (17, 3, 8),

    (18, 3, 9),

    (19, 1, 1),

    (20, 1, 2),

    (21, 1, 3),

    (22, 2, 4),

    (23, 2, 5),

    (24, 2, 6),

    (25, 3, 7),

    (26, 3, 8),

    (27, 3, 9),

    (28, 1, 1),

    (29, 1, 2),

    (30, 1, 3),

    (31, 2, 4),

    (32, 2, 5),

    (33, 2, 6),

    (34, 3, 7),

    (35, 3, 8),

    (36, 3, 9)

    ;

    GO

    --Should fail; violates contrsaint

    INSERT INTO dbo.Child(ID, GrandParentID, ParentID) VALUES

    (37, 1, 4);

    GO

    --Should fail; key is in use

    UPDATE dbo.Parent

    SET ID = 10

    WHERE ID = 2;

    GO

    --Should succeed, child's grandparentkey will be updated

    UPDATE dbo.Parent

    SET GrandParentID = 3

    WHERE ID = 2;

    GO

    --Should succeed; changing the parent but not grandparent

    UPDATE dbo.Child

    SET ParentID = 2

    WHERE ID = 1;

    GO

    --Should fail; not valid ParentID for the given Grandparent

    UPDATE dbo.Child

    SET ParentID = 4

    WHERE ID = 1;

    GO

    --Should succeed, changing both parent and grandparent

    UPDATE dbo.Child

    SET ParentID = 4,

    GrandParentID = 2

    WHERE ID = 1;

    GO

    --Should fail; can't change GrandParentID without changing ParentID

    UPDATE dbo.Child

    SET GrandParentID = 3

    WHERE ID = 1;

    GO

    USE master;

    GO

    DROP DATABASE demo;

    GO

  • I would highly recommend NOT doing that. It will cause you untold pain in the future. You are proposing to violate normal form which will cause you far more pain than it will save. There is a reason you identify these tables as grandparent -> parent -> child. You will end up with a total mess on your hands. You are denormalizing your database to make it easier for queries. This comes at the cost of data integrity. Just my 2ยข.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 โ€“ Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • I'd have agreed if the updates required additional work (e.g. manually updating grandparent key in both parent and child table) but the schema I demonstrated does that via ON UPDATE CASCADE and UNIQUE constraint on the parent table and therefore requires no additional coding. Since this is pure SQL with no triggers or other manual coding, I'm less than sure that this is a clear-cut case of "bad" denormalizing, which is why I asked.

    Can you demonstrate where data integrity will be broken in this schema?

  • Well your constraints do prevent bad data. It certainly makes updates more of a pain to deal with. I guess the issue really is denormalization. There are certainly times when this is appropriate. I can't really hazard a guess if your situation is one of those or not because the sample tables are so simple. My guess is that because you are asking the question it probably is not the best way to proceed. Typically if something doesn't seem right, it probably isn't.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 โ€“ Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • ill add a little of my view to sean's. in an OLTP (Transaction) system i would say it would be very limited where i would de-normalize, i would not recommend it as a first choice tool. there are times (and in my current job before i got here they de-normalized the structure) where it can make sense but that determination is probably not going to be made through a forum thread.


    For faster help in answering any problems Please read How to post data/code on a forum to get the best help - Jeff Moden[/url] for the best way to ask your question.

    For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]

    Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
    Jeff Moden's Cross tab and Pivots Part 1[/url]
    Jeff Moden's Cross tab and Pivots Part 2[/url]

  • Sean:

    I certainly do agree that we're doing more work on the inserting new rows to Child tables and that also allows for more errors (e.g. trying to insert wrong grandparent ID due to a bug in the application).

    However, I hope you forgive me for saying that but I was hoping for more than "nobody has done it so it must be wrong". After all, it didn't stop Codd when he first formalized what we now know as normalization. ๐Ÿ˜‰

    Normalization are quite clear-cut because we can see the negative consequence of update anomalies created when normalization is violated. The only reason to denormalize is really due to physical limitations and even so it's mightily an iffy proposition in OLTP realm. For the above schema, the only valid reason (to me at least) is to be able to create an index on the Child that correlates to the GrandParent records and having enough of cardinality to make a difference... in theory.

    For example, we could compare those queries:

    Traditional approach:

    SELECT c.Customer, o.*

    FROM dbo.Customers AS c

    INNER JOIN dbo.Accounts AS a

    ON c.ID = a.CustomerID

    INNER JOIN dbo.Orders AS o

    ON a.ID = o.AccountID

    WHERE c.Inactive = 0

    AND c.CustomerType = 1;

    Alternative approach:

    SELECT c.Customer, o.*

    FROM dbo.Customers AS c

    INNER JOIN dbo.Orders AS o

    ON c.ID = o.CustomerID

    WHERE c.Inactive = 0

    AND c.CustomerType = 1;

    Under traditional approach, we'd need to use indices on Order's AccountID and on Account's CustomerID to effectively process the results whereas the alternative approach would use only single index. That would certainly mean less pages to access in returning the results, right? Or am I missing factors/other details that may make this unnecessary or overstating the benefits?

    capnhector:

    I absolutely agree that denormalization shouldn't be the first choice. My intention was primarily to find out if someone else thought of that already and see what experience was like. Sean's suggestion that having to asking suggests that it may be wrong is typically correct but I didn't just want to simply assume. ๐Ÿ™‚

  • Traditional approach:

    SELECT c.Customer, o.*

    FROM dbo.Customers AS c

    INNER JOIN dbo.Accounts AS a

    ON c.ID = a.CustomerID

    INNER JOIN dbo.Orders AS o

    ON a.ID = o.AccountID

    WHERE c.Inactive = 0

    AND c.CustomerType = 1;

    Alternative approach:

    SELECT c.Customer, o.*

    FROM dbo.Customers AS c

    INNER JOIN dbo.Orders AS o

    ON c.ID = o.CustomerID

    WHERE c.Inactive = 0

    AND c.CustomerType = 1;

    We do something similar to the above. if you have alot of queries that make that join it does justify testing the improvement of all of the queries. in our case we found a large improvement in overall execution time of the queries we were able to simplify (executed several million times a day 2-3 ms makes a huge difference over a day). so if you can justify it with performance testing and showing an improvement i dont think there would be an issue.

    The question then becomes why were AccountID and CustomerID seperate to begin with? Do you need both in the orders table?


    For faster help in answering any problems Please read How to post data/code on a forum to get the best help - Jeff Moden[/url] for the best way to ask your question.

    For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]

    Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
    Jeff Moden's Cross tab and Pivots Part 1[/url]
    Jeff Moden's Cross tab and Pivots Part 2[/url]

  • capnhector -

    In this hypothetical example, it was assumed that a customer could have several accounts and orders are related to accounts for budgeting purposes. In this situation, we shouldn't store CustomerID in the Orders table because it's already implied by AccountID.

    You've indicated that in some situations, being able to skip a join can make a big difference provided that we can back the specific scenario with a performance test. Did you include the unique constraint & update cascade?

  • we did not include the unique constraint and do not foreign key to the accounts table on CustomerID, AccountID from the orders table. (all tables changed to reflect the previously posted queries). we did however leave the AccountID and just added the CustomerID column to the orders table with a foreign key back to the customers table. and while this could lead to an update error our business rules are written that the update is never made.


    For faster help in answering any problems Please read How to post data/code on a forum to get the best help - Jeff Moden[/url] for the best way to ask your question.

    For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]

    Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
    Jeff Moden's Cross tab and Pivots Part 1[/url]
    Jeff Moden's Cross tab and Pivots Part 2[/url]

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply