PK / FK design question

  • Hi there,

    A little background: I need to design a set of tables that will hold transactions. There are different types of transactions, and although about half of the columns are common to all types, the other half is totally different depending on which kind of transaction we are talking about.

    It's a pretty simple and common scenario, but for the sake of having an example to work with in our discussion, let's write up a sample case:

    Columns:

    columns common to all transaction types: TranID, CustomerID, Amount, Date

    additional columns for TranType 1: ReferredCustomerID

    additional columns for TranType 2: SourceProductID, ReversedDate

    Possible Implementations:

    Scenario #1: Create one distinct table for each transaction type and create a view that provides a 'union all' of all the common columns across all the tables. The primary key on all these tables would be the TranID.

    Scenario #2: Create one global table that has all the shared columns along with a TranTypeID. When you need to query all information for a specific TranType then you need to join with whatever table contains the extra info for that TranType. The primary key on all these tables would be the TranID.

    Scenario #3: Same as Scenario #2 except that the primary key on all the tables would be TranID+TranTypeID. The table containing the additional columns for TranType #1 would always have a "1" in the TranType column. And the table containing the additional columns for TranType #2 would always have a "2" in the TranType column...
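
    To make the comparison concrete, here is roughly what Scenario #3 might look like in DDL (names and datatypes are only placeholders for the example):

    CREATE TABLE dbo.Tran (
        TranID     int      NOT NULL,
        TranTypeID tinyint  NOT NULL,
        CustomerID int      NOT NULL,
        Amount     money    NOT NULL,
        TranDate   datetime NOT NULL,
        CONSTRAINT PK_Tran PRIMARY KEY (TranID, TranTypeID)
    );

    -- detail table for TranType 1
    CREATE TABLE dbo.TranType1Detail (
        TranID             int     NOT NULL,
        TranTypeID         tinyint NOT NULL,  -- always 1 in this table
        ReferredCustomerID int     NOT NULL,
        CONSTRAINT PK_TranType1Detail PRIMARY KEY (TranID, TranTypeID),
        CONSTRAINT FK_TranType1Detail_Tran
            FOREIGN KEY (TranID, TranTypeID)
            REFERENCES dbo.Tran (TranID, TranTypeID)
    );

    (Note that the composite FK by itself doesn't force TranTypeID = 1 in the detail table; pinning the value takes a check constraint or similar, which comes up later in the thread.)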

    In the past I've pretty much always followed "scenario #2" when faced with this type of situation. However, this time I am debating whether I should go with #3... Going with #3 would enforce stronger integrity on my data because it guarantees you will never have a case where a transaction of TranType 2 ends up with an entry in the details table for TranType 1... On the other hand, scenario #3 uses more disk space, and the overhead of maintaining the FK will be higher...

    I hope I explained myself well. As you can see it's a very simple situation, but I would be curious to know your opinions on the matter.

    -------------------

    FYI, if anyone has any comments on my other post at http://www.sqlservercentral.com/Forums/Topic466314-361-1.aspx please reply! I haven't received any feedback yet...

    -------------------

    Thanks

    I think that Scenario #3 is definitely preferred, and the canonical way to do it as well. The additional overhead seems minimal to me.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

    Thanks for the reply,

    so let's say I go with scenario #3...

    Keep in mind that in one of the two tables TranTypeID will always be the same value (in the table containing the extra columns for a given type...).

    Is it more efficient to define a PK on TranID+TranTypeID in both tables, or should my PK be TranID only, along with a unique constraint on TranID+TranTypeID in both tables? (I need one or the other, otherwise I can't create the FK constraint.)

  • Harveysburger (3/14/2008)


    Thanks for the reply,

    so let's say I go with scenario #3...

    Keep in mind that in one of the two tables TranTypeID will always be the same value (in the table containing the extra columns for a given type...).

    Is it more efficient to define a PK on TranID+TranTypeID in both tables, or should my PK be TranID only, along with a unique constraint on TranID+TranTypeID in both tables? (I need one or the other, otherwise I can't create the FK constraint.)

    If TranID is unique, then it alone should be the PK. Extending a primary key with auxiliary data will just defeat its constraint purposes.

    So PK: TranID

    And AK: TranID+TranTypeID
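
    In DDL terms, a minimal sketch (using the tables from the example in the first post, before any keys are added; constraint names are made up):

    ALTER TABLE dbo.Tran
        ADD CONSTRAINT PK_Tran PRIMARY KEY (TranID),
            CONSTRAINT UQ_Tran_ID_Type UNIQUE (TranID, TranTypeID);

    -- the detail tables can then point their composite FK at the alternate key
    ALTER TABLE dbo.TranType1Detail
        ADD CONSTRAINT FK_TranType1Detail_Tran
            FOREIGN KEY (TranID, TranTypeID)
            REFERENCES dbo.Tran (TranID, TranTypeID);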

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Thanks!

    Happy to help.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • I just read a really good article that explained all of this, but I cannot find it.

    I think it was on this site too...

    Regardless, I agree with approach 3. If your record can be of only one type, put the TypeID column on both your main table and the tables with the additional columns and FK on both fields.

    If your record can be more than one type, leave the type off the main table and FK just by the one field.

    Either way, include the TypeID on the tables with the additional fields. I would also recommend either using a calculated field (which in SQL 2005 you can now use as part of the PK) or a check constraint to ensure the TypeID field cannot be accidentally overwritten by an over-enthusiastic developer.
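
    For example, the check constraint version might be as simple as this (table and constraint names are hypothetical, following the sketches earlier in the thread):

    ALTER TABLE dbo.TranType1Detail
        ADD CONSTRAINT CK_TranType1Detail_Type CHECK (TranTypeID = 1);

    With that in place, any attempt to write a different TypeID into the TranType 1 detail table fails outright.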

  • I never would have thought of doing the TypeID in the sub-tables as a calculated column. Tried it out, and you have to persist it to use it that way, and it works. Not sure if it has advantages one way or the other over a check constraint, but it is another option, and those are good to know.
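
    For anyone who wants to try the same experiment, here is roughly what it looks like (names invented; this assumes the parent table carries a unique constraint on TranID+TranTypeID as suggested above, and here the persisted computed column participates in the FK rather than the PK):

    CREATE TABLE dbo.TranType2Detail (
        TranID          int NOT NULL,
        -- fixed at 2; PERSISTED is what allows the column into the key/FK
        TranTypeID      AS CAST(2 AS tinyint) PERSISTED,
        SourceProductID int NOT NULL,
        ReversedDate    datetime NULL,
        CONSTRAINT PK_TranType2Detail PRIMARY KEY (TranID),
        CONSTRAINT FK_TranType2Detail_Tran
            FOREIGN KEY (TranID, TranTypeID)
            REFERENCES dbo.Tran (TranID, TranTypeID)
    );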

    On the original post, I'd definitely go with #3. I've used variations on all three, and that's the easiest one to maintain and work with.

    The main problem with #1 is that, if you add a new type, you have to add a new table, and revise ALL code that references the tables. Definitely avoid that one.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

    With a constraint, you can possibly bulk insert with constraints off, or you can simply disable the constraint.
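
    For instance, with the hypothetical check constraint from the earlier sketch:

    -- switch the constraint off (e.g. around a bulk load)...
    ALTER TABLE dbo.TranType1Detail NOCHECK CONSTRAINT CK_TranType1Detail_Type;

    -- ...and back on; WITH CHECK makes SQL Server revalidate the existing rows
    ALTER TABLE dbo.TranType1Detail WITH CHECK CHECK CONSTRAINT CK_TranType1Detail_Type;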

    I don't think it is much risk, but I suppose it is possible.

    I am not sure if there is a down side to using a calculated field. There probably is something.

    Sometimes I think we are starting to get too many options.

    In a way, it was nice every once in a while to do something because there was no other choice; now we spend hours trying to figure stuff like this out...

  • In this thread people seem to agree that it is more appropriate to have the foreign key on both the PK and TranTypeID... however I ran into a little design issue, which I described in

    http://www.dbforums.com/showthread.php?t=1628597

    http://www.sqlservercentral.com/Forums/Topic473696-361-1.aspx

    in those threads the idea of having such composite foreign keys turned out to be very unpopular... any comments?

    Hey all, sorry to bring this post back up again,

    I was wondering, perhaps a scenario #4 would be to define insert/update triggers on the details tables that essentially make sure the detail entry has the correct TypeID for the table... I know triggers can be disabled, so it's not foolproof, but I am wondering, performance-wise, would it be better to...

    a) have such a trigger

    b) have composite foreign key as described in scenario #3
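
    For (a), something along these lines is what I have in mind (just a sketch; names follow the earlier examples):

    CREATE TRIGGER trg_TranType1Detail_CheckType
    ON dbo.TranType1Detail
    AFTER INSERT, UPDATE
    AS
    BEGIN
        -- reject any detail row whose parent transaction isn't TranType 1
        IF EXISTS (SELECT 1
                   FROM inserted i
                   JOIN dbo.Tran t ON t.TranID = i.TranID
                   WHERE t.TranTypeID <> 1)
        BEGIN
            RAISERROR('TranType1Detail rows must belong to TranType 1 transactions.', 16, 1);
            ROLLBACK TRANSACTION;
        END
    END;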

    The popular answer on this and many of the other threads is scenario 3, with some votes for scenario 2. Just my humble opinion, but I think they might all be wrong (heh... imagine that 🙂 ). Despite the fact that it's not the popular answer and despite the fact that it'll take a bit more initial planning, I believe that Scenario 1 actually has the most utility and flexibility over the long haul...

    1. It will allow you to make changes to any transaction type without affecting any of the others. These types of changes happen more often than most people would ever expect.

    2. Using a view or views as the interface is a tried and true method that even MS uses (the INFORMATION_SCHEMA views, for example). Done correctly, it will provide the GUI(s) with a single-point interface that will withstand the tests of time, and the GUI won't have to refer to dozens of tables.

    3. It's actually the least complex scenario (single interface view, yet control over individual tables if needed for indexing, etc., etc.), has no duplication of data as scenarios 2 and 3 do (TypeID), and, as currently described, has no chance of nulls like scenario 3 has. Sounds like a good example of 3rd normal form, to me.

    4. If carefully planned so far as IDs go, it would also allow the view to be an updateable partitioned view, which will be of tremendous utility to any GUIs involved... it's the same effect that scenarios 2 and 3 are being used for. Further, the updateable partitioned view would allow for a TypeID column without ever materializing it, if that is the desire (see the sketch after this list).

    5. A byproduct of the partitioned updateable view is incredible potential performance even if the transaction tables get large.

    6. Another possible byproduct is performance by distributing the work load... something working on Type 1 transactions would have zero blocking potential with something working on Type 2 transactions.

    7. It will allow for much easier "what if's" and easier reporting potential because reporting views will be easier to write and allow easier exclusion of transaction types further improving performance when this gets big.

    8. The PK/FK requirements will be greatly simplified and the only possible trigger requirements will be for Audit Triggers.

    9. Custom indexing for performance won't cause Insert/Update/Delete performance problems because each transaction type has its own table. The other scenarios place all indexes for all types in a single table... we all know what happens when you have too many indexes on a transactional table.

    There are a couple more "advantages" to Scenario 1, but you probably get the idea.
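
    To give a feel for item 4, here's a rough sketch (column names follow the original post; a real updateable partitioned view has a few more rules about keys, identity columns, and so on than I'm showing, so treat this as a starting point only):

    CREATE TABLE dbo.TranType1 (
        TranID             int      NOT NULL,
        TranTypeID         tinyint  NOT NULL CHECK (TranTypeID = 1),
        CustomerID         int      NOT NULL,
        Amount             money    NOT NULL,
        TranDate           datetime NOT NULL,
        ReferredCustomerID int      NOT NULL,
        CONSTRAINT PK_TranType1 PRIMARY KEY (TranID, TranTypeID)
    );

    CREATE TABLE dbo.TranType2 (
        TranID          int      NOT NULL,
        TranTypeID      tinyint  NOT NULL CHECK (TranTypeID = 2),
        CustomerID      int      NOT NULL,
        Amount          money    NOT NULL,
        TranDate        datetime NOT NULL,
        SourceProductID int      NOT NULL,
        ReversedDate    datetime NULL,
        CONSTRAINT PK_TranType2 PRIMARY KEY (TranID, TranTypeID)
    );
    GO

    -- the single interface for the GUI; the CHECK constraints are what let the
    -- optimizer touch only the relevant table when TranTypeID is specified
    CREATE VIEW dbo.AllTrans AS
        SELECT TranID, TranTypeID, CustomerID, Amount, TranDate FROM dbo.TranType1
        UNION ALL
        SELECT TranID, TranTypeID, CustomerID, Amount, TranDate FROM dbo.TranType2;

    (If you'd rather not store TranTypeID at all, the view can expose it as a literal in each SELECT instead, at the cost of the partitioned-view update behavior.)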

    Scenario 1 wasn't the popular answer at work, either. Everyone wanted to do Scenario 2 or 3 or even 4 because it was "easy" in the short haul and could be brought to fruition very quickly and without much thought (look for the operative words there, folks 😉 )... my Architect, my DBAs, and I have all seen what happens over the long haul if Scenario 1 isn't used... denormalization of the transaction table, difficult maintenance/changes especially at the GUI interface level, performance problems, long-winded reporting runs, keying problems, index problems, etc. We stood our ground and now we have a system (a new OSS) that has no such problems. And we'll likely never see a deadlock or the extended blocking and resulting timeouts that seem to plague so many systems.

    Like I said, Scenario 1 isn't the popular answer... but I think it's the right answer.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Wow, that was great Jeff...thanks. I'll have to admit, I've been just watching this thread waiting for a solid explanation of the benefits of any of the solutions provided.

    We have a similar dilemma here so I was looking for some constructive analysis of the scenarios provided. My gut feeling told me that #1 was the way to go, but all of the designers here are promoting the super/sub type models.

    You mentioned that scenarios 2, 3 and 4 lead to denormalization of the transaction table. Can you expand on this? The proponents of scenarios 2, 3, and 4 typically promote that model due to the redundant data that you end up with for scenario 1.

    One more question for you, have you had experience with using either of these models and trying to maintain hierarchical relationships between the types?

    Thanks in advance....

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden

  • BAM! Jeff has spoken

    Gotta say, although it's not the most popular answer, your arguments are strong.

    Thanks for the input. I will meditate on that for a while and see what comments others post here in the meantime.

  • John Rowan (3/27/2008)


    Wow, that was great Jeff...thanks. I'll have to admit, I've been just watching this thread waiting for a solid explanation of the benefits of any of the solutions provided.

    We have a similar dilemma here so I was looking for some constructive analysis of the scenarios provided. My gut feeling told me that #1 was the way to go, but all of the designers here are promoting the super/sub type models.

    You mentioned that scenarios 2, 3 and 4 lead to denormalization of the transaction table. Can you expand on this? The proponents of scenarios 2, 3, and 4 typically promote that model due to the redundant data that you end up with for scenario 1.

    One more question for you, have you had experience with using either of these models and trying to maintain hierarchical relationships between the types?

    Thanks in advance....

    You bet... Thanks for the kudo, John.

    On the denormalization caused by 2/3/4... all of them are going to have some duplication of data caused by, if nothing else, the transaction TypeID. Scenarios 3 and 4 will also have NULLs in columns not having to do with a particular transaction type, which is another form of denormalization. Scenario 1 doesn't have either of those problems and, just to be clear, you don't end up with any duplication of data in Scenario 1... absolutely none.

    The maintenance of hierarchical relationships between types isn't something I've often seen so far as transactions go... I normally see those types of relationships in "product catalogs" where a "package" may contain many products or perhaps sub-packages and products under those. The transactions are normally at the top level for these types of things. And, yes, a package may consist of a single product. But, even there, I've managed to convince folks of a scenario like 1, in that the products and related information are stored in one table, the packages in another, and a 3rd table describes the package hierarchy.
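
    A bare-bones sketch of that three-table arrangement (all names invented for illustration):

    CREATE TABLE dbo.Product (
        ProductID   int          NOT NULL PRIMARY KEY,
        ProductName varchar(100) NOT NULL
    );

    CREATE TABLE dbo.Package (
        PackageID   int          NOT NULL PRIMARY KEY,
        PackageName varchar(100) NOT NULL
    );

    -- one row per edge in the hierarchy: a package may contain
    -- products, sub-packages, or a mix of both
    CREATE TABLE dbo.PackageContent (
        PackageID      int NOT NULL REFERENCES dbo.Package (PackageID),
        ChildPackageID int NULL     REFERENCES dbo.Package (PackageID),
        ProductID      int NULL     REFERENCES dbo.Product (ProductID),
        CHECK (ChildPackageID IS NOT NULL OR ProductID IS NOT NULL)
    );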

    For those that may still be in doubt that Scenario 1 is the way to go, ask yourself this... when is the last time you saw a database with transactions built like someone's home checkbook in Excel? Wouldn't the database, instead, have transactions stored in separate tables like, for example, a Payment table, an Adjustment table, a Refund table, a LineItem table, a Tax table or two, a Usage table, etc, etc? Why should this particular situation for Harvey be any different? Why would anyone put these different transaction types all in one table? I just don't see the merit of Scenarios 2/3/4... Scenario 1 is the only one that makes good long-term database sense to me.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
