Is My Foreign Key Table Unnecessary?

  • I'm sure this has been addressed many times here but none of the search terms I've tried have surfaced anything relevant.

    Simply, I have a SalesLedgerTransaction table. One of the columns is SalesLedgerTransactionTypeID which is a Foreign Key column to to the SalesLedgerTransactionType table. That table only has two columns - ID and Description - and will only ever contain five records, with values in the Description column of 'Invoice', 'Payment', 'Refund', 'Write-Off' and 'Credit Note'.

    I'm wondering whether this is a pointless use of a Foreign Key column. I could easily replace it (we're still in the early stages of development) with a CHAR(10) column and just write 'Invoice', 'Payment', etc. to that column, then drop the SalesLedgerTransactionType table altogether. The value will only ever be populated by the application's code, never by the user, so there's no potential for invalid values to be entered (barring developer errors in the code, I realise - we could always add a constraint on the column to restrict the permissible values, if necessary).

    As far as I can tell, the only advantage that the Foreign Key column and table are giving me is that storing a single-digit integer takes up less disk space than the word 'Invoice' or 'Payment' over and over again, but given the size of the user base and the business that is going to be a very small difference in actual bytes.

    Is there anything I'm missing, or any strong case to be made for one solution or the other?

  • Your preference.

    If you drop the lookup table, make sure that you add constraints to the column. "Integrity enforced in the application" = "Garbage data eventually"
    Also consider that  if you need to fetch the data to populate a dropdown, without the lookup table you'll have to query a much larger table and do a DISTINCT on the results, which is a lot slower than querying a 5-row table. Finally, when, at some point in the future, someone decides that a 6th value is needed, consider how much work will be needed in each case (or when someone decides that Write-Off should become Write-off in every instance)

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster - Tuesday, January 2, 2018 8:16 AM

    Your preference.

    If you drop the lookup table, make sure that you add constraints to the column. "Integrity enforced in the application" = "Garbage data eventually"
    Also consider that  if you need to fetch the data to populate a dropdown, without the lookup table you'll have to query a much larger table and do a DISTINCT on the results, which is a lot slower than querying a 5-row table. Finally, when, at some point in the future, someone decides that a 6th value is needed, consider how much work will be needed in each case (or in the case where someone decides that Write-Off should become Write-off in every instance)

    +1

    ...

  • Phil Stratford - Tuesday, January 2, 2018 8:12 AM

    I'm sure this has been addressed many times here but none of the search terms I've tried have surfaced anything relevant.

    Simply, I have a SalesLedgerTransaction table. One of the columns is SalesLedgerTransactionTypeID which is a Foreign Key column to to the SalesLedgerTransactionType table. That table only has two columns - ID and Description - and will only ever contain five records, with values in the Description column of 'Invoice', 'Payment', 'Refund', 'Write-Off' and 'Credit Note'.

    I'm wondering whether this is a pointless use of a Foreign Key column. I could easily replace it (we're still in the early stages of development) with a CHAR(10) column and just write 'Invoice', 'Payment', etc. to that column, then drop the SalesLedgerTransactionType table altogether. The value will only ever be populated by the application's code, never by the user, so there's no potential for invalid values to be entered (barring developer errors in the code, I realise - we could always add a constraint on the column to restrict the permissible values, if necessary).

    As far as I can tell, the only advantage that the Foreign Key column and table are giving me is that storing a single-digit integer takes up less disk space than the word 'Invoice' or 'Payment' over and over again, but given the size of the user base and the business that is going to be a very small difference in actual bytes.

    Is there anything I'm missing, or any strong case to be made for one solution or the other?

    Sure sounds like it's needed. Today there will only, ever, be five values. Tomorrow, totally different world. Also, what happens as the code changes? I don't think about referential integrity as only enforcing constraints. It's also a backup for the code. Sure, the code can do this and do it well. However, as we all know, it can get messed up. Why not have the backup in place to ensure appropriate data is always entered correctly, no errors, no slip-ups.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey - Tuesday, January 2, 2018 8:33 AM

    Phil Stratford - Tuesday, January 2, 2018 8:12 AM

    I'm sure this has been addressed many times here but none of the search terms I've tried have surfaced anything relevant.

    Simply, I have a SalesLedgerTransaction table. One of the columns is SalesLedgerTransactionTypeID which is a Foreign Key column to to the SalesLedgerTransactionType table. That table only has two columns - ID and Description - and will only ever contain five records, with values in the Description column of 'Invoice', 'Payment', 'Refund', 'Write-Off' and 'Credit Note'.

    I'm wondering whether this is a pointless use of a Foreign Key column. I could easily replace it (we're still in the early stages of development) with a CHAR(10) column and just write 'Invoice', 'Payment', etc. to that column, then drop the SalesLedgerTransactionType table altogether. The value will only ever be populated by the application's code, never by the user, so there's no potential for invalid values to be entered (barring developer errors in the code, I realise - we could always add a constraint on the column to restrict the permissible values, if necessary).

    As far as I can tell, the only advantage that the Foreign Key column and table are giving me is that storing a single-digit integer takes up less disk space than the word 'Invoice' or 'Payment' over and over again, but given the size of the user base and the business that is going to be a very small difference in actual bytes.

    Is there anything I'm missing, or any strong case to be made for one solution or the other?

    Sure sounds like it's needed. Today there will only, ever, be five values. Tomorrow, totally different world. Also, what happens as the code changes? I don't think about referential integrity as only enforcing constraints. It's also a backup for the code. Sure, the code can do this and do it well. However, as we all know, it can get messed up. Why not have the backup in place to ensure appropriate data is always entered correctly, no errors, no slip-ups.

    You're absolutely right that a sixth, seventh or even eighth value could be required in the future, but if that happened we could simply start writing the new value to the CHAR(10) column. However, I take the broader point that you're making and I agree that the foreign key table is the best way to go. Thanks!

  • Phil Stratford - Wednesday, January 3, 2018 3:47 AM

    You're absolutely right that a sixth, seventh or even eighth value could be required in the future, but if that happened we could simply start writing the new value to the CHAR(10) column. 

    Yeah, 100%. However, it would require code modification, a new round of testing, and a deployment. Versus, add a row to a table. Not only is referential integrity of this sort good for the data, but it makes for faster & safer changes to your system. Let's not even mention that enforced referential integrity helps query performance (OK, let's mention it).

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Phil Stratford - Wednesday, January 3, 2018 3:47 AM

    You're absolutely right that a sixth, seventh or even eighth value could be required in the future, but if that happened we could simply start writing the new value to the CHAR(10) column.

    After changing the constraint to allow the new values. DDL change + code change vs one more row added in one table.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Phil Stratford - Tuesday, January 2, 2018 8:12 AM

    Your first problem is that there is no such thing as a "<something>_type_id" in RDBMS. If you read the metadata committee standards or ISO 11179 rules, you'll find that "_type" and "_id" or what we call attribute properties. This means that the attribute can be a type or an identifier but not some weird god-awful hybrid. Do you have a "blood_type_id"? No! A data element should not change its name from table to table. It should not have more than one attribute property.

    I'm scared stiff that you might have used the "_id" in this because you use an identity column instead of actually designing your transaction types.

    As a heuristic, I tell people that if you're encoding is relatively static and has a small list of values, then use a CHECK (something_type IN (..)) constraint. However, if he encoding is very large or dynamic, then use a references clause and separate out the values in the table.

    Since all we have is your narrative I'm going to guess that in your case, a check () would be fine. It's also pretty darn fast.

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • Phil Stratford - Tuesday, January 2, 2018 8:12 AM

    I'm sure this has been addressed many times here but none of the search terms I've tried have surfaced anything relevant.

    Simply, I have a SalesLedgerTransaction table. One of the columns is SalesLedgerTransactionTypeID which is a Foreign Key column to to the SalesLedgerTransactionType table. That table only has two columns - ID and Description - and will only ever contain five records, with values in the Description column of 'Invoice', 'Payment', 'Refund', 'Write-Off' and 'Credit Note'.

    I'm wondering whether this is a pointless use of a Foreign Key column. I could easily replace it (we're still in the early stages of development) with a CHAR(10) column and just write 'Invoice', 'Payment', etc. to that column, then drop the SalesLedgerTransactionType table altogether. The value will only ever be populated by the application's code, never by the user, so there's no potential for invalid values to be entered (barring developer errors in the code, I realise - we could always add a constraint on the column to restrict the permissible values, if necessary).

    As far as I can tell, the only advantage that the Foreign Key column and table are giving me is that storing a single-digit integer takes up less disk space than the word 'Invoice' or 'Payment' over and over again, but given the size of the user base and the business that is going to be a very small difference in actual bytes.

    Is there anything I'm missing, or any strong case to be made for one solution or the other?

    Two things come to mind...

    (1) You'll fail if you try to do that with a CHAR(10) - "Credit Note" is 11 characters, which leads on to
    (2) If you do need to create a longer description than the field will accept, you'll need to change a potentially very large table to do it, rather than quickly change a small lookup table.

    Thomas Rushton
    blog: https://thelonedba.wordpress.com

  • ThomasRushton - Monday, January 8, 2018 2:15 AM

    Phil Stratford - Tuesday, January 2, 2018 8:12 AM

    I'm sure this has been addressed many times here but none of the search terms I've tried have surfaced anything relevant.

    Simply, I have a SalesLedgerTransaction table. One of the columns is SalesLedgerTransactionTypeID which is a Foreign Key column to to the SalesLedgerTransactionType table. That table only has two columns - ID and Description - and will only ever contain five records, with values in the Description column of 'Invoice', 'Payment', 'Refund', 'Write-Off' and 'Credit Note'.

    I'm wondering whether this is a pointless use of a Foreign Key column. I could easily replace it (we're still in the early stages of development) with a CHAR(10) column and just write 'Invoice', 'Payment', etc. to that column, then drop the SalesLedgerTransactionType table altogether. The value will only ever be populated by the application's code, never by the user, so there's no potential for invalid values to be entered (barring developer errors in the code, I realise - we could always add a constraint on the column to restrict the permissible values, if necessary).

    As far as I can tell, the only advantage that the Foreign Key column and table are giving me is that storing a single-digit integer takes up less disk space than the word 'Invoice' or 'Payment' over and over again, but given the size of the user base and the business that is going to be a very small difference in actual bytes.

    Is there anything I'm missing, or any strong case to be made for one solution or the other?

    Two things come to mind...

    (1) You'll fail if you try to do that with a CHAR(10) - "Credit Note" is 11 characters, which leads on to
    (2) If you do need to create a longer description than the field will accept, you'll need to change a potentially very large table to do it, rather than quickly change a small lookup table.

    +1 I was thinking Telegraphic Transfer as a possible transaction for the same reason as you cite.

    ...

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply