Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

FK vs Check Constraint for Lookup/Reference Tables Expand / Collapse
Author
Message
Posted Sunday, August 25, 2013 9:17 AM


SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Friday, February 07, 2014 6:33 AM
Points: 496, Visits: 815
I often see lookup tables used to reference some sort of description (status,mode etc..) with a FK relationship to ensure data integrity. I've been thinking about replacing the normal design of linking IDs with storing the actual Status Description in the parent table and using a UDF within a check constraint to ensure the data is accurate based on a reference table. This would save time of having to join tables unnecessarily while still enforcing data integrity. As long as the Status is only used within that single table this seems like a good idea. Obviously if it was shared among multiple tables this wouldn't be a good fit.

Pros, cons, suggestions? It "seems" like a good idea but is it? TIA


Traditional Model:
tEmployees (EmployeeID int,StatusID int)
tEmployeesStatus (StatusID int,StatusDescription (varchar(20))

Proposed Model:
tEmployees (EmployeeID int,StatusDescription varchar(20))
tEmployeesStatus (StatusDescription varchar(20))

Create a UDF that validates the data being entered into the StatusDescription field:

Create FUNCTION [dbo].[fnEmployeeStatusList](@EmployeeStatus varchar(20))
RETURNS char(1)
AS
BEGIN

Declare @Exists char(1)

SELECT @Exists = [StatusDescription ] from tEmployeeStatus Where [StatusDescription ] = @EmployeeStatus
IF @Exists > '' BEGIN SET @Exists = 'Y' END
IF ISNULL(@Exists,'') = '' BEGIN SET @Exists = 'N' END
-- Return the result of the function
RETURN @Exists

END

ALTER TABLE [dbo].[tEmployees] WITH CHECK ADD CONSTRAINT [CK_tEmployees_Status]CHECK (([dbo].[fnEmployeeStatusList]([EmployeeStatus])='Y'))
GO

ALTER TABLE [dbo].[tEmployees CHECK CONSTRAINT [CK_tEmployees_Status]
GO
Post #1488204
Posted Sunday, August 25, 2013 12:39 PM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Today @ 12:16 AM
Points: 8,289, Visits: 8,742
I imagine that calling that multi-line UDF for each update/insert operation on the table will be a pretty horrifying overhead, but it's easy enough to rewrite it as a single statement UDF. Another issue is storage: if several employees shares a status, the old model stores that status just once, while your new model stores it several times; that may not matter is the number of employees isn't large. In addition, either there's going to be a search on status for each update/insert operation, which may need to be supported by an index if there are enough distinct statuses; the old model has a index on statusId, which is an int, which suggests a lot of distinct statuses - otherwise why isn't it smallint (or even tinyint)? - so indexing a 20-bye status as required in the new model will cost some extra storage. But it could improve the performance overall provided the employee table is read far more often than it is updated, and usually has to be joined with the status table every time it's read, and there aren't too many employees, and not too many employees have the same status as each other, and the size of the status isn't going to grow too much in future.

But there are considerations other than the performance of employee table manipulations. Is an individual status value something that applies to a class of employees, rather than to a single employee - ie is an employee status a business-world entity in its own right, that may change wile remaining applicable to the same group of employees, or are statuses just an attribute of an employee, ie something which, when it changes, normally affects just one employee? That may be a more important consideration than performance, because getting the table structure to reflect the business world is generally a good thing, and if status changes that affect many employees simultaneously are common that will be a performance hit against the new model. If in fact the status changes always affect only a single employee the status is an attribute of an employee so it's certainly better design to hold the status in the main table, and the only reason for splitting it out (which is something that people like me regard as dangerous, because someone may carelessly update something that affects many employees when they intended to affect only one) would be to improve performance. On the other hand, if it's a separate entity in its own right, it's certainly better design to put it into a separate table, and the only reason for pulling it into the employee table (which is something everyone calls denormalization) would be to improve performance.

At the end of the day it comes back to one thing: what are the business considerations - all the business considerations, not just performance, which makes quite a long list. Those include flexibility, ease of development, support cost, and a lot things which at first site aren't schema design issues but on proper consideration are; performance is somewhere on that list, but it usually isn't top by any means.

edit: your check constraint confuses a parameter's name with its value, so it wont work unless that is fixed.


Tom
Post #1488222
Posted Sunday, August 25, 2013 1:53 PM


SSC-Forever

SSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-Forever

Group: General Forum Members
Last Login: Yesterday @ 3:30 PM
Points: 41,531, Visits: 34,448
That check constraint won't prevent rows in tEmployeesStatus from being deleted/updated if there's a matching row in the child table. A foreign key will (or will cascade the delete/update if required)
The check constraint doesn't give the optimiser extra information about the relationship, a foreign key will.



Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #1488229
Posted Sunday, August 25, 2013 2:39 PM


SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Friday, February 07, 2014 6:33 AM
Points: 496, Visits: 815

L' Eomot Inversé,

Thank you for the reply.

This was a generic example but a few actual examples would be statuses applied to invoices and shipments, both of which will have a significant amount of data. The uniqueness of the status will be diluted as the table grows and more importantly as shipments and invoices move along through their respective process. At some point 95% of shipments and invoices will move into a final status (Paid, Shipment Delivered) so I would think indexes wouldn't be useful. Queries will be made based on these statuses quite often as will updates to these status will be common.

Yes the statuses won't really grow and there aren't that many of them (1/2 dozen maybe). A tinyint would be a better fit and I copied from a working version but I hastily renamed some of the code for this post (hence the non-working version I posted).

Yes they will need to be joined almost always because it's vital info to know almost every time a request is made.

In regards to duplicate data... If I choose to store a StatusID multiple times in the table vs the actual name, it's still duplicate data (imo).

In regards to the considerations you mentioned about status being applied to just that entity vs a group.. yes these will be specific to just that entity and won't represent a group.

Design woes.... :D


Post #1488241
Posted Sunday, August 25, 2013 2:41 PM


SSC-Forever

SSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-Forever

Group: General Forum Members
Last Login: Yesterday @ 3:30 PM
Points: 41,531, Visits: 34,448
AVB (8/25/2013)
At some point 95% of shipments and invoices will move into a final status (Paid, Shipment Delivered) so I would think indexes wouldn't be useful.


If you frequently query for non-final statuses, those indexes will potentially be very useful indeed.

p.s. Have you tested and confirmed that the time required for the join is noticeable and unacceptable?



Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #1488243
Posted Sunday, August 25, 2013 2:44 PM


SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Friday, February 07, 2014 6:33 AM
Points: 496, Visits: 815
GilaMonster,

Thank you for the reply as well.

The actual table utilizes a PK/FK relationship for other tables associated based on the "EmployeeID" (really ShipmentId or InvoiceID) with cascade delete/update. I know the post was a little vague but I was trying to convey the idea without getting too specific early on in case it was a quick.. "You're doing WHAT?!" kind of response. :)
Post #1488244
Posted Sunday, August 25, 2013 2:49 PM


SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Friday, February 07, 2014 6:33 AM
Points: 496, Visits: 815
GilaMonster,

No testing has been done yet as I'm in a "merged" logical and physical design phase.

I'm probably hesitant of using the lookup table with a FK relationship because of the previous system I used had a huge MUCK table. Of course I wouldn't design it like that but it just begged the question if I can reduce the amount of joins why not? Joining 5+ tables to get a whole overview of a single shipment is cumbersome but there is a lot of data to be displayed and I don't know it's unavoidable.
Post #1488246
Posted Sunday, August 25, 2013 2:59 PM


SSC-Forever

SSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-Forever

Group: General Forum Members
Last Login: Yesterday @ 3:30 PM
Points: 41,531, Visits: 34,448
AVB (8/25/2013)
No testing has been done yet as I'm in a "merged" logical and physical design phase.


Beware premature optimisation (making non-standard, over-complex design decisions based on untested performance assumptions)

You're looking at making the database bigger, removing any update/delete constraints for these lookups, adding extra code, unusual check constraints in a non-standard way of doing things that could lead to unanticipated errors when changing data in the child table, with no testing to see if the foreign key/join design it'll be replacing is inefficient.



Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #1488249
Posted Sunday, August 25, 2013 4:51 PM


SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Friday, February 07, 2014 6:33 AM
Points: 496, Visits: 815
GilaMonster you make a good point as well.
Post #1488264
Posted Monday, August 26, 2013 9:09 AM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Today @ 12:16 AM
Points: 8,289, Visits: 8,742
If the status really is just an attribute of the main item rather than a separate entity that is related to it, the restriction on the set of values is a domain constraint; there are two standard ways of implementing domain constraints: (a) as an auxiliary table and a foreign key relationship with the main table pointing to the auxiliary table; and (b) as a row level check constraint without any auxiliary table (which doesn't call any UDFs which reference any data other than attributes in the row which are passed to them as parameters and literal constants held in the UDF's definition, which is a single statement). Method (a) is the used only when it's needed because the size of the attribute is bigger than the reference to a row in an auxiliary table, and putting the data in the main table would increase the row size enough to cause performance problem, since method (a) was is aimed at expressing entity relationships not at specifying attributes of an entity; method (b) is the norm. I think it's probably not a good idea to invent a hybrid, where you are implementing a domain constraint by an auxiliary table accessed through a check constraint. I wouldn't go as far as Gail and say use method (a) definitely, because you've said that tinyint is more appropriate than varchar(20) and I suspect that you probably have only half a dozen possible values, so it's going to be trivial to write an appropriate check constraint using no UDF at all and no auxiliary table (and you can index the main table on that attribute too, it's quite likely that you'll need to for performance). But I agree completely when she says in effect that trying to design optimisations too early is not a good approach - implement one of the standard approaches, if it doesn't perform try the other, and only when that also doesn't perform try to optimise by doing something new. That wouldn't have been good advice 33 years ago, when there was very little knowledge of good structures and algorithms built from experience in RDBMS, but a third of a century on there is a lot of history to learn from and well established standard practise to follow when there's no good reason not to. Maybe that would not be good advice in a research situation either, but I don't imagine that's where you are.

Tom
Post #1488418
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse