SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


FK vs Check Constraint for Lookup/Reference Tables


FK vs Check Constraint for Lookup/Reference Tables

Author
Message
AVB
AVB
SSCrazy
SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)

Group: General Forum Members
Points: 2338 Visits: 820
I often see lookup tables used to reference some sort of description (status,mode etc..) with a FK relationship to ensure data integrity. I've been thinking about replacing the normal design of linking IDs with storing the actual Status Description in the parent table and using a UDF within a check constraint to ensure the data is accurate based on a reference table. This would save time of having to join tables unnecessarily while still enforcing data integrity. As long as the Status is only used within that single table this seems like a good idea. Obviously if it was shared among multiple tables this wouldn't be a good fit.

Pros, cons, suggestions? It "seems" like a good idea but is it? TIA


Traditional Model:
tEmployees (EmployeeID int,StatusID int)
tEmployeesStatus (StatusID int,StatusDescription (varchar(20))

Proposed Model:
tEmployees (EmployeeID int,StatusDescription varchar(20))
tEmployeesStatus (StatusDescription varchar(20))

Create a UDF that validates the data being entered into the StatusDescription field:

Create FUNCTION [dbo].[fnEmployeeStatusList](@EmployeeStatus varchar(20))
RETURNS char(1)
AS
BEGIN

Declare @Exists char(1)

SELECT @Exists = [StatusDescription ] from tEmployeeStatus Where [StatusDescription ] = @EmployeeStatus
IF @Exists > '' BEGIN SET @Exists = 'Y' END
IF ISNULL(@Exists,'') = '' BEGIN SET @Exists = 'N' END
-- Return the result of the function
RETURN @Exists

END

ALTER TABLE [dbo].[tEmployees] WITH CHECK ADD CONSTRAINT [CK_tEmployees_Status]CHECK (([dbo].[fnEmployeeStatusList]([EmployeeStatus])='Y'))
GO

ALTER TABLE [dbo].[tEmployees CHECK CONSTRAINT [CK_tEmployees_Status]
GO
Tom Thomson
Tom Thomson
One Orange Chip
One Orange Chip (25K reputation)One Orange Chip (25K reputation)One Orange Chip (25K reputation)One Orange Chip (25K reputation)One Orange Chip (25K reputation)One Orange Chip (25K reputation)One Orange Chip (25K reputation)One Orange Chip (25K reputation)

Group: General Forum Members
Points: 25523 Visits: 12494
I imagine that calling that multi-line UDF for each update/insert operation on the table will be a pretty horrifying overhead, but it's easy enough to rewrite it as a single statement UDF. Another issue is storage: if several employees shares a status, the old model stores that status just once, while your new model stores it several times; that may not matter is the number of employees isn't large. In addition, either there's going to be a search on status for each update/insert operation, which may need to be supported by an index if there are enough distinct statuses; the old model has a index on statusId, which is an int, which suggests a lot of distinct statuses - otherwise why isn't it smallint (or even tinyint)? - so indexing a 20-bye status as required in the new model will cost some extra storage. But it could improve the performance overall provided the employee table is read far more often than it is updated, and usually has to be joined with the status table every time it's read, and there aren't too many employees, and not too many employees have the same status as each other, and the size of the status isn't going to grow too much in future.

But there are considerations other than the performance of employee table manipulations. Is an individual status value something that applies to a class of employees, rather than to a single employee - ie is an employee status a business-world entity in its own right, that may change wile remaining applicable to the same group of employees, or are statuses just an attribute of an employee, ie something which, when it changes, normally affects just one employee? That may be a more important consideration than performance, because getting the table structure to reflect the business world is generally a good thing, and if status changes that affect many employees simultaneously are common that will be a performance hit against the new model. If in fact the status changes always affect only a single employee the status is an attribute of an employee so it's certainly better design to hold the status in the main table, and the only reason for splitting it out (which is something that people like me regard as dangerous, because someone may carelessly update something that affects many employees when they intended to affect only one) would be to improve performance. On the other hand, if it's a separate entity in its own right, it's certainly better design to put it into a separate table, and the only reason for pulling it into the employee table (which is something everyone calls denormalization) would be to improve performance.

At the end of the day it comes back to one thing: what are the business considerations - all the business considerations, not just performance, which makes quite a long list. Those include flexibility, ease of development, support cost, and a lot things which at first site aren't schema design issues but on proper consideration are; performance is somewhere on that list, but it usually isn't top by any means.

edit: your check constraint confuses a parameter's name with its value, so it wont work unless that is fixed.

Tom

GilaMonster
GilaMonster
SSC Guru
SSC Guru (217K reputation)SSC Guru (217K reputation)SSC Guru (217K reputation)SSC Guru (217K reputation)SSC Guru (217K reputation)SSC Guru (217K reputation)SSC Guru (217K reputation)SSC Guru (217K reputation)

Group: General Forum Members
Points: 217593 Visits: 46278
That check constraint won't prevent rows in tEmployeesStatus from being deleted/updated if there's a matching row in the child table. A foreign key will (or will cascade the delete/update if required)
The check constraint doesn't give the optimiser extra information about the relationship, a foreign key will.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass


AVB
AVB
SSCrazy
SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)

Group: General Forum Members
Points: 2338 Visits: 820
L' Eomot Inversé,

Thank you for the reply.

This was a generic example but a few actual examples would be statuses applied to invoices and shipments, both of which will have a significant amount of data. The uniqueness of the status will be diluted as the table grows and more importantly as shipments and invoices move along through their respective process. At some point 95% of shipments and invoices will move into a final status (Paid, Shipment Delivered) so I would think indexes wouldn't be useful. Queries will be made based on these statuses quite often as will updates to these status will be common.

Yes the statuses won't really grow and there aren't that many of them (1/2 dozen maybe). A tinyint would be a better fit and I copied from a working version but I hastily renamed some of the code for this post (hence the non-working version I posted).

Yes they will need to be joined almost always because it's vital info to know almost every time a request is made.

In regards to duplicate data... If I choose to store a StatusID multiple times in the table vs the actual name, it's still duplicate data (imo).

In regards to the considerations you mentioned about status being applied to just that entity vs a group.. yes these will be specific to just that entity and won't represent a group.

Design woes.... BigGrin
GilaMonster
GilaMonster
SSC Guru
SSC Guru (217K reputation)SSC Guru (217K reputation)SSC Guru (217K reputation)SSC Guru (217K reputation)SSC Guru (217K reputation)SSC Guru (217K reputation)SSC Guru (217K reputation)SSC Guru (217K reputation)

Group: General Forum Members
Points: 217593 Visits: 46278
AVB (8/25/2013)
At some point 95% of shipments and invoices will move into a final status (Paid, Shipment Delivered) so I would think indexes wouldn't be useful.


If you frequently query for non-final statuses, those indexes will potentially be very useful indeed.

p.s. Have you tested and confirmed that the time required for the join is noticeable and unacceptable?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass


AVB
AVB
SSCrazy
SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)

Group: General Forum Members
Points: 2338 Visits: 820
GilaMonster,

Thank you for the reply as well.

The actual table utilizes a PK/FK relationship for other tables associated based on the "EmployeeID" (really ShipmentId or InvoiceID) with cascade delete/update. I know the post was a little vague but I was trying to convey the idea without getting too specific early on in case it was a quick.. "You're doing WHAT?!" kind of response. Smile
AVB
AVB
SSCrazy
SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)

Group: General Forum Members
Points: 2338 Visits: 820
GilaMonster,

No testing has been done yet as I'm in a "merged" logical and physical design phase.

I'm probably hesitant of using the lookup table with a FK relationship because of the previous system I used had a huge MUCK table. Of course I wouldn't design it like that but it just begged the question if I can reduce the amount of joins why not? Joining 5+ tables to get a whole overview of a single shipment is cumbersome but there is a lot of data to be displayed and I don't know it's unavoidable.
GilaMonster
GilaMonster
SSC Guru
SSC Guru (217K reputation)SSC Guru (217K reputation)SSC Guru (217K reputation)SSC Guru (217K reputation)SSC Guru (217K reputation)SSC Guru (217K reputation)SSC Guru (217K reputation)SSC Guru (217K reputation)

Group: General Forum Members
Points: 217593 Visits: 46278
AVB (8/25/2013)
No testing has been done yet as I'm in a "merged" logical and physical design phase.


Beware premature optimisation (making non-standard, over-complex design decisions based on untested performance assumptions)

You're looking at making the database bigger, removing any update/delete constraints for these lookups, adding extra code, unusual check constraints in a non-standard way of doing things that could lead to unanticipated errors when changing data in the child table, with no testing to see if the foreign key/join design it'll be replacing is inefficient.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass


AVB
AVB
SSCrazy
SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)

Group: General Forum Members
Points: 2338 Visits: 820
GilaMonster you make a good point as well.
Tom Thomson
Tom Thomson
One Orange Chip
One Orange Chip (25K reputation)One Orange Chip (25K reputation)One Orange Chip (25K reputation)One Orange Chip (25K reputation)One Orange Chip (25K reputation)One Orange Chip (25K reputation)One Orange Chip (25K reputation)One Orange Chip (25K reputation)

Group: General Forum Members
Points: 25523 Visits: 12494
If the status really is just an attribute of the main item rather than a separate entity that is related to it, the restriction on the set of values is a domain constraint; there are two standard ways of implementing domain constraints: (a) as an auxiliary table and a foreign key relationship with the main table pointing to the auxiliary table; and (b) as a row level check constraint without any auxiliary table (which doesn't call any UDFs which reference any data other than attributes in the row which are passed to them as parameters and literal constants held in the UDF's definition, which is a single statement). Method (a) is the used only when it's needed because the size of the attribute is bigger than the reference to a row in an auxiliary table, and putting the data in the main table would increase the row size enough to cause performance problem, since method (a) was is aimed at expressing entity relationships not at specifying attributes of an entity; method (b) is the norm. I think it's probably not a good idea to invent a hybrid, where you are implementing a domain constraint by an auxiliary table accessed through a check constraint. I wouldn't go as far as Gail and say use method (a) definitely, because you've said that tinyint is more appropriate than varchar(20) and I suspect that you probably have only half a dozen possible values, so it's going to be trivial to write an appropriate check constraint using no UDF at all and no auxiliary table (and you can index the main table on that attribute too, it's quite likely that you'll need to for performance). But I agree completely when she says in effect that trying to design optimisations too early is not a good approach - implement one of the standard approaches, if it doesn't perform try the other, and only when that also doesn't perform try to optimise by doing something new. That wouldn't have been good advice 33 years ago, when there was very little knowledge of good structures and algorithms built from experience in RDBMS, but a third of a century on there is a lot of history to learn from and well established standard practise to follow when there's no good reason not to. Maybe that would not be good advice in a research situation either, but I don't imagine that's where you are.

Tom

Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search