RE: Toward Integrity - Part 2

SSCertifiable

Points: 5383

December 10, 2007 at 6:21 pm

Hi Timothy;

I'm glad you brought this topic up as it lets me expand a bit on the design tradeoffs I'm making as well as reiterate my position on the poor relational domain (data type) support in SQL.

First off, I'd like to make sure I understand your first suggestion of using integers. What I read into it is that you propose a "lookup" table that would contain the validated species name and a corresponding uniquely identifying numeric value that is used throughout the rest of the design for referring to species.

This is a typical design decision and one that I would normally recommend -- at least in the face of the current SQL Server product (e.g., 2005 and earlier... 2008 might be a different story -- more on that in a different article!) for the physical implementation. However, rather than truly solving the problem of constraining the species name to be a singular noun representing an animal, it simply moves the problem "one table deeper" (if I may speak very loosely). In other words, there's still nothing inherently preventing someone from populating the table with "1 - Dog" and "2 - Dogs".

On the other hand, creating this "lookup" table may be a way to make the data stewardship more controlled in that the people assigned to the population of these "foundational" data items can more easily find the tables they are responsible for. So here we see, specifically, a design tradeoff where we bias the design toward the "query writer end user" and away from the "data steward end user".

Of course, the other reason I chose this simplistic approach is that it makes the diagrams and example code somewhat more compact for the purposes of the article.

Now, with respect to the use of a UDT, that's another possibility that we, as physical implementers, should at least consider. The nice thing is that, done well, a UDT (perhaps using CLR code to back it) gets us part of the way toward simulating relational domains. We can theoretically define a strict set of operators that apply to the custom datatype, and may be able to specify the physical representation in the db table(s). I personally have avoided using them, though, due to the difficulties they impose with respect to changing table definitions and due to the way CLR was integrated into the engine. I would, however, like to hear if anyone has a "success story" using UDT's, especially backed with CLR code.

TroyK