Toward Integrity - Part 2

  • Comments posted to this topic are about the item Toward Integrity - Part 2

  • I like the article over all and it is definitely addressing a significant topic, but unless I am missing something, you do have relatively an easy way to enforce the constraints on species of ensuring it is always singular.

    My personal preference would be to have the species field be an integer and give the species table an integer primary key and then establish the foreign key relationship.

    If you wanted to stick with a varchar data type and avoid table proliferation you could add a check constraint requiring that species be within an approved list. Another option would be to create a user defined type, but that is more complicated and carries more baggage than either a look up table or a constraint.

    ---
    Timothy A Wiseman
    SQL Blog: http://timothyawiseman.wordpress.com/

  • Hi Timothy;

    I'm glad you brought this topic up as it lets me expand a bit on the design tradeoffs I'm making as well as reiterate my position on the poor relational domain (data type) support in SQL.

    First off, I'd like to make sure I understand your first suggestion of using integers. What I read into it is that you propose a "lookup" table that would contain the validated species name and a corresponding uniquely identifying numeric value that is used throughout the rest of the design for referring to species.

    This is a typical design decision and one that I would normally recommend -- at least in the face of the current SQL Server product (e.g., 2005 and earlier... 2008 might be a different story -- more on that in a different article!) for the physical implementation. However, rather than truly solving the problem of constraining the species name to be a singular noun representing an animal, it simply moves the problem "one table deeper" (if I may speak very loosely). In other words, there's still nothing inherently preventing someone from populating the table with "1 - Dog" and "2 - Dogs".

    On the other hand, creating this "lookup" table may be a way to make the data stewardship more controlled in that the people assigned to the population of these "foundational" data items can more easily find the tables they are responsible for. So here we see, specifically, a design tradeoff where we bias the design toward the "query writer end user" and away from the "data steward end user".

    Of course, the other reason I chose this simplistic approach is that it makes the diagrams and example code somewhat more compact for the purposes of the article.

    Now, with respect to the use of a UDT, that's another possibility that we, as physical implementers, should at least consider. The nice thing is that, done well, a UDT (perhaps using CLR code to back it) gets us part of the way toward simulating relational domains. We can theoretically define a strict set of operators that apply to the custom datatype, and may be able to specify the physical representation in the db table(s). I personally have avoided using them, though, due to the difficulties they impose with respect to changing table definitions and due to the way CLR was integrated into the engine. I would, however, like to hear if anyone has a "success story" using UDT's, especially backed with CLR code.

    TroyK

  • You are right that using a look up table only moves the problem one level deeper, but in doing so, it may make it easier to control and limit the opportunity for mistakes. For instance, you may have vets joining and changing specialties often enough that some less knowledgeable user may need to have authority, but perhaps the look up table will need additions rarely enough for the DBA to handle it, or at least rarely enough that there will be fewer opportunities for typos there.

    The other piece of course is that with a look up table you could make front end so the user would choose from a menu instead of typing it in at all.

    ---
    Timothy A Wiseman
    SQL Blog: http://timothyawiseman.wordpress.com/

  • Hi Troy,

    Good series! I'd like to suggest that you include a bit more "real life" stucture to the model. Just enough to make the examples more "logically" realistic.

    For example, there must be attributes that would be specific to different species (dog vs. cat). Otherwise why bother modeling them as subtypes? What would be the usefullness of modeling them as subtypes if they had no distinguishing attributes? Such distinguishing attributes are crucial to the rational and usefullness for modeling a subtype.

    I'm sure we could all easily think of a few more typical data modeling "patterns" that appear repeatedly in real world work and that also typically are not handled well by many developers. Another would be optional attributes.

    The missing model elements actually make the example more difficult to understand, in my opinion, due to incompleteness.

    Thanks,

    Ken

  • kenambrose (12/12/2007)


    Hi Troy,

    Good series! I'd like to suggest that you include a bit more "real life" stucture to the model. Just enough to make the examples more "logically" realistic.

    For example, there must be attributes that would be specific to different species (dog vs. cat). Otherwise why bother modeling them as subtypes? What would be the usefullness of modeling them as subtypes if they had no distinguishing attributes? Such distinguishing attributes are crucial to the rational and usefullness for modeling a subtype.

    I'm sure we could all easily think of a few more typical data modeling "patterns" that appear repeatedly in real world work and that also typically are not handled well by many developers. Another would be optional attributes.

    The missing model elements actually make the example more difficult to understand, in my opinion, due to incompleteness.

    Thanks,

    Ken

    Hi Ken;

    Thank you for the comments... you're spot on.

    If you check the first article, (http://www.sqlservercentral.com/articles/Data+Modeling/61526/), you will see a little more information on the rules surrounding the Dog and Cat subtypes. For this example, the additional information of interest for a Dog is whether or not it's registered with the AKC, and for a Cat, whether or not it's been declawed.

    Given the assumption that the conceptual model is complete with respect to the jurisdiction and universe of discourse, we can conclude that for any other particular species, there is no additional information of interest -- hence only specifying Dog and Cat as subtypes of Pet.

    The 5th (and last) article in the series, due to be published in 3-4 weeks, will look at how we can model and implement the entity supertype/subtype "pattern" in SQL Server.

    In my opinion, the 3rd article (to be published this coming Monday, the 17th) is where things start to get really interesting for those that already have substantial modeling experience.

    Enjoy!

    TroyK

  • Ok, I think I got it.

    The "conceptual" level intentionally omits the distinguishing attributes for the subtypes. Those will be documented later in the "logical" model. Great!

    Will your series follow through with some implementation examples? Maintaining data, implementing UIs , and reporting from designs specifying subtypes has a lot of potential for difficulty when using Sql Server given that the products currently do not provide direct support for subtype implementation.

    Therefore the need to maintain procedural code (triggers etc.) to actually maintain data for such structures...

    Subtypes are a very useful and effective modeling abstraction, but quite awkward to implement in Sql Server IMHO. If the series can offer some examples for how to overcome Sql Server's inherent implementation hurdles in this area, I think it would add a lot to the discussion.

    Thanks,

    Ken

  • Hi Ken;

    This article series is concerned with modeling and implementing the model in such a way as to prevent "bad" data from entering the system (hence the name "Toward Integrity").

    As you point out, there is another side to the coin in that particular design and implementation choices we make will impact (usually in a siginificant way) the usability of the data from the query writer's perspective.

    I think it would be an excellent idea to take the various pattern implementations from these articles and drill down into some alternative implementations, taking a look at how they impact the ability to ask certain questions. I'll work on these as follow-ups for when this series is done.

    TroyK

  • Thanks Troy, I look forward to the implementation discussion.

    In regards to modeling approaches, I am reminded of a well known quip from the famous American comedian Will Rogers.

    During WWI, the u-boat problem was tremendous for the allies. Will suggested a solution- simply boil the oceans! Then the boats would be forced to the surface where they could be easily dealt with.

    When asked how it could be possible to boil the oceans, the gist of his reply was "I'm the theorist, you provide the implementation"...

    As a long time evangelist for investment in conceptual and logical modeling efforts I do my best to always address the implementation questions that inevitably arise.

    Best,

    Ken

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply