Explicit Columns Or NULL Foreign Keys?

  • Took me ages to think of a subject for this thread, and the one I've ended up with is pretty terrible! So let me explain the question:

    I have a table called Appointment, for storing (you guessed it) appointments at a medical practice. If an appointment is cancelled, I want to collect a cancellation reason, so let's say that I create a second table called Cancellation which has a foreign key relating to the Appointment table's primary key, AppointmentID, and another column, Reason. Now, in order to indicate that an appointment was cancelled, I could include a Cancelled column in the Appointment table with a bit datatype, or instead I could infer that an appointment must be cancelled if it has a corresponding record in the Cancellation table.

    It may be that it'd be better to store the cancellation reason in the Appointment table - feel free to tell me why I should or shouldn't. But regardless, let's say I stick with the two-table solution described above, and I subsequently want to write a query to list all appointments which have been cancelled. If I had the Cancelled column in the Appointment table, I could simply query for all records in that table where that column's value was FALSE. If I went the other way and DIDN'T have a Cancelled column, I could instead write a query joining the Appointment and Cancellation tables to return all records in Appointment with a corresponding record in Cancellation.

    That latter method, whilst slightly more complicated because it involves joining two tables, seems to me to be the most normalised. Instead of storing the fact that an appointment is cancelled in two different tables, that fact is only stored in the Cancellation table. My question, really, though, is whether that's a good idea. Would there be a performance hit in using the two-table, 'inferred cancellation' method rather than just having a bit column in the Appointment table? Would that performance hit be enough to persuade you to use a Cancellation column in the Appointment table instead? And what about if I were to apply that method to other things associated with each appointment, e.g. Is it completed? Is it chargeable to the client or an insurance company? Is the client and in-patient or out-patient?

    I've designed quite a few databases before but they've all been relatively small. What best practice is, and how things might work when scaling up the size of the database, are still things I'm learning.

  • Interesting question.

    At first blush, I think I'd probably put the cancellation reason right in the appointment database. The reason being, why put it in a related table? It's always going to be one to one, right? A given appointment can't have more than one cancellation reason, I'd think.

    But, let's talk about the two table solution. Without getting too far into the weeds talking about the number of rows stored on the leaf level of a clustered index, etc., there is a little more overhead for SQL Server to make a join to another table. But, it's a pretty trivial overhead. As long as things are appropriately indexed and the query is filtered, I don't mind joining 20-30 tables in a single query. So the worries on the join are, in my opinion (which are like body parts, every has one & they all stink), not that big a deal. You could pretty easily choose to identify canceled appointments by the JOIN, especially since it's an INNER JOIN as currently defined. But then, finding all non-cancelled appointments, the more common query, yes, that's going to be an OUTER JOIN, potentially slower. Which, makes another point for just storing it locally on the Appointment table.

    I'm absolutely against storing it in both places. I think that's a waste of time & processing.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • I'm not sure about the best option, but I would say that using a single table adding columns for cancellation information.

    I wouldn't use a bit (it might be the appropriate type) and use a nullable datetime column instead to keep track of the cancellation date. It might become useful in the future to get reports about cancellations. I'm not saying you must change it, but you could consider this option. 😉

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Thanks very much for the feedback, both of you. I guess what puts me off putting the cancellation information in the Appointment table is probably a neurosis that dislikes having columns in tables which will be NULL for most records. I know that's often unavoidable, but even though it's true that the relationship between appointments and cancellations would always be one-to-one, it somehow just doesn't feel right to have columns relating to cancellations in a table where the vast majority of appointments won't be cancelled.

    But you've reasoned on it very well so maybe I just need to get over myself!

  • Instead of setting Cancellation, how about AppointmentStatus. Then it's a series of things it could be, tentative, open, closed, cancelled.. other stuff. And it doesn't have to be null... but then, that would actually work better in a separate table since any given appointment will have multiple statuses (statusi?)... more than one status.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Thanks Grant. We actually started from a plan of having an appointment status, but it quickly gets complicated, especially as many statuses aren't mutually exclusive. I've gone with that solution in the past and found that statuses need adding or changing fairly frequently, that people need diagrams to explain the order of the statuses ('Cancelled' is an easy status, but others aren't so obvious), and that the application's workflow tends to be built around the status, leading to issues if the statuses change subsequently.

  • I would say from experience of how these things can go wrong that I would actually go for a mix of the two...

    Have a Cancelled column (Y/N) (and potentially a CancelledDate (datetime)) on the Appointment table, with a separate CancellationReasons table to log the given reasons - because let's face it, there could be more than one reason per cancellation/appointment, and even if there weren't, a cancellation reason could be argued to be related information, not directly Appointment information.

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply