RE: Defining keys as sets that must not intersect instead of scalar values that must not be equal

SSC Guru

Points: 104773

December 19, 2017 at 9:54 am

#1972732

TomThomson - Tuesday, December 19, 2017 9:49 AM
Don Halloran - Monday, December 18, 2017 10:37 PM
It is quite common to encounter the problem of needing multiple versions of a thing, which cannot overlap in time. For example in a data warehouse with a slowly changing dimension, the interval represented by a version cannot overlap with the interval representing any other version. Another example might be the case of an employment relationship, where one person can only be employed by a company once at any given instant in time.
The problem with this sort of rule is that it cannot be declaratively enforced. I can define the interval via its start date and end date, but I can put no kind of declarative constraint on the start date column, nor the end date column, to ensure that two intervals in the table do not overlap.
But there's another way of thinking about the start date/end date definition, and that is to think of it as a set of times which has been intentionally defined - ie, defined by a rule. Given these two dates, you can check whether any point in time is a member of "the set of all times between the specified dates". An easier way to think about might be to think about what would happen if we changed this to an extensionally defined set, ie, one in which the set is defined by listing every member. I could create a table containing an employment agreement and a date. Then I look at the start and end date of the employment agreement, and put a row in this table for every date >= the start and < the end date. Then we could say that the combination of those two columns is a key that ensures there are no overlaps.
Of course, the problem with that approach is that I have turned my intentional definition into an extensional definition, and the members aren't the same. I had to pick some level of granularity for the extensional definition. I happened to pick whole days, but what happens when someone says "actually, they can be the same day, but not the same hour", and then later "actually, they can be the same hour, just not the same second", and so on. But the purpose of the example is to demonstrate that the problem of "non overlapping intervals of time" is in fact the same kind of problem as the contact problem above: We would like to define a key as being composed of a set, which in this case is intentionally defined compared to the extensional definition in the case of contact protocol endpoints. In other words, if it were able to define set-based keys, defined either intentionally or extensionally, then the problem of creating a constraint which disallows overlapping intervals in time is the same as creating a constraint which disallows the intersection of two sets.
This is easily solved in a declarative system, assuming you can identify each individual "thing" whose multiple versions can't overlap in time. It's then a matter of defiing an appropriate counting function (that counts the number of pairs of overlapping periods for an individual thing) and a check constraint that requires the result of that function to be zero. Of course if you can't identify the individual "thing"s then you don't have any sort or usable database at all. Current MS documentation of SQL Server's CHECK Constraints gives a nice example of a sort of similar function and the problem that's caused if you try to require row counts to be non-zero (if you don't have any rows there are no rows with counts that are not non-zero, so the constraint gives UNKNOWN instead of FALSE and therefor doesn't work because CHECK constraints object ony to FALSE). It doesn't give an example where a count is required to be zero, because in that case an absence of rows won't cause a problem: if there are no relevant rows there are certainly none that overlap, and if there are relevant rows the check will deliver TRUE if there are no overlaps and FALSE otherwise. So the pronblem you claim to have for avoiding overlapping intervals is imaginary. And of course the omly granularity involved is the granularity of the datetime type you choose to use for the times, and inventing all the rows needed to cover all intervals would just blow your database up into a rdiculous (and pointlessly) large size compared to storing only those intervals that are actually relevant in the real world because something relevant actually changed at the ends of them.

Tom