RE: Defining keys as sets that must not intersect instead of scalar values that must not be equal

December 19, 2017 at 1:03 pm

#1972782

Don Halloran - Tuesday, December 19, 2017 10:19 AM
TomThomson - Tuesday, December 19, 2017 9:54 AM
TomThomson - Tuesday, December 19, 2017 9:49 AM
assuming you can identify each individual "thing" whose multiple versions can't overlap in time [...]. So the problem you claim to have for avoiding overlapping intervals is imaginary.
Your initial assumption begs the question. The whole point of my claim is that the interval is part of what identifies the thing.
You said: "It doesn't give an example where a count is required to be zero, because in that case an absence of rows won't cause a problem: if there are no relevant rows there are certainly none that overlap"
...which implies a misunderstanding of what I'm suggesting, because that situation cannot possibly occur: if it did then we would have an entity with a key value of the empty set. We cannot have an employment agreement unless the employment agreement spans some interval of time. My claim, very precisely, is that the interval of time, plus the person, functionally determines all other attributes of the employment agreement. Then, by the relational definition of a key, this set of {date, person} tuples is the natural key for the employment agreement.
If we knew that the granularity only needed to extend to the date level, then this extensional definition would be sufficient. But as you said - repeating what I originally said - this isn't always suitable, because some of these sets need to be intensionally defined. It would be infeasible to extensionally define "instants in time". But whether we define the set intensionally or extensionally doesn't make any difference to the concept, because the concept is still that of a set acting as a key.

Begs the question? There is some set of intervals which must not overlap. End of story - if I don't know which intervals are members of that set, I can't enforce non-overlapping whether I do it declaratively or somehow else, and neither can you. So either we have some way of determining which intervals are in the set, and therefore can count the overlaps (if any), or you don't have any realisable model at all. End of story! I can do it declaratively if I can have the set of intervals; if I don't have the set of intervals, you don't have any sort of database.

There are no relevant rows - surely to anyone it's utterly obvious that "relevant rows" means rows in the set generated from pairs of overlapping intervals within a set which is supposed not to overlap, not rows in the original set!
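A rough sketch of what "counting the overlaps" means here, in Python rather than SQL. The rows, the names (`agreements`, `overlapping_pairs`) and the half-open interval convention (start inclusive, end exclusive) are all assumptions made for illustration, not anything from the thread. The "relevant rows" are the pairs this function generates; the constraint holds exactly when it returns nothing.

```python
from datetime import date
from itertools import combinations

# Hypothetical sample rows: (agreement_id, person, start, end), end exclusive.
agreements = [
    ("A1", "tom", date(2001, 1, 1), date(2001, 7, 1)),
    ("A2", "tom", date(2001, 7, 1), date(2002, 1, 1)),
    ("A3", "tom", date(2001, 12, 1), date(2002, 6, 1)),
    ("B1", "don", date(2001, 3, 1), date(2001, 9, 1)),
]

def overlapping_pairs(rows):
    """Return id pairs of rows for the same person whose intervals overlap."""
    return [
        (a[0], b[0])
        for a, b in combinations(rows, 2)
        # Same person, and each interval starts before the other one ends.
        if a[1] == b[1] and a[2] < b[3] and b[2] < a[3]
    ]

violations = overlapping_pairs(agreements)
print(violations)  # an empty list would mean the constraint is satisfied
```

With this sample data only A2 and A3 are reported: A1 and A2 merely touch at the boundary, and B1 belongs to a different person.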

A set of Date, Person tuples is precisely the thing that at the front of your reply you said couldn't exist (because you appear to say that there's no such thing as a Person). And claiming it's the key to an employment agreement is nonsense, since a date is a point in time, not an interval (unless you have storage and processing time to burn, and are willing to have 7,305 dates instead of just two to cover a 20 year employment for perhaps each of a few thousand employees). And even if it were an interval the set would not be an employment agreement, it would be multiple employment agreements.

I had about a dozen different employments by the European Commission at different dates and in different roles, with separate employment agreements for each. Clearly it would be nonsense for those to overlap (I couldn't be in Brussels and be chairing a meeting in the south of France both at the same time, could I?); and all sorts of other things (like where payment went, whether what I received from them was tax exempt or not, whether they paid directly for my travels and living expenses or gave me a fixed allowance for them, whether I was required to work in English or in French, whether I would do half or more of the job from home or have to be in the appropriate office(s) for all of it) were different in some respect for each employment (legally different, with effects separate in EU law and in British law - you don't expect to ignore the law with your database, I hope). But they were all supposed to be full time employment on the days I worked on them, so they could not (in theory) overlap on dates - and that is something you will need to be able to handle if you want to have a general capacity to represent employments.
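To put the storage point in concrete terms, here is a small Python sketch. The specific dates and the helper name `dates_in` are made up for illustration. It expands one 20 year interval into its extensional set of dates, then checks that testing two such sets for disjointness gives the same answer as comparing the two pairs of endpoints.

```python
from datetime import date, timedelta

def dates_in(start, end):
    """Extensional form: every date in the half-open interval [start, end)."""
    return {start + timedelta(days=n) for n in range((end - start).days)}

# One hypothetical 20 year employment, stored as two endpoints.
start, end = date(1998, 1, 1), date(2018, 1, 1)
as_set = dates_in(start, end)
print(len(as_set))  # 7305 dates (20 * 365 plus 5 leap days) versus 2 endpoints

# A second hypothetical interval that overlaps the tail of the first.
other_start, other_end = date(2017, 6, 1), date(2019, 1, 1)

# Set-based test and endpoint-based test agree on whether they are disjoint.
disjoint_by_sets = as_set.isdisjoint(dates_in(other_start, other_end))
disjoint_by_endpoints = not (start < other_end and other_start < end)
print(disjoint_by_sets, disjoint_by_endpoints)
```

The two tests agree, but the endpoint comparison does it with four values where the set comparison has to materialise thousands of dates per agreement.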

On the other hand, employment by two different employers can overlap on dates in various special circumstances (I've been there).

The problem with the set acting as a key is that it works provided you can look within the key, just as in a conventional relational system you can look at the set of named scalars (columns) that form the key. It doesn't work if you can't, just as the relational model would break down if you couldn't look at projections of the key instead of at the full key.
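A minimal Python illustration of "looking within the key", with two hypothetical set-valued keys `k1` and `k2` invented for the purpose. Treated as opaque values the keys are simply unequal, so an equality-based uniqueness check passes; the clash only shows up once the members of each set can be inspected.

```python
from datetime import date

# Two hypothetical set-valued keys, each a set of (date, person) tuples.
k1 = frozenset({(date(2001, 1, 1), "tom"), (date(2001, 1, 2), "tom")})
k2 = frozenset({(date(2001, 1, 2), "tom"), (date(2001, 1, 3), "tom")})

# Opaque comparison: the keys differ, so a scalar-style uniqueness
# constraint sees no problem.
unique_as_opaque_values = k1 != k2

# Looking inside the keys: they share a member, so the
# "keys must not intersect" rule is violated.
non_intersecting = k1.isdisjoint(k2)
shared = k1 & k2

print(unique_as_opaque_values, non_intersecting, sorted(shared))
```

The first check comes out true and the second false, which is the gap between "keys must not be equal" and "keys must not intersect".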

Let's be clear: I'm not saying that what you are aiming for can't be made to work. With enough care and persistence in beating out all (or even just enough of) the problems, it probably can - actually I think it's not just probably but very probably. But working on non-solutions to non-problems (like the overlap of periods one) isn't what you want to spend time on. Back in 1970 I was an academic looking at some information retrieval problems and I put a lot of effort in and got exactly nowhere because I didn't really understand how the existing (pre-relational) models had tackled that problem (the things I did know about back then were error management, programming language design and computation for physics and chemistry problems, not databases and info retrieval, and the things I lectured on were Fortran programming and mathematical algorithms).

I don't know if you've read Ted Codd's early papers on the relational model, but if you haven't you should read them - not to understand the relational model, but to understand why Codd arrived at it, what he was trying to achieve, and what effect it would have on the whole database idea as it was before then. Looking at some of the pre-relational stuff would be a good idea too, and so would looking at some of the modern stuff, and then thinking about how what those people were/are trying to do relates to what your new model is aiming at, and how your new model might be applicable to some of the problems that they were trying to solve (but I would recommend ignoring Chris Date and Fabian Pascal as far as possible). Of course for all I know you've already done all that reading and thinking, and if you have my comments are not much help.

Tom