Many to Many join with date ranges

  • This problem is really foxing me and I'm struggling to think of a strategy to solve it, let alone write the code! I belive it's a fairly tricky problem, so any help you guys can give would be much appreciated.

    Two tables are linked by a person ID. They are Address and SportsClub. A person can be in a single club whilst living at mulitple addresses, or at a single address whilst at multiple clubs - or a combination of both. The first table below holds records of the clubs someone is a member of over time. The second table holds their postal adresses as they change over time.

    PersonClub[ClubFromDate][ClubToDate]

    14CB101/01/200101/02/2001

    14MK401/02/200101/03/2001

    14LD501/03/200101/08/2001

    14AA701/08/200101/02/2002

    PersonAddress[AddFromDate][AddToDate]

    14Cambs01/01/199501/04/2001

    14Bristol01/04/200101/12/2001

    14London01/12/200101/12/2007

    I need to build a single table containing the records for both, such as:

    PersonClubAddressClubFromDateClubToDateAddFromDateAddToDate

    14CB1Cambs01/01/200101/02/200101/01/199501/04/2001

    14MK4Cambs01/02/200101/03/200101/01/199501/04/2001

    14LD5Cambs01/03/200101/08/200101/01/199501/04/2001

    14LD5Bristol01/03/200101/08/200101/04/200101/12/2001

    14AA7Bristol01/08/200101/02/200201/04/200101/12/2001

    14AA7London01/08/200101/02/200201/12/200101/12/2007

    I think the solution in plain English is as follows:

    > Start with the first Club.

    If the ClubFromDate falls between AddFromDate and AddToDate, then display the row joined with that Address.

    Next, if the ClubToDate falls between AddFromDate and AddToDate, then display another row joined with that Address.

    > Loop through each Club.

    I can't see how to do this in SQL and any solution I come up with will be very clunky and probably use a cursor - however these tables have millions of rows and performance is a bit of an issue. I look forward to hearing your creative ideas and thanks again in advance!

  • Something like this?

    select s.Person,

    s.Club,

    a.Address,

    s.ClubFromDate,s.ClubToDate,

    a.AddFromDate,a.AddToDate

    from SportsClub s

    inner join Address a on a.Person=s.Person

    and a.AddToDate > s.ClubFromDate

    and a.AddFromDate < s.ClubToDate

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • It does not look too difficult. Maybe I am missing something. Check this out.

    Create table #Person (person int, Club varchar(5), ClubFromDate datetime, ClubToDate datetime)

    GO

    Create table #Address (person int, Address varchar(25), AddFromDate datetime, AddToDate datetime)

    GO

    insert into #Person

    Select 14, 'CB1', '01/01/2001', '01/02/2001'

    UNION ALL

    Select 14, 'MK4', '01/02/2001', '01/03/2001'

    UNION ALL

    Select 14, 'LD5', '01/03/2001', '01/08/2001'

    UNION ALL

    Select 14, 'AA7', '01/08/2001', '01/02/2002'

    insert into #Address

    Select 14, 'Cambs', '01/01/1995', '01/04/2001'

    UNION ALL

    Select 14, 'Bristol', '01/04/2001', '01/12/2001'

    UNION ALL

    Select 14, 'London', '01/12/2001', '01/12/2007'

    Select a.Person, Club, Address, ClubFromDate, ClubToDate, AddfromDate, AddtoDate from #person as a

    JOIN #Address as b

    ON a.Person = b.Person and (a.ClubfromDate between b.Addfromdate and b.AddToDate or a.ClubToDate between b.Addfromdate and b.AddToDate)

    The result looks like what you wanted.

    -Roy

  • Mark's solution is the one I would use. Should do exactly what you need.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • There are 5 overlap conditions between a Club interval and and Address interval which means that they occur during the same interval. This is easiest shown graphically with an interval for the Club and then identify the Address intervals that overlap:

    Club Interval|----------|

    1 Address Equal|----------|

    2 Address within |-----|

    3 Address Before |------|

    4 Address after |-----|

    5 Address complete |------------------|

    The SQL for these overlappig conditions is:

    SELECTAddress.PersonId

    ,Club.ClubId

    ,Address.AddressDescr

    ,Club.FromDate , Address.FromDate, Club.ToDate , Address.ToDate

    --Greater of the From Dates

    ,CASE WHEN Address.FromDate < Club.FromDate then Club.FromDate

    else Address.FromDate

    end as FromDate

    --Lesser of the To Dates

    ,CASE WHEN Address.ToDate > Club.ToDate then Club.ToDate

    else Address.ToDate

    end as ToDAte

    FROM Club

    joinAddress

    on Address.PersonId= Club.PersonId

    WHERE(( Address.FromDate = Club.FromDate AND Address.ToDate = Club.ToDate ) -- 1 Address Equal CLub

    or( Address.FromDate between Club.FromDate AND Club.ToDate

    AND Address.ToDate between Club.FromDate AND Club.ToDate ) -- 2 Address within CLub

    or( Address.FromDate < Club.FromDate

    AND Address.ToDate between Club.FromDate AND Club.ToDate ) -- 3 Address starts Before Club and ends within

    or( Address.FromDate between Club.FromDate AND Club.ToDate

    AND Address.ToDate >= Club.ToDate ) -- 4 Address starts within Club and ends after

    or(Address.FromDate Club.ToDate ) -- 5

    )

    SQL = Scarcely Qualifies as a Language

  • All five of those are covered by the conditions in Mark's query. I've used that method before, and it works.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • SQL Code for condition 5 got messed up and the Greater than and Less Than signs are making the code disappear - just replace GT and LT with the approriate symbols.

    or(Address.FromDate LT Club.FromDate AND Address.ToDate GT Club.ToDate ) -- 5

    [/]

    SQL = Scarcely Qualifies as a Language

  • Carl Federl (5/13/2008)


    There are 5 overlap conditions between a Club interval and and Address interval which means that they occur during the same interval. This is easiest shown graphically with an interval for the Club and then identify the Address intervals that overlap:

    Club Interval|----------|

    1 Address Equal|----------|

    2 Address within |-----|

    3 Address Before |------|

    4 Address after |-----|

    5 Address complete |------------------|

    The SQL for these overlappig conditions is:

    SELECTAddress.PersonId

    ,Club.ClubId

    ,Address.AddressDescr

    ,Club.FromDate , Address.FromDate, Club.ToDate , Address.ToDate

    --Greater of the From Dates

    ,CASE WHEN Address.FromDate < Club.FromDate then Club.FromDate

    else Address.FromDate

    end as FromDate

    --Lesser of the To Dates

    ,CASE WHEN Address.ToDate > Club.ToDate then Club.ToDate

    else Address.ToDate

    end as ToDAte

    FROM Club

    joinAddress

    on Address.PersonId= Club.PersonId

    WHERE(( Address.FromDate = Club.FromDate AND Address.ToDate = Club.ToDate ) -- 1 Address Equal CLub

    or( Address.FromDate between Club.FromDate AND Club.ToDate

    AND Address.ToDate between Club.FromDate AND Club.ToDate ) -- 2 Address within CLub

    or( Address.FromDate < Club.FromDate

    AND Address.ToDate between Club.FromDate AND Club.ToDate ) -- 3 Address starts Before Club and ends within

    or( Address.FromDate between Club.FromDate AND Club.ToDate

    AND Address.ToDate >= Club.ToDate ) -- 4 Address starts within Club and ends after

    or(Address.FromDate Club.ToDate ) -- 5

    )

    There are 2 scenarios where Club and Address do *not* overlap

    Club Interval |----------|

    1 Address Before |------|

    2 Address After |-----|

    Scenario 1 is where address end date is earlier than club start date, in other words

    Address.AddToDate < SportsClub.ClubFromDate

    Scenario 2 is where address start date is after than club end date, in other words

    Address.AddFromDate > SportsClub.ClubToDate

    For no overlap, either one of these conditions must be true

    Address.AddToDate < SportsClub.ClubFromDate OR

    Address.AddFromDate > SportsClub.ClubToDate

    Therefore the condition for overlap is the inverse of the above case. Here I'm

    ignoring the equality case ( > vs. >= ) as this hasn't been specified by the OP

    as to whether this counts as overlap or not.

    NOT (Address.AddToDate < SportsClub.ClubFromDate OR

    Address.AddFromDate > SportsClub.ClubToDate)

    which is equivalent to

    Address.AddToDate > SportsClub.ClubFromDate AND

    Address.AddFromDate < SportsClub.ClubToDate

    This gives the solution I posted.

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Thanks a lot for that guys, much appreciated. I'm going to have to sit down and understand the code, but it seems okay at first glance.

  • On another thread I posted my X-paradigm for easily remembering how to code if two dates overlap. Thus two intervals S1-F1 and S2-F2 overlap if S1 < F2 and S2 < F1.

    S1 F1

    X

    S2 F2

  • There are 2 temporal designs for how to define a period's end:

    1. The end is one time unit less than "next" begin. This is the "closed-closed" design.

    2. The end is equal to the same "next" begin. This is the "closed-open" design.

    Reference "Developing Time-Oriented Database Applications in SQL" at

    http://www.cs.arizona.edu/people/rts/tdbbook.pdf

    The time unit depends on the datatype chosen:

    For datetime, 3 milliseconds

    For smalldatetime, 1 hour

    For date, 1 day

    For datetime2, variable with a maximum precision of seven decimal places for seconds. ( 0.0000001 )

    In Tom Clark's example data, the ToDate is the same as the "next" FromDate:

    Person Address [AddFromDate] [AddToDate]

    14 Cambs 01/01/1995 01/04/2001

    14 Bristol 01/04/2001 01/12/2001

    14 London 01/12/2001 01/12/2007

    [\code]

    Hence, this is the "closed-open" design.

    Mark 's solution works with a "closed-closed" design but does not return correct results under a "closed-open" design. My solution works only with "closed-open" design.

    Comparing the two solutions, the "closed-closed" design has significantly less complicated SQL but can is more confusing for personnel entering the period end units.

    SQL = Scarcely Qualifies as a Language

  • Mark's solution does work with the "closed-open" design, it's just a question of whether you use "<=" or "<" in the join.

    If you need it, I can provide the mathematical proof for this. It's very simple, 9th/10th grade Geometry in a one-dimensional space.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Finally had time to examine Mark's "not excluded" solution in detail and it produces exactly the same results as the more complicated included solution.

    Reference: "Propositional Login" http://en.wikipedia.org/wiki/Propositional_logic[\url]

    Given Constraints:

    1)PersonAddressFromTs < PersonAddressToTs

    2)PersonClubFrom< PersonClubToTs

    For closed-closed intervals:

    AND NOT ( PersonAddressToTs < PersonClubFromTs

    OR PersonAddressFromTs > PersonClubToTs

    )

    Applying De Morgan's Theorem (2)

    AND NOT ( PersonAddressToTs < PersonClubFromTs )

    AND NOT ( PersonAddressFromTs > PersonClubToTs)

    Changing "Not Comparison" to "Reverse comparison", e.g.

    "NOT Lesser Than" to "Greater than or Equal"

    "NOT Greater Than" to "Less than or Equal"

    AND PersonAddressToTs >= PersonClubFromTs

    AND PersonAddressFromTs <= PersonClubToTs

    For closed-open intervals:

    AND NOT ( PersonAddressToTs <= PersonClubFromTs

    OR PersonAddressFromTs => PersonClubToTs

    )

    Can be transformed to :

    AND PersonAddressToTs > PersonClubFromTs

    AND PersonAddressFromTs < PersonClubToTs

    I have looked at about 50 temporal join solution and all use the complex include solution instead of the much simpler "not excluded" solution.

    For a schema containing 1000 Person, 3 Clubs for each person and 1 to 5 Addresses per Person, the "not included" and the transformed version, under SQL Server 2008, all produce identical execution plans. Seems the optimizer does know De Morgan's Theorems

    SQL = Scarcely Qualifies as a Language

  • Wow, that reminded me of some long forgotten university lectures. Nice work!

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply