Needed to find all possible Dates Overlaps from a single Table

Question

Needed to find all possible Dates Overlaps from a single Table

Hell-1934

Old Hand

Points: 333
More actions
February 22, 2020 at 8:17 am

#3727882
Hi,
I have the following Episode table example:
Where DischDate = "2020-02-20" means - client hasn't been discharged.
My goal is to display all possible date range - AdmDate - DischDate - overlaps combinations for the Client (EHRClientFK)
I expect to display the values for EHRClientFK = 2, 3, 4, 5 (see picture below)
Values where EHRClientFK = 8, 9, 10 have less than 2 dates combinations
Values for EHRClientFK = 6, 7 have no overlaps
So I expect the result, like :
Here is my Template of all the Overlaps:
I wrote the following code to achieve my goal, but when I apply it on the "Real" table (with more than 20K records) - I still see some NON overlapping dates - (for example Value 2 with AdmDate1 = 2/10/20 DischDate1 = 2/12/20; AdmDate2 = 2/13/20)and also the values with just one date range AdmDate1, DischDate1 and no dates 2...
Please see my code below:
My question - how should I change my Where clause in order to only dispaly more than one date range for my Client + the Overlap Dates only?
```
SELECT 
           a.[EHRClientFK]
          ,a.[AdmDate]
          ,a.[DischDate]

    FROM [WH].[dbo].[Episode] a
    INNER JOIN [WH].[dbo].[Episode] b ON a.EHRClientFK = b.EHRClientFK 

    WHERE
         ((a.DischDate = '2020-02-20') AND (b.DischDate = '2020-02-20'))
      OR ((a.AdmDate <= b.DischDate) AND (b.AdmDate <= a.DischDate))


   GROUP BY
           a.[EHRClientFK]
          ,a.[AdmDate]
          ,a.[DischDate]

   HAVING COUNT (*) >1
```

Viewing 11 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

pietlinden SSC Guru Points: 63686 More actions · Answer 1

Seems like you should be able to join the (AdmitDate, DischargeDate) to a Calendar table to expand the date range to a set of dates, and then group by EHRClientFK. Then any with a count > 1 has an overlap on that day.

Jeff Moden SSC Guru Points: 1004749 More actions · Answer 2

Nice post and perfectly described but could you write the code without test data to know if you wrote the code correctly? Neither can we. 😉 Take the data you used to make this graphic and render it as readily consumable data in the form of code. Help us help you. Please see the first link in my signature line below for one way to do that.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jonathan AC Roberts SSCoach Points: 18405 More actions · Answer 3

Jeff, Here is the data in a consumable format. I couldn't work out from the question exactly what's wanted so I'll leave it to someone else.

;with Episode as
(
    select * from (values
    (2,convert(datetime,'20181010'),convert(datetime,'20200220')),
    (2,'20181012','20200220'),
    (4,'20190508','20200515'),
    (4,'20190510','20200220'),
    (3,'20190522','20190530'),
    (3,'20190523','20190604'),
    (5,'20181212','20181227'),
    (5,'20181214','20181222'),
    (6,'20200204','20200210'),
    (6,'20200212','20200215'),
    (7,'20200217','20200218'),
    (7,'20200219','20200220'),
    (8,'20200205','20200207'),
    (9,'20200405','20180410'),
   (10,'20170601','20200220')) T(EHRClientFK, AdmDate, DischDate)
)
SELECT a.[EHRClientFK],
       a.[AdmDate],
       a.[DischDate]
  FROM [Episode] a
 INNER JOIN [Episode] b ON a.EHRClientFK = b.EHRClientFK 
 WHERE ((a.DischDate = '2020-02-20') AND (b.DischDate = '2020-02-20'))
        OR ((a.AdmDate <= b.DischDate) AND (b.AdmDate <= a.DischDate))
 GROUP BY a.[EHRClientFK] ,a.[AdmDate] ,a.[DischDate]
HAVING COUNT (*) > 1;

Jeff Moden SSC Guru Points: 1004749 More actions · Answer 4

Jonathan AC Roberts wrote:

Jeff, Here is the data in a consumable format. I couldn't work out from the question exactly what's wanted so I'll leave it to someone else.

Except for a little bit of unnecessary criteria, it looks to me like you worked it out just fine, Jonathan. All I'm doing below is the same as you except I'm putting the test data into a table to demonstrate that the Cartesian Product can be avoided by the addition of an index and eliminating the extra criteria.

And thanks for posting the readily consumable data. We now have two different methods to show the OP how to help the people both understand the question better and make it easier for them to provide a working coded solution.

Here's my version of the readily consumable test data with a couple of explanations added in.

--=============================================================================
--      Create and populate the test table.
--      This is not a part of the solution.  We are just creating demonstrable,
--      readily consumable test data to test code with.
--=============================================================================
--===== If it exists, drop the test table to make reruns in SSMS easier.
     IF OBJECT_ID('TempDB..#Episode','U') IS NOT NULL
   DROP TABLE #Episode
;
GO
--===== Without knowing anything else about the table, create the test table
     -- with a guess at the datatypes.  Change then if necessary. The solution
     -- code should still work unless you do something totally wonky with the
     -- datatypes.
 CREATE TABLE #Episode
        (
         EHRClientFK    INT  NOT NULL
        ,AdmDate        DATE NOT NULL
        ,DischDate      DATE
        )
;
--===== Assuming that there''s more to this table than just those 3 columns
     -- and that a Clustered Index already exists on the table, add a UNIQUE
     -- Non-Clustered Index to the table to make it so that we don''t end up
     -- with a Cartesian Product due to scans in the upcoming query.
 CREATE UNIQUE NONCLUSTERED INDEX IX_Cover01
     ON #Episode (EHRClientFK, AdmDate, DischDate)
;
--===== Populate the test table with test data according to what the OP posted.
 INSERT INTO #Episode WITH (TABLOCK)
        (EHRClientFK, AdmDate, DischDate)
 VALUES  ( 2,'20181010','20200220')
        ,( 2,'20181012','20200220')
        ,( 4,'20190508','20200515')
        ,( 4,'20190510','20200220')
        ,( 3,'20190522','20190530')
        ,( 3,'20190523','20190604')
        ,( 5,'20181212','20181227')
        ,( 5,'20181214','20181222')
        ,( 6,'20200204','20200210')
        ,( 6,'20200212','20200215')
        ,( 7,'20200217','20200218')
        ,( 7,'20200219','20200220')
        ,( 8,'20200205','20200207')
        ,( 9,'20200405','20180410')
        ,(10,'20170601','20200220')
;

Here's one solution almost identical to Jonathan's but without the extra criteria. Please READ THE WARNING IN THE HEADER!

--=============================================================================
--      Solve the problem in a fashion similar to how Jonathan AC Roberts
--      solved it in his previous post on this thread.
--      To learn how it works, please see the following article on the subject.
-- https://www.sqlservercentral.com/articles/finding-%e2%80%9cactive%e2%80%9d-rows-for-the-previous-month-sql-spackle-1
--      NOTE THAT THIS CODE COULD BECOME A PERFORMANCE ISSUE IF THERE ARE A LOT
--      OF ROWS FOR ANY GIVEN EHRClientFK VALUE DUE TO MULTIPLE SMALLER 
--      CARTESIAN PRODUCTS.  IF THAT''S THE CASE, PLEASE POST BACK AND WE''LL USE
--      ANOTHER METHOD.
--=============================================================================
 SELECT  a.EHRClientFK
        ,a.AdmDate
        ,a.DischDate
   FROM ##Episode a
   JOIN ##Episode b  ON a.EHRClientFK = b.EHRClientFK 
                    AND a.AdmDate    <= b.DischDate
                    AND b.AdmDate    <= a.DischDate
  GROUP BY a.EHRClientFK, a.AdmDate, a.DischDate
 HAVING COUNT (*) > 1
;

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jonathan AC Roberts SSCoach Points: 18405 More actions · Answer 5

Jeff Moden wrote:

Jonathan AC Roberts wrote:
Jeff, Here is the data in a consumable format. I couldn't work out from the question exactly what's wanted so I'll leave it to someone else.
Except for a little bit of unnecessary criteria, it looks to me like you worked it out just fine, Jonathan.
--=============================================================================
--      Solve the problem in a fashion similar to how Jonathan AC Roberts
--      solved it in his previous post on this thread.

It's not my solution, I just pasted in the OP's SQL from the question.

Jeff Moden SSC Guru Points: 1004749 More actions · Answer 6

Heh... I definitely blew that one then. I didn't look at the OPs code closely enough to realize that you had provided a copy of the same code. I was also confused because it provided the correct answer according to the graphics the OP posted.

Anyway, thanks for the correction, Jonathan.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

ben.brugman SSChampion Points: 13369 More actions · Answer 7

Hell-1934 wrote:

In general when using the cases of overlap, people tend to forget specific cases because there multiple overlap situations. In the above I am missing for example overlaps where the second line the A becomes before the first line.

So testing for a 'NON' overlap is far simpler test if the end. In text:

If period A finishes before period B starts the periods do not overlap.

OR

If period B finishes before period A starts the periods do not overlap.

Do not overlap is true when:

A.finish <= B.start OR B.finish <= A.start

The 'inverse' of this test Overlap:

NOT (A.finish <= B.start OR B.finish <= A.start)

this is equivalent to (Overlap):

A.finish>B.start AND B.finish>A.start

Remark, NULL values do throw of the NOT in this reasoning (three valued logic), so get rid of the NULL's before using the above. If there is a NULL use a very early date for the start or a very late date for the finish.

Starting from the 'non overlap' test en progressing to the 'overlap' test makes the thinking process so much simpler, only two conditions have to be tested and the understanding is far simpler. This had been done wrong by plenty of people even people considered to be SQL Guru's at the top level (although that was Oracle).

Hope to have made the world a bit simpler with the above reasoning.

Ben

This reply was modified 6 years, 5 months ago by ben.brugman. Reason: Again the formatting is not briliant, sorry for that

Jason A. Long SSC-Insane Points: 23790 More actions · Answer 8

Here's an option that eliminates the need for a self join by using windowing functions (LAG & LEAD)

WITH 
cte_overlap AS (
SELECT 
e.EHRClientFK,
e.AdmDate,
e.DischDate,
Overlap = CASE 
WHEN 
e.AdmDate < LAG(e.DischDate, 1, '1900-01-01') OVER (PARTITION BY e.EHRClientFK ORDER BY e.AdmDate) 
OR
e.DischDate > LEAD(e.AdmDate, 1, '9999-12-31') OVER (PARTITION BY e.EHRClientFK ORDER BY e.AdmDate)
THEN 1 
ELSE 0 
END 
FROM
#Episode e
)
SELECT 
o.EHRClientFK,
o.AdmDate,
o.DischDate
FROM
cte_overlap o
WHERE 
o.Overlap = 1;

Hell-1934 Old Hand Points: 333 More actions · Answer 9

Thank you very much - ben.brugman + Jason A. Long!

These are the decisions that should definately work, I am going to try today.

I tried the previous authors but unfortunately - no difference with what I wrote and the same mistakes ((

Yes, case is probably more elegant (I knew but wasn't sure how to use it)

Displaying DO NOT overlap variations seems like very correct one!

Thx again!

This reply was modified 6 years, 5 months ago by Hell-1934.

Hell-1934 Old Hand Points: 333 More actions · Answer 10

Also, thank you - the authors above - next time when problem I'll read the "posting" rules.

As I am thrown between the different projects and systems (languages) and can't 100% be concentrated on sql, there will be problems, for sure , again -)