SQLServerCentral Article

Group Islands of Contiguous Dates (SQL Spackle)


"SQL Spackle" is a collection of short articles written based on multiple requests for similar code. These short articles are NOT meant to be complete solutions. Rather, they are meant to "fill in the cracks".

--Phil McCracken

Introduction

One of the more common questions I've run into is "How can I determine overlapping dates?" What most people seem to mean by that is "How can I group dates into 'Islands' of contiguous dates?"

Let's fill in this "crack" with our good friend, ROW_NUMBER.

Test Data

You just can't appreciate any code without some test data. As always, the details are in the code:

--=============================================================================
--      Create the test data.  This is NOT a part of the solution.
--      This is virtually instantaneous.
--=============================================================================
--===== Conditionally drop the test table to make reruns easier.
     IF OBJECT_ID('tempdb..#MyHead','U') IS NOT NULL
        DROP TABLE #MyHead
;
GO
--===== Create the test table
 CREATE TABLE #MyHead
        (SomeDate DATETIME)
;
--===== Populate the test table with test data
 INSERT INTO #MyHead
        (SomeDate)
 SELECT '2010-01-01' UNION ALL --1st "Group" of dates (StartDate and EndDate)
 SELECT '2010-01-01' UNION ALL --Duplicate date
 SELECT '2010-01-03' UNION ALL --2nd "Group" of dates (StartDate and EndDate)
 SELECT '2010-01-05' UNION ALL --3rd "Group" of dates (StartDate)
 SELECT '2010-01-06' UNION ALL --3rd "Group" of dates (EndDate)
 SELECT '2010-01-10' UNION ALL --4th "Group" of dates (StartDate)
 SELECT '2010-01-10' UNION ALL --Duplicate date
 SELECT '2010-01-11' UNION ALL --4th "Group" of dates
 SELECT '2010-01-11' UNION ALL --Duplicate date
 SELECT '2010-01-11' UNION ALL --Duplicate date
 SELECT '2010-01-12'           --4th "Group" of dates (EndDate)
;

What we want for a result is a simple listing of contiguous date ranges and the number of days for each range. Like this...

StartDate               EndDate                 Days
----------------------- ----------------------- -----------
2010-01-01 00:00:00.000 2010-01-01 00:00:00.000 1
2010-01-03 00:00:00.000 2010-01-03 00:00:00.000 1
2010-01-05 00:00:00.000 2010-01-06 00:00:00.000 2
2010-01-10 00:00:00.000 2010-01-12 00:00:00.000 3

Let's start by first finding unique dates. We'll also throw in a row number just for discussion purposes.

--===== Find Unique dates and add a row number
 SELECT UniqueDate = SomeDate,
        RowNum     = ROW_NUMBER() OVER (ORDER BY SomeDate)
  FROM #MyHead
  GROUP BY SomeDate
;

That gives us the following. Notice the comments?

UniqueDate              RowNum
----------------------- --------------------
2010-01-01 00:00:00.000 1
2010-01-03 00:00:00.000 2
2010-01-05 00:00:00.000 3
2010-01-06 00:00:00.000 4
2010-01-10 00:00:00.000 5 --What date will we have if we subtract 5 days?
2010-01-11 00:00:00.000 6 --What date will we have if we subtract 6 days?
2010-01-12 00:00:00.000 7 --What date will we have if we subtract 7 days?

If we look at the rows above with the comments and subtract the RowNum as suggested, those rows will all end up with the same date of 2010-01-05. In fact, if we subtract the RowNum on each row from the UniqueDate on that same row using the following code...

--===== Find unique dates and create "date groups"
 SELECT UniqueDate = SomeDate,
        DateGroup  = DATEADD(dd, - ROW_NUMBER() OVER (ORDER BY SomeDate), SomeDate)
  FROM #MyHead
  GROUP BY SomeDate
;

... we end up with the following result:

UniqueDate              DateGroup
----------------------- -----------------------
2010-01-01 00:00:00.000 2009-12-31 00:00:00.000 --1st Group
2010-01-03 00:00:00.000 2010-01-01 00:00:00.000 --2nd Group
2010-01-05 00:00:00.000 2010-01-02 00:00:00.000 --3rd Group
2010-01-06 00:00:00.000 2010-01-02 00:00:00.000 --3rd Group
2010-01-10 00:00:00.000 2010-01-05 00:00:00.000 --4th Group
2010-01-11 00:00:00.000 2010-01-05 00:00:00.000 --4th Group
2010-01-12 00:00:00.000 2010-01-05 00:00:00.000 --4th Group

Now, even though the DateGroup looks like a date, it really doesn't mean anything except for the fact that it very handily makes identical dates for each group of contiguous dates. That also means we're almost done filling this crack because all we need to do now is find the MIN and MAX UniqueDate for each DateGroup. Add a little more code to count the number of days and we're done!

WITH
cteGroupedDates AS
( --=== Find the unique dates and assign them to a group.
     -- The group looks like a date but the date means nothing except that adjacent
     -- dates will be a part of the same group.
 SELECT UniqueDate = SomeDate,
        DateGroup  = DATEADD(dd, - ROW_NUMBER() OVER (ORDER BY SomeDate), SomeDate)
  FROM #MyHead
  GROUP BY SomeDate
)
--===== Now, if we find the MIN and MAX date for each DateGroup, we'll have the
     -- Start and End dates of each group of contiguous dates.  While we're at it,
     -- we can also figure out how many days are in each range of days.
 SELECT StartDate = MIN(UniqueDate),
        EndDate   = MAX(UniqueDate),
        Days      = DATEDIFF(dd,MIN(UniqueDate),MAX(UniqueDate))+1
   FROM cteGroupedDates
  GROUP BY DateGroup
  ORDER BY StartDate
;

Here are the results of that bit of computational heaven...

StartDate               EndDate                 Days
----------------------- ----------------------- -----------
2010-01-01 00:00:00.000 2010-01-01 00:00:00.000 1
2010-01-03 00:00:00.000 2010-01-03 00:00:00.000 1
2010-01-05 00:00:00.000 2010-01-06 00:00:00.000 2
2010-01-10 00:00:00.000 2010-01-12 00:00:00.000 3
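The windowed-difference trick isn't specific to T-SQL, by the way. As a quick sanity check outside SQL Server, here's a sketch of the same query run through Python's standard sqlite3 module (SQLite 3.25 or later ships window functions); SQLite's date() and julianday() stand in for DATEADD and DATEDIFF, but everything else is the same idea.

```python
# Cross-check of the ROW_NUMBER() "date group" trick against the article's
# test dates, using SQLite instead of SQL Server (requires SQLite >= 3.25).
import sqlite3

dates = ["2010-01-01", "2010-01-01", "2010-01-03", "2010-01-05",
         "2010-01-06", "2010-01-10", "2010-01-10", "2010-01-11",
         "2010-01-11", "2010-01-11", "2010-01-12"]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE MyHead (SomeDate TEXT)")
con.executemany("INSERT INTO MyHead VALUES (?)", [(d,) for d in dates])

rows = con.execute("""
    WITH GroupedDates AS (
        -- date(..., '-N day') plays the role of DATEADD(dd, -N, SomeDate)
        SELECT SomeDate AS UniqueDate,
               date(SomeDate, '-' || ROW_NUMBER() OVER (ORDER BY SomeDate)
                              || ' day') AS DateGroup
          FROM MyHead
         GROUP BY SomeDate
    )
    SELECT MIN(UniqueDate) AS StartDate,
           MAX(UniqueDate) AS EndDate,
           -- julianday difference plays the role of DATEDIFF(dd, ...)
           CAST(julianday(MAX(UniqueDate))
              - julianday(MIN(UniqueDate)) AS INTEGER) + 1 AS Days
      FROM GroupedDates
     GROUP BY DateGroup
     ORDER BY StartDate
""").fetchall()

for r in rows:
    print(r)
```

The four rows printed match the StartDate/EndDate/Days listing above, which is a nice confirmation that the technique carries to any engine with window functions.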

We can just as easily do the same thing for multiple "objects". Here's the test code to build random date ranges across 10 years for 10 different objects for a total of 100,000 rows:

--=============================================================================
--      Create the test data.  This is NOT a part of the solution.
--      This is virtually instantaneous.
--=============================================================================
--===== Conditionally drop the test table to make reruns easier.
     IF OBJECT_ID('tempdb..#MyHead','U') IS NOT NULL
        DROP TABLE #MyHead
;
GO
--===== This builds 10 years of random dates starting with 2010-01-01
     -- for 10 different "object" ID's.
     -- Note that dates can be duplicated or can be missing.
 SELECT TOP (100000)
        SomeObjectID = ABS(CHECKSUM(NEWID()))%10+1,
        SomeDate = DATEADD(dd,ABS(CHECKSUM(NEWID()))%3652,'2010')
   INTO #MyHead
   FROM master.sys.all_columns ac1
  CROSS JOIN master.sys.all_columns ac2
;

All we have to do is take into account that we now have multiple "object" IDs. That means we need to include SomeObjectID in the GROUP BYs and ORDER BYs, and we need to restart the row number whenever we run into a new "object" ID by using PARTITION BY.

WITH
cteGroupedDates AS
( --=== Find the unique dates and assign them to a group.
     -- The group looks like a date but the date means nothing except that adjacent
     -- dates will be a part of the same group.
 SELECT SomeObjectID,
        UniqueDate = SomeDate,
        DateGroup  = DATEADD(dd, - ROW_NUMBER() OVER (PARTITION BY SomeObjectID ORDER BY SomeObjectID,SomeDate), SomeDate)
  FROM #MyHead
  GROUP BY SomeObjectID,SomeDate
)
--===== Now, if we find the MIN and MAX date for each DateGroup, we'll have the
     -- Start and End dates of each group of contiguous dates.  While we're at it,
     -- we can also figure out how many days are in each range of days.
 SELECT SomeObjectID,
        StartDate = MIN(UniqueDate),
        EndDate   = MAX(UniqueDate),
        Days      = DATEDIFF(dd,MIN(UniqueDate),MAX(UniqueDate))+1
   FROM cteGroupedDates
  GROUP BY SomeObjectID,DateGroup
  ORDER BY SomeObjectID,StartDate
;
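To see on a small scale what PARTITION BY buys us, here's a hedged little sketch in the same SQLite/Python style as before, with just two hypothetical objects and a handful of dates. The row number restarts for each SomeObjectID, so objects that happen to share dates still get their own islands.

```python
# Minimal demonstration that PARTITION BY restarts the grouping per object
# (SQLite >= 3.25 for window functions; data is made up for illustration).
import sqlite3

rows_in = [(1, "2010-01-01"), (1, "2010-01-02"), (1, "2010-01-05"),
           (2, "2010-01-02"), (2, "2010-01-03")]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE MyHead (SomeObjectID INTEGER, SomeDate TEXT)")
con.executemany("INSERT INTO MyHead VALUES (?, ?)", rows_in)

islands = con.execute("""
    WITH GroupedDates AS (
        SELECT SomeObjectID,
               SomeDate AS UniqueDate,
               -- the row number restarts at 1 for each SomeObjectID
               date(SomeDate, '-' || ROW_NUMBER() OVER
                    (PARTITION BY SomeObjectID ORDER BY SomeDate)
                    || ' day') AS DateGroup
          FROM MyHead
         GROUP BY SomeObjectID, SomeDate
    )
    SELECT SomeObjectID,
           MIN(UniqueDate) AS StartDate,
           MAX(UniqueDate) AS EndDate,
           CAST(julianday(MAX(UniqueDate))
              - julianday(MIN(UniqueDate)) AS INTEGER) + 1 AS Days
      FROM GroupedDates
     GROUP BY SomeObjectID, DateGroup
     ORDER BY SomeObjectID, StartDate
""").fetchall()

for r in islands:
    print(r)
```

Object 1 comes out as two islands (2010-01-01 through 2010-01-02, then 2010-01-05 alone) and object 2 as one, even though both objects contain 2010-01-02.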

Of course, the output is a little too long to post in this article, so give the code a try to find out for yourself! Just to make it easier for you to run, I've attached the code to build the test table and solve the problem for this longer solution as a bit of "crack filler".

Crack filled! 😉

Thanks for listening, folks.

--Jeff Moden

p.s. I just found out that this method is also available in the "SQL Server MVP Deep Dives" book (available at the following URL: http://www.manning.com/nielsen/ ). All of the profits from that book benefit the "War Child International" charity. It looks like a good "crack" to fill in the world, and I've just placed an order for it.
