SQLServerCentral Article

Finding “Active” Rows for the Previous Month (SQL Spackle)

,

"SQL Spackle" is a collection of short articles written based on multiple requests for similar code or to share short methods for getting certain things done. These short articles are NOT meant to be complete solutions. Rather, they are meant to "fill in the cracks."

--Phil McCracken

Introduction

One of the classic problems in SQL is finding “active” rows in a table for a given date range, typically for the previous month.  It sounds easy enough.  Just find everything that has a date in the row that falls between the start of the month (inclusive) and the start of next month (exclusive).  But what if each row in the table has both a StartDate and an EndDate each of which may or may not be in the given month?  What if one of the rows started in January 2013, ended in January 2014, and you want to know if that row was active during June 2013?  Now it gets a bit more complicated.

This classic problem has appeared in the forums here on SSC several times in the past 6 months so I thought I’d write a short article on the subject.  The example used in this article happens to be based on month boundaries but the concept can be used for other date-part boundaries and non date-part boundaries, as well.

Typical Requirements

The general request on most of the posts is usually something like the following:

“I want some code that will return all of the data for all of last month based on the start and end dates.”

They go into other details but, to make a much longer story shorter, here are the six rules that most people want the code to follow as to which rows will be included in the return or not.  Note that this assumes that the StartDate is <= the EndDate and that neither date is NULL.

  1. Both dates are less than (to the “left”) and outside of the desired month. Don't include.
  2. Both dates are greater than (to the “right”) and outside of the desired month. Don't include.
  3. Both dates within the desired month. Include.
  4. StartDate within the desired month. Include.
  5. End date within the desired month. Include.
  6. StartDate is less than (to the “left”) and outside the desired month and EndDate is greater than (to the “right”) of and outside the desired month. In other words, the row dates span the entirety of the desired month. Include.

A lot of people write code for all six conditions and it’s not necessary.  All of the rules above boil down to just two simple conditions and it’s easier to understand them if I draw a simple picture.

A Pictorial of the Problem

Here’s the simple picture of all the rules from above.  I’ve numbered each range of values with the rule that applies from above.  Each range is marked with an “S” for StartDate and an “E” for EndDate.  The two sets of pipes are markers for the desired month and they have been marked as “DS” for “Desired StartDate” and “DE” for “Desired EndDate”.

              | Desired |

Left "Outside"|  Month  |Right "OutSide"

<-------------|---------|-------------->

              |         |

   S--1--E    | S--3--E |    S--2--E

              |         |

           S--5--E   S--4--E

              |         |

   S----------|----6----|----------E

              |         |

              DS        DE

LEGEND:

S  = StartDate of a row.

E  = EndDate of a row.

DS = StartDate of desired month.

DE = EndDate of desired month.

Stating the obvious, again, we don’t want to include ranges 1 or 2 because no part of those ranges are within the desired range of the month.  There are two ways to solve this problem using only two simple conditions each that will meet the requirements of all six rules.

First, we need some test data to demonstrate with.

Test Data

To make the test data match the diagram, the “RangeType” column will contain the range type number from the diagram.  We’ll pretend that the desired month is October 2013 for this POP code (Proof of Principle).

--===== Do this in a nice, safe place that everyone has.
    USE tempdb
;
--===== Conditionally drop the test table to make reruns in SSMS easier
     IF OBJECT_ID('tempdb.dbo.TestTable','U') IS NOT NULL
        DROP TABLE tempdb.dbo.TestTable
GO
;
--===== Create the test table
 CREATE TABLE dbo.TestTable
        (
         RangeType INT         NOT NULL PRIMARY KEY CLUSTERED
        ,StartDate DATETIME    NOT NULL
        ,EndDate   DATETIME    NOT NULL
        ,Comment   VARCHAR(30) NOT NULL
        )
;
--===== Populate the test table with known test data.
 INSERT INTO dbo.TestTable
        (RangeType,StartDate,EndDate,Comment)
 SELECT 1,'01 Sep 2013','30 Sep 2013','Both Left Outside'  UNION ALL
 SELECT 2,'01 Nov 2013','30 Nov 2013','Both Right Outside' UNION ALL
 SELECT 3,'01 Oct 2013','31 Oct 2013','Both Inside'        UNION ALL
 SELECT 4,'31 Oct 2013','30 Nov 2013','StartDate Inside'   UNION ALL
 SELECT 5,'01 Sep 2013','01 Oct 2013','EndDate Inside'     UNION ALL
 SELECT 6,'01 Sep 2013','30 Nov 2013','Both Outside but Straddle'
;
GO

Negative Logic Method

Here’s the logic diagram again just so you don’t have to scroll back to it.

              | Desired |

Left "Outside"|  Month  |Right "OutSide"

<-------------|---------|-------------->

              |         |

   S--1--E    | S--3--E |    S--2--E

              |         |

           S--5--E   S--4--E

              |         |

   S----------|----6----|----------E

              |         |

              DS        DE

If you look at ranges 1 and 2, they have something in common.  Both the StartDate and EndDate are outside of the desired month range of dates.  Writing a bit of pseudo-code for this method, we see that the two conditions (again, assuming that StartDate is always less than EndDate and neither is NULL) in the WHERE clause will find everything that we don't want and then we use that to find everything except those unwanted rows.

To summarize...

  1. If the end date of a given range is less than the start of the desired month, then the whole range is to the left of and outside of the desired month and we don't want to innlude that row.
  2. If the start date of a given range is greater than the end of the desired month, then the whole range is to the right of and outside of the desire month and we don't want to include that row.

In other words, the code excludes everything that could be either a Type 1 or a Type 2 range.

 SELECT columnlist
   FROM dbo.sometable t
  WHERE rowid NOT IN
        ( --=== This finds all Type 1 and 2 range types
         SELECT rowid
           FROM dbo.sometable
          WHERE E < @DS
             OR S > @DE
        )
;

The problem with this method is that we have to make two trips to the table.  Let’s not use this method except to learn from.

Positive Logic Method

The code in the sub-query of the Negative Logic Method is easy to understand but it’s backwards of what we want.  The sub-query returns everything that we want to exclude.  In order to change it to everything that we want to include, we have to reverse the logic.

The rules for reversing the logic are to use the opposite of the all the relationships in the criteria of the WHERE clause.  The opposite of “<” is “>=”, the opposite of “>” is “<=”, and the opposite of “OR” is “AND”. 

WARNING-WARNING-WARNING!!!  LOGIC REVERSALS LIKE THIS WON"T WORK IF NULLs ARE INVOLVED AND MAY NOT WORK IF "NOT" IS INVOLVED!  TEST, TEST, TEST!

With that in mind, here’s the pseudo-code with the reversed logic to find only what we want to include instead of exclude.

 SELECT columnlist
   FROM dbo.sometable
  WHERE E >= @DS
    AND S <= @DE

If we look at the picture of the problem again, we see that the EndDates (E) for range types 3 through 6 are all greater than (or equal) the desired month StartDate (DS).  We also find that that the StartDates (S) for those same ranges are all less (or equal) than the desired month EndDate (DE).

              | Desired |

Left "Outside"|  Month  |Right "OutSide"

<-------------|---------|-------------->

              |         |

   S--1--E    | S--3--E |    S--2--E

              |         |

           S--5--E   S--4--E

              |         |

   S----------|----6----|----------E

              |         |

              DS        DE

So, the pseudo-code is correct.  Let’s write some real code.  Of course, we had to make some substitutions in our pseudo-code to make it work with the actual table and real dates.

--===== Find all rows that are "active" anytime in the desired month
 SELECT RangeType,StartDate,EndDate,Comment
   FROM dbo.TestTable
  WHERE EndDate   >= '01 Oct 2013'
    AND StartDate <= '31 Oct 2013'
;

As you can see in the results that follow, only ranges 3, 4, 5, and 6 were returned as required.

RangeType StartDate               EndDate                 Comment
--------- ----------------------- ----------------------- -------------------------
3         2013-10-01 00:00:00.000 2013-10-31 00:00:00.000 Both Inside
4         2013-10-31 00:00:00.000 2013-11-30 00:00:00.000 StartDate Inside
5         2013-09-01 00:00:00.000 2013-10-01 00:00:00.000 EndDate Inside
6         2013-09-01 00:00:00.000 2013-11-30 00:00:00.000 Both Outside but Straddle
(4 row(s) affected)

Making the Code “Bullet Proof”

If the StartDate for ranges 3 and 4 were on the last day of October 2013 and also had a time of anything other than 00:00:00.000, they wouldn’t have been returned by the Positive Logic method.  You can test this by adding the following rows where RangeType 7 is the same as Range Type 3 and Range Type 8 is the same as Range Type 4 but both have a StartDate with a time on the last day of October 2013.  With the current code, they won’t be returned, but they should.

 INSERT INTO dbo.TestTable
        (RangeType,StartDate,EndDate,Comment)
 SELECT 7,'31 Oct 2013 00:00:00.003','31 Oct 2013 00:00:00.003','Both Inside'      UNION ALL
 SELECT 8,'31 Oct 2013 00:00:00.003','30 Nov 2013'             ,'StartDate Inside'
;

We can make our code “Bullet Proof” by using the start of the next month instead of the end of the desired month and removing one “=” from the code.  Like this…

--===== Find all rows that are "active" anytime in the desired month
 SELECT RangeType,StartDate,EndDate,Comment
   FROM dbo.TestTable
  WHERE EndDate   >= '01 Oct 2013'
    AND StartDate <  '01 Nov 2013'
;

… and that returns everything we want.

RangeType StartDate               EndDate                 Comment
--------- ----------------------- ----------------------- -------------------------
3         2013-10-01 00:00:00.000 2013-10-31 00:00:00.000 Both Inside
4         2013-10-31 00:00:00.003 2013-11-30 00:00:00.000 StartDate Inside
5         2013-10-31 00:00:00.003 2013-10-31 23:58:00.000 EndDate Inside
6         2013-09-01 00:00:00.000 2013-11-30 00:00:00.000 Both Outside but Straddle
7         2013-10-31 00:00:00.003 2013-10-31 00:00:00.003 Both Inside
8         2013-10-31 00:00:00.003 2013-11-30 00:00:00.000 StartDate Inside
(6 row(s) affected)

Make the Code Easy to Use

To make the code really easy to use, you could change the code into a “parameterized view”, which is also known as an “Inline Table Valued Function” or “iTVF” for short.  Like this… (note that "0" in the DATEDIFF and DATEADD functions is the date-serial number for "1900-01-01")...

 CREATE FUNCTION dbo.GetActiveMonthDataFromTestTable
        (
         @SomeDate DATETIME
        )
RETURNS TABLE --WITH SCHEMABINDING --Commented out to make experiments easier
     AS
 RETURN
--===== Find all rows that are "active" anytime in the desired month
 SELECT RangeType,StartDate,EndDate,Comment
   FROM dbo.TestTable
  WHERE EndDate   >= DATEADD(mm,DATEDIFF(mm,0,@SomeDate)  ,0) --Finds first of month
    AND StartDate <  DATEADD(mm,DATEDIFF(mm,0,@SomeDate)+1,0) --Finds first of next month
;
GO

... and then call the function as follows.  Note that the date of 'Oct 2013' is interpreted by SQL Server as 2013-10-01.

--===== Find all rows that are active for October 2013
 SELECT RangeType,StartDate,EndDate,Comment
   FROM dbo.GetActiveMonthDataFromTestTable('Oct 2013')
;
GO

You could also use GETDATE() to get the “active” rows for the current month or subtract a month from GETDATE(), like below, to automatically and always get the “active” rows from the previous month no matter what today is.

--===== Find all "active" rows for the previous month.
 SELECT RangeType,StartDate,EndDate,Comment
   FROM dbo.GetActiveMonthDataFromTestTable(DATEADD(mm,-1,GETDATE())) --Subract one month from now
;

Thanks for listening folks,

"Crack Filled!"

 


© Copyright – Jeff Moden - 01 January 2014 – All Rights Reserved

Rate

4.53 (49)

You rated this post out of 5. Change rating

Share

Share

Rate

4.53 (49)

You rated this post out of 5. Change rating