Blog Post

Last Non-NULL Date in SQL Server

,

The simplest of requests are often the most difficult to execute. For example, a finance team needs to know every time a customer did not invoice for 90 days in the past 2 years. The simplicity of the ask is deceiving. Tracking differences across multiple dimensions (customer and invoice date in this case) and accounting for NULL values in the changing dimension (aka when a customer did not invoice on a day) appears to be hopeless without the support of a CRM code change. But have no fear, complicated SQL is here!

Testing Scenario: the business would like you to create a customer attrition report. To do this, you need to find gaps in invoice dates per customer and determine when and how many customers go “inactive” and are “reactivated” in the past two years. A customer is deemed “inactive” whenever there are greater than 90 days since the last invoice. This can occur multiple times in one year, so a customer can be “reactivated” multiple times in one year.

Resources Needed:

  1. SQL Server Access to the needed data elements
    • In this scenario, this consists of invoice date by customer. You can swap this out for any other date range or any other unique ID.
  2. Business logic
    • In this scenario, activations in a year = anytime a customer has invoiced first the first time in a 90 day period. You can swap customer field for any dimension such as sales rep, carrier, business segment, etc. You can also swap out invoice date for any date field such as creation date, pickup date, paid date, delivery date, etc.
  3. Start and End dates
  4. Ability to use CTE’s/Temp Tables
    • This really comes into play if you are trying to create a Direct Query based report in Power BI or using any other reporting tools that do not allow calling Temp Tables. If you hit this limitation, then you will need to leverage a database/code solution instead of the method below.

Notes:

  • If your SQL server instance is after 2016, then you will not need to use the custom date temp table and can use IGNORE NULL within the MAX OVER statement (see alternative line in the final SQL code below).
  • The process below lays out each portion of the final query, but feel free to skip ahead to the end for the final sql statement if you don’t need each section explained.

Process:

  1. Set up parameters
    • DECLARE @StartDate DATE = '2019-01-01'

      DECLARE @EndDate DATE = GETDATE()

      DECLARE @ActivationRange INT = 90 --notates how many days can be between invoice dates before a customer is deemed "inactive".

  2. Create a date/calendar table. Check with your DBA’s first to make sure they haven’t already created something similar that you can use, all you need is a list of sequential calendar dates with no gaps.
    • ;WITH cte AS (

      SELECT @StartDate AS myDate

      UNION ALL|

      SELECT DATEADD(day,1,myDate) as myDate

      FROM cte

      WHERE DATEADD(day,1,myDate) <= @EndDate

      )

      SELECT myDate 'CalendarDate'

      INTO #Calendar

      FROM cte

      OPTION (MAXRECURSION 0) –this works around the usual 100 recursion row limit

  3. If you need to partition by a dimension other than date, such as customer in this scenario, you will need to create a table to grab that dimension’s values as well. After this, you’ll need to create a bridge table that will have a value for every date in your range and every customer (or other dimension) value as well.
    • –Customer Table

      SELECT DISTINCT

      DA.AccountsKey

      ,DA.CompanyID

      ,DA.CompanyName

      ,MIN(FSF.InvoiceDateKey) 'FirstInvoiceDate'

      INTO #Companies

      FROM DimAccount DA

      JOIN ShipmentFacts FSF ON FSF.AccountKey = DA.AccountsKey

      WHERE FSF.InvoiceDateKey IS NOT NULL

      GROUP BY

      DA.AccountsKey

      ,DA.CompanyID

      ,DA.CompanyName

    • –Bridge Table that combines both Customer and Date values

      SELECT DISTINCT

      C.CalendarDate

      ,Comp.CompanyID

      ,Comp.CompanyName

      ,MIN(Comp.FirstInvoiceDate) 'FirstInvoiceDate'

      ,CONCAT(C.CalendarDate,Comp.CompanyID) 'ID'

      INTO #Bridge

      FROM #Calendar C, #Companies Comp

      GROUP BY

      C.CalendarDate

      ,Comp.CompanyID

      ,Comp.CompanyName

      ,CONCAT(C.CalendarDate,Comp.CompanyID)

  4. Next, we need to create our unique ID’s that combine all the dimensions we are hoping to account for in our “IGNORE NULLS” scenario. In this test case, we need to create one ID that grabs the actual dates a customer invoiced on and another for all the dates in our range that a customer could have possibly invoiced on. Then, we join the two together to grab the last time a customer invoiced and get ignore those pesky NULL values. This is the section where having SQL Server 2016 and later will do you a lot of favors (see code below).
    • –Actual Invoiced Dates by Customer

      SELECT DISTINCT

      FSF.InvoiceDateKey

      ,DA.CompanyName

      ,DA.CompanyID

      ,CONCAT(FSF.InvoiceDateKey,DA.CompanyId) 'ID'

      INTO #ShipmentData

      FROM ShipmentFacts FSF

      JOIN #Companies DA ON DA.AccountsKey = FSF.AccountKey

      WHERE FSF.InvoiceDateKey BETWEEN @StartDate AND @EndDate

    • –Joining together and filling in the NULLS with the previous invoiced date by customer

      SELECT DISTINCT

      C.ID

      ,S.ID 'ShipData'

      ,CAST( SUBSTRING( MAX( CAST (C.ID AS BINARY(4)) + CAST(S.ID AS BINARY(20))) OVER (PARTITION BY C.CompanyID ORDER BY C.ID ROWS UNBOUNDED PRECEDING),5,20) AS varchar) 'PreviousInvoiceDateKey'

      --ALTERNATIVE FOR POST SQL Server 2012--

      --,CAST( SUBSTRING( MAX( CAST (C.ID AS BINARY(4)) + CAST(S.ID AS BINARY(20))) IGNORE NULLS OVER (PARTITION BY C.CompanyID ORDER BY C.ID ROWS UNBOUNDED PRECEDING),5,20) AS varchar) 'PreviousInvoiceDateKey'

      INTO #RunningDates

      FROM #Bridge C

      LEFT JOIN #ShipmentData S ON S.ID = C.ID

  5. The rest of the code is based on business logic, so please use at your discretion and edit for your own needs.

Full SQL Code:

DECLARE @StartDate DATE = '2019-01-01'

DECLARE @EndDate DATE = GETDATE()

DECLARE @ActivationRange INT = 90 --notates how many days can be between invoice dates before a customer is deemed "inactive"

;WITH cte AS (

SELECT @StartDate AS myDate

UNION ALL

SELECT DATEADD(day,1,myDate) as myDate

FROM cte

WHERE DATEADD(day,1,myDate) <= @EndDate

)

SELECT myDate 'CalendarDate'

INTO #Calendar

FROM cte

OPTION (MAXRECURSION 0)

SELECT DISTINCT

DA.AccountsKey

,DA.CompanyID

,DA.CompanyName

,MIN(FSF.InvoiceDateKey) 'FirstInvoiceDate'

INTO #Companies

FROM DimAccount DA

JOIN ShipmentFacts FSF ON FSF.AccountKey = DA.AccountsKey

WHERE FSF.InvoiceDateKey >= '2000-01-01'

GROUP BY

DA.AccountsKey

,DA.CompanyID

,DA.CompanyName

SELECT DISTINCT

C.CalendarDate

,Comp.CompanyID

,Comp.CompanyName

,MIN(Comp.FirstInvoiceDate) 'FirstInvoiceDate'

,CONCAT(C.CalendarDate,Comp.CompanyID) 'ID'

INTO #Bridge

FROM #Calendar C, #Companies Comp

GROUP BY

C.CalendarDate

,Comp.CompanyID

,Comp.CompanyName

,CONCAT(C.CalendarDate,Comp.CompanyID)

SELECT DISTINCT

FSF.InvoiceDateKey

,DA.CompanyName

,DA.CompanyID

,CONCAT(FSF.InvoiceDateKey,DA.CompanyId) 'ID'

INTO #ShipmentData

FROM ShipmentFacts FSF

JOIN #Companies DA ON DA.AccountsKey = FSF.AccountKey

WHERE FSF.InvoiceDateKey BETWEEN @StartDate AND @EndDate

SELECT DISTINCT

C.ID

,S.ID 'ShipData'

,CAST( SUBSTRING( MAX( CAST (C.ID AS BINARY(4)) + CAST(S.ID AS BINARY(20))) OVER (PARTITION BY C.CompanyID ORDER BY C.ID ROWS UNBOUNDED PRECEDING),5,20) AS varchar) 'PreviousInvoiceDateKey'

--ALTERNATIVE FOR POST SQL Server 2012--

--,CAST( SUBSTRING( MAX( CAST (C.ID AS BINARY(4)) + CAST(S.ID AS BINARY(20))) IGNORE NULLS OVER (PARTITION BY C.CompanyID ORDER BY C.ID ROWS UNBOUNDED PRECEDING),5,20) AS varchar) 'PreviousInvoiceDateKey'

INTO #RunningDates

FROM #Bridge C

LEFT JOIN #ShipmentData S ON S.ID = C.ID

SELECT DISTINCT

R.ID

,R.ShipData

,R.PreviousInvoiceDateKey

,LEFT(R.PreviousInvoiceDateKey,10) 'PreviousInvoiceDate'

,LEFT(R.ID,10) 'DateKey'

,RIGHT(R.ID,5) 'CompanyId'

,B.FirstInvoiceDate

INTO #ActivationData

FROM #RunningDates R

LEFT JOIN #Bridge B ON B.ID = R.ID

SELECT DISTINCT

A.ID

,A.DateKey

,A.CompanyId

,A.PreviousInvoiceDate

,YEAR(A.DateKey) 'Year'

,YEAR(A.FirstInvoiceDate) 'InitialActivationYear'

,CASE WHEN YEAR(A.DateKey) = YEAR(A.FirstInvoiceDate) THEN 1 ELSE 0 END 'IsActivationYear'

,DATEDIFF(Day,A.PreviousInvoiceDate,A.DateKey) 'DaysSinceInvoice'

,CASE WHEN DATEDIFF(Day,A.PreviousInvoiceDate,A.DateKey) = @ActivationRange THEN 1 ELSE 0 END 'IsInactive'

,CASE WHEN DATEDIFF(Day,A.PreviousInvoiceDate,A.DateKey) = @ActivationRange THEN A.DateKey ELSE NULL END 'InactiveDate'

INTO #ActivationDetails

FROM #ActivationData A

SELECT DISTINCT

D.Year

,D.CompanyId

,SUM(D.IsInactive) 'InactivatedPeriods'

,MAX(D.IsActivationYear) 'IsFirstActivationYear'

,MAX(D.DaysSinceInvoice) 'BiggestGapInInvoicing (Days)'

,MAX(D.InactiveDate) 'LastInactiveDate'

,MAX(D.PreviousInvoiceDate) 'LastInvoiceDate'

,CASE WHEN MAX(D.InactiveDate) > MAX(D.PreviousInvoiceDate) THEN -1 ELSE 0 END 'NotActiveAtEndOfYear'

--to grab the activations per customer per year follow equation below

-- Activations = [InactivatedPeriods] + [NotActiveAtEndOfYear] + [IsFirstActivationYear] --this part will be done in Power BI

FROM #ActivationDetails D

GROUP BY

D.Year

,D.CompanyId

DROP TABLE #Calendar

DROP TABLE #Companies

DROP TABLE #Bridge

DROP TABLE #ShipmentData

DROP TABLE #RunningDates

DROP TABLE #ActivationData

DROP TABLE #ActivationDetails

Additional Resource:

Original post (opens in new tab)
View comments in original post (opens in new tab)

Rate

3.5 (2)

Share

Share

Rate

3.5 (2)