Category Entry and Exit Dates per ID

Question

Category Entry and Exit Dates per ID

byrak

SSC Enthusiast

Points: 174
More actions
April 26, 2021 at 5:30 pm

#3875957

Hello,
I have the following table, where ID is the unique identifier. An can move from category to category, both up and down. My table records each day an ID stays in a given category. I am trying to identify the start date and the end date of an ID in a given category. The problem is that an ID can move up a category, and move back down to its original category after a certain number of days. Here is my table as an example with only 1 ID:
ID    Category     Date
1            1              2021-01-01
1            1              2021-01-02
...
1           1               2021-01-24
1           2             2021-01-25
...
1          2               2021-02-15
1          1               2021-02-16
...
1         1                 2021-04-20
1          2               2021-04-21
When I try to get the MIN(DATE) and MAX(DATE) and group by the category and ID, it shows me that the account was in Category 1 from 2021-01-01 to 2021-04-20, and in Category 2 from 02-25 to 04-21. I am trying to track the movements of the file in each bucket step by step, meaning in my ideal result, the movements of the account will be tracked as:
ID        Category      StartDate        EndDate
1                1              2021-01-01        2021-01-24
1               2             2021-01-25          2021-02-15
1               1             2021-02-16          2021-04-20
1               2            2021-04-21                NULL (or GETDATE())

How can I achieve this result? Any help would be appreciated. I tried using the RANK() function but because the table records every single day, it seems useless.

Viewing 6 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

Phil Parkin SSC Guru Points: 247201 More actions · Answer 1

This is not particularly elegant, but it seems to get the job done.

Next time, please remember to post your sample data in consumable format, so that others can simply paste into SSMS and start coding.

DROP TABLE IF EXISTS #SomeId;

CREATE TABLE #SomeId
(
    Id INT NOT NULL
   ,Category INT NOT NULL
   ,StartDate DATE NOT NULL
   ,
   PRIMARY KEY CLUSTERED (
                             Id
                            ,Category
                            ,StartDate
                         )
);

INSERT #SomeId
(
    Id
   ,Category
   ,StartDate
)
VALUES
(1, 1, '20210101')
,(1, 1, '20210102')
,(1, 1, '20210124')
,(1, 2, '20210125')
,(1, 2, '20210215')
,(1, 1, '20210216')
,(1, 1, '20210420')
,(1, 2, '20210521');

WITH NextDates
AS (SELECT si.Id
          ,si.Category
          ,si.StartDate
          ,PrevCategory = LAG(si.Category, 1) OVER (PARTITION BY si.Id ORDER BY si.StartDate)
          ,NextCategory = LEAD(si.Category, 1) OVER (PARTITION BY si.Id ORDER BY si.StartDate)
          ,NextDate = LEAD(si.StartDate, 1, GETDATE()) OVER (PARTITION BY si.Id ORDER BY si.StartDate)
    FROM #SomeId si)
SELECT NextDates.Id
      ,NextDates.Category
      ,NextDates.StartDate
      ,EndDate = DATEADD(
                            DAY
                           ,-1
                           ,LEAD(NextDates.NextDate, 1) OVER (PARTITION BY NextDates.Id ORDER BY NextDates.StartDate)
                        )
FROM NextDates
WHERE (
          NextDates.PrevCategory <> NextDates.NextCategory
          AND NextDates.NextCategory <> NextDates.Category
      )
      OR NextDates.PrevCategory IS NULL
ORDER BY NextDates.Id
        ,NextDates.StartDate;

Jeffrey Williams SSC Guru Points: 90351 More actions · Answer 2

Here is another take:

 --==== Create some test data
   Drop Table If Exists #testData;

Declare @startDate date = '2021-01-01'
      , @endDate date = '2021-06-30';

   With t(n)
     As (
 Select t.n 
   From (
 Values (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)
      , (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)) As t(n)
        )
      , iTally (num)
     As (
 Select Top (datediff(day, @startDate, @endDate) + 1) 
        checksum(row_number() over(Order By @@spid))
   From t t1, t t2, t t3
        )
 Select ID = 1
      , Category = Case When dt.StartDate Between '2021-01-25' And '2021-02-15' Then 2
                        When dt.StartDate Between '2021-04-21' And '2021-06-15' Then 2
                        Else 1
                    End
      , dt.StartDate
   Into #testData
   From iTally                                                      it
  Cross Apply (Values (dateadd(day, it.num - 1, @startDate)))       dt(StartDate);

 --==== Solution using above test data
   With dateGroups
     As (
 Select *
      , StartGroup = iif(lag(td.Category, 1, 0) over(Partition By td.ID Order By td.StartDate) <> td.Category, 1, 0)
   From #testData                       td
        )
 Select dg.ID
      , dg.Category
      , dg.StartDate
      , EndDate = dateadd(day, -1, lead(dg.StartDate) over(Partition By dg.ID Order By dg.StartDate))
   From dateGroups                      dg
  Where dg.StartGroup = 1;

The first part is to define the row that starts each group - then we simply filter by that row and calculate the end date as 1 day less than the next rows start date. To define a start row - we look at the 'next' rows Category - if the next rows category changes then a new group is started.

Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

― Charles R. Swindoll

How to post questions to get better answers faster
Managing Transaction Logs

Ken McKelvey SSCoach Points: 18979 More actions · Answer 3

WITH SeqBreaks
AS
(
SELECT ID, Category, [Date]
,CASE
WHEN Category = LAG(Category) OVER (PARTITION BY ID ORDER BY [Date])
THEN 0
ELSE 1
END AS SeqBreak
FROM YourTable
)
,Grps
AS
(
SELECT ID, Category, [Date]
,SUM(SeqBreak) OVER (PARTITION BY ID ORDER BY [Date]) AS Grp
,LEAD(Category) OVER (PARTITION BY ID ORDER BY [Date]) AS NextCategory
FROM SeqBreaks
)
SELECT ID, Category
,MIN([Date]) AS StartDate
,MAX(CASE WHEN NextCategory IS NULL THEN '9999-12-31' ELSE [Date] END) AS EndDate
FROM Grps
GROUP BY ID, Category, Grp
ORDER BY ID, StartDate

This reply was modified 4 years, 7 months ago by Ken McKelvey.

ScottPletcher SSC Guru Points: 101222 More actions · Answer 4

I think this is a form of Gaps and Islands (term coined by Itzik Ben-Gan? maybe?). It's rather traditional in G&I to use the name "grp", so I stuck with that:

;WITH grps AS (
    SELECT id, category, StartDate, 
        ROW_NUMBER() OVER(PARTITION BY id ORDER BY StartDate) - 
        ROW_NUMBER() OVER(PARTITION BY id ORDER BY category, StartDate) AS grp
    FROM #SomeId
) 
SELECT id, category, MIN(StartDate) AS StartDate, MAX(StartDate) AS EndDate
FROM grps 
GROUP BY id, category, grp 
ORDER BY id, StartDate

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Mark Cowne One Orange Chip Points: 26987 More actions · Answer 5

Here's another way to do gaps and islands

WITH C1 AS (
SELECT id, category, StartDate, 
CASE WHEN LAG(category) OVER(PARTITION BY id ORDER BY StartDate) = category THEN 0 ELSE 1 END AS IsStart
FROM #SomeId
),
C2 AS (
SELECT id, category, StartDate,
   SUM(IsStart) OVER(PARTITION BY id ORDER BY StartDate ROWS UNBOUNDED PRECEDING) AS grp
FROM C1
) 
SELECT id, category, MIN(StartDate) AS StartDate, MAX(StartDate) AS EndDate
FROM C2
GROUP BY id, category, grp 
ORDER BY id, StartDate;

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537