Help Finding Subset-Grouping in a sequence

  • Hello all,
    First time caller, long time listener...
    I have been banging my head on the desk trying to develop a set-based solution to the following problem.  It appears like Islands and Gaps, but has an extra-dimension that i can't quite figure out.
    Problem:
    I have a set of data that indicates how a particular object has changed states over time.  What I need to know is how the states are grouped contiguously.   Keep in mind that the condition of being in a state is not unique: i.e. the object could be in the state multiple times over time and not contiguously.

    create table ObjectStates (
       [ID]  int,
       [State] int,
       [Seq] int
    )
    insert into ObjectStates (ID, State, Seq)
    values(1,4,1),
          (1,3,2),
          (1,3,3),
          (1,3,4),
          (1,3,5),
          (1,2,6),
          (1,1,7),
          (1,2,8),
          (1,4,9),
          (1,3,10),
          (1,3,11)

    Desired Output
    ID    State    Seq    groupID
    1      4         1    1
    1      3         2    2
    1     3         3    2
    1      3        4    2
    1      3        5    2
    1     2    6    3
    1      1    7    4
    1      2    8    5
    1      4    9    6
    1      3    10    7
    1      3    11    7

    Any help will be greatly appreciated.

  • I assumed that you wanted to partition by ID.  That can easily be changed if that's not the case.


    create table #ObjectStates (
     [ID] int,
     [State] int,
     [Seq] int
    )
    insert into #ObjectStates (ID, State, Seq)
    values(1,4,1),
      (1,3,2),
      (1,3,3),
      (1,3,4),
      (1,3,5),
      (1,2,6),
      (1,1,7),
      (1,2,8),
      (1,4,9),
      (1,3,10),
      (1,3,11)
    ;
    WITH CTE AS
    (
        SELECT *, CASE WHEN LAG(os.State, 1, 0) OVER(PARTITION BY ID ORDER BY os.Seq) <> os.State THEN 1 ELSE 0 END AS is_start
        FROM #ObjectStates os
    )
    SELECT ID, State, Seq, SUM(is_start) OVER(PARTITION BY ID ORDER BY Seq ROWS UNBOUNDED PRECEDING) AS grp
    FROM CTE

    I also assumed that 0 was not a possible state.  If it is, then you'll want to use some other value for the default in the LAG function.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen - Monday, January 23, 2017 3:17 PM

    I assumed that you wanted to partition by ID.  That can easily be changed if that's not the case.


    create table #ObjectStates (
     [ID] int,
     [State] int,
     [Seq] int
    )
    insert into #ObjectStates (ID, State, Seq)
    values(1,4,1),
      (1,3,2),
      (1,3,3),
      (1,3,4),
      (1,3,5),
      (1,2,6),
      (1,1,7),
      (1,2,8),
      (1,4,9),
      (1,3,10),
      (1,3,11)
    ;
    WITH CTE AS
    (
        SELECT *, CASE WHEN LAG(os.State, 1, 0) OVER(PARTITION BY ID ORDER BY os.Seq) <> os.State THEN 1 ELSE 0 END AS is_start
        FROM #ObjectStates os
    )
    SELECT ID, State, Seq, SUM(is_start) OVER(PARTITION BY ID ORDER BY Seq ROWS UNBOUNDED PRECEDING) AS grp
    FROM CTE

    I also assumed that 0 was not a possible state.  If it is, then you'll want to use some other value for the default in the LAG function.

    Drew

    Drew,
    I knew I was over-thinking it.  Your solution is elegant and spot-on.  Thank you!

Viewing 3 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply