Counting records only when the status of a record has changed

  • I'm tracking a variety of processes to know when a given process is "compliant" vs "non-compliant". In the end, I just want to count the number of times a process has switched from non-compliant to compliant. So, below is a simplied example with sample data for a six month period:

    ProcessID, Month, Compliant (0= non-compliant, 1 = compliant)

    1,1,0

    1,2,0

    1,3,1

    1,4,1

    1,5,0

    1,6,1

    From the example, the process switched from non-compliant to compliant twice: once from month 2 to month 3, and then again from month 5 to month 6. I don't need to count the number of actual months that the process was compliant, only when it switches to be compliant.

    I keep thinking that I could leverage a CTE or a self join, but I'm obviously struggling to solve it.

    Thanks in advance for assistance,

    Pete

  • Pete

    Something like this?

    SELECT COUNT(*)

    FROM MyTable t1

    JOIN Mytable t2

    ON t1.ProcessId = t2.ProcessId

    AND t2.Month = t1.Month + 1

    AND t1.Compliant <> t2.Compliant

    John

  • Try this

    DECLARE @t TABLE(ProcessID INT, Month INT, Compliant INT)

    INSERT INTO @t(ProcessID, Month, Compliant)

    SELECT 1,1,0 UNION ALL

    SELECT 1,2,0 UNION ALL

    SELECT 1,3,1 UNION ALL

    SELECT 1,4,1 UNION ALL

    SELECT 1,5,0 UNION ALL

    SELECT 1,6,1;

    WITH CTE AS (

    SELECT ProcessID, Month, Compliant,

    ROW_NUMBER() OVER(PARTITION BY ProcessID ORDER BY Month) AS rn

    FROM @t)

    SELECT x.ProcessID, x.Month

    FROM CTE x

    WHERE x.Compliant=1

    AND EXISTS (SELECT * FROM CTE y

    WHERE y.ProcessID=x.ProcessID

    AND y.Compliant=0

    AND y.rn=x.rn-1);

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • This is presenting itself as a "running total" problem...i.e. you need to compare a value in the previous row to a value in the current row to make a decision on whether to aggregate.

    This would be best handled in an application layer better suited to iterative processing...but if you're stuck in T-SQL you're looking at a cursor, a self-referencing sub-select (similar to a triangular join) which was presented by Mark or maybe a quirky update.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Mark - your query solution is great. I was leaning towards a cte and row_number(), but your technique with correlating a query using EXISTS provides the missing piece that eluded me.

    John - I initially tried a self-join, as in your query, but without success. Your solution, unfortunately, returns an incorrect count (3), instead of 2. Nonetheless, I appreciate your assistance. I'm sure I've seen a self-join solution before for problems similar as the one I've presented -- wish I had saved them off.

    /* HERE'S THE KICKER */

    Looks like I've got a moving target --> Since the moment of when I posted my quest for help, the scope of the project appears to be shifting quickly to something similar to maintaining a SCD (slowly changing dimension). So, rather than merely being able to count the number of times the Compliant flag changes from non-compliant to compliant, it seems a table may be needed to track the startmonth & endmonth for each series of the compliant flag. Thus, in my example, 4 rows would be extracted as follows:

    Process, StartMonth, EndMonth, Compliant

    1,1,2,0

    1,3,4,1

    1,5,5,0

    1,6,null,1

    It's a simple process of summing the Compliant field, thereafter.

    Unfortunately, I don't have a lot of practice with transforming rows into a date-span recordset; I'm used to going in the opposite direction of taking date spans, and transforming them into rows. I'm aware that SSIS has a SDC component, and as well as alternative techniques in SSIS, but is it a straight-forward enough process to maintain a SCD (aka date-span recordset) using just T-SQL?

  • Something like this?

    WITH CTE AS (

    SELECT ProcessID, Month, Compliant,

    ROW_NUMBER() OVER(PARTITION BY ProcessID ORDER BY Month) -

    ROW_NUMBER() OVER(PARTITION BY ProcessID,Compliant ORDER BY Month) AS rnDiff

    FROM @t)

    SELECT ProcessID,

    MIN(Month) AS StartMonth,

    MAX(Month) AS EndMonth,

    Compliant

    FROM CTE x

    GROUP BY ProcessID, Compliant, rnDiff

    ORDER BY ProcessID,StartMonth;

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Marrrrrrk!

    OK, I'm completely jazzed by the elegance of the query. Reminds me of the various techniques I've seen by Itzik Ben-Gan leveraging row_number() and other ranking functions.

    So, was this type of query something you keep around in your bag of tricks or were you able to think it through? (I've parsed out your cte to better understand how the data comes out of it before it gets rolled up - I really want to understand the logic.)

    Thanks,

    Pete

  • It's quite a well known technique, just google for sql "gaps and islands".

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Mark,

    Just before you provided the second query solution, I actually started reading up on gaps and islands --I just didn't get a chance to put something to the test. (At least I now know I was on the right track... )

    Thanks again for your help

    --Pete

  • Also a minor change will give you the NULL for the latest end month

    WITH CTE AS (

    SELECT ProcessID, Month, Compliant,

    ROW_NUMBER() OVER(PARTITION BY ProcessID ORDER BY Month DESC) AS rn1,

    ROW_NUMBER() OVER(PARTITION BY ProcessID,Compliant ORDER BY Month DESC) AS rn2

    FROM @t)

    SELECT ProcessID,

    MIN(Month) AS StartMonth,

    MAX(CASE WHEN rn1>1 THEN Month END) AS EndMonth,

    Compliant

    FROM CTE x

    GROUP BY ProcessID, Compliant, rn1-rn2

    ORDER BY ProcessID,StartMonth;

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Mark-101232 (3/16/2011)


    Also a minor change will give you the NULL for the latest end month

    WITH CTE AS (

    SELECT ProcessID, Month, Compliant,

    ROW_NUMBER() OVER(PARTITION BY ProcessID ORDER BY Month DESC) AS rn1,

    ROW_NUMBER() OVER(PARTITION BY ProcessID,Compliant ORDER BY Month DESC) AS rn2

    FROM @t)

    SELECT ProcessID,

    MIN(Month) AS StartMonth,

    MAX(CASE WHEN rn1>1 THEN Month END) AS EndMonth,

    Compliant

    FROM CTE x

    GROUP BY ProcessID, Compliant, rn1-rn2

    ORDER BY ProcessID,StartMonth;

    BONUS!

    thx

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply