How to get a consecutive count based on the first value

  • Hello all. I was asked for a very interesting report today that has been driving me nuts all morning. We have customer accounts that we measure usage. We want to run a report for all customers whose current usage is 0 and a count of how many months it has been zero consecutively. Here is an example.

    declare @YourTable table (

    CustomerID int,

    ReadDate datetime,

    usage int

    )

    insert into @YourTable select 1,' 1 mar 2014',0

    insert into @YourTable select 1,' 1 feb 2014',0

    insert into @YourTable select 1,' 1 jan 2014',0

    insert into @YourTable select 2,' 1 mar 2014',0

    insert into @YourTable select 2,' 1 feb 2014',100

    insert into @YourTable select 2,' 1 jan 2014',0

    insert into @YourTable select 3,' 1 mar 2014',150

    insert into @YourTable select 3,' 1 feb 2014',200

    insert into @YourTable select 3,' 1 jan 2014',0

    This should return

    1,3

    2,1

    This is what I am currently using but it isn't working right

    WITH cte

    AS

    (

    SELECT *,COUNT(1) OVER(PARTITION BY cnt,CustomerID) pt FROM

    (

    SELECT tt.*

    ,(SELECT COUNT(Customerid) FROM @YourTable WHERE Usage = 0 AND ReadDate < tt.ReadDate) AS cnt

    FROM @YourTable tt

    WHERE usage = 0

    ) t1

    )

    SELECT CustomerID,count(*) FROM cte GROUP BY Customerid

    Any ideas?

  • This is a gaps and islands problem

    http://www.sqlservercentral.com/articles/T-SQL/71550/

    WITH CTE AS (

    SELECT CustomerID,ReadDate,usage,

    ROW_NUMBER() OVER(PARTITION BY CustomerID ORDER BY ReadDate DESC) AS rn1,

    ROW_NUMBER() OVER(PARTITION BY CustomerID,CASE WHEN usage=0 THEN 0 ELSE 1 END ORDER BY ReadDate DESC) AS rn2

    FROM @YourTable)

    SELECT CustomerID,COUNT(*) AS cnt

    FROM CTE

    GROUP BY CustomerID,CASE WHEN usage=0 THEN 0 ELSE 1 END,rn1-rn2

    HAVING MIN(rn1)=1 AND MIN(usage)=0

    ORDER BY CustomerID,MIN(rn1);

    Simpler version

    WITH CTE AS (

    SELECT CustomerID,ReadDate,usage,

    ROW_NUMBER() OVER(PARTITION BY CustomerID ORDER BY ReadDate DESC) AS rn,

    SUM(ABS(usage)) OVER(PARTITION BY CustomerID ORDER BY ReadDate DESC) AS total

    FROM @YourTable)

    SELECT CustomerID,MAX(rn) AS cnt

    FROM CTE

    WHERE total=0

    GROUP BY CustomerID

    ORDER BY CustomerID;

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Hi, here's one possibility to get that report. There are other possibilities but this just seemed easy for me.

    SELECT CustomerID,

    DATEDIFF( MM,

    ISNULL( MAX(CASE WHEN usage > 0 THEN ReadDate END), DATEADD( MM, -1, MIN(ReadDate))),

    MAX( ReadDate)) Option1,

    ISNULL( DATEDIFF( MM, MAX(CASE WHEN usage > 0 THEN ReadDate END), MAX( ReadDate))

    ,DATEDIFF( MM, DATEADD( MM, -1, MIN(ReadDate)), MAX( ReadDate))) Option2

    FROM @YourTable

    WHERE ReadDate <= @Date

    GROUP BY CustomerID

    HAVING MAX( ReadDate) = MAX(CASE WHEN usage = 0 THEN ReadDate END)

    If you have any questions, feel free to ask them.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • I'm betting this could be more elegant and possibly use fewer CTEs, but I think it works.

    DECLARE @YourTable TABLE (

    CustomerID int,

    ReadDate datetime,

    usage int)

    insert into @YourTable select 1,' 1 mar 2014',0

    insert into @YourTable select 1,' 1 feb 2014',0

    insert into @YourTable select 1,' 1 jan 2014',0

    insert into @YourTable select 2,' 1 mar 2014',0

    insert into @YourTable select 2,' 1 feb 2014',100

    insert into @YourTable select 2,' 1 jan 2014',0

    insert into @YourTable select 3,' 1 mar 2014',150

    insert into @YourTable select 3,' 1 feb 2014',200

    insert into @YourTable select 3,' 1 jan 2014',0;

    WITH

    lastReadDateForCustomer AS

    (SELECT CustomerId, MAX(ReadDate) LastReadDate

    FROM @YourTable

    GROUP BY CustomerID),

    customerWithLastUsageZero AS

    (SELECT t.CustomerId

    FROM @YourTable t

    JOIN lastReadDateForCustomer r

    ON r.CustomerID = t.CustomerID

    AND r.LastReadDate = t.ReadDate

    WHERE t.usage = 0),

    lastNonZeroUsage AS

    (SELECT z.CustomerId, MAX(ReadDate) NonZeroDate

    FROM customerWithLastUsageZero z

    LEFT JOIN @YourTable t

    ON z.CustomerID = t.CustomerID

    AND t.usage > 0

    GROUP BY z.CustomerID),

    noUsage AS

    (SELECT z.CustomerId, MIN(ReadDate) EarliestZeroDate

    FROM customerWithLastUsageZero z

    JOIN @YourTable t

    ON z.CustomerID = t.CustomerID

    WHERE t.usage = 0

    GROUP BY z.CustomerID)

    SELECT n.CustomerID, DATEDIFF(MM, COALESCE(n.NonZeroDate, DATEADD(mm, -1, nu.EarliestZeroDate)), r.LastReadDate) as ZeroMonths

    FROM lastNonZeroUsage n

    JOIN lastReadDateForCustomer r

    ON r.CustomerID = n.CustomerID

    JOIN noUsage nu

    ON nu.CustomerID = n.CustomerID

  • Thanks Stephanie this looks perfect. I'm going to test it with our full dataset and see how it does. I'm sure you are right and it could be condensed down but it is very readable and easy to follow in its current form.

  • Even if it's readable, I wouldn't suggest using Stephanie's code. It reads the table 7 times which can become a nightmare on large tables.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Well I ran it on our production table with 1.2 million rows and it runs in 260 ms so I think that will work just fine 🙂

  • Even if it's readable, I wouldn't suggest using Stephanie's code. It reads the table 7 times which can become a nightmare on large tables.

    Agreed. The other two candidate queries are more efficient. They were posted while I was writing mine. Since I am primarily an application developer, I tend to overuse CTEs, which lend themselves to sequential processing.

  • Stephanie Giovannini (1/14/2015)


    Even if it's readable, I wouldn't suggest using Stephanie's code. It reads the table 7 times which can become a nightmare on large tables.

    Agreed. The other two candidate queries are more efficient. They were posted while I was writing mine. Since I am primarily an application developer, I tend to overuse CTEs, which lend themselves to sequential processing.

    The only truly efficient query is Luis Cazares's which need a single scan and mimimal number of logical reads.

    Another one which does ROW_NUMBER twice, will have no less number of scans than yours (even more, actually much more), the only difference is : they will be performed on Work Table instead of on source one.

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Mark Cowne (1/13/2015)


    This is a gaps and islands problem

    Correct!

    My solution suggestion:

    SELECT CustomerID, COUNT(*)

    FROM

    (

    SELECT CustomerID, ReadDate, Usage

    ,rn=DATEDIFF(month, ReadDate, 0) -

    ROW_NUMBER() OVER

    (

    PARTITION BY CustomerID

    ORDER BY ReadDate DESC

    )

    FROM @YourTable

    WHERE usage = 0

    ) a

    GROUP BY CustomerID, rn

    HAVING MAX(ReadDate) = (SELECT MAX(ReadDate) FROM @YourTable);


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Luis Cazares (1/13/2015)


    Hi, here's one possibility to get that report. There are other possibilities but this just seemed easy for me.

    SELECT CustomerID,

    DATEDIFF( MM,

    ISNULL( MAX(CASE WHEN usage > 0 THEN ReadDate END), DATEADD( MM, -1, MIN(ReadDate))),

    MAX( ReadDate)) Option1,

    ISNULL( DATEDIFF( MM, MAX(CASE WHEN usage > 0 THEN ReadDate END), MAX( ReadDate))

    ,DATEDIFF( MM, DATEADD( MM, -1, MIN(ReadDate)), MAX( ReadDate))) Option2

    FROM @YourTable

    WHERE ReadDate <= @Date

    GROUP BY CustomerID

    HAVING MAX( ReadDate) = MAX(CASE WHEN usage = 0 THEN ReadDate END)

    If you have any questions, feel free to ask them.

    I assume you are setting @Date like this?

    DECLARE @Date DATE = ' 1 mar 2014';

    You could do that in mine also instead of doing a second clustered index scan to find the max date in the table. Assuming of course that the table definition is like this:

    declare @YourTable table (

    CustomerID int,

    ReadDate datetime,

    usage int,

    PRIMARY KEY(CustomerID, ReadDate)

    );


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Yes and no.

    I declare the variable like that but it could be any date to get historic reports. I believe that I tested it with a simple GETDATE().

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply