Trying to incorporate RANK or NTILE with LEAD function

  • My scenario is as follows: We have accounts that pay for a particular "premium" service. It's entirely possible an account paid for this service for three consecutive months in 2013, then stopped paying, then started paying again. Why I'm trying to establish is, for the FIRST period of time the accout paid for this service, for how many consecutive months did they pay? Here is my test data:

    if object_id('tempdb..#SampleData') is not null

    drop table #SampleData

    go

    if object_id('tempdb..#DateAnalysis') is not null

    drop table #DateAnalysis

    go

    -- Create temp tables for sample data

    create table #SampleData

    (AccountID int,

    RandomDate datetime)

    create table #DateAnalysis

    (RowID int identity(1,1),

    AccountID int,

    RandomDate datetime,

    NextDate datetime,

    LeadInMonths int)

    -- Insert some sample data

    insert into #SampleData values

    (1, '1/1/13'), (1, '2/1/13'), (1, '3/1/13'), (1, '6/1/13'), (1, '10/1/13'), (1, '11/1/13'), (1, '12/1/13'),

    (2, '1/1/13'), (2, '4/1/13'), (2, '5/1/13'), (2, '6/1/13'),

    (3, '2/1/13'), (3, '3/1/13'), (3, '4/1/13'), (3, '9/1/13'), (3, '10/1/13'), (3, '11/1/13')

    -- Use lead function to determine how many months are between

    -- consecutive dates per account

    ; with DateInterval as

    (select AccountID, RandomDate,

    NextDate = lead (RandomDate, 1, NULL) over (partition by AccountID order by RandomDate)

    from #SampleData)

    insert into #DateAnalysis

    select AccountID, RandomDate, NextDate,

    datediff(mm, RandomDate, NextDate) as 'Lead'

    from DateInterval

    where NextDate is not null -- Last row will contain NULL for NextDate. Don't include these rows.

    -- Show the results

    select *,

    'NTile' = NTILE(3) over (partition by AccountID order by RandomDate),

    'RowNum' = row_number() over (partition by AccountID order by RandomDate)

    from #DateAnalysis

    Results (this is not getting me what I'm looking for):

    RowIDAccountIDRandomDateNextDateLeadInMonthsNTileRowNum

    11 2013-01-012013-02-01111

    21 2013-02-012013-03-01112

    31 2013-03-012013-06-01323

    41 2013-06-012013-10-01424

    51 2013-10-012013-11-01135

    61 2013-11-012013-12-01136

    72 2013-01-012013-04-01311

    82 2013-04-012013-05-01122

    92 2013-05-012013-06-01133

    103 2013-02-012013-03-01111

    113 2013-03-012013-04-01112

    123 2013-04-012013-09-01523

    133 2013-09-012013-10-01124

    143 2013-10-012013-11-01135

    This is what I'm trying to achieve:

    RowIDAccountIDRandomDateNextDateLeadInMonthsRankForThisLeadInMonths

    11 2013-01-012013-02-0111

    21 2013-02-012013-03-0111

    31 2013-03-012013-06-0132

    41 2013-06-012013-10-0143

    51 2013-10-012013-11-0114

    61 2013-11-012013-12-0114

    72 2013-01-012013-04-0131

    82 2013-04-012013-05-0112

    92 2013-05-012013-06-0112

    103 2013-02-012013-03-0111

    113 2013-03-012013-04-0111

    123 2013-04-012013-09-0152

    133 2013-09-012013-10-0113

    143 2013-10-012013-11-0113

    The problem comes with accounts like AccountID = 1. They paid consecutively to start, then skipped, then started paying consecutively again. When using window functions, I'm running into trouble attempting to partition by AccountID and LeadInMonths. It's putting all the LeadInMonths = 1 together and that will give me skewed results if I want to know the earliest and latest date within the FIRST consecutive range of dates where the account paid. I've tried NTILE but it expects an integer and there's no telling how many "tiles" would be in AccountID partition.

    I've looked at the OVER clause and the new "ROWS BETWEEN" syntax and still cannot get the desired results. Perhaps I need an entirely different approach?

    Thank you

  • Hey there,

    This solution gives you what you've described, but with different output columns:

    --First I use LAG to find where a new payment period begins by checking the months between the current row and the previous row

    --split by AccountID

    WITH FlagPaymentPeriods AS

    (

    SELECT

    AccountID

    ,RandomDate

    ,CASE WHEN LAG(RandomDate) OVER (PARTITION BY AccountID ORDER BY RandomDate) IS NULL

    OR DATEDIFF(month, LAG(RandomDate) OVER (PARTITION BY AccountID ORDER BY RandomDate), RandomDate) > 1

    THEN 1

    ELSE 0

    END AS NewPaymentPeriodFlag

    FROM

    SampleData

    )

    --Now I can effectively do a running total of the flags again split by AccountID

    , PaymentPeriods AS

    (

    SELECT

    AccountID

    ,RandomDate

    ,SUM(NewPaymentPeriodFlag) OVER (PARTITION BY AccountID ORDER BY RandomDate) AS PaymentPeriod

    FROM

    FlagPaymentPeriods

    )

    --Now we just find how many rows we have in account where the PaymentPeriod = 1 (the first payment period)

    SELECT

    AccountID

    ,COUNT(PaymentPeriod) AS MonthsPaid

    FROM

    PaymentPeriods

    WHERE

    PaymentPeriod = 1

    GROUP BY

    AccountID

    ,PaymentPeriod

    *EDIT: Changed the aggregate in the last part of the query to use COUNT instead of SUM - makes more sense that way.

    Results are:

    AccountID MonthsPaid

    ----------- -----------

    1 3

    2 1

    3 3

    Hope that helps!

  • Wow!!! That is fantastic! Thank you so much!

  • No problem, happy to help 🙂

    In case you wanted to extend this to see how many months people pay for in subsequent payment periods, you can change the last part of the query to this:

    SELECT

    AccountID

    ,PaymentPeriod

    ,COUNT(PaymentPeriod) AS MonthsPaid

    FROM

    PaymentPeriods

    GROUP BY

    AccountID

    ,PaymentPeriod

    ORDER BY

    AccountID

    ,PaymentPeriod

  • Are you looking for something like this?

    SELECT RowID=ROW_NUMBER() OVER (ORDER BY AccountID, RandomDate)

    ,AccountID, RandomDate, NextDate, LeadInMonths

    ,RankForThisLeadInMonths=SUM(lm) OVER

    (

    PARTITION BY AccountID ORDER BY RandomDate

    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

    )

    FROM

    (

    SELECT AccountID, RandomDate, NextDate, LeadInMonths

    ,lm=CASE WHEN LeadInMonths = LAG(LeadInMonths, 1, 0) OVER (PARTITION BY AccountID ORDER BY RandomDate)

    THEN 0 ELSE 1 END

    FROM

    (

    SELECT AccountID, RandomDate

    ,NextDate=LEAD(RandomDate, 1) OVER (PARTITION BY AccountID ORDER BY RandomDate)

    ,LeadInMonths=DATEDIFF(month, RandomDate

    ,LEAD(RandomDate, 1) OVER (PARTITION BY AccountID ORDER BY RandomDate))

    FROM #SampleData

    ) a

    WHERE NextDate IS NOT NULL

    ) a

    ORDER BY AccountID, RandomDate;


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Yes dwain.c, that would work as well! Thanks for the input.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply