Recursive CTE? Not Sure How To Do This...

  • Hey Gang-

    I have an interesting challenge which I haven't fully been able to wrap my head around. I suspect that the solution will involve a recursive CTE and windowing functions, but I'd love guru input. We're on SQL2K5.

    I'll simplify my explanation as best I can, though it's a bit complex IMO. We are recording attempted sales from SalesReps to Customers. At this stage of processing, the goal is first to identify sales that are potentially eligible for commission.

    Our business logic dictates that once a SalesRep initiates a sale, that customer is "locked" for n days (n subject to change, but currently 1). Subsequent sales to that customer within the "lock" period are ineligible for commission.

    I'm including DDL below which I hope will help illustrate the situation. I'm including comments for each row of data to indicate the expected result (eligible or ineligible) along with a reason why.

    USE [tempdb]

    IF object_id('#Sales') IS NOT NULL

    DROP TABLE #Sales

    CREATE TABLE #Sales

    (

    SaleID INT IDENTITY(1,1) NOT NULL,

    CustID INT NOT NULL,

    SalesRepID INT NOT NULL,

    SaleDate datetime NOT NULL

    )

    -- Create a variable to hold the number of days

    -- after an initial Eligible sale

    -- for which subsequent sales to a given customer

    -- are denied commission

    DECLARE @LockDays TINYINT

    SET @LockDays = 1

    -- Insert a bunch of hypothetical sales for CustID 100

    INSERT INTO #Sales (CustID, SalesRepID, SaleDate)

    SELECT 100, 1, '2014-05-01 12:00:00'-- Sale #1: Eligible

    UNION ALL

    SELECT 100, 1, '2014-05-01 15:00:00'-- Sale #2: Ineligible, date is within lock period of Sale #1

    UNION ALL

    SELECT 100, 2, '2014-05-02 14:00:00'-- Sale #3: Eligible, date is after lock period of Sale #1

    UNION ALL

    SELECT 100, 3, '2014-05-02 11:00:00'-- Sale #4: Ineligible, date is within lock period of Sale #3

    UNION ALL

    SELECT 100, 1, '2014-05-02 11:01:00'-- Sale #5: Ineligible, date is within lock period of Sale #3

    UNION ALL

    SELECT 100, 1, '2014-05-02 11:02:00'-- Sale #6: Ineligible, date is within lock period of Sale #3

    UNION ALL

    SELECT 100, 1, '2014-05-03 11:03:00'-- Sale #7: Ineligible, date is within lock period of Sale #3

    UNION ALL

    SELECT 100, 1, '2014-05-03 14:00:00'-- Sale #8: Eligible, date is after lock period of Sale #3

    UNION ALL

    SELECT 100, 1, '2014-05-03 19:00:00'-- Sale #9: Ineligible, date is within lock period of Sale #8

    UNION ALL

    SELECT 100, 1, '2014-05-04 13:00:00'-- Sale #10: Ineligible, date is within lock period of Sale #8

    UNION ALL

    SELECT 100, 1, '2014-05-05 13:00:00'-- Sale #11: Eligible, date is after lock period of Sale #8

    UNION ALL

    SELECT 100, 1, '2014-05-06 13:00:00'-- Sale #12: Eligible, date is after lock period of Sale #11

    UNION ALL

    SELECT 100, 1, '2014-05-07 13:00:00'-- Sale #13: Eligible, date is after lock period of Sale #12

    UNION ALL

    SELECT 100, 1, '2014-05-08 13:00:00'-- Sale #14: Eligible, date is after lock period of Sale #13

    SELECT

    CustID,

    SalesRepID,

    SaleDate,

    'Eligibility Here' AS eligibility

    FROM #Sales


    "If I had been drinking out of that toilet, I might have been killed." -Ace Ventura

  • P.S. There is some added logic to do with whether subsequent sales are attempted by the original SalesRep or another SalesRep, but I'll leave that for later. Just wanted to explain the presence of the SalesRepID field in the above code. 🙂


    "If I had been drinking out of that toilet, I might have been killed." -Ace Ventura

  • autoexcrement (5/16/2014)


    P.S. There is some added logic to do with whether subsequent sales are attempted by the original SalesRep or another SalesRep, but I'll leave that for later. Just wanted to explain the presence of the SalesRepID field in the above code. 🙂

    Here is one way of doing it, slightly verbose as trying to make it more self-explanatory. It should work for more than one customer.

    😎

    USE [tempdb]

    IF object_id('#Sales') IS NOT NULL

    DROP TABLE #Sales

    CREATE TABLE #Sales

    (

    SaleID INT IDENTITY(1,1) NOT NULL,

    CustID INT NOT NULL,

    SalesRepID INT NOT NULL,

    SaleDate datetime NOT NULL

    )

    -- Create a variable to hold the number of days

    -- after an initial Eligible sale

    -- for which subsequent sales to a given customer

    -- are denied commission

    DECLARE @LockDays TINYINT

    SET @LockDays = 1

    -- Insert a bunch of hypothetical sales for CustID 100

    INSERT INTO #Sales (CustID, SalesRepID, SaleDate)

    SELECT 100, 1, '2014-05-01 12:00:00' UNION ALL -- Sale #1: Eligible -- X

    SELECT 100, 1, '2014-05-01 15:00:00' UNION ALL -- Sale #2: Ineligible, date is within lock period of Sale #1 --

    SELECT 100, 2, '2014-05-02 14:00:00' UNION ALL -- Sale #3: Eligible, date is after lock period of Sale #1 -- X

    SELECT 100, 3, '2014-05-02 11:00:00' UNION ALL -- Sale #4: Ineligible, date is within lock period of Sale #3 --

    SELECT 100, 1, '2014-05-02 11:01:00' UNION ALL -- Sale #5: Ineligible, date is within lock period of Sale #3 --

    SELECT 100, 1, '2014-05-02 11:02:00' UNION ALL -- Sale #6: Ineligible, date is within lock period of Sale #3 --

    SELECT 100, 1, '2014-05-03 11:03:00' UNION ALL -- Sale #7: Ineligible, date is within lock period of Sale #3 --

    SELECT 100, 1, '2014-05-03 14:00:00' UNION ALL -- Sale #8: Eligible, date is after lock period of Sale #3 -- X

    SELECT 100, 1, '2014-05-03 19:00:00' UNION ALL -- Sale #9: Ineligible, date is within lock period of Sale #8 --

    SELECT 100, 1, '2014-05-04 13:00:00' UNION ALL -- Sale #10: Ineligible, date is within lock period of Sale #8 --

    SELECT 100, 1, '2014-05-05 13:00:00' UNION ALL -- Sale #11: Eligible, date is after lock period of Sale #8 -- X

    SELECT 100, 1, '2014-05-06 13:00:00' UNION ALL -- Sale #12: Eligible, date is after lock period of Sale #11 -- X

    SELECT 100, 1, '2014-05-07 13:00:00' UNION ALL -- Sale #13: Eligible, date is after lock period of Sale #12 -- X

    SELECT 100, 1, '2014-05-08 13:00:00' -- Sale #14: Eligible, date is after lock period of Sale #13 -- X

    /* Add a row_number to devide the set by Customers

    and order by sale date

    */

    ;WITH SALE_LIST AS

    (

    SELECT

    ROW_NUMBER() OVER

    (

    PARTITION BY S.CustID

    ORDER BY S.SaleDate

    ) AS CUST_RID

    ,S.CustID

    ,S.SalesRepID

    ,S.SaleDate

    FROM #Sales S

    )

    /* Identify all possible Ineligible periods*/

    ,EL_START AS

    (

    SELECT

    X.SL_RID

    ,X.SL_CUST_RID

    ,X.SL2_CUST_RID

    ,X.SL_SaleDate

    ,X.SL2_SaleDate

    ,X.CustID

    ,X.SalesRepID

    FROM

    (

    SELECT

    ROW_NUMBER() OVER

    (

    PARTITION BY SL.CUST_RID

    ORDER BY SL.SaleDate,SL2.SaleDate

    ) AS SL_RID

    ,SL.CUST_RID AS SL_CUST_RID

    ,SL2.CUST_RID AS SL2_CUST_RID

    ,SL.SaleDate AS SL_SaleDate

    ,SL.SaleDate AS SL2_SaleDate

    ,SL.CustID AS CustID

    ,SL.SalesRepID AS SalesRepID

    FROM SALE_LIST SL

    OUTER APPLY SALE_LIST SL2

    WHERE SL.CustID = SL2.CustID

    AND SL.CUST_RID < SL2.CUST_RID

    AND DATEDIFF(MINUTE,SL.SaleDate,SL2.SaleDate) >= (1440 * CAST(@LockDays AS INT))

    ) AS X WHERE X.SL_RID = 1

    )

    /* Simple recursion to match periods */

    ,SPERIOD AS

    (

    SELECT TOP (1)

    EL1.SL_RID

    ,EL1.SL_CUST_RID

    ,EL1.SL2_CUST_RID

    ,EL1.SL_SaleDate

    ,EL1.SL2_SaleDate

    ,EL1.CustID

    ,EL1.SalesRepID

    FROM EL_START EL1

    UNION ALL

    SELECT

    E1.SL_RID

    ,E1.SL_CUST_RID

    ,E1.SL2_CUST_RID

    ,E1.SL_SaleDate

    ,E1.SL2_SaleDate

    ,E1.CustID

    ,E1.SalesRepID

    FROM EL_START E1

    INNER JOIN SPERIOD S1

    ON E1.SL_CUST_RID = S1.SL2_CUST_RID

    AND E1.CustID = S1.CustID

    )

    /* Final output query */

    SELECT

    SL.CustID

    ,SL.SalesRepID

    ,SL.SaleDate

    ,CASE

    WHEN SP.SL_CUST_RID IS NOT NULL THEN 'Eligible'

    ELSE 'Ineligible'

    END AS eligibility

    FROM SALE_LIST SL

    LEFT OUTER JOIN SPERIOD SP

    ON SL.CUST_RID = SP.SL_CUST_RID

    AND SL.CustID = SP.CustID

    Results

    CustID SalesRepID SaleDate eligibility

    ----------- ----------- ----------------------- -----------

    100 1 2014-05-01 12:00:00.000 Eligible

    100 1 2014-05-01 15:00:00.000 Ineligible

    100 3 2014-05-02 11:00:00.000 Ineligible

    100 1 2014-05-02 11:01:00.000 Ineligible

    100 1 2014-05-02 11:02:00.000 Ineligible

    100 2 2014-05-02 14:00:00.000 Eligible

    100 1 2014-05-03 11:03:00.000 Ineligible

    100 1 2014-05-03 14:00:00.000 Eligible

    100 1 2014-05-03 19:00:00.000 Ineligible

    100 1 2014-05-04 13:00:00.000 Ineligible

    100 1 2014-05-05 13:00:00.000 Eligible

    100 1 2014-05-06 13:00:00.000 Eligible

    100 1 2014-05-07 13:00:00.000 Eligible

    100 1 2014-05-08 13:00:00.000 Ineligible

  • Thanks for the reply! Looks as if you may have cracked it! 🙂 I'll take a close look on Monday morning when I have SSMS access and get back to you once I figure out exactly what wizardry you've done here. 🙂


    "If I had been drinking out of that toilet, I might have been killed." -Ace Ventura

  • So, first off, thanks again. This is amazing to me.

    I've spent well over an hour looking at this and trying to break it down into its individual elements, doing research on previously unfamiliar operators (like APPLY). I found one small typo (I think) in the EL_START clause which threw me for a minute:

    ,SL.SaleDate AS SL2_SaleDate

    should be

    ,SL2.SaleDate AS SL2_SaleDate

    I definitely have more questions, if you (or others) are willing to help. I'd really like to understand this solution 100%.

    1. In EL_START, I'm wondering why you chose to convert to minutes and use DATEDIFF?

    DATEDIFF(MINUTE,SL.SaleDate,SL2.SaleDate) >= (1440 * CAST(@LockDays AS INT))

    I would have thought to use something like this to compare the exact dates as-is:

    DATEADD(day, @LockDays, SL.SaleDate) <= SL2.SaleDate

    2. Can you explain EL_START and SPERIOD a bit more for me?

    3. Your results are currently omitting the final (eligible) sale (2014-05-08 13:00:00). But I haven't been able to figure out exactly why yet.

    Thanks again, I am very grateful and enthusiastic to learn from this solution!


    "If I had been drinking out of that toilet, I might have been killed." -Ace Ventura

  • I got stumped with work, be back to you tomorrow:cool: or as soon as I can.

  • Auto,

    While Eirikur's solution will work, there's a massive triangle join in the middle around that outer apply, unless I misread it (possible, I'm doped on Nyquil at the moment). You're right, you want to basically do a running totals/recursion solution against this if there's a large # of records. It may not matter though for your current quantity.

    I'm currently unable to spend the time on this problem to hand you the solution (I'll come back tomorrow, barring work overload), but your best solution is probably going to be using the quirky update for a single pass against the data dumped into a temp table with just the data you need for locating the concern.

    Jeff Moden's Quirky Update Article[/url]

    The short form of the technique is get your data for all the customers/salespeople/etc you want to check into a single temp table, where you can control the clustered index/key directly without worrying about underlying schema, follow the rules in that article, and do a single pass for all values. Once you've done that, you can then return to the originating table(s) and update them with the findings.

    Moving forward, I'd recommend you 'Instead of' trigger the table that these belong to, and use a very particular non-clustered inclusion index to feed the trigger to determine eligibility as it gets inserted. Have it set the value as records come in so you don't have to do these kind of 'maintenance passes' against the data. Of course, that depends on the sales records being consistent, which you may not have control over, so that may be a moot recommendation, and won't be kind to you if you bulk insert multiple days of data at a time.

    EDIT: Misspelt Eirikur's name, sorry.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Thanks Craig for the considered reply. I spent a good 2 hours today reading through Jeff's QU solution and all the dozens of pages of replies and heated debate. After reading everything, and considering my situation, I think I should probably not go with the Quirky Update method, though I will certainly be using it for my own projects in future.

    But in that discussion, i came across Jeff's version of someone else's method:

    http://www.sqlservercentral.com/Forums/FindPost816964.aspx

    I am going to try testing that out a bit, though it is currently taking dramatically longer than I thought it would. And I also hope Eirikur will be following up on his solution as well, even though it does have RBAR in it.

    Oh, and yes, we are updating on a weekly (or so) basis in batches, so probably no Triggers here.


    "If I had been drinking out of that toilet, I might have been killed." -Ace Ventura

  • Had a quick look at it again and discovered minor snags:-D

    Firstly, this is not going to work for more than one customer at a time so I added a variable to filter on customer.

    Secondly, I didn't add a "closing" record for the last transaction, hence it can never be matched in the final part.

    Here is the code with the corrections and few added comments. Feel free to ask if any questions.

    😎

    USE [tempdb]

    IF object_id('#Sales') IS NOT NULL

    DROP TABLE #Sales

    CREATE TABLE #Sales

    (

    SaleID INT IDENTITY(1,1) NOT NULL,

    CustID INT NOT NULL,

    SalesRepID INT NOT NULL,

    SaleDate datetime NOT NULL

    )

    -- Create a variable to hold the number of days

    -- after an initial Eligible sale

    -- for which subsequent sales to a given customer

    -- are denied commission

    DECLARE @LockDays TINYINT

    SET @LockDays = 1

    DECLARE @CustID INT

    SET @CustID = 100;

    -- Insert a bunch of hypothetical sales for CustID 100

    INSERT INTO #Sales (CustID, SalesRepID, SaleDate)

    SELECT 100, 1, '2014-05-01 12:00:00' UNION ALL -- Sale #1: Eligible -- X

    SELECT 100, 1, '2014-05-01 15:00:00' UNION ALL -- Sale #2: Ineligible, date is within lock period of Sale #1 --

    SELECT 100, 2, '2014-05-02 14:00:00' UNION ALL -- Sale #3: Eligible, date is after lock period of Sale #1 -- X

    SELECT 100, 3, '2014-05-02 11:00:00' UNION ALL -- Sale #4: Ineligible, date is within lock period of Sale #3 --

    SELECT 100, 1, '2014-05-02 11:01:00' UNION ALL -- Sale #5: Ineligible, date is within lock period of Sale #3 --

    SELECT 100, 1, '2014-05-02 11:02:00' UNION ALL -- Sale #6: Ineligible, date is within lock period of Sale #3 --

    SELECT 100, 1, '2014-05-03 11:03:00' UNION ALL -- Sale #7: Ineligible, date is within lock period of Sale #3 --

    SELECT 100, 1, '2014-05-03 14:00:00' UNION ALL -- Sale #8: Eligible, date is after lock period of Sale #3 -- X

    SELECT 100, 1, '2014-05-03 19:00:00' UNION ALL -- Sale #9: Ineligible, date is within lock period of Sale #8 --

    SELECT 100, 1, '2014-05-04 13:00:00' UNION ALL -- Sale #10: Ineligible, date is within lock period of Sale #8 --

    SELECT 100, 1, '2014-05-05 13:00:00' UNION ALL -- Sale #11: Eligible, date is after lock period of Sale #8 -- X

    SELECT 100, 1, '2014-05-06 13:00:00' UNION ALL -- Sale #12: Eligible, date is after lock period of Sale #11 -- X

    SELECT 100, 1, '2014-05-07 13:00:00' UNION ALL -- Sale #13: Eligible, date is after lock period of Sale #12 -- X

    SELECT 100, 1, '2014-05-08 13:00:00' -- Sale #14: Eligible, date is after lock period of Sale #13 -- X

    ;WITH PSEUDO_LAST_SALE AS

    /* In order to include the last transaction, a fake transaction

    is added outside the expire time of the last actual transaction.

    Without it, the last transaction can never match itself in

    the final select statement. */

    (

    SELECT

    S.CustID

    ,0 AS SalesRepID

    ,DATEADD(DAY,@LockDays,MAX(S.SaleDate)) AS SaleDate

    FROM #Sales S

    WHERE S.CustID = @CustID

    GROUP BY S.CustID

    )

    /* Add the fake transaction and a row_number to

    order by sale date.

    */

    ,SALE_LIST AS

    (

    SELECT

    ROW_NUMBER() OVER

    (

    PARTITION BY (SELECT NULL)

    ORDER BY S.SaleDate

    ) AS CUST_RID

    ,S.CustID

    ,S.SalesRepID

    ,S.SaleDate

    FROM

    (

    SELECT

    S.CustID

    ,S.SalesRepID

    ,S.SaleDate

    FROM #Sales S

    WHERE S.CustID = @CustID

    UNION ALL

    SELECT

    S.CustID

    ,S.SalesRepID

    ,S.SaleDate

    FROM PSEUDO_LAST_SALE S

    ) AS S

    )

    /* Identify all possible Ineligible periods by matching

    the set to itself and filtering on the duration.

    */

    ,EL_START AS

    (

    SELECT

    X.SL_RID

    ,X.SL_CUST_RID

    ,X.SL2_CUST_RID

    ,X.SL_SaleDate

    ,X.CustID

    ,X.SalesRepID

    FROM

    (

    SELECT

    ROW_NUMBER() OVER

    (

    PARTITION BY SL.CUST_RID

    ORDER BY SL.SaleDate,SL2.SaleDate

    ) AS SL_RID

    ,SL.CUST_RID AS SL_CUST_RID

    ,SL2.CUST_RID AS SL2_CUST_RID

    ,SL.SaleDate AS SL_SaleDate

    ,SL.CustID AS CustID

    ,SL.SalesRepID AS SalesRepID

    FROM SALE_LIST SL

    OUTER APPLY SALE_LIST SL2

    WHERE SL.CUST_RID < SL2.CUST_RID

    AND DATEDIFF(MINUTE,SL.SaleDate,SL2.SaleDate) >= (1440 * CAST(@LockDays AS INT))

    ) AS X WHERE X.SL_RID = 1

    )

    /* Simple recursion to match the periods.

    The first part is the starting point and the

    second part "walks" through the list by matching

    beginning-end=beginning-end...

    */

    ,SPERIOD AS

    (

    SELECT TOP (1)

    EL1.SL_RID

    ,EL1.SL_CUST_RID

    ,EL1.SL2_CUST_RID

    ,EL1.SL_SaleDate

    ,EL1.CustID

    ,EL1.SalesRepID

    FROM EL_START EL1

    WHERE EL1.SL_RID = 1

    UNION ALL

    SELECT

    E1.SL_RID

    ,E1.SL_CUST_RID

    ,E1.SL2_CUST_RID

    ,E1.SL_SaleDate

    ,E1.CustID

    ,E1.SalesRepID

    FROM EL_START E1

    INNER JOIN SPERIOD S1

    ON E1.SL_CUST_RID = S1.SL2_CUST_RID

    )

    /* Final output query */

    SELECT

    SL.CustID

    ,SL.CUST_RID

    ,SL.SalesRepID

    ,SL.SaleDate

    ,CASE

    WHEN SP.SL_CUST_RID IS NOT NULL THEN 'Eligible'

    ELSE 'Ineligible'

    END AS eligibility

    FROM SALE_LIST SL

    LEFT OUTER JOIN SPERIOD SP

    ON SL.CUST_RID = SP.SL_CUST_RID

    AND SL.CustID = SP.CustID

    WHERE SL.SalesRepID > 0;

  • The Quirky Update will do this on a million row table in scant seconds but, because it's not supported and there's quite a bit of FUD in the eyes of many people, so here's an (ugh!) alternate solution. Note that I materialize the RowNum column by dumping it and the other columns of teh #Sales table into a different Temp Table which, in turn, prevents the nasty full-up CROSS JOIN that you'd normally get when joining a ROW_NUMBER() column to itself. This is going to suck just as bad as a Cursor and While loop but I don't know of a decent set-based T-SQL-Only method to solve this problem. Hopefully, someone smarter than I can come up with one and hopefully they'll remember this is for 2005.

    This one only hits the table twice. If the table already existed with a properly ordered (Sale #3 is out of order in the original table) and had a proper index on it, this would do one seek and one (required) scan. If you try to change the (required) scan to a seek, it will do one seek for every row and that's horribly inefficient.

    Here's the code, which works against the test harness already provided.

    --drop table #SalesWork;

    SELECT *, RowNum = ISNULL(ROW_NUMBER() OVER (ORDER BY SaleDate),0)

    INTO #SalesWork

    FROM #Sales

    ;

    WITH

    cteMark AS

    (

    SELECT RowNum, CustID, SalesRepID, SaleDate,

    IsEligible = 1,

    PrevEligibleSalesDate = SaleDate

    FROM #SalesWork

    WHERE RowNum = 1

    UNION ALL

    SELECT s.RowNum, s.CustID, s.SalesRepID, s.SaleDate,

    IsEligible = CASE WHEN s.SaleDate >= m.PrevEligibleSalesDate+1 THEN 1 ELSE 0 END,

    PrevEligibleSalesDate = CASE WHEN s.SaleDate >= m.PrevEligibleSalesDate+1 THEN s.SaleDate ELSE m.PrevEligibleSalesDate END

    FROM cteMark m

    JOIN #SalesWork s

    ON s.RowNum-1 = m.RowNum

    )

    SELECT * FROM cteMark

    ;

    Note that I didn't make it so that it would handle multiple CustIDs... I figured I'd let you study this and then add that nuance to the CASE functions. I also didn't make it take a variable for the N number of days. I wanted to keep this example simple. It's easy enough for you to do once you understand how this particular rCTE works.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Simplest solution I can come up with:

    SELECT

    SaleID,

    CustID,

    SalesRepID,

    SaleDate,

    case when exists

    (select 1

    from #Sales as Sales2

    where Sales2.CustID = Sales1.CustID

    and Sales2.SaleDate > dateadd(day, -1, Sales1.SaleDate)

    and Sales2.SaleDate < Sales1.SaleDate

    and Sales2.SaleID != Sales1.SaleID) then 0

    else 1

    end as eligibility

    FROM #Sales as Sales1

    order by SaleDate;

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Thanks to everyone who has replied already. A better group of fellas I couldn't possibly ask for.

    @Eirikur: Thank you again for all your work on this solution! I have been studying your solutions and have learned a lot that will surely be useful.

    @Gus/GSquared: Your solution was so short and sweet I thought surely it was too good to be true. And alas, I believe it was. Unless I'm missing something, it only takes into account the date of the previous sale, and I need to take into account the most recent valid sale. But again, your solution is elegant and sleek and has offered me a great learning experience.

    @jeff: A thousand thanks to you for your discussion and consideration. Your proposed solution is the one I will be running with here. To get it to work with multiple CustIDs, I added a "PARTITION BY CustID" in the ROW_NUMBER function, and then added "s.CustID = m.CustID" to the JOIN conditions. Hopefully that is right, it seems to be generating the desired results. Now I will need to make sure my indexes are properly set up and continue testing on larger sets.

    Thanks again, I'm humbled by and grateful to you guys, as always. What a wonderful place you've all made this site over the years.


    "If I had been drinking out of that toilet, I might have been killed." -Ace Ventura

  • autoexcrement (5/21/2014)


    Thanks to everyone who has replied already. A better group of fellas I couldn't possibly ask for.

    @Eirikur: Thank you again for all your work on this solution! I have been studying your solutions and have learned a lot that will surely be useful.

    @Gus/GSquared: Your solution was so short and sweet I thought surely it was too good to be true. And alas, I believe it was. Unless I'm missing something, it only takes into account the date of the previous sale, and I need to take into account the most recent valid sale. But again, your solution is elegant and sleek and has offered me a great learning experience.

    @jeff: A thousand thanks to you for your discussion and consideration. Your proposed solution is the one I will be running with here. To get it to work with multiple CustIDs, I added a "PARTITION BY CustID" in the ROW_NUMBER function, and then added "s.CustID = m.CustID" to the JOIN conditions. Hopefully that is right, it seems to be generating the desired results. Now I will need to make sure my indexes are properly set up and continue testing on larger sets.

    Thanks again, I'm humbled by and grateful to you guys, as always. What a wonderful place you've all made this site over the years.

    Thanks for the feedback.

    I believe you'll also need to chance the CASE statements so that they start the "PrevEligibleSalesDate" part of the rCTE so that the "first" row for a customer is always an "Eligible" sale.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • @jeff, in testing your rCTE code on larger data sets, it seems to work most optimally in batches of about 1000 records at a time. (And even so, I have to set MAXRECURSION to 0.) So I think I'm going to set it up to process 1000 records at a time in a while loop until all pending records are processed. Does this sound like a fair compromise in this situation? Or should I be looking at other options?

    Regarding your comment about making sure my batches include the first "eligible" sale per customer, yes, I agree. I'm not sure how best to explain this but... For each batch I'm selecting 1000 unprocessed records into the temp table, and then appending all "eligible" records that are within the "lock" period of MIN(#TempTable.SaleDate) to it before processing that batch. That way the integrity of each "lock" period is preserved. Hope that makes sense...


    "If I had been drinking out of that toilet, I might have been killed." -Ace Ventura

  • autoexcrement (5/21/2014)


    @Jeff, in testing your rCTE code on larger data sets, it seems to work most optimally in batches of about 1000 records at a time. (And even so, I have to set MAXRECURSION to 0.) So I think I'm going to set it up to process 1000 records at a time in a while loop until all pending records are processed. Does this sound like a fair compromise in this situation? Or should I be looking at other options?

    Regarding your comment about making sure my batches include the first "eligible" sale per customer, yes, I agree. I'm not sure how best to explain this but... For each batch I'm selecting 1000 unprocessed records into the temp table, and then appending all "eligible" records that are within the "lock" period of MIN(#TempTable.SaleDate) to it before processing that batch. That way the integrity of each "lock" period is preserved. Hope that makes sense...

    Personally, I'd use the Quirky Update method especially since you're doing this on a Temp Table.

    That, notwithstanding, if you're going to go the route of breaking things up into batches, I'd make each batch handle just one customer at a time so that you don't have to worry where you left off, etc.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 25 total)

You must be logged in to reply to this topic. Login to reply