Help with complicated running balance

  • I have a challenge that I was hoping to solve using the new SQL Server 2012 running balance capabilities, but it's a bit more complicated than I realized. I've attached a small spreadsheet with some sample data.

    Basically these are inventory transactions for multiple LotIDs that are all part of a single PudID group and I need to be able to calculate the running balance for the PudID, which is the PudIDAsOfBal column in the spreadsheet. I've included some comments on how that number is calculated. I added the "LotIDBalSeq" column thinking I could somehow sum up the last PudIDLotBal for each LotID as of the TranDate.

    The following code will return the results I have to work with. What I need is one more column with the PudIDAsOfBal:

    SELECTPudID,

    TranDate,

    TranQty,

    RunBal,

    PudIDPct,

    LotID,

    PudIDLotBal,

    LotIDBalSeq,

    'help?'PudIDAsOfBal

    FROM

    (SELECT 751 PudID,'2011-09-19 18:21:09.000' TranDate,64298 TranQty,64298 RunBal,56.477600 PudIDPct,'42011025' LotID,36314 PudIDLotBal,1 LotIDBalSeq

    UNION SELECT 751,'2011-09-20 03:20:08.000',-45938,64298,56.477600,'42011025',36314,2

    UNION SELECT 751,'2011-09-20 03:20:08.000',45938,64298,56.477600,'42011025',36314,3

    UNION SELECT 751,'2011-09-20 03:21:08.000',-18360,60346,56.483300,'42011025',25947,4

    UNION SELECT 751,'2011-09-20 03:21:08.000',14408,60346,56.477600,'42011025A',8137,1

    UNION SELECT 751,'2011-09-20 18:16:49.000',27122,87468,100.00000,'42011027',27122,1

    UNION SELECT 751,'2011-09-20 19:51:29.000',35454,122922,19.805900,'42011026',7021,1

    UNION SELECT 751,'2011-09-20 21:30:22.000',12757,121271,10.055800,'42011022A',1282,1

    UNION SELECT 751,'2011-09-20 21:30:22.000',-14408,121271,56.477600,'42011025A',0,2

    UNION SELECT 751,'2011-09-21 02:09:25.000',23515,121271,19.805900,'42011026',7021,2

    UNION SELECT 751,'2011-09-21 02:09:25.000',-23515,121271,19.805900,'42011026',7021,3

    UNION SELECT 751,'2011-09-21 02:11:00.000',-11939,121271,19.800000,'42011026',4655,4

    UNION SELECT 751,'2011-09-21 02:11:00.000',11939,121271,19.805900,'42011026A',2364,1

    ) WORK

    I thought I could use some variation of SUM(PudIDLotBal) OVER (PARTITION... but having the multiple lots and needing the last balance for each lot as of the TranDate is over my head. Any help woud be greatly appreciated.

    Thanks,

    Gina

  • gcresse (4/15/2013)


    I've included some comments on how that number is calculated.

    Where? There's a whole lot of attachments.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Sorry about that. I must've grabbed the wrong file. I've re-uploaded the correct attachment.

  • I'm thinking that we should simplify the requirements a bit. From what I see, it boils down to two things.

    1. Do a normal running total first.

    2. Update any tied date/times of the first result set with the MAX value from that date/time group.

    You might be able to do Step 2 in the same code as Step 1 if you preaggregate the data so that you can create a "single previous row" for each datetime group and then join the result set back to the original data. I've not done a deep dive on this but it looks like you'll need two passes on the table one way or another. I've also not worked with the new capabilites of the Windowing Functions in 2012 so I could certainly be wrong there.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • OK, I'm trying plan b but the running balance is not correct. It seems to ignore my request for only the highest seq for each lot. I must be missing something obvious, but I just can't see it. Any ideas?

    SELECTAsOfPudID,

    AsOfTranDate,

    AsOfLotKey,

    AsOfLotBal,

    AsOfLotBalSeq,

    (SELECTSUM(B.AsOfLotBal)

    FROMtestAsOfProdGalsByLot B

    WHEREB.AsOfPudID= A.AsOfPudID AND

    B.AsOfTranDate <= A.AsOfTranDate AND

    B.AsOfLotBalSeq = ( SELECTTOP 1 C.AsOfLotBalSeq

    FROMtestAsOfProdGalsByLot C

    WHEREC.AsOfPudID= B.AsOfPudID AND

    C.AsOfTranDate <= B.AsOfTranDate AND

    C.AsOfLotKey= B.AsOfLotKey

    ORDER BY AsOfLotBalSeq DESC))

    FROM

    (SELECTPudIDAsOfPudID,

    TranDateAsOfTranDate,

    LotIDAsOfLotKey,

    PudIDLotBalAsOfLotBal,

    LotIDBalSeqAsOfLotBalSeq

    FROM

    (SELECT 751 PudID,'2011-09-19 18:21:09.000' TranDate,64298 TranQty,64298 RunBal,56.477600 PudIDPct,'42011025' LotID,36314 PudIDLotBal,1 LotIDBalSeq

    UNION SELECT 751,'2011-09-20 03:20:08.000',-45938,64298,56.477600,'42011025',36314,2

    UNION SELECT 751,'2011-09-20 03:20:08.000',45938,64298,56.477600,'42011025',36314,3

    UNION SELECT 751,'2011-09-20 03:21:08.000',-18360,60346,56.483300,'42011025',25947,4

    UNION SELECT 751,'2011-09-20 03:21:08.000',14408,60346,56.477600,'42011025A',8137,1

    UNION SELECT 751,'2011-09-20 18:16:49.000',27122,87468,100.00000,'42011027',27122,1

    UNION SELECT 751,'2011-09-20 19:51:29.000',35454,122922,19.805900,'42011026',7021,1

    UNION SELECT 751,'2011-09-20 21:30:22.000',12757,121271,10.055800,'42011022A',1282,1

    UNION SELECT 751,'2011-09-20 21:30:22.000',-14408,121271,56.477600,'42011025A',0,2

    UNION SELECT 751,'2011-09-21 02:09:25.000',23515,121271,19.805900,'42011026',7021,2

    UNION SELECT 751,'2011-09-21 02:09:25.000',-23515,121271,19.805900,'42011026',7021,3

    UNION SELECT 751,'2011-09-21 02:11:00.000',-11939,121271,19.800000,'42011026',4655,4

    UNION SELECT 751,'2011-09-21 02:11:00.000',11939,121271,19.805900,'42011026A',2364,1

    ) WORK

    ) A

    UPDATE: I found my issue. This code seems to work for me now:

    SELECTAsOfPudID,

    AsOfTranDate,

    AsOfLotKey,

    AsOfLotBal,

    AsOfLotBalSeq,

    (SELECTSUM(B.AsOfLotBal)

    FROMtestAsOfProdGalsByLot B

    WHEREB.AsOfPudID= A.AsOfPudID AND

    B.AsOfTranDate <= A.AsOfTranDate AND

    B.AsOfLotBalSeq = ( SELECTTOP 1 C.AsOfLotBalSeq

    FROMtestAsOfProdGalsByLot C

    WHEREC.AsOfPudID= A.AsOfPudID AND

    C.AsOfTranDate <= A.AsOfTranDate AND

    C.AsOfLotKey= B.AsOfLotKey

    ORDER BY AsOfLotBalSeq DESC))

    FROM

    (SELECTPudIDAsOfPudID,

    TranDateAsOfTranDate,

    LotIDAsOfLotKey,

    PudIDLotBalAsOfLotBal,

    LotIDBalSeqAsOfLotBalSeq

    FROM

    (SELECT 751 PudID,'2011-09-19 18:21:09.000' TranDate,64298 TranQty,64298 RunBal,56.477600 PudIDPct,'42011025' LotID,36314 PudIDLotBal,1 LotIDBalSeq

    UNION SELECT 751,'2011-09-20 03:20:08.000',-45938,64298,56.477600,'42011025',36314,2

    UNION SELECT 751,'2011-09-20 03:20:08.000',45938,64298,56.477600,'42011025',36314,3

    UNION SELECT 751,'2011-09-20 03:21:08.000',-18360,60346,56.483300,'42011025',25947,4

    UNION SELECT 751,'2011-09-20 03:21:08.000',14408,60346,56.477600,'42011025A',8137,1

    UNION SELECT 751,'2011-09-20 18:16:49.000',27122,87468,100.00000,'42011027',27122,1

    UNION SELECT 751,'2011-09-20 19:51:29.000',35454,122922,19.805900,'42011026',7021,1

    UNION SELECT 751,'2011-09-20 21:30:22.000',12757,121271,10.055800,'42011022A',1282,1

    UNION SELECT 751,'2011-09-20 21:30:22.000',-14408,121271,56.477600,'42011025A',0,2

    UNION SELECT 751,'2011-09-21 02:09:25.000',23515,121271,19.805900,'42011026',7021,2

    UNION SELECT 751,'2011-09-21 02:09:25.000',-23515,121271,19.805900,'42011026',7021,3

    UNION SELECT 751,'2011-09-21 02:11:00.000',-11939,121271,19.800000,'42011026',4655,4

    UNION SELECT 751,'2011-09-21 02:11:00.000',11939,121271,19.805900,'42011026A',2364,1

    ) WORK

    ) A

  • The Plan B query is excrutiatingly slow 🙁

    Any suggestions on how I can get the same results but with better performance?

  • I thought you said you were using 2012 functionality. It now appears that you've used a Triangular Join ( See the following link for why they are so slow http://www.sqlservercentral.com/articles/T-SQL/61539/ ).

    Like I suggested, first do a "normal" running total using the 2012 functionality and then mark the dupes with the max.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks Jeff. Yes, I want to use the 2012 functionality, but I guess I don't quite understand what you're suggesting. I can create a running total on the balances (which will be wrong) using SUM OVER PARTITION... but after that, I'm not sure what you mean by marking the duplicates.

  • SELECTPudID,

    TranDate,

    TranQty,

    RunBal,

    PudIDPct,

    LotID,

    PudIDLotBal,

    LotIDBalSeq,

    'help?'PudIDAsOfBal

    may be I am being particulalary thick this morning but I cant seem to follow your spreadsheet for "Runbal" and "PudIDLotBal"....are these columns in your database table that are already populated by the application or are they columns that you created by some previous query?

    [font="Courier New"]PudIDTranDateTranQtyRunBal

    7519/19/11 18:216429864298

    7519/20/11 3:20-4593864298

    7519/20/11 3:204593864298

    7519/20/11 3:21-1836060346

    7519/20/11 3:211440860346

    7519/20/11 18:162712287468

    7519/20/11 19:5135454122922

    7519/20/11 21:3012757121271

    7519/20/11 21:30-14408121271

    7519/21/11 2:0923515121271

    7519/21/11 2:09-23515121271

    7519/21/11 2:11-11939121271

    7519/21/11 2:1111939121271[/font]

    [font="Courier New"]PudIDTranDateTranQtyJLS runtot

    7519/19/11 18:2164298

    7519/20/11 3:20-4593818360

    7519/20/11 3:204593864298

    7519/20/11 3:21-1836045938

    7519/20/11 3:211440860346

    7519/20/11 18:162712287468

    7519/20/11 19:5135454122922

    7519/20/11 21:3012757135679

    7519/20/11 21:30-14408121271

    7519/21/11 2:0923515144786

    7519/21/11 2:09-23515121271

    7519/21/11 2:11-11939109332

    7519/21/11 2:1111939121271[/font]

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • The results I've included in this post are from a query. I've just tried to simplify the issue since I'm really only interested in getting the correct running balance for the PudIDLotBal. Let me see if I can clarify this a bit.

    These transactions are for wine lots (LotID). We purchase grapes from various growers using purchase orders (PudID) then we crush and process the grapes on inventory transactions (TranDate, TranQty). The resulting wine gets blended into multiple lots, and wine from multiple purchase orders can be included in a single lot. What I'm trying to do is determine how many gallons of wine were produced from each purchase order as of any transaction date.

    In the spreadsheet, the RunBal is simply a running balance of the total transaction qty, and is really of no use for what I'm trying to do. I should have left it out of the spreadsheet to avoid confusion. Sorry about that. The PudIDPct is the portion of the resulting lot balance that was sourced from the purchase order (PudID) and that is included simply for auditing purposes. The PudIDLotBal is the amount of wine for that Lot as of that transaction date/time that was sourced from the purchase order and that's the column that is important to me. As more and more lots of wine that have portions sourced from the purchase order enter the picture, I need to keep track of how many gallons those grapes from that purchase order have produced. Does that make sense?

    As I drifted off to sleep last night, I had the idea that perhaps I should try to pivot the results so each LotID becomes a column. I haven't tried it yet, but that will probably be my next experiment, unless someone has a brilliant idea for me to try.

    Thanks,

    Gina

  • gcresse (4/15/2013)


    I have a challenge that I was hoping to solve using the new SQL Server 2012 running balance capabilities, but it's a bit more complicated than I realized. I've attached a small spreadsheet with some sample data.

    Hi Gina

    I remember a similar requirement from a year or two ago - how did you solve it then, and how different is the requirement now?


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • Hi Chris,

    That other post was an attempt to see if I could use some sort of "fast" running balance to calculate blend percentages (variety of grape, vintage of grape, appellation of grape) on-the-fly for a single lot rather than having to maintain a data warehouse that updates during a nightly job. I was never able to create a view that was fast enough to calculate the percentages so we continue to use the data warehouse.

    This current post is about using the percent data from that same data warehouse but needing to group and sort those results according the purchase order that sourced the grapes that ultimately became wine.

    Thanks,

    Gina

  • is the following any where near what you are looking for?

    note...rounding issues...used for simplicity

    SELECT PudID

    , TranDate

    , TranQty

    , LotID

    , CAST( TranQty * PudIDPct / 100 as INT) AS jlsqty

    , CAST (SUM ( TranQty * PudIDPct / 100 ) OVER ( ORDER BY PudID , TranDate , LotID , LotIDBalSeq ) AS INT) AS jlstot

    , CAST (SUM ( TranQty * PudIDPct / 100 ) OVER ( ORDER BY PudID , TranDate ) AS INT) AS jlstot2

    FROM

    (SELECT 751 PudID,'2011-09-19 18:21:09.000' TranDate,64298 TranQty,64298 RunBal,56.477600 PudIDPct,'42011025' LotID,36314 PudIDLotBal,1 LotIDBalSeq

    UNION SELECT 751,'2011-09-20 03:20:08.000',-45938,64298,56.477600,'42011025',36314,2

    UNION SELECT 751,'2011-09-20 03:20:08.000',45938,64298,56.477600,'42011025',36314,3

    UNION SELECT 751,'2011-09-20 03:21:08.000',-18360,60346,56.483300,'42011025',25947,4

    UNION SELECT 751,'2011-09-20 03:21:08.000',14408,60346,56.477600,'42011025A',8137,1

    UNION SELECT 751,'2011-09-20 18:16:49.000',27122,87468,100.00000,'42011027',27122,1

    UNION SELECT 751,'2011-09-20 19:51:29.000',35454,122922,19.805900,'42011026',7021,1

    UNION SELECT 751,'2011-09-20 21:30:22.000',12757,121271,10.055800,'42011022A',1282,1

    UNION SELECT 751,'2011-09-20 21:30:22.000',-14408,121271,56.477600,'42011025A',0,2

    UNION SELECT 751,'2011-09-21 02:09:25.000',23515,121271,19.805900,'42011026',7021,2

    UNION SELECT 751,'2011-09-21 02:09:25.000',-23515,121271,19.805900,'42011026',7021,3

    UNION SELECT 751,'2011-09-21 02:11:00.000',-11939,121271,19.800000,'42011026',4655,4

    UNION SELECT 751,'2011-09-21 02:11:00.000',11939,121271,19.805900,'42011026A',2364,1

    ) WORK

    ORDER BY PudID , TranDate , LotID , LotIDBalSeq;

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • Perfect! That's exactly the answer I was looking for. I tried all sorts of SUM OVER PARTITION BY combinations with no luck, but I see the PARTITION wasn't required. I guess I was over-thinking it.

    Thank you so much, JLS. If ever you want a really good bottle of wine, just let me know 😀

    Gina

  • gcresse (4/20/2013)


    Perfect! That's exactly the answer I was looking for. I tried all sorts of SUM OVER PARTITION BY combinations with no luck, but I see the PARTITION wasn't required. I guess I was over-thinking it.

    Thank you so much, JLS. If ever you want a really good bottle of wine, just let me know 😀

    Gina

    Thanks Gina....would suggest that you examine exec plans for larger data sets and index where applicable.

    good luck and thanks for the offer (Châteauneuf-du-Pape 1998_<grin>)

    edit ...for multiple PudId use partition by

    eg

    OVER ( PARTITION BY PUDID ORDER BY PudID , TranDate )

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply