Solving FIFO Queues Using Windowed Functions

  • drew.allen

    SSC Guru

    Points: 76739

    Comments posted to this topic are about the item Solving FIFO Queues Using Windowed Functions

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Eirikur Eiriksson

    SSC Guru

    Points: 182438

    Nice write-up Drew, well done!

    😎

    One thing I missed seeing in the article was the mentioning of a POC index which is absolutely essential when working with the window functions, especially when more than 90% of the execution cost are the sort operators.

  • akljfhnlaflkj

    SSC Guru

    Points: 76202

    Interesting logic flow, thanks.

  • Ken Wymore

    SSCoach

    Points: 16611

    Excellent write up!

  • TheSQLGuru

    SSC Guru

    Points: 134017

    Very nicely done Drew!

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Alan Burstein

    SSC Guru

    Points: 61087

    Great, great work Drew. Very well done.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • wtren

    SSC Eights!

    Points: 806

    beautiful work, I check the execution plan, the main cost is on sort operator, in fact we do not need that much sort, we just need sort the credit operation by transid in every custid group to make a FIFO, get a correct runningtotalofcredit order, calculate the final sum debits for every custid, and then join the two result on custid , compare two sum number to get the result.

    here is the some simple code for that:

    set statistics time on

    select c.custid, transtype,

    case when runningtotalofc + amountd >=0 then 0

    when runningtotalofc + amountd < amount or amountd is null then -amount

    else -runningtotalofc - amountd

    end as remain

    from

    (

    (select custid,transtype,amount,sum(amount) over (partition by custid order by transid) as runningtotalofC

    from dbo.transactions where transtype = 'C') as c

    full outer join

    (select custid,sum(amount) amountd

    from dbo.transactions where transtype = 'D' group by custid) as d

    on c.custid = d.custid)

    order by custid

    set statistics time off

    please point out if there any problems, thank you.

  • drew.allen

    SSC Guru

    Points: 76739

    wtren (10/8/2016)


    beautiful work, I check the execution plan, the main cost is on sort operator, in fact we do not need that much sort, we just need sort the credit operation by transid in every custid group to make a FIFO, get a correct runningtotalofcredit order, calculate the final sum debits for every custid, and then join the two result on custid , compare two sum number to get the result.

    here is the some simple code for that:

    set statistics time on

    select c.custid, transtype,

    case when runningtotalofc + amountd >=0 then 0

    when runningtotalofc + amountd < amount or amountd is null then -amount

    else -runningtotalofc - amountd

    end as remain

    from

    (

    (select custid,transtype,amount,sum(amount) over (partition by custid order by transid) as runningtotalofC

    from dbo.transactions where transtype = 'C') as c

    full outer join

    (select custid,sum(amount) amountd

    from dbo.transactions where transtype = 'D' group by custid) as d

    on c.custid = d.custid)

    order by custid

    set statistics time off

    please point out if there any problems, thank you.

    One of the requirements of the original question was to display when a credit was depleted, which requires knowing the date of the debit that did so. Since your subquery only contains a grand total of debits for a customer, it cannot return that information.

    Drew

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • J Livingston SQL

    SSC Guru

    Points: 51272

    I did some performance testing on the two pieces of code. They don't produce the same results, and I haven't worked through where the issues are, so these results may not stand. I've also attached a portion of the data. I couldn't upload the whole set of data due to size restrictions.

    Drew...I think that if you remove

    WHERE rn = 1

    AND Results.debitdate IS NOT NULL

    then the number of rows will be the same

    iirc, I put this in to meet OP requirement ?

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • louisjeorge75

    Grasshopper

    Points: 17

    hi drew ,

    that is wonderful work , i need a help to write this in SQL DB2 , how can i do this.

  • drew.allen

    SSC Guru

    Points: 76739

    louisjeorge75 - Wednesday, December 20, 2017 5:02 AM

    hi drew ,

    that is wonderful work , i need a help to write this in SQL DB2 , how can i do this.

    Thanks.  I don't know DB2, so I can't help you with the rewrite.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • autoexcrement

    SSCertifiable

    Points: 5885

    ISNULL(LAG([to], 1) OVER(PARTITION BY CustID ORDER BY c.TransDate, c.Amount), 0)
    Am I misunderstanding something here? The ISNULL check would never return NULL, and thus 0, because you've already provided a default value (1) in your LAG statement in case of NULLs. No?

    Edit: Nevermind. It's the offset that's been provided, not the default value. But still, wouldn't this make more sense?
    LAG([to], 1, 0) OVER(PARTITION BY CustID ORDER BY c.TransDate, c.Amount)


    "If I had been drinking out of that toilet, I might have been killed." -Ace Ventura

  • autoexcrement

    SSCertifiable

    Points: 5885

    Question #2...I am not entirely following this bit:

    When SUM() is used with an OVER clause that contains an ORDER BY, then a window frame is "required."  If you do not specify a window frame, then it will use the default RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. The problem with this is that RANGE will always write to disk, but ROWS will not if there are fewer than 10,000 records per partition. When used with SUM(), there are only ever TWO rows per partition, because of the way that SQL calculates the running totals.

    What is the significance of "writing to disk", and what is the 10k vs 2 rows per partition issue? I'm sorry for my ignorance, trying to understand what you mean here. Thanks!


    "If I had been drinking out of that toilet, I might have been killed." -Ace Ventura

  • TheSQLGuru

    SSC Guru

    Points: 134017

    autoexcrement - Thursday, November 29, 2018 10:59 PM

    ISNULL(LAG([to], 1) OVER(PARTITION BY CustID ORDER BY c.TransDate, c.Amount), 0)
    Am I misunderstanding something here? The ISNULL check would never return NULL, and thus 0, because you've already provided a default value (1) in your LAG statement in case of NULLs. No?

    Edit: Nevermind. It's the offset that's been provided, not the default value. But still, wouldn't this make more sense?
    LAG([to], 1, 0) OVER(PARTITION BY CustID ORDER BY c.TransDate, c.Amount)

    That should be more efficient,  because the replacement of missing values is coded into the same branch that does the LAG itself, as opposed to being a separate process after the LAG is computed. Looks like a nice find.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru

    SSC Guru

    Points: 134017

    TheSQLGuru - Friday, November 30, 2018 7:36 AM

    autoexcrement - Thursday, November 29, 2018 10:59 PM

    ISNULL(LAG([to], 1) OVER(PARTITION BY CustID ORDER BY c.TransDate, c.Amount), 0)
    Am I misunderstanding something here? The ISNULL check would never return NULL, and thus 0, because you've already provided a default value (1) in your LAG statement in case of NULLs. No?

    Edit: Nevermind. It's the offset that's been provided, not the default value. But still, wouldn't this make more sense?
    LAG([to], 1, 0) OVER(PARTITION BY CustID ORDER BY c.TransDate, c.Amount)

    That should be more efficient,  because the replacement of missing values is coded into the same branch that does the LAG itself, as opposed to being a separate process after the LAG is computed. Looks like a nice find.

    But looking at the lasted code, posted just today, it seems he fixed this:

       LAG(tot, 1, 0) OVER(PARTITION BY CustID ORDER BY tot, TransType) AS PrevTot

    That is the only LAG in his query now.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply