Solving FIFO Queues Using Windowed Functions

Introduction

Recently, someone posted a question on SQLServerCentral that boiled down to a FIFO queue. FIFO is short for First In, First Out. In a FIFO queue, the first item in must be the first item processed. The specific problem in this case was applying debits to credits. Debits must be applied to the first credit that has a remaining balance until that balance becomes zero. Any remaining debit is then applied to the next credit. This process is repeated until there are no remaining debits, credits, or both. The solution that I proposed relies heavily on window functions to solve this problem.

Expectations

Here are the expectations for the solution.

It must work for multiple customers
There are business rules in place to prevent debits from exceeding credits.¹
It must handle the following conditions:
- A single credit is exactly offset by a single debit
- A single credit is exactly offset by two or more debits
- A single credit is offset by two partial debits
- A single credit is offset by one or more full debits and one or two partial debits
- A credit has a partial remaining balance
- A credit has its full balance remaining
- A single debit offsets one or more credits exactly
- A single debit partially offsets two credits
- A single debit fully offsets one or more credits and partially offsets one or two credits

Data Setup

To make it easier to follow the logic, I'm setting up one customer for each of the situations. The data for these customers will be exactly what is needed to demonstrate that particular situation. In addition, I'm setting up one more customer with a complex situation that involves more than one of the situations above. (It's impossible to set it up with all of the situations, because it's impossible to simultaneously have both a remaining credit balance and a remaining debit balance based on the rules in the introduction.)

CREATE TABLE Transactions (
    TransID INT IDENTITY,
    CustID INT NOT NULL,
    TransType CHAR(1) CONSTRAINT CK_TransType CHECK (TransType IN ('C', 'D')),
    TransDate DATE,
    Amount MONEY
)
;
INSERT Transactions(
    CustID,
    TransType,
    TransDate,
    Amount
)
VALUES
    (1, 'C', '20160101', -20),  -- credit to offset
    (1, 'D', '20160201', 20),    -- offsetting debit
    (2, 'C', '20160101', -40),  -- credit to offset
    (2, 'D', '20160201', 30),    -- first debit fully applied
    (2, 'D', '20160301', 10),    -- second debit fully applied
    (3, 'C', '20160101', -40),
    (3, 'D', '20160201', 30),
    (3, 'C', '20160301', -20),  -- credit to offset
    (3, 'D', '20160401', 20),    -- first debit with partial balance
    (3, 'C', '20160501', -20),
    (3, 'D', '20160601', 30),    -- second debit leaving partial balance on debit
    (4, 'C', '20160101', -40),
    (4, 'D', '20160201', 30),
    (4, 'C', '20160301', -40),  -- credit to offset
    (4, 'D', '20160401', 20),    -- first debit with partial balance
    (4, 'D', '20160501', 20),    -- credit fully applied
    (4, 'C', '20160601', -10),
    (4, 'D', '20160701', 20),    -- credit leaving partial balance on debit
    (5, 'C', '20160101', -20),  -- credit to offset
    (5, 'D', '20160201', 10),    -- debit fully applied leaving credit balance
    (6, 'C', '20160101', -20),  -- credit with its full balance
    (7, 'C', '20160101', -20),  -- first credit to offset
    (7, 'C', '20160201', -10),  -- second credit to offset
    (7, 'D', '20160301', 30),    -- credit exactly offsets two credits
    (8, 'C', '20160101', -40),
    (8, 'D', '20160201', 30),    -- leaves balance on first credit
    (8, 'C', '20160301', -40),
    (8, 'D', '20160401', 20),    -- offsets balance on first credit, leaves partial balance on second credit
    (9, 'C', '20160101', -20),  -- first credit
    (9, 'D', '20160201', 10),    -- leaves a partial balance on first credit
    (9, 'C', '20160301', -20),  -- second credit
    (9, 'C', '20160401', -20),  -- third credit
    (9, 'D', '20160501', 40),    -- offsets balance on first credit, offsets second credit in full, leaves balance on third credit
    (10, 'C', '20160101', -20),
    (10, 'D', '20160201', 20),
    (10, 'C', '20160301', -100),
    (10, 'D', '20160401', 50),
    (10, 'D', '20160501', 50),
    (10, 'C', '20160601', -50),
    (10, 'D', '20160701', 20),
    (10, 'C', '20160801', -50),
    (10, 'D', '20160901', 20)
;
-- Expected Results
SELECT *
FROM (
    VALUES
    (1, CAST('2016-01-01' AS DATE), 0.00, CAST('2016-02-01' AS DATE)),
    (2, '2016-01-01', 0.00, '2016-03-01' ),
    (3, '2016-01-01', 0.00, '2016-04-01' ),
    (3, '2016-03-01', 0.00, '2016-06-01' ),
    (3, '2016-05-01', 0.00, '2016-06-01' ),
    (4, '2016-01-01', 0.00, '2016-04-01' ),
    (4, '2016-03-01', 0.00, '2016-07-01' ),
    (4, '2016-06-01', 0.00, '2016-07-01' ),
    (5, '2016-01-01', 10.00, '2016-02-01' ),
    (6, '2016-01-01', 20.00, NULL),
    (7, '2016-01-01', 0.00, '2016-03-01' ),
    (7, '2016-02-01', 0.00, '2016-03-01' ),
    (8, '2016-01-01', 0.00, '2016-04-01' ),
    (8, '2016-03-01', 30.00, '2016-04-01' ),
    (9, '2016-01-01', 0.00, '2016-05-01' ),
    (9, '2016-03-01', 0.00, '2016-05-01' ),
    (9, '2016-04-01', 10.00, '2016-05-01' ),
    (10, '2016-01-01', 0.00, '2016-02-01' ),
    (10, '2016-03-01', 0.00, '2016-05-01' ),
    (10, '2016-06-01', 10.00, '2016-09-01' ),
    (10, '2016-08-01', 50.00, NULL)
) v(CustID, CreditDate, RemainingBalance, LastRedeemedDate)

Preliminary Solution

After reading the problem, I started formulating a solution, but didn't have time to fully work out the details. When I returned to the problem, I saw that J. Livingston SQL had posted a solution based on a solution to a similar problem that had been posted by ChrisM. J. Livingston's solution only worked with one customer. I've updated his solution to work with multiple customers and to reference the table/columns holding my sample data. I also replaced credit ID with credit date. The modified code is below.

-- drew.allen's comment
-- code based on J. Livingston's initial query at http://www.sqlservercentral.com/Forums/FindPost1808929.aspx
-- J. Livingston's comment
--code based on ChrisM fine work at http://www.sqlservercentral.com/Forums/Topic1731617-391-1.aspx
WITH
Debits AS (
SELECT  CustID,
        debitdate = TransDate,
        DebitID = TransID,
        DebitAmount = Amount,
        [from] = ISNULL(LAG([to], 1) OVER(PARTITION BY CustID ORDER BY d.TransDate, d.Amount), 0),
        [to]
FROM
(
    SELECT *,
           [to] = SUM(Amount) OVER(PARTITION BY CustID ORDER BY TransDate, Amount)
    FROM Transactions
    WHERE TransType = 'D'
) d
)
,
Credits as (
SELECT  CustID,
        CreditDate = TransDate,
        PaymentID = TransID,
        CreditAmount = -Amount,
        [from] = ISNULL(LAG([to], 1) OVER(PARTITION BY CustID ORDER BY c.TransDate, c.Amount), 0),
        [to]
FROM
(
    SELECT *,
           [to] = SUM(-Amount) OVER(PARTITION BY CustID ORDER BY TransDate, Amount)
    FROM Transactions
    WHERE TransType = 'C'
) c
)
, Results as (
SELECT c.CustID,
       c.CreditDate,
       debitdate,
       Balance = CASE
                      WHEN c.[to] > d.[to]
                      THEN c.[to] - d.[to]
                      WHEN d.[to] IS NULL
                      THEN c.CreditAmount
                      ELSE 0
                 END,
       ROW_NUMBER() OVER(PARTITION BY c.CustID, c.CreditDate, c.PaymentID ORDER BY DebitDate DESC, DebitAmount DESC) rn
FROM Debits d
FULL OUTER JOIN Credits c ON d.CustID = c.CustID
    AND c.[from] < d.[to] AND c.[to] > d.[from]
)
SELECT CustID,
       CreditDate,
       Balance AS remaining_balnce,
       debitdate AS last_redeemed_date
FROM results
WHERE rn = 1
  AND Results.debitdate IS NOT NULL
ORDER BY CustID, CreditDate;

J. Livingston's code uses a variation on the overlapping intervals problem. He divides the data into two parts: debits and credits. He calculates a running total for each of those segments, and then uses the LAG function to find the previous running total in order to establish the range for the interval that corresponds to this particular record. Finally, he matches the debits and credits based on their ranges. A credit is completely used by the debit where the final total for the credit lies within the range for the debit.

This was very similar to my initial approach to the problem with one major difference which eventually led me to a very different solution. Before I get into that, I wanted to bring up one point.

IMPORTANT: J. Livingston made a very common omission in his query. When SUM() is used with an OVER clause that contains an ORDER BY, then a window frame is "required." If you do not specify a window frame, then it will use the default RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. The problem with this is that RANGE will always write to disk, but ROWS will not if there are fewer than 10,000 records per partition. When used with SUM(), there are only ever TWO rows per partition, because of the way that SQL calculates the running totals.

The big difference between J. Livingston's approach and my initial approach is that J. Livingston split the debits and credits immediately whereas I decided to keep them together as long as possible, because we were treating them essentially the same. I also realized that I didn't need to use lag to get the beginning of the range, because I could calculate the beginning by subtracting the current amount from the running total.²

Breakthrough

It was at this point that I had my big breakthrough. I realized that both running totals were isotonic (or monotonically increasing). This quickly led to several realizations that greatly simplified the problem.

There were no overlaps within a transaction type, so I didn't have to worry about double dipping.
There were no gaps within a transaction type, so I didn't have to worry about not applying debits to credits.
I didn't have to separate the transaction types at all. I could simply sort them by their running totals and they would fall into the correct place.
I no longer needed to calculate the beginning of the range, because the sort would place them in the correct location. I did have to make sure that credits sorted before debits if they had the same running total amount.

My code is listed below.

;
WITH totals AS (
    SELECT *, ABS(Amount) AS Amt, SUM(ABS(Amount)) OVER(PARTITION BY CustID, TransType ORDER BY TransDate ROWS UNBOUNDED PRECEDING ) AS tot
    FROM Transactions
)
, balances AS (
    SELECT *,
        MIN(CASE WHEN TransType = 'D' THEN TransDate END) OVER(PARTITION BY CustID ORDER BY tot, TransType ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS DebitDate,
        MAX(CASE WHEN TransType = 'D' THEN TransDate END) OVER(PARTITION BY CustID ORDER BY tot, TransType ROWS UNBOUNDED PRECEDING) AS LastDebit,
        LAG(tot, 1, 0) OVER(PARTITION BY CustID ORDER BY tot, TransType) AS PrevTot
    FROM totals
)
SELECT TransID, CustID, TransDate AS CreditDate,
    CASE
        WHEN DebitDate IS NOT NULL THEN 0
        ELSE tot - PrevTot
    END AS RemainingBalance,
    CASE
        WHEN DebitDate IS NOT NULL THEN DebitDate
        WHEN tot - PrevTot < Amt THEN LastDebit
    END AS LastRedeemedDate,
    tot - balances.PrevTot
FROM balances
WHERE TransType = 'C'
ORDER BY balances.TransType, balances.CustID, balances.TransDate
--OPTION(MAXDOP 1)
;

How the Code Works

The first CTE (totals) calculates the running totals. It is partitioned by the customer ID and the transaction type. The combination of the ORDER BY clause and the ROWS window frame ensures that we are calculating a running total rather than a grand total.

The second CTE uses several different window functions to find various values that are used in the main query. All of the functions are partitioned on the customer and ordered by the running total from the previous CTE and the transaction type. Each of the expressions is described further below.

DebitDate

The DebitDate is the date of the first debit for the current customer on or after the date of the current record and represents the date of the first debit with a running total greater than or equal to the running total on the current record. For credits, this is the debit that brings the credit balance to zero (and may possibly leave a debit balance for subsequent credits); for debits, it will always be the current record. Refer to Figure 1 which graphically displays the sections of the DebitDate calculation.

We use the CASE expression to exclude any date that is not associated with a debit, the window frame to ensure that we are getting dates from records with running totals greater than or equal to the current record, and the MIN() aggregate to ensure that we are getting the first.

LastDebit

This is essentially the same as the DebitDate, except that we are looking for the last debit on or before the date of the current record.³ This is reflected in the fact that we are using a different window frame and a different aggregate. This represents the last time that any credit balance up to and including the current record reached zero.

PrevTot

This is used to calculate the remaining balance. It is also used to determine whether to populate the LastRedeemedDate when there is a partial balance on a credit.

The LAG function copies a value from a previous record within the partition based on the specified order. There are two optional parameters to specify how many rows back to find the value, and a value to use if there are insufficient records before the current record. The default values are 1 for the number of records and NULL for the value to use in the case of insufficient records. Since we want to override the default for the second optional parameter, we also need to supply the first optional parameter even though we are using the default value.

In this case the value is the running total. We want to use the value from the previous record, so we use 1 as the number of rows to look back. We also want to use the value 0 for the previous total on the first record. NULL represents an unknown value, but, in this case, we know that the total started at zero, so we specify it here.

Main Query

The main query is where we pull everything together. We are focused mainly on the credit balances (since we stipulated that we cannot have a debit balance), so we filter for credits. If there is a debit after any credit (DebitDate IS NOT NULL), then we know that the credit has no remaining balance, and the DebitDate is the date that the balance became zero for that particular credit.⁴

See Figure 2. Remember, these records are sorted by their running totals, not by date.

If the credit has no following debits, we know that the current credit has a balance. The balance will be the current running total minus the previous running total. Furthermore, we know that if the previous transaction is a credit, then it also has a balance, because the current record is a credit and it doesn't have any following debits, so the previous record doesn't have any following debits either. In this case, the current balance will be equal to the original balance on the credit.

If the credit has a balance and the previous record is a debit, the debit depleted any previous credits with remaining balances. The debit also either has no remaining debit available or has some remaining debit available, but not enough to deplete the current credit (otherwise it would be sorted after the credit). In the first case, there have been no debits applied to the current credit and the balance is the original balance and the last redeemed date should be NULL.

In the second case, the credit will be partially redeemed and the last redeemed date will be the date of the previous debit.

Performance Testing

I did some performance testing on the two pieces of code. They don't produce the same results, and I haven't worked through where the issues are, so these results may not stand. I've also attached a portion of the data. I couldn't upload the whole set of data due to size restrictions.

/*  J. Livingston's  */ (50000 row(s) affected) Table 'Transactions'. Scan count 10, logical reads 3586, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Worktable'. Scan count 503999, logical reads 3000006, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.  SQL Server Execution Times:    CPU time = 12546 ms,  elapsed time = 3648 ms. /* drew.allen's  */ (50000 row(s) affected) Table 'Transactions'. Scan count 5, logical reads 1793, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.  SQL Server Execution Times:    CPU time = 6720 ms,  elapsed time = 2122 ms.

As you can see, my code has no scan counts or logical reads on the 'Worktable' and has exactly half the scan count and logical reads on the 'Transactions' table.

Footnotes

1. The main reason for this restriction is that the OP didn't specify how he wanted to handle this situation, so my solution did not account for it. At some point, I may come back and work out a solution that removes this restriction.

2. I also considered using a different way to calculate the beginning of the range that would not require using nested queries. The beginning of the range is SUM(ABS(Amount)) OVER(PARTITION BY CustID, TransType ORDER BY TransID ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING). I didn't test to see whether this would have been faster than simply subtracting.

3. Remember that the sort order is running total and then transaction type. This means that if a credit and debit have the same running total, the credit will come before the debit.

4. There is a bug with some versions of SQL Server that requires using the MAXDOP query hint to guarantee correct results.

References

1. Original question:Credits and Debits

2. J. Livingston's first solution

3. ChrisM's solution to a similar problem

4. drew.allen's original solution

5. drew.allen's corrected solution

6. Connect item with the parallelism bug discussion.

7. Thread where the bug is discussed.

Solving FIFO Queues Using Windowed Functions

Introduction

Expectations

Data Setup

Preliminary Solution

Breakthrough

How the Code Works

DebitDate

LastDebit

PrevTot

Main Query

Performance Testing

Footnotes

References

Resources

Rate

Share

Categories

Share

Rate

Solving FIFO Queues Using Windowed Functions

Introduction

Expectations

Data Setup

Preliminary Solution

Breakthrough

How the Code Works

DebitDate

LastDebit

PrevTot

Main Query

Performance Testing

Footnotes

References

Resources

Rate

Share

Categories

Share

Rate

Related content

All the rows before this one

SQL Server : Usage of OVER Clause

Tame Those Strings - Using SUBSTRING

Displaying Execution Plans

Tame Those Strings! Part 2 - Using CHARINDEX