A discussion about a forum post for handling a running subtraction in a SQL query.
For my first forum fun post, I’m going to discuss a running calculation SQL question posted by a user, asyssolvers, on the SQLTeam website here. Here’s the question [slightly shortened]:
I need to get result as in second chart. Basically I have a total of received quantity in each line and I need to show received quantity against Expected Quantity, if there is any shortage I need to show in last line. First Result Set [Chart]: Item ExpectedQty ReceivedQty Short Item01 30 45 5 Item01 20 45 5 Item02 40 38 2 item03 50 90 10 item03 30 90 10 item03 20 90 10 Query for first Result Set: select a.Item, a.ExpectedQty,b.ReceivedQty, b.Short from a join b on a.Item = b.Item Second Result Set [Chart]: Item ExpectedQty ReceivedQty Short item01 30 30 0 item01 20 15 5 item02 40 38 2 item03 50 50 0 item03 30 30 0 item03 20 10 10
Below is the query I came up with as a response. But first the disclaimer: This query will not work in situations where the Expected table has many records because of the < in the join condition for the derived table. If there are under 300,000 or so rows in the Expected table, then this query will work without issue. Once there’s more than 500,000 to 600,000 records, it starts to get somewhat slow, and it shouldn’t be used if there are over a few million records. In other words, it doesn’t scale very will. Depending on the task at hand, this may be a deal breaker or no big deal.
-- Use table names Expected and Received instead of a and b -- First, set up the tables and add dummy data to work with: CREATE TABLE Expected(dt datetime, Item VARCHAR(10), ExpectedQty INT) CREATE TABLE Received(Item VARCHAR(10), ReceivedQty INT, Short INT) INSERT INTO Expected VALUES('1/1/2011','Item01','30') INSERT INTO Expected VALUES('1/2/2011','Item01','20') INSERT INTO Expected VALUES('1/1/2011','Item02','40') INSERT INTO Expected VALUES('1/1/2011','Item03','50') INSERT INTO Expected VALUES('1/2/2011','Item03','30') INSERT INTO Expected VALUES('1/3/2011','Item03','20') INSERT INTO Received VALUES('Item01','45', '5') INSERT INTO Received VALUES('Item02','38', '2') INSERT INTO Received VALUES('Item03','90', '10') -- The primary query: SELECT ex.item, ex.ExpectedQty, ex.ExpectedQty - CASE WHEN Short > threshholdForShort AND Short-threshholdForShort <= ExpectedQty THEN Short-threshholdForShort WHEN Short > threshholdForShort AND Short-threshholdForShort > ExpectedQty THEN ExpectedQty ELSE 0 END AS 'ReceivedQty', CASE WHEN Short > threshholdForShort AND Short-threshholdForShort <= ExpectedQty THEN Short-threshholdForShort WHEN Short > threshholdForShort AND Short-threshholdForShort > ExpectedQty THEN ExpectedQty ELSE 0 END AS 'TotalShort' FROM ( -- This derived table gives an indication of how significant --the 'Short' value needs to be before it impacts this particular record. SELECT e1.dt, e1.item, isnull(SUM(e2.ExpectedQty), 0) AS 'threshholdForShort' FROM Expected e1 LEFT JOIN Expected e2 ON e1.item = e2.item AND e1.dt < e2.dt GROUP BY e1.dt, e1.item ) minShortVal INNER JOIN Expected ex ON ex.item = minShortVal.item AND ex.dt = minShortVal.dt INNER JOIN Received rec ON minShortVal.item = rec.item
How did I come up with this query? After reading the forum question, my first task was to get a solid understanding of the requirements. After reviewing the provided sql and output tables, I soon realized that one ReceivedQty value for an item in the first result set was the sum of the expected quantity values minus one of the short values. I next needed some sort of ordering for the records [none was provided in the question, but I assumed one was likely available - either a date or an identity field]. I decided to use a date field for my response. The ordering is needed so the received values can be matched to the expected quantities in a certain order [earlier records match first, later records match afterwards].
With a better understanding of the problem, I went about coming up with a solution. I first created the tables and some dummy data to work with. I tweaked the data until the original simple query returned the first result set in the original question.
I needed a query to keep track of the order, so I came up with the derived table in the above solution to handle this. For example, if the data is the following [with a combined expected quantity of 170] and the quantity short is 100, then the thresholds are:
order item ExpectedQty ReceivedQty Short ThreshholdForShort 1 xyz 20 20 0 170-20 = 170 2 xyz 40 40 0 170-(20+40) = 110 3 xyz 30 10 20 170-(20+40+30) = 80 4 xyz 50 0 50 170-(20+40+30+50) = 30 5 xyz 30 0 30 0
Here, we would have to be 170 short to fully impact record 1, and 170-20 = 150 or lower to safely guarantee that record 1 isn’t short at all. In this example, since we are 100 short, record 3 is partially short, while for the records 1 and 2 we matched the expected to the received while for the latter records all the quantity was short/unfulfilled. With this logic in place, I then came up with the sql in the case statements to show the short and received quantities as appropriate.
After building out the query and testing, I was able to get the data from the second result set. A success! I tested a few variations of the input data just to make sure I got the logic right and didn’t miss anything. All checks worked, so I posted the solution.
I’ve since done some additional performance testing of the query with tables of larger sizes, and while it doesn’t scale too well [see the disclaimer above], the approach can be useful in cases where the amount of data isn’t too great for problems similar in structure to this one.