In Accounting, a common occurrence when building reports that sum vast numbers of transactions is for a transaction to be accidentally counted more than once. This typically only happens when you are summing summary transactions. The challenge is to identify the double counted transactions.
Double counting as defined by Wiki is: “an error whereby a transaction is counted more than once, for whatever reason.”
Let’s create an example for clarification. Since our audience is SQL-folk, we will build our example in T-SQL. You may ignore the Sign column for now because it will be used in a later example. For those of you that don’t want to copy/paste each code snippet that follows into SSMS, you may download the script attached to this article.
CREATE TABLE #Trans ( ID INT IDENTITY(10001,1) , Amount MONEY , [Sign] CHAR(1) ) ; INSERT INTO #Trans (Amount) SELECT 27.87 UNION ALL SELECT 465.23 UNION ALL SELECT 100.44 UNION ALL SELECT 23.99 UNION ALL SELECT 156.55 UNION ALL SELECT 234.32 UNION ALL SELECT 322.11 UNION ALL SELECT 59.99 UNION ALL SELECT 184.42 UNION ALL SELECT 201.74 SELECT SUM(Amount) FROM #Trans
The result displayed by the final SELECT is $1,776.66. Suppose we have a reference total, meaning that we know what the sum of the transactions should be, and that this amount is $1,592.24. The difference between the sum of our transactions and our reference total is $184.42.
By inspection it is pretty easy to see that one of our transactions is exactly that amount, so according to our description of this problem it is probably being double counted. Not quite as easily identified by inspection is the fact that we have two other transactions, that when combined also equal the difference: $27.87 and $156.55. Now we have identified two potential candidates being double counted.
Of course, if you are a savant you may have also noticed that the amounts: $100.44, $23.99 and $59.99 also sum to $184.42. What, you missed that one? Thinking about it, identifying cases where even doublets pair up to match our difference across a few hundred rows could be quite a challenge. Ideally we would like to write a script to do this. Having spent many years supporting Accounting applications, I can attest to the ridiculous number of hours I’ve spent manually reconciling reports looking for double counted entries.
A General Solution
We wish to write a script where any unique combination of singlets, doublets, triplets, etc. of transactions equals the difference between our transaction total and our reference total. Hmmm… combinations of doublets, triplets… sounds like n-Tuples. This is very familiar to us.
Not too very long ago, I was proud to have SQLServerCentral publish an article called Generating n-Tuples with SQL. The second script in that article is exactly what we need for this solution. After closer examination of those scripts as applied to other solutions, some performance improvements were suggested by friends (specifically Chris Morris) and I also came up with a few of my own and posted a better version in that article’s discussion thread.
-- Problem #1: Identify potentially double counted transactions -- Solution #1a: Use UNIQUEnTuples to enumerate the combinations and -- keep only those that qualify DECLARE @ReferenceTotal MONEY = 1592.24 ,@Delimiter CHAR(1) = ',' ,@ActualTotal MONEY = (SELECT SUM(Amount)FROM #Trans) DECLARE @AmountOfInterest MONEY = @ActualTotal - @ReferenceTotal ;WITH UNIQUEnTuples (n, Tuples, ID, Amount) AS ( SELECT 1, CAST(ID AS VARCHAR(8000)), ID, Amount FROM #Trans WHERE Amount <= @AmountOfInterest UNION ALL SELECT 1 + n.n, CAST(t.ID AS VARCHAR(8000)) + @Delimiter + n.Tuples ,t.ID, t.Amount + n.Amount FROM UNIQUEnTuples n CROSS APPLY (SELECT t.ID, Amount FROM #Trans t WHERE t.ID < n.ID ) t WHERE t.Amount + n.Amount <= @AmountOfInterest ) SELECT n, [Double Counted Tuples]=Tuples, Amount FROM UNIQUEnTuples WHERE Amount = @AmountOfInterest
The results when running this script enumerates all combinations that could potentially be double counted. The column n is the number of recursions required to identify that particular solution.
n Double Counted Tuples Amount 1  184.42 3  184.42 2  184.42
Note that this script will work even when some of the transactions are negative.
If you think you might have combinations that exceed 100-Tuples, you’ll need to add OPTION(MAXRECURSION 0) to the query. Normally though, the order of the Tuples we’ll be looking for is relatively small, and this is fortunate because with higher order Tuples the script will be quite slow. This may occur when you have many small quantities in your list of transactions.
In any event, you may limit the WHERE clause in the recursive leg of UNIQUEnTuples, such as “AND n < 10” and the recursion depth (and thus the run time of the script) will not get out of hand.
Alas, your work is not yet done. All that our script did was to identify potentially double counted transactions. You must still do some analysis to determine which Tuple was actually double counted (they all cannot be).
Relationship to Bin Packing
While it may not be readily apparent, this is a bin packing problem of sorts. If we think of each summary account as an assembly (grouping) of products to be placed in a bin, we seek to make sure that the contents of the bin contain only unique (atomic) products. An assembly may be a proper subset of another; however an assembly may not contain atomic items that are in another assembly along with unique (atomic) items.
The Converse of this Problem
Suppose that instead of trying to identify which items to exclude, we want to identify the list of transactions that we should potentially include to achieve our reference total. This is the converse of the double counting problem and can be thought of as a bin unpacking problem (i.e., remove items from the bin if they are duplicates).
To solve the converse problem, we can start with our solution to the double counting problem, with the minor modification of INSERTing the results into a temporary table and adding a solution number column.
-- Problem #2: Identify transaction subsets that sum to a specified reference total -- Solution #2-Step1: Using the same approach to identifying double counted -- transactions and put these results into a temp table (using -- ROW_NUMBER() to uniquely identify each solution ;WITH UNIQUEnTuples (n, Tuples, ID, Amount) AS ( SELECT 1, CAST(ID AS VARCHAR(8000)), ID, Amount FROM #Trans WHERE Amount <= @AmountOfInterest UNION ALL SELECT 1 + n.n, CAST(t.ID AS VARCHAR(8000)) + @Delimiter + n.Tuples ,t.ID, t.Amount + n.Amount FROM UNIQUEnTuples n CROSS APPLY (SELECT t.ID, Amount FROM #Trans t WHERE t.ID < n.ID) t WHERE t.Amount + n.Amount <= @AmountOfInterest ) SELECT [Soln No]=ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) ,[Double Counted Tuples]=Tuples, Amount INTO #Temp FROM UNIQUEnTuples WHERE Amount = @AmountOfInterest
Now, we will remove this list of items from our original result set, needing to do this for each potential solution that we produced. This is accomplished using some strategic CROSS APPLYs and EXCEPT.
-- Problem #2: Identify transaction subsets that sum to a specified reference total -- Solution #2-Step2: Eliminate the potentially double counted transactions for -- each identified solution from the original transaction set -- and then group the remaining (included) transactions ;WITH MyTrans AS ( SELECT [Soln No], ID FROM #Temp CROSS APPLY #Trans EXCEPT SELECT [Soln No], Item FROM #Temp CROSS APPLY dbo.DelimitedSplit8K([Double Counted Tuples], @Delimiter) ) SELECT [Soln No] ,IncludedIDs=STUFF(( SELECT ',' + CAST(ID AS VARCHAR(10)) FROM MyTrans b WHERE a.[Soln No] = b.[Soln No] ORDER BY ID FOR XML PATH('')), 1, 1, '') FROM MyTrans a GROUP BY [Soln No]
The STUFF/FOR XML PATH/GROUP BY magic is simply the standard method to concatenate strings from multiple rows of the result set, while DelimitedSplit8K is the famous, warp-speed string splitter popularized by SQL MVP Jeff Moden is his Tally Oh! article. The results produced look like this:
Soln No IncludedIDs 1 10001,10002,10003,10004,10005,10006,10007,10008,10010 2 10001,10002,10005,10006,10007,10009,10010 3 10002,10003,10004,10006,10007,10008,10009,10010
You should be able to see by inspection that the results are the converse of the result set of the double counting problem.
Identification of Sign Reversals
An equally annoying, related and somewhat more complex problem is identifying when transactions have been included with a reversal of sign. Here is where the Sign column in our #Trans table comes into play.
In most general ledger systems, there will be a series of accounts that are typically referred to as “Gain/Loss” accounts. The values populating these particular account types can represent:
- The gain or loss on the sale of an asset, specifically the difference between the book value at the time of sale and the proceeds of the sale, taking into account the salvage value of the asset.
- A currency translation gain or loss.
There are other examples, but these will suffice. The amounts entered may be positive or negative, and when you’re rolling them up to get a total (e.g., net before tax, commonly known as NBT), you frequently have to apply a sign to the transaction.
Let’s truncate the data from our sample and populate the table with a new set of values, this time including a sign. For this example, we’ll use nicely rounded numbers to slightly simplify the math, but typically you won’t be afforded that luxury.
TRUNCATE TABLE #Trans INSERT INTO #Trans (Amount, [Sign]) SELECT 100, '+' -- [Assigned ID is 10001] UNION ALL SELECT -100, '+' -- [Assigned ID is 10002] UNION ALL SELECT 200, '-' -- [Assigned ID is 10003] UNION ALL SELECT -200, '-' -- [Assigned ID is 10004] UNION ALL SELECT 300, '-' -- [Assigned ID is 10005] UNION ALL SELECT -300, '+' -- [Assigned ID is 10006] UNION ALL SELECT 400, '+' -- [Assigned ID is 10007] UNION ALL SELECT -400, '-' -- [Assigned ID is 10008] UNION ALL SELECT 800, '+' -- [Assigned ID is 10009] UNION ALL SELECT -800, '-' -- [Assigned ID is 10010] SELECT * FROM #Trans SELECT SUM(CASE [Sign] WHEN '+' THEN Amount ELSE -Amount END) FROM #Trans
The actual total returned by the final SELECT is $1,800. Let’s suppose that the entry: 300, ‘-‘ (transaction ID 10005) has had the wrong sign applied to it. In that case, the total should be $2,400. To see this total, you can change the sign on that transaction and run the above script again. We’ll call the latter our reference total. Note that the difference between the sum of the transactions and the reference total is off by two times the unsigned amount of the incorrectly signed singlet.
Once again, if you need to identify this transaction by inspection it is possible to do so; however the exercise becomes increasingly frustrating and virtually impossible if doublets, triplets and higher order n-Tuples must be considered. So let’s let our SQL Server do the work for us.
Since as I said above, the two problems are related, we’ll modify the UNIQUEnTuples script to handle this case.
SELECT @ReferenceTotal = 2400 ,@ActualTotal = SUM(CASE [Sign] WHEN '+' THEN Amount ELSE -Amount END) FROM #Trans -- Problem #3: Identify transactions with reversed signs -- Solution #3: Create the Tuples based on subtracting 2* the amount from -- the actual total. ;WITH UNIQUEnTuples (n, Tuples, ID, Total) AS ( SELECT 1, CAST(ID AS VARCHAR(8000)), ID ,@ActualTotal - 2 * CASE [Sign] WHEN '+' THEN Amount ELSE -Amount END FROM #Trans UNION ALL SELECT 1 + n.n, CAST(t.ID AS VARCHAR(8000)) + ',' + n.Tuples, t.ID ,n.Total - 2 * CASE [Sign] WHEN '+' THEN t.Amount ELSE -t.Amount END FROM UNIQUEnTuples n CROSS APPLY ( SELECT ID, Amount, [Sign] FROM #Trans t WHERE t.ID < n.ID) t -- Limit the recursion depth (to 3-Tuples) WHERE n < 3 ) SELECT n, [Reversed Sign Tuples]=Tuples, Total FROM UNIQUEnTuples WHERE Total = @ReferenceTotal
Note that the Total column’s calculation in both anchor and recursive legs of the rCTE has been modified to account for twice the amount of each transaction.
We have limited our recursion depth for this exercise to 3 to produce no more than 3-Tuples for the sole reason that the analysis spreadsheet shown below would get quite wide with additional results. There are combinations present above 3-Tuples that represent potential Tuples of reversed signs.
The results displayed from the above script are:
N Reversed Sign Tuples Total 1 10005 2400.00 1 10006 2400.00 3 10003,10004,10006 2400.00 3 10001,10002,10006 2400.00 3 10003,10004,10005 2400.00 3 10001,10002,10005 2400.00 2 10002,10003 2400.00
The singlet  we know about, but we’ll let Excel help us check all of the results anyhow (that workbook is also in the attached file so you can review the cell formulas to better follow the reconciliation).
We have now confirmed that our approach works for identifying potential candidates for combinations of sign reversed transactions. Once again, further analysis is required to identify which Tuple is actually sign reversed.
If you are not concerned with a general solution to double counting that enumerates all n-Tuples, you can generate the individual singlets, doublets or triplets with separate SELECT statements. This solution for the double counting analysis is shown below. UNION ALL and a little adjustment is used to combine the three query results into a single results set identical to the one produced by the general solution.
Note: If you are copying/pasting the SQL above and running the queries in the sequence of this article, be sure to restore the original data at this point. All subsequent queries will use the original data, as none of the alternate solutions we’ll describe are for the sign reversal problem. Sequencing within the downloadable script takes this into account (i.e., the SQL solution to problem #3 appears last).
-- Problem #1: Identify potentially double counted transactions -- Solution #1b: Alternate solution for double counting adhoc analysis -- using n UNIONed SELECTs, one for each Tuple SELECT n=1, [Double Counted Tuples]=CAST(ID AS VARCHAR(8000)) ,Amount FROM #Trans WHERE Amount = @AmountOfInterest UNION ALL -- Now add in the doublets SELECT n=2 ,CAST(a.ID AS VARCHAR(8000)) + ',' + CAST(b.ID AS VARCHAR(8000)) ,Amount=a.Amount + b.Amount FROM #Trans a INNER JOIN #Trans b ON a.ID < b.ID AND a.Amount + b.Amount <= @AmountOfInterest WHERE a.Amount + b.Amount = @AmountOfInterest UNION ALL -- And finally the triplets SELECT n=3 ,CAST(a.ID AS VARCHAR(8000)) + ',' + CAST(b.ID AS VARCHAR(8000)) + ',' + CAST(c.ID AS VARCHAR(8000)) ,Amount=a.Amount + b.Amount + c.Amount FROM #Trans a INNER JOIN #Trans b ON a.ID < b.ID AND a.Amount + b.Amount <= @AmountOfInterest INNER JOIN #Trans c ON b.ID < c.ID AND a.Amount + b.Amount + c.Amount <= @AmountOfInterest WHERE a.Amount + b.Amount + c.Amount = @AmountOfInterest
Each higher order Tuple requires an additional self-JOIN on the transaction table. What this approach lacks in finesse and flexibility, it makes up for in performance. Resolving any particular order of n-Tuple will be much quicker, especially for higher order Tuples, as intermediate results need not be generated. The downside risk is, without enumerating through all the possible levels of Tuples, a match on a higher order n-Tuple may get missed. Of course, I did say earlier that usually you will be looking for a lower order n-Tuple match, so this may represent a minimal risk.
The same approach can be made to work using cascading CROSS APPLYs.
-- Problem #1: Identify potentially double counted transactions -- Solution #1c: Alternate solution for double counting adhoc analysis -- using n UNIONed CROSS APPLYs, one for each Tuple SELECT n=1, [Double Counted Tuples]=CAST(ID AS VARCHAR(8000)) ,Amount FROM #Trans WHERE Amount = @AmountOfInterest UNION ALL -- Now add in the doublets SELECT n=2 ,CAST(a.ID AS VARCHAR(8000)) + ',' + CAST(b.ID AS VARCHAR(8000)) ,Amount=a.Amount + b.Amount FROM #Trans a CROSS APPLY ( SELECT ID, Amount FROM #Trans b WHERE a.ID < b.ID AND a.Amount + b.Amount <= @AmountOfInterest) b WHERE a.Amount + b.Amount = @AmountOfInterest UNION ALL -- And finally the triplets SELECT n=3 ,CAST(a.ID AS VARCHAR(8000)) + ',' + CAST(b.ID AS VARCHAR(8000)) + ',' + CAST(c.ID AS VARCHAR(8000)) ,Amount=a.Amount + b.Amount + c.Amount FROM #Trans a CROSS APPLY ( SELECT ID, Amount FROM #Trans b WHERE a.ID < b.ID AND a.Amount + b.Amount <= @AmountOfInterest) b CROSS APPLY ( SELECT ID, Amount FROM #Trans c WHERE b.ID < c.ID AND a.Amount + b.Amount + c.Amount <= @AmountOfInterest) c WHERE a.Amount + b.Amount + c.Amount = @AmountOfInterest
Performance characteristics should be similar to the prior example using repetitive self-JOINs.
An Alternate General Solution
Jeff Moden, bless his heart and in all of his brilliance, once made a statement to me about recursive CTEs that flashed on the proverbial light bulb. The discussion that led to that epiphany can be found deep in the discussion thread of my Generating n-Tuples with SQL article. Essentially, any rCTE can be rewritten as a set-based WHILE loop.
So let’s try that with our sample data for the double counting scenario.
DECLARE @RowCount INT, @n INT = 1 CREATE TABLE #Temp2 (n INT, Tuples VARCHAR(8000), Amount MONEY, ID INT) -- Generate first result set (equivalent to anchor leg) into temp table INSERT INTO #Temp2 SELECT @n, CAST(ID AS VARCHAR(8000)), Amount, ID FROM #Trans WHERE Amount <= @AmountOfInterest SELECT @RowCount = @@ROWCOUNT -- WHILE loop simulates recursive leg of the rCTE WHILE @RowCount <> 0 BEGIN SELECT @n = @n + 1 INSERT INTO #Temp2 SELECT @n ,CAST(a.ID AS VARCHAR(8000)) + ',' + Tuples ,Amount=a.Amount + b.Amount ,a.ID FROM #Trans a INNER JOIN #Temp2 b ON b.n = @n - 1 AND a.ID < b.ID AND a.Amount + b.Amount <= @AmountOfInterest SELECT @RowCount = @@ROWCOUNT END -- Finally SELECT out based on the amount of interest SELECT n, [Double Counted Tuples]=Tuples, Amount FROM #Temp2 WHERE Amount = @AmountOfInterest DROP TABLE #Temp2
Clearly a bit more verbose than the rCTE solution but at least it won’t miss any higher order Tuples if they exist. The point though is to see if this performs better. So we built a test harness for this case (also available in the download) and ran it multiple times to see what the timing and IO comparisons were. These results are summarized below.
CPU Elapsed UNIQUEnTuples 28205 28635 Multiple INNER JOINs 28860 25688 Multiple CROSS APPLYs 27876 25137 WHILE LOOP 13073 13136
We did our best to ensure comparability between the way the rCTE resolves the n-Tuples and for the set-based WHILE loop. We did not play around too much with alternative constructions that might improve the performance of one solution over the other. No doubt there probably are some variants that might prove faster than the code provided, but it is also likely that the same variant applied to both the rCTE and the set-based WHILE loop would improve both solutions equally.
In this case, the set-based WHILE loop was considerably faster and just as flexible as the rCTE. It was also much faster than the INNER JOIN and CROSS APPLY approaches, which do not have the same flexibility. This will often be the case when comparing an rCTE against a well-crafted set-based WHILE loop.
Final Word and Conclusions
In this article we’ve learned just a little bit about bin packing and how to use SQL scripts to identify:
- Potentially double counted transactions
- Transaction subsets that sum to a specified reference total
- Transactions whose signs are potentially, incorrectly reversed
We purposely omitted providing alternate solution scripts for the all but the first example, thinking that this would be a good exercise for our valued readers to perform as a test of your understanding.
It’s almost a shame (not really! J) that I no longer support Accounting applications because these scripts would undoubtedly save me countless hours of manual reconciliation.
The solution that you choose for your application is of course up to you and will depend on your needs regarding flexibility vs. performance. Be wary of the rCTE solutions in cases where you need to resolve out to higher order n-Tuples, because they will run for long periods due to the humongous results sets that they generate internally. But then again, so will the set-based WHILE loop.
It is interesting to note that when I wrote the Generating n-Tuples with SQL article, I never even realized that the UNIQUEnTuples rCTE could be applied to this particular problem, probably because it has been a long time since I’ve needed to do this.
Until next time valued reader, we appreciate you taking your precious time examine this application with us.