Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

Ctrl-alt-geek

Matt Bowler is a DBA at Trade Me, and teaches database design and administration courses at the local institute of technology. A recent but avid discoverer of SQL Server, Matt is a regular contributor to forums at MSDN, Experts Exchange and SSC and he blogs at mattsql.wordpress.com.

Conditional Aggregates

The Problem: A legacy table contains amounts and a char column indicating whether the amount is a credit or a debit. We want an aggregate (over a given range) of the amounts. We need to add the credits and subtract the debits.

The Solution(s): Two possible approaches to the problem suggest themselves. One is to sum all of the credits and all the debits and them subtract the total debit from the total credit. The second approach is to include a case statement inside the aggregate function to add or subtract the given row’s amount based on the credit/debit column.

The Code: Implement a cut down version of the table with some contrived sample data:

create table tbl_ledger
(
	id int identity(1,1),
	credit_debit char(1),
	amount money,
	[description] varchar(128) 
)

insert tbl_ledger(credit_debit,amount,[description])
values 
	('C',10,'Payment'),
	('D',10,'Refund'),
	('C',10,'Payment'),
	('C',10,'Payment'),
	('D',10,'Refund'),
	('C',10,'Payment'),
	('C',10,'Payment'),
	('D',10,'Refund'),
	('C',10,'Payment'),
	('C',10,'Payment')

This table has 10 rows, 7 are $10 credits and 3 are $10 debits so we would expect the sum to return $40 ($70 credit – $30 debit).

Implement option 1, subtract the aggregates (this could be done a number of ways, including using variables to hold interim results, or common table expressions. I’ve chosen a sub query as I think it reads well. The query plans for sub queries and cte’s are similar, using variables muddies the waters a bit):

select sum(amount) - (select sum(amount) as [debit]
			from tbl_ledger 
			where credit_debit = 'D'
			)
as [credit]
from tbl_ledger 
where credit_debit = 'C'

As hoped this returns 40.

Implement option 2 using a case statement inside the sum function:

select
	sum(case
			when credit_debit = 'C' then amount
			when credit_debit = 'D' then -1 * amount
		end) as [total cash]
from tbl_ledger

This again returns a correct result.

The Analysis: To me option 2 is a more elegantly coded solution, but how does it stack up performance wise? To get some measurable results rerun the insert multiple times.

insert tbl_ledger(credit_debit,amount,[description])
values 
	('C',10,'Payment'),
	('D',10,'Refund'),
	('C',10,'Payment'),
	('C',10,'Payment'),
	('D',10,'Refund'),
	('C',10,'Payment'),
	('C',10,'Payment'),
	('D',10,'Refund'),
	('C',10,'Payment'),
	('C',10,'Payment')
go 100000

And run the two queries side by side with statistics on:

set statistics io on
set statistics time on

select
	sum(case
			when credit_debit = 'C' then amount
			when credit_debit = 'D' then -1 * amount
		end) as [total cash]
from tbl_ledger

select sum(amount) - (select sum(amount) as [debit]
						from tbl_ledger 
						where credit_debit = 'D'
						)
as [credit]
from tbl_ledger 
where credit_debit = 'C' 

set statistics io off
set statistics time off

I got the following:

(1 row(s) affected)
Table ‘tbl_ledger’. Scan count 1, logical reads 4167, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 281 ms, elapsed time = 295 ms.

(1 row(s) affected)
Table ‘tbl_ledger’. Scan count 10, logical reads 8334, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 483 ms, elapsed time = 130 ms.

With the execution plans:

QPlan1

The sub query outperforms the conditional aggregate primarily because the query can be parallelised. The conditional aggregate does a single table scan versus two table scans for the sub query, 4167 logical reads versus 8344 logical reads – but when I ran the tests against cold caches the read ahead mechanism pulled all data pages into cache for both queries so I didn’t notice a significant change in the comparative result.

Running the second query with a maxdop hint brings the results closer:

(1 row(s) affected)
Table ‘tbl_ledger’. Scan count 1, logical reads 4167, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

SQL Server Execution Times:
CPU time = 328 ms, elapsed time = 347 ms.

(1 row(s) affected)
Table ‘tbl_ledger’. Scan count 2, logical reads 8334, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

SQL Server Execution Times:
CPU time = 249 ms, elapsed time = 296 ms.

And drops the parallel operators:

QPlan2

The Conclusion: After analysis it seems that the sub query out performs the conditional aggregate option – even if we suppress parallelism. That said these tests were on a cut down table, and a wider table with more rows might bring IO more into play – especially if indexes are added to the mix. Adding an index on the credit_debit column with amount as an included column speeds up the sub query even more – replacing the two table scans with index seeks.


Comments

Leave a comment on the original post [mattsql.wordpress.com, opens in a new window]

Loading comments...