March 6, 2015 at 8:04 am
I am currently reading through Itzik Ben-Gan's "Microsoft SQL Server 2012 High-Performance T-SQL using Windows Functions." In attempt to test the SUM OVER() function in SQL 2008 because that's what I've got. I do not currently have sample data (trying to generate it has become a major PITA), but I have some pseudocode. I'm hoping someone can give me a general answer as to why I'm having a problem. A "what I am missing" kind of thing.
My current code (actual production code) pulls a bunch of ITD (inception to date) contracts then calculates a certain dollar amount based on monthly changes. Not all contracts have values during a given month, so here's what I cobbled together a few months ago. (Per our finance team, these numbers ARE accurate).
WITH MonthlyVals AS
(SELECT ContractID, SUM(Col1 - (Col2 + Col3 + Col4 + Col5)) AS MyTotal
FROM MyTable
WHERE MyDate >= @ThisMonthStartDate AND MyDate <= @ThisMonthEndDate
AND StatementType IN (8,4,2)
SELECT DISTINCT t1.ContractID, <n columns>, ISNULL(t2.MyTotal,0.00)
FROM MyTable1 t1
LEFT OUTER JOIN MonthlyVals t2
ON t1.ContractID = t2.ContractID
...<many left joined tables>
Redoing this code using the SUM OVER() function, I come up with this:
SELECT DISTINCT t1.ContractID, <n columns>,
ISNULL(SUM(t2.Col1 -
(t2.Col2 + t2.Col3 + t2.Col4 + t2.Col5)),0.00) OVER(PARTITION BY t2.ContractID)
AS MyTotal
FROM MyTable1 t1
LEFT OUTER JOIN MyTable t2
ON t1.ContractID = t2.ContractID
AND MyDate >= @ThisMonthStartDate AND MyDate <= @ThisMonthEndDate
AND StatementType IN (8,4,2)
...<many left joined tables>
To test the totals, I also added a COMPUTE SUM(MyTotal) to the end of each query. (Yes, I know COMPUTE is deprecated. Just wanted a quick check.). The difference between the two bits of code was over 68k, with the SUM OVER() code coming up with a total higher than the CTE code.
Can someone tell me where I went wrong or what I'm not taking into consideration here? I know CTE code is correct for a fact. It went through extensive testing before getting put in Production. Is it the way I joined the table for the SUM OVER()? Or is it the use of PARITION BY?
Thoughts? Comments?
March 6, 2015 at 8:26 am
Quick thought, the difference is mainly that the first code snip implies a group by which translates to a distinct aggregation of each set members by group, the over clause preserves the details otherwise masked by the grouping. Difference in the output could simply be equal to on the one hand being
(number of members in a set) x (sum of values of each member)
and the other
(sum of values)
😎
March 6, 2015 at 8:32 am
Eirikur Eiriksson (3/6/2015)
Quick thought, the difference is mainly that the first code snip implies a group by which translates to a distinct aggregation of each set members by group, the over clause preserves the details otherwise masked by the grouping. Difference in the output could simply be equal to on the one hand being(number of members in a set) x (sum of values of each member)
and the other
(sum of values)
Hmm. Good point. I'll take a better look at the queries and see what I can find.
March 6, 2015 at 8:51 am
Turns out I have 244 rows that are sorta duplicated (dups, but not true dups when I look at some of the more unique columns).
So this means there is yet another nifty SQL Server thing that I will hardly use because it doesn't play well with our data needs. SIGH.
And here I thought I could chop some code.
March 8, 2015 at 7:02 pm
I tend to avoid the SUM() OVER and related functions for another reason.
The Performance of the T-SQL Window Functions[/url]
Except in cases where a query is really complex already and the pre-aggregation recommended is extremely awkward.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
March 9, 2015 at 6:26 am
Viewing 6 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply