RE: Really confused newbie - Perhaps a Temp Table query?

SSC Guru

Points: 86908

March 5, 2013 at 6:24 pm

Steven Willis (3/5/2013)
Another old habit to break if you are using SQL2008 or greater is to avoid GROUP BY when all you really need is an aggregated column value.
SELECT
A.Portfolio
,B.PortfolioID
,SUM(B.MarketValue) AS SumOfMV
,COUNT(B.SecID) AS [# of Securities]
FROM
ERTutAccounts A
INNER JOIN ERTutPositions B
ON A.PortfolioID = B.PortfolioID
GROUP BY
ERTutAccounts.Portfolio
,ERTutPositions.PortfolioID;
SELECT
A.Portfolio
,B.PortfolioID
,SUM(B.MarketValue) OVER (PARTITION BY A.Portfolio, B.PortfolioID) AS SumOfMV
,COUNT(B.SecID) OVER (PARTITION BY A.Portfolio, B.PortfolioID) AS [# of Securities]
FROM
ERTutAccounts A
INNER JOIN ERTutPositions B
ON A.PortfolioID = B.PortfolioID
You can still use GROUP BY of course if you need to group the portfolios themselves but just to do sums or counts (or min or max, etc) the OVER() clause makes things a lot easier.

Steven,

Your first query needed a little correction (namely to the stuff in bold) to make it run (minor).

But my big question is why your are making the suggestion to unlearn an old habit here when the two queries produce different results. Knocking up some test data:

CREATE TABLE #ERTutAccounts

(PortfolioID INT IDENTITY(1,1), Portfolio VARCHAR(10))

INSERT INTO #ERTutAccounts

SELECT 'A' UNION ALL SELECT 'B' UNION ALL SELECT 'C'

CREATE TABLE #ERTutPositions

(PortfolioID INT, MarketValue MONEY, SecID INT)

INSERT INTO #ERTutPositions

SELECT 1, 50000, 1 UNION ALL SELECT 1, 100000, 2 UNION ALL SELECT 1, 150000, 3

UNION ALL SELECT 2, 30000, 1 UNION ALL SELECT 2, 175000, 2 UNION ALL SELECT 2, 250000, 3

UNION ALL SELECT 3, 40000, 1 UNION ALL SELECT 3, 150000, 2 UNION ALL SELECT 3, 250000, 3

SELECT

A.Portfolio

,A.PortfolioID

,SUM(B.MarketValue) AS SumOfMV

,COUNT(B.SecID) AS [# of Securities]

FROM

#ERTutAccounts A

INNER JOIN #ERTutPositions B

ON A.PortfolioID = B.PortfolioID

GROUP BY

A.Portfolio

,A.PortfolioID;

SELECT

A.Portfolio

,B.PortfolioID

,SUM(B.MarketValue) OVER (PARTITION BY A.Portfolio, B.PortfolioID) AS SumOfMV

,COUNT(B.SecID) OVER (PARTITION BY A.Portfolio, B.PortfolioID) AS [# of Securities]

FROM

#ERTutAccounts A

INNER JOIN #ERTutPositions B

ON A.PortfolioID = B.PortfolioID

DECLARE @NumPortfolios INT = 100000

-- Performance test (add some rows)

;WITH Tally (n) AS (

SELECT TOP (@NumPortfolios) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

FROM sys.all_columns a CROSS JOIN sys.all_columns b)

INSERT INTO #ERTutAccounts

SELECT Portfolio + CAST(n AS VARCHAR(10))

FROM #ERTutAccounts

CROSS APPLY Tally

;WITH Tally (n) AS (

SELECT TOP 100000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

FROM sys.all_columns a CROSS JOIN sys.all_columns b)

INSERT INTO #ERTutPositions

SELECT 1 + ABS(CHECKSUM(NEWID())) % (3*@NumPortfolios+3)

,20000 + ABS(CHECKSUM(NEWID())) % 250000

,1 + ABS(CHECKSUM(NEWID())) % 500

FROM Tally

DECLARE @Holder1 VARCHAR(10)

,@Holder2 INT

,@Holder3 MONEY

,@Holder4 INT

PRINT 'GROUP BY'

SET STATISTICS TIME ON

SELECT

@Holder1=A.Portfolio

,@Holder2=A.PortfolioID

,@Holder3=SUM(B.MarketValue) -- AS SumOfMV

,@Holder4=COUNT(B.SecID) -- AS [# of Securities]

FROM

#ERTutAccounts A

INNER JOIN #ERTutPositions B

ON A.PortfolioID = B.PortfolioID

GROUP BY

A.Portfolio

,A.PortfolioID;

SET STATISTICS TIME OFF

PRINT 'WINDOW AGGREGATE W-DISTINCT'

SET STATISTICS TIME ON

SELECT DISTINCT

@Holder1=A.Portfolio

,@Holder2=B.PortfolioID

,@Holder3=SUM(B.MarketValue) OVER (PARTITION BY A.Portfolio, B.PortfolioID) -- AS SumOfMV

,@Holder4=COUNT(B.SecID) OVER (PARTITION BY A.Portfolio, B.PortfolioID) -- AS [# of Securities]

FROM

#ERTutAccounts A

INNER JOIN #ERTutPositions B

ON A.PortfolioID = B.PortfolioID

SET STATISTICS TIME OFF

DROP TABLE #ERTutAccounts

DROP TABLE #ERTutPositions

I'm thinking that the first result is what the OP was looking for. To make them identical, you'd need to change the second query to SELECT DISTINCT. Or you could do as you suggest which is to add GROUP BY, in which case you'd need to aggregate (using MAX or MIN) the window aggregates, but I don't think you can do that (at least not in this case).

My main intention by looking at this was that I have heard that some of the window aggregates don't perform as well as the corresponding GROUP BY notation (and I believe I've seen it proven at least once). So I thought I'd give that a try (also in the SQL above). The results I got were:

GROUP BY

SQL Server Execution Times:

CPU time = 483 ms, elapsed time = 217 ms.

WINDOW AGGREGATE W-DISTINCT

SQL Server Execution Times:

CPU time = 2151 ms, elapsed time = 774 ms

Which arguably might be at least partially due to adding DISTINCT, or maybe not. Whatever the reason, I think they're sufficiently different to take notice.

Edit: Oooh! Just noticed another case of parallelism introduced by SQL Server into a query for my study on parallelism!

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St