• dwain.c (3/5/2013)


    Steven Willis (3/5/2013)


    Another old habit to break if you are using SQL2008 or greater is to avoid GROUP BY when all you really need is an aggregated column value.

    SELECT

    A.Portfolio

    ,B.PortfolioID

    ,SUM(B.MarketValue) AS SumOfMV

    ,COUNT(B.SecID) AS [# of Securities]

    FROM

    ERTutAccounts A

    INNER JOIN ERTutPositions B

    ON A.PortfolioID = B.PortfolioID

    GROUP BY

    ERTutAccounts.Portfolio

    ,ERTutPositions.PortfolioID;

    SELECT

    A.Portfolio

    ,B.PortfolioID

    ,SUM(B.MarketValue) OVER (PARTITION BY A.Portfolio, B.PortfolioID) AS SumOfMV

    ,COUNT(B.SecID) OVER (PARTITION BY A.Portfolio, B.PortfolioID) AS [# of Securities]

    FROM

    ERTutAccounts A

    INNER JOIN ERTutPositions B

    ON A.PortfolioID = B.PortfolioID

    You can still use GROUP BY of course if you need to group the portfolios themselves but just to do sums or counts (or min or max, etc) the OVER() clause makes things a lot easier.

     

     

    Steven,

    Your first query needed a little correction (namely to the stuff in bold) to make it run (minor).

    But my big question is why your are making the suggestion to unlearn an old habit here when the two queries produce different results. Knocking up some test data:

    CREATE TABLE #ERTutAccounts

    (PortfolioID INT IDENTITY(1,1), Portfolio VARCHAR(10))

    INSERT INTO #ERTutAccounts

    SELECT 'A' UNION ALL SELECT 'B' UNION ALL SELECT 'C'

    CREATE TABLE #ERTutPositions

    (PortfolioID INT, MarketValue MONEY, SecID INT)

    INSERT INTO #ERTutPositions

    SELECT 1, 50000, 1 UNION ALL SELECT 1, 100000, 2 UNION ALL SELECT 1, 150000, 3

    UNION ALL SELECT 2, 30000, 1 UNION ALL SELECT 2, 175000, 2 UNION ALL SELECT 2, 250000, 3

    UNION ALL SELECT 3, 40000, 1 UNION ALL SELECT 3, 150000, 2 UNION ALL SELECT 3, 250000, 3

    SELECT

    A.Portfolio

    ,A.PortfolioID

    ,SUM(B.MarketValue) AS SumOfMV

    ,COUNT(B.SecID) AS [# of Securities]

    FROM

    #ERTutAccounts A

    INNER JOIN #ERTutPositions B

    ON A.PortfolioID = B.PortfolioID

    GROUP BY

    A.Portfolio

    ,A.PortfolioID;

    SELECT

    A.Portfolio

    ,B.PortfolioID

    ,SUM(B.MarketValue) OVER (PARTITION BY A.Portfolio, B.PortfolioID) AS SumOfMV

    ,COUNT(B.SecID) OVER (PARTITION BY A.Portfolio, B.PortfolioID) AS [# of Securities]

    FROM

    #ERTutAccounts A

    INNER JOIN #ERTutPositions B

    ON A.PortfolioID = B.PortfolioID

    DECLARE @NumPortfolios INT = 100000

    -- Performance test (add some rows)

    ;WITH Tally (n) AS (

    SELECT TOP (@NumPortfolios) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM sys.all_columns a CROSS JOIN sys.all_columns b)

    INSERT INTO #ERTutAccounts

    SELECT Portfolio + CAST(n AS VARCHAR(10))

    FROM #ERTutAccounts

    CROSS APPLY Tally

    ;WITH Tally (n) AS (

    SELECT TOP 100000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM sys.all_columns a CROSS JOIN sys.all_columns b)

    INSERT INTO #ERTutPositions

    SELECT 1 + ABS(CHECKSUM(NEWID())) % (3*@NumPortfolios+3)

    ,20000 + ABS(CHECKSUM(NEWID())) % 250000

    ,1 + ABS(CHECKSUM(NEWID())) % 500

    FROM Tally

    DECLARE @Holder1 VARCHAR(10)

    ,@Holder2 INT

    ,@Holder3 MONEY

    ,@Holder4 INT

    PRINT 'GROUP BY'

    SET STATISTICS TIME ON

    SELECT

    @Holder1=A.Portfolio

    ,@Holder2=A.PortfolioID

    ,@Holder3=SUM(B.MarketValue) -- AS SumOfMV

    ,@Holder4=COUNT(B.SecID) -- AS [# of Securities]

    FROM

    #ERTutAccounts A

    INNER JOIN #ERTutPositions B

    ON A.PortfolioID = B.PortfolioID

    GROUP BY

    A.Portfolio

    ,A.PortfolioID;

    SET STATISTICS TIME OFF

    PRINT 'WINDOW AGGREGATE W-DISTINCT'

    SET STATISTICS TIME ON

    SELECT DISTINCT

    @Holder1=A.Portfolio

    ,@Holder2=B.PortfolioID

    ,@Holder3=SUM(B.MarketValue) OVER (PARTITION BY A.Portfolio, B.PortfolioID) -- AS SumOfMV

    ,@Holder4=COUNT(B.SecID) OVER (PARTITION BY A.Portfolio, B.PortfolioID) -- AS [# of Securities]

    FROM

    #ERTutAccounts A

    INNER JOIN #ERTutPositions B

    ON A.PortfolioID = B.PortfolioID

    SET STATISTICS TIME OFF

    DROP TABLE #ERTutAccounts

    DROP TABLE #ERTutPositions

    I'm thinking that the first result is what the OP was looking for. To make them identical, you'd need to change the second query to SELECT DISTINCT. Or you could do as you suggest which is to add GROUP BY, in which case you'd need to aggregate (using MAX or MIN) the window aggregates, but I don't think you can do that (at least not in this case).

    My main intention by looking at this was that I have heard that some of the window aggregates don't perform as well as the corresponding GROUP BY notation (and I believe I've seen it proven at least once). So I thought I'd give that a try (also in the SQL above). The results I got were:

    GROUP BY

    SQL Server Execution Times:

    CPU time = 483 ms, elapsed time = 217 ms.

    WINDOW AGGREGATE W-DISTINCT

    SQL Server Execution Times:

    CPU time = 2151 ms, elapsed time = 774 ms

    Which arguably might be at least partially due to adding DISTINCT, or maybe not. Whatever the reason, I think they're sufficiently different to take notice.

    Edit: Oooh! Just noticed another case of parallelism introduced by SQL Server into a query for my study on parallelism!

    You know Dwain, now if we could just get you to use the semicolon as a terminator instead of a begininator! 😛