Stairway to T-SQL DML Level 7: Summarizing Data Using a Simple GROUP BY Clause

  • Greg Larsen

    SSC-Insane

    Points: 20605

    Comments posted to this topic are about the item Stairway to T-SQL DML Level 7: Summarizing Data Using a Simple GROUP BY Clause

    Gregory A. Larsen, MVP

  • bojan.jekic 78340

    SSC Rookie

    Points: 33

    I noticed an interesting feature when I do "Grouping by Two Columns" on table with PrimaryKey (Clustered Index Unique):

    select ProductID,Shelf,sum(Quantity)

    from Production.ProductInventory

    group by ProductID,Shelf

    In this case, the result is automatically sorted by the second column from GroupBy clause “Shelf”. Is not that strange? In certain situations, the ExecutionPlan shows that SQL Engne executes the illogical/invert sort. Why executes any sort?

    This is not the case when we do "Grouping by Tree Columns or More".

  • Tim Curtin

    Say Hey Kid

    Points: 665

    Good article...

    One comment though...

    I would not filter on a row value in a having statement, the code is wasting CPU cycles and too much memory swapping.

    The last example would be an illustration of that.

    SELECT StoreName

    ,SUM(TotalSalesAmount) AS StoreSalesAmount

    FROM dbo.SalesTransaction

    GROUP BY StoreName

    HAVING StoreName LIKE '%Outlet%' <==== NOT GOOD

    OR StoreName LIKE '%Books%';

    This should be better written like:

    SELECT StoreName

    ,SUM(TotalSalesAmount) AS StoreSalesAmount

    FROM dbo.SalesTransaction

    WHERE StoreName LIKE '%Outlet%' <=== This way the row is not even loaded in to memory to be aggregated.

    OR StoreName LIKE '%Books%';

    GROUP BY StoreName

  • DouglasH

    SSCommitted

    Points: 1669

    I think the following sentence is wrong or could be misinterpreted-

    If you use an expression in the GROUP BY clause the same exact expression must be used in the SELECT list.

    I think it means that a column used in an expression in the GROUP BY can't be in the SELECT list unless the expression itself is used.

    My point is that it doesn't have to be in the SELECT list.

    for example, SELECT MAX(SalesDateTime)

    FROM dbo.SalesTransaction

    GROUP BY CONVERT(CHAR(10),SalesDateTime,101);

    or SELECT COUNT(*)

    FROM dbo.SalesTransaction

    GROUP BY CONVERT(CHAR(10),SalesDateTime,101);

Viewing 4 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply