Stairway to T-SQL Part 2: Beyond T-SQL Basics: Level 3: Building a Correlated Subquery

  • Comments posted to this topic are about the item Stairway to T-SQL Part 2: Beyond T-SQL Basics: Level 3: Building a Correlated Subquery

    Gregory A. Larsen, MVP

  • Thank you Gregory for this clear and concise article.

    I cannot resist pitching in some Window function version of the queries;

    ;WITH CUST_SALES_DETAILS AS

    (

    SELECT

    SSOH.CustomerID

    ,ROW_NUMBER() OVER

    (PARTITION BY SSOH.SalesOrderID ORDER BY (SELECT NULL)) AS SSOH_RID

    ,COUNT(SSOD.SalesOrderDetailID) OVER

    (PARTITION BY SSOH.SalesOrderID) AS COUNT_SSOD

    FROM Sales.SalesOrderHeader SSOH

    INNER JOIN Sales.SalesOrderDetail SSOD

    ON SSOH.SalesOrderID = SSOD.SalesOrderID

    )

    SELECT

    CSD.CustomerID

    ,CSD.COUNT_SSOD

    /* ,DENSE_RANK() OVER (ORDER BY CSD.COUNT_SSOD DESC) AS SALES_RANK */

    FROM CUST_SALES_DETAILS CSD

    WHERE CSD.SSOH_RID = 1

    AND CSD.COUNT_SSOD > 70;

    GO

    -------------------------------------------

    ;WITH CUST_SALE_BY_YEAR AS

    (

    SELECT

    SSOH.CustomerID

    ,ROW_NUMBER() OVER

    (PARTITION BY SSOH.CustomerID,YEAR(SSOH.[OrderDate])

    ORDER BY (SELECT NULL)) AS CUST_RID

    ,YEAR(SSOH.[OrderDate]) AS SSOH_YEAR

    ,SUM(SSOH.SubTotal) OVER

    (PARTITION BY SSOH.CustomerID,YEAR(SSOH.[OrderDate])) AS SSOH_SUBTOTAL

    ,(SUM(SSOH.SubTotal) OVER

    (PARTITION BY SSOH.CustomerID,YEAR(SSOH.[OrderDate]))) * (0.10) AS SSOH_REBATE

    FROMSales.SalesOrderHeaderSSOH

    )

    SELECT

    CABY.CustomerID

    /*

    ,DENSE_RANK() OVER (PARTITION BY CABY.SSOH_YEAR ORDER BY CABY.SSOH_SUBTOTAL DESC,CABY.SSOH_YEAR DESC) AS SALES_RANK

    */

    ,CABY.SSOH_YEAR

    ,CABY.SSOH_SUBTOTAL

    ,CABY.SSOH_REBATE

    FROM CUST_SALE_BY_YEARCABY

    WHERE CABY.CUST_RID = 1

    AND CABY.SSOH_YEAR = 2008

    AND CABY.SSOH_SUBTOTAL > 150000;

    GO

    -------------------------------------------

  • Thnks for the great article. I have just one remark :

    For Question 1 the answer is C and not A. as the answer A refers to inner query instead of outer query.

  • The correlated subquery may be executed many times. It will be run once for each candidate row selected in the outer query.

    I believe that Jeff Moden refers to this as RBAR (row by agonizing row).

    A nice followup article would be "How to avoid using the Correlated Subquery."

  • Just to point out a possible improvement on the first query in part 3.

    The sql engine has to match all rows from the subquery to the outer query. This is because the filtering is outside the subquery.

    SELECT CustomerID FROM Sales.SalesOrderHeader OH

    WHERE (SELECT COUNT(*) FROM Sales.SalesOrderDetail

    WHERE SalesOrderID = OH.SalesOrderID) > 70;

    Just by adding a GROUP BY clause, the work is cut by a 1/4, here are two suggestions, the latter slightly faster :cool:;

    SELECT CustomerID FROM Sales.SalesOrderHeader OH

    WHERE (SELECT COUNT(*) FROM Sales.SalesOrderDetail

    WHERE SalesOrderID = OH.SalesOrderID

    GROUP BY SalesOrderID) > 70;

    And

    SELECT

    SOH.CustomerID

    FROM

    (

    SELECT

    COUNT(*) AS SOD_COUNT

    ,SOD.SalesOrderID

    FROM Sales.SalesOrderDetail SOD

    GROUP BY SOD.SalesOrderID

    ) AS X

    INNER JOIN Sales.SalesOrderHeader SOH

    ON X.SalesOrderID = SOH.SalesOrderID

    WHERE X.SOD_COUNT > 70;

  • Dennis Wagner-347763 (3/6/2014)


    The correlated subquery may be executed many times. It will be run once for each candidate row selected in the outer query.

    I believe that Jeff Moden refers to this as RBAR (row by agonizing row).

    Except that correlated subqueries *don't* run once per row of the outer query. It's an old and persistent myth that they do that.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I understand the logic behind listing #3, but I was surprised to see, and uncertain on how it works, that it is not necessary to use another GROUP BY clause in the subquery:

    SELECT Outer_H.[CustomerID]

    , SUM(Outer_H.[SubTotal]) AS TotalPurchase

    , SUM(Outer_H.[SubTotal]) * .10 AS Rebate

    FROM [Sales].[SalesOrderHeader] AS Outer_H

    WHERE YEAR(Outer_H.[OrderDate]) = '2008'

    GROUP BY Outer_H.[CustomerID]

    HAVING (SELECT SUM(Inner_H.[SubTotal]) FROM [Sales].[SalesOrderHeader] AS Inner_H

    WHERE Inner_H.[CustomerID] = Outer_H.[CustomerID]

    AND YEAR(Inner_H.[OrderDate]) = '2008' GROUP BY CLAUSE NOT REQUIRED HERE) > 150000

    ORDER BY Rebate DESC;

    How is SQL able to execute the aggregate SUM() function in the inner query w/out a GROUP BY clause applied? Does it automatically apply the GROUP BY clause from the outer query to the inner query?

    Thanks.

  • Because there's no un-aggregated columns being returned from that subquery. Group By is only needed when you have columns in aggregate and columns that aren't in aggregate in the select clause.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (9/6/2016)


    Because there's no un-aggregated columns being returned from that subquery. Group By is only needed when you have columns in aggregate and columns that aren't in aggregate in the select clause.

    Gila, thank you very much. If I learn nothing else from this stairway, I've learned this. So thanks again.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply