Nested subqueries

  • Hello SQL-gurus everywhere,

    I was asked to extract some data from the order tables, something like the small Northwind example below.

    Give the numver of orders before and after a certain date, and the difference between the results. The output should be 1 line per customer.

    Some questions:

    - is this an acceptable query? If not, could you give me a clue for a better solution

    - is there an alternative way to determine the value of the difference column?

    use Northwind

    go

    select customerID,

    ( select count(orderID) from orders

    where orderdate < '1997-01-01'

    and orders.customerID = customers.customerID ) AS data_1996,

    ( select count(orderID) from orders

    where orderdate > '1997-01-01'

    and orders.customerID = customers.customerID ) AS data_1997,

    ABS( ( select count(orderID) from orders

    where orderdate < '1997-01-01'

    and orders.customerID = customers.customerID ) -

    ( select count(orderID) from orders

    where orderdate > '1997-01-01'

    and orders.customerID = customers.customerID ) ) AS diff

    from customers

    go

    Thank for any input,

    Gerry


    Dutch Anti-RBAR League

  • Try this out..

    select customers.customerID,sum (case when orderdate < '1997-01-01' then 1 else 0 end) as a,

    sum (case when orderdate > '1997-01-01' then 1 else 0 end) as b ,

    abs(sum (case when orderdate < '1997-01-01' then 1 else 0 end)- sum (case when orderdate > '1997-01-01' then 1 else 0 end))

    from customers,orders where

    orders.customerID = customers.customerID

    group by customers.customerid

    This looks more readable and also the execution plan seems to suggest a better performance.

    Relationships are like Banks.You Invest the Principal and then get the Interest.


    He who knows others is learned but the wise one is one who knows himself.

  • Thank you for the illumination, Brokenrulz.

    The use of the CASE statement is just what I needed.

    Incidentally I needed the data from an undelying table, in my example [Order Details].UnitPrice.

    When I tested both solutions against my own database (>200.000 items), I discovered that the 'dirty' script with the nested subqueries actually was faster than the clean solution you provided. But as you said, the use of CASE results in better readability.

    Once again, thank you for your input.

    In appreciation,

    Gerry.


    Dutch Anti-RBAR League

  • While you are testing for performance, try this. Wonder if it's better than CASE and the subqueries ...

    
    
    SELECT M.customerID,
    IsNull(T96.NumOrders, 0) AS data_1996,
    IsNull(T97.NumOrders, 0) AS data_1997,
    IsNull(T97.NumOrders,0) - IsNull(T96.NumOrders,0) AS Diff
    FROM Orders M
    LEFT OUTER JOIN
    (SELECT CustomerId, Count(*) as NumOrders
    FROM Orders
    WHERE orderdate < '1997-01-01'
    GROUP BY CustomerId) AS T96
    ON M.CustomerId = T96.CustomerID
    LEFT OUTER JOIN
    (SELECT CustomerId, Count(*) as NumOrders
    FROM Orders
    WHERE orderdate >= '1997-01-01'
    GROUP BY CustomerId) AS T97
    ON M.CustomerId = T97.CustomerID

    Edited by - npeeters on 08/18/2003 5:45:04 PM

  • I would go for something similar to that given by brokenrulz:

    SELECT

    o.customerID

    ,SUM(CASE WHEN o.orderdate < '1997-01-01' THEN 1 ELSE 0 END) AS data_1996

    ,SUM(CASE WHEN o.orderdate >= '1997-01-01' THEN 1 ELSE 0 END) AS data_1997

    ,ABS(SUM(CASE WHEN o.orderdate < '1997-01-01' THEN 1 ELSE -1 END)) AS diff

    FROM orders o

    INNER JOIN customers c

    ON o.customerID = c.customerID

    GROUP BY o.customerID

    Reasons for the differences...

    a) All of the data to be presented can be obtained from orders. By forcing SQL to get some data from customers (customerID and the GROUP BY) we may be causing a little more work to be done.

    b) Only 2 categories of order data are required. Therefore the diff derivation can be simplified, reducing the CPU cost slightly. I am assuming that although orders on 1/1/1997 may be unlikely they should be included in 1997 data rather than totally ignored.

    All information provided is a personal opinion that may not match reality.

    Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017 2016, 2014, 2012, 2008 R2, 2008 and 2005.

    When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

  • NPeeters and EdVassie,

    Thank you both for your remarks.

    NPeeters, you solution was almost identical as ours in that it results in a Cartesian product. That was the problem I was asked to fix, besides the slow performance. A distinct might solve the problem, but I will use outer joins only when forced 🙂

    (Hm, might be a nice subject for a discussion)

    EdVassie shows a nice way of determining the difference. By stating that I only need orderdata he demonstrates that my example was oversimplified. Of course I will need additional customerdata. Mea Culpa.

    Once again, much appreciated

    Gerry


    Dutch Anti-RBAR League

  • Stupid me.

    You will have to add a group by to the first 'table' in the FROM like so

    
    
    ...
    FROM (SELECT CustomerID FROM Orders GROUP BY CustomerID) M
    LEFT OUTER JOIN
    ...

    That should solve the cartesian product stuff.

    And obviously, you could use inner joins if you're sure that at least one record exists for each date range.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply