The Potential of Joins

  • Comments posted to this topic are about the item The Potential of Joins

  • Generally a good article, however there is a potential pitfall which should have been mentioned. This is that when using joins you do always have to be aware of the potential for unintential duplicates, which is not the case with exists/in

  • the general ideas outlined in this article are worth thinking over them.

    However, what is stated therein, cannot be taken as matter of fact, because the optimizer

    has its own ways.

    Just an example: the queries (used my own database)

    select * from kunden_zp_status where id_zp in (select id_zp from kd_zp where zp_name like 'aso%')

    select s.* from kunden_zp_status s join kd_zp z on z.id_zp = s.id_zp and z.zp_name like 'aso%' option

    behave alike : index seek on kd_zp - loop join - clustered index seek on kunden_zp_status ,

    neither execution plan does show a sub-select

    if you, however, force the optimizer to do what you have ordered it to do,

    you will see the penalty from the sub-select

    select * from kunden_zp_status where id_zp in (select id_zp from kd_zp where zp_name like 'aso%') option (force order)

    select s.* from kunden_zp_status s join kd_zp z on z.id_zp = s.id_zp and z.zp_name like 'aso%' option (force order)

    now the execution plan has a loop join from a sub-select for the first query, and a merge join for the second.

  • I'm curious as to why in the last example he used a LEFT OUTER JOIN, although in other queries in the article he used LEFT JOIN. From my knowledge, there is no difference between LEFT OUTER and LEFT type of JOINs in SQL Server.

    Assuming that it doesn't return the same data, wouldn't it be a redundant condition to put WHERE Q.CustomerName IS NULL if you have already used LEFT OUTER JOIN?

    ~ Just some guy trying to tune queries ~

  • radu.gheorghiu (2/26/2015)


    I'm curious as to why in the last example he used a LEFT OUTER JOIN, although in other queries in the article he used LEFT JOIN. From my knowledge, there is no difference between LEFT OUTER and LEFT type of JOINs in SQL Server.

    Assuming that it doesn't return the same data, wouldn't it be a redundant condition to put WHERE Q.CustomerName IS NULL if you have already used LEFT OUTER JOIN?

    I would not say so.

    SELECT T.CustomerName,T.CustomerResponse,T.CalledOn

    FROM #TMP T

    LEFT OUTER JOIN (SELECT CustomerName

    FROM #TMP

    WHERE CustomerResponse = 'Not Interested'

    ) Q

    ON Q.CustomerName = T.CustomerName

    INNER JOIN (SELECT CustomerName,

    MAX(CONVERT(DATE,CalledOn)) AS MaxCalledOn

    FROM #TMP

    GROUP BY CustomerName

    )Q2

    ON Q2.CustomerName = T.CustomerName

    WHERE Q.CustomerName IS NULL

    AND T.CalledOn = Q2.MaxCalledOn

    First the inner join (Q2) restricts data to the last called date

    Then this record set is joined against the main data set and finally the where statement filters out records that are not in the left join (Q)

  • I think a previous post alluded to this, but the last bit of SQL is fine where a customer would only be called once on a given date. If a customer could have been called twice, either the customer would appear twice or there would need to be a max/group in the code to pick one of the outcomes randomly.

  • The English of this article needs to be cleaned up. I'm thinking the author does not speak English as a primary language, which is totally fine, but if you're going to *publish* an article in English, the extra effort needs to be made in order to be professional.

    There were also some inconsistencies such as the one radu pointed out, and one in scenario 2 where it was about a "NOT IN" sub query, but the query was put together "... where IN (select... where <> "Mountain Works")", which is slightly different.

    Also, there are no benchmarks or Execution plans to prove the point, so we're just supposed to take the author's word for it???

    Finally, I'm wondering how sub queries stack up when they are used in conjunction w/EXISTS and/or NOT EXISTS conditions.

  • It has been my experience when I have to refactor scripts that use sub-queries, that the use of join's has improved the efficiency. It would be nice though to see some stats that prove it out.

  • Thinking of the aggregate query.. curious if efficiencies could be gained using a WITH Statement or temp table and joining.

    With Statement:

    With S as (select ProductID,SalesOrderID,SUM(OrderQty) AS OrderQty FROM Sales.SalesOrderDetail GROUP BY ProductID,SalesOrderID)

    Then join it as a table

    Inner join S

    ON P.ProductID = S.ProductID

    Or create a temp table and you could take advantage of creating an index off your results if you happen to be linking off some data that isn't a key (like Customer_ID from an Order table)

  • Scenario 2 doesn't have a NOT IN in the original statement...I believe it belongs here:

    select DISTINCT Name

    from Production.Product

    where ProductID IN (Select ProductID from Purchasing.ProductVendor

    where VendorID NOT IN (select VendorID from Purchasing.Vendor

    where Name<>'Mountain Works'

    )

    )

  • Yeah, scenario 2 examples are the same as scenario 1.

  • My boss is a subquery junkie. When I try to make sense of the queries she has saved as examples for me, I find this confusing. You've intrigued me with your use of joins especially in regards to a substitute for multiple group by's.

    THANKS!

  • I realize that the point of this article was to compare joins to subqueries but I would have thought to use the rank function. Here is how I would have done it:

    ;WITH Test AS (

    SELECT

    CustomerName,

    CalledOn,

    CustomerResponse,

    RANK() OVER (PARTITION BY customername ORDER BY CONVERT(DATE,CalledOn) DESC) AS Rank

    FROM #TMP

    )

    SELECT CustomerName,CustomerResponse,CalledOn

    FROM Test

    WHERE Rank=1

    AND CustomerResponse<>'Not Interested'

    There weren't enough records in the table to know which is fastest. I am curious to know what people think of using the Rank function for these purposes instead.

  • I had the same thought as you but I think you would want to use row_number not rank. Here is a good example of the difference between rank, and row_number.

    http://stackoverflow.com/questions/7747327/sql-rank-versus-row-number

  • Joins vs Exists. Neither is always right. Just be flexible.

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply