WHERE IN(sub-query) vs. INNER JOIN

  • Hey all,

    Maybe I should've known better, but for clarity sake we were using WHERE IN (sub-query) in many of our procedures, as opposed to using an INNER JOIN. In most of our empirical tests, SQL was using similiar execution plans for the WHERE IN (sub-query) and the INNER JOIN, so we opted for WHERE IN since it made the intent of the statement clearer (there were many joins).

    Today while tuning one of the procedures we noticed a huge performance increase using the INNER JOIN over the WHERE IN(sub-query). I have never seen this before, and wondered if I'd stepped into an anamoly or if INNER JOINs over WHERE IN(sub-query) should just be a best practice?

    Any thoughts? comments?

    Thanks,

    JasonL


    JasonL

  • Hey Jason,

    I don't remember all of the specific details, but my understanding is that you should always use INNER JOIN over WHERE IN(subquery) when writing queries. The reason has something to do with the way SQL treats the WHERE clause. It is seen as an expression, so the columns you specify in it are not used in the execution plan optimization.  Does this information jog anyone elses memory on this topic??

    Earl

  • I had read an article about using "exists" instead of "In". I think it mentioned that exists would just check for a condition to find out if its true or false; but "In" clause actually retrieves a result set and hence is much slower.

  • Here is a really nice explanation from Itzik Ben-Gan on the difference between (NOT) IN and EXISTS:

    http://groups.google.de/groups?hl=de&lr=&selm=uFffAFPaBHA.1900%40tkmsftngp04&rnum=2

    IMHO, when you can restate your query to using JOINs, you should do so. JOINs are usually process efficiently. But to really find out, as always, you need to test your alternatives in your environment.

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • A big key is the Join can actually utilize any indexes that exist for that column when joining. The in is generating a simple const list to compare against or might generate a small internal table of the values before processing actually begins which is where the IN will generally cost you the most.

  • you should always try to use simple JOIN over WHERE IN , if it is possible.

    The same goes for LEFT JOIN over Where NOT IN

  • The trouble with IN clause is when the subquery returns NULL you then get all records. Not critical when you use it in a SELECT, but trouble in a DELETE WHERE clause. Can you say Got Backup...

    Andy

  • All ... thanks for the replies!

    For the most part, the warnings everyone has given are things I am well aware of (Andy's reminder about returning NULL, etc.), and we honestly did do as Frank suggested - test both options in our environment. (The first thing we tried when we realized the WHERE IN() was taking too long was to try it as a JOIN).

    I've heard the very generic "use Joins when you can", but having seen evidence to the contrary in *some* cases have not always followed that advice.

    What I'm really wondering is if anyone has some good links that describe the trade-offs, particulary anything that would give solid evidence to back up claims like "joins can use indexes, WHERE IN cannot", "joins are processed more efficiently". I've dug through "Inside SQL Server" and googled like crazy and can't find anything concrete.

    Appreciate all the help.


    JasonL

  • Google smart and google in the Groups, not the web and you'll find

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • Besides performance issues, I think that typical beginners usually prefers WHERE IN or WHERE NOT IN to include or exclude something, because this logic is kind of easier to comprehend. When person starts to think SQL in term of “joins” to include or exclude something to me it is indication of more advanced level

  • Too bad I cannot find the old white paper on the SQL optimizer. It discussed techniques like reducing the query tree or parse tree; the point being that the SQL language offers many ways to express what is essentially the same relational algebra. The white paper stated that subqueries are translated to joins internally. Identical execution plans would be strong evidence for this.

  • Yes!

    And worse: IN and Not IN() require single-column primary keys. This leads beginners to deduce: every table must have an single-column (and thus, often, an identity) PK, otherwise, how can I express my WHERE clauses?

  • What, there are other PK's possible which have not defined the IDENTITY property on it???

    Amazing!

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • Yeah, that was the kind of thing that I was looking for.

    I understand the comments regarding WHERE IN() being a crutch for beginners. On the other hand I think an argument could be made for the self-documenting nature of WHERE IN() over JOINs for the very reason that it is easier to read/understand.

    The project I ran into this on had some rather complicated logic, which translated into many joins. The WHERE IN() clearly documented which joins were necessary to get to the data, and what was the "logic" for when a particular modification ran. Naturally, we could use the joins instead (in some cases we have to for performance); and use good comments to describe what is going on. I'm just always a fan of self-documenting code over comments when ever possible.

    I was just caught by surprise by the gross discrepancy in performance, having thought that both were evaluated similiarly internally. But that idea may be a holdover from older platforms (Sybase/6.5). If I recall there was even something in the Sybase documentation describing under what conditions the WHERE IN(sub-query) would be translated to joins internally.

    In any case, thanks to all for the input. I'll be a little more leary of WHERE IN() in the future.


    JasonL

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply