Where is it best to place "where conditions" in the join or inthe final where statement

  • Is it best practice to ensure filtering is done at the time of an inner join or at the end with a filter in the where clause.  Which would mean less data needs to be checked, and filtered and therefor is more efficient would you say.  e..g

    Select * from x

    inner join y on x.id=y.id

    and y.age > 20

    OR

    Select * from x

    inner join y on x.id=y.id

    where y.age > 20.

    I'm thinking the inner join clause is best, but then my initial stats suggest the where is quicker.

  • i guess it might depend on your data somewhat

    But. The where clause is where sql server would expect to find it.

    I've always done joins on the values of one column and restriced the results elsewhere but that is a personal thing - don't know it's any slower or faster -

    cheers

    dbgeezer

  • Try this in Northwind and then look at the execuion plan.

    Select * from "Order details" od

    inner join Orders o

      on od.orderid  = o.orderid

     and  o.orderdate > '5-01-98'

     go

    Select * from "order Details" od

    inner join orders o

    on od.orderid=o.orderid where o.orderdate > '5-01-98'

     


    John Zacharkan

  • i'd still stick with the where clause for readability

    cheers

    dbgeezer

  • aaah, see what you mean by the execution plans.. they're the same, and the same for my own variations in code too.

    I sort of like a bit of both, keeping the conditions with the inner joins so that it clearly identifies the table restrictions and in the where section when it can change for each run.

    thanks very much

    rob.

  • I think the optimizer in SQL Server resolves a lot of these types of issues too.

  • For inner joins I don't think it really matters as noted by the optimizer doing the same thing. However if you go to an outer join many times it HAS to be part of the join clause! Consider the following...

    SELECT *

    FROM [Order Details] od

        LEFT JOIN Orders o

          ON od.OrderID  = o.OrderID

             AND  o.OrderDate > '5-01-98'

    AND o.ShippedDate IS NOT NULL

    SELECT *

    FROM [Order Details] od

        LEFT JOIN Orders o

          ON od.OrderID  = o.OrderID

             AND  o.OrderDate > '5-01-98'

    WHERE o.ShippedDate IS NOT NULL




    Gary Johnson
    Microsoft Natural Language Group
    DBA, Sr. DB Engineer

    This posting is provided "AS IS" with no warranties, and confers no rights. The opinions expressed in this post are my own and may not reflect that of my employer.

  • If you are using an outer join as a replacement for a "WHERE NOT IN", then the filter must be in the WHERE clause as in Gary's second example above.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thought I'd clarify my statement about using an Outer Join as a replacement for a "Where Not In"...

    About outer joins... I'm sure you've heard all sorts of horror stories about how outer joins return everything and are evil, etc.  Those stories are true when told by those that don't know how to use them.

    One of the best uses for outer joins is as a replacement for "NOT IN".  Hint: anytime you use NOT IN or <>, you have just defeated the optimizer in SQL Server.  So, the general idea is to keep everything in a positive manner as in "IN" (still a bit slow) or "=" (nasty fast!).

    To demo, I created the following 2 tables:

    CREATE TABLE [dbo].[TestJoin1] (

       [ID] [int] IDENTITY (1, 1) NOT NULL ,

       [AColumn] [varchar] (10) NULL )

    CREATE TABLE [dbo].[TestJoin2] (

       [ID] [int] IDENTITY (1, 1) NOT NULL ,

       [AnotherCol] [varchar] (10) NULL )

    TestJoin1 contains the following information:

    ID          AColumn

    ----------- ----------

    1           A

    2           B

    3           C

    4           D

    5           E

    6           F

    (6 row(s) affected)

    TestJoin2 contains the following information:

    ID          AnotherCol

    ----------- ----------

    2           B

    4           D

    5           E

    (3 row(s) affected)

    The problem, for discussion purposes is to select all IDs from TestJoin1 that are NOT IN TestJoin2.  Most people would write the SQL literally and kill themselves on speed when using big tables like this:

    SELECT *

    FROM TestJoin1

    WHERE ID NOT IN (SELECT ID FROM TestJoin2)

    and the results would be:

    ID          AColumn

    ----------- ----------

    1           A

    3           C

    6           F

    (3 row(s) affected)

    Again, the problem is to NOT USE "NOT IN" because "NOT IN" cannot be optimized.

    First, let's just try a simple LEFT OUTER JOIN where TestJoin1 is the "LEFT" table:

    SELECT j1.*,j2.*

     FROM TestJoin1 j1

     LEFT OUTER JOIN TestJoin2 j2

          ON j1.ID = j2.ID

    and the results would not be anything like we wanted but do notice that, even though there wasn't a match in 3 of the records, ALL of the records from the LEFT table (TestJoin1 in this case) were returned (see below).  That's what an outer join does.  To identify which table to do this with, LEFT or RIGHT are added to the OUTER JOIN statement.  Which table is the LEFT table?  The one to the LEFT (or above, in this case) of the LEFT OUTER JOIN clause.  The other table to the right of the outer join is the RIGHT table.

    ID          AColumn    ID          AnotherCol

    ----------- ---------- ----------- ----------

    1           A          NULL        NULL

    2           B          2           B

    3           C          NULL        NULL

    4           D          4           D

    5           E          5           E

    6           F          NULL        NULL

    (6 row(s) affected)

    Notice the pattern?  Again, the problem is to select everything from TestJoin1 where the ID is "NOT IN" TestJoin2.  Notice that where that is true, there are NULLs in the return set (right 2 columns are from TestJoin2) above?  So, all we have to do is add a filter in a WHERE clause and we have exactly what (with extra fields for demo) we want using only "positive" logic in the form of "=" and "IS" like this:

    SELECT j1.*,j2.*

     FROM TestJoin1 j1

     LEFT OUTER JOIN TestJoin2 j2

          ON j1.ID = j2.ID

    WHERE j2.ID IS NULL

    and the results look like this...

    ID          AColumn    ID          AnotherCol

    ----------- ---------- ----------- ----------

    1           A          NULL        NULL

    3           C          NULL        NULL

    6           F          NULL        NULL

    (3 row(s) affected)

    ...and, if we remove all reference to TestJoin2 from the SELECT clause field list like this:

    SELECT j1.*

     FROM TestJoin1 j1

     LEFT OUTER JOIN TestJoin2 j2

          ON j1.ID = j2.ID

    WHERE j2.ID IS NULL

    ...we get exactly what we wanted... All of the fields for all of the records in TestJoin1 whose ID is NOT IN TestJoin2...

    ID          AColumn

    ----------- ----------

    1           A

    3           C

    6           F

    (3 row(s) affected)

    ...And will run nasty fast even on big tables!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • is a left outer join faster than NOT EXISTS then ?

    cheers

    dbgeezer

  • quoteis a left outer join faster than NOT EXISTS then ?

    Generally yes but not always. I depends on the table definitions, PK's indexes and so forth.

    I have done tests before and found with some tables the plan is the same and the time the same but others NOT EXISTS is 3 times faster.

    What I do is try different queries and see which has the best plan / execution time (and make sure you do the tests with different data volume, the number of rows can affect query speed drastically).

    Far away is close at hand in the images of elsewhere.
    Anon.

  • I've found the same as David. It really depends on the data and how much of it you have!

    BTW: My query above was NOT about a Not Exists query but to show that you need to structure your outer joins carefully to get the correct data. In the query I posted above the ShippedDate allows Null values. The same thing would have happened if I had used an exact date.

    SELECT *

    FROM [Order Details] od

        LEFT JOIN Orders o

          ON od.OrderID  = o.OrderID

             AND  o.OrderDate > '5-01-98'

    AND o.ShippedDate = CONVERT(datetime,'1998-05-06')

    SELECT *

    FROM [Order Details] od

        LEFT JOIN Orders o

          ON od.OrderID  = o.OrderID

             AND  o.OrderDate > '5-01-98'

    WHERE o.ShippedDate = CONVERT(datetime,'1998-05-06')




    Gary Johnson
    Microsoft Natural Language Group
    DBA, Sr. DB Engineer

    This posting is provided "AS IS" with no warranties, and confers no rights. The opinions expressed in this post are my own and may not reflect that of my employer.

  • I have the same question, but am using a "NOT EXIST" functionality.  We are experience tremondous slowness running this query.  Do you have any ideas on how I can increase query time?

    select *

    from rebate rb

    where unit > ' '

    and file_dte = '2004-05-02'

    and sys_nam like '%odsbeta%'

    and incentiverevcode = 1

    and not exists (select 'x'

                  from rebate rb2

                 where rb.[group] = rb2.[group]

                   and rb.unit = rb2.unit

                   and rb.enterprisecode = rb2.enterprisecode

                   and rb.programcode = rb2.programcode

                   and rb2.incentiverevcode = 0

                   and rb2.sys_nam not like '%odsbeta%')

  • It, of course, depends on the SELECT statement in the NOT EXISTS but, I'd have to say "Yes", in most cases (was going to say ALWAYS but, sure-as-shootin', somebody would find the exception )

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • You have two major performance killers in this query.

    in the outer query:

    and sys_nam like '%odsbeta%'

    By using a leading wildcard like this, you are effectively forcing a table scan on table rebate

    in the subselect:

    and rb2.sys_nam not like '%odsbeta%'

    Here we have both a leading wildcard and a negative (not like), both of these constructs also forces a scan of rebate.

    To make matters even worse, the subselect is part of a NOT EXISTS clause.

    This means that table rebate will be scanned from top to bottom once for each row returned by the outer query (select * from rebate...)

    As you can imagine, this will probably quite quickly lead to degrading performance.

    To fix, try to avoid leading wildcards at any cost.

    Also avoid negatives - ie NOT IN and such.

    NOT EXISTS is ok - it's not a negative predicate in the same sense as NOT IN, it's a boolean expression resolving to either true or false - the trick for NOT EXISTS is to make it resolve as quickly as possible. This means that the subselect has to be as efficient as possible.

    To sum up, your main problem is not the NOT EXISTS clause per se, it's in the outer query and in the resolver of NOT EXISTS..

    ..makes any sense?

    =;o)

    /Kenneth

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply