Query cost

  • Comments posted to this topic are about the item Query cost

  • Based on your provided reference, I would agree. However, if one examines the execution plans for these two queries and runs them together, the query optimizer produces the same execution plan for both and assigns both queries the same cost.

    Here is a nice resource on the topic:

    http://sqlinthewild.co.za/index.php/2009/08/17/exists-vs-in/
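For anyone who wants to see the two constructs side by side, here is a minimal sketch. The thread never shows the original queries or schema, so the teacher and student tables, their columns, and the sample rows below are assumptions. It also runs on SQLite through Python's sqlite3 module purely so that it is self-contained; it says nothing about SQL Server's plans or costs, only that the IN form and the EXISTS form return the same rows.

```python
import sqlite3

# Hypothetical schema and data: the real table definitions are not in the thread.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE teacher (teacher_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, teacher_id INTEGER);
    INSERT INTO teacher VALUES (1, 'Ada'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO student VALUES (10, 1), (11, 1), (12, 3), (13, NULL);
""")

# Variant 1: IN with an uncorrelated subquery.
in_query = """
    SELECT name FROM teacher
    WHERE teacher_id IN (SELECT teacher_id FROM student)
    ORDER BY name
"""

# Variant 2: EXISTS with a correlated subquery.
exists_query = """
    SELECT name FROM teacher AS t
    WHERE EXISTS (SELECT 1 FROM student AS s
                  WHERE s.teacher_id = t.teacher_id)
    ORDER BY name
"""

in_rows = conn.execute(in_query).fetchall()
exists_rows = conn.execute(exists_query).fetchall()

# Both variants list the teachers that have at least one student.
print(in_rows)
print(in_rows == exists_rows)
```

To compare the actual plans and costs you would still need to run the T-SQL versions against your own SQL Server instance and data.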

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw
    Learn Extended Events

  • I had chosen "both are equally cost-effective" simply because I felt that the correlated subquery still has to check the equality condition, which means passing through the student and teacher tables for every outer row, whereas the simple subquery has to be evaluated only once before the IN operator kicks in. Somehow, without any rigorous fundamentals backing my theory, I felt the queries would perform equally well, and so I chose the third option.

    Even though I knew that the question was testing the usage of EXISTS - I wasn't convinced that the 2nd query would perform appreciably better than the first query.

    Anyway - I stand corrected.

    I had not confirmed the execution plan in SSMS. Now that Cirque has confirmed what I felt intuitively I will go ahead and look at it.

    Saurabh Dwivedy
    ___________________________________________________________

    My Blog: http://tinyurl.com/dwivedys

    For better, quicker answers, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537

    Be Happy!

  • I answered OK, but...

    The first variant with IN can be more efficient; it always depends on the data. It is hard to compare two queries without any knowledge of the table structure, indexes, or possible data distributions...



    See, understand, learn, try, use efficient
    © Dr.Plch

  • honza.mf (1/27/2010)


    I answered OK, but...

    The first variant with IN can be more efficient; it always depends on the data. It is hard to compare two queries without any knowledge of the table structure, indexes, or possible data distributions...

    This is a very valid point - the answer really is an It Depends kind of answer.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw
    Learn Extended Events

  • This used to be true, and I have often used this knowledge to optimise slow-running queries. But since SQL 2005 it no longer matters from a performance point of view. It's still good practice to use EXISTS, though, as it expresses the intent more clearly.


    Just because you're right doesn't mean everybody else is wrong.

  • CirquedeSQLeil (1/27/2010)


    honza.mf (1/27/2010)


    I answered OK, but...

    The first variant with IN can be more efficient; it always depends on the data. It is hard to compare two queries without any knowledge of the table structure, indexes, or possible data distributions...

    This is a very valid point - the answer really is an It Depends kind of answer.

    Thanks.



    See, understand, learn, try, use efficient
    © Dr.Plch

  • The volume of data returned from the tables may vary (test versus production database), so a query that performs well in one may slow down or raise an error in the other.

    FROM BOL:

    Including an extremely large number of values (many thousands) in an IN clause can consume resources and return errors 8623 or 8632. To work around this problem, store the items in the IN list in a table.

    Any null values returned by subquery or expression that are compared to test_expression using IN or NOT IN return UNKNOWN. Using null values together with IN or NOT IN can produce unexpected results.

    I prefer EXISTS instead of IN + subquery because of performance.
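The NULL remark in that BOL quote is easy to underestimate, so here is a small sketch of it. The schema and rows are assumptions (the thread does not show the real tables), and it runs on SQLite via Python's sqlite3 module only to be self-contained. The three-valued logic it shows is standard SQL behaviour: a single NULL in the subquery makes NOT IN evaluate to UNKNOWN for every non-matching row, while NOT EXISTS is unaffected.

```python
import sqlite3

# Hypothetical schema and data; student.teacher_id is deliberately nullable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE teacher (teacher_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, teacher_id INTEGER);
    INSERT INTO teacher VALUES (1, 'Ada'), (2, 'Bob');
    INSERT INTO student VALUES (10, 1), (11, NULL);
""")

# Teachers without students, written with NOT IN.
# For Bob, "2 NOT IN (1, NULL)" is UNKNOWN, so the row is filtered out.
not_in_rows = conn.execute("""
    SELECT name FROM teacher
    WHERE teacher_id NOT IN (SELECT teacher_id FROM student)
""").fetchall()

# The same question written with NOT EXISTS.
# The NULL row never satisfies the equality, so Bob is returned.
not_exists_rows = conn.execute("""
    SELECT name FROM teacher AS t
    WHERE NOT EXISTS (SELECT 1 FROM student AS s
                      WHERE s.teacher_id = t.teacher_id)
""").fetchall()

print(not_in_rows)      # no rows at all
print(not_exists_rows)  # Bob
```

So the negated forms are interchangeable only when the compared column cannot contain NULL.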

  • I recently optimised a query and tried exactly these two constructs. The query plan for both was identical.

    So the correct answer is either "they are equal", or (more likely I suspect) "it depends on the data".

  • Rune Bivrin (1/27/2010)


    This used to be true, and I have often used this knowledge to optimise slow-running queries. But since SQL 2005 it no longer matters from a performance point of view.

    I checked this on SQL Server 2000 (SP4, 8.00.2039) and it produced fully identical execution plans. It seems the SQL Server 2000 database engine is smart enough 🙂 Maybe this was one of the improvements in Service Pack 4.

    Carlo Romagnano (1/27/2010)


    FROM BOL:

    Including an extremely large number of values (many thousands) in an IN clause can consume resources and return errors 8623 or 8632.

    This may happen only when a list of values is used in the IN statement. The author used a subquery, so it's O.K.

  • CirquedeSQLeil (1/26/2010)


    if one examines execution plans for these two queries and runs them together, the query optimizer treats them as the same execution plan and equates both queries to the same cost.

    Hmmmm. Wonders if they are optimised to a "(s)lowest common denominator" 🙂

    (Being a bit of a lightweight here I don't know how to examine the execution plans. Is that part of the SQL Profiler?)

    Kelsey Thornton
    MBCS CITP

  • If I criticize a QotD, I always try to maintain a positive tone, especially since, after having submitted some of my own, I know how hard it is to create one, and how impossible it is to satisfy everyone.

    This question makes it very hard for me to stay positive, because it actually shows a severe lack of understanding of the subject matter by the question author. I'll just enumerate the issues.

    1. The schema of the tables has not been supplied. In questions like this, that may be of the utmost importance. For instance, had the question been about NOT IN versus NOT EXISTS, then the queries would only have been equivalent if student.teacher_id is not nullable. I must admit that after trying some schema variations, I have not yet found one where the query performance of these particular queries is affected by the schema, but I only tried a few, so I can't exclude the possibility.

    2. The author obviously has not bothered to check his ideas. I did (as indicated above). And with all the schema variations I tried, the two queries were executed using the EXACT SAME execution plan. The query optimizer obviously sees that these two queries are equivalent, so they are processed the same way. And as a result, there can never be any performance difference. Based on this evidence, "both are equal" is the correct answer - but there is no guarantee whatsoever that the same holds on all versions of SQL Server, on all possible variations of hardware, and with all possible data distributions.

    3. Performance related questions are always disputable because, as indicated above, there are so many factors involved in query optimization that it is almost impossible to predict what the optimizer will do with a query. And it will not always be the same either. Even on the same system, results may change overnight for no apparent reason (happened to me yesterday in the DB I'm working on - a stored proc that suddenly took many minutes to complete).

    Other, minor issues are the unneeded brackets around [name] (name is not on the list of reserved keywords, so no delimitation is required); the strangely popular but really rather odd EXISTS 1 instead of EXISTS * (EXISTS checks for rows, not values, so what you put there is immaterial - except that * is the standard that anyone understands immediately while EXISTS(SELECT 1 makes everybody pause to think); and the broken link in the explanation (the two links both point to the same page).

    Bottom line - the only truly correct answer is "it depends". Of the options given, "both are equal" is almost correct. The other two options are plain nonsense.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • honza.mf (1/27/2010)


    CirquedeSQLeil (1/27/2010)


    honza.mf (1/27/2010)


    I answered OK, but...

    The first variant with IN can be more efficient; it always depends on the data. It is hard to compare two queries without any knowledge of the table structure, indexes, or possible data distributions...

    This is a very valid point - the answer really is an It Depends kind of answer.

    Thanks.

    I must say, I think you can infer that teacher_id is a PK / FK, but I'm not sure that it makes any difference. Both queries boil down to an inner join between the whole of both tables.

    Since all of the data in each table is involved, and the same columns are used in both cases, could you suggest some examples in which the structure / data / indexes would make a difference to these queries?

  • vk-kirov (1/27/2010)


    Carlo Romagnano (1/27/2010)


    FROM BOL:

    Including an extremely large number of values (many thousands) in an IN clause can consume resources and return errors 8623 or 8632.

    This may happen only when a list of values is used in the IN statement. The author used a subquery, so it's O.K.

    Exactly. The query processor will not first execute the subquery to create a list of teacher_id values and then insert that into the outer query; the optimizer will produce one integral plan to execute the whole query. In my case, that plan consists of scanning the teacher table, and then for each row looking for the first matching value in student.teacher_id (where the exact method of looking depends on whether and how this column is indexed).
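The "scan teacher, then look for the first match in student.teacher_id" strategy described above can be sketched in plain Python. This is only an illustration of the idea, not SQL Server's actual operators: the Teacher and Student types, the function names, and the sample rows are all hypothetical, and a Python set stands in for an index on student.teacher_id.

```python
from typing import List, NamedTuple, Optional


class Teacher(NamedTuple):
    teacher_id: int
    name: str


class Student(NamedTuple):
    student_id: int
    teacher_id: Optional[int]  # None plays the role of NULL


def semi_join_scan(teachers: List[Teacher], students: List[Student]) -> List[str]:
    """No index: for each teacher, scan students and stop at the first match."""
    result = []
    for t in teachers:
        # any() short-circuits, so the inner scan ends as soon as a match is found.
        if any(s.teacher_id == t.teacher_id for s in students):
            result.append(t.name)
    return result


def semi_join_indexed(teachers: List[Teacher], students: List[Student]) -> List[str]:
    """With an 'index': one pass builds a lookup structure, then each probe is cheap."""
    index = {s.teacher_id for s in students if s.teacher_id is not None}
    return [t.name for t in teachers if t.teacher_id in index]


teachers = [Teacher(1, "Ada"), Teacher(2, "Bob"), Teacher(3, "Cy")]
students = [Student(10, 1), Student(11, 1), Student(12, 3), Student(13, None)]

scan_result = semi_join_scan(teachers, students)
indexed_result = semi_join_indexed(teachers, students)
print(scan_result, indexed_result)
```

Either way each teacher is emitted at most once, however many students match, which is why IN and EXISTS can share one plan while a plain inner join would need extra duplicate elimination.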


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hugo Kornelis (1/27/2010)


    the strangely popular but really rather odd EXISTS 1 instead of EXISTS * (EXISTS checks for rows, not values, so what you put there is immaterial - except that * is the standard that anyone understands immediately while EXISTS(SELECT 1 makes everybody pause to think); and the broken link in the explanation (the two links both point to the same page).

    I must admit I always use EXISTS (SELECT 'X' which is probably even more confusing! It's from my days with Oracle version 5 or 6 - in which we showed that * was slower than 1, by enough that it should be avoided, and that 1 was slightly slower than 'X' so we used the latter in preference. Old habits die hard...
