Where Column IN (x,y,z) Better Than Where In (Select from table)

  • I've probably asked this before but each time I encounter it, it baffles me. I have queries similar to the following:

    -- Example 1

    -- Populate a temp table

    Select D.X INTO #tmp FROM

    (SELECT 100 AS X UNION ALL

    SELECT 200 AS X ) D

    SELECT A, B, C FROM myView -- Joined to other tables

    WHERE A IN (SELECT T.X FROM #tmp T)

    -- Example 2

    SELECT A, B, C FROM myView -- Joined to other tables

    WHERE A IN (100, 200)

    In short, example 2 smokes example 1. It results in a completely different execution plan. Is it simply because the constant values are resolved more easily than the same values selected in another table?

  • Yes. In Example 1, you've got the extra expense of compiling the query because it's more complex (it involves a join), and also of actually retrieving the values from the table. I'm surprised you notice a difference with just two rows... I imagine that you've simplified it before posting?

    John

  • In short, yes. The first it two seeks against the table (assuming appropriate indexes), the second is a join between two tables, might be a loop join, might not.

    The difference between the two will change as the number of entries in the IN changes. Try that again with 2000 entries in the IN vs a table with 2000 rows. Should be a different story,

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I was surprised at the difference with only one entry but it was significant enough to investigate. The procedure/query will always use between 1 and 200 values so we've implemented a dynamic SQL approach.

    Thanks for the explanations.

    ST

  • Neither of these two forms is guaranteed to be faster than the other. But the plans will be different.

    If you already have the values in a table, use form 1 and don't go through hoops to somehow force form 2.

    If you have the values as constants, use form 2 and don't use a needless temp table as in form 1.

    Only deviate from these guidelines if you find that the performance is unacceptable and the other form fixes it.

    That being said, I personally only use [NOT] IN with a delimited list, never with a subquery. The latter can always be rewritten as a [NOT] EXISTS, with the additional benefits of not having issues with NULL values in the subquery and not requireing nasty workarounds if you work with composite keys.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • You should definitely cluster the temp table on the key column as well. It can't hurt, but it can help under certain conditions.

    SELECT TOP (0) D.X INTO #tmp FROM

    (SELECT 100 AS X) D

    CREATE CLUSTERED INDEX tmp__CL ON #tmp ( X ) WITH ( FILLFACTOR = 100 );

    INSERT INTO #tmp

    SELECT D.X

    FROM

    (VALUES(100),(200),(300),(400),(500),(600),(700),(800),(900),(1000)) D(X)

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply