Not In v Not Equal

  • For this particular case of continuous values, 'NOT BETWEEN 214 AND 217' runs twice as faster as the best solution (NOT EXISTS) !

  • This article, and its many replies, is a great example of why I like this site so much! Valuable points are made, with time taken to describe exactly how answers were derived, and comments flesh out the original discussion quite nicely.

    Overall, thanks very much for this - Sharon.

  • Paulo G., that was a great catch!

    BETWEEN didn't even cross my mind when looking at this originally. I was looking more at the generalities of which techniques worked best with a multi-value filter and totally missed the fact that my filter data were sequential.

    If we change our sample data to something like (8, 214, 271, 310), NOT EXISTS keeps its performance edge, but the best solution should take into account the filter data, and the BETWEEN solution crushed the NOT EXISTS solution for the given sample (214-217).

    Thanks for the new angle!

    KenJ

  • as an aside set base generation is a tad quicker

    set statistics io on

    set statistics time on

    CREATE TABLE tbl_IN_VS_AND (filterCriterion_sv int PRIMARY KEY CLUSTERED NOT NULL)

    Insert tbl_IN_VS_AND

    SELECT 100000 * U.I + 10000 * V.I + 1000 * W.I + 100 * X.I + 10 * Y.I + Z.I filterCriterion_sv FROM

    (SELECT 0 I UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) U,

    (SELECT 0 I UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) V,

    (SELECT 0 I UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) W,

    (SELECT 0 I UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) X,

    (SELECT 0 I UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) Y,

    (SELECT 0 I UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) Z

    set statistics io off

    set statistics time off

    at about 10 seconds on a dual core at least.

    to compare hardware the first two queries in the post on a baseline of 100 I get

    43766 ms and 43750 ms elapsed

    Biggest shock was the times of

    SELECT @results = count(filterCriterion_sv)

    FROM tbl_IN_VS_AND

    WHERE filterCriterion_sv NOT IN (Select 214 I UNION SELECT 215 UNION SELECT 216 UNION SELECT 217)

    SELECT @results = count(filterCriterion_sv)

    FROM tbl_IN_VS_AND

    WHERE filterCriterion_sv NOT IN (214,215,216,217)

    were 27176 ms

    and 43486 ms

    while

    SELECT @results = count(filterCriterion_sv)

    FROM tbl_IN_VS_AND

    WHERE NOT EXISTS(SELECT * FROM

    (

    SELECT 214 AS filterValue_val UNION ALL

    SELECT 215 UNION ALL

    SELECT 216 UNION ALL

    SELECT 217 ) AS tbl

    WHERE tbl.filterValue_val = tbl_IN_VS_AND.filterCriterion_sv )

    ended up was still 14470 ms so twice as good as the union

    I guess it must not construct the in predicate into a working table before starting the query. (until you tell it to via union)

  • In my experience it is always faster to join with "exists" then do a painful "in". "In" is only good for several values. With ssis, you can actually use a pre-existing table to insert your values into and join aginst.

Viewing 5 posts - 16 through 19 (of 19 total)

You must be logged in to reply to this topic. Login to reply