• I've not checked this on SQL Server 2005, but the NUMBER of terms in the NOT IN (or the NUMBER of <> comparisons), in combination with the distribution statistics, makes a huge difference in Sybase, and I suspect it could in SQL Server 2005.

    Looking for "not in" is expensive; how expensive depends on how many terms you have. On Sybase, if there were many different values and the distribution was pretty even, there was a point - about 4 or 5 terms, as I recall - where the plan changed from using an index to a table scan. It takes longer to optimize a bunch of NOTs, and at some point (to avoid letting the optimizer use more time than the search!) it bails out and says "if you have that many, a table scan is probably fastest anyway" (and besides, if you wrote that many NOTs, performance was probably a secondary question).
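
    For what it's worth, you can watch the flip happen by turning showplan on and growing the list one term at a time. A minimal sketch, assuming Sybase ASE syntax and an index on x (on SQL Server 2005 you'd use SET SHOWPLAN_TEXT ON instead); the table and values are made up:

    set showplan on
    go

    -- short list: expect the plan to use the index on x
    select foo from bar where x not in (1, 20, 43)
    go

    -- longer list: watch for the point where it switches to a table scan
    select foo from bar where x not in (1, 20, 43, 99, 110, 2000, 3201)
    go

    set showplan off
    go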

    In an app that built the statement on the fly, with anywhere from one to X items to be NOT IN, performance varied wildly depending on how many the user entered.  3 was fast.  15 was not.  (These were wide rows, with hundreds of millions of records.)
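
    (Purely to illustrate "built on the fly" - this is a hypothetical T-SQL sketch, not the app's actual code; @list stands in for whatever the user entered:)

    declare @list varchar(255)
    declare @sql  varchar(512)

    -- imagine @list assembled from 1 to X user-entered keys
    select @list = '1, 20, 43'
    select @sql  = 'select foo from bar where x not in (' + @list + ')'

    exec (@sql)
    go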

    It was faster to break it into a single batch of smaller queries.  Instead of doing

    select foo from bar where x not in (1, 20, 43, 99, 110, 2000, 3201, 7098, 7099)

    go

    it was faster to do

    select foo from bar where x < 100 and x not in (1, 20, 43, 99)

    select foo from bar where x >= 100 and x < 5000 and x not in (110, 2000, 3201)

    select foo from bar where x >= 5000 and x not in (7098, 7099)

    go (a single batch)
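
    (Another common workaround - not what we did, just a sketch - is to load the excluded keys into a temp table and let the optimizer treat the exclusion as a join instead of a long IN-list:)

    create table #excluded (x int primary key)
    insert into #excluded values (1)
    insert into #excluded values (20)
    insert into #excluded values (43)
    -- ...one insert per excluded key

    select foo
    from bar
    where not exists (select 1 from #excluded e where e.x = bar.x)
    go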

    Of course, this is the kind of silly performance hack that will bite your butt when you port it to a new RDBMS, or to a new version that optimizes the original query better.

    Roger L Reid