• I've not checked this on SQL Server 2005, but the NUMBER of terms in the NOT IN (or the NUMBER of <> comparisons), in combination with the distribution statistics, makes a huge difference in Sybase, and I suspect it could in SQL Server 2005.

    Looking for "not in" is expensive; how expensive depends on how many terms you have. On Sybase, if there were many different values and the distribution was pretty even, there was a point - about 4 or 5 terms, as I recall - where the plan changed from using an index to a table scan. It takes longer to optimize a bunch of NOTs, and at some point (to avoid letting the optimizer use more time than the search!) it bails out and says "if you have that many, a table scan is probably fastest anyway" (and besides, if you wrote that many NOTs, performance was probably a secondary question).
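
    For what it's worth, you can watch the flip happen by turning showplan on and growing the list one term at a time. A minimal sketch, assuming Sybase ASE syntax and an index on x (on SQL Server 2005 you'd use SET SHOWPLAN_TEXT ON instead); the table and values are made up:

    set showplan on
    go

    -- short list: expect the plan to use the index on x
    select foo from bar where x not in (1, 20, 43)
    go

    -- longer list: watch for the point where it switches to a table scan
    select foo from bar where x not in (1, 20, 43, 99, 110, 2000, 3201)
    go

    set showplan off
    go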

    In an app that built the statement on the fly, with anywhere from one to X items to be NOT IN, performance varied wildly depending on how many the user entered.  3 was fast.  15 was not.  (These were wide rows, with hundreds of millions of records.)
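
    (Purely to illustrate "built on the fly" - this is a hypothetical T-SQL sketch, not the app's actual code; @list stands in for whatever the user entered:)

    declare @list varchar(255)
    declare @sql  varchar(512)

    -- imagine @list assembled from 1 to X user-entered keys
    select @list = '1, 20, 43'
    select @sql  = 'select foo from bar where x not in (' + @list + ')'

    exec (@sql)
    go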

    It was faster to break it into a single batch of smaller queries.  Instead of doing

    select foo from bar where x not in (1, 20, 43, 99, 110, 2000, 3201, 7098, 7099)

    go

    it was faster to do

    select foo from bar where x < 100 and x not in (1, 20, 43, 99)

    select foo from bar where x >= 100 and x < 5000 and x not in (110, 2000, 3201)

    select foo from bar where x >= 5000 and x not in (7098, 7099)

    go (a single batch)
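
    (Another common workaround - not what we did, just a sketch - is to load the excluded keys into a temp table and let the optimizer treat the exclusion as a join instead of a long IN-list:)

    create table #excluded (x int primary key)
    insert into #excluded values (1)
    insert into #excluded values (20)
    insert into #excluded values (43)
    -- ...one insert per excluded key

    select foo
    from bar
    where not exists (select 1 from #excluded e where e.x = bar.x)
    go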

    Of course, this is the kind of silly performance hack that will bite your butt when you port it to a new RDBMS, or to a new version that optimizes the original query better.

    Roger L Reid