SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Not In v Not Equal


Not In v Not Equal

Author
Message
Ken Johnson-162971
Ken Johnson-162971
SSC Veteran
SSC Veteran (234 reputation)SSC Veteran (234 reputation)SSC Veteran (234 reputation)SSC Veteran (234 reputation)SSC Veteran (234 reputation)SSC Veteran (234 reputation)SSC Veteran (234 reputation)SSC Veteran (234 reputation)

Group: General Forum Members
Points: 234 Visits: 375
Comments posted here are about the content posted at http://www.sqlservercentral.com/columnists/kjohnson/2924.asp
Leo Miller
Leo Miller
Grasshopper
Grasshopper (15 reputation)Grasshopper (15 reputation)Grasshopper (15 reputation)Grasshopper (15 reputation)Grasshopper (15 reputation)Grasshopper (15 reputation)Grasshopper (15 reputation)Grasshopper (15 reputation)

Group: General Forum Members
Points: 15 Visits: 20
It would be interesting to see the query plans for the last two options and see how they compare. Logically we would expect the performance to be better on these but most of us struggle with the complexity of the code. I've seen it used a number of times and now use it regularly although it is a lot harder to work out what exactly is happening and I know it scares some newbies to death. Scared me half to death the first time I was it.
Me Da Man
Me Da Man
SSC Rookie
SSC Rookie (34 reputation)SSC Rookie (34 reputation)SSC Rookie (34 reputation)SSC Rookie (34 reputation)SSC Rookie (34 reputation)SSC Rookie (34 reputation)SSC Rookie (34 reputation)SSC Rookie (34 reputation)

Group: General Forum Members
Points: 34 Visits: 6

I actually saw a very similar result when updating a large number of rows in a table. I had an SSIS package which updated one or two columns of many rows in a table. It would take hours to use the SQL Command component (and run a stored procedure for each row). So instead I inserted the PK and values of the rows that needed to be change into a temporary table with the same columns and data types as the table that needed updating. After that I ran a stored procedure that was something like this

UPDATE MyTable

SET Col1 = ISNULL(MyTempTable.Col1, MyTable.Col2)

FROM

MyTable INNER JOIN MyTempTable

ON MyTable.PK = MyTempTable.PK

(I know my code isn't right, but I think the principle is clear.)

Doing this takes as little as 5 minutes, when previously I would have waited hours to do the updates one at a time.

I appreciate that the overhead of SSIS's connections almost cetainly exaggreates the benefit, but the basic idea is the same. Using a join instead of looking up the individual rows.


Martin Cairney
Martin Cairney
Mr or Mrs. 500
Mr or Mrs. 500 (521 reputation)Mr or Mrs. 500 (521 reputation)Mr or Mrs. 500 (521 reputation)Mr or Mrs. 500 (521 reputation)Mr or Mrs. 500 (521 reputation)Mr or Mrs. 500 (521 reputation)Mr or Mrs. 500 (521 reputation)Mr or Mrs. 500 (521 reputation)

Group: General Forum Members
Points: 521 Visits: 161

SQL Server (like all DBMS) works best when dealing with sets rather than row by row operations.

When you use the JOIN variations for this type of query, SQL Server can optimise and perform the operations effectiely in a single pass for the set rather than looking row by row.

The query plans for the second 2 optiosn should show this.


Eduard Koryagin
Eduard Koryagin
Grasshopper
Grasshopper (11 reputation)Grasshopper (11 reputation)Grasshopper (11 reputation)Grasshopper (11 reputation)Grasshopper (11 reputation)Grasshopper (11 reputation)Grasshopper (11 reputation)Grasshopper (11 reputation)

Group: General Forum Members
Points: 11 Visits: 43
[1] is not quite correct. We can say to server HOW TO DO - just define plan for query.
Koji Matsumura
Koji Matsumura
Old Hand
Old Hand (385 reputation)Old Hand (385 reputation)Old Hand (385 reputation)Old Hand (385 reputation)Old Hand (385 reputation)Old Hand (385 reputation)Old Hand (385 reputation)Old Hand (385 reputation)

Group: General Forum Members
Points: 385 Visits: 1918
I run same test against 1,000,000 records and got following results:

1 time
NOT IN() : 390ms
<> AND : 373ms
OUTER JOIN: 1,876ms
NOT EXISTS: 966ms

10 times
NOT IN() : 3,826ms
<> AND : 3,826ms
OUTER JOIN: 18,643ms
NOT EXISTS: 9,593ms

100 times
NOT IN() : 38,203ms
<> AND : 38,250ms
OUTER JOIN: 184,656ms
NOT EXISTS: 95,703ms

I used SQL 2000 and table variable.
It seems that the first 2 methods are the fastest.

Is everyone using SQL 2K5?

I will run the same test with actual table to see if there is any difference.

K. Matsumura
Koji Matsumura
Koji Matsumura
Old Hand
Old Hand (385 reputation)Old Hand (385 reputation)Old Hand (385 reputation)Old Hand (385 reputation)Old Hand (385 reputation)Old Hand (385 reputation)Old Hand (385 reputation)Old Hand (385 reputation)

Group: General Forum Members
Points: 385 Visits: 1918
I run the same test using actual table and got following results:

1 time
NOT IN() : 436ms
<> AND : 390ms
OUTER JOIN: 360ms
NOT EXISTS: 343ms

10 times
NOT IN() : 3,936ms
<> AND : 3,923ms
OUTER JOIN: 3,936ms
NOT EXISTS: 3,873ms

100 times
NOT IN() : 39,530ms
<> AND : 39,360ms
OUTER JOIN: 41,516ms
NOT EXISTS: 42,076ms

It still seems that the first 2 methods are the fastest.

Do we have to think differently when using SQL 2000?

Anyway, I think it is very interesting.

K. Matsumura
Ken Johnson-162971
Ken Johnson-162971
SSC Veteran
SSC Veteran (234 reputation)SSC Veteran (234 reputation)SSC Veteran (234 reputation)SSC Veteran (234 reputation)SSC Veteran (234 reputation)SSC Veteran (234 reputation)SSC Veteran (234 reputation)SSC Veteran (234 reputation)

Group: General Forum Members
Points: 234 Visits: 375

Hi K. Matsumura

Thanks for taking the time to play with the queries, that's the best way to know what will work best in your environment.

These tests were conducted using SQL Server 2000 SP4. I re-ran the tests this morning on a couple different servers to verify that the results are still similar to what they were when I wrote the article, and they are. I also ran them on SQL Server 2005 SP2 with the same trend in query times.

However, when I ran them on my workstation, today (I had only run them on servers before), the numbers worked out the way you have recorded: faster for the NOT IN() (10 seconds for 100 iterations) than for the NOT EXISTS (12.6 seconds). Something to do with the newer hardware in the workstation, perhaps (single processor, dual-core workstation vs. dual processor, single-core, server)?

KenJ


Adrian Hains
Adrian Hains
SSC-Enthusiastic
SSC-Enthusiastic (170 reputation)SSC-Enthusiastic (170 reputation)SSC-Enthusiastic (170 reputation)SSC-Enthusiastic (170 reputation)SSC-Enthusiastic (170 reputation)SSC-Enthusiastic (170 reputation)SSC-Enthusiastic (170 reputation)SSC-Enthusiastic (170 reputation)

Group: General Forum Members
Points: 170 Visits: 261
I think that execution time is only one piece of the puzzle. For my needs (back end to web server), if an SP executes in .1 seconds I'm typically in good shape, but if it has a few thousand reads that may not be acceptable. Given an acceptable query time, I would usually pick a strategy that uses the least disk and cpu resources.
Roger L Reid
Roger L Reid
SSC Veteran
SSC Veteran (202 reputation)SSC Veteran (202 reputation)SSC Veteran (202 reputation)SSC Veteran (202 reputation)SSC Veteran (202 reputation)SSC Veteran (202 reputation)SSC Veteran (202 reputation)SSC Veteran (202 reputation)

Group: General Forum Members
Points: 202 Visits: 166

I've not checked this on SQL Server 2005, but the NUMBER of terms in the NOT IN or the NUMBER of <> , in combination with the distribution statistics, makes a huge difference in Sybase, and I suspect it could in SQL 2005.

Looking for "not in" is expensive; how expensive depends on how many you have. On Sybase, if there were many different values and the distribution was pretty even, there was a point - about 4 or 5, I recall, where the plan changed from using index to table scan. It takes longer to optimize a bunch of NOTs, and at some point (to avoid letting the optimizer use more time than the search!) it bails out and says "if you have that many, a table scan is probably fastest anyway (and besides, if you wrote than many NOTs performance was probably a secondary question).

In an app that built the statement on the fly from t to X number of items to be NOT IN, performance varied wildly depending on how many the user entered. 3 was fast. 15 was not. (These were wide rows with 100s of millions of records).

It was faster to break it into a single batch of smaller queries. Instead of doing

select foo from bar where x not in (1, 20, 43, 99, 110, 2000, 3201, 7098, 7099)

go

it was faster to do

select foo from bar where x < 100 and x not in (1, 20, 43, 99)

select foo from bar where x >= 100 and x < 5000 and and x not in (110, 2000, 3201)

select foo from bar where x >= 5000 and and x not in (7098, 7099)

go (a single batch)

Of course, this is the kind of silly performance hack that will bite your butt when you port it to a new RDB or a new version that better optimizes the original query.



Roger L Reid
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search