Blog Post

In, Exists and join – a roundup

,

Over the last several months I’ve had a look at IN, Exists, Join and their opposites to see how they perform and whether there’s any truth in the advice that is often seen on forums and blogs advocating replacing one with the other.

Previous parts of this series can be found:

In this roundup post, I’m going to do multiple tests on the 6 query forms, with different numbers of rows, indexes, no indexes and, for the negative forms (NOT IN, NOT EXISTS), nullable and non-nullable join columns.

In the individual tests, I used 250000 rows in the first table and around 3000 rows in the secondary table. In this roundup, I’m going to use 3 different row counts, 1000000 rows, 100000 rows and 5000 rows. That should give a reasonable idea for performance at various table sizes. (Not much point in going smaller than 5000 rows. Everything’s fast on 100 rows)

Some notes on the tests.

  • The version of SQL is SQL Server 2008 SP1 x64 Developer Edition.
  • The tests were run on a laptop. Core-2 Duo, 3 GB memory. SQL limited to 1 processor, so no parallelism possible.
  • Each query will be run 10 times, reads, cpu and duration measured by profiler and averaged.
  • Each query will be run once before the tests start to ensure that the data is in cache and the execution plans are generated and cached.
  • Reproduction scripts will be available for download.

Exists vs. In vs. Inner Join

First, no indexes on the join columns

Table SizeOperatorCPUReads Duration
LargeIN1293145859649
Exists1260145859573
Inner Join1302145859716
MediumIN59747538
Exists78747574
Inner Join69747523
SmallIN74165
Exists34191
Inner Join44165

Now with indexes on the join columns

Table SizeOperatorCPUReads Duration
LargeIN97317609707
Exists95617609483
Inner Join117317609539
MediumIN43100516
Exists53100548
Inner Join59100498
SmallIN3964
Exists1980
Inner Join4967

Not Exists vs. Not In vs. Left Outer Join … Is Null

First test with the columns join columns nullable, no indexes

Table SizeOperatorCPUReads Duration
LargeNOT IN319420146223251
NOT Exists82014585837
Outer Join962145851025
MediumNOT IN174100765217
NOT Exists54747121
Outer Join5374779
SmallNOT IN12504313
NOT Exists4416
Outer Join3415

Then with join columns nullable with indexes

Table SizeOperatorCPUReads Duration
LargeNOT IN267720017622726
NOT Exists5691760586
Outer Join94917601029
MediumNOT IN137100102164
NOT Exists40100104
Outer Join4810069
SmallNOT IN11501112
NOT Exists394
Outer Join696

Now, let’s make the join columns not nullable. Again, no indexes to start with.

Table SizeOperatorCPUReads Duration
LargeNOT IN74114585753
NOT Exists78414585790
Outer Join88414585937
MediumNOT IN43747103
NOT Exists49747120
Outer Join5374774
SmallNOT IN4414
NOT Exists1415
Outer Join1415

and finally, join columns not nullable, with indexes

Table SizeOperatorCPUReads Duration
LargeNOT IN5781382588
NOT Exists5851382597
Outer Join95313821006
MediumNOT IN378079
NOT Exists348079
Outer Join398084
SmallNOT IN384
NOT Exists185
Outer Join485

These results seem to pretty much confirm the earlier conclusions.

Exists and IN perform much the same, whether there are indexes on the join column or not. When there are indexes on the join columns, the INNER JOIN is slightly (very slightly) slower, which is more noticeable on the large tables, much less on the medium or small ones. (Note I’m mostly looking at CPU time, as the duration is also affected by sending of results to client, in this case, lots and lots of results)

When it comes to NOT In and NOT Exists they perform much the same when the columns involved are not nullable. If the columns are nullable, Not In is significantly slower because it has a different behaviour when nulls are present.

The join is slightly slower than Not Exists (or Not In on non-nullable columns), again only noticeable on the large table, probably because the optimiser has to do a full join with a secondary filter rather than the anti-semi join that it can use for Not Exists and Not In.

My conclusion from earlier posts stands. If all you are doing is looking for matching or non-matching rows and you don’t need any columns from the second table, use IN or Exists (or their negations), as appropriate for the situation. Only when you need columns from the second table should Join be used.

I think (and hope) that this adequately concludes the discussion on the Exists and In and joins, both behaviour and performance.

Reproduction scripts

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating