
Which is faster? IN (list) or IN (SELECT FROM Temp)


If you’ve done much with IN (list) then you’ve realized that it basically translates out to

col=val1 OR col=val2 OR ....
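For example (using a hypothetical SomeTable, not a table from this post), these two queries are logically equivalent:

-- IN with a literal list...
SELECT * FROM SomeTable WHERE col IN (1, 2, 3);
-- ...behaves like a chain of OR comparisons
SELECT * FROM SomeTable WHERE col = 1 OR col = 2 OR col = 3;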

You’ve probably also realized that you can throw all the values into a temp table and do this

SELECT *
FROM tablename
WHERE col IN (SELECT col FROM #temp);

But which is faster? Well, let's find out. First I'm going to create a table of 1,000,000 rows to pull against. That way I'll hopefully get some useful times, not one query that takes 10ms and the other 9ms.

tl;dr: It was pretty inconclusive even at large numbers of values.

-- Set up the lookup table
USE Test;
GO
WITH x AS (
    SELECT TOP (100)
        CAST(ROW_NUMBER() OVER (ORDER BY val.[name]) AS INT) AS x
    FROM [master]..spt_values val),
Nums AS (
    SELECT
        CAST(ROW_NUMBER() OVER (ORDER BY x.x) AS INT) AS Num
    FROM x
    CROSS JOIN x y
    CROSS JOIN x z)
SELECT
    REPLICATE('a',ABS(CHECKSUM(NEWID()) % 1000)) AS col1,
    REPLICATE('a',ABS(CHECKSUM(NEWID()) % 1000)) AS col2,
    REPLICATE('a',ABS(CHECKSUM(NEWID()) % 1000)) AS col3,
    REPLICATE('a',ABS(CHECKSUM(NEWID()) % 1000)) AS col4,
    DATEADD(minute,ABS(CHECKSUM(NEWID()) % 10000),'1/1/2000') AS DateCol
INTO ListTable
FROM Nums;
GO
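As a quick sanity check (my addition, not part of the original script), you can confirm the setup produced the expected row count before running any tests:

-- Should return 1,000,000 rows (100 x 100 x 100 from the cross joins)
SELECT COUNT(*) AS RowCnt FROM ListTable;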

I’m going to do some tests of various sizes. The first one will be a small test of just 10 values.

Small test (10 values)

-- Get a small list of values to look up
SELECT TOP 10 DateCol INTO #ListTemp
FROM ListTable
ORDER BY newid();
DECLARE @ListVar nvarchar(1000)
SELECT @ListVar = 
STUFF((SELECT ', ' + QUOTENAME(CONVERT(varchar(30), DateCol, 121),'''')
        FROM #ListTemp
        FOR XML PATH(''),TYPE).value('.','VARCHAR(MAX)')
    , 1, 2, '');
DECLARE @sql varchar(max)
SET @sql = 
'SELECT * FROM ListTable
WHERE DateCol IN ('+@ListVar+');'
PRINT @sql;
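As an aside (my addition, and assuming you're on SQL Server 2017 or later), STRING_AGG builds the same comma-separated list with a lot less ceremony than the STUFF/FOR XML PATH trick:

DECLARE @ListVar2 nvarchar(max);
-- Same output as the STUFF/FOR XML version above
SELECT @ListVar2 = STRING_AGG(QUOTENAME(CONVERT(varchar(30), DateCol, 121), ''''), ', ')
FROM #ListTemp;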

Now using the query printed out:

SET STATISTICS TIME, IO ON
SELECT * FROM ListTable
WHERE DateCol IN ('2000-01-02 01:28:00.000', '2000-01-06 07:05:00.000', '2000-01-05 20:24:00.000', '2000-01-02 18:15:00.000', '2000-01-02 08:12:00.000', '2000-01-07 03:48:00.000', '2000-01-07 18:07:00.000', '2000-01-06 03:31:00.000', '2000-01-03 19:55:00.000', '2000-01-04 22:13:00.000');
SELECT * FROM ListTable
WHERE DateCol IN (SELECT DateCol FROM #ListTemp);

I ran this twice and discarded the first run because it was loading the data into memory. Yes, I realize that could be useful information, but I’m skipping that part of it for today.
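If you did want to measure the cold-cache case on a test box (my addition, not something the post does), you could flush the buffer pool between runs:

-- Test/dev systems only: write dirty pages, then drop clean pages from memory
CHECKPOINT;
DBCC DROPCLEANBUFFERS;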

SQL Server parse and compile time: 
   CPU time = 12 ms, elapsed time = 12 ms.
 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
(1049 row(s) affected)
Table 'ListTable'. Scan count 3, logical reads 289125, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
   CPU time = 844 ms,  elapsed time = 984 ms.
(1049 row(s) affected)
Table '#ListTemp___________________________________________________________________________________________________________000000000004'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'ListTable'. Scan count 3, logical reads 289125, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
   CPU time = 484 ms,  elapsed time = 472 ms.

Ok, so the logical reads are about the same. Slightly more for the IN (SELECT), actually, but only by a fraction because of the read on the temp table and the work tables. The time, however, was almost exactly half using the temp table. Now, this was without any indexes, so next I'm going to run the same queries again, this time with a non-clustered index on ListTable.

CREATE INDEX ix_ListTable ON ListTable(DateCol);

This is a pretty simple index. There's no clustered index on the table, and this isn't a covering index either, but honestly, it's a SELECT * so a covering index isn't really practical anyway.
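For reference (my addition, not something the post creates), a covering index here would have to drag every column along in INCLUDE, which is exactly why it isn't worth building for a SELECT *:

-- Hypothetical covering index: covers SELECT * only by including every column
CREATE INDEX ix_ListTable_covering ON ListTable(DateCol)
    INCLUDE (col1, col2, col3, col4);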

This time the results are:

SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.
 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 3 ms.
(1049 row(s) affected)
Table 'ListTable'. Scan count 10, logical reads 1091, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 264 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 6 ms.
(1049 row(s) affected)
Table 'ListTable'. Scan count 10, logical reads 1081, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#ListTemp___________________________________________________________________________________________________________000000000004'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
   CPU time = 47 ms,  elapsed time = 208 ms.

Oddly enough there were slightly fewer reads for the IN (SELECT) this time, and it's only slightly faster. I also tried adding an index on the temp table, but with no significant change.
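The post doesn't show that temp table index, but it would presumably look something like this:

-- Hypothetical index on the temp table's lookup column
CREATE CLUSTERED INDEX ix_ListTemp ON #ListTemp(DateCol);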

Summary (10 values)

With a small test (only 10 values) the IN (SELECT) was faster, with or without an index.

Large test (1000 values)

I'm going to use the exact same process as above, but I'm not going to post the actual query because it would be too long for a blog post. Here is the code to generate it, though, if you want to play along. I did have to make a few changes: the TOP (10) became a TOP (1000), the nvarchar(1000) became a varchar(max), and the PRINT became a SELECT, because PRINT truncates long strings while SELECT shows you the whole thing.

SELECT TOP 1000 DateCol INTO #ListTemp
FROM ListTable
ORDER BY newid();
DECLARE @ListVar varchar(MAX)
SELECT @ListVar = 
STUFF((SELECT ', ' + QUOTENAME(CONVERT(varchar(30), DateCol, 121),'''')
        FROM #ListTemp
        FOR XML PATH(''),TYPE).value('.','VARCHAR(MAX)')
    , 1, 2, '');
DECLARE @sql varchar(max)
SET @sql = 
'SELECT * FROM ListTable
WHERE DateCol IN ('+@ListVar+');'
SELECT @sql;

Without the index (I'm skipping the IO statistics this time because they didn't tell us much last time):
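The post doesn't show this step, but to test the no-index case after the small test you'd presumably drop the index created earlier:

-- Remove the index from the small test so this run goes back to scanning the heap
DROP INDEX ix_ListTable ON ListTable;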

(96517 row(s) affected)
 SQL Server Execution Times:
   CPU time = 1704 ms,  elapsed time = 5472 ms.
(96517 row(s) affected)
 SQL Server Execution Times:
   CPU time = 1250 ms,  elapsed time = 6031 ms.

Hmm, faster for the IN (list) this time. Let’s try it with an index on ListTable.

SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.
 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
(95297 row(s) affected)
 SQL Server Execution Times:
   CPU time = 1375 ms,  elapsed time = 5047 ms.
(95297 row(s) affected)
 SQL Server Execution Times:
   CPU time = 1250 ms,  elapsed time = 5787 ms.

Still a bit faster for the IN (list). And again, I tried adding an index on the temp table, with no major difference.

Summary (1000 values)

This time the IN (list) was faster both with and without indexes, though by less than a second.

Summary (10000 values)

I also ran a test with 10,000 values. In this case the IN (list) was still faster without an index, but once I put an index on the table the IN (SELECT) became faster, and with indexes on both the table and the temp table the IN (SELECT) was a bit faster still.

Overall summary

My results were pretty inconclusive. I was really surprised, since I truly expected the IN (SELECT) to be faster, particularly with an index. But in the end the timings weren't significantly different (a few seconds here or there), and I can't be certain I didn't do something wrong. So for most cases I'm just going to use whichever version is more convenient. For cases where even a fraction of a second matters (hundreds of runs a second, anyone?) I'll have to test to see which is faster in that particular situation.
