Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

To TOP or not to TOP an EXISTS

Earlier this year I had a look at a query pattern that I often see on forums and in production code, that of the Distinct within an IN subquery. Today I’m going to look at a similar patters, that being the use of TOP 1 within an EXISTS subquery.

Three tests. First a straightforward exists with no correlation (no where clause linking it to an outer query). Second, an exists with a complex query (one with a non-sargable where clause and a group by and having). Third an exists subquery correlated to the outer query.

Table structures are nice and simple, in fact, for ease I’m going to use the same tables as I did back on the exists, in and inner join tests. Code to create and populate the tables it attached to the end of the post.

First up, a simple exists query, in an IF, just to be different.

IF EXISTS (SELECT 1 FROM PrimaryTable_Medium)
PRINT 'Exists'

IF EXISTS (SELECT TOP (1) 1 FROM PrimaryTable_Medium)
PRINT 'Exists too'

For a benchmark, a SELECT 1 FROM PrimaryTable_Medium has the following IO characteristics

Table ‘PrimaryTable_Medium’. Scan count 1, logical reads 89, physical reads 0.

SQL Server Execution Times:
CPU time = 15 ms,  elapsed time = 510 ms.

Ignore the elapsed time, that’s likely mostly from displaying the records. I’m going to focus mostly on the CPU and IO.

Execution plans of the two exists variations are absolutely identical.

TopExists1

The index operators are scans because there is no way they could be anything else, there’s no predicate so a seek is not possible. That said, it’s not a full index scan. The properties of the Index Scan show 1 row only (actual and estimated). So SQL did not read the entire index, just enough to evaluate the EXISTS, and that’s what it did in both cases. IO stats confirm that.

Table ‘PrimaryTable_Medium’. Scan count 1, logical reads 2, physical reads 0.

SQL Server Execution Times:
CPU time = 0 ms,  elapsed time = 0 ms.
Exists

Table ‘PrimaryTable_Medium’. Scan count 1, logical reads 2, physical reads 0.

SQL Server Execution Times:
CPU time = 0 ms,  elapsed time = 0 ms.
Exists too

Two reads in each case and a CPU time so low it’s immeasurable. A full scan of the index takes 89 reads (as shown earlier) so it should be clear that SQL read a minimal amount of data, both when the TOP was specified and when it wasn’t.

On to a more complex test. Again, using EXISTS within an IF

IF EXISTS (
SELECT 1 FROM PrimaryTable_Medium
WHERE RIGHT(SomeColumn,2) > 'HH'
GROUP BY LEFT(SomeColumn,1)
HAVING COUNT(*) > 1
)
PRINT 'Exists Again'

IF EXISTS (
SELECT TOP (1) 1 FROM PrimaryTable_Medium
WHERE RIGHT(SomeColumn,2) > 'HH'
GROUP BY LEFT(SomeColumn,1)
HAVING COUNT(*) > 1
)
PRINT 'Still Exists'

If I run just the SELECT 1 alone, 10 rows are returned.

Execution plans are a lot more complex, pretty much to be expected. They’re still identical, as are the IOs and CPU time.

TopExists2

Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0.
Table ‘PrimaryTable_Medium’. Scan count 1, logical reads 89, physical reads 0..

SQL Server Execution Times:
CPU time = 31 ms,  elapsed time = 27 ms.
Exists Again

Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0.
Table ‘PrimaryTable_Medium’. Scan count 1, logical reads 89, physical reads 0.

SQL Server Execution Times:
CPU time = 31 ms,  elapsed time = 45 ms.
Still Exists

This time the index scan was a scan of the entire index (89 pages). Because of the aggregation and the having, SQL couldn’t abort the scan once it had what it needed. All rows needed to be returned so that the aggregation and subsequent filter could be done.

One last test, with an EXISTS subquery.

I’m going to create a secondary table that has one 20% of the values for SomeColumn in PrimaryTable_Medium, but has each one repeated 500 times for a total of 615000 rows.

SELECT  ID ,
SomeColumn
FROM dbo.PrimaryTable_Medium pm
WHERE EXISTS (SELECT 1 FROM dbo.Secondary s WHERE pm.SomeColumn = s.SomeColumn)

SELECT  ID ,
SomeColumn
FROM dbo.PrimaryTable_Medium pm
WHERE EXISTS (SELECT TOP(1) 1 FROM dbo.Secondary s WHERE pm.SomeColumn = s.SomeColumn)

Again, the execution plans are absolutely identical

TopExists3

So, for that matter, are the execution statistics

Table ‘PrimaryTable_Medium’. Scan count 1, logical reads 22, physical reads 0.
Table ‘Secondary’. Scan count 1, logical reads 1605, physical reads 0.

SQL Server Execution Times:
CPU time = 171 ms,  elapsed time = 443 ms.

Table ‘PrimaryTable_Medium’. Scan count 1, logical reads 22, physical reads 0.
Table ‘Secondary’. Scan count 1, logical reads 1605, physical reads 0.

SQL Server Execution Times:
CPU time = 172 ms,  elapsed time = 387 ms.

So, in conclusion, is there any point in adding a TOP to an exists subquery? Does it persuade SQL to return only the minimum information needed to satisfy the Exists?

No to both. The Exists operator itself tries to retrieve jus the absolute minimum of information, so the addition of TOP 1 does nothing except add 5 characters to the query size.

Comments

Posted by Steve Jones on 6 April 2011

Interesting test. I would have thought it helped, but apparently not.

Posted by GilaMonster on 11 April 2011

It goes back to the 'say what you want, not how to do it' behaviour of a declarative language. Aka 'don't tell SQL how to do it's work'

Leave a Comment

Please register or log in to leave a comment.