EXISTS queries that can't be written any other way

Question

Post reply

EXISTS queries that can't be written any other way

peter 82125

SSCommitted

Points: 1612
More actions
October 21, 2015 at 1:26 pm

#312435

Hello. All examples of the EXIST operator I know of use correlated subqueries that could be also written as a join with no performance gain or loss. Can anyone give me an example of a subquery that can only be written using EXISTS?
Thank you!

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply

Michael L John One Orange Chip Points: 27079 More actions · Answer 1

The simple example is if the subquery returns more than one row for the joining condition.

If you use a join, the results contain extra rows.

EXISTS will not create the extra rows.

You can certainly eliminate the extra rows with a GROUP BY or DISTINCT, but that almost certainly will cause a performance hit.

Michael L John
If you assassinate a DBA, would you pull a trigger?
To properly post on a forum:
http://www.sqlservercentral.com/articles/61537/

peter 82125 SSCommitted Points: 1612 More actions · Answer 2

Thanks for the reply. Sounds perfectly reasonable, except that when I compare the 2 queries, one written with the EXISTS operator and a correlated subquery and the other as a join, the execution plans and costs are identical.

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 3

http://sqlinthewild.co.za/index.php/2010/01/12/in-vs-inner-join/

http://sqlinthewild.co.za/index.php/2009/08/17/exists-vs-in/

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

drew.allen SSC Guru Points: 77014 More actions · Answer 4

I think that it's always possible to rewrite an EXISTS query with an INNER JOIN or a CROSS APPLY.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

Michael L John One Orange Chip Points: 27079 More actions · Answer 5

Can you post some sample data and your queries?

Michael L John
If you assassinate a DBA, would you pull a trigger?
To properly post on a forum:
http://www.sqlservercentral.com/articles/61537/

peter 82125 SSCommitted Points: 1612 More actions · Answer 6

Thanks, Gail and Drew. Gail, the links don't directly discuss EXISTS vs join, but it looks like one can conclude what Drew is saying that there is really no difference in performance, is that what you would say, too? Unless, perhaps the data set I'm using is not big enough to produce a noticeable difference in performance?

peter 82125 SSCommitted Points: 1612 More actions · Answer 7

Michael, I'm using the AdventureWorks database and the following:

USE AdventureWorks

SELECT Name

FROM Production.Product p

WHERE EXISTS

(SELECT *

FROM Production.ProductSubcategory

WHERE ProductSubcategoryID = p.ProductSubcategoryID

AND Name = 'Wheels')

SELECT p.Name

FROM Production.Product p

JOIN Production.ProductSubcategory psc

ON psc.ProductSubcategoryID = p.ProductSubcategoryID

WHERE psc.Name = 'Wheels'

drew.allen SSC Guru Points: 77014 More actions · Answer 8

peter 82125 (10/21/2015)
Thanks, Gail and Drew. Gail, the links don't directly discuss EXISTS vs join, but it looks like one can conclude what Drew is saying that there is really no difference in performance, is that what you would say, too? Unless, perhaps the data set I'm using is not big enough to produce a noticeable difference in performance?

Note, I did not say that there would be no difference in performance. I just said that they could be rewritten without using EXISTS. I've made assumptions about relative performance in the past and have learned to always test if I think that there might be a better performing alternative. I make no claims about the performance of tenuous queries.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

peter 82125 SSCommitted Points: 1612 More actions · Answer 9

peter 82125

SSCommitted

Points: 1612

October 21, 2015 at 5:35 pm

#1834797

Duly noted.

Jacob Wilkins One Orange Chip Points: 27976 More actions · Answer 10

It may be rather academic, but there is at least one case where EXISTS is nicer than a JOIN.

The potential (emphasis on potential) performance advantage that EXISTS has is the use of a semi join, which allows the subquery to effectively short-circuit when a match is found (on that note, the query plans you're seeing are likely not actually identical; while the physical join type might well be the same in both cases, the logical join type for the EXISTS query probably shows as a left semi join instead of an inner join).

If there are very few matches, then you may see very little difference in performance, since there won't be much use of short-circuiting when a match is found. If there are many, many matches, then you'll start to see a bigger difference.

Also, if the criterion for matching rows is an equality condition, then once you have enough rows, both the JOIN and the EXISTS will likely end up using hash joins and just scan each table once, again minimizing the difference.

If the criterion for matching rows is not an equality condition and there are many matches, then you can start to see a big difference between EXISTS and the JOIN. Going back to what I said about this being somewhat academic, it's true that matching rows on a condition other than equality is rather uncommon compared to matching rows on equality. Still, this is a case where EXISTS can significantly outperform a JOIN.

Below is some code to illustrate that (takes about a minute to run on my machine; if you have a beastly machine, it might finish much more quickly:-)):

--Of course don't run the FREEPROCCACHE on your production server.

--I'm just running it so my query stats come back more cleanly

--on my lab server

DBCC FREEPROCCACHE

CREATE TABLE #BlackList (Pattern varchar(30))

CREATE TABLE #Phrases (Phrase char(36))

--Populate a bunch of "phrases"

INSERT INTO #Phrases

SELECT TOP 5000 CAST(NEWID() AS char(36))

--Construct a bunch of rows

FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t1(n)

CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t2(n)

CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t3(n)

CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t4(n)

CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t5(n)

CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t6(n)

CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t7(n)

CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t8(n)

CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t9(n)

--Populate a bunch of blacklisted patterns based on the phrases

--we just created

INSERT INTO #BlackList

SELECT TOP 1000 '%'+SUBSTRING(Phrase,4,5)+'%'

FROM #Phrases

INSERT INTO #BlackList

SELECT TOP 1000 '%'+SUBSTRING(Phrase,15,6)+'%'

FROM #Phrases

INSERT INTO #BlackList

SELECT TOP 1000 '%'+SUBSTRING(Phrase,23,3)+'%'

FROM #Phrases

--I like to use EXEC just to make sure the query text doesn't change

--just because I include a little extra white space or some such

--nonsense. That makes sure I keep the stats under the same entry

--in sys.dm_exec_query_stats.

--I also like to prefix with "--QueryTest:" so I can find the query

--stats easily and don't have to run SET STATISTICS or other

--observer-effect-laden things

--I'm also using MAXDOP 1 just to avoid any quirks from parallelism

--Ok, let's find out how many of our phrases run afoul of the blacklist

EXEC('--QueryTest: INNER JOIN

SELECT COUNT(Phrase)

FROM #Phrases Phr

INNER JOIN #Blacklist Pat

ON Phr.Phrase LIKE Pat.Pattern

OPTION (MAXDOP 1)')

EXEC('--QueryTest: EXISTS

SELECT COUNT(*) FROM #Phrases Phr

WHERE EXISTS (SELECT NULL FROM #Blacklist Pat WHERE Phr.Phrase LIKE Pat.Pattern)

OPTION (MAXDOP 1)')

--The INNER JOIN above gives us the wrong counts because some phrases

--match multiple patterns, so we have to add a DISTINCT to remedy that

EXEC('--QueryTest: DISTINCT/INNER JOIN

SELECT COUNT(DISTINCT Phrase)

FROM #Phrases Phr

INNER JOIN #Blacklist Pat

ON Phr.Phrase LIKE Pat.Pattern

OPTION (MAXDOP 1)')

--CROSS APPLY with TOP 1 is another option that can perform reasonably

--under the right circumstances. Here it performs similarly to EXISTS.

EXEC('--QueryTest: CROSS APPLY

SELECT COUNT(Phrase)

FROM #Phrases Phr

CROSS APPLY (SELECT TOP 1 Pattern FROM #Blacklist Pat WHERE Phr.Phrase LIKE Pat.Pattern) X

OPTION (MAXDOP 1)')

SELECT

qs.last_logical_reads,

qs.last_worker_time/1000 AS CPU_ms,

qs.last_elapsed_time/1000 AS Duration_ms,

qs.last_execution_time, qp.query_plan, qt.text

FROM sys.dm_exec_query_stats qs

CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp

CROSS APPLY sys.dm_exec_sql_text(qs.plan_handle) qt

WHERE qt.text LIKE '--QueryTest%'

ORDER BY last_execution_time desc

DROP TABLE #Blacklist, #Phrases

You can play around with the ratio between Phrases and matching Blacklist patterns to see how that affects the performance difference.

Hopefully this helps! (Also, full disclosure, it's getting late and I'm severely under-caffeinated, so if something I've said seems ridiculous, it probably is, and I apologize in advance for that :-))

Cheers!

Felix Pamittan Mr or Mrs. 500 Points: 535 More actions · Answer 11

Here is a related blog by Aaron Bertrand: sqlperformance.com/2012/12/t-sql-queries/left-anti-semi-join

The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url]

Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs[/url]

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 12

peter 82125 (10/21/2015)
Thanks, Gail and Drew. Gail, the links don't directly discuss EXISTS vs join, but it looks like one can conclude what Drew is saying that there is really no difference in performance, is that what you would say, too? Unless, perhaps the data set I'm using is not big enough to produce a noticeable difference in performance?

There's a roundup that I didn't post the link for which compares all three. Still, EXISTS compared to IN and IN to join, you can extrapolate the result easily enough

There is a performance difference, but it's minimal. It's not enough that I'd convert a join to EXISTS for performance reasons, but I will use EXISTS when I'm looking for whether there are rows or not and joins when I need to actually join.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

cyp901 Mr or Mrs. 500 Points: 539 More actions · Answer 13

EXISTS is also safer than IN when you're dealing with a nullable column.

peter 82125 SSCommitted Points: 1612 More actions · Answer 14

Thank you all so much for your informative replies! Jacob, you rock, caffeine or none! Really appreciate it 🙂