SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Intersect, Except, Union, All and Any


Intersect, Except, Union, All and Any

Author
Message
pvnidheesh
pvnidheesh
Forum Newbie
Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)

Group: General Forum Members
Points: 6 Visits: 74
When INNER JOIN returns duplicate rows then INTERSECT returns distinct values only. Thats why the difference in execution time occurs.
johan.lindell
johan.lindell
Valued Member
Valued Member (50 reputation)Valued Member (50 reputation)Valued Member (50 reputation)Valued Member (50 reputation)Valued Member (50 reputation)Valued Member (50 reputation)Valued Member (50 reputation)Valued Member (50 reputation)

Group: General Forum Members
Points: 50 Visits: 123
Good comparison, David, thanks.

From a performance perspective, using NOT IN is "always" slower than using EXISTS. If you try running the below queries


-- INTERSECT
SELECT CustomerID
FROM Sales.Customer
WHERE TerritoryID=10
AND NOT EXISTS (
SELECT CustomerID
FROM Sales.SalesOrderHeader
WHERE OrderDate>='2004-07-01'
AND Sales.Customer.CustomerID = Sales.SalesOrderHeader.CustomerID
)

-- EXCEPT
SELECT CustomerID
FROM Sales.Customer
WHERE TerritoryID=10
AND NOT EXISTS (
SELECT CustomerID
FROM Sales.SalesOrderHeader
WHERE OrderDate>='2004-07-01'
AND Sales.Customer.CustomerID = Sales.SalesOrderHeader.CustomerID
)




instead of the NOT IN, you'll get a reduction of query time by 75 % (on my computer). The bigger the tables and the "result" in the NOT IN-query is the bigger the difference. I always try to avoid NOT IN in a case like this.

/Johan
jim.sinclair
jim.sinclair
Forum Newbie
Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)

Group: General Forum Members
Points: 3 Visits: 15
I like using the EXCEPT clause because I find it makes the query easier to read. If you have any familiarity with Venn diagrams you can simply focus on understanding each of the two select statements independently and then apply set theory to understand the final result. Nice to know your experiment shows that performance is equivalent
noeld
noeld
SSCrazy Eights
SSCrazy Eights (9.5K reputation)SSCrazy Eights (9.5K reputation)SSCrazy Eights (9.5K reputation)SSCrazy Eights (9.5K reputation)SSCrazy Eights (9.5K reputation)SSCrazy Eights (9.5K reputation)SSCrazy Eights (9.5K reputation)SSCrazy Eights (9.5K reputation)

Group: General Forum Members
Points: 9492 Visits: 2048
johan.lindell (5/24/2010)
Good comparison, David, thanks.

From a performance perspective, using NOT IN is "always" slower than using EXISTS. If you try running the below queries


-- INTERSECT
SELECT CustomerID
FROM Sales.Customer
WHERE TerritoryID=10
AND NOT EXISTS (
SELECT CustomerID
FROM Sales.SalesOrderHeader
WHERE OrderDate>='2004-07-01'
AND Sales.Customer.CustomerID = Sales.SalesOrderHeader.CustomerID
)

-- EXCEPT
SELECT CustomerID
FROM Sales.Customer
WHERE TerritoryID=10
AND NOT EXISTS (
SELECT CustomerID
FROM Sales.SalesOrderHeader
WHERE OrderDate>='2004-07-01'
AND Sales.Customer.CustomerID = Sales.SalesOrderHeader.CustomerID
)




instead of the NOT IN, you'll get a reduction of query time by 75 % (on my computer). The bigger the tables and the "result" in the NOT IN-query is the bigger the difference. I always try to avoid NOT IN in a case like this.

/Johan


I disagree. NOT IN *could* give you the exact same execution plan given the right indexes and stats. The optimizer in many cases knows best and chooses one plan or another depending on the data!


* Noel
Adam Haines
Adam Haines
Hall of Fame
Hall of Fame (3.2K reputation)Hall of Fame (3.2K reputation)Hall of Fame (3.2K reputation)Hall of Fame (3.2K reputation)Hall of Fame (3.2K reputation)Hall of Fame (3.2K reputation)Hall of Fame (3.2K reputation)Hall of Fame (3.2K reputation)

Group: General Forum Members
Points: 3162 Visits: 3135
noeld (5/24/2010)
johan.lindell (5/24/2010)
Good comparison, David, thanks.

From a performance perspective, using NOT IN is "always" slower than using EXISTS. If you try running the below queries


-- INTERSECT
SELECT CustomerID
FROM Sales.Customer
WHERE TerritoryID=10
AND NOT EXISTS (
SELECT CustomerID
FROM Sales.SalesOrderHeader
WHERE OrderDate>='2004-07-01'
AND Sales.Customer.CustomerID = Sales.SalesOrderHeader.CustomerID
)

-- EXCEPT
SELECT CustomerID
FROM Sales.Customer
WHERE TerritoryID=10
AND NOT EXISTS (
SELECT CustomerID
FROM Sales.SalesOrderHeader
WHERE OrderDate>='2004-07-01'
AND Sales.Customer.CustomerID = Sales.SalesOrderHeader.CustomerID
)




instead of the NOT IN, you'll get a reduction of query time by 75 % (on my computer). The bigger the tables and the "result" in the NOT IN-query is the bigger the difference. I always try to avoid NOT IN in a case like this.

/Johan


I disagree. NOT IN *could* give you the exact same execution plan given the right indexes and stats. The optimizer in many cases knows best and chooses one plan or another depending on the data!


+1



My blog: http://jahaines.blogspot.com
johan.lindell
johan.lindell
Valued Member
Valued Member (50 reputation)Valued Member (50 reputation)Valued Member (50 reputation)Valued Member (50 reputation)Valued Member (50 reputation)Valued Member (50 reputation)Valued Member (50 reputation)Valued Member (50 reputation)

Group: General Forum Members
Points: 50 Visits: 123
Sorry, it was a bit drastic to say it always gives better performance, I should have said normally gives same performance or better. In the example case of the article I got a 75 % reduction in query time. In my experience the EXISTS scales better than NOT IN when there are many rows in the NOT IN / EXISTS table. For production environments EXISTS is a safer solution than NOT IN. In my opinion. :-)

Johan
pitvax
pitvax
Old Hand
Old Hand (321 reputation)Old Hand (321 reputation)Old Hand (321 reputation)Old Hand (321 reputation)Old Hand (321 reputation)Old Hand (321 reputation)Old Hand (321 reputation)Old Hand (321 reputation)

Group: General Forum Members
Points: 321 Visits: 155
Dear David,
Thanks for detailed explanation – I use your article as reference for all these topics. Unfortunately unlike your conclusion for EXPECT case I found situation where NOT IN gives dramatically worst performance:
I try to compare local and remote table. For both cases EXCEPT and LEFT JOIN it gives roughly the same performance. See here:
[EXCEPT]

SELECT ac.Article_Code
FROM relAC ac
EXCEPT
SELECT Code
FROM [SRV-DCI-PRAHA].[DCI].[dciowner].Products




Local table: Scanning a clustered index, entirely or only a range.
Remote Table: Send a SQL query to another than the current SQL Server.
Remote Table: Compute new values from existing values in a row.
Both: Match rows from two suitably sorted input tables exploiting their sort order.

Cost = 48,5442

[LEFT JOIN]

SELECT ac.Article_Code
FROM relAC AS ac
LEFT JOIN [SRV-DCI-PRAHA].[DCI].[dciowner].Products AS pr
ON ac.Article_Code = pr.Code
WHERE pr.Code IS NULL




Local table: Scanning a clustered index, entirely or only a range.
Remote Table: Send a SQL query to another than the current SQL Server.
Remote Table: Compute new values from existing values in a row.
Both: Match rows from two suitably sorted input tables exploiting their sort order
Both: Restricting the set of rows based on a predicate.

Cost = 48,6391

[NOT IN]
Unfortunately using of NOT IN generates local temporary table for better performance for rewinds and generate Nested Loops for each row of local table for comparison of each row in temporary table. These loops dramatically impair performance of actual query. See Execution Plan here:

SELECT ac.Article_Code
FROM relAC ac
WHERE ac.Article_Code NOT IN (
SELECT Code
FROM [SRV-DCI-PRAHA].[DCI].[dciowner].Products




Local table: Scanning a clustered index, entirely or only a range.
Remote Table: Send a SQL query to another than the current SQL Server.
Remote Table: Compute new values from existing values in a row.

Remote Table: Stores the data from the input into a temporary table in order to optimize rewinds.
Both: For each row in the top (outer) input, scan the bottom (inner) input, and output matching rows.

Cost = 77826,3!!
Petr

__________________________________
Standing on the shoulders of giants ...
David.Poole
David.Poole
SSCertifiable
SSCertifiable (7.4K reputation)SSCertifiable (7.4K reputation)SSCertifiable (7.4K reputation)SSCertifiable (7.4K reputation)SSCertifiable (7.4K reputation)SSCertifiable (7.4K reputation)SSCertifiable (7.4K reputation)SSCertifiable (7.4K reputation)

Group: General Forum Members
Points: 7369 Visits: 3279
Thanks for the feedback. I haven't tried it with a linked server query.

Where possible I base my articles and experiments on the sample databases Adventureworks, Northwind or Pubs simply because they provide a consistent base for people to peer review what I am doing.

One of the most important scientific practises is to put up, not only the results but also the experiment itself up for review. It is quite easy to devise a flawed experiment and put too much store in a set of results produced by that experiment.

What I attempt to do with my articles is put up a good solid starting point for a discussion. I've learnt some valuable lessons from the feedback given which means that (hopefully) I'm getting as much out of the articles as I hope I am giving.

LinkedIn Profile

Newbie on www.simple-talk.com
arun.mav
arun.mav
Forum Newbie
Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)

Group: General Forum Members
Points: 7 Visits: 53
How do we calculate the overall query cost? i.e .05... will it be shown in the Exec plan itself
jaguarxj12l
jaguarxj12l
Grasshopper
Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)

Group: General Forum Members
Points: 19 Visits: 31
Recently I ran into the difference between UNION and UNION ALL. Pity you don't mention that.

I encountered that a query result of (say) 300 records UNION'd with a query result of zero records could end up in an overall result set of less than 300 records!
After some googling and stuff I found out that the function UNION in fact also performs a GROUP BY on all fields. In other words: There will no longer be duplicate records in the end result set. If the duplicates are the result of the UNION or not doesn't matter.

A query result of (say) 300 records UNION ALL'd with a query result of zero records gets you a result set of exactly (say) 300 records!
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search