Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««56789»»

Intersect, Except, Union, All and Any Expand / Collapse
Author
Message
Posted Sunday, May 23, 2010 11:06 PM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Tuesday, April 1, 2014 7:58 AM
Points: 4, Visits: 74
When INNER JOIN returns duplicate rows then INTERSECT returns distinct values only. Thats why the difference in execution time occurs.
Post #926614
Posted Monday, May 24, 2010 1:48 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Sunday, June 22, 2014 5:11 AM
Points: 32, Visits: 78
Good comparison, David, thanks.

From a performance perspective, using NOT IN is "always" slower than using EXISTS. If you try running the below queries

-- INTERSECT
SELECT CustomerID
FROM Sales.Customer
WHERE TerritoryID=10
AND NOT EXISTS (
SELECT CustomerID
FROM Sales.SalesOrderHeader
WHERE OrderDate>='2004-07-01'
AND Sales.Customer.CustomerID = Sales.SalesOrderHeader.CustomerID
)

-- EXCEPT
SELECT CustomerID
FROM Sales.Customer
WHERE TerritoryID=10
AND NOT EXISTS (
SELECT CustomerID
FROM Sales.SalesOrderHeader
WHERE OrderDate>='2004-07-01'
AND Sales.Customer.CustomerID = Sales.SalesOrderHeader.CustomerID
)


instead of the NOT IN, you'll get a reduction of query time by 75 % (on my computer). The bigger the tables and the "result" in the NOT IN-query is the bigger the difference. I always try to avoid NOT IN in a case like this.

/Johan
Post #926651
Posted Monday, May 24, 2010 7:44 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Tuesday, March 11, 2014 8:10 AM
Points: 1, Visits: 14
I like using the EXCEPT clause because I find it makes the query easier to read. If you have any familiarity with Venn diagrams you can simply focus on understanding each of the two select statements independently and then apply set theory to understand the final result. Nice to know your experiment shows that performance is equivalent
Post #926837
Posted Monday, May 24, 2010 8:31 AM
SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Tuesday, May 6, 2014 5:51 AM
Points: 6,266, Visits: 2,028
johan.lindell (5/24/2010)
Good comparison, David, thanks.

From a performance perspective, using NOT IN is "always" slower than using EXISTS. If you try running the below queries

-- INTERSECT
SELECT CustomerID
FROM Sales.Customer
WHERE TerritoryID=10
AND NOT EXISTS (
SELECT CustomerID
FROM Sales.SalesOrderHeader
WHERE OrderDate>='2004-07-01'
AND Sales.Customer.CustomerID = Sales.SalesOrderHeader.CustomerID
)

-- EXCEPT
SELECT CustomerID
FROM Sales.Customer
WHERE TerritoryID=10
AND NOT EXISTS (
SELECT CustomerID
FROM Sales.SalesOrderHeader
WHERE OrderDate>='2004-07-01'
AND Sales.Customer.CustomerID = Sales.SalesOrderHeader.CustomerID
)


instead of the NOT IN, you'll get a reduction of query time by 75 % (on my computer). The bigger the tables and the "result" in the NOT IN-query is the bigger the difference. I always try to avoid NOT IN in a case like this.

/Johan


I disagree. NOT IN *could* give you the exact same execution plan given the right indexes and stats. The optimizer in many cases knows best and chooses one plan or another depending on the data!



* Noel
Post #926878
Posted Tuesday, May 25, 2010 7:39 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Yesterday @ 9:26 AM
Points: 2,278, Visits: 3,055
noeld (5/24/2010)
johan.lindell (5/24/2010)
Good comparison, David, thanks.

From a performance perspective, using NOT IN is "always" slower than using EXISTS. If you try running the below queries

-- INTERSECT
SELECT CustomerID
FROM Sales.Customer
WHERE TerritoryID=10
AND NOT EXISTS (
SELECT CustomerID
FROM Sales.SalesOrderHeader
WHERE OrderDate>='2004-07-01'
AND Sales.Customer.CustomerID = Sales.SalesOrderHeader.CustomerID
)

-- EXCEPT
SELECT CustomerID
FROM Sales.Customer
WHERE TerritoryID=10
AND NOT EXISTS (
SELECT CustomerID
FROM Sales.SalesOrderHeader
WHERE OrderDate>='2004-07-01'
AND Sales.Customer.CustomerID = Sales.SalesOrderHeader.CustomerID
)


instead of the NOT IN, you'll get a reduction of query time by 75 % (on my computer). The bigger the tables and the "result" in the NOT IN-query is the bigger the difference. I always try to avoid NOT IN in a case like this.

/Johan


I disagree. NOT IN *could* give you the exact same execution plan given the right indexes and stats. The optimizer in many cases knows best and chooses one plan or another depending on the data!


+1




My blog: http://jahaines.blogspot.com
Post #927459
Posted Tuesday, May 25, 2010 7:52 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Sunday, June 22, 2014 5:11 AM
Points: 32, Visits: 78
Sorry, it was a bit drastic to say it always gives better performance, I should have said normally gives same performance or better. In the example case of the article I got a 75 % reduction in query time. In my experience the EXISTS scales better than NOT IN when there are many rows in the NOT IN / EXISTS table. For production environments EXISTS is a safer solution than NOT IN. In my opinion.

Johan
Post #927479
Posted Thursday, August 11, 2011 4:43 AM
Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Monday, June 23, 2014 7:30 AM
Points: 317, Visits: 153
Dear David,
Thanks for detailed explanation – I use your article as reference for all these topics. Unfortunately unlike your conclusion for EXPECT case I found situation where NOT IN gives dramatically worst performance:
I try to compare local and remote table. For both cases EXCEPT and LEFT JOIN it gives roughly the same performance. See here:
[EXCEPT]
SELECT ac.Article_Code
FROM relAC ac
EXCEPT
SELECT Code
FROM [SRV-DCI-PRAHA].[DCI].[dciowner].Products



Local table: Scanning a clustered index, entirely or only a range.
Remote Table: Send a SQL query to another than the current SQL Server.
Remote Table: Compute new values from existing values in a row.
Both: Match rows from two suitably sorted input tables exploiting their sort order.

Cost = 48,5442

[LEFT JOIN]
SELECT ac.Article_Code
FROM relAC AS ac
LEFT JOIN [SRV-DCI-PRAHA].[DCI].[dciowner].Products AS pr
ON ac.Article_Code = pr.Code
WHERE pr.Code IS NULL



Local table: Scanning a clustered index, entirely or only a range.
Remote Table: Send a SQL query to another than the current SQL Server.
Remote Table: Compute new values from existing values in a row.
Both: Match rows from two suitably sorted input tables exploiting their sort order
Both: Restricting the set of rows based on a predicate.

Cost = 48,6391

[NOT IN]
Unfortunately using of NOT IN generates local temporary table for better performance for rewinds and generate Nested Loops for each row of local table for comparison of each row in temporary table. These loops dramatically impair performance of actual query. See Execution Plan here:
SELECT ac.Article_Code
FROM relAC ac
WHERE ac.Article_Code NOT IN (
SELECT Code
FROM [SRV-DCI-PRAHA].[DCI].[dciowner].Products



Local table: Scanning a clustered index, entirely or only a range.
Remote Table: Send a SQL query to another than the current SQL Server.
Remote Table: Compute new values from existing values in a row.

Remote Table: Stores the data from the input into a temporary table in order to optimize rewinds.
Both: For each row in the top (outer) input, scan the bottom (inner) input, and output matching rows.

Cost = 77826,3!!
Petr


__________________________________
Standing on the shoulders of giants ...
Post #1158315
Posted Thursday, August 11, 2011 1:57 PM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: 2 days ago @ 1:30 AM
Points: 2,898, Visits: 1,795
Thanks for the feedback. I haven't tried it with a linked server query.

Where possible I base my articles and experiments on the sample databases Adventureworks, Northwind or Pubs simply because they provide a consistent base for people to peer review what I am doing.

One of the most important scientific practises is to put up, not only the results but also the experiment itself up for review. It is quite easy to devise a flawed experiment and put too much store in a set of results produced by that experiment.

What I attempt to do with my articles is put up a good solid starting point for a discussion. I've learnt some valuable lessons from the feedback given which means that (hopefully) I'm getting as much out of the articles as I hope I am giving.


LinkedIn Profile
Newbie on www.simple-talk.com
Post #1158785
Posted Friday, December 23, 2011 7:05 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Friday, November 8, 2013 11:02 AM
Points: 5, Visits: 53
How do we calculate the overall query cost? i.e .05... will it be shown in the Exec plan itself
Post #1226232
Posted Friday, December 23, 2011 7:17 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Thursday, March 22, 2012 9:42 AM
Points: 13, Visits: 31
Recently I ran into the difference between UNION and UNION ALL. Pity you don't mention that.

I encountered that a query result of (say) 300 records UNION'd with a query result of zero records could end up in an overall result set of less than 300 records!
After some googling and stuff I found out that the function UNION in fact also performs a GROUP BY on all fields. In other words: There will no longer be duplicate records in the end result set. If the duplicates are the result of the UNION or not doesn't matter.

A query result of (say) 300 records UNION ALL'd with a query result of zero records gets you a result set of exactly (say) 300 records!
Post #1226239
« Prev Topic | Next Topic »

Add to briefcase «««56789»»

Permissions Expand / Collapse