Minus in SQL Server
Posted Monday, August 15, 2005 3:50 AM
Grasshopper


Group: General Forum Members
Last Login: Wednesday, July 4, 2007 11:03 AM
Points: 19, Visits: 1

hi all,

how can we minus two sets in SQL Server? e.g.

select * from table1

minus

select * from table2

I tried this code, but it seems that there is no MINUS clause in SQL Server. What should I do now?

Any help is greatly appreciated.
Post #210359
Posted Monday, August 15, 2005 4:58 AM
UDP Broadcaster


Group: General Forum Members
Last Login: Saturday, February 28, 2009 6:51 AM
Points: 1,489, Visits: 7
You are right, there is no built-in relational minus in SQL Server (SQL Server 2005 will have EXCEPT, though). This code should do what you want:

SELECT *
FROM Table1
WHERE NOT EXISTS (
    SELECT *
    FROM Table2
    WHERE Table2.Col1 = Table1.Col1
      AND Table2.Col2 = Table1.Col2
)
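A quick way to sanity-check the logic of the NOT EXISTS formulation above (run here against SQLite through Python purely for convenience; the table names and sample data are invented, but the same T-SQL shape runs in SQL Server):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Col1 INTEGER, Col2 TEXT);
    CREATE TABLE Table2 (Col1 INTEGER, Col2 TEXT);
    INSERT INTO Table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO Table2 VALUES (2, 'b'), (3, 'c');
""")

# Rows of Table1 with no matching row in Table2 -- a relational minus.
rows = conn.execute("""
    SELECT *
    FROM Table1
    WHERE NOT EXISTS (
        SELECT *
        FROM Table2
        WHERE Table2.Col1 = Table1.Col1
          AND Table2.Col2 = Table1.Col2
    )
""").fetchall()

print(rows)  # [(1, 'a')]
```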




--
Chris Hedgate http://www.hedgate.net/
Contributor to the Best of SQL Server Central volumes
Articles: http://www.sqlservercentral.com/columnists/chedgate/
Post #210364
Posted Tuesday, August 16, 2005 1:58 AM
Forum Newbie


Group: General Forum Members
Last Login: Monday, February 13, 2006 4:34 AM
Points: 1, Visits: 1

Right, no MINUS. I would recommend using a JOIN like this:

SELECT *
FROM Table1 a
LEFT JOIN Table2 b
  ON a.Col1 = b.Col1 AND a.Col2 = b.Col2
WHERE b.Col1 IS NULL

I think the previous example would cause SQL Server to run multiple queries against the data, while this one simply joins the tables and extracts the rows that do not exist in Table2.
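For what it's worth, the two formulations do return the same rows. A small check (again using SQLite through Python as a stand-in engine, with invented sample data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Col1 INTEGER, Col2 TEXT);
    CREATE TABLE Table2 (Col1 INTEGER, Col2 TEXT);
    INSERT INTO Table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO Table2 VALUES (2, 'b');
""")

# LEFT JOIN, then keep only the rows that found no match in Table2.
left_join = conn.execute("""
    SELECT a.*
    FROM Table1 a
    LEFT JOIN Table2 b
      ON a.Col1 = b.Col1 AND a.Col2 = b.Col2
    WHERE b.Col1 IS NULL
""").fetchall()

# Correlated-subquery version from the earlier post.
not_exists = conn.execute("""
    SELECT * FROM Table1
    WHERE NOT EXISTS (SELECT * FROM Table2
                      WHERE Table2.Col1 = Table1.Col1
                        AND Table2.Col2 = Table1.Col2)
""").fetchall()

print(sorted(left_join) == sorted(not_exists))  # True
```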

Post #210707
Posted Tuesday, August 16, 2005 2:41 AM
UDP Broadcaster


Group: General Forum Members
Last Login: Saturday, February 28, 2009 6:51 AM
Points: 1,489, Visits: 7
I think you have misunderstood how the query engine in SQL Server works. One query is always one query; what is interesting is to look at the execution plan that SQL Server chooses for executing the query. Both of our queries will probably result in similar plans (with the same physical join operation in most cases) and therefore perform more or less the same. However, your query will always need a filter operator before outputting the results, which will cause it to perform slightly worse when there are a lot of rows returned.



--
Chris Hedgate http://www.hedgate.net/
Contributor to the Best of SQL Server Central volumes
Articles: http://www.sqlservercentral.com/columnists/chedgate/
Post #210720
Posted Tuesday, August 16, 2005 11:22 AM
SSChasing Mays


Group: General Forum Members
Last Login: Sunday, August 16, 2009 5:37 PM
Points: 614, Visits: 14

If you check Kalen Delaney's Inside SQL Server 2000, Chapter 7, "Correlated Subqueries", page 354, she explains that using the LEFT JOIN and testing for NULL uses less than half the logical I/Os of the correlated subquery. Usually a join is more efficient than the alternative.

You may have indexes that could affect the outcome.

If you use a lot of correlated subqueries you may have to work on thinking more set-based, and less row-by-row.

Jeff Lomax

Post #210944
Posted Tuesday, August 16, 2005 3:59 PM
UDP Broadcaster


Group: General Forum Members
Last Login: Saturday, February 28, 2009 6:51 AM
Points: 1,489, Visits: 7
I have my copy of Inside SQL Server 2000 at my client's office, where I won't be until Thursday, so I can't check exactly what it says there. However, like I said, if you compare these queries you will see that they use almost the same execution plan.

Usually a join is more efficient than the alternative.

I usually look at what is most efficient in a specific situation.

If you use alot of corrolated [sic] subqueries you may have to work on thinking more set based, and less row-by-row.

Why is a correlated subquery row-by-row based?




--
Chris Hedgate http://www.hedgate.net/
Contributor to the Best of SQL Server Central volumes
Articles: http://www.sqlservercentral.com/columnists/chedgate/
Post #211046
Posted Tuesday, August 16, 2005 10:39 PM


SSC-Insane


Group: General Forum Members
Last Login: Yesterday @ 4:23 AM
Points: 20,584, Visits: 9,624
Well, it does recheck the condition for every row... but that doesn't mean the server doesn't use some sort of left join internally to do the check, and fast.
Post #211096
Posted Friday, August 19, 2005 3:56 AM
UDP Broadcaster


Group: General Forum Members
Last Login: Saturday, February 28, 2009 6:51 AM
Points: 1,489, Visits: 7
But it is still not row-by-row, if by that you mean "as opposed to being set-based". You are talking about the physical implementation. In fact, take a look at the execution plan details. If the physical operator for the outer join that both of the queries above result in is a Nested Loop (depending on the data, indexes, etc., it might choose another, such as Hash Match, instead), take a look at the lower of the two inputs to the right of it. Note the number of executes in both cases...

One of the problems with SQL and its implementations, as opposed to how a true RDBMS should function, is that you can write different queries that return the same result but perform differently. Some will be faster and some slower. That is a direct result of mixing the logical and the physical level. As a programmer (the user of the DBMS) I should not be able to tell the DBMS how it should actually produce my result; I should just tell it what I want. The DBMS would then find the best possible way to produce that result for me.

Edit: Not really aimed at Remi specifically, but at the discussion as a whole.




--
Chris Hedgate http://www.hedgate.net/
Contributor to the Best of SQL Server Central volumes
Articles: http://www.sqlservercentral.com/columnists/chedgate/
Post #212008
Posted Friday, August 19, 2005 6:18 AM
UDP Broadcaster


Group: General Forum Members
Last Login: Saturday, February 28, 2009 6:51 AM
Points: 1,489, Visits: 7
OK, I have now checked what it says in Inside SQL Server 2000, and I do not make the same interpretation as you do. I cannot find anywhere that she says "Usually a join is more efficient than the alternative". The exact words are these:

Depending on your data and indexes, the outer-join formulation might be faster or slower than a correlated subquery. But before deciding to write your query one way or the other, you might want to come up with a couple of alternative formulations and then choose the one that's fastest in your situation.

As I said in my previous post, it is unfortunate that different queries can produce the same results but not perform equally well, but since that is the case we should always, like I (and Kalen) said, test our queries and use the best one. In the case presented in this thread I consistently get better results using a correlated subquery than using a left join filtering for NULL, even when changing the amount of data and/or indexes. Both queries use very similar execution plans, with exactly the same amount of I/O, but the left join uses a filter condition that adds a little CPU activity. Hardly noticeable, but still clearly readable in the execution plan and/or STATISTICS TIME output.

What Kalen says regarding "fewer than half the number of logical I/Os" is that in the example she provides in the book, that is the result. However, what is interesting is that a) her example is very similar to the one here, and b) when I try her exact example I do not at all get "fewer than half the number of logical I/Os". In fact I get the exact same result as I do with the example in this thread, that is, the exact same number of logical I/Os. I cannot be really sure what she used for testing, though it looks like the pubs database, so I cannot see where she got those numbers from. Perhaps it was a flaw in the RTM engine that has later been fixed in a service pack.

Finally, another interesting thing to try is the EXCEPT keyword in SQL Server 2005. Like I said above, EXCEPT performs a relational minus. Now, the product is still in beta, but when I execute the two statements above as well as one using EXCEPT, two of the three execution plans are the same. Guess which one differs, and is also the slower one? As expected, EXCEPT uses the fastest execution plan to perform a relational minus on the two sets, and that execution plan happens to be the same one as the one used by the query with the correlated subquery.
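For reference, the EXCEPT formulation looks like this (demonstrated here against SQLite through Python, which also supports EXCEPT; the statement itself is the same shape you would write in SQL Server 2005, and the sample data is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Col1 INTEGER, Col2 TEXT);
    CREATE TABLE Table2 (Col1 INTEGER, Col2 TEXT);
    INSERT INTO Table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO Table2 VALUES (2, 'b'), (3, 'c');
""")

# Relational minus: rows in Table1 that are not in Table2.
rows = conn.execute("""
    SELECT * FROM Table1
    EXCEPT
    SELECT * FROM Table2
""").fetchall()

print(rows)  # [(1, 'a')]
```

Note that, like any set operator, EXCEPT also removes duplicate rows from its result, which the NOT EXISTS and LEFT JOIN formulations do not.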




--
Chris Hedgate http://www.hedgate.net/
Contributor to the Best of SQL Server Central volumes
Articles: http://www.sqlservercentral.com/columnists/chedgate/
Post #212031
Posted Friday, August 19, 2005 7:18 AM


SSC-Insane


Group: General Forum Members
Last Login: Yesterday @ 4:23 AM
Points: 20,584, Visits: 9,624
Wow, can't wait for Yukon. Thanx for the info.
Post #212061