SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Minus in SQLServer


Minus in SQLServer

Author
Message
Ach-230297
Ach-230297
SSC Rookie
SSC Rookie (49 reputation)SSC Rookie (49 reputation)SSC Rookie (49 reputation)SSC Rookie (49 reputation)SSC Rookie (49 reputation)SSC Rookie (49 reputation)SSC Rookie (49 reputation)SSC Rookie (49 reputation)

Group: General Forum Members
Points: 49 Visits: 1

hi all,

how can we minus two sets in SQLServer. e.g.

select * from table1

minus

select * from table2

i tried this code, but it seems that there is no MINUS clause in SQLServer
What should i do now?
Any help is greately appreciated.

Chris Hedgate
Chris Hedgate
SSCrazy
SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)

Group: General Forum Members
Points: 2277 Visits: 7
You are right, there is no built-in relational minus in SQL Server (SQL Server 2005 will have EXCEPT though). This code should do what you want:

SELECT *
FROM Table1
WHERE NOT EXISTS (
SELECT *
FROM Table2
WHERE Col1 = Table1.Col1
AND Col2 = Table1.Col2
)

--
Chris Hedgate http://www.hedgate.net/
Contributor to the Best of SQL Server Central volumes
Articles: http://www.sqlservercentral.com/columnists/chedgate/
Martin Suchánek
Martin Suchánek
Forum Newbie
Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)

Group: General Forum Members
Points: 5 Visits: 1

Right, no minus. I would recommend using JOIN like this:

SELECT *
FROM Table1 a
LEFT JOIN Table2 b
ON a.Col1 = b.Col1 AND a.Col2 = b.Col2
WHERE b.Col1 IS NULL

I think previous example would cause SQL server to run multiple queries against data, while this one simply joins tables and extracts rows that exist in table B.


Chris Hedgate
Chris Hedgate
SSCrazy
SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)

Group: General Forum Members
Points: 2277 Visits: 7
I think you have misunderstood how the query engine in SQL Server works. One query is always one query, what is interesting is to look at the execution plans that SQL Server choose for exeuting the query. Both of our queries will probably result in similar plans (with the same physical join operation in most cases) and therefore perform more or less the same. However your query will always need a filter operator before outputting the results, which will cause it to perform slightly worse when there is a lot of rows returned.

--
Chris Hedgate http://www.hedgate.net/
Contributor to the Best of SQL Server Central volumes
Articles: http://www.sqlservercentral.com/columnists/chedgate/
Robert Lomax
Robert Lomax
SSChasing Mays
SSChasing Mays (640 reputation)SSChasing Mays (640 reputation)SSChasing Mays (640 reputation)SSChasing Mays (640 reputation)SSChasing Mays (640 reputation)SSChasing Mays (640 reputation)SSChasing Mays (640 reputation)SSChasing Mays (640 reputation)

Group: General Forum Members
Points: 640 Visits: 14

If you check Kalen Delaney's Inside SQL Server 2000, Chapter 7, Corrolated Subqueries, Page 354, she explains that using the LEFT JOIN and testing for null uses less than half the logical I/O's of the corrolated subquery. Usually a join is more efficient than the alternative.

You may have indexes that could effect the outcome.

If you use alot of corrolated subqueries you may have to work on thinking more set based, and less row-by-row.

Jeff Lomax


Chris Hedgate
Chris Hedgate
SSCrazy
SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)

Group: General Forum Members
Points: 2277 Visits: 7
I have my copy of Inside SQL 2000 at my client's office where I won't be until thursday, so I can't check exactly what it says there. However, like I said, if you compare these queries you will see that they use almost the same execution plan.

Usually a join is more efficient than the alternative.

I usually look at what is most efficient in a specific situation.

If you use alot of corrolated [sic] subqueries you may have to work on thinking more set based, and less row-by-row.

Why is a correlated subquery row-by-row based?

--
Chris Hedgate http://www.hedgate.net/
Contributor to the Best of SQL Server Central volumes
Articles: http://www.sqlservercentral.com/columnists/chedgate/
Ninja's_RGR'us
Ninja's_RGR'us
One Orange Chip
One Orange Chip (28K reputation)One Orange Chip (28K reputation)One Orange Chip (28K reputation)One Orange Chip (28K reputation)One Orange Chip (28K reputation)One Orange Chip (28K reputation)One Orange Chip (28K reputation)One Orange Chip (28K reputation)

Group: General Forum Members
Points: 28735 Visits: 9671
Well it does recheck the condition for every row... but it doesn't mean that the server doesn't use some sort a left join to do the check internally and fast .
Chris Hedgate
Chris Hedgate
SSCrazy
SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)

Group: General Forum Members
Points: 2277 Visits: 7
But it is still not row-by-row, if by that it would mean "as opposed to being set based". You are talking about the physical implementation. In fact, take a look at the execution plan details. If the physical operator for the outer join that both of the queries above results in is a Nested Loop (depending on the data and indexes etc it might choose another, such as Hash Match, instead), take a look at the lower of the two inputs to the right of it. Note the Number of executes for both cases...

One of the problems with SQL and it's implementations, as opposed to how a true RDBMS should function, is that you can write different queries for getting the same result, but they perform differently. Some will be faster and some slower. That is the direct result of mixing the logical and the physical level. As a programmer (the user of the DBMS) I should not be able to tell the DBMS how it should actually produce my result, I should just tell it what I want. The DBMS would then find the best possible way to do produce that result for me.

Edit: Not really meant to Remi specifically, but to the discussion as a whole.

--
Chris Hedgate http://www.hedgate.net/
Contributor to the Best of SQL Server Central volumes
Articles: http://www.sqlservercentral.com/columnists/chedgate/
Chris Hedgate
Chris Hedgate
SSCrazy
SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)

Group: General Forum Members
Points: 2277 Visits: 7
Ok, I have now checked what it says in Inside SQL Server 2000, and I do not make the same interpretation as you do. I cannot find anywhere where she says that "Usually a join is more efficient than the alternative". The exact words are these:

Depending on your data and indexes, the outer-join formulation might be faster or slower than a correlated subquery. But before deciding to write your query one way or the other, you might want to come up with a couple of alternative formulations and then choose the one that's fastest in your situation.

As I said in my previous post it is unfortunate that different queries can produce the same results but not perform equally well, but since that is the case we should always, like I (and Kalen) said, test our queries and use the best one. In the case presented in this thread I constantly get better results using a correlated subquery than when using a left join filtering for null, even when changing the amount of data and/or indexes. Both queries use very similar execution plans, with exactly the same amount of I/Os, but the left join uses a filter condition that adds a little cpu activity. Hardly noticeable, but still clearly readable in the execution plan and/or statistics time output.

What Kalen say regarding "fewer than half the number of logical I/Os" is that in the example that she provides in the book, that is the result. However, what is interesting is that a) her example is very similar to the one here and b) when I try her exact example I do not at all get "fewer than half the number of logical I/Os". In fact I get the exact same result as I do with the example in this thread, that is the exact same number of logical I/Os. I can not be really sure what she used for testing though, but it looks like the pubs database, so I can not see where she got those numbers from. Perhaps it was a flaw in the RTM engine that has later been fixed in a service pack.

Finally, another interesting thing to try is to use the EXCEPT keyword in SQL Server 2005. Like I said above, EXCEPT performs a relational minus. Now, the product is still in beta, but when I execute the two statements above as well as one using EXCEPT two of the three execution plans are the same. Guess which one differs, and is also the slower one? As expected EXCEPT uses the fastest execution plan to perform a relational minus on the two sets, and that execution plan happens to be the same one as the one used by the query with the correlated subquery.

--
Chris Hedgate http://www.hedgate.net/
Contributor to the Best of SQL Server Central volumes
Articles: http://www.sqlservercentral.com/columnists/chedgate/
Ninja's_RGR'us
Ninja's_RGR'us
One Orange Chip
One Orange Chip (28K reputation)One Orange Chip (28K reputation)One Orange Chip (28K reputation)One Orange Chip (28K reputation)One Orange Chip (28K reputation)One Orange Chip (28K reputation)One Orange Chip (28K reputation)One Orange Chip (28K reputation)

Group: General Forum Members
Points: 28735 Visits: 9671
Wow, can't wait for yukon . Thanx for the info.
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search