mister.magoo (3/17/2014)
I have to say "I don't know", but it's interesting and if anyone wants to join in, this AdventureWorks query shows similar symptoms:
select count(*)
from sales.salesorderdetail
where salesorderid not in (select salesorderheader.salesorderid from sales.salesorderheader)
if (select count(*)
from sales.salesorderdetail
where salesorderid not in (select salesorderheader.salesorderid from sales.salesorderheader)
) = 0
print 'count is zero'
As is often the case, we need to review the query plan. The first query gets an estimated row count of 178.5 coming out of the right-anti-semi-join so it chooses a hash join type. The second query has an estimate of 1 row and chooses a merge right-anti-semi-join.
Curious is that you can affect the plan type with different number checks on the IF.
if (select count(*)
from Sales.SalesOrderDetail
where SalesOrderID not in (select SalesOrderHeader.SalesOrderID from Sales.SalesOrderHeader)) > 10000
print 'count is zero'
That one switches back to the hash join. This could be a bug or feature in the optimizer. However, I note that you should pretty much never ever do a query like this. You are forcing the engine to touch EVERY ROW just to determine if there is one or more rows. That should be an EXIST query.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
This is strange, I tried it on one of my databases and got a similar results. Even after creating indexes that I noticed missing. I had the best performance with the following query on my tables.
IF EXISTS (
SELECT COUNT(*)
FROM table1 c
WHERE EXISTS ( SELECT 1 FROM table3 z WHERE z.ID = c.ID)
HAVING COUNT(*) > 0
)
PRINT ' Result 0'
Edit: Remove a tray NOT
I fudged up a test set in case to try and narrow it down. I could not replicate the performance results.
I tried this with MAXDOP on the Server set to 0 and 1.
This is my test set and queries. Though it doesn't really help answer you question:Whistling:
/*
CREATE TABLE Table1 (
ID INT IDENTITY(1,1) PRIMARY KEY
, N INT NOT NULL
, Something Varchar(21) DEFAULT 'Put Something it here'
);
--CREATE INDEX Tab1_IDX1 ON Table1(ID) --Also tried this rather than a primary key
INSERT INTO Table1 (N) SELECT TOP 1000000 N FROM Tally
CREATE TABLE Table2 (
ID INT NOT NULL
, Something Varchar(21) DEFAULT 'Put Something it here'
)
INSERT INTO Table2 (ID) SELECT TOP 300000 N + 500 N FROM Tally
INSERT INTO Table2 (ID)SELECT TOP 300000 N + 1000 N FROM Tally
INSERT INTO Table2 (ID)SELECT TOP 300000 N + 1500 N FROM Tally
INSERT INTO Table2 (ID)SELECT TOP 300000 N + 2000 N FROM Tally
CREATE INDEX Tab2_IDX1 ON Table2(ID)
CREATE TABLE Table3 (
ID INT NOT NULL
, Something Varchar(21) DEFAULT 'Put Something it here'
)
INSERT INTO Table3 (ID) SELECT TOP 500000 N FROM Tally
INSERT INTO Table3 (ID) SELECT TOP 500000 N + 250000 N FROM Tally
INSERT INTO Table3 (ID) SELECT TOP 500000 N + 500000 N FROM Tally
CREATE INDEX Tab3_IDX1 ON Table3(ID)
*/
DECLARE @c int
DECLARE @d DATETIME2 = GETDATE()
PRINT '--------------'
SELECT @c = COUNT(*)
FROM table1 c
WHERE c.ID NOT IN ( SELECT ID FROM table2)
PRINT @c
PRINT 'Straight Query - Not 0'
PRINT DATEDIFF(ms,@d,GETDATE())
PRINT '--------------'
SET @d = GETDATE()
PRINT '--------------'
IF (
SELECT COUNT(*)
FROM table1 c
WHERE c.ID NOT IN ( SELECT ID FROM table2)
) = 0
PRINT ' Result 0'
PRINT 'IF with NOT IN - Not 0'
PRINT DATEDIFF(ms,@d,GETDATE())
PRINT '--------------'
SET @d = GETDATE()
PRINT '--------------'
IF (
SELECT count(*)
FROM table1 c
WHERE NOT EXISTS ( SELECT 1 FROM table2 z WHERE z.ID = c.ID)
) = 0
PRINT ' Result 0'
PRINT 'IF with NOT EXISTS - Not 0'
PRINT DATEDIFF(ms,@d,GETDATE())
PRINT '--------------'
SET @d = GETDATE()
PRINT '--------------'
IF EXISTS (
SELECT COUNT(*)
FROM table1 c
WHERE EXISTS ( SELECT 1 FROM table2 z WHERE z.ID = c.ID)
HAVING COUNT(*) > 0
)
PRINT ' Result 0'
PRINT 'IF with HAVING - Not 0'
PRINT DATEDIFF(ms,@d,GETDATE())
PRINT '--------------'
SET @d = GETDATE()
PRINT '--------------'
SELECT @c = COUNT(*)
FROM table1 c
WHERE c.ID NOT IN ( SELECT ID FROM table3)
PRINT @c
PRINT 'Straight Query NOT IN - 0'
PRINT DATEDIFF(ms,@d,GETDATE())
PRINT '--------------'
SET @d = GETDATE()
PRINT '--------------'
IF (
SELECT count(*)
FROM table1 c
WHERE c.ID NOT IN ( SELECT ID FROM table3)
) = 0
PRINT ' Result 0'
PRINT 'IF with NOT IN - 0'
PRINT DATEDIFF(ms,@d,GETDATE())
PRINT '--------------'
SET @d = GETDATE()
PRINT '--------------'
IF (
SELECT count(*)
FROM table1 c
WHERE NOT EXISTS ( SELECT 1 FROM table3 z WHERE z.ID = c.ID)
) = 0
PRINT ' Result 0'
PRINT 'IF with NOT EXISTS - 0'
PRINT DATEDIFF(ms,@d,GETDATE())
PRINT '--------------'
SET @d = GETDATE()
PRINT '--------------'
IF EXISTS (
SELECT COUNT(*)
FROM table1 c
WHERE EXISTS ( SELECT 1 FROM table3 z WHERE z.ID = c.ID)
HAVING COUNT(*) > 0
)
PRINT ' Result 0'
PRINT 'IF with HAVING - 0'
PRINT DATEDIFF(ms,@d,GETDATE())
PRINT '--------------'
/*
DROP TABLE Table1
DROP TABLE Table2
DROP TABLE Table3
*/
If truly boggles my mind why an EXISTS with a COUNT(*) and HAVING is faster than a straight EXISTS check. That makes absolutely no sense whatsoever! :crazy:
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
TheSQLGuru (3/17/2014)
If truly boggles my mind why an EXISTS with a COUNT(*) and HAVING is faster than a straight EXISTS check. That makes absolutely no sense whatsoever! :crazy:
I agree and I was a bit loathed to post it to be honest. That was why I also mocked up some test data. It didn't perform the same on that. Tomorrow I'm going to take a closer look at the tables that I did the initial test on and try and make sense of it.
I'll try and get some actual plans posted when I do. The only reason I tried it was I thought it might have an early opt out due to the having.
Hi
The tables I were querying from have the following:
Table1 1877 unique ids with a non clustered index.
Table2 38438 joining ids, avg of 20 rows per ID up to a maximum of 216 rows, non clustered index.
Statistics all updated with fullscan.
I ran the following querys--Statement1
SELECT COUNT(*)
FROM Table1 c
WHERE NOT EXISTS ( SELECT 1 FROM Table2 z WHERE z.CNST_CONSENT_ID = c.CNST_CONSENT_ID)
--Statement2
IF NOT EXISTS (
SELECT COUNT(*)
FROM Table1 c
WHERE NOT EXISTS ( SELECT 1 FROM Table2 z WHERE z.CNST_CONSENT_ID = c.CNST_CONSENT_ID)
HAVING COUNT(*) > 0
)
PRINT ' Result 0'
--Statement3
IF (
SELECT COUNT(*)
FROM Table1 c
WHERE NOT EXISTS ( SELECT 1 FROM Table2 z WHERE z.CNST_CONSENT_ID = c.CNST_CONSENT_ID)
) = 0
PRINT ' Result 0'
Here's the general stats
For Statement 3 the estimated rows are way south of the actual rows. The net effect is that Statement 3 ends up doing way more reads than it needs to.
But your 3 samples are missing the important 4th sample:
IF NOT EXISTS (
SELECT *FROM Table1 c
WHERE NOT EXISTS ( SELECT 1 FROM Table2 z WHERE z.CNST_CONSENT_ID = c.CNST_CONSENT_ID)
)
PRINT ' Result 0'
You may need to play around with the exist/not exists to get the logic you want. Oh, and you want to code this so that it DOES HIT very often/quickly (assuming that is applicable here) - that way the EXISTS logic can short-circuit and exit early in the query.
Once you get the query right, how does it perform?
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
hunchback (3/17/2014)
Try using NOT EXISTS instead NOT IN.if exists(
select *
from Responses r
where not exists (
select *
from data.GamingReport_Computerized as t
where t.ResponseID = r.ResponseID
)
)
begin select 'missing rows will be inserted' end
else begin select 'update will be skipped to save time' end
It will be helpful having an index by ResponseID in each table if you do not have one.
I've found that there are typically no differences in performance between NOT EXISTS and NOT IN. Lot's of tests have been run on this site that demonstrate that fact, as well.
As a side bar, I prefer NOT IN because it doesn't require correlation and the sub-query can be tested separately.
--Jeff Moden
Change is inevitable... Change for the better is not.
Hi
Have added the various query options now and we have a new winner with a query that makes sense
-- Statement 1
SELECT COUNT(*)
FROM Table1 c
WHERE NOT EXISTS ( SELECT 1 FROM Table2 z WHERE z.CNST_CONSENT_ID = c.CNST_CONSENT_ID)
-- Statement 2
IF NOT EXISTS (
SELECT COUNT(*)
FROM Table1 c
WHERE NOT EXISTS ( SELECT 1 FROM Table2 z WHERE z.CNST_CONSENT_ID = c.CNST_CONSENT_ID)
HAVING COUNT(*) > 0
)
PRINT ' Result 0'
-- Statement 3
IF EXISTS (
SELECT COUNT(*)
FROM Table1 c
WHERE EXISTS ( SELECT 1 FROM Table2 z WHERE z.CNST_CONSENT_ID = c.CNST_CONSENT_ID)
HAVING COUNT(*) > 0
)
PRINT ' Result 0'
-- Statement 4
IF (
SELECT COUNT(*)
FROM Table1 c
WHERE NOT EXISTS ( SELECT 1 FROM Table2 z WHERE z.CNST_CONSENT_ID = c.CNST_CONSENT_ID)
) = 0
PRINT ' Result 0'
-- Statement 5
IF NOT EXISTS (
SELECT 1
FROM Table1 c
WHERE NOT EXISTS ( SELECT 1 FROM Table2 z WHERE z.CNST_CONSENT_ID = c.CNST_CONSENT_ID)
)
PRINT ' Result 0'
-- Statement 6
IF EXISTS (
SELECT 1
FROM Table1 c
WHERE EXISTS ( SELECT 1 FROM Table2 z WHERE z.CNST_CONSENT_ID = c.CNST_CONSENT_ID)
)
PRINT ' Result 0'
I suspect I know why, but don't have time to write an essay tonight (and that's what is required to explain it)
The EXISTS adds a row goal of 1 into the plan (because EXISTS only needs to see whether there's a row or not). Paul White's blog has at least one post on Row Goals
To see if that is the cause, does this query have the same odd performance characteristic as the IF EXISTS... version?
select TOP(1) 1 from Responses r where r.ResponseID not in (
select ResponseID from data.GamingReport_Computerized
)
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
Cool! So it looks like the EXISTS/EXISTS is able to very quickly exit with a hit (8 reads). Oh, I forgot to ask - are your plans estimated or actual?
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
The plans are actual.
And here is Paul's blog that Gail referred to.
mickyT: Which query do you think is a "winner", because those are not logically equivalent queries to my original query. For example, query 6 is using an "exists" in the where clause, which is incorrect. Of course that will short circuit because there exists 2.3 million matching rows. We're looking for existence of non-matching rows (i.e. where not exists or where ResponseID is null in the 2nd table) When you change it to "not exists" (since we're looking for rows that exist in one table but not the other), then it exhibits the exact same performance issue.
GilaMonster: The "top(1)" query you suggested does exhibit the same query plan and same performance issue.
Viewing 15 posts - 1 through 15 (of 17 total)
You must be logged in to reply to this topic. Login to reply