Why would wrapping a TSQL query in an if statement increase its runtime significantly?

Question

Post reply

Why would wrapping a TSQL query in an if statement increase its runtime significantly?

triynko22

SSC Veteran

Points: 286
More actions
March 17, 2014 at 4:12 pm

#305237

I'm having odd performance difference when running the same query two different ways. Literally, the only difference is whether its wrapped in an if statement.
This query is actually part of a larger one, but I've isolated it as the culprit.
Running the query by itself returns almost immediately (returns 0)
select COUNT(*) from Responses r where r.ResponseID not in (
select ResponseID from data.GamingReport_Computerized
))
Ultimately, I want to avoid running complex calculations based on the result of that query, so I wrap it in an if statement like so, but it takes much longer to run (~10 seconds) and I can't figure out why:
if (0 = (select COUNT(*) from Responses r where r.ResponseID not in (
select ResponseID from data.GamingReport_Computerized
)))
begin select 'update will be skipped to save time' end
else begin select 'missing rows will be inserted' end
The data set does not change so in both cases the result is zero / 'update will be skipped', and yet running these two versions of the same query always results in the first version completing quickly, and the second version taking about 10-12 seconds to complete.
the following query performs identically to the second version above, with the same execution plan and no increase in performance (i.e. using 'exists' instead of comparing count(*) to zero makes no difference).
if exists(select 1 from Responses r where r.ResponseID not in (
select ResponseID from data.GamingReport_Computerized
))
begin select 'missing rows will be inserted' end
else begin select 'update will be skipped to save time' end
Changing the query to use left or right join where data.GamingReport_Computerized.ResponseID is null performs the same way as using 'select where not in'.
It seems that using a select statement inside an if statement is causing a bad query execution plan to form. The only way around this I've found is to first assign the result of count to a variable and then test the variable with an if statement. Any combination that involves using 'if' and 'select' in the same statement results in a bad execution plan with poor performance.
See discussion here as well: http://stackoverflow.com/q/22465078/88409
Forum Etiquette: How to post Reporting Services problems

[/url]

Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
[/url]

How to Post Performance Problems - by Gail Shaw
[/url]

Hunchback SSCrazy Points: 2649 More actions · Answer 1

Try using NOT EXISTS instead NOT IN.

if exists(

select *

from Responses r

where not exists (

select *

from data.GamingReport_Computerized as t

where t.ResponseID = r.ResponseID

)

begin select 'missing rows will be inserted' end

else begin select 'update will be skipped to save time' end

It will be helpful having an index by ResponseID in each table if you do not have one.

mister.magoo SSC-Forever Points: 47068 More actions · Answer 2

I have to say "I don't know", but it's interesting and if anyone wants to join in, this AdventureWorks query shows similar symptoms:

select count(*)

from sales.salesorderdetail

where salesorderid not in (select salesorderheader.salesorderid from sales.salesorderheader)

if (select count(*)

from sales.salesorderdetail

where salesorderid not in (select salesorderheader.salesorderid from sales.salesorderheader)

) = 0

print 'count is zero'

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 3

mister.magoo (3/17/2014)
I have to say "I don't know", but it's interesting and if anyone wants to join in, this AdventureWorks query shows similar symptoms:
select count(*)
from sales.salesorderdetail
where salesorderid not in (select salesorderheader.salesorderid from sales.salesorderheader)
if (select count(*)
from sales.salesorderdetail
where salesorderid not in (select salesorderheader.salesorderid from sales.salesorderheader)
) = 0
print 'count is zero'

As is often the case, we need to review the query plan. The first query gets an estimated row count of 178.5 coming out of the right-anti-semi-join so it chooses a hash join type. The second query has an estimate of 1 row and chooses a merge right-anti-semi-join.

Curious is that you can affect the plan type with different number checks on the IF.

if (select count(*)

from Sales.SalesOrderDetail

where SalesOrderID not in (select SalesOrderHeader.SalesOrderID from Sales.SalesOrderHeader)) > 10000

print 'count is zero'

That one switches back to the hash join. This could be a bug or feature in the optimizer. However, I note that you should pretty much never ever do a query like this. You are forcing the engine to touch EVERY ROW just to determine if there is one or more rows. That should be an EXIST query.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

mickyT SSChampion Points: 10360 More actions · Answer 4

This is strange, I tried it on one of my databases and got a similar results. Even after creating indexes that I noticed missing. I had the best performance with the following query on my tables.

IF EXISTS (

SELECT COUNT(*)

FROM table1 c

WHERE EXISTS ( SELECT 1 FROM table3 z WHERE z.ID = c.ID)

HAVING COUNT(*) > 0

)

PRINT ' Result 0'

Edit: Remove a tray NOT

I fudged up a test set in case to try and narrow it down. I could not replicate the performance results.

I tried this with MAXDOP on the Server set to 0 and 1.

This is my test set and queries. Though it doesn't really help answer you question:Whistling:

/*

CREATE TABLE Table1 (

ID INT IDENTITY(1,1) PRIMARY KEY

, N INT NOT NULL

, Something Varchar(21) DEFAULT 'Put Something it here'

);

--CREATE INDEX Tab1_IDX1 ON Table1(ID) --Also tried this rather than a primary key

INSERT INTO Table1 (N) SELECT TOP 1000000 N FROM Tally

CREATE TABLE Table2 (

ID INT NOT NULL

, Something Varchar(21) DEFAULT 'Put Something it here'

)

INSERT INTO Table2 (ID) SELECT TOP 300000 N + 500 N FROM Tally

INSERT INTO Table2 (ID)SELECT TOP 300000 N + 1000 N FROM Tally

INSERT INTO Table2 (ID)SELECT TOP 300000 N + 1500 N FROM Tally

INSERT INTO Table2 (ID)SELECT TOP 300000 N + 2000 N FROM Tally

CREATE INDEX Tab2_IDX1 ON Table2(ID)

CREATE TABLE Table3 (

ID INT NOT NULL

, Something Varchar(21) DEFAULT 'Put Something it here'

)

INSERT INTO Table3 (ID) SELECT TOP 500000 N FROM Tally

INSERT INTO Table3 (ID) SELECT TOP 500000 N + 250000 N FROM Tally

INSERT INTO Table3 (ID) SELECT TOP 500000 N + 500000 N FROM Tally

CREATE INDEX Tab3_IDX1 ON Table3(ID)

*/

DECLARE @C int

DECLARE @d DATETIME2 = GETDATE()

PRINT '--------------'

SELECT @C = COUNT(*)

FROM table1 c

WHERE c.ID NOT IN ( SELECT ID FROM table2)

PRINT @C

PRINT 'Straight Query - Not 0'

PRINT DATEDIFF(ms,@d,GETDATE())

PRINT '--------------'

SET @d = GETDATE()

PRINT '--------------'

IF (

SELECT COUNT(*)

FROM table1 c

WHERE c.ID NOT IN ( SELECT ID FROM table2)

) = 0

PRINT ' Result 0'

PRINT 'IF with NOT IN - Not 0'

PRINT DATEDIFF(ms,@d,GETDATE())

PRINT '--------------'

SET @d = GETDATE()

PRINT '--------------'

IF (

SELECT count(*)

FROM table1 c

WHERE NOT EXISTS ( SELECT 1 FROM table2 z WHERE z.ID = c.ID)

) = 0

PRINT ' Result 0'

PRINT 'IF with NOT EXISTS - Not 0'

PRINT DATEDIFF(ms,@d,GETDATE())

PRINT '--------------'

SET @d = GETDATE()

PRINT '--------------'

IF EXISTS (

SELECT COUNT(*)

FROM table1 c

WHERE EXISTS ( SELECT 1 FROM table2 z WHERE z.ID = c.ID)

HAVING COUNT(*) > 0

)

PRINT ' Result 0'

PRINT 'IF with HAVING - Not 0'

PRINT DATEDIFF(ms,@d,GETDATE())

PRINT '--------------'

SET @d = GETDATE()

PRINT '--------------'

SELECT @C = COUNT(*)

FROM table1 c

WHERE c.ID NOT IN ( SELECT ID FROM table3)

PRINT @C

PRINT 'Straight Query NOT IN - 0'

PRINT DATEDIFF(ms,@d,GETDATE())

PRINT '--------------'

SET @d = GETDATE()

PRINT '--------------'

IF (

SELECT count(*)

FROM table1 c

WHERE c.ID NOT IN ( SELECT ID FROM table3)

) = 0

PRINT ' Result 0'

PRINT 'IF with NOT IN - 0'

PRINT DATEDIFF(ms,@d,GETDATE())

PRINT '--------------'

SET @d = GETDATE()

PRINT '--------------'

IF (

SELECT count(*)

FROM table1 c

WHERE NOT EXISTS ( SELECT 1 FROM table3 z WHERE z.ID = c.ID)

) = 0

PRINT ' Result 0'

PRINT 'IF with NOT EXISTS - 0'

PRINT DATEDIFF(ms,@d,GETDATE())

PRINT '--------------'

SET @d = GETDATE()

PRINT '--------------'

IF EXISTS (

SELECT COUNT(*)

FROM table1 c

WHERE EXISTS ( SELECT 1 FROM table3 z WHERE z.ID = c.ID)

HAVING COUNT(*) > 0

)

PRINT ' Result 0'

PRINT 'IF with HAVING - 0'

PRINT DATEDIFF(ms,@d,GETDATE())

PRINT '--------------'

/*

DROP TABLE Table1

DROP TABLE Table2

DROP TABLE Table3

*/

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 5

If truly boggles my mind why an EXISTS with a COUNT(*) and HAVING is faster than a straight EXISTS check. That makes absolutely no sense whatsoever! :crazy:

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

mickyT SSChampion Points: 10360 More actions · Answer 6

TheSQLGuru (3/17/2014)
If truly boggles my mind why an EXISTS with a COUNT(*) and HAVING is faster than a straight EXISTS check. That makes absolutely no sense whatsoever! :crazy:

I agree and I was a bit loathed to post it to be honest. That was why I also mocked up some test data. It didn't perform the same on that. Tomorrow I'm going to take a closer look at the tables that I did the initial test on and try and make sense of it.

I'll try and get some actual plans posted when I do. The only reason I tried it was I thought it might have an early opt out due to the having.

mickyT SSChampion Points: 10360 More actions · Answer 7

Hi

The tables I were querying from have the following:

Table1 1877 unique ids with a non clustered index.

Table2 38438 joining ids, avg of 20 rows per ID up to a maximum of 216 rows, non clustered index.

Statistics all updated with fullscan.

I ran the following querys--Statement1

SELECT COUNT(*)

FROM Table1 c

WHERE NOT EXISTS ( SELECT 1 FROM Table2 z WHERE z.CNST_CONSENT_ID = c.CNST_CONSENT_ID)

--Statement2

IF NOT EXISTS (

SELECT COUNT(*)

FROM Table1 c

WHERE NOT EXISTS ( SELECT 1 FROM Table2 z WHERE z.CNST_CONSENT_ID = c.CNST_CONSENT_ID)

HAVING COUNT(*) > 0

)

PRINT ' Result 0'

--Statement3

IF (

SELECT COUNT(*)

FROM Table1 c

WHERE NOT EXISTS ( SELECT 1 FROM Table2 z WHERE z.CNST_CONSENT_ID = c.CNST_CONSENT_ID)

) = 0

PRINT ' Result 0'

Here's the general stats

For Statement 3 the estimated rows are way south of the actual rows. The net effect is that Statement 3 ends up doing way more reads than it needs to.

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 8

But your 3 samples are missing the important 4th sample:

IF NOT EXISTS (

SELECT *FROM Table1 c

WHERE NOT EXISTS ( SELECT 1 FROM Table2 z WHERE z.CNST_CONSENT_ID = c.CNST_CONSENT_ID)

)

PRINT ' Result 0'

You may need to play around with the exist/not exists to get the logic you want. Oh, and you want to code this so that it DOES HIT very often/quickly (assuming that is applicable here) - that way the EXISTS logic can short-circuit and exit early in the query.

Once you get the query right, how does it perform?

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 9

hunchback (3/17/2014)
Try using NOT EXISTS instead NOT IN.
if exists(
select *
from Responses r
where not exists (
select *
from data.GamingReport_Computerized as t
where t.ResponseID = r.ResponseID
)
)
begin select 'missing rows will be inserted' end
else begin select 'update will be skipped to save time' end
It will be helpful having an index by ResponseID in each table if you do not have one.

I've found that there are typically no differences in performance between NOT EXISTS and NOT IN. Lot's of tests have been run on this site that demonstrate that fact, as well.

As a side bar, I prefer NOT IN because it doesn't require correlation and the sub-query can be tested separately.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

mickyT SSChampion Points: 10360 More actions · Answer 10

Hi

Have added the various query options now and we have a new winner 😀 with a query that makes sense-- Statement 1

SELECT COUNT(*)

FROM Table1 c

WHERE NOT EXISTS ( SELECT 1 FROM Table2 z WHERE z.CNST_CONSENT_ID = c.CNST_CONSENT_ID)

-- Statement 2

IF NOT EXISTS (

SELECT COUNT(*)

FROM Table1 c

WHERE NOT EXISTS ( SELECT 1 FROM Table2 z WHERE z.CNST_CONSENT_ID = c.CNST_CONSENT_ID)

HAVING COUNT(*) > 0

)

PRINT ' Result 0'

-- Statement 3

IF EXISTS (

SELECT COUNT(*)

FROM Table1 c

WHERE EXISTS ( SELECT 1 FROM Table2 z WHERE z.CNST_CONSENT_ID = c.CNST_CONSENT_ID)

HAVING COUNT(*) > 0

)

PRINT ' Result 0'

-- Statement 4

IF (

SELECT COUNT(*)

FROM Table1 c

WHERE NOT EXISTS ( SELECT 1 FROM Table2 z WHERE z.CNST_CONSENT_ID = c.CNST_CONSENT_ID)

) = 0

PRINT ' Result 0'

-- Statement 5

IF NOT EXISTS (

SELECT 1

FROM Table1 c

WHERE NOT EXISTS ( SELECT 1 FROM Table2 z WHERE z.CNST_CONSENT_ID = c.CNST_CONSENT_ID)

)

PRINT ' Result 0'

-- Statement 6

IF EXISTS (

SELECT 1

FROM Table1 c

WHERE EXISTS ( SELECT 1 FROM Table2 z WHERE z.CNST_CONSENT_ID = c.CNST_CONSENT_ID)

)

PRINT ' Result 0'

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 11

I suspect I know why, but don't have time to write an essay tonight (and that's what is required to explain it)

The EXISTS adds a row goal of 1 into the plan (because EXISTS only needs to see whether there's a row or not). Paul White's blog has at least one post on Row Goals

To see if that is the cause, does this query have the same odd performance characteristic as the IF EXISTS... version?

select TOP(1) 1 from Responses r where r.ResponseID not in (

select ResponseID from data.GamingReport_Computerized

)

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 12

Cool! So it looks like the EXISTS/EXISTS is able to very quickly exit with a hit (8 reads). Oh, I forgot to ask - are your plans estimated or actual?

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

mickyT SSChampion Points: 10360 More actions · Answer 13

The plans are actual.

And here is Paul's blog that Gail referred to.

triynko22 SSC Veteran Points: 286 More actions · Answer 14

mickyT: Which query do you think is a "winner", because those are not logically equivalent queries to my original query. For example, query 6 is using an "exists" in the where clause, which is incorrect. Of course that will short circuit because there exists 2.3 million matching rows. We're looking for existence of non-matching rows (i.e. where not exists or where ResponseID is null in the 2nd table) When you change it to "not exists" (since we're looking for rows that exist in one table but not the other), then it exhibits the exact same performance issue.

GilaMonster: The "top(1)" query you suggested does exhibit the same query plan and same performance issue.