Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

Why would wrapping a TSQL query in an if statement increase its runtime significantly? Expand / Collapse
Author
Message
Posted Monday, March 17, 2014 4:12 PM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Monday, April 7, 2014 9:51 AM
Points: 2, Visits: 5
I'm having odd performance difference when running the same query two different ways. Literally, the only difference is whether its wrapped in an if statement.

This query is actually part of a larger one, but I've isolated it as the culprit.

Running the query by itself returns almost immediately (returns 0)

select COUNT(*) from Responses r where r.ResponseID not in (
select ResponseID from data.GamingReport_Computerized
))

Ultimately, I want to avoid running complex calculations based on the result of that query, so I wrap it in an if statement like so, but it takes much longer to run (~10 seconds) and I can't figure out why:

if (0 = (select COUNT(*) from Responses r where r.ResponseID not in (
select ResponseID from data.GamingReport_Computerized
)))
begin select 'update will be skipped to save time' end
else begin select 'missing rows will be inserted' end

The data set does not change so in both cases the result is zero / 'update will be skipped', and yet running these two versions of the same query always results in the first version completing quickly, and the second version taking about 10-12 seconds to complete.

the following query performs identically to the second version above, with the same execution plan and no increase in performance (i.e. using 'exists' instead of comparing count(*) to zero makes no difference).

if exists(select 1 from Responses r where r.ResponseID not in (
select ResponseID from data.GamingReport_Computerized
))
begin select 'missing rows will be inserted' end
else begin select 'update will be skipped to save time' end

Changing the query to use left or right join where data.GamingReport_Computerized.ResponseID is null performs the same way as using 'select where not in'.

It seems that using a select statement inside an if statement is causing a bad query execution plan to form. The only way around this I've found is to first assign the result of count to a variable and then test the variable with an if statement. Any combination that involves using 'if' and 'select' in the same statement results in a bad execution plan with poor performance.

See discussion here as well: http://stackoverflow.com/q/22465078/88409
Post #1551954
Posted Monday, March 17, 2014 5:14 PM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Thursday, August 28, 2014 7:29 AM
Points: 114, Visits: 612
Try using NOT EXISTS instead NOT IN.

if exists(
select *
from Responses r
where not exists (
select *
from data.GamingReport_Computerized as t
where t.ResponseID = r.ResponseID
)
)
begin select 'missing rows will be inserted' end
else begin select 'update will be skipped to save time' end

It will be helpful having an index by ResponseID in each table if you do not have one.



Post #1551961
Posted Monday, March 17, 2014 5:35 PM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Today @ 10:01 AM
Points: 1,787, Visits: 5,697
I have to say "I don't know", but it's interesting and if anyone wants to join in, this AdventureWorks query shows similar symptoms:

select count(*)
from sales.salesorderdetail
where salesorderid not in (select salesorderheader.salesorderid from sales.salesorderheader)

if (select count(*)
from sales.salesorderdetail
where salesorderid not in (select salesorderheader.salesorderid from sales.salesorderheader)
) = 0
print 'count is zero'



MM


  • MMGrid Addin
  • MMNose Addin


  • Forum Etiquette: How to post Reporting Services problems
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • How to Post Performance Problems - by Gail Shaw



  •   Post Attachments 
    Plan1.sqlplan (2 views, 37.64 KB)
    Post #1551963
    Posted Monday, March 17, 2014 8:17 PM


    SSCarpal Tunnel

    SSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal Tunnel

    Group: General Forum Members
    Last Login: Today @ 11:57 AM
    Points: 4,351, Visits: 6,166
    mister.magoo (3/17/2014)
    I have to say "I don't know", but it's interesting and if anyone wants to join in, this AdventureWorks query shows similar symptoms:

    select count(*)
    from sales.salesorderdetail
    where salesorderid not in (select salesorderheader.salesorderid from sales.salesorderheader)

    if (select count(*)
    from sales.salesorderdetail
    where salesorderid not in (select salesorderheader.salesorderid from sales.salesorderheader)
    ) = 0
    print 'count is zero'



    As is often the case, we need to review the query plan. The first query gets an estimated row count of 178.5 coming out of the right-anti-semi-join so it chooses a hash join type. The second query has an estimate of 1 row and chooses a merge right-anti-semi-join.

    Curious is that you can affect the plan type with different number checks on the IF.

    if (select count(*)
    from Sales.SalesOrderDetail
    where SalesOrderID not in (select SalesOrderHeader.SalesOrderID from Sales.SalesOrderHeader)) > 10000
    print 'count is zero'

    That one switches back to the hash join. This could be a bug or feature in the optimizer. However, I note that you should pretty much never ever do a query like this. You are forcing the engine to touch EVERY ROW just to determine if there is one or more rows. That should be an EXIST query.


    Best,

    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru at GMail
    Post #1551975
    Posted Monday, March 17, 2014 9:01 PM
    Ten Centuries

    Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

    Group: General Forum Members
    Last Login: Today @ 1:04 PM
    Points: 1,004, Visits: 3,028
    This is strange, I tried it on one of my databases and got a similar results. Even after creating indexes that I noticed missing. I had the best performance with the following query on my tables.
    IF  EXISTS (
    SELECT COUNT(*)
    FROM table1 c
    WHERE EXISTS ( SELECT 1 FROM table3 z WHERE z.ID = c.ID)
    HAVING COUNT(*) > 0
    )
    PRINT ' Result 0'

    Edit: Remove a tray NOT

    I fudged up a test set in case to try and narrow it down. I could not replicate the performance results.
    I tried this with MAXDOP on the Server set to 0 and 1.

    This is my test set and queries. Though it doesn't really help answer you question
    /*
    CREATE TABLE Table1 (
    ID INT IDENTITY(1,1) PRIMARY KEY
    , N INT NOT NULL
    , Something Varchar(21) DEFAULT 'Put Something it here'
    );
    --CREATE INDEX Tab1_IDX1 ON Table1(ID) --Also tried this rather than a primary key
    INSERT INTO Table1 (N) SELECT TOP 1000000 N FROM Tally

    CREATE TABLE Table2 (
    ID INT NOT NULL
    , Something Varchar(21) DEFAULT 'Put Something it here'
    )
    INSERT INTO Table2 (ID) SELECT TOP 300000 N + 500 N FROM Tally
    INSERT INTO Table2 (ID)SELECT TOP 300000 N + 1000 N FROM Tally
    INSERT INTO Table2 (ID)SELECT TOP 300000 N + 1500 N FROM Tally
    INSERT INTO Table2 (ID)SELECT TOP 300000 N + 2000 N FROM Tally
    CREATE INDEX Tab2_IDX1 ON Table2(ID)

    CREATE TABLE Table3 (
    ID INT NOT NULL
    , Something Varchar(21) DEFAULT 'Put Something it here'
    )
    INSERT INTO Table3 (ID) SELECT TOP 500000 N FROM Tally
    INSERT INTO Table3 (ID) SELECT TOP 500000 N + 250000 N FROM Tally
    INSERT INTO Table3 (ID) SELECT TOP 500000 N + 500000 N FROM Tally
    CREATE INDEX Tab3_IDX1 ON Table3(ID)
    */

    DECLARE @c int
    DECLARE @d DATETIME2 = GETDATE()

    PRINT '--------------'
    SELECT @c = COUNT(*)
    FROM table1 c
    WHERE c.ID NOT IN ( SELECT ID FROM table2)
    PRINT @c

    PRINT 'Straight Query - Not 0'
    PRINT DATEDIFF(ms,@d,GETDATE())
    PRINT '--------------'
    SET @d = GETDATE()

    PRINT '--------------'
    IF (
    SELECT COUNT(*)
    FROM table1 c
    WHERE c.ID NOT IN ( SELECT ID FROM table2)
    ) = 0
    PRINT ' Result 0'

    PRINT 'IF with NOT IN - Not 0'
    PRINT DATEDIFF(ms,@d,GETDATE())
    PRINT '--------------'
    SET @d = GETDATE()


    PRINT '--------------'
    IF (
    SELECT count(*)
    FROM table1 c
    WHERE NOT EXISTS ( SELECT 1 FROM table2 z WHERE z.ID = c.ID)
    ) = 0
    PRINT ' Result 0'

    PRINT 'IF with NOT EXISTS - Not 0'
    PRINT DATEDIFF(ms,@d,GETDATE())
    PRINT '--------------'
    SET @d = GETDATE()

    PRINT '--------------'
    IF EXISTS (
    SELECT COUNT(*)
    FROM table1 c
    WHERE EXISTS ( SELECT 1 FROM table2 z WHERE z.ID = c.ID)
    HAVING COUNT(*) > 0
    )
    PRINT ' Result 0'

    PRINT 'IF with HAVING - Not 0'
    PRINT DATEDIFF(ms,@d,GETDATE())
    PRINT '--------------'
    SET @d = GETDATE()

    PRINT '--------------'
    SELECT @c = COUNT(*)
    FROM table1 c
    WHERE c.ID NOT IN ( SELECT ID FROM table3)
    PRINT @c


    PRINT 'Straight Query NOT IN - 0'
    PRINT DATEDIFF(ms,@d,GETDATE())
    PRINT '--------------'
    SET @d = GETDATE()

    PRINT '--------------'
    IF (
    SELECT count(*)
    FROM table1 c
    WHERE c.ID NOT IN ( SELECT ID FROM table3)
    ) = 0
    PRINT ' Result 0'

    PRINT 'IF with NOT IN - 0'
    PRINT DATEDIFF(ms,@d,GETDATE())
    PRINT '--------------'
    SET @d = GETDATE()

    PRINT '--------------'
    IF (
    SELECT count(*)
    FROM table1 c
    WHERE NOT EXISTS ( SELECT 1 FROM table3 z WHERE z.ID = c.ID)
    ) = 0
    PRINT ' Result 0'

    PRINT 'IF with NOT EXISTS - 0'
    PRINT DATEDIFF(ms,@d,GETDATE())
    PRINT '--------------'
    SET @d = GETDATE()

    PRINT '--------------'
    IF EXISTS (
    SELECT COUNT(*)
    FROM table1 c
    WHERE EXISTS ( SELECT 1 FROM table3 z WHERE z.ID = c.ID)
    HAVING COUNT(*) > 0
    )
    PRINT ' Result 0'

    PRINT 'IF with HAVING - 0'
    PRINT DATEDIFF(ms,@d,GETDATE())
    PRINT '--------------'

    /*
    DROP TABLE Table1
    DROP TABLE Table2
    DROP TABLE Table3
    */

    Post #1551982
    Posted Monday, March 17, 2014 9:08 PM


    SSCarpal Tunnel

    SSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal Tunnel

    Group: General Forum Members
    Last Login: Today @ 11:57 AM
    Points: 4,351, Visits: 6,166
    If truly boggles my mind why an EXISTS with a COUNT(*) and HAVING is faster than a straight EXISTS check. That makes absolutely no sense whatsoever!

    Best,

    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru at GMail
    Post #1551983
    Posted Monday, March 17, 2014 11:00 PM
    Ten Centuries

    Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

    Group: General Forum Members
    Last Login: Today @ 1:04 PM
    Points: 1,004, Visits: 3,028
    TheSQLGuru (3/17/2014)
    If truly boggles my mind why an EXISTS with a COUNT(*) and HAVING is faster than a straight EXISTS check. That makes absolutely no sense whatsoever!

    I agree and I was a bit loathed to post it to be honest. That was why I also mocked up some test data. It didn't perform the same on that. Tomorrow I'm going to take a closer look at the tables that I did the initial test on and try and make sense of it.
    I'll try and get some actual plans posted when I do. The only reason I tried it was I thought it might have an early opt out due to the having.
    Post #1551995
    Posted Tuesday, March 18, 2014 1:06 PM
    Ten Centuries

    Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

    Group: General Forum Members
    Last Login: Today @ 1:04 PM
    Points: 1,004, Visits: 3,028
    Hi

    The tables I were querying from have the following:
    Table1 1877 unique ids with a non clustered index.
    Table2 38438 joining ids, avg of 20 rows per ID up to a maximum of 216 rows, non clustered index.
    Statistics all updated with fullscan.

    I ran the following querys
    --Statement1
    SELECT COUNT(*)
    FROM Table1 c
    WHERE NOT EXISTS ( SELECT 1 FROM Table2 z WHERE z.CNST_CONSENT_ID = c.CNST_CONSENT_ID)

    --Statement2
    IF NOT EXISTS (
    SELECT COUNT(*)
    FROM Table1 c
    WHERE NOT EXISTS ( SELECT 1 FROM Table2 z WHERE z.CNST_CONSENT_ID = c.CNST_CONSENT_ID)
    HAVING COUNT(*) > 0
    )
    PRINT ' Result 0'

    --Statement3
    IF (
    SELECT COUNT(*)
    FROM Table1 c
    WHERE NOT EXISTS ( SELECT 1 FROM Table2 z WHERE z.CNST_CONSENT_ID = c.CNST_CONSENT_ID)
    ) = 0
    PRINT ' Result 0'

    Here's the general stats

    For Statement 3 the estimated rows are way south of the actual rows. The net effect is that Statement 3 ends up doing way more reads than it needs to.


      Post Attachments 
    EXISTS_HAVING.sqlplan (0 views, 16.37 KB)
    NOT_EXISTS_EQUALS.sqlplan (0 views, 11.19 KB)
    stats.png (176 views, 9.08 KB)
    BASELINE.sqlplan (0 views, 10.07 KB)
    Post #1552368
    Posted Tuesday, March 18, 2014 1:54 PM


    SSCarpal Tunnel

    SSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal Tunnel

    Group: General Forum Members
    Last Login: Today @ 11:57 AM
    Points: 4,351, Visits: 6,166
    But your 3 samples are missing the important 4th sample:

    IF  NOT EXISTS (
    SELECT * FROM Table1 c
    WHERE NOT EXISTS ( SELECT 1 FROM Table2 z WHERE z.CNST_CONSENT_ID = c.CNST_CONSENT_ID)
    )
    PRINT ' Result 0'

    You may need to play around with the exist/not exists to get the logic you want. Oh, and you want to code this so that it DOES HIT very often/quickly (assuming that is applicable here) - that way the EXISTS logic can short-circuit and exit early in the query.

    Once you get the query right, how does it perform?


    Best,

    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru at GMail
    Post #1552386
    Posted Tuesday, March 18, 2014 2:04 PM


    SSC-Dedicated

    SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

    Group: General Forum Members
    Last Login: Yesterday @ 9:58 AM
    Points: 36,995, Visits: 31,517
    hunchback (3/17/2014)
    Try using NOT EXISTS instead NOT IN.

    if exists(
    select *
    from Responses r
    where not exists (
    select *
    from data.GamingReport_Computerized as t
    where t.ResponseID = r.ResponseID
    )
    )
    begin select 'missing rows will be inserted' end
    else begin select 'update will be skipped to save time' end

    It will be helpful having an index by ResponseID in each table if you do not have one.


    I've found that there are typically no differences in performance between NOT EXISTS and NOT IN. Lot's of tests have been run on this site that demonstrate that fact, as well.

    As a side bar, I prefer NOT IN because it doesn't require correlation and the sub-query can be tested separately.


    --Jeff Moden
    "RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

    (play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

    Helpful Links:
    How to post code problems
    How to post performance problems
    Post #1552389
    « Prev Topic | Next Topic »

    Add to briefcase 12»»

    Permissions Expand / Collapse