NOT IN query very expensive, 100% CPU

  • Hello all. Hopefully someone will be able to help me out with this query.

    The below query takes 42 minutes, returns 1 row and sits at 100% CPU the entire time.

    SELECT table1.column1,'99',left(table1.column2,2),max(table1.column3),

    'Yes', 'Test'

    FROM table1

    WHERE ltrim(rtrim(table1.column1))+ltrim(rtrim(left(table1.column2,2)))

    NOT IN (SELECT DISTINCT ltrim(rtrim(table2.column1))+ltrim(rtrim(left(table2.column2,2)))

    FROM table2)

    AND table1.column4 = 'Yes'

    GROUP BY table1.column1,left(table1.column2,2)

    When I break it into the 2 componants like below, they both finish in < 2 seconds.

    Part 1 - 1 seconds, 15,000 rows

    SELECT ltrim(rtrim(table1.column1))+ltrim(rtrim(left(table1.column2,2))) as test, table1.column1,'99',left(table1.column2,2),max(table1.column3), 'Yes', 'Test'

    FROM table1

    WHERE table1.column4 = 'Yes'

    GROUP BY table1.column1,left(table1.column2,2)

    1 second 15,000 rows

    Part 2 - 2 seconds, 19,000 rows

    SELECT DISTINCT ltrim(rtrim(table2.column1))+ltrim(rtrim(left(table2.column2,2)))

    FROM table2

    2 seconds 19,000 rows

    What is wrong with the first query that causes it to take so long? It's really just a combination of the bottom 2 queries.

    Thanks!

  • Try using not exists(= boolean) instead of not in(= iteration) 🙂

    "We never plan to Fail, We just fail to plan":)

  • Another option:

    SELECTtable1.column1,

    '99',

    left(table1.column2,2),

    max(table1.column3),

    'Yes',

    'Test'

    FROMtable1

    LEFT

    JOINtable2 ON ltrim(rtrim(table1.column1))+ltrim(rtrim(left(table1.column2,2)))

    = ltrim(rtrim(table2.column1))+ltrim(rtrim(left(table2.column2,2)))

    WHERE table2.column1 IS NULL

    AND table1.column4 = 'Yes'

    GROUP BY table1.column1,left(table1.column2,2)

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • SELECT table1.column1,

    '99',

    left(table1.column2, 2),

    max(table1.column3),

    'Yes',

    'Test'

    FROM table1

    WHERE NOT EXISTS ( SELECT 1

    FROM TABLE2 WHERE ltrim(rtrim(table2.column1)) + ltrim(rtrim(left(table2.column2, 2)))=

    ltrim(rtrim(table1.column1)) + ltrim(rtrim(left(table1.column2, 2)))

    )

    AND table1.column4 = 'Yes'

    GROUP BY table1.column1, Left(table1.column2, 2)

  • carlosaamaral (9/11/2012)


    SELECT table1.column1,

    '99',

    left(table1.column2, 2),

    max(table1.column3),

    'Yes',

    'Test'

    FROM table1

    WHERE NOT EXISTS ( SELECT 1

    FROM TABLE2 WHERE ltrim(rtrim(table2.column1)) + ltrim(rtrim(left(table2.column2, 2)))=

    ltrim(rtrim(table1.column1)) + ltrim(rtrim(left(table1.column2, 2)))

    )

    AND table1.column4 = 'Yes'

    GROUP BY table1.column1, Left(table1.column2, 2)

    The reason the original (and yours) are slow is because they are not SARGable. The where clause have functions in them which renders the indexes unusable and causes a table/index scan. The left join solution that Luis presented should prove to perform far better.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Is the ltrim and rtrim necessary? SQL ignores trailing spaces when comparing string columns, if you have leading spaces it's probably better to clean up the data if possible.

    If you can get rid of the functions (they're often used where not necessary), then use not exists or not in (the left join is slightly slower usually). Also consider that there's a behavioural difference between not in and not exists when nulls are involved (on non-nullable columns they perform identically)

    p.s. You don't need a DISTINCT in an IN or NOT IN subquery.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • carlosaamaral (9/11/2012)


    SELECT table1.column1,

    '99',

    left(table1.column2, 2),

    max(table1.column3),

    'Yes',

    'Test'

    FROM table1

    WHERE NOT EXISTS ( SELECT 1

    FROM TABLE2 WHERE ltrim(rtrim(table2.column1)) + ltrim(rtrim(left(table2.column2, 2)))=

    ltrim(rtrim(table1.column1)) + ltrim(rtrim(left(table1.column2, 2)))

    )

    AND table1.column4 = 'Yes'

    GROUP BY table1.column1, Left(table1.column2, 2)

    I believe the only reason to use concatenation was IN construction.

    NOT EXISTS does not need that:

    SELECT table1.column1,

    '99',

    left(table1.column2, 2),

    max(table1.column3),

    'Yes',

    'Test'

    FROM table1

    WHERE NOT EXISTS ( SELECT *

    FROM TABLE2

    WHERE table2.column1 = table1.column1

    and left(table2.column2, 2) = left(table1.column2, 2)

    )

    AND table1.column4 = 'Yes'

    GROUP BY table1.column1, Left(table1.column2, 2)

    I removed trims as most likely they are not required.

    RTRIM is not needed for sure, and LTRIM looks like just a sign of a bad habit.

    Please check your data to make sure my assumption is correct.

    _____________
    Code for TallyGenerator

  • Thanks everyone. I learned a lot here. I went with the NOT EXISTS and it finishes in 2 seconds now. Very nice!

  • scogeb (9/12/2012)


    Thanks everyone. I learned a lot here. I went with the NOT EXISTS and it finishes in 2 seconds now. Very nice!

    Have you tried the NOT IN without the DISTINCT?

    Also, what's with all the LTRIM/RTRIM stuff? Data should be properly stored without leading spaces and trailing spaces hardly ever matter except when determining the actual number of bytes a strinng may contain.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • @celko

    You see that bit of text on the left, under the username - It says Forum Newbie.

    stop shouting at people who are inexperienced and making them feel like idiots. If you did it in the office you would be up on a harrassment charge - so consider yourself on one here as well.

    There is a big difference between passing on knowlege and showing off. Your posts indicate the latter and no one is impressed.

    I am bored of reading your rants at badly formed posts from new forum members and the spewing of technical answers from books which bear no relation to the question being asked.

    Please moderate your language or I will ask the board moderators to do it for you. I am sure your knowlege will be useful here, but not your attitude.

    Aaron

  • ...

    If you did it in the office...

    Looks like he is not working in the "office". He is a TEACHER!

    I just can only image that his students love him to the bits, especially when they start to behave as their "teacher" and expose his bright ideas when trying to get a job - they must have huge success.

    If some one would tell me on T-SQL interview that "We, SQL developers, do not use IDENTITY as it's just MS SQL Server proprietary feature and it's completely crap candidate for PK", I would just ask one more question (just for having some fun): "What other crap and proprietary features of SQL Server you don't use?" then thanked candidate for attendance and get him out of office.

    At the same time, may be his students are trained to be trainers? That is fine...

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Jeff Moden (9/12/2012)


    Have you tried the NOT IN without the DISTINCT?

    Also, what's with all the LTRIM/RTRIM stuff? Data should be properly stored without leading spaces and trailing spaces hardly ever matter except when determining the actual number of bytes a strinng may contain.

    Yep, same deal with NOT IN without the DISTINCT, 100% CPU for way too long.

    Who knows with the LTRIM/RTRIM. This was set up before my time here. We're trying to clean it up though. Trust me, I know it's a mess. 🙂 Take for instance the left(table2.column2,2), if you're going to use the left 2 spaces for stuff all the time, why not create another column and index it, right? I'm not sure what the original developers were thinking. Probably just wanted to get it done and go home/didn't care.

    Thanks for the help!

  • scogeb (9/13/2012)


    Jeff Moden (9/12/2012)


    Have you tried the NOT IN without the DISTINCT?

    Also, what's with all the LTRIM/RTRIM stuff? Data should be properly stored without leading spaces and trailing spaces hardly ever matter except when determining the actual number of bytes a strinng may contain.

    Yep, same deal with NOT IN without the DISTINCT, 100% CPU for way too long.

    Who knows with the LTRIM/RTRIM. This was set up before my time here. We're trying to clean it up though. Trust me, I know it's a mess. 🙂 Take for instance the left(table2.column2,2), if you're going to use the left 2 spaces for stuff all the time, why not create another column and index it, right? I'm not sure what the original developers were thinking. Probably just wanted to get it done and go home/didn't care.

    Thanks for the help!

    What would really help us help you is if you would provide the DDL (CREATE TABLE statement) for the tables involved including the indexes currently defined on the tables. Some sample data (as INSERT INTO statements) for the tables, and the actual execution plan for the query (if possible) as a .sqlplan file.

    Right now, you are getting some help, but basically just shots in the dark.

  • Well I did some more digging and it looks like the ltrim/rtrim is doing something because I get different results if I take them out. From what I gathered so far table1.column1 has trailing spaces in some cases and it looks like that causes the differences. Although there are quit a few nulls in column1 and column2 in both tables as well. It really is a mess. I question the data validation on this as well as I think the program wasn't written very well and now we have to deal with it.

  • Lynn Pettis (9/13/2012)


    What would really help us help you is if you would provide the DDL (CREATE TABLE statement) for the tables involved including the indexes currently defined on the tables. Some sample data (as INSERT INTO statements) for the tables, and the actual execution plan for the query (if possible) as a .sqlplan file.

    Right now, you are getting some help, but basically just shots in the dark.

    Your shots in the dark have helped greatly! I think I'm good on this one. Thanks a ton!

Viewing 15 posts - 1 through 15 (of 36 total)

You must be logged in to reply to this topic. Login to reply