NOT IN query very expensive, 100% CPU

  • scogeb (9/13/2012)


    Jeff Moden (9/12/2012)


    Have you tried the NOT IN without the DISTINCT?

    Also, what's with all the LTRIM/RTRIM stuff? Data should be properly stored without leading spaces and trailing spaces hardly ever matter except when determining the actual number of bytes a strinng may contain.

    Yep, same deal with NOT IN without the DISTINCT, 100% CPU for way too long.

    Yeah, it will be. SQL ignores DISTINCT when specified in an IN/NOT IN subquery.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Can you try this:

    SELECT column1,'99' [a],left(column2,2) , max(column3) [c], 'Yes' [d], 'Test' [e]

    ,ltrim(rtrim(column1))+ltrim(rtrim(left(column2,2))) as test

    INTO #pre1

    FROM table1

    WHERE column4 = 'Yes'

    GROUP BY column1, left(column2,2)

    SELECT DISTINCT ltrim(rtrim(column1))+ltrim(rtrim(left(column2,2))) as test

    INTO #pre2

    FROM table2

    SELECT p1.column1, p1.a, p1.b, p1.c, p1.d, p1.e

    FROM #pre1 p1

    WHERE NOT EXISTS(SELECT 1 FROM #pre2 p2 WHERE p2.test = p1.test)

    You may also try creating unique clustered index on #pre2 (test)

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • GilaMonster (9/13/2012)


    scogeb (9/13/2012)


    Jeff Moden (9/12/2012)


    Have you tried the NOT IN without the DISTINCT?

    Also, what's with all the LTRIM/RTRIM stuff? Data should be properly stored without leading spaces and trailing spaces hardly ever matter except when determining the actual number of bytes a strinng may contain.

    Yep, same deal with NOT IN without the DISTINCT, 100% CPU for way too long.

    Yeah, it will be. SQL ignores DISTINCT when specified in an IN/NOT IN subquery.

    Thanks, Gail. I learned something new today.

    Just to verify, I tried it a couple of different ways. On larger tables, it'll sometimes add a "Sort (Distinct Sort)" to the execution plan if the optimizer thinks it will help but I couldn't get it to add such a sort using DISTINCT on smaller tables.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • http://sqlinthewild.co.za/index.php/2011/01/18/distincting-an-in-subquery/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • In this case:

    distinct and no where clause (full scan?!?!??!?! almost... Very slow!!)

    options: NOT EXISTS ... sometimes use: (SELECT TOP 1 ....)

    Query analyser and execution plan (performance, index - helps a lot).

    Thanks,

    Carlos

    P.S.: Sorry, my english is very, very BAD !!!

    long time just watching ..... I'll try to write ...

    NEED TO ASK, PARTICIPATE: NOLOCK, JOINS, MODELING ...

    I used Oracle for several years. I'm using SQLSERVER 1 YEAR AGO, I'm enjoying it ..

  • Eugene Elutin (9/13/2012)


    Can you try this:

    SELECT column1,'99' [a],left(column2,2) , max(column3) [c], 'Yes' [d], 'Test' [e]

    ,ltrim(rtrim(column1))+ltrim(rtrim(left(column2,2))) as test

    INTO #pre1

    FROM table1

    WHERE column4 = 'Yes'

    GROUP BY column1, left(column2,2)

    SELECT DISTINCT ltrim(rtrim(column1))+ltrim(rtrim(left(column2,2))) as test

    INTO #pre2

    FROM table2

    SELECT p1.column1, p1.a, p1.b, p1.c, p1.d, p1.e

    FROM #pre1 p1

    WHERE NOT EXISTS(SELECT 1 FROM #pre2 p2 WHERE p2.test = p1.test)

    You may also try creating unique clustered index on #pre2 (test)

    Yep, that works too and I get accurate results in 2 seconds or so. I was under the impression it's best to avoid temp tables if you can though. Is that not accurate?

  • scogeb (9/13/2012)


    Eugene Elutin (9/13/2012)


    Can you try this:

    SELECT column1,'99' [a],left(column2,2) , max(column3) [c], 'Yes' [d], 'Test' [e]

    ,ltrim(rtrim(column1))+ltrim(rtrim(left(column2,2))) as test

    INTO #pre1

    FROM table1

    WHERE column4 = 'Yes'

    GROUP BY column1, left(column2,2)

    SELECT DISTINCT ltrim(rtrim(column1))+ltrim(rtrim(left(column2,2))) as test

    INTO #pre2

    FROM table2

    SELECT p1.column1, p1.a, p1.b, p1.c, p1.d, p1.e

    FROM #pre1 p1

    WHERE NOT EXISTS(SELECT 1 FROM #pre2 p2 WHERE p2.test = p1.test)

    You may also try creating unique clustered index on #pre2 (test)

    Yep, that works too and I get accurate results in 2 seconds or so. I was under the impression it's best to avoid temp tables if you can though. Is that not accurate?

    Sometimes divided and conquer works better than trying to do it all in one.

  • ...

    Yep, that works too and I get accurate results in 2 seconds or so. I was under the impression it's best to avoid temp tables if you can though. Is that not accurate?

    Who gave you such impression or where did you take it from?

    Quite often splitting tasks using temp tables will not only significantly boost performance of complicated processes, but also increase readability and therefore maintainability of code, simplifying testing and debugging.

    I'm trying to use the following simple rule:

    Use available technique where it is appropriate and produces best outcome. Don't use "best common practices" where they are irrelevant. :hehe:

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Eugene Elutin (9/13/2012)


    Who gave you such impression or where did you take it from?

    Quite often splitting tasks using temp tables will not only significantly boost performance of complicated processes, but also increase readability and therefore maintainability of code, simplifying testing and debugging.

    I'm trying to use the following simple rule:

    Use available technique where it is appropriate and produces best outcome. Don't use "best common practices" where they are irrelevant. :hehe:

    Various websites over time. Something about temp tables using tempdb which requires more overhead on the server. Perhaps things have changed though with newer versions of SQL.

  • scogeb (9/13/2012)


    Eugene Elutin (9/13/2012)


    Who gave you such impression or where did you take it from?

    Quite often splitting tasks using temp tables will not only significantly boost performance of complicated processes, but also increase readability and therefore maintainability of code, simplifying testing and debugging.

    I'm trying to use the following simple rule:

    Use available technique where it is appropriate and produces best outcome. Don't use "best common practices" where they are irrelevant. :hehe:

    Various websites over time. Something about temp tables using tempdb which requires more overhead on the server. Perhaps things have changed though with newer versions of SQL.

    Browsing internet is not safe nowadays, there are various dangerous websites around where something bad can happen, be careful:hehe:...

    Yeah, quite few things have changed since SQL6.5

    As with everything else: you need to know how to use it properly (in regards to tempdb it includes how to configure it properly)

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • CELKO (9/13/2012)


    SQL ignores DISTINCT when specified in an IN/NOT IN subquery.

    As an aside, some other SQLs have special optimizations for the IN (<expression list>) predicates. Most of the time, a simple sequential search is just fine because the list is short. The trick in old Oracle was to sort the list by likelihood of matching, but in reverse order because of the way the compiler worked. DB2 would sort the list and do a binary tree search at a certain size. Then at a much larger size, it built a hash table. A trick was to know the critical point, add a dummy value or two and get this to kick in.

    And this coming from Mr. Only-write-standard_SQL, actually extorting the value of using known quirks in the target RDBMS? I'm sorry, but did I just hear that a snowball isn't melting somewhere?

  • CELKO (9/13/2012)


    You see that bit of text on the left, under the username - It says Forum Newbie.

    Which is why I took the time to go into details with him. People seldom make random errors; they usually have the wrong mindset or incomplete information. They carry assumptions and are not even aware of it.

    stop shouting at people who are inexperienced and making them feel like idiots. If you did it in the office you would be up on a harassment charge - so consider yourself on one here as well.

    That is a silly analogy! Nobody is getting paid here. We do not have a common business goal. And you are not a manager who can put me on notice. If someone feels that they are being harassed then they can reply directly to me. Do I ever call anyone stupid? I tell them they are doing it wrong. I quote standards and fundamentals that people need to know and use if they want to work in databases. What do you do?

    What bothers me is that we have a boiler plate about how to post to forums which is ignored. There is a discussion in the “anything but SQL” section on how the posting are getting dumber. I think it is true.

    I am bored of reading your rants at badly formed posts from new forum members and the spewing of technical answers from books which bear no relation to the question being asked.

    You never took a class under Ed Dijkstra or Jim McCarthy, did you? 🙂

    Actually, you have insinuated that some of the OPs are either stupid, ignorant, or both. You have also advocated violence against those you have called morons for designing shoddy databases.

    Basically, you have in many cases demonstrated a serious lack of professionalism.

    If you don't care how you are perceived, that's fine, but this also reflects on this site as well. How about showing us all that you can actually behave in a professional manner. If not, perhaps this site really isn't for you.

  • CELKO (9/13/2012)


    Looks like he is not working in the "office". He is a TEACHER! I just can only image that his students love him to the bits, especially when they start to behave as their "teacher" and expose his bright ideas when trying to get a job - they must have huge success.

    I do mostly industrial training and sometimes I am an invited lecturer at Universities; my students already have a job. But they are not doing it as well as they need to and they know it. The last day is to bring in problem queries or DDL, and dissect it as a class. My “bright ideas” have been improving databases for decades now.

    If some one would tell me on T-SQL interview that "We, SQL developers, do not use IDENTITY as it's just MS SQL Server proprietary feature and it's completely crap candidate for PK", I would just ask one more question (just for having some fun): "What other crap and proprietary features of SQL Server you don't use?" then thanked candidate for attendance and get him out of office.

    Sorry, but IDENTITY is not a candidate key at all. But why would you actively try to program with proprietary features? Want to go back and pull out “LEFT OUTER JOIN” to replace it with *= (well, would give you job security)?

    Better interview questions would how to convert ISNULL() to COALESCE(), how to use the DATE and TIME data types, how UTC works, do a relational division, use the OVER() subclause, etc.

    There was a short piece in the trade press about a job interview which involved debugging a bit of code. The interviewee knocked it down to a fraction of the original size and improved performance. But he did not know this was production code written by the interviewers.

    Did he get the job or did they simply say thank you and send him on his way?

  • ...

    Sorry, but IDENTITY is not a candidate key at all. But why would you actively try to program with proprietary features?

    May be here you could finally suggest a good candidate natural keys for two simple entities:

    1. Person - where person may not be US citizen, but from any country, and not marked by angels with world-wide identifier

    2. Company - where again, company can be a company from any country and actually not listed on any stock exchange.

    Please!!!

    Want to go back and pull out “LEFT OUTER JOIN” to replace it with *= (well, would give you job security)?

    Nope, and I never suggested that. But it has nothing to with IDENTITY feature of SQL 2008 and pre and new Oracle-like sequence "proprietary" feature in SQL 2012.

    Better interview questions would how to convert ISNULL() to COALESCE(), how to use the DATE and TIME data types, how UTC works, do a relational division, use the OVER() subclause, etc.

    I guess there are definitely better topics to discuss on interview than above. It depends on the level of developer required.

    But why on earth would you want to convert ISNULL to COALESCE? What exactly will you gain? Does it worth of money? Please don't tell me about portability. When and if, system will need to be ported from MS SQL Server to other non-SQL Server product, the above will be of the least concern.

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • CELKO (9/13/2012)


    Who knows with the LTRIM/RTRIM. This was set up before my time here. We're trying to clean it up though. Trust me, I know it's a mess. 🙂 Take for instance the left(table2.column2,2), if you're going to use the left 2 spaces for stuff all the time, why not create another column and index it, right? I'm not sure what the original developers were thinking. Probably just wanted to get it done and go home/didn't care.

    After 3 decades of cleaning up crap SQL, I think it is COBOL. Even if the guy never wrote a line of COBOL in his life. In that language each record (not row) is a string of characters processed from left to right. Each program read the file pulls out the fields (not columns) it needs. If you do not need everything in the record, you use FILLER to skip over it.

    You also left extra space for growth, so that when a field grew, you were ready. Greater for punch cards and mag tape, but not so good for SQL. American Honda had programs that only allowed 10,000 dealerships and going back to expand files was a bitch.

    Well, as a developer that used COBOL I take offense at your statements. Yes, I know all about FILLER (used it to pad "tables" to allow for future growth without have to rebuild them constantly as fields were added), SAME RECORD AREA, and all sorts of other fun stuff.

    I don't build my SQL databases the way I had to with COBOL because I know better. I will, however, use the terms row and record and column and field interchangeably, because they do relate to each other to me at those levels. Especially since I spent a lot of time and effort transfering data back and forth between the COBOL "databases" (ISAM to be exact) and a SQL Server database.

Viewing 15 posts - 16 through 30 (of 36 total)

You must be logged in to reply to this topic. Login to reply