2005 optimizer behaving badly

  • We migrated our db from 2000 to 2005 recently, and noticed some queries running much longer on the new platform. We've narrowed down the core issue to a ltrim(x) function in the where clause. With this function, the query takes 3+ minutes to run in 2005. Taking out this function allows the query to run in 1 second, as it does in 2000.

    Further investigation via showplan revealed that 2000 was performing a hash join to complete the query, while 2005 (WITH ltrim(x) function) was using a nested loop. If the ltrim(x) function was removed on the 2005 query, the showplan revealed that it was now using hash to join... and performed in 1 second.

    One step further... if I use an "option (hash join)" hint on the 2005 query, it returns results in 1 second regardless of the ltrim(x) function.

    Also, when the nested-loop plan is in effect, the query plan indicates that there is an index scan on the non-clustered index that covers MCTN_ID (below) as expected. However, while it's churning away, the only locks are on the table itself, with no key locks at all referencing the index. This leads me to believe that eve though it's showing "index scan", it's actually doing a table scan. The number of reads is 3.3million as opposed to 12000 when it runs without ltrim().

    So... the core issue looks to be that SQL 2005 is developing an execution plan that is incorrect when the ltrim(x) function is in the query, but correct when the ltrim(x) function is removed.

    As an additional point, rtrim(), upper(), lower() functions don't negatively affect the query... only ltrim().

    Here's the query:

    select *

    FROM

    table1 t1

    WHERE

    t1.REC_TYPE = 'N'

    AND NOT EXISTS (

    SELECT 1

    FROM

    table2 t2

    WHERE

    ltrim(rtrim(t2.MCTN_ID)) = t1.CLPR_TAX_ID

    )

    -- option (hash join) -- this forces hash join, and "fixes" the issue.

    table1 is 755,000 records

    table2 is 60,000 records

    Are there any optimizer-specific configuration settings that may be tweaked to emulate the way 2000 does it ?

    Or, does this sound like a bug?

    Thanks

    -Allen

  • There are some cases where the 2005 optimiser produces worse plans to the 2000 one. Short of a rewrite or query hints, there's not way to force the optimiser to work as it did in 2000. It's not technically a bug.

    That said, have you done an update stats on all tables? The statistics kept by SQL 2005 are more detailed than sql 2000. The 2005 optimiser can use the 2000 stats, but not well.

    An index scan is a complete scan of all pages in that index, it's not a table scan, but it is very similar (table scan reads all the leaf pages of the table)

    With the functions on the column, there's no way you'll get an index seek. It has to scan. I suspect it has something to do with the cardinality estimates. With the rtrim only, the optimiser can guess the rows affected fairly well, with ltrim, it can't and makes a guess, resulting in the poor join choice.

    Is the ltrim necessary? Is there any chance that you can trim the data in the table? (or are the leading spaces meaningful?)

    If it is, and this query is often run, is there any chance that you can create a computed column, index that and join on that?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks for the reply.

    Yes, the stats have been updated fully several times for these tables.

    Interesting to note: I can force both a hash join and a loop join on both 2000 and 2005. When both 2000 and 2005 are using a nested loop, the 2000 details of the index scan shows "estimated record count" as 755,050 as expected. However, the 2005 details of the index scan shows "estimated record count" as 1. Could it be that this is why it's choosing the nested loop over the hash join...?

    I understand that changing code is an option, but in our case, it's difficult to justify modification of several hundred sprocs unless we have reason to believe that the code is actually "bad design" in the first place.

    What concerns me more is that this is only one of the "bugs" identified so far with this upgrade. If this is an issue with upgrading to 2005 that Microsoft isn't acknowledging, there could easily be many more that we're not aware of.

    I spent Friday on the phone with Microsoft's performance team and they acknowledged that it is definitely a difference in the way the optimizers handle the code, and initially offered a code change as the only solution. They're working now to recreate the problem in their environment, so we'll see how that goes.

  • Allen Krehbiel (9/29/2008)


    Interesting to note: I can force both a hash join and a loop join on both 2000 and 2005. When both 2000 and 2005 are using a nested loop, the 2000 details of the index scan shows "estimated record count" as 755,050 as expected. However, the 2005 details of the index scan shows "estimated record count" as 1. Could it be that this is why it's choosing the nested loop over the hash join...?

    Very likely. It means that the row count estimate is way wrong. So far out and the optimiser's going to make a bad choice.

    Any chance you could post exec plan here? (saved as a .sqlplan file, zipped and attached)

    What concerns me more is that this is only one of the "bugs" identified so far with this upgrade. If this is an issue with upgrading to 2005 that Microsoft isn't acknowledging, there could easily be many more that we're not aware of.

    It's well documented that the optimiser has changed and that some queries may perform worse on 2005 than on 2000. No one's keeping secrets or denying anything. The majority of queries perform better.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • - Did you also rebuild all indexes after your migration process ?

    - replace the ltrim with a like statement if you can.

    - as Gail already pointed to : don't use functions in a where clause because they may have your query not use an index or at least not using it as optimal as you'd expected it to be.

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • ALZDBA (9/29/2008)


    don't use functions in a where clause because they may have your query not use an index or at least not using it as optimal as you'd expected it to be.

    And that may well be causing the poor estimates

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • We had a performance issue with our migration of PeopleSoft Finance from SQL 2000 to SQL 2005. After months of band-aids and bailing wire, out PeopleSoft Administrator found out how to run a SQL Trace on the application side. We found a table with 200,000+ records that wasn't indexed. Under SQL 2000 this DID NOT cause a problem, however, it cause a significant problem on SQL 2005. Once we properly indexed the table in SQL 2005, problem was solved.

    😎

  • What version are you on? I would recommend testing and moving to SP2 + Cummulative Update 6 (or perhaps CU8). There were a BUNCH of regressions in the optimizer fixed by the CUs.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • I get this issue quite often with the data I work with and I've found doing an update statistics helps in most cases. SQL randomly estimates the 1 row and all h3ll brakes loose.

  • bcronce (10/1/2008)


    I get this issue quite often with the data I work with and I've found doing an update statistics helps in most cases. SQL randomly estimates the 1 row and all h3ll brakes loose.

    I don't believe the optimizer 'randomly' estimates 1 row. 🙂 It can DEFINITELY cause performance problems because you will get a nested loop join or index seek/bookmark lookup plan when there could be potentially hundreds of millions of rows affected in a join.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply