Query execution plan different between production/test - same data

  • Well, when Microsoft recommends something - you better stick with it.

     

  • Well, after 2 weeks of running on a new server with the same hardware (dual processors enabled) the performance problem has never returned. I still don't know what the problem was, must have been something in the config or a corrupt install of SQL itself. I sure hope that problem does not come back!

  • Its baaaacccckk. Now it is taking about 3 minutes to run the query. I feel it will creep up to the 15 minute range as it did before. I guess it is time to contact MS.

  • I'd like to ask a few questions.

    When you make the replica of Production database to TEST, did you use dump and load(Backup/Restore) or Object transfer?

    Have you tried this? Since you have 120MB db, you can rebuild the database easily on the same server. Like BCP out to new database and apply Referencial integrity and indexes, etc.

    Also I'd try backup the DB on TEST server and restore as other database name on Production and run to see if you are getting the same result.

    Based on the execution plan, it is not choosing Hash Match for inner join and it seems some previous query make sql statstics to choose wrong plan.. Might as well try to run profile and try to simulate the same on TE as Steve recommends it.

    I've had similar problem while I had SQL7.0 but haven't had the same problem on SQL2k..

  • Thanks for the interest. I backed up prod and restored it to a test server. The slow behavior was not there. So I took the same backup file and restored it to a new DB name in prod (Shopfloor2). Shopfloor2 exhibited the same flawed execution plan as Shopfloor.

    So basically I built up a new server that didn't have the perf problem. Ran on this for the last couple of weeks with no trouble. Now the flawed behavior has come back and it is taking 3 minutes. I have to assume the problem will just get worse as it did before.

    So I revived the old prod server that has sitting around untouched. It no longer exhibits the behavior.

    I have not tried physically rebuilding the database on prod, only restoring from backup. I can give that a try because I'm out of ideas. But if it worked what would my solution be? I've already tried rebuilding the clustered indexes on all tables every night, with varying degrees of free space.

  • My server load include updates, inserts, selects, deletes. Would I just be simulating selects? It seems that the other operations would cause data constraint violations. If I'm just doing selects would it even be a valid test?

    Further advice on this approach is appreciated.

  • This is not recommendable but since your database is small and all other folks suggested good to try and didn't work, this might be worth to try.

    Based on your execution plan, it is faster to choose Hash Match over Nested loops,whish make sense to small database. So why don't you try to use join hint to force to use Hash join.

    SELECT *

    FROM Table_A t1 INNER HASH JOIN Table_B t2 on t1.id = t2.id

    OPTION (maxdop 1) -- This is optional but if w/o might force to make parallelism so I would try w/ and w/o this.

    If that works, this sounds like SQL bug. Are you using the latest service pack for both SQL and Windows?

  • Well, with the inner join syntax the proper efficient execution plan is always chosen so this syntax wouldn't help me. Is there a way to use the hash hint with the 'where t1.field1=t2.field2' type syntax? Even if there is a way, Excel isn't going to start putting those hints in its pivot table queries.

  • Would executing a transaction log from several days of prod activity against the test database work to simulate this load? Just a thought. I'm still confused on how to simulate the traffic since there are updates/inserts/deletes.

  • Applying the log will not simulate the load, but you can create a trace file

    using SQL Profiler and then replay it in test.

    Igor

  • To all:

    Thank you for your time and ideas. After Microsoft analyzed the issue and reproduced it on their end, this behavior was deemed a bug. The temporary work around is to disable Table Spooling using trace flag 8690. I'll let you know if there are any further developments.

    -Aaron

  • Thanks for the update, Aaron.

    It's useful to know about another undocumented trace flag.

     

    Igor

Viewing 12 posts - 16 through 26 (of 26 total)

You must be logged in to reply to this topic. Login to reply