Possible parameter sniffing?

  • Hi,

    I hope I have picked the right sub-forum for this. I have a problem on an internal web-based application we have developed in-house. The basic problem is that a handful of stored procedures within this application are locking up and causing the webpage calling them to crash at random times. Having profiled the database while waiting for the problem to show up again, when it manifests itself, the procedure will run for 30 seconds before the webpage times out and cancels the query. In normal operation, the procedures will run in under a second.

    Some background may be useful. Prior to 9 months ago, this never happened. We used to run on a SQL Server 2000 backend and the code in the affected procedures was written while running on that version. In September last year, we migrated our system across to SQL Server 2005 and since then, we have been encountering this problem. The problem seems to be worse than it was 6 months ago or so. Before, it might only happen once a fortnight, then it was once a week, 2-3 times a week and now it can be as much as 3 times a day.

    Hopefully it will make it easier if I bullet point the key information. One specific procedure is causing about 90% of the problems, so i will speak with regard to that. All procedures affected exhibit a lot of the same problems when they aren't working however

    - This has only been happening since moving to 2005

    - The procedures are all list procedures, so there are no table updates, inserts, deletes etc. There may be some filtering using table variables, but the general intent of the procedure is to return information

    - When traced, the procedure usually run with CPU usage of about 80, reads of up to 10,000, no writes and a duration of about 250ms. When the procedure is not working, it has CPU usage of 25-30 seconds, reads of up to 8 million, writes of 0 and a duration of 30 seconds

    - When we run the affected procedure in SSMS, it works perfectly. Investigating this further, we found that different execution plans are created for the website calling the SP, and SSMS calling it. This was found to be due to arithabort being set differently. When matched to what the website uses, the procedure was slow on SSMS too, to the extent that after 30 minutes we gave up and cancelled it.

    - Running sp_recompile on the affected procedure usually fixes it, though it has been known to need a second or third attempt to get it to work

    - To troubleshoot the problem further, we captured the time the procedure last ran successfully. In every case, the procedure is filtered on a unique value which is not the primary key or an identity. We successfully broke the page again by fixing it with sp_recompile and then re-running the captured code. We therefore believe certain parameter values may be causing the procedure to break.

    - There appears to be no logic or commonality between the records that were last accessed prior to failure.

    - Because there is no predictability to the failures, we have been unable to force it to happen in a test environment.

    - At one stage, we updated a record that would have been called as part of the affected procedure. Updating it caused the page to work again and re-running the code as above did not break it again. We therefore wondered if some sort of corrupt record caused the problem

    - When looking at the trace using the showplanxml event, we can see that the query plan used prior to the problem is completely different to the plan that seems to be in use afterwards. It has parallelization all over the place for a start. We also find that numerous different plans seem to be used after it breaks, so before, one plan shows when the procedure is called. When it doesn't work, as many as 4 different query plans appear in the list, and these might repeat several times before the page crashes.

    - When looking at the before and after plans, the plans after it has broken all include a predicate in, the plan prior to failure was more generic and did not include any parameters.

    Given I have been looking at this for so long, I may have forgotten something so please feel free to suggest ideas and I will let you know if I have already tried it.

    Does the above look like parameter sniffing? To me, some of it does, it certainly goes slow like parameter sniffing does. However, having looked at information on problems caused by parameter sniffing, most people seem to say their code merely goes on a go-slow. This doesn't seem to be the case here. Also, the potential cause of dodgy parameter sniffing, unrepresentative parameters, does not seem to be the case here because there is no one value that is representative, since the searches are all for unique values.

    If it is parameter sniffing that is the problem, are there any alternatives to running the procedures concerned using local variables, or setting option recompile on in the code?

    Thanks

    Paul

  • Could be parameter sniffing, but it's difficult to say without seeing the code.

    Can you post the code, or, at least, part of it?

    -- Gianluca Sartori

  • I don't think I could post all of it, is there anything in particular you wanted to see? I will do what I can if you give me an idea what you're after

    Sorry to complicate things

    Thanks

    Paul

  • I wanted to see if there's some kind of "catch-all" query involved in your procedure. This is well known to cause problems with parameter sniffing and inaccurate query plans.

    You can find a good definition of catch-all query in Gail Shaw's blog: http://sqlinthewild.co.za/index.php/2009/03/19/catch-all-queries/.

    Erland Sommarskog discusses the same topic here: http://www.sommarskog.se/dyn-search.html

    -- Gianluca Sartori

  • Yes, there are quite a lot of them in there actually, it . Any ideas though why it would work for however long, a day, a week etc and then fail on one particular query? Also, we have dozens of procedures that this never happens to with similar catch alls.

    Thanks

    Paul

  • Everything depends on what plan is being cached. Every time the procedure is recompiled, the plan is put into the cache. The parameters used for that recompile generate a plan that is optimal for those parameters and suboptimal (but maybe still acceptable) for another set of parameters.

    On the other hand, the second set of parameters could lead to a tremendously expensive plan for the first set of parameters.

    I hope I was clear enough, English is not my first language...

    -- Gianluca Sartori

  • Your English is fine 🙂

    We did look a long time ago at what plans existed for the procedure. There were two and one was always called by the web front end. The other one should have been called by SSMS, but we saw that it always got a cache miss and had to do the work all over again. Maybe that is why it always played ball in SSMS?

    The odd thing is, this procedure is only usually filtered on 2 different parameters, or it isn't filtered at all. When we recompile, odds are the first time the procedure is run and cached, it is run without filters, so there are no parameters passed. When the procedure is then filtered, performance is fine. Even though the plan cached at the start is probably configured for the initial, filterless load, it copes fine with any filter thrown at it. It continues to be fine no matter what you filter on for an indeterminate amount of time, after which the initial, filterless procedure call just stops working in the manner described above. Because this is always what is called on the first load of the page, there is no way past it in order to call from the webpage, a filtered list. However we did test it in SSMS, configured to use the bad query plan, and a filtered list worked fine. It is almost as if one particular filtered list causes the query plan to recompile and from then on, if you don't use a filter, it refuses to complete.

    Thanks

    Paul

    Hope that makes sense

  • It makes sense. Everything I can suggest is trying to convert the code into dynamic SQL. It's fast and it doesn't cache bad plans.

    You could use the ability of sp_executesql to accept parameters not in use in the sql string.

    In the article by Erland Sommarkog there's a good description of how to accomplish this.

    -- Gianluca Sartori

  • paul.goldstraw (5/11/2010)


    Your English is fine 🙂

    We did look a long time ago at what plans existed for the procedure. There were two and one was always called by the web front end. The other one should have been called by SSMS, but we saw that it always got a cache miss and had to do the work all over again. Maybe that is why it always played ball in SSMS?

    The odd thing is, this procedure is only usually filtered on 2 different parameters, or it isn't filtered at all. When we recompile, odds are the first time the procedure is run and cached, it is run without filters, so there are no parameters passed. When the procedure is then filtered, performance is fine. Even though the plan cached at the start is probably configured for the initial, filterless load, it copes fine with any filter thrown at it. It continues to be fine no matter what you filter on for an indeterminate amount of time, after which the initial, filterless procedure call just stops working in the manner described above. Because this is always what is called on the first load of the page, there is no way past it in order to call from the webpage, a filtered list. However we did test it in SSMS, configured to use the bad query plan, and a filtered list worked fine. It is almost as if one particular filtered list causes the query plan to recompile and from then on, if you don't use a filter, it refuses to complete.

    Thanks

    Paul

    Hope that makes sense

    By and large, you nailed it. With so many different sets of parameters, it creates a plan for what it's given. In the event what it's given is one of the sub-sets, it changes the execution plan. There's no way around it without rearchitecting or using a query hint. Personally I'd just go with the recompile hint in this case, but you could try the OPTIMIZE FOR hint.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • The only downside I see in using hints is that they have statement scope, not procedure scope. If I remember right, there's an issue still not fixed in the RECOMPILE hint at procedure scope in SQL2005, that causes recompilation to have basically no effect.

    Grant, please correct me if I'm wrong.

    -- Gianluca Sartori

  • Gianluca Sartori (5/11/2010)


    The only downside I see in using hints is that they have statement scope, not procedure scope. If I remember right, there's an issue still not fixed in the RECOMPILE hint at procedure scope in SQL2005, that causes recompilation to have basically no effect.

    Grant, please correct me if I'm wrong.[/quote

    I can't say you're wrong. All I can say is I'm not aware of an issue with the hint.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Id agree with Grant here. We were having the exact same issue as you are currently seeing (with the arithabort off). After careful consideration I added to the recompile hint to the stored procedure. The performance hit due to the recompile is negligible compared to the lengthy run times we were seeing. Execution times went from 5 minutes (when using a bad plan) to less than 200ms using a variety of different parameters. This solution has now been in place for 6 months and we no longer experience time outs.

    Just my 2 cents

    Luke C
    MCSE: Data Platform, MCP, MCTS, MCITP - Database Administrator & Database Developer

  • The issue with the RECOMPILE hint is the one described here by Erland Sommarskog. Maybe I misunderstood what he meant, but I still believe there's something wrong with it.

    -- Gianluca Sartori

  • I said query hint and I meant procedure hint. I wasn't aware of that 2005 issue though.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Great article! I haven't seen this issue as of yet in my environment using "WITH RECOMPILE".

    Environment - SQL Server 2005 x64 sp3 cumulative update 6

    Luke C
    MCSE: Data Platform, MCP, MCTS, MCITP - Database Administrator & Database Developer

Viewing 15 posts - 1 through 15 (of 24 total)

You must be logged in to reply to this topic. Login to reply