PERFMON Statistics For SQL Server Aren't Telling Me What is Going On

  • Since our last software release to production, there have been some observable PERFMON statistics changes on our main database server:

    1. SQL Server:Plan Cache:Cache Hit Ratio:_Total - used to be about 99%. Now is 92%.

    2. SQL Server:Plan Cache:Cache Hit Ratio:SQL Plans - used to be about 96%. Now is anywhere from 60% to 84%.

    The obvious thing to suspect is SQL Stored Procedure Plan recompilations. Some of our stored procedures intentionally force a recompile.

    But, some other PERFMON statistics don't seem to support this suspicion:

    3. SQL Server:SQL Statistics:SQL Compilations/Sec - still close to 0 all the time. This hasn't changed.

    4. SQL Server:SQL Statistics:SQL Re-Compilations/Sec - still close to 0 all the time. This hasn't changed.

    If I execute a "DBCC FREEPROCCACHE" command:

    1. SQL Server:Plan Cache:Cache Hit Ratio:_Total - immediately drops to ~84% and heads back up to ~92% within ~6 seconds.

    2. SQL Server:Plan Cache:Cache Hit Ratio:SQL Plans - immediately drops to 0% and takes a few minutes to climb back up to ~60% to ~84%. This is NOT the way it used to respond. It used to recover in about 6 seconds.

    3. SQL Server:SQL Statistics:SQL Compilations/Sec - immediately spikes to ~60, then in a few seconds drops back down near 0.

    4. SQL Server:SQL Statistics:SQL Re-Compilations/Sec - immediately spikes to ~9, then in about 2 seconds drops back down near 0.
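
    If flushing the entire cache is too heavy a hammer for these experiments, SQL Server 2008 also lets you evict a single plan by its handle. A sketch (the procedure name is a hypothetical placeholder):

    ```sql
    -- Find plan handles for a specific stored procedure (usp_MyProcedure is a
    -- hypothetical name; substitute your own):
    SELECT cp.plan_handle, cp.usecounts, st.text
    FROM sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
    WHERE st.text LIKE '%usp_MyProcedure%';

    -- SQL Server 2008 accepts a plan_handle argument, evicting just that plan
    -- rather than flushing the whole cache:
    -- DBCC FREEPROCCACHE (0x06000500...);  -- paste a handle from the query above
    ```

    This keeps the rest of the cache warm, so the counters you're watching reflect only the plan under investigation.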

    If I execute the "sp_UpdateStats" stored procedure on all of our databases (it selectively updates column statistics only where they need it):

    1. SQL Server:Plan Cache:Cache Hit Ratio:_Total - steadily drops to ~79% and then VERY slowly heads back up to ~92% within an hour.

    2. SQL Server:Plan Cache:Cache Hit Ratio:SQL Plans - steadily drops to ~35% and takes an hour or two to climb back up to ~60% to ~84%.

    3. SQL Server:SQL Statistics:SQL Compilations/Sec - immediately spikes to ~60, then in a couple of seconds drops back down near 0.

    4. SQL Server:SQL Statistics:SQL Re-Compilations/Sec - immediately spikes to ~6, then in about 1 second drops back down near 0.
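
    To see exactly which statistics a run of sp_updatestats actually touched, you can compare last-update timestamps before and after. A sketch using STATS_DATE (run per database):

    ```sql
    -- When was each statistics object last updated? Rows with a recent
    -- last_updated are the ones sp_updatestats refreshed.
    SELECT OBJECT_NAME(s.object_id) AS table_name,
           s.name                   AS stats_name,
           STATS_DATE(s.object_id, s.stats_id) AS last_updated
    FROM sys.stats AS s
    JOIN sys.objects AS o ON o.object_id = s.object_id
    WHERE o.is_ms_shipped = 0          -- skip system objects
    ORDER BY last_updated DESC;
    ```

    Updated statistics invalidate dependent plans, which would account for the much slower recovery of the hit ratio after a statistics pass.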

    Currently:

    > Buffer Cache Hit Ratio is 99.967%.

    > Page Life Expectancy is 27,841 seconds.

    _____________________

    The information above, about the changes in the server's operating characteristics, is all I have to go on. Overall performance and response times are still excellent.

    Does anyone have any ideas on how I could investigate the changes in the:

    1. SQL Server:Plan Cache:Cache Hit Ratio:_Total,

    2. SQL Server:Plan Cache:Cache Hit Ratio:SQL Plans?

    Where should I look? What should I look for? What could be the probable causes of the changes? I'm open to any ideas on this subject.
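
    One place to start is the raw counters behind those ratios. The PERFMON ratio counters are computed from a value/base pair, so sampling them yourself shows exactly what the hit ratio is at any moment. A sketch (requires VIEW SERVER STATE permission):

    ```sql
    -- The "Cache Hit Ratio" counter divided by its "Cache Hit Ratio Base"
    -- gives the percentage PERFMON displays, broken out per cache instance.
    SELECT a.instance_name,
           a.cntr_value AS hits,
           b.cntr_value AS base,
           CAST(100.0 * a.cntr_value / NULLIF(b.cntr_value, 0)
                AS decimal(5, 2)) AS hit_ratio_pct
    FROM sys.dm_os_performance_counters AS a
    JOIN sys.dm_os_performance_counters AS b
      ON a.object_name = b.object_name
     AND a.instance_name = b.instance_name
    WHERE a.object_name LIKE '%Plan Cache%'
      AND a.counter_name = 'Cache Hit Ratio'
      AND b.counter_name = 'Cache Hit Ratio Base';
    ```

    Sampling this alongside sys.dm_exec_cached_plans lets you correlate drops in the ratio with what's actually entering and leaving the cache.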

    Thanks,

    LC

  • I hope this post is useful to others.

    I've discovered the cause of our SQL Plan Cache problem: Cache Bloat.

    Our programmers are pretty clever, and to solve some problems associated with parameter sniffing and the Optimizer, they modified several hundred queries to take the parameters passed to stored procedures, assign them to local variables, then use the local variables in the queries within the stored procedure. This had the net effect of preventing the Optimizer from reusing the current query plan for that stored procedure and forcing a recompile, exactly what they wanted.

    But there was a side-effect. Unlike the option WITH RECOMPILE, localizing stored procedure parameters did not recompile the existing plan in the SQL Plan Cache. Instead, it left that plan in the cache and created another plan. And another, and another, until the SQL Plan Cache was full of unused plans. That's why the SQL Plan Cache efficiency is so low.
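
    For illustration, here is the local-variable pattern described above next to a statement-level alternative; all object names are hypothetical. OPTION (RECOMPILE) compiles the statement with the actual parameter values and does not leave a trail of stale plans behind:

    ```sql
    -- The local-variable workaround: the optimizer cannot sniff the value,
    -- so it builds a generic plan based on average density.
    CREATE PROCEDURE dbo.usp_GetOrders @CustomerID int
    AS
    BEGIN
        DECLARE @LocalCustomerID int = @CustomerID;  -- defeats parameter sniffing
        SELECT OrderID, OrderDate
        FROM dbo.Orders
        WHERE CustomerID = @LocalCustomerID;
    END;
    GO

    -- Statement-level alternative (SQL 2008): recompiles with the real
    -- parameter values each execution, without polluting the plan cache.
    CREATE PROCEDURE dbo.usp_GetOrders_Recompile @CustomerID int
    AS
    BEGIN
        SELECT OrderID, OrderDate
        FROM dbo.Orders
        WHERE CustomerID = @CustomerID
        OPTION (RECOMPILE);
    END;
    ```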

    I've discussed the problem with the VP of Engineering and he's realized that the several hundred stored procedures must be modified to bring the SQL Plan Cache efficiency back up to acceptable levels, near 99%.

    LC

  • " Our programmers are pretty clever ... " an oxymoron if I ever saw one methinks.

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

    The more serious observation is this: are you making changes due to performance issues, or due to perfmon numbers?

    I'm never too bothered overall with cache hit ratios in as much as often there's not too much I can do. What does concern me is performance issues with the application that may manifest as issues within the database.

    The whole point about plans is that, optimally, they need to be reused. With the page life expectancy you have, I figure there are no issues with memory. In SQL 2008 the cache can grow to about 25% of total memory on the box, or something similar; it will grow to accommodate within the memory available, and the aging out of plans will be based upon overall memory and the balance of the various caches. During testing of SQL 2008 I've seen my procedure cache up to 5GB and I don't have the issues you describe. Rebuilding indexes and statistics will trigger procedure recompiles.
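
    For what it's worth, the current size and composition of the plan cache can be sized up with a quick DMV query (a sketch):

    ```sql
    -- Plan cache size broken down by object type (procedures, ad hoc SQL,
    -- prepared statements, etc.), in MB.
    SELECT objtype,
           COUNT(*) AS plan_count,
           SUM(CAST(size_in_bytes AS bigint)) / 1024 / 1024 AS size_mb
    FROM sys.dm_exec_cached_plans
    GROUP BY objtype
    ORDER BY size_mb DESC;
    ```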


  • COL (chuckle out loud). Our programmers actually are very clever but they specialize in .NET and I specialize in SQL and SQL Server, so I'm constantly educating them on the best ways to access our databases. What our shop has going for it is that no one brings their ego to work, none of us hoard information, and we are all committed to producing excellent results, so everything works pretty well around here.

    Regarding your second post, the problem I am solving is not academic. Part of my job as the company's DBA is to be alert to changes in any of our database servers' or SAN's performance characteristics. When I see a very important metric change, it is my job to find out why, then make recommendations to management to either change something or possibly do nothing.

    The changes in the PERFMON "SQL Server:Plan Cache:Cache Hit Ratio:SQL Plans" metric are important because they show a loss of efficiency. Right now, we have a really powerful server and SAN, so deficiencies in implementation can easily be hidden. But our company is growing rapidly. As we acquire new clients, our clients' workloads on our systems will be increasing at a rapid rate. My job is to ensure that our infrastructure, both hardware and software, can seamlessly manage the growth we're experiencing.

    I could probably get by with a low efficiency "SQL Server:Plan Cache:Cache Hit Ratio:SQL Plans" metric for quite a while but that's not my style. I know how to rectify the negative change it's experienced and I've recommended that we should immediately take appropriate steps to rectify it. This also sets the stage to educate our development staff on a best implementation practice for forcing stored procedure plan recompilations in the future and to avoid the problems that the most recent implementation has caused.

    Plus, I'm a proactive person. If I see a problem, I jump on it. I call it "Going Ugly Early".

    😀

    LC

    I take your point, but creating a new plan every time is (or was) the same as issuing a recompile, and probably less efficient. SQL 2008 will recompile at the statement level, so efficiency and reuse of plans is improved. In the x32 world the size of the procedure cache is still limited (well, I think it is, but as I've only ever used x64 from SQL 2005 onwards I'm not 100% sure here). With x64 the cache will grow much larger, so I would see growth of the cache, which reduces memory available for the buffer cache, as being more important than the hit ratio.

    I suppose it's a matter of metrics: I track recompiles vs. compiles, miss events, create events and so on; these are, to my mind, more informative than the actual ratio.

    I'm not going to debate the programmer bit but I'd say a dot net programmer writing against a sql database who hasn't a good knowledge of sql and how databases/sql server works would be a bit of a worry.


  • Our server is a 64 bit machine.

    I've only recently begun to track the actual size of the plan cache so I have no prior reference points for comparison.

    The number of compilations and recompilations is normally hovering near zero with an occasional small jump upwards.

    At this point, I think I'm only concerned about filling up the plan cache with unused plans. I know that SQL Server will eventually eliminate those that are unused and replace them with other plans but in the meantime, it will have a lot of unused plans to sort through which is inefficient.
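
    One way to quantify that bloat is to count the plans that were compiled but executed only once (a sketch):

    ```sql
    -- Single-use plans are the usual signature of plan cache bloat: each one
    -- consumed a compile and cache space but was never reused.
    SELECT cp.objtype,
           COUNT(*) AS single_use_plans,
           SUM(CAST(cp.size_in_bytes AS bigint)) / 1024 / 1024 AS wasted_mb
    FROM sys.dm_exec_cached_plans AS cp
    WHERE cp.usecounts = 1
    GROUP BY cp.objtype
    ORDER BY wasted_mb DESC;
    ```

    Tracking this over time should show the cache filling with one-shot plans between flushes.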

    Thanks for your responses, TGODBA.

    LC
