Squeezing Wasted Full Scans out of SQL Server Agent



This tweak in this article was done on a server running NT 4, SP6, with SQL Server 7 SP4 installed. The machine is a dual processor 1Ghz Pentium 3 with 2 Gig of ram. As always, make a backup first - your mileage may vary. This isn't for the faint of heart, it involves editing a MS supplied system stored procedure. Additionally, if you're using SQL Server Agent Alerts you won't see any performance benefit.

The Problem:

While using the NT Performance Monitor to check out our server this weekend, I decided to add Full Scans/second and Index Searches/second to the graph to give me some general feedback on data access and how well our indexes and queries were designed. (You'll find Full Scans/second and Index Searches/second under SQL Server/Access Methods in Performance Monitor.) I was disappointed to find an average of 16 Full Scans/second even with a light load. It peaks every 20 seconds at 161 Full Scans/second, the rest of the time it's pretty much at zero. All in all, a very regular "heartbeat" looking graph, although a very slow one. I was quite unhappy to think that our software was doing a full table or index scan on such a regular basis and decided to dive into QA and the SQL Profiler to find out who was at fault and get it fixed.

I'll spare you the details of the time I spent with profiler and reviewing our code (way too long to admit) to find the culprit, here's the summary:

It was SQL Server Agent doing the full scans. Now, like many of you, we use SQL Server Agent to manage backups and optimizations. We have nightly full backups, weekly optimizations and transaction log backups several times an hour. But not anything every 20 seconds. Regardless, it seems that Agent checks out something every 20 seconds.

A couple of questions immediately came to mind. One: Full scans are supposed to be a bad thing. Is this amount worth chasing? Probably not, but I wanted the Full Scans/second to be a red flag for me. Besides, 161 Full Scans 3 times a minute adds up eventually and we have a relatively busy server. The second question: How do I fix it? Can I add an index to whatever table Agent is scanning? Can I turn off a piece of Agent that I don't use like Alerts?

Using Profiler, I found that the scans occur when Agent runs msdb.dbo.sp_sqlagent_get_perf_counters to see what's going on with your server so it can decide if you need any alerts generated. This scan takes place whether you have any defined, active alerts or not. I decided to "improve" on MS's efforts just a bit.

The SP does two things. It first makes a temp table of all your enabled, defined alerts. The check for (performance_condition IS NOT NULL) is most likely done because the sample Alerts that come installed are enabled, but don't have a performance condition. Secondly, the SP does a pretty involved Select statement against the Master DB to find the alerts in your temp table that have out of band numbers in the Master DB. This second section of the SP is complex enough that I didn't want to rewrite it and I immediately ruled out adding any indexes to the tables it was looking at because they are tables in the Master DB.

The Fix - Modify msdb.dbo.sp_sqlagent_get_perf_counters:

It seems smarter to me to just avoid that second chunk entirely if you don't have any alerts defined. So that's what I did: I just added a check on @@RowCount after the Insert section. If you don't have any Alerts that are enabled, you don't add any rows to the Temp table. I included the possibility that SQL calls this SP with the "All_Counters" flag set to 1 because they designed it that way, but I haven't caught it being used yet. My admittedly simple modification is in bold.

--First Section
--Insert on temp table for each alert
IF (@all_counters = 0)
SELECT DISTINCT SUBSTRING(performance_condition, 1,
CHARINDEX('|', performance_condition, PATINDEX('%[_|_]%',
performance_condition) + 1) - 1)
FROM msdb.dbo.sysalerts
WHERE (performance_condition IS NOT NULL)
AND (enabled = 1)
If (@@RowCount > 0) or (@all_counters = 1)
--Long Select Statement against master.dbo.sysperfinfo
--that checks every performance counter SQL has
--and has a "not equals" in the Where clause


It's working just fine. My every 20 second Full Scans are gone from the NT Performance monitor. Presumably if I add alerts in the future, I haven't broken the process. And my original goal, of treating any Full Scans as bad things that need to be investigated, is easier to monitor. Besides, over 695,000 full scans aren't taking place on my server every day now.

MS probably wrote a good SP here in the first place. I think that they added the portion of the Where clause with the "not equals" later to avoid some problem. With the (spi1.cntr_type <> 1073939459) present in the second section, any index on the master.dbo.sysperfinfo table won't be used efficiently, resulting in the full scan.

Bob Musser is the President of Database Services, Inc., a vertical market software company in Winter Park FL. He can be reached at BobM@dbsinfo.com.