Intermittent Stored Proc long running issue.

  • Hello

    I have an interesting (at least to me) case of a stored procedure that is intermittently running long. These are the step I have taken to troubleshoot and am looking for some guidance on where else I might look.

    The behavior of the procedure is as follows. When the procedure has not run for a long time (several hours 8+), it takes up to 30+ seconds.

    Subsequent runs of the proc after that time run in < .050 seconds. The only inserts being done are into table variables with pretty small (<100 rows). No updates/deletes.

    It returns only one row of XML type data.

    Sql version 10.50.2500.0

    I have reviewed the Execution plan. Nothing interesting there. No indication of missing indexes. No scans of any sort. All lookups are seeks on Indexes or Custered Indexes.

    I have rebuilt all the indexes on the tables that are touched by the stored procedure.

    I have updated the statistics on the affected tables as well.

    Auto update statistics is turned on.

    The stored procedure does access the temp db through table variables but I do have 25 MDF files for the tempDB. However, other similar databases on this server do not exhibit the same behavior with this stored procedure.

    I have checked the perfmon stats on the server

    CPU remains pretty constant throughout the day between 30% and 40%.

    Disk queue runs between 0 and 1 all day long with no significant spikes.

    Network utilization is pretty constant at 20% through out the day.

    The Latch wait time shows to be around 1ms during the time where the procedure ran long.

    The Buffer Page life expectancy is a little low averaging around 15min, but other similar databases on this server do not exhibit the same behavior with this stored procedure.

    MAX DOP is set at 4 for the server.

    Disk space has plenty of free space.

    The machine itself is a powerhouse.

    Processors: 4

    Total Mem: 68,705,669,120 ( 68.71 GB )

    Processor Designation: CPU1 Intel(R) Xeon(R) CPU E7540 @ 2.00GHz

    Power Plan: Balanced

    Core: 6 Processor: 12 Calculated: 48

    The disk drives are Equilogic Sumo.

    Any suggestions.

    Thanks in advance

    Steve

  • The execution plan is probably dropping from the procedure cache based on factors referenced in this link.

  • Recompilation would not take execution time from .05s to 30s unless there are some hellaciously complex queries in the proc. Can you post the code? sanitized to protect the guilty of course.

    Can you also post an Actual Execution plan?

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Try using SET STATISTICS IO ON next time you run it for the first time in a few hours. Save the results, and then run it again and compare the second results with the original. If your physical reads have decreased, you are seeing pages being flushed out of the cache when they haven't been used for a few hours. You may be reading too many pages in order to produce the query results - perhaps your index fill factors are too low, maybe your clustering keys are too wide, or you could just be missing the index that would make the query run more efficiently.

    John

  • In early testing for the procedure I did find an index that was missing. That did not fix the issue completely, it only made it less recurrent. As far as width of the clustering keys, they are only single column int values. Fill factors are 90 by default, so I would not anticipate that being an issue, but I will review that.

    I'll check the IO setting you recommended, however... if I am "reading too many pages" I am not entirely certain how I would correct that. I am familiar with some queries that read thousands of rows and basically throw away all that work to produce only a small subset of what was touched, however that issue would not produce the wildly variant nature of the execution times. I would anticipate that the execution times would be consistently high.

    I should also mention that no data is ever removed from the tables and the index keys, once inserted, never change. They are not high volume insert/update tables, probably only taking less than 1K insert/update statements a day.

    Steve

  • Steve

    I understand. You may still have a missing index, I suppose. In any case, do the SET STATISTICS IO ON thing as I suggested - you'll then be able to rule out or in whether it's a problem of reading from disk vs reading from cache.

    John

  • They physical reads did fluctuate from 0 on a fast run to 148 on the slow run. While I would not think that would produce such a large change, it certainly is worth investigating.

    Thx

    Steve

  • Thanks for all the comments.

    I moved the database to one of our less utilized servers and the problem disappeared. The procedure now runs between 16 and 200ms all the time. Based on the fact that the Buffer Page Life Expectancy was extremely low and that this really only occurred during higher volume times, I am going to chalk this up to a resource issue based on the physical read information I got from the IO diagnostic. Although this feels like BPLE was probably only a contributor.

    Steve

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply