Reset of a single column in sys.dm_exec_query_stats

Question

Reset of a single column in sys.dm_exec_query_stats

Ondrej Bouda

SSC Enthusiast

Points: 173
More actions
October 19, 2021 at 2:46 pm

#3942015
Dear friends,
I have a custom performance monitor based on snapshotting sys.dm_exec_query_stats, and now I noticed that once upon a time I'm getting something which looks like a reset or overflow for the total_elapsed_time column.
- I'm comparing two subsequent snapshots of the same sql_handle + plan_handle + statement_start_offset + statement_end_offset + creation_time (I can guarantee that this combination is unique as my monitoring includes a custom table with a unique index over these columns).
- I can see that the value for total_elapsed_time sometimes drops down to almost zero (from a very high number, representing more than 20 days) but the values in all other columns are increasing as usual (execution_count, total_worker_time, total_logical_reads etc.). There is no recompile, all other query stats go on - just a single column gets reset...
Actually I found this behavior in 2 MSSQL instances, and there is one more weird thing:
- MSSQL 2017 => the reset threshold seems to be around 7 x 10^12.
- Azure Managed Instance => the reset threshold seems to be around 1.85 x 10^12.
- If there was an overflow, I would expect it to happen at the max bigint value => around 9.2 x 10^18.
  Moreover, I would not expect the reset threshold to change across MSSQL versions (and especially not to go down).
Is this a known issue / expected?
Is there any documentation for that (especially which column has which reset threshold for different MSSQL versions)?
Thanks in advance.

Viewing 3 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply

Grant Fritchey SSC Guru Points: 399206 More actions · Answer 1

It's probably just evidence of a given query exiting cache for whatever reason it exited cache. All the execution DMVs are cache dependent. Any given query that leaves cache, for any length of time, resets all counters to zero. So, all the rules around aging out of cache are in play, as is DBCC FREEPROCCACHE and ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE CACHE (both with, or without a plan_handle).

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Ondrej Bouda SSC Enthusiast Points: 173 More actions · Answer 2

No, the exec plan definitely survived - there was still the same, weeks old creation_time.

Moreover, should there be a reset of all columns/values, I would have no issue with that. The actual issue is that only total_elapsed_time was reset and all other values were going on - which resulted in malformed AVG values:

I was getting really weird numbers for AVG elapsed_time.
I had usual AVG values for all other counters - worker_time, logical_reads etc.

Which just did not fit together...

Maybe I could explain why I experience reset of total_elapsed_time and not other fields.

I'm getting the reset for several different queries, all of them sharing the same characteristics:

Are massively executed in parallel (hundreds of app threads executing the query).
Quite often wait for a lock => have AVG elapsed time up to several seconds
(may look like a poor app design but the use case is rather healthy: it is an import of data from a 3rd party system, and the actual blocking level depends on the actual data, which is highly different each time => sometimes there is almost no lock wait and other times everything is blocked).
The exec plan survives for several weeks without a recompile.
Therefore, total_elapsed_time is growing really fast - much faster than any other counter for any other query - and can grow to really huge numbers over the long weeks.

I think that all other columns/values may be subject to the same reset, but they are really unlikely to hit the reset threshold as all other counters represent a serious consumption of resources (CPU, disk IO) => cannot grow as quickly as elapsed time in this parallel-long-wait use case => not likely that any exec plan would survive long enough to hit the threshold.

Maybe you could make a quick check of your environments:

select max(total_elapsed_time) from sys.dm_exec_query_stats;

I guess that the number will be way bellow the reset threshold I identified => you are unlikely to hit this issue:

7 x 10^12 for on-premise MSSQL 2017.
1.85 x 10^12 for Azure Managed Instance.

Reset of a single column in sys.dm_exec_query_stats

Cookies on SQLServerCentral