Per-Call Performance Monitoring

  • I work on a product that uses a large number of databases to store user data. The data is partitioned across the databases at the user level, so we guarantee that everything related to a specific user resides in a single database. On top of that, the data stored per user can vary, resulting in different performance of our stored procedures between databases (statistics skew) and between users (data set sizes). This difference can sometimes be extreme, so I want to come up with a mechanism to collect the following data per sproc call so that we can detect issues as early as possible:

    1. Execution time

    2. CPU time

    3. IO

    I can get the execution time from the machine calling the database, but so far I haven't found an inexpensive mechanism to collect the CPU and IO information. Thoughts I've had:

    1. Using "set statistics io on" and "set statistics time on", parsing the message output, and generating totals for the values I care about

    - Doable but awkward

    2. Using sys.dm_exec_sessions to harvest before and after values and return a new result set with the performance stats (see the sketch after this list)

    - It doesn't appear as though the DMV updates fast enough to get meaningful data (I'm seeing 0s come back)

    3. Hooking into the profiling system to try to match "RPC:Completed" events to their calls

    - Unrealistic... I can't fire up a profiler session and tear it down for every sproc call I make
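
    A minimal sketch of option 2, for reference: snapshot the calling session's cumulative counters before and after the call and report the deltas (the procedure name and parameter are hypothetical placeholders):

    -- Snapshot the session's cumulative counters before the call
    DECLARE @cpu_before int, @reads_before bigint, @writes_before bigint, @lreads_before bigint;

    SELECT @cpu_before    = cpu_time,
           @reads_before  = reads,
           @writes_before = writes,
           @lreads_before = logical_reads
    FROM sys.dm_exec_sessions
    WHERE session_id = @@SPID;

    EXEC dbo.SomeUserProc @UserId = 42;   -- hypothetical sproc call

    -- Report the deltas for this call
    SELECT cpu_time      - @cpu_before    AS cpu_ms,
           reads         - @reads_before  AS physical_reads,
           writes        - @writes_before AS writes,
           logical_reads - @lreads_before AS logical_reads
    FROM sys.dm_exec_sessions
    WHERE session_id = @@SPID;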

    Has anyone else had to do something similar to this before? I'd appreciate any ideas on how I could collect the data I'm after.

  • Extended events?

    But, similar to Profiler, it's more something you'd have running for a period and then analyse as a whole, rather than something you'd read back online per call.
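
    For illustration, a minimal sketch of such a session, assuming SQL Server 2012+ syntax (the session name and file path are placeholders; on 2008/2008 R2 the file target has a different name):

    -- An always-on session capturing duration, CPU and IO per RPC call to a file target
    CREATE EVENT SESSION PerCallStats ON SERVER
    ADD EVENT sqlserver.rpc_completed
    (
        ACTION (sqlserver.database_id, sqlserver.session_id)
        WHERE sqlserver.database_id > 4    -- skip system databases
    )
    ADD TARGET package0.event_file (SET filename = N'C:\XE\PerCallStats.xel')
    WITH (EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS);

    ALTER EVENT SESSION PerCallStats ON SERVER STATE = START;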

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks for the suggestion, I wasn't aware of Extended Events and it's definitely something that will be useful for me at some point. However, given that I want to be collecting the performance information all the time, it does feel a little heavyweight. So far it seems like parsing the message output is my best approach, as cumbersome as it is...

  • Ificator (10/30/2011)


    I work on a product that uses a large number of databases to store user data. The data is partitioned across the databases at the user level, so we guarantee that everything related to a specific user resides in a single database. On top of that, the data stored per user can vary, resulting in different performance of our stored procedures between databases (statistics skew) and between users (data set sizes). This difference can sometimes be extreme, so I want to come up with a mechanism to collect the following data per sproc call so that we can detect issues as early as possible:

    1. Execution time

    2. CPU time

    3. IO

    Is this what you are looking for?

    SELECT
        DB_NAME(qt.[dbid]) AS [DB Name],
        OBJECT_SCHEMA_NAME(qt.objectid, qt.[dbid]) + '.' + OBJECT_NAME(qt.objectid, qt.[dbid]) AS [SP Name],
        SUBSTRING(qt.[text], qs.statement_start_offset/2 + 1,
            (CASE
                 WHEN qs.statement_end_offset = -1
                     THEN LEN(CONVERT(nvarchar(max), qt.[text])) * 2
                 ELSE qs.statement_end_offset
             END - qs.statement_start_offset)/2 + 1) AS [SQL Statement],
        qs.execution_count,
        ISNULL(1.0 * qs.execution_count / DATEDIFF(second, qs.creation_time, GETDATE()), 0) AS [Calls/Second],
        -- 1. Execution time
        qs.total_elapsed_time,
        qs.last_elapsed_time,
        qs.min_elapsed_time,
        qs.max_elapsed_time,
        qs.total_elapsed_time / NULLIF(qs.execution_count, 0) AS avg_elapsed_time,
        -- 2. CPU time
        qs.total_worker_time,
        qs.last_worker_time,
        qs.min_worker_time,
        qs.max_worker_time,
        qs.total_worker_time / NULLIF(qs.execution_count, 0) AS avg_worker_time,
        -- 3. IO
        qs.total_physical_reads,
        qs.last_physical_reads,
        qs.min_physical_reads,
        qs.max_physical_reads,
        qs.total_physical_reads / NULLIF(qs.execution_count, 0) AS avg_physical_reads,
        qs.total_logical_writes,
        qs.last_logical_writes,
        qs.min_logical_writes,
        qs.max_logical_writes,
        qs.total_logical_writes / NULLIF(qs.execution_count, 0) AS avg_logical_writes,
        qs.total_logical_reads,
        qs.last_logical_reads,
        qs.min_logical_reads,
        qs.max_logical_reads,
        qs.total_logical_reads / NULLIF(qs.execution_count, 0) AS avg_logical_reads
    FROM sys.dm_exec_query_stats qs
    CROSS APPLY sys.dm_exec_sql_text(qs.[sql_handle]) AS qt

    You can do a group by [SP Name] if you don't need the statement level details.
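
    A minimal sketch of that roll-up, aggregating the statement-level rows to one row per procedure:

    SELECT
        DB_NAME(qt.[dbid]) AS [DB Name],
        OBJECT_SCHEMA_NAME(qt.objectid, qt.[dbid]) + '.' + OBJECT_NAME(qt.objectid, qt.[dbid]) AS [SP Name],
        SUM(qs.execution_count) AS execution_count,
        SUM(qs.total_elapsed_time) AS total_elapsed_time,
        SUM(qs.total_worker_time) AS total_worker_time,
        SUM(qs.total_logical_reads) AS total_logical_reads,
        SUM(qs.total_physical_reads) AS total_physical_reads,
        SUM(qs.total_logical_writes) AS total_logical_writes
    FROM sys.dm_exec_query_stats qs
    CROSS APPLY sys.dm_exec_sql_text(qs.[sql_handle]) AS qt
    WHERE qt.objectid IS NOT NULL   -- keep only statements that belong to a module
    GROUP BY DB_NAME(qt.[dbid]),
             OBJECT_SCHEMA_NAME(qt.objectid, qt.[dbid]) + '.' + OBJECT_NAME(qt.objectid, qt.[dbid]);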


    Alex Suprun

  • Profiler to local disk files, then load into a database for analysis. I do it all the time. Very lightweight going to disk. All you need is the SQL:BatchCompleted and RPC:Completed events and a few ancillary columns such as DatabaseID and StartTime. Once the data is in a database table you can also do aggregate analysis, outlier analysis, etc.
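
    A minimal sketch of the load-and-aggregate step (the file path and table name are hypothetical placeholders):

    -- Load the trace file into a table for analysis
    SELECT *
    INTO dbo.TraceAnalysis
    FROM sys.fn_trace_gettable(N'C:\Traces\PerCallTrace.trc', DEFAULT);

    -- Then aggregate, e.g. per-database averages for the completed-call events
    SELECT DatabaseID,
           COUNT(*)      AS Calls,
           AVG(CPU)      AS AvgCPU,
           AVG(Reads)    AS AvgReads,
           AVG(Duration) AS AvgDuration
    FROM dbo.TraceAnalysis
    WHERE EventClass IN (10, 12)   -- RPC:Completed, SQL:BatchCompleted
    GROUP BY DatabaseID;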

    You can also pony up for one of numerous third party tools to do this for you.

    BTW, Profiler does provide you with parameter values for the stored procedure calls, so you can know which users, etc. are having issues.

    Oh, and I will add that either dynamic SQL (be sure to guard against SQL injection) or OPTION (RECOMPILE) will likely be your friend when dealing with data skew. I do that all the time for clients too. 😎 It is almost always worth giving up some CPU ticks and plan cache (which can be regularly flushed, by the way) to avoid the disastrously bad plans you can get hammered with when you have significant data skew.
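
    For example, a minimal sketch of the OPTION (RECOMPILE) approach inside a procedure (the procedure, table, and parameter names are hypothetical):

    CREATE PROCEDURE dbo.GetUserOrders
        @UserId int
    AS
    BEGIN
        SELECT o.OrderId, o.OrderDate, o.Total
        FROM dbo.Orders AS o
        WHERE o.UserId = @UserId
        OPTION (RECOMPILE);   -- plan is compiled for this @UserId's actual row counts
    END;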

    Oh 2: you aren't using table variables either, right??

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service
