Logging sql stms into a table when there is a High cpu alert

Question

Logging sql stms into a table when there is a High cpu alert

bobrooney.81

SSC-Addicted

Points: 468
More actions
May 25, 2022 at 7:55 am

#4038844

Hi,
Whenever I get an high cpu alert , be it non-working hours or midnight, I want to run sp_whoisactive and log into a table or a server side trace or xe for 10-mins and stop it. How to do it? Inside a job what logic I need to keep to stop the trace or stop a job?
All I am trying to do is collect sql stmts during the time of issue.
Best Regards,
Bob

Viewing 5 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

frederico_fonseca SSCoach Points: 16337 More actions · Answer 1

you don't do it when you have high cpu - you have it running all the time with 10,20, 30, 60 seconds interval or so depending on how busy your system is and on what you consider a excessive time for a sql to run.

see https://www.brentozar.com/archive/2016/07/logging-activity-using-sp_whoisactive-take-2/

I also use the following - again on a SQL Server job running every 10 seconds.

create table dbo.DBA_QueriesSnapshot
(
, Snapshot_Time           datetime       not null
, database_id             smallint       not null
, Database_Name           nvarchar(128)  null
, session_id              smallint       not null
, login_name              nvarchar(128)  not null
, host_name               nvarchar(128)  null
, program_name            nvarchar(128)  null
, client_interface_name   nvarchar(32)   null
, start_time              datetime       not null
, status                  nvarchar(30)   not null
, command                 nvarchar(32)   null
, wait_type               nvarchar(60)   null
, open_transaction_count  int            not null
, memory_usage            int            not null
, deadlock_priority       int            not null
, row_count               bigint         not null
, reads                   bigint         not null
, writes                  bigint         not null
, WaitTimemilliseconds    int            not null
, resource_description    nvarchar(3077) null
, blocking_session_id     smallint       null
, ElapsedTimemilliseconds int            not null
, statement_text          nvarchar(max)  null
, FullText                nvarchar(max)  null
, query_plan              xml            null
)
with
(
data_compression = page
)


create clustered index DBA_QueriesSnapshot_ix1 on dbo.DBA_QueriesSnapshot
( Snapshot_Time ASC
)
with ( data_compression = page)

and then the query to add to the sql job step

insert into DBA_QueriesSnapshot 
select getutcdate() as Snapshot_Time
     , qs.database_id
     , db_name(qs.database_id) as DatabaseName
     , qs.session_id
     , es.login_name
     , es.host_name
     , es.program_name
     , es.client_interface_name
     , qs.start_time
     , qs.status
     , qs.command
     , qs.wait_type
     , qs.open_transaction_count
     , es.memory_usage
     , es.deadlock_priority
     , es.row_count
     , es.reads
     , es.writes
     , wait_time as WaitTimemilliseconds
     , waits.resource_description
     , waits.blocking_session_id
     , qs.total_elapsed_time as ElapsedTimemilliseconds
     , substring(st.text, (qs.statement_start_offset / 2) + 1,
       ((case qs.statement_end_offset
       when -1
           then datalength(st.text)
       else qs.statement_end_offset
       end - qs.statement_start_offset) / 2) + 1) as statement_text
     , text as FullText
     , qp.query_plan
from sys.dm_exec_requests as qs
cross apply sys.dm_exec_sql_text(qs.sql_handle) as st
cross apply sys.dm_exec_query_plan(qs.plan_handle) as qp
inner join sys.dm_exec_sessions es
    on es.session_id = qs.session_id
left outer merge join sys.dm_os_waiting_tasks waits
    on waits.session_id = es.session_id
    and waits.waiting_task_address = qs.task_address
where qs.session_id <> @@spid
    and qs.wait_type <> 'TRACEWRITE'

-- change as required - this is 5 seconds - many time 10 seconds is enough
-- only report queries that are taking more than 5 seconds
    and qs.total_elapsed_time > 5000

    and substring(st.text, (qs.statement_start_offset / 2) + 1,
    ((case qs.statement_end_offset
    when -1
        then datalength(st.text)
    else qs.statement_end_offset
    end - qs.statement_start_offset) / 2) + 1)
    not like 'WAITFOR DELAY%'
    and es.program_name not like 'DatabaseMail%'
    and qs.command not like 'backup%'
    --and db_name(qs.database_id) not in ('DBATools') -- exclude databases if/as required
    --and command not like 'DBCC'                     -- exclude particular commands if/as required

bobrooney.81 SSC-Addicted Points: 468 More actions · Answer 2

So, on a Prod server, when DB Team receives Hgh CPU Alert or incident for a sustained period. what would be the response?

do you check currently running sql stmts driving high cpu , collect those statements and inform app team that these are consuming high cpu and try to optimize them and Kill those high cpu queries?

frederico_fonseca SSCoach Points: 16337 More actions · Answer 3

having one of the above running you always have the statements on the database for investigation.

Even if they aren't consuming high CPU having the history (mainly the second query I have) enables you to look at which queries are taking more than x seconds to execute, what are their normal waits (you can in most, but not all, exclude the parallel waits) as well as which ones are getting wait locks on others.

this can then be used to identify badly performant queries and go back to the owners of those queries and advise of better ways of doing them (and no you should not just tell them they are wrong and THEY have to fix them - you as a DBA should look at them and identify ways of improving them and suggest the owners possible solutions)

regarding killing high CPU - that should ALWAYS be a last resource and only after agreeing with the business and only if it is preventing other processes to execute.

High CPU is not always bad - it just depends on what else needs to be using the server at that time.

I have servers where at times, during batch processing, are at a permanent 95% CPU - this is desirable and the processes were setup in multithreading in a way that they do not make the server work at 100%.

Mr. Brian Gale One Orange Chip Points: 25026 More actions · Answer 4

Alternately, rather than building that yourself, get a SQL monitoring and reporting tool. The good ones will fire off an alert when a condition is met AND provide you with historical information about what was going on at that time.

Nice thing about buying it - support! You don't have to develop the scripts and figure how you are going to capture the data, it will alert you when problems occur other than just high CPU (such as instance offline, instance unreachable, SQL agent not running, low disk space, deadlocks, excessive blocking, etc). Now you CAN build things for this yourself, but why reinvent the wheel? And if you do build it yourself and a new version of SQL has a better method to capture the data (XE for example), do you REALLY want to go in and change all of your code to use the new and better method of capturing the data OR would you rather just have the update to the tool fix the problem?

PLUS the reporting tools offer a nice UI for you to view and manage all of the alerts! 1 nice web (usually web) interface to review all alerts and sometimes even offers recommendations on what to do with the alert. And if it doesn't capture everything you need, you can add custom metrics for your specific systems. For example, if you store serial numbers in an INT column, you MAY want an alert when you are about to run out of INTs and may need to change it to BIGINT. It is easy to add that to a monitoring tool and set up an alert when the value exceeds a specific number.

I also agree with frederico_fonseca - high CPU isn't always a bad thing. In a lot of cases with SQL, it is preferred. Unused CPU is wasted CPU. Now, you likely don't want it running 100% constantly, but having it in the 80% range frequently means your server is being utilized. My opinion - it is never a good practice to "look for issues". Find the problem you are trying to solve and solve that problem. If your "problem" is high CPU BUT it is not impacting anyone or anything, then it is likely not a problem. IF end users are reporting their queries are sometimes fast and sometimes slow, investigate what is happening during the slow times AND the fast times. It COULD be something like parameter sniffing or poor indexes or outdated statistics, all of which can be easy to fix. Not always easy to fix, but it sometimes can be an easy fix. If nobody is noticing any issues and you are just seeing HIGH CPU at times, I personally wouldn't worry about it UNLESS you are seeing problems that end users are not actually reporting like slow running queries, lots of long running transactions, lots of blocking or deadlocks, end users reporting issues, etc. AND the other thing with high CPU is that it MAY not even be SQL server that is causing the high CPU. It COULD be something like the antivirus. Also when I say "lots" that is VERY subjective and system specific. On my systems, 2 queries being blocked may be "lots", but on yours it may be 200 is the threshold for "lots".

The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.