Find the root cause of SQL Server slowness and possibly killing blockers automatically

Question

Find the root cause of SQL Server slowness and possibly killing blockers automatically

Sridhar-137443

Hall of Fame

Points: 3762
More actions
December 17, 2018 at 11:10 am

#378904

Hi,
we have noticed this morning around 9:30 AM EST that our applications became unresponsive. we started getting emails from SQL Sentry that the service stopped running. This is probably because SQL is so busy that it cannot run any more tracing for SQL Sentry. I wanted to see if there is a way now to figure out what query or SP or process caused this? Is there a way now or is it too late?
My other question is there a way to automatically detect deadlocks or blocking queries that have been running for more than say 5 minutes? If there is a way, would it be a good idea to kill those blocking queries automatically?
If there is anything else that we can do on daily basis to eliminate the issue, we are interested to find out.
Thank you.
Sridhar.

Viewing 9 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

Grant Fritchey SSC Guru Points: 399340 More actions · Answer 1

So, automatically killing long running queries or queries that are blocking is a very bad idea. Can it be done? Certainly. There are a number of ways you could do it. I strongly recommend against it though. For example, backups are frequently the longest running queries on most systems, yet, killing them would be a very poor choice. So, we need to start building in exceptions to our "kill the slow query" methodology.

Second, if you're the person who has submitted Query X, and this is that awful query that runs for a long time, holds lots of locks and blocks off people, what do you do after the system kills your query? I seriously doubt you reexamine your T-SQL code and start a tuning session to ensure you don't get killed again. Nope. You just resubmit it. Three or four times. Then you call the boss, who calls your boss, who complains that their person can't get their work done, so stop killing their heinous query even though it's bringing the server down. Now, you are in a situation where you now can't touch that query. This will happen to you a lot.

Instead, you need to do the hard work of identifying the queries that are causing pain, and then, not killing them (resulting in rollbacks, retries, calls between bosses), but tuning them. You need to work with the people submitting the queries and find ways to filter them appropriately, index them, etc., Whatever you have to do.

I'm not an expert in SentryOne (in fact, I work for the competition). I'm sure they have a long running process report (or reports) and you can then identify the painful queries. Further, I'm pretty sure they have a whole set of tools around query tuning (I know because I've worked with some of them their employees). Use those tools to identify and address the issues. If SentryOne can't help on this (and I'd be very surprised by that), then you can always fall back to using extended events to capture query metrics yourself to identify the painful queries.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Jeff Moden SSC Guru Points: 1004683 More actions · Answer 2

I setup a "Blocking ALERT" which runs a stored procedure to save the current active spids and the code they're currently executing into a table (so I have the proof I sometimes need) and then it sends me an email.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Sridhar-137443 Hall of Fame Points: 3762 More actions · Answer 3

Thank you, Jeff. I am trying to understand what causes the server to be unresponsive. We received several timeouts when the server became unresponsive. Can Blocking or Deadlocks cause the whole SQL Server to become unresponsive?
If you have any links that can help me understand the issue, Please post it here. I am interested to learn more and hopefully stop the SQL Server from becoming unresponsive again.

Grant Fritchey SSC Guru Points: 399340 More actions · Answer 4

Sridhar-137443 - Tuesday, December 18, 2018 6:25 PM
Thank you, Jeff. I am trying to understand what causes the server to be unresponsive. We received several timeouts when the server became unresponsive. Can Blocking or Deadlocks cause the whole SQL Server to become unresponsive?
If you have any links that can help me understand the issue, Please post it here. I am interested to learn more and hopefully stop the SQL Server from becoming unresponsive again.

Yes, blocked processes can lead to other processes being blocked, which leads to other processes being blocked, etc., until a server is unresponsive. What you need is monitoring. This is a huge topic. However, to get you started on your most fundamental need, blocked processes, I'd suggest you read up on Extended Events and the blocked_process_report event. However, you'll also want to get further into this issue, capturing query metrics, wait statistics, overall performance metrics, all the rest of the process and knowledge for setting up monitoring. This isn't a small task. If you follow the link below on my book on Query Performance Tuning, I devote five chapters, just to this topic. There's a lot there to learn.

You have SentryOne. It should be doing all this work for you. I'd suggest spending time working with the product, or contacting their support.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Eirikur Eiriksson SSC Guru Points: 182951 More actions · Answer 5

Grant Fritchey - Tuesday, December 18, 2018 6:28 AM
So, automatically killing long running queries or queries that are blocking is a very bad idea.

+100
😎

Identifying and fixing should be the approach!

Jeff Moden SSC Guru Points: 1004683 More actions · Answer 6

Sridhar-137443 - Tuesday, December 18, 2018 6:25 PM
Thank you, Jeff. I am trying to understand what causes the server to be unresponsive. We received several timeouts when the server became unresponsive. Can Blocking or Deadlocks cause the whole SQL Server to become unresponsive?
If you have any links that can help me understand the issue, Please post it here. I am interested to learn more and hopefully stop the SQL Server from becoming unresponsive again.

Grant covered most of what I was going to say and more.

However, and it's a total shot in the dark based on some similar experience I had bad in 2017, let me ask some questions (especially since you have some tools in place to help you)...
1. If you don't already have one, figure out what the baseline for CPU usage is for "normal" periods.
2. When the episodes you're experiencing occur, does your total average CPU usage ramp up to somewhere over 70% and disk IO seem to drop off a fair bit?
3. When the episodes you're experiencing occur, run sp_Who2 (or whatever tool you have to see such things) and look for things that are in a KILLED/ROLLBACK state. If there are any, how many are in the KILLED/ROLLBACK state and how many CPUs do you have?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

sqlsurfing SSCarpal Tunnel Points: 4883 More actions · Answer 7

I agree with Jeff creating some script that captures blocking into a table, that's probably easier if you are unfamiliar w/SQL Sentry - which is great if you know how to use it... otherwise it's easier to use what you know.... along with rest of posters indicating do not kill sessions... you should identify the root cause as people mentioned.

Also personally i have never had a good experiencing killing sessions... the ROLL/BACK process almost ALWAYS fails and those sessions get stuck... forces a SQL Server reboot... I don't know what that always happens w/my luck in MS SQL...

If you could find the frequency and which ones are at the head (primary one blocking)... potentially fixing that one might start to alleviate pressure of "blocking"...since it's like a queue the head of the blocker takes up time ..locking/blocking things... then all other's in line are affected.... or then you discover other subsequent queries that are affecting the server. sp_who2 you can check... unless you have thousands might be harder to see... I'm sure someone on the internet has written a query to provide some code to find the head blocker ? But I'd review before running, always risk running code

--------------------------------------------------
...0.05 points per day since registration... slowly crawl up to 1 pt per day hopefully 😀

Chris Harshman SSC-Forever Points: 42192 More actions · Answer 8

A process I've used for many years to capture blocked process information is to use the built in Event Notification feature of SQL Server. It uses the
"Blocked Process Threshold" setting for the instance and Service Broker to notify you immediately when a session has been blocked for the Threshold number of seconds.

Here's a good article that has details and an example set of scripts:
https://sqlblogcasts.com/blogs/tonyrogerson/archive/2007/04/06/event-notifications-monitoring-blocked-processes-and-other-events-end-to-end-how-to-set-it-up-and-make-it-work.aspx