Ghost Process attacks again

Question

Ghost Process attacks again

rogelio.vidaurri

SSCrazy

Points: 2117
More actions
January 23, 2018 at 10:20 am

#327629

Hi,
Every now and then, a process (red arrow) executed by SA in our SQL Server 2012 instance blocks our system. It lasts between 65- 80 minutes.
Image shows 2 processes only for this post's sake: session_id 24 blocking session_id 53.
We never use account SA, we don't have jobs running under those credentials. Nor backups nor anything.
Every time this happens we switch our traffic to our backup SQL Server instance (bidirectional transactional replication) and our system comes back to live immediately.
Any ideas what can be happening?
Today I got to execute sp_Blitz and it detected:
https://www.brentozar.com/blitz/poison-wait-detected/#THREADPOOL
Our CPU Usage was never zero though.
Regards

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply

RandomStream Hall of Fame Points: 3735 More actions · Answer 1

SPID <=50 are mostly system processes and usually the program_name column should show blank. What was the program name for SPID24?

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 2

There are a large number of things that could be in play here. I don't know that a forum is the place for them because it is an intermittent problem. If you can get a performance tuning professional on standby to RDP into your system while the event is happening it should be possible to isolate a root cause quickly - certainly before the normal event duration elapses.

Did you have any visibility at all to WHAT query(s) were running on spid 24?

Do you have HEAP tables in play?

Could your bi-directional transactional replication be a factor? I have come across that setup (at scale, anyway) precisely twice in ~25 years of consulting on SQL Server, and both were absolute nightmares.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Eric M Russell SSC Guru Points: 125623 More actions · Answer 3

You can use the following to query details about a session. DBCC INPUTBUFFER will return the last command executed by session, even if the last command completed and the session is not running an active request.

DBCC INPUTBUFFER ( 25 ); SELECT * FROM sys.dm_exec_connections WHERE session_id = 25; SELECT * FROM sys.dm_exec_sessions WHERE session_id = 25; SELECT *, qt.text FROM sys.dm_exec_requests AS er OUTER APPLY sys.dm_exec_sql_text( er.sql_handle) AS qt WHERE er.session_id = 25;

Also, Adam Machanic maintains a stored procedure called sp_whoisactive, which does a good job of joining in additional information and filtering the result.
http://whoisactive.com/

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

rogelio.vidaurri SSCrazy Points: 2117 More actions · Answer 4

RandomStream - Tuesday, January 23, 2018 12:26 PM
SPID <=50 are mostly system processes and usually the program_name column should show blank. What was the program name for SPID24?

it was the hostname of the server

rogelio.vidaurri SSCrazy Points: 2117 More actions · Answer 5

TheSQLGuru - Tuesday, January 23, 2018 12:29 PM
There are a large number of things that could be in play here. I don't know that a forum is the place for them because it is an intermittent problem. If you can get a performance tuning professional on standby to RDP into your system while the event is happening it should be possible to isolate a root cause quickly - certainly before the normal event duration elapses.
Did you have any visibility at all to WHAT query(s) were running on spid 24?
Do you have HEAP tables in play?
Could your bi-directional transactional replication be a factor? I have come across that setup (at scale, anyway) precisely twice in ~25 years of consulting on SQL Server, and both were absolute nightmares.

the results I provided were the output of sp_whoisactive
spid24's sql_text is always NULL when this happens, so we don't have a clue of what's going on

HEAP tables, no, all of them have PKs

about replication, not sure if it could be a factor
when we set it up, we never use account SA
we need to have our data in 2 SQL server instances right away in order to be ready for emergencies like the one from today
it is a nightmare when both instances are active (we rarely do this) as replication crashes, that's why we keep it active/passive

we are thinking on hiring an expert for this

thanks all of you for replying

Lynn Pettis SSC Guru Points: 442470 More actions · Answer 6

rogelio.vidaurri - Tuesday, January 23, 2018 1:38 PM
TheSQLGuru - Tuesday, January 23, 2018 12:29 PM
There are a large number of things that could be in play here. I don't know that a forum is the place for them because it is an intermittent problem. If you can get a performance tuning professional on standby to RDP into your system while the event is happening it should be possible to isolate a root cause quickly - certainly before the normal event duration elapses.
Did you have any visibility at all to WHAT query(s) were running on spid 24?
Do you have HEAP tables in play?
Could your bi-directional transactional replication be a factor? I have come across that setup (at scale, anyway) precisely twice in ~25 years of consulting on SQL Server, and both were absolute nightmares.
the results I provided were the output of sp_whoisactive
spid24's sql_text is always NULL when this happens, so we don't have a clue of what's going on
HEAP tables, no, all of them have PKs
about replication, not sure if it could be a factor
when we set it up, we never use account SA
we need to have our data in 2 SQL server instances right away in order to be ready for emergencies like the one from today
it is a nightmare when both instances are active (we rarely do this) as replication crashes, that's why we keep it active/passive
we are thinking on hiring an expert for this
thanks all of you for replying

Just a clarification, all your tables have a clustered index (this does not need to be a PK).

rogelio.vidaurri SSCrazy Points: 2117 More actions · Answer 7

Eric M Russell - Tuesday, January 23, 2018 1:04 PM
You can use the following to query details about a session. DBCC INPUTBUFFER will return the last command executed by session, even if the last command completed and the session is not running an active request.
DBCC INPUTBUFFER ( 25 ); SELECT * FROM sys.dm_exec_connections WHERE session_id = 25; SELECT * FROM sys.dm_exec_sessions WHERE session_id = 25; SELECT *, qt.text FROM sys.dm_exec_requests AS er OUTER APPLY sys.dm_exec_sql_text( er.sql_handle) AS qt WHERE er.session_id = 25;
Also, Adam Machanic maintains a stored procedure called sp_whoisactive, which does a good job of joining in additional information and filtering the result.
http://whoisactive.com/

SELECT *, qt.text FROM sys.dm_exec_requests AS er OUTER APPLY sys.dm_exec_sql_text( er.sql_handle) AS qt
WHERE er.session_id = 24 got this:

I found this on google though:
"BROKER_TASK_STOP occurs when the Service Broker queue task handler tries to shut down the task. It should not be a problem and nothingbtonworry about from a performance perspective."

thanks for the queries

rogelio.vidaurri SSCrazy Points: 2117 More actions · Answer 8

Lynn Pettis - Tuesday, January 23, 2018 1:41 PM
rogelio.vidaurri - Tuesday, January 23, 2018 1:38 PM
TheSQLGuru - Tuesday, January 23, 2018 12:29 PM
There are a large number of things that could be in play here. I don't know that a forum is the place for them because it is an intermittent problem. If you can get a performance tuning professional on standby to RDP into your system while the event is happening it should be possible to isolate a root cause quickly - certainly before the normal event duration elapses.
Did you have any visibility at all to WHAT query(s) were running on spid 24?
Do you have HEAP tables in play?
Could your bi-directional transactional replication be a factor? I have come across that setup (at scale, anyway) precisely twice in ~25 years of consulting on SQL Server, and both were absolute nightmares.
the results I provided were the output of sp_whoisactive
spid24's sql_text is always NULL when this happens, so we don't have a clue of what's going on
HEAP tables, no, all of them have PKs
about replication, not sure if it could be a factor
when we set it up, we never use account SA
we need to have our data in 2 SQL server instances right away in order to be ready for emergencies like the one from today
it is a nightmare when both instances are active (we rarely do this) as replication crashes, that's why we keep it active/passive
we are thinking on hiring an expert for this
thanks all of you for replying
Just a clarification, all your tables have a clustered index (this does not need to be a PK).

I'm confused.
I though a HEAP table was a table without a clustered index. Now I'm reading a post that says "a table with only few rows can be a heap".

If HEAP tables were my issue, how could I detect them?

Lynn Pettis SSC Guru Points: 442470 More actions · Answer 9

rogelio.vidaurri - Tuesday, January 23, 2018 1:48 PM
Lynn Pettis - Tuesday, January 23, 2018 1:41 PM
rogelio.vidaurri - Tuesday, January 23, 2018 1:38 PM
TheSQLGuru - Tuesday, January 23, 2018 12:29 PM
There are a large number of things that could be in play here. I don't know that a forum is the place for them because it is an intermittent problem. If you can get a performance tuning professional on standby to RDP into your system while the event is happening it should be possible to isolate a root cause quickly - certainly before the normal event duration elapses.
Did you have any visibility at all to WHAT query(s) were running on spid 24?
Do you have HEAP tables in play?
Could your bi-directional transactional replication be a factor? I have come across that setup (at scale, anyway) precisely twice in ~25 years of consulting on SQL Server, and both were absolute nightmares.
the results I provided were the output of sp_whoisactive
spid24's sql_text is always NULL when this happens, so we don't have a clue of what's going on
HEAP tables, no, all of them have PKs
about replication, not sure if it could be a factor
when we set it up, we never use account SA
we need to have our data in 2 SQL server instances right away in order to be ready for emergencies like the one from today
it is a nightmare when both instances are active (we rarely do this) as replication crashes, that's why we keep it active/passive
we are thinking on hiring an expert for this
thanks all of you for replying
Just a clarification, all your tables have a clustered index (this does not need to be a PK).
I'm confused.
I though a HEAP table was a table without a clustered index. Now I'm reading a post that says "a table with only few rows can be a heap".
If HEAP tables were my issue, how could I detect them?

A heap table does not have a clustered index. If you query sys.indexes and filter on index_id = 0 you will see if you have any tables that do not have a clustered index.

rogelio.vidaurri SSCrazy Points: 2117 More actions · Answer 10

Lynn Pettis - Tuesday, January 23, 2018 2:08 PM
rogelio.vidaurri - Tuesday, January 23, 2018 1:48 PM
Lynn Pettis - Tuesday, January 23, 2018 1:41 PM
rogelio.vidaurri - Tuesday, January 23, 2018 1:38 PM
TheSQLGuru - Tuesday, January 23, 2018 12:29 PM
There are a large number of things that could be in play here. I don't know that a forum is the place for them because it is an intermittent problem. If you can get a performance tuning professional on standby to RDP into your system while the event is happening it should be possible to isolate a root cause quickly - certainly before the normal event duration elapses.
Did you have any visibility at all to WHAT query(s) were running on spid 24?
Do you have HEAP tables in play?
Could your bi-directional transactional replication be a factor? I have come across that setup (at scale, anyway) precisely twice in ~25 years of consulting on SQL Server, and both were absolute nightmares.
the results I provided were the output of sp_whoisactive
spid24's sql_text is always NULL when this happens, so we don't have a clue of what's going on
HEAP tables, no, all of them have PKs
about replication, not sure if it could be a factor
when we set it up, we never use account SA
we need to have our data in 2 SQL server instances right away in order to be ready for emergencies like the one from today
it is a nightmare when both instances are active (we rarely do this) as replication crashes, that's why we keep it active/passive
we are thinking on hiring an expert for this
thanks all of you for replying
Just a clarification, all your tables have a clustered index (this does not need to be a PK).
I'm confused.
I though a HEAP table was a table without a clustered index. Now I'm reading a post that says "a table with only few rows can be a heap".
If HEAP tables were my issue, how could I detect them?
A heap table does not have a clustered index. If you query sys.indexes and filter on index_id = 0 you will see if you have any tables that do not have a clustered index.

no tables of our system
only these:
sysfiles1
spt_fallback_db
spt_fallback_dev
spt_fallback_usg
MSreplication_options
spt_monitor

thanks

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 11

Definitely recommend getting a (good) consultant certified to work for you so they can be on standby to jump on as soon as the event starts occurring again. There are a number of good ones to be found on SSC.com forums.

Note if Ghost Cleanup is actually a factor, I think you actually see something like those words when you evaluate running processes.

Do you DELETE a lot of records in the database? Do tables with such have lots of indexes on them?

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

rogelio.vidaurri SSCrazy Points: 2117 More actions · Answer 12

TheSQLGuru - Wednesday, January 24, 2018 10:17 AM
Definitely recommend getting a (good) consultant certified to work for you so they can be on standby to jump on as soon as the event starts occurring again. There are a number of good ones to be found on SSC.com forums.
Note if Ghost Cleanup is actually a factor, I think you actually see something like those words when you evaluate running processes.
Do you DELETE a lot of records in the database? Do tables with such have lots of indexes on them?

thanks for the forum tip

hadn't seen ghost cleanup as a factor, or as a running process ever

we do delete lots of records in a few datatables, not more than 3 indexes

rogelio.vidaurri SSCrazy Points: 2117 More actions · Answer 13

Definitely something is wrong with our SQL instance. We switch traffic back, and after a few hours it got really slow.
Even sp_whoisactive was taking more than a minute some times.

I was able to see this (issues with tempdb)?
it is a different issue, not the Ghost Process striking again.

Looks like we need a DBA expert.

login_name waiting for PAGELATCH was our DB user for our main database

Lynn Pettis SSC Guru Points: 442470 More actions · Answer 14

Looks like tempdb contention. How is your tempdb database configured?