SQLServerCentral Article

Monitoring Blocks

,

Blocking happens when one connection from an application holds a lock and a second connection requires a conflicting lock type. This forces the second connection to wait, blocked on the first. One connection can block another connection, regardless of whether they emanate from the same application or separate applications on different client computers. Some of the actions needing locking protection may not be obvious, for example, locks on system catalog tables and indexes. Most blocking problems happen because a single process holds locks for an extended period of time, causing a chain of blocked processes, all waiting on other processes for locks. There are many blocking scenarios such as submitting queries with long execution times, applications that are not processing all results to completion, etc. The application code itself must be designed appropriately and monitored for efficiency. In fact, most experts agree that as much as 70 to 80 percent of performance and blocking problems are caused by improperly coded database applications. SQL is the primary culprit. And this is why one of the common tasks in multi-user multi-database environment is to monitor the processes that are blocked by another process for more than X number of seconds.

Most of the time, blocks don't cause problems unless it is long time blocking. Locking problems arise only when a process holds a lock longer than necessary and blocks other users' processes

The following categories show the division of application blocking processing based on our DBA department decision:

  • Short-time Blocking - average duration less than three seconds. They do not create problems.
  • Mid-time Blocking - average duration ranges from 3 seconds to a maximum of 10 seconds. Usually do not create problems
  • Long-time Blocking - average duration more than 10 seconds. Require attention.

Deadlocks - SQL Server automatically terminates one of the blocked processes.

It is doesn’t mean that division for every company or scenario may not vary. But our application’s connections timeout is 15 seconds and with long-time blocking situation some application’s connections start timeout.

In the environment, where 200+ databases and new databases created on the weekly/daily bases, the monitoring task is required partial or complete automation and a notification when blocking is occur. The purpose of this article is not discussing a strategy for the resolution of SQL Server blocking problems but to show how to automate the task of monitoring and notification if blocking situation happen and lasts for more that X number of seconds. It is understandable, that this task should be scheduled.

As always, I am not pretending to find the perfect or the best solution. The point many of my articles is to present a task, describe the ideas of how to solve the task and show one or few various (may not be the best but) way(s) of solution implementation. For sake of clarity, all stored procedures will not include an error handler, most parts for the variables verification, and some other advanced features.

Sysprocesses is the system table that can reveal the most clues about why a process is blocking others. The sysprocesses table holds information about processes running on Microsoft® SQL Server™. These processes can be client processes or system processes. Sysprocesses is stored only in the master database. Let’s see the source code of developed procedure.

Developed stored procedure has 3 parameters:

@blocker_threshold shows how many blocking processes must exists before send an email,

@email_list is the list of the recipients to whom send an email, @wait_time_threshold define a threshold for the blocking.

create proc dbo.P_blockerS
   @blocker_threshold smallint = 1,   
   @email_list varchar(255) = 'sqldbagroup' ,   
   @wait_time_threshold int 
as
-- This procedure will assist in identifying SQL Blockers 
-- If blocking exceeds a threshold supplied as a parameter it executes xp_sendmail
BEGIN 
set nocount on 
Declare @total_blockers smallint, @waittime int
,  @maxid int, @minid int
declare @blocker_spid smallint, @blocker_spid_info varchar(500) 
, @blocker_spid_msg varchar(7000) 
 declare @tbl_blockers table (blocker_spid int,tid int identity(1,1)) 
select @total_blockers = count(*) 
 from master..sysprocesses 
 where blocked > 0 and waittime > @wait_time_threshold
set @blocker_spid = 0
set @blocker_spid_info = ''
set @blocker_spid_msg = CAST(@total_blockers as char(3)) +   process(es) are blocked. Blocking info: '  
IF (@total_blockers > 0 )              
-- if there are blockers
 BEGIN
   insert into @tbl_blockers (blocker_spid)
     select spid from master..sysprocesses 
     where blocked > 0 and waittime > @wait_time_threshold 
     union 
     select blocked from master..sysprocesses
  where blocked > 0 and waittime > @wait_time_threshold 
     select @minid = min(tid), @maxid = max(tid) from @tbl_blockers 
    WHILE ( @minid <= @maxid ) 
     BEGIN
     -- begin while loop 
       if ( @total_blockers >= @blocker_threshold )
        BEGIN 
          select @blocker_spid = blocker_spid 
 from @tbl_blockers
          where tid = @minid 
                        
          select @blocker_spid_info =  'Block on ' +  rtrim(@@servername) + ',  SPID='+
            rtrim(convert(varchar(6),p.spid))+',  bloked by '+
isNull(cast(blocked as varchar(9)), ' ') + ',  '+ 
            rtrim(convert(varchar(15),IsNull(p.status, ' ')))+' ,  '+                     
            rtrim(convert(varchar(25),IsNull(p.loginame, ' ')))+' , '+                     
            rtrim(convert(varchar(15),IsNull(p.hostname, ' ')))+' , '+                     
            rtrim(convert(varchar(30),IsNull(p.program_name, ' ')))+' , '+                     
            rtrim(convert(varchar(25),IsNull(p.cmd, ' ')))+', login_time='+
            rtrim(convert(varchar(19),IsNull(p.login_time,'1900-01-01'),121))+', last_batch='+
            rtrim(convert(varchar(19),IsNull(p.last_batch,'1900-01-01'),121)) + ' '
           from master..sysprocesses p
           where p.spid = @blocker_spid
        end 
          
      set @blocker_spid_msg = @blocker_spid_msg + @blocker_spid_info
      set @minid = @minid + 1
    END                
    -- end while loop 
         
   exec master..xp_sendmail @recipients = @email_list
     , @subject = 'Blocking Process'
  , @message = @blocker_spid_msg
 END             
 -- if blockers  
END 

Another developed procedure p_blocking_notify gives an extended example for the process of monitoring blocking connections. This procedure assists in identifying SQL Blockers. And, in the case of existence of a blocking process, procedure will not only send an email but provide an ability to kill the blocker connection based on application name and number of seconds’ another process is blocked. The choice of killing (disconnecting) the blocking connection is controlled by the parameter @kill_flag. If the value is ‘Y’ blocking connection will be terminated. If blocking exceeds a threshold supplied as a parameters @blocker_threshold and @waittime_threshold then it executes an raiserror and xp_sendmail.

Usually I am creating a job that runs ones in 1-2 minutes or based on blocking prediction time the schedule may vary (for example, every 2-5 minutes). In addition, stored procedure can be modified to insert results into a history table that may give the ability for the better analysis over the time. I am utilizing this procedure when I need to analyze an intermediate blocking process that may happen rarely. In this case SQL Profiler is useless because excessive blocking situation may not appear within days or even weeks.

create proc dbo.p_blocking_notify
   @blocker_threshold smallint = 1,   
   @email_list varchar(255) = 'emailgroup@abc.com' ,   
   @program_name_par1 varchar(255),
   @waittime_threshold int,   
   @num_of_blocked_processes int,   
   @kill_flag char(1) = 'N'
as 
BEGIN 
set nocount on 
Declare @total_blockers smallint, @cnt int, @waittime int
  , @program_name varchar(50), @APP_total_blockers int
declare @blocker_spid smallint, @blocker_spid_info varchar(250)
  , @below_thres_msg varchar(30)
declare @blocker_spid_msg varchar(5000), @blocker_spid_msg1 varchar(250)
  , @above_thres_msg varchar(255)
select @total_blockers = count(*) from master..sysprocesses where blocked > 0 
select @APP_total_blockers = count(*), @waittime = max(waittime) 
 from master..sysprocesses where blocked > 0 and program_name = @program_name_par1
set @cnt = 1 
WHILE ( @APP_total_blockers > 0  and @waittime >= @waittime_threshold
        and @cnt < @num_of_blocked_processes)              
 BEGIN                                   
 -- begin while loop 
   set @cnt = @cnt + 1 
   --  The following statements can be modified to insert results into a history
   -- table. If run from SQL Executive a blocker count will appear in history 
   if (@total_blockers < @blocker_threshold)
    begin
      set @below_thres_msg = 'Total Blockers at '+convert(char(3),@total_blockers)
      print ' '
      print @below_thres_msg
    end
   else
    BEGIN
      set @above_thres_msg = ' Blocking Threshold Exceeded, Threshold at ' +
          convert(char(3),@blocker_threshold) +
          ' Total Blockers at ' + convert(char(3),@total_blockers) 
      print ' '
      print @above_thres_msg
    end 
       
   -- Send a Message to the log. Can utilize message for SQL Alert
   RAISERROR (@above_thres_msg,16,1)  with log
   set rowcount 1
   select @blocker_spid =  (select TOP 1 spid from master..sysprocesses
                            where blocked = 0 and spid in ( select blocked 
                                 from master..sysprocesses 
where blocked > 0 
)
) 
   select @total_blockers 'Total Blocked Processes'
   select @blocker_spid 'This is the Culprit'
   select @blocker_spid_info = (select 'Block on ' + rtrim(@@servername) + ', SPID=' + 
          rtrim(convert(char(6),p.spid))+', '+
          rtrim(convert(char(15),IsNull(p.status, ' ')))+' ,'+
          rtrim(convert(char(25),IsNull(p.loginame, ' ')))+' ,'+
          rtrim(convert(char(15),IsNull(p.hostname, ' ')))+' ,'+
          rtrim(convert(char(30),IsNull(p.program_name, ' ')))+' ,'+
          rtrim(convert(char(25),IsNull(p.cmd, ' ')))+', login_time='+
          rtrim(convert(char(19),IsNull(p.login_time,'1900-01-01'),121))+',last_batch='+ 
          rtrim(convert(char(19),IsNull(p.last_batch,'1900-01-01'),121)) 
    from master..sysprocesses p
    where p.spid = @blocker_spid)
   select @program_name = RTrim(Ltrim(p.program_name)) 
     from master..sysprocesses p  
  where  p.spid = @blocker_spid 
   set rowcount 0
   -- kill blocking process 
   IF ( @program_name = @program_name_par1 and @waittime > @waittime_threshold  
        and @kill_flag = 'Y')
    begin
      select @blocker_spid_msg = 'kill '+ rtrim(convert(varchar(4),@blocker_spid))
       exec (@blocker_spid_msg )
    end 
   select @total_blockers = count(*) from master..sysprocesses where blocked > 0
   select @APP_total_blockers = count(*) , @waittime = max(waittime) 
    from master..sysprocesses 
 where blocked > 0 and program_name = @program_name_par1
 END                
 -- end while loop 
          
set @blocker_spid_info = CAST(@cnt as char(2)) + ' process(es) killed. Last blocker id info: ' +
    @blocker_spid_info
set @blocker_spid_msg = 'xp_sendmail @recipients = ''' + @email_list + '''
   , @subject = ''Blocking Process'', @message = ''' + @blocker_spid_info + ''''
exec (@blocker_spid_msg )
END

Caution: Using the kill command is a very dangerous approach to clearing the blocking process because it could potentially corrupt the SQL database if the blocking process is performing a database update. The kill command may also take quite a while to execute if the blocking process has performed a lot of work that the kill process must undo (rollback). For details on the kill command, view SQL Books Online.

In a lot of cases, there are some consequences of blocking that are not so obvious. DBA can spend large amount of time detecting, diagnosing and resolving blocking problems. Appropriate tools and processes can minimize the time required to manage the problems. Company can waste money by buying additional hardware instead of addressing the underlying (blocking) problem.

That’s why it is very important not only to know the standard monitoring connections data (that can be done by using system stored procedures sp_who and sp_who2) but to find out the actual SQL command the process was running at the time. For example, get information about running processes and the last/current SQL command they running if any of them are running more than X number of seconds while usually are short. Connection is identified by the host computer running the process, but theoretically, definition criteria may vary based on the specific conditions, environment, running applications, and so on. My task was to find out what processes are running on server and posting significant load on server. The main problem that it was an intermediate issue and there was no an ability to say when it happened. It may happen 2-3 times a day or may happen 1-2 times a week.

I decided to utilize my standard process which is inserting a snapshot of every process every X number of minutes. And if any defined process is running longer than usually – gather the last command data and in addition activate the job to get a snapshot of all running processes/connections.

There are a lot of articles how to monitor SQL server connections and users. The simplest way is to create a table and store data based on sp_who (or sp_who2) stored procedure or select information from system table sysprocesses. I created table T_SP_WHO_TABLE

 
CREATE TABLE [dbo].[T_SP_WHO_TABLE] (
  [PROCESSES_IDENT_ID] [int] IDENTITY (1, 1) NOT NULL 
  , [PROCESS_ID] [int] NOT NULL
  , [DATABASE_NAME] [varchar] (25)  NOT NULL 
  , [CURRENT_COMMAND] [varchar] (50)  NULL
  , [PROCESS_ID_STATUS] [varchar] (25)  NULL
  , [BLOCKING_PROCESS_ID] [int] NULL 
  , [CPU_TIME] [int] NULL 
  , [DISK_IO] [int] NULL 
  , [LAST_BATCH] [datetime] NULL 
  , [LASTWAITTYPE] [varchar] (64)  NULL
  , [WAITTIME] [int] NULL 
  , [WAITRESOURCE] [varchar] (64)  NULL
  , [HOSTNAME] [varchar] (128)  NULL 
  , [PROGRAM_NAME] [varchar] (128)  NULL
  , [NT_USERNAME] [varchar] (128)  NULL
  , [SQL_LOGIN] [varchar] (128)  NULL
  , [OPEN_TRAN] [int] NULL 
  , [LAST_UPDATED_DT] [datetime] NOT NULL DEFAULT getdate()    
) 

Stored procedure P_SP_WHO is running by SQL Server Agent every 5 minutes and insert snapshot of the monitoring data into the table. It gives me some information about connected users and processes. Another procedure P_T_SP_WHO_TABLE_CLEANUP running with daily (or weekly) maintenance job and keeping manageable number of rows in table T_SP_WHO_TABLE by deleting records that are older then X number of days based on the parameter @DAYSBACK.

create procedure P_SP_WHO 
as
BEGIN
 INSERT INTO T_SP_WHO_TABLE 
  (PROCESS_ID , DATABASE_NAME , CURRENT_COMMAND , PROCESS_ID_STATUS 
   , BLOCKING_PROCESS_ID , CPU_TIME , DISK_IO, LAST_BATCH , LASTWAITTYPE 
, WAITTIME , WAITRESOURCE , HOSTNAME ,  PROGRAM_NAME , NT_USERNAME 
, SQL_LOGIN , OPEN_TRAN 
   )
  select spid , convert(varchar(25), case dbid 
                                       when 0 then 'no database context'
                                       else db_name(dbid)
                                     end) 
     , cmd , convert(varchar(20), status) ,  blocked
     , cpu , physical_io , last_batch , lastwaittype , waittime 
  ,  convert(varchar(30), waitresource) , convert(varchar(15), hostname) 
  , convert(varchar(30), program_name) , convert(varchar(20), nt_username) 
  , convert(varchar(30), loginame) , open_tran
  from master..sysprocesses
  -- where conditions may vary or not exists at all
  where not ( ltrim(rtrim(hostname)) = '' and status = 'background' ) 
 END 
create procedure P_T_SP_WHO_TABLE_CLEANUP 
  @DAYSBACK INT 
as
  
DELETE FROM T_SP_WHO_TABLE  
 WHERE DATEDIFF(DD,LAST_UPDATED_DT, GETDATE()) > @DAYSBACK

To be able storing the command with the last statement I created table H_LAST_COMMAND.

Stored procedure P_MON_CONNECTIONS provide an ability to verify business criteria for the process time delay, store the last command information in case of the reaching a threshold and activate the job to store the snapshot of the processes.

CREATE TABLE [dbo].[H_LAST_COMMAND] 
(  [LAST_COMMAND_ID] [int] IDENTITY (1, 1) NOT NULL primary key CLUSTERED
 , [SPID] [int] NOT NULL 
 , [EVENTTYPE] [varchar] (60)  NOT NULL 
 , [PARAMETERS] [INT] NULL 
 , [EVENTINFO] [varchar] (600)  NULL
 , [LAST_UPDATED_DT] [datetime] NOT NULL default getdate()   
)
CREATE procedure dbo.P_MON_CONNECTIONS
   @SEC_BACK INT = -18
, @hostname varchar(255) 
, @job_nm varchar(255)
as
begin
declare  @maxid int
 , @cmd1 varchar(8000)
 , @spid int
SET NOCOUNT ON
DECLARE @tmp1 table (spid INT, tcmd1 varchar(100), tid int identity(1,1))
create table #lastcommand(eventtype varchar(60)
    , Parameters int 
 , EventInfo varchar(600)
)  
IF ( 
      ( select count(*) from master..sysprocesses 
   where hostname like @hostname and open_tran > 0 
and login_time < DATEadd(ss, @SEC_BACK, getdate()
) and last_batch < DATEadd(ss, @SEC_BACK, getdate())  
) = 0 )
 begin
   return 0
 end
insert into @tmp1(spid, tcmd1)
 select spid , 'dbcc inputbuffer(' + cast(spid as varchar) + ')'
  from master..sysprocesses
  /* where condition can vary*/  where status <> 'background'  
  and IsNull(ltrim(hostname),'') <> ''
select @maxid = max(tid) from @tmp1
while @maxid > 0 
 begin 
   select  @cmd1 = tcmd1, @spid = spid from @tmp1  where tid = @maxid 
   truncate table #lastcommand
   insert into #lastcommand(eventtype , Parameters , EventInfo )
     exec (@cmd1) 
   insert into h_last_command (spid , eventtype , Parameters, EventInfo )
    select @spid, eventtype , Parameters, EventInfo 
     from #lastcommand
   select @maxid = @maxid - 1
 end
EXEC msdb.dbo.sp_start_job @job_name = @job_nm
SET NOCOUNT OFF 
return 0 
end

To be able to utilize the procedure an another job with one step was created. The job is running endlessly until stopped manually. The step source code is follow:

Declare @delay_in_seconds int
Set @delay_in_seconds = 10 
WHILE 1=1
 BEGIN
   exec P_MON_CONNECTIONS @SEC_BACK = -15
     , @hostname = ‘PROCESS_COMPUTER’ 
  , @job_nm 
   WAITFOR DELAY  @delay_in_seconds  
 END 

It is possible to wrap the job step into a stored procedure and develop the more sophisticated code with a parameter to define how long job will be running. In addition, for the time delay I created the standard stored procedure, which allows me easier utilize the time delay in many other cases. For example for a data transfer processes.

CREATE procedure dbo.P_TIME_DELAY
  @delay_in_seconds int
  -- delay in seconds, max for 1 hour
as 
begin 
declare @DELAY_TIME varchar(10)
      , @delay_time_num numeric(6,2)
-- check if provided number bigger than 3599 (1 hour) 
IF ( @delay_in_seconds > 3599 or  @delay_in_seconds < 0)
 begin
   select raiserror('Error: @delay_in_seconds more than 3599 or less than 0. Value can be between 0 and
3599', 16, 1)
return 
end
-- convert seconds
set @delay_time_num = @delay_in_seconds
set @DELAY_TIME = '00:'  + right('0' + cast(FLOOR(@delay_time_num/60) as varchar(2)), 2) + ':' + 
    right('0' + cast( cast( (@delay_time_num  - (FLOOR(@delay_time_num/60) * 60)  ) as int) 
       as varchar(2)), 2)
WAITFOR DELAY @DELAY_TIME
End 

Most of stored procedures, user defined functions, and tables that are used only by administrative processes (including the ones described above) reside in a separate database created by DBA for system administrator’s needs. This means that there are no users without administrative rights have an ability to execute DBA specifically created stored procedures or functions as well as view or change gathered data.

Conclusion

This article, while is not pretending to deliver the best possible solutions, shows few DBA specific tasks and outline some solutions to solve the blocking problems. I am not discussing the optimization or optimal source code but presented source code is fully functional. The source code can be modified; an error handler and some advanced features can be added based on the specific conditions.

Redgate SQL Monitor

Rate

5 (4)

You rated this post out of 5. Change rating

Share

Share

Rate

5 (4)

You rated this post out of 5. Change rating