Blog Post

Improving performance by using SET STATISTICS IO

,

When I started using SQL Server the company I was working for at the time had no skill in this area, they were more focussed on other platforms and so I had nobody to learn from. The community was in it’s infancy and I had no idea people would blog about this kind of stuff. You were pretty much limited to what was in Books Online and that was kind of dry and hard to navigate.

Anyway I digress, the point of this is that there are a number of features in SQL Server that you probably don’t know about when starting out that can help you to fine tune the performance of your queries. You just need someone to tell you about them. For me SET STATISTICS IO was one of these features and I figured other people will be in the same boat.

What does SET STATISTICS IO do?

SET STATISTICS IO ON does pretty much what you would expect, it returns information about IO back to your UI as a message. Why is this important? Well it helps you to understand just how much IO has been performed broken down by object. It actually breaks it down further than that, but we’ll cover that once we’ve seen how it works.

Why should I use SET STATISTICS IO?

Once you know how much IO has been used you can use that figure as a benchmark when tuning your queries. A lot of people make the mistake that they should be measuring time. The problem with that is the time that a query takes can fluctuate based on contention for resources (I’m thinking particularly about live locking / blocking here), so unless you are the only person on a system, and everything is dedicated to that one machine then you cannot guarantee consistent results. On the other hand if the same query is repeated the amount of I/O’s performed can be compared as like for like. When tested, implemented and run on your production system the fact you are performing less I/O’s will increase performance.

How to turn on SET STATISTICS IO

There are two ways you can do this, however in this post I am only going to show you how to do it via SYNTAX rather than the UI. My reasoning for this is there are various third party applications that people use to edit SQL Server and to be honest I am not a big advocate of leaving it on as a default anyway.

It’s pretty simple and I bet that most have you have worked it out by now anyway. All you need to do is enter:

SET STATISTICS IO ON;

When you’re done, simply enter :

SET STATISTICS IO OFF;

 

Compare and Contrast

Now we have established how to enable this feature let’s see it in action. This example will return information about your backups and so is safe to run on any machine. I would however recommend that you run this on a nonproduction server. The benefit of using this example over something in AdventureWorks is twofold. Firstly you may not have it and would need to download it at which point you will no doubt lose interest and do something else. Secondly you can check to see if your backups are up to date!

 

USE msdb;
go
SET STATISTICS IO ON;
GO
SELECT
   b.machine_name,  
   b.server_name,  
   b.database_name as DBName,  
   b.backup_start_date,  
   b.backup_finish_date,  
   CASE     
WHEN b.[type] = 'D' THEN 'Database'    
WHEN b.[type] = 'I' THEN 'Differential database'    
WHEN b.[type] = 'L' THEN 'Log'    
WHEN b.[type] = 'F' THEN 'File or filegroup'    
WHEN b.[type] = 'G' THEN 'Differential file'    
WHEN b.[type] = 'P' THEN 'Partial'    
WHEN b.[type] = 'Q' THEN 'Differential partial'    
ELSE b.[type]  
END Backup_Type,      
b.expiration_date,  
DATEDIFF(MINUTE,b.backup_start_date ,b.backup_finish_date) as Total_Time_in_Minute,  
b.recovery_model,  
b.backup_size
FROM   msdb.dbo.backupset AS b
INNER JOIN msdb.dbo.backupmediafamily AS bf  ON b.media_set_id=bf.media_set_id
ORDER BY   b.backup_start_date DESC
GO

 

On my laptop this brought back the following results:

 Click on the image above and it will expand in a new window

 

From this we can see the objects used in the query and the IO that took place. Details of what each of these mean can be found in Books Online here. I may expand upon these in a future post.

Normally what sensible people do is examine the attributes used, check the query plan to see if they can increase performance by adding an index. I’m definitely not normal and I want you to be able to run this without having to make changes, so I’m going slightly leftfield with the next example by changing the physical operator of the join.

SELECT
   b.machine_name,  
   b.server_name,  
   b.database_name as DBName,  
   b.backup_start_date,  
   b.backup_finish_date,  
   CASE     
WHEN b.[type] = 'D' THEN 'Database'    
WHEN b.[type] = 'I' THEN 'Differential database'    
WHEN b.[type] = 'L' THEN 'Log'    
WHEN b.[type] = 'F' THEN 'File or filegroup'    
WHEN b.[type] = 'G' THEN 'Differential file'    
WHEN b.[type] = 'P' THEN 'Partial'    
WHEN b.[type] = 'Q' THEN 'Differential partial'    
ELSE b.[type]  
END Backup_Type,      
b.expiration_date,  
DATEDIFF(MINUTE,b.backup_start_date ,b.backup_finish_date) as Total_Time_in_Minute,  
b.recovery_model,  
b.backup_size
FROM   msdb.dbo.backupset AS b
INNER MERGE JOIN msdb.dbo.backupmediafamily AS bf  ON b.media_set_id=bf.media_set_id
ORDER BY   b.backup_start_date DESC
GO

This brought back the following results:

You’ll notice here that not only did the number of reads change for our two tables, but by using a merge operator we also created a worktable which will of course reside in TempDB. As a DBA I would look at that as an area for optimisation.

You’ve probably got the hang of this now, but I’ll give you one last example before you go back to reading Dilbert

 

This time we’re going to run the same query again, but use the HASH join operator instead of the MERGE join operator.

SELECT
   b.machine_name,  
   b.server_name,  
   b.database_name as DBName,  
   b.backup_start_date,  
   b.backup_finish_date,  
   CASE     
WHEN b.[type] = 'D' THEN 'Database'    
WHEN b.[type] = 'I' THEN 'Differential database'    
WHEN b.[type] = 'L' THEN 'Log'    
WHEN b.[type] = 'F' THEN 'File or filegroup'    
WHEN b.[type] = 'G' THEN 'Differential file'    
WHEN b.[type] = 'P' THEN 'Partial'    
WHEN b.[type] = 'Q' THEN 'Differential partial'    
ELSE b.[type]  
END Backup_Type,      
b.expiration_date,  
DATEDIFF(MINUTE,b.backup_start_date ,b.backup_finish_date) as Total_Time_in_Minute,  
b.recovery_model,  
b.backup_size
FROM   msdb.dbo.backupset AS b
INNER HASH JOIN msdb.dbo.backupmediafamily AS bf  ON b.media_set_id=bf.media_set_id
ORDER BY   b.backup_start_date DESC
GO

 

Here are the corresponding results from my laptop:

 

 

What the !!!!? SQL Server created a table in TempDB but performed no IO? Well no actually it did, but this is a limitation of using STATISTICS IO, sorry. For some things it’s great, but there are a few caveats. I’ll be writing up another one over the next two week so stay tuned for that one.

Hopefully this last example hasn’t put you off using this feature, it is really useful, but you must remember to use it in conjunction with other features such as query plans, traces and extended event sessions. Together these will provide you with a much clearer understanding of how you can optimise your queries.

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating