Blog Post

SQL Live Health Check - part 2 - historical data and configuation


In the previous post ( I talk about the main screen where you can have a single glance of the healthiness of all your db instances, clear color-coded for issues occurring at the time. You might ask, whats about trends, how can I change threshold..... This post is all about that!

Let's get to the trend first, to get the server detail screen up, all you have to do is click on the server name, the below screen will come up:

Here is what you can see trends for the given instance, by default, it shows the last 8 hours of data (it can be configured). The screen divided into 3 parts - PLE, buffer hit ratio and blocking info. both PLE and buffer hit ratio is pretty self explanatory, the y-axis for PLE is in ms as what you will get when you perform the query manually, I might change that to a more "human readable" format in the next version. Buffer hit ratio is surely in percentage for y-axis.

The bottom part is the blocking info, and it has lots of information there, here is the list of it:

DB NameThe db name where the blocking occurs
Resource TypeBlocking resource type
Request ModeQuery request mode
Wait TypeWait type for that blocking
Wait DurationTotal duration of the block in ms
Blocker IDThe SPID for the blocker
Blocker IPIP address for the host which execute that query
Blocker LoginLogin name for the blocker query
Blocker HostHost name for the blocker query
Blocker Program NameProgram name for the blocker query
Blocker Last BatchLast batch time for that blocker query
Blocker TextThe SQL statement for that blocker query
Request IDSPID for the requester
Request IPThe IP address for the host which execute the query
Request LoginLogin name for the requester query
Request HostHost name for the requester query
Request Program NameProgram name for the requester query
Request Last BatchLast batch time for that requester query
Request TextThe SQL statement for that requester query

This should gives you a pretty good idea of the blocking info along with which query is being blocked and by whom. As you can see, its very simple to use and show you the key information of your db instance. Let's move on to the configuration:

Under the file menu, click on config to bring up the screen (if will automatically show up at first use). Here you will need to configure the db instance where it keep the data, by simply entering the server name and the login method along with the DB name you want it to use, it will automatically checks if the necessary tables exists, if not, it will prompt to create that for you.

In the general setting panel, you can configure the display hours for report, which controls how much data will be displayed in the server view (shown as above), also if you want to enable auto refresh. Auto refresh interval currently cannot be change, and is running every minute. In the next version, I will allow the change of the interval so that one can set the frequency of the data collection.

Once you got the server setup ready, you can click on the save button and move on to the health check config tab.

This is very self explanatory, simply key in the db instance name and the login info, it will then store the information in the database we configured in the previous screen, then start collecting the db instance info onward. You might can see the threshold setting here as well, that is the threshold setting for the color-coding for the main screen, where it can be configured per server. Anything above warning will display in yellow, and you guess it right, above critical will show red.
Default value is provided, where PLE will show warning if its under 30 mins, and critical if its under 15 mins. This can be configured to any value that one see fits to its environment. But generally speaking, the value should cover most of the scenario. (Please don't use 300 sec as the guideline, we all know that it might not be a good reference!)
Very simple setup that can safe you lots of time, it can help you to answer questions like: is the server running slow, is there any issues on the db etc. Well, you can see that buffer hit ratio is high, PLE is high, and there is no blocking happening, you might need to look at something else. You can also look at the trends of your PLE, is it keep dropping every hour, you might can look into more detail to that and see if there are any batch job that runs every hour that reads lots of data in etc.
To download the app, please go here (, its free and I hope that can help you to perform some basic monitoring of your environment. In the next post, I will go through the scripts that I used, and what permission is required to get it working. Stay tuned.