One morning, you wake up and feel funny. You can't quite put your finger on it, but something isn't quite right. What now? You check your symptoms over the next few days; do you feel the same, better, worse? If better, then great, it was some temporal issue, perhaps caused by an allergic reaction to some suspiciously spicy chicken. If the same or worse then you go to the doctor for some health advice, but armed with some data to share, and having ruled out certain possible causes. Whether you realize it or not, in comparing how you feel one day to the next, you have taken baseline measurements. In much the same way, a DBA uses baselines to gauge the gauge health of their database servers. Of course, while SQL Server is very willing to share data regarding its health and activities, it has no idea of the difference between good and bad.
Over time, experienced DBAs develop "mental" baselines with which they can gauge the health of their servers almost as easily as their own body. They accumulate knowledge of the daily, natural state of each part of their database system, and so know instinctively when one of their databases "feels funny". Equally, they know when an "issue" is just a passing tremor. They see their SQL Server with all of its four CPU cores running close 100% and don't panic. Why? It's 5PM and the end-of-day reports are running, which are very CPU intensive. Equally, they know when they need to respond in earnest.
Nevertheless, no DBA can retain mental baselines for every characteristic of their systems, so we need to collect physical baselines too. In my experience, surprisingly few DBAs do this. Part of the problem is that SQL Server provides a lot of instrumentation. If you look, you will find an almost overwhelming amount of data regarding user activity on your SQL Server instances, and use and abuse of the available CPU, I/O and memory. It seems like a huge task even to work out which data you need to collect, let alone start collecting it on a regular basis, managing its storage over time, and performing detailed comparative analysis.
Without baselines, though, it is very difficult to pinpoint what ails a server, just by looking at a single snapshot of the data, or to spot retrospectively what caused the problem by examining aggregated data for the server, collected over many months.
It isn't as hard as you think to get started. You've probably already established some troubleshooting queries of the type
SELECT Value FROM
SomeSystemTable. Capturing a set of baseline values for such a query can be as easy as changing it as follows:
Of course, there are monitoring tools that will collect and manage this baseline data for you, automatically, and allow you to perform comparison of metrics over different periods. However, to get started, and to prove to yourself the value of baselines, stick something similar to the above query into an agent job, running every hour or so, and you are on your way with no excuses! Then, the next time you investigate a slow server, and see x open transactions, y users logged in, and z rows added per hour in the Orders table, compare to your baselines and see immediately what if anything changed.
Louis Davidson (Guest Editor)