A good doctor knows their patients – literally – inside out. They know each patient’s history of health and illness, what the patient has continuing issues with (maybe allergies, sinuses, or in more serious cases, diabetes or cancer), and how well they bounce back from those occasional – or frequent – illnesses. They understand the stressors in their patients’ lives and how dealing with them may affect their health. A brief look at a chart before walking into a room, and they are ready to plunge in and help.
I’ve said it before: my servers are my patients. The better I know them, the better I am able to help when things go wrong. Better yet, I may be able to prevent some things from going wrong by looking down the road a bit. But to do that, I need a chart. In the form of an SSRS report. A quick look gives me solid basic information on each server and a place to start. I use monitoring equipment in addition to this, but I like having trend data over time to clarify and focus baselines and issues. Especially since my patients don’t always speak English when they are trying to tell me where it hurts.
Here's what I track:
- Brent Ozar’s sp_Blitz results and whether or not the findings were fixed (run once a month and recorded to a table on each server, then sent to the hub for archiving)
- CPU utilization – both by SQL Server and non-SQL Server processes. I find breaking these out is helpful
- IOs by drive and server (again, breaking them out is helpful)
- PLE – here I’m looking for sustained low page life expectancy, rather than quick dips. Are the dips getting longer? What times of the day? Can I correlate them to application processes reliably?
- How long is it taking for Ola Hallengren’s index maintenance and CHECKDB jobs to complete on each server?
- How many times have we had to revert either to a backup or a database snapshot?
- How many times has a server been rebooted?
- What are the top five waits on each server? Have they increased or changed?
- What are my top ten databases across production in terms of size, and how much are they growing by month/quarter/year?
- How much space is each server taking up on the SAN?
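To make the "sustained low versus quick dip" idea from the PLE item concrete, here is a minimal sketch in Python. It assumes the PLE readings have already been collected as (timestamp, seconds) pairs sorted by time. The function name `sustained_low_periods`, the 300-second threshold, and the 10-minute minimum duration are illustrative assumptions, not values from my reports; pick numbers that fit your own baselines.

```python
from datetime import datetime, timedelta


def sustained_low_periods(samples, threshold=300, min_duration=timedelta(minutes=10)):
    """Return (start, end) pairs where PLE stayed below `threshold`
    for at least `min_duration`.

    samples: iterable of (timestamp, ple_seconds) tuples, sorted by timestamp.
    threshold and min_duration are assumed example values, not recommendations.
    """
    periods = []
    start = None  # first timestamp of the current low run
    last = None   # most recent timestamp in the current low run
    for ts, ple in samples:
        if ple < threshold:
            if start is None:
                start = ts
            last = ts
        else:
            # The low run just ended; keep it only if it lasted long enough.
            if start is not None and last - start >= min_duration:
                periods.append((start, last))
            start = None
    # Handle a low run that continues through the final sample.
    if start is not None and last - start >= min_duration:
        periods.append((start, last))
    return periods


# Hypothetical readings taken every five minutes, starting at 09:00.
base = datetime(2024, 1, 1, 9, 0)
readings = [1200, 250, 1100, 280, 260, 240, 270, 900]
samples = [(base + timedelta(minutes=5 * i), ple) for i, ple in enumerate(readings)]

for start, end in sustained_low_periods(samples):
    print(start.time(), end.time(), end - start)  # 09:15:00 09:30:00 0:15:00
```

The single low reading at 09:05 is ignored as a quick dip, while the run from 09:15 to 09:30 is reported. Comparing the length and time of day of these periods from month to month is one way to answer "are the dips getting longer?"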
I am also looking at adding the output of Brent’s sp_BlitzCache to the mix.
In the coming weeks, I’ll talk about how I gather this information, and how you could do it for yourself. If you are already doing something similar and have some data points that you collect that you don’t see here, let me know! Stay tuned.
This article is part of a series. The complete list of articles is:
- How Often Do You Give Your Server a Physical? (this article)
- Let's Start with Architecture