According to Jason Strate (SQL Server MVP) there are 5 white papers ever SQL Server DBA should read. Considering my blogging is focused on how to monitor and tune previously installed servers, we’ll focus on the first two of these white papers: Troubleshooting Performance Problems in SQL Server 2008 and SQL Server 2005 Waits and Queues. Ignore the age of these papers, after all you’re running mostly the same code to do mostly the same tasks in SQL Server 6.5 and SQL Server 2012…well, except for the lack of DMVs in 6.5.
I don’t agree with them 100%, but not to the point that I don’t agree 100% that everyone should read them. The sources of the information, the uses of the DMVs, all of it is what you’ll need to know to progress as a DBA. Ask any DBA III or higher and they’ll be able to tell you most of what’s said in these papers, even if they went the hard route and made it to where they are without reading them. Where they fall short is by giving static values that aren’t worded as just starting points, and the trending they do is the trending you can do as a Microsoft Support Engineer where nothing is saved or long-term.
My differences range from minor complaints such as me feeling that a blanket statement saying Page Life Expectancy under 300 seconds is when you should start to worry. First, it’s a blanket statement, and I’m a firm believer in that the word always means you’re almost always wrong. Second, servers have more memory now and my SAN team would kill me if I told them I would have to read 500 GB of data every 300 seconds on some of my bigger servers. I do agree that there are times you should worry about PLE, but you have to keep in mind two things; how much memory are you cycling through in that time and what’s normal for this specific server. I just hope that regular drops down to 3,000 are closer to the point that grabs your attention.
Lets build off of that last big there…What’s normal for this specific server? The queries they’re giving you are snapshots of what info the server has for you right now. Some of that is what’s happening now, some is what has happened since the last restart of SQL services, and other pieces fall somewhere in the middle. You can’t tell me that I’ve done X amount of work since the server was rebooted two weeks ago and expect me to tell you if that’s an issue. You also can’t tell me that “This proc has used more CPU than any other by far since that reboot two weeks ago” and expect me to tell you if that’s a critical issue or an intense, yet normal and expected, off-hours maintenance task. You can, however, tell me that on a typical Monday between 2:00 PM and 5:00 PM we do X, but this Monday we’re doing 100 times that while the business has no reason to say this Monday should be any different. Now we’re on to something…we know there’s an issue, and we just found our starting point to solving it. Chances are if you’re reading this then either you or your immediate manager would be able to create a database to keep this trending information if it doesn’t exist already, and you’re making it harder on yourself if you don’t.
These two white papers are almost 200 pages long together, so I’m not going to pretend to cover them in one post. Actually, I’m hoping my next 50 posts might cover most of them. The points I’m going for are that these white papers are telling you the perfect stats to watch, and these stats are going to be more meaningful when you monitor them. I’ve started this in some of my posts such as looking at Wait Stats and Blocking, but there’s a long way to go. More than these two DMVs need to be watched like this, but they are good examples of wait stats capturing cumulative data and blocking capturing point-in-time data from DMVs.
If you see a DMV in these papers, you’ll see it in my posts. If it’s not there yet, it will be. Some like dm_db_exec_sessions will never have a dedicated post unless you count Running Processes which links it to several DMVs. Others like dm_os_performance_counters might not fit comfortably in a single post. The point here is that all the DMVs, especially the ones mentioned here, are worth looking into more.
Read the papers. Always go beyond the question of “what should I do”, get to the “why should I do it”, and strive for “how can I make this mean more and be more useful”. Never take my scripts or anyone else’s as-is. Tinker with them, really learn the DMVs, and it will all start to come together.
Filed under: General Chat, Monitoring, SQL Server Tagged: Blocking, DMV, dm_os_performance_counters, monitoring, Page Life, sys.dm_db_exec_sessions, Wait Stats, White Paper