The real challenges of database monitoring

I’ve worked in IT for 30 years, a large part of which has involved managing data and databases. I’m now a SQL Server Enterprise Architect and I manage the direction and operation of a Microsoft Platform Team, as well as the design, build and implementation of a global SQL Server estate.

I recently took part in a panel discussion with some other database experts, during which we discussed how monitoring is the key to success in database DevOps. You can still watch the webinar online, but I wanted to write about four of the topics that came up because I think they’ll resonate with anyone who’s responsible for managing databases:

  • Manual v third-party monitoring
  • Moving to proactive monitoring in development and testing
  • Managing expectations of the cloud
  • Making the business case for a third-party monitoring tool

Manual v third-party database monitoring

Years ago at BMW, we had what was then a typical setup for a database shop. We used Microsoft tools like SCOM, among other things, to do our monitoring, and basically waited for a problem and then reacted to it.

Like any responsible team, we always tried to stay ahead of patching and to plan the infrastructure so that machines and systems were well allocated. We also had a good distribution of databases across the infrastructure, and machines weren't overwhelmed or oversubscribed. With a small number of servers, manual monitoring was okay, but as our estate grew into lots of servers operating globally across different time zones, it became very difficult.

At the time, our legacy systems were on SQL Server 2005 and 2008, which were so asymptomatic that we had no idea which day any particular database was going to go out into left field. We never knew if Monday was going to be a good day or a bad day and, with around 80% of our SQL Server estate handling the financial side of the business, that wasn’t a good position to be in.

We always have to do more with less, so we started to ask how we could be more proactive: get better insights into systems that were having problems and react in real time, to reduce business impact and downtime.

That was when we decided to bring in a third-party product to help us, and you’ll probably come to the same crunch point if you haven’t yet moved from manual monitoring. For us, it was a change in mindset that resulted in us saying, ‘Okay, how did we live without this for so long?’ If you haven’t yet taken the plunge, it’s worth testing a few third-party tools to see where they could help you.

Moving to proactive monitoring in development and testing

With a third-party product in place, you move from firefighting to truly managing your database estates. We use Redgate SQL Monitor on a daily basis for preventative work. Throttling issues like deadlocks and blocking come up a lot of the time, so live troubleshooting to solve or remediate problems very quickly happens regularly.
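To give a feel for what that live troubleshooting looks like under the hood, here is a minimal sketch that checks for currently blocked sessions by querying the sys.dm_exec_requests DMV. It isn’t how SQL Monitor does it internally; the connection string, server name and output format are placeholders you’d adapt to your own environment, and a monitoring tool does this (plus alerting and history) for you automatically.

```python
# Minimal sketch: list currently blocked sessions on a SQL Server instance.
# The connection string below is a placeholder - adjust server, database and auth.
import pyodbc

CONN_STR = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=your-server;DATABASE=master;Trusted_Connection=yes;"
)

BLOCKING_QUERY = """
SELECT r.session_id,
       r.blocking_session_id,
       r.wait_type,
       r.wait_time,
       r.command
FROM sys.dm_exec_requests AS r
WHERE r.blocking_session_id <> 0;
"""

def list_blocked_sessions() -> None:
    """Print any sessions that are currently waiting on another session."""
    with pyodbc.connect(CONN_STR) as conn:
        rows = conn.cursor().execute(BLOCKING_QUERY).fetchall()

    if not rows:
        print("No blocked sessions right now.")
    for session_id, blocker, wait_type, wait_ms, command in rows:
        print(f"session {session_id} blocked by {blocker}: "
              f"{command}, {wait_type}, waiting {wait_ms} ms")

if __name__ == "__main__":
    list_blocked_sessions()
```

Run ad hoc, a script like this only tells you what is happening at the moment you look; the value of a monitoring tool is that it watches continuously and alerts you before someone phones you about it.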

The proactive side of it comes in when you’re monitoring environments beyond production, like development and testing. A monitoring solution lets you introduce load testing so that you can see if the changes coming through the pipeline will have an impact on performance when they reach production.

Now it’s never perfect because nothing really replicates production, but we can get a sense of the query plans: what’s running fast, what’s running slow, what’s blocking and what’s not. Interestingly, 99.9% of the time it’s an application problem, not a SQL or server-based problem.

This is why you need the application team involved. They need to appreciate and understand what monitoring is, so there’s a little bit of an education process in terms of what they should be looking at to improve their code. For example, everybody has code-based testing, but as soon as you throttle it up and go live with 50,000 users instead of 50, it’s a very different state of affairs.
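As a rough illustration of that point, the sketch below runs the same query against a test database at two concurrency levels and compares latency. Everything in it is an assumption for the example: the connection string, the dbo.Orders query and the user counts are placeholders, and real load-testing tooling scales far beyond what a simple thread pool can simulate. The point is only that the same code that looks fine at low concurrency can look very different under load, which is exactly what monitoring the test environment shows you on the server side.

```python
# Rough sketch: run one representative query at two concurrency levels and compare latency.
# Connection string, query and user counts are placeholders for a non-production test system.
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

import pyodbc

CONN_STR = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=your-test-server;DATABASE=YourTestDb;Trusted_Connection=yes;"
)
TEST_QUERY = "SELECT COUNT(*) FROM dbo.Orders WHERE OrderDate >= DATEADD(day, -7, GETDATE());"

def one_request() -> float:
    """Open a connection, run the query once and return the elapsed time in seconds."""
    start = time.perf_counter()
    conn = pyodbc.connect(CONN_STR)
    try:
        conn.cursor().execute(TEST_QUERY).fetchall()
    finally:
        conn.close()
    return time.perf_counter() - start

def run_load(concurrent_users: int, requests_per_user: int = 10) -> None:
    """Fire the query from a pool of simulated users and report average and worst latency."""
    total = concurrent_users * requests_per_user
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        timings = list(pool.map(lambda _: one_request(), range(total)))
    print(f"{concurrent_users:>5} users: avg {mean(timings) * 1000:7.1f} ms, "
          f"max {max(timings) * 1000:7.1f} ms over {total} requests")

if __name__ == "__main__":
    for users in (50, 500):  # illustrative only; scale as your test environment allows
        run_load(users)
```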

Another thing that comes into play for us, because we’re a global company, is the need to consider the different DevOps approaches that exist. There’s a huge variation between the US, Europe, India and the wider Asia Pacific in terms of how teams create applications and their understanding of what monitoring is and what it can do for them.

Finally, they need to appreciate that this isn’t a one-off – you don’t run it through the cycle and say, ‘Okay, I’m done’. You need to continuously be on top of it and say, ‘How can I use this as a fine-tuning component to increase my quality of delivery and customer reliability?’ That’s where effective monitoring really comes into its own because it becomes a business tool as well as a database tool.

Managing expectations of the cloud

Cloud adoption rates are increasing rapidly and you, like me, probably get beaten over the head about it every day. I try to be agnostic towards it – I really don’t care if the server is here or there. My biggest concern is the security implications because while we might be a manufacturing entity, a large part of the data we handle concerns finances and sales.

We’ve migrated 10% to 20% of our servers to the cloud and I think a hybrid approach will always be the case. I don’t foresee us ever exceeding maybe 50%, because the sheer volume of data we collect would be cost-prohibitive to move into the cloud.

It’s great to put things in the cloud, but when you need to get them back out, not so much, so we’re taking a very pragmatic approach. If it makes sense, put it there. Non-production systems, for example, like testing and other temporary databases that can be created and destroyed at will, make total sense because they’re a lot easier to deal with. A lot of the steadfast legacy systems, though, will most likely remain on-premises.

You also need to think about how you migrate to the cloud, and how you monitor your servers and instances once they’re there. It’s been my experience that simply ‘lifting and shifting’ is very difficult and probably less successful than most would expect. You need to plan how to make an application cloud-aware and cloud-ready, and how you’re going to approach it from a DevOps, Agile or Waterfall perspective, whichever you use.

I always reference the triangle: you have time, you have money and you have reliability. Pick two, because you can’t have all three. Keeping that in mind will help make your decisions about moving to the cloud rational and business-focused, rather than jumping in because you feel you have to.

Making the business case for a third-party database monitoring tool

As you’ve seen, I’m a big fan of using a third-party monitoring tool because it lets me manage a big, hybrid estate and be proactive as well as reactive. But like any IT cost, it comes down to budgets, so how do you get buy-in for it? Or even just a commitment to do a Proof of Concept? I’d suggest you find your pain points and then address them.

For example, it’s hard to simply say you’re going to monitor your entire infrastructure of 3,000 servers and it’s going to cost $800,000 a year in licence costs. The average company just can’t do that.

What if you pick one or two parts of your infrastructure instead? Those where you’re having issues. It might be deadlocks on your production servers that prevent you from meeting SLAs, or the load-based testing we talked about earlier. The point is to do a PoC and show management that you’ve invested a small sum to achieve a greater return.

That way, finance isn’t the blocker any more. Instead, you can demonstrate that you can accomplish all of this and it only costs this much. If you can turn that conversation around, generally you’ll be able to get there.

Tony Maddonna is the Microsoft Platform Team Lead and SQL Server Enterprise Architect for BMW AG globally. You can find out more about his views on how monitoring is the key to success in database DevOps in the recording of the panel webinar which prompted this article.

If you’d like to see how Redgate’s SQL Monitor can help you monitor large, hybrid estates more effectively, you can also download a fully functional 14-day free trial, or try our live online demo environment.
